Automatic License Plate Recognition Using YOLOv8
Project Report Submitted in Partial Fulfilment of the Requirements for the Degree of Master of Computer Applications
Submitted by
Raushan Kumar (Roll No.: 2022PGCACA016)
This is to certify that the report entitled "Automatic License Plate Recognition
using YOLOv8" is a bonafide record of the project done by Raushan Kumar (Roll
No.: 2022PGCACA016) under my supervision, in partial fulfillment of the
requirements for the award of the degree of Master of Computer Applications in
Computer Science and Engineering from National Institute of Technology
Jamshedpur.
Department seal
DECLARATION
I certify that the work contained in this report is original and has been done by me
under the guidance of my supervisor(s). The work has not been submitted to any
other Institute for any degree. I have followed the guidelines provided by the
Institute in preparing the report. I have conformed to the norms and guidelines given
in the Ethical Code of Conduct of the Institute. Whenever I have used materials
(data, theoretical analysis, figures, and text) from other sources, I have given due
credit to them by citing them in the text of the report and giving their details in the
references. Further, I have taken permission from the copyright owners of the
sources, whenever necessary.
ABSTRACT
This project, “ALPR: Automatic License Plate Recognition,” presents an intelligent system capable
of automatically detecting and recognizing vehicle license plates from images. Utilizing YOLOv8 (You
Only Look Once version 8) for real-time object detection and EasyOCR for accurate text recognition,
the application streamlines the process of license plate identification. Users simply upload an image of a
vehicle (e.g., a car), and the system automatically detects the license plate, extracts the characters, and
returns the plate number in text format. This recognized text is then securely stored for future use or
analysis.
Technology is playing a crucial role in transforming various sectors, and transportation management is
no exception. With the rapidly increasing number of vehicles globally, challenges such as traffic
congestion and unorganized parking have become common. In many regions, especially in developing
countries, there is no proper infrastructure for parking and traffic monitoring, and vehicles are often parked
haphazardly, which worsens congestion and makes enforcement difficult.
The integration of ALPR technology can be a game-changer in creating smart traffic and parking
solutions. By automating license plate recognition, authorities can better monitor vehicle movement,
enforce parking regulations, and enhance security. This project represents a foundational step toward
implementing smarter, more efficient urban mobility systems through the use of computer vision and
AI-driven automated surveillance.
Table of Contents
ABSTRACT
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 Purpose
1.2 Features
1.3 Technologies Used
1.4 Execution Workflow
1.5 Specific Goals
1.6 Expected Outcomes
2. LITERATURE REVIEW
2.1 Object Detection Techniques
LIST OF ABBREVIATIONS
ALPR – Automatic License Plate Recognition
YOLO – You Only Look Once
OCR – Optical Character Recognition
CNN – Convolutional Neural Network
IoU – Intersection over Union
NMS – Non-Maximum Suppression
mAP – Mean Average Precision
SPP – Spatial Pyramid Pooling
SPPF – Spatial Pyramid Pooling Fast
CSPNet – Cross Stage Partial Network
FC – Fully Connected
GPU – Graphics Processing Unit
FPS – Frames Per Second
1. INTRODUCTION
The majority of road accidents and rule violations, such as speeding, ignoring traffic signals, neglecting safety
equipment in vehicles (such as seatbelts and helmets), and improper overtaking occur due to distraction,
lack of concentration, or unawareness. In rural areas, these situations often arise mainly due to a lack of
awareness about the consequences of violating traffic rules. Road safety and driver behavior while
driving are largely influenced by the efficiency of the fine management system, as one of the most
effective ways to shape human behavior is by imposing fines immediately when rules are broken.
Our application, FineScan Pro, primarily focuses on utilizing an object detection module, YOLOv8.
YOLOv8 is a state-of-the-art framework designed to detect objects within images or video frames.
YOLO has a wide range of applications in autonomous vehicles and image recognition. It surpasses
conventional detection pipelines by dividing an image into a grid and processing the entire image in a
single forward pass. This approach delivers exceptional performance in real-time applications, where
high accuracy and fast processing are crucial.
FineScan Pro is not just a reporting tool but a comprehensive system that promotes accountability and
transparency in traffic management. The application is further enhanced with features such as real-time
location tracking, payment gateway integration, and a fine appeals system, making it a holistic
solution for traffic violation management.
As the world shifts toward smart solutions for everyday challenges, FineScan Pro stands as a testament
to the power of technology in enhancing public services. This project explores the intersection of mobile
application development, machine learning, and cloud hosting to build a robust, user-friendly, and
efficient system for managing and reducing traffic violations.
Manual vehicle identification is slow, error-prone, and inefficient for handling the growing number of
vehicles on roads. There is a need for an automated system that can accurately detect and recognize
license plates from images or video in real-time. This project aims to develop an Automatic License
Plate Recognition (ALPR) system using YOLOv8 to improve traffic monitoring, rule enforcement, and
vehicle tracking efficiency.
1.1 Purpose
The main purpose of this project is to build a smart technology for the future. The number of vehicles
on the road is increasing every day, which leads to accidents, crime, and traffic congestion. Our plan is
to build one smart system that can detect a car's number plate and recognize the number automatically.
With this capability we can improve traffic and parking solutions, help prevent crime, and automate
traffic and parking management in a much smarter way. This project implements the first step of that
vision: number plate recognition.
1.2 Features
License Plate Detection
Uses YOLOv8 for fast and accurate detection of vehicle license plates from images.
License Plate Text Recognition
Utilizes OCR (such as EasyOCR or Pytesseract) to extract and convert license plate text from
detected regions.
Data Augmentation
The model incorporates data augmentation techniques (such as rotation and scaling) to enhance
generalization and improve performance across diverse datasets.
Real-Time Processing
Capable of detecting and recognizing plates from a live video or real-time camera feed (if
applicable).
Image Upload Support
Users can upload images of vehicles to detect and recognize license plates automatically.
High Accuracy and Speed
Achieves real-time detection with optimized accuracy using YOLOv8's deep learning
architecture.
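To make the detection-plus-OCR flow described above concrete, the following is a minimal sketch of the pipeline. It assumes the ultralytics and easyocr packages are installed; the weights path and image name are placeholders for the trained model and an uploaded vehicle image.

from ultralytics import YOLO
import easyocr
import cv2

# Placeholder paths: trained plate-detector weights and an uploaded vehicle image.
model = YOLO("runs/detect/train/weights/best.pt")
reader = easyocr.Reader(["en"])

def recognize_plates(image_path):
    image = cv2.imread(image_path)
    result = model(image)[0]                              # YOLOv8 detection on a single image
    plates = []
    for x1, y1, x2, y2 in result.boxes.xyxy.cpu().numpy().astype(int):
        crop = image[y1:y2, x1:x2]                        # crop the detected plate region
        text = " ".join(reader.readtext(crop, detail=0))  # OCR on the cropped plate
        plates.append(text)
    return plates

print(recognize_plates("car.jpg"))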
1.3 Technologies Used
Programming Language: Python
Description:
Python is employed as the primary programming language for developing the license plate
recognition project. Renowned for its simplicity and flexibility, Python provides a robust
environment for implementing machine learning and deep learning models. Its extensive
library ecosystem ensures efficient handling of data processing, image manipulation, and
neural network implementation.
Significance:
Python’s support for machine learning frameworks and libraries like TensorFlow, Keras,
and OpenCV accelerates the development of complex models. Its clear syntax and strong
community support make it ideal for prototyping and building production-ready
applications.
OpenCV
Description:
OpenCV is utilized for preprocessing the vehicle images before they are fed into the detection
model. Tasks such as image resizing, normalization, thresholding, and noise reduction
using Gaussian blur are performed with OpenCV. These operations ensure that the input
images are clean and uniformly formatted for accurate model predictions.
Significance:
By enhancing the quality of input images, OpenCV plays a critical role in increasing the
model's ability to detect patterns, ultimately improving detection and recognition accuracy.
Pytesseract
Pytesseract is an Optical Character Recognition (OCR) tool used to extract text from the detected
license plates. After the license plate is localized by YOLOv8, Pytesseract recognizes and extracts the
alphanumeric characters on the plate, providing a full License Plate Recognition (LPR) system when combined
with YOLOv8 for detection.
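As an illustration of this step, the snippet below is a minimal sketch, assuming Pytesseract and the Tesseract engine are installed; plate_crop stands for a plate region already cropped out by the detector.

import cv2
import pytesseract

def read_plate_text(plate_crop):
    # Convert to grayscale and binarize (Otsu) so Tesseract sees clean characters.
    gray = cv2.cvtColor(plate_crop, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # --psm 7 tells Tesseract to treat the crop as a single line of text.
    return pytesseract.image_to_string(binary, config="--psm 7").strip()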
3. Google Colab
Description:
Google Colab offers a cloud-based platform for running Python code and Jupyter
notebooks. With free access to GPUs and TPUs, it enables efficient training of
computationally intensive CNN models. Google Colab supports seamless integration with
libraries like TensorFlow and OpenCV and provides an interactive environment for
experimenting with deep learning workflows.
Significance:
The ability to access powerful hardware without local infrastructure makes Google Colab
an invaluable resource for researchers and developers. Its collaborative features and real-
time sharing options enhance productivity and experimentation.
Centralized storage enhances collaboration among team members and ensures that data and
models remain accessible and organized. It also acts as a backup to prevent data loss during
development.
1.4 Execution Workflow
• Dataset Collection:
The first step is to collect a dataset of vehicle images in which each license plate is
labeled with bounding box coordinates. The dataset should represent a variety of
vehicle types, license plate designs, and environmental conditions (e.g., lighting,
camera angles, and background clutter) to ensure the model generalizes well. The
images can be collected from public datasets, or custom data may be collected using
cameras in various real-world settings like highways or parking lots. It is essential to
have a large and diverse dataset to train the YOLOv8 model effectively.
• Preprocessing:
Once the dataset is collected, it is preprocessed to ensure the data is in a suitable format for
model training. Image resizing is done to standardize the input size, ensuring that all images
have consistent dimensions. Normalization is applied to adjust the pixel values, usually
scaling them to a range between 0 and 1 to improve training efficiency. Data augmentation
techniques are then employed, including random rotations, flips, scaling, and brightness
adjustments, to artificially expand the dataset. Augmentation helps the model become more
robust by introducing variability in the images, making it more capable of handling various
real-world conditions such as varying light levels, plate designs, and vehicle angles.
• Model Training:
In this step, the YOLOv8 model is trained on the preprocessed dataset. YOLOv8 is a deep
learning object detection model known for its speed and accuracy, making it ideal for real-
time applications like license plate detection. The model is trained using PyTorch, leveraging
its capabilities to perform efficient tensor operations and dynamic graph computation. During
training, the model learns to identify license plates by adjusting its internal weights through
backpropagation and optimization techniques. Hyperparameters such as the learning rate,
batch size, and number of epochs are fine-tuned to achieve the best results. The training
process involves the model iterating over the dataset multiple times, progressively improving
its ability to detect license plates.
• Evaluation:
After training, the model’s performance is evaluated using various metrics to ensure its
accuracy and effectiveness. Precision measures how many of the detected license plates were
correct, while recall assesses how many actual plates were detected by the model. mAP
(mean Average Precision) is a more comprehensive metric that considers both precision and
recall and is commonly used in object detection tasks to evaluate the overall performance of
the model. These metrics provide insights into the strengths and weaknesses of the model,
helping identify areas for improvement and guiding further fine-tuning of the model to
achieve optimal detection accuracy.
• Testing:
Following evaluation, the trained model is tested on a separate set of test images that were
not included in the training or validation sets. The goal of testing is to verify the model’s
ability to generalize to new, unseen data. The model’s detection capabilities are tested under
real-world conditions to ensure it performs well in diverse scenarios such as different
lighting conditions, varying angles, and occlusions. The results from testing help confirm
whether the model is ready for deployment or if additional improvements are necessary.
• Deployment:
Once the model achieves satisfactory performance on test images, it is ready for deployment
in real-time applications. The trained YOLOv8 model is integrated into a software system that
can detect license plates in live video feeds or images from traffic cameras, parking lot
surveillance, or toll booths. Real-time detection is crucial for applications like automatic toll
collection and parking management, where the system must identify license plates with
minimal delay. The deployment step may involve optimizing the model for speed and
efficiency, ensuring that it can handle a high volume of images or video streams without
significant lag.
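As a sketch of this deployment step, the loop below runs a trained detector over a live video source. It assumes the ultralytics and OpenCV packages; the weights path and camera index are placeholders.

import cv2
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")   # placeholder path to the trained weights
cap = cv2.VideoCapture(0)                           # 0 = default camera; a file path or RTSP URL also works

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]   # detect plates in the current frame
    cv2.imshow("ALPR", result.plot())         # draw the detections on the frame and display it
    if cv2.waitKey(1) & 0xFF == ord("q"):     # press q to quit
        break

cap.release()
cv2.destroyAllWindows()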
1.6 Expected Outcomes
The License Plate Detection System built using YOLOv8 and Pytesseract is expected to achieve several key
outcomes that align with the objectives of this project. These outcomes will demonstrate the system’s
effectiveness, scalability, and potential for real-world applications. The expected outcomes are as follows:
• High Accuracy in License Plate Detection:
The system should be capable of accurately detecting license plates in a wide variety of vehicle images.
This includes handling images with varying conditions, such as different lighting, angles, and vehicle
types. With the YOLOv8 model's capability for fast and precise object detection, the expected outcome
is a high detection rate of license plates across test datasets, as well as a low false-positive rate.
• Robustness to Environmental Variability:
Thanks to the application of data augmentation techniques, the model is expected to be highly robust to
environmental changes. Whether it’s day or night, sunny or rainy, the model should be able to detect
license plates under different lighting conditions, in cluttered backgrounds, and from various camera
angles. This would make the system reliable in real-world conditions, where such variables are
common.
• Real-Time License Plate Recognition (LPR):
One of the primary outcomes is that the model should be optimized for real-time performance. This will
ensure that the system can process live video feeds or images quickly enough for applications such as
toll collection or parking lot management. Real-time processing would allow the system to detect and
recognize license plates without significant delays, providing immediate results for use in automated
systems.
• Effective Optical Character Recognition (OCR):
After detecting the license plate, the Pytesseract OCR tool will be used to extract the characters on the
plate. The expected outcome is that the system should reliably extract the alphanumeric characters from
the plate and convert them into readable text. This feature will be essential for real-world applications
that require both detection and recognition of the license plate number, such as vehicle tracking, toll
processing, or access control in gated areas.
• Comprehensive Evaluation Metrics:
The system’s effectiveness will be measured using evaluation metrics such as precision, recall, and mAP
(mean Average Precision). The expected outcome is that the model should achieve high scores in these
metrics, demonstrating its ability to both correctly detect license plates and minimize false detections.
Precision will ensure that the detections are mostly accurate, recall will confirm that the model is
detecting as many license plates as possible, and mAP will provide an overall assessment of the
system’s performance.
• Scalability and Flexibility:
Another key outcome is that the model will be flexible enough to adapt to a variety of use cases and
environments. It should work seamlessly on different datasets, such as urban or rural environments, with
a range of vehicle types, license plate designs, and weather conditions. This scalability will make the
system suitable for integration into larger automated systems, including toll booths, parking
management systems, and surveillance systems, where it can handle high volumes of vehicle data.
• Deployment for Real-World Applications:
The final expected outcome is that the system will be ready for deployment in real-world environments.
By integrating the trained YOLOv8 model into an application for real-time license plate detection, the
system should be able to be easily deployed in environments such as highways, parking lots, or at toll
gates. This would streamline operations like automated toll collection, access control for parking lots,
and monitoring for security purposes.
• Improved Model Performance Over Time:
As the system is deployed and used in different environments, the model is expected to improve over
time through continuous training and fine-tuning. This would be an important outcome, as the system
would become more effective in recognizing license plates across a wider variety of scenarios and
vehicle types.
3. SYSTEM REQUIREMENTS
Hardware Requirements:
Component: Graphics Card (GPU)
Minimum Requirement: NVIDIA GPU with 4 GB VRAM (e.g., GTX 1050 Ti)
Recommended Requirement: NVIDIA RTX 2060/3060 or higher with CUDA support
Software Requirements:
Software Component Description
Operating System Windows 10 / Linux (Ubuntu 20.04 recommended) / macOS
Python Version 3.8 or above
PyTorch Deep learning framework for training YOLOv8
Ultralytics YOLOv8 YOLOv8 model for object detection (pip install ultralytics)
OpenCV For image processing (pip install opencv-python)
Pytesseract OCR library to extract text from detected license plates
Tesseract-OCR Engine Backend OCR engine (needs to be installed separately)
Jupyter Notebook / Google Colab For development, training, and experimentation
CUDA Toolkit (if using GPU) Required for GPU acceleration with PyTorch
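A quick way to confirm that the environment above is set up correctly is a small check script (a sketch; the package names follow the table, and the yolov8n.pt weights are downloaded automatically by Ultralytics on first use).

import torch
import cv2
import pytesseract
from ultralytics import YOLO

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("OpenCV:", cv2.__version__)
print("Tesseract engine:", pytesseract.get_tesseract_version())  # raises an error if Tesseract-OCR is not installed
model = YOLO("yolov8n.pt")                                       # downloads the nano weights on first use
print("Ultralytics YOLOv8 model loaded OK")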
2. CONVOLUTIONAL NEURAL NETWORK (CNN)
The basic idea of the Convolutional Neural Network was introduced by Kunihiko Fukushima in
the 1980s. Convolutional Neural Networks (ConvNets or CNNs) are a category of Neural
Networks that have proven very effective in areas such as image recognition and
classification. Computer vision techniques are dominated by convolutional neural networks
because of their accuracy in image classification. CNN is a class of deep, feed-forward
artificial neural networks (where connections between nodes do not form a cycle) & use a
variation of multi-layer perceptrons designed to require minimal pre-processing. ConvNet
architectures make the explicit assumption that the inputs are images, which allows us to
encode certain properties into the architecture. These then make the forward function more
efficient to implement and vastly reduce the number of parameters in the network. ConvNets
are made up of neurons that have learnable weights and biases. Each neuron receives some
inputs, performs a dot product and optionally follows it with a non-linearity.
The whole network still expresses a single differentiable score function: from the raw image
pixels on one end to class scores at the other. And they have a loss function on the last layer.
In figure 5 above, the left side represents a regular three-layer neural network. On
the other hand, the right side of the figure represents a CNN, which arranges its neurons in
three dimensions (width, height, depth).
Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron
activations. In this example, the red input layer holds the image, so its width and height
would be the dimensions of the image, and the depth would be three (Red, Green, Blue
channels).
There are three elements that enter into the convolution operation:
• Input image: It is the image that is given as an input.
• Feature detector: The feature detector is often referred to as a "kernel" or a "filter".
Typically a 3×3, 5×5, or 7×7 matrix is used as a feature detector.
• Feature map: The feature map is also known as an activation map. It is called feature
map because it is also a mapping of where a certain kind of feature is found in the
image.
Let the input image size be N × N, the filter size be F × F, the number of strides be S, and the
amount of zero padding be P. Then the output image size will be:

$$\text{Output size} = \frac{N - F + 2P}{S} + 1$$

For example, a 224 × 224 input with a 3 × 3 filter, stride 1, and padding 1 gives an output of
(224 − 3 + 2·1)/1 + 1 = 224, i.e. the spatial size is preserved.
The iteration of the forward pass and back-propagation will continue until the network
finds the desired output.
• Convolutional Layer: The Convolutional layer is the core block of the Convolutional
Neural Network. It has some special properties. It does most of the computational
heavy lifting. The CONV layer's parameters consist of a set of learnable filters. Every
filter is small spatially (along width and height), but extends through the full depth
of the input volume.
For example, a typical filter on the first layer of a ConvNet might have size 5×5×3
(i.e. 5 pixels in width and height, and 3 because images have a depth of 3, the color channels).
During the forward pass, each filter is convolved across the width and height of
the input volume and dot products are computed between the entries of the filter and
the input at any position. As the filters are slid over the width and height of the input
volume, a 2-dimensional activation map is produced that gives the responses
of that filter at every spatial position. Intuitively, the network will learn filters that activate
when they see some type of visual feature such as an edge of some orientation.
These activation maps are stacked along the depth dimension and the output volume
is produced.
Figure 7: Convolution operation of a CNN
– Stride: Stride is the number of pixels by which the filter moves over the
input image. When the stride is 1, the filters move one pixel at a time; for example, a
2×2 filter moves along a 4×4 input through its width and height one step at a time when the
stride is 1.
– Zero Padding: Sometimes the input image is padded with zeros around its border;
this is called zero-padding. Zero padding allows us to control the spatial size of the
output volume. If we don't use zero-padding, some information from the
edges can be lost.
Figure 10 illustrates the scenario of zero-padding of an input.
• Pooling Layer: The task of pooling is done by summarizing the sub-regions of the input
using operations such as taking the average, the maximum, or the minimum value of
each sub-region. These operations are called pooling functions.
We have used Max Pooling to reduce the computational cost.
– Pooling Layer Arithmetic:
Pooling layer works by sliding the window or filter
across the input.
Let, Spatial Extent = f
Stride = s
Input window size = w
Then the output size from the pooling layer will be:

$$\text{Output size} = \frac{w - f}{s} + 1$$

For example, a 4 × 4 input pooled with a 2 × 2 window (f = 2) and stride s = 2 gives a 2 × 2 output.
• Fully-Connected Layer:
In a fully connected layer, every neuron is connected to every neuron of the previous layer,
as in a standard neural network. Its activation is likewise computed by a matrix multiplication
with its weights followed by a bias offset. Usually, the output of a fully connected layer is a
column vector.
Figure 12: Before (left) and after (right) applying the activation function
– Tanh/Hyperbolic Tangent:
This function is similar to the sigmoid function. It bounds all real numbers to the range [-1,
1]. The tanh function is mainly used for classification between two classes.
– ReLU (Rectified Linear Unit): ReLU passes positive inputs through unchanged and outputs
zero otherwise. However, it introduces the Dying ReLU problem: when inputs approach zero
or are negative, the gradient of the function becomes zero, so the network cannot perform
back-propagation through those units and cannot learn.
– Leaky ReLU:
It is an improved version of ReLU that solves the Dying ReLU problem. For a
positive input it works like ReLU, and for a negative input the value is
multiplied by a very small constant (e.g. 0.01). The mathematical representation of
this function is:

$$f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{otherwise} \end{cases}$$

where $\alpha$ is the small constant.
– Mish: Mish is a smooth, non-monotonic activation function defined as f(x) = x · tanh(softplus(x)).
Characteristics:
• Smooth and Differentiable: Like Swish, Mish is also smooth and avoids abrupt transitions in
gradient flow.
• Non-monotonic: Mish allows small negative values, unlike ReLU, enabling richer information
propagation.
• Stronger Regularization: Mish tends to act as a natural regularizer, preventing overfitting and
enhancing generalization.
Benefits in Deep Learning:
• Outperforms ReLU, Swish, and other functions in object detection and image classification
benchmarks.
• More stable during training and capable of deeper network convergence.
• Used in recent object detectors, notably YOLOv4, where it improves detection accuracy and model
robustness (YOLOv8 itself uses the related SiLU/Swish activation in its Conv blocks).
Basic Terminology in YOLO
Bounding Box:
A bounding box, in essence, is a rectangle that surrounds an object and specifies its position, class (e.g. car,
person), and confidence (how likely the object is to be at that location). A bounding box can be specified
either by its top-left and bottom-right corner coordinates or by its center coordinates together with its
width and height.
IoU
IoU (Intersection over Union) is a measure of the extent of overlap between two boxes. The greater the
region of overlap, the greater the IoU.
If the IoU is close to 1, the predicted box overlaps the object almost perfectly.
If the IoU is close to 0, the model did not predict the object's coordinates at all.
NMS
Non-maximum suppression is a technique used mainly in object detection that aims at selecting the best
bounding box out of a set of overlapping boxes.
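The two ideas can be written out directly; the following is a small illustrative sketch, with boxes given as (x1, y1, x2, y2) corner coordinates.

import numpy as np

def iou(a, b):
    # Intersection over Union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # Keep the highest-scoring box, drop boxes that overlap it too much, repeat.
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep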
Cross Stage Partial Network (CSPNet)
CSPNet splits the feature map of a stage into two parts, passes one part through the stage's convolutional
blocks, and then merges the two parts again, which reduces duplicated gradient information and
computation while preserving accuracy.
Spatial Pyramid Pooling (SPP)
• During classification tasks, the output feature map is flattened and directed to the FC layer for further
softmax operation. However, to use the FC layer, we have to fix the size of the input image during
training, which hinders detecting objects at different scales and aspect ratios.
• To solve this issue, the final output feature map undergoes channel-wise pooling over different sizes of
spatial bins. If the input feature map dimensions are 512×100×100 (C×H×W) and the spatial bins are 1×1,
2×2, and 4×4, then SPP generates 512, 4×512, and 16×512 1-D vectors, which are then concatenated and
fed into the FC layer.
YOLOv8 is the latest version [23-24] of the YOLO (You Only Look Once) models. The YOLO models are
popular for their accuracy and compact size. It is a state-of-the-art model that could be trained on any powerful
or low-end hardware. Alternatively, they can also be trained and deployed on the cloud. The first YOLO
model was introduced in a C repository called Darknet in 2015 by Joseph Redmon [15] while he was working
on his PhD at the University of Washington. It has since been developed by the community through subsequent
versions.
YOLOv8 is developed by Ultralytics, a team known for its innovative YOLOv5 model [20]. It was introduced
on January 10th, 2023. YOLOv8 is used to detect objects in images, classify images, and distinguish objects
from each other. Ultralytics has made numerous enhancements to YOLOv8, making it better and more user-
friendly than YOLOv5. It is an advanced model that improves upon the success of YOLOv5 by incorporating
modifications that enhance its power and user-friendliness in various computer vision tasks. These
enhancements include a modified backbone network, an anchor-free detection head, and a new loss function.
Furthermore, it provides built-in support for image classification tasks. YOLOv8 is distinctive in that it delivers
unmatched speed and accuracy performance while maintaining a streamlined design that makes it suitable for
different applications and easy to adapt to various hardware platforms.
Architecture of YOLOv8
As of the current writing, there is no published paper yet on YOLOv8, so detailed insights into the research
techniques and ablation studies conducted during its development are unavailable. However, an analysis of the
YOLOv8 repository [24] and its documentation [23] over its predecessor YOLOv5 [20], reveals several key
features and architectural improvements.
Architecture components
The YOLOv8 architecture is composed of two major parts, namely the backbone and the head, both of which
use a fully convolutional neural network.
YOLOv8 Variants
All of these models belong to the YOLOv8 family; each variant offers a different trade-off between accuracy,
speed, and model size. The variants are distinguished by the values of parameters such as
depth_multiple (d), width_multiple (w), and max_channels (mc).
depth_multiple(d):
depth_multiple parameter determines how many Bottleneck Blocks are used in the C2f block. This scales the
number of layers in the network. A value less than 1 reduces the depth (fewer layers), making the model smaller
and faster but potentially less accurate. Conversely, a value greater than 1 increases the depth (more layers),
leading to a larger and potentially more accurate model but slower to run.
width_multiple (w):
This scales the number of channels in the convolutional layers. A value less than 1 thins the network (fewer
channels), resulting in a smaller and faster model but potentially sacrificing some accuracy. On the other hand,
a value greater than 1 widens the network (more channels), creating a larger and potentially more accurate
model but requiring more processing power.
max_channels (mc):
This parameter sets an upper limit on the number of channels allowed in the network. It is a safety measure to
prevent the model from becoming too wide (too many channels) especially when width_multiple is set high.
This can help control the model size and prevent overfitting.
Types of YOLOv8:
• n: smallest model, fastest inference but lowest accuracy
• s: small model, good balance of speed and accuracy
• m: medium model, higher accuracy than small models with moderate inference speed
• l: large model, higher accuracy than the medium model but slower inference
• x: extra-large model, best accuracy for resource-intensive applications
Conv Block
It is the most basic block in the architecture, consisting of a Conv2d layer, a BatchNorm2d layer,
and SiLU activation function.
Conv2d Layer: Convolution is a mathematical operation that involves sliding a small matrix (called a
kernel or filter) over the input data, performing element-wise multiplication, and summing the results to
produce a feature map. The “2D” in Conv2D refers to the fact that the convolution is applied in two
spatial dimensions, typically height and width.
• k: Number of filters or kernels. It represents the depth of the output volume, and each filter is
responsible for detecting different features in the input.
• s: Stride. It is the step size at which the filter/kernel slides over the input. A larger stride reduces the
spatial dimensions of the output volume.
• p: Padding. Padding is the additional border of zeros added to the input on each side. It helps preserve
spatial information and can be used to control the spatial dimensions of the output volume.
• c: Number of channels in the input. For example, in an RGB image, c would be 3 (one channel for each
color: red, green, and blue).
• SiLU Activation Function: SiLU, which stands for Sigmoid Linear Unit, is an activation function used
in neural networks. It is also known as the Swish activation function.
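A minimal PyTorch sketch of this Conv block, an illustrative re-implementation following the description above rather than the Ultralytics source, is:

import torch.nn as nn

class ConvBlock(nn.Module):
    # Conv2d -> BatchNorm2d -> SiLU, as described above.
    def __init__(self, c_in, c_out, k=3, s=1, p=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=k, stride=s, padding=p, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))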
Bottleneck Block
The bottleneck block consists of Conv blocks with an optional shortcut connection. If shortcut=true,
the shortcut is applied in the bottleneck block; otherwise the input is simply passed through the two
Conv blocks in series.
Shortcut Connection: The shortcut connection, also known as a skip connection or residual connection,
is a direct connection that bypasses one or more layers in the network. It allows the gradient to flow
more easily through the network during training, addressing the vanishing gradient problem and making
it easier for the model to learn.
In the specific context of a bottleneck block, the shortcut connection allows the model to bypass the
convolutional blocks if necessary. This way, the model can choose to use the identity mapping provided
by the shortcut, making it easier to learn the identity function when needed. The inclusion of a shortcut
connection enhances the ability of the model to learn complex representations and improves the training
of deep CNNs by mitigating the vanishing gradient problem.
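Reusing the ConvBlock sketch above, the bottleneck block with its optional shortcut can be sketched as follows (again an illustration of the description, not the library code):

import torch.nn as nn

class Bottleneck(nn.Module):
    # Two Conv blocks with an optional residual (shortcut) connection.
    def __init__(self, channels, shortcut=True):
        super().__init__()
        self.cv1 = ConvBlock(channels, channels, k=3, s=1, p=1)
        self.cv2 = ConvBlock(channels, channels, k=3, s=1, p=1)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y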
What is the vanishing gradient problem?
The vanishing gradient problem is a challenge that arises during the training of deep neural networks,
particularly in architectures with many layers. It occurs when the gradients of the loss function
with respect to the parameters (weights) of the network become extremely small as they are backpropagated
from the output layer to the input layer during training.
C2f Block
The C2f block starts with a convolutional block whose resulting feature map is split into two parts. One part
goes through the Bottleneck blocks, whereas the other goes directly to the Concat block. The number
of Bottleneck blocks used in the C2f block is defined by the depth_multiple parameter of the model. At the end,
the feature maps from the Bottleneck blocks and the split feature map are concatenated and fed into a final
convolutional block.
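Building on the ConvBlock and Bottleneck sketches above, the split-process-concatenate idea of the C2f block can be illustrated roughly as follows (a simplification of the actual Ultralytics implementation):

import torch
import torch.nn as nn

class C2f(nn.Module):
    # Split after an initial Conv block, pass one half through n Bottlenecks,
    # concatenate all intermediate maps, and fuse with a final Conv block.
    def __init__(self, c_in, c_out, n=1, shortcut=True):
        super().__init__()
        self.cv1 = ConvBlock(c_in, c_out, k=1, s=1, p=0)
        self.m = nn.ModuleList(Bottleneck(c_out // 2, shortcut) for _ in range(n))
        self.cv2 = ConvBlock((n + 2) * (c_out // 2), c_out, k=1, s=1, p=0)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))   # split the feature map into two halves
        for m in self.m:
            y.append(m(y[-1]))                  # each Bottleneck refines the latest half
        return self.cv2(torch.cat(y, dim=1))    # concatenate and fuse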
Spatial Pyramid Pooling Fast (SPPF) Block:
The SPPF Block consists of a convolutional block followed by three MaxPool2d layers. Every resulting feature
map from the MaxPool2d layer is then concatenated at the end and fed to a convolutional block.
The basic idea behind Spatial Pyramid Pooling is to divide the input image into a grid and pool features from
each grid cell independently, allowing the network to handle images of different sizes effectively.
In essence, Spatial Pyramid Pooling enables neural networks to work with images of different resolutions by
capturing multi-scale information through pooling operations at different levels of granularity. This can be
particularly useful in tasks such as object recognition, where objects may appear at different scales within an
image
While SPP offers advantages, it can be computationally expensive. SPP-Fast addresses this by using a simpler
pooling scheme. Instead of using multiple pooling levels with different kernel sizes, SPP-Fast might use a
single fixed-size kernel for pooling, reducing the number of computations needed. SPP-Fast offers a trade-off
between accuracy and speed.
MaxPool2d Layer: Pooling layers are used to downsample the spatial dimensions of the input volume,
reducing the computational complexity of the network and extracting dominant features. Max pooling is a
specific type of pooling operation where, for each region in the input tensor, only the maximum value is
retained, and the other values are discarded.
In the case of MaxPool2d, the pooling is applied in both the height and width dimensions of the input tensor.
The layer is defined by specifying parameters such as the size of the pooling kernel and the stride. The kernel
size determines the spatial extent of each pooling region, and the stride determines the step size between
successive pooling regions.
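Putting the pieces together, the SPPF block described above (one Conv block, three successive MaxPool2d layers, concatenation of the four feature maps, and a final Conv block) can be sketched with the same building blocks:

import torch
import torch.nn as nn

class SPPF(nn.Module):
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = ConvBlock(c_in, c_hidden, k=1, s=1, p=0)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.cv2 = ConvBlock(c_hidden * 4, c_out, k=1, s=1, p=0)

    def forward(self, x):
        x = self.cv1(x)
        p1 = self.pool(x)    # three successive poolings approximate pooling at
        p2 = self.pool(p1)   # several spatial scales with a single kernel size
        p3 = self.pool(p2)
        return self.cv2(torch.cat([x, p1, p2, p3], dim=1))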
Detect Block
Detect Block is responsible for the detection of the objects. Unlike in previous versions of YOLO, YOLOv8 is
an anchor-free model which means it predicts directly the center of an object instead of the offset from a known
anchor box. Anchor-free detection reduces the number of box predictions, which speeds up complicated post-
processing steps that sift through candidate detections after inference.
The Detect Block contains two tracks. The first track is for bounding box predictions and the second track is
for class predictions. Both tracks contain two convolutional blocks followed by a single Conv2d layer, whose
outputs are used to compute the Bounding Box Loss and the Class Loss respectively.
YOLOv8 Architecture consists of three main sections: Backbone, Neck, and Head.
Backbone is the deep learning architecture that acts as a feature extractor of the inputted image.
Neck combines the features acquired from the various layers of the Backbone module.
Head predicts the classes and the bounding box of the objects which is the final output produced by the object
detection model.
Backbone Section:
In Block 0, the processing starts with the input image size of 640 x 640 x 3 which is fed to the convolutional
block with kernel size 3, stride 2, and padding 1. The spatial resolution is reduced when stride= 2 is used. The
convolutional block produces the feature map of 320 x 320 because the kernel moves in 2-pixel increments.
To obtain the output channels of the convolutional block, the following formula is used:
output channels = min(64, mc) × w
Here,
64 is the base output channel
mc is the max_channel
w is the width_multiple
For example, if we are using the “n” variant YOLOv8 model then our final output channel becomes =
min(64,1024)*0.25 = 64*0.25 = 16
Likewise, this operation is calculated in every convolutional block present in the architecture.
Block 2, is a C2f block that contains two parameters i.e. shortcut and n. Here, the shortcut is the boolean
parameter that denotes if the Bottleneck block utilizes the shortcut or not. If the value of the shortcut= true then
the bottleneck block inside the C2f block utilizes the shortcut else it doesn’t.
Here, n determines how many bottleneck blocks are used inside the C2f block. In the case of Block 2, n is given
by:
n= 3*d
where d= depth_multiple
For example, if we are using the "n" variant of the YOLOv8 model, whose depth_multiple is 0.33, then the
number of bottleneck blocks used inside the C2f block becomes n = 3 × 0.33 = 0.99, i.e. 1
bottleneck block is used.
In the C2f block the resolution of the feature map and the output channel is unchanged.
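These two scaling rules can be checked with a few lines of arithmetic. The sketch below uses the "n"-variant values quoted in the text (d = 0.33, w = 0.25, mc = 1024); other variants would substitute their own factors.

def conv_out_channels(base_channels, w=0.25, mc=1024):
    # output channels of a convolutional block: min(base, max_channels) * width_multiple
    return int(min(base_channels, mc) * w)

def c2f_bottleneck_count(n_base, d=0.33):
    # number of Bottleneck blocks inside a C2f block: n_base * depth_multiple, rounded, at least 1
    return max(round(n_base * d), 1)

print(conv_out_channels(64))    # Block 0: min(64, 1024) * 0.25 = 16
print(c2f_bottleneck_count(3))  # Block 2: 3 * 0.33 = 0.99 -> 1 bottleneck block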
In Block 9, the SPPF Block is used after the last convolution layer of the C2f block in the Backbone.
The main function of the SPPF block is to generate the fixed feature representation of the object in various sizes
in an image without resizing the image or introducing spatial information loss.
The neck section is responsible for upsampling the feature map and combining the features acquired from the
various layers of the Backbone section.
The upsample layer present in the Neck section simply doubles the spatial resolution of the feature map
without changing the number of output channels.
The Concat block adds up the output channels of the blocks being concatenated without any change in
resolution.
The head section is responsible for predicting the classes and the bounding box of the objects which is the final
output produced by the object detection model.
The first Detect block in the Head section specializes in detecting small objects that are inputted from the C2f
block present in Block 15.
The second Detect block in the Head section specializes in detecting medium-sized objects which is inputted
from the C2f block present in Block 18.
The third Detect block in the Head section specializes in detecting large objects and takes its input from the
C2f block present in Block 21.
Conclusion
In conclusion, YOLOv8, an evolution of the YOLO family, redefines object detection with its anchor-free
architecture, balancing speed and accuracy across various model variants. Utilizing convolutional and
bottleneck blocks, alongside innovative features like Spatial Pyramid Pooling Fast, YOLOv8 efficiently
processes images for real-time detection. Its backbone, neck, and head sections synergize to extract features,
upsample, and predict classes and bounding boxes. With its versatility, YOLOv8 offers a range of models
catering to diverse needs, from rapid inference to high accuracy. Overall, YOLOv8 represents a pinnacle in
object detection, empowering applications with unparalleled performance and user-friendliness.
The YOLOv8 framework can be used to perform computer vision tasks such as detection, segmentation,
classification, and pose estimation. It comes with pre-trained models for each task. The pretrained models for
detection, Segmentation and Pose are pretrained on the COCO dataset [25-26], while Classification models are
pretrained on the ImageNet dataset. YOLOv8 introduces scaled versions such as YOLOv8n (nano), YOLOv8s
(small), YOLOv8m (medium), YOLOv8l (large), and YOLOv8x (extra large). These versions provide
different model sizes and capabilities, catering to various requirements and use scenarios. For segmentation,
classification, and pose estimation, these scaled versions use the suffixes -seg, -cls, and -pose
respectively. These tasks don't require additional commands or scripts for making masks or contours or for
classifying images. With a well-labelled and sufficiently large dataset, the accuracy can be high. Using a GPU
rather than a CPU is also recommended for the training process to further enhance performance by decreasing
computation time. YOLOv8 offers multiple modes that can be used either through a command line interface
(CLI) or through Python scripting, allowing users to perform different tasks based on their specific needs and
requirements. These modes are:
Train. This mode is used to train a custom model on a dataset with specified hyperparameters. During the
training process, YOLOv8 employs adaptive techniques to optimize the learning rate and balance the loss
function. This leads to enhanced model performance.
Val. This mode is used to evaluate a trained model on a validation set to measure its accuracy and
generalization performance. This mode can help in tuning the hyperparameters of the model for improved
performance.
Predict. This mode is used to make predictions using a trained model on new images or videos. The model is
loaded from a checkpoint file, and users can input images or videos for inference. The model predicts object
classes and locations in the input file.
Export. This mode is used to convert a trained model to a format suitable for deployment in other software
applications or hardware devices. This mode is useful for deploying the model in production environments.
Commonly used YOLOv8 export formats are PyTorch, TorchScript, TensorRT, CoreML, and PaddlePaddle.
Track. This mode is used to perform real-time object tracking in live video streams. The model is loaded from
a checkpoint file and can be used for applications like surveillance systems or self-driving cars.
Benchmark. This mode is used to profile the performance of different export formats in terms of speed and
accuracy. It provides information on the size of the exported format, mAP50-95 metrics for object detection,
segmentation, and pose, or accuracy_top5 metrics for classification, as well as inference time per image. This
enables users to select the most suitable export format for their particular use case, considering their
requirements for speed and accuracy.
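For reference, the modes above map directly onto the Ultralytics Python API. A minimal sketch, in which the dataset YAML path and arguments are placeholders, looks like this:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")                                        # pretrained nano model
model.train(data="license_plates.yaml", epochs=100, imgsz=640)    # Train mode
metrics = model.val()                                             # Val mode: precision, recall, mAP
results = model.predict("test_images/car.jpg", conf=0.25)         # Predict mode
model.export(format="torchscript")                                # Export mode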
Performance Evaluation
Speed A100 TensorRT. This refers to the speed of the object detection model when running on an A100 GPU
(Graphics Processing Unit) using TensorRT, which is an optimization library developed by NVIDIA for deep
learning inference. Similar to Speed CPU ONNX, the speed can be measured in terms of inference time or
frames per second (FPS). Higher values for Speed A100 TensorRT indicate faster inference times on a
powerful GPU, which can be beneficial for applications that require high throughput or real-time processing.
Latency A100 TensorRT FP16 (ms/img). This refers to the latency or inference time of the object detection
model when running on an NVIDIA A100 GPU with TensorRT optimization, using the FP16 (half-precision
floating point) data type. It indicates how much time the model takes to process a single image, typically
measured in milliseconds per image (ms/img). Lower values indicate faster inference times, which are desirable
for real-time or low-latency applications.
Params (M). Params (M) refers to the number of model parameters in millions. It represents the size of the
model, and generally larger models tend to have more capacity for learning complex patterns but may also
require more computational resources for training and inference.
FLOPs (B). FLOPs (B) stands for floating-point operations, in billions. It is a measure of the
computational complexity of the model, indicating how many floating-point operations the model performs
for a single inference. Lower FLOPs (B) values indicate less computational complexity and can be
desirable for resource-constrained environments, while higher values indicate more computational complexity
and may require more powerful hardware for efficient inference.
Performance of YOLOv8 on COCO. The COCO val2017 dataset [28-29] is a commonly used benchmark
dataset for evaluating object detection models. It consists of a large collection of more than 5000 diverse images
with 80 object categories, and it provides annotations for object instances, object categories, and other relevant
information. The dataset is an Industry-Standard benchmark for object detection performance and for
comparing the accuracy and speed of different object detection models
The training and results were sourced from the Ultralytics GitHub repository [24]. All scaled versions of
YOLOv8, along with previous versions of YOLO, i.e. YOLOv5 and YOLOv7, were trained on COCO. Here mAPval values
are for single-model single-scale on the COCO val2017 dataset and Speed is averaged over COCO val images
using an Amazon EC2 P4d instance.
Performance of YOLOv8 on RF100.
The Roboflow 100 (RF100) dataset [30-31] is a diverse, multi-domain benchmark comprising 100 datasets
created by using over 90,000 public datasets and 60 million public images from the Roboflow Universe, a web
application for computer vision practitioners. The dataset aims to provide a more comprehensive evaluation
of object detection models by offering a wide range of real-life domains, including satellite, microscopic, and
gaming images. With RF100, researchers can test their models' generalizability on semantically diverse
data.
YOLOv8 is evaluated on the RF100 benchmark alongside YOLOv5 and YOLOv7. mAP@0.5 is a specific
version of the mAP metric that measures the average precision of a model at an IoU threshold of
0.5. In other words, it measures how well the model detects objects when the predicted box overlaps
the ground-truth box by at least 50%. The process and results were sourced from the Roboflow blog [32].
Small versions of each model are trained for a total of 100 epochs. To minimize the effect of random
initialization and ensure reproducibility, each experiment is run using a single seed.
3. PROPOSED METHODOLOGY
Automatic License Plate Detection using YOLOv8 aims to build a smart and efficient system capable of
recognizing license plates in real time. It combines object detection with Optical Character Recognition (OCR)
to facilitate automated vehicle identification, which can further be integrated into mobile or web-based
applications for traffic law enforcement, smart parking, and toll management systems. The network architecture
used for the detection of number plates in this project is YOLOv8 (You Only Look Once). The process involves
training on a dataset of images containing vehicles with annotated bounding boxes around their
number plates.
The administrative interface also empowers the administrators to manage driver's licences, vehicle records, and
fine details. Additional features include the capability to modify fines, add new licences and vehicles, ensuring
the system remains adaptable to changing regulatory requirements.
Our project's goal is to make it easier for citizens to report traffic offenses by offering an easy-to-use interface
that makes it simple to submit facts and supporting documentation. The application uses machine learning
models to automate license plate recognition and vehicle identification, providing traffic enforcement personnel
with a quick and precise decision-making process.
The proposed methodology incorporates modern tools and frameworks including YOLOv8, PyTorch,
Pytesseract, and Python, structured as follows:
Optical Character Recognition (OCR)
After the number plate detection stage performed by YOLOv8, the text must be extracted, which is done by an
OCR engine. The OCR engine [11] processes the detected number plate region and outputs the characters that
were recognized.
These normalized values ensure that the bounding box coordinates are scaled between 0 and 1 relative
to the image dimensions, which is the format required for YOLO training.
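For illustration, the conversion from corner coordinates to this normalized YOLO label format can be written as the small helper below (a sketch; the example box and image size are made up):

def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    # Convert corner coordinates to normalized (xc, yc, w, h) as required by YOLO labels.
    xc = (xmin + xmax) / 2.0 / img_w
    yc = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return xc, yc, w, h

# Example: a plate box (150, 200, 400, 260) in a 1280 x 720 image
print(voc_to_yolo(150, 200, 400, 260, 1280, 720))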
Bounding Box Regression Loss
This term minimizes the difference between the predicted and actual bounding box coordinates. It can be
expressed as:

$$L_{bbox} = \sum_{i=1}^{S} \sum_{j=1}^{B} \mathbb{1}_{ij}^{obj} \left[ (x - \hat{x})^2 + (y - \hat{y})^2 + \left(\sqrt{w} - \sqrt{\hat{w}}\right)^2 + \left(\sqrt{h} - \sqrt{\hat{h}}\right)^2 \right]$$

where $x, y, w, h$ are the ground-truth bounding box center and dimensions, $\hat{x}, \hat{y}, \hat{w}, \hat{h}$ are the predicted bounding
box center and dimensions, $S$ is the number of grid cells, $B$ is the number of bounding boxes per cell, and $\mathbb{1}_{ij}^{obj}$ is an
indicator function that is 1 if the object is present in the $i$-th grid cell and 0 otherwise.
Confidence (Objectness) Loss
This term penalizes incorrect objectness predictions; following the standard YOLO formulation it can be written as:

$$L_{conf} = \sum_{i=1}^{S} \sum_{j=1}^{B} \left[ \mathbb{1}_{ij}^{obj} (c_i - \hat{c}_i)^2 + \mathbb{1}_{ij}^{no\_obj} (c_i - \hat{c}_i)^2 \right]$$

where $c_i$ is the ground-truth confidence score (1 if the object is present, 0 otherwise), $\hat{c}_i$ is the predicted
confidence score, and $\mathbb{1}_{ij}^{no\_obj}$ is 1 if no object is present in the grid cell and 0 otherwise.
Classification Loss
For each bounding box, the model predicts the probability of the object belonging to a particular
class (e.g., car, motorcycle, license plate). The classification loss is calculated as the cross-entropy between the
true and predicted class distributions:

$$L_{class} = -\sum_{i=1}^{S} \mathbb{1}_{i}^{obj} \sum_{c=1}^{C} p_i(c) \, \log \hat{p}_i(c)$$

where $p_i(c)$ is the ground-truth probability for class $c$, $\hat{p}_i(c)$ is the predicted probability for class $c$,
and $C$ is the total number of classes. The total loss $\mathcal{L}$ for the YOLO model is the sum of these three components:

$$\mathcal{L} = \lambda_{bbox} L_{bbox} + \lambda_{conf} L_{conf} + \lambda_{class} L_{class}$$

where $\lambda_{bbox}, \lambda_{conf}, \lambda_{class}$ are weighting factors for the respective loss terms.
Character Recognition
Assuming a standard greedy decoding formulation for the OCR stage, the recognized character sequence can be written as:

$$\hat{c}_t = \arg\max_{c} P(c_t = c \mid I; \theta), \qquad t = 1, \dots, T$$

where $c_t$ is the character at time step $t$, $\theta$ are the parameters of the OCR model, $I$ is the cropped plate image, and $T$ is the
total number of characters in the recognized sequence. The sequence is generated by selecting the most probable
character at each step $t$, based on the features extracted from the image.
EXPERIMENTAL RESULTS AND EVALUATION
3.1 Dataset
This dataset consists of images of car license plates, paired with their corresponding annotations in YOLO
format. It is designed for training and evaluating models focused on detecting car license plates in images.
The dataset was derived from the Car License Plate Detection dataset on Kaggle and has been split into
training and testing subsets.
• Label: The class of the object (for this dataset, it will always be 0, representing car license plates).
• Xc, Yc: Center coordinates of the bounding box, normalized to the width and height of the image.
• W, H: Width and height of the bounding box, also normalized.
Implementation platform
The proposed methodology is implemented and tested on the specified dataset using Google
Colab, a cloud-based platform that supports Python 3.9.16. The computations are performed
on a laptop equipped with a 12th Gen Intel(R) Core(TM) i5-1235U processor, operating at
1,300 MHz, featuring 10 cores and 12 logical processors. The operating system used is
Microsoft Windows 11.
2. Accuracy: It is the most popular performance metric, measuring how
often the classifier produces the correct prediction. Mathematically, accuracy is defined as
the ratio of the number of correctly predicted images to the total number of images, and it
can be written as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

4. Recall: It is the fraction of the relevant images that are successfully retrieved. For this project,
recall is the ratio of the number of license plates that are correctly detected to the number of
plates that should have been detected. Sensitivity, Hit Rate, and True Positive Rate are
other names for recall. The lower the number of false negatives (plates that are missed),
the higher the recall:

$$\text{Recall} = \frac{TP}{TP + FN}$$
5. F-Score: It is the harmonic mean of Precision and Recall and is a measure of
test accuracy. The F-score reaches its best value at 1 (100% precision and recall) and its worst
value at 0. The F-score is defined as:

$$F = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
6. Specificity: Specificity is the True Negative Rate (TNR) of the model and a
statistical measure of a binary classification test. Since plate detection can be framed as a binary
decision (plate or non-plate region), specificity can also be used for performance evaluation. It
is the ratio of the number of non-plate regions that are correctly rejected (TN) to the
number of regions that are classified or misclassified as non-plate (TN + FP). The lower
the false positives (FP), the higher the specificity or selectivity:

$$\text{Specificity} = \frac{TN}{TN + FP}$$
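The metrics defined above follow directly from the raw counts of true and false positives and negatives; a small illustrative helper:

def detection_metrics(tp, fp, fn, tn):
    # Compute the evaluation metrics defined above from raw counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return accuracy, precision, recall, f_score, specificity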
• Recall (0.96): Reflects the ability to identify true positives out of all actual positives.
• mAP50 (0.91459): Indicates the mean average precision at an IoU threshold of 0.5, which remains high.
4. CONCLUSION AND SCOPE FOR FUTURE WORK
This study demonstrated the effectiveness of a YOLOv8-based system for Automatic License Plate Recognition
(ALPR) on resource-constrained devices, achieving high detection accuracy and real-time performance. Among
the YOLOv8 variants, YOLOv8-n balanced precision and speed, achieving a mean Average Precision (mAP) of
92% and maintaining 36 FPS on the UFPR-ALPR dataset, making it suitable for mobile and embedded systems.
Image preprocessing techniques, such as sharpening, further improved recognition rates in challenging
environments. At the same time, comparisons with state-of-the-art Optical Character Recognition (OCR)
methods highlighted the system's robustness and adaptability across diverse datasets. Despite its success,
challenges remain in generalizing to uncommon license plate formats and extreme conditions, which can be
addressed by expanding training datasets, optimizing lightweight architectures, and exploring advanced transfer
learning.
FUTURE WORK
The current system works only with still images; in the future we will add video functionality so that
users can upload videos as well.
In the future we will also add live tracking, so that the system can detect a vehicle's number plate
directly from a live CCTV camera feed and record it.
The current system detects number plates only on cars; in the future we will extend it to all types of
vehicles.
We also plan to experiment with alternative detection approaches (for example, classical Haar cascade
detectors or TensorFlow object detection models) and compare their plate-detection accuracy against the
current YOLOv8-based detector.
The system currently runs only on the desktop; we have planned an Android application and will
implement it in the future.
Currently we are using dummy data for the vehicle owner's information; despite our efforts, no
government API is available that provides the real data.
REFERENCES
[1] R. Antar, S. Alghamdi, J. Alotaibi, and M. Alghamdi, "Automatic
Number Plate Recognition of Saudi License Car Plates," Engineering,
Technology & Applied Science Research, vol. 12, no. 2, pp. 8266–8272,
Apr. 2022, https://doi.org/10.48084/etasr.4727.
[2] M. S. Sheikh, J. Liang, and W. Wang, "A Survey of Security Services,
Attacks, and Applications for Vehicular Ad Hoc Networks (VANETs),"
Sensors, vol. 19, no. 16, Aug. 2019, Art. No. 3589,
https://doi.org/10.3390/s19163589.
[3] Y. Yang, Y. Xiao, Z. Chen, D. Tang, Z. Li, and Z. Li, "FCBTYOLO: A
Lightweight and High-Performance Fine Grain Detection Strategy for
Rice Pests," IEEE Access, vol. 11, pp. 101286–101295, 2023,
https://doi.org/10.1109/ACCESS.2023.3314697.
[4] M. Salemdeeb and S. Erturk, "Multi-national and Multi-language
License Plate Detection using Convolutional Neural Networks,"
Engineering, Technology & Applied Science Research, vol. 10, no. 4,
pp. 5979–5985, Aug. 2020, https://doi.org/10.48084/etasr.3573.
[5] C. Henry, S. Y. Ahn, and S.-W. Lee, "Multinational License Plate
Recognition Using Generalized Character Sequence Detection," IEEE
Access, vol. 8, pp. 35185–35199, 2020, https://doi.org/10.1109/
ACCESS.2020.2974973.
[6] H. Shi and D. Zhao, "License Plate Recognition System Based on
Improved YOLOv5 and GRU," IEEE Access, vol. 11, pp. 10429–10439,
2023, https://doi.org/10.1109/ACCESS.2023.3240439.
[7] X. Li, X. Wang, C. Qu, J. Song, H. Li, and Y. Xi, "Vehicle pose
estimation by parking AGV based on RGBD camera," in 2024 9th
International Conference on Automation, Control and Robotics
Engineering (CACRE), Jeju Island, Korea, Republic of, Jul. 2024, pp.
289–294, https://doi.org/10.1109/CACRE62362.2024.10635074.
[8] T. Mustafa and M. Karabatak, "Real Time Car Model and Plate
Detection System by Using Deep Learning Architectures," IEEE Access,
vol. 12, pp. 107616–107630, 2024, https://doi.org/10.1109/
ACCESS.2024.3430857.
[9] A. Tourani, A. Shahbahrami, S. Soroori, S. Khazaee, and C. Y. Suen, "A
Robust Deep Learning Approach for Automatic Iranian Vehicle License
Plate Detection and Recognition for Surveillance Systems," IEEE
Access, vol. 8, pp. 201317–201330, 2020, https://doi.org/10.1109/
ACCESS.2020.3035992.
[10] M. S. Beratoğlu and B. U. Töreyіn, "Vehicle License Plate Detector in
Compressed Domain," IEEE Access, vol. 9, pp. 95087–95096, 2021,
https://doi.org/10.1109/ACCESS.2021.3092938.
[11] K. Sangsuwan and M. Ekpanyapong, "Video-Based Vehicle Speed
Estimation Using Speed Measurement Metrics," IEEE Access, vol. 12,
pp. 4845–4858, 2024, https://doi.org/10.1109/ACCESS.2024.3350381.
[12] W. Ali, G. Wang, K. Ullah, M. Salman, and S. Ali, "Substation Danger
Sign Detection and Recognition using Convolutional Neural Networks,"
Engineering, Technology & Applied Science Research, vol. 13, no. 1,
pp. 10051–10059, Feb. 2023, https://doi.org/10.48084/etasr.5476.
[13] D. Habeeb, A. H. Alhassani, L. N. Abdullah, C. S. Der, and L. K. Q.
Alasadi, "Advancements and Challenges: A Comprehensive Review of
GAN-based Models for the Mitigation of Small Dataset and Texture
Sticking Issues in Fake License Plate Recognition," Engineering,
Technology & Applied Science Research, vol. 14, no. 6, pp. 18401–
18408, Dec. 2024, https://doi.org/10.48084/etasr.8870.
[14] S. Luo and J. Liu, "Research on Car License Plate Recognition Based on
Improved YOLOv5m and LPRNet," IEEE Access, vol. 10, pp. 93692–
93700, 2022, https://doi.org/10.1109/ACCESS.2022.3203388.
[15] Z. Xu et al., "Towards End-to-End License Plate Detection and
Recognition: A Large Dataset and Baseline," in Computer Vision –
ECCV 2018, vol. 11217, V. Ferrari, M. Hebert, C. Sminchisescu, and Y.
Weiss, Eds. Cham: Springer International Publishing, 2018, pp. 261–
277, https://doi.org/10.1007/978-3-030-01261-8_16.
[16] R. Laroca et al., "A Robust Real-Time Automatic License Plate
Recognition Based on the YOLO Detector," in 2018 International Joint
Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, Jul.
2018, pp. 1–10, https://doi.org/10.1109/IJCNN.2018.8489629.
[17] R. Laroca, E. Cardoso, D. Lucio, V. Estevam, and D. Menotti, "On the
Cross-dataset Generalization in License Plate Recognition:," in
Proceedings of the 17th International Joint Conference on Computer
Vision, Imaging and Computer Graphics Theory and Applications,
Online Streaming, 2022, pp. 166–178, https://doi.org/
10.5220/0010846800003124.
[18] R. Laroca, A. B. Araujo, L. A. Zanlorensi, E. C. De Almeida, and D.
Menotti, "Towards Image-Based Automatic Meter Reading in
Unconstrained Scenarios: A Robust and Efficient Approach," IEEE
Access, vol. 9, pp. 67569–67584, 2021,
https://doi.org/10.1109/ACCESS.2021.3077415.
[19] S. M. Silva and C. R. Jung, "Real-time license plate detection and
recognition using deep convolutional neural networks," Journal of
Visual Communication and Image Representation, vol. 71, Aug. 2020,
Art. no. 102773, https://doi.org/10.1016/j.jvcir.2020.102773.
[20] F. Borisyuk, A. Gordo, and V. Sivakumar, "Rosetta: Large Scale System
for Text Detection and Recognition in Images," in Proceedings of the
24th ACM SIGKDD International Conference on Knowledge Discovery
& Data Mining, London United Kingdom, Jul. 2018, pp. 71–79,
https://doi.org/10.1145/3219819.3219861.
[21] G. R. Gonçalves, M. A. Diniz, R. Laroca, D. Menotti, and W. R.
Schwartz, "Multi-task Learning for Low-Resolution License Plate
Recognition," in Progress in Pattern Recognition, Image Analysis,
Computer Vision, and Applications, vol. 11896, I. Nyström, Y.
Hernández Heredia, and V. Milián Núñez, Eds. Cham: Springer
International Publishing, 2019, pp. 251–261