
TRIBHUVAN UNIVERSITY

INSTITUTE OF ENGINEERING
PURWANCHAL CAMPUS

A
PROJECT REPORT
ON
VEHICLE NUMBER PLATE DETECTION AND RECOGNITION

SUBMITTED BY:
KHUSHILAL MAHATO (PUR077BCT039)
KSHITIZ GAJUREL (PUR077BCT042)
MANISH KATHET (PUR077BCT044)
MANOJ KUMAR BANIYA (PUR077BCT046)

SUBMITTED TO:
DEPARTMENT OF ELECTRONICS & COMPUTER ENGINEERING

March, 2023
Page of Approval

TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
PURWANCHAL CAMPUS
DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING

The undersigned certifies that they have read and recommended to the Institute of
Engineering for acceptance of a project report entitled “Vehicle Number Plate Detection
and Recognition” submitted by Khushilal Mahato, Kshitiz Gajurel, Manish Kathet, Manoj
Kumar Baniya in partial fulfillment of the requirements for the Bachelor’s degree in
Electronics & Computer Engineering.

............................. .............................
Supervisor Internal examiner
Er. Pukar Karki Person B
Associate Professor Assistant Professor
Department of Electronics and Computer Department of Electronics and Computer
Engineering, Engineering,
Purwanchal Campus, IOE, TU. Purwanchal Campus, IOE, TU.

Date of approval:
January, 2024

Copyright
The author has agreed that the Library, Department of Electronics and Computer
Engineering, Purwanchal Campus, Institute of Engineering may make this report freely
available for inspection. Moreover, the author has agreed that permission for extensive
copying of this project report for scholarly purposes may be granted by the supervisors who
supervised the project work recorded herein or, in their absence, by the Head of the
Department wherein the project report was done. It is understood that recognition will be
given to the author of this report and to the Department of Electronics and Computer
Engineering, Purwanchal Campus, Institute of Engineering in any use of the material of this
project report. Copying or publication or any other use of this report for financial gain
without approval of the Department of Electronics and Computer Engineering, Purwanchal
Campus, Institute of Engineering and author’s written permission is prohibited.

Request for permission to copy or to make any other use of the material in this report in
whole or in part should be addressed to:

Head
Department of Electronics and Computer Engineering
Purwanchal Campus, Institute of Engineering, TU
Dharan, Nepal.

Acknowledgments
We would like to express our sincere gratitude and appreciation to all those who contributed
to the success of our final year project.
First and foremost, we would like to thank our project supervisor, Er. Pukar Karki, for providing us with invaluable guidance and continuous support throughout the project. His knowledge in this field has been crucial in shaping our project and achieving its objectives.
We express our gratitude to the administration and faculty members of the Department of Electronics and Computer Engineering for providing us with the necessary resources and facilities to undertake this project.
We would also like to extend our thanks to the District Traffic Police Office, Dharan, Itahari, for their support in making CCTV footage available, which was essential for the project.
We also extend our heartfelt appreciation to our classmates and friends for their unwavering
support and collaboration.
We wish to extend our sincerest appreciation to our esteemed senior, Prashanna Kumar Gyawali, whose assistance and insights have significantly contributed to the success of our endeavor. His previous experience with a project on a similar topic provided us with invaluable guidance and served as a beacon of inspiration throughout our journey.
Furthermore, we would like to acknowledge the contribution of our family and friends for
their continuous support throughout the project.
Once again, we would like to extend our sincere thanks to everyone who has contributed to
the successful completion of our project.

Abstract
In this project we used a YOLOv8n model trained on a custom dataset collected by us, consisting of 1772 images of 3 classes split in the ratio of 80:20 for training and validation respectively. For tracking the detected objects in the video we used YOLOv8, which detects, tracks, and outputs a bounding box for each object with a corresponding track ID. For every unique detected and tracked object, the corresponding license plate is cropped, and the cropped image is sent as input to the segmentation program. The license plate image undergoes HSV color space conversion, color masking, and perspective transformation, in that order, before it is preprocessed for profiling the different types of license plates in the dataset. The image then undergoes horizontal and vertical projection profiling, which is validated to separate the characters of the license plate. The segmented characters are fed to a CNN trained on our custom dataset of license plate characters, which for now consists of private and public vehicles of Koshi Province.

Keywords: YOLOv8n, tracking, character segmentation, CNN

Contents
Page of Approval ..................................................................................................................................................... ii

Copyright ..................................................................................................................................................................iii

Abstract ...................................................................................................................................................................... v

List of Figures ....................................................................................................................................................... viii

List of Tables ............................................................................................................................................................ ix

List of Abbreviations ............................................................................................................................................. x

1. Introduction ...................................................................................................................................................... 11

1.1 Background ............................................................................................................................................. 11

1.2 Problem Statements ............................................................................................................................. 11

1.3 Objectives ................................................................................................................................................. 12

1.4 Scope .......................................................................................................................................................... 12

2. Literature Review ........................................................................................................................................... 14

2.1 Related work ........................................................................................................................................... 14

2.2 Related theory ........................................................................................................................................ 15

2.2.1 History of YOLO ............................................................................................................................ 15

2.2.2 YOLOv8 Architecture .................................................................................................................. 18

Backbone: ............................................................................................................................................................. 19

Neck: ........................................................................................................................................................................ 20

Head: ........................................................................................................................................................................ 20

2.2.3 Nepali Vehicle Classification and License Plate ............................................................... 21

3. UML Diagrams ................................................................................................................................................. 27

4. Methodology ..................................................................................................................................................... 32

5. System design................................................................................................................................................... 33

6. Results & Discussion ..................................................................................................................................... 34

6.1 Object Detection .................................................................................................................................... 34

6.1.1 Data Acquisition ........................................................................................................................... 34

6.1.2 Data Preprocessing ..................................................................................................................... 35

6.1.3 Model Training.............................................................................................................................. 36

Table 5.1: Confusion Matrix Table ................................................................................................................. 37

Table 5.2: Type I and Type II error ............................................................................................................... 38

6.2 Object Tracking with YOLOv8.................................................................................................................. 42

6.3 Plate Localization .................................................................................................................................. 43

6.4 Processing ROI ....................................................................................................................................... 45

6.5 Character Segmentation ..................................................................................................................... 46

6.6 Plate Recognition .................................................................................................................................. 52

6.6.1 Character Pre-processing ......................................................................................................... 52

6.6.2 Character Recognition ............................................................................................................... 53

Table 6.1: Feature Map of different layers.................................................................................................. 63

6.6.3 Multi-class classification model evaluation metrics ...................................................... 63

7. Conclusions ....................................................................................................................................................... 67

8. Limitations and Future enhancement .................................................................................................... 68

References ................................................................................................................................................................... 69

Final Output ................................................................................................................................................................ 70

List of Figures
Figure 2.1: History of Yolo ..................................................................................................................................... 17
Figure 2.1: Yolov8 Architecture .......................................................................................................................... 18
Figure 2.2: Yolo Object Detection. ...................................................................................................................... 21
Figure 2.3: Comparison of Various versions of yolo ................................................................................... 21
Figure 2.4: Zonal Plate Format ............................................................................................................................ 23
Figure 2.5: Provincial Plate Format ................................................................................................................... 26
Figure 3.1: Use Case Diagram .............................................................................................................................. 28
Figure 3.2: Sequence Diagram ............................................................................................................................. 29
Figure 4.1: Project Methodology ........................................................................................................................ 32
Figure 5.1: Designed System ................................................................................................................................ 33
Figure 6.1: Object Detection ................................................................................................................................. 34
Figure 6.2: Confusion Matrix Yolov8n Model ................................................................................................ 38
Figure 6.4: Precision Confidence Plot mAP ................................................................................................... 39
Figure 6.7: Validation Batch ................................................................................................................................. 41
Figure 6.8: Yolov8 Tracking .................................................................................................................................. 42
Figure 6.10: Pre-processing Detected License Plate ROI .......................................................................... 46
Figure 6.11: Overview of Character Segmentation Steps ......................................................................... 47
Figure 6.12: Histogram of Gray Image and Grayscale image ................................................................... 48
Figure 6.13: Character Segmentation Stage 1 ............................................................................................... 50
Figure 6.14: Segmented Characters, final stage ............................................................................................ 51
Figure 6.15: Character Recognition Process Overview.............................................................................. 52
Figure 6.16: Segmented Character .................................................................................................................... 53
Figure 6.17: Example of convolution operation of 2*2 kernel with 4*4 feature map (When stride
is one.) .......................................................................................................................................................................... 54
Figure 6.18: Different Number of Strides usage in a convolution layer .............................................. 55
Figure 6.19: Example of Padding ........................................................................................................................ 56
Figure 6.20: Maxpool layer ................................................................................................................................... 56
Figure 6.21: ReLU Activation Function ............................................................................................................ 57
Figure 6.22: Fully Connected Layer ................................................................................................................... 58
Figure 6.23: Categorical Cross-Entropy Loss................................................................................................. 59
Figure 6.26: Train/Validation Model Error and Accuracy ........................................................................ 65
Figure 6.27: Test Classification Report for Mode ......................................................................................... 66

List of Tables

Table 2.1: Devanagari Zonal License Plate Characters ............................................................................... 24


Table 2.2: Province number and names .......................................................................................................... 25
Table 2.3: Devanagari Province License Plate Characters ........................................................................ 26
Table 5.1: Confusion Matrix Table...................................................................................................................... 37
Table 5.2: Type I and Type II error .................................................................................................................... 38
Table 6.1: Feature Map of different layers ...................................................................................................... 63

List of Abbreviations
AI Artificial Intelligence
CNN Convolutional Neural Network
CCTV Closed Circuit Television
DNN Deep Neural Network
FPS Frames per second
HOG Histograms of Oriented Gradients
ROI Region of Interest
OCR Optical Character Recognition
OpenCV Open-Source Computer Vision Library
PC Personal Computer
R-CNN Region Based Convolutional Neural Network
RGB Red, Green, and Blue
SSD Single Shot Detector
WHO World Health Organization
HSV Hue Saturation Value
CLAHE Contrast Limited Adaptive Histogram Equalization
IOU Intersection Over Union
SOTA State of the Art

1. Introduction
The study of computer vision focuses on simulating some of the complexity of the human visual system so that computers can recognize and analyze items in pictures and videos in a manner similar to how people do. Until recently, computer vision worked only in a limited capacity. Artificial intelligence (AI) has made enormous strides in recent years and is now capable of outperforming humans in various tasks involving object detection and object classification. This is due to developments in deep learning, neural networks, and artificial intelligence. Computer vision has a wide range of applications, one of which is traffic monitoring and road safety. Today, computer vision and machine learning strategies utilizing artificial intelligence provide promising solutions for improving road safety and traffic monitoring.

1.1 Background
Accidents on the road are a major problem in today’s world. Every year, several countries invest
millions of dollars to reduce traffic accidents. To keep road users from being badly wounded or
killed, many road safety systems and measures are in place. Nevertheless, countries face a
significant number of vehicle accidents every day. The majority of these incidents are caused by
people who are unwilling to respect traffic laws, combined with a lack of supervision. The problem is particularly acute in countries like Nepal, where people tend to follow rules only when traffic police are watching. Additionally, keeping track of each vehicle at every checkpoint entering a province, and of the number of vehicles leaving a province, is tedious. Many Indian vehicles run on the streets of major cities such as Dharan, Itahari, and Biratnagar, and manually checking whether each of them has a permit to drive is impractical. If there were a system that could detect vehicles and track their records using CCTV cameras, it would definitely contribute to road safety and record keeping. Our focus for this project is on major cities of eastern Nepal where the traffic volume is high.

1.2 Problem Statements


According to the latest WHO data, published in 2020, road traffic accident deaths in Nepal reached 4,654, or 2.90% of total deaths [2]. The age-adjusted death rate of 20.65 per 100,000 population ranks Nepal 72nd in the world.

During the festive season, 28 help desks are set up across the Kathmandu valley to inspect the transport sector, as well as to provide information regarding the right places to purchase tickets and board buses. As the number of long-distance buses is not enough to take commuters to their destinations, many local buses that normally run inside the valley are also used, and mechanical testing of such vehicles is conducted to confirm they are safe. Around 7,500 security officers were deployed across the valley for security and management purposes in view of the festive season. On leaving the Kathmandu valley, there are many checkpoints deployed by road safety and traffic police officials, including Naghdunga, Kohalpur, and Shambhunath, along with many temporary checkpoints to record vehicle information.

Given this situation, something must be done to keep track of the details of the thousands of vehicles entering and leaving, as manually detailing each vehicle has not been effective so far. Details added to a notebook on a daily basis are not effective, since the record of a desired vehicle cannot be retrieved easily and much data is lost. There are many cases where the traffic police cannot provide the records of a registered vehicle due to improper record keeping.

1.3 Objectives
The main objectives of this project are as follows:

• To detect vehicles in an image.

• To recognize motorcycles, cars, buses, and license plates.

• To recognize the characters of detected license plates.

• To record the details of vehicles, especially the vehicle type and its license plate.

1.4 Scope
The scope of this project is wide-ranging, aiming to develop a sophisticated Automatic
Number Plate Detection and Recognition (ANPR) system mainly focused on the Nepali
context. This system will possess the capability to process random images, accommodating
variations in lighting, quality, and overall conditions typically encountered in real-world
scenarios. The primary objectives include the accurate detection and recognition of Nepali
vehicle number plates, with a keen emphasis on adaptability to the diverse formats
featuring Devanagari characters. Advanced object detection techniques will be
implemented to precisely identify vehicles within images, followed by the development of
algorithms for accurate number plate detection and character segmentation. Deep learning
models will play a vital role in character classification, enabling the system to categorize
vehicles based on the information extracted from segmented characters. The project also

explores the potential application of the ANPR system in monitoring traffic rule violations
through roadside cameras, contributing to enhanced road safety. Integration with existing
CCTV infrastructure and scalability considerations ensure a comprehensive and sustainable
solution for evolving traffic conditions in the Nepali context.

2. Literature Review
2.1 Related work
Automatic Number Plate Recognition (ANPR) systems are important systems in transportation management and surveillance. They are capable of identifying vehicles by extracting the number plate and reading the plate identity, which is a unique identification code given to each vehicle. ANPR systems can be used for automatic traffic control, electronic toll collection, vehicle tracking and monitoring, border crossing, security, and many more applications.

Developing an ANPR system requires the integration of computer vision algorithms with imaging hardware. The computer vision algorithms include image processing techniques for number plate localization, plate orientation and sizing, normalization, and character segmentation. Besides these, they include pattern recognition techniques for optical character recognition. For better identification accuracy, machine learning techniques are used to learn from input data. There are many difficulties an ANPR system may face, such as poor resolution, poor illumination conditions, blurry inputs, plate occlusion, different font sizes, and a variety of plate structures.

One proposed machine-learning-based Nepali number plate recognition system is capable of automatically labelling a given number plate with its identity. Automatic number plate recognition has been widely researched for many decades, and in many countries it has been successfully applied in practice. For Nepali number plates, however, very little research has been conducted so far. Most of the existing work is based on simple distance measures for character matching, and plate localization and segmentation have not been researched enough to handle all situations. Nepali number plate characters are selected from a pool of 29 characters in a specific order, and the order defines various characteristics of the number plate such as vehicle type, vehicle load, etc. The number plates used in Nepal are usually of two formats, one containing all the characters in a single row and the other containing two rows of characters. Characters are selected from the Devanagari script. That work proposes a complete number plate recognition pipeline that automatically localizes, normalizes, and segments number plates from vehicle images, segments characters from the detected number plates, and passes them to a classification system for labeling. The classification system implements SVM-based machine learning algorithms for learning and prediction.

Recent interest in ANPR systems includes sophisticated machine learning techniques (such as deep learning, neural networks, and SVMs) along with good plate localization and character segmentation algorithms. Localization of the license plate refers to extracting the region of an image that contains the plate, and some of the widely used techniques for localization include scale shape analysis, edge detection, mathematical morphology [1], connected component analysis [2], regional segmentation [3], and statistical classification [4]. Different algorithms have claimed localization accuracies from 80% to 96%. The segmentation phase extracts the region of individual characters from the plate. Frequently used algorithms for segmentation include region merging and splitting, edge gradient analysis, and region analysis; the coordinates of the window enclosing each character are ascertained by segmentation. Template matching and statistical classification were widely used for number plate character recognition in the past, but with the advent of modern machine learning algorithms, Artificial Neural Networks, Support Vector Machines, and Hidden Markov Models are some of the widely used techniques in the current scenario. These algorithms claim to offer accuracy of up to 98% for tasks like character recognition even under different environmental variations, and [5] presented quite good results under different inclination and exposure conditions. The shape and character placement of vehicle number plates vary across the globe, and moreover Nepalese plates use Devanagari characters, which makes the problem even more complex. As such, the recognition methods and algorithms for Nepalese plates must be dealt with uniquely.

2.2 Related theory


2.2.1 History of YOLO

The journey of YOLO (You Only Look Once) object detection spans several significant iterations,
each marked by notable advancements and improvements in accuracy, speed, and versatility. It
all began in 2016 when Joseph Redmon introduced the groundbreaking YOLO algorithm in a
seminal paper titled "You Only Look Once: Unified, Real-Time Object Detection." This
pioneering work proposed a novel single-shot object detection approach, enabling the
identification and classification of objects in real-time using a single neural network. Unlike
traditional methods that relied on multiple networks and stages, YOLO revolutionized the field
with its efficiency and speed.

Despite its innovative concept, the original YOLO algorithm had limitations, particularly in
accurately localizing objects within images. Responding to this challenge, Redmon and his team
introduced YOLOv2 in 2017. This iteration featured the adoption of a new detection
architecture known as Darknet-19, along with the introduction of anchor boxes. These
enhancements significantly improved object localization accuracy and detection speed, setting
a new benchmark for real-time object detection systems.

Building upon the success of YOLOv2, Redmon and his team unveiled YOLOv3 in 2018. This
version represented a significant leap forward in object detection accuracy and performance.
YOLOv3 introduced innovative features such as multi-scale detection and improved feature
extraction, further refining the algorithm's ability to detect and classify objects across various
scales and contexts. With its superior accuracy and speed, YOLOv3 quickly established itself as
a state-of-the-art solution in the field of object detection.

In 2020, the evolution of YOLO continued with the release of YOLOv4, ushering in a new era of
object detection capabilities. YOLOv4 introduced groundbreaking techniques like the Mish
activation function and Spatial Pyramid Pooling (SPP) block, pushing the boundaries of
performance and efficiency. With its state-of-the-art results in accuracy and speed, YOLOv4
solidified its position as the go-to choice for real-time object detection tasks.

The latest milestone in the YOLO saga came with the advent of YOLOv5 in the same year.
YOLOv5 represented a culmination of years of research and development, introducing a host of
innovative features and optimizations. This iteration introduced a new anchor-based
prediction system, a lightweight architecture, and a focus on model compression to reduce
memory usage. With its enhanced speed, accuracy, and versatility, YOLOv5 emerged as a
dominant force in the world of object detection, finding applications in diverse fields such as
autonomous driving and robotics.

Now, the latest advancement in the YOLO lineage brings us to YOLOv8. Building upon the
foundation laid by its predecessors, YOLOv8 incorporates advanced techniques and
optimizations to further elevate the algorithm's performance and capabilities. One of the key
improvements in YOLOv8 is the integration of a more efficient backbone network, enabling
faster processing of input images while maintaining high levels of accuracy in object detection.
Additionally, YOLOv8 introduces refinements in feature extraction and refinement, leading to
more precise object localization and classification. Furthermore, YOLOv8 incorporates novel
strategies for model optimization and deployment, ensuring optimal performance across a
wide range of applications and platforms. With its state-of-the-art performance and versatility,
YOLOv8 continues to push the boundaries of what is possible in the realm of object detection,
paving the way for new innovations and breakthroughs in the field.

Here is a step-by-step algorithm for the YOLO (You Only Look Once) object detection algorithm:

1. Input: a digital image I with width W and height H.

2. Preprocessing: resize the image I to a fixed size and normalize its pixel values to range
[0, 1].

3. Divide the image into a grid of S x S cells, where S is determined by the network architecture.

4. For each cell, predict B bounding boxes with confidence scores, along with class probabilities for K object classes, using a convolutional neural network (CNN) model.

5. Calculate the confidence score for each bounding box by multiplying the conditional class
probability with the intersection over union (IoU) between the predicted box and the
ground truth box, if any.

6. Apply a threshold to the confidence scores to remove low-confidence predictions (steps 6 and 7 are sketched in code after this list).

7. Non-maximum suppression (NMS): for each class, remove overlapping bounding boxes by
keeping only the one with the highest confidence score. This results in a final set of
predictions for all classes.

8. Output: a list of predicted bounding boxes with their corresponding class labels and
confidence scores.
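The thresholding and non-maximum suppression steps (6 and 7) can be summarized by the following minimal Python sketch. It is illustrative only: it assumes boxes are given as (x1, y1, x2, y2) coordinates, the helper names are ours rather than part of any library, and in a full detector the suppression is applied separately for each class.

def iou(box_a, box_b):
    # Intersection over union of two (x1, y1, x2, y2) boxes.
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def non_max_suppression(boxes, scores, score_thr=0.25, iou_thr=0.45):
    # Step 6: drop low-confidence predictions.
    candidates = [i for i, s in enumerate(scores) if s >= score_thr]
    # Step 7: greedily keep the highest-scoring box, discard boxes overlapping it.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in kept):
            kept.append(i)
    return kept  # indices of the surviving predictions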

Figure 2.1: History of Yolo

2.2.2 YOLOv8 Architecture

YOLO (You Only Look Once) is one of the most popular modules for real-time object detection
and image segmentation, currently (end of 2023) considered as SOTA. YOLO is a convolutional
neural network that predicts bounding boxes and class probabilities of an image in a single
evaluation.

Figure 2.1: Yolov8 Architecture

A modified version of the CSPDarknet53 architecture forms the backbone of YOLOv8. This
architecture consists of 53 convolutional layers and employs cross-stage partial connections to
improve information flow between the different layers.

• The head of YOLOv8 consists of multiple convolutional layers followed by a series of fully
connected layers.

• These layers are responsible for predicting bounding boxes, objectness scores, and class
probabilities for the objects detected in an image.

• One of the key features of YOLOv8 is the use of a self-attention mechanism in the head of
the network.

• This mechanism allows the model to focus on different parts of the image and adjust the
importance of different features based on their relevance to the task.

• Another important feature of YOLOv8 is its ability to perform multi-scale object detection.
The model utilizes a feature pyramid network to detect objects of different sizes and scales
within an image.

This feature pyramid network consists of multiple layers that detect objects at different scales,
allowing the model to detect large and small objects within an image.

Main Blocks

The first step to understanding the YOLO architecture is to understand that there are 3 essential blocks in the algorithm, and everything occurs within these blocks: the Backbone, the Neck, and the Head. The function of each block is described below.

Backbone:

Function: The backbone, also known as the feature extractor, is responsible for extracting meaningful features from the input.

Activities:
- Captures simple patterns in the initial layers, such as edges and textures.
- Can have multiple scales of representation deeper in the network, capturing features at different levels of abstraction.
- Provides a rich, hierarchical representation of the input.

Neck:

Function: The neck acts as a bridge between the backbone and the head, performing feature fusion operations and integrating contextual information. Basically, the neck assembles feature pyramids by aggregating feature maps obtained from the backbone; in other words, the neck collects feature maps from different stages of the backbone.

Activities:
- Performs concatenation or fusion of features of different scales to ensure that the network can detect objects of different sizes.
- Integrates contextual information to improve detection accuracy by considering the broader context of the scene.
- Reduces the spatial resolution and dimensionality of the features to facilitate computation, which increases speed but can also reduce the quality of the model.

Head:

Function: The head is the final part of the network and is responsible for generating the network's outputs, such as bounding boxes and confidence scores for object detection.

Activities:
- Generates bounding boxes associated with possible objects in the image.
- Assigns confidence scores to each bounding box to indicate how likely it is that an object is present.
- Sorts the objects in the bounding boxes according to their categories.

Figure 2.2: Yolo Object Detection.

Overall, YOLOv8 is a powerful and flexible tool for object detection and image segmentation that offers the best of both worlds: SOTA technology and the ability to use and compare all previous YOLO versions.

Figure 2.3: Comparison of Various versions of yolo

2.2.3 Nepali Vehicle Classification and License Plate

In Nepal, all road vehicles with or without a motor (except bicycles) are tagged with a registration number. This is issued by the state-level Transport Management Office (previously by a zonal-level Transport Management Office), a government agency under the Department of Transport Management. License plates must be placed on the front as well as the back of the vehicle. The international vehicle registration code for Nepal is NEP. We are especially focused on the recognition of Devanagari plates; therefore, we will not discuss embossed plates.

Vehicle classification: For the purpose of vehicle registration, the Vehicle & Transport Management Act, 2049 (1992) and the Vehicle & Transport Management Rule, 2054 (1997) of Nepal [ref law] classify vehicles into the following 5 main categories on the basis of size and capacity:

1. Heavy and medium-sized vehicle:


This includes bus, truck, dozer, dumper, loader, crane, Fire engine, tanker, roller, pickup, van,
mini bus, mini truck, minivan etc. having the capacity to carry more than 14 people (for
passenger vehicle) or more than 4 tons (for cargo vehicle).

2. Light vehicle:
This includes car, SUV, van, pick-up, micro bus etc. having the capacity to carry less than 24
people or less than 4 tons.

3. Two-wheeler:
This includes vehicle having two wheels like motorcycle, scooter etc.

4. Tractor and power-trailer

5. Three-wheeler:
This includes vehicles having three wheels like electric-safari, rickshaw etc.

Each of the above-mentioned categories is further divided into 5 subcategories on the basis of ownership and service type, which are as follows:

1. Private vehicle: Vehicles which are for entirely personal purpose.

2. Public vehicle: Vehicles which are for public transport purposes.

3. Government vehicle: Vehicles owned by government agencies and constitutional bodies such as ministries, departments, directorates, along with the police, military, etc.

4. National Corporation vehicle: Vehicles which are registered under the name of public
corporations that are fully or partially owned by the government fall under this category.

5. Tourist vehicle: Vehicles which are registered for tourist transport.

There are three types of license plates currently in use: embossed plates, Devanagari provincial plates, and Devanagari zonal plates. This classification is made for the sake of convenience only. Adoption of the embossed number plate has been very slow; only about 32 thousand vehicles have embossed number plates installed [Kantipur Article]. Our custom YOLOv5 model can detect both embossed and Devanagari plates; however, due to their lower representation in the dataset, embossed plate recognition can sometimes fail. Since the majority of vehicles in Nepal are yet to switch to the embossed system, and practically speaking this is likely to take a few more years despite the legal obligation, we have focused on Devanagari plates only.

Devnagari Zonal Plates:

The previous system of the license plate of Nepal consisted of four parts composed of letters (L)
and numbers (N) in the L N L NNNN format:

Figure 2.4: Zonal Plate Format

• L: indicates the zonal code, signifying the zone in which the vehicle is registered.

• N: is a 1- or 2-digit number which is prefixed when the four-digit number runs out from the
last part.

• L: indicates vehicle category, whether it is a privately owned vehicle, public commercial, governmental, etc., as well as whether it is a heavy vehicle, medium-sized vehicle, or a light vehicle.

• NNNN: signifies four digits running in sequence.

Table 2.1: Devanagari Zonal License Plate Characters


Vehicle's Category

Numbers Zonal Representation Heavy Size Three-Wheeler


Light Size
Middle Size Two-Wheeler
Plate Color
0 ) d] Mechi Private s KA r CA k PA

1 ! sf] Koshi Government u GA em JHA a BA

2 @ ; Sagarmatha Public v KHA h JA km PHA

3 # h Janakpur Diplomatic l; l8 C. D. l; l8 C. D. –
4 $ gf Narayani Tourist o YA o YA o YA
Public/National
5 % af Bagmati
Corporation 3 GHA ` NA –
6 ^ u Gandaki
7 & n' Lumbini
8 * w Dhaulagiri
9 ( e] Bheri
/f Rapti
s Karnali
;] Seti
df Mahakali

Devanagari Provincial Plates

The Vehicle & Transport Management Rule, 2054 (1997) was amended on 2075/07/08 (BS) to start issuing license plates based on provincial names. The step was taken after the dissolution of the fourteen zones and the establishment of seven provinces. The provinces did not have names at the beginning; they were simply called Province 1 to Province 7. The zonal code on the license plate was replaced with the province name, and a new format of Devanagari plates was defined.

Table 2.2: Province number and names
Province Number Province Names

k|b]z – )! sf]zL k|b]z


k|b]z – )@ dw]z k|b]z
k|b]z – )# afudtL k|b]z
k|b]z – )$ u08sL k|b]z
k|b]z – )% n'lDagL k|b]z
k|b]z – )^ s0ff{nL k|b]z
k|b]z – )& ;'b"/klZrd k|b]z

The new format is: L# NN NNN L NNNN, where # can be either a letter or a number.

• L: indicates the provincial code, signifying the province in which the vehicle is registered. It is either the province number or the province name, as shown in the table above.

• #: Before the provinces were named, this used to be a number from 1-7; it is now replaced by the Nepali word for province.

• NN: indicates the serial number of the Transport Management Office inside the province where the vehicle is registered. The number is given by the Transport Management Department.

• NNN: is a 3-digit number which is prefixed when the four-digit number in the last part runs out. Extra zeros are added to maintain the format.

• L: indicates vehicle category, whether it is a privately owned vehicle, public commercial, governmental, etc., as well as whether it is a heavy vehicle, medium-sized vehicle, or a light vehicle.

• NNNN: signifies four digits running in sequence; extra zeros are padded at the front to maintain the format.

Table 2.3: Devanagari Province License Plate Characters
Vehicle's Category

Pradesh
Numbers Heavy Size Middle Three-Wheeler
Representation Light Size
Size Two-Wheeler
Plate Color
0 ) k|b]z – )! Private s KA r CA k PA

1 ! k|b]z – )@ Government u GA em JHA a BA

2 @ k|b]z – )# Public v KHA h JA km PHA

3 # k|b]z – )$ Diplomatic l; l8 C. D. l; l8 C. D. –
4 $ k|b]z – )% Tourist o YA o YA o YA
Public/National
5 % k|b]z – )^ Corporation
3 GHA ` NA –
6 ^ k|b]z – )&
7 &
8 *
9 (

Figure 2.5: Provincial Plate Format

3. UML Diagrams

Use Case Diagrams:

A Use Case Diagram is a visual representation of the interactions between users (actors) and a
system. It depicts the various ways users interact with a system to achieve specific goals or tasks.
In simple terms, it shows what the system does from the perspective of the users.

• Actors: Represent the users or external systems interacting with the system.

• Use Cases: Represent the specific functionalities or tasks the system provides to its users.

• Relationships: Show how actors interact with use cases.

Use Case Diagrams are used in software development to:

1. Define Requirements: They help stakeholders understand and agree on the functionalities
and behaviors of the system.

2. Communicate System Functionality: They provide a clear and concise overview of the
system's features and interactions.

3. Guide Development: They serve as a blueprint for developers to implement and test the
system's functionalities.

Use Case Diagram of our System:

Figure 3.1: Use Case Diagram

Sequence Diagrams:

A Sequence Diagram is a type of interaction diagram that illustrates how objects interact in a
particular scenario over time. It shows the sequence of messages exchanged between objects
within a system to accomplish a specific task.

• Objects: Represent instances of classes or components within the system.

• Lifelines: Represent the lifespan of an object during the interaction.

• Messages: Represent communication between objects, indicating the order and type of
interactions.

• Activation Bars: Indicate when an object is active and processing a message.

Sequence Diagrams are used in software development to:

1. Visualize System Behavior: They provide a visual representation of the flow of control and
data between objects in a system.

2. Identify Collaboration: They help identify the objects involved in a particular scenario and
how they interact with each other.

3. Analyze System Performance: They can be used to analyze the timing and efficiency of
interactions between objects.

4. Design and Debugging: They aid in designing and debugging software systems by providing
a clear understanding of system behavior and potential issues.

Sequence Diagram of our System:

Figure 3.2: Sequence Diagram

Activity Diagrams:

An Activity Diagram is a behavioral diagram that depicts the flow of activities within a system or
process. It illustrates the sequence of activities, actions, and decisions that occur from start to
finish to achieve a particular goal or objective. Activity diagrams are particularly useful for
modeling business processes, workflow systems, and software functionalities.

• Activities: Represent actions or tasks performed within the system or process.

• Transitions: Show the flow of control between activities, indicating the order in which
activities are executed.

• Decisions (Branches): Represent conditional branching points where the flow of control
can diverge based on certain conditions.

• Forks and Joins: Forks indicate parallel activities that can be executed concurrently, while
joins synchronize the flow of control back into a single path.

• Start and End Nodes: Mark the beginning and end points of the activity diagram.

Activity Diagrams are used in software development to:

1. Model Business Processes: They provide a visual representation of business processes, helping stakeholders understand and analyze the sequence of activities involved in achieving business goals.

2. Design Software Functionality: They aid in designing and documenting software functionalities by illustrating the sequence of actions and decisions required to perform specific tasks.

3. Identify Bottlenecks and Optimization Opportunities: They help identify potential bottlenecks and areas for optimization within a process by visualizing the flow of activities and decision points.

4. Clarify System Behavior: They provide a clear and intuitive representation of system
behavior, making it easier for stakeholders and development teams to communicate and
understand complex processes.

Activity Diagrams of our Model:

Figure 3.3: Activity Diagram

4. Methodology
Our system detects vehicles and their number plates by processing video, as shown in the following figure. A YOLOv8n model trained on our custom dataset detects motorcycles, cars, buses, trucks, and license plates. The detected objects are tracked using YOLOv8, and the tracked objects are analyzed for motorcycles, cars, buses, and trucks. Any vehicle overrunning a traffic light is annotated with a red bounding box in the frame. For every license plate detected in a frame, we keep track of the ID given by YOLOv8 and send it for recognition only if it is continuously seen for three consecutive frames; a minimal sketch of this logic is given after the figure. The license plate is fed to the character segmentation module, and the segmented characters are sent for recognition. The output given by the character recognition model is annotated along with the plate in the video frame.

Figure 4.1: Project Methodology
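The following Python sketch shows one way this tracking-and-gating logic can be written with the Ultralytics tracking API. It is illustrative only: the weights path, video path, class index, and the send_for_recognition placeholder are assumptions, and the counter is simplified to counting frames in which a track ID appears rather than strictly consecutive frames.

from collections import defaultdict
from ultralytics import YOLO

PLATE_CLASS_ID = 4            # assumed index of the license-plate class in our dataset

def send_for_recognition(plate_crop):
    # Placeholder: in the real pipeline this feeds character segmentation and the CNN.
    pass

model = YOLO("best.pt")       # custom-trained YOLOv8n weights (assumed path)
seen = defaultdict(int)       # track_id -> number of frames in which the plate was seen

for result in model.track(source="traffic.mp4", stream=True, persist=True):
    if result.boxes.id is None:
        continue
    for box, track_id, cls in zip(result.boxes.xyxy, result.boxes.id, result.boxes.cls):
        if int(cls) != PLATE_CLASS_ID:
            continue
        tid = int(track_id)
        seen[tid] += 1
        if seen[tid] == 3:    # send each tracked plate exactly once, after three sightings
            x1, y1, x2, y2 = map(int, box.tolist())
            send_for_recognition(result.orig_img[y1:y2, x1:x2])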

5. System design

Figure 5.1: Designed System

6. Results & Discussion
6.1 Object Detection
Before an image is fed to the YOLOv8 detection model in our project, the data is preprocessed.

Figure 6.1: Object Detection

The general practices for preprocessing were followed, but before we dive into data preprocessing and augmentation, we need to discuss the methods and hurdles we faced during data acquisition.

6.1.1 Data Acquisition

Data had to be collected for the training dataset. To create a dataset for YOLOv8 to detect vehicles and license plates, we used a mobile phone to capture images of publicly parked vehicles. We also collected some of the images from videos uploaded on the internet. We positioned the camera at various angles and heights to capture different views of the vehicles. We ensured that each image contained at least one vehicle and that the license plate was visible, which was especially challenging due to the high volume of traffic on the roads of Nepal.

When using a mobile phone to collect data, it is important to be aware of the artifacts that might
be introduced to the images and data. For instance, the depth of field blur might cause certain
parts of the image to be out of focus which might make it hard to see the characters of the license
plate. The motion blur might result from the movement of the vehicle or camera and could also

lead to image distortions. These artifacts can significantly affect the dataset’s quality and accuracy,
making it challenging to detect the vehicles and license plates.

Furthermore, lighting conditions were closely monitored to ensure optimal illumination so that license plate details were not obscured in the images captured for this project. By carefully following these procedures during data collection, the resulting images had minimal defects, allowing the YOLOv8 algorithm to detect objects such as vehicles and their associated label information reliably at every step.

By coordinating with the District Traffic Police Office, Dharan, we collected some CCTV footage. To collect images from the CCTV footage, we manually took a screenshot of each frame in which a vehicle with a clear license plate was observed.

The collected images then need to be labeled properly in order to turn them into a useful dataset for our training model. To label the images, we used a free, open-source tool called "LabelImg". It does not store any of our images, as it runs locally, and it supports multiple output formats such as YOLO, VOC XML, VGG JSON, and CSV. More importantly, it allowed us to use a custom model, trained on the 1st batch of 1000 images, to pre-label the next batch of images, which made our manual labeling job a bit easier. We used a lightweight YOLOv8n model to help us label the images we acquired; the YOLO label format produced by this step is illustrated below.
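In the YOLO format, each image gets a .txt file with one line per object: a class index followed by the box centre, width, and height, all normalized to the image size. The short sketch below converts a pixel-coordinate box into such a line; the class index used in the example is only illustrative.

def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    # Convert a pixel-space (x1, y1, x2, y2) box into a YOLO-format label line:
    # "<class> <x_center> <y_center> <width> <height>", all normalized to [0, 1].
    xc = (x1 + x2) / 2.0 / img_w
    yc = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# Example: a license plate (assumed class index 4) at pixels (420, 610)-(580, 660) in a 1280x720 image
print(to_yolo_line(4, 420, 610, 580, 660, 1280, 720))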

6.1.2 Data Preprocessing

Despite all the measures we took while acquiring the data, some unwanted artifacts are introduced into the images, which hinders the accuracy of the model. To minimize their effect, the images first go through histogram equalization, a commonly used image preprocessing technique that adjusts the image's pixel intensities, enhances the contrast of the image, and brings out its visual features.

The process of histogram equalization involves computing a histogram of the image’s intensity
values and then redistributing these values to create a more uniform distribution. The result is an
image with enhanced contrast, where the dark and light areas of the image are more
distinguishable. This technique is especially useful for images that have a limited range of pixel
intensities, resulting in low contrast and making it challenging for YOLOv8 to detect objects
accurately.
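A minimal OpenCV sketch of this step is given below. Equalization is applied to the luminance channel only so that colours are not distorted, and CLAHE is shown as an alternative because it limits over-amplification of noise; the file names are placeholders.

import cv2

img = cv2.imread("frame.jpg")                     # placeholder input image
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)    # work on the luminance channel only

# Plain histogram equalization of the Y (luminance) channel
ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
equalized = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

# Alternative: CLAHE, which equalizes in small tiles and clips the histogram
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
ycrcb2 = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
ycrcb2[:, :, 0] = clahe.apply(ycrcb2[:, :, 0])
clahe_result = cv2.cvtColor(ycrcb2, cv2.COLOR_YCrCb2BGR)

cv2.imwrite("frame_equalized.jpg", equalized)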

6.1.3 Model Training

A YOLOv8n model was trained to detect vehicles and number plates on a dataset of 1778 images of 4 classes, which was split in the ratio of 80:20 for training and validation respectively. Instead of initializing the weights randomly, weights pretrained on the COCO 2017 dataset were used. After training the model for 60 epochs there was no further improvement, so the training process was terminated.
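A minimal sketch of this training setup using the Ultralytics API is shown below; the dataset YAML path is a placeholder, and the exact hyperparameters of our runs may differ.

from ultralytics import YOLO

# Start from COCO-pretrained YOLOv8n weights instead of random initialization
model = YOLO("yolov8n.pt")

# data.yaml (placeholder path) lists the train/val image folders and the 4 class names
model.train(data="data.yaml", epochs=60, imgsz=640)

metrics = model.val()   # evaluate mAP, precision and recall on the validation split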

When we evaluate the accuracy of any model, we consider evaluation metrics such as mAP (mean Average Precision), recall and precision, the confusion matrix, objectness loss, and classification loss.

Confusion matrix

A confusion matrix is a table that is often used to evaluate the performance of a classification
model. It provides a detailed breakdown of how many instances were correctly or incorrectly
classified by the model, allowing us to measure the model’s overall accuracy.

A confusion matrix is typically composed of four different metrics:

1. True Positives (TP): This refers to the number of positive instances that were correctly
classified by the model.

2. False Positives (FP): This refers to the number of negative instances that were incorrectly
classified as positive by the model.

3. False Negatives (FN): This refers to the number of positive instances that were incorrectly
classified as negative by the model.

4. True Negatives (TN): This refers to the number of negative instances that were correctly
classified by the model.

By combining these metrics, we can create a table that summarizes the performance of the
classification model. The table is organized into a grid, with the actual class labels on one axis and
the predicted class labels on the other.

Table 5.1: Confusion Matrix Table

Actual Positive Actual Negative

Predicted Positive True Positive (TP) False Positive (FP)

Predicted Negative False Negative (FN) True Negative (TN)

Using this table, we can calculate a number of different metrics that are useful for evaluating the
performance of a classification model. For example:

1. Accuracy: This refers to the overall percentage of instances that were correctly classified by
the model. It is calculated as (TP + TN) / (TP + TN + FP + FN) [9].

2. Precision: This refers to the percentage of instances that were classified as positive by the
model that were actually positive. It is calculated as TP / (TP + FP) [9].

3. Recall: This refers to the percentage of actual positive instances that were correctly
identified by the model. It is calculated as TP / (TP + FN) [9].

4. F1-Score: This is the harmonic mean of precision and recall, with a higher score indicating better performance. It is calculated as 2 * ((precision * recall) / (precision + recall)); see the short sketch after this list.
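These formulas translate directly into code. The following minimal sketch computes all four metrics from raw confusion-matrix counts; the numbers in the example call are arbitrary.

def classification_metrics(tp, fp, fn, tn):
    # Derive the standard metrics from confusion-matrix counts.
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall    = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1

# Example with arbitrary counts: 90 TP, 10 FP, 5 FN, 95 TN
print(classification_metrics(90, 10, 5, 95))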

Overall, the confusion matrix is a powerful tool for evaluating the performance of a classification
model. By breaking down the model’s predictions into a detailed table, we can gain valuable
insights into its strengths and weaknesses and identify areas for improvement.

Type 1 error

Type 1 error occurs when the model predicts a positive instance, but it is actually negative.
Precision is affected by false positives, as it is the ratio of true positives to the sum of true
positives and false positives.

Type 2 error

Type 2 error occurs when the model fails to predict a positive instance. Recall is directly affected
by false negatives, as it is the ratio of true positives to the sum of true positives and false
negatives.

Table 5.2: Type I and Type II error

The confusion matrix for our trained model and precision, recall graphs are as follows:

Figure 6.2: Confusion Matrix Yolov8n Model

Figure 6.3: Recall Confidence Plot

Figure 6.4: Precision Confidence Plot mAP

mAP (mean Average Precision) is an evaluation metric in object detection and image classification tasks. It measures the accuracy of an algorithm in identifying objects within an image and assigning a level of confidence to the identified objects.

The value of mAP is calculated by taking the average of the precision values at different recall levels. Here, the recall level represents the percentage of objects detected correctly out of all the objects actually present in the image. mAP at 0.5 refers to a specific threshold used in object detection tasks: the minimum intersection over union (IoU) required between the predicted bounding box and the ground truth bounding box for the detection to be considered correct.

In other words, if the predicted bounding box overlaps with the ground truth bounding box by at
least 50%, then the detection is considered correct and contributes to the mAP at 0.5 score. This
threshold of 0.5 is commonly used in object detection tasks as it strikes a balance between
precision and recall.
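
Since the 0.5 threshold is defined in terms of IoU, a small helper for computing IoU between two axis-aligned boxes makes the criterion concrete. This is a generic sketch (boxes given as (x1, y1, x2, y2) corner coordinates), not code taken from our evaluation pipeline:

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A detection counts as correct at mAP@0.5 when iou(pred, gt) >= 0.5.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.143 -> not a match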

Referring to the mAP curve at the 0.5 IoU threshold, our model achieves an average precision
above 0.928 at approximately 60 epochs.

YOLOv5s Training Result

Figure 6.5: YOLOv5s Training Result

Figure 6.6: Training Batch

Figure 6.7: Validation Batch

6.2 Object Tracking with YOLOv8

Figure 6.8: Yolov8 Tracking

Object tracking plays a crucial role in video analytics, enabling the identification and classification
of objects while maintaining their unique identities as the video progresses. With the advent
of deep learning and computer vision technologies, YOLOv8 has emerged as a powerful solution
for real-time object tracking.

For object tracking in video analytics, Ultralytics YOLO is a strong choice. It processes real-time
video streams without compromising accuracy, whether the task is tracking moving objects in
surveillance footage or analyzing objects in a live feed, so every object is captured with precision.

Another advantage of Ultralytics YOLO is its flexibility in tracking algorithms and configurations.
With multiple tracking algorithms to choose from, the tracking process can be tailored to specific
requirements, including crowded scenes and other challenging environments.

A further useful feature of Ultralytics YOLO is the ability to track with custom-trained models.
With a custom-trained YOLO model, the tracking step can be adapted to specific domains and
scenarios, which helps address unique challenges and achieve highly accurate object tracking
results.

Available Trackers

Ultralytics YOLOv8 provides a range of available trackers to choose from. Two popular options
are:

BoT-SORT: BoT-SORT is a popular and efficient tracker that combines object detection with an
online tracking-by-detection algorithm. It effectively associates detections over consecutive
frames, providing accurate and continuous tracking results.

ByteTrack: ByteTrack is another noteworthy tracker offered by Ultralytics YOLOv8. It is built
specifically for multi-object tracking and provides excellent performance in real-time scenarios.
Its key idea is to associate nearly every detection box, including low-confidence ones, which helps
recover occluded objects and reduce identity switches.
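
As a usage illustration, tracking with a custom-trained detector can be run through the Ultralytics Python API roughly as follows. The weights path and video path are placeholders standing in for our own files, and this is a sketch rather than the exact script used in the project:

from ultralytics import YOLO

# Load a custom-trained YOLOv8 detector (path is a placeholder).
model = YOLO("runs/detect/train/weights/best.pt")

# Track objects in a video; persist=True keeps track IDs across frames.
results = model.track(source="traffic.mp4", tracker="bytetrack.yaml",
                      persist=True, conf=0.4, show=False)

for frame_result in results:
    if frame_result.boxes.id is not None:
        ids = frame_result.boxes.id.int().tolist()   # per-object track IDs
        boxes = frame_result.boxes.xyxy.tolist()     # bounding boxes for this frame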

6.3 Plate Localization


Localization refers to identifying the location of an object in an image. In any ANPR system, plate
localization is the first stage, and there are many approaches to this essential task before the plate
is ready for recognition. Basic image processing techniques can be used in ANPR when images are
captured under regulated lighting settings with predictable license plate types. P. R. Sanap and
S. P. Narote [23] have summarized various methods of license plate localization; histogram
analysis, morphological processing, texture, edge detection, and transformations are the bases for
the number plate detection techniques.

Karthikeyan and V. J. Vijayalakshmi [24] used morphological operations for license plate
localization. Their algorithm applies morphological operations to pre-processed edge images of
the vehicles. Characteristic features such as license plate width and height, character height, and
spacing are considered when defining the structuring elements for the morphological operations.

The traditional approach to plate localization using morphological operations and four-point
contour detection is difficult to generalize given the various types of license plates currently in
use in Nepal [25][26]. The law clearly defines the size, color, character size, and spacing between
lines and characters, but in practice people use whatever number plate fits their vehicle. The
introduction of embossed number plates has brought some standardization, but adoption has
been very slow.

Given this context, plate localization was performed by training YOLOv8 on a custom dataset of
images and videos captured at various locations and from multiple angles. The YOLOv8 model
trained on our custom dataset identifies motorcycles, cars, buses, and license plates. The object
detection step draws a bounding box around each license plate detected above a set confidence
threshold. The plate candidate returned by the object detector is called the region of interest (ROI).
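
A minimal sketch of this step using the Ultralytics API is shown below; the weights path and the class name are assumptions standing in for our trained model and label map:

import cv2
from ultralytics import YOLO

model = YOLO("best.pt")                    # custom-trained weights (placeholder path)
image = cv2.imread("vehicle.jpg")

# Run detection and keep plate boxes above the confidence threshold.
result = model.predict(image, conf=0.5)[0]
for box in result.boxes:
    cls_name = result.names[int(box.cls)]
    if cls_name == "license_plate":        # class name depends on our label map
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        roi = image[y1:y2, x1:x2]          # region of interest for further processing
        cv2.imwrite("plate_roi.jpg", roi)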

Figure 6.9: License Plate Localized by YOLOv8n

6.4 Processing ROI
Pre-processing the acquired image is the first stage in any image processing system. Pre-processing
involves operations on the image that improve the area of interest, and as a result these
procedures are wholly dependent on context. In our situation the area of interest is the vehicle's
license plate, so we put the following processes in place to transform the image into one that can
be processed further and to improve the effectiveness of the segmentation algorithm.
Pre-processing of the ROI includes HSV color space conversion, contrast enhancement (CLAHE),
color masking, and perspective transform.

1. HSV Color Space Conversion: Colors are described in terms of Hue, Saturation, and Value
according to the Hue, Saturation, and Value (HSV) color space model. When color
description is important, the HSV color model is frequently chosen over the RGB model.
Similar to how the human eye perceives color, the HSV model defines color [15]. Whereas
RGB depicts color as a mixture of primary colors, HSV characterizes color using more
relatable attributes: hue (tint), saturation (vibrancy), and value (brightness). Color definitions
are contained within a hexcone and the coordinate system is cylindrical. The hue H ranges
from 0° to 360°. The saturation S ranges from 0 to 1 and indicates the degree of strength or
purity. The value V, which similarly ranges from 0 to 1, represents brightness.

2. Contrast Limited Adaptive Histogram Equalization (CLAHE): This is the improvement over
the Adaptive Histogram Equalization (AHE). AHE divides the image into distinct blocks and
computes histogram equalization for each section. Thus, AHE computes many histograms,
each corresponding to a distinct section of the image. CLAHE improves on AHE by limiting
contrast amplification and employing bilinear interpolation to remove the artificial boundaries
caused by merging the different sections/tiles.

3. Color Masking: The image in HSV color space is now ready to be masked by using
appropriate color mask. Nepali license plate can come in multiple colors as discussed earlier
depending on the ownership of the vehicle. Our interest of vehicle is private vehicles, which
have red color plate with white characters. Thus, we masked the red color from the image.
In the HSV model, hue values from around 0°-10° and 350°-360° can be approximated as
red. Using these ranges, we masked the red color regions from the images or frames [15];
a short code sketch after this list illustrates the masking and perspective-transform steps.

4. Perspective Transform: The color-masked image is sent for external contour detection, the
contours are sorted by area, and the largest contour is approximated by a minimum-area
rectangle. This is where the characters in the plate are located. Due to variation in camera
position and angle, the plate could be oriented in any direction, and if such a plate is sent
for segmentation the algorithm will not perform well [27]. The four corner points of the
minimum rectangle are arranged in order: top-left, top-right, bottom-right, and bottom-left.
A blank image is created whose width is the maximum of the distances between the
bottom-right and bottom-left x-coordinates and between the top-right and top-left
x-coordinates, and whose height is the maximum of the distances between the top-right and
bottom-right y-coordinates and between the top-left and bottom-left y-coordinates.

Given the size of the new image, a set of destination points is obtained in the same
order: top-left, top-right, bottom-right, and bottom-left. To obtain a "bird's eye view" or
top-down view of the plate, a perspective transform matrix is calculated using OpenCV's
getPerspectiveTransform() method, and this matrix is used to compute the perspective
transform.
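
The sketch below combines the masking and warping steps described in items 3 and 4 using OpenCV. The HSV ranges and the helper for ordering corner points are illustrative assumptions rather than the project's exact values (note that OpenCV stores hue in the 0-179 range, so 350°-360° maps to roughly 170-179):

import cv2
import numpy as np

def order_points(pts):
    # Order 4 points as top-left, top-right, bottom-right, bottom-left.
    s, d = pts.sum(axis=1), np.diff(pts, axis=1).ravel()
    return np.array([pts[np.argmin(s)], pts[np.argmin(d)],
                     pts[np.argmax(s)], pts[np.argmax(d)]], dtype="float32")

roi = cv2.imread("plate_roi.jpg")
hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)

# Red occupies both ends of the hue circle (approximate ranges, tuned by experiment).
mask = cv2.inRange(hsv, (0, 70, 50), (10, 255, 255)) | \
       cv2.inRange(hsv, (170, 70, 50), (179, 255, 255))

cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
box = cv2.boxPoints(cv2.minAreaRect(max(cnts, key=cv2.contourArea)))
src = order_points(box)

(tl, tr, br, bl) = src
w = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
h = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype="float32")

M = cv2.getPerspectiveTransform(src, dst)      # 3x3 homography
warped = cv2.warpPerspective(roi, M, (w, h))   # top-down view of the plate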

Figure 6.10: Pre-processing Detected License Plate ROI

6.5 Character Segmentation


The vehicles on the road are detected, and the license plate number is unique to every vehicle on
the road. It also conveys additional information about the vehicle, such as vehicle type and
ownership [26][25]. The detected license plate must be processed prior to feeding it to a CNN
trained on Nepali license plate characters for recognition. Character segmentation means
extracting the individual characters from a given license plate.

Yungang Zhang and Changshui Zhang developed an algorithm [28] using the Hough transform
and prior knowledge for horizontal and vertical segmentation, respectively. The advantages of this
algorithm are that no rotation correction of plate images is needed, and the influence of the
background and of illumination variance is weakened.

Feng Yang, Zheng Ma, and Mei Xie [29] proposed a region growing based segmentation technique.
Contrast stretching transformation is used to improve character areas, and then Laplacian
Transformation is used to detect edges. With the aid of the region-growing algorithm, the
locations of the potential regions are discovered. Using predetermined parameters, the process of
“region growing” groups pixels or smaller regions into bigger ones. The method was tested on 320
images with a success rate of 97.2%.

Figure 6.11: Overview of Character Segmentation Steps

Pre-processing

The object detector gives a region of interest; we process the region of interest and get the region
of the plate where the characters are present. The plate must be seen from a top-down perspective
to ensure proper segmentation of characters. In the pre-processing step we want to reveal the
characters clearly against the background of the plate. Before extracting characters, the raw plate
image is refined: this is the stage when unnecessary marks, noise, and blobs are removed. Dirt and
dust on the plate, old or discolored plates, and distorted characters cause problems in
segmentation. The primary pre-processing steps are grayscale conversion, histogram equalization
(Contrast Limited Adaptive Histogram Equalization), median filtering, morphological operations,
and binarization (Otsu's method).

RGB to Grayscale Conversion

A 24-bit RGB image is transformed into an 8-bit grayscale image by taking a weighted sum of the
R, G, and B components. The weights are the same as those used by the NTSC color space for the
grayscale signal displayed on monochrome televisions [30]. The grayscale image for the RGB
image f(x,y) is given by the formula:

g(x,y) = 0.2989 ∗ fR + 0.5870 ∗ fG + 0.1140 ∗ fB (5.1)

Where, fR, fG, and fB are the red, green, and blue components of the RGB image f(x,y), respectively.

Histogram Equalization

A histogram is a representation of the frequency distribution of pixel intensities. Histogram
equalization usually increases the global contrast of an image, especially when the image is
represented by a narrow range of intensity values. Global Histogram Equalization (GHE) is very
simple and fast, but its contrast enhancement power is low.

Figure 6.12: Histogram of Gray Image and Grayscale image

Median Filter

One of the crucial steps in preparing an image for segmentation is noise removal. Filtering is used
to remove noisy image pixels; the noise removal method utilized here is non-linear median
filtering. A median filter is an efficient way to reduce noise without obliterating sharp edges [30]:
each pixel is replaced with the median value of its neighborhood. The median filtered image for
the digital image f(x,y) is derived as,

g(x,y) = median{f(i,j) | (i,j) ∈ w} (5.2)

Where, w is the neighborhood in the image centered on position (x,y).

Binarization (Otsu’s Method)

The central task of character segmentation is distinguishing the characters (foreground) from the
plate (background). The segmented image g(x,y) for the grayscale image f(x,y) is produced using
the image binarization procedure described below.

g(x,y) = 1 if f(x,y) > T, otherwise g(x,y) = 0 (5.3)

where T is the threshold value, which can be obtained using Otsu's threshold selection technique
for grayscale image segmentation [31]. The automatic threshold selection method developed by
Otsu for gray-level picture binarization is nonparametric and unsupervised. An optimal threshold
is selected by the discriminant criterion, i.e., by maximizing the inter-class variance between white
and black pixels [30].

Morphological Operation

Morphological transformations are simple operations based on the image shape and are usually
performed on binary images. They require two inputs: the original image and a structuring
element or kernel, which determines the nature of the operation. Erosion and dilation are the two
fundamental morphological operations; variants such as opening, closing, and gradient are built
from them.

Erosion: The fundamental concept of erosion is similar to soil erosion, except it only removes the
boundaries of foreground objects (always try to keep foreground in white). The kernel traverses
the picture (as in 2D convolution). A pixel in the original image—whether it is 1 or 0—will only
be treated as 1 if every pixel under the kernel is 1, otherwise it is eroded (made to zero).
Thus, depending on the size of the kernel, all pixels close to the boundary are eliminated, so the
thickness of the foreground object (the white region) shrinks. Erosion can be used to eliminate
faint white noise and to separate touching objects, such as the characters in our case.

Dilation: Dilation is the opposite of erosion: a pixel element is set to "1" if at least one pixel under
the kernel is "1", so the foreground object (the white area) in the image grows. Typically, erosion
is followed by dilation for noise removal, because erosion removes white noise but also shrinks
the object; dilating afterwards restores the object area, while the noise, already removed, does not
return. Dilation can also be used to join disconnected parts of a character.
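
Putting these pre-processing steps together, a plausible OpenCV pipeline looks roughly like the following; the kernel sizes and the CLAHE clip limit are illustrative assumptions, not the exact values tuned in the project:

import cv2

plate = cv2.imread("warped_plate.jpg")

# 1. Grayscale conversion (weighted sum of the B, G, R channels).
gray = cv2.cvtColor(plate, cv2.COLOR_BGR2GRAY)

# 2. Contrast Limited Adaptive Histogram Equalization.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
gray = clahe.apply(gray)

# 3. Median filtering to suppress noise while preserving edges.
gray = cv2.medianBlur(gray, 3)

# 4. Otsu binarization (characters become the white foreground).
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 5. Morphological opening (erosion then dilation) to remove small blobs.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)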

Cleaning

The pre-processed plate is not yet ready for segmentation; plate boundaries, dirt, dust, and noise
may remain, and these unwanted objects drastically hamper the performance of the segmentation
algorithm. We clean them by finding all contours and filtering out unwanted objects based on
area, aspect ratio, width, height, solidity, and extent.

A mask is created for all objects that pass these tests, and a bitwise AND of the original image is
taken with the mask. The thresholds for the above tests are determined experimentally, choosing
values that perform well for all types of plates.
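
Continuing from the binary plate image produced above, the filtering step can be sketched as follows; the threshold values are hypothetical placeholders for the experimentally determined ones:

import cv2
import numpy as np

cnts, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
mask = np.zeros_like(binary)

for c in cnts:
    area = cv2.contourArea(c)
    x, y, w, h = cv2.boundingRect(c)
    aspect = w / float(h)
    extent = area / float(w * h)                    # contour area vs bounding-box area
    hull_area = cv2.contourArea(cv2.convexHull(c))
    solidity = area / hull_area if hull_area > 0 else 0

    # Keep only blobs that look like characters (illustrative thresholds).
    if area > 100 and 0.2 < aspect < 1.2 and extent > 0.3 and solidity > 0.4:
        cv2.drawContours(mask, [c], -1, 255, thickness=-1)

cleaned = cv2.bitwise_and(binary, binary, mask=mask)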

Figure 6.13: Character Segmentation Stage 1

Segmented Characters

Based on the number of horizontal and vertical projection peaks, we determine the type of
license plate. If the plate passes the tests in all rows, the characters are segmented from the license
plate image at the bounding box of each character and passed on for recognition.
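
The projection profiles mentioned above can be computed directly from the cleaned binary plate; the peak-counting heuristic below is a simplified stand-in for the project's actual rules:

import numpy as np

# Row and column sums of the binary plate (characters are white, value 255).
horizontal_proj = (cleaned > 0).sum(axis=1)   # one value per row
vertical_proj = (cleaned > 0).sum(axis=0)     # one value per column

def count_peaks(profile, min_fraction=0.3):
    # A "peak" is a run of consecutive positions above a fraction of the maximum.
    threshold = min_fraction * profile.max()
    above = profile > threshold
    # Count rising edges: positions where the profile crosses the threshold upward.
    return int(np.count_nonzero(above[1:] & ~above[:-1]) + int(above[0]))

rows_of_text = count_peaks(horizontal_proj)   # e.g. 2 for a two-line motorcycle plate
char_columns = count_peaks(vertical_proj)     # roughly the number of characters per line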

Figure 6.14: Segmented Characters, final stage

6.6 Plate Recognition

Figure 6.15: Character Recognition Process Overview

6.6.1 Character Pre-processing

The segmented characters are fed to a CNN trained on a license plate character dataset. Before
feeding the characters to the trained model, each character must go through the following steps:

1. Resize character to (64,64)

Figure 6.16: Segmented Character

6.6.2 Character Recognition

Neural Networks

A neural network is a type of machine learning model loosely inspired by the human brain: an
artificial neural network learns by incorporating new data through an iterative training algorithm.
An image is 2-dimensional data containing spatial information about pixel intensity. Before
feeding an ordinary ANN, we would need to extract features from the input image; feature
extraction is a crucial step in image classification tasks.

Feature Extraction

A convolution operation with a kernel is performed to extract the features, which gives rise to a
Convolutional Neural Network (CNN). The convolution operation is the fundamental building
block of a CNN. CNNs are state-of-the-art algorithms for computer vision problems such as object
detection, localization, and recognition. Filters/kernels in the CNN layers are used to extract
features from the images, and the difficult task of assigning the filter parameters is learned in a
supervised manner.

We emphasize that all weights in all layers of a convolutional network are learned through
training; the network learns to extract its own features automatically.

Convolutional Layer

When designing a CNN, each convolutional layer should specify the following: the input is a
tensor of shape (number of images) x (image width) x (image height) x (image depth), and the
convolutional kernels have a width and height that are hyper-parameters and a depth that must
equal the depth of the input.


Kernel

Kernels are small matrices of weights, with dimensions appropriate to the layer, that are convolved
with the feature maps. For example, the first convolution layer has a 3x3 kernel matrix that is
convolved with the input feature map.

Figure 6.17: Example of convolving a 2x2 kernel with a 4x4 feature map (stride of one)

Strides

The stride is simply the number of steps the kernel moves between successive positions; it can be
specified separately for the horizontal and vertical directions. Strides are important in CNNs both
for feature extraction and for up-sampling operations.

Figure 6.18: Different Number of Strides usage in a convolution layer

Padding

Two types of padding are used here. Padding adds extra rows and columns around the border of
the feature map; without it, features near the corners and edges can be lost, depending on the
filter size and the stride.

Same Padding: This is also known as zero padding. If the kernel cannot cover all positions of the
feature map exactly, rows and columns are added around the feature map so that the convolution
can be applied everywhere; because the added values are zero, it is called zero padding. With
same padding and a stride of one, the output size equals the input size, which is why it is called
"same" padding.

Valid Padding: No rows or columns are added; the feature map is used as it is.

Figure 6.19: Example of Padding

Pooling layer

A ConvNet uses pooling to reduce the size of the representation and speed up computation. A
pooling layer has no parameters to learn. Max pooling is based on the intuition that, within each
window, the most informative feature is carried by the pixel with the largest value.

Figure 6.20: Maxpool layer

Activation Function

An activation function transforms a given input into an output within a certain range. It
introduces non-linearity, enabling the network to learn complex patterns.

ReLU Layer

The rectified linear activation function, or ReLU, is a piecewise linear function that outputs the
input directly if it is positive and zero otherwise:

f(x) = max(0, x)

ReLU works well in networks with many layers because it helps prevent vanishing gradients and
is fast to compute.

Figure 6.21: ReLU Activation Function

Fully Connected Layers

The features extracted by the convolutional layers are flattened to form a 1-dimensional feature
vector of length (feature-map height x feature-map width x number of channels). This vector is
fed to the fully connected layers.

Figure 6.22: Fully Connected Layer

Softmax function

The softmax function turns a vector of K real values into a vector of K real values that sum to 1.
Softmax is used in the final layer of the neural network and assigns a probability to each class;
the probabilities of the different classes sum to 1.

softmax(z)_i = exp(z_i) / Σ_{j=1}^{K} exp(z_j) (5.10)

z - the input vector to the softmax function, made up of (z_1, ..., z_K)

K - the number of classes in the multi-class classifier

Cross-Entropy Loss

Loss functions quantify the difference between the actual and predicted labels: the higher the loss,
the worse the model is performing. The cross-entropy loss is

CE = − Σ_{i=1}^{C} t_i log(s_i) (5.11)

where t_i and s_i are the ground truth and the CNN score (the output of the final layer) for class i,
respectively, and C is the number of classes.

Categorical Cross-Entropy Loss

It is a softmax activation followed by a cross-entropy loss.

Figure 6.23: Categorical Cross-Entropy Loss

where s_p is the CNN score for the positive class.

In multi-class classification the labels are one-hot, so only the positive class C_p keeps its term in
the loss: there is only one element of the target vector t that is non-zero (t_i = t_p), and the
elements of the summation that are zero due to the target labels are discarded.
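
A small NumPy sketch of the softmax and categorical cross-entropy computations described above (illustrative only; in practice the framework's built-in loss is used):

import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the outputs sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def categorical_cross_entropy(scores, target_index):
    # One-hot target: only the positive class contributes to the loss.
    probs = softmax(scores)
    return -np.log(probs[target_index])

scores = np.array([2.0, 0.5, -1.0, 0.1])       # raw CNN scores for 4 classes
print(categorical_cross_entropy(scores, 0))    # loss when class 0 is the true label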

Optimization

Optimization is the iterative process of minimizing model error, i.e., minimizing the loss. The loss
function is often referred to as the objective function of the optimization problem, and the
optimum is the global minimum of the loss function for the model. Common optimization
algorithms include gradient descent, mini-batch gradient descent, RMSProp, and the Adam
optimization algorithm.

Adam Optimization Algorithm

Adam (Adaptive Moment Estimation) is an optimization algorithm that combines the 'gradient
descent with momentum' algorithm and the RMSProp algorithm. In momentum gradient descent,
gradient descent is accelerated by taking an exponentially weighted average of the gradients,
which reduces oscillation while descending and makes the algorithm converge toward the
minimum more quickly. The gradients of complex functions in a neural network may vanish or
explode as data propagates through the network; RMSProp deals with this by using a moving
average of squared gradients to normalize the gradient. We compute decaying averages of past
gradients m_t and past squared gradients v_t as follows:

m_t = β1 · m_{t−1} + (1 − β1) · g_t

v_t = β2 · v_{t−1} + (1 − β2) · g_t²

Here, m_t and v_t are estimates of the first moment (the mean) and the second moment (the
uncentered variance) of the gradients, respectively; β1 and β2 are the decay rates, which are close
to 1, and g_t is the derivative of the loss with respect to the weight at time t (dL/dW_t). Because
the first and second moment estimates start at zero, they are bias-corrected:

m̂_t = m_t / (1 − β1^t),  v̂_t = v_t / (1 − β2^t)

The new weight w_t is then calculated as

w_t = w_{t−1} − α · m̂_t / (√v̂_t + ϵ)

where α is the learning rate and ϵ is a very small positive number to avoid division by zero.
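
For illustration, one Adam update step can be written in a few lines of NumPy; β1 = 0.9, β2 = 0.999, and ϵ = 1e-8 are the commonly used defaults, stated here as assumptions since the project relies on the framework's built-in optimizer:

import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update the biased first and second moment estimates.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    # Bias correction (the moments start at zero).
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Parameter update.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([0.5, -0.3]); m = np.zeros_like(w); v = np.zeros_like(w)
g = np.array([0.1, -0.2])               # gradient of the loss w.r.t. w at step 1
w, m, v = adam_step(w, g, m, v, t=1)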

Model

The architecture is inspired by the AlexNet architecture for image classification, which uses 5
convolutional layers.

Our custom model consists of 5 convolution layers, 3 max-pooling layers, 2 normalization layers,
2 fully connected layers, and 1 softmax layer. Each convolution layer consists of a convolution
filter followed by the non-linear activation function ReLU. The pooling layers perform max
pooling, and the input size is fixed (64x64x3) due to the presence of the fully connected layers.
In total, the model has over 20 million parameters.
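
A Keras sketch consistent with this description is shown below. Only the overall structure (5 convolution layers, 3 max-pooling layers, 2 normalization layers, 2 fully connected layers, a softmax output, and a 64x64x3 input) follows the text; the filter counts, dense sizes, and number of output classes are assumptions made for illustration:

from tensorflow.keras import layers, models

num_classes = 34  # placeholder: digits plus Nepali characters used on plates

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(256, 3, activation="relu", padding="same"),
    layers.Conv2D(256, 3, activation="relu", padding="same"),
    layers.BatchNormalization(),
    layers.Conv2D(512, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1024, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()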

Figure 6.24: Model Architecture

Figure 6.25: Model Summary

For the input layer, the size of the feature map is 64x64x3, where 3 denotes the depth (number of
color channels). All the layers with their feature map sizes are listed in the following table.

Table 6.1: Feature Map of different layers

6.6.3 Multi-class classification model evaluation metrics

Training loss and training accuracy

Training loss and accuracy are used to evaluate the performance of a machine learning model
during the training phase.

Training loss is a metric that defines how well the model fits the training data. It is computed
from the difference between the predicted and actual values for each training example, averaged
across all training examples. The goal is to minimize the training loss, meaning the model makes
more accurate predictions on the training data. The training loss is computed after every batch or
epoch of training, and its value decreases over time as the model learns to make better
predictions.

On the other hand, training accuracy is a metric that is used to measure the percentage of
instances in the training set that were correctly classified. It can be calculated by dividing the
number of correct predictions in the training set by the total number of instances. A high
training accuracy indicates that the model is performing well on the training data set. But it
doesn’t necessarily ensure good performance on new, unseen data, as the model may have
overfit to the training data.

Validation loss and validation accuracy

Validation loss and accuracy are used to evaluate the generalization performance of a machine
learning model during the training phase. Validation loss is a metric that defines how well the
model generalizes to new and unseen data. It is calculated as the average difference between
predicted values and true values over a validation set; the goal is to minimize the validation loss,
meaning the model makes more accurate predictions on new data. The validation loss is computed
after each epoch of training. Validation accuracy, on the other hand, measures the percentage of
instances in the validation set that were classified correctly. It is calculated by dividing the number
of correct predictions by the total number of instances in the validation set. A high validation
accuracy indicates that the model generalizes well to new and unseen data.

When the validation loss is much greater than the training loss, it may indicate that the model is
underfitting: the model cannot accurately fit the training data and produces large errors. In such
cases, additional training is required to reduce the loss. The training data can also be increased,
either by obtaining more samples or by augmenting the data.

If the training loss and accuracy continue to improve while the validation loss and accuracy
start to degrade, it may indicate overfitting. In this case, adjustments need to be made to the
model to improve its generalization performance such as reducing its complexity or adding
regularization techniques. The validation set can also be used to compare different models
and select the best one based on its validation performance.

When the training loss and validation loss both decrease and stabilize at a specific point, it
indicates an optimal fit i.e., the model does not overfit or underfit.
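
In Keras, monitoring these curves and stopping before overfitting sets in can be done with a validation split and an early-stopping callback. This is a generic sketch rather than the project's exact training script, and it assumes the compiled model from the sketch above together with training arrays x_train and y_train:

from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss stops improving, and keep the best weights.
early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

history = model.fit(x_train, y_train,
                    validation_split=0.2,     # hold out 20% of the data for validation
                    epochs=100,
                    batch_size=32,
                    callbacks=[early_stop])

# history.history contains 'loss', 'accuracy', 'val_loss', 'val_accuracy' per epoch.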

Figure 6.26: Train/Validation Model Error and Accuracy

Classification Report for Model

Figure 6.27: Test Classification Report for Model

7. Conclusions
The project has successfully shown the possibility of employing the YOLOv8n model for object
detection and tracking, together with license plate segmentation and character recognition using
a classification model that extracts and recognizes the characters from the license plate. The
project demonstrated the use of deep learning models in a practical application, and it can be
further enhanced for automatic traffic control, electronic toll collection, vehicle tracking and
monitoring, border crossing, security, and many other uses.

8. Limitations and Future enhancement
Limitations:

1. If the object detection fails, the entire process of License plate Detection and Recognition
will fail. Hence, object detection is the key, and the most crucial part of the project.

2. The custom dataset does not have all classes represented equally, hence mAP is only
0.683, which can surely be improved with a balanced dataset.

3. If the object ID is switched due to occlusion, multiple copies of the same vehicle will be
recorded in the database.

4. Provincial number plates have many variations, hence the system currently does not
recognize the province's full name or the transport management office where the vehicle is
registered.

5. If a license plate is very old, with washed-out color or unclear characters, character
recognition will not perform well.

6. Emboss plates can be detected, but characters are not recognized.

7. Any other number plate apart from Nepali Number Plate won’t be recognized.

Enhancement:

Object detection is the first crucial step in the project, so a balanced dataset with all classes
represented equally is one clear way to improve the model. Adding new data with sample
variation that represents the real deployment environment will also help improve the model's
performance. As far as recognition of license plates is concerned, only private and public vehicles
of Bagmati and Koshi provinces are considered; the system can be extended to other vehicle types
and other provinces. Owing to the existing variation in license plates, provincial plates are a real
challenge, and the project can be improved to deal with all variations of license plates, including
embossed plates written in English. To improve character recognition, the detected license plate
image can be enhanced by detecting motion blur and taking the best image from 5-10 frames.
With these enhancements, the project will be able to deal with all variations of number plates
currently used in Nepal.

References
[1] WHO Team, Global status report on road safety 2018, https://www.who.int/publications/i/item/9789241565684, 2018.

[2] Nepal accidental description, https://www.traffic.nepalpolice.gov.np/index.php/news/traffic-activities/425-annually-accidental-descriptions.

[3] Nepal's other pandemic: Road fatalities, https://www.nepalitimes.com/multimedia/nepal-s-other-pandemic-road-fatalities, 2021.

[4] S. Phuyal, Road kill, Kathmandu: Nepali Times, https://archive.nepalitimes.com/article/nation/traffic-accidents-continue-to-increase-worryingly-inNepal,2799.

[5] V. Karthikeyan and V. Vijayalakshmi, "Localization of license plate using morphological operations," arXiv preprint arXiv:1402.5623, 2014.

[6] Vehicle Transport Management Rule, 2054 (1997), 1997.

[7] A. Rosebrock, 4 Point OpenCV getPerspectiveTransform Example, PyImageSearch, 2014.

[8] Y. Zhang and C. Zhang, "A new algorithm for character segmentation of license plate," in IEEE IV2003 Intelligent Vehicles Symposium. Proceedings (Cat. No. 03TH8683), IEEE, 2003, pp. 106-109.

[9] F. Yang, Z. Ma, and M. Xie, "A novel approach for license plate character segmentation," in 2006 1st IEEE Conference on Industrial Electronics and Applications, IEEE, 2006, pp. 1-6.

[10] A. R. Smith, "Color gamut transform pairs," in ACM SIGGRAPH Computer Graphics, vol. 12, no. 3, ACM, 1978, pp. 12-19.

[11] S.-L. Chang, L.-S. Chen, Y.-C. Chung, and S.-W. Chen, "Automatic license plate recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 1, pp. 42-53, 2004.

[12] D. P. Suri, D. E. Walia, and E. A. Verma, "Vehicle number plate detection using Sobel edge detection technique," International Journal of Computer Science and Technology, ISSN 2229-4333, 2010.

Final Output
