License Plate Detection Using Deep Learning Object Detection Models

The document discusses a study on license plate detection using deep learning models, highlighting the limitations of traditional image processing methods. It focuses on the implementation of YOLOv4 and EfficientDet-D7 models, achieving significant improvements in detection accuracy through preprocessing and architectural modifications. The research aims to enhance real-world applications such as traffic management and parking systems in Malaysia, addressing the challenges posed by varying image conditions.


2023 9th International Conference on Computer and Communication Engineering (ICCCE)

LICENSE PLATE DETECTION USING DEEP LEARNING OBJECT DETECTION MODELS
2023 9th International Conference on Computer and Communication Engineering (ICCCE) | 979-8-3503-2521-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICCCE58854.2023.10246050

Kar Wan Leong, Department of Electronic Engineering, Universiti Tunku Abdul Rahman, Kampar, Malaysia
Humaira Nisar, Department of Electronic Engineering, Universiti Tunku Abdul Rahman, Kampar, Malaysia
Vooi Voon Yap, Department of Computer Science, Aberystwyth University, Penglais, UK
Kim Ho Yeap, Department of Electronic Engineering, Universiti Tunku Abdul Rahman, Kampar, Malaysia
Po Kim Lo, Department of Industrial Engineering, Universiti Tunku Abdul Rahman, Kampar, Malaysia

Abstract— Object detection is an extension of image classification tasks in computer vision. Its goal is to locate an object of interest in any given input image. In the past, this was done by traditional hand-crafted feature algorithms, i.e., SIFT, SURF, HOG, BRIEF, and ORB. These algorithms have been successful in their field; however, they have some inherent downsides. For example, they can be slow in detection, less accurate, and difficult to develop. Since 2012, deep learning has emerged as a technology that can solve object detection problems with relatively better performance. However, not many studies have been done to deploy deep learning object detection models in real-world scenarios, e.g., license plate detection. License plate detection is a challenging task in computer vision because the captured input image can vary in size, color, distance, orientation, and lighting conditions. This project aims to study and improve license plate detection using deep learning models. As of the current year, the model YOLOv4 has achieved 43.5% Average Precision (AP) on MS COCO. Meanwhile, EfficientDet-D7 has achieved 55.1 AP on COCO test-dev. In this paper, off-the-shelf object detection models are trained on the CCPD license plate dataset. Two approaches have been carried out to improve their accuracy, i.e., an image preprocessing step and modification of the existing model architecture. The preprocessing steps show improvement for all test sets in terms of (TP-FN-FP) value, i.e., (69-31-45 to 75-25-43) for the db test set, (70-30-28 to 73-27-28) for the blur test set, (52-48-75 to 66-34-68) for the fn test set, (92-8-9 to 97-3-4) for the rotate test set, (77-23-26 to 85-15-18) for the tilt test set, and (67-33-61 to 76-24-56) for the challenge test set. The improved model has achieved 96.96% mean Average Precision (mAP@0.7) on the validation dataset, compared to 83.64% for the original model.

Keywords—object detection, deep learning, license plate detection

This research work is supported by Universiti Tunku Abdul Rahman Research Fund (UTARRF) (Grant No. IPSR/RMC/UTARRF/2022-2/H03) and (IPSR/RMC/UTARRF/2018-C2/Y01), Malaysia.

I. INTRODUCTION

Computer vision is an important field of research for many real-life applications. Specifically, in object detection, traditional image processing algorithms such as Scale-Invariant Feature Transform (SIFT) [1], Speeded Up Robust Features (SURF) [2], Features from Accelerated Segment Test (FAST) [3], Binary Robust Independent Elementary Features (BRIEF) [4], and Oriented FAST and Rotated BRIEF [5] are used to locate and identify an object in a given image frame. Over the past decade, a new method known as deep learning (DL) has overtaken the traditional methods in the field. However, there are many unknown factors when deploying deep learning algorithms in real-life applications, e.g., how well does the DL model perform? This paper examines a popular use case, i.e., the license plate detection task using DL models, which in the past was performed using traditional image processing methods.

License plate recognition is an important task in many real-life applications, e.g., parking management systems and traffic control systems. In Malaysia, the number of cars on the road has been increasing each year. With the increase in automotive volume, especially in the cities, license plate recognition systems can be useful on busy roads or at shopping malls to avoid traffic congestion, manage car parks, etc. In China, the highways are congested every Chinese New Year with traffic returning to hometowns. In Malaysia, toll stations will cause traffic congestion if the number of vehicles keeps increasing in the coming years. One innovative way to avoid the situation seen in China is to develop a license plate recognition system with a high-speed camera. The car does not need to stop completely at the toll station for payment, hence traffic congestion can be avoided. This can also eliminate many sub-systems from the toll station, such as the automated car blocker, digital card scanner, etc., saving a large amount in maintenance fees each year. Another example would be tracking stolen cars on the road. In many countries, it is hard to search for a stolen car due to geographical disadvantages. An unregistered stolen car can be used in crime. With an on-the-road car tracking system, license plate recognition can solve such problems.

In the past decade, Automatic License Plate Recognition (ALPR) has been a popular research topic in computer vision. The algorithm is generally divided into three tasks

Authorized licensed use limited to: Zhejiang University. Downloaded on June 14,2025 at 11:43:04 UTC from IEEE Xplore. Restrictions apply.
i.e., license plate detection, character segmentation, and character recognition. Image processing techniques, e.g., edge detection, color matching, histogram analysis, and others, were used to extract the location of the license plate and characters from the output of video-capturing devices. There are challenging issues regarding license plate detection that need to be resolved. Therefore, much research has been conducted to improve the efficiency, speed, and accuracy of license plate detection tasks. In the past, template matching techniques [6] were used to identify vehicle number plates. This approach identifies the width, height, and contour area of the number plate. A recent paper [7] can detect and recognize Iraqi and Malaysian license plates with 90.23% and 90.60% accuracy using SVM and YOLOv2-ResNet50. Another paper [8] using YOLOv2-darknet19 achieves an accuracy of 99.8% on 410 samples of license plates. This raises the question: is it worth continuing research into vehicle license plate detection? The answer is clear: the datasets used are normally small, so the detection accuracy is good. However, if larger datasets are used, the task becomes more challenging.

II. LITERATURE REVIEW

A. Automatic License Plate Recognition (ALPR)

In the last 20 years of research, Automatic License Plate Recognition (ALPR) algorithms have been divided into two to three tasks, i.e., license plate detection, character segmentation, and character recognition. Image processing techniques, e.g., edge detection, color matching, histogram analysis, and others, were used to extract the locations of objects in an image. In the license plate detection task, [9] used sliding concentric windows (SCWs) to perform segmentation and extract the Region of Interest (RoI). [10] enhanced the luminance and contrast of the image before performing edge detection. [11] used a combination of edge property and color property in the form of Hue, Saturation, and Intensity (HSI) to form a fuzzy map. [12] used the Sobel vertical operator to extract edges, followed by applying the Gaussian mixture model (GMM). [13] used vertical edges, histograms, dilations, erosion, and a median filter. [6] used template matching and color features. [14] used a modified traditional visual attention model that fuses color, intensity, and orientation feature maps.

Nowadays many large datasets are publicly available. With the combination of large datasets, the computing power of newer hardware, and innovative and successful deep learning mechanisms, researchers can create object detection models with high accuracy and speed. Specifically, in the license plate detection task, researchers have been studying the performance of CNN models in detecting license plates. [15] applied image processing before feeding the image into the CNN to classify the contours into LP and non-LP. [7] used YOLOv2 with a ResNet50 backbone to detect the license plate bounding box. [8] used darknet19 and YOLOv2 for detection. The last layer was removed and replaced with a linear classifier. Next, three additional convolutional layers were added. [16] utilized Sliding Window Single Class Detection (SWSCD) to detect both plates and characters because the original YOLO has difficulty detecting bounding boxes of small objects due to its anchor-based detection. This sliding window approach is possible due to the speed of YOLO-tiny. They modified YOLO-tiny even further by decreasing its number of layers to increase its speed.

B. Non-Neural Network Object Detection

The term object recognition refers to a computer vision technique used to identify objects in a digital image or video. In object detection, once an object of interest is detected, a bounding box is drawn around it. This enables the algorithm to locate the object of interest in any given image or video. Scale-Invariant Feature Transform (SIFT) [1] first finds key features in the image, followed by orientation assignment for the key points. Next, the key points are described as high-dimensional vectors, and lastly, key point matching is performed. Speeded-Up Robust Features (SURF) [2] is a feature detector and descriptor. It is a fast and robust algorithm used for local, similarity-invariant representation and comparison of digital images. Important and unique features are first extracted from the image. A descriptor is then generated by fixing a position based on information gathered around the point of interest. A square region is then constructed and aligned to the selected position. The square region is then divided into smaller sub-regions. For each sub-region, a few features are computed at regularly spaced sample points. The feature descriptor is based on the Haar wavelet response around the point of interest. Binary Robust Independent Elementary Features (BRIEF) [4] [17] finds binary strings directly without descriptors. It makes use of a smoothed image patch and identifies a set of location pairs. Then, it compares the pixel intensities at the location pairs identified earlier.

C. Neural Network Object Detection

Many neural-network-based object detection methods have been created over the past few years due to the success of deep learning. They are composed of stacks of convolutional neurons with an input layer and an output layer. EfficientDet [18] by Google has a weighted bi-directional feature pyramid network (BiFPN) that can be scaled freely for different resolutions. It uses EfficientNet as its backbone, which is trained on ImageNet [19]. EfficientDet also replaces Softmax Normalized Fusion with Fast Normalized Fusion. In the paper, it is reported that EfficientDet-D7 achieves 55.1 AP on COCO test-dev. YOLOv4 [20] is a one-stage detector; the model consists of three main components: the backbone, the neck, and the head. The backbone is an image classification network. The neck and the head are responsible for predicting the location offsets of the object in the image. In the fourth version of the YOLO detector, Bag of Freebies (BoF) and Bag of Specials (BoS) were introduced. BoF is a collection of techniques that increase accuracy without increasing inference time because they only affect training, e.g., new data augmentation techniques and IoU-based losses such as Generalized IoU (GIoU) and Complete IoU (CIoU). BoS are plug-in modules and post-processing techniques that increase inference time only by a little, e.g., attention module, feature integration, Mish activation, and soft Non-Maximum Suppression (NMS). The authors also designed the models in such a way that they can be trained on a consumer-grade GPU with limited RAM, unlike other models which need workstation machines with large amounts of RAM or large-scale cloud Tensor Processing Units (TPUs). YOLOv4 reportedly achieved 43.5% AP on the MS COCO dataset. CenterNet [21] is also a one-stage detector with a different approach. It replaces Non-Maximum Suppression (NMS) with a center point representation. The combination of two corner points (top left, bottom right) and the center point forms a triplet which can identify the location of an object more efficiently

compared to anchor-based bounding boxes. Faster R-CNN [22] is one of the most accurate early two-stage detectors. It is a region-based CNN. Candidate regions of the image are proposed by the Region Proposal Network (RPN) in the first stage. Then, each region is passed through the CNN to identify its object. This approach replaced the use of Selective Search (SS) from its predecessor Fast R-CNN. Faster R-CNN reportedly scores 73.2% mAP on PASCAL VOC 2007 and 70.4% mAP on PASCAL VOC 2012. The Single Shot Detector (SSD) [23] is an early one-stage detector. In comparison with Faster R-CNN, SSD removes the RPN and is implemented in a grid cell approach, also known as default boxes. The authors stated that this can help in detecting small objects, which is a problem faced by Faster R-CNN. SSD consists of a backbone and a head. The backbone extracts features from the input, while the head is trained to find the location of the object by generating boxes and scores for the object classes.

III. METHODOLOGY

This section elaborates on the models, the dataset used, data preparation and processing for training purposes, the data augmentation used, the proposed methods to improve the model's accuracy, i.e., image preprocessing and model architecture modification, and the evaluation metric and steps.

A. Models

The models chosen to be trained in the project are state-of-the-art models from recent years which are very efficient and have high accuracy:

• EfficientDet (AutoML)
• YOLOv4 (Darknet)
• CenterNet (TensorFlow)
• SSD (TensorFlow)
• Faster R-CNN (TensorFlow)
• YOLOv5 (Pytorch)

Among the chosen models, four frameworks have been used. They are the Darknet framework in C language for YOLOv4, Automl in Tensorflow for EfficientDet, the Tensorflow Object Detection API for Faster R-CNN, SSD, and CenterNet, and Pytorch for YOLOv5.

B. Dataset

The project uses the Chinese City Parking Dataset (CCPD) [24]. CCPD is a dataset that consists of multiple challenging test images of license plates in China. Chinese license plates are characterized by a blue background and a white foreground that consists of digits, letters, and a Chinese character that represents the province. This dataset consists of train, valid, and test images. To test the models mentioned above, this project uses the test images from CCPD. This dataset contains 250k unique car license plate images. The image data include license plate location annotations in individual text files. This dataset consists of 341,978 total images, of which 100,000 are used for training, 99,996 for validation, and 141,982 for testing. Test images are grouped into ccpd_blur (20,611 images) – blurry images, ccpd_db (10,132 images) – dark bright images, ccpd_fn (20,967 images) – far near images, ccpd_rotate (10,053 images) – rotated images, ccpd_tilt (30,216 images) – tilted images, and ccpd_challenge (50,003 images) – combination/mixed of the above.

C. Hardware and Software

A consumer-grade PC is used in the project. Processor – Ryzen 2600 3.40 GHz. GPU – RTX 2080 Ti 11 GB. RAM – 16 GB. Storage – 500 GB Solid State Drive.

Software – Ubuntu 20.04 LTS, Tensorflow 2.5.0, Pytorch, Python 3.9, OpenCV, Darknet, Yolomark.

D. Data Preparation

Raw training data (images with annotations in the image filename) are split into image files and text files that serve as the labels. Then, they are converted into two separate formats – YOLO format and Tensorflow format. For YOLO format, training images and labels need to be separated into two standalone folders consisting of only images or only labels. A list of file paths is generated using a bash script and saved as train.txt, valid.txt, and test.txt. For Tensorflow format, images and labels need to be converted into the '.tfrecord' file format. The advantages of tfrecords are that they store data efficiently, have fast I/O, and keep the data in single-source files. These implementations allowed Google to take advantage of Tensor Processing Units (TPUs) on the cloud.

E. Data Augmentation

Data augmentations are used in the project to improve accuracy, avoid overfitting, etc. YOLOv4 uses multiple data augmentation techniques in its model training. These techniques are categorized as Bag of Freebies (BoF), which means they do not add detection time during inference. The YOLOv4 data augmentation techniques are as follows:

• Flip – flip training images left or right.
• Rotation – rotate the image 90 or 180 degrees clockwise or anticlockwise.
• Cutmix – cut a random part of an image and paste it into another image.
• Mosaic – combine multiple images into one.
• Mixup – stack images together with transparency.
• Blur – slightly blur the images.
• HSV – randomly and slightly adjust the image's Hue, Saturation, and Value.

It is important to mention that some of the augmentations, e.g., random flipping and rotation, are not appropriate and may reduce model accuracy. EfficientDet uses Scale Jittering, which resizes an image and crops it to a fixed size.

• Small jittering – uses a small ratio of [0.8, 1.2].
• Large jittering – uses a larger ratio of [0.1, 2.0].

Small jittering is good for shorter training time, i.e., 30 epochs; with such short training, the accuracy decreases when using large jittering. For longer training time, i.e., 300 epochs, large jittering performs better [18].
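The scale jittering step can be illustrated with a short sketch. This is not the EfficientDet implementation: the function name `scale_jitter`, the nearest-neighbour resize, the zero padding, and the 600x400 test image are all assumptions made here to keep the example dependency-light (NumPy only). Only the two ratio ranges come from the text.

```python
import numpy as np

# Ratio ranges quoted in the text; the names are illustrative.
SMALL_JITTER = (0.8, 1.2)
LARGE_JITTER = (0.1, 2.0)


def scale_jitter(image, out_size, scale_range, rng):
    """Resize `image` by a random factor, then crop or zero-pad to out_size x out_size.

    Nearest-neighbour resizing keeps the sketch dependency-free; a real
    pipeline would normally use a proper interpolation routine.
    """
    h, w = image.shape[:2]
    scale = rng.uniform(scale_range[0], scale_range[1])
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))

    # Nearest-neighbour resize via index lookup into the source image.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols]

    # Random crop window when the resized image exceeds the target size.
    top = rng.integers(0, max(new_h - out_size, 0) + 1)
    left = rng.integers(0, max(new_w - out_size, 0) + 1)
    crop = resized[top:top + out_size, left:left + out_size]

    # Zero-pad on the bottom/right when the resized image is smaller.
    canvas = np.zeros((out_size, out_size) + image.shape[2:], dtype=image.dtype)
    canvas[:crop.shape[0], :crop.shape[1]] = crop
    return canvas, scale


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(600, 400, 3), dtype=np.uint8)  # hypothetical input
small, s_small = scale_jitter(image, 512, SMALL_JITTER, rng)
large, s_large = scale_jitter(image, 512, LARGE_JITTER, rng)
print(small.shape, large.shape)
```

In a detection pipeline the ground-truth boxes would have to be scaled and shifted by the same factor and crop offset. The sketch returns only the image and the sampled scale.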

F. Image Preprocessing

Image preprocessing techniques are useful in solving many real-life problems, e.g., improving the visibility of a low-resolution image. This project explores the possibility of using simple and low-computation image preprocessing to improve object detection models' performance. Most of the preprocessing techniques used in this project are done through the Python OpenCV and NumPy libraries. Image processing techniques used in this project include enlargement, sharpening, gamma correction, Contrast Limited Adaptive Histogram Equalization (CLAHE), and non-local means denoising.

G. Model Architecture Modification

The model YOLOv4 is selected for improvement due to several reasons, i.e., fast detection speed, robustness to modification in the DarkNet framework, etc. YOLOv4 detects objects at three different scales, i.e., small, medium, and large. As stated in the paper, this helps to detect objects of different sizes. For this project, two identical layers have been added to each scale before the detection layers, making a total of six additional convolution layers added to the whole architecture. The hypothesis is that this helps the network to learn additional information for the new CCPD dataset that the model has not seen before. The weights of the six newly added layers were initialized randomly.

H. Evaluation Metrics

Evaluation of the accuracy of the models follows the standard MS COCO object detection metric (mAP@0.5:0.05:0.95), which stands for the mean Average Precision averaged over Intersection over Union (IoU) thresholds from 0.5 to 0.95 in increments of 0.05. For example, this will calculate (mAP@0.50 + mAP@0.55 + mAP@0.60 + mAP@0.65 + mAP@0.70 + mAP@0.75 + mAP@0.80 + mAP@0.85 + mAP@0.90 + mAP@0.95) / 10. However, since this project only consists of one object class category, i.e., license plate, Average Precision (AP) is equivalent to mean Average Precision (mAP). Hence only AP@0.7 is used as the evaluation metric to allow a one-to-one comparison with the results from CCPD [24], which also uses AP@0.7.

IV. RESULTS AND DISCUSSIONS

The models were trained on the CCPD dataset. The results of applying image preprocessing to increase the overall accuracy of the models are then presented. In the end, a modified model is trained on the same CCPD dataset to show the improvement that can be achieved through model architecture modifications. In addition to the above results, this section also addresses an issue regarding the calculation of 70% IoU, i.e., the 69% IoU problem, where a significant number of object bounding boxes are rejected as false negatives despite being detected by the model, hence lowering the overall accuracy.

A. Results of Models Training

A series of training steps and evaluations have been performed to generate the results shown in Table I.

TABLE I. MODELS' ACCURACY ON TEST SETS

Model                         FPS    DB     BLUR   FN     ROTATE  TILT   CHALLENGE  AVERAGE
EfficientDet-d6               8.2    67.0   79.25  80.5   92.1    85.0   89.1       82.16
EfficientDet-d0               34.53  53.5   70.6   62.9   91.0    86.3   78.3       73.77
Centernet50-512x512           28.84  50.14  64.48  70.20  90.59   74.40  70.87      70.11
Ssd-resnet50-640x640          24.85  45.24  64.06  47.28  92.46   83.79  72.72      67.59
Faster-rcnn-resnet50-640x640  18.73  52.31  68.82  61.74  91.62   76.32  80.85      71.94
Yolov4-csp-640x640            50.9   64.54  67.58  29.46  81.75   65.76  66.41      62.58
Yolov5s                       80.0   50.5   74.0   11.5   75.8    51.6   83.1       57.75
Yolov5x                       40.0   44.4   68.2   10.7   56.3    35.0   79.4       49.00

Table I shows that the best score in terms of average mAP is achieved by the model EfficientDet-d6. However, it is a larger network than the others in terms of parameter counts and FLOPs (d0 - 3.9M, 2.54B) vs. (d6 - 51.9M, 325B), therefore this is an unfair comparison.

It can be observed from Table I that EfficientDet-d6 (82.16) scores the highest, followed by Efficientdet-d0 (73.77). However, the speed drops sharply from 34.53 FPS for d0 to only 8.2 FPS for d6, which makes d6 poorly suited to real-time applications. Comparing both models, Efficientdet-d6 outperforms Efficientdet-d0 on the DB (67.0 vs. 53.5), Blur (79.25 vs. 70.6), FN (80.5 vs. 62.9), Rotate (92.1 vs. 91.0), and Challenge (89.1 vs. 78.3) test sets, but not on the Tilt test set (85.0 vs. 86.3).

If Efficientdet-d6 is not included, then the highest scores are Efficientdet-d0 (73.77) followed by Faster R-CNN (71.94). Comparing the processing speed of these two models, Efficientdet-d0 is faster (34.53 FPS) than Faster R-CNN (18.73 FPS). Comparing their accuracy, Efficientdet-d0 performs better on the DB (53.5 vs. 52.31), Blur (70.6 vs. 68.82), FN (62.9 vs. 61.74), and Tilt (86.3 vs. 76.32) test sets, but not on the Rotate (91.0 vs. 91.62) and Challenge (78.3 vs. 80.85) test sets.

The YOLOv4-CSP model performed very badly on the FN dataset (29.46) compared to the best model in the category, Centernet-Resnet50 (70.20). This shows that Centernet-Resnet50 is very good at detecting very small objects in the image, whilst YOLOv4-CSP has difficulty detecting small objects. The other model that also performed poorly on this test set is SSD-Resnet50 (47.28). More analysis and improvement will be carried out on the FN dataset and the result will be discussed later.

Another highlight worth mentioning is that on the dark_bright dataset, YOLOv4-CSP performs better (64.54) than most other models (45.24 ~ 52.31). This suggests that YOLOv4-CSP is less sensitive to the brightness of the image; instead, other features, e.g., shape, allow it to detect an object more reliably.
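The 70% IoU criterion and the "69% IoU problem" mentioned at the start of this section can be made concrete with a small worked example. The sketch below is illustrative only: the box coordinates are hypothetical, boxes are assumed to be axis-aligned `(x_min, y_min, x_max, y_max)` tuples, and `iou` and `is_hit` are names chosen here rather than functions from any of the frameworks used in the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred, truth, threshold=0.7):
    """A detection counts as a true positive when IoU >= threshold."""
    return iou(pred, truth) >= threshold


# Hypothetical ground-truth plate (200 x 60 px) and two hypothetical detections.
truth = (100, 100, 300, 160)
tight = (105, 102, 298, 158)   # slightly inside the plate
loose = (100, 100, 300, 187)   # covers the whole plate plus 27 px below it

print(round(iou(tight, truth), 4))  # 0.9007
print(round(iou(loose, truth), 4))  # 0.6897
print(is_hit(tight, truth), is_hit(loose, truth))  # True False

# COCO-style thresholds 0.50, 0.55, ..., 0.95 (ten values).
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
passed = sum(iou(loose, truth) >= t for t in thresholds)
print(passed)  # 4 (passes 0.50 to 0.65, fails from 0.70 upward)
```

The `loose` box contains the entire plate, yet its IoU of about 0.69 falls just under the 0.7 threshold. Such a detection is counted as a false positive, and the plate itself is counted as missed.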

Regarding the lower scores achieved by YOLOv5, this model is still under development, and no published paper regarding the algorithm is available. However, it is worth testing its performance.

In terms of speed, YOLOv4-CSP achieved the highest score (50.9) compared to the second highest, Efficientdet-d0 (34.53). YOLOv4-CSP is therefore selected for further work on the speed and accuracy of license plate detection using deep learning models, through image processing, training, and modifying the existing model architecture.

B. Preprocessing

The first intuitive way to improve the overall result is to perform simple image preprocessing on the test dataset. The test set is very challenging for any model to perform object detection on, as it is composed of images taken in difficult conditions. A hundred test images from each test set are randomly selected for preprocessing, i.e., enlargement, sharpening, gamma correction, CLAHE, and non-local means denoising. The TP-FN ratio denotes the number of true positives (detected bounding boxes with at least 70% IoU with the ground truth boxes, a.k.a. hits) vs. the number of false negatives (ground truth boxes for which the model draws no bounding box, a.k.a. misses). On the other hand, a false positive (FP) occurs if the bounding box has less than 70% IoU with the ground truth box.

Tables II and III show the results before and after preprocessing. An increase of the TP count in each category can be observed in Tables II and III. Besides, preprocessing also greatly reduces the number of FP, which contributes a lot to the calculation of mAP. False positives occur very frequently in the above experiment due to the 0.69 IoU problem, where the detected bounding box has 69% IoU with the ground truth box and hence is rejected as a false positive instead.

TABLE II. RESULT BEFORE PREPROCESSING

Test Set      DB     BLUR   FN     ROTATE  TILT   CHALLENGE
mAP@0.7       64.20  63.29  43.90  88.90   66.64  62.08
TP-FN Ratio   69-31  70-30  52-48  92-8    77-23  67-33
FP            45     28     75     9       26     61

a. TP (True Positive), FN (False Negative)

TABLE III. RESULT AFTER PREPROCESSING

Test Set      DB     BLUR   FN     ROTATE  TILT   CHALLENGE
mAP@0.7       67.09  68.26  57.39  96.40   77.94  71.11
TP-FN Ratio   75-25  73-27  66-34  97-3    85-15  76-24
FP            43     28     68     4       18     56

b. TP (True Positive), FN (False Negative)

C. Modifying Model Architecture

This paper also tested the possibility of improving the model's accuracy by modifying its fundamental architecture. The model YOLOv4-CSP has been chosen because the Darknet framework is open-sourced, and this model has high speed and a high average score in the previous result. Table IV shows the modified model architecture.

YOLOv4 detects objects at three different scales, i.e., small, medium, and large. The modified model has two additional layers at each of the scales, which are initialized with random values. This resulted in longer training time compared to the pre-modified version. Table V shows the accuracy of the modified YOLOv4-CSP on the test sets and validation set.

TABLE IV. MODIFIED YOLOV4-CSP ARCHITECTURE

Layers     Filters  Size/stride  Input        Output       Bflops
0 conv     32       3x3/1        640x640x3    640x640x32   0.708 BF
1 conv     64       3x3/2        640x640x32   320x320x64   3.775 BF
…          …        …            …            …            …
141 conv   128      1x1/1        80x80x256    80x80x128    0.419 BF
142 conv   256      3x3/1        80x80x128    80x80x256    3.775 BF
143 conv   128      1x1/1        80x80x256    80x80x128    0.419 BF
144 conv   256      3x3/1        80x80x128    80x80x256    3.775 BF
145 conv   18       1x1/1        80x80x256    80x80x18     0.059 BF
146 yolo   …        …            …            …            …
158 conv   256      1x1/1        40x40x512    40x40x256    0.419 BF
159 conv   512      3x3/1        40x40x256    40x40x512    3.775 BF
160 conv   256      1x1/1        40x40x512    40x40x256    0.419 BF
161 conv   512      3x3/1        40x40x256    40x40x512    3.775 BF
162 conv   18       1x1/1        40x40x512    40x40x18     0.029 BF
163 yolo   …        …            …            …            …
175 conv   512      1x1/1        20x20x1024   20x20x512    0.419 BF
176 conv   1024     3x3/1        20x20x512    20x20x1024   3.775 BF
177 conv   512      1x1/1        20x20x1024   20x20x512    0.419 BF
178 conv   1024     3x3/1        20x20x512    20x20x1024   3.775 BF
179 conv   18       1x1/1        20x20x1024   20x20x18     0.015 BF
180 yolo   …        …            …            …            …
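The Bflops column of Table IV can be reproduced with simple arithmetic. The sketch below assumes that one multiply-add is counted as two operations, so a convolution costs 2 x output height x output width x kernel² x input channels x filters. The helper name `conv_bflops` is illustrative and is not part of Darknet.

```python
def conv_bflops(kernel, in_channels, filters, out_h, out_w):
    """Billions of FLOPs for one convolution, counting a multiply-add as 2 operations."""
    return 2.0 * out_h * out_w * kernel * kernel * in_channels * filters / 1e9


# (layer, kernel size, input channels, filters, output height, output width),
# taken from rows of Table IV.
rows = [
    ("141 conv", 1, 256, 128, 80, 80),
    ("142 conv", 3, 128, 256, 80, 80),
    ("145 conv", 1, 256, 18, 80, 80),
    ("162 conv", 1, 512, 18, 40, 40),
    ("179 conv", 1, 1024, 18, 20, 20),
]

for name, k, c_in, f, h, w in rows:
    print(f"{name}: {conv_bflops(k, c_in, f, h, w):.3f} BF")
# 141 conv: 0.419 BF
# 142 conv: 3.775 BF
# 145 conv: 0.059 BF
# 162 conv: 0.029 BF
# 179 conv: 0.015 BF
```

Under the same assumption, each 1x1 plus 3x3 pair shown at a detection scale costs roughly 0.419 + 3.775 ≈ 4.19 BF. This gives a rough sense of the extra computation introduced by adding such a pair at every scale.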

TABLE V. RESULT OF MODIFIED YOLOV4 Conference on Computer Vision, IEEE, Nov. 2011, pp. 2564–2571.
doi: 10.1109/ICCV.2011.6126544.
Test Set [6] A. H. Ashtari, Md. J. Nordin, and Seyed Mostafa Mousavi Kahaki,
“A new reliable approach for Persian license plate detection on colour
CHALL images,” in Proceedings of the 2011 International Conference on
DB BLUR FN ROTATE TILT VALID
ENGE Electrical Engineering and Informatics, IEEE, Jul. 2011, pp. 1–5. doi:
No. of 10.1109/ICEEI.2011.6021697.
20,611 10,132 20,967 10,053 30,216 50,003 99,996
Images [7] D. Habeeb et al., “Deep-Learning-Based Approach for Iraqi and
YOLOv4- Malaysian Vehicle License Plate Recognition,” Comput Intell
67.58 64.54 29.46 81.75 65,76 66.41 83.64 Neurosci, vol. 2021, 2021, doi: 10.1155/2021/3971834.
CSP
Modified [8] H. Jørgensen, “Automatic License Plate Recognition using Deep
YOLOv4- 74.65 51.48 49.78 67.57 45.12 83.62 96.96 Learning Techniques,” 2017. doi:
CSP http://hdl.handle.net/11250/2467209.
Increment +7.07 -13.06 +20.32 -14.18 -20.64 +17.21 +13.32 [9] C. N. E. Anagnostopoulos, I. E. Anagnostopoulos, V. Loumos, and E.

The modified model improves markedly on the FN (+20.32), Challenge (+17.21), and Valid (+13.32) test sets, but loses accuracy on the Rotate (-14.18) and Tilt (-20.64) test sets; accuracy on the Blur test set also drops (-13.06). The FN test set consists of images taken from very far away or very close up, so the ground truth box is much smaller or much larger than a typical license plate bounding box. This makes it harder for the model to fit a tight bounding box around the plate.
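
One way to see why box size matters is to compute the IoU for the same localization error at two scales. The sketch below is only an illustration of the small-box case: the plate sizes and the 4-pixel offset are hypothetical values, not measurements from the paper or from CCPD.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def shifted(box, dx, dy):
    """Return the box translated by (dx, dy) pixels."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


# Hypothetical ground truth boxes in pixels: a distant plate and a close-up plate.
far_plate = (100, 100, 140, 112)    # 40 x 12
near_plate = (100, 100, 420, 196)   # 320 x 96

# The same hypothetical 4-pixel prediction error applied to both.
iou_far = iou(far_plate, shifted(far_plate, 4, 4))
iou_near = iou(near_plate, shifted(near_plate, 4, 4))

print(f"far plate IoU:  {iou_far:.3f}")
print(f"near plate IoU: {iou_near:.3f}")
```

Under these assumptions the small box falls below the 0.5 IoU threshold while the large box stays well above it, although the pixel error is identical.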

The Rotate and Tilt test sets contain slanted license plates. Because the object detector can only draw an axis-aligned rectangular bounding box, the box around a slanted plate necessarily includes some background. The modified model has learned to draw a tighter bounding box around the object, so its IoU with the ground truth differs from that of the pre-modified YOLOv4, which accounts for the degradation in accuracy on these two test sets.
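
The geometric effect can be sketched as follows. The code assumes that the ground truth is the axis-aligned box enclosing the rotated plate and that a hypothetical "tighter" prediction keeps the plate's own un-rotated size around the same centre. The plate size and angles are illustrative, and this is not the authors' evaluation code.

```python
import math


def enclosing_box_size(width, height, angle_deg):
    """Width and height of the axis-aligned box enclosing a rotated rectangle."""
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    return width * c + height * s, width * s + height * c


def centered_iou(size_a, size_b):
    """IoU of two axis-aligned boxes that share the same centre, given (w, h)."""
    (wa, ha), (wb, hb) = size_a, size_b
    inter = min(wa, wb) * min(ha, hb)
    union = wa * ha + wb * hb - inter
    return inter / union


PLATE_SIZE = (140.0, 44.0)      # hypothetical plate width and height in pixels
ANGLES_DEG = (0, 10, 20, 30)    # hypothetical in-plane rotation angles

iou_by_angle = {}
for angle in ANGLES_DEG:
    # Assumed ground truth: axis-aligned box around the rotated plate.
    ground_truth = enclosing_box_size(*PLATE_SIZE, angle)
    # Assumed prediction: a tighter box with the plate's own dimensions.
    iou_by_angle[angle] = centered_iou(PLATE_SIZE, ground_truth)
    print(f"{angle:>2} deg  ground truth {ground_truth[0]:.1f} x "
          f"{ground_truth[1]:.1f}  IoU {iou_by_angle[angle]:.3f}")
```

Under these assumptions the IoU of the tighter box falls as the plate rotates, so a prediction that hugs the plate more closely can be scored lower against an enclosing ground truth box.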

Overall, the modification is a clear improvement: accuracy on the Valid test set, which consists of normal license plate images without difficult conditions, increases by 13.32 points, from 83.64 to 96.96.

V. CONCLUSION
Deep learning object detection models can achieve high accuracy in real-life tasks such as license plate detection. This paper compared the capabilities of different models in terms of speed and accuracy. It also showed that image preprocessing and modification of an existing model can improve overall accuracy. On the validation set of the Chinese City Parking Dataset (CCPD), the improved YOLOv4 model achieved a mean Average Precision (mAP@0.5) of 96.96%, compared with 83.64% for the original model.
Conference on Computer Vision and Pattern Recognition, IEEE, Jun.
2009, pp. 248–255. doi: 10.1109/CVPR.2009.5206848.
REFERENCES [20] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4:
[1] D. G. Lowe, “Distinctive Image Features from Scale-Invariant Optimal Speed and Accuracy of Object Detection,” Apr. 2020.
Keypoints,” Int J Comput Vis, vol. 60, no. 2, pp. 91–110, Nov. 2004, [21] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, “CenterNet:
doi: 10.1023/B:VISI.0000029664.99615.94. Keypoint Triplets for Object Detection,” 2019. [Online]. Available:
[2] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-Up Robust https://github.com/
Features (SURF),” Computer Vision and Image Understanding, vol. [22] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards
110, no. 3, pp. 346–359, Jun. 2008, doi: 10.1016/j.cviu.2007.09.014. Real-Time Object Detection with Region Proposal Networks,” 2015.
[3] Viswanathan DG, “Features from Accelerated Segment Test [Online]. Available: https://github.com/
(FAST),” 2011. [23] W. Liu et al., “SSD: Single Shot MultiBox Detector,” 2016, pp. 21–
[4] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: Binary 37. doi: 10.1007/978-3-319-46448-0_2.
Robust Independent Elementary Features,” 2010, pp. 778–792. doi: [24] Z. Xu, A. Meng, N. Lu, H. Huang, C. Ying, and L. Huang, “Towards
10.1007/978-3-642-15561-1_56. End-to-End License Plate Detection and Recognition: A Large
[5] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An Dataset and Baseline,” 2018. [Online]. Available:
efficient alternative to SIFT or SURF,” in 2011 International https://github.com/detectRecog/CCPD.
