License Plate Detection Using Deep Learning Object Detection Models
Abstract— Object detection is an extension of image classification in computer vision. Its goal is to locate an object of interest in a given input image. In the past, this was done with traditional hand-crafted feature algorithms, e.g., SIFT, SURF, HOG, BRIEF, and ORB. These algorithms have been successful in their field; however, they have some inherent downsides: they can be slow at detection, less accurate, and difficult to develop. Since 2012, deep learning has emerged as a technology that can solve object detection problems with relatively better performance. However, few studies have deployed deep learning object detection models in real-world scenarios such as license plate detection. License plate detection is a challenging task in computer vision because the captured input image can vary in size, color, distance, orientation, and lighting conditions. This project aims to study and improve license plate detection using deep learning models. As of the current year, YOLOv4 has achieved 43.5% Average Precision (AP) on MS COCO, while EfficientDet-D7 has achieved 55.1 AP on COCO test-dev. In this paper, off-the-shelf object detection models are trained on the CCPD license plate dataset. Two approaches are used to improve their accuracy: an image preprocessing step and modification of the existing model architecture. Preprocessing improves the (TP-FN-FP) values on all test sets: (69-31-45 to 75-25-43) for the db test set, (70-30-28 to 73-27-28) for the blur test set, (52-48-75 to 66-34-68) for the fn test set, (92-8-9 to 97-3-4) for the rotate test set, (77-23-26 to 85-15-18) for the tilt test set, and (67-33-61 to 76-24-56) for the challenge test set. The improved model achieves 96.96% mean Average Precision (mAP) on the validation dataset, compared to 83.64% for the original model.

Keywords—object detection, deep learning, license plate detection

I. INTRODUCTION
Computer vision is an important field of research for many real-life applications. In object detection specifically, traditional image processing algorithms such as Scale-Invariant Feature Transform (SIFT) [1], Speeded Up Robust Features (SURF) [2], Features from Accelerated Segment Test (FAST) [3], Binary Robust Independent Elementary Features (BRIEF) [4], and Oriented FAST and Rotated BRIEF (ORB) [5] are used to locate and identify an object in a given image frame. Over the past decade, deep learning (DL) has overtaken the traditional methods in the field. However, there are many unknown factors when deploying deep learning algorithms in real-life applications, e.g., how well does the DL model perform? This paper examines a popular use case, license plate detection, using DL models for a task that in the past was performed with traditional image processing methods.
License plate recognition is an important task in many real-life applications, e.g., parking management systems and traffic control systems. In Malaysia, the number of cars on the road has been increasing each year. With the increase in automotive volume, especially in the cities, license plate recognition systems can be useful on busy roads or at shopping malls for avoiding traffic congestion, managing car parks, etc. In China, highways are congested every Chinese New Year as people travel back to their hometowns. In Malaysia, toll stations will cause traffic congestion if the number of vehicles keeps increasing in the coming years. One innovative way to avoid the situation seen in China is to develop a license plate recognition system with a high-speed camera. The car does not need to stop completely at the toll station for payment, hence traffic congestion can be avoided. This can also eliminate many sub-systems from the toll station, such as the automated car blocker and the digital card scanner, saving a large amount in maintenance fees each year. Another example is tracking stolen cars on the road. In many countries, it is hard to search for a stolen car due to geographical constraints. An unregistered stolen car can be used in crime. With an on-the-road car tracking system, license plate recognition can help solve such problems.
In the past decade, Automatic License Plate Recognition (ALPR) has been a popular research topic in computer vision. The algorithm is generally divided into three tasks

This research work is supported by Universiti Tunku Abdul Rahman Research Fund (UTARRF) (Grant No. IPSR/RMC/UTARRF/2022-2/H03) and (IPSR/RMC/UTARRF/2018-C2/Y01), Malaysia.
Authorized licensed use limited to: Zhejiang University. Downloaded on June 14,2025 at 11:43:04 UTC from IEEE Xplore. Restrictions apply.
compared to anchor-based bounding boxes. Faster R-CNN [22] is one of the most accurate early two-stage detectors. It is a region-based CNN. In the first stage, a Region Proposal Network (RPN) proposes candidate regions of the image. Then, each proposed region is passed through the CNN to identify the object. This approach replaced the Selective Search (SS) used by its predecessor, Fast R-CNN. Faster R-CNN reportedly scores 73.2% mAP on PASCAL VOC 2007 and 70.4% mAP on PASCAL VOC 2012. The Single Shot Detector (SSD) [23] is an early one-stage detector. In comparison with Faster R-CNN, SSD removes the RPN and is implemented with a grid cell approach, also known as default boxes. The authors stated that this helps in detecting small objects, which is a problem faced by Faster R-CNN. SSD consists of a backbone and a head. The backbone extracts features from the input, while the head is trained to locate the object by generating boxes and scores for the object classes.

III. METHODOLOGY
This section elaborates on the models, the dataset used, data preparation and processing for training, the data augmentation used, the proposed methods to improve the model's accuracy, i.e., image preprocessing and model architecture modification, and the evaluation metric and steps.

A. Models
The models chosen for training in this project are recent state-of-the-art models that are efficient and have high accuracy:
• EfficientDet (AutoML)
• YOLOv4 (Darknet)
• CenterNet (TensorFlow)
• SSD (TensorFlow)
• Faster R-CNN (TensorFlow)
• YOLOv5 (PyTorch)
Four frameworks are used across the chosen models: the Darknet framework in C for YOLOv4, AutoML in TensorFlow for EfficientDet, the TensorFlow Object Detection API for Faster R-CNN, SSD, and CenterNet, and PyTorch for YOLOv5.

B. Dataset
The project uses the Chinese City Parking Dataset (CCPD) [24]. CCPD consists of multiple sets of challenging test images of license plates in China. Chinese license plates have a blue background and a white foreground consisting of digits, letters, and a Chinese character that represents the province. The dataset consists of train, valid, and test images, and the models mentioned above are tested on its test images. This dataset contains 250k unique car license plate images. The image data include license plate location annotations in individual text files. This dataset consists of 341,978 total images, where 100,000 are used for training, 99,996 for validation, and 141,982 for testing. Test images are grouped into ccpd_blur (20,611 images) – blurry images, ccpd_db (10,132 images) – dark or bright images, ccpd_fn (20,967 images) – far or near images, ccpd_rotate (10,053 images) – rotated images, ccpd_tilt (30,216 images) – tilted images, and ccpd_challenge (50,003 images) – a mix of the above.

C. Hardware and Software
A consumer-grade PC is used in the project.
Processor – Ryzen 2600 3.40 GHz. GPU – RTX 2080 Ti 11 GB. RAM – 16 GB. Storage – 500 GB Solid State Drive.
Software – Ubuntu 20.04 LTS, TensorFlow 2.5.0, PyTorch, Python 3.9, OpenCV, Darknet, Yolomark.

D. Data Preparation
Raw training data (images with annotations in the image filename) are split into image files and text files that serve as labels. They are then converted into two separate formats: YOLO format and TensorFlow format. For YOLO format, training images and labels need to be separated into two standalone folders containing only images or only labels. A list of file paths is generated using a bash script and saved as train.txt, valid.txt, and test.txt. For TensorFlow format, images and labels need to be converted into the '.tfrecord' file format. The advantages of tfrecords are that they store data efficiently, have fast I/O, and keep the data in single-source files. These implementations allowed Google to take advantage of Tensor Processing Units (TPUs) on the cloud.

E. Data Augmentation
Data augmentation is used in the project to improve accuracy, avoid overfitting, etc. YOLOv4 uses multiple data augmentation techniques in its model training. These techniques are categorized as Bag of Freebies (BoF), which means they do not add detection time during inference. The YOLOv4 data augmentation techniques are as follows:
• Flip – flip training images left or right.
• Rotation – rotate the image 90 or 180 degrees clockwise or anticlockwise.
• Cutmix – cut a random part of an image and paste it into another image.
• Mosaic – combine multiple images into one.
• Mixup – stack images together with transparency.
• Blur – slightly blur the images.
• HSV – randomly and slightly adjust the image's Hue, Saturation, and Value.
It is important to mention that some of the augmentations, e.g., random flipping and rotation, are not appropriate for license plate images and may reduce model accuracy. EfficientDet uses Scale Jittering, which resizes an image and crops it to a fixed size.
• Small jittering – uses a small ratio of [0.8, 1.2].
• Large jittering – uses a larger ratio of [0.1, 2.0].
Small jittering is good for shorter training schedules, e.g., 30 epochs, where accuracy decreases if large jittering is used instead. For longer training schedules, e.g., 300 epochs, large jittering performs better [18].
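The scale jittering described above can be sketched as follows. This is a minimal illustration, not EfficientDet's actual implementation: the function name, the 512-pixel output size, and the choice to scale the longer image side are assumptions made for the example. It only computes the resize and crop geometry, leaving the pixel operations to an image library.

```python
import random

def scale_jitter_params(img_w, img_h, out_size, scale_range, rng):
    """Return (new_w, new_h, crop_x, crop_y, crop_w, crop_h) for one jittered sample.

    Hypothetical helper: names and conventions are illustrative only.
    """
    lo, hi = scale_range
    # Draw a random scale factor and apply it to the target output size.
    scale = rng.uniform(lo, hi)
    target = out_size * scale
    # Resize so that the longer image side matches the jittered target,
    # which keeps the aspect ratio of the original image.
    ratio = target / max(img_w, img_h)
    new_w = max(1, round(img_w * ratio))
    new_h = max(1, round(img_h * ratio))
    # Crop a random window of at most out_size x out_size; a resized image
    # smaller than out_size is kept whole and would be padded afterwards.
    crop_w = min(new_w, out_size)
    crop_h = min(new_h, out_size)
    crop_x = rng.randint(0, new_w - crop_w)
    crop_y = rng.randint(0, new_h - crop_h)
    return new_w, new_h, crop_x, crop_y, crop_w, crop_h

# Small jittering uses (0.8, 1.2); large jittering uses (0.1, 2.0).
small = scale_jitter_params(720, 1160, 512, (0.8, 1.2), random.Random(0))
large = scale_jitter_params(720, 1160, 512, (0.1, 2.0), random.Random(0))
```

With the large range, the resized image can be far smaller or far larger than the output size, so the crop sees much more variation in apparent plate size than with the small range.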
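The filename-to-label conversion mentioned under Data Preparation can be sketched as follows. The sketch rests on assumptions that are not stated in this paper: that CCPD filenames are hyphen-separated, that the third field holds the bounding box as "x1&y1_x2&y2", and that images are 720 x 1160 pixels. The function name and the single class id 0 are likewise illustrative. The output is the usual YOLO label line: class, then box center and size normalized by image width and height.

```python
import os

def ccpd_name_to_yolo(filename, img_w=720, img_h=1160, class_id=0):
    """Convert a CCPD-style filename into one YOLO label line (hypothetical helper)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    fields = stem.split("-")
    # Assumption: field 2 is "x1&y1_x2&y2" (top-left and bottom-right corners).
    top_left, bottom_right = fields[2].split("_")
    x1, y1 = (int(v) for v in top_left.split("&"))
    x2, y2 = (int(v) for v in bottom_right.split("&"))
    # YOLO format: normalized center x, center y, width, height.
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

name = "025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg"
print(ccpd_name_to_yolo(name))  # prints "0 0.375000 0.368966 0.322222 0.077586"
```

Each label line would be written to a text file with the same stem as its image, which matches the separate image and label folders described in Section D.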
F. Image Preprocessing

TABLE I. MODELS' ACCURACY ON TEST SETS
Regarding the lower scores achieved by YOLOv5, this model is still under development, and no published paper on the algorithm is available. However, it is worth testing its performance.
In terms of speed, YOLOv4-CSP achieved the highest score (50.9), ahead of the second-highest, EfficientDet-D0 (34.53). YOLOv4-CSP is therefore selected for further improvement of the speed and accuracy of license plate detection through image preprocessing, training, and modification of the existing model architecture.

B. Preprocessing
The first intuitive way to improve the overall result is to perform simple image preprocessing on the test dataset. The test set is very challenging for any object detection model because it is composed of images taken under difficult conditions. A hundred test images from each test set are randomly selected for preprocessing, i.e., enlargement, sharpening, gamma correction, CLAHE, and non-local means denoising. The TP-FN ratio denotes the number of true positives (detected bounding boxes with at least 70% IoU with the ground truth boxes, a.k.a. hits) vs. the number of false negatives (ground truth boxes on which the model draws no bounding box, a.k.a. misses). A false positive (FP), on the other hand, occurs when a bounding box has less than 70% IoU with the ground truth box.
Tables II and III show the results before and after preprocessing. The TP count increases in every category. Preprocessing also reduces the number of FP, which contributes heavily to the calculation of mAP, in every category except Blur, where it is unchanged. False positives occur very frequently in the above experiment due to the 0.69 IoU problem, where the detected bounding box has 69% IoU with the ground truth box and hence is rejected as a false positive.

TABLE II. RESULT BEFORE PREPROCESSING

              Test Set
              DB      BLUR    FN      ROTATE  TILT    CHALLENGE
TP-FN Ratio   69-31   70-30   52-48   92-8    77-23   67-33
FP            45      28      75      9       26      61
a. TP (True Positive), FN (False Negative)

TABLE III. RESULT AFTER PREPROCESSING

              Test Set
              DB      BLUR    FN      ROTATE  TILT    CHALLENGE
mAP           67.09   68.26   57.39   96.40   77.94   71.11
TP-FN Ratio   75-25   73-27   66-34   97-3    85-15   76-24
FP            43      28      68      4       18      56
b. TP (True Positive), FN (False Negative)

C. Modifying Model Architecture
This paper also tested the possibility of improving the model's accuracy by modifying its fundamental architecture. YOLOv4-CSP was chosen because the Darknet framework is open source and because this model combined high speed with a high average score in the previous results. Table IV shows the modified model architecture.
YOLOv4 detects objects at three different scales, i.e., small, medium, and large. The modified model has two additional layers at each of the scales, initialized with random values. This resulted in a longer training time compared to the pre-modified version. Table V shows the accuracy of the modified YOLOv4-CSP on the test sets and the validation set.

TABLE IV. MODIFIED YOLOV4-CSP ARCHITECTURE

Layers     Filters  Size/stride  Input        Output       Bflops
0 conv     32       3x3/1        640x640x3    640x640x32   0.708BF
1 conv     64       3x3/2        640x640x32   320x320x64   3.775BF
…          …        …            …            …            …
141 conv   128      1x1/1        80x80x256    80x80x128    0.419BF
142 conv   256      3x3/1        80x80x128    80x80x256    3.775BF
143 conv   128      1x1/1        80x80x256    80x80x128    0.419BF
144 conv   256      3x3/1        80x80x128    80x80x256    3.775BF
145 conv   18       1x1/1        80x80x256    80x80x18     0.059BF
146 yolo   …        …            …            …            …
162 conv   18       1x1/1        40x40x512    40x40x18     0.029BF
163 yolo   …        …            …            …            …
175 conv   512      1x1/1        20x20x1024   20x20x512    0.419BF
176 conv   1024     3x3/1        20x20x512    20x20x1024   3.775BF
177 conv   512      1x1/1        20x20x1024   20x20x512    0.419BF
178 conv   1024     3x3/1        20x20x512    20x20x1024   3.775BF
179 conv   18       1x1/1        20x20x1024   20x20x18     0.015BF
180 yolo   …        …            …            …            …
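The Bflops column of Table IV can be reproduced from the other columns. The sketch below assumes the convention of counting one multiply-add as two operations, which is how Darknet is understood to report the figure. The function name and argument layout are illustrative, not part of Darknet.

```python
def conv_bflops(filters, kernel, in_channels, out_w, out_h, groups=1):
    """Billions of floating-point operations for one convolutional layer.

    Hypothetical helper; assumes a multiply-add counts as two operations.
    """
    # Each output element needs kernel * kernel * in_channels / groups
    # multiply-adds, and there are filters * out_w * out_h output elements.
    ops = 2.0 * filters * kernel * kernel * (in_channels / groups) * out_w * out_h
    return ops / 1e9

# Layer 0: 32 filters, 3x3, input 640x640x3, output 640x640x32
print(round(conv_bflops(32, 3, 3, 640, 640), 3))    # prints 0.708
# Layer 141: 128 filters, 1x1, input 80x80x256, output 80x80x128
print(round(conv_bflops(128, 1, 256, 80, 80), 3))   # prints 0.419
# Layer 145: 18 filters, 1x1, input 80x80x256, output 80x80x18
print(round(conv_bflops(18, 1, 256, 80, 80), 3))    # prints 0.059
```

Under this formula, each added 3x3 layer with 256 filters at the 80x80 scale costs about 3.775 BFLOPs, the same as a 3x3 layer with 1024 filters at the 20x20 scale, which gives a rough sense of the extra compute introduced by the additional layers.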
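The TP, FN, and FP counting used in the preprocessing experiment (Section IV-B) can be sketched as follows. This is one plausible reading of the definitions given there, not the authors' evaluation script: boxes are assumed to be (x1, y1, x2, y2) tuples, matching is greedy, and the function names are illustrative. It also reproduces the 0.69 IoU problem, where a near miss counts as both a false negative and a false positive.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def count_tp_fn_fp(gt_boxes, det_boxes, thr=0.7):
    """Count hits, misses, and rejected detections at an IoU threshold."""
    unmatched = list(det_boxes)
    tp = fn = 0
    for gt in gt_boxes:
        # Greedy matching: take the remaining detection with the highest IoU.
        best = max(unmatched, key=lambda d: iou(gt, d), default=None)
        if best is not None and iou(gt, best) >= thr:
            tp += 1
            unmatched.remove(best)
        else:
            fn += 1
    # Every detection that did not reach the threshold on any ground truth
    # box is counted as a false positive.
    fp = len(unmatched)
    return tp, fn, fp

gt = [(0, 0, 100, 100)]
print(count_tp_fn_fp(gt, [(0, 0, 100, 100)]))  # prints (1, 0, 0)
print(count_tp_fn_fp(gt, [(0, 0, 100, 69)]))   # prints (0, 1, 1)
```

In the second call the detection overlaps 69% of the plate, so the plate is counted as missed and the box as a false positive. This is why TP + FN always equals the number of ground truth boxes in Tables II and III while FP is reported separately.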
TABLE V. RESULT OF MODIFIED YOLOV4

                      Test Set
                      DB      BLUR    FN      ROTATE  TILT    CHALLENGE  VALID
No. of Images         20,611  10,132  20,967  10,053  30,216  50,003     99,996
YOLOv4-CSP            67.58   64.54   29.46   81.75   65.76   66.41      83.64
Modified YOLOv4-CSP   74.65   51.48   49.78   67.57   45.12   83.62      96.96
Increment             +7.07   -13.06  +20.32  -14.18  -20.64  +17.21     +13.32

A significant improvement can be observed on the FN test set, the Challenge test set, and the Validation set. However, accuracy is reduced on the Blur, Rotate, and Tilt test sets. The FN test set consists of images of objects at far and very near distances, which makes the ground truth box smaller or bigger than a regular object bounding box. This makes it more difficult for the model to fit a tight bounding box to the object.
The Rotate and Tilt test sets contain slanted license plates. However, due to a limitation of object detection, only an axis-aligned rectangular bounding box can be drawn. In this case, the modified model has learned to draw a tighter bounding box on the object. This leads to a difference in IoU between the modified and pre-modified YOLOv4, hence the degradation in accuracy.
Overall, this is a good improvement for the model because it yields a +13.32 increase in accuracy on the Valid set, which consists of normal license plate images without difficult conditions.

V. CONCLUSION
Deep learning object detection models can achieve high accuracy in real-life scenarios such as license plate detection. This paper compared the capabilities of different models in terms of speed and accuracy. In addition, it showed that an image preprocessing step and modification of an existing model can help improve the overall accuracy of the model. The improved YOLOv4 model achieved 96.96% mean Average Precision (mAP) on the validation dataset of the Chinese City Parking Dataset (CCPD), compared to 83.64% for the original model.

REFERENCES
[1] D. G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” Int J Comput Vis, vol. 60, no. 2, pp. 91–110, Nov. 2004, doi: 10.1023/B:VISI.0000029664.99615.94.
[2] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-Up Robust Features (SURF),” Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, Jun. 2008, doi: 10.1016/j.cviu.2007.09.014.
[3] Viswanathan DG, “Features from Accelerated Segment Test (FAST),” 2011.
[4] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: Binary Robust Independent Elementary Features,” 2010, pp. 778–792. doi: 10.1007/978-3-642-15561-1_56.
[5] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” in 2011 International Conference on Computer Vision, IEEE, Nov. 2011, pp. 2564–2571. doi: 10.1109/ICCV.2011.6126544.
[6] A. H. Ashtari, Md. J. Nordin, and Seyed Mostafa Mousavi Kahaki, “A new reliable approach for Persian license plate detection on colour images,” in Proceedings of the 2011 International Conference on Electrical Engineering and Informatics, IEEE, Jul. 2011, pp. 1–5. doi: 10.1109/ICEEI.2011.6021697.
[7] D. Habeeb et al., “Deep-Learning-Based Approach for Iraqi and Malaysian Vehicle License Plate Recognition,” Comput Intell Neurosci, vol. 2021, 2021, doi: 10.1155/2021/3971834.
[8] H. Jørgensen, “Automatic License Plate Recognition using Deep Learning Techniques,” 2017. [Online]. Available: http://hdl.handle.net/11250/2467209
[9] C. N. E. Anagnostopoulos, I. E. Anagnostopoulos, V. Loumos, and E. Kayafas, “A license plate-recognition algorithm for intelligent transportation system applications,” IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 3, pp. 377–391, Sep. 2006, doi: 10.1109/TITS.2006.880641.
[10] D. Zheng, Y. Zhao, and J. Wang, “An efficient method of license plate location,” Pattern Recognit Lett, vol. 26, no. 15, pp. 2431–2438, Nov. 2005, doi: 10.1016/j.patrec.2005.04.014.
[11] S. L. Chang, L. S. Chen, Y. C. Chung, and S. W. Chen, “Automatic License Plate Recognition,” IEEE Transactions on Intelligent Transportation Systems, vol. 5, no. 1, pp. 42–53, Mar. 2004, doi: 10.1109/TITS.2004.825086.
[12] G. S. Hsu, J. C. Chen, and Y. Z. Chung, “Application-oriented license plate recognition,” IEEE Trans Veh Technol, vol. 62, no. 2, pp. 552–561, 2013, doi: 10.1109/TVT.2012.2226218.
[13] F. Faradji, A. H. Rezaie, and M. Ziaratban, “A Morphological-Based License Plate Location,” in 2007 IEEE International Conference on Image Processing, IEEE, Sep. 2007, pp. I-57–I-60. doi: 10.1109/ICIP.2007.4378890.
[14] D. Zang, Z. Chai, J. Zhang, D. Zhang, and J. Cheng, “Vehicle license plate recognition using visual attention model and deep learning,” J Electron Imaging, vol. 24, no. 3, p. 033001, May 2015, doi: 10.1117/1.jei.24.3.033001.
[15] Z. Selmi, M. Ben Halima, and A. M. Alimi, “Deep Learning System for Automatic License Plate Detection and Recognition,” in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), IEEE, Nov. 2017, pp. 1132–1138. doi: 10.1109/ICDAR.2017.187.
[16] Hendry and R. C. Chen, “Automatic License Plate Recognition via sliding-window darknet-YOLO deep learning,” Image Vis Comput, vol. 87, pp. 47–56, Jul. 2019, doi: 10.1016/j.imavis.2019.04.007.
[17] Z. Lye, H. Nisar, K. Lai, and K. Yeap, “Localization and feature recognition: Implementation on an indoor navigator robot,” in 2014 10th France-Japan/8th Europe-Asia Congress on Mecatronics (MECATRONICS2014-Tokyo), Tokyo, Japan, 2014, pp. 238–243, doi: 10.1109/MECATRONICS.2014.7018561.
[18] M. Tan, R. Pang, and Q. V. Le, “EfficientDet: Scalable and Efficient Object Detection,” Nov. 2019. [Online]. Available: http://arxiv.org/abs/1911.09070
[19] J. Deng, W. Dong, R. Socher, L.-J. Li, Kai Li, and Li Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Jun. 2009, pp. 248–255. doi: 10.1109/CVPR.2009.5206848.
[20] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” Apr. 2020.
[21] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, “CenterNet: Keypoint Triplets for Object Detection,” 2019. [Online]. Available: https://github.com/
[22] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” 2015. [Online]. Available: https://github.com/
[23] W. Liu et al., “SSD: Single Shot MultiBox Detector,” 2016, pp. 21–37. doi: 10.1007/978-3-319-46448-0_2.
[24] Z. Xu, A. Meng, N. Lu, H. Huang, C. Ying, and L. Huang, “Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline,” 2018. [Online]. Available: https://github.com/detectRecog/CCPD.