License Plate Recognition Methods Employing Neural Networks
ABSTRACT Advances in both parallel processing capabilities, driven by graphics processing units (GPUs), and
computer vision algorithms have led to the development of deep neural networks (DNNs) and
their utilization in real-world applications. Starting from the LeNet-5 architecture of the 1990s, modern
deep neural networks may have tens to hundreds of layers to solve complex problems such as license
plate detection or recognition tasks. In this article, we present a review of the state-of-the-art methods
related to automatic license plate recognition. Since deep networks have demonstrated a remarkable ability
to outperform other machine learning techniques, we focus only on neural network based license plate
recognition methods. We highlight the particular types of networks, i.e., convolutional, residual, recurrent,
or long short-term memory, used for the specific tasks of license plate detection, extraction, or recognition in
different existing works. The presented summary also highlights some of the most widely used data sets for
comparison and shares the results reported in the reviewed papers. We also give an overview of the effects of
fog, motion, or the use of synthetic data on license plate recognition. Finally, promising directions for future
research in this domain are presented.
INDEX TERMS License plate, detection, recognition, deep learning, neural networks.
to capture license plates at night, as many license plates are made using retro-reflective materials.

In the literature, the process of recognizing license plates is generally divided into the following three steps: i) detection and extraction of the region containing the license plate; ii) segmentation of the license plate characters; iii) recognition of each alphanumeric character [2]. In their review paper [17], Du et al. presented a summary of the rationale, along with the pros and cons, of using boundary, texture, color, character size, or a hybrid of features for extracting license plates in images. They identified that most researchers preferred pixel connectivity, projection profiles, prior knowledge of characters, character contours, or a combination of these techniques for character detection and segmentation. They observed that the methods used for character recognition varied significantly based upon the underlying algorithms, including template matching, horizontal and vertical projections, the Hotelling transform [31], transitions between foreground and background pixels, Gabor filters, Kirsch edge detection, contour tracking, contour crossing, and topological features. These conventional methods focus on a specific type of license plate and the conditions in which it was captured.

Similar to license plate recognition, several other computer vision problems were also addressed with conventional learning methods using handcrafted features. However, the advent of modern neural networks, or deep neural networks (DNNs), changed this problem-solving approach [70]. AlexNet [49], proposed in 2012, demonstrated a significant increase in the accuracy of object recognition tasks in computer vision and ushered in the era of deep learning. The 10.8% gain shown by AlexNet in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) significantly outperformed the conventional methods. This difference in performance has also raised ample interest in exploring and identifying different deep-learning architectures for license plate detection and recognition. The influence of deep learning algorithms can be observed as training and testing data become more complex and challenging while license plate recognition rates move closer to the human level. Another advantage of DNNs is that a network designed for one data set may be used directly on another without changing the network, since the features extracted by the network are learned from the data and hence do not require human intervention.

The complete body of literature on license plate recognition methods is large and is mostly based on conventional methods that do not produce the best results. In [2] and [17], the authors presented a comprehensive review of these methods. We focus our attention on methods that employ deep neural networks. We have divided the related literature into three principal categories, i.e., i) license plate detection and character segmentation methods, ii) optical character recognition (OCR)-only methods, and iii) license plate recognition methods. The second category is a large research area in its own right; however, we have covered some papers which use OCR methods after detecting the license plates and extracting their characters. The first and third categories, i.e., license plate detection and license plate recognition, are further divided based on the type of deep neural network architecture used. This is elaborated using figures presenting a general description of the type of network. A tabular comparison between the different methods in each category is also presented, which contains information about the deep learning architecture, the data set used, results, and, if given, efficiency in terms of execution time. A literature survey matrix has been added to present a summary of the methods and details discussed later in Section II. The scope of this work is shown in Figures 1 & 2. Figure 1 presents the methods employed for the individual tasks of license plate detection and optical character recognition, whereas Figure 2 presents methods that holistically address the complete pipeline of license plate detection and recognition.

To summarize, the main contributions of this work are:
• Providing detailed information about neural networks, specifically deep networks, employed in the areas of license plate detection, character recognition, and license plate recognition in the past decade.
• Presenting a summary of deep learning methods employed for the task of license plate detection in Table 3.
• Presenting a summary of methods employed for optical character recognition, specifically in the context of license plates, in Table 4.
• Segregation of deep learning methods based upon the type of deep learning architecture in Section V, with a discussion of their architectural details.
• A summary of each network along with details such as the data set used, the country of the license plate, results, and execution time when available.
• Presentation of some key architectures in the form of block diagrams.
• Discussion of associated issues, including the use and generation of synthetic images, re-identification, and denoising, in Section VI.
• Discussion of the types and trends of neural networks that have been employed in the literature, in Section VII.
• Discussion of the pros and cons of the different categories of neural networks in the context of license plate recognition, in Section VII.

The remainder of this paper is organized as follows. Section II presents a brief introduction to the topic and presents the literature survey matrix. Section III presents a review of papers that use neural networks for license plate detection and extraction. Next, Section IV presents a brief review of methods focusing on character recognition of license plates, followed by Section V covering license plate recognition methods. Topics associated with license plates, i.e., re-recognition, deformation, or blur removal, are presented in Section VI. Finally, Section VII presents a summary of the popular approaches and future directions in the area.
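As a concrete illustration of step ii) above, character segmentation in conventional pipelines is often implemented with a vertical projection profile, a technique several of the reviewed works rely on. The following toy sketch is illustrative only (the binary "plate" array is made up, not real data from any reviewed paper):

```python
# Sketch of projection-profile character segmentation (step ii).
# Columns with no foreground pixels are treated as gaps between characters.

def vertical_projection(plate):
    """Count foreground (1) pixels in each column of a binary image."""
    return [sum(col) for col in zip(*plate)]

def segment_columns(plate):
    """Return (start, end) column ranges with non-zero projection,
    i.e., candidate character slots separated by empty columns."""
    proj = vertical_projection(plate)
    segments, start = [], None
    for x, value in enumerate(proj):
        if value > 0 and start is None:
            start = x
        elif value == 0 and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(proj)))
    return segments

# Toy plate: two "characters" (columns 1-2 and 4-5) separated by a blank column.
plate = [
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1, 0],
]
print(segment_columns(plate))  # -> [(1, 3), (4, 6)]
```

Each returned range can then be cropped and passed to the character classifier of step iii).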
II. REVIEW: LITERATURE SURVEY MATRIX
Literature on license plate recognition has thoroughly addressed the three main research areas: license plate detection, character recognition, and license plate recognition. Most earlier works that used handcrafted features were generally applied to specific license plate types. However, these conventional object detection and recognition methods have recently been replaced by neural network based deep learning techniques in the computer vision domain. The use of neural network based methods has resulted in improved accuracy and the ability to recognize license plates in more challenging environments. Therefore, it may be more interesting to focus on neural network based methods in the area of license plate recognition because of their superior performance. We attempt to present and compare the different types of neural networks proposed in the literature. We highlight whether the complete process of recognition has been achieved using a neural network or whether there is a preference to solve only a particular task in the recognition process using these state-of-the-art algorithms.

An overview of some papers discussed in this review is presented as a literature survey matrix in Table 1. In this matrix, we have presented the first author's name, the year of publication, and the paper's reference number. The matrix also indicates whether the paper proposes a solution for character recognition, license plate detection (LPD), license plate recognition (LPR), or the complete pipeline. The matrix also presents the type of network used for the specific tasks. For example, if the authors used a convolutional neural network (CNN) for license plate detection, it is mentioned in the table. The presented matrix can be used to identify the type of data set used, the country of origin of the license plate, the F1 score or accuracy reported for the method, the availability of the data set, and the number of citations to the work in the literature. It should be noted that certain papers have reported performance results for individual data sets. In the literature review matrix, those results are averaged for presentation to save space. However, they are presented separately when those methods are discussed in the review. Other parameters, such as the amount of data used for training and testing, detailed results for each data set, and the size of images, are not shared here but in the subsection discussing each method.

One of the main issues associated with deep learning techniques is the requirement for large amounts of data. Since these techniques are data-dependent, the training data's variability and quantity impact performance. Without access to some benchmark data sets, it is not possible to have a fair comparison of different methods. In this regard, the data sets mentioned in Table 1 act as a good reference for the community. As mentioned in the notes of Table 1, some of the data sets are publicly available, while others were available but have since been removed by the authors. To help the reader understand the type and variety of images available in the data sets, we have presented samples from some open-source options. These images were downloaded from the links provided in the notes of Table 1.

One of the earliest available and widely used data sets is the Application Oriented License Plate (AOLP) data set containing Taiwanese license plates. Collected by Hsu et al., it comprises three categories, named Access Control (AC), Law Enforcement (LE), and Road Patrol (RP), containing 681, 757, and 611 images, respectively [36]. AC images have panning from −30 to 30 degrees, tilting from 0 to 60 degrees, a capturing distance of less than 5 m, and a license plate width of 0.2 to 0.25 times the image width; they were captured at a toll-station-type entrance. LE images have panning from −40 to 40 degrees, tilting from 20 to 70 degrees, a capturing distance of less than 15 m, and a license plate width of 0.1 to 0.2 times the image width; they were captured when the vehicle violated a traffic rule. Finally, RP images have panning from −60 to 60 degrees, tilting from 0 to 50 degrees, a capturing distance of less than 15 m, and a license plate width of 0.1 to 0.4 times the image width; they were captured either with a handheld device or from a mounted device.

The Caltech Cars1999 data set consists of 126 car images with a resolution of 896 × 592 pixels. Another data set comprising 526 images at 360 × 240 pixels resolution is also available at the same link. The first data set includes parked cars during the daytime, while the second data set, Caltech Cars2001, was captured on a highway. Sample images of the data set are displayed in the first row of Table 2. A data set shared by the University of California, San Diego comprises around 10 hours of video of cars entering and leaving the campus during various times of the day. The still frames extracted from these videos were hand-labeled; hence, data for 878 cars were annotated. The data set also contains 291 car images taken in the car park using a handheld digital camera.

The Peking University data set (PKU) was created by the National Engineering Laboratory for Video Technology (NELVT), a research group at Peking University, China. The data set was captured by surveillance cameras in China during both day and night. The data set consists of 221,763 images, out of which 90,196 are labeled. Images in the data set are divided into five categories, where the G4 and G5 categories contain high-resolution images compared to G1, G2, and G3. Only images in the G5 category have multiple cars in an image. These images are displayed in the second, third, fourth, and fifth rows of Table 2.

Bjorklund et al. proposed using synthetic images containing license plates in natural backgrounds, together with typical background images with no license plates. The images used by them are presented in the row titled ''Synthetic'' in Table 2. The data set developed by Stanford contains approximately 16,185 images of 196 cars. These images are presented in the seventh row of Table 2.

The Chinese City Parking Data set (CCPD) is a comprehensive data set, 12 GB in size, containing over 300,000 images of Chinese license plates captured with handheld cameras by license plate inspectors across China. The images were acquired between 0730 hrs and 2200 hrs. Another Chinese license plate data set, titled Chinese Road Plate Dataset
(CRPD), was developed by Gong et al. [25] because they noticed the lack of multi-objective real-world data sets. They captured images using electronic monitoring systems. The annotated images can be obtained from GitHub [24]. The main challenge presented by the data set is the availability of images with single, double, and multiple license plates.

Images of Greek license plates were made available by a research group at the National Technical University of Athens and are presented as LPRD in Table 2. The data set has images categorized as still images, difficult images, and videos. The data is further subdivided into images captured during daytime or night. There is also a division between color, grayscale, and blurry images. Images that have shadows, are close-ups, or were taken from far away are also saved separately.

A Brazilian car data set titled ''Federal University of Parana-Automatic License Plate Recognition'' (UFPR-ALPR), comprising 300 vehicles, was collected by Laroca et al. [51]. The images in the data set have a resolution of 1920 × 1080 pixels. The data contains multiple images of the same vehicle captured from another vehicle while both are moving. This data set can be beneficial in developing an LPR system for deployment in police cars. License plates used in Tunisia [78] and Saudi Arabia [45] are also presented at the bottom of the table. The images in the Saudi Arabian database have a resolution of 1920 × 1080 pixels and were captured from a moving vehicle. The license plate images from Iran are also shown in the table and have been obtained from [82]. The data set comprises approximately 20,000 images, and since these images contain only license plates, their dimensions are not fixed.

Spanhel et al. [92] shared a database comprising relatively low-quality images for recognition of license plates. The authors also provided an HDR data set, which is relatively smaller in size, and a ReId database, which may be used for re-identification of vehicles. Vehicle re-identification is a topic associated with license plate recognition, as LPR may be a part of the process.

The use of synthetic license plates to train large neural networks has gained some interest in the last few years, and in this context a tool to generate license plates was developed by Matthew Earl and has been made available for use [19]. The author has shared the procedure for generating synthetic license plate images and for developing a basic deep learning architecture for the detection and recognition of license plates. The authors Usmankhujaev et al. have made their tool for the generation of Korean license plates available on GitHub at [99]. The utility allows the generation of five different types of Korean license plates, i.e., with white, yellow, or green backgrounds and different shapes and sizes.

The data sets presented above focus on different challenges faced during license plate recognition. The first challenge is associated with the detection of the license plate in an image. In this regard, synthetic data sets have been employed, as they can take license plate images and merge or paste them into various backgrounds. The background may be simple or complex, as shown in Table 2 under ''Synthetic images''. The challenges associated with license plate detection also include the angle at which the license plate is viewed. As an example, in the data set shared by Khan et al. [45], the images were taken from a moving vehicle by placing the camera on the dash. This causes vehicles to enter the frame from the edge and move to the front when a car is being overtaken, resulting in a skewed plate at the beginning and a rectangular plate once the other car is in front of the capturing vehicle. Videos were also captured from the side window of the car. This causes the license plates of parked cars to appear rectangular, whereas the license plates of cars passing by appear non-rectangular, making the task of detection more difficult.

Further challenges addressed by the above-mentioned data sets include variability in characters. Depending upon the country of origin, the license plate may contain English, Chinese, Arabic, or Persian characters, etc., and hence the license plate recognition algorithm has to be trained on a data set containing the characters of interest. This also holds true for the numeric digits and is visible in the diverse license plate types shown in Table 2. Examples of Persian and Arabic license plates show the variability in the numbers and text used on license plates as compared to English-based license plates. A similar difference can be seen for the Chinese license plates in the figure. The Persian language has 32 unique characters, compared to 26 for English and 29 for Arabic. Some languages, including Chinese, do not have an alphabet system, and hence the various number plate permutations are not dependent upon one.

Finally, changes in lighting conditions also cause difficulty in the recognition of license plates. This is covered very efficiently in the PKU data set. The data set not only provides images of vehicles at different times of day but also comprises the challenging case of license plate obstruction, i.e., the complete license plate is not visible in the frame.

A recent data set of Brazilian license plates was developed by Laroca et al. [53]. They introduced a publicly available data set called RodoSol-ALPR containing 20,000 images of car and motorcycle license plates from the Mercosur (Brazil, Argentina, Paraguay, and Uruguay) region, captured at toll booths installed on a Brazilian highway. In this work, they proposed a traditional-split versus leave-one-dataset-out experimental setup for assessing cross-dataset generalization. They applied 12 optical character recognition (OCR) models to LP recognition on nine publicly available data sets containing various challenging images captured using different capture settings, image resolutions, and license plate layouts.

In a recent work, Laroca et al. [54] explored the bias of existing license plate data sets. They tested deep learning algorithms on four Chinese and four Brazilian data sets and noticed that the algorithms always performed better on the data set they were trained on. They emphasized the need for cross-dataset evaluations to better judge the generalization capabilities of a proposed license plate recognition algorithm.

III. NEURAL NETWORKS FOR LICENSE PLATE DETECTION
The first step in license plate recognition is detecting a license plate in an image and determining its boundaries. The boundaries around the license plate are referred to as the ''bounding box''. Thus, the goal is to draw a bounding box around a license plate, or all license plates, in an image. The success criteria vary from finding the correct number of license plates to matching the intersection between the annotated region and the region marked by the algorithm.
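The overlap-based success criterion mentioned above is typically operationalized as intersection over union (IoU) between the annotated and predicted boxes, while precision and recall are commonly combined into an F1 score, as done for Table 3. A minimal sketch (the (x, y, w, h) box format and the 0.5 acceptance threshold are common conventions, not taken from a specific reviewed paper):

```python
# Two common evaluation quantities for plate detection:
# IoU between two axis-aligned boxes, and the F1 score.

def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# A detection is usually counted correct when IoU exceeds a threshold
# such as 0.5. Two half-overlapping 10x10 boxes give IoU = 1/3:
print(round(iou((0, 0, 10, 10), (5, 0, 10, 10)), 3))  # -> 0.333
# E.g., precision 98.4% and recall 96.8% combine to F1 ~ 97.6%:
print(round(f1(0.984, 0.968), 3))  # -> 0.976
```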
FIGURE 1. General pipeline for license plate detection or character recognition. Methods listed below each block represent papers reviewed in this work. The papers presented here focus only on one block of the pipeline. Therefore, papers related to character recognition or region extraction assume that the license plate has already been detected.
FIGURE 2. General pipeline for both license plate detection and recognition. Papers listed in the figure are reviewed in this work. The method mentioned
on the left of the arrow is used for license plate detection while that on the right of the arrow is used for license plate recognition.
In 2012, Li et al. proposed using a multi-scale convolutional neural network for license plate detection [58]. Instead of just using the output of one layer as input to the next, they proposed using two different layers in the final classification. The idea was to use the low-level local features extracted at an earlier layer and the high-level global invariant features extracted in the deeper (later) layer as input to the fully connected classification layer.
FIGURE 3. The multi-scale CNN network comprised an input layer of size 64 × 24 pixels, a convolution layer with 4 filters of size 5 × 5 followed by down-sampling by 2, followed by another convolutional layer having 14 filters of size 3 × 3, again followed by down-sampling by 2, resulting in feature maps of dimensions 14 × 4 × 14. After passing through the fully connected layer, the output indicates whether a license plate was detected or not.
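The dimensions quoted in the Figure 3 caption can be reproduced with simple shape arithmetic, assuming unpadded ('valid') convolutions and non-overlapping down-sampling by 2; this is an illustrative check, not code from [58]:

```python
# Trace the feature-map sizes of the multi-scale CNN in Figure 3.

def conv_valid(w, h, k):
    """Output size of a k x k convolution with no padding, stride 1."""
    return w - k + 1, h - k + 1

def downsample(w, h, s=2):
    """Output size after non-overlapping s x s down-sampling."""
    return w // s, h // s

w, h = 64, 24               # input patch
w, h = conv_valid(w, h, 5)  # 4 filters of 5x5  -> 60 x 20
w, h = downsample(w, h)     #                   -> 30 x 10
w, h = conv_valid(w, h, 3)  # 14 filters of 3x3 -> 28 x 8
w, h = downsample(w, h)     #                   -> 14 x 4, with 14 channels
print((w, h, 14))  # -> (14, 4, 14)
```

The final (14, 4, 14) volume matches the 14 × 4 × 14 feature dimensions stated in the caption.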
Their network comprised two convolutional layers, each followed by down-sampling, thus reducing the size of the image. The network was trained by minimizing the sum-of-squares error using the gradient descent algorithm. The authors developed their own data set by annotating 5,613 images and dividing them into 5,488 training and 125 validation images. They applied shear, rotation, flipping, brightness, and contrast variations to the images to obtain 65,586 training and 1,500 validation plate samples. Next, they extracted 4,000 non-plate samples and divided them into 2,500 training and 1,500 validation images. They also generated 57,977 false positives, resulting in a total of 60,477 non-plate samples. For testing purposes, the authors collected 1,559 images containing 1,602 license plates. They reported a detection rate of 93.2%. As presented in Table 3, the above-mentioned details are summarized in terms of training and testing images. The data set used by the authors comprises Chinese license plates. However, the authors have neither made the data set publicly available nor reported the time required to detect a license plate. The proposed procedure is presented as a block diagram in Figure 3.

The success of AlexNet in 2012 led to the development of deeper convolutional networks capable of object detection in complex scenarios. This further led to the development of region-based fast convolutional neural networks. Kim et al. proposed the use of a two-step detection procedure [47]. In the first step, they proposed using faster region convolutional neural networks (FRCNN) [84] to detect cars in images. They reported that, on the Caltech data set [103], the proposed FRCNN method missed only one car. Next, the region containing the detected vehicle was passed to a graph-based segmentation method that combined similar regions based on color, texture, and size. Regions not matching the ratio-based dimensions of the license plate were removed. Finally, a CNN was applied for the removal of non-plate regions. The CNN comprises a pair of convolution and max-pooling layers followed by two fully connected layers and a Softmax layer. The authors used 175,000 non-plate and 35,000 plate images from the Internet to train this network. Testing on the Caltech database [103], comprising 126 images of license plates from the USA, they reported a precision rate of 98.4% and a recall rate of 96.8%.

Instead of passing the complete image to the algorithm, Kurpiel et al. proposed partitioning the image before passing it to a CNN and then aggregating the results to locate the license plate in the image [50]. Their algorithm assumed that no sub-region of 120 × 180 pixels contained more than one license plate. They developed their own database of 1,829 images containing 4,070 Brazilian license plates and reported a precision rate of 87% and a recall rate of 83%.

Fu et al. proposed using cascaded neural networks to detect and extract license plates [20]. The idea of cascaded neural networks is to reduce candidates by combining simple features, thus creating a shallow network. They proposed a five-convolutional-layer region proposal network (RPN), at the heart of FRCNN, to extract cars from the images. This was followed by a VGG16 network [91] with an RPN to detect the license plates in the region cropped in the previous stage. The detected plate is refined by increasing the detected area by 30%. The authors proposed using transfer learning from the ImageNet data set and training the parameters of the new layers using 128 positive and negative samples. The authors generated a Chinese license plate data set of 30,975 images and reported a precision rate of 98.4% and a recall rate of 90.5% on it.

Kim et al. [48] noticed that, in the detection stage, algorithms often mistake grilles or headlights for plates. To overcome these issues, they trained a CNN for detecting these anomalies and reported an improvement in detection accuracy from 78% to 87%. However, the authors did not share the base model that was reported to have achieved the 78% detection rate.

To assess the ability of different types of object detectors, Peker tested commonly used object detectors to locate Turkish license plates in images [76]. The author compared the faster region convolutional neural network (FRCNN) [84], the region-based fully convolutional neural network (RFCNN) [15], and the single-shot multi-box detector (SSD) [63]. The training data set comprised 200 images, while the test set contained 100. FRCNN provided the best results for license plate recognition on the tested data sets. One of the tested variations, titled 'MobileNet-SSD', was proposed by Peng et al. [77]. The test data set used by the author is relatively small; a larger data set would have made the results more meaningful. This also highlights the need for and importance of publicly available benchmark data sets.

In [62], the authors propose using an edge guided sparse attention network (EGSANet) to detect license plates. The EGSANet comprises a VGG19 backbone network, an EGSA module, and a cascaded multi-task detection head. Instead of using all 19 layers, the authors used 12 layers of the VGG19 network with the EGSA module inserted. The advantage of
TABLE 3. Summary of methods employing neural networks for license plate detection.
EGSA is that it uses the edge contours of the LP and solves the LP detection problem in real time.

To assess the impact of parameter selection on license plate detection, Lee et al. propose fine-tuning the parameters of the YOLOv3 network in [57]. They identified and fine-tuned the following parameters: a) train-test ratio, b) number of epochs, c) mini-batch size, d) L2 regularization, and e) learning rate, to improve the accuracy of the YOLOv3 network on a Malaysian license plate data set from 87.75% to 99%.

The above-presented methods for license plate detection are summarized in Table 3. The table provides details about the authors' names, the year of publication, and the work's reference number in this paper. The generic approach employed by the authors is presented with information on whether they developed their own data set for training and testing or used an open-source data set. In some cases, the authors propose more than one network for license plate detection; hence, more than one network is mentioned in that case. The last column presents the license plate detection results reported by the authors. Some authors report accuracy, while others report precision and recall results. For consistency, we have combined them into accuracy and F1 scores in the table. The second-to-last column specifies the size of the training and testing data if mentioned by the authors; otherwise, the size of the entire data set is provided. The size of images in the data set and the country of origin of the license plates are also noted in the table.

IV. NEURAL NETWORKS AS OCR FOR LICENSE PLATE RECOGNITION
Character and digit recognition has been extensively researched, and optical character recognition software has been commercially available for years. Neural networks have been employed for character recognition since the early 90s. LeNet-5, the first deep convolutional neural network, was developed for handwriting recognition for postal services. However, there is a difference between general optical character recognition and license plate character recognition because of the differences in size, resolution, orientation (skew due to tilt and pan), and noise in the captured data. Efforts have been made since the 90s to use neural networks for the recognition of characters and digits on license plates.

Among the earliest efforts was the method proposed by Nijhuis et al. [73] in 1995, where the authors proposed the use of Discrete-Time Cellular Neural Networks (DTCNNs) for feature extraction and an ordinary Multi-Layer Perceptron (MLP) network for the recognition of Dutch license plates. In 2006, Anagnostopoulos et al. proposed using a fully connected, three-layer neural network for optical character recognition of license plate characters [1]. The authors proposed a segmentation technique that they baptized 'SCW', in which they described local irregularity in the image using statistics based on the mean and standard deviation. The speed of fully connected neural networks in the mid-2000s was slow, and the speedup of these computationally intensive algorithms was an active research area. Caner et al. proposed an FPGA-based implementation of license plate recognition incorporating neural networks [11]. They proposed a license plate detection algorithm that applied a Gabor filter to grayscale images, followed by connected component labeling, feature extraction using min/max type operations, and information about the aspect ratio of the plate. To segment the characters, they used a vertical-projection-based threshold. Finally, they proposed using Self-Organizing Maps (SOM) for character recognition, where an input is matched by comparing the Euclidean distance between it and each output node. Their sample data set contained 1,436 cars passing on a highway. The proposed system, implemented on a Virtex IV FPGA, took 0.5 sec on average to recognize a license plate for vehicles traveling at up to 90 km/h. The authors reported a license plate detection accuracy of 91.7% and a stand-alone character recognition accuracy of 90.9%.

The early 2010s saw the revival and evolution of neural networks because of the power and ability of convolutional architectures to solve problems related to feature extraction. Hence, simple fully-connected three-layer neural networks were replaced by deeper neural networks for most computer vision tasks. This was reflected in license plate recognition tasks as well. In 2016, Gou et al. utilized a Hybrid Discriminate Restricted Boltzmann Machine (HDRBM) to recognize Chinese characters, a relatively more complex task
than recognizing English characters. The proposed method was also suitable for handling variations in camera angle and zoom to recognize distorted characters [27], both during day and night. The authors proposed using conventional methods, based on the top-hat morphological operator on grayscale images, to identify edges for license plate detection. They trained two separate RBMs for character recognition, one for 31 Chinese characters and another for 34 English characters, i.e., ten digits and 24 alphabetical letters (characters 'O' and 'I' are excluded). For training, they used 12,366 labeled Chinese characters and 10,408 labeled digits and English characters. The HDRBM comprised 150 hidden nodes, and the learning rate λ was set to 0.01. They tested their proposed method using a database presented in [26] and extended it to 4,242 images taken at 1936 × 2592, 1280 × 736, and 720 × 280 pixel resolution. The images were taken using a traffic monitoring camera under varying lighting conditions, from day to night. The authors reported a correct license plate detection rate of 95.9%, a correct character recognition rate of 98.2%, and an overall combined correct detection and recognition rate of 94.1%. One downside of the proposed method is the overall execution time, which is more than half a second (sec) per image.

Radzi et al. proposed the use of CNN for license plate recognition [81]. The proposed CNN comprises five layers: a convolution layer, a sub-sampling layer, another convolution layer, and two fully connected layers. The authors used a small data set of Malaysian license plates comprising 750 images for training and 434 for testing. The proposed CNN was trained using a variable learning rate. Since the process does not consider license plate extraction or segmentation, it returns a 98.8% accuracy for the small controlled data set. Similarly, Zhai et al. proposed using neural networks to recognize alphanumeric characters on license plates [113]. However, the presented results were not for images containing license plates but only for alphanumeric characters. Similar to Radzi et al., they also stated that their classifier considered the digit '0' (zero) and the letter 'O' as the same character, as well as the digit '1' (one) and the letter 'I'.

Another proposal puts forward the hybrid use of CNNs, in which a neural network is paired with another machine-learning tool for character recognition. In this context, Zang et al. proposed using a visual saliency model to detect license plates and integrate CNN based features and a Support Vector Machine (SVM) into a single framework to recognize Chinese number plates [110]. The authors used color, intensity, and orientation features to build Gaussian pyramids to generate the saliency map. The saliency map may contain objects other than the license plate; hence, the authors proposed to use information about the aspect ratio and location to refine the estimate. To segment the characters, the authors proposed using horizontal and vertical projections. The input to the CNN was an image of size 38 × 38 pixels, and the network comprised three pairs of convolution and sub-sampling layers, followed by a fully connected layer comprising 64 nodes. The output of these 64 nodes was used as the input to the SVM classifiers. The authors proposed to use 65 binary SVMs, one for each character. The authors tested the proposed system using 1,300 license plates captured by them and reported an F1 score of 98.3% for character recognition. The average time required for license plate detection and character recognition was approximately 140 msec. However, whether these results are for a GPU or a CPU-based implementation is not specified. The block diagram of the proposed architecture is reproduced in Figure 4.

Inspired by their earlier work [78], the authors proposed the use of a Squirrel Search Algorithm (SSA)-based CNN called SSA-CNN for character recognition [100]. They utilized median filtering followed by edge detection and morphological operators to detect license plates. Next, they proposed using the Hough Transform to segment license plates and the SSA-based CNN to recognize characters. The authors reported an F1 score of 98% for the FZU Cars data set, an F1 score of 96.9% for the Stanford Cars data set, and an F1 score of 96.1% for the HumAIn data set.

Liu et al. proposed using CNN for Chinese license plates with white digits and characters on a blue background [66]. Chinese license plates have different background colors, depending on the type of vehicle. However, the authors only focused on the detection of one type. They stated that the letters (characters) 'I' and 'O' do not appear on Chinese license plates, thus reducing the number of possible characters from 26 to 24. Also, the first character is a Chinese character representing one of the 31 provinces. Hence, there are 31+24+10 = 65 possible characters that may take up seven possible spaces on a license plate. They developed an X-LN, i.e., X layer network, where each layer consists of a convolutional layer followed by a max-pooling layer, and the network concludes with two fully connected layers. The network output has 65 × 7 nodes representing one of the 65 possible characters at each of the 7 locations. However, to simplify training, they trained the network to recognize the first character as one of the 31 Chinese characters. The second character is from the English alphabet and was trained to identify one of the 24 characters. Finally, the last five positions were trained for the 34 alphanumeric characters of the English language. They used an Adam optimizer to minimize the loss function during training. For training, they initially used data from their own province, which was acquired and labeled manually. It comprised 3,220 different number plates, extended to 8,192 by applying geometric transformations. They used 7,396 images for training and 796 for testing and obtained an accuracy of 88.6%. Later, they expanded their training data set by synthetically generating license plates and reported an improved accuracy of 90% for tests on synthetic images. The results reported by the authors were for different depths of the network, i.e., X = 4 and X = 3. The authors did not discuss how the license plate was detected, as their work focused only on character recognition.

Some other techniques proposed for character recognition revolved around using different types of neural networks. Qian et al. proposed using competitive neural networks to
FIGURE 4. CNN-based network from [110], comprising three convolution and max-pooling layers with different strides and padding at each stage. In the end, the authors used an SVM classifier to classify the Chinese characters.
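Several of the reviewed works ([11], [110], and later 'CutNet' [69]) segment characters using projection profiles: the column-wise ink counts of a binarized plate dip to near zero in the gaps between characters. The following is a minimal sketch of this idea only; the threshold and the toy bitmap are illustrative and not taken from any of the reviewed papers.

```python
def vertical_projection_segments(img, min_col_sum=1):
    """Split a binarized plate image (rows of 0/1 ints, 1 = ink) into
    character column intervals using its vertical projection profile."""
    n_cols = len(img[0])
    # Column-wise ink counts: gaps between characters have (near) zero ink.
    profile = [sum(row[c] for row in img) for c in range(n_cols)]
    segments, start = [], None
    for c, s in enumerate(profile):
        if s >= min_col_sum and start is None:
            start = c                      # a character run begins
        elif s < min_col_sum and start is not None:
            segments.append((start, c))    # run ended at the previous column
            start = None
    if start is not None:
        segments.append((start, n_cols))   # run extends to the right edge
    return segments

# Toy 4x10 bitmap with two "characters" separated by blank column gaps.
plate = [
    [1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
]
print(vertical_projection_segments(plate))  # → [(0, 2), (4, 6)]
```

Real systems additionally normalize illumination and filter out runs whose width is implausible for a character; the sketch keeps only the core profile-thresholding step.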
TABLE 4. Summary of methods employing neural networks for license plate character recognition.
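Because Table 4 mixes the two metrics, it is worth recalling how precision and recall collapse into the single F1 figure used there; the harmonic mean below is the standard definition, and the example numbers are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, the F1 metric used in Table 4."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g., a detector with 0.98 precision and 0.94 recall:
print(round(f1_score(0.98, 0.94), 4))  # → 0.9596
```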
recognize license plate text [79]. They assumed that the license plate had already been extracted and segmented. Hence, each character was available for recognition. The authors tested the proposed method on 115 images containing 690 characters and reported an accuracy of 91.3% for character recognition. Kilic et al. proposed using a Long Short Term Memory (LSTM) network to recognize text from license plates captured using a fixed CCTV camera [46]. For a data set of 4,693 Turkish license plates, they reported an accuracy of 88.8% for validation set images.

Spanhel et al. proposed using a cascade classifier based on local binary patterns to detect license plates in Full HD quality images extracted from video [92]. Using a Kalman filtering based approach, they also tracked the license plate in the video. They developed a database titled 'ReId', comprising 7,393 training tracks containing 105,924 images and 6,967 testing tracks containing 76,142 images. To recognize the license plate characters without segmentation, they proposed using a CNN with eight branches, where each branch predicted one of the eight characters of the license plate. Since the authors addressed the issue of the variable number of characters in certain license plates, they replaced the missing positions with a '#' character during training so that the system could recognize license plates with variable numbers of characters. They reported that the proposed algorithm took approximately 5 ms for license plate detection and 8 ms for recognition of characters, excluding processing overheads, using a CPU. On a GTX 1080 GPU, they reported an overall time of 0.82 msec. They reported an accuracy rate of 98.6% for the ReId data set. However, on another data set collected by the authors, titled 'HDR', they reported 90.3% accuracy for license plate recognition.

The above-presented methods for license plate character recognition are summarized in Table 4. The table provides the author names, the year of publication, and the work's reference number. The generic approach employed by the authors is presented along with information on whether they developed their own data set for training and testing or used an open-source data set. In the case of Spanhel et al., the data set produced by them was made public and is noted in the table. However, the link was not accessible at the time of this writing. Most authors in the papers presented above report the success rates of character recognition or of the complete license plate recognition process. Also, some of them report F1 scores, while others report accuracy. Therefore, the results presented in Table 4 summarize both. In several cases, authors reported the average time required by the proposed algorithm, which is shown in the last column.

V. NEURAL NETWORKS FOR LICENSE PLATE RECOGNITION
The previous sections covered different types of neural networks used in literature to perform license plate detection, extraction of characters, or recognition of license plate characters. More recent approaches have started exploiting the prowess of neural networks for both license plate detection and character recognition without having to perform license plate or character extraction specifically. This section presents an overview of techniques that employ single or multiple neural networks for the complete chain of license plate detection and recognition. The material below could have been presented in chronological order or based on the origin of license plates or the speed of algorithm execution, e.g., slow or real-time. However, we propose to divide the methods based on the type of architecture proposed by the authors. This choice highlights the community's interest in a particular network architecture. To that end, we have divided networks into four principal categories: Convolutional Neural Networks, Recurrent Neural Networks, YOLO networks
73624 VOLUME 11, 2023
M. M. Khan et al.: License Plate Recognition Methods Employing Neural Networks
(which are stripped-down versions of CNNs), and Other architectures.

While going through this section, please note that the figures sometimes represent license plate detection and recognition outputs separately with a common network, as in the case of Figure 5. This does not mean that LPR is independent of LPD; it simply means that the same network parameters are used for both tasks. In other cases, where LPD and LPR have two separate networks, as in the case of Figure 6, the change in input image size for the LPR network compared to the LPD network indicates that the region of interest extracted from LPD is fed to LPR. The figures are represented in this way to indicate whether the same or different features/parameters are being used for LPD and LPR.

A. CNN-BASED METHODS
In 2016, Li et al. proposed to address the problem of license plate recognition as a sequencing problem [59]. They proposed a three-network approach. The first CNN consists of four layers to identify text in grayscale images and divides it into 37 classes (10 digits, 26 characters, and one non-character). The architecture comprised a sequence of convolution, Rectified Linear Unit (ReLU), max-pooling, convolution, ReLU, max-pooling, fully connected, ReLU, dropout, fully connected, and Softmax layers. The second CNN classifier also has a four-layer classifier for identifying plate and non-plate regions. It comprised convolution, ReLU, max-pooling, convolution, ReLU, max-pooling, fully connected, ReLU, dropout, fully connected, and Softmax layers. The goal is to identify text in the image and determine the plate region overlapping the identified text. The authors used horizontal and vertical projections to determine the license plate's bounding box. A third neural network with nine layers is used to classify text further. The inputs to this deeper network comprise a grayscale image and Local Binary Pattern (LBP) features. Final character recognition combines the prediction results of both CNN classifiers. The authors presented recognition results on the Caltech cars data set [103] and the Application Oriented License Plate (AOLP) data set [36]. The Caltech data set comprises 126 images of American license plates at a resolution of 896×592, while the AOLP data set has 2,049 images of Taiwanese license plates divided into 681 Access Control (AC), 757 Law Enforcement (LE), and 611 Road Patrol (RP) samples. The authors reported an accuracy of 77.5% for Caltech and 92.3%, 91.3%, and 84.7% for the AOLP AC, LE, and RP subsets, respectively. These scores are reported in Table 5.

In [8], Bjorklund et al. proposed the use of CNNs for the detection and localization of license plates and another pair of CNNs for identifying the boundaries of characters and their recognition. The authors proposed the use of synthetic images to train these neural networks. The input to the first stage is a colored image, and the CNN contains 10 convolution layers with a max-pooling layer after the 2nd, 4th, 7th, and 10th convolutional layers. The output of the 10th layer is fed into two sub-networks. The first sub-network consists of three fully connected layers, resulting in a binary classification of images as either having a plate or not. The second sub-network consists of three fully connected layers connected to eight regression units to identify the x- and y-coordinates of the corners of the license plate. Once the license plate is detected, the second stage identifies characters and assigns a confidence score to each character while localizing the bounding box corners. This stage comprises seven convolutional layers with max-pooling after the 2nd, 4th, and 7th layers. Finally, the character recognition network comprises three fully connected stages with 33 classes, as the Italian license plates have only 32 possible alphanumeric characters. The authors generated synthetic images with positive and negative license plate samples. To train the network for license plates, they used 25,000 positive and 25,000 negative samples, whereas to train the network for character recognition, 175,000 positive and 175,000 negative samples were used. A pyramid of images is provided as input to the license plate detection module to obtain the best results, and the outputs are combined using intersection over union (Jaccard index). The authors then obtained multiple crops of the license plate before feeding them to the character recognition network. If the confidence score of a character is above a certain threshold, it is added to the list of candidate character boxes. The authors reported a combined accuracy of around 93% on 1,000 test images downloaded from the Internet.

The authors extended their work in [9], where they discuss the generation of synthetic images in detail. The size of synthetic images for license plate detection was 768 × 384 pixels, and they generated 20,000 positive and negative training images and 2,500 positive and negative validation images. For license plate detection, they modified the earlier network and proposed the use of a single feature extractor network comprising four convolution and max-pooling layers whose output is fed into two networks comprising three fully connected layers, one for localization of the license plate and a second containing an additional softmax layer for differentiating between license plate and background. For character detection, a separate network with three convolutional layers and two fully connected layers was proposed. This network accepted input of size 128 × 64 pixels and produced the coordinates of the top-left and bottom-right corners as the output for each character. For the character recognition module's training, the authors extracted 450,000 positive and negative samples. For testing, the authors used 1,152 images from platesmania.com and reported an accuracy of 98.3%. They also tested the proposed method on the AOLP Taiwanese data set and reported an F1 score of 97.2% for AC, 98.9% for LE, and 98.4% for RP images. For the Chinese license plate data set 'PKUData', they only reported LPD accuracy of 98.8%, 99%, 98.9%, and 97.7% for the G1, G2, G3, and G4 subsets, respectively. They reported an average recognition time of 25.5 msec and 72.7 msec for images of dimensions 640 × 480 and 1280 × 960, respectively, on an Nvidia GTX 1080 GPU. They also reported average recognition times of 389 and 845 msec for images of dimensions 320 × 240 and
640 × 480 on a Jetson TX1. The advantage of this approach is that the network can be used as a plate or character detector by adapting the hyper-parameters, thus allowing the use of shared training procedures and cost functions. Another feature of the approach is the use of synthetic data for training the network, which reduces the data collection and tagging effort significantly.

Spanhel et al. performed pre-processing and applied a geometric correction to license plates before recognizing them [93]. They proposed to use the stacked hourglass CNN structure proposed in [71] for finding the corners of the license plate. This aligner CNN comprises three hourglass modules for down-sampling and up-sampling features in the spatial domain. The network used skip connections for improved performance of the gradient descent algorithm and provided a probability map of the four corners of the license plate. It was trained using images of arbitrarily rotated license plates, thus enabling the network to correct the geometric orientation of plates. The CNN for recognition was proposed by the authors earlier in [92]. The authors had access to multiple real and synthetic data sets and trained and tested the CNN using different combinations. The accuracy of recognition reported by the authors was 96% on real data. The data set was titled 'CamCar6k' by the authors and comprised 6,064 Czech license plate images divided into 2,750 training and 3,314 test images. The average recognition time of the proposed algorithm was 1.87 msec using an Nvidia GTX 1080 GPU. The authors also reported a correct recognition accuracy of 73.6% for the OpenALPR data set.

Goncalves et al. used the multi-task networks proposed in [115] for recognition of Brazilian license plates [23]. They used one stage for license plate detection and a second for character recognition. They observed that object detection algorithms required multi-scale features. However, this may not be necessary for license plate detection, so they only used the features from the 11th layer for detection instead of multiple layers. They also observed that a higher Intersection over Union (IoU) threshold may not necessarily encompass all characters of license plates and devised a new loss function for penalizing regression while finding the corners. For training, they used translation, vertical flipping, brightness, and contrast changes to augment the data set. They used high-resolution 1920 × 1080 images, allowing them to zoom in and out to develop a better training set. For recognition of the seven characters, they proposed to use a multi-task network and directly recognize characters instead of segmenting them first, an approach similar to [92]. The authors used the SSIG and UFPR-ALPR data sets comprising 6,660 images containing 8,683 license plates from 815 vehicles with a split of 3,595 training, 705 validation, and 2,360 test images. They reported a correct detection rate of 79.3% and, assuming 100% detection, a recognition rate of 85.6%. They also reported actual results for the complete recognition process, i.e., detection and recognition, for the SSIG and UFPR-ALPR data sets, at 88.8% and 55.6%, respectively. However, they did not report the speed of recognition, although they did mention that the algorithm can process six license plates in an image in real time.

Masood et al. proposed the sequential use of three CNNs for license plate detection, segmentation of characters, and recognition of characters. However, they did not provide architectural details of the networks but made their solution available online [68]. They used the Caltech and OpenALPR data sets, containing 328 US (adding images to the Caltech99 data set) and 550 European license plates, and reported accuracies of 93.4% and 94.5%, respectively. Lu et al. proposed using four CNNs for license plate detection, angle correction, spatial transformation, and recognition [67]. The first part comprised a neural network with ten convolutional layers, two fully connected layers, a bounding box (bbox) regression layer, and an orientation classification layer (using 0, 90, 180, and 270 degree classes). The feature maps from the 2nd, 4th, and 6th convolutional layers, the bbox regression, and the rotation prediction layers are used for rotation correction. This is further refined by passing the result to the spatial transformation network proposed in [42]. Finally, the authors used the network proposed in [92] for character recognition. The Rotated license plate data (Rlpd) database of Chinese license plates, consisting of 400,000 samples (380,000 used for training and 10,000 each for validation and testing), was used. The authors reported a recognition rate of 94.7%. The main advantage of the technique is its ability to handle rotation. However, the lack of testing on more frequently used data sets makes it difficult to compare the proposed method with existing state-of-the-art techniques.

Another approach focusing on using a spatial transformation network was proposed in [117]. The authors proposed an efficient implementation of deep neural networks that can recognize a license plate in an image in 3 msec. The proposed network consists of two parts, one for detection and another for recognition of the license plate. The authors proposed the use of the spatial transformation network proposed in [42]. The output of this network was forwarded to a CNN inspired by SqueezeNet along with inception blocks [39], [96]. They reported an accuracy of 95% on a private data set containing 11,696 images of Chinese number plates. The authors claim that the proposed approach can perform license plate recognition in real time on various platforms, including FPGAs. Therefore, one of the advantages of the proposed approach is that it can be deployed on standalone hardware systems.

Xu et al. developed a comprehensive database titled 'Chinese City Parking Data set' (CCPD) that contained 250,000 real images with annotated license plate corners, bounding boxes, and license plate numbers [107]. Having a comprehensive data set of real images captured and annotated by inspectors, the authors proposed an efficient algorithm titled 'Roadside Parking net' (RPnet) capable of processing 60 frames per sec on an NVIDIA Quadro P4000 GPU, i.e., approximately 16.7 msec per image. The authors also developed a 'CCPD-Characters' data set that contained at least
1,000 samples of each possible character on Chinese license plates. Their proposed RPnet comprised two modules. The first is the detection module, containing ten convolutional layers for extracting features and feeding three fully connected layers for bounding box prediction. The second, the recognition module, uses pooling layers [84] for extracting features of interest along with classifier layers for character recognition. The two modules work as a single network, where the detection module is used as 'attention' [14] for the recognition module. To train the network, the authors set the license plate detection loss to the smooth L1 loss function between the predicted and ground-truth bounding box, while the classification loss was set to the cross-entropy loss. The authors reported a detection accuracy of 92.7% and a recognition accuracy of 85.1% on their proposed data set at 61 frames per sec. The work presents an excellent comparison of the variation in images in available data sets to justify the development of their own data set. Their proposed technique has the advantage of detecting and recognizing images of varying dimensions because of their region of interest approach. One of Xu's collaborators, Meng et al., proposed another method based on locating, cutting, and recognizing license plates and characters in [69]. To detect the license plate, they proposed using a network dubbed 'LocateNet', comprising ten convolution layers, two max-pooling layers, and three fully connected layers, to predict the four vertices of license plates. The detected license plate was spatially transformed to a rectangular size of 256 × 72 pixels. The horizontal and vertical projections of these images were passed through a four-layer fully connected network, titled 'CutNet', for character segmentation. These segmented characters were passed to AlexNet [49] for recognition. The authors tested the proposed method on CCPD data set images and reported an accuracy of 96.2%.

B. RNN-BASED METHODS
Building on their prior work [59], in which they used three convolutional neural networks, Li et al. [60] proposed a solution comprising a convolutional neural network for detection and a recurrent neural network for recognition of license plates. They proposed to use the VGG-16 network proposed in [91]. The network is pre-trained on ImageNet to extract features from the input images, but the authors retained only the 2nd and the 5th pooling layers. They also removed the fully connected stages of the VGG-16 network to obtain the desired features. Next, they proposed to use the Region Proposal Network (RPN) by Ren et al. [84], known from Faster R-CNN, to obtain bounding boxes around objects in images. This was done by modifying the algorithm to apply two convolutional filters across each sliding position to get a 512-dimensional feature vector that can be fed to the two fully connected RPNs. To avoid character segmentation, the authors tackled the character recognition problem as a sequence labeling problem. They used bidirectional recurrent neural networks (BRNNs) and Long Short Term Memory (LSTM) with the Connectionist Temporal Classification (CTC) loss. They tested the proposed algorithm on four data sets: 'CarFlag-Large', comprising 460,000 images of Chinese license plates (322,000 training / 138,000 testing); 'AOLP' [36], comprising 2,049 images of Taiwanese license plates; 'Caltech99', comprising 126 images of American license plates that were augmented by adding 1,626 new images; and 'PKUData', comprising 3,977 images of Chinese license plates. They reported a recognition accuracy of 95.6% for AOLP-AC, 96.4% for AOLP-LE, and 83.8% for AOLP-RP while taking 400 msec per image, 94.1% for the Caltech99 data set, 99.73% for PKUData while taking 310 msec per image, and 97.1% for the CarFlag-Large data set while taking 310 msec per image. Further details related to these results can be observed in Table 5, and the block diagram of their proposed algorithm is presented in Figure 5. The main advantage of the proposed approach is that no segmentation is required.

The authors in [98] proposed using two deep neural networks to detect and recognize Korean license plates. They used 8,075 images to train a YOLOv3 network to identify the position of the number plate in an image, both during the day and at night. For night images, they modified the histogram so that the number plate becomes visible in the training data. To recognize the license plate, they proposed using a Long Short Term Memory (LSTM) architecture. Instead of directly using the images for recognition, the authors applied a geometric transformation. This was done to ensure that the characters provided to the LSTM are not distorted and, therefore, result in a correct label sequence. This was specifically done for Korean license plates with two rows of characters. They trained the license plate detection algorithm on 8,075 images and the character recognition LSTM network on 110,000 images. They tested the algorithm with 1,425 images and obtained an accuracy of 95.6% for the five different types of Korean license plates. In another RNN based approach, the authors proposed using CNN and gated recurrent unit (GRU, similar to LSTM) based neural networks for license plate recognition [109]. The authors used a CNN to extract features and a GRU to encode and decode these features. On their own data set, the authors claimed to have achieved 99% recognition accuracy with an average execution time of 200 msec per image.

In another recent work [87], Selmi et al. proposed the use of the region-based convolutional network Mask-RCNN [30] to detect and recognize license plates. For feature extraction, the authors used a modified version of GoogLeNet, reducing its number of inception modules to six and adding several pooling layers. They also used the Swish activation function instead of ReLU. To generate the proposal regions, they used Mask-RCNN with 12 anchor boxes. The features are then fed to two convolutional layers for license plate detection and bounding box regression. They used non-maximum suppression to avoid duplicate regions of interest. To remove false positives and false negatives, the authors used the RoI-Align layer of Mask-RCNN. After using fully connected layers for the detection and estimation of the
FIGURE 5. The input image is passed to a convolutional neural network (VGG-16) followed by a Region Proposal Network (RPN) to identify multiple regions of interest. These features are passed through two fully connected layers to extract features for license plate detection and recognition. A Softmax layer on top provides the probability of each region being plate/non-plate, and regression helps find a bounding box, while the BRNN layer followed by Softmax provides the probability of recognition of number plate letters and characters.
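The CTC loss used in the pipeline of Figure 5 lets the BRNN emit one label per frame without prior character segmentation; at inference time, the per-frame outputs are collapsed by removing repeated labels and then blanks. A simplified greedy-decoding sketch follows; the blank symbol and the frame sequence are illustrative, not taken from [60].

```python
BLANK = "-"  # CTC blank symbol (illustrative choice)

def ctc_greedy_decode(frame_labels):
    """Collapse repeated labels, then drop blanks: the standard rule for
    turning a per-frame CTC best path into a character string."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return "".join(out)

# Per-frame argmax labels for a plate reading "AB12":
frames = ["A", "A", "-", "B", "B", "-", "1", "-", "2", "2"]
print(ctc_greedy_decode(frames))  # → AB12
```

Note that blanks between identical labels are what allow genuine double characters (e.g., "AA") to survive the collapse step.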
FIGURE 6. The convolutional layers proposed in [87] used the Swish activation function instead of ReLU. The pipeline for detection of the license plate is shown at the top and for character recognition at the bottom.
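Both the IoU-based merging of detections in [8] and the non-maximum suppression used with Mask-RCNN in [87] rest on the same overlap measure. A compact sketch of the two steps, with an illustrative (x1, y1, x2, y2) box format and a 0.5 threshold that are not taken from either paper:

```python
def iou(a, b):
    """Intersection over union (Jaccard index) of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box and
    drop any remaining box whose IoU with a kept box exceeds `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in keep):
            keep.append(i)
    return keep

# Two near-duplicate plate detections plus one distinct detection:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]
```

Production detectors typically apply NMS per class and on GPU, but the selection logic is the same as this reference version.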
detect license plates and a CNN for recognizing digits and characters on Serbian license plates in [5]. They proposed using the YOLO algorithm for better speed because it can theoretically process 45 images per sec. For training, they generated Serbian license plates using backgrounds from the SUN [106] and Stanford [32] background databases. Their reported character recognition rate is 97%, and the digit recognition rate is 94.5%.

Laroca et al. proposed using different architectures for license plate detection, character extraction, and recognition in [51]. They tested both YOLOv2 and Fast-YOLO models for license plate detection. They noticed that they had to change the number of filters depending on whether the data set comprised only cars or both cars and motorcycles, i.e., one or two classes. The number of filters was set to 30 or 35 for one class or two classes, respectively. Following license plate detection, they proposed using CR-NET [89] in two sequential stages for character segmentation and recognition. The input to the character segmentation stage was a patch of size 240×80 pixels. They fed both original and negative images of each license plate during training. To obtain better results, the authors modified the depth of the networks for character and digit recognition. They conducted experiments with the SSIG [22] and Federal University of Parana-Automatic License Plate Recognition (UFPR-ALPR) [51] data sets, divided into 40% training, 20% validation, and 40% testing samples. The SSIG data set contains 2,000 frames from 101 videos, on which the proposed system achieved an accuracy of 93.5% at 47 frames per sec, i.e., 21.2 msec per image. For the UFPR-ALPR data set, which comprises 4,500 frames from 150 videos with both camera and vehicle moving, the method obtained an accuracy of 78.3% at 35 frames per sec, i.e., 28.6 msec per image. The proposed method may struggle to detect small license plates, as the YOLO networks generally do not detect small-sized objects very well.

In another study, the authors used a modified version of the Fast YOLO network for character recognition. Extending their previous work, they proposed YOLOv2 for vehicle detection, Fast-YOLO2 for license plate detection and layout classification, and CR-NET for recognition [52]. They modified the layers of Fast-YOLO2 to improve its performance and tested the proposed method using the Caltech99 [103], English LP [94], UCSD-stills [16], Chinese LP [118], AOLP [36], OpenALPR (EU) [74], SSIG [22], and UFPR-ALPR [51] data sets and reported an average processing time of 13.6 msec per image. They reported accuracy of 96.1%, 95.5%, 97.3%, 95.4%, 98.4%, 95.7%, 96.9%, and 82.5% on these data sets, respectively. This comprehensive testing covers license plates from the United States, Europe, China, Taiwan, and Brazil. The work focused on using different YOLO algorithms and their modification to achieve faster recognition speed. In [7], a comparative analysis of different algorithms for synthetic data sets was presented.

Hendry et al. proposed using a YOLO-based neural network architecture for the detection and recognition of license plates [33], [34]. Instead of using a two-stage approach, they employed 36 tiny YOLO networks to detect ten digits and 25 characters ('o' and '0' are considered the same in Taiwanese license plates), and one network for detecting the license plate. The novelty of the method was in a sliding window based approach and a modified YOLO network with ten convolution layers, three residual layers, one average
FIGURE 7. The upper row comprises convolutional and max-pooling layers transferred from the YOLO-VOC network in [89]. The lower row also comprises convolutional and max-pooling layers, with a slightly different arrangement.
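The filter counts of 30 and 35 quoted above for one and two classes are consistent with YOLOv2's usual rule for sizing the final detection layer, filters = anchors × (classes + 5), assuming the default five anchor boxes (this is our inference; the rule and anchor count are not restated in the reviewed paper):

```python
def yolo_detection_filters(num_classes, num_anchors=5):
    """Filters in a YOLOv2-style final detection layer.

    Each anchor predicts 4 box offsets + 1 objectness score + one
    score per class, hence anchors * (classes + 5) output channels.
    """
    return num_anchors * (num_classes + 5)

print(yolo_detection_filters(1))  # 30 -> cars only
print(yolo_detection_filters(2))  # 35 -> cars and motorcycles
```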
pooling layer, one fully connected layer, and one softmax layer. Since the objects were detected using a sliding window based approach, multiple objects may be detected at the same location, and objects that are not part of the license plate may be detected. Therefore, localization is essential. Since the proposed method is for Taiwanese license plates, the authors took advantage of the fact that there are six objects on these plates. The authors used the AC images of the AOLP data set [36] for training and tested the system on all of the images, including the AC, LE, and RP subsets. They reported a recognition speed of approximately 800 msec to 1 sec per image on an Nvidia GTX 970 GPU and an accuracy of 78%. The authors also reported a license plate detection accuracy of 98.2%. The block diagram of the proposed architecture is presented in Figure 8. The single-pass approach for detecting and recognizing individual digits requires a sliding window to be moved over the image. This slows the algorithm down and thus prevents real-time results. Secondly, having a separate network detecting each character or digit may result in a large memory requirement.

In other reported results, the authors in [97] proposed to use YOLOv2 for license plate detection and recognition and reported a processing rate of 5 frames per sec and an accuracy of 97%. In another approach, Izidio et al. proposed to use a pair of networks [41] comprising the YOLOv3 network for detecting license plates and a second neural network to recognize digits and characters. They implemented their system on a low-power Raspberry Pi 3 platform and reported an overall recognition accuracy of 98.4% on images of size 1024 × 768 while taking 4.9 sec on average. They passed the training images through affine transformations to augment the training data set. The advantage of the proposed approach is that it has low memory and computational power requirements, even if it is relatively slow due to implementation on less capable hardware.

In [35], the authors addressed the issue of generalization, i.e., recognition of license plates from multiple countries. They proposed using Tiny YOLOv3 for license plate detection and YOLOv3-SPP (Spatial Pyramid Pooling) to recognize license plate characters. The character recognition algorithm returns the bounding boxes of characters without their sequence; therefore, a layout detection algorithm had to be used after character recognition to extract the correct sequence of numbers for license plates of different countries. The benefit of the proposed system is that YOLOv3 has a better mean average precision than the YOLOv2 algorithm. Also, the ability of YOLOv3 to make predictions at multiple scales helps it detect smaller objects. Contrary to the popular character segmentation and recognition method, the authors treated character recognition as an object detection problem. This was achieved by a spatial pyramid pooling layer that splits a feature map into Bi bins at the i-th level. Feature maps are then max-pooled into the same size, thus producing an N × Bi vector. YOLOv3 is better at detecting smaller objects than YOLOv2 and is comparable to other CNN-based techniques; however, there is still a trade-off between accuracy and speed. It is generally recommended to significantly increase the training data size for the YOLOv3 network to reduce this trade-off.

The focus of the authors in [35] was to build a robust network for LPR, trained on images of varying sizes from the KarPlate, AOLP, Caltech, MediaLab, and University of Zagreb data sets. There were a total of 71 character classes, i.e., 35 for Korean, 10 for numerals, and 26 for English.

For training and testing, the authors developed their own data set titled KarPlate, comprising 4,000 full High Definition (HD) images of Korean cars. The proposed algorithm was tested on license plates from South Korea, Croatia, the United States, Greece, and Taiwan. From the KarPlate data set, 3,147 images were used for training and 850 for testing. Out of the 2,049 images of AOLP, the 681 + 757 images of the AC and LE subsets were used for training, and the 611 images of the RP subset for testing. For MediaLab data, 259 images augmented by 501 images of the University of Zagreb, 108 images of OpenALPR Europe, and 1,428 images of ReId were used for training, and 431 images of the MediaLab data set were used for testing. For Caltech, 80 out of 244 images of the Caltech Car data set, augmented by 244 images of the OpenALPR US data set, 108 images of the OpenALPR Europe data set, and 501 images of the University of Zagreb data set, were used for training. The accuracy of the proposed system remained 99.41% for AOLP-AC, 97.88% for AOLP-LE, 99.51% for AOLP-RP, 96.98% for MediaLab, 97.83% for Caltech Cars, 97% for U. of Zagreb, and 98.17% for KarPlate. The processing times were
FIGURE 8. The authors propose sliding window single-class detection in [34]. To account for the increase in computation caused by the sliding window method, they reduced the number of layers in the architecture.
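The spatial pyramid pooling used by the YOLOv3-SPP recognizer discussed above pools a feature map into a fixed number of bins per pyramid level, so the output vector length is independent of the input's spatial size. A simplified NumPy sketch (the bin counts are illustrative, not the reviewed network's exact configuration):

```python
import numpy as np

def spatial_pyramid_pool(feature_map, bin_counts=(1, 2, 4)):
    """Max-pool an H x W x N feature map into a fixed-length vector.

    For each pyramid level with b x b bins, every bin is max-pooled,
    so the output length is N * sum(b * b for b in bin_counts)
    regardless of the input's spatial size.
    """
    h, w, _ = feature_map.shape
    pooled = []
    for b in bin_counts:
        # split rows/cols into b roughly equal slices and max-pool each cell
        for rows in np.array_split(np.arange(h), b):
            for cols in np.array_split(np.arange(w), b):
                cell = feature_map[np.ix_(rows, cols)]
                pooled.append(cell.max(axis=(0, 1)))
    return np.concatenate(pooled)

# Different input sizes yield the same output length: N * (1 + 4 + 16) = 21 * N
print(spatial_pyramid_pool(np.random.rand(13, 13, 8)).shape)  # (168,)
print(spatial_pyramid_pool(np.random.rand(19, 25, 8)).shape)  # (168,)
```

This fixed-length output is what allows a fully connected or classification head to follow convolutional features computed at varying resolutions.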
29.19 msec for AOLP, 47.02 msec for MediaLab, 34.3 msec for Caltech Cars, 32.18 msec for U. of Zagreb, and 65.12 msec for KarPlate, resulting in an average of 41.56 msec for license plate recognition.

In [44], the authors propose a super-resolution convolutional neural network for license plate detection in fog-haze environments. They hypothesize that the captured foggy image is usually brighter, due to added atmospheric light, than the fog-free image. They developed a joint fog-haze removal model to address this issue using a convolutional neural network. They employed YOLOv3 for feature extraction and WPOD-NET to correct the position of bounding boxes. Detected license plates may suffer from blurring or noise pollution, affecting character recognition. Therefore, they proposed to use SRCNN to obtain high-resolution images. The authors next used a combination of connected components and template matching using neural networks for character recognition. The authors collected a data set of 3,000 images and added noise by rotating them and other image processing techniques to get 12,000 images. They reported an accuracy of 94.12% using a 7-layer network.

In [102], the authors proposed using two deep neural networks, baptized VertexNet and SCR-Net, for LPD and character recognition, respectively. The advantage of using VertexNet is that it assists in rectifying license plates and hence improves license plate detection and recognition accuracy for images captured at different angles. In VertexNet, the authors proposed to use a single detector on 256 × 256 size images. The network consists of backbone, fusion, and head networks. In the backbone, a residual neural network block is used along with the Squeeze-and-Excitation attention module to aggregate the feature maps across spatial dimensions. These features are fused by the feature pyramid network, and the data is fed to the head network for the prediction of four vertices. The SCR-Net utilizes images of dimension 64 × 256 to predict the characters. The authors tested the proposed system on the CCPD, AOLP, PKU, and CLPD data sets and reported an accuracy of 89.8% and 96.8% on the CLPD and PKU data sets. For the AOLP data set, they reported accuracy of 99.4%, 99.9%, and 99.7% for the AC, LE, and RP subsets. For the CCPD data set, they reported accuracy of 99.9%, 99.7%, 99.4%, 99.9%, 99.9%, 99.4%, and 94.8% for the Base, DB, FN, Rotate, Tilt, Weather, and Challenge sub-cases, while reporting an average processing time of 6.7 msec.

In [80], the authors tackled the problem of detecting and recognizing license plates containing one or two lines of text. They proposed using a lightweight convolutional neural network to extract license plate features and simultaneously perform classification (one- or two-line plate) and character recognition, while treating the recognition as a sequence labeling problem. The first part of the lightweight network is the backbone, responsible for extracting features that are passed to the classification head, which determines whether the license plate has a single line or two lines. In the case of two lines, it determines the slicing parameter. The slicing parameter can be fixed if the height ratio between the two lines of characters is constant. Finally, the recognition head, using both the features and the output of the classifier head, recognizes the license plate characters. The feature extraction network is inspired by LPRNet, with added inception layers for better feature extraction. License plate recognition rates for CCPD base examples were reported to be 99.1%, while they remained 87.6% for the CCPD challenge examples and 98.84% for the AOLP-RP data set.

In a recent work, Zou et al. [121] proposed using the YOLOv3 network for license plate detection and extraction. They employed an Improved License Plate Recognition Net (ILPRNET) for character localization and recognition. They only discuss the ILPRNET, using the stock YOLOv3 algorithm for license plate detection. They proposed to use a U-Net for the localization of license plate characters, which incorporates a spatial attention mechanism. They reported accuracy of 96.3%, 97.9%, and 95% for the AC, LE, and RP sub-sets of the AOLP data set. For the CCPD data set, they reported accuracy of 99.2%, 98.1%, 98.5%, 90.3%, 95.2%, 97.8%, and 86.2% for the Base, DB, FN, Rotate, Tilt, Weather, and Challenge sub-sets, respectively.

Huang et al. proposed using two fully convolutional one-stage (FCOS) object detectors for simultaneously classifying license plates and characters, followed by an assembly module, in [38]. They employed ResNet-50 as the backbone network for the extraction of features. Features having a
resolution of 1/8 with respect to (wrt.) the input image were used for LP detection, while features having a resolution of 1/4 wrt. the input image were used for character recognition. The advantage of using FCOS is that it removes intermediate steps like ROI proposals and segmentation and also reduces the problem of positive-negative anchor box imbalance. The license plate detection branch contains four sub-branches for bounding box and orientation regression, centerness regression, and classification. Similarly, the character recognition branch also has four sub-branches for bounding box regression, centerness, and character classification. The classification sub-branch predicts a 73-channel map (26 English, ten numeric, 36 Chinese). For the HZM data set comprising license plates from China, Hong Kong, and Macao, they reported a license plate recognition accuracy of 98.2%. They reported LPR accuracy of 95.8%, 96.6%, and 91.6% for the AC, LE, and RP subsets of the AOLP data set.

D. OTHER ARCHITECTURES
Wang et al. proposed using Generative Adversarial Networks (GAN) to generate synthetic training data to train their proposed networks [101], to circumvent the requirement of a large training data set for training deep networks. They proposed the use of synthetic RGB images as input to an eight-layer CNN with batch normalization in the 3rd, 5th, 7th, and 8th layers. Feature vectors extracted from the CNN were fed to the LSTM-based RNN, which contained a fully connected layer of 68 neurons at the output (31 Chinese characters, 26 English characters, ten digits, and one non-character class). The two data sets used by the authors comprised 203,774 training plus 9,986 testing plates in data set 1 and 45,139 plus 5,925 training and testing plates in data set 2. The authors defined recognition accuracy as the number of correctly recognized license plates with respect to the given number of license plates and reported it as 98.6% and 96.2% for data sets 1 and 2, respectively, when trained with both real and synthetic images. The authors also collected a third data set in Suzhou, China, containing 22,026 license plates. They reported an accuracy of 89.4% on this data set. Since the authors have used both CNN and RNN type networks, we have added this method to the other networks category.

Zhuang et al. proposed using a convolutional neural network for semantic segmentation to extract license plate characters and then used them for counting-based refinement and recognition of the characters of license plates [120]. The advantage of their proposed technique is that the complete image is processed at once instead of using a sliding window based approach. The proposed semantic map based approach classifies each input image pixel into a particular class. For pre-processing, the authors clipped the sides of the license plate by taking horizontal and vertical projections of the characters. For semantic segmentation, they employed a DeeplabV2 ResNet-101 model and, to train the network, they generated 60,000 synthetic license plate images. The ground truth had a bounding box drawn around characters, and the authors did not use pixel-level segmented characters [12]. To remove any inaccuracies in case of overlap, the authors checked for overlap between the area of the current character and previous characters using Intersection over Union (IoU). To address the issue of two characters appearing as one because of close proximity, they proposed counting characters for such regions, since the number of characters on a license plate is generally fixed. They treated character counting as a classification problem and employed AlexNet. In the final step, they utilized an Inception-v3 [96] model for character recognition. They tested the proposed method on the AOLP data set [36], the Media Lab data set [2], and their own data set titled Chinese License Plate Data set (CLPD), and reported accuracy rates of 99.4%, 99.3%, 99%, 97.9%, and 99.7%, respectively.
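The IoU-based check that Zhuang et al. apply between the current character and previously extracted characters can be sketched as follows (a simplified left-to-right scan; the 0.3 threshold is an assumption for illustration, not the paper's value):

```python
def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def drop_duplicate_characters(boxes, max_iou=0.3):
    """Scan character boxes left to right, dropping any box that overlaps
    an already accepted box by more than max_iou (hypothetical threshold)."""
    accepted = []
    for box in sorted(boxes, key=lambda b: b[0]):   # left-to-right order
        if all(iou(box, prev) <= max_iou for prev in accepted):
            accepted.append(box)
    return accepted

chars = [(0, 0, 10, 20), (2, 0, 12, 20), (15, 0, 25, 20)]
print(len(drop_duplicate_characters(chars)))  # 2
```

Here the second box heavily overlaps the first and is discarded as a duplicate detection of the same character.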
Peng et al. proposed using a MobileNet Single Shot MultiBox Detector (SSD) neural network for detecting license plates, followed by a CNN for recognition of the license plates [77]. SSD has the advantage of discretizing the space in an image into a series of boxes and providing scores for each object in a box to get a better fit around the objects. The advantage of SSD is that it is faster than Faster R-CNN and YOLO. The authors used an SSD network with 47 convolutional layers. Next, they refined the box by applying Sobel operator based filtering and using the histogram and horizontal and vertical sums. The plate's top and bottom edges were obtained using the Random Sample Consensus (RANSAC) algorithm. It should be noted that although SSD may be faster than R-CNN and YOLO, RANSAC is a relatively computationally intensive algorithm. However, the authors did not compare the results between these algorithms in terms of speed and accuracy. Next, the authors proposed rectifying the license plate's geometry by making the top and bottom line segments parallel to improve recognition rates. To recognize the characters, the authors separated the characters of the license plate by using vertical projections. The segmented characters were passed for recognition to a CNN containing three convolutional layers, three inner product layers, two max-pooling layers, and one flattened layer. The input to the CNN was of size 30 × 14 × 3. To test their network, the authors generated geometrically distorted images by vertically rotating the car image from 60° to 90° and distorting it horizontally. For a data set of 5,000 images, the authors reported accuracy of 92% for standard and 85% for distorted images, and an average accuracy of 89%, including a third data set. Although the method employed a neural network for license plate detection, the authors employed conventional techniques to refine the bounding box. Since the authors propose using both CNN and SSD networks, we have categorized their work into a hybrid category and presented it in this section.

Building on their previous work [89], Silva et al. proposed using the YOLOv2 network to detect vehicles in their three-step license plate recognition system [90]. The output was scaled to account for the ratio between the license plate and car sizes before it was fed to their newly proposed Warped Planar Object Detection Network (WPOD-NET) for license plate detection and finally to an optical character recognizer (OCR). The proposed WPOD-NET combined the strengths of Spatial Transformer Networks [67], i.e., the ability to detect non-rectangular regions, and the multiple object detection capability of SSD [63]. WPOD-NET's eight-channel feature map contains object and non-object probabilities and affine transformation parameters. The proposed network had 21 convolutional layers, of which 14 were in residual blocks and two were prior to the softmax and regression layers. They used the residual blocks presented in [29]. To detect and transform the object, their loss function comprised i) an affine transformation loss and ii) an object/non-object probability loss. For recognition, they used the YOLO network proposed in their previous work [89]. The authors tested the proposed algorithm on the AOLP-RP data set containing 611 images, SSIG containing 804 images extracted by the authors, OpenALPR (EU) containing 104 images, OpenALPR (Brazil) containing 108 images, and a new data set titled 'CD-HARD' containing 102 images. They reported a correct recognition rate of 93.5% for OpenALPR (EU), 91.2% for OpenALPR (Brazil), 88.6% for SSIG, 98.4% for AOLP-RP, and 75% for CD-HARD. The average recognition time reported by them is 200 msec on a Titan X GPU.

Duan et al. [18] proposed a real-time license plate recognition system without performing character extraction. The proposed system used color, texture, depth, and morphological features along with an SSD [63] for license plate detection. The VGG16-Atrous network based SSD was compared by the authors to YOLO and Faster RCNN, and they noticed that the performance of SSD made it a better option. They concluded this because the results produced by SSD were similar in quality to FRCNN while being computationally comparable to YOLO [18]. To improve the performance of SSD, the authors suggested using randomly clipped and flipped versions of the images along with the originals. For license plate recognition, the authors proposed using GoogLeNet and sped up its non-tensor layers, for real-time performance, by using a DeepRebirth [61] based model. The authors collected 17,500 real license-plate images comprising five plate colors, 34 Chinese characters, 24 English characters, and ten digits. The authors used 360 images to test the results of the detection algorithm and 6,688 samples to test recognition, and reported a correct recognition rate of 96.6% while taking approximately 24 msec to process one image. The structure of the proposed system is presented in Figure 9.

In [85], the author proposed using Capsule networks to recognize 31 Chinese characters, 24 English language characters (minus 'O' and 'I'), and ten digits. It is assumed that the input to the proposed neural network will be a license plate of size 272 × 272 pixels. The network comprises five layers: an input layer, convolution layer, capsule layer, character capsule layer, and fully connected layer, as shown in Figure 10. The character capsule layer consists of 65 character capsules, i.e., one for each character to be recognized. The author generated 160,000 training images, collected 3,000 real images for testing, and reported an accuracy of 89.6%.

In [83], the authors proposed using an encoder-decoder pair to extract features of license plates and eight parallel classifiers to recognize the number plate characters. They collected a database of 11,000 license plates and achieved an accuracy of 96% on 4,000 test images. The input to the encoder-decoder network pair is a grayscale image, while the target is a binary image. The segmented characters are passed to the eight parallel classifiers trained to recognize the characters. This is done by extracting features from the encoder and passing them to the classifier. One classifier identifies the ten numerals, while the other classifies the 19 characters. Images captured for the data set have 1280 × 960 resolution and are
FIGURE 9. The DNN architecture proposed in [18] passes the input image to a Single Shot Detector (SSD) for license plate detection. The SSD comprises a VGG-16 network followed by multiple convolution layers, with each layer contributing features for license plate detection. The detected region is passed to a layer of convolution and inception layers followed by a softmax layer for recognition of characters.
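Character separation by vertical projection, as used by Peng et al. above, sums a binarized plate column-wise and splits characters at the empty-column gaps between them. A minimal sketch (the min_width parameter is a hypothetical noise filter, not taken from the paper):

```python
import numpy as np

def segment_by_vertical_projection(binary_plate, min_width=2):
    """Split a binarized plate image into character column ranges.

    Columns whose sum is zero contain no character pixels; runs of
    non-empty columns between such gaps are treated as characters.
    """
    profile = binary_plate.sum(axis=0)            # vertical projection
    segments, start = [], None
    for x, v in enumerate(profile):
        if v > 0 and start is None:
            start = x                             # a character run begins
        elif v == 0 and start is not None:
            if x - start >= min_width:
                segments.append((start, x))       # [start, x) is one character
            start = None
    if start is not None and len(profile) - start >= min_width:
        segments.append((start, len(profile)))
    return segments

# Two 3-column "characters" separated by empty columns
plate = np.zeros((10, 12), dtype=int)
plate[:, 2:5] = 1
plate[:, 7:10] = 1
print(segment_by_vertical_projection(plate))  # [(2, 5), (7, 10)]
```

In practice the same idea applied row-wise (a horizontal projection) gives the plate's top and bottom borders.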
FIGURE 11. Encoder-decoder pair architecture proposed by Rakhshani et al. [83] for binarization of license plate images to separate the characters from the background. They propose using regression-based classifiers for recognition of the license plate characters.
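The encoder-decoder pair in Figure 11 is trained to map a grayscale plate to a binary image. Purely for illustration of what such a binary target looks like, a classical Otsu threshold can produce a comparable character/background separation; this is not the authors' method, which learns the mapping with a network:

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                      # probability of the "dark" class
    mu = np.cumsum(prob * np.arange(256))        # cumulative mean intensity
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.nanargmax(between))

def binarize_plate(gray):
    """Grayscale plate -> binary map separating characters from background."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)

gray = np.array([[10, 12, 200, 205],
                 [11, 10, 198, 202]], dtype=np.uint8)
print(binarize_plate(gray))  # bright pixels map to 1, dark pixels to 0
```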
a 1 × 1 convolution, batch normalization, LReLU, a 3 × 3 convolution layer, and batch normalization. In the second part of the network, a discriminator attempts to identify whether an image passed to it is the generator's output or a real image. Thus, the generator tries to deceive the discriminator while the discriminator tries not to be fooled. The discriminator proposed by Nguyen et al. comprised four 3×3 convolutional layers, batch normalization, and LReLU. This is the same architecture as the PatchGAN classifier. The authors used a data set of 500 images having resolution 128×128 pixels and a second data set of 80 images of size 128 × 192 pixels. The authors did not report the results, as the algorithm only produced a binarized deblurred image. The idea presented in [72] was inspired by the work of Svoboda et al., who proposed using a CNN to deblur license plate images [95]. Svoboda et al. proposed a 15-layer convolutional network comprising convolutional and ReLU layers. The network was trained using actual patches and artificially blurred patches. Their database comprised 140,000 images divided into 14,000 test images and 126,000 training images. The process does not detect license plates or segment them but only focuses on character recognition, and yields an accuracy of 91%.

In [104], Wu et al. tried to answer the question of the type and size of the data set required to train a deep neural network for license plate recognition. They generated synthetic data and augmented it using various data augmentation techniques to train and test a modified form of DenseNet [37]. Inspired by the CycleWGAN proposed in [101], the authors proposed to use an improved version of the CycleWGAN network based on WGAN-GP [28], which improved training stability. The license plates generated by the proposed network were cropped and flipped horizontally and vertically for data augmentation. The proposed DenseNet comprised a convolution layer, followed by two pairs of dense block and transition layers, followed by another dense layer. The last dense layer is connected to a fully connected layer consisting of 68 neurons for the class labels (31 Chinese, 26 English, ten digits, one blank). The network was trained using the Adam optimizer and stochastic gradient descent. To train the GAN, the authors used 1,000 fake license plates as the source and 100 real plates from the data set used in [101] as the target. They employed the CycleWGAN-GP network to obtain 80,000 license plates and then used dilation and erosion, motion blurring, uneven lighting, stretching, affine transformation, downsampling, and the addition of Gaussian noise to generate an augmented data set. The 200,000 images obtained are transformed to grayscale and inverted to obtain a total of 400,000 images. The authors reported that when the proposed DenseNet was trained with 300 real license plate images along with the generated and augmented images, the results were similar to a baseline method trained with 200,000 real images. The authors reported 97.5% detection and 99.3% character recognition accuracy. The authors also highlighted that data generation using CycleWGAN-GP produced an increase of 1% in character recognition accuracy, and data augmentation a further 5% increase in accuracy. On the AOLP data set, the authors reported recognition accuracy of 96.6%, 97.8%, and 91% for the AC, LE, and RP images of the data set.

In a recent work, Lee et al. proposed denoising and rectification networks to improve the quality of license plate recognition [56]. The proposed algorithm comprises prediction networks for denoising and rectification. They proposed auxiliary prediction networks for count classification and segment prediction, and recognition networks for text detection and classification. For denoising, the authors proposed to use a U-Net-based architecture [40] with the addition of skip connections to share low-level information across network layers. The U-Net-based encoder and decoder pair produce an output that is fed to the rectification network, which also comprises an encoder-decoder pair. The output from this network is denoised and rectified. The authors further proposed using binary segmentation and count estimation using the features extracted by the denoising and rectifier encoders. The authors used a segment decoder based on the U-Net architecture to produce a license plate segment with values indicating the probability of pixels belonging to the license plate. Similarly, the sum of features from the last layers of the denoising and rectifier encoders is fed to the counting decoder, which predicts the number of characters in the image. The authors tested the proposed method on the AOLP data set [36] and a Korean data set titled 'VTLP', comprising 10,650 images divided into 6,400/4,250 training/testing images. The authors reported an accuracy of 99.2% on AOLP-RP and 93.1% on VTLP.

Another interesting topic associated with license plate recognition is vehicle re-identification, which may require more than just license plate recognition. However, it is still an important constituent of the process. In 2016, Liu et al. discussed the issue of re-identification of a vehicle at
TABLE 6. Best F1 and accuracy scores reported in literature for different datasets.
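The F1 scores in Table 6 combine detection precision and recall into a single number. For reference, the generic definition (not tied to any specific paper's counting protocol) is:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall.

    tp: correctly detected plates, fp: false alarms, fn: missed plates.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g., 90 correct detections, 10 false alarms, 10 missed plates
print(round(f1_score(90, 10, 10), 3))  # 0.9
```

Because F1 is a harmonic mean, it penalizes methods that trade many false alarms for a slightly higher recall, which is why it is reported alongside plain accuracy.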
more recently, the focus has shifted to solving the complete chain of detection and recognition using deep networks. For that, authors generally choose between the RCNN, YOLO, and SSD algorithms for detecting license plates in images. The reasons for choosing among these methods are presented below.

The design of RCNN, region-based convolutional neural networks, focuses on object detection and localization. As the name indicates, it is a CNN capable of classifying objects in images with the additional capability of localizing them. RCNN speeds up the process of identifying objects by proposing around 2,000 candidate regions. Since 2,000 region proposals per image is a large number, RCNN based classification is generally slow. Its variant, Fast RCNN, takes an image as input instead of features; hence, the CNN convolutions have to be performed once for the image instead of for the 2,000 candidate regions, thus making it faster. However, both RCNN and Fast RCNN suffer from selective search, which is slow. Faster RCNN allows region proposals to be predicted by a parallel network, thus speeding up calculations. Variants of RCNN have improved prediction speed; however, only Faster RCNN can produce near real-time results. Their training requires multiple phases, and overall the networks are slow to predict.

To address these problems, the YOLO and SSD algorithms were proposed. As the name suggests, you only look
As the name suggests, You Only Look Once (YOLO) is an efficient algorithm for finding objects in images. Since objects are identified in different grid cells, and cells representing the same object are subsequently combined, the algorithm does not require multiple passes. This results in a fast algorithm. However, the resizing of the input image and the size of the grid determine whether the algorithm suffers from localization errors. Also, compared to other algorithms, its results are not very accurate.

Comparatively, the Single Shot Detector (SSD) performs better than YOLO in the localization of objects. SSD builds on the VGG-16 architecture while discarding its fully connected layers. The addition of auxiliary convolutional layers increases features at multiple spatial scales. SSD builds upon the philosophy of multibox-based bounding-box proposals but is also capable of classifying objects. SSD adds more boxes for identification compared to YOLO, resulting in higher accuracy at the cost of speed. VGG-16 takes most of the processing time, which may be more than 75% of the total. The use of multi-scale feature extraction benefits SSD in terms of localization; however, the downside is that smaller objects that do not appear across all feature maps may not get detected.

While covering license plate recognition methods, we observed that the three most frequent architecture choices were CNN, YOLO, and LSTMs. Some authors proposed using capsule networks or AlexNet with inception layers, or even simple logistic-regression-based classification, for recognition of characters on the detected license plate region. However, CNN, YOLO, and LSTMs remained the three main choices for the task of recognition.
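The localization-granularity issue noted above for YOLO, caused by input resizing combined with a fixed grid, can be made concrete with a small sketch. The 7x7 grid and 448-pixel input are the classic YOLO defaults, used here purely for illustration:

```python
def cell_for_center(cx, cy, img_w, img_h, grid=7, net_size=448):
    """Map an object's center (in original-image pixels) to the YOLO-style
    grid cell responsible for predicting it after resizing to net_size."""
    # The center coordinates scale along with the resized image.
    sx, sy = cx * net_size / img_w, cy * net_size / img_h
    cell = net_size / grid  # each cell covers this many resized pixels
    return int(sx // cell), int(sy // cell)
```

On a 1920x1080 frame, one grid cell covers roughly 274 pixels of original image width, so two plate centers 100 pixels apart, e.g. (900, 500) and (1000, 500), land in the same cell (3, 3); a coarser grid or heavier downscaling widens this ambiguity, which is one source of the localization errors discussed above.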
As for license plate detection and character recognition, YOLO-based methods performed efficiently with respect to time; however, the quality of their results was not as accurate as some CNN- or LSTM-based methods.

Convolutional neural networks, from LeNet-5 to date, have been specifically used for the task of character recognition. They are more successful in approaches where the license plate has already been detected and hence do not have to localize the sequence of characters. Thus, it was observed that in most of the cases where a CNN was proposed for character recognition, the license plate had already been detected. LSTMs have proven to be very efficient for text-based natural language processing (NLP). However, the problem of license plate text recognition is slightly different from NLP, since the characters appear randomly and nearly all combinations exist. This is unlike natural language processing, where the sequence of characters depends upon the language vocabulary. Therefore, the only advantage, in this context, that LSTMs may have is their ability to determine how many characters and numbers appear on a license plate and in which order. This can help reduce errors, especially when the digit '0' is mixed up with the letter 'O', and similarly '9' with 'g'.

It was also observed in the literature that instead of relying only on real data, the focus has shifted to exploiting synthetic data. This is generated using conventional image processing techniques or the more recently developed generative adversarial networks (GANs). To get better results, authors trained their networks on huge data sets comprising positive and negative examples of license plates and their characters. However, it seems that the trend of generating huge data sets will not continue, as the focus has shifted to fine-tuning for new applications instead of complete training. The idea is to train a huge common network on large amounts of data and then use application- or use-case-specific examples to fine-tune this common model. This has resulted in significant gains in both computer vision and natural language processing tasks.

VIII. CONCLUSION
Current state-of-the-art literature on license plate recognition has focused mainly on the detection, extraction, and recognition of license plates. Some authors have also focused on deblurring, denoising, and geometric transformations to improve recognition accuracy. Neural networks were initially employed to solve the individual problems of detection, extraction, or optical character recognition. However, later efforts focused on employing them for the complete process. In this aspect, researchers have proposed using two, three, four, and even five networks to identify if an image contains a license plate or not. Over the years, various authors have employed custom CNNs, generative adversarial networks, recurrent neural networks, and single-shot multibox detectors to solve the problem of license plate recognition. The focus on increasing the speed of these networks has revolved around the use and modification of the YOLO network and its evolution with time.

Compared to conventional deep learning strategies, the focus has recently shifted to semi-supervised learning. The current trend is to use large amounts of unsupervised data and a small amount of supervised data to improve model performance. This is done by optimizing two loss functions, one with supervised data and another with unsupervised data. The advantage of semi-supervised methods is the reduced time required for data annotation and preparation, as fewer data samples have to be marked manually. An interesting area to explore could be the minimum number of labeled samples required to train a license plate recognition system using a semi-supervised approach.

Transformers have recently seen much interest in the domain of natural language processing. Similarly, Google has introduced a vision transformer for image classification. Transformers address the problem of requiring large annotated/tagged data sets for supervised learning. They employ self-supervised learning, where large unlabeled data sets are used for training huge networks with billions of parameters [13]. These pre-trained models cannot be used directly for any specific task; however, re-trained or fine-tuned using a small data set, they can produce state-of-the-art results. Such networks have already been used for biomedical image classification [6] and fine-grained image classification [10]. Since many image data sets comprise characters and numbers, such pre-trained networks may be trained for license plate recognition by using a reduced fine-tuning data set.

Alongside these directions, methods like YOLO will continue to produce better results. Recently, N-YOLO [43] was proposed for real-time object detection and tracking. The idea was to use fixed-size image patches instead of reducing the image size. This approach may help in the license plate recognition task, since the license plate is generally a smaller object in the complete image frame. Most of the approaches in the literature are top-down, i.e., a significantly large list of objects is considered as hypotheses, rectangular boxes are identified, and then this list is reduced. Recently, the focus has shifted to bottom-up approaches as well, where objects emerge from combining sub-objects or parts. An example is CornerNet [55], where corners are detected first and objects are then detected based on the detected corners. HoughNet [86] comprises a CNN connected to three branches predicting the dimensions of the object bounding box, the object center location, and visual evidence scores calculated using log-polar voting. The reason HoughNet is interesting is that it has demonstrated promising results for the detection of smaller objects.

The algorithms presented in this work have increased the accuracies of license plate recognition considerably, so much so that for the Caltech data set an accuracy of 96% was achieved using a YOLO architecture. For the AOLP data set, an average accuracy of 99% was achieved using a CNN architecture. For the PKU data set, an average F1 score of 99% was achieved using RCNN and LSTM architectures. For the OpenALPR, UCSD, SSIG, KarPlate, and University of Zagreb data sets, accuracies of 96%, 97%, 97%, 98%, and 97% were
achieved using YOLO-type networks, while for CLPD an accuracy of 77% was achieved using an Xception CNN and LSTM architecture.

License plate recognition has proven to be extremely beneficial in tasks related to access control, traffic monitoring, automatic violation ticketing, paid metering, toll booth ticketing, pollution control, and security and surveillance. The birth of smart cities will require more and more automation, and this will require license plate recognition algorithms that are both efficient and streamlined. Growing numbers of cameras will increase the need for re-identification of cars in order to map their trajectories. An open challenge, with regard to license plate recognition, is posed by those countries where standardized license plates are still not used by all vehicles.

ACKNOWLEDGMENT
The authors would like to thank the Deanship of Scientific Research (DSR), University of Jeddah, for technical and financial support.

REFERENCES
[1] C. Anagnostopoulos, I. Anagnostopoulos, V. Loumos, and E. Kayafas, "A license plate-recognition algorithm for intelligent transportation system applications," IEEE Trans. Intell. Transp. Syst., vol. 7, no. 3, pp. 377–392, Sep. 2006.
[2] C.-N. E. Anagnostopoulos, I. E. Anagnostopoulos, I. D. Psoroulas, V. Loumos, and E. Kayafas, "License plate recognition from still images and video sequences: A survey," IEEE Trans. Intell. Transp. Syst., vol. 9, no. 3, pp. 377–391, Sep. 2008.
[3] C. E. Anagnostopoulos, "License plate recognition: A brief tutorial," IEEE Intell. Transp. Syst. Mag., vol. 6, no. 1, pp. 59–67, Spring 2014.
[4] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein generative adversarial networks," in Proc. Int. Conf. Mach. Learn., vol. 70, Aug. 2017, pp. 214–223.
[5] M. Arsenovic, S. Sladojevic, A. Anderla, and D. Stefanovic, "Deep learning driven plates recognition system," in Proc. 17th Int. Sci. Conf. Ind. Syst., 2017, pp. 1–4.
[6] S. Azizi, B. Mustafa, F. Ryan, Z. Beaver, J. Freyberg, J. Deaton, A. Loh, A. Karthikesalingam, S. Kornblith, T. Chen, V. Natarajan, and M. Norouzi, "Big self-supervised models advance medical image classification," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 3458–3468.
[7] S. Barreto, J. Lambert, and F. Vidal, "Using synthetic images for deep learning recognition process on automatic license plate recognition," in Proc. Mex. Conf. Pattern Recognit. Cham, Switzerland: Springer, 2019, pp. 115–126.
[8] T. Björklund, A. Fiandrotti, M. Annarumma, G. Francini, and E. Magli, "Automatic license plate recognition with convolutional neural networks trained on synthetic data," in Proc. IEEE 19th Int. Workshop Multimedia Signal Process. (MMSP), Oct. 2017, pp. 1–6.
[9] T. Björklund, A. Fiandrotti, M. Annarumma, G. Francini, and E. Magli, "Robust license plate recognition using neural networks trained on synthetic images," Pattern Recognit., vol. 93, pp. 134–146, Sep. 2019.
[10] F. Al Breiki, M. Ridzuan, and R. Grandhe, "Self-supervised learning for fine-grained image classification," 2021, arXiv:2107.13973.
[11] H. Caner, H. S. Gecim, and A. Z. Alkar, "Efficient embedded neural-network-based license plate recognition system," IEEE Trans. Veh. Technol., vol. 57, no. 5, pp. 2675–2683, Sep. 2008.
[12] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, Apr. 2018.
[13] D. Chen, Y. Chen, Y. Li, F. Mao, Y. He, and H. Xue, "Self-supervised learning for few-shot image classification," in Proc. IEEE ICASSP, Jun. 2021, pp. 1745–1749.
[14] J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, "Attention based models for speech recognition," in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2015, pp. 577–585.
[15] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: Object detection via region-based fully convolutional networks," in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 379–387.
[16] L. Dlagnekov and S. Belongie. (2005). UCSD Dataset. Accessed: Apr. 7, 2020. [Online]. Available: http://vision.ucsd.edu/belongie-grp/research/carRec/car_data.html
[17] S. Du, M. Ibrahim, M. Shehata, and W. Badawy, "Automatic license plate recognition (ALPR): A state-of-the-art review," IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 2, pp. 311–325, Feb. 2013.
[18] N. Duan, J. Cui, L. Liu, and L. Zheng, "An end to end recognition for license plates using convolutional neural networks," IEEE Intell. Transp. Syst. Mag., vol. 13, no. 2, pp. 177–188, Summer 2021.
[19] M. Earl. (2016). Deep APNR. Accessed: Nov. 7, 2022. [Online]. Available: https://github.com/matthewearl/deep-anpr
[20] Q. Fu, Y. Shen, and Z. Guo, "License plate detection using deep cascaded convolutional neural networks in complex scenes," in Proc. Int. Conf. Neural Inf. Process. Cham, Switzerland: Springer, 2017, pp. 696–706.
[21] J. Gao, L. Sun, and M. Cai, "Quantifying privacy vulnerability of individual mobility traces: A case study of license plate recognition data," Transp. Res. C, Emerg. Technol., vol. 104, pp. 78–94, Jul. 2019.
[22] G. Goncalves, S. Da Silva, D. Menotti, and W. Schwartz, "Benchmark for license plate character segmentation," J. Electron. Imag., vol. 25, no. 5, pp. 34–53, 2016.
[23] G. R. Gonçalves, M. A. Diniz, R. Laroca, D. Menotti, and W. R. Schwartz, "Real-time automatic license plate recognition through deep multi-task networks," in Proc. 31st SIBGRAPI Conf. Graph., Patterns Images (SIBGRAPI), Oct. 2018, pp. 110–117.
[24] Y. Gong. (2022). Chinese Road Plate Dataset. Accessed: Nov. 7, 2022. [Online]. Available: https://github.com/yxgong0/CRPD
[25] Y. Gong, L. Deng, S. Tao, X. Lu, P. Wu, Z. Xie, Z. Ma, and M. Xie, "Unified Chinese license plate detection and recognition with high efficiency," J. Vis. Commun. Image Represent., vol. 86, Jul. 2022, Art. no. 103541.
[26] C. Gou, K. Wang, B. Li, and F.-Y. Wang, "Vehicle license plate recognition based on class-specific ERs and SaE-ELM," in Proc. 17th Int. IEEE Conf. Intell. Transp. Syst. (ITSC), Qingdao, China, Oct. 2014, pp. 2956–2961.
[27] C. Gou, K. Wang, Y. Yao, and Z. Li, "Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 4, pp. 1096–1107, Apr. 2016.
[28] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, "Improved training of Wasserstein GANs," in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2017, pp. 5767–5777.
[29] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[30] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2980–2988.
[31] H. A. Hegt, R. J. De La Haye, and N. A. Khan, "A high performance license plate recognition system," in Proc. IEEE Int. Conf. Syst., Man, Cybern., Feb. 1998, pp. 4357–4362.
[32] G. Heitz, S. Gould, A. Saxena, and D. Koller, "Cascaded classification models: Combining models for holistic scene understanding," in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2009, pp. 641–648.
[33] R.-C. Chen, "A new method for license plate character detection and recognition," in Proc. 6th Int. Conf. Inf. Technol., IoT Smart City, Dec. 2018, pp. 204–208.
[34] R.-C. Chen, "Automatic license plate recognition via sliding-window darknet-YOLO deep learning," Image Vis. Comput., vol. 87, pp. 47–56, Jul. 2019.
[35] C. Henry, S. Y. Ahn, and S. Lee, "Multinational license plate recognition using generalized character sequence detection," IEEE Access, vol. 8, pp. 35185–35199, 2020, doi: 10.1109/ACCESS.2020.2974973.
[36] G. Hsu, J. Chen, and Y. Chung, "Application-oriented license plate recognition," IEEE Trans. Veh. Technol., vol. 62, no. 2, pp. 552–561, Feb. 2013.
[37] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2261–2269.
[38] Q. Huang, Z. Cai, and T. Lan, "A single neural network for mixed style license plate detection and recognition," IEEE Access, vol. 9, pp. 21777–21785, 2021.
[39] F. Iandola, S. Han, M. Moskewicz, K. Ashraf, W. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size," 2016, arXiv:1602.07360.
[40] P. Isola, J. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5967–5976.
[41] D. M. F. Izidio, A. P. A. Ferreira, H. R. Medeiros, and E. N. D. S. Barros, "An embedded automatic license plate recognition system using deep learning," Design Autom. Embedded Syst., vol. 24, no. 1, pp. 23–43, Mar. 2020.
[42] M. Jaderberg, K. Simonyan, and A. Zisserman, "Spatial transformer networks," in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2015, pp. 2017–2025.
[43] S. Jha, C. Seo, E. Yang, and G. Joshi, "Real time object detection and tracking system for video surveillance system," Multimedia Tools Appl., vol. 80, pp. 3981–3996, Jan. 2020.
[44] X. Jin, R. Tang, L. Liu, and J. Wu, "Vehicle license plate recognition for fog-haze environments," IET Image Process., vol. 15, no. 6, pp. 1273–1284, May 2021.
[45] I. R. Khan, S. T. A. Ali, A. Siddiq, M. M. Khan, M. U. Ilyas, S. Alshomrani, and S. Rahardja, "Automatic license plate recognition in real-world traffic videos captured in unconstrained environment by a mobile camera," Electronics, vol. 11, no. 9, p. 1408, Apr. 2022.
[46] I. Kilic and G. Aydin, "Turkish vehicle license plate recognition using deep learning," in Proc. Int. Conf. Artif. Intell. Data Process. (IDAP), Sep. 2018, pp. 1–5.
[47] S. G. Kim, H. G. Jeon, and H. I. Koo, "Deep-learning-based license plate detection method using vehicle region extraction," Electron. Lett., vol. 53, no. 15, pp. 1034–1036, Jul. 2017.
[48] B. Kim, T. Won, S. Park, and J. Heo, "Anomaly detection for deep-learning based license plate recognition in real time video," in Proc. Conf. Res. Adapt. Convergent Syst., Sep. 2019, pp. 123–124.
[49] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[50] F. D. Kurpiel, R. Minetto, and B. T. Nassu, "Convolutional neural networks for license plate detection in images," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 3395–3399.
[51] R. Laroca, E. Severo, L. A. Zanlorensi, L. S. Oliveira, G. R. Gonçalves, W. R. Schwartz, and D. Menotti, "A robust real-time automatic license plate recognition based on the YOLO detector," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2018, pp. 1–10.
[52] R. Laroca, L. A. Zanlorensi, G. R. Gonçalves, E. Todt, W. R. Schwartz, and D. Menotti, "An efficient and layout-independent automatic license plate recognition system based on the YOLO detector," 2019, arXiv:1909.01754.
[53] R. Laroca, E. Cardoso, D. Lucio, V. Estevam, and D. Menotti, "On the cross-dataset generalization in license plate recognition," in Proc. 17th Int. Joint Conf. Comput. Vis., Imag. Comput. Graph. Theory Appl., 2022, pp. 166–178.
[54] R. Laroca, M. Santos, V. Estevam, E. Luz, and D. Menotti, "A first look at dataset bias in license plate recognition," 2022, arXiv:2208.10657.
[55] H. Law and J. Deng, "CornerNet: Detecting objects as paired keypoints," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 734–750.
[56] Y. Lee, J. Lee, H. Ahn, and M. Jeon, "SNIDER: Single noisy image denoising and rectification for improving license plate recognition," in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshop (ICCVW), Oct. 2019, pp. 1017–1026.
[57] Y. Y. Lee, Z. A. Halim, and M. N. Ab Wahab, "License plate detection using convolutional neural network-back to the basic with design of experiments," IEEE Access, vol. 10, pp. 22577–22585, 2022.
[58] J. Li, C. Niu, and M. Fan, "Multi-scale convolutional neural networks for natural scene license plate detection," in Proc. Int. Symp. Neural Netw. Cham, Switzerland: Springer, 2012, pp. 110–119.
[59] H. Li and C. Shen, "Reading car license plates using deep convolutional neural networks and LSTMs," 2016, arXiv:1601.05610.
[60] H. Li, P. Wang, and C. Shen, "Toward end-to-end car license plate detection and recognition with deep neural networks," IEEE Trans. Intell. Transp. Syst., vol. 20, no. 3, pp. 1126–1136, Mar. 2019.
[61] D. Li, X. Wang, and D. Kong, "DeepRebirth: Accelerating deep neural network execution on mobile devices," in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 1–9.
[62] J. Liang, G. Chen, Y. Wang, and H. Qin, "EGSANet: Edge-guided sparse attention network for improving license plate detection in the wild," Appl. Intell., vol. 52, no. 4, pp. 4458–4472, 2022.
[63] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. Berg, "SSD: Single shot MultiBox detector," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2016, pp. 21–37.
[64] X. Liu, W. Liu, T. Mei, and H. Ma, "A deep learning-based approach to progressive vehicle re-identification for urban surveillance," in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2016, pp. 869–884.
[65] X. Liu, W. Liu, H. Ma, and H. Fu, "Large-scale vehicle re-identification in urban surveillance videos," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2016, pp. 1–6.
[66] J. Liu, X. Li, H. Zhang, C. Liu, L. Dou, and L. Ju, "An implementation of number plate recognition without segmentation using convolutional neural network," in Proc. IEEE 19th Int. Conf. High Perform. Comput. Commun., IEEE 15th Int. Conf. Smart City, IEEE 3rd Int. Conf. Data Sci. Syst. (HPCC/SmartCity/DSS), Dec. 2017, pp. 246–253.
[67] N. Lu, W. Yang, A. Meng, Z. Xu, H. Huang, and L. Huang, "Automatic recognition for arbitrarily tilted license plate," in Proc. 2nd Int. Conf. Video Image Process., Dec. 2018, pp. 23–28.
[68] S. Z. Masood, G. Shu, A. Dehghan, and E. G. Ortiz, "License plate detection and recognition using deeply learned convolutional neural networks," 2017, arXiv:1703.07330.
[69] A. Meng, W. Yang, Z. Xu, H. Huang, L. Huang, and C. Ying, "A robust and efficient method for license plate recognition," in Proc. 24th Int. Conf. Pattern Recognit. (ICPR), Aug. 2018, pp. 1713–1718.
[70] N. M. Murad, L. Rejeb, and L. B. Said, "The use of DCNN for road path detection and segmentation," Iraqi J. Comput. Sci. Math., vol. 3, no. 2, pp. 119–127, 2022.
[71] A. Newell, K. Yang, and J. Deng, "Stacked hourglass networks for human pose estimation," in Proc. Eur. Conf. Comput. Vis. (ECCV). Cham, Switzerland: Springer, 2016, pp. 483–499.
[72] V. Nguyen and D. L. Nguyen, "Joint image deblurring and binarization for license plate images using deep generative adversarial networks," in Proc. 5th NAFOSTED Conf. Inf. Comput. Sci. (NICS), Nov. 2018, pp. 430–435.
[73] J. A. G. Nijhuis, M. H. T. Brugge, K. A. Helmholt, J. P. W. Pluim, L. Spaanenburg, R. S. Venema, and M. A. Westenberg, "Car license plate recognition with neural networks and fuzzy logic," in Proc. Int. Conf. Neural Netw., 1995, pp. 2232–2236.
[74] OpenALPR. (2016). OpenALPR-EU Dataset. [Online]. Available: https://github.com/openalpr/benchmarks/tree/master/endtoend/eu
[75] H. Padmasiri, J. Shashirangana, D. Meedeniya, O. Rana, and C. Perera, "Automated license plate recognition for resource-constrained environments," Sensors, vol. 22, no. 4, p. 1434, Feb. 2022.
[76] M. Peker, "Comparison of TensorFlow object detection networks for licence plate localization," in Proc. 1st Global Power, Energy Commun. Conf. (GPECOM), Jun. 2019, pp. 101–105.
[77] X. Peng, L. Wen, D. Bai, and B. Peng, "Reformative vehicle license plate recognition algorithm based on deep learning," in Proc. Int. Conf. Cognit. Syst. Signal Process. Cham, Switzerland: Springer, 2018, pp. 243–255.
[78] I. V. Pustokhina, D. A. Pustokhin, J. J. P. C. Rodrigues, D. Gupta, A. Khanna, K. Shankar, C. Seo, and G. P. Joshi, "Automatic vehicle license plate recognition using optimal K-means with convolutional neural network for intelligent transportation systems," IEEE Access, vol. 8, pp. 92907–92917, 2020.
[79] J. Qian and B. Qu, "Fast license plate recognition method based on competitive neural network," in Proc. 3rd Int. Conf. Commun., Inf. Manag. Netw. Secur. (CIMNS), 2018, pp. 114–117.
[80] S. Qin and S. Liu, "Efficient and unified license plate recognition via lightweight deep neural network," IET Image Process., vol. 14, no. 16, pp. 4102–4109, Dec. 2020.
[81] M. Rahmani, M. Sabaghian, S. M. Moghadami, M. M. Talaie, M. Naghibi, and M. A. Keyvanrad, "IR-LPR: Large scale of Iranian license plate recognition dataset," 2022, arXiv:2209.04680.
[82] S. Radzi and M. Khalil-Hani, "Character recognition of license plate number using convolutional neural networks," in Proc. IVIC, 2011, pp. 45–55.
[83] S. Rakhshani, E. Rashedi, and H. Nezamabadi-Pour, "Representation learning in a deep network for license plate recognition," Multimedia Tools Appl., vol. 79, pp. 13267–13289, May 2020, doi: 10.1007/s11042-019-08416-0.
[84] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[85] H. Rui, "License plate recognition using capsule networks with an improved dynamic routing algorithm," M.S. thesis, Dept. Comput. Sci., Hanyang Univ., South Korea, 2019.
[86] N. Samet, S. Hicsonmez, and E. Akbas, ‘‘HoughNet: Integrating near and [109] F. You, Y. Zhao, and X. Wang, ‘‘Combination of CNN with GRU for plate
long-range evidence for bottom-up object detection,’’ in Proc. Eur. Conf. recognition,’’ J. Phys., Conf., vol. 1187, no. 3, Apr. 2019, Art. no. 032008.
Comput. Vis. (ECCV), 2020, pp. 406–423. [110] D. Zang, Z. Chai, J. Zhang, D. Zhang, and J. Cheng, ‘‘Vehicle license plate
[87] Z. Selmi, M. B. Halima, U. Pal, and M. A. Alimi, ‘‘DELP-DAR system for recognition using visual attention model and deep learning,’’ J. Electron.
license plate detection and recognition,’’ Pattern Recognit. Lett., vol. 129, Imag., vol. 24, no. 3, pp. 1–10, May 2015.
pp. 213–223, Jan. 2020. [111] M. Zha, G. Meng, C. Lin, Z. Zhou, and K. Chen, ‘‘RoLMA: A practical
[88] Y. Shen, T. Xiao, H. Li, S. Yi, and X. Wang, ‘‘Learning deep neural adversarial attack against deep learning-based LPR systems,’’ in Proc. Inf.
networks for vehicle re-ID with visual-spatio-temporal path proposals,’’ Secur. Cryptol., 15th Int. Conf. (Inscrypt). Cham, Switzerland: Springer,
in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 1918–1927. 2020, pp. 101–117.
[89] S. Montazzolli and C. Jung, ‘‘Real-time Brazilian license plate detection [112] L. Zhang, P. Wang, H. Li, Z. Li, C. Shen, and Y. Zhang, ‘‘A robust
and recognition using deep convolutional neural networks,’’ in Proc. attentional framework for license plate recognition in the wild,’’ IEEE
30th SIBGRAPI Conf. Graph., Patterns Images (SIBGRAPI), Oct. 2017, Trans. Intell. Transp. Syst., vol. 22, no. 11, pp. 6967–6976, Nov. 2021,
pp. 55–62. doi: 10.1109/TITS.2020.3000072.
[90] S. Silva and C. Jung, ‘‘License plate detection and recognition in uncon- [113] X. Zhai, F. Bensaali, and R. Sotudeh, ‘‘OCR-based neural network
strained scenarios,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, for ANPR,’’ in Proc. IEEE Int. Conf. Imag. Syst. Techn., Jul. 2012,
pp. 580–596. pp. 393–397.
[91] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for [114] Y. Zhao, Z. Yu, X. Li, and M. Cai, ‘‘Chinese license plate image database
large-scale image recognition,’’ 2014, arXiv:1409.1556. building methodology for license plate recognition,’’ J. Electron. Imag.,
[92] J. Spanhel, J. Sochor, R. Juránek, A. Herout, L. Marsík, and P. Zemcík, vol. 28, Jan. 2019, Art. no. 013001.
‘‘Holistic recognition of low quality license plates by CNN using track [115] Y. Zhang and Q. Yang, ‘‘A survey on multi-task learning,’’ 2017,
annotated data,’’ in Proc. 14th IEEE Int. Conf. Adv. Video Signal Based arXiv:1707.08114.
Surveill. (AVSS), Aug. 2017, pp. 1–6. [116] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, ‘‘Scalable
[93] J. Spanhel, J. Sochor, R. Juránek, and A. Herout, ‘‘Geometric alignment person re-identification: A benchmark,’’ in Proc. IEEE Int. Conf. Comput.
by deep learning for recognition of challenging license plates,’’ in Proc. Vis. (ICCV), Dec. 2015, pp. 1116–1124.
21st Int. Conf. Intell. Transp. Syst. (ITSC), Nov. 2018, pp. 3524–3529. [117] S. Zherzdev and A. Gruzdev, ‘‘LPRNet: License plate recognition via
[94] V. Srebric. (2003). EnglishLP Database. Accessed: Oct. 1, 2021. deep neural networks,’’ 2018, arXiv:1806.10447.
[Online]. Available: http://www.zemris.fer.hr/projects/LicensePlates/ [118] W. Zhou, H. Li, Y. Lu, and Q. Tian, ‘‘Principal visual word discovery for
english/baza_slika.zip automatic license plate detection,’’ IEEE Trans. Image Process., vol. 21,
[95] P. Svoboda, M. Hradis, L. Marsík, and P. Zemcík, ‘‘CNN for license no. 9, pp. 4269–4279, Sep. 2012.
plate motion deblurring,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), [119] J. Zhu, T. Park, P. Isola, and A. Efros, ‘‘Unpaired image-to-image trans-
Sep. 2016, pp. 3832–3836. lation using cycle-consistent adversarial networks,’’ in Proc. IEEE Int.
[96] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, ‘‘Inception-v4, Conf. Comput. Vis. (ICCV), 2017.
inception-ResNet and the impact of residual connections on learning,’’ [120] J. Zhuang, S. Hou, Z. Wang, and Z. Zha, ‘‘Towards human-level license
in Proc. 31st AAAI Conf. Artif. Intell., 2017, pp. 1–7. plate recognition,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018,
[97] T. Tsai, Z. Lu, and C. Huang, ‘‘License plate recognition system based on pp. 306–321.
deep learning,’’ in Proc. IEEE Int. Conf. Consum. Electron., May 2019, [121] Y. Zou, Y. Zhang, J. Yan, X. Jiang, T. Huang, H. Fan, and Z. Cui, ‘‘License
pp. 1–2. plate detection and recognition based on YOLOv3 and ILPRNET,’’ Sig-
[98] S. Usmankhujaev, S. Lee, and J. Kwon, ‘‘Korean license plate recognition nal, Image Video Process., vol. 16, no. 2, pp. 473–480, Mar. 2022.
system using combined neural networks,’’ in Proc. Int. Symp. Distrib.
Comput. Artif. Intell. Cham, Switzerland: Springer, 2019, pp. 10–17.
[99] S. Usmankhujaev. (2021). Korean Car Plate Generator. Accessed:
Nov. 7, 2022. [Online]. Available: https://github.com/Usmankhujaev/
KoreanCarPlateGenerator
[100] T. Vaiyapuri, S. N. Mohanty, M. Sivaram, I. V. Pustokhina,
D. A. Pustokhin, and K. Shankar, ‘‘Automatic vehicle license plate
recognition using optimal deep learning model,’’ Comput., Mater.
Continua, vol. 67, no. 2, pp. 1881–1897, 2021.
[101] X. Wang, Z. Man, M. You, and C. Shen, ‘‘Adversarial generation of train-
ing examples: Applications to moving vehicle license plate recognition,’’
2017, arXiv:1707.03124.
[102] Y. Wang, Z. Bian, Y. Zhou, and L. Chau, ‘‘Rethinking and design-
MUHAMMAD MURTAZA KHAN (Member, IEEE) received the B.Sc. degree (Hons.) in electrical engineering from the University of Engineering and Technology Taxila, Taxila, Pakistan, in 2000, the M.S. degree in computer software engineering from the EME College Rawalpindi, National University of Sciences and Technology, Islamabad, Pakistan, in 2005, and the M.S. and Ph.D. degrees in image processing from the Institut National Polytechnique de Grenoble, Grenoble, France, in 2006 and 2009, respectively. From 2000 to 2004, he was a Software Engineer and a Senior Software Engineer with Streaming Networks Pvt. Ltd., Islamabad, where his responsibilities revolved around developing video drivers and implementing optimized video codec solutions for the Philips TriMedia processor. He has been an Associate Professor in computer science and artificial intelligence with the College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia, since 2015, and an Assistant Professor in electrical engineering with the School of Electrical Engineering and Computer Science, National University of Sciences and Technology, since 2010.
MUHAMMAD U. ILYAS (Senior Member, IEEE) received the B.E. degree (Hons.) in electrical engineering from the National University of Sciences and Technology, Islamabad, Pakistan, in 1999, the M.S. degree in computer engineering from the Lahore University of Management Sciences, Lahore, Pakistan, in 2004, and the M.S. and Ph.D. degrees in electrical engineering from Michigan State University, East Lansing, MI, USA, in 2007 and 2009, respectively. He was a Postdoctoral Research Associate appointed jointly by the Electrical and Computer Engineering Department and the Computer Science and Engineering Department, Michigan State University, where he worked under the joint supervision of Dr. H. Radha and Dr. A. Liu. He has been an Assistant Professor and the Program Director of PG programs with the University of Birmingham Dubai, United Arab Emirates, since 2022. Before that, he was an Associate Professor with the Department of Computer and Network Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia, beginning in 2016, and an Assistant Professor in electrical engineering with the School of Electrical Engineering and Computer Science, National University of Sciences and Technology, beginning in 2011.

SALEH M. ALSHOMRANI received the bachelor's (B.Sc.) degree in computer science from King Abdulaziz University, Saudi Arabia, in 1997, the master's degree in computer science from Ohio University, USA, in 2001, and the Ph.D. degree in computer science from Kent State University, OH, USA, in 2008. He is currently a Professor with the Information Systems Department, University of Jeddah. His research interests include web distributed systems and internet computing, data mining, and algorithms.