A Single Neural Network For Mixed Style License Plate Detection and Recognition

The document presents ALPRNet, a single neural network designed for the detection and recognition of mixed style license plates (LPs). Unlike traditional methods that treat LP detection and character recognition as separate tasks, ALPRNet integrates both processes into a unified framework, enhancing performance and efficiency. Experimental results demonstrate that ALPRNet achieves state-of-the-art results on various datasets, supporting multiple LP styles and configurations.

Uploaded by

ayush21cse

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views9 pages

A Single Neural Network For Mixed Style License Plate Detection and Recognition

Uploaded by

ayush21cse

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

Received December 14, 2020, accepted January 11, 2021, date of publication January 28, 2021, date of current

version February 8, 2021.

Digital Object Identifier 10.1109/ACCESS.2021.3055243

A Single Neural Network for Mixed Style License

Plate Detection and Recognition
QIUYING HUANG, ZHANCHUAN CAI , (Senior Member, IEEE), AND TING LAN
Faculty of Information Technology, Macau University of Science and Technology, Macau 999078, China
Corresponding author: Zhanchuan Cai ([email protected])
This work was supported in part by the Science and Technology Development Fund of Macau under Grants 0052/2020/AFJ, 0038/2020/A,
and 0069/2018/A2.

ABSTRACT Most existing methods for automatic license plate recognition (ALPR) focus on a specific
license plate (LP) type, but little work focuses on multiple or mixed LPs. This article proposes a single
neural network called ALPRNet for detection and recognition of mixed style LPs. In ALPRNet, two fully
convolutional one-stage object detectors are used to detect and classify LPs and characters simultaneously,
which are followed by an assembly module to output the LP strings. ALPRNet treats LP and character
equally, object detectors directly output bounding boxes of LPs and characters with corresponding labels,
so they avoid the recurrent neural network (RNN) branches of optical character recognition (OCR) of the
existing recognition approaches. We evaluate ALPRNet on a mixed LP style dataset and two datasets with
single LP style, the experimental results show that the proposed network achieves state-of-the-art results
with a simple one-stage network.

INDEX TERMS ALPRNet, license plate recognition, object recognition, convolutional neural network.

I. INTRODUCTION recognition, which are implemented sequentially. The widely

From paid-parking to traffic control and toll violations, auto- used CNN-based methods usually employ object detec-
matic license plate recognition (ALPR) is one of the key com- tors to detect LPs [10]–[15], such as you only look once
ponents of many traffic-related applications [1]–[6]. There (YOLO), faster region-based convolutional neural network
has considerable work on ALPR in the past decade, but (Faster R-CNN) and mask region-convolutional neural net-
most of the existing approaches focus on specific types of work (Mask R-CNN). For the training error accumulation,
license plates (LPs), and there is little work supporting multi- the independent sequential tasks result in a sub-optimization
ple or mixed LPs [7]–[9]. The growth of cross-boder commu- problem. Indeed, the tasks of LP detection and character
nication and exchanges have promoted the demand of ALPR recognition can work collaboratively by providing context
systems that support multiple and mixed style LPs. In the and detail information to each other, and they can be con-
cities in the China’s Pearl River Delta Zone, vehicles may ducted simultaneously to improve performance.
carry up three LPs with different styles. This article focuses Recently, more and more end-to-end ALPR networks have
on multiple and mixed LPs recognition and introduces a new been developed, which can complete the tasks of LP detec-
ALPRNet framework that integrates the LP detection and tion and character recognition by using a single unified
character recognition into a single neural network. ALPRNet deep neural network [16]. Since these methods alway com-
treats LPs and characters as basic elements to detect and bine the region-based convolutional neural network with an
classify, and the tasks of LP and character detection and RNN-based sequential model, they have some problems:
classification are conduct simultaneously in one pass. 1) the RNN-based part of the network consumes high com-
Most of existing ALPR methods consider LP recogni- putational cost, such that the detection part of the network
tion as two independent tasks: LP detection and character is difficult to optimize; 2) the mainstream detectors of the
region-based object detection, such as Faster R-CNN, single
The associate editor coordinating the review of this manuscript and shot detector (SSD), and YOLO, are anchor-based detec-
approving it for publication was Songwen Pei . tors, they rely on a set of pre-defined anchor boxes. The

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021 21777
Q. Huang et al.: Single Neural Network for Mixed Style LP Detection and Recognition

performance of these object detection networks is sensitive construct conditional random field (CRF) models on the
to their size, aspect ratios, and the number of anchor boxes, candidate characters to represent the relationship among can-
moreover, anchor boxes involve complicated computation didate characters. And it localize LPs through the belief
of intersection over union (IoU) scored with ground-truth propagation inference on CRF. In [22], stroke width trans-
bounding boxes. form (SWT) and MSER are combined to detect character
In this article, we propose an ALPR network called regions and LPs by probabilistic Hough transform. Character
ALPRNet, wherein a single neural network conducts LP first methods have received a high recall, but they are easily
detection and recognition simultaneously, and the characters disturbed by background text.
are considered as objects that need to be detect and clas-
sify. Such scheme predicts the bounding boxes and labels of B. ALPR WITH DEEP LEARNING
LPs and characters directly and gets rid of the complicated In order to achieve high accuracy, more and more ALPR
RNN-based recognition procedure with the region of interest work turns to use CNNs to detect and recognize LPs. Most
(RoI) pooling and crop procedures of object detection. And of these methods are based on the traditional approach with
the contributions of this work are concluded as follows: two stages, i.e., the detection network based on the generic
1). We design a novel one-stage network for LP detection object detection methods (such as YOLO, Faster R-CNN,
and recognition, wherein two parallel branches of object and SSD-based scheme) is used to detect LPs, and the
detection and classification are introduced, they directly recurrent neural network is used as the recognition net-
produce the LP type and the LP string excluding the work of OCR. In [14], the authors proposed a complete
redundant and intermediate steps. ALPR scheme that employs three subnetworks to conduct
2). We introduce the classification branch of LPs to support vehicle location, LP detection, and character recognition.
multiple and mixed style LPs. A YOLOv2 [23] based subnetwork is used to detect vehicle,
3). We use the bounding box of LPs and characters to wherein a novel CNN called WPOD-NET is created for LP
assemble the LPs, thus ALPRNet naturally supports detection and affine transformation regress, and a modified
multi-line and variable length of LPs. YOLO network served as OCR module for character recog-
nition. In [13], a real-time ALPR system is proposed, which
The rest of the article is organized as follows: Section II uses a YOLO-based network for vehicle and LP detection,
gives a brief review of detection and related work. Section III and a CR-NET [24] network is used for character segmenta-
presents a detail description of the proposed network. tion and recognition. In [25], CNN-based text detectors are
Section IV shows the experimental results of the proposed used to search character regions in digital images and cluster
network. The conclusion of this work is given in Section V. them by edge information to construct LP, and a RNN with
connectionist temporal classification (CTC) is employed to
II. RELATED WORK label and recognize characters.
A. ALPR WITHOUT DEEP LEARNING In addition, some end-to-end ALPR methods have been
The existing ALPR methods can be classified into two cate- proposed by using united network to perform LP detection
gories: LP first and character first approaches. A traditional and character recognition, but they still include sequential
two-stage ALPR procedure uses the features of edge, color, stages for LP detection and OCR. In [26], a method uses
and texture to detect the location of LP, and then performs VGG16 network [27] as backbone network and employs a
character segmentation and features extraction followed region proposal network (RPN) on different layers of feature
by machine learning classifiers to identify each character. pyramid network (FPN) to generate LP proposals and regress
Yuan et al. [17] proposed a line density filter to connect the bounding box, and RNNs with CTC is used to recognize
regions by edge density from binary images. In [18], edges characters.
are clustered by Expectation-Maximization algorithm for LP
detection. Ashtari et al. [19] developed a method of LP detec- C. SCENE TEXT SPOTTING WITH DEEP LEARNING
tion that analyzes pixels by a color geometric template via Currently, many scene text spotting (STS) approaches based
strip search. Yu et al. [20] employed a wavelet transform to on deep learning have been proposed, but most of them
produce horizontal and vertical features of an image and used employ generic object detectors (such as Faster R-CNN, SSD,
empirical mode decomposition (EMD) analysis to determine and YOLO) to detect and regress text instances directly, or to
the position of a LP. These methods are suitable for LPs predict text/non-text probability of each pixel by semantic
that have distinguished features, but they are too sensitive to segmentation methods (such as fully convolutional network
complex images. (FCN) [28]. Li et al. [29] proposed an end-to-end approach
Character first methods try to find character regions in that integrates text detection and recognition into a uni-
the image and then cluster these regions based on the graph fied framework, which consists a tailored region proposal
semantics to construct LPs. The graphic semantics includes network for text detection and an attention-based recurrent
the front and background color, orientation, characteristic neural network (RNN) encoder/decoder for text recognition.
scale, position, etc. In [21], maximally stable extremal region Zhou et al. [30] proposed an efficient and accurate scene text
(MSER) is used to extract candidate characters, and then detector (EAST detector) to predict a word or line-level text,

21778 VOLUME 9, 2021

Q. Huang et al.: Single Neural Network for Mixed Style LP Detection and Recognition

FIGURE 1. Framework of the proposed network: ALPRNet. Two fully convolutional one-stage object detectors are employed to detect bounding boxes
and classify LPs and characters. The class number of LPs is 4 (Background, Mainland China, Hong Kong, and Macao), and the class number of characters
is 73 (1 background, 26 English characters, 10 numeric characters, and 36 Chinese characters).

which consists of a fully convolutional network with non- LP strings. Branches of LP and character are integrated seam-
maximum suppression (NMS) merging state. He et al. [31] lessly, resulting in an end-to-end trainable model.
proposed an end-to-end framework based on SSD by intro- The object detectors are fully convolutional, which are
ducing an text attention module, which enables a direct text anchor free and get rid of the complicated RoI cropping and
mask supervision and achieves strong performance improve- pooling operations of mainstream object detector methods.
ments by training text detection and recognition jointly. It directly classifies the detected objects, so it avoids the
Xing [32] proposed a one-stage model that processes text RNNs branch of OCR. As a result, the network is simple and
detection and recognition simultaneously. easy to extend in practical use.
The proposed network is a one-stage end-to-end model
that treats LP detection and character recognition as object A. BACKBONE AND FPN
detection. We directly regress each object’s bounding box ResNet-50 is employed as backbone networks and applies a
and classify object types (LP type and character). It naturally feature pyramid structure (FPN) by using a top-down struc-
supports multiple and mixed style LPs, and also supports LP ture to fuse features across multiple resolutions, and we use
that has multiple lines or variable length of characters. different layers of FPN to perform character and LP detection.
The P2 layer of FPN has 1/8 resolution of the input image
III. METHODOLOGY that provides a larger receptive field size, it is suitable for
The proposed ALPRNet is a one-stage convolutional network LP detection. The P1 layer of FPN has 1/4 resolution of the
consisting of two fully convolutional one-stage object detec- input image that provides more detail features, it is suitable
tors (FCOS detectors) [33] for the detection and classification for small object detection and is used for character branch.
of LPs and characters. These two detectors are treated equally
and implemented in parallel on different levels of feature B. FULL CONVOLUTIONAL ONE-STAGE OBJECT DETECTOR
maps derived from backbone network, as shown in Fig. 1. We introduce two FCOS detectors to densely predict bound-
We employ a ResNet-50 [34] with feature pyramid structure ing box, center-ness, orientation, and classification of LPs
(FPN) to serve as backbone, design a assembly module to and characters. Different from anchor-base detectors that
combine detected LPs and characters, and finally output the regress target bounding box with anchor boxes as reference,

VOLUME 9, 2021 21779

Q. Huang et al.: Single Neural Network for Mixed Style LP Detection and Recognition

FCOS detector regards locations as training samples and and character classification (class). It uses the P1 output
directly regresses the target bounding box at each location. of FPN as input feature map that has 1/4 spatial resolu-
FCOS detector reduces intermediate steps, such as RoI pro- tion of the input image. The classification sub-branch pre-
posals and segmentation. Moreover, it eliminates the sample dicts 73-channel probability maps including 1 background,
imbalance problem between positive and negative anchor 26 English characters, 10 numeric characters, and 36 Chinese
boxes during training, that causes the training to be inef- characters.
ficient. The regression targets of a location (x, y) are the
distances to the borders of the bounding box, and it has the E. ASSEMBLE MODULE
form: Assemble module is to combine the detected LPs and
characters and finally outputs strings. The assembly operation
l ∗ = x − x0 , t ∗ = y − y0 ,
depends on the calculation of the max overlap rate of charac-
r ∗ = x1 − x, b∗ = y1 − y (1) ter to each LP, which supports LPs with multiple lines or vari-
where (x0 ,y0 ) and (x1 ,y1 ) are the top-left and right-bottom ant in character count. By IoU calculation, the predicted
points of a bounding box, respectively. bounding boxes of LPs and characters are applied to manip-
FCOS detector predicts the center-ness cnx,y of each loca- ulate the detected characters into LPs.
tion (x, y) to depict the normalized distance from the location The IoU calculation orientated bounding box is more accu-
to the center of the object. In inference stage, center-ness is rate than rectangle bounding box, especially for LPs with
used to filter out low-quality predicted bounding boxes pro- multi-lines. The use of orientated bounding box helps to
duced by locations far from the center of an object. Such that calculate the correct order of each character of multi-line LP
it improves detection performance and reduces the workload by rotating the LP to horizontal and sorting by the coordinate
of NMS process. The center-ness of each location is defined (x, y). Assemble module helps to handle non-LP characters
as: in input image, only candidate characters in LP are treated as
s detected characters and only candidate LPs have characters
min(l ∗ , r ∗ ) min(t ∗ , b∗ ) are treated as detected LP. It filters out most of the non-LP
cnx,y = × (2)
max(l ∗ , r ∗ ) max(t ∗ , b∗ ) characters. Finally we run the the outputs of assemble module
against a regular expression matcher to find results that match
where l∗ , t∗ , r∗ , and b∗ are the regression predictions of each LP patterns of the corresponding LP type.
location bx,y .
F. LOSS FUNCTIONS
C. LICENSE PLATE BRANCH Corresponding to 4 sub-branches of LP detection and charac-
The LP branch is designed to detect and classify the type ter recognition branches, the proposed train target is divided
of LP in a higher level concept. This branch contains four into four parts: score map loss (Ls ), center-ness loss (Lcenter ),
sub-branches for LP segmentation (score map), bounding box geometry loss (Lg ), and classification loss (Lc ). The loss
and orientation regression (geometry), center-ness regres- formula is defined as follows:
sion, and LP classification (class), respectively. We use the
convolutional feature map from the P2 level of FPN to imple- L = Ls + Lcenter + λLg + βLc (3)
ment LP detection, and the input feature map has 1/8 spatial
In this article, λ is set to 1, and β is set to 10 to balance
resolution of the input image.
the losses of geometry and classification. In training, we find
The bounding box regression sub-branch employs 2
that characters and LPs are small objects, so the deviation
convolutional layers with filter size of 3 × 3, followed by
from bounding box results bigger loss than the general object
2 convolutional layers with filter size of 1 × 1 to produce the
detection. This situation leads to a training stagnation, so the
5-channel feature maps, estimating a LP bounding box at each
β value is set to 10 for solving this problem.
spatial location. Each bounding box is parameterized by five
Score Map Loss: Most of the detection pipelines face
parameters, indicating the distances of current location to the
to the class imbalanced problem, thus one should carefully
top, bottom, left, and right sides of the bounding box, as well
process training image. However, it always introduces more
as the orientation of bounding box. The LP segmentation and
parameters to tune and make the pipeline more complicated.
center-ness sub-branches have 3 convolutional layers with
The goal of dice coefficient loss is the maximization of these
filter sizes of 3 × 3, 3 × 3, and 1 × 1, respectively. The
metrics, it performs better on the class imbalanced problems,
classification sub-branch of LP has four convolutional layers
which is defined as
with one more 3 × 3 convolutional layer, it predicts 4-channel
| X ∩ Y | + smooth
probability maps including 3 LP types and 1 background. Ls = 1 − 2 × (4)
| X | + | Y | + smooth
D. CHARACTER BRANCH Center-ness Loss: This is L2 loss and is defined as
Similar with the LP branch, the character branch also divides 1 X
n
into four sub-branches that perform character probability Lcenter = (c − c∗ )2 (5)
Npos
prediction (score map), bounding box regression,center-ness, k=1

21780 VOLUME 9, 2021

Q. Huang et al.: Single Neural Network for Mixed Style LP Detection and Recognition

where Npos denotes the number of positive samples, c and merge the bounding boxes row by row, thus reducing the
c∗ represent the prediction and ground truth of center-ness, computation cost of NMS.
respectively.
Geometry Loss: The proposed training loss function of H. TRAINING STRATEGY
geometry is The training strategy is an end-to-end learning process,
wherein the LP detection and character recognition are
Lg = Ldiou + λθ Lθ (6)
trained concurrently on the same network. The joint training
where Ldiou is the loss of regression, Lθ is angle loss, and λθ of these two tasks in a unified framework can avoid error
is set to 1. accumulations among cascade models. The amount of anno-
Regression Loss: Since the size of the LPs and characters tation is critical to the accuracy of the proposed network, but
vary widely, the loss function of regression should be scale- it is not practical to annotate it only by a human operator.
invariant. Otherwise, it will causes loss bias. Distance-IoU Therefore, we choose a two steps strategy. The training pro-
(DIoU) loss [35] is invariant to the scale of regression, and cess contains two stages: pre-trained on the synthetic dataset
it provides moving direction for bounding box when there is and fine-tuned on the real-world dataset.
no overlap with ground truth box. It considers on the overlap
area and central point distance of bounding boxes. DIoU is IV. EXPERIMENTS
also used in NMS. In this section, we conduct experiments to verify the effective-
ρ 2 (b, bgt ) ness of ALPRNet, and then we summary the testing results
Ldiou = 1 − IoU + (7) on the Hong Kong-Zhuhai-Macao (HZM) multi-style dataset
c2
and compare the proposed network with some state-of-the-art
where b and bgt indicate central points of bounding box, ρ(·) end-to-end ALPR methods on the AOLP [18] and PKU [17]
is the Euclidean distance, c is the diagonal length of the box datasets. These tasks are carried out on a digital computer
covering the two boxes, and IoU is the rate of intersection/ with one Nvidia RTX2070 GPU (PCIe 8 GB), wherein the
union areas between b and bgt . used CPU is Intel Corel i7-6700.
Angle Loss: The loss of rotation angle of bounding box is We implement ALPRNet in PyTorch1.0, and ResNet-50 is
computed by used as the backbone networks. The model is trained with
Lθ = 1 − cos (θ − θ ∗ ). (8) ‘‘Ranger’’ optimizer, and the initial learning rate is 0.0002,
it is reduced by the formula
where θ and θ∗
represent the prediction and ground truth of
the rotation angle of a bounding box, respectively. lr = 0.94(epoch−num/8) × lr0 . (10)
Classification Loss: This loss has the form:
1 X The learning rate is reduced after about 10 K iterations. The
Lc = Lcls (cx,y , c∗x,y ) (9) size of input images is 1024 × 1024. To support variant size
Npos x,y
of LPs and orientations, the training images are cropped,
where Lcls is the focal loss of the position (x, y), cx,y and resized, and rotated randomly, and then they are padded to
c∗x,y represent prediction and ground truth of class of the the size of 1024 × 1024 before feed to train.
position (x, y), respectively, and Npos is the number of positive
samples. A. HZM MULTI-STYLE DATASET OF LICENSE PLATES
The HZM multi-style dataset includes three styles of
G. INFERENCE LPs: Mainland China LP, Hong Kong LP with white
In ALPRNet, two different feature map layers (P1 and P2) background, and Macao LP with black background,
are used to predict characters and LPs. During the infer- which is a private dataset including 1376 images col-
ence, we forward the image through ALPRNet to obtain the lecting from the real-world system running on the
object scores sorex,y , regression prediction geox,y , center- Hong Kong-Zhuhai-Macao Bridge. The images can be
ness centerx,y , and classification classx,y of each location on divided into 4 groups: Mainland China+Macao LPs, Main-
the feature map. land China+Hong Kong LPs, Macao+Hong Kong LPs, and
Only the locations with scorex,y > 0.95 are considered Mainland China+Macao+Hong Kong LPs. The resolution of
as positive samples, but there are still a large number of these images is 1190 × 500 pixels. Fig. 2 shows the examples
bounding boxes that increase the workload of the following of vehicle running on the Hong Kong-Zhuhai-Macao Bridge.
NMS process. The quality of bounding boxes produced by
the locations that far away from the center of object is poor. B. ANNOTATION
Therefore, we use center-ness to filter out these locations that In this work, we choose VGG Image Annotator (VIA) to
have centerx,y < 0.3. The combination of using center-ness edit image annotations, the annotations of ground truths of
and score helps to filter out most of the low quality predic- all images of a dataset are saved in a single file. There are
tions. Moreover, since the bounding boxes from nearby pixels 74 object types to be annotated, including 3 LP types and
are highly correlated, we use Locality-Aware NMS [30] to 71 characters. The bounding boxes of objects are defined by

VOLUME 9, 2021 21781

Q. Huang et al.: Single Neural Network for Mixed Style LP Detection and Recognition

FIGURE 2. Example images from the HZM multi-style dataset. (a) Example image A. (b) Example image B. (c) Example image C.

TABLE 1. Performance of the Proposed Network on the HZM Multi-Style

Dataset, Wherein 176 Images Includes 280 LPs.

FIGURE 3. Score and classes map.

where r is the shrink rate, which is set to 0.3 by following

polygons that are more accurate than rectangles and are easy EAST.
to rotate the bounding box when we apply data augmentation.
E. EXPERIMENTAL RESULTS
C. GROUND TRUTH GENERATION The testing subset of the HZM multi-style dataset consists
We follow EAST to generate geometry map for each training of 176 images that are continuous in captured time. Totally,
object. For each training object, we calculate a rotated rect- there are 280 LPs in these images, about 1.5 LPs per image.
angle (RBOX) that covers the object with minimal area, and As shown in Table 1, the LP detection accuracy (LPDA) is
then generate a 4-channel (left, right, top, and bottom) geom- 100%, and the LP detection recall (LPDR) is 100%, there is
etry map that depicts the distance to 4 boundaries of RBOX no false alarm after post handle. The LP recognition accuracy
of each pixel of positive score. Different from EAST, we con- rate (E2E) is 98.21%, and there are 5 LPs that have errors with
struct two sets of score maps and geometry maps that are only one character misclassification.
used for LP detection and character recognition. To support The passing rate means only images that all LPs are mis-
object classification, the score map has a number of channels recognized causing the vehicle to be rejected. According to
corresponding to the number object classes to detect, such the rules of the Hong Kong-Zhuhai-Macao Bridge, a vehicle
as 4 (3 LP types and 1 background) channels for LPs and with multiple LPs can be identified by any of these LPs.
72 (71 characters and 1 background) channels for characters. Passing rate is an index for this specialized scenario, if one LP
To reduce the memory used for training, the channels of score has been correctly recognized in an image, it will be counted
map are only generated when calculating loss. Fig. 3 indicates as passing. In the testing dataset, only one image has not been
the score and class maps for LPs and characters. correctly identified, so the passing rate is 99.43%. Fig. 4 lists
some images of LPs that have been detected and read.
D. SHRINKING SCORE MAP
EAST has proved that the shrinking score map is critical F. PERFORMANCE ON THE AOLP DATASET
to dense object detection, it means that the center part of The AOLP dataset is a dataset of Taiwan LPs, which has
object is more important for training and predicting than edge 2049 images in total. The AOLP dataset is categorized into
parts. In this article, we use a simpler method that shrinks AC, LE, and RP subsets, wherein the RP subset of AOLP
the endpoint along the line from the endpoint to the center is the most challenging subset. We use images from other
point of quadrangle. The method is defined as follows: For a two subsets to train the model of each subsets, such as we
quadrangle V = {vi |i = 1, 2, 3, 4}, vi = {xi , yi } represents use images from LE and AC to train the model for RP sub-
the vertices on the quadrangle. To shrink V , we calculate the set. Then, we apply data augmentation by cropping, resize,
center point of the quadrangle and move its endpoints inward rotation, and affine transformation. Annotation of LPs and
along the line between endpoint and center point, as follows: characters are generated based on the ground truths infor-
4 mation from the AOLP dataset. We train the model with the
V (xi , yi )
P
synthesized dataset that is generated based on Taiwan LP
center =
i=0
, regulation first and trained it with the AOLP dataset.
4 Table 2 shows that the proposed network surpasses in LP
0
Vi = center + (Vi − center) × (1 − r ÷ 2) (11) detection on the three subsets. For the end-to-end recognition,

21782 VOLUME 9, 2021

Q. Huang et al.: Single Neural Network for Mixed Style LP Detection and Recognition

FIGURE 4. LPs have been detected and recognized, wherein ‘‘hk’’ means Hong Kong, ‘‘mo’’ means Macao, and ‘‘cn’’ means Mainland China.

FIGURE 5. Example results for LP detection and recognition based on the AOLP dataset.

FIGURE 6. Example results for LP detection and recognition based on the PKU dataset with the G5 subset.

the proposed network surpasses on all the subsets, especially Fig. 5 shows some images of the AOLP dataset that LPs
in the RP subset, the proposed network exceeds by 3 points. have been detected and recognized. The results prove the

VOLUME 9, 2021 21783

Q. Huang et al.: Single Neural Network for Mixed Style LP Detection and Recognition

TABLE 2. Comparison Results Based on the AOLP Dataset. [2] M.-X. He and P. Hao, ‘‘Robust automatic recognition of Chinese license
plates in natural scenes,’’ IEEE Access, vol. 8, pp. 173804–173814, 2020.
[3] W. Weihong and T. Jiaoyang, ‘‘Research on license plate recognition
algorithms based on deep learning in complex environment,’’ IEEE Access,
vol. 8, pp. 91661–91675, 2020.
[4] I. V. Pustokhina, D. A. Pustokhin, J. J. P. C. Rodrigues, D. Gupta,
A. Khanna, K. Shankar, C. Seo, and G. P. Joshi, ‘‘Automatic vehicle
license plate recognition using optimal K-means with convolutional neu-
ral network for intelligent transportation systems,’’ IEEE Access, vol. 8,
pp. 92907–92917, 2020.
[5] A. Tourani, A. Shahbahrami, S. Soroori, S. Khazaee, and C. Y. Suen,
‘‘A robust deep learning approach for automatic iranian vehicle license
plate detection and recognition for surveillance systems,’’ IEEE Access,
TABLE 3. Comparison Results Based on the PKU Dataset.
vol. 8, pp. 201317–201330, 2020.
[6] Y. Zou, Y. Zhang, J. Yan, X. Jiang, T. Huang, H. Fan, and Z. Cui, ‘‘A robust
license plate recognition model based on bi-LSTM,’’ IEEE Access, vol. 8,
pp. 211630–211641, 2020.
[7] H. Seibel, S. Goldenstein, and A. Rocha, ‘‘Eyes on the target: Super-
resolution and license-plate recognition in low-quality surveillance
videos,’’ IEEE Access, vol. 5, pp. 20020–20035, 2017.
[8] S. Zhang, G. Tang, Y. Liu, and H. Mao, ‘‘Robust license plate recog-
nition with shared adversarial training network,’’ IEEE Access, vol. 8,
pp. 697–705, 2020.
[9] B. B. Yousif, M. M. Ata, N. Fawzy, and M. Obaya, ‘‘Toward an opti-
mized neutrosophic k-means with genetic algorithm for automatic vehi-
effectiveness of the proposed network, it shows that the cle license plate recognition (ONKM-AVLPR),’’ IEEE Access, vol. 8,
proposed network has advantage in LPDA, which comes pp. 49285–49312, 2020.
[10] W. Wang, J. Yang, M. Chen, and P. Wang, ‘‘A light CNN for end-to-
from two collaborative parallel branches of LP detection and end car license plates detection and recognition,’’ IEEE Access, vol. 7,
character recognition. pp. 173875–173883, 2019.
[11] Hendry and R.-C. Chen, ‘‘Automatic license plate recognition via sliding-
G. PERFORMANCE ON THE PKU DATASET window darknet-YOLO deep learning,’’ Image Vis. Comput., vol. 87,
pp. 47–56, Jul. 2019.
The PKU dataset contains 3977 images of Mainland China
[12] Z. Selmi, M. B. Halima, U. Pal, and M. A. Alimi, ‘‘DELP-DAR system for
LPs. The dataset is divided into 5 groups (i.e., G1, G2, G3, license plate detection and recognition,’’ Pattern Recognit. Lett., vol. 129,
G4, and G5). Since there is only ground-truth information of pp. 213–223, Jan. 2020.
bounding boxes of LPs, we only use it to evaluate the perfor- [13] R. Laroca, E. Severo, L. A. Zanlorensi, L. S. Oliveira, G. R. Goncalves,
W. R. Schwartz, and D. Menotti, ‘‘A robust real-time automatic
mance of LP detection. Based on the training results of the license plate recognition based on the YOLO detector,’’ in Proc. Int.
synthesis dataset, we use G1 to train the model for G2 subset Joint Conf. Neural Netw. (IJCNN), Rio de Janeiro, Brazil, Jul. 2018,
and apply data augmentation on it. Then, the G1 and G2 sub- Art. no. 18165770.
[14] S. M. Silva and C. R. Jung, ‘‘License plate detection and recognition in
sets are used to train the model for G3, G4, and G5, The unconstrained scenarios,’’ in Proc. Conf. Comput. Vis. Munich, Germany:
G2 subset is used to train the model for G1. Table 3 shows that Springer, 2018, pp. 593–609.
the proposed network achieves good results when compared [15] H. Li and C. Shen, ‘‘Reading car license plates using deep convolutional
neural networks and LSTMs,’’ 2016, arXiv:1601.05610. [Online]. Avail-
with some state-of-the-art methods. Fig. 6 shows that LPs able: http://arxiv.org/abs/1601.05610
have been detected and recognized. [16] Y. Cao, H. Fu, and H. Ma, ‘‘An end-to-end neural network for multi-
line license plate recognition,’’ in Proc. 24th Int. Conf. Pattern Recognit.
V. CONCLUSION (ICPR), Beijing, China, Aug. 2018, pp. 3698–3703.
In this article, we present a one-stage ALPRNet for multiple [17] Y. L. Yuan, W. B. Zou, Y. Zhao, X. Wang, X. F. Hu, and N. Komodakis,
‘‘A robust and efficient approach to license plate detection,’’ IEEE Trans.
and mixed style LP recognition, which equally treats LPs Image Process., vol. 26, no. 3, pp. 1102–1114, Mar. 2016.
and characters as objects to detect and classify, and it con- [18] G.-S. Hsu, J.-C. Chen, and Y.-Z. Chung, ‘‘Application-oriented license
ducts these two tasks simultaneously. This results in a one- plate recognition,’’ IEEE Trans. Veh. Technol., vol. 62, no. 2, pp. 552–561,
Feb. 2013.
stage fully convolutional framework that solves LP detection [19] A. H. Ashtari, M. J. Nordin, and M. Fathy, ‘‘An iranian license plate
and recognition tasks in a integrated framework without any recognition system based on color features,’’ IEEE Trans. Intell. Transp.
RNNs branches. By sharing the convolutional feature maps, Syst., vol. 15, no. 4, pp. 1690–1705, Aug. 2014.
[20] S. Yu, B. Li, Q. Zhang, C. Liu, and M. Q.-H. Meng, ‘‘A novel license plate
ALPRNet is compact with less parameters, and these two location method based on wavelet transform and EMD analysis,’’ Pattern
tasks can be trained more effectively and collaboratively. Recognit., vol. 48, no. 1, pp. 114–125, Jan. 2015.
In the experiments, ALPRNet achieves 98.21% accuracy rate [21] B. Li, B. Tian, Y. Li, and D. Wen, ‘‘Component-based license plate detec-
on the HZM multi-style dataset, and the results on the datasets tion using conditional random field model,’’ IEEE Trans. Intell. Transp.
Syst., vol. 14, no. 4, pp. 1690–1699, Dec. 2013.
with single LP style also show that the proposed network [22] D. F. Llorca, C. Salinas, M. Jimenez, I. Parra, A. G. Morcillo, R. Izquierdo,
achieves state-of-the-art recognition accuracy. J. Lorenzo, and M. A. Sotelo, ‘‘Two-camera based accurate vehicle speed
measurement using average speed at a fixed point,’’ in Proc. IEEE 19th
Int. Conf. Intell. Transp. Syst. (ITSC), Rio de Janeiro, Brazil, Nov. 2016,
REFERENCES pp. 2533–2538.
[1] C. Henry, S. Y. Ahn, and S. Lee, ‘‘Multinational license plate recognition [23] J. Redmon and A. Farhadi, ‘‘YOLO9000: Better, faster, stronger,’’ in Proc.
using generalized character sequence detection,’’ IEEE Access, vol. 8, IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA,
pp. 35185–35199, 2020. Jul. 2017, pp. 6517–6525.

21784 VOLUME 9, 2021

Q. Huang et al.: Single Neural Network for Mixed Style LP Detection and Recognition

[24] M. A. Rafique, W. Pedrycz, and M. Jeon, ‘‘Vehicle license plate detection ZHANCHUAN CAI (Senior Member, IEEE)
using region-based convolutional neural networks,’’ Soft Comput., vol. 22, received the Ph.D. degree from Sun Yat-sen Uni-
no. 19, pp. 6429–6440, Oct. 2018. versity, Guangzhou, China, in 2007.
[25] H. Li and C. Shen, ‘‘Reading car license plates using deep convolutional From 2007 to 2008, he was a Visiting Scholar
neural networks and LSTMs,’’ 2016, arXiv:1601.05610. [Online]. Avail- with the University of Nevada at Las Vegas, NV,
able: http://arxiv.org/abs/1601.05610 USA. He is currently a Professor with the Faculty
[26] H. Li, P. Wang, and C. Shen, ‘‘Toward end-to-end car license plate of Information Technology, Macau University of
detection and recognition with deep neural networks,’’ IEEE Trans. Intell.
Science and Technology, Macau, China, where he
Transp. Syst., vol. 20, no. 3, pp. 1126–1136, Mar. 2019.
is also with the State Key Laboratory of Lunar and
[27] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for
large-scale image recognition,’’ in Proc. Int. Conf. Learn. Represent., San Planetary Sciences. His research interests include
Diego, CA, USA, 2015, pp. 1–14. image processing and computer graphics, intelligent information processing,
[28] J. Long, E. Shelhamer, and T. Darrell, ‘‘Fully convolutional networks multimedia information security, and remote sensing data processing and
for semantic segmentation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern analysis.
Recognit. (CVPR), Boston, MA, USA, Jun. 2015, pp. 3431–3440. Dr. Cai is a member of the Association for Computing Machinery,
[29] H. Li, P. Wang, and C. Shen, ‘‘Towards end-to-end text spotting with con- the Chang’e-3 Scientific Data Research and Application Core Team, and
volutional recurrent neural networks,’’ in Proc. IEEE Int. Conf. Comput. the Asia Graphics Association. He is also a Distinguished Member of the
Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 5248–5256. China Computer Federation. He was a recipient of the Third prize of the
[30] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, Macau Science and Technology Award-Natural Science Award, in 2012,
‘‘EAST: An efficient and accurate scene text detector,’’ in Proc. IEEE Conf. the BOC Excellent Research Award from the Macau University of Science
Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, and Technology, in 2016, the Third Prize of the Macau Science and Technol-
pp. 2642–2651. ogy Award-Technological Invention Award, in 2018, and the Second Prize
[31] P. He, W. Huang, T. He, Q. Zhu, Y. Qiao, and X. Li, ‘‘Single shot text of the Teaching Achievement Award from the Macau University of Science
detector with regional attention,’’ in Proc. IEEE Int. Conf. Comput. Vis.
and Technology, in 2020.
(ICCV), Venice, Italy, Oct. 2017, pp. 3066–3074.
[32] L. Xing, Z. Tian, W. Huang, and M. Scott, ‘‘Convolutional character
networks,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul,
South Korea, Oct. 2019, pp. 9125–9135.
[33] Z. Tian, C. Shen, H. Chen, and T. He, ‘‘FCOS: Fully convolutional one-
stage object detection,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV),
Seoul, South Korea, Oct. 2019, pp. 9626–9635.
[34] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image
recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
Las Vegas, NV, USA, Jun. 2016, pp. 770–778.
[35] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, ‘‘Distance-IoU
loss: Faster and better learning for bounding box regression,’’ 2019,
arXiv:1911.08287. [Online]. Available: http://arxiv.org/abs/1911.08287
[36] W. Zhou, H. Li, Y. Lu, and Q. Tian, ‘‘Principal visual word discovery for
automatic license plate detection,’’ IEEE Trans. Image Process., vol. 21, TING LAN received the M.S. degree from the
no. 9, pp. 4269–4279, Sep. 2012. University of Macau, Macau, China, in 2014, and
the Ph.D. degree from the Macau University of
Science and Technology, Macau, in 2019.
He is currently a Postdoctoral Fellow with the
Faculty of Information Technology, Macau Uni-
QIUYING HUANG received the M.S. degree versity of Science and Technology. His research
in software engineering from the Beijing Insti- interests include image processing, data process-
tute of Technology, Beijing, China, in 2013. He ing and analysis, and computer graphics.
is currently pursuing the Ph.D. degree with the Dr. Lan was a recipient of the First Prize at
Faculty of Information Technology, Macau Uni- the 14th China Postgraduate Mathematical Contest in Modeling, China
versity of Science and Technology, Macau, China. Academic Degrees, and the Graduate Education Development Center and
His research interests include image processing China Graduate Mathematical Contest in Modeling Committee, in 2017, and
and artificial intelligence. the Third Prize of the Macau Science and Technology Award-Technological
Invention Award, in 2018.