A Single Neural Network for Mixed Style License Plate Detection and Recognition
ABSTRACT Most existing methods for automatic license plate recognition (ALPR) focus on a specific license plate (LP) type, and little work addresses multiple or mixed LP styles. This article proposes a single neural network called ALPRNet for the detection and recognition of mixed-style LPs. In ALPRNet, two fully convolutional one-stage object detectors are used to detect and classify LPs and characters simultaneously, followed by an assembly module that outputs the LP strings. ALPRNet treats LPs and characters equally: the object detectors directly output the bounding boxes of LPs and characters with the corresponding labels, thereby avoiding the recurrent neural network (RNN) branches for optical character recognition (OCR) found in existing recognition approaches. We evaluate ALPRNet on a mixed-style LP dataset and on two datasets with a single LP style; the experimental results show that the proposed network achieves state-of-the-art results with a simple one-stage network.
INDEX TERMS ALPRNet, license plate recognition, object recognition, convolutional neural network.
performance of these object detection networks is sensitive to the size, aspect ratios, and number of the anchor boxes; moreover, anchor boxes involve the complicated computation of intersection over union (IoU) scores with ground-truth bounding boxes.

In this article, we propose an ALPR network called ALPRNet, wherein a single neural network conducts LP detection and recognition simultaneously, and the characters are treated as objects to be detected and classified. This scheme predicts the bounding boxes and labels of LPs and characters directly and gets rid of the complicated RNN-based recognition procedure as well as the region of interest (RoI) pooling and cropping procedures of generic object detection. The contributions of this work are summarized as follows:

1) We design a novel one-stage network for LP detection and recognition, wherein two parallel branches of object detection and classification directly produce the LP type and the LP string, excluding redundant intermediate steps.

2) We introduce a classification branch for LPs to support multiple and mixed LP styles.

3) We use the bounding boxes of LPs and characters to assemble the LP strings, so ALPRNet naturally supports multi-line LPs and LPs with a variable number of characters.

The rest of the article is organized as follows: Section II gives a brief review of related work. Section III presents a detailed description of the proposed network. Section IV shows the experimental results of the proposed network. The conclusion of this work is given in Section V.

II. RELATED WORK
A. ALPR WITHOUT DEEP LEARNING
The existing ALPR methods can be classified into two categories: LP-first and character-first approaches. A traditional two-stage ALPR procedure uses edge, color, and texture features to detect the location of the LP, and then performs character segmentation and feature extraction followed by machine learning classifiers to identify each character. Yuan et al. [17] proposed a line density filter to connect regions by edge density in binary images. In [18], edges are clustered by the Expectation-Maximization algorithm for LP detection. Ashtari et al. [19] developed an LP detection method that analyzes pixels with a color geometric template via a strip search. Yu et al. [20] employed a wavelet transform to produce horizontal and vertical features of an image and used empirical mode decomposition (EMD) analysis to determine the position of an LP. These methods are suitable for LPs with distinctive features, but they are too sensitive to complex images.

Character-first methods try to find character regions in the image and then cluster these regions based on graphic semantics to construct LPs. The graphic semantics include the foreground and background colors, orientation, characteristic scale, position, etc. In [21], maximally stable extremal regions (MSER) are used to extract candidate characters, and conditional random field (CRF) models are then constructed on the candidate characters to represent the relationships among them; LPs are localized through belief propagation inference on the CRF. In [22], the stroke width transform (SWT) and MSER are combined to detect character regions, and LPs are detected by a probabilistic Hough transform. Character-first methods achieve high recall, but they are easily disturbed by background text.

B. ALPR WITH DEEP LEARNING
To achieve high accuracy, more and more ALPR work turns to CNNs to detect and recognize LPs. Most of these methods follow the traditional two-stage approach: a detection network based on generic object detection methods (such as YOLO, Faster R-CNN, or SSD-based schemes) is used to detect LPs, and a recurrent neural network is used as the OCR recognition network. In [14], the authors proposed a complete ALPR scheme that employs three subnetworks to conduct vehicle location, LP detection, and character recognition: a YOLOv2-based [23] subnetwork is used to detect vehicles, a novel CNN called WPOD-NET is created for LP detection and affine transformation regression, and a modified YOLO network serves as the OCR module for character recognition. In [13], a real-time ALPR system is proposed, which uses a YOLO-based network for vehicle and LP detection and a CR-NET [24] network for character segmentation and recognition. In [25], CNN-based text detectors are used to search character regions in digital images and cluster them by edge information to construct LPs, and an RNN with connectionist temporal classification (CTC) is employed to label and recognize the characters.

In addition, some end-to-end ALPR methods have been proposed that use a unified network to perform LP detection and character recognition, but they still include sequential stages for LP detection and OCR. In [26], a method uses the VGG16 network [27] as the backbone and employs a region proposal network (RPN) on different layers of a feature pyramid network (FPN) to generate LP proposals and regress the bounding boxes, and RNNs with CTC are used to recognize the characters.

C. SCENE TEXT SPOTTING WITH DEEP LEARNING
Currently, many scene text spotting (STS) approaches based on deep learning have been proposed. Most of them either employ generic object detectors (such as Faster R-CNN, SSD, and YOLO) to detect and regress text instances directly, or predict the text/non-text probability of each pixel with semantic segmentation methods (such as the fully convolutional network (FCN) [28]). Li et al. [29] proposed an end-to-end approach that integrates text detection and recognition into a unified framework, which consists of a tailored region proposal network for text detection and an attention-based recurrent neural network (RNN) encoder/decoder for text recognition. Zhou et al. [30] proposed an efficient and accurate scene text detector (EAST) to predict word- or line-level text,
FIGURE 1. Framework of the proposed network: ALPRNet. Two fully convolutional one-stage object detectors are employed to detect bounding boxes
and classify LPs and characters. The class number of LPs is 4 (Background, Mainland China, Hong Kong, and Macao), and the class number of characters
is 73 (1 background, 26 English characters, 10 numeric characters, and 36 Chinese characters).
which consists of a fully convolutional network with a non-maximum suppression (NMS) merging stage. He et al. [31] proposed an end-to-end framework based on SSD by introducing a text attention module, which enables direct text mask supervision and achieves strong performance improvements by training text detection and recognition jointly. Xing et al. [32] proposed a one-stage model that processes text detection and recognition simultaneously.

The proposed network is a one-stage end-to-end model that treats LP detection and character recognition as object detection. We directly regress each object's bounding box and classify the object types (LP type and character). It naturally supports multiple and mixed LP styles, and it also supports LPs with multiple lines or a variable number of characters.

III. METHODOLOGY
The proposed ALPRNet is a one-stage convolutional network consisting of two fully convolutional one-stage object detectors (FCOS detectors) [33] for the detection and classification of LPs and characters. These two detectors are treated equally and implemented in parallel on different levels of the feature maps derived from the backbone network, as shown in Fig. 1. We employ a ResNet-50 [34] with a feature pyramid network (FPN) as the backbone, design an assembly module to combine the detected LPs and characters, and finally output the LP strings. The branches for LPs and characters are integrated seamlessly, resulting in an end-to-end trainable model.

The object detectors are fully convolutional and anchor free, and they get rid of the complicated RoI cropping and pooling operations of mainstream object detection methods. The network directly classifies the detected objects, so it avoids the RNN branches of OCR. As a result, the network is simple and easy to extend in practical use.

A. BACKBONE AND FPN
ResNet-50 is employed as the backbone network, and a feature pyramid network (FPN) with a top-down structure is applied to fuse features across multiple resolutions; we use different layers of the FPN to perform character and LP detection. The P2 layer of the FPN has 1/8 the resolution of the input image and provides a larger receptive field, which makes it suitable for LP detection. The P1 layer of the FPN has 1/4 the resolution of the input image and provides more detailed features, which makes it suitable for small object detection; it is used for the character branch.
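For readers who want to reproduce this configuration, the following sketch builds an equivalent ResNet-50 + FPN feature extractor with torchvision; it is an illustrative stand-in rather than the authors' implementation, and the dictionary keys '0' and '1' are torchvision's names for the 1/4- and 1/8-resolution maps referred to as P1 and P2 in the text.

import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNet-50 backbone with a top-down FPN; every pyramid map has 256 channels.
# (torchvision <= 0.12 signature; newer releases take weights= instead of pretrained=)
backbone = resnet_fpn_backbone("resnet50", pretrained=True)
features = backbone(torch.randn(1, 3, 1024, 1024))
p1, p2 = features["0"], features["1"]  # strides 4 and 8 (P1 and P2 in the text)
print(p1.shape, p2.shape)  # torch.Size([1, 256, 256, 256]) torch.Size([1, 256, 128, 128])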
B. FULLY CONVOLUTIONAL ONE-STAGE OBJECT DETECTOR
We introduce two FCOS detectors to densely predict the bounding boxes, center-ness, orientation, and classification of LPs and characters. Different from anchor-based detectors, which regress the target bounding box with anchor boxes as references,
the FCOS detector regards locations as training samples and directly regresses the target bounding box at each location. The FCOS detector reduces intermediate steps, such as RoI proposals and segmentation. Moreover, it eliminates the imbalance between positive and negative anchor boxes during training, which otherwise makes training inefficient. The regression targets of a location (x, y) are the distances to the borders of the bounding box:

l^* = x - x_0, \quad t^* = y - y_0, \quad r^* = x_1 - x, \quad b^* = y_1 - y \qquad (1)

where (x_0, y_0) and (x_1, y_1) are the top-left and bottom-right points of the bounding box, respectively.

The FCOS detector predicts the center-ness cn_{x,y} of each location (x, y) to depict the normalized distance from the location to the center of the object. In the inference stage, the center-ness is used to filter out low-quality bounding boxes predicted by locations far from the center of an object, which improves detection performance and reduces the workload of the NMS process. The center-ness of each location is defined as

cn_{x,y} = \sqrt{\frac{\min(l^*, r^*)}{\max(l^*, r^*)} \times \frac{\min(t^*, b^*)}{\max(t^*, b^*)}} \qquad (2)

where l^*, t^*, r^*, and b^* are the regression predictions of the location (x, y).
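As a concrete illustration of Eqs. (1) and (2), the following PyTorch sketch computes the regression targets and center-ness of a set of locations for one ground-truth box; the function name and tensor layout are our own assumptions, not code from the paper.

import torch

def fcos_targets(locations, box):
    # locations: (N, 2) tensor of (x, y) grid points; box: (x0, y0, x1, y1)
    x, y = locations[:, 0], locations[:, 1]
    x0, y0, x1, y1 = box
    l = x - x0  # Eq. (1): distances from the location to the four borders
    t = y - y0
    r = x1 - x
    b = y1 - y
    reg = torch.stack([l, t, r, b], dim=1)
    inside = reg.min(dim=1).values > 0  # only locations inside the box are positive
    lr = torch.stack([l, r], dim=1)
    tb = torch.stack([t, b], dim=1)
    # Eq. (2): center-ness decays from 1 at the box center to 0 at its borders
    ratio = (lr.min(dim=1).values / lr.max(dim=1).values) * \
            (tb.min(dim=1).values / tb.max(dim=1).values)
    centerness = torch.sqrt(ratio.clamp(min=0.0))
    return reg, centerness, inside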
C. LICENSE PLATE BRANCH
The LP branch is designed to detect LPs and classify their type at a higher conceptual level. This branch contains four sub-branches for LP segmentation (score map), bounding box and orientation regression (geometry), center-ness regression, and LP classification (class), respectively. We use the convolutional feature map from the P2 level of the FPN to implement LP detection, and the input feature map has 1/8 the spatial resolution of the input image.

The bounding box regression sub-branch employs two convolutional layers with a filter size of 3 × 3, followed by two convolutional layers with a filter size of 1 × 1, to produce 5-channel feature maps that estimate an LP bounding box at each spatial location. Each bounding box is parameterized by five parameters, indicating the distances from the current location to the top, bottom, left, and right sides of the bounding box, as well as the orientation of the bounding box. The LP segmentation and center-ness sub-branches each have three convolutional layers with filter sizes of 3 × 3, 3 × 3, and 1 × 1, respectively. The classification sub-branch of the LP branch has four convolutional layers (one more 3 × 3 convolutional layer than the others) and predicts 4-channel probability maps covering the 3 LP types and 1 background class.
D. CHARACTER BRANCH
Similar to the LP branch, the character branch is also divided into four sub-branches that perform character probability prediction (score map), bounding box regression, center-ness regression, and character classification (class). It uses the P1 output of the FPN as the input feature map, which has 1/4 the spatial resolution of the input image. The classification sub-branch predicts 73-channel probability maps covering 1 background class, 26 English characters, 10 numeric characters, and 36 Chinese characters.
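A minimal PyTorch sketch of the two detection heads described above is given below. The layer counts and output channels follow the text (5 geometry channels; 4 LP classes on P2; 73 character classes on P1); the ReLU activations, the 256-channel width, and the class name are assumptions on our part.

import torch.nn as nn

class DetectionHead(nn.Module):
    """One ALPRNet branch: score map, geometry, center-ness, classification."""
    def __init__(self, in_ch=256, num_classes=4):
        super().__init__()
        def c3(cin, cout):  # 3x3 convolution followed by ReLU
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))
        # geometry: two 3x3 layers, then two 1x1 layers -> 5 channels
        # (4 border distances + 1 orientation angle)
        self.geometry = nn.Sequential(c3(in_ch, in_ch), c3(in_ch, in_ch),
                                      nn.Conv2d(in_ch, in_ch, 1), nn.ReLU(inplace=True),
                                      nn.Conv2d(in_ch, 5, 1))
        # score map and center-ness: 3x3, 3x3, 1x1 -> 1 channel each
        self.score = nn.Sequential(c3(in_ch, in_ch), c3(in_ch, in_ch), nn.Conv2d(in_ch, 1, 1))
        self.centerness = nn.Sequential(c3(in_ch, in_ch), c3(in_ch, in_ch), nn.Conv2d(in_ch, 1, 1))
        # classification: one extra 3x3 layer before the 1x1 output
        self.cls = nn.Sequential(c3(in_ch, in_ch), c3(in_ch, in_ch), c3(in_ch, in_ch),
                                 nn.Conv2d(in_ch, num_classes, 1))

    def forward(self, x):
        return self.score(x), self.geometry(x), self.centerness(x), self.cls(x)

lp_head = DetectionHead(num_classes=4)     # LP branch on the P2 feature map
char_head = DetectionHead(num_classes=73)  # character branch on the P1 feature map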
E. ASSEMBLY MODULE
The assembly module combines the detected LPs and characters and finally outputs the LP strings. The assembly operation depends on the calculation of the maximum overlap rate of each character with each LP, which supports LPs with multiple lines or a variable number of characters. Through the IoU computed on the predicted bounding boxes of LPs and characters, the detected characters are assigned to LPs.

The IoU computed on oriented bounding boxes is more accurate than that on axis-aligned rectangles, especially for multi-line LPs. The use of oriented bounding boxes also helps to establish the correct order of the characters of a multi-line LP by rotating the LP to horizontal and sorting the characters by their (x, y) coordinates. In addition, the assembly module helps to handle non-LP characters in the input image: only candidate characters inside an LP are treated as detected characters, and only candidate LPs that contain characters are treated as detected LPs, which filters out most of the non-LP characters. Finally, we run the outputs of the assembly module against a regular expression matcher to find results that match the LP patterns of the corresponding LP type.
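The following sketch illustrates the assembly logic under simplifying assumptions: it takes a precomputed character-to-LP overlap matrix (the oriented-IoU computation is omitted), approximates the row-by-row ordering by sorting rotated character centers on coarse (y, x) buckets, and uses hypothetical regular expressions, since the paper does not publish its LP patterns.

import re
import numpy as np

# Hypothetical LP patterns per type; the paper's actual expressions are not given.
LP_PATTERNS = {"cn": r"^.[A-Z][A-Z0-9]{5}$", "hk": r"^[A-Z]{2}\d{1,4}$", "mo": r"^M[A-Z]\d{4}$"}

def assemble(lp_types, char_centers, char_labels, overlap):
    """lp_types: list of type labels per candidate LP.
    char_centers: (num_chars, 2) character centers, already rotated so the LP is horizontal.
    overlap: (num_chars, num_lps) overlap-rate matrix between oriented boxes."""
    overlap = np.asarray(overlap)
    results = []
    for j, lp_type in enumerate(lp_types):
        # a character belongs to the LP with which it overlaps the most
        idx = [i for i in range(len(char_labels))
               if overlap[i].max() > 0 and overlap[i].argmax() == j]
        if not idx:  # candidate LPs without characters are discarded
            continue
        # sort top-to-bottom (coarse 16-px row buckets), then left-to-right
        idx.sort(key=lambda i: (round(char_centers[i][1] / 16), char_centers[i][0]))
        text = "".join(char_labels[i] for i in idx)
        if re.fullmatch(LP_PATTERNS.get(lp_type, ".*"), text):
            results.append((lp_type, text))
    return results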
F. LOSS FUNCTIONS
Corresponding to the four sub-branches of the LP detection and character recognition branches, the training target is divided into four parts: the score map loss (L_s), the center-ness loss (L_{center}), the geometry loss (L_g), and the classification loss (L_c). The total loss is defined as follows:

L = L_s + L_{center} + \lambda L_g + \beta L_c \qquad (3)

In this article, \lambda is set to 1, and \beta is set to 10 to balance the losses of geometry and classification. In training, we find that characters and LPs are small objects, so a deviation of the bounding box produces a bigger loss than in general object detection. This situation leads to training stagnation, and setting \beta to 10 solves this problem.

Score Map Loss: Most detection pipelines face the class imbalance problem, and thus the training images must be processed carefully; however, this always introduces more parameters to tune and makes the pipeline more complicated. The dice coefficient loss, whose goal is the maximization of the dice coefficient, performs better on class imbalance problems. It is defined as

L_s = 1 - 2 \times \frac{|X \cap Y| + smooth}{|X| + |Y| + smooth} \qquad (4)

where X and Y are the predicted and ground-truth score maps.
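A direct transcription of Eq. (4) in PyTorch might look as follows; the smoothing constant of 1.0 is a common default, not a value reported in the paper.

import torch

def dice_loss(pred, target, smooth=1.0):
    # Eq. (4): pred is the predicted score map (X), target the ground truth (Y)
    inter = (pred * target).sum()
    return 1.0 - 2.0 * (inter + smooth) / (pred.sum() + target.sum() + smooth)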
Center-ness Loss: This is an L2 loss defined as

L_{center} = \frac{1}{N_{pos}} \sum_{k=1}^{n} (c_k - c_k^*)^2 \qquad (5)

where N_{pos} denotes the number of positive samples, and c_k and c_k^* represent the prediction and ground truth of the center-ness, respectively.

Geometry Loss: The proposed training loss function for the geometry is

L_g = L_{diou} + \lambda_\theta L_\theta \qquad (6)

where L_{diou} is the regression loss, L_\theta is the angle loss, and \lambda_\theta is set to 1.
Regression Loss: Since the sizes of LPs and characters vary widely, the regression loss function should be scale-invariant; otherwise, it will cause a loss bias. The Distance-IoU (DIoU) loss [35] is invariant to the scale of regression, and it provides a moving direction for the bounding box even when there is no overlap with the ground-truth box. It considers both the overlap area and the central point distance of the bounding boxes. DIoU is also used in NMS.

L_{diou} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} \qquad (7)

where b and b^{gt} indicate the central points of the predicted and ground-truth bounding boxes, \rho(\cdot) is the Euclidean distance, c is the diagonal length of the smallest box covering the two boxes, and IoU is the ratio of the intersection and union areas of the two boxes.
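For reference, a sketch of Eq. (7) for axis-aligned boxes (x0, y0, x1, y1) is shown below; the paper applies the same formula to its regressed boxes, and the numerical-stability clamps are our additions.

import torch

def diou_loss(pred, gt):
    # pred, gt: (N, 4) tensors of (x0, y0, x1, y1) boxes
    x0 = torch.max(pred[:, 0], gt[:, 0]); y0 = torch.max(pred[:, 1], gt[:, 1])
    x1 = torch.min(pred[:, 2], gt[:, 2]); y1 = torch.min(pred[:, 3], gt[:, 3])
    inter = (x1 - x0).clamp(min=0) * (y1 - y0).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter).clamp(min=1e-7)
    # rho^2: squared distance between the two box centers
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    cg = (gt[:, :2] + gt[:, 2:]) / 2
    rho2 = ((cp - cg) ** 2).sum(dim=1)
    # c^2: squared diagonal of the smallest box enclosing both boxes
    ex0 = torch.min(pred[:, 0], gt[:, 0]); ey0 = torch.min(pred[:, 1], gt[:, 1])
    ex1 = torch.max(pred[:, 2], gt[:, 2]); ey1 = torch.max(pred[:, 3], gt[:, 3])
    c2 = ((ex1 - ex0) ** 2 + (ey1 - ey0) ** 2).clamp(min=1e-7)
    return (1.0 - iou + rho2 / c2).mean()  # Eq. (7), averaged over boxes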
Angle Loss: The loss for the rotation angle of the bounding box is computed by

L_\theta = 1 - \cos(\theta - \theta^*) \qquad (8)

where \theta and \theta^* represent the prediction and ground truth of the rotation angle of a bounding box, respectively.

Classification Loss: This loss has the form

L_c = \frac{1}{N_{pos}} \sum_{x,y} L_{cls}(c_{x,y}, c_{x,y}^*) \qquad (9)

where L_{cls} is the focal loss at position (x, y), c_{x,y} and c_{x,y}^* represent the prediction and ground truth of the class at position (x, y), respectively, and N_{pos} is the number of positive samples.
The HZM multi-style dataset includes three styles of
G. INFERENCE LPs: Mainland China LP, Hong Kong LP with white
In ALPRNet, two different feature map layers (P1 and P2) background, and Macao LP with black background,
are used to predict characters and LPs. During the infer- which is a private dataset including 1376 images col-
ence, we forward the image through ALPRNet to obtain the lecting from the real-world system running on the
object scores sorex,y , regression prediction geox,y , center- Hong Kong-Zhuhai-Macao Bridge. The images can be
ness centerx,y , and classification classx,y of each location on divided into 4 groups: Mainland China+Macao LPs, Main-
the feature map. land China+Hong Kong LPs, Macao+Hong Kong LPs, and
Only the locations with scorex,y > 0.95 are considered Mainland China+Macao+Hong Kong LPs. The resolution of
as positive samples, but there are still a large number of these images is 1190 × 500 pixels. Fig. 2 shows the examples
bounding boxes that increase the workload of the following of vehicle running on the Hong Kong-Zhuhai-Macao Bridge.
NMS process. The quality of bounding boxes produced by
the locations that far away from the center of object is poor. B. ANNOTATION
Therefore, we use center-ness to filter out these locations that In this work, we choose VGG Image Annotator (VIA) to
have centerx,y < 0.3. The combination of using center-ness edit image annotations, the annotations of ground truths of
and score helps to filter out most of the low quality predic- all images of a dataset are saved in a single file. There are
tions. Moreover, since the bounding boxes from nearby pixels 74 object types to be annotated, including 3 LP types and
are highly correlated, we use Locality-Aware NMS [30] to 71 characters. The bounding boxes of objects are defined by
FIGURE 2. Example images from the HZM multi-style dataset. (a) Example image A. (b) Example image B. (c) Example image C.
FIGURE 4. LPs have been detected and recognized, wherein "hk" means Hong Kong, "mo" means Macao, and "cn" means Mainland China.
FIGURE 5. Example results for LP detection and recognition based on the AOLP dataset.
FIGURE 6. Example results for LP detection and recognition based on the PKU dataset with the G5 subset.
the proposed network surpasses the compared methods on all the subsets; on the RP subset in particular, the proposed network exceeds them by 3 points. Fig. 5 shows some images of the AOLP dataset in which the LPs have been detected and recognized. The results prove the effectiveness of the proposed network and show that its advantage in LPDA comes from the two collaborative parallel branches of LP detection and character recognition.

TABLE 2. Comparison Results Based on the AOLP Dataset.

G. PERFORMANCE ON THE PKU DATASET
The PKU dataset contains 3977 images of Mainland China LPs. The dataset is divided into 5 groups (i.e., G1, G2, G3, G4, and G5). Since only ground-truth information for the LP bounding boxes is available, we use this dataset only to evaluate the performance of LP detection. Based on the training results on the synthetic dataset, we use G1 to train the model for the G2 subset and apply data augmentation on it. Then, the G1 and G2 subsets are used to train the model for G3, G4, and G5, and the G2 subset is used to train the model for G1. Table 3 shows that the proposed network achieves good results when compared with some state-of-the-art methods. Fig. 6 shows LPs that have been detected and recognized.

TABLE 3. Comparison Results Based on the PKU Dataset.

V. CONCLUSION
In this article, we present the one-stage ALPRNet for multiple and mixed style LP recognition, which treats LPs and characters equally as objects to detect and classify and conducts these two tasks simultaneously. This results in a one-stage fully convolutional framework that solves the LP detection and recognition tasks in an integrated framework without any RNN branches. By sharing the convolutional feature maps, ALPRNet is compact with fewer parameters, and the two tasks can be trained more effectively and collaboratively. In the experiments, ALPRNet achieves a 98.21% accuracy rate on the HZM multi-style dataset, and the results on the datasets with a single LP style also show that the proposed network achieves state-of-the-art recognition accuracy.

REFERENCES
[1] C. Henry, S. Y. Ahn, and S. Lee, "Multinational license plate recognition using generalized character sequence detection," IEEE Access, vol. 8, pp. 35185–35199, 2020.
[2] M.-X. He and P. Hao, "Robust automatic recognition of Chinese license plates in natural scenes," IEEE Access, vol. 8, pp. 173804–173814, 2020.
[3] W. Weihong and T. Jiaoyang, "Research on license plate recognition algorithms based on deep learning in complex environment," IEEE Access, vol. 8, pp. 91661–91675, 2020.
[4] I. V. Pustokhina, D. A. Pustokhin, J. J. P. C. Rodrigues, D. Gupta, A. Khanna, K. Shankar, C. Seo, and G. P. Joshi, "Automatic vehicle license plate recognition using optimal K-means with convolutional neural network for intelligent transportation systems," IEEE Access, vol. 8, pp. 92907–92917, 2020.
[5] A. Tourani, A. Shahbahrami, S. Soroori, S. Khazaee, and C. Y. Suen, "A robust deep learning approach for automatic Iranian vehicle license plate detection and recognition for surveillance systems," IEEE Access, vol. 8, pp. 201317–201330, 2020.
[6] Y. Zou, Y. Zhang, J. Yan, X. Jiang, T. Huang, H. Fan, and Z. Cui, "A robust license plate recognition model based on bi-LSTM," IEEE Access, vol. 8, pp. 211630–211641, 2020.
[7] H. Seibel, S. Goldenstein, and A. Rocha, "Eyes on the target: Super-resolution and license-plate recognition in low-quality surveillance videos," IEEE Access, vol. 5, pp. 20020–20035, 2017.
[8] S. Zhang, G. Tang, Y. Liu, and H. Mao, "Robust license plate recognition with shared adversarial training network," IEEE Access, vol. 8, pp. 697–705, 2020.
[9] B. B. Yousif, M. M. Ata, N. Fawzy, and M. Obaya, "Toward an optimized neutrosophic k-means with genetic algorithm for automatic vehicle license plate recognition (ONKM-AVLPR)," IEEE Access, vol. 8, pp. 49285–49312, 2020.
[10] W. Wang, J. Yang, M. Chen, and P. Wang, "A light CNN for end-to-end car license plates detection and recognition," IEEE Access, vol. 7, pp. 173875–173883, 2019.
[11] Hendry and R.-C. Chen, "Automatic license plate recognition via sliding-window darknet-YOLO deep learning," Image Vis. Comput., vol. 87, pp. 47–56, Jul. 2019.
[12] Z. Selmi, M. B. Halima, U. Pal, and M. A. Alimi, "DELP-DAR system for license plate detection and recognition," Pattern Recognit. Lett., vol. 129, pp. 213–223, Jan. 2020.
[13] R. Laroca, E. Severo, L. A. Zanlorensi, L. S. Oliveira, G. R. Goncalves, W. R. Schwartz, and D. Menotti, "A robust real-time automatic license plate recognition based on the YOLO detector," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Rio de Janeiro, Brazil, Jul. 2018, Art. no. 18165770.
[14] S. M. Silva and C. R. Jung, "License plate detection and recognition in unconstrained scenarios," in Proc. Eur. Conf. Comput. Vis. (ECCV), Munich, Germany: Springer, 2018, pp. 593–609.
[15] H. Li and C. Shen, "Reading car license plates using deep convolutional neural networks and LSTMs," 2016, arXiv:1601.05610. [Online]. Available: http://arxiv.org/abs/1601.05610
[16] Y. Cao, H. Fu, and H. Ma, "An end-to-end neural network for multi-line license plate recognition," in Proc. 24th Int. Conf. Pattern Recognit. (ICPR), Beijing, China, Aug. 2018, pp. 3698–3703.
[17] Y. L. Yuan, W. B. Zou, Y. Zhao, X. Wang, X. F. Hu, and N. Komodakis, "A robust and efficient approach to license plate detection," IEEE Trans. Image Process., vol. 26, no. 3, pp. 1102–1114, Mar. 2016.
[18] G.-S. Hsu, J.-C. Chen, and Y.-Z. Chung, "Application-oriented license plate recognition," IEEE Trans. Veh. Technol., vol. 62, no. 2, pp. 552–561, Feb. 2013.
[19] A. H. Ashtari, M. J. Nordin, and M. Fathy, "An Iranian license plate recognition system based on color features," IEEE Trans. Intell. Transp. Syst., vol. 15, no. 4, pp. 1690–1705, Aug. 2014.
[20] S. Yu, B. Li, Q. Zhang, C. Liu, and M. Q.-H. Meng, "A novel license plate location method based on wavelet transform and EMD analysis," Pattern Recognit., vol. 48, no. 1, pp. 114–125, Jan. 2015.
[21] B. Li, B. Tian, Y. Li, and D. Wen, "Component-based license plate detection using conditional random field model," IEEE Trans. Intell. Transp. Syst., vol. 14, no. 4, pp. 1690–1699, Dec. 2013.
[22] D. F. Llorca, C. Salinas, M. Jimenez, I. Parra, A. G. Morcillo, R. Izquierdo, J. Lorenzo, and M. A. Sotelo, "Two-camera based accurate vehicle speed measurement using average speed at a fixed point," in Proc. IEEE 19th Int. Conf. Intell. Transp. Syst. (ITSC), Rio de Janeiro, Brazil, Nov. 2016, pp. 2533–2538.
[23] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 6517–6525.
[24] M. A. Rafique, W. Pedrycz, and M. Jeon, "Vehicle license plate detection using region-based convolutional neural networks," Soft Comput., vol. 22, no. 19, pp. 6429–6440, Oct. 2018.
[25] H. Li and C. Shen, "Reading car license plates using deep convolutional neural networks and LSTMs," 2016, arXiv:1601.05610. [Online]. Available: http://arxiv.org/abs/1601.05610
[26] H. Li, P. Wang, and C. Shen, "Toward end-to-end car license plate detection and recognition with deep neural networks," IEEE Trans. Intell. Transp. Syst., vol. 20, no. 3, pp. 1126–1136, Mar. 2019.
[27] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, 2015, pp. 1–14.
[28] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Boston, MA, USA, Jun. 2015, pp. 3431–3440.
[29] H. Li, P. Wang, and C. Shen, "Towards end-to-end text spotting with convolutional recurrent neural networks," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 5248–5256.
[30] X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, "EAST: An efficient and accurate scene text detector," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 2642–2651.
[31] P. He, W. Huang, T. He, Q. Zhu, Y. Qiao, and X. Li, "Single shot text detector with regional attention," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 3066–3074.
[32] L. Xing, Z. Tian, W. Huang, and M. Scott, "Convolutional character networks," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul, South Korea, Oct. 2019, pp. 9125–9135.
[33] Z. Tian, C. Shen, H. Chen, and T. He, "FCOS: Fully convolutional one-stage object detection," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul, South Korea, Oct. 2019, pp. 9626–9635.
[34] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 770–778.
[35] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU loss: Faster and better learning for bounding box regression," 2019, arXiv:1911.08287. [Online]. Available: http://arxiv.org/abs/1911.08287
[36] W. Zhou, H. Li, Y. Lu, and Q. Tian, "Principal visual word discovery for automatic license plate detection," IEEE Trans. Image Process., vol. 21, no. 9, pp. 4269–4279, Sep. 2012.

QIUYING HUANG received the M.S. degree in software engineering from the Beijing Institute of Technology, Beijing, China, in 2013. He is currently pursuing the Ph.D. degree with the Faculty of Information Technology, Macau University of Science and Technology, Macau, China. His research interests include image processing and artificial intelligence.

ZHANCHUAN CAI (Senior Member, IEEE) received the Ph.D. degree from Sun Yat-sen University, Guangzhou, China, in 2007. From 2007 to 2008, he was a Visiting Scholar with the University of Nevada at Las Vegas, NV, USA. He is currently a Professor with the Faculty of Information Technology, Macau University of Science and Technology, Macau, China, where he is also with the State Key Laboratory of Lunar and Planetary Sciences. His research interests include image processing and computer graphics, intelligent information processing, multimedia information security, and remote sensing data processing and analysis.

Dr. Cai is a member of the Association for Computing Machinery, the Chang'e-3 Scientific Data Research and Application Core Team, and the Asia Graphics Association. He is also a Distinguished Member of the China Computer Federation. He was a recipient of the Third Prize of the Macau Science and Technology Award-Natural Science Award, in 2012, the BOC Excellent Research Award from the Macau University of Science and Technology, in 2016, the Third Prize of the Macau Science and Technology Award-Technological Invention Award, in 2018, and the Second Prize of the Teaching Achievement Award from the Macau University of Science and Technology, in 2020.

TING LAN received the M.S. degree from the University of Macau, Macau, China, in 2014, and the Ph.D. degree from the Macau University of Science and Technology, Macau, in 2019. He is currently a Postdoctoral Fellow with the Faculty of Information Technology, Macau University of Science and Technology. His research interests include image processing, data processing and analysis, and computer graphics.

Dr. Lan was a recipient of the First Prize at the 14th China Postgraduate Mathematical Contest in Modeling (China Academic Degrees and Graduate Education Development Center and China Graduate Mathematical Contest in Modeling Committee), in 2017, and the Third Prize of the Macau Science and Technology Award-Technological Invention Award, in 2018.