
On the Cross-Dataset Generalization in License Plate Recognition

Rayson Laroca¹, Everton V. Cardoso¹, Diego R. Lucio¹, Valter Estevam¹,², and David Menotti¹

¹ Federal University of Paraná, Curitiba, Brazil
² Federal Institute of Paraná, Irati, Brazil
{rblsantos, evcardoso, drlucio, vlejunior, menotti}@inf.ufpr.br, [email protected]

Keywords: Deep Learning, Leave-one-dataset-out, License Plate Recognition, Optical Character Recognition

Abstract: Automatic License Plate Recognition (ALPR) systems have shown remarkable performance on license plates (LPs) from multiple regions due to advances in deep learning and the increasing availability of datasets. The evaluation of deep ALPR systems is usually done within each dataset; therefore, it is questionable whether such results are a reliable indicator of generalization ability. In this paper, we propose a traditional-split versus leave-one-dataset-out experimental setup to empirically assess the cross-dataset generalization of 12 Optical Character Recognition (OCR) models applied to LP recognition on nine publicly available datasets with great variety in several aspects (e.g., acquisition settings, image resolution, and LP layouts). We also introduce a public dataset for end-to-end ALPR that is the first to contain images of vehicles with Mercosur LPs and the one with the highest number of motorcycle images. The experimental results shed light on the limitations of the traditional-split protocol for evaluating approaches in the ALPR context, as there are significant drops in performance for most datasets when training and testing the models in a leave-one-dataset-out fashion.

1 INTRODUCTION

The global automotive industry expects to produce more than 82 million light vehicles in 2022 alone, despite the ongoing coronavirus pandemic and chip supply issues (Forbes, 2021; IHS Markit, 2021). In addition to bringing convenience to owners, vehicles also significantly modify the urban environment, posing challenges concerning pollution, privacy and security – especially in large urban centers. The constant monitoring of vehicles through computational techniques is of paramount importance and, therefore, has been a frequent research topic. In this context, Automatic License Plate Recognition (ALPR) systems (Weihong and Jiaoyang, 2020; Lubna et al., 2021) stand out.

ALPR systems exploit image processing and pattern recognition techniques to detect and recognize the characters on license plates (LPs) from images or videos. Some practical applications for an ALPR system are road traffic monitoring, toll collection, and vehicle access control in restricted areas (Špaňhel et al., 2017; Henry et al., 2020; Wang et al., 2022).

Deep ALPR systems have shown remarkable performance on LPs from multiple regions due to advances in deep learning and the increasing availability of datasets (Henry et al., 2020; Silva and Jung, 2022). In the past, the evaluation of ALPR systems used to be done within each of the chosen datasets, i.e., the proposed methods were trained and evaluated on different subsets from the same dataset. Such an evaluation was carried out independently for each dataset. Recently, considering that deep models can take considerable time to be trained (especially on low- or mid-end GPUs), authors have adopted a protocol where the proposed models are trained once on the union of the training images from the chosen datasets and evaluated individually on the respective test sets (Selmi et al., 2020; Laroca et al., 2021b). Although the images for training and testing belong to disjoint subsets, these protocols do not make it clear whether the evaluated models have good generalization ability, i.e., whether they perform well on images from other scenarios, mainly due to domain divergence and data selection bias (Torralba and Efros, 2011; Tommasi et al., 2017; Zhang et al., 2019).

In this regard, many computer vision researchers have carried out cross-dataset experiments – where training and testing data come from different sources – to assess whether the proposed models perform well on data from an unknown domain (Ashraf et al., 2018; Zhang et al., 2019; Estevam et al., 2021). However, as far as we know, there is no work focused on such experimental settings in the ALPR context.

Considering the above discussion, in this work we evaluate for the first time various Optical Character Recognition (OCR) models for LP recognition in a leave-one-dataset-out experimental setup over nine public datasets with different characteristics. The results obtained are compared with those achieved when training the models in the same way as in recent works, that is, using the union of the training set images from all datasets (hereinafter, this protocol is referred to as traditional-split).

Deep learning-based ALPR systems have often achieved recognition rates above 99% on existing datasets under the traditional-split protocol (some examples are provided in Section 2). However, in real-world applications, new cameras are regularly being installed in new locations without existing systems being retrained as often, which can dramatically decrease the performance of those models. A leave-one-dataset-out protocol enables simulating this specific scenario, providing an adequate evaluation of the generalizability of the models.

ALPR is commonly divided into two tasks: LP detection and LP recognition. The former refers to locating the LP region in the input image, while the latter refers to extracting the string related to the LP. In this work, we focus on the LP recognition stage since it is the current bottleneck of ALPR systems (Laroca et al., 2021b). Thus, we simply train the off-the-shelf YOLOv4 model (Bochkovskiy et al., 2020) to detect the LPs in the input images. For completeness, we also report the results achieved in this stage under both of the aforementioned protocols.

As part of this work, we introduce a publicly available dataset, called RodoSol-ALPR¹, that contains 20,000 images captured at toll booths installed on a Brazilian highway. It has images of two different LP layouts: Brazilian and Mercosur², with half of the vehicles being motorcycles (see details in Section 3). To the best of our knowledge, this is the first public dataset for ALPR with images of Mercosur LPs and the largest in the number of motorcycle images. This last information is relevant because motorcycle LPs have two rows of characters, which is a challenge for sequential/recurrent-based methods (Silva and Jung, 2022), and therefore they have been overlooked in the evaluation of LP recognition models (see Section 2).

Our paper has two main contributions:

• A traditional-split versus leave-one-dataset-out experimental setup that can be considered a valid testbed for cross-dataset generalization methods proposed in future works on ALPR. We present a comparative assessment of 12 OCR models for LP recognition on nine publicly available datasets. The main findings were that (i) there are significant drops in performance for most datasets when training and testing the recognition models in a leave-one-dataset-out fashion, especially when there are different character fonts in the training and test images; (ii) no model achieved the best result in all experiments, with six different models reaching the best result in at least one dataset under the leave-one-dataset-out protocol; and (iii) the proposed dataset proved very challenging, as both the models trained by us and two commercial systems failed to reach recognition rates above 70% on its test set images.

• A public dataset with 20,000 images acquired in real-world scenarios, half of them of vehicles with Mercosur LPs. Indeed, one of the objectives of this work is to provide a reliable source of information about Mercosur LPs, as much news – often outdated – has been used as references.

The remainder of this paper is organized in the following manner. In Section 2, we briefly review related works. The RodoSol-ALPR dataset is introduced in Section 3. The setup adopted in our experiments is thoroughly described in Section 4. Section 5 presents the results achieved. Finally, Section 6 concludes the paper and outlines future directions of research.

¹ The RodoSol-ALPR dataset is publicly available to the research community at https://github.com/raysonlaroca/rodosol-alpr-dataset/
² Mercosur (Mercado Común del Sur, i.e., Southern Common Market in Castilian) is an economic and political bloc comprising Argentina, Brazil, Paraguay and Uruguay.

Note: this is an author-prepared version of a paper accepted for presentation at the International Conference on Computer Vision Theory and Applications (VISAPP) 2022. The published version is available at the SciTePress Digital Library (DOI: 10.5220/0010846800003124).

2 RELATED WORK

In this section, we first concisely present recent works on LP recognition. Then, we situate the current state of ALPR research in terms of cross-dataset experiments, Mercosur LPs, and motorcycle LPs.

The good speed/accuracy trade-off provided by the YOLO networks (Redmon et al., 2016; Bochkovskiy et al., 2020) inspired many authors to explore similar architectures targeting real-time performance for LP recognition. For example, Silva and Jung (2020) proposed a YOLO-based model to simultaneously detect and recognize all characters within a cropped LP.
This model, called CR-NET, consists of the first eleven layers of YOLO and four other convolutional layers added to improve non-linearity. Impressive results were achieved with CR-NET both in the original work and in more recent ones (Laroca et al., 2021b; Oliveira et al., 2021; Silva and Jung, 2022).

While Kessentini et al. (2019) applied the YOLOv2 model without any change or refinement to this task, Henry et al. (2020) used a modified version of YOLOv3 that includes spatial pyramid pooling. Although these two models achieved high recognition rates on multiple datasets, they are very deep for LP recognition, making it difficult to meet the real-time requirements of ALPR applications.

Rather than exploring object detectors, Zou et al. (2020) adopted a bidirectional Long Short-Term Memory (LSTM) network to implicitly locate the characters on the LP. They explored a 1-D attention module to extract useful features of the character regions, improving the accuracy of LP recognition. In a similar way, Zhang et al. (2021) used a 2-D attention mechanism to optimize their recognition model, which uses a 30-layer Convolutional Neural Network (CNN) based on Xception for feature extraction. An LSTM model was adopted to decode the extracted features into LP characters.

There are also several works where multi-task networks were designed to holistically process the entire LP image and, thus, avoid character segmentation, such as (Špaňhel et al., 2017; Gonçalves et al., 2019). As these networks employ fully connected layers as classifiers to recognize the characters at predefined positions of the LP, they may not generalize well with small-scale training sets, since the probability of a specific character appearing at a specific position is low. To deal with this, Wang et al. (2022) proposed a weight-sharing classifier, which is able to spot instances of each character across all positions.

Considering that the recognition rates achieved under the traditional-split protocol have increased significantly in recent years, some authors began to conduct small cross-dataset experiments to analyze the generalization ability of the proposed methods. For example, Silva and Jung (2020) and Laroca et al. (2021b) used all 108 images from the OpenALPR-EU dataset for testing, rather than using some of them for training/validation. Nevertheless, results achieved on so few test images are susceptible to tricks, especially considering that heuristic rules were explored to improve the LP recognition results in both works.

As another example, Zou et al. (2020), Zhang et al. (2021) and Wang et al. (2022) trained their recognition models specifically for Chinese LPs on approximately 200K images from the CCPD dataset (Xu et al., 2018) and tested them on images from other datasets that also contain only Chinese LPs. In this case, it is not clear whether the proposed models perform well on LPs from other regions. In fact, the authors trained another instance of the respective models to evaluate them on the AOLP dataset (Hsu et al., 2013), which contains LPs from the Taiwan region.

Recently, Mercosur countries adopted a unified standard of LPs for newly purchased vehicles, inspired by the integrated system adopted by European Union countries many years ago. Although the new standard has been implemented in all countries of the bloc, as far as we know there is still no public dataset for ALPR with images of Mercosur LPs.

In this sense, Silvano et al. (2021) presented a methodology that couples synthetic images of Mercosur LPs with real-world images containing vehicles with other LP layouts. A model trained exclusively with synthetic images achieved promising results on 1,000 real images from various sources; however, it is difficult to assess these results accurately, since the test images were not made available to the research community. The LP recognition stage was not addressed.

Despite the fact that motorcycles are one of the most popular means of transportation in metropolitan areas (Hsu et al., 2015), they have been largely overlooked in ALPR research. There are even works where images of motorcycles were excluded from the experiments (Gonçalves et al., 2018; Silva and Jung, 2020), mainly because motorcycle LPs usually have two rows of characters, which are challenging for sequential/recurrent-based methods (Kessentini et al., 2019; Silva and Jung, 2022), and also because they are generally smaller in size (having less space between characters) and are often tilted.

In this regard, there is a great demand for a public dataset for end-to-end ALPR with the same number of images of cars and motorcycles, to give equal importance to LPs with one or two rows of characters in the assessment of ALPR systems.

3 RODOSOL-ALPR DATASET

The RodoSol-ALPR dataset contains 20,000 images captured by static cameras located at toll booths owned by the Rodovia do Sol (RodoSol) concessionaire (RodoSol, 2022) – hence the name of the dataset – which operates 67.5 kilometers of a highway (ES-060) in the Brazilian state of Espírito Santo.

As can be seen in Figure 1, there are images of different types of vehicles (e.g., cars, motorcycles, buses and trucks), captured during the day and at night, from distinct lanes, on clear and rainy days, and the distance from the vehicle to the camera varies slightly. All images have a resolution of 1,280 × 720 pixels.

Figure 1: Some images extracted from the RodoSol-ALPR dataset. The first and second rows show images of cars and motorcycles, respectively, with Brazilian LPs (i.e., the standard used in Brazil before the adoption of the Mercosur standard). The third and fourth rows show images of cars and motorcycles, respectively, with Mercosur LPs. We show a zoomed-in version of the vehicle's LP in the lower right region of the images in the last column for better viewing of the LP layouts.

An important feature of the proposed dataset is that it has images of two different LP layouts: Brazilian and Mercosur. To maintain consistency with previous works (Izidio et al., 2020; Oliveira et al., 2021; Silva and Jung, 2022), we refer to "Brazilian" as the standard used in Brazil before the adoption of the Mercosur standard. All Brazilian LPs consist of three letters followed by four digits, while the initial pattern adopted in Brazil for Mercosur LPs consists of three letters, one digit, one letter and two digits, in that order. In both layouts, car LPs have seven characters arranged in one row, whereas motorcycle LPs have three characters in one row and four characters in another. Even though these LP layouts are very similar in shape and size, there are considerable differences in their colors and character fonts.

The 20,000 images are divided as follows: 5,000 images of cars with Brazilian LPs; 5,000 images of motorcycles with Brazilian LPs; 5,000 images of cars with Mercosur LPs; and 5,000 images of motorcycles with Mercosur LPs. For simplicity of definitions, here "car" refers to any vehicle with four wheels or more (e.g., passenger cars, vans, buses, trucks, among others), while "motorcycle" refers to both motorcycles and motorized tricycles. As far as we know, RodoSol-ALPR is the public ALPR dataset with the highest number of motorcycle images.

We randomly split the RodoSol-ALPR dataset as follows: 8,000 images for training; 8,000 images for testing; and 4,000 images for validation, following the split protocol (i.e., 40%/40%/20%) adopted in the SSIG-SegPlate (Gonçalves et al., 2016) and UFPR-ALPR (Laroca et al., 2018) datasets. We preserved the percentage of samples for each vehicle type and LP layout; for example, there are 2,000 images of cars with Brazilian LPs in each of the training and test sets, and 1,000 such images in the validation set. For reproducibility purposes, the generated subsets are explicitly available along with the proposed dataset.

Every image has the following information available in a text file: the vehicle's type (car or motorcycle), the LP's layout (Brazilian or Mercosur), its text (e.g., ABC-1234), and the position (x, y) of each of its four corners. We labeled the corners instead of just the LP bounding box to enable the training of methods that explore LP rectification, as well as the application of a wider range of data augmentation techniques.

Datasets for ALPR are generally very unbalanced in terms of character classes due to LP allocation policies (Zhang et al., 2021). In Brazil, for example, one letter can appear much more often than others according to the state in which the LP was issued (Gonçalves et al., 2018; Laroca et al., 2018). This information must be taken into account when training recognition models in order to avoid undesirable biases – this is usually done through data augmentation techniques (Zhang et al., 2021; Hasnat and Nakib, 2021); for example, a network trained exclusively on our dataset may learn to always classify the first character as 'P' in cases where it should be 'B' or 'R', since 'P' appears much more often in this position than these two characters (see Figure 2).

Regarding privacy concerns related to our dataset, we remark that in Brazil the LPs are related to the respective vehicles, i.e., no public information is available about the vehicle drivers/owners (Presidência da República, 1997; Oliveira et al., 2021). Moreover, all human faces (e.g., drivers or RodoSol's employees) were manually redacted (i.e., blurred) in each image.
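The per-image annotations could be read with a short parser like the one below. This is a sketch only: the key names ("type", "plate", "layout", "corners") and the corner ordering in the sample are assumptions for illustration, not the dataset's documented format.

```python
# Sketch of a loader for the per-image annotation files described above.
# Each image is paired with a text file listing the vehicle type, the LP
# layout, the LP text, and the (x, y) position of the four LP corners.
# Key names and corner order below are assumptions, not the official spec.

def parse_annotation(text: str) -> dict:
    """Parse one 'key: value' annotation file into a dictionary."""
    ann = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        ann[key.strip()] = value.strip()
    # Corners are assumed to be stored as four space-separated "x,y" pairs.
    if "corners" in ann:
        ann["corners"] = [
            tuple(int(v) for v in pt.split(","))
            for pt in ann["corners"].split()
        ]
    return ann

sample = """\
type: car
plate: ABC-1234
layout: Brazilian
corners: 558,438 687,439 687,482 558,481
"""
ann = parse_annotation(sample)
```

Having the four corners (rather than an axis-aligned box) is what makes rectification-based methods trainable, since a perspective transform can be estimated directly from them.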
Figure 2: The distribution of character classes in the RodoSol-ALPR dataset, shown per character position (1st to 7th). Observe that there is a significant imbalance in the distribution of the letters (due to LP allocation policies), whereas the digits are well balanced.

4 EXPERIMENTS

In this section, we describe the setup adopted in our experiments. We first list the models we implemented for our assessments, explaining why they were chosen and not others. Afterward, we provide the implementation details, for example, which framework was used to train/test each model and the respective hyperparameters. We then present and briefly describe the datasets used in our experiments, as well as the data augmentation techniques explored to avoid overfitting. Lastly, we detail the evaluation protocols we adopted, that is, which images from each dataset were used for training or testing in each experiment, and how we evaluate the performance of each method.

4.1 Methods

In this work, we evaluate 12 OCR models for LP recognition: RARE (Shi et al., 2016), R2AM (Lee and Osindero, 2016), STAR-Net (Liu et al., 2016), CRNN (Shi et al., 2017), GRCNN (Wang and Hu, 2017), Holistic-CNN (Špaňhel et al., 2017), Multi-task (Gonçalves et al., 2019), Rosetta (Borisyuk et al., 2018), TRBA (Baek et al., 2019), CR-NET (Silva and Jung, 2020), Fast-OCR (Laroca et al., 2021a), and ViTSTR-Base (Atienza, 2021). Table 1 presents an overview of these methods, listing the original OCR application for which they were designed as well as the framework we used to train and evaluate them.

Table 1: OCR models explored in our experiments.

Model                                  Original Application
Framework: PyTorch³
  R2AM (Lee and Osindero, 2016)        Scene Text Recognition
  RARE (Shi et al., 2016)              Scene Text Recognition
  STAR-Net (Liu et al., 2016)          Scene Text Recognition
  CRNN (Shi et al., 2017)              Scene Text Recognition
  GRCNN (Wang and Hu, 2017)            Scene Text Recognition
  Rosetta (Borisyuk et al., 2018)      Scene Text Recognition
  TRBA (Baek et al., 2019)             Scene Text Recognition
  ViTSTR-Base (Atienza, 2021)          Scene Text Recognition
Framework: Keras⁴
  Holistic-CNN (Špaňhel et al., 2017)  License Plate Recognition
  Multi-task (Gonçalves et al., 2019)  License Plate Recognition
Framework: Darknet⁵
  CR-NET (Silva and Jung, 2020)        License Plate Recognition
  Fast-OCR (Laroca et al., 2021a)      Image-based Meter Reading

These models were chosen/implemented by us for two main reasons: (i) they have been employed for OCR tasks with promising/impressive results (Baek et al., 2019; Atienza, 2021; Laroca et al., 2021a); and (ii) we believe we have the necessary knowledge to train/adjust them in the best possible way in order to ensure fairness in our experiments, as the authors provided enough details about the architectures used, and also because we designed/employed similar networks in previous works (even the same ones in some cases) (Gonçalves et al., 2018, 2019; Laroca et al., 2019, 2021a). Note that we are not aware of any work in the ALPR literature where so many recognition models were explored in the experiments.

The CR-NET and Fast-OCR models are based on the YOLO object detector (Redmon et al., 2016). Thus, they are trained to predict 35 classes (0-9, A-Z, where 'O' and '0' are detected/recognized jointly) using the bounding box of each LP character as input. Although these methods have been attaining impressive results, they require laborious data annotations, i.e., each character's bounding box needs to be labeled for training them (Wang et al., 2022). The other 10 models, on the other hand, output the LP characters in a segmentation-free manner, i.e., they predict the characters holistically from the LP region without the need to detect/segment them. According to previous works (Gonçalves et al., 2018; Atienza, 2021; Hasnat and Nakib, 2021), the generalizability of such segmentation-free models tends to improve significantly through the use of data augmentation.

³ https://github.com/roatienza/deep-text-recognition-benchmark/
⁴ https://keras.io/
⁵ https://github.com/AlexeyAB/darknet/

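Since CR-NET and Fast-OCR merge 'O' and '0' into a single class, a post-processing step can assign the merged class to a letter or a digit from the known character arrangement of the LP layout. The sketch below is only illustrative (the layout masks follow the Brazilian and Mercosur patterns described in Section 3; it is not the published post-processing of either model):

```python
# Resolve the merged 'O'/'0' class using the expected letter/digit
# arrangement of the LP layout. Illustrative heuristic, not the
# authors' published post-processing.

LAYOUT_MASKS = {
    # 'L' = letter position, 'D' = digit position
    "brazilian": "LLLDDDD",  # three letters followed by four digits
    "mercosur": "LLLDLDD",   # three letters, digit, letter, two digits
}

def resolve_o_zero(chars: str, layout: str) -> str:
    """Map the merged class (predicted as either 'O' or '0') per position."""
    mask = LAYOUT_MASKS[layout]
    out = []
    for ch, slot in zip(chars, mask):
        if ch in ("O", "0"):
            out.append("O" if slot == "L" else "0")
        else:
            out.append(ch)
    return "".join(out)
```

Note that such layout-driven rules are exactly the kind of heuristic that does not transfer to unseen layouts, which is part of what the leave-one-dataset-out protocol exposes.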

4.2 Setup

All experiments were carried out on a computer with an AMD Ryzen Threadripper 1920X 3.5GHz CPU, 96 GB of RAM (2133 MHz), an HDD at 7200 RPM, and an NVIDIA Quadro RTX 8000 GPU (48 GB).

Although run-time analysis is considered a critical factor in the ALPR literature (Lubna et al., 2021), we consider such analysis beyond the scope of this work, since we used different frameworks to implement the recognition models and there are probably differences in implementation and optimization between them – we implemented each method using either the framework in which it was originally implemented or well-known public repositories. For example, the YOLO-based models were implemented using Darknet⁵, while the models originally proposed for scene text recognition were trained and evaluated using a fork³ of the open-source repository of Clova AI Research (PyTorch), used to record the 1st place of ICDAR2013 focused scene text and ICDAR2019 ArT, and 3rd place of ICDAR2017 COCO-Text and ICDAR2019 ReCTS (task 1) (Baek et al., 2019).

For completeness, below we list the hyperparameters used in each framework for training the OCR models; we remark that these hyperparameters were defined based on previous works as well as on experiments performed on the validation set. In Darknet, we employed the following parameters: Stochastic Gradient Descent (SGD) optimizer, 90K iterations (max batches), batch size = 64, and learning rate = [10⁻³, 10⁻⁴, 10⁻⁵] with decay steps at 30K and 60K iterations. In Keras, we used the Adam optimizer, initial learning rate = 10⁻³ (with ReduceLROnPlateau's patience = 5 and factor = 10⁻¹), batch size = 64, max epochs = 100, and patience = 11 (patience refers to the number of epochs with no improvement after which training is stopped). In PyTorch, we adopted the following parameters: Adadelta optimizer with decay rate ρ = 0.99, 300K iterations, and batch size = 128.

4.3 Datasets

Our experiments were conducted on images from the RodoSol-ALPR dataset and eight publicly available datasets that are often employed to benchmark ALPR algorithms: Caltech Cars (Weber, 1999), EnglishLP (Srebrić, 2003), UCSD-Stills (Dlagnekov and Belongie, 2005), ChineseLP (Zhou et al., 2012), AOLP (Hsu et al., 2013), OpenALPR-EU (OpenALPR Inc., 2016), SSIG-SegPlate (Gonçalves et al., 2016), and UFPR-ALPR (Laroca et al., 2018). Table 2 shows an overview of these datasets. They were introduced over the last 23 years and have considerable diversity in terms of the number of images, acquisition settings, image resolution, and LP layouts. As far as we know, there is no other work in the ALPR literature where experiments were carried out on images from so many public datasets.

Table 2: The datasets used in our experiments. In this work, the "Chinese" layout refers to LPs of vehicles registered in mainland China, while the "Taiwanese" layout refers to LPs of vehicles registered in the Taiwan region.

Dataset         Year   Images   Resolution    LP Layout
Caltech Cars    1999      126   896 × 592     American
EnglishLP       2003      509   640 × 480     European
UCSD-Stills     2005      291   640 × 480     American
ChineseLP       2012      411   Various       Chinese
AOLP            2013    2,049   Various       Taiwanese
OpenALPR-EU     2016      108   Various       European
SSIG-SegPlate   2016    2,000   1920 × 1080   Brazilian
UFPR-ALPR       2018    4,500   1920 × 1080   Brazilian
RodoSol-ALPR    2022   20,000   1280 × 720    Brazilian/Mercosur

Figure 3 shows the diversity of the chosen datasets in terms of LP layouts. It is clear that even LPs from the same country can be quite different; e.g., the Caltech Cars and UCSD-Stills datasets were collected in the same region (California, United States), but they have images of LPs with significant differences in terms of colors, aspect ratios, backgrounds, and the number of characters. It can also be observed that some datasets have LPs with two rows of characters, and that the LPs may be tilted or have low resolution due to camera quality or vehicle-to-camera distance.

In order to eliminate biases from the public datasets, we also used 772 images from the internet – those labeled and provided by Laroca et al. (2021b) – to train all models. These images include 257 American LPs, 347 Chinese LPs, and 178 European LPs. We chose not to use two datasets introduced recently: KarPlate (Henry et al., 2020) and CCPD (Xu et al., 2018). The former cannot currently be downloaded due to legal problems. The latter, although still available, was not employed for two main reasons: (i) it contains highly compressed images, which significantly compromises the readability of the LPs (Silva and Jung, 2022); and (ii) it has some large errors in the corner annotations (Meng et al., 2020) – this was somewhat expected, since the corners were labeled automatically using RPnet (Xu et al., 2018). Additionally, we could not download the CLPD dataset (Zhang et al., 2021), as the authors made it available exclusively through a Chinese website where registration – using a Chinese phone number or identity document – is required (we contacted the authors requesting an alternative link to download the dataset, but have not received a response so far).
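The stepwise Darknet learning-rate policy described above (initial rate 10⁻³, divided by 10 at 30K and 60K of the 90K iterations) can be written out explicitly; whether the decay applies exactly at the step boundary is an implementation detail we gloss over here:

```python
# Stepwise learning-rate schedule as configured in Darknet:
# 1e-3 for the first 30K iterations, 1e-4 until 60K, then 1e-5
# for the remainder of the 90K-iteration training run.

def darknet_lr(iteration: int) -> float:
    if iteration < 30_000:
        return 1e-3
    if iteration < 60_000:
        return 1e-4
    return 1e-5
```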
Figure 3: Some LP images from the public datasets used in our experimental evaluation: (a) Caltech Cars, (b) EnglishLP, (c) UCSD-Stills, (d) ChineseLP, (e) AOLP, (f) OpenALPR-EU, (g) SSIG-SegPlate, and (h) UFPR-ALPR. We show some LP images from the RodoSol-ALPR dataset in the last column of Figure 1.

4.3.1 Data Augmentation

As shown in Table 2, two-thirds of the images used in our experiments are from the RodoSol-ALPR dataset. In order to prevent overfitting, we initially balanced the number of images from the different datasets through data augmentation techniques such as random cropping, random shadows, conversion to grayscale, and random perturbations of hue, saturation and brightness. We used Albumentations (Buslaev et al., 2020), a well-known Python library for image augmentation, to apply these transformations. Nevertheless, preliminary experiments showed that some of the recognition models were prone to predict only LP patterns that existed in the training set, as some patterns were being fed numerous times per epoch to the networks – especially those from small-scale datasets, where many images were created from a single original one.

Therefore, inspired by Gonçalves et al. (2018), we also randomly permuted the position of the characters on each LP to eliminate such biases in the learning process (as illustrated in Figure 4). As the bounding box of each LP character is required to apply this data augmentation technique – and these annotations are very time-consuming and laborious to produce – we do not augment the training images from the RodoSol-ALPR dataset. We believe this is not a significant problem, as the proposed dataset is much larger than the others. The images from the other public datasets were augmented using the labels provided by Laroca et al. (2021b).

Figure 4: Illustration of the character permutation-based data augmentation technique (Gonçalves et al., 2018) we adopted to avoid overfitting. The images in the first row are the originals, while the others were generated automatically.

In this process, we do not enforce the generated LPs to have the same arrangement of letters and digits as the original LPs, so that the recognition models do not memorize specific patterns from different LP layouts. For example, as described in Section 3, all Brazilian LPs consist of 3 letters followed by 4 digits, while Mercosur LPs have 3 letters, 1 digit, 1 letter and 2 digits, in that order. Considering that LPs of these layouts are relatively similar (in size, shape, etc.), the segmentation-free networks would probably predict 3 letters followed by 4 digits for most Mercosur LPs when holding the RodoSol-ALPR dataset out in a leave-one-dataset-out evaluation, as none of the other datasets have vehicles with Mercosur LPs.

4.4 Evaluation Protocols

In our experiments, we consider both the traditional-split and leave-one-dataset-out protocols. In the following subsections, we first describe them in detail. Then, we discuss how the performance evaluation is carried out.

4.4.1 Traditional Split

The traditional-split protocol assesses the ability of the models to perform well in seen scenarios, as each model is trained on the union of the training set images from all datasets and evaluated on the test set images from the respective datasets. In recent works, the authors have chosen to train a single model on images from multiple datasets (instead of training a specific network for each dataset or LP layout, as was commonly done in the past) so that the proposed models are robust to different scenarios with considerably less manual effort, since their parameters are adjusted only once for all datasets (Selmi et al., 2020; Laroca et al., 2021b; Silva and Jung, 2022).

For reproducibility, it is important to make clear how we divided the images from each of the datasets to train, validate and test the chosen models.
The UCSD-Stills, SSIG-SegPlate, UFPR-ALPR and RodoSol-ALPR datasets were split according to the protocols defined by the respective authors, while the other datasets, which do not have well-defined evaluation protocols, were divided following previous works. In summary, as in (Xiang et al., 2019; Henry et al., 2020), the Caltech Cars dataset was randomly split into 80 images for training/validation and 46 images for testing. Following (Panahi and Gholampour, 2017; Beratoğlu and Töreyin, 2021), the EnglishLP dataset was randomly divided as follows: 80% of the images for training/validation and 20% for testing. For the ChineseLP dataset, we employed the same protocol as Laroca et al. (2021b): 40% of the images for training, 20% for validation and 40% for testing. We split each of the three subsets of the AOLP dataset (i.e., AC, LE, and RP) into training and test sets with a 2:1 ratio, following (Xie et al., 2018; Liang et al., 2021), with 20% of the training images being used for validation. Finally, as in most works in the literature (Masood et al., 2017; Laroca et al., 2021b; Silva and Jung, 2022), we used all 108 images from the OpenALPR-EU dataset for testing (this division has been considered a mini leave-one-dataset-out evaluation in recent works). Table 3 lists the exact number of images used for training, validating and testing the chosen models.

Table 3: An overview of the number of images from each dataset used for training, validation, and testing.

Dataset | Training | Validation | Testing | Discarded | Total
Caltech Cars 61 16 46 3 126
EnglishLP 326 81 102 0 509
UCSD-Stills 181 39 60 11 291
ChineseLP 159 79 159 14 411
AOLP 1,093 273 683 0 2,049
OpenALPR-EU 0 0 108 0 108
SSIG-SegPlate 789 407 804 0 2,000
UFPR-ALPR 1,800 900 1,800 0 4,500
RodoSol-ALPR 8,000 4,000 8,000 0 20,000

As also detailed in Table 3, a few images (less than 0.1%) were discarded in our experiments because it is impossible to recognize the LP(s) on them due to occlusion, lighting or image acquisition problems⁶. Such images were also discarded by Masood et al. (2017) and Laroca et al. (2021b).

⁶ The list of discarded images can be found at https://raysonlaroca.github.io/supp/visapp2022/discarded-images.txt

4.4.2 Leave-one-dataset-out

The leave-one-dataset-out protocol evaluates the generalization performance of the trained models by testing them on the test set of an unseen dataset; that is, no images from that dataset are available during training. For each experiment, we hold out the test set of one dataset as the unseen data and train every model on all images from the other datasets. As an example, if AOLP's test set is the current unseen data, the models are trained on all images from Caltech Cars, EnglishLP, UCSD-Stills, ChineseLP, OpenALPR-EU, SSIG-SegPlate, UFPR-ALPR and RodoSol-ALPR, in addition to the images taken from the internet and provided by Laroca et al. (2021b). We evaluate the models only on the test set images from each unseen dataset, rather than including the training and validation images in the evaluation, so that the results achieved by each model on a given dataset are fully comparable with those achieved by the same model under the traditional-split protocol.

4.4.3 Performance Evaluation

As mentioned in Section 1, in our experiments the LPs fed to the recognition models were detected using YOLOv4 (Bochkovskiy et al., 2020) – with an input size of 672 × 416 pixels – rather than cropped directly from the ground truth. This procedure was adopted to better simulate real-world scenarios, as the LPs will not always be detected perfectly, and certain OCR models are not as robust in cases where the region of interest has not been detected so precisely (Gonçalves et al., 2018). We employed the YOLOv4 model for this task because impressive results are consistently being reported in the ALPR context through YOLO-based models (Weihong and Jiaoyang, 2020). Indeed, as detailed in Section 5, YOLOv4 reached an average recall rate above 99.5% in our experiments (we considered as correct the detections with Intersection over Union (IoU) ≥ 0.5 with the ground truth).

For each experiment, we report the number of correctly recognized LPs divided by the number of LPs in the test set. A correctly recognized LP means that all characters on the LP were correctly recognized, as a single incorrectly recognized character can result in the vehicle being incorrectly identified.

Note that the first character on Chinese LPs is a Chinese character that represents the province in which the vehicle is affiliated (Xu et al., 2018; Zhang et al., 2021). Even though Chinese LPs are used in our experiments (see Figure 3d), the evaluated models were not trained/adjusted to recognize Chinese characters; that is, only digits and English letters are considered. This same procedure was adopted in previous works (Li et al., 2019; Selmi et al., 2020; Laroca et al., 2021b) for several reasons, including scope reduction and the fact that it is not trivial for non-Chinese speakers to analyze the different Chinese characters in order to make an accurate error analysis or to choose which data augmentation techniques to explore. Following Li et al. (2019), we denoted all Chinese characters as a single class '*' in our experiments. According to our results, the recognition models learned well the difference between Chinese characters and the others – i.e., digits and English letters – and this procedure did not affect the recognition rates obtained.

5 RESULTS AND DISCUSSION

First, we report in Table 4 the recall rates obtained by the YOLOv4 model in the LP detection stage. As can be seen, it reached surprisingly good results under both protocols. More specifically, recall rates above 99.9% were achieved in 14 of the 18 assessments. As in previous works (Laroca et al., 2018; Gonçalves et al., 2018; Silva and Jung, 2020), the detection results are slightly worse for the UFPR-ALPR dataset due to its challenging nature, as (i) it has images where the vehicles are considerably far from the camera; (ii) some of its frames have motion blur because the dataset was recorded in real-world scenarios where both the vehicle and the camera – inside another vehicle – are moving; and (iii) it also contains images of motorcycles, where the backgrounds can be much more complicated due to different body configurations and mixtures with other background scenes (Hsu et al., 2015).

Considering the discussion above, we assert that deep models trained for LP detection on images from multiple datasets can be employed quite reliably on images from unseen datasets (i.e., the leave-one-dataset-out protocol). Of course, this may not hold true in extraordinary cases where the test set domain is very different from the training ones, but this was not the case in our experimental evaluation carried out on images from nine datasets with different characteristics.

Regarding the recognition stage, the results achieved by all models across all datasets under the traditional-split and leave-one-dataset-out protocols are shown in Table 5 and Table 6, respectively. In Table 6, we included the results obtained by Sighthound (Masood et al., 2017) and OpenALPR (OpenALPR API, 2021), two commercial systems frequently used as baselines in the ALPR literature, since in principle they are trained on images from large-scale private datasets and not on the public datasets explored here (i.e., the leave-one-dataset-out protocol).

The first observation is that, as expected, the best results – on average for all models – were attained when training and evaluating the models on different subsets from the same datasets (i.e., the traditional-split protocol). The only case where this did not occur was precisely the OpenALPR-EU dataset, where no images are used for training even under the traditional-split protocol (see Table 3). We kept this division for three main reasons: (i) to better evaluate the recognition models on European LPs; (ii) to maintain consistency with previous works (Masood et al., 2017; Laroca et al., 2021b; Silva and Jung, 2022), which also used all images from that dataset for testing; and (iii) to analyze how the models perform with more training data from other datasets, which in this case corresponds to the leave-one-dataset-out protocol, since all images from the other datasets – and not just the training set ones – are used for training. Although some studies have shown that the performance on the test set of a particular dataset often decreases when the training data is augmented with data from other datasets (Torralba and Efros, 2011; Khosla et al., 2012), the recognition rates reached on the OpenALPR-EU dataset were higher with more training data from other datasets. In the same way, CR-NET performed better on the EnglishLP dataset when using all images from the OpenALPR-EU dataset for training (both datasets contain images of European LPs).

The average recognition rate across all datasets decreased from 82.4% under the traditional-split protocol to 74.5% under the leave-one-dataset-out protocol. This drastic performance drop is accentuated by the poor results achieved on the EnglishLP and AOLP datasets under the leave-one-dataset-out protocol. For instance, the average recognition rate of 90.8% obtained on the AOLP dataset under the traditional-split protocol drops to 62.7% under the leave-one-dataset-out protocol. These results caught us by surprise, as both datasets have been considered relatively simple due to the fact that recent works have reported recognition rates close to 97% for the EnglishLP dataset and above 99% for the AOLP dataset (Henry et al., 2020; Laroca et al., 2021b; Silva and Jung, 2022; Wang et al., 2022). According to our analysis, most of the recognition errors under the leave-one-dataset-out protocol occurred due to differences in the fonts of the LP characters in the training and test images, as well as because of specific patterns on the LP (e.g., a coat of arms between the LP characters or a straight line under them). To better illustrate, Figure 5 shows three LPs from the AOLP dataset where the TRBA model, which performed best on that dataset, recognized at least one character incorrectly under the leave-one-dataset-out protocol but not under the traditional split. Such analysis highlights the importance of performing cross-dataset experiments in the ALPR context.

The second observation is that, regardless of the evaluation protocol adopted, no recognition model
Table 4: Recall rates obtained by YOLOv4 in the LP detection stage.
Approach | Caltech Cars (#46) | EnglishLP (#102) | UCSD-Stills (#60) | ChineseLP (#161) | AOLP (#687) | OpenALPR-EU (#108) | SSIG-SegPlate (#804) | UFPR-ALPR (#1,800) | RodoSol-ALPR (#8,000) | Average
YOLOv4 (traditional-split) 100.0% 100.0% 100.0% 100.0% 100.0% 100.0% 99.9% 99.1% 100.0% 99.9%
YOLOv4 (leave-one-dataset-out) 100.0% 100.0% 100.0% 100.0% 99.9% 99.1% 100.0% 96.8% 99.6% 99.5%
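A detection counts as correct for the recall rates above when its Intersection over Union (IoU) with the ground-truth box is at least 0.5, as stated in Section 4.4.3. The computation can be sketched as follows; this is a minimal illustration with our own box format and helper names, not the authors' evaluation code:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def detection_recall(detections, ground_truths, thresh=0.5):
    """Fraction of ground-truth LPs matched by at least one detection with IoU >= thresh."""
    hits = sum(1 for gt in ground_truths
               if any(iou(det, gt) >= thresh for det in detections))
    return hits / len(ground_truths)
```

A ground-truth plate loosely overlapped by a detection (e.g., IoU of 0.81 after a one-pixel shift on a 10 × 10 box) still counts as recalled, which is exactly why the recognition models must later cope with imperfectly cropped regions.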
Table 5: Recognition rates obtained by all models under the traditional-split protocol, which assesses the ability of the models
to perform well in seen scenarios. Each model (rows) was trained once on the union of the training set images from all datasets
and evaluated on the respective test sets (columns). The best recognition rate achieved in each dataset is shown in bold.
Approach | Caltech Cars (#46) | EnglishLP (#102) | UCSD-Stills (#60) | ChineseLP (#161) | AOLP (#687) | OpenALPR-EU (#108) | SSIG-SegPlate (#804) | UFPR-ALPR (#1,800) | RodoSol-ALPR (#8,000) | Average
CR-NET (Silva and Jung, 2020) 95.7% 92.2% 100.0% 96.9% 97.7% 97.2% 97.1% 78.3% 55.8%‡ 90.1%
CRNN (Shi et al., 2017) 87.0% 81.4% 88.3% 88.2% 87.6% 89.8% 93.4% 64.9% 48.2% 81.0%
Fast-OCR (Laroca et al., 2021a) 93.5% 81.4% 95.0% 85.1% 95.8% 91.7% 87.1% 65.9% 49.7%‡ 82.8%
GRCNN (Wang and Hu, 2017) 93.5% 87.3% 91.7% 84.5% 85.9% 87.0% 94.3% 63.3% 48.4% 81.7%
Holistic-CNN (Špaňhel et al., 2017) 89.1% 68.6% 88.3% 90.7% 86.3% 78.7% 94.8% 70.3% 49.0% 79.5%
Multi-task (Gonçalves et al., 2019) 87.0% 62.7% 85.0% 86.3% 84.7% 66.7% 93.0% 65.3% 49.1% 75.5%
R²AM (Lee and Osindero, 2016) 84.8% 70.6% 81.7% 87.0% 83.1% 63.9% 92.0% 66.9% 48.6% 75.4%
RARE (Shi et al., 2016) 91.3% 84.3% 90.0% 95.7% 93.4% 91.7% 93.7% 69.0% 51.6% 84.5%
Rosetta (Borisyuk et al., 2018) 87.0% 75.5% 81.7% 90.1% 83.7% 81.5% 94.3% 63.9% 48.7% 78.5%
STAR-Net (Liu et al., 2016) 95.7% 93.1% 96.7% 96.9% 96.8% 95.4% 96.1% 70.9% 51.8% 88.2%
TRBA (Baek et al., 2019) 91.3% 87.3% 96.7% 96.9% 99.0% 93.5% 97.3% 72.9% 59.6% 88.3%
ViTSTR-Base (Atienza, 2021) 84.8% 80.4% 90.0% 99.4% 95.6% 84.3% 96.1% 73.3% 49.3% 83.7%
Average 90.0% 80.4% 90.4% 91.5% 90.8% 85.1% 94.1% 68.7% 50.8% 82.4%
‡ Images from the RodoSol-ALPR dataset were not used for training the CR-NET and Fast-OCR models, as each character’s bounding box needs to be labeled for training them (as detailed in Section 4.1).
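The recognition rates in Tables 5 and 6 count an LP as correct only when every character matches the ground truth, as described in Section 4.4.3. A minimal sketch of this all-or-nothing metric follows; the helper names are ours, and the `remap_chinese` function is only an assumed illustration of the single-class '*' treatment of Chinese characters, not the authors' implementation:

```python
def recognition_rate(predictions, ground_truths):
    """Fraction of LPs whose predicted string matches the ground truth exactly;
    a single wrong character makes the whole plate count as an error."""
    assert len(predictions) == len(ground_truths)
    correct = sum(pred == gt for pred, gt in zip(predictions, ground_truths))
    return correct / len(ground_truths)

def remap_chinese(plate):
    """Collapse every character outside digits/English letters (e.g., the Chinese
    province character) into the single class '*', mirroring Section 4.4.3."""
    return ''.join(ch if ch.isascii() and ch.isalnum() else '*' for ch in plate)
```

Under this metric a seven-character plate with six correct characters scores exactly the same as one with none, which explains why small per-character error rates translate into large drops in the reported plate-level rates.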
Table 6: Recognition rates obtained by all models under the leave-one-dataset-out protocol, which assesses the generalization
performance of the models by testing them on the test set of an unseen dataset. For each dataset (columns), we trained the
recognition models (rows) on all images from the other datasets. The best recognition rates achieved are shown in bold.
Approach | Caltech Cars (#46) | EnglishLP (#102) | UCSD-Stills (#60) | ChineseLP (#161) | AOLP (#687) | OpenALPR-EU (#108) | SSIG-SegPlate (#804) | UFPR-ALPR (#1,800) | RodoSol-ALPR (#8,000) | Average
CR-NET (Silva and Jung, 2020) 93.5% 96.1% 96.7% 88.2% 76.9% 96.3% 94.7% 61.8% 45.4% 83.3%
CRNN (Shi et al., 2017) 91.3% 62.7% 75.0% 76.4% 59.4% 88.0% 91.3% 61.7% 38.8% 71.6%
Fast-OCR (Laroca et al., 2021a) 93.5% 91.2% 95.0% 90.1% 77.0% 94.4% 91.2% 53.2% 47.8% 81.5%
GRCNN (Wang and Hu, 2017) 95.7% 65.7% 90.0% 80.7% 53.9% 88.9% 90.3% 60.8% 39.8% 74.0%
Holistic-CNN (Špaňhel et al., 2017) 80.4% 40.2% 73.3% 81.4% 59.7% 83.3% 93.4% 61.8% 33.4% 67.4%
Multi-task (Gonçalves et al., 2019) 82.6% 34.3% 66.7% 77.6% 50.8% 79.6% 89.9% 57.9% 44.8% 64.9%
R²AM (Lee and Osindero, 2016) 89.1% 52.9% 66.7% 74.5% 52.5% 80.6% 93.5% 57.9% 40.7% 67.6%
RARE (Shi et al., 2016) 84.8% 50.0% 85.0% 88.8% 62.9% 91.7% 93.5% 71.3% 40.1% 74.2%
Rosetta (Borisyuk et al., 2018) 89.1% 63.7% 68.3% 83.2% 51.1% 81.5% 94.4% 61.8% 42.5% 70.6%
STAR-Net (Liu et al., 2016) 89.1% 80.4% 91.7% 95.0% 79.3% 93.5% 94.0% 69.1% 43.6% 81.8%
TRBA (Baek et al., 2019) 95.7% 66.7% 93.3% 95.0% 70.0% 92.6% 96.9% 73.2% 42.6% 80.7%
ViTSTR-Base (Atienza, 2021) 89.1% 58.8% 90.0% 95.0% 59.2% 89.8% 97.9% 69.6% 41.7% 76.8%
Average 89.5% 63.6% 82.6% 85.5% 62.7% 88.3% 93.4% 63.3% 41.8% 74.5%
Average (traditional-split protocol) 90.0% 80.4% 90.4% 91.5% 90.8% 85.1%† 94.1% 68.7% 50.8% 82.4%
Sighthound (Masood et al., 2017) 87.0% 94.1% 90.0% 84.5% 79.6% 94.4% 79.2% 52.6% 51.0% 79.2%
OpenALPR (OpenALPR API, 2021)∗ 95.7% 99.0% 96.7% 93.8% 81.1% 99.1% 91.4% 87.8% 70.0% 90.5%
† Under the traditional-split protocol, no images from the OpenALPR-EU dataset were used for training. This is the protocol commonly adopted in the literature (Laroca et al., 2021b; Silva and Jung, 2022).
∗ OpenALPR contains specialized solutions for LPs from different regions and the user must enter the correct region before using its API. Hence, it was expected to achieve better results than the other methods.
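The leave-one-dataset-out folds evaluated in Table 6 can be expressed as a loop over datasets: train on all images (training, validation and test splits) of every dataset except the held-out one, then evaluate only on the held-out test split. The sketch below is schematic, assuming hypothetical per-dataset split dictionaries rather than the authors' actual data loading:

```python
def leave_one_dataset_out(datasets):
    """Yield (held_out_name, train_images, test_images) folds.

    `datasets` maps a dataset name to a dict with "train"/"val"/"test" image
    lists. For each fold, the model sees ALL images of every other dataset,
    and is evaluated only on the test split of the held-out dataset.
    """
    for held_out in datasets:
        train = [img
                 for name, splits in datasets.items() if name != held_out
                 for split_imgs in splits.values()
                 for img in split_imgs]
        yield held_out, train, datasets[held_out]["test"]
```

Note the asymmetry with the traditional split: the unseen dataset contributes nothing to training, while the other datasets contribute even their test images, exactly as described in Section 4.4.2.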
performed experiments on. For instance, although the CR-NET model obtained the best average recognition rates, corroborating the state-of-the-art results reported recently in (Laroca et al., 2021b; Silva and Jung, 2022), it did not reach the best results in the ChineseLP, AOLP, SSIG-SegPlate and RodoSol-ALPR datasets under either protocol. These results emphasize the importance of carrying out experiments on multiple datasets, with LPs from different countries/regions, especially under the leave-one-dataset-out protocol, because six different models obtained the best result in at least one dataset.

LODO: 7615BG    LODO: PG379I    LODO: 0X7655
Trad.: 7615RG   Trad.: P63791   Trad.: DX7655

Figure 5: The predictions obtained by TRBA on three images of the AOLP dataset. In general, the errors (outlined in red) under the leave-one-dataset-out (LODO) protocol did not occur in challenging cases (e.g., blurry or tilted images); therefore, they were probably caused by differences in the training and test images. Trad.: traditional-split protocol.

The third observation is that the RodoSol-ALPR dataset proved very challenging, since all the recognition models trained by us, as well as both commercial systems, failed to reach recognition rates above 70% on its test set images. The main reason for such disappointing results is the large number of motorcycle images, which are very challenging in nature (as discussed in Section 2). For example, OpenALPR correctly recognized 3,772 of the 4,000 cars in the test set (94.3%) and only 1,827 of the 4,000 motorcycles (45.7%). These results accentuate the importance of the proposed dataset for the accurate evaluation of ALPR systems, as it avoids bias in the assessments by having the same number of "easy" (cars with single-row LPs) and "difficult" (motorcycles with two-row LPs) samples.

We also did not rule out challenging images when selecting the images for the creation of the dataset. Figure 6 shows some of these images along with the predictions returned by TRBA (traditional-split) and OpenALPR, which were the model trained by us and the commercial system that performed best on this dataset. The results are in line with what was recently stated by Zhang et al. (2021): recognizing LPs in complex environments is still far from satisfactory.

TRBA: HLPA594       TRBA: PPY6026       TRBA: QRE4E6Z
OpenALPR: HLP4594   OpenALPR: PPY6026   OpenALPR: QRE4E62
GT: HLP4594         GT: PPY6C26         GT: QRE4E62

TRBA: QRG6D57       TRBA: OOM8060       TRBA: MRO3095
OpenALPR: -----     OpenALPR: OOM8060   OpenALPR: MRO3095
GT: QRG6D57         GT: ODM8060         GT: MRU3095

Figure 6: Some LP images from RodoSol-ALPR along with the predictions returned by TRBA and OpenALPR. Note that one character may become very similar to another due to factors such as blur, low/high exposure, rotations and occlusions. For correctness, we checked whether the ground truth (GT) matched the vehicle make and model in the National Traffic Department of Brazil (DENATRAN) database.

Lastly, it is important to highlight the number of experiments we carried out for this work. We trained each of the 12 chosen OCR models 10 times: once following the split protocols traditionally adopted in the literature (see Table 5) and nine times for the leave-one-dataset-out evaluation (see Table 6). We remark that a single training process of some models (e.g., TRBA and ViTSTR-Base) took several days to complete on an NVIDIA Quadro RTX 8000 GPU. In fact, we believe that this large number of necessary experiments is precisely why a leave-one-dataset-out evaluation had not yet been performed in the literature.

6 CONCLUSIONS

As the performance of traditional-split LP recognition is rapidly improving, researchers should pay more attention to cross-dataset LP recognition, since it better simulates real-world ALPR applications, where new cameras are regularly installed in new locations without existing systems being retrained every time.

As a first step in that direction, in this work we evaluated 12 OCR models for LP recognition on nine public datasets with great variety in several aspects (e.g., acquisition settings, image resolution, and LP layouts). We adopted a traditional-split versus leave-one-dataset-out experimental setup to empirically assess the cross-dataset generalization of the chosen models. It is noteworthy that we are not aware of any work in the ALPR context where so many methods were implemented and compared or where so many datasets were explored in the experiments.

As expected, the experimental results showed significant drops in performance for most datasets when training and testing the recognition models in a leave-one-dataset-out fashion. The fact that very low recognition rates (around 63%) were reported on the EnglishLP and AOLP datasets underscores the importance of carrying out cross-dataset experiments, as very high recognition rates (above 95% and 99%, respectively) are frequently achieved on these datasets under the traditional-split protocol.

The importance of exploring various datasets in the evaluation was also demonstrated, as no model performed better than the others in all experiments. It was quite unexpected for us that six different models reached the best result in at least one dataset under the leave-one-dataset-out protocol. In this sense, we draw attention to the fact that most works in the literature used three or fewer datasets in their experiments, although this has been gradually changing in recent years (Selmi et al., 2020; Laroca et al., 2021b).

We also introduced a publicly available dataset for ALPR that, to the best of our knowledge, is the first to contain images of vehicles with Mercosur LPs. We expect it will assist in developing new approaches for this LP layout and in the fair comparison between methods proposed in different works. Additionally, the proposed dataset includes 10,000 motorcycle images, being by far the largest in this regard. RodoSol-ALPR proved challenging in our experiments, as both the models trained by us and two commercial systems reached recognition rates below 70% on its test set.

As future work, we plan to gather images from the internet to build a novel dataset for end-to-end ALPR with images acquired in various countries/regions, by many different cameras, both static and mobile, with a well-defined evaluation protocol for both intra- and cross-dataset LP detection and LP recognition. In addition, we intend to leverage the potential of Generative Adversarial Networks (GANs) to generate hundreds of thousands of synthetic LP images with different transformations and balanced character classes in order to improve the generalization ability of deep models. Finally, we would like to carry out more experiments to quantify the influence of each dataset, especially RodoSol-ALPR, on the performance of the models under the leave-one-dataset-out protocol.
ACKNOWLEDGMENTS

This work was supported in part by the Coordination for the Improvement of Higher Education Personnel (CAPES) (Social Demand Program), and in part by the National Council for Scientific and Technological Development (CNPq) (Grant 308879/2020-1). The Quadro RTX 8000 GPU used for this research was donated by the NVIDIA Corporation. We also thank the Rodovia do Sol (RodoSol) concessionaire, particularly the information technology (IT) manager Marciano Calvi Ferri, for providing the images for the creation of the RodoSol-ALPR dataset.

REFERENCES

Ashraf, A., Khan, S. S., Bhagwat, N., and Taati, B. (2018). Learning to unlearn: Building immunity to dataset bias in medical imaging studies. In Machine Learning for Health Workshop at NeurIPS 2018, pages 1–5.

Atienza, R. (2021). Vision transformer for fast and efficient scene text recognition. In International Conference on Document Analysis and Recognition, pages 319–334.

Baek, J. et al. (2019). What is wrong with scene text recognition model comparisons? Dataset and model analysis. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 4714–4722.

Beratoğlu, M. S. and Töreyin, B. U. (2021). Vehicle license plate detector in compressed domain. IEEE Access, 9:95087–95096.

Bochkovskiy, A., Wang, C.-Y., and Liao, H.-Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint, arXiv:2004.10934:1–14.

Borisyuk, F., Gordo, A., and Sivakumar, V. (2018). Rosetta: Large scale system for text detection and recognition in images. In ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 71–79.

Buslaev, A. et al. (2020). Albumentations: Fast and flexible image augmentations. Information, 11(2):125.

Dlagnekov, L. and Belongie, S. (2005). UCSD dataset. https://www.belongielab.org/car_data.html.

Estevam, V., Laroca, R., Pedrini, H., and Menotti, D. (2021). Tell me what you see: A zero-shot action recognition method based on natural language descriptions. arXiv preprint, arXiv:2112.09976:1–15.

Forbes (2021). Poor auto financials likely as sales sag, but forecasts point to strong turnaround. https://www.forbes.com/sites/neilwinton/2021/10/10/poor-auto-financials-likely-as-sales-sag-but-forecasts-point-to-strong-turnaround/.

Gonçalves, G. R., da Silva, S. P. G., Menotti, D., and Schwartz, W. R. (2016). Benchmark for license plate character segmentation. Journal of Electronic Imaging, 25(5):053034.

Gonçalves, G. R., Diniz, M. A., Laroca, R., Menotti, D., and Schwartz, W. R. (2019). Multi-task learning for low-resolution license plate recognition. In Iberoamerican Congress on Pattern Recognition, pages 251–261.

Gonçalves, G. R. et al. (2018). Real-time automatic license plate recognition through deep multi-task networks. In Conference on Graphics, Patterns and Images (SIBGRAPI), pages 110–117.

Hasnat, A. and Nakib, A. (2021). Robust license plate signatures matching based on multi-task learning approach. Neurocomputing, 440:58–71.

Henry, C., Ahn, S. Y., and Lee, S.-W. (2020). Multinational license plate recognition using generalized character sequence detection. IEEE Access, 8:35185–35199.

Hsu, G., Zeng, S., Chiu, C., and Chung, S. (2015). A comparison study on motorcycle license plate detection. In IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pages 1–6.

Hsu, G. S., Chen, J. C., and Chung, Y. Z. (2013). Application-oriented license plate recognition. IEEE Transactions on Vehicular Technology, 62(2):552–561.

IHS Markit (2021). 2022 global light vehicle production outlook intact. https://ihsmarkit.com/research-analysis/2022-global-light-vehicle-production-outlook.html.

Izidio, D. M. F. et al. (2020). An embedded automatic license plate recognition system using deep learning. Design Automation for Embedded Systems, 24:23–43.

Kessentini, Y., Besbes, M. D., Ammar, S., and Chabbouh, A. (2019). A two-stage deep neural network for multi-norm license plate detection and recognition. Expert Systems with Applications, 136:159–170.

Khosla, A., Zhou, T., Malisiewicz, T., Efros, A. A., and Torralba, A. (2012). Undoing the damage of dataset bias. In European Conference on Computer Vision (ECCV), pages 158–171.

Laroca, R., Araujo, A. B., Zanlorensi, L. A., de Almeida, E. C., and Menotti, D. (2021a). Towards image-based automatic meter reading in unconstrained scenarios: A robust and efficient approach. IEEE Access, 9:67569–67584.

Laroca, R., Barroso, V., Diniz, M. A., Gonçalves, G. R., Schwartz, W. R., and Menotti, D. (2019). Convolutional neural networks for automatic meter reading. Journal of Electronic Imaging, 28(1):013023.

Laroca, R. et al. (2018). A robust real-time automatic license plate recognition based on the YOLO detector. In International Joint Conference on Neural Networks (IJCNN), pages 1–10.

Laroca, R., Zanlorensi, L. A., Gonçalves, G. R., Todt, E., Schwartz, W. R., and Menotti, D. (2021b). An efficient and layout-independent automatic license plate recognition system based on the YOLO detector. IET Intelligent Transport Systems, 15(4):483–503.

Lee, C. and Osindero, S. (2016). Recursive recurrent nets with attention modeling for OCR in the wild. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2231–2239.

Li, H., Wang, P., and Shen, C. (2019). Toward end-to-end car license plate detection and recognition with deep neural networks. IEEE Transactions on Intelligent Transportation Systems, 20(3):1126–1136.

Liang, J. et al. (2021). EGSANet: edge-guided sparse attention network for improving license plate detection in the wild. Applied Intelligence, 52(4):4458–4472.

Liu, W., Chen, C., Wong, K.-Y. K., Su, Z., and Han, J. (2016). STAR-Net: A spatial attention residue network for scene text recognition. In British Machine Vision Conference (BMVC), pages 1–13.

Lubna, Mufti, N., and Shah, S. A. A. (2021). Automatic number plate recognition: A detailed survey of relevant algorithms. Sensors, 21(9):3028.

Masood, S. Z. et al. (2017). License plate detection and recognition using deeply learned convolutional neural networks. arXiv preprint, arXiv:1703.07330.

Meng, S., Zhang, Z., and Wan, Y. (2020). Accelerating automatic license plate detection in the wild. In IEEE Joint International Information Technology and Artificial Intelligence Conference, pages 742–746.

Oliveira, I. O. et al. (2021). Vehicle-Rear: A new dataset to explore feature fusion for vehicle identification using convolutional neural networks. IEEE Access, 9:101065–101077.

OpenALPR API (2021). http://www.openalpr.com/.

OpenALPR Inc. (2016). OpenALPR-EU dataset. https://github.com/openalpr/benchmarks/tree/master/endtoend/eu.

Panahi, R. and Gholampour, I. (2017). Accurate detection and recognition of dirty vehicle plate numbers for high-speed applications. IEEE Transactions on Intelligent Transportation Systems, 18(4):767–779.

Presidência da República (1997). LEI Nº 9.503, DE 23 DE SETEMBRO DE 1997 - Código de Trânsito Brasileiro. http://www.planalto.gov.br/ccivil_03/leis/l9503compilado.htm.

Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788.

RodoSol (2022). Concessionária Rodovia do Sol S/A. https://www.rodosol.com.br/blog/conheca-a-rodosol-2. Accessed: 2022-02-06.

Selmi, Z., Halima, M. B., Pal, U., and Alimi, M. A. (2020). DELP-DAR system for license plate detection and recognition. Pattern Recognition Letters, 129:213–223.

Shi, B., Bai, X., and Yao, C. (2017). An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11):2298–2304.

Shi, B., Wang, X., Lyu, P., Yao, C., and Bai, X. (2016). Robust scene text recognition with automatic rectification. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4168–4176.

Silva, S. M. and Jung, C. R. (2020). Real-time license plate detection and recognition using deep convolutional neural networks. Journal of Visual Communication and Image Representation, 71:102773.

Silva, S. M. and Jung, C. R. (2022). A flexible approach for automatic license plate recognition in unconstrained scenarios. IEEE Transactions on Intelligent Transportation Systems, 23(6):5693–5703.

Silvano, G. et al. (2021). Synthetic image generation for training deep learning-based automated license plate recognition systems on the Brazilian Mercosur standard. Design Automation for Embedded Systems, 25(2):113–133.

Špaňhel, J. et al. (2017). Holistic recognition of low quality license plates by CNN using track annotated data. In IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 1–6.

Srebrić, V. (2003). EnglishLP database. https://www.zemris.fer.hr/projects/LicensePlates/english/baza_slika.zip.

Tommasi, T., Patricia, N., Caputo, B., and Tuytelaars, T. (2017). A deeper look at dataset bias. In Domain Adaptation in Computer Vision Applications, pages 37–55. Springer.

Torralba, A. and Efros, A. A. (2011). Unbiased look at dataset bias. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1521–1528.

Wang, J. and Hu, X. (2017). Gated recurrent convolution neural network for OCR. In Annual Conference on Neural Information Processing Systems (NeurIPS).

Wang, Y. et al. (2022). Rethinking and designing a high-performing automatic license plate recognition approach. IEEE Transactions on Intelligent Transportation Systems, 23(7):8868–8880.

Weber, M. (1999). Caltech Cars dataset. https://data.caltech.edu/records/20084.

Weihong, W. and Jiaoyang, T. (2020). Research on license plate recognition algorithms based on deep learning in complex environment. IEEE Access, 8:91661–91675.

Xiang, H., Zhao, Y., Yuan, Y., Zhang, G., and Hu, X. (2019). Lightweight fully convolutional network for license plate detection. Optik, 178:1185–1194.

Xie, L., Ahmad, T., Jin, L., Liu, Y., and Zhang, S. (2018). A new CNN-based method for multi-directional car license plate detection. IEEE Transactions on Intelligent Transportation Systems, 19(2):507–517.

Xu, Z. et al. (2018). Towards end-to-end license plate detection and recognition: A large dataset and baseline. In European Conference on Computer Vision (ECCV), pages 261–277.

Zhang, J., Li, W., Ogunbona, P., and Xu, D. (2019). Recent advances in transfer learning for cross-dataset visual recognition: A problem-oriented perspective. ACM Computing Surveys, 52(1):1–38.

Zhang, L., Wang, P., Li, H., Li, Z., Shen, C., and Zhang, Y. (2021). A robust attentional framework for license plate recognition in the wild. IEEE Transactions on Intelligent Transportation Systems, 22(11):6967–6976.

Zhou, W., Li, H., Lu, Y., and Tian, Q. (2012). Principal visual word discovery for automatic license plate
plate detection and recognition using deep convolu- detection. IEEE Transactions on Image Processing,
tional neural networks. Journal of Visual Communi- 21(9):4269–4279.
cation and Image Representation, page 102773. Zou, Y., Zhang, Y., Yan, J., Jiang, X., Huang, T., Fan, H.,
Silva, S. M. and Jung, C. R. (2022). A flexible approach for and Cui, Z. (2020). A robust license plate recognition
automatic license plate recognition in unconstrained model based on Bi-LSTM. IEEE Access, 8:211630.