Automatic License Plate Recognition System
ITEGAM-JETIA
Manaus, v.10 n.48, p. 129-134. July/August., 2024.
DOI: https://doi.org/10.5935/jetia.v10i48.955
ISSN ONLINE: 2447-0228
Copyright ©2024 by authors and Galileo Institute of Technology and Education of the Amazon (ITEGAM). This work is licensed
under the Creative Commons Attribution International License (CC BY 4.0).
Support Vector Machine, RCNN and Cascade Color Space Transformation of Pixel Features, Cascaded Contrast-Color Haar-like Features, Cascaded Convolution Network. The Multi-Level Extended Local Binary Patterns and Extreme Learning Machine are also used to detect the number plate from vehicles.
I.2 SURVEY MOTIVATION
Although automatic number plate recognition systems are intended for outdoor usage, they struggle to find and identify license plates in constantly changing weather and environmental circumstances. The application of most current systems is constrained by factors such as shifting lighting conditions, snow or fog, day and night, camera shaking, rotations, and occlusions. Many methods function primarily in daylight and are sensitive to variations in light, yet real-world Automatic Number Plate Recognition systems must also deal with cars traveling at varied speeds. Production systems for automatic number plate recognition must also satisfy non-functional criteria including acquisition and operation costs, physical specifications, power needs, and connection restrictions.
I.3 ARTICLE STRUCTURE
A comprehensive overview of license plate recognition systems is provided in Section 1. The two basic strategies, multi-stage and single-stage ALPR for license plate recognition, are covered in Section 2. Principal components of multi-stage license plate identification, such as license plate detection and license plate recognition, are discussed in Sections 3 and 4. These sections each provide the relevant advantages, difficulties, restrictions, and suggested solutions. In Section 5, the different datasets used for the study are discussed with their respective parameters, advantages, and disadvantages. In Section 6, we briefly go over several evaluation parameters for the ALPR system. Section 7 addresses the challenges that must be overcome for optimal performance of the ALPR system. Finally, Section 8 concludes the study.
II. AUTOMATED LICENSE PLATE RECOGNITION APPROACHES
II.1 MULTI-STAGE LICENSE PLATE RECOGNITION SYSTEMS
License plate recognition approaches can be broadly classified as multi-stage and single-stage approaches. Multi-stage approaches mainly comprise three stages: license plate detection or extraction [1], license plate segmentation (extracting separate characters), and character recognition.
In the first stage, the license plate is detected by using an attention mechanism. A region proposal network (RPN) generates the rectangular object proposals for the subsequent processing, which helps to detect the number plate efficiently [1],[2],[5]. The cascaded CC-Haar-like detector, the cascaded CST-pixel detector, and the cascaded ConvNet detector make up a hybrid cascade for the detection of license plates with various resolutions [3]. A VertexNet architecture, composed of two cascaded CNNs, is used for license plate recognition [6]. In [7], the license plate detection stage is divided into two stages, namely a feature extraction stage and an ELM classification stage.
In the second stage, common techniques such as vertical or horizontal projection are used for segmentation and individual character extraction [1]. Some ALPR algorithms do not make use of segmentation techniques, so this is not a mandatory stage in the ALPR system.
In the last stage, character recognition algorithms such as OCR or neural networks are used. Every stage in the multi-stage approach is equally important for the overall performance of the ALPR system.
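As an illustration of the projection-based character segmentation mentioned in Section II.1, the following minimal Python sketch splits a binarized plate into character column ranges using a vertical projection. The nested-list image format, the function names, and the `min_pixels` threshold are assumptions made for this example and are not taken from the cited works.

```python
from typing import List, Tuple


def vertical_projection(binary: List[List[int]]) -> List[int]:
    """Count foreground pixels (value 1) in each column of a binary image."""
    if not binary:
        return []
    width = len(binary[0])
    return [sum(row[x] for row in binary) for x in range(width)]


def segment_columns(binary: List[List[int]], min_pixels: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (end exclusive) of candidate characters.

    A character is assumed to be a run of consecutive columns whose
    projection value is at least `min_pixels` (hypothetical threshold).
    """
    profile = vertical_projection(binary)
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count >= min_pixels and start is None:
            start = x                      # a character run begins
        elif count < min_pixels and start is not None:
            segments.append((start, x))    # the run ends at an empty column
            start = None
    if start is not None:                  # run touching the right border
        segments.append((start, len(profile)))
    return segments


# Toy 3 x 10 binarized plate with three separated "characters".
plate = [
    [0, 1, 1, 0, 0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0, 1, 1],
]
print(segment_columns(plate))  # [(1, 3), (5, 6), (8, 10)]
```

A horizontal projection works the same way on rows instead of columns; on real plates the threshold usually has to be tuned to tolerate noise and touching characters.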
One, Two and Three, ITEGAM-JETIA, Manaus, v.10 n.48, p. 129-134, July/August., 2024.
while down sampling the spatial dimension from 160 × 48 to 40 × 6. They use repeated ResSeparableConv blocks in the middle flow, keeping the spatial size and channel number constant, to extract deep features that contain higher-level representations. A middle-level feature map M of size 40 × 6 × 512 is extracted from the exit flow as the attention network's context, along with a final feature vector F of 512 dimensions. In the sequence decoder, LSTMs with two layers and 512 hidden states each are used [8]. A 2-layer attention mechanism is used, which reduces the need for segmenting each character in the license plate separately. A similar single-stage approach uses Inception v3 with three CNN layers and six layers of SSD 300 [9].
III.2 YOLO-v2
The deep learning model YOLOv2 (You Only Look Once version 2) is frequently used for license plate detection. Trained on datasets with annotated license plate areas, YOLOv2 is well suited to real-time object detection. When used on license plates, it recognizes and localizes them inside picture or video frames. The model predicts bounding boxes that give the precise locations of the license plates, which makes it a useful tool for many applications. A YOLOv2 [8] detector is used to obtain the bounding boxes of license plates.
III.3 VERTEXNET
VertexNet [6] is a well-performing one-stage detector with
a limited input size, a narrow channel of high-level layers, and
vertex estimation. Three components, the head, fusion, and
backbone networks, make up the proposed VertexNet [6].
VertexNet is built using small-resolution input, even with character
information lost, to achieve fast inference speed and reduce
memory use [6].
image is converted to a grayscale image, and processing speed is also considerably increased [11].
The license plate is resampled and rectified to a higher resolution (64 × 256) according to the vertices. This resampling step normalizes the location of characters in the LP. The rectification is achieved using a perspective transformation, which generates the bird's-eye view of the detected license plate [6]. Multi-level pre-processing approaches pass the detected license plate through multiple stages of pre-processing. A Gaussian filter and the CLAHE method are used for multi-level pre-processing in [7],[9]. In [7], G(σ) is the Gaussian filter with the standard deviation σ = 0.25, and CLAHE(σ) is defined as a contrast-limited adaptive histogram equalization method with a standard deviation σ = 0.01. A new image is produced at each stage of multi-level pre-processing, which eventually expands the training image set. Based on the results of the LP detection, the license plate images are cropped depending on the required ROI. These cropped images can be tilted vertically and horizontally. In [5], the focus is on the vertical tilt only. The image is binarized after correction of the horizontal tilt. The tilt level is judged by selecting specific pixels, starting from the first pixel value [8].
IV.2 CHARACTER SEGMENTATION
The Mask branch is used for instantly segmenting the LP characters, taking as input the region proposals output by the Faster R-CNN [1]. In [10], the rivet positions and white dots on the license plate are recognized to perform character segmentation.
IV.3 CHARACTER RECOGNITION
Character recognition is the last and therefore a very important step, as the evaluation of performance is based on it. Most character recognition techniques use variants of CNN [3]. Some use a single-stage license plate recognition approach with an end-to-end CNN architecture, while some multi-stage approaches use a CNN architecture in their license plate recognition stage [1],[4],[11],[12]. In the papers [1-4], the CNN variants VGG and VGG-16 are used. In [1], a multi-stage license plate recognition approach uses VGG to recognize the license plate. In [4], VGG-16 is used with some modifications to the architecture to obtain better results.
A 2-layer LSTM is used for recognition of license plates [8], which eliminates the need for individual character segmentation and hence gives better performance. Resampling and rectification of the LP are done according to the vertices obtained from VertexNet. The corrected LP picture is then sent to SCR-Net, which predicts the characters through a forward pass [6]. The number plate's alphanumeric characters are recognized using the Tesseract OCR engine, which is trained beforehand to increase its accuracy [9].
V. DATASET ANALYSIS
V.1 HZM MULTI-STYLE DATASET
The "HZM multi-style dataset" in [1] is a collection of automobile pictures taken from the Hong Kong-Zhuhai-Macao Bridge and is referred to as such because it contains several forms of license plates. In this proprietary dataset, 176 photos are used for testing and 1200 images are used to train the model, for a total of 1376 photographs taken from the Hong Kong-Zhuhai-Macao Bridge's entrance control system. The test photographs are taken at the toll gate. There are 280 LP photos altogether with a resolution of 1024 × 800 in the testing subset. Vehicles from Hong Kong, Mainland China, and Macao with a maximum of three license plates are included in the dataset.
V.2 AOLP DATASET
The 2049 license plate images in the AOLP [1],[6] collection are broken up into three categories: road patrol (RP), traffic law enforcement (LE), and access control (AC). The photos of the RP subgroup are taken at different distances and perspectives. The AOLP dataset is split into two subsets: files with filenames that begin with "1" are used as testing images (111 images), and the remaining photos are used as training pictures. The ratio between the groups for training and testing is around 4.5:1. LPs and characters are annotated using the AOLP dataset's ground truth data.
V.3 PKU DATASET
The PKU collection contains 3977 photos with Mainland China LPs. Because it solely provides the ground-truth file of LPs, this dataset is used to assess the efficacy of LP detection. Five groups (G1–G5) make up the PKU dataset [1], with G1 being the simplest and G5 being the most challenging. The work in [1] uses 810 photos from G1 as the training dataset and the other 4 groups as the testing datasets. PKU Data [6] is an LP detection dataset in which the characters in 2253 images are labelled. Those images are chosen from three subsets: G1 (daytime under normal conditions), G2 (daytime with sun glare), and G3 (nighttime).
V.4 FIELD TESTING DATASET
The field-testing dataset includes 12000 photos from the Transport Bureau of the Macao S.A.R. (DSAT) [1]. There are three main types of vehicles in the dataset: automobiles, trucks, and buses. The resolution of these images is 1024 × 800. These photos are split between the training and validation sets in a 4:1 ratio.
V.5 LPST-110K
The LPST-110K [2] dataset consists of pictures shot in open spaces. It is the first dataset to simultaneously handle LP and scene text for LP detection, and the first that provides text annotations in addition to a significant number of instances (LP and non-LP) per picture, even when those instances are taken from unconstrained scenes. The LPST-110K dataset compiles images from hundreds of dash cameras and security cameras installed in moving cars and structures, encompassing locations in East Asia and Europe. Along with the LPs, road signs, wallpaper text, banners, and commercial adverts are also included in the collection in [2] as non-LP scene texts. There are 9,795 photos and 110,000 scene text pieces in the LPST-110K collection. The scene texts contain 51,031 LP instances and 58,969 non-LP instances. The resolution of each image in the collection is 1280 (width) × 720 (height) × 3 (channels). In contrast to most other LP detection datasets, the photos in LPST-110K are compressed using the h264 codec setting.
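The split rules reported for the datasets in Section V can be sketched in a few lines of Python. The filename-prefix rule follows the AOLP description in Section V.2 and the ratio split follows the 4:1 split of the field-testing dataset in Section V.4; the function names, the example filenames, and the fixed random seed are assumptions made for illustration and do not come from the cited papers.

```python
import random
from typing import List, Sequence, Tuple


def split_by_prefix(filenames: Sequence[str], test_prefix: str = "1") -> Tuple[List[str], List[str]]:
    """Files whose name starts with `test_prefix` go to the test set, the rest to training."""
    test = [name for name in filenames if name.startswith(test_prefix)]
    train = [name for name in filenames if not name.startswith(test_prefix)]
    return train, test


def split_by_ratio(items: Sequence, first_parts: int = 4, second_parts: int = 1,
                   seed: int = 0) -> Tuple[list, list]:
    """Shuffle reproducibly and split into two groups with ratio first_parts:second_parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * first_parts // (first_parts + second_parts)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical filenames, only to show the prefix rule.
files = ["101.jpg", "205.jpg", "317.jpg", "120.jpg", "402.jpg"]
train_files, test_files = split_by_prefix(files)
print(train_files)  # ['205.jpg', '317.jpg', '402.jpg']
print(test_files)   # ['101.jpg', '120.jpg']

# 12000 photos split 4:1, as reported for the field-testing dataset.
train_ids, val_ids = split_by_ratio(range(12000), 4, 1)
print(len(train_ids), len(val_ids))  # 9600 2400
```

The same `split_by_ratio` helper covers the 7:3 splits reported for other datasets by passing `first_parts=7, second_parts=3`.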
V.6 VALID
Two automobile data recorders are used to record videos in 720 × 1280 resolution on the streets of a Chinese city. The collected dataset is known as the "Vehicle and License Plate Dataset" (VALID [4]). The dataset includes a total of 887 well-annotated images. The test set consists of 78 photos from a single recorder. 809 additional photos from another recorder are divided randomly in the ratio 7:3 into the training set and the validation set.
V.7 DETROIT
The DETROIT Dataset is a re-annotated subset of the Open Image Dataset (OID) that contains the classes "Car" and "Vehicle registration plate". For simplicity, it is called DETROIT (Dataset from Open Image Dataset) [4]. The size and aspect ratio of the DETROIT photos, which are downloaded from the Internet, can vary greatly. The test set consists of 386 images taken from the OID validation set. 1113 OID test photos are randomly split into a training set and a validation set in the ratio of 7:3.
V.8 DOC
To obtain DOC (Dataset from Cars), the location of the vehicle and the location of the license plate are combined. There are 105 photos overall in the dataset, of which 70% are chosen at random as the training-validation set while the remaining 30% are used as the test set. The size and aspect ratio of the DOC images, which are downloaded from the Internet, vary widely.
V.9 CLPD
There are a total of 1200 images in the CLPD [8] (China License Plate Dataset), which come from all 31 provinces on the mainland. It covers a wide range of photographic situations, vehicle types and regional codes, allowing for an in-depth analysis of current license plate recognition techniques while promoting the development of a more useful model. License plate images in the CLPD dataset are gathered from various kinds of real-scene image sources, such as web searches, images taken from smartphones, and driving recorder recordings of automobiles. The photography angles, shooting times, resolutions, and background are also taken into consideration when capturing LP images to account for the various conditions. Various vehicle types, including cars, trucks, police cars and new energy vehicles, are included in the CLPD dataset. The real-world dataset CLPD [6] contains a wide range of vehicle types, environments, and area codes.
V.10 CCPD
The Chinese City Parking Dataset (CCPD) [6] provides a large-scale and comprehensive license plate benchmark to evaluate Automated License Plate Recognition techniques under uncontrolled conditions. CCPD contains 280k vehicle images taken under uncontrolled conditions, such as diverse weather, lighting, rotation, and vagueness, which is two orders of magnitude more than other LP datasets. Each image has a 720 × 1160 resolution. The dataset provides sufficient annotations, including the LP characters, bounding box, four vertices, degree of tilt in both the horizontal and vertical axes, brightness, and vagueness levels. The model is trained by using 100k examples of CCPD-Base, and it is tested on the remaining 100k examples of CCPD-Base and the 80k examples of sub-datasets such as CCPD-DB, CCPD-FN, CCPD-Rotate, CCPD-Tilt, CCPD-Weather, and CCPD-Challenge.
VI. PERFORMANCE EVALUATION APPROACHES
Most ALPR systems use a loss function as the evaluation method [1],[2]. This loss function is calculated at each stage in the single-stage [4],[8],[9] as well as the multi-stage approach [1-3],[5],[6]. Finally, the results are aggregated to get the accuracy of the overall ALPR system. The accuracy is calculated as per the loss value at each stage [12].
In [2], the evaluation techniques used are Precision, Recall, F-measure and IoU.
VI.1 PRECISION
Precision is the ratio of the number of successfully identified bounding boxes to all acquired bounding box candidates [2],[5-7].
$$\mathrm{Precision} = \frac{T_p}{T_p + F_p} \qquad (1)$$
Where, $T_p$ = number of correctly estimated bounding boxes
$F_p$ = number of incorrectly estimated bounding boxes
VI.2 RECALL
The recall is the ratio of the correctly estimated bounding boxes among all the ground truths [2],[5],[7].
$$\mathrm{Recall} = \frac{T_p}{T_p + F_n} \qquad (2)$$
Where, $F_n$ = number of ground-truth bounding boxes that are not detected
VI.3 F-MEASURE
The F-measure is the benchmark for LP detection evaluation used with the PKU dataset [2],[7].
$$F\text{-measure} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3)$$
VI.4 IoU
A detected bounding box is deemed accurate when its IoU with the ground-truth region exceeds 50% (IoU > 0.5) [2],[3].
$$\mathrm{IoU} = \frac{\mathrm{area}(R_{det} \cap R_{gt})}{\mathrm{area}(R_{det} \cup R_{gt})} \qquad (4)$$
Where, $R_{det}$ = region of the detected bounding box
$R_{gt}$ = ground-truth region
AP (Average Precision) is another method used for the evaluation of ALPR systems [2],[4]. AP is calculated over IoU (Intersection over Union) [2].
The performance evaluation measure Success Ratio is used in [13]. Success Ratio is the ratio of the number of success samples to the total number of samples.
$$SR = \frac{NS_s}{TN_s} \times 100 \qquad (5)$$
Where, $SR$ = Success Ratio
$NS_s$ = Number of success samples
$TN_s$ = Total number of samples
The evaluation of classification accuracy in [9] is done using the formula below.
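Independently of the formula used in [9], the detection metrics in Equations (1) to (4) can be illustrated with a short Python sketch. IoU is computed here as the intersection area divided by the union area, in line with its name (Intersection over Union). The box format (x_min, y_min, x_max, y_max), the function names, and the example counts are assumptions made for illustration only.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(det: Box, gt: Box) -> float:
    """Intersection over Union of a detected box and a ground-truth box."""
    inter_w = max(0.0, min(det[2], gt[2]) - max(det[0], gt[0]))
    inter_h = max(0.0, min(det[3], gt[3]) - max(det[1], gt[1]))
    inter = inter_w * inter_h
    area_det = (det[2] - det[0]) * (det[3] - det[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_det + area_gt - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(det: Box, gt: Box, threshold: float = 0.5) -> bool:
    """A detection counts as correct when IoU exceeds the threshold (IoU > 0.5)."""
    return iou(det, gt) > threshold


def precision(tp: int, fp: int) -> float:
    """Equation (1): Tp / (Tp + Fp)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Equation (2): Tp / (Tp + Fn)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p: float, r: float) -> float:
    """Equation (3): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Two 10 x 10 boxes shifted by 2 pixels: intersection 80, union 120.
print(round(iou((0, 0, 10, 10), (2, 0, 12, 10)), 3))  # 0.667

# Hypothetical counts: 8 correct boxes, 2 false detections, 8 missed plates.
p, r = precision(8, 2), recall(8, 8)
print(p, r, round(f_measure(p, r), 3))  # 0.8 0.5 0.615
```

The zero-denominator guards are a design choice of this sketch so that an empty result set yields 0.0 instead of raising an exception.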
Conceptualization: Vishakha H. Jagtap, Rohit V. Dhotre, Utkarsh R. Khandare, Harshada N. Khuspe, Rohini B. Kokare.
Methodology: Vishakha H. Jagtap, Rohit V. Dhotre.
Investigation: Vishakha H. Jagtap, Rohit V. Dhotre, Utkarsh R. Khandare, Harshada N. Khuspe.
Discussion of results: Vishakha H. Jagtap, Rohit V. Dhotre, Utkarsh R. Khandare, Harshada N. Khuspe, Rohini B. Kokare.
Writing – Original Draft: Vishakha H. Jagtap, Rohit V. Dhotre, Utkarsh R. Khandare, Harshada N. Khuspe.
Writing – Review and Editing: Vishakha H. Jagtap.
Resources: Rohit V. Dhotre, Utkarsh R. Khandare, Harshada N. Khuspe.
Supervision: Rohini B. Kokare.
[11] Chenxu Duan, Shiqiang Luo, Design of License Plate Recognition System Based on OpenCV, 2022 15TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), 2022. https://doi.org/10.1109/ISCID56505.2022.00031
[12] Liang Wang, Yimei Huanga, Chengqun Lianga, Jinrong Zhoua, Taoqiang Zhua, License plate recognition system based on image recognition, 2022 IEEE 21ST INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING AND COMMUNICATIONS (IUCC/CIT/DSCI/SMARTCNS), 2022.
[13] Milan Samantaray, Anil Kumar Biswal, Debabrata Singh, Debabrata Samanta, Marimuthu Karuppiah, Niju P Joseph, Optical Character Recognition (OCR) based Vehicle’s License Plate Recognition System Using Python and OpenCV, 2021 5TH INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATION AND AEROSPACE TECHNOLOGY (ICECA), 2021. https://doi.org/10.1109/ICECA52323.2021.9676015