Research Article

CNN-RNN based method for license plate recognition

ISSN 2468-2322
Received on 20th July 2018
Accepted on 21st July 2018
doi: 10.1049/trit.2018.1015
www.ietdl.org
Abstract: Achieving good recognition results for license plates is challenging due to multiple adverse factors. For instance, in Malaysia, private vehicles (e.g., cars) have number plates with a dark background, while public vehicles (taxis/cabs) have number plates with a white background. To reduce the complexity of the problem, we propose to classify the above two types of images so that one can choose an appropriate method to achieve better results. Therefore, in this work, we explore the combination of Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), namely BLSTM (Bi-Directional Long Short-Term Memory), for recognition. The CNN is used for feature extraction as it has high discriminative ability, while the BLSTM is able to extract context information based on past information. For classification, we propose Dense Cluster based Voting (DCV), which separates foreground and background for successful classification of private and public plates. Experimental results on live data given by MIMOS, which is funded by the Malaysian Government, and on the standard UCSD dataset show that the proposed classification outperforms the existing methods. In addition, the recognition results show that recognition performance improves significantly after classification compared with before classification.
Fig. 2 Intensity distribution of foreground and background of normal and taxi plate images
a Canny images of normal and taxi plate images
b Histogram for grey of the foreground of the normal plate
c Histogram for grey of the background of the normal plate
d Histogram for grey of the foreground of taxi plate
e Histogram for grey of the background of taxi plate

Fig. 4 Min and Max clusters for foreground and background images of normal and taxi license plate images
a Min and Max clusters for the foreground of normal plate image
b Min and Max clusters for the background of the normal image
c Min and Max clusters for the foreground of taxi image
d Min and Max clusters for the background of taxi image
cluster for normal plates. This results in response ‘1’. In this way, the proposed method derives three hypotheses and finds the responses. If the hypotheses give two responses as ‘1’ out of three, the image is identified as a normal plate, else as a taxi plate.

The foreground and background separation is illustrated in Fig. 3, where Figs. 3a and b of private and public images provide edges without losing shapes. Then the proposed method extracts the intensity values in the grey image G corresponding to foreground and background pixels, say GF and GB, as defined in (1) and (2) for private and public plate images, as shown in Figs. 3c and d, respectively, where we notice there are colour changes in the foreground and background of private and public plate images:

$$GF_{x,y} = \begin{cases} G_{x,y}, & \text{if } \mathrm{Canny}(x,y) = 1 \\ 0, & \text{else} \end{cases} \quad (1)$$

$$GB_{x,y} = \begin{cases} G_{x,y}, & \text{if } \mathrm{Canny}(x,y) = 0 \\ 0, & \text{else} \end{cases} \quad (2)$$

The proposed method applies K-means clustering with K = 2 on the intensity values of the foreground and background of private and public plate images to classify pixels with high intensity values into the Max cluster and pixels with low intensity values into the Min cluster, as shown in Figs. 4a–d, respectively. It is noted from Figs. 4a and b that the number of pixels classified into the Min cluster is higher than that of the Max cluster. Though the Max cluster gets high values, the number of pixels in that cluster is lower than the number of pixels in the Min cluster. Therefore, the number of pixels in a cluster is considered as the weight, which is multiplied by the standard deviation as defined in (3). On the other hand, it is noted from Figs. 4c and d that the number of pixels classified into the Min cluster is lower than that of the Max cluster.
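The separation, clustering, and voting steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming OpenCV and scikit-learn; the Canny thresholds, the file name, and the example hypothesis responses are placeholders rather than the paper's actual settings, and the weight follows the cluster-size times standard-deviation description of (3).

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def separate_foreground_background(gray):
    """Eqs. (1)-(2): split grey intensities using the Canny edge mask.
    Only the selected intensities matter for clustering, so the zero
    entries of GF/GB are dropped rather than stored."""
    edges = cv2.Canny(gray, 100, 200)      # thresholds are illustrative
    gf = gray[edges > 0].astype(float)     # GF: edge (foreground) pixels
    gb = gray[edges == 0].astype(float)    # GB: non-edge (background) pixels
    return gf, gb

def min_max_cluster_weights(values):
    """K-means with K = 2; weight = cluster size x standard deviation,
    in the spirit of Eq. (3)."""
    km = KMeans(n_clusters=2, n_init=10).fit(values.reshape(-1, 1))
    max_id = int(np.argmax(km.cluster_centers_.ravel()))  # larger centre = Max cluster
    weights = {}
    for name, cid in (("min", 1 - max_id), ("max", max_id)):
        members = values[km.labels_ == cid]
        weights[name] = members.size * members.std()
    return weights

gray = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
gf, gb = separate_foreground_background(gray)
fg_w, bg_w = min_max_cluster_weights(gf), min_max_cluster_weights(gb)

# Two-out-of-three vote over the hypothesis responses described above
responses = [1, 1, 0]                                   # illustrative responses
plate_type = "normal" if sum(responses) >= 2 else "taxi"
```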
The proposed architecture of license plate recognition is shown in Fig. 5, where we can see there are three steps, namely, CNN for feature extraction, LSTM for extracting the sequence, and connectionist temporal classification (CTC) for the final recognition.

Fig. 6 CNN is used for feature extraction
The architecture for CNN is derived from visual geometry group
(VGG) (very deep architecture) as shown in Fig. 5, where the
proposed method constructs the components of convolutional
layers by considering convolutional and max-pooling layers from
the basic VGG [15]. Then the components are used to extract
features that represent the input image. Before passing the input
image to the network, all the images are scaled to a standard size,
namely, 32 × 100 × 3, as shown in Fig. 6.

Fig. 7 Feature extraction using CNN
a Input license plate images
b Feature maps of CNN forward propagation

For the input image in Fig. 7a, the proposed method generates a feature map, as shown in Fig. 7b, from which the sequence of features given by the convolutional layers is extracted. The output of this process is considered as the input for the recurrent layers. In other words, the proposed method scans the feature map from left to right, column by column, to generate the feature vectors of a feature sequence. Therefore, each column of the feature map corresponds to a rectangular region of the input image. The proposed method arranges the rectangular regions in order from left to right with respect to columns.
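To make the scanning step concrete, here is a minimal PyTorch sketch (an assumed framework; the paper does not publish code) of a VGG-style stack that maps a 32 × 100 × 3 plate image to a feature map of height 1 and then reads it out column by column. The channel counts and pooling shapes are illustrative, not the exact configuration of [15].

```python
import torch
import torch.nn as nn

# A small VGG-style feature extractor for the paper's 32 x 100 x 3 input.
cnn = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),    # 32x100 -> 16x50
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),  # 16x50 -> 8x25
    nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d((2, 1), (2, 1)),                                     # 8x25 -> 4x25
    nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d((4, 1), (4, 1)),                                     # 4x25 -> 1x25
)

x = torch.randn(1, 3, 32, 100)           # a batch of scaled plate images
fmap = cnn(x)                            # (1, 512, 1, 25) feature map
seq = fmap.squeeze(2).permute(2, 0, 1)   # (25, 1, 512): one 512-d vector per column,
                                         # ordered left to right
```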
It is noted that the traditional RNN unit suffers from gradient vanishing and so does not have the ability to extract enough context to predict the next state [14]. To find a solution to this issue, the long short-term memory (LSTM) has been developed, which is a special type of RNN unit. RNNs are special neural networks which exploit past contextual information, which in turn helps in obtaining more stable sequence recognition than treating each feature independently, as defined in (8). LSTM is illustrated in Fig. 8a; it comprises a memory cell and three multiplicative gates, namely, the input, output and forget gates. In general, the memory cell is used to store the past context information, and the input and output gates are used to retain the context for a long period of time. Similarly, the forget gate is used to clear the context information in the cell. To strengthen the feature extraction and selection, the proposed method combines both a forward and a backward LSTM, i.e. BLSTM.
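Continuing the sketch above, the column-wise feature sequence can be passed to a bidirectional LSTM and trained with a CTC loss, mirroring the three-stage design. The hidden size, alphabet size, and label indices below are illustrative assumptions, not the paper's settings.

```python
# Recurrent and transcription stages, continuing from `seq` above.
num_classes = 37                          # e.g. 0-9, A-Z plus the CTC blank (assumed)
blstm = nn.LSTM(512, 256, num_layers=2, bidirectional=True)
proj = nn.Linear(2 * 256, num_classes)    # per-column class scores
ctc_loss = nn.CTCLoss(blank=0)

out, _ = blstm(seq)                       # (25, 1, 512): both directions concatenated
logits = proj(out)                        # (25, 1, num_classes)
log_probs = logits.log_softmax(2)         # CTC expects log-probabilities

# Training step against a ground-truth label sequence (indices are illustrative)
targets = torch.tensor([[23, 34, 35, 1, 2, 3, 4]])
loss = ctc_loss(log_probs, targets,
                input_lengths=torch.tensor([25]),
                target_lengths=torch.tensor([7]))
```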
The padding type ‘Valid’ means no padding; ‘SAME’ means that the output feature map has the same spatial dimensions as the input feature map, with zero padding introduced equally on every side of the input map to make the shapes match as needed.
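For example, in PyTorch (one possible framework, used here only for illustration) the two modes behave as follows:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 100)                        # N x C x H x W
# 'valid': no padding, so a 3 x 3 kernel shrinks each spatial dim by 2
y_valid = nn.Conv2d(3, 64, 3, padding="valid")(x)     # -> (1, 64, 30, 98)
# 'same': zeros are added equally around the input so that, at stride 1,
# the output keeps the input's spatial dimensions
y_same = nn.Conv2d(3, 64, 3, padding="same")(x)       # -> (1, 64, 32, 100)
```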
4 Experimental results
These methods consider that scene texts are unpredictable and suffer from distortions caused by multiple factors, like the public images in this work. Similarly, the methods consider caption texts to have good clarity and contrast, which is the case for private images compared to public images. For recognition, we implement one state-of-the-art method [2], which proposes strokelets for the recognition of texts in natural scene images. Since license plate recognition is almost the same as natural scene text recognition, we use the above existing methods for comparative study in this work. In addition, according to [2], the method is robust to pose changes caused by distortion and is generic. The method [2] uses a random forest classifier for recognition. In order to test the features extracted by the method, we pass the same features to a CNN for recognition, which is considered as one more existing method for the comparative study. We also use the system which is available online [3] for license plate recognition. This system explores deep learning, as the proposed method does, for recognition. In summary, we choose one state-of-the-art method from natural scene text recognition and one more from license plate recognition to show that existing natural scene text and license plate recognition methods may not work well for license plate images affected by multiple adverse factors.

Fig. 10 Sample successful and unsuccessful images of the proposed method
a Sample successful images
b Sample unsuccessful images

Table 2 Confusion matrix of the proposed method and existing methods (%)

            Proposed method     Xu et al. [5]      Roy et al. [7]
            Private   Public    Private   Public   Private   Public
private      77.54     22.46     57.6      42.4     53.5      46.5
public       19.78     80.22     33.69     66.31    31.98     68.02

Bold values indicate the highest results achieved by the respective methods.
The combination of CNN for feature extraction and BLSTM makes sense for tackling the challenges posed by adverse factors. This is the advantage of the proposed recognition compared to existing methods.

Quantitative results of the proposed recognition and existing methods are reported in Table 3. It is observed from Table 3 that the proposed recognition scores the best for UCSD and for both before and after classification on the MIMOS dataset compared to the three existing methods. The HLPR system is the second best across all the experiments. Interestingly, when we look at the results before classification and after classification, all the methods, including the proposed method, report a low recognition rate prior to classification and a high recognition rate after classification. The difference between before and after is significant. This is due to the advantage of classifying the MIMOS data into private and public, which helps us to reduce the complexity. Because of classification, we can tune or modify parameters for tackling the issues with the dataset. In this experiment, we tune the key parameters of CNN and BLSTM based on training samples with respect to classes. Therefore, we can conclude that the recognition performance improves significantly when we classify complex data into sub-classes according to complexity. In other words, rather than developing a new method, this is one way of achieving a high recognition rate for complex data. It is also observed from Table 3 that the proposed recognition scores the best recognition rate for the UCSD dataset, which shows that the method can be used for other datasets irrespective of scripts. The main reason for the poor results of the existing methods is the same as discussed earlier.

When the images are oriented in different directions, the proposed recognition method does not work well, as shown by the sample results in Fig. 12. This is not a problem of feature extraction but a limitation of CTC, which gets confused when an image is rotated in different directions. However, it is noted that license plate images seldom exhibit orientations as varied as texts in natural scene images. Therefore, this issue may not be a major drawback of the proposed recognition method.

5 Conclusion and future work

In this work, we have proposed the combination of CNN and RNN for recognising license plate images through classification. For the classification of private and public number plate images, the proposed Dense Cluster based Voting (DCV) separates foreground and background information.

6 Acknowledgment

7 References

[1] Du, S., Ibrahim, M., Shehata, M., et al.: ‘Automatic license plate recognition (ALPR): a state-of-the-art review’, IEEE Trans. Circuits Syst. Video Technol., 2013, 1, pp. 311–325
[2] Bai, X., Cong, Y., Wenyu, L.: ‘Strokelets: a learned multi-scale mid-level representation for scene text recognition’, IEEE Trans. Image Process., 2016, 25, pp. 2789–2802
[3] ‘High Accuracy Chinese Plate Recognition Framework’, https://github.com/zeusees/HyperLPR, accessed 18 May 2018
[4] Raghunandan, K.S., Shivakumara, P., Kumar, G.H., et al.: ‘New sharpness features for image type classification based on textual information’. Proc. Document Analysis Systems (DAS), Greece, 2016, pp. 204–209
[5] Xu, J., Shivakumara, P., Lu, T., et al.: ‘Graphics and scene text classification in video’. Proc. Int. Conf. on Pattern Recognition (ICPR), Sweden, 2014, pp. 4714–4719
[6] Shivakumara, P., Kumar, N.V., Guru, D.S., et al.: ‘Separation of graphics (superimposed) and scene text in video’. Proc. Document Analysis Systems (DAS), France, 2014, pp. 344–348
[7] Roy, S., Shivakumara, P., Pal, U., et al.: ‘New tampered features for scene and caption text classification in video frame’. Proc. Int. Conf. on Frontiers in Handwriting Recognition (ICFHR), China, 2016, pp. 36–41
[8] Yang, Y., Li, D., Duan, Z.: ‘Chinese vehicle license plate recognition using kernel-based extreme learning machine with deep convolutional features’, IET Intell. Transp. Syst., 2018, 12, pp. 213–219
[9] Abedin, M.Z., Nath, A.C., Dhar, P., et al.: ‘License plate recognition system based on contour properties and deep learning model’. Proc. Region 10 Humanitarian Technology Conf. (R10-HTC), Bangladesh, 2017, pp. 590–593
[10] Polishetty, R., Roopaei, M., Rad, P.: ‘A next generation secure cloud based deep learning license plate recognition for smart cities’. Proc. Int. Conf. on Machine Learning and Applications (ICMLA), USA, 2016, pp. 286–294
[11] Montazzolli, S., Jung, C.: ‘Real time Brazilian license plate detection and recognition using deep convolutional neural networks’. Proc. SIBGRAPI Conf. on Graphics, Patterns and Images, Brazil, 2017, pp. 55–62
[12] Bulan, O., Kozitsky, V., Ramesh, P., et al.: ‘Segmentation and annotation free license plate recognition with deep localization and failure identification’, IEEE Trans. Intell. Transp. Syst., 2017, 18, pp. 2351–2363
[13] Selmi, Z., Halima, M.B., Alimi, A.M.: ‘Deep learning system for automatic license plate detection and recognition’. Proc. Int. Conf. on Document Analysis and Recognition (ICDAR), Japan, 2017, pp. 1132–1137
[14] Li, H., Wang, P., You, M., et al.: ‘Reading car license plates using deep neural networks’, Image Vis. Comput., 2018, 72, pp. 14–23
[15] Shi, B., Xiang, B., Cong, Y.: ‘An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition’, IEEE Trans. Pattern Anal. Mach. Intell., 2017, 39, (11), pp. 2298–2304
[16] Shivakumara, P., Konwer, A., Bhowmick, A., et al.: ‘A new GVF arrow pattern for character segmentation from double line license plate images’. Proc. Asian Conf. on Pattern Recognition (ACPR), China, 2017, pp. 782–787
[17] Zamberletti, A., Gallo, I., Noce, L.: ‘Augmented text character proposal and convolutional neural networks for text spotting from scene images’. Proc. Asian Conf. on Pattern Recognition (ACPR), Malaysia, 2015