Design of A Recognition System Automatic
International Journal of Computer Applications (0975 – 8887)
Volume 177 – No.3, November 2017

ABSTRACT
The present work is a study on the practical application of deep learning to the development of a system for the automatic recognition of vehicle license plates. These systems, commonly referred to as ALPR (Automatic License Plate Recognition), are able to recognize the content of vehicle license plates from images captured by a camera. The system proposed in this work is based on an image classifier developed through supervised learning techniques with a convolution neural network. These networks are a deep learning architecture specifically designed to solve artificial vision problems, such as pattern recognition and image classification. This paper also examines the basic image processing and segmentation techniques (such as smoothing filters and contour detection) that the proposed system needs in order to extract the contents of the license plates for further analysis and classification. The paper demonstrates the feasibility of an ALPR system based on a convolution neural network, and notes the critical importance of designing a network architecture and a training data set appropriate to the problem to be solved.

General Terms
Deep Learning, TensorFlow, Python

Keywords
Convolution Neural Network, Deep Learning, ALPR

1. INTRODUCTION
1.1 Artificial vision systems
Artificial vision is a branch of artificial intelligence whose purpose is to design computer systems capable of "understanding" the elements and characteristics of a scene or image of the real world. These systems allow extracting information, both numerical and symbolic, from the recognition of objects and structures present in the image. Artificial vision is closely related to image processing and pattern recognition. The former is used to facilitate the localization and detection of areas of interest in the images; the latter is used to identify and classify the detected objects and structures according to their characteristics.

1.2 ALPR systems
ALPR (Automatic License Plate Recognition) systems are a particular case of artificial vision systems. These systems are designed to "read" the content of vehicle license plates from images captured by a camera, and are mainly used in monitoring and control devices. In these systems, the recognition of the images of digits and letters can be implemented through different machine learning techniques, the most common being neural networks of the Multilayer Perceptron (MLP) type and Support Vector Machines (SVM). Some systems even choose to use commercial Optical Character Recognition (OCR) software for this purpose.

1.3 Classification of images
The recognition or classification of images consists of assigning to an image a label from a defined set of categories, based on its characteristics. Despite seeming a relatively trivial problem from our perspective, it is one of the most important challenges facing artificial vision systems. Factors such as scale, lighting, deformations or partial occlusion of objects make image classification a complex task, to which a great deal of effort has been devoted in order to develop sophisticated pattern recognition techniques, which do not always produce the expected results. From the point of view of machine learning, image classification is a supervised learning problem, in which the classifier algorithms generate a model from a dataset of previously categorized images. The model obtained is later used to classify new images.

Fig.1: Digital image expressed as an array (pixel values are fictitious)

For these algorithms, images are three-dimensional matrices whose dimensions are the width, height and color depth. The content of each position of the matrix is a numerical value representing the intensity of the corresponding pixel in the digital image.

Until the popularization of convolution neural networks (CNN), classifier systems based on support vector machines (SVM) were the ones that presented the best results in image recognition and classification problems. The main drawback of these systems is that they require a prior process of extracting the relevant features of the images, almost always designed specifically for the problem to be solved, for which sophisticated object and pattern detection techniques are used (histograms of oriented gradients (HOG), SIFT descriptors, etc.). The SVM classifier algorithm is trained on the features extracted from a subset of the training dataset, so the efficiency of the generated model depends on how representative they are. The most negative aspect of these systems is that they do not really learn the characteristics or attributes of each category, as these are predefined in the extraction step. In addition, these systems are extremely sensitive to variations in scale, lighting, perspective, etc.

1.4 Artificial neural networks
Artificial neural networks are a machine learning paradigm inspired by the functioning of the biological brain. These networks are composed of interconnected neurons that collaborate to produce an output from the input data of the network. Each artificial neuron, or perceptron, is a processing unit that receives a series of input signals, each multiplied by a given weight (synaptic weight). The neuron calculates the sum of the products of each input by its corresponding weight, to which a correction factor or bias is usually added, and applies the resulting value to an activation function that produces one output value or another depending on whether the sum exceeds a certain threshold.

[...] necessary to develop attribute selection and extraction processes. The learning algorithm of these networks allows extracting the attributes or characteristics of each class from a set of previously classified training data. These attributes are the weights of the different neurons in the network, and their values are calculated iteratively by a supervised learning method called back propagation, or "backward error propagation".

In broad outline, the algorithm consists of two stages that are repeated iteratively for each element of the training set:
• In the first, the class to which the input example belongs is calculated according to the current values of the network weights. Once it is classified, the algorithm determines the validity of that classification by means of an error (or cost) function that calculates how good or bad it is, comparing it with the class to which the training example introduced into the network actually belongs.
• Once the error is known, the second stage of the algorithm propagates it back to all neurons in the network that contributed to the classification of the example, each receiving the "portion" of the error that corresponds to its contribution, so that they update their weights proportionally and the new values reduce the classification error.

The gradient descent method allows calculating the value at which a minimum of the error function is obtained. To calculate this value, the algorithm descends iteratively along the slope (gradient) from the current value, advancing in each iteration by a step of a specific length, the learning rate. The main drawback of the neural networks described in this section is the difficulty posed by the total interconnection between layers of neurons as the dimensionality of the input data increases. For example, a color image of 300x300 pixels requires an input layer with 270,000 values (300x300x3), and hence 270,000 weights for every neuron fully connected to it; if multiple intermediate layers are added to the network, the number of weights and biases needed grows exorbitantly, increasing the computational cost and the risk of overtraining the network (overfitting).
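The growth in the number of parameters can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the hidden layer of 1,000 neurons and the convolution layer of 32 filters of 5x5 are hypothetical values chosen for the comparison, not figures taken from the proposed system, and the helper names `dense_params` and `conv_params` are likewise invented here.

```python
def dense_params(n_inputs: int, n_neurons: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_neurons + n_neurons


def conv_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    """Weights plus biases of a convolution layer whose filters are shared
    across all positions of the input."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# Input of the example in the text: a 300x300 color image (3 channels).
n_inputs = 300 * 300 * 3

# Hypothetical first hidden layer with 1,000 fully connected neurons.
fully_connected = dense_params(n_inputs, 1000)

# Hypothetical convolution layer: 32 filters of 5x5 over the 3 color channels.
convolution = conv_params(5, 5, 3, 32)

print(n_inputs)         # 270000
print(fully_connected)  # 270001000
print(convolution)      # 2432
```

Under these assumptions a single fully connected layer already needs about 270 million parameters, while the convolution layer needs a few thousand, because its weights do not depend on the size of the image.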
[...] image or input layer, sliding from left to right and from top to bottom until it reaches the end. The size of the window and the length of its displacement (stride) are parameters to be determined according to the characteristics of the problem. As the window moves, the corresponding neurons in the convolution layer calculate the product between the matrix of shared weights and the pixel values of the region to which they are connected, add the bias, and send the result of this operation to a given activation function, usually the ReLU (Rectified Linear Unit) function, which returns the value max(0, x). The result of these calculations is a series of activation maps that allow detecting the presence of a certain feature in the input image. This feature is defined by the shared weight matrix and the bias value; the combination of both is called a filter or kernel, and its application to a region of pixels is called convolution. As can be seen in the previous image, convolution layers usually contain several filters (weight matrices) in order to detect more than one feature in the image. Convolution layers are usually accompanied by a pooling layer, which condenses the information collected by keeping only the maximum values of each area of the convolution layer, which reduces the dimensions of the input volume for the next layer. The typical architecture of a CNN is a succession of convolution and pooling layers, the last layer usually being a Fully Connected network that is responsible for calculating the scores obtained by the input image for each of the classes or categories defined in the problem. The learning algorithm of these networks is the back propagation algorithm already seen, with gradient descent or other similar methods. In these networks, however, training aims for the different layers of neurons to learn the filters, or low- and high-level features (lines, edges, etc.), that represent each class or category of image in the problem. CNN networks are today the "state of the art" in the field of artificial vision, where they have demonstrated greater effectiveness than other techniques, due to three key advantages of their architecture:

[...] system will execute the segmentation stage, which consists of detecting the possible digits and letters present on the license plate. The code module that performs this process carries out a segmentation based on contour detection, using functions of the popular OpenCV computer vision library. The character recognition stage is performed by a CNN previously trained with a data set of computer-generated digits and letters in different typefaces and styles. The output of the system is a string composed of the characters detected in the image of the input plate.

3. DESCRIPTION OF THE DATA

3.1 Training data set
The effectiveness of a CNN-based image classification system depends both on the architecture and settings of the network and on the data set used for its training, which must be sufficiently broad and representative of the problem to be solved. In order to train the neural network of the proposed system, the Chars74K data set created by de Campos, Babu and Varma [3] has been used. This dataset has over 74,000 images of digits and letters in PNG format, organized in three collections: hand-drawn characters, characters taken from photographs of everyday scenes, and computer-generated characters. This last collection has more than 60,000 grayscale images of digits and letters rendered in different fonts and styles (normal, bold and italic), a priori suitable for the CNN to "learn" to recognize the patterns of the different digits and letters of a license plate.
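As an illustration of the convolution, ReLU and max-pooling operations described earlier for the CNN layers, the following pure-Python sketch applies one filter to a small fictitious image. It is not the code of the proposed system, which relies on TensorFlow; the function names, the 5x5 image and the vertical-edge kernel are assumptions made for this example.

```python
def relu(x):
    """Rectified Linear Unit: returns max(0, x)."""
    return max(0.0, x)


def convolve2d(image, kernel, bias=0.0, stride=1):
    """Slide the kernel over the image (left to right, top to bottom) and
    return the activation map. As is usual in CNNs, the kernel is applied
    without flipping it."""
    kh, kw = len(kernel), len(kernel[0])
    activation_map = []
    for r in range(0, len(image) - kh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            total = bias
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(relu(total))
        activation_map.append(row)
    return activation_map


def max_pool(feature_map, size=2):
    """Keep only the maximum value of each size x size area."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[r + i][c + j]
                           for i in range(size) for j in range(size)))
        pooled.append(row)
    return pooled


# Fictitious 5x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

activation = convolve2d(image, kernel)
pooled = max_pool(activation)

print(activation)  # [[3.0, 3.0, 0.0], [3.0, 3.0, 0.0], [3.0, 3.0, 0.0]]
print(pooled)      # [[3.0]]
```

The activation map is high where the window covers the edge and zero where the region is uniform, and pooling reduces the 3x3 map to a single value that records that the feature was found.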
[...] license plates have been obtained using the free web tools platesmania.com [5] and acme.com [6].

The first allows creating license plates in the current national format: two groups of characters consisting of a four-digit number, from 0000 to 9999, and three letters, starting with BBB and ending with ZZZ, from which the five vowels and the letters Ñ, Q, CH and LL are excluded. The acme.com web tool, in turn, can generate license plates in the formats of the different states of the United States, but without any restrictions on the combinations of letters and numbers entered.

[...] with vectors and matrices, which is fundamental when working with images and artificial neural networks.

4.1.2 OpenCV
OpenCV is an artificial vision library originally developed by Intel and now released under a BSD license, which has more than five hundred algorithms optimized to perform the main tasks of computer vision, such as image processing, feature detection and object recognition. The library also includes various machine learning algorithms, such as support vector machines (SVM), Naïve Bayes and KNN, among others. It is written in C++, is multiplatform and has interfaces for languages such as Java and Python. The large number of available algorithms, its execution speed and its extensive user community make OpenCV an indispensable tool for developing artificial vision systems based on open source software.
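The national plate format described earlier in this section (four digits followed by three letters, without vowels or Q) can be written as a regular expression. The sketch below is an assumption about how the string returned by the system could be checked afterwards; the source does not describe such a validation step, and the names `ALLOWED_LETTERS`, `PLATE_PATTERN` and `is_valid_plate` are hypothetical.

```python
import re

# Assumed letter set: the 26-letter alphabet minus the five vowels and Q.
# Ñ, CH and LL are excluded implicitly, since they are not single A-Z letters.
ALLOWED_LETTERS = "BCDFGHJKLMNPRSTVWXYZ"

# Four digits (0000 to 9999) followed by three allowed letters (BBB to ZZZ).
PLATE_PATTERN = re.compile("[0-9]{4}[" + ALLOWED_LETTERS + "]{3}")


def is_valid_plate(text: str) -> bool:
    """Return True if text matches the plate format, ignoring spaces."""
    return PLATE_PATTERN.fullmatch(text.replace(" ", "")) is not None


# Number of distinct plates that the format allows.
total_plates = 10 ** 4 * len(ALLOWED_LETTERS) ** 3

print(is_valid_plate("1234 BBB"))  # True
print(is_valid_plate("1234ABC"))   # False
print(total_plates)                # 80000000
```

With 20 admissible letters, the format yields 10,000 x 8,000 = 80,000,000 combinations.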
[...] soft, in which small details are lost, similar to what occurs in blurred pictures. This transformation is performed by the GaussianBlur function of OpenCV.

4.2.3 Thresholding
Thresholding, or the "threshold technique", is a method that converts an image to black and white by setting a threshold value: all pixels that exceed it are transformed into one binary color (white or black) and the rest into the opposite one. In the case of vehicle license plates, this transformation creates images in which the outlines of the characters are clearly defined, facilitating the segmentation or isolation of the areas containing them.

In the proposed system this transformation is performed by the adaptiveThreshold function of OpenCV, in which the threshold value is calculated for different regions of the image, providing good results even when the illumination varies across the image.

4.2.4 Edge Detection
Contours or edges are curves joining adjacent pixels that have the same color or intensity. These curves make it possible to locate the borders of the objects in the image, and their detection is essential for an artificial vision system to recognize or detect shapes in an image. In the proposed system this detection is the last stage of the pre-processing of license plate images, and is carried out by the findContours and boundingRect functions of OpenCV.

4.3 NEURAL NETWORK ARCHITECTURE
The main component of the ALPR system presented in this work is its convolution neural network (CNN). It is responsible for classifying the characters of the license plates from the images extracted by the segmentation module, for which a prior training process is necessary so that the network learns the characteristics that define the different letters and digits that may be present on a license plate.

[...] parameters to the needs of the problem addressed in this paper. This design also follows the general strategy proposed by Simard et al. [9] for the visual analysis of documents with convolution networks, which consists of extracting simple features of the characters in the first layers of the network and later turning them into complex features through the combination of the filters of the successive convolution layers.

5. IMPLEMENTATION
5.1 Data processing
The pre-processing operations carried out in the system are described below, both those of the character segmentation module and those performed while loading the image set used to train the neural network.

5.1.1 Pre-processing input images
The image processing techniques described above (grayscale conversion, Gaussian blur, thresholding and contour detection) are applied to the license plates entering the system in order to isolate the regions that may contain a digit or letter of the plate. Once these "regions of interest" (ROI) are identified, the Python module that manages the segmentation process performs a series of checks to determine whether their content is in fact a digit or a letter to be analyzed and classified by the neural network. These checks basically verify whether the height, the width and the aspect ratio (i.e. the ratio between width and height) of the region of interest are within certain values. The minimum and maximum values of these parameters are determined by the proportions that characters should have on a standard license plate of approximately 230x50 pixels, this being the resolution to which all images entering the system are scaled. If the proportions of a given region are valid, i.e. they are within the limits, the next step is to extract its contents and save them as an image of 32x32 pixels that can be analyzed by the CNN of the system.

5.1.2 Pre-processing training data set
To train the CNN it is necessary to import into a data matrix the subset of images of the Chars74K dataset that is representative of the problem. This subset is composed of grayscale images of computer-generated characters corresponding to the digits 0-9 and the uppercase letters A-Z. The big advantage of using a neural network as the classification algorithm is its capacity for automatic learning, thanks to which it is not necessary to develop attribute extraction processes or to use dimensionality reduction techniques such as PCA (Principal Component Analysis) or the like, which greatly simplifies the pre-processing of the training data.
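The geometric checks on the regions of interest described in Section 5.1.1 can be sketched as follows. This is a minimal sketch, not the authors' module: the limits in `RoiLimits` are hypothetical values chosen for a plate scaled to about 230x50 pixels, boxes are assumed to be `(x, y, w, h)` tuples such as those returned by OpenCV's boundingRect, and sorting the accepted boxes from left to right is an added assumption about how the reading order of the characters is recovered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoiLimits:
    """Hypothetical size limits, in pixels, for a character on a 230x50 plate."""
    min_width: int = 8
    max_width: int = 40
    min_height: int = 25
    max_height: int = 48
    min_aspect: float = 0.2   # aspect ratio = width / height
    max_aspect: float = 1.0


def is_character_candidate(width, height, limits=RoiLimits()):
    """Check whether width, height and aspect ratio are within the limits."""
    if width <= 0 or height <= 0:
        return False
    aspect = width / height
    return (limits.min_width <= width <= limits.max_width
            and limits.min_height <= height <= limits.max_height
            and limits.min_aspect <= aspect <= limits.max_aspect)


def select_character_boxes(boxes, limits=RoiLimits()):
    """Keep the boxes that pass the checks, ordered from left to right."""
    valid = [b for b in boxes if is_character_candidate(b[2], b[3], limits)]
    return sorted(valid, key=lambda b: b[0])


# Fictitious bounding boxes (x, y, w, h): two characters, the plate frame
# and a small speck of noise.
boxes = [(40, 8, 22, 35), (0, 0, 230, 50), (50, 20, 3, 4), (10, 8, 20, 36)]
candidates = select_character_boxes(boxes)

print(candidates)  # [(10, 8, 20, 36), (40, 8, 22, 35)]
```

Each accepted box would then be cropped from the plate image and resized to 32x32 pixels before being passed to the network.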
5.2.1 Initialization
Before running the training process, it is necessary to generate and initialize the network architecture with TensorFlow. Briefly, the code module declares a graph of operations that contains both the variables that store the training data, weight matrices, biases and outputs of the network, and the calls to the framework primitives that define the error function, the convolution layers (with window, stride, filters and activation functions), the pooling layers, the FC layer, the softmax output and the call that updates the weights by back propagation and gradient descent. The next figure shows the graph of the system's CNN exactly as it is represented by the TensorBoard visualization tool included in the framework. Besides declaring the structure of the network, the graph of operations also indicates how the network should be initialized.

5.2.2 Training
Once the network is declared and initialized, the next step is to import the set of training images preprocessed by the code module. Before starting the training, this set of images is divided randomly into three subsets called training, validation and test: the first contains a total of 28,012 images, the second 3,113 and the third 3,459. The training set is used to determine the network weights that allow the images to be classified with the lowest possible error; the validation set, to check whether overtraining (overfitting) occurs during training; and the test set, to estimate the accuracy of the system when classifying images that have not been analyzed during the training iterations. The section of code responsible for the training process performs a total of 4,500 iterations, selecting in each one a batch of 300 images from the training set to optimize the network weights with the GradientDescentOptimizer function of TensorFlow and a learning rate of 0.0001. This means that by the end of the process each image has been analyzed about 50 times (4,500 x 300 / 28,012 ≈ 48).

5.3 Evaluation
To evaluate the effectiveness and efficiency of the learning process of the network, the evolution of the overall accuracy and error in classifying the images of the training and validation subsets is monitored during the iterative training process. Tracking the accuracy makes it possible to detect whether overtraining (overfitting) of the network is occurring, i.e., whether the weights and biases are adjusting excessively to the peculiarities of the images of the training subset, which may leave the trained network unable to correctly classify images other than those analyzed during learning. The next figure shows the evolution of the accuracy in both the training set and the validation set over the 4,500 iterations.

Fig.7: Evolution of accuracy during training

As can be seen, the distance between them is very small, which a priori is a good indicator that overfitting has not occurred. The high accuracy obtained in both subsets can also be observed: about 99% for training and 98% for validation. For the test image set, the accuracy obtained with the settings defined in the preceding section is 97.83%. Regarding the error function, the following chart shows its evolution over the 4,500 iterations. It shows a very pronounced fall during the first 500 iterations and a much gentler but constant one for the rest, indicating that in principle the learning rate chosen for the gradient descent method seems appropriate.

Fig.8: Evolution of the error function during training

In addition to tracking the above figures, the assessment of the effectiveness of the learning process is completed by running the trained network on a sample of digits and letters extracted from synthetic and real license plate images.

5.4 OPTIMIZATION
The characteristics of the network architecture (layers, filters, window size, etc.), as well as the learning process parameters (number of iterations, batch size, learning rate, etc.); they
IJCATM : www.ijcaonline.org 54