Developing Deep Learning Based Ethiopian Car’s License Plate Detection and
Recognition Model
MASTER’S THESIS
By
Joseph Wondwosen
February 2, 2020
Arba Minch, Ethiopia
ARBA MINCH UNIVERSITY
This is to certify that the thesis entitled: "Developing Deep Learning Based Ethiopian Cars License
Plate Detection and Recognition Model" and submitted in partial fulfillment of the requirements for
the degree of Master of Science in Computer Science, the graduate program of the Faculty of
Computing & Software Engineering, Arba Minch University, Arba Minch Institute of Technology,
has been carried out by Joseph Wondwosen, under my supervision.
Therefore, I recommend that the student has fulfilled the requirements and hence can hereby submit
the thesis to the faculty for defense.
Declaration
I hereby declare that this MSc thesis is my original work and has not been presented in any other
university, and that all sources of materials used for the thesis have been fully acknowledged.
Signature: _______________
Date: _______________
ARBA MINCH UNIVERSITY
We, the undersigned, members of the Board of Examiners of the final open defense by Joseph
Wondwosen, have read and evaluated his thesis entitled "Developing Deep Learning Based
Car's License Plate Detection and Recognition Model" and examined the candidate. This is,
therefore, to certify that the thesis has been accepted in partial fulfillment of the requirements of the
degree of Master of Science in Computer Science.
Approved By:
Acknowledgement
First of all, I would like to thank GOD for everything that has happened in my life. His love and
mercy have always been with me through all my ups and downs even though I don’t deserve any
of them.
I would also like to thank my thesis advisor Dr. Anusuya. She consistently allowed this paper to
be my own work, but steered me in the right direction whenever she thought I needed it.
Finally, I must express my very profound gratitude to my wife Mercy Mohammed, my father
Wondwosen Tefera, my mother Antonina Tefera, my sister Scarlet Wondwosen and my brother
Nathanael Wondwosen for providing me with unfailing support and continuous encouragement
throughout my years of study and through the process of researching and writing this thesis. This
accomplishment would not have been possible without them. Thank you.
Table of Contents
Acknowledgement ...................................................................................................................................... iv
List of Figures............................................................................................................................................ vii
List of Tables ............................................................................................................................................ viii
List of Equations ........................................................................................................................................ ix
List of Abbreviations and Acronyms ........................................................................................................ x
Abstract...................................................................................................................................................... xii
1. CHAPTER ONE: INTRODUCTION ............................................................................................... 1
1.1. Background ................................................................................................................................. 1
1.2. Statement of Problem ................................................................................................................. 2
1.3. Research Questions ..................................................................................................................... 3
1.4. Objectives..................................................................................................................................... 3
1.4.1. General Objective ............................................................................................................... 3
1.4.2. Specific Objectives .............................................................................................................. 3
1.5. Significance of the Study ............................................................................................................ 4
1.6. Scope and Limitation of the Study ............................................................................................ 4
1.7. Organization of the Thesis ......................................................................................................... 4
2. CHAPTER TWO: REVIEW OF RELATED LITERATURE .......................................................... 6
2.1. Introduction ................................................................................................................................. 6
2.2. Image Processing ......................................................................................................................... 6
2.2.1. Image Processing Techniques ............................................................................................ 7
2.2.2. Image Segmentation.......................................................................................................... 10
2.3. Deep Learning ........................................................................................................................... 12
2.3.1. Deep Learning Based Object Recognition (or Classification) ....................................... 13
2.3.2. Deep Learning Based Object Detection .......................................................................... 15
2.3.3. Deep Learning Platforms ................................................................................................. 17
2.4. License Plates ............................................................................................................................ 18
2.5. Related Work ............................................................................................................................ 20
2.5.1. Researches on Ethiopian License Plates ......................................................................... 20
2.5.2. Researches on Non-Ethiopian License Plates ................................................................. 20
2.5.3. Review Summary .............................................................................................................. 22
3. CHAPTER THREE: RESEARCH METHODOLOGY................................................................ 27
3.1. Introduction ............................................................................................................................... 27
3.2. Research Approach ................................................................................................................... 27
3.3. Process Design ........................................................................................................................... 28
3.4. Data Collection .......................................................................................................................... 29
3.5. Data Analysis ............................................................................................................................. 30
3.6. Tools ........................................................................................................................................... 31
3.6.1. Hardware Tools ................................................................................................................. 32
3.6.2. Software Tools ................................................................................................................... 32
3.6.3. Programming Languages and Platforms ........................................................................ 32
4. CHAPTER FOUR: RESEARCH DESIGN .................................................................................... 36
4.1. Introduction ............................................................................................................................... 36
4.2. Proposed Design ........................................................................................................................ 36
4.2.1. Image Data Acquisition .................................................................................................... 37
4.2.2. Cleaning ............................................................................................................................. 38
4.2.3. Augmentation .................................................................................................................... 38
4.2.4. Splitting .............................................................................................................................. 39
4.2.5. License Plate Detection ..................................................................................................... 39
4.2.6. Character Segmentation ................................................................................................... 43
4.2.7. Character Recognition...................................................................................................... 47
5. Chapter Five: Results and Discussion ............................................................................................. 50
5.1. Introduction ............................................................................................................................... 50
5.2. Object Detection ........................................................................................................................ 50
5.3. Character Segmentation ........................................................................................................... 54
5.4. Character Recognition.............................................................................................................. 56
6. Chapter Six: Conclusion and Future Work ................................................................................... 62
6.1. Conclusion ................................................................................................................................. 62
6.2. Future Work .............................................................................................................................. 63
Appendixes ................................................................................................................................................ 69
Appendix A: Sample Code for Detection Model’s Configuration .................................................... 69
Appendix B: Sample Code for Character Segmentation................................................................... 71
Appendix C: Sample Code for Character Recognition / Classification Model ............................... 75
List of Figures
List of Tables
Table 2-1: Ethiopian License Plate Classification Based on Service Code .................................. 19
Table 2-2: Ethiopian License Plate Classification Based on Regional Code ............................... 19
Table 2-3: Review Summary for paper [2] ................................................................................... 22
Table 2-4: Previous Works on Object Detection and Recognition Part 1 .................................... 23
Table 2-5: Previous Works on Object Detection and Recognition Part 2 .................................... 25
Table 3-1: Ethiopian Car's License Plate Color Properties Based on Their Code ....................... 29
Table 4-2: ResNet101 Network Architecture ............................................................................... 41
Table 4-3: Character Recognition Model's Architecture .............................................................. 49
Table 5-1: Comparison of Ethiopian License Plate's Detection Model with Some Related Works ... 54
Table 5-2: CR Model’s First Experiment Parameters................................................................... 56
Table 5-3: CR Model’s First Experiment Evaluation Results ...................................................... 56
Table 5-4: CR Model’s Second Experiment Evaluation Results ................................................. 57
Table 5-5: CR Model’s Third Experiment Evaluation Results ..................................................... 58
Table 5-6: CR Model’s Fourth Experiment Evaluation Results ................................................... 59
Table 5-7: CR Model’s Fifth Experiment Evaluation Results ...................................................... 60
Table 5-8: Summary Table for The Experiments ......................................................................... 61
List of Equations
List of Abbreviations and Acronyms
AP Average Precision
Conv Convolution
CR Character Recognition
FC Fully Connected
LP License Plate
MP Megapixel
ms Milliseconds
RNN Recurrent Neural Network
Abstract
Keywords: Amharic Characters, Deep Learning, Image Processing, License Plate, OpenCV-
Python, Tensorflow
1. CHAPTER ONE: INTRODUCTION
1.1. Background
In recent years, the number of motor vehicles has increased considerably, and this has, in turn,
exacerbated the traffic management burden. The resulting congestion has caused severe problems,
such as traffic accidents and the vulnerability of public spaces to crime or terrorist attacks [1].
As a result, various Intelligent Transportation Systems (ITSs) have emerged as viable solutions to those
problems. ITSs apply information processing and communication technologies to transport
infrastructure to improve transportation outcomes. One of these systems is License Plate
Recognition (LPR), the most popular and important element of ITS [2].
The main job of LPR systems is to detect the license plate apart from the rest of the objects found in
an image and to recognize its alphanumeric characters, which in the case of Ethiopian license plates
are Amharic characters, digits (ranging from 0 to 9) and, in most cases, English characters as well [2].
To the best of the researcher's knowledge, to date there has been one research effort conducted
specifically on Ethiopian license plates. It was attempted using conventional image processing
algorithms and was able to achieve an overall accuracy of 63.1%.
In this research, an effort has been made to detect and recognize Ethiopian license plates using
deep learning. The basic advantage of using deep learning is that, unlike the traditional template
matching algorithms of image processing, feature extraction is done by learning directly
from images, text or sound [3] [4] [5]. As a result, deep learning models can achieve state-of-the-art
accuracy, sometimes even outperforming domain experts in their respective fields [6].
LPR systems basically contain three phases or stages: Plate Detection, Character Segmentation
and Character Recognition.
Plate Detection is categorized under one of the broader typical tasks of computer vision, namely
Object Detection [7]. Object Detection is scanning or going through an image in search of a
specific object, and many algorithms have been used to date in fulfillment of this task. Before the
emergence of deep learning, different mathematical models based on some prior
knowledge, like the Hough Transform, Frame Difference, Background Subtraction, Optical
Flow, Sliding Window and Deformable Part methods, were used, and they are still used to some
extent; but currently, deep learning-based algorithms are showing state-of-the-art performance when
it comes to object detection [8]. In this research, an effort has been made to detect the license plate
from an image using a deep learning approach.
The second phase is Character Segmentation, which is performed on the detected (or extracted) plate
image. This phase aims at separating each character found on a detected plate and feeding the
characters individually to the next phase, to be recognized. There are many image processing
techniques and libraries used for image segmentation; this research used the OpenCV-Python
library for its image preprocessing and segmentation tasks.
The third and final phase is recognition, where each character segmented earlier gets
recognized or classified. There are many traditional image processing techniques, like Histograms
of Oriented Gradients, used to represent an image to be classified, but they need a domain
expert's knowledge and guidance for feature extraction. There is also the deep learning approach,
which automatically learns and extracts those features by itself, which in turn has a dramatic impact
on the performance of the classification task [9].
Overall, in building the deep learning models, a library called Tensorflow has been used, and for all
other image processing and segmentation tasks the research used OpenCV-Python.
1.2. Statement of Problem
It is known that the number of motor vehicles on the main roads of Ethiopian cities is increasing
considerably every year, which causes a high traffic congestion problem. The congestion
problem in turn leads to many other problems, like loss of life from traffic accidents, vehicle theft
and other security-related problems. Even though this problem is being addressed through
Intelligent Transportation Systems (ITSs) in most developed countries through research, in
Ethiopia there has been only one research effort, and it was attempted using conventional OCR-based
image processing.
In the case of License Plate Recognition, the image of a car whose plate is to be processed may have
an angle at which the plate is only partially visible, a low resolution, a varying distance from the camera
and low lighting conditions, which make it very hard (challenging) to process and recognize.
1.3. Research Questions
➢ RQ 1: Which model-building parameter set is better for training our model and achieving
better accuracy?
➢ RQ 2: How much of an accuracy boost can be achieved by using a deep learning-based approach
to solve the recognition problem of Ethiopian license plates, which have their own unique
morphology and set of characters (Amharic and English characters in addition to Arabic
numerals)?
1.4. Objectives
1.4.1. General Objective
As a general objective, this research tries to build a deep learning-based Ethiopian license plate
recognition model with better accuracy.
1.4.2. Specific Objectives
In order to meet the general objective specified above, the research has the following specific
objectives:
➢ To prepare two datasets, for the detection and recognition models, where the detection
model's dataset will include images of the whole car with some other surrounding objects
in context while the images are being captured, and the recognition model's dataset will
include images of individual characters composed of Amharic and English letters
and Arabic numerals.
➢ Analyzing different model-building parameter sets in the context of our problem and building
both the classification and detection models with the better one.
➢ Building image processing Python scripts which make use of the better image processing
algorithms available: a detected license plate is going to pass through many image
processing stages (like resizing, noise removal, binarization, contouring and segmentation)
before being fed as an input to the classification model.
➢ Building a script that can process and interpret the output of the classification model.
➢ Measuring the accuracy of the overall model.
1.5. Significance of the Study
This research has a huge impact not only on the transportation system but also in testing the limits
of deep learning models in handling such a unique problem with its own features and varieties.
Considering its applicability, it can be used significantly in many areas, like: Parking, Access
Control in Restricted Areas, Motorway Tolling, Border Control, Law Enforcement and many
more [10].
The techniques and approaches used in this research will definitely benefit the field of computer
vision in the context of license plate recognition, since by default it inherits the challenges of computer
vision, like Image Classification, Object Detection and Segmentation [11].
1.6. Scope and Limitation of the Study
➢ The LPR model has been designed by considering Ethiopian license plates only.
➢ For both Amharic and English characters, only the selected ones used in Ethiopian
license plates around Addis Ababa city have been used in training the recognition model, since
it was impossible to collect images from all over the country because of time constraints.
➢ Digits (ranging from 0 to 9) are used in training the recognition model.
➢ The model is able to both detect and recognize Ethiopian license plates.
➢ The model is able to detect only license plates in an image that contains many other
objects and, once a plate is detected, it is able to recognize it.
1.7. Organization of the Thesis
The entire document has a total of six chapters, including the current chapter. The second chapter
covers the literature review and related works, focusing on the different image processing
and deep learning technologies (methods) that are used and mentioned in different literatures to
date. The third chapter is about the research methodologies used in conducting this research. The
fourth chapter is about the design of the research and documents the overall architecture as a whole;
it discusses all the modules and algorithms used in developing the Ethiopian license plate detection
and recognition model. The fifth chapter is all about the evaluation of the developed model and the
different experiments that were conducted in order to come up with an optimal model. Finally,
Chapter Six concludes the overall results and findings of this research; it also contains
recommendations for future work.
2. CHAPTER TWO: REVIEW OF RELATED LITERATURE
2.1. Introduction
This chapter deals with a review of the literature with respect to this research's domain area and the
problem it is intended to solve, which is the detection and recognition of Ethiopian cars' license
plates. Since this research works on the integration of two domain areas, deep learning and
image processing, the next two consecutive sections, Section 2.2 and Section 2.3, focus on
reviewing the literature on image processing and deep learning respectively.
Section 2.2 reviews the concepts behind the image processing field and well-known
image processing techniques; and, since the second main task of an LPR system is character
segmentation, which falls under the image processing domain, a review of different image
segmentation techniques and algorithms has been made. Section 2.3 is a review of deep learning
in general, deep learning-based object recognition (or classification), deep learning-based object
detection and deep learning platforms. Then, Section 2.4 is about license plates (or number plates)
and the work that has been done in detecting them in an automated way. Finally, in Section 2.5,
some major journal articles and publications related to this research are reviewed,
followed by a conclusion that outlines the implications and significance of the identified themes for
this research and the field of study. The conclusion also outlines how and why this research
aims to address the gaps identified.
2.2. Image Processing
Images are built from colors or intensities of light called pixels. Pixels being the building
blocks of an image, they can be represented in either grayscale (single channel) or colored
form. In grayscale images, every pixel is represented by a scalar value ranging between 0 and 255
(where 0 represents 'black' and 255 represents 'white'), while colored pixels are not scalar but
rather represented by a list of three values, which mostly stand for Red, Green and Blue when we
are working in the RGB color space [4].
So, Digital Image Processing, which is a subcategory of Signal Processing, is all about the use of
computers to process digital images [12]. Image processing is used in solving many major tasks
of Computer Vision, which is the science of programming computers so that they can understand
images and video in a highly detailed way, or in other words, making computers see [13].
Although computer vision with traditional image processing techniques, where we have to
explicitly perform feature extraction, has been applied in different application areas and has shown
some promising results, it struggles when it comes to solving complicated problems like object
detection in an image that may have different factors of variation, such as [4]:
➢ Viewpoint Variation: when an object is rotated or oriented in multiple dimensions depending
on how the object is photographed or captured.
➢ Scale Variation: when the object is the same but varies in size.
➢ Deformation: when the object to be detected has a deformity in shape, which makes it really
difficult to detect compared to the other variations.
➢ Occlusion: the object to be detected is hidden or covered by something (probably some
other object) in an image.
➢ Changes in Illumination: objects in an image captured under different lighting conditions
(i.e. low or high lighting).
➢ Background Clutter: difficulty of identifying an object in an image because it has a very
noisy background.
➢ Intra-class Variation: objects that belong to the same class but have diversified forms.
2.2.1. Image Processing Techniques
This research uses many image processing algorithms (i.e. techniques) for different tasks, like
Geometric Transformations, Smoothing, Morphological Transformations, Thresholding, Edge
Detection and Contouring.
2.2.1.1.Geometric Transformations
Geometric transformation is an image preprocessing technique which enables us to remove
geometric distortion from a given image. It is a one-to-one mapping, or bijection, of a set having
some geometric structure to itself or another. It can be categorized based on its operand sets.
Different operations of geometric transformations include scaling, translation, rotation, and affine and perspective transformations.
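As a minimal OpenCV-Python sketch of these operations (the input file name plate.jpg is a hypothetical placeholder, not part of this research's dataset):

import cv2
import numpy as np

img = cv2.imread("plate.jpg")
h, w = img.shape[:2]

# Scaling: double the image size with cubic interpolation
resized = cv2.resize(img, (2 * w, 2 * h), interpolation=cv2.INTER_CUBIC)

# Translation: shift 50 pixels right and 20 pixels down
M_shift = np.float32([[1, 0, 50], [0, 1, 20]])
shifted = cv2.warpAffine(img, M_shift, (w, h))

# Rotation: 15 degrees counterclockwise about the image center
M_rot = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)
rotated = cv2.warpAffine(img, M_rot, (w, h))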
2.2.1.2.Thresholding
It is the easiest way of image segmentation.
i. Simple Thresholding: replaces each pixel of an image with either a black or a white
pixel depending on whether the intensity of the pixel is less than or greater than
some fixed constant value.
ii. Adaptive Thresholding: simple thresholding is limited when it comes to images
having different lighting conditions in their different areas.
In adaptive thresholding, different thresholds are applied to different regions, letting
it achieve better results for images with varying illumination. The threshold value
can be either a mean or a weighted sum of the neighborhood area.
iii. Otsu's Binarization: used for bimodal images, whose histograms have two peaks. It
automatically takes a threshold value that is in the middle of the two peaks.
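The three techniques can be sketched in OpenCV-Python as follows (the file name is a hypothetical placeholder and the parameter values are illustrative only):

import cv2

gray = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)

# Simple thresholding with a fixed constant of 127
_, simple = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Adaptive thresholding: per-region threshold from a Gaussian-weighted 11x11 neighborhood
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)

# Otsu's binarization: the threshold is picked automatically from the bimodal histogram
otsu_t, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)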
2.2.1.3.Smoothing
Smoothing is achieved by convolving an image with a low-pass filter kernel, which removes
high-frequency content like noise and edges. OpenCV provides four types of smoothing or
blurring techniques:
i. Averaging: simply replaces the central pixel under a kernel or filter area with the
average of the pixels under that same kernel.
ii. Gaussian Filtering: doesn't use a filter consisting of equal filter coefficients. It is
typically used to reduce noise and is achieved by convolving an image with a
Gaussian function, which reduces the image's high-frequency components.
iii. Median Filtering: looks like averaging, but instead of computing the average it
calculates the median of all pixels under the kernel and replaces the central pixel with the
computed median value. Unlike the other filtering methods, the central value is always
replaced by a value present in the image.
iv. Bilateral Filtering: unlike the other filters, bilateral filtering is effective in removing
noise in an image while preserving its edges. It replaces each pixel with a weighted
average of intensity values from nearby pixels, where the weight is based on a
Gaussian distribution.
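A minimal OpenCV-Python sketch of the four techniques (the file name and kernel sizes are illustrative assumptions):

import cv2

img = cv2.imread("plate.jpg")

averaged = cv2.blur(img, (5, 5))                 # mean of each 5x5 neighborhood
gaussian = cv2.GaussianBlur(img, (5, 5), 0)      # sigma computed from the kernel size
median = cv2.medianBlur(img, 5)                  # effective against salt-and-pepper noise
bilateral = cv2.bilateralFilter(img, 9, 75, 75)  # removes noise while preserving edges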
2.2.1.4.Morphological Transformations
These are operations performed on binary images based on the image shape.
i. Erosion: a way through which we erode the boundaries of an object in the foreground.
It is achieved by convolving a kernel through an image, where a given pixel
remains 1 if and only if all pixels under the kernel are 1; otherwise it is eroded
(made zero).
ii. Dilation: the opposite of erosion, where each pixel under a kernel is 1 if at least 1
pixel under the same kernel is 1.
iii. Opening: performing erosion followed by dilation, aiming at removing noise.
iv. Closing: the opposite of opening, which is performing dilation followed by erosion.
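A minimal OpenCV-Python sketch of the four operations, assuming a binary input image (the file name and the 3x3 kernel are illustrative):

import cv2
import numpy as np

binary = cv2.imread("plate_binary.jpg", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((3, 3), np.uint8)

eroded = cv2.erode(binary, kernel, iterations=1)
dilated = cv2.dilate(binary, kernel, iterations=1)
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # erosion followed by dilation
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # dilation followed by erosion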
2.2.1.5.Edge Detection
Edge detection aims at identifying the pixels in an image that form curved line segments known as
edges. It is one of the fundamental steps in image processing and computer vision.
i. Canny Edge Detection: a multistage algorithm that mainly goes through the
following stages:
➢ Noise reduction through Gaussian filtering.
➢ Finding the intensity gradient of the image, which is achieved by filtering it with a
Sobel kernel in both directions (i.e. vertical and horizontal).
➢ Non-maximum suppression, by scanning the full image and removing pixels
which don't form an edge.
➢ Hysteresis thresholding, which makes the final decision on whether the detected
edges are really edges or not. It performs thresholding with two threshold
values, a minimum and a maximum. If the intensity gradient is
greater than the maximum value it is an edge; if it is below the minimum value it is not. Finally,
if the value is between the maximum and the minimum, the decision is made based on
connectivity.
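In OpenCV-Python the whole pipeline is exposed as a single call; a minimal sketch (the file name and the threshold values 100 and 200 are illustrative assumptions):

import cv2

gray = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # noise reduction stage
edges = cv2.Canny(blurred, 100, 200)         # min and max hysteresis thresholds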
2.2.1.6.Contouring
A contour is a curve joining all the continuous points along a boundary that have the same
color or intensity. It is very important in object detection and recognition.
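A minimal OpenCV-Python sketch, assuming OpenCV 4 (where findContours returns two values) and a hypothetical binarized plate image; bounding each contour is the basic step behind character segmentation:

import cv2

binary = cv2.imread("plate_binary.jpg", cv2.IMREAD_GRAYSCALE)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)  # candidate character region
    print(x, y, w, h)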
2.2.2. Image Segmentation
Image segmentation, being an important part of image processing, is all about dividing an image
into smaller pieces called segments. It is concerned with partitioning the image into meaningful
components which have similar features or pixel characteristics [16].
There are different kinds of image segmentation techniques, some of which were mentioned in the
previous section. In general, according to [16] and [17], segmentation algorithms can be
categorized as:
➢ Edge-based segmentation: edges are a set of linked or connected pixels found
between the boundaries of different regions. Edges can be found by calculating the derivative
of an image function. Different techniques fall under this category, like the Gray Histogram
Method and the Gradient-Based Method.
The gray histogram method is based on the fitness of the threshold. But since the gray histogram is
uneven due to the impact of noise, searching for the minimum and maximum gray values becomes very
difficult.
The gradient-based method is recommended and mostly used for non-noisy images where the gray
intensity around edges is intense enough. Different well-known algorithms, like the differential
coefficient, Laplacian of Gaussian and Canny, are categorized under it, where Canny is the
most representative of them.
➢ Region-based segmentation: divides an image into a set of areas or regions based on pixels
that have similar characteristics. The main methods under this category are Thresholding,
Region Growing, and Region Splitting and Merging.
Thresholding is choosing the right threshold value in order to separate image pixels into
different classes, using a principle based on the characteristics of the image. It is the
most popular and commonly used method in image segmentation. It separates the
foreground objects from the background, where the objects are found to be lighter than the
background.
Region Growing works by first selecting a set of seed points from an image and then appending
neighboring pixels to those seeds based on some predefined criteria like gray scale or
color. So, as a result of adding similar regions or sub-regions, the regions grow.
Region Splitting and Merging works by randomly choosing regions and then trying to either
split or merge them based on certain conditions. So, first the image is divided into regions
until there are no more regions to divide (or split), and then the regions are merged until there are
no more similar (or related) regions to merge.
➢ Special theory-based segmentation: includes segmentation techniques like Fuzzy
Clustering and Neural Network-based methods.
Fuzzy Clustering uses fuzzy set theory to cluster and enables fuzzy boundaries to exist
between different clusters.
Neural Network-based methods, unlike the previously mentioned segmentation methods,
use dynamic equations to find edges by first mapping each pixel of an input image into
the neural network, where each neuron of the network's input layer represents a single pixel.
➢ Watershed-based segmentation: the basic view in this algorithm is that the image is
considered as topological geomorphology, and each pixel is taken as the altitude above
sea level. Each local minimum, including its neighborhood, in the image is considered a
catchment basin, and its boundary is called a watershed.
The aim of this algorithm is finding the local maximum of the segmented region [18].
2.3. Deep Learning
Deep learning is a subfield of a broader family called Machine Learning, which is all about enabling
computers to learn from given data on their own rather than being told what to do using some
set of rules [19]. Deep learning is representation learning, which learns from representations of
data through its multiple layers with multiple levels of abstraction. It solved the limitation of
conventional machine learning in learning from raw data, which required a feature extractor
to convert raw data into an appropriate representation or feature vector [3].
Representation learning enables AI systems to achieve better performance than that of hand-
engineered representations. Not only that, it also enables higher adaptability and
saves a great deal of human time and effort which would otherwise be spent manually designing
features [5].
The 'deep' in the name deep learning does not refer to the depth of understanding but rather
to the number of successive layers of representation. Deep learning can be conceptualized
as large neural networks with many layers built on top of each other [4].
Machine learning algorithms, whether deep or not, are categorized into three main categories based
on how they learn [20]:
➢ Supervised learning: requires the training data to be labeled, so as a result the output
is supervised. It has two major tasks: Classification and Regression.
➢ Unsupervised learning: the training data is not labeled; as a result, the output is not
supervised.
➢ Reinforcement learning: the learning architecture interacts with the environment
and gets rewards or penalties (i.e. negative rewards). The learning system, which is also
known as an agent, learns by applying the best strategy (called a policy) so that it gets
more rewards over time.
2.3.1. Deep Learning Based Object Recognition (or Classification)
Although deep learning can be applied in many areas, the concern of this research is its
application to computer vision. Because of its ability to extract features on its own, deep learning is
currently being widely used in the field of computer vision. It is not only dominating but also
replacing the traditional machine learning algorithms which were very popular in their own time
[21].
Even though there are many deep learning algorithms out there, when it comes to their applicability
to computer vision tasks like object recognition, the Convolutional Neural Network demonstrates
better, state-of-the-art performance [19] [22] [23].
The convolutional neural network, also called ConvNet or CNN, is one of the algorithms in the family of
Deep Neural Networks; inspired by the visual cortex of animals, it is widely used in computer
vision [24]. CNNs eliminate the hand-crafted feature extraction (or feature engineering) phase,
which is common in traditional machine learning systems, by automatically learning features during the
training phase, which makes them end-to-end learners [4]. The two major properties of CNNs which
make them the right model for the job are their ability to learn translation-invariant patterns
and spatial hierarchies of patterns [19].
➢ Translation-invariant patterns: once a CNN model learns a pattern in a certain corner of a picture,
it can recognize it anywhere in the image.
➢ Spatial hierarchies of patterns: the initial convolution layer learns small local patterns,
while the next layer learns larger patterns which are made of the initial ones.
Architecturally CNN has the following major components or layers [4] [19] [20] [24] [25]:
i. Convolutional Layer or CONV: the unit where the major computational work is involved; it
can be considered a set containing feature maps with neurons arranged in them. It consists
of filters (or kernels), where each filter has a width and a height that are nearly always
square, letting CNNs take advantage of optimized linear algebra libraries that operate
efficiently on square matrices. So, after convolving K kernels across the input image, the
output is stored in 2D matrices called activation maps or feature maps.
CONV has three parameters through which it controls the size of an output volume: Depth,
Stride and Zero-Padding.
Stride is the step that the kernel takes while convolving, where smaller strides lead to
overlapping Receptive Fields (i.e. size of local region of an input image where neurons of
the current volume connect to).
In Zero-Padding zeros are added along the borders of the input so that the size of input and
output volume would be the same.
Overall CONV layer requires 4 parameters as an input:
➢ Number of Kernels (K)
➢ Size of Kernels or Receptive Fields (F)
➢ The Stride (S)
➢ The amount of Zero-Padding (P)
The output of the CONV layer is then Wout (Output Width) X Hout (Output Height) X Dout
(Output Depth), where:
➢ Wout = ((Win – F + 2P) / S) + 1
➢ Hout = ((Hin – F + 2P) / S) + 1
➢ Dout = K
ii. Activation Layer or ACT: applies a nonlinear activation function, such as ReLU, element-wise
to the output of the preceding CONV layer.
iii. Pooling Layer or POOL: progressively reduces the spatial size of the input volume, which in
turn reduces the number of parameters and the amount of computation in the network. The
POOL layer requires 2 parameters as an input:
➢ Size of Kernels (F)
➢ The Stride (S)
So, POOL layer yields an output volume of size Wout (Output Width) X Hout (Output
Height) X Dout (Output Depth) where:
➢ Wout = ((Win – F) / S) + 1
➢ Hout = ((Hin – F) / S) + 1
➢ Dout = Din
Reducing the input image size also makes the neural network tolerate a little bit of image
shift (i.e. Location Invariance)
iv. Fully Connected Layer or FC: neurons in this layer are all (i.e. fully) connected to
all neurons in the previous layer, as is the standard for feedforward neural networks. FC layers
are placed at the end of the network.
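To make the four layer types concrete, the following Keras sketch stacks them into a toy CNN. This is an illustrative example only, not the architecture used in this research; the input shape (32x32 grayscale) and the 10 output classes are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                  input_shape=(32, 32, 1)),            # CONV + ACT: K=32, F=3
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),  # POOL: F=2, S=2
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2), strides=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),              # FC layer
    layers.Dense(10, activation="softmax"),            # one unit per assumed class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])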
2.3.2. Deep Learning Based Object Detection
Object detection is the application of different methods in order to find or locate a given object
in either an image or a video source.
Having feature learning and representation capabilities, deep learning is being applied in many
areas of computer vision, but most importantly in object detection [8]. Creating a detection model
with deep learning requires a large dataset and high computing power for training. And to have a
similar detection rate for each class in the dataset, the number and size of images should be
even across classes [26]. Even though deep learning-based object detection algorithms are robust,
there should still be improvement when it comes to their application in real time [8].
Currently there are many deep learning-based object detection models available; they are briefly
categorized into two groups: models based on region proposals and models based on regression.
➢ R-CNN: a CNN-based region proposal model that uses a region-based segmentation
method to get the regions and feed them to a CNN. Even though this model has some
improvements compared to prior ones, its performance is poor when it comes to real-time
use.
➢ SPP-net: this model uses pyramid pooling, which solved the problems of object
deformation and incompleteness that existed in R-CNN, but it still performs poorly
when it comes to real-time usage.
➢ Fast R-CNN: improves performance by using ROI pooling on region proposals after
mapping them to the feature layer of the CNN. ROI pooling lets the model have a
fixed-size vector for a successful connection with the fully connected layers.
➢ Faster R-CNN: adds an RPN (Region Proposal Network), minimizing the computation
time taken by the selective search approach in obtaining region proposals. It reduces the
proposed regions, which were 2000 in R-CNN, to 300.
➢ R-FCN: adopts the RPN from Faster R-CNN and solves the problem of ROIs sharing
computation. It adopts a SoftMax classifier for feature vector classification. Even though
its accuracy is the same as that of Faster R-CNN, its speed is improved 2.5 times.
➢ YOLO: removed the ROI module, so it doesn't extract object region proposals anymore; instead
it uses a CNN at the front end for feature extraction and 2 fully connected layers at the
other end for classification and regression. It improved speed to the point that it can be
used in real time, but with detection accuracy as a tradeoff.
➢ SSD: integrated YOLO's regression idea and Faster R-CNN's anchor mechanism (i.e. the
RPN). Unlike YOLO's global feature extraction, it uses local feature extraction. It became
the first deep learning-based object detection model to achieve higher accuracy while still
maintaining the real-time requirement.
2.3.3. Deep Learning Platforms
Deep learning has around 16 top-ranking open-source platforms according to GitHub's star and
contributor counts [27] [28]. Among the 16 platforms, the top 5 are mentioned in this
document, starting from the highest-ranking one. Using these platforms, deep learning can be
applied in various application areas, like: image and video recognition/classification, audio
processing, text analysis and natural language processing, autonomous systems and robotics,
medical diagnostics, computational biology, physical science, finance, economics, markets, cyber
security, and architectural and algorithmic enhancement.
2.3.3.1.Tensorflow
Developed by the Google Brain Team within Google's Machine Intelligence research organization,
Tensorflow is a library for numerical computation which enables deployment to both CPUs and GPUs without
a need to rewrite code. It includes a data visualization toolkit called Tensorboard, and it provides
stable APIs for Python and C.
2.3.3.2.Keras
Keras is a high-level neural network API. It runs on top of Tensorflow, CNTK or Theano and is
written in Python.
Keras allows for easy and fast experimentation. Like Tensorflow, it runs on both CPU and GPU,
and it supports both CNNs and RNNs.
2.3.3.3.Caffe
Developed by the Berkeley Vision and Learning Center and community contributors, Caffe was
developed with speed, modularity and expression in mind.
2.3.3.5. PyTorch
PyTorch is a Python package which provides two high-level features: tensors and DNNs with
strong GPU acceleration.
PyTorch is often used as a replacement for NumPy and is considered a deep learning framework
which provides speed and flexibility.
2.4. License Plates
License Plate is a term from American English (Number Plate in British English); it is either a
metal or plastic plate attached to vehicles for their identification [29]. Although in some
jurisdictions it is only required to be attached on one side, in most countries, like Ethiopia, it is
required to be attached on both the front and the back of the vehicle. So, license plate recognition is the
task of extracting the information found on a car's license plate in a way that it can
be used for further processing in different applications.
License plate recognition (LPR) systems are mostly used by police forces (or other security
enforcement teams) in order to minimize the challenge of tracking criminals' movements and make
it a little more manageable [30] [31]. They may be used in checking whether a vehicle is registered or
not, in electronic toll collection in pay-per-road scenarios, in indexing traffic activity by
highway agencies, in crime deterrence and in data collection.
The license plates in Ethiopia have three main parts or features, where each part
conveys different information: the Service Code, the Regional Code and the Alphanumeric Code
[2].
➢ Considering the service code, Ethiopian license plates have the following types (or
variations):
Table 2-1: Ethiopian License Plate Classification Based on Service Code
➢ Considering the regional code, ELPs have the following types (or variations):
Table 2-2: Ethiopian License Plate Classification Based on Regional Code
2.5. Related Work
In this section, different papers related to this research are reviewed. The review mainly
focuses on the problem or research gap that each reviewed paper tries to solve and the methodology it
uses.
Although this is a popular research area and it has been attempted using many techniques,
including traditional image processing, machine learning and deep learning, it is still worth
investigating, since license plates of different countries have different features, like the size of the
plates, the characters contained on the plate (which in this research's case are Amharic letters,
English letters and numbers ranging from 0 to 9) and the methodology used.
In the case of Ethiopian license plates, there is only one research to date, and it was attempted
with traditional image processing. But other, technologically developed countries, like China,
America, India, the UAE and many more, have exploited state-of-the-art deep learning models and
conducted many researches on recognizing their own languages' character features, which are
included in their countries' respective license plates.
2.5.1. Researches on Ethiopian License Plates
A paper titled "Automatic Recognition of Ethiopian License Plates" was conducted on the recognition
of Ethiopian license plates [2]. It used Gabor filtering, a morphological closing operation and
connected component analysis for plate region detection. The Gabor filter is applied to the grayscale
image, and its response is then binarized to perform the morphological closing. Finally, connected
component labeling is performed to find connected objects or components in the resulting image. In
the segmentation phase, it first applies orientation adjustment and size normalization by performing
Canny edge detection and a Hough transform; second, it performs segmentation with connected
component labeling; finally, numbers and other characters are separated based on their location
and size (i.e. width and height). Once the characters of the plate are segmented, correlation-based
template matching is performed for recognition.
2.5.2. Researches on Non-Ethiopian License Plates
[32] basically has three major phases: plate detection, character segmentation and character
recognition, where 2 separate CNN models are used for detection and recognition. The detection
phase has both image preprocessing and CNN classification (detection) steps. Here the
preprocessing consists of morphological filtering for contrast maximization, a Gaussian blur filter
to remove noise, adaptive thresholding to eliminate unimportant regions in the image, finding all
contours to locate curves that join all continuous points having the same intensity, geometric
filtering to improve the precision of LP detection, CNN detection, and drawing bounding boxes
around the plate region with a minimum threshold value of 0.7. The segmentation phase consists of gray
scaling, Canny edge detection, extraction of contours, geometric filtering and bounding boxes around
the segmented characters. Finally, in the last phase, which is character recognition, the second CNN model
is used.
[33] is about the classification of images as license plate or non-license plate using a CNN. The
network used was constructed of 7 layers, where the first convolutional and sub-sampling layers
have 4 feature maps and the next ones after that have 11. The third convolutional layer has 120
feature maps and is connected to the fully-connected layer, which has 84 units. Finally, it has an
output layer which classifies the image as either a license plate or a non-license plate.
[34] proposed an efficient hierarchical methodology for a license plate recognition system by first
detecting vehicles and then retrieving license plates from the detected vehicles to reduce false
positives. Finally, CNN-based LPR is used in the recognition of the plates' characters. It used YOLO v2
for vehicle detection, which has 19 convolutional layers and 5 max pooling layers. For license plate
detection it used an SVM, and for character segmentation it performed a number of image processing
techniques, like gray scaling, binarization, and horizontal and vertical projection. For character
recognition it used a CNN model with 2 convolutional layers, two max pooling layers, two fully connected
layers and one output layer.
[35] is purely image processing based, and it has 4 basic steps: preprocessing, localization,
segmentation and recognition. In the preprocessing phase, it performs gray scaling and median
filtering to get rid of noise while conserving the sharpness of the image. In the localization phase,
the region of the plate is detected apart from the rest of the objects in the image.
[36] aims at demonstrating the capability of CNNs in recognizing a vehicle's state (or
region) from a number plate. The researchers only considered 4 classes for simplicity's sake. For
each class or state the dataset contained 200 images, where each image had some kind of distortion,
tilt or illumination at a different angle. The results achieved are more than 95% on average.
[37] proposed an end-to-end deep learning-based ALPR system for Brazilian license plates. The
system presents two considerably fast and small YOLO-based networks operating in cascade
mode. Since searching for a relatively small object such as a license plate in a high-resolution
image demands too many computing resources, the paper first performs frontal-view
extraction, extracting the front of the car, which in turn contains the license plate.
2.5.3. Review Summary
Even though [2] was conducted on Ethiopian license plates, the methodology it used for
detection and recognition of the plates was plain image processing techniques, whereas this
research tried to use DL-based methods. In [32] DL (CNN) based methods are used in both
detection and recognition, but the whole system was built for features specific to Tunisian
license plates, which mostly include only numbers. In [33] a CNN is used not to extract the
information that is available on the plate but rather to simply classify a given image as a plate or
not; it has neither detection nor recognition capabilities. [34] also used a CNN for
recognition but an SVM for plate detection, after isolating the front of the car whose plate is to be detected
using YOLO. It has a different methodology (approach) and plate features compared to the current
research. [35], unlike the current research, uses pure image processing in both detection and
recognition of the plates. [36] and [37] are both DL based, but the plate features (plate shape, color,
dimensions and content) they work on are greatly different from the current research.
Table 2-4: Previous Works on Object Detection and Recognition Part 1

Evaluation:
[32]: On detection it got 96% and on character recognition it got 95.3%.
[33]: For license plates it has 98% accuracy and for non-license plates it has 100% accuracy.
[34]: The plate detection rate is 96.12% and the plate localization rate is 94.23%. The recognition phase achieved 99.2% accuracy.

Limitation:
[32]: It uses a plain CNN for detection, which has an impact on both accuracy and speed.
[33]: This work is limited to classifying whether a given image is a license plate or not. It does not do detection, segmentation or character recognition.
[34]: It uses an extra phase for plate detection, which needs additional computational time and has a negative impact on overall speed. Yet the localization phase's accuracy is relatively low.
Table 2-5: Previous Works on Object Detection and Recognition Part 2

Evaluation:
[37]: For matches of at least 5 characters, it presented an accuracy of 97.39%.

Limitation:
[35]: The recognition method is inefficient since the matching is done pixel by pixel, which makes it hard to recognize or classify images at varying angles.
[36]: The system is only capable of classifying the state or region to which the license plate belongs. Other than that, it performs neither detection nor segmentation. As a result, there is no recognition of characters.
[37]: The overall accuracy of the ALPR system is relatively low.
3. CHAPTER THREE: RESEARCH METHODOLOGY
3.1. Introduction
This chapter discusses the overall methodological approach, the data collection
methods, the data analysis methods and the tools used in order to successfully meet the main objective of
this research, that is, building a deep learning-based model for the detection and recognition of
Ethiopian car license plates.
Section 3.2 discusses the general research approach used; Section 3.3 presents the process design;
Section 3.4 is about the data collection methods, describing the different methods used in collecting
suitable data for developing a robust deep learning-based detection and recognition model;
Section 3.5 discusses the data analysis approaches used in analyzing the collected data; finally,
Section 3.6 lists and describes the different tools used in conducting this research, which mainly
include the hardware and software tools used in implementing and evaluating the deep learning models.
3.2. Research Approach
Considering its purpose, this thesis work is categorized under applied research. It focuses on the
application of different deep learning and image processing algorithms that are appropriate for
solving the problem at hand (i.e. deep learning-based LPR).
As mentioned in previous chapters of this document, the research is basically composed of three
main phases: detection of the license plate apart from the rest of the objects found in an image, segmentation
of the characters found on the detected plate, and finally recognition (or classification) of each
character obtained from the segmentation phase.
So, first a detection system is built from a deep learning-based object detection model by tuning
the hyperparameters of its architecture to make it specifically fit the features of Ethiopian license
plates. Then, once the plate has been detected, image processing is applied to segment the plate's
characters. But since the detected plate is mostly noisy, it is almost impossible to have an accurate
segmentation without first cleaning it (which includes tasks like resizing, smoothing and
morphological transformation) using appropriate image preprocessing algorithms. Finally, a
Convolutional Neural Network based recognition model capable of recognizing the characters
contained in Ethiopian cars' license plates has been built.
3.3. Process Design
Below is the research design process diagram, which illustrates each step followed while
conducting this research, starting from problem identification and ending with the conclusion. It started by
identifying the problem and formulating research questions. Then the literature was reviewed, both
on scientific concepts and related works. After finishing the review, the general and specific objectives
of the research were specified. Then data collection, tool selection and dataset preparation
were conducted. Based on the dataset prepared and the tools selected, the ELPR model was trained
and built. Finally, the model was evaluated and a conclusion was made.
3.4. Data Collection
The main data taken as input in this research is images of different license plates. But
before starting to take pictures of LPs, information about the different currently existing LP
types in Ethiopia had to be collected. So, in conducting the whole work, two major data collection
mechanisms were used.
Firstly, a literature or document review was conducted on works related to this research. In
doing so, all the information regarding the types and variations of different LPs was acquired.
Ethiopian license plates can basically be categorized based on plate code and region. Plate
codes are represented in both numeric and character format, where the numeric codes range from
1 to 5 and the character codes, which can be either Amharic or English, have around 8 categories.
Considering the regional classification, in Ethiopia there are 9 national regions (i.e. Tigray, Afar,
Amhara, Oromia, Somali, Benishangul-Gumuz, Southern Nations Nationalities and Peoples
Region (SNNPR), Gambella and Harari) and 2 administrative states (i.e. Addis Ababa city
administration and Dire Dawa city council) [38], where each can print its own plates with the codes: "1-
5", "የዕለት", "ተላላፊ" and "ልዩ". So, in order to identify a given plate, the LPR system must be able to
get its code and regional information, since the same plate number can be given to different
vehicles in different regions and with different plate codes. Each plate code class has a different foreground
color, while the background color is white for all except police vehicle plates, which are yellow.
Morphologically speaking, Ethiopian license plates have two formats: single-row and
double-row plates.
29
Figure 3-2: Sample Ethiopian License Plates
Secondly, using the information found in the previous phase of data collection (i.e. the literature review), images of different types of license plates were taken with a digital camera, considering varying factors like environmental conditions and different car angles. A digital camera with a resolution of 13 MP was used to capture all the images. Both the back and front sides of the vehicles were captured while varying the camera angle to the right and left of the license plate. Different environmental conditions like rainy, sunny and night-time scenes were considered; in total, 1100 images were collected.
Once enough images were collected, an analysis was performed to make sure that each license plate was captured in an appropriate way. It had to be verified that the LP in an image is not partially cut off or occluded by some other object, that the plate is not so far from the camera that it is indistinguishable even to the human eye, that the plate characters are not blurred due to unstable capturing, and that the camera angle does not make the plate characters appear merged together.
3.6. Tools
Here, all the tools, both hardware and software (including programming languages), that were used from writing the document to implementing the model are mentioned and specified.
3.6.1. Hardware Tools
➢ A DELL computer with an Intel Core i5-5200U CPU at 2.20 GHz, 4 GB of RAM and a 1 TB hard disk was used for writing the document and training the model.
➢ A Tecno mobile phone with a 13 MP rear camera and 16 GB of storage was used for capturing LP images.
3.6.2. Software Tools
➢ OS: since most of the programming tools used in this research were built for the Windows operating system, Windows 10 was used as the OS both for composing the document and for training the DL model.
➢ Composing: the research document was written with MS-Word 2019.
➢ Diagrams: the diagrams in the document were drawn with MS-Visio 2019.
➢ Image Annotation: once the images were collected, a software tool called labelImg, built with the Python programming language and Qt for its graphical user interface, was used to label (or annotate) the collected images. This tool is preferable because it is easy to use, lightweight, and stores the labeling information in XML files in PASCAL VOC format (the format used by ImageNet, an image database organized according to the WordNet hierarchy), which is compatible with the object detection platform's requirements.
➢ OpenCV-Python (Open Source Computer Vision): a library with Python bindings that enables implementing image processing tasks. All image processing parts of this research have been implemented with this library. OpenCV started as a research project at Intel and is now basically the largest computer vision library in terms of sheer size [39]; it contains implementations of more than 2500 algorithms.
➢ TensorFlow: a mathematical library that provides the tools of an end-to-end open source platform for building machine learning models. It makes the process of building and training models easy using intuitive, high-level APIs like Keras [40]. It is a math library built by the Google Brain team, originally for internal use in both research and production. It allows the creation of dataflow graphs where each node is a mathematical operation and each connection is a tensor (i.e. a multidimensional array). The math operations in the library are written in C++, while the high-level abstraction that ties them together is implemented in Python [41].
➢ TFOD API: Google's open source framework built on top of TensorFlow to construct and train object detection models [42]. It has two installation variants depending on where one wants to run it: TensorFlow CPU, which runs on the CPU, and TensorFlow GPU, which runs on the GPU.
➢ Faster R-CNN Model: when it comes to selecting an appropriate model for the problem at hand, there is always a tradeoff between speed and accuracy. In this research a license plate has to be detected from an image, which does not demand as much speed as other real-time systems. Detection accuracy, on the other hand, is essential: if the bounding box of the detected area is even slightly inaccurate, some characters of the plate will be missing, which impacts the later stages and the recognition as a whole. Another thing to note while choosing an object detection model is the size of the object to be detected, and license plates are quite small.
Considering the above-mentioned criteria, Faster R-CNN proved to be the most accurate detection model based on Google's research [42]. According to that research, Faster R-CNN tends to be slower but more accurate than R-FCN and SSD, requiring about 100 ms per image.
Figure 3-3: Deep Learning Based Object Detection Model’s Speed and Accuracy Comparison
➢ Feature Extraction: the accuracy and speed of the Faster R-CNN detection model highly depend on the feature extractor it uses [43]. Even though there are many feature extractors, such as VGG, MobileNet, Inception, ResNet and Inception-ResNet, this research uses ResNet101, which enables Faster R-CNN to achieve the highest accuracy next to Inception-ResNet.
The reason Inception-ResNet is not used is that it takes more GPU time (i.e. is slower) than ResNet for an almost negligible accuracy tradeoff [42].
Figure 3-4: Deep Learning Based Object Detection Model's Accuracy with Different Feature Extractors
Figure 3-5: Deep Learning Based Object Detection Model's GPU Time (ms)
4. CHAPTER FOUR: RESEARCH DESIGN
4.1. Introduction
This chapter discusses the research design that this research employed. The first section discusses the different properties of Ethiopian license plates in detail. The second section discusses the proposed design for each phase individually, and the third section discusses how the three modules from the corresponding license plate recognition phases communicate with each other and work as a single system.
In order to build a deep learning model for license plate recognition, which contains both image detection and classification tasks, the research had to go through many stages, which are shown in the overall design diagram below:
Figure 4-1: Overall Design Diagram (image data acquisition, cleaning, augmentation, splitting, plate detection, character segmentation and recognition)
4.2.1. Image Data Acquisition
In this part of the research, images of different cars with different variations of license plates were captured using a digital camera with a resolution of 13 MP. Addis Ababa was chosen as the place of data collection because it is the capital city of Ethiopia, so cars from different regions of the country carrying different regional codes can be found there. The images were taken from different distances, camera angles and illuminations in order to improve the model's detection and recognition accuracy under different conditions and circumstances. Images were also captured while the cars were moving at slow, medium and high speeds.
The images needed to train the recognition model (English letters, numbers and Amharic letters) were cropped from the collected license plate images.
4.2.2. Cleaning
In this part of the research, before the images were used in training, there was an image pyramiding step. Pyramiding is a representation in which the image goes through a series of smoothing and subsampling operations. There are two types of pyramids, Gaussian and Laplacian. In this research, in order to make the model training process faster, down-pyramiding with a Gaussian filter was used. Here a 5 x 5 Gaussian kernel is used to produce layer (i + 1) from input layer (i). So, the resulting image Gi, found by convolving and subsampling Gi-1 (the input image), will be one fourth of the original area; each such halving step is also called an octave [14]. So, if the original image were M x N, it will be M/2 x N/2 at the second level of the pyramid. All the images used for the detection model were reduced to a height of less than 1600 pixels and a width of less than 1200 pixels, as sketched after the figure below.
Figure 4-2: Visual representation of image pyramid with 5 levels (image source: Wikipedia)
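As a minimal sketch of this step, the following assumes OpenCV's cv2.pyrDown (which smooths with a 5 x 5 Gaussian kernel and halves each dimension per call); the helper name and the size limits mirror the ones stated above:

import cv2

def pyramid_downscale(image, max_height=1600, max_width=1200):
    # Each cv2.pyrDown call smooths with a 5 x 5 Gaussian kernel and
    # subsamples to half the height and width (one fourth of the area).
    while image.shape[0] > max_height or image.shape[1] > max_width:
        image = cv2.pyrDown(image)
    return image

With these limits, a 3120 x 4160 capture passes through two pyramid levels and comes out at 780 x 1040, the resolution reported in Chapter 5.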
4.2.3. Augmentation
Image data augmentation is a crucial stage in developing deep learning models. It expands the amount of training image data in a given training dataset by creating modified versions of the images, which in turn helps the model generalize. Even though there are many augmentation techniques, this research uses only a random horizontal flip, which is basically reversing the columns of pixels; it helps the plate be detected in scenarios where the camera is unintentionally installed inappropriately. Other augmentation styles like vertical or horizontal shifts were not used, because shifting might cut off part of the plate, which would make recognition impossible. In the object detection platform used here, this augmentation is enabled through the training configuration, as sketched below.
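In the TensorFlow Object Detection API, a random horizontal flip is typically switched on with a data_augmentation_options entry in the pipeline configuration. The fragment below is an illustrative sketch of that setting, not the exact configuration file used in this research:

train_config {
  # Randomly mirror training images (and their boxes) left-to-right.
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}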
4.2.4. Splitting
Splitting is the process of separating the whole dataset into a train set and a test set. The train set is used by the model to learn the different categories or classes, by making predictions on input images and making corrections when a prediction is wrong. The test set is used to evaluate the performance of the model once it has been trained. Even though there is no fixed rule for choosing the train/test proportion, after consulting many blogs and books an 80/20 ratio (80% for training and 20% for testing) was used to split the entire dataset. So, for the detection model, out of the 1100 collected car images, 880 were used for the train set and the remaining 220 for the test set. For the classification model, out of 4240 total cropped character images, 3392 were used for the train set and the remaining 848 for the test set. A split along these lines is sketched below.
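As an illustrative sketch of the 80/20 split (scikit-learn's train_test_split is an assumed helper; any equivalent shuffle-and-slice routine works):

from sklearn.model_selection import train_test_split

# Placeholder lists standing in for the 1100 collected images and labels.
image_paths = ["image_%04d.jpg" % i for i in range(1100)]
labels = ["plate"] * 1100

# 80/20 split: 880 images for training, 220 for testing.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.20, random_state=42)
print(len(train_paths), len(test_paths))  # 880 220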
4.2.5. License Plate Detection
The license plate detection part was developed using a deep learning approach, but when applying this approach to object detection there is one tradeoff that must be considered depending on the problem: speed versus accuracy. For license plate detection, accuracy is the more important criterion, because missing even one character from the detected plate means failing to identify the vehicle as a whole. So, in this research a model with better accuracy, Faster R-CNN, has been used [44]. This model is mainly composed of two modules. The first is a fully convolutional network that proposes regions, which gives it two primary benefits: being fast and being able to accept images of varying resolution, with any width and height. The second is the Fast R-CNN detector, which uses the regions proposed by the first module.
4.2.5.1. Faster R-CNN Detection
Although the original Faster R-CNN paper used VGG (a CNN architecture invented by the Visual Geometry Group, with 13 shareable convolutional layers) and ZF (the Zeiler-Fergus CNN architecture, with 5 shareable convolutional layers) as base networks for feature extraction, the developed system uses the deeper and more accurate ResNet (Residual Network, a CNN architecture with 101 layers) [4] [44].
ResNet is a much deeper network than VGG. It uses residual modules to train CNNs that can have over 1000 layers on the CIFAR-10 dataset (which consists of 60000 32 x 32 color images in 10 classes) [45] [46]. In order to reduce the volume size, ResNet uses only two pooling layers: a Max Pooling layer at the beginning, to reduce the spatial dimensions, and an Average Pooling layer at the end. Unlike common CNNs, ResNet adds the original input to the outputs of the convolution, ReLU (Rectified Linear Unit) and BN (Batch Normalization) layers; this operation is called Identity Mapping and is the reason for the term Residual. The Batch Normalization layer normalizes the activations from a given input layer before passing them to the next layer, which helps stabilize training and reduces the number of epochs needed to train the model [4].
In this research ResNet101 has been used as the base network for feature extraction, so it has 101 layers. First it resizes an image with 3 channels (Red, Green and Blue) to 224 x 224 (height by width), so the input shape is 224 x 224 x 3. After the first convolutional layer, which is 7 x 7 with a depth of 64 and stride 2, the output is 112 x 112.
Table 4-1: ResNet101 Network Architecture

Layer    | Output Size | Building Blocks
Conv1    | 112 x 112   | 7 x 7, 64, stride 2
Conv2_x  | 56 x 56     | 3 x 3 max pool, stride 2; [1 x 1, 64 / 3 x 3, 64 / 1 x 1, 256] x 3
Conv3_x  | 28 x 28     | [1 x 1, 128 / 3 x 3, 128 / 1 x 1, 512] x 4
Conv4_x  | 14 x 14     | [1 x 1, 256 / 3 x 3, 256 / 1 x 1, 1024] x 23
Conv5_x  | 7 x 7       | [1 x 1, 512 / 3 x 3, 512 / 1 x 1, 2048] x 3
Figure 4-5: Sample Input Images and Their Output After Plate Has Been Detected
The plate region is then extracted by cropping the highest-scoring bounding box (using basic Python slicing). The result of the extraction phase (i.e. the cropped plate image) is given as an input to the segmentation phase, as sketched below.
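A minimal sketch of that crop, assuming the normalized [ymin, xmin, ymax, xmax] box format that the TFOD API returns (the helper name is illustrative):

import numpy as np

def crop_best_detection(image, boxes, scores):
    # Pick the detection with the highest confidence score.
    best = int(np.argmax(scores))
    ymin, xmin, ymax, xmax = boxes[best]
    h, w = image.shape[:2]
    # Convert normalized coordinates to pixels and slice the plate out.
    return image[int(ymin * h):int(ymax * h), int(xmin * w):int(xmax * w)]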
4.2.6. Character Segmentation
The segmentation of the characters on the detected license plate is done with a combination of different digital image processing techniques. OpenCV with Python has been used for each image processing task, including all the necessary preprocessing and cleaning.
First the input image is grayscaled, and then all the edges are found using the Canny edge detector. The Canny edge detector is a multi-stage detector comprising 4 stages. In its initial phase it performs Noise Reduction with a 5 x 5 Gaussian filter, and then it finds the Intensity Gradient of the image by filtering it in both the horizontal (Gx) and vertical (Gy) directions. The gradient is always perpendicular to the edges. It uses the equations below:

G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \tan^{-1}\left(\frac{G_y}{G_x}\right)

Equation 4-1: Equation of Canny Edge Detector
Once the gradient's magnitude and direction have been found, the next step is Non-Maximum Suppression, where pixels that do not constitute an edge are removed. Finally, Hysteresis Thresholding uses 2 threshold values (min and max) to decide whether an edge is really an edge or not. If the edge's intensity value is greater than max, it is a sure edge; if it is less than min, it is not an edge. If the intensity value falls between min and max, the algorithm checks its connectivity: if it is connected to a sure edge it is an edge, otherwise it is not.
Once all the edges have been found using Canny, the next step is to find the edges that constitute a line, using the HoughLines method, which takes four parameters. The first parameter is the binary image produced by Canny; the second and third parameters are the rho and theta values (where rho is the perpendicular distance from the origin to the line and theta is the angle formed by this perpendicular and the horizontal axis), measured in pixels and radians respectively. The fourth argument is a threshold value specifying the minimum number of votes a candidate must receive to be considered a line. Finally, using the angle of the longest line, the transformation matrix is found with the equation below (Equation 4-2), which is a scaled rotation with an adjustable center of rotation, so that the image can be rotated about any location of preference depending on the angle of the line [14]:

M = \begin{bmatrix} \alpha & \beta & (1-\alpha)\,c_x - \beta\,c_y \\ -\beta & \alpha & \beta\,c_x + (1-\alpha)\,c_y \end{bmatrix}, \qquad \alpha = s\cos\theta, \; \beta = s\sin\theta

Equation 4-2: Scaled Rotation Matrix with Adjustable Center of Rotation (c_x, c_y) and Scale s

The transformation matrix is given to the warpAffine method, which takes the 2 x 3 transformation matrix and returns the transformed image. A sketch of this orientation adjustment is given after the figure below.
Figure 4-7: License Plate Orientation Adjustment Sample
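As a minimal sketch of the orientation adjustment (assuming OpenCV's probabilistic HoughLinesP variant, used here instead of HoughLines for convenient access to line endpoints; the Canny and Hough thresholds are illustrative):

import cv2
import numpy as np

def deskew_plate(plate_bgr):
    gray = cv2.cvtColor(plate_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    # Find line segments; threshold and length parameters are illustrative.
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 80,
                            minLineLength=plate_bgr.shape[1] // 2,
                            maxLineGap=10)
    if lines is None:
        return plate_bgr
    # Use the angle of the longest detected line.
    x1, y1, x2, y2 = max(lines[:, 0],
                         key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
    angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
    h, w = plate_bgr.shape[:2]
    # Scaled rotation about the plate center (Equation 4-2), then warp.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(plate_bgr, M, (w, h))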
If the plate image has one row, then the height of the largest contour is either added to or subtracted from the y value of the detected line: if the detected line is at the bottom of the plate, the height of the largest contour is subtracted, otherwise it is added. The resulting area is taken as the ROI (Region of Interest) and cropped out.
4.2.6.3. Segmentation
Here all the characters on the plate are segmented into individual letters and numbers. The output image from border elimination is binarized using the OTSU method, which works on bimodal images. A bimodal image is an image whose histogram has two peaks. Unlike global binarization methods, where an arbitrary value is taken as the threshold, OTSU takes an approximate value in the middle of the two peaks of a bimodal image. The OTSU algorithm tries to find the threshold value (t) that minimizes the weighted within-class variance, given by the relation [14]:

\sigma_w^2(t) = q_1(t)\,\sigma_1^2(t) + q_2(t)\,\sigma_2^2(t), \qquad q_1(t) = \sum_{i=1}^{t} P(i), \; q_2(t) = \sum_{i=t+1}^{I} P(i)

Equation 4-3: Equation through which OTSU Finds Threshold Value (t)
Once the image has been binarized, the next step is to find all the contours. A contour is simply a curve that joins all the continuous pixels that have the same intensity level. The contour-finding function takes three arguments: the first is the source image, the second is the contour retrieval mode and the third is the contour approximation method. The approximation method determines whether the contour holds all the coordinates of the boundary or not. In this research a method called CHAIN_APPROX_SIMPLE is used, which removes redundant points and compresses the contours in order to save memory.
Although there are different contour retrieval modes, this research used RETR_EXTERNAL because of the unique properties that Ethiopian license plates have: for some specific characters like the number codes, rather than taking the circle shape and the number code inside it separately, RETR_EXTERNAL takes the circle and the number code all at once.
Finally, noise (i.e. non-characters) is filtered out based on contour area: any contour that does not have the properties of a plate character, such as its width and height, is removed, and only valid contours are cropped and saved as separate images for the recognition model. A sketch of this binarize-and-filter pipeline is given after the figure below.
Figure 4-9: Plate Character Segmentation Sample
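A minimal sketch of the binarization and contour filtering described above, assuming OpenCV 4's two-value findContours return; the minimum width and height thresholds are illustrative:

import cv2

def segment_characters(plate_gray, min_w=8, min_h=20):
    # Binarize with OTSU; inverted so the characters become white foreground.
    _, binary = cv2.threshold(plate_gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    characters = []
    # Sort left-to-right so the characters keep their order on the plate.
    for c in sorted(contours, key=lambda c: cv2.boundingRect(c)[0]):
        x, y, w, h = cv2.boundingRect(c)
        if w >= min_w and h >= min_h:  # drop noise-sized contours
            characters.append(binary[y:y + h, x:x + w])
    return characters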
4.2.7. Character Recognition
In this part a Convolutional Neural Network (CNN) has been trained to classify, or recognize, the characters on the plate. The plate character images have 3 channels (RGB), with both height and width having a size of 28. ReLU (Rectified Linear Unit) has been used as the activation function, except for the output layer of the network, where SoftMax has been used, since it is the preferable choice for a deep learning model with more than 2 classes [47].
Unlike the older Sigmoid and Tanh functions, ReLU, defined as f(x) = max(0, x), is not saturable, meaning the gradients do not get killed when neurons saturate (i.e. it does not have a vanishing gradient problem). It is also extremely computationally efficient and sparsely activated, meaning there is a strong likelihood for any given unit not to activate at all, since its output is zero for all negative inputs. It also shows better performance across different application areas [4]. Visually:
Figure 4-10: ReLU Activation Diagram
As the final layer, an activation function called SoftMax, which is mostly used in multiclass classification problems, is applied [48]. It extends the idea of logistic regression, which produces a decimal between 0 and 1, to a multi-class problem by assigning a probability to each class; those probabilities must finally add up to 1.0. SoftMax should be used with mutually exclusive classes because it predicts only one class at a time. Its equation is:

\hat{p}_k = \sigma(\mathbf{s}(\mathbf{x}))_k = \frac{\exp(s_k(\mathbf{x}))}{\sum_{j=1}^{K} \exp(s_j(\mathbf{x}))}

Where:
• K is the number of classes, s(x) is a vector containing the scores of each class for instance x, and \hat{p}_k is the estimated probability that the instance x belongs to class k, given the scores of each class for that instance [20].
The number of samples processed before updating the model (the batch size) is 1, and an iterative algorithm, SGD (Stochastic Gradient Descent), has been used as the optimizer while training. SGD is a modification of the standard Gradient Descent algorithm: after computing the gradient, rather than updating the weight matrix over the whole training set, it updates it using only small batches or samples. This makes the algorithm much faster, since the amount of data manipulated at each iteration is very small.
As mentioned above in this section, the recognition model is trained with character images having 3 channels (Red, Green and Blue). The input shape is 28 x 28 x 3. The network has 2 convolutional layers, 2 pooling layers, 3 ReLU activations, 2 fully connected layers and finally a SoftMax activation, stacked in the order: input, convolution, ReLU, pooling, convolution, ReLU, pooling, fully connected, ReLU, fully connected, SoftMax.
Max pooling has been used for all pooling layers. Pooling reduces the spatial size of the input, which in turn reduces the number of parameters and the computation in the network. Although a pool size of more than 2 x 2 can be used for larger input images, for smaller ones like those used in this research, 2 x 2 is the appropriate choice. A Keras sketch of this architecture follows.
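The following Keras sketch mirrors those layer counts; the filter counts (20 and 50) and the hidden layer size (500) are illustrative assumptions, while the optimizer settings follow the final choices reported in Chapter 5:

from tensorflow.keras import layers, models, optimizers

def build_recognizer(num_classes):
    # INPUT => CONV => RELU => POOL => CONV => RELU => POOL
    # => FC => RELU => FC => SOFTMAX
    model = models.Sequential([
        layers.Conv2D(20, (5, 5), padding="same", activation="relu",
                      input_shape=(28, 28, 3)),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(50, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(500, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # SGD optimizer with learning rate 0.01, the setting chosen in Chapter 5.
    model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model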
5. Chapter Five: Results and Discussion
5.1. Introduction
In this chapter, the results of the developed system are evaluated and its outcomes are discussed. The first part is about object detection, the second part about character segmentation and the third part about character recognition (i.e. classification).
5.2. License Plate Detection
The object detection model takes an image containing different objects, including the car, and detects (or localizes) only the license plate. The model was trained with 1100 different images that include both the car and the license plate. The license plate images in the training dataset comprise different plates with varying regional codes and distances from the camera, so that the model is able to generalize better.
The whole dataset was divided into training and testing sets with a ratio of 80 to 20 respectively; the training set contains 880 images and the testing set contains 220 images. The original images were taken with a 13 MP camera at a resolution of 3120 x 4160, which is too big for training, so image pyramiding was used to reduce the resolution: after pyramiding, a 3120 x 4160 image was reduced to 780 x 1040.
The model was trained for 50000 steps with a batch size of 1. It had a learning rate of 0.0001, scheduled to decrease to 0.00001 after 90000 steps and to 0.000001 after 120000 steps; but since the loss stopped dropping after step 40000, the training process was terminated at 50000 steps, so the learning rate remained at 0.0001.
The box classifier's classification and localization losses are 0.018383 and 0.012104 respectively.
Figure 5-2: Graphical Representation of the ResNet Trained Detection Model's Evaluation Results in AP and AR
Among the nine images randomly chosen during model evaluation, the license plates were detected successfully in all cases.
Figure 5-3: Sample Detection Results
The same dataset was also trained with the Inception v2 model, with a learning rate of 0.0002 and a batch size of 1. It had a lower mAP@0.75IOU and AR@0.50IOU, at 0.849132 and 0.732857 respectively, which resulted in the detected plates having some of their parts cut off. Below are its results:
Figure 5-5: Graphical Representation of the Inception V2 Trained Detection Model's Evaluation Results in AP and AR
So, the proposed model not only showed a better detection accuracy than the model by [2], which was developed with conventional image processing techniques and had a detection accuracy of 88.9%, it also performed better than a deep learning model trained with the Inception feature extractor.
Table 5-1: Comparison of the Ethiopian License Plate Detection Model with Some Related Works
As can be seen from the table above, the proposed model achieved a better accuracy comparatively.
5.3. Character Segmentation
The character segmentation part was developed using the conventional image processing methods specifically described in Chapter 4. After testing the algorithm on 15 randomly selected plate images, it segmented 13 of them correctly, but 2 of them had some of their characters missing due to the plate image quality. Both of the incorrectly segmented images are blurred and have very low image quality, which shows a drawback of conventional image processing.
So, the segmentation module achieved a total accuracy of 86.66%, which would not have been achieved without preprocessing steps like orientation adjustment and plate border elimination.
Figure 5-6: Sample of Some Correctly Segmented License Plates
5.4. Character Recognition
The character recognition part was developed using a Convolutional Neural Network which, as stated in Chapter 4, has 2 convolutional layers, 2 pooling layers, 3 ReLU activations, 2 fully connected layers and finally a SoftMax activation. The final model used in this research has a learning rate of 0.01, the SGD optimizer, a batch size of 1 and the ReLU activation function; but to select those hyperparameters, several experiments were conducted.
In the first experiment, after training the model the following results were found: the model was able to get 94% accuracy and 91% AP.
Figure 5-8: CR Model's First Experiment Evaluation Results Graphically
In the second round, the model was trained with a learning rate of 0.001, keeping the other parameters used in Model-1 as they were. The model was able to get 99.9% accuracy and 96% AP, which is a lot better than Model-1, although the training accuracy and validation accuracy curves could still be closer. The average precision also increased by 5% (from 91% to 96%).
Figure 5-9: CR Model’s Second Experiment Evaluation Results Graphically
In the third round, the model was trained with a learning rate of 0.01, keeping the other parameters used in Model-2 as they were. The model was able to get 99.9% accuracy and 98% AP. Model-3 is a better model than Model-1 and Model-2: not only did its AP increase, but its validation accuracy is much closer to its training accuracy, which shows that the model is not overfitted.
Figure 5-10: CR Model’s Third Experiment Evaluation Results Graphically
In the fourth round, the model was trained with a batch size of 128 and a learning rate of 0.01 (the learning rate used in Model-3, kept because it enabled the model to get better accuracy). The other parameters were kept as they were. Changing the batch size to 128 decreased the accuracy from 99.9% to 94.4%, which is a lot, and the AP also decreased from 98% to 92%.
Figure 5-11: CR Model’s Fourth Experiment Evaluation Results Graphically
In the fifth round, the batch size was changed back to 1, the learning rate remained 0.01 and the activation function was changed to tanh; the other parameters were kept the same as Model-4. Model-5 showed very good results, almost as good as Model-3, but its validation accuracy and AP are still lower: Model-3's validation accuracy was 98% while Model-5 got 96.52%, and Model-3's AP was 98% while Model-5's is 97%.
So, a learning rate of 0.01, a batch size of 1, the SGD optimizer, 50 epochs and the ReLU activation function proved to be the best parameters for the LP dataset. With those parameters the model is able to generalize without either overfitting or underfitting.
Figure 5-12: CR Model’s Fifth Experiment Evaluation Results Graphically
So, the research used Model-3 as its recognition model, since it performed better than the other four.
6. Chapter Six: Conclusion and Future Work
6.1. Conclusion
This research studied a deep learning-based approach for the recognition of Ethiopian car license plates. The research is important because Ethiopian license plates have their own unique features and this problem had previously been approached only with conventional image processing methods. The study has three main parts: License Plate Detection, Character Segmentation and Character Recognition.
In the first part, license plate detection, an image in RGB format is given to the detection model, which detects the license plate area with the highest score. A detection model basically scans the whole image in search of a specific object, which in this case is the license plate. The detected plate is then given to the segmentation module. Overall, the detection model was developed with Faster R-CNN using ResNet as the feature extractor (where the original used VGG). The model was able to get 99.1% mAP@0.50IOU.
Once the license plate is detected, its characters are segmented using image processing methods. The segmentation module takes an image, preprocesses it, performs orientation adjustment, removes the borders and finally segments each alphanumeric character. It was built with OpenCV-Python. The segmentation module achieved an accuracy of 86.66%.
The segmented characters are then given to the classification (or recognition) model, which was developed using a Convolutional Neural Network. The CNN model classifies each character image into its corresponding class. It achieved a very good classification accuracy on both the training and validation sets, 99.9% and 98% respectively.
The experimental results of this research show that all three modules perform very well and that each achieved much better accuracy than the related works. This suggests that deep learning is by far superior when it comes to modeling an object or a single class with inconsistent features, such as Ethiopian license plates. Deep learning reduces the load of extracting image features manually by learning directly from the data, which is the main reason it achieves state-of-the-art accuracy.
6.2. Future Work
Both deep learning models, detection and recognition, work well, but the segmentation module, which was developed using conventional image processing, could be made more accurate by re-implementing it with a deep learning-based segmentation approach.
References
[1] X. Lele, A. Tasweer and J. Liyanwen, "A New CNN-Based Method for Multi-Directional Car License Plate Detection," IEEE, p. 11, 2018.
[2] S. Nigussie and A. Yaregal, "Automatic Recognition of Ethiopian License Plates," IEEE, p. 5, 2015.
[3] Y. LeCun, Y. Bengio and G. Hinton, "Deep Learning," Nature, vol. 521, p. 9, 2015.
[4] A. Rosebrock, Deep Learning for Computer Vision with Python, PyImageSearch, 2017.
[5] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.
[8] C. Tang, Y. Feng, Y. Xing, C. Zheng and Y. Zhou, "The Object Detection Based on Deep
Learning," IEEE, p. 6, 2017.
[9] N. Jmour, S. Zayen and A. Abdelkrim, "Convolutional Neural Networks for Image
Classification," IEEE, p. 6, 2018.
[10] ARH, "Automatic Number Plate Recognition," Adaptive Recognition Hungary, [Online]. Available: www.anpr.net/anpr_09/anpr_applicationareas.html. [Accessed 18 December 2018].
[13] I. Culjak, D. Abram, T. Pribanic, H. Dzapo and M. Cifrek, "A Brief Introduction to
OpenCV," IEEE, p. 6, 22 May 2012.
[15] R. Fisher, S. Perkins, A. Walker and E. Wolfart, "Image Processing Learning Resources,"
HIPR2, [Online]. Available: homepages.inf.ed.ac.uk/rbf/HIPR2/hipr_top.htm. [Accessed 10
May 2019].
[17] W.-X. Kang, Q.-Q. Yang and R.-P. Liang, "The Comparative Research on Image
Segmentation Algorithms," IEEE, p. 5, 2009.
[18] L. Chao-yang and L. Jun-hua, "Vehicle License Plate Character Segmentation Method Based
on Watershed Algorithm," IEEE, p. 6, 2010.
[19] F. Chollet, Deep Learning with Python, New York: Manning Publications Co., 2018.
[20] A. Géron, Hands on Machine Learning with Scikit-Learn and Tensorflow, United States of
America: O’Reilly Media, Inc., 2017.
[21] Q. Wu, Y. Liu, Q. Li, S. Jin and F. Li, "The Application of Deep Learning in Computer
Vision," IEEE, p. 6, 2017.
[22] T. Guo, J. Dong, H. Li and Y. Gao, "Simple Convolutional Neural Network on Image
Classification," IEEE, p. 4, 2017.
[25] N. Aloysius and G. M, "A Review on Deep Convolutional Neural Networks," IEEE, p. 5, 2017.
[26] X. Zhou, W. Gong, W. Fu and F. Du, "Application of Deep Learning in Object Detection,"
IEEE, p. 4, 2017.
[27] W. G. Hatcher and W. Yu, "A Survey of Deep Learning: Platforms, Applications and
Emerging Research Trends," IEEE, p. 21, 2018.
[32] Z. Selmi, M. B. Halima and A. M. Adel, "Deep Learning System for Automatic License Plate Detection and Recognition," IEEE, p. 7, 2017.
[33] Z. Zhihong, Y. Shaopu and M. Xinna, "Chinese License Plate Recognition Using a Convolutional Neural Network," IEEE, p. 4, 2008.
[34] C.-H. Lin, Y.-S. Lin and W.-C. Liu, "An Efficient License Plate Recognition System Using
Convolution Neural Networks," IEEE, p. 4, 2018.
[36] M. Mondal, P. Mondal, N. Saha and P. Chattopadhyay, "Automatic Number Plate
Recognition Using CNN Based Self Synthesized Feature Learning," IEEE, p. 4, 2017.
[37] S. Montazzolli and C. Jung, "Real-Time Brazilian License Plate Detection and Recognition
Using Deep Convolutional Neural Networks," IEEE, p. 8, 2017.
[42] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song,
S. Guadarrama and K. Murphy, "Speed/accuracy trade-offs for modern convolutional object
detectors," IEEE, p. 10, 2017.
[43] J. Hui, "Object detection: speed and accuracy comparison (Faster R-CNN, R-FCN, SSD,
FPN, RetinaNet and YOLOv3)," 18 March 2018. [Online]. Available:
https://medium.com/@jonathan_hui/object-detection-speed-and-accuracy-comparison-
faster-r-cnn-r-fcn-ssd-and-yolo-5425656ae359. [Accessed 20 October 2019].
[44] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection
with Region Proposal Networks," IEEE, p. 14, 2016.
[45] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," IEEE,
p. 9, 2015.
[46] A. Krizhevsky, "The CIFAR-10 Dataset," 2009. [Online]. Available:
https://www.cs.toronto.edu/~kriz/cifar.html. [Accessed 10 March 2019].
Appendixes
Appendix A: Sample Code for Detection Model’s Configuration
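The original configuration listing is not reproduced here. The fragment below is an illustrative sketch of a TFOD API pipeline.config for a single-class Faster R-CNN ResNet101 detector, filled in with the hyperparameters reported in this document (batch size 1, initial learning rate 0.0001 with the step schedule from Chapter 5, 50000 training steps); values beyond those reported are assumptions:

model {
  faster_rcnn {
    num_classes: 1  # a single class: the license plate
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: "faster_rcnn_resnet101"
      first_stage_features_stride: 16
    }
  }
}
train_config {
  batch_size: 1
  num_steps: 50000
  optimizer {
    momentum_optimizer {
      learning_rate {
        manual_step_learning_rate {
          initial_learning_rate: 0.0001
          schedule { step: 90000 learning_rate: 0.00001 }
          schedule { step: 120000 learning_rate: 0.000001 }
        }
      }
      momentum_optimizer_value: 0.9  # assumed default
    }
  }
}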
Appendix B: Sample Code for Character Segmentation
Appendix C: Sample Code for Character Recognition / Classification Model