
ARBA MINCH UNIVERSITY

ARBA MINCH INSTITUTE OF TECHNOLOGY

SCHOOL OF GRADUATE STUDIES

Developing Deep Learning Based Ethiopian Car’s License Plate Detection and
Recognition Model

MASTER’S THESIS
By
Joseph Wondwosen

A THESIS SUBMITTED TO THE SCHOOL OF GRADUATE STUDIES IN PARTIAL
FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF
SCIENCE IN COMPUTER SCIENCE

ADVISOR: Dr. Anusuya (PhD)

February 2, 2020
Arba Minch, Ethiopia
ARBA MINCH UNIVERSITY

SCHOOL OF GRADUATE STUDIES

ADVISORS APPROVAL SHEET

This is to certify that the thesis entitled "Developing Deep Learning Based Ethiopian Car's License
Plate Detection and Recognition Model", submitted in partial fulfillment of the requirements for
the degree of Master of Science in Computer Science, the graduate program of the Faculty of
Computing & Software Engineering, Arba Minch University, Arba Minch Institute of Technology,
has been carried out by Joseph Wondwosen under my supervision.

Therefore, I recommend that the student has fulfilled the requirements and hence may hereby submit
the thesis to the faculty for defense.

Dr. Anusuya (PhD) (Asst. Prof.) _________________________ __________________

Name of Principal Advisor Signature Date

Declaration
I hereby declare that this MSc thesis is my original work and has not been presented in any other
university, and that all sources of materials used for the thesis have been fully acknowledged.

Name: Joseph Wondwosen

Signature: _______________

Date: _______________

ARBA MINCH UNIVERSITY

SCHOOL OF GRADUATE STUDIES

EXAMINER’S APPROVAL SHEET

We, the undersigned, members of the Board of Examiners of the final open defense by Joseph
Wondwosen, have read and evaluated his thesis entitled "Developing Deep Learning Based
Ethiopian Car's License Plate Detection and Recognition Model" and examined the candidate. This
is, therefore, to certify that the thesis has been accepted in partial fulfillment of the requirements
for the degree of Master of Science in Computer Science.

Approved By:

___________________ ____________________ ____________________


(Chairperson) Signature Date

___________________ ____________________ ____________________


(External Examiner) Signature Date

___________________ ____________________ ____________________


(Internal Examiner) Signature Date

Acknowledgement

First of all, I would like to thank GOD for everything that has happened in my life. His love and
mercy have always been with me through all my ups and downs even though I don’t deserve any
of them.

I would also like to thank my thesis advisor Dr. Anusuya. She consistently allowed this paper to
be my own work, but steered me in the right direction whenever she thought I needed it.

Finally, I must express my very profound gratitude to my wife Mercy Mohammed, my father
Wondwosen Tefera, my mother Antonina Tefera, my sister Scarlet Wondwosen and my brother
Nathanael Wondwosen for providing me with unfailing support and continuous encouragement
throughout my years of study and through the process of researching and writing this thesis. This
accomplishment would not have been possible without them. Thank you.

Table of Contents

Acknowledgement ...................................................................................................................................... iv
List of Figures............................................................................................................................................ vii
List of Tables ............................................................................................................................................ viii
List of Equations ........................................................................................................................................ ix
List of Abbreviations and Acronyms ........................................................................................................ x
Abstract...................................................................................................................................................... xii
1. CHAPTER ONE: INTRODUCTION ............................................................................................... 1
1.1. Background ................................................................................................................................. 1
1.2. Statement of Problem ................................................................................................................. 2
1.3. Research Questions ..................................................................................................................... 3
1.4. Objectives..................................................................................................................................... 3
1.4.1. General Objective ............................................................................................................... 3
1.4.2. Specific Objectives .............................................................................................................. 3
1.5. Significance of the Study ............................................................................................................ 4
1.6. Scope and Limitation of the Study ............................................................................................ 4
1.7. Organization of the Thesis ......................................................................................................... 4
2. CHAPTER TWO: REVIEW OF RELATED LITERATURE ......................................................... 6
2.1. Introduction ................................................................................................................................. 6
2.2. Image Processing ......................................................................................................................... 6
2.2.1. Image Processing Techniques ............................................................................................ 7
2.2.2. Image Segmentation.......................................................................................................... 10
2.3. Deep Learning ........................................................................................................................... 12
2.3.1. Deep Learning Based Object Recognition (or Classification) ....................................... 13
2.3.2. Deep Learning Based Object Detection .......................................................................... 15
2.3.3. Deep Learning Platforms ................................................................................................. 17
2.4. License Plates ............................................................................................................................ 18
2.5. Related Work ............................................................................................................................ 20
2.5.1. Research on Ethiopian License Plates ......................................................... 20
2.5.2. Research on Non-Ethiopian License Plates ................................................. 20
2.5.3. Review Summary .............................................................................................................. 22
3. CHAPTER THREE: RESEARCH METHODOLOGY................................................................ 27

3.1. Introduction ............................................................................................................................... 27
3.2. Research Approach ................................................................................................................... 27
3.3. Process Design ........................................................................................................................... 28
3.4. Data Collection .......................................................................................................................... 29
3.5. Data Analysis ............................................................................................................................. 30
3.6. Tools ........................................................................................................................................... 31
3.6.1. Hardware Tools ................................................................................................................. 32
3.6.2. Software Tools ................................................................................................................... 32
3.6.3. Programming Languages and Platforms ........................................................................ 32
4. CHAPTER FOUR: RESEARCH DESIGN .................................................................................... 36
4.1. Introduction ............................................................................................................................... 36
4.2. Proposed Design ........................................................................................................................ 36
4.2.1. Image Data Acquisition .................................................................................................... 37
4.2.2. Cleaning ............................................................................................................................. 38
4.2.3. Augmentation .................................................................................................................... 38
4.2.4. Splitting .............................................................................................................................. 39
4.2.5. License Plate Detection ..................................................................................................... 39
4.2.6. Character Segmentation ................................................................................................... 43
4.2.7. Character Recognition...................................................................................................... 47
5. Chapter Five: Results and Discussion ............................................................................................. 50
5.1. Introduction ............................................................................................................................... 50
5.2. Object Detection ........................................................................................................................ 50
5.3. Character Segmentation ........................................................................................................... 54
5.4. Character Recognition.............................................................................................................. 56
6. Chapter Six: Conclusion and Future Work ................................................................................... 62
6.1. Conclusion ................................................................................................................................. 62
6.2. Future Work .............................................................................................................................. 63
Appendixes ................................................................................................................................................ 69
Appendix A: Sample Code for Detection Model’s Configuration .................................................... 69
Appendix B: Sample Code for Character Segmentation................................................................... 71
Appendix C: Sample Code for Character Recognition / Classification Model ............................... 75

List of Figures

Figure 2-1: Typical CNN Architecture ......................................................................................... 15


Figure 3-1: Research Design Process ........................................................................................... 28
Figure 3-2: Tool Selection Diagram ............................................................................................. 31
Figure 4-1: Overall System's Design Diagram ............................................................................. 37
Figure 4-2: Visual representation of image pyramid with 5 levels (image source: Wikipedia) ... 38
Figure 4-3: License Plate Detection Pipeline................................................................................ 39
Figure 4-4: Faster R-CNN Architecture Diagram ........................................................................ 40
Figure 4-5: Sample Input Images and Their Output After Plate Has Been Detected ................... 42
Figure 4-6: License Plate Extraction Samples .............................................................................. 43
Figure 4-7: License Plate Orientation Adjustment Sample .......................................................... 45
Figure 4-8: Border Elimination Sample ........................................................................................ 45
Figure 4-9: Plate Character Segmentation Sample ....................................................................... 47
Figure 4-10: ReLU Activation Diagram ....................................................................................... 48
Figure 4-11: Character Recognition Model's Architecture ........................................................... 49
Figure 5-1: ResNet Trained Detection Model's Evaluation Result in AP and AR ......................... 50
Figure 5-2: Graphical Representation of ResNet Trained Detection Model's Evaluation Result in
AP and AR ...................................................................................................................................... 51
Figure 5-3: Sample Detection Results .......................................................................................... 52
Figure 5-4: Inception V2 Trained Detection Model's Evaluation Result in AP and AR ................ 53
Figure 5-5: Graphical Representation of Inception V2 Trained Detection Model's Evaluation
Result in AP and AR ....................................................................................................................... 53
Figure 5-6: Sample of Some Correctly Segmented License Plates .............................................. 55
Figure 5-7: Sample of Some Incorrectly Segmented License Plates ............................................ 55
Figure 5-8: CR Model's First Experiment Evaluation Results Graphically .................................. 57
Figure 5-9: CR Model’s Second Experiment Evaluation Results Graphically............................. 58
Figure 5-10: CR Model’s Third Experiment Evaluation Results Graphically ............................. 59
Figure 5-11: CR Model’s Fourth Experiment Evaluation Results Graphically ............................ 60
Figure 5-12: CR Model’s Fifth Experiment Evaluation Results Graphically............................... 61

List of Tables

Table 2-1: Ethiopian License Plate Classification Based on Service Code .................................. 19
Table 2-2: Ethiopian License Plate Classification Based on Regional Code ............................... 19
Table 2-3: Review Summary for paper [2] ................................................................................... 22
Table 2-4: Previous Works on Object Detection and Recognition Part 1 .................................... 23
Table 2-5: Previous Works on Object Detection and Recognition Part 2 .................................... 25
Table 3-1: Ethiopian Car's License Plate Color Properties Based on Their Code ........................ 29
Table 4-2: ResNet101 Network Architecture ............................................................................... 41
Table 4-3: Character Recognition Model's Architecture .............................................................. 49
Table 5-1: Comparison of Ethiopian License Plate's Detection Model with Some Related Works
....................................................................................................................................................... 54
Table 5-2: CR Model’s First Experiment Parameters................................................................... 56
Table 5-3: CR Model’s First Experiment Evaluation Results ...................................................... 56
Table 5-4: CR Model’s Second Experiment Evaluation Results ................................................. 57
Table 5-5: CR Model’s Third Experiment Evaluation Results ..................................................... 58
Table 5-6: CR Model’s Fourth Experiment Evaluation Results ................................................... 59
Table 5-7: CR Model’s Fifth Experiment Evaluation Results ...................................................... 60
Table 5-8: Summary Table for The Experiments ......................................................................... 61

List of Equations

Equation 4-1: Equation of Canny Edge Detector ......................................................................... 44


Equation 4-2: Equation for Transformation Matrix in HoughLines ............................................. 44
Equation 4-3: Equation through which OTSU Finds Threshold Value (t) ................................... 46
Equation 4-4: Equation for SoftMax Activation Function ........................................................... 48

List of Abbreviations and Acronyms

AP Average Precision

CNN Convolutional Neural Network

Conv Convolution

CPU Central Processing Unit

CR Character Recognition

DNN Deep Neural Network

ELPR Ethiopian License Plate Recognition

Faster R-CNN Faster Region-based Convolutional Neural Network

FC Fully Connected

GPU Graphical Processing Unit

HOG Histogram of Oriented Gradients

ITS Intelligent Transportation System

LP License Plate

LPR License Plate Recognition

mAP Mean Average Precision

MP Mega Pixel

ms Milliseconds

OpenCV Open Source Computer Vision

ReLU Rectified Linear Unit

R-FCN Region-Based Fully Convolutional Networks

RGB Red Green Blue

RNN Recurrent Neural Network

ROI Region of Interest

SGD Stochastic Gradient Descent

SSD Single Shot Detector

TFODAPI Tensorflow Object Detection Application Programming Interface

YOLO You Only Look Once

Abstract

Digital Image Processing is the application of computer algorithms to process, manipulate and
interpret images. As a field it is playing an increasingly important role in many aspects of people's
daily life. Even though Image Processing has accomplished a great deal on its own, research is
nowadays being conducted on combining it with Deep Learning (which is part of a broader family,
Machine Learning) to achieve better performance in detecting and classifying objects in an image.
License Plate Recognition is one of the hottest research topics in the domain of Image
Processing (Computer Vision). It has a wide range of applications, since the license number is the
primary and mandatory identifier of motor vehicles. License plates in Ethiopia have unique
features like Amharic characters, differing dimensions and plate formats. Although there is one
prior study on ELPR, it was attempted using conventional image processing techniques and never
with deep learning. In this research an attempt has been made at tackling the problem of
recognizing Ethiopian license plates with better accuracy using both deep learning and image
processing. Tensorflow was used in building the deep learning models and all the image
processing was done with OpenCV-Python. The developed deep learning model was able to
recognize Ethiopian license plates with better accuracy, achieving 99.1%, 86.66% and 98%
accuracy on plate detection, segmentation and recognition respectively, which averages to an
overall accuracy of 94.6%.

Keywords: Amharic Characters, Deep Learning, Image Processing, License Plate, OpenCV-
Python, Tensorflow

1. CHAPTER ONE: INTRODUCTION

1.1. Background

In recent years, the number of motor vehicles has increased considerably and this has, in turn,
exacerbated the traffic management burden. The resultant congestion has caused extreme problems
such as traffic accidents or public space vulnerability to crime or terrorist attacks [1].
As a result, various Intelligent Transportation Systems (ITSs) have emerged as viable solutions to
those problems. ITSs apply technologies of information processing and communication on transport
infrastructures to improve transportation outcomes. One of these systems is License Plate
Recognition (LPR), the most popular and important element of ITS [2].
The main job of an LPR system is to detect the license plate among the rest of the objects found in
an image and recognize its alphanumeric characters, which in the case of Ethiopian license plates
are Amharic characters, digits (ranging from 0-9) and, in most cases, English characters as well [2].
To the best of the researcher's knowledge, to date there is one research that has been conducted
specifically on Ethiopian license plates. It was attempted using conventional Image Processing
algorithms and was able to achieve an overall accuracy of 63.1%.
In this research an effort has been made at detecting and recognizing Ethiopian license plates using
Deep Learning. The basic advantage of using deep learning is that, unlike the traditional template
matching algorithms of image processing, feature extraction is done by directly learning
from images, text or sound [3] [4] [5]. As a result, deep learning models can achieve state-of-the-art
accuracy, sometimes even outperforming domain experts in their respective fields [6].
LPR systems basically contain three phases or stages: Plate Detection, Character Segmentation
and Character Recognition.

Plate Detection falls under one of the broader typical tasks of computer vision, namely
Object Detection [7]. Object Detection is scanning or going through an image in search of a
specific object, and many algorithms have been used to date in fulfillment of this
task. Before the emergence of deep learning, different mathematical models based on some
prior knowledge, like Hough Transform, Frame Difference, Background Subtraction, Optical
Flow, Sliding Window and Deformable Part methods, were used and are still used to some extent,
but currently deep learning-based algorithms show state-of-the-art performance when it comes
to object detection [8]. In this research an effort has been made at detecting the license plate
from an image using a deep learning approach.

The second phase is Character Segmentation which is performed on the detected or extracted plate
image. This phase aims at separating each character found on a detected plate and feeding them
individually to the next phase, to be recognized. There are many image processing techniques and
libraries used for image segmentation; this research used the OpenCV-Python library
for its image preprocessing and segmentation tasks.

The third and final phase is recognition, where each character that has been segmented earlier gets
recognized or classified. There are many traditional image processing techniques, like Histograms
of Oriented Gradients, which are used to represent an image to be classified, but they need a domain
expert's knowledge and guidance for feature extraction. There is also a deep learning approach
which automatically learns and extracts those features by itself, which in turn has a dramatic impact
on the performance of the classification task [9].

Overall, in building the deep learning models a library called Tensorflow has been used and for all
other image processing and segmentation tasks the research used OpenCV-Python.

1.2. Statement of Problem

It is known that the number of motor vehicles on the main roads of Ethiopian cities is increasing
considerably every year, which causes a severe traffic congestion problem. The congestion
problem in turn leads to many other problems like loss of life from traffic accidents, vehicle thefts
and other security-related problems. Even though this problem is being addressed through
Intelligent Transportation Systems (ITSs) in most developed countries through research, in
Ethiopia there has only been one study, and it was attempted using conventional OCR-based
Image Processing.

In the case of License Plate Recognition, the image of a car whose plate is to be processed may
have an angle where the plate is only partially visible, low resolution, varying distance from the
camera and low lighting conditions, which makes it very hard (challenging) to process and recognize.

1.3. Research Questions

➢ RQ 1: Which model building parameter set is better for training our model and achieving
better accuracy?
➢ RQ 2: How much of an accuracy boost can be achieved by using a Deep Learning based
approach to solve the recognition problem of Ethiopian license plates, which have their own
unique morphology and set of characters (Amharic and English characters in addition to
Arabic numerals)?

1.4. Objectives

1.4.1. General Objective

As a general objective, this research tries to build a Deep Learning based Ethiopian license plate
recognition model with better accuracy.

1.4.2. Specific Objectives

In order to meet the general objective specified above, the research has the following specific
objectives:

➢ To prepare two datasets for the detection and recognition models, where the detection
model's dataset will include images of the whole car with some other surrounding objects
in context while the images are being captured, and the recognition model's dataset will
include images of individual characters that are composed of Amharic and English letters
and Arabic numerals.
➢ Analyzing different model building parameter sets in the context of our problem and
building both classification and detection models with the better one.
➢ Building image processing Python scripts which make use of the better image processing
algorithms available: a detected license plate is going to pass through many image
processing stages (like resizing, noise removal, binarization, contouring, segmentation)
before being fed as an input to the classification model.
➢ Building a script that can process and interpret the output of the classification model.
➢ Measuring the accuracy of the overall model.

1.5. Significance of the Study

This research has a huge impact not only on the transportation system but also in testing the limits
of deep learning models in handling such a unique problem with its own features and varieties.

Considering its applicability, it can be used significantly in many areas like: Parking, Access
Control in Restricted Areas, Motorway Tolling, Border Control, Law Enforcement and many
more [10].

The techniques and approaches used in this research will definitely benefit the field of computer
vision in the context of license plate recognition, since by default it inherits the challenges of
computer vision like Image Classification, Object Detection and Segmentation [11].

1.6. Scope and Limitation of the Study

Below is the scope of the proposed research:

➢ The LPR model has been designed by considering Ethiopian license plates only.
➢ For both Amharic and English characters, only selected ones that are used on Ethiopian
license plates around Addis Ababa city have been used in training the recognition model,
since it was impossible to collect images from all over the country due to time constraints.
➢ Digits (ranging from 0 to 9) are going to be used in training the recognition model.
➢ The model is able to both detect and recognize Ethiopian license plates.
➢ The model detects only license plates from an image that contains many other
objects, and once a plate is detected, it is able to recognize it.

1.7. Organization of the Thesis

The entire document has a total of 6 chapters including the current chapter. The second chapter
covers the literature review and related works, focusing on the different image processing and
deep learning technologies (methods) that are used and mentioned in the literature to date. The
third chapter is about the research methodologies used in conducting this research. The fourth
chapter is about the design of the research and documents the overall architecture as a whole.
It discusses all the modules and algorithms used in developing the Ethiopian license plate detection
and recognition model. The fifth chapter is all about evaluation of the developed model and the
different experiments that were conducted in order to come up with an optimal model. Finally,
chapter six concludes the overall results and findings of this research. It also contains
recommendations for future work.

2. CHAPTER TWO: REVIEW OF RELATED LITERATURE

2.1. Introduction

This chapter deals with the review of literature with respect to this research's domain area and the
problem it aims to solve, which is the detection and recognition of Ethiopian car's license plates.
Since this research works on the integration of two domain areas, deep learning and image
processing, the next two consecutive sections, Section 2.2 and Section 2.3, focus on reviewing the
literature on image processing and deep learning respectively. Section 2.2 briefly reviews the
concept behind the whole image processing field and well known image processing techniques;
finally, since the second main task of an LPR system is character segmentation, which falls under
the image processing domain, a review of different image segmentation techniques and algorithms
has been made. Section 2.3 is a review of deep learning in general, deep learning-based object
recognition (or classification), deep learning-based object detection and deep learning platforms.
Then, Section 2.4 is about license plates (or number plates) and the work that has been done in
detecting them (i.e. in an automated way). Finally, in Section 2.5 some major journal articles and
publications which are related to this research have been reviewed, and there is a conclusion that
outlines the implications and significance of the identified themes for this research and field of
study. The conclusion also outlines how and why this research aims to address the gaps identified.

2.2. Image Processing

Images are built from colors or intensities of light which are called pixels. Pixels, being the building
blocks of an image, can be represented in either grayscale (single channel) or colored form. In
grayscale images every pixel is represented as a scalar value ranging between 0 and 255
(where 0 represents 'black' and 255 represents 'white'), while colored pixels are not scalar but
rather represented by a list of three values, which mostly stand for Red, Green and Blue when we
are working with the RGB color space [4].

So, Digital Image Processing, which is a subcategory of Signal Processing, is all about the use of
computers to process digital images [12]. Image processing is used in solving many major tasks
of Computer Vision, which is the science of programming computers so that they can understand
images and video in a highly detailed way, or in other words making computers see [13].

Although computer vision with traditional image processing techniques, where we have to
explicitly perform feature extraction, has been applied in different application areas and has shown
some promising results, when it comes to using it to solve complicated problems like object
detection in an image that may have different factors of variation like [4]:

➢ Viewpoint Variation: when an object is rotated or oriented in multiple dimensions
depending on how the object is photographed or captured.
➢ Scale Variation: when the object is the same but varies in size.
➢ Deformation: when the object to be detected has a deformity in shape, which makes it really
difficult to detect compared to other variations.
➢ Occlusion: the object to be detected is hidden or covered by something (i.e. probably some
other object) in the image.
➢ Changes in Illumination: objects in an image captured under different lighting conditions
(i.e. low lighting or high lighting).
➢ Background Clutter: difficulty of identifying an object in an image because it has a very
noisy background.
➢ Intra-class Variation: objects that belong to the same class but have diversified forms.

it may not be possible to achieve the desired accuracy.

2.2.1. Image Processing Techniques

This research uses many image processing algorithms (i.e. techniques) for different tasks like
Geometric Transformations, Smoothing, Morphological Transformations, Thresholding, Edge
Detection and Contouring.

Below is their description [14] [15]:

2.2.1.1.Geometric Transformations
Geometric transformation is an image preprocessing technique which enables us to remove
geometric distortion from a given image. It is a one-to-one mapping, or bijection, of a set having
some geometric structure to itself or another set. It can be categorized based on its operand sets.

Different operations of geometric transformations include:

i. Translation: a kind of geometric transformation which maps the position of each
pixel in an input image to a new corresponding position in an output image. In
translation every pixel is moved by the same distance in a chosen direction. It is an
affine transformation with no fixed points.
ii. Scaling: image scaling refers to changing or manipulating the size of a given digital
image (i.e. resizing). It can be used to either shrink or zoom an image (or part of it);
it is done either by subsampling, where a group of pixels is replaced by one randomly
chosen pixel from that group, or by pixel replication.
iii. Rotation: a geometric transformation which maps a pixel at position (x1, y1) to
(x2, y2) in the output image by rotating it by a user defined angle value. It is a circular
movement of an object around a center of rotation.
iv. Affine Transform: the application of a linear combination of translation, rotation
and scaling to the intensity of a pixel located at (x1, y1) in an input image, in order
to map it to a new pixel located at (x2, y2) in an output image.
A set of parallel lines remains parallel after an affine transformation.
v. Perspective Transform: all straight lines in the original image are kept straight in the
output image as well (see the sketch below).
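To make these operations concrete, here is a minimal OpenCV-Python sketch (the input file name and the perspective source points are hypothetical, chosen only for illustration):

    import cv2
    import numpy as np

    img = cv2.imread('car.jpg')  # hypothetical input image
    h, w = img.shape[:2]

    # Scaling: shrink to half size (subsampling via area interpolation)
    small = cv2.resize(img, (w // 2, h // 2), interpolation=cv2.INTER_AREA)

    # Rotation: rotate 15 degrees around the image center (an affine transform)
    M_rot = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)
    rotated = cv2.warpAffine(img, M_rot, (w, h))

    # Perspective transform: map four source points onto a flat rectangle,
    # keeping all straight lines straight in the output
    src = np.float32([[56, 65], [368, 52], [28, 387], [389, 390]])
    dst = np.float32([[0, 0], [300, 0], [0, 300], [300, 300]])
    M_persp = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(img, M_persp, (300, 300))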

2.2.1.2.Thresholding
It is the simplest way of image segmentation. The three variants below are sketched in OpenCV
afterwards.

i. Simple Thresholding: replaces each pixel of an image with either a black or white
pixel depending on whether the intensity of the pixel is less than or greater than
some fixed constant value.
ii. Adaptive Thresholding: simple thresholding is limited when it comes to images
having different lighting conditions in different areas.
In adaptive thresholding different thresholds are applied to different regions, letting
it achieve a better result for images with varying illumination. The threshold value
can be either a mean or a weighted sum of the neighborhood area.
iii. Otsu's Binarization: used for bimodal images, whose histogram has two peaks. It
automatically takes a threshold value that is in the middle of the two peaks.
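Here is a minimal OpenCV-Python sketch of the three variants (the input file name, the constant 127 and the 11x11 neighborhood are illustrative assumptions):

    import cv2

    gray = cv2.imread('plate.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical grayscale input

    # Simple thresholding against a fixed constant (127)
    _, simple = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

    # Adaptive thresholding: the threshold is a Gaussian-weighted sum of each
    # pixel's 11x11 neighborhood, minus a constant of 2
    adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 11, 2)

    # Otsu's binarization: the threshold t is chosen automatically from the histogram
    t, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)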

2.2.1.3.Smoothing
Smoothing is achieved by convolving an image with a low-pass filter kernel, which removes
high frequency content like noise and edges. OpenCV provides four types of smoothing or
blurring techniques (sketched below):
i. Averaging: simply replacing the central pixel under a kernel or filter area with the
average of the pixels under that same kernel.
ii. Gaussian Filtering: doesn't use a filter consisting of equal filter coefficients. It is
typically used to reduce noise and is achieved by convolving an image with a
Gaussian function, which reduces the image's high-frequency components.
iii. Median Filtering: looks like averaging, but instead of computing the average it
calculates the median of all pixels under the kernel and replaces the central pixel
with the computed median value. Unlike other filtering methods, the central value
is always replaced by a value present in the image.
iv. Bilateral Filtering: unlike other filters, bilateral filtering is effective in removing
noise in an image while preserving its edges. It replaces each pixel with a weighted
average of intensity values from nearby pixels, where the weight is based on a
Gaussian distribution.
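A minimal OpenCV-Python sketch of the four techniques (kernel sizes and the input file name are illustrative assumptions):

    import cv2

    img = cv2.imread('plate.jpg')  # hypothetical input image

    avg = cv2.blur(img, (5, 5))                      # averaging over a 5x5 kernel
    gauss = cv2.GaussianBlur(img, (5, 5), 0)         # Gaussian filtering; sigma derived from kernel size
    median = cv2.medianBlur(img, 5)                  # median of all pixels under a 5x5 aperture
    bilateral = cv2.bilateralFilter(img, 9, 75, 75)  # edge-preserving bilateral filter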

2.2.1.4.Morphological Transformations
Operations performed on binary images based on the image shape (see the sketch below).
i. Erosion: a way through which we erode the boundaries of an object in the foreground.
It is achieved by convolving a kernel over the image, where a given pixel remains 1
if and only if all pixels under the kernel are 1; otherwise it is eroded (set to zero).
ii. Dilation: the opposite of erosion, where each pixel under a kernel is set to 1 if at
least one pixel under the same kernel is 1.
iii. Opening: performing erosion followed by dilation, aiming at removing noise.
iv. Closing: the opposite of opening, which is performing dilation followed by erosion.
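A minimal OpenCV-Python sketch of the four operations on a binary image (the 3x3 kernel and the input file name are illustrative assumptions):

    import cv2
    import numpy as np

    binary = cv2.imread('chars.png', cv2.IMREAD_GRAYSCALE)  # hypothetical binary image
    kernel = np.ones((3, 3), np.uint8)

    eroded = cv2.erode(binary, kernel, iterations=1)
    dilated = cv2.dilate(binary, kernel, iterations=1)
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # erosion then dilation
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # dilation then erosion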

2.2.1.5.Edge Detection
Aims at identifying pixels in an image where they form a curved line segment known as an edge. It
is one of the fundamental steps in image processing and computer vision.

i. Canny Edge Detection: a multistage algorithm that mainly goes through the
following stages:
➢ Noise reduction through Gaussian filtering.
➢ Finding the intensity gradient of the image, which is achieved by filtering it with
a Sobel kernel in both directions (i.e. vertical and horizontal).
➢ Non-maximum suppression, by scanning the full image and removing pixels
which don't form an edge.
➢ Hysteresis Thresholding: making the final decision of whether the detected
edges are really edges or not. It performs thresholding with two threshold
values, a minimum and a maximum. If the intensity gradient is greater than the
max value it is an edge; if it is below the min value it is not. Finally, if the value
is in between max and min, the decision is made by its connectivity. A minimal
OpenCV sketch follows.
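A minimal OpenCV-Python sketch (the 50/150 hysteresis thresholds and the input file name are illustrative assumptions):

    import cv2

    gray = cv2.imread('plate.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical input
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # stage 1: noise reduction
    edges = cv2.Canny(blurred, 50, 150)          # gradient, suppression and hysteresis (min=50, max=150)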

2.2.1.6.Contouring
A contour is a curve joining all the continuous points along a boundary that have the same
color or intensity. It is very important for object detection and recognition.
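A minimal OpenCV-Python sketch of contour extraction (assuming OpenCV 4.x, where findContours returns two values; the input file name is hypothetical):

    import cv2

    gray = cv2.imread('plate.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical input
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Retrieve the outer contours, approximating each curve with fewer points
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)  # bounding box of each contour, e.g. a character candidate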

2.2.2. Image Segmentation

Image segmentation, being an important part of image processing, is all about dividing an image
into smaller pieces called segments. It is concerned with partitioning the image into meaningful
components which have similar features or pixel characteristics [16].

There are different kinds of image segmentation techniques, some of which are mentioned in the
previous section. In general, according to [16] and [17], segmentation algorithms can be
categorized as:

➢ Edge based segmentation: edges are a set of linked or connected pixels which are found
between boundaries of different regions. Edges can be found by calculating the derivative
of the image function. Different techniques fall under this category, like the Gray
Histogram Method and the Gradient Based Method.
The gray histogram method is based on the fitness of the threshold. But since the gray
histogram is uneven due to the impact of noise, searching for the minimum and maximum
gray values is very difficult.
The gradient based method is recommended and mostly used for non-noisy images where
the gray intensity around edges is intense enough. Different well-known algorithms like
Differential Coefficient, Laplacian of Gaussian and Canny are categorized under it, where
Canny is the most representative of them.
➢ Region based segmentation: divides an image into a set of areas or regions based on pixels
that have similar characteristics. The main methods under this category are Thresholding,
Region Growing, and Region Splitting and Merging.
Thresholding is choosing the right threshold value in order to separate image pixels into
different classes by using a principle that is based on characteristics of the image. It is the
most popular and commonly used method in image segmentation. It separates the
foreground objects from the background where the objects are found to be lighter than the
background.
Region Growing works by first selecting a set of seed points from an image and then adding
those seeds to their neighboring pixels based on some predefined criteria like gray scale or
color. Adding similar regions or sub-regions in this way results in region growing.
Region Splitting and Merging works by randomly choosing regions and then trying to either
split or merge them based on certain conditions. First the image is divided into regions until
there are no more regions to divide (or split), and then regions are merged until there are no
more similar (or related) regions to merge.
➢ Special theory-based segmentation: includes segmentation techniques like Fuzzy
Clustering and Neural Network-based methods.
Fuzzy Clustering uses fuzzy set theory to cluster and enables fuzzy boundaries to exist
between different clusters.
The Neural Network-based method, unlike the previously mentioned segmentation
methods, uses dynamic equations to find edges by first mapping each pixel of an input
image into the neural network, where each neuron of the network's input layer represents a
single pixel.
➢ Watershed based segmentation: the basic view in this algorithm is that the image is
considered as a topographic surface and each pixel is taken as the altitude above sea level.
Each local minimum, including its neighborhood, in the image is considered a catchment
basin and its boundary is called a watershed.
The aim of this algorithm is finding the local maximum of the segmented region [18]. A
minimal sketch of marker-based watershed segmentation follows.
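Here is a minimal OpenCV-Python sketch of marker-based watershed segmentation, following the standard OpenCV recipe rather than any code from this thesis (the input file name and the 0.7 foreground cutoff are illustrative assumptions):

    import cv2
    import numpy as np

    img = cv2.imread('objects.png')  # hypothetical color input
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Sure foreground from the distance transform; sure background by dilation
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.7 * dist.max(), 255, 0)
    sure_fg = np.uint8(sure_fg)
    sure_bg = cv2.dilate(binary, np.ones((3, 3), np.uint8), iterations=3)
    unknown = cv2.subtract(sure_bg, sure_fg)

    # Label each catchment basin, leave the unknown region as 0, then flood
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1          # make sure the background label is 1, not 0
    markers[unknown == 255] = 0    # region the algorithm itself must decide
    markers = cv2.watershed(img, markers)  # boundaries (watersheds) are marked with -1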

2.3. Deep Learning

Deep learning is a subfield of a broader family called Machine Learning, which is all about enabling
computers to learn from given data on their own rather than being told what to do using some
set of rules [19]. Deep learning is a form of representation learning which learns from
representations of data through its multiple layers with multiple levels of abstraction. It solved the
limitation of conventional machine learning in learning from raw data, which instead required a
feature extractor to convert raw data into an appropriate representation or feature vector [3].

Representation learning enables AI systems to achieve better performance than that of hand-
engineered representations. Not only that, but it also enables achieving higher adaptability and
saves a great deal of human time and effort which would otherwise be wasted in manually designing
features [5].

The "deep" in the name deep learning does not refer to the level of depth of understanding but rather
to the amount or number of successive layers of representation. Deep learning can be conceptualized
as large neural networks with many layers built on top of each other [4].

Machine learning algorithms, whether deep or not, are categorized into three main categories based
on how they learn [20]:

➢ Supervised learning: requires the training data to be labeled, so as a result the output
will be supervised. It has two major tasks: Classification and Regression.
➢ Unsupervised learning: the training data is not labeled; as a result the output is not
supervised.
➢ Reinforcement learning: the learning architecture here interacts with the environment
and gets rewards or penalties (i.e. negative rewards). The learning system, which is also
known as an agent, learns by applying the best strategy (called a policy) so that it gets
more rewards over time.

Although deep learning can be applied in many areas, the concern of this research is its
application to computer vision. Because of its ability to extract features on its own, it is
currently being widely used in the field of computer vision. It is not only dominating but also
replacing the traditional machine learning algorithms which were very popular in their own time
[21].

2.3.1. Deep Learning Based Object Recognition (or Classification)

Even though there are many deep learning algorithms out there, when it comes to their applicability
in computer vision tasks like object recognition, the Convolutional Neural Network demonstrates
state-of-the-art performance [19] [22] [23].

The convolutional neural network, also called ConvNet or CNN, is one of the algorithms in the
family of Deep Neural Networks; it is inspired by the visual cortex of animals and is widely used
in computer vision [24]. CNNs eliminate the hand-crafted feature extraction (or feature
engineering) phase, which is common in traditional machine learning systems, by automatically
learning features in the training phase, which makes them end-to-end learners [4]. The two major
properties of CNNs which make them the right model for the job are their ability to learn patterns
which are translation invariant and their spatial hierarchies [19]:

➢ Translation invariant patterns: once a pattern is learned in a certain corner of a picture, a
CNN can recognize it anywhere in the image.
➢ Spatial hierarchies of patterns: the initial convolution layer learns small local patterns
while the next layer learns larger patterns which are made of the initial ones.

Architecturally, a CNN has the following major components or layers [4] [19] [20] [24] [25]:

i. Convolutional Layer or CONV: the unit where the major computational work is involved;
it can be considered as a set that contains feature maps with neurons arranged in it. It
consists of filters (or kernels) where each filter has a width and a height that are nearly
always a square, letting CNNs take advantage of optimized linear algebra libraries that
operate efficiently on square matrices. So, after convolving K kernels across the input
image, the output is stored in 2D matrices called activation maps or feature maps.
CONV has three parameters through which it controls the size of the output volume: Depth,
Stride and Zero-Padding.

Stride is the step that the kernel takes while convolving, where smaller strides lead to
overlapping Receptive Fields (i.e. the size of the local region of an input image to which
neurons of the current volume connect).
In Zero-Padding, zeros are added along the borders of the input so that the sizes of the input
and output volumes are the same.
Overall, the CONV layer requires 4 parameters as input:
➢ Number of Kernels (K)
➢ Size of Kernels or Receptive Fields (F)
➢ The Stride (S)
➢ The amount of Zero-Padding (P)

The output of the CONV layer is then Wout (Output Width) X Hout (Output Height) X Dout
(Output Depth) where:

➢ Wout = ((Win – F + 2P) / S) + 1
➢ Hout = ((Hin – F + 2P) / S) + 1
➢ Dout = K
After each convolutional layer in a CNN, a non-linear activation function such as ReLU
(Rectified Linear Unit) is applied.
ii. Activation or RELU: mostly used after a CONV layer. These layers don't learn any
parameters or weights; instead they are applied in an element-wise manner, yielding an
output of the same size as the input.
RELU is omitted from most CNN architectural diagrams since it is known to be included
by default.
iii. Pooling Layer or POOL: reduces the dimensions of the data by taking a cluster of a certain
layer's neuron outputs and combining it into a single neuron at the next layer. Some
common pooling operations are: max pooling, average pooling, stochastic pooling, spectral
pooling, spatial pyramid pooling and multi-scale order-less pooling.
For smaller images a pool of size 2 X 2 is used, but deeper CNNs with larger input
images may use 3 X 3.
The POOL layer requires two parameters:
➢ Receptive Field size (Pool size) (F)

➢ The Stride (S)

So, the POOL layer yields an output volume of size Wout (Output Width) X Hout (Output
Height) X Dout (Output Depth) where:

➢ Wout = ((Win – F) / S) + 1
➢ Hout = ((Hin – F) / S) + 1
➢ Dout = Din

Reducing the input image size also makes the neural network tolerate a little bit of image
shift (i.e. Location Invariance).

iv. Fully Connected Layer or FC: neurons in this layer are all (i.e. fully) connected to
all neurons in the previous layer, as is standard for feedforward neural networks. FC layers
are placed at the end of the network. A minimal sketch of this stacking follows.
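To make the CONV-RELU-POOL-FC stacking and the output-size arithmetic concrete, here is a minimal Keras sketch (assuming TensorFlow 2.x; the 32x32 grayscale input and 10 output classes are illustrative assumptions, not the architecture actually used in this thesis):

    from tensorflow.keras import layers, models

    # W_out = ((W_in - F + 2P) / S) + 1, as given above
    def conv_out(w_in, f, p, s):
        return (w_in - f + 2 * p) // s + 1

    model = models.Sequential([
        # CONV: K=32 kernels, F=3, S=1, P=1 ('same'), so 32x32x1 -> 32x32x32
        layers.Conv2D(32, (3, 3), padding='same', activation='relu',
                      input_shape=(32, 32, 1)),
        # POOL: F=2, S=2, so 32x32x32 -> 16x16x32 (depth unchanged)
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        # FC layers placed at the end of the network
        layers.Dense(128, activation='relu'),
        layers.Dense(10, activation='softmax'),
    ])

    print(conv_out(32, 3, 1, 1))  # 32, matching the first CONV layer above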

Here is a typical CNN architecture diagram taken from [20]:

Figure 2-1: Typical CNN Architecture

2.3.2. Deep Learning Based Object Detection

Object detection is the application of different methods in order to find or locate a given object
in either an image or a video source.

Having feature learning and representation capabilities, deep learning is being applied in many
areas of computer vision, but most importantly in object detection [8]. Creating a detection model
in deep learning requires a large dataset and high computing power for training. And to have a
similar detection rate for each class in the dataset, the number and size of images should be
even across classes [26]. Even though deep learning-based object detection algorithms are robust,
there is still room for improvement when it comes to their application in real time [8].

Currently there are many deep learning-based object detection models available; they are briefly
categorized into two groups: models based on region proposal and models based on regression.

2.3.2.1.Models based on region proposal


These types of models mainly perform extraction of regions and building of the appropriate network.
Below is the list of well-known models based on region proposals [8]:

➢ R-CNN: a CNN based region proposal model that uses a region-based segmentation
method to get the regions and feed them to a CNN. Even though this model has some
improvements compared to prior ones, its performance is poor when it comes to using it
in real time.
➢ SPP-net: this model uses pyramid pooling, which solved the problem of object
deformation and incompleteness which existed in R-CNN, but it still performs poorly
when it comes to real time usage.
➢ Fast R-CNN: improves performance by using ROI pooling on region proposals after
mapping them to the feature layer of the CNN. ROI pooling lets the model have a
fixed-size vector for successful connection with the fully connected layers.
➢ Faster R-CNN: adds an RPN (Region Proposal Network), minimizing the computation
time taken by the selective search approach in obtaining region proposals. It reduces the
proposed regions, which were 2000 in R-CNN, to 300.
➢ R-FCN: adopts the RPN from Faster R-CNN and solves the problem of ROIs in sharing
computation. It adopts a SoftMax classifier for feature vector classification. Even though
its accuracy is the same as that of Faster R-CNN, its speed is improved 2.5 times.

2.3.2.2.Models Based on Regression


The models under this category do not use region-based methods [8].

➢ YOLO: removed the ROI module so it doesn't extract object region proposals anymore;
instead it uses a CNN at the front end for feature extraction and 2 fully connected layers
at the other end for classification and regression. It improved speed so that it can now be
used in real time, but with detection accuracy as a tradeoff.
➢ SSD: integrated YOLO's regression idea and Faster R-CNN's anchor mechanism (i.e.
the RPN). Unlike YOLO's global feature extraction it uses local feature extraction. It
became the first deep learning-based object detection model to have higher accuracy while
still maintaining the real time requirement.

2.3.3. Deep Learning Platforms

Deep learning has around 16 top ranking open-source platforms according to GitHub's star and
contributor counts [27] [28]. Among the 16 platforms, the top 5 are mentioned in this
document, starting from the highest ranking one. Using these platforms, deep learning can be
applied in various application areas like: image and video recognition/classification, audio
processing, text analysis and natural language processing, autonomous systems and robotics,
medical diagnostics, computational biology, physical science, finance, economics, markets, cyber
security, and architectural and algorithmic enhancement.

2.3.3.1.Tensorflow
Developed by the Google Brain Team within Google's Machine Intelligence research organization.
It is a library for numerical computation which enables deployment to both CPUs and GPUs without
a need to rewrite code. It includes a data visualization toolkit called Tensorboard and provides
stable APIs for Python and C. A minimal sketch follows.
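As a minimal sketch (assuming TensorFlow 2.x), the same numerical computation runs unchanged on CPU or GPU:

    import tensorflow as tf

    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[5.0], [6.0]])
    print(tf.matmul(a, b))  # [[17.], [39.]]

    # Lists any GPUs TensorFlow can deploy to; an empty list means CPU only
    print(tf.config.list_physical_devices('GPU'))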

2.3.3.2.Keras
Keras is a high-level neural network API. It runs on top of Tensorflow, CNTK or Theano and is
written in Python.

Keras allows for easy and fast experimentation. Like Tensorflow, it runs on both CPU and GPU
and supports both CNNs and RNNs.

2.3.3.3.Caffe
Developed by the Berkeley Vision and Learning Center and community contributors. It was
developed with speed, modularity and expression in mind.

2.3.3.4.Microsoft Cognitive Toolkit

It is a unified deep learning toolkit. It allows users to easily work with DNNs, CNNs and RNNs. It
implements stochastic gradient descent learning.

2.3.3.5. PyTorch
PyTorch is a Python package which provides two high-level features: Tensors and DNNs with
strong GPU acceleration.

PyTorch is often used as a replacement for NumPy and is considered a deep learning framework
which provides speed and flexibility.

2.4. License Plates

License Plate is a term from American English (Number plate: British English); it is a metal or
plastic plate attached to vehicles for their identification [29]. Although some jurisdictions only
require it to be attached on one side, in most countries, like Ethiopia, it is required to be attached
at both the front and the back of the vehicle. So, license plate recognition is the task of extracting
the information found on a car's license plate in a way that it can be used for further processing in
different applications.

License plate recognition (LPR) systems are mostly used by police forces (or other security
enforcement teams) to make the challenge of tracking criminals' movements a little more
manageable [30] [31]. They may also be used in checking whether a vehicle is registered or not,
in electronic toll collection on pay-per-road scenarios, in indexing traffic activity by highway
agencies, in crime deterrence and in data collection.

The license plates in Ethiopia have three main body parts or features, where each part or feature
conveys different information: Service Code, Regional Code and Alphanumeric Code [2].

➢ Considering the service code, Ethiopian license plates have the following types (or
variations):

Table 2-1: Ethiopian License Plate Classification Based on Service Code

Vehicle Service Type    Code (Number or EN/AM)


Taxi 1
Private 2
Commercial/Business 3
Government 4
Red Cross 5
United Nations UN/የተመ
African Union AU/አህ
Diplomatic CD/ኮዲ
Aid Organization AO/ዕድ
Temporary የዕለት
Police ፖሊስ
Special ልዩ

➢ Considering the regional code, ELPs have the following types (or variations):
Table 2-2: Ethiopian License Plate Classification Based on Regional Code

Region Code (EN / AM)


Ethiopia ET/ኢት
Addis Ababa AA/አአ
Amhara AM/አማ
Afar AF/አፋ
Benishangul Gumuz BG/ቤጉ
Dire Dawa DR/ድሬ
Gambella GM/ጋም
Harar HR/ሐረ
Oromia OR/ኦሮ
Somali SM/ሶማ
SNNP SP/ደሕ
Tigray TG/ትግ

2.5. Related Work

In this section, different papers related to this research are reviewed. The review mainly
focuses on the problem or research gap that each reviewed paper tries to solve and on the
methodology it uses.

Although this is a popular research area that has been attempted with many techniques, including
traditional image processing, machine learning and deep learning, it is still worth investigating,
since license plates of different countries differ in features such as plate size and the characters
they contain (which, in this research's case, are Amharic letters, English letters and numbers
ranging from 0 to 9), as well as in the methodology required.

In the case of Ethiopian license plates there is only one study to date, and it was attempted with
traditional image processing. Technologically developed countries such as China, America, India
and the UAE, among others, have conducted many studies applying state-of-the-art deep learning
models to recognize the character features of their own languages as they appear on their
countries' respective license plates.

2.5.1. Researches on Ethiopian License Plates

A paper titled "Automatic Recognition of Ethiopian License Plates" was conducted on recognition
of Ethiopian license plates [2]. It used Gabor filtering, a morphological closing operation and
connected component analysis for plate region detection. The Gabor filter is applied to the
gray-scale image, and its response is then binarized to perform morphological closing. Finally,
connected component labeling is performed to find connected objects or components in the resulting
image. In the segmentation phase, it first applies orientation adjustment and size normalization
by performing Canny edge detection and the Hough transform. Second, it performs segmentation with
connected component labeling; finally, numbers and other characters are separated based on their
location and size (i.e., width and height). Once the characters of the plate are segmented,
correlation-based template matching is performed for recognition.

2.5.2. Researches on Non-Ethiopian License Plates

[32] basically has three major phases: plate detection, character segmentation and character
recognition, where two separate CNN models are used for detection and recognition. The detection
phase has both image preprocessing and CNN classification (detection) steps. Here the
preprocessing step consists of morphological filtering for contrast maximization, a Gaussian blur
filter to remove noise, adaptive thresholding to eliminate unimportant regions in the image,
finding all contours to locate curves that join all continuous points having the same intensity,
geometric filtering to improve the precision of LP detection, CNN detection, and drawing bounding
boxes around the plate region with a minimum threshold value of 0.7. The segmentation phase
consists of gray scaling, Canny edge detection, extraction of contours, geometric filtering and
bounding boxes around the segmented characters. Finally, in the last phase, character recognition,
the second CNN model is used.

[33] is about classification of images as license plate or non-license plate using a CNN. The
network was constructed of 7 layers, where the first convolutional and sub-sampling layers have
4 feature maps and the next ones after that have 11. The third convolutional layer has 120 feature
maps and is connected to a fully-connected layer with 84 units. Finally, it has an output layer
which classifies the image as either a license plate or a non-license plate.

[34] proposed an efficient hierarchical methodology for a license plate recognition system by
first detecting vehicles and then retrieving license plates from the detected vehicles to reduce
false positives. Finally, CNN-based LPR is used to recognize the plate's characters. It used
YOLO v2 for vehicle detection, which has 19 convolutional layers and 5 max pooling layers. For
license plate detection it used an SVM, and for character segmentation it performed a number of
image processing techniques such as gray scaling, binarization, and horizontal and vertical
projection. For character recognition it used a CNN model with two convolutional layers, two max
pooling layers, two fully connected layers and one output layer.

[35] is purely image processing based and has 4 basic steps: preprocessing, localization,
segmentation and recognition. In the preprocessing phase, it performs gray scaling and median
filtering to get rid of noise while conserving the sharpness of the image. In the localization
phase, the region of the plate is distinguished from the rest of the objects in an image.

[36] aims at demonstrating the capability of CNNs in recognizing a vehicle's state (or region)
from a number plate. The researchers considered only 4 classes for simplicity's sake. For each
class or state the dataset contained 200 images, where each image had some kind of distortion,
tilt or illumination at different angles. The results achieved are more than 95% on average.

[37] proposed an end-to-end deep-learning based ALPR system for Brazilian license plates. The
system presents two considerably fast and small YOLO-based networks operating in cascade mode.
Since searching for a relatively small object such as a license plate in a high-resolution image
demands too much computing resource, the paper first performs frontal view extraction, where it
extracts the front of the car, which in turn contains the license plate.

2.5.3. Review Summary

Table 2-3: Review Summary for paper [2]

Review Criteria: Automatic Recognition of Ethiopian License Plates [2]

Preprocessing: Orientation adjustment and size normalization by applying the linear Hough line
transform on Canny-detected edges.
Detection: Gabor filtering, morphological closing operation and connected component analysis,
all performed on gray-scale images.
Segmentation: Connected component labeling (CCL) with 4-neighborhood pixel connectivity.
Classification (Recognition): Correlation-based template matching.
Evaluation: Achieved an overall accuracy of 63.1%.
Limitation: Accuracy is low.

Even though [2] was conducted on Ethiopian license plates, the methodology it used for detection
and recognition of the plates was plain image processing, whereas this research tried to use
DL-based methods. In [32], DL (CNN-based) methods are used in both detection and recognition, but
the whole system was built for features that are specific to Tunisian license plates, which mostly
include only numbers. In [33], a CNN is used not to extract the information that is available on
the plate but simply to classify a given image as a plate or not; it has neither detection nor
recognition capabilities. [34] also used a CNN for recognition but an SVM for plate detection,
after isolating the front of the car whose plate is to be detected using YOLO; it has a different
methodology (approach) and plate features compared to the current research. [35], unlike the
current research, uses pure image processing in both detection and recognition of the plates.
[36] and [37] are both DL based, but the plate features they work on (plate shape, color,
dimensions and content) are greatly different from those of the current research.

Table 2-4: Previous Works on Object Detection and Recognition Part 1

Deep Learning System for Automatic License Plate Detection and Recognition [32]
Preprocessing: Converting from RGB to HSV color space, morphological filtering for contrast
maximization, Gaussian blur filter, adaptive thresholding, geometric filtering.
Detection: A plain convolutional neural network is applied on various boxes guessed to be a
license plate.
Segmentation: Connected component analysis with geometric filtering.
Classification (Recognition): Convolutional neural network with 37 classes.
Evaluation: 96% on detection and 95.3% on character recognition.
Limitation: It uses a plain CNN for detection, which has an impact on both accuracy and speed.

Chinese License Plate Recognition Using Convolutional Neural Networks [33]
Preprocessing: No preprocessing.
Detection: The system does not detect license plates.
Segmentation: No segmentation.
Classification (Recognition): It uses a CNN to classify an image either as a license plate or a
non-license plate.
Evaluation: 98% accuracy for license plates and 100% for non-license plates.
Limitation: The work is limited to classifying whether a given image is a license plate or not.
It does not perform detection, segmentation or character recognition.

An Efficient License Plate Recognition Using Convolutional Networks [34]
Preprocessing: Gray-scale conversion and binarization to remove unwanted noise before
segmentation.
Detection: First the vehicle itself is detected using YOLO v2, and then an SVM (with HOG
features) is used to localize the plate.
Segmentation: Horizontal and vertical projection is used to determine the position of each
character.
Classification (Recognition): Convolutional neural network composed of 34 neurons.
Evaluation: Plate detection rate is 96.12% and plate localization rate is 94.23%. The recognition
phase achieved 99.2% accuracy.
Limitation: It uses an extra phase for plate detection, which needs additional computational time
and has a negative impact on overall speed; yet the localization phase's accuracy is relatively
low.

Table 2-5: Previous Works on Object Detection and Recognition Part 2

Automatic Vehicle Number Plate Detection and Recognition [35]
Preprocessing: Applying sharpening, histogram equalization, smoothing and thresholding.
Detection: Edge detection, morphological operations, and plate region extraction using Hough
lines.
Segmentation: Horizontal projection and adaptive thresholding.
Classification (Recognition): Performs binarization and compares the segmented characters
directly with a model matrix.
Evaluation: Has a precision of 94%.
Limitation: The recognition method is inefficient, since the matching is done pixel by pixel,
which makes it hard to recognize or classify images taken at varying angles.

Automatic Number Plate Recognition Using CNN Based Self Synthesized Feature Learning [36]
Preprocessing: None.
Detection: None.
Segmentation: None.
Classification (Recognition): The system uses a CNN for classification of the different regions
or states of a given license plate.
Evaluation: 97.5% accuracy.
Limitation: The system is only capable of classifying the state or region to which a license
plate belongs. It performs neither detection nor segmentation; as a result, there is no
recognition of characters.

Real-Time Brazilian License Plate Detection and Recognition Using Deep Convolutional Networks [37]
Preprocessing: None.
Detection: First the frontal view of the vehicle is extracted and then the license plate is
localized; in both cases Fast YOLO is used.
Segmentation and Classification (Recognition): Both are done at the same time, since direct
YOLO-based detection is applied on the detected license plate.
Evaluation: The full ALPR system running end-to-end achieved 63.18%; when considering partial
matches of at least 5 characters it presented an accuracy of 97.39%.
Limitation: The overall accuracy of the ALPR system is relatively low.

3. CHAPTER THREE: RESEARCH METHODOLOGY

3.1. Introduction

This chapter discusses the overall methodological approach, the data collection methods, the data
analysis methods and the tools used in order to meet the main objective of this research, that is,
building a deep-learning based model for detection and recognition of Ethiopian car license
plates.

Section 3.2 discusses the general research approach used in this document. Section 3.3 is about
data collection methods; it covers the different methods used in collecting suitable data for
developing a robust deep learning-based detection and recognition model. Section 3.4 discusses
the approaches used in analyzing the collected data. Finally, Section 3.5 lists and describes the
different tools used in conducting this research, which mainly include the hardware and software
tools used in implementing and evaluating the deep learning models.

3.2. Research Approach

Considering its purpose, this thesis work is categorized as applied research. It focuses on the
application of different deep-learning and image processing algorithms that are appropriate for
solving the problem at hand (i.e., deep-learning based LPR).

As mentioned in previous chapters of this document, the research is basically composed of three
main phases: detection of the license plate among the rest of the objects found in an image,
segmentation of the characters found on the detected plate, and finally recognition (or
classification) of each character obtained from the segmentation phase.

So, first a detection system is built from a deep-learning based object detection model by tuning
the hyperparameters of its architecture to make it specifically fit the features of Ethiopian
license plates. Once the plate has been detected, image processing is applied to segment the
plate's characters. But since the detected plate is usually noisy, it is almost impossible to
achieve accurate segmentation without first cleaning it (which includes tasks like resizing,
smoothing and morphological transformation) using appropriate image preprocessing algorithms.
Finally, a convolutional neural network based recognition model capable of recognizing the
characters contained in Ethiopian car license plates is built.

3.3. Process Design

Below is the research design process diagram, which illustrates each step followed while
conducting this research, from problem identification to conclusion. The work started by
identifying the problem and formulating research questions. Then the literature was reviewed,
covering both scientific concepts and related works. After finishing the review, the general and
specific objectives of the research were specified. Then data collection, tool selection and
dataset preparation were conducted. Based on the prepared dataset and selected tools, the ELPR
model was trained and built. Finally, the model was evaluated and a conclusion was made.

Figure 3-1: Research Design Process

3.4. Data Collection

The main data taken as input in this research is images of different license plates. But before
starting to take pictures of LPs, information about the different currently existing LP types in
Ethiopia had to be collected. So, in conducting the whole work, two major data collection
mechanisms were used.

Firstly, a literature or document review was conducted on works related to this research. In
doing so, all the information regarding the types and variations of the different LPs was
acquired.

Ethiopian license plates can basically be categorized based on plate code and region. Plate codes
are represented in both numeric and character format, where the numeric codes range from 1 to 5
and the character codes, which can be either Amharic or English, have around 8 categories.
Considering regional classification, Ethiopia has 9 national regions (i.e., Tigray, Afar, Amhara,
Oromia, Somali, Benishangul-Gumuz, Southern Nations Nationalities and Peoples Region (SNNPR),
Gambella and Harari) and 2 administrative states (i.e., Addis Ababa city administration and Dire
Dawa city council) [38], where each can print its own plates with the codes "1-5", "የዕለት", "ተላላፊ"
and "ልዩ". So, in order to identify a given plate, the LPR system must be able to get its code and
regional information, since the same plate number can be given to different vehicles in different
regions and under different plate codes. Each plate code class has a different foreground color,
while the background color is white for all except police vehicle plates, which are yellow.

Table 3-1: Ethiopian Car's License Plate Color Properties Based on Their Code

Number Code / Foreground Color: 1: Red; 2: Blue; 3: Green; 4: Black; 5: Orange
EN/AM Code / Foreground Color: CD/ኮዲ: Black; AU/አህ: Light Green; AO/ዕድ: Orange; UN/የተመ: Light Blue; -/ፖሊስ: Black
EN/AM Code / Foreground Color: -/የዕለት: Red; -/ተላላፊ: Light Blue; -/ልዩ: Red

Morphologically speaking, Ethiopian license plates have two formats: single-row and double-row
plates.

Figure 3-2: Sample Ethiopian License Plates

Secondly, using the information found in the previous phase of data collection (i.e., the
literature review), images of different types of license plates were taken with a digital camera,
considering varying factors like environmental conditions and different car angles. A digital
camera with a resolution of 13 MP was used to capture all the images. Both the back and the front
of the vehicles were captured, varying the camera angle to the right and left of the license
plate. Different environmental conditions like rainy weather, sunny weather and night time were
considered. Finally, a total of 1100 images were collected.

3.5. Data Analysis

Once enough images were collected, an analysis was performed to make sure that each license plate
was captured appropriately. Specifically, it had to be verified that the LP in an image was not
partially cut off or occluded by some other object in the scene, that the plate was not so far
from the camera as to be indistinguishable even to the human eye, that the plate characters were
not blurred due to unstable capturing, and that the camera angle did not make the plate characters
appear merged together.

3.6. Tools

Figure 3-3: Tool Selection Diagram

Here all the tools used, from writing the document to implementing the model, including hardware,
software and programming languages, are mentioned and specified.

3.6.1. Hardware Tools

➢ A DELL computer with an Intel Core i5-5200U CPU at 2.20 GHz, 4 GB of RAM and a 1 TB hard
disk was used for writing the document and training the model.
➢ A Techno mobile phone with a 13 MP rear camera and 16 GB of storage was used for capturing
LP images.

3.6.2. Software Tools

➢ OS: since most of the programming tools used in this research were built to be used with the
Windows operating system, Windows 10 was used as the OS both for composing the document
and for training the DL model.
➢ Composing: MS Word 2019 was used in writing the research document.
➢ Diagrams: the diagrams in the document were drawn with MS Visio 2019.
➢ Image Annotation: once the images were collected, a tool called labelImg, which is built with
the Python programming language and Qt for its graphical user interface, was used for labeling
(annotating) them. This tool is preferable because it is easy to use, it is lightweight, and it
stores the labeling information in XML files in PASCAL VOC format (a format used by ImageNet,
an image database organized according to the WordNet hierarchy), which is suitable for the
object detection platform's requirements.

3.6.3. Programming Languages and Platforms

➢ Python: used as the major programming or scripting language in building the model.

Python is currently the state-of-the-art language when it comes to machine learning. It has
many powerful and efficient libraries and platforms, like Pandas, NumPy, Scikit-learn and
Tensorflow, that make it possible to realize any machine learning (or deep learning) research.
Python gets its popularity in the field of AI mainly because of its simplicity, its
adaptability and its support for different styles of programming, such as imperative,
functional and object-oriented.

➢ OpenCV-Python (Open Computer Vision): a Python binding of the OpenCV library that enables
implementation of image processing tasks. All image processing parts of this research have
been implemented with this library.

OpenCV started as a research project at Intel and is now basically the largest computer vision
library in terms of sheer size [39]. It contains implementations of more than 2500 algorithms.
➢ Tensorflow: a mathematical library which provides the tools of an end-to-end open source
platform for building machine learning models. It makes the process of building and training
models easy using intuitive, high-level APIs like Keras [40].
It is a math library built by the Google Brain team to be used internally for both research
and production. It allows the creation of dataflow graphs where each node in a graph is a
mathematical operation and each connection carries a tensor (i.e., a multidimensional array).
The math operations in the library are written in C++, while the high-level abstraction that
enables communication between those operations is implemented in Python [41].
➢ TFOD API: Google's open source framework built on top of Tensorflow to construct and train
object detection models [42]. It has two installation variants depending on where one wants
to run it: Tensorflow CPU, which runs on the CPU, and Tensorflow GPU, which runs on the GPU.
➢ Faster R-CNN Model: when it comes to selecting an appropriate model for the problem at hand,
there is always a tradeoff between speed and accuracy. In this research a license plate has
to be detected from an image, which does not need as much speed as other real-time systems.
Detection accuracy, on the other hand, is very much needed, because if the bounding box of
the detected area is even slightly inaccurate, some characters of the plate will be missing,
which impacts the later stages and the recognition as a whole. Another thing to note while
choosing an object detection model is the size of the object to be detected, and license
plates are really small.
Considering the above-mentioned criteria, Faster R-CNN proved to be the most accurate
detection model based on Google's research [42]. Based on that research, Faster R-CNN tends
to be slower but more accurate than R-FCN and SSD, requiring about 100 ms per image.

Figure 3-4: Deep Learning Based Object Detection Model's Speed and Accuracy Comparison

➢ Feature Extraction: the accuracy and speed of the Faster R-CNN detection model highly depend
on the feature extractor it uses [43]. Even though there are many feature extractors, such
as VGG, MobileNet, Inception, ResNet and Inception-ResNet, this research uses ResNet101,
which enables Faster R-CNN to achieve the highest accuracy next to Inception-ResNet.
The reason why Inception-ResNet is not used is that it takes higher GPU time (i.e., has
slower speed) compared to ResNet for an almost negligible accuracy tradeoff [42].

Figure 3-5: Deep Learning Based Object Detection Model's Accuracy with Different Feature Extractors

Figure 3-6: Deep Learning Based Object Detection Model's GPU Time (ms)

4. CHAPTER FOUR: RESEARCH DESIGN

4.1. Introduction

This chapter discusses the research design that this work employed. The first section discusses
in detail the different properties of Ethiopian license plates. The second section discusses the
proposed design for each phase individually, and finally the third section discusses how the
three modules from the corresponding license plate recognition phases communicate with each other
and work as a single system.

4.2. Proposed Design

In order to build a deep learning model for license plate recognition, which involves both image
detection and classification tasks, the research has to go through many stages, which are shown
in the overall design diagram below:

[Flowchart: on the training side, image data acquisition (license plate images, Amharic character
images, English character images and Arabic number (0-9) images) is followed by cleaning,
augmentation and splitting into train and test sets, which feed the training and evaluation of
the detection and recognition models. On the inference side, an input image to be detected passes
through noise removal and other image preprocessing, the detection model, segmentation and the
recognition model to produce the recognized output.]
Figure 4-1: Overall System's Design Diagram

4.2.1. Image Data Acquisition

In this part of the research, images of different cars with different variations of license
plates were captured using a digital camera with a resolution of 13 MP. Addis Ababa was chosen
as the place of data collection because it is the capital city of Ethiopia, so different cars
from different regions of the country, having different regional codes, can be found there. The
images were taken at different distances, camera angles and illuminations in order to improve the
model's detection and recognition accuracy under different conditions and circumstances. Images
were also captured while cars were moving at slow, medium and high speed.

The images needed to train the recognition model (English letters, numbers and Amharic letters)
were cropped from the collected license plate images.

4.2.2. Cleaning

In this part of the research, before the images were used in training, there was an image
pyramiding step. Pyramiding is a representation in which the image goes through a series of
smoothing and subsampling operations. There are two types of pyramids: Gaussian and Laplacian.
In this research, in order to make the model training process faster, downward pyramiding with a
Gaussian filter was used. Here a 5 x 5 Gaussian kernel is used to produce layer (i + 1) from input
layer (i). The resulting image, denoted Gi, found after convolving and subsampling Gi-1 (the input
image), will be one fourth of the original area; each such level is also called an octave [14].
So, if the original image is M x N, it will be M/2 x N/2 after one level of subsampling. All the
images used for the detection model were reduced to a height of less than 1600 pixels and a width
of less than 1200 pixels.

Figure 4-2: Visual representation of image pyramid with 5 levels (image source: Wikipedia)
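As a rough illustration of this step, the down-pyramiding can be done with OpenCV's pyrDown; the
sketch below is minimal and the file names are placeholders:

import cv2

# pyrDown convolves with a 5 x 5 Gaussian kernel and drops every other
# row and column, so each call halves the height and width (one quarter
# of the area, i.e., one octave).
image = cv2.imread("car.jpg")       # e.g., a 3120 x 4160 capture

level1 = cv2.pyrDown(image)         # ~1560 x 2080
level2 = cv2.pyrDown(level1)        # ~780 x 1040, the size reported in Chapter 5

cv2.imwrite("car_downsampled.jpg", level2)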

4.2.3. Augmentation

Image data augmentation is a crucial stage in developing deep learning models. It expands the
amount of training image data in a given training dataset by creating modified versions of the
images, which in turn helps the model generalize. Even though there are many augmentation
techniques, in this research a random horizontal flip is used, which basically reverses the
columns of pixels; it helps the plate to be detected in scenarios where the camera is
unintentionally installed inappropriately. Other augmentation styles like vertical or horizontal
shift were not used, because shifting could cut off part of the plate, which would make
recognition impossible.
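A minimal sketch of such a random horizontal flip, applied here with OpenCV (the file name and
the 0.5 probability are illustrative placeholders):

import random

import cv2

image = cv2.imread("car.jpg")
if random.random() < 0.5:
    # flip code 1 mirrors the image around the vertical axis,
    # i.e., it reverses the columns of pixels
    image = cv2.flip(image, 1)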

4.2.4. Splitting

Splitting is the process of separating the whole dataset into a train set and a test set. The
train set is used by the model to learn the different categories or classes, by making
predictions on input images and making corrections when a prediction is wrong. The test set is
used to evaluate the performance of the model once it has been trained. Even though there is no
fixed rule for choosing the train/test proportions, after consulting many blogs and books an 80
to 20 percent ratio (80% for training and 20% for testing) was used in splitting the entire
dataset. So, for the detection model, out of 1100 collected car images, 880 were used for the
train set and the remaining 220 for the test set. For the classification model, out of 4240 total
cropped character images, 3392 were used for the train set and the remaining 848 for the test set.
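A minimal sketch of this 80/20 split using scikit-learn's train_test_split; the arrays here are
random placeholders standing in for the prepared dataset:

import numpy as np
from sklearn.model_selection import train_test_split

images = np.random.rand(1100, 28, 28, 3)      # placeholder image array
labels = np.random.randint(0, 16, size=1100)  # placeholder class labels

# test_size=0.2 reproduces the 80/20 ratio; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))              # 880 220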

4.2.5. License Plate Detection

The license plate detection part was developed using a deep learning approach. While applying
this approach to object detection, there is one tradeoff that must be considered depending on
the problem, and that is speed versus accuracy. Considering license plate detection, accuracy
is more important than speed, because missing even one character from the detected plate means
failing to identify the vehicle as a whole. So, in this research, a model that is better when it
comes to accuracy, called Faster R-CNN, has been used [44]. This model is mainly composed of two
modules. The first is a fully convolutional network that proposes regions, which gives it two
primary benefits: being fast and being able to accept images of varying resolution, having any
width and height. The second is the Fast R-CNN detector, which uses the regions proposed by the
first module.

[Pipeline: image of a vehicle → plate detection with Faster R-CNN → extracted plate image]

Figure 4-3: License Plate Detection Pipeline

4.2.5.1.Faster R-CNN Detection

Figure 4-4: Faster R-CNN Architecture Diagram

So, for feature extraction, although the original Faster R-CNN paper used VGG (a CNN
architecture invented by the Visual Geometry Group, with 13 shareable convolutional layers) and
ZF (the Zeiler-Fergus CNN architecture, with 5 shareable convolutional layers) as base networks,
in the developed system the deeper and more accurate ResNet (Residual Network, a CNN
architecture with 101 layers) has been used [4] [44].
ResNet is a much deeper network than VGG. It uses a residual model that makes it possible to
train CNNs with over 1000 layers on the CIFAR-10 dataset (which consists of 60000 32 x 32 color
images in 10 classes) [45] [46]. In order to reduce the volume size, ResNet uses only two pooling
layers: max pooling at the beginning to reduce the spatial dimensions, and average pooling at the
end. Unlike common CNNs, ResNet adds the original input to the outputs of the convolution, ReLU
(Rectified Linear Unit) and BN (Batch Normalization) layers; this operation is called identity
mapping and is the reason for using the term "residual". The Batch Normalization layer normalizes
the activations coming from a given input layer and passes them to the next layer, which helps
stabilize training and reduces the number of epochs needed to train the model [4].

In this research, ResNet101 has been used as the base network for feature extraction, so it has
101 layers. First it resizes an image having 3 channels (red, green and blue) to 224 x 224
(height by width), so the input shape is 224 x 224 x 3. After the first convolutional layer,
having a 7 x 7 kernel with a depth of 64 and a stride of 2, the output will be 112 x 112.

Table 4-1: ResNet101 Network Architecture

Layer Name | Output Size | Layers

Conv1 | 112 x 112 | 7 x 7, 64, stride 2
Conv2_x | 56 x 56 | 3 x 3 max pool, stride 2; then [1 x 1, 64; 3 x 3, 64; 1 x 1, 256] x 3
Conv3_x | 28 x 28 | [1 x 1, 128; 3 x 3, 128; 1 x 1, 512] x 4
Conv4_x | 14 x 14 | [1 x 1, 256; 3 x 3, 256; 1 x 1, 1024] x 23
Conv5_x | 7 x 7 | [1 x 1, 512; 3 x 3, 512; 1 x 1, 2048] x 3
Output | 1 x 1 | average pool, 1000-depth fully connected layer, and finally SoftMax (activation)

FLOPs: 7.6 x 10^9

Figure 4-5: Sample Input Images and Their Output After Plate Has Been Detected

4.2.5.2. Plate Extraction


Once the plate has been detected (located), the detection model provides a list of coordinates of
possible plate regions with their corresponding scores. The extraction phase then crops the
highest-scoring bounding box (using basic Python slicing). The result of the extraction phase
(i.e., the cropped plate image) is given as input to the segmentation phase.

Figure 4-6: License Plate Extraction Samples
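A minimal sketch of this extraction step, assuming the detection model returns boxes in
normalized [ymin, xmin, ymax, xmax] coordinates with one score per box (as the TFOD API does);
the function and variable names are illustrative, not quoted from the thesis code:

import numpy as np

def extract_plate(image, boxes, scores):
    # Pick the highest-scoring box and crop it with basic NumPy slicing.
    best = int(np.argmax(scores))
    ymin, xmin, ymax, xmax = boxes[best]
    h, w = image.shape[:2]
    return image[int(ymin * h):int(ymax * h),
                 int(xmin * w):int(xmax * w)]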

4.2.6. Character Segmentation

The segmentation of the characters on the detected license plate is done with a combination of
different digital image processing techniques. OpenCV (Open Computer Vision) with Python has been
used for each image processing task, including all the necessary preprocessing or cleaning.

4.2.6.1. Orientation Adjustment


Before segmenting the characters on the plate, the orientation of the plate should be corrected
(i.e., horizontally aligned). In order to do that, the angle of the longest line, which is either
the upper or the lower border of the plate, first needs to be found.

First, the input image is gray-scaled and all the edges are found using the Canny edge detector.
The Canny edge detector is a multi-stage detector comprising 4 stages. In its initial phase it
does noise reduction with a 5 x 5 Gaussian filter, and then it finds the intensity gradient of
the image by filtering it in both the horizontal (Gx) and vertical (Gy) directions. The gradient
is always perpendicular to the edges. It uses the equations below:
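
Edge_Gradient(G) = √(Gx² + Gy²)

Angle(θ) = tan⁻¹(Gy / Gx)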

Equation 4-1: Equation of Canny Edge Detector

Once the gradient magnitude and direction have been found, the next step is non-maximum
suppression, where pixels that do not constitute an edge are removed. Finally, with hysteresis
thresholding, using two threshold values (min and max), it decides whether an edge is really an
edge or not. If the edge's intensity value is more than max, then it is a sure edge; if it is
less than min, it is not. If the intensity value falls between the min and max values, the
algorithm checks its connectivity: if it is connected to a sure edge then it is an edge,
otherwise it is not.

Once all the edges are found using Canny, the next step is to find the edges that constitute a
line, using the HoughLines method, which takes four parameters. The first parameter is the binary
image found using Canny; the second and third parameters are the rho and theta resolutions (where
rho is the perpendicular distance from the origin to the line and theta is the angle formed by
this perpendicular line and the horizontal axis), measured in pixels and radians respectively.
The fourth argument is a threshold value specifying the minimum vote a candidate should get in
order to be considered a line. Finally, using the angle of the longest line, the transformation
matrix is found using the equation below (Equation 4-2), which describes a scaled rotation with
an adjustable center of rotation, so that the image can be rotated around any location of
preference depending on the angle of the line [14]:
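
M = [  α    β    (1 - α)·center.x - β·center.y
      -β    α    β·center.x + (1 - α)·center.y ]

where α = scale·cos θ and β = scale·sin θ [14].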

Equation 4-2: Equation for the Scaled Rotation Transformation Matrix

The transformation matrix is given to the warpAffine method, which takes a 2 x 3 transformation
matrix and returns the image transformed according to the given matrix.

Figure 4-7: License Plate Orientation Adjustment Sample
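A rough sketch of this orientation-adjustment pipeline with OpenCV is shown below; the Canny
thresholds, the Hough vote threshold and the file name are illustrative placeholders, not the
exact values of the thesis implementation:

import cv2
import numpy as np

plate = cv2.imread("plate.jpg")
gray = cv2.cvtColor(plate, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# rho resolution: 1 pixel; theta resolution: 1 degree; vote threshold: 100
lines = cv2.HoughLines(edges, 1, np.pi / 180, 100)
if lines is not None:
    rho, theta = lines[0][0]           # strongest line (a plate border)
    angle = np.degrees(theta) - 90     # deviation from the horizontal axis

    h, w = plate.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)  # 2 x 3 matrix
    plate = cv2.warpAffine(plate, M, (w, h))                 # deskewed plate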

4.2.6.2. Border Elimination


Here the upper and lower borders of the horizontally oriented plate image are eliminated. The
first step is to determine whether the image has one or two rows, for which the aspect ratio of
the image is taken into consideration. Once the number of rows is known, HoughLines is used to
get either the upper or the lower border of the plate image (since the longest line is always
one of the two).

If the plate image has one row, then the height of the largest contour is either added to or
subtracted from the y value of the detected line: if the detected line is the bottom border of
the plate, the height of the largest contour is subtracted, otherwise it is added. The resulting
area is taken as the ROI (Region of Interest) and is cropped out.

Figure 4-8: Border Elimination Sample

4.2.6.3. Segmentation
Here all the characters on the plate are segmented into individual letters and numbers. The
output image from border elimination is binarized using Otsu's method, which works on bimodal
images. A bimodal image is an image whose histogram has two peaks. Unlike global binarization
methods, where an arbitrary value is taken as the threshold, Otsu's method takes an approximate
value in the middle of the two peaks of a bimodal image. The Otsu algorithm tries to find the
threshold value (t) which minimizes the weighted within-class variance, given by the relation
[14]:

Equation 4-3: Equation through which OTSU Finds Threshold Value (t)

Once the image has been binarized, the next step is to find all the contours. A contour is simply
a curve joining all the continuous pixels that have the same intensity level. The contour-finding
function takes three arguments: the first is the source image, the second is the contour
retrieval mode and the third is the contour approximation method. The approximation method
determines whether the contour holds all the coordinates of the boundary or not. In this research
a method called CHAIN_APPROX_SIMPLE is used, which removes redundant points and compresses the
contours in order to save memory.

Although there are different contour retrieval modes, this research used RETR_EXTERNAL because
of the unique properties that Ethiopian license plates have. For some specific characters, like
the number codes, rather than taking the circle shape and the number code inside it separately,
RETR_EXTERNAL takes the circle and the number code all at once.

Finally, noise (i.e., non-characters) is filtered out based on contour area: any contour that
does not have the properties of a plate character, such as its width and height, is removed, and
only valid contours are cropped and saved as separate images for the recognition model.

Figure 4-9: Plate Character Segmentation Sample
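A minimal sketch of this Otsu-plus-contours segmentation with OpenCV (using the OpenCV 4 return
signature of findContours); the file name and the width/height bounds of the character filter
are illustrative placeholders:

import cv2

plate = cv2.imread("plate_no_border.jpg", cv2.IMREAD_GRAYSCALE)

# Otsu binarization: the threshold argument (0) is ignored and chosen
# automatically between the two histogram peaks.
_, binary = cv2.threshold(plate, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Outer contours only (RETR_EXTERNAL), with redundant points compressed.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

characters = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    # keep only contours whose width and height look like a plate character
    if 10 < w < 100 and 20 < h < 120:
        characters.append(plate[y:y + h, x:x + w])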

4.2.7. Character Recognition

In this part, a convolutional neural network (CNN) has been trained to classify (recognize) the
characters on the plate. The plate character images have 3 channels (RGB), with both height and
width having a size of 28. ReLU (Rectified Linear Unit) has been used as the activation function
except for the output layer of the network, where SoftMax has been used, since it is the
preferable choice for a deep learning model with more than 2 classes [47].

Unlike the older Sigmoid and Tanh functions, ReLU (i.e., f(x) = max(0, x)) is not saturable,
meaning the gradients do not get killed when neurons saturate (it does not have a vanishing
gradient problem). It is also extremely computationally efficient and sparsely activated, meaning
there is a strong likelihood for any given unit not to activate at all, since its output is zero
for all negative inputs. It also shows good performance across different application areas [4].
Visually:

Figure 4-10: ReLU Activation Diagram

As the final layer, an activation function called SoftMax, which is mostly used in multiclass
classification problems, has been applied [48]. It extends the idea of logistic regression, which
produces a decimal between 0 and 1, to a multi-class problem by assigning a probability to each
class in that problem. Those decimal probabilities must add up to 1.0. SoftMax should be used
with mutually exclusive classes because it predicts only one class at a time. Below is its
equation:

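σ(s(x))k = exp(sk(x)) / ∑ j=1..K exp(sj(x))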
Equation 4-4: Equation for SoftMax Activation Function

Where:

• K is the number of classes, s(x) is a vector containing the scores of each class for instance
x, and σ(s(x))k is the estimated probability that the instance x belongs to class k given the
scores of each class for that instance [20].

The number of samples processed before updating the model (the batch size) is 1, and an iterative
algorithm, SGD (Stochastic Gradient Descent), has been used as the optimizer during training. SGD
is a modification of the standard gradient descent algorithm: after computing the gradient,
rather than updating the weight matrix using the whole training set, it updates it using only
small batches or samples. This makes the algorithm much faster, since the amount of data or
number of samples manipulated at each iteration is really small.

As mentioned above in this section, the recognition model is trained with character images having
3 channels (red, green and blue).

The input shape is 28 x 28 x 3; below is a table that contains the overall network architecture:

Table 4-2: Character Recognition Model's Architecture

Layer Type Output size Filter size / stride


Input Image 28 x 28 x 3
Conv 28 x 28 x 20 5 x 5, k = 20
ACT 28 x 28 x 20
POOL 14 x 14 x 20 2x2
Conv 14 x 14 x 50 5 x 5, k = 50
ACT 14 x 14 x 50
POOL 7 x 7 x 50 2x2
FC 500
ACT 500
FC 16
SoftMax 16

As can be seen in the table, the network has 2 convolutional layers, 2 pooling layers, 3 ReLU
activations, 2 fully connected layers and finally a SoftMax activation.

Max pooling has been used for all pooling layers. Pooling basically reduces the spatial size of
the input, which in turn reduces the number of parameters and the amount of computation in the
network. Although a pool size larger than 2 x 2 can be used for larger input images, for smaller
ones like those used in this research, 2 x 2 is an appropriate choice.

So, the architecture looks like:

Input → Conv → ReLU → POOL → Conv → ReLU → POOL → FC → ReLU → FC → SoftMax

Figure 4-11: Character Recognition Model's Architecture
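A minimal sketch of this architecture in Keras, assuming "same" padding (needed to keep the
28 x 28 output of the first convolution shown in Table 4-2) and the hyperparameters selected in
Chapter 5 (SGD optimizer, learning rate 0.01); the loss choice is the usual one for SoftMax
multiclass outputs, not quoted from the thesis code:

from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Conv2D(20, (5, 5), padding="same", activation="relu",
                  input_shape=(28, 28, 3)),   # 28 x 28 x 20
    layers.MaxPooling2D((2, 2)),              # 14 x 14 x 20
    layers.Conv2D(50, (5, 5), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),              # 7 x 7 x 50
    layers.Flatten(),
    layers.Dense(500, activation="relu"),     # FC 500
    layers.Dense(16, activation="softmax"),   # 16 character classes
])

model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()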

5. Chapter Five: Results and Discussion

5.1. Introduction

In this chapter, the results of the developed system are evaluated and their outcomes discussed.
The first part is about object detection, the second about character segmentation, and the third
about character recognition (i.e., classification).

5.2. Object Detection

The object detection model takes an image containing different objects, including the car, and
detects (localizes) only the license plate. The model was trained with 1100 different images that
include both the car and the license plate. The license plate images in the training dataset
comprise different plates with varying regional codes and distances from the camera, so that the
model can generalize better.

The whole dataset was divided into training and testing sets with a ratio of 80 to 20
respectively, so the training set contains 880 images and the testing set contains 220 images.
The original images were taken with a 13 MP camera that has an image resolution of 3120 x 4160,
which is too big for training. So, image pyramiding was used to reduce the resolution: after
pyramiding, a 3120 x 4160 image was reduced to 780 x 1040.

The model was trained for 50000 steps with a batch size of 1. It had a learning rate of 0.0001,
which was scheduled to decrease to 0.00001 after 90000 steps and to 0.000001 after 120000 steps;
but since the loss stopped dropping after step 40000, training was terminated at 50000 steps, so
the learning rate remained at 0.0001.

After training, the following results were obtained:

Figure 5-1: ResNet Trained Detection Model's Evaluation Results in AP and AR

The box classifier's classification and localization losses are 0.018383 and 0.012104,
respectively.

Figure 5-2: Graphical Representation of ResNet Trained Detection Model's Evaluation Results in AP and AR

Among the nine images randomly chosen during model evaluation, the license plates were detected
successfully in all cases.

Figure 5-3: Sample Detection Results

The same dataset was trained with the Inception v2 model, with a learning rate of 0.0002 and a
batch size of 1. It had lower mAP @ 0.75 IOU and AR @ 0.50 IOU, namely 0.849132 and 0.732857
respectively, which results in the detected plates having some parts cut off. Below are its
results:

Figure 5-4: Inception V2 Trained Detection Model's Evaluation Results in AP and AR

Figure 5-5: Graphical Representation of Inception V2 Trained Detection Model's Evaluation Results in AP and AR

So, the proposed model not only showed better detection accuracy than the model by [2], which was
developed with conventional image processing techniques and had a detection accuracy of 88.9%,
it also performed better than a deep learning model trained with the Inception feature extractor.

Below is a comparison of the model with some related works:

Table 5-1: Comparison of Ethiopian License Plate's Detection Model with Some Related Works

Related Works Accuracy


YOLO based detection [34] 94.23 %
Fast-YOLO [37] 95.07 %
Plain CNN [32] 96 %
Proposed System 99.1%

As can be seen from the table above, the proposed model achieved better accuracy than the related
works.

5.3. Character Segmentation

The character segmentation part was developed using conventional image processing methods,
described in detail in Chapter 4. After testing the algorithm on 15 randomly selected plate
images, it segmented 13 correctly, but 2 of them had some characters missing due to the plate
images' quality. Both of the incorrectly segmented images are blurred and of very low quality,
which shows a drawback of conventional image processing.

So, the segmentation module achieved a total of 86.66%, which would not have been reached without
preprocessing steps like orientation adjustment and plate border elimination.

Below is a sample of some correctly segmented plates:

Figure 5-6: Sample of Some Correctly Segmented License Plates

And below is a sample of some incorrectly segmented plates:

Figure 5-7: Sample of Some Incorrectly Segmented License Plates

5.4. Character Recognition

The character recognition part was developed using a convolutional neural network which, as
stated in Chapter 4, has 2 convolutional layers, 2 pooling layers, 3 ReLU activations, 2 fully
connected layers and finally a SoftMax activation. The final model used in this research has a
0.01 learning rate, the SGD optimizer, a batch size of 1 and the ReLU activation function.
Different experiments, described below, were done to select those hyperparameters.

Firstly, the model was trained with the following parameters:

Table 5-2: CR Model’s First Experiment Parameters

Model Name Parameters Parameter Values


Test Size 25 %
Optimizer SGD
Learning Rate 0.0001
Model 1 Epochs 50
Batch Size 1
Metrics Accuracy
Activation ReLU

So, after training the model the following results were found:

Table 5-3: CR Model’s First Experiment Evaluation Results

Model Training Training Validation Validation Average Precision


Name Loss Accuracy Loss Accuracy on Each Class

Model-1 0.1697 0.9485 0.3204 0.9038 0.91

So, the model was able to get about 94% training accuracy and 91% AP.

Figure 5-8: CR Model's First Experiment Evaluation Results Graphically

In the second round, the model was trained with a learning rate of 0.001, keeping the other
parameters used for Model-1 as they were.

After training the model the following results were found:

Table 5-4: CR Model’s Second Experiment Evaluation Results

Model Training Training Validation Validation Average Precision


Name Loss Accuracy Loss Accuracy on Each Class

Model-2 0.0017 0.999 0.2281 0.9550 0.96

The model was able to get 99.9% training accuracy and 96% AP, which is a lot better than Model-1,
but the training accuracy and validation accuracy curves could still be closer. The average
precision also increased by 5% (from 91% to 96%).

Figure 5-9: CR Model’s Second Experiment Evaluation Results Graphically

In the third round, the model was trained with a learning rate of 0.01, keeping the other
parameters used for Model-2 as they were.

After training the model the following results were found:

Table 5-5: CR Model’s Third Experiment Evaluation Results

Model Training Training Validation Validation Average Precision


Name Loss Accuracy Loss Accuracy on Each Class

Model-3 0.000029 0.999 0.1488 0.98 0.98

The model was able to get 99.9% training accuracy and 98% AP. As can be seen from the table,
Model-3 is a better model compared to Model-1 and Model-2: not only did its AP increase, but its
validation accuracy is much closer to its training accuracy, which shows that the model is not
overfitted.

Figure 5-10: CR Model’s Third Experiment Evaluation Results Graphically

In the fourth round, the model was trained with a batch size of 128 and a learning rate of 0.01
(the learning rate used for Model-3, kept because it enabled the model to reach better accuracy).
The other parameters were kept as they were.

After training the model the following results were found:

Table 5-6: CR Model’s Fourth Experiment Evaluation Results

Model Training Training Validation Validation Average Precision


Name Loss Accuracy Loss Accuracy on Each Class

Model-4 0.2255 0.9440 0.3190 0.9150 0.92

As can be seen from the table, changing the batch size to 128 decreased the training accuracy
from 99.9% to 94.4%, which is a lot, and the AP also decreased from 98% to 92%.

Figure 5-11: CR Model’s Fourth Experiment Evaluation Results Graphically

In the fifth round, the batch size was changed back to 1, the learning rate was 0.01, and the
activation function was changed to tanh. The other parameters were kept the same as in Model-4.

After training the model the following results were found:

Table 5-7: CR Model’s Fifth Experiment Evaluation Results

Model Training Training Validation Validation Average Precision


Name Loss Accuracy Loss Accuracy on Each Class

Model-5 0.00019 0.999 0.1481 0.9652 0.97

Model-5 showed very good results, almost as good as Model-3, but its validation accuracy and AP
are still lower: Model-3's validation accuracy was 98% while Model-5 got 96.52%, and Model-3's
AP was 98% while Model-5's is 97%.

So, a learning rate of 0.01, a batch size of 1, the SGD optimizer, 50 epochs and the ReLU
activation function proved to be the better parameters for the LP dataset. With those parameters,
the model is able to generalize without either overfitting or underfitting.

Figure 5-12: CR Model’s Fifth Experiment Evaluation Results Graphically

So, the research used Model-3 as its recognition model, since it performed better than the other
four.

Below is a summary table for all the 5 models:

Table 5-8: Summary Table for The Experiments

Model Training Training Validation Validation Average Precision


Name Loss Accuracy Loss Accuracy on Each Class

Model-1 0.1697 0.9485 0.3204 0.9038 0.91


Model-2 0.0017 0.999 0.2281 0.9550 0.96
Model-3 0.000029 0.999 0.1488 0.98 0.98
Model-4 0.2255 0.9440 0.3190 0.9150 0.92
Model-5 0.00019 0.999 0.1481 0.9652 0.97

6. Chapter Six: Conclusion and Future Work

6.1. Conclusion

This research studied a deep learning-based approach for recognition of Ethiopian car license
plates. The research is important because Ethiopian license plates have their own unique features
and this problem had previously been approached only with conventional image processing methods.
The study has three main parts: license plate detection, character segmentation and character
recognition.

In the first part, license plate detection, an image in RGB format is given to the detection
model, which detects the license plate area with the highest score. A detection model basically
scans the whole image in search of a specific object, which in this case is the license plate.
The detected plate is then given to the segmentation module. Overall, the detection model was
developed with Faster R-CNN using ResNet as the feature extractor (where the original work used
VGG). The model was able to get 99.1% mAP @ 0.50 IOU.

Once the license plate is detected, its characters are segmented using image processing methods.
The segmentation module takes an image, preprocesses it, performs orientation adjustment, removes
the borders and finally segments each alphanumeric character. It was built with OpenCV-Python.
The segmentation module achieved 86.66%.

The segmented characters are given to the classification (recognition) model, which was developed
using a convolutional neural network. The CNN model classifies each character image into its
corresponding class. It achieved very good classification accuracy on both the training and
validation sets: 99.9% and 98% respectively.

The experimental results of this research show that all three modules perform very well, and each
achieved much better accuracy compared to the related works. This suggests that deep learning is
by far superior when it comes to modeling an object or a single class that has inconsistent
features, such as Ethiopian license plates. Deep learning reduces the load of extracting image
features manually by learning directly from the data, which is the main reason it reaches
state-of-the-art accuracy.

6.2. Future Work

Both deep learning models, detection and recognition, work well, but the segmentation module,
which was developed using conventional image processing, could be made more accurate by
implementing it with a deep learning-based segmentation approach as well.

References

[1] X. Lele, A. Tasweer and J. Liyanwen, "A New CNN-Based Method for Multi-Directional Car License Plate Detection," IEEE, p. 11, 2018.

[2] S. Nigussie and A. Yaregal, "Automatic Recognition of Ethiopian License Plates," IEEE, p. 5, 2015.

[3] Y. LeCun, Y. Bengio and G. Hinton, "Deep Learning," Nature, vol. 521, p. 9, 2015.

[4] A. Rosebrock, Deep Learning for Computer Vision with Python, PyImageSearch, 2017.

[5] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, 2016.

[6] Mathworks, "Mathworks," [Online]. Available: www.mathworks.com/discovery/deep-learning.html. [Accessed 18 12 2018].

[7] "Wikipedia," 9 May 2019. [Online]. Available: https://en.m.wikipedia.org/wiki/Computer_Vision. [Accessed 9 May 2019].

[8] C. Tang, Y. Feng, Y. Xing, C. Zheng and Y. Zhou, "The Object Detection Based on Deep Learning," IEEE, p. 6, 2017.

[9] N. Jmour, S. Zayen and A. Abdelkrim, "Convolutional Neural Networks for Image Classification," IEEE, p. 6, 2018.

[10] ARH, "Automatic Number Plate Recognition," Adaptive Recognition Hungary, [Online]. Available: www.anpr.net/anpr_09/anpr_applicationareas.html. [Accessed 18 12 2018].

[11] Luozm, "Different Tasks in Computer Vision," [Online]. Available: www.luozm.github.io/cv-tasks. [Accessed 18 12 2018].

[12] "Wikipedia," [Online]. Available: https://en.m.wikipedia.org/wiki/Digital_image_processing. [Accessed 15 April 2019].

[13] I. Culjak, D. Abram, T. Pribanic, H. Dzapo and M. Cifrek, "A Brief Introduction to OpenCV," IEEE, p. 6, 22 May 2012.

[14] A. Mordvintsev and A. K, "OpenCV-Python Tutorials Documentation," 5 Nov 2017. [Online]. Available: https://opencv-python-tutorials.readthedocs.io/en/latest/py_tutorials.html. [Accessed 1 May 2019].

[15] R. Fisher, S. Perkins, A. Walker and E. Wolfart, "Image Processing Learning Resources," HIPR2, [Online]. Available: homepages.inf.ed.ac.uk/rbf/HIPR2/hipr_top.htm. [Accessed 10 May 2019].

[16] A. and R. Kaur, "Review of Image Segmentation Technique," IJARCS, p. 4, 2017.

[17] W.-X. Kang, Q.-Q. Yang and R.-P. Liang, "The Comparative Research on Image Segmentation Algorithms," IEEE, p. 5, 2009.

[18] L. Chao-yang and L. Jun-hua, "Vehicle License Plate Character Segmentation Method Based on Watershed Algorithm," IEEE, p. 6, 2010.

[19] F. Chollet, Deep Learning with Python, New York: Manning Publications Co., 2018.

[20] A. Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow, United States of America: O'Reilly Media, Inc., 2017.

[21] Q. Wu, Y. Liu, Q. Li, S. Jin and F. Li, "The Application of Deep Learning in Computer Vision," IEEE, p. 6, 2017.

[22] T. Guo, J. Dong, H. Li and Y. Gao, "Simple Convolutional Neural Network on Image Classification," IEEE, p. 4, 2017.

[23] S. Albawi, T. A. Mohammed and S. Al-Zawi, "Understanding of a Convolutional Neural Network," IEEE, p. 6, 2017.

[24] "Wikipedia," 21 April 2019. [Online]. Available: https://en.wikipedia.org/wiki/Convolutional_neural_network. [Accessed 1 May 2019].

[25] N. Aloysius and G. M, "A Review on Deep Convolutional Neural Networks," IEEE, p. 5, 2017.

[26] X. Zhou, W. Gong, W. Fu and F. Du, "Application of Deep Learning in Object Detection," IEEE, p. 4, 2017.

[27] W. G. Hatcher and W. Yu, "A Survey of Deep Learning: Platforms, Applications and Emerging Research Trends," IEEE, p. 21, 2018.

[28] "KDnuggets," 2018. [Online]. Available: https://www.kdnuggets.com/2018/04/top-16-open-source-deep-learning-libraries.html. [Accessed 10 May 2019].

[29] "Wikipedia," 28 April 2019. [Online]. Available: https://en.m.wikipedia.org/wiki/Vehicle_registration_plate. [Accessed 1 May 2019].

[30] "NGSC," 12 January 2017. [Online]. Available: https://ngscinc.com/advantages-of-license-plate-recognition-systems/. [Accessed 1 May 2019].

[31] "Omnitec," 12 February 2018. [Online]. Available: www.omnitecgroup.com/blog/advantages-of-automatic-license-plate-recognition-system-6545. [Accessed 1 May 2019].

[32] Z. Selmi, M. B. Halima and A. M. Adel, "Deep Learning System for Automatic License Plate Detection and Recognition," IEEE, p. 7, 2017.

[33] Z. Zhihong, Y. Shaopu and M. Xinna, "Chinese License Plate Recognition Using a Convolutional Neural Network," IEEE, p. 4, 2008.

[34] C.-H. Lin, Y.-S. Lin and W.-C. Liu, "An Efficient License Plate Recognition System Using Convolution Neural Networks," IEEE, p. 4, 2018.

[35] P. Prabhakar, A. P and R. S. R, "Automatic Vehicle Number Plate Detection and Recognition," IEEE, p. 7, 2014.

[36] M. Mondal, P. Mondal, N. Saha and P. Chattopadhyay, "Automatic Number Plate Recognition Using CNN Based Self Synthesized Feature Learning," IEEE, p. 4, 2017.

[37] S. Montazzolli and C. Jung, "Real-Time Brazilian License Plate Detection and Recognition Using Deep Convolutional Neural Networks," IEEE, p. 8, 2017.

[38] "Ethio Visit," [Online]. Available: http://www.ethiovisit.com/ethiopia/ethiopia-regions-and-cities.html. [Accessed 20 July 2019].

[39] S. Pal, "Analytics Vidhya," 25 March 2019. [Online]. Available: https://www.analyticsvidhya.com/blog/2019/03/opencv-functions-computer-vision-python/. [Accessed 10 August 2019].

[40] "Tensorflow," Google, [Online]. Available: https://www.tensorflow.org/. [Accessed 10 August 2019].

[41] S. Yegulalp, "InfoWorld," 18 June 2019. [Online]. Available: https://www.infoworld.com/article/3278008/what-is-tensorflow-the-machine-learning-library-explained.html. [Accessed 10 August 2019].

[42] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama and K. Murphy, "Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors," IEEE, p. 10, 2017.

[43] J. Hui, "Object Detection: Speed and Accuracy Comparison (Faster R-CNN, R-FCN, SSD, FPN, RetinaNet and YOLOv3)," 18 March 2018. [Online]. Available: https://medium.com/@jonathan_hui/object-detection-speed-and-accuracy-comparison-faster-r-cnn-r-fcn-ssd-and-yolo-5425656ae359. [Accessed 20 October 2019].

[44] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE, p. 14, 2016.

[45] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," IEEE, p. 9, 2015.

[46] A. Krizhevsky, "The CIFAR-10 Dataset," 2009. [Online]. Available: https://www.cs.toronto.edu/~kriz/cifar.html. [Accessed 10 March 2019].

[47] A. Kızrak, "Towards Data Science," 9 May 2019. [Online]. Available: https://towardsdatascience.com/comparison-of-activation-functions-for-deep-neural-networks-706ac4284c8a. [Accessed 15 Oct 2019].

[48] "Google Developers," [Online]. Available: https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/softmax. [Accessed 20 Dec 2019].

Appendixes
Appendix A: Sample Code for Detection Model’s Configuration

Appendix B: Sample Code for Character Segmentation

Appendix C: Sample Code for Character Recognition / Classification Model

