Nepalese Currency Recognition System

TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
THAPATHALI CAMPUS

Submitted By:
Ashma Rai (THA075BCT011)
Niruta Shrestha (THA075BCT030)
Shikshya Shiwakoti (THA075BCT041)
Swostika Basukala (THA075BCT048)

Submitted To:
Department of Electronics and Computer Engineering
Thapathali Campus
Kathmandu, Nepal

April, 2022
DECLARATION
We hereby declare that the report of the project entitled "Nepalese Currency Recognition System", which is being submitted to the Department of Electronics and Computer Engineering, IOE, Thapathali Campus, in partial fulfillment of the requirements for the award of the Degree of Bachelor of Engineering in Computer Engineering, is a bonafide report of the work carried out by us. The materials contained in this report have not been submitted to any university or institution for the award of any degree, we are the sole authors of this work, and no sources other than those listed here have been used.
CERTIFICATE OF APPROVAL
The undersigned certify that they have read and recommended to the Department of Electronics and Computer Engineering, IOE, Thapathali Campus, a minor project work entitled "Nepalese Currency Recognition System", submitted by Swostika Basukala, Shikshya Shiwakoti, Niruta Shrestha and Ashma Rai in partial fulfillment of the requirements for the award of the Bachelor's Degree in Computer Engineering. The project was carried out under special supervision and within the time frame prescribed by the syllabus.
We found the students to be hardworking, skilled and ready to undertake any work related to their field of study, and hence we recommend the award in partial fulfillment of the Bachelor's Degree in Computer Engineering.
___________________________
Project Supervisor
Mr. Shikhar Bhattarai
Department of Electronics and Computer Engineering, Thapathali Campus
___________________________
External Examiner
Mr. Sudeep Shakya
Associate Professor
Kathmandu Engineering College
___________________________
Project Co-ordinator
Mr. Umesh Kanta Ghimire
Department of Electronics and Computer Engineering, Thapathali Campus
___________________________
Mr. Kiran Chandra Dahal
Head of the Department,
Department of Electronics and Computer Engineering, Thapathali Campus
April, 2022
COPYRIGHT
The authors have agreed that the library of the Department of Electronics and Computer Engineering, Thapathali Campus, may make this report freely available for inspection. Moreover, the authors have agreed that permission for extensive copying of this project work for scholarly purposes may be granted by the professor who supervised the project work recorded herein or, in their absence, by the Head of the Department. It is understood that recognition will be given to the authors of this report and to the Department of Electronics and Computer Engineering, IOE, Thapathali Campus, in any use of the material of this report. Copying, publication or any other use of this report for financial gain without the approval of the Department of Electronics and Computer Engineering, IOE, Thapathali Campus, and the authors' written permission is prohibited. Requests for permission to copy or to make any other use of the material in this report, in whole or in part, should be addressed to the Department of Electronics and Computer Engineering, IOE, Thapathali Campus.
ACKNOWLEDGEMENT
We would like to express our deepest gratitude to Er. Kiran Chandra Dahal, Head of the Department of Electronics and Computer Engineering, Thapathali Campus, for providing this opportunity. We would also like to thank Er. Umesh Kanta Ghimire, project coordinator, Er. Rama Bastola, DHOD of the Department of Electronics and Computer Engineering, and Er. Shikhar Bhattarai, our supervisor at Thapathali Campus, for their continuous and constructive critique during the research and preparation of this project. Without their guidance, this project would not have been successful.
This journey would also not have been possible without the unwavering support of our classmates, friends and family members. We would like to acknowledge their encouragement and everlasting support.
ABSTRACT
Table of Contents
DECLARATION
CERTIFICATE OF APPROVAL
COPYRIGHT
ACKNOWLEDGEMENT
ABSTRACT
1. INTRODUCTION
2. LITERATURE REVIEW
3. REQUIREMENT ANALYSIS
4. SYSTEM ARCHITECTURE AND METHODOLOGY
5. IMPLEMENTATION DETAILS
6. RESULTS AND ANALYSIS
7. FUTURE ENHANCEMENTS
8. CONCLUSION
9. APPENDICES
References
List of Abbreviations
AI Artificial Intelligence
BRIEF Binary Robust Independent Elementary Features
CNN Convolutional Neural Network
CRSFVI Currency Recognition System for Visually Impaired
FAST Features from Accelerated Segment Test
IKA Inverse Kinematic Animation
LBP Local Binary Pattern
LVQ Learning Vector Quantization
ORB Oriented FAST and Rotated BRIEF
RBF Radial Basis Function
SIFT Scale-Invariant Feature Transform
UV Ultraviolet
1. INTRODUCTION
1.1 Background
With the arrival of the new age of digitalization, many problems of the past have been solved using modern technologies and software. This has made the lives of differently-abled people easier and has made it possible for them to live more independently. Various software applications have been developed to help them, such as Be My Eyes, TapTapSee, Supersense, Audible, Blind Square, Color ID, KNFB Reader and more.
Every year there are many cases of counterfeit currency in Nepal. Large-scale businesses use currency recognition devices that verify authenticity using UV light, security threads and other features, but such devices are not feasible for small businesses. Currency recognition is even more difficult for individuals who are differently-abled, and this is the problem the project aims to address.
The Currency Recognition System is an artificial intelligence (AI) based project whose main purpose is to identify paper notes. An estimated 1.3 billion people worldwide live with visual impairment, of whom 36 million are blind and almost 220 million have moderate to severe vision impairment [1]. In Nepal, even though the paper notes carry special markings to let the visually impaired know the value of the currency, only blind people with years of experience learn to recognize them, while others still have to seek the help of bystanders. With this system, we aim to shed some light on this problem.
1.2 Motivation
Upon analyzing the data received through the survey conducted by Anne Jarry and her
colleagues on ‘Blind Adults’ Perspectives on Technical Problems When Using
Technology’ [2], we deduced a conclusion. The age distribution included working-age
adults and among these candidates surveyed, most were visually impaired since their
birth. These people had been familiar with different types of technologies and operating
systems, so it is safe to presume these people were quite experienced on the concept
about how a simple mobile phone works. However, it was observed that major
challenges were faced due to the accessibility barriers and usability issues and so
external help was required. After having learnt the extent of difficulties that the visually
impaired people have to go through on a daily basis, it inspired a concept within our
team to create a user friendly system, focusing on the visually impaired people. The
concept of this project has been initiated as a stand to bring about focus on the
development of systems which could be usable by even handicapped people.
1.3 Problem Statement
The problem is to prepare a trainable dataset for Nepalese currency (Rs. 5, Rs. 10, Rs. 20, Rs. 50, Rs. 100, Rs. 500 and Rs. 1000), develop a model with accuracy greater than 90%, deploy the model through a web application and finally recognize Nepalese currency efficiently. The project is made with the vision of developing a blind-friendly currency recognition system in the future, aiming to empower and aid the visually impaired.
1.4 Objectives
The main objectives of this project are:
1. To prepare a trainable dataset of Nepalese currency notes (Rs. 5, Rs. 10, Rs. 20, Rs. 50, Rs. 100, Rs. 500 and Rs. 1000).
2. To develop a recognition model with accuracy greater than 90%.
3. To deploy the model through a web application that recognizes Nepalese currency efficiently.
1.5 Scope and Applications
This project aims to recognize Nepalese currency efficiently and accurately. The system is fed an image to be recognized, and an accurate denomination is expected as output.
According to the Nepalese Blind Survey conducted in 2011, about 0.84% of Nepalese are blind. While some might say that only a small minority would benefit from this project, we should not forget the inclusiveness that is direly needed if we are aiming for the sustainable development of the country. With the advancement of technology and the growing realization among Nepalese parents that their children should be sent to school irrespective of gender, caste, race or any other conventional barrier that minorities have had to put up with, we have seen immense involvement of children in education. Blind people are no exception. It is no longer surprising that blind people have excelled in technology, mastered various fields and given a new light to their identity. In an informal interview, a blind woman who is exceptionally well-versed, has completed her undergraduate degree and is currently pursuing a Master's degree described her inability to recognize money and how often it left her feeling helpless. So, with the vision of helping the entire blind community in the future, we aim to develop a Nepalese Currency Recognition system.
The following report first discusses the historical background of technological development in currency recognition worldwide and elaborates on the milestones achieved in currency recognition systems. The tools required to complete our work are discussed in Chapter 3. Chapter 4 elaborates on the methodology we have proposed after analyzing the different approaches applied so far. Chapter 5 presents the actual system architecture used for the project. The detours that had to be taken within the architecture over the course of the project are explained in Chapter 6, along with the result analysis. The limitations and future enhancements are explained in Chapter 7. Analyzing the results of the different models, we present our final chosen approach in the Conclusion chapter.
2. LITERATURE REVIEW
In 2003, a 'Euro Recognition System' was proposed by M. Aoba et al. The main features of the method are [3]:
1. A three-layered perceptron for classification
2. Radial Basis Function (RBF) networks for accuracy checking
The "Celeric" system was proposed by D. Gunnaratna et al. in 2008, using a shift operation. Noise patterns were removed without disturbing the paper note's unique pictures. The prominent features of this method are [5]:
1. Neural networks trained with color, brightness, noise, dust, etc.
2. Many layers of back-propagation
3. The Canny algorithm for edge recognition
A side-invariant technique for paper currency recognition was proposed by B. V. Chetan et al. in 2012, with two stages:
1. Matching against all database notes
2. Using the correlation of the input edges
An accuracy of 65% was achieved by the Gabor wavelet method, 51% by the method of subtraction, and 52.5% by the Local Binary Pattern (LBP) method, whereas the proposed technique achieved 99.5% accuracy on a particular dataset [6].
A technique proposed by Manzoor and Ali in 2013 used image processing as a cost-efficient method to identify Pakistani currency, with an efficiency of almost one hundred percent [7].
In 2012, a system was proposed by F. Lamont et al. in which classification under different lighting changes was the main emphasis. The main objective of this approach was the identification of Mexican currencies. A local binary model was used to extract color and texture, which were characterized accordingly [8].
Nayak and Danti in 2014 based recognition on prominent characteristics such as the denomination and print date; the research was done on Indian currency. The efficiency was also based on geometrical shape [9].
The Oriented FAST and Rotated BRIEF (ORB) algorithm was suggested by Ahmed, Taha and Selim in 2018 [10], in which the FAST detector and the BRIEF (Binary Robust Independent Elementary Features) visual descriptor were used, becoming a more optimal alternative to the Scale-Invariant Feature Transform (SIFT). This approach was tested on six kinds of Egyptian currency. After image preprocessing, important features were extracted from the background using ORB, followed by Hamming distance matching of the binary descriptors. An accuracy of 96% was achieved with a runtime of 0.682 s, which was shorter than that of the CRSFVI system.
Our 'Nepalese Currency Recognition System' uses a Convolutional Neural Network (CNN) because CNNs are very good at picking up patterns (lines, eyes, faces, colors, etc.) in input images. A CNN can operate directly on a raw image without hand-crafted feature extraction. Training can be computationally heavy, since tuning the many parameters is a complex task, but the convolutional structure makes this more tractable. In addition, the images are pre-processed, reshaped and augmented to improve the quality of the data and the performance of the network.
3. REQUIREMENT ANALYSIS
3.1 Software Requirements
3.1.1 Python
Python is a high-level, general-purpose programming language with a rich ecosystem of scientific and machine learning libraries; the entire system has been implemented in it.
3.1.2 NumPy
NumPy is a high-performance numeric programming extension for Python. It is the core library for scientific computing in Python.
3.1.3 TensorFlow
TensorFlow is an open-source machine learning framework that provides the computational backend for building and training the neural network.
3.1.4 Keras
Keras is a high-level neural network API running on top of TensorFlow; it is used to define, train and save the CNN model.
3.1.5 Flask
Flask is a lightweight web framework for Python; it is used to deploy the trained model as a localhost web application.
3.2 Hardware Requirements
Since the project is software based, the only hardware required is a PC with at least 4 GB of RAM and a dedicated GPU.
4. SYSTEM ARCHITECTURE AND METHODOLOGY
The overall pipeline consists of four stages:
1. Dataset Collection
2. Image Processing
3. CNN Training
4. Recognition
In the first stage, sample images have been collected by capturing them with cellphones and converted into a trainable dataset. This involves a series of image processing steps after collecting the samples, followed by feature extraction. The extracted features of each image are properly labelled and used as the dataset to train and validate the network.
After this, the user provides an image, and the system performs processing and displays the output for the provided image. Image processing is used while preparing the dataset and also during the recognition phase. Every real-world image to be recognized is passed through exactly the same image processing steps that were applied while preparing the dataset.
After the datasets are prepared, the CNN is built using appropriate algorithms and then trained until we obtain the best hypothesis, one that can recognize images with an accuracy greater than 90%. The model is taught the various classes and the way they are represented.
The trained classifier, in this case the CNN, is used in recognition to produce a prediction for the paper currency note present in the image to be recognized.
Since most neural network models assume a square input image, the dataset images are first resized to a fixed number of pixels. To ensure faster convergence during training, the images are converted to grayscale and their histograms are equalized, followed by normalization. During normalization, the mean is subtracted from each pixel and the result is divided by the standard deviation, so that each input parameter follows a similar distribution, as sketched below.
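A minimal sketch of this pipeline, assuming OpenCV (which Appendix B confirms the project uses) and an illustrative target size:

```python
import cv2
import numpy as np

def preprocess(img, size=32):
    """Resize, grayscale, equalize and normalize one image (illustrative)."""
    img = cv2.resize(img, (size, size))           # square input for the CNN
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # grayscale for faster convergence
    img = cv2.equalizeHist(img)                   # histogram equalization
    img = img.astype(np.float32)
    return (img - img.mean()) / img.std()         # zero mean, unit variance
```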
Image Augmentation
Since neural networks tend to over-fit on limited training datasets as the number of epochs increases, image augmentation techniques are implemented. The augmentation parameters include zoom, shear, rotation and pre-processing functions. Furthermore, contrast stretching, histogram equalization and adaptive histogram equalization can be used to produce augmented images. During training, these parameters yield output images with the corresponding attributes, as in the sketch below.
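A minimal sketch using Keras's ImageDataGenerator; the exact parameter ranges used in the project are not stated, so the values here are assumptions:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings covering the parameters named above.
datagen = ImageDataGenerator(
    rotation_range=10,            # random rotation (degrees), assumed range
    zoom_range=0.2,               # random zoom, assumed range
    shear_range=0.1,              # random shear, assumed range
    preprocessing_function=None,  # e.g. contrast stretching could go here
)
# datagen.flow(X_train, y_train, batch_size=32) then yields augmented
# batches during training.
```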
4.2.3 One-Hot Encoding
One-hot encoding is a method applied to convert categorical data into a form suitable for further algorithmic processing. With one-hot encoding, each categorical value is represented by a new categorical column, encoded as a binary vector of 1s and 0s in which only the index corresponding to that value is set to 1.
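For illustration, Keras's to_categorical (the function the project uses; see Appendix B) one-hot encodes a list of class IDs as follows:

```python
from tensorflow.keras.utils import to_categorical

# Three samples with class IDs 0, 2 and 1, out of 3 classes.
print(to_categorical([0, 2, 1], num_classes=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```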
The following operations are the basic building blocks of any Convolutional Neural Network; the most important step in developing a proper CNN-based model is to first understand how these blocks work.
1. Convolution Layer

$z^{l} = w^{l} \cdot A^{l-1} + b^{l}$ ... (4.1)

$A^{l} = g^{l}(z^{l})$ ... (4.2)

where,
$z^{l}$ = output of the neurons located in layer l
$w^{l}$ = weights of the neurons in layer l
$g^{l}$ = activation function
$b^{l}$ = bias in layer l
Taking padding and stride into account, the output matrix dimensions can be calculated
by the formula below.
$[n, n, n_c] * [f, f, n_c] = \left[ \frac{n+2p-f}{s} + 1,\ \frac{n+2p-f}{s} + 1,\ n_f \right]$ ... (4.3)
where,
$n$ = image size
$f$ = filter size
$n_c$ = number of channels in the image
$p$ = padding used
$s$ = stride used
$n_f$ = number of filters
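As a worked example of Eq. 4.3 using the dimensions from Chapter 5: a 128×128×3 input convolved with 32 filters of size 3×3 using 'same' padding (p = 1) and stride s = 1 gives (128 + 2 - 3)/1 + 1 = 128, i.e. an output volume of 128×128×32.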
2. Activation Functions
Activation functions are mathematical equations that determine the output of a neural network model. Since the composition of two linear functions is itself a linear function, without non-linearity all the layers would behave in the same way regardless of how many hidden layers are attached. A non-linear activation function is therefore required for the network to learn the mapping between inputs and desired outputs. The following activation functions have been chosen.
a. Softmax Function
The Softmax function is sometimes called the soft argmax function, or multi-class logistic regression, since it is a generalization of logistic regression that can be used for multi-class classification. The softmax function is ideally used in the output layer of the classifier, where probabilities are required to define the class of the input image.

$\text{Softmax: } \sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$ ... (4.4)
where,
$\vec{z}$ = input vector to the softmax function
$z_i$ = elements of the input vector
$e^{z_i}$ = standard exponential function applied to each element of the input vector
$K$ = number of classes in the multi-class classifier
The Softmax function turns a vector of K real values into a vector of K real values that sum to 1. The input values can be positive, negative, zero, or greater than one, but softmax converts them to values between 0 and 1, so they can be interpreted as probabilities. Small or negative inputs are converted to small probability values.
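A quick numeric illustration of Eq. 4.4 in NumPy (the input values are illustrative):

```python
import numpy as np

z = np.array([2.0, 1.0, -1.0])       # raw scores from the output layer
s = np.exp(z) / np.exp(z).sum()      # softmax, Eq. 4.4
print(s, s.sum())                    # ~[0.705 0.259 0.035], sums to 1.0
```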
b. ReLU Function
ReLU is the most commonly used activation function in neural networks, whose mathematical equation is:

$\text{ReLU: } f(x) = \max(0, x)$ ... (4.5)

So if the input is negative, the output of ReLU is 0, and for positive values it is x. Its derivative is:

$\text{ReLU derivative: } f'(x) = \begin{cases} 1 & \text{for } x > 0 \\ 0 & \text{for } x \le 0 \end{cases}$ ... (4.6)
Figure 4-1: ReLU Function
3. Batch Normalization

$z_N = \frac{z - m_z}{s_z}$ ... (4.7)

where,
$m_z$ = mean of the neurons' output
$s_z$ = standard deviation of the neurons' output
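A quick numeric check of Eq. 4.7 in NumPy (illustrative values, using the population standard deviation):

```python
import numpy as np

z = np.array([2.0, 4.0, 6.0])         # neuron outputs in a batch
print((z - z.mean()) / z.std())       # Eq. 4.7: [-1.2247  0.  1.2247]
```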
4. Pooling
The pooling operation involves sliding a two-dimensional filter over each channel of
feature map and summarizing the features lying within the region covered by the filter.
a. Max Pooling
Max pooling is a pooling operation in which the maximum element is chosen from each region of the feature map covered by the filter. The resulting feature map therefore contains the most prominent features of the preceding feature map, as the sketch below illustrates.
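A tiny, self-contained illustration of 2×2 max pooling with Keras (the input values are illustrative):

```python
import numpy as np
from tensorflow.keras.layers import MaxPooling2D

# One 4x4 single-channel "feature map" holding the values 0..15.
x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)
pooled = MaxPooling2D(pool_size=(2, 2))(x).numpy()
print(pooled[0, :, :, 0])
# [[ 5.  7.]
#  [13. 15.]]   <- the maximum of each 2x2 region survives
```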
5. Loss Function and Optimization
a. Categorical Cross-Entropy
Categorical cross-entropy is a cost function aimed at classification on datasets having more than two classes. It quantifies the difference between two probability distributions. The loss is calculated by the following formula:

$\text{Loss} = -\sum_{i=1}^{\text{output size}} y_i \cdot \log \hat{y}_i$ ... (4.8)
where,
$\hat{y}_i$ = i-th scalar value in the model output
$y_i$ = corresponding target value
output size = the number of scalar values in the model output
b. Adam Optimizer

$m_t = \beta_1 m_{t-1} + (1-\beta_1) \left[ \frac{\partial L}{\partial w_t} \right]$ ... (4.9)

$v_t = \beta_2 v_{t-1} + (1-\beta_2) \left[ \frac{\partial L}{\partial w_t} \right]^2$ ... (4.10)

$\hat{m}_t = \frac{m_t}{1-\beta_1^t}$ ... (4.11)

$\hat{v}_t = \frac{v_t}{1-\beta_2^t}$ ... (4.12)

$w_{t+1} = w_t - \hat{m}_t \left( \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon} \right)$ ... (4.13)
where,
$w_t$ = weight at time t
$w_{t+1}$ = weight at time t+1
$\beta_1, \beta_2$ = decay rates of the moving averages of the gradients ($\beta_1$ = 0.9, $\beta_2$ = 0.999)
$m_t$ = aggregate of gradients at time t
$\hat{m}_t$ = bias-corrected aggregate of gradients at time t
$v_t$ = sum of squares of past gradients
$\hat{v}_t$ = bias-corrected sum of squares of past gradients
$\alpha$ = step-size parameter / learning rate (0.001)
$\epsilon$ = a small positive constant to avoid a 'division by 0' error when $\hat{v}_t \to 0$ ($10^{-8}$)
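For intuition, here is a minimal NumPy sketch of a single Adam update following Eqs. 4.9-4.13; it is illustrative only, since the project relies on Keras's built-in optimizer:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for weight w given gradient grad (Eqs. 4.9-4.13)."""
    m = b1 * m + (1 - b1) * grad                 # Eq. 4.9: gradient aggregate
    v = b2 * v + (1 - b2) * grad ** 2            # Eq. 4.10: squared-gradient sum
    m_hat = m / (1 - b1 ** t)                    # Eq. 4.11: bias correction
    v_hat = v / (1 - b2 ** t)                    # Eq. 4.12: bias correction
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # Eq. 4.13: weight update
    return w, m, v
```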
6. Performance Metrics
a. Confusion Matrix
To assess the correctness and accuracy of the model, the confusion matrix is a suitable approach, since it is used in classification problems with two or more classes. Depending on how a prediction compares with the actual class, each input falls into one of the following cases: True Positive (TP), True Negative (TN), False Positive (FP) or False Negative (FN).
b. Accuracy
Accuracy is the number of correct predictions made by the model over all predictions made. Accuracy is a good measure when the target classes in the data are nearly balanced.

$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$ ... (4.14)
c. Precision
Precision is the proportion of positive predictions that are actually positive.

$\text{Precision} = \frac{TP}{TP+FP}$ ... (4.15)
d. Recall
Recall is the proportion of actual positives that are correctly predicted as positive.

$\text{Recall} = \frac{TP}{TP+FN}$ ... (4.16)
e. F1-Score
The F1-score is the harmonic mean of precision and recall, combining both into a single metric.

$F1\text{-}Score = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ ... (4.17)
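As a check on Eqs. 4.14-4.17, these metrics can be computed with scikit-learn (which the project already uses for data splitting); the class IDs below are illustrative:

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

y_true = [0, 1, 2, 2, 1, 0, 2]   # actual class IDs (illustrative)
y_pred = [0, 1, 2, 1, 1, 0, 2]   # predicted class IDs

print(confusion_matrix(y_true, y_pred))      # rows: true, columns: predicted
print(accuracy_score(y_true, y_pred))        # 6 correct out of 7
print(classification_report(y_true, y_pred)) # per-class precision/recall/F1
```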
4.3 Flowcharts
The training and deployment workflow is summarized by the following flowchart:
1. Start
2. Dataset Collection
3. Data Splitting
4. Preprocessing
5. Model Training
6. If the model is overfitting, perform Model Optimization and retrain (return to step 5); otherwise continue.
7. Confusion Matrix Analysis
8. If the loss does not satisfy 0.1 < Loss < 1, change the architecture and retrain (return to step 5); otherwise continue.
9. Model Deployment
10. Image Recognition
11. End
5. IMPLEMENTATION DETAILS
5.1 Dataset Collection
The dataset has been collected from the Kaggle website, from the author Gaurav Neupane, and was last updated two years ago [13]. After inspecting the dataset, it was found that the quantity of Rupees 5 images was inadequate, so we collected images of five-rupee notes ourselves. The team members and classmates captured images of five-rupee notes issued in different years, in different orientations and on different backgrounds. The resultant dataset thus had a roughly equal distribution for each class.
[Figure: Bar chart of the number of images per Class ID: Fifty [0], Five [1], Five Hundred [2], Hundred [3], Ten [4], Thousand [5], Twenty [6].]
From our total dataset of 17,229 images, we used 10,609 for training and 3,537 for validation, and the remaining 3,334 were used for testing. The numbers of training samples of Rs. 50, 5, 500, 100, 10, 1000 and 20 are 1487, 1500, 1513, 1483, 1547, 1522 and 1557 respectively.
5.2 Image Pre-Processing
No grayscale preprocessing has been applied, so that the color features are preserved. In this model, we decided to augment the datasets before training. The difference in accuracy and loss between models trained with and without preprocessing is discussed further in the Results and Analysis chapter.
Since categorical cross-entropy compares the predicted probability distribution against one-hot targets, the target labels of the training, testing and validation datasets were one-hot encoded.
5.4 Architecture
[Figure: Layer diagram of the CNN: stacked Conv2D layers with kernels 3×3×32×32, 3×3×64×64, 3×3×64×128, 3×3×128×128, 3×3×128×64 and 3×3×64×32, interleaved with Batch Normalization, MaxPooling2D and Dropout, followed by Flatten and Dense layers of shapes 8192×64, 64×32 and 32×7 with a final Softmax.]
The model is trained on datasets of seven classes after resizing the images to (128, 128) pixels, using a Convolutional Neural Network. The Keras Sequential class is used, which implicitly constructs the forward pass.
The input image of shape (128, 128, 3) is fed into two convolution layers with filter size (3, 3) and 32 filters each. The 'ReLU' activation function is used and the padding is set to 'same'. The convolutions are followed by Batch Normalization. The output shape after these layers is (128, 128, 32).
Similarly, further features are extracted by feeding the output of the above layers into two convolution layers with 64 filters each, keeping the remaining parameters constant, again followed by Batch Normalization. The size of the feature maps is reduced using 'MaxPooling' with pool size (2, 2), and the model is regularized using a dropout of 0.2. The output shape after these layers is (64, 64, 64).
The feature maps are then fed into two convolution layers where the number of filters is increased to 128, keeping the remaining parameters constant. This is followed by Batch Normalization, 'MaxPooling' with pool size (2, 2) and a dropout of 0.2. The output shape after these layers is (32, 32, 128).
The features are further refined by decreasing the number of filters to 64 and then 32, with the remaining parameters retaining their originally assigned values. Finally, 'MaxPooling' with pool size (2, 2) is applied. The output shape before the dense layers is (16, 16, 32).
The loss on the training, validation and testing sets is computed using categorical cross-entropy. The model uses the Adam optimizer to update the weights during back-propagation as the optimization technique for gradient descent.
The feature map generated above is flattened into a one-dimensional vector, which is then passed through fully connected dense layers. The class probabilities for the input image are stored in a list, and the correct output is displayed by taking the ClassID that corresponds to the index of the maximum probability value.
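The following Keras sketch reconstructs this architecture as we read it from the diagram and description above; the exact interleaving of Batch Normalization and Dropout is our inference, so treat it as illustrative rather than the verbatim project code:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (BatchNormalization, Conv2D, Dense,
                                     Dropout, Flatten, MaxPooling2D)

model = Sequential([
    # Block 1: two 32-filter convolutions on the (128, 128, 3) input.
    Conv2D(32, (3, 3), padding='same', activation='relu',
           input_shape=(128, 128, 3)),
    Conv2D(32, (3, 3), padding='same', activation='relu'),
    BatchNormalization(),
    # Block 2: two 64-filter convolutions, then pooling and dropout.
    Conv2D(64, (3, 3), padding='same', activation='relu'),
    Conv2D(64, (3, 3), padding='same', activation='relu'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Dropout(0.2),
    # Block 3: two 128-filter convolutions, then pooling and dropout.
    Conv2D(128, (3, 3), padding='same', activation='relu'),
    Conv2D(128, (3, 3), padding='same', activation='relu'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Dropout(0.2),
    # Block 4: filters decrease to 64 and 32, then a final pooling.
    Conv2D(64, (3, 3), padding='same', activation='relu'),
    BatchNormalization(),
    Conv2D(32, (3, 3), padding='same', activation='relu'),
    MaxPooling2D((2, 2)),
    # Classifier head: 16*16*32 = 8192 features into the dense layers.
    Flatten(),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(7, activation='softmax'),   # seven currency classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```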
After building this architecture, the trained model was saved and has since been used to predict the test dataset. The results achieved are discussed in the next chapter.
The input image after being resized to 128×128 pixels is shown below.
[Figure: Input Image]
5.4.2 Visualization of Each Layer
[Figures: feature-map visualizations of each layer, including Convolution_Layer_3 (Figure 5-7), Max_Pooling_0 (Figure 5-9), Convolution_Layer_5 (Figure 5-11), Convolution_Layer_7 (Figure 5-13), Dropout_1 (Figure 5-15), Batch_Normalization_3 (Figure 5-17) and Max_Pooling_2 (Figure 5-19).]
5.5 Model Deployment
To deploy the model thus created, functions from Keras, PIL, Flask and TensorFlow have been used to build a minimal localhost website, shown below.
Upon choosing an image for testing, the image is submitted through the 'imagefile' form field in index.html, which is accessed by a Python script in the backend. This script uses Flask functions to save the image file to a path, load it again, and apply the necessary preprocessing before passing it to the model. The tested image is then shown on the screen along with an audio playback that automatically plays on loop, declaring the denomination of the currency in both Nepali and English.
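A minimal sketch of the backend route is shown below; the saved-model path, upload location and template variable are illustrative assumptions, while the 'imagefile' field name follows the report:

```python
import numpy as np
from flask import Flask, request, render_template
from PIL import Image
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model('model.h5')            # assumed saved-model path
CLASSES = ['Fifty', 'Five', 'Five Hundred', 'Hundred',
           'Ten', 'Thousand', 'Twenty']   # ClassID order 0..6

@app.route('/', methods=['POST'])
def predict():
    # 'imagefile' is the form field set in index.html.
    f = request.files['imagefile']
    f.save('static/input.jpg')            # illustrative upload path
    img = Image.open('static/input.jpg').resize((128, 128))
    x = np.array(img)[np.newaxis].astype('float32')
    class_id = int(np.argmax(model.predict(x)))
    return render_template('index.html', label=CLASSES[class_id])
```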
6. RESULTS AND ANALYSIS
Having analyzed different architectures for our project, this chapter presents the results achieved from three models, Model 1, Model 2 and Model 3, using the architecture described above.
6.1 Model 1
This model uses the same architecture as the one discussed in our project, with the only difference that the training dataset is first preprocessed: the images are resized to 32×32 pixels, converted to grayscale and histogram-equalized, followed by normalization.
[Figure 6-1: Training and validation loss of Model 1 over 20 epochs.]
[Figure 6-2: Training and validation accuracy of Model 1 over 20 epochs.]
The model over-fitted after the 7th epoch, after which the validation curves (accuracy and loss) started diverging from the training curves. The overall test accuracy and loss are given below:
1. Accuracy = 46.01%
2. Loss = 2.2964
Figure 6-3: Confusion Matrix of Model 1
From the table, the lowest F1-score was found for Class [6], i.e. Rs. 20, for which the precision and recall were 0.24 and 0.29 respectively. This implies that only 24% of the images predicted as Rs. 20 are actually Rs. 20, whereas 29% of the true Rs. 20 images were correctly predicted.
The highest F1-score was found for Class [2], i.e. Rs. 500, for which the precision and recall were 0.61 and 0.57 respectively. This implies that 61% of the images predicted as Rs. 500 are actually Rs. 500, whereas 57% of the true Rs. 500 images were correctly predicted.
6.2 Model 2
Since the validation curves did not converge with the training curves, we decided to exclude the preprocessing used in Model 1. This model uses the same architecture but classifies the images using the color features as well.
[Figure 6-4: Training and validation loss of Model 2 over 20 epochs.]
[Figure 6-5: Training and validation accuracy of Model 2 over 20 epochs.]
The model converged better than the first model, but the overall performance score could not be increased beyond 68.9% accuracy no matter how much the model was optimized. The overall test accuracy and loss are given below:
1. Accuracy = 75.9%
2. Loss = 1.5005
Figure 6-6: Confusion Matrix for Model 2
From the table, the lowest F1-score was found for Class [4], i.e. Rs. 10, for which the precision and recall were 0.64 and 0.62 respectively. This implies that only 64% of the predicted Rs. 10 images are actually Rs. 10, whereas 62% of the true Rs. 10 images were correctly predicted.
The highest F1-score was found for Class [1], i.e. Rs. 5, for which the precision and recall were 0.94 and 0.89 respectively. This implies that 94% of the predicted Rs. 5 images are actually Rs. 5, whereas 89% of the true Rs. 5 images were correctly predicted.
6.3 Model 3
Since the input images in Model 1 and Model 2 were resized to 32×32 pixels, the model seemed to lose a lot of information due to distortion, which is why the images have been resized to higher dimensions. Thus, the final model discussed in this report is Model 3, in which the images are resized to 128×128 pixels and not otherwise preprocessed. The architecture remains the same.
[Figure 6-7: Training and validation loss of Model 3 over 20 epochs.]
[Figure 6-8: Training and validation accuracy of Model 3 over 20 epochs.]
Since this model exhibited the highest accuracy, 93.25%, it was finalized as our project model. The overall test accuracy and loss are given below:
1. Accuracy = 93.25%
2. Loss = 0.26
Figure 6-9: Confusion Matrix for Model 3
From the table, the lowest F1-score was found for Class [4], i.e. Rs. 10, for which the precision and recall were 0.95 and 0.78 respectively. This implies that 95% of the predicted Rs. 10 images are actually Rs. 10, whereas only 78% of the true Rs. 10 images were correctly predicted.
The highest F1-score was found for Class [1], i.e. Rs. 5, for which the precision and recall were 0.96 and 0.97 respectively. This implies that 96% of the predicted Rs. 5 images are actually Rs. 5, whereas 97% of the true Rs. 5 images were correctly predicted.
6.4 Output of Final Model
For each rupee note, we have chosen the best and worst cases. The best case refers to the output when 100% of the note is visible, whereas the worst case refers to about 50-70% of the note being visible in the image.
[Figures 6-10 through 6-16 show the recognition output for Rs. 5, Rs. 10, Rs. 20, Rs. 50, Rs. 100, Rs. 500 and Rs. 1000 respectively.]
7. FUTURE ENHANCEMENTS
7.1 Limitations
The main objective of this project was to obtain a model with the highest possible accuracy for predicting real-world images. Yet the model we have obtained is not perfectly intelligent.
The enhancements that can be made in the future are as follows:
1. Currencies shall be detected in real time.
2. The system shall be deployed as a mobile application.
3. Negative images shall be distinguished from the actual currency classes.
4. The system architecture can be reused for foreign currencies by updating the corresponding dataset.
5. Images received from users can be treated as additional data for further training.
8. CONCLUSION
Among the three models discussed above, the validation curves of Model 3 converged best with the training curves, achieving the highest test accuracy of 93.25% and the lowest loss value of 0.26, so this model has been chosen for deployment. For the time being, we have deployed the system on a localhost server. The system was implemented in the Python programming language and its performance was further tested on fresh images.
Despite the unaccounted limitations that exist in our project work, this project can be regarded as a great achievement and an opportunity for us to explore applications of a CNN model. The main objective of the project was to recognize the paper currency issued by Nepal Rastra Bank and provide its value to the user, which we have succeeded in doing.
Hence, this project work has produced a successful output for the course 'Minor Project' in partial fulfillment of the requirements for the award of the Degree of Bachelor of Engineering in Computer Engineering.
9. APPENDICES
Appendix A: Project Timeline
[Gantt chart: Research, Familiarization of Tools, Designing and Coding phases, spanning 11 Oct to 9 Apr.]
Appendix B: Headers and Commands
In order to train our model, some of the headers and functions that have been used are as follows:
'os' library: used to handle all the directory paths for the datasets, including the testing and validation datasets. The 'train' dataset path has been saved in a list variable named path.
np.array(): used to create a NumPy array of the images traversed one by one from the path defined above, such that 'classNo' stores the Class ID for each currency folder and 'images' stores the corresponding images inside each folder.
np.load(savepath3, allow_pickle=True): the NumPy arrays made above are saved using pickle, which makes it easy to load these pickle files later on.
train_test_split: to split the dataset folder 'train' into training and validation sets, we used the train_test_split function from the sklearn library with a split ratio of 0.25.
matplotlib.pyplot: imported to create a sample graph of the distribution of the number of images inside every ClassID folder, i.e. for each denomination of currency, a separate ClassID with a certain number of images.
cv2.cvtColor(), cv2.equalizeHist(): since we cannot train on the images in their raw form, we used a separate 'preProcessing' function in which the cv2 library converts the image to grayscale, equalizes the histogram and normalizes the image.
X_train = np.array(list(map(preProcessing, X_train))): a map function used to preprocess each image in the train, test and validation datasets.
X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1): the shape of each test, train and validation array has been set with channel 1 using the reshape() function.
to_categorical(): converts a NumPy array into a binary matrix with as many columns as there are categories in the data.
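Putting these pieces together, a condensed, illustrative sketch of the dataset-loading and splitting steps (the folder layout and variable names are assumptions based on the descriptions above) might be:

```python
import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

path = 'train'   # assumed dataset root: one sub-folder per ClassID (0..6)
images, classNo = [], []
for class_id in sorted(os.listdir(path)):
    folder = os.path.join(path, class_id)
    for name in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, name))
        images.append(cv2.resize(img, (128, 128)))
        classNo.append(int(class_id))

X = np.array(images)
y = np.array(classNo)

# Split 'train' into training and validation sets in the ratio 0.25,
# then one-hot encode the labels for categorical cross-entropy.
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25)
y_train = to_categorical(y_train, num_classes=7)
y_valid = to_categorical(y_valid, num_classes=7)
```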
References
[4] A. Ahmadi, S. Omatu and T. Kosaka, "A reliable method for recognition of paper currency by approach to local PCA," 2003.
[6] B. V. Chetan and D. P. V., "A Robust Side Invariant Technique of Indian Paper Currency Recognition," International Journal of Engineering Research & Technology (IJERT), vol. 1, no. 3, May 2012.
[8] M. M. Ahmed Ali, "Recognition System for Pakistani Paper Currency," Research Journal of Applied Sciences, Engineering and Technology, pp. 3078-3085, 2013.
[9] K. N. Ajit Danti, "Grid Based Feature Extraction for the," International Journal of Latest Trends in Engineering and Technology (IJLTET), vol. 4, no. 1, 2014.
[11] K. Rimal, "Cash Recognition," Cash Recognition for Visually Impaired, 2018.