Handwriting Recognition Using Machine Learning
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
By
May 2021
TABLE OF CONTENTS
DECLARATION
ACKNOWLEDGEMENT
LIST OF FIGURES
ABSTRACT
CHAPTER-1: INTRODUCTION
1.1 Introduction
3.2 Modelling
3.3 Designing
CHAPTER-4: ALGORITHMS
4.6 LSTM
CHAPTER-5: IMPLEMENTATION
5.5 Code
6.1 Results
6.2 Conclusion
6.3 Future scope
REFERENCES
DECLARATION
I hereby declare that the work reported in the B.Tech Project Report entitled "Handwriting
Recognition using Machine Learning" submitted at Jaypee University of Information
Technology, Waknaghat, India is an authentic record of my work carried out under the supervision of
Mr. Munish Sood. I have not submitted this work elsewhere for any other degree or
diploma.
Hitesh Thakur
171028
This is to certify that the above statement made by the candidate is correct to the best of my
knowledge.
Date: 29/06/2021
ACKNOWLEDGEMENT
I would like to thank God for guiding me throughout my academic journey and to acknowledge my
project supervisor Mr. Munish Sood for his unwavering support, invaluable motivation and guidance
throughout the project duration. Moreover, I extend my sincere gratitude to all the faculty and
non-teaching staff of the Department of Electronics and Communication Engineering for their
contribution towards the success of this work.
My friends have also helped and motivated me at every step of this project. Without such immense
support, making this project would have been very challenging.
LIST OF ACRONYMS AND ABBREVIATIONS
ML Machine Learning
AI Artificial Intelligence
VM Virtual Machine
ANN Artificial Neural Network
CNN Convolutional Neural Network
RNN Recurrent Neural Network
LSTM Long Short-Term Memory
HTR Handwritten Text Recognition
CTC Connectionist Temporal Classification
LIST OF FIGURES
Fig 2.1: RNN model
Fig 4.8: RNN
Fig 4.9: LSTM
Fig 5.4: Loading dataset
Fig 5.6: Cleaning data
Fig 5.13: CTC loss
ABSTRACT
This project is helpful in classifying handwritten words and characters and converting them
into digital format. I have used a machine learning architecture known as a recurrent convolutional
neural network to recognize and classify words and characters.
This model can be very helpful for basic tasks like data entry, keeping receipts, and
maintaining medical records. I will be training this model on a large amount of data so that it
can learn easily and efficiently.
CHAPTER – 1
INTRODUCTION
1.1 Introduction
Machine Learning is a part of Artificial Intelligence (AI). It uses various techniques to give
machines the ability to learn from large amounts of data. Machine Learning has
reduced the burden on programmers to explicitly program computers, and it has reached areas
that had never been explored before, leading to various technological advances. There are
two types of Machine Learning algorithms:
Supervised learning: a supervised learning algorithm learns from labelled training data and helps you predict
outcomes for unseen data. Effectively building, scaling, and deploying accurate supervised machine
learning models requires time and technical expertise from a team of highly
skilled data scientists. Moreover, data scientists must rebuild models to ensure the insights
they give remain true until the data changes.
Unsupervised learning is a machine learning technique in which you do not need to supervise the model. Instead,
you allow the model to work on its own to discover information. It
mainly deals with unlabelled data.
Unsupervised learning algorithms allow you to perform more complex processing
tasks compared to supervised learning. However, unsupervised learning can be more unpredictable
compared with other methods such as deep learning and reinforcement learning.
Because of new computing technologies, machine learning today is not like machine learning of the past. It was
born from pattern recognition and the theory that computers can learn without being
programmed to perform specific tasks. Researchers interested in AI wanted to see whether
computers could learn from data. The iterative aspect of machine learning is important because, as
models are exposed to new data, they are able to independently adapt. They learn from previous
computations to produce reliable, repeatable decisions and results.
Simple Programming Model: Machine Learning models can be programmed in Python, which
is one of the easiest yet most productive programming languages. Python also has one of the
biggest developer communities, which has produced huge libraries that make machine
learning algorithms easy to implement.
Cost Effective: Machine Learning with Python does not burn a hole in your
pocket when it comes to managing humongous amounts of data. This has been an issue with
predecessor software, which was cost-prohibitive; many organizations have had to
delete or downsize data in order to reduce their costs.
Handwriting recognition is the ability of the computer to interpret and recognise handwritten input. It is sometimes known
as HTR (handwritten text recognition). The input could be a scanned handwritten document or a photo of
a handwritten note, for instance. The growth and proliferation of touch screens add another way to
input handwriting.
The goal of handwriting recognition has been around since the 1980s, and it has suffered from
accuracy issues from the beginning. There are two types of handwriting recognition. The first and
older of the two is offline handwriting recognition, where the handwritten input is
scanned or photographed and given to the computer.
The second is online recognition, where the writing is input through a stylus or touchscreen. This offers
the computer more clues about what is being written (for instance, stroke direction and pen weight).
There is an abundance of handwritten data easily available on the internet. This data can
be used to train Machine Learning models which can convert handwritten documents into
digital form. This could be advantageous in the following domains:
Healthcare:
Handwriting recognition can be very beneficial in maintaining handwritten patient records.
This will be a gamechanger for the healthcare industry and make it very easy for
patients to access their medical records. It will also decrease the chances of people misreading
prescriptions and getting the wrong medicines.
Building Databases:
As we all know, paper documents can be destroyed by different means such as floods, fire
breakouts and termites. Handwriting recognition will help us overcome these problems by
digitally storing the data in the cloud.
Reducing the cost of storing the Data:
Saving a large amount of data in physical form can be very costly as it requires huge storage spaces,
and accessing this data can also be very challenging. Converting the same data to digital format is
very useful and cost effective. It will also decrease the use of paper for copying and storing data,
which is beneficial for the environment as well.
CHAPTER-2
LITERATURE SURVEY
Character recognition is one of the most fascinating and challenging research areas in the field
of image processing. English character recognition has been studied extensively over the most
recent 50 years, and nowadays different approaches are in use for character recognition. Document
verification, digital libraries, reading bank deposit slips, reading postal addresses, extracting
information from cheques, data entry, applications for credit cards, health insurance, loans, and tax
records are all application areas of electronic document processing. This paper gives a survey of
research work carried out on recognition of handwritten English letters. In handwritten
text there is no constraint on the writing style. Handwritten letters are difficult to recognize because
of diverse human handwriting styles and variation in the angle, size and shape of letters. Various
techniques of handwritten character recognition are analyzed here along with their
performance.[1]
The majority of current offline handwritten text recognition (HTR) algorithms operate at the line
level, converting each text-line image into a sequence of feature vectors. These features are
fed into an optical model (for example, a recurrent neural network) in order to recognize the handwritten
characters. Recent work on document-level text recognition and
localization, and on combined line segmentation and recognition at the paragraph level, has yielded
encouraging results. However, the best recognition results are still
obtained by systems that work at the line level.
In this model, thirteen stacked convolutional layers and three bidirectional LSTM layers with 256
units per layer are used. ReLU is applied after each CNN layer to introduce non-linearity. The bidirectional
LSTM is trained with the CTC[5] loss function, making the model end-to-end trainable. This model
was found to be very accurate on the READ dataset and won second
place in the ICDAR 2017 competition.
Fig 2.1: RNN model
CHAPTER 3
SYSTEM DEVELOPMENT
Jupyter Notebook:
The Jupyter Notebook is an open-source web application that allows you to create and share
documents that contain live code, equations, visualizations and narrative text. Uses include data cleaning
and transformation, numerical simulation, statistical modelling, data visualization, machine learning, and much
more. Jupyter Notebooks are a great way to create and iterate on your Python code for data
analysis.
Google Colab:
It is made by Google's research team. It is used to write and execute Python code online in a
browser. It is very useful and efficient for people who are learning Machine Learning.
3.1.2 Minimum Hardware Requirements:
RAM: 2 GB
Storage required: 20 MB
3.2 Modelling
The Incremental Model is a method of software development where requirements are broken into
multiple standalone modules of the software development cycle. Incremental development proceeds in
steps through analysis, design, implementation, and testing/verification, followed by maintenance. Each iteration
passes through the requirements, design, coding and testing phases. Each subsequent release of the
system adds functionality to the previous release until all planned functionality has been implemented.
The system is put into production when the first increment is delivered. The first increment is
often a core product in which the basic requirements are addressed, while supplementary features
are added in subsequent increments. Once the core product is evaluated
by the customer, further development is planned for the next increment.
The volume of information is increasing every day in the form of business transactions, scientific data,
pictures, videos and much more. We therefore need a system capable of extracting
the available information, one that can automatically generate reports, views or summaries of the data for better
use.
3.3 Designing
There are three phases in designing a model (a split illustrated in the sketch after this list):
Training Data: It is the data on which the machine learning model learns and trains itself. Usually
it is large in comparison to the test data.
Validation Dataset: It is the data used to tune the hyper-parameters of the classifier. It is sometimes also called the development
set.
Test Data: It is the data on which testing is done; the model is evaluated on the basis of results
obtained from this dataset. Usually it is small in comparison to the training data.
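As a minimal sketch of such a three-way split, the following uses scikit-learn; the 80/10/10 proportions and the dummy data are assumptions for illustration, not the project's actual split.

```python
# Minimal sketch of a train/validation/test split using scikit-learn.
# The 80/10/10 proportions and random data are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 32)            # 1000 dummy samples, 32 features each
y = np.random.randint(0, 10, size=1000)

# First carve out 20% for validation + test, then split that half and half.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 800 100 100
```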
CHAPTER-4
ALGORITHMS
Logistic Regression is an algorithm used for binary classification and is the most basic
classification technique. It uses the sigmoid function for predicting the output, and
makes use of a decision boundary to predict the class.
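As a concrete illustration of the sigmoid and the decision boundary, here is a minimal sketch using scikit-learn; the one-dimensional synthetic data is an assumption, not the project's data.

```python
# Minimal sketch: logistic regression for binary classification.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic 1-D data: class 1 tends to have larger feature values.
X = np.array([[0.1], [0.4], [0.9], [1.5], [2.0], [2.6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# predict_proba applies the sigmoid 1 / (1 + exp(-(w*x + b))) internally;
# the decision boundary is where this probability crosses 0.5.
print(clf.predict_proba([[1.2]]))  # class probabilities for x = 1.2
print(clf.predict([[1.2]]))        # predicted class label
```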
In ANN (Artificial Neural Networks) we construct a kind of transient states, which allows the
machine to learn in a more refined manner. The objective here is to relate the
structure of ANN algorithms to the functioning of the human mind. It is rightly said that the working
of an ANN takes its roots from the neural network residing in the human brain. An ANN
works on something referred to as hidden states. These hidden states are similar to neurons; each of
them is a transient form with probabilistic behaviour. A grid of such hidden
states acts as a bridge between the input and the output.
This algorithm is extremely popular in various competitions. The end output of the model is
like a black box and hence should be used judiciously. Random Forest is like a
bootstrapping algorithm with a Decision Tree (CART) model. In Random Forest, we grow multiple trees
as opposed to a single tree in the CART model. To classify a new object based on its attributes, each
tree gives a classification and we say the tree "votes" for that class. The forest chooses the
classification having the most votes (over all the trees in the forest) and, in
case of regression, takes the average of the outputs of the different trees.[4]
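A minimal sketch of this majority-vote behaviour using scikit-learn is shown below; the built-in digits dataset and the 100-tree setting are assumptions for illustration.

```python
# Minimal sketch: Random Forest majority voting with scikit-learn.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)          # small built-in digits dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 trees are grown; each tree "votes" and the majority class wins.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(forest.score(X_te, y_te))              # multiclass accuracy
```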
Algorithm | Working | Strengths | Speed
Artificial Neural Networks | Activate the nodes of the next layer and then apply backpropagation. | Can take a huge number of features and gets refined as the number of nodes increases. | Fast for multiclass classification
Random Forest | Each tree votes for a class and the one with the most votes wins. | Highly scalable and very efficient for multiclass classification. | Fast for multiclass classification
The design of a ConvNet is analogous to the connectivity pattern of neurons in the human
brain and was inspired by the organization of the visual cortex. Individual neurons respond to
stimuli only in a restricted region of the visual field known as the receptive field. A collection of
such fields overlaps to cover the entire visual area.
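A minimal Keras sketch of this local-receptive-field idea follows; the input size and filter counts are assumptions, not the project's exact configuration.

```python
# Minimal sketch: a small ConvNet where each 3x3 kernel looks only at a
# local receptive field of the input, mirroring the visual-cortex analogy.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(32, 128, 1)),          # grayscale image (assumed size)
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),               # keep only prominent features
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
])
model.summary()
```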
A recurrent neural network is a type of neural network with feedback: the output from the
previous step is fed into the current step as an input. Unlike CNNs and ANNs, they have a memory
element in which they can store the sequence of outputs. This feature makes them suitable for
applications like speech recognition, handwriting recognition, and making predictions.
Fig 4.8: RNN
But there are some limitations of RNNs, the two major ones being:
Exploding gradients
Vanishing gradients
In a neural network, weights are updated constantly using error gradients. If an error gradient
which is assigned a very large value is used to
update the weights of the model, the updates can accumulate and become very large with every iteration.
This problem of exploding gradients makes the network unstable and the loss of the network
becomes very high. The problem can easily be solved by clipping and squashing the
gradients, as in the sketch below.
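A minimal sketch of gradient clipping in Keras follows; the learning rate and clipping thresholds are assumptions for illustration.

```python
# Minimal sketch: gradient clipping in Keras to counter exploding gradients.
# clipnorm rescales ("squashes") any gradient whose norm exceeds 1.0.
from tensorflow import keras

optimizer = keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
# Alternatively, clip each gradient element to a fixed range:
optimizer_cv = keras.optimizers.Adam(learning_rate=1e-3, clipvalue=0.5)
```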
Vanishing gradients are exactly the opposite of exploding gradients: here the values of the gradients are very small,
so the model stops learning from them and skips ahead without learning. This is much harder to solve
than exploding gradients, but it can be addressed by the use of LSTMs.
4.5 LSTM:
Long short-term memory (LSTM) is a special kind of recurrent neural network which is capable of
learning long-term dependencies. LSTMs make it possible for RNNs to remember inputs for a
longer period of time because of the presence of a memory element in LSTMs.
Fig 4.9: LSTM
There are different operations happening inside the LSTM cell; these enable the LSTM to remember
useful information and forget information that will not help the model improve further.
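A minimal sketch of a bidirectional LSTM layer in Keras follows; the sequence length, feature size and unit count are assumptions for illustration.

```python
# Minimal sketch: a bidirectional LSTM over a sequence of feature vectors.
# Shapes are illustrative: 32 timesteps, 64 features per step.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 64))
# return_sequences=True keeps one output per timestep, as CTC later requires.
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(inputs)
model = keras.Model(inputs, x)
model.summary()
```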
Connectionist Temporal Classification Loss, or CTC loss, is very helpful in tasks like speech and
handwriting recognition. It is helpful where there are pauses or repeated words or letters.
It estimates a loss between a continuous (unsegmented) time series and a target sequence. It does
this by accumulating the probabilities of alternative input-target alignments, yielding a loss value
that is differentiable with respect to each input node. The input-to-target alignment is expected to be
many-to-one.
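A minimal sketch of computing this loss with TensorFlow's built-in tf.nn.ctc_loss follows; the class count, sequence lengths and random scores are assumptions for illustration.

```python
# Minimal sketch: computing CTC loss with TensorFlow. An unsegmented
# sequence of per-frame character scores is aligned to a shorter target.
import tensorflow as tf

batch, frames, num_classes = 2, 50, 28     # 26 letters + space + blank (assumed)
logits = tf.random.normal([frames, batch, num_classes])        # time-major scores
labels = tf.constant([[3, 1, 20], [8, 9, 0]], dtype=tf.int32)  # dense targets
label_length = tf.constant([3, 2], dtype=tf.int32)  # true target lengths
logit_length = tf.fill([batch], frames)             # frames per input sequence

# Sums probabilities over all valid many-to-one input/target alignments.
loss = tf.nn.ctc_loss(labels, logits, label_length, logit_length,
                      logits_time_major=True, blank_index=-1)
print(loss.numpy())   # one differentiable loss value per batch element
```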
CHAPTER-5
IMPLEMENTATION
Why Python?
Python is very easy to understand and is a beginner-friendly high-level language.
Python's simplicity allows us to write reliable systems, and machine learning models can be
built and trained quickly with it.
Datasets Used:
The MNIST database of handwritten digits, openly available, has a training set of
60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The
digits have been size-normalized and centered in a fixed-size image.
The data from the dataset is divided into training and validation sets in the ratio of 10:1.
The training set contains 38,000 images whereas the validation set contains 3,800 images.
The RNN is trained on the images from the training set and then its accuracy is checked by using
the model on the validation set.
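As a minimal sketch, MNIST can be loaded and normalized directly in Keras as below; note the word-recognition experiments later use a separate CSV-based dataset, so this is illustrative only.

```python
# Minimal sketch: loading the MNIST digits with Keras and normalizing to 0-1.
# (The word-recognition experiments later use a separate CSV-based dataset.)
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0   # 60,000 28x28 training images
x_test = x_test.astype("float32") / 255.0     # 10,000 test images
print(x_train.shape, x_test.shape)
```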
RAM : 14 GB DDR4
HDD : 73 GB
5.5 Code:
First we import all the libraries necessary for this model, such as pandas, keras, matplotlib and
numpy. These libraries are central to this project.
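The report's exact import list is not shown; a representative set, assuming a TensorFlow/Keras stack with OpenCV for image handling, might look like this:

```python
# Representative imports for this project (the exact list is an assumption).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2                                   # image loading and resizing
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
```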
Fig 5.4: Loading dataset
In this step we load the data from our selected dataset. The variables train and valid load data from the
training and validation files of the dataset. The pandas method head is used to view the first five
rows of the loaded dataset.
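A sketch of this loading step is below, assuming the imports above; the CSV file names and column names are assumptions about the dataset layout.

```python
# Sketch of the loading step; file names and columns are assumptions.
train = pd.read_csv("written_name_train.csv")        # e.g. FILENAME, IDENTITY
valid = pd.read_csv("written_name_validation.csv")
print(train.head())   # head() shows the first five rows
```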
Here we use matplotlib to view the images and their labels. This is done to check
whether the images and labels match. As we can see, the labels and the words in the images are correct,
so we can now move forward.
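A sketch of this verification step follows, continuing from the loading sketch; the image directory and column names remain assumptions.

```python
# Sketch: displaying a few images with their labels to verify they match.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for i, ax in enumerate(axes):
    img = cv2.imread("train/" + train.loc[i, "FILENAME"], cv2.IMREAD_GRAYSCALE)
    ax.imshow(img, cmap="gray")
    ax.set_title(train.loc[i, "IDENTITY"])   # label shown above the image
    ax.axis("off")
plt.show()
```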
Fig 5.6: Cleaning data
In this step we check whether there are any cells with missing data; such cells are
filled with NaN. Then we use dropna to remove these rows from our training and validation
sets.
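A minimal sketch of the cleaning step, continuing from the DataFrames above:

```python
# Sketch: dropping rows whose cells are missing (NaN) in both splits.
train.dropna(axis=0, inplace=True)
valid.dropna(axis=0, inplace=True)
train.reset_index(drop=True, inplace=True)
valid.reset_index(drop=True, inplace=True)
```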
The images in this dataset are not of uniform dimensions, so in this step we make
all the images the same size. We do this by cropping an image if it is larger
than the required dimensions and padding it if it is smaller, so that all the
images have uniform size.
Here we load the validation images in the same fashion. We also reshape all the images of
the training and validation sets and normalize them to the range 0-1.
In this step the whole RNN model is built. I have used 4 ConvNets with max pooling and
batch normalization so that our model can learn the features. The output is reshaped so that it can be
fed into the LSTM RNN, which learns the words. Finally, classification is done in the last
dense layer.
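A hedged sketch of the described architecture follows; the input size, filter counts, LSTM width and character-set size are all assumptions consistent with the description, not the report's exact code.

```python
# Sketch of the described model: 4 conv blocks with max pooling and batch
# normalization, reshaped into a sequence for a bidirectional LSTM, ending
# in a dense softmax layer. All sizes here are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

num_chars = 30                       # assumed character-set size incl. CTC blank

inputs = keras.Input(shape=(64, 256, 1), name="image")
x = inputs
for i, filters in enumerate((32, 64, 128, 128)):   # 4 ConvNet feature blocks
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    # Pool width only twice so enough horizontal steps survive for the RNN.
    x = layers.MaxPooling2D((2, 2) if i < 2 else (2, 1))(x)
    if i > 0:                                       # 3 dropout layers in total
        x = layers.Dropout(0.3)(x)

# Turn each image column into one timestep: (4, 64, 128) -> (64, 512).
x = layers.Permute((2, 1, 3))(x)
x = layers.Reshape((64, -1))(x)

x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
outputs = layers.Dense(num_chars, activation="softmax", name="char_probs")(x)

model = keras.Model(inputs, outputs)
model.summary()
```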
Fig 5.13: CTC loss
Here we build the CTC loss function so that we can train our model using CTC loss. CTC is
used when an LSTM layer is present, and it does a very good job in applications like speech
recognition and handwriting recognition.
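A sketch of wiring the CTC loss in through a Lambda layer follows, continuing the model sketch above (it reuses inputs and outputs from there); the extra input shapes are assumptions.

```python
# Sketch: CTC loss attached via a Lambda layer, continuing the model above.
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import backend as K

labels = keras.Input(shape=(None,), dtype="float32", name="labels")
input_len = keras.Input(shape=(1,), dtype="int64", name="input_length")
label_len = keras.Input(shape=(1,), dtype="int64", name="label_length")

def ctc_lambda(args):
    y_pred, y_true, in_len, lab_len = args
    return K.ctc_batch_cost(y_true, y_pred, in_len, lab_len)

loss_out = layers.Lambda(ctc_lambda, name="ctc")(
    [outputs, labels, input_len, label_len])

# The training model outputs the loss itself, so compile with an identity loss.
train_model = keras.Model([inputs, labels, input_len, label_len], loss_out)
train_model.compile(optimizer="adam", loss=lambda yt, yp: yp)
```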
Finally, we train the model to find out how well it fits the data.
We use a Lambda layer to call the CTC function built in the previous step.
The batch size is set to 128, chosen arbitrarily, and the model is trained for 20 epochs
before giving any result.
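A sketch of the training call with the stated batch size and epoch count follows; the array names (x_train, y_train, the length arrays) are placeholders for the prepared data, not names from the report.

```python
# Sketch: training with the stated batch size (128) and epoch count (20).
import numpy as np

history = train_model.fit(
    x=[x_train, y_train, train_input_len, train_label_len],
    y=np.zeros(len(x_train)),       # dummy targets; loss comes from the Lambda
    validation_data=([x_val, y_val, val_input_len, val_label_len],
                     np.zeros(len(x_val))),
    batch_size=128,
    epochs=20,
)
```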
In this model there are 4 convolution layers, which are used for feature extraction from the image.
There are 4 max pooling layers to keep only the prominent features.
There are also 3 dropout layers so that the model generalizes and does not overfit the data.
The reshape layer reshapes the data from the ConvNets to make it suitable to feed into
the RNN.
The RNN layer uses a bidirectional LSTM so that words can be predicted accurately.
At the end we use a dense layer whose outputs are used to classify the words.
CHAPTER-6
6.1 Results:
I have successfully trained the RNN model on the selected dataset and tested it. The model is able
to recognise handwritten words with high accuracy.
The model has been able to predict different characters with an accuracy of more than 90%.
Training this model for more epochs would have been a waste of resources, as beyond this point there
is very little room for improvement.
The model was able to correctly predict 5 out of 6 images. It was not able to predict the word
"GORTCHAKOFT" as it is not written properly.
6.2 Conclusion:
This project will be helpful in converting handwritten text to a digital format, which will be very
useful for converting old handwritten documents and notes to text files. These files are easy to
store, edit, share and read. This project will also help in preserving old important documents
which are difficult to store physically.
6.3 Future Scope:
The accuracy of this model can be further improved by using more training data, which will help
the model learn and generalize better.
Some of the images in the dataset are not of very good quality, and the annotations of some images
are wrong.
Removing such images will also help the model's learning.
REFERENCES
[1] Nisha Sharma et al., "Recognition for handwritten English letters: A Review", International Journal of Engineering and Innovative Technology (IJEIT), Volume 2, Issue 7.
[2] J. Pradeep et al., "Diagonal based feature extraction for handwritten alphabets recognition system using neural network", International Journal of Computer Science and Information Technology (IJCSIT), Vol 3, No 1.
[3] Miroslav NOHAJ, Rudolf JAKA, "Image preprocessing for optical character recognition using neural networks", Journal of Pattern Recognition Research.
[4] Jehad Ali, Rehanullah Khan, Nasir Ahmad, Imran Maqsood, "Random Forests and Decision Trees", https://www.analyticsvidhya.com/blog/2016/04/complete-tutorial-tree-based-modeling-scratch-in-python/