Fourth International Conference on Electronics, Communication and Aerospace Technology (ICECA-2020)
IEEE Xplore Part Number: CFP20J88-ART; ISBN: 978-1-7281-6387-1

Extraction of Information from Handwriting using Optical Character Recognition and Neural Networks

Piyush Mishra, Pratik Pai, Mihir Patel, Dr. Reena Sonkusare (Head of Department)
Electronics and Telecommunication Department, Sardar Patel Institute of Technology, Mumbai, India

Abstract—Image processing is a vital tool when one is dealing with several images and wishes to perform complex operations on them. With advances in technology, one can now compress or manipulate any image and extract the required information from it. One such application of image processing is detecting handwritten text and converting it to a digital text format. The main objective is to bridge the gap between the actual piece of paper and the digital world; in doing so, one can operate on the digital data much faster than on the physical data. Hence, in this paper, we aim to implement the detection of handwritten text via Optical Character Recognition (OCR). The entire system is implemented on TensorFlow. This research work has also analyzed various results and selected an appropriate dataset to train the model. Further, the importance of this paper lies in the fact that it can facilitate and open various unexplored avenues. The key novelty of the paper lies in the fact that the data-set used is comprehensive, which helps to produce better results. In addition, the paper successfully analyzes handwritten scripts and extracts them in digital form. Analyzing the text can help combat forgery, understand certain temperaments of the person writing the text, and so on. Coupled with this, this paper implements an improvement over the pre-existing solutions by combining Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).

Index Terms—Image processing, handwriting detection, optical character recognition (OCR), TensorFlow, convolutional neural networks (CNN), recurrent neural network (RNN)

I. INTRODUCTION

Humans have constantly been evolving and working towards making their lives better. Technology forms one of those aspects, wherein humans continuously make innovations and advancements both to improve the user experience and to perform complex tasks in a very short span of time. Coupled with this, internet penetration has increased by leaps and bounds. Since the inception of the World Wide Web (WWW), the number of internet users has been increasing at a striking rate. Commensurate with this increase, a lot of data has been digitized. This digitization has enabled seamless transmission of data in various forms and further enables us to extract a great deal of information both quickly and efficiently. When one has digital data, one can manipulate it according to one's requirements and arrive at results, which earlier used to take a lot of time, in a matter of seconds. Converting handwriting to digital data falls into this category. This conversion opens a myriad of avenues and has a wide range of applications.

It is important for the converted data, i.e. in most cases digital text, to be in a palpable and understandable format for the user to be able to make full use of it. Hence, this paper converts the handwriting into digital text that is fairly easy to comprehend, and it aims to achieve this task using optical character recognition (OCR). A Neural Network (NN) model is devised and trained on the dataset. This neural network model consists of various layers, as discussed in detail afterward. The image of a word acts as the input to the entire model and passes through the several layers, eventually coming out as digital text data. Since the data-set chosen is fairly exhaustive, the training should be sufficient to keep the accuracy of the model satisfactory. Although this is an assumption, the proposition is strengthened by bolstering the work in this paper with performance metrics, which help deduce the exact accuracy of the model and hence indicate areas for further research. The speed of computation is also analyzed and kept in mind for comparison.

II. LITERATURE SURVEY

Optical character recognition (OCR) is a broad domain of research in soft computing, artificial intelligence (AI), pattern recognition (PR), and computer vision. OCR is a general technique for digitizing handwritten texts or pictures of printed documents, so that they can be electronically amended, stored, and searched more efficiently and correctly. According to [1], there are two main types of OCR: offline and online. These types differ mainly in the form of the input. In the offline mode, the input is static information (via images), whereas in the online mode the information is obtained via real-time writing objects. Online methods capture the position of the pen as a function of time directly from the interface.


This is typically done through pen-based interfaces, where the author writes with a special pen on an electronic tablet. Therefore, the online method usually has a higher complexity due to the dynamics involved.

Fig. 1. Types of OCR.

However, the offline mode does not seem simple either, due to the variation of handwriting and several unprecedented handwriting styles. Owing to such wide applications, OCR has been a great research topic and has been advancing by leaps and bounds. [2] shows the historical development of OCR and how it has arrived at its modern form. Such an overview provided us with a firm understanding of OCR.

For the implementation of OCR one actually has a myriad of options. As observed in [3], one can implement OCR without segmentation. This process has both merits and demerits; one of its biggest drawbacks is that it is not very comprehensive. Similarly, in [4] we can see that support vector machines (SVM) can also be used for OCR. Further, the K-nearest neighbor (KNN) algorithm has also been used for the same task. KNN is a smart way to perform OCR but has its limitations, because it is after all only a classification algorithm and hence fails to provide adequate insights [5].

Given the advancements of technology and the evident shortcomings of the ideas discussed above, a holistic solution is provided by Convolutional Neural Networks. This paper explores the ways in which the limitations of [3], [4], [5] are tackled and how this has been achieved with a greater level of accuracy. As discussed before, the main aspect of overcoming these limitations is the usage of NNs. Text is an arbitrary sequence of characters, and for that reason one requires higher accuracy. This problem is efficiently solved by using a Recurrent Neural Network (RNN).

III. METHODOLOGY

As discussed above, OCR has been researched for a long time, and the accuracy of every model is increasing and implementations are improving day by day. Earlier, for handwritten text recognition, the widely acclaimed hidden Markov models (HMM) were used [6]. HMMs are basically a set of probabilistic tools for dealing with sequences. But after the development of Deep Learning, the HMM has become obsolete; Deep Learning has made up for the shortcomings of the HMM. There are several alternatives to the HMM [7], which have hence increased the accuracy of such models. In this paper, a CNN and an RNN, i.e. Convolutional Neural Networks and Recurrent Neural Networks, have been used, as depicted in Fig. 2.

Fig. 2. Basic Model Overview.

As can be clearly seen in Fig. 2, the model consists of the following layers:

1) CNN (Convolutional Neural Networks): A CNN is a type of neural network which consists of an input and an output layer with several hidden layers between them. These layers are colloquially known as convolutions; technically, they compute a sliding dot product or cross-correlation. The activation function used is the Rectified Linear Unit (ReLU), which basically serves the purpose of a rectifier. The convolutions are subsequently followed by various other layers such as pooling layers, fully connected layers, and normalization layers. ReLU is chosen because of its evident advantages [8].

A convolution is designed such that the kernel is defined by its width and height (hyper-parameters) and the number of input and output channels (hyper-parameters); the depth of the convolution filter (the input channels) must be equal to the number of channels (depth) of the input feature map.

Fig. 3. Overview of a CNN with the ReLU activation model.

The aforementioned hyper-parameters then control the size of the output of the convolution layers. Firstly, the number of neurons in the input region is controlled by the depth.


After training, the neurons learn to activate for specific parts of the input. Secondly, how the depth columns are organized along the spatial dimensions is controlled by the stride. As the stride length is increased, the spatial spacing also increases, and the region of the input passed to each receptive field grows. Lastly, the output and the input size must always be matched, hence padding plays a vital role in the CNN. These are the three hyper-parameters.

After all the hyper-parameters have been devised and specified, the number of hidden layers is then decided. In the case of a linear relationship between input and output, a single hidden layer is present; this forms a simple neural network. With more than one hidden layer, it forms a deep neural network. In this paper the number of CNN layers implemented is 5. Each input can be taken as an individual neuron, and each input has a connection to a neuron in a hidden layer. Each mapping has a different weight and bias. The weighted inputs are summed, and this sum is passed through the ReLU layer. The ReLU layer is responsible for removing the negative values from an activation map by setting them to zero. By doing so, the decision making over the non-linear properties of the decision function becomes a lot simpler.

Fig. 4. CNN model.
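To make the role of these hyper-parameters concrete, the following minimal TensorFlow/Keras sketch (ours, for illustration; the filter count and kernel size are arbitrary choices, not taken from the paper) shows how the number of filters, the kernel size, the stride, and the padding determine the output shape of a single convolution layer:

import tensorflow as tf

x = tf.random.normal([1, 32, 128, 1])       # one 128x32 gray image (batch, H, W, C)

conv = tf.keras.layers.Conv2D(
    filters=32,                             # output depth: number of feature maps
    kernel_size=(5, 5),                     # kernel width and height
    strides=(1, 1),                         # step of the sliding window
    padding="same",                         # pad so output spatial size matches input
    activation="relu",                      # ReLU zeroes out negative activations
)
print(conv(x).shape)                        # (1, 32, 128, 32)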

2) RNN (Recurrent Neural Networks): Recurrent neural networks are non-linear adaptive models that are formed by applying weights recursively over a structured input, producing a structured prediction over inputs of varying size, or a scalar prediction, by traversing a given structure in topological order.

RNNs have proven to be highly efficient and successful in the field of natural language processing. A recurrent neural network is designed to use unbounded history information: it has recurrent connections on its hidden states, so that history information can circulate inside the network for an arbitrarily long time. There are many advantages of using an RNN over traditional neural networks, such as the ability to process large amounts of data, better accuracy because it makes use of historical information, and a model size that does not grow with the size of the input. However, RNN computation is slow, and an RNN cannot consider any future input for the current state.
all input images do not have same dimensions, the image needs
IV. DATA-SET USED

The data-set used is fairly comprehensive and complex, so as to achieve better results. The data-set used here is the IAM dataset, which consists of a collection of handwritten passages by several writers. This collection helps to classify the writers based on their respective handwriting. The data-set is fairly big and contains complex handwriting with a lot of punctuation. The corpus of the IAM dataset is annotated by assigning to each word a unique word-id, the gray level used to binarize the line containing the word, a bounding box around the word in Cartesian coordinates along with width and height (x, y, w, h), a grammatical tag, and the actual label for the word. Of these, the word-id, gray level, bounding box dimensions, and the actual label are the features used in this work.

The partition methodology has been used in order to implement the model. Firstly, the model is trained: 70 percent of the corpus is used for training, where the model sees and learns from the data; in this phase the weights and biases of the NN are determined. Secondly, the remaining 30 percent of the data-set forms the test set, which is used for the final evaluation of the model.

Fig. 5. Data-set snippet.
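As an illustration of how these annotations might be consumed, the sketch below parses word-level entries into (id, gray level, bounding box, label) tuples and applies the 70/30 partition described above. It is our own sketch: the exact column layout of the IAM annotation file is an assumption and should be checked against the distributed words.txt.

import random

def load_iam_words(ann_path="words.txt"):   # hypothetical path to IAM annotations
    samples = []
    with open(ann_path) as f:
        for line in f:
            if line.startswith("#"):        # skip the comment header
                continue
            parts = line.strip().split(" ")
            word_id = parts[0]              # unique word-id
            gray_level = int(parts[2])      # gray level used to binarize the line
            x, y, w, h = map(int, parts[4:8])  # bounding box (assumed field positions)
            label = parts[-1]               # actual label (transcription) of the word
            samples.append((word_id, gray_level, (x, y, w, h), label))
    return samples

samples = load_iam_words()
random.shuffle(samples)
split = int(0.7 * len(samples))             # 70 percent training, 30 percent test
train_set, test_set = samples[:split], samples[split:]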


V. ALGORITHM

The handwritten text recognition process can be broadly divided into two sub-parts: 1. character recognition and 2. text (word) recognition. The input image contains a sentence comprising several words. In order to detect a particular word, the system first needs to identify each character in the word; the letters are then used to predict the word. Thus, implementing this system requires two kinds of neural network models: A. a Convolutional Neural Network (CNN) for character recognition using image processing, and B. a Recurrent Neural Network (RNN) [9] to determine the word based on the previous and current character input.

1) Image Pre-processing: The input image containing the written text is a gray-value image of dimensions 128x32. Since not all input images have the same dimensions, each image needs to be resized without distortion and padded so that the resultant image has dimensions of 128x32. This is done by adding white space to the height or width (whichever needs to be adjusted), as shown in Fig. 6. After this, the gray-scale values of the input image are normalized. The image is now pre-processed for passing to the neural network.

Fig. 6. Image Padding: An arbitrary sized image padded with white space to fit the target image size of 128x32.
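A minimal sketch of this pre-processing step, assuming OpenCV/NumPy (the target size follows the paper; everything else, including the file name, is our illustration):

import cv2
import numpy as np

def preprocess(img, target_w=128, target_h=32):
    h, w = img.shape
    scale = min(target_w / w, target_h / h)            # resize without distortion
    new_w, new_h = max(1, int(w * scale)), max(1, int(h * scale))
    img = cv2.resize(img, (new_w, new_h))
    canvas = np.full((target_h, target_w), 255, np.uint8)  # white background
    canvas[:new_h, :new_w] = img                       # pad the remainder with white
    return canvas.astype(np.float32) / 255.0           # normalize gray values to [0, 1]

word = cv2.imread("word.png", cv2.IMREAD_GRAYSCALE)    # hypothetical input file
net_input = preprocess(word)[None, ..., None]          # shape (1, 32, 128, 1)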

2) Convolutional Neural Network (5 Layers): The pre-processed image is divided into a fixed sequence length before being passed to the CNN. Here, a sequence length of 32 has been chosen, and for feature extraction the CNN maps each of the 32 sequence positions to 256 features. Some features show relatively high correlation with high-level properties of the input image. Thus, when the entire training data-set is passed into the CNN, these features are extracted from the input images. After training the network, the following character set is recognized from the training data-set and passed on to the next layer, the RNN: !"'()*+,-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz plus the space character.

Fig. 7. Top is the input image. Bottom is the output given by the code, which includes the text detected and its probability.
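One way to realize such a 5-layer front end in TensorFlow/Keras is sketched below. The specific architecture is our assumption, chosen only so that the shapes match the text: a 128x32 input is reduced to a sequence of 32 positions carrying 256 features each.

import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(32, 128, 1))        # pre-processed 128x32 gray image
x = inp
# (filters, kernel, pool) for the 5 layers; the pooling sizes are chosen so the
# width axis ends at 32 sequence positions while the height collapses to 1
for filters, kernel, pool in [(32, 5, (2, 2)), (64, 5, (2, 2)),
                              (128, 3, (2, 1)), (128, 3, (2, 1)),
                              (256, 3, (2, 1))]:
    x = layers.Conv2D(filters, kernel, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=pool)(x)
x = layers.Reshape((32, 256))(x)              # height collapsed: 32 steps x 256 features
cnn = tf.keras.Model(inp, x)
cnn.summary()                                 # confirms the (None, 32, 256) output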
3) Recurrent Neural Network (2 Layers): A Recurrent Neural Network (RNN) is used for word prediction. An RNN is required because a memory element is needed to predict the word based on the current feature and the previous features. Thus a BLSTM network (Bidirectional Long Short-Term Memory), a member of the RNN family, is used in implementing the model [10]. The ADAM optimization algorithm is used for updating the weights iteratively. The letters are, most of the time, predicted at the location where they appear. The letters, including the CTC blank label, with the highest confidence score (i.e. above the threshold), when concatenated together give the required word "m-i-h-i-r". This is done dynamically, and hence the procured scores are the scores starting from the first letter 'm' until a CTC blank character is obtained. Fig. 8 shows the output from the trained RNN network, which correctly determines the handwritten text "mihir" with a probability of 0.4865. Also, Fig. 9 showcases the performance of the model when it encounters bad handwriting.

Fig. 8. Top is the input image. Bottom is the output given by the code, which includes the text detected and its probability.

Fig. 9. Top is the input image. Bottom is the output given by the code, which includes the text detected and its probability.
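A sketch of how the 2-layer BLSTM and the greedy CTC decoding described above could look in TensorFlow/Keras (the unit count is our assumption; the character set is the one listed in Section V, with one extra index reserved for the CTC blank label):

import tensorflow as tf
from tensorflow.keras import layers

charset = "!\"'()*+,-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
          "abcdefghijklmnopqrstuvwxyz "
inp = layers.Input(shape=(32, 256))                    # feature sequence from the CNN
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(inp)
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
logits = layers.Dense(len(charset) + 1)(x)             # one score per char + CTC blank
rnn = tf.keras.Model(inp, logits)                      # trained with CTC loss and ADAM

def ctc_greedy_decode(batch_logits):
    # Take the best label per time step, collapse repeats, then drop blanks.
    best = tf.argmax(batch_logits, axis=-1).numpy()[0]
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != len(charset):        # len(charset) is the blank index
            out.append(charset[idx])
        prev = idx
    return "".join(out)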


VI. EXPERIMENTAL ANALYSIS AND RESULTS

The CNN and RNN model was trained for 1000 epochs with a batch size of 16, saved separately in the .h5py file format, and later used for model evaluation and text prediction. The model, which has 820k parameters, was evaluated on the IAM test dataset, which comprised 860 lines consisting of 6960 words with a total of 38429 characters. The decoding of the text was done using the CTC (Connectionist Temporal Classification) network [11], which complements the RNN. As a BLSTM network is used, the timing of the incoming non-blank features is not constant; hence this complementary network is needed to detect the stream of characters that can be collectively identified as a word. To identify a stream of characters as a word, we wait for the CTC blank character. The character error rate (CER) and the word error rate (WER) were evaluated on the test data-set. The average CER came out to be 11.29 percent and the average WER 37.34 percent. Sentences with punctuation marks had an average CER of 10.23 percent and a WER of 29.099 percent; sentences without punctuation marks had an average CER of 7.52 percent and a WER of 19.11 percent. It was observed that about 20 percent of the overall error rate was contributed by punctuation marks; thus, if punctuation marks are disregarded, the recognition rate improves by about 5 percent. These results are summarized below (all values are percent error):

Params   Default              With Punctuation     Without Punctuation
         CER      WER         CER      WER         CER      WER
820k     11.29    37.34       10.23    29.099      7.52     19.11

Fig. 10, Fig. 11 and Fig. 12 show the text predictions obtained from test data images. In each, the first image is the handwritten text, i.e. the test data image given to the input layer of the HTR model; TEL is the label associated with the image sample, i.e. the actual handwritten text; and TEP is the text predicted by the HTR model.
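For reference, CER and WER can be computed as the Levenshtein edit distance between the predicted and reference strings, normalized by the reference length. The sketch below (ours, not the authors' evaluation code) shows the idea:

def edit_distance(ref, hyp):
    # Single-row dynamic-programming Levenshtein distance.
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, row[0] = row[0], i
        for j, h in enumerate(hyp, 1):
            prev, row[j] = row[j], min(row[j] + 1,          # deletion
                                       row[j - 1] + 1,      # insertion
                                       prev + (r != h))     # substitution
    return row[len(hyp)]

def cer(ref, hyp):                  # percent character error rate
    return 100.0 * edit_distance(ref, hyp) / max(1, len(ref))

def wer(ref, hyp):                  # percent word error rate
    return 100.0 * edit_distance(ref.split(), hyp.split()) / max(1, len(ref.split()))

print(cer("mihir", "m1hir"))                # 20.0
print(wer("the cat sat", "the cat sit"))    # 33.33...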

Fig. 10. Handwritten text prediction from test input image.

Fig. 11. Handwritten text prediction from test input image.

Fig. 12. Handwritten text prediction from test input image.

VII. CONCLUSION

The handwritten text recognition model was implemented using a 5-layer CNN and a 2-layer RNN with a BLSTM network. A CNN is an efficient way of extracting features from a raw image. The image has to be properly padded for greater efficiency, and the gray-level normalization in the pre-processing stage ensures efficient feature extraction. Increasing the number of training epochs with a lower batch size can increase the accuracy of feature extraction from the input image, but it would also increase the training time. An RNN was used for text prediction: because the CNN is used only for feature extraction, a neural network with memory, i.e. a BLSTM, was required. CTC (Connectionist Temporal Classification) decoding is used for the final text prediction, as the timing of the input sequence is variable; hence, we wait for the CTC blank character to identify a set of characters as a word. The CER and WER were computed for the inline sentences in the image and act as performance metrics to judge the model's accuracy.

Looking at ways to improve this work, the proposed research can focus on the following concepts. Firstly, with a larger data-set containing even more comprehensive handwriting data, the accuracy of the model can easily be increased. Secondly, given the capability of mobile devices, a mobile application can be developed to increase portability and accessibility. Lastly, the applications of such a handwriting detector can be customized to match the specific demands and needs of the user; the software or algorithm can be tailored so that users can leverage highly specialized features. For example, if the handwriting detector is put to use for detecting cheque forgery [12], [13], then the data-set needs to be modified with respect to signatures and other aspects. Further, the model implemented in this paper can be enhanced for text-to-speech and added vocalization, as seen in [14]. Another application for a similar cause can be seen in [15], where a text recognition model has been combined with face detection to aid blind people.


VIII. REFERENCES

[1] M. Sonkusare and N. Sahu, "A survey on handwritten character recognition (HCR) techniques for English alphabets," Advances in Vision Computing: An International Journal, vol. 3, pp. 1-12, Mar. 2016.
[2] S. Mori, C. Y. Suen, and K. Yamamoto, "Historical review of OCR research and development," Proceedings of the IEEE, vol. 80, no. 7, pp. 1029-1058, Jul. 1992.
[3] M. A. Ozdil and F. T. Y. Vural, "Optical character recognition without segmentation," Proceedings of the Fourth International Conference on Document Analysis and Recognition, Ulm, Germany, 1997, pp. 483-486 vol. 2, doi: 10.1109/ICDAR.1997.620545.
[4] J. Xie, "Optical character recognition based on least square support vector machine," 2009 Third International Symposium on Intelligent Information Technology Application, Shanghai, 2009, pp. 626-629, doi: 10.1109/IITA.2009.327.
[5] T. K. Hazra, D. P. Singh, and N. Daga, "Optical character recognition using KNN on custom image dataset," 2017 8th Annual Industrial Automation and Electromechanical Engineering Conference (IEMECON), Bangkok, 2017, pp. 110-114, doi: 10.1109/IEMECON.2017.8079572.
[6] H. Bunke, M. Roth, and E. G. Schukat-Talamazzini, "Off-line cursive handwriting recognition using hidden Markov models," Pattern Recognition, vol. 28, pp. 1399-1413, Sep. 1995.
[7] A. Kundu, Y. He, and M.-Y. Chen, "Alternatives to variable duration HMM in handwriting recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1275-1280, Nov. 1998, doi: 10.1109/34.730561.
[8] H. Ide and T. Kurita, "Improvement of learning for CNN with ReLU activation by sparse regularization," 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 2684-2691, doi: 10.1109/IJCNN.2017.7966185.
[9] A. Graves, S. Fernández, M. Liwicki, H. Bunke, and J. Schmidhuber, "Unconstrained on-line handwriting recognition with recurrent neural networks," in Advances in Neural Information Processing Systems 20, vol. 20, Jan. 2007.
[10] J. Puigcerver, "Are multidimensional recurrent layers really necessary for handwritten text recognition?" 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pp. 67-72, Nov. 2017.
[11] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," International Conference on Machine Learning, pp. 369-376, 2006.
[12] Y. S. Chernyshova, M. A. Aliev, E. S. Gushchanskaia, and A. V. Sheshkus, "Optical font recognition in smartphone-captured images and its applicability for ID forgery detection," Proc. SPIE 11041, Eleventh International Conference on Machine Vision (ICMV 2018), 110411J, 15 March 2019.
[13] M. Khan, A. Yousaf, A. Abbas, and K. Khurshid, "Deep learning for automated forgery detection in hyperspectral document images," Journal of Electronic Imaging, vol. 27, no. 5, 053001, 2018, doi: 10.1117/1.JEI.27.5.053001.
[14] S. Manoharan, "A smart image processing algorithm for text recognition, information extraction and vocalization for the visually challenged," Journal of Innovative Image Processing (JIIP), vol. 1, no. 01, pp. 31-38, 2019.
[15] M. Rajesh et al., "Text recognition and face detection aid for visually impaired person using Raspberry Pi," 2017 International Conference on Circuit, Power and Computing Technologies (ICCPCT), Kollam, 2017, pp. 1-5, doi: 10.1109/ICCPCT.2017.8074355.
