
Nikitha 2020

This document summarizes a research paper presented at the 2020 5th International Conference on Recent Trends on Electronics, Information, Communication & Technology about using deep learning for handwritten text recognition. Specifically, it collected handwritten text data, extracted features, and trained a deep learning model using an LSTM network to recognize text at the word level rather than character level to improve accuracy. It integrated the developed approach into an optical character recognition system. The paper compared two approaches on the IAM dataset and found the 2DLSTM based approach performed better.


2020 5th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT-2020), November 12th & 13th 2020

Handwritten Text Recognition using Deep Learning


2020 International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT) | 978-1-7281-9772-2/20/$31.00 ©2020 IEEE | DOI: 10.1109/RTEICT49044.2020.9315679

Nikitha A, Dr. Geetha J, Dr. JayaLakshmi D.S
Computer Science, Ramaiah Institute of Technology, Bangalore, India

Abstract—Many researchers are working on handwritten text recognition (HTR) and contributing to the HTR domain. Even though many research methods exist for HTR, the accuracy of HTR systems still needs improvement. This paper contributes an application of Deep Learning to the HTR system. In this paper, we first collect the data for training on handwritten texts; features are then extracted from those text datasets, and the model is trained using a Deep Learning approach. In this work we recognize text in terms of words rather than characters so that accuracy is improved. The model built using an LSTM deep model achieves very good accuracy. Lastly, the developed HTR approach is integrated into the OCR system, and a comparison of results is reported in this paper. Two approaches are compared in this paper on the IAM handwritten data set, and the 2DLSTM based approach is found to outperform the other approach.

Keywords—LSTM, Deep Learning, Convolutional Neural Network, 2DLSTM.

I. INTRODUCTION

This system is built to recognize handwritten text and then convert the recognized text into digital form.

A. Overview

Although many technologies exist for creating text-based documents, many people still use pen and paper and create important physical documents. Storage and physical retrieval of traditional documents is a challenging problem. Handwriting recognition is growing rapidly in the present era of globalization. Handwriting recognition is the ability of the computer to translate human writing into digital form. It is a technology that can be used to recognize handwritten characters and can also be deployed on other devices such as PDAs and tablet PCs.

One advantage of handwriting recognition is that electronic storage can be adopted, which requires fewer employees to sort and organize the documents. In addition, electronic data retrieval can be performed. Another advantage is the preservation of old documents: old family records and personal diaries might be damaged by accidents or for other reasons, and this is when handwriting recognition software is very helpful.

Apart from data storage and data retrieval, the most challenging problem associated with handwriting recognition is that most people write in their own style, and achieving good accuracy is also a major problem. The system we have implemented provides solutions to tackle such problems. We have provided a solution using a stack of bidirectional LSTMs (BLSTMs) as they provide better results. Handwritten text appears in many different types of images: handwritten notes, historical documents, whiteboards, medical records, etc. There are many applications of automated HTR systems, and a complete Optical Character Recognition (OCR) solution needs to include HTR in images. This motivates the need for research in the domain of HTR and OCR systems.

B. Deep Learning

Deep Learning is a subfield of Machine Learning, which is itself a subfield of Artificial Intelligence. Artificial Intelligence is an approach that enables a system to mimic human behavior. Machine Learning is an approach to attaining Artificial Intelligence using algorithms trained on datasets, and Deep Learning is a kind of Machine Learning inspired by the structure of the human brain. Deep Learning is a Machine Learning approach that learns features and tasks directly from data. Data can be images, text, or sound. Deep Learning is often referred to as end-to-end learning, with features learned by a neural network without human intervention. Deep neural networks can require hours or even months to train. The training time increases with the amount of data and the number of layers in the network.

C. Convolutional Neural Network

The Convolutional Neural Network (CNN) is a deep learning network that is used for image classification. The basic concept of a CNN is to use convolution filters to distinguish patterns in images, such as edges and parts of objects, and to build on this information to detect complete objects such as animals, humans, cars, etc. The line of research behind CNNs goes back to Hubel and Wiesel's work on the visual cortex, reported from 1959 onward. Early models worked but were not able to learn automatically. CNNs are mainly used to identify images, cluster and classify images, detect objects, etc. The connectivity pattern of a CNN resembles the structure of the animal visual cortex. Compared to other image classification algorithms, CNNs use very little preprocessing. This means the network learns the filters that are hand-engineered in conventional algorithms. This independence from prior knowledge and human effort in feature design is a major advantage.

D. Overall Description

Handwritten Text Recognition (HTR) is a method that allows us to convert documents, such as scanned images or images selected by the user, into digital form. This method is very helpful because it saves time by removing the need to re-type the documents. It can perform the action in a few minutes. It can recognize text in pictures and convert it


Authorized licensed use limited to: Universiti Kebangsaan Malaysia. Downloaded on May 14,2021 at 03:49:49 UTC from IEEE Xplore. Restrictions apply.
into digital form using a simplified process, as illustrated in figure 1. This process generally consists of three stages: open the file, recognize the data, and then display it in a convenient format.

Fig. 1. HTR Process

II. LITERATURE SURVEY

Humans have long written down their thoughts in the form of letters, transcripts, etc., for dissemination to others. As computer technology evolved, handwritten text was quickly superseded by machine-generated digital text, and people feel the need for a way of transforming handwritten text into digital text, because processing such data is simple and convenient. Many researchers are working on handwritten text recognition (HTR) techniques. Some of this research is described below.

A. Handwritten Text Recognition

Existing work follows line-level strategies that combine Convolutional Neural Networks (CNN) and Long Short Term Memory (LSTM) [4] recurrent neural networks. These techniques have been used to extract features and to train the model with Connectionist Temporal Classification (CTC [3]). Many data-driven, deep-learning based approaches extract and select prominent features from the training sample set, whereas traditional methods employ hand-engineered features. Although these techniques have resulted in very good improvements in recognizing handwritten text on public data sets [9], [8], scaling them up to new domains or new languages is extremely challenging because of the high cost and difficulty of obtaining and labeling handwriting training data. The literature survey shows that the challenging issue in developing an HTR system is collecting and labeling the training and testing data sets of handwritten text.

This paper is a contribution to the field of HTR and OCR and tries to address these common issues by using a large handwriting training dataset: the data set from an online HTR system [10], [11] has been used to train the model to recognize text lines in the existing HTR system [12]. The data is a series of strokes, given as (x, y) coordinates with timestamps, recorded as users write with their fingers or a stylus on a screen.

HTR is crucial given the need to enter handwritten text into devices, in a world where a large number of people can access HTR services using mobile devices. Based on the literature survey, this is the first work that explains the relationship between online HTR systems and offline handwriting recognition by reusing an existing training dataset to build a fully functional HTR system. The HTR system presented in [14] is trained on images rendered from the trajectory data of online HTR systems. To obtain acceptable accuracy on handwritten text in real-world scenarios, adaptive image degradation techniques have been used to generate training datasets for HTR and OCR systems [14]. After the handwritten images are rendered, they are preprocessed with an image degradation algorithm that has been applied to realistic images. In addition to the collected online handwriting dataset, a synthetic handwriting image data set has been used to increase the variability of the training data and achieve better output. This paper demonstrated the feasibility of the technique for Latin script, but there is strong evidence that the same approach, with the pre-processing operations, will work for other languages as well.

For line recognition in HTR systems, many researchers have experimented with different architectures; one of them is the LSTM technique, similar to many existing well-proven approaches. However, a primary issue with recurrent models is that they cannot be trained and run on specialized hardware as easily as feed-forward networks. Therefore, many researchers have proposed fully feed-forward network models that achieve accuracy comparable to LSTM-based architectures.

Although text line recognition is the dominant step in many HTR systems, it is only one of the important components of a complete handwritten text recognition system. We outline the steps used to integrate HTR support into a text recognition system consisting of text recognition, direction detection, script identification, and text line recognition models [15].

III. PROPOSED METHODOLOGY

A. Problem Statement

The problem of handwritten text recognition is very complex, and even now there is no single approach that solves it efficiently. Recent works recognize text character by character, and the accuracy they obtain is not up to the mark.

B. Methodology

The project follows an approach of developing the Handwritten Text Recognition (HTR) system such that the handwritten

text can be recognized accurately. In this project, we first collect the data for training on handwritten texts; features are then extracted from those text datasets, and the model is trained using a Deep Learning approach. In this work we recognize text in terms of words rather than characters so that accuracy is improved. The model built using an LSTM deep model achieves very good accuracy. The general methodology of HTR is shown in figure 1, which presents the workflow of HTR systems. It involves pre-processing, training and classification steps as shown in figure 2. Each of the pre-processing steps is explained below.

C. Pre-Processing

Whenever a document is scanned or the original data is supplied, some preliminary processing may be required. Pre-processing helps to create the final version of the document that will eventually be processed by the handwritten text recognition method. The major pre-processing goals are: Noise Removal, Segmentation, Thinning, Binarization, and Normalization.

1) Noise Removal: During scanning, the input can pick up different types of noise that are unacceptable during processing. Noise consists of unwanted pixels in the image, that is, a black pixel where white is expected, or the converse. The input image shown in fig 2 contains background noise, so this noise should be eliminated before further processing. From the various noise reduction algorithms, a median filter with a filter size of approximately 3 x 3 was chosen. As shown in fig 3, the background noise in the input image has been eliminated using median filtering.

Fig. 2. Noise in the Image

Fig. 3. Noise being eliminated from the Input Image

2) Segmentation: Segmentation is the method of separating the words. It is essential in HTR because the framework can recognize only one word at a time. The basic connectivity principle was used. In this principle, the pixels of a word are considered to be related to each other. Segmentation also includes separating the image content from the background. In this context, for the image to be transformed to black and white, a color image first needs to be converted to gray scale. To isolate a word, the picture must be clipped. We may use a fixed-size window, equal to the size of a single word, for clipping. The window therefore cuts out a single word, as required. Fig 4 shows segmented words.

Fig. 4. Segmented Words

3) Binarization: Binarization is also an essential step in image processing, in which the image pixels are classified into two classes: background, which is white, and foreground, which is black. A binary image has only two shades, white and black. A global gray scale intensity threshold is used in the proposed binarization strategy. The binarized image is shown in fig 5.

Fig. 5. Binarized Image

4) Normalization: The framework should be able to handle various font sizes. Therefore, before being sent to the classifier, all incoming characters must be scaled to a standard size. The method of changing the picture size to a size accepted by the classifier is called normalization. The neural network has an input layer that receives image pixels as input, for which the number of pixels is fixed.

5) Classification: Once the pre-processing has been completed, its output, an image of standard size, is sent to a classifier. The pixel location of the word is used as the input for the classifier. Various algorithms could be used for classification in a neural network, but here only the back-propagation algorithm has been used.

D. LSTM Based Model

In modern HTR systems, LSTMs and other Recurrent Neural Networks (RNN) have been prominently used for the text line recognition step. This paper presented a model motivated by the CLDNN (Convolutions, LSTMs, and Deep Networks). For all the CNN layers, the inception architecture [20] has been used. For the LSTM layers, four stacked bidirectional LSTMs (BLSTMs) have been used. In this way the model has only a feed-forward network. This paper reported the method

with this combination achieves sufficient accuracy in LSTM-compliant systems.

E. Training Dataset

To demonstrate the performance of the HTR system, samples from the IAM online handwriting dataset, an online handwriting dataset, and offline handwriting datasets are used. The IAM offline handwriting database [30] consists of scanned images of handwritten text written by around 500 different writers using prompts from the Lancaster-Oslo/Bergen (LOB) text corpus. The dataset consists of images of text lines, which are partitioned into training, validation, and test sets.

These experiments use a combination of dataset sources to obtain better accuracy in Handwritten Text Recognition systems. Researchers have access to a considerable amount of ink data, which is used to develop HTR models [10], [11] in different languages. For the experiments and results reported in this paper, only Latin script is considered, but we plan to extend the system to other languages. The dataset is converted into images, and the pre-processing steps are applied using the image pre-processing pipeline described in this paper. The pre-processed images are then passed through the same degradation pipeline, which can be used to train an HTR system on an artificial synthetic data set. To increase text recognition accuracy on printed text images, the degraded synthetic data set has been used. In addition, historical image sets from several publicly available datasets have been used to increase text recognition accuracy [23]–[26]. A small amount of modern handwritten text image data is also used.

F. General Design

The generalized design of the system is given in fig 6. The recognition process consists of uploading an image, which is sent through the pre-processing steps so that it is normalized to a size that can be accepted by the classifier.

Fig. 6. General Design of the System

IV. RESULTS AND DISCUSSION

The goal of HTR is to recognize handwritten characters using deep learning. The handwritten text recognition system is implemented using a recurrent neural network and an LSTM model. In this framework, the original image is transformed into a gray scale image, after which the text in the image is segmented, followed by training and recognition. Fig 7 shows the recognized text. The model built using an LSTM deep model achieves a very good accuracy of 94 percent.

Fig. 7. Recognized Text

Table 1 compares the accuracy and performance of the proposed method with previously reported algorithms. The Character Error Rate (CER) and Word Error Rate (WER) obtained using the 2DLSTM method are 8.2 percent and 27.5 percent, respectively. The CER and WER obtained using CNN-1DLSTM-CTC are 6.2 percent and 20.5 percent, respectively. While LSTM is leading in providing word level accuracy, with 2DLSTM the accuracy at the character level is slightly reduced in contrast. This means that the LSTM model appears to make more spelling errors in the words that have already been mislabeled, but ultimately it produces fewer spelling errors at the aggregate word level.

Table 1. Comparison of HTR Methods

Method              Character Error Rate (%)    Word Error Rate (%)
2DLSTM              8.2                         27.5
CNN-1DLSTM-CTC      6.2                         20.5

V. CONCLUSION

This paper presented an approach to building a handwritten text recognition system that can be scaled in the future. The built model achieves sufficient accuracy for both printed and handwritten text when compared with specific handwritten text recognition models. Only the English language is considered in this paper, and the corresponding results are reported in the results section. Experiments have been performed on the IAM handwritten text data set to evaluate the performance of two well-known approaches, namely CNN-1DLSTM-CTC and 2DLSTM. This paper reported that the LSTM based model for HTR systems outperforms other methods. In future work, many different languages will be considered to evaluate the performance of the presented HTR approach.
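The Character Error Rate and Word Error Rate reported in Table 1 can be illustrated with a short sketch. The paper does not state how the two metrics were computed; the code below assumes the common definition, in which the Levenshtein edit distance between the recognized text and the reference text is divided by the reference length and expressed as a percentage. The function names (edit_distance, error_rate, cer, wer) and the sample strings are illustrative and do not come from the paper.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum number of substitutions,
    insertions and deletions needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: List[str], hyps: List[str],
               tokenize: Callable[[str], Sequence]) -> float:
    """Total edit distance over total reference length, in percent."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("reference text is empty")
    return 100.0 * errors / total


def cer(refs: List[str], hyps: List[str]) -> float:
    """Character Error Rate: tokens are single characters."""
    return error_rate(refs, hyps, list)


def wer(refs: List[str], hyps: List[str]) -> float:
    """Word Error Rate: tokens are whitespace-separated words."""
    return error_rate(refs, hyps, str.split)


if __name__ == "__main__":
    references = ["the quick brown fox"]   # hypothetical ground truth
    recognized = ["the quick brwn fox"]    # hypothetical recognizer output
    print(f"CER = {cer(references, recognized):.1f} %")
    print(f"WER = {wer(references, recognized):.1f} %")
```

Under this definition a single wrong character makes the whole word count as an error, which is why WER is usually much higher than CER for the same output, as in both rows of Table 1.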

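The noise removal, binarization, and normalization steps described in Section III-C can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper specifies only a median filter of about 3 x 3 and a global gray scale threshold, so the border handling, the threshold value of 128, the nearest-neighbour resizing, and all function names here are assumptions.

```python
from statistics import median
from typing import List

Image = List[List[int]]  # gray scale image as rows of values in 0..255


def median_filter_3x3(img: Image) -> Image:
    """Replace each pixel with the median of its 3 x 3 neighbourhood.

    Border pixels use only the neighbours that exist (an assumption;
    the paper does not say how borders are handled).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = int(median(window))
    return out


def binarize(img: Image, threshold: int = 128) -> Image:
    """Global threshold: pixels darker than the threshold become black
    foreground (0), all others become white background (255).
    The default of 128 is an assumed value."""
    return [[0 if p < threshold else 255 for p in row] for row in img]


def normalize_size(img: Image, out_h: int, out_w: int) -> Image:
    """Resize to the fixed input size of the classifier using
    nearest-neighbour sampling (an assumed interpolation method)."""
    h, w = len(img), len(img[0])
    return [
        [img[y * h // out_h][x * w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


if __name__ == "__main__":
    # White 5 x 5 page with one isolated black noise pixel in the middle.
    page = [[255] * 5 for _ in range(5)]
    page[2][2] = 0
    cleaned = median_filter_3x3(page)
    binary = binarize(cleaned)
    fixed = normalize_size(binary, 8, 8)
    print(cleaned[2][2], len(fixed), len(fixed[0]))
```

A median filter suits this kind of isolated salt-and-pepper noise because a single outlier never becomes the median of its neighbourhood, whereas an averaging filter would smear it into the surrounding pixels.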
REFERENCES

[1]. Shahbaz Hassan, Ayesha Irfan, Ali Mirza, and Imran Siddiqi, "Cursive Handwritten Text Recognition using Bi-Directional LSTMs: A case study on Urdu Handwriting," in Deep-ML, 2019.
[2]. Herleen Kour and Naveen Kumar Gondhi, "Machine Learning approaches for Nastaliq style Urdu handwritten recognition: A survey," in ICACCS, 2020.
[3]. A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," in ICML, 2006.
[4]. A. Graves and J. Schmidhuber, "Offline handwriting recognition with multidimensional recurrent neural networks," in NIPS, 2009.
[5]. Rohan Vaidya, Darshan Trivedi, Sagar Satra, and Prof. Mrunalini Pimpale, "Handwritten Character Recognition Using Deep-Learning," in ICICCT, 2018.
[6]. Gideon Maillette de Buy Wenniger, Lambert Schomaker, and Andy Way, "No Padding Please: Efficient Neural Handwriting Recognition," in ICDAR, 2019.
[7]. A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, "A novel connectionist system for unconstrained handwriting recognition," PAMI, vol. 31, no. 5, pp. 855–868, 2009.
[8]. V. Pham, T. Bluche, C. Kermorvant, and J. Louradour, "Dropout improves recurrent neural networks for handwriting recognition," in ICFHR, 2014.
[9]. P. Voigtlaender, P. Doetsch, and H. Ney, "Handwriting recognition with large multidimensional long short-term memory recurrent neural networks," in ICFHR, 2016.
[10]. T. Bluche and R. Messina, "Gated convolutional recurrent neural networks for multilingual handwriting recognition," in ICDAR, vol. 01, 2017.
[11]. J. Puigcerver, "Are multidimensional recurrent layers really necessary for handwritten text recognition?" in ICDAR, 2017.
[12]. D. Castro, B. L. D. Bezerra, and M. Valença, "Boosting the deep multidimensional long-short-term memory network for handwritten recognition systems," in ICFHR, 2018.
[13]. D. Keysers, T. Deselaers, H. A. Rowley, L. Wang, and V. Carbune, "Multi-language online handwriting recognition," PAMI, vol. 39, no. 6, pp. 1180–1194, 2017.
[14]. V. Carbune, P. Gonnet, T. Deselaers, H. A. Rowley, A. Daryin, M. Calvo, L.-L. Wang, D. Keysers, S. Feuz, and P. Gervais, "Fast multi-language lstm-based online handwriting recognition," arXiv, 2019.
[15]. J. Walker, Y. Fujii, and A. C. Popat, "A web-based ocr service for documents," in DAS, Apr 2018, pp. 21–22.
[16]. S. Ghosh and A. Joshi, "Text entry in indian languages on mobile: User perspectives," in India HCI, 2014.
[17]. T. M. Breuel, "Tutorial on ocr and layout analysis," in DAS, 2018.
[18]. Y. Fujii, K. Driesen, J. Baccash, A. Hurst, and A. C. Popat, "Sequence-to-label script identification for multilingual ocr," in ICDAR, 2017.
[18]. M. Kozielski, P. Doetsch, and H. Ney, "Improvements in rwth's system for off-line handwriting recognition," in ICDAR, 2013.
[19]. P. Doetsch, M. Kozielski, and H. Ney, "Fast and robust training of recurrent neural networks for offline handwriting recognition," in ICFHR, 2014.
[20]. P. Voigtlaender, P. Doetsch, S. Wiesler, R. Schlüter, and H. Ney, "Sequence-discriminative training of recurrent neural networks," in ICASSP, 2015.
[21]. T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in ICASSP, 2015.
[22]. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in CVPR, 2015.
[23]. J. Wang and X. Hu, "Gated recurrent convolution neural network for ocr," in NIPS, 2017.
[24]. M. Liang and X. Hu, "Recurrent convolutional neural network for object recognition," in CVPR, 2015, pp. 3367–3375.
[25]. J. Sánchez, A. Toselli, V. Romero, and E. Vidal, "ICDAR 2015 Competition HTRtS: Handwritten Text Recognition on the tranScriptorium Dataset," in ICDAR, 2015.
[26]. A. Toselli, V. Romero, M. Villegas, E. Vidal, and J. Sánchez, "ICFHR2016 Competition on Handwritten Text Recognition on the READ Dataset," in ICFHR, 2016.
[27]. J. Sánchez, V. Romero, A. Toselli, M. Villegas, and E. Vidal, "ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset," in ICDAR, 2017.
[28]. Folger Shakespeare Library, "Early Modern Manuscripts Online (EMMO)." [Online]. Available: https://emmo.folger.edu
[29]. K. Chen, L. Tian, H. Ding, M. Cai, L. Sun, S. Liang, and Q. Huo, "A compact cnn-dblstm based character model for online handwritten chinese text recognition," in ICDAR, 2017.
[30]. A. Graves, "Generating sequences with recurrent neural networks," arXiv, 2013.
[31]. Adhesh Garg, Diwanshi Gupta, Sanjay Saxena, and Parimi Praveen Sahadev, "Validation of Random Dataset Using an Efficient CNN Model Trained on MNIST Handwritten Dataset," in SPIN, 2019.
[32]. K. Asha and H.K. Krishnappa, "Kannada Handwritten Document Recognition using Convolutional Neural Network," in CSITSS, 2018.
[33]. Thomas M. Breuel, "High Performance Text Recognition Using a Hybrid Convolutional-LSTM Implementation," in ICDAR, 2017.
