into the digital form using a simplified process as illustrated in figure 1. This process generally consists of three stages: open the file, recognize the data, and then display it in a convenient format.

Fig. 1. HTR Process

II. LITERATURE SURVEY

Humans have been writing down their thoughts for a long time in the form of letters, transcripts, etc., for dissemination to others. Since computer technology evolved, handwritten text has quickly given way to digitally generated machine text, and people feel the need for a way of transforming handwritten text into digital text, because processing such data is simple and convenient. Many researchers are working on handwritten text recognition (HTR) techniques; some of this research is mentioned below.

A. Hand-written Text Recognition

These works follow line-level strategies, combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) [4] recurrent neural networks. These techniques are used to extract features and to train the model with Connectionist Temporal Classification (CTC) [3].
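To make the role of CTC concrete, the following illustrative snippet (not taken from the cited works) shows how a CTC loss can be applied to per-timestep character probabilities without any per-character segmentation of the line image; all tensor sizes below are arbitrary assumptions.

```python
import torch
import torch.nn as nn

T, B, C = 50, 4, 80   # timesteps, batch size, character classes (blank at index 0)
# Stand-in for the per-timestep outputs of a CNN/LSTM line recognizer.
log_probs = torch.randn(T, B, C, requires_grad=True).log_softmax(-1)

targets = torch.randint(1, C, (B, 12))                # unsegmented label sequences
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 12, dtype=torch.long)

# CTC aligns the T network outputs with the shorter label sequence during
# training, so no character-level segmentation of the image is required.
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```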
There are many data-driven deep-learning based approaches that extract and select prominent features from the training sample set, whereas traditional methods employ hand-engineered features. Although the above techniques have produced very good improvements in recognizing handwritten text on public datasets [9], [8], scaling these approaches to support new domains or new languages is extremely challenging because of the high cost and difficulty of obtaining and labeling a handwriting training dataset. From the literature survey it is understood that the challenging issue in developing an HTR system is collecting and labeling the training and testing datasets of handwritten text.

This paper is a contribution to the fields of HTR and OCR and also tries to address these common issues by using a large handwriting training dataset: the dataset from an online HTR system [10], [11] has been used to train the model to recognize line text in the existing HTR system [12]. The data is a series of stroke (x, y) coordinates along with timestamps of the user writing with a finger or a stylus on a screen.
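The paper does not show how such ink trajectories are turned into training images; the following is a minimal sketch, assuming each stroke is a list of (x, y, t) points, drawn with Pillow onto a blank line image (canvas height and stroke width are illustrative choices).

```python
from PIL import Image, ImageDraw

def render_strokes(strokes, height=64, pad=4):
    """Render online ink data (lists of (x, y, t) points) into a line image."""
    xs = [p[0] for s in strokes for p in s]
    ys = [p[1] for s in strokes for p in s]
    scale = (height - 2 * pad) / max(1e-6, max(ys) - min(ys))
    width = int((max(xs) - min(xs)) * scale) + 2 * pad

    img = Image.new("L", (max(width, 1), height), color=255)
    draw = ImageDraw.Draw(img)
    for stroke in strokes:
        points = [((x - min(xs)) * scale + pad, (y - min(ys)) * scale + pad)
                  for x, y, _t in stroke]
        draw.line(points, fill=0, width=2)   # black ink on a white background
    return img

# Example: two short strokes captured from a touch screen (timestamps unused here).
image = render_strokes([[(0, 0, 0.0), (40, 55, 0.1), (80, 0, 0.2)],
                        [(20, 30, 0.3), (60, 30, 0.4)]])
```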
HTR is crucial given the need to get handwritten text into digital devices, in a world where a large number of people can access HTR services through mobile devices. Based on the literature survey, this is the primary work that explains the interrelationship between online HTR systems and offline handwriting recognition, reusing an existing training dataset to build a fully functional HTR system. The HTR presented in [14] is trained on images obtained from the trajectory data of online HTR systems. In order to obtain acceptable accuracy on handwritten text in real-time scenarios, adaptive image degradation techniques have been used for generating training datasets for HTR and OCR systems [14]. After the handwritten images are rendered, they are preprocessed using an image degradation algorithm so that they resemble realistic images. In addition to the collected online handwritten dataset, a synthetic handwriting image dataset has been used to increase the variability of the training data and achieve better output. That work demonstrated the feasibility of this technique for Latin script, and there is strong evidence that the same approach, with the pre-processing operation, will also work for other languages.
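As a hedged sketch of such a degradation step (the paper does not specify the exact operations), a clean rendered line image can be made to resemble a real scan with a small rotation, blur, and additive noise; all parameter values here are arbitrary.

```python
import cv2
import numpy as np

def degrade(img, rng=np.random.default_rng(0)):
    """Apply mild, randomized degradations to a clean grayscale line image."""
    h, w = img.shape

    # Small random rotation, as if the page were slightly skewed when scanned.
    angle = rng.uniform(-2.0, 2.0)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(img, M, (w, h), borderValue=255)

    # Slight blur and additive Gaussian noise imitate scanning artifacts.
    out = cv2.GaussianBlur(out, (3, 3), 0)
    noise = rng.normal(0, 10, size=out.shape)
    return np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```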
For line recognition in HTR systems, many researchers have experimented with different architectures; one of them is the LSTM technique, similar to many existing, well-proven approaches. However, one primary issue with recurrent models is that they do not train or run as easily on specialized hardware as feed-forward networks. Therefore many researchers have proposed fully feed-forward network models that can achieve accuracy comparable to LSTM-based architectures.

Although handwritten text line recognition is the dominant step in many HTR systems, it is merely one of the important components of a complete handwritten text recognition system. We outline the steps that have been used to integrate handwritten text recognition into a text recognition system consisting of text recognition, direction detection, script identification, and text line recognition models [15].

III. PROPOSED METHODOLOGY

A. Problem Statement

The problem of handwritten text recognition is very complex, and even now there is no single approach that solves it efficiently. Most recent works use the strategy of recognizing text character by character, and the accuracy they obtain is not up to the mark.

B. Methodology
The project follows an approach of developing Handwritten Text Recognition (HTR) such that handwritten text can be recognized accurately. In this project we first collect the data for training on handwritten texts; features are then extracted from those text datasets, and the model is trained using a deep learning approach. In this work we use the strategy of recognizing in terms of words rather than characters, so that accuracy is improved. The model built with a deep LSTM network achieves very good accuracy. The general methodology of HTR is demonstrated in figure 1; the workflow of an HTR system involves pre-processing, training, and classification steps, as shown in figure 2. Each of the pre-processing steps is explained below.

C. Pre-Processing

Whenever a document is scanned or the original data is supplied, some preliminary processing may be required. Pre-processing helps to create the final version of the document that will eventually be processed by the handwritten text recognition method. The major pre-processing goals are: noise removal, segmentation, thinning, binarization, and normalization.

1) Noise Removal: During scanning, the input can contain different types of noise that is unacceptable during processing. The unnecessary pixels in the image can be noise, that is, a black pixel where white is required, and conversely. The input image shown in fig. 2 contains background noise, so before further processing some of this noise should be eliminated. A median filter with a filter size of approximately 3 x 3 was chosen from among the various noise reduction algorithms. As shown in fig. 3, the background noise in the input image has been eliminated using median filtering.

Fig. 2. Noise in the Image

Fig. 3. Noise being eliminated from the Input Image
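As a brief illustration of this step (not code from the paper), the 3 x 3 median filtering described above can be reproduced with OpenCV; the file names are placeholder assumptions.

```python
import cv2

# Load the scanned page in grayscale; "scan.png" is a placeholder path.
image = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# A 3 x 3 median filter removes isolated salt-and-pepper noise
# while largely preserving stroke edges.
denoised = cv2.medianBlur(image, 3)

cv2.imwrite("scan_denoised.png", denoised)
```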
2) Segmentation: Segmentation is the method of separating the words. In HTR it is essential, because the framework can recognize only one word at a time. The basic connectivity principle is used: at some point, the pixels of a word are considered to be related to each other. Segmentation also involves separating the image from the background. In this context, in order for the image to be transformed to black and white, we first need to convert the color image into grayscale. To isolate a word, the picture must be clipped. We may use a fixed-size window, equal to the size of a single word, for clipping; as required, the window thereby extracts a single word. Fig. 4 shows segmented words.

Fig. 4. Segmented Words
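A minimal sketch of connectivity-based word segmentation, assuming a denoised grayscale image from the previous step; the dilation kernel that merges the characters of one word into a single component is an illustrative assumption, not a value given in the paper.

```python
import cv2

# Assumes "scan_denoised.png" from the previous step; threshold so ink is white.
gray = cv2.imread("scan_denoised.png", cv2.IMREAD_GRAYSCALE)
_, ink = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Dilate horizontally so the characters of one word form a single
# connected component (kernel width is an illustrative choice).
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
merged = cv2.dilate(ink, kernel, iterations=1)

# Each remaining connected component is treated as one word and cropped out.
num_labels, _, stats, _ = cv2.connectedComponentsWithStats(merged)
words = []
for i in range(1, num_labels):          # label 0 is the background
    x, y, w, h, area = stats[i]
    if area > 50:                       # ignore tiny specks left after denoising
        words.append(gray[y:y + h, x:x + w])
```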
3) Binarization: Binarization is another essential step in image processing, in which the image pixels are classified into two parts: the background, which is white, and the foreground, which is black. A binary image contains only two shades, white and black. A global gray-scale intensity threshold is therefore used in the proposed binarization strategy. The binarized image is shown in fig. 5.

Fig. 5. Binarized Image
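A short sketch of the global-threshold binarization described above; the fixed threshold value of 127 is an assumption (the paper does not state a value), and Otsu's method is shown only as a common way to pick the global threshold automatically.

```python
import cv2

gray = cv2.imread("word.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# Global threshold: pixels above the threshold become white background,
# pixels at or below it become black foreground.
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Data-driven alternative that chooses the global threshold automatically.
_, binary_otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```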
4) Normalization: The framework should be able to handle various font sizes, so before an image is sent to the classifier, all incoming characters must be translated to a regular size. The method of changing the picture to a size accepted by the classifier is called normalization. The neural network comprises an input layer that receives image pixels as input, and the number of input pixels is fixed.
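A minimal sketch of size normalization; the 128 x 32 target size and the padding strategy are assumptions chosen for illustration rather than values given in the paper.

```python
import cv2
import numpy as np

def normalize(word_img, target_w=128, target_h=32):
    """Scale a word image onto a fixed-size canvas, preserving aspect ratio."""
    h, w = word_img.shape
    scale = min(target_w / w, target_h / h)
    resized = cv2.resize(word_img, (max(1, int(w * scale)), max(1, int(h * scale))))

    # Paste onto a white canvas of the fixed size expected by the classifier.
    canvas = np.full((target_h, target_w), 255, dtype=np.uint8)
    rh, rw = resized.shape
    canvas[:rh, :rw] = resized
    return canvas
```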
5) Classification: Once pre-processing is completed, its output, an image of a standard size, is sent to a classifier. The pixels of the word image are used as the input to the classifier. In a neural network several algorithms could be used for classification, but here only the back-propagation algorithm has been used.
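Purely as an illustrative sketch (the paper does not give its classifier configuration), a classifier over fixed-size word images trained by back-propagation could look like the following; the layer sizes and the number of word classes are assumptions.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 1000   # assumed size of the word vocabulary

# Fixed-size 32 x 128 word images are flattened into a pixel vector.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 128, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_CLASSES),
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(images, labels):
    """One back-propagation update on a batch of word images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()              # back-propagation of the error
    optimizer.step()
    return loss.item()
```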
D. LSTM Based Model

In modern HTR systems, LSTMs and other Recurrent Neural Networks (RNNs) have been used prominently for the text line recognition step. This paper presents a model motivated by the CLDNN architecture (Convolutions, LSTMs, and Deep Networks). For the CNN layers, inception architectures [20] are used; for the LSTM layers, four stacked bidirectional LSTMs (BLSTMs) are used, so the model combines convolutional and recurrent layers rather than relying on a purely feed-forward network. The paper reports that this combination achieves sufficient accuracy compared with LSTM-based systems.
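The paper does not include code for this model; the following PyTorch sketch only shows the general shape of a convolutional front end followed by four stacked BLSTM layers with a CTC output. All layer sizes are illustrative assumptions, and the inception blocks are simplified to plain convolutions here.

```python
import torch
import torch.nn as nn

class CnnBlstmCtc(nn.Module):
    """Illustrative CNN + 4-layer BLSTM line recognizer trained with CTC."""

    def __init__(self, num_chars, hidden=256):
        super().__init__()
        # Simplified convolutional front end (stands in for inception blocks).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Four stacked bidirectional LSTM layers over the width dimension.
        self.blstm = nn.LSTM(64 * 8, hidden, num_layers=4,
                             bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_chars + 1)   # +1 for the CTC blank

    def forward(self, x):                      # x: (batch, 1, 32, width)
        f = self.cnn(x)                        # (batch, 64, 8, width / 4)
        f = f.permute(0, 3, 1, 2).flatten(2)   # (batch, width / 4, 64 * 8)
        f, _ = self.blstm(f)
        return self.fc(f).log_softmax(-1)      # per-timestep character log-probs

# nn.CTCLoss expects (time, batch, classes); permute the output before the loss.
model = CnnBlstmCtc(num_chars=80)
ctc_loss = nn.CTCLoss(blank=80)
```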
E. Training Dataset

To demonstrate the performance of the HTR system, samples from the IAM online handwritten dataset and from offline handwritten datasets are used. The IAM offline handwriting database [30] consists of scanned documents of handwritten text written by around 500 different writers using prompts from the Lancaster-Oslo/Bergen (LOB) text corpus. The dataset consists of images of text lines, which are grouped into training, validation, and test sets. These experiments combine various sources of data to obtain better accuracy in handwritten text recognition: researchers have access to a considerable bulk of ink datasets in different languages, which have been used to develop HTR models [10], [11]. For the experimentation and the results reported in this paper only Latin script is considered, but we also plan to extend the system to other languages. The dataset is rendered into images, and the pre-processing steps use the image pre-processing pipeline described in this paper. The pre-processed images are then passed through the same degradation pipeline, which can be used to train an HTR system on an artificial synthetic dataset. To increase the accuracy of text recognition on printed image text, this degraded synthetic dataset has been used. In addition, historical image datasets from many publicly available sources have been utilized to further increase text recognition accuracy [23]–[26], and a small amount of modern handwritten text images is also used.

F. General Design

The generalized design of the system is given in fig. 6. The recognition process consists of uploading an image, which is sent through the pre-processing steps so that the image is normalized to a size that can be accepted by the classifier.

Fig. 6. General Design of the System

IV. RESULTS AND DISCUSSION

The goal of HTR is to recognize handwritten characters using deep learning. The handwritten text recognition system is implemented using a recurrent neural network with an LSTM model. In this framework the original image is transformed into a grayscale image, after which the text in the image is segmented, followed by training and recognition. Fig. 7 shows the recognized text. The model built with the deep LSTM network achieves a very good accuracy of 94 percent.

Fig. 7. Recognized Text

Table 1 compares the accuracy and performance of the proposed method with previously reported algorithms. The Character Error Rate (CER) and Word Error Rate (WER) obtained with the 2DLSTM method are 8.2 percent and 27.5 percent respectively, while the CER and WER obtained with the CNN-1DLSTM-CTC method are 6.2 percent and 20.5 percent respectively. While the LSTM-based model leads in word-level accuracy, the character-level accuracy of the 2DLSTM model is only slightly lower in comparison. This means that the LSTM model appears to make more spelling errors within words that are already mislabeled, but it ultimately produces fewer errors at the aggregate word level.

Table 1. Comparison of HTR Methods

Method              Character Error Rate (%)    Word Error Rate (%)
2DLSTM              8.2                         27.5
CNN-1DLSTM-CTC      6.2                         20.5
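The paper does not state how CER and WER are computed; as a reference sketch, both are edit (Levenshtein) distances between the recognized and ground-truth text, normalized by the reference length, at character and word granularity respectively.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row dynamic programming)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (r != h))    # substitution
    return dp[-1]

def cer(ref, hyp):
    """Character Error Rate: character edits divided by reference length."""
    return edit_distance(list(ref), list(hyp)) / max(1, len(ref))

def wer(ref, hyp):
    """Word Error Rate: word edits divided by the number of reference words."""
    return edit_distance(ref.split(), hyp.split()) / max(1, len(ref.split()))

print(cer("handwritten text", "handwriten test"))
print(wer("handwritten text", "handwriten test"))
```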
V. CONCLUSION

This paper presented an approach to building a handwritten text recognition system that is scalable in the future. The built model achieves sufficient accuracy for both printed and handwritten text when compared with specific handwritten text recognition models. In this paper only the English language is considered, and the corresponding results are reported in the results section. Experiments were performed on the IAM handwritten text dataset to evaluate the performance of two well-known approaches, CNN-1DLSTM-CTC and 2DLSTM. The paper reports that the LSTM-based model for HTR systems outperforms the other methods. In future work, many different languages will be considered to evaluate the performance of the presented HTR approach.
REFERENCES

[1]. Shahbaz Hassan, Ayesha Irfan, Ali Mirza, Imran Siddiqi, "Cursive Handwritten Text Recognition using Bi-Directional LSTMs: A case study on Urdu Handwriting," in Deep-ML, 2019.
[2]. Herleen Kour, Naveen Kumar Gondhi, "Machine Learning approaches for Nastaliq style Urdu handwritten recognition: A survey," in ICACCS, 2020.
[3]. A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," in ICML, 2006.
[4]. A. Graves and J. Schmidhuber, "Offline handwriting recognition with multidimensional recurrent neural networks," in NIPS, 2009.
[5]. Rohan Vaidya, Darshan Trivedi, Sagar Satra, Prof. Mrunalini Pimpale, "Handwritten Character Recognition Using Deep-Learning," in ICICCT, 2018.
[6]. Gideon Maillette de Buy Wenniger, Lambert Schomaker, Andy Way, "No Padding Please: Efficient Neural Handwriting Recognition," in ICDAR, 2019.
[7]. A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, "A novel connectionist system for unconstrained handwriting recognition," PAMI, vol. 31, no. 5, pp. 855–868, 2009.
[8]. V. Pham, T. Bluche, C. Kermorvant, and J. Louradour, "Dropout improves recurrent neural networks for handwriting recognition," in ICFHR, 2014.
[9]. P. Voigtlaender, P. Doetsch, and H. Ney, "Handwriting recognition with large multidimensional long short-term memory recurrent neural networks," in ICFHR, 2016.
[10]. T. Bluche and R. Messina, "Gated convolutional recurrent neural networks for multilingual handwriting recognition," in ICDAR, vol. 01, 2017.
[11]. J. Puigcerver, "Are multidimensional recurrent layers really necessary for handwritten text recognition?" in ICDAR, 2017.
[12]. D. Castro, B. L. D. Bezerra, and M. Valença, "Boosting the deep multidimensional long-short-term memory network for handwritten recognition systems," in ICFHR, 2018.
[13]. D. Keysers, T. Deselaers, H. A. Rowley, L. Wang, and V. Carbune, "Multi-language online handwriting recognition," PAMI, vol. 39, no. 6, pp. 1180–1194, 2017.
[14]. V. Carbune, P. Gonnet, T. Deselaers, H. A. Rowley, A. Daryin, M. Calvo, L.-L. Wang, D. Keysers, S. Feuz, and P. Gervais, "Fast multi-language LSTM-based online handwriting recognition," ArXiV, 2019.
[15]. J. Walker, Y. Fujii, and A. C. Popat, "A web-based OCR service for documents," in DAS, Apr 2018, pp. 21–22.
[16]. S. Ghosh and A. Joshi, "Text entry in Indian languages on mobile: User perspectives," in India HCI, 2014.
[17]. T. M. Breuel, "Tutorial on OCR and layout analysis," in DAS, 2018.
[18]. Y. Fujii, K. Driesen, J. Baccash, A. Hurst, and A. C. Popat, "Sequence-to-label script identification for multilingual OCR," in ICDAR, 2017.
[18]. M. Kozielski, P. Doetsch, and H. Ney, "Improvements in RWTH's system for off-line handwriting recognition," in ICDAR, 2013.
[19]. P. Doetsch, M. Kozielski, and H. Ney, "Fast and robust training of recurrent neural networks for offline handwriting recognition," in ICFHR, 2014.
[20]. P. Voigtlaender, P. Doetsch, S. Wiesler, R. Schlüter, and H. Ney, "Sequence-discriminative training of recurrent neural networks," in ICASSP, 2015.
[21]. T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in ICASSP, 2015.
[22]. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in CVPR, 2015.
[23]. J. Wang and X. Hu, "Gated recurrent convolution neural network for OCR," in NIPS, 2017.
[24]. M. Liang and X. Hu, "Recurrent convolutional neural network for object recognition," in CVPR, 2015, pp. 3367–3375.
[25]. J. Sánchez, A. Toselli, V. Romero, and E. Vidal, "ICDAR 2015 Competition HTRtS: Handwritten Text Recognition on the tranScriptorium Dataset," in ICDAR, 2015.
[26]. A. Toselli, V. Romero, M. Villegas, E. Vidal, and J. Sánchez, "ICFHR 2016 Competition on Handwritten Text Recognition on the READ Dataset," in ICFHR, 2016.
[27]. J. Sánchez, V. Romero, A. Toselli, M. Villegas, and E. Vidal, "ICDAR 2017 Competition on Handwritten Text Recognition on the READ Dataset," in ICDAR, 2017.
[28]. Folger Shakespeare Library, "Early Modern Manuscripts Online (EMMO)." [Online]. Available: https://emmo.folger.edu
[29]. K. Chen, L. Tian, H. Ding, M. Cai, L. Sun, S. Liang, and Q. Huo, "A compact CNN-DBLSTM based character model for online handwritten Chinese text recognition," in ICDAR, 2017.
[30]. A. Graves, "Generating sequences with recurrent neural networks," ArXiV, 2013.
[31]. Adhesh Garg, Diwanshi Gupta, Sanjay Saxena, Parimi Praveen Sahadev, "Validation of Random Dataset Using an Efficient CNN Model Trained on MNIST Handwritten Dataset," in SPIN, 2019.
[32]. K. Asha, H. K. Krishnappa, "Kannada Handwritten Document Recognition using Convolutional Neural Network," in CSITSS, 2018.
[33]. Thomas M. Breuel, "High Performance Text Recognition Using a Hybrid Convolutional-LSTM Implementation," in ICDAR, 2017.