Generating and Smoothing Handwriting With LSTM
2021
Edward Fry
Southern Methodist University
Ikenna Nwaogu
Southern Methodist University
YuMei Bennett
Southern Methodist University
John Santerre
Southern Methodist University
Recommended Citation
kimari, muchigi; Fry, Edward; Nwaogu, Ikenna; Bennett, YuMei; and Santerre, John (2021) "Generating and
Smoothing Handwriting with Long Short-Term Memory Networks," SMU Data Science Review: Vol. 5: No.
1, Article 4.
Available at: https://scholar.smu.edu/datasciencereview/vol5/iss1/4
This Article is brought to you for free and open access by SMU Scholar. It has been accepted for inclusion in SMU
Data Science Review by an authorized administrator of SMU Scholar. For more information, please visit
http://digitalrepository.smu.edu.
kimari et al.: Generating and Smoothing Handwriting with LSTM
Abstract
This project explores neural network methods for generating synthetic
handwriting. The goal is to offer people with Dysgraphia an AI tool that
generates handwriting while preserving the individual's style. As part of this
project, an application development framework is set up on GitHub so that
others can continue to explore and improve the tool.
1 Introduction
Since there is no common handwriting style, it is challenging to develop a tool that
generates and smooths handwriting. However, the potential applications, and the
rewards of helping people with Dysgraphia, make this challenge worthwhile.
Dysgraphia affects a person's ability to learn in many ways. One example involves
fine motor skills. When a person develops an idea or learns a new concept, one of
the most effective memorization techniques is to write the concept or idea
repeatedly on a piece of paper [5]. Because of fine motor deficiency, however,
people with Dysgraphia find writing even simple text difficult, which impedes their
ability to memorize information.
Because neural networks are nonlinear, they are well suited to complex nonlinear
problems. Neural networks now help radiologists keep up with image processing,
and self-driving cars use them to map road markings to increase safety [4]. This
paper focuses on applying neural networks to recognize and smooth handwriting
in order to assist people with Dysgraphia.
Of the many neural network techniques available, this paper focuses on Long
Short-Term Memory (LSTM) networks. LSTM models are well suited to handwriting
recognition problems because they can memorize different styles and produce a
probabilistic, standard style of handwriting [16].
The goal of this project is to inspire and broaden the research on Dysgraphia
and its effect on handwriting [15].
2 Background
Text recognition and generation have a long history, dating back to before neural
networks became a field of scientific research. Early text recognition devices from
the 1910s, which converted printed characters into telegraph code, are regarded as
the first text recognition applications. Since then, such technology has helped the
blind to read and has made it possible to deliver short messages without expensive
telephone calls [22]. Technology continues to advance in search of greater accuracy
and efficiency. Today, applications such as document scanning, text-to-speech audio
books, tablet note taking, and smartphone language translation are not only widely
available but also relied on in daily life and work [8].
In recent years, neural network development has fueled the advancement of
text recognition and generation. There are two main categories of competing
technologies in this area. Optical Character Recognition (OCR) is the most
popular. It processes printed or handwritten text as an image file in pixel
format, relying on the neural network's capacity to process large volumes of data
to train the model. This is referred to as the offline method [7]. The second
category is the pen stroke position method, in which the application collects pen
stroke positions as time series data. The advantage of this method is that its data
size is significantly smaller than that of OCR, making it easier to work with. It
requires a device that records the pen position as the writer creates the text, and
it is referred to as the online method. Figure 1 shows an example of the online and
offline text formats.
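The short Python sketch below makes the size difference concrete by comparing the raw storage needed for one line of handwriting in each format. The point count, image resolution, and byte sizes are illustrative assumptions and were not taken from the IAM data. Real image files are usually compressed, so the actual ratio would be smaller.

```python
def online_bytes(num_points: int, bytes_per_coord: int = 2) -> int:
    """Online format: one (x, y) pair per pen position, assuming 16-bit integer coordinates."""
    return num_points * 2 * bytes_per_coord


def offline_bytes(width: int, height: int, bytes_per_pixel: int = 1) -> int:
    """Offline format: an uncompressed grayscale bitmap stores one value per pixel."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    # Illustrative assumptions: about 700 pen positions for one written line,
    # versus the same line scanned at 1500 x 150 pixels.
    on = online_bytes(700)
    off = offline_bytes(1500, 150)
    print(f"online: {on} bytes, offline: {off} bytes, ratio: {off / on:.0f}x")
```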
Most research on text recognition and generation focuses on commercially viable
applications, such as medical form transcription, library archive digitization,
enterprise data entry automation, and indexing documents for search engines.
Areas that are just as important but less profitable, such as helping the visually
or vocally impaired, the autistic, or those with a motor function disability like
Dysgraphia, are not as widely researched. This project focuses on establishing a
baseline application to generate legible text with a handwriting style preference.
2.1 Dysgraphia
Dysgraphia is a bio-neural system disorder that impacts fine motor skills.
Because Dysgraphia affects the fine motor skills of the hand and fingers, the
handwriting and artwork of a person with Dysgraphia may appear illegible or
sloppy [1]. The severity of Dysgraphia varies from mild to severe. Mild symptoms
may emerge as a dislike of handwriting due to the difficulties associated with fine
motor skills. Severe symptoms may show as a visible hand tremor. Mild cases of
Dysgraphia are often misdiagnosed or mistaken for laziness or sloppiness. People
with Dysgraphia often have other bio-neural system disorders, such as Dyslexia,
Attention-deficit/hyperactivity disorder, or brain trauma [21]. Although there is
limited medical research on Dysgraphia to date, it is estimated to affect between
2 and 25 percent of the population. Dysgraphia was not recognized as a neural
disorder until 1993, when Hamstra-Bletz and Blote described it as a disturbance or
difficulty in the production of written language that is related to the mechanics
of writing [23]. Before then, it was viewed as a symptom of Dyslexia.
Dysgraphia symptoms often appear when a child is starting to learn how to
write and draw. Many schools do not recognize Dysgraphia in children. A child's
lack of progress is often mistaken for not working hard enough, and at times
children with Dysgraphia are punished with extra homework, detention, or reports
to parents. Such oversights lead to low morale, depression, and withdrawal in
these children [14].
The handwriting of someone with Dysgraphia often appears illegible and
cacographic. Figures 2, 3, and 4 show a number of samples: the letters are more
slanted than usual and are unevenly sized and spaced.
3 Data
3.1 Structure
The input data set is in the form of a set of XML files, each of which reflects the set
of strokes belonging to a single sample of handwriting. The samples come from
the IAM Online Database [19]. They were created by volunteers who wrote short
phrases on an electronic device that captured the position of their pen in real time
as they wrote. Each captured point has an x and y coordinate as well as a
timestamp. This project uses only the x, y pairs. Here is a sample of one of the
XML files:
<WhiteboardCaptureSession>
<WhiteboardDescription>
<SensorLocation corner="top_left"/>
<DiagonallyOppositeCoords x="6512" y="1376"/>
<VerticallyOppositeCoords x="966" y="1376"/>
<HorizontallyOppositeCoords x="6512" y="787"/>
</WhiteboardDescription>
<StrokeSet>
<Stroke colour="black" start_time="769.05" end_time="769.64">
<Point x="1073" y="1058" time="769.05"/>
<Point x="1072" y="1085" time="769.07"/>
...
<Point x="1215" y="1353" time="769.63"/>
<Point x="1204" y="1330" time="769.64"/>
</Stroke>
<Stroke colour="black" start_time="769.70" end_time="769.90">
<Point x="1176" y="1237" time="769.70"/>
<Point x="1175" y="1233" time="769.72"/>
...
<Point x="1010" y="1239" time="769.88"/>
<Point x="1014" y="1243" time="769.90"/>
</Stroke>
...
</StrokeSet>
</WhiteboardCaptureSession>
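The sketch below reads such a file with Python's standard xml.etree.ElementTree module. It keeps only the x, y pairs, as this project does, and groups them by Stroke element. The function name parse_strokes is hypothetical and does not come from the project's code. The excerpt above uses "..." to elide points, so it is not valid XML on its own. The sample string in the sketch is a trimmed, well-formed version of it.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<WhiteboardCaptureSession>
  <StrokeSet>
    <Stroke colour="black" start_time="769.05" end_time="769.64">
      <Point x="1073" y="1058" time="769.05"/>
      <Point x="1072" y="1085" time="769.07"/>
    </Stroke>
    <Stroke colour="black" start_time="769.70" end_time="769.90">
      <Point x="1176" y="1237" time="769.70"/>
      <Point x="1175" y="1233" time="769.72"/>
    </Stroke>
  </StrokeSet>
</WhiteboardCaptureSession>"""


def strokes_from_root(root: ET.Element) -> list[list[tuple[int, int]]]:
    """Return one list of (x, y) tuples per <Stroke>; timestamps are ignored."""
    strokes = []
    for stroke in root.iter("Stroke"):
        points = [(int(p.get("x")), int(p.get("y"))) for p in stroke.iter("Point")]
        strokes.append(points)
    return strokes


def parse_strokes(xml_text: str) -> list[list[tuple[int, int]]]:
    return strokes_from_root(ET.fromstring(xml_text))


if __name__ == "__main__":
    print(parse_strokes(SAMPLE))
    # For a file on disk: strokes_from_root(ET.parse(path).getroot())
```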
After studying the data set and considering the needs of the training process
(described below), a simple cascading series of one-to-many relationships
emerges as the apparent data modeling choice. To reflect this structure in the
application, a series of three objects is used. First, one Dataset object represents
the entire training data set and can be reused later to represent smoothing
samples in a consistent way. Each Dataset contains a list of stroke sets (StrokeSet),
where one stroke set represents the entirety of a single handwriting sample
written by a volunteer, as shown in Figure 5.
Each stroke set then contains multiple strokes, stored as a list of Stroke
objects. A stroke is the set of points drawn between the time the pen touches the
surface and the time it is lifted again. Finally, each stroke contains a list of
points. Conceptually, each point is a fourth class (Point) in this hierarchy,
although for practical and performance reasons the points are simply stored as
(x, y) tuples in the Stroke object. The entire conceptual structure of the
application is expressed in a UML class diagram.
Here is the complete picture of how the data in the input data set corresponds
to its in-memory representation.
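The hierarchy described above can be sketched with Python dataclasses. Only the class names Dataset, StrokeSet, and Stroke come from the text. The field names and the from_point_lists and num_points helpers are illustrative assumptions, so the project's actual classes may differ.

```python
from dataclasses import dataclass, field

# Conceptually a fourth class, but stored as a plain (x, y) tuple for speed.
Point = tuple[int, int]


@dataclass
class Stroke:
    """Points drawn between putting the pen down and lifting it."""
    points: list[Point] = field(default_factory=list)


@dataclass
class StrokeSet:
    """One complete handwriting sample written by a volunteer."""
    strokes: list[Stroke] = field(default_factory=list)

    def num_points(self) -> int:
        return sum(len(s.points) for s in self.strokes)


@dataclass
class Dataset:
    """The whole training set, or a single sample to be smoothed."""
    stroke_sets: list[StrokeSet] = field(default_factory=list)

    @classmethod
    def from_point_lists(cls, samples: list[list[list[Point]]]) -> "Dataset":
        """Build the hierarchy from nested lists: samples -> strokes -> points."""
        return cls([StrokeSet([Stroke(list(pts)) for pts in sample]) for sample in samples])


if __name__ == "__main__":
    ds = Dataset.from_point_lists([[[(0, 0), (1, 1)], [(2, 2)]]])
    print(len(ds.stroke_sets), ds.stroke_sets[0].num_points())
```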
an SVG file. In addition, plots of the sample at a variety of biases are saved for
further study of the effect of bias on generation.
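Writing strokes to SVG needs nothing beyond the standard library, because each stroke can become one polyline element. The sketch below is a hypothetical helper (strokes_to_svg, with an assumed margin and line width) and is not the project's own renderer. Depending on the coordinate convention of the source data, the y axis may also need to be flipped.

```python
def strokes_to_svg(strokes: list[list[tuple[float, float]]],
                   stroke_width: float = 2.0, margin: float = 10.0) -> str:
    """Render each stroke as one SVG <polyline>; at least one point is required."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    width = max(xs) - min_x + 2 * margin
    height = max(ys) - min_y + 2 * margin
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for stroke in strokes:
        # Shift every point so the drawing starts at (margin, margin).
        pts = " ".join(f"{x - min_x + margin},{y - min_y + margin}" for x, y in stroke)
        parts.append(f'  <polyline points="{pts}" fill="none" stroke="black" '
                     f'stroke-width="{stroke_width}"/>')
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(strokes_to_svg([[(1073, 1058), (1072, 1085)], [(1176, 1237), (1175, 1233)]]))
```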
4 Methods
Two approaches, a generative adversarial network (GAN) and a Long Short-Term
Memory network (LSTM), were explored to generate and smooth handwriting; the
LSTM provided the better outcome.
4.1 GAN
A GAN is a neural network architecture that, when used for image processing, is
often built from convolutional networks. It is made up of two competing networks
that are trained together. The first network, the generator, takes an input and
generates a fake sample; the second network, the discriminator, then determines
whether its input is real or generated [11]. The generator starts from random noise
as its input, and as it learns, it produces increasingly realistic data. This approach
deals with processing handwritten images. Each image is split into words, then
letters, and the resulting letter images are used for learning and prediction. To
achieve this, image extraction technologies such as Textract and OpenCV were used.
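The adversarial objective described above is commonly written as a pair of binary cross-entropy losses. The sketch below shows that standard formulation for a single sample. Here d_real and d_fake are the discriminator's estimated probabilities that a real letter image and a generated one, respectively, are real. It uses the widely adopted non-saturating form of the generator loss and is not taken from this project's code.

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """The discriminator should output values near 1 for real images and near 0 for generated ones."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """Non-saturating generator loss: small when the discriminator believes the fake is real."""
    return -math.log(d_fake)


if __name__ == "__main__":
    # Early in training the discriminator easily spots fakes (d_fake small),
    # so the generator's loss is large; it falls as the fakes improve.
    for d_fake in (0.05, 0.5, 0.95):
        print(d_fake, round(discriminator_loss(0.9, d_fake), 3), round(generator_loss(d_fake), 3))
```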
The metrics used to measure recognition accuracy are precision and recall. Before
either metric can be applied, the correctness of each detection must be established.
For any given detection, a bounding box or circle is placed on the object. The
metric that measures the correctness of a given bounding box or circle is called
Intersection over Union (IoU): the ratio between the area of the intersection and
the area of the union of the predicted box or circle and the ground truth box [10].
The ground truth box or circle is the correct region that would cover each letter.
Twenty labeled image files from the data sample were used to measure the
performance of each detection application, and the mean precision and mean
recall were recorded. Textract and OpenCV achieved mean precisions of 0.9 and 0.8
and mean recalls of 0.8 and 0.76, respectively. Textract performed better than
OpenCV because it is specifically designed to extract text and letters. Even though
Textract is able to recognize letters, in some instances it recognizes two letters as
one, which poses a significant challenge.
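The sketch below computes IoU for axis-aligned bounding boxes, along with precision and recall derived from it. The 0.5 IoU threshold and the greedy one-to-one matching are common conventions assumed here. The text does not state which threshold or matching rule was used for the Textract and OpenCV comparison.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predicted: list[Box], truth: list[Box],
                     threshold: float = 0.5) -> tuple[float, float]:
    """Greedy one-to-one matching: a prediction counts as a true positive if it
    overlaps a not-yet-matched ground truth box with IoU >= threshold."""
    matched: set[int] = set()
    true_positives = 0
    for p in predicted:
        best_j, best_score = None, threshold
        for j, t in enumerate(truth):
            score = iou(p, t)
            if j not in matched and score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))          # overlap area 1, union area 7
    letters = [(0, 0, 2, 2.2), (3, 0, 5, 2)]         # ground truth letter boxes
    detections = [(0, 0, 2, 2), (10, 10, 12, 12)]    # one good hit, one spurious box
    print(precision_recall(detections, letters))
```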
4.2 LSTM
The work here is based largely on a paper by Alex Graves [9]. The reader should
consider studying that paper first to maximize understanding of both LSTMs and
handwriting generation before proceeding. In addition, portions of the
implementation were adapted from another application of Graves’ work [6] and
duly noted in the source code.
The second hook saves heat map plots of the training progress every 100
batches. These heat maps are illustrated in Figure 11 and represent the training
φs (more below) and soft attention windows at that point in training. They are
useful for understanding the math behind the training, for making sure training is
proceeding as expected, and for inclusion in other publications such as papers or
reports.
The third hook, shown in Figure 10, generates a sample handwriting sequence
every 500 batches. This is useful for seeing how training is progressing and for
understanding what the model is doing at certain points in its execution.
Finally, a fourth hook triggers at the end of training if the --test-model flag is
set. It generates a final set of handwriting images in SVG format to validate the
state of the trained model, as demonstrated in Figure 12.
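The hook schedule described above can be sketched as a training loop that calls registered functions every N batches and, optionally, once at the end. The 100- and 500-batch intervals and the --test-model behavior come from the text. The function names, signatures, and dispatch mechanism are hypothetical and do not reproduce the project's code.

```python
from typing import Callable

Hook = Callable[[int], None]  # receives the current batch number


def run_training(num_batches: int,
                 periodic_hooks: list[tuple[int, Hook]],
                 final_hooks: list[Hook],
                 test_model: bool = False) -> None:
    """Hypothetical loop: fire each periodic hook every `interval` batches."""
    for batch in range(1, num_batches + 1):
        # The forward pass, loss computation, and optimizer step would go here.
        for interval, hook in periodic_hooks:
            if batch % interval == 0:
                hook(batch)
    if test_model:  # mirrors the --test-model flag: validate the trained model
        for hook in final_hooks:
            hook(num_batches)


if __name__ == "__main__":
    events: list[tuple[str, int]] = []
    run_training(
        1000,
        periodic_hooks=[(100, lambda b: events.append(("heat map", b))),
                        (500, lambda b: events.append(("sample", b)))],
        final_hooks=[lambda b: events.append(("final SVGs", b))],
        test_model=True,
    )
    print(events)
```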
5 Results
The training data set contained 12,195 handwriting samples. Training took place
on a university mainframe computer with GPU support. Each epoch improved the
loss, with the largest (exponential) improvements recorded in the first 3 epochs.
After 50 epochs, the loss reached its minimum of -348; because the loss is a
negative log-likelihood of continuous probability densities, it can fall below zero.
Handwriting generation produced the most realistic output, as can be seen in
Figure 10f and Figure 12.
Smoothing results were less satisfactory than generation, as can be seen in
Figure 14. In fact, results degraded as the bias values increased. Output at high
biases was unrecognizable as handwriting, while output at lower biases appeared
worse than the original smoothing sample shown in Figure 5.
Several approaches were tried to mitigate this effect, since Graves [9] indicated
that "priming" the network should allow the biases to adjust legibility while still
retaining the handwriting style of the priming sample. The one-hot vector was
padded with spaces to the length of the sample text, the original text was
duplicated, and φ was pre-calculated from the sample sequence before being fed
into the trained network. All of these approaches were unsuccessful, although
copying the original sample's sequence into the PyTorch tensor as the priming
value before generating the smoothed sequence proved the most promising
(Figure 14).
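In primed sampling as described by Graves [9], the network is first run on a real stroke sequence together with that sequence's text. Generation then continues with the new text appended, so the soft window reads both texts in order. The sketch below only illustrates how the one-hot character input might be assembled for that setup. The alphabet, the single-space separator, and the function names are assumptions, and the sketch does not reproduce any of the approaches tried above.

```python
def one_hot(text: str, alphabet: str) -> list[list[int]]:
    """Encode each character as a one-hot row; characters outside the alphabet become all-zero rows."""
    return [[1 if ch == a else 0 for a in alphabet] for ch in text]


def priming_text_input(prime_text: str, new_text: str, alphabet: str) -> list[list[int]]:
    """Hypothetical helper: the window reads the priming text first, then the text to generate."""
    return one_hot(prime_text + " " + new_text, alphabet)


if __name__ == "__main__":
    alphabet = " abcdefghijklmnopqrstuvwxyz"
    rows = priming_text_input("my hand", "writing", alphabet)
    print(len(rows), "characters x", len(alphabet), "symbols")
```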
6 Analysis
This section analyzes how the LSTM network produces the observed handwriting
results, along with the role of statistical prediction and the structure of the loss
function.
As with any neural network, the design consists of an input layer, an output
layer, and a series of hidden layers in between. The x terms represent the pen
location at successive points in time. These pass through the hidden layers to the
output layer. The output at each step is a probability distribution over the next
point, and a point drawn from that distribution becomes the input to the next
iteration of the network. Thus the output computed from x_{n-1} supplies the input x_n.
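The feedback loop described above, in which each step's sampled output becomes the next step's input, can be sketched without any network at all. Here step and sample are hypothetical stand-ins for the trained LSTM and for drawing a point from its output distribution. The toy usage replaces the network with a rule that always predicts a move of +1.

```python
from typing import Any, Callable, TypeVar

P = TypeVar("P")  # a point, e.g. an (x, y) pair


def generate(step: Callable[[Any, P], tuple[Any, Any]],
             sample: Callable[[Any], P],
             state: Any,
             first_point: P,
             num_points: int) -> list[P]:
    """Autoregressive generation: the point drawn from step n-1's output is the input x_n."""
    points = [first_point]
    x = first_point
    for _ in range(num_points - 1):
        state, distribution = step(state, x)   # network output: a distribution over the next point
        x = sample(distribution)               # draw the next point and feed it back in
        points.append(x)
    return points


if __name__ == "__main__":
    # Toy "network": predicts a distribution centred one unit to the right; the sampler takes its mean.
    toy_step = lambda state, x: (state, (x + 1, 0.0))
    take_mean = lambda dist: dist[0]
    print(generate(toy_step, take_mean, None, 0, 5))
```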
The hidden layer activations are calculated according to equations 1 (for the
first hidden layer) and 2 (for the nth hidden layer) of the Graves paper [9]. The W
terms are weight matrices. Most of the calculations are performed with linear
algebra vectors and matrices for simplicity and performance. The output is
computed according to equations 3 and 4.
At each iteration, a loss function is needed to measure how well the iteration
performed so that adjustments can be made. The loss function is given by equation
5. Its partial derivatives with respect to the network weights are computed and
applied via back-propagation to update the network for the next iteration.
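For reference, equations 1 to 5 are reproduced below in the notation of Graves [9]. In this notation, x_t is the input at time t, h^n_t is the output of hidden layer n, H is the hidden layer function, Y is the output layer function, the W terms are weight matrices, and the b terms are biases. This is a transcription for the reader's convenience; the project's code may organize the computation differently.

```latex
\begin{align}
h^1_t &= \mathcal{H}\left(W_{ih^1} x_t + W_{h^1h^1} h^1_{t-1} + b^1_h\right) \tag{1}\\
h^n_t &= \mathcal{H}\left(W_{ih^n} x_t + W_{h^{n-1}h^n} h^{n-1}_t + W_{h^nh^n} h^n_{t-1} + b^n_h\right) \tag{2}\\
\hat{y}_t &= b_y + \sum_{n=1}^{N} W_{h^n y} h^n_t \tag{3}\\
y_t &= \mathcal{Y}(\hat{y}_t) \tag{4}\\
\mathcal{L}(\mathbf{x}) &= -\sum_{t=1}^{T} \log \Pr(x_{t+1} \mid y_t) \tag{5}
\end{align}
```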
For an LSTM, a special memory cell, shown in Figure 16, is used as the hidden
layer activation function. The cell is a group of interacting equations that together
compute the final value of the hidden layer h_t, which becomes h_{t-1} the next time
the cell is computed. Here i, f, and o are the input, forget, and output gates, c is
the cell state, and the hidden output h is computed from them. These functions
correspond to equations 6 - 10.
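The sketch below writes the peephole LSTM cell of Graves [9] in plain Python. The peephole weights are diagonal, so they appear as vectors. The parameter dictionary layout and key names are assumptions made for this illustration. Standard library layers such as PyTorch's torch.nn.LSTM omit peephole connections, so the sketch may differ from the project's code.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell(x: Vector, h_prev: Vector, c_prev: Vector, p: dict) -> tuple[Vector, Vector]:
    """One time step of an LSTM cell with diagonal peephole connections.

    For each gate key k in "i", "f", "c", "o", p holds input weights p["Wx"+k],
    recurrent weights p["Wh"+k], and biases p["b"+k], plus peephole vectors
    p["wci"], p["wcf"], p["wco"]. This key layout is an assumption.
    """
    def pre(k: str) -> Vector:
        return [a + b + bias for a, b, bias in
                zip(matvec(p["Wx" + k], x), matvec(p["Wh" + k], h_prev), p["b" + k])]

    i = [sigmoid(z + w * cp) for z, w, cp in zip(pre("i"), p["wci"], c_prev)]  # input gate
    f = [sigmoid(z + w * cp) for z, w, cp in zip(pre("f"), p["wcf"], c_prev)]  # forget gate
    cand = [math.tanh(z) for z in pre("c")]                                    # candidate cell input
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, cand)]     # new cell state
    o = [sigmoid(z + w * ct) for z, w, ct in zip(pre("o"), p["wco"], c)]      # output gate uses the new c
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                          # hidden output
    return h, c


def zero_params(n_in: int, n_hidden: int) -> dict:
    """All-zero parameters, handy for checking the arithmetic by hand."""
    p: dict = {}
    for k in "ifco":
        p["Wx" + k] = [[0.0] * n_in for _ in range(n_hidden)]
        p["Wh" + k] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b" + k] = [0.0] * n_hidden
    for k in "ifo":
        p["wc" + k] = [0.0] * n_hidden
    return p


if __name__ == "__main__":
    # With zero weights every gate is sigmoid(0) = 0.5 and the candidate is tanh(0) = 0,
    # so the cell state is halved.
    h, c = lstm_cell([0.1, -0.2, 0.3], [0.0, 0.0], [1.0, -1.0], zero_params(3, 2))
    print(h, c)
```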
into play over many iterations of the network training algorithm), the gradients
(derivatives) computed at each iteration might grow to an exponentially large
value. The solution is to "clip" the gradients so that they never exceed a certain
magnitude; in this project, they were clipped to the range -10 to 10.
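Element-wise value clipping of this kind is a one-line operation, sketched below with plain Python lists. In PyTorch the same effect is available through torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=10), although the text does not say whether the project used that function. Value clipping differs from norm clipping, which rescales the whole gradient vector instead of clamping each component.

```python
def clip_gradients(grads: list[float], limit: float = 10.0) -> list[float]:
    """Clamp every gradient component into [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


if __name__ == "__main__":
    # A derivative multiplied by 1.5 at each of 20 steps grows by a factor of about 3,300.
    exploding = [1.5 ** 20, -(1.5 ** 20), 0.25]
    print(clip_gradients(exploding))
```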
To aid understanding, Figure 17 shows the same LSTM memory cell in another
diagram. The same functions appear in the earlier memory cell diagram and in the
equations above: i, f, o, and c, along with the hidden output h. Code implemented
from this will compute each piece using the inputs as shown on
series of connected points along the x/y axes. The next likely point is expected
to lie somewhere in an oval region around the current point. Because the letter is
an S, the distribution would be skewed slightly toward the next part of the S
stroke (because the network was trained, it knows how to draw an S). Figure 18
shows a visual diagram of this process (minus the S strokes).
The oval represents the possible points. Notice that the oval is built from two
distributions (the "bi" in bivariate), one for x and one for y, together with a
correlation between them. The Gaussian mixture is expressed by equations
23 - 25 in the Graves paper [9].
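The density and sampling steps can be sketched as follows. The bivariate normal density follows the form given in Graves [9], with standard deviations s1 and s2 and correlation rho; the correlation is what tilts the oval. The mixture weights pis, means, standard deviations, correlations, and end-of-stroke probability eos would normally come from the network's output layer. In this sketch they are passed in directly, and the function names are hypothetical.

```python
import math
import random


def bivariate_normal_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Density of a bivariate normal with standard deviations s1, s2 and correlation rho."""
    d1, d2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    z = d1 ** 2 + d2 ** 2 - 2 * rho * d1 * d2
    return math.exp(-z / (2 * (1 - rho ** 2))) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2))


def mixture_pdf(x1, x2, pis, mus, sigmas, rhos):
    """Weighted sum of the component densities (the weights pis sum to 1)."""
    return sum(pi * bivariate_normal_pdf(x1, x2, m[0], m[1], s[0], s[1], r)
               for pi, m, s, r in zip(pis, mus, sigmas, rhos))


def sample_point(pis, mus, sigmas, rhos, eos, rng=random):
    """Pick component k with probability pis[k], draw (x1, x2) from it, and decide whether the pen lifts."""
    k = rng.choices(range(len(pis)), weights=pis)[0]
    (mu1, mu2), (s1, s2), rho = mus[k], sigmas[k], rhos[k]
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x1 = mu1 + s1 * z1
    x2 = mu2 + s2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)  # induces correlation rho
    pen_up = rng.random() < eos
    return x1, x2, pen_up


if __name__ == "__main__":
    rng = random.Random(0)
    params = ([0.7, 0.3], [(1.0, 0.5), (2.0, -0.5)], [(0.3, 0.2), (0.5, 0.5)], [0.6, -0.2])
    print(mixture_pdf(1.0, 0.5, *params))
    print(sample_point(*params, eos=0.05, rng=rng))
```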
The paper [9] also goes through the derivation of the loss function, which is in
turn used to update the weights of the network in back-propagation, as usual. The
next point is then drawn from the distribution and used as the starting point of
the next iteration. The network also computes an end-of-stroke probability, which
indicates whether the pen is lifted after the point; this is how the network knows
when to end one stroke and begin the next.
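The sketch below computes the per-point loss in the negative log-likelihood form of Graves [9]. The mixture density scores the observed pen offset, and a Bernoulli term scores whether the pen was lifted. The parameter layout and function names are assumptions. The example also shows why a loss such as the -348 reported in the Results can be negative: a narrow Gaussian has a density well above 1 near its mean, so its negative log is below zero.

```python
import math


def _bivariate_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    d1, d2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    z = d1 ** 2 + d2 ** 2 - 2 * rho * d1 * d2
    return math.exp(-z / (2 * (1 - rho ** 2))) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2))


def point_nll(point, params):
    """-log Pr(point | network output) for one step.

    point  = (dx, dy, pen_up)
    params = (pis, mus, sigmas, rhos, eos); all but eos have one entry per mixture component.
    """
    x1, x2, pen_up = point
    pis, mus, sigmas, rhos, eos = params
    density = sum(pi * _bivariate_pdf(x1, x2, m[0], m[1], s[0], s[1], r)
                  for pi, m, s, r in zip(pis, mus, sigmas, rhos))
    bernoulli = eos if pen_up else 1.0 - eos
    return -math.log(density) - math.log(bernoulli)


def sequence_loss(points, outputs):
    """Total loss: the sum of per-step terms, where outputs[t] predicts points[t]."""
    return sum(point_nll(p, o) for p, o in zip(points, outputs))


if __name__ == "__main__":
    # One narrow component (standard deviation 0.01) centred exactly on the observed offset.
    narrow = ([1.0], [(0.0, 0.0)], [(0.01, 0.01)], [0.0], 0.0)
    print(point_nll((0.0, 0.0, 0), narrow))  # negative: the density here is about 1,592
```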
After the first LSTM layer (LSTM1), the code computes the attention mechanism
given by equations 14 - 19. The network then computes LSTM2 and LSTM3, after
which it is just a matter of computing equations 20 - 24 using a dense layer.
The algorithm continues until it reaches the stopping condition. Since strokes
nearly always take a variable number of points to construct, generation cannot
simply run through a for loop with a fixed number of iterations. Instead, following
Graves [9], the algorithm compares the φ value computed for the position just past
the last character of the text with the φ values computed for each character. When
the former becomes greater than all of the latter, the attention window has moved
past the end of the text, so the algorithm has reached the end of the sequence and
can stop.
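The soft window and the stopping test can be sketched using the formulation in Graves [9]. In that formulation, each window component k has an importance alpha_k, a width beta_k, and a location kappa_k measured in character positions, and kappa only moves forward. The function names and list-based layout are assumptions for illustration.

```python
import math


def advance_kappa(prev_kappa, kappa_hat):
    """Window locations only move forward: kappa_t = kappa_{t-1} + exp(kappa_hat)."""
    return [k + math.exp(h) for k, h in zip(prev_kappa, kappa_hat)]


def window_weights(alpha, beta, kappa, num_chars):
    """phi(u) for u = 1 .. num_chars + 1; the extra position lies just past the end of the text."""
    return [sum(a * math.exp(-b * (k - u) ** 2) for a, b, k in zip(alpha, beta, kappa))
            for u in range(1, num_chars + 2)]


def window_vector(phi, one_hot_text):
    """Soft window w = sum over characters u of phi(u) * c_u (the past-the-end weight is unused)."""
    dim = len(one_hot_text[0])
    return [sum(phi[u] * one_hot_text[u][d] for u in range(len(one_hot_text))) for d in range(dim)]


def should_stop(phi):
    """Stop once phi(U + 1) is greater than phi(u) for every real character u."""
    *char_weights, past_end = phi
    return all(past_end > w for w in char_weights)


if __name__ == "__main__":
    text = [[1, 0], [0, 1], [1, 0]]           # three characters, two-symbol alphabet
    for kappa in ([1.0], [2.0], [4.0]):        # the window sliding along the text
        phi = window_weights([1.0], [1.0], kappa, len(text))
        print(kappa, [round(p, 3) for p in phi], should_stop(phi))
```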
Lower biases have less effect on stylistic improvement, whereas higher biases
result in cleaner handwriting at the cost of reduced stylistic distinctness. Values
closer to 0 had the least effect, and values closer to 10 had the greatest effect.
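The bias adjustment in Graves [9] sharpens the mixture weights and shrinks the standard deviations before sampling. It uses pi_k = softmax(pi_hat_k * (1 + b)) and sigma_j = exp(sigma_hat_j - b), where pi_hat and sigma_hat are the raw network outputs and b is the bias. The sketch below implements that adjustment, assuming it matches the implementation used here. With b = 0 it reduces to ordinary sampling. As b grows, sampling concentrates on the most likely component with a narrower spread, which is consistent with the cleaner but less distinctive output observed at high biases.

```python
import math


def apply_bias(pi_hat: list[float], sigma_hat: list[float],
               bias: float) -> tuple[list[float], list[float]]:
    """Return bias-adjusted mixture weights and standard deviations (Graves-style sampling bias)."""
    scaled = [p * (1.0 + bias) for p in pi_hat]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]        # subtract the max for numerical stability
    total = sum(exps)
    pis = [e / total for e in exps]                   # softmax of the scaled logits
    sigmas = [math.exp(s - bias) for s in sigma_hat]  # larger bias -> narrower Gaussians
    return pis, sigmas


if __name__ == "__main__":
    for b in (0.0, 1.0, 5.0, 10.0):
        pis, sigmas = apply_bias([1.0, 0.5, 0.0], [0.0, -0.5, 0.2], b)
        print(b, [round(p, 3) for p in pis], [round(s, 5) for s in sigmas])
```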
In the code, the method smooth_handwriting() is used to smooth the supplied
handwriting sequence and optionally display or save it according to the options
that have been set. The sequence is provided in a data file in the same online
format as the training data set. Internally, the sample is simply an instance of
the Dataset class described previously.
7 Discussion
Compared to other learning disabilities, the symptoms of Dysgraphia may appear
mild; however, it poses a significant lifelong challenge [14]. This project
establishes an application with an extensible framework, published on GitHub.
The application plots handwriting on a Cartesian grid and then smooths the
writing into a legible style similar to the original.
The application can be integrated into any device capable of running Python
scripts. Potential field applications include note taking for school and home,
auto-filling forms in the person's writing style, and Dysgraphia diagnostic tools.
Diagnostic tools could help to identify neural disorders in early childhood and
provide help during the preschool years [1].
Determining which neural network methods are best suited for handwriting
smoothing was difficult. The initial focus was on image-based (offline) processing
methods such as GANs and style transfer [3], since this is the area where the
majority of previous work was done, including the transcription of old cursive
historical records. A significant amount of time was spent on various image
processing models. Dysgraphia handwriting posed a higher level of difficulty due
to its significant slants, as well as uneven spacing and amplitude. Image processing
relies on reasonably consistent spacing to recognize the end of a word and on
comparable amplitude for pixel-by-pixel processing. Results for the image-based
models were mixed and inconsistent, whereas the pen stroke position (online)
LSTM model produced useful results and proved the most promising.
8 Future Work
In the course of exploring techniques for this project, considerable information
was gathered about GANs (Generative Adversarial Networks). It would be
interesting to explore using GANs to replicate the generation and smoothing work
presented here. Such a network could also be used to convert handwriting from an
image into the online x, y format needed for smoothing. It would also be
interesting to find a way to minimize stylistic variance when using the bias
adjustment technique while still improving legibility.
An obvious next step would be to resolve the difficulties this project faced with
priming the handwriting sequence in order to smooth handwriting. This would
likely yield positive results quickly, since the original paper [9] demonstrated that
priming works. It would then be a matter of replicating that success in the Python
framework.
9 Ethical Implications
Overall, the ethical implications of handwriting recognition and smoothing are
positive. The techniques shown in this paper could be applied to help children
and adults with Dysgraphia, as well as people with other disorders that affect
handwriting, such as Parkinson's disease or arthritis. This work could also be used
to recreate important archaic documents or transcripts that have become illegible
over time.
Although the negative ethical implications are few, there is potential for forgery,
which could lead to theft, misrepresentation, or mischief. As with any new
technology, security measures could be implemented to prevent such misuse.
10 Conclusion
Dysgraphia is a disorder that affects the fine motor skills of both children and
adults. This project explored ways of using neural networks to help people with
this disorder smooth out their handwriting so that they can communicate better.
The end product of this project is an application that takes handwriting data, in
the form of sequential stroke data, as input and generates a set of stroke data that
represents new, smoother handwriting. The application can also take in training
data sets that can be used to train a new model. This application could not only
help people with Dysgraphia but also promote additional research in many other
areas of handwriting or hand drawing applications.
References
[1] K. A. Akinmosin. The effect of poor handwriting on the academic
performance of the gifted learning-disabled students, 2016. URL
https://www.academia.edu/20030419/the_effect_of_poor_handwriting_on_the_academic_performance_of_the_gifted_learning_disabled_students/
[3] Y.-L. Chen and C.-T. Hsu. Towards deep style transfer: A content-aware
perspective, 2017. URL
https://pdfs.semanticscholar.org/6969/2465952055d6d5702d30f62914c913445fd1.pdf.
[12] E. Macias, G. Boquet, J. Serrano, J. Lopez Vicario, J. Ibeas, and A. Morell. Novel
imputing method and deep learning techniques for early prediction of sepsis
in intensive care units. 12 2019. doi: 10.22489/CinC.2019.038.
[16] O. Mohammed. Style transfer and extraction for the handwritten letters using
deep learning, 2018. URL https://arxiv.org/pdf/1812.07103.pdf.
[22] J. Roland. What does dysgraphia look like in adults?, 2018. URL
https://www.additudemag.com/dysgraphia-in-adults-recognizing-symptoms-later-in-life.