2016 15th International Conference on Frontiers in Handwriting Recognition

Line-of-Sight Stroke Graphs and Parzen Shape Context Features for Handwritten Math Formula Representation and Symbol Segmentation
Lei Hu and Richard Zanibbi
Department of Computer Science, Rochester Institute of Technology, Rochester, USA
[email protected], [email protected]

Abstract—This paper presents a new representation for handwritten math formulae: a Line-of-Sight (LOS) graph over handwritten strokes, computed using stroke convex hulls. Experimental results using the CROHME 2012 and 2014 datasets show that LOS graphs capture the visual structure of handwritten formulae better than commonly used graphs such as Time-series, Minimum Spanning Trees, and k-Nearest Neighbor graphs. We then introduce a shape context-based feature (Parzen window Shape Contexts (PSC)) which is combined with simple geometric features and the distance in time between strokes to obtain state-of-the-art symbol segmentation results (92.43% F-measure for CROHME 2014). This result is obtained using a simple method, without use of OCR or an expression grammar. A binary random forest classifier identifies which LOS graph edges represent stroke pairs that should be merged into symbols, with connected components over merged strokes defining symbols. Line-of-Sight graphs and Parzen Shape Contexts represent visual structure well, and might be usefully applied to other notations.

Keywords: Line-of-Sight graph, symbol segmentation, handwritten math recognition, shape contexts

I. INTRODUCTION

Math expressions are an essential part of scientific communication. Recognizing handwritten expressions written on tablets and other touch-sensitive devices would be helpful in document editing, mathematics education applications, and search engines that support mathematical notation in queries. In this paper, we are interested in recognizing Symbol Layout Trees (SLTs) for expressions, which represent expression appearance by a set of symbols with spatial relationships between them (e.g., Right-adjacent, Subscript, Above [1]). SLTs represent information similar to LaTeX formulae, but without formatting information.

Recognizing handwritten formulae requires three main tasks: symbol segmentation, symbol recognition and structural analysis. While often implicit in the literature [1], all tasks require a graph-based representation for handwritten strokes in the expression. These stroke graphs constrain stroke and symbol relationships considered while searching for the best interpretation of a formula. An ideal stroke graph contains enough edges to represent all spatial relationships and symbols (i.e., perfect recall), while containing as few extraneous edges as possible (i.e., high precision). We want to be able to express the correct expression, but we also want few additional edges so that we increase the likelihood of producing the correct interpretation.

In this paper, we propose Line-of-Sight (LOS) stroke graphs for representing handwritten formulae written online, and introduce a symbol segmentation technique using LOS graphs and new Parzen window-modified Shape Context features (PSC). We first present existing stroke graph representations in Section II, and then algorithms for constructing LOS stroke graphs in Section III. In Section IV we compare different graph representations using the CROHME competition benchmarks [2], and find that LOS graphs are able to represent the most expressions correctly, while still having a reasonable Precision. In Section V we present our LOS-based segmenter using PSC features, which obtains state-of-the-art symbol segmentation results for the CROHME 2014 data set. We then conclude and identify directions for future work in Section VI.

II. HANDWRITTEN STROKE GRAPH REPRESENTATIONS

In this Section we briefly introduce stroke-level graph representations used to parse formulae written online. Nodes represent individual strokes, while edges represent possible relationships between strokes, such as identifying strokes belonging to the same symbol, and identifying spatial relationships between symbols such as right-adjacency (R) or superscript (Sup). Details regarding using stroke graphs to represent formula appearance may be found elsewhere [2].

The graph types below define edge subsets for the complete graph with \binom{n}{2} = n(n-1)/2 undirected edges between all stroke pairs. The motivation to use more compact graphs is to reduce the number of irrelevant edges, making both training and parsing more efficient and accurate. However, pruning stroke pairs can reduce the space of representable expressions, as we will later show in Section IV.

Time-series. Time-series graphs representing the sequence in which strokes are written are common [3], [4]. Many current systems that parse formulae using a modified Cocke-Younger-Kasami (CYK) algorithm consider strokes in time order [2]. A Time-series graph can be represented using an undirected edge between each stroke and its successive stroke (except for the final stroke). Time-series graphs are unable to directly represent formulae with delayed strokes (e.g., the dot for an 'i' written after writing other symbols) or non-local relationships. These graphs are compact; as sequences they are a restricted form of tree, with n − 1 undirected edges for n strokes.

k-NN Graph. Others such as Eto and Suzuki [5] have used k-Nearest Neighbor graphs (k-NN). In a k-NN graph, there is an undirected edge from each stroke to each of its k nearest neighboring strokes. This allows strokes that are nearby in space but not necessarily time to be related in the graph (e.g., the dot of an 'i' written after a delay). k-NN graphs are less compact than Time-series, with O(kn) edges. Smaller values of k may produce a disconnected graph, splitting a formula into two or more sub-expressions.

Minimum Spanning Tree (MST). Matsakis [6] uses a stroke graph where edges are defined by the Minimum Spanning Tree (MST) over strokes, based on the distance between stroke bounding boxes. The MST is more compact than k-NN graphs, with n − 1 undirected edges, and guarantees that the graph is connected. A limitation is that edges for relationships between non-neighboring strokes may be absent.

Delaunay Triangulation. Hirata et al. employ Delaunay Triangulation (DT) for stroke graphs [7]. A DT for a set of 2D points S is a triangulation where no point in S is inside the circumcircle of any triangle [8]. It is the dual structure of the Voronoi diagram. Each point in the triangulation is adjacent to all vertices (strokes) in attached triangles. Like MSTs, Delaunay Triangulations guarantee a connected graph, but have more edges: 3n − 3 − h, where h is the number of points on the convex hull. This allows more relationships to be represented in a DT than an MST.

In the next Section we propose using a new graph representation, Line-of-Sight graphs.

III. LINE-OF-SIGHT (LOS) GRAPHS OVER STROKES

We came to consider Line-of-Sight graphs after creating k-NN graphs with large values of k, and finding that expressions with large exponents were having edges between horizontally adjacent symbols pruned (e.g., for k = 2, in x^{2a} + 1, the x and + do not share an edge). However, an unobstructed line could often be drawn between strokes with relationships pruned by k-NN in cases like these. A Line-of-Sight graph [8] is a visibility graph defining which nodes can 'see' one another. For our stroke graphs, the LOS graphs define edges for strokes that can be 'seen' from the bounding box center of another stroke, or vice versa. An example LOS stroke graph is shown in Figure 1.

Figure 1: Line-of-Sight (LOS) Graph for a Handwritten Expression. Small square nodes represent bounding box centers for eleven handwritten strokes. Edges represent mutually 'visible' strokes. Two strokes share an edge if an unobstructed line can be drawn from one stroke's bounding box center to a point on the convex hull for the other stroke (see Algorithm 1).

Algorithm 1 constructs LOS graphs from handwritten stroke set S. For a given stroke s ∈ S, we consider whether other strokes are visible by incrementally blocking angles covered by strokes. Strokes other than s are sorted by the smallest distance between two sample points: one point is from s, while the other point is from the other stroke. To test their visibility, strokes are represented by their convex hull [8]: the smallest convex polygon containing sample points in the stroke. While 'looking' at each stroke t from sc, if we find an unobstructed angle range between sc and the convex hull for t, we define an undirected edge between s and the other stroke (as two directed edges). We then remove the visible angle range for t from the set of unblocked angle intervals U. Figure 2 illustrates how visible and blocked angles are defined using convex hulls.

Algorithm 1: Line-of-Sight (LOS) Graph Construction
Input: S, the set of handwritten strokes for an expression
  let E = ∅ be an empty edge set
  for each stroke s ∈ S with bounding box center sc = (x0, y0) do
    let the unblocked angle interval set U = {[0, 2π]} radians
    for t ∈ S − s, by increasing distance from s, do
      let θmin = +∞, θmax = −∞
      for each node n = (xh, yh) in the convex hull of t do
        let vector w = n − sc = (xh − x0, yh − y0)
        let the angle θ between w and the horizontal h = (1, 0) be
            θ = arccos( w · h / (‖w‖ ‖h‖) )         if yh ≥ y0
            θ = 2π − arccos( w · h / (‖w‖ ‖h‖) )    if yh < y0
        let θmin = min(θmin, θ), θmax = max(θmax, θ)
      let the hull interval h = [θmin, θmax]
      if V = {u ∩ h : u ∈ U} contains a non-empty interval then
        let E = E ∪ (s, t) ∪ (t, s)   (sc 'sees' t)
        let U = { u − v0 − ... − vn : u ∈ U }, where V = {v0, ..., vn}
  return E (LOS stroke edges)
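For illustration, the following is a minimal Python sketch of the construction in Algorithm 1 (an independent sketch, not the system's implementation). It assumes each stroke is a list of (x, y) sample points, uses SciPy's ConvexHull for the hulls, and, like the pseudocode above, treats a hull's angular extent as a single [θmin, θmax] interval (wrap-around at 0/2π is ignored). Function names are illustrative.

```python
# Sketch of Algorithm 1: block angular intervals while visiting strokes in order
# of increasing closest-point distance; emit directed edges for visible strokes.
import math
import numpy as np
from scipy.spatial import ConvexHull


def bbox_center(stroke):
    pts = np.asarray(stroke, dtype=float)
    return (pts.min(axis=0) + pts.max(axis=0)) / 2.0


def hull_points(stroke):
    pts = np.asarray(stroke, dtype=float)
    if len(pts) < 3:
        return pts
    try:
        return pts[ConvexHull(pts).vertices]
    except Exception:                       # e.g., collinear strokes
        return pts


def angle_interval(center, hull):
    """Angular extent [theta_min, theta_max] of a hull as seen from center."""
    angles = [math.atan2(y - center[1], x - center[0]) % (2 * math.pi)
              for x, y in hull]
    return min(angles), max(angles)


def subtract(intervals, lo, hi):
    """Remove the angle range [lo, hi] from a list of (a, b) intervals."""
    out = []
    for a, b in intervals:
        if hi <= a or lo >= b:
            out.append((a, b))
            continue
        if a < lo:
            out.append((a, lo))
        if hi < b:
            out.append((hi, b))
    return out


def closest_point_distance(s, t):
    ps, pt = np.asarray(s, float), np.asarray(t, float)
    return np.sqrt(((ps[:, None, :] - pt[None, :, :]) ** 2).sum(-1)).min()


def line_of_sight_edges(strokes):
    edges = set()
    for i, s in enumerate(strokes):
        center = bbox_center(s)
        unblocked = [(0.0, 2 * math.pi)]
        others = sorted((j for j in range(len(strokes)) if j != i),
                        key=lambda j: closest_point_distance(s, strokes[j]))
        for j in others:
            lo, hi = angle_interval(center, hull_points(strokes[j]))
            # Visible if some unblocked interval overlaps the hull interval.
            if any(max(a, lo) < min(b, hi) for a, b in unblocked):
                edges.add((i, j))
                edges.add((j, i))
            unblocked = subtract(unblocked, lo, hi)
    return edges
```

Here `line_of_sight_edges` returns directed edge pairs over stroke indices, matching the two directed edges per visible stroke pair described above.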

Figure 2: Line-of-Sight Illustration (adapted from [8]). 'Looking' from point p, the purple and blue arcs represent the angle ranges blocked by Polygons A and B, ignoring other polygons in the scene (h in Algorithm 1). The green arc represents the visible range of Polygon B (V), taking into account the portion blocked by Polygon C. The red lines are sight lines to vertices blocked by Polygon C.

Algorithm 2 recovers missing labeled edges in stroke graph representations, including LOS, to make sure that edge labels represent a valid SLT. Connected components of merge-labeled stroke edges are converted into cliques (i.e., all pairs of strokes in a symbol share a merge-labeled edge), and all strokes in a symbol are given the same relationship with strokes in other symbols. The algorithm ignores conflicting undefined (_) labels and assumes no conflicting relationship labels, which is valid for well-defined SLTs. Note that if we have a labeled stroke graph, we can pass it in both inputs for Algorithm 2 to ensure that the SLT representation is consistent, e.g., for recognition results [9]. In fact, the symbol segmenter described later in this paper uses Algorithm 2 in this way.

Algorithm 2: Assigning Labels to an Undirected Stroke Graph. Label *: merge strokes into symbol; Label _: undefined stroke pairs
Inputs:
  G = (S, E), an undirected and unlabeled stroke graph
  Gt = (S, Et, λSt, λEt), a complete labeled stroke graph with the same node set S (strokes) and label functions λSt and λEt

  let λS = λSt   (label strokes)
  let φ : S → 2^S map strokes to stroke sets for symbols; initially, φ(s) = {s}, ∀s ∈ S
  let Ec = E ∩ Et be the common edges in G and Gt
  for e = (s1, s2) ∈ Ec do
    λE(e) = λEt(e)   (label edge)
    if λE(e) = ∗   (define symbols)
      let φ(s1) = φ(s1) ∪ φ(s2), φ(s2) = φ(s1)
      for (si, sj) ∈ φ(s1) × φ(s1) where si ≠ sj do
        let λE(si, sj) = λE(sj, si) = ∗
  (refine relationships)
  for e = (s1, s2) ∈ Ec do
    if λE(e) is labeled with a relationship (i.e., not ∗ or _)
      for so ∈ φ(s1) where so ≠ s1 do
        let λE(so, s2) = λE(s1, s2)
  return (λS, λE)   (stroke and edge labels for G)
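A small Python sketch of the label transfer in Algorithm 2 is given below (an illustration, not the authors' code). It assumes directed edges are stroke-index pairs, uses '*' for the merge label and '_' as a stand-in for the undefined label, and grows symbols with a simple union-find over merge edges.

```python
# Sketch of Algorithm 2: copy ground-truth labels onto a candidate graph's edges,
# grow symbols as connected components of merge edges, convert them to cliques,
# and propagate relationship labels across strokes of the same symbol.
from itertools import permutations


def assign_labels(edges, gt_labels):
    """edges: iterable of directed (i, j) stroke pairs; gt_labels: dict (i, j) -> label."""
    labels = {e: gt_labels[e] for e in edges if e in gt_labels}

    parent = {}                         # union-find over strokes on merge edges
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)

    for (i, j), lab in labels.items():
        if lab == '*':
            union(i, j)

    symbols = {}                        # component root -> member strokes
    for s in parent:
        symbols.setdefault(find(s), set()).add(s)

    # Convert each symbol into a merge clique.
    for members in symbols.values():
        for i, j in permutations(members, 2):
            labels[(i, j)] = '*'

    # Refine relationships: strokes in a symbol share their siblings' labeled
    # relationships with strokes in other symbols.
    for (i, j), lab in list(labels.items()):
        if lab not in ('*', '_'):
            for other in symbols.get(find(i), {i}) - {i}:
                labels[(other, j)] = lab
    return labels
```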
Figure 3: Recovering Edge Labels with Algorithm 2. Panels: (a) handwritten strokes; (b) Time-series edges; (c) edge types; (d) recovered SLT ('z = 4'). Edges in the Time-series graph (b) are assigned corresponding labels from Ground Truth in (c). (d) is then produced from (c) by requiring strokes in a symbol to have the same spatial relationship with strokes in other symbols. The Time-series graph in (a) represents the same SLT as in Ground Truth.

Figure 3 illustrates recovering an SLT from a Time-series graph for expression z = 4, using the corresponding complete ground truth graph with labels for all strokes and stroke pairs (identical to Figure 3(d)). In Figure 3(c) and (d), there are two directed * (merge) edges between the two lines in the equals sign.

From a different perspective, Algorithm 2 can test whether a stroke graph has sufficient edges to represent labeled edges in a ground truth SLT stroke graph. We use this to test the expressivity (coverage) of different stroke graph types in the next Section.

IV. GRAPH COVERAGE EXPERIMENTS

Datasets and Performance Metrics. Our experiments test the expressivity (coverage) of LOS and other stroke graph representations. We use data from the Competition on Recognition of On-line Handwritten Mathematical Expressions (CROHME) [2]. CROHME 2012 has 1336 training and 486 test expressions; CROHME 2014 contains more structurally complex formulae, and is larger with 8834 training and 986 test expressions.

For each test expression, all graph representations are passed along with Ground Truth to Algorithm 2. Performance metrics are then computed using the resulting graph. We compare SLT graph coverage using four metrics. First, the number of expressions that can be correctly represented: if all labeled ground truth edges are recovered, the expression is represented correctly (as in Figure 3). At the level of directed edges, we also consider the Recall for labeled edges, Precision of selected edges, and F-score:

1) Recall = |Labeled Recovered Edges| / |Labeled Ground Truth Edges|
2) Precision = |Labeled Recovered Edges| / |Graph Edges|
3) F-score = 2 × (Recall × Precision) / (Recall + Precision)
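A minimal sketch of these metrics, assuming both graphs are given as sets of directed stroke-index pairs (function and argument names are illustrative):

```python
# Directed-edge coverage metrics for a candidate stroke graph against labeled
# ground-truth SLT edges.
def coverage_metrics(graph_edges, labeled_gt_edges):
    recovered = graph_edges & labeled_gt_edges            # labeled recovered edges
    recall = len(recovered) / len(labeled_gt_edges)
    precision = len(recovered) / len(graph_edges)
    f_score = (2 * recall * precision / (recall + precision)
               if recall + precision > 0 else 0.0)
    expression_ok = labeled_gt_edges <= graph_edges        # all labeled edges recovered
    return recall, precision, f_score, expression_ok
```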

Graph Construction and Distance Metrics. MST, k-NN and LOS require distances between stroke pairs for construction. We consider three Euclidean distances:

1) AC: the distance between the two strokes' averaged centers (centers-of-mass), i.e., the mean of the x and y coordinates for the set of stroke sample points.
2) BBC: the distance between bounding box centers.
3) CPP: the smallest distance between two sample points, one from each of the strokes (the Closest Point Pair).

We choose the distance metric and stroke representation based on model selection experiments [9]. AC works best for MST, while CPP distance works best for k-NN. For k-NN, 2-NN achieves the highest F-score, while 6-NN achieves the highest precision for k-NN with a recall higher than 99% [9]. For Delaunay Triangulation, we do not require a stroke pair distance, but instead need a point to represent strokes, for which we try the center-of-mass (AC) or bounding box center (BBC); AC worked best.
Results. Table I shows results for the CROHME 2012 and 2014 test sets. All graph types have stable edge metrics across the datasets. For complete graphs containing directed edges between all stroke pairs, Recall is perfect, but over 90% of the directed edges are irrelevant (Precision < 10%). The highest Precision and F-scores are obtained for MST, suggesting these edges frequently belong to the SLT. Time-series graphs have the second-highest Precision and F-score values. For expressions on a single baseline, the MST and Time-series graphs may be identical. This high Precision is partly caused by having fewer edges than the other types, which also leads to very low expression coverage (< 45%).

For k-NN, as one expects, edge Recall increases with k, while Precision decreases more rapidly. 2-NN obtains the lowest expression coverage for CROHME 2012, while 6-NN obtains the second-highest expression rate for both data sets. Using k ≥ 6 mostly decreases Precision and F-score. Delaunay obtains the next-highest expression rate. While Delaunay has higher Precision and F-score through producing fewer edges than 6-NN and LOS, its expression coverage is 13%-14% lower than 6-NN.

LOS obtains the highest expression coverage (> 98.5%) with a slightly higher edge Precision than 6-NN of roughly 30%. LOS misses fewer than 0.1% of labeled ground truth edges. Table II summarizes missing edges in the LOS results. Most missing edges are 'Right' relationships. Almost all edges with a merge label can be covered by the Line-of-Sight graph, as related strokes are often close to and can 'see' one another. In Figure 4, we see missing edges caused by using a single 'eye' at the stroke bounding box center, or completely blocked sight lines.

The nearly perfect Recall for merge edges in LOS graphs provides a strong foundation for graph-based symbol segmentation, which we discuss in the next Section. Work on parsing with LOS graphs using visual features may be found in a companion paper [10].

Table I: Coverage Comparison for Stroke Graph Types. Percentage of representable CROHME expressions (SLTs) are shown along with metrics for directed stroke pair edges.

CROHME 2012 Test (486 Expressions)
              Expr. (%)   Recall   Precision   F-score
Complete        100.00    1.000      0.087      0.159
LOS              98.56    0.999      0.309      0.472
6-NN             89.92    0.994      0.286      0.444
Delaunay         76.75    0.977      0.388      0.555
MST              36.21    0.882      0.921      0.901
Time-series      31.28    0.878      0.917      0.897
2-NN             24.90    0.879      0.708      0.784

CROHME 2014 Test (986 Expressions)
              Expr. (%)   Recall   Precision   F-score
Complete        100.00    1.000      0.091      0.167
LOS              98.99    0.999      0.297      0.458
6-NN             95.81    0.994      0.283      0.441
Delaunay         79.11    0.973      0.391      0.558
2-NN             44.93    0.885      0.685      0.773
MST              42.39    0.875      0.899      0.887
Time-series      40.77    0.868      0.891      0.879

Complete: all stroke pairs; LOS: Line-of-Sight; 2/6-NN: k-nearest neighbor; MST: Minimum Spanning Tree; Delaunay: Delaunay Triangulation.

Table II: Number of Missing Edges for Line-of-Sight Graphs (CROHME 2012/2014; Test sets correspond to Table I). Edge labels: * (merge into symbol), R (Right), Sub, Sup, Above, Below, Inside.
  2012 Train: 14 total (14 R)
  2012 Test: 13 total (10 R, 3 other relationships)
  2014 Train: 358 total (2 *, 235 R, 65 Sub, 24 Sup, 16 Above, 16 Below)
  2014 Test: 22 total (22 R)

Figure 4: Examples of Missing Right-Adjacency (R) Edges in Line-of-Sight graphs. (a) Sight lines for the leftmost 'A' and the comma are blocked (the bottom-left of the 'A' and the comma can see one another). (b) The sight line between the leftmost 'y' and the second '+' is blocked by the subscript and exponent y's.

V. SEGMENTATION USING LOS GRAPHS AND PARZEN SHAPE CONTEXTS (PSC)

In this Section we present a new technique for segmenting handwritten symbols using LOS graphs and modified Shape Context features [11]. Our segmentation algorithm is simple, using a classifier to identify which directed LOS stroke pair edges correspond to strokes that should be merged. We will now briefly review work on handwritten math symbol segmentation, provide a description of the segmentation algorithm and features, and present segmentation results for the CROHME 2012 and 2014 benchmarks.

Related Work. There have been many graph-based segmentation methods for online handwritten formulae [3], [6], [12]. Toyozumi et al. [12] use a candidate character lattice method, where the closest distance between points on two strokes along with language constraints are used to determine whether strokes should be merged. Matsakis [6] proposes a minimum spanning tree (MST) approach, where each node in the MST represents a stroke, with distances defined by the Euclidean distance between stroke bounding box centers. A limitation is that this technique only considers connected subtrees in the MST for partitioning.

Other methods include Smithies et al.'s [4] progressive segmentation method, which assumes that symbols are written one-at-a-time. After four strokes are written, the segmenter generates all possible groupings. Strokes from the highest confidence candidate symbol are removed. The process then repeats after another four strokes have been written. Kosmala et al. [13] propose a segmentation method based on Hidden Markov Models (HMM). Discrete left-to-right HMMs without skips and with differing numbers of states are used. A space model is also introduced to represent spaces between symbols. Many recent techniques perform segmentation as a sub-routine while parsing handwritten strokes using an expression grammar, e.g., using a modified Cocke-Younger-Kasami (CYK) algorithm [2].

Hu and Zanibbi [3] classify pairs of strokes in time order as merge/split, i.e., using the Time-series graph for strokes. An AdaBoost classifier with multi-scale shape contexts and symbol classification confidence features is used. In this paper we extend this work, but use random forests applied to Line-of-Sight graphs, do not use classification features, and improve the shape context features.

Stroke Preprocessing and Image Generation. To reduce the effects of sample noise, writing jitter and differences in resolution between expressions, we preprocess strokes and render the expression as a binary image. Preprocessing contains four steps: duplicate point filtering, size normalization, smoothing and resampling.

We first delete duplicate points which have the same (x, y) coordinate as the previous point, because they are uninformative. To reduce the influence of writing velocity and differences in the coordinate range and resolution for different stroke recording devices, we normalize y coordinate values to be in the interval [0, 1], while preserving the width-height aspect ratio for x coordinates. To reduce noise caused by stylus/finger jitter, we smooth all strokes: for each stroke, with the exception of the first and last points, we replace the coordinate of each point by the average of the current, previous, and next coordinates. Finally, we use linear interpolation to resample the expression and render it as a binary image. For the image we use a fixed height of 200 pixels, and then set the width to preserve the width/height aspect ratio of the formula. In generating the image we interpolate ten points between each consecutive pair of sample points, and remove any duplicates.
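The preprocessing steps can be sketched as follows (an illustration, not the system's implementation), assuming strokes are given as arrays of (x, y) sample points:

```python
# Preprocessing sketch: duplicate-point filtering, y-normalization to [0, 1] with
# the aspect ratio preserved, 3-point smoothing, and linear resampling.
import numpy as np


def preprocess_strokes(strokes, points_between=10):
    strokes = [np.asarray(s, dtype=float) for s in strokes]

    # 1) Remove points that repeat the previous (x, y) coordinate.
    strokes = [s[np.insert(np.any(np.diff(s, axis=0) != 0, axis=1), 0, True)]
               for s in strokes]

    # 2) Normalize y to [0, 1] over the whole expression, scaling x identically
    #    so the width-height aspect ratio is preserved.
    all_pts = np.vstack(strokes)
    y_min, y_max = all_pts[:, 1].min(), all_pts[:, 1].max()
    x_min = all_pts[:, 0].min()
    scale = (y_max - y_min) or 1.0
    strokes = [np.column_stack([(s[:, 0] - x_min) / scale,
                                (s[:, 1] - y_min) / scale]) for s in strokes]

    # 3) Smooth: replace interior points by the mean of previous, current, next.
    smoothed = []
    for s in strokes:
        t = s.copy()
        if len(s) > 2:
            t[1:-1] = (s[:-2] + s[1:-1] + s[2:]) / 3.0
        smoothed.append(t)

    # 4) Resample: linearly interpolate extra points between consecutive samples.
    resampled = []
    for s in smoothed:
        pts = [s[0]]
        for a, b in zip(s[:-1], s[1:]):
            for k in range(1, points_between + 1):
                pts.append(a + (b - a) * k / (points_between + 1))
            pts.append(b)
        resampled.append(np.array(pts))
    return resampled
```

Rendering the resampled strokes into a 200-pixel-high binary image is then a matter of scaling the normalized coordinates and marking the visited pixels.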

Segmentation Algorithm. Our algorithm is fairly simple:

1) Construct a stroke LOS graph using Algorithm 1.
2) Use a binary classifier to classify all directed edges as * (merge into symbol) or _ (undefined).
3) Define symbols by converting connected components for * labels into cliques; pass the graph from Step 2 as both inputs for Algorithm 2.

We create a random forest for classification in Step 2, using the Python scikit-learn library [14]. We use 129 features, which are described below. These include the distance in time between strokes ('time gap,' e.g., the first and third handwritten strokes have a time gap of two), Parzen window-modified Shape Context features (PSC) and geometric features.

Parzen Shape Context Features (PSC). A Shape Context characterizes the relative position and density of points (pixels) in an image around a given point using a log-polar histogram [11]. Shape Contexts have been widely used in computer vision for shape matching and classification, as these local representations of appearance and context are often globally discriminative.

In our work, we use Shape Contexts to characterize the density of points (pixels) in an expression image around two strokes being considered for merging. First, we produce smoother probability distributions with Parzen window estimation, using a 2D Gaussian kernel. Second, the shape context region is divided into bins using uniform angles and distances from the center of the histogram. Previously, it was found that features using equal rather than the conventional log-polar bin distances perform better when classifying spatial relationships in formulae [15]. The center of the PSC is the average of the two stroke bounding box centers. The radius of the shape context includes the strokes being compared. Note that for distant strokes, the polar histogram may cover the entire expression.

We use three separate Parzen window Shape Contexts when classifying directed LOS edges as 'merge' or 'split.' We use a separate PSC for each stroke, and then a third PSC for other strokes in the neighborhood of the two strokes. We do this to improve discrimination by clearly separating point sources. We confirmed empirically that using multiple histograms is beneficial. Figure 5 shows an example of Parzen Shape Context features for classifying a directed LOS edge. In this example, we consider an edge from the vertical stroke of '+' (the parent of the edge) to a nearby '1' stroke (the child of the edge). Red represents points from the parent stroke, green points from the child stroke, and blue points from other strokes in the histogram. Color intensity represents the density of each bin. Note that during segmentation, the reverse edge from the '1' to the vertical stroke of the '+' would also be considered.

PSC features produce a simplified image of the region around a pair of strokes, but with the point sources (parent, child, and other strokes) clearly separated. Polar histograms have higher resolution near their center, which we hoped would be beneficial. However, 2D histograms, convolution masks or other abstracted/compressed image representations might be used to similar or better effect.

Figure 5: Example Parzen Shape Contexts (PSCs), showing the expression with the PSC center and perimeter, and separate histograms for the parent stroke, child stroke, and other strokes. A directed edge from the vertical line in '+' to the '1' is considered. Each PSC has 120 bins, with the PSC radius reaching the furthest parent or child stroke point (pixel). In experiments we use only 30 bins (5 distances × 6 angles), with a radius 1.5 times the distance to the furthest parent or child stroke point, capturing more context from other strokes.
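As a rough sketch of the idea (a simplification under our own assumptions, not the authors' implementation), a single Parzen-windowed polar histogram can be computed by spreading each point's mass over uniformly spaced angle/distance bins with a 2D Gaussian kernel evaluated at the bin centers; the kernel width and the use of bin centers are illustrative choices here.

```python
# Simplified Parzen-windowed shape context over one set of points.
import numpy as np


def parzen_shape_context(points, center, radius, n_dist=5, n_angle=6, window=None):
    """points: (N, 2) array; center: (2,) array; returns an (n_dist, n_angle)
    normalized histogram with uniform (not log-polar) distance bins."""
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    window = window if window is not None else radius / 5.0   # Parzen kernel width

    # Bin centers in polar coordinates around the histogram center.
    d_centers = (np.arange(n_dist) + 0.5) * radius / n_dist
    a_centers = (np.arange(n_angle) + 0.5) * 2 * np.pi / n_angle
    dd, aa = np.meshgrid(d_centers, a_centers, indexing='ij')
    bin_xy = center + np.stack([dd * np.cos(aa), dd * np.sin(aa)], axis=-1)

    # Accumulate Gaussian kernel responses from every point at every bin center.
    hist = np.zeros((n_dist, n_angle))
    for p in points:
        sq_dist = ((bin_xy - p) ** 2).sum(axis=-1)
        hist += np.exp(-sq_dist / (2.0 * window ** 2))

    total = hist.sum()
    return hist / total if total > 0 else hist
```

Three such histograms (parent stroke, child stroke, other strokes) would be flattened and combined with the time-gap and geometric features to form the feature vector for a directed edge.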
Geometric Features. We also use geometric features from previous work on classifying relationships between stroke pairs, including horizontal distance, size difference and vertical offset [16]; minimum point distance [12]; overlapping area [17]; minimum distance, horizontal overlapping of the bounding box, distance and offset between stroke start and end points, and finally backward movement and parallelity [18]. Parallelity is the angle between two vectors representing strokes, with the vectors defined by the first and last points of each stroke.

We also add some additional geometric features. These include the distance between bounding box centers, distance between centers-of-mass, maximal point pair distance (the two points are from different strokes of the stroke pair), horizontal offset between the last point of the first stroke and the starting point of the second stroke, vertical distance between bounding box centers, writing slope (the angle between the horizontal and the line connecting the last point of the current stroke and the first point of the next stroke) and writing curvature (the angle between the lines defined by the first and last points of each stroke). We normalize all geometric features to lie in the interval [0, 1] except for parallelity, writing slope and writing curvature.
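A few of the geometric features described above can be sketched as follows (an illustration with hypothetical function and key names, not the authors' feature extractor), for an ordered stroke pair (s, t):

```python
# Sketch of selected geometric features for a directed stroke pair.
import math
import numpy as np


def geometric_features(s, t):
    s, t = np.asarray(s, float), np.asarray(t, float)
    feats = {}

    bc_s = (s.min(axis=0) + s.max(axis=0)) / 2.0            # bounding box centers
    bc_t = (t.min(axis=0) + t.max(axis=0)) / 2.0
    feats['bbox_center_distance'] = float(np.linalg.norm(bc_s - bc_t))
    feats['bbox_center_vertical_distance'] = float(abs(bc_s[1] - bc_t[1]))
    feats['center_of_mass_distance'] = float(np.linalg.norm(s.mean(0) - t.mean(0)))

    # Maximal distance over point pairs drawn from different strokes.
    feats['max_point_pair_distance'] = float(
        np.sqrt(((s[:, None, :] - t[None, :, :]) ** 2).sum(-1)).max())

    # Horizontal offset between the end of the first stroke and start of the second.
    feats['horizontal_offset'] = float(t[0, 0] - s[-1, 0])

    # Writing slope: angle between the horizontal and the line from the last point
    # of the current stroke to the first point of the next stroke.
    dx, dy = t[0] - s[-1]
    feats['writing_slope'] = math.atan2(dy, dx)

    # Parallelity: angle between the first-to-last-point vectors of the two strokes.
    v1, v2 = s[-1] - s[0], t[-1] - t[0]
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    feats['parallelity'] = (float(math.acos(np.clip(np.dot(v1, v2) / denom, -1, 1)))
                            if denom > 0 else 0.0)
    return feats
```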
Training. Using CROHME training data, we used greedy search and cross-validation with random forest classifiers to determine the parameters of the PSC features [9]. We choose six angles and five distances for the polar histograms (30 bins), and a Parzen window width of 1/5 of the shape context radius. The shape context radius itself is 1.5 times the longest distance between stroke points to the center of the histogram. We also used the CROHME training data to create our random forest merge/split classifier [9]. We use a random forest with 50 trees, with a maximum depth of 40 for each decision tree. The Gini criterion was used for splitting. There are n = 129 features, and √n features (11) are selected to define candidate splits at each decision tree node.
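In scikit-learn, the classifier configuration described above corresponds to the following minimal setup (the feature matrix and CROHME data loading are assumed to exist elsewhere; variable names are illustrative):

```python
# Random forest merge/split classifier with the settings described in the text:
# 50 trees, maximum depth 40, Gini splitting, sqrt(n) candidate features per split.
from sklearn.ensemble import RandomForestClassifier

merge_classifier = RandomForestClassifier(
    n_estimators=50,       # 50 decision trees
    max_depth=40,          # maximum depth per tree
    criterion='gini',      # Gini criterion for splits
    max_features='sqrt',   # roughly sqrt(129) = 11 features tried at each split
)

# X: (num_directed_LOS_edges, 129) feature matrix; y: 1 for merge ('*'), 0 otherwise.
# merge_classifier.fit(X_train, y_train)
# predictions = merge_classifier.predict(X_test)
```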
Experimental Results. The classification rate for merging or splitting stroke pairs is quite high: we obtain 98.26% for CROHME 2012, and 97.88% for CROHME 2014. Table III shows our segmenter obtaining the second-highest reported symbol recall for CROHME 2012 (94.87%) and CROHME 2014 (92.41%), while obtaining the highest F-score for CROHME 2014 (92.43%).

Note that we obtain these results without using OCR or an expression grammar. Many of the systems shown are parser-driven, and use classification and relationship constraints (i.e., context) to refine segmentation. We believe that language constraints would improve our results substantially.

Table III: Symbol Segmentation Metrics for CROHME 2012 Test (only Recall reported [19]) and CROHME 2014 Test.

CROHME 2012 Test (486 Expressions)
                       Recall (%)   Precision (%)   F-score (%)
MacLean et al. [20]       95.56
LOS + PSC                 94.87         94.56          94.72
Alvaro [21]               91.95
Awal et al. [22]          87.75
Hu et al. [23]            87.51
Simistira et al.          71.21
Celik et al. [24]         59.20

CROHME 2014 Test (986 Expressions)
                       Recall (%)   Precision (%)   F-score (%)
Alvaro [25]               93.31         90.72          92.00
LOS + PSC                 92.41         92.45          92.43
Awal et al. [26]          89.43         86.13          87.75
Yao and Wang [27]         88.23         84.20          86.17
Hu et al. [3]             85.52         86.09          85.80
Le et al. [28]            83.05         85.36          84.19
Aguilar [29]              76.63         80.28          78.41

LOS + PSC: Line-of-Sight Graph using Parzen Shape Contexts with a Random Forest Classifier.

VI. CONCLUSION

We propose a Line-of-Sight (LOS) stroke graph that is able to represent more formulae than Time-series, Minimum Spanning Tree, Delaunay and k-NN graphs. For the CROHME 2012 and 2014 Test sets, LOS graphs omit fewer than 0.1% of necessary directed stroke pair edges, with a Precision of roughly 30%. We have used LOS graphs to create a symbol segmenter making use of Parzen window-modified Shape Context features (PSC) that obtains state-of-the-art results for the CROHME 2014 Test set (92.43% F-measure) without using OCR or expression grammars. In other work, LOS graphs have been used to obtain surprisingly strong results for parsing handwritten formulae using primarily visual features [10].

Avenues for future work include exploring modified versions of LOS graphs (e.g., relaxing the notion of 'visibility' by allowing strokes to be partially transparent), exploring new graphs and combinations of graph types, incorporating classification and language constraints with our segmenter, and improving Parzen Shape Context features.

ACKNOWLEDGMENT

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1016815. We thank Francisco Álvaro for providing code to convert CROHME stroke data to images.

REFERENCES

[1] R. Zanibbi and D. Blostein, "Recognition and retrieval of mathematical expressions," IJDAR, vol. 15, no. 4, pp. 331–357, 2012.
[2] H. Mouchère, R. Zanibbi, U. Garain, and C. Viard-Gaudin, "Advancing the state of the art for handwritten math recognition: The CROHME competitions, 2011–2014," IJDAR, pp. 1–17, 2016.
[3] L. Hu and R. Zanibbi, "Segmenting handwritten math symbols using AdaBoost and multi-scale shape context features," in Proc. ICDAR, Aug. 2013, pp. 1212–1216.
[4] S. Smithies, K. Novins, and J. Arvo, "A handwriting-based equation editor," in International Conference on Graphics Interface, 1999, pp. 84–91.
[5] Y. Eto and M. Suzuki, "Mathematical formula recognition using virtual link network," in Proc. ICDAR, Sep. 2001, pp. 762–767.
[6] N. Matsakis, "Recognition of handwritten mathematical expressions," Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, May 1999.
[7] N. S. T. Hirata and W. Y. Honda, "Automatic labeling of handwritten mathematical symbols via expression matching," in International Conference on Graph-Based Representations in Pattern Recognition, May 2011, pp. 295–304.
[8] M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars, Computational Geometry: Algorithms and Applications, 3rd ed. Springer-Verlag TELOS, 2008.
[9] L. Hu, "Features and algorithms for visual parsing of handwritten mathematical expressions," Ph.D. dissertation, Rochester Institute of Technology, 2016.
[10] L. Hu and R. Zanibbi, "MST-based visual parsing of online handwritten mathematical expressions," in Proc. ICFHR, 2016.
[11] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," TPAMI, vol. 24, no. 4, pp. 509–522, 2002.
[12] K. Toyozumi, N. Yamada, T. Kitasaka, K. Mori, Y. Suenaga, K. Mase, and T. Takahashi, "A study of symbol segmentation method for handwritten mathematical formula recognition using mathematical structure information," in Proc. ICPR, Aug. 2004, pp. 630–633.
[13] A. Kosmala and G. Rigoll, "On-line handwritten formula recognition using statistical methods," in Proc. ICPR, Aug. 1998, pp. 1306–1308.
[14] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," J. Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[15] F. Alvaro and R. Zanibbi, "A shape-based layout descriptor for classifying spatial relationships in handwritten math," in Proc. ACM DocEng, Sep. 2013, pp. 123–126.
[16] Y. Shi, H. Li, and F. Soong, "A unified framework for symbol segmentation and recognition of handwritten mathematical expressions," in Proc. ICDAR, Sep. 2007, pp. 854–858.
[17] S. MacLean and G. Labahn, "A new approach for recognizing handwritten mathematics using relational grammars and fuzzy sets," IJDAR, vol. 16, no. 2, pp. 139–163, 2013.
[18] S. Lehmberg, H.-J. Winkler, and M. Lang, "A soft-decision approach for symbol segmentation within handwritten mathematical expressions," in International Conference on Acoustics, Speech, and Signal Processing, May 1996, pp. 3434–3437.
[19] H. Mouchère, C. Viard-Gaudin, D. H. Kim, J. H. Kim, and U. Garain, "ICFHR 2012 competition on recognition of on-line mathematical expressions (CROHME 2012)," in Proc. ICFHR, Sep. 2012, pp. 811–816.
[20] S. MacLean and G. Labahn, "A Bayesian model for recognizing handwritten mathematical expressions," Pattern Recognition, vol. 48, no. 8, pp. 2433–2445, 2015.
[21] F. Alvaro, J.-A. Sanchez, and J. Benedi, "Recognition of printed mathematical expressions using two-dimensional stochastic context-free grammars," in Proc. ICDAR, Sep. 2011, pp. 1225–1229.
[22] A.-M. Awal, H. Mouchère, and C. Viard-Gaudin, "Towards handwritten mathematical expression recognition," in Proc. ICDAR, 2009, pp. 1046–1050.
[23] L. Hu, K. Hart, R. Pospesel, and R. Zanibbi, "Baseline extraction-driven parsing of handwritten mathematical expressions," in Proc. ICPR, Nov. 2012, pp. 326–330.
[24] M. Celik and B. Yanikoglu, "Probabilistic mathematical formula recognition using a 2D context-free graph grammar," in Proc. ICDAR, Sep. 2011, pp. 161–166.
[25] F. Alvaro, J. Sanchez, and J. Benedi, "Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models," Pattern Recognition Letters, vol. 35, pp. 58–67, 2014.
[26] A. Awal, H. Mouchère, and C. Viard-Gaudin, "A global learning approach for an online handwritten mathematical expression recognition system," Pattern Recognition Letters, vol. 35, pp. 68–77, 2014.
[27] H. Mouchère, C. Viard-Gaudin, R. Zanibbi, and U. Garain, "ICFHR 2014 competition on recognition of on-line handwritten mathematical expressions (CROHME 2014)," in Proc. ICFHR, Sep. 2014, pp. 791–796.
[28] A. D. Le, T. V. Phan, and M. Nakagawa, "A system for recognizing online handwritten mathematical expressions and improvement of structure analysis," in Proc. DAS, Apr. 2014, pp. 51–55.
[29] F. Julca-Aguilar, N. Hirata, C. Viard-Gaudin, H. Mouchère, and S. Medjkoune, "Mathematical symbol hypothesis recognition with rejection option," in Proc. ICFHR, Sep. 2014, pp. 500–505.
