MathFoR: The Mathematical Formula Recognition System
Abstract. We describe MathFoR, a system for the recognition of on-line handwritten mathematical expressions. The system consists of two main components. The first component is a complete set of Java libraries that handles and recognizes digital ink. The second component is the layout analyzer, which translates the recognized ink into a tree structure in XML format that can be transformed into another language using XSLT. We also present examples of final top-level applications developed with our system. Keywords: Graphics Recognition, Technical Drawings, Complete Systems.
Introduction
The introduction of devices such as personal digital assistants (PDAs) and Tablet PCs has stimulated a growing interest in developing pen-computing applications in recent years. Such devices use the stylus as the input tool, replacing keyboard and mouse and acting as a natural extension of pen and paper, the most widely used means of collecting information. Storing handwritten information in electronic form eases its processing and understanding, in particular with artificial intelligence techniques. Handwriting Recognition (HWR) is an important technology that converts handwriting into text or other data structures that can be automatically processed by computers. An example of these technologies is the stroke alphabet Graffiti, the most widely used method for symbol recognition in PDAs. HWR can also be used to develop methods for user-computer interaction and interfaces, by recognizing gestures that trigger operating system commands. Most handwritten information in pen computing is, however, a two-dimensional composite of symbols and drawings with rich structure, which cannot simply be interpreted and recognized as plain text. In particular, the recognition of mathematical notation such as formulas, matrices and diagrams is a complex and difficult task that has gained attention in the AI community in recent years. Recognition of such two-dimensional structures has potential applications in scientific document processing, human-computer interaction and mathematical knowledge management, to mention only some areas.
1.1 Existing Architectures
The development of pen-computing applications has led vendors and researchers to design software architectures and to implement libraries that handle digital ink. The following paragraphs give a short overview of such architectures.

General-Purpose Architectures. Microsoft, for instance, released the Tablet PC version of the Windows operating system, which provides a Software Development Kit (SDK) for processing, storing, and recognizing digital ink. These libraries are specialized towards the recognition of letters and words of Western and East Asian languages, allowing more natural writing than Graffiti. One of the SDK's most attractive characteristics is that the recognizers are integrated natively into the operating system. Unfortunately, some researchers have found it difficult to extend the recognizers and to define new ones within the SDK. Madhvanath et al. [3] offer, in contrast, a general-purpose open-source toolkit for on-line handwriting recognition. The aim of the LipiTk toolkit is to facilitate the development of new recognizers and their use in real-world applications. The main components of the toolkit are a generic library to handle digital ink, two algorithms for symbol recognition, and tools to collect and annotate digital ink and to train classifiers. LipiTk runs on both Linux and Windows, and most of the programs are written in standard C++. The toolkit is a very ambitious project that also aims to define standard interfaces and communication protocols for the exchange of digital ink among different platforms and devices.

Specialized Architectures for Mathematical Notation. Most of the working systems and prototype programs for the recognition of pen-based mathematics follow, in essence, the steps proposed by Lee and Wang [2]. Although their system is specialized for off-line mathematical equations, i.e. expressions in scanned documents, almost all of the procedures can be used as well for the recognition of pen-based mathematics.
We only have to substitute an Ink Input step for the Optical Scanning step in their flow diagram. Their procedural framework can be divided into three main modules:
1. Document Segmentation: Mathematical expressions are extracted and isolated from text lines.
2. Symbol Recognition: The label of each symbol is established by means of a classifier.
3. Structural Analysis: The structure of the recognized symbols is analyzed to form a hierarchical structure that represents a mathematical expression.
The internal hierarchical structure is processed and interpreted to obtain a final result. This result can be a character string that represents the expression in LaTeX, another structure used by a computer algebra system, or an image representing the plot of a function given by the expression, among other structures.
Differences between the recognition systems found in the literature arise from variations and improvements of these modules. Smirnova and Watt [4] propose an architectural framework for pen-based mathematics from the perspective of software engineering. Their target deployment is document processing and mathematical computing. Their objective is to define a platform-independent pen-based framework for mathematical entry, editing and calculation. These objectives lead them to define two portability criteria for the framework, which consider the software life and usability on a given platform, and the software life across platforms. The modules of their framework that vary depending on the platform are 1) the basic software to collect digital ink, 2) the low-level processing module that interconnects the components that process digital ink and manipulate mathematical objects, and 3) the use of the framework in some hosting application.
MathFoR is a system for the recognition of on-line handwritten mathematical expressions. Even though our system was not intended, at least at the very beginning of our research, to be a general-purpose toolkit for pen-based systems, our experience developing MathFoR led us to regard the whole system under the general concept represented in Fig. 1(a). MathFoR's current structure aims to build top-level applications on top of two main components: one handling the raw data and the other analyzing the document structure. Such an application is a bridge between the main components. The main components hide a third component that implements the required classification algorithms. It remains, in general but not necessarily, hidden from the top-level application. The whole architecture is based on an execution platform, which can be the operating system or a virtual machine. Our concrete implementation of the architecture uses the Java virtual machine by Sun Microsystems as the execution platform and the Weka library [8] for the classification component, see Fig. 1(b). Java allows users to run the system on a wide variety of operating systems, including Windows, Linux, MacOS and Sun OS. Weka contains a collection of machine learning algorithms that can be included directly in Java code. We developed a pair of specialized toolkits, JInk and MathFoR, as concrete implementations of the raw data handling and the structural analysis components. Both are described in the next section.
3
3.1
Processing and interpreting digital stylus input imposes special requirements on raw data representation, visualization and integration into applications. Even though available general-purpose vector graphics frameworks offer a complete infrastructure for vector graphics, they do not meet the special needs of digital ink (e.g. runtime guarantees or data representation). Additionally, most of them do not integrate seamlessly into custom projects, let alone being lightweight and easy to use. Forcing the use of an improper or inconvenient framework, or of quick-and-dirty custom GUI code, therefore constrains and slows down development in small experimental projects and in applications evolving out of current research. In contrast to existing frameworks, JInk is a Digital Ink Toolkit for Java that emerged from the accumulated working practice of the Workgroup for Artificial Intelligence on Project MathFoR. The library has been especially designed to defuse the most significant difficulties in dealing with digital ink, so that forthcoming projects can be built upon a solid foundation. In short, the toolkit keeps all needed runtime guarantees without sacrificing versatility or an application-quality look and feel. JInk provides:
1. Straight and uncorrupted live-input recording: Input recording responsiveness and processing speed do not depend on input length, document size or visualization complexity. Stroke insertion and deletion, visibility calculations, and painting are local access operations that have O(1) execution time in practice, no matter how large the document is.
2. Ease of use and flexibility: JInk consists of 100% Java code and is entirely based on Sun's Java2D foundation and the Swing GUI toolkit (J2SDK 1.3 and up, or J2SDK 1.5 when using generics). JInk can be integrated into every Java AWT or Swing-based application or applet. All GUI classes consistently follow the Swing design patterns, so that every developer who is used to the Swing API can work with JInk without reading a manual. JInk is lightweight (less than 100k binary package size) and does not introduce any additional dependencies on heavyweight frameworks.
3. Especially designed for concurrent and deferred processing: Most applications of digital ink interpretation perform complex computations on recorded input concurrently or distributed among (remote) workers in order not to harm user interface responsiveness. This practice usually introduces many data synchronization difficulties and pitfalls. With JInk, the raw data exchange with concurrent or distributed workers is inherently thread-safe without any use of thread synchronization, and without passing the need for thread synchronization on to the application-level code: the data representation virtually mimics the copy-on-write paradigm used by Java Strings. Modifications of ink strokes do not alter the strokes themselves, because a virtual copy is created instead. Therefore, referenced data cannot lose integrity.

The JInk framework itself can be divided into two main packages. The core package contains everything needed for the actual digital ink data representation. The central component Ink encapsulates the raw data and also provides functions for the most common modification operations. All those operations are carefully designed and stick to guaranteed execution cost bounds. Additionally, a sequence of modifications is executed only at the moment of access. Such collected operations are aggregated and, if possible, conflated. This minimizes the overall execution cost of batch data processing and also optimizes the overall memory footprint and the data storage or transfer volumes, since only the executed operations need to be stored or transferred instead of the modified data. Finally, the Ink component fully integrates into Java's GUI API and can be used directly as a drawing primitive in Sun's Java2D. The Editor package contains GUI components that can be assembled into an end-user ink editor. The editor displays and edits arbitrarily large ink documents and provides familiar input and drawing methods, including true rubber erase, selection and transformation, and unconstrained undo and redo.

3.2 Structural Analysis Toolkit: MathFoR
The layout analysis of mathematical notation operates on recognized symbols and acts independently of the original raw input, because we do not rely directly on the stylus information. This approach assumes that the raw data is represented as node elements containing information about their identity and spatial location within the abstract document. The layout analyzer takes a list of nodes and, depending on their spatial relationships, constructs a baseline structure tree that represents the mathematical expression, see Fig. 2.
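The node representation and a proximity-based way of relating nodes can be sketched as follows. This is a minimal illustration and not MathFoR's own code: the class names `SymbolNode` and `SymbolMst`, the use of bounding-box centers, and plain Euclidean distance as edge weight are all assumptions. The Recognition paragraph of this section names a Minimum Spanning Tree over the symbols as the underlying construction; the actual algorithms additionally use symbol-specific information that is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical recognized symbol: a label plus its bounding box. */
final class SymbolNode {
    final String label;
    final double minX, minY, maxX, maxY;

    SymbolNode(String label, double minX, double minY, double maxX, double maxY) {
        this.label = label;
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    double centerX() { return (minX + maxX) / 2.0; }
    double centerY() { return (minY + maxY) / 2.0; }

    /** Euclidean distance between bounding-box centers (an assumed edge weight). */
    double distanceTo(SymbolNode other) {
        return Math.hypot(centerX() - other.centerX(), centerY() - other.centerY());
    }
}

final class SymbolMst {
    /** Prim's algorithm on the complete graph; edges are {parent, child} index pairs. */
    static List<int[]> build(List<SymbolNode> nodes) {
        int n = nodes.size();
        List<int[]> edges = new ArrayList<>();
        if (n == 0) return edges;
        boolean[] inTree = new boolean[n];
        double[] best = new double[n];   // cheapest known connection to the tree
        int[] parent = new int[n];       // tree node realizing that connection
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        Arrays.fill(parent, -1);
        best[0] = 0.0;
        for (int step = 0; step < n; step++) {
            // Pick the node outside the tree that is cheapest to attach.
            int next = -1;
            for (int i = 0; i < n; i++) {
                if (!inTree[i] && (next == -1 || best[i] < best[next])) next = i;
            }
            inTree[next] = true;
            if (parent[next] >= 0) edges.add(new int[] {parent[next], next});
            // Update the attachment cost of the remaining nodes.
            for (int i = 0; i < n; i++) {
                double d = nodes.get(next).distanceTo(nodes.get(i));
                if (!inTree[i] && d < best[i]) {
                    best[i] = d;
                    parent[i] = next;
                }
            }
        }
        return edges;
    }
}
```

For three symbols such as an x, a small 2 placed at its upper right and a + further to the right, the tree links the 2 to the x and the + to the 2, because those are the shortest center-to-center connections.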
Fig. 2. A commutative diagram recognized by MathFoR. The baseline structure tree is shown at the left of the component.

Symbols. MathFoR defines symbols and their relationships in an XML configuration file that consists of three main parts: symbol classes, orientations and the symbols themselves. A symbol class defines a grammar of the sensitive regions of a symbol and the type of relation between these regions. In our standard grammar, for instance, we expect that an element of the variable class may have superscripted or subscripted components. An element of the left parenthesis class has neither a superscripted nor a subscripted component. The orientation part defines some spatial regions of symbols. This information contains the ascent and descent thresholds of the symbols and the left and right border space in the bounding box. Finally, the symbol part lists concrete symbols, each associated with a class and an orientation, together with the associated ambiguities that MathFoR can resolve. The information about ambiguities helps the system to distinguish, for example, between a capital sigma and the sum operator. The configuration file allows the use of different symbol sets and grammars under a common layout analyzer. Users can change symbol sets and grammars without modifying the MathFoR core classes or their program code. The orientation part in particular depends on the user and the input method: if an OCR system is used for the recognition of symbols, irregularities in writing are not a concern, so the configuration can be more restrictive, which increases the recognition rate in this case.
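To make the three-part configuration more tangible, the following sketch reads a small configuration with the standard Java DOM parser. The XML layout is hypothetical: the element names (`config`, `class`, `orientation`, `symbol`), the attribute names, and the class `SymbolConfig` are assumptions chosen for illustration, since the actual MathFoR file format is not given in the text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class SymbolConfig {
    // Hypothetical configuration: symbol classes, orientations, concrete symbols.
    static final String SAMPLE =
        "<config>"
        + "<class name='variable' superscript='true' subscript='true'/>"
        + "<class name='leftParenthesis' superscript='false' subscript='false'/>"
        + "<orientation name='central' ascent='0.3' descent='0.7'/>"
        + "<symbol name='x' class='variable' orientation='central'/>"
        + "<symbol name='(' class='leftParenthesis' orientation='central'/>"
        + "<symbol name='Sigma' class='variable' orientation='central' ambiguousWith='sum'/>"
        + "</config>";

    private final Document doc;

    SymbolConfig(String xml) {
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid configuration", e);
        }
    }

    /** Returns the first element with the given tag and name attribute, or null. */
    private Element find(String tag, String name) {
        NodeList list = doc.getElementsByTagName(tag);
        for (int i = 0; i < list.getLength(); i++) {
            Element e = (Element) list.item(i);
            if (name.equals(e.getAttribute("name"))) return e;
        }
        return null;
    }

    /** Looks up the symbol's class and asks whether it admits a superscript region. */
    boolean allowsSuperscript(String symbol) {
        Element s = find("symbol", symbol);
        if (s == null) return false;
        Element c = find("class", s.getAttribute("class"));
        return c != null && Boolean.parseBoolean(c.getAttribute("superscript"));
    }

    /** Name of the symbol this one may be confused with; empty if none is declared. */
    String ambiguousWith(String symbol) {
        Element s = find("symbol", symbol);
        return s == null ? "" : s.getAttribute("ambiguousWith");
    }
}
```

Because the grammar lives in the file, swapping in another symbol set only means passing a different XML string to the constructor; the lookup code stays unchanged.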
Recognition. Recognition of mathematical expressions uses algorithms developed previously by our work group [5, 6, 7]. The algorithms use a Minimum Spanning Tree (MST) construction, taking the symbols as nodes of a fully connected graph, combined with information about the typical usage of the symbols, as given in the aforementioned configuration file. Mathematical structures with tabular layout, like matrices or commutative diagrams, require, however, special treatment and special assumptions, which led us to develop a new algorithm to recognize such structures. One assumption about matrices is that they occur only between parentheses. Therefore, when MathFoR recognizes a pair of opening and closing brackets, it switches to matrix mode and tries to recognize the matrix, assuming that the matrix components are mainly cells aligned along a grid. Commutative diagrams consist, on an abstract level, of cells and arrows, where each cell is connected to another by at least one arrow. This representation is inspired by several LaTeX packages that commonly use a matrix format to describe commutative diagrams. The recognition of matrices uses projection methods to find their cells, and the recognition of diagrams is a two-step method that uses an optimization criterion [9]. The first step constructs an initial grouping of the raw symbols in the diagram, using the arrow ends in combination with the MST approach, see Fig. 3(a)-(b). Using the MST, each symbol is assigned to the start point, the end point or the label of an arrow. Symbols that are connected to a start or end point are grouped with their neighbors to construct an initial group of cells in the diagram. The second step finds the final cells and the grid structure of the diagram. To simplify this task, we assume that the cells are aligned in a grid structure, and that the rows and columns of the grid are parallel to the axes of the screen coordinate system. These assumptions allow us to define a cost function for a given grid, transforming our layout analysis into a minimization problem. We found in our experiments that a simple hill climbing algorithm is sufficient to find, in most cases, the correct diagram, see Fig. 3(c)-(d). The cost function is a linear combination of several numerical features of the grid. First, we consider the number of rows and the height of the highest row.
The minimization of these features ensures that the rows are compact and that there are not too many rows in the grid. The next features are the overlap within rows and the distance between rows, which help to overcome some irregularities in the input. If the overlap increases, then the grid cost decreases. The minimal overlap of groups in the same row is used to split a row, even when the two resulting rows are close to each other. If the distance between two adjacent rows is too small, then they are merged. Analogous features are used to create and destroy columns within the grid.

Output. The output is a structure tree that can be used directly by a Java program through the classes defined in MathFoR. The structure tree also has a representation as an XML document, which allows the user to transform the tree into a custom format using XSLT. At present, we use a set of XSLT transformations to convert the structure tree into mathematical expressions in LaTeX or Mathematica format. The transformations defined in the XSLT file translate the names of the symbols into a proper representation and deal with the quirks of the selected format. In LaTeX, for example, it is necessary to create two objects, \left( and \right., when the original text has only an opening parenthesis.
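The XSLT step can be illustrated with the transformation API that ships with Java (`javax.xml.transform`). The sketch below is an assumption-laden example: the tree vocabulary (`expr`, `sym`, `sup`), the stylesheet and the class `TreeToLatex` are invented for illustration and do not reproduce MathFoR's actual XML schema or its stylesheets.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TreeToLatex {
    // Hypothetical stylesheet: maps symbol names to LaTeX and superscripts to ^{...}.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='sym'>"
        + "<xsl:choose>"
        + "<xsl:when test=\"@name='plus'\">+</xsl:when>"
        + "<xsl:when test=\"@name='alpha'\">\\alpha</xsl:when>"
        + "<xsl:otherwise><xsl:value-of select='@name'/></xsl:otherwise>"
        + "</xsl:choose>"
        + "<xsl:apply-templates/>"
        + "</xsl:template>"
        + "<xsl:template match='sup'>^{<xsl:apply-templates/>}</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical structure tree for "x squared plus alpha".
    static final String SAMPLE_TREE =
        "<expr><sym name='x'><sup><sym name='2'/></sup></sym>"
        + "<sym name='plus'/><sym name='alpha'/></expr>";

    /** Applies the stylesheet to the XML tree and returns the generated text. */
    static String convert(String treeXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(treeXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

Supporting a further output format then amounts to supplying a different stylesheet, while the Java code and the structure tree stay the same. Format-specific quirks, such as the \left( and \right. pairing mentioned above, would be handled by additional templates.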
Fig. 3. (a) The initial grouping collects symbols that belong to the same cell into different groups, which leads to (b) an incorrect conversion of the diagram. (c) After optimization, the final grid is found, which leads to (d) the correct interpretation of the diagram.
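The optimization illustrated in Fig. 3 can be sketched for rows only. This is a simplified illustration under stated assumptions, not the MathFoR algorithm: the class `RowGridSearch`, the weights, the restriction to two of the features (number of rows and height of the tallest row) and the single move type (merging two adjacent rows) are all chosen for brevity. Each group is reduced to its vertical extent {top, bottom}.

```java
import java.util.ArrayList;
import java.util.List;

final class RowGridSearch {
    // Assumed weights of the linear combination; not taken from the paper.
    static final double W_ROWS = 10.0;
    static final double W_HEIGHT = 1.0;

    /** Cost = W_ROWS * (number of rows) + W_HEIGHT * (height of the tallest row). */
    static double cost(List<double[]> rows) {
        double tallest = 0.0;
        for (double[] r : rows) tallest = Math.max(tallest, r[1] - r[0]);
        return W_ROWS * rows.size() + W_HEIGHT * tallest;
    }

    /**
     * Hill climbing: start with one row per group and repeatedly apply the
     * adjacent-row merge that lowers the cost most, until no merge improves it.
     */
    static List<double[]> optimize(List<double[]> groups) {
        List<double[]> rows = new ArrayList<>();
        for (double[] g : groups) rows.add(g.clone());
        rows.sort((a, b) -> Double.compare(a[0], b[0]));
        while (true) {
            double bestCost = cost(rows);
            List<double[]> best = null;
            for (int i = 0; i + 1 < rows.size(); i++) {
                // Candidate grid: rows i and i+1 replaced by their union.
                List<double[]> candidate = new ArrayList<>(rows);
                double[] a = candidate.remove(i);
                double[] b = candidate.remove(i);
                candidate.add(i, new double[] {Math.min(a[0], b[0]), Math.max(a[1], b[1])});
                double c = cost(candidate);
                if (c < bestCost) {
                    bestCost = c;
                    best = candidate;
                }
            }
            if (best == null) return rows;  // local minimum reached
            rows = best;
        }
    }
}
```

With four groups spanning 0 to 10, 2 to 12, 40 to 50 and 41 to 52, the search merges the first two and the last two groups and then stops, because merging the remaining two rows would produce one very tall row whose height outweighs the saving of a row.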
Queries on the Structure Tree. Accessing and extracting nodes in the structure tree were quite simple operations in the first versions of MathFoR, because we accessed only direct child nodes and discarded the nodes that we did not need. Such operations became, however, complicated and confusing during the recognition of commutative diagrams, because we accessed not only direct children but also grandchildren under some special constraints. For that reason, we decided to implement a query mechanism that allows us to build a query on the structure tree in advance and to execute it later as many times as we need. This reduces the code complexity and improves readability.
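A query object of this kind might look as follows. The sketch only illustrates the idea of preparing a query once and executing it repeatedly; the classes `TreeNode` and `TreeQuery` and their methods are hypothetical and are not MathFoR's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical structure-tree node with a name and ordered children. */
final class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, TreeNode... kids) {
        this.name = name;
        for (TreeNode k : kids) children.add(k);
    }
}

/** A query is a list of per-level filters that is built once and run many times. */
final class TreeQuery {
    private final List<Predicate<TreeNode>> steps = new ArrayList<>();

    /** Adds one level: descend to the children that satisfy the filter. */
    TreeQuery children(Predicate<TreeNode> filter) {
        steps.add(filter);
        return this;
    }

    /** Runs the prepared steps from the given root; the query itself is not modified. */
    List<TreeNode> execute(TreeNode root) {
        List<TreeNode> current = new ArrayList<>();
        current.add(root);
        for (Predicate<TreeNode> step : steps) {
            List<TreeNode> next = new ArrayList<>();
            for (TreeNode node : current) {
                for (TreeNode child : node.children) {
                    if (step.test(child)) next.add(child);
                }
            }
            current = next;
        }
        return current;
    }
}
```

A grandchild lookup with constraints then reads as one chained expression, for example `new TreeQuery().children(n -> n.name.equals("cell")).children(n -> n.name.equals("arrow"))`, and the same query object can be executed against every diagram that is recognized.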
JEdit Plug-in. As a demonstrator of the MathFoR system, we developed a JEdit plug-in that embeds the recognizer in an editor used to write LaTeX manuscripts. Figure 4(a) shows a view of the LaTeX editor in the background after the handwriting recognizer plug-in has been started. The idea is that when a person working with LaTeX needs to enter a complex formula, they start the plug-in by pressing a function key on the keyboard. The function key opens a window where the user can draw the formula using the mouse or a digitizing tablet connected to the computer. The digital ink is then translated into LaTeX code that the editor immediately incorporates into the document. Here we are thinking of proficient LaTeX users, who usually enter small expressions directly using the keyboard. Thus, the handwriting recognition plug-in will be called only for larger formulas, where handwriting input is more convenient. The advantage of using a standard LaTeX editor for embedding the handwriting recognizer is that information about the variables and processed symbols is immediately available. The special syntax of LaTeX tells the system exactly which variables are being used. In future research, the LaTeX editor will allow us to use context information for on-line improvement of symbol recognition, which can be accomplished more easily than when considering only the digital ink document without the context.

WebFoR. Another demonstrator for the MathFoR system, and specifically for the capabilities of the JInk library, is WebFoR, a web-based version of the JEdit LaTeX plug-in, see Fig. 4(b). The WebFoR application design splits the MathFoR system into a notepad applet, which is integrated into a website and executed within the client-side web browser, and a server-side ink recognition daemon, which distributes all incoming recognition requests among a set of workers. The client-side notepad allows a user to input a formula as in the JEdit plug-in. All input and modifications are transmitted to the ink recognition daemon and evaluated. The result is then shown within the website.
Fig. 5. (a) Main view of the classification wizard. (b) The classifier dialog.
This split design keeps the client-side user interface responsive and small enough for a website, while the server-side daemon encapsulates the heavyweight symbol classifier and the formula recognition system. In this way, a full-scale recognition system can be used, since there is no need to cut down the recognition component in order to fit it into an applet.

Character Classification Wizard. The main purpose of the Character Classification Wizard is to offer a comfortable and easy way to enter and manage sets of handwritten symbols, which will be used as training samples to build new classifiers. The classification wizard is based on the JInk classes, see Fig. 5. One can use the wizard to create projects that organize the training samples into user-defined alphabets. Such a structure makes it possible to select a certain subset of the whole database to train a classifier. These alphabets contain sets of isolated characters, each of which is managed internally as an instance of the InkGroup class. Selected characters are used to train default and custom classifiers. The wizard offers two default classifiers, k-nearest neighbors and multilayer perceptron, which are included in the Weka library. We chose the Weka library because it already implements a wide variety of classification algorithms and is freely available. Custom classifiers can be created simply by implementing Weka's Classifier interface. The output of the training process is an instance of the InkClassifier class. This is just a convenience class that contains the trained Weka classifier. InkClassifier encapsulates the digital-ink preprocessing and feature extraction algorithms needed to produce the input for the Weka classifier. The default preprocessing settings can be adjusted in the training dialog or, again, custom preprocessing algorithms can be implemented by using the given Preprocessor interface. Object serialization is used to store an InkClassifier object. This eases the inclusion of the classifier in an external application that tries to classify a new InkGroup without worrying about a reimplementation of preprocessing and data conversion.
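The benefit of serializing the classifier together with its preprocessing can be shown with standard Java object serialization. The sketch below deliberately does not use Weka or the real InkClassifier: `NearestPrototypeClassifier` is a hypothetical stand-in with a trivial nearest-prototype rule and a length normalization as its only preprocessing step, so that the example stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for a trained classifier bundled with its preprocessing. */
final class NearestPrototypeClassifier implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String[] labels;
    private final double[][] prototypes;

    NearestPrototypeClassifier(String[] labels, double[][] prototypes) {
        this.labels = labels.clone();
        this.prototypes = prototypes.clone();
    }

    /** Preprocessing: scale the feature vector to unit length. */
    private static double[] normalize(double[] v) {
        double norm = 0.0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) out[i] = norm == 0.0 ? 0.0 : v[i] / norm;
        return out;
    }

    /** Applies the preprocessing, then returns the label of the closest prototype. */
    String classify(double[] features) {
        double[] f = normalize(features);
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < prototypes.length; i++) {
            double[] p = normalize(prototypes[i]);
            double dist = 0.0;
            for (int j = 0; j < f.length; j++) dist += (f[j] - p[j]) * (f[j] - p[j]);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return labels[best];
    }

    /** Serializes the whole object, preprocessing included, to a byte array. */
    byte[] toBytes() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Restores a classifier that was stored with toBytes(). */
    static NearestPrototypeClassifier fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (NearestPrototypeClassifier) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An application that loads the stored bytes only calls `classify` on raw features; it never needs to know which preprocessing was applied during training, which is the convenience the text attributes to InkClassifier.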
UNIPEN Reader and Viewer. To make sure that classifiers deliver the best possible results, it is necessary to train them with as much data as possible. For this reason we use the UNIPEN database [1], the de facto standard database of on-line handwriting for benchmarking classifiers. This database contains a collection of on-line handwritten symbols, collected by 40 different institutions and companies around the world. The UNIPEN data format is very flexible and allows the user to store a broad variety of symbols, since the writer has the freedom to choose the hardware and the organization of the data into metadata files. A typical UNIPEN data file consists of sets of strokes organized as segments, which can be handwritten paragraphs, words or characters. The file can contain several .INCLUDE keywords that indicate the path to the actual stroke data that define the segments. The main function of the viewer is, of course, the visualization of the UNIPEN data and its conversion into our own internal format. The viewer uses the metadata file to collect the information needed to find the right coordinate sequence for a symbol in the database. The coordinates are normalized and used to create instances of the InkGroup class that are used by our classification wizard or other algorithms. See Fig. 6.
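Reading stroke coordinates and normalizing them can be sketched as follows. The example handles only a small, simplified subset of the UNIPEN format: strokes delimited by .PEN_DOWN and .PEN_UP with one "x y" pair per line. It ignores .INCLUDE, .SEGMENT and all other keywords, and the class `UnipenStrokes` as well as the chosen normalization (scaling the larger side of the bounding box to length 1) are assumptions, not the method used by the viewer.

```java
import java.util.ArrayList;
import java.util.List;

final class UnipenStrokes {
    /** Reads strokes from a simplified UNIPEN fragment (.PEN_DOWN ... .PEN_UP blocks). */
    static List<List<double[]>> parse(String text) {
        List<List<double[]>> strokes = new ArrayList<>();
        List<double[]> current = null;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.equals(".PEN_DOWN")) {
                current = new ArrayList<>();
                strokes.add(current);
            } else if (line.startsWith(".")) {
                current = null;  // .PEN_UP and every other keyword end the stroke
            } else if (current != null && !line.isEmpty()) {
                String[] parts = line.split("\\s+");
                current.add(new double[] {
                    Double.parseDouble(parts[0]), Double.parseDouble(parts[1])});
            }
        }
        return strokes;
    }

    /** Scales all points in place so that the larger bounding-box side has length 1. */
    static void normalize(List<List<double[]>> strokes) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (List<double[]> stroke : strokes) {
            for (double[] p : stroke) {
                minX = Math.min(minX, p[0]);
                maxX = Math.max(maxX, p[0]);
                minY = Math.min(minY, p[1]);
                maxY = Math.max(maxY, p[1]);
            }
        }
        double scale = Math.max(maxX - minX, maxY - minY);
        if (!(scale > 0)) return;  // no points, or all points coincide
        for (List<double[]> stroke : strokes) {
            for (double[] p : stroke) {
                p[0] = (p[0] - minX) / scale;
                p[1] = (p[1] - minY) / scale;
            }
        }
    }
}
```

Using one common scale factor for both axes keeps the aspect ratio of the symbol, which matters for distinguishing shapes such as a minus sign and a vertical bar.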
Status
The first version of the MathFoR system was developed between 2004 and 2005. Some modifications and extensions were made in 2006 to include the recognition of matrices. This version of the system was included as an intelligent tool within the Electronic Chalkboard (E-Chalk) [6]. The treatment of digital ink became a toolkit of its own and has been developed separately since the beginning of 2007. That allowed us to rewrite the layout analysis libraries from scratch, including the separate definition of layout parameters in XML files and the use of XSLT to convert the final recognition result into several formats. The whole set of libraries has been developed by researchers in our work group and, currently, by hard-working students of our university. The current version has been used only internally to develop the tools and applications described in the last section. We plan to release a beta version of the libraries at the beginning of October under an open source license. Students of our university will use our library during the next semester in the pattern recognition courses and other specialized seminars related to the recognition of on-line handwriting and pen-based interfaces.
MathFoR uses the Java Virtual Machine as its execution platform, taking advantage of the fact that the virtual machine runs on a wide variety of operating systems. The main components of the system are the libraries JInk and MathFoR. They handle digital ink and analyze the spatial relations between the recognized symbols. A top-level application is a bridge between both packages. It is also a concrete implementation of several abstract classes and interfaces defined in JInk and MathFoR. JInk is a library based completely on the component architecture of the Java AWT. For this reason, experienced Java programmers can easily integrate digital ink documents into GUI applications. Digital ink is defined as an interface that has a concrete implementation as a GUI component object, offering all the advantages of the component patterns defined in Java AWT: handling mouse events, drawing, serialization, affine transformations, etc. The library also offers an editor for digital ink that developers can easily extend to include their own processing algorithms and recognizers. The structural analysis is based on previous research in our work group. A new library for the structural analysis has been implemented from scratch, offering a better ordering and organization of the classes that describe the data used by the algorithms. The XML representation of the data and the recognized structures allows a flexible translation, via XSLT, of the recognized expressions into several formatting and programming languages. We currently have translators into the LaTeX language. Our libraries have been growing horizontally depending on the needs of the system. We have ongoing projects for the storage and retrieval of digital ink data, the annotation of digital ink, word recognition, and communication using a client-server architecture for the recognition of digital ink.
References
1. I. Guyon, L. Schomaker, R. Plamondon, M. Liberman, and S. Janet. UNIPEN Project of On-Line Data Exchange and Recognizer Benchmarks. Pattern Recognition, 1994. Vol. 2-Conference B: Proceedings of the 12th IAPR International Conference on Computer Vision & Image Processing, 2, 1994.
2. H.J. Lee and J.S. Wang. Design of a mathematical expression understanding system. Pattern Recognition Letters, 18(3):289-298, 1997.
3. S. Madhvanath, D. Vijayasenan, and T.M. Kadiresan. LipiTk: A Generic Toolkit for Online Handwriting Recognition. In Proceedings of the Tenth International Workshop on Frontiers in Handwriting Recognition (IWFHR-10), 2006.
4. E. Smirnova and S. Watt. A context for pen-based mathematical computing. Proceedings of the 2005 Maple Summer Conference, 2005.
5. E. Tapia and R. Rojas. Recognition of On-Line Handwritten Mathematical Expressions using a Minimum Spanning Tree Construction and Symbol Dominance. Fifth IAPR International Workshop on Graphics Recognition (GREC), 2003.
6. E. Tapia and R. Rojas. Recognition of On-Line Handwritten Mathematical Formulas in the E-Chalk System. Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR), 2003.
7. E. Tapia and R. Rojas. Recognition of On-Line Handwritten Mathematical Formulas in the E-Chalk System - An Extension. Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR), 2005.
8. I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 1999.
9. M. Ye, H. Sutanto, S. Raghupathy, C. Li, and M. Shilman. Grouping Text Lines in Freeform Handwritten Notes. Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR), 2005.