The GF Mathematical Grammar Library: From OpenMath to Natural Languages
1 Introduction
Since 2005 we have been developing a software library for rendering, reading, and translating
mathematical expressions, expressed either in formal languages such as OpenMath or LaTeX
or in a number of natural languages. The work began with the WebALT [1] project as a way to
serve mathematical exercises in the native language of the student: the library can be
used to generate natural language descriptions of formally encoded mathematical expressions
with no loss of meaning. The applications of this technology, which comes from the area of
grammar-based machine translation, are related to the possibility of parsing and generating
high-quality representations of mathematics.
In this short paper we concentrate on a few technical details that made the work interesting
from the linguistic point of view. We first introduce the computational linguistics software
used as the backbone of the work, called Grammatical Framework, and then present
the mathematical library, its organization and its modular design. We then discuss some
examples that required careful thought.
The Mathematical Grammar Library Caprotti and Saludes
the absolute value of the sum of x and y is less than the sum of the absolute value
of x and the absolute value of y
As mentioned above, the abstract tree is not far from the OpenMath expression. The
linguistic function mkProp wraps the wording produced by the subexpressions. In terms of
computational linguistic technology, this approach differs from standard statistics-based
approaches, such as Google Translate, in that it can generate high-quality translations for
arbitrarily deep nesting of subexpressions, as opposed to being limited by n-gram distance.
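The compositional idea can be illustrated with a minimal Python sketch. It is not the library's implementation: GF builds phrases through its resource grammars rather than string templates, and the names TEMPLATES, linearize, abs, plus and lt are hypothetical, chosen only for this illustration. The sketch shows that rendering recurses over the tree, so the nesting depth does not affect the quality of the wording.

```python
# Hypothetical English templates, one per abstract function (assumed names).
TEMPLATES = {
    "abs": "the absolute value of {0}",
    "plus": "the sum of {0} and {1}",
    "lt": "{0} is less than {1}",
}


def linearize(tree):
    """Render a nested tuple (function, arg, ...) or a variable name as English."""
    if isinstance(tree, str):
        return tree  # a variable is rendered as its own name
    head, *args = tree
    # Render the arguments first, then wrap them in the wording of the head.
    return TEMPLATES[head].format(*(linearize(arg) for arg in args))


# |x + y| < |x| + |y| as an abstract tree.
triangle = (
    "lt",
    ("abs", ("plus", "x", "y")),
    ("plus", ("abs", "x"), ("abs", "y")),
)
sentence = linearize(triangle)
print(sentence)
```

Under these assumptions, the printed sentence is the English rendering quoted above.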
The number of categories in a GF application is a trade-off between how much ambiguity
is tolerable and the expressiveness of the whole system. The categories defined in the MGL
are Value X and Variable X, where X is a Number, a Function, a Set or a Tensor (namely
vectors or matrices). The current version of the library implements these by defining a fixed
category for each combination {Variable, Value} × {Number, Set, Function, Tensor}. Thus, for
instance, VarNum = Variable Number and ValSet = Value Set. Other categories stand for
propositions, geometric constructions and indexes.
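The product of kinds and domains can be enumerated in a short Python sketch. Only the names VarNum and ValSet come from the text; the abbreviations Fun and Ten, and therefore the other six category names, are assumptions made for this illustration and may differ from the names used in the library.

```python
from itertools import product

# Abbreviations: Var, Val, Num and Set follow the names VarNum and ValSet;
# Fun and Ten are assumed for illustration only.
KINDS = {"Variable": "Var", "Value": "Val"}
DOMAINS = {"Number": "Num", "Set": "Set", "Function": "Fun", "Tensor": "Ten"}

# One fixed category per (kind, domain) combination.
CATEGORIES = {
    kind_abbr + domain_abbr: (kind, domain)
    for (kind, kind_abbr), (domain, domain_abbr) in product(
        KINDS.items(), DOMAINS.items()
    )
}

for name, (kind, domain) in CATEGORIES.items():
    print(f"{name} = {kind} {domain}")
```

The sketch yields eight categories, which makes the trade-off concrete: each added domain costs two more fixed categories.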
Each abstract category corresponds to a linguistic category in the concrete grammar of a
specific language. Usually a Value maps to a noun phrase and a Variable to a string. More
complex expressions, those combining categories, correspond in a natural way to linguistic
entities composed from these elements: propositions are mapped to clauses with grammatical
polarity, operations to sentences and simple exercises to texts.
2. OpenMath: modeled after the following Content Dictionaries, considered useful for
expressing the mathematical fragments at the time of the WebALT project:
2 Notice the special form of the conjunction “x e y”: the usual Spanish conjunction “y” must be changed for
euphony before a vowel that sounds alike. The GF Spanish resource grammar takes care of this automatically.
3 Linguistic peculiarities
Some interesting points of the implementation concern language specifics. For example,
the simple exercise that asks for computing a numeric value 3 :
DoComputeN ComputeV ( determinant ( Var2Tensor M ) )
gives in English:
Compute the determinant of M .
This pattern is shared by most of the languages, so it was abstracted into an incomplete concrete
grammar file OperationsI. From this module, one can get OperationsL for a language L simply
by specifying the lexicon and paradigms modules for L, much as a function is applied
to its arguments. In French, however, it is impolite to use an imperative in this case; therefore the
module OperationsFre has to re-implement this production in a specific way.
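The analogy with function application can be sketched in Python. This is only an illustration of the pattern, not GF code: make_operations, the lexicon key "compute" and the override mechanism are hypothetical, and the French infinitive wording is an assumption, since the text does not say how OperationsFre phrases the exercise.

```python
def make_operations(lexicon, overrides=None):
    """Instantiate the shared productions for one language.

    `lexicon` plays the role of the lexicon and paradigms modules;
    `overrides` replaces individual productions, as a language-specific
    module would.
    """
    def do_compute(noun_phrase):
        # Shared pattern: an imperative verb followed by its object.
        return f"{lexicon['compute']} {noun_phrase}."

    productions = {"DoComputeN": do_compute}
    productions.update(overrides or {})
    return productions


# English uses the shared pattern unchanged.
operations_eng = make_operations({"compute": "Compute"})

# Assumed French wording: an infinitive replaces the imperative.
operations_fre = make_operations(
    {"compute": "Calcule"},
    overrides={"DoComputeN": lambda noun_phrase: f"Calculer {noun_phrase}."},
)

english = operations_eng["DoComputeN"]("the determinant of M")
french = operations_fre["DoComputeN"]("le déterminant de M")
print(english)
print(french)
```

Keeping the shared pattern in one place means that a language needs its own code only where it departs from the common structure.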
Another point worth mentioning is function application. Notice the different forms:
• “the cosine of 3”
• “f at 3”
• “the derivative of the sine at 3”
• “x to the cosine of x where x is 3”
They are all mathematically equivalent but differ in structure: in the first case, the function
being applied is a named symbol (the cosine), while in the last one it is a λ-abstraction. In the
other cases, it is a function variable or it comes from a functional operator.
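A minimal Python sketch of this case distinction follows. The class names Named, FunVar, Operator and Lambda and the function apply_at are hypothetical; the sketch only shows how the wording of an application can depend on the structure of the function being applied, not how the library encodes it.

```python
from dataclasses import dataclass


@dataclass
class Named:
    """A named symbol, such as the cosine."""
    noun: str


@dataclass
class FunVar:
    """A function variable, such as f."""
    name: str


@dataclass
class Operator:
    """A function produced by a functional operator, such as a derivative."""
    phrase: str


@dataclass
class Lambda:
    """A lambda-abstraction: the bound variable and the wording of its body."""
    var: str
    body: str


def apply_at(fun, arg):
    """Word the application of `fun` to `arg` according to the kind of `fun`."""
    if isinstance(fun, Named):
        return f"the {fun.noun} of {arg}"
    if isinstance(fun, FunVar):
        return f"{fun.name} at {arg}"
    if isinstance(fun, Operator):
        return f"{fun.phrase} at {arg}"
    if isinstance(fun, Lambda):
        return f"{fun.body} where {fun.var} is {arg}"
    raise TypeError(f"unknown function form: {fun!r}")


forms = [
    apply_at(Named("cosine"), "3"),
    apply_at(FunVar("f"), "3"),
    apply_at(Operator("the derivative of the sine"), "3"),
    apply_at(Lambda("x", "x to the cosine of x"), "3"),
]
print("\n".join(forms))
```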
3 determinant belongs to the OpenMath layer of the library and Var2Tensor makes a value out of a variable.
DoComputeN denotes an exercise asking to compute a number, while ComputeV gives finer control on which verb
to use to denote computation (‘to compute’ in this case).
OpenMath                        GF
------------------------------  --------------------------------------------------
Symbol name in CD               name in module CD
Integer n                       n converted to Value from predefined type Int in
                                module Literals
Variable name                   name in category Variable X
Application of a on b           ab
Binding λz app                  lambda z app, where z is a Variable and app a Value
Attribution, Error, Bytearray   Not supported

Table 1: Some equivalences between the OpenMath standard and the grammar library
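The equivalences in Table 1 can be illustrated by a small Python converter from OpenMath XML to a tree written by juxtaposition. It is a sketch under several assumptions: the XML carries no namespace, a binding has a single bound variable, the module qualification of symbols and the category coercions of the actual library are omitted, and the names to_tree and paren are hypothetical.

```python
import xml.etree.ElementTree as ET

# OpenMath XML elements for attribution, error and byte array.
UNSUPPORTED = {"OMATTR", "OME", "OMB"}


def paren(text):
    """Parenthesize a compound subtree so that juxtaposition stays unambiguous."""
    return f"({text})" if " " in text else text


def to_tree(node):
    """Translate one OpenMath XML element into a juxtaposition-style tree."""
    if node.tag in UNSUPPORTED:
        raise ValueError(f"{node.tag} is not supported")
    if node.tag in ("OMS", "OMV"):
        return node.get("name")  # symbol or variable: use its name
    if node.tag == "OMI":
        return node.text.strip()  # integer literal
    if node.tag == "OMA":
        head, *args = [to_tree(child) for child in node]
        return " ".join([head] + [paren(arg) for arg in args])
    if node.tag == "OMBIND":
        binder, bound, body = list(node)
        variable = bound[0].get("name")  # assumes one bound variable
        return f"{to_tree(binder)} {variable} {paren(to_tree(body))}"
    raise ValueError(f"unexpected element {node.tag}")


application = ET.fromstring(
    '<OMA><OMS cd="arith1" name="abs"/>'
    '<OMA><OMS cd="arith1" name="plus"/><OMV name="x"/><OMV name="y"/></OMA>'
    "</OMA>"
)
binding = ET.fromstring(
    '<OMBIND><OMS cd="fns1" name="lambda"/>'
    '<OMBVAR><OMV name="z"/></OMBVAR>'
    '<OMA><OMS cd="transc1" name="cos"/><OMV name="z"/></OMA>'
    "</OMBIND>"
)
application_tree = to_tree(application)
binding_tree = to_tree(binding)
print(application_tree)
print(binding_tree)
```

Rejecting the unsupported elements explicitly, rather than skipping them, keeps the translation from silently changing the meaning of an expression.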
References
[1] http://webalt.math.helsinki.fi/content/index_eng.html Last viewed June 2012.
[2] http://www.sagemath.org/ Last viewed May 2012.
[3] A. Ranta, “Grammatical Framework: Programming with Multilingual Grammars,” CSLI
Studies in Computational Linguistics, Stanford, 2011.
[4] J. H. Davenport, “A small OpenMath type system,” ACM SIGSAM Bulletin, vol. 34,
no. 2, pp. 16–21, Jun. 2000.
[5] OpenMath Consortium, “The OpenMath Standard,” OpenMath Deliverable, vol. 1, 2000.
[6] The MOLTO project. http://www.molto-project.eu/ Last viewed May 2012.
[7] The mathematical library. svn://molto-project.eu/mgl.
[8] J. Saludes et al., “Simple drill grammar library,”
http://www.molto-project.eu/sites/default/files/d61.pdf Last viewed May 2012.
[9] Grammatical Framework demos. http://www.grammaticalframework.org/demos/index.html
Last viewed May 2012.
[10] M. Ganesalingam, “The Language of Mathematics,” PhD thesis, Cambridge University,
2009.
[11] A. Ranta, “Translating between language and logic: what is easy and what is difficult,”
Automated Deduction, CADE-23, 2011.
[12] D. Archambault, O. Caprotti, A. Ranta and J. Saludes, “Using GF
in multimodal assistants for mathematics,” Digitization and E-Inclusion in Mathematics
and Science 2012, Tokyo, Japan.