An Information-Processing Account of Representation Change: International Mathematical Olympiad Problems Are Hard Not Only For Humans
Takuya Matsuzaki ([email protected])
Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8603, JAPAN
Suppose that x and y are real numbers. Find the range of a such that there exists x satisfying x^2 + ax + 1 < 0.

∃x ∈ R (x^2 + ax + 1 < 0) ⇔ a^2 − 4 > 0 ⇔ a < −2 ∨ a > 2

Figure 1: Problem solving and quantifier-elimination

[Figure: a pipeline with the stages Sentence Processing (Zero Anaphora Detection, Syntactic Parsing, Semantic Composition), Discourse Processing, Primary Representation, Representation Change, Secondary Representation, and Search / Reasoning (Constraint Satisfaction, Quantifier Elimination, Theorem proving), with the outcomes “Failure in search / reasoning” and “Success” leading to the Solution. Running examples in the figure: “ABC is a right triangle with ∠ABC = 90°. The length of the hypotenuse (of φ) is 3.”; the CCG derivation of “AB is a diagonal of R,” which yields S: ∃x(diag(x, R) ∧ seg(A, B) = x); and ∀x ∈ R (x^2 + ax + 1 > 0), which reduces to −2 < a < 2.]

Figure 2: End-to-end problem solving model

Preliminaries

Let us first redefine what we mean by “mathematical problem solving.” A math problem is usually expressed as a combination of sentences, formulas, and figures. In principle, it can be expressed as a logical formula in a theory. A theory consists of a set of symbols called a language and a set of axioms. A language consists of constants, variables, relations, functions, and logical symbols. Constants and variables are terms, and f(t_1, ..., t_n) is also a term if f is an n-ary function symbol and all t_i are terms. R(t_1, ..., t_n) is an atomic formula if R is an n-ary relation symbol and all t_i are terms. For example, 2x + 1 = y and x > y + z are atomic formulas in arithmetic.

In first-order mathematical logic, formulas are defined recursively from atomic formulas and logical symbols. In classical logic, we have seven connectives: ∧ (and), ∨ (or), ¬ (not), → (implies), ↔ (if and only if), ∀ (for all), and ∃ (there exists). The last two connectives, ∀ and ∃, are called quantifiers. When a variable is quantified, it is called a bound variable. For example, the variable x is bound in the formula ∃x(f(a) = x), though a remains free (not bound). A formula containing no free variable is called a sentence. A formula containing no quantifier is called quantifier-free. The set of seven connectives is known to be complete in the sense that any mathematical assertion can be expressed as a first-order formula, provided that an appropriate language and a set of axioms are given.

A mathematical problem is solved when we find a formal procedure to show that the problem is equivalent to a quantifier-free formula of the simplest form. Fig. 1 gives examples. Specifically, we say that a problem is proved when we show that a given problem is equivalent to True.

A theory is called decidable when there is an algorithm to determine whether any sentence is true or not. Gödel’s incompleteness theorem shows that any theory containing Peano Arithmetic is undecidable. Propositional logic, the theory of real closed fields (RCF), and Presburger Arithmetic are rare exceptions that are known to be decidable. However, the computational complexity of the decision procedures is quite high. The theoretical lower bound of the decision procedure for propositional logic is superpolynomial in the size of input formulas assuming that P ≠ NP, and those for RCF and Presburger Arithmetic are doubly exponential (Tarski, 1951; Fischer & Rabin, 1974). These lower bounds reflect the phenomenon of search space explosion.
An End-to-end Math Problem Solving Model

Fig. 2 presents an overview of our problem solving model. It consists of three modules. The perception module translates a problem into a primary representation expressed in Zermelo-Fraenkel set theory (ZF) by language processing. The formulation module transforms the primary representation to another formula in ZF that is interpretable in a local theory such as RCF. Finally, the search/reasoning module works on the secondary representation. Once a failure is detected in the reasoning, the process backtracks to the formulation module and seeks another problem representation that makes the reasoning easier. The rest of this section provides more details on the three modules.

Perception Module

The perception module is organized along a hierarchy in natural language: words, sentences, and discourses (i.e., sequences of sentences). The lexical processing unit identifies the parts-of-speech and other syntactic properties of the words and math formulas in a problem. Since math formulas have their own grammar, they are analyzed by a specialized parser. Fig. 2 provides an example in which the same formula y = x^2 has different syntactic roles, noun phrase (NP) and embedded sentence (S), in accordance with its context.

The sentence processing unit translates each sentence in the problem into a formal representation. We assume a grammar-driven translation model here, which composes the semantic representation of a sentence along its syntactic structure (Carpenter, 1997; Heim & Kratzer, 1998). Specifically, we developed a Japanese grammar in the formalism of Combinatory Categorial Grammar (CCG) (Steedman, 2001). Fig. 2 depicts the process of semantic composition with CCG for the sentence “AB is a diagonal of R.”

We need to detect omissions (zero pronouns) in the text before the semantic composition. Our current implementation detects them using a list of words and their syntactic arguments (i.e., case frames). Fig. 2 provides an example where an omission (“of φ,” where φ, a zero pronoun, stands for something) is detected as the argument of ‘hypotenuse.’

The discourse processing unit combines the sentence-level semantic representations into a single formula. We adopt the discourse representation theory (Kamp & Reyle, 1993) as the basic mechanism of the inter-sentential composition. Fig. 2 depicts an example where the semantic representations of three sentences are combined into one with the two connectives ∧ and →, and two universal quantifications (∀m∀n). The discourse processing unit also determines the antecedents of the anaphoric expressions including zero pronouns.

Formulation Module

The formulation module receives a primary problem representation and transforms it into a secondary representation that is amenable to reasoning. The process consists of two steps. One is the transformation of the higher-order formulas produced by the perception module to first-order formulas in ZF. The other is the transformation of the ZF formulas to those interpretable in a local theory, such as RCF and propositional logic. The former is usually a routine procedure for a person with the necessary math knowledge. The latter requires a target theory to be chosen beforehand. In the experiments, we chose RCF as the target local theory and confirmed that many pre-university math problems can be mechanically reformulated in RCF. This suggests that, once an appropriate local theory is chosen, the reformulation can be modeled as a heuristic search that seeks a formula in the local theory that is equivalent to the primary representation.

What is missing in our prototype implementation is a mechanism to choose an appropriate local theory. Our hypothesis is that this choice is the key ability in the representation change, which truly requires ‘insight.’ Our information-processing model thus serves as a test bed for computational models of insight problem solving by plugging a theory-choice model into it. The rest of this section summarizes the implementation of the two reformulation steps and elucidates the contribution of the problem solving model.

Higher-order to first-order transformation The primary representation often includes higher-order elements (λ-abstractions), which denote functions (e.g., λx.x^2) and conditions (e.g., λy.(|y| < 1)). They are necessary to translate natural language expressions such as “The function that maps x ∈ R to x^2” and “The absolute value of y is less than 1. The same condition also applies to x.” We eliminate such higher-order elements to obtain a first-order formula. In the current implementation, this is done by iteratively applying a handful of transformation rules such as β-reduction and variable elimination by substitution (∃x(x = α ∧ φ(x)) ⇔ φ(α)).

Reformulation in RCF In the prototype implementation, a primary representation is rewritten into the language of RCF. The first-order language of RCF consists of polynomial equations and inequalities, logical connectives, and quantifiers. We developed a set of axioms that define various math concepts in the (higher-order) language of RCF, such as:

∀x∀f (minimize(x, f) ↔ ∀x′ (f(x) ≤ f(x′))).

The primary representation is iteratively rewritten with these axioms until an equivalent formula is found in the first-order language of RCF. There is no theoretical guarantee that such a formula will eventually be found even when it exists. We empirically examined how often it succeeds in the experiments.

Where in the process does insight come? The vocabulary of a problem usually tells us in which theory it should be solved. However, this is not always the case. For instance, the wording in the mutilated draughtboard problem does not suggest that it should be formulated in arithmetic rather than in propositional logic. Human solvers thus usually start by searching for the solution in propositional logic, putting dominoes on the board in a trial-and-error manner. It is inevitable to change
Table 1: Subject areas of the benchmark problems

                  Ex   Univ   IMO
Algebra            0     10    21
Linear Algebra    14     62     0
Geometry          81     65    94
Pre-calculus       0     75     0
Calculus           6     33     0
total            101    245   115

Table 2: Overall benchmark results

          Solved                             Failed
Problem   Solved   Time (sec)                FM     TO     WR
Source    (%)      min / med / avg / max     (%)    (%)    (%)
Ex        75.2     1 / 4 / 20 / 1069          7.9   13.9    3.0
Univ      65.3     1 / 7 / 38 / 1061          7.3   22.9    4.5
IMO       26.1     2 / 10 / 56 / 513         10.4   60.0    3.5