An Information-Processing Account of Representation Change: International Mathematical Olympiad Problems Are Hard Not Only For Humans

The document presents a new information-processing model for mathematical problem solving that incorporates representation change theory. It divides the problem representation process into translating problem texts into formulas in Zermelo-Fraenkel set theory and then interpreting those formulas in local mathematical theories. This allows representation change to be implemented as choosing an appropriate interpretation. The document develops a prototype system using real closed fields theory and benchmark problems to suggest this model can quantitatively study representation change by how well the system solves problems.
An Information-Processing Account of Representation Change: International Mathematical Olympiad Problems are Hard not only for Humans

Takuya Matsuzaki
Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8603, JAPAN

Munehiro Kobayashi
University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8571, JAPAN

Noriko H. Arai
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8634, JAPAN

Abstract

In this paper, we present a new information-processing model of math problem solving in which representation change theory can be implemented. Specifically, we divide the problem representation process into two steps. One is to straightforwardly translate problem texts into formulas in a conservative extension of Zermelo-Fraenkel's set theory, and the other is to interpret the translated formulas in local mathematical theories. A ZF formula has several interpretations, and representation change is thus implementable as a choice of an appropriate interpretation. Adopting the theory of real closed fields as an example of a local theory and its quantifier elimination algorithms as an approximate process of searching for solutions, we develop a prototype system. We use more than 400 problems from three sources as benchmarks: exercise books, university entrance examinations, and the International Mathematical Olympiad problems. Our experimental results suggest that our model can serve as a basis of a quantitative study on representation change in the sense that the performance of our prototype system reflects the difficulties of the problems quite precisely.

Keywords: problem solving; information-processing model; insight; representation change

Introduction

Some math problems are much more difficult than others to solve even though they do not require higher levels of mathematical knowledge or techniques. The nine-dot problem and the mutilated draughtboard problem are examples of such problems. Where does the difficulty come from?

In classical information-processing models, the difficulty of a given problem is explained by its computational complexity: the cost of search (Kaplan & Simon, 1990; MacGregor, Ormerod, & Chronicle, 2001). In contrast, Gestalt psychologists explain the phenomena in terms of insight (Isaak & Just, 1995; Ohlsson, 1992). A problem is called an insight problem when solving it requires a key feature of the problem to be recognized or restructured (representation change).

One of the major criticisms of the classical information-processing account is that it has no mechanism to implement representation change since problem solving is understood as a search within a well-defined problem space (Öllinger, Jones, & Knoblich, 2014). If one tries to enlarge the framework (theory) of the problem to implement representation change inside it, then search space explosion is almost always inevitable. For example, it is a well-known fact that almost all mathematical activities can be formalized in Zermelo-Fraenkel's set theory (ZF); thus representation change can be formalized as proof search in ZF. However, we cannot expect the search to terminate in a realistic time since the search space of ZF is too vast. On the other hand, the representation change account also has some downsides. It does not provide any process model, and the analysis remains qualitative rather than quantitative (MacGregor et al., 2001).

In this paper, we present a new information-processing model that enables us to include the representation change account. On the basis of the flow chart of insight problem solving (Öllinger et al., 2014), we first specify the perceptual process as the translation of a given problem into a formula in ZF. We extend the language of ZF so that the translation is kept as straightforward as possible. In other words, we assume that the perceptual process requires no insight but rather corresponds to natural language and image processing. This is worth mentioning since the inputs of the existing information-processing accounts are usually not obtainable without insight regardless of the theories in which the problems are represented (Newell & Simon, 1972; Chou, 1988; Kerber & Pollet, 2006). The obtained ZF formula is considered to be the primary problem representation. There are usually many possible interpretations of the primary problem representation in different local mathematical theories. For example, the mutilated draughtboard problem can be embedded not only into propositional logic but also into Peano Arithmetic and Presburger Arithmetic. The possible interpretations of the primary representation are called secondary representations.

We take the theory of real closed fields (RCF) as an example of a local theory and implement an interpretation process from the primary to the secondary representation. We adopt a quantifier elimination (QE) algorithm as an approximate process of searching for solutions (Iwane, Yanami, & Anai, 2014) and develop a prototype system to solve geometry and introductory calculus problems.

We manually formalize more than 400 math problems from three different sources in our extended ZF language as a benchmark. The problems are translated so that the formalizations could be obtained automatically from the problem text using state-of-the-art natural language processing theories and techniques (Kamp & Reyle, 1993; Steedman, 2001; Zettlemoyer & Collins, 2005). One source of the problems is exercise books, another is university entrance examinations, and
the other is the International Mathematical Olympiad (IMO). Though all the problems require mathematical knowledge and techniques no higher than high-school level, they have different levels of difficulty. We naturally assume that more insight problems can be found in the IMO than in the other two because IMO problems are known to be solvable by only a few mathematically talented students. The highlight of our paper is the experimental results on the benchmark. This is the first paper to report automated problem solving results not only on a few problems or a set of artificial problems but on a large number of real high-school-level problems.

[Figure 2: End-to-end problem solving model. Flow: Problem → Perception (Lexical Processing: formula parsing, POS tagging; Sentence Processing: zero anaphora detection, syntactic parsing, semantic composition; Discourse Processing: coreference resolution, discourse structure analysis) → Primary Representation → Formulation (HOL→FOL transformation, theory choice, embedding into local theories) → Secondary Representation → Search/Reasoning (constraint satisfaction, quantifier elimination, theorem proving) → Solution on success; on failure in search/reasoning, Representation Change.
Examples shown in the figure:
  Formula parsing: "the graph of y = x²" ⇒ the graph of λx.x² /NP; "let y = x²" ⇒ let proposition(y = x²) /S
  POS tagging: "the perimeter of circle O" ⇒ the/DT perimeter/NN of/PP circle/NN O/PN
  Zero anaphora detection and coreference resolution: "ABC is a right triangle with ∠ABC = 90°. The length of the hypotenuse (of φ) is 3."
  Semantic composition with CCG: "AB is a diagonal of R" ⇒ S: ∃x(diag(x, R) ∧ seg(A, B) = x)
  Discourse structure analysis: "Let m, n be natural numbers. Assume that m = n². Prove that (m − n) is even." ⇒ ∀m∀n((m, n ∈ N ∧ m = n²) → even(m − n))
  Theory choice ("When n pigeons sit in n−1 holes, some hole contains more than one pigeon"): |P| = n ∧ |H| = n−1 ∧ S ⊂ P × H ∧ π₁(S) = P → ∃x∃y∃z(x ≠ y ∧ (x, z) ∈ S ∧ (y, z) ∈ S)
    Propositional logic: ⋀_{i=1}^{n} ⋁_{j=1}^{n−1} p_{ij} → ⋁_{1≤i<m≤n} ⋁_{j=1}^{n−1} (p_{ij} ∧ p_{mj})
    Peano Arithmetic: ∀m(m < n → f(m) < n) → ∃i∃j(i < j ≤ n ∧ f(i) = f(j))
  Quantifier elimination: ∀x ∈ R (x² + ax + 1 > 0) ⇒ −2 < a < 2]

Preliminaries

Let us first redefine what we mean by "mathematical problem solving." A math problem is usually expressed as a combination of sentences, formulas, and figures. In principle, it can be expressed as a logical formula in a theory. A theory consists of a set of symbols called a language and a set of axioms. A language consists of constants, variables, relations, functions, and logical symbols. Constants and variables are terms, and f(t1, ..., tn) is also a term if f is an n-ary function symbol and all ti are terms. R(t1, ..., tn) is an atomic formula if R is an n-ary relation symbol and all ti are terms. For example, 2x + 1 = y and x > y + z are atomic formulas in arithmetic. In first-order logic, formulas are defined recursively from atomic formulas and logical symbols. In classical logic, we have seven connectives: ∧ (and), ∨ (or), ¬ (not), → (implies), ↔ (if and only if), ∀ (for all), and ∃ (there exists). The last two connectives, ∀ and ∃, are called quantifiers. When a variable is quantified, it is called a bound variable. For example, the variable x is bound in the formula ∃x(f(a) = x) though a remains free (not bound). A formula containing no free variable is called a sentence. A formula containing no quantifier is called quantifier-free. The set of seven connectives is known to be complete in the sense that any mathematical assertion can be expressed as a first-order formula provided that an appropriate language and a set of axioms are given.

A mathematical problem is solved when we find a formal procedure to show that the problem is equivalent to a quantifier-free formula of the simplest form. Fig. 1 gives examples. Specifically, we say that a problem is proved when we show that a given problem is equivalent to True.

[Figure 1: Problem solving and quantifier-elimination.
  Problem: "Suppose that x and y are real numbers. Find the range of a satisfying x² + ax + 1 > 0 for all x."
    ∀x ∈ R (x² + ax + 1 > 0) ⇔ a² − 4 < 0 ⇔ −2 < a < 2
  Problem: "Suppose that x and y are real numbers. Find the range of a such that there exists x satisfying x² + ax + 1 < 0."
    ∃x ∈ R (x² + ax + 1 < 0) ⇔ a² − 4 > 0 ⇔ a < −2 ∨ a > 2]

A theory is called decidable when there is an algorithm to determine whether any sentence is true or not. Gödel's incompleteness theorem shows that any consistent theory containing Peano Arithmetic is undecidable.

Propositional logic, RCF, and Presburger Arithmetic are rare exceptions that are known to be decidable. However, the computational complexity of the decision procedures is quite high. The theoretical lower bound of the decision procedure for propositional logic is superpolynomial in the size of the input formulas assuming that P ≠ NP, and those for RCF and Presburger Arithmetic are doubly exponential (Tarski, 1951; Fischer & Rabin, 1974). These lower bounds reflect the phenomena of search space explosion.
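As a concrete check of the two equivalences in Fig. 1, the following minimal sketch compares each quantified statement with its quantifier-free counterpart on a grid of rational values of a. It is not a QE algorithm: it assumes the specific quadratic x² + ax + 1 and decides the quantified statements by evaluating the parabola at its vertex x = −a/2. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def forall_x_positive(a):
    """Decide "for all real x, x^2 + a*x + 1 > 0" via the minimum at x = -a/2."""
    x_min = -a / 2
    return x_min * x_min + a * x_min + 1 > 0


def exists_x_negative(a):
    """Decide "there exists real x with x^2 + a*x + 1 < 0" via the same minimum."""
    x_min = -a / 2
    return x_min * x_min + a * x_min + 1 < 0


def qf_forall(a):
    """Quantifier-free form from Fig. 1: -2 < a < 2."""
    return -2 < a < 2


def qf_exists(a):
    """Quantifier-free form from Fig. 1: a < -2 or a > 2."""
    return a < -2 or a > 2


# Exact rational samples of a from -4 to 4 in steps of 1/4 (boundary values included).
samples = [Fraction(k, 4) for k in range(-16, 17)]
agree = all(
    forall_x_positive(a) == qf_forall(a) and exists_x_negative(a) == qf_exists(a)
    for a in samples
)
print(agree)  # True
```

Exact fractions are used so that the boundary cases a = ±2, where the minimum is exactly 0, are not affected by floating-point rounding.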
An End-to-end Math Problem Solving Model

Fig. 2 presents an overview of our problem solving model. It consists of three modules. The perception module translates a problem into a primary representation expressed in ZF by language processing. The formulation module transforms the primary representation to another formula in ZF that is interpretable in a local theory such as RCF. Finally, the search/reasoning module works on the secondary representation. Once a failure is detected in the reasoning, the process backtracks to the formulation module and seeks another problem representation that makes the reasoning easier. The rest of this section provides more details on the three modules.

Perception Module

The perception module is organized along a hierarchy in natural language: words, sentences, and discourses (i.e., sequences of sentences). The lexical processing unit identifies the parts-of-speech and other syntactic properties of the words and math formulas in a problem. Since math formulas have their own grammar, they are analyzed by a specialized parser. Fig. 2 provides an example in which the same formula y = x² has different syntactic roles, noun phrase (NP) and embedded sentence (S), in accordance with its context.

The sentence processing unit translates each sentence in the problem into a formal representation. We assume a grammar-driven translation model here, which composes the semantic representation of a sentence along its syntactic structure (Carpenter, 1997; Heim & Kratzer, 1998). Specifically, we developed a Japanese grammar in the formalism of Combinatory Categorial Grammar (CCG) (Steedman, 2001). Fig. 2 depicts the process of semantic composition with CCG for the sentence "AB is a diagonal of R."

We need to detect omissions (zero pronouns) in the text before the semantic composition. Our current implementation detects them using a list of words and their syntactic arguments (i.e., case frames). Fig. 2 provides an example where an omission ("of φ," where φ, a zero pronoun, stands for something) is detected as the argument of 'hypotenuse.'

The discourse processing unit combines the sentence-level semantic representations into a single formula. We adopt discourse representation theory (Kamp & Reyle, 1993) as the basic mechanism of the inter-sentential composition. Fig. 2 depicts an example where the semantic representations of three sentences are combined into one with the two connectives ∧ and →, and two universal quantifications (∀m∀n). The discourse processing unit also determines the antecedents of the anaphoric expressions including zero pronouns.

Formulation Module

The formulation module receives a primary problem representation and transforms it into a secondary representation that is amenable to reasoning. The process consists of two steps. One is the transformation of the higher-order formulas produced by the perception module to first-order formulas in ZF. The other is the transformation of the ZF formulas to those interpretable in a local theory, such as RCF and propositional logic. The former is usually a routine procedure for a person with the necessary math knowledge. The latter requires a target theory to be chosen beforehand. In the experiments, we chose RCF as the target local theory and confirmed that many pre-university math problems can be mechanically reformulated in RCF. This suggests that, once an appropriate local theory is chosen, the reformulation can be modeled as a heuristic search that seeks a formula in the local theory that is equivalent to the primary representation.

What is missing in our prototype implementation is a mechanism to choose an appropriate local theory. Our hypothesis is that this is the key ability in representation change, the one that truly requires 'insight.' Our information-processing model thus serves as a test bed for computational models of insight problem solving by plugging a theory choice model into it. The rest of this section summarizes the implementation of the two reformulation steps and elucidates the contribution of the problem solving model.

Higher-order to first-order transformation. The primary representation often includes higher-order elements (λ-abstractions), which denote functions (e.g., λx.x²) and conditions (e.g., λy.(|y| < 1)). They are necessary to translate natural language expressions such as "The function that maps x ∈ R to x²" and "The absolute value of y is less than 1. The same condition also applies to x." We eliminate such higher-order elements to obtain a first-order formula. In the current implementation, this is done by iteratively applying a handful of transformation rules such as β-reduction and variable elimination by substitution (∃x(x = α ∧ φ(x)) ⇔ φ(α)).

Reformulation in RCF. In the prototype implementation, a primary representation is rewritten into the language of RCF. The first-order language of RCF consists of polynomial equations and inequalities, logical connectives, and quantifiers. We developed a set of axioms that define various math concepts in the (higher-order) language of RCF, such as:

∀x∀f(minimize(x, f) ↔ ∀x′(f(x) ≤ f(x′))).

The primary representation is iteratively rewritten with these axioms until an equivalent formula is found in the first-order language of RCF. There is no theoretical guarantee that such a formula will eventually be found even when it exists. We empirically examined how often it succeeds in the experiments.

Where in the process does insight come? The vocabulary of a problem usually tells us in which theory it should be solved. However, this is not always the case. For instance, the wording of the mutilated draughtboard problem does not suggest that it should be formulated in arithmetic rather than in propositional logic. Human solvers thus usually start by searching for the solution in propositional logic, putting dominoes on the board in a trial-and-error manner. It is inevitable to change
the representation of the problem to solve it in a realistic time. When and where does representation change happen in the cognitive process? The main contribution of our processing model lies in pinning it down to a specific step in the problem formulation process, namely the theory choice.

Our information-processing model helps discriminate between different kinds of 'insight' problems. The nine-dot problem and the mutilated draughtboard problem have been considered typical insight problems of the same kind. However, the reasons why people have difficulties are different in nature. Failure in solving the nine-dot problem is at least partially due to the ambiguity of the term "line (segment)". Disambiguation of terms is a part of the perception process, but not of the formulation or representation change in our model. In contrast, people fail to solve the mutilated draughtboard problem because they cannot choose an appropriate theory to solve it only from the superficial properties of the problem.

Solution Search/Reasoning

In the current implementation, we adopt a QE algorithm for RCF (Iwane et al., 2014) as an example for solution search. Note that we do not argue that the QE algorithm per se is a model of the human answer-deduction process. We utilize it to approximately measure the difficulty of mathematical reasoning on a given problem representation. The computational cost of the QE algorithm is quite sensitive to the problem representation; its time complexity is doubly exponential in the number of variables in the representation. We regard a long running time of the algorithm as a sign of an impasse in the reasoning, which has been considered a trigger of representation change (e.g., Öllinger et al., 2014). In the experiments, we examined to what extent this failure detection mechanism correctly reflects the difficulty of the problems.

Experimental Procedure

Aim of the Experiment

We developed a prototype implementation of the model described in the previous section. The theory choice process and the representation change mechanism are not yet implemented. The aim of the experiment is to test if we can use the model as a basis for developing a computational model of theory choice and representation change. We thus need to verify that A) the model can solve many non-insight problems, which do not require representation change, and B) the response of the model correlates with the difficulty of the problems. B) means the model is usable to quantify the difficulty of a problem in a certain representation, which is a requirement for a quantitative study on representation change.

Material

We collected more than 400 problems taken from three sources: exercise books (Ex), Japanese university entrance exams (Univ), and International Mathematical Olympiads (IMO). The Ex problems were sampled from a popular exercise book series. The problems in the books are marked with one to five stars in accordance with their difficulty: one to three stars signify textbook exercise level and four and five stars signify university entrance exam level. We sampled approximately the same number of problems from those marked with one, two, and three stars. The Univ problems were taken from the past entrance exams of seven top Japanese national universities. The IMO problems were taken from the past IMOs held from 1959 through 2014.

We examined the problems and exhaustively selected those that can be formulated (by humans) in the theory of RCF. The distinction between RCF and non-RCF problems was made solely on the basis of the essential mathematical content of the problems. The selected problems thus contain problems in several subject areas as shown in Table 1.

Table 1: Subject areas of the benchmark problems

                   Ex   Univ   IMO
Algebra             0     10    21
Linear Algebra     14     62     0
Geometry           81     65    94
Pre-calculus        0     75     0
Calculus            6     33     0
Total             101    245   115

The problems were manually formalized in a higher-order language. Operators, who all majored in computer science and/or mathematics, were trained to translate the problems as faithfully as possible to the original natural language statements following the design of the perception module.

Experimental Results

The prototype system was run on the benchmark problems with a time limit of 3600 s per problem. Table 2 shows the percentage of successfully solved problems; the minimum, median, average, and maximum (wallclock) time spent on solved problems; the percentage of failures in the reformulation of the primary ZF representation in RCF (FM); the percentage of failures due to timeout (TO); and the percentage of wrong answers (WR). Wrong answers were due to bugs in the current implementation.

Table 2: Overall benchmark results

Problem   Solved   Time on solved problems (sec)   Failed
source    (%)      min / med / avg / max           FM (%)   TO (%)   WR (%)
Ex        75.2     1 / 4 / 20 / 1069                 7.9     13.9      3.0
Univ      65.3     1 / 7 / 38 / 1061                 7.3     22.9      4.5
IMO       26.1     2 / 10 / 56 / 513                10.4     60.0      3.5

Overall, the performances on the Ex, Univ, and IMO problems seem to reflect well the inherent differences in their difficulty levels. We conducted a χ²-test on the difference in the rates of success on them. The difference between IMO and the other sources was statistically significant (p < 0.01) though that between Ex and Univ was not (p = 0.09).

We further examined how well the system performance correlates with the fine-grained difficulty level assessed by human experts. Table 3 lists the performance figures for
the Ex problems (one to three stars) and additional problems sampled from those marked with four and five stars in the same exercise books. The overall correlation between the difficulty level and the system performance is clear although the difference in the success rates was statistically significant only between the problems with one star and five stars (p < 0.05, χ²-test).

Table 3: Results for Ex problems by number of stars

       Succeeded                                        Failed
#★     Success %       Time (sec) min/med/avg/max       FM (%)   TO (%)
1      82.4 (28/34)    1 / 4 / 5 / 39                   11.8      5.9
2      73.5 (25/34)    2 / 4 / 6 / 39                    5.9     11.8
3      69.7 (23/33)    2 / 4 / 51 / 1069                 6.1     24.2
4      63.2 (24/38)    2 / 6 / 36 / 589                 10.5     23.7
5      54.3 (19/35)    3 / 10 / 198 / 3245               2.9     42.9

Analysis of the Experimental Results

Can we estimate the difficulty of a problem just by seeing it? If we can, the difficulty of the problems should be attributed more to their inherent search cost (e.g., the time complexity determined by the number of variables) than to the necessity of representation change. Table 4 presents several syntactic features of the benchmark problems. The figures are averaged over the problems taken from each source. It reveals that the syntactic features of the IMO problems are not very different from those of the exercise problems in Ex except for the distribution of variable binders (∀, ∃, λ).

Table 4: Syntactic profiles of the formalized problems

                        Ex     Univ    IMO
# of ∀                  2.2     2.0    5.8
# of ∃                  5.3     9.3    3.1
# of λ                  1.3     2.1    0.1
# of relations         12.5    19.8   13.8
# of functions         19.9    36.3   21.9
# of bound variables    8.8    13.4    9.1
# of free variables     3.0     3.1    1.8

In addition to the basic features listed in Table 4, we may be able to estimate the difficulty of a problem by the vocabulary (i.e., the distribution of function/relation symbols). To see this, we trained a binary classifier that predicts whether or not a problem can be solved by the prototype system in one hour. We used the features in Table 4 and the number of each symbol in a problem as the input and trained the classifier on the results of the benchmark test. Table 5 lists the precision and the recall of the classification obtained by 5-fold cross-validation. The definitions of the precision and recall are: precision = TP/(TP + FP) and recall = TP/(TP + FN), where TP (resp. FP) is the number of problems correctly (resp. wrongly) predicted 'solvable' and FN is the number of problems wrongly predicted 'unsolvable.' The overall prediction accuracy in Table 5 is well above the majority baseline of 57%, but the accuracy is not very high, especially on Univ and IMO problems.

Table 5: Accuracy of the solvability prediction

Source   Precision        Recall
Ex       88% (67/76)      93% (67/72)
Univ     73% (116/160)    78% (116/149)
IMO      57% (17/30)      47% (17/36)
All      75% (200/266)    78% (200/257)

The analysis presented above revealed that certain types of difficulty are not captured by the superficial properties of the problems, including the problem size and the vocabulary. This is a partial indication of the necessity of representation change or other kinds of insight for solving the problems. Future work is to examine such problems and clarify why they are difficult and what kinds of theory choice appear in human solutions of such problems.

Discussions

A first-order theory consists of a language and axioms. A formal theory is expressed in propositional logic, first-order predicate logic, or higher-order predicate logic (typed lambda calculus). In our model, the primary representation is expressed in first-order ZF. Thus, there are three kinds of theory changes: axiom change, language (and axiom) change, and change from propositional to predicate logic.

Axiom Change

There are infinitely many possible proof systems for propositional logic. Among them we can find analytic tableaux (cut-free LK), resolution, and the Frege system (LK). The former two are the major systems used as the basis for automated theorem proving. The cut rule (axiom) allows one to introduce "lemmas" to prove theorems.

The pigeonhole principle is known to require exponential-size proofs both in analytic tableaux (Cook & Reckhow, 1979) and in resolution (Haken, 1985), but it has polynomial-size proofs in the Frege system (Buss, 1987). This is because we can introduce concepts of "addition", "subtraction" and "counting", and manipulate them to do some "restricted arithmetic" in the Frege system. However, the search cost for appropriate cut-formulas is extravagant, and there is almost no hope that someone comes up with appropriate cut-formulas.

Another way to shorten proofs is to introduce a "symmetry rule" (Arai, 1996) as a new axiom. The propositional variable p_{i,j} stands for "the i-th pigeon sitting in the j-th hole", and "⋁_{j=1}^{n} p_{1,j}" for "the first pigeon sitting in some hole" when expressing the pigeonhole principle in propositional logic (Fig. 2). If we have to check all the possible positions of the pigeons, proofs blow up exponentially. Proofs will be shortened if we, without loss of generality, assume that the first pigeon sits in the first hole. In other words, the "insight" of realizing that a given problem is symmetric helps us to escape from an exhaustive search. Some heuristics are known to detect symmetry, and they have been implemented on a computer (Arai & Masukawa, 2000).
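To make the blow-up of an exhaustive check tangible, the sketch below validates the propositional pigeonhole formula of Fig. 2 (n pigeons, n−1 holes) by plain truth-table enumeration. This is an illustration under stated assumptions, not the tableau, resolution, or symmetry-rule procedures cited above: it simply tries every assignment to the variables p[i][j] and reports how many were examined. The function name is hypothetical.

```python
from itertools import combinations, product


def pigeonhole_counterexample(n):
    """Search every truth assignment to p[i][j] ("pigeon i sits in hole j")
    for n pigeons and n - 1 holes.

    Returns (counterexample or None, number of assignments examined).
    A counterexample would put every pigeon in some hole with no hole shared.
    """
    holes = n - 1
    checked = 0
    for bits in product([False, True], repeat=n * holes):
        checked += 1
        # Row i holds the variables p[i][0], ..., p[i][holes - 1].
        p = [bits[i * holes:(i + 1) * holes] for i in range(n)]
        every_pigeon_sits = all(any(row) for row in p)
        some_hole_shared = any(
            p[i][j] and p[m][j]
            for i, m in combinations(range(n), 2)
            for j in range(holes)
        )
        if every_pigeon_sits and not some_hole_shared:
            return p, checked
    return None, checked


for n in (2, 3, 4):
    counterexample, checked = pigeonhole_counterexample(n)
    print(n, counterexample is None, checked)
# 2 True 4
# 3 True 64
# 4 True 4096
```

The number of assignments is 2^(n(n−1)), so this naive check is already impractical for modest n; fixing the first pigeon in the first hole "without loss of generality", as the symmetry argument suggests, is one way a solver could prune such a search.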
Language Change

Elementary (Euclidean) geometry is known to be embeddable into the Cartesian coordinate system, and finally into RCF. However, the languages and sets of axioms are different. As a result, the difficulties of problems do not remain the same.

Consider the following problem: "Let ABCD be a quadrangle. Find the point P that minimizes the sum of AP, BP, CP, and DP." Fig. 3 illustrates that the intersection of the diagonals minimizes the sum because of the triangle inequality.

[Figure 3: Solution to the quadrangle problem. Two drawings of the quadrangle ABCD, with the points P and P′ marked.]

Insight may be required to line up the intersection of the diagonals as a candidate for P. However, the idea is easier to conceive when the problem is represented in Euclidean geometry than in RCF since the intersection of the diagonals has salience in Euclidean geometry.

Propositional or Predicate?

The mutilated draughtboard problem (2n × 2n version) is a good example of a problem that becomes solvable when one changes the setting radically. The problem requires exponential-size proofs in resolution and analytic tableaux. It is not known whether or not it has short proofs in tableaux with the symmetry rule. However, it has a short proof in arithmetic.

Conclusion

An end-to-end model of math problem solving has been presented. In the model, representation change is explained as the result of a choice of a local theory and the reformulation of a primary problem representation in it. Experimental results on more than 400 problems show that our prototype implementation reflects the difficulties of the problems quite precisely. Specifically, IMO problems require "theory change" of the system more often than the others when a timeout is interpreted as an "impasse". This indicates that the model correctly captures the difficulty of the problems and hence can serve as a basis of a quantitative study on representation change. Future work includes further analysis of the difficulty of math problems in light of our information-processing account and the development of computational models of theory choice.

References

Arai, N. H. (1996). Tractability of cut-free Gentzen type propositional calculus with permutation inference. Theoretical Computer Science, 170(1), 129–144.
Arai, N. H., & Masukawa, R. (2000). How to find symmetries hidden in combinatorial problems. In Proceedings of the eighth symposium on the integration of symbolic computation and mechanized reasoning.
Buss, S. R. (1987). Polynomial size proofs of the propositional pigeonhole principle. The Journal of Symbolic Logic, 52(04), 916–927.
Carpenter, B. (1997). Type-logical semantics. MIT Press.
Chou, S.-C. (1988). Mechanical geometry theorem proving (Vol. 41). Springer Science & Business Media.
Cook, S. A., & Reckhow, R. A. (1979). The relative efficiency of propositional proof systems. The Journal of Symbolic Logic, 44(01), 36–50.
Fischer, M. J., & Rabin, M. O. (1974). Super-exponential complexity of Presburger arithmetic. In Proc. of the SIAM-AMS symposia in applied mathematics (Vol. 7, pp. 27–41).
Haken, A. (1985). The intractability of resolution. Theoretical Computer Science, 39, 297–308.
Heim, I., & Kratzer, A. (1998). Semantics in generative grammar. Wiley.
Isaak, M. I., & Just, M. A. (1995). Constraints on thinking in insight and invention. In R. J. Sternberg & J. E. Davidson (Eds.), The nature of insight (pp. 281–325). MIT Press.
Iwane, H., Yanami, H., & Anai, H. (2014). SyNRAC: A toolbox for solving real algebraic constraints. In Mathematical software – ICMS 2014 (pp. 518–522). Springer.
Kamp, H., & Reyle, U. (1993). From discourse to logic: Introduction to modeltheoretic semantics of natural language, formal logic and discourse representation theory. Kluwer Academic.
Kaplan, C. A., & Simon, H. A. (1990). In search of insight. Cognitive Psychology, 22(3), 374–419.
Kerber, M., & Pollet, M. (2006). A tough nut for mathematical knowledge management. In Mathematical knowledge management (pp. 81–95). Springer.
MacGregor, J. N., Ormerod, T. C., & Chronicle, E. P. (2001). Information processing and insight: A process model of performance on the nine-dot and related problems. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(1), 176.
Newell, A., & Simon, H. A. (1972). Human problem solving (Vol. 104) (No. 9). Englewood Cliffs, NJ: Prentice-Hall.
Ohlsson, S. (1992). Information-processing explanations of insight and related phenomena. In Advances in the psychology of thinking (pp. 1–44). Harvester Wheatsheaf.
Öllinger, M., Jones, G., & Knoblich, G. (2014). The dynamics of search, impasse, and representational change provide a coherent explanation of difficulty in the nine-dot problem. Psychological Research, 78(2), 266–275.
Steedman, M. (2001). The syntactic process. MIT Press.
Tarski, A. (1951). A decision method for elementary algebra and geometry. University of California Press.
Zettlemoyer, L. S., & Collins, M. (2005). Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proc. of the 21st conference in uncertainty in artificial intelligence (pp. 658–666).