Bouchard, Stenetorp, Riedel - Unknown - Learning To Generate Textual Data
AI2        IL       CC
X+Y        X+Y      X+Y−Z
X+Y+Z      X−Y      X ∗ (Y + Z)
X−Y        X∗Y      X ∗ (Y − Z)
           X/Y      (X + Y)/Z
                    (X − Y)/Z
Table 1: Patterns of the equations seen in the datasets for one permutation of the placeholders.

         AI2    IL     CC
Train    198    214    300
Dev       66    108    100
Test     131    240    200
Total    395    562    600
Table 2: Math word problems dataset sizes.
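
The sizes in Table 2 have the shape of a simple random split, which is how this section describes the AI2 and CC splits. The following is a minimal Python sketch of such a split, not the authors' procedure: the function name `random_split`, the seed, and the placeholder problem identifiers are assumptions made for illustration, and the released splits should be preferred for any comparison.

```python
import random


def random_split(items, n_train, n_dev, seed=0):
    """Shuffle a copy of `items` and cut it into train/dev/test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]      # everything left over
    return train, dev, test


# Placeholder identifiers (hypothetical); Table 2 lists 395 AI2 problems.
problems = [f"ai2-{i}" for i in range(395)]
train, dev, test = random_split(problems, n_train=198, n_dev=66)
print(len(train), len(dev), len(test))  # 198 66 131
```

This sketch does not cover IL, where the clusters are maintained: that would need a grouped split that keeps every problem of a cluster in the same part.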
However, machine translation can be learned on millions of pairs of already translated sentences, and such massive training datasets dwarf all previously introduced math exam datasets. Today, standard repositories are restricted to a few hundred problems with their solutions (Hosseini et al., 2014; Roy et al., 2015; Roy and Roth, 2015).

We used standard benchmark data from the literature. The first data set, AI2, was introduced by Hosseini et al. (2014) and covers addition and subtraction of one or two variables, or two additions, scraped from two web pages. The second (IL), introduced by Roy et al. (2015), contains single-operator questions but covers addition, subtraction, multiplication, and division; it was also obtained from two web pages, different from those used for AI2. The last data set (CC) was introduced by Roy and Roth (2015) to cover combinations of different operators and was obtained from a fifth web page.

An overview of the equation patterns in the data is shown in Table 1. Note that problem descriptions sometimes mention numbers that are not used in the equation.

As there is no available train/dev/test split in the literature, we introduced such splits for all three data sets. For AI2 and CC, we simply split the data randomly, and for IL we opted to maintain the clusters described in Roy and Roth (2015). We then used the implementation of Roy and Roth (2015) provided by the authors, which is the current state of the art for all three data sets, to obtain results to compare our model against. The resulting data sizes are shown in Table 2. We verified that there are no duplicate problems, and our splits and a fork of the baseline implementation are available online.4

3.4 Development of the Generator

Generators were organized as a set of 8 base generators p_k, summarized in Table 4. Each base generator has several functions associated with it. The functions were written by a human, over 3 days of full-time development. The first group of base generators is based only on the type of symbol the equation has; the second group is the pair (#1, #2), representing equations with one or two symbols. Finally, the last two generators are more experimental, as they correspond to simple modifications applied to the available training data. The Noise ’N’ generator picks one or two random words from a training sample to create a new (but very similar) problem. The ’P’ generator is based on computing the statistics of

4 https://github.com/ninjin/roy_and_roth_2015
Problem:  John sprints to William’s apartment. The distance is 32 yards from John’s apartment to William’s apartment. It takes John 2 hours to at the end get there. How fast did John go?
Equation: 32 / 2

Problem:  Sandra has 7 erasers. She grasps 7 more. The following day she grasps 18 whistles at the local supermarket. How many erasers does Sandra have in all?
Equation: 7 + 7

Problem:  A pet store had 81 puppies In one day they sold 41 of them and put the rest into cages with 8 in each cage. How many cages did they use?
Equation: ( 81 - 41 ) / 8

Template: S1 V1 Q1 O1 C1 S1(pronoun) V2 Q2 of O1(pronoun) and V2 the rest into O3(plural) with Q3 in each O3. How many O3(plural) V3?
Equation: ( Q1 - Q2 ) / Q3
Table 3: Examples of generated sentences (first 3 rows). The last row is the template used to generate the 3rd example where
brackets indicate modifiers, symbols starting with ’S’ or ’O’ indicate a noun phrase for a subject or object, symbols with ’V’
indicate a verb phrase, and symbols with ’Q’ indicate a quantity. They are identified with a number to match multiple instances of
the same token.
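
To make the template row of Table 3 concrete, the following minimal Python sketch fills its slots with the values of the third example and evaluates the associated equation. It is an illustration under stated assumptions, not the authors' generator: the `fill` and `evaluate` helpers and the hand-written slot values are hypothetical, modifier forms such as `O3(plural)` are looked up rather than derived, and the second verb slot is renamed `V4` because the printed template reuses `V2` where the example has two different verbs ("sold" and "put").

```python
import ast
import operator
import re

# Template adapted from the last row of Table 3. ASSUMPTION: the slot V4
# replaces the second V2 of the printed template (see the note above).
TEMPLATE = ("S1 V1 Q1 O1 C1 S1(pronoun) V2 Q2 of O1(pronoun) and V4 the rest "
            "into O3(plural) with Q3 in each O3. How many O3(plural) V3?")
EQUATION = "( Q1 - Q2 ) / Q3"

# Hypothetical slot values; modifier forms are written out by hand.
SLOTS = {
    "S1": "A pet store", "S1(pronoun)": "they",
    "V1": "had", "V2": "sold", "V3": "did they use", "V4": "put",
    "O1": "puppies", "O1(pronoun)": "them",
    "O3": "cage", "O3(plural)": "cages",
    "C1": "In one day",
    "Q1": 81, "Q2": 41, "Q3": 8,
}

# A slot is one of S, V, Q, O, C plus a number and an optional (modifier).
SLOT_RE = re.compile(r"\b[SVQOC]\d+(?:\([a-z]+\))?")

# Arithmetic operators allowed in an equation.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}


def fill(pattern, slots):
    """Replace every slot in `pattern` with its value from `slots`."""
    return SLOT_RE.sub(lambda m: str(slots[m.group(0)]), pattern)


def evaluate(expression):
    """Evaluate a +, -, *, / expression over numbers without using eval()."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)


problem = fill(TEMPLATE, SLOTS)
equation = fill(EQUATION, SLOTS)
answer = evaluate(equation)
print(problem)
print(equation, "=", answer)  # ( 81 - 41 ) / 8 = 5.0
```

Because the same slot dictionary fills both the text and the equation, each generated problem comes with an equation that is consistent with it by construction.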