
A Mathematical Word Problem Generator with Structure Planning and Knowledge Enhancement


Longhu Qin
School of Computer Science and Technology, University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
Hefei, Anhui, China
[email protected]

Jiayu Liu
School of Data Science, University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
Hefei, Anhui, China
[email protected]

Zhenya Huang∗
School of Computer Science and Technology, University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
Hefei, Anhui, China
[email protected]

Kai Zhang
School of Computer Science and Technology, University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
Hefei, Anhui, China
[email protected]

Qi Liu
School of Computer Science and Technology, University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
Hefei, Anhui, China
[email protected]

Binbin Jin
Huawei Cloud Computing Technologies Co., Ltd.
Hangzhou, Zhejiang, China
[email protected]

Enhong Chen
Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
Hefei, Anhui, China
[email protected]
ABSTRACT

Automatically generating controllable and diverse mathematical word problems (MWPs) which conform to equations and topics is a crucial task in information retrieval and natural language generation. Recent deep learning models mainly focus on improving the problem readability but overlook the mathematical logic coherence, which tends to generate unsolvable problems. In this paper, we draw inspiration from the human problem-designing process and propose a Mathematical structure Planning and Knowledge enhanced Generation model (MaPKG), following the "plan-then-generate" steps. Specifically, we propose a novel dynamic planning module to make sentence-level equation plans and a dual-attention mechanism for word-level generation, incorporating equation structure representation and external commonsense knowledge. Extensive experiments on two MWP datasets show our model can guarantee more solvable, high-quality, and diverse problems. Our code is available at https://github.com/KenelmQLH/MaPKG.git

CCS CONCEPTS
• Computing methodologies → Natural language generation.

KEYWORDS
MWP generation, planning mechanism, knowledge enhancement

ACM Reference Format:
Longhu Qin, Jiayu Liu, Zhenya Huang, Kai Zhang, Qi Liu, Binbin Jin, and Enhong Chen. 2023. A Mathematical Word Problem Generator with Structure Planning and Knowledge Enhancement. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23), July 23–27, 2023, Taipei, Taiwan. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3539618.3591937

∗ Corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SIGIR '23, July 23–27, 2023, Taipei, Taiwan
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9408-6/23/07. $15.00
https://doi.org/10.1145/3539618.3591937

1 INTRODUCTION

Automatic problem generation has attracted much attention in information retrieval and natural language generation fields, which could provide important educational resources for several applications [9, 15, 16]. In this paper, we study the task of automatically generating mathematical word problems (MWPs), which not only asks for semantic understanding [6, 14, 25] of the specific equations and topics, but requires mathematical logic to generate controllable and diverse problems.

[Figure 1: Examples of diverse MWP generation. Given the equation template "x_0 + x_1 = n_0; n_1 x_0 + n_2 x_1 = n_3" and Topic 1 keywords "teacher, buy, flower, rose, lily", Problem 1 reads: "Jack bought some flowers for his teachers as gifts. He bought n_0 flowers at the shop, which cost $n_3 in total. Each rose cost $n_1 and each lily cost $n_2. How many roses and lilies did Jack buy?" Given the same template and Topic 2 keywords "squirrel, pick, nuts, sunny, rainy", Problem 2 reads: "A little squirrel picks nuts in the forest every day. On sunny days, it can pick n_1 nuts. On rainy days, it can only pick n_2 nuts. In the latest n_0 days, the squirrel has picked a total of n_3 nuts. How many sunny days and rainy days were there during this time?" Each problem is paired with its equation expression plan and related commonsense triples, e.g., (flower, IsA, gift), (rose, IsA, flower), (lily, IsA, flower) for Problem 1, and (squirrel, AtLocation, forest), (squirrel, Desires, nuts), (rainy, RelatedTo, day), (sunny, RelatedTo, day) for Problem 2.]

[Figure 2: Framework of MaPKG: (a) the overall architecture, in which a subtree encoder, a sequence encoder, and a knowledge-graph subgraph encoder feed the dynamic planning module and the dual-attention decoder; (b) the subtree encoder; (c) the dual-attention decoder with masked self-attention, equation-attention, and topic-attention layers; (d) the dynamic planning module with sentence pooling and plan guidance.]


As shown in Figure 1, given the same equation templates but different topic keywords, we can create problems that describe mathematical information in diverse scenarios.

Toward this goal, there have been several efforts on this task, including rule-based methods and neural network-based ones. The earlier rule-based methods [8, 18, 23] always generate a problem with predefined math rules or text templates; however, they generally suffer from manual construction cost and limited template diversity. Recently, researchers have turned their attention to neural network approaches, which follow the sequence-to-sequence architecture to generate diverse MWPs [17, 28]. Moreover, some works explore further possibilities, including pre-trained language models [24], retrieval-based generation [2], and commonsense enhancement [3]. Although they have achieved great success, they generally focus on improving problem readability. Their generation process may overlook the mathematical logic coherence among MWPs and tend to generate problems that may be unsolvable in practice.

In this paper, we draw inspiration from the problem-designing process of human educators. On one hand, human experts always follow the "plan-then-generate" principle [11, 19] in real-world situations. Specifically, before writing down a problem, they usually make an explicit plan to express the equations in a logical order. For Problem 1 in Figure 1, the equation "x_0 + x_1 = n_0" (marked red) is selected first and then described as the sentence "He bought n_0 flowers ...", followed by the green part to be generated. On the other hand, when generating a specific sentence, it is essential to associate commonsense knowledge, which not only helps select proper keywords but also enriches the description. In Figure 1, if we know that both "rose" and "lily" are concepts related to "flower", they are more likely to describe the variables "x_0" and "x_1" respectively after "n_0" has been described as the number of flowers.

However, it is non-trivial for machines to carry out this human-style process for MWP generation. First, token-level equation sequences fail to accurately reflect the mathematical logic. In Figure 1, the reasonable plan for Problem 1 follows the subtree-level order marked as the "red-green-purple-yellow" steps, rather than the token-level order. Second, introducing more topic keywords may lead to a harder knowledge application process, since we should not only comprehend the concepts [26] based on commonsense knowledge but also combine them with math information for description.

To this end, we propose a novel Mathematical structure Planning and Knowledge enhanced Generation model (MaPKG) for MWP generation, following the "plan-then-generate" process with an encoder-decoder architecture. Specifically, in the encoder, we introduce the subtree structure and external knowledge to represent the equations and keywords respectively. In the decoder, we first propose a novel dynamic planning module to make sentence-level expression plans based on equation subtrees. Then, we design a dual-attention mechanism to fuse equation information and topic knowledge to generate the problem word by word. Extensive experiments on two MWP datasets verify that our MaPKG improves the generation results in terms of solvability, quality, and diversity.

2 METHOD

2.1 Task Definition
Given mathematical equation templates E = {e_1, ..., e_n} and topic keywords T = {w_1, ..., w_m}, the MWP generator aims to generate a descriptive problem y = {y_1, ..., y_l} by:

    y = argmax_ŷ P(ŷ | E, T).    (1)

Two requirements should be met in this process. First, y can be solved by the input equations E. Second, y is described as coherent narrative text related to the input topic T.

2.2 Model Architecture
Figure 2 shows the framework of MaPKG, which mainly consists of two multi-grained equation encoders, a keyword subgraph encoder, a dynamic planning module, and a dual-attention decoder.

2.2.1 Equation Representation. We consider that equations contain two kinds of information. First, an equation can be directly viewed as a sequence of numbers, variables, and operators. Second, the operators determine the relationships between numbers and variables, which form structural subtrees [7] that imply specific sentence-level descriptions (e.g., the subtree "x_0 ∗ n_1" grounds the sentence "Each rose cost n_1"). To this end, we represent each equation e ∈ E from two aspects, namely sequential token representations and hierarchical subtree representations.
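To make the subtree view concrete, here is a minimal sketch (our own illustration, not the authors' released code) of how an equation template can be decomposed into subtree triplets, assuming the template has already been converted to postfix (reverse Polish) notation:

```python
# Sketch: recover subtree triplets (operator, left child, right child) from an
# equation template given in postfix notation. Leaves are numbers (n*) or unknowns (x*).
OPERATORS = {"+", "-", "*", "/", "="}

def postfix_to_subtrees(postfix_tokens):
    """Return the subtrees bottom-up; a child is a leaf token or a reference
    to a previously built subtree, so the list encodes the whole expression tree."""
    stack, subtrees = [], []
    for tok in postfix_tokens:
        if tok in OPERATORS:
            right, left = stack.pop(), stack.pop()
            subtrees.append((tok, left, right))
            stack.append(("subtree", len(subtrees) - 1))
        else:
            stack.append(tok)
    return subtrees

# "x0 + x1 = n0" in postfix is: x0 x1 + n0 =
print(postfix_to_subtrees(["x0", "x1", "+", "n0", "="]))
# [('+', 'x0', 'x1'), ('=', ('subtree', 0), 'n0')]
```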


For sequential token representations, we use a Bi-GRU based encoder to represent the token sequence embeddings as H^e = {h^e_1, ..., h^e_|e|}.

For subtree representations, we first convert equation e into a binary expression tree and denote each subtree i as a triplet T_i = (o, l, r). Then we propose a subtree encoder to learn the subtree embeddings in a bottom-up way, as shown in Figure 2(b). Specifically, the subtree embedding s_i of i is obtained by:

    s_i = W_s [h^e_o; s_l; s_r] + b_s,    (2)

where h^e_o ∈ H^e is the representation of the operator o, and s_l, s_r are the embeddings of the left child l and right child r respectively. For every leaf node, we set s_∗ = h^e_∗ ∈ H^e. The hierarchical subtree embeddings are denoted as S^e = {s_1, ..., s_b}, where b is the number of subtrees.
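A minimal PyTorch sketch of the bottom-up subtree encoder in Eq. (2); the class and argument names are ours, and the token embeddings are assumed to come from the Bi-GRU encoder above:

```python
import torch
import torch.nn as nn

class SubtreeEncoder(nn.Module):
    """Sketch of Eq. (2): s_i = W_s [h_o^e ; s_l ; s_r] + b_s, applied bottom-up."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(3 * dim, dim)  # W_s and b_s

    def forward(self, token_embs, subtrees, token_pos):
        # token_embs: (|e|, dim) sequential token representations H^e
        # subtrees:   bottom-up list of (operator, left, right) triplets
        # token_pos:  position in the equation of each leaf / operator token
        def child_emb(child, cache):
            if isinstance(child, tuple) and child[0] == "subtree":
                return cache[child[1]]            # an already-encoded subtree
            return token_embs[token_pos[child]]   # a leaf: s_* = h_*^e

        cache = []
        for op, left, right in subtrees:
            h_o = token_embs[token_pos[op]]
            s_l, s_r = child_emb(left, cache), child_emb(right, cache)
            cache.append(self.proj(torch.cat([h_o, s_l, s_r], dim=-1)))
        return torch.stack(cache)                 # S^e with shape (b, dim)
```

For simplicity the sketch indexes each operator by a single position; a template with repeated operators would need per-occurrence positions.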
2.2.2 Knowledge-aware Keyword Representation. Keywords provide essential semantic information for generating a problem. However, it is not enough to perceive them in isolation, which might lead to improper expression of math information (e.g., generating that the number of "roses" is the sum of "flowers" and "lilies"). Therefore, when representing the keywords, we retrieve external commonsense knowledge to promote the understanding of them. Specifically, given keywords T, we first regard them as central concepts and extract their K-hop neighbor concepts V from the public knowledge bases ConceptNet (https://conceptnet.io) and HowNet (https://openhownet.thunlp.org), which form a keyword subgraph G = {V, E} where V = T ∪ V and E are edges taken from the knowledge bases. Then, we learn the node representations {g_v | v ∈ V} with a keyword subgraph encoder, which is implemented with a GGNN [12]. After N iterations of information passing on G, we concatenate the initial and N-th iteration node representations g^0_v, g^N_v for each node v, i.e., h^k_v = W_k [g^0_v; g^N_v] + b_k, and obtain the knowledge-aware keyword representations as H^k = {h^k_v | v ∈ T}.
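A sketch of this encoder using PyTorch Geometric's GatedGraphConv as a stand-in for the GGNN of [12]; the K-hop subgraph retrieval from ConceptNet/HowNet is assumed to be done beforehand, and all names here are ours:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GatedGraphConv  # stand-in for the GGNN in [12]

class KeywordSubgraphEncoder(nn.Module):
    """Sketch: h_v^k = W_k [g_v^0 ; g_v^N] + b_k over the keyword subgraph G."""

    def __init__(self, dim: int = 512, num_steps: int = 2):  # N = 2 in Section 3.1.2
        super().__init__()
        self.ggnn = GatedGraphConv(out_channels=dim, num_layers=num_steps)
        self.readout = nn.Linear(2 * dim, dim)                # W_k and b_k

    def forward(self, node_embs, edge_index, keyword_idx):
        # node_embs:   (|V|, dim) initial concept embeddings g_v^0
        # edge_index:  (2, |E|) edges retrieved from ConceptNet / HowNet
        # keyword_idx: indices of the input keywords T within the node set V
        g_n = self.ggnn(node_embs, edge_index)                # g_v^N after N iterations
        h_k = self.readout(torch.cat([node_embs, g_n], dim=-1))
        return h_k[keyword_idx]                               # H^k = {h_v^k | v in T}
```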
2.2.3 Dynamic Planning Module. To accurately express the logic of the equations in problems, we generate subtree plans as a skeleton to guide the sentence expression order. Specifically, inspired by [4, 5, 13], we treat the plans as additional tokens denoted as [SP] and generate them together with the words of problem y dynamically by the dual-attention mechanism described in Section 2.2.4, which derives a hidden state h_p for each [SP] as its latent representation.

In this module, we aim to ensure that (1) h_p indeed conveys the plan information about one or more subtrees in the equations E, and (2) h_p guides the generation of the current sentence (e.g., "Jack bought ... n_0 flowers ..." is grounded on the current plan "x_0 + x_1 = n_0").

For the first goal, we introduce a prediction task to determine which subtree(s) the plan h_p indicates. Specifically, we calculate the probability that h_p relates to the subtree T_i by a pointer network:

    P(T_i) = σ(W_1^T tanh(W_2 [s_i; h_p] + b_1) + b_2),    (3)

which induces a subtree planning loss (g̃ is the golden plan):

    L_plan = − Σ_{T_i ∈ g̃} log P(T_i) − Σ_{T_i ∉ g̃} log(1 − P(T_i)).    (4)

For the second goal, we design a guidance loss to enhance the dependence of the sentence on the current plan. Specifically, we use the mean squared error to minimize the distance between the latent plan representation h_p and the sentence representation h_s, where h_s is obtained by average pooling the hidden states of the tokens in the generated sentence:

    L_sent = MSE(h_p, h_s).    (5)

With the above proper planning and forced guiding, the decoder below can guarantee the equation logic in word generation, ensuring problem quality and solvability.
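A sketch of the plan scorer in Eq. (3) and the two losses in Eqs. (4)–(5); the module and variable names are ours, and the binary cross-entropy below averages rather than sums over subtrees:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlanScorer(nn.Module):
    """Sketch of Eq. (3): P(T_i) = sigma(W_1^T tanh(W_2 [s_i ; h_p] + b_1) + b_2)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.w2 = nn.Linear(2 * dim, dim)  # W_2 and b_1
        self.w1 = nn.Linear(dim, 1)        # W_1 and b_2

    def forward(self, subtree_embs, h_p):
        # subtree_embs: (b, dim) subtree embeddings S^e;  h_p: (dim,) plan state
        pair = torch.cat([subtree_embs, h_p.expand_as(subtree_embs)], dim=-1)
        return torch.sigmoid(self.w1(torch.tanh(self.w2(pair)))).squeeze(-1)

def planning_losses(p_subtree, gold_mask, h_p, sent_hidden):
    # Eq. (4): cross-entropy against the golden subtree plan g~ (1 = planned subtree).
    l_plan = F.binary_cross_entropy(p_subtree, gold_mask.float())
    # Eq. (5): MSE between the plan state and the average-pooled sentence states.
    l_sent = F.mse_loss(h_p, sent_hidden.mean(dim=0))
    return l_plan, l_sent
```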
2.2.4 Dual-Attention Decoder. Our decoder adopts the basic Transformer manner to conduct word-level generation. Specifically, we propose a novel dual-attention mechanism in it to combine the equation information with the knowledge-aware keyword information. As shown in Figure 2(c), we design an equation attention layer and a topic attention layer as follows:

    H'_l = H_l + MultiHead_E(H_l, H^e, H^e),    (6)
    H''_l = H'_l + MultiHead_T(H'_l, H^k, H^k),    (7)

where H_l represents the hidden states from the masked self-attention layer. With L decoder blocks, the dual-attention mechanism fuses the equation information and the topic information iteratively. Finally, the hidden states of the last decoder layer h_o are passed to another feed-forward layer with softmax to estimate the output distribution and generate the problem words. In particular, if [SP] is generated, h_o is taken as the latent plan representation h_p described in Section 2.2.3. The generation loss is defined as follows:

    L_LM = − Σ^l_{t=1} log P(y_t | y_<t, E, T).    (8)

Finally, we jointly minimize the following loss, with hyperparameters α, β balancing sentence planning and word generation:

    L = L_LM + α L_plan + β L_sent.    (9)
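A sketch of one dual-attention decoder block implementing Eqs. (6)–(7), with nn.MultiheadAttention as a stand-in for MultiHead_E / MultiHead_T; the residual, layer-norm, and feed-forward arrangement is our simplification of Figure 2(c):

```python
import torch
import torch.nn as nn

class DualAttentionBlock(nn.Module):
    """Sketch: masked self-attention -> equation attention (Eq. 6) -> topic attention (Eq. 7)."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.eq_attn = nn.MultiheadAttention(dim, heads, batch_first=True)     # MultiHead_E
        self.topic_attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # MultiHead_T
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])

    def forward(self, h, h_eq, h_kg, causal_mask):
        # h: (B, L, dim) decoder states; h_eq: equation memory H^e; h_kg: keyword memory H^k
        h = self.norms[0](h + self.self_attn(h, h, h, attn_mask=causal_mask)[0])
        h = self.norms[1](h + self.eq_attn(h, h_eq, h_eq)[0])     # Eq. (6)
        h = self.norms[2](h + self.topic_attn(h, h_kg, h_kg)[0])  # Eq. (7)
        return self.norms[3](h + self.ffn(h))

# Joint objective of Eq. (9), with alpha = beta = 0.5 as reported in Section 3.1.2:
# loss = l_lm + 0.5 * l_plan + 0.5 * l_sent
```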
3 EXPERIMENTS

3.1 Experimental Dataset and Setup

3.1.1 Datasets. We conduct our experiments on two MWP datasets. (1) Lmwp [17] is a dataset with two linear equations and two unknown variables for each problem. (2) Hmwp [20] consists of hybrid MWPs with both one unknown and two unknowns, which contains more varied equation templates and longer problems. Based on them, we extract topic keywords from each problem with jionlp (http://www.jionlp.com/) and annotate the subtree positions in the equations as the golden plans. Table 1 summarizes the basic statistics of the annotated datasets Lmwp-G and Hmwp-G.

Table 1: The statistics of the datasets.

Statistics            Lmwp-G    Hmwp-G
Num. problems         5447      5491
Avg Num. words        39.6      60.1
Num. templates        48        2144
Avg Num. equations    2.0       1.3
Avg Num. keywords     8.84      8.80
Avg Num. concepts     64.6      49.66
Avg Num. triples      98.24     71.13

3.1.2 Experimental Setup. We set the embedding dimension to 512 and the number of Transformer layers to 6. The keyword subgraphs are constructed from 1-hop neighbors (i.e., K = 1), and N = 2 in the GGNN. The hyperparameters α and β are both set to 0.5.

3.1.3 Baseline and Evaluation. We compare our model against several strong baselines: (1) CVAE [27] is a GRU-based sequence-to-sequence model with a conditional VAE. (2) S2S-GRU [1] is a GRU-based sequence-to-sequence model with attention. (3) MAGNET [28] is an MWP generator with an entity-enforced loss. (4) S2S-TF [22] is a standard Transformer-based sequence-to-sequence model. (5) BART [10] is one of the most popular generative pre-trained language models, and we choose BART-base for comparison.

We conduct the automatic evaluation in three aspects: solvability, language diversity, and quality. For solvability, we use equation accuracy (Equ-Acc) and answer accuracy (Ans-Acc) to measure whether the generated MWPs can be solved by the input equations. The equation accuracy is checked with a state-of-the-art MWP solver [21], whose predicted equations are compared with the original equations. The answer accuracy is computed from the answers of the unknowns in the equations. For language diversity, we use Self-BLEU [29] to measure the diversity of the generated problems. For language quality, we select BLEU (the average of BLEU-1, 2, 3, 4) and METEOR.
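As a sketch of the diversity metric: Self-BLEU scores each generated problem against all other generated problems as references, so lower values mean more diverse output. The version below uses NLTK; the original Texygen implementation [29] may differ in details:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def self_bleu(generated_tokenized, max_n: int = 4):
    """Average BLEU of each generated problem against all the others (lower = more diverse)."""
    smooth = SmoothingFunction().method1
    weights = tuple(1.0 / max_n for _ in range(max_n))
    scores = []
    for i, hyp in enumerate(generated_tokenized):
        refs = [g for j, g in enumerate(generated_tokenized) if j != i]
        scores.append(sentence_bleu(refs, hyp, weights=weights, smoothing_function=smooth))
    return sum(scores) / len(scores)

problems = ["jack bought n0 flowers at the shop",
            "a squirrel picks n1 nuts every day"]
print(self_bleu([p.split() for p in problems]))
```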


Table 2: Automatic evaluation on MWP generation. For each dataset, Equ-Acc and Ans-Acc measure solvability (higher is better), BLEU and METEOR measure quality (higher is better), and Self-BLEU measures diversity (lower is better).

           ------------------- Lmwp-G -------------------   ------------------- Hmwp-G -------------------
Model      Equ-Acc  Ans-Acc  BLEU   METEOR  Self-BLEU        Equ-Acc  Ans-Acc  BLEU   METEOR  Self-BLEU
CVAE       0.5350   0.6341   25.40  0.4908  0.7203           0.1750   0.2250   29.61  0.4843  0.6705
S2S-GRU    0.6310   0.7169   27.40  0.5015  0.7278           0.2370   0.2910   31.77  0.5004  0.6493
MAGNET     0.6305   0.7004   25.06  0.4839  0.6832           0.3404   0.3891   25.37  0.4314  0.6534
S2S-TF     0.7000   0.7800   29.04  0.5397  0.7133           0.3030   0.3720   39.13  0.5796  0.6630
BART       0.6213   0.7426   31.64  0.5595  0.6791           0.3050   0.3830   41.93  0.6050  0.6033
MaPKG      0.7440   0.8490   30.79  0.5410  0.6780           0.3320   0.4082   42.34  0.6149  0.6378
w/o SP     0.7366   0.8379   30.54  0.5395  0.6850           0.3184   0.3965   42.05  0.6072  0.6442
w/o KG     0.7403   0.8434   29.94  0.5273  0.6869           0.3320   0.3984   40.61  0.5975  0.6474
w/o DA     0.7016   0.8158   29.60  0.5266  0.7002           0.2891   0.3652   38.85  0.5800  0.6561

[Figure 3: Analysis for numbers in generated MWPs: number order consistency (↑, left panel) and number missing rate (↓, right panel) on Lmwp-G and Hmwp-G for CVAE, S2S-GRU, MAGNET, S2S-TF, BART, and MaPKG.]

Table 3: Examples of generated MWPs.

Equation: x_0 + x_1 = n_0; n_1 ∗ x_0 − n_2 ∗ x_1 = n_3
Keywords: Grandma green red cake food cost
S2S-TF: In the past, cakes are rare treats. Each green bean cake cost n_1 yuan and each red bean cake cost n_2 yuan. A family bought n_0 cakes at the cost of n_3 yuan. How many green bean cakes and red bean cakes did they buy? (✗)
BART: Grandma Wang spent n_0 yuan to buy some cakes. Each green bean cake cost n_1 yuan and each red bean cake cost n_2 yuan. There were n_3 more green bean cakes than red bean cakes. How many red bean cakes and red bean cakes did Wang buy? (✗)
MaPKG: Grandma Li went to the street to buy n_0 pieces of cake. Each green bean cake cost n_1 yuan, each red bean cake cost n_2 yuan. The green bean cakes cost more n_3 yuan than the red bean cakes. How many red bean cakes? (✓)
3.2 Results and Analysis

3.2.1 Main Results. Table 2 reports the evaluation results, and we observe that MaPKG outperforms the baselines on most occasions. Specifically, for problem solvability, MaPKG achieves the best performance overall, which verifies the effectiveness of the "plan-then-generate" principle for generating logically reasonable MWPs. For language quality and diversity, MaPKG and BART achieve the best results, showing that, with knowledge enhancement, MaPKG is competitive with pre-trained language models on the language side.

We also conduct ablation studies in Table 2. Specifically, we introduce "w/o SP", which omits the subtree-based dynamic planning module; "w/o KG", which replaces the keyword subgraph encoder with the initial keyword embeddings; and "w/o DA", which replaces the dual-attention decoder with a standard Transformer decoder.

We draw the following conclusions. First, all components contribute to MWP generation, since removing any module leads to performance degradation. Second, "w/o DA" diminishes all metrics significantly, implying that the fusion of equation and keyword information is the basis for correctly describing math information. Third, "w/o SP" greatly diminishes the results in equation consistency, which indicates that the proposed planning module is necessary and crucial for generating solvable problems. Fourth, the performance of "w/o KG" shows that knowledge enhancement benefits language diversity.

3.2.2 Analysis of Planning. MaPKG's planning effectiveness is demonstrated by analyzing the numbers in the generated MWPs. We assess the logical order of the equations in generated MWPs by measuring number order consistency with the labeled MWPs. The number missing rate is also computed to determine the omission of numbers in generated MWPs. The results in Figure 3 show that MaPKG outperforms the other models on both metrics, indicating that it not only produces logical plans but also encourages number expression.
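The two metrics are not formally defined in the text; one plausible reading (ours, not necessarily the authors' exact script) compares the number placeholders n_i in a generated problem against those of the labeled problem:

```python
import re

NUM_PATTERN = re.compile(r"n\d+")  # number placeholders such as n0, n1, ...

def number_metrics(generated: str, labeled: str):
    """Sketch: number order consistency (higher is better) and missing rate (lower is better)."""
    gen_nums = NUM_PATTERN.findall(generated)
    gold_nums = NUM_PATTERN.findall(labeled)
    missing = len(set(gold_nums) - set(gen_nums)) / max(len(set(gold_nums)), 1)
    consistent = int(gen_nums == gold_nums)  # strict variant: exact order match
    return consistent, missing

print(number_metrics("He bought n0 flowers, each rose cost n1 and each lily cost n2.",
                     "He bought n0 flowers at $n3. Each rose cost $n1, each lily $n2."))
# -> (0, 0.25): the order differs and n3 is missing
```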
3.2.3 Case Study. The representative example in Table 3 shows that S2S-TF and BART generate problems with good language expression but unsatisfying equation consistency. S2S-TF misrepresents the operator "−" as "the sum of cost", and BART misinterprets "n_0" as the "total price of cakes". Conversely, MaPKG precisely perceives the equation subtrees with its planning module and incorporates commonsense knowledge to avoid these situations.

4 CONCLUSION

In this paper, we proposed a novel MWP generation model (MaPKG) following the "plan-then-generate" steps. Specifically, we introduced the subtree structure and external knowledge into representation modeling. Then, we proposed a dynamic planning module to make sentence-level expression plans based on equation subtrees. Next, we designed a dual-attention mechanism to fuse equations and topic knowledge in word-level generation. Extensive experiments on two MWP datasets verified that our MaPKG improves the solvability, quality, and diversity of the generated problems.

ACKNOWLEDGMENTS

This research was partially supported by grants from the National Natural Science Foundation of China (Grants No. 62106244, No. U20A20229), and the University Synergy Innovation Program of Anhui Province (No. GXXT-2022-042).


REFERENCES
[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
[2] Tianyang Cao, Shuang Zeng, Xiaodan Xu, Mairgup Mansur, and Baobao Chang. 2022. DISK: Domain-constrained Instance Sketch for Math Word Problem Generation. arXiv preprint arXiv:2204.04686 (2022).
[3] Tianyang Cao, Shuang Zeng, Songge Zhao, Mairgup Mansur, and Baobao Chang. 2021. Generating math word problems from equations with topic consistency maintaining and commonsense enforcement. In Artificial Neural Networks and Machine Learning – ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part III. Springer, 66–79.
[4] Jian Guan, Xiaoxi Mao, Changjie Fan, Zitao Liu, Wenbiao Ding, and Minlie Huang. 2021. Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Online, 6379–6393. https://doi.org/10.18653/v1/2021.acl-long.499
[5] Zhe Hu, Hou Pong Chan, Jiachen Liu, Xinyan Xiao, Hua Wu, and Lifu Huang. 2022. PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Dublin, Ireland, 2288–2305. https://doi.org/10.18653/v1/2022.acl-long.163
[6] Zhenya Huang, Xin Lin, Hao Wang, Qi Liu, Enhong Chen, Jianhui Ma, Yu Su, and Wei Tong. 2021. DisenQNet: Disentangled representation learning for educational questions. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 696–704.
[7] Zhenya Huang, Qi Liu, Weibo Gao, Jinze Wu, Yu Yin, Hao Wang, and Enhong Chen. 2020. Neural mathematical solver with enhanced formula structure. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1729–1732.
[8] Nabila Ahmed Khodeir, Hanan Elazhary, and Nayer Wanas. 2018. Generating story problems via controlled parameters in a web-based intelligent tutoring system. The International Journal of Information and Learning Technology 35, 3 (2018), 199–216.
[9] Ghader Kurdi, Jared Leo, Bijan Parsia, Uli Sattler, and Salam Al-Emari. 2020. A systematic review of automatic question generation for educational purposes. International Journal of Artificial Intelligence in Education 30 (2020), 121–204.
[10] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019).
[11] Junyi Li, Wayne Xin Zhao, Zhicheng Wei, Nicholas Jing Yuan, and Ji-Rong Wen. 2021. Knowledge-based review generation by coherence enhanced text planning. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 183–192.
[12] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493 (2015).
[13] Zekang Li, Jinchao Zhang, Zhengcong Fei, Yang Feng, and Jie Zhou. 2021. Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Online, 128–138. https://doi.org/10.18653/v1/2021.acl-long.11
[14] Xin Lin, Zhenya Huang, Hongke Zhao, Enhong Chen, Qi Liu, Hao Wang, and Shijin Wang. 2021. HMS: A hierarchical solver with dependency-enhanced understanding for math word problem. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 4232–4240.
[15] Jiayu Liu, Zhenya Huang, Xin Lin, Qi Liu, Jianhui Ma, and Enhong Chen. 2022. A Cognitive Solver with Autonomously Knowledge Learning for Reasoning Mathematical Answers. In 2022 IEEE International Conference on Data Mining (ICDM). IEEE, 269–278.
[16] Jiayu Liu, Zhenya Huang, Chengxiang Zhai, and Qi Liu. 2023. Learning by Applying: A General Framework for Mathematical Reasoning via Enhancing Explicit Knowledge Learning. arXiv preprint arXiv:2302.05717 (2023).
[17] Tianqiao Liu, Qiang Fang, Wenbiao Ding, Hang Li, Zhongqin Wu, and Zitao Liu. 2021. Mathematical Word Problem Generation from Commonsense Knowledge Graph and Equations. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 4225–4240. https://doi.org/10.18653/v1/2021.emnlp-main.348
[18] Oleksandr Polozov, Eleanor O'Rourke, Adam M Smith, Luke Zettlemoyer, Sumit Gulwani, and Zoran Popović. 2015. Personalized mathematical word problem generation. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
[19] Ratish Puduppully, Li Dong, and Mirella Lapata. 2019. Data-to-text generation with content selection and planning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 6908–6915.
[20] Jinghui Qin, Lihui Lin, Xiaodan Liang, Rumin Zhang, and Liang Lin. 2020. Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 3780–3789. https://doi.org/10.18653/v1/2020.emnlp-main.309
[21] Yibin Shen, Qianying Liu, Zhuoyuan Mao, Zhen Wan, Fei Cheng, and Sadao Kurohashi. 2022. Seeking Diverse Reasoning Logic: Controlled Equation Expression Generation for Solving Math Word Problems. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, Online only, 254–260. https://aclanthology.org/2022.aacl-short.32
[22] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
[23] Ke Wang and Zhendong Su. 2016. Dimensionally Guided Synthesis of Mathematical Word Problems. In IJCAI. 2661–2668.
[24] Zichao Wang, Andrew Lan, and Richard Baraniuk. 2021. Math Word Problem Generation with Mathematical Consistency and Problem Context Constraints. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 5986–5999. https://doi.org/10.18653/v1/2021.emnlp-main.484
[25] Kai Zhang, Qi Liu, Hao Qian, Biao Xiang, Qing Cui, Jun Zhou, and Enhong Chen. 2023. EATN: An Efficient Adaptive Transfer Network for Aspect-Level Sentiment Analysis. IEEE Transactions on Knowledge and Data Engineering 35, 1 (Jan 2023), 377–389. https://doi.org/10.1109/TKDE.2021.3075238
[26] Kun Zhang, Guangyi Lv, Le Wu, Enhong Chen, Qi Liu, and Meng Wang. 2021. LadRa-Net: Locally aware dynamic reread attention net for sentence semantic matching. IEEE Transactions on Neural Networks and Learning Systems (2021).
[27] Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. 2017. Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 654–664. https://doi.org/10.18653/v1/P17-1061
[28] Qingyu Zhou and Danqing Huang. 2019. Towards generating math word problems from equations and topics. In Proceedings of the 12th International Conference on Natural Language Generation. 494–503.
[29] Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. 2018. Texygen: A benchmarking platform for text generation models. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 1097–1100.
