A Mathematical Word Problem Generator with Structure Planning and Knowledge Enhancement
Enhong Chen
Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
Hefei, Anhui, China
[email protected]
ABSTRACT
Automatically generating controllable and diverse mathematical word problems (MWPs) which conform to given equations and topics is a crucial task in information retrieval and natural language generation. Recent deep learning models mainly focus on improving problem readability but overlook mathematical logic coherence, which tends to yield unsolvable problems. In this paper, we draw inspiration from the human problem-designing process and propose a Mathematical structure Planning and Knowledge enhanced Generation model (MaPKG), following the “plan-then-generate” steps. Specifically, we propose a novel dynamic planning module to make sentence-level equation plans and a dual-attention mechanism for word-level generation, incorporating equation structure representation and external commonsense knowledge. Extensive experiments on two MWP datasets show that our model can guarantee more solvable, high-quality, and diverse problems. Our code is available at https://github.com/KenelmQLH/MaPKG.git

CCS CONCEPTS
• Computing methodologies → Natural language generation.

KEYWORDS
MWP generation, planning mechanism, knowledge enhancement

ACM Reference Format:
Longhu Qin, Jiayu Liu, Zhenya Huang, Kai Zhang, Qi Liu, Binbin Jin, and Enhong Chen. 2023. A Mathematical Word Problem Generator with Structure Planning and Knowledge Enhancement. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), July 23–27, 2023, Taipei, Taiwan. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3539618.3591937

∗ Corresponding author.

1 INTRODUCTION
Automatic problem generation has attracted much attention in the information retrieval and natural language generation fields, as it can provide important educational resources for several applications [9, 15, 16]. In this paper, we study the task of automatically generating mathematical word problems (MWPs), which not only asks for semantic understanding [6, 14, 25] of the specific equations and topics, but also requires mathematical logic to generate controllable and solvable problems.
[Figure 1: Two example MWPs generated from the same equations (x0 + x1 = n0, n1·x0 + n2·x1 = n3) under different topics, each shown with its equation expression trees, sentence-level plan, and retrieved commonsense knowledge. Problem 1: “Jack bought some flowers for his teachers as gifts. He bought n0 flowers at the shop, which cost $n3 in total. Each rose cost $n1 and each lily cost $n2. How many roses and lilies did Jack buy?”, supported by triples such as (flower, IsA, gift), (buy, HasSubevent, shop), (shop, HasA, flower), (rose, IsA, flower), and (lily, IsA, flower). Problem 2, with topic keywords “squirrel, pick, nuts, sunny, rainy”: “A little squirrel picks nuts in the forest every day. On sunny days, it can pick n1 nuts. On rainy days, it can only pick n2 nuts. In the latest n0 days, the squirrel has picked a total of n3 nuts. How many sunny days and rainy days were there during this time?”, supported by triples such as (squirrel, AtLocation, forest), (squirrel, Desires, nuts), (rainy, IsA, weather), (rainy, RelatedTo, day), and (sunny, RelatedTo, day).]

[Figure 2: The MaPKG architecture, consisting of the sequence encoder, the keyword subgraph (KG) encoder, (b) the subtree encoder, (c) the dual-attention decoder with masked self-attention, equation-attention, and topic-attention sub-layers (each followed by add & norm, plus a feed-forward layer, stacked × L), and (d) the dynamic planning module that produces sentence-level plans to guide generation.]

Toward this goal, there have been several efforts on this task, including rule-based methods and neural network-based ones. Specifically, the earlier rule-based methods [8, 18, 23] always generate a problem with predefined math rules or text templates; however, they generally suffer from high manual construction cost and limited template diversity. Recently, researchers have turned their attention to neural network approaches, which follow the sequence-to-sequence architecture to generate diverse MWPs [17, 28]. Moreover, some works explore further possibilities, including pre-trained language models [24], retrieval-based generation [2], and commonsense enhancement [3]. Although they have achieved great success, they generally focus on improving the surface readability of problems while overlooking the coherence of the underlying mathematical logic.

To this end, we propose MaPKG, which performs generation following the “plan-then-generate” process with an encoder-decoder architecture. Specifically, in the encoder, we introduce the subtree structure and external knowledge to represent the equations and keywords, respectively. In the decoder, we first propose a novel dynamic planning module to make sentence-level expression plans based on equation subtrees. Then, we design a dual-attention mechanism to fuse equation information and topic knowledge to generate the problem word by word. Extensive experiments on two MWP datasets verify that our MaPKG improves the generation results in terms of solvability, quality, and diversity.
Table 1: The statistics of the datasets.

Statistics            Lmwp-G    Hmwp-G
Num. problems         5447      5491
Avg Num. words        39.6      60.1
Num. templates        48        2144
Avg Num. equations    2.0       1.3
Avg Num. keywords     8.84      8.80
Avg Num. Concepts     64.6      49.66
Avg Num. Triples      98.24     71.13

For sequential token representations, we use a Bi-GRU based encoder to represent the token sequence embeddings as $\boldsymbol{H}^e = \{h^e_1, \dots, h^e_{|e|}\}$.
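A minimal PyTorch sketch of such a Bi-GRU token encoder is given below; the class name, the vocabulary handling, and the split of the hidden size across the two directions are our own illustrative assumptions rather than details from the paper.

```python
import torch.nn as nn

class EquationTokenEncoder(nn.Module):
    """Bi-GRU over the equation token sequence, producing the states H^e."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Bidirectional GRU; half-size directions keep the output width at `dim`
        # (assumes `dim` is even, e.g. 512 as in Section 3.1.2).
        self.gru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (B, |e|) -> H^e: (B, |e|, dim)
        states, _ = self.gru(self.embed(token_ids))
        return states
```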
For subtree representations, we first convert each equation $e$ into a binary expression tree and denote each subtree $i$ as a triplet $T_i = (o, l, r)$. Then we propose a subtree encoder to learn the subtree embeddings in a bottom-up way, as shown in Figure 2 (b). Specifically, the subtree embedding $s_i$ of subtree $i$ is obtained by:

$s_i = W_s [h^e_o; s_l; s_r] + b_s$,  (2)

where $h^e_o \in \boldsymbol{H}^e$ is the representation of the operator $o$, and $s_l, s_r$ are the embeddings of the left child $l$ and the right child $r$, respectively. For every leaf node, we set $s_* = h^e_* \in \boldsymbol{H}^e$. The hierarchical subtree embeddings are denoted as $\boldsymbol{S}^e = \{s_1, \dots, s_b\}$, where $b$ is the number of subtrees.
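To make the bottom-up computation of Eq. (2) concrete, the following is a minimal PyTorch sketch of the subtree encoder; the triplet layout, the helper names, and the assumption that subtrees are listed with children before parents are illustrative choices of ours, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class SubtreeEncoder(nn.Module):
    """Bottom-up subtree embeddings s_i = W_s [h_o^e; s_l; s_r] + b_s (Eq. 2)."""

    def __init__(self, dim: int):
        super().__init__()
        # W_s maps the concatenated (operator, left, right) states back to `dim`.
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, token_states, subtrees):
        """
        token_states: (|e|, dim) Bi-GRU states H^e of one equation's tokens.
        subtrees: list of (op_pos, left, right) triplets in bottom-up order;
                  `left`/`right` are ("leaf", token_pos) or ("sub", j) with j
                  pointing at an earlier subtree (assumed layout).
        Returns the stacked subtree embeddings S^e of shape (b, dim).
        """
        embeddings = []

        def child_state(child):
            kind, idx = child
            # Leaf nodes reuse their token state: s_* = h_*^e.
            return token_states[idx] if kind == "leaf" else embeddings[idx]

        for op_pos, left, right in subtrees:
            h_o = token_states[op_pos]
            s_i = self.proj(torch.cat([h_o, child_state(left), child_state(right)], dim=-1))
            embeddings.append(s_i)
        return torch.stack(embeddings)  # S^e = {s_1, ..., s_b}
```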
2.2.2 Knowledge-aware Keyword Representation. Keywords provide essential semantic information for generating a problem. However, it is not enough to perceive them in isolation, which might lead to an improper expression of the math information (e.g., generating text stating that the number of “roses” is the sum of “flowers” and “lilies”). Therefore, when representing the keywords, we retrieve external commonsense knowledge to promote the understanding of them. Specifically, given keywords $\boldsymbol{T}$, we first regard them as central concepts and extract their $K$-hop neighbor concepts $\boldsymbol{V}$ from the public knowledge bases ConceptNet (https://conceptnet.io) and HowNet (https://openhownet.thunlp.org), which form a keyword subgraph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$, where $\mathcal{V} = \boldsymbol{T} \cup \boldsymbol{V}$ and $\mathcal{E}$ are edges taken from the knowledge bases.
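As one way to realize this K-hop extraction, the sketch below runs a breadth-first expansion over a locally stored set of (subject, relation, object) triples; the function name, the triple format, and the assumption that the ConceptNet/HowNet triples have already been exported locally are ours.

```python
from collections import deque

def build_keyword_subgraph(keywords, triples, k_hops=1):
    """Collect concepts within K hops of the topic keywords from a set of
    (subject, relation, object) triples, e.g. exported from ConceptNet/HowNet.
    Returns the node set V_all = T ∪ V and the edges among those nodes."""
    neighbors = {}
    for s, _, o in triples:                       # index the triples once
        neighbors.setdefault(s, set()).add(o)
        neighbors.setdefault(o, set()).add(s)

    nodes = set(keywords)
    frontier = deque((w, 0) for w in keywords)    # (concept, hop distance)
    while frontier:
        concept, depth = frontier.popleft()
        if depth == k_hops:
            continue
        for nxt in neighbors.get(concept, ()):    # expand one more hop
            if nxt not in nodes:
                nodes.add(nxt)
                frontier.append((nxt, depth + 1))

    edges = [(s, r, o) for s, r, o in triples if s in nodes and o in nodes]
    return nodes, edges
```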
Then, we learn the node representations $\{g_v \mid v \in \mathcal{V}\}$ with a keyword subgraph encoder, which is implemented with GGNN [12]. After $N$ iterations of information passing on $\mathcal{G}$, we concatenate the initial and the $N$-th iteration node representations $g^0_v, g^N_v$ for each node $v$, i.e., $h^k_v = W_k [g^0_v; g^N_v] + b_k$, and obtain the knowledge-aware keyword representations as $\boldsymbol{H}^k = \{h^k_v \mid v \in \boldsymbol{T}\}$.
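As an illustration of how the keyword subgraph could be encoded, the sketch below runs N rounds of gated message passing (a simplified stand-in for the GGNN of [12]) and then forms h_v^k = W_k[g_v^0; g_v^N] + b_k; the dense adjacency format and the edge-agnostic aggregation are simplifying assumptions of ours.

```python
import torch
import torch.nn as nn

class KeywordSubgraphEncoder(nn.Module):
    """Simplified GGNN-style encoder for the keyword subgraph G = {V, E}."""

    def __init__(self, dim: int, num_steps: int = 2):
        super().__init__()
        self.num_steps = num_steps               # N iterations of message passing
        self.message = nn.Linear(dim, dim)       # edge-agnostic message function (simplified)
        self.update = nn.GRUCell(dim, dim)       # gated node update, as in GGNN
        self.combine = nn.Linear(2 * dim, dim)   # h_v^k = W_k [g_v^0; g_v^N] + b_k

    def forward(self, node_init, adj):
        """
        node_init: (|V|, dim) initial concept embeddings g_v^0.
        adj: (|V|, |V|) row-normalized adjacency matrix of the subgraph.
        Returns knowledge-aware node representations of shape (|V|, dim).
        """
        g = node_init
        for _ in range(self.num_steps):
            messages = adj @ self.message(g)     # aggregate neighbor messages
            g = self.update(messages, g)         # gated update of node states
        return self.combine(torch.cat([node_init, g], dim=-1))
```

In practice, only the rows corresponding to the central keyword concepts v ∈ T would then be kept as H^k.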
2.2.3 Dynamic Planning Module. To accurately express the logic of equations in problems, we generate subtree plans as a skeleton to guide the sentence expression order. Specifically, inspired by [4, 5, 13], we treat the plans as additional tokens denoted as [SP] and generate them together with the words of problem $y$ dynamically by the dual-attention mechanism described in Section 2.2.4, which derives a hidden state $h_p$ for each [SP] as its latent representation.

In this module, we aim to ensure that (1) $h_p$ indeed conveys the plan information about one or more subtrees in the equations $\boldsymbol{E}$, and (2) $h_p$ guides the generation of the current sentence (e.g., “Jack bought ... $n_0$ flowers ...” is grounded on the current plan “$x_0 + x_1 = n_0$”).

For the first goal, we introduce a prediction task to determine which subtree(s) the plan $h_p$ indicates. Specifically, we calculate the probability that $h_p$ relates to the subtree $T_i$ with a pointer network:

$P(T_i) = \sigma(W_1^{\top} \tanh(W_2 [s_i; h_p] + b_1) + b_2)$,  (3)

which induces a subtree planning loss ($\tilde{g}$ denotes the golden plans):

$\mathcal{L}_{plan} = -\sum_{T_i \in \tilde{g}} \log P(T_i) - \sum_{T_i \notin \tilde{g}} \log \big(1 - P(T_i)\big)$.  (4)

For the second goal, we design a guidance loss to enhance the dependence of the sentence on the current plan. Specifically, we use the mean squared error to minimize the distance between the latent plan representation $h_p$ and the sentence representation $h_s$, where $h_s$ is obtained by average pooling the hidden states of the tokens in the generated sentence:

$\mathcal{L}_{sent} = \mathrm{MSE}(h_p, h_s)$.  (5)

With the above planning and guidance, the decoder described below can preserve the equation logic during word generation, ensuring problem quality and solvability.
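A minimal sketch of the planning objectives in Eqs. (3)-(5) is given below, assuming the subtree embeddings s_i and the latent plan state h_p of an [SP] token are already available; the module and tensor names are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtreePlanner(nn.Module):
    """Pointer-style scoring P(T_i) (Eq. 3) with the planning and guidance losses (Eqs. 4-5)."""

    def __init__(self, dim: int):
        super().__init__()
        self.w2 = nn.Linear(2 * dim, dim)  # W_2 [s_i; h_p] + b_1
        self.w1 = nn.Linear(dim, 1)        # W_1^T (...) + b_2

    def subtree_probs(self, subtree_embs, h_p):
        # subtree_embs: (b, dim); h_p: (dim,) latent plan state of one [SP] token.
        pair = torch.cat([subtree_embs, h_p.expand_as(subtree_embs)], dim=-1)
        return torch.sigmoid(self.w1(torch.tanh(self.w2(pair)))).squeeze(-1)  # P(T_i), Eq. (3)

    def plan_loss(self, subtree_embs, h_p, gold_mask):
        # gold_mask: (b,) 1.0 for subtrees in the golden plan, else 0.0.
        probs = self.subtree_probs(subtree_embs, h_p)
        return F.binary_cross_entropy(probs, gold_mask, reduction="sum")  # Eq. (4)

    @staticmethod
    def guidance_loss(h_p, sentence_token_states):
        # h_s: average pooling over the generated sentence's hidden states.
        h_s = sentence_token_states.mean(dim=0)
        return F.mse_loss(h_p, h_s)  # Eq. (5)
```

These two terms are then combined with the generation loss as in Eq. (9), e.g. `loss = lm_loss + 0.5 * plan_loss + 0.5 * sent_loss` under the α = β = 0.5 setting reported in Section 3.1.2.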
2.2.4 Dual-Attention Decoder. Our decoder adopts the basic Transformer manner to conduct word-level generation. Specifically, we propose a novel dual-attention mechanism within it to combine the equation information with the knowledge-aware keyword information. As shown in Figure 2(c), we design an equation attention layer and a topic attention layer as follows:

$H_l' = H_l + \mathrm{MultiHead}_E(H_l, \boldsymbol{H}^e, \boldsymbol{H}^e)$,  (6)

$H_l'' = H_l' + \mathrm{MultiHead}_T(H_l', \boldsymbol{H}^k, \boldsymbol{H}^k)$,  (7)

where $H_l$ represents the hidden states from the masked self-attention layer. With $L$ decoder blocks, the dual-attention mechanism fuses the equation information and the topic information iteratively. Finally, the hidden states of the last decoder layer $h_o$ are passed to another feed-forward layer with softmax to estimate the output distribution and generate the problem words. In particular, if [SP] is generated, $h_o$ is taken as the latent plan representation $h_p$ described in Section 2.2.3. The generation loss is defined as follows:

$\mathcal{L}_{LM} = -\sum_{t=1}^{l} \log P(y_t \mid y_{<t}, \boldsymbol{E}, \boldsymbol{T})$.  (8)

Finally, we jointly minimize the following loss with hyperparameters $\alpha, \beta$ to balance sentence planning and word generation:

$\mathcal{L} = \mathcal{L}_{LM} + \alpha \mathcal{L}_{plan} + \beta \mathcal{L}_{sent}$.  (9)
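The dual attention of Eqs. (6)-(7) can be sketched as a Transformer decoder block with two cross-attention sub-layers, one over the equation states H^e and one over the keyword states H^k; the post-norm placement, head count, and feed-forward width below are our assumptions, not details reported in the paper.

```python
import torch
import torch.nn as nn

class DualAttentionBlock(nn.Module):
    """One decoder block with masked self-attention, equation-attention (Eq. 6)
    and topic-attention (Eq. 7); the full decoder stacks L such blocks."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # `dim` must be divisible by `num_heads` (e.g. 512 / 8).
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.eq_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)     # MultiHead_E
        self.topic_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)  # MultiHead_T
        self.norm1, self.norm2, self.norm3, self.norm4 = (nn.LayerNorm(dim) for _ in range(4))
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, x, eq_states, kw_states, causal_mask):
        # Masked self-attention over the partially generated problem tokens.
        h, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
        h_l = self.norm1(x + h)
        # Equation-attention: H_l' = H_l + MultiHead_E(H_l, H^e, H^e).
        e, _ = self.eq_attn(h_l, eq_states, eq_states)
        h_l = self.norm2(h_l + e)
        # Topic-attention: H_l'' = H_l' + MultiHead_T(H_l', H^k, H^k).
        t, _ = self.topic_attn(h_l, kw_states, kw_states)
        h_l = self.norm3(h_l + t)
        # Position-wise feed-forward with residual connection.
        return self.norm4(h_l + self.ffn(h_l))
```

Stacking L such blocks and feeding the final states through a linear layer with softmax yields the word distribution; whenever [SP] is emitted, its final hidden state serves as h_p for the planning module.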
3 EXPERIMENTS
3.1 Experimental Dataset and Setup
3.1.1 Datasets. We conduct our experiments on two MWP datasets. (1) Lmwp [17] is a dataset with two linear equations and two unknown variables for each problem. (2) Hmwp [20] consists of hybrid MWPs involving both one unknown and two unknowns, which contains more varied equation templates and longer problems. Based on them, we extract topic keywords from each problem with jionlp (http://www.jionlp.com/) and annotate the subtree positions in equations as the golden plans. Table 1 summarizes the basic statistics of the annotated datasets Lmwp-G and Hmwp-G.

3.1.2 Experimental Setup. We set the embedding dimension to 512 and the number of Transformer layers to 6. The keyword subgraphs are constructed from 1-hop neighbors (i.e., K = 1), and N = 2 in GGNN. The hyperparameters α and β are both set to 0.5.

3.1.3 Baseline and Evaluation. We compare our model against several strong baselines: (1) CVAE [27] is a GRU-based sequence-to-sequence model with a conditional VAE. (2) S2S-GRU [1] is a GRU-based sequence-to-sequence model with attention. (3) MAGNET [28] is an MWP generator with an entity-enforced loss. (4) S2S-TF [22] is a standard Transformer-based sequence-to-sequence model.
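For reference, the reported setup can be collected into a small configuration object; the field names are ours, and only the values are taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class MaPKGConfig:
    embedding_dim: int = 512      # embedding / hidden dimension
    num_decoder_layers: int = 6   # Transformer decoder blocks L
    k_hops: int = 1               # K-hop neighbors for the keyword subgraph
    ggnn_steps: int = 2           # N message-passing iterations in GGNN
    alpha: float = 0.5            # weight of the planning loss
    beta: float = 0.5             # weight of the sentence guidance loss
```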
REFERENCES
[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
[2] Tianyang Cao, Shuang Zeng, Xiaodan Xu, Mairgup Mansur, and Baobao Chang. 2022. DISK: Domain-constrained Instance Sketch for Math Word Problem Generation. arXiv preprint arXiv:2204.04686 (2022).
[3] Tianyang Cao, Shuang Zeng, Songge Zhao, Mairgup Mansur, and Baobao Chang. 2021. Generating math word problems from equations with topic consistency maintaining and commonsense enforcement. In Artificial Neural Networks and Machine Learning – ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part III. Springer, 66–79.
[4] Jian Guan, Xiaoxi Mao, Changjie Fan, Zitao Liu, Wenbiao Ding, and Minlie Huang. 2021. Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Online, 6379–6393. https://doi.org/10.18653/v1/2021.acl-long.499
[5] Zhe Hu, Hou Pong Chan, Jiachen Liu, Xinyan Xiao, Hua Wu, and Lifu Huang. 2022. PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Dublin, Ireland, 2288–2305. https://doi.org/10.18653/v1/2022.acl-long.163
[6] Zhenya Huang, Xin Lin, Hao Wang, Qi Liu, Enhong Chen, Jianhui Ma, Yu Su, and Wei Tong. 2021. Disenqnet: Disentangled representation learning for educational questions. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 696–704.
[7] Zhenya Huang, Qi Liu, Weibo Gao, Jinze Wu, Yu Yin, Hao Wang, and Enhong Chen. 2020. Neural mathematical solver with enhanced formula structure. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1729–1732.
[8] Nabila Ahmed Khodeir, Hanan Elazhary, and Nayer Wanas. 2018. Generating story problems via controlled parameters in a web-based intelligent tutoring system. The International Journal of Information and Learning Technology 35, 3 (2018), 199–216.
[9] Ghader Kurdi, Jared Leo, Bijan Parsia, Uli Sattler, and Salam Al-Emari. 2020. A systematic review of automatic question generation for educational purposes. International Journal of Artificial Intelligence in Education 30 (2020), 121–204.
[10] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019).
[11] Junyi Li, Wayne Xin Zhao, Zhicheng Wei, Nicholas Jing Yuan, and Ji-Rong Wen. 2021. Knowledge-based review generation by coherence enhanced text planning. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 183–192.
[12] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493 (2015).
[13] Zekang Li, Jinchao Zhang, Zhengcong Fei, Yang Feng, and Jie Zhou. 2021. Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Online, 128–138. https://doi.org/10.18653/v1/2021.acl-long.11
[14] Xin Lin, Zhenya Huang, Hongke Zhao, Enhong Chen, Qi Liu, Hao Wang, and Shijin Wang. 2021. HMS: A hierarchical solver with dependency-enhanced understanding for math word problem. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 4232–4240.
[15] Jiayu Liu, Zhenya Huang, Xin Lin, Qi Liu, Jianhui Ma, and Enhong Chen. 2022. A Cognitive Solver with Autonomously Knowledge Learning for Reasoning Mathematical Answers. In 2022 IEEE International Conference on Data Mining (ICDM). IEEE, 269–278.
[16] Jiayu Liu, Zhenya Huang, Chengxiang Zhai, and Qi Liu. 2023. Learning by Applying: A General Framework for Mathematical Reasoning via Enhancing Explicit Knowledge Learning. arXiv preprint arXiv:2302.05717 (2023).
[17] Tianqiao Liu, Qiang Fang, Wenbiao Ding, Hang Li, Zhongqin Wu, and Zitao Liu. 2021. Mathematical Word Problem Generation from Commonsense Knowledge Graph and Equations. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 4225–4240. https://doi.org/10.18653/v1/2021.emnlp-main.348
[18] Oleksandr Polozov, Eleanor O'Rourke, Adam M Smith, Luke Zettlemoyer, Sumit Gulwani, and Zoran Popović. 2015. Personalized mathematical word problem generation. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
[19] Ratish Puduppully, Li Dong, and Mirella Lapata. 2019. Data-to-text generation with content selection and planning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 6908–6915.
[20] Jinghui Qin, Lihui Lin, Xiaodan Liang, Rumin Zhang, and Liang Lin. 2020. Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 3780–3789. https://doi.org/10.18653/v1/2020.emnlp-main.309
[21] Yibin Shen, Qianying Liu, Zhuoyuan Mao, Zhen Wan, Fei Cheng, and Sadao Kurohashi. 2022. Seeking Diverse Reasoning Logic: Controlled Equation Expression Generation for Solving Math Word Problems. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, Online only, 254–260. https://aclanthology.org/2022.aacl-short.32
[22] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
[23] Ke Wang and Zhendong Su. 2016. Dimensionally Guided Synthesis of Mathematical Word Problems. In IJCAI. 2661–2668.
[24] Zichao Wang, Andrew Lan, and Richard Baraniuk. 2021. Math Word Problem Generation with Mathematical Consistency and Problem Context Constraints. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 5986–5999. https://doi.org/10.18653/v1/2021.emnlp-main.484
[25] Kai Zhang, Qi Liu, Hao Qian, Biao Xiang, Qing Cui, Jun Zhou, and Enhong Chen. 2023. EATN: An Efficient Adaptive Transfer Network for Aspect-Level Sentiment Analysis. IEEE Transactions on Knowledge and Data Engineering 35, 1 (Jan 2023), 377–389. https://doi.org/10.1109/TKDE.2021.3075238
[26] Kun Zhang, Guangyi Lv, Le Wu, Enhong Chen, Qi Liu, and Meng Wang. 2021. LadRa-Net: Locally aware dynamic reread attention net for sentence semantic matching. IEEE Transactions on Neural Networks and Learning Systems (2021).
[27] Tiancheng Zhao, Ran Zhao, and Maxine Eskenazi. 2017. Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada, 654–664. https://doi.org/10.18653/v1/P17-1061
[28] Qingyu Zhou and Danqing Huang. 2019. Towards generating math word problems from equations and topics. In Proceedings of the 12th International Conference on Natural Language Generation. 494–503.
[29] Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. 2018. Texygen: A benchmarking platform for text generation models. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 1097–1100.