
Abstract Syntax Networks for Code Generation and Semantic Parsing

Maxim Rabinovich∗ Mitchell Stern∗ Dan Klein


Computer Science Division
University of California, Berkeley
{rabinovich,mitchell,klein}@cs.berkeley.edu

∗ Equal contribution.

Abstract

Tasks like code generation and semantic parsing require mapping unstructured (or partially structured) inputs to well-formed, executable outputs. We introduce abstract syntax networks, a modeling framework for these problems. The outputs are represented as abstract syntax trees (ASTs) and constructed by a decoder with a dynamically-determined modular structure paralleling the structure of the output tree. On the benchmark HEARTHSTONE dataset for code generation, our model obtains 79.2 BLEU and 22.7% exact match accuracy, compared to previous state-of-the-art values of 67.1 and 6.1%. Furthermore, we perform competitively on the ATIS, JOBS, and GEO semantic parsing datasets with no task-specific engineering.

name: ['D', 'i', 'r', 'e', ' ',
       'W', 'o', 'l', 'f', ' ',
       'A', 'l', 'p', 'h', 'a']
cost: ['2']
type: ['Minion']
rarity: ['Common']
race: ['Beast']
class: ['Neutral']
description: [
    'Adjacent', 'minions', 'have',
    '+', '1', 'Attack', '.']
health: ['2']
attack: ['2']
durability: ['-1']

class DireWolfAlpha(MinionCard):
    def __init__(self):
        super().__init__(
            "Dire Wolf Alpha", 2, CHARACTER_CLASS.ALL,
            CARD_RARITY.COMMON, minion_type=MINION_TYPE.BEAST)

    def create_minion(self, player):
        return Minion(2, 2, auras=[
            Aura(ChangeAttack(1), MinionSelector(Adjacent()))
        ])

Figure 1: Example code for the "Dire Wolf Alpha" Hearthstone card.
1 Introduction

Tasks like semantic parsing and code generation are challenging in part because they are structured (the output must be well-formed) but not synchronous (the output structure diverges from the input structure).

show me the fare from ci0 to ci1

lambda $0 e
  ( exists $1 ( and ( from $1 ci0 )
                    ( to $1 ci1 )
                    ( = ( fare $1 ) $0 ) ) )

Figure 2: Example of a query and its logical form from the ATIS dataset. The ci0 and ci1 tokens are entity abstractions introduced in preprocessing (Dong and Lapata, 2016).

Sequence-to-sequence models have proven effective for both tasks (Dong and Lapata, 2016; Ling et al., 2016), using encoder-decoder frameworks to exploit the sequential structure on both the input and output side. Yet these approaches do not account for much richer structural constraints on outputs, including well-formedness, well-typedness, and executability. The well-formedness case is of particular interest, since it can readily be enforced by representing outputs as abstract syntax trees (ASTs) (Aho et al., 2006), an approach that can be seen as a much lighter-weight version of CCG-based semantic parsing (Zettlemoyer and Collins, 2005).

In this work, we introduce abstract syntax networks (ASNs), an extension of the standard encoder-decoder framework utilizing a modular decoder whose submodels are composed to natively generate ASTs in a top-down manner.
The decoding process for any given input follows a dynamically chosen mutual recursion between the modules, where the structure of the tree being produced mirrors the call graph of the recursion. We implement this process using a decoder model built of many submodels, each associated with a specific construct in the AST grammar and invoked when that construct is needed in the output tree. As is common with neural approaches to structured prediction (Chen and Manning, 2014; Vinyals et al., 2015), our decoder proceeds greedily and accesses not only a fixed encoding but also an attention-based representation of the input (Bahdanau et al., 2014).

Our model significantly outperforms previous architectures for code generation and obtains competitive or state-of-the-art results on a suite of semantic parsing benchmarks. On the HEARTHSTONE dataset for code generation, we achieve a token BLEU score of 79.2 and an exact match accuracy of 22.7%, greatly improving over the previous best results of 67.1 BLEU and 6.1% exact match (Ling et al., 2016).

The flexibility of ASNs makes them readily applicable to other tasks with minimal adaptation. We illustrate this point with a suite of semantic parsing experiments. On the JOBS dataset, we improve on the previous state of the art, achieving 92.9% exact match accuracy as compared to the previous record of 90.7%. Likewise, we perform competitively on the ATIS and GEO datasets, matching or exceeding the exact match reported by Dong and Lapata (2016), though not quite reaching the records held by the best previous semantic parsing approaches (Wang et al., 2014).

1.1 Related work

Encoder-decoder architectures, with and without attention, have been applied successfully both to sequence prediction tasks like machine translation and to tree prediction tasks like constituency parsing (Cross and Huang, 2016; Dyer et al., 2016; Vinyals et al., 2015). In the latter case, work has focused on making the task look like sequence-to-sequence prediction, either by flattening the output tree (Vinyals et al., 2015) or by representing it as a sequence of construction decisions (Cross and Huang, 2016; Dyer et al., 2016). Our work differs from both in its use of a recursive top-down generation procedure.

Dong and Lapata (2016) introduced a sequence-to-sequence approach to semantic parsing, including a limited form of top-down recursion, but without the modularity or tight coupling between output grammar and model that characterize our approach.

Neural (and probabilistic) modeling of code, including for prediction problems, has a longer history. Allamanis et al. (2015) and Maddison and Tarlow (2014) proposed modeling code with a neural language model, generating concrete syntax trees in left-first, depth-first order and focusing on metrics like perplexity and applications like code snippet retrieval. More recently, Shin et al. (2017) attacked the same problem using a grammar-based variational autoencoder with top-down generation similar to ours. Meanwhile, a separate line of work has focused on the problem of program induction from input-output pairs (Balog et al., 2016; Liang et al., 2010; Menon et al., 2013).

The prediction framework most similar in spirit to ours is the doubly-recurrent decoder network introduced by Alvarez-Melis and Jaakkola (2017), which propagates information down the tree using a vertical LSTM and between siblings using a horizontal LSTM. Our model differs from theirs in using a separate module for each grammar construct and learning separate vertical updates for siblings when the AST labels require all siblings to be jointly present; we do, however, use a horizontal LSTM for nodes with variable numbers of children. The differences between our models reflect not only design decisions, but also differences in data: since ASTs have labeled nodes and labeled edges, they come with additional structure that our model exploits.

Apart from ours, the best results on the code generation task associated with the HEARTHSTONE dataset are based on a sequence-to-sequence approach to the problem (Ling et al., 2016). Abstract syntax networks greatly improve on those results.

Previously, Andreas et al. (2016) introduced neural module networks (NMNs) for visual question answering, with modules corresponding to linguistic substructures within the input query. The primary purpose of the modules in NMNs is to compute deep features of images in the style of convolutional neural networks (CNNs). These features are then fed into a final decision layer. In contrast to the modules we describe here, NMN modules do not make decisions about what to generate or which modules to call next, nor do they maintain recurrent state.
Figure 3: Fragments from the abstract syntax tree corresponding to the example code in Figure 1. (a) The root portion of the AST. (b) Excerpt from the same AST, corresponding to the code snippet Aura(ChangeAttack(1), MinionSelector(Adjacent())). Blue boxes represent composite nodes, which expand via a constructor with a prescribed set of named children. Orange boxes represent primitive nodes, with their corresponding values written underneath. Solid black squares correspond to constructor fields with sequential cardinality, such as the body of a class definition (Figure 3a) or the arguments of a function call (Figure 3b).

2 Data Representation

2.1 Abstract Syntax Trees

Our model makes use of the Abstract Syntax Description Language (ASDL) framework (Wang et al., 1997), which represents code fragments as trees with typed nodes. Primitive types correspond to atomic values, like integers or identifiers. Accordingly, primitive nodes are annotated with a primitive type and a value of that type; for instance, in Figure 3a, the identifier node storing "create_minion" represents a function of the same name.

Composite types correspond to language constructs, like expressions or statements. Each type has a collection of constructors, each of which specifies the particular language construct a node of that type represents. Figure 4 shows constructors for the statement (stmt) and expression (expr) types. The associated language constructs include function and class definitions, return statements, binary operations, and function calls.

primitive types: identifier, object, ...

stmt
  = FunctionDef(
      identifier name, arg* args, stmt* body)
  | ClassDef(
      identifier name, expr* bases, stmt* body)
  | Return(expr? value)
  | ...

expr
  = BinOp(expr left, operator op, expr right)
  | Call(expr func, expr* args)
  | Str(string s)
  | Name(identifier id, expr_context ctx)
  | ...

...

Figure 4: A simplified fragment of the Python ASDL grammar. The full grammar can be found online on the documentation page for the Python ast module: https://docs.python.org/3/library/ast.html#abstract-grammar

Composite types enter syntax trees via composite nodes, annotated with a composite type and a choice of constructor specifying how the node expands. The root node in Figure 3a, for example, is a composite node of type stmt that represents a class definition and therefore uses the ClassDef constructor. In Figure 3b, on the other hand, the root uses the Call constructor because it represents a function call.

Children are specified by named and typed fields of the constructor, which have cardinalities of singular, optional, or sequential. By default, fields have singular cardinality, meaning they correspond to exactly one child. For instance, the ClassDef constructor has a singular name field of type identifier. Fields of optional cardinality are associated with zero or one children, while fields of sequential cardinality are associated with zero or more children; these are designated using ? and * suffixes in the grammar, respectively. Fields of sequential cardinality are often used to represent statement blocks, as in the body field of the ClassDef and FunctionDef constructors.
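As a concrete illustration (not part of our model), the same constructors and fields appear in Python's built-in ast module, so the node structure described above can be inspected directly; a minimal sketch:

import ast

# Parse the snippet from Figure 3b into a Python AST
# (mode="eval" wraps a single expression).
tree = ast.parse(
    "Aura(ChangeAttack(1), MinionSelector(Adjacent()))", mode="eval")

call = tree.body
print(type(call).__name__)       # Call -- composite node using the Call constructor
print(type(call.func).__name__)  # Name -- its singular func field
print(call.func.id)              # Aura -- a primitive identifier value
print(len(call.args))            # 2    -- args is a sequential field
print(ast.dump(call.args[0]))    # nested Call node for ChangeAttack(1)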
The grammars needed for semantic parsing can easily be given ASDL specifications as well, using primitive types to represent variables, predicates, and atoms and composite types for standard logical building blocks like lambdas and counting (among others). Figure 2 shows what the resulting λ-calculus trees look like. The ASDL grammars for both λ-calculus and Prolog-style logical forms are quite compact, as Figures 9 and 10 in the appendix show.

2.2 Input Representation

We represent inputs as collections of named components, each of which consists of a sequence of tokens. In the case of semantic parsing, inputs have a single component containing the query sentence. In the case of HEARTHSTONE, the card's name and description are represented as sequences of characters and tokens, respectively, while categorical attributes are represented as single-token sequences. For HEARTHSTONE, we restrict our input and output vocabularies to values that occur more than once in the training set.

3 Model Architecture

Our model uses an encoder-decoder architecture with hierarchical attention. The key idea behind our approach is to structure the decoder as a collection of mutually recursive modules. The modules correspond to elements of the AST grammar and are composed together in a manner that mirrors the structure of the tree being generated. A vertical LSTM state is passed from module to module to propagate information during the decoding process.

The encoder uses bidirectional LSTMs to embed each component and a feedforward network to combine them. Component- and token-level attention is applied over the input at each step of the decoding process.

We train our model using negative log likelihood as the loss function. The likelihood encompasses terms for all generation decisions made by the decoder.

3.1 Encoder

Each component c of the input is encoded using a component-specific bidirectional LSTM. This results in forward and backward token encodings (denoted →h_c and ←h_c) that are later used by the attention mechanism. To obtain an encoding of the input as a whole for decoder initialization, we concatenate the final forward and backward encodings of each component into a single vector and apply a linear projection.
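As an illustration only, the following sketch mirrors the component-wise encoding just described; a plain tanh recurrence stands in for the LSTM, the weights are random stand-ins rather than learned parameters, and names like encode_component are ours, not the paper's:

import numpy as np

rng = np.random.default_rng(0)
HID, EMB = 8, 5

def rnn_pass(tokens, W, U):
    # Simple tanh cell standing in for the LSTM used in the paper.
    h = np.zeros(HID)
    for x in tokens:
        h = np.tanh(W @ x + U @ h)
    return h

def encode_component(token_embeddings, params):
    W_f, U_f, W_b, U_b = params
    h_fwd = rnn_pass(token_embeddings, W_f, U_f)        # forward pass
    h_bwd = rnn_pass(token_embeddings[::-1], W_b, U_b)  # backward pass
    return np.concatenate([h_fwd, h_bwd])               # final states of both directions

# One bidirectional encoder per named component (e.g. 'name', 'description').
components = {
    "name": [rng.normal(size=EMB) for _ in range(4)],
    "description": [rng.normal(size=EMB) for _ in range(7)],
}
params = {c: [rng.normal(size=(HID, EMB)), rng.normal(size=(HID, HID)),
              rng.normal(size=(HID, EMB)), rng.normal(size=(HID, HID))]
          for c in components}

# Concatenate the per-component encodings and apply a linear projection
# to obtain the decoder initialization.
concat = np.concatenate(
    [encode_component(toks, params[c]) for c, toks in components.items()])
W_proj = rng.normal(size=(HID, concat.size))
decoder_init = W_proj @ concat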
3.2 Decoder Modules

The decoder decomposes into several classes of modules, one per construct in the grammar, which we discuss in turn. Throughout, we let v denote the current vertical LSTM state, and use f to represent a generic feedforward neural network. LSTM updates with hidden state h and input x are notated as LSTM(h, x).

Composite type modules. Each composite type T has a corresponding module whose role is to select among the constructors C for that type. As Figure 5a exhibits, a composite type module receives a vertical LSTM state v as input and applies a feedforward network f_T and a softmax output layer to choose a constructor:

    p(C | T, v) = softmax(f_T(v))_C.

Control is then passed to the module associated with constructor C.

Constructor modules. Each constructor C has a corresponding module whose role is to compute an intermediate vertical LSTM state v_{u,F} for each of its fields F whenever C is chosen at a composite node u. For each field F of the constructor, an embedding e_F is concatenated with an attention-based context vector c and fed through a feedforward neural network f_C to obtain a context-dependent field embedding:

    ẽ_F = f_C(e_F, c).

An intermediate vertical state for the field F at composite node u is then computed as

    v_{u,F} = LSTM_v(v_u, ẽ_F).

Figure 5b illustrates the process, starting with a single vertical LSTM state and ending with one updated state per field.

Figure 5: The module classes constituting our decoder. (a) A composite type module choosing a constructor for the corresponding type. (b) A constructor module computing updated vertical LSTM states. (c) A constructor field module (sequential cardinality) generating children to populate the field; at each step, the module decides whether to generate a child and continue (white circle) or stop (black circle). (d) A primitive type module choosing a value from a closed list. For brevity, we omit the cardinality modules for singular and optional cardinalities.
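The sketch below illustrates the composite type and constructor modules under the same kind of stand-ins: an assumed constructor inventory for stmt, a tanh cell in place of LSTM_v, and random weights in place of the learned f_T and f_C.

import numpy as np

rng = np.random.default_rng(0)
HID, EMB = 8, 5

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Composite type module for type stmt: p(C | T, v) = softmax(f_T(v))_C
STMT_CONSTRUCTORS = ["FunctionDef", "ClassDef", "Return", "If", "For", "While", "Assign"]
W_T = rng.normal(size=(len(STMT_CONSTRUCTORS), HID))

def composite_type_module(v):
    probs = softmax(W_T @ v)                    # distribution over constructors
    return STMT_CONSTRUCTORS[int(np.argmax(probs))], probs

# Constructor module for ClassDef: one intermediate vertical state per field.
FIELDS = ["name", "bases", "body"]
field_emb = {F: rng.normal(size=EMB) for F in FIELDS}    # e_F
W_C = rng.normal(size=(EMB, EMB + HID))                  # stand-in for f_C
W_v = rng.normal(size=(HID, EMB))                        # tanh cell in place of LSTM_v
U_v = rng.normal(size=(HID, HID))

def constructor_module(v_u, context):
    states = {}
    for F in FIELDS:
        e_tilde = np.tanh(W_C @ np.concatenate([field_emb[F], context]))  # e~_F = f_C(e_F, c)
        states[F] = np.tanh(W_v @ e_tilde + U_v @ v_u)                    # v_{u,F}
    return states

v = rng.normal(size=HID)                                 # current vertical LSTM state
constructor, probs = composite_type_module(v)
field_states = constructor_module(v, context=rng.normal(size=HID))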

Constructor field modules. Each field F of a constructor has a corresponding module whose role is to determine the number of children associated with that field and to propagate an updated vertical LSTM state to them. In the case of fields with singular cardinality, the decision and update are both vacuous, as exactly one child is always generated. Hence these modules forward the field vertical LSTM state v_{u,F} unchanged to the child w corresponding to F:

    v_w = v_{u,F}.    (1)

Fields with optional cardinality can have either zero or one children; this choice is made using a feedforward network applied to the vertical LSTM state:

    p(z_F = 1 | v_{u,F}) = sigmoid(f_F^gen(v_{u,F})).    (2)

If a child is to be generated, then as in (1), the state is propagated forward without modification.

In the case of sequential fields, a horizontal LSTM is employed for both child decisions and state updates. We refer to Figure 5c for an illustration of the recurrent process. After being initialized with a transformation of the vertical state, s_{F,0} = W_F v_{u,F}, the horizontal LSTM iteratively decides whether to generate another child by applying a modified form of (2):

    p(z_{F,i} = 1 | s_{F,i-1}, v_{u,F}) = sigmoid(f_F^gen(s_{F,i-1}, v_{u,F})).

If z_{F,i} = 0, generation stops and the process terminates, as represented by the solid black circle in Figure 5c. Otherwise, the process continues, as represented by the white circle in Figure 5c. In that case, the horizontal state s_{u,i-1} is combined with the vertical state v_{u,F} and an attention-based context vector c_{F,i} using a feedforward network f_F^update to obtain a joint context-dependent encoding of the field F and the position i:

    ẽ_{F,i} = f_F^update(v_{u,F}, s_{u,i-1}, c_{F,i}).

The result is used to perform a vertical LSTM update for the corresponding child w_i:

    v_{w_i} = LSTM_v(v_{u,F}, ẽ_{F,i}).

Finally, the horizontal LSTM state is updated using the same field-position encoding, and the process continues:

    s_{u,i} = LSTM_h(s_{u,i-1}, ẽ_{F,i}).
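A sketch of the sequential-cardinality loop just described, with the same caveats: tanh cells replace the horizontal and vertical LSTMs, the attention context is faked, and the 0.5 threshold mimics greedy decoding.

import numpy as np

rng = np.random.default_rng(0)
HID = 8

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in parameters for one sequential field F.
W_F = rng.normal(size=(HID, HID))         # s_{F,0} = W_F v_{u,F}
w_gen = rng.normal(size=2 * HID)          # f_F^gen
W_upd = rng.normal(size=(HID, 3 * HID))   # f_F^update
W_v, U_v = rng.normal(size=(HID, HID)), rng.normal(size=(HID, HID))  # LSTM_v stand-in
W_h, U_h = rng.normal(size=(HID, HID)), rng.normal(size=(HID, HID))  # LSTM_h stand-in

def generate_sequential_children(v_uF, get_context, max_children=10):
    children = []
    s = W_F @ v_uF                                        # initialize horizontal state
    for i in range(max_children):
        p_gen = sigmoid(w_gen @ np.concatenate([s, v_uF]))
        if p_gen < 0.5:                                   # stop: black circle in Figure 5c
            break
        c = get_context(s, v_uF)                          # attention-based context c_{F,i}
        e_tilde = np.tanh(W_upd @ np.concatenate([v_uF, s, c]))  # e~_{F,i}
        v_child = np.tanh(W_v @ e_tilde + U_v @ v_uF)            # vertical update for child w_i
        children.append(v_child)
        s = np.tanh(W_h @ e_tilde + U_h @ s)                     # horizontal update
    return children

children = generate_sequential_children(
    rng.normal(size=HID), get_context=lambda s, v: np.zeros(HID))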
Primitive type modules. Each primitive type T has a corresponding module whose role is to select among the values y within the domain of that type. Figure 5d presents an example of the simplest form of this selection process, where the value y is obtained from a closed list via a softmax layer applied to an incoming vertical LSTM state:

    p(y | T, v) = softmax(f_T(v))_y.

Some string-valued types are open class, however. To deal with these, we allow generation both from a closed list of previously seen values, as in Figure 5d, and synthesis of new values. Synthesis is delegated to a character-level LSTM language model (Bengio et al., 2003), and part of the role of the primitive module for open-class types is to choose whether to synthesize a new value or not. During training, we allow the model to use the character LSTM only for unknown strings, but we include the log probability of that binary decision in the loss in order to ensure the model learns when to generate from the character LSTM.

3.3 Decoding Process

The decoding process proceeds through mutual recursion between the constituting modules, where the syntactic structure of the output tree mirrors the call graph of the generation procedure. At each step, the active decoder module either makes a generation decision, propagates state down the tree, or both.

To construct a composite node of a given type, the decoder calls the appropriate composite type module to obtain a constructor and its associated module. That module is then invoked to obtain updated vertical LSTM states for each of the constructor's fields, and the corresponding constructor field modules are invoked to advance the process to those children.

This process continues downward, stopping at each primitive node, where a value is generated but no further recursion is carried out.

3.4 Attention

Following standard practice for sequence-to-sequence models, we compute a raw bilinear attention score q_t^raw for each token t in the input using the decoder's current state x and the token's encoding e_t:

    q_t^raw = e_t^T W x.

The current state x can be either the vertical LSTM state in isolation or a concatenation of the vertical LSTM state with either a horizontal LSTM state or a character LSTM state (for string generation). Each submodule that computes attention does so using a separate matrix W.

A separate attention score q_c^comp is computed for each component of the input, independent of its content:

    q_c^comp = w_c^T x.

The final token-level attention scores are the sums of the raw token-level scores and the corresponding component-level scores:

    q_t = q_t^raw + q_{c(t)}^comp,

where c(t) denotes the component in which token t occurs. The attention weight vector a is then computed using a softmax:

    a = softmax(q).

Given the weights, the attention-based context is given by

    c = Σ_t a_t e_t.

Certain decision points that require attention have been highlighted in the description above; however, in our final implementation we made attention available to the decoder at all decision points.

Supervised attention. In the datasets we consider, partial or total copying of input tokens into primitive nodes is quite common. Rather than providing an explicit copying mechanism (Ling et al., 2016), we instead generate alignments where possible to define a set of tokens on which the attention at a given primitive node should be concentrated. (Alignments are generated using an exact string match heuristic that also includes some limited normalization, primarily splitting of special characters, undoing camel case, and lemmatization for the semantic parsing datasets.) If no matches are found, the corresponding set of tokens is taken to be the whole input.

The attention supervision enters the loss through a term that encourages the final attention weights to be concentrated on the specified subset. Formally, if the matched subset of component-token pairs is S, the loss term associated with the supervision is

    log Σ_t exp(a_t) − log Σ_{t∈S} exp(a_t),    (3)

where a_t is the attention weight associated with token t, and the sum in the first term ranges over all tokens in the input. The loss in (3) can be interpreted as the negative log probability of attending to some token in S.
4 Experimental evaluation

4.1 Semantic parsing

Data. We use three semantic parsing datasets: JOBS, GEO, and ATIS. All three consist of natural language queries paired with a logical representation of their denotations. JOBS consists of 640 such pairs, with Prolog-style logical representations, while GEO and ATIS consist of 880 and 5,410 such pairs, respectively, with λ-calculus logical forms. We use the same training-test split as Zettlemoyer and Collins (2005) for JOBS and GEO, and the standard training-development-test split for ATIS. We use the preprocessed versions of these datasets made available by Dong and Lapata (2016), where text in the input has been lowercased and stemmed using NLTK (Bird et al., 2009), and matching entities appearing in the same input-output pair have been replaced by numbered abstract identifiers of the same type.

Evaluation. We compute accuracies using tree exact match for evaluation. Following the publicly released code of Dong and Lapata (2016), we canonicalize the order of the children within conjunction and disjunction nodes to avoid spurious errors, but otherwise perform no transformations before comparison.

4.2 Code generation

Data. We use the HEARTHSTONE dataset introduced by Ling et al. (2016), which consists of 665 cards paired with their implementations in the open-source Hearthbreaker engine, available online at https://github.com/danielyule/hearthbreaker. Our training-development-test split is identical to that of Ling et al. (2016), with split sizes of 533, 66, and 66, respectively.

Cards contain two kinds of components: textual components that contain the card's name and a description of its function, and categorical ones that contain numerical attributes (attack, health, cost, and durability) or enumerated attributes (rarity, type, race, and class). The name of the card is represented as a sequence of characters, while its description consists of a sequence of tokens split on whitespace and punctuation. All categorical components are represented as single-token sequences.

Evaluation. For direct comparison to the results of Ling et al. (2016), we evaluate our predicted code based on exact match and token-level BLEU relative to the reference implementations from the library. We additionally compute node-based precision, recall, and F1 scores for our predicted trees compared to the reference code ASTs. Formally, these scores are obtained by defining the intersection of the predicted and gold trees as their largest common tree prefix.
           ATIS                       GEO                        JOBS
System        Accuracy    System        Accuracy    System        Accuracy
ZH15          84.2        ZH15          88.9        ZH15          85.0
ZC07          84.6        KCAZ13        89.0        PEK03         88.0
WKZ14         91.3        WKZ14         90.4        LJK13         90.7
DL16          84.6        DL16          87.1        DL16          90.0
ASN           85.3        ASN           85.7        ASN           91.4
+ SUPATT      85.9        + SUPATT      87.1        + SUPATT      92.9

Table 1: Accuracies for the semantic parsing tasks. ASN denotes our abstract syntax network framework. SUPATT refers to the supervised attention mentioned in Section 3.4.

4.3 Settings

For each experiment, all feedforward and LSTM hidden dimensions are set to the same value. We select the dimension from {30, 40, 50, 60, 70} for the smaller JOBS and GEO datasets, or from {50, 75, 100, 125, 150} for the larger ATIS and HEARTHSTONE datasets. The dimensionality used for the inputs to the encoder is set to 100 in all cases. We apply dropout to the non-recurrent connections of the vertical and horizontal LSTMs, selecting the noise ratio from {0.2, 0.3, 0.4, 0.5}. All parameters are randomly initialized using Glorot initialization (Glorot and Bengio, 2010).

We perform 200 passes over the data for the JOBS and GEO experiments, or 400 passes for the ATIS and HEARTHSTONE experiments. Early stopping based on exact match is used for the semantic parsing experiments, where performance is evaluated on the training set for JOBS and GEO or on the development set for ATIS. Parameters for the HEARTHSTONE experiments are selected based on development BLEU scores. In order to promote generalization, ties are broken in all cases with a preference toward higher dropout ratios and lower dimensionalities, in that order.

Our system is implemented in Python using the DyNet neural network library (Neubig et al., 2017). We use the Adam optimizer (Kingma and Ba, 2014) with its default settings for optimization, with a batch size of 20 for the semantic parsing experiments, or a batch size of 10 for the HEARTHSTONE experiments.

4.4 Results

Our results on the semantic parsing datasets are presented in Table 1. Our basic system achieves a new state-of-the-art accuracy of 91.4% on the JOBS dataset, and this number improves to 92.9% when supervised attention is added. On the ATIS and GEO datasets, we respectively exceed and match the results of Dong and Lapata (2016). However, these fall short of the previous best results of 91.3% and 90.4%, respectively, obtained by Wang et al. (2014). This difference may be partially attributable to the use of typing information or rich lexicons in most previous semantic parsing approaches (Zettlemoyer and Collins, 2007; Kwiatkowski et al., 2013; Wang et al., 2014; Zhao and Huang, 2015).

On the HEARTHSTONE dataset, we improve significantly over the initial results of Ling et al. (2016) across all evaluation metrics, as shown in Table 2. On the more stringent exact match metric, we improve from 6.1% to 18.2%, and on token-level BLEU, we improve from 67.1 to 77.6. When supervised attention is added, we obtain an additional increase of several points on each scale, achieving peak results of 22.7% accuracy and 79.2 BLEU.

System      Accuracy   BLEU   F1
NEAREST     3.0        65.0   65.7
LPN         6.1        67.1   –
ASN         18.2       77.6   72.4
+ SUPATT    22.7       79.2   75.6

Table 2: Results for the HEARTHSTONE task. SUPATT refers to the system with supervised attention mentioned in Section 3.4. LPN refers to the system of Ling et al. (2016). Our nearest neighbor baseline NEAREST follows that of Ling et al. (2016), though it performs somewhat better; its nonzero exact match number stems from spurious repetition in the data.

class IronbarkProtector(MinionCard):
    def __init__(self):
        super().__init__(
            'Ironbark Protector', 8,
            CHARACTER_CLASS.DRUID,
            CARD_RARITY.COMMON)

    def create_minion(self, player):
        return Minion(
            8, 8, taunt=True)

Figure 6: Cards with minimal descriptions exhibit a uniform structure that our system almost always predicts correctly, as in this instance.

class ManaWyrm(MinionCard):
    def __init__(self):
        super().__init__(
            'Mana Wyrm', 1,
            CHARACTER_CLASS.MAGE,
            CARD_RARITY.COMMON)

    def create_minion(self, player):
        return Minion(
            1, 3, effects=[
                Effect(
                    SpellCast(),
                    ActionTag(
                        Give(ChangeAttack(1)),
                        SelfSelector()))
            ])

Figure 7: For many cards with moderately complex descriptions, the implementation follows a functional style that seems to suit our modeling strategy, usually leading to correct predictions.

4.5 Error Analysis and Discussion

As the examples in Figures 6-8 show, classes in the HEARTHSTONE dataset share a great deal of common structure. As a result, in the simplest cases, such as in Figure 6, generating the code is simply a matter of matching the overall structure and plugging in the correct values in the initializer and a few other places. In such cases, our system generally predicts the correct code, with the exception of instances in which strings are incorrectly transduced.
class MultiShot(SpellCard):
    def __init__(self):
        super().__init__(
            'Multi-Shot', 4,
            CHARACTER_CLASS.HUNTER,
            CARD_RARITY.FREE)

    def use(self, player, game):
        super().use(player, game)
        targets = copy.copy(
            game.other_player.minions)
        for i in range(0, 2):
            target = game.random_choice(targets)
            targets.remove(target)
            target.damage(
                player.effective_spell_damage(3),
                self)

    def can_use(self, player, game):
        return (
            super().can_use(player, game) and
            (len(game.other_player.minions) >= 2))

class MultiShot(SpellCard):
    def __init__(self):
        super().__init__(
            'Multi-Shot', 4,
            CHARACTER_CLASS.HUNTER,
            CARD_RARITY.FREE)

    def use(self, player, game):
        super().use(player, game)
        minions = copy.copy(
            game.other_player.minions)
        for i in range(0, 3):
            minion = game.random_choice(minions)
            minions.remove(minion)

    def can_use(self, player, game):
        return (
            super().can_use(player, game) and
            len(game.other_player.minions) >= 3)

Figure 8: Cards with nontrivial logic expressed in an imperative style are the most challenging for our system. In this example, our prediction comes close to the gold code, but misses an important statement in addition to making a few other minor errors. Gold code (top); predicted code (bottom).

Introducing a dedicated copying mechanism like the one used by Ling et al. (2016), or more specialized machinery for string transduction, may alleviate this latter problem.

The next simplest category of card-code pairs consists of those in which the card's logic is mostly implemented via nested function calls. Figure 7 illustrates a typical case, in which the card's effect is triggered by a game event (a spell being cast) and both the trigger and the effect are described by arguments to an Effect constructor. Our system usually also performs well on instances like these, apart from idiosyncratic errors that can take the form of under- or overgeneration or simply substitution of incorrect predicates.

Cards whose code includes complex logic expressed in an imperative style, as in Figure 8, pose the greatest challenge for our system. Factors like variable naming, nontrivial control flow, and the interleaving of code predictable from the description with code required by the conventions of the library combine to make the code for these cards difficult to generate. In some instances (as in the figure), our system is nonetheless able to synthesize a close approximation. However, in the most complex cases, the predictions deviate significantly from the correct implementation.

In addition to the specific errors our system makes, some larger issues remain unresolved. Existing evaluation metrics only approximate the actual metric of interest: functional equivalence. Modifications of BLEU, tree F1, and exact match that canonicalize the code, for example by anonymizing all variables, may prove more meaningful. Direct evaluation of functional equivalence is of course impossible in general (Sipser, 2006), and practically challenging even for the HEARTHSTONE dataset because it requires integrating with the game engine.

Existing work also does not attempt to enforce semantic coherence in the output. Long-distance semantic dependencies, for example between occurrences of a single variable, are not modeled, and neither are well-typedness or executability. Overcoming these evaluation and modeling issues remains an important open problem.

5 Conclusion

ASNs provide a modular encoder-decoder architecture that can readily accommodate a variety of tasks with structured output spaces. They are particularly applicable in the presence of recursive decompositions, where they can provide a simple decoding process that closely parallels the inherent structure of the outputs. Our results demonstrate their promise for tree prediction tasks, and we believe their application to more general output structures is an interesting avenue for future work.

Acknowledgments

MR is supported by an NSF Graduate Research Fellowship and a Fannie and John Hertz Foundation Google Fellowship. MS is supported by an NSF Graduate Research Fellowship.
References

Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. 2006. Compilers: Principles, Techniques, and Tools (2nd edition). Addison-Wesley Longman, Boston, MA, USA.

Miltiadis Allamanis, Daniel Tarlow, Andrew D. Gordon, and Yi Wei. 2015. Bimodal modelling of source code and natural language. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pages 2123-2132.

David Alvarez-Melis and Tommi S. Jaakkola. 2017. Tree-structured decoding with doubly-recurrent neural networks. In Proceedings of the International Conference on Learning Representations (ICLR 2017).

Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Neural module networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016).

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR abs/1409.0473.

Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2016. DeepCoder: Learning to write programs. CoRR abs/1611.01989.

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research 3:1137-1155.

Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly Media, 1st edition.

Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), pages 740-750.

James Cross and Liang Huang. 2016. Span-based constituency parsing with a structure-label system and provably optimal dynamic oracles. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), pages 1-11.

Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. CoRR abs/1601.01280.

Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. 2016. Recurrent neural network grammars. In Proceedings of NAACL HLT 2016, pages 199-209.

Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2010).

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR abs/1412.6980.

Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke S. Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), pages 1545-1556.

Percy Liang, Michael I. Jordan, and Dan Klein. 2010. Learning programs: A hierarchical Bayesian approach. In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pages 639-646.

Percy Liang, Michael I. Jordan, and Dan Klein. 2013. Learning dependency-based compositional semantics. Computational Linguistics 39(2):389-446.

Wang Ling, Phil Blunsom, Edward Grefenstette, Karl Moritz Hermann, Tomás Kociský, Fumin Wang, and Andrew Senior. 2016. Latent predictor networks for code generation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Volume 1: Long Papers.

Chris J. Maddison and Daniel Tarlow. 2014. Structured generative models of natural source code. In Proceedings of the 31st International Conference on Machine Learning (ICML 2014), pages 649-657.

Aditya Krishna Menon, Omer Tamuz, Sumit Gulwani, Butler W. Lampson, and Adam Kalai. 2013. A machine learning framework for programming by example. In Proceedings of the 30th International Conference on Machine Learning (ICML 2013), pages 187-195.

Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, and Pengcheng Yin. 2017. DyNet: The dynamic neural network toolkit. arXiv preprint arXiv:1701.03980.

Ana-Maria Popescu, Oren Etzioni, and Henry Kautz. 2003. Towards a theory of natural language interfaces to databases. In Proceedings of the 8th International Conference on Intelligent User Interfaces, pages 149-157.

Richard Shin, Alexander A. Alemi, Geoffrey Irving, and Oriol Vinyals. 2017. Tree-structured variational autoencoder. In Proceedings of the International Conference on Learning Representations (ICLR 2017).

Michael Sipser. 2006. Introduction to the Theory of Computation. Course Technology, 2nd edition.

Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey E. Hinton. 2015. Grammar as a foreign language. In Advances in Neural Information Processing Systems 28 (NIPS 2015), pages 2773-2781.

Adrienne Wang, Tom Kwiatkowski, and Luke S. Zettlemoyer. 2014. Morpho-syntactic lexical generalization for CCG semantic parsing. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), pages 1284-1295.

Daniel C. Wang, Andrew W. Appel, Jeff L. Korn, and Christopher S. Serra. 1997. The Zephyr abstract syntax description language. In Proceedings of the Conference on Domain-Specific Languages (DSL 1997), pages 17-17. USENIX Association.

Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI 2005), pages 658-666.

Luke S. Zettlemoyer and Michael Collins. 2007. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007), pages 678-687.

Kai Zhao and Liang Huang. 2015. Type-driven incremental semantic parsing with polymorphism. In Proceedings of NAACL HLT 2015, pages 1416-1421.

A Appendix

expr
  = Apply(pred predicate, arg* arguments)
  | Not(expr argument)
  | Or(expr left, expr right)
  | And(expr* arguments)

arg
  = Literal(lit literal)
  | Variable(var variable)

Figure 9: The Prolog-style grammar we use for the JOBS task.

expr
  = Variable(var variable)
  | Entity(ent entity)
  | Number(num number)
  | Apply(pred predicate, expr* arguments)
  | Argmax(var variable, expr domain, expr body)
  | Argmin(var variable, expr domain, expr body)
  | Count(var variable, expr body)
  | Exists(var variable, expr body)
  | Lambda(var variable, var_type type, expr body)
  | Max(var variable, expr body)
  | Min(var variable, expr body)
  | Sum(var variable, expr domain, expr body)
  | The(var variable, expr body)
  | Not(expr argument)
  | And(expr* arguments)
  | Or(expr* arguments)
  | Compare(cmp_op op, expr left, expr right)

cmp_op = Equal | LessThan | GreaterThan

Figure 10: The λ-calculus grammar used by our system.
