
From Natural Language Specifications to Program Input Parsers

The MIT Faculty has made this article openly available. Please share
how this access benefits you. Your story matters.

Citation:       Lei, Tao; Long, Fan; Barzilay, Regina; Rinard, Martin C. "From Natural Language
                Specifications to Program Input Parsers". The 51st Annual Meeting of the
                Association for Computational Linguistics (ACL 2013).
As Published:   http://acl2013.org/site/accepted/299.html
Publisher:      Association for Computational Linguistics (ACL)
Version:        Author's final manuscript
Accessed:       Wed Oct 03 04:13:15 EDT 2018
Citable Link:   http://hdl.handle.net/1721.1/79643
Terms of Use:   Creative Commons Attribution-Noncommercial-Share Alike 3.0
Detailed Terms: http://creativecommons.org/licenses/by-nc-sa/3.0/
From Natural Language Specifications to Program Input Parsers

Tao Lei, Fan Long, Regina Barzilay, and Martin Rinard


Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
{taolei, fanl, regina, rinard}@csail.mit.edu

Abstract

We present a method for automatically generating input parsers from English
specifications of input file formats. We use a Bayesian generative model to
capture relevant natural language phenomena and translate the English
specification into a specification tree, which is then translated into a C++
input parser. We model the problem as a joint dependency parsing and semantic
role labeling task. Our method is based on two sources of information: (1) the
correlation between the text and the specification tree and (2) noisy
supervision as determined by the success of the generated C++ parser in
reading input examples. Our results show that our approach achieves 80.0%
F-Score accuracy compared to an F-Score of 66.7% produced by a
state-of-the-art semantic parser on a dataset of input format specifications
from the ACM International Collegiate Programming Contest (which were written
in English for humans with no intention of providing support for automated
processing).1

1 The code, data, and experimental setup for this research are available at
http://groups.csail.mit.edu/rbg/code/nl2p

(a) Text Specification:
The input contains a single integer T that indicates the number of test cases.
Then follow the T cases. Each test case begins with a line contains an integer
N, representing the size of wall. The next N lines represent the original
wall. Each line contains N characters. The j-th character of the i-th line
figures out the color ...

(b) Specification Tree:
the input
    a single integer T
    test cases
        an integer N
        the next N lines
            N characters

(c) Two Program Input Examples:
    1                    2
    10                   1
    YYWYYWWWWW           Y
    YWWWYWWWWW           5
    YYWYYWWWWW           YWYWW
    ...                  ...
    WWWWWWWWWW           WWYYY

Figure 1: An example of (a) one natural language specification describing
program input data; (b) the corresponding specification tree representing the
program input structure; and (c) two input examples.

1 Introduction

The general problem of translating natural language specifications into
executable code has been around since the field of computer science was
founded. Early attempts to solve this problem produced what were essentially
verbose, clumsy, and ultimately unsuccessful versions of standard formal
programming languages. In recent years, however, researchers have had success
addressing specific aspects of this problem. Recent advances in this area
include the successful translation of natural language commands into database
queries (Wong and Mooney, 2007; Zettlemoyer and Collins, 2009; Poon and
Domingos, 2009; Liang et al., 2011) and the successful mapping of natural
language instructions into Windows command sequences (Branavan et al., 2009;
Branavan et al., 2010).

In this paper we explore a different aspect of this general problem: the
translation of natural language input specifications into executable code that
correctly parses the input data and generates data structures for holding the
data.
The need to automate this task arises because input format specifications are
almost always described in natural languages, with these specifications then
manually translated by a programmer into the code for reading the program
inputs. Our method highlights the potential to automate this translation,
thereby eliminating the manual software development overhead.

Consider the text specification in Figure 1a. If the desired parser is
implemented in C++, it should create a C++ class whose instance objects hold
the different fields of the input. For example, one of the fields of this
class is an integer, i.e., "a single integer T" identified in the text
specification in Figure 1a. Instead of directly generating code from the text
specification, we first translate the specification into a specification tree
(see Figure 1b), then map this tree into parser code (see Figure 2). We focus
on the translation from the text specification to the specification tree.2

We assume that each text specification is accompanied by a set of input
examples that the desired input parser is required to successfully read. In
standard software development contexts, such input examples are usually
available and are used to test the correctness of the input parser. Note that
this source of supervision is noisy: the generated parser may still be
incorrect even when it successfully reads all of the input examples.
Specifically, the parser may interpret the input examples differently from the
text specification. For example, the program input in Figure 1c can be
interpreted simply as a list of strings. The parser may also fail to parse
some correctly formatted input files not in the set of input examples.
Therefore, our goal is to design a technique that can effectively learn from
this weak supervision.

We model our problem as a joint dependency parsing and role labeling task,
assuming a Bayesian generative process. The distribution over the space of
specification trees is informed by two sources of information: (1) the
correlation between the text and the corresponding specification tree and (2)
the success of the generated parser in reading input examples. Our method uses
a joint probability distribution to take both of these sources of information
into account, and uses a sampling framework for the inference of specification
trees given text specifications. A specification tree is rejected in the
sampling framework if the corresponding code fails to successfully read all of
the input examples. The sampling framework also rejects the tree if the
text/specification tree pair has low probability.

We evaluate our method on a dataset of input specifications from ACM
International Collegiate Programming Contests, along with the corresponding
input examples. These specifications were written for human programmers with
no intention of providing support for automated processing. However, when
trained using the noisy supervision, our method achieves substantially more
accurate translations than a state-of-the-art semantic parser (Clarke et al.,
2010) (specifically, 80.0% in F-Score compared to an F-Score of 66.7%). The
strength of our model in the face of such weak supervision is also highlighted
by the fact that it retains an F-Score of 77% even when only one input example
is provided for each input specification.

2 During the second step of the process, the specification tree is
deterministically translated into code.

    struct TestCaseType {
        int N;
        vector<NLinesType*> lstLines;
        InputType* pParentLink;
    };

    struct InputType {
        int T;
        vector<TestCaseType*> lstTestCase;
    };

    TestCaseType* ReadTestCase(FILE* pStream, InputType* pParentLink) {
        TestCaseType* pTestCase = new TestCaseType;
        pTestCase->pParentLink = pParentLink;

        ...

        return pTestCase;
    }

    InputType* ReadInput(FILE* pStream) {
        InputType* pInput = new InputType;

        pInput->T = ReadInteger(pStream);
        for (int i = 0; i < pInput->T; ++i) {
            TestCaseType* pTestCase = ReadTestCase(pStream, pInput);
            pInput->lstTestCase.push_back(pTestCase);
        }

        return pInput;
    }

Figure 2: Input parser code for reading input files specified in Figure 1.
(a) Text Specification:
Your program is supposed to read the input from the standard input and write
its output to the standard output. The first line of the input contains one
integer N. N lines follow, the i-th of them contains two real numbers Xi, Yi
separated by a single space - the coordinates of the i-th house. Each of the
following lines contains four real numbers separated by a single space. These
numbers are the coordinates of two different points (X1, Y1) and (X2, Y2),
lying on the highway.

(b) Specification Tree:
the input
    one integer N
    N lines
        two real numbers Xi, Yi
    the following lines
        four real numbers

(c) Formal Input Grammar Definition:
    Input          := N  Lines [size = N]  FollowingLines [size = *]
    N              := int
    Lines          := Xi Yi
    Xi             := float
    Yi             := float
    FollowingLines := F1 F2 F3 F4
    F1             := float
    ...

Figure 3: An example of generating input parser code from text: (a) a natural
language input specification; (b) a specification tree representing the input
format structure (we omit the background phrases in this tree in order to give
a clear view of the input format structure); and (c) formal definition of the
input format constructed from the specification tree, represented as a
context-free grammar in Backus-Naur Form with additional size constraints.

2 Related Work

Learning Meaning Representation from Text  Mapping sentences into structural
meaning representations is an active and extensively studied task in NLP.
Examples of meaning representations considered in prior research include
logical forms based on database query (Tang and Mooney, 2000; Zettlemoyer and
Collins, 2005; Kate and Mooney, 2007; Wong and Mooney, 2007; Poon and
Domingos, 2009; Liang et al., 2011; Goldwasser et al., 2011), semantic frames
(Das et al., 2010; Das and Smith, 2011) and database records (Chen and Mooney,
2008; Liang et al., 2009).

Learning Semantics from Feedback  Our approach is related to recent research
on learning from indirect supervision. Examples include leveraging feedback
available via responses from a virtual world (Branavan et al., 2009) or from
executing predicted database queries (Chang et al., 2010; Clarke et al.,
2010). While Branavan et al. (2009) formalize the task as a sequence of
decisions and learn from local rewards in a Reinforcement Learning framework,
our model learns to predict the whole structure at a time. Another difference
is the way our model incorporates the noisy feedback. While previous
approaches rely on the feedback to train a discriminative prediction model,
our approach models a generative process to guide structure predictions when
the feedback is noisy or unavailable.

NLP in Software Engineering  Researchers have recently developed a number of
approaches that apply natural language processing techniques to software
engineering problems. Examples include analyzing API documents to infer API
library specifications (Zhong et al., 2009; Pandita et al., 2012) and
analyzing code comments to detect concurrency bugs (Tan et al., 2007; Tan et
al., 2011). This research analyzes natural language in documentation or
comments to better understand existing application programs. Our mechanism, in
contrast, automatically generates parser programs from natural language input
format descriptions.

3 Problem Formulation

The task of translating text specifications to input parsers consists of two
steps, as shown in Figure 3. First, given a text specification describing an
input format, we wish to infer a parse tree (which we call a specification
tree) implied by the text. Second, we convert each specification tree into a
formal grammar of the input format (represented in Backus-Naur Form) and then
generate code that reads the input into data structures. In this paper, we
focus on the NLP techniques used in the first step, i.e., learning to infer
the specification trees from text. The second step is achieved using a
deterministic rule-based tool.3

As input, we are given a set of text specifications w = {w^1, ..., w^N}, where
each w^i is a text specification represented as a sequence of noun phrases
{w_k^i}. We use the UIUC shallow parser to preprocess each text specification
into a sequence of noun phrases.4 In addition, we are given a set of input
examples for each w^i. We use these examples to test the generated input
parsers to reject incorrect predictions made by our probabilistic model.

3 Specifically, the specification tree is first translated into the grammar
using a set of rules and seed words that identify basic data types such as
int. Our implementation then generates a top-down parser since the generated
grammar is simple. In general, standard techniques such as Bison and Yacc
(Johnson, 1979) can generate bottom-up parsers given such a grammar.

4 http://cogcomp.cs.illinois.edu/demo/shallowparse/?id=7
We formalize the learning problem as a dependency parsing and role labeling
problem. Our model predicts specification trees t = {t^1, ..., t^N} for the
text specifications, where each specification tree t^i is a dependency tree
over noun phrases {w_k^i}. In general many program input formats are nested
tree structures, in which the tree root denotes the entire chunk of program
input data and each chunk (tree node) can be further divided into sub-chunks
or primitive fields that appear in the program input (see Figure 3). Therefore
our objective is to predict a dependency tree that correctly represents the
structure of the program input.

In addition, the role labeling problem is to assign a tag z_k^i to each noun
phrase w_k^i in a specification tree, indicating whether the phrase is a key
phrase or a background phrase. Key phrases are named entities that identify
input fields or input chunks that appear in the program input data, such as
"the input" or "the following lines" in Figure 3b. In contrast, background
phrases do not define input fields or chunks. These phrases are used to
organize the document (e.g., "your program") or to refer to key phrases
described before (e.g., "each line").

4 Model

We use two kinds of information to bias our model: (1) the quality of the
generated code as measured by its ability to read the given input examples
and (2) the features over the observed text w^i and the hidden specification
tree t^i (this is standard in traditional parsing problems). We combine these
two kinds of information into a Bayesian generative model in which the code
quality of the specification tree is captured by the prior probability P(t)
and the feature observations are encoded in the likelihood probability P(w|t).
The inference jointly optimizes these two factors:

    P(t|w) ∝ P(t) · P(w|t).

Modeling the Generative Process.  We assume the generative model operates by
first generating the model parameters from a set of Dirichlet distributions.
The model then generates text specification trees. Finally, it generates
natural language feature observations conditioned on the hidden specification
trees. The generative process is described formally as follows (an
illustrative scoring sketch appears after this list):

- Generating Model Parameters: For every pair of feature type f and phrase tag
  z, draw a multinomial distribution parameter θ_f^z from a Dirichlet prior
  P(θ_f^z). The multinomial parameters provide the probabilities of observing
  different feature values in the text.

- Generating Specification Tree: For each text specification, draw a
  specification tree t from all possible trees over the sequence of noun
  phrases in this specification. We denote the probability of choosing a
  particular specification tree t as P(t).

  Intuitively, this distribution should assign high probability to good
  specification trees that can produce C++ code that reads all input examples
  without errors. We therefore define P(t) as follows:5

      P(t) = (1/Z) · { 1   if the input parser of tree t reads all input
                           examples without error
                       ε   otherwise }

  where Z is a normalization factor and ε is empirically set to 10^-6. In
  other words, P(·) treats all specification trees that pass the input example
  test as equally probable candidates and inhibits the model from generating
  trees which fail the test. Note that we do not know this distribution a
  priori until the specification trees are evaluated by testing the
  corresponding C++ code. Because it is intractable to test all possible trees
  and all possible generated code for a text specification, we never
  explicitly compute the normalization factor 1/Z of this distribution. We
  therefore use sampling methods to tackle this problem during inference.

- Generating Features: The final step generates lexical and contextual
  features for each tree. For each phrase w_k associated with tag z_k, let w_p
  be its parent phrase in the tree and w_s be the non-background sibling
  phrase to its left in the tree. The model generates the corresponding set of
  features φ(w_p, w_s, w_k) for each text phrase tuple (w_p, w_s, w_k), with
  probability P(φ(w_p, w_s, w_k)). We assume that each feature f_j is
  generated independently:

      P(w|t) = P(φ(w_p, w_s, w_k)) = ∏_{f_j ∈ φ(w_p, w_s, w_k)} θ_{f_j}^{z_k}

  where θ_{f_j}^{z_k} is the j-th component in the multinomial distribution
  θ_f^{z_k}, denoting the probability of observing a feature f_j associated
  with noun phrase w_k labeled with tag z_k. We define a range of features
  that capture the correspondence between the input format and its description
  in natural language. For example, at the unigram level we aim to capture
  that noun phrases containing specific words such as "cases" and "lines" may
  be key phrases (corresponding to data chunks that appear in the input), and
  that verbs such as "contain" may indicate that the next noun phrase is a key
  phrase.

5 When input examples are not available, P(t) is just a uniform distribution.
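As a concrete illustration of how these two factors combine, the following is
a minimal sketch of an unnormalized score for one candidate tree. It is not
the authors' implementation: CompileAndRunParser, Feature, PhraseNode, and the
theta lookup table are hypothetical stand-ins, and the constant 1/Z is ignored
because it cancels during sampling.

    // Illustrative sketch only: unnormalized log P(t) + log P(w|t) for one
    // candidate specification tree. All names below are hypothetical.
    #include <cmath>
    #include <map>
    #include <string>
    #include <vector>

    struct Feature { std::string type, value; };  // e.g. {"Word", "lines"}

    struct PhraseNode {
        std::string tag;                // "key" or "background" (z_k)
        std::vector<Feature> features;  // phi(w_p, w_s, w_k) for this phrase
    };

    // theta[tag][feature type][feature value] = multinomial probability
    typedef std::map<std::string,
            std::map<std::string, std::map<std::string, double> > > Theta;

    // Hypothetical helper: generates C++ from the tree, compiles it, and
    // returns true iff the parser reads every input example without error.
    bool CompileAndRunParser(const std::vector<PhraseNode>& tree,
                             const std::vector<std::string>& inputExamples);

    double LogScore(const std::vector<PhraseNode>& tree,
                    const std::vector<std::string>& inputExamples,
                    const Theta& theta) {
        const double kEpsilon = 1e-6;  // prior mass for trees failing the test
        // log P(t), up to the constant log(1/Z)
        double logPrior =
            CompileAndRunParser(tree, inputExamples) ? 0.0 : std::log(kEpsilon);
        // log P(w|t): product over phrases and their features of theta
        double logLikelihood = 0.0;
        for (size_t k = 0; k < tree.size(); ++k) {
            for (size_t j = 0; j < tree[k].features.size(); ++j) {
                const Feature& f = tree[k].features[j];
                logLikelihood +=
                    std::log(theta.at(tree[k].tag).at(f.type).at(f.value));
            }
        }
        return logPrior + logLikelihood;
    }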
The full joint probability of a set w of N specifications and hidden text
specification trees t is defined as:

    P(θ, t, w) = P(θ) ∏_{i=1..N} P(t^i) P(w^i | t^i, θ)
               = P(θ) ∏_{i=1..N} P(t^i) ∏_k P(φ(w_p^i, w_s^i, w_k^i)).

Learning the Model  During inference, we want to estimate the hidden
specification trees t given the observed natural language specifications w,
after integrating the model parameters out, i.e.

    t ∼ P(t|w) = ∫_θ P(t, θ|w) dθ.

We use Gibbs sampling to sample variables t from this distribution. In
general, the Gibbs sampling algorithm randomly initializes the variables and
then iteratively solves one subproblem at a time. The subproblem is to sample
only one variable conditioned on the current values of all other variables. In
our case, we sample one hidden specification tree t^i while holding all other
trees t^{-i} fixed:

    t^i ∼ P(t^i | w, t^{-i})                                   (1)

where t^{-i} = (t^1, ..., t^{i-1}, t^{i+1}, ..., t^N).
However, directly solving subproblem (1) is still hard in our case, so we use
a Metropolis-Hastings sampler similar to those applied in traditional sentence
parsing problems. Specifically, the Hastings sampler approximates (1) by first
drawing a new t^i' from a tractable proposal distribution Q instead of
P(t^i | w, t^{-i}). We choose Q to be:

    Q(t^i' | θ', w^i) ∝ P(w^i | t^i', θ').                      (2)

Then the probability of accepting the new sample is determined using the
typical Metropolis-Hastings process. Specifically, t^i' will be accepted to
replace the last t^i with probability:

    R(t^i, t^i') = min{ 1, [P(t^i' | w, t^{-i}) · Q(t^i  | θ', w^i)] /
                           [P(t^i  | w, t^{-i}) · Q(t^i' | θ', w^i)] }
                 = min{ 1, [P(t^i', t^{-i}, w) · P(w^i | t^i,  θ')] /
                           [P(t^i,  t^{-i}, w) · P(w^i | t^i', θ')] },

in which the normalization factors 1/Z are cancelled out. We choose θ' to be
the parameter expectation based on the current observations, i.e.
θ' = E[θ | w, t^{-i}], so that the proposal distribution is close to the true
distribution. This sampling algorithm with a changing proposal distribution
has been shown to work well in practice (Johnson and Griffiths, 2007; Cohn et
al., 2010; Naseem and Barzilay, 2011). The algorithm pseudo code is shown in
Algorithm 1.

To sample from the proposal distribution (2) efficiently, we implement a
dynamic programming algorithm which calculates the marginal probabilities of
all subtrees. The algorithm works similarly to the inside algorithm (Baker,
1979), except that we do not assume the tree is binary. We therefore perform
one additional dynamic programming step that sums over all possible
segmentations of each span. Once the algorithm obtains the marginal
probabilities of all subtrees, a specification tree can be drawn recursively
in a top-down manner.

Calculating P(t, w) in R(t, t') requires integrating the parameters θ out.
This has a closed form due to the Dirichlet-multinomial conjugacy:

    P(t, w) = P(t) · ∫_θ P(w|t, θ) P(θ) dθ
            ∝ P(t) · ∏ Beta(count(f) + α).

Here α are the Dirichlet hyperparameters and count(f) are the feature counts
observed in data (t, w). The closed form is a product of the Beta functions of
each feature type.
    Feature Type   Description                                                     Feature Value
    Word           each word in noun phrase w_k                                    lines, VAR
    Verb           verbs in noun phrase w_k and the verb phrase before w_k         contains
    Distance       sentence distance between w_k and its parent phrase w_p         1
    Coreference    w_k shares duplicate nouns or variable names with w_p or w_s    True

Table 1: Example of feature types and values. To deal with sparsity, we map
variable names such as "N" and "X" into a category word "VAR" in word
features.

    Input: Set of text specification documents w = {w^1, ..., w^N},
           Number of iterations T
    1   Randomly initialize specification trees t = {t^1, ..., t^N}
    2   for iter = 1 ... T do
    3       Sample tree t^i for i-th document:
    4       for i = 1 ... N do
    5           Estimate model parameters:
    6               θ' = E[θ | w, t^{-i}]
    7           Sample a new specification tree from distribution Q:
    8               t' ∼ Q(t' | θ', w^i)
    9           Generate and test code, and return feedback:
    10              f' = CodeGenerator(w^i, t')
    11          Calculate accept probability r:
    12              r = R(t^i, t')
    13          Accept the new tree with probability r:
    14              With probability r: t^i = t'
    15      end
    16  end
    17  Produce final structures:
    18  return { t^i if t^i gets positive feedback }

Algorithm 1: The sampling framework for learning the model.
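To make the accept step (Algorithm 1, lines 11-14) concrete, here is a minimal
sketch in log space. It is not the authors' implementation: logJoint stands
for log P(t, t^{-i}, w) computed via the Dirichlet-multinomial closed form,
and logProposal stands for log P(w^i | t, θ').

    // Illustrative sketch of the accept step (Algorithm 1, lines 11-14).
    // logJointNew/Old    ~ log P(t', t_-i, w) and log P(t, t_-i, w)
    // logProposalNew/Old ~ log P(w_i | t', theta') and log P(w_i | t, theta')
    // All quantities are unnormalized; the factors 1/Z cancel in the ratio.
    #include <algorithm>
    #include <cmath>
    #include <random>

    bool AcceptNewTree(double logJointNew, double logJointOld,
                       double logProposalNew, double logProposalOld,
                       std::mt19937& rng) {
        // r = min(1, [P(t',t_-i,w) * P(w_i|t, theta')] /
        //            [P(t, t_-i,w) * P(w_i|t',theta')])
        double logR = (logJointNew + logProposalOld)
                    - (logJointOld + logProposalNew);
        double r = std::min(1.0, std::exp(logR));
        std::uniform_real_distribution<double> unif(0.0, 1.0);
        return unif(rng) < r;   // accept t' with probability r
    }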
random input examples.
Model Implementation: We define several types of features to capture the
correlation between the hidden structure and its expression in natural
language. For example, verb features are introduced because certain preceding
verbs such as "contains" and "consists" are good indicators of key phrases.
There are 991 unique features in total in our experiments. Examples of
features appear in Table 1.

We use a small set of 8 seed words to bias the search space. Specifically, we
require each leaf key phrase to contain at least one seed word that identifies
the C++ primitive data type (such as "integer", "float", "byte" and "string").
We also encourage a phrase containing the word "input" to be the root of the
tree (for example, "the input file") and each coreference phrase to be a
background phrase (for example, "each test case" after mentioning "test
cases"), by initially adding pseudo counts to the Dirichlet priors.

5 Experimental Setup

Datasets: Our dataset consists of problem descriptions from ACM International
Collegiate Programming Contests.6 We collected 106 problems from ACM-ICPC
training websites.7 From each problem description, we extracted the portion
that provides input specifications. Because the test input examples are not
publicly available on the ACM-ICPC training websites, for each specification,
we wrote simple programs to generate 100 random input examples.

    Total # of words                            7330
    Total # of noun phrases                     1829
    Vocabulary size                              781
    Avg. # of words per sentence               17.29
    Avg. # of noun phrases per document        17.26
    Avg. # of possible trees per document        52K
    Median # of possible trees per document       79
    Min # of possible trees per document           1
    Max # of possible trees per document          2M

Table 2: Statistics for 106 ICPC specifications.

Table 2 presents statistics for the text specification set. The data set
consists of 424 sentences, where an average sentence contains 17.3 words. The
data set contains 781 unique words. The length of each text specification
varies from a single sentence to eight sentences. The difference between the
average and median number of trees is large. This is because half of the
specifications are relatively simple and have a small number of possible
trees, while a few difficult specifications have thousands of possible trees
(as the number of trees grows exponentially when the text length increases).

6 Official Website: http://cm.baylor.edu/welcome.icpc
7 PKU Online Judge: http://poj.org/; UVA Online Judge: http://uva.onlinejudge.org/
Evaluation Metrics: We evaluate the model performance in terms of its success
in generating a formal grammar that correctly represents the input format (see
Figure 3c). As a gold annotation, we construct formal grammars for all text
specifications. Our results are generated by automatically comparing the
machine-generated grammars with their golden counterparts. If the formal
grammar is correct, then the generated C++ parser will correctly read the
input file into corresponding C++ data structures.

We use Recall and Precision as evaluation measures:

    Recall    = # correct structures / # text specifications
    Precision = # correct structures / # produced structures

where the produced structures are the positive structures returned by our
framework whose corresponding code successfully reads all input examples (see
Algorithm 1 line 18). Note that the number of produced structures may be less
than the number of text specifications, because structures that fail the input
test are not returned.
Baselines: To evaluate the performance of our model, we compare against four
baselines.

The No Learning baseline is a variant of our model that selects a
specification tree without learning feature correspondence. It continues
sampling a specification tree for each text specification until it finds one
which successfully reads all of the input examples.

The second baseline, Aggressive, is a state-of-the-art semantic parsing
framework (Clarke et al., 2010).8 The framework repeatedly predicts hidden
structures (specification trees in our case) using a structure learner, and
trains the structure learner based on the execution feedback of its
predictions. Specifically, at each iteration the structure learner predicts
the most plausible specification tree for each text document:

    t^i = argmax_t f(w^i, t).

Depending on whether the corresponding code reads all input examples
successfully or not, the (w^i, t^i) pairs are added as positive or negative
samples to populate a training set. After each iteration the structure learner
is re-trained with the training samples to improve the prediction accuracy. In
our experiment, we follow (Clarke et al., 2010) and choose a structural
Support Vector Machine, SVMstruct,9 as the structure learner.

The remaining baselines provide an upper bound on the performance of our
model. The baseline Full Model (Oracle) is the same as our full model except
that the feedback comes from an oracle which tells whether the specification
tree is correct or not. We use this oracle information in the prior P(t) in
the same way as we use the noisy feedback. Similarly, the baseline Aggressive
(Oracle) is the Aggressive baseline with access to the oracle.

Experimental Details: Because no human annotation is required for learning, we
train our model and all baselines on all 106 ICPC text specifications (similar
to unsupervised learning). We report results averaged over 20 independent
runs. For each of these runs, the model and all baselines run 100 iterations.
For the Aggressive baseline, in each iteration the SVM structure learner
predicts one tree with the highest score for each text specification. If two
different specification trees of the same text specification get positive
feedback, we take the one generated in the later iteration for evaluation.

8 We take the name Aggressive from this paper.
9 www.cs.cornell.edu/people/tj/svm_light/svm_struct.html

    Model                  Recall   Precision   F-Score
    No Learning             52.0      57.2       54.5
    Aggressive               63.2      70.5       66.7
    Full Model               72.5      89.3       80.0
    Full Model (Oracle)      72.5     100.0       84.1
    Aggressive (Oracle)      80.2     100.0       89.0

Table 3: Average % Recall and % Precision of our model and all baselines over
20 independent runs.

6 Experimental Results

Comparison with Baselines  Table 3 presents the performance of various models
in predicting correct specification trees. As can be seen, our model achieves
an F-Score of 80%. Our model therefore significantly outperforms the No
Learning baseline (by more than 25%). Note that the No Learning baseline
achieves a low Precision of 57.2%. This low precision reflects the noisiness
of the weak supervision: nearly one half of the parsers produced by No
Learning are actually incorrect even though they read all of the input
examples without error. This comparison shows the importance of capturing
correlations between the specification trees and their text descriptions.
(a) The input contains several testcases. Each is specified by two strings
    S, T of alphanumeric ASCII characters

(b) The next N lines of the input file contain the Cartesian coordinates of
    watchtowers, one pair of coordinates per line.

Figure 4: Examples of dependencies and key phrases predicted by our model.
Green marks correct key phrases and dependencies and red marks incorrect ones.
The missing key phrases are marked in gray.

[Plot omitted; x-axis: % supervision]
Figure 5: Precision and Recall of our model by varying the percentage of weak
supervision. The green lines are the performance of the Aggressive baseline
trained with full weak supervision.

[Plot omitted; x-axis: # input examples]
Figure 6: Precision and Recall of our model by varying the number of available
input examples per text specification.

Because our model learns correlations via feature representations, it produces
substantially more accurate translations.

While both the Full Model and the Aggressive baseline use the same source of
feedback, they capitalize on it in a different way. The baseline uses the
noisy feedback to train features capturing the correlation between trees and
text. Our model, in contrast, combines these two sources of information in a
complementary fashion. This combination allows our model to filter false
positive feedback and produce 13% more correct translations than the
Aggressive baseline.

Clean versus Noisy Supervision  To assess the impact of noise on model
accuracy, we compare the Full Model against the Full Model (Oracle). The two
versions achieve very close performance (80% vs. 84% in F-Score), even though
the Full Model is trained with noisy feedback. This demonstrates the strength
of our model in learning from such weak supervision. Interestingly, Aggressive
(Oracle) outperforms our oracle model by a 5% margin. This result shows that
when the supervision is reliable, the generative assumption limits our model's
ability to gain the same performance improvement as discriminative models.

Impact of Input Examples  Our model can also be trained in a fully
unsupervised or a semi-supervised fashion. In real cases, it may not be
possible to obtain input examples for all text specifications. We evaluate
such cases by varying the amount of supervision, i.e., how many text
specifications are paired with input examples. In each run, we randomly select
text specifications and only these selected specifications have access to
input examples. Figure 5 gives the performance of our model from 0%
supervision (totally unsupervised) to 100% supervision (our full model). With
much less supervision, our model is still able to achieve performance
comparable with the Aggressive baseline.

We also evaluate how the number of provided input examples influences the
performance of the model. Figure 6 indicates that the performance is largely
insensitive to the number of input examples: once the model is given even one
input example, its performance is close to the best performance it obtains
with 100 input examples. We attribute this phenomenon to the fact that if the
generated code is incorrect, it is unlikely to successfully parse any input.

Case Study  Finally, we consider some text specifications that our model does
not correctly translate.
In Figure 4a, the program input is interpreted as a list of character strings,
while the correct interpretation is that the input is a list of string pairs.
Note that both interpretations produce C++ input parsers that successfully
read all of the input examples. One possible way to resolve this problem is to
add other features, such as syntactic dependencies between words, to capture
more language phenomena. In Figure 4b, the missing key phrase is not
identified because our model is not able to ground the meaning of "pair of
coordinates" to two integers. Possible future extensions to our model include
using lexicon learning methods for mapping words to C++ primitive types, for
example "coordinates" to ⟨int, int⟩.

7 Conclusion

It is standard practice to write English language specifications for input
formats. Programmers read the specifications, then develop source code that
parses inputs in the format. Known disadvantages of this approach include
development cost, parsers that contain errors, specification
misunderstandings, and specifications that become out of date as the
implementation evolves.

Our results show that taking both the correlation between the text and the
specification tree and the success of the generated C++ parser in reading
input examples into account enables our method to correctly generate C++
parsers for 72.5% of our natural language specifications.

8 Acknowledgements

The authors acknowledge the support of Battelle Memorial Institute (PO
#300662) and the NSF (Grant IIS-0835652). Thanks to Mirella Lapata, members of
the MIT NLP group and the ACL reviewers for their suggestions and comments.
Any opinions, findings, conclusions, or recommendations expressed in this
paper are those of the authors, and do not necessarily reflect the views of
the funding organizations.

References

James K. Baker. 1979. Trainable grammars for speech recognition. In DH Klatt
and JJ Wolf, editors, Speech Communication Papers for the 97th Meeting of the
Acoustical Society of America, pages 547–550.

S. R. K. Branavan, Harr Chen, Luke S. Zettlemoyer, and Regina Barzilay. 2009.
Reinforcement learning for mapping instructions to actions. In Proceedings of
the Annual Meeting of the Association for Computational Linguistics.

S.R.K. Branavan, Luke Zettlemoyer, and Regina Barzilay. 2010. Reading between
the lines: Learning to map high-level instructions to commands. In Proceedings
of ACL, pages 1268–1277.

Mingwei Chang, Vivek Srikumar, Dan Goldwasser, and Dan Roth. 2010. Structured
output learning with indirect supervision. In Proceedings of the 27th
International Conference on Machine Learning.

David L. Chen and Raymond J. Mooney. 2008. Learning to sportscast: A test of
grounded language acquisition. In Proceedings of the 25th International
Conference on Machine Learning (ICML-2008).

James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. 2010. Driving
semantic parsing from the world's response. In Proceedings of the Fourteenth
Conference on Computational Natural Language Learning.

Trevor Cohn, Phil Blunsom, and Sharon Goldwater. 2010. Inducing
tree-substitution grammars. Journal of Machine Learning Research, 11.

Dipanjan Das and Noah A. Smith. 2011. Semi-supervised frame-semantic parsing
for unknown predicates. In Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human Language Technologies, pages
1435–1444.

Dipanjan Das, Nathan Schneider, Desai Chen, and Noah A. Smith. 2010.
Probabilistic frame-semantic parsing. In Human Language Technologies: The 2010
Annual Conference of the North American Chapter of the Association for
Computational Linguistics, pages 948–956.

Dan Goldwasser, Roi Reichart, James Clarke, and Dan Roth. 2011. Confidence
driven unsupervised semantic parsing. In Proceedings of the 49th Annual
Meeting of the Association for Computational Linguistics: Human Language
Technologies - Volume 1, HLT '11.

Mark Johnson and Thomas L. Griffiths. 2007. Bayesian inference for PCFGs via
Markov chain Monte Carlo. In Proceedings of the North American Conference on
Computational Linguistics (NAACL '07).

Stephen C. Johnson. 1979. Yacc: Yet another compiler-compiler. Unix
Programmer's Manual, vol 2b.

Rohit J. Kate and Raymond J. Mooney. 2007. Learning language semantics from
ambiguous supervision. In Proceedings of the 22nd National Conference on
Artificial Intelligence - Volume 1, AAAI'07.
P. Liang, M. I. Jordan, and D. Klein. 2009. Learning semantic correspondences
with less supervision. In Association for Computational Linguistics and
International Joint Conference on Natural Language Processing (ACL-IJCNLP).

P. Liang, M. I. Jordan, and D. Klein. 2011. Learning dependency-based
compositional semantics. In Proceedings of the Annual Meeting of the
Association for Computational Linguistics.

Tahira Naseem and Regina Barzilay. 2011. Using semantic cues to learn syntax.
In Proceedings of the 25th National Conference on Artificial Intelligence
(AAAI).

Rahul Pandita, Xusheng Xiao, Hao Zhong, Tao Xie, Stephen Oney, and Amit
Paradkar. 2012. Inferring method specifications from natural language API
descriptions. In Proceedings of the 2012 International Conference on Software
Engineering, ICSE 2012, pages 815–825, Piscataway, NJ, USA. IEEE Press.

Hoifung Poon and Pedro Domingos. 2009. Unsupervised semantic parsing. In
Proceedings of the 2009 Conference on Empirical Methods in Natural Language
Processing: Volume 1 - Volume 1, EMNLP '09.

Lin Tan, Ding Yuan, Gopal Krishna, and Yuanyuan Zhou. 2007. /* iComment: Bugs
or bad comments? */. In Proceedings of the 21st ACM Symposium on Operating
Systems Principles (SOSP07), October.

Lin Tan, Yuanyuan Zhou, and Yoann Padioleau. 2011. aComment: Mining
annotations from comments and code to detect interrupt-related concurrency
bugs. In Proceedings of the 33rd International Conference on Software
Engineering (ICSE11), May.

Lappoon R. Tang and Raymond J. Mooney. 2000. Automated construction of
database interfaces: integrating statistical and relational learning for
semantic parsing. In Proceedings of the Conference on Empirical Methods in
Natural Language Processing, EMNLP '00.

Yuk Wah Wong and Raymond J. Mooney. 2007. Learning synchronous grammars for
semantic parsing with lambda calculus. In ACL.

Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to
logical form: Structured classification with probabilistic categorial
grammars. In Proceedings of UAI, pages 658–666.

Luke S. Zettlemoyer and Michael Collins. 2009. Learning context-dependent
mappings from sentences to logical form. In Proceedings of the Annual Meeting
of the Association for Computational Linguistics.

Hao Zhong, Lu Zhang, Tao Xie, and Hong Mei. 2009. Inferring resource
specifications from natural language API documentation. In Proceedings of the
2009 IEEE/ACM International Conference on Automated Software Engineering, ASE
'09, pages 307–318, Washington, DC, USA. IEEE Computer Society.
