MIT Open Access Articles

From Natural Language Specifications to Program Input Parsers

The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.

Citation: Lei, Tao; Long, Fan; Barzilay, Regina; Rinard, Martin C. "From Natural Language Specifications to Program Input Parsers." The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013).

As Published: http://acl2013.org/site/accepted/299.html

Publisher: Association for Computational Linguistics (ACL)
Abstract

... translated into a C++ input parser. We model the problem as a joint dependency parsing and semantic role labeling task. Our method is based on two sources of information: (1) the correlation between the text and the specification tree and (2) noisy supervision as determined by the success of the generated C++ parser in reading input examples. Our results show that our approach achieves 80.0% F-Score accuracy compared to an F-Score of 66.7% produced by a state-of-the-art semantic parser on a dataset of input format specifications from the ACM International Collegiate Programming Contest (which were written in English for humans with no intention of providing support for automated processing).1

1 The code, data, and experimental setup for this research are available at http://groups.csail.mit.edu/rbg/code/nl2p

Figure 1: An example of (a) one natural language specification describing program input data; (b) the corresponding specification tree representing the program input structure; and (c) two input examples.
1 Introduction

The general problem of translating natural language specifications into executable code has been around since the field of computer science was founded. Early attempts to solve this problem produced what were essentially verbose, clumsy, and ultimately unsuccessful versions of standard formal programming languages. In recent years, however, researchers have had success addressing specific aspects of this problem. Recent advances in this area include the successful translation of natural language commands into database queries (Wong and Mooney, 2007; Zettlemoyer and Collins, 2009; Poon and Domingos, 2009; Liang et al., 2011) and the successful mapping of natural language instructions into Windows command sequences (Branavan et al., 2009; Branavan et al., 2010).

In this paper we explore a different aspect of this general problem: the translation of natural language input specifications into executable code that correctly parses the input data and generates data structures for holding the data.
The need to automate this task arises because input format specifications are almost always described in natural languages, with these specifications then manually translated by a programmer into the code for reading the program inputs. Our method highlights the potential to automate this translation, thereby eliminating this manual software development step.

To generate code from the text specification, we first translate the specification into a specification tree (see Figure 1b), then map this tree into parser code (see Figure 2). We focus on the translation from the text specification to the specification tree.2

We assume that each text specification is accompanied by a set of input examples that the desired input parser is required to successfully read. In standard software development contexts, such input examples are usually available and are used to test the correctness of the input parser. Note that this source of supervision is noisy: the generated parser may still be incorrect even when it successfully reads all of the input examples. Specifically, the parser may interpret the input examples differently from the text specification. For example, the program input in Figure 1c can be interpreted simply as a list of strings. The parser may also fail to parse some correctly formatted input files not in the set of input examples. Therefore, our goal is to design a technique that can effectively learn from this weak supervision.

We model our problem as a joint dependency parsing and role labeling task, assuming a Bayesian generative process. The distribution over the space of specification trees is informed by two sources of information: (1) the correlation between the text and the corresponding specification tree and (2) the success of the generated parser in reading input examples. Our method uses a joint probability distribution to take both of these sources of information into account, and uses a sampling framework for the inference of specification trees given text specifications. A specification tree is rejected in the sampling framework if the corresponding code fails to successfully read all of the input examples. The sampling framework also rejects the tree if the text/specification tree pair has low probability.

We evaluate our method on a dataset of input specifications from ACM International Collegiate Programming Contests, along with the corresponding input examples. These specifications were written for human programmers with no intention of providing support for automated processing. However, when trained using the noisy supervision, our method achieves substantially more accurate translations than a state-of-the-art semantic parser (Clarke et al., 2010) (specifically, 80.0% in F-Score compared to an F-Score of 66.7%). The strength of our model in the face of such weak supervision is also highlighted by the fact that it retains an F-Score of 77% even when only one input example is provided for each input specification.

2 During the second step of the process, the specification tree is deterministically translated into code.

    struct TestCaseType {
        int N;
        vector<NLinesType*> lstLines;
        InputType* pParentLink;
    };

    struct InputType {
        int T;
        vector<TestCaseType*> lstTestCase;
    };

    // ... (definitions of NLinesType, ReadInteger, and ReadTestCase are omitted in this excerpt)

    InputType* ReadInput(FILE* pStream) {
        InputType* pInput = new InputType;

        pInput->T = ReadInteger(pStream);
        for (int i = 0; i < pInput->T; ++i) {
            TestCaseType* pTestCase = ReadTestCase(pStream, pInput);
            pInput->lstTestCase.push_back(pTestCase);
        }

        return pInput;
    }

Figure 2: Input parser code for reading input files specified in Figure 1.
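The excerpt of Figure 2 above omits the definitions of NLinesType, ReadInteger, and ReadTestCase. The following is a minimal sketch of what those helpers could look like for the format shown in Figure 1 (an integer N followed by N lines of characters). The NLinesType layout and the exact reading logic are assumptions for illustration, not the code produced by the paper's generator; the Figure 2 structs are repeated so the sketch is self-contained.

    #include <cstdio>
    #include <vector>
    using namespace std;

    struct InputType;
    struct TestCaseType;

    // Assumed record for one line of a test case (a row of characters).
    struct NLinesType {
        vector<char> lstChars;
        TestCaseType* pParentLink;
    };

    // Repeated from Figure 2.
    struct TestCaseType {
        int N;
        vector<NLinesType*> lstLines;
        InputType* pParentLink;
    };

    struct InputType {
        int T;
        vector<TestCaseType*> lstTestCase;
    };

    // Reads a single whitespace-delimited integer from the stream.
    int ReadInteger(FILE* pStream) {
        int value = 0;
        fscanf(pStream, "%d", &value);
        return value;
    }

    // Reads one test case: an integer N followed by N lines of characters,
    // following the structure suggested by Figure 1(b).
    TestCaseType* ReadTestCase(FILE* pStream, InputType* pParent) {
        TestCaseType* pTestCase = new TestCaseType;
        pTestCase->pParentLink = pParent;
        pTestCase->N = ReadInteger(pStream);
        for (int i = 0; i < pTestCase->N; ++i) {
            NLinesType* pLine = new NLinesType;
            pLine->pParentLink = pTestCase;
            char buffer[1024];
            if (fscanf(pStream, "%1023s", buffer) == 1) {
                for (int j = 0; buffer[j] != '\0'; ++j)
                    pLine->lstChars.push_back(buffer[j]);
            }
            pTestCase->lstLines.push_back(pLine);
        }
        return pTestCase;
    }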
Text Specification (Figure 3a):

    Your program is supposed to read the input from the standard input and write its output to the standard output.
    The first line of the input contains one integer N.
    N lines follow, the i-th of them contains two real numbers Xi, Yi separated by a single space - the coordinates of the i-th house.
    Each of the following lines contains four real numbers separated by a single space. These numbers are the coordinates of two different points (X1, Y1) and (X2, Y2), lying on the highway.

Specification Tree (Figure 3b): the root node "the input" has children "one integer N", "N lines" (containing "two real numbers Xi, Yi"), and "the following lines" (containing "four real numbers").

Formal Input Grammar Definition (Figure 3c):

    Input := N Lines [size = N] FollowingLines [size = *]
    N := int
    Lines := Xi Yi
    Xi := float
    Yi := float
    FollowingLines := F1 F2 F3 F4
    F1 := float
    ...

Figure 3: An example of generating input parser code from text: (a) a natural language input specification; (b) a specification tree representing the input format structure (we omit the background phrases in this tree in order to give a clear view of the input format structure); and (c) formal definition of the input format constructed from the specification tree, represented as a context-free grammar in Backus-Naur Form with additional size constraints.
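To make the mapping from the Figure 3(c) grammar to parser code concrete, here is a rough C++ sketch of the parser such a grammar corresponds to. The type names, the use of fscanf, and the decision to read FollowingLines until end of file are assumptions made for illustration, not the output of the paper's code generator.

    #include <cstdio>
    #include <vector>
    using namespace std;

    // Illustrative record types corresponding to the grammar nonterminals.
    struct LinesType { float Xi, Yi; };
    struct FollowingLinesType { float F1, F2, F3, F4; };

    struct Figure3InputType {
        int N;
        vector<LinesType> lstLines;                // exactly N of these
        vector<FollowingLinesType> lstFollowing;   // size = *, read until EOF
    };

    // Hypothetical parser for the grammar in Figure 3(c).
    Figure3InputType* ReadFigure3Input(FILE* pStream) {
        Figure3InputType* pInput = new Figure3InputType;

        fscanf(pStream, "%d", &pInput->N);          // Input := N ...

        for (int i = 0; i < pInput->N; ++i) {       // Lines [size = N]
            LinesType line;
            fscanf(pStream, "%f %f", &line.Xi, &line.Yi);
            pInput->lstLines.push_back(line);
        }

        FollowingLinesType f;                       // FollowingLines [size = *]
        while (fscanf(pStream, "%f %f %f %f", &f.F1, &f.F2, &f.F3, &f.F4) == 4)
            pInput->lstFollowing.push_back(f);

        return pInput;
    }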
Table 1: Example of feature types and values. To deal with sparsity, we map variable names such as “N”
and “X” into a category word “VAR” in word features.
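As a small illustration of the normalization described in the caption, a preprocessing step along these lines could map tokens such as "N", "X", and "Xi" to the category word "VAR" before word features are computed. The rule below is only a guess made for illustration; it is not the paper's actual feature extractor.

    #include <cctype>
    #include <string>
    using namespace std;

    // Treats short capitalized tokens such as "N", "X", "Xi", or "X1" as
    // variable names so that word features generalize across specifications.
    bool LooksLikeVariableName(const string& token) {
        if (token.empty() || token.size() > 2) return false;
        if (!isupper(static_cast<unsigned char>(token[0]))) return false;
        if (token.size() == 1) return true;
        char c = token[1];
        return c == 'i' || isdigit(static_cast<unsigned char>(c));
    }

    string NormalizeToken(const string& token) {
        return LooksLikeVariableName(token) ? "VAR" : token;
    }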
    7   Sample a new specification tree from distribution Q:
    8       t' ~ Q(t' | θ', wi)
    9   Generate and test code, and return feedback:
    10      f' = CodeGenerator(wi, t')
    11  Calculate accept probability r:
    12      r = R(ti, t')
    13  Accept the new tree with probability r:
    14      With probability r: ti = t'
    15  end
    16  end
    17  Produce final structures:
    18  return { ti if ti gets positive feedback }

Algorithm 1: The sampling framework for learning the model.
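The propose-test-accept step of Algorithm 1 can be rendered roughly as the following C++ sketch. SpecTree, Document, SampleFromQ, GenerateAndTestCode, and AcceptProbability are stand-ins for the proposal distribution Q, the code generator, and the acceptance ratio R defined by the paper; their implementations are omitted, so this shows only the control flow.

    #include <random>
    #include <vector>
    using namespace std;

    struct SpecTree { /* specification tree; details omitted */ };
    struct Document { /* one text specification w_i plus its input examples */ };

    // Stand-ins for components described in the paper (declarations only).
    SpecTree SampleFromQ(const Document& w);                              // t' ~ Q(. | theta', w)
    bool     GenerateAndTestCode(const Document& w, const SpecTree& t);   // feedback f'
    double   AcceptProbability(const SpecTree& current, const SpecTree& proposed);  // r = R(t_i, t')

    // One sampling sweep over the documents (lines 7-14 of Algorithm 1).
    void SamplingSweep(const vector<Document>& docs,
                       vector<SpecTree>& trees,
                       vector<bool>& positiveFeedback,
                       mt19937& rng) {
        uniform_real_distribution<double> unif(0.0, 1.0);
        for (size_t i = 0; i < docs.size(); ++i) {
            SpecTree proposed = SampleFromQ(docs[i]);                 // line 8
            bool feedback = GenerateAndTestCode(docs[i], proposed);   // line 10
            double r = AcceptProbability(trees[i], proposed);         // line 12
            if (unif(rng) < r) {                                      // line 14
                trees[i] = proposed;
                positiveFeedback[i] = feedback;
            }
        }
    }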
Model Implementation: We define several types of features to capture the correlation between the hidden structure and its expression in natural language. For example, verb features are introduced because certain preceding verbs such as "contains" and "consists" are good indicators of key phrases. There are 991 unique features in total in our experiments. Examples of features appear in Table 1.

We use a small set of 8 seed words to bias the search space. Specifically, we require each leaf key phrase to contain at least one seed word that identifies the C++ primitive data type (such as "integer", "float", "byte" and "string"). We also encourage a phrase containing the word "input" to be the root of the tree (for example, "the input file") and each coreference phrase to be a background phrase (for example, "each test case" after mentioning "test cases"), by initially adding pseudo counts to Dirichlet priors.

5 Experimental Setup

Datasets: Our dataset consists of problem descriptions from ACM International Collegiate Programming Contests.6 We collected 106 problems from ACM-ICPC training websites.7 From each problem description, we extracted the portion that provides input specifications. Because the test input examples are not publicly available on the ACM-ICPC training websites, for each specification we wrote simple programs to generate 100 random input examples.

Table 2 presents statistics for the text specification set. The data set consists of 424 sentences, where an average sentence contains 17.3 words. The data set contains 781 unique words. The length of each text specification varies from a single sentence to eight sentences. The difference between the average and median number of trees is large. This is because half of the specifications are relatively simple and have a small number of possible trees, while a few difficult specifications have thousands of possible trees (as the number of trees grows exponentially when the text length increases).

6 Official Website: http://cm.baylor.edu/welcome.icpc
7 PKU Online Judge: http://poj.org/; UVA Online Judge: http://uva.onlinejudge.org/

Evaluation Metrics: We evaluate the model
performance in terms of its success in generating a formal grammar that correctly represents the input format (see Figure 3c). As a gold annotation, we construct formal grammars for all text specifications. Our results are generated by automatically comparing the machine-generated grammars with their golden counterparts. If the formal grammar is correct, then the generated C++ parser will correctly read the input file into corresponding C++ data structures.

We use Recall and Precision as evaluation measures:

    Recall = (# correct structures) / (# text specifications)

    Precision = (# correct structures) / (# produced structures)

where the produced structures are the positive structures returned by our framework whose corresponding code successfully reads all input examples (see Algorithm 1, line 18). Note that the number of produced structures may be less than the number of text specifications, because structures that fail the input test are not returned.
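As a purely hypothetical illustration of these measures (the counts below are invented, not taken from the paper's experiments): if 106 text specifications yield 90 produced structures, of which 80 are correct, then Recall = 80/106, roughly 75.5%, and Precision = 80/90, roughly 88.9%.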
Baselines: To evaluate the performance of our model, we compare against four baselines.

The No Learning baseline is a variant of our model that selects a specification tree without learning feature correspondence. It continues sampling a specification tree for each text specification until it finds one which successfully reads all of the input examples.

The second baseline, Aggressive, is a state-of-the-art semantic parsing framework (Clarke et al., 2010).8 The framework repeatedly predicts hidden structures (specification trees in our case) using a structure learner, and trains the structure learner based on the execution feedback of its predictions. Specifically, at each iteration the structure learner predicts the most plausible specification tree for each text document:

    ti = argmax_t f(wi, t).

Depending on whether the corresponding code reads all input examples successfully or not, the (wi, ti) pairs are added as positive or negative samples to populate a training set. After each iteration the structure learner is re-trained with the training samples to improve the prediction accuracy. In our experiment, we follow (Clarke et al., 2010) and choose a structural Support Vector Machine, SVMstruct,9 as the structure learner.
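The iteration structure of the Aggressive baseline can be sketched as follows. The Model, TrainStructuredSVM, PredictBestTree, and CodeReadsAllExamples names are illustrative stand-ins (the actual baseline uses SVMstruct as the structure learner, as noted above); this is a sketch of the loop, not the authors' implementation.

    #include <vector>
    using namespace std;

    struct SpecTree { /* specification tree; details omitted */ };
    struct Document { /* text specification w_i plus its input examples */ };
    struct TrainingPair { Document doc; SpecTree tree; bool positive; };
    struct Model { /* weights of the structure learner; details omitted */ };

    // Stand-ins for the structure learner and the execution feedback (declarations only).
    Model    TrainStructuredSVM(const vector<TrainingPair>& data);
    SpecTree PredictBestTree(const Model& model, const Document& doc);   // argmax_t f(w_i, t)
    bool     CodeReadsAllExamples(const Document& doc, const SpecTree& tree);

    // One possible rendering of the Aggressive self-training loop.
    Model RunAggressiveBaseline(const vector<Document>& docs, int iterations) {
        vector<TrainingPair> trainingSet;
        Model model = TrainStructuredSVM(trainingSet);   // start from an untrained model
        for (int it = 0; it < iterations; ++it) {
            for (const Document& doc : docs) {
                SpecTree predicted = PredictBestTree(model, doc);
                bool positive = CodeReadsAllExamples(doc, predicted);
                trainingSet.push_back({doc, predicted, positive});
            }
            model = TrainStructuredSVM(trainingSet);     // re-train after each iteration
        }
        return model;
    }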
The remaining baselines provide an upper bound on the performance of our model. The baseline Full Model (Oracle) is the same as our full model except that the feedback comes from an oracle which tells whether the specification tree is correct or not. We use this oracle information in the prior P(t) in the same way as we use the noisy feedback. Similarly, the baseline Aggressive (Oracle) is the Aggressive baseline with access to the oracle.

Experimental Details: Because no human annotation is required for learning, we train our model and all baselines on all 106 ICPC text specifications (similar to unsupervised learning). We report results averaged over 20 independent runs. For each of these runs, the model and all baselines run 100 iterations. For the Aggressive baseline, in each iteration the SVM structure learner predicts one tree with the highest score for each text specification. If two different specification trees of the same text specification get positive feedback, we take the one generated in the later iteration for evaluation.

8 We take the name Aggressive from this paper.
9 www.cs.cornell.edu/people/tj/svm_light/svm_struct.html
    Model                  Recall   Precision   F-Score
    No Learning            52.0     57.2        54.5
    Aggressive             63.2     70.5        66.7
    Full Model             72.5     89.3        80.0
    Full Model (Oracle)    72.5     100.0       84.1
    Aggressive (Oracle)    80.2     100.0       89.0

Table 3: Average % Recall and % Precision of our model and all baselines over 20 independent runs.

6 Experimental Results

Comparison with Baselines: Table 3 presents the performance of various models in predicting correct specification trees. As can be seen, our model achieves an F-Score of 80%. Our model therefore significantly outperforms the No Learning baseline (by more than 25%). Note that the No Learning baseline achieves a low Precision of 57.2%. This low precision reflects the noisiness of the weak supervision: nearly one half of the parsers produced by No Learning are actually incorrect even though they read all of the input examples without error. This comparison shows the importance of capturing correlations between the specification trees and their text descriptions.
(a) The input contains several testcases. Each is specified by two strings S, T of alphanumeric ASCII characters.

(b) The next N lines of the input file contain the Cartesian coordinates of watchtowers, one pair of coordinates per line.

Figure 4: Examples of dependencies and key phrases predicted by our model. Green marks correct key phrases and dependencies and red marks incorrect ones. The missing key phrases are marked in gray.
Figure 5: Precision and Recall of our model by varying the percentage of weak supervision. The green lines are the performance of the Aggressive baseline trained with full weak supervision.

Figure 6: Precision and Recall of our model by varying the number of available input examples per text specification.