NEURO-SYMBOLIC PROGRAM SYNTHESIS
ABSTRACT
Recent years have seen the proposal of a number of neural architectures for the
problem of Program Induction. Given a set of input-output examples, these ar-
chitectures are able to learn mappings that generalize to new test inputs. While
achieving impressive results, these approaches have a number of important limi-
tations: (a) they are computationally expensive and hard to train, (b) a model has
to be trained for each task (program) separately, and (c) it is hard to interpret or
verify the correctness of the learnt mapping (as it is defined by a neural network).
In this paper, we propose a novel technique, Neuro-Symbolic Program Synthesis,
to overcome the above-mentioned problems. Once trained, our approach can au-
tomatically construct computer programs in a domain-specific language that are
consistent with a set of input-output examples provided at test time. Our method
is based on two novel neural modules. The first module, called the cross corre-
lation I/O network, given a set of input-output examples, produces a continuous
representation of the set of I/O examples. The second module, the Recursive-
Reverse-Recursive Neural Network (R3NN), given the continuous representation
of the examples, synthesizes a program by incrementally expanding partial pro-
grams. We demonstrate the effectiveness of our approach by applying it to the
rich and complex domain of regular expression based string transformations. Ex-
periments show that the R3NN model is not only able to construct programs from
new input-output examples, but it is also able to construct new programs for tasks
that it had never observed before during training.
1 INTRODUCTION
The act of programming, i.e., developing a procedure to accomplish a task, is a remarkable demon-
stration of the reasoning abilities of the human mind. Unsurprisingly, Program Induction is considered
one of the fundamental problems in Machine Learning and Artificial Intelligence. Recent progress
on deep learning has led to the proposal of a number of promising neural architectures for this prob-
lem. Many of these models are inspired by computation modules (CPU, RAM, GPU) (Graves
et al., 2014; Kurach et al., 2015; Reed & de Freitas, 2015; Neelakantan et al., 2015) or common
data structures used in many algorithms (stack) (Joulin & Mikolov, 2015). A common thread in this
line of work is to specify the atomic operations of the network in some differentiable form, allowing
efficient end-to-end training of a neural controller, or to use reinforcement learning to make hard
choices about which operation to perform. While these results are impressive, these approaches
have a number of important limitations: (a) they are computationally expensive and hard to train, (b)
a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify
the correctness of the learnt mapping (as it is defined by a neural network). While some recently
proposed methods (Kurach et al., 2015; Gaunt et al., 2016; Riedel et al., 2016; Bunel et al., 2016)
do learn interpretable programs, they still need to learn a separate neural network model for each
individual task.
Motivated by the need for model interpretability and scalability to multiple tasks, we address the
problem of Program Synthesis. Program Synthesis, the problem of automatically constructing pro-
grams that are consistent with a given specification, has long been a subject of research in Computer
Science (Biermann, 1978; Summers, 1977). This interest has been reinvigorated in recent years on
the back of the development of methods for learning programs in various domains, ranging from
low-level bit manipulation code (Solar-Lezama et al., 2005) to data structure manipulations (Singh
& Solar-Lezama, 2011) and regular expression based string transformations (Gulwani, 2011).
Most of the recently proposed methods for program synthesis operate by searching the space of
programs in a Domain-Specific Language (DSL) instead of arbitrary Turing-complete languages.
This hypothesis space of possible programs is huge (potentially infinite) and searching over it is a
challenging problem. Several search techniques including enumerative (Udupa et al., 2013), stochas-
tic (Schkufza et al., 2013), constraint-based (Solar-Lezama, 2008), and version-space algebra based
algorithms (Gulwani et al., 2012) have been developed to search over the space of programs in the
DSL, which support different kinds of specifications (examples, partial programs, natural language
etc.) and domains. These techniques not only require significant engineering and research effort to
develop carefully-designed heuristics for efficient search, but also have limited applicability and can
only synthesize programs of limited sizes and types.
In this paper, we present a novel technique called Neuro-Symbolic Program Synthesis (NSPS) that
learns to generate a program incrementally without the need for an explicit search. Once trained,
NSPS can automatically construct computer programs that are consistent with any set of input-output
examples provided at test time. Our method is based on two novel neural modules. The
first module, called the cross correlation I/O network, produces a continuous representation of any
given set of input-output examples. The second module, the Recursive-Reverse-Recursive Neural
Network (R3NN), given the continuous representation of the input-output examples, synthesizes a
program by incrementally expanding partial programs. R3NN employs a tree-based neural archi-
tecture that sequentially constructs a parse tree by selecting which non-terminal symbol to expand
using rules from a context-free grammar (i.e., the DSL).
We demonstrate the efficacy of our method by applying it to the rich and complex domain of regular-
expression-based syntactic string transformations, using a DSL based on the one used by Flash-
Fill (Gulwani, 2011; Gulwani et al., 2012), a Programming-By-Example (PBE) system in Microsoft
Excel 2013. Given a few input-output examples of strings, the task is to synthesize a program built
on regular expressions to perform the desired string transformation. An example task that can be
expressed in this DSL is shown in Figure 1, which also shows the DSL.
Our evaluation shows that NSPS is not only able to construct programs for known tasks from new
input-output examples, but it is also able to construct completely new programs that it had not ob-
served during training. Specifically, the proposed system is able to synthesize string transformation
programs for 63% of tasks that it had not observed at training time, and for 94% of tasks when
100 program samples are taken from the model. Moreover, our system is able to learn 38% of 238
real-world FlashFill benchmarks.
To summarize, the key contributions of our work are:
• A novel Neuro-Symbolic program synthesis technique to encode neural search over the
space of programs defined using a Domain-Specific Language (DSL).
• The R3NN model that encodes and expands partial programs in the DSL, where each node
has a global representation of the program tree.
• A novel cross-correlation based neural architecture for learning continuous representation
of sets of input-output examples.
• Evaluation of the NSPS approach on the complex domain of regular expression based string
transformations.
2 PROBLEM DEFINITION
In this section, we formally define the DSL-based program synthesis problem that we consider in this
paper. Given a DSL L, we want to automatically construct a synthesis algorithm A such that given
a set of input-output examples, {(i1, o1), · · · , (in, on)}, A returns a program P ∈ L that conforms
to the input-output examples, i.e.,

∀j : 1 ≤ j ≤ n,  P(ij) = oj .    (1)
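As a concrete reading of Equation (1), the following sketch (Python; all names are illustrative, not part of the paper's system) checks whether a candidate program is consistent with a given example set:

```python
from typing import Callable, List, Tuple

def is_consistent(program: Callable[[str], str],
                  examples: List[Tuple[str, str]]) -> bool:
    """Return True iff the program maps every input i_j to its output o_j (Equation 1)."""
    return all(program(i) == o for i, o in examples)

# Hypothetical usage: a trivial candidate program and two examples from Figure 1(a).
examples = [("William Henry Charles", "Charles, W."),
            ("Michael Johnson", "Johnson, M.")]
print(is_consistent(lambda s: s, examples))  # False: the identity program is not consistent
```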
Input v                      Output
1  William Henry Charles     Charles, W.
2  Michael Johnson           Johnson, M.
3  Barack Rogers             Rogers, B.
4  Martha D. Saunders        Saunders, M.
5  Peter T Gates             Gates, P.

String e      :=  Concat(f1, · · · , fn)
Substring f   :=  ConstStr(s)
               |  SubStr(v, pl, pr)
Position p    :=  (r, k, Dir)
               |  ConstPos(k)
Direction Dir :=  Start | End
Regex r       :=  s | T1 | · · · | Tn

Figure 1: (a) An example FlashFill task for transforming names to last name with initials of the first
name, and (b) the DSL for regular expression based string transformations.
The syntax and semantics of the DSL for string transformations are shown in Figure 1(b) and Figure 7,
respectively. The DSL corresponds to a large subset of the FlashFill DSL (except conditionals), and
allows for a richer class of substring operations than FlashFill. A DSL program takes as input a
string v and returns an output string o. The top-level string expression e is a concatenation of a
finite list of substring expressions f1 , · · · , fn . A substring expression f can either be a constant
string s or a substring expression SubStr(v, pl , pr ), which is defined using two position logics pl (left) and pr (right).
A position logic corresponds to a symbolic expression that evaluates to an index in the string. A
position logic p can either be a constant position k or a token match expression (r, k, Dir), which
denotes the Start or End of the k-th match of token r in input string v. A regex token can either be a
constant string s or one of 8 regular expression tokens: p (ProperCase), C (CAPS), l (lowercase), d
(Digits), α (Alphabets), αn (Alphanumeric), ∧ (StartOfString), and $ (EndOfString). The semantics
of the DSL programs is described in the appendix.
A DSL program for the name transformation task shown in Figure 1(a) that is con-
sistent with the examples is: Concat(f1 , ConstStr(“, ”), f2 , ConstStr(“.”)), where f1 ≡
SubStr(v, (“ ”, −1, End), ConstPos(−1)) and f2 ≡ SubStr(v, ConstPos(0), ConstPos(1)). The
program concatenates the following 4 strings: i) substring between the end of last whitespace and
end of string, ii) constant string “, ”, iii) first character of input string, and iv) constant string “.”.
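The following is a minimal sketch of an interpreter for this program. The precise position semantics (negative ConstPos indices counting from the end of the string, 1-indexed token matches) are given in the appendix and are treated as assumptions here; the helper names are illustrative.

```python
import re

def const_pos(v: str, k: int) -> int:
    """ConstPos(k): a constant index; negative k counts from the end of v (assumption)."""
    return k if k >= 0 else len(v) + 1 + k

def match_pos(v: str, token: str, k: int, direction: str) -> int:
    """(r, k, Dir): Start or End of the k-th match of token r in v.
    Assumes matches are 1-indexed and that negative k counts back from the last match."""
    matches = list(re.finditer(re.escape(token), v))
    m = matches[k if k < 0 else k - 1]
    return m.start() if direction == "Start" else m.end()

def sub_str(v: str, pl, pr) -> str:
    """SubStr(v, pl, pr): the substring of v between position logics pl and pr."""
    def resolve(p):
        return const_pos(v, p[1]) if p[0] == "ConstPos" else match_pos(v, p[1], p[2], p[3])
    return v[resolve(pl):resolve(pr)]

def name_program(v: str) -> str:
    """Concat(f1, ConstStr(", "), f2, ConstStr(".")) from the paragraph above."""
    f1 = sub_str(v, ("Match", " ", -1, "End"), ("ConstPos", -1))  # after last space .. end
    f2 = sub_str(v, ("ConstPos", 0), ("ConstPos", 1))             # first character
    return "".join([f1, ", ", f2, "."])

for v in ["William Henry Charles", "Michael Johnson", "Peter T Gates"]:
    print(name_program(v))  # Charles, W. / Johnson, M. / Gates, P.
```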
We now present an overview of our approach. Given a DSL L, we learn a generative model of pro-
grams in the DSL L that is conditioned on input-output examples to efficiently search for consistent
programs. The workflow of our system is shown in Figure 2, which is trained end-to-end using a
large training set of programs in the DSL together with their corresponding input-output examples.
To generate a large training set, we uniformly sample programs from the DSL and then use a rule-
based strategy to compute well-formed input strings that satisfy the pre-conditions of the programs.
The corresponding output strings are obtained by running the programs on the input strings.
A DSL can be considered a context-free grammar with a start symbol S and a set of non-terminals
with corresponding expansion rules. The (partial) grammar derivations or trees correspond to (par-
tial) programs. A naïve way to perform a search over the programs in a DSL is to start from the start
symbol S and then randomly choose non-terminals to expand with randomly chosen expansion rules
until reaching a derivation with only terminals. We, instead, learn a generative model over partial
derivations in the DSL that assigns probabilities to different non-terminals in a partial derivation and
corresponding expansions to guide the search for complete derivations.
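For contrast, the naive baseline is easy to state in code. The sketch below uses a toy stand-in grammar (not the full DSL) and expands one uniformly chosen non-terminal with a uniformly chosen rule at each step:

```python
import random

# Toy stand-in for the DSL grammar: non-terminal -> list of possible right-hand sides.
GRAMMAR = {
    "e": [["Concat(", "f", ")"], ["Concat(", "f", ", ", "e", ")"]],
    "f": [["ConstStr(s)"], ["SubStr(v, ", "p", ", ", "p", ")"]],
    "p": [["ConstPos(k)"], ["(r, k, Dir)"]],
}

def naive_search(max_steps=50):
    """Start from the start symbol and expand non-terminals uniformly at random until
    only terminals remain (the step budget guards against unbounded recursion; hitting
    it would leave a partial derivation)."""
    derivation = ["e"]                                    # start symbol, collapsed to e
    for _ in range(max_steps):
        sites = [i for i, s in enumerate(derivation) if s in GRAMMAR]
        if not sites:
            break                                         # complete derivation: terminals only
        i = random.choice(sites)                          # random non-terminal to expand ...
        rule = random.choice(GRAMMAR[derivation[i]])      # ... with a random expansion rule
        derivation[i:i + 1] = rule
    return "".join(derivation)

random.seed(0)
print(naive_search())  # one randomly derived program skeleton
```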
Our generative model uses a Recursive-Reverse-Recursive Neural Network (R3NN) to encode par-
tial trees (derivations) in L, where each node in the partial tree encodes global information about
every other node in the tree. The model assigns a vector representation for every symbol and every
expansion rule in the grammar. Given a partial tree, the model first assigns a vector representation
to each leaf node, and then performs a recursive pass going up in the tree to assign a global tree
representation to the root. It then performs a reverse-recursive pass starting from the root to assign
a global tree representation to each node in the tree.
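The following numpy sketch illustrates the two passes on a tiny partial program tree, replacing the per-rule deep networks with single random linear maps; the names (recursive_pass, reverse_pass) and the toy rule set are illustrative, not the paper's implementation.

```python
import numpy as np

M = 8                                  # dimensionality of all representations
rng = np.random.default_rng(0)

class Node:
    def __init__(self, symbol, rule=None, children=()):
        self.symbol, self.rule, self.children = symbol, rule, list(children)
        self.up = None                 # representation from the recursive (bottom-up) pass
        self.down = None               # representation from the reverse-recursive pass

# Illustrative parameters: one embedding per symbol, and per rule a pair of maps:
# a bottom-up map (k*M -> M) and a top-down map (M -> k*M), k = number of RHS symbols.
phi = {s: rng.standard_normal(M) for s in ["e", "f", "ConstStr"]}
def make_rule(k):
    return {"up": rng.standard_normal((M, k * M)), "down": rng.standard_normal((k * M, M))}
rules = {"e->f f": make_rule(2), "f->ConstStr": make_rule(1)}

def recursive_pass(node):
    if not node.children:              # leaf: start from its symbol embedding
        node.up = phi[node.symbol]
    else:                              # internal node: combine its children's representations
        kids = np.concatenate([recursive_pass(c) for c in node.children])
        node.up = np.tanh(rules[node.rule]["up"] @ kids)
    return node.up

def reverse_pass(node, incoming):
    node.down = incoming               # global-tree information flowing downwards
    if node.children:
        out = np.tanh(rules[node.rule]["down"] @ node.down)
        for i, c in enumerate(node.children):
            reverse_pass(c, out[i * M:(i + 1) * M])

# A tiny partial program tree: e -> f f, with one f already expanded to ConstStr.
tree = Node("e", "e->f f", [Node("f", "f->ConstStr", [Node("ConstStr")]), Node("f")])
root_rep = recursive_pass(tree)        # bottom-up pass yields a root representation
reverse_pass(tree, root_rep)           # top-down pass gives every leaf a global representation
print(tree.children[1].down.shape)     # (M,): global representation of the open leaf
```

In the full model each production rule has its own learned networks for the two directions; the point of the reverse-recursive pass is that every leaf ends up with a representation that carries information about the whole tree, which the expansion scoring described later relies on.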
Figure 2: An overview of the training and test workflow of our synthesis approach.
The generative process is conditioned on a set of input-output examples to learn a program that is
consistent with this set of examples. We experiment with multiple input-output encoders including
an LSTM encoder that concatenates the hidden vectors of two deep bidirectional LSTM networks
for input and output strings in the examples, and a Cross Correlation encoder that computes the cross
correlation between the LSTM tensor representations of input and output strings in the examples.
This vector is then used as an additional input in the R3NN model to condition the generative model.
In order to define a generation model over partial program trees (PPTs), we need an efficient way of assigning probabilities
to every valid expansion in the current PPT. A valid expansion has two components: first the pro-
duction rule used, and second the position of the expanded leaf node relative to every other node in
the tree. To account for the first component, a separate distributed representation for each produc-
tion rule is maintained. The second component is handled using an architecture where the forward
propagation resembles belief propagation on trees, allowing a notion of global tree state at every
node within the tree. A given expansion probability is then calculated as being proportional to the
inner product between the production rule representation and the global-tree representation of the
leaf-level non-terminal node. We now describe the design of this architecture in more detail.
The R3NN has the following parameters for the grammar described by a DSL (see Figure 3): a
distributed representation for every symbol and for every production rule in the grammar; for every
production rule, a forward production-rule network that takes as input the concatenation of the
distributed representations of the rule’s RHS symbols and produces a vector representation of the
LHS; and, for every production rule, a reverse production-rule network that takes as input a vector
representation of the LHS and produces a concatenation of the distributed representations of each
of the rule’s RHS symbols.

Figure 3: (a) The initial recursive pass of the R3NN. (b) The reverse-recursive pass of the R3NN,
where the input is the output of the previous recursive pass.
Let E be the set of all valid expansions in a PPT T , let L be the current leaf nodes of T and N be
the current non-leaf (rule) nodes of T . Let S(l) be the symbol of leaf l ∈ L and R(n) represent the
production rule of non-leaf node n ∈ N .
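As a sketch of the scoring step described above (hypothetical names, numpy): each valid expansion in E pairs a leaf l ∈ L with a production rule, and its probability is the softmax-normalized inner product of the rule's distributed representation with the global (reverse-pass) representation of l.

```python
import numpy as np

def expansion_distribution(leaf_global, rule_repr, valid_expansions):
    """leaf_global: dict mapping each leaf l in L to its global (reverse-pass) representation.
    rule_repr: dict mapping each production rule to its distributed representation.
    valid_expansions: the set E, given as (leaf, rule) pairs legal in the current PPT.
    Returns a softmax distribution over E, with each score proportional to the inner
    product between the rule representation and the leaf's global-tree representation."""
    scores = np.array([leaf_global[l] @ rule_repr[r] for l, r in valid_expansions])
    scores -= scores.max()                               # for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return dict(zip(valid_expansions, probs))

# Hypothetical usage with M = 8 dimensional vectors (e.g. from the R3NN passes):
rng = np.random.default_rng(1)
leaf_global = {"leaf_f": rng.standard_normal(8)}
rule_repr = {"f -> ConstStr(s)": rng.standard_normal(8),
             "f -> SubStr(v, pl, pr)": rng.standard_normal(8)}
E = [("leaf_f", "f -> ConstStr(s)"), ("leaf_f", "f -> SubStr(v, pl, pr)")]
print(expansion_distribution(leaf_global, rule_repr, E))
```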
An additional improvement that we found helpful was to add a bidirectional LSTM that processes the
global leaf representations right before calculating the scores. The LSTM hidden states are then
used in the score calculation rather than the leaves themselves. This serves primarily to reduce the
minimum distance that information has to propagate between nodes in the tree. The R3NN can be
seen as an extension and combination of several previous tree-based models, which were mainly
developed in the context of natural language processing (Le & Zuidema, 2014; Paulus et al., 2014;
Irsoy & Cardie, 2013).
There are two types of information that string manipulation programs need to extract from input-
output examples: 1) constant strings, such as “@domain.com” or “.”, which appear in all output
examples; 2) substring indices in input where the index might be further defined by a regular expres-
sion. These indices determine which parts of the input are also present in the output. To simplify the
DSL, we assume that there is a fixed finite universe of possible constant strings that could appear in
programs. Therefore we focus on extracting the second type of information, the substring indices.
In earlier hand-engineered systems such as FlashFill, this information was extracted from the input-
output strings by running the Longest Common Substring algorithm, a dynamic programming algo-
rithm that efficiently finds matching substrings in string pairs. To extract substrings, FlashFill runs
LCS on every input-output string pair in the I/O set to get a set of substring candidates. It then takes
the entire set of substring candidates and simply tries every possible regex and constant index that
can be used at substring boundaries, exhaustively searching for the one which is the most “general”,
where generality is specified by hand-engineered heuristics.
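For reference, the sketch below is a textbook dynamic-programming longest-common-substring routine of the kind used to collect such substring candidates (not FlashFill's actual implementation).

```python
def longest_common_substring(a: str, b: str) -> str:
    """Classic O(len(a)*len(b)) dynamic program: dp[i][j] is the length of the longest
    common suffix of a[:i] and b[:j]; the best cell gives the longest common substring."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best_len, best_end = 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_end = dp[i][j], i
    return a[best_end - best_len:best_end]

# A candidate substring for one I/O pair of the task in Figure 1(a):
print(longest_common_substring("William Henry Charles", "Charles, W."))  # "Charles"
```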
In contrast to these previous methods, rather than hand-designing a complicated algorithm to
extract regex-based substrings, we develop neural network based architectures that are capable of
learning to extract and produce continuous representations of the likely regular expressions given
input/output strings.
Cross Correlation encoder: For each example pair, this encoder slides the output feature block over
the input feature block (both produced by the LSTM encoder), takes the element-wise dot product
of the overlapping time steps at each alignment, and sums over the overlap, producing a summed
vector encoding for all example pairs. There are 2 ∗ (T − 1) possible alignments in total between
input and output feature blocks. We also designed the following variants of this encoder.
Diffused Cross Correlation Encoder: This encoder is identical to the Cross Correlation encoder
except that instead of summing over overlapping time steps after the element-wise dot product, we
simply concatenate the vectors corresponding to all time steps, resulting in a final representation that
contains 2 ∗ (T − 1) ∗ T features for each example pair.
LSTM-Sum Cross Correlation Encoder: In this variant of the Cross Correlation encoder, instead
of doing an element-wise dot product, we run a bidirectional LSTM over the concatenated feature
blocks of each alignment. We represent each alignment by the LSTM hidden representation of the
final time step leading to a total of 2 ∗ H ∗ 2 ∗ (T − 1) features for each example pair.
Augmented Diffused Cross Correlation Encoder: For this encoder, the output of each character
position of the Diffused Cross Correlation encoder is combined with the character embedding at this
position, then a basic LSTM encoder is run over the combined features to extract a 4∗H-dimensional
vector for both the input and the output streams. The LSTM encoder output is then concatenated
with the output of the Diffused Cross Correlation encoder forming a (4∗H +T ∗(T −1))-dimensional
feature vector for each example pair.
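As an illustration of the alignment machinery shared by these encoders, the numpy sketch below assumes each example's input and output have already been encoded into T × H feature blocks (e.g. by the LSTM encoder); the summed and diffused variants differ only in how each alignment's overlap is reduced. The exact boundary handling, and hence the 2(T − 1) alignment count, is an assumption here.

```python
import numpy as np

def cross_correlation_features(inp_block, out_block, diffuse=False):
    """inp_block, out_block: (T, H) feature blocks for one I/O example, e.g. the
    per-time-step hidden states produced by the LSTM encoder.
    For every relative shift of the output block over the input block, the element-wise
    dot product of the overlapping time steps is taken.  Summing the overlap gives one
    feature per alignment (summed Cross Correlation encoder); keeping a length-T,
    zero-padded vector per alignment gives the Diffused variant.
    Note: all shifts with non-empty overlap are enumerated here (2T - 1 of them); the
    paper counts 2(T - 1) alignments, so the exact boundary handling is an assumption."""
    T, _ = inp_block.shape
    feats = []
    for shift in range(-(T - 1), T):
        lo, hi = max(0, shift), min(T, T + shift)        # overlap, in input coordinates
        overlap = np.sum(inp_block[lo:hi] * out_block[lo - shift:hi - shift], axis=1)
        if diffuse:
            padded = np.zeros(T)
            padded[lo:hi] = overlap
            feats.append(padded)
        else:
            feats.append(overlap.sum(keepdims=True))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
inp, out = rng.standard_normal((5, 4)), rng.standard_normal((5, 4))
print(cross_correlation_features(inp, out).shape)                 # summed variant
print(cross_correlation_features(inp, out, diffuse=True).shape)   # diffused variant
```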
Once the I/O example encodings have been computed, we can use them to perform conditional
generation of the program tree using the R3NN model. There are a number of ways in which the
PPT generation model can be conditioned using the I/O example encodings depending on where the
I/O example information is inserted in the R3NN model. We investigated three locations to inject
example encodings:
1) Pre-conditioning: where example encodings are concatenated to the encoding of each tree leaf,
and then passed to a conditioning network before the bottom-up recursive pass over the program
tree. The conditioning network can be either a multi-layer feedforward network, or a bidirectional
LSTM network running over tree leaves. Running an LSTM over tree leaves allows the model to
learn more about the relative position of each leaf node in the tree.
2) Post-conditioning: After the reverse-recursive pass, example encodings are concatenated to the
updated representation of each tree leaf and then fed to a conditioning network before computing
the expansion scores.
3) Root-conditioning: After the recursive pass over the tree, the root encoding is concatenated to
the example encodings and passed to a conditioning network. The updated root representation is
then used to drive the reverse-recursive pass.
Empirically, pre-conditioning worked better than either root- or post-conditioning. In addition,
conditioning at all 3 places simultaneously did not cause a significant improvement over just
pre-conditioning. Therefore, for the experimental section, we report models which only use pre-
conditioning.
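A sketch of pre-conditioning with simple stand-in components: the I/O encoding is concatenated to every leaf embedding and mapped back to the model dimension before the bottom-up pass; a bidirectional LSTM over the leaves could play the same role, as noted above. All names and dimensions are illustrative.

```python
import numpy as np

M, E = 8, 16                           # leaf-embedding and I/O-encoding dimensions
rng = np.random.default_rng(0)
W = rng.standard_normal((M, M + E))    # a one-layer stand-in for the conditioning network

def precondition_leaves(leaf_embeddings, io_encoding):
    """Concatenate the I/O example encoding onto every leaf embedding and map the result
    back to the model dimension, before the recursive (bottom-up) pass of the R3NN."""
    return [np.tanh(W @ np.concatenate([x, io_encoding])) for x in leaf_embeddings]

leaves = [rng.standard_normal(M) for _ in range(4)]   # embeddings of the current PPT's leaves
io_vec = rng.standard_normal(E)                       # output of the I/O example encoder
conditioned = precondition_leaves(leaves, io_vec)
print(len(conditioned), conditioned[0].shape)         # 4 leaves, each mapped back to M dims
```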
6 EXPERIMENTS
In order to evaluate and compare variants of the previously described models, we generate a dataset
randomly from the DSL. To do so, we first enumerate all possible programs under the DSL up to
a specific number of instructions, which are then partitioned into training, validation and test sets.
In order to have a tractable number of programs, we limited the maximum number of instructions
for programs to be 13. Length 13 programs are important for this specific DSL because all larger
programs can be written as compositions of sub-programs of length at most 13. The semantics of
length 13 programs therefore constitute the “atoms” of this particular DSL.
In testing our model, there are two different categories of generalization. The first is input/output
generalization, where we are given a new set of input/output examples as well as a program with a
specific tree that we have seen during training. This represents the model’s capacity to be applied
on new data. The second category is program generalization, where we are given a previously
unseen program tree in addition to unseen input/output examples. The model therefore needs to
have a sufficient understanding of the semantics of the DSL to construct novel combinations of
operations. For all reported results, training sets correspond to the first type of generalization, since
we have seen the program tree but not the input/output pairs. Test sets represent the second type of
generalization, as they consist of program trees that have not been seen before, evaluated on
input/output pairs that have also not been seen before.

Table 1: The effect of different input/output encoders on accuracy. Each result used 100 samples.
There is almost no generalization error in the results.
In this section, we compare several different variants of our model. We first evaluate the effect of
each of the previously described input/output encoders. We then evaluate the R3NN model against a
simple recurrent model called io2seq, which is basically an LSTM that takes as input the input/output
conditioning vector and outputs a sequence of DSL symbols that represents a linearized program
tree. Finally, we report the results of the best model on the length 13 training and testing sets, as
well as on a set of 238 benchmark functions.
For training the R3NN, two hyperparameters that were crucial for stabilizing training were the use
of hyperbolic tangent activation functions in both the R3NN and the cross-correlation I/O encoders, and
the use of a minibatch size of 8. Due to the difficulty of batching tree-based neural networks and
time constraints, we were limited to 8 samples per batch, but preliminary experiments indicated
that increasing the batch size further would improve performance. Additionally, for all results, the
program tree generation is conditioned on a set of 10 input/output string pairs.
For each latent function and set of input/output examples that we test on, we report whether we had
a success after sampling 100 functions from the model and testing all 100 to see if one of these
functions is equivalent to the latent function. Here we consider two functions to be equivalent with
respect to a specific input/output example set if the functions output the same strings when run on
the inputs. Under this definition, two functions can have a different set of operations but still be
equivalent with respect to a specific input-output set.
In this section, we evaluate the effect of several different input/output example encoders. To control
for the effect of the tree model, all results here used an R3NN with fixed hyperparameters to generate
the program tree. Table 1 shows the performance of several of these input/output example encoders.
We can see that the summed cross-correlation encoder did not perform well, which can be due to
the fact that the sum destroys positional information that might be useful for determining specific
substring indices. The LSTM-sum and the augmented diffused cross-correlation models did the
best. Surprisingly, the LSTM encoder was capable of finding nearly 88% of all programs without
having any prior knowledge explicitly built into the architecture. We use 100 samples for evaluating
the Train and Test sets. The training performance is sometimes slightly lower because there are
close to 5 million training programs but we only look at less than 2 million of these programs during
training. We sample a subset of only 1000 training programs from the 5 million program set to
report the training results in the tables. The test sets also consist of 1000 programs.
6.3 IO2SEQ
In this section, we motivate the use of the R3NN by testing whether a simpler model can also be
used to generate programs. The io2seq model is an LSTM whose initial hidden and cell states
are a function of the input/output encoding vector. The io2seq model then generates a linearized
tree of a program symbol by symbol. An example of what a linearized program tree looks like is
(S (e (f (ConstStr “@”)ConstStr )f )e )S , which represents the program tree that returns the constant
string “@”. Predicting a linearized tree using an LSTM was also done in the context of parsing
(Vinyals et al., 2015). For the io2seq model, we used the LSTM-sum cross-correlation I/O
conditioning model.

Table 2: Testing the I/O-vector-to-sequence model. Each result used 100 samples.

Table 3: The effect of backtracking (sampling) multiple programs on accuracy. 1-best is determin-
istically choosing the expansion with highest probability at each step.
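To make the linearized representation concrete, the small sketch below (illustrative names) serializes a program tree into the typed-parenthesis token sequence the io2seq model has to emit:

```python
class Node:
    def __init__(self, symbol, children=()):
        self.symbol, self.children = symbol, list(children)

def linearize(node):
    """Serialize a program tree into the io2seq token sequence: an internal node with
    symbol S is rendered as (S <children> )S, and a leaf as its own symbol."""
    if not node.children:
        return [node.symbol]
    tokens = ["(" + node.symbol]
    for child in node.children:
        tokens += linearize(child)
    tokens.append(")" + node.symbol)
    return tokens

# The example from the text: the program returning the constant string "@".
tree = Node("S", [Node("e", [Node("f", [Node("ConstStr", [Node('"@"')])])])])
print(" ".join(linearize(tree)))
# (S (e (f (ConstStr "@" )ConstStr )f )e )S
```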
The results in Table 2 show that the performance of the io2seq model at 100 samples per latent test
function is far worse than that of the R3NN, at around 42% versus 91%, respectively. The reason could
be that the io2seq model has to make far more decisions than the R3NN, since it must also predict
the parentheses symbols that determine the level of the tree at which a particular symbol sits. For
example, the io2seq model must make on the order of 100 decisions for length-13 programs, while
the R3NN requires no more than 13.
For the best R3NN model that we trained, we also evaluated the effect that a different number of
samples per latent function had on performance. The results are shown in Table 3. The increase in
the model’s performance as the sample size increases suggests that the model has a notion of what type
of program satisfies a given I/O pair, but it might not be that certain about the details such as which
regex to use, etc. By 300 samples, the model is nearing perfect accuracy on the test sets.
We also evaluate our learnt models on 238 real-world FlashFill benchmarks obtained from the Mi-
crosoft Excel team and online help-forums. These benchmarks involve string manipulation tasks
described using input-output examples. We evaluate two models – one with a cross correlation en-
coder trained on 5 input-output examples and another trained on 10 input-output examples. Both
models were trained on randomly sampled programs from the DSL up to size 13 with randomly
generated input-output examples.
The distribution of the sizes of the smallest DSL programs needed to solve the benchmark tasks is shown
in Figure 4(a); the sizes vary from 4 to 63. The figure also shows the number of benchmarks for
which our model was able to learn the program using 5 input-output examples, considering the
top 2000 sampled programs. In total, the model is able to learn programs for 91 tasks (38.2%). Since
the model was trained on programs up to size 13, it is not surprising that it is not able to solve tasks
that need larger programs. There are 110 FlashFill benchmarks that require programs up to size
13, of which the model is able to solve 82.7%.
The effect of sampling multiple learnt programs instead of only the top program is shown in Figure 4(b).
With only 10 samples, the model can already learn about 13% of the benchmarks. We observe
a steady increase in performance up to about 2000 samples, after which we do not observe any
Figure 4: (a) The distribution of sizes of programs needed to solve FlashFill tasks and the perfor-
mance of our model; (b) the effect of sampling for trying top-k learnt programs.
Figure 5: Some example solved benchmarks: (a) cleaning up medical codes with closing brackets,
(b) generating Hex numbers with first two digits, (c) transforming names to firstname and last initial.
significant improvement. Since there are more than 2 million programs of length 11 alone in the DSL,
enumerative techniques with uniform search do not scale well (Alur et al., 2015).
We also evaluate a model that is learnt with 10 input-output examples per benchmark. Surprisingly,
this model can only learn programs for about 29% of the FlashFill benchmarks. We hypothesize
that the space of consistent programs gets more constrained with additional input-output examples,
which makes it harder for R3NN to learn the desired program. Another possibility is that the input-
output encoder gets more confused with the additional example pairs.
Our model is able to solve the majority of FlashFill benchmarks that require learning programs with
up to 3 Concat operations. We now describe a few of these benchmarks, also shown in Fig-
ure 5. An Excel user wanted to clean a set of medical billing records by adding a missing “]”
to medical codes as shown in Figure 5(a). Our system learns the following program given these
5 input-output examples: Concat(SubStr(v,ConstPos(0),(d,-1,End)), ConstStr(“]”)). The pro-
gram concatenates the substring between the start of the input string and the end of the last match of the
digit regular expression with the constant string “]”. Another task that required the user to trans-
form some numbers into a hex format is shown in Figure 5(b). Our system learns the following
program: Concat(ConstStr(“0x”),SubStr(v,ConstPos(0),ConstPos(2))). For some benchmarks
with long input strings, it is still able to learn regular expressions to extract the desired sub-
string, e.g. it learns a program to extract “NancyF” from the string “123456789,freehafer ,drew
,nancy,19700101,11/1/2007,[email protected],1230102,123 1st Avenue,Seattle,wa,09999”.
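As an illustration, the two learnt programs above can be executed with a few lines of code; the inputs below are hypothetical stand-ins for the benchmark data in Figure 5, and the digit-token regex is an assumption about the DSL's d (Digits) token.

```python
import re

TOKENS = {"d": r"[0-9]+"}   # the Digits regex token from the DSL (other tokens omitted)

def token_end(v: str, token: str, k: int) -> int:
    """End of the k-th match of a regex token in v (negative k counts from the last match)."""
    matches = list(re.finditer(TOKENS[token], v))
    return matches[k if k < 0 else k - 1].end()

def fix_medical_code(v: str) -> str:
    """Concat(SubStr(v, ConstPos(0), (d, -1, End)), ConstStr("]")) from the text above."""
    return v[0:token_end(v, "d", -1)] + "]"

def to_hex_prefix(v: str) -> str:
    """Concat(ConstStr("0x"), SubStr(v, ConstPos(0), ConstPos(2))) from the text above."""
    return "0x" + v[0:2]

print(fix_medical_code("[CPT-00350"))   # hypothetical input -> "[CPT-00350]"
print(to_hex_prefix("129600445"))       # hypothetical input -> "0x12"
```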
Our system is currently not able to learn programs for benchmarks that require 4 or more Con-
cat operations. Two such benchmarks are shown in Figure 6. The task of combining names in
Figure 6(a) requires 6 Concat arguments, whereas the phone number transformation task in Fig-
ure 6(b) requires 5 Concat arguments. This is mainly because of the scalability issues in training
with programs of larger size. There are also a few interesting benchmarks where the R3NN model
gets very close to learning the desired program. For example, for the task “Bill Gates” → “Mr.
Bill Gates”, it learns a program that generates “Mr.Bill Gates” (without the whitespace), and for
the task “617-444-5454” → “(617) 444-5454”, it learns a program that generates the string “(617
444-5454”.
Figure 6: Some unsolved benchmarks: (a) combining names with different delimiters; (b) trans-
forming phone numbers to a consistent format.
7 RELATED WORK
We have seen a renewed interest in recent years in the area of Program Induction and Synthesis.
In the machine learning community, a number of promising neural architectures have been pro-
posed to perform program induction. These methods have employed architectures inspired from
computation modules (Turing Machines, RAM) (Graves et al., 2014; Kurach et al., 2015; Reed &
de Freitas, 2015; Neelakantan et al., 2015) or common data structures such as stacks used in many
algorithms (Joulin & Mikolov, 2015). These approaches represent the atomic operations of the net-
work in a differentiable form, which allows for efficient end-to-end training of a neural controller.
However, unlike our approach that learns comprehensible complete programs, many of these ap-
proaches learn only the program behavior (i.e., they produce desired outputs on new input data).
Some recently proposed methods (Kurach et al., 2015; Gaunt et al., 2016; Riedel et al., 2016; Bunel
et al., 2016) do learn interpretable programs but these techniques require learning a separate neural
network model for each individual task, which is undesirable in many synthesis settings where we
would like to learn programs in real-time for a large number of tasks. Liang et al. (2010) restrict
the problem space with a probabilistic context-free grammar and introduce a new representation
of programs based on combinatory logic, which allows for sharing sub-programs across multiple
tasks. They then take a hierarchical Bayesian approach to learn frequently occurring substructures
of programs. Our approach, instead, uses neural architectures to condition the search space of pro-
grams, and does not require the additional step of representing the program space using combinatory
logic to allow sharing.
The DSL-based program synthesis approach has also seen a renewed interest recently (Alur et al.,
2015). It has been used for many applications including synthesizing low-level bitvector implemen-
tations (Solar-Lezama et al., 2005), Excel macros for data manipulation (Gulwani, 2011; Gulwani
et al., 2012), superoptimization by finding smaller equivalent loop bodies (Schkufza et al., 2013),
protocol synthesis from scenarios (Udupa et al., 2013), synthesis of loop-free programs (Gulwani
et al., 2011), and automated feedback generation for programming assignments (Singh et al., 2013).
The synthesis techniques proposed in the literature generally employ various search techniques in-
cluding enumeration with pruning, symbolic constraint solving, and stochastic search, while sup-
porting different forms of specifications including input-output examples, partial programs, program
invariants, and reference implementations.
In this paper, we consider input-output example based specification over the hypothesis space de-
fined by a DSL of string transformations, similar to that of FlashFill (without conditionals) (Gul-
wani, 2011). The key difference between our approach and previous techniques is that our system
is trained completely in an end-to-end fashion, while previous techniques require significant manual
effort to design heuristics for efficient search. There is some work on guiding the program search us-
ing learnt clues that suggest likely DSL expansions, but the clues are learnt over hand-coded textual
features of examples (Menon et al., 2013). Moreover, their DSL consists of compositions of about
100 high-level text transformation functions such as count and dedup, whereas our DSL consists of
tree-structured programs over richer regular expression based substring constructs.
There is also a recent line of work on learning probabilistic models of code from a large number of
code repositories (big code) (Raychev et al., 2015; Bielik et al., 2016; Hindle et al., 2016), which
are then used for applications such as auto-completion of partial programs, inference of variable
and method names, program repair, etc. These language models typically capture only the syntactic
properties of code, unlike our approach that also tries to capture the semantics to learn the desired
program. The work by Maddison & Tarlow (2014) addresses the problem of learning structured
generative models of source code but both their model and application domain are different from
ours.
The R3NN model employed in our work is related to several tree and graph structured neural net-
works present in the NLP literature (Le & Zuidema, 2014; Paulus et al., 2014; Irsoy & Cardie, 2013).
The Inside-Outside Recursive Neural Network (Le & Zuidema, 2014) in particular is most similar to
the R3NN, where they generate a parse tree incrementally by using global leaf-level representations
to determine which expansions in the parse tree to take next.
8 CONCLUSION
We have proposed a novel technique called Neuro-Symbolic Program Synthesis that is able to con-
struct a program incrementally based on given input-output examples. To do so, a new neural
architecture called Recursive-Reverse-Recursive Neural Network is used to encode and expand a
partial program tree into a full program tree. We demonstrated its effectiveness at example-based
program synthesis, even when the target program was not seen during training.
These promising results open up a number of interesting directions for future research. For example,
we took a supervised-learning approach here, assuming availability of target programs during train-
ing. In some scenarios, we may only have access to an oracle that returns the desired output given
an input. In this case, reinforcement learning is a promising framework for program synthesis.
REFERENCES
Alur, Rajeev, Bodı́k, Rastislav, Dallal, Eric, Fisman, Dana, Garg, Pranav, Juniwal, Garvit, Kress-
Gazit, Hadas, Madhusudan, P., Martin, Milo M. K., Raghothaman, Mukund, Saha, Shamwaditya,
Seshia, Sanjit A., Singh, Rishabh, Solar-Lezama, Armando, Torlak, Emina, and Udupa, Ab-
hishek. Syntax-guided synthesis. In Dependable Software Systems Engineering, pp. 1–25. 2015.
Bielik, Pavol, Raychev, Veselin, and Vechev, Martin T. PHOG: probabilistic model for code. In
ICML, pp. 2933–2942, 2016.
Biermann, Alan W. The inference of regular Lisp programs from examples. IEEE Transactions on
Systems, Man, and Cybernetics, 8(8):585–600, 1978.
Bunel, Rudy, Desmaison, Alban, Kohli, Pushmeet, Torr, Philip H. S., and Kumar, M. Pawan. Adap-
tive neural compilation. CoRR, abs/1605.07969, 2016. URL http://arxiv.org/abs/1605.07969.
Gaunt, Alexander L, Brockschmidt, Marc, Singh, Rishabh, Kushman, Nate, Kohli, Pushmeet, Tay-
lor, Jonathan, and Tarlow, Daniel. Terpret: A probabilistic programming language for program
induction. arXiv preprint arXiv:1608.04428, 2016.
Graves, Alex, Wayne, Greg, and Danihelka, Ivo. Neural turing machines. arXiv preprint
arXiv:1410.5401, 2014.
Gulwani, Sumit. Automating string processing in spreadsheets using input-output examples. In
POPL, pp. 317–330, 2011.
Gulwani, Sumit, Jha, Susmit, Tiwari, Ashish, and Venkatesan, Ramarathnam. Synthesis of loop-free
programs. In PLDI, pp. 62–73, 2011.
Gulwani, Sumit, Harris, William, and Singh, Rishabh. Spreadsheet data manipulation using exam-
ples. Communications of the ACM, Aug 2012.
Hindle, Abram, Barr, Earl T., Gabel, Mark, Su, Zhendong, and Devanbu, Premkumar T. On the
naturalness of software. Commun. ACM, 59(5):122–131, 2016.
Irsoy, Ozan and Cardie, Claire. Bidirectional recursive neural networks for token-level labeling
with structure. In NIPS Deep Learning Workshop, 2013.
Joulin, Armand and Mikolov, Tomas. Inferring algorithmic patterns with stack-augmented recurrent
nets. In NIPS, pp. 190–198, 2015.
Kurach, Karol, Andrychowicz, Marcin, and Sutskever, Ilya. Neural random-access machines. arXiv
preprint arXiv:1511.06392, 2015.
Le, Phong and Zuidema, Willem. The inside-outside recursive neural network model for dependency
parsing. In EMNLP, pp. 729–739, 2014.
Liang, Percy, Jordan, Michael I., and Klein, Dan. Learning programs: A hierarchical Bayesian
approach. In ICML, pp. 639–646, 2010.
Maddison, Chris J and Tarlow, Daniel. Structured generative models of natural source code. In
ICML, pp. 649–657, 2014.
Menon, Aditya Krishna, Tamuz, Omer, Gulwani, Sumit, Lampson, Butler W., and Kalai, Adam. A
machine learning framework for programming by example. In ICML, pp. 187–195, 2013.
Neelakantan, Arvind, Le, Quoc V, and Sutskever, Ilya. Neural programmer: Inducing latent pro-
grams with gradient descent. arXiv preprint arXiv:1511.04834, 2015.
Paulus, Romain, Socher, Richard, and Manning, Christopher D. Global belief recursive neural
networks. In NIPS, pp. 2888–2896, 2014.
Raychev, Veselin, Vechev, Martin T., and Krause, Andreas. Predicting program properties from ”big
code”. In POPL, pp. 111–124, 2015.
Riedel, Sebastian, Bosnjak, Matko, and Rocktäschel, Tim. Programming with a differentiable forth
interpreter. CoRR, abs/1605.06640, 2016. URL http://arxiv.org/abs/1605.06640.
Schkufza, Eric, Sharma, Rahul, and Aiken, Alex. Stochastic superoptimization. In ASPLOS, pp.
305–316, 2013.
Singh, Rishabh and Solar-Lezama, Armando. Synthesizing data structure manipulations from sto-
ryboards. In SIGSOFT FSE, pp. 289–299, 2011.
Singh, Rishabh, Gulwani, Sumit, and Solar-Lezama, Armando. Automated feedback generation for
introductory programming assignments. In PLDI, pp. 15–26, 2013.
Solar-Lezama, Armando. Program Synthesis By Sketching. PhD thesis, EECS Dept., UC Berkeley,
2008.
Solar-Lezama, Armando, Rabbah, Rodric, Bodik, Rastislav, and Ebcioglu, Kemal. Programming by
sketching for bit-streaming programs. In PLDI, 2005.
Summers, Phillip D. A methodology for lisp program construction from examples. Journal of the
ACM (JACM), 24(1):161–175, 1977.
Udupa, Abhishek, Raghavan, Arun, Deshmukh, Jyotirmoy V., Mador-Haim, Sela, Martin, Milo
M. K., and Alur, Rajeev. TRANSIT: specifying protocols with concolic snippets. In PLDI, pp.
287–296, 2013.
Vinyals, Oriol, Kaiser, Lukasz, Koo, Terry, Petrov, Slav, Sutskever, Ilya, and Hinton, Geoffrey.
Grammar as a foreign language. In ICLR, 2015.