
Synthesizing Formal Semantics from Executable Interpreters

JIANGYI LIU, University of Wisconsin – Madison, USA


CHARLIE MURPHY, University of Wisconsin – Madison, USA
ANVAY GROVER, University of Wisconsin – Madison, USA

KEITH J.C. JOHNSON, University of Wisconsin – Madison, USA


THOMAS REPS, University of Wisconsin – Madison, USA
LORIS D’ANTONI, University of California, San Diego, USA
Program verification and synthesis frameworks that allow one to customize the language in which one is
interested typically require the user to provide a formally defined semantics for the language. Because writing
a formal semantics can be a daunting and error-prone task, this requirement stands in the way of such
frameworks being adopted by non-expert users. We present an algorithm that can automatically synthesize
inductively defined syntax-directed semantics when given (i) a grammar describing the syntax of a language
and (ii) an executable (closed-box) interpreter for computing the semantics of programs in the language of the
grammar. Our algorithm synthesizes the semantics in the form of Constrained-Horn Clauses (CHCs), a natural,
extensible, and formal logical framework for specifying inductively defined relations that has recently received
widespread adoption in program verification and synthesis. The key innovation of our synthesis algorithm is
a Counterexample-Guided Synthesis (CEGIS) approach that breaks the hard problem of synthesizing a set
of constrained Horn clauses into small, tractable expression-synthesis problems that can be dispatched to
existing SyGuS synthesizers. Our tool Synantic synthesized inductively-defined formal semantics from 14
interpreters for languages used in program-synthesis applications. When synthesizing formal semantics for
one of our benchmarks, Synantic unveiled an inconsistency in the semantics computed by the interpreter
for a language of regular expressions; fixing the inconsistency resulted in a more efficient semantics and, for
some cases, in a 1.2x speedup for a synthesizer solving synthesis problems over such a language.
CCS Concepts: • Theory of computation → Operational semantics; Automated reasoning; Logic and
verification; Constraint and logic programming; • Software and its engineering → Semantics.
Additional Key Words and Phrases: SemGuS, SyGuS, Semantics, SMT, Program Synthesis
ACM Reference Format:
Jiangyi Liu, Charlie Murphy, Anvay Grover, Keith J.C. Johnson, Thomas Reps, and Loris D’Antoni. 2024. Syn-
thesizing Formal Semantics from Executable Interpreters. Proc. ACM Program. Lang. 8, OOPSLA2, Article 284
(October 2024), 45 pages. https://doi.org/10.1145/3689724

1 Introduction
Recent work on frameworks for program verification and program synthesis has created tools that
are parametric in the language that is supported [6, 14, 16]. A user of such a framework must define
the language of interest by giving both a syntactic specification and a formal semantic specification
Authors’ Contact Information: Jiangyi Liu, University of Wisconsin – Madison, Madison, USA, [email protected];
Charlie Murphy, University of Wisconsin – Madison, Madison, USA, [email protected]; Anvay Grover, University of
Wisconsin – Madison, Madison, USA, [email protected]; Keith J.C. Johnson, University of Wisconsin – Madison, Madison,
USA, [email protected]; Thomas Reps, University of Wisconsin – Madison, Madison, USA, [email protected]; Loris
D’Antoni, University of California, San Diego, La Jolla, USA, [email protected].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses,
contact the owner/author(s).
© 2024 Copyright held by the owner/author(s).
ACM 2475-1421/2024/10-ART284
https://doi.org/10.1145/3689724


of the language. The semantic specification assigns a meaning to each program in the language.
However, for most programming languages, and even for simple ones used in program-synthesis
applications, it is usually a demanding task to create a formal semantics that defines the behaviors
of the programs in the language. Obstacles include: (i) the language’s semantics might only be
documented in natural language, and thus may be ambiguous (or worse, inconsistent), and (ii) the
sheer level of detail that is involved in writing such a semantics.

Synthesizing Formal Semantics from Interpreters. In this paper, we propose an alternative


approach—based on synthesis—that is applicable to any programming language for which a com-
piler or interpreter exists. Such infrastructure serves as an operational semantics for the language,
albeit one for which anything other than closed-box access would be difficult. Assuming existence
of a working compiler or interpreter is not hard — usually a language (typically not an “academic”
language) already has an interpreter already implemented, and the language users, if they want to
access techniques like verification and synthesis, need a formal semantics. Thus, we take closed-box
access as a given, and ask the following question:
Is it possible to use an existing compiler or interpreter for a language 𝐿 to create a formal
semantics for 𝐿 automatically?
In this paper, we assume that the given compiler or interpreter is capable of executing any program
or subprogram in language 𝐿.
This question is natural, but answering it formally requires one to address two key challenges.
First, in what formalism should the formal semantics be expressed? The right formalism should
be expressive enough to capture common semantics, yet structured enough to allow synthesis to be
possible. Furthermore, the formalism should not be tied to any specific programming language—i.e.,
it should be language-agnostic.
Second, how can the synthesis problem be broken down into small enough subproblems
for which one can design a practical approach? The representation of the semantics of most
programming languages is usually very large, and a monolithic synthesis approach that does not
take advantage of the compositionality of semantics definitions is bound to fail.

Our Approach. In this paper, we address both of these challenges and present an algorithm that
can automatically synthesize an inductively defined syntax-directed semantics when given (i) a
grammar describing the syntax of the language, and (ii) an executable (closed-box) interpreter for
computing the semantics of programs in the language on given inputs.
To address the first of the aforementioned challenges, we choose to synthesize the formal
semantics in the form of Constrained Horn Clauses (CHCs), a well-studied fragment of first-order
logic that already provides the foundation of SemGuS [7, 14], a domain- and solver-agnostic
framework for defining arbitrary synthesis problems. CHCs can naturally express a big-step
operational semantics, structured as an inductive definition over a language’s abstract syntax,
which makes them appropriate for compositional reasoning.
For example, the operational semantics for an assignment to a variable x in an imperative
programming language can be written as the following CHC:

\[
\frac{\llbracket e \rrbracket(s_1) = r_1 \qquad s_1 = s_0 \;\land\; r_0 = s_0[x \mapsto r_1]}{\llbracket x := e \rrbracket(s_0) = r_0}
\]

The CHC is defined inductively in terms of the semantics of the child term 𝑒.
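Read operationally, the rule first evaluates the child term 𝑒 in the incoming state and then updates the binding of x. The following Python sketch (our illustration, not part of the paper's formalism; states are dictionaries and sem_e stands for the child's semantics) spells out that reading:

# A minimal sketch of the operational reading of the CHC above (assumed names).
def sem_assign_x(sem_e, s0):
    s1 = s0                      # premise: s1 = s0
    r1 = sem_e(s1)               # premise: [[e]](s1) = r1
    r0 = dict(s0, x=r1)          # constraint: r0 = s0[x -> r1]
    return r0                    # conclusion: [[x := e]](s0) = r0

# e.g., sem_assign_x(lambda s: s["x"] + 1, {"x": 3}) == {"x": 4}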
To address the second aforementioned challenge, we take advantage of the inductive structure
of CHCs and design a synthesis algorithm that inductively synthesizes the semantics of programs


in the grammar, starting from simple base constructs and moving up to more complex inductively-
defined constructs. For each construct in the language, our algorithm uses a counter-example-guided
inductive synthesis (CEGIS) loop to synthesize the semantic rule—i.e., the CHC—for that construct.
For each construct, we use input-output valuations obtained by calling the closed-box interpreter
to approximate the behavior of its child terms. Such an approximation allows us to synthesize
the semantics construct-by-construct, rather than all at once, which converts the problem of
synthesizing semantics into many smaller problems that only have to synthesize part of the overall
semantics.
To evaluate our approach, we implemented it in a tool called Synantic. Our evaluation of
Synantic involved synthesizing the semantics for languages with a wide variety of features,
including assignments, conditionals, while loops, bit-vector operations, and regular expressions.
The evaluation revealed that our approach not only can help synthesize semantics of non-trivial
languages but can also help debug existing semantics.
Goals and No-goals. Our tool Synantic mainly targets users who want to use verification and
synthesis techniques on an existing language. Once Synantic creates the semantics in SemGuS
format, a wide range of tools based on SemGuS can be applied immediately [11]. For example, the
user obtains a synthesizer for the existing language essentially for free, because creating the SemGuS
files requires minimal manual labor. Although our original goal was to help SemGuS users, our
techniques are general, and we envision that they could be applied to other semantic-specification
frameworks (e.g., to help formalize semantics for use with a theorem prover). Synthesizing the
semantics of purely academic languages is a no-goal for our tool: most such languages already have
a formal semantics available before an interpreter is implemented, so creating the SemGuS
specification is trivial.
Contributions. Our work makes the following contributions:
• We introduce a new kind of synthesis problem: the semantics-synthesis problem (Section 3).
• We devise an algorithm for solving semantics-synthesis problems (Section 4). In this algorithm,
we harness an example-based program synthesizer (specifically a SyGuS solver) to synthesize
the constraint in each CHC.
• We implement our algorithm in a tool, called Synantic, which also supports an optimization
for multi-output productions, i.e., productions whose semantic constraints include multiple
output variables (Section 5).
• We evaluate Synantic on a range of different language benchmarks from the program-
synthesis literature. For one benchmark, the Synantic-generated semantics revealed an
inconsistency in the way the original semantics had been formalized. Fixing the inconsistency
in the semantics resulted in a more efficient semantics and a speedup (in some cases 1.2x) for
a synthesizer solving synthesis problems over that language (Section 6).
Section 2 illustrates how our algorithm synthesizes the semantics of an imperative while-loop
language. Section 7 discusses related work. Section 8 concludes.
References of the form Appendix A.1 refer to appendices that are available in the arXiv version
of this paper [18].

2 Illustrative Example
As discussed in Section 1, our technique synthesizes a semantic specification that is compatible with
the Semantics-Guided Synthesis (SemGuS) format [14]. SemGuS is a domain- and solver-agnostic
framework for specifying program synthesis and verification problems [7]. A SemGuS problem
consists of three components that the user must provide: (i) a grammar specifying the syntax


of programs; (ii) a semantics for every program in the language of the grammar, provided as a
set of Constrained Horn Clauses (CHCs) assigned to the productions of the grammar; and (iii) a
specification of the desired program that makes use of the semantic predicates. Crucially, SemGuS
enables the development of general tools for program synthesis and verification, thus reducing the
burden of creating such tools for custom languages [11]. However, the stumbling block is that the
end user must be able to provide a semantics of the language they are interested in working with,
a task that can be burdensome and error-prone to perform by hand. In this section, we illustrate
how our technique (implemented in Synantic) automatically synthesizes such a semantics for
an imperative language Imp (cf. Example 2.1)—a simple but illustrative example of Synantic’s
abilities.
Example 2.1 (Syntactic Definition of Imp). Consider the grammar 𝐺 Imp𝑛 that defines the syntax of
Imp for programs with 𝑛 variables x1 , . . . , xn :
𝑆 ::= x1 := 𝐸 | · · · | xn := 𝐸 | 𝑆 ; 𝑆 | ite 𝐵 𝑆 𝑆 | while 𝐵 do 𝑆
    | do 𝑆 while 𝐵 | repeat 𝑆 until 𝐵
𝐵 ::= false | true | ¬ 𝐵 | 𝐵 ∧ 𝐵 | 𝐵 ∨ 𝐵 | 𝐸 < 𝐸
𝐸 ::= 0 | 1 | x1 | · · · | xn | 𝐸 + 𝐸 | 𝐸 − 𝐸
The Imp language consists of arithmetic and Boolean expressions, statements for assignment to
the variables x1 through xn , sequential composition, if-then-else, and various looping constructs.
Imp also comes equipped with an executable interpreter IImp that assigns to each term 𝑡 ∈ L (𝐺)
its standard (denotational) semantics (e.g., arithmetic and Boolean expressions are evaluated as in
linear integer arithmetic, xi := 𝑒 takes as input a state and outputs the input state with 𝑥𝑖 's value
updated by the result of evaluating 𝑒, etc.).
Suppose that we did not know the semantics of Imp a priori; that is, suppose that we only have
access to the interpreter IImp . How can we synthesize a formal semantics for each program in 𝐺 Imp
using the interpreter? A naïve approach would randomly generate a large set of terms and inputs,
and try to learn a function mapping inputs to outputs for each term. However, this approach would
only provide a semantics for the enumerated terms and would fail to generalize to the entire language. A
less naïve approach might attempt to form a monolithic synthesis problem to synthesize a semantic
function for each production of the grammar that satisfies a set of generated example terms and
input-output pairs. However, it is known that synthesizers scale exceptionally poorly with the size
of the desired output [3]; even for Imp1 , which has only 17 productions, this approach would be
practically infeasible.
Nullary productions. One of the key innovations of our approach is that we synthesize the
semantics on a per-production basis, i.e., working one production at a time. We start by synthesizing
a semantics for nullary (leaf) productions. For Imp1 , this means we synthesize a semantics for the
productions 0, 1, x1 , false, and true before we synthesize the semantics of any other productions.
For a nullary production p, we synthesize a semantics of the form:
\[
\frac{x_0^{\mathit{out}} = f(x_0^{\mathit{in}})}{\mathrm{Sem}(p,\, x_0^{\mathit{in}},\, x_0^{\mathit{out}})}
\]
which states that, because the term p has no sub-terms, the output is only a function of the input 𝑥 0in .
In our approach, we use a Counter-Example-Guided Synthesis (CEGIS) approach to synthesize a
function 𝑓 that captures the behavior of IImp on production 𝑝. Within the CEGIS loop, we synthesize
a candidate function 𝑓 , then verify if it is consistent with IImp (e.g., on a larger number of inputs

Proc. ACM Program. Lang., Vol. 8, No. OOPSLA2, Article 284. Publication date: October 2024.
Synthesizing Formal Semantics from Executable Interpreters 284:5

𝑥 0in ). If 𝑓 is consistent, then we have successfully learned the semantics of 𝑝; otherwise, the verifier
generates a counter-example, which is added to the example set and used to synthesize a new candidate semantic function 𝑓 .
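To make the interaction concrete, the following Python sketch (our illustration; interp stands in for the closed-box interpreter, and the hypothesis space is deliberately tiny) plays out the CEGIS loop for a nullary production, with the verifier approximated by fuzzing as in Synantic (cf. Section 5.2):

# A hedged sketch of the CEGIS loop for a nullary production; `interp` stands in
# for the closed-box interpreter, and CANDIDATES is a toy hypothesis space.
import random

def interp(term, x):                       # assumed closed-box interpreter (Imp1 expressions)
    return {"0": 0, "1": 1, "x1": x}[term]

CANDIDATES = [                             # candidate functions f in  x0_out = f(x0_in)
    ("0", lambda x: 0),
    ("1", lambda x: 1),
    ("x0_in", lambda x: x),
]

def synth_semantic_constraint(examples):
    # return some f consistent with every (input, output) example seen so far
    for name, f in CANDIDATES:
        if all(f(i) == o for i, o in examples):
            return name, f
    raise RuntimeError("hypothesis space exhausted")

def verify(term, f, trials=100):
    # fuzzing-based approximation of the equivalence oracle
    for _ in range(trials):
        x = random.randint(-50, 50)
        if f(x) != interp(term, x):
            return (x, interp(term, x))    # counter-example (input, expected output)
    return None

def cegis_nullary(term):
    examples = []
    while True:
        name, f = synth_semantic_constraint(examples)
        cex = verify(term, f)
        if cex is None:
            return name                    # learned constraint: x0_out = <name>
        examples.append(cex)

print(cegis_nullary("x1"))                 # expected: "x0_in"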
Inductively synthesizing semantics. Next, our approach synthesizes the semantics for other arith-
metic and Boolean expressions. In this step, we inductively synthesize the semantics of productions
by reusing the semantics of previously learned productions to learn the semantics of new produc-
tions. At this point, we may assume that we know the semantics of all nullary productions. For
instance, suppose that we wish to next learn the semantics of +. At first, our algorithm generates
examples favoring terms like 1 + 1, x + 1, etc. that contain sub-terms whose semantics have already
been learned. For t1 + t2 , our algorithm generates a semantics that can rely on the semantics of its
sub-terms t1 and t2 . Specifically, the semantics of t1 + t2 takes the following form:
\[
\frac{\begin{array}{c}
\mathit{sem}(t_1, x_1^{in}, x_1^{out}) \qquad \mathit{sem}(t_2, x_2^{in}, x_2^{out}) \\
x_1^{in} = f_1(x_0^{in}) \qquad x_2^{in} = f_2(x_0^{in}, x_1^{out}) \qquad x_0^{out} = f_0(x_0^{in}, x_1^{out}, x_2^{out})
\end{array}}{\mathit{sem}(t_1 + t_2,\, x_0^{in},\, x_0^{out})}
\]
which states that the semantics of t1 + t2 is inductively defined in terms of the semantics of t1 and
the semantics of t2 . The semantics enforces a left-to-right evaluation order:1 the rule expresses
that the input to t1 , 𝑥 1in , is a function of t1 + t2 ’s input, 𝑥 0in , and similarly that t2 ’s input, 𝑥 2in , is
a function of t1 + t2 's input, 𝑥 0in , and t1 's output, 𝑥 1out . Finally, it also expresses that t1 + t2 's
output, 𝑥 0out , is a function of its input, 𝑥 0in , and the outputs of t1 (𝑥 1out ) and t2 (𝑥 2out ).
When the semantics of a sub-term ti is known (e.g., for nullary productions), we substitute its
learned semantics for sem(ti, 𝑥𝑖in, 𝑥𝑖out ); otherwise, we approximate its semantics using examples.
Again, we use a CEGIS loop to generate examples for the entire term t1 + t2 , as well as any sub-
terms whose exact semantics have not yet been synthesized (e.g., for a sub-term that uses + or −).
The process proceeds analogously for most other productions in Imp.
Semantically recursive productions. The final interesting case is for while loops, for
which the semantics is recursive on the term itself. For semantically recursive produc-
tions, we assume that the semantics can make a recursive call (i.e., effectively acting
as if the term itself is a sub-term). We additionally synthesize a predicate determin-
ing if the recursive call should be made or not. For while b do s, we synthesize two
semantic rules, one in which the recursive call is made, and one in which it is not.
\[
\frac{\begin{array}{c}
\mathit{sem}(b, x_1^{in}, x_1^{out}) \qquad \mathit{sem}(s, x_2^{in}, x_2^{out}) \qquad \neg\mathit{Pred}_{rec}(x_0^{in}, x_1^{out}, x_2^{out}) \\
x_1^{in} = f_1(x_0^{in}) \qquad x_2^{in} = f_2(x_0^{in}, x_1^{out}) \qquad x_0^{out} = f_0(x_0^{in}, x_1^{out}, x_2^{out})
\end{array}}{\mathit{sem}(\text{while } b \text{ do } s,\, x_0^{in},\, x_0^{out})}
\]

\[
\frac{\begin{array}{c}
\mathit{sem}(b, x_1^{in}, x_1^{out}) \qquad \mathit{sem}(s, x_2^{in}, x_2^{out}) \qquad \mathit{sem}(\text{while } b \text{ do } s,\, x_3^{in}, x_3^{out}) \qquad \mathit{Pred}_{rec}(x_0^{in}, x_1^{out}, x_2^{out}) \\
x_1^{in} = f_1(x_0^{in}) \qquad x_2^{in} = f_2(x_0^{in}, x_1^{out}) \qquad x_3^{in} = f_3(x_0^{in}, x_1^{out}, x_2^{out}) \qquad x_0^{out} = f_0(x_0^{in}, x_1^{out}, x_2^{out}, x_3^{out})
\end{array}}{\mathit{sem}(\text{while } b \text{ do } s,\, x_0^{in},\, x_0^{out})}
\]


As with the previous productions, our algorithm uses a CEGIS loop to synthesize a candidate
semantics of the above form, verify its correctness, and generate a counter-example if the candidate
semantics is incorrect. While we may employ learned semantics for sub-terms, recursive calls to
a sub-term must be approximated using examples because we are still in the process of learning
its semantics. We formally define the semantics-synthesis problem that we solve in Section 3 and
explain how our synthesis algorithm works in Section 4.
Multi-output productions. In the above while-loop example, we saw that the function 𝑓0 had four
inputs that must be considered when synthesizing a term to instantiate 𝑓0 . As the number of input
1We show how to overcome this restriction in Section 5.1.


variables and the size of the desired result grow, synthesis scales poorly. In the above examples,
the notation does not show the full picture: for Imp𝑛 , all input and (most) output variables are
𝑛-tuples of variables representing a state of an Imp𝑛 program. Even for just Imp2 , 𝑓0 has twice as
many inputs.
To address this problem, we allow synthesizing the semantics of each output of a production
independently. For example, consider the production x0 := t (for Imp2 ). We generate a semantics
using two constraints 𝐹 and 𝐺, synthesized independently. The constraint 𝐹 (resp. 𝐺) represents the pair of
functions 𝑓0 and 𝑓1 (resp. 𝑔0 and 𝑔1 ).

\[
\frac{\mathit{sem}(t, x_1^{in}, x_1^{out}) \qquad x_1^{in} = f_1(x_0^{in}) \qquad x_0^{out} = f_0(x_0^{in}, x_1^{out})}{\mathit{sem}(\text{x0} := t,\, x_0^{in},\, x_0^{out})} \;(F)
\]

\[
\frac{\mathit{sem}(t, x_1^{in}, x_1^{out}) \qquad x_1^{in} = g_1(x_0^{in}) \qquad x_0^{out} = g_0(x_0^{in}, x_1^{out})}{\mathit{sem}(\text{x0} := t,\, x_0^{in},\, x_0^{out})} \;(G)
\]

By independently synthesizing 𝐹 and 𝐺, we reduce the burden on the underlying synthesizer;


however, the synthesizer is now allowed to return an 𝐹 and a 𝐺 for which 𝑓1 ≠ 𝑔1 , in which case 𝐹 and 𝐺
provide inconsistent inputs to the child term 𝑡. We use an SMT solver to determine
whether 𝑓1 and 𝑔1 are consistent for each of the example inputs to the term x0 := t. If so, we return
either 𝑓0, 𝑔0, 𝑓1 or 𝑓0, 𝑔0, 𝑔1 (because 𝑓1 and 𝑔1 are consistent on all examples—i.e., when evaluated on
the same example they return equal outputs—either choice is valid); otherwise, we discover that 𝑓1 and 𝑔1 are inconsistent
on some input and add a new constraint to ensure that the same pair of functions 𝑓1 and 𝑔1 cannot
be synthesized again. This optimization is further discussed in Section 5.3.

3 Problem Definition
In this paper, we consider the problem of synthesizing a formal logical semantics for a deterministic
language from an executable interpreter. While there are many possible ways to logically define a
semantics, we are interested in an approach that is language-agnostic and inductive. The SemGuS
synthesis framework has proposed using Constrained Horn Clauses as a way of defining program
semantics that meets both of our desiderata. Concretely, SemGuS already supports synthesis for
a large number of languages (which we consider in our experimental evaluation) by allowing a
user to provide a user-defined semantics. As mentioned above, in SemGuS, semantics are defined
inductively on the structure of the grammar (i.e., per production/language construct) using logical
relations represented as Constrained Horn Clauses (CHCs) [14]. In this paper, we follow suit and
address the problem of learning a semantics of this form from an executable interpreter for the
given language. This section formalizes the semantics-synthesis problem that we consider. We
begin by detailing our representation of syntax (Section 3.1), interpreters (Section 3.2), semantics
(Section 3.3), and semantic-equivalence oracles (Section 3.4). Finally, we formalize the semantics-
synthesis problem in Section 3.4.

3.1 Syntax
We consider languages represented as regular tree grammars (RTGs). A ranked alphabet is a tuple
⟨Σ, 𝑟𝑘 Σ ⟩ that consists of a finite set of symbols Σ and a function 𝑟𝑘 Σ : Σ → N that associates every
symbol with a rank (or arity). For any 𝑛 ≥ 0, Σ𝑛 ⊆ Σ denotes the set of symbols of rank 𝑛. The set
of all (ranked) Trees over Σ is denoted by 𝑇Σ . Specifically, 𝑇Σ is the least set such that Σ0 ⊆ 𝑇Σ and if
𝜎 𝑘 ∈ Σ𝑘 and 𝑡 1, . . . , 𝑡𝑘 ∈ 𝑇Σ , then 𝜎 𝑘 (𝑡 1, . . . , 𝑡𝑘 ) ∈ 𝑇Σ . In the remainder of the paper, we assume a
fixed ranked alphabet ⟨Σ, rk Σ ⟩.


A typed regular tree grammar (RTG) is a tuple 𝐺 = ⟨𝑁 , Σ, 𝛿, T, 𝜃, 𝜏⟩, where 𝑁 is a finite set of
non-terminal symbols of rank 0, Σ is a ranked alphabet, 𝛿 is a set of productions over a set of types
T, and 𝜃 (resp. 𝜏) assigns each non-terminal 𝐴 ∈ 𝑁 an input type 𝜃 𝐴 (resp. an output type 𝜏𝐴 )
from T. Each production in 𝛿 takes the form:
\[ A_0 \rightarrow \sigma(A_1, A_2, \ldots, A_{rk_\Sigma(\sigma)}) \]
where 𝐴𝑖 ∈ 𝑁 and 𝜎 ∈ Σ. We use L (𝐴) to denote the language of non-terminal 𝐴 and 𝛿 (𝐴) to denote the
set of all productions associated with 𝐴 (i.e., all productions whose left-hand side 𝐴0 is 𝐴). In the remainder, we
assume a fixed grammar 𝐺 = ⟨𝑁 , Σ, 𝛿, T, 𝜃, 𝜏⟩.
Example 3.1 (𝐺 Imp as a Regular Tree Grammar). Consider the Imp language detailed in Section 2,
𝐺 Imp is a regular tree grammar that has been stylized to ease readability. For example, the non-
terminals consist of the rank-0 symbols 𝐸, 𝐵, and 𝑆. The productions include 𝑆 → x1 :=(𝐸),
𝑆 → ;(𝑆, 𝑆), and 𝑆 → while(𝐵, 𝑆). For Imp2 (Imp with two variables 𝑥 1 and 𝑥 2 ), 𝜃 𝐸 is the type Z × Z,
representing the state of the two variables, and 𝜏𝐸 is Z, representing the return type of arithmetic
expressions.

3.2 Interpreters
We consider a class of deterministic executable interpreters—i.e., a program evaluator for which we
may only observe input-output behavior.
Definition 3.2 (Interpreter). Formally, an interpreter for 𝐺 maps each non-terminal 𝐴 ∈ 𝑁 to
a partial function I𝐴 : (L (𝐴) × 𝜃 𝐴 ) → 𝜏𝐴 —with the interpretation that the interpreter maps a
program 𝑡 ∈ L (𝐴) and input value in ∈ 𝜃 𝐴 to some output out ∈ 𝜏𝐴 if and only if 𝑡 starting with
the input value in terminates with the output value out.
Example 3.3 (Interpreters for Imp1 ). Recall the Imp language defined in Section 2. The interpreter
I for Imp consists of three base interpreters I𝐸 , I𝐵 , and I𝑆 , which are used to evaluate arithmetic
expressions, Boolean expressions, and statements, respectively. Throughout this paper, we assume
the interpreters for Imp1 (and all Imp variants) evaluate according to the standard denotational
semantics (e.g., 0 is the expression that always returns 0 regardless of input state; + is mathematical
+; while b do s evaluates 𝑏 and, if 𝑏 evaluates to true, executes the loop body and recurses, and otherwise
immediately terminates; etc.).
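For concreteness, the following Python sketch shows what such a closed-box interpreter might look like for Imp1 under the standard semantics just described (the term representation and function names are our own assumptions; only the input-output behavior matters to Synantic):

# A minimal sketch of a closed-box interpreter I for Imp1 (single variable x).
# Terms are nested tuples such as (";", (":=", ("0",)), (":=", ("+", ("x",), ("1",)))).
def eval_E(t, x):                 # I_E : arithmetic expressions, state -> Int
    op = t[0]
    if op == "0": return 0
    if op == "1": return 1
    if op == "x": return x
    if op == "+": return eval_E(t[1], x) + eval_E(t[2], x)
    if op == "-": return eval_E(t[1], x) - eval_E(t[2], x)
    raise ValueError(op)

def eval_B(t, x):                 # I_B : Boolean expressions, state -> bool
    op = t[0]
    if op == "true": return True
    if op == "false": return False
    if op == "not": return not eval_B(t[1], x)
    if op == "and": return eval_B(t[1], x) and eval_B(t[2], x)
    if op == "or": return eval_B(t[1], x) or eval_B(t[2], x)
    if op == "<": return eval_E(t[1], x) < eval_E(t[2], x)
    raise ValueError(op)

def eval_S(t, x):                 # I_S : statements, state -> state (loops other than while omitted)
    op = t[0]
    if op == ":=": return eval_E(t[1], x)      # one variable: the new state is e's value
    if op == ";": return eval_S(t[2], eval_S(t[1], x))
    if op == "ite": return eval_S(t[2], x) if eval_B(t[1], x) else eval_S(t[3], x)
    if op == "while":
        while eval_B(t[1], x):
            x = eval_S(t[2], x)
        return x
    raise ValueError(op)

# e.g., eval_S((";", (":=", ("0",)), (":=", ("+", ("x",), ("1",)))), 7) == 1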

3.3 Semantics
We represent the big-step semantics of a language (defined by some grammar 𝐺) using a set
of Constrained Horn Clauses (CHCs) within some background theory T per production. While
CHCs (at first glance) seem limiting, this formulation of semantics has been employed by the
SemGuS framework to represent user-defined semantics for many languages [7, 14], including
many variations of Imp, regular expressions, SyGuS expressions within the theory of bit vectors,
algebraic data types, and linear integer arithmetic.
Definition 3.4 (Constrained Horn Clause). A CHC (in theory T ) is a first-order formula of the
form:
\[
\forall \bar{x}_1, \ldots, \bar{x}_n, \bar{x}.\;\; \phi \land R_1(\bar{x}_1) \land \cdots \land R_n(\bar{x}_n) \Rightarrow H(\bar{x})
\]
where 𝑅1, . . . , 𝑅𝑛 and 𝐻 are uninterpreted relations, 𝑥¯1, . . . , 𝑥¯𝑛 and 𝑥¯ are variables, and 𝜙 is a
quantifier-free T -constraint over the variables.
To specify the big-step semantics of a non-terminal 𝐴 ∈ 𝑁 (for which the interpreter has type
I𝐴 : (L (𝐴) × 𝜃 𝐴 ) → 𝜏𝐴 ), we introduce the semantic relation Sem𝐴 (𝑡𝐴 , 𝑥𝐴in, 𝑥𝐴out ), where 𝑡𝐴 is a


variable representing elements of L (𝐴), 𝑥𝐴in is a variable of type 𝜃 𝐴 , and 𝑥𝐴out is a variable of type 𝜏𝐴 .
Throughout this paper, we may also use J𝑡𝐴 KSem (𝑥𝐴in ) = 𝑥𝐴out to denote that Sem𝐴 (𝑡𝐴 , 𝑥𝐴in, 𝑥𝐴out ) holds.
Example 3.5 (Semantic relations). Consider the Imp1 language introduced in Section 2; a semantics
for Imp1 uses the semantic relations:
Sem𝐸 : L (𝐸) × Z × Z → bool,   Sem𝐵 : L (𝐵) × Z × bool → bool,   Sem𝑆 : L (𝑆) × Z × Z → bool
While CHCs are quite general and capable of defining both deterministic and non-deterministic
semantics, we limit our scope to CHCs that represent deterministic semantics. Furthermore, for a
grammar 𝐺, we assume that each production 𝐴0 → 𝜎 (𝐴1, . . . , 𝐴𝑛 ) ∈ 𝐺 evaluates sub-terms in a
fixed order from left to right (i.e., for a term 𝑝 (𝑡 1, . . . , 𝑡𝑛 ) sub-term 𝑡 1 is evaluated before 𝑡 2 , etc.).
While this imposed order may seem too restrictive, we later show how this restriction can be lifted
by considering all permutations of sub-terms.
Definition 3.6 (Semantic Rule, Semantic Constraint). Given a production 𝐴0 → 𝑝 (𝐴1, . . . , 𝐴𝑛 ), a
semantic rule for 𝑝 is a CHC of the form:
\[
\frac{\mathrm{Sem}_{A_1}(t_1, x_1^{in}, x_1^{out}) \quad \ldots \quad \mathrm{Sem}_{A_n}(t_n, x_n^{in}, x_n^{out}) \quad F(x_0^{in}, \ldots, x_n^{in}, x_0^{out}, \ldots, x_n^{out})}{\mathrm{Sem}_{A_0}(p(t_1, \ldots, t_n),\, x_0^{in},\, x_0^{out})} \tag{1}
\]
where 𝐹 is a constraint over theory T , which we call a semantic constraint, that takes the form:
\[
x_1^{in} = f_1(x_0^{in}) \land \cdots \land x_n^{in} = f_n(x_1^{out}, \ldots, x_{n-1}^{out}, x_0^{in}) \land x_0^{out} = f_0(x_1^{out}, \ldots, x_n^{out}, x_0^{in}) \land P(x_0^{in}, x_0^{out}, \ldots, x_n^{out}) \tag{2}
\]
where each 𝑓𝑖 is a function that returns a term of type 𝜃 𝐴𝑖 for 𝑖 > 0 and of type 𝜏𝐴0 for 𝑖 = 0. The semantic
constraint also includes a predicate 𝑃 (𝑥𝐴in0 , 𝑥𝐴out1 , . . . , 𝑥𝐴out𝑛 ) that determines when the semantic rule is
valid (e.g., for conditionals and loops).
Example 3.7 (Semantics of do_while). We give the semantics of the do_while Imp statement
below:
\[
\frac{\llbracket s \rrbracket(x_1) = x_1' \quad \llbracket b \rrbracket(x_2) = r_b \quad \llbracket \text{do } s \text{ while } b \rrbracket(x_3) = x_3' \quad r_b \quad x_1 = x_0 \quad x_2 = x_1' \quad x_3 = x_1' \quad x_0' = x_3'}{\llbracket \text{do } s \text{ while } b \rrbracket(x_0) = x_0'}
\]

\[
\frac{\llbracket s \rrbracket(x_1) = x_1' \quad \llbracket b \rrbracket(x_2) = r_b \quad \neg r_b \quad x_1 = x_0 \quad x_2 = x_1' \quad x_0' = x_1'}{\llbracket \text{do } s \text{ while } b \rrbracket(x_0) = x_0'}
\]
The first rule executes the statement 𝑠 and then, if the guard 𝑏 is true, recursively executes the
whole loop and returns the resulting value. The second rule executes the statement 𝑠 and then, if
the guard 𝑏 is false, returns the output produced by executing the statement 𝑠.

3.4 Equivalence Oracle and Semantics Synthesis Problem


For a grammar 𝐺, a semantics Sem for 𝐺, and an interpreter I for 𝐺, an equivalence oracle is used
to determine whether Sem is equivalent to the semantics defined by the interpreter I.
Definition 3.8 (Equivalent, Equivalence Oracle). Given an interpreter I for a language 𝐺, a
subgrammar 𝐺 ′ ⊆ 𝐺, and a semantics Sem for 𝐺 ′ , we say that I and Sem are equivalent on 𝐺 ′ if
and only if for every term 𝑡 ∈ L (𝐺 ′ ) derived from a non-terminal 𝐴, input in ∈ 𝜃 𝐴 , and output out ∈ 𝜏𝐴 , we have:
𝐼 (𝑡, in) = out ⇔ J𝑡KSem (in) = out
An equivalence oracle E for I is a function that takes as input a semantics Sem for 𝐺 ′ and
determines if Sem is equivalent to I on 𝐺 ′ . If Sem is not equivalent to I, then E returns an


example ⟨in, 𝑡, out⟩ for which I and Sem disagree—i.e., there is some term 𝑡 and input in such that
J𝑡KSem (in) ≠ J𝑡K I (in)—and otherwise returns None when Sem and I are equivalent.

Given a language (a grammar and accompanying interpreter), the semantics synthesis problem
is to find some semantics of the language that is equivalent to the interpreter. We formalize the
semantics synthesis problem as follows:

Definition 3.9 (Semantics-Synthesis Problem, Solution). A semantics-synthesis problem is a


tuple P ≜ ⟨𝐺, I, E⟩, where 𝐺 is a grammar, I is an interpreter for 𝐺, and E is an equivalence
oracle for I. A solution to the semantics-synthesis problem P is a semantics Sem for 𝐺 that is
equivalent to I as determined by E.

4 Semantics Synthesis
This section presents an algorithm SemSynth (Algorithm 1) to synthesize a semantics for a language
from an executable interpreter. The input to SemSynth is a semantics-synthesis problem consisting
of (i) a grammar 𝐺, (ii) an executable interpreter I for 𝐺, and (iii) an equivalence oracle E for I.
Upon termination, SemSynth returns a semantics Sem for 𝐺 that is equivalent to the executable
interpreter I as determined by the equivalence oracle E.
Synthesizing a semantics for arbitrary languages comes with several challenges. In general,
semantics are defined as complex recursively defined functions that provide an interpretation
to every program within the language. Trying to directly synthesize such a semantics is already
impractical for relatively small languages, such as the Imp language defined in Example 2.1.
As described in Section 3.3, we consider semantics represented using logical relations defined
by a set of Constrained Horn Clauses per production of 𝐺 (cf. Definition 3.6). By formulating
the desired semantics as CHCs per production, SemSynth can synthesize the semantics of 𝐺 one
production at a time. In fact, because SemSynth uses examples to approximate the semantics of
all sub-terms during synthesis (cf. Section 4.1), SemSynth can synthesize the semantics of each
production independently. Finally, by fixing the shape of the semantics (i.e., as a set of CHCs per
production), SemSynth reduces the monolithic synthesis problem to a series of first-order synthesis
problems—specifically, by using a SyGuS or sketch-based synthesizer to synthesize the constraint
of each semantic rule (CHC) defining the semantics of a production.
The remainder of this section is structured as follows: Section 4.1 provides a high-level overview
of how SemSynth solves semantic-synthesis problems, Sections 4.2 and 4.3 provide specifications
for SynthSemanticConstraint and Verify, which synthesize semantic constraints from examples
and verify candidate semantic constraints against the interpreter, respectively. Finally, Section 4.4
explains how SemSynth handles semantically recursive productions.

4.1 Overview of SemSynth


SemSynth (Algorithm 1) uses the counter-example-guided synthesis (CEGIS) paradigm to synthe-
size a semantics for 𝐺 that is equivalent to I according to the equivalence oracle E. Throughout
this section, we will use the Imp language from Example 2.1 to illustrate how SemSynth operates.

Synthesizing a Candidate Semantics. After initialization, SemSynth synthesizes the semantics of


each production. SemSynth employs a CEGIS loop to synthesize the semantics of each production.
During each iteration, SemSynth first synthesizes a candidate semantic constraint (cf. Definition 3.6)
for production 𝑝 using SynthSemanticConstraint. The procedure SynthSemanticConstraint
returns some semantic constraint for 𝑝 that satisfies the set of examples 𝐸. Section 4.2 provides a
formal specification of SynthSemanticConstraint’s operation.


Algorithm 1: Semantics-Synthesis Algorithm


1 Procedure SemSynth (𝐺, I, E)
2 foreach production 𝑝 of 𝐺 do
3 𝐸←∅; // Example Set for Production 𝑝
4 do
5 Sem[𝑝] ← SynthSemanticConstraint(𝑝, 𝐸) ; // Get candidate semantics
6 CEX ← Verify(Sem[𝑝], 𝑝, I, E) ; // Check candidate semantics
7 if 𝐶𝐸𝑋 ≠ ∅ then
8 𝐸 ← 𝐸 ∪ CEX ; // Update example set
9 while CEX ≠ ∅;
10 return Sem;

SemSynth then uses the procedure Verify to determine if the semantics synthesized for produc-
tion 𝑝 is consistent with the interpreter I as determined by the equivalence oracle E. A formal
specification of Verify is provided in Section 4.3. If Verify determines that the candidate semantics
of 𝑝 is correct, then Verify returns an empty set of examples and SemSynth proceeds to synthesize
the semantics of the next production. Otherwise, if Verify determines that the candidate semantics
of 𝑝 is not equivalent to the interpreter I, Verify returns a set of examples. The new examples are
added to the example set 𝐸, and the CEGIS loop repeats and synthesizes a new candidate semantics
for 𝑝.

4.2 Specification of SynthSemanticConstraint


Before formally specifying SynthSemanticConstraint (Section 4.2.3), we first define example
sets (Section 4.2.1) and when a semantic constraint is consistent with an example set (Section 4.2.2).
4.2.1 Example Sets. For an interpreter I, an example set 𝐸 is a set of examples consistent with I.

Definition 4.1 (Example set for interpreter I). Given an interpreter I for grammar 𝐺, an example
set 𝐸 for interpreter I is a finite set of examples of the form ⟨in, 𝑡, out⟩, where 𝑡 ∈ 𝐿(𝐺) and
I (𝑡, in) = out.

Example 4.2 (Example set for Imp1 ). Recall the interpreter IImp1 described in Example 3.3
for language Imp1 . An example set 𝐸 for IImp1 might include the examples ⟨0, 0, 0⟩, ⟨1, 0, 0⟩,
⟨1, x := 0; x := x + 4, 4⟩, and ⟨10, while 0 < x do x := x − 1, 0⟩; however, an example set for IImp
could not include any example of the form ⟨𝑛, while 0 < x do x := x + 1, 𝑛 ′ ⟩ where 𝑛 (the initial
value of x) is some positive number, because while 0 < x do x := x + 1 does not terminate on the
input 𝑛. Such an example would violate the assumption that 𝐸 only
contains examples consistent with the interpreter IImp .

4.2.2 Example Consistency. In SemSynth, we use the example set 𝐸 to ensure that the semantic
constraint returned by SynthSemanticConstraint is consistent with I for at least the examples
appearing in 𝐸.

Definition 4.3 (Consistency with Example Set). Given a production 𝐴0 → 𝑝 (𝐴1, . . . , 𝐴𝑛 ), a se-
mantic rule 𝑅 with semantic constraint 𝐹 of the form defined in Definition 3.6, and example set
𝐸, we say 𝑅 is consistent with 𝐸 if and only if the semantic constraint 𝐹 is consistent with 𝐸.
Furthermore, the semantic constraint 𝐹 is consistent with the example set 𝐸 if for every example


⟨in𝐴0 , 𝑝 (𝑡 1, . . . , 𝑡𝑛 ), out 𝐴0 ⟩ ∈ 𝐸 the following condition holds:
\[
\forall x_0^{in}, \ldots, x_n^{in}, x_0^{out}, \ldots, x_n^{out}.\;\;
\big( x_0^{in} = \mathit{in}_{A_0} \;\land\; \mathrm{Summary}(t_1) \;\land\; \cdots \;\land\; \mathrm{Summary}(t_n) \;\land\; F \big) \Rightarrow x_0^{out} = \mathit{out}_{A_0} \tag{3}
\]
where \(\mathrm{Summary}(t_i) = \bigvee \{\, x_i^{in} = \mathit{in}_i \land x_i^{out} = \mathit{out}_i \;:\; \langle \mathit{in}_i, t_i, \mathit{out}_i \rangle \in E \,\}\) summarizes the semantics of
𝑡𝑖 according to the examples found in 𝐸.


Example 4.4 (Example Consistency). Consider the production for the operator +, and the (correct)
semantic constraint 𝐹 ≜ 𝑥 1in = 𝑥 0in ∧ 𝑥 2in = 𝑥 0in ∧ 𝑥 0out = 𝑥 1out + 𝑥 2out ; 𝐹 is consistent with the examples
⟨0, x0 + 1, 1⟩, ⟨0, x0, 0⟩, and ⟨0, 1, 1⟩. Specifically, the following formula is valid:
\[
\forall x_0^{in}, x_1^{in}, x_2^{in}, x_0^{out}, x_1^{out}, x_2^{out}.\; \big( x_0^{in} = 0 \land (x_1^{in} = 0 \land x_1^{out} = 0) \land (x_2^{in} = 0 \land x_2^{out} = 1) \land F \big) \Rightarrow x_0^{out} = 1.
\]
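As a sanity check, the validity of this implication can be verified mechanically. The sketch below uses the z3 Python bindings (our illustration; Synantic itself relies on cvc5) and checks that the negated implication is unsatisfiable:

# Checking the consistency condition of Example 4.4 with z3: the implication is
# valid iff its negation is unsatisfiable.
from z3 import Ints, And, Not, Implies, Solver, unsat

x0i, x1i, x2i, x0o, x1o, x2o = Ints("x0in x1in x2in x0out x1out x2out")

F = And(x1i == x0i, x2i == x0i, x0o == x1o + x2o)            # candidate semantic constraint
premises = And(x0i == 0,                                     # input of the example <0, x0 + 1, 1>
               And(x1i == 0, x1o == 0),                      # Summary(x0) from <0, x0, 0>
               And(x2i == 0, x2o == 1),                      # Summary(1)  from <0, 1, 1>
               F)
condition = Implies(premises, x0o == 1)                      # required output of <0, x0 + 1, 1>

s = Solver()
s.add(Not(condition))
print("consistent" if s.check() == unsat else "inconsistent")   # expect: consistent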
4.2.3 Formal Specification of SynthSemanticConstraint. The procedure SynthSemantic-
Constraint takes as input the production 𝑝 whose semantics is to be synthesized and the current
example set 𝐸; it returns a constraint 𝐹 —of the form defined in Definition 3.6—defining a semantics
for production 𝑝 that is consistent with the example set 𝐸.
Example 4.5 (Synthesizing semantics of x := consistent with examples). Recall that for the language
Imp, the semantics of the production x := is represented as (a set of) CHC rule(s) of the form:
\[
\frac{\mathrm{Sem}_E(e, x_1^{in}, x_1^{out}) \;\land\; x_1^{in} = f(x_0^{in}) \;\land\; x_0^{out} = g(x_0^{in}, x_1^{out})}{\mathrm{Sem}_S(\text{x} := e,\, x_0^{in},\, x_0^{out})}
\]
for some functions 𝑓 and 𝑔 (in the theory of linear integer arithmetic). The procedure call
SynthSemanticConstraint(x :=, 𝐸) synthesizes the formulas 𝑓 (𝑥 0in ) = 𝑡 𝑓 and 𝑔(𝑥 0in, 𝑥 1out ) = 𝑡𝑔 ,
and returns the constraint 𝐹 ≜ 𝑥 1in = 𝑡 𝑓 ∧ 𝑥 0out = 𝑡𝑔 so that 𝐹 is consistent with 𝐸.
We note that for functions expressible in a decidable first-order theory, this problem can be
exactly encoded as a Syntax-Guided Synthesis (SyGuS) problem [2] and solved by a SyGuS solver
(e.g., cvc5 [4]).
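For a nullary production the specification degenerates to point-wise constraints on a single function 𝑓 , and the corresponding SyGuS query can be emitted directly from the example set. The Python sketch below builds such a query in SyGuS-IF syntax (our illustration, not Synantic's actual encoding; it assumes nonnegative integer literals):

# Emit a SyGuS query for the nullary case x0_out = f(x0_in) from interpreter examples.
def sygus_query_for_nullary(examples):
    """examples: list of (input, output) pairs observed from the interpreter."""
    lines = ["(set-logic LIA)",
             "(synth-fun f ((x0_in Int)) Int)"]           # default LIA grammar
    for i, o in examples:
        lines.append(f"(constraint (= (f {i}) {o}))")     # consistency with each example
    lines.append("(check-synth)")
    return "\n".join(lines)

# e.g., for the production `0` with examples [(0, 0), (1, 0)]:
print(sygus_query_for_nullary([(0, 0), (1, 0)]))

In the general case, the sub-term variables would additionally be declared (declare-var) and constrained by the Summary disjunctions of Equation (3).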

4.3 Specification of Verify


The procedure Verify takes as input the production 𝑝, a candidate semantics of 𝑝, the interpreter
I, and the equivalence oracle E; it determines if Sem is equivalent to the interpreter I for all
terms of the form 𝑝 (𝑡 1, ..., 𝑡𝑘 ) ∈ 𝐿(𝐺). If Verify determines that the candidate semantics of 𝑝 is not
equivalent to I, Verify returns a set of counter-examples CEX such that (i) CEX is consistent with
I, (ii) CEX is not consistent with the candidate semantics of production 𝑝, and (iii) for the input
production 𝑝, there is exactly one example of the form ⟨𝑖, 𝑝 (𝑡 1, . . . , 𝑡𝑘 ), 𝑜⟩ appearing in CEX (for
any other production 𝑝 ′ ≠ 𝑝, there can be many examples of the form ⟨𝑖 ′, 𝑝 ′ (𝑡 1, . . . , 𝑡𝑘 ), 𝑜 ′ ⟩ in CEX).
Otherwise, Verify returns an empty set to signify that the semantics of 𝑝 is equivalent to I for all
terms of the form 𝑝 (𝑡 1, . . . , 𝑡𝑘 ) ∈ 𝐿(𝐺).
Example 4.6 (Synthesizing Semantics of 0 for 𝐺 Imp ). Recall the Imp language in Example 2.1.
On some iteration, SemSynth will consider the production 0 (a leaf/nullary production).
During the first iteration of the CEGIS loop for 0, the example set 𝐸 will be empty and
SynthSemanticConstraint may return any constraint 𝐹 of the form 𝑥 0out = 𝑓 (𝑥 0in ). Assume
that SynthSemanticConstraint returns the constraint 𝑥 0out = 1. Verify returns the counter-
example ⟨0, 0, 0⟩, and the example set 𝐸 is updated.


In the next iteration, the CEGIS loop must return a constraint satisfying the updated example set.
For example, suppose that SynthSemanticConstraint returns the constraint 𝑥 0out = 𝑥 0in . Again,
Verify determines that 𝑥 0out = 𝑥 0in is incorrect and returns the new counter-example ⟨1, 0, 0⟩. The
example set 𝐸 is updated with the returned counter-example.
A new iteration of the loop is run. On this iteration, SynthSemanticConstraint must return
a constraint that satisfies both of the previously returned examples. This time SynthSemantic-
Constraint returns the constraint 𝑥 0out = 0, Verify determines that 𝑥 0out = 0 is correct, and
SemSynth proceeds to synthesize the semantics of the next production (e.g., 1).
In Example 4.6, we see how SemSynth handles nullary (leaf) productions. SemSynth works
nearly identically for most production rules (excluding semantically recursive productions like
while loops). We demonstrate in Example 4.7 how SemSynth synthesizes a semantics for non-
nullary productions.
Example 4.7 (Synthesizing Semantics of Sequencing for Imp.). Continuing from Example 4.6,
SemSynth proceeds and comes to the sequencing operator (i.e., for production 𝑆 → ;(𝑆, 𝑆)).
After several attempts at synthesizing the semantics of sequencing, 𝐸 contains the examples
⟨0, x := 1; x := 0, 0⟩, ⟨0, x := 0; x := x + 1, 1⟩, and ⟨1, x := 0; (x := 1; x := x + 1), 2⟩.
In addition to these examples, we summarize the semantics of each example’s sub-term with
further examples in the example set 𝐸. These summarized examples of sub-terms are generated by
data-flow propagation through the term 𝑝 (𝑡 1, . . . , 𝑡𝑘 ) using the input 𝑖. Because the execution output
of a certain sub-term 𝑡 𝑗 can be used as input for any following term 𝑡𝑙 where 𝑙 > 𝑗, we repeatedly
enumerate all possible inputs for each sub-term (and add them into 𝐸) until we reach a fix-point, i.e.,
no new examples for sub-terms are found. SemSynth then generates the formula specifying that
the desired semantic constraint is consistent with the example set 𝐸 using the generated summaries,
and produces a new semantic constraint using SynthSemanticConstraint. On this iteration,
SynthSemanticConstraint returns the correct semantic constraint, Verify determines that it is
correct, and SemSynth proceeds to synthesize a semantics for the next production.
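The following Python sketch (with assumed helper names; example values are taken to be hashable, as Imp states are) illustrates this fix-point computation of sub-term summaries for a single top-level input:

# Generate sub-term summaries for a term p(t1,...,tk) on input `inp` by data-flow
# propagation: the output of an earlier sub-term may become the input of a later one,
# so candidate inputs are propagated left to right until no new examples appear.
def summarize_subterms(interp, subterms, inp):
    summaries = [dict() for _ in subterms]          # i -> {input: output} examples for t_i
    while True:
        changed = False
        candidates = {inp}                          # values that may reach the next sub-term
        for i, t in enumerate(subterms):
            for x in list(candidates):
                if x not in summaries[i]:
                    summaries[i][x] = interp(t, x)
                    changed = True
            candidates |= set(summaries[i].values())   # outputs may feed later sub-terms
        if not changed:
            return summaries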

4.4 Synthesizing Semantics for Semantically Recursive Productions


So far, we have seen how SemSynth handles nullary productions and structurally recursive pro-
ductions (e.g., ite and sequencing). However, we have not yet seen how to handle productions that
are semantically recursive (e.g., while loops). To handle semantically recursive productions, we
augment the form of the desired constraint to be synthesized: SynthSemanticConstraint must
synthesize a predicate 𝑃rec and two base constraints 𝐹 nonrec and 𝐹 rec such that for every example
⟨in, 𝑝 (𝑡 1, . . . , 𝑡𝑛 ), out⟩, the following conditions hold:
\[
\frac{\begin{array}{c}
\mathrm{Sem}_{A_1}(t_1, x_{A_1}^{in}, x_{A_1}^{out}) \quad \ldots \quad \mathrm{Sem}_{A_n}(t_n, x_{A_n}^{in}, x_{A_n}^{out}) \quad \neg P_{rec}(x_{A_0}^{in}, x_{A_1}^{out}, \ldots, x_{A_n}^{out}) \\
F^{\mathit{non\text{-}rec}}(x_{A_0}^{in}, x_{A_1}^{out}, \ldots, x_{A_n}^{out}) \quad x_{A_0}^{in} = \mathit{in}
\end{array}}{x_{A_0}^{out} = \mathit{out}} \;(\text{non-rec})
\]

\[
\frac{\begin{array}{c}
\mathrm{Sem}_{A_1}(t_1, x_{A_1}^{in}, x_{A_1}^{out}) \quad \ldots \quad \mathrm{Sem}_{A_n}(t_n, x_{A_n}^{in}, x_{A_n}^{out}) \quad \mathrm{Sem}_{A_0}(p(t_1, \ldots, t_n), x_{A_0}^{in\,\prime}, x_{A_0}^{out\,\prime}) \\
P_{rec}(x_{A_0}^{in}, x_{A_1}^{out}, \ldots, x_{A_n}^{out}) \quad F^{rec}(x_{A_0}^{in}, x_{A_1}^{out}, \ldots, x_{A_n}^{out}) \quad x_{A_0}^{in} = \mathit{in}
\end{array}}{x_{A_0}^{out} = \mathit{out}} \;(\text{rec})
\]
where 𝑃rec determines if the non-rec or rec condition should hold. The non-recursive case is similar
to the conditions for non-semantically recursive statements (with the addition of asserting that 𝑃rec
is false). The recursive case, however additionally allows the semantics to make use of a recursive
call to the program term. Other than the change in the shape of the desired semantics, SemSynth
remains unchanged.


Example 4.8 (Synthesizing semantics of while loops for Imp.). Continuing from Example 4.7,
SemSynth eventually considers the while production. We assume that the grammar 𝐺 additionally
annotates whether each production is semantically recursive.
After several iterations of the CEGIS loop, the example set 𝐸 contains the examples ⟨0, 𝑡, 0⟩,
⟨1, 𝑡, 0⟩, and ⟨2, 𝑡, 0⟩, where 𝑡 is the term while 0 < x do x := x − 1. In this iteration, SynthSeman-
ticConstraint gets called with a recursive summary of 𝑡 containing the three examples, and
examples for x := x − 1 and 0 < x.
In this iteration, SynthSemanticConstraint finds the correct 𝑃rec , 𝐹 non−rec and 𝐹 rec . Verify
determines that the result is indeed correct and the main loop of SemSynth continues to the next
production. If while is the last production of the considered grammar 𝐺, then SemSynth terminates
and returns the synthesized semantics for each production.
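To illustrate the shape of the result, the sketch below writes one plausible solution for the while production out executably in Python (assumed names; not necessarily the exact terms Synantic returns), with the recursive CHC rendered as a recursive call:

# One plausible learned semantics for `while b do s` in Imp1 (a sketch):
#   f1(x0_in) = x0_in,  f2(x0_in, x1_out) = x0_in,  P_rec = x1_out,
#   F_nonrec: x0_out = x0_in,  f3(...) = x2_out,  F_rec: x0_out = x3_out.
def sem_while(sem_b, sem_s, x0_in):
    x1_out = sem_b(x0_in)                       # guard evaluated on f1(x0_in) = x0_in
    x2_out = sem_s(x0_in)                       # body evaluated on f2(...) = x0_in
    if not x1_out:                              # not P_rec: no recursive call
        return x0_in                            # F_nonrec: x0_out = x0_in
    x3_out = sem_while(sem_b, sem_s, x2_out)    # recursive call on f3(...) = x2_out
    return x3_out                               # F_rec: x0_out = x3_out

# e.g., sem_while(lambda x: 0 < x, lambda x: x - 1, 10) == 0

Mirroring the premises of both rules, the body's output x2_out is computed even when the guard is false; its value is simply unused in that case.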
Now that we have defined how SemSynth handles semantically recursive productions, SemSynth
is fully specified. Theorem 4.9 states that SemSynth is sound.
Theorem 4.9 (SemSynth is sound). For any semantics-synthesis problem P = ⟨𝐺, I, E⟩, if
SemSynth(𝐺, I, E) returns a semantics Sem, then Sem is a solution to P.
Proof. SemSynth iterates over the productions in some order, say 𝑝 0, . . . , 𝑝𝑘 −1 . For all iterations
0 ≤ 𝑖 ≤ 𝑘, SemSynth maintains the invariant that the synthesized semantics Sem is correct with
respect to the oracle E for all previously considered productions 𝑝 0 through 𝑝𝑖 −1 . This condition
trivially holds on the first iteration. To proceed to iteration 𝑖 + 1, the CEGIS loop for production
𝑝𝑖 must terminate. For the CEGIS loop to terminate, Verify must return an empty set of counter-
examples, which implies that Sem is correct for the production 𝑝𝑖 (and that the semantics for
productions 𝑝 0, . . . , 𝑝𝑖 −1 were left unmodified)—and thus the invariant is maintained. The algorithm
only terminates after exploring all productions. Consequently, upon termination, Sem must be
correct for all productions of 𝐺—i.e., Sem satisfies the given semantics-synthesis problem P. □

While Theorem 4.9 states the soundness of SemSynth, it fails to show that SemSynth will
eventually synthesize a correct semantics. Theorem 4.10 states that SemSynth makes progress.
Intuitively, it states that once a semantic rule for production 𝑝 is explored during some iteration of
the CEGIS loop, it is never explored in any future iteration of the CEGIS loop for production 𝑝.
Theorem 4.10 (SemSynth makes progress). For any semantics-synthesis problem P = ⟨𝐺, I, E⟩,
if SemSynth(𝐺, I, E) is synthesizing the semantics of production 𝑝 and on the 𝑘 th iteration of the
CEGIS loop for production 𝑝, SynthSemanticConstraint produces the semantic relation 𝑅𝑘 , then for
all future iterations 𝑗 > 𝑘, SynthSemanticConstraint will return some relation 𝑅 𝑗 ≠ 𝑅𝑘 .
Proof. Assume that the negation holds, i.e., “∃𝑗 > 𝑘.𝑅 𝑗 = 𝑅𝑘 ”. By the assumption 𝑗 > 𝑘, it must
be that Verify (𝑅𝑘 , 𝑝, I, E) returned a non-empty set of examples CEX. Otherwise, the CEGIS loop
for production 𝑝 would have immediately terminated and not continued to iteration 𝑗. By definition,
𝑅𝑘 is inconsistent with the set of counter-examples CEX. The returned counter-examples CEX are
then added to the example set 𝐸 for all future iterations. By assumption, 𝑅 𝑗 must be consistent with
the example set 𝐸, and thus 𝑅 𝑗 must not be 𝑅𝑘 , a contradiction. □

5 Implementation
This section gives details of Synantic, which implements our approach to synthesizing semantics
via the algorithm SemSynth. Synantic is developed in Scala (version 2.13), and uses cvc5 (version
1.0.3) to solve SyGuS problems—which are used within our implementation of SynthSemantic-
Constraint to generate candidate semantic constraints. The remainder of this section is structured


as follows: Section 5.1 details how we implement SynthSemanticConstraint. Section 5.2 summa-
rizes the implementation of Verify, and explains how we approximate an equivalence oracle for an
interpreter. Section 5.3 presents an optimization of SynthSemanticConstraint for productions
with multiple outputs (i.e., where the output type of a production is a tuple).

5.1 Implementation of SynthSemanticConstraint


In Section 4, SemSynth is parameterized on the procedure SynthSemanticConstraint. On line 5
of Algorithm 1, we assume that SynthSemanticConstraint produces a semantic constraint 𝐹
for production 𝑝𝑖 that satisfies the example set 𝐸. To accomplish this task, we construct a SyGuS
problem consisting of a grammar of allowable semantic constraints and a set of conditions to
enforce that the semantic constraint is consistent with the example set. To handle productions
whose semantics does not evaluate its child terms from left to right, we run in parallel a version
of SynthSemanticConstraint for each permutation of the child terms and immediately return
upon any permutation’s success. In practice, for all of our benchmarks, all the productions evaluate
their children from left to right.
We defer discussion of the SyGuS grammars we use to Section 6.1 when we discuss each
benchmark. The specification of the semantic constraint is exactly the condition specified in
Equation (3).

5.2 Implementation of Verify


In Section 4, Algorithm 1 is parameterized on the procedure Verify (line 6), which uses the
equivalence oracle E to determine if the learned semantics Sem is consistent with the interpreter
for all terms of the form 𝑝 (𝑡 1, . . . , 𝑡𝑘 ) ∈ 𝐿(𝐺) for some production 𝑝. In Synantic, we approximate
an equivalence oracle using fuzzing. Specifically, we randomly generate terms and inputs and use
the interpreter I to generate an output. We then use the learned constraint for 𝑝𝑖 to generate
inputs to each sub-term (from left to right), and compute outputs for each using interpreter I. In
effect, we are computing a new example set 𝐸 ′ , and testing the semantic constraints learned so far.
If any example disagrees with the learned semantics of production 𝑝, we return the example (and
necessary child-term summaries) as a counter-example.
When Verify fuzzes the semantics, it uses the interpreter to generate examples (i.e., terms with
corresponding input-output examples). During example generation, we set a recursion limit of 1,000
recursive calls. We discard an example—i.e., we assume the program does not terminate—if its run
exceeds the recursion depth. We then evaluate the candidate semantic constraint from left-to-right
to ensure that the semantic constraint is consistent with each of the generated examples. We return
the first example (and the child-term summaries for the example) that is inconsistent with the
candidate constraint.
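A compressed Python rendering of this check (our sketch, not the Scala implementation; gen_example and interp are assumed helpers, and non-terminating runs are presumed to have been filtered out by the example generator) is:

# Fuzzing-based verification of a candidate constraint for production p, given as
# functions fs[i] (input to the i-th child) and f0 (the production's output).
def verify(fs, f0, interp, gen_example, trials=100):
    for _ in range(trials):
        inp, subterms, out = gen_example()         # random term p(t1..tk) with I's output
        model = {"x0_in": inp}
        cex = [(inp, ("p",) + tuple(subterms), out)]
        for i, t in enumerate(subterms, start=1):
            t_in = fs[i - 1](model)                # input to t_i per the learned constraint
            t_out = interp(t, t_in)                # closed-box evaluation of t_i
            model[f"x{i}_out"] = t_out
            cex.append((t_in, t, t_out))           # sub-term summary
        if f0(model) != out:                       # learned output disagrees with I
            return cex                             # counter-example plus summaries
    return []                                      # no disagreement found: accept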

5.3 Optimized SynthSemanticConstraint for Multi-Output Productions


In Section 5.1, we described how SynthSemanticConstraint produces and solves (using cvc5) a
SyGuS problem to synthesize a semantic constraint that is consistent with the current example set.
However, it is well known that SyGuS solvers scale poorly as a function of the size of the desired
grammar/result. This issue is especially problematic when learning a semantic constraint for a
language in which productions have multiple outputs (e.g., statements for Imp with more than one
variable) and thus the grammar and resulting constraint grow with the number of outputs.
For some languages, it is possible to augment the semantics of the language to use a suitable
theory to encode multiple outputs as a single output—e.g., using the theory of arrays to support
multi-variable states in the ImpArr language (cf. Section 6.1). However, for other languages this
methodology may require the use of theories that are not well suited for existing SyGuS and


Algorithm 2: Verifier implementation using (approximate) fuzzing based oracle.


1 Procedure Verify (𝑅𝑝 , 𝑝, I, ·)
2 𝐸 ← random set of examples of the form ⟨in, 𝑝 (𝑡 1, . . . , 𝑡𝑘 ), out⟩ consistent with I;
3 (⋀𝑖 𝑥𝑖in = 𝑓𝑖 ) ∧ 𝑥 0out = 𝑓0 ← 𝑅𝑝 ; // Destructure the semantic constraint to recover each 𝑓𝑖
4 for ⟨in, 𝑝 (𝑡 1, . . . , 𝑡𝑘 ), out⟩ ∈ 𝐸 do
5 𝐸 ′ ← ⟨in, 𝑝 (𝑡 1, . . . , 𝑡𝑘 ), out⟩;
6 𝑀 ← {𝑥 0in ↦→ in} ; // Build up model to evaluate sub-terms’s semantics
7 for 𝑖 ← 1 to 𝑘 do
8 in𝑖 ← J𝑓𝑖 K𝑀 ; // Get input to term 𝑡𝑖 .
9 out𝑖 ← I (𝑡𝑖 , in𝑖 ) ; // Evaluate term 𝑡𝑖
10 𝑀 ← 𝑀 [𝑥𝑖out ↦→ out𝑖 ] ; // Update model. Ensures next term’s input is defined.
11 𝐸 ′ ← 𝐸 ′ ∪ {⟨in𝑖 , 𝑡𝑖 , out𝑖 ⟩} ; // Add sub-term’s summary to set of examples
12 if J𝑓0 K𝑀 ≠ out then
13 return 𝐸 ′ ; // The output computed by evaluating 𝑅𝑝 is inconsistent with 𝐸 ′
14 return ∅

SMT solvers (e.g., RegEx(𝑘) in Section 6.1 would require the theory of strings). Instead, for such
instances we developed a variant of SynthSemanticConstraint that synthesizes a constraint for
each output independently. However, this process may lead to constraints that do not agree on the
internal data flow of the constraints (i.e., the functions determining the input to each child term). To
remedy this issue, our implementation of SynthSemanticConstraint uses an additional CEGIS
loop that resynthesizes the constraint for each output until all agree on the inputs to each child
term.
We detail SynthSemanticConstraint for 𝑁 outputs in Algorithm 3. For simplicity, we explain
how Algorithm 3 works for a production that has two outputs (i.e., 𝑁 = 2). Consider the case for
𝐴0 → 𝑝 (𝐴1, . . . , 𝐴𝑛 ) where 𝜏𝐴0 ≜ 𝜏1 × 𝜏2 . In this scenario, our goal is to synthesize two constraints
𝐹 and 𝐺 (i.e., 𝐹 = 𝐹 1 and 𝐺 = 𝐹 2 ),
\[
F \triangleq x_1 = f_1(x_0) \land \cdots \land x_n = f_n(x_0, x_1', \ldots, x_{n-1}') \land x_{0,0}' = f_0(x_0, x_1', \ldots, x_n') \tag{4}
\]
\[
G \triangleq x_1 = g_1(x_0) \land \cdots \land x_n = g_n(x_0, x_1', \ldots, x_{n-1}') \land x_{0,1}' = g_0(x_0, x_1', \ldots, x_n') \tag{5}
\]
To determine if 𝐹 and 𝐺 agree on each child term’s input for example set 𝐸 ′ , we generate the
formula 𝜙 shown below, for each example ⟨in, 𝑝 (𝑡 1, . . . , 𝑡𝑛 ), out⟩ ∈ 𝐸 ′ :
\[
\begin{array}{l}
x_0^F = x_0^G = \mathit{in} \;\land\; \mathrm{Summary}(t_1)(x_1^F, x_1^{F\prime}) \land \cdots \land \mathrm{Summary}(t_n)(x_n^F, x_n^{F\prime}) \\
\;\land\; \mathrm{Summary}(t_1)(x_1^G, x_1^{G\prime}) \land \cdots \land \mathrm{Summary}(t_n)(x_n^G, x_n^{G\prime}) \\
\;\land\; F \land G \land (x_1^F \neq x_1^G \lor \cdots \lor x_n^F \neq x_n^G) \land \langle x_{0,0}^{F\prime}, x_{0,1}^{G\prime} \rangle = \mathit{out}
\end{array} \tag{6}
\]
which is satisfiable exactly when 𝐹 and 𝐺 disagree on the input to some child term for the given example. To make this
concept concrete, consider the following example.
Example 5.1 (Synthesizing a Semantic Constraint for a Multi-Output Production). Consider the task of
synthesizing a semantics for x0 := in the language Imp2 , using the examples ⟨⟨0, 1⟩ , x0 := x1, ⟨1, 1⟩⟩,
⟨⟨0, 1⟩ , x1, 1⟩, and ⟨⟨1, 1⟩ , x1, 1⟩.
For the above examples, SynthSemanticConstraint might generate
\(F \triangleq x_{1,0}^{in} = x_{0,0}^{in} \land x_{1,1}^{in} = x_{0,1}^{in} \land x_{0,0}^{out} = x_1^{out}\) and
\(G \triangleq x_{1,0}^{in} = x_{0,1}^{in} \land x_{1,1}^{in} = x_{0,1}^{in} \land x_{0,1}^{out} = x_1^{out}\),
where \(x_{i,j}\) is the \(j\)-th projection of \(x_i\). While both 𝐹 and 𝐺 are consistent with the examples, the data-flow of 𝐹 is not consistent


Algorithm 3: SynthSemanticConstraint for multi-output productions.

1  Procedure SynthSemanticConstraint(𝑝, 𝐸)
2      𝐴_0 → 𝜎(𝐴_1, . . . , 𝐴_𝑛) ← 𝑝;
3      𝜏_1 × · · · × 𝜏_𝑁 ← 𝜏_{𝐴_0} ;                       // Determine number of outputs for production 𝑝.
4      𝐷 ← true ;                                          // Data-flow constraints.
5      while true do
6          Γ ← 𝐷;
7          for 𝑖 ← 1 to 𝑁 do
               // Construct per-output conditions (c.f., Equation (4))
8              𝐹_𝑖 ← 𝑥_1 = 𝑓_1^{𝐹_𝑖}(𝑥_0) ∧ · · · ∧ 𝑥_𝑛 = 𝑓_𝑛^{𝐹_𝑖}(𝑥_0, 𝑥_1′, . . . , 𝑥_{𝑛−1}′) ∧ 𝑥_{0_𝑖}′ = 𝑓_0^{𝐹_𝑖}(𝑥_0, 𝑥_1′, . . . , 𝑥_𝑛′);
9              Γ ← Γ ∧ 𝐹_𝑖 ;
           // Generate SyGuS conditions (c.f., lines 1–2 of Equation (6))
10         𝜙 ← ⋀_{𝑖,𝑗} Summary(𝑡_𝑖)(𝑥_𝑖^{𝐹_𝑗}, 𝑥_𝑖′^{𝐹_𝑗}) ∧ ⟨𝑥_{0_0}′^{𝐹_0}, . . . , 𝑥_{0_𝑛}′^{𝐹_𝑛}⟩ = out;
11         Γ ← 𝜙 ∧ ⋀_𝑖 (𝑥_0^{𝐹_𝑖} = in) ;
12         𝑚 ← SolveSygus(Γ);
13         𝑀 ← CheckSat(𝜙);
           // Check if an inconsistency is found (line 3 of Equation (6))
14         if 𝑀.sat ∧ ∃𝑖, 𝑗, 𝑘 : 𝑀(𝑥_𝑖^{𝐹_𝑗}) ≠ 𝑀(𝑥_𝑖^{𝐹_𝑘}) then
               // Inconsistency is caused by an inaccurate summary of a child term
15             if ∃𝑡_𝑖 : ∀⟨in, 𝑡_𝑖, out⟩ ∈ 𝐸 : ∀𝑗 : in ≠ 𝑀(𝑥_𝑖^{𝐹_𝑗}) then 𝐸 ← 𝐸 ∪ {⟨in, 𝑡_𝑖, I(𝑡_𝑖, in)⟩} ;
               // Real data-flow inconsistency
16             else 𝐷 ← 𝐷 ∧ ⋁_{𝑖,𝑗} (𝑥_𝑖^{𝐹_𝑗} ≠ 𝑀(𝑥_𝑖^{𝐹_𝑗})) ;
17         else
               // No inconsistency
18             merge the 𝑓_𝑖^{𝐹_𝑗} ∈ 𝑚 to form the solution;
19             return solution;

with the data-flow of 𝐺 (i.e., in 𝐹, 𝑥_{1,0}^in is assigned 𝑥_{0,0}^in, while in 𝐺, 𝑥_{1,0}^in is assigned 𝑥_{0,1}^in). We can
construct the formula in Equation (6) for 𝐹 and 𝐺, and find out that in 𝐹, the variable 𝑥_{1,0}^{in,𝐹} takes
the value 0, while in 𝐺, the variable 𝑥_{1,0}^{in,𝐺} takes the value 1. Thus, 𝐹 and 𝐺 are not consistent on data-
flows to children for the provided example. We generate a new condition for the next iteration of
SynthSemanticConstraint that asserts 𝑥_{1,0}^{in,𝐹} ≠ 0 ∨ 𝑥_{1,0}^{in,𝐺} ≠ 1.

In practice, we create a copy of each variable indexed by 𝐹 and 𝐺, respectively, to avoid clashing
variable names when encoding the constraints 𝐹 and 𝐺 within a single formula. To check the
consistency of 𝐹 and 𝐺’s data flows, we use cvc5 to check the satisfiability of the formula 𝜙 in
Equation (6). If 𝜙 is unsatisfiable, then 𝐹 and 𝐺 must agree on the inputs of all child terms for the
given examples. If so, then we may return either 𝐹 ∧ 𝑥 02 = 𝑔0 (. . . ) or 𝐺 ∧ 𝑥 01 = 𝑓0 (. . . ) (i.e., because
𝐹 and 𝐺 agree on all child term inputs, we may use either to constrain the data-flow to child terms).
If 𝜙 is satisfiable, then 𝐹 and 𝐺 do not agree on the input to all child terms. In this case, we find a
model that satisfies 𝜙. If there is some subterm 𝑡𝑖 such that there is no example ⟨in, 𝑡𝑖 , out⟩ ∈ 𝐸 such
that in = 𝑀(𝑥_𝑖^𝐹) or in = 𝑀(𝑥_𝑖^𝐺), then we add the example ⟨in, 𝑡_𝑖, I(𝑡_𝑖, in)⟩ to the set of examples,
and resynthesize the constraints 𝐹 and 𝐺. Otherwise, we know that the sub-term summaries are


sufficient to fully specify both 𝐹 and 𝐺 for all examples in 𝐸. Thus, we must add a new constraint
that ensures the pair of constraints 𝐹 and 𝐺 are never synthesized again. To do this, we add a new
constraint 𝑥 0𝐹 ≠ 𝑀 (𝑥 0𝐹 ) ∨ 𝑥 0𝐺 ≠ 𝑀 (𝑥 0𝐺 ) ∨ · · · ∨ 𝑥𝑛𝐹 ≠ 𝑀 (𝑥𝑛𝐹 ) ∨ 𝑥𝑛𝐺 ≠ 𝑀 (𝑥𝑛𝐺 ), which ensures that the
input of at least one of the child terms for either 𝐹 or 𝐺 must change. A new candidate 𝐹 and 𝐺
are then synthesized. The CEGIS loop continues until it finds a valid pair of 𝐹 and 𝐺 for the set of
examples.
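To make the satisfiability check concrete for Example 5.1 (this instantiation is ours, spelled out for illustration), the formula 𝜙 of Equation (6) for the single top-level example ⟨⟨0, 1⟩, x0 ≔ x1, ⟨1, 1⟩⟩ becomes, after expanding Summary(x1) into the disjunction of the two recorded child summaries:

    𝑥_0^𝐹 = 𝑥_0^𝐺 = ⟨0, 1⟩ ∧ ((𝑥_1^𝐹 = ⟨0, 1⟩ ∧ 𝑥_1′^𝐹 = 1) ∨ (𝑥_1^𝐹 = ⟨1, 1⟩ ∧ 𝑥_1′^𝐹 = 1))
                         ∧ ((𝑥_1^𝐺 = ⟨0, 1⟩ ∧ 𝑥_1′^𝐺 = 1) ∨ (𝑥_1^𝐺 = ⟨1, 1⟩ ∧ 𝑥_1′^𝐺 = 1))
                         ∧ 𝐹 ∧ 𝐺 ∧ 𝑥_1^𝐹 ≠ 𝑥_1^𝐺 ∧ ⟨𝑥_{0_0}′^𝐹, 𝑥_{0_1}′^𝐺⟩ = ⟨1, 1⟩.

Because 𝐹 forces 𝑥_1^𝐹 = ⟨0, 1⟩ while 𝐺 forces 𝑥_1^𝐺 = ⟨1, 1⟩, this formula is satisfiable, which is exactly the data-flow disagreement reported in Example 5.1.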

6 Evaluation
The goal of our evaluation is to answer the following questions:
RQ1 Can Synantic synthesize the semantics of non-trivial languages?
RQ2 Where is time spent during synthesis?
RQ3 Is the multi-output optimization from Section 5.3 effective?
RQ4 How do synthesized semantics compare to manually written ones?
All experiments were run on a machine with an Intel(R) i9-13900K CPU and 32 GB of memory,
running NixOS 23.10 and Scala 2.13.13. All experiments were allotted 2 hours, 4 cores of CPU, and
24 GB of memory. Cvc5 version 1.0.3 is used for SMT solving and SyGuS function synthesis. For
the total running time of each experiment, we report the median of 7 runs using different random
seeds. For every language, we record whether Synantic terminates within the given time limit of
2 hours, and when it does, we also record the set of synthesized semantic rules. A language that
does not terminate within the time limit on more than half of the seeds is reported as a timeout.

6.1 Benchmarks
We collected 15 benchmarks from the two sources discussed below. For every language discussed
in this section, we manually translated the semantics to a simple equivalent interpreter written in
Scala; our goal was then to synthesize an appropriate CHC-based semantics from the interpreter.
The one non-standard feature of our setup is that the interpreter must be capable of interpreting
the programs derived from any nonterminal in the grammar.
SemGuS benchmarks. Our first source of benchmarks is the SemGuS benchmark repository [14].
This dataset contains SemGuS synthesis problems where each problem consists of a grammar of
terms, a set of CHCs inductively defining the semantics of terms in the grammar, and a specification
that the synthesized program should meet. For our purposes, we ignored the specification and
collected the grammar plus semantics for 11 distinct languages that appear in the repository. We
do not consider languages that contain abstract data types (e.g., stacks) or require a large range of
inputs (e.g., ASCII characters) due to their poor support by the SyGuS solver. These languages gave
us 11 benchmarks.
Some of the languages used in the SemGuS benchmark set are parametric (denoted by a parameter
𝑘), meaning that the semantics is slightly different based on a given parameter (e.g., number of
program variables for IMP and length of the input string for regular expressions). For these
benchmarks, we ran Synantic on an increasing sequence of parameter values and reported the
largest parameter value for which Synantic succeeds.
RegEx(𝑘) is a language for matching regular expressions on strings of length 𝑘. Given a regular
expression 𝑟 and a string 𝑠 of length 𝑘 (indexed from 0), the semantic functions produce a Boolean
matrix 𝑀 ∈ Bool^{(𝑘+1)×(𝑘+1)} such that 𝑀_{𝑖,𝑗} = true iff the substring of 𝑠 from position 𝑖 (inclusive) to
position 𝑗 (exclusive) matches 𝑟; in particular, the diagonal entries 𝑀_{𝑖,𝑖} concern the empty string, and by definition 𝑀_{𝑖,𝑗} = false for 𝑖 > 𝑗.
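To make this semantic domain concrete, here is a minimal Scala sketch, written by us purely for illustration (it is not the benchmark interpreter and covers only a fragment of the RegEx(𝑘) syntax), that populates such a matrix by brute force:

  // Toy regular-expression AST (names are ours, for illustration only).
  sealed trait Re
  case object Eps extends Re                  // matches only the empty string
  case class Chr(c: Char) extends Re          // matches a single character
  case class Alt(a: Re, b: Re) extends Re     // union
  case class Cat(a: Re, b: Re) extends Re     // concatenation

  object ReMatrix {
    // Builds the (k+1) x (k+1) matrix for regex r on a string s of length k:
    // m(i)(j) is true iff s.substring(i, j) matches r; entries with i > j stay false.
    def matrix(r: Re, s: String): Array[Array[Boolean]] = {
      val k = s.length
      val m = Array.ofDim[Boolean](k + 1, k + 1)
      for (i <- 0 to k; j <- i to k) m(i)(j) = matches(r, s.substring(i, j))
      m
    }

    // Reference matcher used only to populate the matrix in this sketch.
    private def matches(r: Re, s: String): Boolean = r match {
      case Eps       => s.isEmpty
      case Chr(c)    => s.length == 1 && s.charAt(0) == c
      case Alt(a, b) => matches(a, s) || matches(b, s)
      case Cat(a, b) => (0 to s.length).exists(i => matches(a, s.substring(0, i)) && matches(b, s.substring(i)))
    }
  }

For instance, ReMatrix.matrix(Cat(Chr('a'), Chr('b')), "ab")(0)(2) is true, while every entry strictly below the diagonal remains false.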
Cnf(𝑘), Dnf(𝑘) and Cube(𝑘) are languages of Boolean formulas (of the syntactic kind indicated
by their names, i.e., conjunctive normal form, disjunctive normal form, and cubes) involving up to
𝑘 variables.


Imp is an imperative language that contains common control flow structures, such as conditionals
and while loops, for programs with 𝑘 integer variables. Note that Imp includes operators such
as while and do_while for which the semantics involves semantically recursive productions
(Section 4.4). The complete semantics of Imp can be found in the supplementary material. Two
versions of Imp are used in our benchmarks. The first version is called Imp(𝑘), where we explicitly
record the states of 𝑘 variables as 𝑘 arguments of semantic functions. Synantic could synthesize
its semantics up to 𝑘 = 2. We also present another version of this language called ImpArr where an
arbitrary number of variables can be used. In ImpArr, variables are named var 0, var 1, . . . where
the subscript is any natural number. We use the theory of arrays to store the variable states into an
array, passing the array as an argument to the semantic function. The array is indexed by variable
id. When we present results later in the section, the results for both languages (i.e., Imp(2) and
ImpArr) are shown for comparison. (For Imp(2), the goal is to synthesize a semantics that works
on states with exactly 2 variables; for ImpArr, the goal is to synthesize a semantics that works for
states with any number of variables.)
IntArith is a benchmark about basic integer calculations, like addition, multiplication, and
conditional selection. It also includes three constants whose value can be specified in the input to
the semantic relations.
BvSimple(𝑘) describes bit-vector operations involving 𝑘 bit-vector constants. BvSimpleImp(𝑚, 𝑛)
is essentially a variant of BvSimple(𝑘) that augments the language with let-expressions. Param-
eters 𝑚 and 𝑛 mean that the language can use up to 𝑚 bit-vector constants and 𝑛 bit-vector
variables. BvSaturate(𝑘) and BvSaturateImp(𝑘) use the same syntaxes as BvSimple(𝑘) and
BvSimpleImp(𝑘), respectively, but operations use a saturating semantics that never overflows or
underflows.
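For example, under the saturating semantics (Figure 18) the 32-bit sum 0xFFFFFFFF + 0x00000001 evaluates to 0xFFFFFFFF instead of wrapping around to 0x00000000, and 0x00000000 − 0x00000001 evaluates to 0x00000000 instead of 0xFFFFFFFF.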

Attribute-grammar synthesis [13]. Our second source of benchmarks is from the Panini tool for
synthesizing attribute grammars [13]. An attribute grammar (AG) associates each nonterminal of
an underlying context-free grammar with some number of attributes. Each production has a set
of attribute-definition rules (sometimes called semantic actions) that specify how the value of one
attribute of the production is set as a function of the values of other attributes of the production.
In a given derivation tree of the AG, each node has an associated set of attribute instances. The
attribute-definition rules are used to obtain a consistent assignment of values to the tree’s attribute
instances: each attribute instance has a value equal to its defining function applied to the appropriate
(neighboring) attribute instances of the tree. Effectively, AGs assign a semantics to programs via
attributes, and the underlying attribute-definition rules can be captured via CHCs. While there
are AG extensions to handle circular AGs [12, 19]—i.e., AGs in which some derivation trees have
attribute instances that are defined in terms of themselves—the work of Kalita et al. concerns
non-circular AGs.
Kalita et al. [13] present 12 benchmarks. We ignored 4 benchmarks that are either (i) not publicly
accessible, or (ii) use semantic functions that cannot be expressed in SMT-LIB and are thus beyond
what can be synthesized using a SyGuS solver—e.g., complex data structures, or (iii) identical to
existing benchmarks from other sources. We did not run their tool on our benchmarks because our
problem is more general than theirs, supporting a wider range of language semantics: the scope of
our work includes recursive semantics, which can be handled only indirectly in a system such as
theirs (which supports only non-circular AGs)—i.e., by introducing powerful hard-to-synthesize
recursive functions that effectively capture an entire construct’s semantics. The running time is
also not directly comparable, because Kalita et al.’s approach uses user-provided sketches (i.e.,
partial solutions to each semantic action), which simplifies the synthesis problem. In contrast, in


our work we do not assume that a sketch is provided for the semantic constraints and instead
consider general SyGuS grammars.
The remaining 8 benchmarks of Kalita et al. are consolidated as 4 languages (i.e., giving us
four benchmarks). IteExpr is a language of basic integer operations, comparison expressions, and
ternary if-then-else expressions (not statements). Our IteExpr benchmark subsumes benchmarks
B3, B4, and B5 of Kalita et al. because their only differences stem from whether the expression
is written in prefix, postfix, or infix notation. For Synantic, such surface-syntax differences are
unimportant because Synantic uses regular tree grammars to express a language’s abstract syntax,
and the underlying abstract syntax of prefix, postfix, and infix expressions is the same. BinOp is
a language of binary strings (combined from benchmarks B1 and B2 of Kalita et al.), along with
built-in functions for popcount (counting the number of ones) and binary-to-decimal conversion.
Currency is a language for currency exchange and calculation. Diff is a language for computing
finite differences. Because the original benchmark from Kalita et al. involves differentiation and
real numbers (which are not supported by existing SyGuS solvers), we modified the benchmark to
perform the related operation of finite differencing over integer-valued functions. Specifically, for a
function 𝑓 , its finite difference is defined as Δ𝑓 = 𝑓 (𝑥 +1)−𝑓 (𝑥). Starting from here, finite differences
for sums and products can be obtained compositionally, e.g., Δ (𝑢 · 𝑣) = 𝑢 (𝑥)Δ𝑣 (𝑥) + 𝑣 (𝑥 + 1)Δ𝑢 (𝑥).
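As a sanity check, the product rule above follows directly from the definition: Δ(𝑢 · 𝑣)(𝑥) = 𝑢(𝑥+1)𝑣(𝑥+1) − 𝑢(𝑥)𝑣(𝑥) = 𝑢(𝑥)(𝑣(𝑥+1) − 𝑣(𝑥)) + 𝑣(𝑥+1)(𝑢(𝑥+1) − 𝑢(𝑥)) = 𝑢(𝑥)Δ𝑣(𝑥) + 𝑣(𝑥+1)Δ𝑢(𝑥).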

SyGuS grammars. For each semantic function, we also provided a grammar for the SyGuS solver,
which contains the operators of the underlying logical theory and any specific functions that must
appear in the target semantics.
For instance, for all benchmarks using the logic fragment NIA, we allow the use of basic integer
operations and integer constants, along with language-specific operations like conditional operators
(if-then-else).
For the languages Diff and Currency we did not include conditional operators, because they
do not appear in the semantics.
For BVSaturated and BVIMPSaturated we provided operators for detecting overflow and
underflow.
Lastly, for languages known to be free of side effects, we modified the SyGuS grammars to forbid
data flow between siblings, and only allow parent-to-child and child-to-parent assignments.

6.2 RQ1: Can Synantic Synthesize the Semantics of Non-trivial Languages?


Table 1 presents highlights of the results of running Synantic on each benchmark (column 1) for
each production rule (column 2). For the parametric languages, we ran each benchmark on an
increasing sequence of parameter values until the solver timed out, and we report the running
time and other metrics for the largest value of 𝑘 on which Synantic succeeded (more details below). The third column provides the median number of
CEGIS iterations taken to synthesize each production, and the fourth column provides the median
number of ⟨in, term, out⟩ counterexamples found for one production rule. We take the median of
total execution time on one production rule and list it in column 7. Columns 5–6 are breakdowns
of the total time into time for SyGuS solving and time for SMT solving. To summarize, Synantic
could synthesize complete semantics for 12/15 ≈ 80% of benchmark languages (two languages
exist for the Imp benchmark, see below).
For RegEx(𝑘) (𝑘 = 2, . . . , 8) , Synantic could synthesize a semantics for up to 𝑘 = 2. For Cnf(𝑘)
(𝑘 = 4, . . . , 8), Dnf(𝑘) (𝑘 = 4, . . . , 8), and Cube(𝑘) (𝑘 = 4, . . . , 11), Synantic could synthesize seman-
tics for all parameters included in the SemGuS benchmarks. For the bit vector benchmarks, Synantic
could synthesize a semantics for BVSimple(𝑘) up to 𝑘 = 3, and a semantics for BVIMPSimple(𝑚, 𝑛)
((𝑚, 𝑛) ∈ {(1, 2), (3, 3)}) up to 𝑚 = 1 and 𝑛 = 2.


Table 1. Detailed results for selected benchmarks. See supplementary material for the full list of results.

Lang.      Rule                       # Iter.   # Ex   SyGuS (s)   SMT (s)   Total (s)
ImpArr     𝐸 → 0                         1        1      0.01        0.01       0.04
           𝐸 → 1                         1        1      0.01        0.01       0.04
           𝐵 → f                         1        1      0.01        0.01       0.05
           𝐵 → t                         1        1      0.01        0.01       0.10
           𝑆 → dec_var𝑖                  3        2      4.29        0.56       5.36
           𝑆 → inc_var𝑖                  3        2      3.56        0.54       4.51
           𝐵 → ¬𝐵                        3        2      0.01        1.42       5.53
           𝐸 → var𝑖                      3        2      0.01        0.28       0.66
           𝐸 → 𝐸 + 𝐸                     3        2      0.02        6.51      12.38
           𝐸 → 𝐸 − 𝐸                     3        2      0.01        6.43      12.13
           𝐵 → 𝐸 < 𝐸                     4        3      0.01        3.38      10.33
           𝐵 → 𝐵 ∧ 𝐵                     4        3      0.06        2.36       6.23
           𝑆 → var𝑖 := 𝐸                 2        1      0.03        8.11      11.60
           𝐵 → 𝐵 ∨ 𝐵                     5        4      0.03        2.42       6.14
           𝑆 → 𝑆 ; 𝑆                     3        1      0.02       13.88      25.91
           𝑆 → do_while 𝑆 𝐵              5        2      0.22      342.25     499.11
           𝑆 → while 𝐵 𝑆                 4        2      0.10      218.66     321.11
           𝑆 → ite 𝐵 𝑆 𝑆                 4        2      0.03        7.08      27.82
Imp(2)     𝐸 → 0                         1        1      0.01        0.01       0.05
           𝐸 → 1                         1        1      0.01        0.01       0.04
           𝑆 → x − −                     2        2      0.06        0.02       0.11
           𝑆 → y − −                     2        2      0.11        0.03       0.17
           𝐵 → f                         1        1      0.01        0.01       0.06
           𝑆 → x + +                     2        2      0.04        0.03       0.11
           𝑆 → y + +                     2        2      0.12        0.02       0.16
           𝐵 → t                         1        1      0.01        0.02       0.13
           𝐸 → x                         2        2      0.01        0.01       0.04
           𝐸 → y                         1        1      0.01        0.01       0.04
           𝑆 → x := 𝐸                    2        2      0.10        3.23       6.17
           𝑆 → y := 𝐸                    2        2      0.04        3.22       6.19
           𝐵 → ¬𝐵                        3        3      0.02        2.49       5.26
           𝐸 → 𝐸 + 𝐸                     4        3      0.05        8.52      14.83
           𝐸 → 𝐸 − 𝐸                     5        2      0.13        8.03      13.83
           𝐵 → 𝐸 < 𝐸                     8        5      0.08        7.50      13.66
           𝐵 → 𝐵 ∧ 𝐵                     4        4      0.03        5.33      11.71
           𝐵 → 𝐵 ∨ 𝐵                     4        4      0.05        4.61       8.99
           𝑆 → 𝑆 ; 𝑆                     5        3      4.55       15.00      72.53
           𝑆 → do_while 𝑆 𝐵             27       35    858.50      257.33    1374.13
           𝑆 → while 𝐵 𝑆                 9        7     16.88      122.41     266.80
           𝑆 → ite 𝐵 𝑆 𝑆                11        5    525.28       33.88     628.71
BinOp      𝐵 → 0                         1        1      0.01        0.01       0.07
           𝐵 → 1                         1        1      0.01        0.01       0.22
           𝐵 → x                         2        2      0.01        0.01       0.08
           𝑁 → atom 𝐵                    2        2      0.09        0.04       0.30
           𝑀 → atom′ 𝐵                   3        3      0.07        0.05       0.26
           𝑆 → bin2dec 𝑀                 2        2      0.02        0.09       0.30
           𝑆 → count 𝑁                   2        2      0.04        0.05       0.24
           𝑁 → concat 𝑁 𝐵                5        5      8.61        0.22      10.31
           𝑀 → concat′ 𝑀 𝐵               5        5    288.81        0.23     308.50
RegEx(2)   𝑆𝑡𝑎𝑟𝑡 → eval 𝑅                3        3      0.02        4.43      13.40
           𝑅 → ?                         3        3      3.84        0.07       4.07
           𝑅 → a                         4        4     11.10        0.07      11.53
           𝑅 → b                         5        5     11.63        0.06      12.01
           𝑅 → 𝜖                         1        1      0.07        0.07       2.38
           𝑅 → ∅                         1        1      0.19        0.07       0.46
           𝑅 → !𝑅                        5        5      2.85       15.77      77.36
           𝑅 → 𝑅∗                        6        6      0.99       13.06      31.91
           𝑅 → 𝑅 · 𝑅                    24       24    333.71       72.58     495.45
           𝑅 → 𝑅 | 𝑅                    10       10     10.96       59.54     140.82

For all the parametric cases that time out, the number of input and output variables of the semantic
functions is large: for example, 10 inputs and 10 outputs for RegEx(3).
Additionally, Synantic timed out for the benchmarks Diff, BVSaturated, and BVIMPSaturated
(data for some languages are listed only in the supplementary material). For Diff, 4 of the 7 runs
resulted in a timeout, so Diff is reported as a timeout (even though at least one run could synthesize
the semantics of all the productions). For the 4 runs that


timed out, Synantic can solve the semantics of 5 of the 6 productions in the grammar. Synantic
could synthesize the semantics of 9/18 productions for BVIMPSaturated, and 10/17 productions
for BVSaturated in at least one run.
In benchmarks that timed out, the time-out happened during a call to the SyGuS solver—i.e., the
functions to be synthesized were too complex (more details in Section 6.3).

Finding: To answer RQ1, Synantic can synthesize semantics for many non-trivial languages as
long as the semantics does not involve very large functions (more than 20 terms).

6.3 RQ2: Where is Time Spent during Synthesis?


SyGuS vs SMT Time. Appendix B also presents the breakdown of how much time the solver
spends solving SyGuS problems (to find candidate functions) and calling SMT solvers (to compute
complete summaries). Among all the benchmarks, a median of 16.24% of the total solving time
is spent on SyGuS problems, and a median of 19.90% of the time is spent solving SMT queries.
However, for the slowest 10% of production rules (>32.17 s), the median fraction of time spent on
SyGuS solving grows to 64.91%, which indicates that SyGuS solving accounts for most of the execution
time in slow-running cases.
Among all benchmarks, 90% of the per-production semantics are solved within 32.17 s. The
12 rules that take longer than 32.17 s to be synthesized are all non-leaf rules and their partial
semantic constraints fall into the following three categories: (i) 5 of them contain large integers or
complex SMT primitives (e.g., 32-bit integer division, theory of arrays); (ii) 3 of them involve large
logical formulas with sizes ranging between 8 and 24 subterms, e.g., formulas representing 3 × 3
matrix multiplication or other matrix operations; (iii) 4 of them contain multiple input and output
parameters of semantic functions that correspond to variable states, e.g., while and do_while.
In particular, Synantic takes 1374.13 s to synthesize the CHC for do_while in Imp(2) because
there can be many possible ways to modify the data flow between the production's child terms,
which leads to many CEGIS iterations. In all of the above cases, as expected from known
limitations of cvc5, the SyGuS and SMT solvers account for most of the execution time: 45.63%
and 27.98% of the total running time is spent calling the SyGuS and SMT solvers, respectively.

Relation to CEGIS Iterations and Size of Solutions. Table 1 hints that the cost of synthesizing a
semantics may be proportional to the number of CEGIS iterations, which in general is a good
indicator of the complexity of a formula (and of how expressive the underlying SyGuS grammar
is). Additionally, the cost should also be proportional to the size of synthesized parts in the SyGuS
problems, which directly indicates formula complexity. We plot Figure 1 to better understand
these relations, using data from some of the slowest benchmarks.
Figure 1a shows the relationship between the time for synthesizing a per-production rule
semantics and the size of the final semantics. For the same language, the time grows exponentially
with the increase in the size of the final solution. Figure 1b shows that the time also grows
exponentially with the increase in solution size for per-output partial semantic constraint.
Because the performance varies greatly across different benchmarks, to better understand the
impact of CEGIS iterations, we focus our attention on one difficult benchmark, Imp(2). Specifically,
we analyze the time taken to synthesize the semantic rule for do_while, which was one of the
hardest productions in our benchmark set (2,500s). Figure 4 provides a stack plot detailing the
running time for all 16 CEGIS iterations needed to synthesize do_while. As expected, as more
examples are accumulated by CEGIS iterations, the SyGuS solver requires more time. The execution
time of the different parts is shown by areas of different colors. We can conclude that for the
do_while rule, the SyGuS solver takes 64.3% of the execution time.


[Figure 1: two scatter plots of total time (s), one against the size of the final semantics for one production (a) and one against the size of the candidate partial semantics (b); series: Imp(2), ImpArr, Diff, Regex, BinOp.]

(a) Time vs. semantic constraint size        (b) Time vs. partial semantic constraint size

Fig. 1. Plots relating the time to synthesize the semantics of one production rule vs final semantic constraint
solution size (a) and partial semantic constraint solution size (b). We only included selected slowest benchmarks
due to graph size limit.

J·K^{Imp(2)}_{Sem.S} : 𝐿(𝑆) × Z × Z × Z × Z
J·K^{ImpArr}_{Sem.S} : 𝐿(𝑆) × A^Z_Z × A^Z_Z

Fig. 2. Selected differences of semantic signatures between Imp(2) and ImpArr. A^Z_Z stands for an SMT array
mapping integers to integers.

Ite:         J𝑏K(𝜎) = 𝑣    J𝑠1K(𝜎) = 𝜎′    J𝑠2K(𝜎) = 𝜎′′   ⟹   Jif 𝑏 then 𝑠1 else 𝑠2K(𝜎) = 𝑣 ? 𝜎′ : 𝜎′′
WhileLoop:   J𝑏K(𝜎) = true    J𝑠K(𝜎) = 𝜎′    Jwhile 𝑏 do 𝑠K(𝜎′) = 𝜎′′   ⟹   Jwhile 𝑏 do 𝑠K(𝜎) = 𝜎′′
WhileEnd:    J𝑏K(𝜎) = false   ⟹   Jwhile 𝑏 do 𝑠K(𝜎) = 𝜎

Fig. 3. Selected semantic rules for ImpArr. 𝜎, 𝜎′, 𝜎′′ are arrays.

Simplifying synthesis with appropriate SMT theory. To our surprise, the language ImpArr, which
uses the theory of arrays to model an arbitrary number of variables, takes 1041.2 s on average, making
it nearly twice as fast as Imp(2). To understand why this is the case, note the difference in their
semantic signatures (Figure 2): the signature of Imp(2)’s semantics contains 3 input arguments and
2 output arguments. However, the signature of ImpArr’s semantics contains only 2 input arguments
and 1 output argument, packing program states into a single array rather than 𝑘 arguments (see
Figure 3 for some examples). By choosing an appropriate theory, the signature of the semantics
can be simplified, thus shrinking the solution space for synthesis.


[Figure 4: stacked-area plot of execution time (ms) per CEGIS iteration for do_while in Imp(2), broken down into SyGuS, SMT, and Other, with the total number of examples overlaid. Figure 5: scatter plot of running time with the multi-output optimization off vs. on; points above the no-speedup line benefit from the optimization.]

Fig. 4. Execution time per iteration for do_while in Imp(2)
Fig. 5. Speedup provided by optimization
Finding: To answer RQ2, Synantic spends most of the time (71.78%) solving SyGuS problems,
and the time is affected by the size of the candidate semantic function.

6.4 RQ3: Is the Multi-output Optimization from Section 5.3 Effective?


Figure 5 compares the running time of Synantic with and without the multi-output optimization
(Section 5.3) on all the runs of our tool for the 7 different random seeds.
With the optimization turned off, Synantic timed out on 10 more runs (specifically all the 7
runs for RegEx and 3 more runs for Diff). All the benchmarks for which disabling the optimization
caused a timeout have 3 or more output variables. Comparing Figure 1a and Figure 1b shows how
the semantic functions used in the RegEx benchmarks are very large (up to size 50), but thanks to
the optimization, our algorithm only has to solve SyGuS problems on formulas of size at most 15.
On the runs that terminated both with and without the optimization, the non-optimized algorithm
is on average 8% faster—i.e., the two versions of the algorithm have comparable performance.
However, for 15/98 runs the optimization results in a 20% or more slowdown. When inspecting these
instances, we observed that the multi-output optimization spent many iterations synchronizing
the many possible data flows for productions where the final term was actually small but many
variables were involved—e.g., sequential composition in Imp(2).
Finding: The multi-output optimization from Section 5.3 is effective for languages with 3 or more
output variables in their semantics.

6.5 RQ4: How do Synthesized Semantics Compare to Manually Written Ones?


The synthesized semantics for almost all of our benchmarks are either identical to the original
manually constructed one, or each CHC in the synthesized semantics is logically equivalent to the
CHC of the original semantics.
The one exception is the semantics synthesized for the language of RegEx(2), for which the
individual CHCs for Or, Concat, Neg, and Star are not logically equivalent to the manually-
written ones. For instance, consider the Concat rule for the semantics of concatenation. For this
construct, the manually written CHC is shown in Figure 6a, whereas Synantic synthesizes the CHC
shown in Figure 6b. The
two CHCs are not logically equivalent. For example, if the children evaluate to matrices 𝑀 and 𝑀′
whose upper-triangular entries (𝑀0,0, 𝑀0,1, 𝑀0,2, 𝑀1,1, 𝑀1,2, 𝑀2,2) are both
(true, false, false, false, false, true), the outputs computed by the manually


ConcatM:
    J𝑒1K(𝑠) = 𝑀    J𝑒2K(𝑠) = 𝑀′
    ⟹  J𝑒1 · 𝑒2K(𝑠) = [ 𝑀0,0∧𝑀′0,0    (𝑀0,0∧𝑀′0,1)∨(𝑀0,1∧𝑀′1,1)    (𝑀0,0∧𝑀′0,2)∨(𝑀0,1∧𝑀′1,2)∨(𝑀0,2∧𝑀′2,2) ;
                        𝑀1,1∧𝑀′1,1    (𝑀1,1∧𝑀′1,2)∨(𝑀1,2∧𝑀′2,2) ;
                        𝑀2,2∧𝑀′2,2 ]

(a) Manually written semantics

ConcatS:
    J𝑒1K(𝑠) = 𝑀    J𝑒2K(𝑠) = 𝑀′
    ⟹  J𝑒1 · 𝑒2K(𝑠) = [ 𝑀2,2∧𝑀′1,1    (𝑀0,0∧𝑀′0,1)∨(𝑀′0,0∧𝑀0,1)    (𝑀2,2∧𝑀′0,2)∨(𝑀0,2∧𝑀′0,0)∨(𝑀0,1∧𝑀′1,2) ;
                        (𝑀0,0∨𝑀′2,2)∧𝑀2,2    (𝑀′1,1∧𝑀1,2)∨(𝑀1,1∧𝑀′1,2) ;
                        𝑀1,1∧𝑀′0,0 ]

(b) Synthesized Semantics

Here 𝑀 and 𝑀′ are upper-triangular 3 × 3 Boolean matrices with entries 𝑀𝑖,𝑗 and 𝑀′𝑖,𝑗 for 0 ≤ 𝑖 ≤ 𝑗 ≤ 2, and each conclusion lists the rows of the resulting upper-triangular matrix.

Fig. 6. Manually-written and synthesized semantics for Concat in RegEx(2)

Concat:
    J𝑒1K(𝑠) = (𝑀𝜖, 𝑀)    J𝑒2K(𝑠) = (𝑀′𝜖, 𝑀′)
    ⟹  J𝑒1 · 𝑒2K(𝑠) = ( 𝑀𝜖 ∧ 𝑀′𝜖 , [ (𝑀𝜖∧𝑀′0,0)∨(𝑀0,0∧𝑀′𝜖)    (𝑀𝜖∧𝑀′0,1)∨(𝑀0,0∧𝑀′1,1)∨(𝑀0,1∧𝑀′𝜖) ;
                                      (𝑀𝜖∧𝑀′1,1)∨(𝑀1,1∧𝑀′𝜖) ] )

Here 𝑀 and 𝑀′ are upper-triangular 2 × 2 Boolean matrices with entries 𝑀0,0, 𝑀0,1, 𝑀1,1 (resp. primed), and 𝑀𝜖, 𝑀′𝜖 record whether the children accept the empty string.

Fig. 7. New semantics for Concat in RegExSimp.

written CHC and the synthesized CHC are 𝑀man and 𝑀syn with upper-triangular entries
(true, false, false, false, false, true) and (false, false, false, true, false, false),
respectively, which have different values on the diagonal.
When inspecting the two rules, we realized that the example matrices 𝑀 and 𝑀 ′ shown above
cannot actually be produced by the semantic rules for regular expressions. In particular, the
examples require different Boolean values to appear on the diagonal of one 3 × 3 matrix. However,
all the elements on the diagonal represent the semantics of the regular expression on the empty
string, so they must all have the same value! We note that this inconsistency in the semantics
can also be observed without a reference semantics to compare against because different runs of
the algorithm could return logically inequivalent CHCs—in fact, such inequivalence was how we
initially discovered the inconsistency.
Synantic helped us discover an inefficiency in the semantics that was being used in the standard
regular-expression benchmarks in the SemGuS repository. We thus modified the interpreter so
that for the example above it only produces a 2 × 2 upper-triangular matrix 𝑀 with entries 𝑀0,1, 𝑀0,2, 𝑀1,2
(corresponding to the non-empty substrings of the input string) and a single variable 𝑀𝜖 to denote whether the regular
expression should accept the empty string (instead of the previous multiple copies of logically
equivalent variables). This semantics reduces the total number of variables in the semantic domain
from 6 to 4 in this example.
We call this new semantics RegExSimp (see Figure 7 for an example). After modifying the
interpreter to produce this new semantics, Synantic synthesized the corresponding CHCs in a
median of 1968.00 s.
To check whether the semantics RegExSimp is indeed more efficient than the original semantics
RegEx, we modified all the 28 regular-expression synthesis benchmarks appearing in the SemGuS


benchmark set. Each of these benchmarks requires one to find a regular expression that accepts
some examples and rejects others.
We then used the Ks2 enumeration-based synthesizer to try to solve all the benchmarks with
either of the two semantics. Because Ks2 enumerates programs of increasing size and uses the
semantics to execute them and discard invalid program candidates, we conjectured that executing
programs faster allows Ks2 to explore the search space faster.
Ks2 was faster at solving synthesis problems with the RegExSimp semantics than with the RegEx
ones (although both solved the same set of benchmarks). Although the speedup over all benchmarks
is only 1.1x, the new semantics RegExSimp was particularly beneficial for the harder synthesis
problems. When considering the 13 benchmarks for which synthesis using the RegEx semantics took longer
than one second, the speedup increased to 1.18x.

Finding: Synantic synthesized semantics that were identical to the manually written ones
for 13/14 benchmarks. When Synantic found a logically inequivalent semantics, it unveiled a
performance bug.

7 Related Work
Semantics-based Synthesis vs. Library-based Synthesis.
As discussed throughout the paper, our framework is intended for extracting a formal SemGuS
semantics that can be used to then take advantage of existing SemGuS synthesizers. One can
compare our two-step approach (i.e., first synthesizing the semantics in SemGuS format, then using
it to synthesize programs) to one-step approaches that only use the given program interpreter in
a closed-box fashion to evaluate input-output examples and use them to perform example-based
synthesis. (Such an approach is also used when synthesizing programs that contain calls to closed-
box library functions [10].) On one hand, the closed-box example-based approach is flexible because
it can be used for a library/language of any complexity. On the other hand, our approach allows
one to use any program synthesizer, even constraint-based ones [14], and to synthesize programs
that meet logical (and not just example-based) specifications (e.g., as in [21]): our approach provides
an explicit logical representation of the program semantics, whereas example-based approaches
are limited to generate-and-test synthesis techniques, such as program enumeration [11], and
example-based specifications.
Synthesis of Recursive Programs. At a high level, the semantics-synthesis problem we consider is
similar to a number of works on synthesizing recursively defined programs [8, 9, 15, 20]. In effect, a
semantics for a recursively defined grammar is a recursive program assigning meaning to programs
within the language. Both Farzan et al. [8] and Farzan and Nicolet [9] use recursion skeletons to reduce
their task from synthesizing a recursive program to synthesizing a non-recursive program. Our
use of semantic constraints plays a similar role. However, both of their techniques assume programs
are only structurally recursive (i.e., no recursion on the program term itself), whereas our framework
explicitly allows for program terms that are self-recursive (e.g., while loops in Imp).
Similar to the approach used by Miltner et al. [20] to synthesize simple recursive programs,
SemSynth employs a bottom-up approach to synthesis (i.e., we first synthesize semantics for nullary
productions before moving on to other productions). However, unlike Miltner et al., SemSynth
is well-defined for any ordering of production rules and targets a more complex setting—i.e.,
synthesizing program semantics.
Finally, Lee and Cho [15] synthesize recursive procedures from examples by first finding likely
sub-expressions that can be used to build a complex recursive program and then guessing the
recursive structure of the program. The key difference is that our approach targets a more restricted
problem (synthesizing program semantics) and therefore already has the recursive structure in hand


(thanks to the presence of the grammar in the problems Synantic is solving). Because we have
limited the synthesis target to an inductively defined program semantics, Synantic can directly
focus its effort on synthesizing the semantic functions of each CHC using well-known synthesis
techniques.
Datalog Synthesis. Albarghouthi et al. [1] synthesize Datalog programs (i.e., Horn clauses) with
SMT solvers, whereas Si et al. [22] use a syntax-guided approach. In our work, we use constrained
Horn clauses, which are strictly more expressive than Datalog programs, to denote semantics. Aside
from the fact that the Datalog-synthesis problem considers different inputs (i.e., the data), a CHC
also contains a function in a theory T (such as LIA or BV), which Synantic has to synthesize.
Synthesizing Attribute Grammars. Kalita et al. [13] proposed a sketch-based method for synthesiz-
ing attribute grammars. When provided with a context-free grammar, their tool can automatically
create appropriate semantic actions from sketches of attribute grammars. Instead of semantic ac-
tions, in our work we use CHCs to express program semantics. Our approach can model recursive
semantics whereas the technique by Kalita et al. is limited to non-circular attribute grammars. Ad-
ditionally, while their method requires providing a distinct program sketch (i.e., a partial program)
for each production, our approach only requires providing a (fairly general) SyGuS grammar for
each nonterminal in the language.

8 Conclusion
Writing a logical semantics for a language can be a difficult task, and our work supplies a method to
automatically synthesize a language's semantics from an executable interpreter that is treated as a
closed box. By generating example terms and input-output pairs from the interpreter, we use a
SyGuS solver to synthesize semantic rules. Our evaluation shows that the approach applies to a
wide range of language features, e.g., recursive semantic functions with multiple outputs.
As discussed in Section 2, one motivation for this work is to be able to generate automatically
the kind of semantics that is needed to create a program synthesizer using the SemGuS framework.
In our algorithm, we harness a SyGuS solver to synthesize the constraint in each CHC—i.e., we
harness SyGuS in service to SemGuS—which limits us to synthesizing constraints that are written in
theories that SyGuS supports. Going forward, we would like to make use of “higher-level” theories,
supporting such abstractions as stores or algebraic data types. As SemGuS-based synthesizers and
verifiers improve, we might be able to satisfy this wish by using SemGuS in service to SemGuS!
That is, we could extend Synantic to use SemGuS solvers to synthesize semantic constraints.

Data-Availability Statement
The artifact that contains Synantic and all benchmark data is available on Zenodo [17].

Acknowledgments
Supported, in part, by a Microsoft Faculty Fellowship; a gift from Rajiv and Ritu Batra; and NSF
under grants CCF-1750965, CCF-1918211, CCF-2023222, CCF-2211968, and CCF-2212558. Any
opinions, findings, and conclusions or recommendations expressed in this publication are those of
the authors, and do not necessarily reflect the views of the sponsoring entities.

References
[1] Aws Albarghouthi, Paraschos Koutris, Mayur Naik, and Calvin Smith. 2017. Constraint-based synthesis of datalog
programs. In Principles and Practice of Constraint Programming: 23rd International Conference, CP 2017, Melbourne, VIC,
Australia, August 28–September 1, 2017, Proceedings 23. Springer, 689–706.
[2] Rajeev Alur, Rastislav Bodík, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh
Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-guided synthesis. In Formal Methods


in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, October 20-23, 2013. IEEE, 1–8. https://ieeexplore.ieee.org/
document/6679385/
[3] Rajeev Alur, Arjun Radhakrishna, and Abhishek Udupa. 2017. Scaling enumerative program synthesis via divide
and conquer. In International conference on tools and algorithms for the construction and analysis of systems. Springer,
319–336.
[4] Haniel Barbosa, Clark Barrett, Martin Brain, Gereon Kremer, Hanna Lachnitt, Makai Mann, Abdalrhman Mohamed,
Mudathir Mohamed, Aina Niemetz, Andres Nötzli, Alex Ozdemir, Mathias Preiner, Andrew Reynolds, Ying Sheng,
Cesare Tinelli, and Yoni Zohar. 2022. cvc5: A Versatile and Industrial-Strength SMT Solver. In Tools and Algorithms for
the Construction and Analysis of Systems, Dana Fisman and Grigore Rosu (Eds.). Springer International Publishing,
Cham, 415–442.
[5] Clark Barrett, Aaron Stump, and Cesare Tinelli. 2010. The SMT-LIB Standard: Version 2.0. In Proceedings of the 8th
International Workshop on Satisfiability Modulo Theories (Edinburgh, UK), A. Gupta and D. Kroening (Eds.).
[6] Xiaohong Chen and Grigore Rosu. 2019. A Semantic Framework for Programming Languages and Formal Analysis. In
Engineering Trustworthy Software Systems - 5th International School, SETSS 2019, Chongqing, China, April 21-27, 2019,
Tutorial Lectures (Lecture Notes in Computer Science, Vol. 12154), Jonathan P. Bowen, Zhiming Liu, and Zili Zhang (Eds.).
Springer, 122–158. https://doi.org/10.1007/978-3-030-55089-9_4
[7] Loris D’Antoni, Qinheping Hu, Jinwoo Kim, and Thomas Reps. 2021. Programmable program synthesis. In Computer
Aided Verification: 33rd International Conference, CAV 2021, Virtual Event, July 20–23, 2021, Proceedings, Part I 33.
Springer, 84–109.
[8] Azadeh Farzan, Danya Lette, and Victor Nicolet. 2022. Recursion synthesis with unrealizability witnesses. In Proceedings
of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation. 244–259.
[9] Azadeh Farzan and Victor Nicolet. 2021. Counterexample-Guided Partial Bounding for Recursive Function Synthesis.
In Computer Aided Verification: 33rd International Conference, CAV 2021, Virtual Event, July 20–23, 2021, Proceedings,
Part I 33. Springer, 832–855.
[10] Kangjing Huang and Xiaokang Qiu. 2022. Bootstrapping Library-Based Synthesis. In International Static Analysis
Symposium. Springer, 272–298.
[11] Keith J. C. Johnson, Andrew Reynolds, Thomas Reps, and Loris D’Antoni. 2024. The SemGuS Toolkit. In Computer
Aided Verification, Arie Gurfinkel and Vijay Ganesh (Eds.). Springer Nature Switzerland, Cham, 27–40.
[12] Larry G. Jones. 1990. Efficient Evaluation of Circular Attribute Grammars. ACM Trans. Program. Lang. Syst. 12, 3
(1990), 429–462. https://doi.org/10.1145/78969.78971
[13] Pankaj Kumar Kalita, Miriyala Jeevan Kumar, and Subhajit Roy. 2022. Synthesis of semantic actions in attribute
grammars. In 2022 Formal Methods in Computer-Aided Design (FMCAD). IEEE, 304–314.
[14] Jinwoo Kim, Qinheping Hu, Loris D’Antoni, and Thomas Reps. 2021. Semantics-guided synthesis. Proceedings of the
ACM on Programming Languages 5, POPL (2021), 1–32.
[15] Woosuk Lee and Hangyeol Cho. 2023. Inductive synthesis of structurally recursive functional programs from non-
recursive expressions. Proceedings of the ACM on Programming Languages 7, POPL (2023), 2048–2078.
[16] Junghee Lim and Thomas W. Reps. 2013. TSL: A System for Generating Abstract Interpreters and its Application to
Machine-Code Analysis. ACM Trans. Program. Lang. Syst. 35, 1 (2013), 4:1–4:59. https://doi.org/10.1145/2450136.2450139
[17] Jiangyi Liu, Charlie Murphy, Anvay Grover, Keith Johnson, Thomas Reps, and Loris D’Antoni. 2024. Artifact of paper
"Synthesizing Formal Semantics from Executable Interpreters". https://doi.org/10.5281/zenodo.13368062
[18] Jiangyi Liu, Charlie Murphy, Anvay Grover, Keith J. C. Johnson, Thomas Reps, and Loris D’Antoni. 2024. Synthesizing
Formal Semantics from Executable Interpreters. arXiv:2408.14668 [cs.PL] https://arxiv.org/abs/2408.14668
[19] Eva Magnusson and Görel Hedin. 2003. Circular Reference Attributed Grammars - Their Evaluation and Applications.
In Workshop on Language Descriptions, Tools and Applications, LDTA@ETAPS 2003, Warsaw, Poland, April 12-13, 2003
(Electronic Notes in Theoretical Computer Science, Vol. 82), Barrett R. Bryant and João Saraiva (Eds.). Elsevier, 532–554.
https://doi.org/10.1016/S1571-0661(05)82627-1
[20] Anders Miltner, Adrian Trejo Nuñez, Ana Brendel, Swarat Chaudhuri, and Isil Dillig. 2022. Bottom-up synthesis of
recursive functional programs using angelic execution. Proceedings of the ACM on Programming Languages 6, POPL
(2022), 1–29.
[21] Charlie Murphy, Keith J. C. Johnson, Thomas Reps, and Loris D’Antoni. 2024. Verifying Solutions to Semantics-Guided
Synthesis Problems. arXiv:2408.15475 [cs.PL] https://arxiv.org/abs/2408.15475
[22] Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik. 2018. Syntax-guided
synthesis of datalog programs. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering
Conference and Symposium on the Foundations of Software Engineering. 515–527.


A Semantics for languages used in benchmark


In this section, we present the semantics synthesized by our tool Synantic for any languages
referenced within the main text. Appendix A.1 provides a detailed discussion of how ImpArr extends
the Imp(𝑘) benchmarks to support an unbounded number of variables. Appendix A.2 provides
the synthesized semantics of SemGuS benchmarks and Appendix A.3 presents the synthesized
semantics of the attribute grammar benchmarks.

A.1 The ImpArr Benchmark with the Theory of Arrays


Languages like Imp(𝑘) are limited to an a priori defined number of variables. For example, Imp(2)
only allows two variables to be used in the language, because it explicitly passes the values of two
variables as two integer arguments of semantic functions (𝑣 0, 𝑣 1 in Figures 14 and 15). For a language
like Imp, we would rather have one semantics that can handle an unbounded number of variables.
Instead of using the explicit arguments 𝑣 0, 𝑣 1, . . . , 𝑣𝑘 , we can use an array argument to store the
variable values. The SMT theory of arrays [Barrett et al. 2010] provides select-store axioms that
conveniently allow us to access the elements of this array. The (select a i) operation extracts the
value at index 𝑖 from an array 𝑎. The (store a i v) operation returns a new array that is identical to
𝑎, except that the value at index 𝑖 is changed to 𝑣. Under these axioms, the length of an array is not
explicitly defined; therefore, arrays are suitable for expressing stores of unbounded
size. Variable evaluation and assignment can be expressed using select and store, respectively.
The semantics of ImpArr that uses the theory of arrays is given in Figure 16.
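For instance, for the (hypothetical) statement var0 := var1 + 1, the Assign rule of Figure 16 instantiates to Jvar0 := var1 + 1K(𝜎) = (store 𝜎 0 (+ (select 𝜎 1) 1)): the right-hand side is evaluated with select, and the updated store is produced with store.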

A.2 SemGuS Benchmarks


The SemGuS suite of benchmarks consists of a total of 11 languages; together with the new RegExSimp
semantics from Section 6.5, we present the synthesized semantics of each as follows:

(1) Cnf(𝑘) is depicted in Figure 8.


(2) Dnf(𝑘) is depicted in Figure 9.
(3) Cube(𝑘) is depicted in Figure 10.
(4) IntArith is depicted in Figure 11.
(5) RegEx(2) is depicted in Figure 12.
(6) RegExSimp(2) is depicted in Figure 13.
(7) Imp(2) is depicted in Figures 14 and 15.
(8) ImpArr is depicted in Figure 16.
(9) BvSimple(𝑘) is depicted in Figure 17.
(10) BvSaturated(𝑘) is depicted in Figure 18.
(11) BVImpSimple(𝑚, 𝑛) is depicted in Figure 19.
(12) BVImpSaturated(𝑚, 𝑛) is depicted in Figure 20.

A.3 Attribute-Grammar Synthesis


The suite of attribute-grammar benchmarks from [Kalita et al. 2022] consists of four languages
which we present as follows:

(1) BinOp is presented in Figure 21.


(2) Currency is presented in Figure 22.
(3) Diff is presented in Figure 23.
(4) IteExpr is presented in Figure 24.


B Benchmark Data
We present the full detailed evaluation results for all languages and production rules in Table 2.
For each production we present the number of CEGIS iterations, the number of generated examples,
and the execution time (i) to solve SyGuS problems, (ii) to solve SMT queries, and (iii) overall. For each
column we report the number for the median run based on the total run time. See Section 6 for a full
description of the experimental setup.

𝑖 = 0, 1, . . . , (𝑘 − 1) J𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟
VarAtom Var
Jv𝑖 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑥𝑖 Jvar 𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟

J𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 J𝑐K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟
NotVar Clause
Jnvar 𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = ¬𝑟 Jclause 𝑐K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟

J𝑐K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 J𝑏K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 2
And
J𝑐 ∧ 𝑏K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 ∧ 𝑟 2

J𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 J𝑐K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 2
Or
J𝑣 ∨ 𝑐K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 ∨ 𝑟 2

Fig. 8. Semantics of Cnf(𝑘)

𝑖 = 0, 1, . . . , (𝑘 − 1) J𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟
VarAtom Var
Jv𝑖 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑥𝑖 Jvar 𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟

J𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 J𝑐K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟
NotVar Conjunction
Jnvar 𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = ¬𝑟 Jconj 𝑐K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟

J𝑐K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 J𝑏K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 2
Or
J𝑐 ∨ 𝑏K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 ∨ 𝑟 2

J𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 J𝑐K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 2
And
J𝑣 ∧ 𝑐K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 ∧ 𝑟 2

Fig. 9. Semantics of Dnf(𝑘)


𝑖 = 0, 1, . . . , (𝑘 − 1) J𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟
VarAtom Var
Jv𝑖 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑥𝑖 Jvar 𝑣K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟

J𝑏K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 J𝑏K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 2
And
J𝑏 ∧ 𝑏K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 ∧ 𝑟 2

Fig. 10. Semantics of Cube(𝑘)

𝑖 = 0, 1, . . . , 3
IntLiteral True False
J𝑖K(𝑣 0 ) = 𝑖 JtK(𝑣 0 ) = true JfK(𝑣 0 ) = false

VarX VarY VarZ


JxK(𝑣 0 ) = 𝑣 0 JyK(𝑣 0 ) = 𝑣 0 JzK(𝑣 0 ) = 𝑣 0

J𝑏K(𝑣 0 ) = 𝑟 0 J𝑒 1 K(𝑣 0 ) = 𝑟 1 J𝑒 2 K(𝑣 0 ) = 𝑟 2 𝑟0


Ite1
Jite 𝑏 𝑒 1 𝑒 2 K(𝑣 0 ) = 𝑟 1

J𝑏K(𝑣 0 ) = 𝑟 0 J𝑒 1 K(𝑣 0 ) = 𝑟 1 J𝑒 2 K(𝑣 0 ) = 𝑟 2 ¬𝑟 0


Ite2
Jite 𝑏 𝑒 1 𝑒 2 K(𝑣 0 ) = 𝑟 2

J𝑒 1 K(𝑣 0 ) = 𝑟 1 J𝑒 2 K(𝑣 0 ) = 𝑟 2 J𝑒 1 K(𝑣 0 ) = 𝑟 1 J𝑒 2 K(𝑣 0 ) = 𝑟 2


Plus Multiply
J𝑒 1 + 𝑒 2 K(𝑣 0 ) = 𝑟 1 + 𝑟 2 J𝑒 1 × 𝑒 2 K(𝑣 0 ) = 𝑟 1 · 𝑟 2

J𝑒 1 K(𝑣 0 ) = 𝑟 1 J𝑒 2 K(𝑣 0 ) = 𝑟 2 J𝑏 1 K(𝑣 0 ) = 𝑟 1 J𝑏 2 K(𝑣 0 ) = 𝑟 2


LessThan And
J𝑒 1 < 𝑒 2 K(𝑣 0 ) = (𝑟 1 < 𝑟 2 ) Jand 𝑏 1 𝑏 2 K(𝑣 0 ) = (𝑟 1 ∧ 𝑟 2 )

J𝑏 1 K(𝑣 0 ) = 𝑟 1 J𝑏 2 K(𝑣 0 ) = 𝑟 2 J𝑏K(𝑣 0 ) = 𝑟


Or Not
Jor 𝑏 1 𝑏 2 K(𝑣 0 ) = (𝑟 1 ∨ 𝑟 2 ) Jnot 𝑏K(𝑣 0 ) = (¬𝑟 )

Fig. 11. Semantics of IntArith


′ 𝑀′ 𝑀′
!
J𝑒K(𝑠) = 𝑀
𝑀  𝑀0,0
0,0 𝑀0,1 𝑀0,2 0,1 0,2
′ ′ 𝑀′
𝑀B 𝑀1,1 𝑀1,2 𝑀 B 𝑀1,1 1,2 Eval
𝑀2,2 ′
𝑀2,2 Jeval 𝑒K(𝑠) = 𝑀0,2

  Eps  false false false  Phi


true false false
J𝜖K(𝑠) = true false J𝜙K(𝑠) = false false
true false

 false (𝑠 0 =a) false


 CharA  false (𝑠 0 =b) false
 CharB
JaK(𝑠) = false (𝑠 1 =a) JbK(𝑠) = false (𝑠 1 =b)
false false

J𝑒 1 K(𝑠) = 𝑀 J𝑒 2 K(𝑠) = 𝑀 ′
 false (𝑠 0 ≠𝜖 ) false
 Any ′ ) (𝑀 ∨𝑀 ′ ) (𝑀 ∨𝑀 ′ )
! Or
(𝑀0,0 ∨𝑀0,0 0,1 0,1 0,2 0,2
J?K(𝑠) = false (𝑠 1 ≠𝜖 )
J𝑒 1 + 𝑒 2 K(𝑠) = ′ ) (𝑀 ∨𝑀 ′ )
(𝑀1,1 ∨𝑀1,1 1,2 1,2
false ′ )
(𝑀2,2 ∨𝑀2,2

J𝑒 1 K(𝑠) = 𝑀 J𝑒 2 K(𝑠) = 𝑀 ′
′ ) Ô ′ Ô ′ )
! Concat
(𝑀0,0 ∧𝑀0,0 𝑖=0...1 (𝑀0,𝑖 ∧𝑀𝑖,1 ) (𝑀 ∧𝑀𝑖,2
′ ) Ô𝑖=0...2 0,𝑖 ′
J𝑒 1 · 𝑒 2 K(𝑠) = (𝑀1,1 ∧𝑀1,1 𝑖=1...2 (𝑀1,𝑖 ∧𝑀𝑖,2 )
(𝑀2,2 ∧𝑀2,2 ′ )

J𝑒K(𝑠) = 𝑀 J𝑒K(𝑠) = 𝑀
  Star  ¬𝑀 ¬𝑀 ¬𝑀0,2
 Neg
true 𝑀0,1 𝑀0,2 ∨(𝑀0,1 ∧𝑀1,2 ) 0,0 0,1
∗ ∗
J𝑒 K(𝑠) = true 𝑀1,2 J𝑒 K(𝑠) = ¬𝑀1,1 ¬𝑀1,2
true ¬𝑀2,2

Fig. 12. Semantics of RegEx


′ 𝑀′ 𝑀′
!
J𝑒K(𝑠) = (𝑀𝜖 , 𝑀)
𝑀  𝑀0,0
0,0 𝑀0,1 𝑀0,2 0,1 0,2
′ ′ 𝑀′
𝑀B 𝑀1,1 𝑀1,2 𝑀 B 𝑀1,1 1,2 Eval
𝑀2,2 ′
𝑀2,2 Jeval 𝑒K(𝑠) = 𝑀0,1

  Eps     Phi
false false false false
J𝜖K(𝑠) = (true, false ) J𝜙K(𝑠) = false, false

    CharA     CharB
JaK(𝑠) = false, 𝑠0 =a 𝑠false
0 =a ) JbK(𝑠) = false, 𝑠0 =b 𝑠false
0 =b

J𝑒 1 K(𝑠) = (𝑀𝜖 , 𝑀) J𝑒 2 K(𝑠) = (𝑀𝜖′ , 𝑀 ′ )


    Any   𝑀 ∨𝑀 ′ 𝑀 ∨𝑀 ′   Or
𝑠 0 ≠𝜖 false
J𝑒 1 + 𝑒 2 K(𝑠) = 𝑀𝜖 ∨ 𝑀𝜖′ ,
0,0 0,0 0,1 0,1
J?K(𝑠) = false, 𝑠 0 ≠𝜖 𝑀1,1 ∨𝑀 ′
1,1

J𝑒 1 K(𝑠) = (𝑀𝜖 , 𝑀) J𝑒 2 K(𝑠) = (𝑀𝜖′ , 𝑀 ′ )


 h (𝑀 ∧𝑀 ′ )∨(𝑀 ∧𝑀 ′ ) (𝑀 ∧𝑀 ′ )∨(𝑀 ∧𝑀 ′ )∨(𝑀 ∧𝑀 ′ ) i  Concat
J𝑒 1 · 𝑒 2 K(𝑠) = 𝑀𝜖 ∧ 𝑀𝜖′ ,
𝜖 0,0 0,0 𝜖 𝜖 0,1 0,0 1,1 0,1 𝜖
(𝑀𝜖 ∧𝑀 ′ )∨(𝑀1,1 ∧𝑀 ′ ) 1,1 𝜖

J𝑒K(𝑠) = (𝑀𝜖 , 𝑀) J𝑒K(𝑠) = (𝑀𝜖 , 𝑀)


    Star  h i  Meg
¬𝑀0,0 ¬𝑀0,1
J𝑒 ∗ K(𝑠) = true, 𝑀0,0 𝑀0,1 ∨(𝑀 0,0 ∧𝑀1,1 )
𝑀1,1 J𝑒 ∗ K(𝑠) = ¬𝑀𝜖 , ¬𝑀1,1

Fig. 13. Semantics of RegExSimp


Const0 Const1 ConstT


J0K(𝑣 0, 𝑣 1 ) = 0 J1K(𝑣 0, 𝑣 1 ) = 1 JtK(𝑣 0, 𝑣 1 ) = true

ConstF VarX VarY


JfK(𝑣 0, 𝑣 1 ) = false JxK(𝑣 0, 𝑣 1 ) = 𝑣 0 JyK(𝑣 0, 𝑣 1 ) = 𝑣 1

J𝑒 1 K(𝑣 0, 𝑣 1 ) = 𝑢 1 J𝑒 2 K(𝑣 0, 𝑣 1 ) = 𝑢 2 J𝑒 1 K(𝑣 0, 𝑣 1 ) = 𝑢 1 J𝑒 2 K(𝑣 0, 𝑣 1 ) = 𝑢 2


Plus Minus
J𝑒 1 + 𝑒 2 K(𝑣 0, 𝑣 1 ) = 𝑢 1 + 𝑢 2 J𝑒 1 − 𝑒 2 K(𝑣 0, 𝑣 1 ) = 𝑢 1 − 𝑢 2

J𝑒 1 K(𝑣 0, 𝑣 1 ) = 𝑢 1 J𝑒 2 K(𝑣 0, 𝑣 1 ) = 𝑢 2 𝑢1 < 𝑢2


LessThanTrue
J𝑒 1 < 𝑒 2 K(𝑣 0, 𝑣 1 ) = true

J𝑒 1 K(𝑣 0, 𝑣 1 ) = 𝑢 1 J𝑒 2 K(𝑣 0, 𝑣 1 ) = 𝑢 2 𝑢1 ≥ 𝑢2
LessThanFalse
J𝑒 1 < 𝑒 2 K(𝑣 0, 𝑣 1 ) = false

J𝑏 1 K(𝑣 0, 𝑣 1 ) = 𝑢 1 J𝑏 2 K(𝑣 0, 𝑣 1 ) = 𝑢 2
BoolAnd
J𝑏 1 and 𝑏 2 K(𝑣 0, 𝑣 1 ) = 𝑢 1 ∧ 𝑢 2

J𝑏 1 K(𝑣 0, 𝑣 1 ) = 𝑢 1 J𝑏 2 K(𝑣 0, 𝑣 1 ) = 𝑢 2 J𝑏K(𝑣 0, 𝑣 1 ) = 𝑢


BoolOr BoolNot
J𝑏 1 or 𝑏 2 K(𝑣 0, 𝑣 1 ) = 𝑢 1 ∨ 𝑢 2 Jnot 𝑏K(𝑣 0, 𝑣 1 ) = ¬𝑢

J𝑒K(𝑣 0, 𝑣 1 ) = 𝑣 J𝑒K(𝑣 0, 𝑣 1 ) = 𝑣
AssignX AssignY
Jx := 𝑒K(𝑣 0, 𝑣 1 ) = (𝑣, 𝑣 1 ) Jy := 𝑒K(𝑣 0, 𝑣 1 ) = (𝑣 0, 𝑣)

IncX IncY
Jx + +K(𝑣 0, 𝑣 1 ) = (𝑣 0 + 1, 𝑣 1 ) Jy + +K(𝑣 0, 𝑣 1 ) = (𝑣 0, 𝑣 1 + 1)

DecX DecY
Jx − −K(𝑣 0, 𝑣 1 ) = (𝑣 0 − 1, 𝑣 1 ) Jy − −K(𝑣 0, 𝑣 1 ) = (𝑣 0, 𝑣 1 − 1)

Fig. 14. Semantics of Imp(2), part 1


J𝑠 1 K(𝑣 0, 𝑣 1 ) = (𝑣 0′ , 𝑣 1′ ) J𝑠 2 K(𝑣 0′ , 𝑣 1′ ) = (𝑣 0′′, 𝑣 1′′ )


Seq
J𝑠 1 ;𝑠 2 K(𝑣 0, 𝑣 1 ) = (𝑣 0′′, 𝑣 1′′ )

J𝑏K(𝑣 0, 𝑣 1 ) = 𝑣 J𝑠 1 K(𝑣 0, 𝑣 1 ) = (𝑣 0′ , 𝑣 1′ ) J𝑠 2 K(𝑣 0, 𝑣 1 ) = (𝑣 0′′, 𝑣 1′′ )


Ite
Jif 𝑏 then 𝑠 1 else 𝑠 2 K(𝑣 0, 𝑣 1 ) = 𝑣 ? (𝑣 0′ , 𝑣 1′ ) : (𝑣 0′′, 𝑣 1′′ )

J𝑏K(𝑣 0, 𝑣 1 ) = true J𝑠K(𝑣 0, 𝑣 1 ) = (𝑣 0′ , 𝑣 1′ ) Jwhile 𝑏 do 𝑠K(𝑣 0′ , 𝑣 1′ ) = (𝑣 0′′, 𝑣 1′′ )


WhileLoop
Jwhile 𝑏 do 𝑠K(𝑣 0, 𝑣 1 ) = (𝑣 0′′, 𝑣 1′′ )

J𝑏K(𝑣 0, 𝑣 1 ) = false
WhileEnd
Jwhile 𝑏 do 𝑠K(𝑣 0, 𝑣 1 ) = (𝑣 0, 𝑣 1 )

J𝑠K(𝑣 0, 𝑣 1 ) = (𝑣 0′ , 𝑣 1′ ) J𝑏K(𝑣 0′ , 𝑣 1′ ) = true Jdo 𝑠 while 𝑏K(𝑣 0′ , 𝑣 1′ ) = (𝑣 0′′, 𝑣 1′′ )


DoWhileLoop
Jdo 𝑠 while 𝑏K(𝑣 0, 𝑣 1 ) = (𝑣 0′′, 𝑣 1′′ )

J𝑠K(𝑣 0, 𝑣 1 ) = (𝑣 0′ , 𝑣 1′ ) J𝑏K(𝑣 0′ , 𝑣 1′ ) = false


DoWhileEnd
Jdo 𝑠 while 𝑏K(𝑣 0, 𝑣 1 ) = (𝑣 0′ , 𝑣 1′ )

Fig. 15. Semantics of Imp(2), part 2


Const0 Const1 ConstT


J0K(𝜎) = 0 J1K(𝜎) = 1 JtK(𝜎) = true

J𝑒 1 K(𝜎) = 𝑢 1 J𝑒 2 K(𝜎) = 𝑢 2
ConstF Var Plus
JfK(𝜎) = false Jvar𝑖 K(𝜎) = 𝜎 [𝑖] J𝑒 1 + 𝑒 2 K(𝜎) = 𝑢 1 + 𝑢 2

J𝑒 1 K(𝜎) = 𝑢 1 J𝑒 2 K(𝜎) = 𝑢 2
Minus
J𝑒 1 − 𝑒 2 K(𝜎) = 𝑢 1 − 𝑢 2

J𝑒 1 K(𝜎) = 𝑢 1 J𝑒 2 K(𝜎) = 𝑢 2 𝑢1 < 𝑢2


LessThanTrue
J𝑒 1 < 𝑒 2 K(𝜎) = true

J𝑒 1 K(𝜎) = 𝑢 1 J𝑒 2 K(𝜎) = 𝑢 2 𝑢1 ≥ 𝑢2
LessThanFalse
J𝑒 1 < 𝑒 2 K(𝜎) = false

J𝑏 1 K(𝜎) = 𝑢 1 J𝑏 2 K(𝜎) = 𝑢 2 J𝑏 1 K(𝜎) = 𝑢 1 J𝑏 2 K(𝜎) = 𝑢 2


BoolAnd BoolOr
J𝑏 1 and 𝑏 2 K(𝜎) = 𝑢 1 ∧ 𝑢 2 J𝑏 1 or 𝑏 2 K(𝜎) = 𝑢 1 ∨ 𝑢 2

J𝑏K(𝜎) = 𝑢 J𝑒K(𝜎) = 𝑣
BoolNot Assign
Jnot 𝑏K(𝜎) = ¬𝑢 Jvar𝑖 := 𝑒K(𝜎) = 𝜎 [𝑖 ↦→ 𝑣]

Inc Dec
Jinc_var𝑖 K(𝜎) = 𝜎 [𝑖 ↦→ 𝜎 [𝑖] + 1] Jdec_var𝑖 K(𝜎) = 𝜎 [𝑖 ↦→ 𝜎 [𝑖] − 1]

J𝑠 1 K(𝜎) = 𝜎 ′ J𝑠 2 K(𝜎 ′ ) = 𝜎 ′′ J𝑏K(𝜎) = 𝑣 J𝑠 1 K(𝜎) = 𝜎 ′ J𝑠 2 K(𝜎) = 𝜎 ′′


Seq Ite
J𝑠 1 ;𝑠 2 K(𝜎) = 𝜎 ′′ Jif 𝑏 then 𝑠 1 else 𝑠 2 K(𝜎) = 𝑣 ? 𝜎 ′ : 𝜎 ′′

J𝑏K(𝜎) = true J𝑠K(𝜎) = 𝜎 ′ Jwhile 𝑏 do 𝑠K(𝜎 ′ ) = 𝜎 ′′


WhileLoop
Jwhile 𝑏 do 𝑠K(𝜎) = 𝜎 ′′

J𝑏K(𝜎) = false
WhileEnd
Jwhile 𝑏 do 𝑠K(𝜎) = (𝜎)

J𝑠K(𝜎) = 𝜎 ′ J𝑏K(𝜎 ′ ) = true Jdo 𝑠 while 𝑏K(𝜎 ′ ) = 𝜎 ′′


DoWhileLoop
Jdo 𝑠 while 𝑏K(𝜎) = 𝜎 ′′

J𝑠K(𝜎) = 𝜎 ′ J𝑏K(𝜎 ′ ) = false


DoWhileEnd
Jdo 𝑠 while 𝑏K(𝜎) = 𝜎 ′

Fig. 16. Semantics of ImpArr


𝑖 = 0, 1, . . . , (𝑘 − 1)
VarAtom BvZero
Jv𝑖 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑥𝑖 J0K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 0x00000000

BvOne
J1K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 0x00000001

J𝑒 1 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 J𝑒 2 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 2
BvUlt
J𝑒 1 < 𝑒 2 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 <unsigned 𝑟 2

J𝑒 1 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 J𝑒 2 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 2
BvUge
J𝑒 1 ≥ 𝑒 2 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = ¬(𝑟 1 <unsigned 𝑟 2 )

J𝑒 1 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 J𝑒 2 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 2
BvUle
J𝑒 1 ≤ 𝑒 2 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = ¬(𝑟 2 <unsigned 𝑟 1 )

J𝑒 1 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 J𝑒 2 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 2 ⊙ ∈ {&, |, ⊕, ≫, ≪}


BvBitwise
J𝑒 1 ⊙ 𝑒 2 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = (𝑟 1 ⊙ 𝑟 2 )

J𝑒K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 𝑟 ≠ 0x00000000
AnyBit1
Jany_bit 𝑒K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 0x00000001

J𝑒K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 𝑟 = 0x00000000 J𝑒K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟


AnyBit0 BvNot
Jany_bit 𝑒K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 0x00000000 J∼ 𝑒K(𝑥 0, . . . , 𝑥𝑘 −1 ) =∼ 𝑒

J𝑒K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟
BvNeg
J¬ 𝑒K(𝑥 0, . . . , 𝑥𝑘 −1 ) = ¬𝑒

J𝑒 1 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 1 J𝑒 2 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = 𝑟 2 ⊙ ∈ {+, −, ×, ÷}


BvArith
J𝑒 1 ⊙ 𝑒 2 K(𝑥 0, . . . , 𝑥𝑘 −1 ) = (𝑟 1 ⊙ 𝑟 2 )

Fig. 17. Semantics of BvSimple(𝑘)


VarAtom:    𝑖 = 0, 1, …, (𝑘 − 1)    ⟹  ⟦v𝑖⟧(𝑥0, …, 𝑥𝑘−1) = 𝑥𝑖
BvZero:     ⟦0⟧(𝑥0, …, 𝑥𝑘−1) = 0x00000000
BvOne:      ⟦1⟧(𝑥0, …, 𝑥𝑘−1) = 0x00000001

BvUlt:      ⟦𝑒1⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟1    ⟦𝑒2⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟2    ⟹  ⟦𝑒1 < 𝑒2⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟1 <unsigned 𝑟2
BvUge:      ⟦𝑒1⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟1    ⟦𝑒2⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟2    ⟹  ⟦𝑒1 ≥ 𝑒2⟧(𝑥0, …, 𝑥𝑘−1) = ¬(𝑟1 <unsigned 𝑟2)
BvUle:      ⟦𝑒1⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟1    ⟦𝑒2⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟2    ⟹  ⟦𝑒1 ≤ 𝑒2⟧(𝑥0, …, 𝑥𝑘−1) = ¬(𝑟2 <unsigned 𝑟1)

BvBitwise:  ⟦𝑒1⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟1    ⟦𝑒2⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟2    ⊙ ∈ {&, |, ⊕, ≫, ≪}    ⟹  ⟦𝑒1 ⊙ 𝑒2⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟1 ⊙ 𝑟2

AnyBit1:    ⟦𝑒⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟    𝑟 ≠ 0x00000000    ⟹  ⟦any_bit 𝑒⟧(𝑥0, …, 𝑥𝑘−1) = 0x00000001
AnyBit0:    ⟦𝑒⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟    𝑟 = 0x00000000    ⟹  ⟦any_bit 𝑒⟧(𝑥0, …, 𝑥𝑘−1) = 0x00000000

BvNot:      ⟦𝑒⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟    ⟹  ⟦∼𝑒⟧(𝑥0, …, 𝑥𝑘−1) = ∼𝑟
BvNeg:      ⟦𝑒⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟    ⟹  ⟦¬𝑒⟧(𝑥0, …, 𝑥𝑘−1) = ¬𝑟

BvArith:    ⟦𝑒1⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟1    ⟦𝑒2⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟2    ⊙ ∈ {+, −, ×, ÷}    ⟹  ⟦𝑒1 ⊙ 𝑒2⟧(𝑥0, …, 𝑥𝑘−1) = 𝑟1 ⊙sat 𝑟2

where   𝑎 ⊙sat 𝑏 = 𝑎 ⊙ 𝑏          (when no overflow or underflow occurs)
        𝑎 ⊙sat 𝑏 = 0xffffffff     (when overflow occurs)
        𝑎 ⊙sat 𝑏 = 0x00000000     (when underflow occurs)

Fig. 18. Semantics of BvSaturated(𝑘)
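BvSaturated(𝑘) differs from BvSimple(𝑘) only in BvArith, which saturates instead of wrapping. A small sketch of ⊙sat, reading the bounds 0x00000000 and 0xffffffff from the figure; treating the operands as unsigned integers is an assumption.

```python
# Sketch of the saturating operator used by BvArith in Fig. 18.
LO, HI = 0x00000000, 0xFFFFFFFF

def sat(op, a, b):
    if op == "+":
        r = a + b
    elif op == "-":
        r = a - b
    elif op == "*":
        r = a * b
    elif op == "/":
        r = a // b if b != 0 else LO   # division cannot overflow; the zero-divisor case is an assumption
    else:
        raise ValueError(op)
    if r > HI:
        return HI                      # overflow  -> 0xffffffff
    if r < LO:
        return LO                      # underflow -> 0x00000000
    return r
```

For instance, sat("+", 0xFFFFFFFF, 1) is 0xFFFFFFFF, whereas the wraparound BvArith of BvSimple(𝑘) would produce 0x00000000.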


ConstAtom:  𝑖 = 0, 1, …, (𝑚 − 1)    ⟹  ⟦v𝑖⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑢𝑖
VarAtom:    𝑖 = 0, 1, …, (𝑛 − 1)    ⟹  ⟦o𝑖⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑥𝑖

Assign:     ⟦𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟
            ⟹  ⟦o𝑖 := 𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = (𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑖−1, 𝑟, 𝑥𝑖+1, …, 𝑥𝑛−1)

Seq:        ⟦𝑠1⟧(𝑢⃗, 𝑥⃗) = (𝑢⃗′, 𝑥⃗′)    ⟦𝑠2⟧(𝑢⃗′, 𝑥⃗′) = (𝑢⃗′′, 𝑥⃗′′)        where 𝑢⃗ = (𝑢0, …, 𝑢𝑚−1) and 𝑥⃗ = (𝑥0, …, 𝑥𝑛−1)
            ⟹  ⟦𝑠1; 𝑠2⟧(𝑢⃗, 𝑥⃗) = (𝑢⃗′′, 𝑥⃗′′)

BvZero:     ⟦0⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 0x00000000
BvOne:      ⟦1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 0x00000001

BvUlt:      ⟦𝑒1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1    ⟦𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟2
            ⟹  ⟦𝑒1 < 𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1 <unsigned 𝑟2
BvUge:      ⟦𝑒1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1    ⟦𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟2
            ⟹  ⟦𝑒1 ≥ 𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = ¬(𝑟1 <unsigned 𝑟2)
BvUle:      ⟦𝑒1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1    ⟦𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟2
            ⟹  ⟦𝑒1 ≤ 𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = ¬(𝑟2 <unsigned 𝑟1)

BvBitwise:  ⟦𝑒1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1    ⟦𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟2    ⊙ ∈ {&, |, ⊕, ≫, ≪}
            ⟹  ⟦𝑒1 ⊙ 𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1 ⊙ 𝑟2

AnyBit1:    ⟦𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟    𝑟 ≠ 0x00000000    ⟹  ⟦any_bit 𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 0x00000001
AnyBit0:    ⟦𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟    𝑟 = 0x00000000    ⟹  ⟦any_bit 𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 0x00000000

BvNot:      ⟦𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟    ⟹  ⟦∼𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = ∼𝑟
BvNeg:      ⟦𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟    ⟹  ⟦¬𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = ¬𝑟

BvArith:    ⟦𝑒1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1    ⟦𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟2    ⊙ ∈ {+, −, ×, ÷}
            ⟹  ⟦𝑒1 ⊙ 𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1 ⊙ 𝑟2

Fig. 19. Semantics of BvImpSimple(𝑘)
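The only rules of Fig. 19 that touch the state are Assign and Seq; the remaining rules simply thread the state through. A short sketch of the two stateful rules, using a pair of tuples (constants 𝑢0..𝑢𝑚−1, mutable variables 𝑥0..𝑥𝑛−1) as the state; this representation is an assumption.

```python
# Sketch of the state handling in Fig. 19.
def assign(i, r, u, x):
    # Assign: only position i of x is rewritten; the constants u are untouched
    return u, x[:i] + (r,) + x[i + 1:]

def seq(eval_stmt, s1, s2, u, x):
    # Seq: run s1, then run s2 on the intermediate state it produced
    u1, x1 = eval_stmt(s1, u, x)
    return eval_stmt(s2, u1, x1)
```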



ConstAtom:  𝑖 = 0, 1, …, (𝑚 − 1)    ⟹  ⟦v𝑖⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑢𝑖
VarAtom:    𝑖 = 0, 1, …, (𝑛 − 1)    ⟹  ⟦o𝑖⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑥𝑖

Assign:     ⟦𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟
            ⟹  ⟦o𝑖 := 𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = (𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑖−1, 𝑟, 𝑥𝑖+1, …, 𝑥𝑛−1)

Seq:        ⟦𝑠1⟧(𝑢⃗, 𝑥⃗) = (𝑢⃗′, 𝑥⃗′)    ⟦𝑠2⟧(𝑢⃗′, 𝑥⃗′) = (𝑢⃗′′, 𝑥⃗′′)        where 𝑢⃗ = (𝑢0, …, 𝑢𝑚−1) and 𝑥⃗ = (𝑥0, …, 𝑥𝑛−1)
            ⟹  ⟦𝑠1; 𝑠2⟧(𝑢⃗, 𝑥⃗) = (𝑢⃗′′, 𝑥⃗′′)

BvZero:     ⟦0⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 0x00000000
BvOne:      ⟦1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 0x00000001

BvUlt:      ⟦𝑒1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1    ⟦𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟2
            ⟹  ⟦𝑒1 < 𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1 <unsigned 𝑟2
BvUge:      ⟦𝑒1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1    ⟦𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟2
            ⟹  ⟦𝑒1 ≥ 𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = ¬(𝑟1 <unsigned 𝑟2)
BvUle:      ⟦𝑒1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1    ⟦𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟2
            ⟹  ⟦𝑒1 ≤ 𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = ¬(𝑟2 <unsigned 𝑟1)

BvBitwise:  ⟦𝑒1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1    ⟦𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟2    ⊙ ∈ {&, |, ⊕, ≫, ≪}
            ⟹  ⟦𝑒1 ⊙ 𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1 ⊙ 𝑟2

AnyBit1:    ⟦𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟    𝑟 ≠ 0x00000000    ⟹  ⟦any_bit 𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 0x00000001
AnyBit0:    ⟦𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟    𝑟 = 0x00000000    ⟹  ⟦any_bit 𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 0x00000000

BvNot:      ⟦𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟    ⟹  ⟦∼𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = ∼𝑟
BvNeg:      ⟦𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟    ⟹  ⟦¬𝑒⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = ¬𝑟

BvArith:    ⟦𝑒1⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1    ⟦𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟2    ⊙ ∈ {+, −, ×, ÷}
            ⟹  ⟦𝑒1 ⊙ 𝑒2⟧(𝑢0, …, 𝑢𝑚−1, 𝑥0, …, 𝑥𝑛−1) = 𝑟1 ⊙sat 𝑟2

Fig. 20. Semantics of BvImpSaturated(𝑘)



One:   ⟦1⟧(𝑣0) = 1        Zero:   ⟦0⟧(𝑣0) = 0        VarX:   ⟦x⟧(𝑣0) = 𝑣0

CountBit:        ⟦𝑛⟧(𝑣0) = 𝑟    ⟹  ⟦count 𝑛⟧(𝑣0) = 𝑟
BinToDec:        ⟦𝑚⟧(𝑣0) = 𝑟    ⟹  ⟦bin2dec 𝑚⟧(𝑣0) = 𝑟

CountBitConcat:  ⟦𝑛⟧(𝑣0) = 𝑟1    ⟦𝑏⟧(𝑣0) = 𝑟2    ⟹  ⟦concat 𝑛 𝑏⟧(𝑣0) = 𝑟1 + int(𝑟2)
BinToDecConcat:  ⟦𝑚⟧(𝑣0) = 𝑟1    ⟦𝑏⟧(𝑣0) = 𝑟2    ⟹  ⟦concat′ 𝑚 𝑏⟧(𝑣0) = 2 · 𝑟1 + int(𝑟2)

CountBitAtom:    ⟦𝑏⟧(𝑣0) = 𝑟    ⟹  ⟦atom 𝑏⟧(𝑣0) = 𝑟
BinToDecAtom:    ⟦𝑏⟧(𝑣0) = 𝑟    ⟹  ⟦atom′ 𝑏⟧(𝑣0) = 𝑟

Fig. 21. Semantics of BinOp
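The BinOp grammar gives the same bit string two readings: count sums the bits (CountBitConcat adds int(𝑏) at each step), while bin2dec reads them as a binary numeral (BinToDecConcat doubles the accumulator before adding int(𝑏)). The sketch below shows both readings over an explicit list of bits; the list representation, with entries 0, 1, or "x" standing for the input bit 𝑣0, is an assumption.

```python
# Two readings of one bit string, following Fig. 21.
def count_bits(bits, v0):
    # CountBitAtom / CountBitConcat
    return sum(v0 if b == "x" else b for b in bits)

def bin2dec(bits, v0):
    # BinToDecAtom / BinToDecConcat (most-significant bit first)
    acc = 0
    for b in bits:
        acc = 2 * acc + (v0 if b == "x" else b)
    return acc
```

For example, with v0 = 1 the string [1, "x", 0] counts 2 set bits but denotes 6 under bin2dec.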

Zero:  ⟦0⟧(𝑣0) = 0      One:  ⟦1⟧(𝑣0) = 1      Two:  ⟦2⟧(𝑣0) = 2      Four:  ⟦4⟧(𝑣0) = 4      Eight:  ⟦8⟧(𝑣0) = 8

VarX:        ⟦x⟧(𝑣0) = 𝑣0
ScalarPlus:  ⟦𝑘1⟧(𝑣0) = 𝑟1    ⟦𝑘2⟧(𝑣0) = 𝑟2    ⟹  ⟦𝑘1 +k 𝑘2⟧(𝑣0) = 𝑟1 + 𝑟2

CurrencyPlus:         ⟦𝑠1⟧(𝑣0) = 𝑟1    ⟦𝑠2⟧(𝑣0) = 𝑟2    ⟹  ⟦𝑠1 + 𝑠2⟧(𝑣0) = 𝑟1 + 𝑟2
CurrencySubtract:     ⟦𝑠1⟧(𝑣0) = 𝑟1    ⟦𝑠2⟧(𝑣0) = 𝑟2    ⟹  ⟦𝑠1 − 𝑠2⟧(𝑣0) = 𝑟1 − 𝑟2
CurrencyTimesScalar:  ⟦𝑠⟧(𝑣0) = 𝑟1    ⟦𝑘⟧(𝑣0) = 𝑟2    ⟹  ⟦𝑠 × 𝑘⟧(𝑣0) = 𝑟1 · 𝑟2

CurrencyJpy:  ⟦𝑘⟧(𝑣0) = 𝑟    ⟹  ⟦jpy 𝑘⟧(𝑣0) = 𝑟
CurrencyCny:  ⟦𝑘⟧(𝑣0) = 𝑟    ⟹  ⟦cny 𝑘⟧(𝑣0) = 21 · 𝑟
CurrencyUsd:  ⟦𝑘⟧(𝑣0) = 𝑟    ⟹  ⟦usd 𝑘⟧(𝑣0) = 152 · 𝑟

Fig. 22. Semantics of Currency
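In Fig. 22 every currency term denotes a single integer amount in a common base unit, with jpy as the base and the fixed rates 21 (cny) and 152 (usd) taken from the figure. A sketch, with an illustrative nested-tuple encoding of terms (the encoding itself is an assumption):

```python
# Sketch of the Currency semantics in Fig. 22.
RATE = {"jpy": 1, "cny": 21, "usd": 152}

def eval_scalar(k, v0):
    if k == "x":                               # VarX
        return v0
    if isinstance(k, int):                     # Zero / One / Two / Four / Eight
        return k
    _, k1, k2 = k                              # ScalarPlus: k1 +k k2
    return eval_scalar(k1, v0) + eval_scalar(k2, v0)

def eval_currency(s, v0):
    tag = s[0]
    if tag in RATE:                            # CurrencyJpy / CurrencyCny / CurrencyUsd
        return RATE[tag] * eval_scalar(s[1], v0)
    if tag == "+":                             # CurrencyPlus
        return eval_currency(s[1], v0) + eval_currency(s[2], v0)
    if tag == "-":                             # CurrencySubtract
        return eval_currency(s[1], v0) - eval_currency(s[2], v0)
    if tag == "*":                             # CurrencyTimesScalar: s × k
        return eval_currency(s[1], v0) * eval_scalar(s[2], v0)
    raise ValueError(tag)
```

For example, eval_currency(("+", ("usd", 1), ("cny", ("+k", 2, "x"))), 4) evaluates to 152 + 21·6 = 278 under this encoding.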

Zero:  ⟦0⟧(𝑣0) = (0, 0, 0)      One:  ⟦1⟧(𝑣0) = (1, 1, 0)      Two:  ⟦2⟧(𝑣0) = (2, 2, 0)

VarX:      ⟦x⟧(𝑣0) = (𝑣0, 𝑣0 + 1, 0)

Plus:      ⟦𝑒1⟧(𝑣0) = (𝑟1, 𝑠1, 𝑡1)    ⟦𝑒2⟧(𝑣0) = (𝑟2, 𝑠2, 𝑡2)    ⟹  ⟦𝑒1 + 𝑒2⟧(𝑣0) = (𝑟1 + 𝑟2, 𝑠1 + 𝑠2, 𝑡1 + 𝑡2)
Multiply:  ⟦𝑒1⟧(𝑣0) = (𝑟1, 𝑠1, 𝑡1)    ⟦𝑒2⟧(𝑣0) = (𝑟2, 𝑠2, 𝑡2)    ⟹  ⟦𝑒1 × 𝑒2⟧(𝑣0) = (𝑟1 · 𝑟2, 𝑠1 · 𝑠2, 𝑟1 · 𝑡2 + 𝑟2 · 𝑠1)

Fig. 23. Semantics of Diff
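The Diff semantics is triple-valued: every expression denotes a triple (𝑟, 𝑠, 𝑡), combined componentwise by Plus and with the mixed third component by Multiply. The sketch below is a direct transcription of the rules; the nested-tuple term encoding is an assumption.

```python
# Direct transcription of the triple-valued semantics of Fig. 23.
def eval_diff(e, v0):
    if e == "x":                               # VarX: (v0, v0 + 1, 0)
        return (v0, v0 + 1, 0)
    if isinstance(e, int):                     # Zero / One / Two: (c, c, 0)
        return (e, e, 0)
    op, e1, e2 = e
    r1, s1, t1 = eval_diff(e1, v0)
    r2, s2, t2 = eval_diff(e2, v0)
    if op == "+":                              # Plus: componentwise sum
        return (r1 + r2, s1 + s2, t1 + t2)
    if op == "*":                              # Multiply: (r1*r2, s1*s2, r1*t2 + r2*s1)
        return (r1 * r2, s1 * s2, r1 * t2 + r2 * s1)
    raise ValueError(op)
```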


IntLiteral:  𝑖 = 0, 1, …, 8    ⟹  ⟦𝑖⟧(𝑣0) = 𝑖
VarX:        ⟦x⟧(𝑣0) = 𝑣0
Expr:        ⟦𝑒⟧(𝑣0) = 𝑟    ⟹  ⟦expr 𝑒⟧(𝑣0) = 𝑟

Ite1:  ⟦𝑏⟧(𝑣0) = 𝑟0    ⟦𝑒1⟧(𝑣0) = 𝑟1    ⟦𝑒2⟧(𝑣0) = 𝑟2    𝑟0     ⟹  ⟦ite 𝑏 𝑒1 𝑒2⟧(𝑣0) = 𝑟1
Ite2:  ⟦𝑏⟧(𝑣0) = 𝑟0    ⟦𝑒1⟧(𝑣0) = 𝑟1    ⟦𝑒2⟧(𝑣0) = 𝑟2    ¬𝑟0    ⟹  ⟦ite 𝑏 𝑒1 𝑒2⟧(𝑣0) = 𝑟2

Plus:   ⟦𝑒⟧(𝑣0) = 𝑟1    ⟦𝑓⟧(𝑣0) = 𝑟2    ⟹  ⟦𝑒 + 𝑓⟧(𝑣0) = 𝑟1 + 𝑟2
Minus:  ⟦𝑒⟧(𝑣0) = 𝑟1    ⟦𝑓⟧(𝑣0) = 𝑟2    ⟹  ⟦𝑒 − 𝑓⟧(𝑣0) = 𝑟1 − 𝑟2

Atom:      ⟦𝑓⟧(𝑣0) = 𝑟    ⟹  ⟦atom 𝑓⟧(𝑣0) = 𝑟
Multiply:  ⟦𝑓⟧(𝑣0) = 𝑟1    ⟦𝑔⟧(𝑣0) = 𝑟2    ⟹  ⟦𝑓 ∗ 𝑔⟧(𝑣0) = 𝑟1 · 𝑟2
Divide:    ⟦𝑓⟧(𝑣0) = 𝑟1    ⟦𝑔⟧(𝑣0) = 𝑟2    ⟹  ⟦𝑓 ÷ 𝑔⟧(𝑣0) = 𝑟1 ÷ 𝑟2
Num:       ⟦𝑔⟧(𝑣0) = 𝑟    ⟹  ⟦num 𝑔⟧(𝑣0) = 𝑟

Cmp:   ⟦𝑒1⟧(𝑣0) = 𝑟1    ⟦𝑒2⟧(𝑣0) = 𝑟2    ⊖ ∈ {<, ≤, >, ≥, =, ≠}    ⟹  ⟦𝑒1 ⊖ 𝑒2⟧(𝑣0) = (𝑟1 ⊖ 𝑟2)

Fig. 24. Semantics of IteExpr


Table 2. Evaluation results with optimization turned on.³

Lang. Rule # Iter. # Ex SyGuS (s) SMT (s) Total (s)


BVSimple(3)
𝐸 →0 1 1 0.01 0.02 0.29
𝐸 →1 1 1 0.01 0.01 0.14
𝐸 → v0 1 1 0.01 0.01 0.13
𝐸 → v1 1 1 0.01 0.02 0.11
𝐸 → v2 1 1 0.02 0.01 0.09
𝐸 → −𝐸 2 2 0.09 1.09 1.92
𝐸 → ∼𝐸 2 2 0.06 0.88 1.65
𝐸 → any_bit 𝐸 4 4 1.93 0.93 3.64


𝐸 →𝐸 + 𝐸 9 2 1.98 0.89 4.12
𝐸 →𝐸 & 𝐸 6 2 2.67 1.47 7.31
𝐸 →𝐸 ÷ 𝐸 6 2 1.39 37.08 62.33
𝐸 →𝐸 = 𝐸 19 6 69.51 4.16 77.67
𝐸 →𝐸 ≫ 𝐸 9 3 1.72 2.05 5.54
𝐸 →𝐸 × 𝐸 9 2 2.31 0.69 3.98
𝐸 →𝐸 | 𝐸 10 3 2.55 1.19 4.98
𝐸 →𝐸 ≪ 𝐸 9 3 1.60 3.13 6.71
𝐸 →𝐸 − 𝐸 9 2 1.28 1.61 4.61
𝑆 →𝐸 ≥ 𝐸 14 6 4.73 2.34 10.39
𝑆 →𝐸 ≤ 𝐸 13 5 2.55 1.84 6.75
𝑆 →𝐸 < 𝐸 6 5 0.46 1.54 3.62
𝐸 →𝐸 ⊕ 𝐸 9 3 2.00 1.56 5.10
BVSaturated(2)(T7)
𝐸 →0 1 1 0.01 0.01 0.15
𝐸 →1 1 1 0.01 0.01 0.06
𝐸 → v0 1 1 0.01 0.01 0.05
𝐸 → v1 1 1 0.01 0.01 0.05
𝐸 →𝐸 & 𝐸 5 3 0.46 0.94 3.41
𝐸 →𝐸 ≫ 𝐸 6.0 3.0 1.07 2.67 5.65
𝐸 →𝐸 × 𝐸 10.0 6.0 404.76 3.09 414.49
𝐸 →𝐸 | 𝐸 5.5 3.0 1.62 1.30 4.31
𝐸 →𝐸 ≪ 𝐸 6.0 3.0 0.70 2.03 4.41
𝐸 →𝐸 − 𝐸 6.0 5.0 5.46 3.07 12.22
𝐸 →𝐸 ⊕ 𝐸 5.5 2.5 0.64 1.03 2.94
BVIMPSimple(1, 2)
𝐸 →0 1 1 0.01 0.02 0.36
𝐸 →1 1 1 0.01 0.03 0.17
𝐸 → o0 1 1 0.01 0.02 0.10
𝐸 → o1 1 1 0.01 0.01 0.07
𝐸 → v0 1 1 0.01 0.01 0.14
𝑆 → o0 ≔ 𝐸 1 1 0.35 0.49 1.27
𝑆 → o1 ≔ 𝐸 2 2 0.32 0.38 1.47


𝐸 → −𝐸 2 2 0.07 0.79 1.64
𝐸 → ∼𝐸 2 2 0.08 0.61 1.47
𝐸 → any_bit 𝐸 4 4 1.30 0.79 2.72
𝐸 →𝐸 + 𝐸 5 2 1.19 0.92 3.48
𝐸 →𝐸 & 𝐸 7 3 3.65 1.46 8.17
𝐸 →𝐸 ÷ 𝐸 7 3 2.58 35.00 60.76
𝐸 →𝐸 = 𝐸 20 6 83.47 19.99 108.10
𝐸 →𝐸 ≫ 𝐸 9 3 2.52 2.19 6.40
𝐸 →𝐸 × 𝐸 9 3 2.48 0.86 4.39
𝐸 →𝐸 | 𝐸 9 3 2.00 1.10 4.30
𝐸 →𝐸 ≪ 𝐸 10 3 1.83 2.59 6.72
𝐸 →𝐸 − 𝐸 7 2 1.71 1.52 4.99
𝐵 →𝑆 ≥ 𝑆 26 8 18.24 2.87 25.36
𝐵 →𝑆 ≤ 𝑆 23 6 13.59 1.06 16.75
𝐵 →𝑆 < 𝑆 6 5 0.33 1.02 2.67
𝐸 →𝐸 ⊕ 𝐸 6 2 1.34 1.10 3.50
𝑆 →𝑆 ; 𝑆 13 3 535.19 15.61 590.21
Cube(11)
𝑉 → v0 2 2 0.01 0.01 0.12
𝑉 → v1 2 2 0.01 0.01 0.06
𝑉 → v10 3 3 0.02 0.01 0.05
𝑉 → v2 2 2 0.01 0.01 0.05
𝑉 → v3 2 2 0.01 0.01 0.05


𝑉 → v4 3 3 0.01 0.01 0.03
𝑉 → v5 3 3 0.01 0.01 0.04
𝑉 → v6 3 3 0.01 0.01 0.06
𝑉 → v7 3 4 0.02 0.01 0.06
𝑉 → v8 3 3 0.02 0.01 0.05
𝑉 → v9 3 4 0.01 0.01 0.05
𝐵 → var 𝑉 4 4 0.06 0.51 1.07
𝐵 →𝐵 ∧ 𝐵 116 8 1810.40 4.43 1831.92

³ Note: A label (T𝑖) after a language name means that synthesis for the language timed out in 𝑖 runs.


Diff(T4)
𝐸 →0 1 1 0.01 0.01 0.11
𝐸 →1 1 1 0.01 0.01 0.06
𝐸 →2 1 1 0.02 0.01 0.06
𝐸 →x 2 2 0.35 0.01 0.41
𝐸 →𝐸 × 𝐸 5 5 92.68 1.00 95.15
𝐸 →𝐸 + 𝐸 3 3 9.12 2.09 13.47
BVIMPSat.(1, 2)(T7)

𝐸 →0 1 1 0.01 0.02 0.37


𝐸 →1 1 1 0.01 0.01 0.14
𝐸 → o0 1 1 0.01 0.01 0.12
𝐸 → o1 1 1 0.01 0.01 0.09
𝐸 → v0 1 1 0.01 0.02 0.14
𝐸 →𝐸 + 𝐸 16 3 31.60 1.11 34.15
𝐸 →𝐸 & 𝐸 6 3 1.75 1.89 7.24

CNF(8)
𝑉 → v0 2 2 0.01 0.01 0.12
𝑉 → v1 2 2 0.01 0.01 0.04
𝑉 → v2 2 2 0.01 0.01 0.05
𝑉 → v3 2 3 0.01 0.01 0.04
𝑉 → v4 2 2 0.01 0.01 0.03


𝑉 → v5 3 3 0.01 0.01 0.04
𝑉 → v6 3 3 0.01 0.01 0.04
𝑉 → v7 3 4 0.01 0.01 0.04
𝐵 → clause 𝐶 4 4 0.03 0.27 0.48
𝐶 → nvar 𝑉 5 5 0.05 0.32 0.70
𝐶 → var 𝑉 4 4 0.05 0.31 0.74
𝐵 →𝐶 ∧ 𝐵 39 6 30.52 0.56 31.83
𝐶 →𝑉 ∨ 𝐶 41 8 37.03 0.69 38.62
DNF(8)
𝑉 → v0 2 2 0.01 0.01 0.13
𝑉 → v1 2 2 0.01 0.01 0.04
𝑉 → v2 2 2 0.01 0.01 0.04
𝑉 → v3 2 3 0.01 0.01 0.04
𝑉 → v4 2 2 0.01 0.01 0.03


𝑉 → v5 3 3 0.01 0.01 0.04
𝑉 → v6 3 3 0.01 0.01 0.03
𝑉 → v7 3 4 0.01 0.01 0.05
𝐵 → conj 𝐶 4 4 0.05 0.30 0.57
𝐶 → nvar 𝑉 5 5 0.05 0.33 0.71
𝐶 → var 𝑉 4 4 0.05 0.32 0.76
𝐶 →𝑉 ∧ 𝐶 33 7 28.84 0.36 29.75
𝐵 →𝐶 ∨ 𝐵 72 6 93.47 0.79 95.62
Imp(2)
𝐸 →0 1 1 0.01 0.01 0.05
𝐸 →1 1 1 0.01 0.01 0.04
𝑆 →x−− 2 2 0.06 0.02 0.11
𝑆 →y−− 2 2 0.11 0.03 0.17
𝐵 →f 1 1 0.01 0.01 0.06
𝑆 →x++ 2 2 0.04 0.03 0.11
𝑆 →y++ 2 2 0.12 0.02 0.16
𝐵 →t 1 1 0.01 0.02 0.13
𝐸 →x 2 2 0.01 0.01 0.04
𝐸 →y 1 1 0.01 0.01 0.04
𝑆 →x ≔ 𝐸 2 2 0.10 3.23 6.17
𝑆 →y ≔ 𝐸 2 2 0.04 3.22 6.19
𝐵 → ¬𝐵 3 3 0.02 2.49 5.26
𝐸 →𝐸 + 𝐸 4 3 0.05 8.52 14.83
𝐸 →𝐸 − 𝐸 5 2 0.13 8.03 13.83
𝐵 →𝐸 < 𝐸 8 5 0.08 7.50 13.66
𝐵 →𝐵 ∧ 𝐵 4 4 0.03 5.33 11.71
𝐵 →𝐵 ∨ 𝐵 4 4 0.05 4.61 8.99
𝑆 →𝑆 ; 𝑆 5 3 4.55 15.00 72.53
𝑆 → do_while 𝑆 𝐵 27 35 858.50 257.33 1374.13
𝑆 → while 𝐵 𝑆 9 7 16.88 122.41 266.80
𝑆 → ite 𝐵 𝑆 𝑆 11 5 525.28 33.88 628.71


IntArith
𝐸 →0 1 1 0.01 0.01 0.05
𝐸 →1 1 1 0.01 0.01 0.04
𝐸 →2 1 1 0.01 0.01 0.05
𝐸 →3 1 1 0.03 0.04 0.10
𝐵 →f 1 1 0.01 0.02 0.07
𝐵 →t 1 1 0.01 0.03 0.16
𝐸 →x 2 2 0.01 0.03 0.09
𝐸 →y 2 2 0.01 0.02 0.07
𝐸 →z 2 2 0.02 0.03 0.09
𝐵 → ¬𝐵 4 4 0.06 5.09 15.22
𝐸 →𝐸 × 𝐸 3 3 1.38 12.51 22.58
𝐸 →𝐸 + 𝐸 3 3 1.60 11.42 21.82
𝐵 →𝐸 < 𝐸 6 6 0.73 11.20 26.87
𝐵 →𝐵 ∧ 𝐵 5 5 0.08 8.08 14.95
𝐵 →𝐵 ∨ 𝐵 4 4 0.05 7.70 14.19
𝐸 → ite 𝐵 𝐸 𝐸 4 4 0.78 13.54 31.00
IteExpr
𝐺 →0 1 1 0.01 0.01 0.27
𝐺 →1 1 1 0.01 0.01 0.11
𝐺 →2 1 1 0.01 0.01 0.09
𝐺 →3 1 1 0.05 0.01 0.13
𝐺 →4 1 1 0.01 0.01 0.10
𝐺 →5 1 1 0.05 0.01 0.13
𝐺 →6 1 1 0.09 0.01 0.18
𝐺 →7 1 1 0.18 0.01 0.23
𝐺 →8 1 1 0.01 0.01 0.05
𝐺 →x 1 1 0.02 0.01 0.07


𝐸 → atom 𝐹 2 2 0.03 0.25 0.55
𝑆 → expr 𝐸 1 1 0.02 0.10 0.20
𝐹 → num 𝐺 1 1 0.03 0.30 0.69
𝐹 →𝐹 ×𝐺 2 2 1.19 0.77 3.33
𝐸 →𝐸 + 𝐹 2 2 1.25 0.25 1.79
𝐸 →𝐸 − 𝐹 2 2 1.12 0.27 1.68
𝐹 →𝐹 ÷𝐺 4 3 1.92 0.94 3.88
𝐵 →𝐸 = 𝐸 5 4 0.09 0.24 0.71
𝐵 →𝐸 ≥ 𝐸 5 5 1.79 0.33 2.60
𝐵 →𝐸 > 𝐸 5 5 0.18 0.26 0.79
𝐵 →𝐸 ≤ 𝐸 6 6 0.24 0.56 1.48
𝐵 →𝐸 < 𝐸 5 5 0.12 0.30 0.85
𝐵 →𝐸 ≠ 𝐸 6 6 5.35 0.26 6.11
𝑆 → ite 𝐵 𝐸 𝐸 3 3 0.29 0.29 0.92
ImpArr
𝐸 →0 1 1 0.01 0.01 0.04
𝐸 →1 1 1 0.01 0.01 0.04
𝐵 →f 1 1 0.01 0.01 0.05
𝐵 →t 1 1 0.01 0.01 0.10
𝑆 → dec_var𝑖 3 2 4.29 0.56 5.36
𝑆 → inc_var𝑖 3 2 3.56 0.54 4.51
𝐵 → ¬𝐵 3 2 0.01 1.42 5.53
𝐸 → var𝑖 3 2 0.01 0.28 0.66


𝐸 →𝐸 + 𝐸 3 2 0.02 6.51 12.38
𝐸 →𝐸 − 𝐸 3 2 0.01 6.43 12.13
𝐵 →𝐸 < 𝐸 4 3 0.01 3.38 10.33
𝐵 →𝐵 ∧ 𝐵 4 3 0.06 2.36 6.23
𝑆 → var𝑖 ← 𝐸 2 1 0.03 8.11 11.60
𝐵 →𝐵 ∨ 𝐵 5 4 0.03 2.42 6.14
𝑆 →𝑆 ; 𝑆 3 1 0.02 13.88 25.91
𝑆 → do_while 𝑆 𝐵 5 2 0.22 342.25 499.11
𝑆 → while 𝐵 𝑆 4 2 0.10 218.66 321.11
𝑆 → ite 𝐵 𝑆 𝑆 4 2 0.03 7.08 27.82
RegEx(2)
𝑅 →? 3 3 3.84 0.07 4.07
𝑅 →a 4 4 11.10 0.07 11.53
𝑅 →b 5 5 11.63 0.06 12.01
𝑅 →𝜖 1 1 0.07 0.07 2.38


𝑅 →∅ 1 1 0.19 0.07 0.46
𝑆𝑡𝑎𝑟𝑡 → eval 𝑅 3 3 0.02 4.43 13.40
𝑅 → !𝑅 5 5 2.85 15.77 77.36
𝑅 → 𝑅∗ 6 6 0.99 13.06 31.91
𝑅 →𝑅 · 𝑅 24 24 333.71 72.58 495.45
𝑅 →𝑅 | 𝑅 10 10 10.96 59.54 140.82


BinOp
𝐵 →0 1 1 0.01 0.01 0.07
𝐵 →1 1 1 0.01 0.01 0.22
𝐵 →x 2 2 0.01 0.01 0.08
𝑁 → atom 𝐵 2 2 0.09 0.04 0.30
𝑀 → atom′ 𝐵 3 3 0.07 0.05 0.26
𝑆 → bin2dec 𝑀 2 2 0.02 0.09 0.30
𝑆 → count 𝑁 2 2 0.04 0.05 0.24
𝑁 → concat 𝑁 𝐵 5 5 8.61 0.22 10.31
𝑀 → concat′ 𝑀 𝐵 5 5 288.81 0.23 308.50
Currency
𝐾 →0 1 1 0.01 0.01 0.03
𝐾 →1 1 1 0.01 0.01 0.02
𝐾 →2 1 1 0.01 0.01 0.02
𝐾 →4 1 1 0.01 0.01 0.02
𝐾 →8 1 1 0.01 0.01 0.02


𝐾 →x 2 2 0.01 0.01 0.10
𝑆 → cny 𝐾 3 3 21.55 0.28 22.48
𝑆 → jpy 𝐾 1 1 0.01 0.10 0.22
𝑆 → usd 𝐾 2 2 19.65 0.21 20.27
𝑆 →𝑆 × 𝐾 2 2 0.03 0.11 0.24
𝑆 →𝑆 + 𝑆 2 2 0.03 0.14 0.33
𝑆 →𝑆 − 𝑆 2 2 0.04 0.16 0.38
𝐾 → 𝐾 +K 𝐾 3 3 0.28 0.35 1.19

Received 2024-04-06; accepted 2024-08-18

