
Lecture Notes: The While Language and Program Semantics

17-355/17-665/17-819O: Program Analysis (Spring 2018)


Claire Le Goues and Jonathan Aldrich
[email protected], [email protected]

1 The While Language


We will study the theory of analyses using a simple programming language called While, with
various extensions. The While language is at least as old as Hoare’s 1969 paper on a logic for
proving program properties. It is a simple imperative language, with (to start!) assignment to local
variables, if statements, while loops, and simple integer and boolean expressions.
We use the following metavariables to describe different categories of syntax. The letter on the
left will be used as a variable representing a piece of a program. On the right, we describe the kind
of program piece that variable represents:

S statements
a arithmetic expressions (AExp)
x, y program variables (Vars)
n number literals
P boolean predicates (BExp)
The syntax of While is shown below. Statements S can be an assignment x := a; a skip
statement, which does nothing;1 and if and while statements, with boolean predicates P as
conditions. Arithmetic expressions a include variables x, numbers n, and one of several arithmetic
operators (opa ). Predicates are represented by Boolean expressions that include true, false, the
negation of another Boolean expression, Boolean operators opb applied to other Boolean expressions,
and relational operators opr applied to arithmetic expressions.

S ::= x := a
    | skip
    | S1; S2
    | if P then S1 else S2
    | while P do S

a ::= x
    | n
    | a1 opa a2

P ::= true
    | false
    | not P
    | P1 opb P2
    | a1 opr a2

opa ::= + | − | ∗ | /
opb ::= and | or
opr ::= < | ≤ | = | > | ≥

2 While3Addr: A Representation for Analysis


For analysis, the source-like definition of While can sometimes prove inconvenient. For exam-
ple, While has three separate syntactic forms—statements, arithmetic expressions, and boolean
predicates—and we would have to define the semantics and analysis of each separately to reason
about it. A simpler and more regular representation of programs will help simplify certain of our
formalisms.
1 Similar to a lone semicolon or open/close bracket in C or Java.

As a starting point, we will eliminate recursive arithmetic and boolean expressions and replace
them with simple atomic statement forms, which are called instructions, after the assembly language
instructions that they resemble. For example, an assignment statement of the form w = x ∗ y + z
will be rewritten as a multiply instruction followed by an add instruction. The multiply assigns to
a temporary variable t1 , which is then used in the subsequent add:

t1 = x ∗ y
w = t1 + z
As the translation from expressions to instructions suggests, program analysis is typically stud-
ied using a representation of programs that is not only simpler, but also lower-level than the source
(While, in this instance) language. Many Java analyses are actually conducted on byte code, for
example. Typically, high-level languages come with features that are numerous and complex, but
can be reduced into a smaller set of simpler primitives. Working at the lower level of abstraction
thus also supports simplicity in the compiler.
Control flow constructs such as if and while are similarly translated into simpler jump and
conditional branch constructs that jump to a particular (numbered) instruction. For example, a
statement of the form if P then S1 else S2 would be translated into:

1: if P then goto 4
2: S2
3: goto 5
4: S1

Exercise 1. How would you translate a While statement of the form while P do S?

This form of code is often called 3-address code, because every instruction has at most two
source operands and one result operand. We now define the syntax for 3-address code produced
from the While language, which we will call While3Addr. This language consists of a set of
simple instructions that load a constant into a variable, copy from one variable to another, compute
the value of a variable from two others, or jump (possibly conditionally) to a new address n. A
program P is just a map from addresses to instructions:2

I ::= x := n
    | x := y
    | x := y op z
    | goto n
    | if x opr 0 goto n

op  ::= + | − | ∗ | /
opr ::= < | =

P ∈ N → I

Formally defining a translation from a source language such as While to a lower-level in-
termediate language such as While3Addr is possible, but more appropriate for the scope of a
compilers course. For our purposes, the above should suffice as intuition. We will formally define
the semantics of While3Addr in subsequent lectures.
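To make the representation concrete, here is a minimal sketch in Python of a While3Addr program as a map from addresses to instructions. The tuple encoding of instructions is an illustrative assumption made for these notes' examples, not something defined by While3Addr itself:

# Illustrative tuple encoding of While3Addr instructions (an assumption):
#   ('const', x, n)         x := n
#   ('copy',  x, y)         x := y
#   ('arith', x, op, y, z)  x := y op z
#   ('goto',  n)            goto n
#   ('if',    x, opr, n)    if x opr 0 goto n

# P ∈ N → I: a program is just a map from addresses to instructions.
# Example: the translation of w = x * y + z via the temporary t1.
program = {
    1: ('arith', 't1', '*', 'x', 'y'),
    2: ('arith', 'w', '+', 't1', 'z'),
}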
2 The idea of the mapping between numbers and instructions corresponds conceptually to the Nielsons’ use of labels in the While language specification in the textbook. This concept is akin to mapping line numbers to code.

3 Extensions
The languages described above are sufficient to introduce the fundamental concepts of program
analysis in this course. However, we will eventually examine various extensions to While and
While3Addr, so that we can understand how more complicated constructs in real languages can
be analyzed. Some of these extensions to While3Addr will include:

I ::= ...
| x := f (y) function call
| return x return
| x := y.m(z) method call
| x := &p address-of operator
| x := ∗p pointer dereference
| ∗p := x pointer assignment
| x := y.f field read
| x.f := y field assignment
We will not give semantics to these extensions now, but it is useful to be aware of them as you
will see intermediate code like this in practical analysis frameworks.

4 Control flow graphs


Program analysis tools typically work on a representation of code as a control-flow graph (CFG),
which is a graph-based representation of the flow of control through the program. It connects simple
instructions in a way that statically captures all possible execution paths through the program and
defines the execution order of instructions in the program. When control could flow in more than
one direction, depending on program values, the graph branches. An example is the representation
of an if or while statement. At the end of the instructions in each branch of an if statement, the
branches merge together to point to the single instruction that comes afterward. Historically, this
arises from the use of program analysis to optimize programs.
More precisely, a control flow graph consists of a set of nodes and edges. The nodes N correspond
to basic blocks: Sequences of program instructions with no jumps in or out (no gotos, no labeled
targets). The edges E represent the flow of control between basic blocks. We use Pred(n) to denote
the set of all predecessors of the node n, and Succ(n) the set of all successors. A CFG has a
start node, and a set of final nodes, corresponding to return or other termination of a function.
Finally, for the purposes of dataflow analysis, we say that a program point exists before and after
each node. Note that there exists considerable flexibility in these definitions, and the precision of
the representation can vary based on the desired precision of the resulting analysis as well as the
peculiarities of the language. In this course we will in fact often ignore the concept of a basic block
and just treat instructions as the nodes in a graph; this view is semantically equivalent and simpler,
but less efficient in practice. Further defining and learning how to construct CFGs is a subject best
left to a compilers course; this discussion should suffice for our purposes.
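As a rough sketch of these definitions (treating each instruction as its own node, as these notes will generally do), Succ and Pred can be computed directly from a While3Addr program represented as a map from addresses to instructions. The code below assumes the illustrative tuple encoding of instructions sketched earlier:

def successors(program, n):
    """Succ(n): addresses to which control may flow directly after instruction n."""
    instr = program[n]
    if instr[0] == 'goto':
        return {instr[1]}
    if instr[0] == 'if':
        return {instr[3], n + 1}                        # branch-taken and fall-through edges
    return {n + 1} if (n + 1) in program else set()     # fall-through; final node otherwise

def predecessors(program, n):
    """Pred(n): every instruction m from which control can flow directly to n."""
    return {m for m in program if n in successors(program, m)}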

Lecture Notes: Program Semantics

17-355/17-665/17-819O: Program Analysis (Spring 2018)


Claire Le Goues and Jonathan Aldrich
[email protected], [email protected]

1 Operational Semantics
To reason about analysis correctness, we need a clear definition of what a program means. One way
to do this is using natural language (e.g., the Java Language Specification). However, although
natural language specifications are accessible, they are also often imprecise. This can lead to many
problems, including incorrect compiler implementations or program analyses.
A better alternative is a formal definition of program semantics. We begin with operational
semantics, which mimics, at a high level, the operation of a computer executing the program. Such
a semantics also reflects the way that techniques such as dataflow analysis or Hoare Logic reason
about the program, so it is convenient for our purposes.
There are two broad classes of operational semantics: big-step operational semantics, which spec-
ifies the entire operation of a given expression or statement; and small-step operational semantics,
which specifies the operation of the program one step at a time.

1.1 While: Big-step operational semantics


We’ll start by restricting our attention to arithmetic expressions, for simplicity. What is the meaning of a While expression? Some expressions, like a natural number, have a very clear meaning:
The “meaning” of 5 is just, well, 5. But what about x + 5? The meaning of this expression clearly
depends on the value of the variable x. We must abstract the value of variables as a function from
variable names to integer values:

E ∈ Var → Z
Here E denotes a particular program state. The meaning of an expression with a variable like x + 5
involves “looking up” x’s value in the associated E and substituting it in. Given a state, we
can write a judgement as follows:

⟨e, E⟩ ⇓ n
This means that given program state E, the expression e evaluates to n. This formulation is called
big-step operational semantics; the ⇓ judgement relates an expression and its “meaning.”1 We then
build up the meaning of more complex expressions using rules of inference (also called derivation
or evaluation rules). An inference rule is made up of a set of judgments above the line, known as
premises, and a judgment below the line, known as the conclusion. The meaning of an inference
rule is that the conclusion holds if all of the premises hold:
1 Note that I have chosen ⇓ because it is a common notational convention; it’s not otherwise special. This is true for many notational choices in formal specification.

premise1    premise2    . . .    premisen
------------------------------------------
conclusion
An inference rule with no premises is an axiom, which is always true. For example, integers always
evaluate to themselves, and the meaning of a variable is its stored value in the state:
-----------  big-int            ---------------  big-var
⟨n, E⟩ ⇓ n                      ⟨x, E⟩ ⇓ E(x)

Addition expressions illustrate a rule with premises:

⟨e1, E⟩ ⇓ n1    ⟨e2, E⟩ ⇓ n2
------------------------------  big-add
⟨e1 + e2, E⟩ ⇓ n1 + n2
But how does the value of x come to be “stored” in E? For that, we must consider While statements. Unlike expressions, statements have no direct result. However, they can have side effects. That is to say: the “result” or meaning of a statement is a new state. The judgement ⇓ as applied to statements and states therefore looks like:

⟨S, E⟩ ⇓ E′
This allows us to write inference rules for statements, bearing in mind that their meaning is not
an integer, but a new state. The meaning of skip, for example, is an unchanged state:

----------------  big-skip
⟨skip, E⟩ ⇓ E
Statement sequencing, on the other hand, does involve premises:

⟨s1, E⟩ ⇓ E′    ⟨s2, E′⟩ ⇓ E″
-------------------------------  big-seq
⟨s1; s2, E⟩ ⇓ E″
The if statement involves two rules, one for if the boolean predicate evaluates to true (rules
for boolean expressions not shown), and one for if it evaluates to false. I’ll show you just the
first one for demonstration:

⟨b, E⟩ ⇓ true    ⟨s1, E⟩ ⇓ E′
---------------------------------------  big-iftrue
⟨if b then s1 else s2, E⟩ ⇓ E′
What should the second rule for if look like?

This brings us to assignments, which produce a new state in which the variable being assigned to
is mapped to the value from the right-hand side. We write this with the notation E[x ↦ n], which
can be read “a new state that is the same as E except that x is mapped to n.”

⟨e, E⟩ ⇓ n
--------------------------------  big-assign
⟨x := e, E⟩ ⇓ E[x ↦ n]
Note that the update to the state is modeled functionally; the variable E still refers to the old
state, while E[x ↦ n] is the new state represented as a mathematical map.
Fully specifying the semantics of a language requires a judgement rule like this for every lan-
guage construct. These notes only include a subset for While, for brevity.
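The big-step rules translate almost directly into a recursive interpreter. The sketch below (Python; the tuple encoding of While syntax is an illustrative assumption, and boolean expressions are handled only partially, mirroring the fact that their rules are not shown above) implements big-int, big-var, big-add, big-skip, big-seq, big-iftrue/iffalse, and big-assign. States E are dictionaries, and E[x ↦ n] is modeled functionally by building a new dictionary:

def eval_exp(e, E):
    """⟨e, E⟩ ⇓ n for arithmetic expressions: ints, variable names, or ('+', e1, e2)."""
    if isinstance(e, int):                        # big-int
        return e
    if isinstance(e, str):                        # big-var
        return E[e]
    op, e1, e2 = e                                # big-add and the other operators
    n1, n2 = eval_exp(e1, E), eval_exp(e2, E)
    return {'+': n1 + n2, '-': n1 - n2, '*': n1 * n2}[op]

def eval_bool(b, E):
    """⟨b, E⟩ ⇓ t for a couple of boolean forms (full rules not shown in the notes)."""
    if isinstance(b, bool):
        return b
    op, a1, a2 = b
    n1, n2 = eval_exp(a1, E), eval_exp(a2, E)
    return {'=': n1 == n2, '<': n1 < n2}[op]

def eval_stmt(s, E):
    """⟨s, E⟩ ⇓ E′: returns the new state without mutating E."""
    kind = s[0]
    if kind == 'skip':                            # big-skip
        return E
    if kind == 'seq':                             # big-seq
        return eval_stmt(s[2], eval_stmt(s[1], E))
    if kind == 'assign':                          # big-assign: E[x ↦ n]
        _, x, e = s
        return {**E, x: eval_exp(e, E)}
    if kind == 'if':                              # big-iftrue / big-iffalse
        _, b, s1, s2 = s
        return eval_stmt(s1 if eval_bool(b, E) else s2, E)
    raise NotImplementedError("the while case is Exercise 1")

For example, eval_stmt(('assign', 'x', ('+', 'x', 5)), {'x': 1}) returns the new state {'x': 6}.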

Exercise 1. What are the rule(s) for while?

1.2 While: Small-step operational semantics
Big-step operational semantics has its uses. Among other nice features, it directly suggests a sim-
ple interpreter implementation for a given language. However, it is difficult to talk about a state-
ment or program whose evaluation does not terminate. Nor does it give us any way to talk about
intermediate states (so modeling multiple threads of control is out).
Sometimes it is instead useful to define a small-step operational semantics, which specifies program execution one step at a time. We refer to the pair of a statement and a state (⟨S, E⟩) as a configuration. Whereas big-step semantics specifies program meaning as a function between a configuration and a new state, small-step semantics models it as a step from one configuration to another.
You can think of small-step semantics as a set of rules that we repeatedly apply to configurations until we reach a final configuration for the language (⟨skip, E⟩, in this case), if ever.2 We write this new judgement using a slightly different arrow: →. ⟨S, E⟩ → ⟨S′, E′⟩ indicates one step of execution; ⟨S, E⟩ →∗ ⟨S′, E′⟩ indicates zero or more steps of execution. We formally define multiple execution steps as follows:

--------------------  multi-reflexive
⟨S, E⟩ →∗ ⟨S, E⟩

⟨S, E⟩ → ⟨S′, E′⟩    ⟨S′, E′⟩ →∗ ⟨S″, E″⟩
--------------------------------------------  multi-inductive
⟨S, E⟩ →∗ ⟨S″, E″⟩

To be complete, we should also define auxiliary small-step operators →a and →b for arithmetic
and boolean expressions, respectively; only the operator for statements results in an updated state
(as in big step). The types of these judgements are thus:

→ : (Stmt × E) → (Stmt × E)
→a : (Aexp × E) → Aexp
→b : (Bexp × E) → Bexp
We can now again write the semantics of a While program as new rules of inference. Some rules
look very similar to the big-step rules, just with a different arrow. For example, consider variables:

------------------  small-var
⟨x, E⟩ →a E(x)
Things get more interesting when we return to statements. Remember, small-step semantics ex-
press a single execution step. So, consider an if statement:

⟨b, E⟩ →b b′
-----------------------------------------------------------  small-if-congruence
⟨if b then S1 else S2, E⟩ → ⟨if b′ then S1 else S2, E⟩

---------------------------------------------  small-iftrue
⟨if true then s1 else s2, E⟩ → ⟨s1, E⟩

Exercise 2. We have again omitted the small-iffalse case, as well as rule(s) for while, as exercises
to the reader.

Note also the change for statement sequencing:

⟨s1, E⟩ → ⟨s1′, E′⟩
-------------------------------------  small-seq-congruence
⟨s1; s2, E⟩ → ⟨s1′; s2, E′⟩
2 Not all statements reach a final configuration, like while true do skip.

--------------------------------  small-seq
⟨skip; s2, E⟩ → ⟨s2, E⟩

1.3 While3Addr: Small-step semantics


The ideas behind big- and small-step operational semantics are consistent across languages, but the way they are written can vary based on what is notationally convenient for a particular language or analysis. While3Addr is slightly different from While, so beyond requiring different rules for its different constructs, it makes sense to modify our small-step notation a bit for defining the meaning of a While3Addr program.
First, let’s revisit the configuration to account for the slightly different meaning of a While3Addr program. As before, the configuration must include the state, which we still call E, mapping variables to values. However, a well-formed, terminating While program was effectively a single statement that can be iteratively reduced to skip; a While3Addr program, on the other hand, is a mapping from natural numbers to program instructions. So, instead of a statement that is being reduced in steps, the While3Addr configuration c must include a program counter n, representing the next instruction to be executed.
Thus, a configuration c of the abstract machine for While3Addr must include the stored program P (which we will generally treat implicitly), the state environment E, and the current program counter n representing the next instruction to be executed (c ∈ E × N). The abstract machine executes one step at a time, executing the instruction that the program counter points to, and updating the program counter and environment according to the semantics of that instruction.
This adds a tiny bit of complexity to the inference rules, because they must explicitly consider the mapping between line numbers/labels and program instructions. We represent execution of the abstract machine via a judgment of the form P ⊢ ⟨E, n⟩ ⇝ ⟨E′, n′⟩. The judgment reads: “When executing the program P, executing instruction n in the state E steps to a new state E′ and program counter n′.”3 To see this in action, consider a simple inference rule defining the semantics of the constant assignment instruction:

P[n] = x := m
-----------------------------------------  step-const
P ⊢ ⟨E, n⟩ ⇝ ⟨E[x ↦ m], n + 1⟩
This states that in the case where the nth instruction of the program P (looked up using P [n])
is a constant assignment x := m, the abstract machine takes a step to a state in which the state E
is updated to map x to the constant m, written as E[x ↦ m], and the program counter now points
to the instruction at the following address n + 1. We similarly define the remaining rules:

P[n] = x := y
-------------------------------------------  step-copy
P ⊢ ⟨E, n⟩ ⇝ ⟨E[x ↦ E[y]], n + 1⟩

P[n] = x := y op z    E[y] op E[z] = m
-----------------------------------------  step-arith
P ⊢ ⟨E, n⟩ ⇝ ⟨E[x ↦ m], n + 1⟩
3 I could have used the same → I did above instead of ⇝, but I don’t want you to mix them up.

P[n] = goto m
------------------------------  step-goto
P ⊢ ⟨E, n⟩ ⇝ ⟨E, m⟩

P[n] = if x opr 0 goto m    E[x] opr 0 = true
-----------------------------------------------  step-iftrue
P ⊢ ⟨E, n⟩ ⇝ ⟨E, m⟩

P[n] = if x opr 0 goto m    E[x] opr 0 = false
------------------------------------------------  step-iffalse
P ⊢ ⟨E, n⟩ ⇝ ⟨E, n + 1⟩

1.4 Derivations and provability


Among other things, we can use operational semantics to prove that concrete program expressions
will evaluate to particular values. We do this by chaining together rules of inference (which simply
list the hypotheses necessary to arrive at a conclusion) into derivations, which interlock instances
of rules of inference to reach particular conclusions. For example:

⟨4, E1⟩ ⇓ 4    ⟨2, E1⟩ ⇓ 2
---------------------------
    ⟨4 ∗ 2, E1⟩ ⇓ 8             ⟨6, E1⟩ ⇓ 6
---------------------------------------------
        ⟨(4 ∗ 2) − 6, E1⟩ ⇓ 2
We say that ⟨e, E⟩ ⇓ n is provable (expressed mathematically as ⊢ ⟨e, E⟩ ⇓ n) if there exists a
well-formed derivation with ⟨e, E⟩ ⇓ n as its conclusion. “Well formed” simply means that every
step in the derivation is a valid instance of one of the rules of inference for this system.
A proof system like our operational semantics is complete if every true statement is provable.
It is sound (or consistent) if every provable judgement is true. Typically, a system of semantics is
always complete, unless you forget a rule; soundness can be easier to mess up!

2 Proof techniques using operational semantics


A precise language specification lets us precisely prove properties of our language or programs
written in it (and analyses of those programs!). Note that this exposition primarily uses big-step
semantics to illustrate, but the concepts generalize.
Well-founded induction. A key family of proof techniques in programming languages is based
on induction. You may already be familiar with mathematical induction. As a reminder: if P (n) is
a property of the natural numbers that we want to show holds for all n, mathematical induction
says that it suffices to show that P (0) is true (the base case), and then that if P (m) is true, then so is
P (m + 1) for any natural number m (the inductive step). This works because there are no infinite
descending chains of natural numbers. So, for any n, P (n) can be obtained by simply starting
from the base case and applying n instances of the inductive step.
Mathematical induction is a special case of well-founded induction, a general, powerful proof principle that works as follows: a relation ≺ ⊆ A × A is well-founded if there are no infinite descending chains in A. If so, to prove ∀x ∈ A. P(x) it is enough to prove ∀x ∈ A. [∀y ≺ x ⇒ P(y)] ⇒ P(x).4
4 Mathematical induction as a special case arises when ≺ is simply the predecessor relation {(x, x + 1) | x ∈ N}.

Structural induction. Structural induction is another special case of well-founded induction where the ≺ relation is defined on the structure of a program or a derivation. For example, consider the syntax of arithmetic expressions in While, Aexp. Induction on a recursive definition like this proves a property about a mathematical structure by demonstrating that the property holds for all possible forms of that structure. We define the relation a ≺ b to hold if a is a substructure of b. For Aexp expressions, the relation ≺ ⊆ Aexp × Aexp is:

e1 ≺ e1 + e2
e1 ≺ e1 ∗ e2
e2 ≺ e1 + e2
e2 ≺ e1 ∗ e2
. . . etc., for all arithmetic operators opa

To prove that a property P holds for all arithmetic expressions in While (or, ∀e ∈ Aexp. P(e)), we must show P holds for both the base cases and the inductive cases. e is a base case if there is no e′ such that e′ ≺ e; e is an inductive case if ∃e′. e′ ≺ e. There is thus one proof case per form of the expression. For Aexp, the base cases are:

⊢ ∀n ∈ Z. P(n)
⊢ ∀x ∈ Vars. P(x)

And the inductive cases:

⊢ ∀e1, e2 ∈ Aexp. P(e1) ∧ P(e2) ⇒ P(e1 + e2)
⊢ ∀e1, e2 ∈ Aexp. P(e1) ∧ P(e2) ⇒ P(e1 ∗ e2)
. . . and so on for the other arithmetic operators . . .

Example. Let L(e) be the number of literals and variable occurrences in some expression e and O(e)
be the number of operators in e. Prove by induction on the structure of e that ∀e ∈ Aexp.L(e) =
O(e) + 1:
Base cases:
• Case e = n.L(e) = 1 and O(e) = 0
• Case e = x.L(e) = 1 and O(e) = 0
Inductive case 1: Case e = e1 + e2
• By definition, L(e) = L(e1 ) + L(e2 ) and O(e) = O(e1 ) + O(e2 ) + 1.
• By the induction hypothesis, L(e1 ) = O(e1 ) + 1 and L(e2 ) = O(e2 ) + 1.
• Thus, L(e) = O(e1 ) + O(e2 ) + 2 = O(e) + 1.
The other arithmetic operators follow the same logic.
Other proofs for the expression sublanguages of While can be similarly conducted. For ex-
ample, we could prove that the small-step and big-step semantics will obtain equivalent results
on expressions:

∀e ∈ AExp.∀n ∈ N.e →∗a n ⇔ e ⇓ n

The actual proof is left as an exercise, but note that this works because the semantics rules for
expressions are strictly syntax-directed: the meaning of an expression is determined entirely by
the meaning of its subexpressions, the structure of which guides the induction.
Induction on the structure of derivations. Unfortunately, that last statement is not true for statements in the While language. For example, imagine we’d like to prove that While is deterministic
(that is, if a statement terminates, it always evaluates to the same value). More formally, we want
to prove that:

∀e ∈ Aexp. ∀E ∈ E. ∀n, n′ ∈ N. ⟨e, E⟩ ⇓ n ∧ ⟨e, E⟩ ⇓ n′ ⇒ n = n′    (1)

∀b ∈ Bexp. ∀E ∈ E. ∀t, t′ ∈ B. ⟨b, E⟩ ⇓ t ∧ ⟨b, E⟩ ⇓ t′ ⇒ t = t′    (2)

∀s ∈ S. ∀E, E′, E″ ∈ E. ⟨s, E⟩ ⇓ E′ ∧ ⟨s, E⟩ ⇓ E″ ⇒ E′ = E″    (3)

We can’t prove the third statement with structural induction on the language syntax because
the evaluation of statements (like while) does not depend only on the evaluation of its subexpres-
sions.
Fortunately, there is another way. Recall that the operational semantics assign meaning to pro-
grams by providing rules of inference that allow us to prove judgements by making derivations.
Derivation trees (like the expression trees we discussed above) are also defined inductively, and
are built of sub-derivations. Because they have structure, we can again use structural induction,
but here, on the structure of derivations.
Instead of assuming (and reasoning about) some statement s ∈ S, we instead assume a derivation D :: ⟨s, E⟩ ⇓ E′ and induct on the structure of that derivation (we define D :: Judgement to mean “D is the derivation that proves Judgement,” e.g., D :: ⟨x + 1, E⟩ ⇓ 2). That is, to prove that
property P holds for a statement, we will prove that P holds for all possible derivations of that
statement. Such a proof consists of the following steps:
Base cases: show that P holds for each atomic derivation rule with no premises (of the form S).
Inductive cases: For each derivation rule of the form

H1    . . .    Hn
-------------------
S
By the induction hypothesis, P holds for Hi , where i = 1 . . . n. We then have to prove that the
property is preserved by the derivation using the given rule of inference.
A key technique for induction on derivations is inversion. Because the number of forms of
rules of inference is finite, we can tell which inference rules might have been used last in the
derivation. For example, given D :: ⟨x := 55, E⟩ ⇓ E′, we know (by inversion) that the assignment rule of inference must be the last rule used in D (because no other rules of inference involve an assignment statement in their concluding judgment). Similarly, if D :: ⟨while b do c, E⟩ ⇓ E′, then
(by inversion) the last rule used in D was either the while-true rule or the while-false rule.
Given those preliminaries, to prove that the evaluation of statements is deterministic (equation (3) above), pick arbitrary s, E, E′, and D :: ⟨s, E⟩ ⇓ E′.
Proof: by induction on the structure of the derivation D, where D :: ⟨s, E⟩ ⇓ E′.
Base case: the one rule with no premises, skip:

D :: ⟨skip, E⟩ ⇓ E

Pick an arbitrary second derivation D′ :: ⟨skip, E⟩ ⇓ E″. By inversion, the last rule used in D′ (which produced E″) must also have been the rule for skip. By the structure of the skip rule, we know E″ = E.
Inductive cases: We need to show that the property holds when the last rule used in D was each of the possible non-skip While commands. I will show you one representative case; the rest are left as an exercise. If the last rule used was the while-true rule:

D1 :: ⟨b, E⟩ ⇓ true    D2 :: ⟨s, E⟩ ⇓ E1    D3 :: ⟨while b do s, E1⟩ ⇓ E′
---------------------------------------------------------------------------
D :: ⟨while b do s, E⟩ ⇓ E′
Pick arbitrary E″ such that D″ :: ⟨while b do s, E⟩ ⇓ E″.
By inversion, and determinism of boolean expressions, D″ must also use the same while-true rule. So D″ must also have subderivations D2″ :: ⟨s, E⟩ ⇓ E1″ and D3″ :: ⟨while b do s, E1″⟩ ⇓ E″. By the induction hypothesis on D2 with D2″, we know E1 = E1″. Using this result and the induction hypothesis on D3 with D3″, we have E″ = E′.

Lecture Notes: A Dataflow Analysis Framework for While3Addr
17-355/17-665/17-819O: Program Analysis (Spring 2018)
Claire Le Goues and Jonathan Aldrich
[email protected], [email protected]

1 Defining a dataflow analysis


A dataflow analysis computes some dataflow information at each program point in the control flow
graph. We thus start by examining how this information is defined. We will use σ to denote this
information. Typically σ tells us something about each variable in the program. For example, σ
may map variables to abstract values taken from some set L:

σ ∈ Var → L
L represents the set of abstract values we are interested in tracking in the analysis. This varies
from one analysis to another. For example, consider a zero analysis, which tracks whether each
variable is zero or not at each program point (Thought Question: Why would this be useful?). For
this analysis, we define L to be the set {Z, N, ⊤}. The abstract value Z represents the value 0, and N represents all nonzero values. ⊤ is pronounced “top”, and we define it more concretely later in these notes; we use it as a question mark, for the situations when we do not know whether a variable is zero or not, due to imprecision in the analysis.
Conceptually, each abstract value represents a set of one or more concrete values that may occur
when a program executes. We define an abstraction function α that maps each possible concrete
value of interest to an abstract value:

α : Z → L

For zero analysis, we define α so that 0 maps to Z and all other integers map to N:

αZ(0) = Z
αZ(n) = N    where n ≠ 0

The core of any program analysis is how individual instructions in the program are analyzed
and affect the analysis state σ at each program point. We define this using flow functions that map
the dataflow information at the program point immediately before an instruction to the dataflow
information after that instruction. A flow function should represent the semantics of the instruction,
but abstractly, in terms of the abstract values tracked by the analysis. We will link semantics to the
flow function precisely when we talk about correctness of dataflow analysis. For now, to approach
the idea by example, we define the flow functions fZ for zero analysis on While3Addr as follows:

fZ⟦x := 0⟧(σ) = [x ↦ Z]σ                      (1)
fZ⟦x := n⟧(σ) = [x ↦ N]σ    where n ≠ 0       (2)
fZ⟦x := y⟧(σ) = [x ↦ σ(y)]σ                   (3)
fZ⟦x := y op z⟧(σ) = [x ↦ ⊤]σ                 (4)
fZ⟦goto n⟧(σ) = σ                              (5)
fZ⟦if x = 0 goto n⟧(σ) = σ                     (6)
In the notation, the form of the instruction is an implicit argument to the function, which is followed by the explicit dataflow information argument, in the form fZ⟦I⟧(σ). (1) and (2) are for assignment to a constant. If we assign 0 to a variable x, then we should update the input dataflow information σ so that x maps to the abstract value Z. The notation [x ↦ Z]σ denotes dataflow information that is identical to σ except that the value in the mapping for x is updated to refer to Z. Flow function (3) is for copies from a variable y to another variable x: we look up y in σ, written σ(y), and update σ so that x maps to the same abstract value as y.
We start with a generic flow function for arithmetic instructions (4). Arithmetic can produce
either a zero or a nonzero value, so we use the abstract value ⊤ to represent our uncertainty. More
precise flow functions are available based on certain instructions or operands. For example, if the
instruction is subtraction and the operands are the same, the result will definitely be zero. Or, if
the instruction is addition, and the analysis information tells us that one operand is zero, then the
addition is really a copy and we can use a flow function similar to the copy instruction above. These
examples could be written as follows (we would still need the generic case above for instructions
that do not fit such special cases):

fZ⟦x := y − y⟧(σ) = [x ↦ Z]σ
fZ⟦x := y + z⟧(σ) = [x ↦ σ(y)]σ    where σ(z) = Z

Exercise 1. Define another flow function for some arithmetic instruction and certain conditions
where you can also provide a more precise result than ⊤.

The flow function for branches ((5) and (6)) is trivial: branches do not change the state of the
machine other than to change the program counter, and thus the analysis result is unaffected.
However, we can provide a better flow function for conditional branches if we distinguish the
analysis information produced when the branch is taken or not taken. To do this, we extend our
notation once more in defining flow functions for branches, using a subscript to the instruction to
indicate whether we are specifying the dataflow information for the case where the condition is
true (T ) or when it is false (F ). For example, to define the flow function for the true condition
when testing a variable for equality with zero, we use the notation fZ⟦if x = 0 goto n⟧T(σ). In this
case we know that x is zero so we can update σ with the Z lattice value. Conversely, in the false
condition we know that x is nonzero:

fZ⟦if x = 0 goto n⟧T(σ) = [x ↦ Z]σ
fZ⟦if x = 0 goto n⟧F(σ) = [x ↦ N]σ
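A sketch of these flow functions in Python follows. The string values Z, N, and TOP and the tuple instruction encoding are illustrative assumptions; σ is a dictionary from variable names to abstract values:

Z, N, TOP = 'Z', 'N', 'TOP'          # stand-ins for the abstract values Z, N, ⊤

def flow_zero(instr, sigma):
    """f_Z⟦I⟧(σ): dataflow information after instr, given σ before it."""
    kind = instr[0]
    if kind == 'const':                           # x := 0  /  x := n with n ≠ 0
        _, x, n = instr
        return {**sigma, x: Z if n == 0 else N}
    if kind == 'copy':                            # x := y
        _, x, y = instr
        return {**sigma, x: sigma[y]}
    if kind == 'arith':                           # x := y op z
        _, x, op, y, z = instr
        if op == '-' and y == z:                  # x := y − y is definitely zero
            return {**sigma, x: Z}
        return {**sigma, x: TOP}                  # generic case: we cannot tell
    return sigma                                  # goto and if leave σ unchanged

def flow_zero_branch(instr, sigma, taken):
    """Branch-sensitive flow for 'if x = 0 goto n': refine x on each outcome."""
    _, x, opr, _target = instr
    if opr == '=':
        return {**sigma, x: Z if taken else N}
    return sigma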

Exercise 2. Define a flow function for a conditional branch testing whether a variable x < 0.

2 Running a dataflow analysis
The point of developing a dataflow analysis is to compute information about possible program states at each point in a program. For example, in the case of zero analysis, whenever we divide some expression by a variable x, we might like to know whether x must be zero (the abstract value Z) or may be zero (represented by ⊤) so that we can warn the developer.

2.1 Straightline code


Consider the following simple program and the results of the analysis on it (the control flow graph, a straight-line chain through the five instructions, is omitted here):

1: x := 0
2: y := 1
3: z := y
4: y := z + x
5: x := y − z

     x   y   z
1    Z
2    Z   N
3    Z   N   N
4    Z   N   N
5    ⊤   N   N

We simulate running the program in the analysis, using the flow function to compute, for each
instruction in turn, the dataflow analysis information after the instruction from the information we
had before the instruction. For such simple code, it is easy to track the analysis information using
a table with a column for each program variable and a row for each program point (shown above).
The information in a cell tells us the abstract value of the column’s variable immediately after the
instruction at that line.
Notice that the analysis is imprecise at the end with respect to the value of x. We were able to
keep track of which values are zero and nonzero quite well through instruction 4, using (in the last
case) the flow function that knows that adding a variable known to be zero is equivalent to a copy.
However, at instruction 5, the analysis does not know that y and z are equal, and so it cannot
determine whether x will be zero. Because the analysis is not tracking the exact values of variables,
but rather approximations, it will inevitably be imprecise in certain situations. However, in practice,
well-designed approximations can often allow dataflow analysis to compute quite useful information.

2.2 Alternative paths: Example


Things get more interesting in While3Addr code that contains if statements. In this case, there
are two possible paths through the program. Consider the following simple example (the CFG is omitted here). I have begun by analyzing one path through the program (the path in which the
branch is not taken):

1: if x = 0 goto 4
2: y := 0
3: goto 6
4: y := 1
5: x := 1
6: z := y

     x         y    z
1    ZT , NF
2    N         Z
3    N         Z
4
5
6    N         Z    Z

In the table above, the entry for x on line 1 indicates the different abstract values produced for
the true and false conditions of the branch. We use the false condition (x is nonzero) in analyzing
instruction 2. Execution proceeds through instruction 3, at which point we jump to instruction 6.
We have not yet analyzed a path through lines 4 and 5.
Turning to that alternative path, we can start by analyzing instructions 4 and 5 as if we had
taken the true branch at instruction 1:

     x         y    z
1    ZT , NF
2    N         Z
3    N         Z
4    Z         N
5    N         N
6    N         Z    Z    note: incorrect!

We have a dilemma in analyzing instruction 6. We already analyzed it with respect to the previous path, assuming the dataflow analysis information we computed from instruction 3, where x was nonzero and y was zero. However, we now have conflicting information from instruction 5: in this case, x is still nonzero, but y is also nonzero.
We resolve this dilemma by combining the abstract values computed along the two paths for y and z. The incoming abstract values at line 6 for y are N and Z. We can represent this uncertainty with the abstract value ⊤, indicating that we do not know if y is zero or not at this instruction, because of the uncertainty about how we reached this program location. We can apply similar logic in the case of x, but because x is nonzero on both incoming paths we can maintain our knowledge that x is nonzero. Thus, we should reanalyze instruction 6 assuming the dataflow analysis information {x ↦ N, y ↦ ⊤}. The results of our final analysis are shown below:

     x         y    z
1    ZT , NF
2    N         Z
3    N         Z
4    Z         N
5    N         N
6    N         ⊤    ⊤    corrected

2.3 Join
We generalize the procedure of combining analysis results along multiple paths by using a join operation, ⊔. When taking two abstract values l1, l2 ∈ L, the result of l1 ⊔ l2 is an abstract value lj that generalizes both l1 and l2.
To precisely define what “generalizes” means, we define a partial order ⊑ over abstract values, and say that l1 and l2 are at least as precise as lj, written l1 ⊑ lj. Recall that a partial order is any relation that is:
• reflexive: ∀l : l ⊑ l
• transitive: ∀l1, l2, l3 : l1 ⊑ l2 ∧ l2 ⊑ l3 ⇒ l1 ⊑ l3
• anti-symmetric: ∀l1, l2 : l1 ⊑ l2 ∧ l2 ⊑ l1 ⇒ l1 = l2
A set of values L that is equipped with a partial order ⊑, and for which the least upper bound l1 ⊔ l2 of any two values in that ordering is unique and is also in L, is called a join-semilattice. Any join-semilattice has a maximal element ⊤ (pronounced “top”). We require that the abstract values used in dataflow analyses form a join-semilattice. We will use the term lattice for short; as we will see below, this is the correct terminology for most dataflow analyses anyway. For zero analysis, we define the partial order with Z ⊑ ⊤ and N ⊑ ⊤, where Z ⊔ N = ⊤.
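For zero analysis, the partial order and join are small enough to write out directly. The Python sketch below (with the same illustrative value names as before) also lifts the join pointwise to whole dataflow maps σ, as we have been doing informally when paths merge:

Z, N, TOP = 'Z', 'N', 'TOP'

def leq(l1, l2):
    """l1 ⊑ l2: each value is below itself and below TOP."""
    return l1 == l2 or l2 == TOP

def join(l1, l2):
    """l1 ⊔ l2: the least upper bound, e.g. join(Z, N) == TOP."""
    return l1 if l1 == l2 else TOP

def join_sigma(s1, s2):
    """σ1 ⊔ σ2, joined variable-by-variable (assuming both map the same variables)."""
    return {x: join(s1[x], s2[x]) for x in s1}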
We have now introduced and considered all the elements necessary to define a dataflow analysis:
• a lattice (L, ⊑)
• an abstraction function α
• initial dataflow analysis assumptions σ0
• a flow function f
Note that the theory of lattices answers a side question that comes up when we begin analyzing
the first program instruction: what should we assume about the value of input variables (like x on
program entry)? If we do not know anything about the value x can be, a good choice is to assume
it can be anything; that is, in the initial environment σ0, input variables like x are mapped to ⊤.

2.4 Dataflow analysis of loops


We now consider While3Addr programs with loops. While an if statement produces two paths
that diverge and later join, a loop produces a potentially unbounded number of program paths.

Despite this, we would like to analyze looping programs in bounded time. Let us examine how by working through the following simple looping example:1

1: x := 10
2: y := 0
3: z := 0
4: if x = 0 goto 8
5: y := 1
6: x := x − 1
7: goto 4
8: x := y

     x         y    z
1    N
2    N         Z
3    N         Z    Z
4    ZT , NF   Z    Z
5    N         N    Z
6    ⊤         N    Z
7    ⊤         N    Z
8

The table above shows the straightforward straight-line analysis of the path that runs the loop once. We must now re-analyze instruction 4. This should not be surprising; it is analogous to the situation we encountered earlier, merging paths after an if instruction. To determine the analysis information at instruction 4, we join the dataflow analysis information flowing in from instruction 3 with the dataflow analysis information flowing in from instruction 7. For x we have N ⊔ ⊤ = ⊤. For y we have Z ⊔ N = ⊤. For z we have Z ⊔ Z = Z. The information for instruction 4 is therefore unchanged, except that for y we now have ⊤.
We can now choose between two paths once again: staying within the loop, or exiting out to
instruction 8. We will choose (arbitrarily, for now) to stay within the loop, and consider instruction
5. This is our second visit to instruction 5, and we have new information to consider: since we have
gone through the loop, the assignment y := 1 has been executed, and we have to assume that y may
be nonzero coming into instruction 5. This is accounted for by the latest update to instruction 4’s
analysis information, in which y is mapped to ⊤. Thus the information for instruction 4 describes
both possible paths. We must update the analysis information for instruction 5 so it does so as
well. In this case, however, since the instruction assigns 1 to y, we still know that y is nonzero after
it executes. In fact, analyzing the instruction again with the updated input data does not change
the analysis results for this instruction.
A quick check shows that going through the remaining instructions in the loop, and even
coming back to instruction 4, the analysis information will not change. That is because the flow
functions are deterministic: given the same input analysis information and the same instruction,
they will produce the same output analysis information. If we analyze instruction 6, for example,
the input analysis information from instruction 5 is the same input analysis information we used
when analyzing instruction 6 the last time around. Thus, instruction 6’s output information will
not change, and so instruction 7’s input information will not change, and so on. No matter which
1 I provide the CFG for reference but omit the annotations in the interest of a cleaner diagram.

instruction we run the analysis on, anywhere in the loop (and in fact before the loop), the analysis
information will not change.
We say that the dataflow analysis has reached a fixed point.2 In mathematics, a fixed point of a
function is a data value v that is mapped to itself by the function, i.e. f(v) = v. In this analysis, the
mathematical function is the flow function, and the fixed point is a tuple of the dataflow analysis
values at each program point. If we invoke the flow function on the fixed point, the analysis results
do not change (we get the same fixed point back).
Once we have reached a fixed point of the function for this loop, it is clear that further analysis
of the loop will not be useful. Therefore, we will proceed to analyze statement 8. The final analysis
results are as follows:

     x         y    z
1    N
2    N         Z
3    N         Z    Z
4    ZT , NF   ⊤    Z    updated
5    N         N    Z    already at fixed point
6    ⊤         N    Z    already at fixed point
7    ⊤         N    Z    already at fixed point
8    Z         ⊤    Z

Quickly simulating a run of the program shows that these results correctly approximate
actual execution. The uncertainty in the value of x at instructions 6 and 7 is real: x is nonzero
after these instructions, except the last time through the loop, when it is zero. The uncertainty in
the value of y at the end shows imprecision in the analysis: this loop always executes at least once,
so y will be nonzero. However, the analysis (as currently formulated) cannot tell this for certain, so
it reports that it cannot tell if y is zero or not. This is safe—it is always correct to say the analysis
is uncertain—but not as precise as would be ideal.
The benefit of analysis, however, is that we can gain correct information about all possible
executions of the program with only a finite amount of work. In our example, we only had to
analyze the loop statements at most twice each before reaching a fixed point. This is a significant
improvement over the actual program execution, which runs the loop 10 times. We sacrificed
precision in exchange for coverage of all possible executions, a classic tradeoff in static analysis.
How can we be confident that the results of the analysis are correct, besides simulating every
possible run of a (possibly very complex) program? The intuition behind correctness is the invariant
that at each program point, the analysis results approximate all the possible program values that
could exist at that point. If the analysis information at the beginning of the program correctly
approximates the program arguments, then the invariant is true at the beginning of program
execution. One can then make an inductive argument that the invariant is preserved. In particular,
when the program executes an instruction, the instruction modifies the program’s state. As long
as the flow functions account for every possible way that instruction can modify state, then at the
analysis fixed point they will have correctly approximated actual program execution. We will make
this argument more precise in a future lecture.
2 Sometimes abbreviated in one word as fixpoint.

2.5 A convenience: the ⊥ abstract value and complete lattices
As we think about defining an algorithm for dataflow analysis more precisely, a natural question
comes up concerning how instruction 4 is analyzed in the example above. On the first pass, we
analyzed it using the dataflow information from instruction 3, but on the second pass we had to
consider dataflow information from both instruction 3 and instruction 7.
It is more consistent to say that analyzing an instruction always uses the incoming dataflow
analysis information from all instructions that could precede it. That way, we do not have to worry
about following a specific path during analysis. However, for instruction 4, this requires a dataflow
value from instruction 7, even if instruction 7 has not yet been analyzed. We could do this if we
had a dataflow value that is always ignored when it is joined with any other dataflow value. In other words, we need an abstract dataflow value ⊥ (pronounced “bottom”) such that ⊥ ⊔ l = l.
⊥ plays a dual role to the value ⊤: it sits at the bottom of the dataflow value lattice. For all l, we have the identity l ⊑ ⊤ and correspondingly ⊥ ⊑ l. There is a greatest lower bound operator meet, ⊓, which is dual to ⊔. The meet of all dataflow values is ⊥.
A set of values L that is equipped with a partial order ⊑, and for which both least upper bounds ⊔ and greatest lower bounds ⊓ exist in L and are unique, is called a complete lattice.
The theory of ⊥ and complete lattices provides an elegant solution to the problem mentioned above. We can initialize σ at every instruction in the program, except at entry, to ⊥, indicating that the instruction there has not yet been analyzed. We can then always merge all input values to a node, whether or not the sources of those inputs have been analyzed, because we know that any ⊥ values from unanalyzed sources will simply be ignored by the join operator ⊔, and that if the dataflow value for that variable is going to change, we will get to it before the analysis is completed.

3 Analysis execution strategy


The informal execution strategy outlined above considers all paths through the program, continuing
until the dataflow analysis information reaches a fixed point. This strategy can be simplified. The
argument for correctness outlined above implies that for correct flow functions, it doesn’t matter
how we get to the analysis fixed point. This is sensible: it would be surprising if analysis correctness
depended on which branch of an if statement we explored first! It is in fact possible to run the
analysis on program instructions in any order we choose. As long as we continue doing so until the
analysis reaches a fixed point, the final result will be correct. The simplest correct algorithm for
executing dataflow analysis can therefore be stated as follows:
for Instruction i in program
    input[i] = ⊥
input[firstInstruction] = initialDataflowInformation

while not at fixed point
    pick an instruction i in program
    output = flow(i, input[i])
    for Instruction j in successors(i)
        input[j] = input[j] ⊔ output
Although in the previous presentation we have been tracking the analysis information immedi-
ately after each instruction, it is more convenient when writing down the algorithm to track the
analysis information immediately before each instruction. This avoids the need for a distinguished
location before the program starts (the start instruction is not analyzed).

In the code above, the termination condition is expressed abstractly. It can easily be checked,
however, by running the flow function on each instruction in the program. If the results of analysis
do not change as a result of analyzing any instruction, then it has reached a fixed point.
How do we know the algorithm will terminate? The intuition is as follows. We rely on the
choice of an instruction to be fair, so that each instruction is eventually considered. As long as the
analysis is not at a fixed point, some instruction can be analyzed to produce new analysis results.
If our flow functions are well-behaved (technically, if they are monotone, as we will discuss in a
future lecture) then each time the flow function runs on a given instruction, either the results do
not change, or they get become more approximate (i.e. they are higher in the lattice). Later runs of
the flow function consider more possible paths through the program and therefore produce a more
approximate result which considers all these possibilities. If the lattice is of finite height—meaning
there are at most a finite number of steps from any place in the lattice going up towards the J
value—then this process must terminate eventually. More concretely: once an abstract value is
computed to be J, it will stay J no matter how many times the analysis is run. The abstraction
only flows in one direction.
Although the simple algorithm above always terminates and results in the correct answer, it is
still not always the most efficient. Typically, for example, it is beneficial to analyze the program
instructions in order, so that results from earlier instructions can be used to update the results
of later instructions. It is also useful to keep track of a list of instructions for which there has
been a change since the instruction was last analyzed in the result dataflow information of some
predecessor. Only those instructions need be analyzed; reanalyzing other instructions is useless
since their input has not changed. Kildall captured this intuition with his worklist algorithm,
described in pseudocode as:
for Instruction i in program
    input[i] = ⊥
input[firstInstruction] = initialDataflowInformation
worklist = { firstInstruction }

while worklist is not empty
    take an instruction i off the worklist
    output = flow(i, input[i])
    for Instruction j in succs(i)
        if output ⋢ input[j]
            input[j] = input[j] ⊔ output
            add j to worklist
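The worklist algorithm is short enough to transcribe directly into Python. In the sketch below, the parameters are assumptions supplied by the particular analysis: succs gives control-flow successors, flow is the flow function, join_sigma and leq_sigma are the map-level (pointwise) join and ordering, bottom_sigma maps every variable to ⊥, and initial_sigma is σ0:

def kildall(program, succs, flow, join_sigma, leq_sigma, bottom_sigma, initial_sigma):
    """Worklist dataflow analysis; returns input[i], the information before each instruction i."""
    first = min(program)                          # assume the lowest address is the entry
    inputs = {i: bottom_sigma for i in program}   # input[i] = ⊥ everywhere ...
    inputs[first] = initial_sigma                 # ... except at the first instruction
    worklist = {first}
    while worklist:
        i = worklist.pop()                        # take an instruction off the worklist
        output = flow(program[i], inputs[i])
        for j in succs(program, i):
            if not leq_sigma(output, inputs[j]):  # output ⋢ input[j]: propagate and requeue
                inputs[j] = join_sigma(inputs[j], output)
                worklist.add(j)
    return inputs

To instantiate this for zero analysis, the value lattice from the earlier sketch would need the ⊥ element of Section 2.5 added (with ⊥ ⊔ l = l), plus pointwise versions of leq and join over maps; flow_zero then serves as the flow function.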
The algorithm above is very close to the generic algorithm declared previously, except for the
worklist that chooses the next instruction to analyze and determines when a fixed point is reached.
We can reason about the performance of this algorithm as follows. We only add an instruction
to the worklist when the input data to some node changes, and the input for a given node can
only change h times, where h is the height of the lattice. Thus we add at most n ∗ h nodes to the worklist, where n is the number of instructions in the program. After running the flow function for a node, however, we must test all its successors to find out if their input has changed. This test is done once for each edge, for each time that the source node of the edge is added to the worklist: thus at most e ∗ h times, where e is the number of control flow edges in the successor graph between instructions. If each operation (such as a flow function, ⊔, or ⊑ test) has cost O(c), then the overall cost is O(c ∗ (n + e) ∗ h), or O(c ∗ e ∗ h) because n is bounded by e.
The algorithm above is still abstract: We have not defined the operations to add and remove

instructions from the worklist. We would like adding to the worklist to be a set addition operation, so that no instruction appears in it multiple times. If we have just analyzed the program with respect to an instruction, analyzing it again will not produce different results.
That leaves a choice of which instruction to remove from the worklist. We could choose among
several policies, including last-in-first-out (LIFO) order or first-in-first-out (FIFO) order. In prac-
tice, the most efficient approach is to identify the strongly-connected components (i.e. loops) in the control flow graph and process them in topological order, so that loops that are
nested, or appear in program order first, are solved before later loops. This works well because we
do not want to do a lot of work bringing a loop late in the program to a fixed point, then have to
redo that work when dataflow information from an earlier loop changes.
Within each loop, the instructions should be processed in reverse postorder, the reverse of
the order in which each node is last visited when traversing a tree. Consider the example from
Section 2.2 above, in which instruction 1 is an if test, instructions 2-3 are the then branch,
instructions 4-5 are the else branch, and instruction 6 comes after the if statement. A tree
traversal might go as follows: 1, 2, 3, 6, 3 (again), 2 (again), 1 (again), 4, 5, 4 (again), 1 (again).
Some instructions in the tree are visited multiple times: once going down, once between visiting
the children, and once coming up. The postorder, or order of the last visits to each node, is 6, 3,
2, 5, 4, 1. The reverse postorder is the reverse of this: 1, 4, 5, 2, 3, 6. Now we can see why reverse
postorder works well: we explore both branches of the if statement (4-5 and 2-3) before we explore
node 6. This ensures that we do not have to reanalyze node 6 after one of its inputs changes.
Although analyzing code using the strongly-connected component and reverse postorder heuris-
tics improves performance substantially in practice, it does not change the worst-case performance
results described above.

Lecture Notes:
Dataflow Analysis Examples

17-355/17-665/17-819O: Program Analysis (Spring 2018)


Jonathan Aldrich and Claire Le Goues
[email protected], [email protected]

1 Constant Propagation
While zero analysis was useful for simply tracking whether a given variable is zero or not, constant
propagation analysis attempts to track the constant values of variables in the program, where
possible. Constant propagation has long been used in compiler optimization passes in order to
turn variable reads and computations into constants. However, it is generally useful for analysis
for program correctness as well: any client analysis that benefits from knowing program values
(e.g. an array bounds analysis) can leverage it.
For constant propagation, we want to track what is the constant value, if any, of each program
variable. Therefore we will use a lattice where the set LCP is Z ∪ {⊤, ⊥}. The partial order is ∀l ∈ LCP : ⊥ ⊑ l ∧ l ⊑ ⊤. In other words, ⊥ is below every lattice element and ⊤ is above every element, but otherwise lattice elements are incomparable.
In the above lattice, as well as our earlier discussion of zero analysis, we used a lattice to
describe individual variable values. We can lift the notion of a lattice to cover all the dataflow
information available at a program point. This is called a tuple lattice, where there is an element of
the tuple for each of the variables in the program. For constant propagation, the elements σ of the lattice are maps from Var to LCP, and the other operators and ⊤/⊥ are lifted as follows:

σ ∈ Var → LCP
σ1 ⊑lift σ2  iff  ∀x ∈ Var : σ1(x) ⊑ σ2(x)
σ1 ⊔lift σ2 = {x ↦ σ1(x) ⊔ σ2(x) | x ∈ Var}
⊤lift = {x ↦ ⊤ | x ∈ Var}
⊥lift = {x ↦ ⊥ | x ∈ Var}
We can likewise define an abstraction function for constant propagation, as well as a lifted
version that accepts an environment E mapping variables to concrete values. We also define the
initial analysis information to conservatively assume that initial variable values are unknown.
Note that in a language that initializes all variables to zero, we could make more precise initial
dataflow assumptions, such as {x ↦ 0 | x ∈ Var}:

αCP(n) = n
αlift(E) = {x ↦ αCP(E(x)) | x ∈ Var}
σ0 = ⊤lift
We can now define flow functions for constant propagation:

fCP⟦x := n⟧(σ) = [x ↦ n]σ
fCP⟦x := y⟧(σ) = [x ↦ σ(y)]σ
fCP⟦x := y op z⟧(σ) = [x ↦ σ(y) oplift σ(z)]σ
    where n oplift m = n op m
    and n oplift ⊤ = ⊤ (and symmetric)
    and n oplift ⊥ = ⊥ (and symmetric)

fCP⟦goto n⟧(σ) = σ
fCP⟦if x = 0 goto n⟧T(σ) = [x ↦ 0]σ
fCP⟦if x = 0 goto n⟧F(σ) = σ
fCP⟦if x < 0 goto n⟧(σ) = σ
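A sketch of the lifted operator and these flow functions in Python follows. TOP and BOTTOM are stand-in values, the tuple instruction encoding is the same illustrative one used earlier, and the branch-sensitive cases are folded into the generic one for brevity; treating ⊥ combined with anything as ⊥ is a choice made here, since the notes only define the cases involving a constant n:

TOP, BOTTOM = 'TOP', 'BOTTOM'

def op_lift(op, v1, v2):
    """n op_lift m = n op m; ⊥ with anything is treated as ⊥, then ⊤ with anything is ⊤."""
    if BOTTOM in (v1, v2):
        return BOTTOM
    if TOP in (v1, v2):
        return TOP
    return {'+': v1 + v2, '-': v1 - v2, '*': v1 * v2}[op]

def flow_cp(instr, sigma):
    """f_CP⟦I⟧(σ) for constant propagation."""
    kind = instr[0]
    if kind == 'const':                           # x := n
        _, x, n = instr
        return {**sigma, x: n}
    if kind == 'copy':                            # x := y
        _, x, y = instr
        return {**sigma, x: sigma[y]}
    if kind == 'arith':                           # x := y op z
        _, x, op, y, z = instr
        return {**sigma, x: op_lift(op, sigma[y], sigma[z])}
    return sigma                                  # goto / if (branch-insensitive here)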
We can now look at an example of constant propagation. The code is shown below, followed by the results of the analysis. In this table we show the worklist as it is updated to show how the algorithm operates:

1: x := 3
2: y := x + 7
3: if z = 0 goto 6
4: z := x + 2
5: goto 7
6: z := y − 5
7: w := z − 2

stmt   worklist   x   y    z          w
0      {1}        ⊤   ⊤    ⊤          ⊤
1      {2}        3   ⊤    ⊤          ⊤
2      {3}        3   10   ⊤          ⊤
3      {4,6}      3   10   0T , ⊤F    ⊤
4      {5,6}      3   10   5          ⊤
5      {6,7}      3   10   5          ⊤
6      {7}        3   10   5          ⊤
7      ∅          3   10   5          3

2 Reaching Definitions
Reaching definitions analysis determines, for each use of a variable, which assignments to that
variable might have set the value seen at that use. Consider the following program:

    1: y := x
    2: z := 1
    3: if y = 0 goto 7
    4: z := z * y
    5: y := y - 1
    6: goto 3
    7: y := 0
In this example, definitions 1 and 5 reach the use of y at 4.

Exercise 1. Which definitions reach the use of z at statement 4?


Reaching definitions can be used as a simpler but less precise version of constant propagation,
zero analysis, etc. where instead of tracking actual constant values we just look up the reach-
ing definition and see if it is a constant. We can also use reaching definitions to identify uses of
undefined variables, e.g. if no definition from the program reaches a use.
For reaching definitions, we define a new kind of lattice: a set lattice. Here, a dataflow lattice
element is the set of definitions that reach the current program point. Assume that DEFS is the
set of all definitions in the program. The set of elements in the lattice is the set of all subsets of
DEFS—that is, the powerset of DEFS, written 𝒫(DEFS).
What should ⊑ be for reaching definitions? The intuition is that our analysis is more precise
the smaller the set of definitions it computes at a given program point. This is because we want to
know, as precisely as possible, where the values at a program point came from. So ⊑ should be the
subset relation ⊆: a subset is more precise than its superset. This naturally implies that ⊔ should
be union, and that ⊤ and ⊥ should be the universal set DEFS and the empty set ∅, respectively.
In summary, we can formally define our lattice and initial dataflow information as follows:

    σ ∈ 𝒫(DEFS)
    σ1 ⊑ σ2 iff σ1 ⊆ σ2
    σ1 ⊔ σ2 = σ1 ∪ σ2
    ⊤ = DEFS
    ⊥ = ∅
    σ0 = ∅
Instead of using the empty set for σ0 , we could use an artificial reaching definition for each
program variable (e.g. x0 as an artificial reaching definition for x) to denote that the variable is
either uninitialized, or was passed in as a parameter. This is convenient if it is useful to track
whether a variable might be uninitialized at a use, or if we want to consider a parameter to be a
definition. We could write this formally as σ0 = {x0 | x ∈ Vars}.
We will now define flow functions for reaching definitions. Notationally, we will write xn to
denote a definition of the variable x at the program instruction numbered n. Since our lattice is
a set, we can reason about changes to it in terms of elements that are added (called GEN) and
elements that are removed (called KILL) for each statement. This GEN/KILL pattern is common
to many dataflow analyses. The flow functions can be formally defined as follows:

    fRD⟦I⟧(σ) = (σ − KILLRD⟦I⟧) ∪ GENRD⟦I⟧

    KILLRD⟦n: x := ...⟧ = {xm | xm ∈ DEFS(x)}
    KILLRD⟦I⟧ = ∅   if I is not an assignment
    GENRD⟦n: x := ...⟧ = {xn}
    GENRD⟦I⟧ = ∅   if I is not an assignment
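As a sketch of how the GEN/KILL pattern turns into code, here is a small Python version of the reaching-definitions flow function. The representation of definitions as (variable, line) pairs and the helper names are ours, not part of the formal definition above.

# A sketch of the reaching-definitions GEN/KILL flow function. A definition is
# represented as a pair (variable, line), e.g. ("z", 4) for z4. 'defs_of' maps
# each variable to all of its definitions in the program, i.e. DEFS(x).

def rd_flow(sigma, instr, defs_of):
    # sigma: set of definitions reaching the point just before instr.
    # instr: (line, defined_var) for an assignment, or (line, None) otherwise.
    line, defined_var = instr
    if defined_var is None:            # not an assignment: GEN = KILL = {}
        return set(sigma)
    kill = defs_of[defined_var]        # every definition of the assigned variable
    gen = {(defined_var, line)}        # the new definition x_n
    return (sigma - kill) | gen

# Example: after "4: z := z * y" with {y1, z2} reaching the instruction.
defs_of = {"y": {("y", 1), ("y", 5), ("y", 7)}, "z": {("z", 2), ("z", 4)}}
print(rd_flow({("y", 1), ("z", 2)}, (4, "z"), defs_of))   # {('y', 1), ('z', 4)}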
We would compute dataflow analysis information for the program shown above as follows:
    stmt  worklist  defs
    0     1         ∅
    1     2         {y1}
    2     3         {y1, z2}
    3     4,7       {y1, z2}
    4     5,7       {y1, z4}
    5     6,7       {y5, z4}
    6     3,7       {y5, z4}
    3     4,7       {y1, y5, z2, z4}
    4     5,7       {y1, y5, z4}
    5     7         {y5, z4}
    7     ∅         {y7, z2, z4}

3 Live Variables
Live variable analysis determines, for each program point, which variables might be used again
before they are redefined. Consider again the following program:

    1: y := x
    2: z := 1
    3: if y = 0 goto 7
    4: z := z * y
    5: y := y - 1
    6: goto 3
    7: y := 0
In this example, after instruction 1, y is live, but x and z are not. Live variables analysis
typically requires knowing what variable holds the main result(s) computed by the program. In
the program above, suppose z is the result of the program. Then at the end of the program, only z
is live.
Live variable analysis was originally developed for optimization purposes: if a variable is not
live after it is defined, we can remove the definition instruction. For example, instruction 7 in the
code above could be optimized away, under our assumption that z is the only program result of
interest.
We must be careful of the side effects of a statement, of course. Assigning a variable that is no
longer live to null could have the beneficial side effect of allowing the garbage collector to collect
memory that is no longer reachable—unless the GC itself takes into consideration which variables
are live. Sometimes warning the user that an assignment has no effect can be useful for software
engineering purposes, even if the assignment cannot safely be optimized away. For example, eBay
found that FindBugs’s analysis detecting assignments to dead variables was useful for identifying
unnecessary database calls.1
For live variable analysis, we will use a set lattice to track the set of live variables at each
program point. The lattice is similar to that for reaching definitions:

    σ ∈ 𝒫(Var)
    σ1 ⊑ σ2 iff σ1 ⊆ σ2
    σ1 ⊔ σ2 = σ1 ∪ σ2
    ⊤ = Var
    ⊥ = ∅
What is the initial dataflow information? This is a tricky question. To determine the variables
that are live at the start of the program, we must reason about how the program will execute...i.e.
we must run the live variables analysis itself! There’s no obvious assumption we can make about
this. On the other hand, it is quite clear which variables are live at the end of the program: just the
variable(s) holding the program result.
Consider how we might use this information to compute other live variables. Suppose the last
statement in the program assigns the program result z, computing it based on some other variable
x. Intuitively, that statement should make x live immediately above that statement, as it is needed
to compute the program result z—but z should now no longer be live. We can use similar logic for
the second-to-last statement, and so on. In fact, we can see that live variable analysis is a backwards
¹ See Ciera Jaspan, I-Chin Chen, and Anoop Sharma, Understanding the value of program analysis tools, OOPSLA practitioner report, 2007.
analysis: we start with dataflow information at the end of the program and use flow functions to
compute dataflow information at earlier statements.
Thus, for our “initial” dataflow information—and note that “initial” means the beginning of
the program analysis, but the end of the program—we have:

    σend = {x | x holds part of the program result}


We can now define flow functions for live variable analysis. We can do this simply using GEN
and KILL sets:

    KILLLV⟦I⟧ = {x | I defines x}
    GENLV⟦I⟧ = {x | I uses x}
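The live-variables flow function is just as easy to sketch in code; because the analysis is backward, it maps the set of variables live after an instruction to the set live before it. This is a sketch only, and the instruction representation (explicit defines/uses sets) is ours.

# A sketch of the live-variables flow function: KILL = defined variables,
# GEN = used variables, applied to the set of variables live *after* the
# instruction to obtain the set live *before* it.

def lv_flow(live_after, defines, uses):
    return (live_after - set(defines)) | set(uses)

# Example: "4: z := z * y" with {z, y} live afterwards stays {z, y};
# "1: y := x" with {y} live afterwards gives {x}.
print(lv_flow({"z", "y"}, defines={"z"}, uses={"z", "y"}))
print(lv_flow({"y"}, defines={"y"}, uses={"x"}))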

We would compute dataflow analysis information for the program shown above as follows.
Note that we iterate over the program backwards, i.e. reversing control flow edges between in-
structions. For each instruction, the corresponding row in our table will hold the information
after we have applied the flow function—that is, the variables that are live immediately before the
statement executes:

    stmt  worklist  live
    end   7         {z}
    7     3         {z}
    3     6,2       {z, y}
    6     5,2       {z, y}
    5     4,2       {z, y}
    4     3,2       {z, y}
    3     2         {z, y}
    2     1         {y}
    1     ∅         {x}

Lecture Notes: Program Analysis Correctness

17-355/17-665/17-819O: Program Analysis (Spring 2018)


Claire Le Goues and Jonathan Aldrich
[email protected], [email protected]


1 Termination
As we think about the correctness of program analysis, let us first think more carefully about the
situations under which program analysis will terminate. In a previous lecture, we analyzed the
performance of Kildall's worklist algorithm. A critical part of that performance analysis was
the observation that running a flow function always either leaves the dataflow analysis informa-
tion unchanged, or makes it more approximate—that is, it moves the current dataflow analysis
results up in the lattice. The dataflow values at each program point describe an ascending chain:

Ascending Chain    A sequence σk is an ascending chain iff n ≤ m implies σn ⊑ σm.
We can define the height of an ascending chain, and of a lattice, in order to bound the number of
new analysis values we can compute at each program point:

Height of an Ascending Chain    An ascending chain σk has finite height h if it contains h + 1
                                distinct elements.

Height of a Lattice             A lattice (L, ⊑) has finite height h if there is an ascending
                                chain in the lattice of height h, and no ascending chain in
                                the lattice has height greater than h.

We can now show that for a lattice of finite height, the worklist algorithm is guaranteed to
terminate. We do so by showing that the dataflow analysis information at each program point
follows an ascending chain. Consider the following version of the worklist algorithm:
forall (Instruction i ∈ program)
    σ[i] = ⊥
σ[beforeStart] = initialDataflowInformation
worklist = { firstInstruction }

while worklist is not empty
    take an instruction i off the worklist
    var thisInput = ⊥
    forall (Instruction j ∈ predecessors(i))
        thisInput = thisInput ⊔ σ[j]
    let newOutput = flow(i, thisInput)
    if (newOutput ≠ σ[i])
        σ[i] = newOutput
        worklist = worklist ∪ successors(i)
Question: what are the differences between this version and the previous version? Convince yourself that it
still does the same thing.
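For reference, the algorithm above can also be rendered as a small, runnable Python function for a forward analysis. This is a sketch under our own parameter naming (program, preds, succs, flow, join, bottom, initial, first); the information before the first instruction is treated as the entry's input.

# A Python sketch of the worklist algorithm for a forward dataflow analysis.
# program maps instruction ids to instructions; preds/succs give the CFG;
# flow is the flow function; join and bottom come from the lattice.

def worklist_algorithm(program, preds, succs, flow, join, bottom, initial, first):
    sigma = {i: bottom for i in program}   # dataflow info after each instruction
    worklist = {first}
    while worklist:
        i = worklist.pop()
        # Join the outputs of all predecessors (the entry uses the initial info).
        this_input = initial if i == first else bottom
        for j in preds[i]:
            this_input = join(this_input, sigma[j])
        new_output = flow(program[i], this_input)
        if new_output != sigma[i]:
            sigma[i] = new_output
            worklist |= set(succs[i])
    return sigma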

We can make the termination argument inductively: At the beginning of the analysis, the anal-
ysis information at every program point (other than the start) is ⊥ (by definition). Thus the first
time we run each flow function for each instruction, the result will be at least as high in the lattice
as what was there before (because nothing is lower in a lattice than ⊥). We will run the flow func-
tion for a given instruction again at a program point only if the dataflow analysis information just
before that instruction changes. Assume that the previous time we ran the flow function, we had
input information σi and output information σo. Now we are running it again because the input
dataflow analysis information has changed to some new σi′—and by the induction hypothesis, we
can assume it is higher in the lattice than before, i.e. σi ⊑ σi′.
What we need to show is that the output information σo′ is at least as high in the lattice as
the old output information σo—that is, we must show that σo ⊑ σo′. This will be true if our flow
functions are monotonic:

Monotonicity    A function f is monotonic iff σ1 ⊑ σ2 implies f(σ1) ⊑ f(σ2)

Now we can state the termination theorem:


Theorem 1 (Dataflow Analysis Termination). If a dataflow lattice (L, ⊑) has finite height, and the
corresponding flow functions are monotonic, the worklist algorithm will terminate.

Proof. Follows the logic given above when motivating monotonicity. Monotonicity implies that
the dataflow value at each program point i can only increase each time σ[i] is assigned. This can
happen a maximum of h times for each program point, where h is the height of the lattice. This
bounds the number of elements added to the worklist to h * e, where e is the number of edges
in the program's control flow graph. Since we remove one element of the worklist for each time
through the loop, we will execute the loop at most h * e times before the worklist is empty. Thus,
the algorithm will terminate.

2 Monotonicity of Zero Analysis


We can formally show that zero analysis is monotone; this is relevant both to the proof of termi-
nation, above, and to correctness, next. We will only give a couple of the more interesting cases,
and leave the rest as an exercise to the reader:
Case fZ⟦x := 0⟧(σ) = [x ↦ Z]σ:
    Assume we have σ1 ⊑ σ2
    Since ⊑ is defined pointwise, we know that [x ↦ Z]σ1 ⊑ [x ↦ Z]σ2

Case fZ⟦x := y⟧(σ) = [x ↦ σ(y)]σ:
    Assume we have σ1 ⊑ σ2
    Since ⊑ is defined pointwise, we know that σ1(y) ⊑simple σ2(y)
    Therefore, using the pointwise definition of ⊑ again, we also obtain [x ↦ σ1(y)]σ1 ⊑ [x ↦ σ2(y)]σ2

(αsimple and ⊑simple are simply the unlifted versions of α and ⊑, i.e. they operate on individual
values rather than maps.)

3 Correctness
What does it mean for an analysis of a WHILE3ADDR program to be correct? Intuitively, we would
like the program analysis results to correctly describe every actual execution of the program. To
establish correctness, we will make use of the precise definitions of WHILE3ADDR we gave in the
form of operational semantics in the first couple of lectures. We start by formalizing a program
execution as a trace:

Program Trace    A trace T of a program P is a potentially infinite sequence
                 {c0, c1, ...} of program configurations, where c0 = E0, 1 is
                 called the initial configuration, and for every i ≥ 0 we have
                 P ⊢ ci ⇝ ci+1.

Given this definition, we can formally define soundness:

Dataflow Analysis Soundness    The result {σi | i ∈ P} of a program analysis running on
                               program P is sound iff, for all traces T of P, for all i such
                               that 0 ≤ i < length(T), α(ci) ⊑ σni

In this definition, just as ci is the program configuration immediately before executing in-
struction ni as the ith program step, σni is the dataflow analysis information immediately before
instruction ni.

Consider the following (incorrect) flow function for zero analysis:

    fZ⟦x := y + z⟧(σ) = [x ↦ Z]σ

Exercise 1. Give an example of a program and a concrete trace that illustrates that this flow func-
tion is unsound.

The key to designing a sound analysis is to make sure that the flow functions map abstract
information before each instruction to abstract information after that instruction in a way that
matches the instruction’s concrete semantics. Another way of saying this is that the manipulation
of the abstract state done by the analysis should reflect the manipulation of the concrete machine
state done by the executing instruction. We can formalize this as a local soundness property:

Local Soundness    A flow function f is locally sound iff P ⊢ ci ⇝ ci+1 and
                   α(ci) ⊑ σi and f⟦P[ni]⟧(σi) = σi+1 implies α(ci+1) ⊑ σi+1
In English: if we take any concrete execution of a program instruction, map the input machine
state to the abstract domain using the abstraction function, find that the abstracted input state is
described by the analysis input information, and apply the flow function, we should get a result
that correctly accounts for what happens if we map the actual concrete output machine state to
the abstract domain.

Exercise 2. Consider again the incorrect zero analysis flow function described above. Specify an
input state ci and use that input state to show that the flow function is not locally sound.

We can now show that the flow functions for zero analysis are locally sound. Although techni-
cally the overall abstraction function α accepts a complete program configuration pE, nq, for zero
analysis we can ignore the n component and so in the proof below we will simply focus on the
environment E. We show the cases for a couple of interesting syntax forms; the rest are either
trivial or analogous:

Case fZ⟦x := 0⟧(σi) = [x ↦ Z]σi:
    Assume ci = E, n and α(E) = σi
    Thus σi+1 = fZ⟦x := 0⟧(σi) = [x ↦ Z]α(E)
    ci+1 = [x ↦ 0]E, n + 1 by rule step-const
    Now α([x ↦ 0]E) = [x ↦ Z]α(E) by the definition of α.
    Therefore α(ci+1) ⊑ σi+1, which finishes the case.

Case fZ⟦x := m⟧(σi) = [x ↦ N]σi where m ≠ 0:
    Assume ci = E, n and α(E) = σi
    Thus σi+1 = fZ⟦x := m⟧(σi) = [x ↦ N]α(E)
    ci+1 = [x ↦ m]E, n + 1 by rule step-const
    Now α([x ↦ m]E) = [x ↦ N]α(E) by the definition of α and the assumption that m ≠ 0.
    Therefore α(ci+1) ⊑ σi+1, which finishes the case.

Case fZ⟦x := y op z⟧(σi) = [x ↦ ?]σi:
    Assume ci = E, n and α(E) = σi
    Thus σi+1 = fZ⟦x := y op z⟧(σi) = [x ↦ ?]α(E)
    ci+1 = [x ↦ k]E, n + 1 for some k by rule step-arith
    Now α([x ↦ k]E) ⊑ [x ↦ ?]α(E) because the map is equal for all keys except x, and for
    x we have αsimple(k) ⊑simple ? for all k, where αsimple and ⊑simple are the unlifted versions of
    α and ⊑, i.e. they operate on individual values rather than maps.
    Therefore α(ci+1) ⊑ σi+1, which finishes the case.

Exercise 3. Prove the case for fZ⟦x := y⟧(σ) = [x ↦ σ(y)]σ.

Now we can show that local soundness can be used to prove the global soundness of a dataflow
analysis. To do so, let us formally define the state of the dataflow analysis at a fixed point:

Fixed Point    A dataflow analysis result {σi | i ∈ P} is a fixed point iff
               σ0 ⊑ σ1, where σ0 is the initial analysis information and σ1 is
               the dataflow result before the first instruction, and for each
               instruction i we have σi = ⊔ {f⟦P[j]⟧(σj) | j ∈ preds(i)}.

And now the main result we will use to prove program analyses correct:
Theorem 2 (Local Soundness implies Global Soundness). If a dataflow analysis's flow function f is
monotonic and locally sound, and for all traces T we have α(c0) ⊑ σ0 where σ0 is the initial analysis
information, then any fixed point {σi | i ∈ P} of the analysis is sound.

Proof. Consider an arbitrary program trace T . The proof is by induction on the program configu-
rations {ci} in the trace.

Case c0:
    α(c0) ⊑ σ0 by assumption.
    σ0 ⊑ σn0 by the definition of a fixed point.
    α(c0) ⊑ σn0 by the transitivity of ⊑.

Case ci+1:
    α(ci) ⊑ σni by the induction hypothesis.
    P ⊢ ci ⇝ ci+1 by the definition of a trace.
    α(ci+1) ⊑ f⟦P[ni]⟧(α(ci)) by local soundness.
    f⟦P[ni]⟧(α(ci)) ⊑ f⟦P[ni]⟧(σni) by monotonicity of f.
    σni+1 = f⟦P[ni]⟧(σni) ⊔ ... by the definition of fixed point.
    f⟦P[ni]⟧(σni) ⊑ σni+1 by the properties of ⊔.
    α(ci+1) ⊑ σni+1 by the transitivity of ⊑.

Since we previously proved that Zero Analysis is locally sound and that its flow functions
are monotonic, we can use this theorem to conclude that the analysis is sound. This means, for
example, that Zero Analysis will never neglect to warn us if we are dividing by a variable that
could be zero.
This discussion leads naturally into a fuller treatment of abstract interpretation, which we will
turn to in subsequent lectures/readings.

Lecture Notes:
Widening Operators and Collecting Semantics

17-355/17-665/17-819O: Program Analysis (Spring 2018)


Claire Le Goues and Jonathan Aldrich
[email protected], [email protected]

1 A Collecting Semantics for Reaching Definitions


The approach to dataflow analysis correctness outlined in the previous lectures generalizes natu-
rally when we have a lattice that can be directly abstracted from program configurations c from
our execution semantics. Sometimes, however, it would be useful to track other kinds of infor-
mation, that we cannot get directly from a particular state in program execution. For example,
consider reaching definitions, which we discussed as an example analysis last week. Although we
can track which definitions reach a line using the previously-outlined approach, we cannot see
where the variables used in an instruction I were last defined.
To solve this problem, we can augment our semantics with additional information that cap-
tures the required information. For example, for reaching definitions, we want to know, at any
point in a particular execution, which definition reaches the current location for each program vari-
able in scope.
We call a version of the program semantics that has been augmented with additional informa-
tion necessary for some particular analysis a collecting semantics. For reaching definitions, we can
define a collecting semantics with a version of the environment E, which we will call ERD , that
has been extended with an index n indicating the location where each variable was last defined.

    ERD ∈ Var → ℤ × ℕ

We can now extend the semantics to track this information. We show only the rules that differ
from those described in the earlier lectures:

    P[n] = x := m
    ─────────────────────────────────────── step-const
    P ⊢ E, n ⇝ E[x ↦ m, n], n + 1

    P[n] = x := y
    ─────────────────────────────────────── step-copy
    P ⊢ E, n ⇝ E[x ↦ E[y], n], n + 1

    P[n] = x := y op z    E[y] op E[z] = m
    ─────────────────────────────────────── step-arith
    P ⊢ E, n ⇝ E[x ↦ m, n], n + 1

Essentially, each rule that defines a variable records the current location as the latest definition
of that variable. Now we can define an abstraction function for reaching definitions from this
collecting semantics:

    αRD(ERD, n) = {m | ∃x ∈ domain(ERD) such that ERD(x) = (i, m)}
From this point, reasoning about the correctness of reaching definitions proceeds analogously
to the reasoning for zero analysis outlined in the previous lectures.
Formulating a collecting semantics can be tricky for some analyses, but it can be done with a
little thought. For example, consider live variable analysis. The collecting semantics requires us
to know, for each execution of the program, which variables currently in scope will be used before
they are defined in the remainder of the program. We can compute this semantics by assuming a
(possibly infinite) trace for a program run, then specifying the set of live variables at every point
in that trace based on the trace going forward from that point. This semantics, specified in terms
of traces rather than a set of inference rules, can then be used in the definition of an abstraction
function and used to reason about the correctness of live variables analysis.

2 Interval Analysis
Let us consider a program analysis that might be suitable for array bounds checking, namely
interval analysis. As the name suggests, interval analysis tracks the interval of values that each
variable might hold. We can define a lattice, initial dataflow information, and abstraction function
as follows:

    L = ℤ∞ × ℤ∞ where ℤ∞ = ℤ ∪ {−∞, ∞}
    [l1, h1] ⊑ [l2, h2] iff l2 ≤∞ l1 ∧ h1 ≤∞ h2
    [l1, h1] ⊔ [l2, h2] = [min∞(l1, l2), max∞(h1, h2)]
    ⊤ = [−∞, ∞]
    ⊥ = [∞, −∞]
    σ0 = ⊤
    α(x) = [x, x]

We have extended the ≤ operator and the min and max functions to handle sentinels representing
positive and negative infinity in the obvious way. For example −∞ ≤∞ n for all n ∈ ℤ. For
convenience we write the empty interval ⊥ as [∞, −∞].
Note also that this lattice is defined to capture the range of a single variable. As usual, we
can lift it to a map from variables to interval lattice elements. Thus we (again) have dataflow
information σ ∈ Var → L.
We can also define a set of flow functions. Here we provide one for addition; the rest should
be easy for the reader to develop:

    fI⟦x := y + z⟧(σ) = [x ↦ [l, h]]σ where l = σ(y).low +∞ σ(z).low
                                      and h = σ(y).high +∞ σ(z).high
    fI⟦x := y + z⟧(σ) = σ where σ(y) = ⊥ ∨ σ(z) = ⊥

In the above we have extended mathematical + to operate over the sentinels for ∞, −∞, for
example such that ∀x ≠ −∞ : ∞ + x = ∞. We define the second case of the flow function to handle
the case where one argument is ⊥, possibly resulting in the undefined case −∞ + ∞.
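As a small illustration, this flow function can be sketched in Python using floating-point infinities as the sentinels. The names used here (BOTTOM, interval_flow_add) are ours, not part of the notes' formalism.

# A sketch of the interval-addition flow function with infinite sentinels.
NEG_INF, POS_INF = float("-inf"), float("inf")
BOTTOM = (POS_INF, NEG_INF)          # the empty interval [+inf, -inf]

def interval_flow_add(sigma, x, y, z):
    # Flow function for "x := y + z" under interval analysis.
    result = dict(sigma)
    if sigma[y] == BOTTOM or sigma[z] == BOTTOM:
        return result                # second case: avoid the undefined -inf + inf
    (ly, hy), (lz, hz) = sigma[y], sigma[z]
    result[x] = (ly + lz, hy + hz)   # extended + handles the sentinels
    return result

# Example: y in [0, 3] and z in [1, +inf] gives x in [1, +inf].
print(interval_flow_add({"y": (0, 3), "z": (1, POS_INF)}, "x", "y", "z"))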
If we run this analysis on a program, whenever we come to an array dereference, we can check
whether the interval produced by the analysis for the array index variable is within the bounds of
the array. If not, we can issue a warning about a potential array bounds violation.

Just one practical problem remains. Consider: what is the height of the above-defined lattice, and
what consequences does this have for our analysis in practice?

3 The Widening Operator


As in the example of interval analysis, there are times in which it is useful to define a lattice of
infinite height. We would like to nevertheless find a mechanism for ensuring that the analysis will
terminate. One way to do this is to find situations where the lattice may be ascending an infinite
chain at a given program point, and effectively shorten the chain to a finite height. We can do so
with a widening operator. To motivate the widening operator, consider applying interval analysis
to the program below:

    1: x := 0
    2: if x = y goto 5
    3: x := x + 1
    4: goto 2
    5: y := 0
Using the worklist algorithm (strongly connected components first), gives us:

    stmt  worklist  x      y
    0     1         ⊤      ⊤
    1     2         [0,0]  ⊤
    2     3,5       [0,0]  ⊤
    3     4,5       [1,1]  ⊤
    4     2,5       [1,1]  ⊤
    2     3,5       [0,1]  ⊤
    3     4,5       [1,2]  ⊤
    4     2,5       [1,2]  ⊤
    2     3,5       [0,2]  ⊤
    3     4,5       [1,3]  ⊤
    4     2,5       [1,3]  ⊤
    2     3,5       [0,3]  ⊤
    ...

Consider the sequence of interval lattice elements for x immediately after statement 2. Count-
ing the original lattice value as ⊥ (not shown explicitly in the trace above), we can see it is the as-
cending chain ⊥, [0,0], [0,1], [0,2], [0,3], .... Recall that ascending chain means that each element
of the sequence is higher in the lattice than the previous element. In the case of interval analysis,
[0,2] (for example) is higher than [0,1] in the lattice because the latter interval is contained within
the former. Given mathematical integers, this chain is clearly infinite; therefore our analysis is not
guaranteed to terminate (and indeed it will not in practice).
A widening operator’s purpose is to compress such infinite chains to finite length. The widen-
ing operator considers the most recent two elements in a chain. If the second is higher than the
first, the widening operator can choose to jump up in the lattice, potentially skipping elements
in the chain. For example, one way to cut the ascending chain above down to a finite height is to
observe that the upper limit for x is increasing, and therefore assume the maximum possible value
∞ for x. Thus we will have the new chain ⊥, [0,0], [0,∞], [0,∞], ... which has already converged
after the third element in the sequence.

The widening operator gets its name because it is an upper bound operator, and in many
lattices, higher elements represent a wider range of program values.
We can define the example widening operator given above more formally as follows:

    W(⊥, lcurrent) = lcurrent
    W([l1, h1], [l2, h2]) = [minW(l1, l2), maxW(h1, h2)]
        where minW(l1, l2) = l1 if l1 ≤ l2
        and minW(l1, l2) = −∞ otherwise
        where maxW(h1, h2) = h1 if h1 ≥ h2
        and maxW(h1, h2) = ∞ otherwise
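Concretely, this operator can be sketched in Python as follows (the function and constant names are ours):

# A sketch of the interval widening operator W defined above.
NEG_INF, POS_INF = float("-inf"), float("inf")
BOTTOM = (POS_INF, NEG_INF)           # the empty interval

def widen(previous, current):
    # W(l_previous, l_current): keep stable bounds, jump unstable ones to infinity.
    if previous == BOTTOM:
        return current
    (l1, h1), (l2, h2) = previous, current
    low = l1 if l1 <= l2 else NEG_INF     # lower bound still stable? keep it
    high = h1 if h1 >= h2 else POS_INF    # upper bound grew? widen to +infinity
    return (low, high)

# The ascending chain bottom, [0,0], [0,1], [0,2], ... converges after widening:
print(widen(BOTTOM, (0, 0)))          # (0, 0)
print(widen((0, 0), (0, 1)))          # (0, inf)
print(widen((0, POS_INF), (0, 2)))    # (0, inf)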
Applying this widening operator each time just before analyzing instruction 2 produces:

    stmt  worklist  x      y
    0     1         ⊤      ⊤
    1     2         [0,0]  ⊤
    2     3,5       [0,0]  ⊤
    3     4,5       [1,1]  ⊤
    4     2,5       [1,1]  ⊤
    2     3,5       [0,∞]  ⊤
    3     4,5       [1,∞]  ⊤
    4     2,5       [1,∞]  ⊤
    2     5         [0,∞]  ⊤
    5     ∅         [0,∞]  [0,0]

Before we analyze instruction 2 the first time, we compute W(⊥, [0,0]) = [0,0] using the
first case of the definition of W. Before we analyze instruction 2 the second time, we compute
W([0,0], [0,1]) = [0,∞]. In particular, the lower bound 0 has not changed, but since the upper
bound has increased from h1 = 0 to h2 = 1, the maxW helper function sets the maximum to ∞.
After we go through the loop a second time we observe that iteration has converged at a fixed
point. We therefore analyze statement 5 and we are done.
Let us consider the properties of widening operators more generally. A widening operator
W(lprevious : L, lcurrent : L) : L accepts two lattice elements, the previous lattice value lprevious at a pro-
gram location and the current lattice value lcurrent at the same program location. It returns a new
lattice value that will be used in place of the current lattice value.
We require two properties of widening operators. The first is that the widening operator must
return an upper bound of its operands. Intuitively, this is required for monotonicity: if the oper-
ator is applied to an ascending chain, an ascending chain should result. Formally, we have
∀lprevious, lcurrent : lprevious ⊑ W(lprevious, lcurrent) ∧ lcurrent ⊑ W(lprevious, lcurrent).
The second property is that when the widening operator is applied to an ascending chain
li, the resulting ascending chain liW must be of finite height. Formally we define l0W = l0 and
∀i > 0 : liW = W(li−1W, li). This property ensures that when we apply the widening operator, the
analysis terminates.
Where can we apply the widening operator? Clearly it is safe to apply anywhere, since it must
be an upper bound and therefore can only raise the analysis result in the lattice, thus making
the analysis result more conservative. However, widening inherently causes a loss of precision.
Therefore it is better to apply it only when necessary. One solution is to apply the widening

operator only at the heads of loops, as in the example above. Loop heads (or their equivalent, in
unstructured control flow) can be inferred even from low-level three address code—see a compiler
text such as Appel and Palsberg’s Modern Compiler Implementation in Java.
We can use a somewhat smarter version of this widening operator with the insight that the
bounds of a lattice element are often related to constants in the program. Thus if we have an ascend-
ing chain ⊥, [0,0], [0,1], [0,2], [0,3], ... and the constant 10 is in the program, we might change
the chain to ⊥, [0,0], [0,10], .... If we are lucky, the chain will stop ascending at that point:
⊥, [0,0], [0,10], [0,10], .... If we are not so fortunate, the chain will continue and eventually sta-
bilize at [0,∞] as before: ⊥, [0,0], [0,10], [0,∞].
If the program has the set of constants K, we can define a widening operator as follows:

    W(⊥, lcurrent) = lcurrent
    W([l1, h1], [l2, h2]) = [minK(l1, l2), maxK(h1, h2)]
        where minK(l1, l2) = l1 if l1 ≤ l2
        and minK(l1, l2) = max({k ∈ K | k ≤ l2}) otherwise
        where maxK(h1, h2) = h1 if h1 ≥ h2
        and maxK(h1, h2) = min({k ∈ K | k ≥ h2}) otherwise
We can now analyze a program with a couple of constants and see how this approach works:

    1: x := 0
    2: y := 1
    3: if x = 10 goto 7
    4: x := x + 1
    5: y := y - 1
    6: goto 3
    7: goto 7
Here the constants in the program are 0, 1 and 10. The analysis results are as follows:

    stmt  worklist  x                    y
    0     1         ⊤                    ⊤
    1     2         [0,0]                ⊤
    2     3         [0,0]                [1,1]
    3     4,7       [0,0]F, ⊥T           [1,1]
    4     5,7       [1,1]                [1,1]
    5     6,7       [1,1]                [0,0]
    6     3,7       [1,1]                [0,0]
    3     4,7       [0,1]F, ⊥T           [0,1]
    4     5,7       [1,2]                [0,1]
    5     6,7       [1,2]                [−1,0]
    6     3,7       [1,2]                [−1,0]
    3     4,7       [0,9]F, [10,10]T     [−∞,1]
    4     5,7       [1,10]               [−∞,1]
    5     6,7       [1,10]               [−∞,0]
    6     3,7       [1,10]               [−∞,0]
    3     7         [0,9]F, [10,10]T     [−∞,1]
    7     ∅         [10,10]              [−∞,1]

Applying the widening operation the first time we get to statement 3 has no effect, as the
previous analysis value was ⊥. The second time we get to statement 3, the range of both x and y
has been extended, but both are still bounded by constants in the program. The third time we get
to statement 3, we apply the widening operator to x, whose abstract value has gone from [0,1] to
[0,2]. The widened abstract value is [0,10], since 10 is the smallest constant in the program that is at
least as large as 2. For y we must widen to [−∞,1]. The analysis stabilizes after one more iteration.
In this example I have assumed a flow function for the if instruction that propagates different
interval information depending on whether the branch is taken or not. In the table, we list the
branch-taken information for x as ⊥ until x reaches the range in which it is feasible to take the
branch. ⊥ can be seen as a natural representation for dataflow values that propagate along a path
that is infeasible.

Lecture Notes:
Interprocedural Analysis

17-355/17-665: Program Analysis (Spring 2017)


Jonathan Aldrich
[email protected]

1 Interprocedural Analysis
Consider an extension of WHILE3ADDR that includes functions. We thus add a new syntactic
category F (for functions), and two new instruction forms (function call and return), as follows:

    F ::= fun f(x) { n : I }
    S ::= ... | return x | y := f(x)

In the notation above, n : I is shorthand for a list, so that the body of a function is a
list of instructions I with line numbers n. We assume in our formalism that all functions take a
single integer argument and return an integer result, but this is easy to generalize if we need to.
We’ve made our programming language much easier to use, but dataflow analysis has become
rather more difficult. Interprocedural analysis concerns analyzing a program with multiple pro-
cedures, ideally taking into account the way that information flows among those procedures. We
use zero analysis as our running example throughout, unless otherwise indicated.

1.1 Default Assumptions


Our first approach assumes a default lattice value for all arguments to a function, La, and a de-
fault value for procedure results, Lr. In some respects, La is equivalent to the initial dataflow
information we set at the entry to the program when we were only looking intraprocedurally;
now we assume it on entry to every procedure. We check the assumptions hold when analyzing
a call or return instruction (trivial if La = Lr = ⊤). We then use the assumption when analyz-
ing the result of a call instruction or starting the analysis of a method. For example, we have
σ0 = {x ↦ La | x ∈ Var}.
Here is a sample flow function for call and return instructions:

    f⟦x := g(y)⟧(σ) = [x ↦ Lr]σ   where σ(y) ⊑ La
    f⟦return x⟧(σ) = σ            where σ(x) ⊑ Lr

We can apply zero analysis to the following function, using La = Lr = ⊤:

    1: fun divByX(x) : int
    2:   y := 10 / x
    3:   return y
    4: fun main() : void
    5:   z := 5
    6:   w := divByX(z)

The results are sound, but imprecise. We can avoid the false positive by using a more optimistic
assumption La = Lr = NZ. But then we get a problem with the following program:

    1: fun double(x : int) : int
    2:   y := 2 * x
    3:   return y
    4: fun main() : void
    5:   z := 0
    6:   w := double(z)

Now what?

1.2 Annotations
An alternative approach uses annotations. This allows us to choose different argument and result
assumptions for different procedures. Flow functions might look like:

    f⟦x := g(y)⟧(σ) = [x ↦ annot⟦g⟧.r]σ   where σ(y) ⊑ annot⟦g⟧.a
    f⟦return x⟧(σ) = σ                    where σ(x) ⊑ annot⟦g⟧.r
Now we can verify that both of the above programs are safe. But some programs remain
difficult:

    1: fun double(x : int @⊤) : int @⊤
    2:   y := 2 * x
    3:   return y
    4: fun main()
    5:   z := 5
    6:   w := double(z)
    7:   z := 10 / w

We will see other example analysis approaches that use annotations later in the semester, though
historically, programmer buy-in remains a challenge in practice.

1.3 Local vs. global variables


The above analyses assume we have only local variables. If we have global variables, we must
make conservative assumptions about them too. Assume globals should always be described by
some lattice value Lg at procedure boundaries. We can extend the flow functions as follows:

    f⟦x := g(y)⟧(σ) = [x ↦ Lr][z ↦ Lg | z ∈ Globals]σ
        where σ(y) ⊑ La ∧ ∀z ∈ Globals : σ(z) ⊑ Lg
    f⟦return x⟧(σ) = σ
        where σ(x) ⊑ Lr ∧ ∀z ∈ Globals : σ(z) ⊑ Lg
Annotations can be extended in a natural way to handle global variables.

1.4 Interprocedural Control Flow Graph


An approach that avoids the burden of annotations, and can capture what a procedure actually
does as used in a particular program, is to build a control flow graph for the entire program, rather
than just a single procedure. To make this work, we handle call and return instructions specially
as follows:

• We add additional edges to the control flow graph. For every call to function g, we add an
edge from the call site to the first instruction of g, and from every return statement of g to
the instruction following that call.

• When analyzing the first statement of a procedure, we generally gather analysis information
from each predecessor as usual. However, we take out all dataflow information related to
local variables in the callers. Furthermore, we add dataflow information for parameters in
the callee, initializing their dataflow values according to the actual arguments passed in at
each call site.

• When analyzing an instruction immediately after a call, we get dataflow information about
local variables from the previous statement. Information about global variables is taken from
the return sites of the function that was called. Information about the variable that the result
of the function call was assigned to comes from the dataflow information about the returned
value.

Now the example described above can be successfully analyzed. However, other programs
still cause problems:

    1: fun double(x : int @⊤) : int @⊤
    2:   y := 2 * x
    3:   return y
    4: fun main()
    5:   z := 5
    6:   w := double(z)
    7:   z := 10 / w
    8:   z := 0
    9:   w := double(z)

What’s the issue here?

1.5 Context Sensitive Analysis
Context-sensitive analysis analyzes a function either multiple times, or parametrically, so that the
analysis results returned to different call sites reflect the different analysis results passed in at
those call sites.
We could get context sensitivity just by duplicating all callees. But this works only for non-
recursive programs.
A simple solution is to build a summary of each function, mapping dataflow input information
to dataflow output information. We will analyze each function once for each context, where a
context is an abstraction for a set of calls to that function. At a minimum, each context must track
the input dataflow information to the function.
Let’s look at how this approach allows the program given above to be proven safe by zero
analysis.
[Example will be given in class]
Things become more challenging in the presence of recursive functions, or more generally mu-
tual recursion. Let us consider context-sensitive interprocedural constant propagation analysis of
a factorial function called by main. We are not focused on the intraprocedural part of the analysis,
so we will just show the function in the form of Java or C source code:
int fact(int x) {
    if (x == 1)
        return 1;
    else
        return x * fact(x - 1);
}
void main() {
    int y = fact(2);
    int z = fact(3);
    int w = fact(getInputFromUser());
}
We can analyze the first two calls to fact within main in a straightforward way, and in fact if
we cache the results of analyzing fact(2) we can reuse this when analyzing the recursive call inside
fact(3).
For the third call to fact, the argument is determined at runtime and so constant propagation
uses ⊤ for the calling context. In this case the recursive call to fact() also has ⊤ as the calling
context. But we cannot look up the result in the cache yet as analysis of fact() with ⊤ has not
completed. A naïve approach would attempt to analyze fact() with ⊤ again, and would therefore
not terminate.
We can solve the problem by applying the same idea as in intraprocedural analysis. The recur-
sive call is a kind of a loop. We can make the initial assumption that the result of the recursive call
is ⊥, which is conceptually equivalent to information coming from the back edge of a loop. When
we discover the result is a higher point in the lattice than ⊥, we reanalyze the calling context (and
recursively, all calling contexts that depend on it). The algorithm to do so can be expressed as
follows:

type Context
    val fn : Function
    val input : L

type Summary
    val input : L
    val output : L

val worklist : Set[Context]
val analyzing : Stack[Context]
val results : Map[Context, Summary]
val callers : Map[Context, Set[Context]]

function ANALYZEPROGRAM
    worklist = { Context(main, ⊤) }
    while NOTEMPTY(worklist) do
        ctx = REMOVE(worklist)
        ANALYZE(ctx)
    end while
end function

function ANALYZE(ctx, σi)
    σo = results[ctx].output
    PUSH(analyzing, ctx)
    σo′ = INTRAPROCEDURAL(ctx)
    POP(analyzing)
    if σo ≠ σo′ then
        results[ctx] = Summary(σi, σo′)
        for c ∈ callers[ctx] do
            ADD(worklist, c)
        end for
    end if
    return σo′
end function

function FLOW(⟦n: x := f(y)⟧, ctx, σi)
    calleeCtx = GETCTX(f, ctx, n, σi)
    σo = RESULTSFOR(calleeCtx, σi)
    ADD(callers[calleeCtx], ctx)
    return σo
end function

function RESULTSFOR(ctx, σi)
    σ = results[ctx].output
    if σ ≠ ⊥ && σi ⊑ results[ctx].input then
        return σ                                  ▷ existing results are good
    end if
    results[ctx].input = results[ctx].input ⊔ σi  ▷ keep track of possibly more general input
    if ctx ∈ analyzing then
        return ⊥
    else
        return ANALYZE(ctx)
    end if
end function

function GETCTX(f, callingCtx, n, σi)
    return Context(f, σi)
end function

The following example shows that the algorithm generalizes naturally to the case of mutually
recursive functions:
bar() { if (*) return 2 else return foo() }
foo() { if (*) return 1 else return bar() }

main() { foo(); }

1.6 Precision
A notable part of the algorithm above is that if we are currently analyzing a context and are asked
to analyze it again, we return ⊥ as the result of the analysis. This has similar benefits to using ⊥ for
initial dataflow values on the back edges of loops: starting with the most optimistic assumptions
about code we haven't finished analyzing allows us to reach the best possible fixed point. The
following example program illustrates a function where the result of analysis will be better if we
assume ⊥ for recursive calls to the same context, vs. for example if we assumed ⊤:
function iterativeIdentity(x : int, y : int)
    if x <= 0
        return y
    else
        iterativeIdentity(x - 1, y)

function main(z)
    w = iterativeIdentity(z, 5)

1.7 Termination
Under what conditions will context-sensitive interprocedural analysis terminate?
Consider the algorithm above. Analyze is called only when (1) a context has not been analyzed
yet, or when (2) it has just been taken off the worklist. So it is called once per reachable context,
plus once for every time a reachable context is added to the worklist.

We can bound the total number of worklist additions by (C) the number of reachable contexts,
times (H) the height of the lattice (we don’t add to the worklist unless results for some context
changed, i.e. went up in the lattice relative to an initial assumption of K or relative to the last
analysis result), times (N) the number of callers of that reachable context. C*N is just the number
of edges (E) in the inter-context call graph, so we can see that we will do intraprocedural analysis
O(E*H) times.
Thus the algorithm will terminate as long as the lattice is of finite height and there are a finite
number of reachable contexts. Note, however, that for some lattices, notably including constant
propagation, there are an unbounded number of lattice elements and thus an unbounded number
of contexts. If more than a finite number of contexts are reachable, the algorithm will not terminate.
So for lattices with an unbounded number of elements, we need to adjust the context-sensitivity
approach above to limit the number of contexts that are analyzed.

1.8 Approaches to Limiting Context-Sensitivity


No context-sensitivity. One approach to limiting the number of contexts is to allow only one for
each function. This is equivalent to the interprocedural control flow graph approach described
above. We can recast this approach as a variant of the generic interprocedural analysis algorithm
by replacing the Context type to track only the function being called, and then having the GETCTX
method always return the same context:
type Context
    val fn : Function

function GETCTX(f, callingCtx, n, σi)
    return Context(f)
end function
Note that in this approach the same calling context might be used for several different input
dataflow information σi, one for each call to GETCTX. This is handled correctly by RESULTSFOR,
which updates the input information in the Summary for that context so that it generalizes all the
input to the function seen so far.
Limited contexts. Another approach is to create contexts as in the original algorithm, but once a
certain number of contexts have been created for a given function, merge all subsequent calls into a
single context. Of course this means the algorithm cannot be sensitive to additional contexts once
the bound is reached, but if most functions have fewer contexts that are actually used, this can
be a good strategy for analyzing most of the program in a context-sensitive way while avoiding
performance problems for the minority of functions that are called from many different contexts.
Can you implement a GETCTX function that represents this strategy?
Call strings. Another context sensitivity strategy is to differentiate contexts by a call string: the
call site, its call site, and so forth. In the limit, when considering call strings of arbitrary length,
this provides full context sensitivity. Dataflow analysis results for contexts based on arbitrary-
length call strings are as precise as the results for contexts based on separate analysis for each
different input dataflow information. The latter strategy can be more efficient, however, because
it reuses analysis results when a function is called twice with different call strings but the same
input dataflow information.
In practice, both strategies (arbitrary-length call strings vs. input dataflow information) can
result in reanalyzing each function so many times that performance becomes unacceptable. Thus
multiple contexts must be combined somehow to reduce the number of times each function is

analyzed. The call-string approach provides an easy, but naive, way to do this: call strings can be
cut off at a certain length. For example, if we have call strings “a b c” and “d e b c” (where c is the
most recent call site) with a cutoff of 2, the input dataflow information for these two call strings
will be merged and the analysis will be run only once, for the context identified by the common
length-two suffix of the strings, “b c”. We can illustrate this by redoing the analysis of the factorial
example. The algorithm is the same as above; however, we use a different implementation of
GETCTX that computes the call string suffix:
type Context
    val fn : Function
    val string : List[ℤ]

function GETCTX(f, callingCtx, n, σi)
    newStr = SUFFIX(callingCtx.string ++ n, CALL_STRING_CUTOFF)
    return Context(f, newStr)
end function
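As a small sketch of what this context abstraction computes, here is a Python version; the cutoff constant and helper names are ours, and the driver values are purely illustrative.

# A sketch of the call-string-suffix context abstraction computed by GETCTX above.
# A context is a function name paired with the last CALL_STRING_CUTOFF call sites.
CALL_STRING_CUTOFF = 2

def get_ctx(f, calling_ctx, call_site):
    _, call_string = calling_ctx                 # calling_ctx = (function, call string)
    new_string = (call_string + (call_site,))[-CALL_STRING_CUTOFF:]
    return (f, new_string)

# Calls reached via "a b c" and "e b c" share the context ('f', ('b', 'c')),
# so their input dataflow information is merged and f is analyzed once for it.
print(get_ctx("f", ("main", ("a", "b")), "c"))
print(get_ctx("f", ("main", ("e", "b")), "c"))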
Although this strategy reduces the overall number of analyses, it does so in a relatively blind
way. If a function is called many times but we only want to analyze it a few times, we want to
group the calls into analysis contexts so that their input information is similar. Call string context
is a heuristic way of doing this that sometimes works well. But it can be wasteful: if two different
call strings of a given length happen to have exactly the same input analysis information, we will
do an unnecessary extra analysis, whereas it would have been better to spend that extra analysis
to differentiate calls with longer call strings that have different analysis information.
Given a limited analysis budget, it is usually best to use heuristics that are directly based on
input information. Unfortunately these heuristics are harder to design, but they have the potential
to do much better than a call-string based approach. We will look at some examples from the
literature to illustrate this later in the course.

Acknowledgements
I thank Claire Le Goues for greatly appreciated extensions and refinements to these notes.

Lecture Notes: Pointer Analysis

17-355/17-665: Program Analysis (Spring 2017)


Jonathan Aldrich
[email protected]

1 Motivation for Pointer Analysis


In the spirit of extending our understanding of analysis to more realistic languages, consider pro-
grams with pointers, or variables whose value refers to another value stored elsewhere in memory
by storing the address of that stored value. Pointers are very common in imperative and object-
oriented programs, and ignoring them can dramatically impact the precision of other analyses
that we have discussed. Consider constant-propagation analysis of the following program:

    1: z := 1
    2: p := &z
    3: *p := 2
    4: print z
To analyze this program correctly we must be aware that at instruction 3 p points to z. If this
information is available we can use it in a flow function as follows:

    fCP⟦*p := y⟧(σ) = [z ↦ σ(y)]σ   where must-point-to(p, z)

When we know exactly what a variable x points to, we have must-point-to information, and
we can perform a strong update of the target variable z, because we know with confidence that
assigning to p assigns to z. A technicality in the rule is quantifying over all z such that p must
point to z. How is this possible? It is not possible in C or Java; however, in a language with pass-
by-reference, for example C++, it is possible that two names for the same location are in scope.
Of course, it is also possible to be uncertain to which of several distinct locations p points:

    1: z := 1
    2: if (cond) p := &y else p := &z
    3: *p := 2
    4: print z
Now constant propagation analysis must conservatively assume that z could hold either 1 or
2. We can represent this with a flow function that uses may-point-to information:

    fCP⟦*p := y⟧(σ) = [z ↦ σ(z) ⊔ σ(y)]σ   where may-point-to(p, z)

2 Andersen’s Points-To Analysis
Two common kinds of pointer analysis are alias analysis and points-to analysis. Alias analysis
computes sets S holding pairs of variables pp, q q, where p and q may (or must) point to the same
location. Points-to analysis, as described above, computes a relation points-topp, xq, where p may
(or must) point to the location of the variable x. We will focus primarily on points-to analysis,
beginning with a simple but useful approach originally proposed by Andersen (PhD thesis: “Pro-
gram Analysis and Specialization for the C Programming Language”).
Our initial setting will be C programs. We are interested in analyzing instructions that are
relevant to pointers in the program. Ignoring for the moment memory allocation and arrays, we
can decompose all pointer operations into four types: taking the address of a variable, copying a
pointer from one variable to another, assigning through a pointer, and dereferencing a pointer:

    I ::= ...
        | p := &x
        | p := q
        | *p := q
        | p := *q
Andersen’s points-to analysis is a context-insensitive interprocedural analysis. It is also a flow-
insensitive analysis, that is an analysis that does not consider program statement order. Context-
and flow-insensitivity are used to improve the performance of the analysis, as precise pointer
analysis can be notoriously expensive in practice.
We will formulate Andersen’s analysis by generating set constraints which can later be pro-
cessed by a set constraint solver using a number of technologies. Constraint generation for each
statement works as given in the following set of rules. Because the analysis is flow-insensitive,
we do not care what order the instructions in the program come in; we simply generate a set of
constraints and solve them.

    ⟦p := &x⟧ ↪ lx ∈ p        (address-of)

    ⟦p := q⟧ ↪ p ⊇ q          (copy)

    ⟦*p := q⟧ ↪ *p ⊇ q        (assign)

    ⟦p := *q⟧ ↪ p ⊇ *q        (dereference)

The constraints generated are all set constraints. The first rule states that a constant location lx,
representing the address of x, is in the set of locations pointed to by p. The second rule states that
the set of locations pointed to by p must be a superset of those pointed to by q. The last two rules
state the same, but take into account that one or the other pointer is dereferenced.
A number of specialized set constraint solvers exist and constraints in the form above can be
translated into the input for these. The dereference operation (the * in *p ⊇ q) is not standard
in set constraints, but it can be encoded—see Fähndrich's Ph.D. thesis for an example of how
to encode Andersen's points-to analysis for the BANE constraint solving engine. We will treat
constraint-solving abstractly using the following constraint propagation rules:

    p ⊇ q    lx ∈ q
    ──────────────────── copy
    lx ∈ p

    *p ⊇ q    lr ∈ p    lx ∈ q
    ───────────────────────────── assign
    lx ∈ r

    p ⊇ *q    lr ∈ q    lx ∈ r
    ───────────────────────────── dereference
    lx ∈ p
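To see how the propagation rules drive a solver, here is a small Python sketch that simply iterates them to a fixed point. The constraint encoding and all names are ours, and a production solver would use a worklist rather than re-scanning every constraint.

# A sketch of Andersen-style constraint solving by iterating the propagation
# rules above until no points-to set changes.

def andersen(address_of, copy, assign, deref):
    # address_of: (p, x) for p := &x      copy:  (p, q) for p := q
    # assign:     (p, q) for *p := q      deref: (p, q) for p := *q
    pts = {}
    def s(v):
        return pts.setdefault(v, set())
    for p, x in address_of:
        s(p).add(x)                           # address-of rule: l_x in p
    changed = True
    while changed:
        changed = False
        for p, q in copy:                     # copy rule
            if not s(q) <= s(p):
                s(p).update(s(q)); changed = True
        for p, q in assign:                   # assign rule: for each l_r in p
            for r in list(s(p)):
                if not s(q) <= s(r):
                    s(r).update(s(q)); changed = True
        for p, q in deref:                    # dereference rule: for each l_r in q
            for r in list(s(q)):
                if not s(r) <= s(p):
                    s(p).update(s(r)); changed = True
    return pts

# Tiny illustration: p := &x; r := &q; q := p; *r := p   ==>   q also points to x.
print(andersen(address_of=[("p", "x"), ("r", "q")],
               copy=[("q", "p")], assign=[("r", "p")], deref=[]))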

We can now apply Andersen’s points-to analysis to the program above. Note that in this
example if Andersen’s algorithm says that the set p points to only one location lz , we have must-
point-to information, whereas if the set p contains more than one location, we have only may-
point-to information.
We can also apply Andersen’s analysis to programs with dynamic memory allocation, such as:

    1: q := malloc1()
    2: p := malloc2()
    3: p := q
    4: r := &p
    5: s := malloc3()
    6: *r := s
    7: t := &s
    8: u := *t
In this example, the analysis is run the same way, but we treat the memory cell allocated at
each malloc or new statement as an abstract location labeled by the location n of the allocation
point. We can use the rules:

    ⟦p := mallocn()⟧ ↪ ln ∈ p        (malloc)

We must be careful because a malloc statement can be executed more than once, and each time
it executes, a new memory cell is allocated. Unless we have some other means of proving that
the malloc executes only once, we must assume that if some variable p only points to one abstract
malloc’d location ln , that is still may-alias information (i.e. p points to only one of the many actual
cells allocated at the given program location) and not must-alias information.
Analyzing the efficiency of Andersen's algorithm, we can see that all constraints can be gener-
ated in a linear O(n) pass over the program. The solution size is O(n²) because each of the O(n)
variables defined in the program could potentially point to O(n) other variables.
We can derive the execution time from a theorem by David McAllester published in SAS'99.
There are O(n) flow constraints generated of the form p ⊇ q, *p ⊇ q, or p ⊇ *q. How many
times could a constraint propagation rule fire for each flow constraint? For a p ⊇ q constraint,
the rule may fire at most O(n) times, because there are at most O(n) premises of the proper form
lx ∈ p. However, a constraint of the form *p ⊇ q could cause O(n²) rule firings, because there
are O(n) premises each of the form lx ∈ p and lr ∈ q. With O(n) constraints of the form *p ⊇ q
and O(n²) firings for each, we have O(n³) constraint firings overall. A similar analysis applies for
p ⊇ *q constraints. McAllester's theorem states that the analysis with O(n³) rule firings can be
implemented in O(n³) time. Thus we have derived that Andersen's algorithm is cubic in the size
of the program, in the worst case.

2.1 Field-Sensitive Analysis


What happens when we have a pointer to a struct in C, or an object in an object-oriented language?
In this case, we would like the pointer analysis to tell us what each field in the struct or object
points to. A simple solution is to be field-insensitive, treating all fields in a struct as equivalent.
Thus if p points to a struct with two fields f and g, and we assign:

    1: p.f := &x
    2: p.g := &y
A field-insensitive analysis would tell us (imprecisely) that p.f could point to y. In order
to be more precise, we can track the contents each field of each abstract location separately. In
the discussion below, we assume a setting in which we cannot take the address of a field; this
assumption is true for Java but not for C. We can define a new kind of constraints for fields:

    ⟦p := q.f⟧ ↪ p ⊇ q.f        (field-read)

    ⟦p.f := q⟧ ↪ p.f ⊇ q        (field-assign)

Now assume that objects (e.g. in Java) are represented by abstract locations l. We can process
field constraints with the following rules:

    p ⊇ q.f    lq ∈ q    lf ∈ lq.f
    ───────────────────────────────── field-read
    lf ∈ p

    p.f ⊇ q    lp ∈ p    lq ∈ q
    ───────────────────────────── field-assign
    lq ∈ lp.f

If we run this analysis on the code above, we find that it can distinguish that p.f points to x
and p.g points to y.

3 Steensgaard’s Points-To Analysis


For large programs, a cubic algorithm is too inefficient. Steensgaard proposed a pointer analysis
algorithm that operates in near-linear time, supporting essentially unlimited scalability in practice.
The first challenge in designing a near-linear time points-to analysis is to represent the results
in linear space. This is nontrivial because over the course of program execution, any given pointer
p could potentially point to the location of any other variable or pointer q. Representing all of
these pointers explicitly will inherently take O(n²) space.
The solution Steensgaard found is based on using constant space for each variable in the pro-
gram. His analysis associates each variable p with an abstract location named after the variable.
Then, it tracks a single points-to relation between that abstract location p and another one q, to
which it may point. Now, it is possible that in some real program p may point to both q and some

other variable r. In this situation, Steensgaard’s algorithm unifies the abstract locations for q and
r, creating a single abstract location representing both of them. Now we can track the fact that p
may point to either variable using a single points-to relationship.
For example, consider the program below:

    1: p := &x
    2: r := &p
    3: q := &y
    4: s := &q
    5: r := s
Andersen’s points-to analysis would produce the following graph:
    [Points-to graph: p → x, q → y, r → p, r → q, s → q]
But in Steensgaard’s setting, when we discover that r could point both to q and to p, we must
merge q and p into a single node:
    [Points-to graph: pq → x, pq → y, r → pq, s → pq]
Notice that we have lost precision: by merging the nodes for p and q our graph now implies
that s could point to p, which is not the case in the actual program. But we are not done. Now
pq has two outgoing arrows, so we must merge nodes x and y. The final graph produced by
Steensgaard’s algorithm is therefore:
    [Points-to graph: pq → xy, r → pq, s → pq]
To define Steensgaard's analysis more precisely, we will study a simplified version that
ignores function pointers. It can be specified as follows:

    ⟦p := q⟧ ↪ join(*p, *q)        (copy)

    ⟦p := &x⟧ ↪ join(*p, x)        (address-of)

    ⟦p := *q⟧ ↪ join(*p, **q)      (dereference)

    ⟦*p := q⟧ ↪ join(**p, *q)      (assign)

With each abstract location p, we associate the abstract location that p points to, denoted *p.
Abstract locations are implemented as a union-find¹ data structure so that we can merge two
abstract locations efficiently. In the rules above, we implicitly invoke find on an abstract location
before calling join on it, or before looking up the location it points to.
The join operation essentially implements a union operation on the abstract locations. How-
ever, since we are tracking what each abstract location points to, we must update this information
also. The algorithm to do so is as follows:
join(e1, e2)
    if (e1 == e2)
        return
    e1next = *e1
    e2next = *e2
    unify(e1, e2)
    join(e1next, e2next)
Once again, we implicitly invoke find on an abstract location before comparing it for equality,
looking up the abstract location it points to, or calling join recursively.
As an optimization, Steensgaard does not perform the join if the right hand side is not a pointer.
For example, if we have an assignment ⟦p := q⟧ and q has not been assigned any pointer value so
far in the analysis, we ignore the assignment. If later we find that q may hold a pointer, we must
revisit the assignment to get a sound result.
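
To make the union-find machinery concrete, here is a compact Python sketch. It is our own encoding (not Steensgaard's published pseudocode): it handles only the copy and address-of statement forms, creates a fresh points-to target on demand, and omits the skip-if-not-a-pointer optimization described above.

    # Each abstract location has a union-find parent and a single points-to target.
    parent, target = {}, {}

    def loc(x):
        if x not in parent:
            parent[x] = x; target[x] = None
        return x

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path compression
            x = parent[x]
        return x

    def join(a, b):
        a, b = find(loc(a)), find(loc(b))
        if a == b:
            return
        a_next, b_next = target[a], target[b]
        parent[b] = a                          # unify the two abstract locations
        if target[a] is None:
            target[a] = a_next or b_next
        if a_next is not None and b_next is not None:
            join(a_next, b_next)               # recursively unify what they point to

    def points_to_of(p):                       # *p, creating a fresh target if needed
        p = find(loc(p))
        if target[p] is None:
            target[p] = loc("*" + p)
        return target[p]

    def assign_addr(p, x): join(points_to_of(p), x)                    # p := &x
    def assign_copy(p, q): join(points_to_of(p), points_to_of(q))      # p := q

    # Example from the text: p := &x; r := &p; q := &y; s := &q; r := s
    for kind, lhs, rhs in [("addr", "p", "x"), ("addr", "r", "p"), ("addr", "q", "y"),
                           ("addr", "s", "q"), ("copy", "r", "s")]:
        (assign_addr if kind == "addr" else assign_copy)(lhs, rhs)

    # Afterwards find("p") == find("q") and find("x") == find("y"),
    # matching the merged nodes pq and xy in the final graph above.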
Steensgaard illustrated his algorithm using the following program:

1: a := &x
2: b := &y
3: if p then
4:     y := &z
5: else
6:     y := &x
7: c := &y
His analysis produces the following graph for this program:
¹ See any algorithms textbook.

    xz
   ↗  ↖
  y     a
 ↗ ↖
c     b
Rayside illustrates a situation in which Andersen must do more work than Steensgaard:

1: q := &x
2: q := &y
3: p := q
4: q := &z
After processing the first three statements, Steensgaard’s algorithm will have unified variables
x and y, with p and q both pointing to the unified node. In contrast, Andersen’s algorithm will
have both p and q pointing to both x and y. When the fourth statement is processed, Steensgaard’s
algorithm does only a constant amount of work, merging z in with the already-merged xy node.
On the other hand, Andersen’s algorithm must not just create a points-to relation from q to z, but
must also propagate that relationship to p. It is this additional propagation step that results in the
significant performance difference between these algorithms.
Analyzing Steensgaard’s pointer analysis for efficiency, we observe that each of n statements
in the program is processed once. The processing is linear, except for find operations on the union-
find data structure (which may take amortized time O(α(n)) each) and the join operations. We
note that in the join algorithm, the short-circuit test will fail at most O(n) times—at most once for
each variable in the program. Each time the short-circuit fails, two abstract locations are unified,
at cost O(α(n)). The unification assures the short-circuit will not fail again for one of these two
variables. Because we have at most O(n) operations and the amortized cost of each operation
is at most O(α(n)), the overall running time of the algorithm is near linear: O(n · α(n)). Space
consumption is linear, as no space is used beyond that used to represent abstract locations for all
the variables in the program text.
Based on this asymptotic efficiency, Steensgaard’s algorithm was run on a 1 million line pro-
gram (Microsoft Word) in 1996; this was an order of magnitude greater scalability than other
pointer analyses known at the time.
Steensgaard’s pointer analysis is field-insensitive; making it field-sensitive would mean that it
is no longer linear.

Acknowledgements
I thank Claire Le Goues for greatly appreciated extensions and refinements to these notes.

Lecture Notes: Object-Oriented Call Graph Construction

17-355/17-665: Program Analysis (Spring 2017)


Jonathan Aldrich
[email protected]

1 Dynamic dispatch
Analyzing object-oriented programs is challenging because it is not obvious which function is
called at a given call site. In order to construct a precise call graph, an analysis must determine
what the type of the receiver object is at each call site. Therefore, object-oriented call graph con-
struction algorithms must simultaneously build a call graph and compute aliasing information
describing to which objects (and thereby implicitly to which types) each variable could point.

1.1 Simple approaches


The simplest approach is class hierarchy analysis, which uses the type of a variable, together with
the class hierarchy, to determine what types of object the variable could point to. Unsurprisingly,
this is very imprecise, but can be computed very efficiently in O(n · t) time, because it visits n call
sites and at each call site traverses a subtree of size t of the class hierarchy.
An improvement to class hierarchy analysis is rapid type analysis, which eliminates from the
hierarchy classes that are never instantiated. The analysis iteratively builds a set of instantiated
types, method names invoked, and concrete methods called. Initially, it assumes that main is the
only concrete method that is called, and that no objects are instantiated. It then analyzes concrete
methods known to be called one by one. When a method name is invoked, it is added to the
list, and all concrete methods with that name defined within (or inherited by) types known to be
instantiated are added to the called list. When an object is instantiated, its type is added to the list
of instantiated types, and all its concrete methods that have a method name that is invoked are
added to the called list. This proceeds iteratively until a fixed point is reached, at which point the
analysis knows all of the object types that may actually be created at run time.
Rapid type analysis can be considerably more precise than class hierarchy analysis in programs
that use libraries that define many types, only a few of which are used by the program. It remains
extremely efficient, because it only needs to traverse the program once (in O(n) time) and then
build the call graph by visiting each of n call sites and considering a subtree of size t of the class
hierarchy, for a total of O(n · t) time.
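
As a concrete illustration, the following Python sketch shows one way the RTA fixed point might be organized. The program representation (a map from concrete methods to the method names they invoke and the classes they instantiate) is hypothetical, and inheritance lookup is elided for brevity.

    def rta(methods, main):
        # methods[(C, f)] = {"calls": set of method names invoked in C.f,
        #                    "allocs": set of class names instantiated in C.f}
        instantiated, invoked, called = set(), set(), {main}
        worklist = [main]
        while worklist:
            m = worklist.pop()
            for name in methods[m]["calls"]:
                if name not in invoked:
                    invoked.add(name)
                    for c in instantiated:          # instantiated classes gain a callee
                        _enqueue(methods, c, name, called, worklist)
            for c in methods[m]["allocs"]:
                if c not in instantiated:
                    instantiated.add(c)
                    for name in invoked:            # new class: all already-invoked names
                        _enqueue(methods, c, name, called, worklist)
        return called

    def _enqueue(methods, c, name, called, worklist):
        if (c, name) in methods and (c, name) not in called:
            called.add((c, name))
            worklist.append((c, name))

    # Tiny usage example (hypothetical program): main instantiates A and calls "foo";
    # B is never instantiated, so B.foo is not considered reachable.
    methods = {
        ("Main", "main"): {"calls": {"foo"}, "allocs": {"A"}},
        ("A", "foo"):     {"calls": set(),   "allocs": set()},
        ("B", "foo"):     {"calls": set(),   "allocs": set()},
    }
    print(rta(methods, ("Main", "main")))   # {("Main", "main"), ("A", "foo")}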

1.2 0-CFA Style Object-Oriented Call Graph Construction


Object-oriented call graphs can also be constructed using a pointer analysis such as Andersen’s
algorithm, either context-insensitive or context-sensitive. The context-sensitive versions are called
k-CFA by analogy with control-flow analysis for functional programs (to be discussed in a forth-
coming lecture). The context-insensitive version is called 0-CFA for the same reason. Essentially,

the analysis proceeds as in Andersen’s algorithm, but the call graph is built up incrementally as
the analysis discovers the types of the objects to which each variable in the program can point.
Even 0-CFA analysis can be considerably more precise than Rapid Type Analysis. For example,
in the program below, RTA would assume that any implementation of foo() could be invoked at
any program location, but 0-CFA can distinguish the two call sites:
class A { A foo(A x) { return x; } }
class B extends A { A foo(A x) { return new D(); } }
class D extends A { A foo(A x) { return new A(); } }
class C extends A { A foo(A x) { return this; } }

// in main()
A x = new A();
while (...)
    x = x.foo(new B());   // may call A.foo, B.foo, or D.foo
A y = new C();
y.foo(x);                 // only calls C.foo

Acknowledgements
I thank Claire Le Goues for greatly appreciated extensions and refinements to these notes.

Lecture Notes: Control Flow Analysis for Functional
Languages

17-355/17-665: Program Analysis (Spring 2017)


Jonathan Aldrich
[email protected]

1 Analysis of Functional Programs


Analyzing functional programs challenges the framework we’ve discussed so far. Understanding
and solving those problems illustrates constraint based analyses and is also closely related to call
graph construction in object-oriented languages, as we discussed in the previous lecture. Consider
an idealized functional language based on the lambda calculus, similar to the core of Scheme or
ML, defined as follows:

e ::= λx.e
    | x
    | e1 e2
    | let x = e1 in e2
    | if e0 then e1 else e2
    | n | e1 + e2 | ...
The grammar includes a definition of an anonymous function λx.e, where x is the function argu-
ment and e is the function body.1 The function can include any of the other types of expressions,
such as variables x or function calls e1 e2 , where e1 is the function to be invoked and e2 is passed
to that function as an argument. (In an imperative language this would more typically be written
e1(e2), but we follow the functional convention here, with parentheses included when helpful syn-
tactically). We evaluate a function call (λx.e)(v) by substituting the argument v for all occurrences
of x in e. For example, (λx.x + 1)(3) evaluates to 3 + 1, which of course evaluates to 4.
A more interesting example is (λf.f 3)(λx.x + 1), which first substitutes the argument for f,
yielding (λx.x + 1) 3. Then we invoke the function, getting 3 + 1, which again evaluates to 4.

1.1 0-CFA
Static analysis can be just as useful in this type of language as in imperative languages, but immediate
complexities arise. For example: what is a program point in a language without obvious predeces-
sors or successors? Computation is intrinsically nested. Second, because functions are first-class
entities that can be passed around as variables, it’s not obvious which function is being applied
where. Although it is not obvious, we still need some way to figure it out, because the value a
function returns (which we may hope to track, such as through constant propagation analysis)
¹ The formulation in PPA also includes a syntactic construct for explicitly recursive functions. The ideas extend naturally, but we’ll follow the simpler syntax for expository purposes.

will inevitably depend on which function is called, as well as its arguments. Control flow analysis2
seeks to statically determine which functions could be associated with which variables. Further,
because functional languages are not based on statements but rather expressions, it is appropriate
to consider both the values of variables and the values expressions evaluate to.
We thus consider each expression to be labeled with a label l ∈ L. Our analysis information σ
maps each variable and label to a lattice value. This first analysis is only concerned with possible
functions associated with each location or variable, and so the abstract domain is as follows:

σ ∈ Var ∪ L → P(λx.e)

The analysis information at any given program point, or for any program variable, is a set of
functions that could be stored in the variable or computed at that program point. Question: what
is the ⊑ relation on this dataflow state?
We define the analysis via inference rules that generate constraints over the possible dataflow
values for each variable or labeled location; those constraints are then solved. We use ↪ to
denote a relation such that ⟦e⟧^l ↪ C can be read as “The analysis of expression e with label l
generates constraints C over dataflow state σ.” For our first CFA, we can define inference rules for
this relation as follows:
⟦n⟧^l ↪ ∅   (const)        ⟦x⟧^l ↪ σ(x) ⊆ σ(l)   (var)
In the rules above, the constant or variable value flows to the program location l. Question:
what might the rules for the if-then-else or arithmetic operator expressions look like? The rule for function
calls is a bit more complex. We define rules for lambda and application as follows:

⟦e⟧^l0 ↪ C
---------------------------------------- (lambda)
⟦λx.e^l0⟧^l ↪ {λx.e} ⊆ σ(l) ∪ C

⟦e1⟧^l1 ↪ C1        ⟦e2⟧^l2 ↪ C2
--------------------------------------------------------------------------- (apply)
⟦e1^l1 e2^l2⟧^l ↪ C1 ∪ C2 ∪ ∀λx.e0^l0 ∈ σ(l1) : σ(l2) ⊆ σ(x) ∧ σ(l0) ⊆ σ(l)

The first rule just states that if a literal function is declared at a program location l, that function
is part of the lattice value σ plq computed by the analysis for that location. Because we want to
analyze the data flow inside the function, we also generate a set of constraints C from the function
body and return those constraints as well.
The rule for application first analyzes the function and the argument to extract two sets of
constraints C1 and C2 . We then generate a conditional constraint, saying that for every literal
function λx.e0 that the analysis (eventually) determines the function may evaluate to, we must
generate additional constraints capture value flow from the formal function argument to the actual
argument variable, and from the function result to the calling expression.
Consider the first example program given above, properly labelled as ((λx.(x^a + 1^b)^c)^d (3)^e)^g.
We apply the rules one by one to analyze it. The first rule to use is apply (because that’s the top-level
program construct). We will work this out together, but the generated constraints could look like:

(σ(x) ⊆ σ(a)) ∪ ({λx.x + 1} ⊆ σ(d)) ∪ (σ(e) ⊆ σ(x)) ∧ (σ(c) ⊆ σ(g))


² This nomenclature is confusing because it is also used to refer to analyses of control flow graphs in imperative languages; we usually abbreviate to CFA when discussing the analysis of functional languages.

There are many possible valid solutions to this constraint set; clearly we want a precise solution
that does not overapproximate. We will elide a formal definition and instead assert that a σ that
maps all variables and locations except d to ∅ and d to {λx.x + 1} satisfies this set of constraints.
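
For concreteness, the following Python sketch (our own encoding, not part of the notes) solves exactly these constraints for the example by iterating until no set grows, and confirms the solution described above.

    # Solving the 0-CFA constraints for ((λx.(x^a + 1^b)^c)^d 3^e)^g.
    sigma = {k: set() for k in ["x", "a", "b", "c", "d", "e", "g"]}
    LAM = "λx.x+1"                       # the single literal function in the program

    def solve():
        changed = True
        while changed:
            changed = False
            def flow(src, dst):          # σ(src) ⊆ σ(dst)
                nonlocal changed
                if not sigma[src] <= sigma[dst]:
                    sigma[dst] |= sigma[src]; changed = True
            if LAM not in sigma["d"]:    # lambda rule at label d: {λx.x+1} ⊆ σ(d)
                sigma["d"].add(LAM); changed = True
            flow("x", "a")               # var rule at label a: σ(x) ⊆ σ(a)
            if LAM in sigma["d"]:        # apply rule at label g, for λx.e0 ∈ σ(d)
                flow("e", "x")           #   σ(e) ⊆ σ(x)
                flow("c", "g")           #   σ(c) ⊆ σ(g)

    solve()
    # σ(d) = {λx.x+1}; every other variable and label stays empty, since this
    # first analysis tracks only function values (3 and x+1 are not functions).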

1.2 0-CFA with dataflow information


The analysis in the previous subsection is interesting if all you’re interested in is which functions
can be called where, but doesn’t solve the general problem of dataflow analysis of functional pro-
grams. Fortunately, extending that approach to a more general analysis space is straightforward:
we simply add the abstract information we’re tracking to the abstract domain defined above. For
constant propagation, for example, we can extend the dataflow state as follows:

σ ∈ Var ∪ Lab → L        L = (⊥ + ℤ + ⊤) + P(λx.e)


Now, the analysis information at any program point, or for any variable, may be an integer n,
or J, or a set of functions that could be stored in the variable or computed at that program point.
This requires that we modify our inference rules slightly, but not as much as you might expect.
Indeed, the rules mostly change for arithmetic operators (which we omitted above) and constants.
We simply need to provide an abstraction over concrete values that captures the dataflow infor-
mation in question. The rule for constants thus becomes:

⟦n⟧^l ↪ β(n) ⊆ σ(l)   (const)


Where β is defined as we discussed in abstract interpretation. We must also provide an abstraction
over arithmetic operators, as before; we omit this rule for brevity.
Consider the second example, above, properly labeled: ((λf.(f^a 3^b)^c)^e (λx.(x^g + 1^h)^i)^j)^k. A
constant propagation analysis could produce the following results:

Var ∪ Lab    L            by rule

e            λf.f 3       lambda
j            λx.x + 1     lambda
f            λx.x + 1     apply
a            λx.x + 1     var
b            3            const
x            3            apply
g            3            var
h            1            const
i            4            add
c            4            apply
k            4            apply

1.3 m-Calling Context Sensitive Control Flow Analysis (m-CFA)


The control flow analysis described above—known as 0-CFA, where CFA stands for Control Flow
Analysis and the 0 indicates context insensitivity—works well for simple programs like the exam-
ple above, but it quickly becomes imprecise in more interesting programs that reuse functions in
several calling contexts. The following code illustrates the problem:

let add  = λx. λy. x + y
let add5 = (add 5)^a5
let add6 = (add 6)^a6
let main = (add5 2)^m
This example illustrates currying, in which a function such as add that takes two arguments x
and y in sequence can be called with only one argument (e.g. 5 in the call labeled a5), resulting in
a function that can later be called with the second argument (in this case, 2 at the call labeled m).
The value 5 for the first argument in this example is stored with the function in the closure add5.
Thus when the second argument is passed to add5, the closure holds the value of x so that the sum
x + y = 5 + 2 = 7 can be computed.
The use of closures complicates program analysis. In this case, we create two closures, add5
and add6, within the program, binding 5 and 6 and the respective values for x. But unfortunately
the program analysis cannot distinguish these two closures, because it only computes one value
for x, and since two different values are passed in, we learn only that x has the value ⊤. This
is illustrated in the following analysis. The trace below has been shortened to focus only on the
variables (the actual analysis, of course, would compute information for each program point too):

Var ∪ Lab    L               notes

add          λx. λy. x + y
x            5               when analyzing first call
add5         λy. x + y
x            ⊤               when analyzing second call
add6         λy. x + y
main         ⊤
We can add precision using a context-sensitive analysis. One could, in principle, use either the
functional or call-string approach, as described earlier. In practice the call-string approach seems
to be used for control-flow analysis in functional programming languages, perhaps because in the
functional approach there could be many, many contexts for each function, and it is easier to place
a bound on the analysis in the call-string approach.
We add context sensitivity by making our analysis information σ track information separately
for different call strings, denoted by ∆. Here a call string is a sequence of labels, each one denoting
a function call site, where the sequence can be of any length between 0 and some bound m (in
practice m will be in the range 0-2 for scalability reasons):

σ ∈ (Var ∪ Lab) × Δ → L        δ ∈ Δ = Lab^(n≤m)        L = (⊥ + ℤ + ⊤) + P((λx.e, δ))


When a lambda expression is analyzed, we now consider as part of the lattice the call string
context δ in which its free variables were captured. We can then define a set of rules that generate
constraints which, when solved, provide an answer to control-flow analysis, as well as (in this
case) constant propagation:
δ ⊢ ⟦n⟧^l ↪ α(n) ⊆ σ(l, δ)   (const)        δ ⊢ ⟦x⟧^l ↪ σ(x, δ) ⊆ σ(l, δ)   (var)

δ ⊢ ⟦λx.e^l0⟧^l ↪ {(λx.e, δ)} ⊆ σ(l, δ)   (lambda)

δ ⊢ ⟦e1⟧^l1 ↪ C1        δ ⊢ ⟦e2⟧^l2 ↪ C2        δ′ = suffix(δ·l, m)
C3 = ⋃_{(λx.e0^l0, δ0) ∈ σ(l1, δ)}  σ(l2, δ) ⊆ σ(x, δ′) ∧ σ(l0, δ′) ⊆ σ(l, δ)
                                    ∧ ∀y ∈ FV(λx.e0) : σ(y, δ0) ⊆ σ(y, δ′)
C4 = ⋃_{(λx.e0^l0, δ0) ∈ σ(l1, δ)}  analyze(δ′ ⊢ ⟦e0⟧^l0)
---------------------------------------------------------------------------- (apply)
δ ⊢ ⟦e1^l1 e2^l2⟧^l ↪ C1 ∪ C2 ∪ C3 ∪ C4

These rules contain a call string context δ in which the analysis of each line of code is done. The
rules const and var are unchanged except for indexing σ by the current context δ. The lambda rule
now captures the context δ along with the lambda expression, so that when the lambda expression
is called the analysis knows in which context to look up the free variables.
The apply rule has gotten more complicated. A new context δ′ is formed by appending the
current call site l to the old call string, then taking the suffix of length m (or less). We now consider
all functions that may be called, as eventually determined by the analysis (our notation is slightly
loose, because the quantifier must be evaluated continuously for more matches as the analysis
goes along). For each, we produce constraints capturing the flow of values from the actual
arguments to the formal parameters, and from the result to the calling expression. We also produce
constraints that bind the free variables in the new context: all free variables in the called function
flow from the point δ0 at which the closure was captured. Finally, in C4 we collect the constraints
that we get from analyzing each of the potentially called functions in the new context δ′.
We can now reanalyze the earlier example, observing the benefit of context sensitivity. In the
table below, ε denotes the empty calling context (e.g. when analyzing the main procedure):

Var / Lab, δ    L                       notes

add, ε          (λx. λy. x + y, ε)
x, a5           5
add5, ε         (λy. x + y, a5)
x, a6           6
add6, ε         (λy. x + y, a6)
main, ε         7

Note three points about this analysis. First, we can distinguish the values of x in the two
calling contexts: x is 5 in the context a5 but it is 6 in the context a6. Second, the closures returned
to the variables add5 and add6 record the scope in which the free variable x was bound when the
closure was captured. This means, third, that when we invoke the closure add5 at program point
m, we will know that x was captured in calling context a5, and so when the analysis analyzes the
addition, it knows that x holds the constant 5 in this context. This enables constant propagation
to compute a precise answer, learning that the variable main holds the value 7.

1.4 Optional: Uniform k-Calling Context Sensitive Control Flow Analysis (k-CFA)
m-CFA was proposed recently by Might, Smaragdakis, and Van Horn as a more scalable version
of the original k-CFA analysis developed by Shivers for Scheme. While m-CFA now seems to be
a better tradeoff between scalability and precision, k-CFA is interesting both for historical reasons

and because it illustrates a more precise approach to tracking the values of variables in a closure.
The following example illustrates a situation in which m-CFA may be too imprecise:

let adde = λx.
    let h = λy. λz. x + y + z
    let r = (h 8)^r
    in r
let t = (adde 2)^t
let f = (adde 4)^f
let e = (t 1)^e
When we analyze it with m-CFA (for m = 1), we get the following results:

Var / Lab, δ    L                       notes

adde, ε         (λx. ..., ε)
x, t            2
y, r            8
x, r            2                       when analyzing first call
t, ε            (λz. x + y + z, r)
x, f            4
x, r            ⊤                       when analyzing second call
f, ε            (λz. x + y + z, r)
e, ε            ⊤
The k-CFA analysis is like m-CFA, except that rather than keeping track of the scope in which
a closure was captured, the analysis keeps track of the scope in which each variable captured in
the closure was defined. We use an environment η to track this. Note that since η can represent
a separately calling context for each variable, rather than merely a single context for all variables,
it has the potential to be more accurate, but also much more expensive. We can represent the
analysis information as follows:

σ ∈ (Var ∪ Lab) × Δ → L        Δ = Lab^(n≤k)

L = ℤ + ⊤ + P((λx.e, η))        η ∈ Var → Δ

Let us briefly analyze the complexity of this analysis. In the worst case, if a closure captures n
different variables, we may have a different call string for each of them. There are O(n^k) different
call strings for a program of size n, so if we keep track of one for each of n variables, we have
O(n^(nk)) different representations of the contexts for the variables captured in each closure. This
exponential blowup is why k-CFA scales so badly. m-CFA is comparatively cheap—there are
“only” O(n^k) different contexts for the variables captured in each closure—still exponential in k,
but polynomial in n for a fixed (and generally small) k.
We can now define the rules for k-CFA. They are similar to the rules for m-CFA, except that we
now have two contexts: the calling context δ, and the environment context η tracking the context
in which each variable is bound. When we analyze a variable x, we look it up not in the current
context δ, but the context η pxq in which it was bound. When a lambda is analyzed, we track the
current environment η with the lambda, as this is the information necessary to determine where
captured variables are bound. The application rule is actually somewhat simpler, because we do
not copy bound variables into the context of the called procedure:

δ, η ⊢ ⟦n⟧^l ↪ α(n) ⊆ σ(l, δ)   (const)        δ, η ⊢ ⟦x⟧^l ↪ σ(x, η(x)) ⊆ σ(l, δ)   (var)

δ, η ⊢ ⟦λx.e^l0⟧^l ↪ {(λx.e, η)} ⊆ σ(l, δ)   (lambda)

δ, η ⊢ ⟦e1⟧^l1 ↪ C1        δ, η ⊢ ⟦e2⟧^l2 ↪ C2        δ′ = suffix(δ·l, k)
C3 = ⋃_{(λx.e0^l0, η0) ∈ σ(l1, δ)}  σ(l2, δ) ⊆ σ(x, δ′) ∧ σ(l0, δ′) ⊆ σ(l, δ)
C4 = ⋃_{(λx.e0^l0, η0) ∈ σ(l1, δ)}  C  where  δ′, η0 ⊢ ⟦e0⟧^l0 ↪ C
---------------------------------------------------------------------------- (apply)
δ, η ⊢ ⟦e1^l1 e2^l2⟧^l ↪ C1 ∪ C2 ∪ C3 ∪ C4

Now we can see how k-CFA analysis can more precisely analyze the latest example program.
In the simulation below, we give two tables: one showing the order in which the functions are
analyzed, along with the calling context δ and the environment η for each analysis, and the other
as usual showing the analysis information computed for the variables in the program:

function    δ    η
main        ε    ∅
adde        t    {x ↦ t}
h           r    {x ↦ t, y ↦ r}
adde        f    {x ↦ f}
h           r    {x ↦ f, y ↦ r}
λz. ...     e    {x ↦ t, y ↦ r, z ↦ e}

Var / Lab, δ    L                                   notes

adde, ε         (λx. ..., ε)
x, t            2
y, r            8
t, ε            (λz. x + y + z, {x ↦ t, y ↦ r})
x, f            4
f, ε            (λz. x + y + z, {x ↦ f, y ↦ r})
z, e            1
e, ε            11

Tracking the definition point of each variable separately is enough to restore precision in this
program. However, programs with this structure—in which analysis of the program depends on
different calling contexts for bound variables even when the context is the same for the function
eventually called—appear to be rare in practice. Might et al. observed no examples among the real
programs they tested in which k-CFA was more accurate than m-CFA—but k-CFA was often far
more costly. Thus at this point the m-CFA analysis seems to be a better tradeoff between efficiency
and precision, compared to k-CFA.

Acknowledgements
I thank Claire Le Goues for greatly appreciated extensions and refinements to these notes.

Lecture Notes: Axiomatic Semantics and
Hoare-style Verification

17-355/17-665/17-819O: Program Analysis (Spring 2018)


Claire Le Goues and Jonathan Aldrich
[email protected], [email protected]

It has been found a serious problem to define these languages [ALGOL, FORTRAN,
COBOL] with sufficient rigor to ensure compatibility among all implementations...One
way to achieve this would be to insist that all implementations of the language shall
satisfy the axioms and rules of inference which underlie proofs of properties of pro-
grams expressed in the language. In effect, this is equivalent to accepting the axioms
and rules of inference as the ultimately definitive specification of the meaning of the
language.
C.A.R. Hoare, An Axiomatic Basis for Computer Programming, 1969

1 Axiomatic Semantics
Axiomatic semantics (or Hoare-style logic) defines the meaning of a statement in terms of its effects
on assertions of truth that can be made about the associated program. This provides a formal
system for reasoning about correctness. An axiomatic semantics fundamentally consists of: (1)
a language for stating assertions about programs (where an assertion is something like “if this
function terminates, x > 0 upon termination”), coupled with (2) rules for establishing the truth of
assertions. Various logics have been used to encode such assertions; for simplicity, we will begin
by focusing on first-order logic.
In this system, a Hoare Triple encodes such assertions:

{P } S {Q}
P is the precondition, Q is the postcondition, and S is a piece of code of interest. Relating this
back to our earlier understanding of program semantics, this can be read as “if P holds in some
state E and if ⟨S, E⟩ ⇓ E′, then Q holds in E′.” We distinguish between partial ({P } S {Q}) and
total ([P ] S [Q]) correctness by saying that total correctness means that, given precondition P , S
will terminate, and Q will hold; partial correctness does not make termination guarantees. We
primarily focus on partial correctness.

1.1 Assertion judgements using operational semantics


Consider a simple assertion language adding first-order predicate logic to WHILE expressions:

A ::= true | false | e1 = e2 | e1 ≥ e2 | A1 ∧ A2


| A1 ∨ A2 | A1 ⇒ A2 | ∀x.A | ∃x.A

Note that we are somewhat sloppy in mixing logical variables and program variables; all
WHILE variables implicitly range over integers, and all WHILE boolean expressions are also assertions.
We now define an assertion judgement E ⊨ A, read “A is true in E”. The ⊨ judgment is de-
fined inductively on the structure of assertions, and relies on the operational semantics of WHILE
arithmetic expressions. For example:

E ⊨ true          always
E ⊨ e1 = e2       iff ⟨e1, E⟩ ⇓ n = ⟨e2, E⟩ ⇓ n
E ⊨ e1 ≥ e2       iff ⟨e1, E⟩ ⇓ n ≥ ⟨e2, E⟩ ⇓ n
E ⊨ A1 ∧ A2       iff E ⊨ A1 and E ⊨ A2
...
E ⊨ ∀x.A          iff ∀n ∈ ℤ. E[x := n] ⊨ A
E ⊨ ∃x.A          iff ∃n ∈ ℤ. E[x := n] ⊨ A
Now we can formally define the meaning of a partial correctness assertion ⊨ {P } S {Q}:

∀E ∈ ℰ. ∀E′ ∈ ℰ. (E ⊨ P ∧ ⟨S, E⟩ ⇓ E′) ⇒ E′ ⊨ Q

Question: What about total correctness?

This gives us a formal, but unsatisfactory, mechanism to decide ⊨ {P } S {Q}. By defining


the judgement in terms of the operational semantics, we practically have to run the program to
verify an assertion! It’s also awkward/impossible to effectively verify the truth of a ∀x.A assertion
(check every integer?!). This motivates a new symbolic technique for deriving valid assertions
from others that are known to be valid.

1.2 Derivation rules for Hoare triples


We write ⊢ A (read “we can prove A”) when A can be derived from basic axioms. The derivation
rules for ⊢ A are the usual ones from first-order logic with arithmetic, like (but obviously not
limited to):

⊢ A    ⊢ B
----------- (and)
⊢ A ∧ B
We can now write ⊢ {P } S {Q} when we can derive a triple using derivation rules. There is
one derivation rule for each statement type in the language (sound familiar?):

⊢ {P } skip {P }   (skip)        ⊢ {[e/x]P } x := e {P }   (assign)

⊢ {P } S1 {P′}    ⊢ {P′} S2 {Q}
-------------------------------- (seq)
⊢ {P } S1; S2 {Q}

⊢ {P ∧ b} S1 {Q}    ⊢ {P ∧ ¬b} S2 {Q}
--------------------------------------- (if)
⊢ {P } if b then S1 else S2 {Q}

Question: What can we do for while?

There is also the rule of consequence:

⊢ P′ ⇒ P    ⊢ {P } S {Q}    ⊢ Q ⇒ Q′
-------------------------------------- (consq)
⊢ {P′} S {Q′}

This rule is important because it lets us make progress even when the pre/post conditions
in our program don’t exactly match what we need (even if they’re logically equivalent) or are
stronger or weaker logically than ideal.
We can use this system to prove that triples hold. Consider {true} x := e {x = e}, using (in
this case) the assignment rule plus the rule of consequence:

⊢ true ⇒ e = e        ⊢ {e = e} x := e {x = e}
------------------------------------------------
⊢ {true} x := e {x = e}

We elide a formal statement of the soundness of this system. Intuitively, it expresses that the
axiomatic proof we can derive using these rules is equivalent to the operational semantics deriva-
tion (or that they are sound and relatively complete, that is, as complete as the underlying logic).

2 Proofs of a Program
Hoare-style verification is based on the idea of a specification as a contract between the imple-
mentation of a function and its clients. The specification consists of the precondition and a post-
condition. The precondition is a predicate describing the condition the code/function relies on for
correct operation; the client must fulfill this condition. The postcondition is a predicate describing
the condition the function establishes after correctly running; the client can rely on this condition
being true after the call to the function.
Note that if a client calls a function without fulfilling its precondition, the function can behave
in any way at all and still be correct. Therefore, if a function must be robust to errors, the precon-
dition should include the possibility of erroneous input, and the postcondition should describe
what should happen in case of that input (e.g. an exception being thrown).
The goal in Hoare-style verification is thus to (statically!) prove that, given a pre-condition,
a particular post-condition will hold after a piece of code executes. We do this by generating a
logical formula known as a verification condition, constructed such that, if true, we know that the
program behaves as specified. The general strategy for doing this, introduced by Dijkstra, relies
on the idea of a weakest precondition of a statement with respect to the desired post-condition. We
then show that the given precondition implies it. However, loops, as ever, complicate this strategy.

2.1 Strongest postconditions and weakest pre-conditions


We can write any number of perfectly valid Hoare triples. Consider the Hoare triple {x = 5} x :=
x ∗ 2 {x > 0}. This triple is clearly correct, because if x = 5 and we multiply x by 2, we get x = 10
which clearly implies that x > 0. However, although correct, this Hoare triple is not as precise as
we might like. Specifically, we could write a stronger postcondition, i.e. one that implies x > 0.
For example, x > 5 ∧ x < 20 is stronger because it is more informative; it pins down the value of x
more precisely than x > 0. The strongest postcondition possible is x = 10; this is the most useful
postcondition. Formally, if {P } S {Q} and for all Q′ such that {P } S {Q′}, Q ⇒ Q′, then Q is the
strongest postcondition of S with respect to P .
We can compute the strongest postcondition for a given statement and precondition using the
function sp(S, P ). Consider the case of a statement of the form x := E. If the condition P held
before the statement, we now know that P still holds and that x = E—where P and E are now in
terms of the old, pre-assigned value of x. For example, if P is x+y = 5, and S is x := x+z, then we

should know that x′ + y = 5 and x = x′ + z, where x′ is the old value of x. The program semantics
doesn’t keep track of the old value of x, but we can express it by introducing a fresh, existentially
quantified variable x′. This gives us the following strongest postcondition for assignment:¹

sp(x := E, P ) = ∃x′. [x′/x]P ∧ x = [x′/x]E


While this scheme is workable, it is awkward to existentially quantify over a fresh variable
at every statement; the formulas produced become unnecessarily complicated, and if we want to
use automated theorem provers, the additional quantification tends to cause problems. Dijkstra
proposed reasoning instead in terms of weakest preconditions, which turns out to work better. If
{P } S {Q} and for all P 0 such that {P 0 } S {Q}, P 0 ⇒ P , then P is the weakest precondition
wp(S, Q) of S with respect to Q.
We can define a function yielding the weakest precondition inductively, following the Hoare
rules. For for assignments, sequences, and if statements, this yields:

wp(x := E, P ) = [E/x]P
wp(S; T, Q) = wp(S, wp(T, Q))
wp(if B then S else T, Q) = B ⇒ wp(S, Q) ∧ ¬B ⇒ wp(T, Q)
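
As a small worked example (ours, not from the original notes), consider computing wp(x := x + 1; y := x, y > 0). Working backwards, wp(y := x, y > 0) = [x/y](y > 0) = x > 0, and then wp(x := x + 1, x > 0) = [x + 1/x](x > 0) = x + 1 > 0. So any starting state satisfying x + 1 > 0, i.e. x ≥ 0 over the integers, guarantees the postcondition.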

2.2 Loops
As usual, things get tricky when we get to loops. Consider:

{P } while (i < x) do f := f ∗ i; i := i + 1 done {f = x!}


What is the weakest precondition here? Fundamentally, we need to prove by induction that the
property we care about will generalize across an arbitrary number of loop iterations. Thus, P is the
base case, and we need some inductive hypothesis that is preserved when executing loop body an
arbitrary number of times. We commonly refer to this hypothesis as a loop invariant, because it rep-
resents a condition that is always true (i.e. invariant) before and after each execution of the loop.
Computing weakest preconditions on loops is very difficult on real languages. Instead, we
assume the provision of that loop invariant. A loop invariant must fulfill the following conditions:
• P ⇒ Inv : The invariant is initially true. This condition is necessary as a base case, to establish
the induction hypothesis.
• {Inv ∧ B} S {Inv} : Each execution of the loop preserves the invariant. This is the inductive
case of the proof.
• (Inv ∧ ¬B) ⇒ Q : The invariant and the loop exit condition imply the postcondition. This
condition simply demonstrates that the induction hypothesis/loop invariant we have
chosen is sufficiently strong to prove our postcondition Q.
The procedure outlined above only verifies partial correctness, because it does not reason
about how many times the loop may execute. Verifying full correctness involves placing an upper
bound on the number of remaining times the loop body will execute, typically called a variant
function, written v, because it is variant: we must prove that it decreases each time we go through
the loop. We mention this for the interested reader; we will not spend much time on it.
¹ Recall that the operation [x′/x]E denotes the capture-avoiding substitution of x′ for x in E; we rename bound variables as we do the substitution so as to avoid conflicts.

2.3 Proving programs
Assume a version of WHILE that annotates loops with invariants: whileinv b do S. Given such a
program, and associated pre- and post-conditions:

{P } Sinv {Q}
The proof strategy constructs a verification condition VC(Sannot, Q) that we seek to prove true
(usually with the help of a theorem prover). VC is guaranteed to be stronger than wp(Sannot, Q)
but still weaker than P : P ⇒ VC(Sannot, Q) ⇒ wp(Sannot, Q). We compute VC using a verification
condition generation procedure VCGen, which mostly follows the definition of the wp function
discussed above:

VCGen(skip, Q) = Q
VCGen(S1; S2, Q) = VCGen(S1, VCGen(S2, Q))
VCGen(if b then S1 else S2, Q) = b ⇒ VCGen(S1, Q) ∧ ¬b ⇒ VCGen(S2, Q)
VCGen(x := e, Q) = [e/x]Q

The one major point of difference is in the handling of loops:

VCGen(whileinv e do S, Q) = Inv ∧ (∀x1...xn. Inv ⇒ (e ⇒ VCGen(S, Inv) ∧ ¬e ⇒ Q))
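
As a sketch of how VCGen might be implemented, the following Python code (our own AST encoding, using sympy only for substitution and formula construction) follows the equations above, including the loop rule; the ∀x1...xn quantification over the variables modified in the loop is left implicit.

    import sympy as sp

    def subst(Q, x, e):                              # [e/x]Q
        return Q.subs(sp.Symbol(x), e)

    def vcgen(stmt, Q):
        kind = stmt[0]
        if kind == "skip":
            return Q
        if kind == "assign":                         # ("assign", x, e)
            _, x, e = stmt
            return subst(Q, x, e)
        if kind == "seq":                            # ("seq", S1, S2)
            _, S1, S2 = stmt
            return vcgen(S1, vcgen(S2, Q))
        if kind == "if":                             # ("if", b, S1, S2)
            _, b, S1, S2 = stmt
            return sp.And(sp.Implies(b, vcgen(S1, Q)),
                          sp.Implies(sp.Not(b), vcgen(S2, Q)))
        if kind == "while":                          # ("while", inv, b, S)
            _, inv, b, S = stmt
            # The ∀ over modified variables is implicit; a real generator
            # would quantify them or replace them with fresh names.
            return sp.And(inv, sp.Implies(inv, sp.And(sp.Implies(b, vcgen(S, inv)),
                                                      sp.Implies(sp.Not(b), Q))))

    # The power program from the text, with invariant r = n**i ∧ 0 ≤ i ≤ m ∧ n > 0.
    r, i, m, n = sp.symbols("r i m n")
    inv = sp.And(sp.Eq(r, n**i), i >= 0, i <= m, n > 0)
    body = ("seq", ("assign", "r", r * n), ("assign", "i", i + 1))
    prog = ("seq", ("assign", "r", 1),
            ("seq", ("assign", "i", 0), ("while", inv, i < m, body)))
    vc = vcgen(prog, sp.Eq(r, n**m))
    # The proof obligation is then (m ≥ 0 ∧ n > 0) ⇒ vc, to be discharged by a prover.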

To see this in action, consider the following WHILE program:

r := 1;
i := 0;
while i < m do
    r := r ∗ n;
    i := i + 1
We wish to prove that this function computes the m-th power of n and leaves the result in r.
We can state this with the postcondition r = n^m.
Next, we need to determine a precondition for the program. We cannot simply compute it
with wp because we do not yet know what the loop invariant is—and in fact, different loop invariants
could lead to different preconditions. However, a bit of reasoning will help. We must have m ≥ 0
because we have no provision for dividing by n, and we avoid the problematic computation of 0⁰
by assuming n > 0. Thus our precondition will be m ≥ 0 ∧ n > 0.
A good heuristic for choosing a loop invariant is often to modify the postcondition of the loop
to make it depend on the loop index instead of some other variable. Since the loop index runs
from 0 to m, we can guess that we should replace m with i in the postcondition r = n^m. This gives
us a first guess that the loop invariant should include r = n^i.
This loop invariant is not strong enough, however, because the loop invariant conjoined with
the loop exit condition should imply the postcondition. The loop exit condition is i ≥ m, but we
need to know that i = m. We can get this if we add i ≤ m to the loop invariant. In addition, for
proving the loop body correct, we will also need to add 0 ≤ i and n > 0 to the loop invariant.
Thus our full loop invariant will be r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0.

Our next task is to use weakest preconditions to generate proof obligations that will verify the
correctness of the specification. We will first ensure that the invariant is initially true when the
loop is reached, by propagating that invariant past the first two statements in the program:

{m ≥ 0 ∧ n > 0}
r := 1;
i := 0;
{r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0}

We propagate the loop invariant past i := 0 to get r = n^0 ∧ 0 ≤ 0 ≤ m ∧ n > 0. We propagate
this past r := 1 to get 1 = n^0 ∧ 0 ≤ 0 ≤ m ∧ n > 0. Thus our proof obligation is to show that:

m ≥ 0 ∧ n > 0 ⇒ 1 = n^0 ∧ 0 ≤ 0 ≤ m ∧ n > 0
We prove this with the following logic:

m ≥ 0 ∧ n > 0                      by assumption
1 = n^0                            because n^0 = 1 for all n > 0 and we know n > 0
0 ≤ 0                              by definition of ≤
0 ≤ m                              because m ≥ 0 by assumption
n > 0                              by the assumption above
1 = n^0 ∧ 0 ≤ 0 ≤ m ∧ n > 0        by conjunction of the above
To show the loop invariant is preserved, we have:

{r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0 ∧ i < m}
r := r ∗ n;
i := i + 1;
{r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0}

We propagate the invariant past i := i + 1 to get r = n^(i+1) ∧ 0 ≤ i + 1 ≤ m ∧ n > 0. We propagate
this past r := r ∗ n to get: r ∗ n = n^(i+1) ∧ 0 ≤ i + 1 ≤ m ∧ n > 0. Our proof obligation is therefore:

r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0 ∧ i < m
⇒ r ∗ n = n^(i+1) ∧ 0 ≤ i + 1 ≤ m ∧ n > 0
We can prove this as follows:

r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0 ∧ i < m        by assumption
r ∗ n = n^i ∗ n                            multiplying by n
r ∗ n = n^(i+1)                            definition of exponentiation
0 ≤ i + 1                                  because 0 ≤ i
i + 1 < m + 1                              by adding 1 to inequality
i + 1 ≤ m                                  by definition of ≤
n > 0                                      by assumption
r ∗ n = n^(i+1) ∧ 0 ≤ i + 1 ≤ m ∧ n > 0    by conjunction of the above
Last, we need to prove that the postcondition holds when we exit the loop. We have already
hinted at why this will be so when we chose the loop invariant. However, we can state the proof
obligation formally:

r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0 ∧ i ≥ m
⇒ r = n^m

We can prove it as follows:

r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0 ∧ i ≥ m        by assumption
i = m                                      because i ≤ m and i ≥ m
r = n^m                                    substituting m for i in assumption
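
As a quick sanity check (not a proof), we can also run the loop in Python and assert the chosen invariant at each step for a few concrete values of m and n:

    def check(m, n):
        r, i = 1, 0
        assert r == n**i and 0 <= i <= m and n > 0      # invariant holds initially
        while i < m:
            r, i = r * n, i + 1
            assert r == n**i and 0 <= i <= m and n > 0  # preserved by the body
        assert r == n**m                                # invariant ∧ ¬(i < m) ⇒ postcondition

    for m in range(0, 6):
        for n in range(1, 6):
            check(m, n)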

Lecture Notes: Program Synthesis

17-355/17-665/17-819O: Program Analysis (Spring 2018)


Jonathan Aldrich and Claire Le Goues
[email protected], [email protected]

Note: A complete, if lengthy, resource on inductive program synthesis is the book “Program Syn-
thesis” by Gulwani et al. [8]. You need not read the whole thing; I encourage you to investigate
the portions of interest to you, and skim as appropriate. I drew many references in this document
from there; if you are interested, it contains many more.

1 Program Synthesis Overview


The problem of program synthesis can be expressed as follows:

∃P. ∀x. ϕ(x, P(x))

That is, we seek a program P that satisfies some specification ϕ on all inputs. We take a lib-
eral view of P in discussing synthesis, as a wide variety of artifact types have been successfully
synthesized (anything that reads inputs or produces outputs). Beyond (relatively small) program
snippets of the expected variety, this includes protocols, interpreters, classifiers, compression al-
gorithms or implementations, scheduling policies, cache coherence protocols for multicore pro-
cessors. The specification ϕ is an expression of the user intent, and may be expressed in one of
several ways: a formula, a reference implementation, input/output pairs, traces, demonstrations,
or a syntactic sketch, among other options.
Program synthesis can thus be considered along three dimensions:

(1) Expressing user intent. User intent (or ϕ in the above) can be expressed in a number of
ways, including logical specifications, input/output examples [4] (often with some kind of user- or
synthesizer-driven interaction), traces, natural language [3, 7, 13], or full- or partial programs [?].
In this latter category lie reference implementations, such as executable specifications (which
gives the desired output for a given input) or declarative specifications (which checks whether a
given input/output pair is correct). Some synthesis techniques allow for multi-modal specifica-
tions, including pre- and post- conditions, safety assertions at arbitrary program points, or partial
program templates.
Such specifications can constrain two aspects of the synthesis problem:

• Observable behavior, such as an input/output relation, a full executable specification or


safety property. This specifies what a program should compute.

• Structural properties, or internal computation steps. These are often expressed as a sketch
or template, but can be further constrained by assertions over the number or variety of op-
erations in a synthesized program (or the number of iterations, number of cache misses, etc.,

depending on the synthesis problem in question). Indeed, one of the key principles behind
the scaling of many modern synthesis techniques lie in the way they syntactically restrict the
space of possible programs, often via a sketch, grammar, or DSL.
Note that basically all of the above types of specifications can be translated to constraints in
some form or another. Techniques that operate over multiple types of specifications can overcome
various challenges that come up over the course of an arbitrary synthesis problem. Different
specification types are more suitable for some types of problems than others. Alternatively, trace-
or sketch-based specifications can allow a synthesizer to decompese a synthesis problems into
intermediate program points.

Question: how many ways can we specify a sorting algorithm?

(2) Search space of possible programs. The search space naturally includes programs, often con-
structed of subsets of normal program languages. This can include a predefined set of considered
operators or control structures, defined as grammars. However, other spaces are considered for
various synthesis problems, like logics of various kinds, which can be useful for, e.g., synthesizing
graph/tree algorithms.

(3) Search technique. At a high level, there are two general approaches to logical synthesis:
• Deductive (or classic) synthesis (e.g., [15]), which maps a high-level (e.g. logical) specifica-
tion to an executable implementation. Such approaches are efficient and provably correct:
thanks to the semantics-preserving rules, only correct programs are explored. However,
they require complete specifications and sufficient axiomatization of the domain. These ap-
proaches are classically applied to e.g., controller synthesis.
• Inductive (sometimes called syntax-guided) synthesis, which takes a partial (and often multi-
modal) specification and constructs a program that satisfies it. These techniques are more
flexible in their specification requirements and require no axioms, but often at the cost of
lower efficiency and weaker bounded guarantees on the optimality of synthesized code.
Deductive synthesis shares quite a bit in common, conceptually, with compilation: rewriting a
specification according to various rules to achieve a new program at a different level of represen-
tation. We will (very) briefly overview Denali [11], a prototypical deductive synthesis technique,
using slides. However, deductive synthesis approaches assume a complete formal specification
of the desired user intent was provided. In many cases, this can be as complicated as writing the
program itself.
This has motivated new inductive synthesis approaches, towards which considerable modern
research energy has been dedicated. This category of techniques lends itself to a wide variety of
search strategies, including brute-force or enumerative [1] (you might be surprised!), probabilistic
inference/belief propagation [6], or genetic programming [12]. Alternatively, techniques based on
logical reasoning delegate the search problem to a constraint solver. We will spend more time on
this set of techniques.

2 Inductive Synthesis
Inductive synthesis uses inductive reasoning to construct programs in response to partial specifi-
cations. The program is synthesized via a symbolic interpretation of a space of candidates, rather

than by deriving the candidate directly. So, to synthesize such a program, we basically only require
an interpreter, rather than a sufficient set of derivation axioms. Inductive synthesis is applicable
to a variety of problem types, such as string transformation (FlashFill) [5], data extraction/pro-
cessing/wrangling [4, 19], layout transformation of tables or tree-shaped structures [20], graphics
(constructing structured, repetitive drawings) [9, 2], program repair [16, 14] (spoiler alert!), super-
optimization [11], and efficient synchronization, among others.
Inductive synthesis consists of several families of approaches; we will overview several promi-
nent examples, without claiming to be complete.

2.1 SKETCH, CEGIS, and SyGuS


SKETCH is a well-known synthesis system that allows programmers to provide partial programs (a
sketch) that expresses the high-level structure of the intended implementation but leaves holes
for low-level implementation details. The synthesizer fills these holes from a finite set of choices,
using an approach now known as Counterexample-guided Inductive Synthesis (CEGIS) [?, 18].
This well-known synthesis architecture divides the problem into search and verification components,
and uses the output from the latter to refine the specification given to the former.

We have a diagram to illustrate on slides.

Syntax-Guided Synthesis (or SyGuS) formalizes the problem of program synthesis where specifi-
cation is supplemented with a syntactic template. This defines a search space of possible programs
that the synthesizer effectively traverses. Many search strategies exist; two especially well-known
strategies are enumerative search (which can be remarkably effective, though rarely scales), and
deductive or top down search, which recursively reduces the problem into simpler sub-problems.

2.2 Oracle-guided synthesis


Templates or sketches are often helpful and easy to write. However, they are not always available.
Beyond search or enumeration, constraint-based approaches translate a program’s specification
into a constraint system that is provided to a solver. This can be especially effective if combined
with an outer CEGIS loop that provides oracles.
This kind of synthesis can be effective when the properties we care about are relatively easy to
verify. For example, imagine we wanted to find a maximum number m in a list l.

Turn to the handout...


Note that instead of proving that a program satisfies a given formula, we can instead disprove
its negation, such as:

∃l, m : (Pmax(l) = m) ∧ (m ∉ l ∨ ∃x ∈ l : m < x)


If the above is satisfiable, a solver will give us a counterexample, which we can use to strengthen
the specification. We can even make this counterexample constructive, so that it provides us an
input together with the corresponding correct output m*:

∃l, m* : (Pmax(l) ≠ m*) ∧ (m* ∈ l) ∧ (∀x ∈ l : m* ≥ x)


This is a much stronger constraint than the original counterexample. This approach was origi-
nally introduced for SKETCH, and generalized to oracle-guided inductive synthesis by Jha and
Seshia. Different oracles have been developed for this type of synthesis. We will discuss

component-based oracle-guided program synthesis in detail, which illustrates the use of distin-
guishing oracles.

3 Oracle-guided Component-based Program Synthesis


Problem statement and intuition.¹ Given a set of input-output pairs ⟨α0, β0⟩ ... ⟨αn, βn⟩
and N components f1, ..., fN, the goal is to synthesize a function f out of the components such
that ∀αi. f(αi) produces βi. We achieve this by constructing and solving a set of constraints over f,
passing those constraints to an SMT solver, and using a returned satisfying model to reconstruct
f. The key idea is that we define a set of location variables for the inputs and outputs of each
component (and of the program itself). The synthesis process then reduces to finding values for
those location variables, which then tell us the line of the program on which each component
should appear. This requires two sets of constraints: one to ensure the program is well-formed,
and the other to ensure the program encodes the desired functionality.
Definitions. We assume for simplicity that each component has a single output and one or more
inputs. The inputs for the i-th component are denoted χ⃗_i; its output, r_i. Q denotes the set of all
input variables from all components; R the set of output variables from all components. (χ⃗ and r,
without subscripts, denote the inputs and output of the synthesized program f itself.) Finally,
for all variables x, we define a location variable l_x, which denotes where x is defined. L is the set
of all location variables:

Q := ⋃_{i=1..N} χ⃗_i
R := ⋃_{i=1..N} r_i
L := { l_x | x ∈ Q ∪ R ∪ χ⃗ ∪ r }
Well-formedness. ψ_wfp denotes the well-formedness constraint. Let M = |χ⃗| + N, where N is
the number of available components:

ψ_wfp(L, Q, R) := ⋀_{x ∈ Q} (0 ≤ l_x < M)  ∧  ⋀_{x ∈ R} (|χ⃗| ≤ l_x < M)  ∧  ψ_cons(L, R) ∧ ψ_acyc(L, Q, R)

The first line of that definition says that inputs must be defined before outputs. ψ_cons and ψ_acyc
dictate that there is only one component on each line and that the inputs of each component are
defined before they are used, respectively:

ψ_cons(L, R) := ⋀_{x,y ∈ R, x ≢ y} (l_x ≠ l_y)

ψ_acyc(L, Q, R) := ⋀_{i=1..N} ⋀_{x ∈ χ⃗_i, y ≡ r_i} l_x < l_y

Functionality. φf unc denotes the functionality constraint that guarantees that the solution f satis-
fies the given input-output pairs:
1
These notes are inspired by Section III.B of Nguyen et al., ICSE 2013 [17] ...which provides a really beautifully clear
exposition of the work that originally proposed this type of synthesis in Jha et al., ICSE 2010 [10].

φ_func(L, α, β) := ψ_conn(L, χ⃗, r, Q, R) ∧ φ_lib(Q, R) ∧ (α = χ⃗) ∧ (β = r)

ψ_conn(L, χ⃗, r, Q, R) := ⋀_{x,y ∈ Q ∪ R ∪ χ⃗ ∪ {r}} (l_x = l_y ⇒ x = y)

φ_lib(Q, R) := ⋀_{i=1..N} φ_i(χ⃗_i, r_i)

ψ_conn encodes the meaning of the location variables: if two locations are equal, then the values
of the variables defined at those locations are also equal. φ_lib encodes the semantics of the pro-
vided basic components, with φ_i representing the specification of component f_i. The rest of φ_func
encodes that if the input to the synthesized function is α, the output must be β.
Almost done! φ_func provides constraints over a single input-output pair ⟨αi, βi⟩; we still need to
generalize it over all n provided pairs {⟨αi, βi⟩ | 1 ≤ i ≤ n}:

θ := (⋀_{i=1..n} φ_func(L, αi, βi)) ∧ ψ_wfp(L, Q, R)

θ collects up all the previous constraints, and says that the synthesized function f should satisfy
all input-output pairs and the function has to be well formed.

Lval2Prog. The only real unknowns in all of θ are the values of the location variables L. So, a
solver that provides a satisfying assignment to θ is basically giving a valuation of L, which we
then turn into a program. Given a valuation of L, Lval2Prog(L) converts it to a program as follows:
the i-th line of the program is r_j = f_j(r_σ(1), ..., r_σ(η)) when l_{r_j} == i and
⋀_{k=1..η} (l_{χ_j^k} == l_{r_σ(k)}), where η is the number of inputs for component f_j and χ_j^k
denotes the k-th input parameter of component f_j. The program output is produced in line l_r.

Example. Assume we only have one component, +. + has two inputs: χ¹₊ and χ²₊. The output
variable is r₊. Further assume that the desired program f has one input χ (which we call input0
in the actual program text) and one output r. Given a mapping for location variables of
{l_{r+} ↦ 1, l_{χ1+} ↦ 0, l_{χ2+} ↦ 0, l_r ↦ 1, l_χ ↦ 0}, the program looks like:

0   r0 := input0
1   r+ := r0 + r0
2   return r+

This occurs because the locations of the variables used as input to + are both on the same line (0),
which is also the same line as the input to the program (0). l_r, the return variable of the program,
is defined on line 1, which is also where the output of the + component (l_{r+}) is located. We added
the return on line 2 as syntactic sugar.
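
To make the encoding concrete, here is a sketch using the Z3 Python API for exactly this one-component example. The variable names and the simplified well-formedness constraints are ours; this is only the core of the encoding from Jha et al., not a full implementation.

    from z3 import Int, Solver, And, Implies, sat

    # Location variables: program input χ, program output r, and the inputs/output of "+".
    l_chi, l_r = Int("l_chi"), Int("l_r")
    l_in1, l_in2, l_out = Int("l_in1"), Int("l_in2"), Int("l_out")
    N, num_inputs = 1, 1
    M = num_inputs + N                        # program lines 0 .. M-1

    s = Solver()
    # Well-formedness: the program input sits on line 0, component output after the inputs,
    # and the component's inputs are defined before its output (ψ_acyc).
    s.add(l_chi == 0, 0 <= l_r, l_r < M)
    s.add(And(0 <= l_in1, l_in1 < M, 0 <= l_in2, l_in2 < M))
    s.add(And(num_inputs <= l_out, l_out < M))
    s.add(l_in1 < l_out, l_in2 < l_out)
    # (ψ_cons is trivial here: there is only one component output.)

    # Functionality for each input-output pair ⟨α, β⟩, with fresh value variables per pair.
    def add_pair(k, alpha, beta):
        v_chi, v_r = Int(f"v_chi_{k}"), Int(f"v_r_{k}")
        v_in1, v_in2, v_out = Int(f"v_in1_{k}"), Int(f"v_in2_{k}"), Int(f"v_out_{k}")
        s.add(v_out == v_in1 + v_in2)                     # φ_lib: semantics of "+"
        vals = [(l_chi, v_chi), (l_r, v_r), (l_in1, v_in1), (l_in2, v_in2), (l_out, v_out)]
        for (lx, vx) in vals:                             # ψ_conn: equal locations,
            for (ly, vy) in vals:                         #   equal values
                s.add(Implies(lx == ly, vx == vy))
        s.add(v_chi == alpha, v_r == beta)                # α = χ⃗, β = r

    for k, (alpha, beta) in enumerate([(1, 2), (3, 6), (5, 10)]):
        add_pair(k, alpha, beta)

    if s.check() == sat:
        model = s.model()
        print({str(v): model[v] for v in [l_chi, l_r, l_in1, l_in2, l_out]})
        # Expected: l_in1 = l_in2 = 0 and l_out = l_r = 1, i.e. the doubling program above.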

References
[1] R. Alur, R. Bodík, E. Dallal, D. Fisman, P. Garg, G. Juniwal, H. Kress-Gazit, P. Madhusudan,
M. M. K. Martin, M. Raghothaman, S. Saha, S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak,
and A. Udupa. Syntax-guided synthesis. In M. Irlbeck, D. A. Peled, and A. Pretschner,
editors, Dependable Software Systems Engineering, volume 40 of NATO Science for Peace and
Security Series, D: Information and Communication Security, pages 1–25. IOS Press, 2015.

[2] R. Chugh, B. Hempel, M. Spradlin, and J. Albers. Programmatic and direct manipulation,
together at last. SIGPLAN Not., 51(6):341–354, June 2016.

[3] A. Desai, S. Gulwani, V. Hingorani, N. Jain, A. Karkare, M. Marron, S. R, and S. Roy. Program
synthesis using natural language. In Proceedings of the 38th International Conference on Software
Engineering, ICSE ’16, pages 345–356, New York, NY, USA, 2016. ACM.

[4] S. Gulwani. Programming by examples: Applications, algorithms, and ambiguity resolution.


In Proceedings of the 8th International Joint Conference on Automated Reasoning - Volume 9706,
pages 9–14, New York, NY, USA, 2016. Springer-Verlag New York, Inc.

[5] S. Gulwani, W. R. Harris, and R. Singh. Spreadsheet data manipulation using examples.
Commun. ACM, 55(8):97–105, Aug. 2012.

[6] S. Gulwani and N. Jojic. Program verification as probabilistic inference. In Proceedings of


the 34th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
POPL ’07, pages 277–289, New York, NY, USA, 2007. ACM.

[7] S. Gulwani and M. Marron. Nlyze: Interactive programming by natural language for spread-
sheet data analysis and manipulation. In Proceedings of the 2014 ACM SIGMOD International
Conference on Management of Data, SIGMOD ’14, pages 803–814, New York, NY, USA, 2014.
ACM.

[8] S. Gulwani, O. Polozov, and R. Singh. Program synthesis. Foundations and Trends in Programming Languages, 4(1-2):1–119, 2017.

[9] B. Hempel and R. Chugh. Semi-automated svg programming via direct manipulation. In
Proceedings of the 29th Annual Symposium on User Interface Software and Technology, UIST ’16,
pages 379–390, New York, NY, USA, 2016. ACM.

[10] S. Jha, S. Gulwani, S. A. Seshia, and A. Tiwari. Oracle-guided component-based program


synthesis. In Proceedings of the 32Nd ACM/IEEE International Conference on Software Engineering
- Volume 1, ICSE ’10, pages 215–224, New York, NY, USA, 2010. ACM.

[11] R. Joshi, G. Nelson, and K. Randall. Denali: A goal-directed superoptimizer. SIGPLAN Not.,
37(5):304–314, May 2002.

[12] G. Katz and D. Peled. Genetic programming and model checking: Synthesizing new mutual
exclusion algorithms. In Proceedings of the 6th International Symposium on Automated Technology
for Verification and Analysis, ATVA ’08, pages 33–47, Berlin, Heidelberg, 2008. Springer-Verlag.

[13] V. Le, S. Gulwani, and Z. Su. Smartsynth: Synthesizing smartphone automation scripts from
natural language. In Proceeding of the 11th Annual International Conference on Mobile Systems,
Applications, and Services, MobiSys ’13, pages 193–206, New York, NY, USA, 2013. ACM.

[14] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer. GenProg: A generic method for auto-
mated software repair. IEEE Transactions on Software Engineering, 38(1):54–72, 2012.

6
[15] Z. Manna and R. J. Waldinger. Toward automatic program synthesis. Commun. ACM,
14(3):151–165, Mar. 1971.

[16] S. Mechtaev, J. Yi, and A. Roychoudhury. Angelix: Scalable Multiline Program Patch Synthe-
sis via Symbolic Analysis. In International Conference on Software Engineering, ICSE ’16, pages
691–701, 2016.

[17] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra. Semfix: Program repair via
semantic analysis. In Proceedings of the 2013 International Conference on Software Engineering,
ICSE ’13, pages 772–781, Piscataway, NJ, USA, 2013. IEEE Press.

[18] O. Polozov and S. Gulwani. Flashmeta: A framework for inductive program synthesis. SIG-
PLAN Not., 50(10):107–126, Oct. 2015.

[19] R. Singh and S. Gulwani. Transforming spreadsheet data types using examples. In Proceedings
of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
POPL ’16, pages 343–356, New York, NY, USA, 2016. ACM.

[20] A. Solar-Lezama. Program Synthesis by Sketching. PhD thesis, Berkeley, CA, USA, 2008.
AAI3353225.

[21] N. Yaghmazadeh, C. Klinger, I. Dillig, and S. Chaudhuri. Synthesizing transformations on


hierarchically structured data. SIGPLAN Not., 51(6):508–521, June 2016.

7
Lecture Notes: Symbolic Execution

17-355/17-665/17-819O: Program Analysis (Spring 2018)


Jonathan Aldrich and Claire Le Goues
[email protected], [email protected]

1 Symbolic Execution Overview


Symbolic execution is a way of executing a program abstractly, so that one abstract execution
covers multiple possible inputs of the program that share a particular execution path through the
code. The execution treats these inputs symbolically, “returning” a result that is expressed in terms
of symbolic constants that represent those input values.
Symbolic execution is less general than abstract interpretation, because it doesn’t explore all
paths through the program. However, symbolic execution can often avoid approximating in
places where AI must approximate in order to ensure analysis termination. This means that sym-
bolic execution can avoid giving false warnings; any error found by symbolic execution represents
a real, feasible path through the program, and (as we will see) can be witnessed with a test case
that illustrates the error.

1.1 A Generalization of Testing


As the above discussion suggests, symbolic execution is a way to generalize testing. A test in-
volves executing a program concretely on one specific input, and checking the results. In contrast,
symbolic execution considers how the program executes abstractly on a family of related inputs.
Consider the following code example, where a, b, and c are user-provided:
1 int x = 0, y = 0, z = 0;
2 if (a) {
3     x = -2;
4 }
5 if (b < 5) {
6     if (!a && c) { y = 1; }
7     z = 2;
8 }
9 assert(x + y + z != 3);
Running this code with a = 1, b = 2, and c = 1 causes the assertion to fail. If we are good (or lucky)
testers, we might stumble upon this combination, generalize it to the space of inputs that leads to
the failure, and hopefully fix the bug.
Instead of executing the code on concrete inputs (like a = 1, b = 2, and c = 1), symbolic execution
evaluates it on symbolic inputs, like a ↦ α, b ↦ β, c ↦ γ, and then tracks execution in terms of those
symbolic values. If a branch condition ever depends on unknown symbolic values, the symbolic
execution engine simply chooses one branch to take, recording the condition on the symbolic

values that would lead to that branch. After a given symbolic execution is complete, the engine
may go back to the branches taken and explore other paths through the program.
To get an intuition for how symbolic analysis works, consider abstractly executing a path
through the program above. As we go along the path, we will keep track of the (potentially
symbolic) values of variables, and we will also track the conditions that must be true in order for
us to take that path. We can write this in tabular form, showing the values of the path condition g
and symbolic environment E after each line:
line  g                               E
0     true                            a ↦ α, b ↦ β, c ↦ γ
1     true                            . . . , x ↦ 0, y ↦ 0, z ↦ 0
2     ¬α                              . . . , x ↦ 0, y ↦ 0, z ↦ 0
5     ¬α ∧ β ≥ 5                      . . . , x ↦ 0, y ↦ 0, z ↦ 0
9     ¬α ∧ β ≥ 5 ∧ 0 + 0 + 0 ≠ 3      . . . , x ↦ 0, y ↦ 0, z ↦ 0
In the example, we arbitrarily picked the path where the abstract value of a, i.e. α, is false,
and the abstract value of b, i.e. β, is not less than 5. We build up a path condition out of these
boolean predicates as we hit each branch in the code. The assignments to x, y, and z update the
symbolic state E with expressions for each variable; in this case we know they are all equal to 0.
At line 9, we treat the assert statement like a branch. In this case, the branch expression evaluates
to 0 + 0 + 0 ≠ 3, which is true, so the assertion is not violated.
Now, we can run symbolic execution again along another path. We can do this multiple times,
until we explore all paths in the program (exercise to the reader: how many paths are there in the
program above?) or we run out of time. If we continue doing this, eventually we will explore the
following path:
line  g                                    E
0     true                                 a ↦ α, b ↦ β, c ↦ γ
1     true                                 . . . , x ↦ 0, y ↦ 0, z ↦ 0
2     ¬α                                   . . . , x ↦ 0, y ↦ 0, z ↦ 0
5     ¬α ∧ β < 5                           . . . , x ↦ 0, y ↦ 0, z ↦ 0
6     ¬α ∧ β < 5 ∧ γ                       . . . , x ↦ 0, y ↦ 1, z ↦ 0
7     ¬α ∧ β < 5 ∧ γ                       . . . , x ↦ 0, y ↦ 1, z ↦ 2
9     ¬α ∧ β < 5 ∧ γ ∧ ¬(0 + 1 + 2 ≠ 3)    . . . , x ↦ 0, y ↦ 1, z ↦ 2
Along this path, we have ¬α ∧ β < 5 ∧ γ. This means we assign y to 1 and z to 2, so the
assertion 0 + 1 + 2 ≠ 3 on line 9 is false. Symbolic execution has found an error in the program!
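As a concrete illustration of this last point, the failing path condition can be handed to an off-the-shelf SMT solver to produce a witness test case. The sketch below is a minimal example assuming the z3-solver Python package; the variable names mirror α, β, and γ above and are not part of the notes' formalism.

```python
# Minimal illustration: hand the failing path condition to an SMT solver to
# obtain a concrete witness input.  Assumes the z3-solver package
# (pip install z3-solver); the names below are ours, mirroring alpha/beta/gamma.
from z3 import Bool, Int, Solver, And, Not, sat

alpha, gamma = Bool("alpha"), Bool("gamma")   # symbolic values of a and c
beta = Int("beta")                            # symbolic value of b

# Path condition of the failing path; the final conjunct 0 + 1 + 2 = 3 is
# trivially true, so it is omitted here.
path_condition = And(Not(alpha), beta < 5, gamma)

solver = Solver()
solver.add(path_condition)
if solver.check() == sat:
    model = solver.model()
    # Any model is a concrete test case for this path, e.g. a = false, b = 4,
    # c = true, which drives the program to the assertion failure.
    print("witness:", model)
```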

1.2 History of Symbolic Analysis


Symbolic execution was originally proposed in the 1970s, but it relied on automated theorem
proving, and the algorithms and hardware of that period weren’t ready for widespread use. With
recent advances in SAT/SMT solving and 4 decades of Moore’s Law applied to hardware, sym-
bolic execution is now practical in many more situations, and is used extensively in program
analysis research as well as some emerging industry tools.

2 Symbolic Execution Semantics


We can write rules for evaluating programs symbolically in While. We will write the rules in a
style similar to the big-step semantics we wrote before, but incorporate symbolic values and keep
track of the path conditions we have taken.

We start by defining symbolic analogs for arithmetic expressions and boolean predicates. We
will call symbolic predicates guards and use the metavariable g, as these will turn into guards for
paths the symbolic evaluator explores. These analogs are the same as the ordinary versions, except
that in place of variables we use symbolic constants:

g ::= true                    as ::= α
    | false                        | n
    | not g                        | as1 opa as2
    | g1 opb g2
    | as1 opr as2
Now we generalize the notion of the environment E, so that variables refer not just to integers
but to symbolic expressions:

E ∈ Var → as
Now we can define big-step rules for the symbolic evaluation of expressions, resulting in sym-
bolic expressions. Since we don’t have actual values in many cases, the expressions won’t evalu-
ate, but variables will be replaced with symbolic constants:

─────────────  (big-int)
 ⟨n, E⟩ ⇓ n

─────────────────  (big-var)
 ⟨x, E⟩ ⇓ E(x)

 ⟨a1, E⟩ ⇓ as1    ⟨a2, E⟩ ⇓ as2
─────────────────────────────────  (big-add)
 ⟨a1 + a2, E⟩ ⇓ as1 + as2
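To make these rules concrete, here is one possible (illustrative) rendering in code: evaluation returns a symbolic expression tree rather than an integer, folding to a plain integer only when both operands happen to be concrete. The representation (SymConst, SymAdd) and the restriction to + are our own choices for the sketch, not part of the formalism.

```python
# Illustrative sketch of big-int, big-var, and big-add: evaluation produces a
# symbolic expression tree instead of an integer value.
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class SymConst:          # a symbolic constant such as alpha
    name: str

@dataclass(frozen=True)
class SymAdd:            # as1 + as2, left unevaluated
    left: "SymExpr"
    right: "SymExpr"

SymExpr = Union[int, SymConst, SymAdd]

# Arithmetic expressions: variables are strings, literals are ints, and
# ("+", a1, a2) is an addition node.
def eval_sym(expr, env: dict) -> SymExpr:
    if isinstance(expr, int):                 # big-int: a literal evaluates to itself
        return expr
    if isinstance(expr, str):                 # big-var: look the variable up in E
        return env[expr]
    op, a1, a2 = expr                         # big-add: evaluate both sides and
    assert op == "+"                          # build a symbolic sum
    left, right = eval_sym(a1, env), eval_sym(a2, env)
    if isinstance(left, int) and isinstance(right, int):
        return left + right                   # small shortcut: fold concrete operands
    return SymAdd(left, right)

# Example: with E = {x ↦ alpha, y ↦ 2}, x + (y + 1) evaluates to alpha + 3.
env = {"x": SymConst("alpha"), "y": 2}
print(eval_sym(("+", "x", ("+", "y", 1)), env))
```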

We can likewise define rules for statement evaluation. These rules need to update not only the
environment E, but also a path guard g:

──────────────────────────  (big-skip)
 ⟨g, E, skip⟩ ⇓ ⟨g, E⟩

 ⟨g, E, s1⟩ ⇓ ⟨g′, E′⟩    ⟨g′, E′, s2⟩ ⇓ ⟨g″, E″⟩
───────────────────────────────────────────────────  (big-seq)
 ⟨g, E, s1; s2⟩ ⇓ ⟨g″, E″⟩

 ⟨a, E⟩ ⇓ as
──────────────────────────────────  (big-assign)
 ⟨g, E, x := a⟩ ⇓ ⟨g, E[x ↦ as]⟩

 ⟨P, E⟩ ⇓ g′    g ∧ g′ SAT    ⟨g ∧ g′, E, s1⟩ ⇓ ⟨g″, E′⟩
──────────────────────────────────────────────────────────  (big-iftrue)
 ⟨g, E, if P then s1 else s2⟩ ⇓ ⟨g″, E′⟩

 ⟨P, E⟩ ⇓ g′    g ∧ ¬g′ SAT    ⟨g ∧ ¬g′, E, s2⟩ ⇓ ⟨g″, E′⟩
──────────────────────────────────────────────────────────  (big-iffalse)
 ⟨g, E, if P then s1 else s2⟩ ⇓ ⟨g″, E′⟩

The rules for skip, sequence, and assignment are compositional in the expected way, with the
arithmetic expression on the right-hand side of an assignment evaluating to a symbolic expression

rather than a value. The interesting rules are the ones for if. Here, we evaluate the condition to a
symbolic predicate g′. In the true case, we use an SMT solver to verify that the guard is satisfiable
when conjoined with the existing path condition. If that’s the case, we continue by evaluating the
true branch symbolically. The false case is analogous.
We leave the rule for while to the reader, following the principles behind the if rules above.
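Below is a small, hedged sketch of how these statement rules might look in an implementation: the path guard and environment are threaded through execution, an SMT solver performs the SAT checks, and the executor forks at if to explore every feasible branch. It assumes the z3-solver package; the tuple encoding of statements and the example program are invented for illustration and are not the While syntax used in these notes.

```python
# Illustrative symbolic executor for assignment, sequencing, and if.
# Statements are tuples: ("assign", x, aexp), ("seq", s1, s2), ("if", p, s1, s2).
# Expressions and predicates are functions from the symbolic environment to
# z3 terms, mirroring the expression rules above.
from z3 import Int, Solver, And, Not, BoolVal, sat

def is_sat(guard):
    solver = Solver()
    solver.add(guard)
    return solver.check() == sat

def exec_sym(g, env, stmt):
    """Yield every reachable final <g', E'> pair for <g, E, stmt>."""
    kind = stmt[0]
    if kind == "assign":                       # big-assign
        _, x, aexp = stmt
        new_env = dict(env)
        new_env[x] = aexp(env)                 # RHS becomes a symbolic expression
        yield g, new_env
    elif kind == "seq":                        # big-seq
        _, s1, s2 = stmt
        for g1, env1 in exec_sym(g, env, s1):
            yield from exec_sym(g1, env1, s2)
    elif kind == "if":                         # big-iftrue / big-iffalse
        _, pred, s1, s2 = stmt
        p = pred(env)
        if is_sat(And(g, p)):                  # true branch feasible?
            yield from exec_sym(And(g, p), env, s1)
        if is_sat(And(g, Not(p))):             # false branch feasible?
            yield from exec_sym(And(g, Not(p)), env, s2)
    else:
        raise ValueError(f"unknown statement kind: {kind}")

# Example: if x > 0 then y := x else y := 0 - x, starting with x bound to alpha.
alpha = Int("alpha")
prog = ("if", lambda e: e["x"] > 0,
        ("assign", "y", lambda e: e["x"]),
        ("assign", "y", lambda e: 0 - e["x"]))
for guard, final_env in exec_sym(BoolVal(True), {"x": alpha}, prog):
    print(guard, "=>", final_env["y"])
```

A while loop could be handled in the same style as if, unrolling the body as long as the loop guard remains satisfiable, typically with a bound on the number of iterations to guarantee termination.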

3 Symbolic Execution Implementation and Industrial Use


Of course programs with loops have infinite numbers of paths, so exhaustive symbolic execution
is not possible. Instead, tools use heuristics, such as exploring the execution tree down to a certain
depth, limiting loop iterations to a small constant, or trying to find at least one path that covers
each line of code in the program. In order to avoid analyzing complex library code, symbolic
executors may use an abstract model of libraries.
Symbolic execution has been used in industry for the last couple of decades. One of the most
prominent examples is the use of PREfix to find errors in C/C++ code within Microsoft [1].

References
[1] W. R. Bush, J. D. Pincus, and D. J. Sielaff. A static analyzer for finding dynamic programming
errors. Software: Practice and Experience, 30:775–802, 2000.

Mixing Type Checking and Symbolic Execution

Khoo Yit Phang Bor-Yuh Evan Chang Jeffrey S. Foster


University of Maryland, College Park University of Colorado, Boulder University of Maryland, College Park
[email protected] [email protected] [email protected]

Abstract

Static analysis designers must carefully balance precision and efficiency. In our experience, many static analysis tools are built around an elegant, core algorithm, but that algorithm is then extensively tweaked to add just enough precision for the coding idioms seen in practice, without sacrificing too much efficiency. There are several downsides to adding precision in this way: the tool's implementation becomes much more complicated; it can be hard for an end-user to interpret the tool's results; and as software systems vary tremendously in their coding styles, it may require significant algorithmic engineering to enhance a tool to perform well in a particular software domain.

In this paper, we present MIX, a novel system that mixes type checking and symbolic execution. The key aspect of our approach is that these analyses are applied independently on disjoint parts of the program, in an off-the-shelf manner. At the boundaries between nested type checked and symbolically executed code regions, we use special mix rules to communicate information between the off-the-shelf systems. The resulting mixture is a provably sound analysis that is more precise than type checking alone and more efficient than exclusive symbolic execution. In addition, we also describe a prototype implementation, MIXY, for C. MIXY checks for potential null dereferences by mixing a null/non-null type qualifier inference system with a symbolic executor.

Categories and Subject Descriptors D.2.4 [Software Engineering]: Software/Program Verification; D.2.5 [Software Engineering]: Testing and Debugging—Symbolic execution; F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages—Program analysis

General Terms Languages, Verification

Keywords Mix, mixed off-the-shelf analysis, symbolic execution, type checking, mix rules, false alarms, precision

PLDI'10, June 5–10, 2010, Toronto, Ontario, Canada. Copyright © 2010 ACM 978-1-4503-0019/10/06...$10.00

1. Introduction

All static analysis designers necessarily make compromises between precision and efficiency. On the one hand, static analysis must be precise enough to prove properties of realistic software systems, and on the other hand, it must run in a reasonable amount of time and space. One manifestation of this trade-off is that, in our experience, many practical static analysis tools begin with a relatively straightforward algorithm at their core, but then gradually accrete a multitude of special cases to add just enough precision without sacrificing efficiency.

Some degree of fine tuning is inevitable—undecidability of static analysis means that analyses must be targeted to programs of interest—but an ad-hoc approach has a number of disadvantages: it significantly complicates the implementation of a static analysis algorithm; it is hard to be sure that all the special cases are handled correctly; and it makes the tool less predictable and understandable for an end-user since the exact analysis algorithm becomes obscured by the special cases. Perhaps most significantly, software systems are extremely diverse, and programming styles vary greatly depending on the application domain and the idiosyncrasies of the programmer and her community's coding standards. Thus an analysis that is carefully tuned to work in one domain may not be effective in another domain.

In this paper, we present MIX, a novel system that trades off precision and efficiency by mixing type checking—a coarse but highly scalable analysis—with symbolic execution [King 1976], which is very precise but inefficient. In MIX, precision versus efficiency is adjusted using typed blocks {t e t} and symbolic blocks {s e s} that indicate whether expression e should be analyzed with type checking or symbolic execution, respectively. Blocks may nest arbitrarily to achieve the desired level of precision versus efficiency.

The distinguishing feature of MIX is that its type checking and symbolic execution engines are completely standard, off-the-shelf implementations. Within a typed or symbolic block, the analyses run as usual. It is only at the boundary between blocks that we use special mix rules to translate information back-and-forth between the two analyses. In this way, MIX gains precision at limited cost, while potentially avoiding many of the pitfalls of more complicated approaches.

As a hypothetical example, consider the following code:

1 {s
2   if (multithreaded) {t fork(); t}
3   {t . . . t}
4   if (multithreaded) {t lock(); t}
5   {t . . . t}
6   if (multithreaded) {t unlock(); t}
7 s}

This code uses multiple threads only if multithreaded is set to true. Suppose we have a type-based analysis that checks for data races. Then assuming the analysis is path-insensitive, it cannot tell whether a thread is created on line 2, and it does not know the lock state after lines 4 and 6—all of which will lead to false positives.

Rather than add path-sensitivity to our core data race analysis, we can instead use MIX to gain precision. We wrap the program in a symbolic block at the top level so that the executions for each setting of multithreaded will be explored independently. Then for performance, we wrap all the other code (lines 3 and 5 and the calls to fork, lock, and unlock) in typed blocks, so that they are analyzed with the type-based analysis. In this case, these block annotations
effectively cause the type-based analysis to be run twice, once for bit of path sensitivity to increase the precision of type checking.
each possible setting of multithreaded; and by separating those two For example, we can avoid analyzing unreachable code:
cases, we avoid conflation and eliminate false positives.
{t . . .{s if true then {t 5 t} else {t ”foo” + 3 t} s} . . . t}
While M IX cannot address every precision/efficiency tradeoff
issue (for example, the lexical scoping of typed and symbolic This code runs without errors, but pure type checking would com-
blocks is one limitation), there are nonetheless many potential ap- plain about the potential type error in the false branch. However,
plications. Among other uses, M IX can encode forms of flow- with these block annotations added in M IX, the symbolic executor
sensitivity, context-sensitivity, path-sensitivity, and local type re- will only invoke the type checker for the true branch and hence will
finements. M IX can also use type checking to overcome some lim- avoid a false positive.
itations of symbolic execution (Section 2). Also, for the purposes We can also use symbolic execution to gain some flow sensi-
of this paper, we leave the placement of block annotations to the tivity. For example, in a dynamically-typed imperative language,
programmer, but we envision that an automated refinement algo- programmers may reuse variables as different types, such as in the
rithm could heuristically insert blocks as needed. In this scenario, following:
M IX becomes an intermediate language for modularly combining
{t . . .{s var x = 1; {t . . . t} ; x = ”foo”; s} . . . t}
off-the-shelf analyzer implementations.
In this paper, we formalize M IX for a small imperative lan- Here the local variable x is first assigned an integer and is later
guage, mixing a standard type checking system with symbolic ex- reused to refer to a string. With the annotations above, we can
ecution to yield a system to check for the absence of run-time type successfully statically check such code using the symbolic executor
errors. Thus, rather than checking for assertion failures, as a typical to distinguish the two different assignments to x, then type check
symbolic executor might do, our formal symbolic executor reports the code in between.
any type mismatches it detects. To mix these two systems together, Similar cases can occur if we try to apply a non-standard type
we introduce two new rules: one rule in the type system that “type system to existing code. For example, in our case study (Sec-
checks” blocks {s e s} using the symbolic executor; and one rule tion 4.5), we applied a nullness checker based on type qualifiers
in the symbolic executor that “executes” blocks {t e t} using the to C. We found some examples like the following code:
type checker. We prove that the type system, symbolic executor,
{t . . .{s x→obj = NULL;
and mix of the two systems are sound. The soundness proof of M IX x →obj = (. . .)malloc(. . .); s} . . . t}
uses the proofs of type soundness and symbolic execution sound-
ness essentially as-is, which provides some additional evidence of a Here x→obj is initially assigned to NULL, immediately before
clean modularization. Additionally, two features of our formalism being assigned a freshly allocated location. A flow-insensitive type
for symbolic execution may be of independent interest: we discuss qualifier system would think that x→obj could be NULL after this
the tradeoff between “forking” the symbolic executor and giving pair of assignments, even though it cannot be.
more work to the solver; and we provide a soundness proof, which, Finally, we can also use symbolic execution to gain context-
surprisingly, we have been unable to find for previous symbolic ex- sensitivity, though at the cost of duplicate work. For example, in
ecution systems (Section 3). the following:
Finally, we describe M IXY, a prototype implementation of M IX
{s let id x = x in {t . . .{s id 3 s} . . .{s id 3.0 s} . . . t} s}
for C. M IXY combines a simple, monomorphic type qualifier in-
ference system (a reimplementation of Foster et al. [2006]) with a the identity function id is called with an int and a float. Rather than
C symbolic executor. There are two key challenges that arise when adding parametric polymorphism to type check this example, we
mixing type inference rather than checking: we need to perform could wrap those calls in symbolic blocks, which in M IX causes
a fixed-point computation as we switch between typed and sym- the calls to be checked with symbolic execution. While this is
bolic blocks since data values can pass from one to the other and likely not useful for standard type checking, for which parametric
back; and we need to integrate aliasing information into our analy- polymorphism is well-understood, it could be very useful for a
sis so that pointer manipulations performed within symbolic blocks more advanced type system for which fully general parametric
correctly influence typed blocks. Additionally, we extend M IXY to polymorphic type inference might be difficult to implement or
support caching block results as well as recursion between blocks. perhaps even undecidable.
We use M IXY to look for null pointer errors in a reasonably-sized A combination of context sensitivity and path sensitivity is
benchmark vsftpd; we found several examples where adding sym- possible with M IX. For example, consider the following:
bolic blocks can eliminate false positives compared to pure type
{s
qualifier inference (Section 4). let div x y = if y = 0 then ‘‘err’’ else x / y in
We believe that M IX provides a promising new approach to {t . . .+ {s div 7 4 s} t}
trading off precision and efficiency in static analysis. We expect s}
that the ideas behind M IX can be applied to many different combi-
nations of many different analyses. where the div function may return an int or a string, but it returns a
string (indicating error) only when the second argument is 0. Note
that this level of precision would be out of the reach of parametric
2. Motivating Examples polymorphism by itself.
Before describing M IX formally, we examine some coding idioms
Local Refinements of Data. Symbolic execution can also poten-
for which type inference and symbolic execution can profitably be
tially be used to model data more precisely for non-standard type
mixed. Our examples will be written in either an ML-like language
systems. As one example, suppose we introduce a type qualifier
or C-like language, depending on which one is more natural for the
system that distinguishes the sign of an integer as either positive,
particular example.
negative, zero, or unknown. Then we can use symbolic execution
Path, Flow, and Context Sensitivity. In the introduction, we to refine the type of an integer after a test:
saw one example in which symbolic execution introduced a small {t
amount of path sensitivity to type inference. There are several po- let x : unknown int = . . .in
tential variations on this example where we can locally add a little {s
if x > 0 then {t (∗ x : pos int ∗) . . . t} Source Language.
else if x = 0 then {t (∗ x : zero int ∗) . . . t} e ::= x | v variables, constants
else {t (∗ x : neg int ∗) . . . t} | e+e arithmetic
s} | e = e | ¬e | e ∧ e predicates
t} | if e then e else e conditional
Here on entry to the symbolic block, x is an unknown integer, so | let x = e in e let-binding
the symbolic executor will assign it an initial symbolic value αx | ref e | !e | e := e references
ranging over all possible integers. Then at the conditional branches, | {t e t} type checking block
the symbolic executor will fork and explore the three possibilities: | {s e s} symbolic execution block
αx > 0, αx = 0, and αx < 0. On entering the typed block in v ::= n | true | false concrete values
each branch, since the value of x is constrained in the symbolic Types, Symbolic Expressions, and Environments.
execution, the type system will start with the appropriate type for x,
either pos, zero, or neg int, respectively. τ ::= int | bool | τ ref types
As another example, suppose we have a type system to prevent Γ ::= ∅ | Γ, x : τ typing environment
data races in C. Then a common problem that arises is analyzing
local initialization of shared data [Pratikakis et al. 2006]. Consider s ::= u:τ typed symbolic expressions
the following code: g ::= u:bool guards
u ::= α | v symbolic variables, constants
{t | u:int + u:int arithmetic
{s | s = s | ¬g | g ∧ g predicates
x = (struct foo ∗) malloc(sizeof(struct foo));
| m[u:τ ref] memory select
x→bar = . . .;
x→baz = . . .; m ::= µ arbitrary memory
x→qux = . . .; | m, (s  s) memory update
a
s} | m, (s  s) memory allocation
insert(shared data structure, x); Σ ::= ∅ | Σ, x : s symbolic environment
t}
Figure 1. Program expressions, types, and symbolic expressions.
Here we allocate a new block of memory and then initialize it in
several steps before it becomes shared. A flow-insensitive type-
based analysis would report an error because the writes through in the top portion of Figure 1, with which we study the essence of
x occur without a lock held. On the other hand, if we wrap the switching blocks for mixing analyses. Our language includes vari-
allocation and initialization in a symbolic block, as above, symbolic ables x; integers n; booleans true and false; selected arithmetic and
execution can easily observe that x is local during the initialization boolean operations +, =, ¬, and ∧; conditionals with if; let bind-
phase, and hence the writes need not be protected by a lock. ings; and ML-style updatable references with ref (construction), !
Helping Symbolic Execution. The previous examples considered (dereference), and := (assignment). We also include two new block
adding precision in type checking through symbolic execution. forms, typed blocks {t e t} and symbolic blocks {s e s}, which
Alternatively, typed blocks can potentially be used to introduce indicate e should be analyzed with type checking or symbolic ex-
conservative abstraction in symbolic execution when the latter is ecution, respectively. We leave unspecified whether the outermost
not viable. For example: scope of a program is treated as a typed block or a symbolic block;
M IX can handle either case.
{s
let x = {t unknown function() t} in . . . 3.1 Type Checking and Symbolic Execution
let y = {t 2∗∗z (∗ operation not supported by solver ∗) t} in . . .
{t while true do {s loop body s} t} Type checking for our source language is entirely standard, and
s}
so we omit those rules here. Our type checking system proves
judgments of the form Γ ` e : τ , where Γ is the type environment
The first line contains a call to a function whose source code is not and τ is e’s type. Grammars for Γ and τ are given in the bottom
available, so we cannot symbolically execute the call. However, if portion of Figure 1.
we know the called function’s type, then we can wrap the call in The remainder of this section describes a generic symbolic ex-
a typed block (assuming the function has no side effects), conser- ecutor. While the concept of symbolic execution is widely known,
vatively modeling its return value as any possible member of its there does not appear to be a clear consensus of its definition. Thus,
return type. Similarly, on the second line, we are performing an we make explicit our definition of symbolic execution here through
exponentiation operation, and let us suppose the symbolic execu- a formalization similar to an operational semantics. Such a formal-
tor’s solver cannot model this operation if z is symbolic. Then by ization enables us to describe the switching between type checking
wrapping the operation in a typed block, we can continue symbolic and symbolic execution in a uniform manner.
execution, again conservatively assuming the result of the exponen-
tiation is any member of the result type. The third line shows how Symbolic Expressions, Memories, and Environments. The re-
we could potentially handle long-running loops by wrapping them mainder of Figure 1 describes the symbolic expressions and en-
in typed blocks, so that symbolic execution would effectively skip vironments used by our symbolic executor. Symbolic expressions
over them rather than unroll them (infinitely). We can also recover are used to accumulate constraints in deferral rules. For example,
some precision within the loop body by further wrapping the loop the symbolic expression (α:int + 3:int):int represents a value that
body with a symbolic block. is three more than the unknown integer α.
Because we are concerned with checking for run-time type er-
rors, in our system symbolic expressions s have the form u:τ ,
3. The M IX System where u is a bare symbolic expression and τ is its type. With these
In the previous section, we considered a number of idioms that mo- type annotations, we can immediately determine the type of a sym-
tivate the design of M IX. Here, we consider a core language, shown bolic expression, just like in a concrete evaluator with values. As a
shorthand, we use g to represent conditional guards, which are just Symbolic Execution. Σ ` hS ; ei ⇓ hS 0 ; si S = hg ; mi
symbolic expressions with type bool. Bare symbolic expressions u
may be symbolic variables α (e.g., α:int is a symbolic integer, and SEVAR
α:bool is a symbolic boolean); known values v; or operations +,
=, ¬, ∧ applied to symbolic expressions of the appropriate type. Σ, x : s ` hS ; xi ⇓ hS ; si
Notice that our syntax forbids the formation of certain ill-typed
SEVAL
symbolic expression (e.g., α1 :int + α2 :bool is not allowed).
Symbolic expressions also include symbolic memory accesses Σ ` hS ; vi ⇓ hS ; (v: typeof(v))i
m[u:τ ref], which represents an access through pointer u in sym-
bolic memory m. A symbolic memory may be µ, representing an SEP LUS
arbitrary but well-typed memory; m, (s  s0 ), a memory that Σ ` hS ; e1 i ⇓ hS1 ; u1 :inti Σ ` hS1 ; e2 i ⇓ hS2 ; u2 :inti
is the same as m except location s is updated to contain s0 ; or Σ ` hS ; e1 + e2 i ⇓ hS2 ; (u1 :int + u2 :int):inti
a
m, (s  s0 ), which is the same as m except newly allocated lo-
cation s points to s0 . These are essentially McCarthy-style sel and SEE Q
upd expressions that allow the symbolic executor to accumulate Σ ` hS ; e1 i ⇓ hS1 ; u1 :τ i Σ ` hS1 ; e2 i ⇓ hS2 ; u2 :τ i
a log of writes and allocations while deferring alias analysis. An Σ ` hS ; e1 = e2 i ⇓ hS2 ; (u1 :τ = u2 :τ ):booli
allocation always creates a new location that is distinct from the lo-
cations in the base unknown memory, so we distinguish them from SEN OT
arbitrary writes. Σ ` hS ; e1 i ⇓ hS1 ; g1 i
Finally, symbolic environments Σ map local variables x to Σ ` hS ; ¬e1 i ⇓ hS1 ; ¬g1 :booli
(typed) symbolic expressions s.
Symbolic Execution for Pure Expressions. Figure 2 describes SEA ND
our symbolic executor on pure expressions using what are essen- Σ ` hS ; e1 i ⇓ hS1 ; g1 i Σ ` hS1 ; e2 i ⇓ hS2 ; g2 i
tially big-step operational semantics rules. The rules in Figure 2 Σ ` hS ; e1 ∧ e2 i ⇓ hS2 ; (g1 ∧ g2 ):booli
prove judgments of the form
SEL ET
Σ ` hS ; ei ⇓ hS 0 ; si Σ ` hS ; e1 i ⇓ hS1 ; s1 i Σ, x : s1 ` hS1 ; e2 i ⇓ hS2 ; s2 i
meaning with local variables bound in Σ, starting in state S, expres- Σ ` hS ; let x = e1 in e2 i ⇓ hS2 ; s2 i
sion e evaluates to symbolic expression s and updates the state to
S 0 . In our symbolic execution judgment, a state S is a tuple hg ; mi, SEI F -T RUE
where g is a path condition constraining the current state and m is Σ ` hS ; e1 i ⇓ hS1 ; g1 i
the current symbolic memory. The path condition begins as true, Σ ` hS1 [g 7→ g(S1 ) ∧ g1 ] ; e2 i ⇓ hS2 ; s2 i
and whenever the symbolic executor makes a choice at a condi- Σ ` hS ; if e1 then e2 else e3 i ⇓ hS2 ; s2 i
tional, we extend the path condition to remember that choice (more
on this below). We write X(S) for the X component of S, with SEI F -FALSE
X ∈ {g, m}, and similarly we write S[X 7→ Y ] for the state that Σ ` hS ; e1 i ⇓ hS1 ; g1 i
is the same as S, except its X component is now Y . Σ ` hS[g 7→ g(S1 ) ∧ ¬g1 ] ; e3 i ⇓ hS3 ; s3 i
Most of the rules in Figure 2 are straightforward and intend to Σ ` hS ; if e1 then e2 else e3 i ⇓ hS3 ; s3 i
summarize typical symbolic executors. Rule SEVAR evaluates a
local variable by looking it up in the current environment. Notice
that, as with standard operational semantics, there is no reduction Figure 2. Symbolic execution for pure expressions.
possible if the variable is not in the current environment. Rule
SEVAL reduces values to themselves, using the auxiliary function
typeof(v) that examines the value form to return its type (i.e., (so-called “concolic execution”), but then would ask an SMT solver
typeof(n) = int and typeof(true) = typeof(false) = bool). later whether the path not taken was feasible and, if so, come back
Rules SEP LUS, SEE Q, SEN OT, and SEA ND execute the and take it eventually. All of these implementation choices can be
subexpressions and then form a new symbolic expression with +, viewed as optimizations to prune infeasible paths or hints to focus
=, ¬, or ∧, respectively. Notice that these rules place requirements the exploration. Since we are not concerned with performance in
on the subexpressions—for example, SEP LUS requires that the our formalism, we simply extend the path condition and continue—
subexpressions reduce to symbolic integers, and SEN OT requires eventually, when symbolic execution completes, we will check the
that the subexpression reduces to a guard (a symbolic boolean). If path condition and discard the path if it is infeasible. To get sound
the subexpression does not reduce to an expression of the right type, symbolic execution, we will compute a set of symbolic executions
then symbolic execution fails. Thus, these rules form a symbolic and require that all feasible paths are explored (see Section 3.2).
execution engine that does very precise dynamic type checking. Sometimes, the symbolic executor may want to throw away
Rule SEL ET symbolically executes e1 and then binds e1 to x for information (e.g., replace a symbolic expression for a compli-
execution of e2 . The last two rules, SEI F -T RUE and SEI F -FALSE, cated memory read with a fresh symbolic variable). Such a rule
model a pure, non-deterministic version of the kind of symbolic ex- is straightforward to add, but as discussed in Section 3.2, a nested
ecution popularized by DART [Godefroid et al. 2005], CUTE [Sen typed block {t e t} serves a similar purpose.
et al. 2005], EXE [Cadar et al. 2006], and KLEE [Cadar et al.
2008]. When we reach a conditional, we conceptually fork exe- Deferral Versus Execution. Consider again the rules for sym-
cution, extending the path condition with g1 or ¬g1 to indicate the bolic execution on pure expressions in Figure 2. Excluding the triv-
branch taken. EXE and KLEE would both invoke an SMT solver at ial SEVAL rule, the first set of rules (SEP LUS, SEE Q, SEN OT,
this point to decide whether one or both branches are feasible, and and SEA ND) versus the second set (SEL ET, SEVAR, SEI F -T RUE,
then try all feasible paths. DART and CUTE, in contrast, would SEI F -FALSE) seem qualitatively different. The first set simply get
continue down one path as guided by an underlying concrete run symbolic expressions for their subexpressions and form a new sym-
bolic expression of the corresponding operator, essentially defer- Symbolic Execution for References. Σ ` hS ; ei ⇓ hS 0 ; si
ring any reasoning about the operation (e.g., to an SMT solver).
In contrast, the second set does not accumulate any such symbolic SER EF
expression but rather chooses a possible concrete execution to fol- Σ ` hS ; e1 i ⇓ hS1 ; u1 :τ i α∈ / Σ, S, S1 , u1
S 0 = S1 [m 7→ (m(S1 ), (α:τ ref  u1 :τ ))]
a
low. For example, we can view SEI F -T RUE as choosing to assume
that g1 is concretely true and proceeding to symbolically execute Σ ` hS1 ; ref e1 i ⇓ hS 0 ; α:τ refi
e2 . This assumption is recorded in the path condition. (The SEL ET
and SEVAR rules are degenerate execution rules where no assump- SEA SSIGN
tions need to be made because there is only one possible concrete Σ ` hS ; e1 i ⇓ hS1 ; s1 i Σ ` hS1 ; e2 i ⇓ hS2 ; s2 i
execution for each.) Alternatively, we see that there are symbolic Σ ` hS ; e1 := e2 i ⇓ hS2 [m 7→ (m(S2 ), (s1  s2 ))] ; s2 i
expression forms for +, =, ¬, and ∧ but not for let, program vari-
ables, and if. SED EREF
Although it is not commonly presented as such, the decision Σ ` hS ; e1 i ⇓ hS1 ; u1 :τ refi ` m(S1 ) ok
of deferral versus execution is a design choice. For example, let
Σ ` hS ; !e1 i ⇓ hS1 ; m(S1 )[u1 :τ ref]:τ i
us include an if-then-else symbolic expression g?s1 :s2 (using a C-
style conditional syntax) that evaluates to s1 if g evaluates to true
and s2 otherwise. Then, we could defer to the evaluation of the
conditional to the solver with the following rule: Memory Type Consistency. ` m ok U ` m ok
SEI F -D EFER E MPTY-OK A LLOC -OK
Σ ` hS ; e1 i ⇓ hS1 ; g1 i ` m ok U
Σ ` hS[g 7→ g(S1 ) ∧ g1 ] ; e2 i ⇓ hS2 ; u2 :τ i ` µ ok ∅ a
` m, (α:τ ref  u2 :τ ) ok U
Σ ` hS[g 7→ g(S1 ) ∧ ¬g1 ] ; e3 i ⇓ hS3 ; u3 :τ i
S 0 = h(g1 ?g(S2 ):g(S3 )) ; (g1 ?m(S2 ):m(S3 ))i OVERWRITE -OK
Σ ` hS ; (if e1 then e2 else e3 )i ⇓ hS 0 ; (g1 ?(u2 :τ ):(u3 :τ )):τ i ` m ok U U 0 = U \ {s1  s2 | s1 ≡ u1 :τ ref ∧ s1  s2 ∈ U }
Here notice we also have to extend the ·? · :· relation to operate ` m, (u1 :τ ref  u2 :τ ) ok U 0
over memory as well. With this rule, we need not “fork” symbolic
execution at all. However, note that even with conditional symbolic A RBITRARY-N OT OK M-OK
expressions and condition symbolic memory, this rule is more con- ` m ok U ` m ok ∅
servative than the SEI F -T RUE and SEI F -FALSE execution rules, as ` m, (s1  s2 ) ok (U ∪ {s1  s2 }) ` m ok
it requires both branches to have the same type.
Conversely, other rules may also be made non-deterministic in
manner similar to SEI F -*. For example, SEVAR may instead return Figure 3. Symbolic execution for updatable references.
an arbitrary value v and add Σ(x) = v to the path condition, a style
that resembles hybrid concolic testing [Majumdar and Sen 2007].
A special case of execution rules are ones that apply only when the contents. How, then, do we determine the type of the pointed-
we have concrete values during symbolic execution and thus do not to value? We need the type so that we can halt symbolic execution
need to “fork.” For example, we could have a SEP LUS -C ONC that later if that value is used in a type-incorrect manner. That is, we do
applies to two concrete values n1 , n2 and returns the sum. This not want to defer the discovery of a potential type error.
approach is reminiscent of partial evaluation. Our solution is to use the type annotation on the pointer to
These choices trade off the amount of work done between the get the type of the contents—but above we just explained that
symbolic executor and the underlying SMT solver. For example, SEA SSIGN allows writes to violate those type annotations. There
SEI F -D EFER introduces many disjunctions into symbolic expres- are many potential ways to solve this problem. We could invoke
sions, which then may be hard to solve efficiently. To match current an SMT solver to compute the actual set of addresses that could
practice, we stick with the forking variant for conditionals, but we be dereferenced and fork execution for each one. Or we could
believe our system would also be sound with SEI F -D EFER. proceed as our implementation and use an external alias analysis
to conservatively model all possible locations that could be read to
Symbolic References. Figure 3 continues our symbolic executor check that the values at all locations have the same type (Section 4).
definition with rules for updatable references. We use deferral rules However, to keep the formal system simple, we choose a very
for all aspects of references in our formalization. Rule SER EF eval- coarse solution: we require that all pointers in memory are well-
uates e1 and extends m(S1 ) with an allocation for fresh symbolic typed with the check ` m(S1 ) ok.
pointer α. Similarly, rule SEA SSIGN extends S2 to record that s1 This judgment is defined in the bottom portion of Figure 3 in
now points to s2 . Observe that allocations and writes are simply terms of the auxiliary judgment ` m ok U , which means mem-
logged during symbolic execution for later inspection. Also, no- ory m is consistently typed (pointers point to values of the right
tice that we allow any value to be written to s1 , even if it does not type), except for mappings in U . There are four cases for this judg-
match the type annotation on s1 . In contrast, standard type systems ment. E MPTY-OK says that arbitrary well-typed memory µ is con-
require that any writes to memory must preserve types since the sistently typed. Similarly, A LLOC -OK says that if m is consistently
type system does not track enough information about pointers to be typed except for potentially inconsistent writes in U , then adding an
sound if that property is violated. Symbolic execution tracks every allocation preserves consistent typing up to U . Rule OVERWRITE -
possible program execution precisely, and so it can allow arbitrary OK says that if ` m ok U and we extend m with a well-typed
memory writes. write to u1 , then any previous, inconsistent writes to locations
In SED EREF, we evaluate e1 to a pointer u1 :τ ref and then s1 ≡ u1 :τ ref can be ignored. Here by ≡ we mean syntactic equiv-
produce the symbolic expression m(S1 )[u1 :τ ref]:τ to represent alence, but in practice we could query a solver to validate such
the contents of that location. However, here we are faced with a an equality given the current path condition. Rule A RBITRARY-
challenge: we are not actually looking up the contents of memory; N OT OK says that any write can be added to U and viewed as po-
rather, we are simply forming a symbolic expression to represent tentially inconsistent. Finally, M-OK says that ` m ok if m has no
Block Typing. Γ`e:τ the current symbolic memory state be consistent, since the typed
block relies purely on type information (rather than tracking pointer
TS YM B LOCK values as symbolic execution does). Then we type check e in Γ,
Σ(x) = αx :Γ(x) (for all x ∈ dom(Γ)) yielding a type τ . The typed block itself symbolically evaluates to
Σ ` hS ; ei ⇓ hSi ; ui :τ i S = htrue ; µi µ∈ /Σ a fresh symbolic variable α of type τ . Since the typed block may
` m(Si ) ok exhaustive(g(S1 ), . . . , g(Sn )) (i ∈ 1..n) have written to memory, we conservatively set the memory of the
Γ ` {s e s} : τ output state to a fresh µ0 , indicating we know nothing about the
memory state at that point except that it is consistent.
exhaustive(g1 , . . . , gn ) ⇐⇒ (g1 ∨ . . . ∨ gn is a tautology) Note that in our formalism, we do not have typed blocks within
typed blocks, or symbolic blocks within symbolic blocks, though
Block Symbolic Execution. Σ ` hS ; ei ⇓ hS 0 ; si these would be trivial to add (by passing-through).
SET YP B LOCK Why Mix? The mix rules are essentially as precise as possible
`Σ:Γ ` m(S) ok Γ`e:τ µ0 , α ∈
/ Σ, S given the strengths and limitations of each analysis. The nested
0
Σ ` hS ; {t e t}i ⇓ hS[m 7→ µ ] ; α:τ i analysis starts with the maximum amount of information that can
be extracted from the other static analysis—for symbolic blocks,
Symbolic and Typing Environment Conformance. `Σ:Γ the only available information for symbolic execution is types,
whereas for typed blocks, the type checker only cares about types of
dom(Σ) = dom(Γ) variables and thus abstracts away the symbolic expressions. After
Σ(x) = u:Γ(x) (for all x ∈ dom(Γ)) the nested analysis is complete, the result is similarly passed back
`Σ:Γ to the enclosing analysis as precisely as possible.
For this paper, we deliberately chose two analyses at oppo-
Figure 4. Mixing symbolic execution and type checking. site ends of the precision spectrum: type checking is cheap, flow-
insensitive with a rather coarse abstraction, while symbolic execu-
tion is expensive, flow- and path-sensitive (and context-sensitive if
inconsistent writes that persist. Together, these rules ensure that the we add functions) with a minimal amount of abstraction (i.e., it
type assigned to the result of a dereference is sound. We can also is not even a proper program analysis per se, as there are no ter-
see how the SED EREF may be made more precise by only requir- mination guarantees). They also work in such a different manner
ing consistency up to a set of writes U and querying a solver to that it does not seem particularly natural to combine them in tighter
show that u1 :τ ref are disequal to all the address expressions in U . ways (e.g., as a reduced product of abstract interpreters [Cousot and
Cousot 1979]). We think it is surprising just how much additional
3.2 Mixing precision we can obtain and the kinds of idioms we can analyze
from such a simple mixing of an entirely standard type system and
In the previous section, we considered type checking and symbolic a typical symbolic executor as-is (as we see in Section 2). We note
execution separately, ignoring the blocks that indicate a switch in that a type system capturing all of the examples in Section 2 would
analysis. Figure 4 shows the two mix rules that capture switching likely be quite advanced (involving, for example, dependent types).
between analyses. However, as can be seen in Figure 4, the conversion between
Rule TS YM B LOCK describes how to type check a symbolic these two analyses may be extremely lossy. For example, in
block {s e s}, that is, how to apply symbolic execution to de- SET YP B LOCK, the memory after returning from the type checker
rive a type of a subexpression for a type checker. First, we con- must be a fresh arbitrary memory µ0 because e may make any num-
struct an environment Σ that maps each variable x in Γ to a fresh ber of writes not captured by the type system and thus not seen by
symbolic variable αx , whose type is extracted from Γ. Then we the symbolic executor. We can also imagine mixing any number of
run the symbolic execution under Σ, starting in a state with true analyses in arbitrary combination, yielding different precision/effi-
for the path condition and a fresh symbolic variable µ to stand ciency tradeoffs. For example, if we were to use a type and effect
for the current memory. Recall that, because of SEI F -T RUE and system rather than just a type system, we could avoid introducing a
SEI F -FALSE, symbolic execution is actually non-deterministic— completely fresh memory µ0 in SET YP B LOCK—instead, we could
it conceptually can branch at conditionals. If we want to soundly find the effect of e and limit applying this “havoc” operation only
model the entire possible behavior of e, we need to execute all to locations that could have been changed.
paths. Thus, we run the symbolic executor n times, yielding final
states hSi ; ui :τ i for i ∈ 1..n, and we require that the disjunction
of the guards from all executions form a tautology. This constraint 3.3 Soundness
ensures that we exhaustively explore every possible path (see Sec- In this section, we sketch the soundness of M IX, which is de-
tion 3.3 about soundness). And if all those paths executed success- scribed in full detail in the appendix of our companion technical
fully without type errors and returned a value of the same type τ , report [Khoo et al. 2010]. The key feature of our proof is that aside
then that is the type of expression e. We also check that all paths from the mix rule cases, it reuses the standalone type soundness
leave memory in a consistent state. and symbolic execution soundness proofs essentially as-is.
Symbolic execution has typically been used as an unsound anal- We show soundness with respect to a standard big-step opera-
ysis where there is no exhaustiveness check like exhaustive(. . .) tional semantics for our simple language of expressions. Our se-
in the TS YM B LOCK. We can also model such unsound analysis by mantics is given by a judgment E ` hM ; ei → r. This says that
weakening exhaustive(. . .) to a “good enough check.” in a concrete environment E, an initial concrete memory M and an
The other rule, SET YP B LOCK, describes how to symbolically expression e evaluate to a result r. A concrete environment maps
execute a typed block {t e t}, that is, how to apply the type checker variables to values, while a concrete memory maps locations to val-
in the middle of a symbolic execution. We begin by deriving a type ues. The evaluation result r is either a concrete memory-value pair
environment Γ that maps local variables to the types of the symbols hM 0 ; vi or a distinguished error token.
they are mapped to in Σ. This mapping is described precisely by the To prove mix soundness, we consider simultaneously type and
judgment ` Σ : Γ, which is straightforward. We also require that symbolic execution soundness. While type soundness is standard,
we discuss it briefly, as it is a part of mix soundness, and provides Σ ` hS;ei ⇓ hS 0 ;si such that the symbolic state is an abstraction of
intuition for symbolic execution soundness. the concrete state (i.e., hE; M i ∼Λ0 ·V ·Λ hΣ; m(S)i). There is one
For type soundness, we introduce a memory type environment more premise, Jg(S 0 )KV , which says that the path condition accu-
Λ that maps locations to types, and we update the typing judgment mulated during symbolic execution holds under this valuation. This
to carry this additional environment, as Γ `Λ e : τ where Λ is constrains the concrete and symbolic executions to follow the same
constant in all rules. In many proofs, Λ is included in Γ rather path. With these premises, symbolic execution soundness says that
than being called out separately, but for Mix soundness separat- the result of symbolic execution, that is the memory-symbolic ex-
ing locations from variables makes the proof easier. To show type pression pair hm(S 0 ); si, is an abstraction of the concrete evalua-
soundness, we need a relation between the concrete environment tion result, which must be a memory-value pair hM 0 ; vi.
and memory hE; M i and the type environment and memory typing
Theorem 1 (M IX Soundness)
hΓ; Λi. We write this relation as hE; M i ∼ hΓ; Λi, which infor-
1. If
mally says two things: (1) the type environment Γ abstracts the
concrete environment E, that is, the concrete value v mapped by E ` hM ; ei → r and
each variable x in E has type Γ(x), and (2) the memory typing Λ
Γ `Λ e : τ such that
abstracts the concrete memory M , that is, the concrete value v at
each location l in M has type Λ(l). We also talk about the second hE; M i ∼ hΓ; Λi ,
component in isolation, in which case we write M ∼ Λ to mean
memory typing Λ abstracts the concrete memory M . then ∅ `Λ0 v : τ and M ∼ Λ for some M 0 , v, Λ0 such that
0 0

Type soundness is the first part of mix soundness (statement 1 in r = hM 0 ; vi and Λ0 ⊇ Λ.


Theorem 1, shown below). Let us consider the pieces. Suppose we 2. If
have a concrete evaluation E ` hM ; ei → r. We further suppose E ` hM ; ei → r and
that e has type τ in typing environments that are sound with respect
0
to the concrete state (i.e., hE; M i ∼ hΓ; Λi). Then, the result r Σ ` hS ; ei ⇓ hS ; si such that
must be a memory-value pair hM 0 ; vi where the resulting concrete
memory is abstracted by Λ0 , an extension of Λ, and the resulting hE; M i ∼Λ0 ·V ·Λ hΣ; m(S)i and Jg(S 0 )KV ,
value v has the same type τ in Γ with the extended memory typing 0
then r ∼Λ00 ·V 0 ·Λ0 hm(S ); si for some V 0
⊇ V and some
Λ0 . Notice this captures the notions that well-typed expressions Λ00 , Λ0 such that Λ00 ∗ Λ0 ⊇ Λ0 ∗ Λ.
cannot evaluate to error and that evaluation preserves typing.
For symbolic execution soundness, we need to ensure that a P ROOF
symbolic execution faithfully models actual concrete executions. By simultaneous induction on the derivations of E ` hM ; ei → r.
Let V be a valuation, which is a finite mapping from symbolic The proof is given in the appendix of our companion technical
values α to concrete values v or concrete memories M . We write report [Khoo et al. 2010].
JsKV , JmKV , and JΣKV for the natural extension of V to operate
This statement of symbolic execution soundness (part 2 in The-
on arbitrary symbolic expressions, memories, and the symbolic en-
orem 1) is what we need to show M IX sound, but at first glance,
vironment. Symbolic execution begins with symbolic values α for
it seems suspect because it does not say anything about symbolic
unknown inputs and accumulates a symbolic expression s that rep-
execution being exhaustive. However, if we look at type checking a
resents the result of the program. Then at a high-level, if symbolic
symbolic block (i.e., rule TS YM B LOCK), exhaustiveness is ensured
execution is sound, then a concrete run that begins with JαKV for
through the exhaustive(. . .) constraint.
inputs should produce the expression JsKV . In particular, we can state exhaustive symbolic execution as
To formalize this notion, we need a soundness relation between a corollary, and the case for TS YM B LOCK proceeds in the same
the concrete evaluation state and the symbolic execution state, just manner as this corollary.
as in type soundness. The form of our soundness relations for
symbolic execution states is as follows: Corollary 1.1 (Exhaustive Symbolic Execution)
Suppose E ` hM ; ei → hM 0 ; vi and we have n > 0 symbolic
hE; M i ∼Λ0 ·V ·Λ hΣ; mi executions
This relation captures two key properties. First, applying the valu- Σ ` hhtrue; mi ; ei ⇓ hSi ; si i such that
ation V to the symbolic state should yield the concrete state (i.e., exhaustive(g(S1 ), . . . , g(Sn )) and
JΣKV = E and JmKV = M ). Second, the types of symbolic ex-
pressions in Σ and m must be correctly related. Recall that an addi- hE; M i ∼Λ0 ·V ·Λ hΣ; mi ,
tional property of our typed symbolic execution is that it tracks the
type of symbolic expressions and halts upon encountering ill-typed then hM ; vi ∼Λ00 ·V 0 ·Λ0 hm(Si ); si i for some i ∈ 1..n, V 0 ⊇
0

expressions. The typing of symbolic reference expressions must be V , and some Λ00 , Λ0 such that Λ00 ∗ Λ0 ⊇ Λ0 ∗ Λ.
with respect to some memory typing. This memory typing is given Here we say that if we have n > 0 symbolic executions that each
by Λ0 and Λ. For technical reasons, we need to separate the loca- start with a path condition of true and where their resulting path
tions in the arbitrary memory on entry Λ0 from the locations that conditions are exhaustive (i.e., g(S1 ) ∨ . . . ∨ g(Sn ) is a tautology
come from allocations during symbolic execution Λ; to get typing meaning it holds under any valuation V ), then one of those sym-
for the entire memory, we write Λ0 ∗ Λ to mean the union of sub- bolic executions must match the concrete execution. Observe that
memory typings Λ0 and Λ with disjoint domains. Analogously, we in this statement, there is no premise on the resulting path condi-
also have a symbolic soundness relation that applies to memory- tion, but rather that we start with a initial path condition of true.
value pairs: hM ; vi ∼Λ0 ·V ·Λ hm; si.
As alluded to above, we first consider a notion of symbolic ex-
ecution soundness with respect to a concrete execution. This no- 4. M IXY: A Prototype of M IX for C
tion is what is stated in the second part of mix soundness (Theo- We have developed M IXY, a prototype tool for C that uses M IX
rem 1). Analogous to type soundness, it says that suppose we have to detect null pointer errors. M IXY mixes a (flow-insensitive) type
a concrete evaluation E ` hM ; ei → r and a symbolic execution qualifier inference system with a symbolic executor. M IXY is built
on top of the CIL front-end for C [Necula et al. 2002], and our with MIX(symbolic). We use CIL’s built-in pointer analysis to find
type qualifier inference system, CilQual, is essentially a simplified the targets of calls through function pointers. Finally, we switch
CIL reimplementation of the type qualifier inference algorithm to symbolic execution for each function marked MIX(symbolic) that
described by Foster et al. [2006]. Our symbolic executor, Otter was discovered at the frontier.
[Reisner et al. 2010], uses STP [Ganesh and Dill 2007] as its SMT In this section, we describe implementation details that are not
solver and works in a manner similar to KLEE [Cadar et al. 2008]. captured by our formal system from Section 3:
Type Qualifiers and Null Pointer Errors. For this application, we • The formal system M IX is based on a type checking system
introduce two qualifier annotations for pointers: nonnull indicates where all types are given. Since type qualifier inference in-
that a pointer must not be null, and null indicates that a pointer may volves variables, we need to handle variables that are not yet
be null. Our inference system automatically annotates uses of the constrained to concrete type qualifiers when transitioning to a
NULL macro with the null qualifier annotation. The type qualifier symbolic block (Section 4.1).
inference system generates constraints among known qualifiers • We need to translate information about aliasing between blocks
and unknown qualifier variables, solves those constraints, and then (Section 4.2).
reports a warning if null values may flow to nonnull positions. • Since the same block or function may be called from multiple
Thus, our type qualifier inference system ensures pointers that may contexts, we need to avoid repeating analysis of the same func-
be null cannot be used where non-null pointers are required. tion (Section 4.3).
For example, consider the following C code: • Since functions can contain blocks and be recursive, we need
1 void free(int ∗nonnull x);
to handle recursion between typed and symbolic blocks (Sec-
2 int ∗id(int ∗p) { return p; } tion 4.4).
3 int ∗x = NULL; Finally, we present our initial experience with M IXY (Section 4.5),
4 int ∗y = id(x);
and we discuss some limitations and future work (Section 4.6).
5 free(y);
Here on line 1 we annotate free to indicate it takes a nonnull pointer. 4.1 Translating Null/Non-null and Type Variables
Then on line 3, we initialize x to be NULL, pass that value through At transitions between typed and symbolic blocks, we need to
id, and store the result in y on line 4. Then on line 5 we call free translate null and nonnull annotations back and forth.
with NULL.
Our qualifier inference system will generate the following types From Types to Symbolic Values. Suppose local variable x has
and constraints (with some simplifications, and ignoring l- and r- type int ∗nonnull. Then in the symbolic executor, we initialize x
value issues): to point to a fresh memory cell. If x has type int ∗null, then we ini-
free : int ∗ nonnull → void x : int ∗β tialize x to be (α:bool)?loc:0, where α is a fresh boolean that may
be either true or false, loc is a newly initialized pointer (described
id : int ∗γ → int ∗δ y : int ∗ε in Section 4.2), and 0 represents null. Hence this expression means
null = β β=γ γ=δ δ=ε ε = nonnull x may be either null or non-null, and the symbolic executor will try
both possibilities.
Here β, γ, δ, and ε are variables that stand for unknown quali- A more interesting case occurs if a variable x has a type with
fiers. Put together, these constraints require null = nonnull, which a qualifier variable (e.g., int ∗β ). In this case, we first try to solve
is not allowed, and hence qualifier inference will report an error for the current set of constraints to see whether β has a solution as
this program. either null or nonnull, and if it does, we perform the translation
Our symbolic executor also looks for null pointer errors. The given above. Otherwise, if β could be either, we first optimistically
symbolic executor tracks C values at the bit level, using a repre- assume it is nonnull.
sentation similar to KLEE [Cadar et al. 2008]. A null pointer is We can safely use this assumption when returning from a typed
represented as the value 0, and the symbolic executor reports an block to a symbolic block since such a qualifier variable can only
error if 0 is ever dereferenced. be introduced when variables are aliased (e.g., via pointer assign-
Typed and Symbolic Blocks. In our formal system, we allow ment), a case that is separately taken into account by the M IXY
typed and symbolic blocks to be introduced anywhere in the memory model (Section 4.2).
program. In M IXY, these blocks can only be introduced around However, if we use this assumption when entering a symbolic
whole function bodies by annotating a function as MIX(typed) or block from a typed block, we may later discover our assumption
MIX(symbolic), and M IXY switches between qualifier inference and was too optimistic. For example, consider the following code:
symbolic execution at function calls. We can simulate blocks within 1 {t int ∗x; {s x = NULL; s} ; {s free(x); s} t}
functions by manually extracting the relevant code into a fresh
function. In the type system, x has type int ∗ β , where initially β is uncon-
Skipping some details for the moment, this switching process strained. Suppose that we analyze the symbolic block on the right
works as follows. When M IXY is invoked, the programmer speci- before the one on the left. This scenario could happen because the
fies (as a command-line option) whether to begin in a typed block or analysis of the enclosing typed block does not model control-flow
a symbolic block. In either case, we first initialize global variables order (i.e., is flow-insensitive). Then initially, we would think the
as appropriate for the analysis, and then analyze the program start- call to free was safe because we optimistically treat unconstrained
ing with main. In symbolic execution mode, we begin simulating β as nonnull—but this is clearly not accurate here.
the program at the entry function, and at calls to functions that are The solution is, as expected, to repeat our analyses until we
either unmarked or are marked as symbolic, we continue symbolic reach a fixed point. In this case, after we analyze the left symbolic
execution into the function body. At calls to functions marked with block, we will discover a new constraint on x, and hence when we
MIX(typed), we switch to type inference starting with that function. iterate and reanalyze the right symbolic block, we will discover the
In type inference mode, we begin analysis at the entry function error. We are computing a least fixed point because we start with
f, applying qualifier inference to f and all functions reachable from f optimistic assumptions—nothing is null—and then monotonically
in the call graph, up to the frontier of any functions that are marked discover more expressions may be null.
From Symbolic Values to Types. We use the SMT solver to block to a typed block, we add constraints to require that all may-
discover the possible final values of variables and translate those to aliased expressions have the same type.
the appropriate types. Given a variable x that is mapped to symbolic
expression s, we ask whether g ∧ (s = 0) is satisfiable where g is 4.3 Caching Blocks
the path condition. If the condition is satisfiable, we constrain x to In C, a block or function may be called from many different call
be null in the type system. There are no nonnull constraints to be sites, so we may need to analyze that block in the context of
added since they correspond to places in code where pointers are each call site. Since it can be quite costly to analyze that block
dereferenced, which is not reflected in symbolic values. repeatedly, we cache the calling context and the results of the
Thus, null pointers from symbolic blocks will lead to errors analysis for that block, and we reuse the results when the block
in typed blocks if they flow to a nonnull position; whereas null is called again with a compatible calling context. Conceptually,
pointers from typed blocks will lead to errors in symbolic blocks if caching is similar to compositional symbolic execution [Godefroid
they are dereferenced symbolically. 2007]; in M IXY, we implement caching as an extension to the
mix rules, using types to summarize blocks rather than symbolic
4.2 Aliasing and M IXY’s Memory Model constraints.
The formal system M IX defers all reasoning about aliasing to as
Caching Symbolic Blocks. Before we translate the types from the
late of a time as possible. As alluded to in Section 3, this choice
enclosing typed block to symbolic values, we first check to see
may be difficult to implement in practice given limitations in the
if we have previously analyzed the same symbolic block with a
constraint solver. Thus in M IXY, we use a pre-pass pointer analysis
compatible calling context. We define the calling context to be the
to initialize aliasing relationships.
types for all variables that will be translated into symbolic values,
Typed to Symbolic Block. When we switch from a typed block and we say two calling contexts are compatible if every variable
to a symbolic block, we initialize a fresh symbolic memory, which has the same type in both contexts.
may include pointers. We use a variant of the approach described in If we have not analyzed the symbolic block before with a com-
Section 3 that makes use of aliasing information to be more precise. patible calling context, we translate the types into symbolic values,
Rather than modeling memory as one big array, M IXY models analyze the symbolic block, and translate the symbolic values to
memory as a map from locations to separate arrays. Aliasing within types by adding type constraints as usual. At this point, we will
arrays is modeled as in our formalism, and aliasing between arrays cache the translated types for this calling context; we cache the
is modeled using Morris’s general axiom of assignment [Bornat translated types instead of the symbolic values since the translation
2000; Morris 1982]. from symbolic values to types is expensive. Otherwise, if we have
C also supports a richer variety of types such as arrays and analyzed the symbolic block before with a compatible calling con-
structs, as well as recursive data structures. M IXY lazily initializes text, we use the cached results by adding null type constraints for
memory in an incremental manner so that we can sidestep the issue null cached types in a manner similar to translating symbolic val-
of initializing an arbitrarily recursive data structure; M IXY only ues. Finally, in both cached and uncached cases, we restore aliasing
initializes as much as is required by the symbolic block. We use relationships and return to the enclosing typed block as usual.
CIL’s pointer analysis to determine possible points-to relationships
Caching Typed Blocks. Caching for typed blocks is similarly im-
and initialize memory accordingly.
plemented, but with one difference: unlike above, we first translate
Symbolic to Typed Block. An issue arises from using type infer- symbolic values into types, then use the translated types as the call-
ence when we switch from a symbolic block to a typed block. Con- ing context, and finally cache the final types as the result of analyz-
sider the following code snippets, which are identical except that y ing the typed block. We could have chosen to use symbolic values
points to r on the left, and y points to x on the right: as the calling context and the result, but since translating symbolic
values to types or comparing symbolic values both involve similar
{s {s number of calls to the SMT solver, we chose to use types to unify
// ∗y not aliased to x // ∗y aliased to x the implementation.
int ∗x = . . .; int ∗x = . . .;
int ∗r = . . ., ∗∗y = &r; int ∗∗y = &x; 4.4 Recursion between Typed and Symbolic Blocks
{t // okay {t // should fail
A typed block and a symbolic block may recursively call each
x = NULL; x = NULL;
assert nonnull(∗y); t} assert nonnull(∗y); t} other, and we found block recursion to be surprisingly common
s} s}
in our experiments. Without special handling for recursion, M IXY
will keep switching between them indefinitely since a block is
In both cases, at entry to the typed blocks, x and ∗y are assigned analyzed with a fresh initial state upon every entry. Therefore, we
types β ref and γ ref respectively, based on their current values. need to detect when recursion occurs, either beginning with a typed
Notice, however, that for the code on the right, we should also block or a symbolic block, and handle it specially.
have β = γ . Otherwise, after the assignment x = NULL, we will To handle recursion, we maintain a block stack to keep track of
not know that ∗y is also NULL. blocks that are currently being analyzed. Similar to a function call
This example illustrates an important difference between type stack, the block stack is a stack of blocks and their calling contexts,
inference and type checking. In type checking, this problem cannot which are defined in terms of types as in caching (Section 4.3). We
arise because every value has a known type, and we only have push blocks onto the stack upon entry and pop them upon return.
to check that those types are consistent. However, type inference Before entering a block, we first look for recursion by search-
actually has to discover richer information, such as what types must ing the block stack for the same block with a compatible calling
be equal because of aliasing, in order to find a valid typing. context. If recursion is detected, then instead of entering the block,
One solution to this problem would be to translate aliasing in- we mark the matching block on the stack as recursive and return an
formation from symbolic execution to and from type constraints. In assumption about the result. For the initial assumption, we use the
M IXY, we use an alternative solution that is easier to implement: calling context of the marked block, optimistically assuming that
we use CIL’s built-in may pointer analysis to conservatively dis- the block has no effect. When we eventually return to the marked
cover points-to relationships. When we transition from a symbolic block, we compare the assumption with the actual result of analyz-
ing the block. If the assumption is compatible with the actual result, Annotating function str next dirent as symbolic, while leaving
we return the result; otherwise, we re-analyze the block using the sysutil next dirent and str alloc text as typed, successfully elim-
actual result as the updated assumption until we reach a fixed point. inates this warning: the symbolic executor correctly determines
that p filename is not null when it is used as an argument to
4.5 Preliminary Experience str alloc text. And although the extra precision does not matter
We gained some initial experience with M IXY by running it on in this particular example, notice that the call on line 8 will be an-
vsftpd-2.0.7 and looking for false null pointer warnings from alyzed in a separate invocation of the type system than the call on
pure type qualifier inference that can be eliminated with the addi- line 10, thus introducing some context-sensitivity.
tion of symbolic execution. Since M IXY is in the prototype stage,
we started small. Rather than annotate all dereferences as requiring Case 3: Flow- and path-insensitivity in dns resolve and main
nonnull, we added just one nonnull annotation: 1 void main BLOCK(struct sockaddr∗∗ p sock) MIX(symbolic) {
sysutil free(void ∗ nonnull p ptr) MIX(typed) { . . . } 2 ∗p sock = NULL;
3 dns resolve(p sock, tunable pasv address);
The sysutil free function wraps the free system call and checks, at 4 }
run time, that the pointer argument is not null. In essence, our anal- 5 int main(. . .) {
ysis tries to check this property statically. We annotated sysutil free 6 . . .main BLOCK(&p addr); . . .; sysutil free(p addr); . . .
7 }
itself with MIX(typed), so M IXY need not symbolically execute its
8 void dns resolve(struct sockaddr∗∗ p sock,
body—our annotation captures the important part of its behavior 9 const char∗ p name) {
for our analysis. 10 struct hostent∗ hent = gethostbyname(p name);
We then ran M IXY on vsftpd, beginning with typing at the out- 11 sockaddr clear(p sock);
ermost level. We examined the resulting warnings and then tried 12 if (hent→h addrtype == AF INET)
adding MIX(symbolic) annotations to eliminate warnings. We suc- 13 sockaddr alloc ipv4(p sock);
ceeded in several cases, discussed next. We did not fully examine 14 else if (hent→h addrtype == AF INET6)
many of the other cases, but Section 4.6 describes some prelimi- 15 sockaddr alloc ipv6(p sock);
nary observations about M IXY in practice. Note that the code snip- 16 else
pets shown below are abbreviated, and many identifiers have been 17 die(”gethostbyname(): neither IPv4 nor IPv6”);
18 }
shortened. We should also point out that all the examples below
eliminate one or more imprecise qualifier flows from type qualifier There are two sources of null values in the code above: ∗p sock
inference; this pruning may or may not suppress a given warning, is set to null on line 2; and sockaddr clear, which was previously
depending on whether other flows could produce the same warning. marked as symbolic in Case 1 above, also sets ∗p sock to null on
Case 1: Flow and path insensitivity in sockaddr clear line 11 in dns resolve. Due to flow insensitivity in the type system,
both these null values eventually reach sysutil free on line 6, leading
1 void sockaddr clear(struct sockaddr ∗∗p sock) MIX(symbolic) { to false warnings.
2 if (∗p sock != NULL) { However, we can see that these null values are actually overwrit-
3 sysutil free(∗p sock);
ten by non-null values on lines 13 and 15, where sockaddr alloc ipv4
4 ∗p sock = NULL;
5 } or sockaddr alloc ipv6 allocates the appropriate structure and as-
6 } signs it to ∗p sock (not shown). We can eliminate these warnings
by extracting the code in main that includes both null sources into
This function is implicated in a false warning: due to flow insen- a symbolic block.
sitivity in the type system, the null assignment on line 4 flows to Also, there is a system call gethostbyname on line 10 that we
the argument to sysutil free on line 3, even though the assignment need to handle. Here, we define a well-behaved, symbolic model
occurs after the call. Also, the type system ignores the null check of gethostbyname that returns only AF INET and AF INET6 as is
on line 2 due to path-insensitivity. standard (not shown). This will cause the symbolic executor to skip
Marking sockaddr clear with MIX(symbolic) successfully resolves the last branch on line 17, which we need to do because we cannot
this warning: the symbolic executor determines that ∗p sock is not analyze die symbolically as it eventually calls a function pointer, an
null when used as an argument to sysutil free(). operation that our symbolic executor currently has limited support
Case 2: Path and context insensitivity in str next dirent for. We also cannot put gethostbyname or die in typed blocks in this
case, since ∗p sock is null and will result in false warnings.
1 void str alloc text(struct mystr∗ p str) MIX(typed);
2 const char∗ sysutil next dirent(. . .) MIX(typed) { Case 4: Helping symbolic execution with symbolic function point-
3 if (p dirent == NULL) return NULL; ers
4 }
5 void str next dirent(. . .) MIX(symbolic) { 1 void sysutil exit BLOCK(void) MIX(typed) {
6 const char∗ p filename = sysutil next dirent(. . .); 2 if (s exit func) (∗s exit func)();
7 if (p filename != NULL) 3 }
8 str alloc text(p filename); 4 void sysutil exit(int exit code) {
9 } 5 sysutil exit BLOCK();
10 . . .str alloc text(str); sysutil free(str); . . . 6 exit(exit code);
7 }
In this example, the function str next dirent calls sysutil next dirent
on line 6, which may return a null value. Hence p filename may be In several instances, we would like to evaluate symbolic blocks
null. The type system ignores the null check on line 7 and due to that call sysutil exit, defined on line 4, which in turn calls exit to
context-insensitivity, conflates p filename with other variables, such terminate the program. However, before terminating the program,
as str, that are passed to str alloc text (lines 8 and 10). Hence the sysutil exit calls the function pointer s exit func on line 2. Our sym-
type system believes str may be null. However, str is used as an bolic executor does not support calling symbolic function pointers
argument to sysutil free (line 10), which leads the type system to (i.e., which targets are unknown), so instead, we extract the call to
report a false warning. s exit func into a typed block to analyze the call conservatively.
4.6 Discussion and Future Work execution to explore a small subset of the possible program paths,
Our preliminary experience provides some real-world validation since in the presence of loops with symbolic bounds, pure symbolic
of M IX’s efficacy in removing false positives. However, there are execution will not terminate in a reasonable amount of time (unless
several limitations to be addressed in future work. loop invariants are assumed). In the M IX formalism, in contrast, we
Most importantly, the overwhelming source of issues in M IXY use symbolic execution in a sound manner by exploring all paths,
is its coarse treatment of aliasing, which relies on an imprecise which is possible because we can use type checking on parts of the
pointer analysis. One immediate consequence is that it impedes per- code where symbolic execution takes too long. Of course, it is also
formance in the symbolic executor: if an imprecise pointer analysis possible to mix unsound symbolic execution with type checking, to
returns large points-to sets for pointers, translating symbolic point- gain whatever level of assurance the user desires.
ers to type constraints becomes slow because we first need to check There are several static analyses that can operate at different lev-
if each pointer target is valid in the current path condition by call- els of abstraction. Bandera [Corbett et al. 2000] is a model check-
ing the SMT solver, then determine if any valid targets may be null. ing system that uses abstraction-based program specialization, in
This leads to a significant slowdown: our small examples from Sec- which the user specifies the exact abstractions to use. System Z
tion 4.5 take less than a second to run without symbolic blocks, but is an abstract interpreter generator in which the user can tune the
from 5 to 25 seconds to run with one symbolic block, and about level of abstraction to trade off cost and precision [Yi and Harri-
60 seconds with two symbolic blocks. This issue is further com- son 1993]. Tuning these systems requires a deep knowledge of pro-
pounded by the fixed-point computation that repeatedly analyzes gram analysis. In contrast, we believe that M IX’s tradeoff is eas-
symbolic blocks nested in typed blocks or for handling recursion. ier to understand—one selects between essentially no abstraction
We also noticed several cases in vsftpd where calls to symbolic (symbolic execution), or abstraction in terms of types, which are
blocks would help introduce context sensitivity to distinguish calls arguably the most successful, well-understood static analysis.
to malloc. However, since we rely on a context-insensitive pointer M IX bears some resemblance to static analysis based on ab-
analysis to restore aliasing relationships when switching to typed straction refinement, such as SLAM [Ball and Rajamani 2002],
blocks, these calls will again be conflated. The issue especially af- BLAST [Henzinger et al. 2004], and client-driven pointer analy-
fects the analysis of typed-to-symbolic-to-typed recursive blocks sis [Guyer and Lin 2005]. These tools incrementally refine their
because the nested typed blocks are polluted by aliasing relation- abstraction of the program as necessary for analysis. Adding sym-
ships from the entire program. A similar issue occurs with symbolic bolic blocks to a program can be seen as introducing a very precise
blocks, as pointers are initialized to point to targets from the entire “refinement” of the program abstraction.
program, rather than being limited to the enclosing context. There are a few systems that combine type checking or infer-
Just as in the formalism, M IXY has to consider the entire mem- ence with other analyses. Dependent types provide an elegant way
ory when switching from typed to symbolic or vice-versa. Since to augment standard type with very rich type refinements [Xi and
this was a deliberate design decision, we were not surprised to find Pfenning 1999]. Liquid types combines Hindley-Milner style type
out that this has an impact on performance and leads to many limi- inference with predicate abstraction [Rondon et al. 2008, 2010].
tations in practice. Any temporary violation of type invariants from Hybrid types combines static typing, theorem proving, and dy-
symbolic blocks would immediately be flagged when switching to namic typing [Flanagan 2006]. All of these systems combine types
typed blocks, even if they have no effect on the code in the typed with refinements at a deep level—the refinements are placed “on
blocks. In the other direction, symbolic blocks are forced to start top of” the type structure. In contrast, M IX uses a much coarser
with a fresh memory when switching from typed blocks even if approach in which the precise analysis is almost entirely separated
there were no effects. from the type system, except for a thin interface between the two
Ultimately, we believe that these issues can be addressed with systems.
more precise information about aliasing as well as effects, perhaps Many others have considered the problem of combining pro-
extracted directly from the type inference constraints and symbolic gram analyses. A reduced product in abstract interpretation [Cousot
execution. and Cousot 1979] is a theoretical description of the most precise
In addition to checking for null pointer errors, we plan to ex- combination of two abstract domains. It is typically obtained via
tend M IXY to check other properties, such as data races, and to manually defined reduction operators that depend on the domains
mix other types of analysis together. We also plan to investigate au- being combined. Another example of combining abstract domains
tomatic placement of type/symbolic blocks, i.e., essentially using is the logical product of Gulwani and Tiwari [2006]. Combining
M IX as an intermediate language for combining analyses. One idea program analyses for compiler optimizations is also well-studied
is to begin with just typed blocks and then incrementally add sym- (e.g., Lerner et al. [2002]). In all of these cases, the combinations
bolic blocks to refine the result. This approach resembles abstrac- strengthen the kinds of derivable facts over the entire program.
tion refinement (e.g., Ball and Rajamani [2002]; Henzinger et al. With M IX, we instead analyze separate parts of the program with
[2004]), except the refinement can be obtained using completely different analyses. Finally, M IX was partially inspired by Nelson-
different analyses instead of one particular family of abstractions. Oppen style cooperating decision procedures [Nelson and Oppen
1979]. One important feature of the Nelson-Oppen framework is
that it provides an automatic method for distributing the appropri-
5. Related Work ate formula fragments to each solver (if that the solvers match cer-
tain criteria). Clearly M IX is targeted at solving a very different
There are several threads of related work. There have been numer- problem, but it would be an interesting direction for future work to
ous proposals for static analyses based on type systems; see Pals- try to extend M IX into a similar framework that can automatically
berg and Millstein [2008] for pointers. Symbolic execution was first integrate analyses that have appropriately structured interfaces.
proposed by King [1976] as an enhanced testing strategy, but was
difficult to apply for many years. Recently, SMT solvers have be-
come very powerful, making symbolic execution much more at- 6. Conclusion
tractive as even very complex path conditions can be solved sur- We presented M IX, a new approach for mixing type checking and
prisingly fast. There have been many recent, impressive results us- symbolic execution to trade off efficiency and precision. The key
ing symbolic execution for bug finding [Cadar et al. 2006, 2008; feature of our approach is that the mixed systems are essentially
Godefroid et al. 2005; Sen et al. 2005]. These systems use symbolic completely independent, and they are used in an off-the-shelf man-
ner. Only at the boundaries between typed blocks—which the user James C. King. Symbolic execution and program testing. Commun. ACM,
inserts to indicate where type checking should be used—and sym- 19(7):385–394, 1976.
bolic blocks—the symbolic checking annotation—do we invoke Sorin Lerner, David Grove, and Craig Chambers. Composing dataflow
special mix rules to translate information between the two sys- analyses and transformations. In Principles of Programming Languages
tems. We proved that M IX is sound (which implies that type check- (POPL), pages 270–282, 2002.
ing and symbolic execution are also independently sound). We Rupak Majumdar and Koushik Sen. Hybrid concolic testing. In Inter-
also described a preliminary implementation, M IXY, which per- national Conference on Software Engineering (ICSE), pages 416–426,
forms null/non-null type qualifier inference for C. We identified 2007.
several cases in which symbolic execution could eliminate false Joe M. Morris. A general axiom of assignment. Assignment and linked
positives from type inference. In sum, we believe that M IX provides data structure. A proof of the Schorr-Waite algorithm. In Theoretical
a promising new approach to trade off precision and efficiency in Foundations of Programming Methodology, pages 25–51, 1982.
static analysis. George C. Necula, Scott McPeak, Shree Prakash Rahul, and Westley
Weimer. CIL: Intermediate language and tools for analysis and transfor-
mation of C programs. In Compiler Construction (CC), pages 213–228,
Acknowledgments 2002.
We would like to thank the anonymous reviewers and Patrice Gode- Greg Nelson and Derek C. Oppen. Simplification by cooperating decision
froid for their helpful comments and suggestions. This research was procedures. ACM Trans. Program. Lang. Syst., 1(2):245–257, 1979.
supported in part by DARPA ODOD.HR00110810073, NSF CCF- Jens Palsberg and Todd Millstein. Type Systems: Advances and Applica-
0541036, and NSF CCF-0915978. tions. In The Compiler Design Handbook: Optimizations and Machine
Code Generation, chapter 9. 2008.
References Polyvios Pratikakis, Jeffrey S. Foster, and Michael W. Hicks. Locksmith:
context-sensitive correlation analysis for race detection. In Program-
Thomas Ball and Sriram K. Rajamani. The SLAM project: debugging
ming Language Design and Implementation (PLDI), pages 320–331,
system software via static analysis. In Principles of Programming
2006.
Languages (POPL), pages 1–3, 2002.
Elnatan Reisner, Charles Song, Kin-Keung Ma, Jeffrey S. Foster, and Adam
Richard Bornat. Proving pointer programs in Hoare logic. In Mathematics Porter. Using symbolic evaluation to understand behavior in config-
of Program Construction (MPC), pages 102–126, 2000. urable software systems. In International Conference on Software Engi-
Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and neering (ICSE), 2010. To appear.
Dawson R. Engler. EXE: automatically generating inputs of death. In Patrick M. Rondon, Ming Kawaguci, and Ranjit Jhala. Liquid types.
Computer and Communications Security (CCS), pages 322–335, 2006. In Programming Language Design and Implementation (PLDI), pages
Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: Unassisted 159–169, 2008.
and automatic generation of high-coverage tests for complex systems Patrick M. Rondon, Ming Kawaguchi, and Ranjit Jhala. Low-level liquid
programs. In Operating Systems Design and Implementation (OSDI), types. In Principles of Programming Languages (POPL), pages 131–
pages 209–224, 2008. 144, 2010.
James C. Corbett, Matthew B. Dwyer, John Hatcliff, Shawn Laubach, Koushik Sen, Darko Marinov, and Gul Agha. CUTE: a concolic unit testing
Corina S. Păsăreanu, Robby, and Hongjun Zheng. Bandera: extracting engine for C. In Foundations of Software Engineering (FSE), pages 263–
finite-state models from Java source code. In International Conference 272, 2005.
on Software Engineering (ICSE), pages 439–448, 2000.
Hongwei Xi and Frank Pfenning. Dependent types in practical program-
Patrick Cousot and Radhia Cousot. Systematic design of program analysis ming. In Principles of Programming Languages (POPL), pages 214–
frameworks. In Principles of Programming Languages (POPL), pages 227, 1999.
269–282, 1979.
Kwangkeun Yi and Williams Ludwell Harrison, III. Automatic generation
Cormac Flanagan. Hybrid type checking. In Principles of Programming and management of interprocedural program analyses. In Principles of
Languages (POPL), pages 245–256, 2006. Programming Languages (POPL), pages 246–259, 1993.
Jeffrey S. Foster, Robert Johnson, John Kodumal, and Alex Aiken. Flow-
insensitive type qualifiers. ACM Trans. Program. Lang. Syst., 28(6):
1035–1087, 2006.
Vijay Ganesh and David L. Dill. A decision procedure for bit-vectors
and arrays. In Computer-Aided Verification (CAV), pages 519–531, July
2007.
Patrice Godefroid. Compositional dynamic test generation. In Principles of
Programming Languages (POPL), pages 47–54, 2007.
Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: directed auto-
mated random testing. In Programming Language Design and Imple-
mentation (PLDI), pages 213–223, 2005.
Sumit Gulwani and Ashish Tiwari. Combining abstract interpreters. In Pro-
gramming Language Design and Implementation (PLDI), pages 376–
386, 2006.
Samuel Z. Guyer and Calvin Lin. Error checking with client-driven pointer
analysis. Sci. Comput. Program., 58(1-2):83–114, 2005.
Thomas A. Henzinger, Ranjit Jhala, Rupak Majumdar, and Kenneth L.
McMillan. Abstractions from proofs. In Principles of Programming
Languages (POPL), pages 232–244, 2004.
Khoo Yit Phang, Bor-Yuh Evan Chang, and Jeffrey S. Foster. Mixing
type checking and symbolic execution (extended version). Technical
Report CS-TR-4954, Department of Computer Science, University of
Maryland, College Park, 2010.
Lecture Notes: Concolic Testing

17-355/17-665/17-819O: Program Analysis (Spring 2018)


Jonathan Aldrich and Claire Le Goues
[email protected], [email protected]

1 Motivation
Companies today spend a huge amount of time and energy testing software to determine whether
it does the right thing, and to find and then eliminate bugs. A major challenge is writing a set of
test cases that covers all of the source code, as well as finding inputs that lead to difficult-to-trigger
corner case defects.
Symbolic execution, discussed in the last lecture, is a promising approach to exploring differ-
ent execution paths through programs. However, it has significant limitations. For paths that are
long and involve many conditions, SMT solvers may not be able to find satisfying assignments
to variables that lead to a test case that follows that path. Other paths may be short but involve
computations that are outside the capabilities of the solver, such as non-linear arithmetic or cryp-
tographic functions. For example, consider the following function:
testme(int x, int y){
if(bbox(x)==y){
ERROR;
} else {
// OK
}
}
If we assume that the implementation of bbox is unavailable, or is too complicated for a the-
orem prover to reason about, then symbolic execution may not be able to determine whether the
error is reachable.
Concolic testing overcomes these problems by combining concrete execution (i.e. testing) with
symbolic execution.1 Symbolic execution is used to solve for inputs that lead along a certain
path. However, when a part of the path condition is infeasible for the SMT solver to handle, we
substitute values from a test run of the program. In many cases, this allows us to make progress
towards covering parts of the code that we could not reach through either symbolic execution or
randomly generated tests.

2 Goals
We will consider the specific goal of automatically unit testing programs to find assertion viola-
tions and run-time errors such as divide by zero. We can reduce these problems to input genera-
tion: given a statement s in program P , compute input i such that P (i) executes s.2 For example,
1 The word concolic is a portmanteau of concrete and symbolic
2 This formulation is due to Wolfram Schulte

if we have a statement assert x > 5, we can translate that into the code:
1 if (!(x > 5))
2 ERROR;
Now if line 2 is reachable, the assertion is violated. We can play a similar trick with run-time
errors. For example, a statement involving division x = 3 / i can be placed under a guard:
1 if (i != 0)
2 x = 3 / i;
3 else
4 ERROR;

3 Overview
Consider the testme example from the motivating section. Although symbolic analysis cannot
solve for values of x and y that allow execution to reach the error, we can generate random test
cases. These random test cases are unlikely to reach the error: for each x there is only one y that
will work, and random input generation is unlikely to find it. However, concolic testing can use
the concrete value of x and the result of running bbox(x) in order to solve for a matching y value.
Running the code with the original x and the solution for y results in a test case that reaches the
error.
In order to understand how concolic testing works in detail, consider a more realistic and more
complete example:
1 int double (int v) {
2 return 2*v;
3 }
4
5 void bar(int x, int y) {
6 z = double (y);
7 if (z == x) {
8 if (x > y+10) {
9 ERROR;
10 }
11 }
12 }
We want to test the function bar. We start with random inputs such as x = 22, y = 7. We
then run the test case and look at the path that is taken by execution: in this case, we compute
z = 14 and skip the outer conditional. We then execute symbolically along this path. Given inputs
x = x0, y = y0, we discover that at the end of execution z = 2 * y0, and we come up with a path
condition 2 * y0 != x0.
In order to reach other statements in the program, the concolic execution engine picks a branch
to reverse. In this case there is only one branch touched by the current execution path; this is the
branch that produced the path condition above. We negate the path condition to get 2 * y0 = x0
and ask the SMT solver to give us a satisfying solution.
Assume the SMT solver produces the solution x0 = 2, y0 = 1. We run the code with that input.
This time the first branch is taken but the second one is not. Symbolic execution returns the same
end result, but this time produces a path condition 2 * y0 = x0 ∧ x0 ≤ y0 + 10.

Now to explore a different path we could reverse either test, but we’ve already explored the
path that involves negating the first condition. So in order to explore new code, the concolic
execution engine negates the condition from the second if statement, leaving the first as-is. We
hand the formula 2 * y0 = x0 ∧ x0 > y0 + 10 to an SMT solver, which produces a solution
x0 = 30, y0 = 15. This input leads to the error.
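To make that last step concrete, the query handed to the solver might look roughly like the following sketch, written against Z3's Python bindings. This is only one possible solver choice (the notes do not prescribe one), and the names x0 and y0 simply mirror the symbolic inputs above; the particular model returned may differ from run to run.

from z3 import Ints, Solver, And, sat

x0, y0 = Ints('x0 y0')
solver = Solver()
# Keep the first path condition (2*y0 == x0) and negate the second (x0 <= y0 + 10).
solver.add(And(2 * y0 == x0, x0 > y0 + 10))
if solver.check() == sat:
    model = solver.model()
    print(model[x0], model[y0])  # one satisfying assignment, e.g. x0 = 30, y0 = 15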
The example above involves no problematic SMT formulas, so regular symbolic execution
would suffice. The following example illustrates a variant of the example in which concolic exe-
cution is essential:
1 int foo(int v) {
2 return v*v%50;
3 }
4
5 void baz(int x, int y) {
6 z = foo(y);
7 if (z == x) {
8 if (x > y+10) {
9 ERROR;
10 }
11 }
12 }
Although the code to be tested in baz is almost the same as bar above, the problem is more
difficult because of the non-linear arithmetic and the modulus operator in foo. If we take the
same two initial inputs, x = 22, y = 7, symbolic execution gives us the formula z = (y0 * y0) % 50,
and the path condition is x0 != (y0 * y0) % 50. This formula is not linear in the input y0, and so it
may defeat the SMT solver.
We can address the issue by treating foo, the function that includes nonlinear computation,
concretely instead of symbolically. In the symbolic state we now get z = foo(y0), and for y0 = 7
we have z = 49. The path condition becomes foo(y0) != x0, and when we negate this we get
foo(y0) = x0, or 49 = x0. This is trivially solvable with x0 = 49. We leave y0 = 7 as before;
this is the best choice because y0 is an input to foo(y0), so if we change it, then setting x0 = 49 may
not lead to taking the first conditional. In this case, the new test case of x = 49, y = 7 finds the
error.
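For illustration, here is a sketch of the concretized query in the same Z3-flavored Python as above (an assumption on our part, not part of the original notes): the call to foo is replaced by the value 49 observed in the concrete run, and y0 is pinned to its concrete value so that the substitution remains valid.

from z3 import Ints, Solver, sat

x0, y0 = Ints('x0 y0')
foo_of_y0 = (7 * 7) % 50        # concrete result of foo(7) observed during the test run

solver = Solver()
solver.add(y0 == 7)             # keep y0 at its concrete value from the previous run
solver.add(x0 == foo_of_y0)     # negated branch condition with foo(y0) concretized to 49
if solver.check() == sat:
    print(solver.model())       # e.g. [x0 = 49, y0 = 7]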

4 Implementation
Ball and Daniel [1] give the following pseudocode for concolic execution (which they call dynamic
symbolic execution):
1 i = an input to program P
2 while defined(i):
3 p = path covered by execution P(i)
4 cond = pathCondition(p)
5 s = SMT(Not(cond))
6 i = s.model()
Broadly, this just systematizes the approach illustrated in the previous section. However, a
number of details are worth noting:

First, when negating the path condition, there is a choice about how to do it. As discussed
above, the usual approach is to put the path conditions in the order in which they were generated
by symbolic execution. The concolic execution engine may target a particular region of code for
execution. It finds the first branch for which the path to that region diverges from the current test
case. The path conditions are left unchanged up to this branch, but the condition for this branch
is negated. Any conditions beyond the branch under consideration are simply omitted. With this
approach, the solution provided by the SMT solver will result in execution reaching the branch
and then taking it in the opposite direction, leading execution closer to the targeted region of code.
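As a sketch of this step, assuming the branch conditions recorded along the last run are available as a list of solver-level formulas (the helper name below is illustrative, not from Ball and Daniel), the query for flipping branch k keeps the earlier conditions, negates the k-th one, and omits the rest:

from z3 import Solver, Not, sat

def input_for_flipped_branch(branch_conds, k):
    # Keep the conditions before branch k, negate branch k, drop everything after it.
    solver = Solver()
    for cond in branch_conds[:k]:
        solver.add(cond)
    solver.add(Not(branch_conds[k]))
    return solver.model() if solver.check() == sat else None

A driver in the style of the pseudocode above would call such a helper in a loop, re-running the program on each returned model until the targeted code is reached or no satisfiable flip remains.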
Second, when generating the path condition, the concolic execution engine may choose to
replace some expressions with constants taken from the run of the test case, rather than treating
those expressions symbolically. These expressions can be chosen for one of several reasons. First,
we may choose formulas that are difficult to invert, such as non-linear arithmetic or cryptographic
hash functions. Second, we may choose code that is highly complex, leading to formulas that are
too large to solve efficiently. Third, we may decide that some code is not important to test, such
as low-level libraries that the code we are writing depends on. While sometimes these libraries
could be analyzable, when they add no value to the testing process, they simply make the formulas
harder to solve than they are when the libraries are analyzed using concrete data.
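One way an engine can realize this substitution is to carry, for every run-time value, both its concrete result and an optional symbolic expression, dropping the symbolic part whenever an operation would fall outside the solver's comfortable theory. The following is a minimal, hypothetical sketch of such a dual value; the class and method names are illustrative and not taken from any particular tool.

class ConcolicValue:
    """A run-time value paired with an optional symbolic expression."""
    def __init__(self, concrete, symbolic=None):
        self.concrete = concrete
        self.symbolic = symbolic          # None means "treat this value concretely"

    def mul(self, other):
        concrete = self.concrete * other.concrete
        if self.symbolic is not None and other.symbolic is not None:
            # Symbolic * symbolic is non-linear: fall back to the concrete result
            # so the path condition stays within the solver's linear theory.
            return ConcolicValue(concrete)
        if self.symbolic is not None:
            return ConcolicValue(concrete, self.symbolic * other.concrete)
        if other.symbolic is not None:
            return ConcolicValue(concrete, self.concrete * other.symbolic)
        return ConcolicValue(concrete)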

5 Acknowledgments
The structure of these notes and the examples are adapted from a presentation by Koushik Sen.

References
[1] T. Ball and J. Daniel. Deconstructing dynamic symbolic execution. In Proceedings of the 2014
Marktoberdorf Summer School on Dependable Software Systems Engineering, 2015.

Strictly Declarative Specification of Sophisticated Points-to Analyses

Martin Bravenboer Yannis Smaragdakis


Department of Computer Science
University of Massachusetts, Amherst
Amherst, MA 01003, USA
[email protected] [email protected]

Abstract analyses. It is, thus, not surprising that a wealth of research


We present the D framework for points-to analysis of has been devoted to efficient and precise pointer analysis
Java programs. D builds on the idea of specifying pointer techniques. Context-sensitive analyses are the most common
analysis algorithms declaratively, using Datalog: a logic- class of precise points-to analyses. Context sensitive analysis
based language for defining (recursive) relations. We carry approaches qualify the analysis facts with a context abstrac-
the declarative approach further than past work by describ- tion, which captures a static notion of the dynamic context
ing the full end-to-end analysis in Datalog and optimizing of a method. Typical contexts include abstractions of method
aggressively using a novel technique specifically targeting call-sites (for a call-site sensitive analysis—the traditional
highly recursive Datalog programs. meaning of “context-sensitive”) or receiver objects (for an
As a result, D achieves several benefits, including full object-sensitive analysis).
order-of-magnitude improvements in runtime. We compare In this work we present D: a general and versatile
D with Lhoták and Hendren’s P, which defines the points-to analysis framework that makes feasible the most
state of the art for context-sensitive analyses. For the exact precise context-sensitive analyses reported in the literature.
same logical points-to definitions (and, consequently, identi- D implements a range of algorithms, including context
cal precision) D is more than 15x faster than P for insensitive, call-site sensitive, and object-sensitive analyses,
a 1-call-site sensitive analysis of the DaCapo benchmarks, all specified modularly as variations on a common code base.
with lower but still substantial speedups for other important Compared to the prior state of the art, D often achieves
analyses. Additionally, D scales to very precise analyses speedups of an order-of-magnitude for several important
that are impossible with P and Whaley et al.’s bddbddb, analyses.
directly addressing open problems in past literature. Finally, The main elements of our approach are the use of the Dat-
our implementation is modular and can be easily configured alog language for specifying the program analyses, and the
to analyses with a wide range of characteristics, largely due aggressive optimization of the Datalog program. The use of
to its declarativeness. Datalog for program analysis (both low-level [13,23,29] and
high-level [6,9]) is far from new. Our novel optimization ap-
Categories and Subject Descriptors F.3.2 [Logics and proach, however, accounts for several orders of magnitude of
Meanings of Programs]: Semantics of Programming performance improvement: unoptimized analyses typically
Languages—Program Analysis; D.1.6 [Programming run over 1000 times more slowly. Generally our optimiza-
Techniques]: Logic Programming tions fit well the approach of handling program facts as a
General Terms Algorithms, Languages, Performance database, by specifically targeting the indexing scheme and
the incremental evaluation of Datalog implementations. Fur-
1. Introduction thermore, our approach is entirely Datalog based, encoding
Points-to (or pointer) analysis intends to answer the question declaratively the logic required both for call graph construc-
“what objects can a program variable point to?” This ques- tion as well as for handling the full semantic complexity
tion forms the basis for practically all higher-level program of the Java language (e.g., static initialization, finalization,
reference objects, threads, exceptions, reflection, etc.). This
makes our pointer analysis specifications elegant, modular,
Permission to make digital or hard copies of all or part of this work for personal or but also efficient and easy to tune. Generally, our work is a
classroom use is granted without fee provided that copies are not made or distributed strong data point in support of declarative languages: we ar-
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute gue that prohibitively much human effort is required for im-
to lists, requires prior specific permission and/or a fee. plementing and optimizing complex mutually-recursive def-
OOPSLA 2009, October 25–29, 2009, Orlando, Florida, USA. initions at an operational level of abstraction. On the other
Copyright c 2009 ACM 978-1-60558-734-9/09/10. . . $5.00
hand, declarative specifications both admit automatic opti- ing a 1H-object-sensitive analysis without BDDs will require
mizations as well as afford the user the ability to identify new improvements in the data structures and algorithms
and apply straightforward manual optimizations. used to implement points-to analyses” [18]. D achieves
We evaluate D in comparison to Lhoták and Hen- this goal, with fairly routine data structures (plain B-trees).
dren’s P framework [18]. P is based on Binary Furthermore, D reproduces the most complex points-to
Decision Diagrams (BDDs) and represents the state of the analyses of the P set—a result previously considered
art in context sensitive pointer analyses, in terms of both impossible without BDDs. Even more importantly, D
semantic completeness (i.e., support for Java language fea- scales to analyses that are impossible with current BDD-
tures) and scalability. Furthermore, P is a highly flexi- based approaches, such as a 2H-call-site-sensitive analysis.
ble framework that was used to illustrate the different charac- In summary, our work makes the following contributions:
teristics and parameters of context-sensitivity. D has the
same attractive features and yields identical analysis results • We provide the first fully declarative specification of com-
(based on a logically equivalent algorithm). Our 1-call-site- plex, highly precise points-to analyses. Our specification
sensitive analysis of the DaCapo benchmarks applications distills points-to analysis algorithms down to their essence,
(and JDK 1.4) yields an average speedup of 16.3x, lower- instead of confusing the logical statement of an analy-
ing analysis times from several minutes to below a minute sis with implementation details. Past work on specifying
in many cases. For a 1-object-sensitive analysis, D is points-to analyses in Datalog has always been a hybrid be-
15x faster than P. Such speedups are rare in the pro- tween imperative code and a logical specification, omitting
gram analysis literature, especially for completely equiva- essential elements from the logic. For instance, the bddb-
lent analyses. Lhoták and Hendren recently speculated [18] ddb system [28, 29] (which pioneered practical Datalog-
“it should be feasible to implement an efficient non-BDD- based points-to analysis) expresses the core of a points-to
based 1-object-sensitive analysis”. We show that such an analysis in Datalog, while important parts (such as nor-
analysis not only is feasible but also outperforms BDDs by malization and call-graph computation—except for sim-
an order of magnitude. ple, context-insensitive, analyses) are done in Java code.
Generally, our approach reveals interesting insights re- In general, D offers the first declarative specification
garding the use of BDDs, compared to an explicit represen- of a context-sensitive points-to analysis with on-the-fly
tation of relations, for points-to analysis. Our work raises (i.e., fully interleaved) call-graph computation. Addition-
the question of whether the points-to analysis domain has ally, our specification of algorithms is quite sophisticated,
enough regularity for BDDs to be beneficial. Although we addressing elements of the Java language (such as native
have found analyses that are possible with BDDs yet we code, finalization, and privileged actions) that were absent
could not perform with an explicit representation, every such from previous declarative approaches (e.g., bddbddb) and
analysis seemed to suffer from vast (but very regular) im- that crucially affect precision and performance. As a result,
precision. Easy algorithmic enhancements can be applied to D provides an analysis that emulates and often exceeds
reduce the unnecessary redundancy in the relations that a the rich feature set of the P framework, while staying
points-to analysis keeps, and produce an analysis that is both entirely declarative.
much faster without BDDs and more precise. For instance,
• We introduce a novel optimization methodology, applied
D would not scale to a 1H-object-sensitive analysis (i.e.,
entirely at the Datalog level, for producing efficient algo-
1-object-sensitive with a context-sensitive heap) in the form
rithms directly from the logical specification of an anal-
specified in the P analysis set. Yet this is only because
ysis. The optimization approach employs standard pro-
a naive analysis specification results in high redundancy,
gram transformations (such as variable reordering and
which necessitates BDDs. Two simple algorithmic enhance-
folding—a common logic programming optimization) yet
ments suffice for making the analysis feasible for D:
determines when to do so by taking into account the
1) we perform exception analysis on-the-fly [3], computing
“semi-naive” algorithm for incremental evaluation of Dat-
contexts that are reachable because of exceptional control
alog rules, as well as the indexes that are used for each
flow while performing the points-to analysis itself. The on-
relation. As a result, D achieves order-of-magnitude
the-fly exception analysis significantly improves both pre-
performance improvements over the closest comparable
cision and performance; 2) we treat static class initializers
points-to framework in the literature for common context-
context-insensitively (since points-to results are equivalent
sensitive analyses.
for all contexts of static class initializers), thus improving
performance while keeping identical precision. • We show that D scales to perform the most pre-
The result of combining D’s optimization approach cise context-sensitive analyses ever evaluated in the re-
and our algorithmic enhancements is that D addresses search literature. D not only implements the rich set
several open problems in the points-to analysis literature. of analyses of the P system but also scales to anal-
Lhoták and Hendren estimated that “efficiently implement- yses that are beyond reach for P, such as a 2-call-
site-sensitive analysis with a context-sensitive heap, and
a 2-object-sensitive analysis with a (1-context) context- left arrow symbol (<-) to separate the inferred fact (the
sensitive heap. head) from the previously established facts (the body).
• We contrast and study the performance of BDD-based For instance, lines 3-4 above say that if, for some val-
representations for points-to analysis, relative to explicit ues of ?from, ?to, and ?heap, Assign(?from,?to) and
representations. We show how performance is correlated VarPointsTo(?from,?heap) are both true, then it can be in-
with key BDD metrics and extrapolate on the suitability of ferred that VarPointsTo(?to,?heap) is true. Note the base
BDDs for fast and precise points-to analyses. case of the computation above (lines 1-2), as well as the re-
cursion in the definition of VarPointsTo (line 3-4).
The declarativeness of Datalog makes it attractive for
2. Background: Datalog Points-To Analysis
specifying complex program analysis algorithms. Particu-
The use of deductive databases and logic programming lan- larly important is the ability to specify recursive definitions,
guages for program analysis has a long history (e.g., [4, 23]) as program analysis is fundamentally an amalgam of mu-
and has raised excitement again recently [6, 9, 13, 28, 29]. tually recursive tasks. For instance, in order to do accurate
Like our work, much of the past emphasis has been on reachability analysis (i.e., answer the question “is method
using the Datalog language. Datalog is a logic program- m1 reachable from method m2?”) we need to have points-to
ming language originally introduced in the database do- information, so that the target objects of a virtual method
main. At a first approximation, one can view Datalog as ei- call are known. But in order to do points-to analysis, we
ther “SQL with full recursion” or “Prolog without construc- need to have reachability information, to know which vari-
tors/functions”. The essence of the language is its ability to able assignment actions are truly possible. A mutually re-
define recursive relations. Relations (or equivalently predi- cursive definition of a reachability and points-to analysis is
cates) are the main Datalog data type. Computation consists easy to specify in Datalog, and is part of the D frame-
of inferring the contents of all relations from a set of in- work. The elegance of the approach is evident when con-
put relations. For instance, in our pointer analysis domain, it trasted with common implementations of points-to analyses.
is easy to represent the relevant actions of a Java program Even conceptually clean program analysis algorithms that
as relations, typically stored as database tables. Consider rely on mutually recursive definitions often get transformed
two such relations, AssignHeapAllocation(?var,?heap) into complex imperative code for implementation purposes
and Assign(?from,?to). (We follow the convention of cap- (e.g., compare the straightforward logic with the complex
italizing the first letter of relation names, while writing vari- algorithmic specification in Reference [26]).
able names in lower case and prefixing them with a question- Datalog evaluation is typically bottom-up, meaning that
mark.) The former relation represents all occurrences in the known facts are propagated using the rules until a maximal
program of an instruction “a = new A();” where a heap ob- set of derived facts is reached. This is also the link to the
ject is allocated and assigned to a variable. That is, a pre- data processing intended domain of Datalog: evaluation of
processing step takes a Java program (in our implementation a rule can be thought of as a sequence of relational algebra
this is in intermediate, bytecode, form) as input and produces joins and projections. For instance, the evaluation of lines
the relation contents. A static abstraction of the heap object 3-4 in our above example can be thought of as: Take the
is captured in variable ?heap—it can be concretely repre- join of relation Assign with relation VarPointsTo over the
sented as, e.g., a fully qualified class name and the alloca- first column of both (because of common field ?from) and
tion’s bytecode instruction index. Similarly, relation Assign project the join result on fields ?to and ?heap. The result
contains an entry for each assignment between two Java pro- of the projection is added to relation VarPointsTo (skipping
gram (reference) variables. duplicates) and forms the value of VarPointsTo for the next
The mapping between the input Java program and the iteration step. Application of all rules iterates to fixpoint.
input relations is straightforward and purely syntactic. After Note that this means that the evaluation of a Datalog pro-
this step, a simple pointer analysis can be expressed entirely gram comprises two distinct kinds of looping/iteration ac-
in Datalog as a transitive closure computation: tivities: the relational algebra joins and projections, and the
1 VarPointsTo(?var, ?heap) <-
explicit recursion of the program. The former kind of loop-
2 AssignHeapAllocation(?var, ?heap). ing is highly efficient through traditional database optimiza-
3 VarPointsTo(?to, ?heap) <- tions (e.g., for join order, group-fetching of data from disk,
4 Assign(?from, ?to), VarPointsTo(?from, ?heap). locality of reference, etc.).
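As an illustration of this join-and-project reading of rule evaluation, the two rules above can be evaluated bottom-up in a few lines of Python. This is only a sketch, not Doop code, and the input facts are invented for the example:

# Naive bottom-up evaluation of:
#   VarPointsTo(?var, ?heap) <- AssignHeapAllocation(?var, ?heap).
#   VarPointsTo(?to, ?heap)  <- Assign(?from, ?to), VarPointsTo(?from, ?heap).
assign_heap_allocation = {("a", "new A/3"), ("b", "new B/7")}   # (var, heap)
assign = {("a", "c"), ("c", "d")}                               # (from, to)

var_points_to = set(assign_heap_allocation)   # first rule: copy the input facts

while True:
    # Second rule: join Assign with VarPointsTo on the common ?from column,
    # then project the join result onto (?to, ?heap).
    derived = {(to, heap)
               for (frm, to) in assign
               for (frm2, heap) in var_points_to
               if frm == frm2}
    new_facts = derived - var_points_to
    if not new_facts:            # fixpoint: no rule can contribute anything new
        break
    var_points_to |= new_facts   # a set, so duplicates are skipped automatically

print(sorted(var_points_to))
# [('a', 'new A/3'), ('b', 'new B/7'), ('c', 'new A/3'), ('d', 'new A/3')]

The outer while loop corresponds to the explicit recursion of the Datalog program; the set comprehension inside it corresponds to the relational join and projection.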
We use a commercial Datalog engine, developed by our
The Datalog program consists of a series of rules industrial partner, LogicBlox Inc. (The engine is freely avail-
that are used to establish facts about derived relations able for research use through us and we have already granted
(such as VarPointsTo, which is the points-to relation, access to a handful of early adopters.) This version of Data-
i.e., it links every program variable, ?var, with every log allows “stratified negation”, i.e., negated clauses, as long
heap object abstraction, ?heap, it can point to) from a as the negation is not part of a recursive cycle. It also allows
conjunction of previously established facts. We use the
specifying that some relations are functions, i.e., the vari- 3.2 D Contents
able space is partitioned into domain and range variables, D supports a general pointer analysis trunk and several
and there is only one range value for each unique combina- different analysis variations. The main variants we have ex-
tion of values in domain variables. We will see these features plored are a context-insensitive analysis, as well as context
in action in our algorithm specification, next. sensitive analyses with 1- and 2-object, as well as 1- and
2-call-site contexts, with or without a context-sensitive heap
3. D Pointer Analysis Specifications (a.k.a. heap cloning) with different heap context depths. This
Our D framework is a versatile Datalog implementation variability is not directly supported in Datalog: for instance,
of a range of pointer analyses. D is available online at for a context-sensitive analysis, the relation VarPointsTo
http://doop.program-analysis.org. D strives for full needs extra arguments representing the context (be it a call-
Java language support and follows closely the approach of site context, or an object context) of the variable. Similarly,
P—the most complete analysis in prior literature—in for analyses utilizing a context-sensitive heap, the abstrac-
dealing with various Java language features. We next discuss tion of the heap object needs to be qualified by extra vari-
in more detail the features and precision of the framework. ables for its context. (In such analyses, an abstract object
consists of the allocation site and the context of the method
3.1 Overview and Preliminaries that contains that allocation site.) These differences are su-
perficial, however. We have abstracted away from them by
D distills points-to analysis algorithms to a purely declar-
creating a small extension of Datalog that allows tuples of
ative specification. An advantage of a declarative specifica-
variables in place of a single variable. The extension is im-
tion is that it dissociates the logic of the analysis (i.e., the
plemented as a macro and hides the configuration of the
precision of the end result as well as intermediate results)
particular analysis, to the extent possible. The plain-Datalog
from the implementation decisions used to perform the anal-
code for each analysis is then generated by instantiating the
ysis efficiently. The resulting specification is a Datalog pro-
macros. The total size of the analysis logic in D is less
gram, and is, therefore, executable. Nevertheless, the pro-
than 2500 lines of code (approximately 180 Datalog pro-
gram may not be efficient as originally specified. The goal of
gram rules) and another some 1000 lines of relation declara-
our optimization methodology (described in Section 4) is to
tions (i.e., specifications of the database schema), comments,
produce equivalent Datalog programs that are more efficient.
and minor support code. These metrics include all pointer
For the rest of this section, however, we are only concerned
analysis variants, but commonalities are factored out using
with the logical specification of the analyses.
our variable-tuple mechanism. The plain-Datalog size of a
This separation of specification from implementation is
single analysis variant after macro-expansion is in the or-
already done informally, as a classification, in the points-
der of 500-1000 lines, or 120-150 Datalog rules. In the code
to analysis literature. Several different published algorithms
examples of this paper, unless stated otherwise, we will ig-
occupy the same point in the design space (e.g., they are all
nore variations and concentrate on the standard 1-call-site-
1-object-sensitive analyses) but differ in properties such as
sensitive analysis for concreteness.
their average runtime or asymptotic complexity, often be-
The analysis logic in D can be viewed as an elabora-
cause of different choices of indexing and storage data struc-
tion of the simple Datalog example shown earlier. Consider
tures. Hence, the effort to specify an analysis in D con-
the full-fledged analogues of the two basic rules from Sec-
sists of, first, producing a logical specification and, then, de-
tion 2.
riving an efficient algorithm for that specification. The two
steps are not entirely independent, because it is sometimes 1 VarPointsTo(?ctx, ?var, ?heap) <-
hard to tell which decisions are part of the “specification” 2 AssignHeapAllocation(?var, ?heap, ?inmethod),
of an analysis and which are part of the “implementation”. 3 CallGraphEdge(_, _, ?ctx, ?inmethod).
For example, treating the static initializers of Java classes 4

context-insensitively (even for a context-sensitive analysis) 5 VarPointsTo(?toCtx, ?to, ?heap) <-


6 Assign(?fromCtx, ?from, ?toCtx, ?to, ?type),
does not affect the end result of an analysis, but has a ma-
7 VarPointsTo(?fromCtx, ?from, ?heap),
jor impact on its runtime. However, expressing this deci-
8 HeapAllocation:Type[?heap] = ?heaptype,
sion affects the specification: the two Datalog programs are 9 AssignCompatible(?type, ?heaptype).
not equivalent. (What makes these specifications equivalent
is extra knowledge about the input relations, i.e., a restric- (We use some extensions and notational conventions in the
tion of the input domain that only the algorithm designer code. First, some of our relations are functions, and the func-
knows.) In this paper, we call optimizations the transforma- tional notation “Relation[?domainvar] = ?val” is used in-
tions that produce an equivalent Datalog program (i.e., all stead of the relational notation, “Relation(?domainvar,
relations have the same contents for all inputs), and call log- ?val)”. Semantically the two are equivalent, but the execu-
ical enhancements or algorithmic enhancements the trans- tion engine enforces the functional constraint and produces
formations that logically change the original specification. an error if a computation causes a function to have multiple
range values for the same domain value. Second, the colon CallGraphEdge(?callerCtx, ?call, ?calleeCtx, ?callee) <-
(:) in relation names is just a regular character with no se- VirtualMethodCall:Base[?call] = ?base,
VirtualMethodCall:SimpleName[?call] = ?name,
mantic significance—we use common prefixes ending with a
VirtualMethodCall:Descriptor[?call] = ?descriptor,
colon as a lexical convention for grouping related predicates. VarPointsTo(?callerCtx, ?base, ?heap),
Finally, “ ” stands for “any value”, in the standard logic pro- HeapAllocation:Type[?heap] = ?heaptype,
gramming convention.) MethodLookup[?name, ?descriptor, ?heaptype] = ?callee,
?calleeCtx = ?call.
The full rules differ from their simplified versions in sev-
eral ways. First, all relations have extra arguments for the Figure 1. Computing (context-sensitive) call graph edges
context of Java variables: wherever the original relations had from a call-site to a method, both under specific contexts.
a Datalog variable that corresponded to a Java program vari- A call graph edge exists if there exists a virtual method call,
able (e.g., ?from, ?to) the full relations have first a Data- ?call, whose receiver object is referenced through variable
log variable corresponding to a context, and then one corre- ?base, which points to a heap object, ?heap, whose type
sponding to the Java variable. Second, for an allocation to contains a method, ?callee, compatible with the virtual call.
flow to a variable in a given context, the allocation site has The context of ?callee for this call is just the call-site,
to be reachable in the given context, from any other method ?call, since the code is for a 1-call-site-sensitive analysis.
and context (line 3). Finally, variable assignments take into
MethodLookup[?name, ?descriptor, ?type] = ?method <-
account the type system (through AssignCompatible, on line MethodImpl[?name, ?descriptor, ?type] = ?method.
9) so that a variable is never considered to point to an object
abstraction if its type prohibits it. MethodLookup[?name, ?descriptor, ?type] = ?method <-
Some more rules complete the definition of VarPointsTo. DirectSuperclass[?type] = ?supertype,
MethodLookup[?name, ?descriptor, ?supertype] = ?method,
The full analysis takes into account method calling, assign- not exists MethodImpl[?name, ?descriptor, ?type].
ment to fields, arrays, and more.
Importantly, the entire analysis is specified in Datalog, MethodImpl[?name, ?descriptor, ?type] = ?method <-
including call graph construction. That is, the interdepen- MethodDecl[?name, ?descriptor, ?type] = ?method,
not MethodModifier("abstract", ?method).
dency between call graph construction (i.e., which methods
are reachable in a given context) and points-to analysis is ex- Figure 2. The definition of relation MethodLookup, used in
pressed as plain Datalog mutual recursion. This allows call Figure 1. Looking up a method with a specific name (?name),
graph discovery on-the-fly, which Lhoták and Hendren [18] return type, and parameter types (?descriptor) in a given
find to be an important asset for precision. Previous pointer type (?type) is done by either finding a non-abstract method
analysis algorithms in Datalog (mainly Whaley et al.’s bddb- declaration within ?type, or repeating the lookup for the
ddb and its client analyses [21, 28, 29]) did not support on- direct superclass of ?type if no such declaration exists. (The
the-fly call-graph discovery, except for very simple, context- syntax “not exists F[x]” means that there is no value v for
insensitive, analyses.1 which F[x] = v.)
For a concrete instance of the mutual recursion, we can
look at one of the rules defining the CallGraphEdge rela- closely modeled the handling of Java features after the logic
tion (which is used to compute VarPointsTo and itself uses in the P system. P covers several complex Java
VarPointsTo). The rule computes call graph edges due to features and semantic complexities (e.g., finalization, privi-
virtual method invocations and is shown in Figure 1. The leged actions, threads, etc.). Implementing an analysis that
definition of CallGraphEdge also uses an auxiliary defini- is logically equivalent to P helps demonstrate that our
tion, shown in Figure 2, of a virtual method lookup relation. Datalog-based approach is a full-featured implementation
Combined, this is the declarative specification of fully on- and not a toy or a proof-of-concept.
the-fly call graph discovery, which is more precise than a Indeed, in several cases we found ways to add more
pre-computed call graph, as in bddbddb. precision or model Java semantics better than P, thus
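The flavor of this mutual recursion can be conveyed by a small fixpoint loop in Python, in which reachability feeds points-to facts and points-to facts resolve virtual calls that in turn extend reachability. All program facts below are invented, and the code is only an illustration of the idea, not Doop's implementation:

# On-the-fly call-graph construction and points-to analysis as one fixpoint.
alloc  = {"m1": [("x", "new A/1")]}              # method -> [(var, heap object)]
vcalls = {"m1": [("x", "foo", "m1/call1")]}      # method -> [(receiver var, name, call site)]
lookup = {("new A/1", "foo"): "A.foo"}           # (receiver heap object, name) -> resolved method
entry  = "m1"

reachable = {entry}
var_points_to = set()    # (var, heap)
call_graph = set()       # (call site, target method)

changed = True
while changed:
    changed = False
    # Points-to needs reachability: only allocations in reachable methods count.
    for m in list(reachable):
        for var, heap in alloc.get(m, []):
            if (var, heap) not in var_points_to:
                var_points_to.add((var, heap))
                changed = True
    # Reachability needs points-to: a virtual call is resolved through the
    # objects its receiver may point to, and the resolved target becomes reachable.
    for m in list(reachable):
        for recv, name, site in vcalls.get(m, []):
            for var, heap in var_points_to:
                if var == recv and (heap, name) in lookup:
                    target = lookup[(heap, name)]
                    if (site, target) not in call_graph:
                        call_graph.add((site, target))
                        changed = True
                    if target not in reachable:
                        reachable.add(target)
                        changed = True

print(reachable, call_graph, var_points_to)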
improving over past state-of-the-art techniques and mak-
3.3 Support for Java Language Features ing D probably the most sophisticated pointer analysis
D offers full support for Java language semantics, en- framework in existence for Java. This sophistication is im-
tirely in Datalog, without other peripheral analyses. We portant for client analyses that need sound results. For in-
stance, compared to P, D adds such features as:
1 Specifically,the bddbddb work [13, 29] computes the call-graph on-the-
fly with a context-insensitive analysis, and then uses it as input to context-
sensitive analyses. Thus, the added precision of the context-sensitive points-
• Better initialization of the Java Virtual Machine. For ex-
to analysis is not available to the call-graph computation, which, in turn, ample, we model the system and main thread group, main
reduces the precision of the points-to analysis. This limitation is not inci- thread.
dental. Since context-sensitivity in bddbddb is handled through a cloning
approach [29], a pre-computed call-graph is necessary: cloning techniques • Full support for Java’s reference objects (such as
are based on copying methods for each of their calling contexts. WeakReference) and reference queues. For example, ref-
erence queues are used by Java Virtual Machine to invoke /** If S is an ordinary (nonarray) class, then:
finalize methods. * o If T is a class type, then S must be the
* same class as T, or a subclass of T.
• More sophisticated reflection analysis. For example, */
D uses distinct representations of instances of CheckCast(?s, ?s) <- ClassType(?s).
java.lang.Class for every class in the analyzed program. CheckCast(?s, ?t) <- Subclass(?t, ?s).
This reduces the number of human configuration points, /** o If T is an interface type, then S must
solves more reflection scenarios automatically, and im- * implement interface T.
proves precision. */
CheckCast(?s, ?t) <- ClassType(?s),
• More precise class initialization, modeling better the Java Superinterface(?t, ?s).
Language Specification.
/** If S is an interface type, then:
• More precise handling of cast and assignment compatibil- * o If T is a class type, then T must be Object
ity checking. */
CheckCast(?s, "java.lang.Object") <- InterfaceType(?s).
• More precise exception analysis, using logic that is mu-
tually recursive with the main points-to logic. Exceptions /** o If T is an interface type, then T must be the
are propagated over the context-sensitive call graph, caught * same interface as S or a superinterface of S
*/
exceptions are filtered, and the order of exception handlers CheckCast(?s, ?s) <- InterfaceType(?s).
is considered. In a separate publication [3] we describe CheckCast(?s, ?t) <- InterfaceType(?s),
on-the-fly exception analysis in detail and demonstrate its Superinterface(?t, ?s).
impact on precision, especially for object-sensitive analy-
/** If S is a class representing the array type SC[],
ses. On-the-fly exception analysis is expressible highly el- * that is, an array of components of type SC, then:
egantly in D—another benefit of the declarative speci- * o If T is a class type, then T must be Object.
fication approach. */
CheckCast(?s, "java.lang.Object") <- ArrayType(?s).
• Native methods are simulated in a more principled way.
In P, indirect method calls via native code are some- /** o If T is an array type TC[], that is, an
times not represented explicitly, but shortcut directly from * array of components of type TC, then one
* of the following must be true:
the Java call to the Java method. We model the call graph * + TC and SC are the same primitive type
edges more precisely, which is important if applications */
need a correct call graph. CheckCast(?s, ?t) <- ArrayType(?s), ArrayType(?t),
ComponentType(?s, ?sc),
The declarative approach was of great help in adding ComponentType(?t, ?sc),
PrimitiveType(?sc).
language feature support. A major benefit is that semantic
extensions are well localized and do not affect the basic /** + TC and SC are reference types (2.4.6),
definitions (e.g., those in Section 3.2) at all. In contrast, * and type SC can be cast to TC by
several features in the P framework (e.g., privileged * recursive application of these rules.
*/
actions, finalization, threads) have their implementation span CheckCast(?s, ?t) <- ComponentType(?s, ?sc),
multiple components. ComponentType(?t, ?tc),
A second advantage of the declarative approach is that the ReferenceType(?sc),
logic is high-level and often very close to the Java Language ReferenceType(?tc),
CheckCast(?sc, ?tc).
Specification. A striking example is the implementation of
the logic for the Java cast checking—i.e., the answer to the /** o If T is an interface type, T must be one of
question “can type A be cast to type B?” Figure 3 shows * the interfaces implemented by arrays (2.15).
the full logic, directly from the D implementation, with */
CheckCast(?s, "java.lang.Cloneable") <- ArrayType(?s).
the text of the Java Language Specification in the comments CheckCast(?s, "java.io.Serializable") <- ArrayType(?s).
preceding each rule. As can be seen, the Datalog code is
almost an exact transcription of the Java specification. (The Figure 3. Checkcast implementation in D.
main difference is that the specification is written in a must
ture. This range includes or exceeds practically all precise
style, whereas the Datalog code specifies which casts may
context-sensitive analyses demonstrated to be feasible in
happen. The “must” property is ensured by the least-fixpoint
prior literature. We refer throughout the paper to the pre-
evaluation of Datalog.)
cision characteristics of the analyses in D, especially by
reference to other systems. In order, however, to classify the
3.4 Discussion D-supported analyses in the larger spectrum of pointer
D currently supports a rich range of analyses with
standard precision enhancements from the research litera-
analysis mechanisms, it is convenient to explicitly list the The above list immediately serves to classify the D-
major features for completeness: supported analyses as much more precise and full-featured
than previous declarative pointer analyses in the literature.
Specifically, the bddbddb system [28, 29] lacks in support
• D implements subset-based (or inclusion-based) anal- for many Java features, such as native code, reflection, fi-
yses, which preserve the directionality of assignments (un- nalization, etc., whose handling constitutes a large part of
like equivalence-based analyses). the D analyses. Although sophisticated client analyses
have been implemented on top of bddbddb (e.g., jchord [21])
• There is fully on-the-fly callgraph discovery. Additionally,
these analyses are such that they can tolerate unsound han-
the propagation of analysis facts is limited to reachable
dling of Java features, and they act as pure clients: their
methods (i.e., takes the callgraph into account).
sophistication does not benefit in any way the precision
• The analyses are field-sensitive, which distinguishes be- of the base points-to analysis. Similarly, the quite sophis-
tween the different fields of an object (as opposed to “field- ticated reflection analysis of Livshits et al. [19] is expressed
insensitive”), and between fields of different objects (as op- on top of bddbddb’s points-to analysis, but is not strictly
posed to “field-based”). declarative since it depends on facts computed by a Java
• The analyses can have different kinds of context- pre-analysis, and only applies to context-insensitive analy-
sensitivity (call-site, thread- or object-sensitivity) as well ses. (As mentioned earlier, context-sensitivity in bddbddb is
as a context-sensitive heap abstraction (“heap cloning”). cloning-based and, thus, relies on having a pre-computed
The context of a called method can be chosen from the call-graph. Integrating this with reflection would be non-
current context as well as the context of the receiver ob- trivial.) Furthermore, the reflection analysis of Livshits et al.
ject. produces an incorrect call-graph, because it does not take
into account the possibility of dynamic dispatch for methods
• The analyses are array-element insensitive, i.e. elements
invoked reflectively. This observation is perhaps indicative
of an array are not distinguished.
of the difference between treating language features as an in-
• The analyses take type information into account: points- tegral part of a declarative points-to analysis intended as the
to facts are not propagated if they would violate the JVM basis for sound inferences, vs. separating the base points-to
type system. analysis from language feature support. D’s handling of
• D integrates several specialized precision enhance- reflection can be viewed as analogous to adding a sophisti-
ments. For instance, a straightforward but imprecise way to cated analysis similar to Livshits et al.’s, but in conjunction
model the flow of the receiver object in virtual method dis- with a context-sensitive points-to analysis, to obtain the full
patch is by an assignment of the base variable of the virtual benefit from the mutual increase in precision of both com-
call (?base in Figure 1) to this. This is imprecise, since ponent analyses.
the same virtual method call can invoke different methods, To illustrate the gap in analysis sophistication between
depending on the type of the receiver object. These meth- bddbddb and D (as well as P), we performed the
ods all receive the same points-to set for this if the base same context-insensitive analysis in both frameworks for the
variable is assigned to this. Instead, we combine the as- DaCapo benchmark programs. (DaCapo v.2006-10-MR2,
signment of receiver objects with virtual method dispatch JDK 1.4–j2re1.4.2 18, bddbddb svn revision 654, joeq com-
and assign a specific receiver object (?heap in Figure 1) to piler framework revision 2483.) Compared to D, bddb-
this. This precision improvement is borrowed from P- ddb reports roughly half the reachable methods (max: 74%,
. min: 17%, median: 53%, over the 10 DaCapo applications)
and less than one-quarter of the points-to facts (max: 64%,
• D only considers special methods (constructors, pri-
min: 3%, median: 21%). The discrepancy is due entirely to
vate, and superclass methods) reachable if the base vari-
the incompleteness of the points-to logic in bddbddb, since
able of the invocation points to any objects. Unlike virtual
the analyses have the same inherent precision. (Increased
method invocations, the target of a special method invo-
precision would be unlikely to account for such a dramatic
cation does not depend on the run-time class of the object.
reduction in reachable methods anyway: even the most pre-
Therefore, it is tempting to ignore the objects the base vari-
cise, highly context-sensitive analysis in the D and P-
able points to. However, if the variable does not point to
 set barely reduces the number of reachable methods by
any objects, then the method cannot be invoked. This pre-
3-4%.)
cision improvement is borrowed from P as well.
In the past, researchers have questioned whether it is
• Just as in the P framework, D can achieve some even possible to express purely declaratively a full-featured
of the benefits of flow-sensitivity for local variables, by points-to analysis (comparable to P, which uses im-
applying the analysis on the static single assignment (SSA) perative code with support for relations [17]). Lhoták [15]
form of the program, e.g. the SSA variant of Soot’s Jimple writes:
intermediate representation of Java bytecode.
“[E]ncoding all the details of a complicated program was empty at the beginning of the step. In the second step,
analysis problem (such as the interrelated analyses [on- however, this rule joins the new members of VarPointsTo
the-fly call graph construction, handling of Java features]) from step 1, ∆VarPointsTo1 , with those of input relation
purely in terms of subset constraints [i.e., Datalog] may be Assign. This produces ∆VarPointsTo2 , i.e., the new mem-
difficult or impossible.” bers of VarPointsTo from step 2. The next step only needs
D demonstrates that an elegant declarative specification to join ∆VarPointsTo2 with Assign, in order to produce
is possible and even easy. ∆VarPointsTo3 , and so on.
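The delta-based strategy can be made concrete by refining the naive evaluation sketch given earlier: each step joins only the previous step's delta with Assign, never re-traversing the full VarPointsTo relation. As before, this is a hypothetical Python sketch with invented input facts, not Doop code:

# Semi-naive evaluation of the two-rule points-to program.
assign_heap_allocation = {("a", "new A/3"), ("b", "new B/7")}   # (var, heap)
assign = {("a", "c"), ("c", "d")}                               # (from, to)

# Index Assign by its ?from column, so a delta tuple binds the lookup key.
assign_by_from = {}
for frm, to in assign:
    assign_by_from.setdefault(frm, []).append(to)

var_points_to = set(assign_heap_allocation)   # step 0: facts from the first rule
delta = set(assign_heap_allocation)           # delta of step 0

while delta:
    next_delta = set()
    for var, heap in delta:                      # iterate only the small delta
        for to in assign_by_from.get(var, []):   # indexed lookup, no full scan
            fact = (to, heap)
            if fact not in var_points_to:
                next_delta.add(fact)
    var_points_to |= next_delta
    delta = next_delta                           # the new facts become the next delta

print(sorted(var_points_to))

Because the recursive rule is joined only against the delta, the work of each step is proportional to the newly derived facts rather than to the full relation.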
Although D is a flexible framework, it is not suited to This optimization is straightforward, yet crucial. It is a
all kinds of analyses. A clear limitation, for instance, is that major benefit that we get for free from using a declarative
the context-depth used in the analysis has to be bounded. language for specifying our analysis. There are more benefits
D cannot support analyses that keep an unbounded num- that D receives for free through standard Datalog imple-
ber of calling contexts, even if the number is guaranteed to mentation techniques. Specifically, local join optimization is
be finite (e.g., recursive cycles are flattened). This is due performed: a good order of joins in a single Datalog rule is
to the lack of constructors/functions in Datalog. This ob- automatically determined based on statistics on the size of
servation is unlikely to have any bearing in practice, how- relations and selectivity of joins. This baseline is valuable
ever, since other precision enhancements, such as a context- but still leaves us orders of magnitude away from the perfor-
sensitive heap, have been shown to be a better trade-off mance of a state-of-the-art context-sensitive program analy-
than an unbounded number of contexts [18]. Combining a sis. For this we need optimizations across rules, introduction
context-sensitive heap with even small bounds in context of new database indexes, etc. These optimizations are typi-
sensitivity (e.g., 4-context-sensitive) is sufficient to make an cally not well-automatable: they correspond to producing an
analysis explode in complexity. efficient algorithm from a specification, and require human
intervention.
4. Illustration of D Optimizations In order to execute Datalog programs efficiently, the low-
A declarative specification has advantages in terms of modu- level representation of relations should be compact and an
larity, ease of understanding, and conciseness of expression. indexing scheme should be in place so that all rules are ex-
One more advantage, however, is that it decouples the analy- ecuted efficiently. The LogicBlox Datalog engine used for
sis logic from its implementation, and allows high-level rea- D allows the user to specify maximum cardinalities for
soning about implementation choices. In D we have used the domains of variables (e.g., the maximum number of val-
a novel optimization methodology to convert initial speci- ues for ?var in relation VarPointsTo(?var, ?heap)). These
fications into highly efficient algorithms. Because D is are used to store domain values as integers and all values of
expressed in a version of Datalog that exposes indexing de- variables (keys) in the same relation (?var and ?heap in our
cisions to the language level, we can illustrate the optimiza- example) are packed together in the smallest number of ma-
tions as just Datalog program transformations. chine words possible using bit shifts and mask operations. A
We begin with some background information on Datalog relation is then represented as a sequence of these packed in-
runtimes and the particular engine we use. tegers for which the relation is true. (Alternatively, the user
can specify that the default value for the relation is “false”, in
4.1 Background: Efficient Datalog Evaluation which case the system stores all packed keys for which the
A standard optimization for Datalog (indeed, a virtual pre- relation is false. So far we have not used this capability in
requisite for high performance implementations) is the semi- D because all points-to results are very sparse relations.)
naive evaluation strategy. Semi-naive evaluation keeps track As in all database languages, efficiency of execution typ-
of relation “deltas” on every recursive step, which corre- ically depends on what indexes are defined on the data so
spond to the new facts produced by the step. In this way, the that relational operations can be highly efficient. A unique
next step’s results are derived incrementally by using only feature of the Datalog engine that we use is that the index-
the previous step’s deltas, in all their possible join combina- ing is exposed to the Datalog language level. In this way,
tions with full relations. Consider the evaluation of the ex- introducing and eliminating indexes can be viewed as just a
ample from Section 2, reproduced below: program transformation, instead of needing to edit the data
schema or other configuration files. Specifically, a relation,
1 VarPointsTo(?var, ?heap) <-
e.g., VarPointsTo(?var, ?heap), is stored with its contents
2 AssignHeapAllocation(?var, ?heap).
3 VarPointsTo(?to, ?heap) <-
(pairs of packed variable values) ordered by innermost vari-
4 Assign(?from, ?to), VarPointsTo(?from, ?heap). able, i.e., ?heap, and then by the next innermost variable,
i.e., ?var, etc. The relation is indexed using a B-tree with a
Initially, relation VarPointsTo is empty. The first key consisting of all variables together. Since, however, a B-
step populates relation VarPointsTo with the facts from tree is an ordered map, knowing the value of the innermost
AssignHeapAllocation, as dictated by lines 1-2. The rule variable alone is sufficient for efficient indexing. (I.e., the in-
in lines 3-4 has nothing to contribute, since VarPointsTo
nermost variable is the major index, the second innermost sented by ?from, and an identifier for field fld captured by
is the next major index, etc.) Thus, variable ordering is very ?signature.
important. The user can change the indexing efficiency to op- Our simple analysis can then be elaborated: (The first two
timize joins, by just reordering variables. For instance, a join rules are the same but two more rules are added.) A new
between two relations is very fast if both relations have the relation, InstanceFieldPointsTo, is used to compute which
join variables in their innermost positions and in the same heap object (?baseheap) can point to which other (?heap)
order. In that case, both relations just need to be traversed through a given field (?signature).
linearly and their contents merged. Another scheme for an 1 VarPointsTo(?var, ?heap) <-
efficient join is when joining over the innermost variable of 2 AssignHeapAllocation(?var, ?heap).
one relation and the second relation is small (so it can be 3 VarPointsTo(?to, ?heap) <-
iterated exhaustively and bind the index variable of the first 4 Assign(?from, ?to), VarPointsTo(?from, ?heap).
5 VarPointsTo(?to, ?heap) <-
relation). As a rule of thumb, when a relation is known to be 6 LoadInstanceField(?base, ?signature, ?to),
small, the local query optimizer will automatically choose 7 VarPointsTo(?base, ?baseheap),
to perform the join by iterating exhaustively over its con- 8 InstanceFieldPointsTo(?baseheap, ?signature, ?heap).
9
tents. The iteration will bind variables of other relations be-
10 InstanceFieldPointsTo(?baseheap, ?signature, ?heap) <-
ing joined. These variables should be in the innermost posi- 11 StoreInstanceField(?from, ?base, ?signature),
tions, so that their values can be used for efficient indexing. 12 VarPointsTo(?base, ?baseheap),
Our optimization methodology, described next, exploits this 13 VarPointsTo(?from, ?heap).
technique, in particular considering semi-naive evaluation. Reordering Transformation. The above is a straightfor-
In summary, the use of Datalog in D separates the
ward way to express the analysis, but the resulting program
specification of an analysis from its implementation, there- is highly inefficient. (Recall that the order of variables in the
fore allowing multiple techniques for efficient execution, all above relations reflects how the relations are indexed.) In
expressed at the level of Datalog evaluation. Our current particular, the joins of line 4, 6-8, and 11-13 are all costly.
Datalog engine is in many ways mature, but only uses very In line 4, neither relation has the join variable in its inner-
simple data structures (B-trees and an explicit representation
most position. In particular, relation VarPointsTo is recur-
of relations). It is tempting in the future to consider alterna-
sive. After the first step, Datalog’s semi-naive evaluation will
tive Datalog execution techniques (e.g., the option to trans- only need to join the delta of the VarPointsTo relation (i.e.,
parently use BDDs to represent relations) especially if these a small relation) to produce the new results for the next step.
are provided in a well-engineered implementation. Therefore, it makes sense to reorder the variables of rela-
tion Assign so that it is indexed efficiently based on vari-
4.2 Optimization Methodology able bindings produced by VarPointsTo. That is, the pro-
Based on this understanding of Datalog evaluation and op- gram will be more efficient if relation Assign is stored as
Assign(?to, ?from) rather than Assign(?from, ?to), be-
timization opportunities, we next present the optimization
techniques we use in D through examples. cause variable ?from is bound by iterating over the contents
of small relation ∆VarPointsTo. (Of course, this decision on
Consider a refinement of our above rudimentary two-
how to store Assign may adversely affect joins in other parts
rule pointer analysis logic. We will add to our analy-
sis field sensitivity: heap objects can be stored to and of the program—we will soon see how to resolve this.) Sim-
loaded from instance fields and the analysis keeps track ilar observations apply to the joins in lines 6-8 and 11-13: no
of such actions. (This example ignores other language relation has a join variable in its innermost position. Just by
applying simple reorderings we can produce a much more
features such as method calls—i.e., we assume the ana-
efficient implementation:
lyzed program is just a single main function.) Two new
input relations are derived from the code of a Java pro- 1 VarPointsTo(?heap, ?var) <-
gram: LoadInstanceField(?base, ?signature, ?to) and 2 AssignHeapAllocation(?heap, ?var).
3 VarPointsTo(?heap, ?to) <-
StoreInstanceField(?from, ?base, ?signature). The 4 Assign(?to, ?from), VarPointsTo(?heap, ?from).
former tracks a load from the object referenced by vari- 5 VarPointsTo(?heap, ?to) <-
able ?base in the field identified by ?signature. If, for in- 6 LoadInstanceField(?to, ?signature, ?base),
stance, the Java program contains an action “x = v.fld;”, 7 VarPointsTo(?baseheap, ?base),
8 InstanceFieldPointsTo(?heap, ?signature, ?baseheap).
then LoadInstanceField contains an entry with ?base be- 9
ing the representation of Java variable “v”, ?signature 10 InstanceFieldPointsTo(?heap, ?signature, ?baseheap) <-
identifying field “fld”, and ?to corresponding to “x”. 11 StoreInstanceField(?from, ?signature, ?base),
12 VarPointsTo(?baseheap, ?base),
StoreInstanceField tracks store actions in a similar
13 VarPointsTo(?heap, ?from).
manner: Every Java program action “v.fld = u;” corre-
sponds to an entry in StoreInstanceField(?from, ?base, Folding Transformation. The idea we used in the above
?signature), with v represented by variable ?base, u repre- transformation is general. The key novel principle of our op-
timization methodology is that, for highly recursive Data- remaining variables so that the join with the third relation is
log programs (such as our points-to analyses), the primary highly efficient. This results in the following optimized pro-
determinant of performance is whether the relation deltas gram, with intermediate relation StoreHeapInstanceField
produced by semi-naive evaluation bind all the variables introduced.
needed to index into other relations. In this way, exhaus-
tive traversal of non-deltas is avoided. To achieve this effect, 1 VarPointsTo(?heap, ?var) <-
2 AssignHeapAllocation(?heap, ?var).
we often need to introduce new indexes. Since in our Dat- 3 VarPointsTo(?heap, ?to) <-
alog engine an index is always tied to the order of relation 4 Assign(?to, ?from), VarPointsTo(?heap, ?from).
variables, to obtain a new index we need to introduce new 5 VarPointsTo(?heap, ?to) <-
relations. This is done through applications of the folding 6 LoadInstanceField(?to, ?signature, ?base),
7 VarPointsTo(?baseheap, ?base),
program transformation [5]. Folding introduces a temporary 8 InstanceFieldPointsTo(?heap, ?signature, ?baseheap).
relation that holds the result of intermediate joins. This can 9

improve performance in many ways. First, it can re-order 10 InstanceFieldPointsTo(?heap, ?signature, ?baseheap) <-
11 StoreHeapInstanceField(?baseheap, ?signature, ?from),
variables in the intermediate relation and, thus, introduce a
12 VarPointsTo(?heap, ?from).
new index, so that further joins are more efficient. Second, it 13
can cache intermediate results, implementing the “view ma- 14 StoreHeapInstanceField(?baseheap, ?signature, ?from) <-
terialization” database optimization. Third, it can be used to 15 StoreInstanceField(?from, ?signature, ?base),
16 VarPointsTo(?baseheap, ?base).
guide the query optimizer to perform joins between smaller
relations first, so as to minimize intermediate results. Fi-
nally, it can be used to project out unnecessary variables, Note that the last two rules only contain relations with
the same innermost variables, therefore any delta-execution
thus keeping intermediate results small.
of those rules is efficient. Implicitly, this is achieved because
Many of these benefits can be obtained in our simple
pointer analysis program. Consider the 3-way join in lines the folding also adds a new index, for the new intermediate
11-13 of the above “optimized” program. Since relation relation.
VarPointsTo is recursive and used twice, either of its in-
The above program still admits more optimization, as one
stances can be thought of as a “small” relation from the more inefficient join remains. Consider the joins in lines
6-8 of the above program. Both relation VarPointsTo and
perspective of join efficiency. Specifically, under semi-naive
relation InstanceFieldsPointsTo are recursively defined.
evaluation, one can think of the above rule (in lines 10-13)
as equivalent to the following delta-rewritten program: (There is direct recursion in VarPointsTo, as well as mu-
tual recursion between them.) Thus, after the first step, their
∆InstanceFieldPointsTo(?heap, ?signature, ?baseheap) <- deltas will be joined with the full other relations. Specifi-
StoreInstanceField(?from, ?signature, ?base), cally, in semi-naive evaluation the above rule (lines 5-8) is
∆VarPointsTo(?baseheap, ?base),
VarPointsTo(?heap, ?from).
roughly equivalent to:
∆InstanceFieldPointsTo(?heap, ?signature, ?baseheap) <-
StoreInstanceField(?from, ?signature, ?base), ∆VarPointsTo(?heap, ?to) <-
VarPointsTo(?baseheap, ?base), LoadInstanceField(?to,?signature,?base),
∆VarPointsTo(?heap, ?from). ∆VarPointsTo(?baseheap,?base),
InstanceFieldPointsTo(?heap, ?signature, ?baseheap).
(We elide version numbers, since we are just making an ∆VarPointsTo(?heap, ?to) <-
LoadInstanceField(?to, ?signature, ?base),
efficiency point. Note that the deltas are also part of the VarPointsTo(?baseheap, ?base),
full relation—i.e., they are the deltas from the previous step. ∆InstanceFieldPointsTo(?heap, ?signature, ?baseheap).
Hence, we do not need a third rule that joins two deltas
together.) As before, the performance problem is with the second
The first rule is fairly efficient as-is: the delta relation delta rule: the innermost variable of the large relations is
binds variable ?base, which is used to index into rela- not bound by the delta relation. It is tempting to try to
tion StoreInstanceField and bind variable ?from, which eliminate the inefficiency with a different variable order,
is used to index into relation VarPointsTo(?heap, ?from). without performing more folds. Indeed, we could optimize
The second rule, however, would be disastrous if executed the joins in lines 3-8 without an extra fold, by reordering the
as-is: none of the large relations has its innermost variable variables of VarPointsTo as well as LoadInstanceField—
bound by the delta relation. We could improve the perfor- the latter so that ?signature is last. This would conflict with
mance of the second rule by reordering the variables of the joins in lines 10-16, however, and would require further
StoreInstanceField but there is no way to do so without rewrites.
destroying the performance of the first rule. Therefore, the inefficiency can be resolved with a fold,
This conflict can be resolved by a fold. We introduce a which will also reorder variables so that all joins are highly
temporary relation that captures the result of a two-relation efficient: the joined relations always have a common in-
join, projects away unnecessary variables, and reorders the nermost variable. We introduce the intermediate relation
LoadHeapInstanceField, and get our final highly-optimized to mere seconds. Furthermore, the optimizations are robust
program: with respect to the different analysis variants supported in
D. The same optimized trunk of code is used for analy-
1 VarPointsTo(?heap, ?var) <-
2 AssignHeapAllocation(?heap, ?var). ses with several different kinds of context sensitivity.
3 VarPointsTo(?heap, ?to) <-
4 Assign(?to, ?from), VarPointsTo(?heap, ?from). 5. D Performance
5 VarPointsTo(?heap, ?to) <-
6 LoadHeapInstanceField(?to, ?signature, ?baseheap), We next present performance experiments for D, and
7 InstanceFieldPointsTo(?heap, ?signature, ?baseheap). especially contrast it with P—a BDD-based framework
8
that is state-of-the-art in terms of features and scalability.
9 LoadHeapInstanceField(?to, ?signature, ?baseheap) <-
10 LoadInstanceField(?to, ?signature, ?base),
Because of the variety of experimental results, a roadmap is
11 VarPointsTo(?baseheap, ?base). useful:
12

13 InstanceFieldPointsTo(?heap, ?signature, ?baseheap) <- • We first evaluate D in “P-compatibility” mode. In


14 StoreHeapInstanceField(?baseheap, ?signature, ?from), this mode, D results are precisely equivalent to P.
15 VarPointsTo(?heap, ?from). This, however, means that the analysis does not support ex-
16

17 StoreHeapInstanceField(?baseheap, ?signature, ?from) <-


ceptions, which D treats very differently. In this mode,
18 StoreInstanceField(?from, ?signature, ?base), D is much faster (6.6x to 16.3x in median speedup) than
19 VarPointsTo(?baseheap, ?base). P for standard context-sensitive analyses (1-call-site,
1-call-site+heap, 1-object, 1-object+heap).
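The effect of reordering plus folding can be imitated in the explicit-set style of the earlier sketches: the intermediate relation is materialized as a dictionary keyed by the variable that the VarPointsTo delta binds, so the recursive join becomes an indexed lookup. The relation contents below are invented for illustration, and the code is only a sketch of the idea, not Doop code:

# Folding, imitated with explicit sets and dictionaries:
#   StoreHeapInstanceField(?baseheap, ?sig, ?from) <-
#       StoreInstanceField(?from, ?sig, ?base), VarPointsTo(?baseheap, ?base).
#   InstanceFieldPointsTo(?heap, ?sig, ?baseheap) <-
#       StoreHeapInstanceField(?baseheap, ?sig, ?from), VarPointsTo(?heap, ?from).
store_instance_field = {("u", "A.fld", "v")}            # corresponds to "v.fld = u;"
var_points_to = {("new A/3", "v"), ("new B/7", "u")}    # (heap, var)

# The "fold": materialize the intermediate relation, indexed by ?from.
store_heap_by_from = {}
for frm, sig, base in store_instance_field:
    for heap, var in var_points_to:
        if var == base:                  # heap is the base object, i.e. ?baseheap
            store_heap_by_from.setdefault(frm, []).append((heap, sig))

# A VarPointsTo delta binds ?from directly, so producing new
# InstanceFieldPointsTo facts needs only an indexed lookup.
delta_var_points_to = {("new B/7", "u")}
instance_field_points_to = set()
for heap, frm in delta_var_points_to:
    for baseheap, sig in store_heap_by_from.get(frm, []):
        instance_field_points_to.add((heap, sig, baseheap))

print(instance_field_points_to)   # {('new B/7', 'A.fld', 'new A/3')}

Without the fold, the same delta would have to be joined against the full StoreInstanceField and VarPointsTo relations in an order that cannot be indexed efficiently for both.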
Programmer Insights. Note that the above optimization
decisions are intuitively appealing, although no intuition was • We then compare the full analyses of D with the full
used in deriving them. For instance, a programmer with an P. The D analyses are not exactly equivalent, but
understanding of the domain will likely prefer this order- are strictly more precise than their P counterparts.
ing of variables in VarPointsTo. (Recall that the innermost In this “full-mode”, D outperforms P by 10x for
variable yields the most important indexing with our B-tree call-site-sensitive analyses (including heavier ones, such
ordering.) The relation seems intuitively much more useful as 1-call-site+heap) scales similarly or better than P
when treated as a map of program variables to heap objects, for even the heaviest object-sensitive analyses in P’s
rather than as a map of heap objects to variables that can experiment set, and even handles analyses that P does
point to them. Values flow through variables in a points-to not, such as a 2-call-site-sensitive and a 2-object-sensitive
analysis, not through heap objects directly. analysis, both with a context-sensitive heap.
Additionally, the introduction of temporary relation • Finally, we discuss the lessons learned from comparing an
LoadHeapInstanceField orders the three-way join of explicit representation approach with a BDD-based one.
LoadInstanceField, VarPointsTo, and InstanceFieldPointsTo We see that the performance discrepancy between D
so that the first two relations are joined first. This is and P is well-explained when one considers the total
good, since LoadInstanceField is likely smaller than size of BDDs for the call-graph, var-points-to, and field-
InstanceFieldPointsTo: the former is an input relation, points-to relations. The numbers cast doubt on whether
with its contents in one-to-one correspondence with a sub- BDDs can be the best representation of relations in anal-
set of program instructions, while the latter is inferred from yses similar to the ones we have considered.
a subset of program instructions joined with the (multiply
Preliminaries and Experimental Setup. We use a 64-bit
recursive) points-to relation, resulting in a transitive closure
machine with two quad-core Xeon 2.33GHz CPUs (only one
computation.
thread was active at a time, except for P runs, where
Such insights can sometimes guide the optimization ef-
the Java garbage collector ran on a different thread). The
fort, but they are just heuristics. In the end, we have not
machine has 16GB of RAM and 4MB of L2 cache (actually
found dramatic performance differences between optimiza-
8MB of L2 cache per CPU, but every 2 cores share 4MB).
tion paths that both end up with joins that are syntactically
(For comparison, the P study [18] was conducted on
efficient, i.e., have the join variables in innermost positions
a fairly comparable 4-way 2.6GHz Opteron machine, also
and always bound by a recursive relation so that its delta is
with 16GB of RAM. Although we do not compare absolute
used. This syntactic criterion is, more than anything else, the
numbers with that study, it is useful for context to know
primary determinant of performance.
that qualitative scalability estimates are not due to hardware
Impact. Perhaps surprisingly, the above compact set of op- discrepancies.)
timizations and insights are the main source of the efficiency We analyzed the DaCapo benchmark programs, v.2006-
of D, compared to a naive Datalog implementation. Ap- 10-MR2, with JDK 1.4 (j2re1.4.2 18), which is much larger
plying these optimizations on a full pointer analysis for re- than JDK 1.3 used (with the same programs) by Lhoták
alistic programs results in improvements of over 3 orders and Hendren [18]. Since recent points-to analysis algorithms
of magnitude: run-time is often dropped from many hours (e.g., [10, 11]) claim scaling to “a million lines of code”, we
should point out that our benchmarks are the largest in the There is one algorithmic enhancement applied to D as
literature on context-sensitive points-to analysis. well as P in P-compatibility mode: D treats
We contacted P’s author Ondřej Lhoták to confirm static initializer methods (clinit) context-insensitively.
input parameters for optimal performance (including opti- Static initializers are not affected by any context, so they can
mal BDD variable orderings). The initial settings of the anal- be treated context-insensitively for all of the context abstrac-
ysis are identical to those in the most recent P study tions we study. (For very different kinds of analyses, e.g.,
[18]. P takes an option for an initial number of BDD a thread-sensitive analysis, this will not be true.) This en-
nodes to allocate, which can be used to reduce garbage col- hancement (as well as other logical enhancements discussed
lection. We do not use this option for several reasons. 1) This later) is not significant for P (it strictly improves per-
initial number is also the maximum number of nodes, which formance but only marginally), because P can avoid
requires knowing up-front how complex an analysis will be redundancy through its use of BDDs. The enhancement is,
for a specific benchmark. 2) Setting this number to the maxi- however, important for D’s explicit representation of re-
mum required value would immediately consume all virtual lations.
memory, independent of the specific benchmark. For com- Notably, for the experiments in P-compatibility
parison, we want memory consumption to be proportional mode, both D and P ignore control- and data-flow
to the complexity of an analysis. 3) The performance bene- induced by exceptions. This is necessary, since the D
fit of setting an initial number of BDD nodes was limited in handling of exceptions is significantly different from P-
our experiments (less than 10%), and does not change any ’s.
conclusions. Figures 4 to 8 display the execution times of D vs.
When referring to different analyses we use the prefixes P for five analyses, ranging from context-insensitive
“N-call-site-sensitive”, “N-call-site”, or just “N-call” for an to 1-object-sensitive+heap. As can be seen, D is an
N-call-site-sensitive analysis, and “N-object-sensitive” or order of magnitude faster than P for the context-
just “N-object” for an N-object-sensitive analysis, as well as insensitive analysis (min: 7.4x, max: 10.9x, median: 10x),
the suffixes “+N-heap” or just “+NH” for an analysis with the 1-call-site-sensitive analysis (min: 7.9x, max: 19.6x,
a context-sensitive heap with N (object or call-site) contexts median: 16.3x), and the 1-object-sensitive analysis (min:
kept. (We omit the N if it can only be 1.) E.g., “2-call+1H” 3.2x, max: 18.1x, median: 15.2x). For the heavier analyses,
designates a 2-call-site-sensitive analysis with a context- D is almost always significantly faster than P, ex-
sensitive heap using 1 call-site as context for heap object ab- cept for the bloat benchmark and a 1-object-sensitive+heap
stractions; “1-call+H” designates a 1-call-site-sensitive anal- analysis. Specifically, D exhibits a median speedup of
ysis with a context-sensitive heap (which can only have 1 7.3x for the 1-call-site-sensitive+heap analysis (min: 1.8x,
call-site as context). max: 9.2x) and a median speedup of 6.6x for the 1-object-
We consider any analysis that takes more than 7200 sec- sensitive+heap analysis (min: 0.9x, max: 7.3x). (Recall that
onds (2 hours) to have failed. the latter is the analysis that Lhoták and Hendren considered
The software, benchmark scripts, and more statistics are to require a research breakthrough to implement efficiently
available at http://doop.program-analysis.org/oopsla09. without BDDs.)
The analysis times in seconds illustrate the significance
5.1 P-Compatibility
of the speedup: for most programs, analysis time is dropped
We first evaluate D and P in a mode in which the from several hundreds of seconds to just a few tens of sec-
results are equivalent. We worked hard to ensure semantic onds.
equivalence to a high degree. All operations on relations are Generally, D in P-compatibility mode scales
designed to be logically equivalent. That is, all propagation very well even to much more complex analyses (e.g.,
of facts and all intermediate relations are virtually identical. 2-object+heap). Nevertheless, recall that the P-
Comparing the results of pointer analyses is challenging compatibility mode does not support Java exception han-
because of many minor differences between the analyses. dling. Adding exception handling in a way that is compati-
Also, minor differences frequently propagate everywhere, ble with P would artificially distort D performance.
making it difficult to locate the source of an issue. Never- P exception handling is highly imprecise, treating ev-
theless, we achieved exact equivalence of reachable meth- ery exception throw as an assignment to a single global vari-
ods, reachable method contexts, context-sensitive call graph able. The variable is then read at the site of an exception
edges, instance field points-to, static field points-to, and vari- catch. This approach ignores the information about what
able points-to information. We compared the results auto- exceptions can propagate to a catch site: all catch sites be-
matically and report any differences. The various improve- come related with all type-compatible throw sites and with
ments of D over P in support for Java language fea- each other. This very approximate treatment affects the pre-
tures (Section 3.3) have been patched in Paddle and submit- cision of the analysis results but barely affects performance
ted as bug reports (e.g., reference object support), or disabled for P: the BDD representation of relations tolerates the
in D (e.g., reflection analysis) for this comparison.
redundancy in the computed relations (e.g., a higher number of facts in the call-graph or the var-points-to relations) since the extra facts are highly regular. The BDD representations of relations in Paddle are hardly larger, even with significant exception-handling-induced imprecision. In contrast, Doop's explicit representation of relations cannot tolerate the addition of such "regular" imprecision without suffering performance penalties. This phenomenon is perhaps counter-intuitive: Doop performs much better when imprecision is avoided, which is also a desirable feature for the quality of the analysis.

Figure 4. (Paddle-compatibility mode) context-insensitive: analysis time in seconds for Doop and Paddle over the ten DaCapo benchmarks (bar chart not reproduced).
Figure 5. (Paddle-compatibility mode) 1-call: analysis time in seconds for Doop and Paddle over the ten DaCapo benchmarks (bar chart not reproduced).
Figure 6. (Paddle-compatibility mode) 1-call+H: analysis time in seconds for Doop and Paddle over the ten DaCapo benchmarks (bar chart not reproduced).
Figure 7. (Paddle-compatibility mode) 1-object: analysis time in seconds for Doop and Paddle over the ten DaCapo benchmarks (bar chart not reproduced).
Figure 8. (Paddle-compatibility mode) 1-object+H: analysis time in seconds for Doop and Paddle over the ten DaCapo benchmarks (bar chart not reproduced).

5.2 Full Doop Performance and Precision

Our main experimental results compare the full version of Doop with the full Paddle, and present detailed statistics on the precision of Doop analyses.

The full mode of Doop is not exactly equivalent to the full Paddle, yet the Doop analysis logic is always strictly more precise and more complete, resulting in higher-quality analyses. The differences are in the more precise and complete handling of reflection, more precise handling of exceptions, etc.

Figures 9 to 16 compare the performance of Doop and Paddle. (The analyses presented are a representative selection for space and layout reasons.) This range of analyses reproduces the most demanding analyses in Lhoták and Hendren's experiment set [18] and includes analyses that even exceed the capabilities of Paddle: 2-call+1-heap, 2-object+1-heap, and 2-call+2-heap. As can be seen, Doop is often significantly faster, especially for call-site-sensitive analyses (e.g., a large speedup for 1-call-site (min: 5.0x, max: 12.9x, median: 9.7x) and for 1-call-site+heap (min: 2.3x, max: 16.7x, median: 12.3x)).

Doop is not as fast for object-sensitive analyses, but recall that it performs a much more precise analysis than Paddle because of its precise exception handling. On-the-fly exception handling results in a dramatic, 2x increase in var points-to precision (i.e., on average each variable is inferred to point to half as many objects) for object-sensitive analyses [3]. Still, Doop outperforms Paddle for the vast majority of data points, even for the heaviest analyses in the Paddle set. For the 1-object+heap analysis Doop is faster for 8 out of 10 benchmarks (min: 0.4x, max: 4.0x, median: 3.0x). The only benchmark for which Doop is significantly slower is xalan, but this outlier is due to Paddle's less complete reflection analysis. Paddle misses a large part of the call graph (it reports only 3722 reachable methods, instead of the 6468 reported by Doop) and analyzes much less code.

The significance of these results cannot be overstated: the conventional wisdom has been that such analyses cannot be performed without BDDs. For instance, Lhoták and Hendren write regarding the Paddle study: "It is the use of BDDs and the Paddle framework that finally makes this study possible. Moreover, some of the characteristics of the analysis results that we are interested in would be very costly to measure on an explicit representation." [18] (Recall also that the Paddle study analyzed the DaCapo benchmarks with the smaller JDK 1.3.1_01.)
400 1600
doop doop
paddle paddle
350 1400

300 1200
analysis time (seconds)

analysis time (seconds)


250 1000

200 800

150 600

100 400

50 200

0 0
antlr bloat chart eclipse hsqldb jython luindex lusearch pmd xalan antlr bloat chart eclipse hsqldb jython luindex lusearch pmd xalan

Figure 9. (Full mode) context-insensitive Figure 13. (Full mode) 1-call


1400 7000
doop doop
paddle paddle
1200 6000
analysis time (seconds)

analysis time (seconds)


1000 5000

800 4000

600 3000

400 2000

200 1000

0 0
antlr bloat chart eclipse hsqldb jython luindex lusearch pmd xalan antlr bloat chart eclipse hsqldb jython luindex lusearch pmd xalan

Figure 10. (Full mode) 1-object Figure 14. (Full mode) 1-call+H
4500 2500
doop doop
4000 paddle

3500 2000
analysis time (seconds)

analysis time (seconds)

3000
1500
2500

2000
1000
1500

1000 500
500

0 0
antlr bloat chart eclipse hsqldb jython luindex lusearch pmd xalan antlr bloat chart eclipse hsqldb jython luindex lusearch pmd xalan

Figure 11. (Full mode) 1-object+H Figure 15. (Full mode) 2-call+1H
1400 7000
doop doop
1200 6000
analysis time (seconds)
analysis time (seconds)

1000 5000

800 4000

600 3000

400 2000

200 1000

0 0
antlr bloat chart eclipse hsqldb jython luindex lusearch pmd xalan antlr bloat chart eclipse hsqldb jython luindex lusearch pmd xalan

Figure 12. (Full mode) 2-object+1H Figure 16. (Full mode) 2-call+2H

to measure on an explicit representation." [18] (Recall also that the Paddle study analyzed the DaCapo benchmarks with the smaller JDK 1.3.1_01.)

The last three analyses of our set (2-call+1-heap, 2-object+1-heap, and 2-call+2-heap) are more precise than any context-sensitive analyses ever reported in the research literature. With a time limit of 2 hours, Doop analyzed most of the DaCapo applications under these analyses. All three analyses are impossible with Paddle. The first two are not supported by the Paddle framework, while the third is too heavy to run. (In our tests, the analysis times-out even for the smallest of the DaCapo benchmarks. Lhoták also reports that "[He] never managed to get Paddle to run in available memory with these settings".²)

The range of Doop-supported analyses allows us to obtain insights regarding analysis precision. Figure 17 shows some of the most important statistics on our analyses' results for representative programs. Perhaps the most informative metric is the average points-to set size for plain program variables.³ The precision observations are very similar to those in the Paddle study: object-sensitivity is very good for ensuring points-to precision, and a context-sensitive heap can only serve to significantly enhance the quality of results. We can immediately see the value of our highly precise analyses, and especially the combination of a 2-object-sensitive analysis with a context-sensitive heap. This most precise analysis typically drops the average points-to set size to one-tenth of the size of the least precise (context insensitive) analysis. Remarkably, this even impacts the number of call-graph edges—a metric that notoriously improves very little with increasing the precision of the points-to analysis. In future work we expect to conduct a thorough evaluation of the precision of a wide range of analyses for several end-user metrics.

² http://www.sable.mcgill.ca/pipermail/soot-list/2006-March/000601.html
³ Note the apparent paradox of having the average number of var-points-to facts often be higher when computed over context-sensitive variables than over plain variables. Although each context-sensitive variable has fewer points-to facts than its context-insensitive version, the average over all context-sensitive variables can be higher: program variables that have many points-to facts are also used in many more contexts, skewing the results.

Figure 17. Precision statistics of Doop analyses for a subset of the DaCapo benchmarks. The columns show call-graph nodes and edges, as well as total and average (per variable) points-to facts, first for plain program variables and then for "context-sensitive variables" (i.e., context-variable tuples).

benchmark  analysis    nodes  edges  var points-to (total/avg)  context-sensitive (total/avg)
antlr      insens      4510   24K    2.8M / 67                  -    / -
           1-call      4498   24K    897K / 22                  4.9M / 31
           1-call+H    4495   24K    887K / 22                  14M  / 90
           2-call+1H   4484   23K    719K / 18                  48M  / 84
           2-call+2H   4451   23K    570K / 14                  79M  / 171
           1-obj       4486   24K    748K / 18                  4.7M / 16
           1-obj+H     4435   23K    435K / 11                  25M  / 86
           2-obj+1H    4382   22K    264K / 7                   7.8M / 8
chart      insens      7873   41K    5.9M / 84                  -    / -
           1-call      7820   40K    2.6M / 36                  18M  / 66
           1-call+H    7816   40K    2.5M / 36                  43M  / 162
           2-call+1H   7800   40K    2.2M / 31                  202M / 173
           2-call+2H   ×      ×      ×    / ×                   ×    / ×
           1-obj       7803   40K    2.4M / 34                  18M  / 27
           1-obj+H     7676   37K    1.2M / 17                  81M  / 123
           2-obj+1H    7570   35K    414K / 6                   24M  / 7
pmd        insens      5536   27K    3.5M / 73                  -    / -
           1-call      5519   26K    1.1M / 22                  5.8M / 31
           1-call+H    5516   26K    1.0M / 22                  16M  / 89
           2-call+1H   5506   26K    925K / 20                  65M  / 94
           2-call+2H   5473   25K    803K / 17                  136M / 219
           1-obj       5504   26K    964K / 21                  5.2M / 15
           1-obj+H     5440   25K    682K / 15                  25M  / 77
           2-obj+1H    5372   24K    302K / 7                   7.4M / 7
xalan      insens      6580   33K    3.4M / 62                  -    / -
           1-call      6568   33K    1.4M / 25                  7.5M / 35
           1-call+H    6565   33K    1.4M / 25                  22M  / 104
           2-call+1H   6551   32K    1.2M / 22                  78M  / 88
           2-call+2H   6505   32K    939K / 17                  125M / 170
           1-obj       6549   33K    1.2M / 22                  19M  / 30
           1-obj+H     6468   31K    696K / 13                  106M / 173
           2-obj+1H    ×      ×      ×    / ×                   ×    / ×

5.3 BDDs vs. Explicit Representation

Generally, the performance differences between Doop and Paddle are largely attributable to the use of BDDs vs. an explicit representation of relation contents. The comparison of the two systems reveals interesting lessons regarding the representation of relations in points-to analysis.

BDDs are a maximally reduced data structure (for a given variable ordering) so they naturally trade off some time for space, at least for large relations. Furthermore, BDDs have heavy overheads, in the case of irregular relations that cannot be reduced. Consider the worst-case scenario for BDDs: a relation with a single tuple. The BDD representation in Paddle uses a node per bit—e.g., the single tuple in a relation over a 48-bit variable space will be represented by 48 BDD nodes. Each node in the BuDDy library (used by Paddle) is 20 bytes, or 160 bits. This represents a space overhead of 160x, but it also represents a time overhead, since what would be a very quick operation in an explicit tuple representation now requires traversing 48 heap objects (allocated in a single large region, but with no structure-locality).

The difficulty in analyzing the trade-off is that results on smaller data sets and operations do not translate to larger ones. For instance, we tried a simple experiment to compare the join performance of Doop and Paddle, without any other recursion or iteration. We read into memory two previously computed points-to analysis relations (including the VarPointsTo relation, for which Paddle's BDD variable order is highly optimized) and computed their join. The fully expanded relation size in Doop was a little over 1GB, or 7 million tuples. Doop performed the join in 24.4 seconds. Paddle spent 40x more time, 957 seconds, creating the BDD, but then performed the join in just 0.527 seconds. In terms of space, the BDD representation of the 7 million tuples consisted of just 148.7 thousand nodes—less than 3MB of memory! This demonstrates how different the cost model is for the two systems. If Paddle can exploit regularity and build a new BDD through efficient operations on older ones, then its performance is unparalleled. Creating the BDD, however, can often be extremely time consuming. Furthermore, a single non-reducible relation can become a bottleneck for the whole system. Thus, it is hard to translate the results of microbenchmarks to more complex settings, as the complexity of BDDs depends on their redundancy.
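As a point of reference for what a join over an "explicit representation" means operationally, here is a minimal Python sketch, not Doop's actual Datalog engine; the relation and variable names are invented for illustration. Relations are plain sets of tuples and the join is a hash lookup per tuple.

    from collections import defaultdict

    def hash_join(var_points_to, assign):
        """Join VarPointsTo(var, obj) with Assign(to, from) on the shared
        variable, yielding new VarPointsTo(to, obj) tuples: one step of
        Andersen-style propagation over an explicit tuple representation."""
        targets = defaultdict(list)
        for to, frm in assign:
            targets[frm].append(to)
        return {(to, obj)
                for var, obj in var_points_to
                for to in targets.get(var, ())}

    # Toy facts: v2 = v1, and v1 points to heap object h1.
    print(hash_join({("v1", "h1")}, {("v2", "v1")}))   # {('v2', 'h1')}

The per-tuple cost here is a constant number of hash operations, independent of how compressible the relation happens to be; a BDD-based join instead pays whatever it costs to build and traverse the BDDs, which is cheap exactly when the relations are highly regular.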
To gain a better understanding of performance, we analyzed the sizes of BDDs in Paddle for some major relations in its analyses, relative to the size of the explicit representations of the same relations. Figure 18 shows the sizes of relations "nodes" (representing the context-sensitive call-graph nodes, i.e., context-qualified reachable methods), "edges" (i.e., context-sensitive call-graph edges), var points-to (the main points-to relation, for context-qualified vars), and field points-to (the points-to relation for object fields). For each relation, the table shows the size of its explicit representation (measured in number of rows—i.e., number of total facts in the relation), the size of the BDD representation (in number of BDD nodes) and the ratio of these two numbers—although they are in different units the variation of the ratios is highly informative.

The above numbers are for Paddle as configured for our Paddle-compatibility experiments, so that the BDD statistics can be directly correlated to the performance of Doop (explicit representation) vs. Paddle (BDDs). Examination of the table in comparison with Figures 4-8 reveals that the performance of Paddle relative to Doop is highly correlated with the overall effectiveness of BDDs for relation representation. For benchmarks and analyses for which Paddle performs better compared to Doop, we find that all four relations (or at least the largest ones, if their size dominates the sizes of others) exhibit a much lower ratio of BDD-nodes-to-facts than in other benchmarks or analyses. Consider, for instance, the 1-object+heap analysis. The BDD size statistics reveal that bloat and jython are significant outliers compared to the rest of the DaCapo applications: their BDD-nodes-to-facts ratios are much lower for all large relations. A quick comparison with Figure 8 reveals that Paddle performs unusually well for these two benchmarks.

This understanding of the performance model for the BDD-based representation leads to further insights. The ultimate question we want to answer is whether (and under what conditions) there is enough regularity in relations involved in points-to analyses for BDDs to be the best representation choice. Figure 18 suggests that this is not the case, at least for the analyses studied here. The main way to improve the performance of the BDD representation is by changing the BDD variable ordering. The BDD variable ordering used in our Paddle experiments is one that minimizes the size of the var points-to relation (which, indeed, consistently has a small BDD-nodes-to-facts ratio in Figure 18). This order was observed by Lhoták to yield the best results in terms of performance. (It is worth noting that the Paddle authors were among the first to use BDDs in program analysis, have a long history of experimentation in multiple successive systems, and have experimented extensively with BDD variable orderings until deriving ones that yield "impressive results" [2].) Nevertheless, what we see in Figure 18 is that it is very hard to provide a variable ordering that minimizes all crucial BDDs. Although the var points-to relation is consistently small, the (context-sensitive) call-graph edge relation is inefficient and it is usually large enough to matter. All current techniques utilizing BDDs for points-to analysis (e.g., in bddbddb or Paddle) require BDD variable orderings "that are simultaneously good for the many BDDs in a system of interrelated analyses" [15]. It does not, therefore, seem likely that BDDs will be the best representation option for precise context-sensitive points-to analyses without significant progress in our understanding of how BDDs can be employed.

6. Related and Future Work

Fast and Precise Pointer Analysis. There is an immense body of work on pointer analysis, so we need to restrict our discussion to some representative and recent work. Fast and precise pointer analysis is, unfortunately, still a trade-off. This is unlikely to change. Most recent work in pointer analysis explores methods to improve performance by reducing precision strategically. The challenge is to limit the loss of precision, yet gain considerably in performance. For instance, Lattner et al. show [14] that an analysis with a context-sensitive heap abstraction can be very efficient by sacrificing precision using unification constraints. This is a common sacrifice. Furthermore, there are still considerable improvements possible in solving the constraints of the classic inclusion-based pointer analysis of Andersen, as illustrated by Hardekopf and Lin [10].

In full context-sensitive pointer analysis, there is an ongoing search for context abstractions that provide precise pointer information, and do not cause massive redundant computation. Milanova suggested that an object-sensitive analysis [20] is an effective context abstraction for object-oriented programs, which was confirmed by Lhoták's extensive evaluation [18]. Several researchers have argued for the benefits of using a context-sensitive heap abstraction to improve precision [18, 22].

The use of BDDs attempts to solve the problem of the large amount of data in context-sensitive pointer analysis by representing its redundancy efficiently [2, 29]. The redundancy should ideally be eliminated by choosing the right context abstraction. Xu and Rountev's recent work [30] addresses this problem. Their method aims to determine context abstractions that will yield the same points-to information. This is an exciting research direction, orthogonal to our work on declarative specifications and optimization. However, in their specific implementation, memory consumption is growing quickly for bigger benchmarks, even on Java 1.3.

IBM Research's WALA [7] static analysis library is designed to support different pointer analysis configurations, but no results of WALA's accuracy or speed have been reported in the literature. It will be interesting to compare our analyses to WALA in future work.
             call-graph nodes      call-graph edges      var points-to         field points-to
             facts  bdd  ratio     facts  bdd  ratio     facts  bdd  ratio     facts  bdd  ratio

context-insensitive
antlr 4K 1K 0.35 23K 95K 4.23 2.0M 58K 0.03 766K 28K 0.04
bloat 6K 2K 0.26 46K 132K 2.86 7.9M 81K 0.01 1.0M 38K 0.04
chart 8K 3K 0.35 39K 163K 4.19 5.3M 101K 0.02 1.8M 51K 0.03
eclipse 5K 2K 0.34 24K 104K 4.39 2.4M 63K 0.03 746K 31K 0.04
hsqldb 4K 1K 0.41 17K 80K 4.71 1.5M 50K 0.03 493K 23K 0.05
jython 6K 2K 0.31 32K 123K 3.90 3.3M 72K 0.02 750K 34K 0.04
luindex 4K 1K 0.38 18K 86K 4.70 1.5M 53K 0.03 567K 25K 0.04
lusearch 4K 2K 0.34 21K 98K 4.65 1.8M 59K 0.03 606K 28K 0.05
pmd 5K 2K 0.32 25K 113K 4.51 2.5M 62K 0.02 652K 28K 0.04
xalan 4K 1K 0.40 17K 80K 4.78 1.4M 50K 0.04 501K 23K 0.05

1-call-site-sensitive
antlr 22K 37K 1.64 83K 682K 8.26 2.9M 735K 0.26 636K 28K 0.04
bloat 45K 55K 1.21 266K 1.1M 4.32 30M 1.5M 0.05 792K 39K 0.05
chart 39K 64K 1.67 164K 1.2M 7.09 18M 1.6M 0.09 1.4M 52K 0.04
eclipse 23K 38K 1.64 113K 705K 6.22 4.0M 852K 0.21 572K 32K 0.06
hsqldb 17K 29K 1.73 61K 523K 8.62 2.1M 590K 0.28 395K 24K 0.06
jython 31K 47K 1.51 139K 907K 6.53 5.7M 1.0M 0.18 539K 35K 0.06
luindex 18K 31K 1.73 65K 559K 8.63 2.4M 645K 0.27 459K 26K 0.06
lusearch 21K 36K 1.69 76K 638K 8.41 2.9M 751K 0.26 488K 29K 0.06
pmd 25K 42K 1.69 94K 769K 8.14 4.7M 843K 0.18 512K 29K 0.06
xalan 17K 29K 1.74 60K 519K 8.64 2.1M 595K 0.29 396K 24K 0.06

1-call-site-sensitive+heap
antlr 22K 37K 1.63 83K 682K 8.26 8.9M 2.4M 0.27 12M 7.3M 0.59
bloat 45K 55K 1.22 251K 1.1M 4.55 159M 7.3M 0.05 27M 10M 0.38
chart 39K 64K 1.66 164K 1.2M 7.11 42M 6.3M 0.15 26M 16M 0.63
eclipse 23K 38K 1.64 113K 706K 6.23 14M 3.1M 0.23 9.4M 7.1M 0.75
hsqldb 17K 29K 1.73 61K 523K 8.61 6.2M 1.8M 0.30 5.7M 4.3M 0.76
jython 31K 47K 1.50 139K 908K 6.54 22M 4.2M 0.19 15M 8.6M 0.58
luindex 18K 31K 1.73 65K 560K 8.63 7.0M 2.1M 0.30 6.4M 5.0M 0.78
lusearch 21K 36K 1.70 76K 637K 8.40 8.5M 2.5M 0.30 7.8M 5.7M 0.74
pmd 25K 42K 1.69 94K 768K 8.13 14M 3.1M 0.22 8.2M 6.7M 0.82
xalan 17K 29K 1.74 60K 518K 8.64 6.1M 1.8M 0.30 5.7M 4.3M 0.77

1-object-sensitive
antlr 36K 19K 0.54 218K 489K 2.25 1.5M 324K 0.22 25K 33K 1.33
bloat 71K 27K 0.38 1.8M 1.2M 0.65 14M 646K 0.05 307K 44K 0.14
chart 81K 38K 0.47 1.0M 1.1M 1.14 16M 763K 0.05 60K 58K 0.97
eclipse 40K 22K 0.55 312K 596K 1.91 1.9M 381K 0.20 27K 36K 1.33
hsqldb 31K 17K 0.55 170K 412K 2.43 1.1M 271K 0.25 17K 28K 1.69
jython 64K 26K 0.40 746K 742K 0.99 4.9M 455K 0.09 38K 39K 1.02
luindex 32K 18K 0.57 178K 436K 2.44 1.2M 294K 0.24 18K 30K 1.73
lusearch 35K 20K 0.57 202K 492K 2.43 1.5M 335K 0.23 20K 34K 1.71
pmd 42K 21K 0.50 309K 557K 1.80 2.6M 373K 0.14 40K 34K 0.85
xalan 30K 17K 0.56 168K 411K 2.45 1.1M 274K 0.25 16K 28K 1.73

1-object-sensitive+heap
antlr 35K 19K 0.55 161K 448K 2.79 8.6M 797K 0.09 2.3M 505K 0.22
bloat 69K 27K 0.39 1.4M 1.0M 0.73 56M 1.9M 0.03 13M 1.2M 0.09
chart 76K 37K 0.49 647K 973K 1.50 41M 1.9M 0.05 9.1M 1.3M 0.14
eclipse 39K 22K 0.56 212K 544K 2.56 11M 1.0M 0.10 2.8M 631K 0.23
hsqldb 30K 17K 0.56 131K 380K 2.90 6.3M 656K 0.10 1.7M 409K 0.24
jython 62K 25K 0.41 638K 684K 1.07 76M 1.4M 0.02 15M 1.1M 0.07
luindex 31K 18K 0.58 134K 402K 2.99 6.4M 695K 0.11 1.7M 427K 0.26
lusearch 34K 20K 0.58 147K 447K 3.04 7.3M 785K 0.11 1.8M 488K 0.26
pmd 41K 21K 0.52 216K 499K 2.31 10M 892K 0.09 2.8M 539K 0.19
xalan 30K 17K 0.57 129K 379K 2.93 6.0M 665K 0.11 1.5M 411K 0.27
Figure 18. BDD statistics for the most important context-sensitive relations of Paddle: total number of facts in the context-
sensitive relation, number of BDD nodes used to represent those facts, and the ratio of BDD nodes / total number of facts.
Reflection and Program Analysis. Reflection, dynamic class loading, and native methods are a major issue for static program analysis. Paddle inherits support for many native methods from its predecessor, Spark [16]. Paddle's support for reflection is relatively unsophisticated compared to the reflection analysis of Livshits specified in Datalog on top of Whaley's bddbddb [19]. In particular, Paddle does not maintain information about Class objects created through Class.forName, which requires very conservative assumptions about later Class.newInstance invocations. However, the reflection analysis of Livshits was only integrated in a context-insensitive pointer analysis. The fully declarative nature of Doop allows us to use very similar Datalog rules also in context-sensitive analyses.

Declarative Programming Analysis. Program analysis using logic programming has a long history (e.g., [4, 23]), but this early work only considers very small programs. In recent years, there have been efforts to apply declarative program analysis to much larger codebases and more complex analysis problems. We discussed the relation to Whaley's work on context-sensitive pointer analysis using Datalog and BDDs [29] throughout this paper. The Dimple [1] analysis framework has shown to be competitive in performance for context-insensitive pointer analysis using tabled Prolog. The demonstrated pointer analysis of Dimple uses a conservative, pre-computed call graph, so the analysis is reduced to propagation of points-to information of assignments, which can be very efficient. Doop expresses all the logic of a context-sensitive pointer analysis in Datalog.

Demand-Driven and Incremental Analysis. A demand-driven evaluation strategy reduces the cost of an analysis by computing only those results that are necessary for a client program analysis [12, 26, 27, 31]. This is a useful approach for client analyses that focus on specific locations in a program, but if the client needs results from the entire program, then demand-driven analysis is typically slower than an exhaustive pointer analysis. Reps [24] showed how to use the standard magic-sets optimization to automatically derive a demand-driven analysis from an exhaustive analysis (like ours). This optimization combines the benefits of top-down and bottom-up evaluation of logic programs by adding side-conditions to rules that limit the computation to just the required data.

More recently, Saha and Ramakrishnan [25] explored the application of incremental logic program evaluation strategies to context-insensitive pointer analysis. As pointed out in this work, the algorithms for materialized view maintenance and incremental program analysis are highly related. As we discussed, incremental evaluation is also crucial for Doop's performance. The large number of reachable methods in an empty Java program⁴ suggests that incremental analysis could bring down the from-scratch evaluation time substantially. We have not explored these incremental evaluation scenarios yet. The engine we use also supports incremental evaluation after deletion and updates of facts using the DRed [8] algorithm. Efficient incremental evaluation might make context-sensitive pointer analysis suitable for use in IDEs.

⁴ Even an empty Java program causes the execution of a number of methods from the standard library. This causes a static analysis to compute an even larger number of reachable methods, especially when no assumptions are made about the loading environment (e.g., security settings and where the empty class will be loaded from).

7. Conclusions

We presented Doop: a purely declarative points-to analysis framework that raises the bar for precise context-sensitive analyses. Doop is elegant, full-featured, modular, and high-level, yet achieves remarkable performance due to a novel optimization methodology focused on highly recursive Datalog programs. Doop uses an explicit representation of relations and cha(lle)nges the community's understanding on how to implement efficient points-to analyses.

Acknowledgments This work was funded by the NSF (CCF-0917774, CCF-0934631) and by LogicBlox Inc. We thank Ondřej Lhoták for his advice on benchmarking Paddle, Oege de Moor and Molham Aref for useful discussions, the anonymous reviewers for helpful comments, and the LogicBlox developers for their practical help and support.
References
[1] W. C. Benton and C. N. Fischer. Interactive, scalable, declarative program analysis: from prototype to implementation. In PPDP '07: Proc. of the 9th ACM SIGPLAN int. conf. on Principles and practice of declarative programming, pages 13-24, New York, NY, USA, 2007. ACM.
[2] M. Berndl, O. Lhoták, F. Qian, L. J. Hendren, and N. Umanee. Points-to analysis using BDDs. In PLDI, pages 103-114. ACM, 2003.
[3] M. Bravenboer and Y. Smaragdakis. Exception analysis and points-to analysis: Better together. In L. Dillon, editor, ISSTA '09: Proceedings of the 2009 International Symposium on Software Testing and Analysis, New York, NY, USA, July 2009. To appear.
[4] S. Dawson, C. R. Ramakrishnan, and D. S. Warren. Practical program analysis using general purpose logic programming systems—a case study. In PLDI '96: Proc. of the ACM SIGPLAN 1996 conf. on Programming language design and implementation, pages 117-126, New York, NY, USA, 1996. ACM.
[5] S. K. Debray. Unfold/fold transformations and loop optimization of logic programs. In PLDI '88: Proc. of the ACM SIGPLAN 1988 conf. on Programming Language design and Implementation, pages 297-307, New York, NY, USA, 1988. ACM.
[6] M. Eichberg, S. Kloppenburg, K. Klose, and M. Mezini. Defining and continuous checking of structural program dependencies. In ICSE '08: Proc. of the 30th int. conf. on Software engineering, pages 391-400, New York, NY, USA, 2008. ACM.
[7] S. J. Fink. T.J. Watson libraries for analysis (WALA). http://wala.sourceforge.net.
[8] A. Gupta, I. S. Mumick, and V. S. Subrahmanian. Maintaining views incrementally. In SIGMOD '93: Proc. of the 1993 ACM SIGMOD int. conf. on Management of data, pages 157-166, New York, NY, USA, 1993. ACM.
[9] E. Hajiyev, M. Verbaere, and O. de Moor. CodeQuest: Scalable source code queries with Datalog. In Proc. European Conf. on Object-Oriented Programming (ECOOP), pages 2-27. Springer, 2006.
[10] B. Hardekopf and C. Lin. The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code. In PLDI '07: Proc. ACM SIGPLAN conf. on Programming Language Design and Implementation, pages 290-299, New York, NY, USA, 2007. ACM.
[11] B. Hardekopf and C. Lin. Semi-sparse flow-sensitive pointer analysis. In POPL '09: Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 226-238, New York, NY, USA, 2009. ACM.
[12] N. Heintze and O. Tardieu. Demand-driven pointer analysis. In PLDI '01: Proc. of the ACM SIGPLAN 2001 conf. on Programming language design and implementation, pages 24-34, New York, NY, USA, 2001. ACM.
[13] M. S. Lam, J. Whaley, V. B. Livshits, M. C. Martin, D. Avots, M. Carbin, and C. Unkel. Context-sensitive program analysis as database queries. In PODS '05: Proc. of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 1-12, New York, NY, USA, 2005. ACM.
[14] C. Lattner, A. Lenharth, and V. Adve. Making context-sensitive points-to analysis with heap cloning practical for the real world. SIGPLAN Not., 42(6):278-289, 2007.
[15] O. Lhoták. Program Analysis using Binary Decision Diagrams. PhD thesis, McGill University, Jan. 2006.
[16] O. Lhoták and L. Hendren. Scaling Java points-to analysis using Spark. In G. Hedin, editor, Compiler Construction, 12th Int. Conf., volume 2622 of LNCS, pages 153-169, Warsaw, Poland, April 2003. Springer.
[17] O. Lhoták and L. Hendren. Jedd: a BDD-based relational extension of Java. In PLDI '04: Proc. of the ACM SIGPLAN 2004 conf. on Programming language design and implementation, pages 158-169, New York, NY, USA, 2004. ACM.
[18] O. Lhoták and L. Hendren. Evaluating the benefits of context-sensitive points-to analysis using a BDD-based implementation. ACM Trans. Softw. Eng. Methodol., 18(1):1-53, 2008.
[19] B. Livshits, J. Whaley, and M. S. Lam. Reflection analysis for Java. In K. Yi, editor, Proceedings of the 3rd Asian Symposium on Programming Languages and Systems, volume 3780. Springer-Verlag, Nov. 2005.
[20] A. Milanova, A. Rountev, and B. G. Ryder. Parameterized object sensitivity for points-to analysis for Java. ACM Trans. Softw. Eng. Methodol., 14(1):1-41, 2005.
[21] M. Naik, A. Aiken, and J. Whaley. Effective static race detection for Java. In Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '06), pages 308-319, 2006.
[22] E. M. Nystrom, H.-S. Kim, and W.-m. W. Hwu. Importance of heap specialization in pointer analysis. In PASTE '04: Proc. of the 5th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, pages 43-48, New York, NY, USA, 2004. ACM.
[23] T. Reps. Demand interprocedural program analysis using logic databases. In R. Ramakrishnan, editor, Applications of Logic Databases, pages 163-196. Kluwer Academic Publishers, 1994.
[24] T. W. Reps. Solving demand versions of interprocedural analysis problems. In CC '94: Proc. of the 5th Int. Conf. on Compiler Construction, pages 389-403, London, UK, 1994. Springer-Verlag.
[25] D. Saha and C. R. Ramakrishnan. Incremental and demand-driven points-to analysis using logic programming. In PPDP '05: Proc. of the 7th ACM SIGPLAN int. conf. on Principles and practice of declarative programming, pages 117-128, New York, NY, USA, 2005. ACM.
[26] M. Sridharan and R. Bodík. Refinement-based context-sensitive points-to analysis for Java. In PLDI '06: Proc. of the 2006 ACM SIGPLAN conf. on Programming language design and implementation, pages 387-400, New York, NY, USA, 2006. ACM.
[27] M. Sridharan, D. Gopan, L. Shan, and R. Bodík. Demand-driven points-to analysis for Java. In OOPSLA '05: Proc. of the 20th annual ACM SIGPLAN conf. on Object oriented programming, systems, languages, and applications, pages 59-76, New York, NY, USA, 2005. ACM.
[28] J. Whaley, D. Avots, M. Carbin, and M. S. Lam. Using Datalog with binary decision diagrams for program analysis. In K. Yi, editor, APLAS, volume 3780 of Lecture Notes in Computer Science, pages 97-118. Springer, 2005.
[29] J. Whaley and M. S. Lam. Cloning-based context-sensitive pointer alias analysis using binary decision diagrams. In PLDI '04: Proc. of the ACM SIGPLAN 2004 conf. on Programming language design and implementation, pages 131-144, New York, NY, USA, 2004. ACM.
[30] G. Xu and A. Rountev. Merging equivalent contexts for scalable heap-cloning-based context-sensitive points-to analysis. In ISSTA '08: Proc. of the 2008 int. symposium on Software testing and analysis, pages 225-236, New York, NY, USA, 2008. ACM.
[31] X. Zheng and R. Rugina. Demand-driven alias analysis for C. In POPL '08: Proc. of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 197-208, New York, NY, USA, 2008. ACM.
Published in Software Safety and Security: Tools for Analysis and Verification. NATO Science for Peace and Security Series, vol. 33, pp. 286-318, 2012

A Primer on Separation Logic


(and Automatic Program Verification and
Analysis)
Peter W. O’Hearn 1
Queen Mary University of London

Abstract. These are the notes to accompany a course at the Marktoberdorf PhD
summer school in 2011. The course consists of an introduction to separation logic,
with a slant towards its use in automatic program verification and analysis.
Keywords. Program Logic, Automatic Program Verification, Abstract Interpretation,
Separation Logic

1. Introduction

Separation logic, first developed in papers by John Reynolds, the author, Hongseok Yang
and Samin Ishtiaq, around the turn of the millennium [73,47,61,74], is an extension of
Hoare’s logic for reasoning about programs that access and mutate data held in computer
memory. It is based on the separating conjunction P ∗ Q, which asserts that P and Q
hold for separate portions of memory, and on program-proof rules that exploit separation
to provide modular reasoning about programs.
In this course I am going to introduce the basics of separation logic, its semantics,
and proof theory, in a way that is oriented towards its use in automatic program-proof
tools and abstract interpreters, an area of work which has seen increasing attention in
recent years. After the basics, I will describe how the ideas can be used to build a verifi-
cation or program analysis tool.
The course consists of four lectures:
1. Basics, where the fundamental ideas of the logic are presented in a semi-formal
style;
2. Foundations, where we get into the formalities, including the semantics of the
assertion language and axioms and inference rules for heap-mutating commands,
and culminating in an account of the local dynamics which underpin some of the
rules in the logic;

1 This work was supported by funding from the Royal Society, the EPSRC and Microsoft Research.
3. Proof Theory and Symbolic Execution, which describes a way of reasoning about
programs by ‘executing’ programs on formulae rather than concrete states, and
which can form the basis for an automatic verifier; and
4. Program Analysis, where abstraction is used to infer loop invariants and other
annotations, increasing the level of automation.
These course notes include two sections based on the first two lectures, followed by a
section collecting ideas from the last two lectures. At this stage the notes are incomplete,
and they will possibly be improved and extended in the future. I hope, though, that they
will still prove useful in giving a flavour of some of the main lines of work, as well as in
pointers into the literature. In particular, at the end I give references to current directions
being pursued in program analysis.
I should say that, with this slant towards automatic proof and program analysis, there
are active ongoing developments related to separation logic in several other directions
that I will not be able to cover, particularly in concurrency, data abstraction and refine-
ment, object-oriented languages and scripting languages; a small sample of work in these
directions includes [62,64,66,10,81,34,28,38].

2. Basics

In this section I introduce separation logic in a semi-formal way. I am hoping that some
of the ideas can strike home and be seen to reflect natural reasoning that programmers
might employ, even before we consider formal definitions. Of course, the informal pre-
sentation inevitably skates over some issues, issues that could very well lead to unsound
conclusions if not treated correctly, and to nail things down we will get to the definitions
in the next section.

2.1. The Separating Conjunction.

Consider the following memory structure.

x|->y  *  y|->x

[Diagram: a cell labelled x pointing to a cell labelled y, and a cell labelled y pointing back to x, with a vertical line down the middle marking the heap partitioning.]

Concrete memory:  store  x = 10, y = 42;   heap  10 ↦ 42, 42 ↦ 10
We read the formula at the top of this figure as ‘x points to y, and separately y points to
x’. Going down the middle of the diagram is a line which represents a heap partitioning:
a separating conjunction asks for a partitioning that divides memory into parts satisfying
its two conjuncts.
At the bottom of the figure is an example of a concrete memory description that
corresponds to the diagram. There, x and y have values 10 and 42 (in the ‘environment’,
or ‘register bank’), and 10 and 42 are themselves locations with the indicated contents
(in the ‘heap’, or even ‘RAM’).
The indicated separating conjunction above is true of the pictured memory because
the parts satisfy the corresponding conjuncts. That is, the components

x|->y                                      y|->x

store  x = 10, y = 42      And, separately,      store  x = 10, y = 42
heap   10 ↦ 42                                   heap   42 ↦ 10
are separate sub-states that satisfy the relevant conjuncts.


It can be confusing to see a diagram like the one on the left where 'x points to y' and
yet y points to nothing. This is disambiguated in the RAM description below the diagram. In the
more concrete description x and y denote values (10 and 42), x’s value is an allocated
memory address which contains y’s value, but y’s value is not allocated. Notice also
that, in comparison to the first diagram, the separating conjunction splits the heap/RAM,
but it does not split the association of variables to values: heap cells, but not variable
associations, are deleted from the original situation to obtain the sub-states. It is usually
simplest to think in terms of the picture semantics of separation logic, but when we get
formal in the next section we will drop down to the RAM level (as we could always do
when pressed).
In general, an assertion P denotes a set of states, and P ∗ Q is true of a state just if
its heap/RAM component can be split into two parts, one of which satisfies P and the
other of which satisfies Q.
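As a quick illustration (not from the original notes), the concrete memory above can be written down directly and the split checked by hand. Here is a small Python rendering of that check, where a heaplet is a dictionary from locations to contents:

    store = {"x": 10, "y": 42}
    heap  = {10: 42, 42: 10}

    # The partition suggested by the picture: one cell per conjunct.
    h0 = {10: 42}          # intended to satisfy x |-> y
    h1 = {42: 10}          # intended to satisfy y |-> x

    assert set(h0).isdisjoint(h1) and {**h0, **h1} == heap            # a genuine split
    assert set(h0) == {store["x"]} and h0[store["x"]] == store["y"]   # h0 satisfies x |-> y
    assert set(h1) == {store["y"]} and h1[store["y"]] == store["x"]   # h1 satisfies y |-> x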
When reasoning about programs that manipulate data structures, one normally wants
to use inductively-defined predicates that describe such structures. Here is a definition
for a predicate that describes binary trees:

tree(E) ⇐⇒ if isatom?(E) then emp
            else ∃xy. E ↦ [l: x, r: y] ∗ tree(x) ∗ tree(y)

In this definition we assume a boolean expression isatom?(E) which distinguishes atomic values (e.g., characters...) from addressable locations: in the RAM model, we could say that the locations are the non-negative integers and the atoms the negative ones. We have used a record notation E ↦ [l: x, r: y] for a 'points-to predicate' that describes a single record E that contains x in its l field and y in its r field. Again in the RAM model, this binary points-to can be compiled into the unary one: that is, E ↦ [l: x, r: y] could be an abbreviation for (E ↦ x) ∗ (E+1 ↦ y). (Or, you could imagine a model where the heap consists of explicit records with field selection.) The separating conjunction between the E ↦ [l: x, r: y] assertion and the two recursive instances of tree in the definition ensures that there are no cycles, and the separating conjunction between the two subtrees ensures that we have a tree and not a dag.
The emp predicate in the base case of the inductive definition describes the empty
heap, the heap with no allocated cells. A consequence of this is that when tree(E) holds
there are no extra cells, cells in the heap but not in the tree, in a state satisfying the
predicate. This is a key specification pattern often employed in separation logic proofs:
we use assertions that describe only as much state as is needed, and nothing else.
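The way ∗ forces the root cell and the two subtrees to occupy disjoint storage can also be seen operationally. The following is a small Python sketch, not part of the original notes, that checks the tree(E) predicate against a concrete heap of binary records, treating negative numbers as atoms as suggested above. The separating conjunction appears as the requirement that each recursive call claim cells that no sibling has already claimed, and the emp base case as the requirement that the predicate account for every cell of the heaplet.

    def tree_footprint(heap, e, taken):
        """Return the set of cells claimed by tree(e), or None if tree(e) fails;
        `taken` holds cells already claimed by sibling conjuncts."""
        if e < 0:                        # isatom?(e): emp, no cells claimed
            return set()
        if e not in heap or e in taken:
            return None                  # unallocated, or sharing / a cycle
        l, r = heap[e]                   # the record e |-> [l: l, r: r]
        claimed = {e}
        for child in (l, r):
            sub = tree_footprint(heap, child, taken | claimed)
            if sub is None:
                return None
            claimed |= sub
        return claimed

    def is_tree(heap, root):
        fp = tree_footprint(heap, root, set())
        return fp is not None and fp == set(heap)   # no extra cells allowed

    print(is_tree({100: (101, -1), 101: (-2, -3)}, 100))              # True
    print(is_tree({100: (100, -1)}, 100))                             # False: a cycle
    print(is_tree({100: (101, -1), 101: (-2, -3), 7: (0, 0)}, 100))   # False: extra cell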
At this point you might think that I have described an exotic-looking formalism for
writing assertions about heaps and you might wonder: why bother? The mere ability
to describe heaps in principle is not important in and of itself, and in this separation
logic adds nothing significant to traditional predicate logic. It is when we consider the
interaction between assertions and operations for mutating memory that the point of the
formalism comes out.

2.2. In-place Reasoning

Proving by Executing. I am going to show you part of a program proof outline in sepa-
ration logic. It might seem slightly eccentric that I do this before giving you a definition
of the logic. My aim is to use a computational reading of the proof steps to motivate the
inference rules, rather than starting from them.
Consider the following procedure for disposing the elements in a tree.

procedure DispTree(p)
  local i, j;
  if ¬isatom?(p) then
    i := p→l; j := p→r;
    DispTree(i);
    DispTree(j);
    free(p)

This is the expected procedure that walks a tree, recursively disposing left and right
subtrees and then the root pointer. It uses a representation of tree nodes as cells containing
left and right pointers, with the base case corresponding to atomic, non-pointer values.
(See Exercise 2 below for a fuller description.)
The specification of DispTree is just

    {tree(p)} DispTree(p) {emp}
which says that if you have a tree at the beginning then you end up with the empty heap
at the end. For this to make sense it is crucial that when tree(p) is true of a heap then
that heap (or rather the heaplet, a portion of a global heap) contains all and only those
cells in the tree. So, the spec talks about as small a portion of the global program state as
possible.
The crucial part of the argument for DispTree’s correctness, in the then branch,
can be pictured with the following annotated program which gives a ‘proof by execution’
style argument.

{p ↦ [l: x, r: y] ∗ tree(x) ∗ tree(y)}
   i := p→l; j := p→r;
{p ↦ [l: i, r: j] ∗ tree(i) ∗ tree(j)}
   DispTree(i);
{p ↦ [l: i, r: j] ∗ emp ∗ tree(j)}
{p ↦ [l: i, r: j] ∗ tree(j)}
   DispTree(j);
{p ↦ [l: i, r: j] ∗ emp}
   free p
{emp}

After we enter the then branch of the conditional we know that ¬isatom?(p), so that
(according to the inductive definition of the tree predicate) p points to left and right sub-
trees occupying separate storage. Then the roots of the two subtrees are loaded into i
and j. The first recursive call operates in-place on the left subtree, removing it. The two
consecutive assertions in the middle of the proof are an application of the rule of conse-
quence of Hoare logic. These two assertions are equivalent because emp is the unit of ∗.
Continuing on, the second call removes the right subtree, and the final instruction frees
the root pointer p. The assertions, and their mutations, follow this operational narrative.
I am leading to a more general suggestion: try thinking about reasoning in separation
logic as if you are an interpreter. The formulae are like states, symbolic states. Execute
code forwards, updating formulae in the usual way you do when thinking about in-place
update of memory. In-place reasoning works not only for freeing a cell, but for heap
mutation and allocation as well. And, it even works for larger-scale operations such as
entire procedure calls: we updated the assertions in-place at each of the recursive call
sites during this ‘proof’.
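The 'formulae as symbolic states' reading can be made quite literal. Below is a toy Python sketch, invented for illustration (it is not any production symbolic execution engine), in which a symbolic state is a list of points-to conjuncts and each command updates the formula in place, mirroring the proof outline above.

    # Symbolic state: a list of points-to facts, e.g. ("p", {"l": "i", "r": "j"}),
    # read as the separating conjunction of its elements.
    def exec_free(state, x):
        """free(x): consume the x |-> _ conjunct; fault if it is absent."""
        for i, (addr, _) in enumerate(state):
            if addr == x:
                return state[:i] + state[i+1:]
        raise RuntimeError(f"possible fault: no {x} |-> _ conjunct in precondition")

    def exec_store(state, x, field, value):
        """x->field := value: mutate the x |-> [...] conjunct in place."""
        for i, (addr, fields) in enumerate(state):
            if addr == x:
                return state[:i] + [(addr, {**fields, field: value})] + state[i+1:]
        raise RuntimeError(f"possible fault: no {x} |-> _ conjunct in precondition")

    state = [("p", {"l": "x", "r": "y"})]          # p |-> [l: x, r: y]
    state = exec_store(state, "p", "l", "nil")     # p |-> [l: nil, r: y]
    state = exec_free(state, "p")                  # emp
    print(state)                                   # []

A real symbolic executor must also handle allocation, conditionals, and entailment between symbolic states; this sketch only shows the in-place flavour of the updates.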

Exercise 1 The usual Hoare logic rules for sequencing and consequence are

    {P} C1 {Q}    {Q} C2 {R}            P ⇒ P′    {P′} C {Q′}    Q′ ⇒ Q
    -------------------------           ---------------------------------
         {P} C1; C2 {R}                             {P} C {Q}

where ⇒ refers to implication. Assuming we know how to decide ⇒ formulae, convert


the annotated program block for the then case above into a proof in the usual logical
sense (that is, a tree built from instances of these rules). You can assume that the triples
of pre/post for each of the individual statements in the proof outline are given as axioms.

Local Reasoning and Frame Axioms. In the steps in the proof outline for DispTree(p)
I used the procedure spec as an assumption when reasoning about the recursive calls, as
usual when reasoning about recursive procedures in Hoare logic [43]. However, there is
an extra ingredient at work. For the second recursive call, for instance, the assertion at
the call site does not match the procedure specification’s precondition, even after p in the
spec is instantiated with j, because the assertion has an extra ∗-conjunct, p7→[l: i, r: j].

Assertion at call site : p7→[l: i, r: j] ∗ tree(j)


Precondition in spec : tree(j)

This extra ∗-conjunct is not touched by the recursive call. It is called a ‘frame axiom’
in AI. The terminology ‘frame axiom’ comes from an analogy with animation, where
the moving parts of a scene are successively aid over an unchanging frame. Indeed, the
fact p7→[l: i, r: j] is left unchanged by the second call. You should be able to pick out the
frame in the first call as well.
Thus, there is something slightly awry in this ‘proof’, unless I tell you more: The
mismatch, between the call sites and the procedure precondition, needs to be taken care
of if we are really to have a proof of the procedure. One way to resolve the mismatch
would be to complicate the specification of the procedure, to talk explicitly about frames
in one way or another (see ‘back in the day’ below). A better approach is to have a generic
inference rule, which allows us to avoid mentioning the frames at all in our specifications,
but to bring them in when needed. This generic rule is

         {P} C {Q}
    ---------------------   (Frame Rule)
     {R ∗ P} C {R ∗ Q}

and it lets us tack on additional assertions ‘for free’, as it were. For instance, in the second
recursive call the frame axiom R selected is p ↦ [l: i, r: j] and {P} C {Q} is a substitution
instance of the procedure spec: this captures that the recursive call does not alter the root
pointer.
This better way, which avoids talking about frames in specifications, corresponds
to programming intuition. When reasoning about a program we should only have to
talk about the resources it accesses (its ‘footprint’), as all other resources will remain
unchanged. This is the principle of local reasoning [47,61]. In the specification of
DispTree the precondition tree(p) describes only those cells touched by the procedure.
Aside: back in the day... This issue of local reasoning has nothing to do with the ‘power’
or ‘completeness’ of a formal method: what is possible to do in principle. It has only to do
with the simplicity and directness of the specs and proofs. To see the issue more clearly,
consider how we might have written a spec for DispTree(p) in traditional Hoare logic,
before we had the frame rule. Here is a beginning attempt:

    {tree(p) ∧ reach(p, n)} DispTree(p) {¬allocated(n)}
assuming that we have defined the predicates that say when p points to a (binary) tree in
memory, when n is reachable (following l and r links) from p, and when n is allocated.
This spec says that any node n which is in the tree pointed to by p is not allocated on
conclusion.
While this specification says part of what we would like to say, it leaves too much
unsaid. It does not say what the procedure does to nodes that are not in the tree. As a
result, this specification is too weak to use at many call sites. For example, consider the
first recursive call, DispTree(i), to dispose the left subtree. If we use the specification
(instantiating p by i) as an hypothesis, then we have a problem: the specification does
not rule out the possibility that the procedure call alters the right subtree j, perhaps
creating a cycle or even disposing some of its nodes. As a consequence, when we come
to the second call DispTree(j), we will not know that the required tree(j) part of the
precondition will hold. So our reasoning will get stuck.
We can fix this ‘problem’ by making a stronger specification which includes frame
axioms.
    {tree(p) ∧ reach(p, n) ∧ ¬reach(p, m) ∧ allocated(m) ∧ m.f = m′ ∧ ¬allocated(q)}
    DispTree(p)
    {¬allocated(n) ∧ ¬reach(p, m) ∧ allocated(m) ∧ m.f = m′ ∧ ¬allocated(q)}
The additional parts of the spec say that any allocated cell not reachable from p has
the same contents in memory and that any previously unallocated cell remains unallo-
cated. The additional clauses are the frame axioms. (I am assuming that m, m′, n and q
are auxiliary variables, guaranteed not to be altered. The reason why, say, the predicate
¬allocated(q) could conceivably change, even if q is constant, is that the allocated
predicate refers to a behind-the-scenes heap component. f is used in the spec as an arbi-
trary field name.)
Whether or not this more complicated specification is correct, I think you will agree:
it is complicated! I expect that you will agree as well that it is preferable for the frame
axioms to be left out of specs, and inferred when needed.
Beyond Shapes. The above shows one inductive definition, for binary trees. The def-
inition is limited in that it does not talk about the contents of a tree. It is the kind of
definition often used in automatic shape analysis, as we will describe in Section 4, where
avoiding talking about the contents can make it easier to prove entailments or synthesize
invariants.
To illustrate the limitation of the definition, suppose that we were to write a proce-
dure to copy a tree rather than delete it. We could give it a specification such as

    {tree(p)} q := CopyTree(p) {tree(p) ∗ tree(q)}
but then this specification would also be satisfied by a procedure that rotates a tree as it
copies. A more precise specification would be of the form

    {tree(p, τ)} q := CopyTree(p) {tree(p, τ) ∗ tree(q, τ)}
where tree(p, τ ) is a predicate which says that p points to a data structure in memory
representing the mathematical tree τ . (I use the term ‘mathematical’ tree to distinguish
it from a representation in the computer memory: the mathematical tree does not contain
pointer or other such representation information.)

Exercise 2 The notion of ‘mathematical tree’ appropriate to the above inductive defini-
tion of the tree predicate is that of an s-expression (the terminology comes from Lisp):
that is, an atom, or a pair of s-expressions. An s-expression is an element of the least set
satisfying the equation

Sexp = Atom + (Sexp × Sexp)


for some set Atom of atoms, where × and + are the cartesian product and disjoint union
of sets. Here, then, is the inductive definition of a tree(p, τ ) predicate, where τ ∈ Sexp:

tree(E, τ) ⇐⇒ if (isatom?(E) ∧ E = τ) then emp
               else ∃xyτ₁τ₂. τ = ⟨τ₁, τ₂⟩ ∧ (E ↦ [l: x, r: y] ∗ tree(x, τ₁) ∗ tree(y, τ₂))

Define the CopyTree procedure and give a proof-by-execution style argument for
its correctness, where you put assertions (symbolic states) at the appropriate program
points. Yes, I am asking you to do a ‘proof’ in a formalism that has not yet been defined
(!), but give it a try.

2.3. Perspective.

In this section I have attempted to illustrate the following points.


(i) The separating conjunction fits together with inductive definitions in a way that
supports natural descriptions of mutable data structures.
(ii) The separating conjunction supports in-place reasoning, where a portion of a
formula is updated in place when passing from precondition to postcondition,
mirroring the operational locality of heap update.
(iii) Frame axioms, which state what does not change, can be avoided when writing
specifications.
These points together enable specifications and proofs for pointer programs that are dra-
matically simpler than was possible previously, in many (not all) cases approaching the
simplicity associated with proofs of pure functional programs. (That is, previous ap-
proaches excepting the remarkable precursor work of Burstall [15], which provided in-
spiration for Reynolds’s earliest work on separation logic [73]. You can see [12] for
references and a good account of work on proving pointer programs before separation
logic.)
However, I should stress at once that program proofs do not always go as easily
as for DispTree. When one considers graph algorithms with significant sharing, or
concurrent programs with nontrivial interaction, proofs can become complicated. Neither
separation logic nor any other formalism takes programs that are difficult to understand
and magically gives them easy proofs.
A more realistic goal is to have simple proofs for simple programs. Whether, or to
what extent, this might be achieved by any given formalism can only be decided person-
ally by you looking at, or better by doing, proofs of a number of examples.

3. Foundations

Building on the ideas described informally in the previous section, I now give a rigorous
treatment of the program logic.

3.1. Semantics of Assertions (the Heaplet Model)

The model has two components, the store and the heap. The store is a finite partial func-
tion mapping from variables to integers, and the heap is a finite function from natural
numbers to integers.

    Stores ≜ Variables ⇀fin Ints        Heaps ≜ Nats ⇀fin Ints

(≜ abbreviates 'is defined to be equal to', and ⇀fin is the space of finite partial functions.) In logic, what we are calling the store is often
called the valuation, and the heap is a possible world. In programming languages, what
we are calling the store is also sometimes called the environment (the association of
values to variables).
We have standard integer expressions E and boolean expressions B built up from
variables and constants. These are heap-independent, so determine denotations

    [[E]]s ∈ Ints        [[B]]s ∈ {true, false}

where the domain of s ∈ Stores includes the free variables of E or B. We leave this
semantics unspecified.
We use the following notations in the semantics of assertions.
1. dom(h) denotes the domain of definition of a heap h ∈ Heaps, and dom(s) is the
domain of s ∈ Stores;
2. h#h′ says that dom(h) ∩ dom(h′) = ∅;
3. h • h′ denotes the union of functions with disjoint domains, which is undefined
if the domains overlap;
4. (f | i ↦ j) is the partial function like f except that i goes to j.
The satisfaction judgement s, h |= P, which says that an assertion holds for a given store and heap, is defined as follows, assuming that the free variables of P are contained in the domain of s.

    s, h |= B         iff  [[B]]s = true
    s, h |= E ↦ F     iff  {[[E]]s} = dom(h) and h([[E]]s) = [[F]]s
    s, h |= false     never
    s, h |= P ⇒ Q     iff  if s, h |= P then s, h |= Q
    s, h |= ∀x. P     iff  ∀v ∈ Ints. [s | x ↦ v], h |= P
    s, h |= emp       iff  h = [ ] is the empty heap
    s, h |= P ∗ Q     iff  ∃h₀, h₁. h₀#h₁, h₀ • h₁ = h, s, h₀ |= P and s, h₁ |= Q
    s, h |= P −∗ Q    iff  ∀h′. if h′#h and s, h′ |= P then s, h • h′ |= Q

The semantics of the connectives (⇒, false, ∀) gives rise to meanings of other connec-
tives of classical logic (∃, ∨, ¬, true) in the usual way. For example, taking P ∧ Q to be
¬(P ⇒ ¬Q), we obtain that s, h |= P ∧ Q has the usual meaning of ‘s, h |= P and
s, h |= Q’.
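For readers who like to test definitions, here is a small, illustrative Python sketch (not part of the original notes) of the satisfaction judgement for a fragment of the assertion language: assertions are nested tuples, heaps are finite dictionaries, and the clause for ∗ literally enumerates the ways of splitting the heap, exactly as in the definition above. The constructor names are invented for this sketch.

    from itertools import combinations

    def splits(h):
        """All ways of dividing heap h (a dict) into two disjoint parts."""
        locs = list(h)
        for r in range(len(locs) + 1):
            for left in combinations(locs, r):
                h0 = {l: h[l] for l in left}
                h1 = {l: h[l] for l in locs if l not in left}
                yield h0, h1

    def val(s, e):
        """Expression semantics [[e]]s for this sketch: variables or literals."""
        return s[e] if isinstance(e, str) else e

    def sat(s, h, p):
        tag = p[0]
        if tag == "emp":
            return h == {}
        if tag == "pointsto":               # ("pointsto", E, F) means E |-> F
            _, e, f = p
            return set(h) == {val(s, e)} and h[val(s, e)] == val(s, f)
        if tag == "eq":                     # ("eq", E, F), heap-independent
            _, e, f = p
            return val(s, e) == val(s, f)
        if tag == "star":                   # ("star", P, Q)
            _, p1, p2 = p
            return any(sat(s, h0, p1) and sat(s, h1, p2) for h0, h1 in splits(h))
        if tag == "and":
            _, p1, p2 = p
            return sat(s, h, p1) and sat(s, h, p2)
        raise ValueError(tag)

    s, h = {"x": 10, "y": 42}, {10: 42, 42: 10}
    print(sat(s, h, ("star", ("pointsto", "x", "y"), ("pointsto", "y", "x"))))   # True
    print(sat(s, h, ("star", ("pointsto", "x", "y"), ("pointsto", "x", "y"))))   # False
    print(sat({"x": 1, "y": 1}, {5: 7}, ("star", ("eq", "x", "y"), ("eq", "x", "y"))))  # True

The last two queries correspond to the (x ↦ y) ∗ (x ↦ y) and (x = y) ∗ (x = y) puzzles taken up below: the points-to conjunct constrains the heap, while equality ignores it.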
The general logical context of this form of semantics is that it can be seen as a
possible world model which combines:
(i) the standard semantics of classical logic (⇒, false, ∀) in the complete boolean
algebra of the power set of heaps; and
(ii) a semantics of ‘substructural logic’ (emp, ∗, −∗ ) in the same power set (which
gives us what is known as a residuated commutative monoid, an ordered commu-
tative monoid where A ∗ (−) has a right adjoint A−∗ (−)).
The semantics is an instance of the ‘resource semantics’ of bunched logic devised by
David Pym [63,70,69], where one starts from a partial commutative monoid in place of
heaps (with • and the empty heap giving partial monoid structure). The resulting math-
ematical structure on the powerset, of a boolean algebra with an additional commuta-
tive residuated monoid structure, is sometimes called a ‘boolean BI algebra’. The model
of • as heap partitioning, which lies at the basis of separation logic, was discovered by
John Reynolds when he first described the separating conjunction [73]. The separating
conjunction was connected with Pym’s general resource semantics in [47].
Notice that the semantics of E ↦ F requires that E is the only active address in the current heap. Using ∗ we can build up descriptions of larger heaps. For example, (10 ↦ 3) ∗ (11 ↦ 10) describes two adjacent cells whose contents are 3 and 10. We can express an inexact variant of points-to as follows

    E ↪ F = (true ∗ E ↦ F).

Generally, true ∗ P says that P is true of a subheap of the current one. The difference between ↪ and ↦ shows up in the presence or absence of projection or Weakening for ∗.
1. P ∗ (x ↦ 1) ⇒ (x ↦ 1) is not always true.
2. P ∗ (x ↪ 1) ⇒ (x ↪ 1) is always true.
The different way that the two conjunctions ∗ and ∧ behave is illustrated by the
following examples.
1. (x ↦ 2) ∗ (x ↦ 2) is unsatisfiable (you can't be in two places at once).
2. (x ↦ 2) ∧ (x ↦ 2) is equivalent to x ↦ 2.
3. (x ↦ 1) ∗ ¬(x ↦ 1) is satisfiable (thus, we have a kind of 'paraconsistent' logic).
4. (x ↦ 1) ∧ ¬(x ↦ 1) is unsatisfiable.
The third example drives home how separation logic assertions do not talk about the
global heap: P ∗¬P can be consistent because P can hold of one portion of heap and ¬P
of another. To understand separation logic assertions you should always think locally:
for this you might regard the h component in the semantics of assertions as describing a
‘heaplet’, a portion of heap, rather than a complete global heap in and of itself.

Exercise 3 Define ↦ in terms of ↪, ∧, ∗, ¬ and emp.

Aside: on ↦ versus =  A frequent source of confusion when first learning separation


logic concerns how the ∗ separator splits the heap but not the store, and this translates
into confusions reading assertions with = in them. Recall the definitions above

s, h |= B iff [[B]]s = true

s, h |= E ↦ F iff {[[E]]s} = dom(h) and h([[E]]s) = [[F]]s.

Notice that the rhs of the clause for s, h |= B does not mention h at all, where for
s, h |= E ↦ F the rhs does contain h. I said I would not give a precise semantics of
boolean expressions, but let me consider just one, the expression x = y where x and y
are variables:

[[x = y]]s = (sx = sy).

Now, consider the assertion (x = y) ∗ (x = y). Can it ever be true? Well, yes, it is
satisfiable, and in fact it has the same meaning as x = y and as (x = y) ∧ (x = y). On
the other hand, consider the assertion (x ↦ y) ∗ (x ↦ y). Can it ever be true? How about
(x = y) ∗ (x ↦ y)? Or (x = y) ∧ (x ↦ y)? Work out the answers to these questions by
expanding the semantic definitions.

3.2. Inductive Definitions, Again

Earlier we considered an inductive definition of trees representing s-expressions.

tree(E, τ) ⇐⇒ if (isatom?(E) ∧ E = τ) then emp
               else ∃xyτ₁τ₂. τ = ⟨τ₁, τ₂⟩ ∧ (E ↦ [l: x, r: y] ∗ tree(x, τ₁) ∗ tree(y, τ₂))

Now we can be more precise about its meaning. The use of if-then-else can be desugared
using boolean logic connectives in the usual way. if B then P else Q is the same
as (B ∧ P ) ∨ (¬B ∧ Q) where here B is heap-independent. Therefore, in the inductive
definition we can now see that the condition (isatom?(E) ∧ E = τ ) is completely
heap-independent, and not affected by ∗: it talks only about values, and not the contents
of heap cells.
It is also helpful to ponder the clause

τ = ⟨τ₁, τ₂⟩ ∧ (E ↦ [l: x, r: y] ∗ tree(x, τ₁) ∗ tree(y, τ₂))

used in the definition. In fact, we could have rewritten it using ∗ in place of ∧, as

(τ = ⟨τ₁, τ₂⟩ ∧ emp) ∗ (E ↦ [l: x, r: y] ∗ tree(x, τ₁) ∗ tree(y, τ₂)).

Here, we have used a general identity

B ∧ P ⇐⇒ (B ∧ emp) ∗ P

which holds whenever B is heap-independent. On the other hand, if we replaced one of


the other occurrences of ∗ by ∧, it would more dramatically alter the definition (exercise:
by playing with ∗, ∧ and perhaps inserting true, can you alter this definition so that it
describes dags rather than trees?).
In case you missed it, to be fully formal in the interpretation of this definition we
should also extend the store type to be


    Stores ≜ Variables ⇀fin (Ints + Sexp)

so that a variable can take on an s-expression as well as an integer value. We could also
distinguish s-expression variables τ from program variables x syntactically. (In practice,
one would probably want to use a many-sorted rather than one-sorted logic as we are
doing in these notes for theoretical simplicity.)
Finally, we can regard E ↦ [l: x, r: y] as sugar for (E ↦ x) ∗ (E+1 ↦ y) in the RAM
model. Note, though, that this low-level desugaring is not part of the essence of separa-
tion logic, only this particular model. Other models can be used where heaps are repre-
sented by L *f in V where V might be a structured type to represent records. However,
that the RAM model can be used is appealing in a foundational way, as we know that
programs of all kinds are eventually compiled to such a model (modern concerns with
weak memory notwithstanding).
Generally, for any kind of data structure you will want to provide an appropriate
predicate definition which will often be inductive. Linked lists are the most basic case,
and illustrate some of the issues involved.
When reasoning about imperative data structures, one needs to consider not only
complete linked lists (terminated with nil ) but also ‘partial lists’ or linked-list segments.
Here is an example of a list segment predicate describing lists from E to F (where F is
not allocated).

ls(E, F ) ⇐⇒ if E = F then emp
             else ∃y.E7→y ∗ ls(y, F )

I am intending that ls is the least predicate satisfying the equation. Mathematically, it


can be worked out as the least fixed-point of a monotone function on a certain lattice,
by reference to the Tarski fixed-point theorem. (Exercise: what is the lattice and what
is the monotone function?) It is possible as well to give an alternate definition whose
formalization does not need to talk about lattices: you define a predicate ls(E, F, n)
describing a linked list segment from E to F of length n, and then define ls(E, F ) to be
∃n. ls(E, F, n).
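For instance, the length-indexed predicate can be written as follows (the exact clauses here are my rendering of what the text describes, not a definition taken from it):

ls(E, F, 0)      ⇐⇒   E = F ∧ emp
ls(E, F, n + 1)  ⇐⇒   ∃y. E7→y ∗ ls(y, F, n)

and then ∃n. ls(E, F, n) requires no appeal to least fixed-points over a lattice.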
This list segment predicate rules out cycles. However, cycles can be described using
two list predicates, or a points-to and a list segment. For example, the following assertion
describes a cyclic structure: a list segment from x to y joined to a segment from y back to x.

ls(x, y) ∗ ls(y, x)

These partial lists are sometimes used in the specifications of data structures, such
as queues. In other cases, they are needed to state the internal invariants of an algorithm,
even when the pre and post of a program use total lists only (total lists list(E) can be
regarded as abbreviations for segments ls(E, nil )). Here is a program from the S MALL -
FOOT tool [7] which exemplifies this point.
list_append(x,y) PRE: [list(x) * list(y)] {
  local t, n;
  if (x == NULL) {
    x = y;
  } else {
    t = x; n = t->tl;
    while (n != NULL) [ls(x,t) * t |-> n * list(n)] {
      t = n;
      n = t->tl;
    }
    t->tl = y;
  } /* ls(x,t) * t |-> y * list(y) */
} POST: [list(x)]

This program, which appends two lists by walking down one and then swinging its last
pointer to the other, uses a partial list in its loop invariant, even though partial lists are
not needed in the overall procedure spec. In proving this program an important point is
how one gets from the last statement to the postcondition. A comment near the end of
the program shows an assertion describing what is known at that program point, and we
need to show that it implies the post to verify the program. That is, we need to show an
implication

ls(x, t) ∗ t7→y ∗ list(y) =⇒ list(x).

This implication may seem unremarkable, but it is at this point that automatic tools must
begin to do something clever. For, consider how you, the human, would convince yourself
of the truth of this implication. If it were me, I would look at the semantics and prove
this fact by induction on the length of the list from x to t. But if we were to include such
reasoning in an automatic tool, we had better try to do so in an inductionless way, else
our tool will need to search for induction hypotheses (which is hard to make automatic).

Exercise 4 There are other definitions of list segments that have been used. Here is one,
the ‘imprecise list segment’.

ils(E, F ) ⇐⇒ (E = F ∧ emp)
∨ ∃y.E7→y ∗ ils(y, F )

Q1. What is a heap that distinguishes ls(10, 10) and ils(10, 10) ?
Q2. What distinguishes ls(10, 11) and ils(10, 11) ?
Q3. Prove or disprove the following laws (do your proof by working in the semantics)

ls(x, y) ∗ ls(y, z) =⇒ ls(x, z) ???


ils(x, y) ∗ ils(y, z) =⇒ ils(x, z) ???

Q4. Suppose we want to write a procedure that frees all the cells in a list segment.
For which of ils or ls can you do it? If you cannot do it for one of them, why not?
That is, we are asking for terminating programs satisfying
{ls(x, y)} delete_ls(x, y) {emp}
{ils(x, y)} delete_ils(x, y) {emp}

(I have not given you the definition of the truth of pre/post specs yet, but you
should be able to answer this question anyhow.)

Exercise 5 Give a definition ls(E, F, σ) of a predicate describing a linked list from E to


F that contains the sequence σ in data fields. Write specifications of programs that insert
and delete elements from sorted linked lists, where σ is sorted according to an ordering.
Give at least the loop invariants for these programs (write iterative versions). Attempt a
proof-by-execution type argument as well.

Exercise 6 The predicate tree(E, τ ) we used above considers τ as an s-expression,


where the values are only at the leaves of the tree and not at internal nodes. Often, one
wants to use a data structure for mathematical trees including data at internal nodes,
and one way to describe these is with the set equation

Mtree = {nil } + (Mtree × Atom × Mtree)

In this sort of tree, nil is the empty tree and the leaves of a non-empty tree are those
3-tuples that have nil in their first and third components.
Give an inductive definition of a predicate tree(E, τ ), for τ ∈ Mtree. Hint: use a
points-to assertion of the form E7→[l: x, d: y, r: z] where d refers to the data, or atom,
field. Define the CopyTree and DispTree procedures for this sort of tree, and give
proof-by-execution style arguments for their correctness.

3.3. Proof Rules for Programs.

The proof rules for procedure calls, sequencing, conditionals and loops are the same as
in standard Hoare logic [42,43]. Here I concentrate on the rules for primitive commands
for accessing the heap, and the surrounding rules, called the ‘structural rules’. (If you are
unfamiliar with Hoare logic probably the best way to learn is to go directly to the early
sources, such as [42,44,43,37,27], which are pleasantly simple and easy to read.)
We will use the following abbreviations:


E7→F0 , ..., Fn   ≜   (E7→F0 ) ∗ · · · ∗ (E + n7→Fn )
E ≐ F             ≜   (E = F ) ∧ emp
E7→−              ≜   ∃y.E7→y      (y ∉ Free(E))

where Free(E) is the set of free variables in E.


We have a small axiom for each of the atomic commands. In the axioms x, m, n are as-
sumed to be distinct variables.
THE SMALL AXIOMS

{E7→−} [E] := F {E7→F }
{E7→−} free(E) {emp}
{x ≐ m} x := cons(E1 , ..., Ek ) {x7→E1 [m/x], ..., Ek [m/x]}
{x ≐ n} x := E {x ≐ (E[n/x])}
{E7→n ∧ x = m} x := [E] {x = n ∧ E[m/x]7→n}

The first small axiom just says that if E points to something beforehand (so it is in
the domain of the heaplet), then it points to F afterwards, and it says this for a small
portion of the state (heaplet) in which E is the only active cell. This corresponds to
the operational idea of [E] := F as a command that stores the value of F at address
E in the heap. The other commands have similar explanations. Notice that each axiom
mentions only the cells accessed or allocated: the axioms talk only about footprints,
and not the entire global program state. We only get fixed-length allocation from x :=
cons(E1 , ..., Ek ), but it is also possible to axiomatize a command x := alloc(E) that
allocates a block of length E.
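As a small worked instance of the cons axiom (my own instantiation, not from the text): taking k = 2 with E1 = 3 and E2 = x gives

{x ≐ m} x := cons(3, x) {x7→3, m}

that is, afterwards x points to a fresh two-field record whose contents are 3 and the old value of x, which the precondition names m.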
Notice that our axioms allow us to free any cell that is allocated, even from the mid-
dle of a block given by cons. This is different from the situation in the C programming
language, where you are only supposed to free an entire block that has been allocated by
malloc(). An elegant treatment of this problem has been given using predicate variables
in [66].
The assignment statement x := E is for a variable x and heap-independent arith-
metic expression E. Thus, this statement accesses and alters the store, but not the heap. It
is the assignment statement considered by Hoare in his original system [42]. In contrast,
the form [E] := F alters the heap but not the store.
To go along with the small axioms we have additional surrounding rules.

THE STRUCTURAL RULES

Frame Rule
    {P } C {Q}
    {P ∗ R} C {Q ∗ R}
        where Modifies(C) ∩ Free(R) = ∅

Auxiliary Variable Elimination
    {P } C {Q}
    {∃x.P } C {∃x.Q}
        where x ∉ Free(C)

Variable Substitution
    {P } C {Q}
    ({P } C {Q})[E1 /x1 , ..., Ek /xk ]
        where {x1 , ..., xk } ⊇ Free(P, C, Q), and xi ∈ Modifies(C) implies
        Ei is a variable not free in any other Ej

Rule of Consequence
    P ′ ⇒ P      {P } C {Q}      Q ⇒ Q′
    {P ′ } C {Q′ }
Modifies(C) here is the set of variables that are assigned to within C. The Modifies set
of each of x := cons(E1 , ..., Ek ), x := E and x := [E] is {x}, while for free(E) and
[E] := F it is empty. Note that the Modifies set only tracks potential alterations to the
store, and says nothing about the heap cells that might be modified.
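As a quick illustration (an instance of mine, not from the text): since Modifies([x] := 7) = ∅, the frame rule applied to the small axiom instance {x7→−} [x] := 7 {x7→7} gives

{x7→− ∗ y7→3} [x] := 7 {x7→7 ∗ y7→3}

and indeed any frame at all may be added. By contrast, Modifies(x := 7) = {x}, so the side condition rightly blocks framing an assertion such as x = 5 ∧ emp across x := 7, which would otherwise let us conclude a false postcondition.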
Two of these rules we have already seen: the frame and consequence rules. The
others are rules that have been considered in the Hoare logic literature. This collection
of axioms and rules is complete in the sense that all true Hoare triples for the basic
statements can be derived them (assuming an oracle for implication in the consequence
rule). A proof of this fact is contained in Hongseok Yang’s thesis [84] (in fact, Yang
chose the existential and substitution structural rules precisely in order to make the small
axioms complete).
This presentation of the proof system above is from [61]. In his LICS’02 paper
[74] Reynolds gives a comprehensive description of a variety of axioms, in local (small)
and global and backwards forms, for the various atomic commands. The additional laws
are important because one prefers to have derived laws that can be applied at once in
common situations without going back to the small axioms every time and invoking the
structural rules extensively.
For example, it follows from Yang’s results that Hoare’s assignment axiom

{P [E/x]} x := E {P }

can be derived, where x := E is the assignment statement that is heap independent. One
can also derive Floyd’s forwards-running axiom [36]

{P } x := E {∃x0 . x = E[x0 /x] ∧ P [x0 /x]}

where the existentially quantified variable x0 (which must be fresh) provides a way to
talk about x’s value in the pre-state. The symbolic execution rules in S MALLFOOT and
related tools use forwards-running rules of this variety (Section 4.2).
As an example derived rule for a heap-accessing command, with the frame rule and
auxiliary variable elimination one can obtain an axiom from [73]

{∃x1 , · · · , xn . (E7→−) ∗ R} [E] := F {∃x1 , · · · , xn . (E7→F ) ∗ R}


(where x1 , ..., xn ∉ Free(E, F ).)

that will be useful when defining symbolic execution later.


The −∗ connective has not often been used in proofs of particular programs (some
examples are in [84,66,32]). But it is a handy thing to have when doing metatheoretic
reasoning about a system [47,86,67,17]. The de Morgan dual ¬(P −∗ ¬Q) (called ‘sep-
traction’ in [81]) has played a central role in the formulation of a logic marrying sepa-
ration logic and the rely-guarantee method for concurrent programs, and it is used in an
automated tool based on the marriage logic [19].
An example metatheoretic use of −∗ is in proving completeness results. For example,
the following derivation

    {E7→−} [E] := F {E7→F }
                                                                       Frame
    {(E7→−) ∗ ((E7→F )−∗ Q)} [E] := F {(E7→F ) ∗ ((E7→F )−∗ Q)}
                                                                       Consequence
    {(E7→−) ∗ ((E7→F )−∗ Q)} [E] := F {Q}

gives us a general precondition for any postcondition Q (the Consequence step uses the
entailment (E7→F ) ∗ ((E7→F )−∗ Q) =⇒ Q), and this is key to showing that the
small axiom for mutation is not missing anything.

Exercise 7 Go back over the proof-by-execution style arguments you gave in the previous
exercises, and convince yourself that you can formalize them in the proof system given
in this section. You will probably want to use derived laws for each of the basic program
statements. In such proofs you get to use the semantics as an oracle when deciding the
implication statements in the rule of consequence.

Exercise 8 Formulate an operational semantics of [E] := F in terms of stores and


heaps. I.e., say when [E] := F, s, h evaluates to s, h0 . Don’t use separation logic in this
formulation.
For a given set of states Q, say what it means to be the weakest precondition of
[E] := F with postcondition Q.
Finally, prove (in math, not in logic) that (E7→−) ∗ ((E7→F )−∗ Q) expresses the
weakest precondition.

Aside: On Variable Conditions and store-vs-environment. In Section 2 I skated over


the issue of Modifies sets, not mentioning them when introducing the frame rule. Con-
ditions involving Modifies sets are inelegant, and are all the more irritating because they
arise from a deliberate punning in Hoare logic between store and environment, which is
uncommon in programming languages.
At the birth of program semantics, in one of the founding papers of the field, Stra-
chey advised to distinguish the environment (association of variables to values), which
can be altered by variable binding in a way that obeys a stack discipline, from the store
(association of values to locations), which can be mutated by assignment [78]. Program-
ming languages from C to ML to Java observe Strachey’s distinction. The benefit from
conflating the ideas is that one gets beautifully simple specifications and proofs of simple
example programs in Hoare logic, or in Dijkstra’s wp calculus: it leads to neater (shorter)
examples to illustrate ideas, so in that sense the pun was worth it. I persist with the pun in
these lectures for the same reason. But, researchers are more and more avoiding conflat-
ing store and environment in working out their theories, and proof tools for C and Java
do not need to worry about Modifies sets. See [65] for further discussion.

3.4. Tight Specifications

The issues related to frame axioms that we discussed in Section 2.2 go a long way back,
to the beginning work on logic in AI [57]. Fundamentally, the reason why AI issues are
relevant to program logic is just that programmers describe their code in a way that cor-
responds to a commonsense reading of specifications, where much is left unsaid. Practi-
cally, if we do not employ some kind of solution to the AI problems, then specifications
quickly become extremely complicated [11].
Some people think that the real problem is in a way negative in nature: it is to avoid
writing nasty frame axioms like we did in the ‘back in the day’ discussion in Section
2.2. Other people think the problem is just to have succinct specs, however one gets
them. I have always thought both of these, succinct specs and avoiding writing frame
axioms, should be a consequence of a solution, but are not themselves the problem. My
approach to this issue has always been to embrace the ‘commonsense reasoning’ aspect
first, and for this the idea of a ‘tight specification’ is crucial: the idea is that if you don’t
say that something changes, then it doesn’t. For example, if you say that a robot moves
block A from position 1 to position 2, then the commonsense reading is that you are
implicitly saying as well that this action does not change the position of a separate block
B (unless, perhaps, block B is on top of block A). Programmers’ informal descriptions
of their code are similar. In the java.util List interface the description of the copy method
is just that it ‘copies all of the elements from one list into another’. There is no mention
of frames in the description: the description carries the understanding that the frame
remains unchanged. The need to describe the frames explicitly in some formalisms is
just an artefact, which programmers do not find necessary when talking about their code
(because of this commonsense reasoning that they employ).
Be that as it may, formalization of the notion of tight specification proved to be
surprisingly difficult, and in AI there have been many elaborate theories advanced to try
to capture this notion – circumscription, default logic, nonmonotonic logic, and more
– far too many to give a proper account of here. Without claiming to be able to solve
the general AI problem, this section explains how an old idea in program logic, when
connected to the principle of local reasoning (that you only need to talk about the cells a
program touches), gives a powerful and yet very simple approach to tight specifications.
The old idea is of fault-avoiding specifications. To formulate this, let us suppose that
we have a semantics of commands where C, σ ;∗ σ 0 indicates that there is a terminating
computation of command C from state σ to state σ 0 . In the RAM model σ can be a
pair of a store and a heap, but the notion can be formulated at a more general level than
this particular model. Additionally, we require a judgement form C, σ ;∗ fault. In
the RAM model, fault can be taken to indicate a memory fault: a dereferencing of a
dangling pointer or a double-free. Again, more generally, fault can be used for other
sorts of errors.
Here, then, is a fault-avoiding semantics of triples, where for generality we are view-
ing the preconditions and postconditions as sets of states rather than as formulae written
in some particular assertion language.
Fault-Avoiding Partial Correctness
{A} C {B} holds iff for all σ ∈ A:
1. no faults: it is not the case that C, σ ;∗ fault
2. partial correctness: C, σ ;∗ σ 0 implies σ 0 ∈ B.
The ‘no faults’ clause is a reasonable thing to have as a way for proven programs to avoid
errors, and was used as far back as Hoare and Wirth’s axiomatic semantics of Pascal in
1973 [45]. Notice that the small axioms given above are already in a form compatible
with the fault-avoiding semantics. For instance, in the axiom

{E7→−} [E] := F {E7→F }

the E7→− in the precondition ensures that E is not a dangling pointer, and so [E] := F
will not memory fault.
Remarkably, besides ensuring that well-specified programs avoid certain errors, it
was realized much later [47] that the fault-avoiding interpretation gives us an approach
to tight specifications. The key point is a consequence of the ‘no faults’ clause: touching
any cells not known to be allocated in the precondition falsifies the triple, so any cells
not ‘mentioned’ (known to be allocated) in the pre will remain unchanged. To see why,
suppose I tell you
{10 7→ −} C {10 7→ 25}
but I don’t tell you what C is. Then I claim C cannot change location 11 if it happens to
be allocated in the pre-state (when 10 is also allocated). For, if C changed location 11, it
would have to access location 11, and this would lead to fault when starting in a state
where 10 is allocated and 11 is not. That would falsify the triple (by the no-faults clause). As a
consequence we obtain that
{107→− ∗ 117→4} C {107→25 ∗ 117→4}
should hold.
This reasoning is the basis for the frame rule. But the semantic fact that location 11
doesn’t change is completely independent of separation logic. In fact, we could state a
similar conclusion without mentioning ∗ at all
{10,→− ∧ 11,→4} C {10,→25 ∧ 11,→4}
Separation logic, and the frame rule, only give you a convenient way to exploit the tight-
ness (that things don’t change if you don’t mention them) in the fault-avoiding interpre-
tation. This tightness phenomenon is in a sense at a more fundamental level, prior to
logic.
It is useful to consider that for this approach to tight specifications to work fault
does not literally need to indicate memory fault, and it is not necessary to use a low-level
memory model. For instance, we can put a notion of ‘accesses’ or ‘ownership’ in a model,
and then when the program strays beyond what is owned we declare a specification false:
then, the same argument as above lets us conclude that certain cells do not change. This
is the idea used in implicit dynamic frames [77], and in separation logics for garbage-
collected languages like Java where there are no memory faults (e.g., [66]). Alternate
approaches may be found in [4,3,49].
I have tried to explain the basis for tight specifications above in a semi-formal way.
But, the reader might have noticed that there were some unstated assumptions behind my
arguments. One can imagine mathematical relations on states and fault that contradict
our conclusion that 11 will remain unchanged. One such relation is as follows: if the
input heap is a singleton, it sets the contents of the only allocated location to be 25, and
otherwise sets all allocated locations in the input heap to have contents 50. This is not
a program that you can write in C, but it shows that there are locality properties
of the semantics of programs at work behind the tight interpretation of triples, and it is
important theoretically to set these conditions down precisely; see [86,18,72].

Exercise 9 Without saying what the commands C are, and ignoring the store component
(i.e., think about heap only), formulate sufficient conditions on the relations C, σ ;∗ σ 0
and C, σ ;∗ fault which make the frame rule valid according to fault-avoiding partial
correctness. Give a proof of the validity of the frame rule from these conditions.
Are your conditions necessary as well as sufficient?
4. Symbolic Heaps, Symbolic Execution and Abstract Interpretation

In the previous sections I emphasized an informal view of program proof as a form


of symbolic execution. That is the view implemented in a number of verification and
analysis tools based on separation logic, beginning with S MALLFOOT [7]. In this section
I describe the foundations of this approach, and give a short introduction to its extension
to program analysis (where abstraction is used to calculate loop invariants).

4.1. Symbolic Heaps

When designing an automatic program verification tool there are almost always compro-
mises to be made, forced by the constraints of recursive undecidability of so many ques-
tions about logics and programs. The first tools based on separation logic chose to restrict
attention to a certain format of assertions which made three tasks easier than they might
otherwise have been: symbolic execution, entailment checking, and frame inference.
Symbolic heaps [6,30] are formulae of the form

∃X~ . (P1 ∧ · · · ∧ Pn ) ∧ (S1 ∗ · · · ∗ Sm )

where the Pi and Sj are primitive pure and spatial predicates, and X~ is a vector of logical
variables (variables not used in programs). We understand the nullary conjunction of Pi ’s
as true and the nullary ∗-conjunction of Si ’s as emp. The special form of symbolic heaps
does not allow, for instance, nesting of ∗ and ∧, or boolean negation ¬ around ∗, or the
separating implication −∗ . This special form was chosen, originally, to match the usage
of separation logic in a number of by-hand proofs that had been done. The form does not
cover all proofs, such as Yang’s proof of the Schorr-Waite algorithm [84], so there are
immediately-known limitations.
The grammar for symbolic heaps can be instantiated with different sets of basic pure
and spatial predicates. Pure formulae are heap-independent, and describe properties of
variables only, where the spatial formulae specify properties of the heap. One instantia-
tion is as follows
SIMPLE LISTS INSTANTIATION

P ::= E = E | E ≠ E
S ::= E7→E | lsne(E, E) | true

Expressions E include program variables x, logical variables X, or constants κ (e.g.,


nil ). Here, the points-to predicate x7→y denotes a heap with a single allocated cell at ad-
dress x with content y, and lsne(x, y) denotes a nonempty list segment from x to y. This
is the list segment predicate used in the paper [30] on program analysis, which described
the analysis that we call BABY S PACE I NVADER (the grown up version is represented in
[5,85]). In contrast, the ‘possibly empty list segments’ predicate ls we described before
was used in S MALLFOOT. It turns out that there is no one best predicate. In a practical
program analysis tool, it is helpful to keep both forms of segment ls and lsne in the
assertion language, even though they can be expressed in terms of one another and dis-
junction: keeping distinct predicates for empty and nonempty list segments in a language
provides a means to help limit the number of disjuncts that need to be considered by the
program analysis, a key issue in dealing with state-space explosion [85]. Some tools even
prefer the imprecise list segment predicate ils from Exercise 4, to make the abstraction
or widening step in an abstract interpreter easier to design.
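For instance (an illustrative formula of mine), x ≠ nil ∧ x7→Y ∗ lsne(Y, nil ) is a symbolic heap in this instantiation: the pure part x ≠ nil is heap-independent, while the spatial part says that the heap splits into the single cell at x, whose contents is the logical variable Y , and a disjoint nonempty segment from Y to nil .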
There are many other instantiations that one can consider. One instantiation keeps P
the same and replaces simple linked-lists by a higher-order variant which allows lists to
be nested [5]. Varieties of trees, possibly with back pointers, have been considered [20].
As have predicates that track arithmetic information or the contents of data structures
[8,51,80]. Very complicated abstract domains are needed to cope with the complicated
data structures occurring in real-world programs. But in these lectures we will stick with
the simple lists, for simplicity of presentation.
C ONVENTIONS . We observe the following conventions. In writing a symbolic heap
we omit the leading ∃X, ~ understanding that the logical variables X are implicitly ex-
istentially quantified. Also, we overload the ∗ operator, so that it also works for entire
symbolic heaps H and not only the components.

((P1 ∧ · · · ∧ Pn ) ∧ (S1 ∗ · · · ∗ Sm )) ∗ ((P1′ ∧ · · · ∧ Pk′ ) ∧ (S1′ ∗ · · · ∗ Sl′ ))
    = (P1 ∧ · · · ∧ Pn ∧ P1′ ∧ · · · ∧ Pk′ ) ∧ (S1 ∗ · · · ∗ Sm ∗ S1′ ∗ · · · ∗ Sl′ )

4.2. Symbolic Execution

The symbolic execution semantics H, A =⇒ H 0 takes a symbolic heap H and an atomic


command, and transforms it into an output symbolic heap or fault. In these rules we
require that the logical variables X, Y be fresh.

SYMBOLIC EXECUTION RULES

H,           x := E        =⇒   x = E[X/x] ∧ H[X/x]
H ∗ E7→F,    x := [E]      =⇒   x = F [X/x] ∧ (H ∗ E7→F )[X/x]
H ∗ E7→F,    [E] := G      =⇒   H ∗ E7→G
H,           x := cons(−)  =⇒   H[X/x] ∗ x7→X
H ∗ E7→F,    free(E)       =⇒   H

With the convention that the logical variables are implicitly existentially quantified, the
first rule is just a restating of Floyd’s axiom for assignment. The other rules can be
obtained from the small axioms of Section 3 by applications of the structural rules.
The rules for x := [E], [E] := G and free(E) all assume that we have E7→F
explicitly in the precondition. In some cases, this knowledge that E points to some-
thing will be somewhat less explicit, as in the symbolic heap E = E ′ ∧ E ′ 7→F . Then,
a simple amount of logical reasoning can convert this formula to the equivalent form
E = E ′ ∧ E7→F , which is now ready for an execution step. In another case, lsne(E, F ),
we might have to unroll the inductive definition to reveal the 7→. In general, for any of
these heap-accessing forms, we need to massage a symbolic heap to ‘make E7→ explicit’.
Here are sample rules for doing this massaging.
REARRANGEMENT RULES

A(E) ::= x := [E] | [E] := G | free(E)
P (E, F ) ::= E7→F | lsne(E, F )

H0 ∗ P (E, G), A(E) =⇒ H1      H0 ` E = F
H0 ∗ P (F, G), A(E) =⇒ H1

H0 ∗ E7→X ∗ lsne(X, G), A(E) =⇒ H1
H0 ∗ lsne(E, G), A(E) =⇒ H1

H0 ∗ E7→F, A(E) =⇒ H1
H0 ∗ lsne(E, F ), A(E) =⇒ H1

H 6` Allocated(E)
H, A(E) =⇒ fault

In these rules we referred to a notion of entailment ` that will be discussed in Section


4.4. Allocated(E) can be represented by the assertion E7→X ∗ true where X is fresh.
[Aside: This rearrangement notion is related to the partial concretization operation
used in shape analysis [75,76], where one concretizes just enough of an abstract value
so that the concrete program semantics can be applied. Rearrangement is also a special
case of the concept of frame inference discussed later in Section 4.5.]
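To see how rearrangement and execution combine, here is a small worked step (an example of mine, not taken from [30]). To execute y := [x] from the symbolic heap lsne(x, nil ), rearrangement must first expose a points-to fact for x, and the two unrolling rules for lsne give two outcomes (the fresh-variable substitution for y is trivial here, because y does not occur in the heap):

lsne(x, nil ), y := [x]  =⇒  y = X ∧ x7→X ∗ lsne(X, nil )
lsne(x, nil ), y := [x]  =⇒  y = nil ∧ x7→nil

This is one source of the nondeterminism in rearrangement mentioned in Section 4.3.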
It is a good idea, and good for practice, for you to become familiar with the different
variations on list segments.

Exercise 10 Without looking at any of the papers referenced in this section...


1. Give an inductive definition of the predicate for necessarily non-empty list seg-
ments lsne, corresponding to the rearrangement rules above.
2. Give rearrangement rules that would be appropriate for the earlier definition of
possibly empty list segments, ls.
3. Consider the formulae

ls(x, y) ∗ ls(x, z) and lsne(x, y) ∗ lsne(x, z)

Is either formula satisfiable? Might this affect any of the steps in symbolic exe-
cution?
4. (Advanced) Write an inductive definition for a predicate that describes doubly-
linked list segments. It should have four arguments. Be careful about the base
case.
Write rearrangement rules for this doubly-linked list predicate.

4.3. Recipe for Cooking a Verifier

Using symbolic execution, it is possible to construct an automatic verification tool as


follows. The input to the tool is a while program with heap-manipulating primitives as
in the previous section. The program must be annotated with loop invariants and a pre-
condition and a postcondition. The S MALLFOOT list_append program from Section
3.2 is of this form. The usual rules of Hoare logic for loops and conditionals then enable
us to chop up the correctness of such a program into a number of questions of the form
{H} c1 ; ...; cn {H 0 } for atomic commands ci . If we can verify that each of these straight-
line specifications {H} c1 ; ...; cn {H 0 } is true then we can conclude that the beginning
program satisfies its pre/post spec.
In many verification tools the straightline specs {H} c1 ; ...; cn {H 0 } are decided by
using a weakest precondition calculation to obtain a formula wp(c1 ; ...; cn , H 0 ) and then
asking a theorem prover if H ⇒ wp(c1 ; ...; cn , H 0 ). Or, a strongest postcondition could
be used.
The approach used often in separation logic tools is something like the strongest
post calculation, except that lots of subsidiary calls are made to a theorem prover along
the way. To decide {H} c1 ; ...; cn {H 0 } we first ask the theorem prover if H is inconsis-
tent. If it is, we are done (the spec is true). Second, if c1 , ..., cn is the empty sequence
we ask a prover if H ` H 0 . Otherwise, we apply symbolic execution to the first statement
c1 , and this gives us fault or several symbolic heaps (several because there is nonde-
terminism in rearrangement, and because some basic commands in a real language might
have disjunctions in their postconditions; why might malloc() have a disjunction
as its post?). A theorem prover is consulted in the rearrangement phase here. If fault
resulted from symbolic execution then we are done (the spec is false). If, instead, ex-
ecution yields several heaps H1 , ..., Hm then we return the conjunction of the smaller
questions {Hi } c2 ; ...; cn {H 0 }. This last case essentially relies on the rule

{P1 } C {Q} · · · {Pn } C {Q}


{P1 ∨ · · · ∨ Pn } C {Q}

of Hoare logic.
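This recipe can be written down in a few lines. Here is a minimal sketch in Python, under the assumption that we are handed three oracle functions; the names inconsistent, entails and symexec are made up for this sketch and are not the interface of SMALLFOOT or any other tool.

def check_straightline(H, cmds, H_post, inconsistent, entails, symexec):
    # Decide {H} c1; ...; cn {H_post} by symbolic execution.
    #   inconsistent(H): is the symbolic heap H unsatisfiable?
    #   entails(H1, H2): does H1 |- H2 hold?
    #   symexec(H, c):   'fault', or a list of output symbolic heaps
    #                    (a list, because rearrangement is nondeterministic)
    if inconsistent(H):
        return True                    # inconsistent precondition: spec is vacuously true
    if not cmds:
        return entails(H, H_post)      # no commands left: ask the prover H |- H'
    outcomes = symexec(H, cmds[0])
    if outcomes == 'fault':
        return False                   # symbolic execution faulted: spec is false
    # the disjunction rule: every resulting heap must establish the postcondition
    return all(check_straightline(Hi, cmds[1:], H_post, inconsistent, entails, symexec)
               for Hi in outcomes)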
This verification strategy relies on having a theorem prover to answer entailment
questions H ` H 0 . A straightforward embedding of separation logic into a classical
logic, where one writes the semantics in the target logic (e.g., ‘∃σ1 σ2 .σ = σ1 • σ2 ...’),
has not yet yielded an effective prover, because it introduces existential quantifiers to give
the semantics of ∗. Therefore, proof tools for separation logic have used dedicated proof
procedures, built from the proof rules of the logic. (Work is underway on more nuanced
interpretations into existing provers that do more than a direct semantic embedding.)

4.4. Proof Procedures for Entailment

An approach to proving symbolic heaps was pioneered by Josh Berdine and Cristiano
Calcagno [6]. Their approach revolves around proof rules for abstraction and subtraction.
A sample abstraction rule is

ls(x, t) ∗ list(t) ` list(x)

while the subtraction rule is

Q1 ` Q2
Q1 ∗ S ` Q2 ∗ S
Their basic idea is to try to reduce an entailment to an axiom B ∧ emp ` true ∧ emp by
successively applying abstraction rules, and Subtracting when possible. The basic idea
can be appreciated by considering two examples.
First, a successful example:


emp ` emp Axiom!
list(x) ` list(x) Subtract
ls(x, t) ∗ list(t) ` list(x) Abstract (Inductive)
ls(x, t) ∗ t7→y ∗ list(y) ` list(x) Abstract (Roll)

The entailment on the bottom is the one we needed to prove at the end of the
list_append procedure from Section 3.2. The first step, going upward, is a simple
rolling up of an inductive definition. The second step is more serious: it is one that we
would use induction in the metalanguage to justify. We then get to a position where we
can apply the subtraction rule, and this gets us back to a basic axiom.
For an unsuccessful example


list(y) ` emp Junk: Not Axiom!
list(x) ∗ list(y) ` list(x) Subtract
ls(x, t) ∗ t7→nil ∗ list(y) ` list(x) Abstract (Inductive)

The last line is an entailment that S MALLFOOT would attempt to prove if the statement
t->tl = y at the end of the list_append program were replaced by t->tl =
nil. There we do an abstraction followed by a subtraction and we get to a position
where we cannot reduce further. Rightly, we cannot prove this entailment.
The detailed design and theoretical analysis of a proof theory based on these ideas
is nontrivial. For the specific case of singly-linked list segments, Berdine and Calcagno
were able to formulate a complete and terminating proof theory. There is no space to go
into all the details of their theory, but it is worth listing their abstraction rules, presented
here as entailments.

Rolling
emp ` ls(E, E)
E1 ≠ E3 ∧ E1 7→E2 ∗ ls(E2 , E3 ) ` ls(E1 , E3 )

Induction Avoidance
ls(E1 , E2 ) ∗ ls(E2 , nil ) ` ls(E1 , nil )
ls(E1 , E2 ) ∗ E2 7→nil ` ls(E1 , nil )
ls(E1 , E2 ) ∗ ls(E2 , E3 ) ∗ E3 7→E4 ` ls(E1 , E3 ) ∗ E3 7→E4
E3 ≠ E4 ∧ ls(E1 , E2 ) ∗ ls(E2 , E3 ) ∗ ls(E3 , E4 )
` ls(E1 , E3 ) ∗ ls(E3 , E4 )
The remarkable thing about these abstraction rules is not that they are sound, but that they are in a
sense complete: any true fact about list segments and points-to facts that can be expressed
in symbolic heap form can be proven using these axioms, without appealing to an explicit
induction axiom or rule. The Berdine/Calcagno proof theory works by using these rules
on the left (in effect employing a special case of the Cut rule of sequent calculus). It has
other rules as well, such as for inferring x ≠ nil from x7→−: at every stage, their decision
procedure records as many pure disequalities as possible on the left, and it substitutes
out all equalities, getting to a kind of normal form. It is this normal form that makes the
subtraction rule complete (a two-way inference rule).
Note: in this subsection we have gone back to the ls rather than lsne predicate, as
Berdine and Calcagno formulated their rules for ls. In fact, it is easier to design a com-
plete proof theory for lsne rather than ls. It is also relatively easy (and was folklore
knowledge) to see that entailment for the lsne symbolic heaps can be decided in poly-
time, but the question for ls remained open until a recent paper which showed that
entailment for ls is indeed in polytime as well [23]. (The reason for subtlety in
this question is related to question 3 of Exercise 10; you might go back there and wonder
about it.)
I like to call this approach of combining abstraction and subtraction rules the
‘crunch, crunch’ method. It works by taking a sequent H ` H 0 and applying abstraction
and subtraction rules to crunch it down to a smaller size by removing ∗-conjuncts, until
you get emp as the spatial part on one side or the other of `. If you have emp on only one
side, you have a failed proof. If you have other pure facts, of the form Π∧emp ` Π0 ∧emp
you can then ask a straight classical logic question Π ` Π0 . The final check, Π ` Π0 ,
is a place where one could call an external theorem prover, say for a decidable theory
such as linear arithmetic, and that is all the more useful when the pure part can contain a
richer variety of assertions than in the simple fragment considered in this section. Indeed,
there have been a number of provers for separation logic developed that use variations on
this ‘crunch, crunch’ approach together with an external classical logic solver, including
[13,68] and the provers inside V ERIFAST [48], J S TAR [31], HIP [60] and SLAYER [9].

4.5. Frame Inference

Entailment is a standard problem for verifiers to face. In work applying separation logic,
a pivotal development has been identification of the notion of frame inference, which is
an extension of the entailment question:
In a frame inference question of the form

A ` B ∗ ?frame

the task is, given A and B, to find a formula ?frame which makes the entailment
valid.
Frame inference gives a way to find the ‘leftover’ portions of heap needed to automati-
cally apply the frame rule in program proofs. This extended entailment capability is used
at procedure call sites, where A is an assertion at the call site and B a precondition from
a procedure’s specification.
A first solution to frame inference was sketched in [6] and implemented in the
S MALLFOOT tool. The S MALLFOOT approach works by using information from failed
proofs of the standard entailment question A ` B. Essentially, a failed proof of the form

F ` emp
  ...
A ` B

tells us that F is a frame. For, from such a failed proof we can form a proof

F ` F
F ` emp ∗ F
  ...
A ` B ∗ F

by tacking ∗F on the right everywhere in the failed proof. So, the frame inferring pro-
cedure is to go upwards using the ‘crunch, crunch’ proof search method until you can
go no further: if your attempted proof is of the form indicated above, it can tell you a
frame. (Dealing with multiple branches in proofs requires some more subtlety than this
description indicates.)
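For instance, the unsuccessful entailment attempt in Section 4.4 got stuck at the leaf list(y) ` emp, which is exactly of the form F ` emp with F = list(y); tacking ∗ list(y) on the right of every line of that attempt yields a proof of

ls(x, t) ∗ t7→nil ∗ list(y) ` list(x) ∗ list(y)

so list(y) is an answer to the corresponding frame inference question.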
Frame inference is a workhorse of separation logic verification tools. As you can
imagine from the discussion surrounding Disptree in Section 2, it is used at procedure
call sites to identify the part of a symbolic heap that is not touched by a procedure.
Interprocedural program analysis tools typically use (often incomplete) implementations
of frame inference for reasoning with ‘procedure summaries’ [39,59]. In S MALLFOOT,
proof rules for critical regions in concurrent programs are verified using little phantom
procedures (called ‘specification statements’) with specs of the form {emp} − {R} and
{R} − {emp} for materializing and annihilating portions of storage protected by a lock.
Indeed, if one has a good enough frame inference capacity, then symbolic execution can
be seen to be a special case of a more general scheme, where basic commands are treated
as specification statements (the small axioms), and frame inference is used in place of the
special concept of rearrangement. S MALLFOOT and S PACE I NVADER did not follow this
idealistic approach, preferring to optimize for the common case of the basic statements,
but the more recent J S TAR and SLAYER are examples of tools that call a frame inferring
theorem prover at every step of symbolic execution [31,9].

4.6. A Taste of Abstract Interpretation

Beginning in 2006 [30,52], a significant amount of work has been done on the use of
separation logic in automatic program analysis. There are a number of academic tools,
including S PACE I NVADER [5,85], T HOR [53], X ISA [20], F ORRESTER [41], P REDATOR
[33], S MALLFOOT RG [19], H EAP -H OP [83] and J S TAR [31], and the industrial tools
I NFER from Monoidics [16] and SLAYER from Microsoft [9]. The tools in this area are
a new breed of shape analysis, which attempt to discover the shapes of data structures
(e.g., whether a list is cyclic or acyclic) in a program [75]. These tools cannot prove
functional correctness, but can be applied to code in the thousands or even millions of
LOC [85,17].
The general context for this use of separation logic concerns the relation between
program logic and program analysis. It has been known since the work of the Cousots
in the 1970s [24,25] that concepts from Hoare logic and static program analysis are
related. In principle, static analysis can be used to calculate loop invariants and procedure
specifications via fixed-point computations, thereby lessening annotation burden. There
is a price to pay in that trying to be completely automatic in this way almost forces one
to step away from the ideal of proving full functional correctness.
While the relation between analysis and verification has been long known in prin-
ciple, the last decade has seen a surge of interest in verification-by-static-analysis, with
practical demonstrations of its potential such as in SLAM’s application of proof tech-
nology to Microsoft device drivers [2] and ASTRÉE’s proof of the absence of run-time
errors in Airbus code [26]. Separation logic enters the picture because these practical
tools for verification-oriented static analysis ignore pointer-based data structures, or use
coarse models that are insufficient to prove basic properties of them; e.g., SLAM assumes
memory safety, and ASTRÉE works only on input programs that do not use dynamic
allocation. Similar remarks apply to other tools such as BLAST, Magic and others. Data
structures present a significant problem in verification-oriented program analysis, and
that is the point that the separation logic program analyses are trying to address.
This section illustrates the ideas in the abstractions used in separation logic program
analyzers. To begin, suppose you were to continually symbolically execute a program
with a while loop. You collect sets of formulae (abstract states) at program points, and
generate new ones by symbolically executing program statements. The immediate prob-
lem is that you would go on generating symbolic heaps on loop iterations, and the pro-
cess could diverge: you would never stop generating new symbolic heaps. The most basic
idea of program analysis is to use abstraction, the losing of information, to ensure that
such a process terminates.
Consider the following program that creates a linked list of indeterminate length.
{Pre: emp}
x := nil;
while (nondet()) {
    y := cons(-);
    y→tl := x;
    x := y;
}
Suppose we start symbolically executing the program from pre-state emp. On the first
iteration, at the program point immediately inside the loop, we will certainly have that
x = nil ∧ emp is true, so let us record this in
Loop Invariant so far (1st iteration)
x = nil ∧ emp.
Now, if we go around the loop once more, then it is clear that x 7→ nil will be true at the
same program point, so let us add that to our calculation.
Loop Invariant so far (2nd iteration)
(x = nil ∧ emp) ∨ (x 7→ nil ).
At the next step we get
Loop Invariant so far (3rd iteration)
(x = nil ∧ emp) ∨ (x 7→ nil ) ∨ (x7→X ∗ X7→nil )
because we put another element on the front of the list. If we keep going this way, we
will get lists of length 3, 4 and so on: infinite regress. However, before we go around the
loop again, we might employ abstraction, to conclude that we have a list segment. That
is, we use the entailment

x7→X ∗ X7→nil ` lsne(x, nil )

on the third disjunct, giving us


Loop Invariant so far (3rd iteration after abstraction)
(x = nil ∧ emp) ∨ (x 7→ nil ) ∨ lsne(x, nil )
Loss of information has occurred because in the third disjunct we have forgotten that
the list from x is of length precisely 2. And, by this step we have taken ourselves from
a situation where the assertion describes finitely many heaps (up to isomorphism), to
an assertion that describes infinitely many concrete heaps: combining abstraction with
symbolic execution allows us to cover a great many more heaps.
Now, if we go around the loop again, we obtain
Loop Invariant so far (4th iteration before abstraction)
(x = nil ∧ emp) ∨ (x 7→ nil ) ∨ (x7→X ∗ lsne(X, nil ))
and we can apply a Berdine/Calcagno abstraction rule

x7→X ∗ lsne(X, nil ) ` lsne(x, nil )

to obtain
Loop Invariant so far (4th iteration after abstraction)
(x = nil ∧ emp) ∨ (x 7→ nil ) ∨ lsne(x, nil )
Lo and behold, what we have obtained on the 4th iteration after abstraction is the same
as the 3rd. We might as well stop now, as further execution of this variety will not give
us anything new: we have reached a fixed-point. As it happens, this loop invariant is also
the postcondition of the procedure in this example.
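The shape of the computation just carried out by hand can be captured in a few lines. Here is a minimal sketch in Python of the fixed-point loop, where body_exec and abstract are assumed oracles (the names are mine, not the interface of any of the tools discussed below), and symbolic heaps are treated as opaque hashable values.

def loop_invariant(heaps_at_loop_head, body_exec, abstract):
    # heaps_at_loop_head: symbolic heaps holding on first arrival at the loop head
    # body_exec(H): symbolic heaps obtained by symbolically executing the loop body from H
    # abstract(H):  the abstracted version of H (e.g. forgetting the lengths of segments)
    invariant = {abstract(H) for H in heaps_at_loop_head}
    while True:
        new = {abstract(H2) for H in invariant for H2 in body_exec(H)}
        if new <= invariant:      # nothing new after abstraction: a fixed point is reached
            return invariant      # the loop invariant, as a set of disjuncts
        invariant |= new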
In this narrative the human (me) was the abstract interpreter, choosing when and
how to do abstraction. To implement a tool we need to make it systematic. In the
S PACE I NVADER breed of tools, this is done using rewrite rules that correspond to
Berdine/Calcagno abstraction rules for entailment described in Section 4.4. The abstract
interpreter is sound automatically because applying those rules to simplify formulae is
just using the Hoare rule of consequence on the right. The art is in not applying the
rules too often, which would make one lose too much information, sometimes resulting
in fault coming out of your abstract interpreter for perfectly safe programs (a ‘false
alarm’).
The way you set up a proof-theoretic abstract interpreter is as follows. In addition to
symbolic execution rules, there are abstraction rules which you apply periodically (say,
when going through loop iterations); this allows the execution process to saturate (find a
fixed-point). Here are some of the rules used in (baby) S PACE I NVADER [30].
(ρ, ρ′ range over lsne, 7→)

∃X~ .H ∗ ρ(x, Y ) ∗ ρ′ (Y, Z) −→ ∃X~ .H ∗ lsne(x, Z)    where Y not free in H
∃X~ .H ∗ ρ(Y, Z) −→ ∃X~ .H ∗ true                       where Y not provably reachable
                                                         from program vars

The first rule says to forget about the length of uninterrupted list segments, where there
are no outside pointers (from H) into the internal point. The abstraction ‘gobbles up’
logical variables appearing in internal points of lists, by swallowing them into list seg-
ments, as long as these internal points are unshared. This is true of either free or bound
logical variables. The requirement that they not be shared is an accuracy rather than
soundness consideration; we stop the rules from firing too often, so as not to lose too
much information.
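As small made-up instances of the two rules: the first rewrites x7→Y ∗ Y 7→nil to lsne(x, nil ), swallowing the unshared internal point Y (this is just the entailment used at the 3rd iteration above, read as a rewrite); the second rewrites x7→nil ∗ Y 7→Z, where the cell at Y is not reachable from any program variable, to x7→nil ∗ true, abstracting a leaked cell by true.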
[A remark on terminology: What I am simply calling the ‘abstraction’ step is a
special case of what is called Widening in the abstract interpretation literature [24], and
a direct analogue of the ‘canonical abstraction’ used in 3-valued shape analysis [76].]

Exercise 11 Define a program that never faults, but for which the abstract semantics just
sketched returns fault.
Although I have not given a fully formal specification of the abstract interpreter
above, thinking about the nature of the list segment predicates and the restricted syntax
of symbolic heaps is one way to find such a program (e.g., if the program needs a loop
invariant that is not expressible with finitely many symbolic heaps).

This exercise actually concerns a general point in program analysis. If you have a ter-
minating analysis that is trying to solve an undecidable problem, it must necessarily be
possible to trick the analysis. Since most interesting questions about programs are un-
decidable, we must accept that any program analysis for these questions will have an
heuristic aspect in its design.

4.7. Contextual Remarks

This specific abstraction idea in the illustration in this section, to forget about the length
of uninterrupted list segments, is sometimes called ‘the Distefano abstraction’: it was
defined by Distefano in his PhD thesis [29]. The idea does not depend on separation
logic, and similar ideas have been used in other abstract domains, such as based on 3-
valued logic [54] or on graphs [55]. Once S MALLFOOT’s symbolic execution appeared,
it simply was relatively easy to port Distefano’s abstraction to separation logic, when
defining BABY S PACE I NVADER [30]. Around the same time, very similar ideas were
independently discovered by Magill et al. [52].
These first abstract interpreters did not achieve a lot practically, but opened up
the possibility of exploring the use of separation logic in program analysis. A growing
amount of work has been going forward in a number of directions, an incomplete list of
which includes the following, which are good places to start for further reading.
1. The use of the frame rule and frame inference A ` B ∗ ?frame in interproce-
dural analysis [39];
2. The use of abductive inference A ∗ ?antiframe ` B to approximate foot-
prints, leading to a compositional analysis, and a boost to the level of automation
and scalability [17];
3. The use of a higher-order list segment notion to attack complicated data structures
in device drivers [5,85];
4. Analyses for concurrent programs [40];
5. Automatic parallelization [71,46];
6. Program-termination proving [8,51,14];
7. Analysis of data structures with sharing [21,50].
This last section has been but a sketch, and I have left out a lot of details. I will
possibly extend these notes in future to put more formalities into this last section. For
now I just point you to [30] for a mathematically thorough description of one abstract
interpreter based on separation logic.
The leading edge, as of 2011, of what can be achieved practically on real-world
code by these tools is probably represented by SLAYER [9] and I NFER [16], and can
be glimpsed by academic papers that fed into them [85,17]. However, there are some
areas (sharing, trees) where academic prototypes outperform them precision-wise, and
the leading edge in any case is moving quickly at this moment.

I have talked about automatic verification and analysis in these notes, but many of the
ideas – such as frame inference, symbolic execution, abstraction/subtraction-based proof
theory – are relevant as well in interactive proving. There have been several embeddings
of separation logic in higher-order logics used in interactive proof assistants (e.g., [56,
35,82,79,58,1]), where the proof theoretic or symbolic execution rules are derived as
lemmas. A recent paper [22] gives a good account of the state of the art and references
to the literature, as well as an explanation of expressivity limitations of approaches to
program verification based on automatic theorem proving for first-order logic.
It should be mentioned that there is no conflict here of having several logics (sepa-
ration, first-order, higher-order, etc.): there is no need to search for ‘the one true logic’.
In particular, even though they can be embedded in foundational higher-order logics,
special-purpose formalisms like separation and modal and temporal logics are useful for
identifying specification and reasoning idioms that make specifications and proofs easier
to find, for either the human or the machine.

References

[1] A.W. Appel. VeriSmall: Verified Smallfoot shape analysis. CPP 2011: First International Conference
on Certified Programs and Proofs, 2011.
[2] T. Ball, E. Bounimova, B. Cook, V. Levin, J. Lichtenberg, C. McGarvey, B. Ondrusek, S.K. Rajamani,
and A. Ustuner. Thorough static analysis of device drivers. In Proceedings of the 2006 EuroSys Confer-
ence, pages 73–85, 2006.
[3] A. Banerjee, D.A. Naumann, and S. Rosenberg. Regional logic for local reasoning about global invari-
ants. In 22nd ECOOP, Springer LNCS 5142, pages 387–411, 2008.
[4] M. Barnett, R. DeLine, M. Fahndrich, K.R.M. Leino, and W. Schulte. Verification of object-oriented
programs with invariants. Journal of Object Technology, 3(6):27–56, 2004.
[5] J. Berdine, C. Calcagno, B. Cook, D. Distefano, P. O’Hearn, T. Wies, and H. Yang. Shape analysis of
composite data structures. 19th CAV, 2007.
[6] J. Berdine, C. Calcagno, and P.W. O’Hearn. Symbolic execution with separation logic. In K. Yi, editor,
APLAS 2005, volume 3780 of LNCS, 2005.
[7] J. Berdine, C. Calcagno, and P.W. O’Hearn. Smallfoot: Automatic modular assertion checking with
separation logic. In 4th FMCO, pp115-137, 2006.
[8] J. Berdine, B. Cook, D. Distefano, and P. O’Hearn. Automatic termination proofs for programs with
shape-shifting heaps. In 18th CAV, Springer LNCS 4144, pages pp386–400, 2006.
[9] J. Berdine, B. Cook, and S. Ishtiaq. Slayer: Memory safety for systems-level code. In 23rd CAV, Springer
LNCS 6806, pages 178–183, 2011.
[10] B. Biering, L. Birkedal, and N. Torp-Smith. BI-hyperdoctrines, higher-order separation logic, and ab-
straction. ACM TOPLAS, 5(29), 2007.
[11] A. Borgida, J. Mylopoulos, and R. Reiter. On the frame problem in procedure specifications. IEEE
Transactions of Software Engineering, 21:809–838, 1995.
[12] R. Bornat. Proving pointer programs in Hoare logic. Mathematics of Program Construction, 2000.
[13] M. Botincan, M.J. Parkinson, and W. Schulte. Separation logic verification of C programs with an SMT
solver. Electr. Notes Theor. Comput. Sci., 254:5–23, 2009.
[14] J. Brotherston, R. Bornat, and C. Calcagno. Cyclic proofs of program termination in separation logic.
In 35th POPL, pages 101–112, 2008.
[15] R.M. Burstall. Some techniques for proving correctness of programs which alter data structures. Ma-
chine Intelligence, 7:23–50, 1972.
[16] C. Calcagno and D. Distefano. Infer: An automatic program verifier for memory safety of C programs.
In NASA Formal Methods Symposium, Springer LNCS 6617, pages 459–465, 2011.
[17] C. Calcagno, D. Distefano, P.W. O’Hearn, and H. Yang. Compositional shape analysis by means of
bi-abduction. Journal of the ACM 58(6). (Preliminary version appeared in POPL’09.), 2011.
[18] C. Calcagno, P. O’Hearn, and H. Yang. Local action and abstract separation logic. In 22nd LICS,
pp366-378, 2007.
[19] C. Calcagno, M.J. Parkinson, and V. Vafeiadis. Modular safety checking for fine-grained concurrency.
In 14th SAS, Springer LNCS 4634, pages 233–248, 2007.
[20] B. Chang and X. Rival. Relational inductive shape analysis. In 36th POPL, pages 247–260. ACM, 2008.
[21] R. Cherini, L. Rearte, and J.O. Blanco. A shape analysis for non-linear data structures. In 17th SAS,
Springer LNCS 6337, pages 201–217, 2010.
[22] A. Chlipala. Mostly-automated verification of low-level programs in computational separation logic. In
32nd PLDI, pages 234–245, 2011.
[23] B. Cook, C. Haase, J. Ouaknine, M.J. Parkinson, and J. Worrell. Tractable reasoning in a fragment of
separation logic. In 22nd CONCUR, Springer LNCS 6901, pages 235–249, 2011.
[24] P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs
by construction or approximation of fixpoints. In 4th POPL, pp238-252, 1977.
[25] P. Cousot and R. Cousot. Systematic design of program analysis frameworks. 6th POPL, pp269-282,
1979.
[26] P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D. Monniaux, and X. Rival. The ASTRÉE
analyzer. 14th ESOP, pp21-30, 2005.
[27] E.W. Dijkstra. A Discipline of Programming. Prentice Hall, 1976.
[28] T. Dinsdale-Young, P. Gardner, and M.J. Wheelhouse. Abstraction and refinement for local reasoning.
In 3rd VSTTE, Springer LNCS 6217, pages 199–215, 2010.
[29] D. Distefano. On model checking the dynamics of object-based software: a foundational approach. PhD
thesis, University of Twente, 2003.
[30] D. Distefano, P. O’Hearn, and H. Yang. A local shape analysis based on separation logic. In 12th
TACAS, 2006. pp287-302.
[31] D. Distefano and M. Parkinson. jStar: Towards Practical Verification for Java. In 23rd OOPSLA, pages
213–226, 2008.
[32] Mike Dodds, Suresh Jagannathan, and Matthew J. Parkinson. Modular reasoning for deterministic par-
allelism. In 38th POPL, pages 259–270, 2011.
[33] K. Dudka, P. Peringer, and T. Vojnar. Predator: A practical tool for checking manipulation of dynamic
data structures using separation logic. In 23rd CAV, Springer LNCS 6806, pages 372–378, 2011.
[34] X. Feng, R. Ferreira, and Z. Shao. On the relationship between concurrent separation logic and assume-
guarantee reasoning. In 16th ESOP, Springer LNCS 4421, 2007.
[35] X. Feng, Z. Shao, Y. Guo, and Y. Dong. Combining domain-specific and foundational logics to verify
complete software systems. In 2nd VSTTE, Springer LNCS 5295, pages 54–69, 2008.
[36] R.W. Floyd. Assigning meaning to programs. Proceedings of Symposium on Applied Mathematics, Vol.
19, J.T. Schwartz (Ed.), A.M.S., pp. 19–32, 1967.
[37] M. Foley and C.A.R. Hoare. Proof of a recursive program: Quicksort. Computer Journal, 14:391–395,
1971.
[38] P. Gardner, S. Maffeis, and G. Smith. Towards a program logic for Javascript. In 40th POPL. ACM,
2012.
[39] A. Gotsman, J. Berdine, and B. Cook. Interprocedural shape analysis with separated heap abstractions.
In 13th SAS,Springer LNCS 4134, pages 240–260, 2006.
[40] A. Gotsman, J. Berdine, B. Cook, and M. Sagiv. Thread-modular shape analysis. In 28th PLDI, pages
266–277, 2007.
[41] P. Habermehl, L. Holík, A. Rogalewicz, J. Simácek, and T. Vojnar. Forest automata for verification of
heap manipulation. In 23rd CAV, Springer LNCS 6806, 2011.
[42] C.A.R. Hoare. An axiomatic basis for computer programming. Comm. ACM, 12(10):576–580 and 583,
1969.
[43] C.A.R. Hoare. Procedures and parameters: An axiomatic approach. In E. Engler, editor, Symposium on
the Semantics of Algebraic Languages, pages 102–116. Springer, 1971. Lecture Notes in Math. 188.
[44] C.A.R. Hoare. Proof of a Program: FIND. Comm. ACM, 14(1):39–45, 1971.
[45] C.A.R. Hoare and N. Wirth. An axiomatic definition of the programming language Pascal. Acta Infor-
matica, 2:335–355, 1973.
[46] C. Hurlin. Automatic parallelization and optimization of programs by proof rewriting. In 16th SAS,
Springer LNCS 5673, pages 52–68, 2009.
[47] S. Isthiaq and P. W. O’Hearn. BI as an assertion language for mutable data structures. In 28th POPL,
pages 36–49, 2001.
[48] B. Jacobs, J. Smans, P. Philippaerts, F. Vogels, W. Penninckx, and F. Piessens. Verifast: A powerful,
sound, predictable, fast verifier for C and Java. In NASA Formal Methods Symposium, Springer LNCS
6617, pages 41–55, 2011.
[49] I.T. Kassios. The dynamic frames theory. Formal Asp. Comput., 23(3):267–288, 2011.
[50] O. Lee, H.Yang, and R. Petersen. Program analysis for overlaid data structures. In 23rd CAV, Springer
LNCS 6808, pages 592–608, 2011.
[51] S. Magill, J. Berdine, E.M. Clarke, and B. Cook. Arithmetic strengthening for shape analysis. In 14th
SAS, Springer LNCS 4634, pages 419–436, 2007.
[52] S. Magill, A. Nanevski, E. Clarke, and P. Lee. Inferring invariants in Separation Logic for imperative
list-processing programs. 3rd SPACE Workshop, 2006.
[53] S. Magill, M.-S. Tsai, P. Lee, and Y.-K. Tsay. THOR: A tool for reasoning about shape and arithmetic.
20th CAV, Springer LNCS 5123. pp 428-432, 2008.
[54] R. Manevich, E. Yahav, G. Ramalingam, and M. Sagiv. Predicate abstraction and canonical abstraction
for singly-linked lists. In 6th VMCAI, pages pp181–198, 2005.
[55] M. Marron, M.V. Hermenegildo, D. Kapur, and D. Stefanovic. Efficient context-sensitive shape analysis
with graph based heap models. In 17th CC, Springer LNCS 4959, pages 245–259, 2008.
[56] N. Marti and R. Affeldt. A certified verifier for a fragment of separation logic. Computer Software,
25(3):135-147, 2008.
[57] J. McCarthy and P. Hayes. Some philosophical problems from the standpoint of artificial intelligence.
Machine Intelligence, 4:463–502, 1969.
[58] A. Nanevski, V. Vafeiadis, and J. Berdine. Structuring the verification of heap-manipulating programs.
In 37th POPL, pages 261–274, 2010.
[59] H.H. Nguyen and W.-N. Chin. Enhancing program verification with lemmas. 20th CAV, Springer LNCS
5123. pp 355-369, 2008.
[60] H.H. Nguyen, C. David, S. Qin, and W.-Ngan Chin. Automated verification of shape and size properties
via separation logic. In 8th VMCAI, Springer LNCS 4349, pages 251–266, 2007.
[61] P. O’Hearn, J. Reynolds, and H. Yang. Local reasoning about programs that alter data structures. In
15th CSL, pp1-19, 2001.
[62] P. W. O’Hearn. Resources, concurrency and local reasoning. Theoretical Computer Science, 375(1-
3):271–307, 2007. (Preliminary version appeared in CONCUR’04, LNCS 3170, pp49-67).
[63] P. W. O’Hearn and D. J. Pym. The logic of bunched implications. Bulletin of Symbolic Logic, 5(2):215–
244, June 99.
[64] P.W. O’Hearn, H. Yang, and J.C. Reynolds. Separation and information hiding. ACM TOPLAS, 31(3),
2009.
[65] M. Parkinson, R. Bornat, and C. Calcagno. Variables as resource in Hoare logics. In 21st LICS, 2006.
[66] M. J. Parkinson. Local Reasoning for Java. Ph.D. thesis, University of Cambridge, 2005.
[67] M.J. Parkinson and A.J. Summers. The relationship between separation logic and implicit dynamic
frames. In 20th ESOP, Springer LNCS 6602, pages 439–458, 2011.
[68] J. A. Navarro Pérez and A. Rybalchenko. Separation logic + superposition calculus = heap theorem
prover. In 32nd PLDI, pages 556–566, 2011.
[69] D. Pym, P. O’Hearn, and H. Yang. Possible worlds and resources: the semantics of BI. Theoretical
Computer Science, 315(1):257–305, 2004.
[70] D.J. Pym. The Semantics and Proof Theory of the Logic of Bunched Implications. Applied Logic Series.
Kluwer Academic Publishers, 2002.
[71] M. Raza, C. Calcagno, and P. Gardner. Automatic parallelization with separation logic. In 18th ESOP,
Springer LNCS 5502, pages 348–362, 2009.
[72] M. Raza and P. Gardner. Footprints in local reasoning. Logical Methods in Computer Science, 5(2),
2009.
[73] J. C. Reynolds. Intuitionistic reasoning about shared mutable data structure. In Jim Davies, Bill Roscoe,
and Jim Woodcock, editors, Millennial Perspectives in Computer Science, pages 303–321, Houndsmill,
Hampshire, 2000. Palgrave.
[74] J. C. Reynolds. Separation logic: A logic for shared mutable data structures. In 17th LICS, pp55-74,
2002.
[75] M. Sagiv, T. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive
updating. ACM TOPLAS, 20(1):1–50, 1998.
[76] M. Sagiv, T. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. ACM TOPLAS,
24(3):217–298, 2002.
[77] J. Smans, B. Jacobs, and F. Piessens. Implicit dynamic frames: Combining dynamic frames and separa-
tion logic. In 23rd ECOOP, LNCS 5653, pages 148–172, 2009.
[78] C. Strachey. Towards a formal semantics. In T. B. Steel, Jr., editor, Formal Language Description
Languages for Computer Programming, Proceedings of the IFIP Working Conference, pages 198–220,
Baden bei Wien, Austria, September 1964. North-Holland, Amsterdam, 1966.
[79] T. Tuerk. A formalisation of Smallfoot in HOL. In TPHOLs, 22nd International Conference, LNCS
5674, pages 469–484, 2009.
[80] V. Vafeiadis. Shape-value abstraction for verifying linearizability. In 10th VMCAI, LNCS 5403, pages
335–348, 2009.
[81] V. Vafeiadis and M.J. Parkinson. A marriage of rely/guarantee and separation logic. In 18th CONCUR,
Springer LNCS 4703, pages 256–271, 2007.
[82] C. Varming and L. Birkedal. Higher-order separation logic in Isabelle/HOLCF. 24th MFPS, 2008.
[83] J. Villard, É. Lozes, and C. Calcagno. Tracking heaps that hop with Heap-Hop. In 16th TACAS, Springer
LNCS 6015, pages 275–279, 2010.
[84] H. Yang. Local Reasoning for Stateful Programs. Ph.D. thesis, University of Illinois, Urbana-
Champaign, 2001.
[85] H. Yang, O. Lee, J. Berdine, C. Calcagno, B. Cook, D. Distefano, and P. O’Hearn. Scalable shape
analysis for systems code. 20th CAV, Springer LNCS 5123. pp 385-398, 2008.
[86] H. Yang and P. O’Hearn. A semantic basis for local reasoning. In 5th FOSSACS, 2002. Springer LNCS
2303 , pp402-416.