Aldrich Prog Analysis
S statements
a arithmetic expressions (AExp)
x, y program variables (Vars)
n number literals
P boolean predicates (BExp)
The syntax of While is summarized below. A statement S can be an assignment x := a; a skip statement, which does nothing; a sequence of statements; or an if or while statement, with a Boolean predicate P as its condition. Arithmetic expressions a include variables x, number literals n, and applications of binary arithmetic operators opa. Predicates are represented by Boolean expressions, which include true, false, the negation of another Boolean expression, Boolean operators opb applied to other Boolean expressions, and relational operators opr applied to arithmetic expressions.

S ::= x := a | skip | S1; S2
    | if P then S1 else S2
    | while P do S

a ::= x | n | a1 opa a2

P ::= true | false | not P
    | P1 opb P2
    | a1 opr a2
As a starting point, we will eliminate recursive arithmetic and boolean expressions and replace
them with simple atomic statement forms, which are called instructions, after the assembly language
instructions that they resemble. For example, an assignment statement of the form w := x ∗ y + z will be rewritten as a multiply instruction followed by an add instruction. The multiply assigns to a temporary variable t1, which is then used in the subsequent add:

t1 := x ∗ y
w := t1 + z
As the translation from expressions to instructions suggests, program analysis is typically studied using a representation of programs that is not only simpler, but also lower-level, than the source language (While, in this instance). Many Java analyses, for example, are actually conducted on bytecode. High-level languages typically come with features that are numerous and complex, but they can be reduced to a smaller set of simpler primitives. Working at the lower level of abstraction thus also supports simplicity in the compiler.
Control flow constructs such as if and while are similarly translated into simpler jump and
conditional branch constructs that jump to a particular (numbered) instruction. For example, a
statement of the form if P then S1 else S2 would be translated into:
1: if P then goto 4
2: S2
3: goto 5
4: S1
Exercise 1. How would you translate a While statement of the form while P do S?
This form of code is often called 3-address code, because every instruction has at most two
source operands and one result operand. We now define the syntax for 3-address code produced
from the While language, which we will call While3Addr. This language consists of a set of
simple instructions that load a constant into a variable, copy from one variable to another, compute
the value of a variable from two others, or jump (possibly conditionally) to a new address n. A
program P is just a map from addresses to instructions:2
I ::= x := n
    | x := y
    | x := y op z
    | goto n
    | if x opr 0 goto n

op  ::= + | − | ∗ | /
opr ::= < | =

P ∈ N → I
Formally defining a translation from a source language such as While to a lower-level in-
termediate language such as While3Addr is possible, but more appropriate for the scope of a
compilers course. For our purposes, the above should suffice as intuition. We will formally define
the semantics of While3Addr in subsequent lectures.
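As a concrete illustration of the "program as a map" view, here is one way the translated if example could be written down in Python. The tuple encoding of instructions is an invention of these notes' examples, not part of the While3Addr definition; later sketches reuse it.

# A hypothetical encoding of a While3Addr program P : N -> I as a Python dict.
# Shown: a translation of `if x = 0 then y := 1 else y := 2`.
program = {
    1: ("if-goto", "x", "=", 0, 4),   # 1: if x = 0 goto 4
    2: ("const", "y", 2),             # 2: y := 2
    3: ("goto", 5),                   # 3: goto 5
    4: ("const", "y", 1),             # 4: y := 1
}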
2 The idea of the mapping between numbers and instructions corresponds conceptually to the Nielsons' use of labels in the While language specification in the textbook. This concept is akin to mapping line numbers to code.
3 Extensions
The languages described above are sufficient to introduce the fundamental concepts of program
analysis in this course. However, we will eventually examine various extensions to While and
While3Addr, so that we can understand how more complicated constructs in real languages can
be analyzed. Some of these extensions to While3Addr will include:
I ::= ...
| x := f (y) function call
| return x return
| x := y.m(z) method call
| x := &p address-of operator
| x := ∗p pointer dereference
| ∗p := x pointer assignment
| x := y.f field read
| x.f := y field assignment
We will not give semantics to these extensions now, but it is useful to be aware of them as you
will see intermediate code like this in practical analysis frameworks.
Lecture Notes: Program Semantics
1 Operational Semantics
To reason about analysis correctness, we need a clear definition of what a program means. One way
to do this is using natural language (e.g., the Java Language Specification). However, although
natural language specifications are accessible, they are also often imprecise. This can lead to many
problems, including incorrect compiler implementations or program analyses.
A better alternative is a formal definition of program semantics. We begin with operational
semantics, which mimics, at a high level, the operation of a computer executing the program. Such
a semantics also reflects the way that techniques such as dataflow analysis or Hoare Logic reason
about the program, so it is convenient for our purposes.
There are two broad classes of operational semantics: big-step operational semantics, which spec-
ifies the entire operation of a given expression or statement; and small-step operational semantics,
which specifies the operation of the program one step at a time.
1.1 While: Big-step operational semantics

The meaning of a While expression depends on a program state, which we model as a mapping from variables to integer values:

E ∈ Var → Z

Here E denotes a particular program state. The meaning of an expression with a variable, like x + 5, involves “looking up” x's value in the associated E and substituting it in. Given a state, we can write a judgement as follows:

⟨e, E⟩ ⇓ n
This means that given program state E, the expression e evaluates to n. This formulation is called
big-step operational semantics; the ⇓ judgement relates an expression and its “meaning.”1 We then
build up the meaning of more complex expressions using rules of inference (also called derivation
or evaluation rules). An inference rule is made up of a set of judgments above the line, known as
premises, and a judgment below the line, known as the conclusion. The meaning of an inference
rule is that the conclusion holds if all of the premises hold:
1 Note that I have chosen ⇓ because it is a common notational convention; it's not otherwise special. This is true for many notational choices in formal specification.
    premise1    premise2    ...    premisen
    ----------------------------------------
    conclusion
An inference rule with no premises is an axiom, which is always true. For example, integers always
evaluate to themselves, and the meaning of a variable is its stored value in the state:
big-int:
    ⟨n, E⟩ ⇓ n

big-var:
    ⟨x, E⟩ ⇓ E(x)

big-add:
    ⟨e1, E⟩ ⇓ n1    ⟨e2, E⟩ ⇓ n2
    ----------------------------
    ⟨e1 + e2, E⟩ ⇓ n1 + n2
But how does the value of x come to be “stored” in E? For that, we must consider While statements. Unlike expressions, statements have no direct result. However, they can have side effects. That is to say: the “result” or meaning of a statement is a new state. The judgement ⇓ as applied to statements and states therefore looks like:

⟨S, E⟩ ⇓ E′
This allows us to write inference rules for statements, bearing in mind that their meaning is not
an integer, but a new state. The meaning of skip, for example, is an unchanged state:
big-skip:
    ⟨skip, E⟩ ⇓ E
Statement sequencing, on the other hand, does involve premises:
big-seq:
    ⟨s1, E⟩ ⇓ E′    ⟨s2, E′⟩ ⇓ E″
    -----------------------------
    ⟨s1; s2, E⟩ ⇓ E″
The if statement involves two rules, one for the case where the boolean predicate evaluates to true (rules for boolean expressions not shown), and one for the case where it evaluates to false. I'll show you just the first one for demonstration:

big-iftrue:
    ⟨P, E⟩ ⇓ true    ⟨S1, E⟩ ⇓ E′
    -----------------------------
    ⟨if P then S1 else S2, E⟩ ⇓ E′
This brings us to assignments, which produce a new state in which the variable being assigned to
is mapped to the value from the right-hand side. We write this with the notation E[x 7→ n], which
can be read “a new state that is the same as E except that x is mapped to n.”
big-assign:
    ⟨e, E⟩ ⇓ n
    ----------------------
    ⟨x := e, E⟩ ⇓ E[x ↦ n]

Note that the update to the state is modeled functionally; E still refers to the old state, while E[x ↦ n] is the new state, represented as a mathematical map.
Fully specifying the semantics of a language requires a judgement rule like this for every language construct. These notes only include a subset for While, for brevity.
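Because big-step rules directly suggest an interpreter, here is a minimal Python sketch of an evaluator that follows the rules above. The nested-tuple encoding of expressions and statements is our own choice for illustration, and only the constructs whose rules appear in these notes are handled.

# Expressions: ("int", n) | ("var", x) | ("add", e1, e2)
# Statements:  ("skip",) | ("seq", s1, s2) | ("assign", x, e)

def eval_expr(e, E):
    """Big-step evaluation of expressions: <e, E> evaluates to an integer n."""
    tag = e[0]
    if tag == "int":                          # big-int: literals evaluate to themselves
        return e[1]
    if tag == "var":                          # big-var: look the variable up in the state
        return E[e[1]]
    if tag == "add":                          # big-add: evaluate both operands, then add
        return eval_expr(e[1], E) + eval_expr(e[2], E)
    raise ValueError("expression form not handled in this sketch")

def eval_stmt(s, E):
    """Big-step evaluation of statements: <s, E> evaluates to a new state E'."""
    tag = s[0]
    if tag == "skip":                         # big-skip: the state is unchanged
        return E
    if tag == "seq":                          # big-seq: thread the state through s1 then s2
        return eval_stmt(s[2], eval_stmt(s[1], E))
    if tag == "assign":                       # big-assign: functional update E[x -> n]
        return {**E, s[1]: eval_expr(s[2], E)}
    raise ValueError("statement form not handled in this sketch")

For example, eval_stmt(("assign", "x", ("add", ("int", 4), ("var", "y"))), {"y": 1}) returns {"y": 1, "x": 5}, mirroring the derivation one would build with big-add and big-assign.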
1.2 While: Small-step operational semantics
Big-step operational semantics has its uses. Among other nice features, it directly suggests a sim-
ple interpreter implementation for a given language. However, it is difficult to talk about a state-
ment or program whose evaluation does not terminate. Nor does it give us any way to talk about
intermediate states (so modeling multiple threads of control is out).
Sometimes it is instead useful to define a small-step operational semantics, which specifies pro-
gram execution one step at a time. We refer to the pair of a statement and a state (⟨S, E⟩) as a configuration. Whereas big-step semantics specifies program meaning as a function between a configuration and a new state, small-step semantics models it as a step from one configuration to another.
You can think of small-step semantics as a set of rules that we repeatedly apply to configurations until we reach a final configuration for the language (⟨skip, E⟩, in this case), if ever.2 We write this new judgement using a slightly different arrow: →. ⟨S, E⟩ → ⟨S′, E′⟩ indicates one step of execution; ⟨S, E⟩ →∗ ⟨S′, E′⟩ indicates zero or more steps of execution. We formally define multiple
execution steps as follows:
multi-reflexive:
    ⟨S, E⟩ →∗ ⟨S, E⟩

multi-inductive:
    ⟨S, E⟩ → ⟨S′, E′⟩    ⟨S′, E′⟩ →∗ ⟨S″, E″⟩
    -----------------------------------------
    ⟨S, E⟩ →∗ ⟨S″, E″⟩
To be complete, we should also define auxiliary small-step operators →a and →b for arithmetic
and boolean expressions, respectively; only the operator for statements results in an updated state
(as in big step). The types of these judgements are thus:
→ : (Stmt × E) → (Stmt × E)
→a : (Aexp × E) → Aexp
→b : (Bexp × E) → Bexp
We can now again write the semantics of a While program as new rules of inference. Some rules
look very similar to the big-step rules, just with a different arrow. For example, consider variables:
small-var:
    ⟨x, E⟩ →a E(x)
Things get more interesting when we return to statements. Remember, small-step semantics ex-
press a single execution step. So, consider an if statement:
small-if-congruence:
    ⟨b, E⟩ →b b′
    ------------------------------------------------------
    ⟨if b then S1 else S2, E⟩ → ⟨if b′ then S1 else S2, E⟩

small-iftrue:
    ⟨if true then s1 else s2, E⟩ → ⟨s1, E⟩
Exercise 2. We have again omitted the small-iffalse case, as well as rule(s) for while, as exercises
to the reader.
small-seq-congruence:
    ⟨s1, E⟩ → ⟨s1′, E′⟩
    ---------------------------
    ⟨s1; s2, E⟩ → ⟨s1′; s2, E′⟩
2 Not all statements reach a final configuration, e.g., while true do skip.
small-seq:
    ⟨skip; s2, E⟩ → ⟨s2, E⟩
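The statement rules above can also be read off as a single-step function. The Python sketch below reuses the encoding and eval_expr from the earlier big-step sketch and, for brevity, collapses the →a and →b sub-steps into direct evaluation; eval_bool is an assumed helper for boolean expressions, not something defined in these notes.

def step_stmt(s, E):
    """One small-step transition <s, E> -> <s', E'>; returns the next configuration."""
    tag = s[0]
    if tag == "assign":                            # assignment steps to skip, updating the state
        return ("skip",), {**E, s[1]: eval_expr(s[2], E)}
    if tag == "seq":
        if s[1] == ("skip",):                      # small-seq: discard the finished first statement
            return s[2], E
        s1p, Ep = step_stmt(s[1], E)               # small-seq-congruence: step inside s1
        return ("seq", s1p, s[2]), Ep
    if tag == "if":                                # small-iftrue / small-iffalse, with the condition
        branch = s[2] if eval_bool(s[1], E) else s[3]   # evaluated in one go instead of via ->b steps
        return branch, E
    raise ValueError("no step from this configuration in this sketch")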
We can give a small-step semantics to While3Addr in the same style. Here a configuration consists of the environment E together with a program counter n giving the address of the next instruction to execute, and the program P itself is a fixed map from addresses to instructions:

step-const:
    P[n] = x := m
    -------------------------------
    P ⊢ ⟨E, n⟩ ⇝ ⟨E[x ↦ m], n + 1⟩
This states that in the case where the nth instruction of the program P (looked up using P [n])
is a constant assignment x := m, the abstract machine takes a step to a state in which the state E
is updated to map x to the constant m, written as E[x 7→ m], and the program counter now points
to the instruction at the following address n + 1. We similarly define the remaining rules:
step-copy:
    P[n] = x := y
    ----------------------------------
    P ⊢ ⟨E, n⟩ ⇝ ⟨E[x ↦ E(y)], n + 1⟩
step-goto:
    P[n] = goto m
    ----------------------
    P ⊢ ⟨E, n⟩ ⇝ ⟨E, m⟩
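A Python sketch of the corresponding abstract machine step for While3Addr, using the dictionary encoding of programs suggested earlier; only the instruction forms whose rules are shown above are handled.

def step3addr(P, E, n):
    """One step P |- <E, n> ~> <E', n'> of the While3Addr machine."""
    instr = P[n]
    tag = instr[0]
    if tag == "const":                     # step-const: x := m
        _, x, m = instr
        return {**E, x: m}, n + 1
    if tag == "copy":                      # step-copy: x := y
        _, x, y = instr
        return {**E, x: E[y]}, n + 1
    if tag == "goto":                      # step-goto: jump to address m
        return E, instr[1]
    raise ValueError("instruction form not handled in this sketch")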
Returning to the big-step rules for expressions, the following is a well-formed derivation of the judgement ⟨(4 ∗ 2) − 6, E1⟩ ⇓ 2, for any state E1:

    ⟨4, E1⟩ ⇓ 4    ⟨2, E1⟩ ⇓ 2
    --------------------------
    ⟨4 ∗ 2, E1⟩ ⇓ 8                ⟨6, E1⟩ ⇓ 6
    ------------------------------------------
    ⟨(4 ∗ 2) − 6, E1⟩ ⇓ 2
We say that ⟨e, E⟩ ⇓ n is provable (expressed mathematically as ⊢ ⟨e, E⟩ ⇓ n) if there exists a well-formed derivation with ⟨e, E⟩ ⇓ n as its conclusion. “Well-formed” simply means that every step in the derivation is a valid instance of one of the rules of inference for this system.
A proof system like our operational semantics is complete if every true statement is provable.
It is sound (or consistent) if every provable judgement is true. Typically, a system of semantics is
always complete, unless you forget a rule; soundness can be easier to mess up!
Structural induction. Structural induction is another special case of well-founded induction, where the relation is defined on the structure of a program or a derivation. For example, consider the syntax of arithmetic expressions in While, Aexp. Induction on a recursive definition like this proves a property about a mathematical structure by demonstrating that the property holds for all possible forms of that structure. We define the relation a ⊏ b to hold if a is a substructure of b. For Aexp expressions, the relation ⊏ ⊆ Aexp × Aexp is:

e1 ⊏ e1 + e2
e1 ⊏ e1 ∗ e2
e2 ⊏ e1 + e2
e2 ⊏ e1 ∗ e2
... etc., for all arithmetic operators opa
To prove that a property P holds for all arithmetic expressions in While (or, ∀e ∈ Aexp. P(e)), we must show P holds for both the base cases and the inductive cases. e is a base case if there is no e′ such that e′ ⊏ e; e is an inductive case if ∃e′. e′ ⊏ e. There is thus one proof case per form of the expression. For Aexp, the base cases are:

⊢ ∀n ∈ Z. P(n)
⊢ ∀x ∈ Vars. P(x)
Example. Let L(e) be the number of literals and variable occurrences in some expression e and O(e)
be the number of operators in e. Prove by induction on the structure of e that ∀e ∈ Aexp.L(e) =
O(e) + 1:
Base cases:
• Case e = n: L(e) = 1 and O(e) = 0.
• Case e = x: L(e) = 1 and O(e) = 0.
Inductive case 1: Case e = e1 + e2
• By definition, L(e) = L(e1 ) + L(e2 ) and O(e) = O(e1 ) + O(e2 ) + 1.
• By the induction hypothesis, L(e1 ) = O(e1 ) + 1 and L(e2 ) = O(e2 ) + 1.
• Thus, L(e) = O(e1 ) + O(e2 ) + 2 = O(e) + 1.
The other arithmetic operators follow the same logic.
Other proofs for the expression sublanguages of While can be conducted similarly. For example, we could prove that the small-step and big-step semantics obtain equivalent results on expressions: for every e ∈ Aexp and state E, ⟨e, E⟩ ⇓ n if and only if ⟨e, E⟩ →a∗ n.
The actual proof is left as an exercise, but note that this works because the semantics rules for
expressions are strictly syntax-directed: the meaning of an expression is determined entirely by
the meaning of its subexpressions, the structure of which guides the induction.
Induction on the structure of derivations. Unfortunately, that last statement is not true for statements in the While language. For example, imagine we'd like to prove that While is deterministic (that is, if a statement terminates, it always evaluates to the same value). More formally, we want to prove that:

∀e ∈ Aexp. ∀E. ⟨e, E⟩ ⇓ n ∧ ⟨e, E⟩ ⇓ n′ ⇒ n = n′      (1)
∀b ∈ Bexp. ∀E. ⟨b, E⟩ ⇓ t ∧ ⟨b, E⟩ ⇓ t′ ⇒ t = t′      (2)
∀s ∈ Stmt. ∀E. ⟨s, E⟩ ⇓ E′ ∧ ⟨s, E⟩ ⇓ E″ ⇒ E′ = E″    (3)
We can’t prove the third statement with structural induction on the language syntax because
the evaluation of statements (like while) does not depend only on the evaluation of its subexpres-
sions.
Fortunately, there is another way. Recall that the operational semantics assign meaning to pro-
grams by providing rules of inference that allow us to prove judgements by making derivations.
Derivation trees (like the expression trees we discussed above) are also defined inductively, and
are built of sub-derivations. Because they have structure, we can again use structural induction,
but here, on the structure of derivations.
Instead of assuming (and reasoning about) some statement s ∈ S, we instead assume a derivation D :: ⟨s, E⟩ ⇓ E′ and induct on the structure of that derivation (we define D :: Judgement to mean “D is the derivation that proves Judgement,” e.g., D :: ⟨x + 1, E⟩ ⇓ 2). That is, to prove that property P holds for a statement, we will prove that P holds for all possible derivations of that statement. Such a proof consists of the following steps:
Base cases: show that P holds for each derivation consisting only of an axiom, that is, a rule with no premises and some conclusion S.
Inductive cases: For each derivation rule of the form
H1 ...Hn
S
By the induction hypothesis, P holds for Hi , where i = 1 . . . n. We then have to prove that the
property is preserved by the derivation using the given rule of inference.
A key technique for induction on derivations is inversion. Because the number of forms of rules of inference is finite, we can tell which inference rules might have been used last in the derivation. For example, given D :: ⟨x := 55, E⟩ ⇓ E′, we know (by inversion) that the assignment rule of inference must be the last rule used in D (because no other rule of inference has an assignment statement in its concluding judgment). Similarly, if D :: ⟨while b do c, E⟩ ⇓ E′, then (by inversion) the last rule used in D was either the while-true rule or the while-false rule.
Given those preliminaries, to prove that the evaluation of statements is deterministic (equation (3) above), pick arbitrary s, E, E′ and a derivation D :: ⟨s, E⟩ ⇓ E′, along with any other E″ and derivation D′ :: ⟨s, E⟩ ⇓ E″; we must show E′ = E″.
Proof: by induction on the structure of the derivation D.
Base case: the one rule with no premises, skip:

D :: ⟨skip, E⟩ ⇓ E
By inversion, the last rule used in D′ (which, again, produced E″) must also have been the rule for skip. By the structure of the skip rule, we know E″ = E.
Inductive cases: We need to show that the property holds when the last rule used in D was each of the possible non-skip While rules. I will show you one representative case; the rest are left as an exercise. If the last rule used was the while-true rule:
Lecture Notes: A Dataflow Analysis Framework for While3Addr
17-355/17-665/17-819O: Program Analysis (Spring 2018)
Claire Le Goues and Jonathan Aldrich
[email protected], [email protected]
A dataflow analysis computes an abstract state σ at each program point, mapping each program variable to an abstract value drawn from a set L:

σ ∈ Var → L
L represents the set of abstract values we are interested in tracking in the analysis. This varies
from one analysis to another. For example, consider a zero analysis, which tracks whether each
variable is zero or not at each program point (Thought Question: Why would this be useful?). For
this analysis, we define L to be the set {Z, N, ⊤}. The abstract value Z represents the value 0, and N represents all nonzero values. ⊤ is pronounced “top”; we define it more concretely later in these notes, but for now think of it as a question mark, used in situations when we do not know whether a variable is zero or not, due to imprecision in the analysis.
Conceptually, each abstract value represents a set of one or more concrete values that may occur
when a program executes. We define an abstraction function α that maps each possible concrete
value of interest to an abstract value:
α : Z → L

For zero analysis, we define α so that 0 maps to Z and all other integers map to N:

αZ(0) = Z
αZ(n) = N    where n ≠ 0
The core of any program analysis is how individual instructions in the program are analyzed
and affect the analysis state σ at each program point. We define this using flow functions that map
the dataflow information at the program point immediately before an instruction to the dataflow
information after that instruction. A flow function should represent the semantics of the instruction,
but abstractly, in terms of the abstract values tracked by the analysis. We will link semantics to the
flow function precisely when we talk about correctness of dataflow analysis. For now, to approach
the idea by example, we define the flow functions fZ for zero analysis on While3Addr as follows:
fZ⟦x := 0⟧(σ)           = [x ↦ Z]σ                  (1)
fZ⟦x := n⟧(σ)           = [x ↦ N]σ   where n ≠ 0    (2)
fZ⟦x := y⟧(σ)           = [x ↦ σ(y)]σ               (3)
fZ⟦x := y op z⟧(σ)      = [x ↦ ⊤]σ                  (4)
fZ⟦goto n⟧(σ)           = σ                          (5)
fZ⟦if x = 0 goto n⟧(σ)  = σ                          (6)
In this notation, the form of the instruction is an implicit argument to the function, which is followed by the explicit dataflow information argument, in the form fZ⟦I⟧(σ). (1) and (2) are for assignment to a constant. If we assign 0 to a variable x, then we should update the input dataflow information σ so that x maps to the abstract value Z. The notation [x ↦ Z]σ denotes dataflow information that is identical to σ except that the value in the mapping for x is updated to refer to Z. Flow function (3) is for copies from a variable y to another variable x: we look up y in σ, written σ(y), and update σ so that x maps to the same abstract value as y.
We start with a generic flow function for arithmetic instructions (4). Arithmetic can produce
either a zero or a nonzero value, so we use the abstract value J to represent our uncertainty. More
precise flow functions are available based on certain instructions or operands. For example, if the
instruction is subtraction and the operands are the same, the result will definitely be zero. Or, if
the instruction is addition, and the analysis information tells us that one operand is zero, then the
addition is really a copy and we can use a flow function similar to the copy instruction above. These
examples could be written as follows (we would still need the generic case above for instructions
that do not fit such special cases):
fZ⟦x := y − y⟧(σ)  = [x ↦ Z]σ
fZ⟦x := y + z⟧(σ)  = [x ↦ σ(y)]σ    where σ(z) = Z
Exercise 1. Define another flow function for some arithmetic instruction and certain conditions
where you can also provide a more precise result than J.
The flow function for branches ((5) and (6)) is trivial: branches do not change the state of the
machine other than to change the program counter, and thus the analysis result is unaffected.
However, we can provide a better flow function for conditional branches if we distinguish the analysis information produced when the branch is taken from the information produced when it is not taken. To do this, we extend our notation once more in defining flow functions for branches, using a subscript on the instruction to indicate whether we are specifying the dataflow information for the case where the condition is true (T) or false (F). For example, to define the flow function for the true condition when testing a variable for equality with zero, we use the notation fZ⟦if x = 0 goto n⟧T(σ). In this case we know that x is zero, so we can update σ with the Z lattice value. Conversely, in the false condition we know that x is nonzero:

fZ⟦if x = 0 goto n⟧T(σ)  = [x ↦ Z]σ
fZ⟦if x = 0 goto n⟧F(σ)  = [x ↦ N]σ
Exercise 2. Define a flow function for a conditional branch testing whether a variable x < 0.
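To summarize the flow functions operationally, here is a Python sketch of zero analysis over the tuple instruction encoding used in the earlier sketches; the strings "Z", "N", and "TOP" stand for Z, N, and ⊤.

def flow_zero(instr, sigma):
    """Zero-analysis flow function fZ[[I]](sigma); returns an updated variable map."""
    tag = instr[0]
    if tag == "const":                               # x := n
        _, x, n = instr
        return {**sigma, x: "Z" if n == 0 else "N"}
    if tag == "copy":                                # x := y
        _, x, y = instr
        return {**sigma, x: sigma[y]}
    if tag == "op":                                  # x := y op z, the generic case
        x = instr[1]
        return {**sigma, x: "TOP"}
    return sigma                                     # goto and if leave the state unchanged

def flow_zero_branch(instr, sigma, taken):
    """Refined result for `if x = 0 goto n`, split by whether the branch is taken."""
    x = instr[1]                                     # ("if-goto", x, "=", 0, target) encoding
    return {**sigma, x: "Z" if taken else "N"}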
2 Running a dataflow analysis
The point of developing a dataflow analysis is to compute information about possible program
states at each point in a program. For example, in zero analysis, whenever we divide some expression by a variable x, we might like to know whether x must be zero (the abstract value Z) or may be zero (represented by ⊤), so that we can warn the developer.
Consider the following program and the analysis information computed after each instruction:

1: x := 0
2: y := 1
3: z := y
4: y := z + x
5: x := y − z

     x   y   z
1:   Z
2:   Z   N
3:   Z   N   N
4:   Z   N   N
5:   ⊤   N   N
We simulate running the program in the analysis, using the flow function to compute, for each
instruction in turn, the dataflow analysis information after the instruction from the information we
had before the instruction. For such simple code, it is easy to track the analysis information using a table with a column for each program variable and a row for each program point, as shown above. The information in a cell tells us the abstract value of the column's variable immediately after the instruction at that line.
Notice that the analysis is imprecise at the end with respect to the value of x. We were able to
keep track of which values are zero and nonzero quite well through instruction 4, using (in the last
case) the flow function that knows that adding a variable known to be zero is equivalent to a copy.
However, at instruction 5, the analysis does not know that y and z are equal, and so it cannot
determine whether x will be zero. Because the analysis is not tracking the exact values of variables,
but rather approximations, it will inevitably be imprecise in certain situations. However, in practice,
well-designed approximations can often allow dataflow analysis to compute quite useful information.
Next, consider a program with an if statement:

1: if x = 0 goto 4
2: y := 0
3: goto 6
4: y := 1
5: x := 1
6: z := y

     x        y   z
1:   ZT, NF
2:   N        Z
3:   N        Z
4:
5:
6:   N        Z   Z
In the table above, the entry for x on line 1 indicates the different abstract values produced for
the true and false conditions of the branch. We use the false condition (x is nonzero) in analyzing
instruction 2. Execution proceeds through instruction 3, at which point we jump to instruction 6.
We have not yet analyzed a path through lines 4 and 5.
Turning to that alternative path, we can start by analyzing instructions 4 and 5 as if we had
taken the true branch at instruction 1:
     x        y   z
1:   ZT, NF
2:   N        Z
3:   N        Z
4:   Z        N
5:   N        N
6:   N        Z   Z    note: incorrect!
The entry for instruction 6 is incorrect: it still reflects only the path through instructions 2 and 3, and does not account for the path through instructions 4 and 5, on which y is nonzero when control reaches instruction 6. To be correct, the analysis information at instruction 6 must combine the information flowing in from both paths:

     x        y   z
1:   ZT, NF
2:   N        Z
3:   N        Z
4:   Z        N
5:   N        N
6:   N        ⊤   ⊤    corrected
2.3 Join
We generalize the procedure of combining analysis results along multiple paths by using a join operation, ⊔. When taking two abstract values l1, l2 ∈ L, the result of l1 ⊔ l2 is an abstract value lj that generalizes both l1 and l2.
To precisely define what “generalizes” means, we define a partial order ⊑ over abstract values, and say that l1 and l2 are at least as precise as lj, written l1 ⊑ lj. Recall that a partial order is any relation that is:

• reflexive: ∀l : l ⊑ l
• transitive: ∀l1, l2, l3 : l1 ⊑ l2 ∧ l2 ⊑ l3 ⇒ l1 ⊑ l3
• anti-symmetric: ∀l1, l2 : l1 ⊑ l2 ∧ l2 ⊑ l1 ⇒ l1 = l2

A set of values L that is equipped with a partial order ⊑, and for which the least upper bound l1 ⊔ l2 of any two values in that ordering is unique and is also in L, is called a join-semilattice. Any join-semilattice has a maximal element ⊤ (pronounced “top”). We require that the abstract values used in dataflow analyses form a join-semilattice. We will use the term lattice for short; as we will see below, this is the correct terminology for most dataflow analyses anyway. For zero analysis, we define the partial order with Z ⊑ ⊤ and N ⊑ ⊤, and we have Z ⊔ N = ⊤.
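For the zero-analysis lattice this ordering and join are tiny; a Python sketch, continuing the string encoding of abstract values from the earlier sketch:

def leq_zero(l1, l2):
    """The partial order on {Z, N, TOP}: every element is below TOP and below itself."""
    return l1 == l2 or l2 == "TOP"

def join_zero(l1, l2):
    """Least upper bound: equal values join to themselves, anything else goes to TOP."""
    return l1 if l1 == l2 else "TOP"

def join_sigma(s1, s2):
    """Lifted join over dataflow maps; a variable known on only one side joins to TOP."""
    return {x: join_zero(s1[x], s2[x]) if x in s1 and x in s2 else "TOP"
            for x in set(s1) | set(s2)}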
We have now introduced and considered all the elements necessary to define a dataflow analysis:
• a lattice (L, ⊑)
• an abstraction function α
• initial dataflow analysis assumptions σ0
• a flow function f

Note that the theory of lattices answers a side question that comes up when we begin analyzing the first program instruction: what should we assume about the value of input variables (like x on program entry)? If we do not know anything about the value x can take, a good choice is to assume it can be anything; that is, in the initial environment σ0, input variables like x are mapped to ⊤.
A loop in the program may execute many times, potentially without bound, when the program runs. Despite this, we would like to analyze looping programs in bounded time. Let us examine how, using the following simple looping example:
1: x := 10
2: y := 0
3: z := 0
4: if x = 0 goto 8
5: y := 1
6: x := x − 1
7: goto 4
8: x := y

     x        y   z
1:   N
2:   N        Z
3:   N        Z   Z
4:   ZT, NF   Z   Z
5:   N        N   Z
6:   ⊤        N   Z
7:   ⊤        N   Z
8:
The table above shows the straightforward straight-line analysis of the path that runs the loop once. We must now re-analyze instruction 4. This should not be surprising; it is analogous to the situation we encountered earlier, merging paths after an if instruction. To determine the analysis information at instruction 4, we join the dataflow analysis information flowing in from instruction 3 with the dataflow analysis information flowing in from instruction 7. For x we have N ⊔ ⊤ = ⊤. For y we have Z ⊔ N = ⊤. For z we have Z ⊔ Z = Z. The information for instruction 4 is therefore unchanged, except that for y we now have ⊤.
We can now choose between two paths once again: staying within the loop, or exiting out to
instruction 8. We will choose (arbitrarily, for now) to stay within the loop, and consider instruction
5. This is our second visit to instruction 5, and we have new information to consider: since we have
gone through the loop, the assignment y := 1 has been executed, and we have to assume that y may be nonzero coming into instruction 5. This is accounted for by the latest update to instruction 4's analysis information, in which y is mapped to ⊤. Thus the information for instruction 4 describes both possible paths. We must update the analysis information for instruction 5 so it does so as
well. In this case, however, since the instruction assigns 1 to y, we still know that y is nonzero after
it executes. In fact, analyzing the instruction again with the updated input data does not change
the analysis results for this instruction.
A quick check shows that going through the remaining instructions in the loop, and even
coming back to instruction 4, the analysis information will not change. That is because the flow
functions are deterministic: given the same input analysis information and the same instruction,
they will produce the same output analysis information. If we analyze instruction 6, for example,
the input analysis information from instruction 5 is the same input analysis information we used
when analyzing instruction 6 the last time around. Thus, instruction 6’s output information will
not change, and so instruction 7’s input information will not change, and so on. No matter which
instruction we run the analysis on, anywhere in the loop (and in fact before the loop), the analysis
information will not change.
We say that the dataflow analysis has reached a fixed point.2 In mathematics, a fixed point of a function is a data value v that is mapped to itself by the function, i.e., f(v) = v. In this analysis, the
mathematical function is the flow function, and the fixed point is a tuple of the dataflow analysis
values at each program point. If we invoke the flow function on the fixed point, the analysis results
do not change (we get the same fixed point back).
Once we have reached a fixed point of the function for this loop, it is clear that further analysis
of the loop will not be useful. Therefore, we will proceed to analyze statement 8. The final analysis
results are as follows:
     x        y   z
1:   N
2:   N        Z
3:   N        Z   Z
4:   ZT, NF   ⊤   Z    updated
5:   N        N   Z    already at fixed point
6:   ⊤        N   Z    already at fixed point
7:   ⊤        N   Z    already at fixed point
8:   Z        ⊤   Z
Quickly simulating a run of the program shows that these results correctly approximate
actual execution. The uncertainty in the value of x at instructions 6 and 7 is real: x is nonzero
after these instructions, except the last time through the loop, when it is zero. The uncertainty in
the value of y at the end shows imprecision in the analysis: this loop always executes at least once,
so y will be nonzero. However, the analysis (as currently formulated) cannot tell this for certain, so
it reports that it cannot tell if y is zero or not. This is safe—it is always correct to say the analysis
is uncertain—but not as precise as would be ideal.
The benefit of analysis, however, is that we can gain correct information about all possible
executions of the program with only a finite amount of work. In our example, we only had to
analyze the loop statements at most twice each before reaching a fixed point. This is a significant
improvement over the actual program execution, which runs the loop 10 times. We sacrificed
precision in exchange for coverage of all possible executions, a classic tradeoff in static analysis.
How can we be confident that the results of the analysis are correct, besides simulating every
possible run of a (possibly very complex) program? The intuition behind correctness is the invariant
that at each program point, the analysis results approximate all the possible program values that
could exist at that point. If the analysis information at the beginning of the program correctly
approximates the program arguments, then the invariant is true at the beginning of program
execution. One can then make an inductive argument that the invariant is preserved. In particular,
when the program executes an instruction, the instruction modifies the program’s state. As long
as the flow functions account for every possible way that instruction can modify state, then at the
analysis fixed point they will have correctly approximated actual program execution. We will make
this argument more precise in a future lecture.
2 Sometimes abbreviated in one word as fixpoint.
2.5 A convenience: the ⊥ abstract value and complete lattices
As we think about defining an algorithm for dataflow analysis more precisely, a natural question
comes up concerning how instruction 4 is analyzed in the example above. On the first pass, we
analyzed it using the dataflow information from instruction 3, but on the second pass we had to
consider dataflow information from both instruction 3 and instruction 7.
It is more consistent to say that analyzing an instruction always uses the incoming dataflow
analysis information from all instructions that could precede it. That way, we do not have to worry
about following a specific path during analysis. However, for instruction 4, this requires a dataflow
value from instruction 7, even if instruction 7 has not yet been analyzed. We could do this if we
had a dataflow value that is always ignored when it is joined with any other dataflow value. In other words, we need an abstract dataflow value ⊥ (pronounced “bottom”) such that ⊥ ⊔ l = l.
⊥ plays a dual role to the value ⊤: it sits at the bottom of the dataflow value lattice. For all l, we have the identity l ⊑ ⊤ and correspondingly ⊥ ⊑ l. There is a greatest lower bound operator, meet (⊓), which is dual to ⊔. The meet of all dataflow values is ⊥.
A set of values L that is equipped with a partial order ⊑, and for which both least upper bounds ⊔ and greatest lower bounds ⊓ exist in L and are unique, is called a complete lattice.
The theory of ⊥ and complete lattices provides an elegant solution to the problem mentioned above. We can initialize σ at every instruction in the program, except at entry, to ⊥, indicating that the instruction there has not yet been analyzed. We can then always merge all input values to a node, whether or not the sources of those inputs have been analyzed, because we know that any ⊥ values from unanalyzed sources will simply be ignored by the join operator ⊔, and that if the dataflow value coming from one of those sources later changes, we will get to it before the analysis is completed.
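A minimal Python sketch of a naive analysis loop built on this idea follows; flow, join, and predecessors are caller-supplied helpers, as in the earlier sketches, and BOTTOM is a sentinel standing for ⊥.

BOTTOM = None                                  # the "not yet analyzed" value, ignored by joins

def join_info(a, b, join):
    """Join that treats BOTTOM as an identity element."""
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    return join(a, b)

def naive_analysis(P, entry, initial_info, flow, join, predecessors):
    """Apply flow functions to every instruction until no result changes."""
    sigma = {n: BOTTOM for n in P}             # dataflow information after each instruction
    while True:
        changed = False
        for n in P:                            # a fair schedule: every instruction is considered
            inp = initial_info if n == entry else BOTTOM
            for p in predecessors(P, n):
                inp = join_info(inp, sigma[p], join)
            if inp is BOTTOM:
                continue                       # no path has reached this instruction yet
            new = flow(P[n], inp)
            if new != sigma[n]:
                sigma[n], changed = new, True
        if not changed:                        # nothing changed anywhere: a fixed point
            return sigma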
In the code above, the termination condition is expressed abstractly. It can easily be checked,
however, by running the flow function on each instruction in the program. If the results of analysis
do not change as a result of analyzing any instruction, then it has reached a fixed point.
How do we know the algorithm will terminate? The intuition is as follows. We rely on the
choice of an instruction to be fair, so that each instruction is eventually considered. As long as the
analysis is not at a fixed point, some instruction can be analyzed to produce new analysis results.
If our flow functions are well-behaved (technically, if they are monotone, as we will discuss in a future lecture) then each time the flow function runs on a given instruction, either the results do not change, or they become more approximate (i.e., they are higher in the lattice). Later runs of the flow function consider more possible paths through the program and therefore produce a more approximate result which considers all these possibilities. If the lattice is of finite height—meaning there are at most a finite number of steps from any place in the lattice going up towards the ⊤ value—then this process must terminate eventually. More concretely: once an abstract value is computed to be ⊤, it will stay ⊤ no matter how many times the analysis is run. The abstraction only flows in one direction.
Although the simple algorithm above always terminates and results in the correct answer, it is
still not always the most efficient. Typically, for example, it is beneficial to analyze the program
instructions in order, so that results from earlier instructions can be used to update the results
of later instructions. It is also useful to keep track of a list of instructions for which there has
been a change since the instruction was last analyzed in the result dataflow information of some
predecessor. Only those instructions need be analyzed; reanalyzing other instructions is useless
since their input has not changed. Kildall captured this intuition with his worklist algorithm,
described in pseudocode as:
for Instruction i in program
    input[i] = ⊥
input[firstInstruction] = initialDataflowInformation
worklist = { firstInstruction }
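The pseudocode above shows only the initialization. A fuller Python sketch of the worklist loop, reusing BOTTOM and join_info from the earlier naive sketch and taking the flow function and CFG helpers as parameters, might look as follows.

def worklist_analysis(P, entry, initial_info, flow, join, predecessors, successors):
    """Kildall-style worklist algorithm: re-analyze only instructions whose input may have changed."""
    sigma = {n: BOTTOM for n in P}             # every instruction starts at "not yet analyzed"
    worklist = {entry}                         # a set, so no instruction appears twice
    while worklist:
        n = worklist.pop()                     # removal policy left unspecified here
        inp = initial_info if n == entry else BOTTOM
        for p in predecessors(P, n):
            inp = join_info(inp, sigma[p], join)
        if inp is BOTTOM:
            continue
        new = flow(P[n], inp)
        if new != sigma[n]:                    # output changed, so successors must be revisited
            sigma[n] = new
            worklist |= set(successors(P, n))
    return sigma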
The algorithm then repeatedly removes and processes instructions from the worklist. We would like adding to the worklist to be a set addition operation, so that no instruction appears in it multiple times; if we have just analyzed an instruction, analyzing it again with unchanged input will not produce different results.
That leaves a choice of which instruction to remove from the worklist. We could choose among several policies, including last-in-first-out (LIFO) order or first-in-first-out (FIFO) order. In practice, the most efficient approach is to identify the strongly-connected components (i.e., loops) in the control flow graph of the program and process them in topological order, so that loops that are nested, or that appear earlier in program order, are solved before later loops. This works well because we do not want to do a lot of work bringing a loop late in the program to a fixed point, only to have to redo that work when dataflow information from an earlier loop changes.
Within each loop, the instructions should be processed in reverse postorder, the reverse of
the order in which each node is last visited when traversing a tree. Consider the example from
Section 2.2 above, in which instruction 1 is an if test, instructions 2-3 are the then branch,
instructions 4-5 are the else branch, and instruction 6 comes after the if statement. A tree
traversal might go as follows: 1, 2, 3, 6, 3 (again), 2 (again), 1 (again), 4, 5, 4 (again), 1 (again).
Some instructions in the tree are visited multiple times: once going down, once between visiting
the children, and once coming up. The postorder, or order of the last visits to each node, is 6, 3,
2, 5, 4, 1. The reverse postorder is the reverse of this: 1, 4, 5, 2, 3, 6. Now we can see why reverse
postorder works well: we explore both branches of the if statement (4-5 and 2-3) before we explore
node 6. This ensures that we do not have to reanalyze node 6 after one of its inputs changes.
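Reverse postorder itself is easy to compute with a depth-first search over the CFG; a Python sketch (successors is again a caller-supplied helper):

def reverse_postorder(P, entry, successors):
    """Return instruction addresses in reverse postorder of a DFS from the entry."""
    visited, postorder = set(), []

    def dfs(n):
        visited.add(n)
        for s in successors(P, n):
            if s not in visited:
                dfs(s)
        postorder.append(n)                    # record n only after all nodes reachable from it

    dfs(entry)
    return list(reversed(postorder))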
Although analyzing code using the strongly-connected component and reverse postorder heuris-
tics improves performance substantially in practice, it does not change the worst-case performance
results described above.
Lecture Notes:
Dataflow Analysis Examples
1 Constant Propagation
While zero analysis was useful for simply tracking whether a given variable is zero or not, constant
propagation analysis attempts to track the constant values of variables in the program, where
possible. Constant propagation has long been used in compiler optimization passes in order to
turn variable reads and computations into constants. However, it is generally useful for analysis
for program correctness as well: any client analysis that benefits from knowing program values
(e.g. an array bounds analysis) can leverage it.
For constant propagation, we want to track what is the constant value, if any, of each program
variable. Therefore we will use a lattice where the set LCP is Z ∪ {⊤, ⊥}. The partial order is ∀l ∈ LCP : ⊥ ⊑ l ∧ l ⊑ ⊤. In other words, ⊥ is below every lattice element and ⊤ is above every element, but otherwise lattice elements are incomparable.
In the above lattice, as well as our earlier discussion of zero analysis, we used a lattice to
describe individual variable values. We can lift the notion of a lattice to cover all the dataflow
information available at a program point. This is called a tuple lattice, where there is an element of
the tuple for each of the variables in the program. For constant propagation, the elements of the
set σ are maps from Var to LCP , and the other operators and J{K are lifted as follows:
σ P Var Ñ LCP
σ1 lif t σ2 iff @x P Var : σ1 pxq σ2 pxq
σ1 \lif t σ2 tx ÞÑ σ1 pxq \ σ2 pxq | x P Varu
Jlif t tx ÞÑ J | x P Varu
Klif t tx ÞÑ K | x P Varu
We can likewise define an abstraction function for constant propagation, as well as a lifted
version that accepts an environment E mapping variables to concrete values. We also define the
initial analysis information to conservatively assume that initial variable values are unknown.
Note that in a language that initializes all variables to zero, we could make more precise initial
dataflow assumptions, such as {x ↦ 0 | x ∈ Var}:

αCP(n)    =  n
αlift(E)  =  {x ↦ αCP(E(x)) | x ∈ Var}
σ0        =  ⊤lift
We can now define flow functions for constant propagation:
fCP⟦x := n⟧(σ)       =  [x ↦ n]σ
fCP⟦x := y⟧(σ)       =  [x ↦ σ(y)]σ
fCP⟦x := y op z⟧(σ)  =  [x ↦ σ(y) oplift σ(z)]σ
    where n oplift m  =  n op m
    and   n oplift ⊤  =  ⊤   (and symmetrically)
    and   n oplift ⊥  =  ⊥   (and symmetrically)
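A Python sketch of these flow functions, with the strings "TOP" and "BOT" standing in for ⊤ and ⊥ and the op field holding an ordinary Python function such as operator.add; the tuple encoding, as before, is only illustrative.

def op_lift(op, a, b):
    """Lifted arithmetic: constants fold; TOP and BOT propagate as in the equations above."""
    if a == "BOT" or b == "BOT":
        return "BOT"
    if a == "TOP" or b == "TOP":
        return "TOP"
    return op(a, b)                            # both operands are concrete integers

def flow_cp(instr, sigma):
    """Constant-propagation flow function fCP[[I]](sigma)."""
    tag = instr[0]
    if tag == "const":                         # x := n
        _, x, n = instr
        return {**sigma, x: n}
    if tag == "copy":                          # x := y
        _, x, y = instr
        return {**sigma, x: sigma[y]}
    if tag == "op":                            # x := y op z
        _, x, y, op, z = instr
        return {**sigma, x: op_lift(op, sigma[y], sigma[z])}
    return sigma                               # branches leave the abstract state unchanged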
2 Reaching Definitions
Reaching definitions analysis determines, for each use of a variable, which assignments to that
variable might have set the value seen at that use. Consider the following program:
1: y := x
2: z := 1
3: if y = 0 goto 7
4: z := z ∗ y
5: y := y − 1
6: goto 3
7: y := 0
In this example, definitions 1 and 5 reach the use of y at 4.
For reaching definitions, let DEFS be the set of all definitions in the program. The set of elements in the lattice is the set of all subsets of DEFS—that is, the powerset of DEFS, written ℘(DEFS).
What should ⊑ be for reaching definitions? The intuition is that our analysis is more precise the smaller the set of definitions it computes at a given program point. This is because we want to know, as precisely as possible, where the values at a program point came from. So ⊑ should be the subset relation ⊆: a subset is more precise than its superset. This naturally implies that ⊔ should be union, and that ⊤ and ⊥ should be the universal set DEFS and the empty set ∅, respectively.
In summary, we can formally define our lattice and initial dataflow information as follows:

σ ∈ ℘(DEFS)
σ1 ⊑ σ2   iff   σ1 ⊆ σ2
σ1 ⊔ σ2   =    σ1 ∪ σ2
⊤         =    DEFS
⊥         =    ∅
σ0        =    ∅
Instead of using the empty set for σ0 , we could use an artificial reaching definition for each
program variable (e.g. x0 as an artificial reaching definition for x) to denote that the variable is
either uninitialized, or was passed in as a parameter. This is convenient if it is useful to track
whether a variable might be uninitialized at a use, or if we want to consider a parameter to be a
definition. We could write this formally as σ0 = {x0 | x ∈ Vars}.
We will now define flow functions for reaching definitions. Notationally, we will write xn to
denote a definition of the variable x at the program instruction numbered n. Since our lattice is
a set, we can reason about changes to it in terms of elements that are added (called GEN) and
elements that are removed (called KILL) for each statement. This GEN/KILL pattern is common
to many dataflow analyses. The flow functions can be formally defined as follows:
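A Python sketch of the GEN/KILL pattern for reaching definitions; definitions are encoded as (variable, line) pairs, and defines is an assumed helper that names the variable an instruction assigns (or None for goto and if).

def flow_rd(instr, n, sigma, all_defs):
    """Reaching-definitions flow at line n: (sigma - KILL) union GEN."""
    x = defines(instr)                                 # assumed helper: variable defined, if any
    if x is None:
        return sigma                                   # goto / if define nothing
    kill = {(v, m) for (v, m) in all_defs if v == x}   # every definition of x anywhere
    gen = {(x, n)}                                     # the definition made right here
    return (sigma - kill) | gen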
3 Live Variables
Live variable analysis determines, for each program point, which variables might be used again
before they are redefined. Consider again the following program:
1: y := x
2: z := 1
3: if y = 0 goto 7
4: z := z ∗ y
5: y := y − 1
6: goto 3
7: y := 0
In this example, after instruction 1, y is live, but x and z are not. Live variables analysis
typically requires knowing what variable holds the main result(s) computed by the program. In
the program above, suppose z is the result of the program. Then at the end of the program, only z
is live.
Live variable analysis was originally developed for optimization purposes: if a variable is not
live after it is defined, we can remove the definition instruction. For example, instruction 7 in the
code above could be optimized away, under our assumption that z is the only program result of
interest.
We must be careful of the side effects of a statement, of course. Assigning a variable that is no
longer live to null could have the beneficial side effect of allowing the garbage collector to collect
memory that is no longer reachable—unless the GC itself takes into consideration which variables
are live. Sometimes warning the user that an assignment has no effect can be useful for software
engineering purposes, even if the assignment cannot safely be optimized away. For example, eBay
found that FindBugs’s analysis detecting assignments to dead variables was useful for identifying
unnecessary database calls.1
For live variable analysis, we will use a set lattice to track the set of live variables at each
program point. The lattice is similar to that for reaching definitions:
σ ∈ ℘(Var)
σ1 ⊑ σ2   iff   σ1 ⊆ σ2
σ1 ⊔ σ2   =    σ1 ∪ σ2
⊤         =    Var
⊥         =    ∅
What is the initial dataflow information? This is a tricky question. To determine the variables
that are live at the start of the program, we must reason about how the program will execute...i.e.
we must run the live variables analysis itself! There’s no obvious assumption we can make about
this. On the other hand, it is quite clear which variables are live at the end of the program: just the
variable(s) holding the program result.
Consider how we might use this information to compute other live variables. Suppose the last
statement in the program assigns the program result z, computing it based on some other variable
x. Intuitively, that statement should make x live immediately above that statement, as it is needed
to compute the program result z—but z should now no longer be live. We can use similar logic for
the second-to-last statement, and so on. In fact, we can see that live variable analysis is a backwards
1 See Ciera Jaspan, I-Chin Chen, and Anoop Sharma, Understanding the value of program analysis tools, OOPSLA practitioner report, 2007.
analysis: we start with dataflow information at the end of the program and use flow functions to
compute dataflow information at earlier statements.
Thus, for our “initial” dataflow information—and note that “initial” means the beginning of the program analysis, but the end of the program—we have σend = {x | x holds a result of the program}. The flow function is then defined in terms of the variables each instruction defines and uses:

KILLLV⟦I⟧  =  {x | I defines x}
GENLV⟦I⟧   =  {x | I uses x}
fLV⟦I⟧(σ)  =  (σ − KILLLV⟦I⟧) ∪ GENLV⟦I⟧
We would compute dataflow analysis information for the program shown above as follows. Note that we iterate over the program backwards, i.e., reversing control flow edges between instructions. For each instruction, the corresponding row in our table holds the information after we have applied the flow function—that is, the variables that are live immediately before the statement executes:
stmt   worklist   live
end    7          {z}
7      3          {z}
3      6, 2       {z, y}
6      5, 2       {z, y}
5      4, 2       {z, y}
4      3, 2       {z, y}
3      2          {z, y}
2      1          {y}
1      ∅          {x}
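A Python sketch of this backward analysis; defines and uses are assumed helpers giving an instruction's KILL and GEN sets, and the CFG helpers are parameters as in the earlier sketches.

def flow_lv(instr, live_after):
    """Live-variables flow function: (live_after - KILL) union GEN."""
    d = defines(instr)                              # assumed helper, as for reaching definitions
    kill = {d} if d is not None else set()
    gen = set(uses(instr))                          # assumed helper: variables the instruction reads
    return (live_after - kill) | gen

def live_variables(P, result_vars, successors, predecessors):
    """Backward worklist analysis; live[n] holds the variables live just before line n."""
    live = {n: set() for n in P}
    worklist = set(P)                               # start with every instruction
    while worklist:
        n = worklist.pop()
        after = set(result_vars) if not successors(P, n) else set()
        for s in successors(P, n):                  # live-after(n) = union of successors' live-before
            after |= live[s]
        before = flow_lv(P[n], after)
        if before != live[n]:
            live[n] = before
            worklist |= set(predecessors(P, n))     # the change can affect earlier instructions
    return live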
Lecture Notes: Program Analysis Correctness
1 Termination
As we think about the correctness of program analysis, let us first think more carefully about the
situations under which program analysis will terminate. In a previous lecture, we analyzed the
performance of Kildall's worklist algorithm. A critical part of that performance analysis was the observation that running a flow function always either leaves the dataflow analysis information unchanged, or makes it more approximate—that is, it moves the current dataflow analysis results up in the lattice. The dataflow values computed at each program point therefore form an ascending chain: a sequence of lattice elements in which each element is at least as high in the lattice as the one before it.
We can now show that for a lattice of finite height, the worklist algorithm is guaranteed to
terminate. We do so by showing that the dataflow analysis information at each program point
follows an ascending chain. Consider the following version of the worklist algorithm:
forall (Instruction i ∈ program)
    σ[i] = ⊥
σ[beforeStart] = initialDataflowInformation
worklist = { firstInstruction }

while worklist is not empty
    take an instruction i off the worklist
    let thisInput = the join (⊔) of σ[j] over all predecessors j of i,
                    using σ[beforeStart] for the first instruction
    let newOutput = flow(i, thisInput)
    if (newOutput ≠ σ[i])
        σ[i] = newOutput
        worklist = worklist ∪ successors(i)
Question: what are the differences between this version and the previous version? Convince yourself that it
still does the same thing.
We can make the termination argument inductively: At the beginning of the analysis, the analysis information at every program point (other than the start) is ⊥ (by definition). Thus the first time we run each flow function for each instruction, the result will be at least as high in the lattice as what was there before (because nothing is lower in a lattice than ⊥). We will run the flow function for a given instruction again at a program point only if the dataflow analysis information just before that instruction changes. Assume that the previous time we ran the flow function, we had input information σi and output information σo. Now we are running it again because the input dataflow analysis information has changed to some new σi′—and by the induction hypothesis, we can assume it is higher in the lattice than before, i.e., σi ⊑ σi′.
What we need to show is that the new output information σo′ is at least as high in the lattice as the old output information σo—that is, we must show that σo ⊑ σo′. This will be true if our flow functions are monotonic: a flow function f is monotonic iff σ1 ⊑ σ2 implies f(σ1) ⊑ f(σ2). With monotonic flow functions and a lattice of finite height h, the worklist algorithm is guaranteed to terminate.
Proof. Follows the logic given above when motivating monotonicity. Monotonicity implies that
the dataflow value at each program point i can only increase each time σris is assigned. This can
happen a maximum of h times for each program point, where h is the height of the lattice. This
bounds the number of elements added to the worklist to h · e, where e is the number of edges in the program's control flow graph. Since we remove one element of the worklist for each time through the loop, we will execute the loop at most h · e times before the worklist is empty. Thus,
the algorithm will terminate.
(αsimple and ⊑simple are simply the unlifted versions of α and ⊑, i.e., they operate on individual values rather than maps.)
3 Correctness
What does it mean for an analysis of a While3Addr program to be correct? Intuitively, we would like the program analysis results to correctly describe every actual execution of the program. To establish correctness, we will make use of the precise definitions of While3Addr we gave in the form of operational semantics in the first couple of lectures. We start by formalizing a program execution as a trace: a trace T of a program P is a potentially infinite sequence of configurations c0, c1, ..., where each configuration ci pairs an environment Ei with a program counter ni, and P ⊢ ci ⇝ ci+1 for each i.
Exercise 1. Consider the following (incorrect) flow function for zero analysis:

fZ⟦x := y + z⟧(σ) = [x ↦ Z]σ

Give an example of a program and a concrete trace that illustrates that this flow function is unsound.
The key to designing a sound analysis is to make sure that the flow functions map abstract information before each instruction to abstract information after that instruction in a way that matches the instruction's concrete semantics. Another way of saying this is that the manipulation of the abstract state done by the analysis should reflect the manipulation of the concrete machine state done by the executing instruction. We can formalize this as a local soundness property: a flow function f is locally sound iff, whenever P ⊢ ci ⇝ ci+1, we have α(ci+1) ⊑ f⟦P[ni]⟧(α(ci)).
Exercise 2. Consider again the incorrect zero analysis flow function described above. Specify an
input state ci and use that input state to show that the flow function is not locally sound.
We can now show that the flow functions for zero analysis are locally sound. Although techni-
cally the overall abstraction function α accepts a complete program configuration (E, n), for zero
analysis we can ignore the n component and so in the proof below we will simply focus on the
environment E. We show the cases for a couple of interesting syntax forms; the rest are either
trivial or analogous:
Now we can show that local soundness can be used to prove the global soundness of a dataflow analysis. To do so, let us formally define the state of the dataflow analysis at a fixed point: a set of dataflow values {σi | i ∈ P}, one for each program point, that applying the flow functions and joins does not change.
And now the main result we will use to prove program analyses correct:
Theorem 2 (Local Soundness implies Global Soundness). If a dataflow analysis's flow function f is monotonic and locally sound, and for all traces T we have α(c0) ⊑ σ0, where σ0 is the initial analysis information, then any fixed point {σi | i ∈ P} of the analysis is sound.
Proof. Consider an arbitrary program trace T. The proof is by induction on the program configurations {ci} in the trace.
Case c0:
    α(c0) ⊑ σ0                          by assumption.
    σ0 ⊑ σn0                            by the definition of a fixed point.
    α(c0) ⊑ σn0                         by the transitivity of ⊑.

Case ci+1:
    α(ci) ⊑ σni                         by the induction hypothesis.
    P ⊢ ci ⇝ ci+1                       by the definition of a trace.
    α(ci+1) ⊑ f⟦P[ni]⟧(α(ci))           by local soundness.
    f⟦P[ni]⟧(α(ci)) ⊑ f⟦P[ni]⟧(σni)     by monotonicity of f.
    σni+1 = f⟦P[ni]⟧(σni) ⊔ ...          by the definition of a fixed point.
    f⟦P[ni]⟧(σni) ⊑ σni+1               by the properties of ⊔.
    α(ci+1) ⊑ σni+1                     by the transitivity of ⊑.
Since we previously proved that Zero Analysis is locally sound and that its flow functions
are monotonic, we can use this theorem to conclude that the analysis is sound. This means, for
example, that Zero Analysis will never neglect to warn us if we are dividing by a variable that
could be zero.
This discussion leads naturally into a fuller treatment of abstract interpretation, which we will
turn to in subsequent lectures/readings.
Lecture Notes:
Widening Operators and Collecting Semantics
For reaching definitions, we use a collecting semantics that extends the environment so that it records, for each variable, both its value and the location of its most recent definition:

ERD ∈ Var → Z × N

We can now extend the semantics to track this information. We show only the rules that differ from those described in the earlier lectures:
step-const:
    P[n] = x := m
    ---------------------------------------
    P ⊢ ⟨E, n⟩ ⇝ ⟨E[x ↦ (m, n)], n + 1⟩

step-copy:
    P[n] = x := y
    ---------------------------------------
    P ⊢ ⟨E, n⟩ ⇝ ⟨E[x ↦ (E(y), n)], n + 1⟩
Essentially, each rule that defines a variable records the current location as the latest definition
of that variable. Now we can define an abstraction function for reaching definitions from this
collecting semantics:
αRD(ERD, n) = {m | ∃x ∈ domain(ERD) such that ERD(x) = (i, m)}
From this point, reasoning about the correctness of reaching definitions proceeds analogously
to the reasoning for zero analysis outlined in the previous lectures.
Formulating a collecting semantics can be tricky for some analyses, but it can be done with a
little thought. For example, consider live variable analysis. The collecting semantics requires us
to know, for each execution of the program, which variables currently in scope will be used before
they are defined in the remainder of the program. We can compute this semantics by assuming a
(possibly infinite) trace for a program run, then specifying the set of live variables at every point
in that trace based on the trace going forward from that point. This semantics, specified in terms
of traces rather than a set of inference rules, can then be used in the definition of an abstraction
function and used to reason about the correctness of live variables analysis.
2 Interval Analysis
Let us consider a program analysis that might be suitable for array bounds checking, namely
interval analysis. As the name suggests, interval analysis tracks the interval of values that each
variable might hold. We can define a lattice, initial dataflow information, and abstraction function
as follows:
L    =  Z∞ × Z∞   where Z∞ = Z ∪ {−∞, ∞}
[l1, h1] ⊑ [l2, h2]   iff   l2 ≤∞ l1 ∧ h1 ≤∞ h2
[l1, h1] ⊔ [l2, h2]   =    [min∞(l1, l2), max∞(h1, h2)]
⊤    =  [−∞, ∞]
⊥    =  [∞, −∞]
σ0   =  ⊤
α(x) =  [x, x]
We have extended the ≤ operator and the min and max functions to handle sentinels representing positive and negative infinity in the obvious way. For example, −∞ ≤∞ n for all n ∈ Z. For convenience we write the empty interval ⊥ as [∞, −∞].
Note also that this lattice is defined to capture the range of a single variable. As usual, we can lift it to a map from variables to interval lattice elements. Thus we (again) have dataflow information σ ∈ Var → L.
We can also define a set of flow functions. Here we provide the one for addition; the rest should be easy for the reader to develop:

fI⟦x := y + z⟧(σ)  =  [x ↦ σ(y) +∞ σ(z)]σ
    where [l1, h1] +∞ [l2, h2]  =  [l1 + l2, h1 + h2]
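A Python sketch of the interval join and the addition flow function, with (lo, hi) tuples for intervals and float infinities for ±∞; empty intervals are not handled in this sketch.

INF = float("inf")

def interval_join(a, b):
    """[l1,h1] join [l2,h2] = [min(l1,l2), max(h1,h2)]."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def flow_interval_add(instr, sigma):
    """x := y + z maps x to the componentwise sum of the operand intervals."""
    _, x, y, z = instr                       # ("add", x, y, z); the encoding is illustrative
    (ly, hy), (lz, hz) = sigma[y], sigma[z]
    return {**sigma, x: (ly + lz, hy + hz)}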
Just one practical problem remains. Consider: what is the height of the above-defined lattice, and what consequences does this have for our analysis in practice? As an example, consider analyzing the following program:

1: x := 0
2: if x = y goto 5
3: x := x + 1
4: goto 2
5: y := 0
Using the worklist algorithm (strongly connected components first) gives us:

stmt   worklist   x       y
0      1          ⊤       ⊤
1      2          [0,0]   ⊤
2      3,5        [0,0]   ⊤
3      4,5        [1,1]   ⊤
4      2,5        [1,1]   ⊤
2      3,5        [0,1]   ⊤
3      4,5        [1,2]   ⊤
4      2,5        [1,2]   ⊤
2      3,5        [0,2]   ⊤
3      4,5        [1,3]   ⊤
4      2,5        [1,3]   ⊤
2      3,5        [0,3]   ⊤
...
Consider the sequence of interval lattice elements for x immediately after statement 2. Counting the original lattice value as ⊥ (not shown explicitly in the trace above), we can see it is the ascending chain ⊥, [0,0], [0,1], [0,2], [0,3], .... Recall that ascending chain means that each element of the sequence is higher in the lattice than the previous element. In the case of interval analysis, [0,2] (for example) is higher than [0,1] in the lattice because the latter interval is contained within the former. Given mathematical integers, this chain is clearly infinite; therefore our analysis is not guaranteed to terminate (and indeed it will not in practice).
A widening operator's purpose is to compress such infinite chains to finite length. The widening operator considers the most recent two elements in a chain. If the second is higher than the first, the widening operator can choose to jump up in the lattice, potentially skipping elements in the chain. For example, one way to cut the ascending chain above down to a finite height is to observe that the upper limit for x is increasing, and therefore assume the maximum possible value ∞ for x. Thus we will have the new chain ⊥, [0,0], [0,∞], [0,∞], ..., which has already converged after the third element in the sequence.
The widening operator gets its name because it is an upper bound operator, and in many
lattices, higher elements represent a wider range of program values.
We can define the example widening operator given above more formally as follows:

W(⊥, lcurrent)          =  lcurrent
W([l1, h1], [l2, h2])   =  [minW(l1, l2), maxW(h1, h2)]
    where minW(l1, l2)  =  l1 if l1 ≤ l2, and −∞ otherwise
          maxW(h1, h2)  =  h1 if h2 ≤ h1, and ∞ otherwise

Applying this widening operator to the information coming into the loop header (instruction 2) each time we analyze it, we get:
stmt  worklist  x      y
0     1         ⊤      ⊤
1     2         [0,0]  ⊤
2     3,5       [0,0]  ⊤
3     4,5       [1,1]  ⊤
4     2,5       [1,1]  ⊤
2     3,5       [0,∞]  ⊤
3     4,5       [1,∞]  ⊤
4     2,5       [1,∞]  ⊤
2     5         [0,∞]  ⊤
5     ∅         [0,∞]  [0,0]
Before we analyze instruction 2 the first time, we compute W(⊥, [0,0]) = [0,0] using the
first case of the definition of W. Before we analyze instruction 2 the second time, we compute
W([0,0], [0,1]) = [0,∞]. In particular, the lower bound 0 has not changed, but since the upper
bound has increased from h1 = 0 to h2 = 1, the maxW helper function sets the maximum to ∞.
After we go through the loop a second time we observe that iteration has converged at a fixed
point. We therefore analyze statement 5 and we are done.
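As a sketch (mine, reusing the interval-as-pair representation from the earlier Python fragment),
this simple widening operator can be written as:

def widen(prev, cur):
    # W(l_previous, l_current): keep a bound that did not move, jump to ±∞ otherwise
    if prev == BOTTOM:                  # first case: nothing to compare against yet
        return cur
    (l1, h1), (l2, h2) = prev, cur
    lo = l1 if l1 <= l2 else -INF       # lower bound decreased: widen to −∞
    hi = h1 if h2 <= h1 else INF        # upper bound increased: widen to ∞
    return (lo, hi)

# At the loop head we use widen(old, join(old, incoming)) in place of the plain
# join, so the chain ⊥, [0,0], [0,1], [0,2], ... becomes ⊥, [0,0], [0,∞].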
Let us consider the properties of widening operators more generally. A widening operator
W(l_previous : L, l_current : L) : L accepts two lattice elements, the previous lattice value l_previous at a pro-
gram location and the current lattice value l_current at the same program location. It returns a new
lattice value that will be used in place of the current lattice value.
We require two properties of widening operators. The first is that the widening operator must
return an upper bound of its operands. Intuitively, this is required for monotonicity: if the oper-
ator is applied to an ascending chain, the result should also be an ascending chain. Formally, we have
∀l_previous, l_current : l_previous ⊑ W(l_previous, l_current) ∧ l_current ⊑ W(l_previous, l_current).
The second property is that when the widening operator is applied to an ascending chain
l_i, the resulting ascending chain l_i^W must be of finite height. Formally we define l_0^W = l_0 and
∀i > 0 : l_i^W = W(l_{i−1}^W, l_i). This property ensures that when we apply the widening operator,
the analysis terminates.
Where can we apply the widening operator? Clearly it is safe to apply anywhere, since it must
be an upper bound and therefore can only raise the analysis result in the lattice, thus making
the analysis result more conservative. However, widening inherently causes a loss of precision.
Therefore it is better to apply it only when necessary. One solution is to apply the widening
operator only at the heads of loops, as in the example above. Loop heads (or their equivalent, in
unstructured control flow) can be inferred even from low-level three address code—see a compiler
text such as Appel and Palsberg’s Modern Compiler Implementation in Java.
We can use a somewhat smarter version of this widening operator with the insight that the
bounds of a lattice are often related to constants in the program. Thus if we have an ascend-
ing chain ⊥, [0,0], [0,1], [0,2], [0,3], ... and the constant 10 is in the program, we might change
the chain to ⊥, [0,0], [0,10], .... If we are lucky, the chain will stop ascending at that point:
⊥, [0,0], [0,10], [0,10], .... If we are not so fortunate, the chain will continue and eventually sta-
bilize at [0,∞] as before: ⊥, [0,0], [0,10], [0,∞].
If the program has the set of constants K, we can define a widening operator as follows:

W(⊥, lcurrent) = lcurrent
W([l1, h1], [l2, h2]) = [minK(l1, l2), maxK(h1, h2)]
    where minK(l1, l2) = l1 if l1 ≤ l2, and otherwise the largest element of K ∪ {−∞} that is at most l2
          maxK(h1, h2) = h1 if h2 ≤ h1, and otherwise the smallest element of K ∪ {∞} that is at least h2

We can apply this operator to the following program:
1: x := 0
2: y := 1
3: if x = 10 goto 7
4: x := x + 1
5: y := y - 1
6: goto 3
7: goto 7
Here the constants in the program are 0, 1 and 10. The analysis results are as follows:
stmt  worklist  x                   y
0     1         ⊤                   ⊤
1     2         [0,0]               ⊤
2     3         [0,0]               [1,1]
3     4,7       [0,0]F, ⊥T          [1,1]
4     5,7       [1,1]               [1,1]
5     6,7       [1,1]               [0,0]
6     3,7       [1,1]               [0,0]
3     4,7       [0,1]F, ⊥T          [0,1]
4     5,7       [1,2]               [0,1]
5     6,7       [1,2]               [−1,0]
6     3,7       [1,2]               [−1,0]
3     4,7       [0,9]F, [10,10]T    [−∞,1]
4     5,7       [1,10]              [−∞,1]
5     6,7       [1,10]              [−∞,0]
6     3,7       [1,10]              [−∞,0]
3     7         [0,9]F, [10,10]T    [−∞,1]
7     ∅         [10,10]             [−∞,1]
Applying the widening operation the first time we get to statement 3 has no effect, as the
previous analysis value was ⊥. The second time we get to statement 3, the range of both x and y
has been extended, but both are still bounded by constants in the program. The third time we get
to statement 3, we apply the widening operator to x, whose abstract value has gone from [0,1] to
[0,2]. The widened abstract value is [0,10], since 10 is the smallest constant in the program that is at
least as large as 2. For y we must widen to [−∞, 1]. The analysis stabilizes after one more iteration.
In this example I have assumed a flow function for the if instruction that propagates different
interval information depending on whether the branch is taken or not. In the table, we list the
branch-taken information for x as ⊥ until x reaches the range in which it is feasible to take the
branch. ⊥ can be seen as a natural representation for dataflow values that propagate along a path
that is infeasible.
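A sketch of the constant-based widening operator in the same hypothetical Python representation
used earlier; K is the set of constants harvested from the program text:

def widen_k(prev, cur, K):
    # widen a moving bound only as far as the nearest program constant,
    # falling back to ±∞ if no constant in K bounds it
    if prev == BOTTOM:
        return cur
    (l1, h1), (l2, h2) = prev, cur
    if l2 < l1:                               # lower bound decreased
        below = [k for k in K if k <= l2]
        l1 = max(below) if below else -INF
    if h2 > h1:                               # upper bound increased
        above = [k for k in K if k >= h2]
        h1 = min(above) if above else INF
    return (l1, h1)

# From the example: widen_k((0,1), (0,2), {0,1,10}) == (0, 10)
# and for y:        widen_k((0,1), (-1,1), {0,1,10}) == (-INF, 1)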
Lecture Notes:
Interprocedural Analysis
1 Interprocedural Analysis
Consider an extension of WHILE3ADDR that includes functions. We thus add a new syntactic
category F (for functions) and two new instruction forms: a function call x := g(y) and a return
instruction return x. The simplest way to analyze a call interprocedurally is to make a default
assumption La about the dataflow value of every argument and Lr about every returned value,
giving flow functions of the form:

f⟦x := g(y)⟧(σ) = [x ↦ Lr]σ    where σ(y) ⊑ La
f⟦return x⟧(σ) = σ             where σ(x) ⊑ Lr

We can apply zero analysis to the following function, using La = Lr = ⊤:
1: fun divByX(x) : int
2:     y := 10/x
3:     return y
4: fun main() : void
5:     z := 5
6:     w := divByX(z)
The results are sound, but imprecise. We can avoid the false positive by using a more optimistic
assumption La = Lr = NZ. But then we get a problem with programs that pass an argument that
may be zero, violating the assumption. Now what?
1.2 Annotations
An alternative approach uses annotations. This allows us to choose different argument and result
assumptions for different procedures. Flow functions might look like:
f⟦x := g(y)⟧(σ) = [x ↦ annot⟦g⟧.r]σ    where σ(y) ⊑ annot⟦g⟧.a
f⟦return x⟧(σ) = σ                     where σ(x) ⊑ annot⟦g⟧.r   (for a return inside function g)
Now we can verify that both of the above programs are safe. But some programs remain
difficult, for example those that manipulate global variables.
We will see other example analysis approaches that use annotations later in the semester, though
historically, programmer buy-in remains a challenge in practice.
The flow functions above can be extended in a natural way to handle global variables, using a
default assumption Lg (or an annotation) for the value of each global variable at calls and returns:

f⟦x := g(y)⟧(σ) = [x ↦ Lr][z ↦ Lg | z ∈ Globals]σ
        where σ(y) ⊑ La ∧ ∀z ∈ Globals : σ(z) ⊑ Lg

f⟦return x⟧(σ) = σ
        where σ(x) ⊑ Lr ∧ ∀z ∈ Globals : σ(z) ⊑ Lg

A different approach avoids default assumptions and annotations by building a control flow graph
for the entire program and analyzing it directly:
• We add additional edges to the control flow graph. For every call to function g, we add an
edge from the call site to the first instruction of g, and from every return statement of g to
the instruction following that call.
• When analyzing the first statement of a procedure, we generally gather analysis information
from each predecessor as usual. However, we take out all dataflow information related to
local variables in the callers. Furthermore, we add dataflow information for parameters in
the callee, initializing their dataflow values according to the actual arguments passed in at
each call site.
• When analyzing an instruction immediately after a call, we get dataflow information about
local variables from the previous statement. Information about global variables is taken from
the return sites of the function that was called. Information about the variable that the result
of the function call was assigned to comes from the dataflow information about the returned
value.
Now the example described above can be successfully analyzed. However, other programs
still cause problems, motivating a context-sensitive approach.
1.5 Context Sensitive Analysis
Context-sensitive analysis analyzes a function either multiple times, or parametrically, so that the
analysis results returned to different call sites reflect the different analysis results passed in at
those call sites.
We could get context sensitivity just by duplicating all callees. But this works only for non-
recursive programs.
A simple solution is to build a summary of each function, mapping dataflow input information
to dataflow output information. We will analyze each function once for each context, where a
context is an abstraction for a set of calls to that function. At a minimum, each context must track
the input dataflow information to the function.
Let’s look at how this approach allows the program given above to be proven safe by zero
analysis.
[Example will be given in class]
Things become more challenging in the presence of recursive functions, or more generally mu-
tual recursion. Let us consider context-sensitive interprocedural constant propagation analysis of
a factorial function called by main. We are not focused on the intraprocedural part of the analysis,
so we will just show the function in the form of Java or C source code:
int fact(int x) {
    if (x == 1)
        return 1;
    else
        return x * fact(x - 1);
}

void main() {
    int y = fact(2);
    int z = fact(3);
    int w = fact(getInputFromUser());
}
We can analyze the first two calls to fact within main in a straightforward way, and in fact if
we cache the results of analyzing fact(2) we can reuse this when analyzing the recursive call inside
fact(3).
For the third call to fact, the argument is determined at runtime and so constant propagation
uses ⊤ for the calling context. In this case the recursive call to fact() also has ⊤ as the calling
context. But we cannot look up the result in the cache yet as analysis of fact() with ⊤ has not
completed. A naïve approach would attempt to analyze fact() with ⊤ again, and would therefore
not terminate.
We can solve the problem by applying the same idea as in intraprocedural analysis. The recur-
sive call is a kind of a loop. We can make the initial assumption that the result of the recursive call
is ⊥, which is conceptually equivalent to information coming from the back edge of a loop. When
we discover the result is a higher point in the lattice than ⊥, we reanalyze the calling context (and
recursively, all calling contexts that depend on it). The algorithm to do so can be expressed as
follows:
type Context
    val fn : Function
    val input : L

type Summary
    val input : L
    val output : L

function ANALYZE(ctx, σi)
    σo ← results[ctx].output
    PUSH(analyzing, ctx)
    σo′ ← INTRAPROCEDURAL(ctx)
    POP(analyzing)
    if σo ≠ σo′ then
        results[ctx] ← Summary(σi, σo′)
        for c ∈ callers[ctx] do
            ADD(worklist, c)
        end for
    end if
    return σo′
end function
function RESULTSFOR(ctx, σi)
    σ ← results[ctx].output
    if σ ≠ ⊥ && σi ⊑ results[ctx].input then
        return σ                                   ▷ existing results are good
    end if
    results[ctx].input ← results[ctx].input ⊔ σi   ▷ keep track of possibly more general input
    if ctx ∈ analyzing then
        return ⊥
    else
        return ANALYZE(ctx, results[ctx].input)
    end if
end function
The following example shows that the algorithm generalizes naturally to the case of mutually
recursive functions:
bar() { if (*) return 2 else return foo() }
foo() { if (*) return 1 else return bar() }
main() { foo(); }
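To see the ⊥-for-in-progress trick in executable form, here is a compact Python sketch (my own,
not from the notes) that applies the summary-based idea to the foo/bar example, using a toy
"which constant can this function return" lattice with values ⊥ (BOT), a concrete constant, or ⊤ (TOP):

BOT, TOP = 'BOT', 'TOP'

def join(a, b):
    if a == BOT: return b
    if b == BOT: return a
    return a if a == b else TOP

# bar() { if (*) return 2 else return foo() };  foo() { if (*) return 1 else return bar() }
FUNCS = {'bar': (2, 'foo'), 'foo': (1, 'bar')}

results = {f: BOT for f in FUNCS}       # per-context summaries (no arguments, so context = function)
callers = {f: set() for f in FUNCS}     # who depends on whose summary
analyzing, worklist = [], []

def results_for(f, caller):
    callers[f].add(caller)
    if results[f] != BOT:               # existing summary is good enough
        return results[f]
    if f in analyzing:                  # recursive call in progress: optimistic ⊥, like a back edge
        return BOT
    return analyze(f)

def analyze(f):
    old = results[f]
    analyzing.append(f)
    const, callee = FUNCS[f]
    new = join(const, results_for(callee, f))   # the (trivial) intraprocedural part
    analyzing.pop()
    if new != old:
        results[f] = new
        worklist.extend(callers[f])             # callers must be reanalyzed
    return new

analyze('foo')
while worklist:
    analyze(worklist.pop())
print(results)   # both foo and bar may return either 1 or 2, i.e. TOP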
1.6 Precision
A notable part of the algorithm above is that if we are currently analyzing a context and are asked
to analyze it again, we return ⊥ as the result of the analysis. This has similar benefits to using ⊥ for
initial dataflow values on the back edges of loops: starting with the most optimistic assumptions
about code we haven't finished analyzing allows us to reach the best possible fixed point. The
following example program illustrates a function where the result of analysis will be better if we
assume ⊥ for recursive calls to the same context, vs. for example if we assumed ⊤:
function iterativeIdentity(x : int, y : int)
    if x <= 0
        return y
    else
        return iterativeIdentity(x - 1, y)

function main(z)
    w = iterativeIdentity(z, 5)
1.7 Termination
Under what conditions will context-sensitive interprocedural analysis terminate?
Consider the algorithm above. Analyze is called only when (1) a context has not been analyzed
yet, or when (2) it has just been taken off the worklist. So it is called once per reachable context,
plus once for every time a reachable context is added to the worklist.
We can bound the total number of worklist additions by (C) the number of reachable contexts,
times (H) the height of the lattice (we don’t add to the worklist unless results for some context
changed, i.e. went up in the lattice relative to an initial assumption of ⊥ or relative to the last
analysis result), times (N) the number of callers of that reachable context. C*N is just the number
of edges (E) in the inter-context call graph, so we can see that we will do intraprocedural analysis
O(E*H) times.
Thus the algorithm will terminate as long as the lattice is of finite height and there are a finite
number of reachable contexts. Note, however, that for some lattices, notably including constant
propagation, there are an unbounded number of lattice elements and thus an unbounded number
of contexts. If more than a finite number of contexts are reachable, the algorithm will not terminate.
So for lattices with an unbounded number of elements, we need to adjust the context-sensitivity
approach above to limit the number of contexts that are analyzed. The call-string approach
provides an easy, but naive, way to do this: call strings can be
cut off at a certain length. For example, if we have call strings “a b c” and “d e b c” (where c is the
most recent call site) with a cutoff of 2, the input dataflow information for these two call strings
will be merged and the analysis will be run only once, for the context identified by the common
length-two suffix of the strings, "b c". We can illustrate this by redoing the analysis of the factorial
example. The algorithm is the same as above; however, we use a different implementation of
G ET C TX that computes the call string suffix:
type Context
    val fn : Function
    val string : List[Z]
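GETCTX is left abstract in the notes; a minimal Python sketch of a call-string implementation with
cutoff k (the names Context and get_ctx are mine) could be:

from collections import namedtuple

Context = namedtuple('Context', ['fn', 'string'])    # mirrors the record above

def get_ctx(fn, caller_ctx, call_site, k):
    # append the current call site to the caller's call string and keep the last k entries;
    # contexts that share a length-k suffix are merged
    old = [] if caller_ctx is None else list(caller_ctx.string)
    return Context(fn, tuple((old + [call_site])[-k:]))

# e.g. with k = 2 the call strings "a b c" and "d e b c" both become ('b', 'c'),
# so the two calls share one context and their input information is joined.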
Acknowledgements
I thank Claire Le Goues for greatly appreciated extensions and refinements to these notes.
Lecture Notes: Pointer Analysis
1: z := 1
2: p := &z
3: *p := 2
4: print z
To analyze this program correctly we must be aware that at instruction 3, p points to z. If this
information is available we can use it in a flow function as follows:

f⟦*p := n⟧(σ) = [z ↦ α(n)]σ    for each z such that p must point to z
When we know exactly what a variable x points to, we have must-point-to information, and
we can perform a strong update of the target variable z, because we know with confidence that
assigning to p assigns to z. A technicality in the rule is quantifying over all z such that p must
point to z. How is this possible? It is not possible in C or Java; however, in a language with pass-
by-reference, for example C++, it is possible that two names for the same location are in scope.
Of course, it is also possible to be uncertain to which of several distinct locations p points:
1: z := 1
2: if (cond) p := &y else p := &z
3: *p := 2
4: print z
Now constant propagation analysis must conservatively assume that z could hold either 1 or
2. We can represent this with a flow function that uses may-point-to information and performs a
weak update, joining the assigned value with each possible target's old value:

f⟦*p := n⟧(σ) = [z ↦ α(n) ⊔ σ(z)]σ    for each z such that p may point to z
2 Andersen’s Points-To Analysis
Two common kinds of pointer analysis are alias analysis and points-to analysis. Alias analysis
computes sets S holding pairs of variables (p, q), where p and q may (or must) point to the same
location. Points-to analysis, as described above, computes a relation points-to(p, x), where p may
(or must) point to the location of the variable x. We will focus primarily on points-to analysis,
beginning with a simple but useful approach originally proposed by Andersen (PhD thesis: “Pro-
gram Analysis and Specialization for the C Programming Language”).
Our initial setting will be C programs. We are interested in analyzing instructions that are
relevant to pointers in the program. Ignoring for the moment memory allocation and arrays, we
can decompose all pointer operations into four types: taking the address of a variable, copying a
pointer from one variable to another, assigning through a pointer, and dereferencing a pointer:
I ::= ...
    | p := &x
    | p := q
    | *p := q
    | p := *q
Andersen’s points-to analysis is a context-insensitive interprocedural analysis. It is also a flow-
insensitive analysis, that is an analysis that does not consider program statement order. Context-
and flow-insensitivity are used to improve the performance of the analysis, as precise pointer
analysis can be notoriously expensive in practice.
We will formulate Andersen’s analysis by generating set constraints which can later be pro-
cessed by a set constraint solver using a number of technologies. Constraint generation for each
statement works as given in the following set of rules. Because the analysis is flow-insensitive,
we do not care what order the instructions in the program come in; we simply generate a set of
constraints and solve them.
⟦p := &x⟧ ↪ lx ∈ p          (address-of)
⟦p := q⟧ ↪ p ⊇ q            (copy)
⟦*p := q⟧ ↪ *p ⊇ q          (assign)
⟦p := *q⟧ ↪ p ⊇ *q          (dereference)
The constraints generated are all set constraints. The first rule states that a constant location lx,
representing the address of x, is in the set of locations pointed to by p. The second rule states that
the set of locations pointed to by p must be a superset of those pointed to by q. The last two rules
state the same, but take into account that one or the other pointer is dereferenced.
A number of specialized set constraint solvers exist and constraints in the form above can be
translated into the input for these. The dereference operation (the * in *p ⊇ q) is not standard
in set constraints, but it can be encoded—see Fähndrich’s Ph.D. thesis for an example of how
to encode Andersen’s points-to analysis for the BANE constraint solving engine. We will treat
constraint-solving abstractly using the following constraint propagation rules:
p ⊇ q        lx ∈ q
──────────────────── copy
       lx ∈ p

*p ⊇ q       lr ∈ p       lx ∈ q
────────────────────────────────── assign
       lx ∈ r

p ⊇ *q       lr ∈ q       lx ∈ r
────────────────────────────────── dereference
       lx ∈ p
We can now apply Andersen’s points-to analysis to the program above. Note that in this
example if Andersen’s algorithm says that the set p points to only one location lz , we have must-
point-to information, whereas if the set p contains more than one location, we have only may-
point-to information.
We can also apply Andersen’s analysis to programs with dynamic memory allocation, such as:
1: q := malloc1()
2: p := malloc2()
3: p := q
4: r := &p
5: s := malloc3()
6: *r := s
7: t := &s
8: u := *t
In this example, the analysis is run the same way, but we treat the memory cell allocated at
each malloc or new statement as an abstract location labeled by the location n of the allocation
point. We can use the rule:

⟦p := mallocn()⟧ ↪ ln ∈ p    (malloc)
We must be careful because a malloc statement can be executed more than once, and each time
it executes, a new memory cell is allocated. Unless we have some other means of proving that
the malloc executes only once, we must assume that if some variable p only points to one abstract
malloc’d location ln , that is still may-alias information (i.e. p points to only one of the many actual
cells allocated at the given program location) and not must-alias information.
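As an illustration (my own code, not from the notes), Andersen's constraint propagation can be
solved with a naive fixed-point loop. Statements are encoded as tuples, and my reading of the
malloc example above is used as input, with each mallocN treated as the abstract location 'mallocN':

from collections import defaultdict

def andersen(stmts):
    # Naive solver: apply the propagation rules until no points-to set changes.
    # This is the simple cubic fixed point, not an optimized worklist implementation.
    pts = defaultdict(set)              # pts[p]: abstract locations p may point to
    changed = True

    def add(p, locs):
        nonlocal changed
        new = locs - pts[p]
        if new:
            pts[p] |= new
            changed = True

    while changed:
        changed = False
        for op, a, b in stmts:
            if op == 'addr':                       # p := &x   ==>  lx ∈ p
                add(a, {b})
            elif op == 'copy':                     # p := q    ==>  p ⊇ q
                add(a, pts[b])
            elif op == 'store':                    # *p := q   ==>  *p ⊇ q
                for r in list(pts[a]):
                    add(r, pts[b])
            elif op == 'load':                     # p := *q   ==>  p ⊇ *q
                for r in list(pts[b]):
                    add(a, pts[r])
    return pts

prog = [('addr', 'q', 'malloc1'), ('addr', 'p', 'malloc2'), ('copy', 'p', 'q'),
        ('addr', 'r', 'p'),       ('addr', 's', 'malloc3'), ('store', 'r', 's'),
        ('addr', 't', 's'),       ('load', 'u', 't')]
print(dict(andersen(prog)))   # e.g. p may point to malloc1, malloc2, and malloc3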
Analyzing the efficiency of Andersen's algorithm, we can see that all constraints can be gener-
ated in a linear O(n) pass over the program. The solution size is O(n²) because each of the O(n)
variables defined in the program could potentially point to O(n) other variables.
We can derive the execution time from a theorem by David McAllester published in SAS'99.
There are O(n) flow constraints generated of the form p ⊇ q, *p ⊇ q, or p ⊇ *q. How many
times could a constraint propagation rule fire for each flow constraint? For a p ⊇ q constraint,
the copy rule may fire at most O(n) times, because there are at most O(n) premises of the proper
form lx ∈ q. However, a constraint of the form *p ⊇ q could cause O(n²) rule firings, because
there are O(n) premises each of the form lr ∈ p and lx ∈ q. With O(n) constraints of the form
*p ⊇ q and O(n²) firings for each, we have O(n³) constraint firings overall. A similar analysis
applies for p ⊇ *q constraints. McAllester's theorem states that the analysis with O(n³) rule firings can be
implemented in O(n³) time. Thus we have derived that Andersen's algorithm is cubic in the size
of the program, in the worst case.
So far we have ignored the fields of objects and structures. Consider the following code:

1: p.f := &x
2: p.g := &y
A field-insensitive analysis would tell us (imprecisely) that p.f could point to y. In order
to be more precise, we can track the contents of each field of each abstract location separately. In
the discussion below, we assume a setting in which we cannot take the address of a field; this
assumption is true for Java but not for C. We can define a new kind of constraints for fields:
⟦p := q.f⟧ ↪ p ⊇ q.f          (field-read)
⟦p.f := q⟧ ↪ p.f ⊇ q          (field-assign)
Now assume that objects (e.g. in Java) are represented by abstract locations l. We can process
field constraints with the following rules:
p ⊇ q.f       lq ∈ q       lf ∈ lq.f
────────────────────────────────────── field-read
        lf ∈ p

p.f ⊇ q       lp ∈ p       lq ∈ q
────────────────────────────────────── field-assign
        lq ∈ lp.f
If we run this analysis on the code above, we find that it can distinguish that p.f points to x
and p.g points to y.
Steensgaard's points-to analysis trades precision for speed: whenever the analysis discovers that
a pointer p may point both to some variable q and to some other variable r, Steensgaard's algorithm
unifies the abstract locations for q and r, creating a single abstract location representing both of
them. Now we can track the fact that p may point to either variable using a single points-to
relationship.
For example, consider the program below:
1: p := &x
2: r := &p
3: q := &y
4: s := &q
5: r := s
Andersen’s points-to analysis would produce the following graph:
[Points-to graph: p → x, q → y, r → p, r → q, s → q]
But in Steensgaard’s setting, when we discover that r could point both to q and to p, we must
merge q and p into a single node:
[Points-to graph: p and q merged into pq; r → pq, s → pq, pq → x, pq → y]
Notice that we have lost precision: by merging the nodes for p and q our graph now implies
that s could point to p, which is not the case in the actual program. But we are not done. Now
pq has two outgoing arrows, so we must merge nodes x and y. The final graph produced by
Steensgaard’s algorithm is therefore:
[Points-to graph: x and y merged into xy; r → pq, s → pq, pq → xy]
To define Steensgaard’s analysis more precisely, we will study a simplified version of that
ignores function pointers. It can be specified as follows:
⟦p := q⟧ ↪ join(*p, *q)           (copy)
⟦p := &x⟧ ↪ join(*p, x)           (address-of)
⟦p := *q⟧ ↪ join(*p, **q)         (dereference)
⟦*p := q⟧ ↪ join(**p, *q)         (assign)
With each abstract location p, we associate the abstract location that p points to, denoted *p.
Abstract locations are implemented as a union-find1 data structure so that we can merge two
abstract locations efficiently. In the rules above, we implicitly invoke find on an abstract location
before calling join on it, or before looking up the location it points to.
The join operation essentially implements a union operation on the abstract locations. How-
ever, since we are tracking what each abstract location points to, we must update this information
also. The algorithm to do so is as follows:
join(e1, e2)
    if (e1 == e2)
        return
    e1next = *e1
    e2next = *e2
    unify(e1, e2)
    join(e1next, e2next)
Once again, we implicitly invoke find on an abstract location before comparing it for equality,
looking up the abstract location it points to, or calling join recursively.
As an optimization, Steensgaard does not perform the join if the right hand side is not a pointer.
For example, if we have an assignment vp : q w and q has not been assigned any pointer value so
far in the analysis, we ignore the assignment. If later we find that q may hold a pointer, we must
revisit the assignment to get a sound result.
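A hypothetical Python sketch of the core of Steensgaard's algorithm (the class and function names
are mine); pointee plays the role of *e, and only the address-of rule is implemented because that is
all the example below needs:

class Loc:
    # an abstract location: union-find parent plus the location it points to (*e)
    def __init__(self, name):
        self.name, self.parent, self.pointee = name, self, None

def find(e):
    while e.parent is not e:
        e.parent = e.parent.parent      # path halving
        e = e.parent
    return e

def join(e1, e2):
    e1, e2 = find(e1), find(e2)
    if e1 is e2:
        return
    p1, p2 = e1.pointee, e2.pointee     # e1next, e2next
    e2.parent = e1                      # unify(e1, e2)
    if p1 is None:
        e1.pointee = p2
    elif p2 is not None:
        join(p1, p2)                    # join(e1next, e2next)

def assign_addr(p, x):                  # p := &x  ==>  join(*p, x)
    root = find(p)
    if root.pointee is None:
        root.pointee = find(x)
    else:
        join(root.pointee, x)

# Steensgaard's example below: a := &x; b := &y; y := &z or y := &x; c := &y
locs = {n: Loc(n) for n in 'abcxyz'}
for p, x in [('a', 'x'), ('b', 'y'), ('y', 'z'), ('y', 'x'), ('c', 'y')]:
    assign_addr(locs[p], locs[x])
assert find(locs['x']) is find(locs['z'])   # x and z merged, matching the graph in the text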
Steensgaard illustrated his algorithm using the following program:
1: a := &x
2: b := &y
3: if p then
4:     y := &z
5: else
6:     y := &x
7: c := &y
His analysis produces the following graph for this program:
1 See any algorithms textbook.
[Points-to graph: x and z merged into xz; a → xz, y → xz, b → y, c → y]
Rayside illustrates a situation in which Andersen must do more work than Steensgaard:
1: q := &x
2: q := &y
3: p := q
4: q := &z
After processing the first three statements, Steensgaard’s algorithm will have unified variables
x and y, with p and q both pointing to the unified node. In contrast, Andersen’s algorithm will
have both p and q pointing to both x and y. When the fourth statement is processed, Steensgaard’s
algorithm does only a constant amount of work, merging z in with the already-merged xy node.
On the other hand, Andersen’s algorithm must not just create a points-to relation from q to z, but
must also propagate that relationship to p. It is this additional propagation step that results in the
significant performance difference between these algorithms.
Analyzing Steensgaard’s pointer analysis for efficiency, we observe that each of n statements
in the program is processed once. The processing is linear, except for find operations on the union-
find data structure (which may take amortized time Opαpnqq each) and the join operations. We
note that in the join algorithm, the short-circuit test will fail at most Opnq times—at most once for
each variable in the program. Each time the short-circuit fails, two abstract locations are unified,
at cost Opαpnqq. The unification assures the short-circuit will not fail again for one of these two
variables. Because we have at most Opnq operations and the amortized cost of each operation
is at most Opαpnqq, the overall running time of the algorithm is near linear: Opn αpnqq. Space
consumption is linear, as no space is used beyond that used to represent abstract locations for all
the variables in the program text.
Based on this asymptotic efficiency, Steensgaard’s algorithm was run on a 1 million line pro-
gram (Microsoft Word) in 1996; this was an order of magnitude greater scalability than other
pointer analyses known at the time.
Steensgaard’s pointer analysis is field-insensitive; making it field-sensitive would mean that it
is no longer linear.
Acknowledgements
I thank Claire Le Goues for greatly appreciated extensions and refinements to these notes.
Lecture Notes: Object-Oriented Call Graph Construction
1 Dynamic dispatch
Analyzing object-oriented programs is challenging because it is not obvious which function is
called at a given call site. In order to construct a precise call graph, an analysis must determine
what the type of the receiver object is at each call site. Therefore, object-oriented call graph con-
struction algorithms must simultaneously build a call graph and compute aliasing information
describing to which objects (and thereby implicitly to which types) each variable could point.
the analysis proceeds as in Andersen’s algorithm, but the call graph is built up incrementally as
the analysis discovers the types of the objects to which each variable in the program can point.
Even 0-CFA analysis can be considerably more precise than Rapid Type Analysis. For example,
in the program below, RTA would assume that any implementation of foo() could be invoked at
any program location, but 0-CFA can distinguish the two call sites:
class A { A foo(A x) { return x; } }
class B extends A { A foo(A x) { return new D(); } }
class D extends A { A foo(A x) { return new A(); } }
class C extends A { A foo(A x) { return this; } }

// in main()
A x = new A();
while (...)
    x = x.foo(new B());    // may call A.foo, B.foo, or D.foo
A y = new C();
y.foo(x);                  // only calls C.foo
Acknowledgements
I thank Claire Le Goues for greatly appreciated extensions and refinements to these notes.
Lecture Notes: Control Flow Analysis for Functional
Languages
This lecture considers analysis of functional programs, using a minimal functional language with
the following syntax:

e ::= λx.e
    | x
    | e1 e2
    | let x = e1 in e2
    | if e0 then e1 else e2
    | n | e1 + e2 | ...
The grammar includes a definition of an anonymous function λx.e, where x is the function argu-
ment and e is the function body.1 The function can include any of the other types of expressions,
such as variables x or function calls e1 e2 , where e1 is the function to be invoked and e2 is passed
to that function as an argument. (In an imperative language this would more typically be written
e1(e2), but we follow the functional convention here, with parentheses included when helpful syn-
tactically.) We evaluate a function call (λx.e)(v) by substituting the argument v for all occurrences
of x in e. For example, (λx.x + 1)(3) evaluates to 3 + 1, which of course evaluates to 4.
A more interesting example is (λf.f 3)(λx.x + 1), which first substitutes the argument for f,
yielding (λx.x + 1) 3. Then we invoke the function, getting 3 + 1 which again evaluates to 4.
1.1 0-CFA
Static analysis can be just as useful in this type of language as in imperative languages, but immediate
complexities arise. For example: what is a program point in a language without obvious predeces-
sors or successors? Computation is intrinsically nested. Second, because functions are first-class
entities that can be passed around as variables, it’s not obvious which function is being applied
where. Although it is not obvious, we still need some way to figure it out, because the value a
function returns (which we may hope to track, such as through constant propagation analysis)
1 The formulation in PPA also includes a syntactic construct for explicitly recursive functions. The ideas extend
naturally, but we'll follow the simpler syntax for expository purposes.
will inevitably depend on which function is called, as well as its arguments. Control flow analysis2
seeks to statically determine which functions could be associated with which variables. Further,
because functional languages are not based on statements but rather expressions, it is appropriate
to consider both the values of variables and the values expressions evaluate to.
We thus consider each expression to be labeled with a label l ∈ L. Our analysis information σ
maps each variable and label to a lattice value. This first analysis is only concerned with possible
functions associated with each location or variable, and so the abstract domain is as follows:

σ ∈ Var ∪ L → P(λx.e)
The analysis information at any given program point, or for any program variable, is a set of
functions that could be stored in the variable or computed at that program point. Question: what
is the relation on this dataflow state?
We define the analysis via inference rules that generate constraints over the possible dataflow
values for each variable or labeled location; those constraints are then solved. We use ↪ to
denote a relation such that ⟦e⟧^l ↪ C can be read as "The analysis of expression e with label l
generates constraints C over dataflow state σ." For our first CFA, we can define inference rules for
this relation as follows:

⟦n⟧^l ↪ ∅    (const)                ⟦x⟧^l ↪ σ(x) ⊆ σ(l)    (var)
In the rules above, the constant or variable value flows to the program location l. Question:
what might the rules for the if-then-else or arithmetic operator expressions look like? The rule for function
calls is a bit more complex. We define rules for lambda and application as follows:
⟦e⟧^l0 ↪ C
─────────────────────────────────────── lambda
⟦λx.e^l0⟧^l ↪ C ∪ ({λx.e^l0} ⊆ σ(l))

⟦e1⟧^l1 ↪ C1        ⟦e2⟧^l2 ↪ C2
──────────────────────────────────────────────────────────────────────────── apply
⟦e1^l1 e2^l2⟧^l ↪ C1 ∪ C2 ∪ {∀λx.e0^l0 ∈ σ(l1) : σ(l2) ⊆ σ(x) ∧ σ(l0) ⊆ σ(l)}
The first rule just states that if a literal function is declared at a program location l, that function
is part of the lattice value σ(l) computed by the analysis for that location. Because we want to
analyze the data flow inside the function, we also generate a set of constraints C from the function
body and return those constraints as well.
The rule for application first analyzes the function and the argument to extract two sets of
constraints C1 and C2 . We then generate a conditional constraint, saying that for every literal
function λx.e0 that the analysis (eventually) determines the function may evaluate to, we must
generate additional constraints that capture value flow from the actual argument to the formal
function parameter, and from the function result to the calling expression.
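To make the constraint-based description concrete, here is a small Python sketch of a 0-CFA (my
own encoding: expressions are nested tuples, and instead of emitting explicit constraints it iterates
the same flow equations to a fixed point):

from collections import defaultdict

def cfa0(expr):
    sigma = defaultdict(set)              # variable name -> set of λ-terms it may hold

    def flows(e):
        # return the set of λ-terms expression e may evaluate to under the current sigma
        tag = e[0]
        if tag == 'num':
            return set()
        if tag == 'lam':                  # a literal function flows to its own location
            return {e}
        if tag == 'var':
            return set(sigma[e[1]])
        if tag == 'plus':
            flows(e[1]); flows(e[2])
            return set()
        if tag == 'app':
            funcs, args = flows(e[1]), flows(e[2])
            result = set()
            for (_, x, body) in funcs:
                sigma[x] |= args          # actual argument flows to the formal parameter
                result |= flows(body)     # function result flows to the call expression
            return result

    while True:                           # iterate to a fixed point
        snapshot = {v: frozenset(s) for v, s in sigma.items()}
        flows(expr)
        if snapshot == {v: frozenset(s) for v, s in sigma.items()}:
            return sigma

# (λf. f 3)(λx. x + 1): the analysis finds that f may only be the increment function
prog = ('app',
        ('lam', 'f', ('app', ('var', 'f'), ('num', 3))),
        ('lam', 'x', ('plus', ('var', 'x'), ('num', 1))))
print(cfa0(prog)['f'])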
Consider the first example program given above, properly labelled as ((λx.(x^a + 1^b)^c)^d (3)^e)^g.
We apply the rules one by one to analyze it. The first rule to use is apply (because that's the top-level program con-
struct). We will work this out together, but the generated constraints could look like:
There are many possible valid solutions to this constraint set; clearly we want a precise solution
that does not overapproximate. We will elide a formal definition and instead assert that a σ that
maps all variables and locations except d to ∅ and d to {λx.x + 1} satisfies this set of constraints.
Consider the following example program:

let add = λx. λy. x + y
let add5 = (add 5)^a5
let add6 = (add 6)^a6
let main = (add5 2)^m
This example illustrates currying, in which a function such as add that takes two arguments x
and y in sequence can be called with only one argument (e.g. 5 in the call labeled a5), resulting in
a function that can later be called with the second argument (in this case, 2 at the call labeled m).
The value 5 for the first argument in this example is stored with the function in the closure add5.
Thus when the second argument is passed to add5, the closure holds the value of x so that the sum
x + y = 5 + 2 = 7 can be computed.
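For readers more used to imperative languages, the same currying pattern can be reproduced
directly in Python; the closure object stores the captured x, just as described above:

def add(x):
    return lambda y: x + y     # the returned closure captures x

add5 = add(5)      # closure with x = 5
add6 = add(6)      # closure with x = 6
print(add5(2))     # 7: the stored x (5) plus the argument (2)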
The use of closures complicates program analysis. In this case, we create two closures, add5
and add6, within the program, binding 5 and 6 respectively as the value for x. But unfortunately
the program analysis cannot distinguish these two closures, because it only computes one value
for x, and since two different values are passed in, we learn only that x has the value ⊤; the
analysis trace (not shown here) computes this information for each variable and program point.
We can regain precision by adding context sensitivity: the rules below define an m-CFA analysis,
in which each expression is analyzed with respect to a call-string context δ of length at most m:
δ ⊢ ⟦λx.e^l0⟧^l ↪ {(λx.e^l0, δ)} ⊆ σ(l, δ)    (lambda)

δ ⊢ ⟦e1⟧^l1 ↪ C1        δ ⊢ ⟦e2⟧^l2 ↪ C2        δ′ = suffix(δ · l, m)
C3 = ⋃_{(λx.e0^l0, δ0) ∈ σ(l1, δ)} σ(l2, δ) ⊆ σ(x, δ′) ∧ σ(l0, δ′) ⊆ σ(l, δ) ∧ ⋃_{y ∈ FV(λx.e0)} σ(y, δ0) ⊆ σ(y, δ′)
C4 = ⋃_{(λx.e0^l0, δ0) ∈ σ(l1, δ)} analyze(δ′ ⊢ ⟦e0⟧^l0)
──────────────────────────────────────────────────────────────── apply
δ ⊢ ⟦e1^l1 e2^l2⟧^l ↪ C1 ∪ C2 ∪ C3 ∪ C4
These rules contain a call string context δ in which the analysis of each line of code is done. The
rules const and var are unchanged except for indexing σ by the current context δ. The lambda rule
now captures the context δ along with the lambda expression, so that when the lambda expression
is called the analysis knows in which context to look up the free variables.
The apply rule has gotten more complicated. A new context δ′ is formed by appending the
current call site l to the old call string, then taking the suffix of length m (or less). We now consider
all functions that may be called, as eventually determined by the analysis (our notation is slightly
loose, because the quantifier must be evaluated continuously for more matches as the analysis
goes along). For each, we produce constraints capturing the flow of values from the actual
arguments to the formal parameters, and from the result to the calling expression. We also produce constraints that
bind the free variables in the new context: all free variables in the called function flow from the
point δ0 at which the closure was captured. Finally, in C4 we collect the constraints that we get
from analyzing each of the potentially called functions in the new context δ′.
We can now reanalyze the earlier example, observing the benefit of context sensitivity. In the
table below, • denotes the empty calling context (e.g. when analyzing the main procedure):
Note three points about this analysis. First, we can distinguish the values of x in the two
calling contexts: x is 5 in the context a5 but it is 6 in the context a6. Second, the closures returned
to the variables add5 and add6 record the scope in which the free variable x was bound when the
closure was captured. This means, third, that when we invoke the closure add5 at program point
m, we will know that x was captured in calling context a5, and so when the analysis analyzes the
addition, it knows that x holds the constant 5 in this context. This enables constant propagation
to compute a precise answer, learning that the variable main holds the value 7.
1.4 Optional: Uniform k-Calling Context Sensitive Control Flow Analysis (k-CFA)
m-CFA was proposed recently by Might, Smaragdakis, and Van Horn as a more scalable version
of the original k-CFA analysis developed by Shivers for Scheme. While m-CFA now seems to be
a better tradeoff between scalability and precision, k-CFA is interesting both for historical reasons
and because it illustrates a more precise approach to tracking the values of variables in a closure.
The following example illustrates a situation in which m-CFA may be too imprecise:
t, •    ↦    (λz. x + y + z, r)
x, f    ↦    4
x, r    ↦    ⊤    (when analyzing the second call)
f, •    ↦    (λz. x + y + z, r)
e, •    ↦    ⊤
The k-CFA analysis is like m-CFA, except that rather than keeping track of the scope in which
a closure was captured, the analysis keeps track of the scope in which each variable captured in
the closure was defined. We use an environment η to track this. Note that since η can represent
a separate calling context for each variable, rather than merely a single context for all variables,
it has the potential to be more accurate, but also much more expensive. We can represent the
analysis information as follows:
Let us briefly analyze the complexity of this analysis. In the worst case, if a closure captures n
different variables, we may have a different call string for each of them. There are O(n^k) different
call strings for a program of size n, so if we keep track of one for each of n variables, we have
O(n^(nk)) different representations of the contexts for the variables captured in each closure. This
exponential blowup is why k-CFA scales so badly. m-CFA is comparatively cheap—there are
"only" O(n^k) different contexts for the variables captured in each closure—still exponential in k,
but polynomial in n for a fixed (and generally small) k.
We can now define the rules for k-CFA. They are similar to the rules for m-CFA, except that we
now have two contexts: the calling context δ, and the environment context η tracking the context
in which each variable is bound. When we analyze a variable x, we look it up not in the current
context δ, but the context η pxq in which it was bound. When a lambda is analyzed, we track the
current environment η with the lambda, as this is the information necessary to determine where
captured variables are bound. The application rule is actually somewhat simpler, because we do
not copy bound variables into the context of the called procedure:
δ, η ⊢ ⟦n⟧^l ↪ α(n) ⊆ σ(l, δ)    (const)            δ, η ⊢ ⟦x⟧^l ↪ σ(x, η(x)) ⊆ σ(l, δ)    (var)

δ, η ⊢ ⟦e1⟧^l1 ↪ C1        δ, η ⊢ ⟦e2⟧^l2 ↪ C2        δ′ = suffix(δ · l, k)
C3 = ⋃_{(λx.e0^l0, η0) ∈ σ(l1, δ)} σ(l2, δ) ⊆ σ(x, δ′) ∧ σ(l0, δ′) ⊆ σ(l, δ)
C4 = ⋃_{(λx.e0^l0, η0) ∈ σ(l1, δ)} C    where δ′, η0 ⊢ ⟦e0⟧^l0 ↪ C
───────────────────────────────────────────────────────────────── apply
δ, η ⊢ ⟦e1^l1 e2^l2⟧^l ↪ C1 ∪ C2 ∪ C3 ∪ C4
Now we can see how k-CFA analysis can more precisely analyze the latest example program.
In the simulation below, we give two tables: one showing the order in which the functions are
analyzed, along with the calling context δ and the environment η for each analysis, and the other
as usual showing the analysis information computed for the variables in the program:
function    δ    η
main        •    ∅
adde        t    {x ↦ t}
...h        r    {x ↦ t, y ↦ r}
adde        f    {x ↦ f}
...h        r    {x ↦ f, y ↦ r}
λz. ...     e    {x ↦ t, y ↦ r, z ↦ e}
Tracking the definition point of each variable separately is enough to restore precision in this
program. However, programs with this structure—in which analysis of the program depends on
different calling contexts for bound variables even when the context is the same for the function
eventually called—appear to be rare in practice. Might et al. observed no examples among the real
programs they tested in which k-CFA was more accurate than m-CFA—but k-CFA was often far
more costly. Thus at this point the m-CFA analysis seems to be a better tradeoff between efficiency
and precision, compared to k-CFA.
Acknowledgements
I thank Claire Le Goues for greatly appreciated extensions and refinements to these notes.
Lecture Notes: Axiomatic Semantics and
Hoare-style Verification
It has been found a serious problem to define these languages [ALGOL, FORTRAN,
COBOL] with sufficient rigor to ensure compatibility among all implementations...One
way to achieve this would be to insist that all implementations of the language shall
satisfy the axioms and rules of inference which underlie proofs of properties of pro-
grams expressed in the language. In effect, this is equivalent to accepting the axioms
and rules of inference as the ultimately definitive specification of the meaning of the
language.
C.A.R. Hoare, An Axiomatic Basis for Computer Programming, 1969
1 Axiomatic Semantics
Axiomatic semantics (or Hoare-style logic) defines the meaning of a statement in terms of its effects
on assertions of truth that can be made about the associated program. This provides a formal
system for reasoning about correctness. An axiomatic semantics fundamentally consists of: (1)
a language for stating assertions about programs (where an assertion is something like “if this
function terminates, x > 0 upon termination”), coupled with (2) rules for establishing the truth of
assertions. Various logics have been used to encode such assertions; for simplicity, we will begin
by focusing on first-order logic.
In this system, a Hoare Triple encodes such assertions:
{P } S {Q}
P is the precondition, Q is the postcondition, and S is a piece of code of interest. Relating this
back to our earlier understanding of program semantics, this can be read as "if P holds in some
state E and if ⟨S, E⟩ ⇓ E′, then Q holds in E′." We distinguish between partial ({P } S {Q}) and
total ([P ] S [Q]) correctness by saying that total correctness means that, given precondition P , S
will terminate, and Q will hold; partial correctness does not make termination guarantees. We
primarily focus on partial correctness.
Note that we are somewhat sloppy in mixing logical variables and program variables; all
WHILE variables implicitly range over integers, and all WHILE boolean expressions are also assertions.
We now define an assertion judgement E ⊨ A, read "A is true in E". The judgment is de-
fined inductively on the structure of assertions, and relies on the operational semantics of W HILE
arithmetic expressions. For example:
E ⊨ true          always
E ⊨ e1 = e2       iff ⟨e1, E⟩ ⇓ n = ⟨e2, E⟩ ⇓ n
E ⊨ e1 ≥ e2       iff ⟨e1, E⟩ ⇓ n ≥ ⟨e2, E⟩ ⇓ n
E ⊨ A1 ∧ A2       iff E ⊨ A1 and E ⊨ A2
...
E ⊨ ∀x.A          iff ∀n ∈ Z. E[x := n] ⊨ A
E ⊨ ∃x.A          iff ∃n ∈ Z. E[x := n] ⊨ A
Now we can define formally the meaning of a partial correctness assertion {P } S {Q}:

⊨ {P} S {Q}    iff    ∀E. ∀E′. (E ⊨ P ∧ ⟨S, E⟩ ⇓ E′) ⇒ E′ ⊨ Q

This tells us which triples are true, but not how to prove them; for that we use inference rules over
a derivability judgment ⊢ A, for example:

⊢ A        ⊢ B
──────────────── and
⊢ A ∧ B
We can now write ⊢ {P} S {Q} when we can derive a triple using derivation rules. There is
one derivation rule for each statement type in the language (sound familiar?):

─────────────────── skip          ────────────────────────────── assign
⊢ {P} skip {P}                    ⊢ {[e/x]P} x := e {P}

⊢ P′ ⇒ P        ⊢ {P} S {Q}        ⊢ Q ⇒ Q′
────────────────────────────────────────────── consq
⊢ {P′} S {Q′}
This rule is important because it lets us make progress even when the pre/post conditions
in our program don’t exactly match what we need (even if they’re logically equivalent) or are
stronger or weaker logically than ideal.
We can use this system to prove that triples hold. Consider {true} x := e {x = e}, using (in
this case) the assignment rule plus the rule of consequence:

⊢ true ⇒ e = e        ⊢ {e = e} x := e {x = e}
───────────────────────────────────────────────
⊢ {true} x := e {x = e}
We elide a formal statement of the soundness of this system. Intuitively, it expresses that the
axiomatic proof we can derive using these rules is equivalent to the operational semantics deriva-
tion (or that they are sound and relatively complete, that is, as complete as the underlying logic).
2 Proofs of a Program
Hoare-style verification is based on the idea of a specification as a contract between the imple-
mentation of a function and its clients. The specification consists of the precondition and a post-
condition. The precondition is a predicate describing the condition the code/function relies on for
correct operation; the client must fulfill this condition. The postcondition is a predicate describing
the condition the function establishes after correctly running; the client can rely on this condition
being true after the call to the function.
Note that if a client calls a function without fulfilling its precondition, the function can behave
in any way at all and still be correct. Therefore, if a function must be robust to errors, the precon-
dition should include the possibility of erroneous input, and the postcondition should describe
what should happen in case of that input (e.g. an exception being thrown).
The goal in Hoare-style verification is thus to (statically!) prove that, given a pre-condition,
a particular post-condition will hold after a piece of code executes. We do this by generating a
logical formula known as a verification condition, constructed such that, if true, we know that the
program behaves as specified. The general strategy for doing this, introduced by Dijkstra, relies
on the idea of a weakest precondition of a statement with respect to the desired post-condition. We
then show that the given precondition implies it. However, loops, as ever, complicate this strategy.
should know that x0 + y = 5 and x = x0 + z, where x0 is the old value of x. The program semantics
doesn't keep track of the old value of x, but we can express it by introducing a fresh, existentially
quantified variable x0. This gives us the following strongest postcondition for assignment:

sp(x := E, P ) = ∃x0. [x0/x]P ∧ x = [x0/x]E

In practice it is more common to reason backwards from the postcondition using Dijkstra's weakest
preconditions, which for the loop-free statements are:

wp(x := E, P ) = [E/x]P
wp(S; T, Q) = wp(S, wp(T, Q))
wp(if B then S else T, Q) = (B ⇒ wp(S, Q)) ∧ (¬B ⇒ wp(T, Q))
2.2 Loops
As usual, things get tricky when we get to loops. Consider:
2.3 Proving programs
Assume a version of WHILE that annotates loops with invariants: while_inv b do S. Given such a
program, and associated pre- and post-conditions:

{P} S_annot {Q}

The proof strategy constructs a verification condition VC(S_annot, Q) that we seek to prove true
(usually with the help of a theorem prover). VC is guaranteed to be stronger than wp(S_annot, Q)
but still weaker than P: P ⇒ VC(S_annot, Q) ⇒ wp(S_annot, Q). We compute VC using a verification
condition generation procedure VCGen, which mostly follows the definition of the wp function
discussed above:

VCGen(skip, Q) = Q
VCGen(S1; S2, Q) = VCGen(S1, VCGen(S2, Q))
VCGen(if b then S1 else S2, Q) = (b ⇒ VCGen(S1, Q)) ∧ (¬b ⇒ VCGen(S2, Q))
VCGen(x := e, Q) = [e/x]Q
VCGen(while_inv b do S, Q) = inv ∧ ∀x1...xn. ((inv ∧ b ⇒ VCGen(S, inv)) ∧ (inv ∧ ¬b ⇒ Q))
    where x1...xn are the variables modified in S
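As a sketch of how VCGen can be mechanized (my own encoding, with statements as tuples,
assertions as strings, and a deliberately naive textual substitution), the loop-free cases look like:

def subst(q, x, e):
    # [e/x]Q, done by naive textual replacement (adequate only for this tiny sketch)
    return q.replace(x, f'({e})')

def vcgen(stmt, q):
    tag = stmt[0]
    if tag == 'skip':
        return q
    if tag == 'assign':                  # ('assign', x, e)
        _, x, e = stmt
        return subst(q, x, e)
    if tag == 'seq':                     # ('seq', s1, s2)
        _, s1, s2 = stmt
        return vcgen(s1, vcgen(s2, q))
    if tag == 'if':                      # ('if', b, s1, s2)
        _, b, s1, s2 = stmt
        return f'(({b}) => {vcgen(s1, q)}) and ((not ({b})) => {vcgen(s2, q)})'
    raise ValueError(tag)

# VCGen(x := x + 1; y := x, y > 0) -- prints ((x + 1)) > 0
print(vcgen(('seq', ('assign', 'x', 'x + 1'), ('assign', 'y', 'x')), 'y > 0'))

The while case would add the invariant-based conjuncts shown above; it is omitted here for brevity.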
r := 1;
i := 0;
while i < m do
r := r ∗ n;
i := i + 1
We wish to prove that this function computes the m-th power of n and leaves the result in r.
We can state this with the postcondition r = n^m.
Next, we need to determine a precondition for the program. We cannot simply compute it
with wp because we do not yet know what the loop invariant is—and in fact, different loop invariants
could lead to different preconditions. However, a bit of reasoning will help. We must have m ≥ 0
because we have no provision for dividing by n, and we avoid the problematic computation of 0^0
by assuming n > 0. Thus our precondition will be m ≥ 0 ∧ n > 0.
A good heuristic for choosing a loop invariant is often to modify the postcondition of the loop
to make it depend on the loop index instead of some other variable. Since the loop index i runs
from 0 to m, we can guess that we should replace m with i in the postcondition r = n^m. This gives
us a first guess that the loop invariant should include r = n^i.
This loop invariant is not strong enough, however, because the loop invariant conjoined with
the loop exit condition should imply the postcondition. The loop exit condition is i ≥ m, but we
need to know that i = m. We can get this if we add i ≤ m to the loop invariant. In addition, for
proving the loop body correct, we will also need to add 0 ≤ i and n > 0 to the loop invariant.
Thus our full loop invariant will be r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0.
Our next task is to use weakest preconditions to generate proof obligations that will verify the
correctness of the specification. We will first ensure that the invariant is initially true when the
loop is reached, by propagating that invariant past the first two statements in the program:
{m ≥ 0 ∧ n > 0}
r := 1;
i := 0;
{r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0}

We propagate the loop invariant past i := 0 to get r = n^0 ∧ 0 ≤ 0 ≤ m ∧ n > 0. We propagate
this past r := 1 to get 1 = n^0 ∧ 0 ≤ 0 ≤ m ∧ n > 0. Thus our proof obligation is to show that:

m ≥ 0 ∧ n > 0 ⇒ 1 = n^0 ∧ 0 ≤ 0 ≤ m ∧ n > 0
We prove this with the following logic:
m ≥ 0 ∧ n > 0                       by assumption
1 = n^0                             because n^0 = 1 for all n > 0 and we know n > 0
0 ≤ 0                               by definition of ≤
0 ≤ m                               because m ≥ 0 by assumption
n > 0                               by the assumption above
1 = n^0 ∧ 0 ≤ 0 ≤ m ∧ n > 0         by conjunction of the above
To show the loop invariant is preserved, we have:
{r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0 ∧ i < m}
r := r ∗ n;
i := i + 1;
{r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0}

We propagate the invariant past i := i + 1 to get r = n^(i+1) ∧ 0 ≤ i + 1 ≤ m ∧ n > 0. We propagate
this past r := r ∗ n to get: r ∗ n = n^(i+1) ∧ 0 ≤ i + 1 ≤ m ∧ n > 0. Our proof obligation is therefore:

r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0 ∧ i < m
    ⇒ r ∗ n = n^(i+1) ∧ 0 ≤ i + 1 ≤ m ∧ n > 0
We can prove this as follows:

r ∗ n = n^i ∗ n = n^(i+1)           because r = n^i by assumption
0 ≤ i + 1                           because 0 ≤ i by assumption
i + 1 ≤ m                           because i < m by assumption
n > 0                               by assumption

Finally, we must show that the invariant, together with the negation of the loop condition, implies
the postcondition:

r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0 ∧ i ≥ m
    ⇒ r = n^m
We can prove it as follows:
r = n^i ∧ 0 ≤ i ≤ m ∧ n > 0 ∧ i ≥ m    by assumption
i = m                                   because i ≤ m and i ≥ m
r = n^m                                 substituting m for i in the assumption
Lecture Notes: Program Synthesis
Note: A complete, if lengthy, resource on inductive program synthesis is the book “Program Syn-
thesis” by Gulwani et al. [8]. You need not read the whole thing; I encourage you to investigate
the portions of interest to you, and skim as appropriate. I drew many references in this document
from there; if you are interested, it contains many more.
Program synthesis aims to automatically construct a program that satisfies a given specification.
That is, we seek a program P that satisfies some specification ϕ on all inputs. We take a lib-
eral view of P in discussing synthesis, as a wide variety of artifact types have been successfully
synthesized (anything that reads inputs or produces outputs). Beyond (relatively small) program
snippets of the expected variety, this includes protocols, interpreters, classifiers, compression al-
gorithms or implementations, scheduling policies, and cache coherence protocols for multicore pro-
cessors. The specification ϕ is an expression of the user intent, and may be expressed in one of
several ways: a formula, a reference implementation, input/output pairs, traces, demonstrations,
or a syntactic sketch, among other options.
Program synthesis can thus be considered along three dimensions:
(1) Expressing user intent. User intent (or ϕ in the above) can be expressed in a number of
ways, including logical specifications, input/output examples [4] (often with some kind of user- or
synthesizer-driven interaction), traces, natural language [3, 7, 13], or full or partial programs.
In this latter category lie reference implementations, such as executable specifications (which
give the desired output for a given input) or declarative specifications (which check whether a
given input/output pair is correct). Some synthesis techniques allow for multi-modal specifica-
given input/output pair is correct). Some synthesis techniques allow for multi-modal specifica-
tions, including pre- and post- conditions, safety assertions at arbitrary program points, or partial
program templates.
Such specifications can constrain two aspects of the synthesis problem:
• Structural properties, or internal computation steps. These are often expressed as a sketch
or template, but can be further constrained by assertions over the number or variety of op-
erations in a synthesized program (or number of iterations, number of cache misses, etc.,
depending on the synthesis problem in question). Indeed, one of the key principles behind
the scaling of many modern synthesis techniques lies in the way they syntactically restrict the
space of possible programs, often via a sketch, grammar, or DSL.
Note that basically all of the above types of specifications can be translated to constraints in
some form or another. Techniques that operate over multiple types of specifications can overcome
various challenges that come up over the course of an arbitrary synthesis problem. Different
specification types are more suitable for some types of problems than others. Alternatively, trace-
or sketch-based specifications can allow a synthesizer to decompose a synthesis problem at
intermediate program points.
(2) Search space of possible programs. The search space naturally includes programs, often con-
structed from subsets of normal programming languages. This can include a predefined set of considered
operators or control structures, defined as grammars. However, other spaces are considered for
various synthesis problems, like logics of various kinds, which can be useful for, e.g., synthesizing
graph/tree algorithms.
(3) Search technique. At a high level, there are two general approaches to logical synthesis:
• Deductive (or classic) synthesis (e.g., [15]), which maps a high-level (e.g. logical) specifica-
tion to an executable implementation. Such approaches are efficient and provably correct:
thanks to the semantics-preserving rules, only correct programs are explored. However,
they require complete specifications and sufficient axiomatization of the domain. These ap-
proaches are classically applied to e.g., controller synthesis.
• Inductive (sometimes called syntax-guided) synthesis, which takes a partial (and often multi-
modal) specification and constructs a program that satisfies it. These techniques are more
flexible in their specification requirements and require no axioms, but often at the cost of
lower efficiency and weaker bounded guarantees on the optimality of synthesized code.
Deductive synthesis shares quite a bit in common, conceptually, with compilation: rewriting a
specification according to various rules to achieve a new program at a different level of represen-
tation. We will (very) briefly overview Denali [11], a prototypical deductive synthesis technique,
using slides. However, deductive synthesis approaches assume a complete formal specification
of the desired user intent was provided. In many cases, this can be as complicated as writing the
program itself.
This has motivated new inductive synthesis approaches, towards which considerable modern
research energy has been dedicated. This category of techniques lends itself to a wide variety of
search strategies, including brute-force or enumerative [1] (you might be surprised!), probabilistic
inference/belief propagation [6], or genetic programming [12]. Alternatively, techniques based on
logical reasoning delegate the search problem to a constraint solver. We will spend more time on
this set of techniques.
2 Inductive Synthesis
Inductive synthesis uses inductive reasoning to construct programs in response to partial specifi-
cations. The program is synthesized via a symbolic interpretation of a space of candidates, rather
than by deriving the candidate directly. So, to synthesize such a program, we basically only require
an interpreter, rather than a sufficient set of derivation axioms. Inductive synthesis is applicable
to a variety of problem types, such as string transformation (FlashFill) [5], data extraction/pro-
cessing/wrangling [4, 19], layout transformation of tables or tree-shaped structures [20], graphics
(constructing structured, repetitive drawings) [9, 2], program repair [16, 14] (spoiler alert!), super-
optimization [11], and efficient synchronization, among others.
Inductive synthesis consists of several families of approaches; we will overview several promi-
nent examples, without claiming to be complete.
Syntax-Guided Synthesis (or SyGuS) formalizes the problem of program synthesis where the specifi-
cation is supplemented with a syntactic template. This defines a search space of possible programs
that the synthesizer effectively traverses. Many search strategies exist; two especially well-known
strategies are enumerative search (which can be remarkably effective, though it rarely scales) and
deductive or top-down search, which recursively reduces the problem into simpler sub-problems.
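To make the enumerative strategy concrete, here is a toy bottom-up enumerator (my own, with an
assumed tiny grammar of +, *, the input x, and small constants) that is pruned by input-output examples:

def enumerate_exprs(examples, max_size=4):
    # grow expressions by size until one agrees with every input-output example
    pool = {1: [('x', lambda x: x)] + [(str(c), lambda x, c=c: c) for c in (0, 1, 2)]}
    for size in range(1, max_size + 1):
        pool.setdefault(size, [])
        if size > 1:
            for lsize in range(1, size):               # combine smaller expressions
                for ls, lf in pool[lsize]:
                    for rs, rf in pool[size - lsize]:
                        pool[size].append((f'({ls} + {rs})', lambda x, lf=lf, rf=rf: lf(x) + rf(x)))
                        pool[size].append((f'({ls} * {rs})', lambda x, lf=lf, rf=rf: lf(x) * rf(x)))
        for s, f in pool[size]:
            if all(f(i) == o for i, o in examples):
                return s
    return None

# synthesize a program consistent with f(1)=3, f(2)=5, f(3)=7, e.g. (x + (x + 1))
print(enumerate_exprs([(1, 3), (2, 5), (3, 7)]))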
We now examine component-based oracle-guided program synthesis in detail, which illustrates the use of distin-
guishing oracles.
Functionality. φf unc denotes the functionality constraint that guarantees that the solution f satis-
fies the given input-output pairs:
1 These notes are inspired by Section III.B of Nguyen et al., ICSE 2013 [17] ...which provides a really beautifully clear
exposition of the work that originally proposed this type of synthesis in Jha et al., ICSE 2010 [10].
φfunc(L, α, β) ≝ ψconn(L, χ⃗, r, Q, R) ∧ φlib(Q, R) ∧ (α = χ⃗) ∧ (β = r)

ψconn(L, χ⃗, r, Q, R) ≝ ⋀_{x,y ∈ Q ∪ R ∪ χ⃗ ∪ {r}} (lx = ly ⇒ x = y)

φlib(Q, R) ≝ ⋀_{i=1}^{N} φi(χ⃗i, ri)
ψconn encodes the meaning of the location variables: If two locations are equal, then the values
of the variables defined at those locations are also equal. φlib encodes the semantics of the pro-
vided basic components, with φi representing the specification of component fi . The rest of φf unc
encodes that if the input to the synthesized function is α, the output must be β.
Almost done! φfunc provides constraints over a single input-output pair ⟨αi, βi⟩; we still need to
generalize it over all n provided pairs {⟨αi, βi⟩ | 1 ≤ i ≤ n}:

θ ≝ (⋀_{i=1}^{n} φfunc(L, αi, βi)) ∧ ψwfp(L, Q, R)
i“1
θ collects up all the previous constraints, and says that the synthesized function f should satisfy
all input-output pairs and the function has to be well formed.
LVal2Prog. The only real unknowns in all of θ are the values for the location variables L. So, the
solver that provides a satisfying assignment to θ is basically giving a valuation of L that we then
turn into a constructed program as follows:
Given a valuation of L, Lval2Prog(L) converts it to a program as follows: the i-th line of the
program is rj := fj(rσ(1), ..., rσ(η)) when lrj == i and ⋀_{k=1..η} (lχkj == lrσ(k)), where η is the
number of inputs for component fj and χkj denotes the k-th input parameter of component fj. The
program output is produced in line lr.
Example. Assume we have only one component, +, which has two inputs, χ1+ and χ2+. The output
variable is r+. Further assume that the desired program f has one input χ (which we call input0
in the actual program text) and one output r. Given the mapping for the location variables
{lr+ ↦ 1, lχ1+ ↦ 0, lχ2+ ↦ 0, lr ↦ 1, lχ ↦ 0}, the program looks like:

0 r0 := input0
1 r+ := r0 + r0
2 return r+

This occurs because the locations of the variables used as input to + are both line 0, which is also
the line of the program's input. lr, the return variable of the program, is defined on line 1, which
is also where the output of the + component (lr+) is located. We added the return on line 2 as
syntactic sugar.
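The conversion itself is mechanical enough to sketch in a few lines of Python (our own illustration,
with invented helper and key names; registers are named after the line that defines them, so the r+
of the example appears as r1):

# Sketch of Lval2Prog: turn a valuation of the location variables into
# straight-line code. Components are (name, arity, symbol); the valuation
# maps each component output and the program input/output to a line number.
def lval2prog(n_inputs, components, loc):
    prog = {i: f'r{i} := input{i}' for i in range(n_inputs)}   # inputs occupy lines 0..n_inputs-1
    for name, arity, symbol in components:
        i = loc[f'l_r_{name}']                                 # line defined by this component's output
        args = [f'r{loc[f"l_chi{k}_{name}"]}' for k in range(1, arity + 1)]
        prog[i] = f'r{i} := ' + f' {symbol} '.join(args)
    out = [f'{i} {prog[i]}' for i in sorted(prog)]
    out.append(f'{len(prog)} return r{loc["l_r"]}')            # return added as syntactic sugar
    return '\n'.join(out)

# The example from the notes: one + component and the valuation
# {l_r+ -> 1, l_chi1+ -> 0, l_chi2+ -> 0, l_r -> 1, l_chi -> 0}.
loc = {'l_r_plus': 1, 'l_chi1_plus': 0, 'l_chi2_plus': 0, 'l_r': 1, 'l_chi': 0}
print(lval2prog(1, [('plus', 2, '+')], loc))

Running the sketch on the valuation above prints the three-line program shown in the example.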
References
[1] R. Alur, R. Bodík, E. Dallal, D. Fisman, P. Garg, G. Juniwal, H. Kress-Gazit, P. Madhusudan,
M. M. K. Martin, M. Raghothaman, S. Saha, S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak,
and A. Udupa. Syntax-guided synthesis. In M. Irlbeck, D. A. Peled, and A. Pretschner,
editors, Dependable Software Systems Engineering, volume 40 of NATO Science for Peace and
Security Series, D: Information and Communication Security, pages 1–25. IOS Press, 2015.
[2] R. Chugh, B. Hempel, M. Spradlin, and J. Albers. Programmatic and direct manipulation,
together at last. SIGPLAN Not., 51(6):341–354, June 2016.
[3] A. Desai, S. Gulwani, V. Hingorani, N. Jain, A. Karkare, M. Marron, S. R, and S. Roy. Program
synthesis using natural language. In Proceedings of the 38th International Conference on Software
Engineering, ICSE ’16, pages 345–356, New York, NY, USA, 2016. ACM.
[5] S. Gulwani, W. R. Harris, and R. Singh. Spreadsheet data manipulation using examples.
Commun. ACM, 55(8):97–105, Aug. 2012.
[7] S. Gulwani and M. Marron. Nlyze: Interactive programming by natural language for spread-
sheet data analysis and manipulation. In Proceedings of the 2014 ACM SIGMOD International
Conference on Management of Data, SIGMOD ’14, pages 803–814, New York, NY, USA, 2014.
ACM.
[8] S. Gulwani, O. Polozov, and R. Singh. Program synthesis. Foundations and Trends in Program-
ming Languages, 4(1-2):1–119, 2017.
[9] B. Hempel and R. Chugh. Semi-automated svg programming via direct manipulation. In
Proceedings of the 29th Annual Symposium on User Interface Software and Technology, UIST ’16,
pages 379–390, New York, NY, USA, 2016. ACM.
[11] R. Joshi, G. Nelson, and K. Randall. Denali: A goal-directed superoptimizer. SIGPLAN Not.,
37(5):304–314, May 2002.
[12] G. Katz and D. Peled. Genetic programming and model checking: Synthesizing new mutual
exclusion algorithms. In Proceedings of the 6th International Symposium on Automated Technology
for Verification and Analysis, ATVA ’08, pages 33–47, Berlin, Heidelberg, 2008. Springer-Verlag.
[13] V. Le, S. Gulwani, and Z. Su. Smartsynth: Synthesizing smartphone automation scripts from
natural language. In Proceeding of the 11th Annual International Conference on Mobile Systems,
Applications, and Services, MobiSys ’13, pages 193–206, New York, NY, USA, 2013. ACM.
[14] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer. GenProg: A generic method for auto-
mated software repair. IEEE Transactions on Software Engineering, 38(1):54–72, 2012.
[15] Z. Manna and R. J. Waldinger. Toward automatic program synthesis. Commun. ACM,
14(3):151–165, Mar. 1971.
[16] S. Mechtaev, J. Yi, and A. Roychoudhury. Angelix: Scalable Multiline Program Patch Synthe-
sis via Symbolic Analysis. In International Conference on Software Engineering, ICSE ’16, pages
691–701, 2016.
[17] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra. Semfix: Program repair via
semantic analysis. In Proceedings of the 2013 International Conference on Software Engineering,
ICSE ’13, pages 772–781, Piscataway, NJ, USA, 2013. IEEE Press.
[18] O. Polozov and S. Gulwani. Flashmeta: A framework for inductive program synthesis. SIG-
PLAN Not., 50(10):107–126, Oct. 2015.
[19] R. Singh and S. Gulwani. Transforming spreadsheet data types using examples. In Proceedings
of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
POPL ’16, pages 343–356, New York, NY, USA, 2016. ACM.
[20] A. Solar-Lezama. Program Synthesis by Sketching. PhD thesis, Berkeley, CA, USA, 2008.
AAI3353225.
Lecture Notes: Symbolic Execution
values that would lead to that branch. After a given symbolic execution is complete, the engine
may go back to the branches taken and explore other paths through the program.
To get an intuition for how symbolic analysis works, consider abstractly executing a path
through the program above. As we go along the path, we will keep track of the (potentially
symbolic) values of variables, and we will also track the conditions that must be true in order for
us to take that path. We can write this in tabular form, showing the values of the path condition g
and symbolic environment E after each line:
line   g                                 E
0      true                              a ↦ α, b ↦ β, c ↦ γ
1      true                              ..., x ↦ 0, y ↦ 0, z ↦ 0
2      ¬α                                ..., x ↦ 0, y ↦ 0, z ↦ 0
5      ¬α ∧ β ≥ 5                        ..., x ↦ 0, y ↦ 0, z ↦ 0
9      ¬α ∧ β ≥ 5 ∧ 0 + 0 + 0 ≠ 3        ..., x ↦ 0, y ↦ 0, z ↦ 0
In the example, we arbitrarily picked the path where the abstract value of a, i.e. α, is false,
and the abstract value of b, i.e. β, is not less than 5. We build up a path condition out of these
boolean predicates as we hit each branch in the code. The assignment to x, y, and z updates the
symbolic state E with expressions for each variable; in this case we know they are all equal to 0.
At line 9, we treat the assert statement like a branch. In this case, the branch expression evaluates
to 0 + 0 + 0 ≠ 3, which is true, so the assertion is not violated.
Now, we can run symbolic execution again along another path. We can do this multiple times,
until we explore all paths in the program (exercise to the reader: how many paths are there in the
program above?) or we run out of time. If we continue doing this, eventually we will explore the
following path:
line   g                                          E
0      true                                       a ↦ α, b ↦ β, c ↦ γ
1      true                                       ..., x ↦ 0, y ↦ 0, z ↦ 0
2      ¬α                                         ..., x ↦ 0, y ↦ 0, z ↦ 0
5      ¬α ∧ β < 5                                 ..., x ↦ 0, y ↦ 0, z ↦ 0
6      ¬α ∧ β < 5 ∧ γ                             ..., x ↦ 0, y ↦ 1, z ↦ 0
7      ¬α ∧ β < 5 ∧ γ                             ..., x ↦ 0, y ↦ 1, z ↦ 2
9      ¬α ∧ β < 5 ∧ γ ∧ ¬(0 + 1 + 2 ≠ 3)          ..., x ↦ 0, y ↦ 1, z ↦ 2
Along this path, we have ¬α ∧ β < 5 ∧ γ. This means we assign y to 1 and z to 2, meaning that the
assertion 0 + 1 + 2 ≠ 3 on line 9 is false. Symbolic execution has found an error in the program!
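Once a failing path condition like this is known, an SMT solver can produce concrete inputs that
drive the program down the failing path. A minimal sketch, assuming the z3 Python bindings, with
names mirroring the symbolic constants above:

# Ask the solver for concrete values of a, b, c satisfying the failing path
# condition ¬α ∧ β < 5 ∧ γ ∧ ¬(0 + 1 + 2 ≠ 3).
from z3 import Bool, Int, IntVal, Solver, And, Not, sat

a, c = Bool('a'), Bool('c')
b = Int('b')
path_condition = And(Not(a), b < 5, c, Not(IntVal(0) + 1 + 2 != 3))

s = Solver()
s.add(path_condition)
if s.check() == sat:
    print(s.model())   # e.g. a = False, c = True, and some b below 5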
We start by defining symbolic analogs for arithmetic expressions and boolean predicates. We
will call symbolic predicates guards and use the metavariable g, as these will turn into guards for
paths the symbolic evaluator explores. These analogs are the same as the ordinary versions, except
that in place of variables we use symbolic constants:
E ∈ Var → as
Now we can define big-step rules for the symbolic evaluation of expressions, resulting in sym-
bolic expressions. Since we don’t have actual values in many cases, the expressions won’t evalu-
ate, but variables will be replaced with symbolic constants:
⟨n, E⟩ ⇓ n    big-int
We can likewise define rules for statement evaluation. These rules need to update not only the
environment E, but also a path guard g:
⟨a, E⟩ ⇓ as
-----------------------------------  big-assign
⟨g, E, x := a⟩ ⇓ ⟨g, E[x ↦ as]⟩

⟨P, E⟩ ⇓ g1      g ∧ g1 SAT      ⟨g ∧ g1, E, s1⟩ ⇓ ⟨g2, E1⟩
-----------------------------------------------------------  big-iftrue
⟨g, E, if P then s1 else s2⟩ ⇓ ⟨g2, E1⟩
The rules for skip, sequence, and assignment are compositional in the expected way, with the
arithmetic expression on the right-hand side of an assignment evaluating to a symbolic expression
rather than a value. The interesting rules are the ones for if. Here, we evaluate the condition to a
symbolic predicate g1. In the true case, we use an SMT solver to verify that the guard is satisfiable
when conjoined with the existing path condition. If that’s the case, we continue by evaluating the
true branch symbolically. The false case is analogous.
We leave the rule for while to the reader, following the principles behind the if rules above.
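To make the rules concrete, here is a small sketch of a symbolic evaluator for a tiny While-like
AST (our own illustration, not part of the formal development). It threads a guard and an
environment, forks on if, and stubs out the satisfiability check that a real engine would delegate
to an SMT solver.

# Big-step symbolic evaluation for a tiny While-like language.
# Statements: ('skip',), ('seq', s1, s2), ('assign', x, a), ('if', P, s1, s2).
# Expressions are evaluated to symbolic expressions, represented here as strings.

def sym_expr(a, E):
    """Evaluate an expression to a symbolic expression (a string)."""
    kind = a[0]
    if kind == 'num':            # ('num', n)
        return str(a[1])
    if kind == 'var':            # ('var', x): look up the symbolic value
        return E[a[1]]
    if kind == 'op':             # ('op', operator, left, right)
        return f'({sym_expr(a[2], E)} {a[1]} {sym_expr(a[3], E)})'
    raise ValueError(kind)

def satisfiable(guard):
    """Stub: a real symbolic executor would call an SMT solver here."""
    return True

def sym_exec(g, E, s):
    """Return a list of (guard, environment) results, one per explored path."""
    kind = s[0]
    if kind == 'skip':
        return [(g, E)]
    if kind == 'seq':
        return [res for g1, E1 in sym_exec(g, E, s[1])
                    for res in sym_exec(g1, E1, s[2])]
    if kind == 'assign':         # big-assign: update E, keep the guard
        x, a = s[1], s[2]
        return [(g, {**E, x: sym_expr(a, E)})]
    if kind == 'if':             # big-iftrue / big-iffalse: fork on the condition
        p = sym_expr(s[1], E)
        results = []
        for branch, cond in ((s[2], p), (s[3], f'(not {p})')):
            g1 = f'{g} and {cond}'
            if satisfiable(g1):
                results.extend(sym_exec(g1, E, branch))
        return results
    raise ValueError(kind)

# Example: if (b < 5) then y := 1 else skip, starting from symbolic b.
prog = ('if', ('op', '<', ('var', 'b'), ('num', 5)),
        ('assign', 'y', ('num', 1)), ('skip',))
for guard, env in sym_exec('true', {'b': 'beta', 'y': '0'}, prog):
    print(guard, env)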
References
[1] W. R. Bush, J. D. Pincus, and D. J. Sielaff. A static analyzer for finding dynamic programming
errors. Software: Practice and Experience, 30:775–802, 2000.
Mixing Type Checking and Symbolic Execution
expressions. The typing of symbolic reference expressions must be V , and some Λ00 , Λ0 such that Λ00 ∗ Λ0 ⊇ Λ0 ∗ Λ.
with respect to some memory typing. This memory typing is given Here we say that if we have n > 0 symbolic executions that each
by Λ0 and Λ. For technical reasons, we need to separate the loca- start with a path condition of true and where their resulting path
tions in the arbitrary memory on entry Λ0 from the locations that conditions are exhaustive (i.e., g(S1 ) ∨ . . . ∨ g(Sn ) is a tautology
come from allocations during symbolic execution Λ; to get typing meaning it holds under any valuation V ), then one of those sym-
for the entire memory, we write Λ0 ∗ Λ to mean the union of sub- bolic executions must match the concrete execution. Observe that
memory typings Λ0 and Λ with disjoint domains. Analogously, we in this statement, there is no premise on the resulting path condi-
also have a symbolic soundness relation that applies to memory- tion, but rather that we start with a initial path condition of true.
value pairs: ⟨M ; v⟩ ∼Λ0 ·V ·Λ ⟨m; s⟩.
As alluded to above, we first consider a notion of symbolic ex-
ecution soundness with respect to a concrete execution. This no- 4. M IXY: A Prototype of M IX for C
tion is what is stated in the second part of mix soundness (Theo- We have developed M IXY, a prototype tool for C that uses M IX
rem 1). Analogous to type soundness, it says that suppose we have to detect null pointer errors. M IXY mixes a (flow-insensitive) type
a concrete evaluation E ⊢ ⟨M ; e⟩ → r and a symbolic execution
on top of the CIL front-end for C [Necula et al. 2002], and our with MIX(symbolic). We use CIL’s built-in pointer analysis to find
type qualifier inference system, CilQual, is essentially a simplified the targets of calls through function pointers. Finally, we switch
CIL reimplementation of the type qualifier inference algorithm to symbolic execution for each function marked MIX(symbolic) that
described by Foster et al. [2006]. Our symbolic executor, Otter was discovered at the frontier.
[Reisner et al. 2010], uses STP [Ganesh and Dill 2007] as its SMT In this section, we describe implementation details that are not
solver and works in a manner similar to KLEE [Cadar et al. 2008]. captured by our formal system from Section 3:
Type Qualifiers and Null Pointer Errors. For this application, we • The formal system M IX is based on a type checking system
introduce two qualifier annotations for pointers: nonnull indicates where all types are given. Since type qualifier inference in-
that a pointer must not be null, and null indicates that a pointer may volves variables, we need to handle variables that are not yet
be null. Our inference system automatically annotates uses of the constrained to concrete type qualifiers when transitioning to a
NULL macro with the null qualifier annotation. The type qualifier symbolic block (Section 4.1).
inference system generates constraints among known qualifiers • We need to translate information about aliasing between blocks
and unknown qualifier variables, solves those constraints, and then (Section 4.2).
reports a warning if null values may flow to nonnull positions. • Since the same block or function may be called from multiple
Thus, our type qualifier inference system ensures pointers that may contexts, we need to avoid repeating analysis of the same func-
be null cannot be used where non-null pointers are required. tion (Section 4.3).
For example, consider the following C code: • Since functions can contain blocks and be recursive, we need
1 void free(int ∗nonnull x);
to handle recursion between typed and symbolic blocks (Sec-
2 int ∗id(int ∗p) { return p; } tion 4.4).
3 int ∗x = NULL; Finally, we present our initial experience with M IXY (Section 4.5),
4 int ∗y = id(x);
and we discuss some limitations and future work (Section 4.6).
5 free(y);
Here on line 1 we annotate free to indicate it takes a nonnull pointer. 4.1 Translating Null/Non-null and Type Variables
Then on line 3, we initialize x to be NULL, pass that value through At transitions between typed and symbolic blocks, we need to
id, and store the result in y on line 4. Then on line 5 we call free translate null and nonnull annotations back and forth.
with NULL.
Our qualifier inference system will generate the following types From Types to Symbolic Values. Suppose local variable x has
and constraints (with some simplifications, and ignoring l- and r- type int ∗nonnull. Then in the symbolic executor, we initialize x
value issues): to point to a fresh memory cell. If x has type int ∗null, then we ini-
free : int ∗ nonnull → void x : int ∗β tialize x to be (α:bool)?loc:0, where α is a fresh boolean that may
be either true or false, loc is a newly initialized pointer (described
id : int ∗γ → int ∗δ y : int ∗ in Section 4.2), and 0 represents null. Hence this expression means
null = β β=γ γ=δ δ= = nonnull x may be either null or non-null, and the symbolic executor will try
both possibilities.
Here β, γ, δ, and are variables that standard for unknown quali- A more interesting case occurs if a variable x has a type with
fiers. Put together, these constraints require null = nonnull, which a qualifier variable (e.g., int ∗β ). In this case, we first try to solve
is not allowed, and hence qualifier inference will report an error for the current set of constraints to see whether β has a solution as
this program. either null or nonnull, and if it does, we perform the translation
Our symbolic executor also looks for null pointer errors. The given above. Otherwise, if β could be either, we first optimistically
symbolic executor tracks C values at the bit level, using a repre- assume it is nonnull.
sentation similar to KLEE [Cadar et al. 2008]. A null pointer is We can safely use this assumption when returning from a typed
represented as the value 0, and the symbolic executor reports an block to a symbolic block since such a qualifier variable can only
error if 0 is ever dereferenced. be introduced when variables are aliased (e.g., via pointer assign-
Typed and Symbolic Blocks. In our formal system, we allow ment), a case that is separately taken into account by the M IXY
typed and symbolic blocks to be introduced anywhere in the memory model (Section 4.2).
program. In M IXY, these blocks can only be introduced around However, if we use this assumption when entering a symbolic
whole function bodies by annotating a function as MIX(typed) or block from a typed block, we may later discover our assumption
MIX(symbolic), and M IXY switches between qualifier inference and was too optimistic. For example, consider the following code:
symbolic execution at function calls. We can simulate blocks within 1 {t int ∗x; {s x = NULL; s} ; {s free(x); s} t}
functions by manually extracting the relevant code into a fresh
function. In the type system, x has type int ∗ β , where initially β is uncon-
Skipping some details for the moment, this switching process strained. Suppose that we analyze the symbolic block on the right
works as follows. When M IXY is invoked, the programmer speci- before the one on the left. This scenario could happen because the
fies (as a command-line option) whether to begin in a typed block or analysis of the enclosing typed block does not model control-flow
a symbolic block. In either case, we first initialize global variables order (i.e., is flow-insensitive). Then initially, we would think the
as appropriate for the analysis, and then analyze the program start- call to free was safe because we optimistically treat unconstrained
ing with main. In symbolic execution mode, we begin simulating β as nonnull—but this is clearly not accurate here.
the program at the entry function, and at calls to functions that are The solution is, as expected, to repeat our analyses until we
either unmarked or are marked as symbolic, we continue symbolic reach a fixed point. In this case, after we analyze the left symbolic
execution into the function body. At calls to functions marked with block, we will discover a new constraint on x, and hence when we
MIX(typed), we switch to type inference starting with that function. iterate and reanalyze the right symbolic block, we will discover the
In type inference mode, we begin analysis at the entry function error. We are computing a least fixed point because we start with
f, applying qualifier inference to f and all functions reachable from f optimistic assumptions—nothing is null—and then monotonically
in the call graph, up to the frontier of any functions that are marked discover more expressions may be null.
From Symbolic Values to Types. We use the SMT solver to block to a typed block, we add constraints to require that all may-
discover the possible final values of variables and translate those to aliased expressions have the same type.
the appropriate types. Given a variable x that is mapped to symbolic
expression s, we ask whether g ∧ (s = 0) is satisfiable where g is 4.3 Caching Blocks
the path condition. If the condition is satisfiable, we constrain x to In C, a block or function may be called from many different call
be null in the type system. There are no nonnull constraints to be sites, so we may need to analyze that block in the context of
added since they correspond to places in code where pointers are each call site. Since it can be quite costly to analyze that block
dereferenced, which is not reflected in symbolic values. repeatedly, we cache the calling context and the results of the
Thus, null pointers from symbolic blocks will lead to errors analysis for that block, and we reuse the results when the block
in typed blocks if they flow to a nonnull position; whereas null is called again with a compatible calling context. Conceptually,
pointers from typed blocks will lead to errors in symbolic blocks if caching is similar to compositional symbolic execution [Godefroid
they are dereferenced symbolically. 2007]; in M IXY, we implement caching as an extension to the
mix rules, using types to summarize blocks rather than symbolic
4.2 Aliasing and M IXY’s Memory Model constraints.
The formal system M IX defers all reasoning about aliasing to as
Caching Symbolic Blocks. Before we translate the types from the
late of a time as possible. As alluded to in Section 3, this choice
enclosing typed block to symbolic values, we first check to see
may be difficult to implement in practice given limitations in the
if we have previously analyzed the same symbolic block with a
constraint solver. Thus in M IXY, we use a pre-pass pointer analysis
compatible calling context. We define the calling context to be the
to initialize aliasing relationships.
types for all variables that will be translated into symbolic values,
Typed to Symbolic Block. When we switch from a typed block and we say two calling contexts are compatible if every variable
to a symbolic block, we initialize a fresh symbolic memory, which has the same type in both contexts.
may include pointers. We use a variant of the approach described in If we have not analyzed the symbolic block before with a com-
Section 3 that makes use of aliasing information to be more precise. patible calling context, we translate the types into symbolic values,
Rather than modeling memory as one big array, M IXY models analyze the symbolic block, and translate the symbolic values to
memory as a map from locations to separate arrays. Aliasing within types by adding type constraints as usual. At this point, we will
arrays is modeled as in our formalism, and aliasing between arrays cache the translated types for this calling context; we cache the
is modeled using Morris’s general axiom of assignment [Bornat translated types instead of the symbolic values since the translation
2000; Morris 1982]. from symbolic values to types is expensive. Otherwise, if we have
C also supports a richer variety of types such as arrays and analyzed the symbolic block before with a compatible calling con-
structs, as well as recursive data structures. M IXY lazily initializes text, we use the cached results by adding null type constraints for
memory in an incremental manner so that we can sidestep the issue null cached types in a manner similar to translating symbolic val-
of initializing an arbitrarily recursive data structure; M IXY only ues. Finally, in both cached and uncached cases, we restore aliasing
initializes as much as is required by the symbolic block. We use relationships and return to the enclosing typed block as usual.
CIL’s pointer analysis to determine possible points-to relationships
Caching Typed Blocks. Caching for typed blocks is similarly im-
and initialize memory accordingly.
plemented, but with one difference: unlike above, we first translate
Symbolic to Typed Block. An issue arises from using type infer- symbolic values into types, then use the translated types as the call-
ence when we switch from a symbolic block to a typed block. Con- ing context, and finally cache the final types as the result of analyz-
sider the following code snippets, which are identical except that y ing the typed block. We could have chosen to use symbolic values
points to r on the left, and y points to x on the right: as the calling context and the result, but since translating symbolic
values to types or comparing symbolic values both involve similar
{s {s number of calls to the SMT solver, we chose to use types to unify
// ∗y not aliased to x // ∗y aliased to x the implementation.
int ∗x = . . .; int ∗x = . . .;
int ∗r = . . ., ∗∗y = &r; int ∗∗y = &x; 4.4 Recursion between Typed and Symbolic Blocks
{t // okay {t // should fail
A typed block and a symbolic block may recursively call each
x = NULL; x = NULL;
assert nonnull(∗y); t} assert nonnull(∗y); t} other, and we found block recursion to be surprisingly common
s} s}
in our experiments. Without special handling for recursion, M IXY
will keep switching between them indefinitely since a block is
In both cases, at entry to the typed blocks, x and ∗y are assigned analyzed with a fresh initial state upon every entry. Therefore, we
types β ref and γ ref respectively, based on their current values. need to detect when recursion occurs, either beginning with a typed
Notice, however, that for the code on the right, we should also block or a symbolic block, and handle it specially.
have β = γ . Otherwise, after the assignment x = NULL, we will To handle recursion, we maintain a block stack to keep track of
not know that ∗y is also NULL. blocks that are currently being analyzed. Similar to a function call
This example illustrates an important difference between type stack, the block stack is a stack of blocks and their calling contexts,
inference and type checking. In type checking, this problem cannot which are defined in terms of types as in caching (Section 4.3). We
arise because every value has a known type, and we only have push blocks onto the stack upon entry and pop them upon return.
to check that those types are consistent. However, type inference Before entering a block, we first look for recursion by search-
actually has to discover richer information, such as what types must ing the block stack for the same block with a compatible calling
be equal because of aliasing, in order to find a valid typing. context. If recursion is detected, then instead of entering the block,
One solution to this problem would be to translate aliasing in- we mark the matching block on the stack as recursive and return an
formation from symbolic execution to and from type constraints. In assumption about the result. For the initial assumption, we use the
M IXY, we use an alternative solution that is easier to implement: calling context of the marked block, optimistically assuming that
we use CIL’s built-in may pointer analysis to conservatively dis- the block has no effect. When we eventually return to the marked
cover points-to relationships. When we transition from a symbolic block, we compare the assumption with the actual result of analyz-
ing the block. If the assumption is compatible with the actual result, Annotating function str next dirent as symbolic, while leaving
we return the result; otherwise, we re-analyze the block using the sysutil next dirent and str alloc text as typed, successfully elim-
actual result as the updated assumption until we reach a fixed point. inates this warning: the symbolic executor correctly determines
that p filename is not null when it is used as an argument to
4.5 Preliminary Experience str alloc text. And although the extra precision does not matter
We gained some initial experience with M IXY by running it on in this particular example, notice that the call on line 8 will be an-
vsftpd-2.0.7 and looking for false null pointer warnings from alyzed in a separate invocation of the type system than the call on
pure type qualifier inference that can be eliminated with the addi- line 10, thus introducing some context-sensitivity.
tion of symbolic execution. Since M IXY is in the prototype stage,
we started small. Rather than annotate all dereferences as requiring Case 3: Flow- and path-insensitivity in dns resolve and main
nonnull, we added just one nonnull annotation: 1 void main BLOCK(struct sockaddr∗∗ p sock) MIX(symbolic) {
sysutil free(void ∗ nonnull p ptr) MIX(typed) { . . . } 2 ∗p sock = NULL;
3 dns resolve(p sock, tunable pasv address);
The sysutil free function wraps the free system call and checks, at 4 }
run time, that the pointer argument is not null. In essence, our anal- 5 int main(. . .) {
ysis tries to check this property statically. We annotated sysutil free 6 . . .main BLOCK(&p addr); . . .; sysutil free(p addr); . . .
7 }
itself with MIX(typed), so M IXY need not symbolically execute its
8 void dns resolve(struct sockaddr∗∗ p sock,
body—our annotation captures the important part of its behavior 9 const char∗ p name) {
for our analysis. 10 struct hostent∗ hent = gethostbyname(p name);
We then ran M IXY on vsftpd, beginning with typing at the out- 11 sockaddr clear(p sock);
ermost level. We examined the resulting warnings and then tried 12 if (hent→h addrtype == AF INET)
adding MIX(symbolic) annotations to eliminate warnings. We suc- 13 sockaddr alloc ipv4(p sock);
ceeded in several cases, discussed next. We did not fully examine 14 else if (hent→h addrtype == AF INET6)
many of the other cases, but Section 4.6 describes some prelimi- 15 sockaddr alloc ipv6(p sock);
nary observations about M IXY in practice. Note that the code snip- 16 else
pets shown below are abbreviated, and many identifiers have been 17 die(”gethostbyname(): neither IPv4 nor IPv6”);
18 }
shortened. We should also point out that all the examples below
eliminate one or more imprecise qualifier flows from type qualifier There are two sources of null values in the code above: ∗p sock
inference; this pruning may or may not suppress a given warning, is set to null on line 2; and sockaddr clear, which was previously
depending on whether other flows could produce the same warning. marked as symbolic in Case 1 above, also sets ∗p sock to null on
Case 1: Flow and path insensitivity in sockaddr clear line 11 in dns resolve. Due to flow insensitivity in the type system,
both these null values eventually reach sysutil free on line 6, leading
1 void sockaddr clear(struct sockaddr ∗∗p sock) MIX(symbolic) { to false warnings.
2 if (∗p sock != NULL) { However, we can see that these null values are actually overwrit-
3 sysutil free(∗p sock);
ten by non-null values on lines 13 and 15, where sockaddr alloc ipv4
4 ∗p sock = NULL;
5 } or sockaddr alloc ipv6 allocates the appropriate structure and as-
6 } signs it to ∗p sock (not shown). We can eliminate these warnings
by extracting the code in main that includes both null sources into
This function is implicated in a false warning: due to flow insen- a symbolic block.
sitivity in the type system, the null assignment on line 4 flows to Also, there is a system call gethostbyname on line 10 that we
the argument to sysutil free on line 3, even though the assignment need to handle. Here, we define a well-behaved, symbolic model
occurs after the call. Also, the type system ignores the null check of gethostbyname that returns only AF INET and AF INET6 as is
on line 2 due to path-insensitivity. standard (not shown). This will cause the symbolic executor to skip
Marking sockaddr clear with MIX(symbolic) successfully resolves the last branch on line 17, which we need to do because we cannot
this warning: the symbolic executor determines that ∗p sock is not analyze die symbolically as it eventually calls a function pointer, an
null when used as an argument to sysutil free(). operation that our symbolic executor currently has limited support
Case 2: Path and context insensitivity in str next dirent for. We also cannot put gethostbyname or die in typed blocks in this
case, since ∗p sock is null and will result in false warnings.
1 void str alloc text(struct mystr∗ p str) MIX(typed);
2 const char∗ sysutil next dirent(. . .) MIX(typed) { Case 4: Helping symbolic execution with symbolic function point-
3 if (p dirent == NULL) return NULL; ers
4 }
5 void str next dirent(. . .) MIX(symbolic) { 1 void sysutil exit BLOCK(void) MIX(typed) {
6 const char∗ p filename = sysutil next dirent(. . .); 2 if (s exit func) (∗s exit func)();
7 if (p filename != NULL) 3 }
8 str alloc text(p filename); 4 void sysutil exit(int exit code) {
9 } 5 sysutil exit BLOCK();
10 . . .str alloc text(str); sysutil free(str); . . . 6 exit(exit code);
7 }
In this example, the function str next direct calls sysutil next dirent
on line 6, which may return a null value. Hence p filename may be In several instances, we would like to evaluate symbolic blocks
null. The type system ignores the null check on line 7 and due to that call sysutil exit, defined on line 4, which in turn calls exit to
context-insensitivity, conflates p filename with other variables, such terminate the program. However, before terminating the program,
as str, that are passed to str alloc text (lines 8 and 10). Hence the sysutil exit calls the function pointer s exit func on line 2. Our sym-
type system believes str may be null. However, str is used as an bolic executor does not support calling symbolic function pointers
argument to sysutil free (line 10), which leads the type system to (i.e., which targets are unknown), so instead, we extract the call to
report a false warning. s exit func into a typed block to analyze the call conservatively.
4.6 Discussion and Future Work execution to explore a small subset of the possible program paths,
Our preliminary experience provides some real-world validation since in the presence of loops with symbolic bounds, pure symbolic
of M IX’s efficacy in removing false positives. However, there are execution will not terminate in a reasonable amount of time (unless
several limitations to be addressed in future work. loop invariants are assumed). In the M IX formalism, in contrast, we
Most importantly, the overwhelming source of issues in M IXY use symbolic execution in a sound manner by exploring all paths,
is its coarse treatment of aliasing, which relies on an imprecise which is possible because we can use type checking on parts of the
pointer analysis. One immediate consequence is that it impedes per- code where symbolic execution takes too long. Of course, it is also
formance in the symbolic executor: if an imprecise pointer analysis possible to mix unsound symbolic execution with type checking, to
returns large points-to sets for pointers, translating symbolic point- gain whatever level of assurance the user desires.
ers to type constraints becomes slow because we first need to check There are several static analyses that can operate at different lev-
if each pointer target is valid in the current path condition by call- els of abstraction. Bandera [Corbett et al. 2000] is a model check-
ing the SMT solver, then determine if any valid targets may be null. ing system that uses abstraction-based program specialization, in
This leads to a significant slowdown: our small examples from Sec- which the user specifies the exact abstractions to use. System Z
tion 4.5 take less than a second to run without symbolic blocks, but is an abstract interpreter generator in which the user can tune the
from 5 to 25 seconds to run with one symbolic block, and about level of abstraction to trade off cost and precision [Yi and Harri-
60 seconds with two symbolic blocks. This issue is further com- son 1993]. Tuning these systems requires a deep knowledge of pro-
pounded by the fixed-point computation that repeatedly analyzes gram analysis. In contrast, we believe that M IX’s tradeoff is eas-
symbolic blocks nested in typed blocks or for handling recursion. ier to understand—one selects between essentially no abstraction
We also noticed several cases in vsftpd where calls to symbolic (symbolic execution), or abstraction in terms of types, which are
blocks would help introduce context sensitivity to distinguish calls arguably the most successful, well-understood static analysis.
to malloc. However, since we rely on a context-insensitive pointer M IX bears some resemblance to static analysis based on ab-
analysis to restore aliasing relationships when switching to typed straction refinement, such as SLAM [Ball and Rajamani 2002],
blocks, these calls will again be conflated. The issue especially af- BLAST [Henzinger et al. 2004], and client-driven pointer analy-
fects the analysis of typed-to-symbolic-to-typed recursive blocks sis [Guyer and Lin 2005]. These tools incrementally refine their
because the nested typed blocks are polluted by aliasing relation- abstraction of the program as necessary for analysis. Adding sym-
ships from the entire program. A similar issue occurs with symbolic bolic blocks to a program can be seen as introducing a very precise
blocks, as pointers are initialized to point to targets from the entire “refinement” of the program abstraction.
program, rather than being limited to the enclosing context. There are a few systems that combine type checking or infer-
Just as in the formalism, M IXY has to consider the entire mem- ence with other analyses. Dependent types provide an elegant way
ory when switching from typed to symbolic or vice-versa. Since to augment standard type with very rich type refinements [Xi and
this was a deliberate design decision, we were not surprised to find Pfenning 1999]. Liquid types combines Hindley-Milner style type
out that this has an impact on performance and leads to many limi- inference with predicate abstraction [Rondon et al. 2008, 2010].
tations in practice. Any temporary violation of type invariants from Hybrid types combines static typing, theorem proving, and dy-
symbolic blocks would immediately be flagged when switching to namic typing [Flanagan 2006]. All of these systems combine types
typed blocks, even if they have no effect on the code in the typed with refinements at a deep level—the refinements are placed “on
blocks. In the other direction, symbolic blocks are forced to start top of” the type structure. In contrast, M IX uses a much coarser
with a fresh memory when switching from typed blocks even if approach in which the precise analysis is almost entirely separated
there were no effects. from the type system, except for a thin interface between the two
Ultimately, we believe that these issues can be addressed with systems.
more precise information about aliasing as well as effects, perhaps Many others have considered the problem of combining pro-
extracted directly from the type inference constraints and symbolic gram analyses. A reduced product in abstract interpretation [Cousot
execution. and Cousot 1979] is a theoretical description of the most precise
In addition to checking for null pointer errors, we plan to ex- combination of two abstract domains. It is typically obtained via
tend M IXY to check other properties, such as data races, and to manually defined reduction operators that depend on the domains
mix other types of analysis together. We also plan to investigate au- being combined. Another example of combining abstract domains
tomatic placement of type/symbolic blocks, i.e., essentially using is the logical product of Gulwani and Tiwari [2006]. Combining
M IX as an intermediate language for combining analyses. One idea program analyses for compiler optimizations is also well-studied
is to begin with just typed blocks and then incrementally add sym- (e.g., Lerner et al. [2002]). In all of these cases, the combinations
bolic blocks to refine the result. This approach resembles abstrac- strengthen the kinds of derivable facts over the entire program.
tion refinement (e.g., Ball and Rajamani [2002]; Henzinger et al. With M IX, we instead analyze separate parts of the program with
[2004]), except the refinement can be obtained using completely different analyses. Finally, M IX was partially inspired by Nelson-
different analyses instead of one particular family of abstractions. Oppen style cooperating decision procedures [Nelson and Oppen
1979]. One important feature of the Nelson-Oppen framework is
that it provides an automatic method for distributing the appropri-
5. Related Work ate formula fragments to each solver (if that the solvers match cer-
tain criteria). Clearly M IX is targeted at solving a very different
There are several threads of related work. There have been numer- problem, but it would be an interesting direction for future work to
ous proposals for static analyses based on type systems; see Pals- try to extend M IX into a similar framework that can automatically
berg and Millstein [2008] for pointers. Symbolic execution was first integrate analyses that have appropriately structured interfaces.
proposed by King [1976] as an enhanced testing strategy, but was
difficult to apply for many years. Recently, SMT solvers have be-
come very powerful, making symbolic execution much more at- 6. Conclusion
tractive as even very complex path conditions can be solved sur- We presented M IX, a new approach for mixing type checking and
prisingly fast. There have been many recent, impressive results us- symbolic execution to trade off efficiency and precision. The key
ing symbolic execution for bug finding [Cadar et al. 2006, 2008; feature of our approach is that the mixed systems are essentially
Godefroid et al. 2005; Sen et al. 2005]. These systems use symbolic completely independent, and they are used in an off-the-shelf man-
ner. Only at the boundaries between typed blocks—which the user James C. King. Symbolic execution and program testing. Commun. ACM,
inserts to indicate where type checking should be used—and sym- 19(7):385–394, 1976.
bolic blocks—the symbolic checking annotation—do we invoke Sorin Lerner, David Grove, and Craig Chambers. Composing dataflow
special mix rules to translate information between the two sys- analyses and transformations. In Principles of Programming Languages
tems. We proved that M IX is sound (which implies that type check- (POPL), pages 270–282, 2002.
ing and symbolic execution are also independently sound). We Rupak Majumdar and Koushik Sen. Hybrid concolic testing. In Inter-
also described a preliminary implementation, M IXY, which per- national Conference on Software Engineering (ICSE), pages 416–426,
forms null/non-null type qualifier inference for C. We identified 2007.
several cases in which symbolic execution could eliminate false Joe M. Morris. A general axiom of assignment. Assignment and linked
positives from type inference. In sum, we believe that M IX provides data structure. A proof of the Schorr-Waite algorithm. In Theoretical
a promising new approach to trade off precision and efficiency in Foundations of Programming Methodology, pages 25–51, 1982.
static analysis. George C. Necula, Scott McPeak, Shree Prakash Rahul, and Westley
Weimer. CIL: Intermediate language and tools for analysis and transfor-
mation of C programs. In Compiler Construction (CC), pages 213–228,
Acknowledgments 2002.
We would like to thank the anonymous reviewers and Patrice Gode- Greg Nelson and Derek C. Oppen. Simplification by cooperating decision
froid for their helpful comments and suggestions. This research was procedures. ACM Trans. Program. Lang. Syst., 1(2):245–257, 1979.
supported in part by DARPA ODOD.HR00110810073, NSF CCF- Jens Palsberg and Todd Millstein. Type Systems: Advances and Applica-
0541036, and NSF CCF-0915978. tions. In The Compiler Design Handbook: Optimizations and Machine
Code Generation, chapter 9. 2008.
References Polyvios Pratikakis, Jeffrey S. Foster, and Michael W. Hicks. Locksmith:
context-sensitive correlation analysis for race detection. In Program-
Thomas Ball and Sriram K. Rajamani. The SLAM project: debugging
ming Language Design and Implementation (PLDI), pages 320–331,
system software via static analysis. In Principles of Programming
2006.
Languages (POPL), pages 1–3, 2002.
Elnatan Reisner, Charles Song, Kin-Keung Ma, Jeffrey S. Foster, and Adam
Richard Bornat. Proving pointer programs in Hoare logic. In Mathematics Porter. Using symbolic evaluation to understand behavior in config-
of Program Construction (MPC), pages 102–126, 2000. urable software systems. In International Conference on Software Engi-
Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and neering (ICSE), 2010. To appear.
Dawson R. Engler. EXE: automatically generating inputs of death. In Patrick M. Rondon, Ming Kawaguci, and Ranjit Jhala. Liquid types.
Computer and Communications Security (CCS), pages 322–335, 2006. In Programming Language Design and Implementation (PLDI), pages
Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: Unassisted 159–169, 2008.
and automatic generation of high-coverage tests for complex systems Patrick M. Rondon, Ming Kawaguchi, and Ranjit Jhala. Low-level liquid
programs. In Operating Systems Design and Implementation (OSDI), types. In Principles of Programming Languages (POPL), pages 131–
pages 209–224, 2008. 144, 2010.
James C. Corbett, Matthew B. Dwyer, John Hatcliff, Shawn Laubach, Koushik Sen, Darko Marinov, and Gul Agha. CUTE: a concolic unit testing
Corina S. Păsăreanu, Robby, and Hongjun Zheng. Bandera: extracting engine for C. In Foundations of Software Engineering (FSE), pages 263–
finite-state models from Java source code. In International Conference 272, 2005.
on Software Engineering (ICSE), pages 439–448, 2000.
Hongwei Xi and Frank Pfenning. Dependent types in practical program-
Patrick Cousot and Radhia Cousot. Systematic design of program analysis ming. In Principles of Programming Languages (POPL), pages 214–
frameworks. In Principles of Programming Languages (POPL), pages 227, 1999.
269–282, 1979.
Kwangkeun Yi and Williams Ludwell Harrison, III. Automatic generation
Cormac Flanagan. Hybrid type checking. In Principles of Programming and management of interprocedural program analyses. In Principles of
Languages (POPL), pages 245–256, 2006. Programming Languages (POPL), pages 246–259, 1993.
Jeffrey S. Foster, Robert Johnson, John Kodumal, and Alex Aiken. Flow-
insensitive type qualifiers. ACM Trans. Program. Lang. Syst., 28(6):
1035–1087, 2006.
Vijay Ganesh and David L. Dill. A decision procedure for bit-vectors
and arrays. In Computer-Aided Verification (CAV), pages 519–531, July
2007.
Patrice Godefroid. Compositional dynamic test generation. In Principles of
Programming Languages (POPL), pages 47–54, 2007.
Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: directed auto-
mated random testing. In Programming Language Design and Imple-
mentation (PLDI), pages 213–223, 2005.
Sumit Gulwani and Ashish Tiwari. Combining abstract interpreters. In Pro-
gramming Language Design and Implementation (PLDI), pages 376–
386, 2006.
Samuel Z. Guyer and Calvin Lin. Error checking with client-driven pointer
analysis. Sci. Comput. Program., 58(1-2):83–114, 2005.
Thomas A. Henzinger, Ranjit Jhala, Rupak Majumdar, and Kenneth L.
McMillan. Abstractions from proofs. In Principles of Programming
Languages (POPL), pages 232–244, 2004.
Khoo Yit Phang, Bor-Yuh Evan Chang, and Jeffrey S. Foster. Mixing
type checking and symbolic execution (extended version). Technical
Report CS-TR-4954, Department of Computer Science, University of
Maryland, College Park, 2010.
Lecture Notes: Concolic Testing
1 Motivation
Companies today spend a huge amount of time and energy testing software to determine whether
it does the right thing, and to find and then eliminate bugs. A major challenge is writing a set of
test cases that covers all of the source code, as well as finding inputs that lead to difficult-to-trigger
corner case defects.
Symbolic execution, discussed in the last lecture, is a promising approach to exploring differ-
ent execution paths through programs. However, it has significant limitations. For paths that are
long and involve many conditions, SMT solvers may not be able to find satisfying assignments
to variables that lead to a test case that follows that path. Other paths may be short but involve
computations that are outside the capabilities of the solver, such as non-linear arithmetic or cryp-
tographic functions. For example, consider the following function:
testme(int x, int y){
if(bbox(x)==y){
ERROR;
} else {
// OK
}
}
If we assume that the implementation of bbox is unavailable, or is too complicated for a the-
orem prover to reason about, then symbolic execution may not be able to determine whether the
error is reachable.
Concolic testing overcomes these problems by combining concrete execution (i.e. testing) with
symbolic execution.1 Symbolic execution is used to solve for inputs that lead along a certain
path. However, when a part of the path condition is infeasible for the SMT solver to handle, we
substitute values from a test run of the program. In many cases, this allows us to make progress
towards covering parts of the code that we could not reach through either symbolic execution or
randomly generated tests.
2 Goals
We will consider the specific goal of automatically unit testing programs to find assertion viola-
tions and run-time errors such as divide by zero. We can reduce these problems to input genera-
tion: given a statement s in program P, compute input i such that P(i) executes s.2 For example,
1 The word concolic is a portmanteau of concrete and symbolic.
2 This formulation is due to Wolfram Schulte.
if we have a statement assert x > 5, we can translate that into the code:
1 if (!(x > 5))
2 ERROR;
Now if line 2 is reachable, the assertion is violated. We can play a similar trick with run-time
errors. For example, a statement involving division x = 3 / i can be placed under a guard:
1 if (i != 0)
2 x = 3 / i;
3 else
4 ERROR;
3 Overview
Consider the testme example from the motivating section. Although symbolic analysis cannot
solve for values of x and y that allow execution to reach the error, we can generate random test
cases. These random test cases are unlikely to reach the error: for each x there is only one y that
will work, and random input generation is unlikely to find it. However, concolic testing can use
the concrete value of x and the result of running bbox(x) in order to solve for a matching y value.
Running the code with the original x and the solution for y results in a test case that reaches the
error.
In order to understand how concolic testing works in detail, consider a more realistic and more
complete example:
1 int double (int v) {
2 return 2*v;
3 }
4
5 void bar(int x, int y) {
6 z = double (y);
7 if (z == x) {
8 if (x > y+10) {
9 ERROR;
10 }
11 }
12 }
We want to test the function bar. We start with random inputs such as x = 22, y = 7. We
then run the test case and look at the path that is taken by execution: in this case, we compute
z = 14 and skip the outer conditional. We then execute symbolically along this path. Given inputs
x = x0, y = y0, we discover that at the end of execution z = 2 * y0, and we come up with a path
condition 2 * y0 ≠ x0.
In order to reach other statements in the program, the concolic execution engine picks a branch
to reverse. In this case there is only one branch touched by the current execution path; this is the
branch that produced the path condition above. We negate the path condition to get 2 * y0 = x0
and ask the SMT solver to give us a satisfying solution.
Assume the SMT solver produces the solution x0 = 2, y0 = 1. We run the code with that input.
This time the first branch is taken but the second one is not. Symbolic execution returns the same
end result, but this time produces a path condition 2 * y0 = x0 ∧ x0 ≤ y0 + 10.
Now to explore a different path we could reverse either test, but we’ve already explored the
path that involves negating the first condition. So in order to explore new code, the concolic
execution engine negates the condition from the second if statement, leaving the first as-is. We
hand the formula 2 * y0 = x0 ∧ x0 > y0 + 10 to an SMT solver, which produces a solution
x0 = 30, y0 = 15. This input leads to the error.
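The branch-reversal step itself is easy to sketch with an SMT solver. Assuming the z3 Python
bindings, and using the path condition from the second run above:

from z3 import Ints, Solver, And, Not, sat

x0, y0 = Ints('x0 y0')
# Path condition observed on the run with x0 = 2, y0 = 1:
# the first branch was taken, the second was not.
conds = [2 * y0 == x0, x0 <= y0 + 10]
# Keep the first condition, negate the second, and solve.
s = Solver()
s.add(And(conds[0], Not(conds[1])))
if s.check() == sat:
    print(s.model())   # some satisfying input, e.g. x0 = 30, y0 = 15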
The example above involves no problematic SMT formulas, so regular symbolic execution
would suffice. The following example illustrates a variant of the example in which concolic exe-
cution is essential:
1 int foo(int v) {
2 return v*v%50;
3 }
4
5 void baz(int x, int y) {
6 z = foo(y);
7 if (z == x) {
8 if (x > y+10) {
9 ERROR;
10 }
11 }
12 }
Although the code to be tested in baz is almost the same as bar above, the problem is more
difficult because of the non-linear arithmetic and the modulus operator in foo. If we take the
same two initial inputs, x = 22, y = 7, symbolic execution gives us the formula z = (y0 * y0) % 50,
and the path condition is x0 ≠ (y0 * y0) % 50. This formula is not linear in the input y0, and so it
may defeat the SMT solver.
We can address the issue by treating foo, the function that includes nonlinear computation,
concretely instead of symbolically. In the symbolic state we now get z = foo(y0), and for y0 = 7
we have z = 49. The path condition becomes foo(y0) ≠ x0, and when we negate this we get
foo(y0) = x0, or 49 = x0. This is trivially solvable with x0 = 49. We leave y0 = 7 as before;
this is the best choice because y0 is an input to foo(y0), so if we change it, then setting x0 = 49 may
not lead to taking the first conditional. In this case, the new test case of x = 49, y = 7 finds the
error.
4 Implementation
Ball and Daniel [1] give the following pseudocode for concolic execution (which they call dynamic
symbolic execution):
1 i = an input to program P
2 while defined(i):
3 p = path covered by execution P(i)
4 cond = pathCondition(p)
5 s = SMT(Not(cond))
6 i = s.model()
Broadly, this just systematizes the approach illustrated in the previous section. However, a
number of details are worth noting:
First, when negating the path condition, there is a choice about how to do it. As discussed
above, the usual approach is to put the path conditions in the order in which they were generated
by symbolic execution. The concolic execution engine may target a particular region of code for
execution. It finds the first branch for which the path to that region diverges from the current test
case. The path conditions are left unchanged up to this branch, but the condition for this branch
is negated. Any conditions beyond the branch under consideration are simply omitted. With this
approach, the solution provided by the SMT solver will result in execution reaching the branch
and then taking it in the opposite direction, leading execution closer to the targeted region of code.
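This prefix-and-negate step is small enough to sketch (our own illustration, assuming the z3 Python
bindings); here conds holds the branch conditions in the order symbolic execution generated them,
and k is the index of the branch where the path to the target diverges:

from z3 import And, Not, Solver, sat

def targeted_query(conds, k):
    # Keep conditions before branch k, negate branch k, drop the rest.
    return And(*conds[:k], Not(conds[k]))

def next_input(conds, k):
    s = Solver()
    s.add(targeted_query(conds, k))
    return s.model() if s.check() == sat else None

A full engine would also record which (prefix, branch) combinations it has already solved, so that
it does not regenerate a path it explored before.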
Second, when generating the path condition, the concolic execution engine may choose to
replace some expressions with constants taken from the run of the test case, rather than treating
those expressions symbolically. These expressions can be chosen for one of several reasons. First,
we may choose formulas that are difficult to invert, such as non-linear arithmetic or cryptographic
hash functions. Second, we may choose code that is highly complex, leading to formulas that are
too large to solve efficiently. Third, we may decide that some code is not important to test, such
as low-level libraries that the code we are writing depends on. While these libraries may sometimes
be analyzable, if they add no value to the testing process they simply make the formulas harder to
solve than they would be if the library calls were evaluated on concrete data.
5 Acknowledgments
The structure of these notes and the examples are adapted from a presentation by Koushik Sen.
References
[1] T. Ball and J. Daniel. Deconstructing dynamic symbolic execution. In Proceedings of the 2014
Marktoberdorf Summer School on Dependable Software Systems Engineering, 2015.
Strictly Declarative Specification of Sophisticated Points-to Analyses
improve performance in many ways. First, it can re-order variables in the intermediate relation
and, thus, introduce a new index, so that further joins are more efficient. Second, it can cache
intermediate results, implementing the “view materialization” database optimization. Third, it can
be used to guide the query optimizer to perform joins between smaller relations first, so as to
minimize intermediate results. Finally, it can be used to project out unnecessary variables, thus
keeping intermediate results small.

Many of these benefits can be obtained in our simple pointer analysis program. Consider the 3-way
join in lines 11-13 of the above “optimized” program. Since relation VarPointsTo is recursive and
used twice, either of its instances can be thought of as a “small” relation from the perspective of
join efficiency. Specifically, under semi-naive evaluation, one can think of the above rule (in
lines 10-13) as equivalent to the following delta-rewritten program:

∆InstanceFieldPointsTo(?heap, ?signature, ?baseheap) <-
  StoreInstanceField(?from, ?signature, ?base),
  ∆VarPointsTo(?baseheap, ?base),
  VarPointsTo(?heap, ?from).

∆InstanceFieldPointsTo(?heap, ?signature, ?baseheap) <-
  StoreInstanceField(?from, ?signature, ?base),
  VarPointsTo(?baseheap, ?base),
  ∆VarPointsTo(?heap, ?from).

(We elide version numbers, since we are just making an efficiency point. Note that the deltas are
also part of the full relation—i.e., they are the deltas from the previous step. Hence, we do not
need a third rule that joins two deltas together.)

The first rule is fairly efficient as-is: the delta relation binds variable ?base, which is used to
index into relation StoreInstanceField and bind variable ?from, which is used to index into
relation VarPointsTo(?heap, ?from). The second rule, however, would be disastrous if executed
as-is: none of the large relations has its innermost variable bound by the delta relation. We could
improve the performance of the second rule by reordering the variables of StoreInstanceField but
there is no way to do so without destroying the performance of the first rule.

This conflict can be resolved by a fold. We introduce a temporary relation that captures the result
of a two-relation join, projects away unnecessary variables, and reorders the

10 InstanceFieldPointsTo(?heap, ?signature, ?baseheap) <-
11   StoreHeapInstanceField(?baseheap, ?signature, ?from),
12   VarPointsTo(?heap, ?from).
13
14 StoreHeapInstanceField(?baseheap, ?signature, ?from) <-
15   StoreInstanceField(?from, ?signature, ?base),
16   VarPointsTo(?baseheap, ?base).

Note that the last two rules only contain relations with the same innermost variables, therefore
any delta-execution of those rules is efficient. Implicitly, this is achieved because the folding
also adds a new index, for the new intermediate relation.

The above program still admits more optimization, as one more inefficient join remains. Consider
the joins in lines 6-8 of the above program. Both relation VarPointsTo and relation
InstanceFieldsPointsTo are recursively defined. (There is direct recursion in VarPointsTo, as well
as mutual recursion between them.) Thus, after the first step, their deltas will be joined with the
full other relations. Specifically, in semi-naive evaluation the above rule (lines 5-8) is roughly
equivalent to:

∆VarPointsTo(?heap, ?to) <-
  LoadInstanceField(?to, ?signature, ?base),
  ∆VarPointsTo(?baseheap, ?base),
  InstanceFieldPointsTo(?heap, ?signature, ?baseheap).

∆VarPointsTo(?heap, ?to) <-
  LoadInstanceField(?to, ?signature, ?base),
  VarPointsTo(?baseheap, ?base),
  ∆InstanceFieldPointsTo(?heap, ?signature, ?baseheap).

As before, the performance problem is with the second delta rule: the innermost variable of the
large relations is not bound by the delta relation. It is tempting to try to eliminate the
inefficiency with a different variable order, without performing more folds. Indeed, we could
optimize the joins in lines 3-8 without an extra fold, by reordering the variables of VarPointsTo
as well as LoadInstanceField—the latter so that ?signature is last. This would conflict with the
joins in lines 10-16, however, and would require further rewrites.

Therefore, the inefficiency can be resolved with a fold, which will also reorder variables so that
all joins are highly efficient: the joined relations always have a common innermost variable. We
introduce the intermediate relation
LoadHeapInstanceField, and get our final highly-optimized to mere seconds. Furthermore, the optimizations are robust
program: with respect to the different analysis variants supported in
D. The same optimized trunk of code is used for analy-
1 VarPointsTo(?heap, ?var) <-
2 AssignHeapAllocation(?heap, ?var). ses with several different kinds of context sensitivity.
3 VarPointsTo(?heap, ?to) <-
4 Assign(?to, ?from), VarPointsTo(?heap, ?from). 5. D Performance
5 VarPointsTo(?heap, ?to) <-
6 LoadHeapInstanceField(?to, ?signature, ?baseheap), We next present performance experiments for D, and
7 InstanceFieldPointsTo(?heap, ?signature, ?baseheap). especially contrast it with P—a BDD-based framework
8
that is state-of-the-art in terms of features and scalability.
9 LoadHeapInstanceField(?to, ?signature, ?baseheap) <-
10 LoadInstanceField(?to, ?signature, ?base),
Because of the variety of experimental results, a roadmap is
11 VarPointsTo(?baseheap, ?base). useful:
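The shape of the transformation is worth isolating. In general terms (the relation names below are illustrative only, not part of Doop), a fold replaces a rule containing a multi-way join by a rule over an intermediate relation plus a rule defining that relation, projecting away the variables the outer rule does not need:

// Before: a single rule with a 3-way join.
A(?x, ?y) <-
   B(?x, ?z),
   C(?z, ?w),
   D(?w, ?y).

// After folding: CD materializes the join of C and D and projects
// away ?w; the rule for A becomes a 2-way join, and CD can be
// stored with whatever variable order indexes best.
A(?x, ?y) <-
   B(?x, ?z),
   CD(?z, ?y).

CD(?z, ?y) <-
   C(?z, ?w),
   D(?w, ?y).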
[...] to mere seconds. Furthermore, the optimizations are robust with respect to the different analysis variants supported in Doop. The same optimized trunk of code is used for analyses with several different kinds of context sensitivity.

5. Doop Performance

We next present performance experiments for Doop, and especially contrast it with Paddle, a BDD-based framework that is state-of-the-art in terms of features and scalability. Because of the variety of experimental results, a roadmap is useful: [...]
[...the] representations of relations in Paddle are hardly larger, even with significant exception-handling-induced imprecision. In contrast, Doop's explicit representation of relations cannot tolerate the addition of such "regular" imprecision without suffering performance penalties. This phenomenon is perhaps counter-intuitive: Doop performs much better when imprecision is avoided, which is also a desirable feature for the quality of the analysis.

[Figure 4. (Paddle-compatibility mode) context-insensitive: analysis time in seconds for Doop and Paddle over the DaCapo benchmarks antlr, bloat, chart, eclipse, hsqldb, jython, luindex, lusearch, pmd, xalan.]
[Figure 5. (Paddle-compatibility mode) 1-call: analysis time in seconds for Doop and Paddle over the DaCapo benchmarks.]

5.2 Full Doop Performance and Precision

Our main experimental results compare the full version of Doop with the full Paddle, and present detailed statistics on the precision of Doop analyses.

The full mode of Doop is not exactly equivalent to the full Paddle, yet the Doop analysis logic is always strictly more precise and more complete, resulting in higher-quality analyses. The differences are in the more precise and complete handling of reflection, more precise handling of exceptions, etc.
Figures 9 to 16 compare the performance of Doop and Paddle. (The analyses presented are a representative selection for space and layout reasons.) This range of analyses [...] Doop) and analyzes much less code.

The significance of these results cannot be overstated: The conventional wisdom has been that such analyses cannot be performed without BDDs. For instance, Lhoták and Hendren write regarding the Paddle study: "It is the use of BDDs and the Paddle framework that finally makes this study possible. Moreover, some of the characteristics of the analysis results that we are interested in would be very costly to measure on an explicit representation." [18] (Recall also that the Paddle study analyzed the DaCapo benchmarks with the smaller JDK 1.3.1_01.)

[Figure 8. (Paddle-compatibility mode) 1-object+H: analysis time in seconds for Doop and Paddle over the DaCapo benchmarks.]
[Figures 9-16 (Full mode): analysis time in seconds for Doop and Paddle over the DaCapo benchmarks antlr, bloat, chart, eclipse, hsqldb, jython, luindex, lusearch, pmd, xalan.]
Figure 10. (Full mode) 1-object
Figure 11. (Full mode) 1-object+H
Figure 12. (Full mode) 2-object+1H
Figure 14. (Full mode) 1-call+H
Figure 15. (Full mode) 2-call+1H
Figure 16. (Full mode) 2-call+2H
The last three analyses of our set (2-call+1-heap, 2-object+1-heap, and 2-call+2-heap) are more precise than any context-sensitive analyses ever reported in the research literature. With a time limit of 2 hours, Doop analyzed most of the DaCapo applications under these analyses. All three analyses are impossible with Paddle. The first two are not supported by the Paddle framework, while the third is too heavy to run. (In our tests, the analysis times out even for the smallest of the DaCapo benchmarks. Lhoták also reports that "[He] never managed to get Paddle to run in available memory with these settings".2)

The range of Doop-supported analyses allows us to obtain insights regarding analysis precision.

2 http://www.sable.mcgill.ca/pipermail/soot-list/2006-March/000601.html
Figure 17 shows some of the most important statistics on our analyses' results for representative programs. Perhaps the most informative metric is the average points-to set size for plain program variables.3 The precision observations are very similar to those in the Paddle study: object-sensitivity is very good for ensuring points-to precision, and a context-sensitive heap can only serve to significantly enhance the quality of results. We can immediately see the value of our highly precise analyses, and especially the combination of a 2-object-sensitive analysis with a context-sensitive heap. This most precise analysis typically drops the average points-to set size to one-tenth of the size of the least precise (context insensitive) analysis. Remarkably, this even impacts the number of call-graph edges, a metric that notoriously improves very little with increasing the precision of the points-to analysis. In future work we expect to conduct a thorough evaluation of the precision of a wide range of analyses for several end-user metrics.

3 Note the apparent paradox of having the average number of var-points-to facts often be higher when computed over context-sensitive variables than over plain variables. Although each context-sensitive variable has fewer points-to facts than its context-insensitive version, the average over all context-sensitive variables can be higher: program variables that have many points-to facts are also used in many more contexts, skewing the results.

                                   var points-to     ctx-sens. var points-to
          analysis    nodes  edges  total    avg       total    avg
  antlr   insens      4510   24K    2.8M     67        -        -
          1-call      4498   24K    897K     22        4.9M     31
          1-call+H    4495   24K    887K     22        14M      90
          2-call+1H   4484   23K    719K     18        48M      84
          2-call+2H   4451   23K    570K     14        79M      171
          1-obj       4486   24K    748K     18        4.7M     16
          1-obj+H     4435   23K    435K     11        25M      86
          2-obj+1H    4382   22K    264K     7         7.8M     8
  chart   insens      7873   41K    5.9M     84        -        -
          1-call      7820   40K    2.6M     36        18M      66
          1-call+H    7816   40K    2.5M     36        43M      162
          2-call+1H   7800   40K    2.2M     31        202M     173
          2-call+2H   ×      ×      ×        ×         ×        ×
          1-obj       7803   40K    2.4M     34        18M      27
          1-obj+H     7676   37K    1.2M     17        81M      123
          2-obj+1H    7570   35K    414K     6         24M      7
  pmd     insens      5536   27K    3.5M     73        -        -
          1-call      5519   26K    1.1M     22        5.8M     31
          1-call+H    5516   26K    1.0M     22        16M      89
          2-call+1H   5506   26K    925K     20        65M      94
          2-call+2H   5473   25K    803K     17        136M     219
          1-obj       5504   26K    964K     21        5.2M     15
          1-obj+H     5440   25K    682K     15        25M      77
          2-obj+1H    5372   24K    302K     7         7.4M     7
  xalan   insens      6580   33K    3.4M     62        -        -
          1-call      6568   33K    1.4M     25        7.5M     35
          1-call+H    6565   33K    1.4M     25        22M      104
          2-call+1H   6551   32K    1.2M     22        78M      88
          2-call+2H   6505   32K    939K     17        125M     170
          1-obj       6549   33K    1.2M     22        19M      30
          1-obj+H     6468   31K    696K     13        106M     173
          2-obj+1H    ×      ×      ×        ×         ×        ×

Figure 17. Precision statistics of Doop analyses for a subset of the DaCapo benchmarks. The columns show call-graph nodes and edges, as well as total and average (per variable) points-to facts, first for plain program variables and then for "context-sensitive variables" (i.e., context-variable tuples).

5.3 BDDs vs. Explicit Representation

Generally, the performance differences between Doop and Paddle are largely attributable to the use of BDDs vs. an explicit representation of relation contents. The comparison of the two systems reveals interesting lessons regarding the representation of relations in points-to analysis.

BDDs are a maximally reduced data structure (for a given variable ordering) so they naturally trade off some time for space, at least for large relations. Furthermore, BDDs have heavy overheads in the case of irregular relations that cannot be reduced. Consider the worst-case scenario for BDDs: a relation with a single tuple. The BDD representation in Paddle uses a node per bit; e.g., the single tuple in a relation over a 48-bit variable space will be represented by 48 BDD nodes. Each node in the BuDDy library (used by Paddle) is 20 bytes, or 160 bits. This represents a space overhead of 160x, but it also represents a time overhead, since what would be a very quick operation in an explicit tuple representation now requires traversing 48 heap objects (allocated in a single large region, but with no structure-locality).
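Spelling out the arithmetic behind these figures: 48 BDD nodes × 160 bits per node = 7,680 bits to store a tuple that needs only 48 bits in an explicit representation, i.e. 7,680 / 48 = 160 times the space.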
The difficulty in analyzing the trade-off is that results on smaller data sets and operations do not translate to larger ones. For instance, we tried a simple experiment to compare the join performance of Doop and Paddle, without any other recursion or iteration. We read into memory two previously computed points-to analysis relations (including the VarPointsTo relation, for which Paddle's BDD variable order is highly optimized) and computed their join. The fully expanded relation size in Doop was a little over 1GB, or 7 million tuples. Doop performed the join in 24.4 seconds. Paddle spent 40x more time, 957 seconds, creating the BDD, but then performed the join in just 0.527 seconds. In terms of space, the BDD representation of the 7 million tuples consisted of just 148.7 thousand nodes, less than 3MB of memory! This demonstrates how different the cost model is for the two systems. If Paddle can exploit regularity and build a new BDD through efficient operations on older ones, then its performance is unparalleled. Creating the BDD, however, can often be extremely time consuming. Furthermore, a single non-reducible relation can become a bottleneck for the whole system. Thus, it is hard to translate the results of microbenchmarks to more complex settings, as the complexity of BDDs depends on their redundancy.

To gain a better understanding of performance, we analyzed the sizes of BDDs in Paddle for some major relations in its analyses, relative to the size of the explicit representations of the same relations. Figure 18 shows the sizes of relations "nodes" (representing the context-sensitive call-graph nodes, i.e., context-qualified reachable methods), "edges" (i.e., context-sensitive call-graph edges), var points-to (the main points-to relation, for context-qualified vars), and field points-to (the points-to relation for object fields). For each relation, the table shows the size of its explicit representation (measured in number of rows, i.e., number of total facts in the relation), the size of the BDD representation (in number of BDD nodes) and the ratio of these two numbers; although they are in different units, the variation of the ratios is highly informative.

The above numbers are for Paddle as configured for our Paddle-compatibility experiments, so that the BDD statistics can be directly correlated to the performance of Doop (explicit representation) vs. Paddle (BDDs). Examination of the table in comparison with Figures 4-8 reveals that the performance of Paddle relative to Doop is highly correlated with the overall effectiveness of BDDs for relation representation. For benchmarks and analyses for which Paddle performs better compared to Doop, we find that all four relations (or at least the largest ones, if their size dominates the sizes of others) exhibit a much lower ratio of BDD-nodes-to-facts than in other benchmarks or analyses. Consider, for instance, the 1-object+heap analysis. The BDD size statistics reveal that bloat and jython are significant outliers compared to the rest of the DaCapo applications: their BDD-nodes-to-facts ratios are much lower for all large relations. A quick comparison with Figure 8 reveals that Paddle performs unusually well for these two benchmarks.

This understanding of the performance model for the BDD-based representation leads to further insights. The ultimate question we want to answer is whether (and under what conditions) there is enough regularity in relations involved in points-to analyses for BDDs to be the best representation choice. Figure 18 suggests that this is not the case, at least for the analyses studied here. The main way to improve the performance of the BDD representation is by changing the BDD variable ordering. The BDD variable ordering used in our Paddle experiments is one that minimizes the size of the var points-to relation (which, indeed, consistently has a small BDD-nodes-to-facts ratio in Figure 18). This order was observed by Lhoták to yield the best results in terms of performance. (It is worth noting that the Paddle authors were among the first to use BDDs in program analysis, have a long history of experimentation in multiple successive systems, and have experimented extensively with BDD variable orderings until deriving ones that yield "impressive results" [2].) Nevertheless, what we see in Figure 18 is that it is very hard to provide a variable ordering that minimizes all crucial BDDs. Although the var points-to relation is consistently small, the (context-sensitive) call-graph edge relation is inefficient and it is usually large enough to matter. All current techniques utilizing BDDs for points-to analysis (e.g., in bddbddb or Paddle) require BDD variable orderings "that are simultaneously good for the many BDDs in a system of interrelated analyses" [15]. It does not, therefore, seem likely that BDDs will be the best representation option for precise context-sensitive points-to analyses without significant progress in our understanding of how BDDs can be employed.

6. Related and Future Work

Fast and Precise Pointer Analysis. There is an immense body of work on pointer analysis, so we need to restrict our discussion to some representative and recent work. Fast and precise pointer analysis is, unfortunately, still a trade-off. This is unlikely to change. Most recent work in pointer analysis explores methods to improve performance by reducing precision strategically. The challenge is to limit the loss of precision, yet gain considerably in performance. For instance, Lattner et al. show [14] that an analysis with a context-sensitive heap abstraction can be very efficient by sacrificing precision using unification constraints. This is a common sacrifice. Furthermore, there are still considerable improvements possible in solving the constraints of the classic inclusion-based pointer analysis of Andersen, as illustrated by Hardekopf and Lin [10].

In full context-sensitive pointer analysis, there is an ongoing search for context abstractions that provide precise pointer information, and do not cause massive redundant computation. Milanova suggested that an object-sensitive analysis [20] is an effective context abstraction for object-oriented programs, which was confirmed by Lhoták's extensive evaluation [18]. Several researchers have argued for the benefits of using a context-sensitive heap abstraction to improve precision [18, 22].

The use of BDDs attempts to solve the problem of the large amount of data in context-sensitive pointer analysis by representing its redundancy efficiently [2, 29]. The redundancy should ideally be eliminated by choosing the right context abstraction. Xu and Rountev's recent work [30] addresses this problem. Their method aims to determine context abstractions that will yield the same points-to information. This is an exciting research direction, orthogonal to our work on declarative specifications and optimization. However, in their specific implementation, memory consumption is growing quickly for bigger benchmarks, even on Java 1.3.

IBM Research's WALA [7] static analysis library is designed to support different pointer analysis configurations, but no results of WALA's accuracy or speed have been reported in the literature. It will be interesting to compare our analyses to WALA in future work.
call-graph nodes call-graph edges var points-to field points-to
facts bdd ratio facts bdd ratio facts bdd ratio facts bdd ratio
context-insensitive
antlr 4K 1K 0.35 23K 95K 4.23 2.0M 58K 0.03 766K 28K 0.04
bloat 6K 2K 0.26 46K 132K 2.86 7.9M 81K 0.01 1.0M 38K 0.04
chart 8K 3K 0.35 39K 163K 4.19 5.3M 101K 0.02 1.8M 51K 0.03
eclipse 5K 2K 0.34 24K 104K 4.39 2.4M 63K 0.03 746K 31K 0.04
hsqldb 4K 1K 0.41 17K 80K 4.71 1.5M 50K 0.03 493K 23K 0.05
jython 6K 2K 0.31 32K 123K 3.90 3.3M 72K 0.02 750K 34K 0.04
luindex 4K 1K 0.38 18K 86K 4.70 1.5M 53K 0.03 567K 25K 0.04
lusearch 4K 2K 0.34 21K 98K 4.65 1.8M 59K 0.03 606K 28K 0.05
pmd 5K 2K 0.32 25K 113K 4.51 2.5M 62K 0.02 652K 28K 0.04
xalan 4K 1K 0.40 17K 80K 4.78 1.4M 50K 0.04 501K 23K 0.05
1-call-site-sensitive
antlr 22K 37K 1.64 83K 682K 8.26 2.9M 735K 0.26 636K 28K 0.04
bloat 45K 55K 1.21 266K 1.1M 4.32 30M 1.5M 0.05 792K 39K 0.05
chart 39K 64K 1.67 164K 1.2M 7.09 18M 1.6M 0.09 1.4M 52K 0.04
eclipse 23K 38K 1.64 113K 705K 6.22 4.0M 852K 0.21 572K 32K 0.06
hsqldb 17K 29K 1.73 61K 523K 8.62 2.1M 590K 0.28 395K 24K 0.06
jython 31K 47K 1.51 139K 907K 6.53 5.7M 1.0M 0.18 539K 35K 0.06
luindex 18K 31K 1.73 65K 559K 8.63 2.4M 645K 0.27 459K 26K 0.06
lusearch 21K 36K 1.69 76K 638K 8.41 2.9M 751K 0.26 488K 29K 0.06
pmd 25K 42K 1.69 94K 769K 8.14 4.7M 843K 0.18 512K 29K 0.06
xalan 17K 29K 1.74 60K 519K 8.64 2.1M 595K 0.29 396K 24K 0.06
1-call-site-sensitive+heap
antlr 22K 37K 1.63 83K 682K 8.26 8.9M 2.4M 0.27 12M 7.3M 0.59
bloat 45K 55K 1.22 251K 1.1M 4.55 159M 7.3M 0.05 27M 10M 0.38
chart 39K 64K 1.66 164K 1.2M 7.11 42M 6.3M 0.15 26M 16M 0.63
eclipse 23K 38K 1.64 113K 706K 6.23 14M 3.1M 0.23 9.4M 7.1M 0.75
hsqldb 17K 29K 1.73 61K 523K 8.61 6.2M 1.8M 0.30 5.7M 4.3M 0.76
jython 31K 47K 1.50 139K 908K 6.54 22M 4.2M 0.19 15M 8.6M 0.58
luindex 18K 31K 1.73 65K 560K 8.63 7.0M 2.1M 0.30 6.4M 5.0M 0.78
lusearch 21K 36K 1.70 76K 637K 8.40 8.5M 2.5M 0.30 7.8M 5.7M 0.74
pmd 25K 42K 1.69 94K 768K 8.13 14M 3.1M 0.22 8.2M 6.7M 0.82
xalan 17K 29K 1.74 60K 518K 8.64 6.1M 1.8M 0.30 5.7M 4.3M 0.77
1-object-sensitive
antlr 36K 19K 0.54 218K 489K 2.25 1.5M 324K 0.22 25K 33K 1.33
bloat 71K 27K 0.38 1.8M 1.2M 0.65 14M 646K 0.05 307K 44K 0.14
chart 81K 38K 0.47 1.0M 1.1M 1.14 16M 763K 0.05 60K 58K 0.97
eclipse 40K 22K 0.55 312K 596K 1.91 1.9M 381K 0.20 27K 36K 1.33
hsqldb 31K 17K 0.55 170K 412K 2.43 1.1M 271K 0.25 17K 28K 1.69
jython 64K 26K 0.40 746K 742K 0.99 4.9M 455K 0.09 38K 39K 1.02
luindex 32K 18K 0.57 178K 436K 2.44 1.2M 294K 0.24 18K 30K 1.73
lusearch 35K 20K 0.57 202K 492K 2.43 1.5M 335K 0.23 20K 34K 1.71
pmd 42K 21K 0.50 309K 557K 1.80 2.6M 373K 0.14 40K 34K 0.85
xalan 30K 17K 0.56 168K 411K 2.45 1.1M 274K 0.25 16K 28K 1.73
1-object-sensitive+heap
antlr 35K 19K 0.55 161K 448K 2.79 8.6M 797K 0.09 2.3M 505K 0.22
bloat 69K 27K 0.39 1.4M 1.0M 0.73 56M 1.9M 0.03 13M 1.2M 0.09
chart 76K 37K 0.49 647K 973K 1.50 41M 1.9M 0.05 9.1M 1.3M 0.14
eclipse 39K 22K 0.56 212K 544K 2.56 11M 1.0M 0.10 2.8M 631K 0.23
hsqldb 30K 17K 0.56 131K 380K 2.90 6.3M 656K 0.10 1.7M 409K 0.24
jython 62K 25K 0.41 638K 684K 1.07 76M 1.4M 0.02 15M 1.1M 0.07
luindex 31K 18K 0.58 134K 402K 2.99 6.4M 695K 0.11 1.7M 427K 0.26
lusearch 34K 20K 0.58 147K 447K 3.04 7.3M 785K 0.11 1.8M 488K 0.26
pmd 41K 21K 0.52 216K 499K 2.31 10M 892K 0.09 2.8M 539K 0.19
xalan 30K 17K 0.57 129K 379K 2.93 6.0M 665K 0.11 1.5M 411K 0.27
Figure 18. BDD statistics for the most important context-sensitive relations of Paddle: total number of facts in the context-
sensitive relation, number of BDD nodes used to represent those facts, and the ratio of BDD nodes / total number of facts.
of Whaley’s bddbddb [19]. In particular, P does not ing the DRed [8] algorithm. Efficient incremental evaluation
maintain information about Class objects created through might make context-sensitive pointer analysis suitable for
Class.forName, which requires very conservative assump- use in IDEs.
tions about later Class.newInstance invocations. However,
the reflection analysis of Livshits was only integrated in
7. Conclusions
a context-insensitive pointer analysis. The fully declarative
nature of D allows us to use very similar Datalog rules We presented D: a purely declarative points-to analysis
also in context-sensitive analyses. framework that raises the bar for precise context-sensitive
analyses. D is elegant, full-featured, modular, and high-
Declarative Programming Analysis. Program analysis us- level, yet achieves remarkable performance due to a novel
ing logic programming has a long history (e.g., [4, 23]), but optimization methodology focused on highly recursive Dat-
this early work only considers very small programs. In re- alog programs. D uses an explicit representation of re-
cent years, there have been efforts to apply declarative pro- lations and cha(lle)nges the community’s understanding on
gram analysis to much larger codebases and more complex how to implement efficient points-to analyses.
analysis problems. We discussed the relation to Whaley’s
work on context-sensitive pointer analysis using Datalog and Acknowledgments This work was funded by the NSF
BDDs [29] throughout this paper. The D [1] analysis (CCF-0917774, CCF-0934631) and by LogicBlox Inc. We
framework has shown to be competitive in performance for thank Ondřej Lhoták for his advice on benchmarking P-
context-insensitive pointer analysis using tabled Prolog. The , Oege de Moor and Molham Aref for useful discus-
demonstrated pointer analysis of D uses a conservative, sions, the anonymous reviewers for helpful comments, and
pre-computed call graph, so the analysis is reduced to prop- the LogicBlox developers for their practical help and sup-
agation of points-to information of assignments, which can port.
be very efficient. D expresses all the logic of a context-
sensitive pointer analysis in Datalog.
References
Demand-Driven and Incremental Analysis. A demand- [1] W. C. Benton and C. N. Fischer. Interactive, scalable, declar-
driven evaluation strategy reduces the cost of an analysis ative program analysis: from prototype to implementation.
by computing only those results that are necessary for a In PPDP ’07: Proc. of the 9th ACM SIGPLAN int. conf. on
client program analysis [12, 26, 27, 31]. This is a useful Principles and practice of declarative programming, pages
approach for client analyses that focus on specific locations 13–24, New York, NY, USA, 2007. ACM.
in a program, but if the client needs results from the entire [2] M. Berndl, O. Lhoták, F. Qian, L. J. Hendren, and N. Umanee.
program, then demand-driven analysis is typically slower Points-to analysis using bdds. In PLDI, pages 103–114.
than an exhaustive pointer analysis. Reps [24] showed how ACM, 2003.
to use the standard magic-sets optimization to automatically [3] M. Bravenboer and Y. Smaragdakis. Exception analysis and
derive a demand-driven analysis from an exhaustive analysis points-to analysis: Better together. In L. Dillon, editor, ISSTA
(like ours). This optimization combines the benefits of top- ’09: Proceedings of the 2009 International Symposium on
down and bottom-up evaluation of logic programs by adding Software Testing and Analysis, New York, NY, USA, July
side-conditions to rules that limit the computation to just the 2009. To appear.
required data. [4] S. Dawson, C. R. Ramakrishnan, and D. S. Warren. Practical
More recently, Saha and Ramakrishnan [25] explored the program analysis using general purpose logic programming
application of incremental logic program evaluation strate- systems—a case study. In PLDI ’96: Proc. of the ACM
gies to context-insensitive pointer analysis. As pointed out SIGPLAN 1996 conf. on Programming language design and
in this work, the algorithms for materialized view mainte- implementation, pages 117–126, New York, NY, USA, 1996.
nance and incremental program analysis are highly related. ACM.
As we discussed, incremental evaluation is also crucial for [5] S. K. Debray. Unfold/fold transformations and loop
D’s performance. The large number of reachable meth- optimization of logic programs. In PLDI ’88: Proc. of
ods in an empty Java program4 suggests that incremental the ACM SIGPLAN 1988 conf. on Programming Language
analysis could bring down the from-scratch evaluation time design and Implementation, pages 297–307, New York, NY,
substantially. We have not explored these incremental eval- USA, 1988. ACM.
uation scenarios yet. The engine we use also supports in- [6] M. Eichberg, S. Kloppenburg, K. Klose, and M. Mezini.
cremental evaluation after deletion and updates of facts us- Defining and continuous checking of structural program
dependencies. In ICSE ’08: Proc. of the 30th int. conf. on
4 Even an empty Java program causes the execution of a number of methods
Software engineering, pages 391–400, New York, NY, USA,
from the standard library. This causes a static analysis to compute an even 2008. ACM.
larger number of reachable methods, especially when no assumptions are
made about the loading environment (e.g., security settings and where the [7] S. J. Fink. T.J. Watson libraries for analysis (WALA).
empty class will be loaded from). http://wala.sourceforge.net.
[8] A. Gupta, I. S. Mumick, and V. S. Subrahmanian. Main- and Implementation (PLDI’06), pages 308–319, 2006.
taining views incrementally. In SIGMOD ’93: Proc. of the [22] E. M. Nystrom, H.-S. Kim, and W. mei W. Hwu. Importance
1993 ACM SIGMOD int. conf. on Management of data, pages of heap specialization in pointer analysis. In PASTE ’04:
157–166, New York, NY, USA, 1993. ACM. Proc. of the 5th ACM SIGPLAN-SIGSOFT workshop on
[9] E. Hajiyev, M. Verbaere, and O. de Moor. Codequest: Program analysis for software tools and engineering, pages
Scalable source code queries with datalog. In Proc. European 43–48, New York, NY, USA, 2004. ACM.
Conf. on Object-Oriented Programming (ECOOP), pages 2– [23] T. Reps. Demand interprocedural program analysis using
27. Spinger, 2006. logic databases. In R. Ramakrishnan, editor, Applications
[10] B. Hardekopf and C. Lin. The ant and the grasshopper: fast of Logic Databases, pages 163–196. Kluwer Academic
and accurate pointer analysis for millions of lines of code. Publishers, 1994.
In PLDI’07: Proc. ACM SIGPLAN conf. on Programming [24] T. W. Reps. Solving demand versions of interprocedural
Language Design and Implementation, pages 290–299, New analysis problems. In CC ’94: Proc. of the 5th Int. Conf. on
York, NY, USA, 2007. ACM. Compiler Construction, pages 389–403, London, UK, 1994.
[11] B. Hardekopf and C. Lin. Semi-sparse flow-sensitive pointer Springer-Verlag.
analysis. In POPL ’09: Proceedings of the 36th annual ACM [25] D. Saha and C. R. Ramakrishnan. Incremental and demand-
SIGPLAN-SIGACT symposium on Principles of programming driven points-to analysis using logic programming. In PPDP
languages, pages 226–238, New York, NY, USA, 2009. ’05: Proc. of the 7th ACM SIGPLAN int. conf. on Principles
ACM. and practice of declarative programming, pages 117–128,
[12] N. Heintze and O. Tardieu. Demand-driven pointer analysis. New York, NY, USA, 2005. ACM.
In PLDI ’01: Proc. of the ACM SIGPLAN 2001 conf. on [26] M. Sridharan and R. Bodı́k. Refinement-based context-
Programming language design and implementation, pages sensitive points-to analysis for java. In PLDI ’06: Proc. of
24–34, New York, NY, USA, 2001. ACM. the 2006 ACM SIGPLAN conf. on Programming language
[13] M. S. Lam, J. Whaley, V. B. Livshits, M. C. Martin, D. Avots, design and implementation, pages 387–400, New York, NY,
M. Carbin, and C. Unkel. Context-sensitive program analysis USA, 2006. ACM.
as database queries. In PODS ’05: Proc. of the twenty-fourth [27] M. Sridharan, D. Gopan, L. Shan, and R. Bodı́k. Demand-
ACM SIGMOD-SIGACT-SIGART symposium on Principles driven points-to analysis for java. In OOPSLA ’05: Proc.
of database systems, pages 1–12, New York, NY, USA, 2005. of the 20th annual ACM SIGPLAN conf. on Object oriented
ACM. programming, systems, languages, and applications, pages
[14] C. Lattner, A. Lenharth, and V. Adve. Making context- 59–76, New York, NY, USA, 2005. ACM.
sensitive points-to analysis with heap cloning practical for [28] J. Whaley, D. Avots, M. Carbin, and M. S. Lam. Using
the real world. SIGPLAN Not., 42(6):278–289, 2007. datalog with binary decision diagrams for program analysis.
[15] O. Lhoták. Program Analysis using Binary Decision In K. Yi, editor, APLAS, volume 3780 of Lecture Notes in
Diagrams. PhD thesis, McGill University, Jan. 2006. Computer Science, pages 97–118. Springer, 2005.
[16] O. Lhoták and L. Hendren. Scaling Java points-to analysis [29] J. Whaley and M. S. Lam. Cloning-based context-sensitive
using Spark. In G. Hedin, editor, Compiler Construction, 12th pointer alias analysis using binary decision diagrams. In
Int. Conf., volume 2622 of LNCS, pages 153–169, Warsaw, PLDI ’04: Proc. of the ACM SIGPLAN 2004 conf. on
Poland, April 2003. Springer. Programming language design and implementation, pages
[17] O. Lhoták and L. Hendren. Jedd: a bdd-based relational ex- 131–144, New York, NY, USA, 2004. ACM.
tension of java. In PLDI ’04: Proc. of the ACM SIGPLAN [30] G. Xu and A. Rountev. Merging equivalent contexts
2004 conf. on Programming language design and implemen- for scalable heap-cloning-based context-sensitive points-to
tation, pages 158–169, New York, NY, USA, 2004. ACM. analysis. In ISSTA ’08: Proc. of the 2008 int. symposium
[18] O. Lhoták and L. Hendren. Evaluating the benefits of on Software testing and analysis, pages 225–236, New York,
context-sensitive points-to analysis using a BDD-based NY, USA, 2008. ACM.
implementation. ACM Trans. Softw. Eng. Methodol., 18(1):1– [31] X. Zheng and R. Rugina. Demand-driven alias analysis
53, 2008. for c. In POPL ’08: Proc. of the 35th annual ACM
[19] B. Livshits, J. Whaley, and M. S. Lam. Reflection analysis SIGPLAN-SIGACT symposium on Principles of programming
for Java. In K. Yi, editor, Proceedings of the 3rd Asian languages, pages 197–208, New York, NY, USA, 2008.
Symposium on Programming Languages and Systems, ACM.
volume 3780. Springer-Verlag, Nov. 2005.
[20] A. Milanova, A. Rountev, and B. G. Ryder. Parameterized
object sensitivity for points-to analysis for java. ACM Trans.
Softw. Eng. Methodol., 14(1):1–41, 2005.
[21] M. Naik, A. Aiken, and J. Whaley. Effective static race
detection for java. In Proceedings of the 2006 ACM
SIGPLAN Conference on Programming Language Design
Published in Software Safety and Security; Tools for Analysis and Verification. NATO Science for
Peace and Security Series, vol 33, pp286-318, 2012
Abstract. These are the notes to accompany a course at the Marktoberdorf PhD
summer school in 2011. The course consists of an introduction to separation logic,
with a slant towards its use in automatic program verification and analysis.
Keywords. Program Logic, Automatic Program Verification, Abstract Interpretation,
Separation Logic
1. Introduction
Separation logic, first developed in papers by John Reynolds, the author, Hongseok Yang
and Samin Ishtiaq, around the turn of the millennium [73,47,61,74], is an extension of
Hoare’s logic for reasoning about programs that access and mutate data held in computer
memory. It is based on the separating conjunction P ∗ Q, which asserts that P and Q
hold for separate portions of memory, and on program-proof rules that exploit separation
to provide modular reasoning about programs.
In this course I am going to introduce the basics of separation logic, its semantics,
and proof theory, in a way that is oriented towards its use in automatic program-proof
tools and abstract interpreters, an area of work which has seen increasing attention in
recent years. After the basics, I will describe how the ideas can be used to build a verifi-
cation or program analysis tool.
The course consists of four lectures:
1. Basics, where the fundamental ideas of the logic are presented in a semi-formal
style;
2. Foundations, where we get into the formalities, including the semantics of the
assertion language and axioms and inference rules for heap-mutating commands,
and culminating in an account of the local dynamics which underpin some of the
rules in the logic;
3. Proof Theory and Symbolic Execution, which describes a way of reasoning about programs by ‘executing’ programs on formulae rather than concrete states, and which can form the basis for an automatic verifier; and
4. Program Analysis, where abstraction is used to infer loop invariants and other annotations, increasing the level of automation.

1 This work was supported by funding from the Royal Society, the EPSRC and Microsoft Research.
These course notes include two sections based on the first two lectures, followed by a
section collecting ideas from the last two lectures. At this stage the notes are incomplete,
and they will possibly be improved and extended in the future. I hope, though, that they
will still prove useful in giving a flavour of some of the main lines of work, as well as
pointers into the literature. In particular, at the end I give references to current directions
being pursued in program analysis.
I should say that, with this slant towards automatic proof and program analysis, there
are active ongoing developments related to separation logic in several other directions
that I will not be able to cover, particularly in concurrency, data abstraction and refine-
ment, object-oriented languages and scripting languages; a small sample of work in these
directions includes [62,64,66,10,81,34,28,38].
2. Basics
In this section I introduce separation logic in a semi-formal way. I am hoping that some
of the ideas can strike home and be seen to reflect natural reasoning that programmers
might employ, even before we consider formal definitions. Of course, the informal pre-
sentation inevitably skates over some issues, issues that could very well lead to unsound
conclusions if not treated correctly, and to nail things down we will get to the definitions
in the next section.
[Diagram: the formula x|->y * y|->x, with a vertical line down the middle representing a heap partitioning; below it, a concrete memory with store x = 10, y = 42 and heap cells 10 and 42 holding 42 and 10 respectively.]
We read the formula at the top of this figure as ‘x points to y, and separately y points to
x’. Going down the middle of the diagram is a line which represents a heap partitioning:
a separating conjunction asks for a partitioning that divides memory into parts satisfying
its two conjuncts.
At the bottom of the figure is an example of a concrete memory description that
corresponds to the diagram. There, x and y have values 10 and 42 (in the ‘environment’,
or ‘register bank’), and 10 and 42 are themselves locations with the indicated contents
(in the ‘heap’, or even ‘RAM’).
The indicated separating conjunction above is true of the pictured memory because
the parts satisfy the corresponding conjuncts. That is, the components
[Diagram: the two conjuncts x|->y and y|->x shown side by side ('And ... Separately'), each over the same store x = 10, y = 42; the left heaplet contains only cell 10 (holding 42) and the right heaplet only cell 42 (holding 10).]
Proving by Executing. I am going to show you part of a program proof outline in sepa-
ration logic. It might seem slightly eccentric that I do this before giving you a definition
of the logic. My aim is to use a computational reading of the proof steps to motivate the
inference rules, rather than starting from them.
Consider the following procedure for disposing the elements in a tree.
procedure DispTree(p)
local i, j;
if ¬isatom?(p) then
i := p→l; j := p→r;
DispTree(i);
DispTree(j);
free(p)
This is the expected procedure that walks a tree, recursively disposing left and right
subtrees and then the root pointer. It uses a representation of tree nodes as cells containing
left and right pointers, with the base case corresponding to atomic, non-pointer values.
(See Exercise 2 below for a fuller description.)
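A plausible form of the tree predicate used in the specification below, matching this representation (a sketch, not necessarily the exact definition intended), is

  tree(p) ⇐⇒ (isatom?(p) ∧ emp) ∨ (∃x, y. p ↦ [l: x, r: y] ∗ tree(x) ∗ tree(y))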
The specification of DispTree is just
{tree(p)} DispTree(p) {emp}
which says that if you have a tree at the beginning then you end up with the empty heap
at the end. For this to make sense it is crucial that when tree(p) is true of a heap then
that heap (or rather the heaplet, a portion of a global heap) contains all and only those
cells in the tree. So, the spec talks about as small a portion of the global program state as
possible.
The crucial part of the argument for DispTree’s correctness, in the then branch,
can be pictured with the following annotated program which gives a ‘proof by execution’
style argument.
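A sketch of such an outline for the then branch, with assertions in braces (reconstructed from the narrative that follows, so indicative rather than official):

  {tree(p) ∧ ¬isatom?(p)}
  {∃x, y. p ↦ [l: x, r: y] ∗ tree(x) ∗ tree(y)}
  i := p→l;  j := p→r;
  {p ↦ [l: i, r: j] ∗ tree(i) ∗ tree(j)}
  DispTree(i);
  {p ↦ [l: i, r: j] ∗ emp ∗ tree(j)}
  {p ↦ [l: i, r: j] ∗ tree(j)}
  DispTree(j);
  {p ↦ [l: i, r: j]}
  free(p)
  {emp}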
After we enter the then branch of the conditional we know that ¬isatom?(p), so that
(according to the inductive definition of the tree predicate) p points to left and right sub-
trees occupying separate storage. Then the roots of the two subtrees are loaded into i
and j. The first recursive call operates in-place on the left subtree, removing it. The two
consecutive assertions in the middle of the proof are an application of the rule of conse-
quence of Hoare logic. These two assertions are equivalent because emp is the unit of ∗.
Continuing on, the second call removes the right subtree, and the final instruction frees
the root pointer p. The assertions, and their mutations, follow this operational narrative.
I am leading to a more general suggestion: try thinking about reasoning in separation
logic as if you are an interpreter. The formulae are like states, symbolic states. Execute
code forwards, updating formulae in the usual way you do when thinking about in-place
update of memory. In-place reasoning works not only for freeing a cell, but for heap
mutation and allocation as well. And, it even works for larger-scale operations such as
entire procedure calls: we updated the assertions in-place at each of the recursive call
sites during this ‘proof’.
Exercise 1 The usual Hoare logic rules for sequencing and consequence are
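(presumably the standard rules, reproduced here for reference)

  {P} C1 {Q}    {Q} C2 {R}                 P′ ⇒ P    {P} C {Q}    Q ⇒ Q′
  -------------------------  (Sequencing)  ------------------------------  (Consequence)
       {P} C1; C2 {R}                                {P′} C {Q′}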
Local Reasoning and Frame Axioms. In the steps in the proof outline for DispTree(p)
I used the procedure spec as an assumption when reasoning about the recursive calls, as
usual when reasoning about recursive procedures in Hoare logic [43]. However, there is
an extra ingredient at work. For the second recursive call, for instance, the assertion at the call site does not match the procedure specification’s precondition, even after p in the spec is instantiated with j, because the assertion has an extra ∗-conjunct, p ↦ [l: i, r: j]. This extra ∗-conjunct is not touched by the recursive call. It is called a ‘frame axiom’ in AI. The terminology ‘frame axiom’ comes from an analogy with animation, where the moving parts of a scene are successively laid over an unchanging frame. Indeed, the fact p ↦ [l: i, r: j] is left unchanged by the second call. You should be able to pick out the frame in the first call as well.
Thus, there is something slightly awry in this ‘proof’, unless I tell you more: The
mismatch, between the call sites and the procedure precondition, needs to be taken care
of if we are really to have a proof of the procedure. One way to resolve the mismatch
would be to complicate the specification of the procedure, to talk explicitly about frames
in one way or another (see ‘back in the day’ below). A better approach is to have a generic
inference rule, which allows us to avoid mentioning the frames at all in our specifications,
but to bring them in when needed. This generic rule is
      {P} C {Q}
  -------------------  (Frame Rule)
  {R ∗ P} C {R ∗ Q}
and it lets us tack on additional assertions ‘for free’, as it were. For instance, in the second
recursive call the frame axiom R selected is p ↦ [l: i, r: j] and {P} C {Q} is a substitution
instance of the procedure spec: this captures that the recursive call does not alter the root
pointer.
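Spelled out, the instance used there is roughly:

          {tree(j)} DispTree(j) {emp}
  -----------------------------------------------------------------  (Frame, R = p ↦ [l: i, r: j])
  {p ↦ [l: i, r: j] ∗ tree(j)} DispTree(j) {p ↦ [l: i, r: j] ∗ emp}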
This better way, which avoids talking about frames in specifications, corresponds
to programming intuition. When reasoning about a program we should only have to
talk about the resources it accesses (its ‘footprint’), as all other resources will remain
unchanged. This is the principle of local reasoning [47,61]. In the specification of
DispTree the precondition tree(p) describes only those cells touched by the procedure.
Aside: back in the day... This issue of local reasoning has nothing to do with the ‘power’
or ‘completeness’ of a formal method: what is possible to do in principle. It has only to do
with the simplicity and directness of the specs and proofs. To see the issue more clearly,
consider how we might have written a spec for DispTree(p) in traditional Hoare logic,
before we had the frame rule. Here is a beginning attempt:
{tree(p) ∧ reach(p, n)} DispTree(p) {¬allocated(n)}
assuming that we have defined the predicates that say when p points to a (binary) tree in
memory, when n is reachable (following l and r links) from p, and when n is allocated.
This spec says that any node n which is in the tree pointed to by p is not allocated on
conclusion.
While this specification says part of what we would like to say, it leaves too much
unsaid. It does not say what the procedure does to nodes that are not in the tree. As a
result, this specification is too weak to use at many call sites. For example, consider the
first recursive call, DispTree(i), to dispose the left subtree. If we use the specification
(instantiating p by i) as an hypothesis, then we have a problem: the specification does
not rule out the possibility that the procedure call alters the right subtree j, perhaps
creating a cycle or even disposing some of its nodes. As a consequence, when we come
to the second call DispTree(j), we will not know that the required tree(j) part of the
precondition will hold. So our reasoning will get stuck.
We can fix this ‘problem’ by making a stronger specification which includes frame
axioms.
{tree(p) ∧ reach(p, n) ∧ ¬reach(p, m) ∧ allocated(m) ∧ m.f = m′ ∧ ¬allocated(q)}
DispTree(p)
{¬allocated(n) ∧ ¬reach(p, m) ∧ allocated(m) ∧ m.f = m′ ∧ ¬allocated(q)}
The additional parts of the spec say that any allocated cell not reachable from p has
the same contents in memory and that any previously unallocated cell remains unallo-
cated. The additional clauses are the frame axioms. (I am assuming that m, m′, n and q
are auxiliary variables, guaranteed not to be altered. The reason why, say, the predicate
¬allocated(q) could conceivably change, even if q is constant, is that the allocated
predicate refers to a behind-the-scenes heap component. f is used in the spec as an arbi-
trary field name.)
Whether or not this more complicated specification is correct, I think you will agree:
it is complicated! I expect that you will agree as well that it is preferable for the frame
axioms to be left out of specs, and inferred when needed.
Beyond Shapes. The above shows one inductive definition, for binary trees. The def-
inition is limited in that it does not talk about the contents of a tree. It is the kind of
definition often used in automatic shape analysis, as we will describe in Section 4, where
avoiding talking about the contents can make it easier to prove entailments or synthesize
invariants.
To illustrate the limitation of the definition, suppose that we were to write a proce-
dure to copy a tree rather than delete it. We could give it a specification such as
{tree(p)} q := CopyTree(p) {tree(p) ∗ tree(q)}
but then this specification would also be satisfied by a procedure that rotates a tree as it
copies. A more precise specification would be of the form
{tree(p, τ)} q := CopyTree(p) {tree(p, τ) ∗ tree(q, τ)}
where tree(p, τ ) is a predicate which says that p points to a data structure in memory
representing the mathematical tree τ . (I use the term ‘mathematical’ tree to distinguish
if from a representation in the computer memory: the mathematical tree does not contain
pointer or other such representation information.)
Exercise 2 The notion of ‘mathematical tree’ appropriate to the above inductive defini-
tion of the tree predicate is that of an s-expression (the terminology comes from Lisp):
that is, an atom, or a pair of s-expressions. An s-expression is an element of the least set
satisfying the equation
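(presumably the following)

  Sexp = Atoms ∪ (Sexp × Sexp)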
Define the CopyTree procedure and give a proof-by-execution style argument for
its correctness, where you put assertions (symbolic states) at the appropriate program
points. Yes, I am asking you to do a ‘proof’ in a formalism that has not yet been defined
(!), but give it a try.
2.3. Perspective.
3. Foundations
Building on the ideas described informally in the previous section, I now give a rigorous
treatment of the program logic.
The model has two components, the store and the heap. The store is a finite partial function from variables to integers, and the heap is a finite partial function from natural numbers to integers.

  Stores ≜ Variables ⇀fin Ints        Heaps ≜ Nats ⇀fin Ints

(≜ abbreviates ‘is defined to be equal to’.) In logic, what we are calling the store is often
called the valuation, and the heap is a possible world. In programming languages, what
we are calling the store is also sometimes called the environment (the association of
values to variables).
We have standard integer expressions E and boolean expressions B built up from
variables and constants. These are heap-independent, so determine denotations
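(presumably the standard ones)

  [[E]]s ∈ Ints        [[B]]s ∈ {true, false}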
where the domain of s ∈ Stores includes the free variables of E or B. We leave this
semantics unspecified.
We use the following notations in the semantics of assertions.
1. dom(h) denotes the domain of definition of a heap h ∈ Heaps, and dom(s) is the domain of s ∈ Stores;
2. h#h′ says that dom(h) ∩ dom(h′) = ∅;
3. h • h′ denotes the union of functions with disjoint domains, which is undefined if the domains overlap;
4. (f | i ↦ j) is the partial function like f except that i goes to j.
The satisfaction judgement s, h |= P says that an assertion holds for a given store and heap, assuming that the free variables of P are contained in the domain of s. Two of its defining clauses are:

  s, h |= false    never
  s, h |= P ⇒ Q    iff if s, h |= P then s, h |= Q
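The other clauses are the standard ones; for reference, using the notation above:

  s, h |= B        iff [[B]]s = true
  s, h |= emp      iff dom(h) = ∅
  s, h |= E ↦ F    iff dom(h) = {[[E]]s} and h([[E]]s) = [[F]]s
  s, h |= P ∗ Q    iff there are h0, h1 with h0#h1, h0 • h1 = h, s, h0 |= P and s, h1 |= Q
  s, h |= P −∗ Q   iff for all h′, if h′#h and s, h′ |= P then s, h • h′ |= Q
  s, h |= ∀x. P    iff for all v ∈ Ints, (s | x ↦ v), h |= P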
The semantics of the connectives (⇒ false, ∀) gives rise to meanings of other connec-
tives of classical logic (∃, ∨, ¬, true) in the usual way. For example, taking P ∧ Q to be
¬(P ⇒ ¬Q), we obtain that s, h |= P ∧ Q has the usual meaning of ‘s, h |= P and
s, h |= Q’.
The general logical context of this form of semantics is that it can be seen as a
possible world model which combines:
(i) the standard semantics of classical logic (⇒, false, ∀) in the complete boolean
algebra of the power set of heaps; and
(ii) a semantics of ‘substructural logic’ (emp, ∗, −∗ ) in the same power set (which
gives us what is known as a residuated commutative monoid, an ordered commu-
tative monoid where A ∗ (−) has a right adjoint A−∗ (−)).
The semantics is an instance of the ‘resource semantics’ of bunched logic devised by
David Pym [63,70,69], where one starts from a partial commutative monoid in place of
heaps (with • and the empty heap giving partial monoid structure). The resulting math-
ematical structure on the powerset, of a boolean algebra with an additional commuta-
tive residuated monoid structure, is sometimes called a ‘boolean BI algebra’. The model
of • as heap partitioning, which lies at the basis of separation logic, was discovered by
John Reynolds when he first described the separating conjunction [73]. The separating
conjunction was connected with Pym’s general resource semantics in [47].
Notice that the semantics of E ↦ F requires that E is the only active address in the current heap. Using ∗ we can build up descriptions of larger heaps. For example, (10 ↦ 3) ∗ (11 ↦ 10) describes two adjacent cells whose contents are 3 and 10. We can express an inexact variant of points-to as follows

  E ↪ F = (true ∗ E ↦ F).

Generally, true ∗ P says that P is true of a subheap of the current one. The difference between ↪ and ↦ shows up in the presence or absence of projection or Weakening for ∗.
1. P ∗ (x ↦ 1) ⇒ (x ↦ 1) is not always true.
2. P ∗ (x ↪ 1) ⇒ (x ↪ 1) is always true.
The different way that the two conjunctions ∗ and ∧ behave is illustrated by the following examples.
1. (x ↦ 2) ∗ (x ↦ 2) is unsatisfiable (you can’t be in two places at once).
2. (x ↦ 2) ∧ (x ↦ 2) is equivalent to x ↦ 2.
3. (x ↦ 1) ∗ ¬(x ↦ 1) is satisfiable (thus, we have a kind of ‘paraconsistent’ logic).
4. (x ↦ 1) ∧ ¬(x ↦ 1) is unsatisfiable.
The third example drives home how separation logic assertions do not talk about the
global heap: P ∗¬P can be consistent because P can hold of one portion of heap and ¬P
of another. To understand separation logic assertions you should always think locally:
for this you might regard the h component in the semantics of assertions as describing a
‘heaplet’, a portion of heap, rather than a complete global heap in and of itself.
Notice that the rhs of the clause for s, h |= B does not mention h at all, whereas for s, h |= E ↦ F the rhs does contain h. I said I would not give a precise semantics of boolean expressions, but let me consider just one, the expression x = y where x and y are variables:

  [[x = y]]s ≜ (s x = s y).

Now, consider the assertion (x = y) ∗ (x = y). Can it ever be true? Well, yes, it is satisfiable, and in fact it has the same meaning as x = y and as (x = y) ∧ (x = y). On the other hand, consider the assertion (x ↦ y) ∗ (x ↦ y). Can it ever be true? How about (x = y) ∗ (x ↦ y)? Or (x = y) ∧ (x ↦ y)? Work out the answers to these questions by
expanding the semantic definitions.
Now we can be more precise about its meaning. The use of if-then-else can be desugared
using boolean logic connectives in the usual way. if B then P else Q is the same
as (B ∧ P ) ∨ (¬B ∧ Q) where here B is heap-independent. Therefore, in the inductive
definition we can now see that the condition (isatom?(E) ∧ E = τ ) is completely
heap-independent, and not affected by ∗: it talks only about values, and not the contents
of heap cells.
It is also helpful to ponder the clause
B ∧ P ⇐⇒ (B ∧ emp) ∗ P
To accommodate mathematical trees, stores are taken to be

  Stores ≜ Variables ⇀fin (Ints + Sexp)
so that a variable can take on an s-expression as well as an integer value. We could also
distinguish s-expression variables τ from program variables x syntactically. (In practice,
one would probably want to use a many-sorted rather than one-sorted logic as we are
doing in these notes for theoretical simplicity.)
Finally, we can regard E ↦ [l: x, r: y] as sugar for (E ↦ x) ∗ (E + 1 ↦ y) in the RAM
model. Note, though, that this low-level desugaring is not part of the essence of separa-
tion logic, only this particular model. Other models can be used where heaps are repre-
sented by L ⇀fin V where V might be a structured type to represent records. However,
that the RAM model can be used is appealing in a foundational way, as we know that
programs of all kinds are eventually compiled to such a model (modern concerns with
weak memory notwithstanding).
Generally, for any kind of data structure you will want to provide an appropriate
predicate definition which will often be inductive. Linked lists are the most basic case,
and illustrate some of the issues involved.
When reasoning about imperative data structures, one needs to consider not only
complete linked lists (terminated with nil ) but also ‘partial lists’ or linked-list segments.
Here is an example of a list segment predicate describing lists from E to F (where F is
not allocated).
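A standard such definition (given here as a sketch; Exercise 4 below contrasts it with an ‘imprecise’ variant) is

  ls(E, F) ⇐⇒ (E = F ∧ emp) ∨ (E ≠ F ∧ ∃y. E ↦ y ∗ ls(y, F))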
ls(x, y) ∗ ls(y, x)
These partial lists are sometimes used in the specifications of data structures, such
as queues. In other cases, they are needed to state the internal invariants of an algorithm,
even when the pre and post of a program use total lists only (total lists list(E) can be
regarded as abbreviations for segments ls(E, nil)). Here is a program from the Smallfoot tool [7] which exemplifies this point.
list_append(x,y) PRE: [list(x) * list(y)] {
local t, n;
if (x == NULL) {
x = y;
} else {
t = x; n = t->tl;
while (n != NULL) [ls(x,t) * t |-> n * list(n)] {
t = n;
n = t->tl;
}
t->tl = y;
} /* ls(x,t) * t |-> y * list(y) */
} POST: [list(x)]
This program, which appends two lists by walking down one and then swinging its last
pointer to the other, uses a partial list in its loop invariant, even though partial lists are
not needed in the overall procedure spec. In proving this program an important point is
how one gets from the last statement to the postcondition. A comment near the end of
the program shows an assertion describing what is known at that program point, and we
need to show that it implies the post to verify the program. That is, we need to show an
implication
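The implication in question, reading off the comment and the postcondition, is presumably

  ls(x, t) ∗ t ↦ y ∗ list(y) ⇒ list(x)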
This implication may seem unremarkable, but it is at this point that automatic tools must
begin to do something clever. For, consider how you, the human, would convince yourself
of the truth of this implication. If it were me, I would look at the semantics and prove
this fact by induction on the length of the list from x to t. But if we were to include such
reasoning in an automatic tool, we had better try to do so in an inductionless way, else
our tool will need to search for induction hypotheses (which is hard to make automatic).
Exercise 4 There are other definitions of list segments that have been used. Here is one,
the ‘imprecise list segment’.
  ils(E, F) ⇐⇒ (E = F ∧ emp)
               ∨ ∃y. E ↦ y ∗ ils(y, F)
Q1. What is a heap that distinguishes ls(10, 10) and ils(10, 10) ?
Q2. What distinguishes ls(10, 11) and ils(10, 11) ?
Q3. Prove or disprove the following laws (do your proof by working in the semantics)
Q4. Suppose we want to write a procedure that frees all the cells in a list segment.
For which of ils or ls can you do it? If you cannot do it for one of them, why not?
That is, we are asking for terminating programs satisfying
{ls(x, y)} delete_ls(x, y) {emp}
{ils(x, y)} delete_ils(x, y) {emp}
(I have not given you the definition of the truth of pre/post specs yet, but you
should be able to answer this question anyhow.)
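The next exercise concerns trees that carry data at the nodes; the set Mtree of ‘mathematical trees’ it refers to is presumably the least set satisfying

  Mtree = {nil} ∪ (Mtree × Atoms × Mtree)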
In this sort of tree, nil is the empty tree and the leaves of a non-empty tree are those
3-tuples that have nil in their first and third components.
Give an inductive definition of a predicate tree(E, τ ), for τ ∈ Mtree. Hint: use a
points-to assertion of the form E ↦ [l: x, d: y, r: z] where d refers to the data, or atom,
field. Define the CopyTree and DispTree procedures for this sort of tree, and give
proof-by-execution style arguments for their correctness.
The proof rules for procedure calls, sequencing, conditionals and loops are the same as
in standard Hoare logic [42,43]. Here I concentrate on the rules for primitive commands
for accessing the heap, and the surrounding rules, called the ‘structural rules’. (If you are
unfamiliar with Hoare logic probably the best way to learn is to go directly to the early
sources, such as [42,44,43,37,27], which are pleasantly simple and easy to read.)
We will use the following abbreviations:
  E ↦ F0, ..., Fn  ≜  (E ↦ F0) ∗ · · · ∗ (E + n ↦ Fn)
  E ≐ F            ≜  (E = F) ∧ emp
  E ↦ –            ≜  ∃y. E ↦ y     (y ∉ Free(E))
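In their usual formulations, the small axioms for the heap-manipulating commands are the following (stated here as a sketch, with the side conditions indicated):

  {E ↦ –} [E] := F {E ↦ F}
  {E ↦ –} free(E) {emp}
  {emp} x := cons(E1, ..., Ek) {x ↦ E1, ..., Ek}     (x not free in E1, ..., Ek)
  {E ↦ F} x := [E] {x = F ∧ E ↦ F}                   (x not free in E or F)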
The first small axiom just says that if E points to something beforehand (so it is in
the domain of the heaplet), then it points to F afterwards, and it says this for a small
portion of the state (heaplet) in which E is the only active cell. This corresponds to
the operational idea of [E] := F as a command that stores the value of F at address
E in the heap. The other commands have similar explanations. Notice that each axiom
mentions only the cells accessed or allocated: the axioms talk only about footprints,
and not the entire global program state. We only get fixed-length allocation from x :=
cons(E1, ..., Ek), but it is also possible to axiomatize a command x := alloc(E) that
allocates a block of length E.
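As a hedged sketch (not the treatment of [66], and not taken from any particular paper), one way to do this is to define a block predicate inductively and give alloc a small axiom in the same style as the ones above, using a logical variable n to record the length:

    blk(E, 0)     ⇐⇒  emp
    blk(E, n+1)   ⇐⇒  E ↦ − ∗ blk(E+1, n)

    {E = n ∧ emp} x := alloc(E) {blk(x, n)}      (x not free in E, n a logical variable)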
Notice that our axioms allow us to free any cell that is allocated, even from the mid-
dle of a block given by cons. This is different from the situation in the C programming
language, where you are only supposed to free an entire block that has been allocated by
malloc(). An elegant treatment of this problem has been given using predicate variables
in [66].
The assignment statement x := E is for a variable x and heap-independent arith-
metic expression E. Thus, this statement accesses and alters the store, but not the heap. It
is the assignment statement considered by Hoare in his original system [42]. In contrast,
the form [E] := F alters the heap but not the store.
To go along with the small axioms we have additional surrounding rules.
THE STRUCTURAL RULES

Frame Rule

    {P} C {Q}
    ─────────────────────    Modifies(C) ∩ Free(R) = ∅
    {P ∗ R} C {Q ∗ R}
Using the small axioms together with these structural rules, Hoare's original assignment axiom

    {P[E/x]} x := E {P}

can be derived, where x := E is the assignment statement that is heap independent. One
can also derive Floyd's forwards-running axiom [36]

    {P} x := E {∃x0. x = E[x0/x] ∧ P[x0/x]}

where the existentially quantified variable x0 (which must be fresh) provides a way to
talk about x's value in the pre-state. The symbolic execution rules in SMALLFOOT and
related tools use forwards-running rules of this variety (Section 4.2).
As an example derived rule for a heap-accessing command, with the frame rule and
auxiliary variable elimination one can obtain an axiom from [73]
Exercise 7 Go back over the proof-by-execution style arguments you gave in the previous
exercises, and convince yourself that you can formalize them in the proof system given
in this section. You will probably want to use derived laws for each of the basic program
statements. In such proofs you get to use the semantics as an oracle when deciding the
implication statements in the rule of consequence.
The issues related to frame axioms that we discussed in Section 2.2 go a long way back,
to the beginning work on logic in AI [57]. Fundamentally, the reason why AI issues are
relevant to program logic is just that programmers describe their code in a way that cor-
responds to a commonsense reading of specifications, where much is left unsaid. Practi-
cally, if we do not employ some kind of solution to the AI problems, then specifications
quickly become extremely complicated [11].
Some people think that the real problem is in a way negative in nature: it is to avoid
writing nasty frame axioms like those in the ‘back in the day’ discussion in Section
2.2. Other people think the problem is just to have succinct specs, however one gets
them. I have always thought that both of these, succinct specs and avoiding writing frame
axioms, should be consequences of a solution, but are not themselves the problem. My
approach to this issue has always been to embrace the ‘commonsense reasoning’ aspect
first, and for this the idea of a ‘tight specification’ is crucial: the idea is that if you don’t
say that something changes, then it doesn’t. For example, if you say that a robot moves
block A from position 1 to position 2, then the commonsense reading is that you are
implicitly saying as well that this action does not change the position of a separate block
B (unless, perhaps, block B is on top of block A). Programmers’ informal descriptions
of their code are similar. In the java.util libraries, the description of the copy method for
lists (Collections.copy) is just that it ‘copies all of the elements from one list into another’. There is no mention
of frames in the description: the description carries the understanding that the frame
remains unchanged. The need to describe the frames explicitly in some formalisms is
just an artefact, which programmers do not find necessary when talking about their code
(because of this commonsense reasoning that they employ).
Be that as it may, formalization of the notion of tight specification proved to be
surprisingly difficult, and in AI there have been many elaborate theories advanced to try
to capture this notion – circumscription, default logic, nonmonotonic logic, and more
– far too many to give a proper account of here. Without claiming to be able to solve
the general AI problem, this section explains how an old idea in program logic, when
connected to the principle of local reasoning (that you only need to talk about the cells a
program touches), gives a powerful and yet very simple approach to tight specifications.
The old idea is that of fault-avoiding specifications. To formulate it, let us suppose that
we have a semantics of commands where C, σ ⇝* σ′ indicates that there is a terminating
computation of command C from state σ to state σ′. In the RAM model σ can be a
pair of a store and a heap, but the notion can be formulated at a more general level than
this particular model. Additionally, we require a judgement form C, σ ⇝* fault. In
the RAM model, fault can be taken to indicate a memory fault: a dereference of a
dangling pointer or a double-free. Again, more generally, fault can be used for other
sorts of errors.
Here, then, is a fault-avoiding semantics of triples, where for generality we are view-
ing the preconditions and postconditions as sets of states rather than as formulae written
in some particular assertion language.
Fault-Avoiding Partial Correctness

{A} C {B} holds iff for all σ ∈ A:
1. no faults: ¬(C, σ ⇝* fault)
2. partial correctness: C, σ ⇝* σ′ implies σ′ ∈ B.
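For concreteness, here is a small Python sketch that checks both clauses of this definition by brute force, for a toy command given extensionally as a function from heaps to sets of outcomes; the names check_triple, store_10 and FAULT are illustrative inventions, not part of any tool.

    FAULT = "fault"

    def store_10(heap):
        """The command [10] := 25, modelled operationally on heaps given as
        frozensets of (location, contents) pairs; faults if 10 is unallocated."""
        h = dict(heap)
        if 10 not in h:
            return {FAULT}
        h[10] = 25
        return {frozenset(h.items())}

    def check_triple(pre, command, post):
        """Fault-avoiding partial correctness over an explicit, finite set of
        pre-states: (1) no outcome is FAULT, (2) every final heap is in post."""
        for heap in pre:
            for outcome in command(heap):
                if outcome == FAULT or outcome not in post:
                    return False
        return True

    # {10 |-> -} [10] := 25 {10 |-> 25}, sampled over a few singleton heaps.
    pre  = {frozenset({(10, v)}) for v in range(3)}
    post = {frozenset({(10, 25)})}
    assert check_triple(pre, store_10, post)

    # The empty heap is not in the precondition; starting there, [10] := 25 faults.
    assert not check_triple({frozenset()}, store_10, post)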
The ‘no faults’ clause is a reasonable thing to have as a way for proven programs to avoid
errors, and was used as far back as Hoare and Wirth’s axiomatic semantics of Pascal in
1973 [45]. Notice that the small axioms given above are already in a form compatible
with the fault-avoiding semantics. For instance, in the axiom

    {E ↦ −} [E] := F {E ↦ F}

the E ↦ − in the precondition ensures that E is not a dangling pointer, and so [E] := F
will not memory fault.
Remarkably, besides ensuring that well-specified programs avoid certain errors, it
was realized much later [47] that the fault-avoiding interpretation gives us an approach
to tight specifications. The key point is a consequence of the ‘no faults’ clause: touching
any cells not known to be allocated in the precondition falsifies the triple, so any cells
not ‘mentioned’ (known to be allocated) in the pre will remain unchanged. To see why,
suppose I tell you
    {10 ↦ −} C {10 ↦ 25}
but I don’t tell you what C is. Then I claim C cannot change location 11 if it happens to
be allocated in the pre-state (when 10 is also allocated). For, if C changed location 11, it
would have to access location 11, and this would lead to fault when starting in a state
where 10 is allocated and 11 is not. That would falsify the triple (by the ‘no faults’ clause). As a
consequence we obtain that
    {10 ↦ − ∗ 11 ↦ 4} C {10 ↦ 25 ∗ 11 ↦ 4}
should hold.
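Written as a proof-rule instance (anticipating the next paragraph), this is the frame rule with R = 11 ↦ 4; since R mentions no program variables, the side condition Modifies(C) ∩ Free(R) = ∅ holds no matter what C is:

    {10 ↦ −} C {10 ↦ 25}
    ──────────────────────────────────────    Modifies(C) ∩ Free(11 ↦ 4) = ∅
    {10 ↦ − ∗ 11 ↦ 4} C {10 ↦ 25 ∗ 11 ↦ 4}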
This reasoning is the basis for the frame rule. But the semantic fact that location 11
doesn’t change is completely independent of separation logic. In fact, we could state a
similar conclusion without mentioning ∗ at all
    {10 ↪ − ∧ 11 ↪ 4} C {10 ↪ 25 ∧ 11 ↪ 4}
Separation logic, and the frame rule, only give you a convenient way to exploit the tight-
ness (that things don’t change if you don’t mention them) in the fault-avoiding interpre-
tation. This tightness phenomenon is in a sense at a more fundamental level, prior to
logic.
It is useful to note that, for this approach to tight specifications to work, fault
does not literally need to indicate a memory fault, and it is not necessary to use a low-level
memory model. For instance, we can put a notion of ‘accesses’ or ‘ownership’ in a model,
and then when the program strays beyond what is owned we declare a specification false:
then, the same argument as above lets us conclude that certain cells do not change. This
is the idea used in implicit dynamic frames [77], and in separation logics for garbage-
collected languages like Java where there are no memory faults (e.g., [66]). Alternate
approaches may be found in [4,3,49].
I have tried to explain the basis for tight specifications above in a semi-formal way.
But, the reader might have noticed that there were some unstated assumptions behind my
arguments. One can imagine mathematical relations on states and fault that contradict
our conclusion that 11 will remain unchanged. One such relation is as follows: if the
input heap is a singleton, it sets the contents of the only allocated location to be 25, and
otherwise sets all allocated locations in the input heap to have contents 50. This is not
a program that you can write in C, but it shows that there are locality properties
of the semantics of programs at work behind the tight interpretation of triples, and it is
important theoretically to set these conditions down precisely; see [86,18,72].
Exercise 9 Without saying what the commands C are, and ignoring the store component
(i.e., think about heap only), formulate sufficient conditions on the relations C, σ ⇝* σ′
and C, σ ⇝* fault which make the frame rule valid according to fault-avoiding partial
correctness. Give a proof of the validity of the frame rule from these conditions.
Are your conditions necessary as well as sufficient?
4. Symbolic Heaps, Symbolic Execution and Abstract Interpretation
When designing an automatic program verification tool there are almost always compro-
mises to be made, forced by the constraints of recursive undecidability of so many ques-
tions about logics and programs. The first tools based on separation logic chose to restrict
attention to a certain format of assertions which made three tasks easier than they might
otherwise have been: symbolic execution, entailment checking, and frame inference.
Symbolic heaps [6,30] are formulae of the form
    ∃X. (P1 ∧ · · · ∧ Pn) ∧ (S1 ∗ · · · ∗ Sm)
where the Pi and Sj are primitive pure and spatial predicates, and X is a vector of logical
variables (variables not used in programs). We understand the nullary conjunction of Pi ’s
as true and the nullary ∗-conjunction of Si ’s as emp. The special form of symbolic heaps
does not allow, for instance, nesting of ∗ and ∧, or boolean negation ¬ around ∗, or the
separating implication −∗ . This special form was chosen, originally, to match the usage
of separation logic in a number of by-hand proofs that had been done. The form does not
cover all proofs, such as Yang’s proof of the Schorr-Waite algorithm [84], so there are
immediately-known limitations.
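To make the format concrete, here is a minimal Python sketch of a representation of symbolic heaps; the class names, and the particular choice of primitive predicates (equalities and disequalities for the pure part, points-to and lsne for the spatial part), are assumptions made for illustration rather than the definition used in any of the tools.

    from dataclasses import dataclass
    from typing import FrozenSet, Tuple

    @dataclass(frozen=True)
    class Eq:                 # pure atom  E = F
        lhs: str
        rhs: str

    @dataclass(frozen=True)
    class Neq:                # pure atom  E != F
        lhs: str
        rhs: str

    @dataclass(frozen=True)
    class PointsTo:           # spatial atom  E |-> F
        addr: str
        val: str

    @dataclass(frozen=True)
    class Lsne:               # spatial atom  lsne(E, F): nonempty list segment
        start: str
        end: str

    @dataclass(frozen=True)
    class SymbolicHeap:
        exists: FrozenSet[str]            # the vector X of logical variables
        pure: Tuple[object, ...] = ()     # empty tuple stands for true
        spatial: Tuple[object, ...] = ()  # empty tuple stands for emp

    # Example: EX. x |-> X * X |-> nil, one disjunct of the loop invariant
    # calculated for the list-building program later in this section.
    example = SymbolicHeap(exists=frozenset({"X"}),
                           spatial=(PointsTo("x", "X"), PointsTo("X", "nil")))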
The grammar for symbolic heaps can be instantiated with different sets of basic pure
and spatial predicates. Pure formulae are heap-independent, and describe properties of
variables only, whereas the spatial formulae specify properties of the heap. One instantia-
tion is as follows
SIMPLE LISTS INSTANTIATION
    H                x := E         =⇒   x = E[X/x] ∧ H[X/x]
    H ∗ E ↦ F        x := [E]       =⇒   x = F[X/x] ∧ (H ∗ E ↦ F)[X/x]
    H ∗ E ↦ F        [E] := G       =⇒   H ∗ E ↦ G
    H                x := cons(−)   =⇒   H[X/x] ∗ x ↦ X
    H ∗ E ↦ F        free(E)        =⇒   H
With the convention that the logical variables are implicitly existentially quantified, the
first rule is just a restating of Floyd’s axiom for assignment. The other rules can be
obtained from the small axioms of Section 3 by applications of the structural rules.
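To show how such rules might be executed, here is a hedged Python sketch that implements three of them over a deliberately crude representation (a symbolic heap as a pair of tuples of atoms, expressions as strings); the function names, the representation, and the treatment of renaming are all illustrative assumptions, not the design of SMALLFOOT or any other tool.

    import itertools

    _fresh = itertools.count()

    def _rename(e, x, X):
        return X if e == x else e

    def exec_assign(heap, x, E):
        """H , x := E  ==>  x = E[X/x] /\ H[X/x]   (X fresh, implicitly existential).
        Pure atoms are ("eq", a, b); spatial atoms are ("pt", a, b) for a |-> b."""
        pure, spatial = heap
        X = f"X{next(_fresh)}"
        ren = lambda e: _rename(e, x, X)
        pure = tuple((k, ren(a), ren(b)) for (k, a, b) in pure)
        spatial = tuple((k, ren(a), ren(b)) for (k, a, b) in spatial)
        return (pure + (("eq", x, ren(E)),), spatial)

    def exec_store(heap, E, G):
        """H * E|->F , [E] := G  ==>  H * E|->G   (E|-> must already be explicit)."""
        pure, spatial = heap
        for i, (k, a, _) in enumerate(spatial):
            if k == "pt" and a == E:
                return (pure, spatial[:i] + (("pt", E, G),) + spatial[i+1:])
        raise ValueError("rearrangement needed: no explicit E|-> conjunct")

    def exec_free(heap, E):
        """H * E|->F , free(E)  ==>  H."""
        pure, spatial = heap
        for i, (k, a, _) in enumerate(spatial):
            if k == "pt" and a == E:
                return (pure, spatial[:i] + spatial[i+1:])
        raise ValueError("rearrangement needed: no explicit E|-> conjunct")

    # Example: from x |-> nil, run [x] := y; free(x); x := nil.
    h0 = ((), (("pt", "x", "nil"),))
    h1 = exec_store(h0, "x", "y")      # x |-> y
    h2 = exec_free(h1, "x")            # emp
    h3 = exec_assign(h2, "x", "nil")   # x = nil /\ emp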
The rules for x := [E], [E] := G and free(E) all assume that we have E ↦ F
explicitly in the precondition. In some cases, this knowledge that E points to some-
thing will be somewhat less explicit, as in the symbolic heap E = E′ ∧ E′ ↦ F. Then,
a simple amount of logical reasoning can convert this formula to the equivalent form
E = E′ ∧ E ↦ F, which is now ready for an execution step. In another case, lsne(E, F),
we might have to unroll the inductive definition to reveal the ↦. In general, for any of
these heap-accessing forms, we need to massage a symbolic heap to ‘make E ↦ explicit’.
Here are sample rules for doing this massaging.
REARRANGEMENT RULES

    H0 ∗ E ↦ F, A(E) =⇒ H1
    ──────────────────────────────
    H0 ∗ lsne(E, F), A(E) =⇒ H1

    H ⊬ Allocated(E)
    ──────────────────────
    H, A(E) =⇒ fault
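The following Python sketch shows one way such rearrangement might go, over the same toy representation as above (pure atoms ("eq", a, b), spatial atoms ("pt", a, b) and ("lsne", a, b)); the unrolling used for lsne(E, F), namely E ↦ Y ∗ ls(Y, F) for a fresh Y, is one plausible reading of the definition, and everything here is an illustrative assumption rather than the procedure used in the tools.

    import itertools

    _fresh = itertools.count()

    def _aliases(pure, E):
        """Expressions known equal to E from the pure part (one pass only)."""
        out = {E}
        for (_, a, b) in pure:
            if a in out:
                out.add(b)
            if b in out:
                out.add(a)
        return out

    def rearrange(heap, E):
        """Return a heap in which E |-> is explicit, or "fault" if E is not known
        to be allocated.  Unrolls lsne(E, F) to E |-> Y * ls(Y, F), Y fresh."""
        pure, spatial = heap
        eqs = _aliases(pure, E)
        for i, (kind, a, b) in enumerate(spatial):
            if a not in eqs:
                continue
            rest = spatial[:i] + spatial[i+1:]
            if kind == "pt":                      # already explicit, up to equalities
                return (pure, (("pt", E, b),) + rest)
            if kind == "lsne":                    # unroll the nonempty segment
                Y = f"Y{next(_fresh)}"
                return (pure, (("pt", E, Y), ("ls", Y, b)) + rest)
        return "fault"                            # H does not entail Allocated(E)

    # lsne(x, nil): accessing x unrolls the segment.
    print(rearrange(((), (("lsne", "x", "nil"),)), "x"))
    # E = E' /\ E' |-> F: the equality makes E |-> F explicit.
    print(rearrange(((("eq", "x", "z"),), (("pt", "z", "y"),)), "x"))
    # emp: accessing x reports fault.
    print(rearrange(((), ()), "x"))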
Is either formula satisfiable? Might this affect any of the steps in symbolic exe-
cution?
4. (Advanced) Write an inductive definition for a predicate that describes doubly-
linked list segments. It should have four arguments. Be careful about the base
case.
Write rearrangement rules for this doubly-linked list predicate.
of Hoare logic.
This verification strategy relies on having a theorem prover to answer entailment
questions H ⊢ H′. A straightforward embedding of separation logic into a classical
logic, where one writes the semantics in the target logic (e.g., ‘∃σ1 σ2. σ = σ1 • σ2 ...’),
has not yet yielded an effective prover, because it introduces existential quantifiers to give
the semantics of ∗. Therefore, proof tools for separation logic have used dedicated proof
procedures, built from the proof rules of the logic. (Work is underway on more nuanced
interpretations into existing provers that do more than a direct semantic embedding.)
An approach to proving symbolic heaps was pioneered by Josh Berdine and Cristiano
Calcagno [6]. Their approach revolves around proof rules for abstraction and subtraction.
The subtraction rule is

    Q1 ⊢ Q2
    ─────────────────
    Q1 ∗ S ⊢ Q2 ∗ S

which, read upward during proof search, cancels a matching ∗-conjunct S from both
sides; sample abstraction rules for ls are listed later in this subsection.
Their basic idea is to try to reduce an entailment to an axiom B ∧ emp ⊢ true ∧ emp by
successively applying abstraction rules, and Subtracting when possible. The basic idea
can be appreciated by considering two examples.
First, a successful example:
    emp ⊢ emp                                      Axiom!
    list(x) ⊢ list(x)                              Subtract
    ls(x, t) ∗ list(t) ⊢ list(x)                   Abstract (Inductive)
    ls(x, t) ∗ t ↦ y ∗ list(y) ⊢ list(x)           Abstract (Roll)
The entailment on the bottom is the one we needed to prove at the end of the
list_append procedure from Section 3.2. The first step, going upward, is a simple
rolling up of an inductive definition. The second step is more serious: it is one that we
would use induction in the metalanguage to justify. We then get to a position where we
can apply the subtraction rule, and this gets us back to a basic axiom.
For an unsuccessful example
    list(y) ⊢ emp                                  Junk: Not Axiom!
    list(x) ∗ list(y) ⊢ list(x)                    Subtract
    ls(x, t) ∗ t ↦ nil ∗ list(y) ⊢ list(x)         Abstract (Inductive)
The last line is an entailment that SMALLFOOT would attempt to prove if the statement
t->tl = y at the end of the list_append program were replaced by t->tl =
nil. There we do an abstraction followed by a subtraction and we get to a position
where we cannot reduce further. Rightly, we cannot prove this entailment.
The detailed design and theoretical analysis of a proof theory based on these ideas
is nontrivial. For the specific case of singly-linked list segments, Berdine and Calcagno
were able to formulate a complete and terminating proof theory. There is no space to go
into all the details of their theory, but it is worth listing their abstraction rules, presented
here as entailments.
Rolling

    emp ⊢ ls(E, E)
    E1 ≠ E3 ∧ E1 ↦ E2 ∗ ls(E2, E3) ⊢ ls(E1, E3)

Induction Avoidance

    ls(E1, E2) ∗ ls(E2, nil) ⊢ ls(E1, nil)
    ls(E1, E2) ∗ E2 ↦ nil ⊢ ls(E1, nil)
    ls(E1, E2) ∗ ls(E2, E3) ∗ E3 ↦ E4 ⊢ ls(E1, E3) ∗ E3 ↦ E4
    E3 ≠ E4 ∧ ls(E1, E2) ∗ ls(E2, E3) ∗ ls(E3, E4) ⊢ ls(E1, E3) ∗ ls(E3, E4)
The remarkable thing about these abstraction rules is not that they are sound, but that
they are, in a sense, complete: any true fact about list segments and points-to facts that can be expressed
in symbolic heap form can be proven using these axioms, without appealing to an explicit
induction axiom or rule. The Berdine/Calcagno proof theory works by using these rules
on the left (in effect employing a special case of the Cut rule of sequent calculus). It has
other rules as well, such as for inferring x ≠ nil from x ↦ −: at every stage, their decision
procedure records as many pure disequalities as possible on the left, and it substitutes
out all equalities, getting to a kind of normal form. It is this normal form that makes the
subtraction rule complete (a two-way inference rule).
Note: in this subsection we have gone back to the ls rather than lsne predicate, as
Berdine and Calcagno formulated their rules for ls. In fact, it is easier to design a com-
plete proof theory for lsne rather than ls. It is also relatively easy (and was folklore
knowledge) to see that entailment for lsne symbolic heaps can be decided in poly-
time, but the question for ls remained open until a recent paper showed that this
entailment is indeed in polytime as well [23]. (The reason for subtlety in
this question is related to question 3 of Exercise 10; you might go back there and wonder
about it.)
I like to call this approach of combining abstraction and subtraction rules the
‘crunch, crunch’ method. It works by taking a sequent H ⊢ H′ and applying abstraction
and subtraction rules to crunch it down to a smaller size by removing ∗-conjuncts, until
you get emp as the spatial part on one side or the other of ⊢. If you have emp on only one
side, you have a failed proof. If instead you reach a sequent of the form Π ∧ emp ⊢ Π′ ∧ emp,
you can then ask a straight classical-logic question Π ⊢ Π′. This final check, Π ⊢ Π′,
is a place where one could call an external theorem prover, say for a decidable theory
such as linear arithmetic, and that is all the more useful when the pure part can contain a
richer variety of assertions than in the simple fragment considered in this section. Indeed,
a number of provers for separation logic have been developed that use variations on
this ‘crunch, crunch’ approach together with an external classical-logic solver, including
[13,68] and the provers inside VERIFAST [48], JSTAR [31], HIP [60] and SLAYER [9].
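To make the strategy concrete, here is a toy Python sketch of a ‘crunch, crunch’ check for ground spatial entailments built from ↦ and ls atoms only; it implements the subtraction step plus a merged, nil-targeted form of the abstraction entailments above, reads list(E) as ls(E, nil) (an assumption here), and omits the pure reasoning and normal-form bookkeeping of the real proof theory. All names are illustrative.

    def _abstract_once(atoms):
        """One left-hand abstraction step:  A(E1,E2) * B(E2,nil)  =>  ls(E1,nil),
        where A and B each range over |-> ("pt") and ls.  nil-targeted cases only."""
        atoms = list(atoms)
        for i, a in enumerate(atoms):
            for j, b in enumerate(atoms):
                if i != j and a[0] in ("pt", "ls") and b[0] in ("pt", "ls") \
                        and (b[1], b[2]) == (a[2], "nil"):
                    rest = [x for k, x in enumerate(atoms) if k not in (i, j)]
                    return tuple(rest) + (("ls", a[1], "nil"),)
        return None

    def entails(lhs, rhs):
        """Crunch both sides down; succeed iff both reach emp together."""
        lhs, rhs = list(lhs), list(rhs)
        for atom in list(rhs):                  # Subtract identical atoms.
            if atom in lhs:
                lhs.remove(atom)
                rhs.remove(atom)
        if not lhs and not rhs:
            return True                         # emp |- emp: axiom
        crunched = _abstract_once(lhs)
        if crunched is not None:
            return entails(crunched, rhs)
        return False                            # stuck: failed proof

    # The list_append obligation:  ls(x,t) * t |-> y * ls(y,nil)  |-  ls(x,nil).
    print(entails((("ls", "x", "t"), ("pt", "t", "y"), ("ls", "y", "nil")),
                  (("ls", "x", "nil"),)))       # True
    # The broken variant (t->tl = nil): junk ls(y,nil) is left over.
    print(entails((("ls", "x", "t"), ("pt", "t", "nil"), ("ls", "y", "nil")),
                  (("ls", "x", "nil"),)))       # False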
Entailment is a standard problem for verifiers to face. In work applying separation logic,
a pivotal development has been the identification of the notion of frame inference, which is
an extension of the entailment question:
In a frame inference question of the form
    A ⊢ B ∗ ?frame
the task is, given A and B, to find a formula ?frame which makes the entailment
valid.
Frame inference gives a way to find the ‘leftover’ portions of heap needed to automati-
cally apply the frame rule in program proofs. This extended entailment capability is used
at procedure call sites, where A is an assertion at the call site and B a precondition from
a procedure’s specification.
A first solution to frame inference was sketched in [6] and implemented in the
SMALLFOOT tool. The SMALLFOOT approach works by using information from failed
proofs of the standard entailment question A ⊢ B. Essentially, a failed proof of the form

    F ⊢ emp
       ⋮
    A ⊢ B
tells us that F is a frame. For, from such a failed proof we can form a proof

    F ⊢ F
    F ⊢ emp ∗ F
       ⋮
    A ⊢ B ∗ F
by tacking ∗F on the right everywhere in the failed proof. So, the frame inferring pro-
cedure is to go upwards using the ‘crunch, crunch’ proof search method until you can
go no further: if your attempted proof is of the form indicated above, it can tell you a
frame. (Dealing with multiple branches in proofs requires some more subtlety than this
description indicates.)
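In the same toy setting, the simplest possible sketch of frame inference subtracts B's ∗-conjuncts from A syntactically and returns whatever is left over as the frame; the real tools instead read the frame off the leftover of a full ‘crunch, crunch’ proof search, so this is only the degenerate case where no abstraction is needed. Purely illustrative.

    def infer_frame(A, B):
        """A |- B * ?frame by syntactic subtraction: if every *-conjunct of B
        occurs in A (as a multiset), the remaining conjuncts of A are a frame."""
        remaining = list(A)
        for atom in B:
            if atom not in remaining:
                return None                 # this naive method finds no frame
            remaining.remove(atom)
        return tuple(remaining)

    # Call-site heap  x |-> y * ls(z, nil)  against callee precondition  x |-> y:
    A = (("pt", "x", "y"), ("ls", "z", "nil"))
    B = (("pt", "x", "y"),)
    print(infer_frame(A, B))                # (('ls', 'z', 'nil'),) -- the frame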
Frame inference is a workhorse of separation logic verification tools. As you can
imagine from the discussion surrounding DispTree in Section 2, it is used at procedure
call sites to identify the part of a symbolic heap that is not touched by a procedure.
Interprocedural program analysis tools typically use (often incomplete) implementations
of frame inference for reasoning with ‘procedure summaries’ [39,59]. In SMALLFOOT,
proof rules for critical regions in concurrent programs are verified using little phantom
procedures (called ‘specification statements’) with specs of the form {emp} − {R} and
{R} − {emp} for materializing and annihilating portions of storage protected by a lock.
Indeed, if one has a good enough frame inference capacity, then symbolic execution can
be seen to be a special case of a more general scheme, where basic commands are treated
as specification statements (the small axioms), and frame inference is used in place of the
special concept of rearrangement. SMALLFOOT and SPACEINVADER did not follow this
idealistic approach, preferring to optimize for the common case of the basic statements,
but the more recent JSTAR and SLAYER are examples of tools that call a frame-inferring
theorem prover at every step of symbolic execution [31,9].
Beginning in 2006 [30,52], a significant amount of work has been done on the use of
separation logic in automatic program analysis. There are a number of academic tools,
including SPACEINVADER [5,85], THOR [53], XISA [20], FORESTER [41], PREDATOR
[33], SMALLFOOTRG [19], HEAP-HOP [83] and JSTAR [31], and the industrial tools
INFER from Monoidics [16] and SLAYER from Microsoft [9]. The tools in this area are
a new breed of shape analysis, which attempt to discover the shapes of data structures
(e.g., whether a list is cyclic or acyclic) in a program [75]. These tools cannot prove
functional correctness, but can be applied to code in the thousands or even millions of
LOC [85,17].
The general context for this use of separation logic concerns the relation between
program logic and program analysis. It has been known since the work of the Cousots
in the 1970s [24,25] that concepts from Hoare logic and static program analysis are
related. In principle, static analysis can be used to calculate loop invariants and procedure
specifications via fixed-point computations, thereby lessening annotation burden. There
is a price to pay in that trying to be completely automatic in this way almost forces one
to step away from the ideal of proving full functional correctness.
While the relation between analysis and verification has been long known in prin-
ciple, the last decade has seen a surge of interest in verification-by-static-analysis, with
practical demonstrations of its potential such as in SLAM’s application of proof tech-
nology to Microsoft device drivers [2] and ASTRÉE’s proof of the absence of run-time
errors in Airbus code [26]. Separation logic enters the picture because these practical
tools for verification-oriented static analysis ignore pointer-based data structures, or use
coarse models that are insufficient to prove basic properties of them; e.g., SLAM assumes
memory safety, and ASTRÉE works only on input programs that do not use dynamic
allocation. Similar remarks apply to other tools such as BLAST, Magic and others. Data
structures present a significant problem in verification-oriented program analysis, and
that is the point that the separation logic program analyses are trying to address.
This section illustrates the ideas in the abstractions used in separation logic program
analyzers. To begin, suppose you were to continually symbolically execute a program
with a while loop. You collect sets of formulae (abstract states) at program points, and
generate new ones by symbolically executing program statements. The immediate prob-
lem is that you would go on generating symbolic heaps on loop iterations, and the pro-
cess could diverge: you would never stop generating new symbolic heaps. The most basic
idea of program analysis is to use abstraction, the losing of information, to ensure that
such a process terminates.
Consider the following program that creates a linked list of indeterminate length.
{Pre: emp}
x := nil;
while (nondet()) {
    y := cons(-);
    y->tl := x;
    x := y;
}
Suppose we start symbolically executing the program from pre-state emp. On the first
iteration, at the program point immediately inside the loop, we will certainly have that
x = nil ∧ emp is true, so let us record this in
Loop Invariant so far (1st iteration)
x = nil ∧ emp.
Now, if we go around the loop once more, then it is clear that x ↦ nil will be true at the
same program point, so let us add that to our calculation.
Loop Invariant so far (2nd iteration)
    (x = nil ∧ emp) ∨ (x ↦ nil).
At the next step we get
Loop Invariant so far (3rd iteration)
    (x = nil ∧ emp) ∨ (x ↦ nil) ∨ (x ↦ X ∗ X ↦ nil)
because we put another element on the front of the list. If we keep going this way, we
will get lists of length 3, 4 and so on: infinite regress. However, before we go around the
loop again, we might employ abstraction, to conclude that we have a list segment. That
is, we use the entailment
    x ↦ X ∗ X ↦ nil ⊢ lsne(x, nil)
to obtain
Loop Invariant so far (4th iteration after abstraction)
    (x = nil ∧ emp) ∨ (x ↦ nil) ∨ lsne(x, nil)
Lo and behold, what we have obtained on the 4th iteration after abstraction is the same
as the 3rd. We might as well stop now, as further execution of this variety will not give
us anything new: we have reached a fixed-point. As it happens, this loop invariant is also
the postcondition of the procedure in this example.
In this narrative the human (me) was the abstract interpreter, choosing when and
how to do abstraction. To implement a tool we need to make it systematic. In the
S PACE I NVADER breed of tools, this is done using rewrite rules that correspond to
Berdine/Calcagno abstraction rules for entailment described in Section 4.4. The abstract
interpreter is sound automatically because applying those rules to simplify formulae is
just using the Hoare rule of consequence on the right. The art is in not applying the
rules too often, which would make one lose too much information, sometimes resulting
in fault coming out of your abstract interpreter for perfectly safe programs (a ‘false
alarm’).
The way you set up a proof-theoretic abstract interpreter is as follows. In addition to
symbolic execution rules, there are abstraction rules which you apply periodically (say,
when going through loop iterations); this allows the execution process to saturate (find a
fixed-point). Here are some of the rules used in (baby) SPACEINVADER [30].
(ρ, ρ′ range over lsne, ↦)

    ∃X. H ∗ ρ(x, Y) ∗ ρ′(Y, Z)  −→  ∃X. H ∗ lsne(x, Z)    where Y not free in H

    ∃X. H ∗ ρ(Y, Z)  −→  ∃X. H ∗ true    where Y not provably reachable from program vars
The first rule says to forget about the length of uninterrupted list segments, where there
are no outside pointers (from H) into the internal point. The abstraction ‘gobbles up’
logical variables appearing in internal points of lists, by swallowing them into list seg-
ments, as long as these internal points are unshared. This is true of either free or bound
logical variables. The requirement that they not be shared is an accuracy rather than
soundness consideration; we stop the rules from firing too often, so as not to lose too
much information.
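Here, as a hedged sketch, is how the pieces might be put together for the list-building loop of the previous subsection: symbolic states are sets of spatial-atom tuples (the pure parts and the x = nil ∧ emp disjunct are elided), the net effect of the loop body is hard-coded rather than derived from the execution rules, the abstraction step is a cut-down version of the first rewrite rule above, and every name is an illustrative invention.

    import itertools

    _fresh = itertools.count()

    def body(spatial):
        """Net effect of  y := cons(-); y->tl := x; x := y  on the spatial part:
        rename x to a fresh logical X, then add x |-> X (new cell points at old list)."""
        X = f"X{next(_fresh)}"
        ren = lambda e: X if e == "x" else e
        renamed = tuple((k, ren(a), ren(b)) for (k, a, b) in spatial)
        return renamed + (("pt", "x", X),)

    def abstract(spatial):
        """Gobble an unshared logical variable Y:  rho(a,Y) * rho'(Y,b) --> lsne(a,b)."""
        atoms = list(spatial)
        for i, (_, a, y) in enumerate(atoms):
            for j, (_, y2, b) in enumerate(atoms):
                if i == j or y != y2 or not y.startswith("X"):
                    continue
                others = [t for k, t in enumerate(atoms) if k not in (i, j)]
                if any(y in t for t in others):
                    continue                   # Y is shared: do not fire
                return abstract(tuple(others) + (("lsne", a, b),))
        return tuple(spatial)

    def canon(spatial):
        """Abstract, then rename fresh variables canonically so states compare equal."""
        names = {}
        out = []
        for (k, a, b) in sorted(abstract(spatial)):
            a = names.setdefault(a, f"X{len(names)}") if a.startswith("X") else a
            b = names.setdefault(b, f"X{len(names)}") if b.startswith("X") else b
            out.append((k, a, b))
        return frozenset(out)

    # Start from the first-iteration state x |-> nil and iterate to a fixed point.
    states = {canon((("pt", "x", "nil"),))}
    while True:
        new = states | {canon(body(tuple(s))) for s in states}
        if new == states:
            break
        states = new
    print(states)   # {x |-> nil} and {lsne(x, nil)}: the fixed point, minus x = nil /\ emp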
[A remark on terminology: What I am simply calling the ‘abstraction’ step is a
special case of what is called Widening in the abstract interpretation literature [24], and
a direct analogue of the ‘canonical abstraction’ used in 3-valued shape analysis [76].]
Exercise 11 Define a program that never faults, but for which the abstract semantics just
sketched returns fault.
Although I have not given a fully formal specification of the abstract interpreter
above, thinking about the nature of the list segment predicates and the restricted syntax
of symbolic heaps is one way to find such a program (e.g., if the program needs a loop
invariant that is not expressible with finitely many symbolic heaps).
This exercise actually concerns a general point in program analysis. If you have a ter-
minating analysis that is trying to solve an undecidable problem, it must necessarily be
possible to trick the analysis. Since most interesting questions about programs are un-
decidable, we must accept that any program analysis for these questions will have an
heuristic aspect in its design.
This specific abstraction idea in the illustration in this section, to forget about the length
of uninterrupted list segments, is sometimes called ‘the Distefano abstraction’: it was
defined by Distefano in his PhD thesis [29]. The idea does not depend on separation
logic, and similar ideas have been used in other abstract domains, such as based on 3-
valued logic [54] or on graphs [55]. Once SMALLFOOT's symbolic execution appeared,
it was relatively easy to port Distefano's abstraction to separation logic when
defining BABY SPACEINVADER [30]. Around the same time, very similar ideas were
independently discovered by Magill et al. [52].
These first abstract interpreters did not achieve a lot practically, but opened up
the possibility of exploring the use of separation logic in program analysis. A growing
amount of work has since gone forward in a number of directions; an incomplete list, whose
entries are good places to start for further reading, includes the following.
1. The use of the frame rule and frame inference A ⊢ B ∗ ?frame in interproce-
dural analysis [39];
2. The use of abductive inference A ∗ ?antiframe ⊢ B to approximate foot-
prints, leading to a compositional analysis, and a boost to the level of automation
and scalability [17];
3. The use of a higher-order list segment notion to attack complicated data structures
in device drivers [5,85];
4. Analyses for concurrent programs [40];
5. Automatic parallelization [71,46];
6. Program-termination proving [8,51,14];
7. Analysis of data structures with sharing [21,50].
This last section has been but a sketch, and I have left out a lot of details. I will
possibly extend these notes in future to put more formalities into this last section. For
now I just point you to [30] for a mathematically thorough description of one abstract
interpreter based on separation logic.
The leading edge, as of 2011, of what can be achieved practically on real-world
code by these tools is probably represented by SLAYER [9] and INFER [16], and can
be glimpsed in the academic papers that fed into them [85,17]. However, there are some
areas (sharing, trees) where academic prototypes outperform them precision-wise, and
the leading edge in any case is moving quickly at this moment.
I have talked about automatic verification and analysis in these notes, but many of the
ideas – such as frame inference, symbolic execution, abstraction/subtraction-based proof
theory – are relevant as well in interactive proving. There have been several embeddings
of separation logic in higher-order logics used in interactive proof assistants (e.g., [56,
35,82,79,58,1]), where the proof theoretic or symbolic execution rules are derived as
lemmas. A recent paper [22] gives a good account of the state of the art and references
to the literature, as well as an explanation of expressivity limitations of approaches to
program verification based on automatic theorem proving for first-order logic.
It should be mentioned that there is no conflict here of having several logics (sepa-
ration, first-order, higher-order, etc.): there is no need to search for ‘the one true logic’.
In particular, even though they can be embedded in foundational higher-order logics,
special-purpose formalisms like separation and modal and temporal logics are useful for
identifying specification and reasoning idioms that make specifications and proofs easier
to find, for either the human or the machine.
References
[1] A.W. Appel. VeriSmall: Verified Smallfoot shape analysis. CPP 2011: First International Conference
on Certified Programs and Proofs, 2011.
[2] T. Ball, E. Bounimova, B. Cook, V. Levin, J. Lichtenberg, C. McGarvey, B. Ondrusek, S.K. Rajamani,
and A. Ustuner. Thorough static analysis of device drivers. In Proceedings of the 2006 EuroSys Confer-
ence, pages 73–85, 2006.
[3] A. Banerjee, D.A. Naumann, and S. Rosenberg. Regional logic for local reasoning about global invari-
ants. In 22nd ECOOP, Springer LNCS 5142, pages 387–411, 2008.
[4] M. Barnett, R. DeLine, M. Fahndrich, K.R.M. Leino, and W. Schulte. Verification of object-oriented
programs with invariants. Journal of Object Technology, 3(6):27–56, 2004.
[5] J. Berdine, C. Calcagno, B. Cook, D. Distefano, P. O’Hearn, T. Wies, and H. Yang. Shape analysis of
composite data structures. 19th CAV, 2007.
[6] J. Berdine, C. Calcagno, and P.W. O’Hearn. Symbolic execution with separation logic. In K. Yi, editor,
APLAS 2005, volume 3780 of LNCS, 2005.
[7] J. Berdine, C. Calcagno, and P.W. O’Hearn. Smallfoot: Automatic modular assertion checking with
separation logic. In 4th FMCO, pp115-137, 2006.
[8] J. Berdine, B. Cook, D. Distefano, and P. O’Hearn. Automatic termination proofs for programs with
shape-shifting heaps. In 18th CAV, Springer LNCS 4144, pages 386–400, 2006.
[9] J. Berdine, B. Cook, and S. Ishtiaq. Slayer: Memory safety for systems-level code. In 23rd CAV, Springer
LNCS 6806, pages 178–183, 2011.
[10] B. Biering, L. Birkedal, and N. Torp-Smith. BI-hyperdoctrines, higher-order separation logic, and ab-
straction. ACM TOPLAS, 5(29), 2007.
[11] A. Borgida, J. Mylopoulos, and R. Reiter. On the frame problem in procedure specifications. IEEE
Transactions of Software Engineering, 21:809–838, 1995.
[12] R. Bornat. Proving pointer programs in Hoare logic. Mathematics of Program Construction, 2000.
[13] M. Botincan, M.J. Parkinson, and W. Schulte. Separation logic verification of C programs with an SMT
solver. Electr. Notes Theor. Comput. Sci., 254:5–23, 2009.
[14] J. Brotherston, R. Bornat, and C. Calcagno. Cyclic proofs of program termination in separation logic.
In 35th POPL, pages 101–112, 2008.
[15] R.M. Burstall. Some techniques for proving correctness of programs which alter data structures. Ma-
chine Intelligence, 7:23–50, 1972.
[16] C. Calcagno and D. Distefano. Infer: An automatic program verifier for memory safety of C programs.
In NASA Formal Methods Symposium, Springer LNCS 6617, pages 459–465, 2011.
[17] C. Calcagno, D. Distefano, P.W. O’Hearn, and H. Yang. Compositional shape analysis by means of
bi-abduction. Journal of the ACM 58(6). (Preliminary version appeared in POPL’09.), 2011.
[18] C. Calcagno, P. O’Hearn, and H. Yang. Local action and abstract separation logic. In 22nd LICS,
pp366-378, 2007.
[19] C. Calcagno, M.J. Parkinson, and V. Vafeiadis. Modular safety checking for fine-grained concurrency.
In 14th SAS, Springer LNCS 4634, pages 233–248, 2007.
[20] B. Chang and X. Rival. Relational inductive shape analysis. In 36th POPL, pages 247–260. ACM, 2008.
[21] R. Cherini, L. Rearte, and J.O. Blanco. A shape analysis for non-linear data structures. In 17th SAS,
Springer LNCS 6337, pages 201–217, 2010.
[22] A. Chlipala. Mostly-automated verification of low-level programs in computational separation logic. In
32nd PLDI, pages 234–245, 2011.
[23] B. Cook, C. Haase, J. Ouaknine, M.J. Parkinson, and J. Worrell. Tractable reasoning in a fragment of
separation logic. In 22nd CONCUR, Springer LNCS 6901, pages 235–249, 2011.
[24] P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs
by construction or approximation of fixpoints. In 4th POPL, pp238-252, 1977.
[25] P. Cousot and R. Cousot. Systematic design of program analysis frameworks. 6th POPL, pp269-282,
1979.
[26] P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D. Monniaux, and X. Rival. The ASTRÉE
analyzer. 14th ESOP, pp21-30, 2005.
[27] E.W. Dijkstra. A Discipline of Programming. Prentice Hall, 1976.
[28] T. Dinsdale-Young, P. Gardner, and M.J. Wheelhouse. Abstraction and refinement for local reasoning.
In 3rd VSTTE, Springer LNCS 6217, pages 199–215, 2010.
[29] D. Distefano. On model checking the dynamics of object-based software: a foundational approach. PhD
thesis, University of Twente, 2003.
[30] D. Distefano, P. O’Hearn, and H. Yang. A local shape analysis based on separation logic. In 12th
TACAS, 2006. pp287-302.
[31] D. Distefano and M. Parkinson. jStar: Towards Practical Verification for Java. In 23rd OOPSLA, pages
213–226, 2008.
[32] Mike Dodds, Suresh Jagannathan, and Matthew J. Parkinson. Modular reasoning for deterministic par-
allelism. In 38th POPL, pages 259–270, 2011.
[33] K. Dudka, P. Peringer, and T. Vojnar. Predator: A practical tool for checking manipulation of dynamic
data structures using separation logic. In 23rd CAV, Springer LNCS 6806, pages 372–378, 2011.
[34] X. Feng, R. Ferreira, and Z. Shao. On the relationship between concurrent separation logic and assume-
guarantee reasoning. In 16th ESOP, Springer LNCS 4421, 2007.
[35] X. Feng, Z. Shao, Y. Guo, and Y. Dong. Combining domain-specific and foundational logics to verify
complete software systems. In 2nd VSTTE, Springer LNCS 5295, pages 54–69, 2008.
[36] R.W. Floyd. Assigning meaning to programs. Proceedings of Symposium on Applied Mathematics, Vol.
19, J.T. Schwartz (Ed.), A.M.S., pp. 19–32, 1967.
[37] M. Foley and C.A.R. Hoare. Proof of a recursive program: Quicksort. Computer Journal, 14:391–395,
1971.
[38] P. Gardner, S. Maffeis, and G. Smith. Towards a program logic for Javascript. In 40th POPL. ACM,
2012.
[39] A. Gotsman, J. Berdine, and B. Cook. Interprocedural shape analysis with separated heap abstractions.
In 13th SAS,Springer LNCS 4134, pages 240–260, 2006.
[40] A. Gotsman, J. Berdine, B. Cook, and M. Sagiv. Thread-modular shape analysis. In 28th PLDI, pages
266–277, 2007.
[41] P. Habermehl, L. Holík, A. Rogalewicz, J. Simácek, and T. Vojnar. Forest automata for verification of
heap manipulation. In 23rd CAV, Springer LNCS 6806, 2011.
[42] C.A.R. Hoare. An axiomatic basis for computer programming. Comm. ACM, 12(10):576–580 and 583,
1969.
[43] C.A.R. Hoare. Procedures and parameters: An axiomatic approach. In E. Engler, editor, Symposium on
the Semantics of Algebraic Languages, pages 102–116. Springer, 1971. Lecture Notes in Math. 188.
[44] C.A.R. Hoare. Proof of a Program: FIND. Comm. ACM, 14(1):39–45, 1971.
[45] C.A.R. Hoare and N. Wirth. An axiomatic definition of the programming language Pascal. Acta Infor-
matica, 2:335–355, 1973.
[46] C. Hurlin. Automatic parallelization and optimization of programs by proof rewriting. In 16th SAS,
Springer LNCS 5673, pages 52–68, 2009.
[47] S. Isthiaq and P. W. O’Hearn. BI as an assertion language for mutable data structures. In 28th POPL,
pages 36–49, 2001.
[48] B. Jacobs, J. Smans, P. Philippaerts, F. Vogels, W. Penninckx, and F. Piessens. Verifast: A powerful,
sound, predictable, fast verifier for C and Java. In NASA Formal Methods Symposium, Springer LNCS
6617, pages 41–55, 2011.
[49] I.T. Kassios. The dynamic frames theory. Formal Asp. Comput., 23(3):267–288, 2011.
[50] O. Lee, H.Yang, and R. Petersen. Program analysis for overlaid data structures. In 23rd CAV, Springer
LNCS 6808, pages 592–608, 2011.
[51] S. Magill, J. Berdine, E.M. Clarke, and B. Cook. Arithmetic strengthening for shape analysis. In 14th
SAS, Springer LNCS 4634, pages 419–436, 2007.
[52] S. Magill, A. Nanevski, E. Clarke, and P. Lee. Inferring invariants in Separation Logic for imperative
list-processing programs. 3rd SPACE Workshop, 2006.
[53] S. Magill, M.-S. Tsai, P. Lee, and Y.-K. Tsay. THOR: A tool for reasoning about shape and arithmetic.
20th CAV, Springer LNCS 5123. pp 428-432, 2008.
[54] R. Manevich, E. Yahav, G. Ramalingam, and M. Sagiv. Predicate abstraction and canonical abstraction
for singly-linked lists. In 6th VMCAI, pages 181–198, 2005.
[55] M. Marron, M.V. Hermenegildo, D. Kapur, and D. Stefanovic. Efficient context-sensitive shape analysis
with graph based heap models. In 17th CC, Springer LNCS 4959, pages 245–259, 2008.
[56] N. Marti and R. Affeldt. A certified verifier for a fragment of separation logic. Computer Software,
25(3):135-147, 2008.
[57] J. McCarthy and P. Hayes. Some philosophical problems from the standpoint of artificial intelligence.
Machine Intelligence, 4:463–502, 1969.
[58] A. Nanevski, V. Vafeiadis, and J. Berdine. Structuring the verification of heap-manipulating programs.
In 37th POPL, pages 261–274, 2010.
[59] H.H. Nguyen and W.-N. Chin. Enhancing program verification with lemmas. 20th CAV, Springer LNCS
5123. pp 355-369, 2008.
[60] H.H. Nguyen, C. David, S. Qin, and W.-Ngan Chin. Automated verification of shape and size properties
via separation logic. In 8th VMCAI, Springer LNCS 4349, pages 251–266, 2007.
[61] P. O’Hearn, J. Reynolds, and H. Yang. Local reasoning about programs that alter data structures. In
15th CSL, pp1-19, 2001.
[62] P. W. O’Hearn. Resources, concurrency and local reasoning. Theoretical Computer Science, 375(1-
3):271–307, 2007. (Preliminary version appeared in CONCUR’04, LNCS 3170, pp49-67).
[63] P. W. O’Hearn and D. J. Pym. The logic of bunched implications. Bulletin of Symbolic Logic, 5(2):215–
244, June 99.
[64] P.W. O’Hearn, H. Yang, and J.C. Reynolds. Separation and information hiding. ACM TOPLAS, 31(3),
2009.
[65] M. Parkinson, R. Bornat, and C. Calcagno. Variables as resource in Hoare logics. In 21st LICS, 2006.
[66] M. J. Parkinson. Local Reasoning for Java. Ph.D. thesis, University of Cambridge, 2005.
[67] M.J. Parkinson and A.J. Summers. The relationship between separation logic and implicit dynamic
frames. In 20th ESOP, Springer LNCS 6602, pages 439–458, 2011.
[68] J. A. Navarro Pérez and A. Rybalchenko. Separation logic + superposition calculus = heap theorem
prover. In 32nd PLDI, pages 556–566, 2011.
[69] D. Pym, P. O’Hearn, and H. Yang. Possible worlds and resources: the semantics of BI. Theoretical
Computer Science, 315(1):257–305, 2004.
[70] D.J. Pym. The Semantics and Proof Theory of the Logic of Bunched Implications. Applied Logic Series.
Kluwer Academic Publishers, 2002.
[71] M. Raza, C. Calcagno, and P. Gardner. Automatic parallelization with separation logic. In 18th ESOP,
Springer LNCS 5502, pages 348–362, 2009.
[72] M. Raza and P. Gardner. Footprints in local reasoning. Logical Methods in Computer Science, 5(2),
2009.
[73] J. C. Reynolds. Intuitionistic reasoning about shared mutable data structure. In Jim Davies, Bill Roscoe,
and Jim Woodcock, editors, Millennial Perspectives in Computer Science, pages 303–321, Houndsmill,
Hampshire, 2000. Palgrave.
[74] J. C. Reynolds. Separation logic: A logic for shared mutable data structures. In 17th LICS, pp55-74,
2002.
[75] M. Sagiv, T. Reps, and R. Wilhelm. Solving shape-analysis problems in languages with destructive
updating. ACM TOPLAS, 20(1):1–50, 1998.
[76] M. Sagiv, T. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. ACM TOPLAS,
24(3):217–298, 2002.
[77] J. Smans, B. Jacobs, and F. Piessens. Implicit dynamic frames: Combining dynamic frames and separa-
tion logic. In 23rd ECOOP, LNCS 5653, pages 148–172, 2009.
[78] C. Strachey. Towards a formal semantics. In T. B. Steel, Jr., editor, Formal Language Description
Languages for Computer Programming, Proceedings of the IFIP Working Conference, pages 198–220,
Baden bei Wien, Austria, September 1964. North-Holland, Amsterdam, 1966.
[79] T. Tuerk. A formalisation of Smallfoot in HOL. In TPHOLs, 22nd International Conference, LNCS
5674, pages 469–484, 2009.
[80] V. Vafeiadis. Shape-value abstraction for verifying linearizability. In 10th VMCAI, LNCS 5403, pages
335–348, 2009.
[81] V. Vafeiadis and M.J. Parkinson. A marriage of rely/guarantee and separation logic. In 18th CONCUR,
Springer LNCS 4703, pages 256–271, 2007.
[82] C. Varming and L. Birkedal. Higher-order separation logic in Isabelle/HOLCF. 24th MFPS, 2008.
[83] J. Villard, É. Lozes, and C. Calcagno. Tracking heaps that hop with Heap-Hop. In 16th TACAS, Springer
LNCS 6015, pages 275–279, 2010.
[84] H. Yang. Local Reasoning for Stateful Programs. Ph.D. thesis, University of Illinois, Urbana-
Champaign, 2001.
[85] H. Yang, O. Lee, J. Berdine, C. Calcagno, B. Cook, D. Distefano, and P. O’Hearn. Scalable shape
analysis for systems code. 20th CAV, Springer LNCS 5123. pp 385-398, 2008.
[86] H. Yang and P. O’Hearn. A semantic basis for local reasoning. In 5th FOSSACS, 2002. Springer LNCS
2303 , pp402-416.