Efficient Path Conditions in Dependence Graphs For Early Software Safety Analysis
A new method for software safety analysis is presented which uses program slicing and constraint
solving to construct and analyze path conditions, conditions defined on a program’s input vari-
ables which must hold for information flow between two points in a program. Path conditions are
constructed from subgraphs of a program’s dependence graph, specifically, slices and chops. The
article describes how constraint solvers can be used to determine if a path condition is satisfiable
and, if so, to construct a witness for a safety violation, such as an information flow from a program
point at one security level to another program point at a different security level. Such a witness
can prove useful in legal matters.
The article reviews previous research on path conditions in program dependence graphs;
presents new extensions of path conditions for arrays, pointers, abstract data types, and multi-
threaded programs; presents new decomposition formulae for path conditions; demonstrates how
interval analysis and BDDs (binary decision diagrams) can be used to reduce the scalability prob-
lem for path conditions; and presents case studies illustrating the use of path conditions in safety
analysis. Applying interval analysis and BDDs is shown to overcome the combinatorial explosion
that can occur in constructing path conditions. Case studies and empirical data demonstrate the
usefulness of path conditions for analyzing practical programs, in particular, how illegal influences
on safety-critical programs can be discovered and analyzed.
Categories and Subject Descriptors: D.2.4 [Software Engineering]: Software/Program Verifica-
tion—validation, reliability; F.3.1 [Logics and meanings of Programs]: Specifying and Verifying
and Reasoning about Programs—Mechanical Verification; F.3.2 [Logics and meanings of Pro-
grams]: Semantics of Programming Languages—Program Analysis
General Terms: Algorithms, Reliability, Security, Verification, Theory
Additional Key Words and Phrases: Safety analysis, program slicing, path condition, information
flow control
1. INTRODUCTION
In many safety-critical software applications, guarantees are needed that in-
ternal or external agents cannot illegally influence critical computations and
This paper is a revised and extended version of a paper presented at ICSE 2002.
This work is supported by Deutsche Forschungsgemeinschaft, grants DFG Snll/5-1 and Snll/5-2.
Authors’ addresses: G. Snelting, T. Robschink, Fakultät für Mathematik und Informatik, Univer-
sität Passau, Innstr. 33, 94032 Passau, Germany, email: [email protected]; J. Krinke,
FernUniversität in Hagen, Fachbereich Elektrotechnik, 58084 Hagen, Germany.
© 2006 ACM 1049-331X/06/1000-0410 $5.00
ACM Transactions on Software Engineering and Methodology, Vol. 15, No. 4, October 2006, Pages 410–457.
1987; Horwitz et al. 1990] to determine for a given statement x all statements
which may influence x. Appendix 1 explains the relationship between slicing
and Goguen and Meseguer [1984] noninterference.
Slicing today is reasonably fast and can deal with production programs writ-
ten in commercial programming languages. There are some language features
that are hard to deal with, such as pointer arithmetic in C, but, in a safety-
critical context, such features could be disallowed by programming standards.
Unfortunately, even if the best known algorithms are used, slicing is quite
imprecise in practice; slices are bigger than expected and sometimes too big to
be useful [Bent et al. 2000]. Furthermore, slicing gives only binary information;
it can decide whether statement y may influence statement x, or whether this
is definitely not the case, but slicing does not say how “strong” the influence
is or under which circumstances it can happen. For purposes of information
flow control, we therefore proposed to combine slicing with path conditions
and constraint solving [Snelting 1996; Krinke and Snelting 1998]. Let y →∗ x
denote any path in a dependence graph from node y to node x.
Thus path conditions can make slicing more precise by reducing the number
of false positives of potential influences on a computation. Note that our usage
of the term “path condition” differs from its traditional usage in test case gen-
eration. We do not determine necessary conditions for execution flow along a
specific path in the control flow graph; we determine necessary conditions for in-
formation flow between two points in the dependence graph, that is, conditions
that must hold for information flow to occur between those two points.
Snelting [1996] presented fundamental formulae and theorems for the defi-
nition and simplification of path conditions. But at that time we had no imple-
mentation, no efficient algorithms, no support for the full C language, and no
empirical data. Hence the contributions of this article include:
When control flow is structured, (C, →C ) is a tree and CP(x) is a single path.
In practice, unstructured control flow is rare, hence even if (C, →C ) is not a tree,
CP(x) can very well be a single path from START to x.
is unsatisfiable. Thus line 1 cannot influence line 3 even though (1) ∈ BS((3)).
This example demonstrates how path conditions can make slicing more precise.
Since there may be many assignments to the same variable, and therefore
variables may have different values at different program points, all programs
are transformed into static single assignment (SSA) form [Cytron et al. 1991]
first. In SSA form, there is at most one assignment to every variable. If necessary,
we distinguish different SSA variants of a program variable by additional
indices. As an example, consider the fragment and its SSA form
(1) x = a;                (1) x1 = a;
(2) while (x<7) {         (2) while (x2 = φ(x1,x3), x2<7) {
(3)   x = y+x;            (3)   x3 = y+x2;
(4)   if (x==8)           (4)   if (x3==8)
(5)     p(x);             (5)     p(x3);
    }                         }
SSA distinguishes x1 defined in (1) and x3 defined in (3).1 SSA uses φ func-
tions [Cytron et al. 1991] to describe situations where different SSA variants
of a variable may reach the same program point. In the example, x2 = φ(x1 , x3 )
just means that x2 = x1 ∨ x2 = x3 . Statement (5) is only executed if x2 < 7
and x3 = 8, and therefore an information flow along the data dependencies
(1) → (3) → (5) is only possible under that same condition
1 In explanatory examples, we use line numbers as SSA indices; in fact, PDG node numbers are used.
P2, . . . ∈ CH(y, x). The following fundamental formula for a strong and necessary
path condition was introduced by Snelting [1996]:
\[ PC(y, x) = \bigvee_{P_\rho \in CH(y, x)} \; \bigwedge_{z \in P_\rho} E(z). \tag{1} \]
All control conditions c(ν → μ) on the path from START to z must be satisfiable,
otherwise z cannot be executed. In case of unstructured control flow, more than
one control path from the start node to z might exist, and the disjunction of the
corresponding conditions must be taken.
Note that the PDG may contain cycles so the outer disjunction in Equation
(1) may run over infinitely many paths. But Section 4 will prove that nonover-
lapping cycles can simply be ignored and that overlapping cycles can be handled
by interval analysis. For the time being, the reader may ignore cycles.
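To make Equation (1) concrete, the following minimal Python sketch enumerates the paths of a small acyclic chop and forms the disjunction, over all paths, of the conjunction of the execution conditions E(z). The graph, the execution conditions, and the names all_paths, path_condition, and find_witness are hypothetical illustrations, not part of ValSoft; the brute-force search over a small integer domain merely stands in for a real constraint solver.

```python
from itertools import product

def all_paths(edges, src, dst):
    """Enumerate every path from src to dst in an acyclic graph (depth-first)."""
    if src == dst:
        return [[dst]]
    paths = []
    for succ in edges.get(src, []):
        for tail in all_paths(edges, succ, dst):
            paths.append([src] + tail)
    return paths

def path_condition(edges, exec_cond, src, dst):
    """Equation (1): disjunction over all paths of the conjunction of E(z)."""
    paths = all_paths(edges, src, dst)
    return lambda env: any(all(exec_cond[z](env) for z in p) for p in paths)

def find_witness(cond, names, domain):
    """Brute-force stand-in for a constraint solver on a small finite domain."""
    for values in product(domain, repeat=len(names)):
        env = dict(zip(names, values))
        if cond(env):
            return env
    return None

# Hypothetical chop with two paths: 1 -> 2 -> 4 and 1 -> 3 -> 4.
edges = {1: [2, 3], 2: [4], 3: [4]}
exec_cond = {
    1: lambda env: True,
    2: lambda env: env["a"] > 0,
    3: lambda env: not (env["a"] > 0),
    4: lambda env: env["b"] == 1,
}

pc_1_4 = path_condition(edges, exec_cond, 1, 4)
witness = find_witness(pc_1_4, ["a", "b"], range(-1, 2))
print(witness)
```

If the search returns an assignment, that assignment plays the role of a witness; if it returns None, no information flow is possible within the searched domain.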
As already mentioned, SSA form must be used to generate correct path condi-
tions. Hence some additional constraints must be generated which represent φ
functions. Let xi , x j , xk , . . . be different SSA variants of variable x. A φ function
xi = φ(x j , xk , . . . ) generates the additional φ constraints xi = x j ∨ xi = xk ∨ . . . .
In the example, the φ function x2 = φ(x1 , x3 ) translates into the additional φ
constraint x2 = x1 ∨ x2 = x3 , thus
The set of all φ constraints is denoted Φ. For a specific data dependence edge
i → j from a definition of a variable to its use in a φ node, the corresponding
constraint is written Φ(i → j). In the previous example, Φ(1 → 2) ≡ x2 = x1,
and Φ(3 → 2) ≡ x2 = x3.
The Φ constraints (or at least those constraints relevant for a given path)
are always assumed to be part of a path condition, that is, conjunctively added
to PC(y, x). Hence Equation (1) in fact reads
\[ PC(y, x) = \bigvee_{P_\rho \in CH(y, x)} \Bigl( \bigwedge_{z \in P_\rho} E(z) \;\wedge \bigwedge_{u \to v \in P_\rho} \Phi(u \to v) \Bigr). \]
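As a small illustration of how a φ function gives rise to constraints, the sketch below builds, for x2 = φ(x1, x3), one equality per incoming data dependence edge as well as their disjunction. The function names and the dictionary-based environment are assumptions made for this example only.

```python
def phi_edge_constraints(target, sources):
    """One equality constraint per data dependence edge into the phi node."""
    return {src: (lambda env, s=src: env[target] == env[s]) for src in sources}

def phi_constraint(target, sources):
    """Disjunction target = s1 or target = s2 or ... for target = phi(s1, s2, ...)."""
    per_edge = phi_edge_constraints(target, sources)
    return lambda env: any(c(env) for c in per_edge.values())

# x2 = phi(x1, x3) from the loop example.
per_edge = phi_edge_constraints("x2", ["x1", "x3"])
x2_phi = phi_constraint("x2", ["x1", "x3"])

print(x2_phi({"x1": 1, "x2": 1, "x3": 5}))
print(x2_phi({"x1": 1, "x2": 2, "x3": 5}))
```

For a specific path, only the per-edge constraint of the edge actually taken would be conjoined, which is why the per-edge form is kept separately from the disjunction.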
2.7 An Example
Figure 1 presents a mergesort program in C. Parts of the PDG are also pre-
sented, in particular, essential dependences from the chop CH(45, 21) between
constant 999 in line 45 and array temp in line 21. Rounded boxes represent
statement nodes in the SDG, while rectangular boxes represent control con-
ditions; for both, their internal structure is indicated.2 Normal arcs represent
data dependences; boldface arcs represent control dependences; and dashed
arcs represent loop-carried data dependences. The example contains arrays as
well as procedures. The full details for arrays and procedures are presented
only in Section 3. Therefore, this example treats arrays as scalar variables, and
recursive calls are ignored. It serves to explain the basic machinery before we
proceed to more complex language constructs.
2 ValSoft uses a fine-grained PDG where subexpressions have their own PDG nodes; this has
advantages for some applications [Krinke 2003a].
Let us now compute PC(45, 21), that is, the path condition from the start node
to the end node in Figure 1. As recursive calls are ignored, we compute the path
condition for one specific path in CH(45, 21) and not for the whole chop; we leave
out the dependencies from the recursive calls in line 36/37 (full treatment of
interprocedural conditions is presented in Section 3.2). Thus we compute the
path condition for data flow along lines 49 → 51 → 32 → 38 → 10 → 21.
Equations (1) and (2) also apply to parts of a chop such as a specific path but,
of course, they then do not deliver a general condition for information flow between two
program points, only a specific, necessary condition for flow along that specific
path or subchop. Compared to the full chop, conditions for a specific path or
subchop are stronger; if they are satisfiable, then the path condition for the full
chop is satisfiable as well.
To generate the condition, the execution conditions for statements on
the path and their constituent control conditions must be generated. For
example,
\[ c(17 \to 18) \equiv (\mathit{index1}_{17} \le \mathit{mid}_{10}) \wedge (\mathit{index2}_{17} \le \mathit{last}_{10}) \]
(condition 4 in Figure 1), and
\[ c(18 \to 21) \equiv \mathit{data}_{49}[\mathit{index1}_{17}] \ge \mathit{data}_{49}[\mathit{index2}_{17}] \]
(condition 5). Next, Φ constraints must be considered. As we will see later in
more detail, Φ constraints as described before can be improved by substituting
the right-hand side from assignments. A precise definition of this technique will
be given in Section 3.6 (Equation (8)); right now we substitute the right-hand
side of the assignment to m (line 34) in line 36, yielding
\[ \Phi(34 \to 36) \equiv m_{34} = (\mathit{left}_{32} + \mathit{right}_{32})/2. \]
In the example, there is just one assignment to m, hence the Φ constraints do not
contain any disjunctions; de facto they act like constant propagation. In fact,
full constant propagation is automatically built into path conditions, making
the conditions much stronger.
Substitution of right-hand sides can be applied not only to assignments
but also to value parameters. Exploiting such formal/actual constraints
for the calls in lines 51 and 38 (parameter dependences marked 1 and 3),
Remember that all program variables in this necessary condition are exis-
tentially quantified. Furthermore, path conditions in their basic form (Equa-
tions (1)+(2)) treat arrays like scalar variables, and array elements are not
distinguished.
Automatic simplification generates true, which is quite obvious. Of course,
we can always find values for index1, index2, and data[] such that PC(45, 21)
becomes true.
Hence there is high probability that there is information flow from line 45
to line 21 even without a recursive mergesort call; the value 999 is eventually
assigned to the temp array. But note that, due to the coarse-grained array treat-
ment, the path condition is too weak. That means, it is too pessimistic—the
path condition is overly suggestive of the existence of an influence. On the
other hand, the still missing recursive calls would make the path condition
even weaker. This observation is the motivation for extensions which handle
specific language constructs and will be described in the following section.
(5) b = a;
(6) else
(7) c = b;
}
(8) z = c;
In order to compute PC(1, 8), we observe that there is exactly one cycle-free
path in the SDG from (1) to (8), namely 1 → 5 → 7 → 8. All statements on
this path must at least be executable, thus E(5) ≡ (n > 0) ∧ (x > 0) as well as
E(7) ≡ (n > 0) ∧ ¬(x > 0) must be satisfiable. Applying Equation (1), we obtain
PC(1, 8) ≡ E(1) ∧ E(5) ∧ E(7) ∧ E(8) ≡ (n > 0) ∧ (x > 0) ∧ (n > 0) ∧ ¬(x > 0) ≡ false
which is clearly incorrect even though the example is already in SSA form.
The reason is that 5 → 7 is a loop-carried dependency; the value for b is used
for c only one loop iteration later when x may already have a new value. Thus
two values for x must be distinguished, one for the path fragment before the
loop-carried dependency, and one for the path fragment after it.
Earlier versions of the path condition generator simply replaced control con-
ditions containing the same variable connected by a loop-carried dependency
by true. The resulting path conditions are still necessary conditions but weaker
than those respecting loop-carried variable distinctions.
For increased precision, we now use additional SSA indices (and φ functions)
to distinguish between variable instances connected by a loop-carried depen-
dency. For the previous example, we thus obtain
E(5) = (n > 0) ∧ (x1 > 0), E(7) = (n > 0) ∧ ¬(x2 > 0).
In general, such additional indices have to be provided for all path segments
of a chop which are connected by a loop-carried dependency. Henceforth we
assume that the SSA indices respect such loop-carried distinctions of variable
instances. Thus path conditions can now express that a variable may have
different values during loop iterations. The details of this technique can be
found in Krinke [2003a].
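The effect of the additional indices can be checked mechanically. The following sketch compares the conjunction E(5) ∧ E(7) with a single x against the variant with distinct instances x1 and x2. The helper find_witness and the small integer domain are assumptions for illustration; an exhaustive search over that domain replaces a real constraint solver.

```python
from itertools import product

def find_witness(cond, names, domain=range(-2, 3)):
    """Exhaustively search a small domain for a satisfying assignment."""
    for values in product(domain, repeat=len(names)):
        env = dict(zip(names, values))
        if cond(env):
            return env
    return None

def without_indices(e):
    # E(5) and E(7) share the same x, as in the incorrect derivation.
    return (e["n"] > 0 and e["x"] > 0) and (e["n"] > 0 and not (e["x"] > 0))

def with_indices(e):
    # x1 is the value before the loop-carried dependency, x2 the value after it.
    return (e["n"] > 0 and e["x1"] > 0) and (e["n"] > 0 and not (e["x2"] > 0))

print(find_witness(without_indices, ["n", "x"]))
print(find_witness(with_indices, ["n", "x1", "x2"]))
```

The first search finds no assignment, since x > 0 ∧ ¬(x > 0) is contradictory for any value, while the second finds one in which x is positive in one iteration and nonpositive in a later one.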
3.4 Arrays
If array elements are distinguished, additional constraints for index expres-
sions are generated for data dependencies concerning an array. We have already
seen such a constraint in Section 2 (namely, i + 3 = 2 j − 42). In general, any
data dependence edge a[exp1 ] → a[exp2 ] generates a constraint exp1 = exp2 ,
and for a path in the SDG, all such constraints along its edges are conjunctively
added to the path condition. The general formula (1) thus becomes
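\[ PC(x, y) = \bigvee_{P_\rho \in CH(x, y)} \Bigl( \bigwedge_{z \in P_\rho} E(z) \;\wedge \bigwedge_{z \to z' \in P_\rho} \delta(z \to z') \Bigr), \tag{4} \]
where
\[ \delta(z \to z') \equiv \mathit{true} \quad \text{if } z \to z' \text{ is not an array dependence edge,} \]
\[ \delta(a[e_1] \to a[e_2]) \equiv e_1 = e_2 \quad \text{otherwise.} \]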
(1) a[i] = x;
(2) a[j] = y;
(3) z = a[k];
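\[ \delta(z \to z', P_\rho) \equiv \text{let } z_1 \xrightarrow{dd} z_2 \cdots \xrightarrow{dd} z_k \xrightarrow{du} z' \text{ be a maximal subpath in } P_\rho,\ z_k = z, \text{ in } \bigl(A(z_1) = A(z')\bigr) \wedge \bigwedge_{i=2..k} A(z_i) \neq A(z') \tag{6} \]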
Note that the maximal subpath in this equation is uniquely determined. If there
are no dd-edges, Equations (5) and (6) reduce to Equation (4).
For the example, Equations (5) and (6) generate
PC(1, 3) ≡ δ(2 → 3, 1 → 2 → 3) ≡ (i = k) ∧ (j ≠ k)
which makes clear that information flows from (1) to (3) only if (2) does not kill
the definition at (1). This condition is stronger than the condition i = k, which was
obtained using the standard approach.
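The condition (i = k) ∧ (j ≠ k) can be compared against concrete executions of the three-line fragment. The sketch below simulates the fragment for all index combinations of a small array and checks whether the value written in (1) arrives in z exactly when the condition holds. The array size, the marker values, and the function names are assumptions for illustration. In this tiny fragment the condition happens to be exact; in general, path conditions are only necessary conditions.

```python
from itertools import product

def run_fragment(i, j, k, size=3):
    """Simulate statements (1)-(3) and return the value read into z."""
    a = ["init"] * size
    a[i] = "x"      # (1) a[i] = x;
    a[j] = "y"      # (2) a[j] = y;
    return a[k]     # (3) z = a[k];

def pc_1_3(i, j, k):
    """PC(1, 3) with kill information: (i = k) and (j != k)."""
    return i == k and j != k

def condition_matches_execution(size=3):
    """True if z receives x exactly for the index triples satisfying PC(1, 3)."""
    return all(
        (run_fragment(i, j, k, size) == "x") == pc_1_3(i, j, k)
        for i, j, k in product(range(size), repeat=3)
    )

print(condition_matches_execution())
```

The weaker condition i = k alone would also admit the case i = j = k, in which statement (2) overwrites the value from (1).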
Let us now consider the example in Figure 2. For the chop between lines 8
and 21 in Figure 2, ValSoft generates a big path condition which heavily relies
on Equations (4), (5), (6) as well as Φ constraints and substitutions of right-hand
sides. It is simplified by constraint solving to
PC(8, 21) ≡ i = 53.
This condition becomes clear after a closer look at the program. Line 9 is data
dependent on line 8 via a[0]; since line 10 is control dependent on line 9, it is
also dependent on line 8. All the a[i] in line 10 and in particular a[53] are
thus dependent on line 8. The usage of array a in line 21 creates a dependence
if i = l. Φ constraints, acting as constant propagation, imply that in line 21
the only possible value for l is 53. Thus line 21 depends on line 8 if and only if
i = 53.
3.5 Pointers
Until now, with the exception of array dependencies, a data dependence i → j
always leads from the definition i of a variable x to its usage j. In the presence
3 If this is not the case, there are normalizing techniques like Knuth-Bendix completion (see the
excellent description by Baader and Nipkow [1998]) but this is outside the scope of this article.
calls. The path condition generator checks this condition and substitutes right-hand
sides whenever possible, as this usually makes path conditions much stronger.
Full details can be found in Robschink [2005].
3.7 Assertions
In order to improve path conditions by exploiting background knowledge, as-
sertions can be used. Ordinary assert statements in C generate a corresponding
control condition which already makes path conditions stronger. Furthermore,
ValSoft provides XASSERT, which allows more fine-grained control. Besides
specifying a boolean formula, it also allows specification of the scope of an
assertion. Usually the scope is a list of statements within a procedure body;
scopes may be nested.
Assertions can be invariants which in principle could be derived from the
source code but, more typically, are truly additional constraints that cannot be
derived from the program. The use of assertions makes path conditions much
stronger and can reduce the size of path conditions dramatically. In practice,
assertions are often used to focus on a specific region in a chop by providing an
assertion which excludes data flow along other paths. Assertions are assumed
to be valid for all SSA variants of a variable occurring in its scope, thus acting
as a scope invariant.
Figure 3 shows a simple measurement program which was discussed in
Snelting [1996]; this program allows manipulation of the displayed weight
value by keyboard input '+' or '-'. The source text contains two nested XASSERTs,
both containing variable idx. The outer XASSERT affects only idx_{21}
(this SSA variant comes from the for loop initialization idx=0;), whereas the
inner assertion acts upon both idx_{21} and idx_{31} (the latter coming from the loop
variable incrementation idx++;, which in the SDG is part of the loop body).
Thus assertions are not just extracted from the source code, but the appro-
priate SSA indices are added (if more than one SSA variant is affected by an
assertion, the assertion is duplicated for every SSA variant). Finally, the asser-
tion is conjunctively combined with the path condition.
In Figure 3, the original path condition for the chop between keyboard input
(variable p_cd in line 8) and displayed value (printf statement in line 33) is
(without SSA indices)
PC(8, 33) = ((p_ab[0] & 16 = 0) ∧ (p_cd[0] & 1 = 0) ∧ (idx < 7)
             ∧ (p_cd[0] & 16 = 0) ∧ (e_puf[idx] = '+'))
          ∨ ((p_ab[0] & 16 = 0) ∧ (p_cd[0] & 1 = 0) ∧ (idx < 7)
             ∧ (p_cd[0] & 16 = 0) ∧ (e_puf[idx] ≠ '+') ∧ (e_puf[idx] = '-'))
indicating the safety violation mentioned before. Now let us assume the engineer
knows that the hardware used for keyboard input can only deliver
capital letters, but not special characters. This is expressed by the condition
65 ≤ e_puf[idx] ≤ 90 in the second assertion, which is conjunctively added to
PC(8, 33). The result is false (if only primitive hardware is used, the safety
violation is not possible). This example shows that assertions can indeed reduce
path conditions dramatically but also demonstrates that erroneous assertions
can generate false safety statements.
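The effect of the second assertion can be reproduced with a small enumeration. The sketch below models only the e_puf[idx] part of PC(8, 33), which is a deliberate simplification, and conjoins it with the assertion 65 ≤ e_puf[idx] ≤ 90 over all 7-bit character codes. The function names are assumptions for illustration.

```python
def reaches_display(ch):
    """Only the e_puf[idx] part of PC(8, 33): the key code is '+' or '-'."""
    return ch == ord("+") or ch == ord("-")

def capital_letter(ch):
    """The assertion 65 <= e_puf[idx] <= 90 (capital letters only)."""
    return 65 <= ch <= 90

# Character codes that satisfy the condition without and with the assertion.
without_assertion = [c for c in range(128) if reaches_display(c)]
with_assertion = [c for c in range(128) if reaches_display(c) and capital_letter(c)]

print(without_assertion)
print(with_assertion)
```

Since the codes of '+' and '-' lie outside the asserted range, the conjunction has no solution. The same mechanism explains the warning above: an assertion that does not actually hold would silently remove genuine witnesses.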
thread 1:          thread 2:
(1) a = b;         (5) if (x>0)
(2) c = d;         (6)   d = e;
(3) e = a;         (7) if (y>0)
                   (8)   d = a;
It is impossible that (2) is executed after (3); however, due to the interference de-
pendences 3 → 6 and 6 → 2, there exists a path from (3) to (2) and—interpreting
(3) → (6) and (6) → (2) as ordinary data dependency edges—the path condi-
tion computes to PC(3, 2) ≡ x > 0. This weak, but satisfiable, path condition
indicates that (3) may travel backward in time and be executed before (2).
One possibility to eliminate time travel is to compute more precise chops.
Unfortunately, for precise threaded chops, it is not enough to compute more
precise slices (perhaps with Krinke’s technique [1998, 2003b]) and use the in-
tersection of a forward and a backward slice (as for sequential intraprocedural
programs). In the example, the intersection of FS(1) and BS(2) includes (6), but
a precise chop would not contain (6) because there is no execution where (6)
is influenced by (1) and executed before (2); (6) is only influenced by (1) if it
is executed after (2). Still, the intersection of threaded slices can be used as a
basis for path conditions because in Equation (1) time traveling can be excluded
by using the notion of a threaded witness. A threaded witness is a witness of a
possible program execution. It presents a statement execution sequence which
is free of time travel and consistent with the execution order in every thread.
Formally, a sequence l = n1 , . . . , nk of nodes is a threaded witness if and only
if
Hence all nodes in a thread must be reachable from its predecessors if they
cannot execute in parallel. The definition assumes a simple model of parallel
execution similar to structured cobegin/coend parallelism.4 Under the assump-
tion that for every path it can be decided whether it is a threaded witness, the
4 This definition differs from the one presented by Krinke [1998]: it is more precise and more
generally applicable.
Pρ is a threaded witness
In fact, the definition of a threaded witness implies its decidability. The decom-
positions from Section 4.1 can also be applied to the multithreaded version.
The idea is to decompose a path into interference-edge-free subpaths. For the
subpaths, the path condition can be generated without checking the threaded
witness property. The property only has to be checked at the connecting inter-
ference edges.
4. SCALING UP
Path conditions as introduced in the last two sections do not scale. In practice,
SDGs have thousands to tens of thousands of nodes, and chops have thousands
of paths as well as hundreds of cycles. Furthermore, naive generation of path
conditions can easily cause an exponential blowup in their sizes.
To overcome these obstacles, we apply several techniques:
(1) new formulae for the recursive decomposition of path conditions are intro-
duced;
(2) ordered binary decision diagrams (OBDDs) are used to avoid blowup of path
conditions;
(3) interval analysis is performed on the SDG, identifying a hierarchy of re-
ducible loops, irreducible loops, or acyclic subgraphs;
(4) the path condition is generated in a divide-and-conquer-style, exploiting
the interval analysis and decomposition results.
Now let us assume that any path x →∗ y must pass through a subgraph
S ⊆ N . From x, S can only be entered via entry points e1 , . . . , ek ∈ S, and y
can only be reached via exit points o1 , . . . , om ∈ S. Entry and exit points need
not necessarily be disjoint. Then
\[ PC(x, y) = \bigvee_{i=1}^{k} \bigvee_{j=1}^{m} PC(x, e_i) \wedge PC(e_i, o_j) \wedge PC(o_j, y) \tag{12} \]
An important theorem, first proved by Snelting [1996], states that cycles can
be ignored. This makes the set of paths for any chop finite. Let x → x1 → · · · →
xk → x be a cycle. Then
\[ PC(x, y) = \bigvee_{\substack{P_\rho \in CH(x, y) \\ x_k \to x \,\notin\, P_\rho}} \; \bigwedge_{z \in P_\rho} E(z). \tag{15} \]
This equation is the same as fundamental Equation (1) except that the cycle’s
back edge xk → x is excluded and therefore the path through the cycle is left
out. Hence the equation states that, for the computation of PC(x, y), the cycle at
x can be ignored. This theorem is due to the fact that a path through a cycle only
makes a path condition stronger, but the stronger subconditions are canceled
out in the outer disjunction of Equation (1) due to the absorption law (A ∨ (A ∧
B) = A). Note that Equation (15) applies only to nonoverlapping cycles; if
cycles overlap, just ignoring them would miss some paths and hence generate
a path condition which is too strong and no longer necessary. Section 4.3
explains how to handle overlapping or nested cycles.
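The absorption argument can be verified by a truth table. The sketch below assumes a hypothetical chop in which x has a direct edge to y and a single cycle x → c → x; the Boolean parameters ex, ec, and ey stand for the truth values of E(x), E(c), and E(y). It compares the disjunction over both paths with the condition obtained when the back edge is excluded.

```python
from itertools import product

def equivalent(f, g, arity):
    """Compare two Boolean functions on all 2**arity inputs."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=arity))

def pc_with_cycle(ex, ec, ey):
    direct = ex and ey                      # path x -> y
    through_cycle = ex and ec and ex and ey  # path x -> c -> x -> y
    return direct or through_cycle

def pc_without_back_edge(ex, ec, ey):
    return ex and ey                        # only the path x -> y remains

print(equivalent(pc_with_cycle, pc_without_back_edge, 3))
```

Further traversals of the cycle only repeat conjuncts that are already present, so they are absorbed in the same way.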
Let us add some remarks on execution conditions. As Equation (2) is structurally
identical to Equation (1), decompositions analogous to Equations (10)–(15)
can be derived for execution conditions as well. We omit the corresponding
equations. Note, however, that the equations for path conditions are defined
with respect to a chop CH(x, y) ⊆ (N, →), while the corresponding equations
for execution condition E(z) are defined with respect to CP(z) ⊆ (C, →C). If
intraprocedural control flow is structured, CP(z) is a tree and the outer disjunction
in Equation (2) disappears. In fact, E(z) can then be computed by
Fig. 4. Left: Example of a Sreedhar-Gao-Lee SDG decomposition. Right: An irreducible graph, its
dominator tree, and its strongly connected components.
in outer SCCs according to Equation (12). Thus SCCs are processed bottom-up.
In detail, path conditions are computed as follows.
(1) For a reducible SCC L, let e be the entry point and x1 , . . . , xn be the exit
points. Since backward arcs only go back to the entry point and can be
ignored due to Equation (15), path conditions can be computed in topological
order: the SCC without back edges is cycle-free. For any node z ∈ L, PC(e, z)
is computed according to Equation (13).
The necessary execution conditions are computed as needed according to
Equation (2). In case of structured control flow within the SCC, the outer
disjunction in this equation becomes redundant, and the control dependen-
cies form a tree. Therefore, execution conditions can be computed efficiently
according to Equation (16). Execution conditions are cached in SDG nodes.
As most of the control flow is structured, the topological ordering of path
conditions as well as execution conditions touches each individual c(x → y),
δ(x → y) or Φ(x → y) only once. Eventually topological order reaches the
xi, thus all PC(e, xi) can be collected in time O(|L|). Note that this time for
constructing path conditions does not include the time for BDD operations;
these typically have a complexity of O(|L|) themselves, resulting in a total
of O(|L|^2).
(2) For an irreducible SCC L, let e1 , . . . , ek be the entry points and x1 , . . . , xn
the exit points (entry and exit points need not be disjoint). All cycle-free
paths from an ei to an x j are generated by depth-first search starting at
ei , and PC(ei , x j ) is computed according to Equation (14)—common prefixes
for two paths are thus automatically factored out.
Computation of execution conditions is as in the reducible case. The
complexity is O( p · |L|), where p is the number of paths in the SCC (again
not counting the BDD operations).
(3) Once the PC(ei , x j ) have been computed for all SCCs at a certain level,
these conditions are exploited on the next level up by applying Equations
(10) or (12). For the purpose of path conditions, SCCs from a lower level are
treated as collapsed into one meganode where execution and path conditions
for this meganode are computed as expressed in Equation (12). Note that
entry and exit points of SCCs are needed in Equation (12) and thus must
be propagated up to the next level.
If L' is the SCC on the next upper level, time for computing the path
conditions (without the time for the inner SCCs and BDD operations) is
O(|L'|) for reducible L' and O(p · |L'|) for irreducible L'.
As an example, consider Figure 5, which displays a simple SDG and the
bottom-up generation of path conditions. Solid arcs are SDG edges, while
dashed arcs are dominator edges not in the SDG. The SGL algorithm discovers
D/G as innermost cycle, which is a reducible loop (the back edge G → D can
be ignored), and PC(D, G) = E(D) ∧ E(G), PC(D, D) = E(D).5 The cycle is col-
lapsed, and the bottom-up strategy identifies the SCC DG/C/E/F/H next. This
Fig. 5. Bottom-up treatment of nested loops. Dotted arcs are dominator edges.
time, it is an irreducible SCC as it has two entry points DG and C, and one exit
point, H. According to Equation (12),
PC(DG, H) = (PC(D, G) ∧ E(H)) ∨ (PC(D, D) ∧ E(F ) ∧ E(H))
= E(D) ∧ E(H) ∧ (E(F ) ∨ E(G))
PC(C, H) = E(C) ∧ (PC(DG, H) ∨ E(E) ∧ E(F ) ∧ E(H)).
After collapsing CDGEF, the next SGL step identifies the SCC A/B/CDGEFH.
The path condition is
PC(A, CDGEFH) = E(A) ∧ (PC(C, H) ∨ E(B) ∧ PC(DG, H)).
Thus the last step computes the final path condition
PC(S, Z ) = E(S) ∧ PC(A, CDGEFH) ∧ E(Z ).
Substituting all intermediate path conditions in the equations would lead to a
blowup of the formula, an effect which is fortunately avoided by using BDDs.
Note also how the hierarchical SGL decomposition avoids an explosion of the
number of paths since enumeration of paths is limited to local SCCs at a certain
level in the bottom-up process.
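The bottom-up equations above can be written down directly as nested functions in which each level reuses the result of the level below. The sketch treats every E(node) as an independent Boolean, which is an assumption made purely for illustration, and checks by truth table that the two forms of PC(DG, H) given above agree. The function names are hypothetical.

```python
from itertools import product

NODES = "ABCDEFGHSZ"

def pc_dg_h_unsimplified(e):
    pc_d_g = e["D"] and e["G"]   # PC(D, G) = E(D) and E(G)
    pc_d_d = e["D"]              # PC(D, D) = E(D)
    return (pc_d_g and e["H"]) or (pc_d_d and e["F"] and e["H"])

def pc_dg_h(e):
    return e["D"] and e["H"] and (e["F"] or e["G"])

def pc_c_h(e):
    return e["C"] and (pc_dg_h(e) or (e["E"] and e["F"] and e["H"]))

def pc_a_scc(e):
    # PC(A, CDGEFH), built from the conditions of the collapsed inner SCCs.
    return e["A"] and (pc_c_h(e) or (e["B"] and pc_dg_h(e)))

def pc_s_z(e):
    return e["S"] and pc_a_scc(e) and e["Z"]

def all_envs():
    """Every assignment of truth values to the execution conditions."""
    for values in product([False, True], repeat=len(NODES)):
        yield dict(zip(NODES, values))

simplification_holds = all(pc_dg_h_unsimplified(e) == pc_dg_h(e) for e in all_envs())
print(simplification_holds)
```

Each function refers to the lower-level conditions by name rather than by expansion, which mirrors the sharing of subformulae that the BDD representation provides.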
The total complexity depends very much on the structure of the chop under
consideration. If the SGL decomposition produces many small nested SCCs, the
complexity of path condition generation for a bottom-level SCC is bounded by
a constant, and a standard divide-and-conquer analysis results in a complexity
of O(n · ln n) (n = |SDG|). If the chop is just one huge nondecomposable SCC,
4.5 Implementation
The ValSoft system can generate path conditions for full ANSI C (except pointer
arithmetic and setjmp/longjmp).6 The path condition generator was imple-
mented on top of the ValSoft slicer. In order to implement the algorithm from
Section 4.3, we implemented the Lengauer/Tarjan fast dominator algorithm
[Lengauer and Tarjan 1979] as well as SGL’s generalized interval analysis.
All path conditions are handled through the BuDDy BDD package, and the
BDDs for all conditions are cached in the corresponding SDG nodes. The final
path conditions are extracted from the BDD and fed into a standard Quine/
McCluskey minimizer [Quine 1955] to obtain a minimal disjunctive nor-
mal form (MDNF). This MDNF is used for displaying path conditions and
also prevents the subsequent constraint solvers from drowning in huge formulae.
Note that computing the MDNF can have exponential time complexity, but our
experiments indicate that this poses no problem in practice; in any case, an
MDNF is not an absolute requirement (see Section 4.2).
An interface to the Redlog solver [Dolzmann and Sturm 1997; Sturm and
Weispfenning 1996] has been implemented, and interfaces to other solvers
are in preparation. The solved conditions are displayed to the user in textual
form.
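To illustrate the minimization step, the following sketch computes the prime implicants of a small formula in the style of Quine/McCluskey. It is an illustration under stated assumptions, not the ValSoft minimizer: it covers only the prime-implicant phase (the selection of a minimal cover is omitted), and the function names are hypothetical. The example formula is E(D) ∧ E(H) ∧ (E(F) ∨ E(G)) from Section 4.4.

```python
from itertools import combinations

def merge(a, b):
    """Merge two implicants that differ in exactly one fixed bit, else None."""
    diff = [i for i, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) == 1 and a[diff[0]] != "-" and b[diff[0]] != "-":
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None

def prime_implicants(minterms, nvars):
    """Prime-implicant phase only; cover selection is not implemented."""
    terms = {format(m, f"0{nvars}b") for m in minterms}
    primes = set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms), 2):
            c = merge(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= terms - used   # terms that could not be merged are prime
        terms = merged
    return primes

# Bit order (most significant first): E(D), E(F), E(G), E(H).
minterms = [m for m in range(16)
            if (m >> 3) & 1 and m & 1 and ((m >> 2) & 1 or (m >> 1) & 1)]
primes = prime_implicants(minterms, 4)
print(sorted(primes))
```

In this example each prime implicant is essential, so the two resulting terms, E(D) ∧ E(G) ∧ E(H) and E(D) ∧ E(F) ∧ E(H), already form the minimal disjunctive normal form.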
The current implementation utilizes the SGL decomposition only in an in-
traprocedural manner. The dominator tree is computed separately for every
procedure, and the SGL eggs never cross procedure boundaries. Section 5 will
show that even intraprocedural SGL decomposition alone has a very positive
effect on performance. It is doubtful that an attempt to determine an interpro-
cedural SGL decomposition will have much additional positive effect for the
following reason. If the same procedure is called many times from different
places, this will inhibit many interprocedural dominator relationships in the
SDG and thus not lead to SCCs which are bigger than a procedure.
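The per-procedure dominator computation can be sketched as follows. Note that this is the simple iterative data-flow formulation, not the Lengauer/Tarjan algorithm used in the implementation; it is asymptotically slower but yields the same dominator sets. The graph is a hypothetical control-flow graph of a single procedure, and all nodes are assumed to be reachable from the entry node.

```python
def dominators(succ, entry):
    """Return a dict mapping each node to the set of nodes dominating it."""
    nodes = set(succ) | {t for ts in succ.values() for t in ts}
    pred = {n: set() for n in nodes}
    for n, ts in succ.items():
        for t in ts:
            pred[t].add(n)
    # Start from the full node set and shrink until a fixed point is reached.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            preds = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*preds) if preds else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Hypothetical procedure: a diamond (a -> b | c -> d) followed by a loop d <-> e.
cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"],
       "d": ["e"], "e": ["d", "exit"]}
dom = dominators(cfg, "entry")
print(sorted(dom["d"]))
```

Because each procedure is processed separately, a call such as `dominators(cfg, "entry")` would be issued once per procedure, which matches the intraprocedural restriction described above.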
In order to introduce an additional control mechanism for precision and per-
formance of path condition generation, we implemented path length limitation.
For any path in a nonreducible SCC, the number of execution conditions used
in Equation (1) (and hence the number of path nodes, i.e., the path length) can
be limited to k% of the SCC’s node count (where k can be chosen as an analysis
parameter, and path nodes are selected in depth-first order in the SCC). The
empirical section will show that this has an additional positive effect on per-
formance, while the generated path conditions remain the same in most cases.
Note that path length limitation never violates the principle of conservative ap-
proximation. According to Equation (1) it may generate path conditions which
are too weak (see Section 2.5), but it never generates incorrect (i.e., nonneces-
sary) path conditions.
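The effect of path length limitation on a single path can be sketched as follows. This is a simplified illustration, not the ValSoft implementation: the execution conditions are hypothetical predicates over two Boolean inputs, the path is truncated to its first nodes rather than selected in depth-first order, and rounding the limit up to at least one node is an assumption.

```python
from itertools import product
from math import ceil

def path_condition(path, exec_cond, env, limit=None):
    """Conjunction of execution conditions along a path, optionally truncated."""
    nodes = path if limit is None else path[:limit]
    return all(exec_cond[n](env) for n in nodes)

# Hypothetical execution conditions over two Boolean inputs p and q.
exec_cond = {
    "n1": lambda env: env["p"],
    "n2": lambda env: True,
    "n3": lambda env: env["q"],
    "n4": lambda env: env["p"] or env["q"],
}
path = ["n1", "n2", "n3", "n4"]
scc_size = 4
k = 50  # percent of the SCC node count
limit = max(1, ceil(scc_size * k / 100))  # assumption: at least one node

envs = [dict(p=p, q=q) for p, q in product([False, True], repeat=2)]
full = [path_condition(path, exec_cond, e) for e in envs]
limited = [path_condition(path, exec_cond, e, limit) for e in envs]

# Every input accepted by the full condition is accepted by the limited one,
# so the limited condition is still necessary; it merely accepts more inputs.
still_necessary = all(l for f, l in zip(full, limited) if f)
weaker = sum(limited) > sum(full)
print(still_necessary, weaker)
```

Dropping conjuncts can only enlarge the set of satisfying inputs, which is why the limited condition remains a conservative approximation.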
6 If programs contain pointer arithmetic or setjmp/longjmp, ValSoft makes very conservative
approximations which can easily ruin precision. Fortunately, these constructs are rare; none of our
benchmark programs (see Section 5) contained pointer arithmetic or setjmp/longjmp.
ValSoft comprises about 75,000 lines of C++, among them 25,000 for path
condition generation and simplification (without the BDD package).
5. PERFORMANCE MEASUREMENTS
In this section, we investigate the performance of path conditions based on a
set of case studies. In particular, we study the following two questions.
(1) What is the effect of BDDs and interval analysis on performance? Do these
techniques scale up?
(2) What is the dominating factor for overall performance: program size, SDG
size, SDG structure, chop size, or chop structure?
The experiments in this section are based on 13 programs and 27 chops; thus
we do not claim that our results are generally valid. We think, however, that the
experiments display typical behavior of the path condition generator.7
Table I presents data for the set of 13 benchmark programs.8 The criteria
for this particular selection of benchmark programs were as follows: (1) the
programs are written in ANSI C, and contain neither pointer arithmetic nor
setjmp/longjmp; (2) the programs cover a wide variety of Unix programming
styles, not just safety-critical systems (which normally disallow pointers). The
first criterion mirrors requirements of the current ValSoft implementation. The
second criterion mirrors the fact that path conditions are helpful for any pro-
gram, not just safety-critical systems. Thus, we included standard Unix pro-
grams with heavy pointer usage; the only safety-critical system is WobbleTable.
None of the case studies uses multithreading. patch and ctags contain ordinary
C asserts; XASSERTs (see Section 3.7) have not been used.
Table I provides data about SDG size, program size, and number of func-
tion definitions and calls. Table II presents information about 27 chops which
7 During the final revision of this article, many more experiments and case studies became available
have been selected for computing path conditions.9 All measurements were
done on a standard 1GHz PC with 2GB of RAM. For every program, two or
three chops were randomly chosen and then selected according to the following
criteria:
— chops should not be too small compared to the SDG size, that is, the number
of chop nodes is at least 5% of the number of SDG nodes;
— chops with obvious simple path conditions have been excluded, therefore, the
start and end node should be deeply nested in the control flow.
Table II provides the number of chop nodes and edges as well as structural
information. First, the top-level SCCs were determined; redSCC is the number
of top-level reducible SCCs, and irrSCC is the number of irreducible top-level
SCCs. The rest of the columns are concerned with the SGL decomposition, which
determines not only top-level SCCs, but nested SCCs as explained in Section
4.3. Columns redL and irrL give the numbers of nested reducible and irreducible
SCCs, respectively; of course, redSCC ≤ redL and irrSCC ≤ irrL always hold.
Column Depth displays the maximal depth over all intraprocedural dominator
trees (remember that dominators and SGL decompositions are computed only
intraprocedurally). Columns maxN and maxE present the number of nodes
9 The numbers presented in this section differ from the numbers presented by Robschink and
Snelting [2002] as they were computed with points-to analysis activated and the improved inter-
procedural chopping algorithm from Krinke [2002].
Table III. Performance for Various Path Conditions in the Standard Algorithmic Configuration
(SGL Decomposition and BDDs are Active)
Chop           Disj  Conj   Neg  Cond    BddN  BddV  BddVR  Time(s)  Mem(MB)
mergesort 1       2    14     2    17     261     8      6        0      0.9
mergesort 2       4    20     1    25     362    13     10        1      1.0
calculator 1      0     1     0     2     253     9      2        0      1.4
calculator 2      0     2     0     3     209     6      3        0      0.8
triple des 1      3    24     4    28     372    13      8        1      4.6
triple des 2      0     1     1     2     207     4      2        1      3.8
assembler 1      17   177    96   195     373    21     14       40     21.7
assembler 2      41   581   214   623     926    46     15        6     15.9
agrep 1           0     4     4     5     209     5      5        4     20.8
agrep 2           1    12     5    14     416    25      8       17     21.0
WobbleTable      14   323   126   338    4112   132     16       47     11.6
WobbleTableM     11   148    63   160    4535   128     15       38     10.7
flex 2            0     1     2     2     848    59      2       78     40.0
flex 3          379  9515  4200  9895   17185   101     34     1901    357.9
bison 1           2    12     3    15     322    10      6       11     20.3
larn 1            1     9     5    11     426    54      6      148     79.3
larn 2            1     1     1     3     455    55      2      150     79.4
moria 2           0     2     0     3    3102   220      3       63     99.9
The data for ctags 1, ctags 2, gnugo 1, gnugo 2, flex 1, patch 1, patch 2, bison 2, moria 1 are not shown as the
runtimes were more than one hour.
and edges in the biggest reducible and irreducible SCC, respectively, in the SGL
decomposition.
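The structural columns rest on identifying the SCCs of a chop and their entry points. The following sketch, which is an illustration and not the SGL interval analysis, finds the nontrivial SCCs of a small hypothetical graph by mutual reachability and lists the entry nodes of each SCC. An SCC with more than one entry node corresponds to what Section 4.4 calls an irreducible SCC.

```python
def reachable(succ, start):
    """Nodes reachable from start via at least one edge."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        for t in succ.get(n, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nontrivial_sccs(succ):
    """SCCs that contain a cycle, found by mutual reachability (quadratic)."""
    nodes = set(succ) | {t for ts in succ.values() for t in ts}
    reach = {n: reachable(succ, n) for n in nodes}
    sccs, done = [], set()
    for n in sorted(nodes):
        if n in done or n not in reach[n]:
            continue  # already assigned, or n lies on no cycle
        comp = {m for m in reach[n] if n in reach[m]}
        sccs.append(comp)
        done |= comp
    return sccs

def entries(succ, comp):
    """Nodes of comp that have an incoming edge from outside comp."""
    return {t for n, ts in succ.items() if n not in comp
            for t in ts if t in comp}

# Hypothetical chop: loop {a, b} has one entry, loop {c, d} has two.
succ = {"s": ["a", "c", "d"], "a": ["b"], "b": ["a", "z"],
        "c": ["d"], "d": ["c", "z"]}
sccs = nontrivial_sccs(succ)
for comp in sccs:
    print(sorted(comp), "entries:", sorted(entries(succ, comp)))
```

A production implementation would use a linear-time SCC algorithm; the quadratic version is used here only because it is short and easy to verify.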
For the 27 chops, the number of SCCs varies widely. While the only safety-critical
program, WobbleTable, has a large number of small decomposable SCCs,
the flex 3 chop is not decomposable at all, and the chops for patch have very big
SCCs, some with a node/edge ratio of 1:20 or less.10 The latter two scenarios are
indicative of complex program structure, as there are many interfering dependencies
from unstructured control flow, unstructured data flow, or unstructured
pointer usage. For such programs, generation of path conditions is expected to
be expensive.
Table III presents running times and memory requirements for the path condition
examples. These figures were determined with active BDDs and active
SGL decomposition but without path length limitation. Nine of the 27 path
conditions could not be determined within one hour. Among these is bison 2, which,
as manual inspection revealed, has a very bad decomposition structure compared
to bison 1, resulting in a huge performance difference. For the other 18
chops, running times lie between 0 and 150 seconds, except for flex 3 (1901
seconds), and space requirements are moderate. Comparing the data to the chop
structure, the case studies indicate that
— path conditions for decomposable chops with small SCCs are easily de-
termined with reasonable effort (chop size is less important than chop
structure);
— nondecomposable chops (flex 3) are difficult to analyze, but even worse is
a large number of irreducible SCCs with bad structure (i.e., bad node/edge
ratio as in gnugo or ctags).
10 The nesting structure of the SCCs is not visible in Table II, but was manually analyzed for the
three examples.
Table IV. Performance for Some Algorithmic Variants (Only Top-Level SCC Decomposition,
Without and With BDDs, vs. Standard Configuration with Active Decomposition)
                  −BDD +SCC           +BDD +SCC           +BDD +SGL
Chop           Time(s)  Mem(MB)    Time(s)  Mem(MB)    Time(s)  Mem(MB)
mergesort 1          0      0.9          0      0.9          0      0.9
mergesort 2          0      1.0          1      1.0          1      1.0
calculator 1       227    208.7          1      1.3          0      1.4
calculator 2         0      0.8          0      0.8          0      0.8
triple des 1         0      4.6          1      4.6          1      4.6
triple des 2         0      3.8          0      3.8          1      3.8
assembler 1          ∞        ∞         50     21.7         40     21.7
assembler 2          ∞        ∞         12     15.9          6     15.9
agrep 1              7     20.8          7     20.8          4     20.8
agrep 2              ∞        ∞          ∞        ∞         17     21.0
WobbleTable          ∞        ∞       3332     10.1         47     11.6
WobbleTableM         ∞        ∞        339     10.2         38     10.7
flex 2               ∞        ∞         95     40.0         78     40.0
flex 3               ∞        ∞          ∞        ∞       1901    357.9
bison 1             26     26.2         14     20.4         11     20.3
larn 1             208     79.3        167     79.3        148     79.3
larn 2             207     79.4        167     79.4        150     79.4
moria 2             86     99.9         69     99.9         63     99.9
Table III also contains information about the structure of the path condition
and the structure of the BDD. Typical path conditions have fewer than one
hundred or at most a few hundred conjunctions and disjunctions, but the
nondecomposable flex 3 has a few thousand. Some path conditions are very small
(e.g., flex 2) even though the chop is quite big; this happens whenever path con-
ditions from alternating if-then-else paths cancel each other out. The number
of intermediate BDD nodes (BddN) and variables (BddV) compared to the final
number of BDD variables (BddVR) shows how the intermediate BDDs collapse
for the final path condition.
Table IV demonstrates the effect of BDDs and SGL decomposition. Here, ∞
means that the analysis ran out of memory. Using simple syntax trees instead of
BDDs (left columns), many of the examples from Table III were not analyzable
at all. With BDDs, but using only a simple top-level SCC decomposition (middle
columns), the time requirements are much higher than with BDDs and SGL
decomposition (right columns, repeated from Table III).
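The blowup of plain syntax trees can be illustrated with a small experiment. The sketch below is not a BDD implementation; it only contrasts a formula written out as a tree with the same formula stored with shared subformulas. BDDs go further by also canonicalizing the represented function. The nested formula f_i = (f_{i-1} ∧ a_i) ∨ (f_{i-1} ∧ b_i) is a hypothetical stand-in for repeated substitution of intermediate path conditions.

```python
def build(depth):
    """Nested formula f_i = (f_{i-1} AND a_i) OR (f_{i-1} AND b_i) as tuples."""
    f = ("var", "x0")
    for i in range(1, depth + 1):
        f = ("or",
             ("and", f, ("var", f"a{i}")),
             ("and", f, ("var", f"b{i}")))
    return f

def tree_size(f):
    """Node count when every occurrence of a subformula is written out."""
    if f[0] == "var":
        return 1
    return 1 + tree_size(f[1]) + tree_size(f[2])

def dag_size(f):
    """Node count when structurally identical subformulas are stored once."""
    seen = set()

    def visit(g):
        if g in seen:
            return
        seen.add(g)
        if g[0] != "var":
            visit(g[1])
            visit(g[2])

    visit(f)
    return len(seen)

formula = build(10)  # kept small: hashing nested tuples is itself costly
print(tree_size(formula), dag_size(formula))
```

The tree size roughly doubles with each nesting level, whereas the shared representation grows by a constant number of nodes per level.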
Table V demonstrates the effect of path length limitation. Path length is
limited to an amount between 0.5% and 8.5% of the chop node count. Compared
to Table III, another spectacular improvement in runtime behavior is visible,
which was to be expected. In particular, all 27 path conditions can now be
computed in less than an hour, and most of them in less than a minute. Of
course, the 0.5% limitation is faster than the 5% limitation.
Section 4.5 explained that path length limitation can make path conditions
less precise but never incorrect. Surprisingly, the structural information about
the path conditions demonstrates that for 25 out of 27 chops, the path conditions
remain unchanged. Only for WobbleTable and ctags 2 is there a difference.
For WobbleTable, the 8.5% limitation results in the same path condition as
Table V. Effects of Limiting the Path Length to k% of SCC Node Count (SGL and BDDs Active)
Chop           Disj  Conj   Neg  Cond    BddN  BddV  BddVR  Time(s)  Mem(MB)   k(%)
mergesort 1       2    14     2    17     255     8      6        0      0.8    5.0
                  2    14     2    17     255     8      6        0      0.8    0.5
mergesort 2       4    20     1    25     358    13     10        0      0.9    5.0
                  4    20     1    25     358    13     10        0      0.9    0.5
calculator 1      0     1     0     2     225     9      2        0      1.0    5.0
                  0     1     0     2     217     9      2        1      1.0    0.5
calculator 2      0     2     0     3     209     6      3        0      0.8    5.0
                  0     2     0     3     209     6      3        0      0.8    0.5
triple des 1      3    24     4    28     363    13      8        1      4.6    5.0
                  3    24     4    28     363    13      8        0      4.6    0.5
triple des 2      0     1     1     2     207     4      2        0      3.8    5.0
                  0     1     1     2     207     4      2        0      3.8    0.5
ctags 1           1     1     2     3     547    81      2       11     11.2    5.0
                  1     1     2     3    1056    81      2       10     11.2    0.5
ctags 2           1    15     3    17     391    36      9        5      9.6    5.0
                  1    11     3    13     370    36      7        6      9.6    0.5
assembler 1      17   177    96   195     373    21     14       40     21.7    5.0
                 17   177    96   195     373    21     14       40     21.7    0.5
assembler 2      41   581   214   623     811    39     15        6     15.9    5.0
                 41   581   214   623     811    39     15        6     15.9    0.5
gnugo 1           0     1     0     2   83855   322      2      437     10.2    5.0
                  0     1     0     2   18090   316      2        7     10.2    0.5
gnugo 2           0     9     2    10   83797   354     10      329     11.1    5.0
                  0     9     2    10   83797   354     10      329     11.1    0.5
agrep 1           0     4     4     5     209     5      5        5     20.8    5.0
                  0     4     4     5     209     5      5        4     20.8    0.5
agrep 2           1    12     5    14     384    25      8       18     21.0    5.0
                  1    12     5    14     437    25      8        4     21.0    0.5
WobbleTable      14   323   126   338    5237   129     16       39     11.0    8.5
                 11   175    77   187    5892   129     23       19     12.8    5.0
                  0     4     3     5     348    21      5        2      9.4    0.5
WobbleTableM     11   148    63   160    5889   125     15       29     10.4    5.0
                 11   148    63   160    1248    60     15        3      9.6    0.5
flex 1           21   191    85   213    1415    27      9       14     30.1    5.0
                 21   191    85   213     757    27      9        8     30.1    0.5
flex 2            0     1     2     2     848    59      2       74     40.0    5.0
                  0     1     2     2     848    59      2       74     40.0    0.5
flex 3          379  9515  4200  9895   17185   101     34     1918    357.9    5.0
                379  9515  4200  9895   17185   101     34     1898    357.9    0.5
patch 1         103  1734   967  1838     906    25     16       23     33.6    0.1
                103  1734   967  1838     906    25     16       24     33.6    0.05
patch 2          71  1348   404  1420    1036   155     27      139     36.9    0.1
                 71  1348   404  1420    1004   154     27      133     36.8    0.05
bison 1           2    12     3    15     322    10      6       11     20.3    5.0
                  2    12     3    15     322    10      6       11     20.3    0.5
bison 2           0     9     3    10  181707   344     10     1337     25.0    2.5
                  0     9     3    10  188652   341     10      134     24.7    0.5
larn 1            1     9     5    11     426    54      6      148     79.3    5.0
                  1     9     5    11     426    54      6      147     79.3    0.5
larn 2            1     1     1     3     455    55      2      150     79.4    5.0
                  1     1     1     3     455    55      2      149     79.4    0.5
moria 1           0     0     0     1     202     2      0       27     99.4    0.1
                  0     0     0     1     202     2      0       27     99.4    0.05
moria 2           0     2     0     3    3102   220      3       63     99.9    5.0
                  0     2     0     3    3102   220      3       64     99.9    0.5
the unlimited variant from Table III, while the 5.0% limitation is slightly less
and the 0.5% sharply less precise than the unlimited variant.11 For the “bad
guys” gnugo, ctags etc., the unlimited path conditions are not available but, as
there are no differences between 5.0% and 0.5% limitation (except for ctags 2,
where there is a slight difference), we would be surprised if the unlimited path
conditions were more precise.
Hence Table V demonstrates that, in practice, length limitation hardly influences
precision. The engineer can start with small values for k and increase
k for interesting path conditions until the path condition does not change anymore.
The combination of BDDs, SGL decomposition, and path length limitation
guarantees that all programs can be analyzed; path conditions do scale up.
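The suggested workflow of increasing k until the path condition stabilizes can be sketched as a small driver loop. This is an illustration only: `generate` is a hypothetical callback standing in for the path condition generator, and `fake_generate` is a stand-in that merely mimics a condition gaining conjuncts as k grows. A condition that is stable across two consecutive values of k is a heuristic stopping criterion, not a proof that the unlimited condition has been reached.

```python
def refine_limit(generate, ks):
    """Increase k until two consecutive limits yield the same path condition.

    `generate(k)` is a hypothetical callback returning the path condition,
    in any comparable normal form, computed with path length limit k%.
    `ks` must be a non-empty increasing list of candidate limits.
    """
    previous = None
    for k in ks:
        current = generate(k)
        if current == previous:
            return k, current
        previous = current
    return ks[-1], previous  # never stabilized within the candidates

def fake_generate(k):
    # Stand-in generator: one more conjunct at k >= 1, k >= 2, and k >= 5.
    thresholds = [0, 1, 2, 5]
    n = sum(1 for t in thresholds if k >= t)
    return frozenset("abcd"[:n])

k, condition = refine_limit(fake_generate, [0.5, 1, 2, 5, 8.5])
print(k, sorted(condition))
```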
6. A CASE STUDY
The WobbleTable system has been developed in a student project about real-
time controllers. A ball in a maze has to be moved into a target. To achieve this,
the maze can be rotated to a vertical angle along two orthogonal axes; rotation is
controlled by a step motor. A stereo camera above the maze is used to determine
the position of the ball. WobbleTable reads the camera input, computes the
ball position and the way to the target, determines the horizontal and vertical
angle for the maze, and sends corresponding signals to the step motor. We chose
WobbleTable as an example of our methodology because, although WobbleTable is not
a safety-critical system (such as a shutdown system for a nuclear power plant),
it exhibits many characteristics of such systems.
The source file is 4563 LOC of ANSI C; computation of the SDG took 15
seconds. For some library functions concerned with camera and motor control,
C stubs were provided which simulate the function’s behavior with respect to
data and control dependencies between parameters and global variables.12 In
our experiment, we wanted to check whether the step motor is influenced by
an outside agent and, if so, determine witnesses for suspicious behavior.
Figure 6 displays the central loop of the source code. As long as the ball has
not reached the target, the ball position is read from the camera and converted
to maze coordinates (function “getDPoint”, line 4). The function “pathNPos”
computes the distance to the next intermediate ball position, and the function
“getEngSteps” uses a neural net to compute the rotation of the maze. Function
“calcCtrlVect” transforms this information into a control vector which
is sent to the motor (“sendEngSteps”, line 45); the maze angles are adjusted
accordingly.
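The chop between two program points can be pictured as the set of nodes lying on some dependence path from the source to the sink. The sketch below is a strongly simplified, context-insensitive illustration, not the interprocedural chopping algorithm of Krinke [2002]: it intersects forward reachability from the source with backward reachability from the sink. The dependence graph is a toy; the nodes initCamera and logStatus are hypothetical and serve only as nodes outside the chop.

```python
def reach(edges, start):
    """All nodes reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for t in edges.get(n, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def chop(dep, source, sink):
    """Nodes on some dependence path from source to sink."""
    rev = {}
    for n, ts in dep.items():
        for t in ts:
            rev.setdefault(t, []).append(n)
    return reach(dep, source) & reach(rev, sink)

# Toy dependence graph, loosely following the central loop described above.
dep = {
    "initCamera": ["getDPoint"],             # hypothetical node before the source
    "getDPoint": ["pathNPos"],
    "pathNPos": ["getEngSteps", "logStatus"],  # logStatus is hypothetical
    "getEngSteps": ["calcCtrlVect"],
    "calcCtrlVect": ["sendEngSteps"],
}
nodes_in_chop = chop(dep, "getDPoint", "sendEngSteps")
print(sorted(nodes_in_chop))
```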
Figure 7 displays the path condition for the chop between line 4 and line
45, that is, a necessary condition for influence of the motor by the camera. For
all atomic conditions in this path condition, their source file and source line
11 Table III only shows that the limited conditions are smaller. But as all variants are necessary
conditions, smaller indicates less precise. This was manually checked for most of the examples.
12 Providing stubs is a popular way to deal with libraries but quite expensive in practice. Libraries
are big, thus many stubs are required. Worse, source code is not always available, forcing the
analysis to use imprecise approximations.
PC(4, 45) ≡
  notTarget9263 = TRUE   (control.c: 200)
  ∧ getDPoint9267(calloc9093(1, 8)) = 0   (control.c: 160, 203, 206)
  ∧ |table9289.x9290| ≤ 250   (control.c: 213)
  ∧ |table9296.y9297| ≤ 250   (control.c: 213)
  ∧ pathNPos9325(getPath9132(...), calloc9093(1, 8), calloc9100(1, 8), ...) = TRUE
      (control.c: 160 ..., 177, 226)
  ∧ sqrt(getPath9132(...).root7874.pos7876.x7878 − calloc9093(1, 8).x7880) ∗
      (getPath9132(...).root7884.pos7886.x7888 − calloc9093(1, 8).x7890) +
      (getPath9132(...).root7895.pos7897.y7899 − calloc9093(1, 8).y7901) ∗
      (getPath9132(...).root7905.pos7907.y7909 − calloc9093(1, 8).y7911)
      < MAX_TARGETDIST9330   (pfad.c: 71, control.c: 71−74, 160, 177, 226)
  ∧ i208 < |−table9570.x9571| + |−table9575.y9576|
      (calc.c: 80, 157, 137, control.c: 270, 271)
  ∧
  |calloc9107(1, 8).x7923| < MAX_TARGETSPEED9331   (pfad.c: 111, control.c: 162, 226)
  ∧ |calloc9107(1, 8).y7932| < MAX_TARGETSPEED9331   (pfad.c: 112, control.c: 162, 226)
  ∧ getPath9132(...).root7941.next7943 = 0   (pfad.c: 115, control.c: 177)
  ∨
  i4518 < noNeurons6160   (neuronal.c: 1101, 1108)
  ∧ NeuronsInLayer9424[0] = 2   (neuronal.c: 1348)
  ∧ NeuronsInLayer9424[0] = 3   (neuronal.c: 1390)
  ∧ NeuronsInLayer9424[0] = 8   (neuronal.c: 1434)
13 The path conditions presented in this section differ slightly from the path conditions in Robschink
and Snelting [2002], as they were computed with activated points-to analysis and an improved
interprocedural chopping algorithm [Krinke 2002], leading to smaller chops and more precise path
conditions.
PC(def(key), 45) ≡
  i207 < |m_x582| + |m_y583|   (calc.c: 80, 157, 137)
  ∧ Layer3591(neuron_id3497) = 0   (neuronal.c: 696, 719)
  ∧ *ping3498 & 128 > 0   (neuronal.c: 696, 731)
possible manipulation
Following the back links to the source code, we immediately see what the pro-
grammers did. In file variable.h, they added declaration extern int* ping;,
and in file dspkomm.c, they added declaration int* ping;. In file control.c, they
added the statement ping = (int*) key;. Deeply hidden inside neuronal.c,
they added the statement
which increases the scale factor in the neural net by 20% if the 8th bit of *ping
(i.e., key) is set. Interestingly, the variable val does not occur in the path con-
dition as it is never used in any control condition. But the SSA index ping3499
in the witness condition links back to the source and immediately identifies
the malicious if statement. Note that this is a constructed manipulation, but
not at all an obvious manipulation—a few lines of manipulative statements are
distributed over various source files. A human expert would have a hard time
discovering such a manipulation.
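The conjunct on *ping in the witness condition tests a single bit: 128 equals 1 << 7, which is the 8th bit when counting from one. The following sketch enumerates the 8-bit key values that satisfy this conjunct. It models only this one conjunct; the other two conjuncts of the path condition and the manipulated statement itself are not modeled, and the restriction to 8-bit keys is an assumption made for brevity.

```python
MASK = 128  # 1 << 7, i.e., the 8th bit when counting from one

def triggers(key):
    """True if the 8th bit of key is set, i.e., (key & 128) > 0."""
    return (key & MASK) > 0

# Candidate witnesses among all 8-bit key values (assumption: keys are bytes).
witnesses = [key for key in range(256) if triggers(key)]
print(len(witnesses), witnesses[0], witnesses[-1])
```

Any such key value, together with inputs satisfying the remaining conjuncts, would serve as a witness for the influence of the keyboard buffer on the step motor.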
Summarizing this case study, we would like to point out the following facts.
— Witnesses make visible the reason for an influence and allow the engineer
to decide whether the influence is legal or not. Manipulating the step motor
in WobbleTable is certainly illegal, while manipulating a weight value as in
Figure 3 may be legal if it just serves to switch between grams and ounces.
— Witnesses are useful in legal matters, for instance, a lawsuit against a soft-
ware vendor. Using the witness, one can see the safety violation directly and
need not understand artifacts like model checking counterexamples or type
mismatches in eclectic type systems (see also Section 7).
7. RELATED WORK
Our work is similar in spirit to constraint-based test data generation (e.g.,
Gotlieb et al. [1998], Gupta et al. [1998], Goldberg et al. [1994], and deMillo
and Offut [1991]). All such methods are based on the control-flow graph and
generate constraints which enforce a specific control flow. Hence they cannot
generate constraints for data flow, which are essential for information flow con-
trol. Most methods (e.g., Gotlieb et al. [1998], Gupta et al. [1998]) have only
been applied to small programs, while we emphasize scaling up. Some (e.g., de-
Millo and Offut [1991]) do not obey the principle of conservative approximation
required for safety analysis. Others are restricted to specialized domains (e.g.,
Gupta et al. [1998]). Our approach provides a general path condition generator
which can then be connected to specialized solvers.
Parametric program slicing [Field et al. 1995] generalizes static and dynamic
slicing by allowing the specification of arbitrary constraints over input variables
(similar to our assertions in Section 3.7). A parametric slice is valid for all
inputs satisfying the constraints. Parametric slicing requires that the language
semantics is defined in terms of rewrite rules, augments these rules by the
given constraints, and determines the slice during rewriting. Unfortunately,
applications to realistic programs have not been reported. Conditioned slicing
[Canfora et al. 1998], a very similar technique, also shares the same problems
with realistic applications.
PREfix [Bush et al. 2000] analyzes the control flow and builds a memory
model to discover bugs like memory leaks or dangling references. It can also
generate simple path conditions for such bugs but again, these are based on
control flow rather than on data flow. ESC/Modula3 [Rustan et al. 1998] finds
similar bugs by applying verification technology but requires that the program-
mer add assertions to the program. Our approach does not require assertions
and is aimed at information flow control rather than detection of low-level bugs.
Pugh and Wonnacott [1998] use Presburger arithmetic for solving constraints
concerning array dependencies. Their goal is automatic parallelization
of loops, and they describe dedicated constraints and solving techniques. Our
array constraints are in fact a subset of Pugh and Wonnacott’s constraints, hence
not as strong. Furthermore, we have not yet employed a Presburger solver. But, in
principle, it would be possible to plug their sophisticated analysis techniques
into ValSoft.
Reps [2000] also investigated the use of abstract data types in dependence
graphs. He extends his technique of context-free language reachability to model
equations for abstract types. It turns out that interprocedural data dependence
becomes undecidable. Our approach, on the other hand, is based on rewrit-
ing modulo data dependencies, which is a mechanism completely orthogonal
to dependence analysis. While perhaps less precise, it avoids any decidabil-
ity problems and is completely decoupled from the rest of the path condition
generator.
Smith and Volpano [1998] presented a type system for an imperative lan-
guage with threads which can be used to check the Bell/La Padula condition.
It is a type-based implementation of the approach by Denning and Denning
[1977], who assumed that security domains form a lattice and presented a non-
standard semantics for a simple language in order to determine information
flow between different security levels. Compared to slicing and path conditions,
Denning and Denning’s original approach and the Smith/Volpano method are
flow-insensitive and hence miss some of the information present in slices and
path conditions.
Another type-based approach is the CQual system [Foster et al. 2002]. CQual
is flow-sensitive as flow information is coded into types. It has successfully been
used to detect locking bugs in the Linux kernel. While CQual requires anno-
tations, it may be that it can be used to improve the Smith/Volpano method
(to our knowledge, this has not been done). However, illegal information flow
shows up as a type error in a nonstandard type system, while path conditions
can be used as witnesses which make illegal behavior directly visible. Considering
the usefulness of program analysis, for example, in a lawsuit, we believe
that our witnesses are more convincing to the judge than an abstract type
mismatch.
The recent overview article [Sabelfeld and Myers 2003] presents even more
work on information flow control, based on program analysis. The focus is again
on type-based methods, and while data flow analysis and program slicing are
mentioned, the true value of these techniques (and improvements such as path
conditions) for safety analysis has in our opinion not yet been recognized.
There are several generators of static analysers such as PAG [Martin 1998],
TVLA [Lev-Ami and Sagiv 2000], and Lande [Metayer and Schmidt 1996]. In
principle, these could be used to implement path conditions. This requires a
formal semantics for full ANSI C, which, to our knowledge, has not been con-
structed for any of these systems. Furthermore, we suspect that the generated
systems do not scale up as the algorithmic techniques from Section 4 are not
available.
Recently, model checking has gained popularity as a device to check certain
safety properties of programs. The Bandera project [Corbett et al. 2000] as well
as SLAM [Ball and Rajamani 2002] extract finite models from software which
can then automatically be checked against specifications in temporal logics.
While model checking is certainly a most useful instrument, ordinary model
checking cannot be used for manipulation detection; during model extraction,
illegal information flow might get abstracted away. Note, however, that an SDG
can be viewed as a nonstandard finite model on which LTL formulae can be
checked, an idea we plan to explore in the future.
APPENDIXES
SDGs and path conditions to check whether the noninterference criterion for
(D, ;) is fulfilled for the program.
Noninterference is defined with respect to an abstract automaton A =
(Z, run, output, z₀), where run : Z × A → Z is the state transition function
and output : Z × A → O is the output function. Z, the set of program states, is
usually infinite; z₀ ∈ Z is the start state. The set of actions A is, in our case, the
set of program statements or expressions or, more precisely, the set of nodes
N in the SDG. run is extended to A∗ as usual: run(z, ε) = z; run(z, a∧x) =
run(run(z, a), x).
The security domain of action a ∈ A is dom(a) ∈ D. The noninterference
relation ⇝̸ ⊆ D × D specifies which security domains must not influence each
other. The complement of ⇝̸, namely, the interference relation ⇝, is assumed
to be reflexive and transitive. Given a statement sequence x and a security
domain d, the function purge : A∗ × D → A∗ removes from x all statements
which must not influence security level d:
purge(x, d) = ⟨a ∈ x | ¬(dom(a) ⇝̸ d)⟩ = ⟨a ∈ x | dom(a) ⇝ d⟩.
A system is considered safe according to the Goguen/Meseguer noninterference
criterion if, for all possible statement sequences x and all final statements
a,
output(run(z₀, x), a) = output(run(z₀, purge(x, dom(a))), a).
That is, the final program output is unchanged if any statement which must not
influence the last action according to its security level is deleted. If the condition
is not satisfied, there might be some action which produces a different output
on an actual run than on a run with all supposedly noninfluential statements
removed, that is, there is an influence from a statement s in x to a even though
this is forbidden due to dom(s) ⇝̸ dom(a). We see that the notion of safety is
based on observational behavior and not on the source code.
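The criterion can be made concrete on a toy automaton. The sketch below is an illustration under stated assumptions and not part of the article's formal development: there are two hypothetical security domains, low and high, where low may influence high but not vice versa; states are pairs of counters; and the check is bounded, since it only enumerates statement sequences up to a fixed length.

```python
from itertools import product

# Allowed interference: reflexive and transitive; high must not influence low.
INTERFERES = {("low", "low"), ("high", "high"), ("low", "high")}

def dom(action):
    return action[0]

def purge(xs, d):
    """Keep only the actions whose domain may influence d."""
    return tuple(a for a in xs if (dom(a), d) in INTERFERES)

def run(step, z, xs):
    for a in xs:
        z = step(z, a)
    return z

def output(z, a):
    lo, hi = z
    return lo if dom(a) == "low" else (lo, hi)

def noninterferent(step, actions, z0, max_len):
    """Bounded check of the Goguen/Meseguer criterion."""
    for n in range(max_len + 1):
        for xs in product(actions, repeat=n):
            for a in actions:
                full = output(run(step, z0, xs), a)
                purged = output(run(step, z0, purge(xs, dom(a))), a)
                if full != purged:
                    return False
    return True

def safe_step(z, a):
    lo, hi = z
    if a == ("low", "inc"):
        return (lo + 1, hi)
    if a == ("high", "inc"):
        return (lo, hi + 1)
    return z  # every other action has no effect

def leaky_step(z, a):
    lo, hi = z
    if a == ("high", "leak"):
        return (hi, hi)  # copies the high counter into the low one
    return safe_step(z, a)

ACTIONS = [("low", "inc"), ("high", "inc"), ("high", "leak")]
print(noninterferent(safe_step, ACTIONS, (0, 0), 3),
      noninterferent(leaky_step, ACTIONS, (0, 0), 3))
```

In the leaky variant, the sequence consisting of a high increment followed by the leak action changes the low output, whereas the purged sequence does not, so the criterion is violated.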
The following theorem and corollary demonstrate how slices and path con-
ditions can be used to check for noninterference.
THEOREM 1. If
s ∈ BS(a) ⟹ dom(s) ⇝ dom(a)
then the noninterference criterion is satisfied for a.
PROOF.14 By definition of purge, we have
output(run(z₀, purge(x, dom(a))), a)
= output(run(z₀, ⟨s ∈ x | dom(s) ⇝ dom(a)⟩), a).
For every s ∈ ⟨s ∈ x | dom(s) ⇝ dom(a)⟩, either s ∈ BS(a) or s ∉ BS(a)
holds. In the latter case, we may conclude ¬I(s, a) as I(s, a) ⟹ s ∈ BS(a)
(see Section 2.1), and s can be ignored as it cannot influence the final output.
Thus we may assume s ∈ BS(a), hence
= output(run(z₀, ⟨s ∈ x | dom(s) ⇝ dom(a)⟩), a)
= output(run(z₀, ⟨s ∈ x | dom(s) ⇝ dom(a) ∧ s ∈ BS(a)⟩), a).
14 We thank one reviewer for providing this proof which is simpler than our original inductive proof.
COROLLARY 1. If
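PROOF OF EQUATION (11). We write CH(x, y)|z_i for the subchop of CH(x, y)
containing all paths containing z_i, and PC(x, y)|z_i for the path condition in this
subchop (i.e., Equation (1) is applied to CH(x, y)|z_i). Then

\[
\begin{aligned}
\mathrm{PC}(x,y) &= \bigvee_{P \in \mathrm{CH}(x,y)} \bigwedge_{u \in P} E(u)
 = \bigvee_{i=1}^{k} \; \bigvee_{P \in \mathrm{CH}(x,y)|z_i} \bigwedge_{u \in P} E(u) \\
&= \bigvee_{i=1}^{k} \mathrm{PC}(x,y)|z_i
 = \bigvee_{i=1}^{k} \mathrm{PC}(x,z_i) \wedge \mathrm{PC}(z_i,y)
\end{aligned}
\]

\[
\begin{aligned}
\mathrm{PC}(x,y) &= \bigvee_{P \in \mathrm{CH}(x,y)} \bigwedge_{u \in P} E(u)
 = \bigvee_{i=1}^{k} \; \bigvee_{P \in \mathrm{CH}(x,y)|e_i} \bigwedge_{u \in P} E(u) \\
&= \bigvee_{i=1}^{k} \mathrm{PC}(x,y)|e_i
 = \bigvee_{i=1}^{k} \mathrm{PC}(x,e_i) \wedge \mathrm{PC}(e_i,y) \\
&= \bigvee_{i=1}^{k} \Bigl( \mathrm{PC}(x,e_i) \wedge \bigvee_{j=1}^{m} \mathrm{PC}(e_i,y)|o_j \Bigr) \\
&= \bigvee_{i=1}^{k} \Bigl( \mathrm{PC}(x,e_i) \wedge \bigvee_{j=1}^{m} \mathrm{PC}(e_i,o_j) \wedge \mathrm{PC}(o_j,y) \Bigr)
\end{aligned}
\]

\[
\begin{aligned}
\mathrm{PC}(x,y) &= \bigvee_{P \in \mathrm{CH}(x,y)} \bigwedge_{u \in P} E(u)
 = \bigvee_{z \in \mathrm{pred}(y)} \; \bigvee_{P \in \mathrm{CH}(x,y)|z} \bigwedge_{u \in P} E(u) \\
&= \bigvee_{z \in \mathrm{pred}(y)} \mathrm{PC}(x,y)|z
 = \bigvee_{z \in \mathrm{pred}(y)} \mathrm{PC}(x,z) \wedge \mathrm{PC}(z,y) \\
&= \bigvee_{z \in \mathrm{pred}(y)} \mathrm{PC}(x,z) \wedge E(z) \wedge E(y)
 = E(y) \wedge \bigvee_{z \in \mathrm{pred}(y)} \mathrm{PC}(x,z)
\end{aligned}
\]

as E(z) is already conjunctively added in PC(x, z). The proof of Equation (14)
is analogous.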
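The last identity derived above, PC(x, y) = E(y) ∧ ⋁ PC(x, z) with z ranging over pred(y), can be checked by brute force on a small example. The sketch below is an illustration and not a proof: it uses a hypothetical acyclic chop, replaces each execution condition E(u) by an independent Boolean variable, and compares both sides of the identity on every assignment. It assumes x ≠ y.

```python
from itertools import product

def paths(succ, x, y):
    """All paths from x to y in an acyclic graph, each as a tuple of nodes."""
    if x == y:
        return [(y,)]
    return [(x,) + p for t in succ.get(x, ()) for p in paths(succ, t, y)]

def pc(succ, x, y, env):
    """Equation (1), with each E(u) replaced by the Boolean variable env[u]."""
    return any(all(env[u] for u in p) for p in paths(succ, x, y))

def pc_via_predecessors(succ, x, y, env):
    """E(y) and the disjunction of PC(x, z) over all predecessors z of y."""
    preds = [n for n, ts in succ.items() if y in ts]
    return env[y] and any(pc(succ, x, z, env) for z in preds)

# Hypothetical acyclic chop from x to y.
succ = {"x": ["a", "b"], "a": ["c", "y"], "b": ["c"], "c": ["y"]}
nodes = ["x", "a", "b", "c", "y"]

identity_holds = True
for bits in product([False, True], repeat=len(nodes)):
    env = dict(zip(nodes, bits))
    if pc(succ, "x", "y", env) != pc_via_predecessors(succ, "x", "y", env):
        identity_holds = False
print(identity_holds)
```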
as the cycle is excluded from PC(x, y) iff its back edge is excluded.
ACKNOWLEDGMENTS
Making path conditions in ValSoft work and scale up would not have been
possible without earlier work in slicing algorithms, points-to analysis, interval
analysis, BDDs, and constraint solvers; we collectively thank all the researchers
involved in these topics. We also would like to thank the reviewers for their
substantial and helpful comments on an earlier version of this article.
Received July 2002; revised August 2003, October 2004, May 2005; accepted January 2006