2007-CAV - Configurable Software Verification
1 Introduction
Automatic program verification requires a choice between precision and efficiency. The
more precise a method, the fewer false positives it will produce, but also the more ex-
pensive it is, and thus applicable to fewer programs. Historically, this trade-off was
reflected in two major approaches to static verification: program analysis and model
checking. While in principle, each of the two approaches can be (and has been) viewed
as a subcase of the other [18, 19, 7], such theoretical relationships have had little impact
on the practice of verification. Program analyzers, by and large, still target the efficient computation of few simple facts about large programs; model checkers, by contrast, still focus on the removal of false alarms through ever more refined analyses of relatively small programs. Emphasizing efficiency, static program analyzers are usually path-insensitive, because the most efficient abstract domains lose precision at the join points of program paths. Emphasizing precision, software model checkers, on the other hand, usually do not join abstract domain elements (such as predicates), but instead explore an abstract reachability tree that keeps different program paths separate.
In order to experiment with the trade-offs, and in order to be able to set the dial between the two extreme points, we have extended the software model checker BLAST [11] to permit customized program analyses. Traditionally, customization has meant choosing a particular abstract interpreter (abstract domain and transfer functions, perhaps a widening operator) [13, 8, 14, 20], or a combination of abstract interpreters [10, 6, 4, 12]. Here, we go a step further in that we also configure the execution engine of the chosen abstract interpreters. At one extreme (typical for program analyzers), the execution engine propagates abstract domain elements along the edges of the control-flow graph of a program until a fixpoint is reached [5]. At the other extreme (typical for model checkers), the execution engine unrolls the control-flow graph into a reachability tree and decorates the tree nodes with abstract domain elements, until each node is ‘covered’ by some other node that has already been explored [11]. In order to customize the execution of a program analysis, we define and implement a meta engine that is configured by providing, in addition to one or more abstract interpreters, a merge operator and a termination check.
This research was supported in part by the grant SFU/PRG 06-3, and by the Swiss National Science Foundation.
W. Damm and H. Hermanns (Eds.): CAV 2007, LNCS 4590, pp. 504–518, 2007.
© Springer-Verlag Berlin Heidelberg 2007
The merge operator indicates when two nodes of a reachability tree are merged, and
when they are explored separately: in classical program analysis, two nodes are merged
if they refer to the same control location of the program; in classical model checking,
no nodes are merged. The termination check indicates when the exploration of a path
in the reachability tree is stopped at a node: in classical program analysis, when the
corresponding abstract state does not represent new (unexplored) concrete states (i.e.,
a fixpoint has been reached); in classical model checking, when the corresponding ab-
stract state represents a subset of the concrete states represented by another node. Our
motivation is practical, not theoretical: while it is theoretically possible to redefine the
abstract interpreter to capture different merge operators and termination checks within a
single execution engine, we wish to reuse abstract interpreters as building blocks, while
still experimenting with different merge operators and termination checks. This is par-
ticularly useful when several abstract interpreters are combined. In this case, our meta
engine can be configured by defining a composite merge operator from the compo-
nent merge operators; a composite termination check from the component termination
checks; but also a composite transfer function from the component transfer functions.
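The two extreme settings can be made concrete in a few lines. The Python sketch below encodes the four standard operators of configurable program analysis: merge^sep(e, e') = e' keeps states apart, merge^join(e, e') = e ⊔ e' combines them, stop^sep(e, R) asks whether a single reached state covers e, and stop^join(e, R) asks whether the join of all reached states does. It assumes, for illustration only, a powerset domain (frozensets, union as join, inclusion as order) and ignores locations; merging only at equal control locations would come from composing with a location domain. The function names are hypothetical, not BLAST's API.

```python
from typing import FrozenSet, Iterable

# Assumption for illustration: a powerset domain, so an abstract state is a
# frozenset of facts, join is set union, and the partial order is inclusion.
State = FrozenSet[str]


def join(e1: State, e2: State) -> State:
    return e1 | e2


def leq(e1: State, e2: State) -> bool:
    return e1 <= e2


def merge_sep(e: State, e_reached: State) -> State:
    # Model-checking style: never combine; the reached state is kept as is.
    return e_reached


def merge_join(e: State, e_reached: State) -> State:
    # Program-analysis style: the reached state is replaced by the join.
    return join(e, e_reached)


def stop_sep(e: State, reached: Iterable[State]) -> bool:
    # Stop if some single reached state covers e.
    return any(leq(e, r) for r in reached)


def stop_join(e: State, reached: Iterable[State]) -> bool:
    # Stop if the join of all reached states covers e.
    summary: State = frozenset()
    for r in reached:
        summary = join(summary, r)
    return leq(e, summary)
```

With a = {x} and b = {y} reached, the state {x, y} is covered under stop_join but not under stop_sep, which is exactly the gap between the two termination checks.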
Combining the advantages of different execution engines for different abstract inter-
preters can yield dramatic results, as was shown by predicated lattices [9]. That work
combined predicate abstraction with a data-flow domain: the data-flow analysis be-
comes more precise by distinguishing different paths through predicates; at the same
time, the efficiency of a lattice-based analysis is preserved for facts that are difficult to
track by predicates. However, the configuration of predicated lattices is just one possi-
bility, combining abstract reachability trees for the predicate domain with a join-based
analysis for the data-flow domain. Another example is lazy shape analysis [2], where
we combined predicate abstraction and shape analysis. Again, we ‘hard-wired’ one particular such combination: no merging of nodes; termination by checking coverage between individual nodes; cartesian product of transfer functions. Our new, configurable implementation permits systematic experimentation with many variations, and we present the results in this paper. We show that different configurations can lead to
large, example-dependent differences in precision and performance. In particular, it is
often beneficial to use non-cartesian transfer functions, where information flows between
multiple abstract interpreters, e.g., from the predicate state to the shape state (or lat-
tice state), and vice versa. By choosing suitable abstract interpreters and configuring
the meta engine, we can also compare the effectiveness and efficiency of symbolic ver-
sus explicit representations of values, and the use of different pointer alias analyses in
software model checking.
In recent years we have observed a convergence of historically distinct program ver-
ification techniques. It is indeed difficult to say whether our configurable verifier is a
model checker (as it is based on BLAST) or a program analyzer (as it is configured by
choosing a set of abstract interpreters and some parameters for executing and combin-
ing them). We believe that the distinction is no longer practically meaningful (it has
not been theoretically meaningful for some time), and that this signals a new phase in
automatic software verification tools.
$e \rightsquigarrow^{g} e'$. For soundness and progress of the program analysis, the abstract domain and the
corresponding transfer relation have to fulfill the following requirements:
Algorithm 1. CPA(D, e0)
Input: a configurable program analysis D = (D, ⇝, merge, stop),
  an initial abstract state e0 ∈ E, let E denote the set of elements of the semi-lattice of D
Output: a set of reachable abstract states
Variables: a set reached of elements of E, a set waitlist of elements of E
waitlist := {e0}
reached := {e0}
while waitlist ≠ ∅ do
  pop e from waitlist
  for each e' with e ⇝ e' do
    for each e'' ∈ reached do
      // Combine with existing abstract state.
      e_new := merge(e', e'')
      if e_new ≠ e'' then
        waitlist := (waitlist ∪ {e_new}) \ {e''}
        reached := (reached ∪ {e_new}) \ {e''}
    if ¬ stop(e', reached) then
      waitlist := waitlist ∪ {e'}
      reached := reached ∪ {e'}
return reached
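Algorithm 1 can be transcribed almost line by line. The following Python sketch takes the transfer relation (as a successor function), merge, and stop as parameters. The toy CFA, the state encoding (location, set of possible values of one variable x), and all function names are hypothetical; running the same engine with a location-wise join and with merge^sep shows the program-analysis and model-checking extremes on one input.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

# Hypothetical toy encoding: an abstract state is (location, set of possible
# values of a single variable x). Tuples of ints and frozensets are hashable.
State = Tuple[int, FrozenSet[int]]

# Hypothetical CFA: two edges from location 0 to 1, one edge from 1 to 2.
EDGES = [
    (0, 1, lambda x: 1),       # x := 1
    (0, 1, lambda x: 2),       # x := 2
    (1, 2, lambda x: x * 10),  # x := x * 10
]


def successors(e: State) -> Iterable[State]:
    loc, vals = e
    for src, dst, op in EDGES:
        if src == loc:
            yield (dst, frozenset(op(v) for v in vals))


def merge_sep(e: State, e_reached: State) -> State:
    return e_reached


def merge_join_at_location(e: State, e_reached: State) -> State:
    # Join only states at the same location (classical data-flow analysis).
    if e[0] == e_reached[0]:
        return (e_reached[0], e[1] | e_reached[1])
    return e_reached


def stop_sep(e: State, reached: Set[State]) -> bool:
    return any(e[0] == r[0] and e[1] <= r[1] for r in reached)


def cpa(e0: State,
        succ: Callable[[State], Iterable[State]],
        merge: Callable[[State, State], State],
        stop: Callable[[State, Set[State]], bool]) -> Set[State]:
    waitlist: Set[State] = {e0}
    reached: Set[State] = {e0}
    while waitlist:
        e = waitlist.pop()
        for e1 in succ(e):                 # e1 plays the role of e'
            for e2 in list(reached):       # e2 plays the role of e''
                e_new = merge(e1, e2)
                if e_new != e2:
                    # (waitlist ∪ {e_new}) \ {e''}, and likewise for reached
                    waitlist.discard(e2)
                    waitlist.add(e_new)
                    reached.discard(e2)
                    reached.add(e_new)
            if not stop(e1, reached):
                waitlist.add(e1)
                reached.add(e1)
    return reached
```

On this input, the join configuration keeps one state per location, e.g. (1, {1, 2}), whereas merge_sep keeps (1, {1}) and (1, {2}) apart and ends with five reached states instead of three.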
analysis P by using their abstract domain and transfer relation, and choosing the merge operator $\mathsf{merge}_P = \mathsf{merge}^{sep}$ and the termination check $\mathsf{stop}_P = \mathsf{stop}^{sep}$.
Shape Analysis. Shape analysis is a static analysis that uses finite structures (shape
graphs) to represent instances of heap-stored data structures. We can express the framework of Sagiv et al. [17] as a configurable program analysis S by using their abstract (powerset) domain and transfer relation, and choosing the merge operator $\mathsf{merge}_S = \mathsf{merge}^{join}$ and the termination check $\mathsf{stop}_S = \mathsf{stop}^{join}$.
replaced by the new one.³ If after the merge step the resulting new abstract state is not covered by the set reached, then it is added to the set reached and to the set waitlist.⁴
We now show how model checking and data-flow analysis are instances of configurable
program analysis.
³ Implementation remark: The operator merge can be implemented such that it operates directly on the reached set. If the set reached is stored in a sorted data structure, there is no need to iterate over the full set of reachable abstract states, but only over the abstract states that need to be combined.
⁴ Implementation remark: The termination check can additionally be done before the merge process. This speeds up cases where the termination check is cheaper than the merge.
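The two implementation remarks can be illustrated with a reached set indexed by location. In this Python sketch (class, function, and state encoding are hypothetical), merge and stop only inspect states at the same location, which is valid under the stated assumption that the operators never relate states at different locations; an optional termination check runs before the merge step.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

# Hypothetical state encoding: (location, payload); the payload is a frozenset.
State = Tuple[int, FrozenSet[int]]


class PartitionedReached:
    """Reached set indexed by location. A sketch that assumes merge and stop
    never relate abstract states at different locations."""

    def __init__(self) -> None:
        self._by_loc: Dict[int, Set[State]] = {}

    def candidates(self, e: State) -> Set[State]:
        # Only states at the same location as e can be merged with or cover e.
        # A copy is returned so callers may modify the reached set meanwhile.
        return set(self._by_loc.get(e[0], ()))

    def add(self, e: State) -> None:
        self._by_loc.setdefault(e[0], set()).add(e)

    def replace(self, old: State, new: State) -> None:
        self._by_loc[old[0]].discard(old)
        self.add(new)

    def __len__(self) -> int:
        return sum(len(states) for states in self._by_loc.values())


def process_successor(e: State, reached: PartitionedReached, waitlist: Set[State],
                      merge: Callable[[State, State], State],
                      stop: Callable[[State, Set[State]], bool]) -> None:
    # Optional early termination check: if e is already covered,
    # skip the merge step entirely.
    if stop(e, reached.candidates(e)):
        return
    for old in reached.candidates(e):
        new = merge(e, old)
        if new != old:
            reached.replace(old, new)
            waitlist.discard(old)
            waitlist.add(new)
    if not stop(e, reached.candidates(e)):
        reached.add(e)
        waitlist.add(e)
```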
Combinations of Model Checking and Program Analysis. Because the model-checking algorithm never uses a join operator, the analysis is automatically path-sensitive. In contrast, path-sensitivity in data-flow analysis requires the use of a more precise data-flow lattice that distinguishes abstract states on different paths. On the other hand, due to the join operations, the data-flow analysis can reach the fixpoint much faster in many cases. Different abstract interpreters exhibit significant differences in precision and cost, depending on the choice of the merge operator and termination check. Therefore, we need a mechanism to combine the best choices of the operators for different abstract interpreters when composing the resulting program analyses.
the three composite operators determines the precision of the resulting configurable pro-
gram analysis. In previous approaches, a redefinition of basic operations was necessary,
but using configurable program analysis, we can reuse the existing abstract interpreters.
Example: BLAST’s Domain. The program analysis that is implemented in the tool BLAST can be expressed as a configurable program analysis D that derives from the composite program analysis $C = (L, P, \rightsquigarrow_{\times}, \mathsf{merge}_{\times}, \mathsf{stop}_{\times})$, where the components are the configurable program analysis L for locations and the configurable program analysis P for predicate abstraction. We construct the composite transfer relation $\rightsquigarrow_{\times}$ such that $(l, r) \rightsquigarrow_{\times}^{g} (l', r')$ iff $l \rightsquigarrow_{L}^{g} l'$ and $r \rightsquigarrow_{P}^{g} r'$. We choose the composite merge operator $\mathsf{merge}_{\times} = \mathsf{merge}^{sep}$ and the composite termination check $\mathsf{stop}_{\times} = \mathsf{stop}^{sep}$.
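The composite transfer relation of BLAST’s domain is a cartesian product over the same CFA edge g. The Python sketch below simplifies each component relation to a partial function per edge; the edge encoding, the single tracked predicate flag, and the hard-coded predicate transfer are assumptions for illustration (a real predicate abstraction would query a theorem prover).

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical encodings: a CFA edge g is (source, operation, target); a
# predicate region is a frozenset of literals such as "flag" or "!flag".
Edge = Tuple[int, str, int]
Region = FrozenSet[str]


def location_transfer(l: int, g: Edge) -> Optional[int]:
    # Location analysis L: follow edge g if it leaves location l.
    src, _, dst = g
    return dst if src == l else None


def predicate_transfer(r: Region, g: Edge) -> Optional[Region]:
    # Stand-in for predicate abstraction P over the single predicate flag;
    # None means the edge is infeasible from region r.
    _, op, _ = g
    if op == "flag = 1":
        return frozenset({"flag"})
    if op == "flag = 0":
        return frozenset({"!flag"})
    if op == "assume flag":
        return None if "!flag" in r else r | {"flag"}
    if op == "assume !flag":
        return None if "flag" in r else r | {"!flag"}
    return r  # other operations leave the tracked predicate unchanged


def composite_successors(state: Tuple[int, Region],
                         edges: List[Edge]) -> Iterable[Tuple[int, Region]]:
    # (l, r) has successor (l', r') for edge g iff both components have
    # a successor for that same edge g (cartesian product).
    l, r = state
    for g in edges:
        l2 = location_transfer(l, g)
        r2 = predicate_transfer(r, g)
        if l2 is not None and r2 is not None:
            yield (l2, r2)
```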
Example: BLAST’s Domain + Shape Analysis. The combination of predicate abstraction and shape analysis [2] can now be expressed as the composite program analysis $C = (L, P, S, \rightsquigarrow_{\times}, \mathsf{merge}_{\times}, \mathsf{stop}_{\times})$ with the three components: location analysis L, predicate abstraction P, and shape analysis S. In our previous work [2] we used a configuration that corresponds to the composite merge operator $\mathsf{merge}_{\times} = \mathsf{merge}^{sep}$ and the composite termination check $\mathsf{stop}_{\times} = \mathsf{stop}^{sep}$. Our new tool now allows us to define the three composite operators $\rightsquigarrow_{\times}$, $\mathsf{merge}_{\times}$, and $\mathsf{stop}_{\times}$ in many different ways, and we report the results of our experiments in Sect. 3.
Example: BLAST’s Domain + Pointer Analysis. Fischer et al. used a particular combination (called predicated lattices) of predicate abstraction and a data-flow analysis for pointers [9], which we can express as the composite program analysis $C = (L, P, A, \rightsquigarrow_{\times}, \mathsf{merge}_{\times}, \mathsf{stop}_{\times})$, where A is a configurable pointer analysis. The transfer relation $\rightsquigarrow_{\times}$ is such that $(l, r, d) \rightsquigarrow_{\times}^{g} (l', r', d')$ iff $l \rightsquigarrow_{L}^{g} l'$ and $r \rightsquigarrow_{P}^{g} r'$ and $d \rightsquigarrow_{A}^{g} d'$. We can configure the algorithm of Fischer et al. by choosing the composite termination check $\mathsf{stop}_{\times} = \mathsf{stop}^{sep}$ and the composite merge operator that joins the third elements if the first two agree:

$$\mathsf{merge}_{\times}((l, r, d), (l', r', d')) = \begin{cases} (l', r', \mathsf{merge}_A(d, d')) & \text{if } l = l' \text{ and } r = r' \\ (l', r', d') & \text{otherwise} \end{cases}$$

with $\mathsf{merge}_A(d, d') = d \sqcup_A d'$.
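The predicated-lattice merge operator can be sketched as follows. The encodings are hypothetical and chosen for illustration: locations are integers, predicate regions are frozensets of literals (compared syntactically), and the pointer-analysis element d is a frozenset of may-point-to pairs, so that ⊔_A is set union.

```python
from typing import FrozenSet, Tuple

# Hypothetical encodings: location (int), predicate region (frozenset of
# literals), pointer-analysis element (frozenset of may-point-to pairs).
Region = FrozenSet[str]
Aliases = FrozenSet[Tuple[str, str]]
Triple = Tuple[int, Region, Aliases]


def merge_A(d: Aliases, d2: Aliases) -> Aliases:
    # Join of the pointer lattice; union for a may-point-to set.
    return d | d2


def merge_predicated(e: Triple, e_reached: Triple) -> Triple:
    # Join the third components only if location and predicate region agree;
    # otherwise keep the reached state unchanged (merge^sep behaviour).
    l, r, d = e
    l2, r2, d2 = e_reached
    if l == l2 and r == r2:
        return (l2, r2, merge_A(d, d2))
    return e_reached
```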
Remark: Location Domain. Traditional data-flow analyses do not consider the location
domain as a separate abstract domain; they assume that the locations are always ex-
plicitly analyzed. In contrast, we leave this completely up to the interpreter. We find it
interesting to consider the program counter as just another program variable, and de-
fine a location domain that makes the program counter explicit when composed with
other domains. This frees the other abstract domains from handling locations themselves; only the parameters for the composite program analysis need to be set. This
keeps different concerns separate. Usually, only the program counter variable is mod-
eled explicitly, and all other variables are represented symbolically (e.g., by predicates
or shapes). We have the freedom to treat any program variable explicitly, not only the
program counter; this may be useful for loop indices. Conversely, we can treat the
program counter symbolically, and let other variables ‘span’ the abstract reachability
tree.
3 Experiments
We evaluated our new approach on several combinations of abstract interpreters, under several different configurations. We implemented the configurable program analysis as an extension of the model checker BLAST, in order to reuse many components that are necessary for an analysis tool but outside the focus of this work. BLAST supports recursive function calls, as well as pointers and recursive data structures on the heap. For representing the shape-analysis domain we use parts of the TVLA implementation [13]. For pointer-alias analysis, we use the implementation that comes with CIL [16]. We use the configuration of Fischer et al. [9] to compare with predicated lattices.
The transfer relation is cartesian, i.e., the successors of the different components are
computed independently (cf. [2]). The merge operator joins the shape graphs of abstract
regions that agree on both the location and the predicate region. The predicate regions
are never joined. Termination is checked using the coverage against a single abstract
state. This configuration corresponds to Fischer et al.’s predicated lattice [9].
Example. To illustrate the difference between the various configurations, we use the
C program in Fig. 1(a). This program constructs a list that contains the data values 1
or 2, depending on the value of the variable flag, and ends with a single 3. We illustrate
the example using the following abstractions. In the predicate abstraction, we keep track
of the nullary predicate flag. In the shape analysis, we consider shape graphs for the list
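[Fig. 1(a), last lines of the example program: 14 p->h = 3; 15 }. The remaining panels show shape graphs of lists with nodes h=1, h=2, and h=3.]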
ite program analysis joins neither predicate regions nor shape regions, and corresponds
to lazy shape analysis [2].
Example. Since this composite program analysis does not join elements, there is no reached abstract state with a set of shape graphs of size larger than 1 (unlike in the previous configuration A). Instead, we maintain distinct abstract states. In particular, at the exit location, the set of reached abstract states contains the following abstract states: $(15, \mathit{flag}, \{g_1\})$, $(15, \mathit{flag}, \{g_{2,1}\})$, $(15, \mathit{flag}, \{g_{3,1}\})$, $(15, \mathit{flag}, \{g_{4,1}\})$, $(15, \neg\mathit{flag}, \{g_1\})$, $(15, \neg\mathit{flag}, \{g_{2,2}\})$, $(15, \neg\mathit{flag}, \{g_{3,2}\})$, and $(15, \neg\mathit{flag}, \{g_{4,2}\})$. This set of abstract states represents exactly the same set of concrete states as the result of the previous analysis (configuration A).
Experimental Results. All examples in our experiments run faster with this configuration, and the precision does not change compared to configuration A. Precision: Shape analysis is based on a powerset domain, and therefore, joins are precise. The precision of the predicated lattice is the same as the precision
of this variant without joins. Performance: Although the number of explored abstract
states is slightly higher, this configuration improves the performance of the analysis.
The size of lattice elements (i.e., the average number of shape graphs in an abstract
state) is considerably smaller than in the predicated-lattice configuration (A). There-
fore, we achieve a better performance, because operations (in particular the successor
computations) on small sets of shape graphs are much more efficient than on large sets.
Example. In the example, the strengthening operator has no effect, because the nullary predicate flag is unrelated to any predicate used in the shape graphs. The strengthening operator would prove useful if, for example, the shape graphs had in addition a
unary field predicate h = x (indicating that the field h of a node has the same value as
the program variable x), and the predicate abstraction had the nullary predicate x = 3.
Consider the operation at line 14 (p->h = 3). The successor of the shape graph be-
fore applying the strengthening operator can only update the unary field predicate h = x
to value 1/2, while the unary field predicate h = 3 can be set to value 1 for the node
pointed to by p. Supposing x = 3 holds in the predicate region of the abstract successor,
the strengthening operator updates the field predicate h = x to value 1 as well.
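The strengthening step described above can be sketched with three-valued logic, using Fraction(1, 2) for the unknown value 1/2. The predicate names "h=3", "h=x", and "x=3", the dictionary encoding of one node’s field predicates, and both functions are hypothetical; only the node pointed to by p is modeled.

```python
from fractions import Fraction
from typing import Dict, FrozenSet

# Three-valued logic: 0 (false), 1/2 (unknown), 1 (true).
HALF = Fraction(1, 2)
NodePreds = Dict[str, Fraction]  # hypothetical: field predicates of one node


def shape_successor_assign_h_3(node: NodePreds) -> NodePreds:
    # Cartesian shape successor for `p->h = 3` on the node pointed to by p:
    # h = 3 becomes definitely true, but h = x becomes unknown, because the
    # shape component alone knows nothing about the value of x.
    succ = dict(node)
    succ["h=3"] = Fraction(1)
    succ["h=x"] = HALF
    return succ


def strengthen(node: NodePreds, pred_region: FrozenSet[str]) -> NodePreds:
    # Use information from the predicate component: if x = 3 holds and the
    # node has h = 3, then h = x holds as well.
    out = dict(node)
    if "x=3" in pred_region and out.get("h=3") == 1:
        out["h=x"] = Fraction(1)
    return out
```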
cause of the strengthening operator, the abstract successors are more precise than those computed with the cartesian transfer relation. Therefore, the whole analysis is more precise.
Performance: The cost of the strengthening operator is small compared to the cost of the shape-successor computation. Therefore, the performance is not severely impacted when compared to a cartesian transfer relation.
D: As Precise as Model Checking with Improved Termination Check (merge-sep,
stop-join). Now we try to achieve another improvement over configuration B: we re-
place the termination check with one that checks the abstract state against the join of
the reached abstract states that agree on locations and predicates:
$$\mathsf{stop}_{\times}((l, r, s), R) = \left( s \sqsubseteq_S \bigsqcup\nolimits_S \{\, s' \mid (l, r, s') \in R \,\} \right)$$
The previous termination check iterated over the set of already reached abstract states, checking each one individually for coverage. Alternatively, abstract states that agree on the predicate abstraction can be summarized by one single abstract state that is used for the termination check. This is sound because the shape-analysis domain is a powerset domain.
Example. To illustrate the use of the new termination check in the example, consider a set of reached abstract states that contains at some intermediate step the following abstract states: $(15, \mathit{flag}, \{g_1\})$, $(15, \mathit{flag}, \{g_{2,1}\})$, $(15, \neg\mathit{flag}, \{g_1\})$, and $(15, \neg\mathit{flag}, \{g_{2,2}\})$. If we apply the termination check to the abstract state $(15, \mathit{flag}, \{g_1, g_{2,1}\})$ and this set of reached abstract states, we check whether the set $\{g_1, g_{2,1}\}$ of shape graphs is a subset of the join of all shape graphs already found for this location and valuation of predicates (that is, the set $\{g_1, g_{2,1}\}$), so the check succeeds. With the termination check $\mathsf{stop}^{sep}$, the check would not succeed at this point, because neither $\{g_1\}$ nor $\{g_{2,1}\}$ alone contains both shape graphs.
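The two termination checks applied to this example can be compared directly. In this Python sketch, stop_join_composite follows the formula for stop× in configuration D; the encoding is hypothetical (shape graphs are opaque names, predicate valuations are strings compared by equality, a simplification of the predicate order).

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical encoding of the running example: (location, predicate
# valuation, set of shape graphs), with shape graphs named "g1", "g2,1", ...
State = Tuple[int, str, FrozenSet[str]]


def stop_join_composite(e: State, reached: Iterable[State]) -> bool:
    # Join (union) the shape sets of reached states that agree on the
    # location and the predicate valuation, then test inclusion.
    l, r, s = e
    summary = frozenset().union(*(s2 for (l2, r2, s2) in reached
                                  if l2 == l and r2 == r))
    return s <= summary


def stop_sep_composite(e: State, reached: Iterable[State]) -> bool:
    # Coverage by a single reached state with the same location and
    # predicate valuation.
    l, r, s = e
    return any(l2 == l and r2 == r and s <= s2 for (l2, r2, s2) in reached)
```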
Experimental Results. The overall performance impact is slightly negative. Precision:
This configuration does not change the precision for our examples. Performance: We
expected improved performance by (1) avoiding many single coverage checks because
of the summary abstract state, and (2) fewer successor computations, because we may
recognize earlier that the fixpoint is reached. However, the performance impact in our
examples is negligible, because a very small portion of the time is spent on termination
checks, and the gain is more than offset by the overhead due to the joins.
Example. The composite program analysis encounters, for example, the abstract state $(15, \mathit{flag}, \{g_1, g_{2,1}, g_{2,2}, g_{3,1}, g_{3,2}, \ldots\})$, which contains shape graphs for lists that contain either 1s or 2s even though flag has the value true. Therefore, we note a loss of precision compared to the predicated-lattice approach (configuration A), because the less precise merge operator loses the correlation between the value of the nullary predicate flag and the shape graphs.

Program       A            B            C            D            E            F
              pred-join    merge-sep    merge-sep    merge-sep    merge-join   merge-join
              stop-sep     stop-sep     stop-sep     stop-join    stop-join    stop-join
                                        transfer-new                           join preds
simple        0.53 s       0.32 s       0.40 s       0.34 s       0.51 s       0.50 s
simple backw  0.43 s       0.28 s       0.26 s       0.31 s       0.44 s       0.45 s
list 1        0.42 s       0.37 s       0.41 s       0.32 s       0.41 s       0.41 s
list 2        5.24 s       0.85 s       1.25 s       0.86 s       5.34 s       5.36 s
list 3        138.97 s     1.79 s       2.62 s       2.10 s       132.08 s     132.07 s
list 4        > 600 s      9.67 s       15.44 s      11.87 s      > 600 s      > 600 s
alternating   0.86 s       0.61 s       0.96 s       0.60 s       FP           FP
list flag     0.69 s       0.49 s       0.79 s       0.46 s       FP           FP
list flag2    FP           FP           0.81 s       FP           FP           FP

Program     CFA nodes  LOC     A: orig. (join)   B: more precision (no join)
s3 clnt     272        2,547   0.680 s           0.830 s
s3 srvr     322        2,542   0.560 s           0.590 s
cdaudio     968        18,225  33.50 s           > 600 s
diskperf    549        14,286  248.330 s         > 600 s
Experimental Results. The analysis is not able to prove several of the examples that
were successfully verified with previous configurations. Precision: The shape-analysis
component has lost the path-sensitivity: the resulting shape graphs are similar to what a
classical fixpoint algorithm for data-flow analysis would yield. Therefore, the analysis
is less precise. Performance: The run time is similar to configuration A.
F: Predicate Abstraction with Join (merge-join for preds). We now evaluate a com-
posite program analysis that is similar to a classical data-flow analysis, i.e., both pred-
icates and shapes are joined for the abstract states that agree on the program location.
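We consider the following merge operator:

$$\mathsf{merge}_{\times}((l, r, s), (l', r', s')) = \begin{cases} (l', \mathsf{merge}_P(r, r'), \mathsf{merge}_S(s, s')) & \text{if } l = l' \\ (l', r', s') & \text{otherwise} \end{cases}$$

where $\mathsf{merge}_P(r, r') = r \sqcup_P r'$ is the strongest conjunction of predicates that is implied by $r \vee r'$.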
This composite program analysis corresponds exactly to a data-flow analysis on the direct
product of the two lattices: the set of reached abstract states contains only one abstract
state per location, because the merge operator joins abstract states of the same location.
Example. At location 15, we have one abstract state: $(15, \mathit{true}, \{g_1, g_{2,1}, g_{2,2}, \ldots\})$.
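Configuration F’s merge operator can be sketched in the same style. Assuming predicate regions are conjunctions of literals over logically independent predicates (a simplification), the strongest conjunction implied by r ∨ r' keeps exactly the literals common to both regions, so the predicate join is set intersection and the empty set stands for true. The encoding and function names are hypothetical.

```python
from typing import FrozenSet, Tuple

# Hypothetical encoding: (location, predicate literals, shape-graph names).
State = Tuple[int, FrozenSet[str], FrozenSet[str]]


def merge_P(r: FrozenSet[str], r2: FrozenSet[str]) -> FrozenSet[str]:
    # Keep only the literals that hold in both regions; frozenset() = true.
    return r & r2


def merge_S(s: FrozenSet[str], s2: FrozenSet[str]) -> FrozenSet[str]:
    # Powerset shape domain: join is union.
    return s | s2


def merge_F(e: State, e_reached: State) -> State:
    # Join both components whenever the locations agree.
    l, r, s = e
    l2, r2, s2 = e_reached
    if l == l2:
        return (l2, merge_P(r, r2), merge_S(s, s2))
    return e_reached
```

Merging (15, {flag}, {g1, g2,1}) with (15, {¬flag}, {g1, g2,2}) yields a single state at location 15 whose predicate region is true, mirroring the example above.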
Experimental Results. This configuration can prove the same example programs as con-
figuration E, and the run times are also similar to configuration E.
Precision: This composite program analysis is the least precise in our set of config-
urations, because the merge operator joins both the predicates and the shape graphs
independently, for a given location. While join is suitable for many data-flow analy-
ses, predicate abstraction becomes very imprecise when predicate regions are joined,
4 Conclusion
When the goal is as difficult as automatic software verification, it is imperative to bring to bear insights and optimizations regardless of whether they originated in model checking, program analysis, or automated theorem proving (which is heavily used in BLAST, to compute transfer functions and to perform termination checks). We have therefore turned BLAST from a tree-based software model checker into a tool that can be configured using different lattice-based abstract interpreters, composite transfer functions, merge operators, and termination checks. Specifically configured extensions of BLAST with lattice-based analysis had been implemented before, e.g., in predicated lattices [9] and in lazy shape analysis [2]. As a side-effect, we can now express the algorithmic settings of these papers in a simple and systematic way, and moreover, we have found different configurations that perform even better.
⁸ A: predicated join; B: no join (model checking); C: no join and more precise transfer relation; D: no join, termination check with join; E: normal join of shapes (data-flow analysis); F: join for predicate abstraction. All experiments were run on a 3 GHz Intel Xeon processor.
References
1. Ball, T., Podelski, A., Rajamani, S.K.: Boolean and cartesian abstractions for model checking
C programs. In: Margaria, T., Yi, W. (eds.) ETAPS 2001 and TACAS 2001. LNCS, vol. 2031,
pp. 268–283. Springer, Heidelberg (2001)
2. Beyer, D., Henzinger, T.A., Théoduloz, G.: Lazy shape analysis. In: Ball, T., Jones, R.B.
(eds.) CAV 2006. LNCS, vol. 4144, pp. 532–546. Springer, Heidelberg (2006)
3. Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X.: Design and implementation of a special-purpose static program analyzer for safety-critical real-time embedded software. In: Mogensen, T.Æ., Schmidt, D.A., Sudborough, I.H. (eds.) The Essence of Computation. LNCS, vol. 2566, pp. 85–108. Springer, Heidelberg (2002)
4. Codish, M., Mulkers, A., Bruynooghe, M., de la Banda, M., Hermenegildo, M.: Improving
abstract interpretations by combining domains. In: Proc. PEPM, pp. 194–205. ACM Press,
New York (1993)
5. Cousot, P., Cousot, R.: Abstract interpretation: A unified lattice model for the static analysis
of programs by construction or approximation of fixpoints. In: Proc. POPL, pp. 238–252.
ACM Press, New York (1977)
6. Cousot, P., Cousot, R.: Systematic design of program analysis frameworks. In: Proc. POPL,
pp. 269–282. ACM Press, New York (1979)
7. Cousot, P., Cousot, R.: Compositional and inductive semantic definitions in fixpoint, equational, constraint, closure-condition, rule-based and game-theoretic form. In: Wolper, P. (ed.) CAV 1995. LNCS, vol. 939, pp. 293–308. Springer, Heidelberg (1995)
8. Dwyer, M.B., Clarke, L.A.: A flexible architecture for building data-flow analyzers. In: Proc.
ICSE, pp. 554–564. IEEE Computer Society Press, Los Alamitos (1996)
9. Fischer, J., Jhala, R., Majumdar, R.: Joining data flow with predicates. In: Proc. ESEC/FSE,
pp. 227–236. ACM Press, New York (2005)
10. Gulwani, S., Tiwari, A.: Combining abstract interpreters. In: Proc. PLDI, pp. 376–386. ACM
Press, New York (2006)
11. Henzinger, T.A., Jhala, R., Majumdar, R., Sutre, G.: Lazy abstraction. In: Proc. POPL, pp.
58–70. ACM Press, New York (2002)
12. Lerner, S., Grove, D., Chambers, C.: Composing data-flow analyses and transformations. In:
Proc. POPL, pp. 270–282. ACM Press, New York (2002)
13. Lev-Ami, T., Sagiv, M.: TVLA: A system for implementing static analyses. In: Palsberg, J.
(ed.) SAS 2000. LNCS, vol. 1824, pp. 280–301. Springer, Heidelberg (2000)
14. Martin, F.: PAG: An efficient program analyzer generator. STTT 2, 46–67 (1998)
15. Mauborgne, L., Rival, X.: Trace partitioning in abstract interpretation based static analyzers.
In: Sagiv, M. (ed.) ESOP 2005. LNCS, vol. 3444, pp. 5–20. Springer, Heidelberg (2005)
16. Necula, G., McPeak, S., Rahul, S., Weimer, W.: CIL: Intermediate language and tools for
analysis and transformation of C programs. In: Horspool, R.N. (ed.) CC 2002 and ETAPS
2002. LNCS, vol. 2304, pp. 213–228. Springer, Heidelberg (2002)
17. Sagiv, M., Reps, T.W., Wilhelm, R.: Parametric shape analysis via 3-valued logic. ACM
TOPLAS 24, 217–298 (2002)
18. Schmidt, D.A.: Data-flow analysis is model checking of abstract interpretations. In: Proc.
POPL, pp. 38–48. ACM Press, New York (1998)
19. Steffen, B.: Data-flow analysis as model checking. In: Proc. TACS, pp. 346–365 (1991)
20. Tjiang, S.W.K., Hennessy, J.: SHARLIT: A tool for building optimizers. In: Proc. PLDI,
pp. 82–93. ACM Press, New York (1992)