Static Program Analysis Using Abstract Interpretation: Guillaume Brat Arnaud Venet
Abstract Interpretation
Kestrel Technology
NASA Ames Research Center
Moffett Field, CA 94035
Introduction
Static Program Analysis
Static program analysis consists of automatically
discovering properties of a program that hold for
all possible execution paths of the program.
Static program analysis is not
• Testing: manually checking a property for
some execution paths
• Model checking: automatically checking a
property for all execution paths
Program Analysis for what?
• Optimizing compilers
• Program understanding
• Semantic preprocessing:
– Model checking
– Automated test generation
• Program verification
Program Verification
• Check that every operation of a program
will never cause an error (division by zero,
buffer overrun, deadlock, etc.)
• Example:
int a[1000];
for (i = 0; i < 1000; i++) {
    a[i] = … ;   // safe operation: 0 <= i <= 999
}
a[i] = … ;       // buffer overrun: i = 1000
Incompleteness of Program Analysis
• Discovering a sufficient set of properties
for checking every operation of a program
is an undecidable problem!
• False positives: operations that are safe
in reality but which cannot be decided safe
or unsafe from the properties inferred by
static analysis.
Precision versus Efficiency
Precision: number of program operations that
can be decided safe or unsafe by an analyzer.
• Precision and computational complexity
are strongly related
• Tradeoff precision/efficiency: limit in the
average precision and scalability of a
given analyzer
• Greater precision and scalability are
achieved through specialization
Specialization
• Tailoring the program analyzer algorithms
for a specific class of programs (flight
control commands, digital signal
processing, etc.)
• Precision and scalability are guaranteed for
this class of programs only
• Requires a lot of try-and-test to fine-tune
the algorithms
• Need for an open architecture
Soundness
• What guarantees the soundness of the analyzer
results?
• In dataflow analysis and type inference the
soundness proof of the resolution algorithm is
independent from the analysis specification
• An independent soundness proof precludes the
use of try-and-test techniques
• Need for analyzers correct by construction
Abstract Interpretation
• A general methodology for designing static
program analyzers that are:
– Correct by construction
– Generic
– Easy to fine-tune
• Scalability is difficult to achieve but the
payoff is worth the effort!
Approximation
The core idea of Abstract Interpretation is the
formalization of the notion of approximation
• An approximation of memory configurations is first
defined
• Then the approximation of all atomic operations
• The approximation is automatically lifted to the
whole program structure
• The approximation is generally a scheme that
depends on some other parameter
approximations
Overview of Abstract Interpretation
• Start with a formal specification of the program
semantics (the concrete semantics)
• Construct abstract semantic equations w.r.t. a
parametric approximation scheme
• Use general algorithms to solve the abstract
semantic equations
• Try-and-test various instantiations of the
approximation scheme in order to find the best fit
The Methodology of Abstract Interpretation
Methodology
[Figure: methodology diagram. Labels: Concrete Semantics, Collecting Semantics, Partitioning, Abstract Domain, Abstract Semantics, Iterative Resolution Algorithms, Tuners.]
Lattices and Fixpoints
• A lattice (L, ⊑, ⊥, ⊔, ⊤, ⊓) is a partially ordered
set (L, ⊑) with:
– Least upper bound (⊔) and greatest lower
bound (⊓) operators
– A least element “bottom”: ⊥
– A greatest element “top”: ⊤
• L is complete if all least upper bounds exist
• A fixpoint X of F: L → L satisfies F(X) = X
• We denote by lfp F the least fixpoint if it exists
Fixpoint Theorems
• Knaster-Tarski theorem: If F: L → L is
monotone and L is a complete lattice, the set of
fixpoints of F is also a complete lattice.
• Kleene theorem: If F: L → L is monotone, L is a
complete lattice and F preserves all least upper
bounds then lfp F is the limit of the sequence:
F0 = ⊥
Fn+1 = F (Fn)
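The Kleene iteration above can be sketched in a few lines. This is only an illustration under stated assumptions: the lattice is the powerset of {0, ..., 9} ordered by inclusion (so ⊥ is the empty set), and `step` is a hypothetical monotone function chosen for the example, not one taken from the slides.

```python
from typing import Callable, FrozenSet


def kleene_lfp(f: Callable[[FrozenSet[int]], FrozenSet[int]],
               bottom: FrozenSet[int] = frozenset()) -> FrozenSet[int]:
    """Iterate F0 = bottom, Fn+1 = F(Fn) until the sequence stabilizes."""
    current = bottom
    while True:
        nxt = f(current)
        if nxt == current:      # F(X) = X: a fixpoint has been reached
            return current
        current = nxt


def step(xs: FrozenSet[int]) -> FrozenSet[int]:
    """Hypothetical monotone function: always contains 0, and x + 2 for each x."""
    return frozenset({0}) | frozenset(x + 2 for x in xs if x + 2 < 10)


even_numbers = kleene_lfp(step)
print(sorted(even_numbers))  # [0, 2, 4, 6, 8]
```

The iteration terminates here because the lattice is finite; on infinite lattices the sequence may not stabilize in finitely many steps.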
Concrete Semantics
Small-step operational semantics: (Σ, →)
Example:
1: n = 0;
2: while n < 1000 do
3: n = n + 1;
4: end
5: exit
〈1, n⇒Ω〉 → 〈2, n⇒0〉 → 〈3, n⇒0〉 → 〈4, n⇒1〉 → 〈2, n⇒1〉 → ... → 〈5, n⇒1000〉
Ω: undefined value
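As a rough sketch, the small-step semantics of the example program can be written as a transition function on states 〈program point, value of n〉. The encoding is an assumption made for illustration: a state is a Python tuple, and `None` stands for the undefined value Ω.

```python
from typing import Iterator, Optional, Tuple

State = Tuple[int, Optional[int]]  # (program point, value of n); None is Ω


def step(state: State) -> Optional[State]:
    """One transition of the example program, or None at the final state."""
    pc, n = state
    if pc == 1:                                  # 1: n = 0;
        return (2, 0)
    if pc == 2:                                  # 2: while n < 1000 do
        return (3, n) if n < 1000 else (5, n)
    if pc == 3:                                  # 3: n = n + 1;
        return (4, n + 1)
    if pc == 4:                                  # 4: end (back to the loop test)
        return (2, n)
    return None                                  # 5: exit has no successor


def trace(initial: State) -> Iterator[State]:
    """Enumerate the execution trace starting from the initial state."""
    state: Optional[State] = initial
    while state is not None:
        yield state
        state = step(state)


states = list(trace((1, None)))
print(states[:5])   # [(1, None), (2, 0), (3, 0), (4, 1), (2, 1)]
print(states[-1])   # (5, 1000)
```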
Control Flow Graph
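1: n = 0;
2: while n < 1000 do
3: n = n + 1;
4: end
5: exit
[Figure: control flow graph of the program, with nodes 1 to 5 and edges 1 → 2 (n = 0), 2 → 3 (n < 1000), 3 → 4 (n = n + 1), 4 → 2, and 2 → 5 (n ≥ 1000).]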
Transition Relation
Control flow graph: ⓘ -op→ ⓙ
Semantics of op: 〚 op 〛
Transition: 〈i, ε〉 → 〈j, 〚 op 〛 ε〉
Collecting Semantics
The collecting semantics is the set of
observable behaviours in the operational
semantics. It is the starting point of any
analysis design.
• The set of all descendants of the initial state
• The set of all descendants of the initial state
that can reach a final state
• The set of all finite traces from the initial state
• The set of all finite and infinite traces from the
initial state
• etc.
Which Collecting Semantics?
• Buffer overrun, division by zero, arithmetic
overflows: state properties
• Deadlocks, un-initialized variables: finite
trace properties
• Loop termination: finite and infinite trace
properties
State properties
The set of descendants of the initial state s0:
S = {s | s0 → ... → s}
S = lfp F
F(S) = {s0} ∪ {s' | ∃s ∈ S: s → s'}
Example
1: n = 0;
2: while n < 1000 do
3: n = n + 1;
4: end
5: exit
S = {〈1, n⇒Ω〉, 〈2, n⇒0〉, 〈3, n⇒0〉, 〈4, n⇒1〉, 〈2, n⇒1〉, ..., 〈5, n⇒1000〉}
Computation
• F0 = ∅
• F1 = {〈1,n⇒Ω〉 }
• F2 = {〈1,n⇒Ω〉, 〈2,n⇒0〉 }
• F3 = {〈1,n⇒Ω〉, 〈2,n⇒0〉, 〈3,n⇒0〉 }
• F4 = {〈1,n⇒Ω〉, 〈2,n⇒0〉, 〈3,n⇒0〉,
〈4,n⇒1〉 }
• ...
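The computation above can be replayed mechanically. The sketch below assumes the same encoding as the slides' states, with a tuple (program point, value of n) and `None` for Ω, and takes F(S) to be the initial state together with all successors of states in S; the names `successor` and `F` are illustrative.

```python
S0 = (1, None)  # initial state 〈1, n⇒Ω〉, with None standing for Ω


def successor(state):
    """Successor of a state of the example program, or None at exit."""
    pc, n = state
    if pc == 1:
        return (2, 0)
    if pc == 2:
        return (3, n) if n < 1000 else (5, n)
    if pc == 3:
        return (4, n + 1)
    if pc == 4:
        return (2, n)
    return None


def F(states):
    """F(S) = {s0} ∪ {s' | s in S and s → s'}."""
    posts = (successor(s) for s in states)
    return frozenset({S0}) | frozenset(t for t in posts if t is not None)


iterates = [frozenset()]            # F0 = ∅
while True:
    nxt = F(iterates[-1])
    if nxt == iterates[-1]:         # least fixpoint reached
        break
    iterates.append(nxt)

reachable = iterates[-1]
print(sorted(iterates[4], key=lambda s: s[0]))  # [(1, None), (2, 0), (3, 0), (4, 1)]
print(len(reachable))                           # 3003
```

Each iterate adds exactly one state here because the program is deterministic, which is why the naive iteration needs as many steps as there are reachable states.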
Partitioning
We partition the set Σ of states w.r.t. program
points:
• Σ = Σ1 ⊕ Σ2 ⊕ ... ⊕ Σn
• Σi = {〈k, ε〉 ∈ Σ | k = i }
• F(S1, ..., Sn)i = {s' ∈ Σi | ∃j ∃s ∈ Sj: s → s'}
• F(S1, ..., Sn)i = {〈i, 〚 op 〛 ε〉 | 〈j, ε〉 ∈ Sj, ⓙ -op→ ⓘ ∈ CFG (P)}
• F(S1, ..., Sn)0 = { s0 }
Illustration
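[Figure: F maps the states 〈j, ε1〉, 〈j, ε2〉 of Sj ⊆ Σj along an edge labeled op to the states 〈i, 〚 op 〛 ε1〉, 〈i, 〚 op 〛 ε2〉 of Σi; the states 〈1, ε1〉, 〈1, ε2〉 of S1 ⊆ Σ1 are shown as another block of the partition.]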
Semantic Equations
• Notation: Ei = set of environments at
program point i
• System of semantic equations:
Ei = ∪ { 〚 op 〛 Ej | ⓙ -op→ ⓘ ∈ CFG (P) }
E1 = {n ⇒ Ω}
E2 = 〚 n = 0 〛 E1 ∪ E4
E3 = E2 ∩ ]-∞, 999]
E4 = 〚 n = n + 1 〛 E3
E5 = E2 ∩ [1000, +∞[
Example
[Figure: control flow graph of the example program (nodes 1 to 5, edges labeled n = 0, n < 1000, n = n + 1 and n ≥ 1000), with each node i annotated by its semantic equation: E1 = {n ⇒ Ω}, E2 = 〚 n = 0 〛 E1 ∪ E4, E3 = E2 ∩ ]-∞, 999], E4 = 〚 n = n + 1 〛 E3, E5 = E2 ∩ [1000, +∞[.]
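The system of semantic equations for the example can be solved by plain iteration, since every set involved is finite. In this sketch an environment is reduced to the value of n (the only variable), `OMEGA` is a stand-in for the undefined value, and the equations are re-evaluated in order until nothing changes; these encoding choices are assumptions made for the illustration.

```python
OMEGA = None  # stand-in for the undefined value Ω


def solve():
    """Iterate the equations E1..E5 of the example until they stabilize."""
    E = {i: frozenset() for i in range(1, 6)}
    changed = True
    while changed:
        old = dict(E)
        E[1] = frozenset({OMEGA})
        # [[n = 0]] E1 is {0} as soon as E1 is non-empty
        E[2] = (frozenset({0}) if E[1] else frozenset()) | E[4]
        E[3] = frozenset(v for v in E[2] if v <= 999)      # E2 ∩ ]-∞, 999]
        E[4] = frozenset(v + 1 for v in E[3])              # [[n = n + 1]] E3
        E[5] = frozenset(v for v in E[2] if v >= 1000)     # E2 ∩ [1000, +∞[
        changed = E != old
    return E


E = solve()
print(min(E[2]), max(E[2]))  # 0 1000
print(sorted(E[5]))          # [1000]
```

The exact solution needs on the order of a thousand rounds, one per loop iteration of the analyzed program.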
Other Kinds of Partitioning
• In the case of collecting semantics of
traces:
– Partitioning w.r.t. procedure calls: context
sensitivity
– Partitioning w.r.t. execution paths in a
procedure: path sensitivity
– Dynamic partitioning (Bourdoncle)
Approximation
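Galois connection between (L1, ⊑) and (L2, ≼), with abstraction α: L1 → L2 and concretization γ: L2 → L1:
● ∀x ∀y: α(x) ≼ y ⇔ x ⊑ γ(y)
● ∀x ∀y: x ⊑ γ ∘ α (x) & α ∘ γ (y) ≼ y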
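A Galois connection can be checked exhaustively on a small example. The sketch below assumes a concrete lattice of subsets of {0, ..., 4} ordered by inclusion and an abstract lattice of intervals over the same range, with `BOTTOM` for the empty set; `alpha`, `gamma` and `leq` are illustrative names for α, γ and the abstract order.

```python
from itertools import combinations

UNIVERSE = range(0, 5)
BOTTOM = None  # abstract element representing the empty set


def alpha(values):
    """Abstraction: smallest interval containing all the values."""
    return (min(values), max(values)) if values else BOTTOM


def gamma(interval):
    """Concretization: all values of the universe inside the interval."""
    if interval is BOTTOM:
        return frozenset()
    lo, hi = interval
    return frozenset(v for v in UNIVERSE if lo <= v <= hi)


def leq(a, b):
    """Abstract order: interval inclusion, with BOTTOM below everything."""
    if a is BOTTOM:
        return True
    if b is BOTTOM:
        return False
    return b[0] <= a[0] and a[1] <= b[1]


subsets = [frozenset(c) for k in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, k)]
intervals = [BOTTOM] + [(lo, hi) for lo in UNIVERSE for hi in UNIVERSE if lo <= hi]

# alpha(x) below y  <=>  x included in gamma(y)
adjunction_holds = all(leq(alpha(x), y) == (x <= gamma(y))
                       for x in subsets for y in intervals)
# x included in gamma(alpha(x)), and alpha(gamma(y)) below y
extensive = all(x <= gamma(alpha(x)) for x in subsets)
reductive = all(leq(alpha(gamma(y)), y) for y in intervals)
print(adjunction_holds, extensive, reductive)  # True True True
```

For instance, `alpha(frozenset({0, 3}))` is `(0, 3)`, whose concretization `{0, 1, 2, 3}` strictly contains the original set: the abstraction loses information but never drops a concrete value.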
Fixpoint Approximation
[Figure: diagram with F: L1 → L1 and α ∘ F ∘ γ: L2 → L2.]
Theorem:
lfp F ⊑ γ (lfp α ∘ F ∘ γ)
Abstracting the Collecting Semantics
• Find a Galois connection:
(℘(Σ), ⊆) ⇄ (D#, ⊑), with abstraction α and concretization γ
• Find a function F#: α ∘ F ∘ γ ⊑ F#
Abstract Domains
Environment: x ⇒ v, y ⇒ w, ...
Various kinds of approximations:
• Intervals (nonrelational):
x ⇒ [a, b], y ⇒ [a', b'], ...
• Polyhedra (relational):
x + y - 2z ≤ 10, ...
• Difference-bound matrices (weakly relational):
y - x ≤ c, z - y ≤ c', ...
Example: intervals
1: n = 0;
2: while n < 1000 do
3: n = n + 1;
4: end
5: exit
Widening operator
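Lattice (L, ⊑): ∇: L × L → L
y0 = x0
yn+1 = yn ∇ xn+1
Widening of intervals, [a, b] ∇ [a', b'] = [l, u]:
● l = if a ≤ a' then a else -∞
● u = if b' ≤ b then b else +∞
[Figure: iteration sequence with labels "fixpoint" and "widening".]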
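The interval widening can be written directly from the two rules above. In this minimal sketch an interval is assumed to be a pair (lower bound, upper bound), with `math.inf` standing for ∞; the growing sequence [0, 0], [0, 1], [0, 2], ... is a hypothetical input chosen to show how the widened sequence stabilizes.

```python
import math


def widen(x, y):
    """[a, b] ∇ [a', b']: keep a stable bound, push an unstable one to infinity."""
    (a, b), (a2, b2) = x, y
    lower = a if a <= a2 else -math.inf
    upper = b if b2 <= b else math.inf
    return (lower, upper)


# y0 = x0, yn+1 = yn ∇ xn+1 on the growing sequence xn = [0, n]
xs = [(0, n) for n in range(5)]
ys = [xs[0]]
for x in xs[1:]:
    ys.append(widen(ys[-1], x))

print(ys)  # [(0, 0), (0, inf), (0, inf), (0, inf), (0, inf)]
```

The widened sequence is stationary after a single step, at the price of losing the upper bound.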
Iteration with widening
1: n = 0;
2: while n < 1000 do
3: n = n + 1;
4: end
5: exit
(E2#)n+1 = (E2#)n ∇ (〚 n = 0 〛# (E1#)n ⊔ (E4#)n)
● E5# = [1000, +∞[
● The information is present in the equations
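A small interval analyzer for the example shows how the widened iteration reaches E5# = [1000, +∞[. This is a sketch under assumptions: intervals are pairs with `math.inf` for ∞, `BOT` is the empty interval, widening is applied only at the loop head (program point 2), and E1# is taken to be non-empty so that 〚 n = 0 〛# E1# is [0, 0].

```python
import math

BOT = None  # empty interval


def join(x, y):
    if x is BOT:
        return y
    if y is BOT:
        return x
    return (min(x[0], y[0]), max(x[1], y[1]))


def meet(x, y):
    if x is BOT or y is BOT:
        return BOT
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else BOT


def widen(x, y):
    if x is BOT:
        return y
    if y is BOT:
        return x
    return (x[0] if x[0] <= y[0] else -math.inf,
            x[1] if y[1] <= x[1] else math.inf)


def add_one(x):
    return BOT if x is BOT else (x[0] + 1, x[1] + 1)


def analyze():
    """Iterate the abstract equations, widening at the loop head."""
    E2 = E3 = E4 = E5 = BOT
    while True:
        new_E2 = widen(E2, join((0, 0), E4))      # E2# ∇ ([[n = 0]]# E1# ⊔ E4#)
        new_E3 = meet(new_E2, (-math.inf, 999))   # guard n < 1000
        new_E4 = add_one(new_E3)                  # n = n + 1
        new_E5 = meet(new_E2, (1000, math.inf))   # guard n >= 1000
        if (new_E2, new_E3, new_E4, new_E5) == (E2, E3, E4, E5):
            return {2: E2, 3: E3, 4: E4, 5: E5}
        E2, E3, E4, E5 = new_E2, new_E3, new_E4, new_E5


result = analyze()
print(result)  # {2: (0, inf), 3: (0, 999), 4: (1, 1000), 5: (1000, inf)}
```

The iteration stabilizes after three rounds instead of about a thousand, but the upper bound of n at points 2 and 5 is lost.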
Narrowing operator
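Lattice (L, ⊑): Δ: L × L → L
● Abstract intersection operator:
∀x ∀y: x ⊓ y ⊑ x Δ y
● Enforces convergence: for any decreasing sequence (xn)n≥0, the sequence
y0 = x0
yn+1 = yn Δ xn+1
is ultimately stationary
Narrowing of intervals, [a, b] Δ [a', b'] = [l, u]:
● l = if a = -∞ then a' else a
● u = if b = +∞ then b' else b
[Figure: iteration sequence with labels "narrowing", "fixpoint" and "widening".]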
Iteration with narrowing
1: n = 0;
2: while n < 1000 do
3: n = n + 1;
4: end
5: exit; t[n] = 0;
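The narrowing pass can be sketched as follows, assuming the standard interval narrowing (an infinite bound is replaced by the corresponding bound of the new value) and starting from the post-fixpoint E2# = [0, +∞[ that an interval analysis with widening produces at the loop head. Intervals are pairs with `math.inf` for ∞; all names are illustrative.

```python
import math


def narrow(x, y):
    """[a, b] Δ [a', b']: refine only the bounds of x that are infinite."""
    (a, b), (a2, b2) = x, y
    return (a2 if a == -math.inf else a, b2 if b == math.inf else b)


def narrowing_pass(E2):
    """Re-evaluate the abstract equations of the example with narrowing at point 2."""
    while True:
        E3 = (E2[0], min(E2[1], 999))            # E2# ⊓ ]-∞, 999]
        E4 = (E3[0] + 1, E3[1] + 1)              # [[n = n + 1]]# E3#
        rhs = (min(0, E4[0]), max(0, E4[1]))     # [0, 0] ⊔ E4#
        new_E2 = narrow(E2, rhs)
        if new_E2 == E2:
            break
        E2 = new_E2
    E5 = (max(E2[0], 1000), E2[1])               # E2# ⊓ [1000, +∞[
    return E2, E5


E2, E5 = narrowing_pass((0, math.inf))
print(E2, E5)  # (0, 1000) (1000, 1000)
```

After narrowing, n is known to be exactly 1000 at exit, which is the information needed to check the access t[n] against the size of t.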
Tuning the abstract domains
1: n = 0;
2: k = 0;
3: while n < 1000 do
4: n = n + 1;
5: k = k + 1;
6: end
7: exit
● Intervals:
E4# = 〈n ⇒ [0, 999], k ⇒ [0, +∞[〉
● Convex polyhedra or DBMs:
E4# = 〈0 ≤ n ≤ 999, 0 ≤ k ≤ 999, n - k = 0〉
Comparison with Data Flow Analysis
Data Flow Framework
• Forward Data Flow Equations
in(B) = Init, if B = entry
in(B) = ⊓ { FP(in(P)) | P ∈ Pred(B) }, otherwise
• L is a lattice
• in(B) ∈ L is the data-flow information on entry to B
• Init is the appropriate initial value on entry to the
program
• FB is the transformation of the data-flow information
upon executing block B
• ⊓ models the effect of combining the data-flow
information on the edges entering a block
Data-Flow Solutions
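• The solution we wish to compute when solving the data-flow equations is the meet-over-all-paths (MOP) solution
MOP(B) = ⊓ { Fp(Init) | p ∈ Path(B) } for B = entry, B1, ..., Bn, exit
• If FB is monotone, i.e.,
FB(x ⊓ y) ⊑ FB(x) ⊓ FB(y)
• then the maximum fixpoint (MFP) solution computed by iteration is a safe approximation: MFP ⊑ MOP
• If FB is distributive, i.e.,
FB(x ⊓ y) = FB(x) ⊓ FB(y)
• then MOP = MFP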
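Constant propagation is the usual example of a monotone but non-distributive framework, where MOP and MFP differ. The sketch below uses a hypothetical two-branch program (then: a = 1; b = 2, else: a = 2; b = 1, followed by c = a + b); `NAC` is an illustrative marker for "not a constant", and `mop` and `mfp` are computed by hand for this single join point rather than by a general solver.

```python
NAC = "not-a-constant"


def meet_value(u, v):
    """Meet of two abstract values: equal constants are kept, anything else is NAC."""
    return u if u == v else NAC


def meet_env(e1, e2):
    return {name: meet_value(e1[name], e2[name]) for name in e1}


def then_branch(env):          # a = 1; b = 2
    return {**env, "a": 1, "b": 2}


def else_branch(env):          # a = 2; b = 1
    return {**env, "a": 2, "b": 1}


def add(env):                  # c = a + b
    a, b = env["a"], env["b"]
    c = NAC if NAC in (a, b) else a + b
    return {**env, "c": c}


init = {"a": NAC, "b": NAC, "c": NAC}

# MOP: apply the transfer functions along each path, then combine the results.
mop = meet_env(add(then_branch(init)), add(else_branch(init)))
# MFP: combine at the join point first, then apply the transfer function.
mfp = add(meet_env(then_branch(init), else_branch(init)))

print(mop["c"], mfp["c"])  # 3 not-a-constant
```

Both solutions are safe, but only MOP discovers that c is always 3, because `add` does not distribute over the meet of the two branch environments.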
Typical Data-Flow Analyses
• Reaching Definitions
• Available Expressions
• Live Variables
• Upwards-Exposed Uses
• Copy-Propagation Analysis
• Constant-Propagation Analysis
• Partial-redundancy Analysis
Reaching Definitions
• Data-flow equations:
∀i: RCHin(i) = ∪ { GEN(j) ∪ (RCHin(j) ∩ PRSV(j)) | j ∈ Pred(i) }
where
– PRSV(j) are the definitions preserved by block j
– GEN(j) are the definitions generated by block j
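The reaching-definitions equations can be solved by iterating until no RCHin set changes. The control flow graph below is a hypothetical four-block encoding of the loop example (B1: d1: n = 0, B2: loop test, B3: d2: n = n + 1, B4: exit); the block names, definition labels and tables `PRED`, `GEN` and `PRSV` are assumptions made for the illustration.

```python
# Hypothetical CFG: B1 -> B2, B2 -> B3, B3 -> B2, B2 -> B4
PRED = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"], "B4": ["B2"]}
ALL_DEFS = {"d1", "d2"}                      # d1: n = 0, d2: n = n + 1
GEN = {"B1": {"d1"}, "B2": set(), "B3": {"d2"}, "B4": set()}
# A block that assigns n preserves no other definition of n.
PRSV = {"B1": set(), "B2": ALL_DEFS, "B3": set(), "B4": ALL_DEFS}


def reaching_definitions():
    """RCHin(i) = union over predecessors j of GEN(j) ∪ (RCHin(j) ∩ PRSV(j))."""
    rch_in = {block: set() for block in PRED}
    changed = True
    while changed:
        changed = False
        for block in PRED:
            new = set()
            for pred in PRED[block]:
                new |= GEN[pred] | (rch_in[pred] & PRSV[pred])
            if new != rch_in[block]:
                rch_in[block] = new
                changed = True
    return rch_in


rch_in = reaching_definitions()
print({block: sorted(defs) for block, defs in rch_in.items()})
# {'B1': [], 'B2': ['d1', 'd2'], 'B3': ['d1', 'd2'], 'B4': ['d1', 'd2']}
```

Both definitions of n reach the loop test, since the test is entered from the initialization and from the loop body.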
References
• The abstract domain of intervals:
– Patrick Cousot & Radhia Cousot. Static Determination of Dynamic Properties of Programs. In B. Robinet, editor, Proceedings of the Second International Symposium on Programming, Paris, France, pages 106-130, April 13-15, 1976, Dunod, Paris.
• A thorough description of a static analyzer with all the proofs (difficult to read):
– Patrick Cousot. The Calculational Design of a Generic Abstract Interpreter. Course notes for the NATO International Summer School 1998 on Calculational System Design, Marktoberdorf, Germany, 28 July to 9 August 1998, organized by F.L. Bauer, M. Broy, E.W. Dijkstra, D. Gries and C.A.R. Hoare.