Static Program Analysis: Part 4 - Flow Sensitive Analyses

Download as pdf or txt
Download as pdf or txt
You are on page 1of 42

Static Program Analysis

Part 4 – flow sensitive analyses

http://cs.au.dk/~amoeller/spa/

Anders Møller & Michael I. Schwartzbach


Computer Science, Aarhus University
Agenda

• Constant propagation analysis


• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis

2
Constant propagation optimization
var x,y,z;
x = 27;
y = input,
z = 2*x+y;
if (x<0) { y=z-3; } else { y=12 }
output y;

var x,y,z; var y;


x = 27; y = input;
y = input; output 12;
z = 54+y;
if (0) { y=z-3; } else { y=12 }
output y;
3
Constant propagation analysis
• Determine variables with a constant value
• Flat lattice:

-3 -2 -1 0 1 2 3

4
Constraints for constant propagation
• Essentially as for the Sign analysis…

• Abstract operator for addition:


⊥ if n=⊥ ∨ m=⊥
+(n,m) = ⊤ else if n=⊤ ∨ m=⊤
n+m otherwise

5
Agenda

• Constant propagation analysis


• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis

6
Liveness analysis
• A variable is live at a program point if its current value
may be read in the remaining execution

• This is clearly undecidable, but the property can be


conservatively approximated

• The analysis must only answer “dead”


if the variable is really dead
– no need to store the values of dead variables

7
A lattice for liveness
A powerset lattice of program variables

var x,y,z; L = (P({x,y,z}), )


x = input;
while (x>1) { the trivial answer
y = x/2;
if (y>3) x = x-y; {x,y,z}
z = x-4;
{x,y} {y,z} {x,z}
if (z>0) x = x/2;
z = z-1; {x} {y} {z}
}
output x; 

8
The control flow graph
x = input x > 1 y = x/2 y > 3 x = x-y

var x,y,z z = x-4

z > 0 x = x/2

z = z-1

output x

9
Setting up
• For every CFG node, v, we have a variable ⟦v⟧:
– the set of program variables that are live
at the program point before v

• Since the analysis is conservative, the computed sets


may be too large
v

• Auxiliary definition:
JOIN(v) =  ⟦w⟧
wsucc(v)
w1
w2
wk

10
Liveness constraints
• For the exit node: vars(E) = variables occurring in E
⟦exit⟧ = 
• For conditions and output:
⟦ if (E) ⟧ = ⟦ output E ⟧ = JOIN(v)  vars(E)
• For assignments:
⟦ x = E ⟧ = JOIN(v) \ {x}  vars(E)
• For variable declarations:
⟦ var x1, ..., xn ⟧ = JOIN(v) \ {x1, ..., xn}
• For all other nodes:
right-hand sides are monotone
⟦v⟧ = JOIN(v) since JOIN is monotone, and …
11
Generated constraints
⟦var x,y,z⟧ = ⟦x=input⟧ \ {x,y,z}
⟦x=input⟧ = ⟦x>1⟧ \ {x}
⟦x>1⟧ = (⟦y=x/2⟧  ⟦output x⟧)  {x}
⟦y=x/2⟧ = (⟦y>3⟧ \ {y})  {x}
⟦y>3⟧ = ⟦x=x-y⟧  ⟦z=x-4⟧  {y}
⟦x=x-y⟧ = (⟦z=x-4⟧ \ {x})  {x,y}
⟦z=x-4⟧ = (⟦z>0⟧ \ {z})  {x}
⟦z>0⟧ = ⟦x=x/2⟧  ⟦z=z-1⟧  {z}
⟦x=x/2⟧ = (⟦z=z-1⟧ \ {x})  {x}
⟦z=z-1⟧ = (⟦x>1⟧ \ {z})  {z}
⟦output x⟧ = ⟦exit⟧  {x}
⟦exit⟧ = 
12
Least solution
⟦entry⟧ = 
⟦var x,y,z⟧ = 
⟦z>0⟧ = {x,z}
⟦x=input⟧ = 
⟦x=x/2⟧ = {x,z}
⟦x>1⟧ = {x}
⟦z=z-1⟧ = {x,z}
⟦y=x/2⟧ = {x}
⟦output x⟧ = {x}
⟦y>3⟧ = {x,y}
⟦exit⟧ = 
⟦x=x-y⟧ = {x,y}
⟦z=x-4⟧ = {x}

Many non-trivial answers!

13
Optimizations
• Variables y and z are never simultaneously live
 they can share the same variable location
• The value assigned in z=z-1 is never read
 the assignment can be skipped
var x,yz;
x = input;
while (x>1) {
yz = x/2; • better register allocation
if (yz>3) x = x-yz;
• a few clock cycles saved
yz = x-4;
if (yz>0) x = x/2;
}
output x;
14
Time complexity
(for the naive algorithm)
• With n CFG nodes and k variables:
– the lattice Ln has height kn
– so there are at most kn iterations
• Subsets of Vars (the variables in the program)
can be represented as bitvectors:
– each element has size k
– each , \, = operation takes time O(k)
• Each iteration uses O(n) bitvector operations:
– so each iteration takes time O(kn)
• Total time complexity: O(k2n2)

• Exercise: what is the complexity for the worklist algorithm?


15
Agenda

• Constant propagation analysis


• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis

16
Available expressions analysis
• A (nontrivial) expression is available at a program
point if its current value has already been computed
earlier in the execution

• The approximation generally includes too few


expressions
– the analysis can only report “available” if the expression
is definitely available
– no need to re-compute available expressions
(e.g. common subexpression elimination)

17
A lattice for available expressions
A reverse powerset lattice of nontrivial expressions

var x,y,z,a,b;
z = a+b; L = (P({a+b, a*b, y>a+b, a+1}), )
y = a*b;
while (y > a+b) {
a = a+1;
x = a+b;
}

18
Reverse powerset lattice
the trivial answer

{a+b} {a*b} {y>a+b} {a+1}

{a+b, a*b} {a+b, y>a+b} {a+b, a+1} {a*b, y>a+b} {a*b, a+1} {y>a+b, a+1}

{a+b, a*b, y>a+b} {a+b, a*b, a+1} {a+b, y>a+b, a+1} {a*b, y>a+b, a+1}

{a+b, a*b, y>a+b, a+1}

19
The control flow graph

var x,y,z,a,b

z=a+b

y=a*b

y>a+b

a=a+1

x=a+b

20
Setting up
• For every CFG node, v, we have a variable ⟦v⟧:
– the set of expressions that are available
at the program point after v

• Since the analysis is conservative, the computed sets


may be too small
w2
wk
w1
• Auxiliary definition:
JOIN(v) =  ⟦w⟧
wpred(v)
v

21
Auxiliary functions
• The function Xx removes all expressions from X
that contain a reference to the variable x

• The function exps(E) is defined as:


– exps(intconst) = 
– exps(x) = 
– exps(input) = 
– exps(E1 op E2) = {E1 op E2}  exps(E1)  exps(E2)
but don’t include expressions containing input

22
Availability constraints
• For the entry node:
⟦entry⟧ = 
• For conditions and output:
⟦ if (E) ⟧ = ⟦ output E ⟧ = JOIN(v)  exps(E)
• For assignments:
⟦ x = E ⟧ = (JOIN(v)  exps(E))x
• For any other node v:
⟦v⟧ = JOIN(v)

23
Generated constraints

⟦entry⟧ = 
⟦var x,y,z,a,b⟧ = ⟦entry⟧
⟦z=a+b⟧ = exps(a+b)z
⟦y=a*b⟧ = (⟦z=a+b⟧  exps(a*b))y
⟦y>a+b⟧ = (⟦y=a*b⟧  ⟦x=a+b⟧)  exps(y>a+b)
⟦a=a+1⟧ = (⟦y>a+b⟧  exps(a+1))a
⟦x=a+b⟧ = (⟦a=a+1⟧  exps(a+b))x
⟦exit⟧ = ⟦y>a+b⟧

24
Least solution
⟦entry⟧ = 
⟦var x,y,z,a,b⟧ = 
⟦z=a+b⟧ = {a+b}
⟦y=a*b⟧ = {a+b, a*b}
⟦y>a+b⟧ = {a+b, y>a+b}
⟦a=a+1⟧ = 
⟦x=a+b⟧ = {a+b}
⟦exit⟧ = {a+b}

Again, many nontrivial answers!

25
Optimizations
• We notice that a+b is available before the loop
• The program can be optimized (slightly):
var x,y,x,a,b,aplusb;
aplusb = a+b;
z = aplusb;
y = a*b;
while (y > aplusb) {
a = a+1;
aplusb = a+b;
x = aplusb;
}

26
Agenda

• Constant propagation analysis


• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis

27
Very busy expressions analysis
• A (nontrivial) expression is very busy if it will definitely
be evaluated before its value changes

• The approximation generally includes too few


expressions
– the answer “very busy” must be the true one
– very busy expressions may be pre-computed
(e.g. loop hoisting)

• Same lattice as for available expressions

28
An example program

var x,a,b;
x = input;
a = x-1;
b = x-2;
while (x > 0) {
output a*b-x;
x = x-1;
}
output a*b;

The analysis shows that a*b is very busy

29
Code hoisting
var x,a,b; var x,a,b,atimesb;
x = input; x = input;
a = x-1; a = x-1;
b = x-2; b = x-2;
while (x > 0) { atimesb = a*b;
output a*b-x; while (x > 0) {
x = x-1; output atimesb-x;
} x = x-1;
output a*b; }
output atimesb;

30
Setting up
• For every CFG node, v, we have a variable ⟦v⟧:
– the set of expressions that are very busy
at the program point before v

• Since the analysis is conservative, the computed sets


may be too small
v

• Auxiliary definition:

JOIN(v) =  ⟦w⟧
wsucc(v)
w1
w2
wk

31
Very busy constraints
• For the exit node:
⟦exit⟧ = 
• For conditions and output:
⟦ if (E) ⟧ = ⟦ output E ⟧ = JOIN(v)  exps(E)
• For assignments:
⟦ x = E ⟧ = JOIN(v)x  exps(E)
• For all other nodes:
⟦v⟧ = JOIN(v)

32
Agenda

• Constant propagation analysis


• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis

33
Reaching definitions analysis
• The reaching definitions for a program point are
those assignments that may define the current
values of variables

• The conservative approximation may include too


many possible assignments

34
A lattice for reaching definitions
The powerset lattice of assignments
L = (P({x=input, y=x/2, x=x-y, z=x-4, x=x/2, z=z-1}),)
var x,y,z;
x = input;
while (x > 1) {
y = x/2;
if (y>3) x = x-y;
z = x-4;
if (z>0) x = x/2;
z = z-1;
}
output x;

35
Reaching definitions constraints
• For assignments:
⟦ x = E ⟧ = JOIN(v)x  { x = E }
• For all other nodes:
⟦v⟧ = JOIN(v) w2
wk
w1
• Auxiliary definition:
JOIN(v) =  ⟦w⟧
wpred(v)
v

• The function Xx removes assignments to x


from the set X
36
Def-use graph
Reaching definitions define the def-use graph:
– like a CFG but with edges from def to use nodes
– basis for dead code elimination and code motion

x=input
x=x-y z=z-1
x>1

y=x/2 z=x-4 output x

y>3 z>0 x=x/2

37
Forward vs. backward
• A forward analysis:
– computes information about the past behavior
– examples: available expressions, reaching definitions

• A backward analysis:
– computes information about the future behavior
– examples: liveness, very busy expressions

38
May vs. must
• A may analysis:
– describes information that is possibly true
– an over-approximation
– examples: liveness, reaching definitions

• A must analysis:
– describes information that is definitely true
– an under-approximation
– examples: available expressions, very busy expressions

39
Classifying analyses
forward backward
example: reaching definitions example: liveness

⟦v⟧ describes state after v ⟦v⟧ describes state before v


may
JOIN(v) = ⨆⟦w⟧ = ⟦w⟧ JOIN(v) = ⨆⟦w⟧ = ⟦w⟧
wpred(v) wpred(v) wsucc(v) wsucc(v)

example: available expressions example: very busy expressions

⟦v⟧ describes state after v ⟦v⟧ describes state before v


must
JOIN(v) = ⨆⟦w⟧ = ⟦w⟧ JOIN(v) = ⨆⟦w⟧ = ⟦w⟧
wsucc(v) wsucc(v)
wpred(v) wpred(v)

40
Agenda

• Constant propagation analysis


• Live variables analysis
• Available expressions analysis
• Very busy expressions analysis
• Reaching definitions analysis
• Initialized variables analysis

41
Initialized variables analysis
• Compute for each program point those variables
that have definitely been initialized in the past
• (Called definite assignment analysis in Java and C#)
•  forward must analysis
• Reverse powerset lattice of all variables
JOIN(v) = ⟦w⟧
wpred(v)

• For assignments: ⟦ x = E ⟧ = JOIN(v)  {x}


• For all others: ⟦v⟧ = JOIN(v)
42

You might also like