Class Data Flow Analysis
October 5, 2004
Compiler Structure
Source code → Abstract Syntax Tree → Control Flow Graph → Object code
x := a + b;
y := a * b
[Figure: AST for the example program; node labels include while, +, >, block, a, b]
ASTs
Disadvantages of ASTs
ASTs have many similar forms
e.g., for, while, repeat...until, etc.
e.g., if, ?:, switch
Expressions in an AST may be complex and nested
(42 * y) + (z > 5 ? 12 * z : z + 20)
Want a simpler representation for analysis
at least for dataflow analysis.
Variations on CFGs
Usually don't include declarations (e.g., int x;).
May want a unique entry and exit point.
May group statements into basic blocks.
A basic block is a sequence of instructions with no
branches into or out of the middle of the block: control
enters only at the first instruction and leaves only at the last.
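The grouping into basic blocks can be sketched with the usual "leader" rule: the first instruction, every branch target, and every instruction following a branch each start a new block. The instruction encoding below (a text label plus a list of jump-target indices) is a hypothetical one chosen for illustration, not something the slides define.

```python
def basic_blocks(instrs):
    """Split a list of instructions into basic blocks.

    Each instruction is an assumed (text, jump_targets) pair, where
    jump_targets lists the indices the instruction may branch to
    (an empty list means plain fall-through).
    Returns a list of blocks, each a list of instruction indices.
    """
    leaders = {0} if instrs else set()
    for i, (_, targets) in enumerate(instrs):
        if targets:
            leaders.update(targets)      # a branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)       # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


# Hypothetical linearized form of the running example.
program = [
    ("x = a + b", []),            # 0
    ("y = a * b", []),            # 1
    ("if !(y > a) goto 5", [5]),  # 2
    ("a = a + 1", []),            # 3
    ("goto 2", [2]),              # 4
    ("exit", []),                 # 5
]
blocks = basic_blocks(program)
```

Instructions 0 and 1 end up in one block because nothing branches into or out of the middle of that pair.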
Available Expressions
An expression e = x op y is available at a program
point p if
on every path from the entry node of the graph to node p, e
is computed at least once, and
there are no definitions of x or y since the most recent
occurrence of e on the path
Optimization
If an expression is available, it need not be recomputed
At least, if it is in a register somewhere
Stmt         gen      kill
x = a + b    a + b
y = a * b    a * b
a = a + 1             a + b, a * b, a + 1
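The gen and kill sets for available expressions can be derived mechanically from each assignment. The sketch below assumes statements are given as hypothetical (lhs, rhs) string pairs and that an assignment kills every expression mentioning its left-hand side; an expression that uses the variable being assigned is therefore not generated.

```python
import re


def variables(expr):
    """Identifiers mentioned in an expression string such as 'a + b'."""
    return set(re.findall(r"[A-Za-z_]\w*", expr))


def gen_kill(lhs, rhs, all_exprs):
    """gen/kill sets of the assignment `lhs = rhs` for available expressions."""
    kill = {e for e in all_exprs if lhs in variables(e)}
    gen = {rhs} - kill  # e.g. a = a + 1 does not leave a + 1 available
    return gen, kill


program = [("x", "a + b"), ("y", "a * b"), ("a", "a + 1")]
all_exprs = {rhs for _, rhs in program}
table = {f"{lhs} = {rhs}": gen_kill(lhs, rhs, all_exprs) for lhs, rhs in program}
```

For `a = a + 1` this yields an empty gen set and a kill set containing all three expressions, since each of them mentions `a`.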
[Figure: control flow graph of the example annotated with available-expression sets such as {a + b, a * b} and {a + b}]
Terminology
A join point is a program point where two branches
meet
Available expressions is a forward, must problem
Forward = data flows from In to Out
Must = at a join point, the property must hold on all
paths that are joined:
$\mathrm{In}(s) = \bigcap_{s' \in \mathrm{pred}(s)} \mathrm{Out}(s')$
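A forward, must analysis of this kind can be sketched as a simple round-robin iteration. The control flow graph below is an assumption made for illustration (a loop whose body is `a = a + 1; x = a + b`); the node names `s1` to `s5` and the function name are hypothetical. Out sets start at the full set of expressions so that the intersection at the loop header is not prematurely emptied.

```python
def available_expressions(nodes, preds, gen, kill, entry):
    """Iterate In(s) = intersection of Out(pred), Out(s) = gen | (In - kill)."""
    all_exprs = set().union(*gen.values(), *kill.values())
    out = {n: set(all_exprs) for n in nodes}  # optimistic start for a must problem
    inn = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                inn[n] = set()  # nothing is available on entry
            else:
                inn[n] = set.intersection(*(out[p] for p in preds[n]))
            new_out = gen[n] | (inn[n] - kill[n])
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return inn, out


# Assumed CFG: s1: x = a + b, s2: y = a * b, s3: y > a (loop test),
# s4: a = a + 1, s5: x = a + b, with a back edge s5 -> s3.
nodes = ["s1", "s2", "s3", "s4", "s5"]
preds = {"s1": [], "s2": ["s1"], "s3": ["s2", "s5"], "s4": ["s3"], "s5": ["s4"]}
gen = {"s1": {"a + b"}, "s2": {"a * b"}, "s3": set(), "s4": set(), "s5": {"a + b"}}
kill = {"s1": set(), "s2": set(), "s3": set(),
        "s4": {"a + b", "a * b", "a + 1"}, "s5": set()}
inn, out = available_expressions(nodes, preds, gen, kill, entry="s1")
```

At the join point `s3`, only `a + b` survives: `a * b` is available along the edge from `s2` but not along the back edge from `s5`.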
Liveness Analysis
A variable v is live at a program point p if
v will be used on some execution path
originating from p before v is overwritten
Optimization
If a variable is not live, no need to keep it
in a register
If a variable is dead at assignment, can
eliminate assignment.
$\mathrm{Out}(s) = \bigcup_{s' \in \mathrm{succ}(s)} \mathrm{In}(s')$
Stmt         gen      kill
x = a + b    a, b     x
y = a * b    a, b     y
y > a        a, y
a = a + 1    a        a
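Liveness can be computed with the same iterative scheme run backward: Out is the union of the successors' In sets, and In(s) = gen(s) ∪ (Out(s) − kill(s)). The sketch below assumes a particular control flow graph for the running example (loop body `a = a + 1; x = a + b`) and models "x is used after the loop" with a hypothetical `exit` node whose gen set is {x}.

```python
def liveness(nodes, succs, gen, kill):
    """Backward may analysis: returns (In, Out) live-variable sets per node."""
    inn = {n: set() for n in nodes}
    out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in reversed(nodes):  # visiting in reverse order speeds convergence
            new_out = set().union(*(inn[s] for s in succs[n]))
            new_in = gen[n] | (new_out - kill[n])
            if new_in != inn[n] or new_out != out[n]:
                inn[n], out[n] = new_in, new_out
                changed = True
    return inn, out


# Assumed CFG: s1: x = a + b, s2: y = a * b, s3: y > a (loop test),
# s4: a = a + 1, s5: x = a + b, exit: uses x.
nodes = ["s1", "s2", "s3", "s4", "s5", "exit"]
succs = {"s1": ["s2"], "s2": ["s3"], "s3": ["s4", "exit"],
         "s4": ["s5"], "s5": ["s3"], "exit": []}
gen = {"s1": {"a", "b"}, "s2": {"a", "b"}, "s3": {"a", "y"},
       "s4": {"a"}, "s5": {"a", "b"}, "exit": {"x"}}
kill = {"s1": {"x"}, "s2": {"y"}, "s3": set(),
        "s4": {"a"}, "s5": {"x"}, "exit": set()}
inn, out = liveness(nodes, succs, gen, kill)
```

Under these assumptions `x` is not live on entry to `s1`, because `s1` overwrites it before any use.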
[Figure: successive iterations of liveness analysis on the example control flow graph. The live sets start at {x} and grow through {x, y, a} to a fixed point with sets such as {x, a, b}, {y, a, b}, and {x, y, a, b}.]
Code Hoisting
Code hoisting finds expressions that are always
evaluated following some point in a program,
regardless of the execution path, and moves them to
the latest point beyond which they would always be
evaluated.
It is a transformation that almost always reduces
the space occupied by the program, but that may affect
its execution time positively or not at all.
Reaching Definitions
A definition of a variable v is an assignment to v
A definition of variable v reaches point p if
there is a path from the definition to p with no intervening assignment to v
Also called def-use information
What kind of problem?
Forward or backward? Forward
May or must? May
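As a forward, may problem, reaching definitions uses union at join points: a definition reaches a node if it arrives along at least one path. The sketch below identifies each definition with the name of the node that makes it; the CFG and the names `s1` to `s5` are assumptions for illustration.

```python
def reaching_definitions(defs, preds):
    """defs maps each node to the variable it assigns (None if it assigns nothing).

    Returns (In, Out), where each set contains the names of defining nodes.
    """
    nodes = list(defs)
    inn = {n: set() for n in nodes}
    out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            inn[n] = set().union(*(out[p] for p in preds[n]))  # may: union
            var = defs[n]
            survivors = {d for d in inn[n] if defs[d] != var}  # kill same-variable defs
            new_out = survivors | ({n} if var is not None else set())
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return inn, out


# Assumed CFG: s1: x = a + b, s2: y = a * b, s3: y > a (loop test),
# s4: a = a + 1, s5: x = a + b, with a back edge s5 -> s3.
defs = {"s1": "x", "s2": "y", "s3": None, "s4": "a", "s5": "x"}
preds = {"s1": [], "s2": ["s1"], "s3": ["s2", "s5"], "s4": ["s3"], "s5": ["s4"]}
inn, out = reaching_definitions(defs, preds)
```

Both definitions of `x` reach the loop test `s3`: the one at `s1` on the first entry and the one at `s5` around the back edge.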
            May                     Must
Forward     Reaching definitions    Available expressions
Backward    Live variables          Very busy expressions
[Figure: lattice diagram with top and bottom elements labeled]
Partial Orders
A partial order is a pair $(P, \leq)$ such that
$\leq \;\subseteq P \times P$
$\leq$ is reflexive: $x \leq x$
$\leq$ is anti-symmetric: $x \leq y$ and $y \leq x$ implies $x = y$
$\leq$ is transitive: $x \leq y$ and $y \leq z$ implies $x \leq z$
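For a finite set, the three axioms can be checked by brute force. The helper below is a hypothetical illustration; it takes the elements and an ordering predicate and tests reflexivity, anti-symmetry, and transitivity directly.

```python
from itertools import product


def is_partial_order(elems, leq):
    """Check the partial-order axioms for `leq` on a finite list of elements."""
    reflexive = all(leq(x, x) for x in elems)
    antisymmetric = all(
        x == y or not (leq(x, y) and leq(y, x))
        for x, y in product(elems, repeat=2)
    )
    transitive = all(
        leq(x, z) or not (leq(x, y) and leq(y, z))
        for x, y, z in product(elems, repeat=3)
    )
    return reflexive and antisymmetric and transitive


# Subsets of {a, b} ordered by inclusion form a partial order.
subsets = [frozenset(s) for s in ([], ["a"], ["b"], ["a", "b"])]
subset_ok = is_partial_order(subsets, lambda x, y: x <= y)

# Strict less-than on integers is not reflexive, so it is not a partial order.
strict_ok = is_partial_order([1, 2, 3], lambda x, y: x < y)
```

Note that {a} and {b} are incomparable under inclusion, which is allowed: a partial order need not relate every pair.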
Lattices
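A partial order is a lattice if $\sqcap$ and $\sqcup$ are defined so that
$\sqcap$ is the meet (greatest lower bound): $x \sqcap y \leq x$ and $x \sqcap y \leq y$, and if $z \leq x$ and $z \leq y$ then $z \leq x \sqcap y$
$\sqcup$ is the join (least upper bound): $x \leq x \sqcup y$ and $y \leq x \sqcup y$, and if $x \leq z$ and $y \leq z$ then $x \sqcup y \leq z$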
Lattices (cont.)
A finite partial order is a lattice if meet and join exist for
every pair of elements
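A finite lattice has unique elements $\bot$ (bottom) and $\top$ (top) such that
$x \sqcap \bot = \bot$
$x \sqcap \top = x$
$x \sqcup \bot = x$
$x \sqcup \top = \top$
In a lattice
$x \leq y$ iff $x \sqcap y = x$
$x \leq y$ iff $x \sqcup y = y$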
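These identities can be checked concretely on a powerset ordered by inclusion, where meet is intersection, join is union, bottom is the empty set, and top is the whole universe. The universe of three expressions below is chosen only for illustration.

```python
from itertools import combinations


def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


universe = frozenset({"a + b", "a * b", "a + 1"})
elems = powerset(universe)
top, bottom = universe, frozenset()


def meet(x, y):
    return x & y  # greatest lower bound under subset ordering


def join(x, y):
    return x | y  # least upper bound under subset ordering


# Identities for top and bottom.
bounds_ok = all(
    meet(x, bottom) == bottom and meet(x, top) == x
    and join(x, bottom) == x and join(x, top) == top
    for x in elems
)

# x <= y iff x meet y = x, and x <= y iff x join y = y.
order_ok = all(
    (x <= y) == (meet(x, y) == x) and (x <= y) == (join(x, y) == y)
    for x in elems for y in elems
)
```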
Useful Lattices
$\mathrm{In}(s) = \bigsqcap_{s' \in \mathrm{pred}(s)} \mathrm{Out}(s')$
Monotonicity
A function f on a partial order is monotonic if
$x \leq y$ implies $f(x) \leq f(y)$
Easy to check that operations to compute In and
Out are monotonic
$\mathrm{In}(s) = \bigsqcap_{s' \in \mathrm{pred}(s)} \mathrm{Out}(s')$
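$\mathrm{Out}(s) = f_s\bigl(\bigsqcap_{s' \in \mathrm{pred}(s)} \mathrm{Out}(s')\bigr)$, where $f_s$ is the transfer function of $s$, e.g. $f_s(X) = \mathrm{gen}(s) \cup (X - \mathrm{kill}(s))$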
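Monotonicity of a gen/kill style transfer function can be checked exhaustively on a small powerset lattice. The sketch below assumes the common form f(X) = gen ∪ (X − kill) with arbitrarily chosen gen and kill sets, and contrasts it with set complement, which is not monotonic.

```python
from itertools import combinations


def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def make_transfer(gen, kill):
    """Assumed transfer function shape: f(X) = gen | (X - kill)."""
    return lambda x: gen | (x - kill)


def is_monotonic(f, elems):
    """x <= y implies f(x) <= f(y), checked for every comparable pair."""
    return all(f(x) <= f(y) for x in elems for y in elems if x <= y)


universe = frozenset({"a + b", "a * b", "a + 1"})
elems = powerset(universe)

f = make_transfer(gen=frozenset({"a + b"}), kill=frozenset({"a * b"}))
transfer_is_monotonic = is_monotonic(f, elems)

# Counterexample: complement reverses the ordering.
complement_is_monotonic = is_monotonic(lambda x: universe - x, elems)
```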
Termination
We know the algorithm terminates because
The lattice has finite height
The operations to compute In and Out are
monotonic
On every iteration we remove a statement
from the worklist and/or move down the
lattice.
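The termination argument can be seen in a minimal worklist sketch for a forward analysis. Every step pops one node; a node's successors are pushed back only when its Out value changes, and with monotonic operations starting from top each change moves strictly down a lattice of finite height. The function name, parameters, and the example CFG are assumptions for illustration.

```python
from collections import deque


def worklist_forward(nodes, preds, succs, transfer, meet, top, entry, entry_value):
    """Generic forward worklist solver; returns (Out, number of steps taken)."""
    out = {n: top for n in nodes}
    work = deque(nodes)
    steps = 0
    while work:
        n = work.popleft()  # each iteration removes one node from the worklist
        steps += 1
        if n == entry:
            in_n = entry_value
        else:
            in_n = top
            for p in preds[n]:
                in_n = meet(in_n, out[p])
        new_out = transfer(n, in_n)
        if new_out != out[n]:  # Out(n) moved down the lattice
            out[n] = new_out
            work.extend(s for s in succs[n] if s not in work)
    return out, steps


# Available expressions on an assumed CFG for the running example.
E = frozenset({"a + b", "a * b", "a + 1"})
nodes = ["s1", "s2", "s3", "s4", "s5"]
preds = {"s1": [], "s2": ["s1"], "s3": ["s2", "s5"], "s4": ["s3"], "s5": ["s4"]}
succs = {"s1": ["s2"], "s2": ["s3"], "s3": ["s4"], "s4": ["s5"], "s5": ["s3"]}
gen = {"s1": frozenset({"a + b"}), "s2": frozenset({"a * b"}),
       "s3": frozenset(), "s4": frozenset(), "s5": frozenset({"a + b"})}
kill = {"s1": frozenset(), "s2": frozenset(), "s3": frozenset(),
        "s4": E, "s5": frozenset()}

out, steps = worklist_forward(
    nodes, preds, succs,
    transfer=lambda n, x: gen[n] | (x - kill[n]),
    meet=lambda x, y: x & y,
    top=E, entry="s1", entry_value=frozenset(),
)
```

Here the loop header `s3` is processed twice: once with the optimistic value from the back edge and once after `s5` lowers its Out set.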