
Dataflow Handout

data flow analysis

Uploaded by

Venkatesh Kumar
Copyright
© Attribution Non-Commercial (BY-NC)

Data-Flow Analysis

Compiler Design

CSE 504

1. Preliminaries
2. Live Variables
3. Data Flow Equations
4. Other Analyses


Last modified: Mon Feb 18 2013 at 22:24:43 EST. Version: 1.2 03:25:46 2013/02/19. Compiled at 22:29 on 2013/02/18.

Preliminaries

Program Analysis

The compiler needs to understand properties of a program (e.g. the set of variables live at a program point). This information should be computed at compile time, with incomplete information on the values the program computes, and without executing the program itself! This information is likely to be approximate: in general, at compile time, we will not know which sequence of instructions will be executed. Data-Flow Analysis is a standard way to formulate intra-procedural program analysis.


Control Flow Graphs


When we try to deduce properties of a procedure, we first build a control flow graph (CFG). Nodes of a CFG are Basic Blocks. Edges indicate which blocks can follow which other blocks.

A Basic Block is a sequence of instructions such that:
- There are no jumps/branches in the sequence except as the last instruction.
- For all jumps/branches in the program, the target is the first instruction in some basic block. In other words, no jump lands in the middle of a basic block.


Example of CFGs
    B1:     1.  i = 1
    B2:     2.  j = 1
    B3:     3.  t1 = 10 * i
            4.  t2 = t1 + j
            5.  t3 = 4 * t2
            6.  a[t3] = 0
            7.  j = j + 1
            8.  if j < 10 goto (3)
    B4:     9.  i = i + 1
           10.  if i < 10 goto (2)
    B5:    11.  i = 1
    B6:    12.  t4 = 10 * i
           13.  t5 = t4 + i
           14.  a[t5] = 1
           15.  i = i + 1
           16.  if i < 10 goto (12)
    Exit:

Branches only at the end of a block. Branch destinations only at beginning of a block.
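The two conditions on basic blocks suggest a simple way to find block boundaries: a statement starts a block if it is the first statement, the target of a jump, or the statement right after a jump. The following is a minimal sketch of that idea, not the handout's own algorithm; the function name `split_basic_blocks` and the `jumps` encoding (branch statement number mapped to its target) are assumptions made for illustration.

```python
def split_basic_blocks(n_stmts, jumps):
    """Partition statements 1..n_stmts into basic blocks.

    jumps maps the number of each jump/branch statement to its target.
    Returns a list of (first, last) statement-number pairs.
    """
    leaders = {1}                        # the first statement starts a block
    for src, target in jumps.items():
        leaders.add(target)              # a jump target starts a block
        if src + 1 <= n_stmts:
            leaders.add(src + 1)         # so does the statement after a jump
    starts = sorted(leaders)
    ends = [s - 1 for s in starts[1:]] + [n_stmts]
    return list(zip(starts, ends))


# Branches of the 16-statement example: statement -> goto target.
EXAMPLE_JUMPS = {8: 3, 10: 2, 16: 12}
blocks = split_basic_blocks(16, EXAMPLE_JUMPS)
```

On the example program this yields six statement ranges that line up with B1 through B6.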

Live Variables

Live Variables
Consider the problem of finding the set of live variables at some program point. A variable is live after a statement $s$ in the program if it is used in a statement $s'$, and there is a control flow path from $s$ to $s'$ along which the variable is not redefined. Example:

     ...
     1.  i = 1
     2.  j = 1
     3.  t1 = 10 * i
     4.  t2 = t1 + j
     5.  t3 = 4 * t2
     6.  a[t3] = 0
     7.  j = j + 1
     ...

Variable t3 is live after statement 5 since it is used in statement 6. Variable j is also live after statement 5 since it is used in statement 7.


Live Variable Analysis (1)

- Let $def(s)$ be the set of all variables defined by statement $s$ (e.g. the lhs variable in an assignment statement).
- Let $use(s)$ be the set of all variables used by statement $s$ (e.g. the variables on the rhs of an assignment statement).
- $succ(s)$: the set of statements that immediately follow statement $s$.

The above definitions for $def$, $use$, and $succ$ can be extended for whole blocks as well.
- $def(B)$: set of variables defined in block $B$.
- $use(B)$: set of variables used, but not defined earlier, in block $B$.
- $succ(B)$: set of blocks that immediately succeed block $B$.


Live Variable Analysis (2)


    B1:     1.  i = 1
    B2:     2.  j = 1
    B3:     3.  t1 = 10 * i
            4.  t2 = t1 + j
            5.  t3 = 4 * t2
            6.  a[t3] = 0
            7.  j = j + 1
            8.  if j < 10 goto (3)
    B4:     9.  i = i + 1
           10.  if i < 10 goto (2)
    B5:    11.  i = 1
    B6:    12.  t4 = 10 * i
           13.  t5 = t4 + i
           14.  a[t5] = 1
           15.  i = i + 1
           16.  if i < 10 goto (12)
    Exit:

    Block   Succ        Def                Use
    1       {2}         {i}                {}
    2       {3}         {j}                {}
    3       {3, 4}      {t1, t2, t3, j}    {a, i, j}
    4       {2, 5}      {i}                {i}
    5       {6}         {i}                {}
    6       {6, Exit}   {t4, t5, i}        {a, i}
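The block-level sets can be derived from the statement-level ones by scanning a block from top to bottom. The sketch below is one way to do that; the helper name `block_def_use` and the encoding of each statement as a `(defs, uses)` pair are assumptions for illustration. It also assumes that an array store such as `a[t3] = 0` counts as a use of `a`, which is what the Use column of the table suggests.

```python
def block_def_use(stmts):
    """stmts: list of (defs, uses) pairs, one per statement, in program order."""
    block_def, block_use = set(), set()
    for defs, uses in stmts:
        # A use counts only if no earlier statement in the block defined it.
        block_use |= set(uses) - block_def
        block_def |= set(defs)
    return block_def, block_use


# Block B3 of the example (statements 3 to 8).
B3 = [
    ({"t1"}, {"i"}),         # t1 = 10 * i
    ({"t2"}, {"t1", "j"}),   # t2 = t1 + j
    ({"t3"}, {"t2"}),        # t3 = 4 * t2
    (set(), {"a", "t3"}),    # a[t3] = 0  (assumed: a store uses a)
    ({"j"}, {"j"}),          # j = j + 1
    (set(), {"j"}),          # if j < 10 goto (3)
]
b3_def, b3_use = block_def_use(B3)
```

Note that t1, t2 and t3 are used inside B3 but do not appear in its Use set, because each is defined earlier in the same block.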


Live Variable Analysis (3)


- $Out(s)$: the set of variables live just after statement $s$.
- $In(s)$: the set of variables live just before statement $s$.

The above definitions for $Out$ and $In$ can be readily extended for blocks. Observe that:
- If a variable is used by a statement, then it must be live before the statement.
- If a variable is live immediately after a statement, then it must be live before the statement as well, unless it is defined by the statement.
- For a statement $s$, if a variable is live before any of its successors, then it must be live after $s$.

From these observations, we get:

$$In(s) = use(s) \cup (Out(s) \setminus def(s))$$

$$Out(s) = \bigcup_{t \in succ(s)} In(t)$$
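For straight-line code every statement has exactly one successor, so the two equations can be evaluated in a single backward pass. The sketch below does this for statements 3 to 7 of the example. The function name and the choice of `{a, i, j}` as the set live after statement 7 are assumptions made for illustration.

```python
def liveness_straight_line(stmts, live_at_end):
    """stmts: list of (defs, uses) pairs; returns (ins, outs), one set per statement."""
    ins, outs = [None] * len(stmts), [None] * len(stmts)
    out = set(live_at_end)
    for k in range(len(stmts) - 1, -1, -1):
        defs, uses = stmts[k]
        outs[k] = out
        ins[k] = set(uses) | (out - set(defs))   # In = use U (Out - def)
        out = ins[k]                             # Out of the previous statement
    return ins, outs


STMTS = [
    ({"t1"}, {"i"}),         # 3. t1 = 10 * i
    ({"t2"}, {"t1", "j"}),   # 4. t2 = t1 + j
    ({"t3"}, {"t2"}),        # 5. t3 = 4 * t2
    (set(), {"a", "t3"}),    # 6. a[t3] = 0
    ({"j"}, {"j"}),          # 7. j = j + 1
]
ins, outs = liveness_straight_line(STMTS, {"a", "i", "j"})
live_after_5 = outs[2]
```

Under that assumption, `live_after_5` contains both t3 and j, in line with the earlier observation about statement 5.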


Live Variable Analysis (4)

$$In(s) = use(s) \cup (Out(s) \setminus def(s))$$

$$Out(s) = \bigcup_{t \in succ(s)} In(t)$$

Let a be a variable that is needed after the procedure exits (e.g. it is a global variable). Then, $In(Exit) = \{a\}$.

    Block   Succ        Def               Use         In                                                   Out
    1       {2}         {i}               {}          $Out(1) \setminus \{i\}$                             $In(2)$
    2       {3}         {j}               {}          $Out(2) \setminus \{j\}$                             $In(3)$
    3       {3, 4}      {t1, t2, t3, j}   {a, i, j}   $\{a,i,j\} \cup (Out(3) \setminus \{t1,t2,t3,j\})$   $In(3) \cup In(4)$
    4       {2, 5}      {i}               {i}         $\{i\} \cup (Out(4) \setminus \{i\})$                $In(2) \cup In(5)$
    5       {6}         {i}               {}          $Out(5) \setminus \{i\}$                             $In(6)$
    6       {6, Exit}   {t4, t5, i}       {a, i}      $\{a,i\} \cup (Out(6) \setminus \{t4,t5,i\})$        $In(6) \cup In(Exit)$
    Exit    {}          {}                {}          $\{a\}$


Live Variable Analysis (5)


The equations for In and Out form a set of simultaneous set equations. For this analysis, we require the least solution to these equations. Consider the equations relating In(6), Out(6) and In(Exit):

$$In(6) = \{a, i\} \cup (Out(6) \setminus \{t4, t5, i\})$$

$$Out(6) = In(6) \cup In(Exit)$$

$$In(Exit) = \{a\}$$

There are many solutions to these equations:
1. $In(6) = Out(6) = \{a, i\}$, and $In(Exit) = \{a\}$.
2. $In(6) = Out(6) = \{a, i, t3\}$, and $In(Exit) = \{a\}$.
3. ...

Of these, (1) is the least. In fact, it can be shown that every solution will contain (1).

Data Flow Equations

Solutions to Data Flow Equations

Data flow analysis is formulated in terms of finding the least (or sometimes, the greatest) solution to a set of simultaneous equations.
- The flow equations can be written as $X = F(X)$, where $X$ is a vector of Ins and Outs.
- Solutions $X$ such that $X = F(X)$ are fixed points of $F$.
- The smallest $X$ such that $X = F(X)$ is called the least fixed point of $F$.


Partial Orders
Let $U$ be a finite set, and let $D = P(U)$, i.e. the powerset of $U$. Let $D^n = D \times D \times \cdots \times D$ ($n$ times), i.e., an $n$-dimensional cartesian space over $P(U)$. We can define a partial order $\sqsubseteq$ among vectors of sets such that $X \sqsubseteq X'$ if, and only if, for all components of the vector, $X_i \subseteq X'_i$.

- It is easy to verify that $\sqsubseteq$ is a partial order: it is reflexive, transitive and anti-symmetric.
- Let $\bot$ be an $n$-vector of empty sets. Clearly, $\bot \sqsubseteq X$ for all $X \in D^n$.
- Let $\top$ be an $n$-vector of $U$. Observe that $X \sqsubseteq \top$ for all $X \in D^n$.
- $(D^n, \sqsubseteq)$ is a complete lattice with $\bot$ as the least element and $\top$ as the greatest element.
- Vectors $X^{(0)}, X^{(1)}, \ldots, X^{(i)}$ are called a chain if $X^{(0)} \sqsubseteq X^{(1)} \sqsubseteq \cdots \sqsubseteq X^{(i)}$.
- Note all chains in $(D^n, \sqsubseteq)$ are finite, since $U$ is finite.
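The componentwise order on vectors of sets is easy to state in code. The sketch below represents a vector as a tuple of frozensets; the names `leq`, `bottom` and `top` are assumptions chosen for illustration. It also shows that the order is only partial: some pairs of vectors are not comparable in either direction.

```python
def leq(X, Y):
    """Componentwise order: every X[i] is a subset of the matching Y[i]."""
    return len(X) == len(Y) and all(x <= y for x, y in zip(X, Y))


def bottom(n):
    """The n-vector of empty sets (least element)."""
    return tuple(frozenset() for _ in range(n))


def top(n, universe):
    """The n-vector whose components are all the universe U (greatest element)."""
    return tuple(frozenset(universe) for _ in range(n))


U = {"a", "i"}
X1 = (frozenset({"a"}), frozenset())
X2 = (frozenset(), frozenset({"a"}))
comparable = leq(X1, X2) or leq(X2, X1)   # False: neither vector is below the other
```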


Monotone Functions
Let $F : D^n \to D^n$ (i.e. a function from $D^n$ to $D^n$). A function $F$ is monotone over partial order $\sqsubseteq$ if, for every $X$ and $X'$ such that $X \sqsubseteq X'$, we have $F(X) \sqsubseteq F(X')$.
- Note the definition of monotonicity. It says the function returns smaller values if it is given smaller argument values. It is not necessary that the returned values must be smaller than the argument values!

It is easy to see that the flow equations for live variable analysis define a monotone function. There is a simple way to show the existence of fixed points, and to compute the Least/Greatest Fixed Points of a monotone function.

Tarski-Knaster Theorem: Given a complete lattice $L$ and a monotone function $G : L \to L$, the fixed points of $G$ form a complete lattice. Consequently, there exist both least and greatest fixed points.
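The monotonicity claim for live variables can be spelled out in a few lines. The following is a sketch of the argument, where the primed sets stand for a second assignment of Ins and Outs that is componentwise at least as large as the unprimed one; the primed notation is an assumption of this note.

```latex
\begin{align*}
Out(s) \subseteq Out'(s)
  &\;\Rightarrow\; Out(s) \setminus def(s) \subseteq Out'(s) \setminus def(s) \\
  &\;\Rightarrow\; use(s) \cup (Out(s) \setminus def(s))
     \subseteq use(s) \cup (Out'(s) \setminus def(s)), \\
In(t) \subseteq In'(t) \text{ for all } t \in succ(s)
  &\;\Rightarrow\; \bigcup_{t \in succ(s)} In(t) \subseteq \bigcup_{t \in succ(s)} In'(t).
\end{align*}
```

Each right-hand side of the flow equations is built only from union and from removing a fixed set, both of which preserve inclusion, so larger inputs never give smaller outputs.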


Computing Least Fixed Point (1)


Kleene's Fixed Point Theorem: Construct a sequence $X^{(0)}, X^{(1)}, \ldots, X^{(i)}, \ldots$, where $X^{(0)} = \bot$ and $X^{(i+1)} = F(X^{(i)})$.

This sequence forms a chain.
- $X^{(0)} = \bot \sqsubseteq X^{(1)}$.
- If $X^{(i)} \sqsubseteq X^{(i+1)}$, then $X^{(i+1)} \sqsubseteq X^{(i+2)}$:
  - $X^{(i+1)} = F(X^{(i)})$.
  - Since $X^{(i)} \sqsubseteq X^{(i+1)}$, by monotonicity of $F$, $F(X^{(i)}) \sqsubseteq F(X^{(i+1)})$.
  - $X^{(i+2)} = F(X^{(i+1)})$.

Since all chains over $\sqsubseteq$ are finite, consider the last element of the chain, $X^{(n)}$.
- $X^{(n)} = F(X^{(n)})$, otherwise it is not the last element.

So, $X^{(n)}$ is a fixed point of $F$.
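The construction above translates directly into a loop that applies F until nothing changes. The sketch below is a generic version, followed by a small instance built from the three equations relating In(6), Out(6) and In(Exit). The names `least_fixed_point` and `F` are assumptions for illustration, and termination relies on F being monotone over a finite lattice.

```python
def least_fixed_point(F, bottom):
    """Iterate X(0) = bottom, X(i+1) = F(X(i)) until X(i+1) == X(i).

    Returns the last element and the whole chain that was built.
    """
    chain = [bottom]
    while True:
        nxt = F(chain[-1])
        if nxt == chain[-1]:
            return chain[-1], chain
        chain.append(nxt)


def F(X):
    """One step of the equations for the vector (In(6), Out(6), In(Exit))."""
    in6, out6, in_exit = X
    return ({"a", "i"} | (out6 - {"t4", "t5", "i"}),   # In(6)
            in6 | in_exit,                             # Out(6)
            {"a"})                                     # In(Exit)


lfp, chain = least_fixed_point(F, (set(), set(), set()))
```

Starting from three empty sets, the chain has three elements and ends at In(6) = Out(6) = {a, i}, In(Exit) = {a}.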


Computing Least Fixed Point (2)


Consider the sequence $X^{(0)}, X^{(1)}, \ldots, X^{(i)}, \ldots, X^{(n)}$, where $X^{(0)} = \bot$ and $X^{(i+1)} = F(X^{(i)})$.

$X^{(n)}$ is the least fixed point of $F$.
- We already know that $X^{(n)}$ is a fixed point of $F$.
- Let $Y$ be any fixed point of $F$.
- Clearly, $X^{(0)} = \bot \sqsubseteq Y$.
- If $X^{(i)} \sqsubseteq Y$, since $F$ is monotone, $X^{(i+1)} = F(X^{(i)}) \sqsubseteq F(Y) = Y$ (since $Y$ is a fixed point).
- Hence, by induction, $X^{(i)} \sqsubseteq Y$ for all elements of the chain.
- In particular, $X^{(n)} \sqsubseteq Y$: $X^{(n)}$ is at least as small as any fixed point $Y$ of $F$, and hence is the least fixed point.


Computing the Greatest Fixed Point

Consider the sequence $X^{(0)}, X^{(1)}, \ldots, X^{(i)}, \ldots, X^{(n)}$, where $X^{(0)} = \top$ and $X^{(i+1)} = F(X^{(i)})$.

Note the starting point of this sequence: the greatest element in the lattice. By an argument similar to the one we used for the least fixed point, $X^{(n)}$ can be shown to be the greatest fixed point of $F$.
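The same loop, started from the greatest element instead of the least, walks down to the greatest fixed point. The sketch below reuses the equations relating In(6), Out(6) and In(Exit). The universe `U = {a, i, t3, t4, t5}` is an assumption chosen to keep the example small, as are the function names.

```python
def greatest_fixed_point(F, top):
    """Iterate X(0) = top, X(i+1) = F(X(i)) until the sequence stops changing."""
    x = top
    while True:
        nxt = F(x)
        if nxt == x:
            return x
        x = nxt


U = frozenset({"a", "i", "t3", "t4", "t5"})   # assumed universe of variables


def F(X):
    """One step of the equations for the vector (In(6), Out(6), In(Exit))."""
    in6, out6, in_exit = X
    return (frozenset({"a", "i"}) | (out6 - {"t4", "t5", "i"}),   # In(6)
            in6 | in_exit,                                        # Out(6)
            frozenset({"a"}))                                     # In(Exit)


gfp = greatest_fixed_point(F, (U, U, U))
```

With this universe the result is In(6) = Out(6) = {a, i, t3}, In(Exit) = {a}. It satisfies the equations but contains t3, which is not actually needed, which illustrates why live variable analysis asks for the least solution.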


Live Variable Analysis Revisited


    Set        Eqn                                                  0     1          2          3
    In(1)      $Out(1) \setminus \{i\}$                             {}    {a}        {a}        {a}
    Out(1)     $In(2)$                                              {}    {a,i}      {a,i}      {a,i}
    In(2)      $Out(2) \setminus \{j\}$                             {}    {a,i}      {a,i}      {a,i}
    Out(2)     $In(3)$                                              {}    {a,i,j}    {a,i,j}    {a,i,j}
    In(3)      $\{a,i,j\} \cup (Out(3) \setminus \{t1,t2,t3,j\})$   {}    {a,i,j}    {a,i,j}    {a,i,j}
    Out(3)     $In(3) \cup In(4)$                                   {}    {a,i}      {a,i,j}    {a,i,j}
    In(4)      $\{i\} \cup (Out(4) \setminus \{i\})$                {}    {a,i}      {a,i}      {a,i}
    Out(4)     $In(2) \cup In(5)$                                   {}    {a}        {a,i}      {a,i}
    In(5)      $Out(5) \setminus \{i\}$                             {}    {a}        {a}        {a}
    Out(5)     $In(6)$                                              {}    {a,i}      {a,i}      {a,i}
    In(6)      $\{a,i\} \cup (Out(6) \setminus \{t4,t5,i\})$        {}    {a,i}      {a,i}      {a,i}
    Out(6)     $In(6) \cup In(Exit)$                                {}    {a}        {a,i}      {a,i}
    In(Exit)   $\{a\}$                                              {}    {a}        {a}        {a}
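The iteration in the table can be reproduced with a short program. The sketch below is one possible implementation, not necessarily the procedure used to produce the table: it visits blocks from Exit back to block 1 and reuses sets already updated in the current pass, which appears to give the same columns 1 to 3. The function name, the visiting order and the treatment of Exit as a boundary condition are assumptions.

```python
SUCC = {1: [2], 2: [3], 3: [3, 4], 4: [2, 5], 5: [6], 6: [6, "Exit"], "Exit": []}
DEF = {1: {"i"}, 2: {"j"}, 3: {"t1", "t2", "t3", "j"}, 4: {"i"}, 5: {"i"},
       6: {"t4", "t5", "i"}, "Exit": set()}
USE = {1: set(), 2: set(), 3: {"a", "i", "j"}, 4: {"i"}, 5: set(),
       6: {"a", "i"}, "Exit": set()}


def live_variables(succ, defs, uses, order, live_at_exit):
    """Iterate the flow equations from empty sets; returns (In, Out, passes)."""
    in_ = {b: set() for b in succ}
    out = {b: set() for b in succ}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for b in order:
            if b == "Exit":
                # Boundary condition: In(Exit) is given, Out(Exit) is empty.
                new_out, new_in = set(), set(live_at_exit)
            else:
                new_out = set().union(*(in_[t] for t in succ[b]))
                new_in = uses[b] | (new_out - defs[b])
            if new_in != in_[b] or new_out != out[b]:
                in_[b], out[b], changed = new_in, new_out, True
    return in_, out, passes


ORDER = ["Exit", 6, 5, 4, 3, 2, 1]   # backwards through the flow graph
live_in, live_out, passes = live_variables(SUCC, DEF, USE, ORDER, {"a"})
```

The run stops after three passes, the last of which changes nothing, just as column 3 of the table repeats column 2.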


Other Analyses

Reaching Definitions
An assignment of the form x = e for some expression e is said to define x. A definition of x at statement $s_1$ reaches another statement $s_2$ if:
- there is some control flow path from $s_1$ to $s_2$, such that
- there is no other definition of x on the path from $s_1$ to $s_2$.

Let $In(s)$ be the set of all definitions that reach $s$. Let $Out(s)$ be the set of all definitions that reach the point immediately after $s$, and hence the immediate successors of $s$. Then

$$Out(s) = gen(s) \cup (In(s) \setminus kill(s))$$

where
- $gen(s)$ is the set of definitions generated by $s$, and
- $kill(s)$ is the set of definitions with the same lhs variables as those in $s$.

$$In(s) = \bigcup_{t \in pred(s)} Out(t)$$
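A small forward analysis shows how these equations behave at a join point. The sketch below names each definition by its statement number. The five-statement program in the comments is hypothetical, as are the function name and the `pred` / `defined_var` encodings.

```python
def reaching_definitions(pred, defined_var):
    """pred: statement -> list of predecessors.
    defined_var: statement -> the variable it defines (absent if none).
    A definition is identified by the number of its statement."""
    stmts = list(pred)
    gen = {s: ({s} if s in defined_var else set()) for s in stmts}
    # kill(s): all definitions of the variable that s defines (empty if none).
    kill = {s: {d for d, v in defined_var.items() if v == defined_var.get(s)}
            for s in stmts}
    in_ = {s: set() for s in stmts}
    out = {s: set() for s in stmts}
    changed = True
    while changed:
        changed = False
        for s in stmts:                       # forwards through the flow graph
            new_in = set().union(*(out[p] for p in pred[s]))
            new_out = gen[s] | (new_in - kill[s])
            if (new_in, new_out) != (in_[s], out[s]):
                in_[s], out[s], changed = new_in, new_out, True
    return in_, out


# Hypothetical program:
#   1. x = 1
#   2. y = 2
#   3. if ... goto (5)
#   4. x = 3
#   5. z = x + y
PRED = {1: [], 2: [1], 3: [2], 4: [3], 5: [3, 4]}
DEFINED = {1: "x", 2: "y", 4: "x", 5: "z"}
rd_in, rd_out = reaching_definitions(PRED, DEFINED)
```

Both definitions of x (statements 1 and 4) reach statement 5, because one path skips statement 4 and the other goes through it.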


Reaching Definitions vs. Live Variables

Live Variables: In and Out are the smallest sets such that

$$In(s) = use(s) \cup (Out(s) \setminus def(s))$$

$$Out(s) = \bigcup_{t \in succ(s)} In(t)$$

Reaching Definitions: In and Out are the smallest sets such that

$$In(s) = \bigcup_{t \in pred(s)} Out(t)$$

$$Out(s) = gen(s) \cup (In(s) \setminus kill(s))$$

The form of the equations is identical, and they can be computed using the same procedure, except:
- Live Variables are best computed backwards through the flow graph (information goes from successors to predecessors).
- Reaching Definitions are best computed forwards through the flow graph (information goes from predecessors to successors).


Available Expressions
An expression e is available at statement $s$ if, on every path that reaches $s$, there is some statement $s'$ where e is evaluated, and no variable used in e is redefined between $s'$ and $s$. Let $In(s)$ be the set of all expressions available immediately before $s$ is evaluated. Let $Out(s)$ be the set of all expressions available immediately after $s$ is evaluated. Then

$$Out(s) = gen(s) \cup (In(s) \setminus kill(s))$$

where
- $gen(s)$ is the set of all expressions evaluated in $s$, and
- $kill(s)$ is the set of all expressions that use the lhs variables defined in $s$.

Because the expression must be available on every incoming path, the sets from the predecessors are combined by intersection:

$$In(s) = \bigcap_{t \in pred(s)} Out(t)$$

In and Out are the greatest sets that satisfy the above equations.
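Available expressions differ from the earlier analyses in two ways: predecessors are combined by intersection, because the expression must be available on every path, and the iteration starts from the largest sets. The sketch below assumes that nothing is available at a statement with no predecessors. The four-statement program, the function name and the `gen` / `kill` tables are hypothetical.

```python
def available_expressions(pred, gen, kill, universe):
    """Forward analysis; iterates downwards from the full set of expressions."""
    stmts = list(pred)
    in_ = {s: (set(universe) if pred[s] else set()) for s in stmts}
    out = {s: set(universe) for s in stmts}
    changed = True
    while changed:
        changed = False
        for s in stmts:
            if pred[s]:
                new_in = set(universe).intersection(*(out[p] for p in pred[s]))
            else:
                new_in = set()               # assumed: nothing available on entry
            new_out = gen[s] | (new_in - kill[s])
            if (new_in, new_out) != (in_[s], out[s]):
                in_[s], out[s], changed = new_in, new_out, True
    return in_, out


# Hypothetical program:
#   1. t = a + b
#   2. if ... goto (4)
#   3. a = 0          (kills a + b)
#   4. u = a + b
PRED = {1: [], 2: [1], 3: [2], 4: [2, 3]}
GEN = {1: {"a+b"}, 2: set(), 3: set(), 4: {"a+b"}}
KILL = {1: set(), 2: set(), 3: {"a+b"}, 4: set()}
av_in, av_out = available_expressions(PRED, GEN, KILL, {"a+b"})

# Variant in which statement 3 no longer assigns to a or b.
NO_KILL = {1: set(), 2: set(), 3: set(), 4: set()}
av_in2, _ = available_expressions(PRED, GEN, NO_KILL, {"a+b"})
```

In the first program a + b is not available before statement 4, since the path through statement 3 kills it; in the variant it is available on both paths.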
