03 Data Flow Analysis Handout

The document outlines an upcoming lecture on intraprocedural data flow analysis. It will cover static analysis techniques to derive information about how data flows through a program during execution. The lecture will define basic concepts like basic blocks, control flow graphs, and data flow frameworks. It will also discuss how to model the flow of data through transfer functions and solve the resulting data flow equations to compute the values at each program point.


CS738: Advanced Compiler Optimizations

Data Flow Analysis

Amey Karkare
[email protected]
http://www.cse.iitk.ac.in/~karkare/cs738
Department of CSE, IIT Kanpur

Agenda

◮ Static analysis and compile-time optimizations
◮ For the next few lectures
  ◮ Intraprocedural Data Flow Analysis
  ◮ Classical Examples
  ◮ Components

Assumptions

◮ Intraprocedural: Restricted to a single function
◮ Input in 3-address format
◮ Unless otherwise specified

3-address Code Format

◮ Assignments
    x = y op z
    x = op y
    x = y
◮ Jump/control transfer
    goto L
    if x relop y goto L
◮ Statements can have label(s)
    L: . . .
◮ Arrays, Pointers and Functions to be added later when needed
Data Flow Analysis

◮ Class of techniques to derive information about flow of data
  ◮ along program execution paths
◮ Used to answer questions such as:
  ◮ whether two identical expressions evaluate to same value
    ◮ used in common subexpression elimination
  ◮ whether the result of an assignment is used later
    ◮ used by dead code elimination

Data Flow Abstraction

◮ Basic Blocks (BB)
  ◮ sequence of 3-address code stmts
  ◮ single entry at the first statement
  ◮ single exit at the last statement
◮ Typically we use “maximal” basic block (maximal sequence of such instructions)

Identifying Basic Blocks

◮ Leader: The first statement of a basic block
  ◮ The first instruction of the program (procedure)
  ◮ Target of a branch (conditional and unconditional goto)
  ◮ Instruction immediately following a branch

Special Basic Blocks

◮ Two special BBs are added to simplify the analysis
  ◮ empty (?) blocks!
◮ Entry: The first block to be executed for the procedure analyzed
◮ Exit: The last block to be executed
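
The three leader rules on the "Identifying Basic Blocks" slide can be tried out on a toy program. The sketch below is only illustrative: the instruction encoding (a tuple of label, text, and branch target) and the sample program are assumptions made for this example, not part of the lecture.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding for this sketch: (label, text, branch_target).
# branch_target is the label a goto / conditional goto jumps to, else None.
Instr = Tuple[Optional[str], str, Optional[str]]


def find_leaders(code: List[Instr]) -> List[int]:
    """Return the sorted indices of all leader instructions."""
    label_to_index = {label: i for i, (label, _, _) in enumerate(code) if label}
    leaders = {0} if code else set()              # rule 1: first instruction
    for i, (_, _, target) in enumerate(code):
        if target is not None:                    # instruction i is a branch
            leaders.add(label_to_index[target])   # rule 2: target of a branch
            if i + 1 < len(code):
                leaders.add(i + 1)                # rule 3: instruction after a branch
    return sorted(leaders)


def basic_blocks(code: List[Instr]) -> List[List[Instr]]:
    """Split the code into maximal basic blocks, one per leader."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code)]
    return [code[bounds[k]:bounds[k + 1]] for k in range(len(leaders))]


# Sample 3-address program (made up for illustration).
code: List[Instr] = [
    (None, "i = 0", None),
    ("L1", "if i >= n goto L2", "L2"),
    (None, "s = s + i", None),
    (None, "i = i + 1", None),
    (None, "goto L1", "L1"),
    ("L2", "r = s", None),
]

leaders = find_leaders(code)
blocks = basic_blocks(code)
```

Each block runs from one leader up to, but not including, the next leader, which is what makes the blocks maximal.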
Data Flow Abstraction

◮ Control Flow Graph (CFG)
◮ A rooted directed graph G = (N, E)
◮ N = set of BBs
  ◮ including Entry, Exit
◮ E = set of edges

CFG Edges

◮ Edge B1 → B2 ∈ E if control can transfer from B1 to B2
  ◮ Fall through
  ◮ Through jump (goto)
◮ Edge from Entry to (all?) real first BB(s)
◮ Edge to Exit from all last BBs
  ◮ BBs containing return
  ◮ Last real BB
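
The edge rules on the "CFG Edges" slide can be sketched as a small function. The block description used here (a dict with `name`, `label`, `kind`, and `target` keys, where `kind` describes the last statement of the block) is a hypothetical encoding chosen for this example.

```python
ENTRY, EXIT = "Entry", "Exit"


def build_cfg_edges(blocks):
    """Return the edge set E for blocks listed in program order.

    Each block is a dict (hypothetical encoding for this sketch):
      name   - block name, e.g. "B1"
      label  - label on its first statement, or None
      kind   - how the block ends: "fall", "goto", "if", or "return"
      target - label jumped to when kind is "goto" or "if", else None
    """
    by_label = {b["label"]: b["name"] for b in blocks if b["label"]}
    edges = set()
    if blocks:
        edges.add((ENTRY, blocks[0]["name"]))       # Entry -> first real BB
    for i, b in enumerate(blocks):
        # Falling off the last real BB leads to Exit.
        following = blocks[i + 1]["name"] if i + 1 < len(blocks) else EXIT
        if b["kind"] in ("goto", "if"):
            edges.add((b["name"], by_label[b["target"]]))  # through jump
        if b["kind"] in ("fall", "if"):
            edges.add((b["name"], following))              # fall through
        if b["kind"] == "return":
            edges.add((b["name"], EXIT))                   # BB containing return
    return edges


# Made-up example: a loop header B2 that exits to B4, with body B3.
blocks = [
    {"name": "B1", "label": None, "kind": "fall", "target": None},
    {"name": "B2", "label": "L1", "kind": "if", "target": "L2"},
    {"name": "B3", "label": None, "kind": "goto", "target": "L1"},
    {"name": "B4", "label": "L2", "kind": "fall", "target": None},
]
edges = build_cfg_edges(blocks)
```

A conditional branch contributes two edges (jump and fall through), while an unconditional goto contributes only the jump edge.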

Data Flow Abstraction: Control Flow Graph

◮ Graph representation of paths that program may exercise during execution
◮ Typically one graph per procedure
◮ Graphs for separate procedures have to be combined/connected for interprocedural analysis
  ◮ Later!
  ◮ Single procedure, single flow graph for now.

Data Flow Abstraction: Program Points

◮ Input state/Output state for Stmt
  ◮ Program point before/after a stmt
  ◮ Denoted IN[s] and OUT[s]
◮ Within a basic block:
  ◮ Program point after a stmt is same as the program point before the next stmt
Data Flow Abstraction: Program Points

◮ Input state/Output state for BBs
  ◮ Program point before/after a bb
  ◮ Denoted IN[B] and OUT[B]
◮ For B1 and B2:
  ◮ if there is an edge from B1 to B2 in CFG, then the program point after the last stmt of B1 may be followed immediately by the program point before the first stmt of B2.

Data Flow Abstraction: Execution Paths

◮ An execution path is of the form
    $p_1, p_2, p_3, \ldots, p_n$
  where $p_i \to p_{i+1}$ are adjacent program points in the CFG.
◮ Infinite number of possible execution paths in practical programs.
◮ Paths having no finite upper bound on the length.
◮ Need to summarize the information at a program point with a finite set of facts.

Data Flow Schema

◮ Data flow values associated with each program point
  ◮ Summarize all possible states at that point
◮ Domain: set of all possible data flow values
  ◮ Different domains for different analyses/optimizations

Data Flow Problem

◮ Constraints on data flow values
  ◮ Transfer constraints
  ◮ Control flow constraints
◮ Aim: To find a solution to the constraints
  ◮ Multiple solutions possible
  ◮ Trivial solutions, . . . , Exact solutions
◮ We typically compute approximate solution
  ◮ Close to the exact solution (as close as possible!)
  ◮ Why not exact solution?
Data Flow Constraints: Transfer Constraints

◮ Transfer functions
  ◮ relationship between the data flow values before and after a stmt
◮ forward functions: Compute facts after a statement s from the facts available before s.
  ◮ General form:
      $\mathrm{OUT}[s] = f_s(\mathrm{IN}[s])$
◮ backward functions: Compute facts before a statement s from the facts available after s.
  ◮ General form:
      $\mathrm{IN}[s] = f_s(\mathrm{OUT}[s])$
◮ $f_s$ depends on the statement and the analysis

Data Flow Constraints: Control Flow Constraints

◮ Relationship between the data flow values of two points that are related by program execution semantics
◮ For a basic block having n statements:
      $\mathrm{IN}[s_{i+1}] = \mathrm{OUT}[s_i], \quad i = 1, 2, \ldots, n-1$
◮ $\mathrm{IN}[s_1]$, $\mathrm{OUT}[s_n]$ to come later

Data Flow Constraints: Notations

◮ PRED(B): Set of predecessor BBs of block B in CFG
◮ SUCC(B): Set of successor BBs of block B in CFG
◮ f ◦ g: Composition of functions f and g
◮ $\oplus$: An abstract operator denoting some way of combining facts present in a set.

Data Flow Constraints: Basic Blocks

◮ Forward
  ◮ For B consisting of $s_1, s_2, \ldots, s_n$
      $f_B = f_{s_n} \circ \cdots \circ f_{s_2} \circ f_{s_1}$
      $\mathrm{OUT}[B] = f_B(\mathrm{IN}[B])$
  ◮ Control flow constraints
      $\mathrm{IN}[B] = \bigoplus_{P \in \mathrm{PRED}(B)} \mathrm{OUT}[P]$
◮ Backward
      $f_B = f_{s_1} \circ f_{s_2} \circ \cdots \circ f_{s_n}$
      $\mathrm{IN}[B] = f_B(\mathrm{OUT}[B])$
      $\mathrm{OUT}[B] = \bigoplus_{S \in \mathrm{SUCC}(B)} \mathrm{IN}[S]$
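
The order of composition in the forward and backward block functions is easy to get wrong, so a minimal sketch may help. It assumes statement transfer functions of the gen/kill shape; the helper names `gen_kill_transfer`, `compose_forward`, and `compose_backward` are invented for this example.

```python
from functools import reduce


def gen_kill_transfer(gen, kill):
    """Build a statement transfer function f_s(X) = (X - kill) | gen."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda facts: (frozenset(facts) - kill) | gen


def compose_forward(stmt_functions):
    """f_B = f_sn o ... o f_s2 o f_s1: f_s1 is applied first."""
    def f_block(facts):
        return reduce(lambda acc, f: f(acc), stmt_functions, frozenset(facts))
    return f_block


def compose_backward(stmt_functions):
    """f_B = f_s1 o f_s2 o ... o f_sn: f_sn is applied first."""
    return compose_forward(list(reversed(stmt_functions)))


# Two statements s1, s2 that define the same variable (made-up facts d1, d2).
f_s1 = gen_kill_transfer(gen={"d1"}, kill={"d2"})
f_s2 = gen_kill_transfer(gen={"d2"}, kill={"d1"})

forward_result = compose_forward([f_s1, f_s2])(set())
backward_result = compose_backward([f_s1, f_s2])(set())
```

In the forward direction the last statement wins, so only `d2` survives; in the backward direction the first statement is applied last, so only `d1` survives.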
Data Flow Equations

◮ Typical Equation
      $\mathrm{OUT}[s] = (\mathrm{IN}[s] - \mathit{kill}[s]) \cup \mathit{gen}[s]$
  gen[s]: information generated
  kill[s]: information killed
◮ Example:
    a = b*c // generates expression b*c
    c = 5   // kills expression b*c
    d = b*c // is b*c redundant here?

Example Data Flow Analysis

◮ Reaching Definitions Analysis
◮ Definition of a variable x: x = . . . something . . .
  ◮ Could be more complex (e.g. through pointers, references, implicit)
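
The three-statement gen/kill example (a = b*c; c = 5; d = b*c) can be traced mechanically with the typical equation. This is a minimal sketch that tracks only the single expression `b*c` as a string fact; the per-statement gen and kill sets are written by hand for the example.

```python
def transfer(in_facts, gen, kill):
    """OUT[s] = (IN[s] - kill[s]) | gen[s]."""
    return (in_facts - kill) | gen


# (statement text, gen, kill), tracking only the expression "b*c".
stmts = [
    ("a = b*c", {"b*c"}, set()),   # computes b*c, so b*c becomes available
    ("c = 5",   set(), {"b*c"}),   # redefines c, so b*c is no longer available
    ("d = b*c", {"b*c"}, set()),   # recomputes b*c
]

facts = set()          # nothing is available before the first statement
facts_before = {}      # statement text -> facts holding just before it
for text, gen, kill in stmts:
    facts_before[text] = set(facts)
    facts = transfer(facts, gen, kill)
```

Just before `d = b*c` the fact set is empty, because `c = 5` killed `b*c`. Under this model the second computation of `b*c` is therefore not redundant.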

Reaching Definitions Analysis

◮ A definition d reaches a point p if
  ◮ there is a path from the point immediately following d to p
  ◮ d is not “killed” along that path
◮ “Kill” means redefinition of the left hand side (x in the earlier example)

RD Analysis of a Structured Program

[Figure: a single statement s1, labelled d: x = y + z, with IN(s1) entering and OUT(s1) leaving]

      $\mathrm{OUT}(s_1) = (\mathrm{IN}(s_1) - \mathrm{KILL}(s_1)) \cup \mathrm{GEN}(s_1)$
      $\mathrm{GEN}(s_1) = \{d\}$
      $\mathrm{KILL}(s_1) = D_x - \{d\}$, where $D_x$: set of all definitions of x

$\mathrm{KILL}(s_1) = D_x$? will also work here, but may not work in general
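RD Analysis of a Structured Program

[Figure: S is the sequence s1; s2, with IN(S) entering s1 and OUT(S) leaving s2]

      $\mathrm{GEN}(S) = (\mathrm{GEN}(s_1) - \mathrm{KILL}(s_2)) \cup \mathrm{GEN}(s_2)$
      $\mathrm{KILL}(S) = (\mathrm{KILL}(s_1) - \mathrm{GEN}(s_2)) \cup \mathrm{KILL}(s_2)$
      $\mathrm{IN}(s_1) = \mathrm{IN}(S)$
      $\mathrm{IN}(s_2) = \mathrm{OUT}(s_1)$
      $\mathrm{OUT}(S) = \mathrm{OUT}(s_2)$

RD Analysis of a Structured Program

[Figure: S is a branch with alternatives s1 and s2, with IN(S) entering both and OUT(S) leaving both]

      $\mathrm{GEN}(S) = \mathrm{GEN}(s_1) \cup \mathrm{GEN}(s_2)$
      $\mathrm{KILL}(S) = \mathrm{KILL}(s_1) \cap \mathrm{KILL}(s_2)$
      $\mathrm{IN}(s_1) = \mathrm{IN}(s_2) = \mathrm{IN}(S)$
      $\mathrm{OUT}(S) = \mathrm{OUT}(s_1) \cup \mathrm{OUT}(s_2)$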

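RD Analysis of a Structured Program

[Figure: S is a loop with body s1, with IN(S) entering and OUT(S) leaving]

      $\mathrm{GEN}(S) = \mathrm{GEN}(s_1)$
      $\mathrm{KILL}(S) = \mathrm{KILL}(s_1)$
      $\mathrm{OUT}(S) = \mathrm{OUT}(s_1)$
      $\mathrm{IN}(s_1) = \mathrm{IN}(S) \cup \mathrm{GEN}(s_1)$

RD Analysis is Approximate

[Figure: S is a branch with alternatives s1 and s2, with IN(S) entering and OUT(S) leaving]

◮ Assumption: All paths are feasible.
◮ Example:
    if (true) s1;
    else s2;

Fact      Computed                Actual
GEN(S)    GEN(s1) ∪ GEN(s2)       ⊇ GEN(s1)
KILL(S)   KILL(s1) ∩ KILL(s2)     ⊆ KILL(s1)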
RD Analysis is Approximate

[Figure: S is a branch with alternatives s1 and s2, with IN(S) entering and OUT(S) leaving]

◮ Thus,
      true GEN(S) ⊆ analysis GEN(S)
      true KILL(S) ⊇ analysis KILL(S)
◮ More definitions computed to be reaching than actually do!
◮ Later we shall see that this is SAFE approximation
  ◮ prevents optimizations
  ◮ but NO wrong optimization

RD at BB level

◮ A definition d can reach the start of a block from any of its predecessors
  ◮ if it reaches the end of some predecessor
      $\mathrm{IN}(B) = \bigcup_{P \in \mathrm{PRED}(B)} \mathrm{OUT}(P)$
◮ A definition d reaches the end of a block if
  ◮ either it is generated in the block
  ◮ or it reaches the block and is not killed
      $\mathrm{OUT}(B) = (\mathrm{IN}(B) - \mathrm{KILL}(B)) \cup \mathrm{GEN}(B)$

Solving RD Constraints

◮ KILL & GEN known for each BB.
◮ A program with N BBs has 2N equations with 2N unknowns.
  ◮ Solution is possible.
  ◮ Iterative approach (on the next slide).

for each block B {
    OUT(B) = ∅;
}
OUT(Entry) = ∅; // note this for later discussion
change = true;
while (change) {
    change = false;
    for each block B other than Entry {
        IN(B) = ∪ over P ∈ PRED(B) of OUT(P);
        oldOut = OUT(B);
        OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B);
        if (OUT(B) ≠ oldOut) then {
            change = true;
        }
    }
}
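
The iterative algorithm can be run on the four-block example whose GEN and KILL sets appear in the handout. This is a minimal sketch: the GEN/KILL sets are taken from the example table, but the CFG figure did not survive in the text, so the predecessor sets below are an assumption inferred from the IN/OUT values in the pass table.

```python
def reaching_definitions(blocks, pred, gen, kill):
    """Round-robin iteration; returns (IN, OUT) dicts and the number of passes."""
    out = {b: set() for b in blocks}
    out["Entry"] = set()                      # OUT(Entry) = empty set
    in_ = {b: set() for b in blocks}
    passes = 0
    change = True
    while change:
        change = False
        passes += 1
        for b in blocks:                      # every block other than Entry
            in_[b] = set().union(*(out[p] for p in pred[b]))
            new_out = (in_[b] - kill[b]) | gen[b]
            if new_out != out[b]:
                out[b] = new_out
                change = True
    return in_, out, passes


blocks = ["B1", "B2", "B3", "B4"]
gen = {"B1": {"d1", "d2", "d3"}, "B2": {"d4", "d5"}, "B3": {"d6"}, "B4": {"d7"}}
kill = {"B1": {"d4", "d5", "d6", "d7"}, "B2": {"d1", "d2", "d7"},
        "B3": {"d3"}, "B4": {"d1", "d4"}}
# Assumed edges (inferred from the pass table, not stated in the text):
pred = {"B1": ["Entry"], "B2": ["B1", "B4"], "B3": ["B2"], "B4": ["B2", "B3"]}

in_sets, out_sets, passes = reaching_definitions(blocks, pred, gen, kill)
```

With these assumed edges the solver needs three passes: the second pass still changes OUT(B2), and the third pass changes nothing, which is what ends the loop.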
Reaching Definitions: Example

BB   GEN            KILL
B1   {d1, d2, d3}   {d4, d5, d6, d7}
B2   {d4, d5}       {d1, d2, d7}
B3   {d6}           {d3}
B4   {d7}           {d1, d4}

Reaching Definitions: Example

Pass#  Pt   B1          B2                      B3              B4
Init   IN   -           -                       -               -
       OUT  ∅           ∅                       ∅               ∅
1      IN   ∅           d1, d2, d3              d3, d4, d5      d3, d4, d5, d6
       OUT  d1, d2, d3  d3, d4, d5              d4, d5, d6      d3, d5, d6, d7
2      IN   ∅           d1, d2, d3, d5, d6, d7  d3, d4, d5, d6  d3, d4, d5, d6
       OUT  d1, d2, d3  d3, d4, d5, d6          d4, d5, d6      d3, d5, d6, d7
3      IN   ∅           d1, d2, d3, d5, d6, d7  d3, d4, d5, d6  d3, d4, d5, d6
       OUT  d1, d2, d3  d3, d4, d5, d6          d4, d5, d6      d3, d5, d6, d7

Reaching Definitions: Bitvectors

a bit for each definition:
d1 d2 d3 d4 d5 d6 d7

Pass#  Pt   B1       B2       B3       B4
Init   IN   -        -        -        -
       OUT  0000000  0000000  0000000  0000000
1      IN   0000000  1110000  0011100  0011110
       OUT  1110000  0011100  0001110  0010111
2      IN   0000000  1110111  0011110  0011110
       OUT  1110000  0011110  0001110  0010111
3      IN   0000000  1110111  0011110  0011110
       OUT  1110000  0011110  0001110  0010111

Reaching Definitions: Bitvectors

◮ Set-theoretic definitions:
      $\mathrm{IN}(B) = \bigcup_{P \in \mathrm{PRED}(B)} \mathrm{OUT}(P)$
      $\mathrm{OUT}(B) = (\mathrm{IN}(B) - \mathrm{KILL}(B)) \cup \mathrm{GEN}(B)$
◮ Bitvector definitions:
      $\mathrm{IN}(B) = \bigvee_{P \in \mathrm{PRED}(B)} \mathrm{OUT}(P)$
      $\mathrm{OUT}(B) = (\mathrm{IN}(B) \wedge \neg \mathrm{KILL}(B)) \vee \mathrm{GEN}(B)$
◮ Bitwise ∨, ∧, ¬ operators
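
The bitvector form maps directly onto integer operations. The sketch below recomputes one cell of the bitvector table (block B2 in pass 2) using Python integers; the helper names and the explicit 7-bit mask are choices made for this example, with d1 as the leftmost bit as in the table.

```python
NUM_DEFS = 7
MASK = (1 << NUM_DEFS) - 1      # keeps the result of NOT within 7 bits


def bits(text):
    """Parse a string such as "1110000" (d1 is the leftmost bit)."""
    return int(text, 2)


def show(value):
    """Format an integer back into a 7-character bit string."""
    return format(value, f"0{NUM_DEFS}b")


def rd_out(in_bits, kill_bits, gen_bits):
    """OUT(B) = (IN(B) AND NOT KILL(B)) OR GEN(B)."""
    return (in_bits & ~kill_bits & MASK) | gen_bits


# Block B2 in pass 2: IN(B2) is the OR of OUT(B1) and OUT(B4) from the table.
in_b2 = bits("1110000") | bits("0010111")
kill_b2 = bits("1100001")       # {d1, d2, d7}
gen_b2 = bits("0001100")        # {d4, d5}
out_b2 = rd_out(in_b2, kill_b2, gen_b2)
```

The mask matters in Python because integers are unbounded, so `~kill_bits` alone is a negative number; masking keeps the result in the 7-bit universe of definitions.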
Reaching Definitions: Application

Constant Folding

while changes occur {
    forall the stmts S of the program {
        foreach operand B of S {
            if there is a unique definition of B
               that reaches S and is a constant C {
                replace B by C in S;
                if all operands of S are constant {
                    replace rhs by eval(rhs);
                    mark definition as constant;
}}}}}

Reaching Definitions: Application

◮ Recall the approximation in reaching definition analysis
      true GEN(S) ⊆ analysis GEN(S)
      true KILL(S) ⊇ analysis KILL(S)
◮ Can it cause the application to infer
  ◮ an expression as a constant when it has different values for different executions?
  ◮ an expression as not a constant when it is a constant for all executions?
◮ Safety? Profitability?
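
The constant folding loop can be sketched on straight-line code. Everything about the encoding here is an assumption for illustration: statements are keyed by definition id, operands are either variable names or integer constants, only binary operators are supported, and the reaching-definitions information is supplied by hand rather than computed.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}  # binary ops only


def fold_constants(stmts, reaching):
    """stmts: {def_id: (target, op, operands)}, op is None for a plain copy.
    reaching: {def_id: {variable: set of def_ids reaching that statement}}."""
    stmts = {d: (t, op, list(args)) for d, (t, op, args) in stmts.items()}
    const_value = {}                  # def_id -> constant value of that definition
    changed = True
    while changed:                    # "while changes occur"
        changed = False
        for d, (target, op, args) in stmts.items():
            for i, arg in enumerate(args):
                if isinstance(arg, str):
                    defs = reaching[d].get(arg, set())
                    if len(defs) == 1:            # unique reaching definition
                        (src,) = defs
                        if src in const_value:    # ... that is a constant
                            args[i] = const_value[src]
                            changed = True
            if d not in const_value and all(isinstance(a, int) for a in args):
                value = args[0] if op is None else OPS[op](*args)
                stmts[d] = (target, None, [value])   # replace rhs by eval(rhs)
                const_value[d] = value               # mark definition as constant
                changed = True
    return stmts, const_value


# d1: x = 2; d2: y = 3; d3: z = x + y; d4: w = z * x
program = {
    "d1": ("x", None, [2]),
    "d2": ("y", None, [3]),
    "d3": ("z", "+", ["x", "y"]),
    "d4": ("w", "*", ["z", "x"]),
}
reaching = {
    "d1": {}, "d2": {},
    "d3": {"x": {"d1"}, "y": {"d2"}},
    "d4": {"z": {"d3"}, "x": {"d1"}},
}
folded, constants = fold_constants(program, reaching)
```

If two different definitions of an operand reached a statement, the `len(defs) == 1` test would fail and the operand would be left alone, which is where the over-approximation of reaching definitions costs precision rather than correctness.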

Reaching Definitions: Summary

◮ $\mathrm{GEN}(B) = \{\, d_x \mid d_x \text{ in } B \text{ defines variable } x \text{ and is not followed by another definition of } x \text{ in } B \,\}$
◮ $\mathrm{KILL}(B) = \{\, d_x \mid B \text{ contains some definition of } x \,\}$
◮ $\mathrm{IN}(B) = \bigcup_{P \in \mathrm{PRED}(B)} \mathrm{OUT}(P)$
◮ $\mathrm{OUT}(B) = (\mathrm{IN}(B) - \mathrm{KILL}(B)) \cup \mathrm{GEN}(B)$
◮ meet ($\bigwedge$) operator: The operator to combine information coming along different predecessors is ∪
◮ What about the Entry block?

Reaching Definitions: Summary

◮ Entry block has to be initialized specially:
      OUT(Entry) = EntryInfo
      EntryInfo = ∅
◮ A better entry info could be:
      EntryInfo = {x = undefined | x is a variable}
◮ Why?
