Compiler Design
University of Amsterdam
CSA: Computer Systems Architecture
Introduction to Compiler Design – A. Pimentel – p. 1/98
Compilers: Organization Revisited
Optimizer
Independent part of compiler
Different optimizations possible
IR to IR translation
Intermediate Representation (IR)
Flow graph
Nodes are basic blocks
Basic blocks are single entry and single exit
Edges represent control-flow
Abstract Machine Code
Including the notion of functions and procedures
Symbol table(s) keep track of scope and binding
information about names
Partitioning into basic blocks
BB1:
   1  prod = 0
   2  i = 1
BB2:
   3  t1 = 4 * i
   4  t2 = a[t1]
   5  t3 = 4 * i
   6  t4 = b[t3]
   7  t5 = t2 * t4
   8  t6 = prod + t5
   9  prod = t6
  10  t7 = i + 1
  11  i = t7
  12  if i < 21 goto 3
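The partitioning can be sketched in Python. The leader rules used here (the first statement, every jump target, and every statement following a jump) are the usual textbook rules and are an assumption, since the slide stating them is not part of this text. The names `find_leaders` and `partition` and the tuple encoding of the code are illustrative only.

```python
def find_leaders(code):
    """code: list of (line_no, text); a jump is written '... goto N'."""
    leaders = {code[0][0]}                          # rule 1: the first statement
    for idx, (_, text) in enumerate(code):
        if "goto" in text:
            leaders.add(int(text.split("goto")[1]))  # rule 2: the jump target
            if idx + 1 < len(code):
                leaders.add(code[idx + 1][0])        # rule 3: statement after a jump
    return sorted(leaders)


def partition(code):
    """Split the code into basic blocks, each starting at a leader."""
    leaders = find_leaders(code)
    blocks, current = [], []
    for line_no, _ in code:
        if line_no in leaders and current:
            blocks.append(current)
            current = []
        current.append(line_no)
    blocks.append(current)
    return blocks


CODE = [
    (1, "prod = 0"), (2, "i = 1"), (3, "t1 = 4 * i"), (4, "t2 = a[t1]"),
    (5, "t3 = 4 * i"), (6, "t4 = b[t3]"), (7, "t5 = t2 * t4"),
    (8, "t6 = prod + t5"), (9, "prod = t6"), (10, "t7 = i + 1"),
    (11, "i = t7"), (12, "if i < 21 goto 3"),
]
blocks = partition(CODE)
print(blocks)   # statements 1-2 form BB1, statements 3-12 form BB2
```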
Intermediate Representation (cont’d)
Directed Acyclic Graph
Like ASTs:
Leaves are labeled by variable names or constants
Interior nodes are labeled by an operator
Nodes can have variable names attached that contain the
value of that expression
Common subexpressions are represented by multiple edges
to the same expression
DAG creation
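For a statement of the form x = y op z (case 1), x = op y (case 2) or x = y (case 3):
Create leaves for y and z when they have no node yet, if applicable
Find node n labeled op with children node y and node z, if applicable. When not found, create node n. In case 3 let n be node y
Make node x point to n and update the attached identifiers for x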
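The steps above can be sketched as follows. This is a simplified illustration, not the lecture's implementation: statements are encoded as hypothetical `(x, op, y, z)` tuples, `op` is `None` for a copy, and effects such as array stores killing nodes are ignored.

```python
def build_dag(block):
    """block: list of (x, op, y, z); z is None for unary ops, op is None for x = y."""
    nodes = []     # nodes[i] = (label, children)
    index = {}     # (op, children) -> node id, used to find an existing interior node
    current = {}   # identifier -> id of the node that currently holds its value

    def leaf(name):
        # Create a leaf for a name or constant that has no node yet.
        if name not in current:
            nodes.append((name, ()))
            current[name] = len(nodes) - 1
        return current[name]

    for x, op, y, z in block:
        kids = tuple(leaf(v) for v in (y, z) if v is not None)
        if op is None:                      # case 3: x = y, reuse the node of y
            n = kids[0]
        else:
            key = (op, kids)
            if key not in index:            # not found: create node n
                nodes.append(key)
                index[key] = len(nodes) - 1
            n = index[key]
        current[x] = n                      # x is now attached to node n
    return nodes, current


BLOCK = [
    ("t1", "*", "4", "i"), ("t2", "[]", "a", "t1"),
    ("t3", "*", "4", "i"), ("t4", "[]", "b", "t3"),
    ("t5", "*", "t2", "t4"), ("t6", "+", "prod", "t5"),
    ("prod", None, "t6", None), ("t7", "+", "i", "1"),
    ("i", None, "t7", None),
]
nodes, current = build_dag(BLOCK)
print(current["t1"] == current["t3"])   # the common subexpression 4 * i shares one node
```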
DAG example
   1  t1 = 4 * i
   2  t2 = a[t1]
   3  t3 = 4 * i
   4  t4 = b[t3]
   5  t5 = t2 * t4
   6  t6 = prod + t5
   7  prod = t6
   8  t7 = i + 1
   9  i = t7
  10  if (i <= 20) goto 1
[Figure: DAG for this block. Nodes with attached identifiers: + (t6, prod) over prod and * (t5); * (t5) over [] (t2) and [] (t4); [] (t2) over a and * (t1, t3); [] (t4) over b and * (t1, t3); * (t1, t3) over 4 and i; + (t7, i) over i and 1; <= over + (t7, i) and 20]
Local optimizations
Transformations on basic blocks
Examples
Function-preserving transformations
Common subexpression elimination
Constant folding
Copy propagation
Dead-code elimination
Temporary variable renaming
Interchange of independent statements
Transformations on basic blocks (cont’d)
Algebraic transformations
Machine dependent eliminations/transformations
Removal of redundant loads/stores
Use of machine idioms
Common subexpression elimination
x = a + b   ⇒   x = a + b
y = a + b        y = x
Constant folding
x = 3 + 5   ⇒   x = 8
y = x * 2        y = 16
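A minimal sketch of constant folding within one basic block, which also propagates the folded constants so that the second statement of the example folds as well. The `(dest, op, a, b)` tuple encoding and the name `fold_constants` are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(block):
    """block: list of (dest, op, a, b); operands are ints or variable names."""
    known = {}                        # variable -> constant value, when known
    result = []
    for dest, op, a, b in block:
        a = known.get(a, a)           # substitute operands with known constants
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)              # evaluate at compile time
            result.append((dest, "const", known[dest], None))
        else:
            known.pop(dest, None)     # dest no longer holds a known constant
            result.append((dest, op, a, b))
    return result


folded = fold_constants([("x", "+", 3, 5), ("y", "*", "x", 2)])
print(folded)
```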
Copy propagation
x = y       ⇒   x = y
z = x * 2        z = y * 2
Dead-code elimination
Statements that compute values that are never used afterwards are dead code and can be eliminated
Requires live-variable analysis (discussed later on)
Temporary variable renaming
t1 = a + b        t1 = a + b
t2 = t1 * 2   ⇒   t2 = t1 * 2
t1 = d − e        t3 = d − e
c = t1 + 1        c = t3 + 1
If each statement that defines a temporary defines a new temporary, then the basic block is in normal-form
Makes some optimizations at BB level a lot simpler (e.g. common subexpression elimination, copy propagation, etc.)
Algebraic transformations
x = x * 2   ⇒   x = x << 1
x = x^2     ⇒   x = x * x
Machine dependent eliminations/transformations
3 $Lx: ...
Use of machine idioms, e.g.,
Auto increment/decrement addressing modes
SIMD instructions
Etc. (see practical assignment)
Other sources of optimizations
Global optimizations
Global common subexpression elimination
Global constant folding
Global copy propagation, etc.
Loop optimizations
They all need some dataflow analysis on the flow graph
Loop optimizations
Code motion
Decrease amount of code inside loop
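Take a loop-invariant expression and place it before the loop, so that a loop test such as while (i <= t) only compares against a value t that is computed once, before the loop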
Loop optimizations (cont’d)
Strength reduction
Strength reduction is the replacement of expensive
operations by cheaper ones (algebraic transformation)
Its use is not limited to loops but can be helpful for
induction variable elimination
i = i + 1         i = i + 1
t1 = i * 4   ⇒   t1 = t1 + 4
t2 = a[t1]        t2 = a[t1]
Loop optimizations (cont’d)
Induction variable elimination
i = i + 1                    t1 = t1 + 4
t1 = t1 + 4             ⇒   t2 = a[t1]
t2 = a[t1]                   if (t1 < 40) goto top
if (i < 10) goto top
Finding loops in flow graphs
Dominator relation
Node A dominates node B if all paths to node B go through
node A
A node always dominates itself
We can construct a tree using this relation: the Dominator tree
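The dominator sets can be computed with a simple fixed-point iteration, sketched below. The ten-node edge list is an assumption: it is the classic textbook flow graph, chosen because it is consistent with the natural loops listed for the example later in these slides. The sketch also assumes every node other than the entry has at least one predecessor.

```python
def dominators(succ, entry):
    """Return dom, where dom[n] is the set of nodes that dominate n."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    dom = {n: set(nodes) for n in nodes}    # start from "everything dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in sorted(nodes - {entry}):
            # n is dominated by itself and by whatever dominates all predecessors
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


# Assumed example flow graph (node -> successors).
SUCC = {1: [2, 3], 2: [3], 3: [4], 4: [3, 5, 6], 5: [7], 6: [7],
        7: [4, 8], 8: [3, 9, 10], 9: [1], 10: [7]}
dom = dominators(SUCC, 1)
print(sorted(dom[8]))
```

An edge n -> d is a backedge exactly when d is in dom[n]; for instance 7 is in dom[10] for the edge 10 -> 7.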
Dominator tree example
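[Figure: the example flow graph (nodes 1 to 10) next to its dominator tree. Level by level, the dominator tree reads: 1; 2, 3; 4; 5, 6, 7; 8; 9, 10]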
Natural loops
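Given a backedge n → d (an edge whose head d dominates its tail n), the natural loop consists of d plus all nodes that can reach n without going through d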
Finding natural loop of n → d
procedure insert(m) {
  if (not m ∈ loop) {
    loop = loop ∪ {m}
    push(m)
  }
}
stack = ∅
loop = {d}
insert(n)
while (stack ≠ ∅) {
  m = pop()
  for (p ∈ pred(m)) insert(p)
}
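A direct Python transcription of this procedure is sketched below. The edge list is an assumption (the classic ten-node textbook flow graph); with it, the function reproduces the natural loops listed for the example in these slides.

```python
def natural_loop(pred, n, d):
    """Natural loop of backedge n -> d: d plus all nodes reaching n without passing d."""
    loop = {d}
    stack = []

    def insert(m):
        if m not in loop:
            loop.add(m)
            stack.append(m)

    insert(n)
    while stack:
        m = stack.pop()
        for p in pred[m]:
            insert(p)
    return loop


# Assumed example flow graph (node -> successors), turned into predecessor sets.
SUCC = {1: [2, 3], 2: [3], 3: [4], 4: [3, 5, 6], 5: [7], 6: [7],
        7: [4, 8], 8: [3, 9, 10], 9: [1], 10: [7]}
PRED = {n: set() for n in SUCC}
for node, targets in SUCC.items():
    for t in targets:
        PRED[t].add(node)

inner = natural_loop(PRED, 10, 7)
print(sorted(inner))
```

Because d is placed in the loop before the search starts, the backwards walk over predecessors never continues past the header.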
Our example revisited
[Figure: the example flow graph with nodes 1 to 10]
Natural loops:
1. backedge 10 -> 7: {7,8,10} (the inner loop)
2. backedge 7 -> 4: {4,5,6,7,8,10}
3. backedges 4 -> 3 and 8 -> 3: {3,4,5,6,7,8,10}
4. backedge 9 -> 1: the entire flow graph
Reducible flow graphs
[Figure: two flow graphs over nodes b and c; in the second one node c is duplicated as c’]
Dataflow analysis
The notion of generation and killing depends on the
dataflow analysis problem to be solved
Let’s first consider Reaching Definitions analysis for
structured programs
Reaching definitions
Dataflow analysis for reaching definitions
S is a single assignment d: a = b + c
  gen[S] = {d}
  kill[S] = D_a − {d}   (D_a is the set of all definitions of a)
  out[S] = gen[S] ∪ (in[S] − kill[S])
S is a sequence S1 ; S2
  gen[S] = gen[S2] ∪ (gen[S1] − kill[S2])
  kill[S] = kill[S2] ∪ (kill[S1] − gen[S2])
  in[S1] = in[S]
  in[S2] = out[S1]
  out[S] = out[S2]
S is a choice between S1 and S2
  gen[S] = gen[S1] ∪ gen[S2]
  kill[S] = kill[S1] ∩ kill[S2]
  in[S1] = in[S2] = in[S]
  out[S] = out[S1] ∪ out[S2]
S is a loop with body S1
  gen[S] = gen[S1]
  kill[S] = kill[S1]
  in[S1] = in[S] ∪ gen[S1]
  out[S] = out[S1]
Dealing with loops
The in-set to the code inside the loop is the in-set of the loop plus the out-set of the loop: in[S1] = in[S] ∪ out[S1]
The out-set of the loop is the out-set of the code inside: out[S] = out[S1]
Dealing with loops (cont’d)
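Write I = in[S1], O = out[S1], J = in[S], G = gen[S1] and K = kill[S1]:
I = J ∪ O
O = G ∪ (I − K)
Assume O0 = ∅, then I1 = J
O1 = G ∪ (I1 − K) = G ∪ (J − K)
I2 = J ∪ O1 = J ∪ G ∪ (J − K) = J ∪ G
O2 = G ∪ (I2 − K) = G ∪ ((J ∪ G) − K) = G ∪ (J − K)
O1 = O2, so in[S1] = in[S] ∪ gen[S1] and out[S] = out[S1]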
Reaching definitions example
d1: i = m − 1
d2: j = n
d3: a = u1
do
  d4: i = i + 1
  d5: j = j − 1
  if (e1)
    d6: a = u2
  else
    d7: i = u3
while (e2)
[Figure: syntax tree of this program, each node annotated with its gen and kill sets as bit vectors (one bit per definition d1 ... d7)]
Definition   gen        kill
d1           100 0000   000 1001
d2           010 0000   000 0100
d3           001 0000   000 0010
d4           000 1000   100 0001
d5           000 0100   010 0000
d6           000 0010   001 0000
d7           000 0001   100 1000
In reality, dataflow analysis is often performed at the granularity of basic blocks rather than statements
Iterative solutions
in[B] = ∪_{P ∈ pred(B)} out[P]
out[B] = gen[B] ∪ (in[B] − kill[B])
Iterative algorithm for reaching definitions
for (each block B) out[B] = gen[B]
do {
  change = false
  for (each block B) {
    in[B] = ∪_{P ∈ pred(B)} out[P]
    oldout = out[B]
    out[B] = gen[B] ∪ (in[B] − kill[B])
    if (out[B] ≠ oldout) change = true
  }
} while (change)
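The iteration can be sketched in Python with sets instead of bit vectors. The gen and kill sets of B1, B3 and B4 are those of the example that follows; those of B2 are derived here from d4: i = i + 1 and d5: j = j − 1. The predecessor lists are an assumption, reconstructed so that the iteration reproduces the bit vectors in that example.

```python
def reaching_definitions(blocks, pred, gen, kill):
    """Iterate in/out sets until no out-set changes any more."""
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set(gen[b]) for b in blocks}      # initially out[B] = gen[B]
    change = True
    while change:
        change = False
        for b in blocks:
            # in[B] is the union of the out-sets of all predecessors of B
            in_sets[b] = set().union(*(out_sets[p] for p in pred[b]))
            new_out = gen[b] | (in_sets[b] - kill[b])
            if new_out != out_sets[b]:
                out_sets[b], change = new_out, True
    return in_sets, out_sets


BLOCKS = ["B1", "B2", "B3", "B4"]
PRED = {"B1": [], "B2": ["B1", "B3", "B4"], "B3": ["B2"], "B4": ["B2", "B3"]}  # assumed edges
GEN = {"B1": {"d1", "d2", "d3"}, "B2": {"d4", "d5"}, "B3": {"d6"}, "B4": {"d7"}}
KILL = {"B1": {"d4", "d5", "d6", "d7"}, "B2": {"d1", "d2", "d7"},
        "B3": {"d3"}, "B4": {"d1", "d4"}}

in_sets, out_sets = reaching_definitions(BLOCKS, PRED, GEN, KILL)
for b in BLOCKS:
    print(b, sorted(in_sets[b]), sorted(out_sets[b]))
```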
Reaching definitions: an example
B1: d1: i = m − 1; d2: j = n; d3: a = u1    gen[B1] = {d1,d2,d3}   kill[B1] = {d4,d5,d6,d7}
B2: d4: i = i + 1; d5: j = j − 1            gen[B2] = {d4,d5}      kill[B2] = {d1,d2,d7}
B3: d6: a = u2                              gen[B3] = {d6}         kill[B3] = {d3}
B4: d7: i = u3                              gen[B4] = {d7}         kill[B4] = {d1,d4}

        Initial               Pass 1                Pass 2
Block   in[B]      out[B]     in[B]      out[B]     in[B]      out[B]
B1      000 0000   111 0000   000 0000   111 0000   000 0000   111 0000
B2      000 0000   000 1100   111 0011   001 1110   111 1111   001 1110
B3      000 0000   000 0010   001 1110   000 1110   001 1110   000 1110
B4      000 0000   000 0001   001 1110   001 0111   001 1110   001 0111
Available expressions
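[Figure: two flow graphs in which B1 computes t1 = 4 * i and B3 computes t2 = 4 * i. Whether 4 * i is available in B3 depends on B2: in one graph its contents are unknown ("?"), in the other B2 contains i = ... followed by t0 = 4 * i]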
Available expressions (cont’d)
Dataflow equations:
in[B] = ∩_{P ∈ pred(B)} out[P]   for B not initial
in[B1] = ∅   where B1 is the initial block
Liveness analysis
Dataflow for liveness
in[B] = use[B] ∪ (out[B] − def[B])
out[B] = ∪_{S ∈ succ(B)} in[S]
Note the relation with the reaching-definitions equations: the roles of in and out are interchanged
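A minimal backward iteration for these equations is sketched below. The three-block example (an initialisation block, a self-looping body and an exit block) and its use/def sets are hypothetical and only serve to exercise the equations; `def` is spelled `defs` because it is a Python keyword.

```python
def liveness(blocks, succ, use, defs):
    """Iterate the liveness equations backwards until the in-sets are stable."""
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set() for b in blocks}
    change = True
    while change:
        change = False
        for b in reversed(blocks):            # information flows backwards
            out_sets[b] = set().union(*(in_sets[s] for s in succ[b]))
            new_in = use[b] | (out_sets[b] - defs[b])
            if new_in != in_sets[b]:
                in_sets[b], change = new_in, True
    return in_sets, out_sets


# Hypothetical program: B1: i = 0
#                       B2: t = a[i]; s = s + t; i = i + 1; if i < n goto B2
#                       B3: return s
BLOCKS = ["B1", "B2", "B3"]
SUCC = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
USE = {"B1": set(), "B2": {"i", "s", "n"}, "B3": {"s"}}    # used before any definition in B
DEFS = {"B1": {"i"}, "B2": {"t"}, "B3": set()}             # defined before any use in B

live_in, live_out = liveness(BLOCKS, SUCC, USE, DEFS)
print(sorted(live_in["B1"]), sorted(live_out["B2"]))
```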
Algorithms for global optimizations
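Global common subexpression elimination: for every statement s of the form x = y + z such that y + z is available, do the following
Search backwards in the graph for the evaluations of y + z
Create a new variable u and replace each evaluation w = y + z that was found by u = y + z; w = u
Replace statement s by x = u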
Copy propagation
in[B] = ∩_{P ∈ pred(B)} out[P]   for B not initial
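For each copy statement s: x = y, check that every use of x reached by s is reached by no other definition of x, and that y is not reassigned on the way from s to that use
If so, remove s and replace the uses of x by uses of y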
Detection of loop-invariant computations
Code motion
[Figure: three variants of a flow graph with blocks B1 (i = 1), B2 (if u < v goto B3), B4 (v = v − 1; if v <= 20 goto B5) and B5 (j = i). One variant adds i = 3 at the start of B2, another adds k = i at the start of B4]
Detection of induction variables
Associated with each induction variable j is a triple (i, c, d) where i is a basic induction variable and c and d are constants such that j = c * i + d
In this case j belongs to the family of i
The basic induction variable i belongs to its own family, with the associated triple (i, 1, 0)
Detection of induction variables (cont’d)
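Search for variables k with a single assignment in the loop of the form k = j * b, k = j + b, etc., where b is a constant and j is an induction variable
If j is not basic and in the family of i then there must be
No assignment of i between the assignment of j and k
No definition of j outside the loop that reaches k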
Strength reduction for induction variables
For each induction variable j in the family of i with triple (i, c, d):
Create a new variable s
Replace the assignment to j by j = s
Immediately after each assignment i = i + n append s = s + c * n
Place s in the family of i with triple (i, c, d)
Initialize s in the preheader: s = c * i + d
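The effect of these steps can be checked on a small loop. The sketch below is only an illustration with hypothetical function names: it runs a loop that computes j = c * i + d directly, and the same loop after applying the steps by hand, and compares the values of j.

```python
def original(i0, n, c, d, trips):
    """Loop in which j = c * i + d is recomputed with a multiplication each time."""
    i, values = i0, []
    for _ in range(trips):
        i = i + n
        j = c * i + d
        values.append(j)
    return values


def strength_reduced(i0, n, c, d, trips):
    """Same loop after strength reduction of j with triple (i, c, d)."""
    i, values = i0, []
    s = c * i + d              # initialised once, in the preheader
    step = c * n               # c and n are constants, so c * n is a constant
    for _ in range(trips):
        i = i + n
        s = s + step           # appended immediately after i = i + n
        j = s                  # the assignment to j has become a copy
        values.append(j)
    return values


print(strength_reduced(0, 1, 4, 0, 5))
```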
Strength reduction for induction variables (cont’d)
Before:
B1:  i = m − 1
     t1 = 4 * n
     v = a[t1]
B2:  i = i + 1
     t2 = 4 * i
     t3 = a[t2]
     if t3 < v goto B2
B3:  if i < n goto B5
B4, B5
After strength reduction:
B1:  i = m − 1
     t1 = 4 * n
     v = a[t1]
     s2 = 4 * i
B2:  i = i + 1
     s2 = s2 + 4
     t2 = s2
     t3 = a[t2]
     if t3 < v goto B2
B3:  if i < n goto B5
B4, B5
Elimination of induction variables
Delete assignments to i from the loop
Do some copy propagation to eliminate the j = s assignments formed during strength reduction
Alias Analysis
Use dataflow analysis to determine what a pointer might
point to
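in[B] contains for each pointer p the set of variables to which p could point at the beginning of block B
Elements of in[B] are pairs (p, a) with p a pointer and a a variable, meaning that p might point to a
out[B] is defined similarly for the end of B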
Alias Analysis (cont’d)
Define trans_s(S) as the effect of statement s on a set S of (pointer, variable) pairs:
If s is p = &a, then
  trans_s(S) = (S − {(p, b) | any variable b}) ∪ {(p, a)}
If s is p = q ± c for pointer q and nonzero integer c, then
  trans_s(S) = (S − {(p, b) | any variable b}) ∪ {(p, b) | (q, b) ∈ S and b is an array variable}
If s is p = q, then
  trans_s(S) = (S − {(p, b) | any variable b}) ∪ {(p, b) | (q, b) ∈ S}
If s is not an assignment to a pointer, then trans_s(S) = S
Dataflow equations for alias analysis:
out[B] = trans_B(in[B])
in[B] = ∪_{P ∈ pred(B)} out[P]
where trans_B(S) = trans_sk(trans_sk−1(... (trans_s1(S)) ...)) for a block B consisting of the statements s1, ..., sk
Alias Analysis (cont’d)
How to use the alias dataflow information? Examples:
In reaching definitions analysis (to determine gen and
kill)
statement *p = a generates a definition of every variable b such that p could point to b
*p = a kills the definition of b only if b is not an array and is the only variable p could possibly point to
In liveness analysis (to determine def and use)
a = *p represents a use of p and a use of any variable that p could point to
Code generation
Instruction selection
Was a problem in the CISC era (e.g., lots of addressing
modes)
RISC instructions mean simpler instruction selection
However, new instruction sets introduce new, complicated
instructions (e.g., multimedia instruction sets)
Instruction selection methods
Tree pattern based selection
Instruction   Effect             Cost   Tree patterns
(none)        ri                 0      temp
ADD           ri ← rj + rk       1      +
MUL           ri ← rj * rk       1      *
ADDI          ri ← rj + c        1      + with a const child (either side); const
LOAD          ri ← M[rj + c]     3      mem over + with a const child (either side); mem(const); mem
MOVEM         M[rj] ← M[ri]      6      move(mem, mem)
[Figure: the tree patterns drawn as tiles, next to an example expression tree built from mem, +, *, temp and const nodes]
Optimal and optimum tilings
The cost of a tiling is the sum of the costs of the tree patterns
An optimal tiling is one where no two adjacent tiles can be
combined into a single tile of lower cost
An optimum tiling is a tiling with lowest possible cost
An optimum tiling is also optimal, but not vice-versa
Maximal Munch
Dynamic programming
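Coloring by simplification
Simplify
Select a node m with less than K neighbors and remove it, with its edges, from the graph G, giving G′
Store m on a stack
Color the graph G′
Graph G can be colored since m has less than K neighbors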
Coloring by simplification (cont’d)
Spill
If a node with less than K neighbors cannot be found in G
Mark a node n to be spilled, remove n and its edges from G (and stack n) and continue simplification
Select
Assign colors by popping the stack
Arriving at a spill node, check whether it can be colored. If not:
The variable represented by this node will reside in memory (i.e. is spilled to memory)
Actual spill code is inserted in the program
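The simplify, spill and select phases can be sketched as follows. This is a simplified illustration without coalescing and without inserting spill code; the choice of the highest-degree node as spill candidate is an assumption, and the two small graphs are hypothetical examples.

```python
def color_graph(adj, k):
    """adj: node -> set of neighbours. Returns (colors, spilled) for k colors."""
    graph = {n: set(nb) for n, nb in adj.items()}
    stack = []
    while graph:
        # Simplify: take a node with fewer than k neighbours, if there is one.
        node = next((n for n in sorted(graph) if len(graph[n]) < k), None)
        if node is None:
            # Spill: mark a candidate (here simply the node of highest degree).
            node = max(sorted(graph), key=lambda n: len(graph[n]))
        stack.append(node)
        for nb in graph.pop(node):           # remove the node and its edges
            graph[nb].discard(node)
    colors, spilled = {}, []
    while stack:                             # Select: pop and assign colors
        node = stack.pop()
        used = {colors[nb] for nb in adj[node] if nb in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[node] = free[0]
        else:
            spilled.append(node)             # no color left: an actual spill
    return colors, spilled


# A cycle a-b-c-d: no node has fewer than 2 neighbours, yet 2 colors suffice.
CYCLE = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
colors, spilled = color_graph(CYCLE, 2)
print(colors, spilled)
```

The cycle shows why a spill candidate is re-checked during select: node a is marked as a potential spill, but when it is popped its neighbours turn out to share one color, so it can still be colored.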
Coalescing
Sketch of the algorithm with coalescing
Freeze: when neither simplification nor coalescing applies, pick a move-related node of low degree and do not consider its moves for coalescing anymore
Spill
Select
Register allocation: an example
Live in: k, j
  g = mem[j+12]
  h = k − 1
  f = g * h
  e = mem[j+8]
  m = mem[j+16]
  b = mem[f]
  c = e + 8
  d = c
  k = m + 4
  j = b
  goto d
Live out: d, k, j
[Figure: interference graph over the nodes b, c, d, e, f, g, h, j, k, m]
Assume a 4-coloring (K = 4)
Simplify by removing and stacking nodes with < 4 neighbors (g,h,k,f,e,m)
Register allocation: an example (cont’d)
[Figure: the remaining graph with nodes j, b, d and c; the move-related pairs are coalesced into j&b and d&c]
[Figure: successive snapshots of the interference graph over the nodes b, c, d, e, f, g, h, j, k, m, and so on]
No spills are required and both moves were optimized away
Instruction scheduling
List scheduling represents the machine as a matrix of Resources × Time
Position in matrix is true or false, indicating whether the resource is in use at that time
Instructions represented by matrices of Resources × Instruction duration
Using dependency analysis, the schedule is made by fitting instructions as tightly as possible
List scheduling (cont’d)
The scheduler relies on heuristics, e.g. at an operation conflict schedule the most time-critical first
For a VLIW processor, the maximum instruction duration is used for scheduling, which is painful for memory loads!
Basic blocks usually are small (5 operations on the average), so the benefit of scheduling is limited. This motivates Trace Scheduling
Trace scheduling
[Figure: flow graph in which BB1 branches to BB2 and BB3, which both continue in BB4]
Trace scheduling (cont’d)
[Figure: moving operations across basic-block boundaries within a trace, shown for operations Op A, Op B and Op C:
(a) Operation to be moved below a branch in the trace: the code is copied into the off-trace basic block
(b) Operation to be moved above a branch: the moved code is only allowed if it has no side effects in the off-trace code
(c) Operation to be moved before Op A: the code is copied into the off-trace basic block]
Trace scheduling (cont’d)
Trace selection
Because of the code copies, the trace that is most often
executed has to be scheduled first
A longer trace brings more opportunities for ILP (loop
unrolling!)
Use heuristics about how often a basic block is executed
and which paths to and from a block have the most chance
of being taken (e.g. inner-loops) or use profiling (input
dependent)
Other methods to increase ILP
Loop unrolling
Technique for increasing the amount of code available
inside a loop: make several copies of the loop body
Reduces loop control overhead and increases ILP (more
instructions to schedule)
When using trace scheduling this results in longer traces
and thus more opportunities for better schedules
In general, the more copies, the better the job the scheduler can do, but the additional gain diminishes
Loop unrolling (cont’d)
Example
for (i = 0; i < 100; i++)
  a[i] = a[i] + b[i];
becomes
for (i = 0; i < 100; i += 4) {
  a[i] = a[i] + b[i];
  a[i+1] = a[i+1] + b[i+1];
  a[i+2] = a[i+2] + b[i+2];
  a[i+3] = a[i+3] + b[i+3];
}
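The example works because 100 is a multiple of the unroll factor 4. A hedged Python sketch of the general case adds a clean-up loop for the leftover iterations; the function names are illustrative, and in Python the point is only to check that both versions compute the same result, not to gain speed.

```python
def add_rolled(a, b):
    """Reference version: one element per iteration."""
    a = list(a)
    for i in range(len(a)):
        a[i] = a[i] + b[i]
    return a


def add_unrolled(a, b):
    """Unrolled by a factor of 4, with a clean-up loop for the remainder."""
    a = list(a)
    n = len(a)
    main = n - n % 4                     # largest multiple of 4 not above n
    for i in range(0, main, 4):
        a[i] = a[i] + b[i]
        a[i + 1] = a[i + 1] + b[i + 1]
        a[i + 2] = a[i + 2] + b[i + 2]
        a[i + 3] = a[i + 3] + b[i + 3]
    for i in range(main, n):             # at most 3 leftover iterations
        a[i] = a[i] + b[i]
    return a


xs = list(range(103))
ys = [2 * v for v in xs]
print(add_unrolled(xs, ys) == add_rolled(xs, ys))
```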
Software pipelining
Loop:  LD     F0,0(R1)
       ADDD   F4,F0,F2      (body)
       SD     0(R1),F4
       SBGEZ  R1, Loop      (loop control)
Software pipelined iteration:
T0             LD                                   (prologue)
T1             .      LD
T2             ADDD   .      LD
T...   Loop:   SD     ADDD   .     LD   SBGEZ Loop  (steady state)
Tn             SD     ADDD
Tn+2           SD
Modulo scheduling
Compiler optimizations for cache performance
Merging arrays (better spatial locality), e.g. into struct merge m_array[SIZE]
Loop interchange
Loop fusion and fission
Blocking (better temporal locality)
Loop interchange
Loop fusion
Loop fission
Blocking
Perform computations on sub-matrices (blocks), e.g. when
multiple matrices are accessed both row by row and column by
column
Matrix multiplication x = y*z
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++) {
    r = 0;
    for (k = 0; k < N; k++) {
      r = r + y[i][k]*z[k][j];
    };
    x[i][j] = r;
  };
[Figure: access patterns of X (indexed i, j), Y (indexed i, k) and Z (indexed k, j), before and after blocking; legend: not touched, older access, recent access]
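A blocked version of this multiplication can be sketched as follows. The loop structure (outer loops `jj` and `kk` over blocks of size `b`) is one common formulation and is an assumption rather than the lecture's own code; in Python it only demonstrates that the blocked loop nest computes the same product.

```python
def matmul(y, z, n):
    """Straightforward triple loop, as in the code above."""
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r = 0
            for k in range(n):
                r = r + y[i][k] * z[k][j]
            x[i][j] = r
    return x


def matmul_blocked(y, z, n, b):
    """Same product, computed on b-by-b sub-matrices of z."""
    x = [[0] * n for _ in range(n)]
    for jj in range(0, n, b):
        for kk in range(0, n, b):
            for i in range(n):
                for j in range(jj, min(jj + b, n)):
                    r = 0
                    for k in range(kk, min(kk + b, n)):
                        r = r + y[i][k] * z[k][j]
                    x[i][j] = x[i][j] + r     # accumulate partial sums per block
    return x


N = 5
Y = [[i + j for j in range(N)] for i in range(N)]
Z = [[i * j + 1 for j in range(N)] for i in range(N)]
print(matmul_blocked(Y, Z, N, 2) == matmul(Y, Z, N))
```

Each block of z is reused for every row i before the next block is touched, which is where the improved temporal locality comes from.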