
CSCI 565 - Compiler Design Spring 2016

Instruction Scheduling
List Scheduling, Trace Scheduling, Loop Unrolling &
Software Pipelining

Copyright 2016, Pedro C. Diniz, all rights reserved.


Students enrolled in the Compilers class at the University of Southern California (USC)
have explicit permission to make copies of these materials for their personal use.
Pedro Diniz
[email protected]

Outline
Overview of Instruction Scheduling
List Scheduling
Resource Constraints
Interaction with Register Allocation
Scheduling across Basic Blocks
Trace Scheduling
Scheduling for Loops
Loop Unrolling
Software Pipelining


Simple Execution Model


5-Stage Pipeline: fetch, decode, execute, memory, write back

Fetch: get the next instruction
Decode: figure out what that instruction is
Execute: perform the ALU operation (or the address calculation in a memory op)
Memory: do the memory access (in a memory op)
Write Back: write the results back

Pipelined execution (time runs left to right; a new instruction enters the pipeline each cycle):

Inst 1   IF   DE   EXE  MEM  WB
Inst 2        IF   DE   EXE  MEM  WB
Inst 3             IF   DE   EXE  MEM  WB
Inst 4                  IF   DE   EXE  MEM  WB

Handling Branch and Jump Instructions

The processor does not know the location of the next instruction until later:
  after DE in jump instructions
  after EXE in branch instructions

Branch   IF   DE   EXE  MEM  WB
???           IF   DE   EXE  MEM  WB
???                IF   DE   EXE  MEM  WB
Inst                    IF   DE   EXE  MEM  WB

What to do with the middle 2 instructions?
Delay the Action of the Branch (Delay Slots)
  Make the branch take effect only after two instructions
  The two instructions after the branch are executed regardless of the branch outcome

Branch               IF   DE   EXE  MEM  WB
Next seq inst             IF   DE   EXE  MEM  WB
Next seq inst                  IF   DE   EXE  MEM  WB
Branch target inst                  IF   DE   EXE  MEM  WB

Constraints On Scheduling
Data Dependences
Inherent in the code
Control Dependences
Inherent in the code
Resource Constraints

Sometimes we can Mitigate these Issues


Code restructuring
Instruction Scheduling


Data Dependency between Instructions


If two instructions access the same variable (i.e., the same memory location), they can be dependent
Kinds of Dependences
  True: write -> read
  Anti: read -> write
  Output: write -> write
  Input: read -> read (does not constrain the order)

What to do if two Instructions are Dependent?
  The order of execution cannot be reversed
  This reduces the possibilities for scheduling
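
The four kinds of dependence can be told apart by comparing what each instruction writes and reads. The sketch below is only an illustration: the `Instr` record and the three sample instructions are hypothetical, and only registers are tracked (memory locations would need alias information).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    """One instruction, reduced to the locations it writes and reads."""
    text: str
    writes: frozenset
    reads: frozenset

def dependences(first: Instr, second: Instr) -> set:
    """Return the kinds of dependence from `first` to a later `second`."""
    kinds = set()
    if first.writes & second.reads:
        kinds.add("true")      # write then read
    if first.reads & second.writes:
        kinds.add("anti")      # read then write
    if first.writes & second.writes:
        kinds.add("output")    # write then write
    if first.reads & second.reads:
        kinds.add("input")     # read then read (does not constrain order)
    return kinds

# Hypothetical three-instruction sequence used only for illustration.
load = Instr("r2 = *(r1 + 4)", frozenset({"r2"}), frozenset({"r1"}))
use = Instr("r4 = r2 + r3", frozenset({"r4"}), frozenset({"r2", "r3"}))
redef = Instr("r2 = r5 - 1", frozenset({"r2"}), frozenset({"r5"}))

print(dependences(load, use))    # true dependence through r2
print(dependences(use, redef))   # anti dependence through r2
print(dependences(load, redef))  # output dependence through r2
```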


Representing Dependences
Using a dependence DAG, one per Basic Block
Nodes are instructions, edges represent dependences

1: r2 = *(r1 + 4)
2: r3 = *(r1 + 8)
3: r4 = r2 + r3
4: r5 = r2 - 1

Edges (each labeled with latency 2): 1 -> 3, 2 -> 3, 1 -> 4

Edge is labeled with Latency:
v(i -> j) = delay required between the initiation times of i and j, minus the execution time required by i
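
A minimal sketch of how such a DAG could be built for the four-instruction block, assuming each instruction is described by the register it writes and the registers it reads. The encoding, the function name `build_dag`, and the uniform latency of 2 on every edge are assumptions made for illustration.

```python
# Each instruction: (number, register written, registers read).
block = [
    (1, "r2", {"r1"}),        # r2 = *(r1 + 4)
    (2, "r3", {"r1"}),        # r3 = *(r1 + 8)
    (3, "r4", {"r2", "r3"}),  # r4 = r2 + r3
    (4, "r5", {"r2"}),        # r5 = r2 - 1
]

LATENCY = 2  # assumed: the same label is used on every edge in this block

def build_dag(instrs, latency):
    """Return {(i, j): latency} for every true, anti or output dependence i -> j."""
    edges = {}
    for pos, (i, wi, ri) in enumerate(instrs):
        for j, wj, rj in instrs[pos + 1:]:
            true_dep = wi in rj       # j reads what i wrote
            anti_dep = wj in ri       # j overwrites what i read
            output_dep = wi == wj     # both write the same register
            if true_dep or anti_dep or output_dep:
                edges[(i, j)] = latency
    return edges

dag = build_dag(block, LATENCY)
print(sorted(dag))
```

A real scheduler would assign a latency per edge (for example, a shorter one for anti dependences); one value is used here only to keep the sketch short.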


Resource Constraints
Modern Machines Have Many Resource Constraints
Superscalar Architectures:
  Can execute a few operations concurrently
  But have constraints on which operations can be combined
Example:
  1 integer operation
    ALUop dest, src1, src2   # in 1 clock cycle
  In parallel with
  1 memory operation
    LD dst, addr             # in 2 clock cycles
    ST src, addr             # in 1 clock cycle


List Scheduling Algorithm


Idea:
  Do a Topological Sort of the Dependence DAG
  Consider when an instruction can be scheduled without causing a stall
  Schedule the instruction if it causes no stall and all its predecessors are already scheduled

Finding an optimal schedule is NP-complete in general
  Use Heuristics when necessary


List Scheduling Algorithm


Create a dependence DAG of a Basic Block
Topological Sorting
  READY list = nodes with no predecessors
  Loop until READY list is empty
    Schedule each node in the READY list that can issue without stalling
    Update READY list (add nodes whose predecessors have all been scheduled)
  end Loop
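
The loop above can be sketched as follows. This is a simplified single-issue version written for illustration: the function name, the first-come-first-served choice among ready nodes, and the reading of an edge latency as "the successor may issue that many cycles after its predecessor issues" are all assumptions.

```python
def list_schedule(nodes, edges):
    """Single-issue list scheduling.

    nodes: list of instruction ids
    edges: {(i, j): latency}; j may start no earlier than start(i) + latency
    Returns {node: issue cycle}, with cycles counted from 1.
    """
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for (i, j), lat in edges.items():
        preds[j].append((i, lat))
        succs[i].append(j)

    unscheduled_preds = {n: len(preds[n]) for n in nodes}
    ready = [n for n in nodes if unscheduled_preds[n] == 0]
    start = {}
    cycle = 1
    while ready:
        # Nodes whose operands are available in this cycle (no stall needed).
        issuable = [n for n in ready
                    if all(start[p] + lat <= cycle for p, lat in preds[n])]
        if issuable:
            n = issuable[0]           # no priority heuristic yet
            ready.remove(n)
            start[n] = cycle
            for s in succs[n]:        # update the READY list
                unscheduled_preds[s] -= 1
                if unscheduled_preds[s] == 0:
                    ready.append(s)
        cycle += 1                    # an empty `issuable` list means a stall cycle
    return start

# DAG of the four-instruction block shown earlier; every edge has latency 2.
edges = {(1, 3): 2, (2, 3): 2, (1, 4): 2}
print(list_schedule([1, 2, 3, 4], edges))
```

Replacing `issuable[0]` with a priority-based choice gives the heuristic variants discussed next.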


Heuristics for Selection


Heuristics for selecting from the READY List
pick a node with the longest path to a leaf in the dependence graph
pick a node with most immediate successors
pick a node that can go to a less busy pipeline (in a superscalar)



Pick a node with the longest path to a leaf in the dependence graph
Algorithm (for node x)
  If x has no successors then d_x = 0
  else d_x = MAX(d_y + c_xy) over all successors y of x
  Visit nodes in reverse breadth-first order (successors before predecessors)



Pick a node with the most immediate successors
Rationale: scheduling the node with the highest out-degree resolves the most dependences
Algorithm (for node x):
  f_x = number of immediate successors of x
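
Both priority functions are easy to compute once the successors of each node are known. The sketch below uses the nine-node DAG of the example that follows; the edge list is a reading of that figure and should be treated as an assumption. A memoized recursion is used in place of the reverse breadth-first traversal, which gives the same values.

```python
from functools import lru_cache

# (x, y): c_xy, as read off the nine-node example DAG (assumed encoding).
edges = {(1, 2): 1, (2, 7): 1, (6, 7): 4, (4, 5): 3, (7, 8): 3, (7, 9): 3}
nodes = range(1, 10)

succs = {x: [] for x in nodes}
for (x, y), c in edges.items():
    succs[x].append((y, c))

@lru_cache(maxsize=None)
def d(x):
    """Longest latency-weighted path from x to a leaf (0 for a leaf)."""
    return max((d(y) + c for y, c in succs[x]), default=0)

def f(x):
    """Number of immediate successors of x."""
    return len(succs[x])

for x in nodes:
    print(x, d(x), f(x))
```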


Example
Dependence DAG (edge x -> y labeled with its latency):
  1 -> 2 (1), 2 -> 7 (1), 6 -> 7 (4), 7 -> 8 (3), 7 -> 9 (3), 4 -> 5 (3)
  Node 3 has no predecessors and no successors

Priorities (d = longest path to a leaf, f = number of immediate successors):
  node   1  2  3  4  5  6  7  8  9
  d      5  4  0  3  0  7  3  0  0
  f      1  1  0  1  0  1  2  0  0

Scheduling steps (READY list sorted by priority):
  READY = { 6, 1, 4, 3 }    schedule 6
  READY = { 1, 4, 3 }       schedule 1, which makes 2 ready
  READY = { 2, 4, 3 }       schedule 2, which makes 7 ready
  READY = { 7, 4, 3 }       7 cannot issue yet without a stall, so schedule 4, which makes 5 ready
  READY = { 7, 3, 5 }       schedule 7, which makes 8 and 9 ready
  READY = { 3, 5, 8, 9 }    schedule 3
  READY = { 5, 8, 9 }       schedule 5
  READY = { 8, 9 }          schedule 8
  READY = { 9 }             schedule 9
  READY = { }

Final schedule: 6 1 2 4 7 3 5 8 9
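
The walk-through can be replayed with a short script. It is a sketch under stated assumptions: the edge list is a reading of the example DAG, ties are broken by larger d, then larger f, then smaller instruction number, and an edge latency c means the successor may issue c cycles after its predecessor issues on a single-issue machine.

```python
edges = {(1, 2): 1, (2, 7): 1, (6, 7): 4, (4, 5): 3, (7, 8): 3, (7, 9): 3}
nodes = list(range(1, 10))

preds = {n: [] for n in nodes}
succs = {n: [] for n in nodes}
for (x, y), c in edges.items():
    preds[y].append((x, c))
    succs[x].append((y, c))

def longest_path(x):
    """d_x: longest latency-weighted path from x to a leaf."""
    return max((longest_path(y) + c for y, c in succs[x]), default=0)

d = {n: longest_path(n) for n in nodes}
f = {n: len(succs[n]) for n in nodes}

def priority(n):
    # Smaller tuple = higher priority: larger d, then larger f, then lower number.
    return (-d[n], -f[n], n)

def schedule():
    remaining = {n: len(preds[n]) for n in nodes}
    ready = {n for n in nodes if remaining[n] == 0}
    start, order, cycle = {}, [], 1
    while ready:
        # Only nodes that can issue in this cycle without a stall are candidates.
        issuable = [n for n in ready
                    if all(start[p] + c <= cycle for p, c in preds[n])]
        if issuable:
            n = min(issuable, key=priority)
            ready.remove(n)
            start[n] = cycle
            order.append(n)
            for s, _ in succs[n]:
                remaining[s] -= 1
                if remaining[s] == 0:
                    ready.add(s)
        cycle += 1
    return order

print(schedule())  # [6, 1, 2, 4, 7, 3, 5, 8, 9]
```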

List Scheduling with Resource Constraints
Create a Dependence DAG of a Basic Block
Topological Sort
  READY = nodes with no predecessors
  Loop until READY list is empty
    Let n in the READY list be the node with the highest priority
    Schedule n in the earliest slot that satisfies precedence + resource constraints
    Update READY list
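
A minimal sketch of this variant, using the machine from the earlier resource-constraint example (one ALU op per cycle in parallel with one memory op; LD takes 2 cycles, ALU ops and ST take 1). The seven instructions are those of the example that follows, but the dependence edges (true dependences through registers only), the priority function, and the assumption that the memory unit accepts a new operation every cycle are illustrative choices, not taken from the slides.

```python
# id: (functional unit, result latency in cycles)
instrs = {
    1: ("MEM", 2),  # LD  r2, 0(r1)
    2: ("ALU", 1),  # ADD r2, r2, r3
    3: ("MEM", 1),  # ST  r4, 4(r5)
    4: ("MEM", 2),  # LD  r6, 8(r1)
    5: ("ALU", 1),  # ADD r6, r6, r2
    6: ("ALU", 1),  # ADD r6, r6, r4
    7: ("MEM", 1),  # ST  r7, 0(r7)
}
# Assumed dependence edges (true dependences through registers only).
deps = [(1, 2), (2, 5), (4, 5), (5, 6)]

preds = {n: [] for n in instrs}
succs = {n: [] for n in instrs}
for a, b in deps:
    preds[b].append(a)
    succs[a].append(b)

def height(n):
    """Priority: latency-weighted longest path from n to the end of the block."""
    return instrs[n][1] + max((height(s) for s in succs[n]), default=0)

def schedule():
    start = {}
    busy = set()                     # (cycle, unit) issue slots already taken
    remaining = {n: len(preds[n]) for n in instrs}
    ready = [n for n in instrs if remaining[n] == 0]
    while ready:
        # Highest priority first; ties go to the lower instruction number.
        n = max(ready, key=lambda m: (height(m), -m))
        ready.remove(n)
        unit = instrs[n][0]
        # Earliest cycle allowed by the precedence constraints ...
        cycle = max((start[p] + instrs[p][1] for p in preds[n]), default=1)
        # ... pushed later until the functional unit has a free issue slot.
        while (cycle, unit) in busy:
            cycle += 1
        busy.add((cycle, unit))
        start[n] = cycle
        for s in succs[n]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return start

print(schedule())
```

Under these assumptions the memory unit issues 1, 4, 3, 7 in cycles 1 to 4 and the ALU issues 2, 5, 6 in cycles 3 to 5.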


List Scheduling Example

1: LD r2, 0(r1)
2: ADD r2,r2,r3
3: ST r4,4(r5)
4: LD r6,8(r1)
5: ADD r6,r6,r2
6: ADD r6,r6,r4
7: ST r7,0(r7)

[Figure: dependence DAG with roots 7, 1, 4 and 3, followed by nodes 2, 5 and 6; edges are labeled with latencies of 1 and 3]

Scheduling steps:
  READY = {1, 3, 4, 7}    schedule 1
  READY = {2, 3, 4, 7}    schedule 2
  READY = {3, 4, 7}       schedule 4
  READY = {3, 5, 7}       schedule 3
  READY = {5, 7}          schedule 5
  READY = {6, 7}          schedule 6
  READY = {7}             schedule 7
  READY = { }

Resulting reservation table (operations listed in time order within each row):
  ALUop   2 5 6
  MEM 1   1 4 3 7
  MEM 2   1 4

Register Allocation and Instruction Scheduling
If Register Allocation is done before Instruction Scheduling
  it restricts the choices for scheduling


Example
1: LD r2, 0(r1)
2: ADD r3,r3,r2
3: LD r2,4(r5)
4: ADD r6,r6,r2

Dependences: true 1 -> 2 and 3 -> 4 (through r2), anti 2 -> 3 (3 overwrites the r2 that 2 reads), output 1 -> 3 (both write r2)

Anti-Dependence between 3 and 2.
There is really no data flowing...
How to fix this?
How about using a different Register?

1: LD r2, 0(r1)
2: ADD r3,r3,r2
3: LD r4,4(r5)
4: ADD r6,r6,r4

Eliminated the Anti-Dependence but increased the number of registers, i.e., increased Register Pressure
With r4, the pairs (1, 2) and (3, 4) are independent, so the two loads can be issued back to back
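
Register renaming within a basic block can be sketched as below: every definition gets a fresh name, and later uses refer to the latest name. The tuple encoding and the virtual register names `v0`, `v1`, ... are hypothetical; a real compiler would also have to map values that are live at the end of the block back to their original registers.

```python
from itertools import count

def rename(block):
    """Give every definition a fresh register name.

    block: list of (opcode, dest, sources) tuples with registers as strings.
    Returns a new block with the same data flow but without anti or
    output dependences between registers.
    """
    fresh = (f"v{i}" for i in count())
    current = {}                      # original name -> latest fresh name
    out = []
    for op, dest, srcs in block:
        new_srcs = [current.get(s, s) for s in srcs]   # read the latest definitions
        current[dest] = next(fresh)                    # fresh name for each write
        out.append((op, current[dest], new_srcs))
    return out

block = [
    ("LD", "r2", ["r1"]),         # 1: LD  r2, 0(r1)
    ("ADD", "r3", ["r3", "r2"]),  # 2: ADD r3, r3, r2
    ("LD", "r2", ["r5"]),         # 3: LD  r2, 4(r5)
    ("ADD", "r6", ["r6", "r2"]),  # 4: ADD r6, r6, r2
]
renamed = rename(block)
for ins in renamed:
    print(ins)
```

After renaming, instruction 3 writes a register that instruction 2 never reads, so the anti dependence is gone, at the cost of more registers in use.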

Register Allocation and Instruction Scheduling
If Register Allocation is done before Instruction Scheduling
  it restricts the choices for scheduling
If Instruction Scheduling is done before Register Allocation
  Register allocation may spill registers
  The spill code will change the carefully constructed schedule


Scheduling across Basic Blocks


The number of Instructions in a Basic Block is small
  Cannot keep multiple units with long pipelines busy by just scheduling within a basic block

Need to handle Control Dependence
  Scheduling constraints across basic blocks
  Scheduling policy


Moving across Basic Blocks

Downward to an adjacent Basic Block
  Is there a path to B that does not execute A?
Upward to an adjacent Basic Block
  Is there a path from C that does not reach A?

[Figure: control-flow graph with blocks A, B and C]

Trace Scheduling
Find the most common Trace of Basic Blocks
Use profiling information
Combine the Basic Blocks in the trace and schedule
them as one Block
Create clean-up code if the execution goes off-trace
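
The first step, picking the most common trace from profile data, can be sketched as a greedy walk over edge counts. The CFG, the counts, and the function name `pick_trace` are hypothetical; the sketch covers trace selection only and does not generate the clean-up (compensation) code.

```python
def pick_trace(entry, succ_counts):
    """Follow the most frequently taken edge from `entry` until a block
    repeats or has no successor."""
    trace, block = [entry], entry
    while succ_counts.get(block):
        block = max(succ_counts[block], key=succ_counts[block].get)
        if block in trace:        # stop at a back edge
            break
        trace.append(block)
    return trace

# Hypothetical profile: execution counts of the edges of a small CFG.
profile = {
    "A": {"B": 90, "C": 10},
    "B": {"D": 90},
    "C": {"D": 10},
    "D": {"E": 70, "F": 30},
    "E": {"G": 70},
    "F": {"G": 30},
}
print(pick_trace("A", profile))  # ['A', 'B', 'D', 'E', 'G']
```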


[Figure: control-flow graph with the selected trace of basic blocks]

Large Basic Blocks via Code Duplication
Creating large extended Basic Blocks by duplication
Schedule the larger Blocks

[Figure: A branches to B and C, which join at D followed by E (left); after duplication B and C each get their own copy of D and E (right)]

Scheduling Loops
Loop bodies are small
But a lot of time is spent in loops due to the large number of iterations
Need better ways to schedule loops

Loop Example
Machine Model
One load/store unit
load 2 cycles
store 2 cycles
Two arithmetic units
add 2 cycles
branch 2 cycles (no delay slot)
multiply 3 cycles
Both units are pipelined (initiate one op each cycle)

Source Code
for i = 1 to N
A[i] = A[i] * b


Assembly Code
loop:
ld r6, (r2)
mul r6, r6, r3
st r6, (r2)
add r2, r2, 4
ble r2, r5, loop


Schedule (7 cycles per iteration, excluding the branch)
Each row is one pipeline stage; operations are listed in time order within a row

load/store, stage 1:              ld   st
load/store, stage 2:              ld   st
arithmetic (mul/ble), stage 1:    mul  ble
arithmetic (mul/ble), stage 2:    mul  ble
arithmetic (mul), stage 3:        mul
arithmetic (add), stage 1:        add
arithmetic (add), stage 2:        add

Loop Unrolling
Unroll the Loop Body a few times
Pros:
  Creates a much larger basic block for the body
  Eliminates a few loop-bound checks
Cons:
  Much larger program
  Needs setup code (for when # of iterations < unroll factor)
  Beginning and end of the schedule can still have unused slots
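
The shape of an unrolled loop, including the extra code for iteration counts that are not a multiple of the unroll factor, can be illustrated at source level. This is only a sketch in Python with hypothetical function names; the compiler applies the same idea to the machine code of the loop.

```python
def scale(a, b):
    """Reference loop: A[i] = A[i] * b."""
    for i in range(len(a)):
        a[i] = a[i] * b

def scale_unrolled(a, b):
    """Same loop unrolled by a factor of 2, with a remainder loop."""
    n = len(a)
    i = 0
    while i + 1 < n:            # one bounds check per two elements
        a[i] = a[i] * b
        a[i + 1] = a[i + 1] * b
        i += 2
    while i < n:                # leftover iteration when n is odd
        a[i] = a[i] * b
        i += 1

x = [1, 2, 3, 4, 5]
y = list(x)
scale(x, 3)
scale_unrolled(y, 3)
print(x == y)
```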


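Loop Example (unrolled twice)
loop:
ld r6, (r2)
mul r6, r6, r3
st r6, (r2)
add r2, r2, 4
ld r6, (r2)
mul r6, r6, r3
st r6, (r2)
add r2, r2, 4
ble r2, r5, loop

Schedule (7 cycles per iteration)
Each row is one pipeline stage; operations are listed in time order within a row

load/store, stage 1:              ld   st   ld   st
load/store, stage 2:              ld   st   ld   st
arithmetic (mul/ble), stage 1:    mul  mul  ble
arithmetic (mul/ble), stage 2:    mul  mul  ble
arithmetic (mul), stage 3:        mul  mul
arithmetic (add), stage 1:        add  add
arithmetic (add), stage 2:        add  add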

Loop Unrolling
Rename Registers
Use Different Registers in Different Loop Iterations


Loop Example
loop:
ld r6,(r2)
mul r6, r6, r3
st r6,(r2)
add r2, r2, 4
ld r7,(r2)
mul r7, r7, r3
st r7,(r2)
add r2, r2, 4
ble r2, r5, loop


Eliminate Unnecessary Dependences
  Use more registers to eliminate anti and output dependences
  Eliminate dependent chains of calculations when possible


Loop Example
loop:
ld r6,(r1)
mul r6, r6, r3
st r6,(r1)
add r2, r1, 4
ld r7,(r2)
mul r7, r7, r3
st r7,(r2)
add r1, r2, 4
ble r1, r5, loop


Loop Example
loop:
ld r6,(r1)
mul r6, r6, r3
st r6,(r1)
add r2, r1, 4
ld r7,(r2)
mul r7, r7, r3
st r7,(r2)
add r1, r1, 8
ble r1, r5, loop


Schedule (3.5 cycles per iteration)
Each row is one pipeline stage; operations are listed in time order within a row

load/store, stage 1:              ld   ld   st   st
load/store, stage 2:              ld   ld   st   st
arithmetic (mul/ble), stage 1:    mul  mul  ble
arithmetic (mul/ble), stage 2:    mul  mul  ble
arithmetic (mul), stage 3:        mul  mul
arithmetic (add), stage 1:        add  add
arithmetic (add), stage 2:        add  add

Software Pipelining
Try to overlap Multiple Iterations so that the Slots
will be filled
Find the Steady-State Window so that:
All the instructions of the loop body are executed
But from different iterations
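
The structure of a software-pipelined loop (prologue, steady state, epilogue) can be illustrated at source level for A[i] = A[i] * b. This is a sketch with a hypothetical function name: Python runs the statements one after the other, so it only shows which iteration each operation in the steady-state body belongs to, not the actual overlap in hardware.

```python
def scale_pipelined(a, b):
    """A[i] = A[i] * b with load, multiply and store taken from different iterations."""
    n = len(a)
    if n < 2:                       # too short to overlap: use the plain loop
        for i in range(n):
            a[i] = a[i] * b
        return
    # Prologue: fill the pipeline.
    loaded = a[0]                   # load(0)
    product = loaded * b            # mul(0)
    loaded = a[1]                   # load(1)
    # Steady state: each trip performs store(i), mul(i+1) and load(i+2).
    for i in range(n - 2):
        a[i] = product
        product = loaded * b
        loaded = a[i + 2]
    # Epilogue: drain the pipeline.
    a[n - 2] = product              # store(n-2)
    a[n - 1] = loaded * b           # mul(n-1) and store(n-1)

data = [1, 2, 3, 4, 5]
scale_pipelined(data, 3)
print(data)  # [3, 6, 9, 12, 15]
```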


Loop Example
Assembly Code
loop:
ld r6, (r2)
mul r6, r6, r3
st r6, (r2)
add r2, r2, 4
ble r2, r5, loop

Schedule
Each row is one pipeline stage; operations are listed in time order and the suffix is the iteration number

load/store, stage 1:              ld ld1 ld2 st ld3 st1 ld4 st2 ld5 st3 ld6
load/store, stage 2:              ld ld1 ld2 st ld3 st1 ld4 st2 ld5 st3
arithmetic (mul/ble), stage 1:    mul mul1 mul2 ble mul3 ble1 mul4 ble2 mul5
arithmetic (mul/ble), stage 2:    mul mul1 mul2 ble mul3 ble1 mul4 ble2
arithmetic (mul), stage 3:        mul mul1 mul2 mul3 mul4
arithmetic (add), stage 1:        add add1 add2 add3
arithmetic (add), stage 2:        add add1 add2 add3

Steady-State Window (2 cycles per iteration)

load/store, stage 1:              ld3  st1
load/store, stage 2:              st   ld3
arithmetic (mul/ble), stage 1:    mul2 ble
arithmetic (mul/ble), stage 2:    mul2
arithmetic (mul), stage 3:        mul1
arithmetic (add), stage 1:        add1
arithmetic (add), stage 2:        add

4 Iterations are Overlapped
  The values of r3 and r5 don't change
  4 regs for &A[i] (r2)
    each address is incremented by 4*4
  4 regs to keep the value A[i] (r6)
  The same registers can be reused after 4 of these blocks; generate code for 4 blocks, otherwise the values need to be moved between registers

Software Pipelining
Optimal use of Resources
Need a lot of Registers
  Values in multiple iterations need to be kept separate
Issues with Dependences:
  Executing a store instruction in an iteration before the branch instruction of a previous iteration has executed (writing when it should not have)
  Loads and stores are issued out of order (need to figure out the dependences before doing this)
Code Generation Issues:
  Generate preamble and postamble code
  Generate multiple blocks so that no register copy is needed

Summary
Overview of Instruction Scheduling
List Scheduling
Resource Constraints
Interaction with Register Allocation
Scheduling across Basic Blocks
Trace Scheduling
Scheduling for Loops
Loop Unrolling
Software Pipelining
