41-Instruction Scheduling and Software Pipelining-19!11!2024

Uploaded by

Aashish Mahato

Instruction scheduling

Instruction ordering
• When a compiler emits the instructions corresponding to a
program, it imposes a total order on them.
• However, that order is usually not the only valid one, in the sense
that it can be changed without modifying the program’s
behavior.
• For example, if two instructions i1 and i2 appear sequentially in
that order and are independent, then it is possible to swap them.
• Among all the valid permutations of the instructions composing a
program — i.e. those that preserve the program’s behavior — some
can be more desirable than others. For example, one order might
lead to a faster program on some machine, because of architectural
constraints.
• The aim of instruction scheduling is to find a valid order that
optimizes some metric, like execution speed.
Pipeline stalls
• Modern, pipelined architectures can usually issue at least one
instruction per clock cycle.
• However, an instruction can be executed only if the data it needs is
ready. Otherwise, the pipeline stalls for one or several cycles.
• Stalls can appear because some instructions, e.g. division, require
several cycles to complete, or because data has to be fetched from
memory.
Scheduling example
The following example will illustrate how proper scheduling
can reduce the time required to execute a piece of RTL
code.
We assume the following delays for instructions:

Instruction kind       RTL notation       Delay
Memory load or store   Ra ← Mem[Rb+c]     3
                       Mem[Rb+c] ← Ra
Multiplication         Ra ← Rb * Rc       2
Addition               Ra ← Rb + Rc       1
Scheduling example
Before scheduling           After scheduling
Cycle  Instruction          Cycle  Instruction
1      R1 ← Mem[RSP]        1      R1 ← Mem[RSP]
4      R1 ← R1 + R1         2      R2 ← Mem[RSP+1]
5      R2 ← Mem[RSP+1]      3      R3 ← Mem[RSP+2]
8      R1 ← R1 * R2         4      R1 ← R1 + R1
9      R2 ← Mem[RSP+2]      5      R1 ← R1 * R2
12     R1 ← R1 * R2         6      R2 ← Mem[RSP+3]
13     R2 ← Mem[RSP+3]      7      R1 ← R1 * R3
16     R1 ← R1 * R2         9      R1 ← R1 * R2
18     Mem[RSP+4] ← R1      11     Mem[RSP+4] ← R1

After scheduling (including renaming), the last instruction is issued at cycle 11 instead of 18!
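The issue cycles in both columns can be reproduced with a small simulator. This is a sketch under simple assumptions: one instruction issues per cycle, and an operand becomes usable `delay` cycles after its producer issues. The `(uses, defs, delay)` encoding is made up for illustration; it is not the lecture's notation.

```python
def issue_cycles(instrs):
    """instrs: list of (uses, defs, delay) triples, in program order.
    Returns the cycle at which each instruction is issued: an instruction
    issues at the earliest cycle after its predecessor at which all the
    registers it uses are ready."""
    ready = {}      # register -> cycle at which its value is ready
    cycles = []
    clock = 0
    for uses, defs, delay in instrs:
        # registers never defined (e.g. RSP) are ready from cycle 1 on
        clock = max(clock + 1, *(ready.get(r, 1) for r in uses), 1)
        for r in defs:
            ready[r] = clock + delay
        cycles.append(clock)
    return cycles

before = [
    (["RSP"], ["R1"], 3),          # R1 <- Mem[RSP]
    (["R1"], ["R1"], 1),           # R1 <- R1 + R1
    (["RSP"], ["R2"], 3),          # R2 <- Mem[RSP+1]
    (["R1", "R2"], ["R1"], 2),     # R1 <- R1 * R2
    (["RSP"], ["R2"], 3),          # R2 <- Mem[RSP+2]
    (["R1", "R2"], ["R1"], 2),     # R1 <- R1 * R2
    (["RSP"], ["R2"], 3),          # R2 <- Mem[RSP+3]
    (["R1", "R2"], ["R1"], 2),     # R1 <- R1 * R2
    (["R1", "RSP"], [], 3),        # Mem[RSP+4] <- R1
]
after = [
    (["RSP"], ["R1"], 3),          # R1 <- Mem[RSP]
    (["RSP"], ["R2"], 3),          # R2 <- Mem[RSP+1]
    (["RSP"], ["R3"], 3),          # R3 <- Mem[RSP+2]
    (["R1"], ["R1"], 1),           # R1 <- R1 + R1
    (["R1", "R2"], ["R1"], 2),     # R1 <- R1 * R2
    (["RSP"], ["R2"], 3),          # R2 <- Mem[RSP+3]
    (["R1", "R3"], ["R1"], 2),     # R1 <- R1 * R3
    (["R1", "R2"], ["R1"], 2),     # R1 <- R1 * R2
    (["R1", "RSP"], [], 3),        # Mem[RSP+4] <- R1
]
print(issue_cycles(before))  # -> [1, 4, 5, 8, 9, 12, 13, 16, 18]
print(issue_cycles(after))   # -> [1, 2, 3, 4, 5, 6, 7, 9, 11]
```

The simulator confirms the slide's claim: the scheduled version issues its last instruction at cycle 11 instead of 18.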
Instruction dependences

An instruction i2 depends on an instruction i1 when it is not possible to execute i2 before i1 without changing the behavior of the program.
The most common reason for dependence is data dependence: i2 uses a value that is computed by i1.
However, as we will see, there are other kinds of dependences.
Data dependences

We distinguish three kinds of dependences between two instructions i1 and i2:
1. true dependence —i2 reads a value written by i1 (read after write or RAW),
2. antidependence —i2 writes a value read by i1 (write after read or WAR),
3. output dependence —i2 writes a value written by i1 (write after write or WAW).
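The three kinds can be detected from the registers each instruction reads and writes. The helper below is a sketch; the read/write-set encoding is made up for illustration.

```python
def classify(i1_reads, i1_writes, i2_reads, i2_writes):
    """Classify the dependences of i2 on an earlier instruction i1,
    given the sets of registers each one reads and writes."""
    kinds = []
    if i1_writes & i2_reads:
        kinds.append("RAW")   # true dependence
    if i1_reads & i2_writes:
        kinds.append("WAR")   # antidependence
    if i1_writes & i2_writes:
        kinds.append("WAW")   # output dependence
    return kinds

# R1 <- Mem[RSP] followed by R4 <- R4 + R1: a true dependence.
print(classify({"RSP"}, {"R1"}, {"R4", "R1"}, {"R4"}))  # -> ['RAW']
# Two loads both writing R1: an output dependence.
print(classify({"RSP"}, {"R1"}, {"RSP"}, {"R1"}))       # -> ['WAW']
```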
Antidependences
Antidependences are not real dependences, in the sense
that they do not arise from the flow of data. They are due to
a single location being used to store different values.
Most of the time, antidependences can be removed by
renaming locations —e.g. registers.
In the example below, the program on the left contains a WAW (output) dependence between the two memory load instructions; it can be removed by renaming the second use of R1, as shown on the right.

Before renaming        After renaming
R1 ← Mem[RSP]          R1 ← Mem[RSP]
R4 ← R4 + R1           R4 ← R4 + R1
R1 ← Mem[RSP+1]        R2 ← Mem[RSP+1]
R4 ← R4 + R1           R4 ← R4 + R2
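Renaming can be sketched as a single pass over straight-line code: every definition gets a fresh register, and later uses are rewritten to the current name. The `(defined register, used registers)` encoding and the fresh names starting at R100 are illustrative assumptions, not the lecture's notation.

```python
import itertools

def rename(instrs):
    """instrs: list of (def_reg_or_None, uses). Returns the renamed list,
    in which no register is ever defined twice."""
    fresh = (f"R{i}" for i in itertools.count(100))  # assumed-free names
    current = {}                  # original register -> its current name
    out = []
    for d, uses in instrs:
        new_uses = [current.get(u, u) for u in uses]
        if d is not None:
            current[d] = next(fresh)
            out.append((current[d], new_uses))
        else:
            out.append((None, new_uses))
    return out

# The four-instruction example above, before renaming:
prog = [("R1", ["RSP"]), ("R4", ["R4", "R1"]),
        ("R1", ["RSP"]), ("R4", ["R4", "R1"])]
print(rename(prog))
# -> [('R100', ['RSP']), ('R101', ['R4', 'R100']),
#     ('R102', ['RSP']), ('R103', ['R101', 'R102'])]
```

After the pass, the two loads define distinct registers, so the WAW and WAR dependences between them are gone.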
Computing dependences

Identifying dependences among instructions that only access registers is easy.
Instructions that access memory are harder to handle: in general, it is not possible to know whether two such instructions refer to the same memory location.
Conservative approximations —not examined here —therefore have to be used.
Dependence graph

The dependence graph is a directed graph representing dependences among instructions.
Its nodes are the instructions to schedule, and there is an edge from node n1 to node n2 iff the instruction of n2 depends on that of n1.
Any topological sort of the nodes of this graph represents a valid way to schedule the instructions.
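The graph construction and the topological sort can be sketched as follows, for register operands only (a real compiler must also handle memory accesses conservatively, as noted above). The `(name, reads, writes)` encoding is an illustrative assumption; `graphlib` is the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

def dependence_graph(instrs):
    """instrs: list of (name, reads, writes) triples, in program order.
    Returns {node: set of nodes it depends on}, with one edge per RAW,
    WAR or WAW conflict on a register."""
    deps = {name: set() for name, _, _ in instrs}
    for j, (n2, r2, w2) in enumerate(instrs):
        for n1, r1, w1 in instrs[:j]:
            if (w1 & r2) or (r1 & w2) or (w1 & w2):
                deps[n2].add(n1)
    return deps

g = dependence_graph([
    ("a", {"RSP"}, {"R1"}),        # R1 <- Mem[RSP]
    ("b", {"R1"}, {"R1"}),         # R1 <- R1 + R1
    ("c", {"RSP"}, {"R2"}),        # R2 <- Mem[RSP+1]
    ("d", {"R1", "R2"}, {"R1"}),   # R1 <- R1 * R2
])
# Any static order is a valid schedule: a before b, and a, b, c before d.
print(list(TopologicalSorter(g).static_order()))
```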
Dependence graph example
Name  Instruction
a.    R1 ← Mem[RSP]
b.    R1 ← R1 + R1
c.    R2 ← Mem[RSP+1]
d.    R1 ← R1 * R2
e.    R2 ← Mem[RSP+2]
f.    R1 ← R1 * R2
g.    R2 ← Mem[RSP+3]
h.    R1 ← R1 * R2
i.    Mem[RSP+4] ← R1

[Graph: true-dependence edges a→b, b→d, c→d, d→f, e→f, f→h, g→h, h→i; antidependence edges on R2, e.g. d→e and f→g.]
Difficulty of scheduling

Optimal instruction scheduling is NP-complete.
As always, this implies that we will use techniques based on heuristics to find a good —but sometimes not optimal —solution to that problem.
List scheduling is a technique to schedule the instructions of a single basic block.
Its basic idea is to simulate the execution of the instructions, and to try to schedule instructions only when all their operands can be used without stalling the pipeline.
List scheduling algorithm

The list scheduling algorithm maintains two lists:
– ready is the list of instructions that could be scheduled without stall, ordered by priority,
– active is the list of instructions that are being executed.
At each step, the highest-priority instruction from ready is scheduled and moved to active, where it stays for a time equal to its delay.
Before scheduling is performed, renaming is done to remove all antidependences that can be removed.
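The algorithm can be sketched as below. The predecessor map, delay table and priority table are assumed as inputs (their encoding is illustrative); the heap keys negate priorities because Python's `heapq` is a min-heap. Run on the lecture's nine-instruction example, it reproduces the traced schedule.

```python
import heapq

def list_schedule(preds, delay, priority):
    """preds: node -> list of nodes it depends on (after renaming).
    Returns {node: issue cycle}."""
    succs = {n: [] for n in preds}
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)
    npreds = {n: len(ps) for n, ps in preds.items()}
    ready = [(-priority[n], n) for n, c in npreds.items() if c == 0]
    heapq.heapify(ready)
    active = []        # (cycle at which the result becomes ready, node)
    schedule = {}
    cycle = 1
    while ready or active:
        # retire finished instructions; their successors may become ready
        for fin, n in [x for x in active if x[0] <= cycle]:
            active.remove((fin, n))
            for s in succs[n]:
                npreds[s] -= 1
                if npreds[s] == 0:
                    heapq.heappush(ready, (-priority[s], s))
        # issue (at most) the highest-priority ready instruction
        if ready:
            _, n = heapq.heappop(ready)
            schedule[n] = cycle
            active.append((cycle + delay[n], n))
        cycle += 1
    return schedule

preds = {"a": [], "b": ["a"], "c": [], "d": ["b", "c"], "e": [],
         "f": ["d", "e"], "g": [], "h": ["f", "g"], "i": ["h"]}
delay = {"a": 3, "b": 1, "c": 3, "d": 2, "e": 3,
         "f": 2, "g": 3, "h": 2, "i": 3}
priority = {"a": 13, "b": 10, "c": 12, "d": 9, "e": 10,
            "f": 7, "g": 8, "h": 5, "i": 3}
sched = list_schedule(preds, delay, priority)
print(sched)  # i is issued at cycle 11, as in the example
```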
Prioritizing instructions
Nodes (i.e. instructions) are sorted by priority in the ready
list. Several schemes exist to compute the priority of a node,
which can be equal to:
– the length of the longest latency-weighted path from it
to a root of the dependence graph,
– the number of its immediate successors,
– the number of its descendants,
– its latency,
– etc.
Unfortunately, no single scheme is better for all cases.
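The first scheme, used in the example that follows, can be sketched as a memoized traversal. Here a "root" is taken to be a node with no successors, which matches the priorities annotated on the slides; the successor-map encoding is an illustrative assumption.

```python
from functools import lru_cache

def priorities(succs, delay):
    """succs: node -> list of successor nodes. A node's priority is the
    length of the longest latency-weighted path from it to a root."""
    @lru_cache(maxsize=None)
    def prio(n):
        if not succs[n]:
            return delay[n]
        return delay[n] + max(prio(s) for s in succs[n])
    return {n: prio(n) for n in succs}

succs = {"a": ["b"], "b": ["d"], "c": ["d"], "d": ["f"], "e": ["f"],
         "f": ["h"], "g": ["h"], "h": ["i"], "i": []}
delay = {"a": 3, "b": 1, "c": 3, "d": 2, "e": 3,
         "f": 2, "g": 3, "h": 2, "i": 3}
print(priorities(succs, delay))
# -> matches the slide: a13, b10, c12, d9, e10, f7, g8, h5, i3
```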
List scheduling example
A node's priority is the length of the longest latency-weighted path from it to a root of the dependence graph: a13, b10, c12, d9, e10, f7, g8, h5, i3.

Cycle  ready              active
1      [a13,c12,e10,g8]   [a]
2      [c12,e10,g8]       [a,c]
3      [e10,g8]           [a,c,e]
4      [b10,g8]           [b,c,e]
5      [d9,g8]            [d,e]
6      [g8]               [d,g]
7      [f7]               [f,g]
8      []                 [f,g]
9      [h5]               [h]
10     []                 [h]
11     [i3]               [i]
12     []                 [i]
13     []                 [i]
14     []                 []
Scheduling conflicts

It is hard to decide whether scheduling should be done before or after register allocation.
If register allocation is done first, it can introduce antidependences when reusing registers.
If scheduling is done first, register allocation can introduce spill code, destroying the schedule.
Solution: schedule first, then allocate registers, and schedule once more if spilling was necessary.
