41-Instruction Scheduling and Software Pipelining-19!11!2024
Instruction ordering
• When a compiler emits the instructions corresponding to a
program, it imposes a total order on them.
• However, that order is usually not the only valid one, in the sense
that it can be changed without modifying the program’s
behavior.
• For example, if two instructions i1 and i2 appear sequentially in
that order and are independent, then it is possible to swap them.
• Among all the valid permutations of the instructions composing a
program — i.e. those that preserve the program’s behavior — some
can be more desirable than others. For example, one order might
lead to a faster program on some machine, because of architectural
constraints.
• The aim of instruction scheduling is to find a valid order that
optimizes some metric, like execution speed.
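The notion of independence used above can be made concrete. The following is a minimal sketch, assuming that "independent" means there is no data dependence between the two instructions, and that each instruction is described by the set of registers it writes and the set it reads. The function name `independent` and the `(defs, uses)` encoding are illustrative and do not come from the slides.

```python
def independent(i1, i2):
    """Return True if i1 and i2 (in that order) can be swapped.

    Each instruction is a (defs, uses) pair of sets of register names.
    Assumption: only register dependences matter in this sketch.
    """
    defs1, uses1 = i1
    defs2, uses2 = i2
    return not (
        defs1 & uses2     # read-after-write: i2 reads what i1 writes
        or uses1 & defs2  # write-after-read: i2 overwrites what i1 reads
        or defs1 & defs2  # write-after-write: both write the same register
    )


# R1 <- R1 + R1  and  R2 <- Mem[RSP+1]  touch disjoint registers.
add = ({"R1"}, {"R1"})
load = ({"R2"}, {"RSP"})
# R1 <- R1 * R2 reads the R2 produced by the load.
mul = ({"R1"}, {"R1", "R2"})

print(independent(add, load))  # True
print(independent(load, mul))  # False
```

A real compiler would also have to account for dependences through memory, which this sketch ignores.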
Pipeline stalls
• Modern, pipelined architectures can usually issue at least one
instruction per clock cycle.
• However, an instruction can be executed only if the data it needs is
ready. Otherwise, the pipeline stalls for one or several cycles.
• Stalls can appear because some instructions, e.g. division, require
several cycles to complete, or because data has to be fetched from
memory.
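The stall behavior described above can be written as a simple recurrence. This is a sketch under stated assumptions, not a model of any particular processor: the pipeline issues at most one instruction per cycle, in program order, and each instruction p has a fixed, known delay(p) after which its result is ready. Here t(i) is the cycle at which instruction i is issued, and preds(i) is the set of instructions producing the values that i reads.

```latex
t(i_1) = 1, \qquad
t(i_k) = \max\Bigl( t(i_{k-1}) + 1,\;
         \max_{p \,\in\, \mathrm{preds}(i_k)} \bigl( t(p) + \mathrm{delay}(p) \bigr) \Bigr)
```

For instance, if a load with an assumed delay of 3 is issued at cycle 1 and the next instruction reads its result, that instruction is issued at cycle max(2, 1 + 3) = 4, so the pipeline stalls for two cycles.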
Scheduling example
The following example will illustrate how proper scheduling
can reduce the time required to execute a piece of RTL
code.
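We assume the following delays (in cycles) for instructions:
Instruction kind   RTL notation     Delay
Multiplication     Ra ← Rb * Rc     2
Addition           Ra ← Rb + Rc     1
The cycle numbers in the schedules that follow also imply a delay of 3 for memory loads (Ra ← Mem[Rb+c]).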
Scheduling example
Original order                   Scheduled order
Cycle  Instruction               Cycle  Instruction
1      R1 ← Mem[RSP]             1      R1 ← Mem[RSP]
4      R1 ← R1 + R1              2      R2 ← Mem[RSP+1]
5      R2 ← Mem[RSP+1]           3      R3 ← Mem[RSP+2]
8      R1 ← R1 * R2              4      R1 ← R1 + R1
9      R2 ← Mem[RSP+2]           5      R1 ← R1 * R2
12     R1 ← R1 * R2              6      R2 ← Mem[RSP+3]
13     R2 ← Mem[RSP+3]           7      R1 ← R1 * R3
16     R1 ← R1 * R2              9      R1 ← R1 * R2
18     Mem[RSP+4] ← R1           11     Mem[RSP+4] ← R1
The last instruction is issued at cycle 18 in the original order, and at cycle 11 in the scheduled order.
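The cycle numbers in the two columns can be checked with a short simulation. The following is a minimal sketch, assuming a pipeline that issues at most one instruction per cycle in program order and stalls until all source registers are ready. The load delay of 3 is not stated in the delay table; it is inferred from the cycle numbers. The names `DELAY`, `issue_cycles`, `ORIGINAL` and `SCHEDULED`, as well as the tuple encoding of instructions, are illustrative.

```python
# Cycles until the result of an instruction is available.
# Assumption: the load delay (3) is inferred from the schedule itself.
DELAY = {"load": 3, "mul": 2, "add": 1}


def issue_cycles(instrs):
    """Return the issue cycle of each (kind, dest, sources) instruction."""
    ready = {}   # register -> first cycle at which its value can be read
    cycle = 0    # issue cycle of the previous instruction
    cycles = []
    for kind, dest, sources in instrs:
        earliest = cycle + 1  # at most one instruction issued per cycle
        for reg in sources:
            # Registers never written here (e.g. RSP) are ready from cycle 1.
            earliest = max(earliest, ready.get(reg, 1))
        cycle = earliest
        cycles.append(cycle)
        if dest is not None:  # a store writes no register
            ready[dest] = cycle + DELAY[kind]
    return cycles


ORIGINAL = [
    ("load", "R1", ["RSP"]),
    ("add", "R1", ["R1", "R1"]),
    ("load", "R2", ["RSP"]),
    ("mul", "R1", ["R1", "R2"]),
    ("load", "R2", ["RSP"]),
    ("mul", "R1", ["R1", "R2"]),
    ("load", "R2", ["RSP"]),
    ("mul", "R1", ["R1", "R2"]),
    ("store", None, ["R1", "RSP"]),
]

SCHEDULED = [
    ("load", "R1", ["RSP"]),
    ("load", "R2", ["RSP"]),
    ("load", "R3", ["RSP"]),
    ("add", "R1", ["R1", "R1"]),
    ("mul", "R1", ["R1", "R2"]),
    ("load", "R2", ["RSP"]),
    ("mul", "R1", ["R1", "R3"]),
    ("mul", "R1", ["R1", "R2"]),
    ("store", None, ["R1", "RSP"]),
]

print(issue_cycles(ORIGINAL))   # [1, 4, 5, 8, 9, 12, 13, 16, 18]
print(issue_cycles(SCHEDULED))  # [1, 2, 3, 4, 5, 6, 7, 9, 11]
```

Note that the scheduled order uses an extra register, R3, so that the third load does not have to wait for the first multiplication to read R2.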
[Figure not recovered; surviving node labels: d9, e10, f7, g8, h5, i3]