Lec18 Tomasulo Algorithm
• The SUB.D instruction cannot execute because the dependence of ADD.D on DIV.D causes the
pipeline to stall; yet, SUB.D is not data dependent on anything in the pipeline. This hazard creates
a performance limitation that can be eliminated by not requiring instructions to execute in
program order.
• Dynamic scheduling enables out-of-order execution, which implies out-of-order completion (e.g., SUB.D can finish before the stalled ADD.D)
• In a dynamically scheduled pipeline, all instructions still pass through issue stage in order (in-order issue)
• To allow out-of-order execution, we essentially split the ID pipe stage of our simple five-stage pipeline into
two stages:
• 1. Issue—Decode instructions, check for structural hazards.
• 2. Read operands—Wait until no data hazards, then read operands.
• An instruction fetch stage precedes the issue stage and may fetch either into an instruction register or into a
queue of pending instructions; instructions are then issued from the register or queue. The execution stage
follows the read operands stage, just as in the five-stage pipeline. Execution may take multiple cycles,
depending on the operation.
• We will distinguish when an instruction begins execution from when it completes execution; between these
two times, the instruction is in execution.
Dynamic Scheduling
• In the classic five-stage pipeline, both structural and data hazards could be
checked during instruction decode (ID): When an instruction could execute
without hazards, it was issued from ID knowing that all data hazards had
been resolved.
• To allow us to begin executing the SUB.D, we must separate the issue
process into two parts:
• checking for any structural hazards and
• waiting for the absence of a data hazard.
• Thus, we still use in-order instruction issue (i.e., instructions issued in
program order), but we want an instruction to begin execution as soon as
its data operands are available. Such a pipeline does out-of-order
execution, which implies out-of-order completion.
Out-of-order execution introduces the possibility of WAR and WAW hazards. Consider the following code sequence:
DIV.D F0,F2,F4
ADD.D F6,F0,F8
SUB.D F8,F10,F14
MUL.D F6,F10,F8
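There is an antidependence between the ADD.D and the SUB.D on F8, and if the pipeline
executes the SUB.D before the ADD.D (which is waiting for the DIV.D), it will violate
the antidependence, yielding a WAR hazard. Likewise, the ADD.D and the MUL.D both
write F6 (an output dependence); if the MUL.D writes F6 before the ADD.D does, the
result is a WAW hazard.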
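The dependences in this four-instruction sequence can be listed mechanically. The following is a minimal sketch, not part of the lecture: the tuple encoding and the helper name `find_dependences` are assumptions made for illustration, and the pairwise check is deliberately naive (it does not account for an intervening write that would kill a dependence, which does not arise in this sequence).

```python
from itertools import combinations

# The sequence from the text, encoded as (opcode, dest, src1, src2).
program = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F6", "F0", "F8"),
    ("SUB.D", "F8", "F10", "F14"),
    ("MUL.D", "F6", "F10", "F8"),
]

def find_dependences(instrs):
    """Return (kind, earlier_op, later_op, register) for every register
    shared between an earlier and a later instruction (hypothetical helper)."""
    found = []
    # combinations() keeps program order, so `early` always precedes `late`.
    for early, late in combinations(instrs, 2):
        op_e, dst_e, *src_e = early
        op_l, dst_l, *src_l = late
        if dst_e in src_l:      # later reads what earlier writes
            found.append(("RAW", op_e, op_l, dst_e))
        if dst_l in src_e:      # later writes what earlier reads
            found.append(("WAR", op_e, op_l, dst_l))
        if dst_e == dst_l:      # both write the same register
            found.append(("WAW", op_e, op_l, dst_e))
    return found

for dep in find_dependences(program):
    print(dep)
```

For this sequence the sketch reports the true dependences DIV.D to ADD.D on F0 and SUB.D to MUL.D on F8, the antidependence ADD.D to SUB.D on F8, and the output dependence ADD.D to MUL.D on F6. Only the last two are name dependences, which is why renaming can remove them.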
Issue—Get the next instruction from the head of the instruction queue, which is maintained in
FIFO order to ensure the maintenance of correct data flow. If there is a matching
reservation station that is empty, issue the instruction to the station with the operand
values, if they are currently in the registers. If there is not an empty reservation station,
then there is a structural hazard and the instruction stalls until a station or buffer is freed. If
the operands are not in the registers, keep track of the functional units that will produce the
operands. This step renames registers, eliminating WAR and WAW hazards. (This stage is
sometimes called dispatch in a dynamically scheduled processor.)
Execute—If one or more of the operands is not yet available, monitor the common data bus
while waiting for it to be computed. When an operand becomes available, it is placed into
any reservation station waiting for it. When all the operands are available, the operation
can be executed by the corresponding functional unit. By delaying instruction execution
until the operands are available, RAW hazards are avoided.
Write result—When the result is available, write it on the CDB and from there into the registers
and into any reservation stations (including store buffers) waiting for this result. Stores are
buffered in the store buffer until both the value to be stored and the store address are
available, then the result is written as soon as the memory unit is free.
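The issue and write-result steps can be sketched in a few lines. This is an illustrative model under stated assumptions, not the hardware algorithm in full: the `Station` class and the `issue` and `broadcast` helpers are hypothetical names, operand values are kept as symbolic strings such as `R(F4)`, and the structural-hazard check (no free station) is left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Station:
    """One reservation station (hypothetical model for illustration)."""
    name: str
    busy: bool = False
    op: str = ""
    vj: Optional[str] = None  # value of operand 1, once known
    vk: Optional[str] = None  # value of operand 2, once known
    qj: Optional[str] = None  # station that will produce operand 1
    qk: Optional[str] = None  # station that will produce operand 2

    def ready(self):
        # Execution may begin only when no operand is still pending.
        return self.busy and self.qj is None and self.qk is None

def issue(station, op, dest, src1, src2, reg_status):
    """Issue: capture register values or producer tags, then rename dest."""
    station.busy, station.op = True, op
    station.vj = station.vk = station.qj = station.qk = None
    for src, v_field, q_field in ((src1, "vj", "qj"), (src2, "vk", "qk")):
        producer = reg_status.get(src)
        if producer is None:
            setattr(station, v_field, f"R({src})")  # value is in the register file
        else:
            setattr(station, q_field, producer)     # wait for this station's result
    reg_status[dest] = station.name                 # dest now names this station

def broadcast(tag, value, stations, reg_status):
    """Write result: deliver (tag, value) on the CDB to all waiting consumers."""
    for s in stations:
        if s.qj == tag:
            s.vj, s.qj = value, None
        if s.qk == tag:
            s.vk, s.qk = value, None
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            del reg_status[reg]  # register file now holds the value

# Usage: two loads are outstanding when MULTD F0,F2,F4 issues.
reg_status = {"F2": "Load2", "F6": "Load1"}
mult1 = Station("Mult1")
issue(mult1, "MULTD", "F0", "F2", "F4", reg_status)
broadcast("Load2", "M(A2)", [mult1], reg_status)
```

Because `issue` records a station name rather than a register name for each pending operand, a later instruction that overwrites the same architectural register cannot disturb it; this is the renaming that removes WAR and WAW hazards.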
Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
(Time: FU count down. There are 3 FP adder reservation stations and 2 FP multiplier reservation stations.)

Register result status:
Clock        F0     F2     F4     F6     F8     F10    F12    ...    F30
0      FU
(Clock: clock cycle counter.)
Tomasulo Example Cycle 1
Instruction status:                 Exec   Write
Instruction   j     k       Issue   Comp   Result           Busy   Address
LD     F6     34+   R2      1                       Load1   Yes    34+R2
LD     F2     45+   R3                              Load2   No
MULTD  F0     F2    F4                              Load3   No
SUBD   F8     F6    F2
DIVD   F10    F0    F6
ADDD   F6     F8    F2
Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   MULTD          R(F4)   Load2
      Mult2  No

Register result status:
Clock        F0     F2     F4     F6     F8     F10    F12    ...    F30
3      FU    Mult1  Load2         Load1
Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   Yes   SUBD   M(A1)                   Load2
      Add2   No
      Add3   No
      Mult1  Yes   MULTD          R(F4)   Load2
      Mult2  No

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
2     Add1   Yes   SUBD   M(A1)   M(A2)
      Add2   No
      Add3   No
10    Mult1  Yes   MULTD  M(A2)   R(F4)
      Mult2  Yes   DIVD           M(A1)   Mult1
Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
1     Add1   Yes   SUBD   M(A1)   M(A2)
      Add2   Yes   ADDD           M(A2)   Add1
      Add3   No
9     Mult1  Yes   MULTD  M(A2)   R(F4)
      Mult2  Yes   DIVD           M(A1)   Mult1

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
0     Add1   Yes   SUBD   M(A1)   M(A2)
      Add2   Yes   ADDD           M(A2)   Add1
      Add3   No
8     Mult1  Yes   MULTD  M(A2)   R(F4)
      Mult2  Yes   DIVD           M(A1)   Mult1
Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
2     Add2   Yes   ADDD   (M-M)   M(A2)
      Add3   No
7     Mult1  Yes   MULTD  M(A2)   R(F4)
      Mult2  Yes   DIVD           M(A1)   Mult1

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
1     Add2   Yes   ADDD   (M-M)   M(A2)
      Add3   No
6     Mult1  Yes   MULTD  M(A2)   R(F4)
      Mult2  Yes   DIVD           M(A1)   Mult1

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
0     Add2   Yes   ADDD   (M-M)   M(A2)
      Add3   No
5     Mult1  Yes   MULTD  M(A2)   R(F4)
      Mult2  Yes   DIVD           M(A1)   Mult1
Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
4     Mult1  Yes   MULTD  M(A2)   R(F4)
      Mult2  Yes   DIVD           M(A1)   Mult1

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
3     Mult1  Yes   MULTD  M(A2)   R(F4)
      Mult2  Yes   DIVD           M(A1)   Mult1

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
2     Mult1  Yes   MULTD  M(A2)   R(F4)
      Mult2  Yes   DIVD           M(A1)   Mult1

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
1     Mult1  Yes   MULTD  M(A2)   R(F4)
      Mult2  Yes   DIVD           M(A1)   Mult1

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
0     Mult1  Yes   MULTD  M(A2)   R(F4)
      Mult2  Yes   DIVD           M(A1)   Mult1
Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
40    Mult2  Yes   DIVD   M*F4    M(A1)

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
1     Mult2  Yes   DIVD   M*F4    M(A1)

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
0     Mult2  Yes   DIVD   M*F4    M(A1)

Reservation Stations:     S1      S2      RS      RS
Time  Name   Busy  Op     Vj      Vk      Qj      Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  Yes   DIVD   M*F4    M(A1)
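The timing behind the tables can be approximated with a small dataflow calculation. This is a rough sketch under explicit assumptions, not a full Tomasulo simulator: one instruction issues per cycle in program order, an instruction starts executing in the cycle after its last operand is written on the CDB (or the cycle after issue if its operands are ready), and it writes its result the cycle after execution completes. Structural hazards are ignored. The latencies of 2 (ADDD, SUBD), 10 (MULTD) and 40 (DIVD) cycles match the countdown values in the tables; the 2-cycle load latency and the names `LATENCY`, `PROGRAM` and `timeline` are assumptions.

```python
# Assumed latencies in cycles; only the load latency is not shown in the tables.
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# (opcode, destination, FP source registers) in program order.
# The loads' integer address registers are assumed to be ready.
PROGRAM = [
    ("LD", "F6", []),
    ("LD", "F2", []),
    ("MULTD", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]

def timeline(program, latency):
    """Return (op, issue, exec_complete, write_result) for each instruction."""
    written_at = {}  # register -> cycle in which its latest producer writes
    rows = []
    for n, (op, dest, srcs) in enumerate(program, start=1):
        issue = n  # in-order issue, one instruction per cycle
        # Registers with no earlier producer are ready from cycle 0.
        operands_ready = max((written_at.get(r, 0) for r in srcs), default=0)
        start = max(issue, operands_ready) + 1
        done = start + latency[op] - 1
        write = done + 1
        # Recorded after the sources are read, so ADDD's write of F6
        # does not affect DIVD, which issued earlier and read the load's F6.
        written_at[dest] = write
        rows.append((op, issue, done, write))
    return rows

for row in timeline(PROGRAM, LATENCY):
    print(row)
```

Under these assumptions ADDD writes its result long before DIVD does, even though it issues later, which is consistent with the tables: the Add stations become free while Mult2 still holds DIVD.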