0% found this document useful (0 votes)
42 views

Lec18 Tomasulo Algorithm

Dynamic scheduling using hardware reorders instructions to reduce stalls while maintaining data flow and exception behavior. It allows out-of-order execution to avoid stalling when dependencies are present. Tomasulo's algorithm uses reservation stations to track operand availability and bypass results directly to functional units, avoiding register renaming hazards and enabling more flexibility than a compiler. Instructions issue in-order but execute out-of-order when operands are ready.

Uploaded by

mayank p
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
42 views

Lec18 Tomasulo Algorithm

Dynamic scheduling using hardware reorders instructions to reduce stalls while maintaining data flow and exception behavior. It allows out-of-order execution to avoid stalling when dependencies are present. Tomasulo's algorithm uses reservation stations to track operand availability and bypass results directly to functional units, avoiding register renaming hazards and enabling more flexibility than a compiler. Instructions issue in-order but execute out-of-order when operands are ready.

Uploaded by

mayank p
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 40

VLSI Architecture

ES ZG642 / MEL ZG 642


Session 16
BITS Pilani
Pawan Sharma
[email protected]
Pilani Campus 18/11/2023
Last Lecture

• Instruction Level Parallelism


• Compiler techniques to increase ILP
• Loop Unrolling

BITS Pilani, Pilani Campus


Today’s lecture

Dynamic Scheduling using Hardware

BITS Pilani, Pilani Campus


Overview
• Hardware reorders (rearranges the instruction execution) to reduce the stalls
while maintaining data flow and exception behaviour
• Dynamic scheduling offers several advantages.
• First, it allows code that was compiled with one pipeline in mind to run efficiently on a
different pipeline, eliminating the need to have multiple binaries and recompile for a
different microarchitecture. It is essential as in today’s computing environment, where
much of the software is from third parties and distributed in binary form, this advantage
is significant.
• Second, it enables handling some cases when dependences are unknown at compile
time; for example, they may involve a memory reference or a data-dependent branch, or
or dispatching.
• Third, and perhaps most importantly, it allows the processor to tolerate unpredictable
delays, such as cache misses, by executing other code while waiting for the miss to
resolve
• Although a dynamically scheduled processor cannot change the data flow, it tries to
avoid stalling when dependences are present.
• In contrast, static pipeline scheduling by the compiler tries to minimize stalls by
separating dependent instructions so that they will not lead to hazards.

BITS Pilani, Pilani Campus


• A major limitation of simple pipelining techniques is that
they use in-order instruction issue and execution:
Instructions are issued in program order, and if an
instruction is stalled in the pipeline, later instructions also
can not proceed.
• Thus, if there is a dependence between two closely
spaced instructions in the pipeline, this will lead to a
hazard and a stall will result.
• If there are multiple functional units, these units could lie
idle

BITS Pilani, Pilani Campus


HW Schemes: Instruction Parallelism
If instruction ‘J’ depends on a long-running (high latency) instruction ‘I’ currently in execution in the pipeline,
then all instructions after j must be stalled until i is finished and j can execute
I: DIVD F0,F2,F4
J: ADDD F10,F0,F8
SUBD F12,F8,F14

• The SUB.D instruction cannot execute because the dependence of ADD.D on DIV.D causes the
pipeline to stall; yet, SUB.D is not data dependent on anything in the pipeline. This hazard creates
a performance limitation that can be eliminated by not requiring instructions to execute in
program order.
• Enables out-of-order execution implying out-of-order completion (e.g., SUBD)
• In a dynamically scheduled pipeline, all instructions still pass through issue stage in order (in-order issue)
• To allow out-of-order execution, we essentially split the ID pipe stage of our simple five-stage pipeline into
two stages:
• 1. Issue—Decode instructions, check for structural hazards.
• 2. Read operands—Wait until no data hazards, then read operands.
• An instruction fetch stage precedes the issue stage and may fetch either into an instruction register or into a
queue of pending instructions; instructions are then issued from the register or queue. The execution stage
follows the read operands stage, just as in the five-stage pipeline. Execution may take multiple cycles,
depending on the operation.
• Will distinguish when an instruction begins execution and when it completes execution; between these
two times, the instruction is in execution
Dynamic Scheduling
• In the classic five-stage pipeline, both structural and data hazards could be
checked during instruction decode (ID): When an instruction could execute
without hazards, it was issued from ID knowing that all data hazards had
been resolved.
• To allow us to begin executing the SUB.D , we must separate the issue
process into two parts:
• checking for any structural hazards and
• waiting for the absence of a data hazard.
• Thus, we still use in-order instruction issue (i.e., instructions issued in
program order), but we want an instruction to begin execution as soon as
its data operands are available. Such a pipeline does out-of-order
execution, which implies out-of-order completion.
Out-of-order execution introduces the possibility of WAR and WAW hazards,

DIV.D F0,F2,F4
ADD.D F6,F0,F8
SUB.D F8,F10,F14
MUL.D F6,F10,F8

There is an antidependence between the ADD.D and the SUB.D, and if the pipeline
executes the SUB.D before the ADD.D (which is waiting for the DIV.D), it will violate
the antidependence, yielding a WAR hazard.

Likewise, to avoid violating output dependences, such as the write of F6 by MUL.D,


WAW hazards must be handled. As we will see, both these hazards are avoided by the
use of register renaming.

BITS Pilani, Pilani Campus


• Since the two capabilities—pipelined functional units and
multiple functional units—are essentially equivalent for
the purposes of pipeline control, we will assume the
processor has multiple functional units.
• In a dynamically scheduled pipeline, all instructions pass
through the issue stage in order (in-order issue);
however, they can be stalled or bypass each other in the
second stage (read operands) and thus enter execution
stage in out of order fashion.

BITS Pilani, Pilani Campus


A Dynamic Algorithm: Tomasulo’s
• The IBM 360/91 floating-point unit used a sophisticated scheme to allow out-of-order execution.
• This scheme, invented by Robert Tomasulo, tracks when operands for instructions are available to
minimize RAW hazards and introduces register renaming in hardware to minimize WAW and WAR
hazards
• There are many variations on this scheme in modern processors, although the key concepts of
tracking instruction dependences to allow execution as soon as operands are available and renaming
registers to avoid WAR and WAW hazards are common characteristics.
• Goal: High Performance without special compilers
• Small number of floating point registers (4 in 360 family of IBM) that limited the effectiveness of
compiler scheduling of operations.
• This led Tomasulo to try to figure out how to get more effective registers — renaming in hardware!
(using reservation stations)
• RAW hazards are avoided by executing an instruction only when its operands are available,
• WAR and WAW hazards, which arise from name dependences, are eliminated by register renaming
• Why Study 1966 Computer?
• The descendants of this have flourished!
– Alpha 21264, Pentium 4, AMD Opteron, Power 5, …
• Basic task of reservation station is to fetch and buffer an
operand as soon as it is available, eliminating the need to get
the operand from a register.
• In addition, pending instructions reach out to their designated
reservation station which provides their input (source
operands).
• Finally, when successive writes to a register overlap in
execution, only the last one is actually used to update the
register.
• Since there can be more reservation stations than real
registers, the technique can even eliminate hazards arising
from name dependences that could not be eliminated by a
compiler.

BITS Pilani, Pilani Campus


• The use of reservation stations, rather than a
centralized register file, leads to two other important
properties.
• First, hazard detection and execution control are distributed: The information
held in the reservation stations at each functional unit determines when an
instruction can begin execution at that unit.
• Second, results are passed directly to functional units from the reservation
stations where they are buffered, rather than going through the registers.
• This bypassing is done with a common data bus (CDB) that allows all units
waiting for an operand to be loaded simultaneously

BITS Pilani, Pilani Campus


• Instructions are sent from the instruction unit into the instruction queue from
which they are issued in first-in, first-out (FIFO) order.
• The reservation stations include the operation and the actual operands, as
well as information used for detecting and resolving hazards.
• Load buffers have three functions: (1) hold the components of the effective
address until it is computed, (2) track outstanding loads that are waiting on
the memory, and (3) hold the results of completed loads that are waiting for
the CDB.
• Similarly, store buffers have three functions: (1) hold the components of the
effective address until it is computed, (2) hold the destination memory
addresses of outstanding stores that are waiting for the data value to store,
and (3) hold the address and value to store until the memory nit is available.
• All results from either the FP units or the load unit are put on the CDB, which
goes to the FP register file as well as to the reservation stations and store
buffers.
• The FP adders implement addition and subtraction, and the FP multipliers do
multiplication and division.

BITS Pilani, Pilani Campus


Let’s look at the steps an instruction goes through. There are only three steps, although each
one can now take an arbitrary number of clock cycles:

Issue—Get the next instruction from the head of the instruction queue, which is maintained in
FIFO order to ensure the maintenance of correct data flow. If there is a matching
reservation station that is empty, issue the instruction to the station with the operand
values, if they are currently in the registers. If there is not an empty reservation station,
then there is a structural hazard and the instruction stalls until a station or buffer is freed. If
the operands are not in the registers, keep track of the functional units that will produce the
operands. This step renames registers, eliminating WAR and WAW hazards. (This stage is
sometimes called dispatch in a dynamically scheduled processor.)
Execute—If one or more of the operands is not yet available, monitor the common data bus
while waiting for it to be computed. When an operand becomes available, it is placed into
any reservation station waiting for it. When all the operands are available, the operation
can be executed by the corresponding functional unit. By delaying instruction execution
until the operands are available, RAW hazards are avoided.

Write result—When the result is available, write it on the CDB and from there into the registers
and into any reservation stations (including store buffers) waiting for this result. Stores are
buffered in the store buffer until both the value to be stored and the store address are
available, then the result is written as soon as the memory unit is free.

BITS Pilani, Pilani Campus


Reservation Station Components

Op: Operation to perform on source operands S1 and S2 in the


unit (e.g., + or –)
Vj, Vk: Value of Source operands
– Store buffers has V field, result to be stored
Qj, Qk: Reservation stations producing corresponding source
operands. A value of 0 indicates that the operand is already
available in Vj or Vk or is unnecessary.

Busy: Indicates reservation station and its accompanying FU is


busy

Register result status—Indicates which functional unit will write


each register, if one exists. Blank when no pending instructions
that will write that register.
Example

1. L.D F6, 34(R2)


2. L.D F2, 45(R3)
3. MUL.D F0, F2, F4
4. SUB.D F8, F2, F6
5. DIV.D F10, F0, F6
6. ADD.D F6, F8, F2
Latencies
• Assume operation latencies
– load: 2 clock cycles
– add/sub: 2 clock cycles
– multiply: 10 clock cycles
– divide: 40 clock cycles

– 1 cycle to write the result


Tomasulo Example
Instruction stream
Instruction statu s: E xec Wr i te
Instructio n j k Is s u e C o m p R es u lt B u sy A d d ress
LD F6 34+ R2 L o ad 1 No
LD F2 45+ R3 L o ad 2 No
M ULTD F0 F2 F4 L o ad 3 No
SUB D F8 F6 F2
D IVD F1 0 F0 F6
ADDD F6 F8 F2 3 Load/Buffers

R eservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk
Add1 No
FU coun t Add2 No
3 FP Adder R.S.
down Add3 No
2 FP Mult R.S.
Mult1 No
Mult2 No
R egister result status:
C lock F0 F2 F4 F6 F8 F 10 F 12 ... F 30
0 FU

Clock cycle
counter
Tomasulo Example Cycle 1
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 Load1 Yes 34+R2
LD F2 45+ R3 Load2 No
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2

R eservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
Mult1 No
Mult2 No

R egister result status:


C lock F0 F2 F4 F6 F8 F 10 F 12 ... F 30
1 FU L o ad 1
Tomasulo Example Cycle 2
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 Load1 Yes 34+R2
LD F2 45+ R3 2 Load2 Yes 45+R3
MULTD F0 F2 F4 Load3 No
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2

R eservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
Mult1 No
Mult2 No

R egister result status:


C lock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Load2 Load1

Note: Can have multiple loads outstanding


Tomasulo Example Cycle 3
In stru ction statu s: E xec Wr i te
Instructio n j k Is s u e C o m p R es u lt B u sy A d d ress
LD F6 34+ R2 1 3 L o ad 1 Yes 34+ R2
LD F2 45+ R3 2 L o ad 2 Yes 45+ R3
M ULTD F0 F2 F4 3 L o ad 3 No
SUB D F8 F6 F2
D IVD F1 0 F0 F6
ADDD F6 F8 F2

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
Mult1 Yes MULTD R(F4) Load2
Mult2 No
R egister result status:
C lock F0 F2 F4 F6 F8 F10 F12 ... F30
3 FU Mult1 Load2 Load1

• Note: registers names are removed (“renamed”) in Reservation


Stations; MULT issued
• Load1 completing; what is waiting for Load1?
Tomasulo Example Cycle 4
Instruction statu s: E xec Wr i te
Instructio n j k Is s u e C o m p R es u lt B u sy A d d ress
LD F6 34+ R2 1 3 4 L o ad 1 No
LD F2 45+ R3 2 4 L o ad 2 Yes 45+ R3
M ULTD F0 F2 F4 3 L o ad 3 No
SUB D F8 F6 F2 4
D IVD F1 0 F0 F6
ADDD F6 F8 F2

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y O p Vj Vk Qj Qk
A d d 1 Yes SUB D M (A 1) L o ad 2
Add2 No
Add3 No
M u lt1 Yes M ULTD R( F4) L o ad 2
M u lt2 N o

R egister result status:


C lock F0 F2 F4 F6 F8 F10 F12 ... F30
4 FU Mult1 Load2 M(A1) Add1

• Load2 completing; what is waiting for Load2?


Tomasulo Example Cycle 5
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4
DIVD F10 F0 F6 5
ADDD F6 F8 F2

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y O p Vj Vk Qj Qk
2 A d d 1 Yes SUB D M (A 1) M (A 2)
Add2 No
Add3 No
1 0 M u lt1 Yes M ULTD M (A 2) R( F4)
M u lt2 Yes D IVD M (A 1) M u lt1

R egister result status:


C lock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Mult1 M(A2) M(A1) Add1 Mult2

• Timer starts down for Add1, Mult1


Tomasulo Example Cycle 6
Instruction statu s: E xec Wr i te
Instructio n j k Is s u e C o m p R es u lt B u sy A d d ress
LD F6 34+ R2 1 3 4 L o ad 1 No
LD F2 45+ R3 2 4 5 L o ad 2 No
M ULTD F0 F2 F4 3 L o ad 3 No
SUB D F8 F6 F2 4
D IVD F1 0 F0 F6 5
ADDD F6 F8 F2 6

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y Op Vj Vk Qj Qk
1 A d d1 Yes SUBD M(A1) M(A2)
A dd 2 Yes ADDD M(A2) Add1
Add3 No
9 M u lt1 Yes MULTD M(A2) R(F4)
M u lt2 Yes DIVD M(A1) Mult1

R egister result status:


C lock F0 F2 F4 F6 F8 F10 F12 ... F30
6 FU Mult1 M(A2) Add2 Add1 Mult2

• Issue ADDD here despite name dependency on F6?


Tomasulo Example Cycle 7
In stru ction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y O p Vj Vk Qj Qk
0 A d d1 Yes SUB D M (A 1) M (A 2)
A dd 2 Yes A D D D M (A 2) A d d 1
Add3 No
8 M u lt1 Yes M ULTD M (A 2) R( F4)
M u lt2 Yes D IVD M (A 1) M u lt1

R egister result status:


C lock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult1 M(A2) Add2 Add1 Mult2

• Add1 (SUBD) completing; what is waiting for it?


Tomasulo Example Cycle 8
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y O p Vj Vk Qj Qk
Add1 No
2 A d d 2 Yes A D D D (M -M ) M (A 2)
Add3 No
7 M u lt1 Yes M ULTD M (A 2) R( F4)
M u lt2 Yes D IVD M (A 1) M u lt1

R egister result status:


C lock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 M(A2) Add2 (M-M) Mult2
Tomasulo Example Cycle 9
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y O p Vj Vk Qj Qk
Add1 No
1 A d d 2 Yes A D D D (M -M ) M (A 2)
Add3 No
6 M u lt1 Yes M ULTD M (A 2) R( F4)
M u lt2 Yes D IVD M (A 1) M u lt1

R egister result status:


C lock F0 F2 F4 F6 F8 F10 F12 ... F30
9 FU Mult1 M(A2) Add2 (M-M) Mult2
Tomasulo Example Cycle 10
In stru ction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y O p Vj Vk Qj Qk
Add1 No
0 A d d 2 Yes A D D D (M -M ) M (A 2)
Add3 No
5 M u lt1 Yes M ULTD M (A 2) R( F4)
M u lt2 Yes D IVD M (A 1) M u lt1

R egister result status:


C lock F0 F2 F4 F6 F8 F10 F12 ... F30
10 FU Mult1 M(A2) Add2 (M-M) Mult2

• Add2 (ADDD) completing; what is waiting for it?


Tomasulo Example Cycle 11
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y O p Vj Vk Qj Qk
A dd 1 No
A dd 2 No
Add3 No
4 M u lt1 Yes M ULTD M (A 2) R( F4)
M u lt2 Yes D IVD M (A 1) M u lt1

R egister result status:


C lock F0 F2 F4 F6 F8 F 10 F 12 ... F 30
11 FU Mult1 M(A2) (M-M+M (M-M) Mult2

• Write result of ADDD here?


• All quick instructions complete in this cycle!
Tomasulo Example Cycle 12
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y O p Vj Vk Qj Qk
A dd 1 No
A dd 2 No
Add3 No
3 M u lt1 Yes M ULTD M (A 2) R( F4)
M u lt2 Yes D IVD M (A 1) M u lt1

R egister result status:


C lock F0 F2 F4 F6 F8 F 10 F 12 ... F 30
12 FU M u lt1 M (A 2) (M -M + M (M -M ) M u lt2
Tomasulo Example Cycle 13
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y O p Vj Vk Qj Qk
A dd 1 No
A dd 2 No
Add3 No
2 M u lt1 Yes M ULTD M (A 2) R( F4)
M u lt2 Yes D IVD M (A 1) M u lt1

R egister result status:


C lock F0 F2 F4 F6 F8 F 10 F 12 ... F 30
13 FU M u lt1 M (A 2) (M -M + M (M -M ) M u lt2
Tomasulo Example Cycle 14
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y O p Vj Vk Qj Qk
A dd 1 No
A dd 2 No
Add3 No
1 M u lt1 Yes M ULTD M (A 2) R( F4)
M u lt2 Yes D IVD M (A 1) M u lt1

R egister result status:


C lock F0 F2 F4 F6 F8 F 10 F 12 ... F 30
14 FU M u lt1 M (A 2) (M -M + M (M -M ) M u lt2
Tomasulo Example Cycle 15
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 15 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11

R eservation Stations: S1 S2 RS RS
Tim e N a m e B u s y O p Vj Vk Qj Qk
A dd 1 No
A dd 2 No
Add3 No
0 M u lt1 Yes M ULTD M (A 2) R( F4)
M u lt2 Yes D IVD M (A 1) M u lt1

R egister result status:


C lock F0 F2 F4 F6 F8 F 10 F 12 ... F 30
15 FU M u lt1 M (A 2) (M -M + M (M -M ) M u lt2

• Mult1 (MULTD) completing; what is waiting for it?


Tomasulo Example Cycle 16
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 15 16 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11

R eservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
Mult1 No
40 Mult2 Yes DIVD M*F4 M(A1)

R egister result status:


C lock F0 F2 F4 F6 F8 F 10 F 12 ... F 30
16 FU M * F4 M (A 2) (M -M + M (M -M ) M u lt2

• Just waiting for Mult2 (DIVD) to complete


skip a couple of cycles
Tomasulo Example Cycle 55
Instruction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 15 16 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5
ADDD F6 F8 F2 6 10 11

R eservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
Mult1 No
1 Mult2 Yes DIVD M*F4 M(A1)

R egister result status:


C lock F0 F2 F4 F6 F8 F 10 F 12 ... F 30
55 FU M * F4 M (A 2) (M -M + M (M -M ) M u lt2
Tomasulo Example Cycle 56
In stru ction statu s: E xec Wr i te
Instruction j k Issue Comp Result Busy Address
LD F6 34+ R2 1 3 4 Load1 No
LD F2 45+ R3 2 4 5 Load2 No
MULTD F0 F2 F4 3 15 16 Load3 No
SUBD F8 F6 F2 4 7 8
DIVD F10 F0 F6 5 56
ADDD F6 F8 F2 6 10 11

R eservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
Mult1 No
0 Mult2 Yes DIVD M*F4 M(A1)

R egister result status:


C lock F0 F2 F4 F6 F8 F 10 F 12 ... F 30
56 FU M * F4 M (A 2) (M -M + M (M -M ) M u lt2

• Mult2 (DIVD) is completing; what is waiting for it?


Tomasulo Example Cycle 57
In stru ction statu s: E xec Wr i te
Instructio n j k Is s u e C o m p R es u lt B u sy A d d ress
LD F6 34+ R2 1 3 4 L o ad 1 No
LD F2 45+ R3 2 4 5 L o ad 2 No
M ULTD F0 F2 F4 3 15 16 L o ad 3 No
SUB D F8 F6 F2 4 7 8
D IVD F1 0 F0 F6 5 56 57
ADDD F6 F8 F2 6 10 11

R eservation Stations: S1 S2 RS RS
Time Name Busy Op Vj Vk Qj Qk
Add1 No
Add2 No
Add3 No
Mult1 No
Mult2 Yes DIVD M*F4 M(A1)

R egister result status:


C lock F0 F2 F4 F6 F8 F 10 F 12 ... F 30
56 FU M * F4 M (A 2) (M -M + M (M -M ) R esu lt

• Once again: In-order issue, out-of-order execution and


out-of-order completion.

You might also like