Lec18 Tomasulo Algorithm

Dynamic scheduling using hardware reorders instructions to reduce stalls while maintaining data flow and exception behavior. It allows out-of-order execution to avoid stalling when dependences are present. Tomasulo's algorithm uses reservation stations to track operand availability and bypass results directly to functional units, and it uses register renaming to avoid WAR and WAW hazards, giving more flexibility than compiler scheduling alone. Instructions issue in order but execute out of order when their operands are ready.

Uploaded by

mayank p

VLSI Architecture

ES ZG642 / MEL ZG 642

Session 16
BITS Pilani
Pawan Sharma
[email protected]
Pilani Campus 18/11/2023
Last Lecture

• Instruction Level Parallelism

• Compiler techniques to increase ILP
• Loop Unrolling

BITS Pilani, Pilani Campus

Today’s lecture

Dynamic Scheduling using Hardware

Overview
• The hardware rearranges the order of instruction execution to reduce stalls
while maintaining data flow and exception behaviour.
• Dynamic scheduling offers several advantages.
• First, it allows code that was compiled with one pipeline in mind to run efficiently on a
different pipeline, eliminating the need to maintain multiple binaries or to recompile for a
different microarchitecture. In today’s computing environment, where much of the software
comes from third parties and is distributed in binary form, this advantage is significant.
• Second, it enables handling some cases when dependences are unknown at compile
time; for example, they may involve a memory reference or a data-dependent branch, or
they may result from a modern programming environment that uses dynamic linking
or dispatching.
• Third, and perhaps most importantly, it allows the processor to tolerate unpredictable
delays, such as cache misses, by executing other code while waiting for the miss to
resolve.
• Although a dynamically scheduled processor cannot change the data flow, it tries to
avoid stalling when dependences are present.
• In contrast, static pipeline scheduling by the compiler tries to minimize stalls by
separating dependent instructions so that they will not lead to hazards.

• A major limitation of simple pipelining techniques is that
they use in-order instruction issue and execution:
instructions are issued in program order, and if an
instruction is stalled in the pipeline, later instructions
cannot proceed.
• Thus, if there is a dependence between two closely
spaced instructions in the pipeline, this will lead to a
hazard and a stall will result.
• If there are multiple functional units, these units could lie
idle.

HW Schemes: Instruction Parallelism
If instruction ‘J’ depends on a long-running (high-latency) instruction ‘I’ currently in execution in the pipeline,
then all instructions after J must be stalled until I is finished and J can execute:
I: DIVD F0,F2,F4
J: ADDD F10,F0,F8
SUBD F12,F8,F14

• The SUB.D instruction cannot execute because the dependence of ADD.D on DIV.D causes the
pipeline to stall; yet, SUB.D is not data dependent on anything in the pipeline. This hazard creates
a performance limitation that can be eliminated by not requiring instructions to execute in
program order.
• Removing this restriction enables out-of-order execution, which implies out-of-order completion (e.g., SUBD can complete before ADDD).
• In a dynamically scheduled pipeline, all instructions still pass through the issue stage in order (in-order issue).
• To allow out-of-order execution, we essentially split the ID pipe stage of our simple five-stage pipeline into
two stages:
• 1. Issue—Decode instructions, check for structural hazards.
• 2. Read operands—Wait until no data hazards, then read operands.
• An instruction fetch stage precedes the issue stage and may fetch either into an instruction register or into a
queue of pending instructions; instructions are then issued from the register or queue. The execution stage
follows the read operands stage, just as in the five-stage pipeline. Execution may take multiple cycles,
depending on the operation.
• We distinguish when an instruction begins execution and when it completes execution; between these
two times, the instruction is in execution.
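To make the benefit of separating issue from read operands concrete, here is a minimal Python sketch (not from the slides) that computes when each instruction of the DIVD/ADDD/SUBD sequence above could begin execution. It assumes one instruction issues per cycle, an instruction can start no earlier than the cycle after it issues, functional units are unlimited, DIVD takes 40 cycles and ADDD/SUBD take 2, and a result is usable the cycle after execution ends. The names `PROGRAM` and `start_cycles` are hypothetical.

```python
from typing import Dict, List, Tuple

# (mnemonic, destination, source registers, execution latency in cycles)
# The latencies are illustrative assumptions, not values given on this slide.
Instr = Tuple[str, str, Tuple[str, ...], int]

PROGRAM: List[Instr] = [
    ("DIVD", "F0", ("F2", "F4"), 40),
    ("ADDD", "F10", ("F0", "F8"), 2),
    ("SUBD", "F12", ("F8", "F14"), 2),
]


def start_cycles(program: List[Instr], in_order: bool) -> Dict[str, int]:
    """Return the cycle in which each instruction begins execution.

    Instruction i issues in cycle i + 1 and may start the following cycle,
    once all of its source registers are ready. With in_order=True an
    instruction may not start until the cycle after its predecessor started,
    which models a pipeline that stalls everything behind a waiting instruction.
    """
    ready: Dict[str, int] = {}  # register -> first cycle its new value can be used
    starts: Dict[str, int] = {}
    prev_start = 0
    for i, (name, dest, srcs, latency) in enumerate(program):
        issue = i + 1
        start = max([issue + 1] + [ready.get(r, 0) for r in srcs])
        if in_order:
            start = max(start, prev_start + 1)
        starts[name] = start
        ready[dest] = start + latency  # last execute cycle is start + latency - 1
        prev_start = start
    return starts


if __name__ == "__main__":
    print("out of order:", start_cycles(PROGRAM, in_order=False))
    print("in order:    ", start_cycles(PROGRAM, in_order=True))
```

Under these assumptions SUBD starts in cycle 4 when execution may proceed out of order, but only in cycle 43 when it has to wait behind ADDD, which is itself waiting for DIVD.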
Dynamic Scheduling
• In the classic five-stage pipeline, both structural and data hazards could be
checked during instruction decode (ID): When an instruction could execute
without hazards, it was issued from ID knowing that all data hazards had
been resolved.
• To allow us to begin executing the SUB.D, we must separate the issue
process into two parts:
• checking for any structural hazards and
• waiting for the absence of a data hazard.
• Thus, we still use in-order instruction issue (i.e., instructions issued in
program order), but we want an instruction to begin execution as soon as
its data operands are available. Such a pipeline does out-of-order
execution, which implies out-of-order completion.
Out-of-order execution introduces the possibility of WAR and WAW hazards. Consider the following code sequence:

DIV.D F0,F2,F4
ADD.D F6,F0,F8
SUB.D F8,F10,F14
MUL.D F6,F10,F8

There is an antidependence between the ADD.D and the SUB.D, and if the pipeline
executes the SUB.D before the ADD.D (which is waiting for the DIV.D), it will violate
the antidependence, yielding a WAR hazard.

Likewise, to avoid violating output dependences, such as the write of F6 by MUL.D,
WAW hazards must be handled. As we will see, both these hazards are avoided by the
use of register renaming.
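A minimal Python sketch of register renaming applied to the DIV.D/ADD.D/SUB.D/MUL.D sequence above. Assumption: every destination receives a fresh name T0, T1, ... in program order (a hypothetical naming scheme; in Tomasulo's algorithm the new names are reservation-station tags). Each source is looked up in the current mapping before the destination is renamed, so true (RAW) dependences are preserved while the WAR hazard on F8 (ADD.D reads it, SUB.D writes it) and the WAW hazard on F6 (ADD.D and MUL.D both write it) disappear.

```python
from itertools import count
from typing import Dict, List, Tuple

Instr = Tuple[str, str, Tuple[str, ...]]  # (opcode, destination, sources)

CODE: List[Instr] = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]


def rename(code: List[Instr]) -> List[Instr]:
    """Give every destination a fresh name, processing instructions in program order."""
    fresh = (f"T{n}" for n in count())  # hypothetical, unlimited pool of new names
    mapping: Dict[str, str] = {}        # architectural register -> its latest name
    renamed: List[Instr] = []
    for op, dest, srcs in code:
        # Read sources through the current mapping: each names its latest producer (RAW kept).
        new_srcs = tuple(mapping.get(r, r) for r in srcs)
        # Then give the destination a brand-new name, so it cannot clash with
        # earlier readers (WAR) or earlier writers (WAW) of the same register.
        new_dest = next(fresh)
        mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


if __name__ == "__main__":
    for op, dest, srcs in rename(CODE):
        print(op, dest, ",".join(srcs))
```

After renaming, ADD.D writes T1 and MUL.D writes T3, so they no longer compete for F6, and SUB.D writes T2 while ADD.D still reads the original F8.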

• Since the two capabilities—pipelined functional units and
multiple functional units—are essentially equivalent for
the purposes of pipeline control, we will assume the
processor has multiple functional units.
• In a dynamically scheduled pipeline, all instructions pass
through the issue stage in order (in-order issue);
however, they can be stalled or bypass each other in the
second stage (read operands) and thus enter execution
out of order.

A Dynamic Algorithm: Tomasulo’s
• The IBM 360/91 floating-point unit used a sophisticated scheme to allow out-of-order execution.
• This scheme, invented by Robert Tomasulo, tracks when operands for instructions are available to
minimize RAW hazards and introduces register renaming in hardware to minimize WAW and WAR
hazards.
• There are many variations on this scheme in modern processors, although the key concepts of
tracking instruction dependences to allow execution as soon as operands are available and renaming
registers to avoid WAR and WAW hazards are common characteristics.
• Goal: high performance without special compilers.
• The IBM 360 family had only a small number of floating-point registers (4), which limited the
effectiveness of compiler scheduling of operations.
• This led Tomasulo to try to figure out how to get more effective registers: renaming in hardware
(using reservation stations)!
• RAW hazards are avoided by executing an instruction only when its operands are available.
• WAR and WAW hazards, which arise from name dependences, are eliminated by register renaming.
• Why study a 1966 computer?
• Its descendants have flourished!
– Alpha 21264, Pentium 4, AMD Opteron, Power 5, …
• The basic task of a reservation station is to fetch and buffer an
operand as soon as it is available, eliminating the need to get
the operand from a register.
• In addition, pending instructions designate the reservation
station that will provide their input (source operands).
• Finally, when successive writes to a register overlap in
execution, only the last one is actually used to update the
register.
• Since there can be more reservation stations than real
registers, the technique can even eliminate hazards arising
from name dependences that could not be eliminated by a
compiler.

• The use of reservation stations, rather than a
centralized register file, leads to two other important
properties.
• First, hazard detection and execution control are distributed: The information
held in the reservation stations at each functional unit determines when an
instruction can begin execution at that unit.
• Second, results are passed directly to functional units from the reservation
stations where they are buffered, rather than going through the registers.
• This bypassing is done with a common data bus (CDB) that allows all units
waiting for an operand to be loaded simultaneously.
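The common data bus can be sketched as a single broadcast of (tag, value) that every waiting reservation station and the register file snoop in the same cycle. This is a minimal Python model under assumptions not stated on the slide: tags are station names, `None` stands for the slide's "0" (operand already present), and a register is updated only if its result status still names the broadcasting station, so that when successive writes to a register overlap, only the last one updates it. `Station` and `broadcast` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Station:
    name: str
    op: str
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[str] = None  # producing station's tag; None means the value is in vj
    qk: Optional[str] = None  # producing station's tag; None means the value is in vk

    def ready(self) -> bool:
        """All operands present, so execution may begin (RAW hazards resolved)."""
        return self.qj is None and self.qk is None


def broadcast(tag: str, value: float, stations: List[Station],
              reg_status: Dict[str, Optional[str]], regs: Dict[str, float]) -> None:
    """Put (tag, value) on the CDB; every listener captures it in the same cycle."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    # Only registers whose latest pending writer is this station are updated.
    waiting = [reg for reg, producer in reg_status.items() if producer == tag]
    for reg in waiting:
        regs[reg] = value
        reg_status[reg] = None


if __name__ == "__main__":
    # Roughly the situation when Load2 writes F2 (the numeric values are made up).
    add1 = Station("Add1", "SUBD", vj=3.0, qk="Load2")
    mult1 = Station("Mult1", "MULTD", vk=4.0, qj="Load2")
    mult2 = Station("Mult2", "DIVD", vk=3.0, qj="Mult1")
    regs = {"F2": 0.0}
    status: Dict[str, Optional[str]] = {"F0": "Mult1", "F2": "Load2", "F10": "Mult2"}
    broadcast("Load2", 2.0, [add1, mult1, mult2], status, regs)
    print(add1.ready(), mult1.ready(), mult2.ready(), regs["F2"])
```

One broadcast wakes up both SUBD and MULTD at once, while DIVD keeps waiting on Mult1.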

• Instructions are sent from the instruction unit into the instruction queue from
which they are issued in first-in, first-out (FIFO) order.
• The reservation stations include the operation and the actual operands, as
well as information used for detecting and resolving hazards.
• Load buffers have three functions: (1) hold the components of the effective
address until it is computed, (2) track outstanding loads that are waiting on
the memory, and (3) hold the results of completed loads that are waiting for
the CDB.
• Similarly, store buffers have three functions: (1) hold the components of the
effective address until it is computed, (2) hold the destination memory
addresses of outstanding stores that are waiting for the data value to store,
and (3) hold the address and value to store until the memory unit is available.
• All results from either the FP units or the load unit are put on the CDB, which
goes to the FP register file as well as to the reservation stations and store
buffers.
• The FP adders implement addition and subtraction, and the FP multipliers do
multiplication and division.
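The three roles of a store buffer listed above can be sketched as a small state machine. This is a minimal Python illustration, not the 360/91 design: the class, its fields and the single `memory_free` flag are hypothetical simplifications, and the base register is assumed to be available already.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoreBuffer:
    base: int                      # base-register value (assumed already available)
    offset: int                    # immediate offset from the instruction
    value: Optional[float] = None  # data to store, once known
    q: Optional[str] = None        # tag of the station that will produce the data
    address: Optional[int] = None  # effective address, once computed

    def compute_address(self) -> None:
        """Role (1): hold the address components until the address is computed."""
        self.address = self.base + self.offset

    def snoop(self, tag: str, value: float) -> None:
        """Role (2): keep the address while waiting on the CDB for the data value."""
        if self.q == tag:
            self.value, self.q = value, None

    def can_write(self, memory_free: bool) -> bool:
        """Role (3): hold address and value until the memory unit is available."""
        return self.address is not None and self.value is not None and memory_free

    def write(self, memory: Dict[int, float]) -> None:
        if self.address is None or self.value is None:
            raise RuntimeError("store is not ready")
        memory[self.address] = self.value
```

A store therefore leaves the buffer only after both its address and its data have arrived, in whichever order they become available.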

Let’s look at the steps an instruction goes through. There are only three steps, although each
one can now take an arbitrary number of clock cycles:

Issue—Get the next instruction from the head of the instruction queue, which is maintained in
FIFO order to ensure the maintenance of correct data flow. If there is a matching
reservation station that is empty, issue the instruction to the station with the operand
values, if they are currently in the registers. If there is not an empty reservation station,
then there is a structural hazard and the instruction stalls until a station or buffer is freed. If
the operands are not in the registers, keep track of the functional units that will produce the
operands. This step renames registers, eliminating WAR and WAW hazards. (This stage is
sometimes called dispatch in a dynamically scheduled processor.)
Execute—If one or more of the operands is not yet available, monitor the common data bus
while waiting for it to be computed. When an operand becomes available, it is placed into
any reservation station waiting for it. When all the operands are available, the operation
can be executed by the corresponding functional unit. By delaying instruction execution
until the operands are available, RAW hazards are avoided.

Write result—When the result is available, write it on the CDB and from there into the registers
and into any reservation stations (including store buffers) waiting for this result. Stores are
buffered in the store buffer until both the value to be stored and the store address are
available, then the result is written as soon as the memory unit is free.

Reservation Station Components

Op: Operation to perform on source operands S1 and S2 in the unit (e.g., + or –)
Vj, Vk: Values of the source operands
– Store buffers have a V field, holding the result to be stored
Qj, Qk: Reservation stations producing the corresponding source
operands. A value of 0 indicates that the operand is already
available in Vj or Vk or is unnecessary.
Busy: Indicates that the reservation station and its accompanying FU are busy

Register result status: Indicates which functional unit will write
each register, if one exists. Blank when there are no pending instructions
that will write that register.
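The fields above map naturally onto one record per reservation station plus a register result status table. The following minimal Python sketch of the issue step uses them; it is an illustration under assumptions, not the 360/91 hardware: `None` plays the role of the slide's 0 in Qj/Qk and of a blank register status entry, and `RS`, `read_operand` and `issue` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RS:
    name: str
    busy: bool = False
    op: str = ""
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[str] = None  # None: value already in vj (the slide's "0")
    qk: Optional[str] = None  # None: value already in vk


RegisterStatus = Dict[str, Optional[str]]  # register -> station that will write it (blank = None)


def read_operand(src: str, regs: Dict[str, float],
                 status: RegisterStatus) -> Tuple[Optional[float], Optional[str]]:
    """Return (V, Q): the value if the register is current, else the producer's tag."""
    producer = status.get(src)
    if producer is None:
        return regs[src], None
    return None, producer


def issue(op: str, dest: str, src1: str, src2: str, stations: List[RS],
          regs: Dict[str, float], status: RegisterStatus) -> Optional[str]:
    """Issue one instruction; return the station used, or None on a structural stall."""
    rs = next((s for s in stations if not s.busy), None)
    if rs is None:
        return None  # structural hazard: the instruction waits for a free station
    rs.busy, rs.op = True, op
    rs.vj, rs.qj = read_operand(src1, regs, status)
    rs.vk, rs.qk = read_operand(src2, regs, status)
    status[dest] = rs.name  # rename: later readers of dest now wait on this station
    return rs.name
```

Because the sources are read before `status[dest]` is overwritten, an instruction such as ADDD F6, F8, F2 can issue while DIVD still needs the old F6: DIVD already holds that value in its Vk field.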
Example

1. L.D F6, 34(R2)
2. L.D F2, 45(R3)
3. MUL.D F0, F2, F4
4. SUB.D F8, F6, F2
5. DIV.D F10, F0, F6
6. ADD.D F6, F8, F2

Latencies
• Assume operation latencies
– load: 2 clock cycles
– add/sub: 2 clock cycles
– multiply: 10 clock cycles
– divide: 40 clock cycles
– 1 cycle to write the result
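As a cross-check of the cycle-by-cycle walkthrough that follows, here is a minimal Python sketch (not the lecture's own tool) that computes the issue, execution-complete and write-result cycles of the six-instruction example from the latencies above. Assumptions: one instruction issues per cycle in program order; an operand written on the CDB in cycle w can be used for execution from cycle w + 1; a reservation station or load buffer is freed in its write-result cycle and reusable from the next cycle; the integer base registers R2 and R3 are always ready; and the CDB is never contended, which happens to hold for this example. The station counts (3 load buffers, 3 adder and 2 multiplier stations) match the tables. `Instr`, `UNIT` and `schedule` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]  # FP source registers; integer base registers assumed ready


UNIT = {"LD": "Load", "ADDD": "Add", "SUBD": "Add", "MULTD": "Mult", "DIVD": "Mult"}
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}  # execution cycles
STATIONS = {"Load": 3, "Add": 3, "Mult": 2}  # load buffers / reservation stations

PROGRAM = [
    Instr("LD", "F6", ()),  # L.D F6, 34(R2)
    Instr("LD", "F2", ()),  # L.D F2, 45(R3)
    Instr("MULTD", "F0", ("F2", "F4")),
    Instr("SUBD", "F8", ("F6", "F2")),
    Instr("DIVD", "F10", ("F0", "F6")),
    Instr("ADDD", "F6", ("F8", "F2")),
]


def schedule(program: List[Instr]) -> List[Tuple[int, int, int]]:
    """Return (issue, exec complete, write result) cycles for each instruction."""
    producer: Dict[str, int] = {}  # register result status: register -> index of latest writer
    occupants: Dict[str, List[int]] = {u: [] for u in STATIONS}  # write cycles per unit type
    times: List[Tuple[int, int, int]] = []
    issue = 0
    for instr in program:
        unit = UNIT[instr.op]
        issue += 1  # in-order issue, at most one instruction per cycle
        # Structural hazard: stall while every station of this type is still busy.
        while sum(1 for w in occupants[unit] if w >= issue) >= STATIONS[unit]:
            issue += 1
        # Sources are resolved against the register status as of issue, before dest
        # is renamed, so a later writer of the same register cannot affect them.
        start = issue + 1
        for src in instr.srcs:
            if src in producer:
                start = max(start, times[producer[src]][2] + 1)  # wait for its CDB write
        complete = start + LATENCY[instr.op] - 1
        write = complete + 1
        times.append((issue, complete, write))
        occupants[unit].append(write)
        producer[instr.dest] = len(times) - 1  # later readers of dest wait on this one
    return times


if __name__ == "__main__":
    for instr, (iss, comp, wr) in zip(PROGRAM, schedule(PROGRAM)):
        print(f"{instr.op:<6}{instr.dest:<5}{iss:>4}{comp:>5}{wr:>5}")
```

Under these assumptions the computed cycles agree with the Issue, Exec comp and Write result columns of the walkthrough, including DIVD writing its result in cycle 57.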

Tomasulo Example

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2
LD     F2     45+   R3
MULTD  F0     F2    F4
SUBD   F8     F6    F2
DIVD   F10    F0    F6
ADDD   F6     F8    F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
-     Mult1  No
-     Mult2  No

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
0      -      -      -   -        -      -      -

Slide callouts: instruction stream; 3 load buffers; 3 FP adder reservation stations; 2 FP multiplier reservation stations; Vj/Vk hold the source values (S1, S2) and Qj/Qk name the producing reservation stations (RS); the Time column is the FU count-down; the Clock column is the clock cycle counter.

Tomasulo Example Cycle 1

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1
LD     F2     45+   R3
MULTD  F0     F2    F4
SUBD   F8     F6    F2
DIVD   F10    F0    F6
ADDD   F6     F8    F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
-     Mult1  No
-     Mult2  No

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
1      -      -      -   Load1    -      -      -

Tomasulo Example Cycle 2

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1
LD     F2     45+   R3    2
MULTD  F0     F2    F4
SUBD   F8     F6    F2
DIVD   F10    F0    F6
ADDD   F6     F8    F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
-     Mult1  No
-     Mult2  No

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
2      -      Load2  -   Load1    -      -      -

Note: multiple loads can be outstanding.

Tomasulo Example Cycle 3

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3
LD     F2     45+   R3    2
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2
DIVD   F10    F0    F6
ADDD   F6     F8    F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
-     Mult1  Yes   MULTD  -       R(F4)   Load2
-     Mult2  No

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
3      Mult1  Load2  -   Load1    -      -      -

• Note: register names are removed (“renamed”) in the reservation stations; MULTD issued.
• Load1 completing; what is waiting for Load1?

Tomasulo Example Cycle 4

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2    4
DIVD   F10    F0    F6
ADDD   F6     F8    F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   Yes   SUBD   M(A1)   -       -      Load2
-     Add2   No
-     Add3   No
-     Mult1  Yes   MULTD  -       R(F4)   Load2
-     Mult2  No

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
4      Mult1  Load2  -   M(A1)    Add1   -      -

• Load2 completing; what is waiting for Load2?

Tomasulo Example Cycle 5

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2    4
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
2     Add1   Yes   SUBD   M(A1)   M(A2)
-     Add2   No
-     Add3   No
10    Mult1  Yes   MULTD  M(A2)   R(F4)
-     Mult2  Yes   DIVD   -       M(A1)   Mult1

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
5      Mult1  M(A2)  -   M(A1)    Add1   Mult2  -

• Timers start counting down for Add1 and Mult1.

Tomasulo Example Cycle 6

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2    4
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
1     Add1   Yes   SUBD   M(A1)   M(A2)
-     Add2   Yes   ADDD   -       M(A2)   Add1
-     Add3   No
9     Mult1  Yes   MULTD  M(A2)   R(F4)
-     Mult2  Yes   DIVD   -       M(A1)   Mult1

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
6      Mult1  M(A2)  -   Add2     Add1   Mult2  -

• Issue ADDD here despite name dependency on F6?

Tomasulo Example Cycle 7

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2    4      7
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
0     Add1   Yes   SUBD   M(A1)   M(A2)
-     Add2   Yes   ADDD   -       M(A2)   Add1
-     Add3   No
8     Mult1  Yes   MULTD  M(A2)   R(F4)
-     Mult2  Yes   DIVD   -       M(A1)   Mult1

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
7      Mult1  M(A2)  -   Add2     Add1   Mult2  -

• Add1 (SUBD) completing; what is waiting for it?

Tomasulo Example Cycle 8

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
2     Add2   Yes   ADDD   (M-M)   M(A2)
-     Add3   No
7     Mult1  Yes   MULTD  M(A2)   R(F4)
-     Mult2  Yes   DIVD   -       M(A1)   Mult1

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
8      Mult1  M(A2)  -   Add2     (M-M)  Mult2  -

Tomasulo Example Cycle 9

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
1     Add2   Yes   ADDD   (M-M)   M(A2)
-     Add3   No
6     Mult1  Yes   MULTD  M(A2)   R(F4)
-     Mult2  Yes   DIVD   -       M(A1)   Mult1

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
9      Mult1  M(A2)  -   Add2     (M-M)  Mult2  -

Tomasulo Example Cycle 10

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6      10

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
0     Add2   Yes   ADDD   (M-M)   M(A2)
-     Add3   No
5     Mult1  Yes   MULTD  M(A2)   R(F4)
-     Mult2  Yes   DIVD   -       M(A1)   Mult1

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
10     Mult1  M(A2)  -   Add2     (M-M)  Mult2  -

• Add2 (ADDD) completing; what is waiting for it?

Tomasulo Example Cycle 11

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
4     Mult1  Yes   MULTD  M(A2)   R(F4)
-     Mult2  Yes   DIVD   -       M(A1)   Mult1

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
11     Mult1  M(A2)  -   (M-M+M)  (M-M)  Mult2  -

• Write result of ADDD here?
• All quick instructions complete in this cycle!

Tomasulo Example Cycle 12

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
3     Mult1  Yes   MULTD  M(A2)   R(F4)
-     Mult2  Yes   DIVD   -       M(A1)   Mult1

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
12     Mult1  M(A2)  -   (M-M+M)  (M-M)  Mult2  -

Tomasulo Example Cycle 13

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
2     Mult1  Yes   MULTD  M(A2)   R(F4)
-     Mult2  Yes   DIVD   -       M(A1)   Mult1

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
13     Mult1  M(A2)  -   (M-M+M)  (M-M)  Mult2  -

Tomasulo Example Cycle 14

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
1     Mult1  Yes   MULTD  M(A2)   R(F4)
-     Mult2  Yes   DIVD   -       M(A1)   Mult1

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
14     Mult1  M(A2)  -   (M-M+M)  (M-M)  Mult2  -

Tomasulo Example Cycle 15

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3      15
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
0     Mult1  Yes   MULTD  M(A2)   R(F4)
-     Mult2  Yes   DIVD   -       M(A1)   Mult1

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
15     Mult1  M(A2)  -   (M-M+M)  (M-M)  Mult2  -

• Mult1 (MULTD) completing; what is waiting for it?

Tomasulo Example Cycle 16

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3      15         16
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
-     Mult1  No
40    Mult2  Yes   DIVD   M*F4    M(A1)

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
16     M*F4   M(A2)  -   (M-M+M)  (M-M)  Mult2  -

• Just waiting for Mult2 (DIVD) to complete.

(Skipping ahead to cycle 55.)

Tomasulo Example Cycle 55

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3      15         16
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5
ADDD   F6     F8    F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
-     Mult1  No
1     Mult2  Yes   DIVD   M*F4    M(A1)

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
55     M*F4   M(A2)  -   (M-M+M)  (M-M)  Mult2  -

Tomasulo Example Cycle 56

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3      15         16
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5      56
ADDD   F6     F8    F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
-     Mult1  No
0     Mult2  Yes   DIVD   M*F4    M(A1)

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
56     M*F4   M(A2)  -   (M-M+M)  (M-M)  Mult2  -

• Mult2 (DIVD) is completing; what is waiting for it?

Tomasulo Example Cycle 57

Instruction status:
Instruction   j     k     Issue  Exec comp  Write result
LD     F6     34+   R2    1      3          4
LD     F2     45+   R3    2      4          5
MULTD  F0     F2    F4    3      15         16
SUBD   F8     F6    F2    4      7          8
DIVD   F10    F0    F6    5      56         57
ADDD   F6     F8    F2    6      10         11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
Time  Name   Busy  Op     Vj      Vk      Qj     Qk
-     Add1   No
-     Add2   No
-     Add3   No
-     Mult1  No
-     Mult2  No

Register result status:
Clock  F0     F2     F4  F6       F8     F10    F12 ... F30
57     M*F4   M(A2)  -   (M-M+M)  (M-M)  Result -

• Once again: in-order issue, out-of-order execution and out-of-order completion.
