Dynamic Approach Hardware Based Speculation

The document discusses the concept of speculation in advanced computer architecture to enhance instruction-level parallelism (ILP) by predicting branch outcomes and executing instructions accordingly. It outlines the key components of hardware-based speculation, including dynamic branch prediction, speculative execution, and dynamic scheduling, as well as the role of the reorder buffer (ROB) in managing instruction execution and commitment. The document also elaborates on the implementation of speculation within the Tomasulo algorithm, emphasizing the separation of execution and commitment stages to handle mispredictions effectively.

Uploaded by

Herlin L.T.

Advanced Computer Architecture

(CS2354)

ILP: Speculation
Outline
• Speculation to Greater ILP
• Speculative Tomasulo Example

04/21/25 CPE 731, ILP4 2


Speculation to Greater ILP
• Greater ILP: Overcome control dependence by
hardware speculating on outcome of branches
and executing program as if guesses were correct
– Speculation  fetch, issue, and execute instructions as if
branch predictions were always correct
– Dynamic scheduling  only fetches and issues
instructions
• Essentially a data flow execution model:
Operations execute as soon as their operands are
available



Speculation to Greater ILP
• To exploit ILP (instruction-level parallelism) fully,
e.g. with pipelining or Tomasulo's algorithm, it is critical to
handle control dependences (= branch dependences)
efficiently
• Key idea: Speculate on the outcome of
branches (= predict) and execute instructions as if the
predictions are correct
– Of course, we must proceed in such a manner as to
be able to recover if our speculation turns out wrong
• Modern processors such as PowerPC 603/604,
MIPS R10000, Intel Pentium II/III/4, Alpha 21264
extend Tomasulo’s approach to support
speculation



Speculation to Greater ILP
• Key ideas:
– separate execution from completion: allow
instructions to execute speculatively but do not let
instructions update registers or memory until they
are no longer speculative
– therefore, add a final step – after an instruction is no
longer speculative – when it is allowed to make
register and memory updates, called instruction
commit
– allow instructions to execute and complete out of
order but force them to commit in order
– add a hardware buffer, called the reorder buffer
(ROB), with registers to hold the result of an
instruction between completion and commit
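The effect of "complete out of order, commit in order" can be checked with a little arithmetic. The sketch below is a simplified model and not part of the slides: it assumes at most one commit per cycle and that an instruction commits no earlier than the cycle after it finishes executing. The function name `commit_cycles` and the cycle numbers are hypothetical.

```python
def commit_cycles(finish_cycles):
    """Map per-instruction finish cycles (in program order) to commit cycles.

    Assumptions (illustrative only): one commit per cycle, and commit happens
    at the earliest one cycle after the instruction finishes executing.
    """
    commits = []
    previous_commit = 0
    for finish in finish_cycles:
        # An instruction must have finished AND every older one must have committed.
        cycle = max(finish + 1, previous_commit + 1)
        commits.append(cycle)
        previous_commit = cycle
    return commits


# The 4th and 5th instructions finish early (cycles 5 and 6) but must wait
# in the ROB behind the slow 3rd instruction, which finishes in cycle 40.
print(commit_cycles([10, 12, 40, 5, 6]))  # [11, 13, 41, 42, 43]
```

The results of the early finishers sit in the ROB for more than 30 cycles in this example, which is why the buffer needs storage for values and not just bookkeeping.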



Speculation to Greater ILP
• 3 components of HW-based speculation:
1. Dynamic branch prediction to choose which
instructions to execute
2. Speculation to allow execution of instructions
before control dependences are resolved
+ ability to undo effects of incorrectly speculated sequence
3. Dynamic scheduling to deal with scheduling of
different combinations of basic blocks



Adding Speculation to Tomasulo
• Must separate execution from allowing
instruction to finish or “commit”
• This additional step is called instruction commit
• When an instruction is no longer speculative,
allow it to update the register file or memory
• Requires additional set of buffers to hold results
of instructions that have finished execution but
have not committed
• This reorder buffer (ROB) is also used to pass
results among instructions that may be
speculated



Adding Speculation to Tomasulo

[Figure: Tomasulo datapath extended with a reorder buffer. Labels: FP Op Queue, Reorder Buffer, FP Regs, Res Stations, FP Adder]

Reorder Buffer (ROB)
• In Tomasulo’s algorithm, once an instruction
writes its result, any subsequently issued
instructions will find result in the register file
• With speculation, the register file is not updated
until the instruction commits
– (we know definitively that the instruction should execute)
• Thus, the ROB supplies operands in interval
between completion of instruction execution and
instruction commit
– ROB is a source of operands for instructions, just as
reservation stations (RS) provide operands in Tomasulo’s
algorithm
– ROB extends the architected registers, as the RS do
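The operand lookup described above can be sketched as follows. This is a minimal illustration under assumed data structures, not the slides' design: `reg_status` (register to ROB number of its in-flight producer), the dictionary layout of `rob`, and the function name `read_operand` are all hypothetical.

```python
def read_operand(reg, reg_file, reg_status, rob):
    """Return ("value", v) if the operand is available, else ("tag", rob_number).

    reg_file:   committed register values
    reg_status: register -> ROB number of the uncommitted instruction writing it
    rob:        ROB number -> {"ready": bool, "value": ...}
    """
    tag = reg_status.get(reg)
    if tag is None:
        # No in-flight producer: the register file holds the current value.
        return ("value", reg_file[reg])
    entry = rob[tag]
    if entry["ready"]:
        # Producer finished executing but has not committed: read from the ROB.
        return ("value", entry["value"])
    # Producer still executing: wait for this ROB number on the CDB.
    return ("tag", tag)


reg_file = {"F0": 1.0, "F4": 2.0, "F6": 3.0}
reg_status = {"F0": 1, "F4": 5}
rob = {1: {"ready": False, "value": None}, 5: {"ready": True, "value": 7.5}}

print(read_operand("F6", reg_file, reg_status, rob))  # ('value', 3.0)
print(read_operand("F4", reg_file, reg_status, rob))  # ('value', 7.5)
print(read_operand("F0", reg_file, reg_status, rob))  # ('tag', 1)
```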



Reorder Buffer Entry
• Each entry in the ROB contains four fields:
1. Instruction type
• a branch (has no destination result), a store (has a memory
address destination), or a register operation (ALU operation
or load, which has register destinations)
2. Destination
• Register number (for loads and ALU operations) or
memory address (for stores)
where the instruction result should be written
3. Value
• Value of instruction result until the instruction commits
4. Ready
• Indicates that instruction has completed execution, and the
value is ready
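The four fields listed above map naturally onto a small record type. The following is one possible encoding, offered as a sketch: the names `InstrType` and `ROBEntry` and the choice of Python types are assumptions, not something the slides specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class InstrType(Enum):
    BRANCH = "branch"            # no destination result
    STORE = "store"              # destination is a memory address
    REGISTER_OP = "register_op"  # ALU operation or load; destination is a register


@dataclass
class ROBEntry:
    instr_type: InstrType
    dest: Optional[Union[str, int]]  # register name, memory address, or None for a branch
    value: Optional[float] = None    # result, held here until commit
    ready: bool = False              # True once execution has completed


# A load into F0 is allocated at issue with no value yet...
entry = ROBEntry(InstrType.REGISTER_OP, dest="F0")
# ...and is filled in when the result is written.
entry.value = 3.5
entry.ready = True
print(entry)
```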



Reorder Buffer operation
• Holds instructions in FIFO order, exactly as issued
• When instructions complete, results placed into ROB
– Supplies operands to other instructions between execution
complete & commit ⇒ more registers like RS
– Tag results with ROB buffer number instead of reservation station
• Instructions commit ⇒ values at head of ROB placed in
registers
• As a result, easy to undo speculated instructions
on mispredicted branches or on exceptions
[Figure: datapath with FP Op Queue, Reorder Buffer, FP Regs, Res Stations, and FP Adders, with the commit path labeled]


Recall: 4 Steps of Speculative Tomasulo
Algorithm
1. Issue—get instruction from FP Op Queue
If reservation station and reorder buffer slot free, issue instr &
send operands & reorder buffer no. for destination (this stage
sometimes called “dispatch”)
2. Execution—operate on operands (EX)
When both operands ready then execute; if not ready, watch CDB
for result; when both in reservation station, execute; checks RAW
(sometimes called “issue”)
3. Write result—finish execution (WB)
Write on Common Data Bus to all awaiting FUs
& reorder buffer; mark reservation station available.
4. Commit—update register with reorder result
When instr. at head of reorder buffer & result present, update
register with result (or store to memory) and remove instr from
reorder buffer. Mispredicted branch flushes reorder buffer
(sometimes called “graduation”)
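The four steps can be tied together in a toy model. The sketch below is deliberately reduced and makes several assumptions that are not in the slides: it handles only two-operand register instructions (no loads, stores, or branches), has an unlimited number of reservation stations, merges the execute and write-result steps into one call, and always picks the first station whose operands are ready. The class name `TinyROBMachine` and its method names are hypothetical.

```python
import operator
from collections import deque


class TinyROBMachine:
    """Toy model of issue, execute/write result, and in-order commit."""

    def __init__(self, regs, rob_size=4):
        self.regs = dict(regs)    # architectural register file
        self.rob = deque()        # FIFO of ROB entries; head (left) is the oldest
        self.rob_size = rob_size
        self.reg_status = {}      # register -> ROB entry that will produce it
        self.stations = []        # reservation stations holding issued instructions
        self.next_tag = 1         # ROB numbers are handed out in issue order

    def _operand(self, reg):
        producer = self.reg_status.get(reg)
        if producer is None:
            return self.regs[reg]        # value comes from the register file
        if producer["ready"]:
            return producer["value"]     # value comes from the ROB
        return producer                  # pending: the ROB entry acts as the tag

    def issue(self, op, dest, src1, src2):
        """Step 1: allocate a ROB slot and a reservation station."""
        if len(self.rob) >= self.rob_size:
            return None                  # structural stall: ROB is full
        entry = {"tag": self.next_tag, "dest": dest, "value": None, "ready": False}
        self.next_tag += 1
        self.stations.append(
            {"op": op, "a": self._operand(src1), "b": self._operand(src2), "entry": entry}
        )
        self.rob.append(entry)
        self.reg_status[dest] = entry    # later readers of dest wait on this entry
        return entry["tag"]

    def execute_and_write(self):
        """Steps 2 and 3: run the first ready station and broadcast its result."""
        for station in self.stations:
            if isinstance(station["a"], dict) or isinstance(station["b"], dict):
                continue                 # still waiting on a producer (RAW hazard)
            result = station["op"](station["a"], station["b"])
            entry = station["entry"]
            entry["value"], entry["ready"] = result, True
            self.stations.remove(station)    # mark the station available
            for waiting in self.stations:    # "CDB" broadcast to waiting stations
                for key in ("a", "b"):
                    if waiting[key] is entry:
                        waiting[key] = result
            return entry["tag"]
        return None

    def commit(self):
        """Step 4: retire the head of the ROB if its result is present."""
        if not self.rob or not self.rob[0]["ready"]:
            return None
        entry = self.rob.popleft()
        self.regs[entry["dest"]] = entry["value"]   # register file updated only here
        if self.reg_status.get(entry["dest"]) is entry:
            del self.reg_status[entry["dest"]]
        return entry["tag"]


m = TinyROBMachine({"F0": 0.0, "F2": 4.0, "F4": 1.0, "F6": 2.0, "F10": 0.0})
m.issue(operator.truediv, "F0", "F2", "F6")   # ROB1: F0 = F2 / F6
m.issue(operator.add, "F10", "F0", "F4")      # ROB2: waits for ROB1
m.execute_and_write()                         # ROB1 finishes; F0 is still 0.0 in regs
print(m.regs["F0"], m.rob[0]["value"])        # 0.0 2.0
m.commit()                                    # now the register file sees the result
print(m.regs["F0"])                           # 2.0
```

Keeping the register write inside `commit` is the point of the design: until that call, the result exists only in the ROB and can be discarded without touching architectural state.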



Tomasulo With Reorder buffer:
Reorder buffer (newest entry first; ROB2 to ROB7 are empty):
Entry  Dest  Value  Instruction      Done?
ROB1   F0    --     LD F0,10(R2)     N

Reservation stations and load buffers:
Unit            Dest  Contents
FP adders       (none)
FP multipliers  (none)
Load buffer     1     10+R2


Tomasulo With Reorder buffer:
Reorder buffer (newest entry first; ROB3 to ROB7 are empty):
Entry  Dest  Value  Instruction      Done?
ROB2   F10   --     ADDD F10,F4,F0   N
ROB1   F0    --     LD F0,10(R2)     N

Reservation stations and load buffers:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
FP multipliers  (none)
Load buffer     1     10+R2

Tomasulo With Reorder buffer:
Reorder buffer (newest entry first; ROB4 to ROB7 are empty):
Entry  Dest  Value  Instruction      Done?
ROB3   F2    --     DIVD F2,F10,F6   N
ROB2   F10   --     ADDD F10,F4,F0   N
ROB1   F0    --     LD F0,10(R2)     N

Reservation stations and load buffers:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
FP multipliers  3     DIVD ROB2,R(F6)
Load buffer     1     10+R2


Tomasulo With Reorder buffer:
Reorder buffer (newest entry first; ROB7 is empty):
Entry  Dest  Value  Instruction      Done?
ROB6   F0    --     ADDD F0,F4,F6    N
ROB5   F4    --     LD F4,0(R3)      N
ROB4   --    --     BNE F2,<…>       N
ROB3   F2    --     DIVD F2,F10,F6   N
ROB2   F10   --     ADDD F10,F4,F0   N
ROB1   F0    --     LD F0,10(R2)     N

Reservation stations and load buffers:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
FP adders       6     ADDD ROB5,R(F6)
FP multipliers  3     DIVD ROB2,R(F6)
Load buffer     1     10+R2
Load buffer     5     0+R3


Tomasulo With Reorder buffer:
Reorder buffer (newest entry first):
Entry  Dest  Value  Instruction      Done?
ROB7   --    ROB5   ST F4,0(R3)      N
ROB6   F0    --     ADDD F0,F4,F6    N
ROB5   F4    --     LD F4,0(R3)      N
ROB4   --    --     BNE F2,<…>       N
ROB3   F2    --     DIVD F2,F10,F6   N
ROB2   F10   --     ADDD F10,F4,F0   N
ROB1   F0    --     LD F0,10(R2)     N

Reservation stations and load buffers:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
FP adders       6     ADDD ROB5,R(F6)
FP multipliers  3     DIVD ROB2,R(F6)
Load buffer     1     10+R2
Load buffer     5     0+R3


Tomasulo With Reorder buffer:
Reorder buffer (newest entry first):
Entry  Dest  Value  Instruction      Done?
ROB7   --    M[10]  ST F4,0(R3)      Y
ROB6   F0    --     ADDD F0,F4,F6    N
ROB5   F4    M[10]  LD F4,0(R3)      Y
ROB4   --    --     BNE F2,<…>       N
ROB3   F2    --     DIVD F2,F10,F6   N
ROB2   F10   --     ADDD F10,F4,F0   N
ROB1   F0    --     LD F0,10(R2)     N

Reservation stations and load buffers:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
FP adders       6     ADDD M[10],R(F6)
FP multipliers  3     DIVD ROB2,R(F6)
Load buffer     1     10+R2


Tomasulo With Reorder buffer:
Reorder buffer (newest entry first):
Entry  Dest  Value   Instruction      Done?
ROB7   --    M[10]   ST F4,0(R3)      Y
ROB6   F0    <val2>  ADDD F0,F4,F6    Y
ROB5   F4    M[10]   LD F4,0(R3)      Y
ROB4   --    --      BNE F2,<…>       N
ROB3   F2    --      DIVD F2,F10,F6   N
ROB2   F10   --      ADDD F10,F4,F0   N
ROB1   F0    --      LD F0,10(R2)     N

Reservation stations and load buffers:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
FP multipliers  3     DIVD ROB2,R(F6)
Load buffer     1     10+R2


Tomasulo With Reorder buffer:
Reorder buffer (newest entry first):
Entry  Dest  Value   Instruction      Done?
ROB7   --    M[10]   ST F4,0(R3)      Y
ROB6   F0    <val2>  ADDD F0,F4,F6    Y
ROB5   F4    M[10]   LD F4,0(R3)      Y
ROB4   --    --      BNE F2,<…>       N
ROB3   F2    --      DIVD F2,F10,F6   N
ROB2   F10   --      ADDD F10,F4,F0   N
ROB1   F0    M[20]   LD F0,10(R2)     Y

Reservation stations and load buffers:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),M[20]
FP multipliers  3     DIVD ROB2,R(F6)
Load buffer     (none)


Tomasulo With Reorder buffer:
Reorder buffer (newest entry first; ROB1 is now empty):
Entry  Dest  Value   Instruction      Done?
ROB7   --    M[10]   ST F4,0(R3)      Y
ROB6   F0    <val2>  ADDD F0,F4,F6    Y
ROB5   F4    M[10]   LD F4,0(R3)      Y
ROB4   --    --      BNE F2,<…>       N
ROB3   F2    --      DIVD F2,F10,F6   N
ROB2   F10   --      ADDD F10,F4,F0   N

Reservation stations and load buffers:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),M[20]
FP multipliers  3     DIVD ROB2,R(F6)
Load buffer     (none)


Tomasulo With Reorder buffer:
Reorder buffer (newest entry first; ROB1 is empty):
Entry  Dest  Value   Instruction      Done?
ROB7   --    M[10]   ST F4,0(R3)      Y
ROB6   F0    <val2>  ADDD F0,F4,F6    Y
ROB5   F4    M[10]   LD F4,0(R3)      Y
ROB4   --    --      BNE F2,<…>       N
ROB3   F2    --      DIVD F2,F10,F6   N
ROB2   F10   <val3>  ADDD F10,F4,F0   Y

Reservation stations and load buffers:
Unit            Dest  Contents
FP adders       (none)
FP multipliers  3     DIVD val3,R(F6)
Load buffer     (none)


Tomasulo With Reorder buffer:
Reorder buffer (newest entry first; ROB1 and ROB2 are empty):
Entry  Dest  Value   Instruction      Done?
ROB7   --    M[10]   ST F4,0(R3)      Y
ROB6   F0    <val2>  ADDD F0,F4,F6    Y
ROB5   F4    M[10]   LD F4,0(R3)      Y
ROB4   --    --      BNE F2,<…>       N
ROB3   F2    <val4>  DIVD F2,F10,F6   Y

Reservation stations and load buffers: all empty


Tomasulo With Reorder buffer:
Reorder buffer (newest entry first; ROB1 to ROB3 are empty):
Entry  Dest  Value   Instruction      Done?
ROB7   --    M[10]   ST F4,0(R3)      Y
ROB6   F0    <val2>  ADDD F0,F4,F6    Y
ROB5   F4    M[10]   LD F4,0(R3)      Y
ROB4   --    Wrong   BNE F2,<…>       Y

Reservation stations and load buffers: all empty
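The state above, with a mispredicted branch at the head of the ROB and finished instructions behind it, is the case the commit step has to handle by flushing. The sketch below shows one way to express that rule. It is an illustration only: the entry layout, the `mispredicted` flag, the function name `commit_from_head`, and the numeric values are assumptions, and the branch target stays elided as in the slides.

```python
from collections import deque


def commit_from_head(rob, regs, mem):
    """Commit ready entries in order; flush the ROB at a mispredicted branch.

    Returns (committed, squashed) lists of instruction strings.
    """
    committed = []
    while rob and rob[0]["ready"]:
        entry = rob.popleft()
        committed.append(entry["instr"])
        if entry["type"] == "register_op":
            regs[entry["dest"]] = entry["value"]   # architectural update at commit
        elif entry["type"] == "store":
            mem[entry["dest"]] = entry["value"]    # memory is written only at commit
        elif entry.get("mispredicted"):
            # Everything younger than the branch was speculative: discard it.
            squashed = [e["instr"] for e in rob]
            rob.clear()
            return committed, squashed
    return committed, []


regs = {"F0": 1.0, "F2": 2.0, "F4": 3.0}
mem = {}
rob = deque([
    {"type": "register_op", "instr": "DIVD F2,F10,F6", "dest": "F2", "value": 0.5, "ready": True},
    {"type": "branch", "instr": "BNE F2,<...>", "ready": True, "mispredicted": True},
    {"type": "register_op", "instr": "LD F4,0(R3)", "dest": "F4", "value": 9.0, "ready": True},
    {"type": "register_op", "instr": "ADDD F0,F4,F6", "dest": "F0", "value": 11.0, "ready": True},
    {"type": "store", "instr": "ST F4,0(R3)", "dest": 100, "value": 9.0, "ready": True},
])

committed, squashed = commit_from_head(rob, regs, mem)
print(committed)  # ['DIVD F2,F10,F6', 'BNE F2,<...>']
print(squashed)   # ['LD F4,0(R3)', 'ADDD F0,F4,F6', 'ST F4,0(R3)']
print(regs, mem)  # {'F0': 1.0, 'F2': 0.5, 'F4': 3.0} {}
```

Although the load, add, and store had all finished executing, none of them changed `regs` or `mem`, so recovery amounts to emptying the buffer and refetching from the correct path.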


Tomasulo With Reorder buffer:
What about memory hazards???

Reorder buffer (newest entry first):
Entry  Dest  Value   Instruction      Done?
ROB7   --    M[10]   ST F4,0(R3)      Y
ROB6   F0    <val2>  ADDD F0,F4,F6    Y
ROB5   F4    M[10]   LD F4,0(R3)      Y
ROB4   --    --      BNE F2,<…>       N
ROB3   F2    --      DIVD F2,F10,F6   N
ROB2   F10   --      ADDD F10,F4,F0   N
ROB1   F0    --      LD F0,10(R2)     N

Reservation stations and load buffers:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
FP multipliers  3     DIVD ROB2,R(F6)
Load buffer     1     10+R2
