ILP2

Uploaded by

ritik12041998
Speculation for greater ILP

• 3 components of HW-based speculation:
1. Dynamic branch prediction to choose which
instructions to execute
2. Speculation to allow execution of instructions before
control dependences are resolved
+ ability to undo effects of incorrectly speculated sequence
3. Dynamic scheduling to deal with scheduling of
different combinations of basic blocks
Adding Speculation to Tomasulo
• Must separate finishing execution from allowing an
instruction to finish or “commit”
• This additional step is called instruction commit
• When an instruction is no longer speculative, allow
it to update the register file or memory
• Requires an additional set of buffers to hold results of
instructions that have finished execution but have
not committed
• This reorder buffer (ROB) is also used to pass
results among instructions that may be speculated
Reorder Buffer (ROB)
• In Tomasulo’s algorithm, once an instruction writes
its result, any subsequently issued instructions will
find the result in the register file
• With speculation, the register file is not updated until
the instruction commits
– (i.e., until we know definitively that the instruction should execute)
• Thus, the ROB supplies operands in the interval between
completion of instruction execution and instruction
commit
– ROB is a source of operands for instructions, just as reservation
stations (RS) provide operands in Tomasulo’s algorithm
– ROB extends the architected registers, as the RS do
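The operand-source rule above can be sketched in a few lines. This is a minimal illustration, not a model of any particular processor: the function name `read_operand`, the dictionary layout, and the sample values are all assumptions made for the example.

```python
def read_operand(reg, reg_file, reg_status, rob):
    """Return ("value", v) if the operand is available now, else ("tag", rob_no)."""
    rob_no = reg_status.get(reg)          # ROB entry that will write reg, if any
    if rob_no is None:
        return ("value", reg_file[reg])   # no in-flight producer: register file is current
    entry = rob[rob_no]
    if entry["ready"]:
        return ("value", entry["value"])  # executed but not committed: the ROB supplies it
    return ("tag", rob_no)                # still executing: wait for this tag on the CDB


# Hypothetical machine state for illustration only.
reg_file = {"F0": 1.0, "F4": 2.0, "F6": 3.0}
reg_status = {"F0": 1, "F4": 5}           # F0 will be written by ROB1, F4 by ROB5
rob = {
    1: {"ready": False, "value": None},   # load still in flight
    5: {"ready": True, "value": 7.5},     # completed, awaiting commit
}

src_f6 = read_operand("F6", reg_file, reg_status, rob)  # read from the register file
src_f4 = read_operand("F4", reg_file, reg_status, rob)  # read from the ROB
src_f0 = read_operand("F0", reg_file, reg_status, rob)  # renamed to a ROB tag
```

Note that F4 is read from the ROB even though the register file still holds a (stale) value for it; that is the interval between execution complete and commit.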
Reorder Buffer Entry
• Each entry in the ROB contains four fields:
1. Instruction type
• a branch (has no destination result), a store (has a memory
address destination), or a register operation (ALU operation or
load, which has register destinations)
2. Destination
• Register number (for loads and ALU operations) or
memory address (for stores)
where the instruction result should be written
3. Value
• Value of instruction result until the instruction commits
4. Ready
• Indicates that instruction has completed execution, and the value
is ready
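As a rough sketch, the four fields map onto a small record type. The names `InstrType` and `ROBEntry` are hypothetical, chosen for this example; real hardware stores these fields as bits in a circular buffer, not as objects.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class InstrType(Enum):
    BRANCH = "branch"      # no destination result
    STORE = "store"        # destination is a memory address
    REGISTER = "register"  # ALU operation or load; destination is a register


@dataclass
class ROBEntry:
    instr_type: InstrType
    dest: Optional[Union[str, int]] = None  # register name, memory address, or None for a branch
    value: Optional[float] = None           # result held here until the instruction commits
    ready: bool = False                     # True once execution has completed


# A load into F0 right after issue: destination known, value not yet available.
entry = ROBEntry(InstrType.REGISTER, dest="F0")
```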
Reorder Buffer operation
• Holds instructions in FIFO order, exactly as issued
• When instructions complete, results placed into ROB
– Supplies operands to other instructions between execution
complete & commit, i.e. more registers, like RS
– Tag results with ROB buffer number instead of reservation station
• When instructions commit, values at head of ROB are placed in
registers
• As a result, easy to undo speculated instructions
on mispredicted branches or on exceptions
[Figure: FP Op Queue, Reorder Buffer, FP Regs, Res Stations and FP Adders,
with the commit path from the Reorder Buffer to FP Regs]
Recall: 4 Steps of Speculative Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
If reservation station and reorder buffer slot free, issue instr & send
operands & reorder buffer no. for destination (this stage sometimes
called “dispatch”)
2. Execution: operate on operands (EX)
When both operands are ready in the reservation station, execute; if not
ready, watch CDB for result; checks RAW
3. Write result: finish execution (WB)
Write on Common Data Bus to all awaiting FUs
& reorder buffer; mark reservation station available.
4. Commit: update register with reorder result
When instr. at head of reorder buffer & result present, update
register with result (or store to memory) and remove instr from
reorder buffer. Mispredicted branch flushes reorder buffer
(this stage sometimes called “graduation”)
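The interaction of steps 1, 3 and 4 can be sketched as a toy model. Everything here is an assumption made for illustration: the class name `ReorderBuffer`, its method names, and the simplification that a mispredicted branch is flagged when its result is written. Execution (step 2) and the reservation stations are left out.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()   # FIFO: left end is the head (oldest instruction)
        self.next_tag = 1

    def issue(self, kind, dest=None):
        """Step 1: allocate a slot at the tail; return its tag, or None if full."""
        if len(self.entries) >= self.size:
            return None          # no free ROB slot: issue stalls
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "kind": kind, "dest": dest,
                             "value": None, "ready": False, "mispredicted": False})
        return tag

    def write_result(self, tag, value=None, mispredicted=False):
        """Step 3: record a result broadcast on the CDB."""
        for e in self.entries:
            if e["tag"] == tag:
                e["value"], e["ready"], e["mispredicted"] = value, True, mispredicted
                return

    def commit(self, regs, memory):
        """Step 4: retire the head entry if ready; return its tag, else None."""
        if not self.entries or not self.entries[0]["ready"]:
            return None
        e = self.entries.popleft()
        if e["kind"] == "register":
            regs[e["dest"]] = e["value"]
        elif e["kind"] == "store":
            memory[e["dest"]] = e["value"]
        elif e["kind"] == "branch" and e["mispredicted"]:
            self.entries.clear()  # everything left is younger than the branch
        return e["tag"]


regs, memory = {"F0": 0.0, "F4": 0.0}, {}
rob = ReorderBuffer(size=4)
t_ld = rob.issue("register", "F0")
t_br = rob.issue("branch")
t_spec = rob.issue("register", "F4")    # speculative: issued past the branch

rob.write_result(t_spec, 9.0)           # finishes first, out of order
first = rob.commit(regs, memory)        # head (the load) is not ready, nothing retires
rob.write_result(t_ld, 5.0)
rob.write_result(t_br, mispredicted=True)
rob.commit(regs, memory)                # retires the load, F0 is updated
rob.commit(regs, memory)                # retires the branch and flushes t_spec
```

The speculative write to F4 never reaches the register file: its result sat in the ROB and was discarded with the flush, which is why undoing speculation is cheap in this scheme.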
Speculative Tomasulo Example
LD F0,10(R2)
ADDD F10,F4,F0
DIVD F2,F10,F6
BNEZ F2,Exit
LD F4,0(R3)
ADDD F0,F4,F6
ST 0(R3),F4
Exit:
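To see where the ROB tags in the following frames come from, here is a small sketch that issues the example sequence in order and records how each source operand is named. It uses the operands shown in the frames (ADDD F0,F4,F6 and ST 0(R3),F4) and assumes that nothing completes or commits while the seven instructions issue. The tuple encoding and the function `issue_all` are hypothetical, and integer address registers are left out.

```python
program = [
    ("LD",   "F0",  []),             # LD   F0,10(R2): address operand omitted
    ("ADDD", "F10", ["F4", "F0"]),
    ("DIVD", "F2",  ["F10", "F6"]),
    ("BNEZ", None,  ["F2"]),
    ("LD",   "F4",  []),             # LD   F4,0(R3)
    ("ADDD", "F0",  ["F4", "F6"]),
    ("ST",   None,  ["F4"]),         # ST   0(R3),F4: F4 is the value to store
]


def issue_all(program):
    reg_status = {}   # register -> ROB number of its most recent in-flight producer
    trace = []
    for rob_no, (op, dest, srcs) in enumerate(program, start=1):
        # A source is a ROB tag if an earlier instruction will write it,
        # otherwise it is read from the register file, written R(Fx).
        operands = [f"ROB{reg_status[s]}" if s in reg_status else f"R({s})"
                    for s in srcs]
        trace.append((rob_no, op, operands))
        if dest is not None:
            reg_status[dest] = rob_no   # later readers of dest wait on this entry
    return trace


trace = issue_all(program)
for rob_no, op, operands in trace:
    print(f"ROB{rob_no}: {op:5s} {', '.join(operands)}")
```

The second ADDD reads F4 as ROB5 (the second load), not R(F4), while the first ADDD, issued before that load, still reads R(F4). Renaming through ROB numbers is what keeps the two uses of F4 apart.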
Tomasulo With Reorder buffer:
Reorder Buffer (ROB7 newest, ROB1 oldest; ROB2 to ROB7 empty)
Entry  Dest  Instruction        Done?
ROB1   F0    LD F0,10(R2)       N
Reservation Stations (Dest = ROB entry number)
FP adders:       (empty)
FP multipliers:  (empty)
Load buffers:    1  10+R2
Tomasulo With Reorder buffer:
Reorder Buffer (ROB7 newest, ROB1 oldest; ROB3 to ROB7 empty)
Entry  Dest  Instruction        Done?
ROB2   F10   ADDD F10,F4,F0     N
ROB1   F0    LD F0,10(R2)       N
Reservation Stations (Dest = ROB entry number)
FP adders:       2  ADDD R(F4),ROB1
FP multipliers:  (empty)
Load buffers:    1  10+R2
Tomasulo With Reorder buffer:
Reorder Buffer (ROB7 newest, ROB1 oldest; ROB4 to ROB7 empty)
Entry  Dest  Instruction        Done?
ROB3   F2    DIVD F2,F10,F6     N
ROB2   F10   ADDD F10,F4,F0     N
ROB1   F0    LD F0,10(R2)       N
Reservation Stations (Dest = ROB entry number)
FP adders:       2  ADDD R(F4),ROB1
FP multipliers:  3  DIVD ROB2,R(F6)
Load buffers:    1  10+R2
Tomasulo With Reorder buffer:
Reorder Buffer (ROB7 newest, ROB1 oldest; ROB7 empty)
Entry  Dest  Instruction        Done?
ROB6   F0    ADDD F0,F4,F6      N
ROB5   F4    LD F4,0(R3)        N
ROB4   --    BNEZ F2,<…>        N
ROB3   F2    DIVD F2,F10,F6     N
ROB2   F10   ADDD F10,F4,F0     N
ROB1   F0    LD F0,10(R2)       N
Reservation Stations (Dest = ROB entry number)
FP adders:       2  ADDD R(F4),ROB1
                 6  ADDD ROB5,R(F6)
FP multipliers:  3  DIVD ROB2,R(F6)
Load buffers:    1  10+R2
                 5  0+R3
Tomasulo With Reorder buffer:
Reorder Buffer (ROB7 newest, ROB1 oldest)
Entry  Dest  Value  Instruction        Done?
ROB7   --    ROB5   ST 0(R3),F4        N
ROB6   F0           ADDD F0,F4,F6      N
ROB5   F4           LD F4,0(R3)        N
ROB4   --           BNEZ F2,<…>        N
ROB3   F2           DIVD F2,F10,F6     N
ROB2   F10          ADDD F10,F4,F0     N
ROB1   F0           LD F0,10(R2)       N
Reservation Stations (Dest = ROB entry number)
FP adders:       2  ADDD R(F4),ROB1
                 6  ADDD ROB5,R(F6)
FP multipliers:  3  DIVD ROB2,R(F6)
Load buffers:    1  10+R2
                 5  0+R3
Tomasulo With Reorder buffer:
Reorder Buffer (ROB7 newest, ROB1 oldest)
Entry  Dest  Value  Instruction        Done?
ROB7   --    M[10]  ST 0(R3),F4        Y
ROB6   F0           ADDD F0,F4,F6      N
ROB5   F4    M[10]  LD F4,0(R3)        Y
ROB4   --           BNEZ F2,<…>        N
ROB3   F2           DIVD F2,F10,F6     N
ROB2   F10          ADDD F10,F4,F0     N
ROB1   F0           LD F0,10(R2)       N
Reservation Stations (Dest = ROB entry number)
FP adders:       2  ADDD R(F4),ROB1
                 6  ADDD M[10],R(F6)
FP multipliers:  3  DIVD ROB2,R(F6)
Load buffers:    1  10+R2
Tomasulo With Reorder buffer:
Reorder Buffer (ROB7 newest, ROB1 oldest)
Entry  Dest  Value   Instruction        Done?
ROB7   --    M[10]   ST 0(R3),F4        Y
ROB6   F0    <val2>  ADDD F0,F4,F6      Ex
ROB5   F4    M[10]   LD F4,0(R3)        Y
ROB4   --            BNEZ F2,<…>        N
ROB3   F2            DIVD F2,F10,F6     N
ROB2   F10           ADDD F10,F4,F0     N
ROB1   F0            LD F0,10(R2)       N
Reservation Stations (Dest = ROB entry number)
FP adders:       2  ADDD R(F4),ROB1
FP multipliers:  3  DIVD ROB2,R(F6)
Load buffers:    1  10+R2
Tomasulo With Reorder buffer:
What about memory hazards???
Reorder Buffer (ROB7 newest, ROB1 oldest)
Entry  Dest  Value   Instruction        Done?
ROB7   --    M[10]   ST 0(R3),F4        Y
ROB6   F0    <val2>  ADDD F0,F4,F6      Ex
ROB5   F4    M[10]   LD F4,0(R3)        Y
ROB4   --            BNEZ F2,<…>        N
ROB3   F2            DIVD F2,F10,F6     N
ROB2   F10           ADDD F10,F4,F0     N
ROB1   F0            LD F0,10(R2)       N
Reservation Stations (Dest = ROB entry number)
FP adders:       2  ADDD R(F4),ROB1
FP multipliers:  3  DIVD ROB2,R(F6)
Load buffers:    1  10+R2
Summary
• Reservation stations: implicit register renaming to larger set of
registers + buffering source operands
– Prevents registers from becoming a bottleneck
– Avoids WAR, WAW hazards
– Allows loop unrolling in HW
• Not limited to basic blocks
• Today, helps with cache misses as well
– Don’t stall for L1 data cache miss
• Lasting contributions
– Dynamic scheduling
– Register renaming
• 360/91 descendants include Pentium III, PowerPC 604, MIPS R10000,
HP PA-8000, Alpha 21264
