
Hardware

Hardware based speculation allows processors to speculatively execute instructions along predicted execution paths. A reorder buffer stores instruction results until commit to prevent irrevocable actions if a prediction is incorrect. Tomasulo's algorithm uses reservation stations to track dependencies and allow out-of-order execution in the presence of data hazards. The reorder buffer example shows a loop executing multiple dependent instructions and tracking their status through issue, execution and result writeback.

Uploaded by

priyanka

Hardware based Speculation

● Execute instructions along predicted execution paths but
only commit the results if the prediction was correct
● Instruction commit: allowing an instruction to update the
register file when instruction is no longer speculative
● Need an additional piece of hardware to prevent any
irrevocable action until an instruction commits
– Reorder Buffer
● In-order commit
● Stores instruction results before instruction commits
● Clear ROB on misprediction
● Exceptions
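The reorder buffer behaviour listed above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design from the slides: the names `RobEntry`, `ReorderBuffer`, `issue`, `write_result`, `commit` and `flush` are assumptions, and `flush` assumes a misprediction is handled when the branch reaches the head of the buffer, so everything still queued is speculative.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    instr: str                     # instruction text, for display only
    dest: Optional[str]            # destination register, None for store/branch
    value: Optional[float] = None  # result, held here until commit
    done: bool = False             # True once the result has been written


class ReorderBuffer:
    """Hypothetical ROB model: results wait here and retire in program order."""

    def __init__(self) -> None:
        self.entries = deque()     # left end = oldest instruction

    def issue(self, instr, dest=None):
        entry = RobEntry(instr, dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None):
        entry.value = value
        entry.done = True

    def commit(self, regs):
        """Retire finished instructions from the head only (in-order commit)."""
        committed = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            if entry.dest is not None:
                regs[entry.dest] = entry.value   # register file updated only here
            committed.append(entry.instr)
        return committed

    def flush(self):
        """Misprediction: discard every uncommitted (speculative) entry."""
        self.entries.clear()


regs = {"F0": 0.0, "F4": 0.0}
rob = ReorderBuffer()
ld = rob.issue("L.D F0, 0(R1)", "F0")
mul = rob.issue("MUL.D F4, F0, F2", "F4")

rob.write_result(mul, 6.0)   # the multiply finishes first, out of order
early = rob.commit(regs)     # nothing retires: the load at the head is not done

rob.write_result(ld, 3.0)
late = rob.commit(regs)      # both retire, in program order

rob.issue("S.D F4, 0(R1)")   # speculative instruction after a predicted branch
rob.flush()                  # prediction was wrong: it never reaches the registers
```

The register file changes only inside `commit`, which is what keeps a wrong prediction from causing an irrevocable action.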
Tomasulo's Algorithm with Speculation
ROB – Loop Based Example
ROB
Entry  Busy  Instruction          State         Destination  Value
1      no    L.D F0, 0(R1)        Commit        F0           Mem[0+Regs[R1]]
2      no    MUL.D F4, F0, F2     Commit        F4           #1 * Regs[F2]
3      yes   S.D F4, 0(R1)        Write Result  0+Regs[R1]   #2
4      yes   DADDIU R1, R1, #-8   Write Result  R1           Regs[R1]-8
5      yes   BNE R1, R2, LOOP     Write Result
6      yes   L.D F0, 0(R1)        Write Result  F0           Mem[#4]
7      yes   MUL.D F4, F0, F2     Write Result  F4           #6 * Regs[F2]
8      yes   S.D F4, 0(R1)        Write Result  0+#4         #7
9      yes   DADDIU R1, R1, #-8   Write Result  R1           #4 - 8
10     yes   BNE R1, R2, LOOP     Write Result
Multiple Issue and Static Scheduling
To achieve CPI < 1, need to complete multiple
instructions per clock
● Statically scheduled superscalar processors
● VLIW (Very Long Instruction Word) processors
● Dynamically scheduled superscalar processors
Multiple Issue Processors
Dynamic Scheduling + Multiple Issue + Speculation
● Limit the number of instructions of a given class that can be
issued in a “bundle”
– E.g. one FP, one integer, one load, one store
● Examine all the dependencies among the instructions in the
bundle
● Also need multiple completion/commit
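The two issue-time checks above (per-class limits and dependencies inside the bundle) might look like the following Python sketch. The instruction encoding as `(class, destination, sources)` tuples, the `CLASS_LIMIT` table and both function names are assumptions made for illustration. The dependence check is simplified: it reports every earlier writer of a source register, not only the most recent one.

```python
from collections import Counter

# Hypothetical per-bundle limits, following the example mix on the slide.
CLASS_LIMIT = {"fp": 1, "int": 1, "load": 1, "store": 1}


def fits_class_limits(bundle):
    """bundle: list of (class, dest, sources) tuples in program order."""
    counts = Counter(cls for cls, _, _ in bundle)
    return all(counts[cls] <= CLASS_LIMIT.get(cls, 0) for cls in counts)


def intra_bundle_deps(bundle):
    """Return (producer, consumer) index pairs for RAW dependences in the bundle."""
    deps = []
    for j, (_, _, sources) in enumerate(bundle):
        for i in range(j):
            dest = bundle[i][1]
            if dest is not None and dest in sources:
                deps.append((i, j))
    return deps


# LD R2, 0(R1) followed by DADDIU R2, R2, #1
pair = [("load", "R2", ("R1",)), ("int", "R2", ("R2",))]
pair_ok = fits_class_limits(pair)
pair_deps = intra_bundle_deps(pair)

# Two loads in one bundle exceed the assumed limit of one load.
two_loads = [("load", "R2", ("R1",)), ("load", "R4", ("R3",))]
two_loads_ok = fits_class_limits(two_loads)
```

In a dynamically scheduled processor a dependent pair can still issue in the same clock. The pair list is what lets the consumer be set up to wait on the producer's tag instead of reading a stale register value.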


Dynamic Scheduling + Multiple Issue
2-way Superscalar
Iter.  Instruction         Issues    Executes  Mem Access  Write CDB
                           at clock  at clock  at clock    at clock
1      LD R2, 0(R1)        1         2         3           4
1      DADDIU R2, R2, #1   1         5         -           6
1      SD R2, 0(R1)        2         3         7           -
1      DADDIU R1, R1, #8   2         3         -           4
1      BNE R2, R3, L       3         7         -           -
2      LD R2, 0(R1)        4         8         9           10
2      DADDIU R2, R2, #1   4         11        -           12
2      SD R2, 0(R1)        5         9         13          -
2      DADDIU R1, R1, #8   5         8         -           9
2      BNE R2, R3, L       6         13        -           -
3      LD R2, 0(R1)        7         14        15          16
3      DADDIU R2, R2, #1   7         17        -           18
3      SD R2, 0(R1)        8         15        19          -
3      DADDIU R1, R1, #8   8         14        -           15
3      BNE R2, R3, L       9         19        -           -
Dynamic Scheduling + Multiple Issue + Speculation
2-way Superscalar
Iter.  Instruction         Issues    Executes  Mem Access  Write CDB  Commits
                           at clock  at clock  at clock    at clock   at clock
1      LD R2, 0(R1)        1         2         3           4          5
1      DADDIU R2, R2, #1   1         5         -           6          7
1      SD R2, 0(R1)        2         3         -           -          7
1      DADDIU R1, R1, #8   2         3         -           4          8
1      BNE R2, R3, L       3         7         -           -          8
2      LD R2, 0(R1)        4         5         6           7          9
2      DADDIU R2, R2, #1   4         8         -           9          10
2      SD R2, 0(R1)        5         6         -           -          10
2      DADDIU R1, R1, #8   5         6         -           7          11
2      BNE R2, R3, L       6         10        -           -          11
3      LD R2, 0(R1)        7         8         9           10         12
3      DADDIU R2, R2, #1   7         11        -           12         13
3      SD R2, 0(R1)        8         9         -           -          13
3      DADDIU R1, R1, #8   8         9         -           10         14
3      BNE R2, R3, L       9         13        -           -          14
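The benefit of speculation can be read off the two tables above with a little arithmetic. The sketch below copies the clock at which the LD of each iteration executes, and the clock at which the last BNE executes, from the tables. The variable and function names are illustrative only.

```python
# Clock at which the LD of iterations 1, 2, 3 executes (from the tables above).
ld_execute_no_spec = [2, 8, 14]
ld_execute_spec = [2, 5, 8]

# Clock at which the third BNE executes in each table.
last_branch_no_spec = 19
last_branch_spec = 13


def iteration_gaps(clocks):
    """Clocks between the same instruction in consecutive iterations."""
    return [later - earlier for earlier, later in zip(clocks, clocks[1:])]


gaps_no_spec = iteration_gaps(ld_execute_no_spec)    # [6, 6]
gaps_spec = iteration_gaps(ld_execute_spec)          # [3, 3]
clocks_saved = last_branch_no_spec - last_branch_spec  # 6
```

Without speculation the LD of the next iteration has to wait for the branch to execute, so an iteration starts every 6 clocks. With speculation it executes right after it issues, and an iteration starts every 3 clocks.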
Literature on Processors
● Efficient Reading of Papers in Science and Technology
● Yeager, The MIPS R10000 Processor, MICRO,
1996.
● Hinton et al., The Microarchitecture of the Pentium 4
Processor. Intel Technology Journal Q1, 2001.
● Smith and Sohi. Microarchitecture of Superscalar
Processors. Proc. of IEEE. 1995.
● Kahle et al., Introduction to the Cell multiprocessor.
IBM J. RES. & DEV. 2005.
● Hammarlund et al., Haswell: The fourth generation
Intel Processor, MICRO 2014.
Extra
Tomasulo's - Loop based Example 1

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √
MUL.D F4, F0, F2   √
S.D F4, 0(R1)      √
L.D F0, 0(R1)      √
MUL.D F4, F0, F2   √
S.D F4, 0(R1)      √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   yes       Load   -                  -                  -      -      Regs[R1] + 0
Load2   yes       Load   -                  -                  -      -      Regs[R1] - 8
Add1    no
Mult1   yes       MUL    -                  Regs[F2]           Load1
Mult2   yes       MUL    -                  Regs[F2]           Load2
Store1  yes       Store  Regs[R1]+0         -                  -      Mult1
Store2  yes       Store  Regs[R1]-8         -                  -      Mult2

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2  -   Mult2
Tomasulo's - Loop based Example 2

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √
MUL.D F4, F0, F2   √
S.D F4, 0(R1)      √
L.D F0, 0(R1)      √
MUL.D F4, F0, F2   √
S.D F4, 0(R1)      √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   yes       Load   -                  -                  -      -      Regs[R1] + 0
Load2   yes       Load   -                  -                  -      -      Regs[R1] - 8
Add1    no
Mult1   yes       MUL    -                  Regs[F2]           Load1
Mult2   yes       MUL    -                  Regs[F2]           Load2
Store1  yes       Store  Regs[R1]+0         -                  -      Mult1
Store2  yes       Store  Regs[R1]-8         -                  -      Mult2

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2  -   Mult2
Tomasulo's - Loop based Example 3

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √
S.D F4, 0(R1)      √
L.D F0, 0(R1)      √      √
MUL.D F4, F0, F2   √
S.D F4, 0(R1)      √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   yes → no  Load   -                  -                  -      -      Regs[R1] + 0
Load2   yes       Load   -                  -                  -      -      Regs[R1] - 8
Add1    no
Mult1   yes       MUL    Mem[Regs[R1] + 0]  Regs[F2]           Load1
Mult2   yes       MUL    -                  Regs[F2]           Load2
Store1  yes       Store  Regs[R1]+0         -                  -      Mult1
Store2  yes       Store  Regs[R1]-8         -                  -      Mult2

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2  -   Mult2
Tomasulo's - Loop based Example 4

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √
S.D F4, 0(R1)      √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √
S.D F4, 0(R1)      √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   no
Load2   yes → no  Load   -                  -                  -      -      Regs[R1] - 8
Add1    no
Mult1   yes       MUL    Mem[Regs[R1] + 0]  Regs[F2]
Mult2   yes       MUL    -                  Regs[F2]           Load2
Store1  yes       Store  Regs[R1]+0         -                  -      Mult1
Store2  yes       Store  Regs[R1]-8         -                  -      Mult2

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2  -   Mult2
Tomasulo's - Loop based Example 5

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √
S.D F4, 0(R1)      √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √
S.D F4, 0(R1)      √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   no
Load2   no
Add1    no
Mult1   yes       MUL    Mem[Regs[R1] + 0]  Regs[F2]
Mult2   yes       MUL    Mem[Regs[R1] - 8]  Regs[F2]           Load2
Store1  yes       Store  Regs[R1]+0         -                  -      Mult1
Store2  yes       Store  Regs[R1]-8         -                  -      Mult2

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2
Tomasulo's - Loop based Example 6

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √
S.D F4, 0(R1)      √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √
S.D F4, 0(R1)      √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   no
Load2   no
Add1    no
Mult1   yes       MUL    Mem[Regs[R1] + 0]  Regs[F2]
Mult2   yes       MUL    Mem[Regs[R1] - 8]  Regs[F2]
Store1  yes       Store  Regs[R1]+0         -                  -      Mult1
Store2  yes       Store  Regs[R1]-8         -                  -      Mult2

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2
Tomasulo's - Loop based Example 7

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √
S.D F4, 0(R1)      √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √
S.D F4, 0(R1)      √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   no
Load2   no
Add1    no
Mult1   yes       MUL    Mem[Regs[R1] + 0]  Regs[F2]
Mult2   yes       MUL    Mem[Regs[R1] - 8]  Regs[F2]
Store1  yes       Store  Regs[R1]+0         -                  -      Mult1
Store2  yes       Store  Regs[R1]-8         -                  -      Mult2

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2
Tomasulo's - Loop based Example 8

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √
S.D F4, 0(R1)      √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   no
Load2   no
Add1    no
Mult1   yes → no  MUL    Mem[Regs[R1] + 0]  Regs[F2]
Mult2   yes       MUL    Mem[Regs[R1] - 8]  Regs[F2]
Store1  yes       Store  Regs[R1]+0         Mul[F4]            -      Mult1
Store2  yes       Store  Regs[R1]-8         -                  -      Mult2

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2
Tomasulo's - Loop based Example 9

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √
S.D F4, 0(R1)      √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   no
Load2   no
Add1    no
Mult1   no
Mult2   yes       MUL    Mem[Regs[R1] - 8]  Regs[F2]
Store1  yes       Store  Regs[R1]+0         Mul[F4]
Store2  yes       Store  Regs[R1]-8         -                  -      Mult2

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2
Tomasulo's - Loop based Example 10

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √        √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   no
Load2   no
Add1    no
Mult1   no
Mult2   yes → no  MUL    Mem[Regs[R1] - 8]  Regs[F2]
Store1  yes → no  Store  Regs[R1]+0         Mul[F4]
Store2  yes       Store  Regs[R1]-8         Mem[Regs[R1] - 8]  -      Mult2

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2
Tomasulo's - Loop based Example 11

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √        √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   no
Load2   no
Add1    no
Mult1   no
Mult2   no
Store1  no
Store2  yes       Store  Regs[R1]-8         Mem[Regs[R1] - 8]

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi
Tomasulo's - Loop based Example 12

Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √        √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √        √
Reservation Stations
Name    Busy      Op     Vj                 Vk                 Qj     Qk     A
Load1   no
Load2   no
Add1    no
Mult1   no
Mult2   no
Store1  no
Store2  yes       Store  Regs[R1]-8         Mem[Regs[R1] - 8]

Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi
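The step that drives the walkthrough above is the result broadcast: when a station writes its result, every station waiting on that station's name picks up the value, and the producer is released. The following Python sketch models just that step under simplifying assumptions. The `Station` class and `broadcast` function are hypothetical names, values are kept as strings that mirror the table entries, and the address field A and execution latencies are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    name: str
    busy: bool = False
    op: str = ""
    vj: Optional[str] = None   # operand values, once available
    vk: Optional[str] = None
    qj: Optional[str] = None   # names of the stations that will produce them
    qk: Optional[str] = None

    def ready(self):
        """An instruction may execute once it waits on no other station."""
        return self.busy and self.qj is None and self.qk is None


def broadcast(tag, value, stations, reg_status):
    """Write result: deliver `value` to everything waiting on station `tag`."""
    for s in stations:
        if s.qj == tag:
            s.vj, s.qj = value, None
        if s.qk == tag:
            s.vk, s.qk = value, None
        if s.name == tag:
            s.busy = False                 # the producing station is released
    for reg, q in list(reg_status.items()):
        if q == tag:
            reg_status[reg] = None         # register no longer waits on `tag`


load1 = Station("Load1", True, "Load")
mult1 = Station("Mult1", True, "MUL", vk="Regs[F2]", qj="Load1")
store1 = Station("Store1", True, "Store", vj="Regs[R1]+0", qk="Mult1")
stations = [load1, mult1, store1]
reg_status = {"F0": "Load2", "F4": "Mult2"}

broadcast("Load1", "Mem[Regs[R1] + 0]", stations, reg_status)
```

After the call, `mult1` holds both operands and is ready, `store1` still waits on Mult1, and the register status is unchanged because F0 had already been renamed to Load2 by the second iteration's load.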
VLIW Example

● Performance?
● Overhead?
