Computer Science 146 Computer Architecture
Spring 2004
Harvard University
Instructor: Prof. David Brooks
Lecture 8: Multiple Issue and Speculation
Lecture Outline
Dynamic Branch Predictor Review
Superscalar/Multiple-Issue Designs
Speculative Execution
Tomasulo with ROB example
Multiple Issue
Goal: Sustain a CPI of less than 1 by issuing and
processing multiple instructions per cycle
Superscalar
Issue a varying number of instructions per clock
Statically scheduled
Dynamically scheduled
VLIW (EPIC)
Issue a fixed number of instructions formatted as one
large instruction or instruction packet
Similar to a statically scheduled superscalar
Computer Science 146
David Brooks
                           Issue structure  Hazard detection  Scheduling                Examples
Superscalar (static)       Dynamic          Hardware          Static
Superscalar (dynamic)      Dynamic          Hardware          Dynamic                   IBM POWER2
Superscalar (speculative)  Dynamic          Hardware          Dynamic with speculation
VLIW                       Static           Software          Static                    Trimedia, i860
EPIC                       Mostly static    Mostly software   Mostly static             Itanium (IA64)
[Pipeline diagrams: several instructions per cycle flowing through the IF, ID, EX, and WB stages]
Maybe 1 ALU + 1 FP
2 ALU + 2 LD/ST + 2 FP
Many combinations are possible; issue restrictions ease implementation
Structural Hazards
If both instructions issued in a cycle can be integer or both floating point,
we may need two integer ALUs and two FP ALUs
What about register files?
This may lead to issue restrictions
Compiler/hardware has to manage these restrictions
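
A minimal sketch of the issue restriction described above, assuming a hypothetical classification of instructions into "alu", "mem", and "fp" classes. The two unit mixes mirror the "1 ALU + 1 FP" and "2 ALU + 2 LD/ST + 2 FP" configurations mentioned earlier; the function name and data layout are illustrative only.

```python
from collections import Counter

# Hypothetical functional-unit mixes, modeled on the two configurations named on the slide.
SMALL = {"alu": 1, "fp": 1}
LARGE = {"alu": 2, "mem": 2, "fp": 2}

def can_issue_together(packet, units):
    """Return True if the machine has enough units for every instruction class in packet."""
    need = Counter(packet)
    return all(units.get(kind, 0) >= n for kind, n in need.items())

print(can_issue_together(["alu", "fp"], SMALL))   # one of each fits
print(can_issue_together(["alu", "alu"], SMALL))  # two integer ops need a second ALU
print(can_issue_together(["alu", "alu"], LARGE))
```

Whether the check is done by the compiler (static scheduling) or by issue logic (dynamic scheduling), the structural question being answered is the same.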
Data Hazards
ADD R1, R2, R3
ADD R4, R5, R6
ADD R8, R1, R7
ADD R10, R9, R1
Assume full-bypassing
How many stalls for single issue?
How many stalls for dual issue?
Full bypassing?
Not easy
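
One way to explore the stall questions above is a small in-order issue model. This is a sketch under stated assumptions, not the lecture's own method: every ADD has a 1-cycle EX latency, full bypassing makes a result usable by an instruction issued in the next cycle, and two dependent instructions cannot issue in the same cycle. The second program is a hypothetical variant (not from the slide) in which the second ADD reads R1.

```python
def issue_cycles(program, width):
    """Issue cycle of each (dest, sources) instruction on an in-order machine."""
    ready = {}                     # register -> first cycle a consumer may issue
    cycles, cycle, used = [], 0, 0
    for dest, srcs in program:
        earliest = max((ready.get(s, 0) for s in srcs), default=0)
        if used == width:          # issue slots exhausted, move to the next cycle
            cycle, used = cycle + 1, 0
        if earliest > cycle:       # RAW hazard: wait for the producer
            cycle, used = earliest, 0
        cycles.append(cycle)
        used += 1
        ready[dest] = cycle + 1    # bypassed result is usable one cycle later
    return cycles

def stall_cycles(program, width):
    """Issue cycles spent beyond the hazard-free minimum of ceil(n / width)."""
    ideal = -(-len(program) // width)
    return issue_cycles(program, width)[-1] + 1 - ideal

SLIDE = [("R1", ("R2", "R3")), ("R4", ("R5", "R6")),
         ("R8", ("R1", "R7")), ("R10", ("R9", "R1"))]
# Hypothetical variant: the second ADD now depends on the first.
VARIANT = [("R1", ("R2", "R3")), ("R4", ("R1", "R6")),
           ("R8", ("R1", "R7")), ("R10", ("R9", "R1"))]

for name, prog in (("slide", SLIDE), ("variant", VARIANT)):
    for width in (1, 2):
        print(name, width, issue_cycles(prog, width), stall_cycles(prog, width))
```

Under these assumptions the slide's sequence issues without stalls at either width, while the variant loses a cycle only on the dual-issue machine, because the dependent pair can no longer share an issue cycle.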
Control Hazards
[Pipeline diagrams, multiple issue by clock cycle: a branch followed by instructions i+1 through i+5 in an IF/ID/EX/WB pipeline, with 3 branch delay slots marked; then a branch followed by instructions i+1 through i+9 in a deeper IF1/IF2/ID/EX/M1/M2/WB pipeline]
Scheduling/Hazard elimination
Dynamic Scheduling with Tomasulo (RAW Hazards)
Register Renaming (WAR and WAW Hazards)
Speculative Execution
Precise Interrupts
Memory systems (later this semester)
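
The renaming item in the list above can be illustrated with a short sketch. It assumes an unlimited pool of hypothetical physical names P0, P1, ... and a three-instruction sequence invented for illustration; in the Tomasulo-with-ROB design discussed in this lecture, reorder buffer entry numbers play the role of these names.

```python
from itertools import count

def rename(program):
    """Give every destination a fresh name; sources read the latest mapping."""
    table = {}                                   # architectural name -> current physical name
    fresh = (f"P{i}" for i in count())
    renamed = []
    for dest, *srcs in program:
        new_srcs = [table.get(s, s) for s in srcs]   # look up sources first
        table[dest] = next(fresh)                    # then allocate the new destination
        renamed.append((table[dest], *new_srcs))
    return renamed

# Hypothetical sequence: WAR on F8 (instr 1 reads, instr 2 writes),
# WAW on F6 (instr 1 and instr 3 both write).
PROGRAM = [("F6", "F0", "F8"), ("F8", "F10", "F14"), ("F6", "F10", "F8")]
for row in rename(PROGRAM):
    print(row)
```

After renaming, no two instructions write the same name and no later write can clobber a value an earlier instruction still needs, so only the true (RAW) dependence of the third instruction on the second remains.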
Focus on Speculation/Interrupts
Precise Interrupts
All instructions before interrupt must complete
All instructions after interrupt must seem to never start
Out-of-Order completion
Post-interrupt/mispredict writebacks change state
Does Out-of-Order scheduling require this?
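
The requirements above can be sketched with a toy reorder buffer: results may be written out of order, but architectural state changes only at in-order commit, and anything uncommitted can be discarded. The class and method names are hypothetical, and the model ignores details such as a per-cycle commit limit.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # oldest entry (head) on the left
        self.regs = {}           # architectural register state

    def issue(self, dest):
        entry = {"dest": dest, "value": None, "done": False}
        self.entries.append(entry)           # allocate at the tail, in program order
        return entry

    def write(self, entry, value):
        # Out-of-order completion: the result is buffered, registers are untouched.
        entry["value"], entry["done"] = value, True

    def commit(self):
        """Retire finished instructions from the head only, in program order."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            e = self.entries.popleft()
            self.regs[e["dest"]] = e["value"]
            committed.append(e["dest"])
        return committed

    def flush(self):
        """Interrupt or mispredict: drop every instruction that has not committed."""
        self.entries.clear()

rob = ReorderBuffer()
a, b = rob.issue("F0"), rob.issue("F8")
rob.write(b, 2.0)            # the younger instruction finishes first
print(rob.commit())          # nothing retires while the head is unfinished
rob.write(a, 1.0)
print(rob.commit())          # both retire, oldest first
```

Because a flush never touches `regs`, instructions after the interrupt point appear never to have started.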
[Block diagram: Reorder Buffer, FP Regs, Res Stations, FP Adder]
Initial state: all structures empty.

Reservation stations (Busy, Op, Vj, Vk, Qj, Qk, Dest): Add1, Add2, Add3, Mult1, Mult2 not busy.
Load buffers (Busy, Address): Load1, Load2, Load3 empty.
Reorder buffer (Entry, Busy, Instruction, State, Destination, Value): entries 1 to 10 empty.
FP register status (Reorder #, Busy): F0, F2, ..., F30 not busy.

LD F6, 34(R2) issues.

Reservation stations: Add1, Add2, Add3, Mult1, Mult2 not busy.
Load buffers: Load1 busy, address 34+Regs[R2]; Load2, Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      Yes   LD F6, 34(R2)   Issue   F6

FP register status: F6 busy (Reorder # 1); all others not busy.

LD F2, 45(R3) issues; the first load executes.

Reservation stations: Add1, Add2, Add3, Mult1, Mult2 not busy.
Load buffers: Load1 busy, address 34+Regs[R2]; Load2 busy, address 45+Regs[R3]; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      Yes   LD F6, 34(R2)   Ex1     F6
2      Yes   LD F2, 45(R3)   Issue   F2

FP register status: F2 busy (Reorder # 2), F6 busy (Reorder # 1); all others not busy.

MULT issues, waiting on reorder buffer entry #2; the first load writes its result.

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Mult1  Yes   MULT                     Regs[F4]           #2      #3
Add1, Add2, Add3, Mult2 not busy.
Load buffers: Load1 not busy; Load2 busy, address 45+Regs[R3]; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      Yes   LD F6, 34(R2)   write   F6           Mem[load1]
2      Yes   LD F2, 45(R3)   Ex1     F2
3      Yes   MULT            Issue   F0

FP register status: F0 busy (Reorder # 3), F2 busy (Reorder # 2), F6 busy (Reorder # 1); all others not busy.

SUB issues; the first load commits and the second load writes its result.

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Add1   Yes   SUB   Regs[F6]           Mem[45+Regs[R3]]           #4
Mult1  Yes   MULT  Mem[45+Regs[R3]]   Regs[F4]                   #3
Add2, Add3, Mult2 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      Yes   LD F2, 45(R3)   write   F2           Mem[load2]
3      Yes   MULT            Ex1     F0
4      Yes   SUB             Issue   F8

FP register status: F0 busy (Reorder # 3), F2 busy (Reorder # 2), F8 busy (Reorder # 4); all others not busy.

DIV issues, waiting on reorder buffer entry #3; the second load commits.

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Add1   Yes   SUB   Regs[F6]           Mem[45+Regs[R3]]           #4
Mult1  Yes   MULT  Mem[45+Regs[R3]]   Regs[F4]                   #3
Mult2  Yes   DIV                      Regs[F6]           #3      #5
Add2, Add3 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      Yes   MULT            Ex2     F0
4      Yes   SUB             Ex1     F8
5      Yes   DIV             Issue   F10

FP register status: F0 busy (Reorder # 3), F8 busy (Reorder # 4), F10 busy (Reorder # 5); all others not busy.

ADD issues, waiting on reorder buffer entry #4.

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Add1   Yes   SUB   Regs[F6]           Mem[45+Regs[R3]]           #4
Add2   Yes   ADD                      Regs[F2]           #4      #6
Mult1  Yes   MULT  Mem[45+Regs[R3]]   Regs[F4]                   #3
Mult2  Yes   DIV                      Regs[F6]           #3      #5
Add3 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      Yes   MULT            Ex3     F0
4      Yes   SUB             Ex2     F8
5      Yes   DIV             Issue   F10
6      Yes   ADD             Issue   F6

FP register status: F0 busy (Reorder # 3), F6 busy (Reorder # 6), F8 busy (Reorder # 4), F10 busy (Reorder # 5); all others not busy.

SUB writes its result; ADD receives #4 and begins execution.

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Add2   Yes   ADD   #4                 Regs[F2]                   #6
Mult1  Yes   MULT  Mem[45+Regs[R3]]   Regs[F4]                   #3
Mult2  Yes   DIV                      Regs[F6]           #3      #5
Add1, Add3 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      Yes   MULT            Ex4     F0
4      Yes   SUB             write   F8           F6 - #2
5      Yes   DIV             Issue   F10
6      Yes   ADD             Ex1     F6

FP register status: F0 busy (Reorder # 3), F6 busy (Reorder # 6), F8 busy (Reorder # 4), F10 busy (Reorder # 5); all others not busy.

MULT and ADD continue executing.

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Add2   Yes   ADD   #4                 Regs[F2]                   #6
Mult1  Yes   MULT  Mem[45+Regs[R3]]   Regs[F4]                   #3
Mult2  Yes   DIV                      Regs[F6]           #3      #5
Add1, Add3 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      Yes   MULT            Ex5     F0
4      Yes   SUB             write   F8           F6 - #2
5      Yes   DIV             Issue   F10
6      Yes   ADD             Ex2     F6

FP register status: F0 busy (Reorder # 3), F6 busy (Reorder # 6), F8 busy (Reorder # 4), F10 busy (Reorder # 5); all others not busy.

ADD writes its result.

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Add2   Yes   ADD   #4                 Regs[F2]                   #6
Mult1  Yes   MULT  Mem[45+Regs[R3]]   Regs[F4]                   #3
Mult2  Yes   DIV                      Regs[F6]           #3      #5
Add1, Add3 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      Yes   MULT            Ex6     F0
4      Yes   SUB             write   F8           F6 - #2
5      Yes   DIV             Issue   F10
6      Yes   ADD             write   F6           #4 + F2

FP register status: F0 busy (Reorder # 3), F6 busy (Reorder # 6), F8 busy (Reorder # 4), F10 busy (Reorder # 5); all others not busy.

MULT continues executing (Ex7); SUB and ADD wait behind it in the reorder buffer.

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Mult1  Yes   MULT  Mem[45+Regs[R3]]   Regs[F4]                   #3
Mult2  Yes   DIV                      Regs[F6]           #3      #5
Add1, Add2, Add3 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      Yes   MULT            Ex7     F0
4      Yes   SUB             write   F8           F6 - #2
5      Yes   DIV             Issue   F10
6      Yes   ADD             write   F6           #4 + F2

FP register status: F0 busy (Reorder # 3), F6 busy (Reorder # 6), F8 busy (Reorder # 4), F10 busy (Reorder # 5); all others not busy.

MULT continues executing (Ex8).

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Mult1  Yes   MULT  Mem[45+Regs[R3]]   Regs[F4]                   #3
Mult2  Yes   DIV                      Regs[F6]           #3      #5
Add1, Add2, Add3 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      Yes   MULT            Ex8     F0
4      Yes   SUB             write   F8           F6 - #2
5      Yes   DIV             Issue   F10
6      Yes   ADD             write   F6           #4 + F2

FP register status: F0 busy (Reorder # 3), F6 busy (Reorder # 6), F8 busy (Reorder # 4), F10 busy (Reorder # 5); all others not busy.

MULT continues executing (Ex9).

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Mult1  Yes   MULT  Mem[45+Regs[R3]]   Regs[F4]                   #3
Mult2  Yes   DIV                      Regs[F6]           #3      #5
Add1, Add2, Add3 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      Yes   MULT            Ex9     F0
4      Yes   SUB             write   F8           F6 - #2
5      Yes   DIV             Issue   F10
6      Yes   ADD             write   F6           #4 + F2

FP register status: F0 busy (Reorder # 3), F6 busy (Reorder # 6), F8 busy (Reorder # 4), F10 busy (Reorder # 5); all others not busy.

MULT writes its result; DIV receives it and begins execution (Figure 3.30, p. 230).

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Mult2  Yes   DIV   #2 x Regs[F4]      Regs[F6]                   #5
Add1, Add2, Add3, Mult1 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      Yes   MULT            write   F0           #2 x Regs[F4]
4      Yes   SUB             write   F8           F6 - #2
5      Yes   DIV             Ex1     F10
6      Yes   ADD             write   F6           #4 + F2

FP register status: F0 busy (Reorder # 3), F6 busy (Reorder # 6), F8 busy (Reorder # 4), F10 busy (Reorder # 5); all others not busy.

MULT commits.

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Mult2  Yes   DIV   #2 x Regs[F4]      Regs[F6]                   #5
Add1, Add2, Add3, Mult1 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      No    MULT            commit  F0           #2 x Regs[F4]
4      Yes   SUB             write   F8           F6 - #2
5      Yes   DIV             Ex2     F10
6      Yes   ADD             write   F6           #4 + F2

FP register status: F6 busy (Reorder # 6), F8 busy (Reorder # 4), F10 busy (Reorder # 5); F0 and all others not busy.

SUB commits.

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Mult2  Yes   DIV   #2 x Regs[F4]      Regs[F6]                   #5
Add1, Add2, Add3, Mult1 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      No    MULT            commit  F0           #2 x Regs[F4]
4      No    SUB             commit  F8           F6 - #2
5      Yes   DIV             Ex3     F10
6      Yes   ADD             write   F6           #4 + F2

FP register status: F6 busy (Reorder # 6), F10 busy (Reorder # 5); all others not busy.

DIV continues executing; ADD has written its result but cannot commit ahead of DIV.
Need 36 more EX cycles for DIV to finish.

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Mult2  Yes   DIV   #2 x Regs[F4]      Regs[F6]                   #5
Add1, Add2, Add3, Mult1 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F6, 34(R2)   commit  F6           Mem[load1]
2      No    LD F2, 45(R3)   commit  F2           Mem[load2]
3      No    MULT            commit  F0           #2 x Regs[F4]
4      No    SUB             commit  F8           F6 - #2
5      Yes   DIV             Ex4     F10
6      Yes   ADD             write   F6           #4 + F2

FP register status: F6 busy (Reorder # 6), F10 busy (Reorder # 5); all others not busy.

Instruction      Issue  Exec Comp  Writeback  Commit
LD F6, 34(R2)
LD F2, 45(R3)
MULT                    12         13         14
SUB                                           15
DIV                     52         53         54
ADD                                           55
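
The timing table can be cross-checked with a small model of the walkthrough. This is a sketch under assumptions inferred from the snapshots rather than stated on the slides: one instruction issues per cycle starting at cycle 1, EX takes 1 cycle for LD, 2 for ADD/SUB, 9 for MULT, and 40 for DIV, a consumer may begin EX in the cycle its producer writes back, and one instruction commits per cycle in program order. The source operands of the FP instructions are inferred from the reservation-station entries.

```python
# Assumed EX latencies (Ex1..Ex9 for MULT; 4 + 36 cycles for DIV).
LATENCY = {"LD": 1, "ADD": 2, "SUB": 2, "MULT": 9, "DIV": 40}

# (op, destination, FP source registers); sources are inferred, not quoted.
PROGRAM = [
    ("LD",   "F6",  []),
    ("LD",   "F2",  []),
    ("MULT", "F0",  ["F2", "F4"]),
    ("SUB",  "F8",  ["F6", "F2"]),
    ("DIV",  "F10", ["F0", "F6"]),
    ("ADD",  "F6",  ["F8", "F2"]),
]

def schedule(program, latency):
    """Return (op, dest, issue, exec_comp, writeback, commit) per instruction."""
    last_write = {}                 # register -> writeback cycle of its latest producer
    rows, prev_commit = [], 0
    for issue, (op, dest, srcs) in enumerate(program, start=1):
        operands_ready = max((last_write.get(s, 0) for s in srcs), default=0)
        start = max(issue + 1, operands_ready)    # EX can begin in the producer's write cycle
        exec_comp = start + latency[op] - 1
        write = exec_comp + 1
        commit = max(write + 1, prev_commit + 1)  # in-order commit, one per cycle
        last_write[dest] = write
        prev_commit = commit
        rows.append((op, dest, issue, exec_comp, write, commit))
    return rows

for row in schedule(PROGRAM, LATENCY):
    print(row)
```

The model shows why SUB and ADD, which finish early, commit only at cycles 15 and 55: each must wait for the long-latency instruction ahead of it in the reorder buffer.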

Reservation stations
Name   Busy  Op    Vj                 Vk                 Qj  Qk  Dest
Mult1  No    MULT  Mem[0+Regs[R1]]    Regs[F2]                   #2
Mult2  No    MULT  Mem[0+Regs[R1]]    Regs[F2]                   #7
Add1, Add2, Add3 not busy.
Load buffers: Load1, Load2 not busy; Load3 empty.

Reorder buffer
Entry  Busy  Instruction     State   Destination  Value
1      No    LD F0, 0(R1)    commit  F0           Mem[0+R1]     (first loop)
2      No    MULT            commit  F4           F0 x F2
3      Yes   SD 0(R1), F4    write   0+Reg[R1]    #2
4      Yes                   write   R1           R1 - 8
5      Yes                   write
6      Yes   LD F0, 0(R1)    write   F0           Mem[#4]       (second loop)
7      Yes   MULT            write   F4           #6 x F2
8      Yes   SD 0(R1), F4    write   0+Regs[R1]   #7
9      Yes                   write   R1           #4 - 8
10     Yes                   write

FP register status: F0 busy (Reorder # 6); F4 busy; all others not busy.
Multiply has just reached commit, so other instructions can start committing
Some limitations
Too many value copy operations
Register file => RS => ROB => Register File