Multiple Instruction Issue and Hardware Based Speculation: Soner Önder

The document discusses hardware-based speculation and techniques to exploit instruction-level parallelism (ILP) in modern processors. It describes how processors can speculatively execute instructions past branches based on predictions and how the reorder buffer allows restoring state and communicating results if predictions are wrong. The reorder buffer entries hold instruction information, source operands, and results, allowing speculative execution and correctly committing changes on correct predictions. Renaming registers extend the register file to hold speculative results until commit. While assumptions like perfect prediction, unlimited resources, and 1-cycle latency allow ideal ILP, real machines face limits to parallelism from dependencies, memory aliasing, and other factors.

Uploaded by

Padmasri Durai
Copyright
© Attribution Non-Commercial (BY-NC)

Multiple Instruction Issue

and
Hardware Based Speculation

Soner Önder
Michigan Technological University, Houghton MI
www.cs.mtu.edu/~soner
Hardware Based Speculation 2

• Exploiting more ILP requires that we overcome the limitation of control dependence:
  - With branch prediction we allowed the processor to continue issuing instructions past a branch based on a prediction:
    - Those fetched instructions do not modify the processor state.
    - These instructions are squashed if the prediction is incorrect.
  - We now allow the processor to execute these instructions before we know if it is ok to execute them:
    - We need to correctly restore the processor state if such an instruction should not have been executed.
    - We need to pass the results from these instructions to future instructions as if the program is just following that path.
Hardware Based Speculation 3

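• Assume the processor predicts B1 to be taken and executes.
• What will happen if the prediction was wrong?
• What value of each variable should be used if the processor predicts B1 and B2 taken and executes instructions along the way?

[Figure: control-flow graph. Branch B1 tests x < y: its not-taken (N) path executes A=b+c; C=c-1, and its taken (T) path executes C=0; A=0. Branch B2 tests x < z: its N and T paths contain B=b+1, C=a and A=a+1. The paths join at D=a+b+c, and d is used later.]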
Hardware Based Speculation 4

• In order to execute instructions speculatively, we need to provide means:
  - To roll back the values of both registers and the memory to their correct values upon a misprediction,
  - To communicate speculatively calculated values to the new uses of those values.

• Both can be provided by using a simple structure called Reorder Buffer (ROB).
Reorder Buffer 5

• It is a simple circular array with a head and a tail pointer:
  - Each new instruction is allocated a position at the tail in program order.
  - Each entry provides a location for storing the instruction's result.
  - New instructions look for the values starting from the tail and working back.
  - When the instruction at the head completes and becomes non-speculative, the values are committed and the instruction is removed from the buffer.

[Figure: circular array with Tail and Head pointers.]
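The circular array described above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own implementation: the names `Entry`, `ReorderBuffer`, `allocate`, `lookup`, `write_result`, `commit` and `flush` are all assumptions made for the example, and the register file is modelled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    instr: str                   # instruction text, for illustration only
    dest: str                    # architectural destination register
    value: Optional[int] = None  # result, once produced
    done: bool = False           # True when the result is present


class ReorderBuffer:
    """Hypothetical circular-array ROB with head and tail pointers."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0            # oldest instruction in flight
        self.tail = 0            # next free slot
        self.count = 0

    def allocate(self, instr, dest):
        """Claim the slot at the tail, in program order; return its number."""
        if self.count == len(self.slots):
            raise RuntimeError("reorder buffer full: issue must stall")
        slot = self.tail
        self.slots[slot] = Entry(instr, dest)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return slot

    def write_result(self, slot, value):
        entry = self.slots[slot]
        entry.value, entry.done = value, True

    def lookup(self, reg):
        """Search from the tail backwards for the youngest producer of reg."""
        for i in range(1, self.count + 1):
            slot = (self.tail - i) % len(self.slots)
            if self.slots[slot].dest == reg:
                return slot
        return None              # no in-flight producer: use the register file

    def commit(self, regfile):
        """Retire the head entry if its result is ready; report success."""
        if self.count == 0 or not self.slots[self.head].done:
            return False
        entry = self.slots[self.head]
        regfile[entry.dest] = entry.value
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return True

    def flush(self):
        """Discard every speculative entry, e.g. after a misprediction."""
        self.slots = [None] * len(self.slots)
        self.head = self.tail = self.count = 0


regs = {"r1": 5, "r2": 0}
rob = ReorderBuffer(4)
a = rob.allocate("r1 = r1 + 1", "r1")
b = rob.allocate("r2 = r1 * 2", "r2")
rob.write_result(b, 12)          # the younger instruction finishes first
blocked = rob.commit(regs)       # False: the head (slot a) is not done yet
rob.write_result(a, 6)
rob.commit(regs)
rob.commit(regs)                 # regs is now {"r1": 6, "r2": 12}
```

Because the register file is only written in `commit`, discarding speculative work in this sketch is just a matter of resetting the pointers.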
Reorder Buffer 6

• 3 fields: instr, destination, value
• The reorder buffer can be an operand source => more registers, like reservation stations (RS)
• Use the reorder buffer number instead of the reservation station number when execution completes
• Supplies operands between execution complete & commit
• Once the instruction commits, its result is put into the register
• Instructions commit in program order
• As a result, it's easy to undo speculated instructions on mispredicted branches or on exceptions
Steps of Speculative Tomasulo Algorithm
7

1. Issue [get instruction from FP Op Queue]

• Check if the reorder buffer is full.
• Check if a reservation station is available.
• Access the register file and the reorder buffer for the current values of the source operands.
• Send the instruction, its reorder buffer slot number and the source operands to the reservation station.

Once issued, the instruction stays in the reservation station until it gets both operands.
Steps of Speculative Tomasulo Algorithm
8

2. Execute [operate on operands (EX)]

When both operands are ready and a functional unit is available, the instruction executes. This step checks RAW hazards: as long as the operands are not ready, the reservation station watches the Common Data Bus (CDB) for results.
Steps of Speculative Tomasulo Algorithm
9

3. Write result [finish execution (WB)]

Write the result on the Common Data Bus (CDB) to all awaiting FUs and the reorder buffer; mark the reservation station available.
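The issue, execute and write-result steps can be illustrated with a small Python model. It is a sketch under simplifying assumptions, not the design from the slides: the class `Machine`, its methods, the two-operand `add`/`mul` instruction format and the dictionary-based reorder buffer are all hypothetical, execution takes no time, and commit is left out.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


class Machine:
    """Hypothetical model of issue, execute and write result."""

    def __init__(self, regfile, rob_size=8, n_stations=4):
        self.regfile = dict(regfile)
        self.rob = {}        # slot -> entry; insertion order is program order
        self.stations = []   # reservation stations currently in use
        self.next_slot = 0
        self.rob_size = rob_size
        self.n_stations = n_stations

    def issue(self, op, dest, src1, src2):
        # Stall if the reorder buffer or the reservation stations are full.
        if len(self.rob) >= self.rob_size or len(self.stations) >= self.n_stations:
            return False
        operands = []
        for src in (src1, src2):
            # Youngest in-flight producer of this source register, if any.
            producer = next(
                (s for s in reversed(self.rob) if self.rob[s]["dest"] == src),
                None,
            )
            if producer is None:
                operands.append(("val", self.regfile[src]))
            elif self.rob[producer]["ready"]:
                operands.append(("val", self.rob[producer]["value"]))
            else:
                operands.append(("tag", producer))  # wait for this ROB slot
        slot = self.next_slot
        self.next_slot += 1
        self.rob[slot] = {"dest": dest, "value": None, "ready": False}
        self.stations.append({"op": op, "slot": slot, "operands": operands})
        return True

    def execute_and_write(self):
        """Run one ready station and broadcast its result on the CDB."""
        for st in self.stations:
            if all(kind == "val" for kind, _ in st["operands"]):
                result = OPS[st["op"]](st["operands"][0][1], st["operands"][1][1])
                self.stations.remove(st)  # the station becomes available
                self.rob[st["slot"]].update(value=result, ready=True)
                for other in self.stations:  # waiting stations snoop the CDB
                    other["operands"] = [
                        ("val", result) if o == ("tag", st["slot"]) else o
                        for o in other["operands"]
                    ]
                return st["slot"]
        return None


m = Machine({"r1": 2, "r2": 3, "r3": 0, "r4": 0})
m.issue("add", "r3", "r1", "r2")  # slot 0: both operands come from registers
m.issue("mul", "r4", "r3", "r3")  # slot 1: waits on slot 0 for both operands
first = m.execute_and_write()     # slot 0 produces 5 and broadcasts it
second = m.execute_and_write()    # slot 1 now computes 5 * 5
```

Note that the source operands are looked up before the instruction's own reorder buffer entry is created, so an instruction such as `r6 = r2 + r6` reads the older value of `r6` rather than waiting on itself.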
Steps of Speculative Tomasulo Algorithm
10

4. Commit [update register file with reorder result]

• When the instruction reaches the head of the reorder buffer, its result is present, and no exceptions are associated with it, the instruction becomes non-speculative:
  - Update the register file with the result (or store to memory).
  - Remove the instruction from the reorder buffer.

A mispredicted branch flushes the reorder buffer.
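A short Python sketch may help show why committing in order makes recovery simple. Everything here is an assumption for illustration: the reorder buffer is a plain list with the head at index 0, `RobEntry` and `commit_all` are hypothetical names, and an exception at the head is handled by simply discarding the buffer.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class RobEntry:
    kind: str                           # "alu", "store" or "branch"
    dest: Union[str, int, None] = None  # register name or memory address
    value: Optional[int] = None
    ready: bool = False                 # result is present
    exception: bool = False
    mispredicted: bool = False          # only meaningful for branches


def commit_all(rob, regfile, memory):
    """Commit from the head until something blocks; return the retired count."""
    retired = 0
    while rob:
        head = rob[0]
        if not head.ready:
            break                       # the head has not finished executing
        if head.exception:
            rob.clear()                 # assumption: drop all in-flight work
            break
        rob.pop(0)                      # the instruction is now non-speculative
        retired += 1
        if head.kind == "alu":
            regfile[head.dest] = head.value
        elif head.kind == "store":
            memory[head.dest] = head.value
        elif head.kind == "branch" and head.mispredicted:
            rob.clear()                 # flush everything past the branch
            break
    return retired


regs = {"r1": 1, "r2": 2}
mem = {}
rob = [
    RobEntry("alu", "r1", 10, ready=True),
    RobEntry("branch", ready=True, mispredicted=True),
    RobEntry("alu", "r2", 99, ready=True),    # wrong-path instruction
    RobEntry("store", 0x40, 99, ready=True),  # wrong-path store
]
retired = commit_all(rob, regs, mem)
# retired == 2, regs == {"r1": 10, "r2": 2}, mem == {}, rob == []
```

The wrong-path results never reach `regs` or `mem`, so no explicit undo step is needed in this model.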


MIPS FP Unit 11
Renaming Registers 12

• Common variation of speculative design
• Reorder buffer keeps instruction information but not the result
• Extend register file with extra renaming registers to hold speculative results
• Rename register allocated at issue; result into rename register on execution complete; rename register into real register on commit
• Operands read either from register file (real or speculative) or via Common Data Bus
• Advantage: operands are always from single source (extended register file)
Renaming Registers 13

1. Index a MAP table using the source register identifiers to get the physical register number.
2. Get the previous physical register number for the destination register.
3. Allocate a free physical register and modify the MAP table by indexing it with the destination register identifier.
4. When the instruction commits, return the previous physical register to the pool.

[Figure: a map table with entries 0 to 31 pointing into a file of physical registers numbered 0 to 127.]
Renaming Registers 14

Map table (logical -> physical): 0->0, 1->1, 2->2, 3->3, 4->4, 5->5, 6->6, 7->7
Free physical registers: 9, 10, 22, 13, 17

Code sequence
R7=r4+r3
R6=r2+r6
R3=r6+r7
R6=r6+10
Renaming Registers 15

Map table (logical -> physical): 0->0, 1->1, 2->2, 3->3, 4->4, 5->5, 6->6, 7->7
Free physical registers: 9, 10, 22, 13, 17

Code sequence   Renamed code sequence
R7=r4+r3
R6=r2+r6
R3=r6+r7
R6=r6+10
Renaming Registers 16

Map table (logical -> physical): 0->0, 1->1, 2->2, 3->3, 4->4, 5->5, 6->6, 7->9
Free physical registers: 10, 22, 13, 17

Code sequence   Renamed code sequence   Previous dest
R7=r4+r3        R9=r4+r3                R7
R6=r2+r6
R3=r6+r7
R6=r6+10
Renaming Registers 17

Map table (logical -> physical): 0->0, 1->1, 2->2, 3->3, 4->4, 5->5, 6->10, 7->9
Free physical registers: 22, 13, 17

Code sequence   Renamed code sequence   Previous dest
R7=r4+r3        R9=r4+r3                R7
R6=r2+r6        R10=r2+r6               R6
R3=r6+r7
R6=r6+10
Renaming Registers 18

Map table (logical -> physical): 0->0, 1->1, 2->2, 3->22, 4->4, 5->5, 6->10, 7->9
Free physical registers: 13, 17

Code sequence   Renamed code sequence   Previous dest
R7=r4+r3        R9=r4+r3                R7
R6=r2+r6        R10=r2+r6               R6
R3=r6+r7        R22=r10+r9              R3
R6=r6+10
Renaming Registers 19

Map table (logical -> physical): 0->0, 1->1, 2->2, 3->22, 4->4, 5->5, 6->13, 7->9
Free physical registers: 17

Code sequence   Renamed code sequence   Previous dest
R7=r4+r3        R9=r4+r3                R7
R6=r2+r6        R10=r2+r6               R6
R3=r6+r7        R22=r10+r9              R3
R6=r6+10        R13=r10+10              R10
Renaming Registers 20

Map table (logical -> physical): 0->0, 1->1, 2->2, 3->22, 4->4, 5->5, 6->13, 7->9
Free physical registers: 17, 10 (10 returns to the pool when R13=r10+10 retires)

Code sequence   Renamed code sequence   Previous dest
R7=r4+r3        R9=r4+r3                R7
R6=r2+r6        R10=r2+r6               R6
R3=r6+r7        R22=r10+r9              R3
R6=r6+10        R13=r10+10              R10
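The renaming walk-through on these slides can be replayed with a short Python sketch. The function names `rename` and `retire`, the string format of the instructions and the use of a list as the free pool are assumptions made for the example; the map table contents, the free registers and the code sequence are taken from the slides.

```python
import re


def rename(code, map_table, free_list):
    """Rename 'Rd=rs+rt' style instructions; return (renamed, previous)."""
    renamed, previous = [], []
    for instr in code:
        dest, expr = instr.split("=")
        # Step 1: translate the source registers through the current map.
        expr = re.sub(
            r"[rR](\d+)",
            lambda m: f"r{map_table[int(m.group(1))]}",
            expr,
        )
        d = int(dest[1:])
        # Step 2: remember the previous physical register of the destination.
        previous.append(map_table[d])
        # Step 3: allocate a free physical register and update the map.
        map_table[d] = free_list.pop(0)
        renamed.append(f"R{map_table[d]}={expr}")
    return renamed, previous


def retire(previous_reg, free_list):
    """Step 4: on commit, return the previous physical register to the pool."""
    free_list.append(previous_reg)


map_table = {i: i for i in range(8)}   # logical register i -> physical i
free_list = [9, 10, 22, 13, 17]
code = ["R7=r4+r3", "R6=r2+r6", "R3=r6+r7", "R6=r6+10"]

renamed, previous = rename(code, map_table, free_list)
# renamed  == ["R9=r4+r3", "R10=r2+r6", "R22=r10+r9", "R13=r10+10"]
# previous == [7, 6, 3, 10]

retire(previous[3], free_list)         # R13=r10+10 retires
# free_list == [17, 10]
```

The sources are translated before the destination is remapped, which is why `R6=r2+r6` reads the old mapping of `r6` while writing a fresh physical register.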
Limits to ILP 21

Assumptions for ideal/perfect machine to start:

1. Register renaming: infinite virtual registers, so all WAW & WAR hazards are avoided
2. Branch prediction: perfect; no mispredictions
3. Jump prediction: all jumps perfectly predicted => machine with perfect speculation & an unbounded buffer of instructions available
4. Memory-address alias analysis: addresses are known & a load can be moved before a store provided the addresses are not equal

1 cycle latency for all instructions; unlimited number of instructions issued per clock cycle
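Under these assumptions the only remaining constraint is true (RAW) data dependence, so the ideal IPC of a trace is its length divided by the depth of its dependence chain. The following Python sketch illustrates that calculation; the function name `ideal_ipc` and the trace format are hypothetical, only register dependences are modelled (memory dependences are ignored), and the sample trace is the code sequence from the renaming slides.

```python
def ideal_ipc(trace):
    """trace: list of (dest, sources) pairs in program order."""
    ready_cycle = {}   # register -> cycle in which its latest value is produced
    depth = 0
    for dest, sources in trace:
        # With 1-cycle latency, an instruction executes one cycle after
        # the last of its producers; with no producers it runs in cycle 1.
        cycle = 1 + max((ready_cycle.get(s, 0) for s in sources), default=0)
        ready_cycle[dest] = cycle
        depth = max(depth, cycle)
    return len(trace) / depth if depth else 0.0


trace = [
    ("r7", ["r4", "r3"]),   # R7=r4+r3  -> cycle 1
    ("r6", ["r2", "r6"]),   # R6=r2+r6  -> cycle 1
    ("r3", ["r6", "r7"]),   # R3=r6+r7  -> cycle 2
    ("r6", ["r6"]),         # R6=r6+10  -> cycle 2
]
ipc = ideal_ipc(trace)      # 4 instructions in 2 cycles -> 2.0
```

Because `ready_cycle` always holds the most recent earlier writer of each register, WAW and WAR hazards never delay an instruction here, which mirrors assumption 1.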
Upper Limit to ILP: Ideal Machine 22

Integer: 18 - 60
FP: 75 - 150

[Figure: bar chart of instruction issues per cycle (IPC) by program: gcc 54.8, espresso 62.6, li 17.9, fpppp 75.2, doducd 118.7, tomcatv 150.1.]
More Realistic HW: Branch Impact
23

Change the window from infinite to 2000 instructions and limit issue to a maximum of 64 instructions per clock cycle.

Integer: 6 - 12
FP: 15 - 45

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doducd and tomcatv, with one bar per branch predictor: Perfect, Selective predictor, Standard 2-bit, Static, None.]

More Realistic HW: Register Impact
24

Change: 2000 instr window, 64 instr issue, 8K 2 level Prediction.

Integer: 5 - 15
FP: 11 - 45

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doducd and tomcatv, with one bar per number of renaming registers: Infinite, 256, 128, 64, 32, None.]

More Realistic HW: Alias Impact
25

Change: 2000 instr window, 64 instr issue, 8K 2 level Prediction, 256 renaming registers.

Integer: 4 - 9
FP: 4 - 45 (Fortran, no heap)

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doducd and tomcatv, with one bar per alias analysis model: Perfect, Global/stack Perfect, Inspection, None.]

Realistic HW for ‘9X: Window Impact
26

Perfect disambiguation (HW), 1K Selective Prediction, 16 entry return, 64 registers, issue as many as window.

Integer: 6 - 12
FP: 8 - 45

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doducd and tomcatv, with one bar per window size: Infinite, 256, 128, 64, 32, 16, 8, 4.]
