Multiple Instruction Issue and Hardware Based Speculation: Soner Önder

The document discusses hardware-based speculation and techniques to exploit instruction-level parallelism (ILP) in modern processors. It describes how processors can speculatively execute instructions past branches based on predictions and how the reorder buffer allows restoring state and communicating results if predictions are wrong. The reorder buffer entries hold instruction information, source operands, and results, allowing speculative execution and correctly committing changes on correct predictions. Renaming registers extend the register file to hold speculative results until commit. While assumptions like perfect prediction, unlimited resources, and 1-cycle latency allow ideal ILP, real machines face limits to parallelism from dependencies, memory aliasing, and other factors.

Uploaded by

Padmasri Durai


Multiple Instruction Issue

and
Hardware Based Speculation

Soner Önder
Michigan Technological University, Houghton MI
www.cs.mtu.edu/~soner
Hardware Based Speculation

• Exploiting more ILP requires that we overcome the limitation of control dependence:
  - With branch prediction, we allowed the processor to continue issuing instructions past a branch based on a prediction:
      - Those fetched instructions do not modify the processor state.
      - These instructions are squashed if the prediction is incorrect.
  - We now allow the processor to execute these instructions before we know whether it is OK to execute them:
      - We need to correctly restore the processor state if such an instruction should not have been executed.
      - We need to pass the results from these instructions to future instructions as if the program were simply following that path.
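Hardware Based Speculation

[Figure: control-flow graph. Branch B1 tests x < y: the not-taken (N) arm executes A = b + c; C = c - 1, and the taken (T) arm executes C = 0; A = 0. Branch B2 tests x < z; its N and T arms contain the assignments B = b + 1, C = a and A = a + 1. Execution then reaches D = a + b + c, followed later by a use of d.]

• Assume the processor predicts B1 to be taken and executes.
• What will happen if the prediction was wrong?
• What value of each variable should be used if the processor predicts B1 and B2 taken and executes instructions along the way?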
Hardware Based Speculation

• In order to execute instructions speculatively, we need to provide the means:
  - To roll back the values of both registers and memory to their correct values upon a misprediction,
  - To communicate speculatively calculated values to the new uses of those values.

• Both can be provided by a simple structure called the Reorder Buffer (ROB).
Reorder Buffer

• It is a simple circular array with a head and a tail pointer:
  - Each new instruction is allocated a position at the tail, in program order.
  - Each entry provides a location for storing the instruction's result.
  - New instructions look for their source values by searching backward from the tail.
  - When the instruction at the head completes and becomes non-speculative, its values are committed and the instruction is removed from the buffer.

[Figure: circular array with Tail and Head pointers]
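The circular array described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the slides' design: the names `RobEntry`, `ReorderBuffer`, `allocate`, `lookup`, `write_result` and `commit` are hypothetical, registers are plain dictionary keys, and exceptions and branches are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    instr: str                   # instruction text, for inspection only
    dest: str                    # destination register name
    value: Optional[int] = None  # speculative result, filled in at write-back
    done: bool = False           # True once the result is present


class ReorderBuffer:
    """Hypothetical circular ROB with head and tail pointers."""

    def __init__(self, size: int):
        self.entries = [None] * size
        self.head = 0    # oldest instruction, next to commit
        self.tail = 0    # next free slot
        self.count = 0

    def is_full(self) -> bool:
        return self.count == len(self.entries)

    def allocate(self, instr: str, dest: str) -> int:
        """Give a new instruction the slot at the tail, in program order."""
        if self.is_full():
            raise RuntimeError("reorder buffer full: issue must stall")
        slot = self.tail
        self.entries[slot] = RobEntry(instr, dest)
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return slot

    def write_result(self, slot: int, value: int) -> None:
        entry = self.entries[slot]
        entry.value, entry.done = value, True

    def lookup(self, reg: str) -> Optional[int]:
        """Search backward from the tail for the newest producer of reg."""
        size = len(self.entries)
        for back in range(1, self.count + 1):
            slot = (self.tail - back) % size
            if self.entries[slot].dest == reg:
                return slot
        return None  # no in-flight producer: read the register file instead

    def commit(self, regfile: dict) -> Optional[RobEntry]:
        """Retire the head entry, but only if its result is present."""
        if self.count == 0 or not self.entries[self.head].done:
            return None
        entry = self.entries[self.head]
        regfile[entry.dest] = entry.value
        self.entries[self.head] = None
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return entry


regs = {"r1": 5}
rob = ReorderBuffer(4)
older = rob.allocate("r1 = r1 + 3", "r1")
newer = rob.allocate("r1 = r1 * 2", "r1")
rob.write_result(newer, 16)        # the younger instruction finishes first
print(rob.lookup("r1") == newer)   # a new consumer is pointed at the newest producer
print(rob.commit(regs))            # the head has no result yet, so nothing commits
```

Searching from the tail backward is what makes a later reader pick up the most recent speculative value of a register, while committing only from the head keeps the register file updated in program order.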
Reorder Buffer

• 3 fields: instr, destination, value
• The reorder buffer can be an operand source => more registers, like the RS
• Use the reorder buffer number instead of the reservation station number when execution completes
• Supplies operands between execution complete & commit
• Once an operand commits, the result is put into the register
• Instructions commit
• As a result, it's easy to undo speculated instructions on mispredicted branches or on exceptions
Steps of Speculative Tomasulo Algorithm

1. Issue [get instruction from FP Op Queue]

• Check if the reorder buffer is full.
• Check if a reservation station is available.
• Access the register file and the reorder buffer for the current values of the source operands.
• Send the instruction, its reorder buffer slot number, and the source operands to the reservation station.

Once issued, the instruction stays in the reservation station until it gets both operands.
Steps of Speculative Tomasulo Algorithm

2. Execute [operate on operands (EX)]

When both operands are ready and a functional unit is available, the instruction executes. This step checks for RAW hazards: as long as the operands are not ready, the instruction watches the CDB for results.
Steps of Speculative Tomasulo Algorithm

3. Write result [finish execution (WB)]

Write the result on the Common Data Bus to all awaiting FUs and the reorder buffer; mark the reservation station available.
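The Execute and Write result steps can be sketched as a reservation station that waits on reorder buffer slot numbers and captures values broadcast on the Common Data Bus. This is only an illustrative sketch: the field names `vj`, `vk`, `qj`, `qk` follow the usual textbook convention rather than anything on the slides, the helper names `execute` and `write_result` are hypothetical, and the CDB is modeled as a simple loop over all stations.

```python
from dataclasses import dataclass
from typing import Optional

# A tiny, assumed set of operations; a real FP unit has its own opcodes.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


@dataclass
class ReservationStation:
    op: str
    rob_slot: int              # reorder buffer slot that receives the result
    vj: Optional[int] = None   # operand values, once known
    vk: Optional[int] = None
    qj: Optional[int] = None   # ROB slot numbers still being waited on
    qk: Optional[int] = None
    busy: bool = True

    def ready(self) -> bool:
        """Both operands present, so there is no outstanding RAW hazard."""
        return self.busy and self.qj is None and self.qk is None

    def snoop(self, tag: int, value: int) -> None:
        """Capture a CDB broadcast if it carries a tag this station waits on."""
        if self.qj == tag:
            self.vj, self.qj = value, None
        if self.qk == tag:
            self.vk, self.qk = value, None


def execute(rs: ReservationStation) -> int:
    """Step 2: operate on the operands once both are ready."""
    if not rs.ready():
        raise RuntimeError("operands not ready: keep watching the CDB")
    return OPS[rs.op](rs.vj, rs.vk)


def write_result(rs, result, stations, rob_values) -> None:
    """Step 3: broadcast on the CDB, update the ROB, free the station."""
    rob_values[rs.rob_slot] = result
    for other in stations:
        if other.busy:
            other.snoop(rs.rob_slot, result)
    rs.busy = False


producer = ReservationStation("add", rob_slot=0, vj=2, vk=3)
consumer = ReservationStation("mul", rob_slot=1, qj=0, vk=4)  # waits on ROB slot 0
stations = [producer, consumer]
rob_values = {}

write_result(producer, execute(producer), stations, rob_values)
print(consumer.ready(), execute(consumer))
```

Tagging results with the reorder buffer slot number, rather than the reservation station number, is what lets the same broadcast update both the waiting stations and the reorder buffer entry.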
Steps of Speculative Tomasulo Algorithm

4. Commit [update register file with reorder result]

• When the instruction reaches the head of the reorder buffer, and
  - the result is present, and
  - no exceptions are associated with the instruction,
  the instruction becomes non-speculative:
  - Update the register file with the result (or store to memory).
  - Remove the instruction from the reorder buffer.

A mispredicted branch flushes the reorder buffer.
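A rough sketch of the commit step follows. It assumes, beyond what the slide states, that the flush happens when the mispredicted branch reaches the head of the buffer and that an excepting instruction also discards everything behind it; the `Entry` fields and the `commit_one` helper are hypothetical names chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str                   # "alu", "store" or "branch"
    dest: object = None         # register name, or memory address for a store
    value: object = None
    done: bool = False          # result is present
    exception: bool = False
    mispredicted: bool = False  # only meaningful for branches


def commit_one(rob: deque, regs: dict, mem: dict) -> str:
    """Try to retire the head entry; report what happened."""
    if not rob or not rob[0].done:
        return "stall"          # head result not present yet
    head = rob.popleft()
    if head.exception:
        rob.clear()             # assumed: discard all younger work
        return "exception"
    if head.kind == "branch":
        if head.mispredicted:
            rob.clear()         # wrong-path instructions never touch state
            return "flush"
        return "commit"
    if head.kind == "store":
        mem[head.dest] = head.value   # memory is written only at commit
    else:
        regs[head.dest] = head.value  # register file is written only at commit
    return "commit"


regs, mem = {"r1": 0, "r2": 0}, {}
rob = deque([
    Entry("alu", "r1", 10, done=True),
    Entry("branch", done=True, mispredicted=True),
    Entry("alu", "r2", 99, done=True),   # wrong path
    Entry("store", 0x40, 7, done=True),  # wrong path
])
first = commit_one(rob, regs, mem)
second = commit_one(rob, regs, mem)
print(first, second, regs, mem)
```

Because registers and memory are only updated at commit, undoing the wrong-path instructions amounts to emptying the buffer; no architectural state has to be repaired.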

MIPS FP Unit

[Figure: MIPS FP unit]

Renaming Registers

• Common variation of speculative design
• Reorder buffer keeps instruction information but not the result
• Extend register file with extra renaming registers to hold speculative results
• Rename register allocated at issue; result into rename register on execution complete; rename register into real register on commit
• Operands read either from register file (real or speculative) or via Common Data Bus
• Advantage: operands are always from single source (extended register file)
Renaming Registers

1. Index a MAP table using the source register identifiers to get the physical register number.
2. Get the previous physical register number for the destination register.
3. Allocate a free physical register and modify the MAP table by indexing it with the destination register identifier.
4. When the instruction commits, return the previous physical register to the pool.

[Figure: a map table with entries 0-31 pointing into a file of physical registers 0-127]
Renaming Registers

Initial state: the map table maps registers 0-7 to physical registers 0-7; the free pool holds physical registers 9, 10, 22, 13, 17.

Code sequence   Renamed code sequence   Previous dest   Map table update   Free pool afterwards
R7=r4+r3        R9=r4+r3                R7              7 -> 9             10, 22, 13, 17
R6=r2+r6        R10=r2+r6               R6              6 -> 10            22, 13, 17
R3=r6+r7        R22=r10+r9              R3              3 -> 22            13, 17
R6=r6+10        R13=r10+10              R10             6 -> 13            17

When r13=r10+10 retires, physical register 10 returns to the pool (17, 10).
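The four renaming steps can be replayed on the slides' own example with a short Python sketch. The representation is an assumption made for illustration: the map table is a dictionary keyed by names such as "r7", the free pool is a queue, a source given as a string is a register and a source given as an integer is an immediate; the `rename` helper is hypothetical.

```python
from collections import deque


def rename(code, map_table, free_list):
    """Rename (dest, sources) instructions; return (renamed text, previous dest)."""
    renamed = []
    for dest, sources in code:
        # 1. Index the map table with the source register identifiers.
        phys = [f"r{map_table[s]}" if isinstance(s, str) else str(s)
                for s in sources]
        # 2. Remember the previous physical register of the destination.
        previous = map_table[dest]
        # 3. Allocate a free physical register and update the map table.
        new = free_list.popleft()
        map_table[dest] = new
        renamed.append((f"R{new}=" + "+".join(phys), previous))
    return renamed


map_table = {f"r{i}": i for i in range(8)}   # identity mapping, as on the slides
free_list = deque([9, 10, 22, 13, 17])
code = [
    ("r7", ["r4", "r3"]),   # R7=r4+r3
    ("r6", ["r2", "r6"]),   # R6=r2+r6
    ("r3", ["r6", "r7"]),   # R3=r6+r7
    ("r6", ["r6", 10]),     # R6=r6+10
]

renamed = rename(code, map_table, free_list)
for text, previous in renamed:
    print(f"{text:12s} previous dest: {previous}")

# 4. When the last instruction retires, its previous physical register
#    goes back to the pool.
free_list.append(renamed[-1][1])
print(list(free_list))
```

Sources are translated before the destination is remapped, which is why R6=r6+10 reads physical register 10 and writes physical register 13.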
Limits to ILP

Assumptions for ideal/perfect machine to start:

1. Register renaming: infinite virtual registers, so all WAW & WAR hazards are avoided
2. Branch prediction: perfect; no mispredictions
3. Jump prediction: all jumps perfectly predicted => machine with perfect speculation & an unbounded buffer of instructions available
4. Memory-address alias analysis: addresses are known & a load can be moved before a store provided the addresses are not equal

Also: 1-cycle latency for all instructions; unlimited number of instructions issued per clock cycle
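Under these assumptions only true (RAW) dependences limit parallelism, so the ideal IPC of a straight-line trace is its instruction count divided by the length of its longest dependence chain. The sketch below illustrates this on a register-only trace; the `ideal_ipc` helper and the trace format are hypothetical, and memory dependences are not modeled.

```python
def ideal_ipc(code):
    """Return (ipc, cycles) for a list of (dest, source_registers) instructions.

    Assumes 1-cycle latency, unlimited issue width and perfect renaming,
    so an instruction runs one cycle after its latest-produced source.
    """
    if not code:
        return 0.0, 0
    ready = {}      # register -> cycle at the end of which its value exists
    cycles = 0
    for dest, sources in code:
        start = max((ready.get(s, 0) for s in sources), default=0)
        finish = start + 1
        # Overwriting the entry models renaming: later readers depend only
        # on the newest writer, so WAW and WAR hazards never add delay.
        ready[dest] = finish
        cycles = max(cycles, finish)
    return len(code) / cycles, cycles


trace = [
    ("r7", ["r4", "r3"]),   # R7=r4+r3
    ("r6", ["r2", "r6"]),   # R6=r2+r6
    ("r3", ["r6", "r7"]),   # R3=r6+r7, needs both results above
    ("r6", ["r6"]),         # R6=r6+10, needs the second instruction
]
ipc, cycles = ideal_ipc(trace)
print(ipc, cycles)
```

For this four-instruction trace the first two instructions are independent and the last two each wait one cycle, so the trace completes in two cycles.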
Upper Limit to ILP: Ideal Machine

[Figure: bar chart of instruction issues per cycle (IPC) on the ideal machine for the programs gcc, espresso, li, fpppp, doducd, tomcatv. Integer: 18 - 60 (bar values 17.9, 54.8, 62.6); FP: 75 - 150 (bar values 75.2, 118.7, 150.1).]
More Realistic HW: Branch Impact

Change from infinite window to examine to 2000, and maximum issue of 64 instructions per clock cycle.

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doducd, tomcatv under five branch predictors: Perfect, Selective predictor, Standard 2-bit, Static, None. FP: 15 - 45; Integer: 6 - 12.]

More Realistic HW: Register Impact

Change: 2000 instr window, 64 instr issue, 8K 2-level prediction.

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doducd, tomcatv with six register configurations: Infinite, 256, 128, 64, 32, None. FP: 11 - 45; Integer: 5 - 15.]

More Realistic HW: Alias Impact

Change: 2000 instr window, 64 instr issue, 8K 2-level prediction, 256 renaming registers.

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doducd, tomcatv under four alias-analysis models: Perfect, Global/stack perfect, Inspection, None. FP: 4 - 45 (Fortran, no heap); Integer: 4 - 9.]

Realistic HW for '9X: Window Impact

Perfect disambiguation (HW), 1K selective prediction, 16-entry return, 64 registers, issue as many as window.

[Figure: bar chart of instruction issues per cycle (IPC) for gcc, espresso, li, fpppp, doducd, tomcatv with eight window sizes: Infinite, 256, 128, 64, 32, 16, 8, 4. FP: 8 - 45; Integer: 6 - 12.]
