
Lecture-14-03 02 2025

The document discusses hardware-based speculation in dynamic execution, focusing on the use of a Reorder Buffer (ROB) to manage speculative instructions and ensure correct program execution. It outlines the phases of instruction processing, the advantages and disadvantages of speculation, and the challenges faced by dynamically scheduled superscalar processors. Additionally, it describes the structure and function of the ROB, including how it handles exceptions and maintains the correct order of instruction commits.

Uploaded by

shwetank7744

LECTURE 14: H/W BASED SPECULATION
Motivation
Assumption in dynamic execution: Perfect branch prediction
Instructions fetched & issued but not executed until branch completes
Iter  Instruction         Issue  EX  Mem  Write CDB  Comment
1     L.D   F0, 0(R1)     1      2   3    4          First issue
1     ADD.D F4, F0, F2    2      5   -    8          Wait for L.D
1     S.D   F4, 0(R1)     3      4   9    -          Wait for ADD.D
1     DADDI R1, R1, #8    4      5   -    6
1     BNE   R1, R2, for   5      7   -    -          Wait for DADDI
2     L.D   F0, 0(R1)     6      8   10   11         Wait for BNE, structural hazards
2     ADD.D F4, F0, F2    7      12  -    15         Wait for L.D
2     S.D   F4, 0(R1)     8      9   16   -          Wait for ADD.D
2     DADDI R1, R1, #8    9      10  -    12
2     BNE   R1, R2, for   10     13  -    -          Wait for DADDI

Speculation
◦ Overcome control dependences by
◦ predicting branch outcomes and
◦ speculatively executing instructions as if predictions were correct

◦ Key idea
◦ allow instructions to execute out of order
◦ force them to commit in order
◦ prevent any irrevocable action until an instruction commits
Reorder Buffer (ROB)
◦ Mechanism to handle incorrect predictions
◦ Instructions wait in ROB until they are no longer speculative
◦ Thereafter, they commit (complete/graduate)
◦ Speculative instructions are not allowed to change processor state (registers, memory, etc.)
◦ ROB also provides precise exceptions
Reorder Buffer Structure
◦ Conceptually, ROB is FIFO queue with 4-field entries:
◦ type: instruction type such as arithmetic, load, store, branch
◦ destination: dest. reg. (load, arithmetic) or memory address (store)
◦ ready: flag indicating if instruction has completed execution
◦ value: value computed by instruction

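[Figure: ROB drawn as a FIFO of entries (type, dest, value, ready) in program order. The head pointer marks the oldest instruction, the tail pointer marks the youngest instruction, and entries beyond the tail are free.]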
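The four-field entry and the head/tail pointers can be sketched as a small circular queue. This is only an illustrative sketch: the names `ROBEntry`, `ReorderBuffer`, `allocate`, `oldest` and `release_head` are assumptions made for this example and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ROBEntry:
    type: str                                # "arith", "load", "store" or "branch"
    dest: Optional[Union[str, int]] = None   # register name, or memory address for a store
    value: Optional[float] = None            # result; meaningful only once ready is True
    ready: bool = False                      # True when execution has completed


class ReorderBuffer:
    """Circular FIFO: head = oldest instruction, tail = next free slot."""

    def __init__(self, size: int) -> None:
        self.entries: List[Optional[ROBEntry]] = [None] * size
        self.head = 0
        self.tail = 0
        self.count = 0

    def is_full(self) -> bool:
        return self.count == len(self.entries)

    def allocate(self, entry: ROBEntry) -> int:
        """Append the youngest instruction and return its ROB entry number."""
        if self.is_full():
            raise RuntimeError("ROB full: issue must stall")
        index = self.tail
        self.entries[index] = entry
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return index

    def oldest(self) -> Optional[ROBEntry]:
        """Entry at the head, or None if the ROB is empty."""
        return self.entries[self.head] if self.count else None

    def release_head(self) -> ROBEntry:
        """Remove and return the oldest entry (used when it commits)."""
        if self.count == 0:
            raise RuntimeError("ROB empty")
        entry = self.entries[self.head]
        self.entries[self.head] = None
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return entry


rob = ReorderBuffer(size=4)
ld = rob.allocate(ROBEntry("load", "F0"))
add = rob.allocate(ROBEntry("arith", "F4"))
print(ld, add, rob.oldest().type)  # 0 1 load
```

Because entries are only ever removed at the head, the queue order is the program order, which is what makes in-order commit possible.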
[Figure: Tomasulo datapath extended with a reorder buffer. The instruction queue is fed from the instruction fetch unit; the address unit, load buffers, reservation stations (Add1-Add3, Mul1-Mul2), memory unit, FP adders and FP multipliers are connected by the common data bus. Callouts: the ROB can supply operands; the RF (reg#, data) is written from the ROB instead of the CDB; the address unit computes the EA of stores; the store buffer is eliminated and its function is integrated in the ROB; results must be broadcast on the CDB with the ROB entry number.]
Points to remember regarding ROB
◦ Register file (RF) is written from the reorder buffer (ROB) instead of the CDB
◦ This ensures that speculative instructions do not write the RF
◦ ROB can also supply operand values
◦ If an operand is available in both the ROB and the RF, the value from the ROB must be used
◦ Since the ROB has the most recent value
◦ Store buffer is eliminated and its function is integrated in the ROB
◦ Connections from the Address Unit to the ROB
◦ The address calculated for a store must be stored in the destination field of the corresponding ROB entry
◦ Results must be broadcast on the CDB together with the number of the ROB entry that will hold them
◦ This is needed so that the ROB knows in which entry to store each result
Instruction Phases
◦Issue next instruction from instruction queue to RS
◦Execute when both operands are available
◦ Write result on CDB to all waiting RSs and ROB
◦Commit instruction at the head of the ROB
Instruction Phase 1: Issue
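◦ Issue next instruction from instruction queue if there is an empty RS and an empty ROB entry
◦ operands are read from the register file or from the value field of a ROB entry (the youngest ROB entry that writes the register holds the most recent value); an operand that is not yet available is replaced by the number of the ROB entry that will produce it
◦ the number of the ROB entry allocated to the instruction is sent along with it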
Instruction Phase 2: Execute
◦ Same as in Tomasulo's algorithm without speculation (see normal dynamic execution)
Instruction Phase 3: Write Results
◦ Write result on CDB to all waiting RSs and ROB
◦ the result is tagged with the number of the ROB entry
◦ Stores
◦ If the store's value is available -> write it to the value field of the store's ROB entry
◦ If not -> ROB monitors the CDB for the store value
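The issue and write-result phases can be sketched together in a few lines. This is a simplified model under stated assumptions: the class `SpeculativeCore` and its methods are hypothetical names for this example, the ROB is an unbounded list (so the "empty RS and empty ROB entry" check is omitted), and each issued instruction gets its own reservation station whose index equals its ROB entry number.

```python
class SpeculativeCore:
    """Toy model of issue and write-result with ROB-number tags (illustrative only)."""

    def __init__(self, regs):
        self.regs = dict(regs)   # committed register file (RF)
        self.rob = []            # ROB entries; list index = ROB entry number
        self.stations = []       # reservation stations, one per issued instruction
        self.producer = {}       # register -> ROB number of its youngest writer

    def read_operand(self, reg):
        """Return ("value", v) if the operand is available, else ("tag", rob_no)."""
        rob_no = self.producer.get(reg)
        if rob_no is None:
            return ("value", self.regs[reg])   # no pending writer: read the RF
        entry = self.rob[rob_no]
        if entry["ready"]:
            return ("value", entry["value"])   # most recent value lives in the ROB
        return ("tag", rob_no)                 # wait for this ROB entry on the CDB

    def issue(self, op, dest, sources):
        # Read sources before renaming dest, so "DADDI R1, R1, 8" sees the old R1.
        operands = [self.read_operand(r) for r in sources]
        rob_no = len(self.rob)
        self.rob.append({"op": op, "dest": dest, "value": None, "ready": False})
        self.stations.append({"op": op, "rob": rob_no, "operands": operands})
        if dest is not None:
            self.producer[dest] = rob_no
        return rob_no

    def write_result(self, rob_no, value):
        """Broadcast (rob_no, value) on the CDB: update the ROB and all waiting RSs."""
        entry = self.rob[rob_no]
        entry["value"], entry["ready"] = value, True
        for rs in self.stations:
            rs["operands"] = [("value", value) if o == ("tag", rob_no) else o
                              for o in rs["operands"]]


core = SpeculativeCore({"R1": 0, "F0": 0.0, "F2": 1.5, "F4": 0.0})
ld = core.issue("L.D", "F0", ["R1"])
add = core.issue("ADD.D", "F4", ["F0", "F2"])
print(core.stations[add]["operands"])  # [('tag', 0), ('value', 1.5)]
core.write_result(ld, 2.0)
print(core.stations[add]["operands"])  # [('value', 2.0), ('value', 1.5)]
print(core.regs["F0"])                 # 0.0
```

The last line shows the point of the scheme: the load result is visible to dependent instructions through the ROB, while the register file stays unchanged until commit.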
Instruction Phase 4: Commit
◦ Commit instruction at ROB head
◦ By checking the ready flag to see if it has completed
◦ Arithmetic and load write to register file
◦ Store writes to memory
◦ Correctly predicted branch finishes
◦ Incorrectly predicted branch -> flush the ROB
◦ set ROB tail to entry holding branch
◦ restart execution at correct successor
[Figure: two ROB snapshots (type, dest, value, ready) in program order. Before: the head (H) points to a ready branch (br 1), followed by a load that is not ready (ld 0) and a ready arithmetic instruction (arith 1) at the tail (T). After a misprediction: the younger entries are flushed and H = T at the branch entry.]
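The commit rules above can be sketched as a single function operating on the ROB head. This is a minimal sketch, not the lecture's implementation: the function name `commit`, the dictionary fields, and the `mispredicted` flag are assumptions introduced for the example, and the ROB is modelled as a `deque` whose left end is the head.

```python
from collections import deque


def commit(rob, regs, memory):
    """Try to commit the instruction at the ROB head.

    Returns "stall" if the head is not ready, "flush" if the head is a
    mispredicted branch, and "commit" otherwise.
    """
    if not rob or not rob[0]["ready"]:
        return "stall"                          # in-order commit: younger entries wait
    entry = rob.popleft()
    if entry["type"] in ("arith", "load"):
        regs[entry["dest"]] = entry["value"]    # register file is written only here
    elif entry["type"] == "store":
        memory[entry["dest"]] = entry["value"]  # dest holds the store address
    elif entry["type"] == "branch" and entry.get("mispredicted", False):
        rob.clear()                             # discard all speculative instructions
        return "flush"                          # fetch restarts at the correct successor
    return "commit"


# Same situation as in the slide: a branch at the head, then ld (not ready), then arith.
regs, memory = {"F4": 0.0}, {}
rob = deque([
    {"type": "branch", "ready": True, "mispredicted": True},
    {"type": "load", "dest": "F0", "value": None, "ready": False},
    {"type": "arith", "dest": "F4", "value": 3.5, "ready": True},
])
print(commit(rob, regs, memory), len(rob), regs["F4"])  # flush 0 0.0
```

Although the arithmetic instruction had already produced 3.5, the register file still holds 0.0 after the flush, because the speculative result never left the ROB.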
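ROB Preserves Exception Behaviour
◦ Separate raising an exception from taking the exception
◦ Instruction causes exception during execution -> raise exception in ROB entry
◦ Instruction commit -> take exception if raised
◦ Instructions commit in program order
◦ only exceptions that should occur, do occur
[Figure: three ROB snapshots (type, des, val, rdy, excep) holding a ready branch (br 1), a ready divide with a divide-by-zero exception raised (div 1 DBZ), and an arithmetic instruction that is not ready (arith 0). (1) The exception is recorded in the div entry while the branch is at the head. (2) "Take exception": the div entry reaches the head and the exception is taken. (3) "Exception not taken": the ROB is flushed back to the branch (H = T), so the exception raised by div is discarded.]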
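Separating "raise" from "take" can be illustrated with a short sketch. The helper names `execute_div`, `commit_head` and `make_rob`, the `excep` and `mispredicted` fields, and the choice to flush the ROB when an exception is taken are assumptions made for this example.

```python
from collections import deque


def execute_div(entry, dividend, divisor):
    """Execute a divide; a fault is only recorded (raised) in the ROB entry."""
    if divisor == 0:
        entry["excep"] = "DBZ"
    else:
        entry["value"] = dividend / divisor
    entry["ready"] = True


def commit_head(rob):
    """Commit the head entry; the exception is taken only at this point."""
    if not rob or not rob[0]["ready"]:
        return "stall"
    entry = rob.popleft()
    if entry.get("mispredicted"):
        rob.clear()                     # younger entries, and their exceptions, vanish
        return "flush"
    if entry.get("excep"):
        rob.clear()                     # assumed here: younger instructions are squashed
        return "take " + entry["excep"]
    return "commit"


def make_rob(mispredicted):
    return deque([
        {"type": "br", "ready": True, "mispredicted": mispredicted},
        {"type": "div", "ready": False},
        {"type": "arith", "ready": False},
    ])


rob = make_rob(mispredicted=False)
execute_div(rob[1], 1.0, 0.0)
print(commit_head(rob), "/", commit_head(rob))  # commit / take DBZ

rob = make_rob(mispredicted=True)
execute_div(rob[1], 1.0, 0.0)
print(commit_head(rob), "/", commit_head(rob))  # flush / stall
```

In the second run the divide was on the wrong path, so its divide-by-zero is never taken: only exceptions that should occur, do occur.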
Pros and Cons of Speculation
◦ Advantages of speculation:
◦ Uses functional units that would otherwise be unused for several cycles
◦ Eliminates stalls due to control hazards when the prediction is correct
◦ Potential disadvantages of speculation:
◦ consumes time and energy if wrongly speculated
◦ requires extra HW and power
◦ performance reduction when a speculative instruction causes an exceptional event (cache miss, TLB miss)
◦ Example: a speculatively executed load may cause a cache miss; handling the miss takes time, and this time is wasted when the speculation is wrong
How much to speculate?
◦ Typical solution:
◦ Low-cost exceptional events (e.g. L1 cache miss) are handled speculatively
◦ When an expensive exceptional event (e.g. L2 cache miss, TLB miss, …) occurs, the processor waits until the instruction causing the event is no longer speculative before handling the event
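The policy above amounts to a small decision rule. The sketch below encodes it directly; the event names, the two sets and the function `action_for` are hypothetical labels chosen for this example, and which events count as "low-cost" is a design choice of a given processor.

```python
# Assumed classification, taken from the examples on the slide.
LOW_COST_EVENTS = {"L1 cache miss"}
EXPENSIVE_EVENTS = {"L2 cache miss", "TLB miss"}


def action_for(event, speculative):
    """Decide how to treat an exceptional event caused by an instruction."""
    if event in EXPENSIVE_EVENTS and speculative:
        # Do not pay a large penalty for an instruction that may be squashed.
        return "wait until no longer speculative"
    return "handle now"


print(action_for("L1 cache miss", speculative=True))   # handle now
print(action_for("TLB miss", speculative=True))        # wait until no longer speculative
print(action_for("TLB miss", speculative=False))       # handle now
```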
Multiple Issue Processors
◦ Ideal CPI of a single-issue processor is 1
◦ Even with speculation
◦ How to break the CPI = 1 barrier?
◦ Multiple issue
[Figure: pipeline diagram with stages IF, ID, EX, M, WB, illustrating multiple issue.]
Multiple Issue Approaches
◦ Dynamically scheduled superscalar
◦ VLIW processors
◦ Scheduling performed by the compiler
◦ Statically scheduled superscalar
◦ Similar to VLIW
Challenges of Dynamically Scheduled Superscalars
◦ Need to fetch multiple instructions/cycle
◦ Highly complex because instruction stream contains branches
◦ Need to issue multiple instructions / cycle
◦ Very complex because instructions may depend on each other
◦ 2 solutions
◦ pipeline the issue logic
◦ Run each issue step in half a clock cycle -> at most 2 instructions can be issued per cycle
◦ widen the issue logic
◦ Needs multiple operation and operand buses, multiple RF read ports
◦ Need to widen CDB to complete multiple instructions / cycle
◦ Multiple comparators at reservation stations
Superscalar Execution

double x[1000], s;
for (int i = 0; i < 1000; i++)
    x[i] = x[i] + s;

      ; R1 = x = &x[0]
      ; R2 = &x[1000]
      ; F2 = s
for:  L.D    F0, 0(R1)     ; F0 = x[i]
      ADD.D  F4, F0, F2    ; F4 = x[i] + s
      S.D    F4, 0(R1)     ; x[i] = F4
      DADDI  R1, R1, 8     ; R1 += 8, i.e. R1 = &x[i+1]
      BNE    R1, R2, for   ; if (R1 != &x[1000]) goto for
Assumptions of the hardware
◦ Dual-issue superscalar
◦ 1 integer functional unit (FU), used for both ALU operations and effective address (EA) calculations
◦ Separate FU to evaluate branch conditions
◦ Separate pipelined FP FU
◦ Issue and Write take 1 cc each
◦ Perfect branch prediction
◦ Instructions are fetched and issued but not actually executed until the branch has completed (no speculation)
Execution latencies: Int ALU: 1 cc; Ld/St: 2 cc (EX + MEM); FP add: 3 cc
Dual-Issue Dynamic Superscalar without Speculation
Iter  Instruction         Issue  EX  Mem  Write CDB  Comment
1     L.D   F0, 0(R1)     1      2   3    4          No stalls for the first instruction
1     ADD.D F4, F0, F2    1      5   -    8          Wait for L.D
1     S.D   F4, 0(R1)     2      3   9    -          Wait for ADD.D
1     DADDI R1, R1, #8    2      4   -    5          Wait for ALU: the integer FU is busy calculating the effective address of the S.D
1     BNE   R1, R2, for   3      6   -    -          Wait for DADDI
2     L.D   F0, 0(R1)     4      7   8    9          Fetched after branch prediction; EX waits for BNE, since we assume no speculation
2     ADD.D F4, F0, F2    4      10  -    13         Wait for L.D
2     S.D   F4, 0(R1)     5      8   14   -          EX waits for L.D to complete its EA calculation; memory access waits for ADD.D
2     DADDI R1, R1, #8    5      9   -    10         Wait for ALU
2     BNE   R1, R2, for   6      11  -    -          Wait for DADDI
3     L.D   F0, 0(R1)     7      12  13   14         Wait for BNE completion
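The timing table can be checked mechanically against the stated hardware assumptions. The sketch below is a consistency check rather than a pipeline simulator: the cycle numbers are copied from the table, while the name `violations` and the exact set of rules it tests are choices made for this example.

```python
# (iteration, instruction, issue, EX, Mem, Write CDB); None = stage not used
TABLE = [
    (1, "L.D",   1, 2,  3,    4),
    (1, "ADD.D", 1, 5,  None, 8),
    (1, "S.D",   2, 3,  9,    None),
    (1, "DADDI", 2, 4,  None, 5),
    (1, "BNE",   3, 6,  None, None),
    (2, "L.D",   4, 7,  8,    9),
    (2, "ADD.D", 4, 10, None, 13),
    (2, "S.D",   5, 8,  14,   None),
    (2, "DADDI", 5, 9,  None, 10),
    (2, "BNE",   6, 11, None, None),
    (3, "L.D",   7, 12, 13,   14),
]


def violations(table):
    """Return a list of rules from the slides that the table breaks."""
    rows = {(it, name): (issue, ex, mem, wr) for it, name, issue, ex, mem, wr in table}
    problems = []
    for it in sorted({key[0] for key in rows}):
        ld, add, sd = (rows.get((it, n)) for n in ("L.D", "ADD.D", "S.D"))
        daddi, bne = rows.get((it, "DADDI")), rows.get((it, "BNE"))
        next_ld = rows.get((it + 1, "L.D"))
        if ld and add and add[1] <= ld[3]:
            problems.append(f"iter {it}: ADD.D starts before L.D writes the CDB")
        if add and sd and sd[2] <= add[3]:
            problems.append(f"iter {it}: S.D accesses memory before ADD.D writes")
        if daddi and bne and bne[1] <= daddi[3]:
            problems.append(f"iter {it}: BNE starts before DADDI writes")
        if bne and next_ld and next_ld[1] <= bne[1]:
            problems.append(f"iter {it}: next L.D executes before BNE (speculation)")
    # Single integer FU: L.D, S.D and DADDI must not share an EX cycle.
    int_ex = [v[1] for (it, name), v in rows.items() if name in ("L.D", "S.D", "DADDI")]
    if len(int_ex) != len(set(int_ex)):
        problems.append("integer FU used by two instructions in one cycle")
    # Dual issue: at most two instructions issue per cycle.
    issue_cycles = [v[0] for v in rows.values()]
    if any(issue_cycles.count(c) > 2 for c in set(issue_cycles)):
        problems.append("more than two instructions issued in one cycle")
    return problems


print(violations(TABLE))  # []
```

An empty list means every row respects the data dependences, the single integer FU, the dual-issue limit, and the rule that no instruction executes before the preceding branch.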


Instructions per cycle of Dual-Issue Superscalar without Speculation
◦ IPC = 1/CPI
◦ New loop instructions (5 instructions) are fetched and issued every 3 cc
◦ IPC = 5/3 = 1.67?
◦ However, for 10 instructions, the IPC = 10/14 = 0.71
◦ approaches 1 if we schedule more iterations
◦ Let's check for 3 iterations
◦ Issue unit will eventually fill all RSs and stall
◦ IPC of single-issue dynamically scheduled core (for 10 instructions it takes 16 cc) = 0.625
◦ For 15 instructions it takes 21 cc, i.e. 0.71
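The arithmetic in these bullets can be reproduced in a few lines. The instruction and cycle counts are the ones quoted on the slide; the helper names `ipc` and `cpi` are just labels for this example.

```python
def ipc(instructions, cycles):
    """Instructions per cycle."""
    return instructions / cycles


def cpi(instructions, cycles):
    """Cycles per instruction, the reciprocal of IPC."""
    return cycles / instructions


print(round(ipc(5, 3), 2))    # 1.67  issue rate: 5 instructions every 3 cc
print(round(ipc(10, 14), 2))  # 0.71  dual issue, 2 iterations completed in 14 cc
print(ipc(10, 16))            # 0.625 single issue, 2 iterations in 16 cc
print(round(ipc(15, 21), 2))  # 0.71  single issue, 3 iterations in 21 cc
```

The gap between 1.67 and 0.71 is the point of the slide: the issue rate is an upper bound, and the achieved IPC is set by how fast instructions can actually execute.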
IPC of Dual-Issue Superscalar without Speculation
Iter  Instruction         Issue  EX  Mem  Write CDB  Comment
1     L.D   F0, 0(R1)     1      2   3    4
1     ADD.D F4, F0, F2    1      5   -    8
1     S.D   F4, 0(R1)     2      3   9    -
1     DADDI R1, R1, 8     2      4   -    5          1 INT ALU: the DADDI has to wait for the S.D to complete the EA calculation
1     BNE   R1, R2, for   3      6   -    -
2     L.D   F0, 0(R1)     4      7   8    9          Control hazard causes 2 cc of stalls

• Extra int ALU will improve the IPC

• Control hazard causes 2-cycle penalty every iteration


• Use speculation
Dual-Issue Dynamic Superscalar with Speculation
Iter  Instruction         Issue  EX  Read acc  Write CDB  Commit  Comment
1     L.D   F0, 0(R1)     1      2   3         4          5       No stalls for the first issue
1     ADD.D F4, F0, F2    1      5   -         8*         9       Wait for L.D; EX starts after L.D writes the CDB
1     S.D   F4, 0(R1)     2      3   -         -          9       Commits only after the value has been placed on the CDB by ADD.D
1     DADDI R1, R1, #8    2      4   -         5          10      Wait for ALU (in cc 3 the ALU is used by S.D); commits in cc 10 because of in-order commit
1     BNE   R1, R2, for   3      6   -         -          10      Branch waits for preceding DADDI
2     L.D   F0, 0(R1)     4      6   7         8*         11      Waits for DADDI, but does not wait for BNE because of speculation
2     ADD.D F4, F0, F2    4      9   -         12*        13      Wait for L.D
2     S.D   F4, 0(R1)     5      7   -         -          13      Wait for ADD.D
2     DADDI R1, R1, #8    5      8   -         9          14      Wait for ALU to become available
2     BNE   R1, R2, for   6      10  -         -          14      Wait for DADDI
3     L.D   F0, 0(R1)     7      10  11        12*        15      Wait for DADDI
3     ADD.D F4, F0, F2    7      13  -         16         17
3     S.D   F4, 0(R1)     8      11  -         -          17
3     DADDI R1, R1, #8    8      12  -         13         18

* Assuming multiple instructions can write the CDB in the same cycle.
Up to 2 instructions can commit per cycle (since the processor is dual issue).

◦IPC for two iterations = 10/14 = 0.71
◦ For three iterations = 15/18 = 0.83
IPC of Dual-Issue Superscalar with Speculation
Iter  Instruction         Issue  EX  Read acc  Write CDB  Commit
2     L.D   F0, 0(R1)     4      6   7         8          11
2     ADD.D F4, F0, F2    4      9   -         12         13
2     S.D   F4, 0(R1)     5      7   -         -          13
2     DADDI R1, R1, 8     5      8   -         9          14
2     BNE   R1, R2, for   6      10  -         -          14

• 5 instructions commit every 4 cycles -> IPC = 5/4 = 1.25


• does not improve if we add int ALU
• ADD.D has to wait for L.D and writes CDB at same time L.D commits
• commit -> always lose 1 cycle per iteration
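The commit column of the speculative schedule can be used to confirm these figures. In the sketch below the commit cycles for iterations 1 and 2 are copied from the lecture's table; the function names and the checks themselves are assumptions made for this example.

```python
from collections import Counter

# (iteration, instruction, commit cycle) for iterations 1 and 2
COMMITS = [
    (1, "L.D", 5), (1, "ADD.D", 9), (1, "S.D", 9), (1, "DADDI", 10), (1, "BNE", 10),
    (2, "L.D", 11), (2, "ADD.D", 13), (2, "S.D", 13), (2, "DADDI", 14), (2, "BNE", 14),
]


def commits_in_order(commits):
    """True if commit cycles never decrease in program order."""
    cycles = [c for _, _, c in commits]
    return all(a <= b for a, b in zip(cycles, cycles[1:]))


def max_commits_per_cycle(commits):
    """Largest number of instructions committing in the same cycle."""
    return max(Counter(c for _, _, c in commits).values())


def steady_state_ipc(commits, first_iter, second_iter):
    """Instructions per iteration divided by the cycles between the two last commits."""
    last = {it: max(c for i, _, c in commits if i == it) for it in (first_iter, second_iter)}
    per_iter = sum(1 for i, _, _ in commits if i == second_iter)
    return per_iter / (last[second_iter] - last[first_iter])


print(commits_in_order(COMMITS))            # True
print(max_commits_per_cycle(COMMITS))       # 2
print(steady_state_ipc(COMMITS, 1, 2))      # 1.25
print(round(len(COMMITS) / COMMITS[-1][2], 2))  # 0.71
```

The steady-state value of 1.25 measures one iteration against the next, whereas 10/14 = 0.71 also includes the start-up cycles of the first iteration.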
Thank You
