CAO Fall 2024 Lecture 07 RISC V Pipelined Implementation

Uploaded by

Omair Siddique

EE-321 Fall 2024

Computer Architecture and Organization

Lecture # 07
RISC-V Pipelined Implementation

Muhammad Imran
Acknowledgement
2

▪ Content from following has been used in these lectures


▪ Computer Organization and Design, RISC-V 2nd Edition, Patterson and
Hennessy
Contents
3

▪ Pipelining – An Overview
▪ Pipelined Microarchitecture
▪ Dependencies and Pipelining Hazards
▪ Dealing with Pipelining Hazards in Hardware
▪ Handling Multicycle Operations
Pipelining to Improve Throughput
A Laundry Analogy
5

▪ Pipelining helps execute multiple tasks in parallel


▪ Improves throughput …
Improvement by Pipelining
6

▪ Pipelining only improves throughput


▪ Individual tasks still take same amount of time
▪ When the number of tasks is very large and the pipeline stages are perfectly balanced:
▪ Improvement in performance ≈ Number of pipeline stages
▪ Example
▪ Suppose each stage in laundry takes 10 minutes and there are 4 stages
▪ Without pipelining
▪ 4 loads take 40×4=160 minutes!
▪ With pipelining
▪ 4 loads take 70 minutes!
▪ With large number of loads, improvement would approach 4 times!
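The laundry arithmetic can be checked with a minimal Python sketch. The function name `total_minutes` and its parameters are illustrative assumptions, not part of the lecture, and the sketch assumes perfectly balanced stages.

```python
def total_minutes(loads, stages=4, stage_minutes=10, pipelined=True):
    """Time to finish `loads` tasks when every stage takes `stage_minutes`."""
    if not pipelined:
        # Each load runs through every stage before the next one starts.
        return loads * stages * stage_minutes
    # The first load fills the pipeline; every later load finishes one
    # stage time after the load in front of it.
    return (stages + loads - 1) * stage_minutes

for n in (4, 1000):
    slow = total_minutes(n, pipelined=False)
    fast = total_minutes(n, pipelined=True)
    print(n, slow, fast, round(slow / fast, 2))
```

For 4 loads the speedup is well below 4 because filling the pipeline dominates; the ratio only approaches the stage count as the number of loads grows.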
Pipelining in RISC-V
7

▪ Instruction execution stages


1. Fetch instruction from memory.
2. Read registers and decode the instruction.
3. Execute the operation or calculate an address.
4. Access an operand in data memory (if necessary).
5. Write the result into a register (if necessary).
▪ Therefore, we can implement a five-stage pipeline for RISC-V!

▪ RISC-V Pipeline Stages are not perfectly balanced!


Pipelining in RISC-V
8

▪ Example of RISC-V instruction execution time!

▪ Assuming multiplexors, control unit, PC access and sign-extension unit have no delay!
▪ Load takes the longest!
▪ For single-cycle, the clock period must be that of load time!
▪ Among individual stages the longest time is 200 ps!
▪ The clock period in pipelined architecture must be at least 200 ps (+ tclk2q
+ ts), although some stages take only 100 ps!!
Pipelining in RISC-V
9

▪ Time to execute three instructions


▪ Without pipelining: 2400 ps
▪ After pipelining: 1400 ps
▪ Time between first and 3rd instruction = 2×200 ps = 400 ps
Pipelining in RISC-V
10

▪ How about adding 1,000,000 more instructions (to current 3)?


▪ Each instruction adds 200 ps, total execution time using pipelining would be (1400 ps +
200×1,000,000 ps) = 200,001,400 ps
▪ For non-pipelined design execution time = (2400 ps + 1,000,000×800 ps) = 800,002,400
ps
▪ Improvement is about 4 times = the reduction in clock period, from 800 ps to 200 ps (given imbalanced stages)!
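The totals quoted for 3 and 1,000,003 instructions can be reproduced with a short Python sketch. The per-stage delays below are an assumption: the lecture's timing table is not reproduced in the text, so values consistent with the 800 ps load and the 200 ps longest stage are used. Pipeline register overhead (tclk2q + ts) is ignored.

```python
# Assumed stage delays in ps for IF, ID, EX, MEM, WB (not taken from the slides).
STAGE_PS = [200, 100, 200, 200, 100]

def single_cycle_ps(n):
    # The clock period must cover the slowest instruction (a load uses every stage).
    return n * sum(STAGE_PS)

def pipelined_ps(n):
    # The clock period is set by the slowest stage, even for faster stages.
    period = max(STAGE_PS)
    # The first instruction needs one cycle per stage; each later one adds one cycle.
    return (len(STAGE_PS) + n - 1) * period

for n in (3, 1_000_003):
    print(n, single_cycle_ps(n), pipelined_ps(n),
          round(single_cycle_ps(n) / pipelined_ps(n), 3))
```

With perfectly balanced 160 ps stages the ideal speedup would be 5; the imbalance caps it at 800/200 = 4.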
Pipelining in RISC-V
11

▪ Implementing RISC-V Pipelining


▪ RISC-V instructions are easier to pipeline, because of
▪ Fixed instruction size
▪ Easy to fetch and decode!
▪ Fixed location of source/destination operands
▪ Memory operands appear only in loads/stores
▪ Execution stage can be used for address calculation!
▪ Access memory in the following stage!
▪ Allowing memory operand in other instructions would increase number of
pipeline stages / imbalance!
▪ RISC-V has fewer instruction formats!

▪ x86 architecture has variable instruction sizes


▪ Hard to pipeline
▪ x86 instructions are translated into RISC like operations to implement
pipelining
Pipelining is a programmer invisible technique for
performance improvement …
Pipelined Microarchitecture
RISC-V Pipeline Stages
14

1. IF: Instruction fetch


2. ID: Instruction decode and register file read
3. EX: Execution or address calculation
4. MEM: Data memory access
5. WB: Write back
RISC-V Pipeline Stages
15

[Figure: RISC-V datapath divided into five stages (IF: Instruction Fetch, ID: Instruction Decode / Register File Read, EX: Execute, MEM: Memory Access, WB: Write Back), showing the PC, Instruction Memory, Register File, Imm Gen, ALU, ALU Control and Data Memory with the control signals Branch, RegWrite, MemRead, MemWrite, MemToReg, ALUSrc and ALUOp]
Adding Pipeline Registers
16

[Figure: datapath with the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB inserted between the five stages]
• Four Pipeline Registers
• PC also acts as a pipeline register
Writing Back the Correct Register
17

[Figure: datapath with the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB]
• Must save the Write Register address until the Write Back stage
• Similarly, any other signal for a particular instruction, needed in later stages!
Key Points!
18

▪ For a store instruction, the contents of the register read in Stage 2


must be saved in pipeline
▪ Used in 4th Stage
▪ A store instruction finishes in 4th stage (memory access)
▪ Can we execute instructions faster because of this?
▪ No, because other instructions are already in process!
▪ An instruction must pass through all stages even if it doesn’t use some
of the stages!
▪ Each logical component can be used only within a single pipeline
stage!
▪ Otherwise, there would be structural hazard!
Pipelining Control
Control Signals in Different Stages
20

[Figure: pipelined datapath with the control signals Branch, RegWrite, MemRead, MemWrite, MemToReg, ALUSrc and ALUOp]
• Which stage does RegWrite belong to?
Key in Pipelining Control
21

▪ Control Signals are generated in Instruction Decode Stage


▪ Save control signals in pipeline registers until they are used (in later
stages)!
Identifying Control Signals for Each Stage
22

▪ Needed in 3 subsequent stages after generation in Second Stage!


Pipelining Control
23

▪ Saving signals until used …


Pipelined Datapath and Control
24

[Figure: pipelined datapath and control; the control signals are grouped into E, M and W fields that are carried through the ID/EX, EX/MEM and MEM/WB pipeline registers until the stage that uses them]
Dependencies and Pipelining Hazards
Dependences and Hazards
26

▪ Dependency
▪ An instruction’s execution dependent on outcome of another instruction
▪ Types of dependences
▪ Flow dependences or true data dependences
▪ Name dependences
▪ Control dependences

▪ Hazard
▪ Situations when an instruction in the pipeline cannot correctly execute
in the following cycle
▪ Types of hazards
▪ Structural Hazards
▪ Data Hazards
▪ Control Hazards
Dependences and Hazards
27

▪ Dependencies are a property of program


▪ A dependence may or may not result in hazard!
▪ A hazard may or may not require pipeline stall to resolve!
▪ This is a property of the pipeline organization!
▪ Implementation dependent!
Data Dependences
28

▪ Flow dependence: Read after Write (RAW), True Data Dependency
▪ x3 ← x1 op x2
▪ x5 ← x3 op x4

▪ Anti dependence: Write after Read (WAR), Name Dependency
▪ x3 ← x1 op x2
▪ x1 ← x4 op x5

▪ Output dependence: Write after Write (WAW), Name Dependency
▪ x3 ← x1 op x2
▪ x5 ← x3 op x4
▪ x3 ← x6 op x7

▪ Name dependences do not lead to a hazard in the simple in-order pipeline we are discussing now!
▪ Hazards due to name dependences can be addressed by register renaming!
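The three kinds of data dependence can be told apart by comparing destination and source registers. The following Python sketch shows one way to do it; the `(dest, sources)` tuple encoding and the function name are assumptions made for illustration only.

```python
def dependences(first, second):
    """Classify the data dependences of `second` on an earlier instruction `first`.

    Each instruction is a hypothetical (dest, sources) pair,
    e.g. ("x3", ("x1", "x2")) for x3 <- x1 op x2.
    """
    d1, s1 = first
    d2, s2 = second
    found = []
    if d1 in s2:
        found.append("RAW")   # second reads what first writes (flow dependence)
    if d2 in s1:
        found.append("WAR")   # second overwrites what first reads (anti dependence)
    if d1 == d2:
        found.append("WAW")   # both write the same register (output dependence)
    return found

print(dependences(("x3", ("x1", "x2")), ("x5", ("x3", "x4"))))
print(dependences(("x3", ("x1", "x2")), ("x1", ("x4", "x5"))))
print(dependences(("x3", ("x1", "x2")), ("x3", ("x6", "x7"))))
```

This check only sees register names; as noted for memory operands, dependences through memory cannot be found by comparing names alone.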
Data Dependences
29

▪ Data value may flow through registers or memory


▪ Data-dependence through registers is easier to detect
▪ That through memory is harder
▪ 100(x4) and 20(x6) may be same locations!
▪ 20(x4) and 20(x4) may be different in different instructions!
▪ Data hazard through memory is present when memory is accessed out
of program order!
▪ Not a case in simple pipeline!
Control Dependences
30

▪ Every instruction except those in the first basic block is dependent on the outcome of a branch!
▪ Example
▪ add x1, x2, x3
▪ beq x4, x0, L
▪ sub x1, x5, x6
▪ …
▪ L:
▪ or x7, x1, x8
Structural Hazards
31

▪ When hardware cannot execute a combination of instructions in the same cycle
▪ The required hardware resource is busy!
▪ Example
▪ If we had one memory for instructions and data in RISC-V
▪ Memory access from one instruction and instruction fetch of another
couldn’t execute in one cycle

▪ RISC-V instruction set was designed to be pipelined


▪ Easier to avoid structural hazards in pipelined implementation
Addressing Data Hazards in the Pipeline
In a simple five stage in-order integer pipeline, we
only need to address Read-after-Write (RAW) data
hazard which is due to true data dependency!
Data Hazards
34

▪ When an instruction cannot execute because the data required by it is not yet available
▪ Dependence of one instruction on an earlier instruction in the pipeline
▪ Example
▪ add x1, x2, x3
▪ sub x4, x1, x5
▪ x1 is written back by the first instruction later than it is required by the 2nd instruction!

Without addressing data hazard outdated value of x1 will be read!


Addressing Data Hazards
35

▪ Simple Solution
▪ Stall the pipeline!
▪ Wait until the data has been written

Pipeline Stall or Bubble: a pipeline stall (wait) initiated to resolve a hazard

▪ Need to wait 3 cycles, so that slows down execution!


▪ Implementing stall
▪ Compiler can implement stalls (by adding NOPs) to resolve data hazards!
▪ Or hardware can stall the pipeline!
Addressing Data Hazards
36

▪ Forwarding cannot prevent all pipeline stalls!


▪ Example
▪ lw x1, 0(x2)
▪ sub x4, x1, x5
▪ x1 is not available for forwarding when it is needed by 2nd instruction!

▪ Known as load-use data hazard!


Addressing Data Hazards
37

▪ Forwarding or Bypassing
▪ Forward the data to the next instruction when it is available
▪ Do not wait for the write back stage to complete!

▪ Forwarding only works if the destination stage is later in time than the source stage!
Addressing Data Hazards
38

▪ Different cases for forwarding!


▪ An R-type / load / store instruction dependent on R-type instruction
▪ add x4, x5, x6
▪ sub x7, x4, x8
▪ Can be resolved by forwarding result of third stage of source to input of third
stage of the destination!
▪ An R-type / load / store (for address operand) instruction dependent on
load instruction
▪ lw x4, 0(x5)
▪ sw x7, 0(x4)
▪ Can be resolved by adding a stall and then forwarding result of fourth stage of
source to input of third stage of the destination!
▪ A store (for data operand) instruction dependent on load instruction!
▪ lw x4, 0(x6)
▪ sw x4, 0(x8)
▪ Can be resolved by forwarding result of fourth stage of source to input of fourth
stage of the destination! (without any stall !!!)
Addressing Data Hazards
39

▪ Different cases for forwarding!


▪ From 3rd or 4th stage to 3rd stage!
▪ From 4th stage to 4th stage!
▪ To resolve data hazards in hardware
▪ Need to implement above two cases of forwarding!
▪ And
▪ Mechanism to stall pipeline for one cycle when needed!
▪ An R-type / load / store (for address operand) instruction dependent on load
instruction!
Addressing Data Hazards
40

▪ Code reordering / static scheduling to prevent stalls


▪ Example
▪ Consider following C Code
▪ a = b + e;
▪ c = b + f;
▪ Compiled to following RISC-V Code
▪ lw x1, 0(x31) # Load b
▪ lw x2, 8(x31) # Load e
▪ add x3, x1, x2 #b+e
▪ sw x3, 24(x31) # Store a
▪ lw x4, 16(x31) # Load f
▪ add x5, x1, x4 #b+f
▪ sw x5, 32(x31) # Store c
▪ Which instructions have data hazards?
▪ How many hazards are there with or without forwarding?
Addressing Data Hazards
41

▪ Code reordering to prevent stalls


▪ Example
▪ Data hazards when we can use forwarding
Original order          Reordered
lw x1, 0(x31)           lw x1, 0(x31)
lw x2, 8(x31)           lw x2, 8(x31)
add x3, x1, x2          lw x4, 16(x31)
sw x3, 24(x31)          add x3, x1, x2
lw x4, 16(x31)          sw x3, 24(x31)
add x5, x1, x4          add x5, x1, x4
sw x5, 32(x31)          sw x5, 32(x31)
▪ How can we reorder code to avoid stalls?
▪ Simply moving the third lw instruction (lw x4) up in the sequence can avoid both hazards!
▪ How much does this improve the performance?
▪ The new code sequence executes two cycles faster (assuming that we are using
forwarding!)
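The two-cycle saving can be checked by counting load-use stalls in both orderings. The sketch below is a simplified model, not the lecture's method: it assumes full forwarding, so the only remaining stall is a load immediately followed by an instruction that needs the loaded register in its EX stage. The tuple encoding `(op, dest, ex_sources)` is hypothetical; for `sw` only the base register is listed, because the store data is needed later, in MEM.

```python
def load_use_stalls(program):
    """Count one-cycle load-use stalls, assuming full forwarding."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        # A stall is needed only when a load is directly followed by a user
        # of the loaded register in the EX stage.
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return stalls

original = [
    ("lw", "x1", ("x31",)), ("lw", "x2", ("x31",)),
    ("add", "x3", ("x1", "x2")), ("sw", None, ("x31",)),
    ("lw", "x4", ("x31",)), ("add", "x5", ("x1", "x4")),
    ("sw", None, ("x31",)),
]
# Move the third load (index 4) up, directly after the second load.
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

for name, prog in (("original", original), ("reordered", reordered)):
    stalls = load_use_stalls(prog)
    # n instructions need n + 4 cycles in a five-stage pipeline, plus stalls.
    print(name, stalls, len(prog) + 4 + stalls)
```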
Addressing Data Hazards
42

▪ Forwarding implementation in RISC-V


▪ RISC-V instructions write at most one result!
▪ Result is written in last stage!
▪ Forwarding is harder if
▪ More than one result is to be forwarded per instruction!
Implementing forwarding to address data hazards!
Revisiting Data Hazards
44

▪ Example of data dependencies


▪ sub x2, x1, x3 # Register x2 written by sub
▪ and x12, x2, x5 # 1st operand(x2) depends on sub
▪ or x13, x6, x2 # 2nd operand(x2) depends on sub
▪ add x14, x2, x2 # 1st(x2) & 2nd(x2) depend on sub
▪ sw x15, 100(x2) # Base (x2) depends on sub

▪ Which instructions can have data hazards?


Revisiting Data Hazards
45

▪ Example of data hazards

Does this also cause a hazard? Yes, it does, but it is preventable within the register file!
Detecting Data Hazards
46

▪ How can we detect data hazards in hardware?

When the rd register of an earlier instruction (at either of two positions marked in the figure) equals rs1/rs2 of the current instruction, i.e., when the register read from the register file is the same one which will be written back by a previous instruction!
Detecting Data Hazards
47

▪ Conditions for Hazards


1a. EX/MEM.RegisterRd = ID/EX.RegisterRs1 (EX Hazard!)
1b. EX/MEM.RegisterRd = ID/EX.RegisterRs2 (EX Hazard!)
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs1 (MEM Hazard!)
2b. MEM/WB.RegisterRd = ID/EX.RegisterRs2 (MEM Hazard!)
▪ Example
▪ sub x2, x1, x3
▪ and x12, x2, x5 # Type 1a Hazard!
▪ or x13, x6, x2 # Type 2b Hazard!
▪ add x14, x2, x2
▪ sw x15, 100(x2)

Is there any problem with this Hazard Detection Policy?


Detecting Data Hazards
48

▪ Rd of an instruction (in EX or MEM Stage) can be equal to rs1/rs2 of a subsequent instruction even though this instruction doesn’t write back!
▪ Using forwarding for this case would be wrong!
▪ Solution?
▪ Check if RegWrite is active!
▪ Which is in the WB Control Signals of EX/MEM register!

▪ Any other problems with Hazard Detection Policy?


▪ When x0 is the destination!!
▪ Wrong non-zero result would be forwarded!
▪ x0 as a source operand in subsequent instruction would get non-zero
forwarded data!
▪ So, do not forward when rd = x0 !!!
▪ Any other problem ??? … ☺
Detecting Data Hazards
49

▪ Final Problem
▪ What happens if two hazard detection conditions (EX Hazard and MEM
Hazard) occur at the same time?
▪ EX/MEM.RegisterRd = ID/EX.RegisterRs1/Rs2
▪ MEM/WB.RegisterRd = ID/EX.RegisterRs1/Rs2
▪ Example
▪ Summing a vector in a register
▪ add x1, x1, x2
▪ add x1, x1, x3 # EX Hazard
▪ add x1, x1, x4 # EX + MEM Hazard
▪ ...
▪ Two possible sources of forwarding
▪ From Memory Access Stage (MEM Hazard)!
▪ From Execute Stage (EX Hazard)!
▪ The updated data is in EX Stage!
▪ Thus, we need to discard MEM Hazard in this case!
Detecting Data Hazards
50

▪ Final Hazard Detection Policy / Logic


▪ EX Hazard
▪ EX/MEM.RegisterRd = ID/EX.RegisterRs1/Rs2
▪ And EX/MEM.RegWrite = 1
▪ And Rd ≠ x0
▪ MEM Hazard
▪ MEM/WB.RegisterRd = ID/EX.RegisterRs1/Rs2
▪ And MEM/WB.RegWrite = 1
▪ And Rd ≠ x0
▪ And there is NO EX Hazard !!!
▪ i.e., NOT of EX Hazard conditions!
Forwarding to Resolve Hazards
51

▪ Before and after forwarding …


Additional Datapath for Forwarding
52

▪ Rs1 and Rs2 (addresses) are added in ID/EX register to support forwarding!
▪ Mux for immediate/register before ALU is not shown!
Forwarding Control
53

▪ ForwardA and ForwardB are the control signals generated by the forwarding unit!
Forwarding Control
54

▪ Logic to implement control in forwarding unit


▪ EX Hazard
▪ if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1)) ForwardA = 10
▪ if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs2)) ForwardB = 10

▪ MEM Hazard
▪ if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs1)) ForwardA = 01
▪ if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs2))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs2)) ForwardB = 01
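The forwarding conditions above can be expressed as a small Python sketch. The dictionary layout (`"RegWrite"`, `"Rd"`) and the function name are assumptions standing in for the pipeline register fields; checking the EX hazard first gives the same result as the explicit "not EX hazard" term in the MEM hazard condition.

```python
def forwarding_unit(ex_mem, mem_wb, rs1, rs2):
    """Return (ForwardA, ForwardB) as 2-bit strings.

    ex_mem and mem_wb are hypothetical dicts with keys "RegWrite" and "Rd";
    rs1 and rs2 stand for ID/EX.RegisterRs1 and ID/EX.RegisterRs2.
    """
    def select(rs):
        ex_hazard = ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == rs
        mem_hazard = mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == rs
        if ex_hazard:
            return "10"   # forward the ALU result held in EX/MEM (most recent value)
        if mem_hazard:
            return "01"   # forward the value held in MEM/WB
        return "00"       # no hazard: use the value read from the register file
    return select(rs1), select(rs2)

# Third add of the vector-sum example: x1 is pending in both EX/MEM and MEM/WB.
print(forwarding_unit({"RegWrite": True, "Rd": 1},
                      {"RegWrite": True, "Rd": 1}, rs1=1, rs2=4))
```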
Datapath with Forwarding
55
Datapath with Forwarding
56

▪ Do not forget to include ALU Source Mux …


The other case of forwarding …
57

▪ What about store following a load?


▪ lw x1, 0(x2)
▪ sw x1, 0(x3)

▪ Need to forward from MEM/WB register to memory access stage! (not to ALU!)
▪ Need additional hardware!
▪ Try implementing it as a learning exercise!
When Stalls are unavoidable …
Situations where we must Stall!
59

▪ When a register is read following a load instruction for the same register …
Let’s see how we can handle Stalls in the hardware

Stalls in Hardware
61

▪ Hazard detection unit in the ID stage decides if an instruction must be stalled because of the previous load instruction!
▪ What really is a stall?
▪ Can we implement stall by allowing the instruction to execute twice? Would
there be any problem?
▪ To implement the stall
▪ We must not allow the current instruction to progress
▪ i.e., do not let the IF/ID pipeline register change!
▪ Let the next instruction wait before being fetched into IF/ID!
▪ i.e., do not let the PC change!
▪ To prevent the current decoded instruction from execution
▪ Do not let register file and memory be written!
Stalls in Hardware
62

▪ Instructions execution with a stall!


Hazard Detection
63

▪ Logic to detect whether to stall an instruction!


▪ Check if the previous instruction is a load!
▪ Check if the current instruction reads the register loaded by previous
instruction!
▪ if (ID/EX.MemRead and
((ID/EX.RegisterRd = IF/ID.RegisterRs1) or
(ID/EX.RegisterRd = IF/ID.RegisterRs2)))
stall the pipeline
Control Signals to Implement Stall
64

▪ To stall current instruction


▪ Do not allow IF/ID register to be changed
▪ Use IF/IDWrite signal (write enable signal) to control when IF/ID register
should be allowed to be changed!
▪ Make the control signals for following stages (Execute, Memory and
Write Back) to be 0!
▪ Only RegWrite and MemWrite need to be 0, rest can be don’t care!
▪ Will prevent memory / register file to be written!
▪ Practically same as inserting nop instruction!

▪ Preventing next instruction to be fetched


▪ Use PCWrite signal (write enable signal) to prevent PC from being updated!
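The load-use check and the three stall actions can be combined in one small Python sketch. `PCWrite` and `IF/IDWrite` follow the signal names on the slides; `ZeroControl` is a hypothetical name for the select that forces the EX, MEM and WB control fields to 0.

```python
def hazard_detection_unit(id_ex_mem_read, id_ex_rd, if_id_rs1, if_id_rs2):
    """Return the stall controls for the instruction currently in ID."""
    # Stall when the instruction in EX is a load whose destination is read
    # by the instruction being decoded.
    stall = bool(id_ex_mem_read and id_ex_rd in (if_id_rs1, if_id_rs2))
    return {
        "PCWrite": not stall,       # hold the PC so the same fetch is repeated
        "IF/IDWrite": not stall,    # hold IF/ID so the instruction is decoded again
        "ZeroControl": stall,       # hypothetical name: send a bubble into ID/EX
    }

# lw x10, 0(x10) in EX followed by add x11, x10, x10 in ID.
print(hazard_detection_unit(True, 10, 10, 10))
```

As written on the slides, the check does not treat x0 specially; the sketch keeps that behavior.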
Datapath and Control with Stalls
65
Knowledge Check!
66

▪ Tell whether following code sequences must stall, can avoid stall
with only forwarding or can execute without stall or forwarding …
▪ Example 1
▪ lw x10, 0(x10)
▪ add x11, x10, x10
▪ Answer:
▪ Cannot fully avoid stall!
▪ Can reduce one cycle by forwarding!
Knowledge Check!
67

▪ Tell whether following code sequences must stall, can avoid stall
with only forwarding or can execute without stall or forwarding …
▪ Example 2
▪ add x11, x10, x10
▪ addi x12, x10, 5
▪ addi x14, x11, 5
▪ Answer:
▪ Third instruction needs x11 before first writes it back!
▪ read-after-write data hazard!
▪ Stall can be avoided by forwarding!
Knowledge Check!
68

▪ Tell whether following code sequences must stall, can avoid stall
with only forwarding or can execute without stall or forwarding …
▪ Example 3
▪ addi x11, x10, 1
▪ addi x12, x10, 2
▪ addi x13, x10, 3
▪ addi x14, x10, 4
▪ addi x15, x10, 5
▪ Answer:
▪ No stalls even without forwarding!
Hardware can fully resolve all data hazards but code
reordering / static scheduling in software can help
prevent some Stalls!
Addressing Control Hazards in Pipeline
Control Hazards
71

▪ When conditional branch instructions need to decide which instruction should be next
▪ Next instruction cannot be fetched until branch decision is finalized!
▪ Also known as branch hazards
▪ Simple solution?
▪ Stall the pipeline until branch is decided!
▪ Implementation is same as in case of data hazards, need to enhance
hazard detection logic to detect control hazards!
▪ How many stalls?
▪ If branch is decided in 3rd stage and PC is updated in 4th stage!
▪ Three stalls!
▪ If branch is decided in 3rd stage and PC is also updated in 3rd stage!
▪ Two stalls!
Reducing Branch Delay
72

▪ Branch decision can be finalized in ID stage by adding extra hardware (xor gates for comparison + adder for PC+offset)!
▪ Still the pipeline may need to be stalled!
▪ Only one stall as opposed to 3!
▪ Example
Reducing Branch Delay
73

▪ Branch decision can be finalized in ID stage by adding extra hardware (xor gates for comparison)!
▪ Problems?
▪ Forwarding (for RAW dependency) needs to be modified!
▪ If branch is dependent on an earlier instruction, forwarding may be needed to ID stage where branch is executing!
▪ Forwarding sources can be EX/MEM or MEM/WB registers!
▪ Additional stalls (for RAW dependency) may also be needed!
▪ If branch is dependent on ALU instruction → 1 Cycle Stall
▪ If branch is dependent on previous load → 2 Cycle Stall
Addressing Control Hazards
74

▪ Prediction
▪ A better solution to control hazards
▪ Predict the outcome of branch and fetch the next instruction!
▪ If prediction is wrong, fetch the right instruction again!
▪ Prediction can be static or dynamic!
Addressing Control Hazards
75

▪ Static prediction
▪ Predict all branches as taken or not taken
▪ Last example using prediction
▪ When prediction is correct!

▪ When prediction is wrong!


Addressing Control Hazards
76

▪ Static prediction
▪ Predict some branches as taken and some as not taken
▪ For example, a branch instruction at the end of a loop is usually taken
▪ Predict all branches to earlier addresses to be taken!
▪ When predicting ‘taken’
▪ Pipeline may still be stalled until branch target address (PC+offset) is
available!
▪ Use target address buffer to avoid these stalls!
▪ Example
▪ If address is computed in ID stage and branch is also decided in ID stage!
▪ One stall when predicting as “taken”!
▪ No stall when predicting as “not taken”!
When prediction goes wrong … some instruction in
the pipeline may need to be discarded!
Addressing Control Hazards
78

▪ Assuming branch decision comes after MEM Stage!

Need to discard the instructions in MEM, EX and ID Stage if the branch is taken!
Discarding Instruction
79

▪ Instruction entering MEM stage


▪ Make control signals as zeroes!
▪ Instruction entering EX stage
▪ Make control signals as zeroes!
▪ How to discard the instruction in the IF/ID register, which is fetched in the same cycle in which the branch is decided?
▪ Will enter ID stage in next cycle!
▪ The instruction is not yet decoded and there are no control signals to be
zeroed!
▪ How to handle this?
▪ Use a control signal IF.Flush which zeroes the instruction in IF/ID register!
▪ The instruction effectively becomes nop instruction!
Final Pipelined Datapath and Control
80

▪ Note: ALU Source Mux + Mux control signals not shown …


Addressing Control Hazards
81

▪ Delayed branch (delayed decision)
▪ Delay the branch decision!
▪ Execute the instruction which is not impacted by branch!
▪ Efficient for one-cycle branch delays!
▪ Example
▪ add x1, x2, x4
▪ beq x5, x6, somewhere
▪ Reorder the instructions as
▪ beq x5, x6, somewhere
▪ add x1, x2, x4
▪ Can be handled by assembler!
▪ Invisible to the programmer!
Addressing Control Hazards
82

▪ Dynamic prediction
▪ Predict based on knowledge of the behavior of branch instruction!
▪ Keep history of different branches as taken and not taken
▪ Predict based on prevalent behavior of different instructions!
▪ Because of the large amount of history, such predictors have accuracy above 90%!
Dynamic Branch Prediction
83

▪ To implement dynamic prediction in hardware


▪ Include a small look-up table which is indexed by the lower portion of
the address of branches
▪ Branch Prediction Buffer
▪ Branch History Table
▪ A single bit may indicate if this branch was taken or not taken recently!
▪ Prediction bit is inverted on misprediction!
▪ Accuracy of such predictor in loops?
▪ 1 misprediction on start of loop!
▪ (n-1) correct predictions!
▪ 1 misprediction when exiting loop!
▪ Success rate = (n-1)/(n+1), e.g. for 9 iterations, it is 80%!
▪ Can we improve accuracy?
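The 80% figure for a 1-bit predictor can be reproduced with a short simulation. This is a minimal sketch of a single prediction-buffer entry; the function name and the list-of-booleans encoding of branch outcomes are assumptions for illustration.

```python
def one_bit_accuracy(outcomes, initial=False):
    """Fraction of correct predictions for a single 1-bit predictor entry."""
    prediction, correct = initial, 0
    for taken in outcomes:
        correct += (prediction == taken)
        prediction = taken      # the bit simply records the most recent outcome
    return correct / len(outcomes)

# A loop branch that is taken 9 times and then not taken once, run repeatedly.
loop = ([True] * 9 + [False]) * 100
print(one_bit_accuracy(loop))
```

Each pass through the loop costs two mispredictions: one on exit, and one on re-entry because the exit flipped the bit.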
Dynamic Branch Prediction
84

▪ 2-bit branch prediction!


▪ Change the prediction only after two mispredictions in a row, not one!
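A 2-bit scheme can be sketched as a saturating counter. This is one common realization and an assumption here, since the lecture's state diagram is not reproduced in the text: states 0 and 1 predict not taken, states 2 and 3 predict taken.

```python
def two_bit_accuracy(outcomes, state=0):
    """Accuracy of one 2-bit saturating counter (0-1: not taken, 2-3: taken)."""
    correct = 0
    for taken in outcomes:
        correct += ((state >= 2) == taken)
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

# Same loop branch as for the 1-bit case: taken 9 times, then not taken once.
loop = ([True] * 9 + [False]) * 100
print(two_bit_accuracy(loop, state=3))
```

The single not-taken outcome at loop exit only weakens the counter, so the branch is still predicted taken on re-entry and only one misprediction per pass remains.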
Dynamic Prediction
85

▪ Tournament Branch Predictor


▪ Use multiple branch predictors!
▪ A selector selects the predictor which has been most accurate for a
given branch!
▪ Target Address Calculation
▪ For taken branches target address calculation requires 1 cycle!
▪ To avoid 1 cycle penalty, use
▪ Branch Target Buffer
▪ A cache to hold target PC or instruction for branches!
Knowledge Check!
86

▪ Consider three branch prediction schemes: predict not taken, predict taken,
and dynamic prediction. Assume that they all have zero penalty when they
predict correctly and two cycles when they are wrong. Assume that the
average prediction accuracy of the dynamic predictor is 90%. Which predictor
is the best choice for the following branches?
1. A conditional branch that is taken with 5% frequency
▪ Answer
▪ Predict Not taken!
2. A conditional branch that is taken with 95% frequency
▪ Answer
▪ Predict Taken!
3. A conditional branch that is taken with 70% frequency
▪ Answer
▪ Dynamic Prediction is better!
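The three answers follow from the expected misprediction penalty per branch, which can be computed with a small Python sketch. The function name and the dictionary keys are illustrative; the 2-cycle penalty and the 90% dynamic accuracy are the values given in the question.

```python
def expected_penalty(taken_freq, penalty=2, dynamic_accuracy=0.90):
    """Average misprediction cycles per branch for each scheme."""
    return {
        "predict not taken": taken_freq * penalty,         # wrong whenever taken
        "predict taken": (1 - taken_freq) * penalty,       # wrong whenever not taken
        "dynamic": (1 - dynamic_accuracy) * penalty,       # wrong 10% of the time
    }

for freq in (0.05, 0.95, 0.70):
    costs = expected_penalty(freq)
    print(freq, min(costs, key=costs.get), costs)
```

A static scheme wins only when the branch is biased more strongly than the dynamic predictor is accurate, which is why the 70% case goes to dynamic prediction.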
Instruction Sets can make pipelining easier or harder

Relevant Reading
88

▪ Computer Organization and Design (RISC-V, 2nd Edition), Patterson


and Hennessy
▪ Chapter 4
▪ Sections 4.6, 4.7, 4.8 and 4.9!

▪ Do practice for exams!


▪ Exercise Problems
▪ Chapter 4 of the Textbook
▪ Problems relevant to Sections 4.6, 4.7, 4.8 and 4.9!
Handling Multicycle Floating-Point Operations in
RISC-V
Why multicycle?
90

▪ What happens if we force all operations to complete in one cycle?


▪ Slower clock
▪ More logic
▪ Can be both!
▪ Pipelining multicycle operation
▪ Start a new operation when the first operation goes into the second
stage of execution!
▪ For example, pipeline a multiplier into seven pipeline stages
▪ Can overlap execution of 7 (independent) multiply operations!

▪ Can we fully pipeline multicycle operations?


▪ Sometimes no!
Unpipelined Multicycle Execution
91

Xpower = 1;
for(i=0; i < 3; i++)
Xpower = X*Xpower;

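[Figure: unpipelined multicycle datapath for the loop above: an 8-bit multiplier (×) with input X[7:0], a multiplexer controlled by Start with the constant 1 as one input, and a clocked 8-bit register (D[7:0], Q[7:0]) holding Xpower[7:0]]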

▪ This design can, however, be fully pipelined!


▪ At cost of additional registers and multipliers!
We will see how to handle both pipelined and
unpipelined multicycle operations …
RISC-V Floating Point Pipeline
93

▪ Change the EX stage


▪ Allow operations to take multiple cycles for different operations!
▪ Add multiple functional units instead of one ALU
▪ Benefit?
▪ Reduced structural hazards!
▪ Cannot issue a second instruction to EX stage, if the functional unit is busy!

▪ Functional units in FP pipeline


▪ Integer unit for loads, stores, integer ALU operations, branches!
▪ FP and integer multiplier
▪ FP adder that handles FP add, subtract, and conversion
▪ FP and integer divider
RISC-V Floating Point Pipeline
94

▪ FP pipeline with unpipelined functional units

• EX stage is not pipelined but it is multicycle!


• An instruction using same functional unit in EX
stage as an earlier instruction cannot proceed
before completion of previous one!
• If an instruction cannot proceed to EX stage, all
the following instructions are stalled!
RISC-V Floating Point Pipeline
95

▪ Pipelining the functional units


▪ Let’s say
▪ FP/Integer multiply unit is pipelined into 7 stages!
▪ FP adder is pipelined into 4 stages!
▪ FP/Integer divide unit is unpipelined and takes 25 cycles!

▪ Use latency
▪ No of cycles between an instruction producing a result and an
instruction using that result!
▪ After how many cycles dependent instruction can start execution!
▪ Just like loads always have use latency (wait) of 1 cycle!

▪ Initiation Interval
▪ When can another instruction that uses the same functional unit, be
issued (assuming it is independent)!
RISC-V Floating Point Pipeline
96

▪ Latencies and initiation intervals of functional units

▪ These are use latencies of instruction using the results in EX stage!


▪ What about use latencies for store instruction for the value being stored
(not the base address register)?
▪ Can be one cycle less!
▪ Because the stored value is needed in MEM stage not EX stage!
RISC-V Floating Point Pipeline
97

▪ FP pipeline with pipelined functional units

• DIV is still not pipelined!


• We can have up to 7 multiply, 4 FP add and 1 divide operation in the pipeline at same time!
RISC-V Floating Point Pipeline
98

▪ Penalty for faster clock and more pipeline stages in a functional unit?
▪ More latency!
▪ Can cause stalls due to data hazards!

▪ Additional Pipeline Registers


▪ ID register can be logically divided into multiple registers, one each for
every functional unit
▪ ID/EX, ID/A1, ID/M1, and ID/DIV
▪ Can also be implemented using different registers
▪ However, there’s only one operation that can be in this stage at a time and
control signals are associated only with that operation!
▪ That is in-order instruction issue!
RISC-V Floating Point Pipeline
99

▪ Executing a sequence of independent instructions


▪ fmul.d # double precision multiply
▪ fadd.d # double precision add
▪ fld # load double precision floating point
▪ fsd # store double precision floating point

fmul.d IF ID M1 M2 M3 M4 M5 M6 M7 MEM WB

fadd.d IF ID A1 A2 A3 A4 MEM WB

fld IF ID EX MEM WB

fsd IF ID EX MEM WB
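The timing of this sequence can be tabulated with a few lines of Python. The execute-stage lengths are read off the stage names in the diagram (M1 to M7, A1 to A4, EX); the function name is hypothetical and all hazards are ignored, as the instructions are independent.

```python
# Number of execute-stage cycles per instruction, taken from the stage names
# in the diagram (M1..M7, A1..A4, EX).
EX_CYCLES = {"fmul.d": 7, "fadd.d": 4, "fld": 1, "fsd": 1}

def wb_cycles(program):
    """Cycle in which each instruction reaches WB, ignoring all hazards."""
    result = []
    for i, op in enumerate(program):
        fetch = i + 1                      # one instruction is fetched per cycle
        # IF, then ID, then the EX cycles, then MEM, then WB.
        result.append(fetch + 1 + EX_CYCLES[op] + 2)
    return result

print(wb_cycles(["fmul.d", "fadd.d", "fld", "fsd"]))
```

The instructions are issued in order but reach WB out of order, since the multiply, fetched first, finishes last.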
Hazards and forwarding are handled in a similar manner as in the integer pipeline, with a few additional issues …
Forwarding is very similar; only the hazard detection unit needs to consider a few more situations for adding stalls…
Hazards in FP Pipeline
102

▪ Structural Hazards
▪ Unpipelined divide unit
▪ Instructions using MEM, WB at same time!
▪ Instructions have varying running times
▪ Can have more than one register writes in same cycle!
▪ Write after write (WAW) hazards are possible
▪ A later instruction writes the result to a register before an earlier one!
▪ When the earlier one finishes, the result can be wrong!

▪ Write-after-Read (WAR) hazard still not possible !


▪ Because instructions enter execution in-order!
Hazards in FP Pipeline
103

▪ Instructions complete / commit in an order different than the issuance order
▪ Exceptions’ handling is complex!
▪ Because of longer EX stage
▪ Stalls after RAW hazards would be more frequent!
▪ RAW: Instruction needing data yet to be produced by earlier one!
Hazards in FP Pipeline
104

▪ Read after write (RAW) hazards


▪ Consider following code sequence
▪ fld f4, 0(x2)
▪ fmul.d f0, f4, f6
▪ fadd.d f2, f0, f8
▪ fsd f2, 0(x2)
▪ Execution in the pipeline
▪ Is forwarding being used?
▪ Unless the previous instruction is issued for execution, the next instruction is not decoded!
▪ Because control signals in ID/EX register are for one instruction and we are not replicating hardware!
▪ Extra Stall to avoid conflict of MEM stages in two instructions!
Hazards in FP Pipeline
105

▪ Structural hazard due to multiple write backs!


▪ Consider following sequence!

▪ In 10th cycle only fld accesses memory, so there’s no structural hazard!


▪ But in 11th cycle, all three instructions are in WB stage?
▪ What should be done, assuming single write port for register file?
Hazards in FP Pipeline
106

▪ Structural hazard due to multiple write backs!


▪ Solutions
▪ Add multiple ports
▪ Since this case is rare, it is not worth adding more ports!
▪ Detect it as a structural hazard and then add a stall to resolve it!
▪ Two ways to detect and add stall
1. Track the use of write port in ID stage and stall the instruction!
▪ A shift register can indicate when an instruction will use write port!
▪ Benefits?
• All stalls are detected in ID stage as for other stalls!
▪ Cost is additional shift register / write conflict logic!
2. Stall an instruction when it tries to enter MEM or WB stage
▪ In this case, any of the conflicting instructions can be stalled!
▪ Give priority to the unit with longest latency as that would have most
probably caused others to stall because of a RAW hazard, eventually
leading to both instructions writing back at same time!
▪ Benefit is easier to detect and implement stall here!
▪ Drawback is complicated control as this stall will impact multiple
instructions!
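The first detection option, tracking the write port with a shift register in the ID stage, can be sketched in Python as follows. The class and method names are hypothetical, and an integer bitmask stands in for the hardware shift register.

```python
class WritePortTracker:
    """Bit k of `reserved` set means the write port is taken k cycles from now."""

    def __init__(self):
        self.reserved = 0

    def try_issue(self, cycles_until_wb):
        """Called in ID; returns False if the instruction must stall this cycle."""
        mask = 1 << cycles_until_wb
        if self.reserved & mask:
            return False            # another instruction already owns that WB slot
        self.reserved |= mask       # reserve the write port for this instruction
        return True

    def tick(self):
        # One cycle passes: every reservation moves one position closer.
        self.reserved >>= 1

tracker = WritePortTracker()
print(tracker.try_issue(9))     # multiply in ID, writes back 9 cycles later
for _ in range(3):
    tracker.tick()
print(tracker.try_issue(6))     # an add 3 cycles later would collide in WB
```

A stalled instruction simply retries on the next cycle, when its write-back slot has moved past the existing reservation.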
Hazards in FP Pipeline
107

▪ Write after write (WAW) hazards!


▪ Two instructions write same register but in wrong order!
▪ Consider the following case!

fadd.d f2, f4, f6 IF ID A1 A2 A3 A4 MEM WB

IF ID EX MEM WB

fld f2, 0(x2) IF ID EX MEM WB

▪ WAW hazard only occurs when the result of an instruction is overwritten without usage by another instruction!
▪ If another instruction was using result of fadd i.e., f2, then there would be RAW hazard
and fld will be delayed automatically!
Hazards in FP Pipeline
108

▪ Write after write (WAW) hazards!


▪ Solutions
1. Delay issuance of second instruction until the first one enters MEM stage!
2. Allow the second instruction to proceed but do not allow the previous
instruction to write the register again!
▪ Since the result is not being used, no need to write it!
▪ Both can be implemented by detecting WAW hazard in ID stage!
▪ The difficulty is to detect when a second instruction may finish before an
earlier one!
▪ Since this case is rare, a simple solution can be used
▪ Do not issue an instruction which writes back the same register as an earlier one
which is in execution!
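The simple policy in the last bullet amounts to one membership test in the ID stage. A minimal Python sketch, where the function name and the list of in-flight destinations are assumptions for illustration:

```python
def must_stall_for_waw(dest, in_flight_dests):
    """Stall in ID while an earlier, still-executing instruction writes `dest`.

    `in_flight_dests` is a hypothetical list of the destination registers of
    instructions currently in the multicycle execute stages.
    """
    # Instructions without a destination (e.g. stores) can never cause WAW.
    return dest is not None and dest in in_flight_dests

# fld f2, 0(x2) in ID while fadd.d f2, f4, f6 is still executing.
print(must_stall_for_waw("f2", ["f2"]))
```

This is more conservative than necessary, since it also stalls a later writer that would not actually overtake the earlier one, but it avoids comparing remaining latencies.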
Summing up the hazard detection in FP pipeline …
Hazards in FP Pipeline
110

▪ Detecting hazards
▪ Between integer instructions!
▪ Only for load use or branches if branch is decided earlier!
▪ Between FP instructions
▪ Between FP and integer instructions
▪ For FP loads or stores!
▪ For data movement between FP and integer registers!
Hazards in FP Pipeline
111

▪ Assuming that all hazards are detected in ID stage, 3 checks need to be performed
1. Check for structural hazards!
▪ Check if divide unit is busy or not!
▪ Check if write port will be available when needed!
2. Check for RAW hazards!
▪ Source registers should not be pending destinations in the pipeline when
this instruction will need them in pipeline!
▪ A number of checks need to be made for this!
3. Check for a WAW hazard!
▪ If an instruction in A1, …, A4, D, M1, …, M7 has same destination as this
one,
▪ Stall this instruction!
Hazards in FP Pipeline
112

▪ Forwarding is similar as in integer pipeline!


▪ Test for forwarding
▪ Check if destination in EX/MEM, A4/MEM, M7/MEM, D/MEM and MEM/WB pipeline register is same as the source operands of current instruction (in execution)!
Stalls Frequency in FP Pipeline
113

▪ How efficiently does an FP pipeline perform?


Relevant Reading
114

▪ Computer Architecture: A Quantitative Approach, 6th Edition, Hennessy and Patterson
▪ Appendix C
▪ Sections C.4 and C.5
▪ Recommended to read case study in C.6 as well!
