CAO Fall 2024 Lecture 07 RISC V Pipelined Implementation
Lecture # 07
RISC-V Pipelined Implementation
Muhammad Imran
Acknowledgement
▪ Pipelining – An Overview
▪ Pipelined Microarchitecture
▪ Dependencies and Pipelining Hazards
▪ Dealing with Pipelining Hazards in Hardware
▪ Handling Multicycle Operations
Pipelining to Improve Throughput
A Laundry Analogy
[Figure: single-cycle RISC-V datapath (PC, instruction memory, register file, ALU, data memory, control and ALU control) divided into five stages. IF: Instruction Fetch; ID: Instruction Decode / Register File Read Access; EX: Execute; MEM: Memory; WB: Write Back]
Adding Pipeline Registers
[Figure: the five-stage datapath with pipeline registers inserted between consecutive stages]
• Four pipeline registers: IF/ID, ID/EX, EX/MEM and MEM/WB
• PC also acts as a pipeline register
Writing Back the Correct Register
[Figure: pipelined datapath in which the write-register number travels through the pipeline registers to the WB stage]
• Must save the Write Register address until the Write Back stage
• Similarly, any other signal of an instruction that is needed in later stages must be saved!
Key Points!
[Figure: pipelined datapath with the control unit and its signals (RegWrite, MemRead, MemWrite, MemToReg, ALUSrc, ALUOp, Branch)]
• Which stage does RegWrite belong to?
Key in Pipelining Control
[Figure: pipelined datapath in which the control unit's outputs are grouped by the stage that uses them (E: EX, M: MEM, W: WB) and carried forward through the pipeline registers, each stage consuming its own group]
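To make the idea concrete, here is a minimal Python sketch (a hypothetical model, not the lecture's hardware) in which each pipeline register is a dictionary. The decoded instruction carries its destination register `rd` and its control bits grouped as E, M and W signals; each later register copies only the fields still needed downstream, so WB writes the `rd` of the instruction it is finishing. The names `Instr`, `run` and `_keep`, and the signal groupings in the usage example, are illustrative assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    rd: int                # destination (write) register number
    ex: Tuple[str, ...]    # E group: control signals used in EX
    mem: Tuple[str, ...]   # M group: control signals used in MEM
    wb: Tuple[str, ...]    # W group: control signals used in WB


def _keep(src: Optional[dict], fields: Tuple[str, ...]) -> Optional[dict]:
    """Copy only the listed fields into the next pipeline register (None = empty)."""
    return None if src is None else {f: src[f] for f in fields}


def run(program: List[Instr]) -> List[Tuple[int, str, int]]:
    """Return (cycle, instruction, rd) for every register-file write done in WB."""
    latch: Dict[str, Optional[dict]] = dict.fromkeys(("IF/ID", "ID/EX", "EX/MEM", "MEM/WB"))
    writes: List[Tuple[int, str, int]] = []
    pc = cycle = 0
    while pc < len(program) or any(v is not None for v in latch.values()):
        cycle += 1
        # WB: only the W group and the saved rd are still available here.
        wb = latch["MEM/WB"]
        if wb is not None and "RegWrite" in wb["wb"]:
            writes.append((cycle, wb["name"], wb["rd"]))
        # Clock edge, updated back to front; each stage drops the group it used.
        latch["MEM/WB"] = _keep(latch["EX/MEM"], ("name", "rd", "wb"))
        latch["EX/MEM"] = _keep(latch["ID/EX"], ("name", "rd", "mem", "wb"))
        latch["ID/EX"] = _keep(latch["IF/ID"], ("name", "rd", "ex", "mem", "wb"))
        if pc < len(program):
            # Simplification: real IF/ID holds raw instruction bits that ID
            # decodes; here decoding is folded into fetch.
            latch["IF/ID"] = asdict(program[pc])
            pc += 1
        else:
            latch["IF/ID"] = None
    return writes
```

For example, with `add x1` (W: RegWrite), `sw` (no W signals) and `lw x5` (W: RegWrite, MemToReg) fetched in cycles 1 to 3, `run` reports writes to x1 in cycle 5 and to x5 in cycle 7, while `sw` writes nothing.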
Dependencies and Pipelining Hazards
Dependences and Hazards
▪ Dependency
▪ An instruction’s execution depends on the outcome of another instruction
▪ Types of dependences
▪ Flow dependences or true data dependences
▪ Name dependences
▪ Control dependences
▪ Hazard
▪ Situations when an instruction in the pipeline cannot correctly execute
in the following cycle
▪ Types of hazards
▪ Structural Hazards
▪ Data Hazards
▪ Control Hazards
Dependences and Hazards
▪ Flow dependence: Read after Write (RAW), a true data dependence
▪ x3 ← x1 op x2
▪ x5 ← x3 op x4
▪ Anti dependence: Write after Read (WAR)
▪ x3 ← x1 op x2
▪ x1 ← x4 op x5
▪ Output dependence: Write after Write (WAW)
▪ x3 ← x1 op x2
▪ x5 ← x3 op x4
▪ x3 ← x6 op x7
▪ Anti and output dependences are name dependences
▪ Name dependences do not lead to hazards in the simple in-order pipeline we are discussing now!
▪ Hazards due to name dependences can be addressed by register renaming!
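The three dependence types can be checked mechanically for a pair of instructions. The sketch below is a minimal illustration under an assumed representation (`Op` with one destination and a tuple of sources is hypothetical); it reproduces the slide's examples.

```python
from typing import NamedTuple, Set, Tuple


class Op(NamedTuple):
    dst: str               # destination register, e.g. "x3"
    srcs: Tuple[str, ...]  # source registers, e.g. ("x1", "x2")


def dependences(earlier: Op, later: Op) -> Set[str]:
    """Classify how `later` depends on `earlier` (earlier comes first in program order)."""
    kinds: Set[str] = set()
    if earlier.dst in later.srcs:
        kinds.add("RAW")   # flow (true data) dependence
    if later.dst in earlier.srcs:
        kinds.add("WAR")   # anti dependence: a name dependence
    if later.dst == earlier.dst:
        kinds.add("WAW")   # output dependence: a name dependence
    return kinds
```

For `x3 ← x1 op x2` followed by `x1 ← x4 op x5`, the function returns `{"WAR"}`. Renaming the second destination to an unused register would remove that dependence, which is the idea behind register renaming.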
Data Dependences
▪ Simple Solution
▪ Stall the pipeline!
▪ Wait until the data has been written
▪ Forwarding or Bypassing
▪ Forward the data to the next instruction when it is available
▪ Do not wait for the write back stage to complete!
▪ Final Problem
▪ What happens if both hazard conditions (EX hazard and MEM hazard) occur at the same time?
▪ EX/MEM.RegisterRd = ID/EX.RegisterRs1/Rs2
▪ MEM/WB.RegisterRd = ID/EX.RegisterRs1/Rs2
▪ Example: summing a vector in a register
▪ add x1, x1, x2
▪ add x1, x1, x3 (EX hazard)
▪ add x1, x1, x4 (EX hazard + MEM hazard)
▪ ...
▪ Two possible sources of forwarding
▪ From the memory access stage (MEM hazard)!
▪ From the execute stage (EX hazard)!
▪ The most recent value is the one coming from the EX stage!
▪ Thus, the MEM hazard must be ignored in this case!
Detecting Data Hazards
▪ MEM Hazard
▪ if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs1)) ForwardA = 01
▪ if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs2))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs2)) ForwardB = 01
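The forwarding conditions can be sketched as a function that computes one mux select. This is a hypothetical model: the names `PipeReg` and `forward_select` are made up. The code `10` for forwarding from EX/MEM is the usual textbook encoding and is assumed here, because the EX-hazard condition is not repeated in this section. Testing the EX hazard first is equivalent to the `not(...)` term in the MEM-hazard conditions above.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    reg_write: bool   # RegWrite control bit carried in the pipeline register
    rd: int           # RegisterRd


def forward_select(ex_mem: PipeReg, mem_wb: PipeReg, rs: int) -> str:
    """Mux select for one ALU operand: rs is ID/EX.RegisterRs1 (ForwardA) or Rs2 (ForwardB)."""
    # EX hazard: checked first, so it takes priority over a MEM hazard.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == rs:
        return "10"   # forward the ALU result held in EX/MEM (assumed encoding)
    # MEM hazard: reaching this line already means "not (EX hazard)".
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == rs:
        return "01"   # forward the value held in MEM/WB
    return "00"       # no hazard: use the value read from the register file
```

In the vector-sum example, when `add x1, x1, x4` is in EX both EX/MEM and MEM/WB hold writes to x1, and `forward_select` picks `10` for rs1, i.e. the newer value from the immediately preceding add.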
Datapath with Forwarding
▪ Tell whether the following code sequences must stall, can avoid stalls
with forwarding alone, or can execute without stalls or forwarding …
▪ Example 1
▪ lw x10, 0(x10)
▪ add x11, x10, x10
▪ Answer:
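▪ Cannot fully avoid the stall: add needs x10 in EX, but lw produces it only at the end of MEM (load-use hazard)!
▪ Forwarding reduces the stall by one cycle (from two cycles to one)!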
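The load-use case is the one forwarding cannot fully fix, so the hazard detection unit checks for it while the dependent instruction is in ID, in the same style as the MEM-hazard conditions above. A minimal sketch follows; the function and parameter names are hypothetical. When it returns True, the hardware holds the PC and IF/ID and inserts a bubble into ID/EX for one cycle.

```python
def load_use_stall(id_ex_mem_read: bool, id_ex_rd: int,
                   if_id_rs1: int, if_id_rs2: int) -> bool:
    """True if the instruction in ID must wait one cycle for a load in EX."""
    # The load is in EX (ID/EX.MemRead = 1) and its destination is a source
    # of the instruction being decoded (in IF/ID).
    # rd != 0 is a refinement: a load into x0 produces nothing to wait for,
    # and it also lets callers pass 0 for an unused source field.
    return id_ex_mem_read and id_ex_rd != 0 and id_ex_rd in (if_id_rs1, if_id_rs2)
```

For Example 1, `lw x10, 0(x10)` in EX and `add x11, x10, x10` in ID give `load_use_stall(True, 10, 10, 10)`, which is True.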
Knowledge Check!
▪ Tell whether the following code sequences must stall, can avoid stalls
with forwarding alone, or can execute without stalls or forwarding …
▪ Example 2
▪ add x11, x10, x10
▪ addi x12, x10, 5
▪ addi x14, x11, 5
▪ Answer:
▪ The third instruction needs x11 before the first writes it back!
▪ Read-after-write data hazard!
▪ The stall can be avoided by forwarding!
Knowledge Check!
▪ Tell whether the following code sequences must stall, can avoid stalls
with forwarding alone, or can execute without stalls or forwarding …
▪ Example 3
▪ addi x11, x10, 1
▪ addi x12, x10, 2
▪ addi x13, x10, 3
▪ addi x14, x10, 4
▪ addi x15, x10, 5
▪ Answer:
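▪ No stalls even without forwarding: no instruction reads a register written by another instruction in the sequence!
Hardware can resolve all data hazards (by forwarding and stalling), but code
reordering / static scheduling in software can help
prevent some stalls!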
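The three answers can be reproduced by counting stall cycles. The sketch below assumes a simple model (not stated on the slides): a 5-stage pipeline whose register file is written in the first half of a cycle and read in the second half; with forwarding, an ALU result reaches the next instruction's EX without delay and a loaded value one cycle later. Only `lw` is treated as a load, and `Ins`, `total_stalls` and `classify` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class Ins(NamedTuple):
    op: str                 # mnemonic; only "lw" is treated as a load here
    rd: int                 # destination register (0 = none / x0)
    srcs: Tuple[int, ...]   # source registers


def total_stalls(prog: List[Ins], forwarding: bool) -> int:
    """Stall cycles for the whole sequence under the model described above."""
    ex: List[int] = []                 # cycle in which each instruction is in EX
    last_writer: Dict[int, int] = {}   # register -> index of its latest producer
    for i, ins in enumerate(prog):
        t = ex[-1] + 1 if ex else 3    # first instruction: IF=1, ID=2, EX=3
        for r in ins.srcs:
            p = last_writer.get(r)
            if p is None:
                continue
            if not forwarding:
                need = ex[p] + 3       # consumer's ID must not precede producer's WB
            elif prog[p].op == "lw":
                need = ex[p] + 2       # loaded value forwarded from MEM/WB
            else:
                need = ex[p] + 1       # ALU result forwarded from EX/MEM
            t = max(t, need)
        ex.append(t)
        if ins.rd != 0:
            last_writer[ins.rd] = i
    return ex[-1] - (len(prog) + 2) if prog else 0


def classify(prog: List[Ins]) -> str:
    if total_stalls(prog, forwarding=False) == 0:
        return "no stall, even without forwarding"
    if total_stalls(prog, forwarding=True) == 0:
        return "stall avoided by forwarding"
    return "must stall"
```

Under this model, Example 1 needs two stall cycles without forwarding and one with it, matching the answer that forwarding saves one cycle but cannot remove the stall.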
Addressing Control Hazards in Pipeline
Control Hazards
▪ Prediction
▪ A better solution to control hazards
▪ Predict the outcome of branch and fetch the next instruction!
▪ If prediction is wrong, fetch the right instruction again!
▪ Prediction can be static or dynamic!
Addressing Control Hazards
▪ Static prediction
▪ Predict all branches as taken, or all as not taken
▪ [Figure: the previous example executed with prediction, shown for the case where the prediction is correct]
▪ Static prediction
▪ Predict some branches as taken and some as not taken
▪ For example, a branch instruction at the end of a loop is usually taken
▪ Predict all branches to earlier addresses (backward branches) as taken!
▪ When predicting "taken"
▪ The pipeline may still stall until the branch target address (PC + offset) is available!
▪ Use a branch target buffer (a cache of target addresses) to avoid these stalls!
▪ Example
▪ If the target address is computed and the branch is decided in the ID stage:
▪ One stall when predicting "taken" (even if the prediction is correct)!
▪ No stall when correctly predicting "not taken"!
When a prediction goes wrong … some instructions in
the pipeline may need to be discarded!
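The "backward taken, forward not taken" rule needs only the branch and target addresses. The sketch below compares it with predicting every branch not taken on a made-up trace: a loop-closing branch at 0x40 (target 0x20) taken 9 times and then not taken, plus a forward branch at 0x28 (target 0x30) that is never taken. The addresses, trace and function names are hypothetical.

```python
from typing import Callable, List, Tuple

Branch = Tuple[int, int, bool]   # (branch PC, target PC, actually taken?)


def predict_btfn(pc: int, target: int) -> bool:
    """Static rule: backward branches (e.g. loop-closing) taken, forward not taken."""
    return target < pc


def predict_not_taken(pc: int, target: int) -> bool:
    """Static rule: every branch predicted not taken."""
    return False


def accuracy(predict: Callable[[int, int], bool], trace: List[Branch]) -> float:
    """Fraction of dynamic branches whose direction was predicted correctly."""
    hits = sum(predict(pc, target) == taken for pc, target, taken in trace)
    return hits / len(trace)


# Hypothetical trace: 10 loop iterations, forward branch checked in each.
trace = [(0x28, 0x30, False), (0x40, 0x20, True)] * 9 \
      + [(0x28, 0x30, False), (0x40, 0x20, False)]
```

On this trace the backward-taken rule is right 19 times out of 20 (it misses only the loop exit), while predicting everything not taken is right 11 times out of 20.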
Addressing Control Hazards
▪ Delayed decision (delayed branch)
▪ Delay the branch decision!
▪ Meanwhile, execute an instruction that is not affected by the branch!
▪ Effective when the branch delay is one cycle!
▪ Example
▪ add x1, x2, x4
▪ beq x5, x6, somewhere
▪ Reorder the instructions as
▪ beq x5, x6, somewhere
▪ add x1, x2, x4
▪ Safe here because beq does not read x1, which add writes; the reordering can be done by the assembler!
▪ Invisible to the programmer!
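The reordering an assembler performs can be sketched as follows. RISC-V itself does not define branch delay slots, so this assumes a hypothetical ISA with exactly one. The sketch only considers the instruction immediately before the branch and pads the slot with a nop when that instruction cannot be moved; `Instr` and `fill_delay_slot` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    dst: int                 # destination register (0 if none)
    srcs: Tuple[int, ...]    # source registers
    is_branch: bool = False


NOP = Instr("addi x0, x0, 0", 0, ())   # the canonical RISC-V nop


def fill_delay_slot(block: List[Instr]) -> List[Instr]:
    """Reorder a basic block ending in a branch for ONE branch delay slot."""
    *body, branch = block
    if body:
        cand = body[-1]
        # Safe to move after the branch if the branch does not read its result.
        if not cand.is_branch and (cand.dst == 0 or cand.dst not in branch.srcs):
            return body[:-1] + [branch, cand]
    return body + [branch, NOP]   # nothing safe to move: pad the slot
```

With the slide's example, `add x1, x2, x4` moves after `beq x5, x6, somewhere`. If the add wrote x5 instead, the branch would depend on it, so the block would keep its order and gain a nop.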
Addressing Control Hazards
▪ Dynamic prediction
▪ Predict based on knowledge of the behavior of branch instruction!
▪ Keep a history of whether each branch was taken or not taken
▪ Predict based on the prevalent past behavior of each branch!
▪ Because they exploit this history, such predictors can reach accuracy above 90%!
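The slides do not fix a particular history mechanism. As one common concrete choice, the sketch below uses a table of 2-bit saturating counters indexed by low PC bits. The table size (16 entries), the initial "weakly not taken" state and the 4-byte instruction alignment are assumptions, as are the class and function names.

```python
from typing import List, Tuple


class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0, 1 predict not taken; 2, 3 predict taken."""

    def __init__(self, entries: int = 16) -> None:
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.counters = [1] * entries   # start every entry as weakly not taken
        self.mask = entries - 1

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask    # assumes 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_correct(pred: TwoBitPredictor, trace: List[Tuple[int, bool]]) -> int:
    """Predict, then train, on each (pc, taken) outcome; return the number of hits."""
    correct = 0
    for pc, taken in trace:
        correct += pred.predict(pc) == taken
        pred.update(pc, taken)
    return correct
```

For a loop branch that is taken 9 times and then falls through, repeated three times, this predictor is right 26 times out of 30. After warming up it mispredicts only each loop exit, because a single not-taken outcome moves the counter from 3 to 2, which still predicts taken.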
Dynamic Branch Prediction
▪ Consider three branch prediction schemes: predict not taken, predict taken,
and dynamic prediction. Assume that they all have zero penalty when they
predict correctly and two cycles when they are wrong. Assume that the
average prediction accuracy of the dynamic predictor is 90%. Which predictor
is the best choice for the following branches?
1. A conditional branch that is taken with 5% frequency
▪ Answer
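▪ Predict not taken! (expected penalty 0.05 × 2 = 0.1 cycles per branch, versus 0.2 for dynamic and 1.9 for predict taken)
2. A conditional branch that is taken with 95% frequency
▪ Answer
▪ Predict taken! (0.05 × 2 = 0.1 cycles, versus 0.2 for dynamic and 1.9 for predict not taken)
3. A conditional branch that is taken with 70% frequency
▪ Answer
▪ Dynamic prediction is better! (0.1 × 2 = 0.2 cycles, versus 0.3 × 2 = 0.6 for predict taken and 0.7 × 2 = 1.4 for predict not taken)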
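The comparison is an expected-penalty calculation: misprediction rate times the 2-cycle penalty. A small sketch follows, taking the problem's assumption that the dynamic predictor is 90% accurate regardless of the branch's bias. Function names are hypothetical.

```python
from typing import Dict


def expected_penalty(taken_freq: float, penalty: int = 2,
                     dynamic_acc: float = 0.90) -> Dict[str, float]:
    """Average stall cycles per branch for each scheme (correct predictions cost 0)."""
    return {
        "predict not taken": taken_freq * penalty,        # wrong whenever taken
        "predict taken": (1 - taken_freq) * penalty,      # wrong whenever not taken
        "dynamic": (1 - dynamic_acc) * penalty,           # wrong 10% of the time
    }


def best(taken_freq: float) -> str:
    """Scheme with the smallest expected penalty."""
    costs = expected_penalty(taken_freq)
    return min(costs, key=costs.get)
```

`best(0.05)`, `best(0.95)` and `best(0.70)` give the three answers above. The static schemes win only for strongly biased branches, where their misprediction rate (5%) is below the dynamic predictor's 10%.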
Instruction Sets can make pipelining easier or harder
…
Relevant Reading
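Xpower = 1;
for (i = 0; i < 3; i++)
    Xpower = X * Xpower;
[Figure: sequential hardware for this loop: an 8-bit register (D[7:0] to Q[7:0], clocked by clk) holds Xpower, a multiplier (×) forms X[7:0] × Xpower, and a multiplexer controlled by Start chooses the register's next value (the initial 1 or the product)]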
▪ Use latency
▪ Number of cycles between an instruction that produces a result and an instruction that uses that result!
▪ After how many cycles a dependent instruction can start execution!
▪ For example, loads in the integer pipeline have a use latency of 1 cycle (a one-cycle wait)!
▪ Initiation interval
▪ Number of cycles before another instruction that uses the same functional unit can be issued (assuming it is independent)!
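The two parameters combine when scheduling a single-issue, in-order pipeline: a dependent instruction waits for the producer's use latency, and an independent one waits only for the unit's initiation interval. A minimal sketch follows; the numbers in the usage note (multiply latency 6, initiation interval 1 or 7) are hypothetical, and `Op` and `issue_cycles` are made-up names.

```python
from typing import Dict, List, NamedTuple, Optional


class Op(NamedTuple):
    unit: str             # functional unit used, e.g. "mul"
    dep: Optional[int]    # index of the earlier op whose result it needs, if any


def issue_cycles(ops: List[Op], latency: Dict[str, int],
                 interval: Dict[str, int]) -> List[int]:
    """Single-issue, in-order: cycle in which each op is issued."""
    issue: List[int] = []
    last_on_unit: Dict[str, int] = {}   # unit -> cycle of its most recent issue
    for op in ops:
        t = issue[-1] + 1 if issue else 0
        if op.dep is not None:
            # `latency` cycles must separate producer and consumer.
            t = max(t, issue[op.dep] + latency[ops[op.dep].unit] + 1)
        if op.unit in last_on_unit:
            # The unit accepts a new op only every `interval` cycles.
            t = max(t, last_on_unit[op.unit] + interval[op.unit])
        issue.append(t)
        last_on_unit[op.unit] = t
    return issue
```

The Xpower loop is a chain of three multiplies, each needing the previous result, so with a hypothetical latency of 6 they issue in cycles 0, 7 and 14 even on a fully pipelined multiplier (interval 1). Three independent multiplies would issue in cycles 0, 1 and 2, unless the unit were unpipelined (interval 7).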
RISC-V Floating Point Pipeline
▪ Penalty for faster clock and more pipeline stages in a functional unit?
▪ More latency!
▪ Can cause stalls due to data hazards!
fmul.d IF ID M1 M2 M3 M4 M5 M6 M7 MEM WB
fadd.d IF ID A1 A2 A3 A4 MEM WB
fld IF ID EX MEM WB
fsd IF ID EX MEM WB
Hazards and forwarding are handled in a similar
manner as in the integer pipeline, with a few additional
issues …
Forwarding is very similar; only the hazard detection unit
needs to consider a few more situations for adding
stalls…
Hazards in FP Pipeline
▪ Structural Hazards
▪ Unpipelined divide unit
▪ Instructions using MEM, WB at same time!
▪ Instructions have varying running times
▪ More than one register write can occur in the same cycle!
▪ Write after write (WAW) hazards are possible
▪ A later instruction writes its result to a register before an earlier one!
▪ When the earlier one finishes, it overwrites the newer value, so the register holds a wrong result!
▪ Detecting hazards
▪ Between integer instructions!
▪ Only for load use or branches if branch is decided earlier!
▪ Between FP instructions
▪ Between FP and integer instructions
▪ For FP loads or stores!
▪ For data movement between FP and integer registers!
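Both the register-write structural hazard and the WAW hazard follow from the different lengths of the FP pipelines. The sketch below reads the IF-to-WB distance of each instruction off the timing diagram earlier in this section (fld: 4 cycles, fadd.d: 7, fmul.d: 10). It assumes one instruction is fetched per cycle with no stalls and ignores integer instructions; `FpOp` and `hazards` are hypothetical names.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple

# Cycles from IF to WB for each instruction type (IF = cycle 0).
WB_OFFSET = {"fld": 4, "fadd.d": 7, "fmul.d": 10}


class FpOp(NamedTuple):
    op: str     # "fld", "fadd.d" or "fmul.d"
    rd: str     # destination FP register, e.g. "f0"


def hazards(prog: List[FpOp]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return (cycles with more than one register write, WAW pairs (earlier, later))."""
    wb = [i + WB_OFFSET[ins.op] for i, ins in enumerate(prog)]
    counts = Counter(wb)
    port_conflicts = sorted(c for c, n in counts.items() if n > 1)
    waw = [(i, j) for i in range(len(prog)) for j in range(i + 1, len(prog))
           if prog[i].rd == prog[j].rd and wb[j] <= wb[i]]
    return port_conflicts, waw
```

For `fmul.d f0`, `fadd.d f2`, `fld f0` issued back to back, the `fld` writes f0 in cycle 6 while the earlier `fmul.d` writes it in cycle 10, a WAW hazard. For `fmul.d`, `fld`, `fld`, `fadd.d`, the first and last instructions both reach WB in cycle 10, a structural hazard on the register-file write port.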
Hazards in FP Pipeline