EE-321 Fall 2024

Computer Architecture and Organization

Lecture # 07
RISC-V Pipelined Implementation

Muhammad Imran
[email protected]
Acknowledgement
2

▪ Content from following has been used in these lectures

▪ Computer Organization and Design, RISC-V 2nd Edition, Patterson and
Hennessy
Contents
3

▪ Pipelining – An Overview
▪ Pipelined Microarchitecture
▪ Dependencies and Pipelining Hazards
▪ Dealing with Pipelining Hazards in Hardware
▪ Handling Multicycle Operations
Pipelining to Improve Throughput
A Laundry Analogy
5

▪ Pipelining helps execute multiple tasks in parallel

▪ Improves throughput …
Improvement by Pipelining
6

▪ Pipelining only improves throughput

▪ Individual tasks still take the same amount of time
▪ When the number of tasks is very large and the pipeline stages are perfectly balanced
▪ Improvement in performance ≈ Number of pipeline stages
▪ Example
▪ Suppose each stage in laundry takes 10 minutes and there are 4 stages
▪ Without pipelining
▪ 4 loads take 40×4=160 minutes!
▪ With pipelining
▪ 4 loads take 70 minutes!
▪ With large number of loads, improvement would approach 4 times!
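The laundry numbers follow from a simple count: the first load needs every stage, and each later load finishes one stage time after the previous one. A minimal sketch, assuming perfectly balanced stages (the function names are illustrative, not from the lecture):

```python
def unpipelined_time(num_tasks, num_stages, stage_time):
    """Each task runs through all stages before the next task starts."""
    return num_tasks * num_stages * stage_time


def pipelined_time(num_tasks, num_stages, stage_time):
    """The first task fills the pipeline; every later task adds one stage time."""
    return (num_stages + num_tasks - 1) * stage_time


def speedup(num_tasks, num_stages, stage_time):
    """Ratio of unpipelined to pipelined time for the same work."""
    return (unpipelined_time(num_tasks, num_stages, stage_time)
            / pipelined_time(num_tasks, num_stages, stage_time))


if __name__ == "__main__":
    print(unpipelined_time(4, 4, 10))   # 4 loads, 4 stages of 10 minutes each
    print(pipelined_time(4, 4, 10))
    print(speedup(1_000_000, 4, 10))    # gets close to the number of stages
```

With only 4 loads the speedup is well below 4 because the time to fill the pipeline dominates; it only approaches 4 as the number of loads grows.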
Pipelining in RISC-V
7

▪ Instruction execution stages

1. Fetch instruction from memory.
2. Read registers and decode the instruction.
3. Execute the operation or calculate an address.
4. Access an operand in data memory (if necessary).
5. Write the result into a register (if necessary).
▪ Therefore, we can implement a five-stage pipeline for RISC-V!

▪ RISC-V Pipeline Stages are not perfectly balanced!

Pipelining in RISC-V
8

▪ Example of RISC-V instruction execution time!

▪ Assuming the multiplexors, control unit, PC access and sign-extension unit have no delay!
▪ Load takes the longest!
▪ For single-cycle, the clock period must be at least the load time!
▪ Among individual stages the longest time is 200 ps!
▪ The clock period in the pipelined architecture must be at least 200 ps (+ tclk2q + ts), although some stages take only 100 ps!
Pipelining in RISC-V
9

▪ Time to execute three instructions

▪ Without pipelining: 2400 ps
▪ After pipelining: 1400 ps
▪ Time between the start of the first and the start of the 3rd instruction = 2×200 ps = 400 ps
Pipelining in RISC-V
10

▪ How about adding 1,000,000 more instructions (to current 3)?

▪ Each instruction adds 200 ps, total execution time using pipelining would be (1400 ps +
200×1,000,000 ps) = 200,001,400 ps
▪ For non-pipelined design execution time = (2400 ps + 1,000,000×800 ps) = 800,002,400
ps
▪ Improvement is about 4 times = the reduction in clock period (800 ps → 200 ps); it is less than the number of stages (5) because the stages are imbalanced!
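The totals above can be reproduced in a few lines. This sketch assumes the per-stage latencies of the textbook example (200, 100, 200, 200, 100 ps for IF, ID, EX, MEM, WB), which the slides only summarize, and it ignores the register overhead (tclk2q and ts):

```python
# Assumed stage latencies in picoseconds (taken from the textbook example).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

SINGLE_CYCLE_PERIOD = sum(STAGE_PS.values())   # one long cycle per instruction
PIPELINED_PERIOD = max(STAGE_PS.values())      # set by the slowest stage


def single_cycle_time(num_instructions):
    """Total time when every instruction takes one long clock cycle."""
    return num_instructions * SINGLE_CYCLE_PERIOD


def pipelined_time(num_instructions):
    """Fill the five stages once, then one instruction completes per cycle."""
    return (len(STAGE_PS) + num_instructions - 1) * PIPELINED_PERIOD


if __name__ == "__main__":
    for n in (3, 1_000_003):
        print(n, single_cycle_time(n), pipelined_time(n),
              single_cycle_time(n) / pipelined_time(n))
```

The ratio is bounded by 800/200 = 4 rather than 5, because the two 100 ps stages still occupy a full 200 ps cycle.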
Pipelining in RISC-V
11

▪ Implementing RISC-V Pipelining

▪ RISC-V instructions are easier to pipeline, because of
▪ Fixed instruction size
▪ Easy to fetch and decode!
▪ Fixed location of source/destination operands
▪ Memory operands appear only in loads/stores
▪ Execution stage can be used for address calculation!
▪ Memory is accessed in the following stage!
▪ Allowing memory operands in other instructions would increase the number of pipeline stages / imbalance!
▪ RISC-V has fewer instruction formats!

▪ x86 architecture has variable instruction sizes

▪ Hard to pipeline
▪ x86 instructions are translated into RISC-like operations to implement pipelining
Pipelining is a programmer-invisible technique for performance improvement …
Pipelined Microarchitecture
RISC-V Pipeline Stages
14

1. IF: Instruction fetch

2. ID: Instruction decode and register file read
3. EX: Execution or address calculation
4. MEM: Data memory access
5. WB: Write back
RISC-V Pipeline Stages
15

[Figure: single-cycle RISC-V datapath divided into five stages (IF: Instruction Fetch, ID: Instruction Decode / Register File Read, EX: Execute, MEM: Memory Access, WB: Write Back), with PC, Instruction Memory, Register File, Imm Gen, ALU, ALU Control, Data Memory, and the control signals Branch, RegWrite, MemRead, MemWrite, MemToReg, ALUSrc and ALUOp.]
Adding Pipeline Registers
16

[Figure: the same datapath with the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB inserted between the stages.]
• Four Pipeline Registers
• PC also acts as a pipeline register
Writing Back the Correct Register
17

[Figure: pipelined datapath with the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB.]
• Must save the Write Register address until the Write Back stage
• Similarly, any other signal for a particular instruction that is needed in later stages!
Key Points!
18

▪ For a store instruction, the contents of the register read in Stage 2 must be saved in the pipeline registers
▪ Used in 4th Stage
▪ A store instruction finishes in 4th stage (memory access)
▪ Can we execute instructions faster because of this?
▪ No, because other instructions are already in process!
▪ An instruction must pass through all stages even if it doesn’t use some
of the stages!
▪ Each logical component can be used only within a single pipeline
stage!
▪ Otherwise, there would be structural hazard!
Pipelining Control
Control Signals in Different Stages
20

[Figure: pipelined datapath with the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB and the control signals Branch, RegWrite, MemRead, MemWrite, MemToReg, ALUSrc and ALUOp.]
• Which stage does RegWrite belong to?
Key in Pipelining Control
21

▪ Control Signals are generated in the Instruction Decode Stage

▪ Save control signals in pipeline registers until they are used (in later
stages)!
Identifying Control Signals for Each Stage
22

▪ Needed in 3 subsequent stages after generation in Second Stage!

Pipelining Control
23

▪ Saving signals until used …
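One way to picture "saving signals until used": the ID stage produces the full set of control signals, and each pipeline register keeps only the groups still needed downstream. The sketch below assumes the textbook grouping (EX: ALUOp, ALUSrc; M: Branch, MemRead, MemWrite; WB: RegWrite, MemToReg) and uses the control values of a load; the dictionary layout and the helper name `advance` are illustrative only.

```python
# Control signals for lw, grouped by the stage that consumes them (assumed grouping).
LW_CONTROL = {
    "EX": {"ALUOp": 0b00, "ALUSrc": 1},
    "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
    "WB": {"RegWrite": 1, "MemToReg": 1},
}


def advance(control, consumed_group):
    """Return the control bundle stored in the next pipeline register.

    The group used by the current stage is dropped; the rest travels on.
    """
    return {group: signals for group, signals in control.items()
            if group != consumed_group}


id_ex = LW_CONTROL                 # written at the end of the ID stage
ex_mem = advance(id_ex, "EX")      # EX signals are used up in the EX stage
mem_wb = advance(ex_mem, "M")      # M signals are used up in the MEM stage

if __name__ == "__main__":
    print(sorted(id_ex), sorted(ex_mem), sorted(mem_wb))
```

RegWrite therefore travels through three pipeline registers before it reaches the register file in the Write Back stage.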

Pipelined Datapath and Control
24

[Figure: pipelined datapath in which the control signals generated in the ID stage are carried through the pipeline registers in groups: E, M and W in ID/EX; M and W in EX/MEM; W in MEM/WB.]
Dependencies and Pipelining Hazards
Dependences and Hazards
26

▪ Dependency
▪ An instruction’s execution dependent on outcome of another instruction
▪ Types of dependences
▪ Flow dependences or true data dependences
▪ Name dependences
▪ Control dependences

▪ Hazard
▪ Situations when an instruction in the pipeline cannot correctly execute
in the following cycle
▪ Types of hazards
▪ Structural Hazards
▪ Data Hazards
▪ Control Hazards
Dependences and Hazards
27

▪ Dependencies are a property of program

▪ A dependence may or may not result in hazard!
▪ A hazard may or may not require pipeline stall to resolve!
▪ This is a property of the pipeline organization!
▪ Implementation dependent!
Data Dependences
28

▪ Flow dependence: Read after Write (RAW), the true data dependency
▪ x3 ← x1 op x2
▪ x5 ← x3 op x4

▪ Anti dependence: Write after Read (WAR), a name dependency
▪ x3 ← x1 op x2
▪ x1 ← x4 op x5

▪ Output dependence: Write after Write (WAW), a name dependency
▪ x3 ← x1 op x2
▪ x5 ← x3 op x4
▪ x3 ← x6 op x7

▪ Name dependences do not lead to a hazard in the simple in-order pipeline we are discussing now!
▪ Hazards due to name dependences can be addressed by register renaming!
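The three kinds of data dependence can be told apart just by comparing register names. A minimal sketch, assuming each instruction is described by one destination and a tuple of sources (the `Instr` type and the function name are hypothetical, and the special case of x0 is ignored):

```python
from collections import namedtuple

# rd is the destination register; srcs are the source registers.
Instr = namedtuple("Instr", ["rd", "srcs"])


def dependences(first, second):
    """Return the set of dependences of `second` on the earlier `first`."""
    found = set()
    if first.rd in second.srcs:
        found.add("RAW")   # flow (true) dependence
    if second.rd in first.srcs:
        found.add("WAR")   # anti dependence
    if first.rd == second.rd:
        found.add("WAW")   # output dependence
    return found


producer = Instr("x3", ("x1", "x2"))            # x3 <- x1 op x2

if __name__ == "__main__":
    print(dependences(producer, Instr("x5", ("x3", "x4"))))
    print(dependences(producer, Instr("x1", ("x4", "x5"))))
    print(dependences(producer, Instr("x3", ("x6", "x7"))))
```

Whether any of these dependences becomes a hazard depends on the pipeline, not on the program.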
Data Dependences
29

▪ Data value may flow through registers or memory

▪ Data-dependence through registers is easier to detect
▪ That through memory is harder
▪ 100(x4) and 20(x6) may be same locations!
▪ 20(x4) and 20(x4) may be different in different instructions!
▪ Data hazard through memory is present when memory is accessed out
of program order!
▪ Not a case in simple pipeline!
Control Dependences
30

▪ Every instruction, except those in the first basic block, is dependent on the outcome of a branch!
▪ Example
▪ add x1, x2, x3
▪ beq x4, x0, L
▪ sub x1, x5, x6
▪ …
▪ L:
▪ or x7, x1, x8
Structural Hazards
31

▪ When hardware cannot execute a combination of instructions in

same cycle
▪ The required hardware resource is busy!
▪ Example
▪ If we had one memory for instructions and data in RISC-V
▪ Memory access from one instruction and instruction fetch of another
couldn’t execute in one cycle

▪ RISC-V instruction set was designed to be pipelined

▪ Easier to avoid structural hazards in pipelined implementation
Addressing Data Hazards in the Pipeline
In a simple five stage in-order integer pipeline, we
only need to address Read-after-Write (RAW) data
hazard which is due to true data dependency!
Data Hazards
34

▪ When instruction cannot execute because the data required by it is

not yet available
▪ Dependence of one instruction on an earlier instruction in the pipeline
▪ Example
▪ add x1, x2, x3
▪ sub x4, x1, x5
▪ x1 is written back by the first instruction later than it is required by the 2nd instruction!

Without addressing the data hazard, an outdated value of x1 will be read!

Addressing Data Hazards
35

▪ Simple Solution
▪ Stall the pipeline!
▪ Wait until the data has been written

▪ Pipeline Stall or Bubble: a stall (wait) initiated in the pipeline to resolve a hazard

▪ Need to wait 3 cycles, which slows down execution!

▪ Implementing stall
▪ Compiler can implement stalls (by adding NOPs) to resolve data hazards!
▪ Or hardware can stall the pipeline!
Addressing Data Hazards
36

▪ Forwarding cannot prevent all pipeline stalls!

▪ Example
▪ lw x1, 0(x2)
▪ sub x4, x1, x5
▪ x1 is not available for forwarding when it is needed by 2nd instruction!

▪ Known as load-use data hazard!

Addressing Data Hazards
37

▪ Forwarding or Bypassing
▪ Forward the data to the next instruction when it is available
▪ Do not wait for the write back stage to complete!

▪ Forwarding only works when the destination stage is later in time than the source stage!
Addressing Data Hazards
38

▪ Different cases for forwarding!

▪ An R-type / load / store instruction dependent on R-type instruction
▪ add x4, x5, x6
▪ sub x7, x4, x8
▪ Can be resolved by forwarding result of third stage of source to input of third
stage of the destination!
▪ An R-type / load / store (for address operand) instruction dependent on a load instruction
▪ lw x4, 0(x5)
▪ sw x7, 0(x4)
▪ Can be resolved by adding a stall and then forwarding the result of the fourth stage of the source to the input of the third stage of the destination!
▪ A store (for data operand) instruction dependent on load instruction!
▪ lw x4, 0(x6)
▪ sw x4, 0(x8)
▪ Can be resolved by forwarding result of fourth stage of source to input of fourth
stage of the destination! (without any stall !!!)
Addressing Data Hazards
39

▪ Different cases for forwarding!

▪ From 3rd or 4th stage to 3rd stage!
▪ From 4th stage to 4th stage!
▪ To resolve data hazards in hardware
▪ Need to implement above two cases of forwarding!
▪ And
▪ Mechanism to stall pipeline for one cycle when needed!
▪ i.e., when an R-type / load / store (for address operand) instruction is dependent on a load instruction!
Addressing Data Hazards
40

▪ Code reordering / static scheduling to prevent stalls

▪ Example
▪ Consider following C Code
▪ a = b + e;
▪ c = b + f;
▪ Compiled to following RISC-V Code
▪ lw x1, 0(x31) # Load b
▪ lw x2, 8(x31) # Load e
▪ add x3, x1, x2 # b + e
▪ sw x3, 24(x31) # Store a
▪ lw x4, 16(x31) # Load f
▪ add x5, x1, x4 # b + f
▪ sw x5, 32(x31) # Store c
▪ Which instructions have data hazards?
▪ How many hazards are there with or without forwarding?
Addressing Data Hazards
41

▪ Code reordering to prevent stalls

▪ Example
▪ Data hazards when we can use forwarding
▪ Original order
lw x1, 0(x31)
lw x2, 8(x31)
add x3, x1, x2
sw x3, 24(x31)
lw x4, 16(x31)
add x5, x1, x4
sw x5, 32(x31)
▪ Reordered
lw x1, 0(x31)
lw x2, 8(x31)
lw x4, 16(x31)
add x3, x1, x2
sw x3, 24(x31)
add x5, x1, x4
sw x5, 32(x31)
▪ How can we reorder code to avoid stalls?
▪ Simply moving the third lw instruction up in the sequence avoids both hazards!
▪ How much does this improve the performance?
▪ The new code sequence executes two cycles faster (assuming that we are using forwarding!)
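The two-cycle saving can be checked by counting load-use pairs. A minimal sketch, assuming a pipeline with forwarding in which the only stall is a load immediately followed by an instruction that reads the loaded register; the tuple encoding `(opcode, dest, sources)` is hypothetical, and store data is treated conservatively as an ordinary source:

```python
def load_use_stalls(program):
    """Count one-cycle load-use stalls in a pipeline with forwarding."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        opcode, dest, _ = prev
        if opcode == "lw" and dest in curr[2]:
            stalls += 1
    return stalls


ORIGINAL = [
    ("lw", "x1", ("x31",)),
    ("lw", "x2", ("x31",)),
    ("add", "x3", ("x1", "x2")),
    ("sw", None, ("x3", "x31")),
    ("lw", "x4", ("x31",)),
    ("add", "x5", ("x1", "x4")),
    ("sw", None, ("x5", "x31")),
]

# Same instructions with the load of f moved up.
REORDERED = [ORIGINAL[0], ORIGINAL[1], ORIGINAL[4],
             ORIGINAL[2], ORIGINAL[3], ORIGINAL[5], ORIGINAL[6]]

if __name__ == "__main__":
    print(load_use_stalls(ORIGINAL), load_use_stalls(REORDERED))
```

Both stalls in the original order come from an add that directly follows the load of one of its operands.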
Addressing Data Hazards
42

▪ Forwarding implementation in RISC-V

▪ RISC-V instructions write at most one result!
▪ Result is written in last stage!
▪ Forwarding is harder if
▪ More than one result is to be forwarded per instruction!
Implementing forwarding to address data hazards!
Revisiting Data Hazards
44

▪ Example of data dependencies

▪ sub x2, x1, x3 # Register x2 written by sub
▪ and x12, x2, x5 # 1st operand(x2) depends on sub
▪ or x13, x6, x2 # 2nd operand(x2) depends on sub
▪ add x14, x2, x2 # 1st(x2) & 2nd(x2) depend on sub
▪ sw x15, 100(x2) # Base (x2) depends on sub

▪ Which instructions can have data hazards?

Revisiting Data Hazards
45

▪ Example of data hazards

▪ Does a register read in the same cycle as its write back also cause a hazard?
▪ Yes, it does, but it is preventable within the register file!
Detecting Data Hazards
46

▪ How can we detect data hazards in hardware?

▪ When the rd register in the EX/MEM pipeline register or the rd register in the MEM/WB pipeline register equals rs1/rs2 in the ID/EX pipeline register
▪ i.e., when the register read from the register file is the same one that will be written back by a previous instruction!
Detecting Data Hazards
47

▪ Conditions for Hazards

1a. EX/MEM.RegisterRd = ID/EX.RegisterRs1 (EX Hazard!)
1b. EX/MEM.RegisterRd = ID/EX.RegisterRs2 (EX Hazard!)
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs1 (MEM Hazard!)
2b. MEM/WB.RegisterRd = ID/EX.RegisterRs2 (MEM Hazard!)
▪ Example
▪ sub x2, x1, x3
▪ and x12, x2, x5 # Type 1a Hazard!
▪ or x13, x6, x2 # Type 2b Hazard!
▪ add x14, x2, x2
▪ sw x15, 100(x2)

Is there any problem with this Hazard Detection Policy?

Detecting Data Hazards
48

▪ Rd of an instruction (in EX or MEM Stage) can be equal to rs1/rs2 of

a subsequent instruction even though this instruction doesn’t write
back!
▪ Using forwarding for this case would be wrong!
▪ Solution?
▪ Check if RegWrite is active!
▪ Which is in the WB Control Signals of EX/MEM register!

▪ Any other problems with Hazard Detection Policy?

▪ When x0 is the destination!!
▪ Wrong non-zero result would be forwarded!
▪ x0 as a source operand in subsequent instruction would get non-zero
forwarded data!
▪ So, do not forward when rd = x0 !!!
▪ Any other problem ??? … ☺
Detecting Data Hazards
49

▪ Final Problem
▪ What happens if two hazard detection conditions (EX Hazard and MEM
Hazard) occur at the same time?
▪ EX/MEM.RegisterRd = ID/EX.RegisterRs1/Rs2
▪ MEM/WB.RegisterRd = ID/EX.RegisterRs1/Rs2
▪ Example
▪ Summing a vector in a register
▪ add x1, x1, x2
▪ add x1, x1, x3 # EX Hazard
▪ add x1, x1, x4 # EX + MEM Hazards
▪ ...
▪ Two possible sources of forwarding
▪ From Memory Access Stage (MEM Hazard)!
▪ From Execute Stage (EX Hazard)!
▪ The updated data is in EX Stage!
▪ Thus, we need to discard MEM Hazard in this case!
Detecting Data Hazards
50

▪ Final Hazard Detection Policy / Logic

▪ EX Hazard
▪ EX/MEM.RegisterRd = ID/EX.RegisterRs1/Rs2
▪ And EX/MEM.RegWrite = 1
▪ And Rd ≠ x0
▪ MEM Hazard
▪ MEM/WB.RegisterRd = ID/EX.RegisterRs1/Rs2
▪ And MEM/WB.RegWrite = 1
▪ And Rd ≠ x0
▪ And there is NO EX Hazard !!!
▪ i.e., NOT of EX Hazard conditions!
Forwarding to Resolve Hazards
51

▪ Before and after forwarding …

Additional Datapath for Forwarding
52

▪ Rs1 and Rs2 (addresses) are added in ID/EX register to support

forwarding!
▪ Mux for immediate/register before ALU is not shown!
Forwarding Control
53

▪ ForwardA and ForwardB are the control signals generated by

forwarding unit!
Forwarding Control
54

▪ Logic to implement control in forwarding unit

▪ EX Hazard
▪ if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1)) ForwardA = 10
▪ if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs2)) ForwardB = 10

▪ MEM Hazard
▪ if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs1)) ForwardA = 01
▪ if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs2))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs2)) ForwardB = 01
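The conditions above translate almost line by line into software. The sketch below is a behavioural model, not the lecture's hardware: registers are plain numbers, each pipeline register is passed as a `(RegWrite, RegisterRd)` pair, and the function names are hypothetical. The select value 00 (operand from the register file) is assumed from the textbook.

```python
def forward_select(id_ex_rs, ex_mem, mem_wb):
    """Return the 2-bit mux select for one ALU operand.

    ex_mem and mem_wb are (RegWrite, RegisterRd) pairs.
    0b10 = forward from EX/MEM, 0b01 = forward from MEM/WB,
    0b00 = use the value read from the register file.
    """
    ex_hazard = ex_mem[0] and ex_mem[1] != 0 and ex_mem[1] == id_ex_rs
    mem_hazard = mem_wb[0] and mem_wb[1] != 0 and mem_wb[1] == id_ex_rs
    if ex_hazard:
        return 0b10            # the most recent result wins
    if mem_hazard:
        return 0b01            # only reached when there is no EX hazard
    return 0b00


def forwarding_unit(id_ex_rs1, id_ex_rs2, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) for the instruction in the EX stage."""
    return (forward_select(id_ex_rs1, ex_mem, mem_wb),
            forward_select(id_ex_rs2, ex_mem, mem_wb))


if __name__ == "__main__":
    # Third add of the vector sum: add x1, x1, x4 with x1 pending in both registers.
    print(forwarding_unit(1, 4, (True, 1), (True, 1)))
```

Checking the EX hazard first implements the "NOT of EX Hazard conditions" term of the MEM hazard without writing it out twice.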
Datapath with Forwarding
55
Datapath with Forwarding
56

▪ Do not forget to include ALU Source Mux …

The other case of forwarding …
57

▪ What about store following a load?

▪ lw x1, 0(x2)
▪ sw x1, 0(x3)

▪ Need to forward from MEM/WB register to memory access stage! (not

to ALU!)
▪ Need additional hardware!
▪ Try implementing it as a learning exercise!
When Stalls are unavoidable …
Situations where we must Stall!
59

▪ When a register is read by the instruction immediately following a load that writes the same register

▪ Hazard detection unit in the ID stage decides if the instruction must be stalled because of the previous load instruction!
▪ What really is a stall?
▪ Can we implement a stall by allowing the instruction to execute twice? Would there be any problem?
▪ To implement the stall
▪ We must not allow the current instruction to progress
▪ i.e., do not let the IF/ID pipeline register change!
▪ Let the next instruction wait before being fetched into IF/ID!
▪ i.e., do not let the PC change!
▪ To prevent the currently decoded instruction from executing
▪ Do not let register file and memory be written!
Stalls in Hardware
62

▪ Instructions execution with a stall!

Hazard Detection
63

▪ Logic to detect whether to stall an instruction!

▪ Check if the previous instruction is a load!
▪ Check if the current instruction reads the register loaded by previous
instruction!
▪ if (ID/EX.MemRead and
((ID/EX.RegisterRd = IF/ID.RegisterRs1) or
(ID/EX.RegisterRd = IF/ID.RegisterRs2)))
stall the pipeline
Control Signals to Implement Stall
64

▪ To stall current instruction

▪ Do not allow IF/ID register to be changed
▪ Use IF/IDWrite signal (write enable signal) to control when IF/ID register
should be allowed to be changed!
▪ Make the control signals for following stages (Execute, Memory and
Write Back) to be 0!
▪ Only RegWrite and MemWrite need to be 0, rest can be don’t care!
▪ Will prevent memory / register file to be written!
▪ Practically same as inserting nop instruction!

▪ Preventing next instruction to be fetched

▪ Use PCWrite signal (write enable signal) to prevent PC from being updated!
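Putting the load-use test and the three stall actions together gives a small behavioural model. This is a sketch under assumptions: registers are numbers, the function names are hypothetical, and, like the condition on the slide, it does not special-case x0.

```python
def hazard_detection_unit(id_ex_mem_read, id_ex_rd, if_id_rs1, if_id_rs2):
    """Return (PCWrite, IF/IDWrite, insert_bubble) for the current cycle."""
    stall = bool(id_ex_mem_read) and id_ex_rd in (if_id_rs1, if_id_rs2)
    pc_write = not stall        # hold the PC so the next fetch is repeated
    if_id_write = not stall     # hold IF/ID so the instruction is decoded again
    insert_bubble = stall       # zero the control signals entering ID/EX
    return pc_write, if_id_write, insert_bubble


def control_into_id_ex(decoded_control, insert_bubble):
    """Zero every control signal when a bubble is inserted (acts like a nop)."""
    if insert_bubble:
        return {name: 0 for name in decoded_control}
    return dict(decoded_control)


if __name__ == "__main__":
    # lw x10, 0(x10) in EX while add x11, x10, x10 is in ID.
    print(hazard_detection_unit(True, 10, 10, 10))
    print(control_into_id_ex({"RegWrite": 1, "MemWrite": 0, "ALUSrc": 0}, True))
```

Zeroing all signals is the simplest choice; as noted above, only RegWrite and MemWrite strictly need to be 0.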
Datapath and Control with Stalls
65
Knowledge Check!
66

▪ Tell whether following code sequences must stall, can avoid stall
with only forwarding or can execute without stall or forwarding …
▪ Example 1
▪ lw x10, 0(x10)
▪ add x11, x10, x10
▪ Answer:
▪ Cannot fully avoid stall!
▪ Can reduce one cycle by forwarding!
Knowledge Check!
67

▪ Tell whether following code sequences must stall, can avoid stall
with only forwarding or can execute without stall or forwarding …
▪ Example 2
▪ add x11, x10, x10
▪ addi x12, x10, 5
▪ addi x14, x11, 5
▪ Answer:
▪ Third instruction needs x11 before first writes it back!
▪ read-after-write data hazard!
▪ Stall can be avoided by forwarding!
Knowledge Check!
68

▪ Tell whether following code sequences must stall, can avoid stall
with only forwarding or can execute without stall or forwarding …
▪ Example 3
▪ addi x11, x10, 1
▪ addi x12, x10, 2
▪ addi x13, x10, 3
▪ addi x14, x10, 4
▪ addi x15, x10, 5
▪ Answer:
▪ No stalls even without forwarding!
Hardware can fully resolve all data hazards but code
reordering / static scheduling in software can help
prevent some Stalls!
Addressing Control Hazards in Pipeline
Control Hazards
71

▪ When conditional branch instructions need to decide which

instruction should be next
▪ Next instruction cannot be fetched until branch decision is finalized!
▪ Also known as branch hazards
▪ Simple solution?
▪ Stall the pipeline until branch is decided!
▪ Implementation is same as in case of data hazards, need to enhance
hazard detection logic to detect control hazards!
▪ How many stalls?
▪ If branch is decided in 3rd stage and PC is updated in 4th stage!
▪ Three stalls!
▪ If branch is decided in 3rd stage and PC is also updated in 3rd stage!
▪ Two stalls!
Reducing Branch Delay
72

▪ Branch decision can be finalized in ID stage by adding extra

hardware (xor gates for comparison + adder for PC+offset)!
▪ Still the pipeline may need to be stalled!
▪ Only one stall as opposed to 3!
▪ Example
Reducing Branch Delay
73

▪ Branch decision can be finalized in ID stage by adding extra

hardware (xor gates for comparison)!
▪ Problems?
▪ Forwarding (for RAW dependency) needs to be modified!
▪ If branch is dependent on an earlier instruction, forwarding may be needed to the ID stage where the branch is executing!
▪ Forwarding sources can be EX/MEM or MEM/WB registers!
▪ Additional stalls (for RAW dependency) may also be needed!
▪ If branch is dependent on ALU instruction → 1 Cycle Stall
▪ If branch is dependent on previous load → 2 Cycle Stall
Addressing Control Hazards
74

▪ Prediction
▪ A better solution to control hazards
▪ Predict the outcome of branch and fetch the next instruction!
▪ If prediction is wrong, fetch the right instruction again!
▪ Prediction can be static or dynamic!
Addressing Control Hazards
75

▪ Static prediction
▪ Predict all branches as taken or not taken
▪ Last example using prediction
▪ When prediction is correct!

▪ When prediction is wrong!

Addressing Control Hazards
76

▪ Static prediction
▪ Predict some branches as taken and some as not taken
▪ For example, a branch instruction at the end of a loop is usually taken
▪ Predict all branches to earlier addresses to be taken!
▪ When predicting ‘taken’
▪ Pipeline may still be stalled until branch target address (PC+offset) is
available!
▪ Use target address buffer to avoid these stalls!
▪ Example
▪ If address is computed in ID stage and branch is also decided in ID stage!
▪ One stall when predicting as “taken”!
▪ No stall when predicting as “not taken”!
When prediction goes wrong … some instruction in
the pipeline may need to be discarded!
Addressing Control Hazards
78

▪ Assuming branch decision comes after MEM Stage!

Need to discard the

instructions in MEM, EX and
ID Stage if the branch is
taken!
Discarding Instruction
79

▪ Instruction entering MEM stage

▪ Make control signals as zeroes!
▪ Instruction entering EX stage
▪ Make control signals as zeroes!
▪ How to discard the instruction in the IF/ID register, which was fetched in the same cycle in which the branch is decided?
▪ Will enter ID stage in next cycle!
▪ The instruction is not yet decoded and there are no control signals to be
zeroed!
▪ How to handle this?
▪ Use a control signal IF.Flush which zeroes the instruction in IF/ID register!
▪ The instruction effectively becomes nop instruction!
Final Pipelined Datapath and Control
80

▪ Note: ALU Source Mux + Mux control signals not shown …

Addressing Control Hazards
81

▪ Delayed branch (delayed decision)
▪ Delay the branch decision!
▪ Execute the instruction which is not impacted by branch!
▪ Efficient for one-cycle branch delays!
▪ Example
▪ add x1, x2, x4
▪ beq x5, x6, somewhere
▪ Reorder the instructions as
▪ beq x5, x6, somewhere
▪ add x1, x2, x4
▪ Can be handled by assembler!
▪ Invisible to the programmer!
Addressing Control Hazards
82

▪ Dynamic prediction
▪ Predict based on knowledge of the behavior of branch instruction!
▪ Keep history of different branches as taken and not taken
▪ Predict based on prevalent behavior of different instructions!
▪ Because a lot of history is kept, such predictors can reach an accuracy of above 90%!
Dynamic Branch Prediction
83

▪ To implement dynamic prediction in hardware

▪ Include a small look-up table which is indexed by the lower portion of
the address of branches
▪ Branch Prediction Buffer
▪ Branch History Table
▪ A single bit may indicate if this branch was taken or not taken recently!
▪ Prediction bit is inverted on misprediction!
▪ Accuracy of such predictor in loops?
▪ 1 misprediction on start of loop!
▪ (n-1) correct predictions!
▪ 1 misprediction when exiting loop!
▪ Success rate = (n-1)/(n+1) for a loop branch that is taken n times and then not taken once, e.g. for n = 9, it is 80%!
▪ Can we improve accuracy?
Dynamic Branch Prediction
84

▪ 2-bit branch prediction!

▪ Change the prediction only after two successive mispredictions, not one!
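The benefit of the second bit shows up in a short simulation. This is a sketch under assumptions: the 2-bit scheme is modelled as a saturating counter (one common variant, not necessarily the exact state machine in the figure), the 1-bit predictor starts as "not taken", the 2-bit counter starts as "strongly taken", and the function names are hypothetical.

```python
def one_bit_accuracy(outcomes, last_taken=False):
    """Predict that the branch repeats its most recent outcome."""
    correct = 0
    for taken in outcomes:
        correct += (last_taken == taken)
        last_taken = taken            # the bit flips on every misprediction
    return correct / len(outcomes)


def two_bit_accuracy(outcomes, counter=3):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    correct = 0
    for taken in outcomes:
        correct += ((counter >= 2) == taken)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


# A loop branch taken 9 times and then not taken once; the loop is run 100 times.
LOOP_BRANCH = ([True] * 9 + [False]) * 100

if __name__ == "__main__":
    print(one_bit_accuracy(LOOP_BRANCH), two_bit_accuracy(LOOP_BRANCH))
```

The 1-bit predictor mispredicts twice per run of the loop (on entry and on exit), while the 2-bit counter mispredicts only on exit.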
Dynamic Prediction
85

▪ Tournament Branch Predictor

▪ Use multiple branch predictors!
▪ A selector selects the predictor which has been most accurate for a
given branch!
▪ Target Address Calculation
▪ For taken branches target address calculation requires 1 cycle!
▪ To avoid 1 cycle penalty, use
▪ Branch Target Buffer
▪ A cache to hold target PC or instruction for branches!
Knowledge Check!
86

▪ Consider three branch prediction schemes: predict not taken, predict taken,
and dynamic prediction. Assume that they all have zero penalty when they
predict correctly and a two-cycle penalty when they are wrong. Assume that the
average prediction accuracy of the dynamic predictor is 90%. Which predictor
is the best choice for the following branches?
1. A conditional branch that is taken with 5% frequency
▪ Answer
▪ Predict Not taken!
2. A conditional branch that is taken with 95% frequency
▪ Answer
▪ Predict Taken!
3. A conditional branch that is taken with 70% frequency
▪ Answer
▪ Dynamic Prediction is better!
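The three answers follow from comparing the expected penalty per branch, i.e. 2 cycles times the misprediction rate of each scheme. A minimal sketch of that arithmetic (function names are illustrative):

```python
MISPREDICT_PENALTY = 2  # cycles, from the problem statement


def expected_penalty(taken_rate, dynamic_accuracy=0.90):
    """Average stall cycles per branch for each prediction scheme."""
    return {
        "predict not taken": MISPREDICT_PENALTY * taken_rate,
        "predict taken": MISPREDICT_PENALTY * (1 - taken_rate),
        "dynamic": MISPREDICT_PENALTY * (1 - dynamic_accuracy),
    }


def best_scheme(taken_rate):
    """Name of the scheme with the lowest expected penalty."""
    penalties = expected_penalty(taken_rate)
    return min(penalties, key=penalties.get)


if __name__ == "__main__":
    for rate in (0.05, 0.95, 0.70):
        print(rate, best_scheme(rate))
```

A static scheme wins only when the branch is more biased than the dynamic predictor is accurate (here, more than 90% in one direction).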
Instruction Sets can make pipelining easier or harder
…
Relevant Reading
88

▪ Computer Organization and Design (RISC-V, 2nd Edition), Patterson

and Hennessy
▪ Chapter 4
▪ Sections 4.6, 4.7, 4.8 and 4.9!

▪ Do practice for exams!

▪ Exercise Problems
▪ Chapter 4 of the Textbook
▪ Problems relevant to Sections 4.6, 4.7, 4.8 and 4.9!
Handling Multicycle Floating-Point Operations in
RISC-V
Why multicycle?
90

▪ What happens if we force all operations to complete in one cycle?

▪ Slower clock
▪ More logic
▪ Can be both!
▪ Pipelining multicycle operation
▪ Start a new operation when the first operation goes into the second
stage of execution!
▪ For example, pipeline a multiplier into seven pipeline stages
▪ Can overlap execution of 7 (independent) multiply operations!

▪ Can we fully pipeline multicycle operations?

▪ Sometimes no!
Unpipelined Multicycle Execution
91

Xpower = 1;
for(i=0; i < 3; i++)
Xpower = X*Xpower;

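[Figure: unpipelined multicycle circuit for the loop above: an 8-bit multiplier (×) feeding a clocked register (D[7:0] → Q[7:0]) through a mux controlled by Start that selects between the constant 1 and the product; inputs X[7:0] and clk, output Xpower[7:0].]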
▪ This design can, however, be fully pipelined!

▪ At cost of additional registers and multipliers!
We will see how to handle both pipelined and
unpipelined multicycle operations …
RISC-V Floating Point Pipeline
93

▪ Change the EX stage

▪ Allow operations to take multiple cycles for different operations!
▪ Add multiple functional units instead of one ALU
▪ Benefit?
▪ Reduced structural hazards!
▪ Cannot issue a second instruction to EX stage, if the functional unit is busy!

▪ Functional units in FP pipeline

▪ Integer unit for loads, stores, integer ALU operations, branches!
▪ FP and integer multiplier
▪ FP adder that handles FP add, subtract, and conversion
▪ FP and integer divider
RISC-V Floating Point Pipeline
94

▪ FP pipeline with unpipelined functional units

• EX stage is not pipelined but it is multicycle!

• An instruction using same functional unit in EX
stage as an earlier instruction cannot proceed
before completion of previous one!
• If an instruction cannot proceed to EX stage, all
the following instructions are stalled!
RISC-V Floating Point Pipeline
95

▪ Pipelining the functional units

▪ Let’s say
▪ FP/Integer multiply unit is pipelined into 7 stages!
▪ FP adder is pipelined into 4 stages!
▪ FP/Integer divide unit is unpipelined and takes 25 cycles!

▪ Use latency
▪ Number of cycles between an instruction producing a result and an
instruction using that result!
▪ After how many cycles a dependent instruction can start execution!
▪ Just like loads always have use latency (wait) of 1 cycle!

▪ Initiation Interval
▪ When can another instruction that uses the same functional unit, be
issued (assuming it is independent)!
RISC-V Floating Point Pipeline
96

▪ Latencies and initiation intervals of functional units

▪ These are use latencies of instruction using the results in EX stage!

▪ What about use latencies for store instruction for the value being stored
(not the base address register)?
▪ Can be one cycle less!
▪ Because the stored value is needed in MEM stage not EX stage!
RISC-V Floating Point Pipeline
97

▪ FP pipeline with pipelined functional units

• DIV is still not pipelined!

• We can have up to 7 multiply, 4 FP add and 1 divide operation in the pipeline at same time!
RISC-V Floating Point Pipeline
98

▪ Penalty for faster clock and more pipeline stages in a functional unit?
▪ More latency!
▪ Can cause stalls due to data hazards!

▪ Additional Pipeline Registers

▪ ID/EX register can be logically divided into multiple registers, one for each functional unit
▪ ID/EX, ID/A1, ID/M1, and ID/DIV
▪ Can also be implemented using different registers
▪ However, there’s only one operation that can be in this stage at a time and
control signals are associated only with that operation!
▪ That is in-order instruction issue!
RISC-V Floating Point Pipeline
99

▪ Executing a sequence of independent instructions

▪ fmul.d # double precision multiply
▪ fadd.d # double precision add
▪ fld # load double precision floating point
▪ fsd # store double precision floating point

fmul.d IF ID M1 M2 M3 M4 M5 M6 M7 MEM WB

fadd.d IF ID A1 A2 A3 A4 MEM WB

fld IF ID EX MEM WB

fsd IF ID EX MEM WB
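The stage sequences above can be generated mechanically, which also makes the completion order easy to inspect. A minimal sketch, assuming the instructions are fetched in consecutive cycles with no stalls; the table `EX_STAGES` and the helper `timeline` are hypothetical names:

```python
EX_STAGES = {
    "int": ["EX"],                                        # loads, stores, integer ALU
    "fadd": ["A1", "A2", "A3", "A4"],                     # 4-stage FP adder
    "fmul": ["M1", "M2", "M3", "M4", "M5", "M6", "M7"],   # 7-stage multiplier
}


def timeline(unit, fetch_cycle):
    """Map cycle number -> stage for an instruction fetched in fetch_cycle."""
    stages = ["IF", "ID"] + EX_STAGES[unit] + ["MEM", "WB"]
    return {fetch_cycle + i: stage for i, stage in enumerate(stages)}


PROGRAM = [("fmul.d", "fmul"), ("fadd.d", "fadd"), ("fld", "int"), ("fsd", "int")]

schedule = {name: timeline(unit, cycle)
            for cycle, (name, unit) in enumerate(PROGRAM, start=1)}
wb_cycle = {name: max(stages) for name, stages in schedule.items()}

if __name__ == "__main__":
    print(wb_cycle)
```

The write-back cycles are not in program order: the load and store finish before the add, which finishes before the multiply.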
Hazards and forwarding are handled in a similar manner as in the integer pipeline, with a few additional issues …
Forwarding is very similar, only the hazard detection unit needs to consider a few more situations for adding stalls…
Hazards in FP Pipeline
102

▪ Structural Hazards
▪ Unpipelined divide unit
▪ Instructions using MEM, WB at same time!
▪ Instructions have varying running times
▪ Can have more than one register writes in same cycle!
▪ Write after write (WAW) hazards are possible
▪ A later instruction writes the result to a register before an earlier one!
▪ When the earlier one finishes, the result can be wrong!

▪ Write-after-Read (WAR) hazard still not possible !

▪ Because instructions enter execution in-order!
Hazards in FP Pipeline
103

▪ Instructions complete / commit in an order different than the

issuance order
▪ Exceptions’ handling is complex!
▪ Because of longer EX stage
▪ Stalls after RAW hazards would be more frequent!
▪ RAW: Instruction needing data yet to be produced by earlier one!
Hazards in FP Pipeline
104

▪ Read after write (RAW) hazards

▪ Consider following code sequence
▪ fld f4, 0(x2)
▪ fmul.d f0, f4, f6
▪ fadd.d f2, f0, f8
▪ fsd f2, 0(x2)
▪ Execution in the pipeline (is forwarding being used?)
[Figure: pipeline timing diagram for the code sequence above.]
▪ Unless the previous instruction is issued for execution, the next instruction is not decoded!
▪ Because control signals in ID/EX register are for one instruction and we are not replicating hardware!
▪ Extra stall to avoid conflict of MEM stages in two instructions!
Hazards in FP Pipeline
105

▪ Structural hazard due to multiple write backs!

▪ Consider following sequence!

▪ In 10th cycle only fld accesses memory, so there’s no structural hazard!

▪ But in 11th cycle, all three instructions are in WB stage?
▪ What should be done, assuming single write port for register file?
Hazards in FP Pipeline
106

▪ Structural hazard due to multiple write backs!

▪ Solutions
▪ Add multiple ports
▪ Since this case is rare, it is not worth adding more ports!
▪ Detect it as a structural hazard and then add a stall to resolve it!
▪ Two ways to detect and add stall
1. Track the use of write port in ID stage and stall the instruction!
▪ A shift register can indicate when an instruction will use write port!
▪ Benefits?
• All stalls are detected in ID stage as for other stalls!
▪ Cost is additional shift register / write conflict logic!
2. Stall an instruction when it tries to enter MEM or WB stage
▪ In this case, any of the conflicting instructions can be stalled!
▪ Give priority to the unit with longest latency as that would have most
probably caused others to stall because of a RAW hazard, eventually
leading to both instructions writing back at same time!
▪ Benefit: the conflict is easier to detect and the stall is easier to implement here!
▪ Drawback: more complicated control, as this stall affects multiple
instructions!
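The first option (tracking the write port in ID with a shift register) can be sketched as follows. This is a minimal Python model, not the hardware from the lecture: the class name `WritePortTracker` and the latency values in the usage lines are assumptions chosen for illustration (fmul.d reaching WB 9 cycles after ID via M1 to M7 and MEM, fadd.d reaching WB 6 cycles after ID via A1 to A4 and MEM).

```python
class WritePortTracker:
    """Shift register of write-port reservations.

    Bit i of `reserved` is set when the register-file write port is
    already claimed i cycles from now.
    """

    def __init__(self):
        self.reserved = 0

    def try_reserve(self, cycles_until_wb):
        """Called in ID. Return False if the instruction must stall."""
        mask = 1 << cycles_until_wb
        if self.reserved & mask:
            return False          # another instruction writes back in that cycle
        self.reserved |= mask     # claim the write port for that cycle
        return True

    def tick(self):
        """Advance one clock cycle: every reservation moves one step closer."""
        self.reserved >>= 1


# Usage: fmul.d issues first, fadd.d reaches ID three cycles later.
port = WritePortTracker()
print(port.try_reserve(9))   # fmul.d in ID: True, WB slot claimed
for _ in range(3):
    port.tick()
print(port.try_reserve(6))   # fadd.d in ID: False, both would write back together
port.tick()                  # fadd.d stalls one cycle in ID
print(port.try_reserve(6))   # True, the conflict is gone
```

Keeping the check in ID means this stall is raised in the same place as the other hazard stalls, at the cost of the extra shift register.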
Hazards in FP Pipeline
107

▪ Write after write (WAW) hazards!

▪ Two instructions write same register but in wrong order!
▪ Consider the following case!

fadd.d f2, f4, f6   IF  ID  A1  A2  A3  A4  MEM WB
                        IF  ID  EX  MEM WB
fld f2, 0(x2)               IF  ID  EX  MEM WB

▪ A WAW hazard only occurs when the result of an instruction is overwritten without being used by another instruction!
▪ If another instruction were using the result of fadd.d, i.e., f2, then there would be a RAW hazard
and fld would be delayed automatically!
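The write-back order in this case can be checked with a few lines of arithmetic. The Python sketch below assumes no stalls and one instruction fetched per cycle, so fadd.d is fetched in cycle 1 and fld in cycle 3; the function name `write_back_cycle` is made up for illustration.

```python
FADD_STAGES = ["IF", "ID", "A1", "A2", "A3", "A4", "MEM", "WB"]
LOAD_STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def write_back_cycle(fetch_cycle, stages):
    """Cycle in which an instruction reaches WB, assuming it never stalls."""
    return fetch_cycle + len(stages) - 1


fadd_wb = write_back_cycle(1, FADD_STAGES)  # fadd.d f2, f4, f6
fld_wb = write_back_cycle(3, LOAD_STAGES)   # fld f2, 0(x2), two cycles later

print(fadd_wb, fld_wb)   # 8 7
# The later instruction writes f2 first, so fadd.d would overwrite it afterwards.
print(fld_wb < fadd_wb)  # True -> WAW hazard
```

The final value of f2 would then come from fadd.d, although program order requires it to come from fld.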
Hazards in FP Pipeline
108

▪ Write after write (WAW) hazards!

▪ Solutions
1. Delay issuance of second instruction until the first one enters MEM stage!
2. Allow the second instruction to proceed but suppress the register write of the
earlier instruction!
▪ Since the result is not being used, no need to write it!
▪ Both can be implemented by detecting WAW hazard in ID stage!
▪ The difficulty is to detect when a second instruction may finish before an
earlier one!
▪ Since this case is rare, a simple solution can be used
▪ Do not issue an instruction which writes back the same register as an earlier one
which is in execution!
Summing up the hazard detection in FP pipeline …
Hazards in FP Pipeline
110

▪ Detecting hazards
▪ Between integer instructions!
▪ Only for load-use hazards, or for branches if the branch is decided earlier!
▪ Between FP instructions
▪ Between FP and integer instructions
▪ For FP loads or stores!
▪ For data movement between FP and integer registers!
Hazards in FP Pipeline
111

▪ Assuming that all hazards are detected in ID stage, 3 checks need to be performed
1. Check for structural hazards!
▪ Check if divide unit is busy or not!
▪ Check if write port will be available when needed!
2. Check for RAW hazards!
▪ No source register of this instruction may be a pending destination in the pipeline at the time this instruction needs it!
▪ A number of checks need to be made for this!
3. Check for a WAW hazard!
▪ If an instruction in A1, …, A4, D, M1, …, M7 has the same destination as this one, stall this instruction!
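The three checks can be put together in one function that the ID stage would evaluate for the instruction about to issue. The Python sketch below is only a behavioural model under simplifying assumptions: each in-flight instruction is summarised by its destination, the number of cycles until its result can be forwarded, and whether it sits in an FP unit; write-port reservations are given as a set of cycle offsets. All names (`InFlight`, `id_stall_reason`, `wb_offset`) and the numeric offsets in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InFlight:
    dest: str          # destination register of an already issued instruction
    cycles_left: int   # cycles until its result can be forwarded (0 = available)
    in_fp_unit: bool   # True while it is in A1..A4, D, or M1..M7


def id_stall_reason(unit, dest, srcs, wb_offset,
                    in_flight, divider_busy, reserved_wb_offsets):
    """Return why the instruction in ID must stall, or None if it can issue."""
    # 1. Structural hazards: busy divider, or write port already taken.
    if unit == "div" and divider_busy:
        return "structural: divider busy"
    if dest is not None and wb_offset in reserved_wb_offsets:
        return "structural: write port taken"
    # 2. RAW: a source is still being produced by an earlier instruction.
    for other in in_flight:
        if other.cycles_left > 0 and other.dest in srcs:
            return "RAW on " + other.dest
    # 3. WAW: an instruction in an FP unit has the same destination.
    for other in in_flight:
        if other.in_fp_unit and dest is not None and other.dest == dest:
            return "WAW on " + dest
    return None


# Usage: a multiply writing f0 is still four cycles away from producing it.
pending = [InFlight(dest="f0", cycles_left=4, in_fp_unit=True)]
print(id_stall_reason("add", "f2", ("f0", "f8"), 6, pending, False, set()))
print(id_stall_reason("load", "f0", ("x2",), 3, pending, False, set()))
print(id_stall_reason("add", "f2", ("f4", "f6"), 6, [], False, set()))
```

The three calls report a RAW stall on f0, a WAW stall on f0, and no stall, in that order.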
Hazards in FP Pipeline
112

▪ Forwarding is similar as in integer pipeline!

▪ Test for forwarding
▪ Check if the destination in the EX/MEM, A4/MEM, M7/MEM, D/MEM or MEM/WB
pipeline register is the same as a source register of the current instruction (in
execution)!
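The forwarding test amounts to comparing the destination held in each of these pipeline registers with the source registers of the instruction in execution. The Python sketch below assumes each pipeline register is summarised by the destination register it carries (or is absent when it carries none); the last register is written as MEM/WB, the usual name for the register between the MEM and WB stages. The function name `forwarding_sources` and the priority order (MEM/WB checked last, so a more recent result wins) are assumptions for illustration.

```python
# Pipeline registers checked for forwarding; MEM/WB holds the oldest result.
PIPELINE_REGS = ("EX/MEM", "A4/MEM", "M7/MEM", "D/MEM", "MEM/WB")


def forwarding_sources(pipeline_dests, srcs):
    """Map each source register to the pipeline register that can supply it.

    pipeline_dests: dict from pipeline register name to the destination
    register it currently carries.
    """
    found = {}
    for name in PIPELINE_REGS:
        dest = pipeline_dests.get(name)
        if dest is None:
            continue
        for src in srcs:
            # Keep the first match so the most recent result takes priority.
            if src == dest and src not in found:
                found[src] = name
    return found


# Usage: fadd.d has just left A4 with destination f2, an older fld wrote f4.
state = {"A4/MEM": "f2", "MEM/WB": "f4"}
print(forwarding_sources(state, ("f2", "f4")))  # {'f2': 'A4/MEM', 'f4': 'MEM/WB'}
print(forwarding_sources(state, ("f6", "f8")))  # {}
```

Sources that do not appear in the returned mapping are read from the register file as usual.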
Stalls Frequency in FP Pipeline
113

▪ How efficiently does an FP pipeline perform?

Relevant Reading
114

▪ Computer Architecture: A Quantitative Approach, 6th Edition, Hennessy and Patterson
▪ Appendix C
▪ Sections C.4 and C.5
▪ Recommended to read case study in C.6 as well!
