CA L7 Unit4 Slides Updated
1
Outline: Unit 4
• Pipeline Hazards
• Pipelined Datapath
2
Combination of Combinational and Sequential Logic
Elements
• Because only state elements can store a data value, any collection of
combinational logic must have its inputs come from a set of state
elements and its outputs written into a set of state elements
3
Datapath with Control
4
RISC-V Instruction Execution Steps
5
Steps in RISC-V Instruction Execution
6
Steps in RISC-V Instruction Execution
ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)
7
Steps in RISC-V Instruction Execution
ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)
Clock Cycle: 1
8
Steps in RISC-V Instruction Execution
ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)
Clock Cycle: 2
9
Steps in RISC-V Instruction Execution
ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)
Clock Cycle: 3
10
Processor Implementation
• Instruction Count
– Determined by Compiler and Instruction Set Architecture
• CPI and Clock cycle time
– Determined by Processor Implementation
11
Performance Issues of Single Cycle Processor Design
• CPI is 1.
12
Steps in RISC-V Instruction Execution
13
RISC-V Performance: Single Cycle
• Assume that the operation times for the major functional units of the
datapath are:
– 100 ps for register read or write
– 200 ps for memory access (instructions or data)
– 200 ps for ALU operation
Single-Cycle Implementation
• Clock cycle time must accommodate the
slowest instruction (i.e., ld)
• Clock cycle time: 800 ps (200 + 100 + 200 + 200 + 100 for ld)
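As a rough check of the 800 ps figure, the sketch below sums the functional-unit delays used by each instruction class and takes the maximum. Only the three delays come from the slide; the per-class unit usage and all names (`INSTRUCTION_PATHS`, `instruction_latency_ps`, `single_cycle_clock_ps`) are assumptions made for illustration.

```python
# Functional-unit delays from the slide, in picoseconds.
REG_ACCESS_PS = 100   # register read or write
MEM_ACCESS_PS = 200   # instruction fetch or data access
ALU_OP_PS = 200       # ALU operation

UNIT_DELAY_PS = {
    "ifetch": MEM_ACCESS_PS,
    "reg_read": REG_ACCESS_PS,
    "alu": ALU_OP_PS,
    "data_mem": MEM_ACCESS_PS,
    "reg_write": REG_ACCESS_PS,
}

# Assumed unit usage per instruction class (illustrative, not from the slide).
INSTRUCTION_PATHS = {
    "ld": ["ifetch", "reg_read", "alu", "data_mem", "reg_write"],
    "sd": ["ifetch", "reg_read", "alu", "data_mem"],
    "r_type": ["ifetch", "reg_read", "alu", "reg_write"],
    "beq": ["ifetch", "reg_read", "alu"],
}


def instruction_latency_ps(kind):
    """Total delay of the units one instruction class passes through."""
    return sum(UNIT_DELAY_PS[unit] for unit in INSTRUCTION_PATHS[kind])


def single_cycle_clock_ps():
    """A single-cycle clock must be long enough for the slowest class."""
    return max(instruction_latency_ps(kind) for kind in INSTRUCTION_PATHS)


for kind in INSTRUCTION_PATHS:
    print(kind, instruction_latency_ps(kind), "ps")
print("clock cycle:", single_cycle_clock_ps(), "ps")
```

Under these assumptions ld is the only class that uses all five units, which is why it sets the clock period for every instruction.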
14
Steps in RISC-V Instruction Execution
15
Clock Cycle = 800 ps
RISC-V Performance: Single Cycle
Single-cycle
16
Performance Issues of Single Cycle Processor Design
• CPI is 1.
17
Pipelining Analogy
19
Pipelining Analogy
• Hence, the first RISC-V pipeline we explore has five stages: one step per
stage
21
Steps in RISC-V Instruction Execution
22
Steps in RISC-V Instruction Execution
ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)
23
Steps in RISC-V Instruction Execution
ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)
Clock Cycle: 1
24
Steps in RISC-V Instruction Execution
ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)
Clock Cycle: 2
25
Steps in RISC-V Instruction Execution
ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)
Clock Cycle: 3
26
RISC-V Pipeline Performance
• Assume that the operation times for the major functional units of the
datapath are:
– 100 ps for register read or write
– 200 ps for memory access (instructions or data)
– 200 ps for ALU operation
27
Steps in RISC-V Instruction Execution
28
Steps in RISC-V Instruction Execution
29
Clock Cycle = 800 ps
Steps in RISC-V Instruction Execution: Pipelined
30
Steps in RISC-V Instruction Execution: Pipelined
• Assume that the operation times for the major functional units of the
datapath are:
– 100 ps for register read or write
– 200 ps for memory access (instructions or data)
– 200 ps for ALU operation
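The sketch below compares total execution time for a run of loads on the single-cycle and pipelined designs, using the delays above. The mapping of delays to the five stages (`STAGE_DELAY_PS`) and the function names are assumptions for illustration, and the model ignores hazards entirely.

```python
# Assumed per-stage delays in picoseconds, built from the slide's unit delays:
# IF and MEM use memory (200), ID and WB use the register file (100), EX uses the ALU (200).
STAGE_DELAY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def pipelined_clock_ps():
    """Every stage gets one clock cycle, so the slowest stage sets the period."""
    return max(STAGE_DELAY_PS.values())


def single_cycle_time_ps(n_instructions):
    """Each instruction takes one long cycle covering all five steps."""
    return n_instructions * sum(STAGE_DELAY_PS.values())


def pipelined_time_ps(n_instructions):
    """Fill the pipeline once, then complete one instruction per cycle (no hazards)."""
    cycles = len(STAGE_DELAY_PS) + n_instructions - 1
    return cycles * pipelined_clock_ps()


for n in (3, 1_000_000):
    speedup = single_cycle_time_ps(n) / pipelined_time_ps(n)
    print(n, single_cycle_time_ps(n), pipelined_time_ps(n), round(speedup, 2))
```

For three loads the model gives 2400 ps versus 1400 ps. For long instruction streams the speedup approaches 800/200 = 4 rather than 5, because the stages are not balanced.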
32
RISC-V Pipeline
Single-cycle
Pipelined
33
RISC-V Pipeline
Single-cycle
Pipelined
34
Performance Improvement through Pipelining
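• Under ideal conditions, and if the pipeline stages are balanced (i.e., all
stages take the same time), the time between instructions on the pipelined
processor equals the time between instructions on the nonpipelined processor
divided by the number of pipeline stages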
39
Steps in RISC-V Instruction Execution
40
Steps in RISC-V Instruction Execution: Unbalanced Pipeline
41
Steps in RISC-V Instruction Execution
42
Steps in RISC-V Instruction Execution: Balanced Pipeline
43
Pipeline Hazards
• These are situations in pipelining when the next instruction cannot execute
in the following clock cycle.
1. Structural Hazard
– A required hardware resource is busy
2. Data Hazard
– Data that is needed to execute the next instruction has not yet become available
3. Control Hazard
– Deciding on the control action (i.e., which instruction to fetch next) depends on a
previous instruction that has not yet completed
47
Structural Hazards
• Example
– In a RISC-V pipeline with a single memory, a load or store needs that memory for its
data access
– An instruction fetch (for a later instruction) in the same cycle would have to stall
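To make the conflict concrete, the sketch below lists the cycles in which one instruction is in MEM while a later instruction is in IF, which is when a single shared memory would be requested twice. It assumes one instruction enters IF per cycle starting at cycle 1 with no other stalls; the function name `memory_conflicts` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_conflicts(uses_data_memory):
    """Find cycles where a data access and an instruction fetch collide.

    uses_data_memory: one bool per instruction, in program order, True for
    loads and stores. Returns (cycle, fetching_index, accessing_index) tuples.
    """
    conflicts = []
    mem_offset = STAGES.index("MEM")  # MEM comes 3 cycles after IF
    for i, uses_mem in enumerate(uses_data_memory):
        if not uses_mem:
            continue
        mem_cycle = (i + 1) + mem_offset  # instruction i enters IF in cycle i + 1
        fetching = mem_cycle - 1          # index of the instruction in IF that cycle
        if fetching < len(uses_data_memory):
            conflicts.append((mem_cycle, fetching, i))
    return conflicts


# Four back-to-back loads: the first load's MEM overlaps the fourth load's IF.
print(memory_conflicts([True, True, True, True]))  # [(4, 3, 0)]
```

With separate instruction and data memories the two accesses go to different units, so this list would be empty by construction.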
48
Steps in RISC-V Instruction Execution
ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)
Clock Cycle: 3
49
RISC-V Pipeline
Single-cycle
Pipelined
50
Data Hazards
51
Data Hazards
• Example
– add x19, x0, x1
sub x2, x19, x3
52
Solving Data Hazards: Forwarding (Bypassing)
• Example
– add x19, x0, x1
sub x2, x19, x3
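The dependence in this example (sub reads x19, which add writes) can be found mechanically. The sketch below is one minimal way to list read-after-write dependences in a short instruction sequence; the tuple representation and the name `raw_dependences` are assumptions for illustration.

```python
def raw_dependences(program):
    """List read-after-write dependences in a straight-line program.

    program: list of (opcode, destination_register, source_registers).
    Returns (producer_index, consumer_index, register) tuples, where the
    producer is the most recent earlier instruction writing that register.
    """
    last_writer = {}
    deps = []
    for idx, (opcode, rd, sources) in enumerate(program):
        for reg in sources:
            # x0 is hardwired to zero, so it never carries a dependence.
            if reg != "x0" and reg in last_writer:
                deps.append((last_writer[reg], idx, reg))
        if rd is not None and rd != "x0":
            last_writer[rd] = idx
    return deps


program = [
    ("add", "x19", ("x0", "x1")),
    ("sub", "x2", ("x19", "x3")),
]
print(raw_dependences(program))  # [(0, 1, 'x19')]
```

Forwarding targets exactly these pairs: the producer's ALU result is routed to the consumer's ALU input instead of waiting for the register write.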
53
Solving Data Hazards: Limitations of Forwarding
54
Solving Data Hazards: Limitations of Forwarding
55
Solving Data Hazards: Code Scheduling
56
Solving Data Hazards: Code Scheduling
57
Solving Data Hazards: Code Scheduling
58
Control Hazards
59
Control Hazards
60
Steps in RISC-V Instruction Execution
ld x1, 100(x4)
ld x2, 200(x4)
ld x3, 300(x4)
61
Steps in RISC-V Instruction Execution
64
Steps in RISC-V Instruction Execution
Clock Cycle: 3
65
Handling Control Hazards: Types of Branch Prediction
66
Designing Instruction Set Architecture (ISA) for Pipelining
• The RISC-V ISA was designed for pipelining, as the following design choices
show:
1) All RISC-V instructions are the same length
– In an instruction set like the x86, where instructions vary from 1 byte to 15 bytes,
pipelining is considerably more challenging.
– Modern implementations of the x86 architecture actually translate x86 instructions into
simple operations that look like RISC-V instructions and then pipeline the simple
operations rather than the native x86 instructions!
67
Designing Instruction Set Architecture (ISA) for Pipelining
• The RISC-V ISA was designed for pipelining, as the following design choices
show:
2) RISC-V only has a few instruction formats, with the source and
destination register fields being located in the same place in each
instruction
– This design choice makes it much easier to decode instructions and read registers in one
step.
68
Designing Instruction Set Architecture (ISA) for Pipelining
• The RISC-V ISA was designed for pipelining, as the following design choices
show:
3) Memory operands appear only in loads and stores
– If we could operate on operands in memory, as in the x86, stages 3 and 4 would
expand to an address stage, a memory stage, and then an execute stage.
69
Pipelining: Overview
• Subject to hazards
– Structural, data, control
70
Pipelining Hazards: Exercise
ld x10, 0(x10)
add x11, x10, x10
71
Pipelining Hazards: Exercise
add x11, x10, x10
addi x12, x10, 5
addi x14, x11, 5
72
Pipeline Diagrams
73
Single Clock-Cycle Pipeline Diagram: Example
74
Single Clock-Cycle Pipeline Diagram: Example
75
Multiple Clock-Cycle Pipeline Diagram: Example
76
Multiple Clock-Cycle Pipeline Diagram: Example
• Traditional form
77
Pipeline Operation for Load Instruction: IF
78
Pipeline Operation for Load Instruction: ID
79
Pipeline Operation for Load Instruction: EX
80
Pipeline Operation for Load Instruction: MEM
81
Pipeline Operation for Load Instruction: WB
82
Flaw in Pipeline Operation for Load Instruction: WB
Wrong register number
83
Corrected Pipelined Datapath for Load Instruction
84
Pipeline Operation for Store Instruction: IF
85
Pipeline Operation for Store Instruction: ID
86
Pipeline Operation for Store Instruction: EX
87
Pipeline Operation for Store Instruction: MEM
88
Pipeline Operation for Store Instruction: WB
89
Datapath with Control
90
Implementation of Main Control Unit
91
Implementation of Main Control Unit
Truth Table
• Control signals are derived from binary encoded instructions.
92
Pipelined Datapath with Control
93
Grouping of Control Signals by Pipeline Stages: IF & ID
95
Grouping of Control Signals by Pipeline Stages: MEM
96
Grouping of Control Signals by Pipeline Stages: WB
97
Data Hazards and Forwarding: Implementation Issues
• Implementation of forwarding
100
Data Hazards
101
Data Hazards
102
Data Hazards
103
Detecting the Need to Forward
104
Pipelined Datapath
105
Detecting the Need to Forward
106
Data Hazards and Forwarding
107
Data Hazards and Forwarding
108
Forwarding Pathways
109
Forwarding Pathways
110
Forwarding Conditions
– if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd == ID/EX.RegisterRs2)) ForwardB = 10
111
Forwarding Conditions
– if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd == ID/EX.RegisterRs2)) ForwardB = 01
112
Double Data Hazard
113
Data Hazards and Forwarding
114
Data Hazards and Forwarding
115
Double Data Hazard
– if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd == ID/EX.RegisterRs2))
and (MEM/WB.RegisterRd == ID/EX.RegisterRs2)) ForwardB = 01
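The EX-hazard, MEM-hazard, and double-hazard conditions can be folded into one priority check: forward from EX/MEM if it matches, otherwise from MEM/WB. The sketch below models that for a single ALU operand. The function name `forward_select`, its parameters, and the use of "00" to mean "take the operand from the register file" are assumptions for illustration.

```python
def forward_select(ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd, id_ex_rs):
    """Return the forwarding mux select for one ALU operand.

    "10": forward the ALU result held in EX/MEM (most recent value)
    "01": forward the value held in MEM/WB
    "00": no forwarding (assumed to mean the register-file value is used)
    """
    ex_hazard = ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs
    mem_hazard = mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs
    if ex_hazard:
        return "10"
    if mem_hazard:
        # Reached only when there is no EX hazard, which is the
        # "and not (EX hazard)" term of the double-hazard condition.
        return "01"
    return "00"


# Double hazard: both EX/MEM and MEM/WB hold a pending write to x1.
print(forward_select(True, 1, True, 1, 1))   # 10
# Only MEM/WB matches the source register.
print(forward_select(True, 7, True, 1, 1))   # 01
```

Calling the function once with ID/EX.RegisterRs1 and once with ID/EX.RegisterRs2 gives the two selects; checking the EX/MEM match first ensures the newest value wins.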
117
Pipelined Datapath with Forwarding
118
Solving Data Hazards: Limitations of Forwarding
119
Load-Use Data Hazard
120
Load-Use Data Hazard: Need for Hazard Detection Unit
121
Load-Use Data Hazard: Need for Hazard Detection Unit
• In the ID stage, the hazard detection unit uses the following condition to test
for a load-use data hazard
– ID/EX.MemRead and
((ID/EX.RegisterRd == IF/ID.RegisterRs1) or
(ID/EX.RegisterRd == IF/ID.RegisterRs2))
122
How to Stall the Pipeline
123
Load-Use Data Hazard: Need for Hazard Detection Unit
• In the ID stage, the hazard detection unit uses the following condition to test
for a load-use data hazard
– ID/EX.MemRead and
((ID/EX.RegisterRd == IF/ID.RegisterRs1) or
(ID/EX.RegisterRd == IF/ID.RegisterRs2))
• If load-use data hazard is detected, Hazard Detection Unit stalls the
pipeline (and inserts a bubble)
– The hazard detection unit controls the writing of the PC and IF/ID registers plus the
multiplexor that chooses between the real control values and all 0s
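The detection condition and the resulting stall actions can be sketched as two small functions. The signal names `PCWrite` and `IF/IDWrite` follow common textbook usage and `ZeroControl` is a hypothetical name for the multiplexor select; none of the three is named on the slide.

```python
def load_use_hazard(id_ex_memread, id_ex_rd, if_id_rs1, if_id_rs2):
    """True when the instruction in EX is a load whose destination register
    is a source of the instruction currently in ID."""
    return bool(id_ex_memread and (id_ex_rd == if_id_rs1 or id_ex_rd == if_id_rs2))


def stall_controls(hazard):
    """Control outputs of the hazard detection unit (names are assumptions).

    On a hazard, the PC and IF/ID register are held so the same instructions
    are fetched and decoded again, and the control values entering ID/EX are
    zeroed so a bubble (nop) moves down the pipeline.
    """
    return {
        "PCWrite": not hazard,
        "IF/IDWrite": not hazard,
        "ZeroControl": hazard,
    }


# ld x10, 0(x10) is in EX while add x11, x10, x10 is in ID.
hazard = load_use_hazard(True, 10, 10, 10)
print(hazard, stall_controls(hazard))
```

After the one-cycle stall, the loaded value is available in MEM/WB and can be forwarded to the dependent instruction.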
124
Load-Use Data Hazard: How to Stall the Pipeline
125
Load-Use Data Hazard: How to Stall the Pipeline
126
Load-Use Data Hazard: How to Stall the Pipeline
127
Load-Use Data Hazard: How to Stall the Pipeline
128
Control Hazards
130
Control Hazards
131
Control Hazards
132
Handling Control Hazards: Reducing Branch Delay
133
Control Hazards
• Consider this sequence:
36: sub x10, x4, x8
40: beq x1, x3, 16 // PC-relative branch to 40 + 16*2 = 72
44: and x12, x2, x5
48: or x13, x2, x6
52: add x14, x4, x2
56: sub x15, x6, x7
...
72: ld x4, 50(x7)
• Simple strategy:
– Let’s always assume that the branch is not taken
134
Example: Wrong Assumption about Branch Outcome
135
Example: Wrong Assumption about Branch Outcome
136
How to Convert an Instruction in Pipeline to NOP (Bubble)
137
How to Convert an Instruction in Pipeline to NOP (Bubble)
• To flush instructions in the IF stage, we add a control line, called IF.Flush, that zeros the
instruction field of the IF/ID pipeline register.
138
Control Hazards
140
Handling Control Hazards: Types of Branch Prediction
141
Dynamic Branch Prediction: Implementation
142
Dynamic Branch Prediction: Implementation
• 1-bit Predictor
– Only 1 bit is used to keep the prediction information.
– At each misprediction, the prediction bit is inverted
• 2-bit Predictor
– 2 bits are used to keep the prediction information
– The 2 bits are used to encode 4 states in the system
143
Dynamic Branch Prediction: Implementation
• Ideally, the accuracy of the predictor would match the taken-branch
frequency for highly regular branches
– Consider an inner-loop branch that is taken nine times in a row and then is not taken
once, i.e., it is taken 90% of the time
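The loop-branch example can be simulated to compare the two predictors. The sketch below assumes the loop runs many times in succession, that both predictors start out predicting taken, and that the 2-bit predictor is a saturating counter (one common way to encode the four states, not necessarily the state machine on the slide). All names are hypothetical.

```python
def one_bit_accuracy(outcomes, initial_prediction=True):
    """1-bit predictor: the prediction bit is inverted on every misprediction."""
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        else:
            prediction = not prediction
    return correct / len(outcomes)


def two_bit_accuracy(outcomes, initial_state=3):
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken.

    The prediction only flips after two consecutive mispredictions.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


# Branch taken nine times, then not taken once; the loop is entered 100 times.
loop_branch = ([True] * 9 + [False]) * 100
print(one_bit_accuracy(loop_branch))
print(two_bit_accuracy(loop_branch))
```

In this model the 1-bit predictor mispredicts twice per loop execution after the first (on loop exit and again on re-entry), so it settles near 80%, while the 2-bit counter mispredicts only on loop exit and reaches the 90% taken frequency.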