CH 6
Pipeline Hazards
[Figure: tasks A to D in task order on a time axis from 6 PM to 2 AM. Run one after another, the four tasks take 8 hours; with the stages pipelined, they take 3.5 hours.]
Speedup = 8 / 3.5 ≈ 2.3 => 4?
MIPS Architecture
Each MIPS instruction takes up to five steps
Instruction fetch (IF)
Instruction decode and register fetch (ID)
ALU operation or address calculation (EX)
Data access in data memory (MEM)
Register write (WB)
Instruction execution
5 steps -> 5 stages
IF ID EX MEM WB
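How the five stages overlap can be sketched in Python. This is a minimal illustration, assuming one instruction enters the pipeline per clock cycle and no hazards occur; the function name `pipeline_diagram` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(num_instructions):
    """Return one row per instruction; row i is shifted right by i cycles.

    Assumes an ideal pipeline: one instruction issued per cycle, no stalls.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i is in stage s during cycle i + s
        rows.append(row)
    return rows

# Print the staggered diagram for three instructions.
for row in pipeline_diagram(3):
    print(" ".join(f"{cell:<3}" for cell in row).rstrip())
```

Each row starts one cycle later than the row above it, so in the steady state all five stages are busy with different instructions.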
MIPS Architecture
Class        Functional units used
R-type       Instruction fetch, Register access, ALU, Register access
Load word    Instruction fetch, Register access, ALU, Memory access, Register access
Store word   Instruction fetch, Register access, ALU, Memory access
Branch       Instruction fetch, Register access, ALU
Jump         Instruction fetch
Assume MUX, control unit, PC access, sign extension have no delay
Functional unit times in ps:
Instruction class   Instruction memory   Register read   ALU operation   Data memory   Register write   Total
R-type              200                  100             200             0             100              600
Load word           200                  100             200             200           100              800
Store word          200                  100             200             200                            700
Branch              200                  100             200             0                              500
Jump                200                                                                                 200
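The totals in the table, and the clock periods they imply, can be checked with a short Python sketch. The values are taken to be picoseconds, matching the 800 ps cycle quoted on the next slide; blank cells are treated as 0, and the dictionary and function names are illustrative.

```python
# Per-unit latencies in ps, copied from the table:
# [instruction memory, register read, ALU, data memory, register write]
LATENCY_PS = {
    "R-type":     [200, 100, 200, 0,   100],
    "Load word":  [200, 100, 200, 200, 100],
    "Store word": [200, 100, 200, 200, 0],
    "Branch":     [200, 100, 200, 0,   0],
    "Jump":       [200, 0,   0,   0,   0],
}

def total_time(instruction_class):
    """Time for one instruction of this class to pass through all its units."""
    return sum(LATENCY_PS[instruction_class])

def single_cycle_clock():
    """A single-cycle design must fit the slowest instruction in one cycle."""
    return max(total_time(c) for c in LATENCY_PS)

def pipelined_clock():
    """A pipelined design must fit the slowest single unit in one cycle."""
    return max(max(units) for units in LATENCY_PS.values())

print(single_cycle_clock(), pipelined_clock())
```

The load word sets the single-cycle clock, while the slowest functional unit sets the pipelined clock.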
Single Cycle vs. Pipelined
[Figure: single-cycle execution of lw $1, 100($0) and lw $2, 200($0) in program execution order on a time axis from 200 to 1800 ps. Each instruction passes through instruction fetch, Reg, ALU, data access, Reg. Instruction cycle = 800 ps, so instructions start 800 ps apart.]
[Figure: pipelined execution of lw $1, 100($0), lw $2, 200($0), and lw $3, 300($0) in program execution order on a time axis from 200 to 1400 ps. Each instruction passes through instruction fetch, Reg, ALU, data access, Reg. Instruction cycle = 200 ps, so instructions start 200 ps apart.]
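The total time for the three loads in each design can be worked out as below. This is a sketch under the slide's numbers (800 ps single-cycle clock, 200 ps pipelined clock, five stages); the function names are illustrative.

```python
def single_cycle_time(n, cycle_ps=800):
    """n instructions, each occupying one full 800 ps cycle."""
    return n * cycle_ps

def pipelined_time(n, stages=5, cycle_ps=200):
    """The first instruction needs `stages` cycles to complete;
    every later instruction completes one cycle after its predecessor."""
    return (stages + n - 1) * cycle_ps

print(single_cycle_time(3))  # 2400
print(pipelined_time(3))     # 1400
```

For only three instructions the gain is modest, because most of the pipelined time is spent filling and draining the pipeline.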
Pipelining SpeedUp
$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipeline stages}}$$
Ideal speedup = number of pipeline stages
Conditions
Stages are perfectly balanced
Large number of instructions
The execution time of a single instruction matters less, especially when
the number of instructions is large
So what is the problem in our previous pipeline?
So why can pipelining improve performance?
It improves performance by increasing instruction throughput
rather than by decreasing the execution time of an individual
instruction
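The effect of instruction count on speedup can be sketched as follows, assuming the 800 ps single-cycle clock and 200 ps pipelined clock from the earlier comparison; the function name `speedup` and the sample counts are illustrative.

```python
def speedup(n, stages=5, single_cycle_ps=800, pipelined_cycle_ps=200):
    """Ratio of single-cycle to pipelined execution time for n instructions."""
    single = n * single_cycle_ps
    pipelined = (stages + n - 1) * pipelined_cycle_ps
    return single / pipelined

for n in (3, 1_000, 1_000_000):
    print(n, round(speedup(n), 3))
```

With these numbers the speedup approaches 800 / 200 = 4 rather than the ideal 5, because the stages are not perfectly balanced: the register stages need only 100 ps but still occupy a full 200 ps cycle.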
Pipelining
What makes it easy?
all instructions are the same length
just a few instruction formats
memory operands appear only in loads and stores
What makes it hard?
structural hazards: suppose we had only one memory
control hazards: need to worry about branch instructions
data hazards: an instruction depends on a previous instruction
Pipeline Hazards
Hazards
Situations in pipelining when the next instruction cannot execute
in the following clock cycle
Three types of hazards
Structural hazards
Due to resource constraints
Data hazards
Due to data availability
Control hazards
Due to change of instruction flow
Hazard
Limits to pipelining: hazards prevent the next instruction from
executing during its designated clock cycle
Structural hazards: Hardware cannot support this combination of
instructions - two instructions need the same resource.
Data hazards: Instruction depends on result of prior instruction still in
the pipeline
Control hazards: Pipelining of branches & other instructions that
change the PC
Common solution is to stall the pipeline until the hazard is
resolved, inserting one or more “bubbles” in the pipeline
To do this, hardware or software must detect that a hazard
has occurred.
Structural hazards
Hardware cannot support the combination of instructions executing in the
same clock cycle
Limited resources
E.g., memory accesses
Pipelining MIPS Execution
[Figure: nonpipelined execution of lw $1, 100($0) on a time axis from 2 to 18, taking 8 ps, followed by pipelined execution of lw $1, 100($0), lw $2, 200($0), and lw $3, 300($0) on a time axis from 2 to 14, with instructions starting 2 ps apart. Each instruction passes through instruction fetch, Reg, ALU, data access, Reg.]
[Figure: pipeline diagram in instruction order. Load and Instr 1 to Instr 4 each pass through Ifetch, Reg, ALU, DMem, Reg.]
Structural Hazards
⚫ Structural hazards occur when two or more instructions
need the same resource.
⚫ Common methods for eliminating structural hazards are:
– Duplicate resources
– Pipeline the resource
– Reorder the instructions
⚫ It may be too expensive to eliminate a structural hazard, in
which case the pipeline should stall.
⚫ When the pipeline stalls, no instructions are issued until the
hazard has been resolved.
⚫ What are some examples of structural hazards?
Why Pipeline?
[Figure: pipeline diagram over time (clock cycles), labeled "Single-cycle Datapath". Inst 0 to Inst 4 each pass through Im, Reg, ALU, Dm, Reg.]
One Memory Port Structural Hazards
[Figure: pipeline diagram over time (clock cycles), Cycle 1 to Cycle 7. Load, Instr 1, Instr 2, and Instr 3 each pass through Ifetch, Reg, ALU, DMem, Reg. A Stall row of five bubbles sits between Instr 2 and Instr 3.]
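Where a single memory port causes a conflict can be located with a small Python sketch. It assumes one instruction is fetched per cycle, that data memory is accessed in an instruction's fourth cycle, and that instruction fetch and data access share the one port; the function name and tuple layout are hypothetical.

```python
def memory_conflicts(uses_data_memory):
    """Find cycles where a data access and an instruction fetch collide.

    uses_data_memory[i] is True if instruction i (0-based) is a load or store.
    Instruction i is fetched in cycle i + 1 and reaches its data-memory
    stage in cycle i + 4. Returns (cycle, memory_instr, fetching_instr) triples.
    """
    conflicts = []
    n = len(uses_data_memory)
    for i, uses_mem in enumerate(uses_data_memory):
        j = i + 3  # the instruction fetched while instruction i accesses data memory
        if uses_mem and j < n:
            conflicts.append((i + 4, i, j))
    return conflicts

# Load, Instr 1, Instr 2, Instr 3: only the load touches data memory.
print(memory_conflicts([True, False, False, False]))  # [(4, 0, 3)]
```

The load's data access in cycle 4 collides with the fetch of Instr 3, which is the fetch the stall delays by one cycle.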
Data Hazards
An instruction depends on the results of a previous
instruction still in the pipeline
Data dependency
Example
IF ID EXE MEM WB
Solutions to the Data Hazards
Rely on the compiler to remove these dependencies
Data forwarding (bypassing)
Get the needed data item early from internal resources
[Figure: pipeline diagram in program execution order (in instructions) on a time axis from 200 to 1000.]
Load-Use Data Hazard
[Figure: pipeline diagram in program execution order (in instructions) on a time axis from 200 to 1400.]
Data Hazard Classification
Classified according to the order of read and write accesses
RAW (Read after write)
J tries to read a source before I writes it, so J incorrectly gets the old
value (I is the earlier instruction, J the later one)
add $s0, $t0, $t1; write at 5th stage
sub $t2, $s0, $t3; read at 2nd stage
IF ID EX MEM WB
   IF ID EX MEM WB
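The number of stall cycles this RAW hazard costs without forwarding can be estimated as below. This is a sketch, assuming the write happens in stage 5 and the read in stage 2 as annotated above; the `same_cycle_ok` flag models a register file that can write and then read in the same cycle, which is an assumption not stated on the slide.

```python
def raw_stalls(distance, write_stage=5, read_stage=2, same_cycle_ok=True):
    """Stall cycles needed by a consumer that follows its producer by
    `distance` instructions, with no forwarding.

    The producer writes in cycle p + write_stage - 1; an unstalled consumer
    reads in cycle p + distance + read_stage - 1.
    """
    gap = write_stage - read_stage          # how far the write lags the read
    needed = gap if same_cycle_ok else gap + 1
    return max(0, needed - distance)

print(raw_stalls(1))                       # sub directly after add
print(raw_stalls(1, same_cycle_ok=False))  # read must wait for the next cycle
print(raw_stalls(3))                       # two independent instructions in between
```

Placing independent instructions between the producer and the consumer increases `distance` and so reduces the stalls.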
Data Hazard Classification
WAW (Write after write)
J tries to write an operand before it is written by I, leaving the value
written by I instead of the one written by J
lw $s0, 100($t0); write at 6th stage
add $s0, $t1, $t3; write at 4th stage
IF ID EX MEM1 MEM2 WB
   IF ID EX WB
Present only in pipelines that write in more than one pipe
stage, or that allow out-of-order execution (an instruction may
continue even when a previous one is stalled)
Data Hazard Classification
WAR (Write after read)
J tries to write a destination before it is read by I, so I incorrectly
gets the new value
sw $s0, 100($t0); use $t0 at 5th stage
add $t0, $t1, $t3; write at 4th stage
IF ID EX MEM1 MEM2 WB
   IF ID EX WB
This hazard occurs when some instructions write
results early in the instruction pipeline and other instructions
read a source late in the pipeline.
RAR (Read after read): this is not a hazard
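The three classes can be told apart from the registers each instruction reads and writes. The sketch below is illustrative only: it reports dependences between an earlier instruction I and a later instruction J, and whether a dependence turns into an actual hazard still depends on the pipeline. The `(writes, reads)` representation and the function name `classify` are assumptions made for this example.

```python
def classify(first, second):
    """Classify the dependences between two instructions.

    Each instruction is a (writes, reads) pair of register-name sets;
    `first` (I) precedes `second` (J) in program order.
    """
    first_writes, first_reads = first
    second_writes, second_reads = second
    kinds = []
    if first_writes & second_reads:
        kinds.append("RAW")  # J reads what I writes
    if first_writes & second_writes:
        kinds.append("WAW")  # J writes what I writes
    if first_reads & second_writes:
        kinds.append("WAR")  # J writes what I reads
    return kinds

add_i = ({"$s0"}, {"$t0", "$t1"})  # add $s0, $t0, $t1
sub_j = ({"$t2"}, {"$s0", "$t3"})  # sub $t2, $s0, $t3
print(classify(add_i, sub_j))      # ['RAW']
```

Two instructions that only read the same register produce an empty list, which matches RAR not being a hazard.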
Solutions to the Data Hazards
Rely on the compiler (or do it yourself) to remove these dependencies
Reordering code to avoid pipeline stalls
A = B + E, C = B + F
Original                 Reordered
lw  $t1, 0($t0)          lw  $t1, 0($t0)
lw  $t2, 4($t0)          lw  $t2, 4($t0)
add $t3, $t2, $t1        lw  $t4, 8($t0)
sw  $t3, 12($t0)         add $t3, $t2, $t1
lw  $t4, 8($t0)          sw  $t3, 12($t0)
add $t5, $t1, $t4        add $t5, $t1, $t4
sw  $t5, 16($t0)         sw  $t5, 16($t0)
The hazard in the blue line can be solved by forwarding
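The benefit of the reordering can be checked by counting load-use stalls in both sequences. This is a rough sketch: it assumes forwarding removes every stall except the one cycle needed when an instruction uses the result of the load immediately before it, and it assumes all three loads use $t0 as the base register. The tuple encoding `(op, dest, sources)` and the function name are hypothetical.

```python
def load_use_stalls(program):
    """Count one-cycle load-use stalls in a list of (op, dest, sources) tuples.

    Assumes forwarding handles every dependence except a load followed
    immediately by an instruction that reads the loaded register.
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return stalls

lw_t1 = ("lw", "$t1", ("$t0",))
lw_t2 = ("lw", "$t2", ("$t0",))
lw_t4 = ("lw", "$t4", ("$t0",))
add_t3 = ("add", "$t3", ("$t2", "$t1"))
add_t5 = ("add", "$t5", ("$t1", "$t4"))
sw_t3 = ("sw", None, ("$t3", "$t0"))
sw_t5 = ("sw", None, ("$t5", "$t0"))

original = [lw_t1, lw_t2, add_t3, sw_t3, lw_t4, add_t5, sw_t5]
reordered = [lw_t1, lw_t2, lw_t4, add_t3, sw_t3, add_t5, sw_t5]

print(load_use_stalls(original))   # 2
print(load_use_stalls(reordered))  # 0
```

Moving the third load up separates each load from the add that uses its result, so under these assumptions both stalls disappear.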
Control Hazards
The flow of instruction addresses is not what the pipeline
expected
What does the pipeline expect? The next sequential instruction
Caused by branch instructions
Simplest solution: stall on branch
[Figure: pipeline diagram in program execution order (in instructions) on a time axis from 200 to 1400, showing add $4, $5, $6.]
Performance of “Stall on Branch”
Assume all other instructions have a CPI of 1
Branches: 13% of instructions
CPI = 1 (ideal case) + 1 (one stall) * 0.13 = 1.13
If the branch is not resolved by the 2nd stage, the penalty is larger
Better Solution: Branch Prediction
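The stall-on-branch CPI above generalizes to other branch frequencies and stall lengths. A minimal sketch, with an illustrative function name:

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles, base_cpi=1.0):
    """Average CPI when every branch stalls the pipeline for `stall_cycles`."""
    return base_cpi + branch_fraction * stall_cycles

print(round(cpi_with_branch_stalls(0.13, 1), 2))  # 1.13
print(round(cpi_with_branch_stalls(0.13, 2), 2))  # 1.26
```

The second call shows how the penalty grows if the branch outcome is known one stage later, so that each branch costs two stall cycles instead of one.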
2nd Solution to Control Hazards
Predict
Use prediction to handle branches
If the prediction is right -> no stall
If the prediction is wrong -> 1 stall
Approach
Static prediction
Always untaken
Some branches as taken and some as untaken,
e.g., always taken for loop branches
Dynamic prediction
Keep a history for each branch as taken or untaken
What to watch for with prediction
If the guess is wrong, the pipeline control must ensure that the
instructions following the wrongly guessed branch have no effect,
and must restart the pipeline from the proper branch address
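Dynamic prediction can be illustrated with a one-bit history per branch. The slide does not say how much history is kept, so the single bit, the class name `OneBitPredictor`, the default of "not taken" for unseen branches, and the branch address 0x40 are all assumptions made for this sketch.

```python
class OneBitPredictor:
    """Predict that each branch does what it did last time."""

    def __init__(self):
        self.history = {}  # branch address -> last outcome (True = taken)

    def predict(self, pc):
        # Branches never seen before are predicted not taken.
        return self.history.get(pc, False)

    def update(self, pc, taken):
        self.history[pc] = taken


def count_mispredictions(predictor, trace):
    """Run a list of (branch address, taken) outcomes through the predictor."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch at a hypothetical address: taken four times, then falls through.
trace = [(0x40, True)] * 4 + [(0x40, False)]
print(count_mispredictions(OneBitPredictor(), trace))  # 2
```

The predictor misses on the first iteration and again on loop exit; each miss corresponds to the case where the pipeline must discard the wrongly fetched instruction and restart from the proper address.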
Predict “Not Taken”
[Figure: correct prediction; pipeline diagram in program execution order (in instructions) on a time axis from 200 to 1400, showing add $4, $5, $6.]
Third Solution to Control Hazards
Delayed Branch
Place an instruction that is not affected by the branch into the
branch delay slot
Filled by hand or by the compiler
[Figure: delayed branch; pipeline diagram in program execution order (in instructions) on a time axis from 200 to 1400, showing beq $1, $2, 40 and add $4, $5, $6.]
Note on Delayed Branch
Compilers typically fill about 50% of branch delay slots with
useful instructions
This implies about 50% of the slots could hold a NOP
If the pipeline is deeper, there are more branch delay slots and they
are even harder to fill
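A rough estimate of what unfilled delay slots cost can be sketched as follows. It assumes that each unfilled slot wastes exactly one cycle on a NOP, and it reuses the 13% branch frequency from the stall-on-branch slide; combining the two figures is an assumption of this example, not a result stated in the slides.

```python
def cpi_with_delay_slots(branch_fraction, fill_rate, slots_per_branch=1, base_cpi=1.0):
    """Average CPI when each unfilled branch delay slot costs one NOP cycle."""
    wasted_per_branch = slots_per_branch * (1.0 - fill_rate)
    return base_cpi + branch_fraction * wasted_per_branch

print(round(cpi_with_delay_slots(0.13, 0.5), 3))                      # 1.065
print(round(cpi_with_delay_slots(0.13, 0.5, slots_per_branch=2), 3))  # 1.13
```

The second call assumes the same 50% fill rate for a pipeline with two delay slots per branch; since deeper pipelines make slots harder to fill, the real cost would likely be higher.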