
CH 6

The document discusses pipelining in computer architecture, focusing on MIPS architecture and the concept of pipeline hazards. It outlines the types of hazards—structural, data, and control—and their impact on instruction execution, as well as potential solutions like stalling and data forwarding. Additionally, it emphasizes the importance of balancing pipeline stages and the role of compiler optimizations in mitigating hazards.

Uploaded by

秦槐駿

Computer Organization

Pipeline Hazards

Prof. Ya-Shu Chen


National Taiwan University of Science and Technology
1
Overview of Pipelining
 Pipelining
 Multiple instructions are overlapped in execution
[Figure: tasks A, B, C, D in task order against time from 6PM to 2AM; the overlapped steps are the stages in the pipeline]
Nonpipelined: 8 hours
Pipelined: 3.5 hours
Speedup = 8 / 3.5 ≈ 2.3 => 4?
2
MIPS Architecture
 Each MIPS instruction takes five steps
 Instruction fetch (IF)
 Instruction decode and register fetch (ID)
 ALU operation or address calculation (EX)
 Data access in data memory (MEM)
 Register write (WB)
 Instruction execution
 5 steps -> 5 stages

ex: add $s0, $t0, $t1

Time (ps):  200   400   600   800   1000
Stage:      IF    ID    EX    MEM   WB

3
MIPS Architecture
Class        Function units
R-type       Instruction fetch, Register access, ALU, Register access
Load word    Instruction fetch, Register access, ALU, Memory access, Register access
Store word   Instruction fetch, Register access, ALU, Memory access
Branch       Instruction fetch, Register access, ALU
Jump         Instruction fetch
Assume MUX, control unit, PC access, and sign extension have no delay

Instruction  Instruction  Register  ALU        Data    Register  Total
class        memory       read      operation  memory  write
R-type       200          100       200        0       100       600
Load word    200          100       200        200     100       800
Store word   200          100       200        200               700
Branch       200          100       200        0                 500
Jump         200                                                 200
(All times in ps)
4
Single Cycle v.s. Pipelined
[Figure: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) in program execution order against time in ps]
Single cycle: each lw (Instruction fetch, Reg, ALU, Data access, Reg) takes 800 ps and the next one starts 800 ps later; instruction cycle = 800 ps
Pipelined: every stage takes 200 ps and the next lw starts 200 ps later; instruction cycle = 200 ps
5
Pipelining SpeedUp
$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipeline stages}}$$
 Ideal speedup = number of pipeline stages
 Conditions
 Stages are perfectly balanced
 Large number of instructions
 Total execution time matters less than throughput, especially when the number of instructions is large
 So what is the problem in our previous pipeline?
 So why can pipelining improve performance?
 It improves performance by increasing instruction throughput
 Instead of decreasing the execution time of an individual instruction
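As a rough check of the speedup claim, the sketch below computes total execution time for the 800 ps single-cycle design and the 200 ps, 5-stage pipeline from the earlier slide. The function names are made up for illustration, and the pipelined formula assumes no hazards or stalls.

```python
def nonpipelined_time(n_instructions, cycle_ps=800):
    """Single-cycle design: every instruction takes one full 800 ps cycle."""
    return n_instructions * cycle_ps


def pipelined_time(n_instructions, stages=5, stage_ps=200):
    """Ideal pipeline: the first instruction needs `stages` cycles,
    every later instruction completes one cycle after the previous one."""
    return (stages + n_instructions - 1) * stage_ps


def speedup(n_instructions):
    return nonpipelined_time(n_instructions) / pipelined_time(n_instructions)


for n in (3, 1_000_000):
    print(n, nonpipelined_time(n), pipelined_time(n), round(speedup(n), 2))
```

For the three loads this gives 2400 ps versus 1400 ps. For a large program the ratio approaches 800 / 200 = 4 rather than the ideal 5, because the stages are not perfectly balanced: every stage is stretched to the slowest one (200 ps).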
6
Pipelining
 What makes it easy
 all instructions are the same length
 just a few instruction formats
 memory operands appear only in loads and stores
 What makes it hard?
 structural hazards: suppose we had only one memory
 control hazards: need to worry about branch instructions
 data hazards: an instruction depends on a previous instruction

7
Pipeline Hazards
 Hazards
 Situations in pipelining when the next instruction cannot execute in the following clock cycle
 Three types of hazards
 Structural hazards
 Due to resource constraints
 Data hazards
 Due to data availability
 Control hazards
 Due to change of instruction flow

8
Hazard
 Limits to pipelining: Hazards prevent next instruction from
executing during its designated clock cycle
 Structural hazards: Hardware cannot support this combination of
instructions - two instructions need the same resource.
 Data hazards: Instruction depends on result of prior instruction still in
the pipeline
 Control hazards: Pipelining of branches & other instructions that
change the PC
 Common solution is to stall the pipeline until the hazard is
resolved, inserting one or more “bubbles” in the pipeline
 To do this, hardware or software must detect that a hazard
has occurred.

9
Structural hazards
 Hardware cannot support the instructions executing in the
same clock cycle
 Limited resources
 E.g., memory accesses

10
Pipelining MIPS Execution
[Figure: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) in program execution order against time, with scaled times]
Nonpipelined: each lw takes 8 ps and the next one starts 8 ps later
Pipelined: each stage takes 2 ps and the next lw starts 2 ps later

What if the 4th instruction appears?


11
One Memory Port
Structural Hazards
Time (clock cycles), instructions in program order
          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
Load      Ifetch   Reg      ALU      DMem     Reg
Instr 1            Ifetch   Reg      ALU      DMem     Reg
Instr 2                     Ifetch   Reg      ALU      DMem     Reg
Instr 3                              Ifetch   Reg      ALU      DMem     Reg
Instr 4                                       Ifetch   Reg      ALU      DMem     Reg
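The conflict in this diagram can be found mechanically. The sketch below is a minimal illustration, assuming a single memory port shared by instruction fetch and data access, that instruction i (0-based) is fetched in cycle i + 1 and reaches DMem in cycle i + 4, and that only the Load actually accesses data memory. The name `memory_conflicts` and the program encoding are hypothetical.

```python
def memory_conflicts(program):
    """program: list of (name, accesses_data_memory) in program order.

    Returns (cycle, data_instr, fetched_instr) for every cycle in which one
    instruction uses the memory port for data while another is being fetched.
    """
    conflicts = []
    for i, (name_i, accesses_data) in enumerate(program):
        if not accesses_data:
            continue
        dmem_cycle = i + 4  # Ifetch, Reg, ALU, then DMem
        for j, (name_j, _) in enumerate(program):
            if j + 1 == dmem_cycle:  # instruction j is fetched in cycle j + 1
                conflicts.append((dmem_cycle, name_i, name_j))
    return conflicts


program = [
    ("Load", True),
    ("Instr 1", False),
    ("Instr 2", False),
    ("Instr 3", False),
    ("Instr 4", False),
]
print(memory_conflicts(program))
```

Under these assumptions the only conflict is in Cycle 4, where the Load reads data memory while Instr 3 is being fetched.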

12
Structural Hazards
⚫ Structural hazards occur when two or more instructions
need the same resource.
⚫ Common methods for eliminating structural hazards are:
– Duplicate resources
– Pipeline the resource
– Reorder the instructions
⚫ It may be too expensive to eliminate a structural hazard, in
which case the pipeline should stall.
⚫ When the pipeline stalls, no instructions are issued until the
hazard has been resolved.
⚫ What are some examples of structural hazards?

13
Why Pipeline?
[Figure: Inst 0 to Inst 4 in instruction order against time in clock cycles, each passing through Im, Reg, ALU, Dm, Reg, shown next to the label "Single-cycle Datapath"]

14
One Memory Port Structural Hazards
Time (clock cycles), instructions in program order
          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
Load      Ifetch   Reg      ALU      DMem     Reg
Instr 1            Ifetch   Reg      ALU      DMem     Reg
Instr 2                     Ifetch   Reg      ALU      DMem     Reg
Stall                                Bubble   Bubble   Bubble   Bubble   Bubble
Instr 3                                       Ifetch   Reg      ALU      DMem     Reg

15
Data Hazards
 An instruction depends on the results of a previous
instruction still in the pipeline
 Data dependency
 Example

add $s0, $t0, $t1; write at 5th stage
sub $t2, $s0, $t3; read at 2nd stage
Result: 3 bubbles (waiting cycles)
Could you draw the diagram?
add: IF ID EXE MEM WB
sub: IF ID EXE MEM WB
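The bubble count can be derived from the stage numbers. The sketch below assumes no forwarding and that a register cannot be read in the same cycle in which it is written, which is the assumption that yields the 3 bubbles stated above. `bubbles_without_forwarding` is an illustrative name, not part of any tool.

```python
STAGE_NUMBER = {"IF": 1, "ID": 2, "EX": 3, "MEM": 4, "WB": 5}


def bubbles_without_forwarding(write_stage="WB", read_stage="ID", distance=1):
    """Bubbles needed so the consumer reads strictly after the producer writes.

    distance = how many instructions later the consumer is issued (1 = next).
    The producer writes in cycle w (relative to its own IF in cycle 1);
    the consumer would read in cycle distance + r without stalls.
    """
    w = STAGE_NUMBER[write_stage]
    r = STAGE_NUMBER[read_stage]
    return max(0, w - (distance + r) + 1)


print(bubbles_without_forwarding())            # add followed directly by sub
print(bubbles_without_forwarding(distance=4))  # three unrelated instructions in between
```

With the add writing in stage 5 and the sub reading in stage 2, the directly following sub needs 3 bubbles; with three independent instructions in between, none are needed.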

16
Solutions to the Data Hazards
 Rely on the compiler to remove these dependencies
 Data forwarding (bypassing)
 Getting the needed data item early from the internal resources

[Figure: program execution order against time in ps (200 to 1000)]
add $s0, $t0, $t1    IF ID EX MEM WB
sub $t2, $s0, $t3       IF ID EX MEM WB
No stall after forwarding

17
Load-Use Data Hazard

 Still one stall (bubble) even when forwarding is applied

[Figure: program execution order against time in ps (200 to 1400)]
lw $s0, 20($t1)      IF ID EX MEM WB
(bubble)
sub $t2, $s0, $t3          IF ID EX MEM WB
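The difference between the ALU case and the load case can be sketched as follows. The model assumes full forwarding, that an ALU result is ready at the end of EX (stage 3), that a loaded value is ready at the end of MEM (stage 4), and that the consumer needs its operand at the start of its own EX stage. The names are hypothetical.

```python
# Stage at whose end the result becomes available for forwarding (assumed).
RESULT_READY_STAGE = {"alu": 3, "load": 4}


def bubbles_with_forwarding(producer_kind, distance=1):
    """Bubbles needed when results are forwarded to the consumer's EX stage.

    Without stalls the consumer enters EX in cycle distance + 3 (the producer's
    IF is cycle 1); it must enter EX strictly after the result is ready.
    """
    ready = RESULT_READY_STAGE[producer_kind]
    return max(0, ready - (distance + 3) + 1)


print(bubbles_with_forwarding("alu"))   # add then sub
print(bubbles_with_forwarding("load"))  # lw then sub (load-use)
```

Under these assumptions the add/sub pair needs no bubble, while the lw/sub pair still needs one, matching the two slides.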

18
Data Hazard Classification
 Classified according to the order of read and write accesses
 RAW (Read after write)
 J tries to read a source before I writes it, so J incorrectly gets the old value
add $s0, $t0, $t1; write at 5th stage
sub $t2, $s0, $t3; read at 2nd stage
add: IF ID EX MEM WB
sub: IF ID EX MEM WB

 Some can be solved by forwarding

19
Data Hazard Classification
 WAW (Write after write)
 J tries to write an operand before it is written by I, so the older value written by I is left in the destination
lw $s0, 100($t0); write at 6th stage
add $s0, $t1, $t3; write at 4th stage
lw:  IF ID EX MEM1 MEM2 WB
add: IF ID EX WB
 Present only in pipelines that write in more than one pipe stage, or that allow out-of-order execution (an instruction is allowed to continue even when a previous one is stalled)

20
Data Hazard Classification
 WAR (Write after read)
 J tries to write a destination before it is read by I, so I incorrectly
gets the new value
sw $s0, 100($t0); use $t0 at 5th stage
add $t0, $t1, $t3; write at 4th stage
IF ID EX MEM1 MEM2 WB
IF ID EXE WB
 This hazard occurs when there are some instructions that write
results early in the instruction pipeline, and other instructions
that read a source late in the pipeline.
RAR (Read after read): this is not a hazard
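The three classes can be told apart just by comparing destination and source registers. The sketch below is an illustration with a made-up encoding (destination register, list of source registers); it reports dependences between an earlier instruction I and a later instruction J. Whether a dependence becomes an actual hazard depends on the pipeline timing, as the slides explain.

```python
def classify(instr_i, instr_j):
    """Each instruction is (dest_register_or_None, [source_registers]).
    instr_i comes first in program order, instr_j second."""
    dest_i, srcs_i = instr_i
    dest_j, srcs_j = instr_j
    kinds = set()
    if dest_i is not None and dest_i in srcs_j:
        kinds.add("RAW")  # J reads what I writes
    if dest_i is not None and dest_i == dest_j:
        kinds.add("WAW")  # both write the same register
    if dest_j is not None and dest_j in srcs_i:
        kinds.add("WAR")  # J writes what I reads
    return kinds


# Register usage taken from the three slide examples.
add_s0 = ("$s0", ["$t0", "$t1"])   # add $s0, $t0, $t1
sub_t2 = ("$t2", ["$s0", "$t3"])   # sub $t2, $s0, $t3
lw_s0 = ("$s0", ["$t0"])           # lw  $s0, 100($t0)
add_s0b = ("$s0", ["$t1", "$t3"])  # add $s0, $t1, $t3
sw_s0 = (None, ["$s0", "$t0"])     # sw  $s0, 100($t0)
add_t0 = ("$t0", ["$t1", "$t3"])   # add $t0, $t1, $t3

print(classify(add_s0, sub_t2))
print(classify(lw_s0, add_s0b))
print(classify(sw_s0, add_t0))
```

Two instructions that only read the same register produce an empty set, which corresponds to RAR not being a hazard.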

21
Solutions to the Data Hazards
 Rely on the compiler (or DIY) to remove these dependencies
 Reordering code to avoid pipeline stalls

A = B + E, C = B + F
Original code          Reordered code
lw  $t1, 0($t0)        lw  $t1, 0($t0)
lw  $t2, 4($t0)        lw  $t2, 4($t0)
add $t3, $t2, $t1      lw  $t4, 8($t0)
sw  $t3, 12($t0)       add $t3, $t2, $t1
lw  $t4, 8($t0)        sw  $t3, 12($t0)
add $t5, $t1, $t4      add $t5, $t1, $t4
sw  $t5, 16($t0)       sw  $t5, 16($t0)
The hazard on the line marked in blue on the original slide can be solved by forwarding
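The benefit of reordering can be counted. The sketch below assumes a 5-stage pipeline with forwarding, so the only remaining stall is a load immediately followed by an instruction that uses the loaded register. It also assumes that all three loads use $t0 as the base register and that the first add sums $t1 and $t2. The tuple encoding and function names are illustrative only.

```python
# (opcode, destination register or None, source registers)
ORIGINAL = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t2", "$t1"]),
    ("sw", None, ["$t3", "$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]

REORDERED = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t3", ["$t2", "$t1"]),
    ("sw", None, ["$t3", "$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]


def load_use_stalls(program):
    """Count loads whose result is used by the very next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev[0] == "lw" and prev[1] in cur[2]:
            stalls += 1
    return stalls


def total_cycles(program, stages=5):
    """Cycles to drain the pipeline: fill time plus one per instruction plus stalls."""
    return stages + len(program) - 1 + load_use_stalls(program)


print(load_use_stalls(ORIGINAL), total_cycles(ORIGINAL))
print(load_use_stalls(REORDERED), total_cycles(REORDERED))
```

Under these assumptions the original order has two load-use stalls (13 cycles) and the reordered code has none (11 cycles).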

22
Control Hazards
 Flow of instruction addresses is not what the pipeline expected
 What does the pipeline expect? The next sequential instruction
 Due to branch instructions
 Simplest solution: stall on branch
[Figure: program execution order against time in ps (200 to 1400)]
add $4, $5, $6       Instruction fetch, Reg, ALU, Data access, Reg
beq $1, $2, 40       starts 200 ps later
(bubble)             one stall
or $7, $8, $9        starts 400 ps after beq

23
Performance of “Stall on Branch”
 Assume all other instructions have a CPI of 1
 Branches: 13% of instructions
CPI = 1 (ideal case) + 1 (one stall) × 0.13 = 1.13
 If the branch is not resolved by the 2nd stage, the situation is worse
 Better solution: branch prediction
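The CPI arithmetic generalizes directly to deeper stall penalties. The sketch below is a minimal helper with a made-up name; the 13% branch frequency comes from the slide, and the two-cycle case is only a hypothetical illustration of a branch resolved one stage later.

```python
def cpi_stall_on_branch(branch_fraction, stall_cycles, base_cpi=1.0):
    """Average CPI when every branch stalls the pipeline for stall_cycles."""
    return base_cpi + branch_fraction * stall_cycles


print(round(cpi_stall_on_branch(0.13, 1), 2))  # resolved in the 2nd stage
print(round(cpi_stall_on_branch(0.13, 2), 2))  # hypothetical: resolved one stage later
```

Each extra stall cycle per branch adds another 0.13 to the CPI, which is why resolving the branch later makes the situation worse.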

24
2nd Solution to Control Hazards
 Predict
 Use prediction to handle branches
 If the prediction is right -> no stall
 If the prediction is wrong -> 1 stall
 Approach
 Static prediction
 Always untaken
 Some branches as taken and some as untaken
 Always taken for loop branches
 Dynamic prediction
 Keep a history for each branch as taken or untaken
 What you should be concerned about with prediction
 If the guess is wrong, the pipeline control must ensure that the instruction following the wrongly guessed branch has no effect, and must restart the pipeline from the proper branch address
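To compare the approaches, the sketch below counts mispredictions for a single loop branch. It assumes the simplest possible dynamic scheme, a 1-bit history that predicts whatever the branch did last time, and a hypothetical loop that is taken nine times and then falls through, executed twice. Neither the scheme nor the trace comes from the slides.

```python
def always_not_taken(outcomes):
    """Static prediction: every taken branch is a misprediction."""
    return sum(1 for taken in outcomes if taken)


def always_taken(outcomes):
    """Static prediction: every untaken branch is a misprediction."""
    return sum(1 for taken in outcomes if not taken)


def one_bit_dynamic(outcomes, initial_prediction=False):
    """Dynamic prediction with one history bit: predict the last outcome."""
    prediction = initial_prediction
    misses = 0
    for taken in outcomes:
        if taken != prediction:
            misses += 1
        prediction = taken  # remember what the branch actually did
    return misses


# Hypothetical loop branch: taken 9 times, then not taken, run twice.
loop_branch = ([True] * 9 + [False]) * 2

print(always_not_taken(loop_branch))
print(always_taken(loop_branch))
print(one_bit_dynamic(loop_branch))
```

On this trace, always-untaken mispredicts 18 of 20 branches, always-taken mispredicts 2, and the 1-bit history mispredicts 4 (once on entering and once on leaving each run of the loop), which illustrates why loop branches are usually predicted taken.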

25
Predict “Not Taken”
[Figure: program execution order against time in ps (200 to 1400)]
Right prediction:
add $4, $5, $6       Instruction fetch, Reg, ALU, Data access, Reg
beq $1, $2, 40       starts 200 ps later
lw $3, 300($0)       starts 200 ps after beq, no bubble
Wrong prediction:
add $4, $5, $6       Instruction fetch, Reg, ALU, Data access, Reg
beq $1, $2, 40       starts 200 ps later
(bubble)
or $7, $8, $9        starts 400 ps after beq

26
Third Solution to Control Hazards
 Delayed Branch
 Place an instruction into the branch delay slot that is not
affected by branch
 Handwritten or compiler
[Figure: program execution order against time in ps (200 to 1400)]
Without delayed branch:
add $4, $5, $6
beq $1, $2, 40       starts 200 ps later
(bubble)
or $7, $8, $9        starts 400 ps after beq
With delayed branch:
beq $1, $2, 40
add $4, $5, $6       (delayed branch slot)
or $7, $8, $9

27
Note on Delayed Branch
 Compilers typically fill about 50% of branch delay slots with a useful instruction
 This implies that about 50% of the slots hold a NOP
 If the pipeline is deeper, there are more branch delay slots and they are even harder to fill
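The cost of unfilled delay slots can be estimated in the same way as the stall-on-branch CPI. The sketch below reuses the 13% branch frequency from the earlier slide and assumes every unfilled slot costs one wasted cycle. The function name, the two-slot case, and the 30% fill rate are hypothetical.

```python
def cpi_with_delay_slots(branch_fraction, fill_rate, slots_per_branch=1, base_cpi=1.0):
    """Average CPI when each unfilled branch delay slot wastes one cycle (a NOP)."""
    wasted_per_branch = slots_per_branch * (1.0 - fill_rate)
    return base_cpi + branch_fraction * wasted_per_branch


print(round(cpi_with_delay_slots(0.13, 0.5), 3))                      # one slot, 50% filled
print(round(cpi_with_delay_slots(0.13, 0.3, slots_per_branch=2), 3))  # hypothetical deeper pipeline
```

With one slot filled half of the time the estimate is 1.065, compared with 1.13 for always stalling; with more slots and a lower fill rate the advantage shrinks.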

28
See You Next Class!

29
