
CS104: Computer Organization: 2 April, 2020

This document discusses techniques for handling data hazards in pipelined processors, specifically data forwarding. It describes read-after-write hazards and how forwarding can resolve them by providing instruction results to dependent instructions as soon as they are available. It shows the forwarding logic and control conditions needed to correctly determine when to forward from the EX/MEM and MEM/WB pipeline stages to the EX stage inputs. This allows pipelines to achieve a CPI (cycles per instruction) of 1 even with data dependencies.

Uploaded by

Om Prakash

L13

2/04/2020

CS104: Computer Organization
2nd April, 2020
Manojit Ghose

Computer Organization (IS F242)
Lecture 1: Introductory Thoughts
Dip Sankar Banerjee
[email protected]
Department of CS & IS
Indian Institute of Information Technology, Guwahati
Jan-Apr 2020

Review: Can Pipelining Get Us Into Trouble?

Yes: pipeline hazards
- Structural hazards: two different instructions attempt to use the same resource at the same time
- Data hazards: an instruction attempts to use data before it is ready
  - An instruction's source operand(s) are produced by a prior instruction still in the pipeline
- Control hazards: an attempt to make a decision about program control flow before the condition has been evaluated and the new PC target address calculated
  - Branch and jump instructions, exceptions

Pipeline control must detect the hazard and then take action to resolve it.
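
Review: Register Usage Can Cause Data Hazards

Read-after-write (RAW) data hazard
[Pipeline diagram: each instruction passes through IM, Reg, ALU, DM, Reg, one stage per cycle, starting one cycle after the previous instruction.]
Value of $1 in clock cycles 1 to 9: 10, 10, 10, 10, 10/-20, -20, -20, -20, -20 (in cycle 5 the add writes -20 in the first half of the cycle)
  add $1,
  sub $4,$1,$5
  and $6,$1,$7
  or  $8,$1,$9
  xor $4,$1,$5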
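Which of these instructions read a stale $1 follows from when each stage happens. The sketch below is a minimal model, assuming the five-stage pipeline in the diagram, one instruction fetched per cycle starting in cycle 1, register reads in ID, register writes in WB, and a register file that writes in the first half of a cycle and reads in the second half (hence 10/-20 in cycle 5). The tuple encoding and the name `raw_hazards` are hypothetical; the add's source operands are not shown on the slide, so they are left empty.

```python
from typing import List, Optional, Tuple

# (text, destination register or None, source registers): hypothetical encoding
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def raw_hazards(program: List[Instr]) -> List[Tuple[str, str, str]]:
    """Return (producer, consumer, register) for every read of a stale value,
    assuming no forwarding and no stalls."""
    hazards = []
    for i, (producer, dest, _srcs) in enumerate(program):
        if dest is None or dest == "$0":
            continue  # nothing written, or a write to $0 (always 0)
        write_cycle = i + 5  # fetched in cycle i + 1, WB four cycles later
        for j in range(i + 1, len(program)):
            consumer, consumer_dest, srcs = program[j]
            read_cycle = j + 2  # source registers are read in ID
            # A read in the same cycle as the write sees the new value.
            if dest in srcs and read_cycle < write_cycle:
                hazards.append((producer, consumer, dest))
            if consumer_dest == dest:
                break  # later readers depend on this newer write instead
    return hazards


program: List[Instr] = [
    ("add $1,", "$1", ()),  # source operands are not shown on the slide
    ("sub $4,$1,$5", "$4", ("$1", "$5")),
    ("and $6,$1,$7", "$6", ("$1", "$7")),
    ("or $8,$1,$9", "$8", ("$1", "$9")),
    ("xor $4,$1,$5", "$4", ("$1", "$5")),
]

for producer, consumer, reg in raw_hazards(program):
    print(f"{consumer} reads {reg} before {producer} writes it")
```

Only the sub and the and are flagged: the or reads $1 in cycle 5, the same cycle the add writes it, and the split-cycle register file hands it the new value.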

One Way to “Fix” a Data Hazard

Can fix a data hazard by waiting (stalling), but this impacts CPI.
[Pipeline diagram, instruction order top to bottom:]
  add $1,
  stall
  stall
  sub $4,$1,$5
  and $6,$1,$7
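Under the same assumptions (no forwarding, register file written in the first half of a cycle and read in the second), the stall count can be derived by holding each instruction in ID until every register it reads has been written back. A hedged sketch with a hypothetical `stall_cycles` helper; for the sequence above it yields the two stalls shown before the sub.

```python
from typing import List, Optional, Tuple

# (text, destination register or None, source registers): hypothetical encoding
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def stall_cycles(program: List[Instr]) -> int:
    """Count stall cycles in an in-order 5-stage pipeline with no forwarding.

    An instruction may read a register in ID no earlier than the cycle in
    which its producer writes it in WB (write first half, read second half).
    """
    written_in = {}  # register -> WB cycle of its most recent producer
    id_cycle = 1     # ID cycle of the previous instruction (first ID is cycle 2)
    stalls = 0
    for _text, dest, srcs in program:
        earliest = id_cycle + 1  # one cycle after the previous instruction's ID
        ready = max((written_in.get(r, 0) for r in srcs), default=0)
        id_cycle = max(earliest, ready)
        stalls += id_cycle - earliest
        if dest is not None and dest != "$0":
            written_in[dest] = id_cycle + 3  # EX, MEM, then WB
    return stalls


program: List[Instr] = [
    ("add $1,", "$1", ()),  # source operands are not shown on the slide
    ("sub $4,$1,$5", "$4", ("$1", "$5")),
    ("and $6,$1,$7", "$6", ("$1", "$7")),
]
print(stall_cycles(program), "stall cycles for", len(program), "instructions")
```

Each stall is a cycle in which no instruction completes, so over a long run CPI is roughly (instructions + stalls) / instructions rather than 1.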

Another Way to “Fix” a Data Hazard

Fix data hazards by forwarding results as soon as they are available to where they are needed.
[Pipeline diagram, instruction order top to bottom:]
  add $1,
  sub $4,$1,$5
  and $6,$1,$7
  or  $8,$1,$9
  xor $4,$1,$5

Data Forwarding (aka Bypassing)
- Take the result from the earliest point at which it exists in any of the pipeline state registers and forward it to the functional units (e.g., the ALU) that need it in that cycle
- For the ALU functional unit, the inputs can come from any pipeline register rather than just from ID/EX, by
  - adding multiplexors to the inputs of the ALU
  - connecting the Rd write data in EX/MEM or MEM/WB to either (or both) of the EX stage's Rs and Rt ALU mux inputs
  - adding the proper control hardware to control the new muxes
- Other functional units may need similar forwarding logic (e.g., the DM)
- With forwarding, a CPI of 1 can be achieved even in the presence of data dependencies

Data Forwarding Control Conditions

1. EX Forward Unit (forwards the result from the previous instruction to either input of the ALU):
if (EX/MEM.RegWrite
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
  ForwardA = 10
if (EX/MEM.RegWrite
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
  ForwardB = 10

2. MEM Forward Unit (forwards the result from the second previous instruction to either input of the ALU):
if (MEM/WB.RegWrite
    and (MEM/WB.RegisterRd != 0)
    and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
  ForwardA = 01
if (MEM/WB.RegWrite
    and (MEM/WB.RegisterRd != 0)
    and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
  ForwardB = 01
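ForwardA and ForwardB are the select inputs of the two new ALU-input multiplexors. A minimal sketch of one such mux, assuming the common textbook encoding in which 00 (not listed on the slide) means no forwarding, so the operand is the register-file value carried in ID/EX; the name `alu_operand` is hypothetical.

```python
def alu_operand(forward: str, id_ex_value: int,
                ex_mem_result: int, mem_wb_write_data: int) -> int:
    """Model one ALU-input multiplexor driven by ForwardA or ForwardB."""
    if forward == "00":
        return id_ex_value        # no hazard: value read from the RegFile in ID
    if forward == "10":
        return ex_mem_result      # result of the previous instruction (EX/MEM)
    if forward == "01":
        return mem_wb_write_data  # result of the second previous instruction (MEM/WB)
    raise ValueError(f"unused select code {forward!r}")


# add $1,... has just left EX with -20 in EX/MEM; the RegFile copy is still 10.
print(alu_operand("10", id_ex_value=10, ex_mem_result=-20, mem_wb_write_data=0))  # -20
```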

Forwarding Illustration
[Pipeline diagram, instruction order top to bottom:]
  add $1,
  sub $4,$1,$5   (EX forwarding: $1 comes from EX/MEM)
  and $6,$7,$1   (MEM forwarding: $1 comes from MEM/WB)

Yet Another Complication!
- Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the MEM stage instruction: which should be forwarded?
[Pipeline diagram, instruction order top to bottom:]
  add $1,$1,$2
  add $1,$1,$3
  add $1,$1,$4
- The third add must receive $1 from the second add (the more recent result, in EX/MEM), not from the first add (in MEM/WB), even though both forwarding conditions are true.

Corrected Data Forwarding Control Conditions

1. EX Forward Unit (forwards the result from the previous instruction to either input of the ALU):
if (EX/MEM.RegWrite
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
  ForwardA = 10
if (EX/MEM.RegWrite
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
  ForwardB = 10

2. MEM Forward Unit (forwards the result from the second previous instruction to either input of the ALU, unless the previous instruction also writes that register):
if (MEM/WB.RegWrite
    and (MEM/WB.RegisterRd != 0)
    and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0)
             and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
  ForwardA = 01
if (MEM/WB.RegWrite
    and (MEM/WB.RegisterRd != 0)
    and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0)
             and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
  ForwardB = 01
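The corrected conditions can be sketched as a small function. Testing the EX hazard first and returning early gives the more recent EX/MEM result priority; EX/MEM only blocks MEM/WB when it actually writes a nonzero register, so an instruction in EX/MEM with RegWrite = 0 (a store, say) whose Rd field happens to match does not suppress the MEM/WB value. The names `WriteInfo` and `forward_unit`, and the 00 "no forwarding" code, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class WriteInfo:
    """Fields of a pipeline register that the Forward Unit inspects."""
    reg_write: bool  # RegWrite control bit
    rd: int          # destination register number (RegisterRd)


def forward_unit(ex_mem: WriteInfo, mem_wb: WriteInfo,
                 rs: int, rt: int) -> Tuple[str, str]:
    """Return (ForwardA, ForwardB) for the instruction in EX (ID/EX.RegisterRs/Rt)."""

    def select(src: int) -> str:
        # EX hazard: the previous instruction writes src, so its value is newest.
        if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
            return "10"
        # MEM hazard: reached only when the EX test failed, which plays the
        # role of the "and not (EX/MEM ...)" clause in the corrected condition.
        if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
            return "01"
        return "00"  # no forwarding: use the register-file value

    return select(rs), select(rt)


# add $1,$1,$2 in MEM/WB, add $1,$1,$3 in EX/MEM, add $1,$1,$4 in EX.
print(forward_unit(WriteInfo(True, 1), WriteInfo(True, 1), rs=1, rt=4))  # ('10', '00')
```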

Datapath with Forwarding Hardware
[Figure: the pipelined datapath (PC, instruction memory, register file, sign extend, ALU, data memory, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers) with a Forward Unit in the EX stage. The Forward Unit's inputs are ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd; it controls the multiplexors at the two ALU inputs.]

Memory-to-Memory Copies
- For loads immediately followed by stores (memory-to-memory copies), a stall can be avoided by adding forwarding hardware from the MEM/WB register to the data memory input
  - This would require adding a Forward Unit and a mux to the MEM stage
[Pipeline diagram, instruction order top to bottom:]
  lw $1,4($2)
  sw $1,4($3)
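The slide does not spell out the MEM-stage condition. One plausible form, stated here as an assumption and modeled on the EX forwarding conditions: forward when the instruction in MEM is a store (EX/MEM.MemWrite) whose data register, EX/MEM.RegisterRt, is the register being written by the instruction in WB. The function and parameter names are hypothetical.

```python
def forward_store_data(ex_mem_mem_write: bool, ex_mem_rt: int,
                       mem_wb_reg_write: bool, mem_wb_rd: int) -> bool:
    """True when the DM write-data mux should take MEM/WB's value (assumed rule)."""
    return (ex_mem_mem_write             # instruction in MEM is a store
            and mem_wb_reg_write         # instruction in WB writes a register
            and mem_wb_rd != 0           # $0 is never forwarded
            and mem_wb_rd == ex_mem_rt)  # ... and it is the store's data register


# lw $1,4($2) is in WB and sw $1,4($3) is in MEM: forward the loaded value.
print(forward_store_data(True, 1, True, 1))  # True
```

This only helps the store's data operand; if the store's base register were the loaded register, the address would still be needed in EX and the stall would remain.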

Forwarding with Load-use Data Hazards
[Pipeline diagram, instruction order top to bottom:]
  lw $1,4($2)
  stall
  sub $4,$1,$5
  and $6,$1,$7
  or  $8,$1,$9
  xor $4,$1,$5
- One stall cycle is still needed even with forwarding: the loaded value is not available until the end of the lw's MEM stage, but without the stall the sub would need it at the start of that same cycle, in its EX stage

Load-use Hazard Detection Unit
- A hazard detection unit is needed in the ID stage to insert a stall between the load and its use
1. ID Hazard Detection Unit:
if (ID/EX.MemRead
    and ((ID/EX.RegisterRt = IF/ID.RegisterRs)
    or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
  stall the pipeline
- The first line tests whether the instruction now in the EX stage is a lw; the next two lines check whether the destination register of the lw matches either source register of the instruction in the ID stage (the load-use instruction)
- After this one-cycle stall, the forwarding logic can handle the remaining data hazards
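A sketch of this check working on raw MIPS instruction words. The field positions follow the MIPS formats (rs in bits 25-21, rt in bits 20-16; lw is opcode 0x23, sub is funct 0x22); the helper names and the sample encodings are illustrative only.

```python
LW_OPCODE = 0x23  # MIPS opcode for lw
SUB_FUNCT = 0x22  # MIPS funct field for sub (R-type, opcode 0)
AND_FUNCT = 0x24  # MIPS funct field for and


def i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Encode a MIPS I-type instruction word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def r_type(rs: int, rt: int, rd: int, funct: int) -> int:
    """Encode a MIPS R-type instruction word (opcode 0, shamt 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def rs_field(word: int) -> int:
    return (word >> 21) & 0x1F  # bits 25-21


def rt_field(word: int) -> int:
    return (word >> 16) & 0x1F  # bits 20-16


def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int, if_id_word: int) -> bool:
    """ID Hazard Detection Unit condition from the slide."""
    return id_ex_mem_read and (id_ex_rt == rs_field(if_id_word)
                               or id_ex_rt == rt_field(if_id_word))


lw = i_type(LW_OPCODE, rs=2, rt=1, imm=4)        # lw $1,4($2), now in EX
sub = r_type(rs=1, rt=5, rd=4, funct=SUB_FUNCT)  # sub $4,$1,$5, now in ID
print(load_use_stall(True, rt_field(lw), sub))   # True: insert one bubble
```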

Hazard/Stall Hardware
- Along with the Hazard Unit, we have to implement the stall
- Prevent the instructions in the IF and ID stages from progressing down the pipeline, by preventing the PC register and the IF/ID pipeline register from changing
  - The Hazard Detection Unit controls the writing of the PC (PC.write) and IF/ID (IF/ID.write) registers
- Insert a “bubble” between the lw instruction (in the EX stage) and the load-use instruction (in the ID stage), i.e., insert a noop in the execution stream
  - Set the control bits in the EX, MEM, and WB control fields of the ID/EX pipeline register to 0 (noop). The Hazard Unit controls the mux that chooses between the real control values and the 0s.
- Let the lw instruction and the instructions after it in the pipeline (before it in the code) proceed normally down the pipeline
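The effect of these three control actions on one clock edge can be sketched as follows. `FrontEnd` and `clock` are hypothetical names, and the control-bit values are arbitrary placeholders; only the PC.write, IF/ID.write, and zeroing-mux behavior come from the slide.

```python
from dataclasses import dataclass


@dataclass
class FrontEnd:
    pc: int             # address of the instruction in IF
    if_id: str          # instruction held in IF/ID (being decoded)
    id_ex_control: int  # EX/MEM/WB control bits latched into ID/EX


def clock(s: FrontEnd, stall: bool, fetched: str, decoded_control: int) -> FrontEnd:
    """State after one clock edge, given the Hazard Unit's stall decision."""
    pc_write = not stall     # PC.write
    if_id_write = not stall  # IF/ID.write
    return FrontEnd(
        pc=s.pc + 4 if pc_write else s.pc,
        if_id=fetched if if_id_write else s.if_id,
        # mux in front of ID/EX: real control values, or 0 to create a bubble
        id_ex_control=0 if stall else decoded_control,
    )


# lw $1,4($2) (address 0) is in EX, sub $4,$1,$5 (address 4) is in ID,
# and $6,$1,$7 (address 8) is in IF. 0b0110 stands in for sub's control bits.
s = FrontEnd(pc=8, if_id="sub $4,$1,$5", id_ex_control=0b1010)
s = clock(s, stall=True, fetched="and $6,$1,$7", decoded_control=0b0110)
print(s)  # PC and IF/ID unchanged, bubble (0) in ID/EX
s = clock(s, stall=False, fetched="and $6,$1,$7", decoded_control=0b0110)
print(s)  # sub's control bits enter ID/EX, and moves into ID
```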

Adding the Hazard/Stall Hardware
[Figure: the forwarding datapath extended with a Hazard Unit in the ID stage. The Hazard Unit reads ID/EX.MemRead and ID/EX.RegisterRt and drives a mux that loads either the Control unit's outputs or 0 into the ID/EX control fields; the Forward Unit remains in the EX stage.]

Control Hazards
- Control hazards arise when the flow of instruction addresses is not sequential (i.e., the next PC is not PC + 4); they are incurred by change-of-flow instructions
  - Unconditional branches (j, jal, jr, jalr)
  - Conditional branches (beq, bne)
  - Exceptions (bad opcode, overflow)
- Possible approaches
  - Stall always (impacts CPI)
  - Move the decision point as early in the pipeline as possible, thereby reducing the number of stall cycles
  - Delay the decision (code reordering; requires compiler support)
  - Predict, and only stall on a wrong prediction
- Control hazards occur less frequently than data hazards, but no technique is as effective against control hazards as forwarding is against data hazards

Datapath Branch and Jump Hardware
[Figure: the forwarding datapath with branch and jump paths added. The branch target adder sums PC + 4 and the sign-extended offset shifted left 2; the jump target is formed from PC+4[31-28] and the instruction's jump field shifted left 2. The PCSrc and Jump signals select the next PC.]
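The jump path in this figure (PC+4[31-28] plus Shift left 2) builds the j target by keeping the top 4 bits of PC + 4 and appending the instruction's 26-bit target field shifted left by 2. A minimal sketch; the function name and sample values are illustrative.

```python
J_OPCODE = 0x02  # MIPS opcode for j


def jump_target(pc: int, word: int) -> int:
    """Target of a j at address pc: PC+4[31-28] concatenated with target << 2."""
    target26 = word & 0x03FF_FFFF  # low 26 bits of the instruction word
    return ((pc + 4) & 0xF000_0000) | (target26 << 2)


j = (J_OPCODE << 26) | 0x0100010
print(hex(jump_target(0x1000_0000, j)))  # 0x10400040
```

Because the upper 4 bits come from PC + 4, a j can only reach targets within the same 256 MB region as the instruction that follows it.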

Jumps Incur One Stall
- Jumps are not decoded until ID, so one flush is needed
  - To flush the next instruction, set IF.Flush to zero the instruction field of the IF/ID pipeline register (turning it into a noop)
[Pipeline diagram, instruction order top to bottom:]
  j
  flush
  (j target)
- Fix jump hazard by waiting (flush)
- Jumps now take 2 cycles. Fortunately, jumps are very infrequent: only 3% of the SPECint instruction mix

Two “Types” of Stalls
- Noop instruction (or bubble) inserted between two instructions in the pipeline (as done for load-use situations)
  - Keep the instructions earlier in the pipeline (later in the code) from progressing down the pipeline for a cycle (“bounce” them in place with write control signals)
  - Insert a noop by zeroing the control bits in the pipeline register at the appropriate stage
  - Let the instructions later in the pipeline (earlier in the code) progress normally down the pipeline
- Flushes (or instruction squashing), where an instruction in the pipeline is replaced with a noop instruction (as done for the instruction located sequentially after a j instruction)
  - Zero the control bits for the instruction to be flushed (here we zeroed all bits in the IF/ID pipeline register, before decoding the control bits)

Supporting ID Stage Jumps
[Figure: the branch and jump datapath with a mux in front of the IF/ID register that can load 0 (a noop) in place of the fetched instruction, so that a jump decoded in ID flushes the following instruction.]
What can be done to support IF stage jumps with CPI = 1?

Review: Branch Instructions Cause Control Hazards
- Dependencies backward in time cause hazards
[Pipeline diagram, instruction order top to bottom:]
  beq
  lw
  Inst 3
  Inst 4

One Way to “Fix” a Branch Control Hazard
- Fix branch hazard by waiting (flush 3), but this affects CPI badly
[Pipeline diagram, instruction order top to bottom:]
  beq
  flush
  flush
  flush
  beq target
  Inst 3

Another Way to “Fix” a Branch Control Hazard
- Move the branch decision hardware back to as early in the pipeline as possible, i.e., during the decode cycle
- Fix branch hazard by waiting (flush 1)
[Pipeline diagram, instruction order top to bottom:]
  beq
  flush
  beq target
  Inst 3

Reducing the Delay of Branches
- Move the branch decision hardware back to the EX stage
  - Reduces the number of stall (flush) cycles to two
  - Adds an and gate and a 3x1 mux to the EX timing path
- Add hardware to compute the branch target address and evaluate the branch decision in the ID stage
  - Reduces the number of stall (flush) cycles to one (as with jumps)
    - But forwarding hardware must now be added in the ID stage
  - Computing the branch target address can be done in parallel with the RegFile read (done for all instructions, only used when needed)
  - Comparing the registers can't be done until after the RegFile read, so comparing and updating the PC adds a comparator, the and gate, and the 3x1 mux to the ID timing path
- Can we move the decision to the IF stage? (CPI = 1)
- For deeper pipelines, branch decision points can be even later in the pipeline, incurring more stalls

ID Branch Forwarding Issues
- MEM/WB “forwarding” is taken care of by the normal RegFile write-before-read operation:
    WB   add3 $1,
    MEM  add2 $3,
    EX   add1 $4,
    ID   beq $1,$2,Loop
    IF   next_seq_instr
- Need to forward from the EX/MEM pipeline stage to the ID comparison hardware for cases like:
    WB   add3 $3,
    MEM  add2 $1,
    EX   add1 $4,
    ID   beq $1,$2,Loop
    IF   next_seq_instr

ID branch forwarding (forwards the result from the 2nd previous instruction to either input of the comparator):
if (IDcontrol.Branch
    and EX/MEM.RegWrite
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = IF/ID.RegisterRs))
  ForwardC = 1
if (IDcontrol.Branch
    and EX/MEM.RegWrite
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = IF/ID.RegisterRt))
  ForwardD = 1
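A sketch of the comparator-side forwarding check. Like the EX and MEM forwarding conditions, it also requires EX/MEM.RegWrite, so an instruction that does not write a register is never forwarded. The function name and argument order are hypothetical.

```python
from typing import Tuple


def id_branch_forward(is_branch: bool, ex_mem_reg_write: bool, ex_mem_rd: int,
                      if_id_rs: int, if_id_rt: int) -> Tuple[int, int]:
    """Return (ForwardC, ForwardD) for the ID-stage branch comparator."""

    def from_ex_mem(src: int) -> int:
        return int(is_branch and ex_mem_reg_write
                   and ex_mem_rd != 0 and ex_mem_rd == src)

    return from_ex_mem(if_id_rs), from_ex_mem(if_id_rt)


# Second case above: add2 $1 is in EX/MEM, beq $1,$2,Loop is in ID.
print(id_branch_forward(True, True, 1, if_id_rs=1, if_id_rt=2))  # (1, 0)
```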

ID Branch Forwarding Issues, cont.
- If the instruction immediately before the branch produces one of the branch source operands, then a stall needs to be inserted (between the add1 and beq), since the EX stage ALU operation is occurring at the same time as the ID stage branch compare operation:
    WB   add3 $3,
    MEM  add2 $4,
    EX   add1 $1,
    ID   beq $1,$2,Loop
    IF   next_seq_instr
  - “Bounce” the beq (in ID) and next_seq_instr (in IF) in place (the ID Hazard Unit deasserts PC.Write and IF/ID.Write)
  - Insert a stall between the add in the EX stage and the beq in the ID stage by zeroing the control bits going into the ID/EX pipeline register (done by the ID Hazard Unit)
- If the branch is found to be taken, then flush the instruction currently in IF (IF.Flush), as we did with jumps
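The slide describes this stall in words only. One way to write it, assumed here and modeled on the load-use check, treats ID/EX.RegisterRd as the destination register chosen for the instruction in EX; the function name is hypothetical.

```python
def id_branch_stall(is_branch: bool, id_ex_reg_write: bool, id_ex_rd: int,
                    if_id_rs: int, if_id_rt: int) -> bool:
    """Stall a branch in ID whose operand is being computed in EX this cycle."""
    return (is_branch and id_ex_reg_write and id_ex_rd != 0
            and (id_ex_rd == if_id_rs or id_ex_rd == if_id_rt))


# add1 $1 is in EX, beq $1,$2,Loop is in ID: bounce beq and insert a bubble.
print(id_branch_stall(True, True, 1, if_id_rs=1, if_id_rt=2))  # True
```

After the one-cycle stall, add1 has moved to EX/MEM and the ForwardC/ForwardD logic from the previous slide supplies its result to the comparator.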

Supporting ID Stage Branches
[Figure: the datapath with the branch decision moved into the ID stage: a Compare unit after the RegFile read, the branch target adder (PC + 4 plus the sign-extended offset shifted left 2), IF.Flush on the IF/ID register, a Hazard Unit, and two Forward Units. Branch and PCSrc select the next PC.]

Delayed Branches
- If the branch hardware has been moved to the ID stage, then we can eliminate all branch stalls with delayed branches, which are defined as always executing the next sequential instruction after the branch instruction: the branch takes effect after that next instruction
  - The MIPS compiler moves an instruction that is not affected by the branch (a safe instruction) to immediately after the branch, thereby hiding the branch delay
- With deeper pipelines, the branch delay grows, requiring more than one delay slot (N slots)
  - The compiler is less likely to find N safe instructions than just 1
  - Delayed branches have lost popularity compared to more expensive but more flexible (dynamic) hardware branch prediction
  - Growth in available transistors has made hardware branch prediction relatively cheaper

Scheduling Branch Delay Slots

A. From before branch
    add $1,$2,$3
    if $2=0 then
      delay slot
  becomes
    if $2=0 then
      add $1,$2,$3

B. From branch target
    sub $4,$5,$6
    ...
    add $1,$2,$3
    if $1=0 then
      delay slot
  becomes
    add $1,$2,$3
    if $1=0 then
      sub $4,$5,$6

C. From fall through
    add $1,$2,$3
    if $1=0 then
      delay slot
    sub $4,$5,$6
  becomes
    add $1,$2,$3
    if $1=0 then
      sub $4,$5,$6

- A is the best choice: it fills the delay slot with an independent instruction
- In B and C, the sub instruction may need to be copied, increasing IC
- In B and C, it must be okay to execute sub when the branch goes the other way (not taken in B, taken in C): wasted work, but safe

Static Branch Prediction
- Resolve branch hazards by assuming a given outcome and proceeding without waiting to see the actual branch outcome
1. Predict not taken: always predict that branches will not be taken and continue to fetch from the sequential instruction stream; the pipeline stalls only when a branch is taken
  - If taken, flush the instructions after the branch (earlier in the pipeline)
    - in the IF, ID, and EX stages if the branch logic is in MEM: three stalls
    - in the IF and ID stages if the branch logic is in EX: two stalls
    - in the IF stage if the branch logic is in ID: one stall
  - Ensure that those flushed instructions haven't changed the machine state; this is automatic in the MIPS pipeline, since machine-state-changing operations are at the tail end of the pipeline (MemWrite in MEM or RegWrite in WB)
  - Restart the pipeline at the branch destination
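The stall counts above translate into a simple CPI estimate. A sketch assuming a base CPI of 1 with no other hazards; the 20% branch frequency and 50% taken rate below are hypothetical inputs, not measurements, and the function names are illustrative.

```python
# Flush (stall) cycles per taken branch, by where the branch logic sits
FLUSH_CYCLES = {"MEM": 3, "EX": 2, "ID": 1}


def cpi_stall_always(branch_frac: float, stage: str) -> float:
    """Every branch pays the flush cycles, taken or not."""
    return 1.0 + branch_frac * FLUSH_CYCLES[stage]


def cpi_predict_not_taken(branch_frac: float, taken_frac: float, stage: str) -> float:
    """Only taken branches pay the flush cycles."""
    return 1.0 + branch_frac * taken_frac * FLUSH_CYCLES[stage]


# Hypothetical workload: 20% of instructions are branches, half of them taken.
for stage in ("MEM", "EX", "ID"):
    print(stage, round(cpi_stall_always(0.2, stage), 2),
          round(cpi_predict_not_taken(0.2, 0.5, stage), 2))
```

Moving the decision earlier and predicting not taken reduce the penalty independently, which is why the two techniques are usually combined.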

Flushing with Misprediction (Not Taken)
[Pipeline diagram, instruction address and instruction, in order top to bottom:]
   4  beq $1,$2,5
   8  sub $4,$1,$5   (flushed)
  28  and $6,$1,$7
  32  or  $8,$1,$9
- To flush the IF stage instruction, assert IF.Flush to zero the instruction field of the IF/ID pipeline register (transforming it into a noop)
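The addresses in this example follow from the MIPS branch rule, target = PC + 4 + (sign-extended offset × 4): the beq at 4 with offset 5 branches to 8 + 20 = 28. The sketch below also models IF.Flush as loading the all-zero word, which MIPS decodes as sll $0,$0,0, the standard nop. Function names are illustrative.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, offset_field: int) -> int:
    """beq/bne target: PC + 4 + (sign-extended offset << 2)."""
    return pc + 4 + (sign_extend16(offset_field) << 2)


NOOP = 0x0000_0000  # all-zero word: sll $0,$0,0, the MIPS nop


def next_if_id(fetched_word: int, if_flush: bool) -> int:
    """Value latched into IF/ID: the fetched word, or a noop when flushing."""
    return NOOP if if_flush else fetched_word


print(branch_target(4, 5))  # 28: the beq at address 4 goes to the and at 28
```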

Branching Structures
- Predict not taken works well for “top of the loop” branching structures
      Loop: beq $1,$2,Out
            1st loop instr
            .
            .
            .
            last loop instr
            j Loop
      Out:  fall out instr
  - But such loops have jumps at the bottom of the loop to return to the top of the loop, and incur the jump stall overhead
- Predict not taken doesn't work well for “bottom of the loop” branching structures
      Loop: 1st loop instr
            2nd loop instr
            .
            .
            .
            last loop instr
            bne $1,$2,Loop
            fall out instr
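Counting mispredictions makes the contrast concrete. A sketch for a loop that runs a hypothetical N = 10 iterations under predict not taken, where every taken branch is a misprediction; the variable names are illustrative.

```python
from typing import List


def mispredictions(outcomes: List[bool]) -> int:
    """Under predict not taken, every taken branch (True) is a misprediction."""
    return sum(outcomes)


N = 10  # hypothetical number of loop iterations

# Top of the loop: beq falls through N times, then is taken once to exit;
# each iteration also ends with an unconditional j Loop (one stall each).
top_beq = [False] * N + [True]
top_jumps = N

# Bottom of the loop: bne is taken N - 1 times, then falls out once.
bottom_bne = [True] * (N - 1) + [False]

print("top:", mispredictions(top_beq), "of", len(top_beq),
      "branches mispredicted, plus", top_jumps, "jumps")
print("bottom:", mispredictions(bottom_bne), "of", len(bottom_bne),
      "branches mispredicted")
```

The top-of-loop branch is almost always predicted correctly but pays a jump stall per iteration; the bottom-of-loop branch avoids the jump but is mispredicted on every iteration except the last.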
