CS104: Computer Organization: 2 April, 2020
[Figure: pipeline diagram, each instruction passing through IM, Reg, ALU, DM, Reg. Instruction order: add $1, ; sub $4,$1,$5 ; and $6,$1,$7 ; or $8,$1,$9 ; xor $4,$1,$5]
L13
Fix a data hazard by waiting (stall), but this impacts CPI.
[Figure: pipeline diagram, each instruction passing through IM, Reg, ALU, DM, Reg. Instruction order: first instruction ; stall ; stall ; sub $4,$1,$5 ; and $6,$1,$7]
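The two stall cycles in this example can be reproduced with a small model. This is a minimal sketch, assuming no forwarding and a register file that is written in the first half of a cycle and read in the second half; the function name `count_stalls`, the tuple encoding, and the source registers of the first instruction ($2, $3) are hypothetical choices made for illustration.

```python
def count_stalls(program):
    """Count stall cycles for a 5-stage pipeline WITHOUT forwarding.

    program: list of (dest, src_a, src_b) register numbers.
    Assumption: a consumer may sit in ID in the same cycle the producer
    is in WB (write in first half of the cycle, read in second half).
    """
    ready = {}   # register -> earliest cycle a reader may be in ID
    cycle = 0    # cycle in which the current instruction is in ID
    stalls = 0
    for dest, *srcs in program:
        cycle += 1
        # Register $0 is hardwired to zero, so it never causes a hazard.
        need = max((ready.get(s, 0) for s in srcs if s != 0), default=0)
        if need > cycle:
            stalls += need - cycle
            cycle = need
        if dest != 0:
            ready[dest] = cycle + 3   # WB is three stages after ID
    return stalls


# add $1,$2,$3 (sources assumed) ; sub $4,$1,$5 ; and $6,$1,$7
print(count_stalls([(1, 2, 3), (4, 1, 5), (6, 1, 7)]))
```

The sub must wait two cycles for $1; by then the and can read $1 without any further delay.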
Another Way to “Fix” a Data Hazard
Fix data hazards by forwarding results as soon as they are available to where they are needed.
[Figure: pipeline diagram with forwarding paths. Instruction order: add $1, ; sub $4,$1,$5 ; and $6,$1,$7 ; or $8,$1,$9 ; xor $4,$1,$5]
Data Forwarding (aka Bypassing)
Take the result from the earliest point that it exists in any of
the pipeline state registers and forward it to the functional
units (e.g., the ALU) that need it that cycle
For the ALU functional unit, the inputs can come from any pipeline register rather than just from ID/EX, by:
  - adding multiplexors to the inputs of the ALU
  - connecting the Rd write data in EX/MEM or MEM/WB to either (or both) of the EX stage's Rs and Rt ALU mux inputs
  - adding the proper control hardware to control the new muxes
Other functional units may need similar forwarding logic
(e.g., the DM)
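With forwarding, the pipeline can achieve a CPI of 1 even in the presence of data dependencies (a load followed immediately by a use of the loaded value still needs one stall; see Forwarding with Load-use Data Hazards)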
Forwarding Illustration
[Figure: pipeline diagram with forwarding paths. Instruction order: add $1, ; sub $4,$1,$5 ; and $6,$7,$1]
[Figure: pipeline diagram with back-to-back writes to the same register. Instruction order: add $1,$1,$2 ; add $1,$1,$3 ; add $1,$1,$4]
Corrected Data Forwarding Control Conditions
1. EX Forward Unit (forwards the result from the previous instr. to either input of the ALU):
   if (EX/MEM.RegWrite
   and (EX/MEM.RegisterRd != 0)
   and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
       ForwardA = 10
   if (EX/MEM.RegWrite
   and (EX/MEM.RegisterRd != 0)
   and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
       ForwardB = 10
2. MEM Forward Unit (forwards the result from the previous or second previous instr. to either input of the ALU):
   if (MEM/WB.RegWrite
   and (MEM/WB.RegisterRd != 0)
   and (EX/MEM.RegisterRd != ID/EX.RegisterRs)
   and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
       ForwardA = 01
   if (MEM/WB.RegWrite
   and (MEM/WB.RegisterRd != 0)
   and (EX/MEM.RegisterRd != ID/EX.RegisterRt)
   and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
       ForwardB = 01
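The conditions above translate almost line for line into code. The following is a minimal sketch that transcribes them as written on this slide; the `PipeRegs` dataclass, its field names, and the function `forward_unit` are hypothetical names introduced for illustration, and the default value 00 (no forwarding) is assumed.

```python
from dataclasses import dataclass


@dataclass
class PipeRegs:
    # Fields the forward unit reads from the pipeline registers.
    id_ex_rs: int
    id_ex_rt: int
    ex_mem_reg_write: bool
    ex_mem_rd: int
    mem_wb_reg_write: bool
    mem_wb_rd: int


def forward_unit(p):
    """Return (ForwardA, ForwardB) as 2-bit mux selects."""
    forward_a = forward_b = 0b00   # assumed default: use the ID/EX value

    # MEM forward unit: previous or second previous instruction.
    if (p.mem_wb_reg_write and p.mem_wb_rd != 0
            and p.ex_mem_rd != p.id_ex_rs and p.mem_wb_rd == p.id_ex_rs):
        forward_a = 0b01
    if (p.mem_wb_reg_write and p.mem_wb_rd != 0
            and p.ex_mem_rd != p.id_ex_rt and p.mem_wb_rd == p.id_ex_rt):
        forward_b = 0b01

    # EX forward unit: previous instruction.
    if (p.ex_mem_reg_write and p.ex_mem_rd != 0
            and p.ex_mem_rd == p.id_ex_rs):
        forward_a = 0b10
    if (p.ex_mem_reg_write and p.ex_mem_rd != 0
            and p.ex_mem_rd == p.id_ex_rt):
        forward_b = 0b10
    return forward_a, forward_b


# Third instruction of: add $1,$1,$2 ; add $1,$1,$3 ; add $1,$1,$4
third_add = PipeRegs(id_ex_rs=1, id_ex_rt=4,
                     ex_mem_reg_write=True, ex_mem_rd=1,
                     mem_wb_reg_write=True, mem_wb_rd=1)
print(forward_unit(third_add))
```

For the third add, both older instructions write $1, and the extra EX/MEM check in the MEM conditions ensures the ALU receives the most recent value (ForwardA = 10) rather than the stale one in MEM/WB.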
Datapath with Forwarding Hardware
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with a Forward Unit controlling muxes on the ALU inputs. Forward Unit inputs: ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd, MEM/WB.RegisterRd]
Memory-to-Memory Copies
For loads immediately followed by stores (memory-to-memory copies), a stall can be avoided by adding forwarding hardware from the MEM/WB register to the data memory input.
This would require adding a Forward Unit and a mux to the MEM stage.
[Figure: pipeline diagram with a forwarding path from MEM/WB to the data memory input. Instruction order: lw $1,4($2) ; sw $1,4($3)]
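One possible control condition for such a MEM-stage forward unit is sketched below. The slides do not give this condition, so everything here is an assumption: the function name, the signal names (modelled on the EX-stage conditions), and the choice to compare the load's destination register with the store's Rt field.

```python
def forward_store_data(mem_wb_reg_write, mem_wb_rd,
                       ex_mem_mem_write, ex_mem_rt):
    """Return True if the store in MEM should take its write data
    from MEM/WB instead of from the EX/MEM register.

    Assumed signals (not from the slides):
      mem_wb_reg_write  - instruction in WB writes a register
      mem_wb_rd         - its destination register number
      ex_mem_mem_write  - instruction in MEM is a store
      ex_mem_rt         - register holding the store's data
    """
    return bool(mem_wb_reg_write
                and ex_mem_mem_write
                and mem_wb_rd != 0
                and mem_wb_rd == ex_mem_rt)


# lw $1,4($2) is in WB while sw $1,4($3) is in MEM.
print(forward_store_data(True, 1, True, 1))
```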
Forwarding with Load-use Data Hazards
[Figure: pipeline diagram with one stall after the load. Instruction order: lw $1,4($2) ; stall ; sub $4,$1,$5 ; and $6,$1,$7 ; or $8,$1,$9 ; xor $4,$1,$5]
[Figure: pipelined datapath with a Hazard Unit and a Forward Unit. Hazard Unit inputs shown: ID/EX.MemRead, ID/EX.RegisterRt]
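The load-use check performed by the Hazard Unit can be sketched as follows. The ID/EX.MemRead and ID/EX.RegisterRt inputs come from the figure; the comparison against the IF/ID source register fields follows the usual textbook formulation and is an assumption here, as are the function and parameter names.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True if the instruction in ID must stall for one cycle.

    A stall is needed when the instruction in EX is a load
    (ID/EX.MemRead) whose destination (ID/EX.RegisterRt) is a source
    of the instruction currently being decoded.
    """
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# lw $1,4($2) in EX, sub $4,$1,$5 in ID: the loaded value is not ready.
print(load_use_stall(True, 1, 1, 5))
```

After the one-cycle stall, the loaded value is in MEM/WB and the ordinary MEM forward path can supply it to the ALU.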
Control Hazards
Occur when the flow of instruction addresses is not sequential (i.e., the next PC is not PC + 4); incurred by change-of-flow instructions:
  - Unconditional branches (j, jal, jr, jalr)
  - Conditional branches (beq, bne)
  - Exceptions (bad opcode, overflow)
Possible approaches:
  - Stall always (impacts CPI)
  - Move the decision point as early in the pipeline as possible, thereby reducing the number of stall cycles
  - Delay the decision (code reordering; requires compiler support)
  - Predict, and only stall on a wrong prediction
[Figure: pipelined datapath with jump-address logic (Shift left 2, PC+4[31-28]) and a Forward Unit]
Fix jump hazard by waiting (flush).
[Figure: pipeline diagram. Instruction order: j ; flush ; (j target)]
[Figure: pipelined datapath front end with jump-address logic (Shift left 2, PC+4[31-28])]
[Figure: pipeline diagram. Instruction order: beq ; lw ; Inst 3 ; Inst 4]
Fix branch hazard by waiting (flush 3), but this affects CPI badly.
[Figure: pipeline diagram. Instruction order: beq ; flush ; flush ; flush ; beq target ; Inst 3]
Fix branch hazard by waiting (flush 1).
[Figure: pipeline diagram. Instruction order: beq ; flush ; beq target ; Inst 3]
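The CPI cost of the two flush penalties (3 cycles versus 1 cycle) can be compared with a short calculation. This is a sketch only: it assumes an ideal CPI of 1, charges the penalty on every branch, and uses a hypothetical branch frequency of 25% that does not come from the lecture.

```python
def cpi_with_branch_penalty(branch_fraction, flush_cycles, base_cpi=1.0):
    """Effective CPI when every branch costs flush_cycles extra cycles."""
    return base_cpi + branch_fraction * flush_cycles


BRANCH_FRACTION = 0.25   # hypothetical value, for illustration only

late = cpi_with_branch_penalty(BRANCH_FRACTION, 3)    # flush 3
early = cpi_with_branch_penalty(BRANCH_FRACTION, 1)   # flush 1
print(late, early)   # 1.75 1.25
```

Under these assumptions, moving the branch decision earlier lowers the effective CPI from 1.75 to 1.25.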
[Figure: pipelined datapath with a Compare unit at the register file outputs, an IF.Flush signal, a Hazard Unit, and two Forward Units]
Delayed Branches
If the branch hardware has been moved to the ID stage, then all branch stalls can be eliminated with delayed branches, which are defined as always executing the next sequential instruction after the branch instruction; the branch takes effect after that next instruction.
  - The MIPS compiler moves an instruction that is not affected by the branch (a safe instruction) to immediately after the branch, thereby hiding the branch delay
With deeper pipelines, the branch delay grows, requiring more than one delay slot (N slots).
  - The compiler is less likely to find N safe instructions than just 1
Delayed branches have lost popularity compared to more expensive but more flexible (dynamic) hardware branch prediction.
  - Growth in available transistors has made hardware branch prediction relatively cheaper
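The delayed-branch rule (the instruction after a branch always executes, and only then does the branch take effect) can be made concrete with a toy interpreter. This is a sketch under simplifying assumptions: one delay slot, forward branches only, and a hypothetical instruction encoding invented for this example.

```python
def run(program, delayed):
    """Return the names of the ordinary instructions that execute.

    program: list of ('op', name) or ('br', target_index, taken).
    delayed: if True, the instruction after a taken branch (the delay
             slot) executes before control moves to the target.
    Assumes the delay slot holds an 'op' and branches go forward.
    """
    trace, pc = [], 0
    while pc < len(program):
        ins = program[pc]
        if ins[0] == 'br' and ins[2]:
            if delayed and pc + 1 < len(program):
                trace.append(program[pc + 1][1])   # delay slot executes
            pc = ins[1]                            # then branch takes effect
        else:
            if ins[0] == 'op':
                trace.append(ins[1])
            pc += 1
    return trace


prog = [('op', 'a'), ('br', 3, True), ('op', 'slot'), ('op', 'target')]
print(run(prog, delayed=True))    # ['a', 'slot', 'target']
print(run(prog, delayed=False))   # ['a', 'target']
```

With delayed branches the slot instruction runs whether or not the branch is taken, which is why the compiler must fill it with an instruction that is safe on both paths.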
Scheduling Branch Delay Slots
A. From before branch
    add $1,$2,$3
    if $2=0 then
        delay slot
B. From branch target
    sub $4,$5,$6
    add $1,$2,$3
    if $1=0 then
        delay slot
C. From fall through
    add $1,$2,$3
    if $1=0 then
        delay slot
    sub $4,$5,$6
[Figure: pipeline diagram of a taken branch, with instruction addresses. Instruction order: 4 beq $1,$2,5 ; 8 sub $4,$1,$5 (flush) ; 28 and $6,$1,$7 ; 32 or $8,$1,$9]
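The target address 28 in this example follows from the MIPS branch encoding, in which the 16-bit offset counts words relative to PC + 4. A minimal check of the arithmetic (the function name is a hypothetical choice for illustration):

```python
def branch_target(pc, offset_words):
    """MIPS branch target: (PC + 4) plus the word offset shifted left 2."""
    return (pc + 4) + (offset_words << 2)


# beq $1,$2,5 at address 4: (4 + 4) + 5 * 4 = 28
print(branch_target(4, 5))
```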