CS 162 Computer Architecture

Lecture 3: Pipelining Contd.

Instructor: L.N. Bhuyan


www.cs.ucr.edu/~bhuyan/cs162

1 1999 ©UCB
Single Cycle Datapath (From Ch 5)
[Figure: single-cycle datapath with PC, instruction memory (Imem), register file (Regs), sign extend, ALU, data memory (Dmem), PC adder and branch adder with shift-left-2; multiplexors controlled by PCSrc, RegDst, ALUSrc, and MemToReg; other control signals RegWrite, MemRead, MemWrite, ALU control, and ALUOp]


Required Changes to Datapath
° Introduce registers to separate the 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath.
° The next PC value is computed in the 3rd step, but we need to bring in the next instruction in the next cycle, so move the PCSrc mux to the 1st stage. The PC is incremented unless there is a new branch address.
° The branch address is computed in the 3rd stage. With pipelining, the PC value has changed by then! We must carry the PC value along with the instruction. Width of IF/ID register = (IR:32) + (PC:32) = 64 bits.
Changes to Datapath Contd.
° For the lw instruction, we need the write register address at stage 5. But the IR is now occupied by another instruction! So we must carry the IR destination field as we move along the stages. See the connection in the figure.
Length of ID/EX register = (Reg1:32) + (Reg2:32) + (offset:32) + (PC:32) + (destination register:5) = 133 bits
Assignment: What are the lengths of the EX/MEM and MEM/WB registers?
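As a quick check of the width arithmetic, the sketch below sums field widths for each pipeline register. IF/ID and ID/EX use the fields listed on the slides. The EX/MEM fields (branch target, Zero flag, ALU result, store data, destination register) and MEM/WB fields (memory read data, ALU result, destination register) are an assumed breakdown, chosen because it reproduces the 102- and 69-bit totals labeled in the pipelined-datapath figure. Control bits are not counted.

```python
# Field widths (bits) per pipeline register, data fields only.
# IF/ID and ID/EX follow the slides; the EX/MEM and MEM/WB field lists are
# assumptions that match the 102- and 69-bit totals shown in the figure.
PIPELINE_REGISTERS = {
    "IF/ID": {"IR": 32, "PC": 32},
    "ID/EX": {"Reg1": 32, "Reg2": 32, "offset": 32, "PC": 32, "dest_reg": 5},
    "EX/MEM": {"branch_target": 32, "zero": 1, "alu_result": 32,
               "store_data": 32, "dest_reg": 5},
    "MEM/WB": {"mem_read_data": 32, "alu_result": 32, "dest_reg": 5},
}


def register_width(fields: dict) -> int:
    """Total width of a pipeline register = sum of its field widths."""
    return sum(fields.values())


if __name__ == "__main__":
    for name, fields in PIPELINE_REGISTERS.items():
        print(f"{name}: {register_width(fields)} bits")
```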
Pipelined Datapath (with Pipeline Regs) (6.2)
[Figure: five-stage pipelined datapath (Fetch, Decode, Execute, Memory, Write Back) with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers between Imem, Regs, sign extend, ALU, and Dmem; pipeline register widths: IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits, MEM/WB 69 bits]
Pipelined Control (6.3)
• Start with the single-cycle controller
• Group control lines by the pipeline stage that needs them
• Extend pipeline registers with control bits
[Figure: Control decodes the instruction held in IF/ID and produces three groups of control bits: EX (RegDst, ALUOp, ALUSrc), M (Branch, MemRead, MemWrite), and WB (MemToReg, RegWrite); ID/EX carries EX, M, and WB, EX/MEM carries M and WB, and MEM/WB carries WB]
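A minimal sketch of "extend pipeline registers with control bits", assuming the classic MIPS control values for R-format, lw, sw, and beq (don't-care entries are written as 0 here). Control is generated once in ID and grouped by the stage that consumes it; each pipeline register then carries only the groups still needed downstream. The dictionary and function names are illustrative, not taken from the slides.

```python
# Control values per instruction class (classic MIPS truth table);
# don't-care (X) entries are written as 0 for simplicity.
CONTROL = {
    "R-format": dict(RegDst=1, ALUOp=0b10, ALUSrc=0, Branch=0, MemRead=0,
                     MemWrite=0, RegWrite=1, MemToReg=0),
    "lw": dict(RegDst=0, ALUOp=0b00, ALUSrc=1, Branch=0, MemRead=1,
               MemWrite=0, RegWrite=1, MemToReg=1),
    "sw": dict(RegDst=0, ALUOp=0b00, ALUSrc=1, Branch=0, MemRead=0,
               MemWrite=1, RegWrite=0, MemToReg=0),
    "beq": dict(RegDst=0, ALUOp=0b01, ALUSrc=0, Branch=1, MemRead=0,
                MemWrite=0, RegWrite=0, MemToReg=0),
}

# Control lines grouped by the stage that uses them.
GROUPS = {
    "EX": ("RegDst", "ALUOp", "ALUSrc"),
    "M": ("Branch", "MemRead", "MemWrite"),
    "WB": ("RegWrite", "MemToReg"),
}
BIT_WIDTH = {"ALUOp": 2}  # every other control line is 1 bit

# Each stage consumes its own group, so later registers carry fewer groups.
CARRIED = {"ID/EX": ("EX", "M", "WB"), "EX/MEM": ("M", "WB"), "MEM/WB": ("WB",)}


def decode(instr_class: str) -> dict:
    """ID stage: look up control values and split them into EX/M/WB groups."""
    values = CONTROL[instr_class]
    return {stage: {name: values[name] for name in names}
            for stage, names in GROUPS.items()}


def control_bits(register: str) -> int:
    """Number of control bits a pipeline register must hold."""
    return sum(BIT_WIDTH.get(name, 1)
               for stage in CARRIED[register] for name in GROUPS[stage])


if __name__ == "__main__":
    print(decode("lw"))
    for reg in CARRIED:
        print(reg, control_bits(reg), "control bits")
```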
Pipelined Processor: Datapath + Control
• More work is needed to correctly handle pipeline hazards
[Figure: pipelined datapath with the control unit in ID; the EX, M, and WB control fields travel through ID/EX, EX/MEM, and MEM/WB; signals shown include PCSrc, Branch, RegWrite, MemWrite, MemRead, MemToReg, ALUSrc, ALUOp, ALU control, and RegDst, with Instruction [15–0] feeding sign extend and Instruction [20–16] and [15–11] feeding the RegDst mux]
Recap
° If all pipeline stages can be kept busy, the pipeline can retire (complete) up to one instruction per clock cycle, thereby achieving single-cycle throughput
° The pipeline paradox (for MIPS): any instruction still takes 5 cycles to execute, even though the pipeline can retire one instruction per cycle
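Both recap claims can be checked with a small cycle count. This sketch assumes an ideal pipeline with no stalls and, for the non-pipelined baseline, one clock cycle of the same length per step.

```python
def pipelined_cycles(n_instructions: int, depth: int = 5) -> int:
    """Ideal pipeline, no stalls: the first instruction needs `depth` cycles
    to fill the pipeline, then one instruction retires every cycle."""
    return depth + n_instructions - 1


def unpipelined_cycles(n_instructions: int, depth: int = 5) -> int:
    """Baseline: each instruction runs all `depth` steps before the next starts."""
    return depth * n_instructions


if __name__ == "__main__":
    # Latency: a single instruction still takes 5 cycles either way.
    print(pipelined_cycles(1), unpipelined_cycles(1))     # 5 5
    # Throughput: over a long run, cycles per instruction approaches 1.
    n = 1000
    print(pipelined_cycles(n), pipelined_cycles(n) / n)    # 1004 1.004
    print(unpipelined_cycles(n))                           # 5000
```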
Problems for Pipelining
° Hazards prevent the next instruction from executing during its designated clock cycle, limiting speedup
• Structural hazards: the hardware cannot support this combination of instructions (e.g., a single memory for instruction and data)
• Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
• Control hazards: conditional branches and other instructions may stall the pipeline, delaying later instructions
Single Memory is a Structural Hazard
[Figure: pipeline diagram, time (clock cycles) across, instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) down; each instruction uses M, Reg, ALU, M, Reg in successive cycles, so the Load's data-memory access falls in the same cycle as Instr 3's instruction fetch]
• Can't read the same memory twice in the same clock cycle
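To locate the conflict, the sketch below records which cycle each instruction touches the single memory. It assumes instruction i (0-based) is fetched in cycle i + 1 with no stalls, and that a load or store accesses data memory in its MEM stage, 3 cycles after its fetch. The instruction names mirror the figure.

```python
def memory_conflicts(program):
    """Map each clock cycle in which a single-ported memory is asked for more
    than one access to the instructions competing for it.

    `program` is a list of (name, accesses_data_memory) pairs; instruction i
    (0-based) is fetched in cycle i + 1 and, if it is a load or store,
    accesses data memory in its MEM stage, cycle i + 4."""
    users_by_cycle = {}
    for i, (name, accesses_data) in enumerate(program):
        users_by_cycle.setdefault(i + 1, []).append(name)        # IF: fetch
        if accesses_data:
            users_by_cycle.setdefault(i + 4, []).append(name)    # MEM: data
    return {cycle: users for cycle, users in sorted(users_by_cycle.items())
            if len(users) > 1}


PROGRAM = [("Load", True), ("Instr 1", False), ("Instr 2", False),
           ("Instr 3", False), ("Instr 4", False)]

if __name__ == "__main__":
    print(memory_conflicts(PROGRAM))   # {4: ['Load', 'Instr 3']}
```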
EX: MIPS multicycle datapath: Structural Hazard in Memory
[Figure: multicycle datapath in which one Memory holds both instructions and data, feeding the Instruction Register and the Memory Data Register; also shown: PC, Registers, A, B, ALU, and ALUOut]
Structural Hazards Limit Performance
° Example: if there are 1.3 memory accesses per instruction (every instruction is fetched, and 30% of instructions also execute loads and stores) and only one memory access per cycle, then
• Average CPI ≥ 1.3
• Otherwise the datapath resource is more than 100% utilized
Structural Hazard Solution: Add more hardware
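The CPI bound follows from counting memory accesses: 1.3 accesses per instruction must share one access per cycle. In this small sketch the `ports` parameter is an illustrative addition showing the effect of adding hardware.

```python
def min_cpi(accesses_per_instr: float, ports: int = 1) -> float:
    """Lower bound on average CPI when memory serves at most `ports` accesses
    per cycle: CPI >= accesses per instruction / ports, and never below 1."""
    return max(1.0, accesses_per_instr / ports)


def utilization_at_cpi_1(accesses_per_instr: float, ports: int = 1) -> float:
    """Fraction of memory bandwidth demanded if the pipeline ran at CPI = 1."""
    return accesses_per_instr / ports


accesses = 1 + 0.30   # 1 instruction fetch + 0.30 data accesses per instruction

if __name__ == "__main__":
    print(min_cpi(accesses))                # 1.3 with one memory port
    print(utilization_at_cpi_1(accesses))   # 1.3, i.e. 130% utilization
    print(min_cpi(accesses, ports=2))       # 1.0 with a second port
```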
Speed Up Equation for Pipelining

$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}}$$

For Ideal CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}}$$
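Example: Dual-port vs. Single-port
° Machine A: Dual-ported memory
° Machine B: Single-ported memory, but its pipelined implementation has a 1.05 times faster clock rate
° Ideal CPI = 1 for both
° Loads are 40% of instructions executed (on Machine B, each load stalls the pipeline 1 cycle)

$$\text{SpeedUp}_A = \frac{\text{Pipeline Depth}}{1 + 0} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{pipe}}} = \text{Pipeline Depth}$$

$$\text{SpeedUp}_B = \frac{\text{Pipeline Depth}}{1 + 0.4 \times 1} \times \frac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{unpipe}} / 1.05} = \frac{\text{Pipeline Depth}}{1.4} \times 1.05 = 0.75 \times \text{Pipeline Depth}$$

$$\frac{\text{SpeedUp}_A}{\text{SpeedUp}_B} = \frac{\text{Pipeline Depth}}{0.75 \times \text{Pipeline Depth}} \approx 1.33$$

° Machine A is 1.33 times faster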
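The example can be reproduced numerically with the general speedup formula. The pipeline depth of 5 and the unit clock cycle are hypothetical choices; the A/B ratio does not depend on either.

```python
def pipeline_speedup(depth, stall_cpi, clock_unpipelined, clock_pipelined,
                     ideal_cpi=1.0):
    """Speedup = (Ideal CPI x depth) / (Ideal CPI + stall CPI)
                 x (Clock Cycle_unpipelined / Clock Cycle_pipelined)."""
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi) * (
        clock_unpipelined / clock_pipelined)


DEPTH = 5     # hypothetical pipeline depth
CYCLE = 1.0   # unpipelined clock cycle time, arbitrary units

# Machine A: dual-ported memory, no structural stalls, same cycle time.
speedup_a = pipeline_speedup(DEPTH, 0.0, CYCLE, CYCLE)
# Machine B: 40% loads x 1 stall cycle each, clock rate 1.05x faster
# (so its cycle time is CYCLE / 1.05).
speedup_b = pipeline_speedup(DEPTH, 0.4 * 1, CYCLE, CYCLE / 1.05)

if __name__ == "__main__":
    print(speedup_a, round(speedup_b, 2), round(speedup_a / speedup_b, 2))
    # 5.0 3.75 1.33
```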
Data Hazard on Register $1 (6.4)
add $1,$2,$3
sub $4,$1,$3
and $6,$1,$7
or $8,$1,$9
xor $10,$1,$11
Data Hazard Solution:
• “Forward” the result from one stage to another
[Figure: pipeline diagram, time (clock cycles) across with stages IF, ID/RF, EX, MEM, WB (drawn as IM, Reg, ALU, DM, Reg), for add $1,$2,$3; sub $4,$1,$3; and $6,$1,$7; or $8,$1,$9; xor $10,$1,$11; the value of $1 produced by add is forwarded to the later instructions that read it]
• “or” is OK if the register file is implemented properly (written in the first half of the clock cycle and read in the second half)
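The timing behind this slide can be checked pair by pair. The sketch assumes instruction i (0-based) is in stage s (IF = 1 ... WB = 5) during cycle i + s, a split-cycle register file (write in the first half of WB, read in the second half of ID), and, with forwarding, an ALU result that is ready at the end of EX and needed at the start of the consumer's EX. The tiny parser is a hypothetical helper that handles only three-register text like the slide's.

```python
IF, ID, EX, MEM, WB = 1, 2, 3, 4, 5   # instruction i is in stage s during cycle i + s


def parse(instr: str):
    """Split R-format text such as 'sub $4,$1,$3' into (dest, [sources])."""
    _, operands = instr.split(maxsplit=1)
    dest, *sources = [reg.strip() for reg in operands.split(",")]
    return dest, sources


def classify(program):
    """For each register an instruction reads from an earlier instruction, report
    (instr, reg, distance, hazard_without_forwarding, hazard_with_forwarding)."""
    report = []
    for j, instr in enumerate(program):
        _, sources = parse(instr)
        for reg in dict.fromkeys(sources):         # each source once, in order
            for i in range(j - 1, -1, -1):         # most recent earlier writer
                if parse(program[i])[0] == reg:
                    # No forwarding: written in cycle i+WB (first half), read in
                    # cycle j+ID (second half); reading in an earlier cycle is a hazard.
                    no_fwd = j + ID < i + WB
                    # Forwarding: ready at end of cycle i+EX, needed at start of
                    # cycle j+EX; a hazard only if it is not ready strictly earlier.
                    fwd = not (i + EX < j + EX)
                    report.append((instr, reg, j - i, no_fwd, fwd))
                    break
    return report


PROGRAM = ["add $1,$2,$3", "sub $4,$1,$3", "and $6,$1,$7",
           "or $8,$1,$9", "xor $10,$1,$11"]

if __name__ == "__main__":
    for row in classify(PROGRAM):
        print(row)
```

Under these assumptions, sub and and are hazards without forwarding, or is safe thanks to the split-cycle register file, and no pair is a hazard once ALU results are forwarded.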
Hazard Detection for Forwarding
° A hazard must be detected just before execution so that, in case of a hazard, the data can be forwarded to the input of the ALU.
° It can be detected when a source register (Rs, Rt, or both) of the instruction in the EX stage is equal to the destination register (Rd) of an earlier instruction still in the pipeline (in either the MEM or WB stage).
° Compare the values of the Rs and Rt registers in the ID/EX stage with Rd in the EX/MEM and MEM/WB stages => need to carry the Rs, Rt, and Rd values from the IF/ID register to the ID/EX register (only Rd was carried before).
° If they match, forward the data to the input of the ALU through the multiplexor.

See Fig. 6.43, p. 488 of the text.
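The comparison rules above can be written as a small forwarding-unit function. This sketch adds two refinements assumed from the textbook's forwarding unit rather than stated on the slide: forward only if the earlier instruction actually writes a register (RegWrite), and never for register $0. The EX/MEM check comes first so the most recent result wins. Here `rd` stands for whatever destination register number the instruction carries down the pipeline; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IdEx:
    """Source register numbers of the instruction about to execute."""
    rs: int
    rt: int


@dataclass
class Writer:
    """Fields of EX/MEM or MEM/WB that the forwarding unit inspects."""
    reg_write: bool
    rd: int


def select(src: int, ex_mem: Writer, mem_wb: Writer) -> str:
    """Pick the ALU-input mux source for one operand register."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "EX/MEM"   # result of the instruction one ahead
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "MEM/WB"   # result of the instruction two ahead
    return "ID/EX"        # no match: use the value read from the register file


def forwarding_unit(id_ex: IdEx, ex_mem: Writer, mem_wb: Writer):
    """Mux selections for the (Rs, Rt) ALU inputs."""
    return select(id_ex.rs, ex_mem, mem_wb), select(id_ex.rt, ex_mem, mem_wb)


if __name__ == "__main__":
    # sub $4,$1,$3 in EX while add $1,$2,$3 is in MEM:
    print(forwarding_unit(IdEx(1, 3), Writer(True, 1), Writer(False, 0)))
    # and $6,$1,$7 in EX, sub $4,... in MEM, add $1,... in WB:
    print(forwarding_unit(IdEx(1, 7), Writer(True, 4), Writer(True, 1)))
```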
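Forwarding: What about Loads?
• Dependencies backward in time are hazards
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) for lw $1,0($2) followed by sub $4,$1,$3; the loaded value leaves data memory at the end of lw's MEM stage, the same cycle in which sub's EX stage already needs it]
• Can't solve with forwarding alone
• Must stall the instruction dependent on the load
• “Load-Use” hazard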
Data Hazard Even with Forwarding
• Must stall the pipeline 1 cycle (insert 1 bubble)
[Figure: pipeline diagram, time (clock cycles) across with stages IF, ID/RF, EX, MEM, WB, for lw $1, 0($2); sub $4,$1,$6; and $6,$1,$7; or $8,$1,$9; a bubble is inserted after lw, delaying sub and each later instruction by one cycle]
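The stall decision can be sketched as a check between the load in EX and the instruction being decoded. This assumes full forwarding otherwise, so only an immediately following use of the loaded register stalls, for exactly one bubble. The `Instr` record is hypothetical; registers are plain integers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    text: str
    is_load: bool               # MemRead set (lw)
    dest: Optional[int]         # register written, or None
    sources: Tuple[int, ...]    # registers read


def load_use_stall(in_ex: Instr, in_id: Instr) -> bool:
    """Stall if the instruction in EX is a load whose destination register
    is read by the instruction currently being decoded."""
    return in_ex.is_load and in_ex.dest is not None and in_ex.dest in in_id.sources


def cycles_with_bubbles(program, depth: int = 5):
    """Total cycles and bubbles for `program`, one bubble per load-use pair."""
    bubbles = sum(load_use_stall(a, b) for a, b in zip(program, program[1:]))
    return depth + len(program) - 1 + bubbles, bubbles


PROGRAM = [
    Instr("lw $1, 0($2)", True, 1, (2,)),
    Instr("sub $4,$1,$6", False, 4, (1, 6)),
    Instr("and $6,$1,$7", False, 6, (1, 7)),
    Instr("or $8,$1,$9", False, 8, (1, 9)),
]

if __name__ == "__main__":
    print(cycles_with_bubbles(PROGRAM))   # (9, 1)
```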
Compiler Schemes to Improve Load Delay
° The compiler detects the data dependency and inserts nop instructions until the data is available:
sub $2, $1, $3
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
° The compiler finds independent instructions to fill in the delay slots
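A nop-inserting pass for the load delay might look like the sketch below: after each load, if the next instruction reads the loaded register, emit a nop. It assumes the one-cycle load delay of the previous slide (full forwarding otherwise) and a hypothetical tuple format (text, is_load, dest, sources); the example program is illustrative, not from the slides.

```python
def insert_load_nops(program):
    """Return instruction text with a nop after every load whose destination
    is a source of the immediately following instruction."""
    out = []
    for k, (text, is_load, dest, _) in enumerate(program):
        out.append(text)
        if is_load and k + 1 < len(program) and dest in program[k + 1][3]:
            out.append("nop")   # fill the load delay slot
    return out


EXAMPLE = [
    ("lw $2, 0($1)", True, 2, (1,)),
    ("and $12, $2, $5", False, 12, (2, 5)),
    ("or $13, $6, $2", False, 13, (6, 2)),
]

if __name__ == "__main__":
    print("\n".join(insert_load_nops(EXAMPLE)))
```

Filling the slot with an independent instruction instead of a nop, as in the second bullet, is what the next slide does by hand.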
Software Scheduling to Avoid Load Hazards
Try producing fast code for
a = b + c;
d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:        Fast code:
LW Rb,b           LW Rb,b
LW Rc,c           LW Rc,c
ADD Ra,Rb,Rc      LW Re,e
SW a,Ra           ADD Ra,Rb,Rc
LW Re,e           LW Rf,f
LW Rf,f           SW a,Ra
SUB Rd,Re,Rf      SUB Rd,Re,Rf
SW d,Rd           SW d,Rd
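Counting load-use stalls shows why the reordered code is faster. The sketch assumes full forwarding, a one-bubble stall whenever an instruction uses the register loaded by the instruction just before it, and a 5-stage pipeline needing 4 extra cycles to fill. Each instruction is written as (opcode, destination register, source registers), with memory labels omitted.

```python
SLOW = [("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
        ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
        ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",))]
FAST = [("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
        ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
        ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",))]


def load_use_stalls(code):
    """One stall per LW whose destination is read by the next instruction."""
    return sum(1 for (op, dest, _), (_, _, srcs) in zip(code, code[1:])
               if op == "LW" and dest in srcs)


def total_cycles(code, depth=5):
    """Fill time (depth - 1) + one cycle per instruction + stall cycles."""
    return depth - 1 + len(code) + load_use_stalls(code)


if __name__ == "__main__":
    for name, code in (("slow", SLOW), ("fast", FAST)):
        print(name, load_use_stalls(code), total_cycles(code))
    # slow 2 14
    # fast 0 12
```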
