Lecture 13 Pipelining
Datapath Output
• Today’s Topics:
– Pipelining by Analogy
– Introduction to MIPS pipelining
Copyright 1997 UCB
Pipelining is Natural!
Laundry Example
A B C D
• Ann, Brian, Cathy, Dave
each have one load of clothes
to wash, dry, and fold
Sequential Laundry
[Figure: sequential laundry timeline, 6 PM to 2 AM. Loads A, B, C, D run one after another in task order, filling sixteen 30-minute slots.]
• Sequential laundry takes 8 hours for 4 loads
• If they learned pipelining, how long would laundry take?
[Figure: pipelined laundry timeline. Loads A, B, C, D overlap in task order and finish within seven 30-minute slots.]
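The two timelines can be checked with a little arithmetic. The sketch below assumes four stages of 30 minutes each, which is what the sixteen slots in the sequential chart imply for four loads; the function and constant names are illustrative and not from the slides.

```python
def sequential_minutes(loads, stages, slot):
    """Each load finishes all of its stages before the next load starts."""
    return loads * stages * slot


def pipelined_minutes(loads, stages, slot):
    """The first load fills the pipeline; every later load finishes one slot after it."""
    return (stages + loads - 1) * slot


# Assumed from the charts: 4 loads, 4 stages, 30 minutes per stage.
LOADS, STAGES, SLOT = 4, 4, 30

seq = sequential_minutes(LOADS, STAGES, SLOT)
pipe = pipelined_minutes(LOADS, STAGES, SLOT)
print(seq / 60, "hours sequential")
print(pipe / 60, "hours pipelined")
```

Under these assumptions the sequential schedule needs 480 minutes and the pipelined one 210 minutes, matching the 8 hours and the seven 30-minute slots in the charts.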
Pipelining Lessons
[Figure: pipelined laundry timeline starting at 6 PM. Loads A, B, C, D in task order across seven 30-minute slots.]
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Multiple tasks operate simultaneously using different resources
• Potential speedup = number of pipe stages
• Pipeline rate is limited by the slowest pipeline stage
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” the pipeline and time to “drain” it reduce speedup
• Stall for dependences
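A rough way to see several of these lessons at once is to compute the speedup for different workloads. This is a minimal sketch, assuming no stalls and a pipeline clocked at its slowest stage; the unbalanced stage times are hypothetical values chosen only to have the same total as the balanced ones.

```python
def sequential_time(tasks, stage_times):
    """Every task runs all stages to completion before the next task starts."""
    return tasks * sum(stage_times)


def pipeline_time(tasks, stage_times):
    """All stages advance in lockstep, so the slowest stage sets the pace."""
    cycle = max(stage_times)
    return (len(stage_times) + tasks - 1) * cycle


def speedup(tasks, stage_times):
    return sequential_time(tasks, stage_times) / pipeline_time(tasks, stage_times)


balanced = [30, 30, 30, 30]
unbalanced = [30, 40, 20, 30]  # hypothetical: same total work, uneven stages

for n in (4, 100, 10_000):
    print(n, round(speedup(n, balanced), 2), round(speedup(n, unbalanced), 2))
```

With few tasks, fill and drain time keeps the speedup well below the stage count. With many tasks, the balanced pipeline approaches a speedup of 4 (the number of stages), while the unbalanced one is capped near 120/40 = 3 because the 40-minute stage limits the rate.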
Pipelining is an implementation technique in which multiple instructions are overlapped in execution.
The Five Stages of Load
        Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
Load    Ifetch    Reg/Dec   Exec      Mem       Wr
Pipelining
Single-cycle design must take the worst-case clock cycle of 800 ps, even though some instructions can be as fast as 500 ps. Similarly, the pipelined execution clock cycle must have the worst-case clock cycle of 200 ps, even though some stages take only 100 ps.
We assume the write to the register file occurs in the first half of the clock cycle and the read from the register file occurs in the second half.
[Figure: single-cycle (top) versus pipelined (bottom) execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), listed in program execution order. Each instruction uses Instruction fetch, Reg, ALU, Data access, Reg. Top: a new instruction starts every 8 ns. Bottom: a new instruction starts every 2 ns. Time axis marked 2 to 14.]
Both use the same hardware components. We see a fourfold speed-up on average time between instructions, from 800 ps down to 200 ps. (The figure is labeled in ns and the text in ps; the 4:1 ratio is the same in both.)
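The clock-cycle figures quoted above can be reproduced from per-stage latencies. The sketch below assumes a particular breakdown for lw (200 ps for instruction fetch, ALU, and data access; 100 ps for each register file access). Those individual values are an assumption chosen to agree with the 800 ps, 200 ps, and 100 ps figures in the text, not numbers taken from the slides.

```python
# Assumed lw stage latencies in ps (illustrative; consistent with the text above).
LW_STAGES = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Single-cycle: one clock must fit the slowest whole instruction.
single_cycle_clock = sum(LW_STAGES.values())
# Pipelined: one clock must fit the slowest single stage.
pipelined_clock = max(LW_STAGES.values())


def total_time_single_cycle(n):
    """n instructions, each taking one long clock cycle."""
    return n * single_cycle_clock


def total_time_pipelined(n):
    """Fill the pipeline once, then finish one instruction per clock."""
    return (len(LW_STAGES) + n - 1) * pipelined_clock


n = 3  # the three lw instructions in the figure
print(single_cycle_clock, pipelined_clock)
print(total_time_single_cycle(n), total_time_pipelined(n))
```

The time between instructions drops by a factor of four, but for only three instructions the total time falls from 2400 ps to 1400 ps, because filling the pipeline is not free. The fourfold figure describes the steady-state rate, not short programs.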
Pipelined Datapath
Basic Idea
Two exceptions to this left-to-right flow of instructions:
■ The write-back stage, which places the result back into the
register file in the middle of the datapath
■ The selection of the next value of the PC, choosing between
the incremented PC and the branch address from the MEM stage
IF: Instruction fetch
ID: Instruction decode/register file read
EX: Execute/address calculation
MEM: Memory access
WB: Write back
[Figure: single-cycle datapath divided into the five stages above. Components shown: PC, instruction memory, adders (PC + 4, and branch target using shift left 2), register file, sign extend (16 to 32 bits), ALU with Zero output, data memory, and multiplexers.]
Pipelined Datapath
[Figure: the pipelined version of the datapath, with the pipeline registers highlighted.]
- Every pipeline stage must have some registers to store the data produced in that stage.
- So we must place registers wherever there are dividing lines between stages. Returning to our laundry analogy, we might have a basket between each pair of stages to hold the clothes for the next step.
- All instructions advance during each clock cycle from one pipeline register to the next. The registers are named for the two stages separated by that register. For example, the pipeline register between the IF and ID stages is called IF/ID.
Can you find a problem even if there are no dependencies?
What instructions can we execute to manifest the problem?
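The naming rule and the one-register-per-cycle movement can be sketched as a toy simulation. This is only an illustration, assuming an ideal pipeline with no stalls; each pipeline register simply holds the label of the instruction it carries, and the `run` helper is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
# One pipeline register per dividing line, named for the two stages it separates.
PIPELINE_REGS = [f"{a}/{b}" for a, b in zip(STAGES, STAGES[1:])]


def run(program):
    """Yield (cycle, register contents at the end of that cycle)."""
    regs = {name: None for name in PIPELINE_REGS}
    pending = list(program)
    cycle = 0
    while pending or any(v is not None for v in regs.values()):
        cycle += 1
        # Update from right to left so each value is copied before it is overwritten.
        for dst, src in zip(reversed(PIPELINE_REGS), reversed(PIPELINE_REGS[:-1])):
            regs[dst] = regs[src]
        # The IF stage loads the next instruction (or a bubble) into IF/ID.
        regs[PIPELINE_REGS[0]] = pending.pop(0) if pending else None
        yield cycle, dict(regs)


for cycle, regs in run(["lw $1", "lw $2", "lw $3"]):
    print(cycle, regs)
```

Three instructions drain after 5 + 3 - 1 = 7 cycles. The last cycle shows all registers empty because the final instruction is in WB, which has no pipeline register after it.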
Corrected Datapath
[Figure: corrected pipelined datapath.]
Graphically Representing Pipelines
Why Pipeline?
Why Pipeline? Because the resources are there!
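[Figure: instructions in program order (top to bottom) against time in clock cycles (left to right). Each instruction uses Im, Reg, ALU, Dm, Reg in turn.]
         Time (clock cycles)
Inst 0   Im   Reg  ALU  Dm   Reg
Inst 1        Im   Reg  ALU  Dm   Reg
Inst 2             Im   Reg  ALU  Dm   Reg
Inst 3                  Im   Reg  ALU  Dm   Reg
Inst 4                       Im   Reg  ALU  Dm   Reg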
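A diagram like this is easy to generate, and reading one clock cycle as a column shows which resources are busy at once. The sketch below is illustrative only; the helper names and the column width are assumptions, and it models an ideal pipeline with no stalls.

```python
STAGES = ["Im", "Reg", "ALU", "Dm", "Reg"]  # resource used in each of the five stages


def pipeline_diagram(num_instructions, width=4):
    """Return one text row per instruction, each shifted right by one cycle."""
    rows = []
    for i in range(num_instructions):
        cells = [""] * i + STAGES
        row = f"Inst {i}  " + "".join(cell.ljust(width) for cell in cells)
        rows.append(row.rstrip())
    return rows


def busy_resources(num_instructions, cycle):
    """Resources in use during a 1-based clock cycle, oldest instruction first."""
    return [STAGES[cycle - 1 - i]
            for i in range(num_instructions)
            if 0 <= cycle - 1 - i < len(STAGES)]


print("\n".join(pipeline_diagram(5)))
print(busy_resources(5, 5))
```

In cycle 5 every resource is in use by a different instruction, which is the point of the slide title. Reg appears twice in that cycle (a write-back and a read), which works under the assumption stated earlier that the register file is written in the first half of the cycle and read in the second.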