Lecture 13 Pipelining

The document discusses the concept of pipelining in computer architecture, illustrating its benefits through a laundry analogy that demonstrates how tasks can be overlapped to improve throughput. It outlines the five stages of MIPS instruction execution and the challenges posed by pipeline hazards, including structural, data, and control hazards. The document emphasizes that while pipelining can significantly enhance performance, it requires careful management of these hazards to achieve optimal results.

Uploaded by

Shibly Sarkar

The Big Picture: Where are We Now?

• The Five Classic Components of a Computer
– Processor (Control and Datapath)
– Memory
– Input
– Output

• Today’s Topics:
– Pipelining by Analogy
– Introduction to MIPS pipelining
Copyright 1997 UCB

Pipelining is Natural!
Laundry Example

• Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away

• Washer takes 30 minutes

• Dryer takes 30 minutes

• “Folder” takes 30 minutes

• “Stasher” takes 30 minutes to put clothes into drawers

Sequential Laundry

[Figure: timeline from 6 PM to 2 AM in 30-minute slots; loads A, B, C, D listed in task order, each run through all four steps before the next load starts.]

• Sequential laundry takes 8 hours for 4 loads
• If they learned pipelining, how long would laundry take?


Pipelined Laundry: Start work ASAP

[Figure: timeline starting at 6 PM with seven 30-minute slots; loads A, B, C, D listed in task order, each starting one slot after the previous load.]

• Pipelined laundry takes 3.5 hours for 4 loads!
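The two laundry totals can be checked with a short sketch. The stage names and the 30-minute duration come from the example above; the function names and the "minutes after 6 PM" convention are assumptions made for illustration.

```python
# Stage names and the 30-minute duration come from the laundry example;
# the function names are hypothetical.
STAGES = ["wash", "dry", "fold", "stash"]
STAGE_MINUTES = 30


def sequential_minutes(loads):
    """Each load finishes all stages before the next load starts."""
    return len(loads) * len(STAGES) * STAGE_MINUTES


def pipelined_schedule(loads):
    """Map each load to (stage, start, end) tuples, in minutes after 6 PM.

    Load i enters the washer i slots after the first load, so each
    machine handles at most one load per slot.
    """
    return {
        load: [
            (stage, (i + s) * STAGE_MINUTES, (i + s + 1) * STAGE_MINUTES)
            for s, stage in enumerate(STAGES)
        ]
        for i, load in enumerate(loads)
    }


def finish_minute(schedule):
    """Minute at which the last stage of the last load ends."""
    return max(end for steps in schedule.values() for _, _, end in steps)


if __name__ == "__main__":
    loads = ["A", "B", "C", "D"]
    print(sequential_minutes(loads) / 60)                 # hours, sequential
    print(finish_minute(pipelined_schedule(loads)) / 60)  # hours, pipelined
```

Under these assumptions the sequential total is 480 minutes (8 hours) and the pipelined total is 210 minutes (3.5 hours), matching the two slides.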


Pipelining Lessons

[Figure: pipelined laundry timeline starting at 6 PM; loads A, B, C, D overlapped in 30-minute slots.]

• Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload
• Multiple tasks operate simultaneously using different resources
• Potential speedup = Number of pipe stages
• Pipeline rate is limited by the slowest pipeline stage
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” the pipeline and time to “drain” it reduce speedup
• Stall for Dependences
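The lessons about the slowest stage and about fill and drain time can be made concrete with a small model. This is a sketch under simplifying assumptions: no stalls, one task entering per cycle, and a cycle time equal to the slowest stage. The unbalanced stage times are hypothetical and were chosen to keep the same total work as the balanced case.

```python
def sequential_time(stage_times, tasks):
    """Every task runs all stages back to back, with no overlap."""
    return sum(stage_times) * tasks


def pipelined_time(stage_times, tasks):
    """All stages advance in lockstep, so the slowest stage sets the cycle.

    The first task needs len(stage_times) cycles to pass through (fill);
    each later task completes one cycle after the one before it.
    """
    cycle = max(stage_times)
    return (len(stage_times) + tasks - 1) * cycle


def speedup(stage_times, tasks):
    return sequential_time(stage_times, tasks) / pipelined_time(stage_times, tasks)


if __name__ == "__main__":
    balanced = [30, 30, 30, 30]
    unbalanced = [30, 40, 30, 20]  # hypothetical; same total work as balanced
    for tasks in (4, 1000):
        print(tasks,
              round(speedup(balanced, tasks), 2),
              round(speedup(unbalanced, tasks), 2))
```

With four tasks the balanced pipeline reaches a speedup of only about 2.3 because of fill and drain time. With 1000 tasks it approaches the stage count of 4, while the unbalanced pipeline stays below 3 because its 40-minute stage sets the pace.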


Pipelining is an implementation technique in which multiple instructions are overlapped in execution.

The Five Stages of Load

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
Load    Ifetch   Reg/Dec  Exec     Mem      Wr

MIPS instructions classically take five steps:

• Ifetch: Instruction Fetch
– Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec: Calculate the memory address
• Mem: Read the data from the Data Memory
• Wr: Write the data back to the register file

Pipelining

We assume the write to the register file occurs in the first half of the clock cycle and the read from the register file occurs in the second half.

The single-cycle design must use the worst-case clock cycle of 800 ps, even though some instructions can be as fast as 500 ps. Similarly, the pipelined execution must use the worst-case clock cycle of 200 ps, even though some stages take only 100 ps.

• Improve performance by increasing instruction throughput


[FIGURE 4.27: Single-cycle, nonpipelined execution (top) versus pipelined (multi-cycled) execution (bottom) of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg. As labeled on the figure's time axis, successive instructions start 8 ns apart in the top panel and 2 ns apart in the bottom panel.]

Both use the same hardware components. We see a fourfold speed-up on average time between instructions, from 800 ps down to 200 ps.
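The 800 ps and 200 ps clock periods can be derived from per-stage latencies. The sketch below assumes the latencies 200, 100, 200, 200, and 100 ps. The slides state only the 800 ps total, the 200 ps worst-case stage, and that some stages take 100 ps, so the exact split, the stage labels, and the function names are assumptions.

```python
# Assumed per-stage latencies in picoseconds. The slides give only the
# 800 ps total, the 200 ps worst case, and "some stages take only 100 ps";
# this particular split is an assumption.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_clock_ps():
    """One long cycle must cover every stage of the slowest instruction."""
    return sum(STAGE_PS.values())


def pipelined_clock_ps():
    """Every stage gets one cycle, so the slowest stage sets the clock."""
    return max(STAGE_PS.values())


def single_cycle_total_ps(instructions):
    return instructions * single_cycle_clock_ps()


def pipelined_total_ps(instructions):
    """Fill the pipeline once, then finish one instruction per cycle."""
    return (len(STAGE_PS) + instructions - 1) * pipelined_clock_ps()


if __name__ == "__main__":
    print(single_cycle_clock_ps(), pipelined_clock_ps())
    print(single_cycle_total_ps(3), pipelined_total_ps(3))
```

For three loads this gives 2400 ps versus 1400 ps, a ratio of about 1.7. The fourfold figure is the ratio of the clock periods, and total time approaches it only for long instruction sequences. It is fourfold rather than fivefold because the stages are not balanced.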

Ideal speedup is number of stages in the pipeline. Do we achieve this?


No, due to hazards.

Pipelining

• What makes it easy?


– all instructions are the same length
– just a few instruction formats
– memory operands appear only in loads and stores

• What makes it hard?


– structural hazards: suppose we had only one memory
– control hazards: need to worry about branch instructions
– data hazards: an instruction depends on a previous instruction

• We’ll build a simple pipeline and look at these issues

Pipelined Datapath
Basic Idea
Two exceptions to this left-to-right flow of instructions:
■ The write-back stage, which places the result back into the
register file in the middle of the datapath
■ The selection of the next value of the PC, choosing between
the incremented PC and the branch address from the MEM stage

[Figure: the single-cycle datapath divided into five stages. IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back. Components shown: PC, instruction memory, adders (PC + 4, and branch target using shift left 2), register file, sign extend (16 to 32 bits), ALU with Zero output, data memory, and multiplexers.]

• What do we need to add to actually split the datapath into stages?

Pipelined Datapath

[Figure 4.35: The pipelined datapath with the pipeline registers highlighted. The registers IF/ID, ID/EX, EX/MEM, and MEM/WB sit between the five stages.]

- Every pipeline stage must have some registers to store the data produced in that stage.
- So we must place registers wherever there are dividing lines between stages. Returning to our laundry analogy, we might have a basket between each pair of stages to hold the clothes for the next step.
- All instructions advance during each clock cycle from one pipeline register to the next. The registers are named for the two stages separated by that register. For example, the pipeline register between the IF and ID stages is called IF/ID.

Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?
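The way instructions advance from one pipeline register to the next on each clock can be sketched as a shift through four named registers. This is a simplified model, not the datapath itself: each register holds only an instruction label rather than the data and control values a real pipeline register stores, and the function names are hypothetical.

```python
# Register names follow the slide; storing a bare instruction label in
# each register is a simplification made for illustration.
PIPELINE_REGISTERS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]


def empty_pipeline():
    """All pipeline registers start out empty."""
    return {name: None for name in PIPELINE_REGISTERS}


def clock_tick(registers, fetched):
    """Return the register contents after one clock edge.

    The newly fetched instruction lands in IF/ID, and each downstream
    register takes what its upstream neighbour held before the edge.
    """
    nxt = {"IF/ID": fetched}
    for upstream, downstream in zip(PIPELINE_REGISTERS, PIPELINE_REGISTERS[1:]):
        nxt[downstream] = registers[upstream]
    return nxt


if __name__ == "__main__":
    regs = empty_pipeline()
    for inst in ["lw", "sub", "add"]:
        regs = clock_tick(regs, inst)
        print(regs)
```

Building a new dictionary on each tick mirrors the hardware behaviour in which all pipeline registers update simultaneously on the clock edge, so no register sees a value written in the same cycle.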

Corrected Datapath

Advanced: See book for background study

[FIGURE 4.41: The corrected pipelined datapath to handle the load instruction properly.]

Graphically Representing Pipelines

Time (in clock cycles) runs left to right; program execution order (in instructions) runs top to bottom.

                   CC 1  CC 2  CC 3  CC 4  CC 5  CC 6
lw $10, 20($1)     IM    Reg   ALU   DM    Reg
sub $11, $2, $3          IM    Reg   ALU   DM    Reg

FIGURE 4.43 Multiple-clock-cycle pipeline diagram of two instructions

- Shows the physical resources used at each stage
- Instructions are listed in instruction execution order from top to bottom
- Clock cycles move from left to right
• Can help with answering questions like:
– how many cycles does it take to execute this code?
– what is the ALU doing during cycle 4?
– use this representation to help understand datapaths
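The questions above can be answered mechanically by building the diagram as a table. The sketch below assumes no stalls and one instruction issued per cycle. The resource names come from the figure; the function names are hypothetical.

```python
# Resources used by the five stages, in order, as named in the figure.
STAGE_RESOURCES = ["IM", "Reg", "ALU", "DM", "Reg"]


def pipeline_rows(instructions):
    """One row per instruction; cell c names the resource used in cycle c+1."""
    total_cycles = len(instructions) + len(STAGE_RESOURCES) - 1
    rows = []
    for i in range(len(instructions)):
        row = [""] * total_cycles
        for s, resource in enumerate(STAGE_RESOURCES):
            row[i + s] = resource  # instruction i enters the pipeline in cycle i+1
        rows.append(row)
    return rows


def user_of(instructions, resource, cycle):
    """Instruction using `resource` in 1-based clock cycle `cycle`, or None.

    "Reg" is used by two stages (read and write back); for "Reg" this
    returns the earliest instruction in program order that touches it.
    """
    for inst, row in zip(instructions, pipeline_rows(instructions)):
        if 0 < cycle <= len(row) and row[cycle - 1] == resource:
            return inst
    return None


def render(instructions):
    """Plain-text version of the multiple-clock-cycle diagram."""
    width = max(len(inst) for inst in instructions)
    lines = []
    for inst, row in zip(instructions, pipeline_rows(instructions)):
        cells = " ".join(f"{cell:<4}" for cell in row)
        lines.append(f"{inst:<{width}}  {cells}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    program = ["lw $10, 20($1)", "sub $11, $2, $3"]
    print(render(program))
    print(len(pipeline_rows(program)[0]))  # total clock cycles
    print(user_of(program, "ALU", 4))      # who has the ALU in cycle 4
```

For the two instructions in the figure, the code takes six cycles and the ALU in cycle 4 is working on the sub instruction.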


Conventional Pipelined Execution Representation

Time runs left to right; program flow runs top to bottom.

IFetch Dcd    Exec   Mem    WB
       IFetch Dcd    Exec   Mem    WB
              IFetch Dcd    Exec   Mem    WB
                     IFetch Dcd    Exec   Mem    WB
                            IFetch Dcd    Exec   Mem    WB
                                   IFetch Dcd    Exec   Mem    WB

CS385, Spring-99 Copyright 1997 UCB

Why Pipeline?

• Suppose we execute 100 instructions


• Single Cycle Machine
– 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
• Ideal pipelined machine
– 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
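The arithmetic on this slide can be reproduced directly. The cycle times, CPI, instruction count, and 4-cycle drain are taken from the slide; the function and parameter names are hypothetical.

```python
def single_cycle_ns(cycle_ns, instructions, cpi=1):
    """Total time on the single-cycle machine."""
    return cycle_ns * cpi * instructions


def ideal_pipelined_ns(cycle_ns, instructions, drain_cycles=4, cpi=1):
    """Total time on an ideal pipeline: one instruction completes per
    cycle, plus the extra cycles the slide counts as drain."""
    return cycle_ns * (cpi * instructions + drain_cycles)


if __name__ == "__main__":
    single = single_cycle_ns(45, 100)
    piped = ideal_pipelined_ns(10, 100)
    print(single, piped, round(single / piped, 2))
```

This gives 4500 ns and 1040 ns, a speedup of about 4.33 for 100 instructions.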

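Why Pipeline? Because the resources are there!

[Figure: Inst 0 through Inst 4 listed in instruction order against time in clock cycles. Each instruction uses Im, Reg, ALU, Dm, Reg in successive cycles and starts one cycle after the previous instruction, so different instructions use different resources in the same cycle.]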


There are situations in pipelining when the next instruction cannot execute in the following clock cycle. These events are called hazards.

Can pipelining get us into trouble?
• Yes: Pipeline Hazards
– structural hazards: attempt to use the same resource two different ways at the same time
   • E.g., combined washer/dryer would be a structural hazard, or folder busy doing something else (watching TV)
   • Solutions: waiting, adding more hardware
– data hazards: attempt to use item before it is ready
   • E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer
   • instruction depends on result of prior instruction still in the pipeline
   • Solutions: waiting, forwarding/bypassing
– control hazards: attempt to make a decision before condition is evaluated
   • E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in
   • branch instructions
   • Solutions: waiting, branch taken/not taken, multiple streams, etc.
• Can always resolve hazards by waiting
– pipeline control must detect the hazard
– take action (or delay action) to resolve hazards
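Detecting a data hazard amounts to checking whether an instruction reads a register that a recent, still-in-flight instruction writes. The sketch below assumes a five-stage pipeline without forwarding and the split register file described earlier (write in the first half of the cycle, read in the second). Under those assumptions only producers one or two instructions ahead of the consumer cause a hazard. The `Inst` type, the `window` parameter, and the example program are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Inst(NamedTuple):
    text: str              # assembly text, for display only
    dest: Optional[str]    # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]  # registers read


def raw_hazards(program, window=2):
    """Return (producer index, consumer index, register) triples.

    For each source register of each instruction, find the most recent
    writer among the previous `window` instructions. window=2 is an
    assumption tied to the split register file: a result written back in
    the first half of a cycle can be read in the second half, so a
    producer three instructions back is no longer a hazard.
    """
    hazards = []
    for j, consumer in enumerate(program):
        for src in dict.fromkeys(consumer.srcs):  # drop repeated sources
            for i in range(j - 1, max(-1, j - window - 1), -1):
                if program[i].dest == src:
                    hazards.append((i, j, src))
                    break  # only the most recent writer matters
    return hazards


if __name__ == "__main__":
    program = [
        Inst("sub $2, $1, $3", "$2", ("$1", "$3")),
        Inst("and $12, $2, $5", "$12", ("$2", "$5")),
        Inst("or $13, $6, $2", "$13", ("$6", "$2")),
        Inst("add $14, $2, $2", "$14", ("$2",)),
        Inst("sw $15, 100($2)", None, ("$15", "$2")),
    ]
    for producer, consumer, reg in raw_hazards(program):
        print(f"{program[consumer].text} needs {reg} from {program[producer].text}")
```

In this example the and and or instructions are flagged, while add and sw are far enough behind sub to read the updated register. A real hazard detection unit would then stall or forward, which this sketch does not model.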