Computer Organization and Assembly Language: Pipeline: Introduction
CSCE430/830 Pipeline
Pipelining Outline
• Introduction
– Defining Pipelining
– Pipelining Instructions
• Hazards
– Structural hazards
– Data Hazards
– Control Hazards
• Performance
• Controller implementation
What is Pipelining?
• Key idea: overlap execution of multiple instructions
The Laundry Analogy
If we do laundry sequentially...
[Figure: loads A, B, C, and D (task order, top to bottom) each take four 30-minute steps and run one after another, on a time axis from 6 PM to 2 AM.]
• Time Required: 8 hours for 4 loads
To Pipeline, We Overlap Tasks
[Figure: the four loads A through D overlapped, with a new load starting every 30 minutes, on a time axis from 6 PM to 2 AM.]
• Time Required: 3.5 Hours for 4 Loads
To Pipeline, We Overlap Tasks
[Figure: overlapped laundry schedule for loads A through D, annotated with the points below.]
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” the pipeline and time to “drain” it reduce speedup
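The laundry numbers and the observations above can be checked with a small timing model. This is an illustrative sketch, not part of the original slides: it assumes every task passes through the same stages in order, that each stage holds one task at a time, and that the pipelined schedule advances once per interval equal to the slowest stage. The function names are hypothetical.

```python
def sequential_time(n_tasks: int, stage_times: list[float]) -> float:
    """Each task runs all of its stages before the next task starts."""
    return n_tasks * sum(stage_times)


def pipelined_time(n_tasks: int, stage_times: list[float]) -> float:
    """Tasks overlap; the pipeline advances once per slowest-stage interval.

    The first task needs len(stage_times) intervals to fill the pipeline,
    and each remaining task finishes one interval after the previous one.
    """
    cycle = max(stage_times)  # rate is limited by the slowest stage
    return (len(stage_times) + n_tasks - 1) * cycle


# Laundry example: four loads, four 30-minute steps each (times in minutes).
laundry = [30, 30, 30, 30]
seq = sequential_time(4, laundry)   # 480 minutes = 8 hours
pipe = pipelined_time(4, laundry)   # 210 minutes = 3.5 hours
print(seq / 60, pipe / 60, round(seq / pipe, 2))  # 8.0 3.5 2.29
```

With only four loads the fill and drain intervals keep the speedup near 2.3; with many loads the ratio approaches the number of stages (4). Making one stage slower, such as a longer dryer step, raises `max(stage_times)` and shrinks the speedup, which is the unbalanced-stage effect listed above.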
Pipelining a Digital System
1 nanosecond = 10^-9 second
1 picosecond = 10^-12 second
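[Figure: a block of combinational logic with a 1 ns delay.]
• Separate each piece with a pipeline register
[Figure: the logic divided into pieces with a pipeline register between adjacent pieces.]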
Pipelining a Digital System
• Non-pipelined: 1 operation finishes every 1 ns
• Pipelined: 1 operation finishes every 200 ps
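As a rough check of these numbers, the sketch below computes cycle time, latency, and throughput for the 1 ns block. It assumes, as the 200 ps figure implies, that the logic is divided into five equal 200 ps pieces. The `register_overhead_ps` parameter is a hypothetical addition (0 by default) standing in for the delay a real pipeline register adds.

```python
def pipeline_stats(logic_delay_ps: float, n_stages: int,
                   register_overhead_ps: float = 0.0) -> tuple[float, float, float]:
    """Return (cycle time in ps, latency in ps, operations per second).

    Assumes the logic splits into n_stages equal pieces, each followed by
    a pipeline register that adds register_overhead_ps of delay.
    """
    cycle_ps = logic_delay_ps / n_stages + register_overhead_ps
    latency_ps = cycle_ps * n_stages   # one operation, start to finish
    throughput = 1e12 / cycle_ps       # one result per cycle; 1e12 ps per second
    return cycle_ps, latency_ps, throughput


print(pipeline_stats(1000, 1))  # non-pipelined: (1000.0, 1000.0, 1000000000.0)
print(pipeline_stats(1000, 5))  # pipelined:     (200.0, 1000.0, 5000000000.0)
```

Pipelining leaves the latency of one operation at 1 ns but delivers a result every 200 ps, a fivefold throughput gain. Any nonzero register overhead lengthens both the cycle and the latency; for example, 20 ps of overhead gives a 220 ps cycle and 1100 ps latency.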
Comments about pipelining
Pipelining a Processor
Review - Single-Cycle Processor
• What do we need to add to actually split the datapath into stages?
The Basic Pipeline For MIPS
[Figure: four instructions in instruction order, top to bottom, each passing through Ifetch, Reg, ALU, DMem, and Reg, with each instruction starting one clock cycle after the one above it.]
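The staggered pattern in this figure can be generated programmatically. This is an illustrative sketch that assumes an ideal pipeline (no hazards or stalls), one stage per clock cycle, and a new instruction issued every cycle; the stage names follow the figure, and `pipeline_diagram` is a hypothetical helper.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_diagram(n_instructions: int) -> list[list[str]]:
    """Return one row per instruction; column c holds its stage in cycle c.

    Instruction i enters Ifetch in cycle i and occupies stage s in cycle i + s.
    Empty strings mark cycles in which the instruction is not in the pipeline.
    """
    n_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * n_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(4)):
    print(f"Instr {i}: " + " ".join(f"{cell:<6}" for cell in row))
```

Reading one column top to bottom shows which instruction each piece of hardware serves in that cycle. The first `Reg` is the register-file read during decode and the second is the register write-back; because they happen in different cycles for the same instruction, several instructions can share the datapath at once.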
Pipeline example: lw
• IF: instruction fetch
• ID: instruction decode and register read
• EX: execute (the ALU computes the memory address)
• MEM: data memory access (read the word)
• WB: write the loaded word back to the register file
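To make the five steps concrete, the following sketch walks a single `lw` through IF, ID, EX, MEM, and WB, passing values between stages in dictionaries that play the role of the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. It is a toy model, not the datapath from the slides: the instruction is a pre-decoded tuple rather than a 32-bit encoding, and the register and memory contents are made up for illustration.

```python
# Toy machine state (hypothetical values chosen for illustration).
regs = [0] * 32                      # $0 is always 0 in MIPS
memory = {100: 42}                   # word stored at byte address 100
imem = {0: ("lw", 1, 0, 100)}        # lw $1, 100($0) as (op, rt, rs, offset)
pc = 0

# IF: fetch the instruction and compute PC + 4 (unused by lw afterwards).
if_id = {"instr": imem[pc], "pc4": pc + 4}

# ID: decode the fields and read the base register.
op, rt, rs, offset = if_id["instr"]
id_ex = {"rt": rt, "base": regs[rs], "imm": offset}

# EX: the ALU adds base + offset to form the effective address.
ex_mem = {"rt": id_ex["rt"], "addr": id_ex["base"] + id_ex["imm"]}

# MEM: read the data word from memory.
mem_wb = {"rt": ex_mem["rt"], "data": memory[ex_mem["addr"]]}

# WB: write the loaded value into the destination register.
if mem_wb["rt"] != 0:                # writes to $0 are ignored
    regs[mem_wb["rt"]] = mem_wb["data"]

print(regs[1])  # 42
```

Each stage reads only the dictionary produced by the stage before it, which is why the real datapath needs a pipeline register between every pair of stages: the destination register number `rt` has to travel all the way to WB alongside the data.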
Single-Cycle vs. Pipelined Execution
Non-Pipelined (800 ps per instruction; each lw starts only after the previous one finishes)
• lw $1, 100($0): Instruction Fetch, REG RD, ALU, MEM, REG WR from 0 to 800 ps
• lw $2, 200($0): Instruction Fetch, REG RD, ALU, MEM, REG WR from 800 to 1600 ps
• lw $3, 300($0): Instruction Fetch begins at 1600 ps
Pipelined (five 200 ps stages; a new lw starts every 200 ps)
• lw $1, 100($0): Instruction Fetch, REG RD, ALU, MEM, REG WR from 0 to 1000 ps
• lw $2, 200($0): Instruction Fetch, REG RD, ALU, MEM, REG WR from 200 to 1200 ps
• lw $3, 300($0): Instruction Fetch, REG RD, ALU, MEM, REG WR from 400 to 1400 ps
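The timings in the two diagrams can be reproduced with a short calculation. This sketch assumes, as the figure shows, 800 ps per instruction without pipelining and five 200 ps stages with pipelining, and it ignores hazards and pipeline-register overhead. The function names are hypothetical.

```python
def single_cycle_schedule(n: int, instr_ps: int = 800) -> list[tuple[int, int]]:
    """(start, finish) in ps; each instruction waits for the previous one."""
    return [(i * instr_ps, (i + 1) * instr_ps) for i in range(n)]


def pipelined_schedule(n: int, stage_ps: int = 200,
                       stages: int = 5) -> list[tuple[int, int]]:
    """(start, finish) in ps; a new instruction starts every stage_ps."""
    return [(i * stage_ps, i * stage_ps + stages * stage_ps) for i in range(n)]


print(single_cycle_schedule(3))  # [(0, 800), (800, 1600), (1600, 2400)]
print(pipelined_schedule(3))     # [(0, 1000), (200, 1200), (400, 1400)]
```

For three loads the pipelined version finishes at 1400 ps instead of 2400 ps. Each instruction's own latency grows from 800 ps to 1000 ps because every stage is given the full 200 ps of the slowest one, so for long instruction sequences the speedup approaches 800/200 = 4 rather than the stage count of 5.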
Speedup
• Consider the unpipelined processor introduced previously. Assume that
it has a 1 ns clock cycle and that it uses 4 cycles for ALU operations and
branches and 5 cycles for memory operations. Assume that the relative
frequencies of these operations (ALU, branch, memory) are 40%, 20%, and
40%, respectively. Suppose that, due to clock skew and setup, pipelining
the processor adds 0.2 ns of overhead to the clock. Ignoring any latency
impact, how much speedup in the instruction execution rate will we gain
from a pipeline?
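One way to work the exercise: compute the average time per instruction on the unpipelined machine as a frequency-weighted sum of cycle counts, then compare it with the pipelined clock. The sketch assumes the pipelined processor completes one instruction every 1.2 ns cycle (an ideal CPI of 1 with no stalls), which is what "ignoring any latency impact" suggests.

```python
# Instruction mix from the exercise: (relative frequency, cycles when unpipelined).
mix = {
    "alu": (0.40, 4),
    "branch": (0.20, 4),
    "memory": (0.40, 5),
}
clock_ns = 1.0        # unpipelined clock cycle
overhead_ns = 0.2     # clock skew and setup added by pipelining

# Average instruction time without pipelining: clock * sum(freq * cycles).
unpipelined_ns = clock_ns * sum(freq * cycles for freq, cycles in mix.values())

# With pipelining, one instruction completes every (longer) clock cycle.
pipelined_ns = clock_ns + overhead_ns

speedup = unpipelined_ns / pipelined_ns
print(f"{unpipelined_ns:.1f} ns / {pipelined_ns:.1f} ns = {speedup:.2f}x")
```

With these numbers the average unpipelined instruction takes 1 ns × (0.4 × 4 + 0.2 × 4 + 0.4 × 5) = 4.4 ns, so the speedup is 4.4 / 1.2, or about 3.7. The 0.2 ns of clock overhead is what keeps the result below the 4.4 that an overhead-free pipeline would give.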
Comments about Pipelining
Pipeline Hazards
Summary - Pipelining Overview
Pipelining Outline
• Introduction
– Defining Pipelining
– Pipelining Instructions
• Hazards
– Structural hazards
– Data Hazards
• Performance