Pipelining

Pipelining is an execution technique that overlaps multiple instructions in a computer's architecture to increase throughput without reducing individual instruction execution time. It divides the instruction process into stages, allowing different instructions to be processed simultaneously, which enhances overall efficiency. Pipelining is widely used in various fields, including computer processors and assembly lines, to improve performance by utilizing resources more effectively.


Computer

Organization &
Architecture

Pipelining

Del Institute of Technology
Computer Technology Program
Lecture 12
Introduction
• Pipelining is an implementation technique where multiple instructions are
overlapped in execution.
• The computer pipeline is divided into stages.
• Each stage completes a part of an instruction in parallel.
• The stages are connected one to the next to form a pipe: instructions enter at
one end, progress through the stages, and exit at the other end.
• Pipelining does not decrease the time for individual instruction execution.
Instead, it increases instruction throughput.
• The throughput of the instruction pipeline is determined by how often an
instruction exits the pipeline.
• Because the pipe stages are hooked together, all the stages must be ready to
proceed at the same time.
Traditional Concept
Assuming you’ve got:
• One washer (takes 30 minutes)
• One drier (takes 40 minutes)
• One “folder” (takes 20 minutes)
• It takes 90 minutes to wash, dry, and fold 1 load of laundry.
– How long does 4 loads take?
Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold.
Traditional Concept: Laundry System
[Gantt chart: loads A–D run one after another from 6 PM to midnight, each taking 30 min (wash), 40 min (dry), and 20 min (fold)]
• Sequential laundry takes 6 hours for 4 loads.
• If they learned pipelining, how long would laundry take?
Traditional Concept: Laundry System
[Gantt chart: loads A–D overlap, task order down the page and time across; intervals of 30, 40, 40, 40, 40, and 20 minutes]
• Pipelined laundry takes 3.5 hours for 4 loads.
Traditional Concept: Laundry System
[Gantt chart: pipelined loads A–D; intervals of 30, 40, 40, 40, 40, and 20 minutes]
• Pipelining doesn’t help the latency of a single task; it helps the
throughput of the entire workload.
• Pipeline rate is limited by the slowest pipeline stage.
• Multiple tasks operate simultaneously using different resources.
• Potential speedup = number of pipe stages.
• Unbalanced lengths of pipe stages reduce speedup.
• Time to “fill” the pipeline and time to “drain” it reduces speedup.
• Stall for dependences.
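The laundry arithmetic above can be checked with a short script. A minimal sketch (not from the slides; only the 30/40/20-minute stage times are taken from them):

```python
# Stage times in minutes: wash, dry, fold
stages = [30, 40, 20]
loads = 4

# Sequential: each load runs all three stages before the next starts.
sequential = loads * sum(stages)                      # 4 * 90 = 360 min

# Pipelined: the slowest stage (the 40-minute drier) sets the rhythm.
# The first load takes 90 min; each later load finishes one
# drier-interval (40 min) after the previous one.
pipelined = sum(stages) + (loads - 1) * max(stages)   # 90 + 3 * 40 = 210 min

print(sequential / 60)   # 6.0 hours
print(pipelined / 60)    # 3.5 hours
```

Note that the speedup here is 360/210 ≈ 1.7, not 3: the unbalanced stage lengths and the fill/drain time keep it below the 3-stage ideal.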
Pipelining
• Pipelining is a general-purpose efficiency technique
– It is not specific to processors

• Pipelining is used in:


– Assembly lines
– Bucket brigades
– Fast food restaurants

• Pipelining is used in other CS disciplines:


– Networking
– Server software architecture

• Useful to increase throughput in the presence of long latency


– More on that later…
Pipelining Processors
• We’ve seen two possible implementations of the MIPS architecture.
– A single-cycle datapath executes each instruction in just one clock
cycle, but the cycle time may be very long.
– A multicycle datapath has much shorter cycle times, but each
instruction requires many cycles to execute.
• Pipelining gives the best of both worlds and is used in just about every
modern processor.
– Cycle times are short so clock rates are high.
– But we can still execute an instruction in about one clock cycle!
Instruction execution review
• Executing a MIPS instruction can take up to five steps.

  Step                 Name  Description
  Instruction Fetch    IF    Read an instruction from memory.
  Instruction Decode   ID    Read source registers and generate control signals.
  Execute              EX    Compute an R-type result or a branch outcome.
  Memory               MEM   Read or write the data memory.
  Writeback            WB    Store a result in the destination register.
• However, as we saw, not all instructions need all five steps.

  Instruction  Steps required
  beq          IF ID EX
  R-type       IF ID EX WB
  sw           IF ID EX MEM
  lw           IF ID EX MEM WB
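The table above can be written out directly as data. A small sketch (not from the slides), showing that in a multicycle implementation, where each step takes one cycle, lw needs the most cycles:

```python
# Steps used by each MIPS instruction class, from the table above
steps = {
    "beq":    ["IF", "ID", "EX"],
    "R-type": ["IF", "ID", "EX", "WB"],
    "sw":     ["IF", "ID", "EX", "MEM"],
    "lw":     ["IF", "ID", "EX", "MEM", "WB"],
}

# One cycle per step in a multicycle datapath:
for instr, s in steps.items():
    print(instr, len(s), "cycles")   # lw takes 5, beq only 3
```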
Single-cycle datapath diagram
[Datapath diagram: PC and adders feed the instruction memory; the register file, sign extend, ALU, and data memory follow, with muxes controlled by RegDst, ALUSrc, MemToReg, and PCSrc. Other control signals: RegWrite, MemWrite, MemRead, ALUOp. Latencies: instruction memory, ALU, and data memory 2ns; register file 1ns.]
• How long does it take to execute each instruction?

Single-cycle review
• All five execution steps occur in one clock cycle.
• This means the cycle time must be long enough to accommodate all the
steps of the most complex instruction—a “lw” in our instruction set.
– If the register file has a 1ns latency and the memories and ALU have a 2ns latency,
“lw” will require 8ns.
– Thus all instructions will take 8ns to execute.
• Each hardware element can only be used once per clock cycle.
– A “lw” or “sw” must access memory twice (in the IF and MEM stages), so there
are separate instruction and data memories.
– There are multiple adders, since each instruction increments the PC (IF) and
performs another computation (EX). On top of that, branches also need to
compute a target address.
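The 8ns figure follows directly from the per-unit latencies. A quick check (a sketch, using only the 2ns/1ns latencies given above):

```python
# Latencies from the slides: memories and ALU take 2ns, register file 1ns
latency = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# lw exercises all five steps, so it sets the single-cycle clock period:
lw_time = sum(latency[s] for s in ["IF", "ID", "EX", "MEM", "WB"])
print(lw_time)   # 8 ns -- and every instruction pays this full period
```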
Example: Instruction Fetch (IF)
• Let’s quickly review how lw is executed in the single-cycle
datapath.
• We’ll ignore PC incrementing and branching for now.
• In the Instruction Fetch (IF) step, we read the instruction
memory.
Instruction Decode (ID)
The Instruction Decode (ID) step reads the source registers from the register file.
[Single-cycle datapath diagram; register file read ports active]
Execute (EX)
The third step, Execute (EX), computes the effective memory address from the source register
and the instruction’s constant field.
[Single-cycle datapath diagram; ALU active]
Memory (MEM)
The Memory (MEM) step reads the data memory at the address computed by the ALU.
[Single-cycle datapath diagram; data memory read active]
Writeback (WB)
• Finally, in the Writeback (WB) step, the memory value
is stored into the destination register.
[Single-cycle datapath diagram; register file write port active]
A bunch of lazy functional units
• Notice that each execution step uses a different functional unit.
• In other words, the main units are idle for most of the 8ns cycle!
– The instruction RAM is used for just 2ns at the start of the cycle.
– Registers are read once in ID (1ns), and written once in WB (1ns).
– The ALU is used for 2ns near the middle of the cycle.
– Reading the data memory only takes 2ns as well.
• That’s a lot of hardware sitting around doing nothing.
Putting those slackers to work
• We shouldn’t have to wait for the entire instruction to complete before we
can re-use the functional units.
• For example, the instruction memory is free in the Instruction Decode step
as shown below, so...
[Datapath diagram: instruction memory idle while the first instruction is in Instruction Decode (ID)]
Decoding and fetching together
• Why don’t we go ahead and fetch the next instruction while we’re
decoding the first one?

[Datapath diagram: instruction memory fetches the 2nd instruction while the register file decodes the 1st]
Executing, decoding and fetching
• Similarly, once the first instruction enters its Execute stage, we can go
ahead and decode the second instruction.
• But now the instruction memory is free again, so we can fetch the third
instruction!
[Datapath diagram: instruction memory fetches the 3rd instruction, register file decodes the 2nd, ALU executes the 1st]
Making Pipelining Work
• We’ll make our pipeline 5 stages long, to handle load
instructions as they were handled in the multicycle
implementation.
– Stages are: IF, ID, EX, MEM, and WB
• We want to support executing 5 instructions simultaneously:
one in each stage.
Break datapath into 5 stages
• Each stage has its own functional units.
• Each stage can execute in 2ns.
• Just like the multicycle implementation.
[Pipelined datapath diagram: the single-cycle datapath split into IF, ID, EX, MEM, and WB stages, each stage taking 2ns]
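With a 2ns stage time, the payoff can be estimated for a long run of instructions. A sketch (not from the slides; it assumes no stalls, so every instruction finishes one cycle after the previous one once the pipe is full):

```python
def single_cycle_time(n):
    """Single-cycle design: every instruction takes the full 8ns period."""
    return 8 * n

def pipelined_time(n, stages=5, stage_ns=2):
    """5-stage pipeline: 'fill' takes stages cycles, then one
    instruction completes per 2ns cycle (assuming no stalls)."""
    return (stages + n - 1) * stage_ns

print(single_cycle_time(1000))   # 8000 ns
print(pipelined_time(1000))      # 2008 ns
# Speedup approaches 8ns / 2ns = 4x as n grows, limited by fill/drain time.
```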


Summary
• Pipelining attempts to maximize instruction throughput by overlapping
the execution of multiple instructions.
• Pipelining offers amazing speedup.
– In the best case, one instruction finishes on every cycle, and the
speedup is equal to the pipeline depth.
• The pipeline datapath is much like the single-cycle one, but with added
pipeline registers.
– Each stage needs its own functional units.
• Next time we’ll see the datapath and control, and walk through an
example execution.
