CSE332 / EEE336

Computer Organization & Architecture


Pipelining I
Lecture 8

Rashadul Kabir
North South University
Summer 2020
Outline of this Lecture
- Processor Implementation Styles
- Pipelining

Processor Implementation Styles
- Single Cycle Implementation
  - Performs each instruction in 1 clock cycle
  - The clock cycle must be long enough for the slowest instruction
  - Disadvantage: every instruction runs only as fast as the slowest instruction
- Multi-Cycle Implementation
  - Breaks the fetch/execute cycle into multiple steps
  - Performs 1 step in each clock cycle
  - Advantage: each instruction uses only as many cycles as it needs
- Pipelined Implementation
  - Executes each instruction in multiple steps
  - Performs 1 step per instruction in each clock cycle
  - Processes multiple instructions in parallel, like an assembly line

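To make the comparison above concrete, here is a minimal Python sketch that totals the execution time of a short program under each style. All numbers are assumptions for illustration: the slowest instruction is taken to need 800 ps end to end, the slowest single step 200 ps, and the per-class step counts (lw 5, sw 4, R-type 4, beq 3) follow the usual MIPS multi-cycle design; the instruction mix is hypothetical.

```python
# Assumed timing parameters (illustrative, not taken from the slide).
SLOWEST_INSTR_PS = 800  # single-cycle clock period: fits the slowest instruction
SLOWEST_STEP_PS = 200   # multi-cycle / pipelined clock period: fits the slowest step
PIPELINE_STAGES = 5

# Assumed number of steps each MIPS instruction class needs.
STEPS = {"lw": 5, "sw": 4, "R": 4, "beq": 3}


def single_cycle_ps(program):
    # One long cycle per instruction, sized for the slowest instruction.
    return SLOWEST_INSTR_PS * len(program)


def multi_cycle_ps(program):
    # One short cycle per step; each instruction uses only the steps it needs.
    return SLOWEST_STEP_PS * sum(STEPS[op] for op in program)


def pipelined_ps(program):
    # One step per instruction per cycle, with instructions overlapped:
    # after the first instruction fills the pipeline, one finishes each cycle.
    return SLOWEST_STEP_PS * (PIPELINE_STAGES + len(program) - 1)


program = ["lw", "sw", "R", "beq", "lw", "R"]  # hypothetical instruction mix
for name, total in [
    ("single-cycle", single_cycle_ps(program)),
    ("multi-cycle", multi_cycle_ps(program)),
    ("pipelined", pipelined_ps(program)),
]:
    print(f"{name:>12}: {total} ps")
```

With these particular numbers the multi-cycle design (5000 ps) is even slower than single-cycle (4800 ps), because every step is padded to 200 ps, while the pipelined design needs only 2000 ps because the steps of different instructions overlap.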
Assembly line

Two important terms!
- Throughput is the amount of processing that can be accomplished during a given interval of time.
- Latency is the amount of time taken to complete a task.
- We will be using these terms throughout the lecture!

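Both terms can be measured from the start and end times of completed tasks. The Python sketch below assumes, hypothetically, that each laundry load goes through 4 steps of 30 minutes each, so a load takes 2 hours whether or not loads overlap; only the rate at which loads finish changes.

```python
def latency_and_throughput(intervals):
    """intervals: (start, end) times of completed tasks, in hours.

    Returns (average latency in hours per task, throughput in tasks per hour).
    """
    latencies = [end - start for start, end in intervals]
    avg_latency = sum(latencies) / len(latencies)
    elapsed = max(end for _, end in intervals) - min(start for start, _ in intervals)
    return avg_latency, len(intervals) / elapsed


# Hypothetical: 4 loads, each needing 2 hours (4 steps x 30 min).
sequential = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0), (6.0, 8.0)]
pipelined = [(0.0, 2.0), (0.5, 2.5), (1.0, 3.0), (1.5, 3.5)]

print(latency_and_throughput(sequential))  # (2.0, 0.5)
print(latency_and_throughput(pipelined))   # latency still 2.0, throughput ~1.14 loads/h
```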
Pipelining using Laundry Analogy
[Figure: tasks A, B, C and D (task order) plotted against time from 6 PM to 2 AM, first done sequentially, then pipelined]
- 4 loads of laundry in parallel
- no additional resources
- throughput increased by 4 (in steady state, with many loads)
- latency per load is the same
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Pipelining Multiple Loads of Laundry: In Practice
[Figure: tasks A, B, C and D (task order) plotted against time from 6 PM to 2 AM, pipelined, with one step slower than the others]
- the slowest step decides throughput
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
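The claim that the slowest step decides throughput can be checked with a small simulation. This Python sketch assumes that each step (for example, the washer or the dryer) can hold only one load at a time and that loads stay in order; the step durations are hypothetical.

```python
def finish_times(step_minutes, n_loads):
    """Return the minute at which each load finishes its last step.

    A load starts a step only when (a) it has finished its previous step and
    (b) the previous load has left that step, since each step holds one load.
    """
    prev = [0] * len(step_minutes)  # when the previous load left each step
    done = []
    for _ in range(n_loads):
        t = 0
        cur = []
        for s, minutes in enumerate(step_minutes):
            t = max(t, prev[s]) + minutes
            cur.append(t)
        prev = cur
        done.append(cur[-1])
    return done


print(finish_times([30, 30, 30, 30], 4))  # [120, 150, 180, 210]: a load every 30 min
print(finish_times([30, 40, 20, 30], 4))  # [120, 160, 200, 240]: a load every 40 min
```

With balanced 30-minute steps, 4 loads finish in 210 minutes (3.5 hours) instead of 8 hours sequentially. In the unbalanced case the 40-minute step sets the pace: after the first load, a load completes every 40 minutes even though the other steps are shorter.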
Pipelining
- Pipelining exploits the potential parallelism among instructions. This parallelism is called instruction-level parallelism (ILP).
- Pipelining does not reduce the latency of a single task; it increases the throughput of the entire workload.
- The pipeline rate is limited by the slowest pipeline stage.
- Potential speedup = number of pipeline stages.
- Unbalanced pipe stage lengths reduce speedup.
- The time to "fill" the pipeline and the time to "drain" it (when there is slack in the pipeline) reduce speedup.

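The last three points can be quantified with a simple model: a k-stage pipeline clocked at its slowest stage needs k - 1 extra cycles to fill before it finishes one task per cycle. This Python sketch ignores latch overhead (an assumption; the following slides add it). The balanced stage delays are hypothetical; the unbalanced ones are the 5-stage datapath latencies shown later in this lecture.

```python
def pipeline_time(stage_delays, n):
    # The clock period is set by the slowest stage; k + n - 1 cycles cover
    # filling the pipeline, then one task completing per cycle.
    k = len(stage_delays)
    return (k + n - 1) * max(stage_delays)


def speedup(stage_delays, n):
    # Unpipelined: each task runs through all the logic before the next starts.
    unpipelined = n * sum(stage_delays)
    return unpipelined / pipeline_time(stage_delays, n)


balanced = [100, 100, 100, 100, 100]    # ps, hypothetical
unbalanced = [200, 100, 200, 200, 100]  # ps, IF/ID/EX/MEM/WB of the 5-stage datapath

for n in (1, 10, 1000):
    print(n, round(speedup(balanced, n), 2), round(speedup(unbalanced, n), 2))
```

As n grows, the balanced speedup approaches 5 (the number of stages) while the unbalanced one approaches only 4. For a single task there is no overlap to exploit: the balanced pipeline gives a speedup of 1.0, and the unbalanced one is actually slower (0.8).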
Ideal Pipelining
- 1 stage: combinational logic (F,D,E,M,W), T ps: $BW \approx 1/T$
- 2 stages: T/2 ps (F,D,E) + T/2 ps (M,W): $BW \approx 2/T$
- 3 stages: T/3 ps (F,D) + T/3 ps (E) + T/3 ps (M,W): $BW \approx 3/T$

More Realistic Pipeline: Throughput
- Nonpipelined version with delay T ps:
  $BW = 1/(T + S)$, where $S$ = latch delay
- k-stage pipelined version (each stage T/k ps):
  $BW_{k\text{-stage}} = 1/(T/k + S)$
  $BW_{\max} = 1/(\text{1 gate delay} + S)$

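Plugging numbers into the two formulas shows why deeper pipelines give diminishing returns: the latch delay S does not shrink as k grows. T and S below are hypothetical values in ps. For simplicity the Python sketch lets T/k shrink without bound; in reality a stage cannot be shorter than about one gate delay, which is why the slide caps throughput at BW_max.

```python
def bw(T, S, k):
    """Throughput (results per ps) of a k-stage pipeline: 1 / (T/k + S).

    k = 1 gives the nonpipelined case 1 / (T + S).
    """
    return 1.0 / (T / k + S)


T = 1000.0  # total combinational delay in ps (hypothetical)
S = 50.0    # latch delay in ps (hypothetical)

for k in (1, 2, 5, 10, 20, 100):
    print(f"k={k:3d}  cycle={T / k + S:6.1f} ps  "
          f"speedup={bw(T, S, k) / bw(T, S, 1):5.2f}  ideal={k}")
```

With these values, 5 stages give a speedup of 4.2 and 100 stages only 17.5, far below the ideal speedup of k.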
More Realistic Pipeline: Cost
- Nonpipelined version with combinational cost G gates:
  $\text{Cost} = G + L$, where $L$ = latch cost
- k-stage pipelined version (each stage G/k gates):
  $\text{Cost}_{k\text{-stage}} = G + Lk$

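Combining this slide with the previous one gives a way to pick a pipeline depth. One common figure of merit (an assumption here, not stated on the slide) is cost divided by throughput, (G + Lk)(T/k + S); setting its derivative with respect to k to zero gives k_opt = sqrt(GT / (LS)). The Python sketch below checks that closed form against a brute-force search, with hypothetical values for G, L, T and S.

```python
import math


def cost_per_bw(G, L, T, S, k):
    # (cost of a k-stage pipeline) / (its throughput) = (G + L*k) * (T/k + S)
    return (G + L * k) * (T / k + S)


def optimal_depth(G, L, T, S):
    # Expand: G*T/k + G*S + L*T + L*S*k. Derivative: -G*T/k**2 + L*S = 0.
    return math.sqrt(G * T / (L * S))


G, L = 2000, 100  # gates of logic, gates per latch level (hypothetical)
T, S = 1000, 200  # logic delay and latch delay in ps (hypothetical)

print(optimal_depth(G, L, T, S))  # 10.0
best = min(range(1, 101), key=lambda k: cost_per_bw(G, L, T, S, k))
print(best)  # 10
```

Cheaper latches (smaller L) or faster latches (smaller S) push the optimum toward deeper pipelines; a larger amount of logic (G or T) does the same.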
Instruction execution overview
- Executing a MIPS instruction can take up to five steps.
- However, not all instructions need these five steps.

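Which of the five steps each instruction class needs, and how long those steps take, can be tabulated directly. This Python sketch assumes the usual MIPS subset (lw, sw, R-type, beq) and the stage latencies listed with the 5-stage datapath (IF 200 ps, ID 100 ps, EX 200 ps, MEM 200 ps, WB 100 ps).

```python
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Steps each instruction class actually needs (assumed MIPS subset).
STEPS_USED = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],  # load: every step
    "sw": ["IF", "ID", "EX", "MEM"],        # store: nothing to write back
    "R": ["IF", "ID", "EX", "WB"],          # R-type: no data-memory access
    "beq": ["IF", "ID", "EX"],              # branch: no MEM or WB
}


def work_ps(op):
    # Time spent in the steps this class really uses.
    return sum(STAGE_PS[s] for s in STEPS_USED[op])


for op, steps in STEPS_USED.items():
    print(f"{op:>3}: {len(steps)} steps, {work_ps(op)} ps")
```

lw is the slowest class at 800 ps, which is the period a single-cycle clock must accommodate.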
Datapath broken into 5 stages
- Each stage has its own functional units.
- Each stage can execute within 0.2 ns (200 ps). Is this the right partitioning? Why not 4 or 6 stages?
- Just like a multi-cycle implementation.
IF: Instruction fetch (200 ps) | ID: Instruction decode / register file read (100 ps) | EX: Execute / address calculation (200 ps) | MEM: Memory access (200 ps) | WB: Write back (100 ps)
[Figure: MIPS datapath (PC, instruction memory, register file, sign extend, shift left 2, adders, ALU, data memory and multiplexers) divided into the five stages above; the PC-select mux at the top is marked "ignore for now"]

Instruction Pipeline Throughput

The 5-stage pipeline's speedup is 4, not 5 as the ideal model predicts. Why?

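One way to answer the question is to compare clock periods using the stage latencies from the 5-stage datapath slide (IF 200, ID 100, EX 200, MEM 200, WB 100 ps). This Python sketch assumes lw, which uses all five stages, is the slowest instruction, and it ignores pipeline-register delay.

```python
STAGE_PS = [200, 100, 200, 200, 100]  # IF, ID, EX, MEM, WB

single_cycle_period = sum(STAGE_PS)  # 800 ps: lw uses every stage in one long cycle
pipelined_period = max(STAGE_PS)     # 200 ps: every stage gets the slowest stage's time
balanced_period = sum(STAGE_PS) / len(STAGE_PS)  # 160 ps if the work split evenly

print(single_cycle_period / pipelined_period)  # 4.0
print(single_cycle_period / balanced_period)   # 5.0
print([pipelined_period - d for d in STAGE_PS])  # idle time per stage: [0, 100, 0, 0, 100]
```

The ID and WB stages need only 100 ps but still occupy a full 200 ps cycle. Because the stages are unbalanced, the steady-state speedup is 800/200 = 4 rather than 5.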
Enabling Pipelined Processing: Pipeline Registers
IF: Instruction fetch | ID: Instruction decode / register file read | EX: Execute / address calculation | MEM: Memory access | WB: Write back
No resource is used by more than 1 stage!
[Figure: the 5-stage datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB between the stages; values latched for later stages are labeled PCF, IRD, PCD+4, PCE+4, AE, BE, ImmE, nPCM, AoutM, BM, MDRW and AoutW; each stage takes T/k ps instead of T ps for the whole datapath]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
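The role of the pipeline registers can be mimicked in a few lines: on every clock edge each register captures what the stage before it produced, so an instruction moves forward exactly one stage per cycle and no stage is shared. In this Python sketch an instruction is just a label and each register is a list slot, a deliberate simplification; real registers such as IF/ID also carry values like PC+4 and the fetched instruction bits. The two-instruction program is borrowed from the lw/sub example later in this lecture.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def clock(occupants, fetched):
    """Advance one cycle. occupants[i] is the instruction in STAGES[i] (None = empty).

    All pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB) latch on the same edge,
    so the new state is built only from the old state.
    """
    return [fetched] + occupants[:-1]


def run(program, cycles):
    occupants = [None] * len(STAGES)
    trace = []
    for cycle in range(cycles):
        fetched = program[cycle] if cycle < len(program) else None
        occupants = clock(occupants, fetched)
        trace.append(dict(zip(STAGES, occupants)))
    return trace


for cycle, state in enumerate(run(["lw", "sub"], 6), start=1):
    print(f"Clock {cycle}: {state}")
```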
Pipelined Operation Example
[Figure: a lw instruction passing through the pipelined datapath, one snapshot per stage: Instruction fetch, Instruction decode, Execution, Memory, Write back]
All instruction classes must follow the same path and timing through the pipeline stages.
Any performance impact?
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
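To gauge the performance impact, this Python sketch assumes the stage latencies from the 5-stage datapath slide, a 200 ps clock set by the slowest stage, and the usual MIPS stage usage for each class. Every instruction now spends 5 cycles (1000 ps) in the pipeline, even when it does useful work in fewer stages.

```python
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}
CLOCK_PS = max(STAGE_PS.values())  # 200 ps, set by the slowest stage

# Stages in which each class does useful work (assumed MIPS subset).
STEPS_USED = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def pipeline_profile(op):
    """Return (useful work in ps, pipelined latency in ps, stages passed through idle)."""
    used = STEPS_USED[op]
    useful = sum(STAGE_PS[s] for s in used)
    latency = CLOCK_PS * len(STAGE_PS)  # every class traverses all 5 stages
    idle = [s for s in STAGE_PS if s not in used]
    return useful, latency, idle


for op in STEPS_USED:
    useful, latency, idle = pipeline_profile(op)
    print(f"{op:>3}: useful {useful} ps, latency {latency} ps, idle stages {idle}")
```

So the latency of a single instruction goes up (1000 ps versus the 800 ps single-cycle clock), but once the pipeline is full a new instruction can still complete every 200 ps.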

Pipelined Operation Example
[Figure: lw $10, 20($1) followed by sub $11, $2, $3 passing through the pipelined datapath (Instruction fetch, Instruction decode, Execution, Memory, Write back), one snapshot per clock cycle from Clock 1 to Clock 6]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Illustrating Pipeline Operation: Operation View

        t0   t1   t2   t3   t4   t5   t6   t7
Inst0   IF   ID   EX   MEM  WB
Inst1        IF   ID   EX   MEM  WB
Inst2             IF   ID   EX   MEM  WB
Inst3                  IF   ID   EX   MEM  WB
Inst4                       IF   ID   EX   MEM
                                 IF   ID   EX
                                      IF   ID
                                           IF

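The operation view follows a simple rule under the ideal 5-stage model: instruction i is in stage t - i at time t. This Python sketch (a hypothetical helper, not from the slides) regenerates the table for the first five instructions.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def operation_view(n_instrs, n_cycles):
    """Rows = instructions, columns = cycles; '' where the instruction is not in the pipeline."""
    rows = []
    for i in range(n_instrs):
        row = []
        for t in range(n_cycles):
            s = t - i  # instruction i enters IF at cycle i
            row.append(STAGES[s] if 0 <= s < len(STAGES) else "")
        rows.append(row)
    return rows


print("       " + "".join(f"t{t:<5}" for t in range(8)))
for i, row in enumerate(operation_view(5, 8)):
    print(f"Inst{i}  " + "".join(f"{cell:<6}" for cell in row))
```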
Illustrating Pipeline Operation: Resource View

      t0   t1   t2   t3   t4   t5   t6   t7   t8   t9   t10
IF    I0   I1   I2   I3   I4   I5   I6   I7   I8   I9   I10
ID         I0   I1   I2   I3   I4   I5   I6   I7   I8   I9
EX              I0   I1   I2   I3   I4   I5   I6   I7   I8
MEM                  I0   I1   I2   I3   I4   I5   I6   I7
WB                        I0   I1   I2   I3   I4   I5   I6

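The resource view is the same schedule read the other way: at time t, stage s holds instruction t - s. A matching Python sketch (again a hypothetical helper) prints the table above.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def resource_view(n_cycles):
    """Rows = stages, columns = cycles; each cell names the instruction using that stage."""
    return [
        [f"I{t - s}" if t - s >= 0 else "" for t in range(n_cycles)]
        for s in range(len(STAGES))
    ]


print("     " + "".join(f"t{t:<4}" for t in range(11)))
for name, row in zip(STAGES, resource_view(11)):
    print(f"{name:<5}" + "".join(f"{cell:<5}" for cell in row))
```

From t4 onward all five stages are busy every cycle, each with a different instruction.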
Suggested readings
- Chapter 4, Computer Organization and Design (Fifth Edition), D. A. Patterson and J. L. Hennessy
- Section 6.2, Computer Architecture and Implementation, H. G. Cragon
- Section 7.8, Digital Design and Computer Architecture (2nd Edition), D. Harris and S. Harris

Thank you!
