CSE332 / EEE336

Computer Organization & Architecture

Pipelining I
Lecture 8

Rashadul Kabir
North South University
Summer 2020

Outline of this Lecture
- Processor Implementation Styles
- Pipelining

Processor Implementation Styles
- Single-Cycle Implementation
  - Performs each instruction in 1 clock cycle
  - The clock cycle must be long enough for the slowest instruction
    - Disadvantage: the processor is only as fast as its slowest instruction
- Multi-Cycle Implementation
  - Breaks the fetch/execute cycle into multiple steps
  - Performs 1 step in each clock cycle
  - Advantage: each instruction uses only as many cycles as it needs
- Pipelined Implementation
  - Executes each instruction in multiple steps
  - Performs 1 step of each in-flight instruction in every clock cycle
  - Processes multiple instructions in parallel, like an assembly line

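To make the three styles concrete, here is a rough sketch that times one short instruction sequence under each of them. It borrows the stage latencies that appear later in this lecture (200, 100, 200, 200 and 100 ps for IF, ID, EX, MEM, WB), assumes the multi-cycle and pipelined clocks both equal the slowest stage, assumes the per-class stage usage described in Patterson and Hennessy, and ignores hazards and stalls entirely. The instruction sequence is hypothetical.

```python
# Stage latencies in picoseconds, as used later in this lecture.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class needs (assumed, after Patterson & Hennessy).
STAGES_USED = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "add": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def single_cycle_ps(program):
    # One clock per instruction; the clock must fit the slowest instruction class.
    clock = max(sum(STAGE_PS[s] for s in stages) for stages in STAGES_USED.values())
    return len(program) * clock


def multi_cycle_ps(program):
    # One stage per clock; each instruction takes only the stages it needs.
    clock = max(STAGE_PS.values())
    return sum(len(STAGES_USED[op]) for op in program) * clock


def pipelined_ps(program):
    # Every instruction passes through all stages and a new one starts each clock.
    # Assumes no hazards or stalls.
    clock = max(STAGE_PS.values())
    depth = len(STAGE_PS)
    return (depth + len(program) - 1) * clock


program = ["lw", "add", "beq", "beq", "sw"]  # hypothetical sequence
for name, fn in [("single-cycle", single_cycle_ps),
                 ("multi-cycle", multi_cycle_ps),
                 ("pipelined", pipelined_ps)]:
    print(f"{name:>12}: {fn(program)} ps")
```

With these assumptions the sequence takes 4000 ps single-cycle, 3800 ps multi-cycle and 1800 ps pipelined; a different instruction mix or real hazards would change the numbers.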
Assembly line

[Figure: assembly line]
Two important terms!
- Throughput is the amount of processing that can be accomplished during a given interval of time.
- Latency is the amount of time taken to complete a task.
- We will be using these terms throughout the lecture!

Pipelining using Laundry Analogy

[Figure: four loads of laundry (A, B, C, D, in task order) plotted against time from 6 PM to 2 AM, first done one after another and then pipelined]

- 4 loads of laundry in parallel
- no additional resources
- throughput increased by 4 (once the pipeline is full)
- latency per load is the same
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
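The laundry numbers can be checked with a little arithmetic. This sketch assumes four equal steps of 30 minutes each (wash, dry, fold, put away), which is consistent with four sequential loads spanning 6 PM to 2 AM; the step names and times are assumptions, not values read from the figure. It also shows that the factor of 4 is the steady-state rate: with only four loads, filling and draining the pipeline keeps the overall gain below 4.

```python
STEPS = ["wash", "dry", "fold", "put away"]  # assumed step names
STEP_HOURS = 0.5                              # assumed: every step takes 30 minutes


def sequential_hours(loads):
    # Each load finishes all of its steps before the next load starts.
    return loads * len(STEPS) * STEP_HOURS


def pipelined_hours(loads):
    # The first load needs every step; each later load finishes one step-time later.
    return (len(STEPS) + loads - 1) * STEP_HOURS


n = 4
print(sequential_hours(n), pipelined_hours(n))   # 8.0 3.5
print(sequential_hours(n) / pipelined_hours(n))  # about 2.29 for only four loads
# Steady state: one load per len(STEPS) * STEP_HOURS sequentially vs. one per STEP_HOURS pipelined.
print((len(STEPS) * STEP_HOURS) / STEP_HOURS)    # 4.0
```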
Pipelining Multiple Loads of Laundry: In Practice

[Figure: loads A, B, C, D (in task order) pipelined against time from 6 PM to 2 AM]

- the slowest step decides throughput
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
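The claim that the slowest step decides throughput can be checked by scheduling the loads step by step. In this sketch the drying step is assumed to take 60 minutes while the other steps take 30 (hypothetical times), and a load is assumed to wait in a basket if the next machine is still busy.

```python
def finish_times(step_minutes, loads):
    """Return the minute at which each load finishes its last step.

    A load may start step s once it has finished step s-1 and the previous
    load has finished step s (each machine handles one load at a time).
    Loads are assumed to wait in a basket between steps if necessary.
    """
    done_prev_load = [0] * len(step_minutes)  # when each machine freed up from the previous load
    result = []
    for _ in range(loads):
        done = []
        t = 0
        for s, minutes in enumerate(step_minutes):
            start = max(t, done_prev_load[s])
            t = start + minutes
            done.append(t)
        done_prev_load = done
        result.append(t)
    return result


steps = [30, 60, 30, 30]  # wash, dry (slow), fold, put away: assumed minutes
ends = finish_times(steps, 4)
print(ends)                                     # [150, 210, 270, 330]
print([b - a for a, b in zip(ends, ends[1:])])  # [60, 60, 60]: one load per slowest step
```

Once the pipeline is full, a load completes every 60 minutes, the duration of the slowest step, regardless of how fast the other steps are.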
Pipelining
- Pipelining exploits the potential parallelism among instructions. This parallelism is called instruction-level parallelism (ILP).
- Pipelining does not reduce the latency of a single task; it increases the throughput of the entire workload.
- The pipeline rate is limited by the longest (slowest) pipeline stage.
- Potential speedup = number of pipeline stages.
- Unbalanced lengths of pipe stages reduce speedup.
- The time to “fill” the pipeline and the time to “drain” it (when there is slack in the pipeline) reduce speedup.

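The last four bullets can be combined into a single expression. As a sketch, assume k stages with latencies t_1, ..., t_k, a clock equal to the slowest stage, no latch overhead, and n tasks that never stall. Running the tasks unpipelined takes n times the sum of the stage latencies; pipelined, the first task leaves after k clocks and each later one a clock after its predecessor:

```latex
\text{Speedup}(n) = \frac{n \sum_{i=1}^{k} t_i}{(k + n - 1)\,\max_i t_i}

\text{Balanced stages } (t_i = t):\quad
\text{Speedup}(n) = \frac{n k t}{(k + n - 1)\,t} = \frac{n k}{k + n - 1} \;\longrightarrow\; k \quad \text{as } n \to \infty
```

The potential speedup of k is reached only when the fill/drain term k - 1 is negligible next to n, and unbalanced stages lower it further because the sum of the t_i is then strictly less than k times the largest t_i. Latch delay, introduced on the next slides, reduces it again.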
Ideal Pipelining

- Unpipelined: combinational logic (F,D,E,M,W), delay T ps: BW ≈ 1/T
- 2 stages: T/2 ps (F,D,E) | T/2 ps (M,W): BW ≈ 2/T
- 3 stages: T/3 ps (F,D) | T/3 ps (E,M) | T/3 ps (M,W): BW ≈ 3/T

More Realistic Pipeline: Throughput
- Nonpipelined version with delay T ps:
  $BW = \frac{1}{T + S}$, where S = latch delay
- k-stage pipelined version (k stages of T/k ps each):
  $BW_{k\text{-stage}} = \frac{1}{T/k + S}$
  $BW_{\max} = \frac{1}{\text{1 gate delay} + S}$

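Tabulating the throughput formula shows the diminishing return from deeper pipelines. The values T = 1000 ps and S = 50 ps are hypothetical; the code simply evaluates 1/(T/k + S) and compares it with the latch-free ideal of k results every T ps.

```python
T_PS = 1000.0  # hypothetical total combinational delay (ps)
S_PS = 50.0    # hypothetical latch delay per stage (ps)


def bw_k_stage(k, T=T_PS, S=S_PS):
    """Throughput (results per ps) of a k-stage pipeline with latch delay S."""
    return 1.0 / (T / k + S)


def bw_ideal(k, T=T_PS):
    """Throughput if latches were free: k results every T ps."""
    return k / T


for k in (1, 2, 4, 8, 16, 32):
    ratio = bw_k_stage(k) / bw_ideal(k)
    print(f"k={k:2d}  BW={bw_k_stage(k) * 1000:.2f} per ns  fraction of ideal={ratio:.2f}")
```

With k = 1 the formula reduces to the nonpipelined 1/(T + S). As k grows, T/k shrinks but S does not, so throughput creeps toward 1/S (20 results per ns here) and each extra stage buys less; in hardware, T/k also cannot drop below one gate delay.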
More Realistic Pipeline: Cost
- Nonpipelined version with combinational cost G gates:
  $\text{Cost} = G + L$, where L = latch cost
- k-stage pipelined version (k stages of G/k gates each):
  $\text{Cost}_{k\text{-stage}} = G + Lk$

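Putting the cost and throughput models side by side gives a rough cost-per-throughput figure, (G + Lk)(T/k + S). The sketch below uses hypothetical values for G, L, T and S and searches k by brute force; it only illustrates that the product has a minimum and is not a design method given in this lecture.

```python
G = 10_000  # hypothetical combinational cost (gates)
L = 500     # hypothetical latch cost per stage (gates)
T = 1000.0  # hypothetical combinational delay (ps)
S = 50.0    # hypothetical latch delay (ps)


def cost(k):
    # Cost_k-stage = G + L*k
    return G + L * k


def time_per_result(k):
    # Inverse of BW_k-stage = 1 / (T/k + S)
    return T / k + S


def cost_per_bw(k):
    # Lower is better: hardware spent per unit of throughput.
    return cost(k) * time_per_result(k)


best_k = min(range(1, 65), key=cost_per_bw)
print(best_k, cost(best_k), time_per_result(best_k))  # 20 20000 100.0
```

Expanding the product gives GT/k + GS + LT + LSk; setting the derivative of GT/k + LSk to zero gives k = sqrt(GT/(LS)), which is exactly 20 for these numbers.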
Instruction execution overview
- Executing a MIPS instruction can take up to five steps.
- However, not all instructions need these five steps.

Datapath broken into 5 stages
- Each stage has its own functional units.
- Each stage can execute within 0.2 ns (200 ps). Is this the right partitioning? Why not 4 or 6 stages?
- Just like a multi-cycle implementation.

[Figure: single-cycle MIPS datapath (PC, instruction memory, register file, sign extend, shift left 2, adders, ALU, data memory, multiplexers) divided into five stages; the multiplexer that selects the next PC is marked “ignore for now”]

- IF: Instruction fetch (200 ps)
- ID: Instruction decode / register file read (100 ps)
- EX: Execute / address calculation (200 ps)
- MEM: Memory access (200 ps)
- WB: Write back (100 ps)

Instruction Pipeline Throughput

5-stage speedup is 4, not 5 as predicted by the ideal model. Why?
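One way to see where the 4 comes from, as a sketch that assumes the single-cycle clock must fit the slowest instruction (lw, which uses all five stages), the pipeline clock must fit the slowest stage, many instructions are in flight, and latch overhead and hazards are ignored:

```python
STAGE_PS = [200, 100, 200, 200, 100]  # IF, ID, EX, MEM, WB latencies from the datapath slide

single_cycle_clock = sum(STAGE_PS)  # 800 ps: lw needs every stage within one cycle
pipelined_clock = max(STAGE_PS)     # 200 ps: every stage is given the slowest stage's time
speedup = single_cycle_clock / pipelined_clock

balanced_clock = sum(STAGE_PS) / len(STAGE_PS)  # 160 ps if the same work were split evenly
print(single_cycle_clock, pipelined_clock, speedup)  # 800 200 4.0
print(single_cycle_clock / balanced_clock)           # 5.0
```

The ideal factor of 5 would need perfectly balanced 160 ps stages; because ID and WB need only 100 ps but still occupy a full 200 ps clock, the speedup drops to 4.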

Enabling Pipelined Processing: Pipeline Registers

[Figure: the five-stage datapath (IF: Instruction fetch, ID: Instruction decode/register file read, EX: Execute/address calculation, MEM: Memory access, WB: Write back) with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB between the stages; the values they carry are labelled PCF, IRD, PCD+4, AE, BE, ImmE, PCE+4, nPCM, AoutM, BM, MDRW and AoutW]

- No resource is used by more than 1 stage!

Pipelined Operation Example

[Figure: a single lw instruction moving through the pipelined datapath, one stage per clock: Instruction fetch, Instruction decode, Execution, Memory, Write back]

- All instruction classes must follow the same path and timing through the pipeline stages.
- Any performance impact?

Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
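A quick way to see the cost of forcing every instruction down the same path: in the pipeline each instruction spends one 200 ps clock in each of the five stages, including stages it does not use. This sketch compares that latency with the work each class actually needs, assuming the stage latencies from the datapath slide and the per-class stage usage described in Patterson and Hennessy (sw skips WB, R-type skips MEM, beq skips both).

```python
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}
CLOCK_PS = max(STAGE_PS.values())  # pipeline clock = slowest stage

STAGES_USED = {  # assumed per-class stage usage
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-type": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}

pipelined_latency = len(STAGE_PS) * CLOCK_PS  # 5 clocks for every class
for cls, stages in STAGES_USED.items():
    own_time = sum(STAGE_PS[s] for s in stages)
    print(f"{cls:6s} needs {own_time} ps of work, takes {pipelined_latency} ps in the pipeline")
```

Under these assumptions every instruction's latency grows to 1000 ps (lw needs 800 ps of work, beq only 500 ps), while instructions still complete at a rate of one per 200 ps clock once the pipeline is full.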

Pipelined Operation Example

[Figure: lw $10, 20($1) followed by sub $11, $2, $3 in the pipelined datapath over clock cycles 1 to 6; the two instructions occupy adjacent stages (Instruction fetch, Instruction decode, Execution, Memory, Write back)]

        t0   t1   t2   t3   t4   t5   t6   t7
Inst0   IF   ID   EX   MEM  WB
Inst1        IF   ID   EX   MEM  WB
Inst2             IF   ID   EX   MEM  WB
Inst3                  IF   ID   EX   MEM  WB
Inst4                       IF   ID   EX   MEM
                                 IF   ID   EX
                                      IF   ID
                                           IF

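The staircase follows from a single rule: instruction i enters IF in cycle i and moves forward one stage per cycle. A small sketch that regenerates this operation view, assuming an ideal pipeline with no stalls (the function names are illustrative):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_of(inst, cycle):
    """Stage occupied by instruction `inst` in `cycle`, or '' if none (ideal pipeline)."""
    step = cycle - inst  # instruction i is fetched in cycle i
    return STAGES[step] if 0 <= step < len(STAGES) else ""


def operation_view(num_instructions, cycles):
    # Header and rows use fixed-width columns so the stages line up.
    rows = ["       " + "".join(f"t{c:<4}" for c in range(cycles))]
    for i in range(num_instructions):
        cells = "".join(f"{stage_of(i, c):<5}" for c in range(cycles))
        rows.append(f"Inst{i:<3}" + cells)
    return "\n".join(rows)


print(operation_view(num_instructions=5, cycles=8))
```

Reading a column instead of a row gives the resource view: in any cycle from t4 onward, all five stages are busy with five different instructions.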
Illustrating Pipeline Operation: Resource View

      t0   t1   t2   t3   t4   t5   t6   t7   t8   t9   t10
IF    I0   I1   I2   I3   I4   I5   I6   I7   I8   I9   I10
ID         I0   I1   I2   I3   I4   I5   I6   I7   I8   I9
EX              I0   I1   I2   I3   I4   I5   I6   I7   I8
MEM                  I0   I1   I2   I3   I4   I5   I6   I7
WB                        I0   I1   I2   I3   I4   I5   I6

Suggested readings
- Chapter 4, Computer Organization and Design (Fifth Edition) – D. A. Patterson and J. L. Hennessy
- Section 6.2, Computer Architecture and Implementation – H. G. Cragon
- Section 7.8, Digital Design and Computer Architecture (2nd Edition) – D. Harris and S. Harris

Thank you!
