Comp Architecture Chapter 4 - Pipelining

The document discusses parallel processing and pipelining in computer architecture. It describes four categories of parallel processing based on instruction and data streams: SISD, SIMD, MISD, and MIMD. It then explains pipelining using examples of an assembly line and a laundromat pipeline. Key aspects of pipelining covered are stages, clocks, latches, characteristics, and performance metrics like latency, throughput, and clock speed. Pipelining improves throughput by processing different instructions in overlapping stages.

Uploaded by

Deshitha Chamikara Wickramarathna


EA 2004

Computer Architecture - II

Pipelining
Parallel processing

A parallel processing system is able to perform simultaneous data processing to achieve faster execution time.

The system may have two or more ALUs and be able to execute two or more instructions at the same time.

The goal is to increase throughput: the amount of processing that can be accomplished during a given interval of time.
Parallel processing

Parallel processing can be viewed in terms of two basic streams:

instruction stream - the sequence of instructions read from memory

data stream - the operations performed on the data in the processor

Flynn's classification groups computers into 4 categories according to the number of concurrent instruction and data streams; it focuses on the behavioral aspects of parallel processing.
Parallel processing classification

Single instruction stream, single data stream (SISD)

Single instruction stream, multiple data stream (SIMD)

Multiple instruction stream, single data stream (MISD)

Multiple instruction stream, multiple data stream (MIMD)

Single instruction stream, single data stream (SISD)

A single control unit

A processor unit

A memory unit

Instructions are executed sequentially. Parallel processing may be achieved by means of multiple functional units or by pipeline processing.
Single instruction stream, multiple data stream (SIMD)
A single control unit
Many processor units
A memory unit

Includes multiple processing units under a single control unit. All processors receive the same instruction but operate on different data.
Multiple instruction stream, single data stream (MISD)
Many processor units, each of which contains
o A control unit
o A local memory
Largely of theoretical interest

Processors receive different instructions but operate on the same data.

e.g. Space Shuttle flight control systems

Multiple instruction stream, multiple data stream (MIMD)
Many processor units
Many control units
A computer system capable of processing several programs at the same time.

Most multiprocessor and supercomputer systems can be classified in this category.

Parallel processing can also be classified via pipelining, which concerns operational and structural interconnections.
What is a Pipeline
Pipelining is used by all modern microprocessors to
enhance performance by overlapping the execution
of instructions.
A common analogy for a pipeline is a factory
assembly line. Assume that there are three stages:
o Welding
o Painting
o Polishing

For simplicity, assume that each task takes one hour.

What is a Pipeline
A single person would take three hours to produce one product.

With three people, one person works on each stage. Upon completing their stage, they pass the product on to the next person (since each stage takes one hour, there is no waiting).

The line then produces one product per hour, once the assembly line has been filled.
Pipelining: Laundry Example

A small laundry has one washer, one dryer and one operator. It takes 90 minutes to finish one load (loads A, B, C, D):

Washer takes 30 minutes
Dryer takes 40 minutes
Operator folding takes 20 minutes
Sequential Laundry
[Figure: timeline from 6 PM to midnight, loads A, B, C, D in task order. Each load takes 30 + 40 + 20 = 90 min, and the loads run back to back.]
This operator schedules loads to be delivered to the laundry every 90 minutes, which is the time required to finish one load. In other words, he will not start a new task until he has finished the previous one.
The process is sequential. Sequential laundry takes 6 hours for 4 loads.
Efficiently scheduled laundry: Pipelined Laundry

[Figure: timeline from 6 PM to midnight, loads A, B, C, D in task order. The washer (30 min), dryer (40 min) and folding (20 min) stages overlap, giving 30 + 40 + 40 + 40 + 40 + 20 min in total.]
Another operator asks for loads to be delivered to the laundry every 40 minutes.
Pipelined laundry takes 3.5 hours for 4 loads.
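The 6-hour and 3.5-hour totals can be checked with a small scheduling sketch. This is only an illustration: the function names are made up for this example, and it assumes one machine per stage and that a washed load may wait until the dryer is free.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Finish time when each stage starts a load as soon as both the load
    and the stage are free (assumption: one machine per stage)."""
    stage_free = [0] * len(stage_minutes)  # time at which each stage is next free
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free[s])
            ready = start + minutes
            stage_free[s] = ready
        finish = ready
    return finish


stages = [30, 40, 20]  # washer, dryer, folding (minutes)
print(sequential_minutes(stages, 4) / 60)  # 6.0 hours
print(pipelined_minutes(stages, 4) / 60)   # 3.5 hours
```

The 40-minute dryer is the slowest stage, so after the first load a finished load leaves the dryer every 40 minutes.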
Pipelining Facts
[Figure: pipelined laundry timeline from 6 PM to 9 PM, loads A to D in task order, stage times 30, 40, 40, 40, 40, 20 min. Annotation: the washer waits for the dryer for 10 minutes.]
Multiple tasks operate simultaneously
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to fill the pipeline and time to drain it reduce speedup
Building a Car
Unpipelined: start and finish a job before moving to the next

[Figure: jobs versus time; three cars built one after another, 24 hrs. each.]
Parallelism = 1 car
Latency = 24 hrs.
Throughput = 1 car per 24 hrs.

Latency: the amount of time that a single operation takes to execute
Throughput: the rate at which operations get executed (generally expressed as operations/second or operations/cycle)
The Assembly Line
Pipelined: break the job into smaller stages
[Figure: jobs versus time; each car passes through stages A (Eng.), B (Body) and C (Paint), 8 h each, with a new car starting every 8 h.]
Parallelism = 3 cars
Latency = 24 hrs.
Throughput = 1 car per 8 hrs. (3X)
In computer..
Unpipelined: start and finish a job before moving to the next
[Figure: jobs versus time; each instruction completes FET, DEC and EXE before the next instruction starts.]
In computer..
Pipelined: break the job into smaller stages
[Figure: jobs versus time; stages A (FET), B (DEC) and C (EXE). I1 enters FET in cycle 1, I2 in cycle 2 and I3 in cycle 3, so successive instructions overlap.]
In computer..
Unpipelined: start and finish a job before moving to the next
[Figure: jobs versus time; each instruction takes 3 ns for FET, DEC and EXE.]

Clock Speed = 1/3 ns = 333 MHz

In computer..
Pipelined: break the job into smaller stages
[Figure: jobs versus time; stages A (FET), B (DEC) and C (EXE) take 1 ns each, 3 ns per instruction in total; I1, I2 and I3 start in cycles 1, 2 and 3.]
Clock Speed = 1/1 ns = 1 GHz
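The two clock speeds follow directly from the cycle times. A minimal sketch, assuming three 1 ns stages and ignoring any latch or register delay between stages (the function name is made up for this example):

```python
def clock_frequency_mhz(cycle_time_ns):
    """Clock frequency in MHz for a clock period given in nanoseconds."""
    return 1000.0 / cycle_time_ns


stage_ns = [1, 1, 1]  # FET, DEC, EXE (assumed equal, 1 ns each)

unpipelined_cycle = sum(stage_ns)  # a whole instruction per clock: 3 ns
pipelined_cycle = max(stage_ns)    # the slowest stage sets the clock: 1 ns

print(round(clock_frequency_mhz(unpipelined_cycle)))  # 333
print(round(clock_frequency_mhz(pipelined_cycle)))    # 1000
```

The latency of one instruction is still 3 ns in both cases; only the rate at which instructions complete changes.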
Pipelining

Latency: the amount of time that a single operation takes to execute

Throughput: the rate at which operations get executed (generally expressed as operations/second or operations/cycle)
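Clocks and Latches

[Figure: Stage 1 and Stage 2 connected directly; then the same stages each followed by a latch (L) driven by a common clock (Clk).]

Four segment pipeline:

[Figure: Input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4, with the registers driven by a common Clock.]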
Example
Assume a 2 ns flip-flop delay
Characteristics Of Pipelining
Decomposes a sequential process into segments.

Divides the processor into segment processors, each one dedicated to a particular segment.

Each segment is executed in a dedicated segment processor that operates concurrently with all other segments.

Information flows through these multiple hardware segments.

If the stages of a pipeline are not balanced and one stage is slower than another, the throughput of the entire pipeline is affected.
Pipelining
Instruction execution is divided into k segments or stages
An instruction exits pipe stage k-1 and proceeds into pipe stage k
All pipe stages take the same amount of time, called one processor cycle

The length of the processor cycle is determined by the slowest pipe stage

[Figure: k segments]
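Pipeline Performance
n: number of instructions (equivalent to the number of loads in the laundry example)
k: number of stages in the pipeline (washing, drying and folding in the laundry example)
$\tau$: clock cycle (the time of the slowest task)
$T_k$: total time

$T_k = (k + (n - 1))\,\tau$

$\text{Speedup} = \dfrac{T_1}{T_k} = \dfrac{n k \tau}{(k + (n - 1))\,\tau} \rightarrow k \quad \text{as } n \rightarrow \infty$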
Efficiently scheduled laundry: Pipelined Laundry

[Figure repeated: pipelined laundry timeline from 6 PM to midnight for loads A, B, C, D, with stage times 30, 40, 40, 40, 40, 20 min.]
Speedup
Consider a k-segment pipeline operating on n data sets. (In the above example, k = 3 and n = 4.)

It takes k clock cycles to fill the pipeline and get the first result from the output of the pipeline.

After that, the remaining (n - 1) results come out one per clock cycle.

It therefore takes (k + n - 1) clock cycles to complete the task.
Speedup
If we execute the same task sequentially in a single processing unit, it takes (k * n) clock cycles.
The speedup gained by using the pipeline is:

S = k * n / (k + n - 1)
Speedup
S = k * n / (k + n - 1)

For n >> k (such as 1 million data sets on a 3-stage pipeline),

S ~ k

So for large data sets the speedup approaches the number of functional units. This is because the multiple functional units work in parallel except during the filling and draining cycles.
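How quickly S approaches k can be seen by evaluating the formula for growing n. A minimal sketch (the function name is chosen for this example only):

```python
def speedup(k, n):
    """Ideal speedup of a k-segment pipeline on n data sets: k*n / (k + n - 1)."""
    return (k * n) / (k + n - 1)


# 3-stage pipeline with a growing number of data sets
for n in (1, 4, 100, 1_000_000):
    print(n, round(speedup(3, n), 4))
```

With n = 1 there is no overlap at all, so the speedup is 1; with n = 4 (the laundry example) it is 2; for 1 million data sets it is within rounding of k = 3.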
Speedup
Example
- 4-stage pipeline
- suboperation in each stage: tp = 20 ns
- 100 tasks to be executed
- 1 task in the non-pipelined system: 20 * 4 = 80 ns

Pipelined System
(k + n - 1) * tp = (4 + 99) * 20 = 2060 ns

Non-Pipelined System
n * k * tp = 100 * 80 = 8000 ns

Speedup
Sk = 8000 / 2060 = 3.88

For this workload, the 4-stage pipeline performs almost like a system with 4 identical function units working in parallel.
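The numbers in this example can be reproduced with a few lines of arithmetic. A minimal sketch, with function names made up for this illustration:

```python
def pipelined_time_ns(k, n, tp_ns):
    """k cycles to fill the pipeline, then one result per cycle for the other n - 1 tasks."""
    return (k + n - 1) * tp_ns


def non_pipelined_time_ns(k, n, tp_ns):
    """Every task passes through all k suboperations before the next task starts."""
    return n * k * tp_ns


k, n, tp = 4, 100, 20  # stages, tasks, suboperation time in ns
t_pipe = pipelined_time_ns(k, n, tp)
t_seq = non_pipelined_time_ns(k, n, tp)
print(t_pipe, t_seq, round(t_seq / t_pipe, 2))  # 2060 8000 3.88
```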
Example of Pipelining
Suppose we want to perform the combined multiply and add operations with a stream of numbers:

Ai * Bi + Ci for i = 1, 2, 3, ..., 7
Example of Pipelining
The sub-operations performed in each segment of the pipeline are as follows:

R1 ← Ai, R2 ← Bi
R3 ← R1 * R2, R4 ← Ci
R5 ← R3 + R4
Example of Pipelining
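[Figure: three-segment pipeline with inputs Ai, Bi and Ci and output R5.]
Segment 1: R1 ← Ai, R2 ← Bi (input Ai and Bi)
Segment 2: R3 ← R1 * R2, R4 ← Ci (multiplier: multiply and input Ci)
Segment 3: R5 ← R3 + R4 (adder: add Ci to product)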
Content of registers in pipeline example

Clock pulse   Segment 1    Segment 2         Segment 3
number        R1    R2     R3       R4       R5
1             A1    B1     ----     ----     ----
2             A2    B2     A1*B1    C1       ----
3             A3    B3     A2*B2    C2       A1*B1+C1
4             A4    B4     A3*B3    C3       A2*B2+C2
5             A5    B5     A4*B4    C4       A3*B3+C3
6             A6    B6     A5*B5    C5       A4*B4+C4
7             A7    B7     A6*B6    C6       A5*B5+C5
8             ----  ----   A7*B7    C7       A6*B6+C6
9             ----  ----   ----     ----     A7*B7+C7
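The register contents in the table can be reproduced by stepping the three segments once per clock pulse. The sketch below is an illustration only: the function name is made up, and None stands for the "----" entries (a register holding no valid data).

```python
def simulate(A, B, C):
    """Return one (pulse, R1, R2, R3, R4, R5) row per clock pulse for the
    pipeline R1<-Ai, R2<-Bi | R3<-R1*R2, R4<-Ci | R5<-R3+R4."""
    n = len(A)
    R1 = R2 = R3 = R4 = R5 = None
    rows = []
    for pulse in range(1, n + 3):  # k + n - 1 pulses with k = 3 segments
        # All registers load on the same clock edge, so every new value
        # is computed from the old register contents.
        new_R5 = R3 + R4 if R3 is not None else None
        new_R3 = R1 * R2 if R1 is not None else None
        # Segment 2 loads the Ci that belongs to the pair now in R1, R2.
        new_R4 = C[pulse - 2] if R1 is not None else None
        new_R1 = A[pulse - 1] if pulse <= n else None
        new_R2 = B[pulse - 1] if pulse <= n else None
        R1, R2, R3, R4, R5 = new_R1, new_R2, new_R3, new_R4, new_R5
        rows.append((pulse, R1, R2, R3, R4, R5))
    return rows


# Made-up sample data: Ai = i, Bi = 2, Ci = 10 for i = 1..7
for row in simulate(list(range(1, 8)), [2] * 7, [10] * 7):
    print(row)
```

The first result appears in R5 at pulse 3 and the last at pulse 9, which matches k + n - 1 = 3 + 7 - 1 = 9 clock pulses.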

Exercise: Looking at the above example, define how the operation Ai*Bi + Ci*Di + Ei is executed using a pipeline.
Arithmetic Pipeline
Arithmetic has been central to computing since its early days, and arithmetic operations consume much of the time spent within the arithmetic and logic unit.

Pipelining is therefore used to boost the performance of ALUs and has opened the way to many forms of high-performance computing.

Arithmetic pipelines are generally used for fixed-point and floating-point operations.
Arithmetic Pipeline: Floating Point Adder

A generic floating point number can be stated as

X = A * 2^a

where X is a binary value, A is the mantissa and a is the exponent.
Arithmetic Pipeline: Floating Point Adder

X = A * 2^a
Y = B * 2^b
Floating point addition can be carried out in 4 simple sub-operations:

Compare the exponents.
Align the mantissas.
Add or subtract the mantissas.
Normalize the result.
Arithmetic Pipeline: Floating Point Adder

Given below is a simple demonstration of how two decimal floats are added.

Consider the two input floats X and Y:

X = 0.9832 * 10^3
Y = 0.8929 * 10^2

Note: Decimal numbers are used for simplicity of explanation.

Arithmetic Pipeline: Floating Point Adder

X = 0.9832 * 10^3
Y = 0.8929 * 10^2

In the initial segment the two exponents are compared.

The larger exponent is 3, and thus it is chosen as the exponent for the result.

The difference between the two exponents is 1 (3 - 2).

Arithmetic Pipeline: Floating Point Adder
X = 0.9832 * 10^3
Y = 0.8929 * 10^2
Since Y has the smaller exponent, its mantissa is shifted to the right by one place, and the two values become
X = 0.9832 * 10^3
Y = 0.08929 * 10^3

Afterwards the two mantissas are simply added, giving the value Z:
Z = 1.07249 * 10^3
Finally the result is normalized so that the mantissa is a fraction with a nonzero first decimal digit:
Z = 0.107249 * 10^4
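The four sub-operations can be traced in code on the same decimal operands. This is a sketch for illustration, not a hardware description: the function name is invented, Python's decimal module is used so the decimal mantissas stay exact, and only addition is shown.

```python
from decimal import Decimal


def fp_add(a_mant, a_exp, b_mant, b_exp):
    """Add two numbers of the form mantissa * 10**exponent in four steps."""
    # Segment 1: compare the exponents by subtraction.
    diff = a_exp - b_exp
    # Segment 2: choose the larger exponent and align the other mantissa
    # by shifting it right |diff| decimal places.
    if diff >= 0:
        exp = a_exp
        b_mant = b_mant.scaleb(-diff)
    else:
        exp = b_exp
        a_mant = a_mant.scaleb(diff)
    # Segment 3: add the mantissas.
    mant = a_mant + b_mant
    # Segment 4: normalize so that 0.1 <= |mantissa| < 1, adjusting the exponent.
    while abs(mant) >= 1:
        mant = mant.scaleb(-1)
        exp += 1
    while mant != 0 and abs(mant) < Decimal("0.1"):
        mant = mant.scaleb(1)
        exp -= 1
    return mant, exp


print(fp_add(Decimal("0.9832"), 3, Decimal("0.8929"), 2))
# (Decimal('0.107249'), 4)
```

In a pipelined adder each commented step would be a separate segment with registers in between, so four different additions can be in progress at once.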
Arithmetic Pipeline for Floating Point Adder

[Figure: four-segment pipeline with registers (R) between segments. Inputs: exponents a, b and mantissas A, B.]
Segment 1: Compare exponents by subtraction (produces the difference)
Segment 2: Choose exponent; align mantissas
Segment 3: Add or subtract mantissas
Segment 4: Normalize result; adjust exponent
Instruction Pipeline

An instruction pipeline works in a similar manner to the arithmetic pipeline, although it operates on an instruction stream as opposed to a data stream.
Instruction Pipeline
Processing an instruction requires the following sequence of steps:

Fetch the instruction from memory.
Decode the instruction.
Calculate the effective address.
Fetch the operands from memory.
Execute the instruction.
Store the result in the proper place.
Instruction Pipeline
Consider the following specification of a pipeline meant to have 4 separate segments.

In such a system, up to 4 different instructions can be processed at the same time.
Pipeline Conflicts
In general, difficulties can arise for the reasons specified below.
Resource conflicts
occur when two segments access memory at the same time.
Data dependency conflicts
occur when an instruction depends on the result of a previous instruction that is not yet available.
Branch difficulties
arise from branches and other instructions that change the value of the PC.
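A data dependency conflict can be spotted by checking whether an instruction reads a register that a recent instruction writes. The sketch below is purely illustrative: the instruction format, register names and the `window` parameter (how many earlier instructions may still be in the pipeline) are assumptions, not part of the slides.

```python
def find_data_dependencies(program, window=2):
    """Return (producer, consumer, register) triples where an instruction
    reads a register written by one of the `window` preceding instructions."""
    conflicts = []
    for j, (_, sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i][0]
            if dest is not None and dest in sources:
                conflicts.append((i, j, dest))
    return conflicts


# Each instruction is (destination register, source registers); hypothetical program.
program = [
    ("R1", ("R2", "R3")),  # R1 <- R2 + R3
    ("R4", ("R1", "R5")),  # R4 <- R1 * R5, needs R1 from instruction 0
    ("R6", ("R7", "R8")),  # R6 <- R7 - R8, independent
]
print(find_data_dependencies(program))  # [(0, 1, 'R1')]
```

When such a pair is found, the consumer has to wait until the producer's result is available, or the result has to be supplied to it some other way.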
Four-segment CPU pipeline for overcoming Pipeline Conflicts
[Figure: flowchart of the four-segment CPU pipeline.]
Segment 1: Fetch instruction from memory
Segment 2: Decode instruction and calculate effective address
Segment 3: Fetch operand from memory
Segment 4: Execute instruction
Decision points: Branch? (after segment 2) and Interrupt? (after segment 4), each with yes and no exits
Other blocks: Interrupt handling, Update PC, Empty pipe
Four-segment CPU pipeline for overcoming Pipeline Conflicts
Timing of Instruction Pipeline

Step:           1   2   3   4   5   6   7   8   9   10  11  12  13

Instruction: 1  FI  DA  FO  EX
             2      FI  DA  FO  EX
    (Branch) 3          FI  DA  FO  EX
             4              FI  --  --  FI  DA  FO  EX
             5                  --  --  --  FI  DA  FO  EX
             6                                  FI  DA  FO  EX
             7                                      FI  DA  FO  EX
Four-segment CPU pipeline for overcoming Pipeline Conflicts
The four segments illustrated in the above table have the following meanings:

FI is the segment that fetches an instruction.

DA is the segment that decodes the instruction and calculates the effective address.

FO is the segment that fetches the operand.

EX is the segment that executes the instruction.
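The timing table can be approximated with a short scheduling sketch. It is a simplified model under stated assumptions: the function and parameter names are invented, and only the effective fetch of each instruction is recorded, so the discarded fetch of instruction 4 in step 4 and the idle "--" steps are not shown.

```python
SEGMENTS = ["FI", "DA", "FO", "EX"]


def schedule(num_instructions, branches=()):
    """Map each instruction number to {step: segment}, with steps starting at 1.
    The instruction after a branch is fetched one step after the branch's EX."""
    timing = {}
    next_fetch = 1
    for i in range(1, num_instructions + 1):
        start = next_fetch
        timing[i] = {start + s: name for s, name in enumerate(SEGMENTS)}
        if i in branches:
            next_fetch = start + len(SEGMENTS)  # wait until the branch has executed
        else:
            next_fetch = start + 1              # normal overlap: one new fetch per step
    return timing


timing = schedule(7, branches={3})
for i, steps in timing.items():
    print(i, steps)
```

With instruction 3 as the branch, instruction 4 is fetched in step 7 and instruction 7 finishes EX in step 13, as in the table; without the branch the seven instructions would finish in k + n - 1 = 10 steps.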

Thank You
