
Introduction to Computer Architecture

(Parallel and Pipeline processors)

KR Chowdhary
Professor & Head
Email: [email protected]
webpage: krchowdhary.com

Department of Computer Science and Engineering


MBM Engineering College, Jodhpur

November 22, 2013



Classification of Parallel Processors

◮ One classification, by M.J. Flynn, considers the organization of a computer
  system by the number of instructions and data items that can be manipulated
  simultaneously.
◮ The sequence of instructions read from the memory constitutes an instruction
  stream.
◮ The operations performed on data in the processor constitute a data stream.
◮ Parallel processing may occur in the instruction stream, in the data stream,
  or in both.
1. Single-instruction single-data streams (SISD): Instructions are executed
   sequentially.
2. Single-instruction multiple-data streams (SIMD): All processors receive the
   same instruction from the control unit, but operate on different sets of
   data.
3. Multiple-instruction single-data streams (MISD): Of theoretical interest
   only, as no practical organization can be constructed this way.
4. Multiple-instruction multiple-data streams (MIMD): Several programs can
   execute at the same time. Most multiprocessors come in this category.



Flynn’s taxonomy of computer architecture (1966)

Figure 1: SISD and SIMD architecture. (SISD: a single control unit issues an
instruction stream to one processor, which exchanges a data stream with memory
and I/O. SIMD: one control unit broadcasts the instruction stream to processors
P1 . . . Pn , each with its own memory M1 . . . Mn ; the program is loaded from
a front end.)

Figure 2: MIMD Architecture. (Control units 1 . . . n each issue their own
instruction stream to their own processor and memory, so several independent
instruction and data streams exist simultaneously.)

One class of parallel processing that does not fit neatly into this
classification is pipeline processing.



Parallel processing

Parallel Processing:
◮ Increasing speed by doing many things in parallel.
◮ Let P be a sequential processor processing a task T sequentially. If T is
  partitioned into n subtasks T1 , T2 , . . . , Tn of approximately the same
  size, then a parallel machine P' (say) having n processors can be programmed
  so that all the subtasks of T execute in parallel.
◮ Ideally, P' then executes n times faster than P.
◮ A failure of the CPU is fatal to a sequential processor, but not to a
  parallel processor.
◮ Some of the applications of parallel computers (processors) are:
  1. Expert systems for AI
  2. Fluid flow analysis
  3. Seismic data analysis
  4. Long-range weather forecasting
  5. Computer-assisted tomography
  6. Nuclear reactor modeling
  7. Visual image processing
  8. VLSI design
◮ The typical characteristics of such computations are: a vast amount of
  computation, floating-point arithmetic, and a vast number of operands.



Computational Models

◮ We assume that a given computation can be divided into concurrent tasks for
  execution on the multiprocessor.
◮ Equal Duration Model: A given task can be divided into n equal subtasks,
  each of which can be executed by one processor. If ts is the execution time
  of the whole task on a single processor, then the time taken by each
  processor to execute its subtask is tm = ts /n. Since, in this model, all
  processors execute their subtasks simultaneously, the time taken to execute
  the whole task is also tm = ts /n.
◮ The speedup factor of a parallel system is defined as the ratio of the time
  taken by a single processor to solve a given problem instance to the time
  taken by a parallel system of n processors to solve the same instance:

      S(n) = ts /tm = ts /(ts /n) = n
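◮ A tiny C sketch of this model (illustrative only; the function name and the
  numbers below are made up for this note, not taken from the slides):

  #include <stdio.h>

  /* Equal-duration model: n equal subtasks, one per processor, all running
     simultaneously, with no overheads of any kind. */
  double equal_duration_speedup(double ts, int n) {
      double tm = ts / n;    /* time taken by each processor, and by the whole task */
      return ts / tm;        /* S(n) = ts / (ts/n) = n */
  }

  int main(void) {
      printf("S(8) = %.1f\n", equal_duration_speedup(100.0, 8));  /* prints S(8) = 8.0 */
      return 0;
  }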



Pipelining
◮ A typical example of parallel processing is a one-dimensional array of
  processors: n identical processors P1 . . . Pn , each having its own local
  memory. The processors communicate by message passing (send/receive).

Figure 3: Pipeline processing. (Processors P1 , P2 , . . . , Pn connected in a
chain, communicating through serial I/O send/receive operations.)

◮ In total, n operations go on in parallel.
◮ A pipeline constitutes a sequence of processing circuits, called segments or
  stages. Ideally, an m-stage pipeline has the same throughput as m separate
  units.

Figure 4: Pipeline segments. (Segment i consists of a buffer register Ri
feeding a computing element Ci ; segments 1 to m are chained as R1 -C1 ,
R2 -C2 , . . . , Rm -Cm , ending in a result register R.)


Pipeline Processors

1. Instruction pipeline: instructions move through the various stages of the
   CPU during the instruction cycle: fetch, decode, execute. Thus there can be
   three different instructions in different stages of execution at the same
   time: one being fetched, the previous one being decoded, and the one before
   that being executed.
2. Arithmetic pipeline: the data is computed through different stages.

Figure 5: Instruction and data pipeline examples. (Top: a three-segment
instruction pipeline; the instruction address enters segment 1 for instruction
fetch, segment 2 does instruction decode, and segment 3 does instruction
execute, producing results. Bottom: a four-segment floating-point adder for
X = (XM , XE ) and Y = (YM , YE ): compare exponents, align mantissas, add
mantissas, normalize the result to produce X + Y.)
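◮ As a rough illustration of the four adder segments named in Figure 5, the
  toy C sketch below performs the same four steps on a simplified decimal
  (mantissa, exponent) pair. This is only an assumed model for this note; real
  hardware operates on binary floating-point fields and handles many cases
  (e.g. results smaller than 1) that are omitted here.

  #include <stdio.h>

  /* Toy (mantissa, exponent) pair: value = m * 10^e. Illustrative only. */
  typedef struct { double m; int e; } fp;

  fp fp_add(fp x, fp y) {
      if (x.e < y.e) { fp t = x; x = y; y = t; }   /* segment 1: compare exponents */
      while (y.e < x.e) { y.m /= 10.0; y.e++; }    /* segment 2: align mantissas   */
      fp z = { x.m + y.m, x.e };                   /* segment 3: add mantissas     */
      while (z.m >= 10.0 || z.m <= -10.0) {        /* segment 4: normalize result  */
          z.m /= 10.0;
          z.e++;
      }
      return z;
  }

  int main(void) {
      fp x = { 9.5, 2 }, y = { 8.2, 1 };           /* 950 + 82 */
      fp z = fp_add(x, y);
      printf("%.4f x 10^%d\n", z.m, z.e);          /* 1.0320 x 10^3 = 1032 */
      return 0;
  }

◮ In the pipelined adder each of these four steps is a separate segment with
  its own buffer register, so four different additions can be in flight at
  once.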



Pipe-lining Example

◮ Consider an example to compute Ai ∗ Bi + Ci , for i = 1, 2, 3, 4, 5. The
  circuit uses registers R1 -R5 , a multiplier, and an adder, organized into
  three segments (Figure 6).

Figure 6: A segment comprising registers and computing elements. (R1 and R2
feed the multiplier, whose output goes to R3 ; R4 receives Ci ; the adder
combines R3 and R4 into R5 .)

R1 ← Ai , R2 ← Bi         ; input Ai , Bi
R3 ← R1 ∗ R2 , R4 ← Ci    ; multiply, and input Ci
R5 ← R3 + R4              ; add Ci to the product
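◮ The C sketch below simulates these register transfers clock by clock. The
  operand values are arbitrary and chosen only for illustration; the printed
  results follow the trace shown in Table 1 below, with results emerging from
  R5 from clock pulse 3 onward.

  #include <stdio.h>

  /* Clock-by-clock simulation of the three-segment pipeline
     (R1,R2) -> (R3,R4) -> R5 computing A[i]*B[i] + C[i] for i = 1..5. */
  int main(void) {
      double A[5] = {1, 2, 3, 4, 5}, B[5] = {10, 20, 30, 40, 50};
      double C[5] = {7, 7, 7, 7, 7};
      double R1 = 0, R2 = 0, R3 = 0, R4 = 0, R5 = 0;
      int n = 5, k = 3;

      for (int clock = 1; clock <= k + (n - 1); clock++) {
          /* update segments back to front, so each segment reads the values
             its predecessor produced in the previous clock period */
          R5 = R3 + R4;                             /* segment 3: add              */
          if (clock >= 2 && clock <= n + 1) {       /* segment 2: multiply, read C */
              R3 = R1 * R2;
              R4 = C[clock - 2];
          }
          if (clock <= n) {                         /* segment 1: load operands    */
              R1 = A[clock - 1];
              R2 = B[clock - 1];
          }
          if (clock >= k)
              printf("clock %d: R5 = %g\n", clock, R5);
      }
      return 0;
  }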



Pipe-lining Example

Table 1: Computation of the expression Ai ∗ Bi + Ci in space and time in a 3-stage pipeline.

Clock pulse    Segment 1      Segment 2          Segment 3
    no.         R1 , R2        R3 , R4               R5
     1          A1 , B1          − , −                −
     2          A2 , B2       A1 ∗ B1 , C1            −
     3          A3 , B3       A2 ∗ B2 , C2        A1 ∗ B1 + C1
     4          A4 , B4       A3 ∗ B3 , C3        A2 ∗ B2 + C2
     5          A5 , B5       A4 ∗ B4 , C4        A3 ∗ B3 + C3
     6             −          A5 ∗ B5 , C5        A4 ∗ B4 + C4
     7             −             − , −            A5 ∗ B5 + C5
◮ Any operation that can be decomposed into a sequence of sub-operations of
  about the same complexity can be implemented by a pipeline processor.
◮ Consider a k-segment pipeline with clock cycle time tp sec., on which a
  total of n tasks (T1 , T2 , . . . , Tn ) are to be executed.
◮ T1 requires time equal to k.tp sec. The remaining n − 1 tasks emerge from
  the pipeline at the rate of one task per clock cycle and are completed in
  (n − 1).tp sec, so the total number of clock cycles required is k + (n − 1).
◮ For k = 3 segments and n = 5 tasks this is 3 + (5 − 1) = 7, as is clear from
  Table 1.
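◮ A trivial check of this count (a hypothetical helper written for this note,
  not taken from the slides):

  #include <stdio.h>

  /* Total clock cycles for n tasks on a k-segment pipeline: the first task
     needs k cycles, and each remaining task adds one more. */
  int pipeline_cycles(int k, int n) { return k + (n - 1); }

  int main(void) {
      printf("%d\n", pipeline_cycles(3, 5));   /* prints 7, matching Table 1 */
      return 0;
  }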
Computational Models

◮ Consider a non-pipelined unit that performs the same operation and takes
  time equal to tu to complete each task; the total time for n tasks is n.tu .
  The speedup of a k-segment pipeline with clock period tp is:

      S(n) = n.tu /((k + (n − 1)).tp )            (1)

◮ For a large number of tasks, n >> k − 1, so k + n − 1 ≈ n, and

      S(n) ≈ n.tu /(n.tp )                        (2)
           = tu /tp                               (3)
◮ Instruction pipelining is similar to the use of an assembly line in a
  manufacturing plant.
◮ An instruction's execution is broken into many steps, which indicates the
  scope for pipelining.
◮ Pipelining requires registers to store data between stages.
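◮ A minimal sketch of equation (1), assuming a non-pipelined unit that needs
  tu per task and a k-segment pipeline clocked at tp ; the parameter names
  follow the slide and the numbers are illustrative:

  #include <stdio.h>

  /* Speedup of a k-segment pipeline over a unit that takes tu per task. */
  double pipeline_speedup(double tu, double tp, int k, long n) {
      return (n * tu) / ((k + (n - 1.0)) * tp);
  }

  int main(void) {
      /* with tu = k.tp = 6 and tp = 2, the speedup approaches k = 3 as n grows */
      printf("n = 5:    S = %.2f\n", pipeline_speedup(6.0, 2.0, 3, 5));
      printf("n = 1000: S = %.2f\n", pipeline_speedup(6.0, 2.0, 3, 1000));
      return 0;
  }

◮ With tu = k.tp , the speedup approaches k for large n; that is, a k-segment
  pipeline is at best k times faster than the non-pipelined unit.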



Computational Models
Parallel computation with serial section model:
◮ It is assumed that a fraction f of a given task (computation) cannot be
  divided into concurrent subtasks. The remaining part (1 − f ) is assumed to
  be divisible (for example, f may correspond to data input).
◮ The time required to execute the task on n processors is:

      tm = f .ts + (1 − f ).ts /n                 (4)

◮ The speedup is therefore:

      S(n) = ts /(f .ts + (1 − f ).ts /n)         (5)
           = n/(1 + (n − 1).f )                   (6)

◮ So, S(n) is primarily determined by the fraction of the code that cannot be
  divided.
◮ If the task is completely serial (f = 1), then no speedup can be achieved
  even by a parallel processor.
◮ For n → ∞,

      S(n) = 1/f                                  (7)

  which is the maximum speedup.
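◮ A small C sketch of equation (6); the values of f and n below are
  illustrative only:

  #include <stdio.h>

  /* Serial-fraction (Amdahl) speedup: f is the fraction that cannot be
     parallelized, n is the number of processors. */
  double amdahl_speedup(double f, long n) {
      return n / (1.0 + (n - 1) * f);
  }

  int main(void) {
      printf("f = 0.05, n = 10:   S = %.2f\n", amdahl_speedup(0.05, 10));
      printf("f = 0.05, n = 1000: S = %.2f\n", amdahl_speedup(0.05, 1000));
      return 0;
  }

◮ Even with only 5% serial code, a thousand processors give a speedup of less
  than 20, close to the 1/f limit.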
Computational Models

◮ The improvement in performance (speed) of a parallel algorithm over a
  sequential one is limited not by the number of processors but by the
  fraction of the algorithm (code) that cannot be parallelized (Amdahl's law).
◮ Considering the communication overhead tc :

      S(n) = ts /(f .ts + (1 − f ).(ts /n) + tc )     (8)
           = n/(f .(n − 1) + 1 + n.(tc /ts ))         (9)

◮ For n → ∞,

      S(n) = 1/(f + tc /ts )                          (10)

◮ Thus, S(n) depends on the communication overhead tc as well.
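◮ Extending the previous sketch to equation (9); here tc/ts is the
  communication overhead expressed relative to the serial run time, and the
  values are again illustrative:

  #include <stdio.h>

  /* Serial-fraction speedup including a communication-overhead term tc/ts. */
  double amdahl_speedup_comm(double f, double tc_over_ts, long n) {
      return n / (f * (n - 1) + 1 + n * tc_over_ts);
  }

  int main(void) {
      /* for large n this tends to 1/(f + tc/ts) = 1/(0.05 + 0.05) = 10 */
      printf("S = %.2f\n", amdahl_speedup_comm(0.05, 0.05, 100000));
      return 0;
  }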



Pipe-lining processors

Instruction pipelining: the typical stages of the pipeline are:

1. FI (fetch instruction)
2. DI (decode instruction)
3. CO (calculate operands)
4. FO (fetch operands)
5. EI (execute instruction)
6. WO (write operands)



Instruction Pipe-lining

◮ Suppose nine different instructions are to be executed.
◮ The six-stage pipeline can reduce the execution time for the 9 instructions
  from 54 time units to 14 time units.
                                   time units →
Instruc.  1  2  3  4  5  6  7  8  9  10 11 12 13 14
In1       FI DI CO FO EI WO
In2          FI DI CO FO EI WO
In3             FI DI CO FO EI WO
In4                FI DI CO FO EI WO
In5                   FI DI CO FO EI WO
In6                      FI DI CO FO EI WO
In7                         FI DI CO FO EI WO
In8                            FI DI CO FO EI WO
In9                               FI DI CO FO EI WO

◮ The diagram assumes that each instruction goes through all 6 stages of the
  pipeline. But, for example, a load instruction does not need WO.
◮ It is also assumed that there are no memory conflicts; for example, FI, FO,
  and WO all require memory access and cannot, in general, proceed together.
◮ The value may be in cache, or FO/WO may be null.
◮ The six stages may not be of equal duration, and a conditional branch or
  interrupt instruction may invalidate several fetches.
◮ After which stage should the pipeline check for a conditional
  branch/interrupt?
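◮ The ideal diagram above can be generated mechanically. The short C sketch
  below, which assumes no hazards or stalls (unrealistic, as the bullets above
  note), prints one row per instruction and shows completion after
  k + (n − 1) = 14 time units:

  #include <stdio.h>

  int main(void) {
      const char *stage[] = {"FI", "DI", "CO", "FO", "EI", "WO"};
      const int k = 6, n = 9;        /* 6-stage pipeline, 9 instructions */

      printf("    ");
      for (int t = 1; t <= n + k - 1; t++) printf(" %2d", t);
      printf("\n");

      for (int i = 0; i < n; i++) {
          printf("In%-2d", i + 1);
          for (int t = 1; t <= n + k - 1; t++) {
              int s = t - 1 - i;     /* stage occupied by instruction i at time t */
              printf(" %s", (s >= 0 && s < k) ? stage[s] : "  ");
          }
          printf("\n");
      }
      return 0;
  }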
Factors in Instruction Pipe-lining

◮ There is overhead in each stage of the pipeline for data movement from
  buffer to buffer.
◮ The amount of control logic needed to handle memory/register dependencies
  increases with the size of the pipeline.
◮ It takes time for the buffers to operate.
◮ Pipeline hazards occur when the pipeline, or a portion of it, stalls:
  1. Resource hazards: two or more instructions in the pipeline require the
     same resource, say the ALU or a register (also called structural
     hazards).
  2. Data hazards: conflicts in operand/memory access.
  3. Control hazards (also called branch hazards): wrong decisions in branch
     prediction.



Vector Processing

◮ In many computational applications, a problem can be formulated in terms of
  vectors and matrices. Processing these on a special computer is called
  vector processing.
◮ A vector is V = [V1 V2 V3 . . . Vn ]. The i-th element Vi is written V [i].
  A program to add two vectors A and B of length 100, producing vector C, is:
◮ Scalar concept:

  for(i=0; i < 100; i++)
      c[i]=b[i]+a[i];

◮ In machine language this becomes a loop of scalar instructions, roughly:

        mvi i, 0
  loop: read A[i]
        read B[i]
        store C[i] = A[i] + B[i]
        i = i + 1
        cmp i, 100
        jnz loop

◮ This loop repeatedly fetches instructions that access the arrays A and B,
  while only the counter needs to be updated. A vector processing computer
  eliminates the need to fetch and decode these instructions repeatedly: they
  are fetched and decoded only once, but executed 100 times. This allows the
  operation to be specified simply as:

      C (1 : 100) = A(1 : 100) + B(1 : 100)



Vector Processing

◮ A vector instruction includes the initial addresses of the operands, the
  length of the vectors, and the operation to be performed, all in one
  composite instruction. The addition is done with a pipelined floating-point
  adder. It is possible to design the vector processor so that all operands
  are stored in registers in advance.
◮ Vector processing can be applied to matrix multiplication,
  [l × m] × [m × n].


RISC v/s CISC

◮ CISC (Complex Instruction Set Computer):
  1. Complex programs motivated complex and powerful HLLs. This produced a
     semantic gap between HLLs and machine languages, and required more
     effort in compiler construction.
  2. The attempt to reduce the semantic gap and to simplify compiler
     construction motivated more powerful instruction sets (CISC).
  3. CISC provides better support for HLLs.
  4. Fewer instructions per program, hence smaller program size, less memory,
     and faster access.
◮ RISC (Reduced Instruction Set Computer):
  1. A large number of general-purpose registers, with compiler technology
     used to optimize register usage.
  2. Register-to-register (R-R) operations (advantage?).
  3. Simple addressing modes.
  4. A limited and simple instruction set (one instruction per machine cycle).
  5. Advantage of simple instructions?
  6. Optimization of the instruction pipeline.


Exercises

1. Design a pipeline configuration to carry out the task of computing
   (Ai + Bi )/(Ci + Di ).
2. Construct a pipeline to add 100 floating-point numbers, i.e., find the
   result of x1 + x2 + . . . + x100 .
3. 3.1 List the advantages of designing a floating-point processor in the form
       of a k-segment pipeline rather than a k-unit parallel processor.
   3.2 A floating-point pipeline has four segments S1 , S2 , S3 , S4 , whose
       delays are 100, 90, 100, and 110 nanoseconds, respectively. What is the
       pipeline's maximum throughput in MFLOPS?
4. It is frequently argued that the large (super) computer is approaching its
   performance limits, and that future advances in large computers will
   depend on interconnecting large numbers of inexpensive computers. List the
   arguments for and against.
5. List the features to be added to sequential programming languages so that
   they can be used on a large interconnection of small, inexpensive
   computers.
6. Let S1 , S2 , . . ., Sk denote a sequence of k operations in a program.
   Suppose that execution of these operations on a uniprocessor produces the
   same results regardless of the order of execution of the k Si 's. Show
   that this does not imply that the Si 's are parallelizable on a
   multiprocessor.
Exercises

7. Prove that the general problem of determining whether two program segments
   S1 , S2 are parallelizable is undecidable, by showing that a solution to
   this problem implies a solution to the halting problem for Turing
   machines. (Hint: assume that an algorithm A to determine parallelizability
   exists, and consider applying A to a program containing the statement: if
   Turing machine T halts after at most n steps then S1 else S2 .)



