Module 5
Parallel Processing
• The system may have two or more ALUs and be able to execute two or
more instructions at the same time
• Also, the system may have two or more processors operating concurrently
• Example: the ALU can be separated into three units and the operands diverted
to each unit under the supervision of a control unit
• Parallel processing can be classified according to:
o The internal organization of the processors
o The interconnection structure between processors
o The flow of information through the system
o The number of instructions and data items that are manipulated simultaneously
• The sequence of instructions read from memory constitutes the instruction stream, and the operations performed on the data in the processor constitute the data stream
• Parallel processing may occur in the instruction stream, the data stream, or both
Computer classification:
o Single instruction stream, single data stream – SISD
o Single instruction stream, multiple data stream – SIMD
o Multiple instruction stream, single data stream – MISD
o Multiple instruction stream, multiple data stream – MIMD
PIPELINING
• Each segment performs partial processing dictated by the way the task is
partitioned
• The result obtained from the computation in each segment is transferred to the
next segment in the pipeline
• The final result is obtained after the data have passed through all segments
• The suboperations performed in each segment for the combined operation Ai * Bi + Ci are:
Segment 1: R1 ← Ai , R2 ← Bi
Segment 2: R3 ← R1 * R2, R4 ← Ci
Segment 3: R5 ← R3 + R4
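The three suboperations above compute Ai * Bi + Ci for a stream of operands. A minimal Python sketch of the clocked behavior follows; only the register transfers come from the notes, while the simulation loop itself and the function name are illustrative:

```python
def pipeline_multiply_add(A, B, C):
    """Clock-by-clock sketch of the three-segment pipeline:
    segment 1: R1 <- Ai, R2 <- Bi
    segment 2: R3 <- R1 * R2, R4 <- Ci
    segment 3: R5 <- R3 + R4
    """
    n = len(A)
    R1 = R2 = R3 = R4 = R5 = None
    results = []
    # n tasks need n + 2 clock pulses: 2 to fill the pipeline,
    # then one result per pulse once it is full
    for clock in range(n + 2):
        # Update later segments first, so each segment reads the values
        # latched by the previous segment on the preceding clock pulse
        if clock >= 2:                      # segment 3
            R5 = R3 + R4
            results.append(R5)
        if 1 <= clock <= n:                 # segment 2
            R3, R4 = R1 * R2, C[clock - 1]
        if clock < n:                       # segment 1
            R1, R2 = A[clock], B[clock]
    return results
```

After the two-pulse fill, one finished result emerges on every clock pulse, which is the behavior the following bullets describe.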
• Any operation that can be decomposed into a sequence of suboperations of
about the same complexity can be implemented by a pipeline processor
• Once the pipeline is full, it takes only one clock period to obtain an output
• Consider a nonpipeline unit that performs the same operation and takes tn time to complete each task, so n tasks require n·tn
• A pipeline with k segments and clock period tp completes the first task after k·tp and each remaining task one clock period later, for a total of (k + n – 1)·tp
• The speedup of pipeline processing over an equivalent nonpipeline processing is defined by the ratio
S = n·tn / ((k + n – 1)·tp)
• If we assume that the time to process a task is the same in both circuits, tn = k·tp, and the speedup becomes S = n·k / (k + n – 1), which approaches k as n grows
• Example:
o Cycle time = tp = 20 ns
o Number of segments = k = 4
o Number of tasks = n = 100
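The speedup for these numbers can be checked directly from the ratio S = n·tn / ((k + n – 1)·tp) with tn = k·tp; the small helper below is illustrative:

```python
def speedup(k, n, tp):
    """Pipeline speedup S = n*tn / ((k + n - 1)*tp), assuming tn = k*tp."""
    tn = k * tp                       # nonpipeline time per task
    nonpipeline_time = n * tn         # 100 * 80 ns = 8000 ns
    pipeline_time = (k + n - 1) * tp  # 103 * 20 ns = 2060 ns
    return nonpipeline_time / pipeline_time

print(round(speedup(k=4, n=100, tp=20), 2))  # prints 3.88
```

The result, about 3.88, falls short of the theoretical maximum of k = 4 because the pipeline spends k – 1 extra cycles filling up.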
• In practice a pipeline cannot operate at its maximum theoretical rate; one reason is that the clock cycle must be chosen to equal the time delay of the segment with the maximum propagation time
• Pipeline organization is applicable for arithmetic operations and fetching
instructions
• As the number of tasks increases, k + n – 1 approaches n and the speedup becomes
S = tn / tp
• Therefore, the theoretical maximum speedup that a pipeline can provide is k
Arithmetic Pipeline
• Pipeline arithmetic units are usually found in very high speed computers
• Four segments are used to perform floating-point addition: compare the exponents, align the mantissas, add the mantissas, and normalize the result
• X = 0.9504 x 10^3 and Y = 0.8200 x 10^2
• The two exponents are subtracted in the first segment to obtain 3 – 2 = 1
• The larger exponent, 3, is chosen as the exponent of the result
• Segment 2 shifts the mantissa of Y to the right to obtain Y = 0.0820 x 10^3
• The mantissas are now aligned
• Segment 3 produces the sum Z = 1.0324 x 10^3
• Segment 4 normalizes the result by shifting the mantissa once to the right and incrementing the exponent by one to obtain Z = 0.10324 x 10^4
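The four segments of this worked example can be sketched as one function; the (mantissa, exponent) representation and the function name are assumptions, and left-shift normalization of results smaller than 0.1 is omitted for brevity:

```python
def fp_add(x, y):
    """Four-segment floating-point add on (mantissa, exponent) pairs,
    where a pair (m, e) represents the value m * 10**e."""
    (mx, ex), (my, ey) = x, y
    # Segment 1: compare the exponents by subtraction; keep the larger one
    if ex < ey:
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    # Segment 2: align the mantissas by shifting the smaller one right
    my = my / 10 ** (ex - ey)
    # Segment 3: add the aligned mantissas
    mz, ez = mx + my, ex
    # Segment 4: normalize by shifting right and incrementing the exponent
    while abs(mz) >= 1:
        mz, ez = mz / 10, ez + 1
    return mz, ez

mz, ez = fp_add((0.9504, 3), (0.8200, 2))  # Z ~ 0.10324 x 10^4
```

In hardware each segment would process a different pair of operands on every clock pulse; here the four steps simply run in sequence for one pair.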
Instruction Pipeline
• If a branch out of sequence occurs, the pipeline must be emptied and all
the instructions that have been read from memory after the branch
instruction must be discarded
• Thus, an instruction stream can be placed in a queue, waiting for decoding and
processing by the execution segment
• This reduces the average access time to memory for reading instructions
• Whenever there is space in the buffer, the control unit initiates the next
instruction fetch phase
• The following steps are needed to process each instruction:
o Fetch the instruction from memory
o Decode the instruction
o Calculate the effective address
o Fetch the operands from memory
o Execute the instruction
o Store the result in the proper place
• The pipeline may not perform at its maximum rate due to:
o Different segments taking different times to operate
o Some segment being skipped for certain operations
o Memory access conflicts
• Example: Four-segment instruction pipeline
• Assume that the decoding can be combined with calculating the EA in one
segment
• Assume that most of the instructions store the result in a register so that the execution
and storing of the result can be combined in one segment
• Up to four suboperations in the instruction cycle can overlap and up to four different
instructions can be in progress of being processed at the same time
• It is assumed that the processor has separate instruction and data memories
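The overlap can be visualized with a small space-time table. The segment mnemonics FI (fetch instruction), DA (decode and calculate effective address), FO (fetch operand), and EX (execute) follow the four-segment model described above; the table-building code itself is illustrative:

```python
SEGMENTS = ["FI", "DA", "FO", "EX"]  # the four pipeline segments

def space_time(n_instructions):
    """Map each clock cycle to the (instruction, segment) pairs active in it,
    assuming no conflicts and one segment per cycle per instruction."""
    table = {}
    for i in range(n_instructions):          # instruction i enters at cycle i+1
        for s, name in enumerate(SEGMENTS):
            table.setdefault(i + s + 1, []).append((i + 1, name))
    return table

table = space_time(4)
# At cycle 4, all four segments are busy with four different instructions
print(table[4])
```

Printing the table row by row reproduces the familiar staircase diagram: four instructions finish in 7 cycles instead of the 16 a purely sequential unit would need.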
• Reasons for the pipeline to deviate from its normal operation are:
o Resource conflicts caused by access to memory by two segments at the
same time.
o Data dependency conflicts arise when an instruction depends on the result of
a previous instruction, but this result is not yet available
o Branch difficulties arise from program control instructions that may change the value of the PC
• Methods to handle data dependency conflicts include:
o Hardware interlocks are circuits that detect instructions whose source operands
are destinations of prior instructions. Detection causes the hardware to insert
the required delays without altering the program sequence.
o Operand forwarding uses special hardware to detect a conflict and then avoid
it by routing the data through special paths between pipeline segments. This
requires additional hardware paths through multiplexers as well as the circuit
to detect the conflict.
o Delayed load is a procedure that gives the responsibility for solving data
conflicts to the compiler. The compiler is designed to detect a data conflict and
reorder the instructions as necessary to delay the loading of the conflicting data
by inserting no-operation instructions.
o Branch prediction uses some additional logic to guess the outcome of
a conditional branch instruction before it is executed. The pipeline
then begins prefetching instructions from the predicted path.
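The delayed-load idea above can be sketched as a small compiler-style pass over a toy instruction list; the tuple encoding, the one-cycle delay, and all names below are illustrative assumptions, not notation from the notes:

```python
NOP = ("NOP", None, ())  # no-operation instruction inserted by the compiler

def insert_load_delays(program, delay=1):
    """Insert NOPs after a LOAD whose destination register is read by the
    very next instruction, giving the load time to complete."""
    out = []
    for opcode, dest, sources in program:
        if out:
            prev_op, prev_dest, _ = out[-1]
            if prev_op == "LOAD" and prev_dest in sources:
                out.extend([NOP] * delay)  # delay the conflicting use
        out.append((opcode, dest, sources))
    return out

prog = [("LOAD", "R1", ("A",)),        # R1 <- M[A]
        ("ADD",  "R2", ("R1", "R3"))]  # reads R1 one cycle too early
fixed = insert_load_delays(prog)       # a NOP now separates LOAD and ADD
```

A real compiler would first try to move an independent instruction into the delay slot and fall back to a NOP only when none is available; this sketch always inserts the NOP for simplicity.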