
UNIT VI – PIPELINE, VECTOR PROCESSING AND MULTIPROCESSORS – 6 HRS

Notes By: Raju Poudel (Mechi Multiple Campus)


Parallel Processing
 Parallel processing is a term used to denote a large class of techniques that are used to
provide simultaneous data-processing tasks for the purpose of increasing the computational
speed of a computer system.
 Instead of processing each instruction sequentially as in a conventional computer, a parallel
processing system is able to perform concurrent data processing to achieve faster
execution time.
 For example, while an instruction is being executed in the ALU, the next instruction can be
read from memory. The system may have two or more ALUs and be able to execute two or
more instructions at the same time.
 Furthermore, the system may have two or more processors operating concurrently. The
purpose of parallel processing is to speed up the computer processing capability and
increase its throughput, that is, the amount of processing that can be accomplished during a
given interval of time.
 The amount of hardware increases with parallel processing, and with it the cost of the
system increases. However, technological developments have reduced hardware costs to
the point where parallel processing techniques are economically feasible.
 Parallel processing is established by distributing the
data among the multiple functional units. For
example, the arithmetic, logic, and shift operations
can be separated into three units and the operands
diverted to each unit under the supervision of a
control unit.
 The adder and integer multiplier perform the
arithmetic operations with integer numbers.
 The floating-point operations are separated into three
circuits operating in parallel.
 The logic, shift, and increment operations can be
performed concurrently on different data.
 All units are independent of each other, so one
number can be shifted while another number is being
incremented.
 A multifunctional organization is usually associated
with a complex control unit to coordinate all the
activities among the various components.
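 As a rough software analogy (the operation codes and unit routines below are illustrative, not from the source), the role of the control unit can be pictured as routing each operation to its own functional unit; in the actual hardware the units are independent and operate simultaneously on different operands:

/* Hypothetical operation codes and functional-unit routines, for
   illustration only.                                                     */
typedef enum { OP_ADD, OP_MULTIPLY, OP_SHIFT, OP_INCREMENT } op_t;

static int add_unit(int a, int b)      { return a + b;  }
static int multiply_unit(int a, int b) { return a * b;  }
static int shift_unit(int a, int b)    { return a << b; }
static int increment_unit(int a)       { return a + 1;  }

/* The "control unit": routes each operand pair to the appropriate unit.
   In a parallel processor these are separate hardware blocks, so
   different operations can proceed at the same time.                     */
int dispatch(op_t op, int a, int b)
{
    switch (op) {
    case OP_ADD:       return add_unit(a, b);
    case OP_MULTIPLY:  return multiply_unit(a, b);
    case OP_SHIFT:     return shift_unit(a, b);
    case OP_INCREMENT: return increment_unit(a);
    }
    return 0;
}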
Flynn's Classification of Parallel Processing
 There are a variety of ways that parallel processing can be classified. It can be considered
from the internal organization of the processors, from the interconnection structure between
processors, or from the flow of information through the system.
 One classification introduced by M. J. Flynn considers the organization of a computer
system by the number of instructions and data items that are manipulated simultaneously.
 The normal operation of a computer is to fetch instructions from memory and execute them
in the processor. The sequence of instructions read from memory constitutes an
instruction stream. The operations performed on the data in the processor constitutes a
data stream. Parallel processing may occur in the instruction stream, in the data stream,
or in both.
 Flynn's classification divides computers into four major groups as follows:
1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)

 SISD represents the organization of a single computer containing a control unit, a processor
unit, and a memory unit. Instructions are executed sequentially and the system may or may
not have internal parallel processing capabilities. Parallel processing in this case may be
achieved by means of multiple functional units or by pipeline processing.

 SIMD represents an organization that includes many processing units under the supervision
of a common control unit. All processors receive the same instruction from the control unit
but operate on different items of data. The shared memory unit must contain multiple
modules so that it can communicate with all the processors simultaneously. (A small code
sketch of the SIMD model is given after the MIMD description below.)

 MISD structure is only of theoretical interest since no practical system has been constructed
using this organization.

 MIMD organization refers to a computer system capable of processing several programs at
the same time. Most multiprocessor and multicomputer systems can be classified in this
category.
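As a small sketch of the SIMD model (plain C; the function and array names are illustrative, not from the source), one "add" instruction applied to many data items corresponds to every processing element performing the same operation on its own element:

#include <stddef.h>

/* One instruction ("add"), many data items: conceptually, iteration i is
   carried out by processing element i, all executing the same instruction
   broadcast by the common control unit.                                   */
void simd_add(const int *A, const int *B, int *C, size_t n)
{
    for (size_t i = 0; i < n; i++)
        C[i] = A[i] + B[i];
}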



Pipelining
 Pipelining is a technique of decomposing a sequential process into sub-operations, with
each sub-process being executed in a special dedicated segment that operates
concurrently with all other segments.
 A pipeline can be visualized as a collection of processing segments through which binary
information flows.
 Each segment performs partial processing dictated by the way the task is partitioned. The
result obtained from the computation in each segment is transferred to the next segment in
the pipeline. The final result is obtained after the data have passed through all segments.

 It is characteristic of pipelines that several computations can be in progress in distinct
segments at the same time.
 The overlapping of computation is made possible by associating a register with each
segment in the pipeline. The registers provide isolation between each segment so that each
can operate on distinct data simultaneously.



Pipelining Example
 The pipeline organization will be demonstrated by
means of a simple example. Suppose that we want
to perform the combined multiply and add
operations with a stream of numbers.

 Each sub-operation is to be implemented in a segment within a pipeline. Each segment has
one or two registers and a combinational circuit as shown in Fig. 9-2.
 R1 through R5 are registers that receive new data
with every clock pulse. The multiplier and adder are
combinational circuits. The sub-operations
performed in each segment of the pipeline are as
follows:
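The operation performed is Ai * Bi + Ci for a stream of values i = 1, 2, 3, ..., and the sub-operations in the three segments are:
Segment 1: R1 ← Ai, R2 ← Bi (input Ai and Bi)
Segment 2: R3 ← R1 * R2, R4 ← Ci (multiply, and input Ci)
Segment 3: R5 ← R3 + R4 (add Ci to the product)
A minimal software sketch of this pipeline follows (plain C; the data values and the number of items are illustrative). Each loop iteration models one clock pulse: every register is loaded from the stage before it, so three different items are processed concurrently once the pipeline is full.

#include <stdio.h>

#define N 7   /* number of (Ai, Bi, Ci) items in this sketch */

int main(void)
{
    int A[N] = {1, 2, 3, 4, 5, 6, 7};
    int B[N] = {1, 2, 3, 4, 5, 6, 7};
    int C[N] = {1, 2, 3, 4, 5, 6, 7};
    int R1 = 0, R2 = 0, R3 = 0, R4 = 0, R5 = 0;   /* pipeline registers */

    /* N items flow through 3 segments in N + 2 clock pulses.            */
    for (int clock = 0; clock < N + 2; clock++) {
        /* Segment 3: R5 <- R3 + R4 (uses values loaded last clock pulse) */
        R5 = R3 + R4;
        /* Segment 2: R3 <- R1 * R2, R4 <- Ci                             */
        R3 = R1 * R2;
        R4 = (clock - 1 >= 0 && clock - 1 < N) ? C[clock - 1] : 0;
        /* Segment 1: R1 <- Ai, R2 <- Bi                                  */
        R1 = (clock < N) ? A[clock] : 0;
        R2 = (clock < N) ? B[clock] : 0;

        if (clock >= 2)   /* the first result emerges on the third pulse  */
            printf("A%d*B%d + C%d = %d\n",
                   clock - 1, clock - 1, clock - 1, R5);
    }
    return 0;
}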



Instruction Pipeline
 Pipeline processing can occur not only in the data stream but in the instruction stream as well. An
instruction pipeline reads consecutive instructions from memory while previous instructions are
being executed in other segments.
 This causes the instruction fetch and execute phases to overlap and perform simultaneous
operations.
 Computers with complex instructions require other phases in addition to the fetch and execute to
process an instruction completely. In the most general case, the computer needs to process each
instruction with the following sequence of steps:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
 The design of an instruction pipeline will be most efficient if the instruction cycle is divided into
segments of equal duration. The time that each step takes to fulfill its function depends on the
instruction and the way it is executed.
Example: Four-Segment Instruction Pipeline
 The figure shows the operation of a 4-segment instruction pipeline, in which the six steps
above are combined into four segments:
1. FI: segment 1 that fetches the instruction.
2. DA: segment 2 that decodes the instruction and
calculates the effective address.
3. FO: segment 3 that fetches the operands.
4. EX: segment 4 that executes the instruction.
 The space-time diagram for the 4-segment instruction pipeline is given below:
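In the simplest case, with no branches or conflicts, each instruction advances one segment per clock cycle, so four instructions complete in seven cycles:

Clock cycle:     1    2    3    4    5    6    7
Instruction 1:   FI   DA   FO   EX
Instruction 2:        FI   DA   FO   EX
Instruction 3:             FI   DA   FO   EX
Instruction 4:                  FI   DA   FO   EX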



Pipeline Conflicts (Hazards)
 A pipeline hazard occurs when some condition prevents the pipeline from continuing normal
execution during one of its phases. In general, there are three major difficulties that cause
the instruction pipeline to deviate from its normal operation.

1. Resource conflicts caused by access to memory by two segments at the same time.
Most of these conflicts can be resolved by using separate instruction and data memories.

2. Data dependency conflicts arise when an instruction depends on the result of a previous
instruction, but this result is not yet available.

3. Branch difficulties arise from branch and other instructions that change the value of PC.



Data Dependency
 It arises when an instruction depends on the result of a previous instruction, but that result
is not yet available.
 For example, an instruction in one segment may need to fetch an operand that is being
generated at the same time by a previous instruction in another segment.
 The most common techniques used to resolve data hazards are listed below; a small code
sketch follows the list.
(a) Hardware interlock - a hardware interlock is a circuit that detects instructions whose
source operands are destinations of instructions farther up in the pipeline. It then inserts
enough clock cycles to delay the execution of such instructions.
(b) Operand forwarding - this method uses special hardware to detect conflicts in
instruction execution and avoids them by routing data through special paths between
pipeline segments. For example, instead of transferring an ALU result only to the destination
register, the hardware checks whether the result is needed by the next instruction and, if so,
passes it directly to the ALU input, bypassing the register.
(c) Delayed load - a software solution in which the compiler is designed to detect the
conflicts and re-order the instructions, delaying the load of the conflicting data by inserting
no-operation instructions.
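As a minimal illustration in plain C (the variable names stand in for register operands and are not from the source), two consecutive statements model two dependent pipelined instructions:

#include <stdio.h>

int main(void)
{
    int a = 2, b = 3, e = 4;
    int c = a + b;   /* instruction I1: produces c                         */
    int d = c + e;   /* instruction I2: consumes c; in a pipeline, I2's
                        operand fetch would overlap I1's execute, so the
                        hardware must either stall I2 (hardware interlock)
                        or route I1's ALU result directly to I2's ALU
                        input (operand forwarding); a compiler could also
                        separate I1 and I2 with other useful instructions
                        or no-ops (delayed load)                           */
    printf("%d\n", d);
    return 0;
}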
Handling of Branch Instructions
 A branch hazard arises from branch and other instructions that change the value of the
program counter (PC). A conditional branch creates two possible instruction streams, and it
is difficult to determine in advance whether the branch will be taken. A variety of approaches
have been used to deal with branch hazards; they are described below.

(a) Multiple streaming - It is a brute-force approach which replicates the initial portions of the
pipeline and allows the pipeline to fetch both instructions, making use of two streams
(branches).
(b) Prefetch branch target - When a conditional branch is recognized, the target of the
branch is prefetched, in addition to the instruction following the branch. This target is then
saved until the branch instruction is executed. If the branch is taken, the target has already
been prefetched.
(c) Branch prediction - uses additional logic to predict the outcome of a (conditional)
branch before it is executed. The popular approaches are: predict never taken, predict
always taken, predict by opcode, a taken/not-taken switch, and a branch history table. A
small sketch of a taken/not-taken predictor is given after this list.


(d) Loop buffer - a loop buffer is a small, very-high-speed memory maintained by the
instruction fetch stage of the pipeline, containing the n most recently fetched instructions in
sequence. If a branch is to be taken, the hardware first checks whether the branch target is
within the buffer. If so, the next instruction is fetched from the buffer.

(e) Delayed branch - this technique is employed in most RISC processors. The compiler
detects the branch instructions and re-arranges the code so that useful instructions fill the
slots following the branch, avoiding pipeline hazards.
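As a minimal sketch of the taken/not-taken switch mentioned under branch prediction (the table size and function names are illustrative, not from the source), a 2-bit saturating counter can be kept per branch:

#include <stdint.h>

#define TABLE_SIZE 256
static uint8_t counter[TABLE_SIZE];           /* 0..3; >= 2 means predict taken */

int predict_taken(uint32_t branch_pc)
{
    return counter[branch_pc % TABLE_SIZE] >= 2;
}

void update_predictor(uint32_t branch_pc, int was_taken)
{
    uint8_t *c = &counter[branch_pc % TABLE_SIZE];
    if (was_taken  && *c < 3) (*c)++;         /* strengthen "taken"      */
    if (!was_taken && *c > 0) (*c)--;         /* strengthen "not taken"  */
}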



Vector Processing
 Vector processing is a technique for speeding up computation in which pipelined units
perform arithmetic operations on uniform, linear arrays of data values, and a single
instruction specifies the same operation on every element of the array.

 There is a class of computational problems that are beyond the capabilities of a
conventional computer. These problems are characterized by the fact that they require a
vast number of computations that will take a conventional computer days or even weeks to
complete.

 In many science and engineering applications, the problems can be formulated in terms of
vectors and matrices that lend themselves to vector processing.
 To achieve the required level of high performance it is necessary to utilize the fastest and
most reliable hardware and apply innovative procedures from vector and parallel processing
techniques.



Application Areas of Vector Processing
Computers with vector processing capabilities are in demand in specialized applications. The
following are representative application areas where vector processing is of the utmost
importance.
- Long-range weather forecasting
- Petroleum explorations
- Seismic data analysis
- Medical diagnosis
- Aerodynamics and space flight simulations
- Artificial intelligence and expert systems
- Mapping the human genome
- Image processing



Vector Operations
 Many scientific problems require arithmetic operations on large arrays of numbers. These numbers are usually formulated
as vectors and matrices of floating-point numbers.
 A vector is an ordered, one-dimensional array of data items. A vector V of length n is represented as a row vector
by V = [V1, V2, V3, · · · Vn].
 A conventional sequential computer is capable of processing operands one at a time. Consequently, operations on vectors
must be broken down into single computations with subscripted variables. The element Vi of vector V is written as V(I) and
the index I refers to a memory address or register where the number is stored.
 To examine the difference between a conventional scalar processor and a vector processor, consider the following Fortran
DO loop:
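      DO 20 I = 1, 100
 20   C(I) = A(I) + B(I)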

 This is a program for adding two vectors A and B of length 100 to produce a vector C.

 A computer capable of vector processing eliminates the overhead associated with the time it takes to fetch and execute
the instructions in the program loop. It allows operations to be specified with a single vector instruction of the form
C(1:100) = A(1:100) + B(1:100)
 The vector instruction includes the initial address of the operands, the length of the vectors, and the operation to be
performed, all in one composite instruction.



Matrix Multiplication
 Matrix multiplication is one of the most computationally intensive operations performed in computers with
vector processors. An n x m matrix of numbers has n rows and m columns and may be considered as
constituting a set of n row vectors or a set of m column vectors. Consider, for example, the multiplication of
two 3 x 3 matrices A and B.

 For example, the number in the first row and first column of matrix C is calculated by letting i = 1, j = 1, to
obtain
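C11 = A11B11 + A12B21 + A13B31
(in general, the element Cij of the product C = A x B is the inner product of the i-th row of A with the j-th column of B).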
Inner Product
In general, the inner product consists of the sum of k product terms of the form
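C = A1B1 + A2B2 + A3B3 + · · · + AkBk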

In a typical application k may be equal to 100 or even 1000. The inner product calculation on a pipeline vector
processor is shown below:
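The figure is not reproduced here; as a minimal sketch in plain C (the array names and the length k are illustrative), the pipelined scheme keeps four partial sums so that a new product can enter a 4-segment pipelined adder on every cycle, and the partial sums are combined at the end:

/* Sketch only: models the four-way interleaving used with a 4-segment
   pipelined adder; it is not a cycle-accurate simulation.               */
double inner_product(const double *A, const double *B, int k)
{
    double s[4] = {0.0, 0.0, 0.0, 0.0};      /* four partial sums           */
    for (int i = 0; i < k; i++)
        s[i % 4] += A[i] * B[i];             /* e.g. s[0] = A1B1 + A5B5 + ... */
    return (s[0] + s[1]) + (s[2] + s[3]);    /* combine the partial sums    */
}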



Arithmetic Pipeline
 Pipeline arithmetic units are usually found in very high speed computers. They are used to
implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems.

 Let’s take an example of a pipeline unit for floating-point addition and subtraction. The
inputs to the floating-point adder pipeline are two normalized floating-point binary numbers.
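 The sub-operations in the four segments of such a pipeline are, in order: (1) compare the
exponents, (2) align the mantissas, (3) add or subtract the mantissas, and (4) normalize the
result. For example, with X = 0.9504 × 10^3 and Y = 0.8200 × 10^2 (a decimal example for
clarity), the exponent difference is 1, so Y is aligned to 0.0820 × 10^3; adding the mantissas
gives 1.0324 × 10^3; and normalizing yields 0.10324 × 10^4.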



Multiprocessor System
 A multiprocessor is a computer system with two or more central processing units (CPUs),
with each one sharing the common main memory as well as the peripherals. This helps in
simultaneous processing of programs.
 The key objective of using a multiprocessor is to boost the system’s execution speed, with
other objectives being fault tolerance and application matching.
 A multiprocessor is regarded as a means to improve computing speeds, performance and
cost-effectiveness, as well as to provide enhanced availability and reliability.
Characteristics:
 Consists of more than one CPU.
 Fast processing.
 Reliability.
 Cost-effectiveness.
 Simultaneous processing of programs.



Interconnection Structures for Multiprocessor System
 The components that form a multiprocessor system are CPUs, IOPs(Input Output
Processors) connected to input output devices, and a memory unit.
 There are several physical forms available for establishing an interconnection network.
- Time-shared common bus
- Multiport memory
- Crossbar switch
- Multistage switching network
1. Time Shared Common Bus
A common-bus multiprocessor system consists of a number of processors connected through
a common path to a memory unit.


2. Multiport Memory
A multiport memory system employs separate
buses between each memory module and each
CPU.

3. Crossbar Switch
Consists of a number of cross points that are
placed at intersections between processor buses
and memory module paths.

4. Multistage Switching Network
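A multistage switching network connects the processors to the memory modules through
several stages of small interchange switches (for example, 2 x 2 switches arranged so that a
path can be established from any processor to any memory module).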
