
UNIT 6

BCA 2nd semester


Parallel Processing
• Parallel processing is a term used to denote a large class of techniques that
are used to provide simultaneous data-processing tasks for the purpose of
increasing the computational speed of a computer system.
• Instead of processing each instruction sequentially as in a conventional
computer, a parallel processing system is able to perform concurrent data
processing to achieve faster execution time.
• For example, while an instruction is being executed in the ALU, the next
instruction can be read from memory. The system may have two or more
ALUs and be able to execute two or more instructions at the same time.
• Furthermore, the system may have two or more processors operating
concurrently. The purpose of parallel processing is to speed up the computer
processing capability and increase its throughput, that is, the amount of
processing that can be accomplished during a given interval of time.
• The amount of hardware increases with parallel processing, and with it the
cost of the system. However, technological developments have reduced hardware
costs to the point where parallel processing techniques are economically
feasible.
Pipelining
• Pipelining is a technique of decomposing a sequential process into sub-
operations, with each sub-process being executed in a special dedicated
segment that operates concurrently with all other segments.
• A pipeline can be visualized as a collection of processing segments
through which binary information flows.
• Each segment performs partial processing dictated by the way the task is
partitioned. The result obtained from the computation in each segment is
transferred to the next segment in the pipeline. The final result is
obtained after the data have passed through all segments.
• It is characteristic of pipelines that several computations can be in
progress in distinct segments at the same time.
• The overlapping of computation is made possible by associating a
register with each segment in the pipeline. The registers provide
isolation between each segment so that each can operate on distinct data
simultaneously.
Pipelining Example
• The pipeline organization will be demonstrated by means of a simple example.
Suppose that we want to perform the combined multiply and add operations
Ai * Bi + Ci with a stream of numbers.
• Each sub-operation is to be implemented in a segment within a pipeline. Each
segment has one or two registers and a combinational circuit, as shown in the
figure. R1 through R5 are registers that receive new data with every clock
pulse. The multiplier and adder are combinational circuits.
• The sub-operations performed in each segment of the pipeline are as follows:
Segment 1: R1 ← Ai, R2 ← Bi (input Ai and Bi)
Segment 2: R3 ← R1 * R2, R4 ← Ci (multiply and input Ci)
Segment 3: R5 ← R3 + R4 (add Ci to the product)
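• A minimal Python sketch of this example (illustrative code, not from the
original slides): variables stand in for registers R1 through R5, each loop
iteration is one clock pulse, and transfers are ordered back-to-front so every
segment consumes the values latched on the previous cycle.

def pipeline_multiply_add(A, B, C):
    """Simulate the three-segment pipeline computing A[i]*B[i] + C[i]."""
    R1 = R2 = R3 = R4 = R5 = None
    results = []
    n = len(A)
    for clock in range(n + 2):                  # n items take n + 2 clocks to drain
        if R3 is not None:
            R5 = R3 + R4                        # segment 3: add
            results.append(R5)
        R3 = R1 * R2 if R1 is not None else None         # segment 2: multiply
        R4 = C[clock - 1] if 1 <= clock <= n else None   # segment 2: input Ci
        R1, R2 = (A[clock], B[clock]) if clock < n else (None, None)  # segment 1
    return results

print(pipeline_multiply_add([1, 2, 3], [4, 5, 6], [7, 8, 9]))   # [11, 18, 27]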
• A task is defined as the total operation performed in going through all the
segments of the pipeline. The behavior of a pipeline can be illustrated with a
space-time diagram, which shows the segment utilization as a function of time.
With k segments and n tasks, the first task completes after k clock cycles and
one further task completes in each cycle thereafter, so all n tasks finish in
k + n - 1 cycles (9 cycles for k = 4 and n = 6). The space-time diagram of a
four-segment pipeline is given below:
• Figure: Space-time diagram of a four-segment pipeline executing six tasks
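• The diagram can be generated mechanically; the Python sketch below
(illustrative, not from the original slides) prints which task occupies each
segment at every clock cycle.

def space_time_diagram(k, n):
    """Print the space-time diagram of a k-segment pipeline running n tasks.

    Task i occupies segment s during clock cycle i + s - 1, so the last
    task leaves the last segment at cycle n + k - 1.
    """
    cycles = n + k - 1
    print("Clock:  " + " ".join(f"{c:>3}" for c in range(1, cycles + 1)))
    for s in range(1, k + 1):
        row = []
        for c in range(1, cycles + 1):
            i = c - s + 1                       # task in segment s at cycle c
            row.append(f"T{i:<2}" if 1 <= i <= n else "   ")
        print(f"Seg {s}:  " + " ".join(row))

space_time_diagram(k=4, n=6)                    # the 4-segment, 6-task diagram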
Instruction Pipeline
• Pipeline processing can occur not only in the data stream but in the instruction
stream as well. An instruction pipeline reads consecutive instructions from
memory while previous instructions are being executed in other segments.
• This causes the instruction fetch and execute phases to overlap and perform
simultaneous operations.
• Computers with complex instructions require other phases in addition to the fetch
and execute to process an instruction completely. In the most general case, the
computer needs to process each instruction with the following sequence of steps:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
• The design of an instruction pipeline will be most efficient if the instruction cycle
is divided into segments of equal duration. The time that each step takes to fulfill
its function depends on the instruction and the way it is executed.
Example: Four-Segment Instruction Pipeline
The figure shows the operation of a four-segment instruction pipeline, which
combines the six steps above into four segments:
1. FI: segment 1 that fetches the instruction.
2. DA: segment 2 that decodes the instruction and calculates the effective
address.
3. FO: segment 3 that fetches the operands.
4. EX: segment 4 that executes the instruction.
The space-time diagram for the four-segment instruction pipeline is given
below:
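• The Python sketch below (illustrative, not from the original slides) traces
six instructions through the four segments cycle by cycle, assuming no
hazards; each instruction simply advances one segment per clock.

from collections import deque

def run_pipeline(program):
    """Shift instructions through FI -> DA -> FO -> EX, one segment per clock."""
    stages = deque([None] * 4, maxlen=4)        # index 0 is FI, index 3 is EX
    pending = deque(program)
    cycle = 0
    while pending or any(list(stages)[:3]):     # stop once only EX has retired
        cycle += 1
        stages.appendleft(pending.popleft() if pending else None)  # EX drops off
        named = zip(("FI", "DA", "FO", "EX"), stages)
        print(f"cycle {cycle}: " + "  ".join(f"{n}={i or '-'}" for n, i in named))

run_pipeline(["I1", "I2", "I3", "I4", "I5", "I6"])   # finishes in 6 + 4 - 1 = 9 cycles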
Pipeline Conflicts (Hazards)
• A pipeline hazard occurs when a condition arises that prevents the next
instruction from executing in its designated clock cycle, stalling the
pipeline. In general, there are three major difficulties that cause the
instruction pipeline to deviate from its normal operation:
1. Resource conflicts caused by access to memory by two
segments at the same time. Most of these conflicts can be
resolved by using separate instruction and data memories.
2. Data dependency conflicts arise when an instruction depends
on the result of a previous instruction, but this result is not yet
available.
3. Branch difficulties arise from branch and other instructions that
change the value of PC.
Data Dependency
• A data dependency arises when an instruction depends on the result of a
previous instruction, but that result is not yet available.
• For example, an instruction in the operand-fetch segment may need an operand
that is still being generated by the previous instruction in the execute
segment.
• The most common techniques used to resolve data hazards are:
(a) Hardware interlock - a hardware interlock is a circuit that detects
instructions whose source operands are destinations of instructions farther up
in the pipeline. It then stalls the pipeline for enough clock cycles to delay
the execution of such instructions (a sketch of interlock stalls follows this
list).
(b) Operand forwarding - this method uses special hardware to detect conflicts
in instruction execution and avoid them by routing the data through a special
path between pipeline segments. For example, instead of transferring an ALU
result only into its destination register, the hardware checks whether the
result is needed by the next instruction and, if so, passes it directly to the
ALU input, bypassing the register.
(c) Delayed load - a software solution in which the compiler detects the
conflicts and reorders the instructions, delaying the load of conflicting data
by inserting no-operation (NOP) instructions.
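• A minimal sketch of the hardware-interlock idea (illustrative; the
three-field instruction format name/destination/sources, and the rule of
stalling while the producer sits in FO or EX, are simplifying assumptions for
the demo): an instruction is held in DA while any of its source registers is
still the destination of an older instruction, and the bubble shows up as '-'.

def run_with_interlock(program):
    """Four-segment pipeline where DA stalls on unresolved source operands."""
    fi = da = fo = ex = None
    pending = list(program)
    cycle = 0
    while pending or fi or da or fo:
        cycle += 1
        # Interlock check: does DA's instruction read a register that an
        # older instruction (still in FO or EX) has not yet written?
        stall = da is not None and any(
            older is not None and older[1] in da[2] for older in (fo, ex)
        )
        ex = fo                                 # the oldest instruction always advances
        if stall:
            fo = None                           # insert a bubble behind the stalled DA
        else:
            fo, da, fi = da, fi, None
            if pending:
                fi = pending.pop(0)
        names = [(i[0] if i else "-") for i in (fi, da, fo, ex)]
        print(f"cycle {cycle}: FI={names[0]} DA={names[1]} FO={names[2]} EX={names[3]}")

run_with_interlock([
    ("LOAD R1",   "R1", set()),
    ("ADD R2,R1", "R2", {"R1"}),                # needs R1: stalls until LOAD clears
    ("SUB R3,R4", "R3", {"R4"}),                # independent: proceeds normally
])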
Handling of Branch Instructions
• A branch hazard arises from branch and other instructions that change the
value of the program counter (PC). Conditional branches are especially
troublesome because, until the condition is evaluated, the pipeline cannot
tell whether the branch will be taken. A variety of approaches have been used
to deal with branch hazards; they are described below.
(a) Prefetch branch target - when a conditional branch is recognized, the
target of the branch is prefetched, in addition to the instruction following
the branch. This target is then saved until the branch instruction is
executed. If the branch is taken, the target has already been prefetched.
(b) Branch prediction - uses additional logic to predict the outcome of a
(conditional) branch before it is executed. Popular approaches are: predict
never taken, predict always taken, predict by opcode, the taken/not taken
switch, and the branch history table (a small predictor sketch follows this
list).
(c) Loop buffer - a loop buffer is a small, very-high-speed memory maintained
by the instruction fetch stage of the pipeline and containing the most
recently fetched instructions, in sequence. If a branch is to be taken, the
hardware first checks whether the branch target is within the buffer. If so,
the next instruction is fetched from the buffer.
(d) Delayed branch - this technique is employed in most RISC processors. The
compiler detects branch instructions and rearranges the code so that useful
instructions (or NOPs) fill the slots immediately following a branch, keeping
the pipeline busy regardless of the branch outcome.
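• A small sketch of the taken/not-taken switch mentioned in (b), implemented
here as a two-bit saturating counter per table entry (the table size and the
example branch address are illustrative assumptions, not a specific design):

class BranchPredictor:
    """A tiny branch history table of 2-bit saturating counters."""
    def __init__(self, entries=16):
        self.table = [1] * entries              # counters 0..3; start weakly not-taken
        self.mask = entries - 1

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2  # 2 or 3 means predict taken

    def update(self, pc, taken):
        i = pc & self.mask
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)

bp = BranchPredictor()
for taken in [True, True, False, True, True]:   # a loop branch, taken 4 times in 5
    print("predict taken:", bp.predict(0x40), "| actual:", taken)
    bp.update(0x40, taken)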
Vector Processing
• Vector processing is a procedure for speeding the processing of information
by a computer, in which pipelined units perform arithmetic operations on
uniform, linear arrays of data values, and a single instruction involves the
execution of the same operation on every element of the array.
• There is a class of computational problems that are beyond the
capabilities of a conventional computer. These problems are
characterized by the fact that they require a vast number of computations
that will take a conventional computer days or even weeks to complete.
• In many science and engineering applications, the problems can be
formulated in terms of vectors and matrices that lend themselves to
vector processing.
• To achieve the required level of high performance, it is necessary to
utilize the fastest and most reliable hardware and to apply innovative
procedures from vector and parallel processing techniques.
Application Areas of Vector Processing
• Computers with vector processing capabilities are in demand in specialized
applications. The following are representative application areas where vector
processing is of the utmost importance.
- Long-range weather forecasting
- Petroleum explorations
- Seismic data analysis
- Medical diagnosis
- Aerodynamics and space flight simulations
- Artificial intelligence and expert systems
- Mapping the human genome
- Image processing
Vector Operations
• Many scientific problems require arithmetic operations on large arrays of
numbers. These numbers are usually formulated as vectors and matrices of
floating-point numbers.
• A vector is an ordered set of data items, i.e., a one-dimensional array. A
vector V of length n is represented as a row vector by V = [V1, V2, V3, …, Vn]
• A conventional sequential computer is capable of processing operands one at a
time. Consequently, operations on vectors must be broken down into single
computations with subscripted variables. The element Vi of vector V is written
as V(I) and the index I refers to a memory address or register where the
number is stored.
• To examine the difference between a conventional scalar processor and a
vector processor, consider the following Fortran DO loop, a program for adding
two vectors A and B of length 100 to produce a vector C:

      DO 20 I = 1, 100
   20 C(I) = A(I) + B(I)
• A computer capable of vector processing eliminates the overhead associated
with the time it takes to fetch and execute the instructions in the program
loop. It allows the operation to be specified with a single vector instruction
of the form
C(1 : 100) = A(1 : 100) + B(1 : 100)
• The vector instruction includes the initial address of the operands, the
length of the vectors, and the operation to be performed, all in one composite
instruction.
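• A software analogue of this contrast (illustrative; NumPy's vectorized
addition stands in for the single composite vector instruction, while the
explicit loop mirrors the scalar machine):

import numpy as np

a = np.arange(100.0)
b = np.arange(100.0)

# Scalar style: one element per trip through the loop, with loop overhead.
c_scalar = np.empty(100)
for i in range(100):
    c_scalar[i] = a[i] + b[i]

# Vector style: one composite operation over the whole array,
# dispatched to optimized (often SIMD) native code.
c_vector = a + b

assert np.array_equal(c_scalar, c_vector)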
Matrix Multiplication
• Matrix multiplication is one of the most computationally intensive
operations performed in computers with vector processors. An n x m matrix of
numbers has n rows and m columns and may be considered as constituting a set
of n row vectors or a set of m column vectors. Consider, for example, the
multiplication of two 3 x 3 matrices A and B. Each element cij of the product
is the inner product of the ith row of A with the jth column of B, so the
3 x 3 product requires nine inner products.
Inner Product
• In general, the inner product C of two vectors A and B consists of the sum
of k product terms of the form
C = A1B1 + A2B2 + A3B3 + … + AkBk
• In a typical application k may be equal to 100 or even 1000. On a pipeline
vector processor, the products can be accumulated into several interleaved
partial sums (one per segment of the pipelined adder) that are added together
at the end. The inner product calculation on a pipeline vector processor is
shown below:
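• An illustrative Python sketch of the interleaved accumulation (plain floats
stand in for the pipeline hardware; the choice of four partial sums assumes a
four-segment adder):

def pipelined_inner_product(a, b, segments=4):
    """Accumulate products into `segments` interleaved partial sums."""
    partial = [0.0] * segments
    for i, (x, y) in enumerate(zip(a, b)):
        partial[i % segments] += x * y          # products enter adder segments in rotation
    return sum(partial)                         # final combination of the partial sums

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
print(pipelined_inner_product(a, b))            # 120.0, the sum of Ai * Bi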
Flynn's Classification of Parallel Processing
• There are a variety of ways that parallel processing can be classified. It can be
considered from the internal organization of the processors, from the
interconnection structure between processors, or from the flow of information
through the system.
• One classification introduced by M. J. Flynn considers the organization of a
computer system by the number of instructions and data items that are
manipulated simultaneously.
• The normal operation of a computer is to fetch instructions from memory and
execute them in the processor. The sequence of instructions read from memory
constitutes an instruction stream. The operations performed on the data in the
processor constitute a data stream. Parallel processing may occur in the
instruction stream, in the data stream, or in both.
• Flynn's classification divides computers into four major groups as follows:
1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)
• SISD represents the organization of a single computer containing a control
unit, a processor unit, and a memory unit. Instructions are executed
sequentially and the system may or may not have internal parallel processing
capabilities. Parallel processing in this case may be achieved by means of
multiple functional units or by pipeline processing.
• SIMD represents an organization that includes many processing units under
the supervision of a common control unit. All processors receive the same
instruction from the control unit but operate on different items of data. The
shared memory unit must contain multiple modules so that it can
communicate with all the processors simultaneously.
• MISD structure is only of theoretical interest since no practical system has
been constructed using this organization.
• MIMD organization refers to a computer system capable of processing
several programs at the same time. Most multiprocessor and multicomputer
systems can be classified in this category.
