Unit-5-Parallel Processing
Parallel processing:
• Parallel processing is a term used for a large class of techniques that
provide simultaneous data-processing tasks for the purpose of increasing the
computational speed of a computer system.
The system may have two or more ALUs so that it can execute two or more
instructions at the same time.
This can be achieved by having multiple functional units that perform the same or different
operations simultaneously.
For example, the execution unit may be separated into eight functional units operating in parallel.
There are a variety of ways in which parallel processing can be classified.
UNIT-V
Architectural Classification:
– Flynn's classification
» Instruction Stream
» Data Stream
SISD represents an organization containing a single control unit, a processor unit, and a
memory unit. Instructions are executed sequentially, and the system may or may not have
internal parallel processing capabilities.
SIMD represents an organization that includes many processing units under the
supervision of a common control unit.
MISD structure is of only theoretical interest since no practical system has been
constructed using this organization.
MIMD represents an organization capable of processing several programs at the same
time. Most multiprocessor and multicomputer systems fall into this category.
The main difference between a multicomputer system and a multiprocessor system is that the
multiprocessor system is controlled by one operating system that provides interaction
between processors, and all the components of the system cooperate in the solution of a
problem.
Pipeline Processing
Vector Processing
Array Processors
PIPELINING:
• The final result is obtained when data have passed through all segments.
OPERATIONS IN EACH PIPELINE STAGE:
• Space-Time Diagram
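A space-time diagram shows which task occupies each segment in each clock cycle. The following is a minimal sketch, not taken from the notes: it assumes an ideal pipeline in which every segment takes exactly one clock cycle and a new task enters every cycle. The function names `space_time_diagram` and `render` are hypothetical, and the values k = 4 segments and n = 6 tasks are the ones quoted for the example in these notes.

```python
def space_time_diagram(k, n):
    """Return a k-row table; rows[s][c] is the task number held by
    segment s+1 during clock cycle c+1, or 0 if the segment is idle.
    Assumes an ideal pipeline: one cycle per segment, no stalls."""
    cycles = k + n - 1  # total cycles needed to drain n tasks through k segments
    rows = []
    for seg in range(k):
        row = []
        for clk in range(cycles):
            task = clk - seg + 1  # task i reaches segment s at cycle i + s - 1
            row.append(task if 1 <= task <= n else 0)
        rows.append(row)
    return rows


def render(rows):
    """Format the table as text, one line per segment."""
    lines = []
    for seg, row in enumerate(rows, start=1):
        cells = " ".join(f"T{t}" if t else "--" for t in row)
        lines.append(f"Segment {seg}: {cells}")
    return "\n".join(lines)


diagram = space_time_diagram(k=4, n=6)
print(render(diagram))
```

Each row is the previous row shifted right by one clock cycle, which is why the last task leaves the last segment at cycle k + n - 1.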
PIPELINE SPEEDUP:
n = 6 tasks in the previous example
k = 4 segments in the previous example
The first task T1 requires k clock cycles to complete its operation, since there
are k segments. The remaining n - 1 tasks then complete at a rate of one task per clock cycle.
• Speedup (S)
Example:
- 4-stage pipeline
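The speedup calculation can be sketched in a few lines. This sketch uses the standard textbook relation S = (n * tn) / ((k + n - 1) * tp), where tn is the time per task without pipelining and tp is the pipeline clock period. The symbols tn and tp, the function name `pipeline_speedup`, and the 20 ns clock period are assumptions made for illustration; only k = 4 and n = 6 come from these notes.

```python
def pipeline_speedup(n, k, tp, tn):
    """Speedup of a k-segment pipeline over a non-pipelined unit for n tasks.

    tp: pipeline clock period, tn: time per task without pipelining
    (both in the same time unit)."""
    nonpipelined_time = n * tn          # tasks run strictly one after another
    pipelined_time = (k + n - 1) * tp   # k cycles for the first task, then one per task
    return nonpipelined_time / pipelined_time


k, n = 4, 6          # values quoted in the notes
tp = 20              # ns, assumed clock period (not from the notes)
tn = k * tp          # assume a non-pipelined task takes as long as k segments
s = pipeline_speedup(n, k, tp, tn)
print(f"speedup = {s:.2f}")
```

Under the assumption tn = k * tp, the speedup approaches k as n grows, since the k - 1 cycles spent filling the pipeline become negligible.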
Types of Pipelining:
• Arithmetic Pipeline
• Instruction Pipeline
ARITHMETIC PIPELINE:
Pipeline arithmetic units are usually found in very high-speed computers.
We will now discuss the pipeline unit for floating-point addition and subtraction.
The inputs to the floating-point adder pipeline are two normalized floating-point numbers.
Floating-point addition and subtraction can be performed in four segments.
Floating-point adder:
1) Compare exponents: for X = 0.9504 x 10^3 and Y = 0.8200 x 10^2,
3 - 2 = 1
2) Align mantissas
X = 0.9504 x 10^3
Y = 0.08200 x 10^3
3) Add mantissas
Z = 1.0324 x 10^3
4) Normalize result
Z = 0.10324 x 10^4
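The four segments can be traced in software. The following is a minimal sketch under stated assumptions: numbers are held as a decimal mantissa plus an integer exponent, inputs are already normalized, and the function name `fp_add` is hypothetical. It uses the inputs implied by the worked example (0.9504 x 10^3 and 0.8200 x 10^2). A hardware adder works on binary fields and runs the segments concurrently on different operand pairs; this code only mirrors the order of operations.

```python
from decimal import Decimal


def fp_add(xm, xe, ym, ye):
    """Add two decimal floating-point numbers given as (mantissa, exponent).

    Follows the four pipeline segments in sequence and returns the
    normalized (mantissa, exponent) of the sum."""
    # Segment 1: compare exponents by subtraction.
    diff = xe - ye

    # Segment 2: align mantissas by shifting the one with the smaller exponent right.
    if diff >= 0:
        ym = ym.scaleb(-diff)   # scaleb(-d) divides by 10**d exactly
        exp = xe
    else:
        xm = xm.scaleb(diff)
        exp = ye

    # Segment 3: add the aligned mantissas.
    zm = xm + ym

    # Segment 4: normalize so that 0.1 <= |mantissa| < 1 (or the sum is zero).
    while abs(zm) >= 1:
        zm = zm.scaleb(-1)
        exp += 1
    while zm != 0 and abs(zm) < Decimal("0.1"):
        zm = zm.scaleb(1)
        exp -= 1
    return zm, exp


zm, ze = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
print(zm, ze)  # mantissa numerically equal to 0.10324, exponent 4
```

`Decimal` is used instead of `float` so that the decimal mantissas in the example are represented exactly.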
Instruction Pipeline:
Pipeline processing can occur not only in the data stream but in the instruction stream
as well.
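An instruction pipeline reads consecutive instructions from memory while previous
instructions are being executed in other segments. This causes the instruction fetch and
execute segments to overlap and perform simultaneous operations.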
INSTRUCTION CYCLE:
[3] Calculate the effective address of the operand
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the execution phase
[2] DA: Decode the instruction and calculate the effective address of the operand
Pipeline Conflicts:
1) Resource conflicts: memory access by two segments at the same time. Most of these
conflicts can be resolved by using separate instruction and data memories.
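2) Data dependency conflicts: an instruction depends on the result of a previous
instruction, but that result is not yet available.
Example: an instruction with register indirect mode cannot proceed to fetch the operand
if the previous instruction is still loading the address into the register.
3) Branch difficulties: branch and other instructions (interrupt, ret, ...) that change the value
of the PC.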
Hardware interlocks: a circuit that detects the conflict situation and
delays the instruction by enough clock cycles to resolve the conflict.
Operand forwarding: special hardware detects the conflict and
avoids it by routing the data through a special path between pipeline
segments.
Delayed loads: the compiler detects the data conflict and reorders the
instructions as necessary to delay the loading of the conflicting data by
inserting no-operation instructions.
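The delayed-load approach can be illustrated with a small compiler-style pass. This is a minimal sketch, not a real compiler: it assumes a one-cycle load delay, so a no-operation is needed only when an instruction reads a register loaded by the instruction immediately before it. The instruction format `(opcode, destination, sources)`, the function name `insert_delayed_load_nops`, and the sample program are all hypothetical.

```python
def insert_delayed_load_nops(program):
    """Insert a NOP after any LOAD whose destination register is read
    by the very next instruction (assumes a one-cycle load delay)."""
    scheduled = []
    for instr in program:
        if scheduled:
            prev_op, prev_dest, _ = scheduled[-1]
            _, _, sources = instr
            # Data conflict: the loaded value is not yet in the register.
            if prev_op == "LOAD" and prev_dest in sources:
                scheduled.append(("NOP", "", ()))
        scheduled.append(instr)
    return scheduled


# Hypothetical program: R3 <- M[A] + M[B], then store R3 to C.
program = [
    ("LOAD", "R1", ("A",)),
    ("LOAD", "R2", ("B",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "C", ("R3",)),
]

scheduled = insert_delayed_load_nops(program)
for op, dest, sources in scheduled:
    print(op, dest, *sources)
```

Here only the second LOAD is followed by a NOP, because R1 has already arrived by the time ADD needs it. A real compiler would first try to move an independent instruction into the delay slot and fall back to a NOP only when none is available.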
Branch Prediction
Delayed Branch
RISC Pipeline:
Since all operations are performed on registers, there is no need for effective address
calculation.
I: Instruction Fetch
A: ALU Operation
E: Execute Instruction
Delayed Load:
Delayed Branch: