Pipeline & Parallel Processing
(COA)
GTU # 3140707
UNIT-III
Outline
• Parallel Processing
• Pipelining
• Arithmetic Pipeline
• Instruction Pipeline
Parallel Processing
Parallel processing is a term used to denote a large class of techniques that are used to
perform data-processing tasks simultaneously in order to increase the computational
speed of a computer system.
The purpose of parallel processing is to speed up the computer's processing capability and increase
its throughput.
Throughput:
The amount of processing that can be accomplished during a given interval of time.
Pipelining
Pipelining is a technique of decomposing a sequential process into suboperations, with each
suboperation being executed in a special dedicated segment that operates concurrently with all
other segments.
A pipeline can be visualized as a collection of processing segments through which binary
information flows.
Each segment performs partial processing dictated by the way the task is partitioned.
The result obtained from the computation in each segment is transferred to the next segment
in the pipeline.
Registers placed between adjacent segments isolate each segment from the next.
The technique is efficient for those applications that need to repeat the same task many times
with different sets of data.
Pipelining example
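[Figure: A three-segment pipeline. Registers R1 and R2 feed a Multiplier, whose output is held in R3 alongside R4; R3 and R4 feed an Adder, whose result is stored in R5.]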
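The multiplier-adder example can be sketched in Python under one stated assumption: that the layout computes A_i * B_i + C_i for a stream of operands, a common textbook use of this arrangement that the surviving figure does not spell out. Segment 1 loads R1 ← A_i and R2 ← B_i, segment 2 forms R3 ← R1 * R2 while loading R4 ← C_i, and segment 3 forms R5 ← R3 + R4. The function name `multiply_add_pipeline` is hypothetical.

```python
def multiply_add_pipeline(a, b, c):
    """Simulate the three-segment pipeline computing a[i] * b[i] + c[i].

    Assumed segment contents (hypothetical, see the lead-in):
      Segment 1: R1 <- A_i,      R2 <- B_i
      Segment 2: R3 <- R1 * R2,  R4 <- C_i
      Segment 3: R5 <- R3 + R4
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None   # None means "register holds no task yet"
    results = []
    # n tasks through 3 segments take n + 3 - 1 clock pulses.
    for clock in range(1, n + 3):
        i1 = clock - 1   # index of the task entering segment 1 on this pulse
        i2 = clock - 2   # index of the task in segment 2 on this pulse
        # Every new value is computed from the OLD register contents,
        # because all registers load together on the same clock edge.
        new_r1 = a[i1] if i1 < n else None
        new_r2 = b[i1] if i1 < n else None
        new_r3 = r1 * r2 if r1 is not None else None
        new_r4 = c[i2] if 0 <= i2 < n else None
        new_r5 = r3 + r4 if r3 is not None else None
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        if r5 is not None:
            results.append(r5)   # one finished result per pulse once the pipe is full
    return results


print(multiply_add_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # [11, 18, 27]
```

After the first two pulses fill the pipe, one result leaves R5 on every pulse, even though each result needed three segments of work.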
[Figure: General structure of a four-segment pipeline. Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4; a common Clock drives registers R1 to R4.]
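The general structure can be modeled in a few lines of Python. This is a behavioral sketch, not a hardware description: each segment S_i is assumed to be an arbitrary function, register R_i holds its output, and on every clock pulse all registers load simultaneously. The names `run_pipeline` and `EMPTY` and the four example segment functions are hypothetical. With k segments and n tasks the model uses k + n - 1 pulses.

```python
from collections import deque
from typing import Any, Callable, List, Tuple

EMPTY = object()  # marks a register that holds no task


def run_pipeline(segments: List[Callable[[Any], Any]],
                 tasks: List[Any]) -> Tuple[List[Any], int]:
    """Clock tasks through segments S1..Sk, each followed by register Ri.

    Returns the results in order and the number of clock pulses used.
    """
    k = len(segments)
    regs: List[Any] = [EMPTY] * k        # regs[i] models register R(i+1)
    pending = deque(tasks)
    outputs: List[Any] = []
    clocks = 0
    # Clock while inputs remain or a task is still held in R1..R(k-1).
    while pending or any(r is not EMPTY for r in regs[:-1]):
        clocks += 1
        new_regs: List[Any] = [EMPTY] * k
        if pending:                       # S1 works on the next input
            new_regs[0] = segments[0](pending.popleft())
        for i in range(1, k):             # S(i+1) works on what R(i) held
            if regs[i - 1] is not EMPTY:
                new_regs[i] = segments[i](regs[i - 1])
        regs = new_regs                   # all registers load on one clock edge
        if regs[-1] is not EMPTY:         # R_k holds a finished result
            outputs.append(regs[-1])
    return outputs, clocks


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
print(run_pipeline(stages, [1, 2, 3, 4]))  # ([1, 9, 25, 49], 7)
```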
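Space-time Diagram
(T1 to T4 are four successive tasks; each segment takes one clock cycle.)
Non-Pipelined Architecture
Clock cycle:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Segment 1:   T1          T2          T3          T4
Segment 2:      T1          T2          T3          T4
Segment 3:         T1          T2          T3          T4
Segment 4:            T1          T2          T3          T4
Pipelined Architecture
Clock cycle:  1  2  3  4  5  6  7
Segment 1:   T1 T2 T3 T4
Segment 2:      T1 T2 T3 T4
Segment 3:         T1 T2 T3 T4
Segment 4:            T1 T2 T3 T4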
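Assuming every segment takes one clock cycle, k segments processing n tasks need n*k cycles without pipelining but only k + n - 1 with it: the first task needs k cycles to pass through, and each later task completes one cycle after the one before. For k = n = 4 that gives the 16 and 7 cycles of the two diagrams, a speedup of 16/7 ≈ 2.29; as n grows, the speedup approaches k. A small Python sketch (function names hypothetical) that rebuilds both diagrams and computes the ratio:

```python
from typing import List


def space_time(k: int, n: int, pipelined: bool) -> List[List[str]]:
    """grid[s][c] names the task in segment s+1 during clock cycle c+1 ('' = idle)."""
    cycles = k + n - 1 if pipelined else n * k
    grid = [[""] * cycles for _ in range(k)]
    for t in range(n):          # task T(t+1)
        for s in range(k):      # segment s+1
            # Pipelined: a new task enters every cycle, so task t reaches
            # segment s at cycle t + s. Non-pipelined: task t starts only
            # after task t-1 has left the last segment, at cycle t*k + s.
            c = t + s if pipelined else t * k + s
            grid[s][c] = f"T{t + 1}"
    return grid


def speedup(k: int, n: int) -> float:
    """Ratio of non-pipelined to pipelined cycles: n*k / (k + n - 1)."""
    return n * k / (k + n - 1)


for pipelined in (False, True):
    grid = space_time(4, 4, pipelined)
    print("Pipelined" if pipelined else "Non-pipelined", len(grid[0]), "cycles")
    for s, row in enumerate(grid, start=1):
        print(f"  Segment {s}: " + " ".join(cell or "--" for cell in row))
print(f"speedup = {speedup(4, 4):.2f}")  # speedup = 2.29
```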
Arithmetic Pipeline
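[Figure: Arithmetic pipeline operating on two floating-point numbers. The exponents a, b and the mantissas A, B enter at the top, and registers (R) separate successive segments.]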
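The segment labels of this arithmetic pipeline are not given here. A common textbook decomposition of floating-point addition into four segments is: (1) compare the exponents, (2) align the mantissas, (3) add the mantissas, (4) normalize the result. The Python sketch below assumes that decomposition and represents each number as a decimal exponent and mantissa, with value mantissa × 10^exponent and a normalized mantissa satisfying 0.1 ≤ |mantissa| < 1. The function names and operands are hypothetical, and rounding and special values are ignored.

```python
from decimal import Decimal
from typing import Tuple


def seg1_compare_exponents(a: int, b: int, A: Decimal, B: Decimal):
    """Pick the larger exponent and record the difference a - b."""
    return max(a, b), a - b, A, B


def seg2_align_mantissas(exp: int, diff: int, A: Decimal, B: Decimal):
    """Shift right the mantissa that belongs to the smaller exponent."""
    if diff > 0:
        B = B.scaleb(-diff)      # B * 10**(-diff)
    elif diff < 0:
        A = A.scaleb(diff)       # A * 10**diff
    return exp, A, B


def seg3_add_mantissas(exp: int, A: Decimal, B: Decimal):
    return exp, A + B


def seg4_normalize(exp: int, M: Decimal) -> Tuple[int, Decimal]:
    """Shift the sum until 0.1 <= |M| < 1, adjusting the exponent to match."""
    if M == 0:
        return 0, M
    while abs(M) >= 1:
        M, exp = M.scaleb(-1), exp + 1
    while abs(M) < Decimal("0.1"):
        M, exp = M.scaleb(1), exp - 1
    return exp, M


def fp_add(a: int, A: Decimal, b: int, B: Decimal) -> Tuple[int, Decimal]:
    # In hardware these four steps are four segments with registers between
    # them, each working on a different operand pair in the same clock cycle.
    exp, diff, A, B = seg1_compare_exponents(a, b, A, B)
    exp, A, B = seg2_align_mantissas(exp, diff, A, B)
    exp, M = seg3_add_mantissas(exp, A, B)
    return seg4_normalize(exp, M)


# 0.9504 x 10^3 + 0.8200 x 10^2
exp, M = fp_add(3, Decimal("0.9504"), 2, Decimal("0.8200"))
print(exp, M)  # 4 0.103240
```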
Instruction Pipeline
In the most general case, the computer needs to process each instruction with the following
sequence of steps:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
Different segments may take different times to operate on the incoming information.
Some segments are skipped for certain operations.
The design of an instruction pipeline will be most efficient if the instruction cycle is divided
into segments of equal duration.
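A short worked example of why equal durations matter. All segments are clocked together, so the clock period must cover the slowest segment (plus any register delay), and faster segments sit idle for the rest of each cycle. The Python sketch below compares hypothetical delays of 10, 30, 10 and 10 ns with four balanced 15 ns segments; both split the same 60 ns of work, but for 100 instructions the balanced pipeline needs (4 + 99) × 15 = 1545 ns, while the unbalanced one needs (4 + 99) × 30 = 3090 ns. The function names and delay values are assumptions.

```python
from typing import Sequence


def pipelined_time(delays_ns: Sequence[float], n_tasks: int,
                   register_delay_ns: float = 0.0) -> float:
    """All segments share one clock, so its period is set by the slowest segment."""
    clock_ns = max(delays_ns) + register_delay_ns
    k = len(delays_ns)
    return (k + n_tasks - 1) * clock_ns


def sequential_time(delays_ns: Sequence[float], n_tasks: int) -> float:
    """Without pipelining, each task passes through every step before the next starts."""
    return n_tasks * sum(delays_ns)


unbalanced = [10, 30, 10, 10]   # hypothetical segment delays, 60 ns in total
balanced = [15, 15, 15, 15]     # the same total work, split evenly
for name, d in (("unbalanced", unbalanced), ("balanced", balanced)):
    print(name, pipelined_time(d, 100), "ns vs", sequential_time(d, 100), "ns")
```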
Assume that the decoding of the instruction can be combined with the calculation of the
effective address into one segment.
Assume further that most instructions place the result into a processor register, so that
the instruction execution and the storing of the result can be combined into one segment.
This reduces the instruction pipeline to four segments:
1. FI: Fetch an instruction from memory
2. DA: Decode the instruction and calculate the effective address of the operand
3. FO: Fetch the operand
4. EX: Execute the operation
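Four segment CPU pipeline
[Figure: Four-segment CPU pipeline flowchart. Segment 1 fetches the instruction from memory; the chart also shows an "Interrupt?" decision with interrupt handling on its "yes" path, and "Update PC" and "Empty pipe" steps.]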
Space-time Diagram
(Rows are instructions 1 to 7; with no branches the last instruction completes in step 10.)
Step:         1  2  3  4  5  6  7  8  9 10 11 12 13
1:           FI DA FO EX
2:              FI DA FO EX
3:                 FI DA FO EX
4:                    FI DA FO EX
5:                       FI DA FO EX
6:                          FI DA FO EX
7:                             FI DA FO EX
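The regular pattern of this diagram (instruction i is in segment s at step i + s - 1) can be generated directly. A minimal Python sketch, assuming no stalls of any kind; the names `STAGES` and `space_time_rows` are hypothetical:

```python
from typing import List

STAGES = ["FI", "DA", "FO", "EX"]   # the four segments listed above


def space_time_rows(n_instr: int) -> List[str]:
    """One text row per instruction; column j is step j+1. No stalls are assumed."""
    k = len(STAGES)
    total_steps = k + n_instr - 1           # 4 + 7 - 1 = 10 for seven instructions
    rows = []
    for i in range(n_instr):
        cells = ["  "] * total_steps
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage            # instruction i+1 is in segment s+1 at step i+s+1
        rows.append(f"{i + 1}: " + " ".join(cells))
    return rows


for row in space_time_rows(7):
    print(row)
```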
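Space-time Diagram (instruction 3 is a branch)
Step:         1  2  3  4  5  6  7  8  9 10 11 12 13
1:           FI DA FO EX
2:              FI DA FO EX
3 (Branch):        FI DA FO EX
4:                    FI  -  - FI DA FO EX
5:                        -  -  - FI DA FO EX
6:                                   FI DA FO EX
7:                                      FI DA FO EX
Once the branch is decoded in step 4, the transfer of the following instructions is halted until the branch executes in step 6; the next instruction is fetched in step 7, so the seven instructions now finish in step 13 instead of step 10.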
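The three-step loss in this diagram can be reproduced with a small schedule calculation. The Python sketch below makes one simplifying assumption: the instruction after a branch is fetched (or re-fetched) only in the step after the branch finishes EX, so every branch is treated as taken and costs len(STAGES) - 1 = 3 steps. The names `STAGES` and `schedule_with_branches` are hypothetical.

```python
from typing import Dict, Set

STAGES = ("FI", "DA", "FO", "EX")


def schedule_with_branches(n_instr: int, branches: Set[int]) -> Dict[int, Dict[str, int]]:
    """Step of each stage for instructions 1..n_instr in the four-segment pipeline.

    Simplifying assumption: after a branch, fetching resumes only in the
    step after the branch's EX, i.e. every branch is treated as taken.
    """
    schedule: Dict[int, Dict[str, int]] = {}
    next_fetch = 1
    for i in range(1, n_instr + 1):
        fetch = next_fetch
        schedule[i] = {stage: fetch + s for s, stage in enumerate(STAGES)}
        if i in branches:
            next_fetch = schedule[i]["EX"] + 1   # wait for the branch outcome
        else:
            next_fetch = fetch + 1               # normal one-per-step flow
    return schedule


with_branch = schedule_with_branches(7, {3})
print(with_branch[4]["FI"], with_branch[7]["EX"])    # 7 13
print(schedule_with_branches(7, set())[7]["EX"])     # 10
```

Each additional branch in the stream adds another three idle steps under this model, which is what the branch-handling techniques listed at the end of this unit aim to reduce.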
Pipeline Conflict & Data Dependency
Pipeline Conflict
There are three major difficulties that cause instruction pipeline conflicts:
1. Resource conflicts caused by access to memory by two segments at the same time. Most of these conflicts
can be resolved by using separate instruction and data memories.
2. Data dependency conflicts arise when an instruction depends on the result of a previous instruction, but
this result is not yet available.
3. Branch difficulties arise from branch and other instructions that change the value of PC.
Data Dependency
Data dependency occurs when an instruction needs data that are not yet available.
Pipelined computers deal with such data-dependency conflicts in a variety of ways:
1. Hardware Interlocks
2. Operand forwarding
3. Delayed load
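The difference between a hardware interlock and operand forwarding can be illustrated with a toy timing model of the FI/DA/FO/EX pipeline. The Python sketch below is hypothetical in its names and timing rules: under an interlock, a dependent instruction's FO must wait until after the producer's EX, so it executes at least two steps later; under forwarding, the result is routed straight into the dependent instruction's EX, so one step later suffices. A no-op is written as an instruction with no destination.

```python
from typing import List, Optional, Tuple

# (destination register or None for a no-op, source registers)
Instr = Tuple[Optional[str], Tuple[str, ...]]


def ex_steps(program: List[Instr], forwarding: bool) -> List[int]:
    """Step in which each instruction reaches EX in the FI/DA/FO/EX pipeline.

    Assumed timing model (hypothetical):
      * with no conflicts, instruction i (0-based) executes in step i + 4;
      * hardware interlock: the consumer's FO must come after the producer's
        EX, so the consumer executes at least 2 steps after the producer;
      * operand forwarding: the result is routed straight to the consumer's
        EX, so 1 step after the producer is enough.
    """
    ex: List[int] = []
    last_writer = {}                      # register -> index of latest producer
    gap = 1 if forwarding else 2
    for i, (dest, srcs) in enumerate(program):
        step = i + 4
        if ex:
            step = max(step, ex[-1] + 1)  # instructions stay in program order
        for reg in srcs:
            if reg in last_writer:        # read-after-write dependency
                step = max(step, ex[last_writer[reg]] + gap)
        ex.append(step)
        if dest is not None:
            last_writer[dest] = i
    return ex


prog = [("R1", ("R2", "R3")),   # R1 <- R2 + R3
        ("R4", ("R1", "R5")),   # R4 <- R1 + R5, needs the new R1
        ("R6", ("R7", "R8"))]   # independent
print(ex_steps(prog, forwarding=False))  # [4, 6, 7]: one stall step
print(ex_steps(prog, forwarding=True))   # [4, 5, 6]: no stall
```

Placing a no-op between the two dependent instructions, in the spirit of delayed load where the compiler rather than the hardware spaces the instructions, gives [4, 5, 6, 7] under the interlock rule: the dependency is respected without any hardware-inserted stall, at the cost of one wasted slot.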
Handling Branch Instructions
The branch instruction breaks the normal sequence of the instruction stream, causing
difficulties in the operation of the instruction pipeline.
Techniques used to minimize the performance degradation caused by instruction branching
are as follows (the first four are hardware techniques; delayed branch relies on the compiler rearranging the instructions):
• Pre-fetch target
• Branch target buffer
• Loop buffer
• Branch prediction
• Delayed branch
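Branch prediction comes in many forms. One common dynamic scheme, shown here only as an illustration and not necessarily the one intended in this unit, keeps a two-bit saturating counter per branch: states 0 and 1 predict "not taken", states 2 and 3 predict "taken", and each actual outcome moves the counter one step toward itself. The class and function names below are hypothetical. For a loop-closing branch that is taken nine times and then falls through, with the loop run twice, the counter gets 16 of the 20 predictions right.

```python
from typing import Iterable


class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 0) -> None:
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3,
        # so a single surprise does not flip a strongly held prediction.
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)


def accuracy(outcomes: Iterable[bool]) -> float:
    predictor = TwoBitPredictor()
    history = list(outcomes)
    correct = 0
    for taken in history:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct / len(history)


# A loop-closing branch: taken 9 times, then falls through; the loop runs twice.
loop_branch = ([True] * 9 + [False]) * 2
print(accuracy(loop_branch))  # 0.8
```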