Parallel Processing
Agenda
• Introduction to parallel processing
• Flynn’s classification
• Pipelining
– General Pipeline
– Arithmetic pipeline
– Instruction pipeline
• Instruction level parallelism
Parallel Processing
• Execution of concurrent events in the
computing process to achieve faster
computational speed
• Levels of Parallel Processing
– Job or Program level
– Task or Procedure level
– Inter-Instruction level
– Intra-Instruction level
Parallel Computers
• Flynn's classification: Based on the multiplicity
of Instruction Streams and Data Streams
– Instruction Stream: Sequence of Instructions read
from memory
– Data Stream: Operations performed on the data in
the processor
• Four categories (single/multiple instruction stream ×
single/multiple data stream):
– SISD: Single Instruction stream, Single Data stream
– SIMD: Single Instruction stream, Multiple Data streams
– MISD: Multiple Instruction streams, Single Data stream
– MIMD: Multiple Instruction streams, Multiple Data streams
SISD COMPUTER SYSTEMS
• Characteristics
– Standard von Neumann machine
– Instructions and data are stored in memory
– One operation at a time
• Limitations
– Limitation on Memory Bandwidth
– Memory is shared by CPU and I/O
MISD COMPUTER SYSTEMS
[Diagram: MISD organization: multiple control units (CU), each
directing its own processor (P), operate on a single data stream
drawn from memory]
SIMD COMPUTER SYSTEMS
[Diagram: SIMD organization: one control unit issues a single
instruction stream to an array of processor units (P), which
exchange data through an alignment network]
• Characteristics
– Only one copy of the program exists
– A single controller executes one instruction at a time
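A minimal sketch of the SISD/SIMD contrast. This is purely illustrative: plain Python runs sequentially, so the "SIMD" version only models the idea of one instruction being applied to every data element in lockstep, and both function names are hypothetical.

```python
def sisd_scale(data, factor):
    """SISD model: a single instruction stream handles one datum per step."""
    out = []
    for x in data:             # one control flow, one data item at a time
        out.append(x * factor)
    return out

def simd_scale(data, factor):
    """SIMD model: conceptually, one 'multiply' instruction is broadcast
    to all data elements at once (here only simulated elementwise)."""
    return [x * factor for x in data]
```

Both produce the same result; the difference the classification captures is in how the hardware would execute them, not in the answer.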
MIMD COMPUTER SYSTEMS
[Diagram: MIMD shared-memory organization: processor/memory
pairs (P, M) connected through an interconnection network to a
shared memory]
• Characteristics
– Multiple processing units
– Execution of multiple instructions on multiple data
• Types of MIMD computer systems
– Shared memory multiprocessors
– Message-passing multicomputers
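The shared-memory multiprocessor idea can be sketched with threads, which share one address space the way processors share memory. This is an illustrative sketch only; the function name and worker count are assumptions, and Python threads are a stand-in for real processors.

```python
import threading

def mimd_sum(data, nworkers=4):
    """Sketch of shared-memory MIMD: several threads run their own
    instruction streams over different slices of shared data."""
    partial = [0] * nworkers            # shared result area

    def worker(w):
        chunk = data[w::nworkers]       # each 'processor' takes its own slice
        partial[w] = sum(chunk)         # independent instruction stream

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)                 # combine results via shared memory
```

A message-passing multicomputer would instead give each worker private data and exchange the partial sums explicitly over a network or queue.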
Pipelining
• A technique of decomposing a sequential
process into sub-operations, with each sub-
process being executed in a special dedicated
segment that operates concurrently with all
other segments.
• Example: Ai * Bi + Ci for i = 1, 2, 3, ... , 7
– Segment 1: R1 ← Ai, R2 ← Bi (load inputs from memory)
– Segment 2: R3 ← R1 * R2, R4 ← Ci (multiplier)
– Segment 3: R5 ← R3 + R4 (adder)
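The three-segment datapath can be simulated clock by clock. A small sketch (the function name is hypothetical; register names R1..R5 follow the figure):

```python
def pipeline_abc(A, B, C):
    """Clock-by-clock model of the 3-segment pipeline computing
    A[i]*B[i] + C[i].  All registers are clocked simultaneously, so
    each segment reads the values loaded on the previous cycle."""
    n = len(A)
    R1 = R2 = R3 = R4 = R5 = None
    results = []
    for t in range(n + 2):                      # n loads + 2 cycles to drain
        nR5 = R3 + R4 if R3 is not None else None        # segment 3: adder
        nR3 = R1 * R2 if R1 is not None else None        # segment 2: multiplier
        nR4 = C[t - 1] if 1 <= t <= n else None          # Ci enters one clock late
        nR1, nR2 = (A[t], B[t]) if t < n else (None, None)  # segment 1: inputs
        R1, R2, R3, R4, R5 = nR1, nR2, nR3, nR4, nR5     # clock edge
        if R5 is not None:
            results.append(R5)
    return results
```

Note that after the first two fill cycles, one result emerges per clock, which is where the pipeline speedup comes from.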
• Space-time diagram for a 4-segment pipeline (tasks T1–T6):

Step:       1   2   3   4   5   6   7   8   9
Segment 1:  T1  T2  T3  T4  T5  T6
Segment 2:      T1  T2  T3  T4  T5  T6
Segment 3:          T1  T2  T3  T4  T5  T6
Segment 4:              T1  T2  T3  T4  T5  T6
PIPELINE SPEEDUP
n: number of tasks to be performed
k: number of pipeline segments
tp: clock cycle time of the pipeline
tn: time to complete one task without pipelining
• Speedup: S = (n · tn) / ((k + n − 1) · tp)
• For large n, S approaches tn / tp; if each task takes
tn = k · tp, the maximum speedup approaches k, the
number of segments
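The speedup formula is easy to check numerically. A small sketch (the function name is an assumption):

```python
def pipeline_speedup(n, k, tp, tn=None):
    """Speedup of a k-segment pipeline over nonpipelined execution.

    n  -- number of tasks
    k  -- number of pipeline segments
    tp -- pipeline clock period
    tn -- nonpipelined time per task (defaults to k * tp)
    """
    if tn is None:
        tn = k * tp
    return (n * tn) / ((k + n - 1) * tp)
```

With tn = k · tp, the speedup tends toward k as n grows: for a 4-segment pipeline, 7 tasks give a speedup of 2.8, while a million tasks get within a fraction of a percent of 4.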
INSTRUCTION CYCLE
• Six Phases in an Instruction Cycle
1. Fetch an instruction from memory
2. Decode the instruction
3. Calculate the effective address of the operand
4. Fetch the operands from memory
5. Execute the operation
6. Store the result in the proper place
Four-Segment Instruction Pipeline
• Segment 1: Fetch instruction (FI) from memory
• Segment 2: Decode instruction and calculate the
effective address (DA)
– If the instruction is a branch: update PC and empty
the pipe
• Segment 3: Fetch operand (FO) from memory
• Segment 4: Execute instruction (EX)
– If an interrupt is pending: handle the interrupt,
update PC, and empty the pipe
Instruction Execution in a 4-Stage Pipeline

Step:           1   2   3   4   5   6   7   8   9  10  11  12  13
Instruction 1:  FI  DA  FO  EX
            2:      FI  DA  FO  EX
   (Branch) 3:          FI  DA  FO  EX
            4:              FI          FI  DA  FO  EX
            5:                              FI  DA  FO  EX
            6:                                  FI  DA  FO  EX
            7:                                      FI  DA  FO  EX

(Instruction 4 is fetched at step 4, discarded when the branch
is taken, and fetched again at step 7 once the branch target is
known.)
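The step count in the diagram follows a simple model: (k + n − 1) cycles for n instructions in a k-stage pipeline, plus a penalty for each taken branch. A hedged sketch (the function name and the default 3-cycle penalty are assumptions read off the diagram above):

```python
def pipeline_cycles(n, k=4, branches=0, penalty=3):
    """Total clock cycles for n instructions in a k-stage pipeline,
    assuming each taken branch flushes `penalty` cycles.  This is a
    simplified model, ignoring other stalls."""
    return (k + n - 1) + branches * penalty
```

For the diagram's figures (7 instructions, 4 stages, 1 taken branch) the model gives 10 + 3 = 13 steps, matching the table.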
Major Hazards In Pipelined Execution
• 3 categories
1. Resource Hazard
2. Data Dependency Hazard
3. Branching Hazard
Resource Hazard
• Occurs when the hardware resources required by
instructions in simultaneous overlapped execution
cannot all be met
• E.g., fetching an instruction and fetching an operand
for two different instructions at the same time
• Solution: use two separate memory buses to fetch
instructions and data respectively
Data Dependency Hazard
• Occurs when the execution of an instruction
depends on the result of previous instruction
• E.g., ADD R1, R2, R3   ; R1 ← R2 + R3
        SUB R4, R1, R5   ; R4 ← R1 − R5 (reads R1 before
        it has been written back)
• Data hazard can be dealt with either hardware
techniques or software techniques.
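The read-after-write dependency in the ADD/SUB pair can be detected mechanically, which is exactly what interlock hardware does. An illustrative sketch (the tuple encoding of instructions is hypothetical, not a real ISA):

```python
def raw_hazard(producer, consumer):
    """Detect a read-after-write (RAW) dependency: does `consumer`
    read a register that `producer` writes?  Instructions are modelled
    as (opcode, dest, src1, src2, ...) tuples."""
    _, dest, *_ = producer          # register written by the first instruction
    _, _, *sources = consumer       # registers read by the second
    return dest in sources

add = ("ADD", "R1", "R2", "R3")    # R1 <- R2 + R3
sub = ("SUB", "R4", "R1", "R5")    # R4 <- R1 - R5  (reads R1)
```

Here `raw_hazard(add, sub)` is true, so the hardware must stall or forward; in the reverse order there is no conflict.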
Data Dependency Solution
Hardware Technique:
• Interlock
– hardware detects the data dependencies and delays
the scheduling of the dependent instruction by
stalling enough clock cycles
• Forwarding (bypassing, short-circuiting)
– Accomplished by a data path that routes a value from
a source (usually an ALU) to a user, bypassing a
designated register. This allows the value produced
to be used at an earlier stage in the pipeline than
would otherwise be possible
Data Dependency Solution
Software Technique:
• Delayed Load:
– gives the responsibility for solving data conflict
problems to the compiler that translates the high-
level programming language into a machine language
program.
– The compiler for such computers is designed to detect
a data conflict and reorder the instructions as
necessary to delay the loading of the conflicting data
by inserting delayed load no-operation instructions.
– This method is referred to as delayed load.
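The compiler-side fix can be sketched as a pass that inserts no-operation instructions between dependent pairs. This is an illustrative sketch under the same hypothetical (opcode, dest, src...) encoding as before; a real compiler would first try to reorder independent instructions into the delay slot and fall back on NOPs only when none exist.

```python
def insert_delay_nops(program, delay=1):
    """Naive delayed-load scheduling: insert `delay` NOPs after any
    instruction whose destination register is read by the immediately
    following instruction."""
    out = []
    for i, instr in enumerate(program):
        out.append(instr)
        if i + 1 < len(program) and len(instr) > 1:
            dest = instr[1]                    # (opcode, dest, src...)
            if dest in program[i + 1][2:]:     # next instruction reads it?
                out.extend([("NOP",)] * delay)
    return out
```

Applied to the ADD/SUB pair above, this emits ADD, NOP, SUB, giving the pipeline one extra cycle for R1 to be written before SUB reads it.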
Branching Hazards
• Branch Instructions
– Branch target address is not known until the branch
instruction is completed
Branch instruction:  FI  DA  FO  EX
Next instruction:                    FI  DA  FO  EX