Lecture 10
PARALLEL PROCESSING
Parallel Processing
Serial processing (sequential)
http://www.itrelease.com/2017/11/difference-serial-parallel-processing/
Serial processing completes one task at a time. The processor cannot
work on more than one task at once, so the tasks run one after
another in sequence.
Parallel Processing
• The purpose of parallel processing:
1. Speed up the computer's processing capability
2. Increase the throughput
Parallel processing classifications
• Parallel processing can be considered from several viewpoints:
1. From the internal organization of the processors
2. From the interconnection structure between processors
3. From the flow of information through the system.
Flynn's classification
Based on M. Morris Mano “Computer System Architecture”-- Lecturer Ahmed Salah Hameed
Flynn's classification
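1. SISD
[Figure: one control unit (CU) sends a single instruction stream (IS) to one processing unit (PU), which exchanges a single data stream (DS) with the memory module (MM).]
2. SIMD
[Figure: one control unit (CU) sends the same instruction stream (IS) to processing units PU 1 ... PU n; each PU i exchanges its own data stream DS i with memory module MM i of the shared memory.]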
Flynn's classification
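3. MISD
[Figure: control units CU 1 ... CU n each send their own instruction stream IS i to processing unit PU i; a single data stream (DS) from the shared memory (MM 1 ... MM n) passes through the processing units.]
4. MIMD
[Figure: control units CU 1 ... CU n each send their own instruction stream IS i to processing unit PU i; each PU i exchanges its own data stream (DS) with memory module MM i of the shared memory.]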
Single-instruction, single-data (SISD) systems – An SISD computing system is a uniprocessor machine
capable of executing a single instruction stream operating on a single data stream.
In SISD, machine instructions are processed sequentially, and computers adopting this model
are popularly called sequential computers.
All the instructions and data to be processed have to be stored in primary memory.
Single-instruction, multiple-data (SIMD) systems – An SIMD system is a multiprocessor machine capable of
executing the same instruction on all the processing elements (PEs), with each PE operating on a different data stream.
Machines based on the SIMD model are well suited to scientific computing, which involves many vector
and matrix operations.
So that the work can be spread across all the PEs, the data elements of a vector
are divided into multiple sets (N sets for an N-PE system), and each PE processes one data set.
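The division of a vector into N sets can be sketched in a few lines. This is only an illustrative model, not how SIMD hardware is programmed: the names `split_into_sets` and `simd_apply` are hypothetical, and the loop over sets stands in for N PEs that would execute the same instruction simultaneously.

```python
def split_into_sets(data, n_pes):
    """Divide data into n_pes nearly equal contiguous sets, one per PE."""
    base, extra = divmod(len(data), n_pes)
    sets, start = [], 0
    for pe in range(n_pes):
        # The first `extra` PEs receive one additional element.
        size = base + (1 if pe < extra else 0)
        sets.append(data[start:start + size])
        start += size
    return sets


def simd_apply(instruction, data, n_pes):
    """Apply the same instruction to every set, as each PE would in lockstep."""
    sets = split_into_sets(data, n_pes)
    # Conceptually all sets are processed at the same time;
    # this sequential loop only models the result.
    results = [[instruction(x) for x in pe_set] for pe_set in sets]
    return sets, results


vector = [1, 2, 3, 4, 5, 6, 7, 8]
sets, results = simd_apply(lambda x: x * x, vector, n_pes=4)
print(sets)     # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(results)  # [[1, 4], [9, 16], [25, 36], [49, 64]]
```

Every PE runs the same instruction (here, squaring); only the data set differs from one PE to the next.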
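Multiple-instruction, single-data (MISD) systems – An MISD system is a multiprocessor machine that executes
different instructions on different PEs, all of them operating on the same data stream.
Multiple-instruction, multiple-data (MIMD) systems – Each PE in the MIMD model has separate instruction and
data streams; therefore machines built using this model are capable of running any kind of application.
Unlike SIMD and MISD machines, PEs in MIMD machines work asynchronously.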
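As a loose software analogy for the MIMD model, the sketch below runs different functions on different data at the same time. It assumes Python threads as stand-ins for PEs, which is only an approximation; the functions `total`, `largest`, and `reverse_text` are hypothetical examples of separate instruction streams.

```python
from concurrent.futures import ThreadPoolExecutor


def total(values):
    return sum(values)


def largest(values):
    return max(values)


def reverse_text(text):
    return text[::-1]


# Each (instruction stream, data stream) pair stands in for one PE.
programs = [
    (total, [1, 2, 3, 4]),
    (largest, [7, 3, 9]),
    (reverse_text, "pipeline"),
]

with ThreadPoolExecutor(max_workers=len(programs)) as pool:
    futures = [pool.submit(func, data) for func, data in programs]
    # Workers run asynchronously and may finish in any order;
    # results are collected here in submission order.
    mimd_results = [future.result() for future in futures]

print(mimd_results)  # [10, 9, 'enilepip']
```

No worker waits for another, which mirrors the asynchronous operation of MIMD PEs.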
Pipelining
• Pipelining is a technique of decomposing a sequential
process into suboperations, with each suboperation being
executed in a special dedicated segment that operates
concurrently with all other segments.
Example of the Pipeline Organization
SPEED CALCULATION
• n is the number of tasks.
• Each task is divided into k segments.
• Each segment executes in one clock cycle of duration tp.
• The first task T1 requires a time equal to k * tp to complete its operation,
because it must pass through all k segments of the pipe.
• The remaining n - 1 tasks emerge from the pipe at the rate of one task per
clock cycle, so they are completed after a further time equal to (n - 1) * tp.
• Therefore, to complete n tasks using a k-segment pipeline:
k + (n - 1) clock cycles
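The cycle count above translates directly into code. The following is a minimal sketch; the function names and the sample values k = 4, n = 6, tp = 20 are assumptions chosen for illustration only.

```python
def pipeline_clock_cycles(k, n):
    """Clock cycles needed to complete n tasks in a k-segment pipeline."""
    if k < 1 or n < 1:
        raise ValueError("k and n must both be at least 1")
    # k cycles for the first task, then one more cycle per remaining task.
    return k + (n - 1)


def pipeline_time(k, n, tp):
    """Total time for n tasks when each clock cycle lasts tp."""
    return pipeline_clock_cycles(k, n) * tp


print(pipeline_clock_cycles(4, 6))  # 9
print(pipeline_time(4, 6, 20))      # 180
```

With a single task (n = 1) the count reduces to k cycles, since there is nothing to overlap.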
SPEED CALCULATION
Space-time diagram (k = 4 segments, n = 6 tasks); columns are clock cycles:
Task   1   2   3   4   5   6   7   8   9
1      k1  k2  k3  k4
2          k1  k2  k3  k4
3              k1  k2  k3  k4
4                  k1  k2  k3  k4
5                      k1  k2  k3  k4
6                          k1  k2  k3  k4
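A space-time diagram like the one above can be generated for any k and n. The sketch below assumes that task i enters segment k1 in clock cycle i and advances one segment per cycle; `space_time_rows` and `format_diagram` are hypothetical helper names.

```python
def space_time_rows(k, n):
    """Return one row per task; row[c] is the segment active in cycle c + 1, or ''."""
    total_cycles = k + (n - 1)
    rows = []
    for task in range(n):
        row = [""] * total_cycles
        for seg in range(k):
            # Task `task` (0-based) occupies segment `seg` in cycle task + seg (0-based).
            row[task + seg] = f"k{seg + 1}"
        rows.append(row)
    return rows


def format_diagram(k, n):
    """Render the rows as aligned plain text with a clock-cycle header."""
    rows = space_time_rows(k, n)
    header = "Task  " + "".join(f"{c:<4}" for c in range(1, len(rows[0]) + 1))
    lines = [header]
    for number, row in enumerate(rows, start=1):
        lines.append(f"{number:<6}" + "".join(f"{cell:<4}" for cell in row))
    return "\n".join(line.rstrip() for line in lines)


print(format_diagram(4, 6))
```

The last task finishes in the final column, so the width of the diagram is the k + (n - 1) cycle count.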
SPEED CALCULATION
NONPIPELINE UNIT
• Each task takes a time equal to tn.
• The total time required for n tasks is
n * tn
PIPELINE UNIT
• To complete n tasks using a k-segment pipeline:
k + (n - 1) clock cycles
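• Total time is: (k + (n - 1)) * tp
SPEED UP
• S = (n * tn) / ((k + (n - 1)) * tp)
• As n becomes much larger than k - 1, S approaches tn / tp.
• If a task takes the same time in both units (tn = k * tp), the maximum speedup is Smax = k.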
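The speedup is the ratio of the nonpipeline time n * tn to the pipeline time (k + (n - 1)) * tp. The sketch below evaluates it for hypothetical values (k = 4, tp = 20, n = 100, with time in arbitrary units) and assumes tn = k * tp, in which case the ratio approaches k as n grows.

```python
def speedup(n, k, tp, tn):
    """Ratio of nonpipeline time to pipeline time for n tasks."""
    nonpipeline_time = n * tn
    pipeline_time = (k + (n - 1)) * tp
    return nonpipeline_time / pipeline_time


# Hypothetical values chosen only for illustration.
k, tp, n = 4, 20, 100
tn = k * tp  # assumption: one task takes as long in either unit

print(n * tn)                           # 8000
print((k + (n - 1)) * tp)               # 2060
print(round(speedup(n, k, tp, tn), 2))  # 3.88

# With many more tasks the speedup approaches k = 4.
print(speedup(10**6, k, tp, tn))
```

The speedup stays below k for any finite n because the first task still needs k cycles to fill the pipe.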
EXAMPLE ON SPEED CALCULATION
INSTRUCTION PIPELINE
• An instruction pipeline reads consecutive instructions from memory
while previous instructions are being executed in other segments. This
causes the instruction fetch and execute phases to overlap and perform
simultaneous operations.
• Simple Example:
• Consider a computer with an instruction fetch unit and an instruction
execution unit designed to provide a two-segment pipeline.
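[Figure: timing diagrams of the fetch (F) and execute (E) segments for three instructions, one spanning clock cycles 1 to 4 and one spanning clock cycles 1 to 6.]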
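The overlap of the fetch and execute phases can be illustrated with a small schedule calculation. This sketch assumes each phase takes exactly one clock cycle and that no conflicts delay the pipeline; `fetch_execute_schedule` is a hypothetical helper.

```python
def fetch_execute_schedule(num_instructions, overlapped=True):
    """Return (fetch_cycle, execute_cycle) pairs, numbered from cycle 1."""
    schedule = []
    for i in range(num_instructions):
        if overlapped:
            # The next fetch starts while the previous instruction executes.
            fetch = i + 1
        else:
            # Each instruction waits for the previous one to finish executing.
            fetch = 2 * i + 1
        schedule.append((fetch, fetch + 1))
    return schedule


print(fetch_execute_schedule(3))                    # [(1, 2), (2, 3), (3, 4)]
print(fetch_execute_schedule(3, overlapped=False))  # [(1, 2), (3, 4), (5, 6)]
```

Three instructions finish in 4 cycles with overlap and in 6 cycles without it, matching k + (n - 1) for k = 2 and n = 3.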
INSTRUCTION PIPELINE
• NOTE 1: Computers with complex
instructions require other phases in
addition to the fetch and execute to
process an instruction completely.
INSTRUCTION PIPELINE
• NOTE 3: Memory access conflicts are sometimes resolved by using
two memory buses for accessing instructions and data in separate
modules. In this way, an instruction word and a data word can be read
simultaneously from two different modules.
EXAMPLE: FOUR-SEGMENT INSTRUCTION PIPELINE
PROBLEMS
Space-time diagram (k = 6 segments, n = 8 tasks); columns are clock cycles:
Task   1   2   3   4   5   6   7   8   9   10  11  12  13
1      k1  k2  k3  k4  k5  k6
2          k1  k2  k3  k4  k5  k6
3              k1  k2  k3  k4  k5  k6
4                  k1  k2  k3  k4  k5  k6
5                      k1  k2  k3  k4  k5  k6
6                          k1  k2  k3  k4  k5  k6
7                              k1  k2  k3  k4  k5  k6
8                                  k1  k2  k3  k4  k5  k6