ICS 2410 Advanced Topics in Computer Science
Chapter 5: Parallel and Concurrent Systems
Some Definitions
Concurrent - Events or processes which seem to occur or progress at the same time.
Parallel - Events or processes which actually occur or progress at the same time.
Parallel programming (also, unfortunately, sometimes called concurrent programming) is a computer programming technique that provides for the parallel execution of operations, either within a single parallel computer or across a number of systems.
In the latter case, the term distributed computing is used.
Flynn’s Taxonomy
Best known classification scheme for parallel computers.
Depends on the parallelism a computer exhibits in its
Instruction stream
Data stream
A sequence of instructions (the instruction stream) manipulates a sequence of operands (the data stream).
The instruction stream (I) and the data stream (D) can each be either single (S) or multiple (M).
Four combinations: SISD, SIMD, MISD, MIMD
SISD
Single Instruction, Single Data
Single-CPU systems, i.e., uniprocessors
Note: co-processors don’t count as additional processors
Concurrent processing allowed
Instruction prefetching
Pipelined execution of instructions
Concurrent execution allowed
That is, independent concurrent tasks can execute different sequences of operations.
Most Important Example: a PC
SIMD
Single instruction, multiple data
One instruction stream is broadcast to all processors.
Each processor, also called a processing element (or PE), is usually simplistic and logically is essentially an ALU.
PEs do not store a copy of the program nor have a program control unit.
Individual processors can remain idle during execution of segments of the program (based on a data test).
SIMD (cont.)
All active processors execute the same instruction synchronously, but on different data.
Technically, on a memory access, all active processors must access the same location in their local memory.
The data items form an array (or vector), and an instruction can act on the complete array in one cycle.
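To make the lock-step model concrete, here is a minimal sketch using x86 SSE intrinsics (an illustrative choice; the slides do not name a particular instruction set). One add instruction operates on four data elements at once, and a mask computed from a data test selects which lanes keep the new result, mimicking PEs that sit idle.

/* SIMD sketch with SSE intrinsics: one instruction, four data lanes.
   Compile with: gcc -msse4.1 simd_sketch.c                           */
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    /* Four data elements processed by one instruction (one lane per "PE"). */
    __m128 a   = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    __m128 b   = _mm_set1_ps(10.0f);
    __m128 sum = _mm_add_ps(a, b);                 /* same add, all lanes at once */

    /* Data test: only lanes where a > 2 are "active"; the others keep
       their old value, like de-activated PEs.                          */
    __m128 mask   = _mm_cmpgt_ps(a, _mm_set1_ps(2.0f));
    __m128 result = _mm_blendv_ps(a, sum, mask);   /* requires SSE4.1 */

    float out[4];
    _mm_storeu_ps(out, result);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);   /* 1 2 13 14 */
    return 0;
}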
How to View a SIMD Machine
Think of soldiers all in a unit.
The commander selects certain soldiers as active – for example, the first row.
The commander barks out an order to all the active soldiers, who execute the order synchronously.
The remaining soldiers do not execute orders until they are re-activated.
MIMD
Multiple instruction, multiple data
Processors are asynchronous, since they can independently execute different programs on different data sets.
Communications are handled either through shared memory (multiprocessors) or by use of message passing (multicomputers).
MIMDs have been considered by most researchers to include the most powerful and least restricted computers.
MIMD (cont. 2/4)
Have very major communication costs when compared to SIMDs
Internal ‘housekeeping activities’ are often overlooked
Maintaining distributed memory & distributed databases
Synchronization or scheduling of tasks
Load balancing between processors
One method for programming MIMDs is for all processors to execute the same program.
Execution of tasks by processors is still asynchronous
Called the SPMD method (single program, multiple data)
Usual method when the number of processors is large.
Considered to be a “data parallel programming” style for MIMDs.
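As a sketch of the SPMD style (assuming MPI is available; the index split used here is just an illustration), every process below executes the same program, but each rank works on its own share of the data:

/* SPMD sketch: every process runs this same program on different data.
   Compile with mpicc, run with: mpirun -np 4 ./spmd                    */
#include <mpi.h>
#include <stdio.h>

#define N 1000

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Same code, different data: each rank sums its own subset of indices. */
    long local = 0, total = 0;
    for (int i = rank; i < N; i += size)
        local += i;

    /* Combine the partial results on rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of 0..%d = %ld\n", N - 1, total);

    MPI_Finalize();
    return 0;
}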
MIMD (cont. 3/4)
A more common technique for programming MIMDs is to use multi-tasking:
The problem solution is broken up into various tasks.
Tasks are distributed among processors initially.
If new tasks are produced during execution, these may be handled by the parent processor or distributed.
Each processor can execute its collection of tasks concurrently.
If some of its tasks must wait for results from other tasks or new data, the processor will focus on the remaining tasks.
Larger programs usually run a load balancing algorithm in the background that re-distributes the tasks assigned to the processors during execution
Either dynamic load balancing or called at specific times
Dynamic scheduling algorithms may be needed to assign a higher execution priority to time-critical tasks
E.g., on critical path, more important, earlier deadline, etc.
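A small sketch of this multi-tasking style using OpenMP tasks (one possible runtime; the slides do not prescribe an implementation): the tasks are created once, and the runtime’s scheduler distributes them among the worker threads, a simple form of the load balancing described above.

/* Multi-tasking sketch with OpenMP tasks.  Compile with: gcc -fopenmp tasks.c */
#include <omp.h>
#include <stdio.h>

static void do_task(int id)
{
    printf("task %d handled by thread %d\n", id, omp_get_thread_num());
}

int main(void)
{
    #pragma omp parallel          /* team of worker threads ("processors") */
    #pragma omp single            /* one thread creates the pool of tasks  */
    {
        for (int id = 0; id < 8; id++) {
            #pragma omp task firstprivate(id)
            do_task(id);          /* tasks may run concurrently on any thread */
        }
    }                             /* all tasks complete at this barrier */
    return 0;
}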
Multiprocessors (Shared Memory MIMDs)
All processors have access to all memory locations.
Two types: UMA and NUMA
UMA (uniform memory access)
Frequently called symmetric multiprocessors or SMPs
Similar to a uniprocessor, except additional, identical CPUs are added to the bus.
Each processor has equal access to memory and can do anything that any other processor can do.
SMPs have been and remain very popular.
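A minimal shared-memory sketch (using POSIX threads as a stand-in for the CPUs of an SMP; this example is not from the slides): every thread can read and write the same memory location, so access is guarded by a lock.

/* Shared-memory sketch: all threads access the same location.
   Compile with: gcc -pthread shared.c                          */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                          /* one location, visible to all */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);                /* guard the shared location */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);           /* 400000 */
    return 0;
}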
Multiprocessors (cont.)
NUMA (non-uniform memory access)
Has a distributed memory system.
Each memory location has the same address for all processors.
Access time to a given memory location varies considerably for different CPUs.
Normally, fast cache is used with NUMA systems to reduce the problem of different memory access times for PEs.
This creates the problem of ensuring that all copies of the same data in different memory locations are identical.
Multicomputers (Message-Passing MIMDs)
Processors are connected by a network
An interconnection network is one possibility
Also, may be connected by Ethernet links or a bus.
Each processor has a local memory and can only access its own local memory.
Data is passed between processors using messages, when specified by the program.
Message passing between processors is controlled by a message-passing library (typically MPI)
The problem is divided into processes or tasks that can be executed concurrently on individual processors. Each processor is normally assigned multiple processes.
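A minimal message-passing sketch in MPI: each process owns only its local memory, and data moves between processes only through explicit send and receive calls (this two-rank exchange is illustrative, not taken from the slides).

/* Message-passing sketch: data is copied between local memories via messages.
   Compile with mpicc, run with: mpirun -np 2 ./msg                            */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                  /* sender: the value lives in rank 0's memory */
        double x = 3.14;
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {           /* receiver: gets its own copy of the value */
        double x;
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %g\n", x);
    }

    MPI_Finalize();
    return 0;
}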
Multiprocessors vs Multicomputers
Programming disadvantages of message-passing
Programmers must make explicit message-passing calls in the code
This is low-level programming and is error prone.
Data is not shared between processors but copied, which increases the total data size.
Data integrity problem: difficulty of maintaining the correctness of multiple copies of a data item.
Multiprocessors vs Multicomputers (cont.)
Programming advantages of message-passing
No problem with simultaneous access to data.
Allows different PCs to operate on the same data independently.
Allows PCs on a network to be easily upgraded when faster processors become available.
Mixed “distributed shared memory” systems exist
Lots of current interest in clusters of SMPs.
Easier to build systems with a very large number of processors.
Seeking Concurrency
Several Different Ways Exist
Data parallelism
Task parallelism
Sometimes called control parallelism or functional parallelism.
Pipelining
Data Parallelism
All tasks (or processors) apply the same set of operations to different data.
Example:
  for i ← 0 to 99 do
    a[i] ← b[i] + c[i]
  endfor
Operations may be executed concurrently
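Written in C with an OpenMP parallel-for (one common way to express this loop; the slides give only pseudocode), the example becomes:

/* Data-parallel loop: each thread applies the same operation to a
   different part of the arrays.  Compile with: gcc -fopenmp vadd.c */
void vector_add(double a[100], const double b[100], const double c[100])
{
    #pragma omp parallel for
    for (int i = 0; i < 100; i++)
        a[i] = b[i] + c[i];          /* same operation, different data */
}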
Data Parallelism Features
Each processor performs the same computation on different data sets
Computations can be performed either synchronously or asynchronously
Defn: Grain size is the average number of computations performed between communication or synchronization steps
Task/Functional/Control/Job Parallelism
Independent tasks apply different operations to different data elements
  a ← 2
  b ← 3
  m ← (a + b) / 2
  s ← (a² + b²) / 2
  v ← s - m²
First and second statements may execute concurrently
Third and fourth statements may execute concurrently
Normally, this type of parallelism deals with concurrent execution of tasks, not statements
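One way to render the statement sequence above in C is with OpenMP sections (an illustrative choice; the slides do not specify a mechanism): statements with no mutual dependence are placed in parallel sections.

/* Task-parallel sketch: independent statements run in parallel sections.
   Compile with: gcc -fopenmp sections.c                                  */
#include <stdio.h>

int main(void)
{
    double a, b, m, s, v;

    #pragma omp parallel sections
    {
        #pragma omp section
        a = 2;                        /* first statement  */
        #pragma omp section
        b = 3;                        /* second statement */
    }

    #pragma omp parallel sections
    {
        #pragma omp section
        m = (a + b) / 2;              /* third statement  */
        #pragma omp section
        s = (a * a + b * b) / 2;      /* fourth statement */
    }

    v = s - m * m;                    /* depends on both, so it runs last */
    printf("v = %g\n", v);
    return 0;
}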
Control Parallelism Features
Problem is divided into different non-identical tasks
Tasks are divided between the processors so that their workload is roughly balanced
Parallelism at the task level is considered to be coarse grained parallelism
Pipelining
Divide a process into stages
Produce several items simultaneously
Compute Partial Sums
Consider the for loop:
  p[0] ← a[0]
  for i ← 1 to 3 do
    p[i] ← p[i-1] + a[i]
  endfor
This computes the partial sums:
  p[0] = a[0]
  p[1] = a[0] + a[1]
  p[2] = a[0] + a[1] + a[2]
  p[3] = a[0] + a[1] + a[2] + a[3]
The loop is not data parallel as there are dependencies.
However, we can stage the calculations in order to achieve some parallelism.
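For example, the partial sums can be computed in a logarithmic number of stages in which every addition within a stage is independent (a Hillis-Steele style scan; this particular staging is an assumption, since the slides do not spell out the method):

/* Staged partial sums: within each stage, the additions are independent
   and could run in parallel (e.g. under an OpenMP parallel-for).        */
#include <stdio.h>
#include <string.h>

#define N 4

int main(void)
{
    double a[N] = {1, 2, 3, 4};
    double p[N], tmp[N];
    memcpy(p, a, sizeof p);

    for (int stride = 1; stride < N; stride *= 2) {
        memcpy(tmp, p, sizeof p);
        for (int i = stride; i < N; i++)     /* independent within the stage */
            p[i] = tmp[i] + tmp[i - stride];
    }

    for (int i = 0; i < N; i++)
        printf("p[%d] = %g\n", i, p[i]);     /* 1 3 6 10 */
    return 0;
}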
SIMD Machines
An early SIMD computer designed for vector and matrix processing was the Illiac IV computer
Initial development at the University of Illinois, 1965-70
Moved to NASA Ames, completed in 1972 but not fully functional until 1976.
The MPP, DAP, the Connection Machines CM-1 and CM-2, and MasPar’s MP-1 and MP-2 are examples of SIMD computers
The CRAY-1 and the Cyber-205 use pipelined arithmetic units to support vector operations and are sometimes called pipelined SIMDs
Today’s SIMDs
SIMD functionality is sometimes embedded in sequential machines.
Others are being built as part of hybrid architectures.
Some SIMD and SIMD-like features are included in some multi/many-core processing units
Some SIMD-like architectures have been built as special-purpose machines, although some of these could be classified as general purpose.
Advantages of SIMDs
Less hardware than MIMDs as they have only one control unit.
Control units are complex.
Less memory needed than MIMD
Only one copy of the instructions needs to be stored
Allows more data to be stored in memory.
Much less time required for communication between PEs and data movement.
Advantages of SIMDs (cont.)
Single instruction stream and synchronization of PEs make SIMD applications easier to program, understand, & debug.
Similar to sequential programming
Control flow operations and scalar operations can be executed on the control unit while PEs are executing other instructions.
Less complex hardware in SIMDs since no message decoder is needed in the PEs
MIMDs need a message decoder in each PE.
SIMD Shortcoming Claims
Claim 1: SIMDs have a data-parallel orientation, but not all problems are data-parallel
Claim 2: Speed drops for conditionally executed branches
Claim 3: Don’t adapt to multiple users well.
Claim 4: Do not scale down well to “starter” systems that are affordable.
Claim 5: Requires customized VLSI for processors and expense of control units in