Parallel Programming
Parallel Computers
Parallel Programming - Lecture 1
The Demand for Computational Speed
Continual demand for greater computational
speed from a computer system than is currently
possible
Areas requiring great computational speed include
numerical modeling and simulation of
scientific and engineering problems.
Computations must be completed within a
"reasonable" time period.
Grand Challenge Problems
One that cannot be solved in a reasonable
amount of time with today’s computers.
Obviously, an execution time of 10 years is always
unreasonable.
Examples
Modeling large DNA structures.
Global weather forecasting.
Modeling motion of astronomical bodies.
Global Weather Forecasting Example
Atmosphere modeled by dividing it into
3-dimensional cells.
Calculations of each cell repeated many times to
model passage of time.
Global Weather Forecasting Example
Suppose whole global atmosphere divided into
cells of size 1 mile × 1 mile × 1 mile to a height of
10 miles (10 cells high) - about 5 × 10^8 cells.
Suppose each calculation requires 200 floating
point operations. In one time step, 10^11 floating
point operations necessary.
Global Weather Forecasting Example
To forecast the weather over 7 days using
1-minute intervals, a computer operating at
1 Gflops (10^9 floating point operations/s) takes
10^6 seconds or over 10 days.
To perform calculation in 5 minutes requires
computer operating at 3.4 Tflops (3.4 × 10^12
floating point operations/sec).
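As a worked check of these figures (the step count 7 × 24 × 60 ≈ 10^4 is implied but not stated on the slide):

\[
\underbrace{7 \times 24 \times 60}_{\approx 10^{4}\ \text{steps}} \times 10^{11}\ \tfrac{\text{ops}}{\text{step}} \approx 10^{15}\ \text{ops},
\qquad
\frac{10^{15}\ \text{ops}}{10^{9}\ \text{ops/s}} = 10^{6}\ \text{s} \approx 11.6\ \text{days}
\]

\[
\frac{10^{15}\ \text{ops}}{5 \times 60\ \text{s}} \approx 3.4 \times 10^{12}\ \text{ops/s} = 3.4\ \text{Tflops}
\]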
Modeling Motion of Astronomical Bodies
Each body attracted to each other body by
gravitational forces. Movement of each body
predicted by calculating total force on each body.
With N bodies, N - 1 forces to calculate for each
body, or approx. N^2 calculations. (N log2 N for an
efficient approximate algorithm.)
After determining new positions of bodies,
calculations repeated.
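A minimal C sketch of one such iteration (the names NBODIES, G, DT, and Body are illustrative, not from the lecture): the nested loop makes the approx. N^2 force calculations explicit, and positions are updated only after all forces are known, matching the step-then-repeat structure above.

#include <math.h>

#define NBODIES 1024
#define G  6.674e-11   /* gravitational constant */
#define DT 1.0         /* time step, seconds (assumed) */

typedef struct { double x, y, z, vx, vy, vz, mass; } Body;

void step(Body b[NBODIES])
{
    for (int i = 0; i < NBODIES; i++) {
        double fx = 0, fy = 0, fz = 0;
        /* Sum the N - 1 pairwise gravitational forces on body i. */
        for (int j = 0; j < NBODIES; j++) {
            if (j == i) continue;
            double dx = b[j].x - b[i].x;
            double dy = b[j].y - b[i].y;
            double dz = b[j].z - b[i].z;
            double r  = sqrt(dx*dx + dy*dy + dz*dz);
            double f  = G * b[i].mass * b[j].mass / (r * r);
            fx += f * dx / r;   /* resolve the force along each axis */
            fy += f * dy / r;
            fz += f * dz / r;
        }
        /* Update velocity from the total force on body i (F = ma). */
        b[i].vx += fx / b[i].mass * DT;
        b[i].vy += fy / b[i].mass * DT;
        b[i].vz += fz / b[i].mass * DT;
    }
    /* Only now move the bodies; then the whole calculation repeats. */
    for (int i = 0; i < NBODIES; i++) {
        b[i].x += b[i].vx * DT;
        b[i].y += b[i].vy * DT;
        b[i].z += b[i].vz * DT;
    }
}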
Modeling Motion of Astronomical Bodies
A galaxy might have, say, 10^11 stars.
Even if each calculation done in 1 ms, it takes
10^9 years for one iteration using N^2 algorithm and
almost a year for one iteration using
an efficient N log2 N approximate algorithm.
Parallel Computing
Using more than one computer, or a computer
with more than one processor, to solve a
problem.
Motives
Usually faster computation - the very simple idea
that n computers operating simultaneously can
achieve the result n times faster.
Other motives include: fault tolerance, larger
amount of memory available, ...
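As a minimal formalization of the "n times faster" idea (the standard speedup definition, not stated on this slide): with t_s the execution time on one processor and t_p the time on n processors,

\[
S(n) = \frac{t_s}{t_p},
\]

and S(n) = n (linear speedup) is the best the simple argument above can promise.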
Flynn’s Classifications
Flynn (1966) created a classification for computers
based upon instruction streams and data streams:
Single instruction stream - single data stream
(SISD) computer
Single processor computer - single stream of
instructions generated from program. Instructions
operate upon a single stream of data items.
Multiple Instruction Stream - Multiple
Data Stream (MIMD) Computer
General-purpose multiprocessor system - each
processor has a separate program and one
instruction stream is generated from each program
for each processor. Each instruction operates upon
different data.
Both the shared memory and the message-passing
multiprocessors so far described are in the MIMD
classification.
Single Instruction Stream - Multiple
Data Stream (SIMD) Computer
A specially designed computer - a single
instruction stream from a single program, but
multiple data streams exist. Instructions from
program broadcast to more than one processor.
Each processor executes same instruction in
synchronism, but using different data.
Developed because a number of important
applications mostly operate upon arrays of
data.
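A small illustration in C, using x86 SSE intrinsics as a stand-in for a classic SIMD machine (an assumption for illustration - the slide describes dedicated SIMD computers, not CPU vector units): a single instruction performs the same addition on four data items at once.

#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics */

int main(void)
{
    float a[4] = { 1,  2,  3,  4};
    float b[4] = {10, 20, 30, 40};
    float c[4];

    __m128 va = _mm_loadu_ps(a);      /* load four data items */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);   /* one instruction, four additions */
    _mm_storeu_ps(c, vc);

    printf("%.0f %.0f %.0f %.0f\n", c[0], c[1], c[2], c[3]);
    return 0;
}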
Multiple Program Multiple Data (MPMD)
Structure
Within the MIMD classification, each processor will
have its own program to execute:
     Program             Program
        |                   |
   Instructions        Instructions
        |                   |
     Processor           Processor
        |                   |
       Data                Data
Single Program Multiple Data (SPMD)
Structure
Single source program written and each processor
executes its personal copy of this program, although
independently and not in synchronism.
Source program can be constructed so that parts of
the program are executed by certain computers and
not others depending upon the identity of the
computer.
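A minimal SPMD sketch in C using MPI (MPI is introduced later in the lecture; it is used here only as a familiar instance of SPMD): every processor runs this same program and selects its part of the work from its identity (its rank).

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I? */

    if (rank == 0) {
        /* part of the program executed only by computer 0 */
        printf("Process %d: coordinating\n", rank);
    } else {
        /* part executed by all the other computers */
        printf("Process %d: computing a sub-problem\n", rank);
    }

    MPI_Finalize();
    return 0;
}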
Interconnected Computers as a
Computing Platform
A network of computers became a very attractive
alternative to expensive supercomputers and parallel
computer systems for high-performance computing in
the early 1990s.
Several early projects:
Berkeley NOW (network of workstations) project.
NASA Beowulf project.
Beowulf Clusters*
A group of interconnected computers achieving
high performance with low cost.
Typically using commodity interconnects - high-speed
Ethernet - and the Linux OS.
* Beowulf comes from the name given to the NASA
Goddard Space Flight Center cluster project.
Key advantages
Very high performance workstations and PCs
readily available at low cost.
The latest processors can easily be incorporated
into the system as they become available.
Existing software can be used or modified.
Software Tools for Clusters
Based upon Message Passing Parallel
Programming:
Parallel Virtual Machine (PVM) - developed in the
late 1980s. Became very popular.
Message-Passing Interface (MPI) - standard
defined in the 1990s.
Both provide a set of user-level libraries for
message passing, used with regular
programming languages (C, C++, ...).
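A minimal sketch of MPI's message-passing style in C (the process layout and message contents are illustrative): rank 0 sends one integer to rank 1. Compile with an MPI C compiler (e.g. mpicc) and run with at least two processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* send to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                          /* receive from rank 0 */
        printf("Process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}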