Parallel Programming Models
• There are several parallel programming models in common use:
• Shared Memory (without threads)
• Threads
• Distributed Memory / Message Passing
• Data Parallel
• Hybrid
• Single Program Multiple Data (SPMD)
• Multiple Program Multiple Data (MPMD)
Overview
• Although it might not seem apparent, these models are NOT specific to a
particular type of machine or memory architecture. In fact, any of these models
can (theoretically) be implemented on any underlying hardware.
• This lecture describes each of the models mentioned above and discusses some
of their actual implementations.
Shared Memory Model (without threads)
• In this programming model, processes/tasks share a common address space,
which they read and write to asynchronously, meaning that processes do not
need to explicitly communicate but rather access shared data directly.
• An advantage of this model from the programmer's point of view is that the
notion of data "ownership" is lacking, so there is no need to specify explicitly the
communication of data between tasks. All processes see and have equal access
to shared memory. Program development can often be simplified.
Shared Memory Model (without threads)
• An important disadvantage in terms of performance is that it becomes more
difficult to understand and manage data locality:
• If a process works with data that is stored close to it (e.g., in its own cache or memory),
it does not need to access shared memory frequently.
• Accessing memory (especially shared memory) takes time and resources. If a process
can keep the data in its local cache or memory, it avoids costly accesses to the shared
memory.
• Modern processors use caches to store frequently used data. If multiple processes keep
accessing the same shared data, the cache keeps refreshing (updating), which slows
things down.
• In a shared memory system, processes communicate via a system bus. If many
processes frequently access the same data in shared memory, the bus gets congested,
leading to slowdowns.
• Unfortunately, data locality can be hard to understand, and controlling it may be
beyond the control of the average user.
Shared Memory Model (without threads)
• Implementation:
• On shared memory platforms, native operating systems, compilers and/or
hardware provide support for shared memory programming. For example, the
POSIX standard provides an API for using shared memory, and UNIX
provides shared memory segments (shmget, shmat, shmctl, etc.).
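• A minimal sketch (not from the slides), assuming a POSIX-like UNIX system: a parent
and a forked child attach the same shared memory segment and exchange data through it
directly, with no explicit message passing.

/* Parent and child share one integer via a UNIX shared memory segment. */
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* Create a private segment large enough for one int. */
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (shmid < 0) { perror("shmget"); return 1; }

    /* Attach the segment into this process's address space. */
    int *shared = (int *)shmat(shmid, NULL, 0);
    if (shared == (void *)-1) { perror("shmat"); return 1; }
    *shared = 0;

    if (fork() == 0) {            /* child: writes to the shared data       */
        *shared = 42;
        shmdt(shared);
        return 0;
    }
    wait(NULL);                   /* parent: waits, then reads the update   */
    printf("value written by child: %d\n", *shared);

    shmdt(shared);                /* detach and remove the segment          */
    shmctl(shmid, IPC_RMID, NULL);
    return 0;
}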
Threads Model
• In this shared memory programming model, a single “heavy weight”
process can have multiple “light weight” concurrent execution paths. For
example:
• The operating system schedules the main program a.out which is loaded
and acquires all necessary resources to run. This is the "heavy weight"
process.
• a.out performs some serial work and then creates a number of tasks
(threads) that can be scheduled and run by the operating system
concurrently.
• Each thread has local data, but also, shares the entire resources of
a.out. This saves the overhead associated with replicating a program's
resources for each thread ("light weight"). Each thread also benefits from a
global memory view because it shares the memory space of a.out.
• A thread's work may best be described as a subroutine within the main
program. Any thread can execute any subroutine at the same time as other
threads.
• Threads communicate with each other through global memory (updating
address locations). This requires synchronization constructs to ensure that
no two threads update the same global address at the same time.
• Threads can come and go, but a.out remains present to provide the
necessary shared resources until the application has completed.
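• A minimal sketch (not from the slides), using POSIX threads: several "light weight"
threads share the global memory of one process and use a mutex so that no two threads
update the same global address at the same time.

/* Compile with: cc -pthread file.c */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static long counter = 0;                        /* global data shared by all threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *work(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);              /* synchronization construct */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, work, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    printf("counter = %ld\n", counter);         /* expect 400000 */
    return 0;
}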
Threads Model Implementations
• OpenMP:
• Industry standard, jointly defined and endorsed by a group of major computer
hardware and software vendors, organizations and individuals.
• Compiler directive based
• Portable / multi-platform, including Unix and Windows platforms
• Available in C/C++ and Fortran implementations
• Can be very easy and simple to use – provides for "incremental parallelism"
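• A minimal sketch of the directive-based style (not from the slides): a single
#pragma turns a serial loop into a parallel one, which is what "incremental
parallelism" means in practice.

/* Compile with an OpenMP-capable compiler, e.g. cc -fopenmp */
#include <stdio.h>
#include <omp.h>

int main(void) {
    double a[8];

    /* The directive splits the iterations among threads; a[] is shared
       and the loop index i is private to each thread. */
    #pragma omp parallel for
    for (int i = 0; i < 8; i++) {
        a[i] = 2.0 * i;
    }

    printf("a[7] = %.1f, max threads = %d\n", a[7], omp_get_max_threads());
    return 0;
}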
Message Passing Model Implementation: MPI
• From a programming perspective, message passing implementations commonly
comprise a library of subroutines that are embedded in source code. The
programmer is responsible for determining all parallelism.
• Historically, a variety of message passing libraries have been available since the
1980s. These implementations differed substantially from each other making it
difficult for programmers to develop portable applications.
• In 1992, the MPI Forum was formed with the primary goal of establishing a
standard interface for message passing implementations.
• Part 1 of the Message Passing Interface (MPI) was released in 1994. Part 2
(MPI-2) was released in 1996.
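• A minimal sketch of the library style (not from the slides): the MPI calls are
embedded directly in the source, and the programmer decides explicitly what each
rank sends and receives.

/* Compile with mpicc, run with e.g. mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 123;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);      /* to rank 1   */
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                             /* from rank 0 */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}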
Message Passing Model Implementation: MPI
• MPI is the "de facto" industry standard for message passing in parallel
computing.
• It has replaced older message-passing implementations for production use.
• MPI is available on most parallel computing platforms, ensuring wide
applicability.
• Different MPI versions exist (MPI-1, MPI-2, MPI-3), but not all implementations
support every feature.
• For shared memory architectures, MPI implementations usually don't use a
network for task communications. Instead, they use shared memory (memory
copies) for performance reasons.
Message Passing Model Implementation: MPI
• The diagram at the bottom illustrates a hybrid approach combining MPI and
OpenMP:
• Each MPI process runs on a separate node in a distributed system.
• Within each node, multiple CPU cores execute OpenMP threads.
• OpenMP is used for intra-node parallelism (within a node), while MPI is used for inter-
node communication (across different nodes via a network).
• This hybrid approach optimizes performance by reducing communication costs.
Data Parallel Model
• In this model, a set of tasks works collectively on a common data structure (for
example, an array), with each task working on a different partition of it.
• The model may also be referred to as the Partitioned Global Address Space (PGAS) model.
Hybrid Model
• Hybrid models combine more than one of the previously described programming models.
• Currently, a common example of a hybrid model is the combination of MPI with OpenMP.
• Threads perform computationally intensive kernels using local, on-node data
• Communications between processes on different nodes occurs over the network using MPI
• This hybrid model lends itself well to the most popular (currently) hardware environment
of clustered multi/many-core machines.
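• A minimal sketch of the MPI + OpenMP combination (not from the slides): each MPI
process computes a partial sum with its on-node OpenMP threads, and the partial
results are then combined across processes with MPI.

/* Compile with e.g. mpicc -fopenmp hybrid.c */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* On-node, computationally intensive part: OpenMP threads over local data. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0;

    /* Inter-node part: combine the per-process results over the network. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %.0f (up to %d threads per process)\n",
               total, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}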
Hybrid Model
• Another similar and increasingly popular example of a hybrid model is using MPI with
CPU-GPU (graphics processing unit) programming.
• MPI tasks run on CPUs using local memory and communicating with each other over a network.
• Computationally intensive kernels are off-loaded to GPUs on-node.
• Data exchange between node-local memory and GPUs uses CUDA (or something equivalent).
Multiple Program Multiple Data (MPMD)
• Like SPMD, MPMD is actually a "high level" programming model that can be built upon
any combination of the previously mentioned parallel programming models.
• MPMD applications typically have multiple executable object files (programs). While the
application is being run in parallel, each task can be executing the same or different
program as other tasks.
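• For example (assuming a launcher that supports mpiexec's colon-separated syntax,
and using hypothetical executable names), an MPMD job can start different programs
that all join the same parallel run:

  mpiexec -n 1 ./manager : -n 4 ./worker

Here ./manager and ./worker are two different executables: the single manager task
coordinates while the four worker tasks execute a different program.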