
CSC 334 – Parallel and Distributed Computing

Instructor: Ms. Muntha Amjad


Lecture# 03: Parallel Programming Models

Parallel Programming Models
• There are several parallel programming models in common use:
• Shared Memory (without threads)
• Threads
• Distributed Memory / Message Passing
• Data Parallel
• Hybrid
• Single Program Multiple Data (SPMD)
• Multiple Program Multiple Data (MPMD)

• Parallel programming models exist as an abstraction above hardware and memory architectures.
Overview
• Although it might not seem apparent, these models are NOT specific to a
particular type of machine or memory architecture. In fact, any of these models
can (theoretically) be implemented on any underlying hardware.

• Shared memory model on a distributed memory machine: Kendall Square Research (KSR) ALLCACHE approach. Machine memory was physically distributed across networked machines but appeared to the user as a single shared memory global address space. Generically, this approach is referred to as "virtual shared memory".

• Distributed memory model on a shared memory machine: Message Passing Interface (MPI) on the SGI Origin 2000, which employed the CC-NUMA (Cache-Coherent Non-Uniform Memory Access) type of shared memory architecture, where every task has direct access to global memory. However, the ability to send and receive messages using MPI, as is commonly done over a network of distributed memory machines, was implemented and commonly used.
Overview
• Which model to use is often a combination of what is available and personal
choice. There is no "best" model, although there certainly are better
implementations of some models over others.

• This lecture describes each of the models mentioned above and discusses some of their actual implementations.

Shared Memory Model (without threads)
• In this programming model, processes/tasks share a common address space,
which they read and write to asynchronously, meaning that processes do not
need to explicitly communicate but rather access shared data directly.

• Various mechanisms such as locks / semaphores are used to control access to the shared memory, resolve contentions, and prevent race conditions (where multiple processes modify data unpredictably) and deadlocks (situations where processes wait indefinitely for resources).

• This is perhaps the simplest parallel programming model.

• An advantage of this model from the programmer's point of view is that the notion of data "ownership" is lacking, so there is no need to specify explicitly the communication of data between tasks. All processes see and have equal access to shared memory. Program development can often be simplified.
Shared Memory Model (without threads)
• An important disadvantage in terms of performance is that it becomes more
difficult to understand and manage data locality:
• If a process works with data that is stored close to it (e.g., in its own cache or memory),
it does not need to access shared memory frequently.
• Accessing memory (especially shared memory) takes time and resources. If a process
can keep the data in its local cache or memory, it avoids costly accesses to the shared
memory.
• Modern processors use caches to store frequently used data. If multiple processes keep
accessing the same shared data, the cache keeps refreshing (updating), which slows
things down.
• In a shared memory system, processes communicate via a system bus. If many
processes frequently access the same data in shared memory, the bus gets congested,
leading to slowdowns.
• Unfortunately, controlling data locality is hard to understand and may be beyond
the control of the average user.
Shared Memory Model (without threads)

• Implementation:
• On shared memory platforms, native operating systems, compilers and/or
hardware provide support for shared memory programming. For example, the
POSIX standard provides an API for using shared memory, and UNIX
provides shared memory segments (shmget, shmat, shmctl, etc.).
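As a minimal sketch (not from the lecture), the following C program exercises the UNIX shared memory calls named above (shmget, shmat, shmctl) together with a POSIX named semaphore acting as the lock. The segment size, the semaphore name "/demo_lock", and the loop counts are illustrative choices.

/* Minimal sketch: two processes share one integer counter through a System V
   shared memory segment; a POSIX named semaphore serves as the lock that
   prevents a race condition. Error handling is omitted for brevity. */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

int main(void) {
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600); /* create segment */
    int *counter = (int *)shmat(shmid, NULL, 0);                    /* attach to our address space */
    *counter = 0;

    sem_t *lock = sem_open("/demo_lock", O_CREAT, 0600, 1);         /* lock, initial value 1 */

    if (fork() == 0) {                       /* child process: increment 1000 times */
        for (int i = 0; i < 1000; i++) {
            sem_wait(lock); (*counter)++; sem_post(lock);
        }
        shmdt(counter);
        return 0;
    }
    for (int i = 0; i < 1000; i++) {         /* parent process: increment 1000 times */
        sem_wait(lock); (*counter)++; sem_post(lock);
    }
    wait(NULL);
    printf("counter = %d\n", *counter);      /* both processes updated the same memory: 2000 */

    sem_close(lock); sem_unlink("/demo_lock");
    shmdt(counter);  shmctl(shmid, IPC_RMID, NULL);
    return 0;
}

Both processes attach the same segment and therefore see the same counter; without the semaphore the two increment loops would race on it.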

Threads Model
• In this shared memory programming model, a single “heavy weight”
process can have multiple “light weight” concurrent execution paths. For
example:
• The operating system schedules the main program a.out which is loaded
and acquires all necessary resources to run. This is the "heavy weight"
process.
• a.out performs some serial work and then creates a number of tasks
(threads) that can be scheduled and run by the operating system
concurrently.
• Each thread has local data, but also, shares the entire resources of
a.out. This saves the overhead associated with replicating a program's
resources for each thread ("light weight"). Each thread also benefits from a
global memory view because it shares the memory space of a.out.
• A thread's work may best be described as a subroutine within the main
program. Any thread can execute any subroutine at the same time as other
threads.
• Threads communicate with each other through global memory (updating
address locations). This requires synchronization constructs to ensure that
more than one thread is not updating the same global address at any time.
• Threads can come and go, but a.out remains present to provide the
necessary shared resources until the application has completed.
Threads Model Implementations

• In both of the implementations described below (Pthreads and OpenMP), the programmer is responsible for determining all parallelism.

• Threaded implementations are not new in computing. In the past, hardware vendors developed their own proprietary threading methods. These implementations were incompatible with each other, making it difficult to write portable, cross-platform multithreaded programs.

• Unrelated standardization efforts have resulted in two very different implementations of threads: POSIX Threads and OpenMP.
Threads Model Implementations
• POSIX Threads:
• Specified by the IEEE POSIX 1003.1c standard (1995); C language only.
• Part of Unix/Linux operating systems.
• Library based; requires parallel coding; commonly referred to as Pthreads.
• Very explicit parallelism; requires significant programmer attention to detail (see the sketch below).
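A minimal Pthreads sketch (not from the lecture): the "heavy weight" process creates several "light weight" threads, each running the same subroutine on its own chunk of a shared global array. The thread count, array size, and the work done are illustrative.

/* Compile with a Pthreads-enabled C compiler, e.g. linking the pthread library. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000

double data[N];                     /* shared by all threads (global memory view) */

void *work(void *arg) {
    long id = (long)arg;            /* each thread's id is its local data */
    long chunk = N / NTHREADS;
    for (long i = id * chunk; i < (id + 1) * chunk; i++)
        data[i] = i * 2.0;          /* disjoint indices, so no lock is needed here */
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, work, (void *)t);   /* spawn light weight threads */
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);                       /* wait for all threads to finish */
    printf("data[N-1] = %f\n", data[N - 1]);
    return 0;
}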

• OpenMP:
• Industry standard, jointly defined and endorsed by a group of major computer hardware and software vendors, organizations, and individuals.
• Compiler directive based.
• Portable / multi-platform, including Unix and Windows platforms.
• Available in C/C++ and Fortran implementations.
• Can be very easy and simple to use – provides for "incremental parallelism" (see the sketch below).
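For comparison, a minimal OpenMP sketch (not from the lecture) parallelizes the same kind of loop with a single compiler directive, which is what "incremental parallelism" refers to. The array size is illustrative; compile with OpenMP support enabled (for example, a flag such as -fopenmp on GCC-style compilers).

/* One directive turns a serial loop into a multithreaded one. */
#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1000;
    double a[1000];

    #pragma omp parallel for        /* the compiler creates and manages the threads */
    for (int i = 0; i < n; i++)
        a[i] = i * 2.0;

    printf("a[n-1] = %f, threads available = %d\n", a[n - 1], omp_get_max_threads());
    return 0;
}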

• Other common threaded implementations include Microsoft threads, Java and Python threads, and CUDA threads for GPUs.
Distributed Memory/ Message Passing Model
• This model demonstrates the following
characteristics:
• A set of tasks that use their own local memory
during computation. Multiple tasks can reside on
the same physical machine and/or across an
arbitrary number of machines.
• Tasks exchange data through communications
by sending and receiving messages.
• Data transfer usually requires cooperative
operations to be performed by each process. For
example, a send operation must have a
matching receive operation.
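A minimal MPI sketch (not from the lecture) of the cooperative send/receive pairing described above: task 0 sends an array and task 1 posts the matching receive. The message size and tag are illustrative.

/* Run with two tasks, e.g. mpirun -np 2 ./a.out (launcher syntax varies). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* which task am I? */

    double buf[100];
    if (rank == 0) {
        for (int i = 0; i < 100; i++) buf[i] = i;
        MPI_Send(buf, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);   /* send to task 1 */
    } else if (rank == 1) {
        MPI_Recv(buf, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                            /* the matching receive */
        printf("task 1 received, last value = %f\n", buf[99]);
    }

    MPI_Finalize();
    return 0;
}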

Message Passing Model Implementation: MPI
• From a programming perspective, message passing implementations commonly comprise a library of subroutines that are embedded in source code. The programmer is responsible for determining all parallelism.

• Historically, a variety of message passing libraries have been available since the 1980s. These implementations differed substantially from each other, making it difficult for programmers to develop portable applications.

• In 1992, the MPI Forum was formed with the primary goal of establishing a
standard interface for message passing implementations.

• Part 1 of the Message Passing Interface (MPI) was released in 1994. Part 2
(MPI-2) was released in 1996.
Message Passing Model Implementation: MPI
• MPI is the "de facto" industry standard for message passing in parallel
computing.
• It has replaced older message-passing implementations for production use.
• MPI is available on most parallel computing platforms, ensuring wide
applicability.
• Different MPI versions exist (MPI-1, MPI-2, MPI-3), but not all implementations
support every feature.
• For shared memory architectures, MPI implementations usually don't use a
network for task communications. Instead, they use shared memory (memory
copies) for performance reasons.

Message Passing Model Implementation: MPI
• A common hybrid approach combines MPI and OpenMP:
• Each MPI process runs on a separate node in a distributed system.
• Within each node, multiple CPU cores execute OpenMP threads.
• OpenMP is used for intra-node parallelism (within a node), while MPI is used for inter-
node communication (across different nodes via a network).
• This hybrid approach optimizes performance by reducing communication costs.

Data Parallel Model
• May also be referred to as the Partitioned Global Address Space (PGAS) model.

• The data parallel model demonstrates the following characteristics:


• Most of the parallel work focuses on performing operations on a data set. The data
set is typically organized into a common structure, such as an array or cube.
• Different tasks operate on separate portions of the same data structure. These tasks
do not overlap, meaning each task is responsible for a unique chunk of data.
• All tasks perform the same computation but on different portions of the data.
Example: If we need to add 4 to every element in an array, each task processes a different subset of the array (see the sketch after this list).
• On shared memory architectures, all tasks may have access to the data
structure through global memory.

• On distributed memory architectures, the data structure is split up and resides as "chunks" in the local memory of each task.
Data Parallel Model Implementations
• Currently, there are several parallel programming implementations in various stages of development, based on the Data Parallel / PGAS model.
• Coarray Fortran: A small set of extensions to Fortran 95 for SPMD parallel
programming. Compiler dependent.
• Unified Parallel C (UPC): An extension to the C programming language for SPMD
parallel programming. Compiler dependent.
• Global Arrays: Provides a shared memory style programming environment in the
context of distributed array data structures. Public domain library with C and
Fortran77 bindings.
• X10: A PGAS based parallel programming language being developed by IBM at the
Thomas J. Watson Research Center.
• Chapel: An open-source parallel programming language project being led by Cray.

Hybrid Model
• Hybrid models combine more than one of the previously described programming models.

• Currently, a common example of a hybrid model is the combination of MPI with OpenMP.
• Threads perform computationally intensive kernels using local, on-node data
• Communications between processes on different nodes occurs over the network using MPI

• This hybrid model lends itself well to the most popular (currently) hardware environment
of clustered multi/many-core machines.
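A minimal MPI + OpenMP hybrid sketch (not from the lecture): each MPI process computes a local sum with OpenMP threads on its own node, then MPI combines the per-process results across nodes. The loop bounds and the reduction are illustrative; build with an MPI compiler wrapper and OpenMP enabled (for example, something like mpicc -fopenmp).

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* On-node parallelism: OpenMP threads share the process's local data. */
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < 1000000; i++)
        local_sum += (double)i / size;

    /* Inter-node communication: combine the per-process results with MPI. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f (from %d processes)\n", global_sum, size);

    MPI_Finalize();
    return 0;
}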

Hybrid Model
• Another similar and increasingly popular example of a hybrid model is using MPI with
CPU-GPU (graphics processing unit) programming.
• MPI tasks run on CPUs using local memory and communicating with each other over a network.
• Computationally intensive kernels are off-loaded to GPUs on-node.
• Data exchange between node-local memory and GPUs uses CUDA (or something equivalent).

• Other hybrid models are common:
• MPI with Pthreads
• MPI with non-GPU accelerators
Single Program Multiple Data (SPMD)
• SPMD is actually a "high level" programming model that can be built upon any
combination of the previously mentioned parallel programming models.
• All tasks execute their copy of the same program simultaneously. All tasks may use different data.
• SPMD programs usually have the necessary logic programmed into them to allow
different tasks to branch or conditionally execute only those parts of the program they are
designed to execute. That is, tasks do not necessarily have to execute the entire
program - perhaps only a portion of it.
• The SPMD model, using message passing or hybrid programming, is probably the most
commonly used parallel programming model for multi-node clusters.
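A minimal SPMD sketch (not from the lecture): every task runs the same executable, but branches on its MPI rank so that different tasks execute only their part of the program on their own data. The branch contents are illustrative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Only task 0 executes this branch, e.g. coordination or I/O. */
        printf("task 0: reading input and distributing work\n");
    } else {
        /* All other tasks execute only the worker portion of the program. */
        printf("task %d: computing on my portion of the data\n", rank);
    }

    MPI_Finalize();
    return 0;
}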

Multiple Program Multiple Data (MPMD)
• Like SPMD, MPMD is actually a "high level" programming model that can be built upon
any combination of the previously mentioned parallel programming models.

• MPMD applications typically have multiple executable object files (programs). While the
application is being run in parallel, each task can be executing the same or different
program as other tasks.

• All tasks may use different data.
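As an illustration (not from the lecture), most MPI launchers can start an MPMD job from a single command by listing several executables; the colon-separated form below follows the mpiexec style described in the MPI standard. The program names ./manager and ./worker and the process counts are hypothetical, and the exact flags vary by MPI implementation.

mpirun -np 1 ./manager : -np 4 ./worker

Here one task runs the manager program while four tasks run the worker program, and all five tasks belong to the same parallel job.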

