Parallel Programming Models
• There are several parallel programming models in common use:
• Shared Memory (without threads)
• Threads
• Distributed Memory / Message Passing
• Data Parallel
• Hybrid
• Single Program Multiple Data (SPMD)
• Multiple Program Multiple Data (MPMD)
Overview
• Although it might not seem apparent, these models are NOT specific to a
particular type of machine or memory architecture. In fact, any of these models
can (theoretically) be implemented on any underlying hardware.
• This lecture describes each of the models mentioned above and discusses some
of their actual implementations.
Shared Memory Model (without threads)
• In this programming model, processes/tasks share a common address space,
which they read and write to asynchronously, meaning that processes do not
need to explicitly communicate but rather access shared data directly.
• An advantage of this model from the programmer's point of view is that the
notion of data "ownership" is lacking, so there is no need to specify explicitly the
communication of data between tasks. All processes see and have equal access
to shared memory. Program development can often be simplified.
Shared Memory Model (without threads)
• An important disadvantage in terms of performance is that it becomes more
difficult to understand and manage data locality:
• If a process works with data that is stored close to it (e.g., in its own cache or memory),
it does not need to access shared memory frequently.
• Accessing memory (especially shared memory) takes time and resources. If a process
can keep the data in its local cache or memory, it avoids costly accesses to the shared
memory.
• Modern processors use caches to store frequently used data. If multiple processes keep
accessing the same shared data, the cache keeps refreshing (updating), which slows
things down.
• In a shared memory system, processes communicate via a system bus. If many
processes frequently access the same data in shared memory, the bus gets congested,
leading to slowdowns.
• Unfortunately, data locality can be hard to understand, and controlling it may be
beyond the control of the average user.
Shared Memory Model (without threads)
• Implementation:
• On shared memory platforms, native operating systems, compilers and/or
hardware provide support for shared memory programming. For example, the
POSIX standard provides an API for using shared memory, and UNIX
provides shared memory segments (shmget, shmat, shmctl, etc.).
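• A minimal sketch (not from the slides), assuming a POSIX-like UNIX system: a parent
and a forked child attach the same shared memory segment and exchange data through it
directly, with no explicit message passing.

/* Parent and child share one integer via a UNIX shared memory segment. */
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* Create a private segment large enough for one int. */
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (shmid < 0) { perror("shmget"); return 1; }

    /* Attach the segment into this process's address space. */
    int *shared = (int *)shmat(shmid, NULL, 0);
    if (shared == (void *)-1) { perror("shmat"); return 1; }
    *shared = 0;

    if (fork() == 0) {            /* child: writes to the shared data       */
        *shared = 42;
        shmdt(shared);
        return 0;
    }
    wait(NULL);                   /* parent: waits, then reads the update   */
    printf("value written by child: %d\n", *shared);

    shmdt(shared);                /* detach and remove the segment          */
    shmctl(shmid, IPC_RMID, NULL);
    return 0;
}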
Threads Model
• In this shared memory programming model, a single “heavy weight”
process can have multiple “light weight” concurrent execution paths. For
example:
• The operating system schedules the main program a.out which is loaded
and acquires all necessary resources to run. This is the "heavy weight"
process.
• a.out performs some serial work and then creates a number of tasks
(threads) that can be scheduled and run by the operating system
concurrently.
• Each thread has local data, but also, shares the entire resources of
a.out. This saves the overhead associated with replicating a program's
resources for each thread ("light weight"). Each thread also benefits from a
global memory view because it shares the memory space of a.out.
• A thread's work may best be described as a subroutine within the main
program. Any thread can execute any subroutine at the same time as other
threads.
• Threads communicate with each other through global memory (updating
address locations). This requires synchronization constructs to ensure that
no two threads update the same global address at the same time.
• Threads can come and go, but a.out remains present to provide the
necessary shared resources until the application has completed.
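• A minimal sketch (not from the slides), using POSIX threads: several "light weight"
threads share the global memory of one process and use a mutex so that no two threads
update the same global address at the same time.

/* Compile with: cc -pthread file.c */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static long counter = 0;                        /* global data shared by all threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *work(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);              /* synchronization construct */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, work, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    printf("counter = %ld\n", counter);         /* expect 400000 */
    return 0;
}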
Threads Model Implementations
• OpenMP:
• Industry standard, jointly defined and endorsed by a group of major computer
hardware and software vendors, organizations and individuals.
• Compiler directive based
• Portable / multi-platform, including Unix and Windows platforms
• Available in C/C++ and Fortran implementations
• Can be very easy and simple to use – provides for "incremental parallelism"
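• A minimal sketch of the directive-based style (not from the slides): a single
#pragma turns a serial loop into a parallel one, which is what "incremental
parallelism" means in practice.

/* Compile with an OpenMP-capable compiler, e.g. cc -fopenmp */
#include <stdio.h>
#include <omp.h>

int main(void) {
    double a[8];

    /* The directive splits the iterations among threads; a[] is shared
       and the loop index i is private to each thread. */
    #pragma omp parallel for
    for (int i = 0; i < 8; i++) {
        a[i] = 2.0 * i;
    }

    printf("a[7] = %.1f, max threads = %d\n", a[7], omp_get_max_threads());
    return 0;
}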
Message Passing Model Implementation: MPI
• From a programming perspective, message passing implementations commonly
comprise a library of subroutines that are embedded in source code. The
programmer is responsible for determining all parallelism.
• Historically, a variety of message passing libraries have been available since the
1980s. These implementations differed substantially from each other making it
difficult for programmers to develop portable applications.
• In 1992, the MPI Forum was formed with the primary goal of establishing a
standard interface for message passing implementations.
• Part 1 of the Message Passing Interface (MPI) was released in 1994. Part 2
(MPI-2) was released in 1996.
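• A minimal sketch of the library style (not from the slides): the MPI calls are
embedded directly in the source, and the programmer decides explicitly what each
rank sends and receives.

/* Compile with mpicc, run with e.g. mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 123;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);      /* to rank 1   */
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                             /* from rank 0 */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}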
Message Passing Model Implementation: MPI
• MPI is the "de facto" industry standard for message passing in parallel
computing.
• It has replaced older message-passing implementations for production use.
• MPI is available on most parallel computing platforms, ensuring wide
applicability.
• Different MPI versions exist (MPI-1, MPI-2, MPI-3), but not all implementations
support every feature.
• For shared memory architectures, MPI implementations usually don't use a
network for task communications. Instead, they use shared memory (memory
copies) for performance reasons.
Message Passing Model Implementation: MPI
• The diagram at the bottom illustrates a hybrid approach combining MPI and
OpenMP:
• Each MPI process runs on a separate node in a distributed system.
• Within each node, multiple CPU cores execute OpenMP threads.
• OpenMP is used for intra-node parallelism (within a node), while MPI is used for inter-
node communication (across different nodes via a network).
• This hybrid approach optimizes performance by reducing communication costs.
Data Parallel Model
• In this model, a set of tasks works collectively on a common data structure (for
example, an array), with each task working on a different partition of it.
• The model may also be referred to as the Partitioned Global Address Space (PGAS) model.
Hybrid Model
• Hybrid models combine more than one of the previously described programming models.
• Currently, a common example of a hybrid model is the combination of MPI with OpenMP.
• Threads perform computationally intensive kernels using local, on-node data
• Communications between processes on different nodes occurs over the network using MPI
• This hybrid model lends itself well to the most popular (currently) hardware environment
of clustered multi/many-core machines.
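• A minimal sketch of the MPI + OpenMP combination (not from the slides): each MPI
process computes a partial sum with its on-node OpenMP threads, and the partial
results are then combined across processes with MPI.

/* Compile with e.g. mpicc -fopenmp hybrid.c */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* On-node, computationally intensive part: OpenMP threads over local data. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0;

    /* Inter-node part: combine the per-process results over the network. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %.0f (up to %d threads per process)\n",
               total, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}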
Hybrid Model
• Another similar and increasingly popular example of a hybrid model is using MPI with
CPU-GPU (graphics processing unit) programming.
• MPI tasks run on CPUs using local memory and communicating with each other over a network.
• Computationally intensive kernels are off-loaded to GPUs on-node.
• Data exchange between node-local memory and GPUs uses CUDA (or something equivalent).
Multiple Program Multiple Data (MPMD)
• Like SPMD, MPMD is actually a "high level" programming model that can be built upon
any combination of the previously mentioned parallel programming models.
• MPMD applications typically have multiple executable object files (programs). While the
application is being run in parallel, each task can be executing the same or different
program as other tasks.
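• For example (assuming a launcher that supports mpiexec's colon-separated syntax,
and using hypothetical executable names), an MPMD job can start different programs
that all join the same parallel run:

  mpiexec -n 1 ./manager : -n 4 ./worker

Here ./manager and ./worker are two different executables: the single manager task
coordinates while the four worker tasks execute a different program.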