Parallel Computing
Parallel Computer Memory Architectures
- Shared Memory
• Shared memory machines can be divided into two groups based upon
memory access time:
UMA: Uniform Memory Access
NUMA: Non-Uniform Memory Access
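Although UMA and NUMA differ in hardware memory access times, both present the programmer with a single shared address space. Below is a minimal sketch of that model, assuming OpenMP in C (the slides do not name a programming model): every thread reads and writes the same array.

/* Minimal shared-memory sketch (OpenMP assumed): all threads work on one
 * array in a single address space. Whether access times are uniform (UMA)
 * or not (NUMA) is a property of the hardware, not of this code.
 * Compile with: gcc -fopenmp shared.c */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];              /* one array, shared by every thread */

    #pragma omp parallel for         /* threads split the iterations      */
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;

    printf("a[N-1] = %f, max threads = %d\n", a[N - 1], omp_get_max_threads());
    return 0;
}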
Designing Parallel Programs
Two ways to decompose the problem:
• Domain decomposition
• Functional decomposition
Domain Decomposition
The data associated with the problem is decomposed; each parallel task then
works on a portion of the data.
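A minimal sketch of domain decomposition, assuming MPI in C and a problem size that divides evenly among tasks: the global array lives on task 0, each task receives one chunk, computes a partial sum on it, and the partial results are combined.

/* Domain decomposition sketch (MPI assumed): the global array is split into
 * equal chunks and each task works only on its own chunk.
 * Build: mpicc decomp.c   Run: mpirun -np 4 ./a.out */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 1024                           /* global problem size (assumption) */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;                /* size of each task's subdomain */
    double *global = NULL;
    double *local  = malloc(chunk * sizeof(double));

    if (rank == 0) {                     /* task 0 holds the full data set */
        global = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) global[i] = (double)i;
    }

    /* hand one chunk of the domain to every task */
    MPI_Scatter(global, chunk, MPI_DOUBLE,
                local,  chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double local_sum = 0.0;              /* each task computes on its chunk */
    for (int i = 0; i < chunk; i++) local_sum += local[i];

    double total = 0.0;
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", total);

    free(local); free(global);
    MPI_Finalize();
    return 0;
}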
Communications
• Efficiency of communications is an important design consideration.
Designing Parallel Programs
Granularity
• In parallel computing, granularity is a qualitative measure of the ratio of
computation to communication (computation / communication).
• Periods of computation are typically separated from periods of
communication by synchronization events.
• Fine-grain Parallelism
• Coarse-grain Parallelism
Designing Parallel Programs
• Fine-grain Parallelism
• Relatively small amounts of computational work
are done between communication events
• Low computation to communication ratio
• Facilitates load balancing
• Implies high communication overhead and less
opportunity for performance enhancement
• If granularity is too fine it is possible that the
overhead required for communications and
synchronization between tasks takes longer than
the computation.
• Coarse-grain Parallelism
• Relatively large amounts of computational work
are done between communication/synchronization
events
• High computation to communication ratio
• Implies more opportunity for performance increase
• Harder to load balance efficiently
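The contrast can be made concrete with a small sketch, again assuming MPI in C and a toy sum chosen only for illustration: the fine-grain version performs a global reduction after every element, while the coarse-grain version computes everything locally and communicates once.

/* Fine-grain vs. coarse-grain versions of the same sum (MPI assumed).
 * Both produce the same total; the fine-grain loop pays the communication
 * and synchronization cost on every iteration. */
#include <stdio.h>
#include <mpi.h>

#define LOCAL_N 10000                    /* iterations per task (assumption) */

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Fine-grain: tiny computation, then a global reduction, repeated. */
    double fine = 0.0;
    for (int i = 0; i < LOCAL_N; i++) {
        double x = (double)(rank + i), r;
        MPI_Allreduce(&x, &r, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        fine += r;
    }

    /* Coarse-grain: all computation first, one reduction at the end. */
    double local = 0.0, coarse = 0.0;
    for (int i = 0; i < LOCAL_N; i++)
        local += (double)(rank + i);
    MPI_Allreduce(&local, &coarse, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("fine-grain total = %f, coarse-grain total = %f\n", fine, coarse);
    MPI_Finalize();
    return 0;
}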
Designing Parallel Programs
I/O
• Rule #1: Reduce overall I/O as much as possible
• If you have access to a parallel file system, use it.
• Writing large chunks of data rather than small chunks is usually
significantly more efficient.
• Fewer, larger files perform better than many small files.
• Confine I/O to specific serial portions of the job, and then use parallel
communications to distribute data to parallel tasks. For example, Task 1
could read an input file and then communicate the required data to other
tasks (sketched below). Likewise, Task 1 could perform the write operation
after receiving the required data from all other tasks.
• Aggregate I/O operations across tasks - rather than having many tasks
perform I/O, have a subset of tasks perform it.
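A sketch of the "Task 1 reads, then distributes" pattern from the list above, assuming MPI in C (rank 0 plays the role of the slides' Task 1); the file name input.dat and its layout (an element count followed by doubles) are hypothetical.

/* Serial read on one task, parallel distribution to the rest (MPI assumed).
 * Only rank 0 touches the file; every other task receives the data over
 * communication. File name and format are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, n = 0;
    double *data = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                          /* only task 0 performs I/O */
        FILE *fp = fopen("input.dat", "rb");
        if (!fp) MPI_Abort(MPI_COMM_WORLD, 1);
        fread(&n, sizeof(int), 1, fp);        /* element count            */
        data = malloc(n * sizeof(double));
        fread(data, sizeof(double), n, fp);   /* one large read, not many small ones */
        fclose(fp);
    }

    /* distribute with parallel communication instead of parallel reads */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank != 0) data = malloc(n * sizeof(double));
    MPI_Bcast(data, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* ... each task now works on its share of data ... */

    free(data);
    MPI_Finalize();
    return 0;
}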
Designing Parallel Programs
Debugging
• TotalView from RogueWave Software
• DDT from Allinea
• Inspector from Intel