High Performance Computing – Lecture 2: Parallel Programming with MPI
ADVANCED SCIENTIFIC COMPUTING
Prof. Dr.-Ing. Morris Riedel
Adjunct Associated Professor
School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
Research Group Leader, Juelich Supercomputing Centre, Forschungszentrum Juelich, Germany
Parallel Programming with MPI
September 9, 2019
Room V02‐156
Review of Lecture 1 – High Performance Computing (HPC)
HPC Basics & HPC Ecosystem Technologies
Multi-core processors with high single-thread performance, used in parallel computing
Many-core processors with moderate single-thread performance, used in parallel computing
Distributed memory architectures using the Message Passing Interface (MPI)
Not only used in physical modeling and simulation sciences today, but also for machine & deep learning and 'big data'
[1] Distributed & Cloud Computing Book; [2] Introduction to High Performance Computing for Scientists and Engineers; [3] J. Haut, G. Cavallaro and M. Riedel et al.; [4] F. Berman: Maximising the Potential of Research Data
Jötunn HPC Environment with Libraries & Modules
Thinking Parallel & Step‐wise Walkthrough for Parallel Programming
Basic Building Blocks of a Parallel Program
Code Compilation & Parallel Executions
Simple PingPong Application Example
Students understand…
Latest developments in parallel processing & high performance computing (HPC)
How to create and use high‐performance clusters
What are scalable networks & data‐intensive workloads
The importance of domain decomposition
Complex aspects of parallel programming
HPC environment tools that support programming or analysing application behaviour
Different abstractions of parallel computing on various levels
Foundations and approaches of scientific domain-specific applications
Students are able to …
Program and use HPC programming paradigms
Take advantage of innovative scientific computing simulations & technology
Work with technologies and tools to handle parallelism complexity
Message Passing Interface (MPI) Concepts
Lectures 12 – 15 will offer more insights into a wide variety of physics & engineering applications that take advantage of HPC with MPI
Parallel Programming with MPI – Data Science Applications for HPC
Machine Learning Algorithms
Example: Highly Parallel Density‐based spatial clustering of applications with noise (DBSCAN)
Selected Applications: Clustering different cortical layers in brain tissue & point cloud data analysis
Clustering
[11] M. Goetz and M. Riedel et al., Proceedings IEEE Supercomputing Conference, 2015
Lecture 8 will provide more details on MPI application examples with a particular focus on parallel and scalable machine learning
Example: Modular Supercomputing Architecture – MPI Usage in Cluster Module
The Cluster Module (CM) offers Cluster Nodes (CNs) with high single-thread performance and a universal InfiniBand interconnect (we focus in this lecture only on this module; the network interconnection is important)
Given the CM architecture setup, it works very well for applications that take advantage of MPI
Pro: Network communication is relatively hidden and supported
Contra: Programming with MPI still requires using 'parallelization methods'
Not easy: Write 'technical code' well integrated in 'problem-domain code'
Example: Race Car Simulation & Heat Dissipation in a Room (evolving over time t)
Apply a good parallelization method (e.g. domain decomposition)
Manually write good MPI code for the (technical) communication between processors (e.g. across 1024 cores)
Integrate the technical code well with the problem-domain code (e.g. computational fluid dynamics & airflow)
[10] Modified from Caterham F1 team; [2] Introduction to High Performance Computing for Scientists and Engineers
Lecture 3 will provide more details on MPI application examples with a particular focus on parallelization fundamentals
Distributed‐Memory Computers – Revisited (cf. Lecture 1)
Features
The Message Passing Interface (MPI) is the dominant programming model
Processors communicate via Network Interfaces (NI)
The NI mediates the connection to a communication network
This setup is rarely built in its pure form today, but it remains the dominant programming model view
[2] Introduction to High Performance Computing for Scientists and Engineers; [10] Modified from Caterham F1 team
Programming with Distributed Memory using MPI – Revisited (cf. Lecture 1)
[5] MPI Standard
Features
No remote memory access on distributed‐memory systems
Requires 'sending messages' back and forth between processes
Many free Message Passing Interface (MPI) libraries available
Programming is tedious & complicated, but it is the most flexible method
Lecture 4 will provide more details on advanced functions of the Message Passing Interface (MPI) standard and its use in applications
GNU OpenMPI Implementation
Message Passing Interface (MPI)
A standardized and portable message‐passing standard
Designed to support different HPC architectures
A wide variety of MPI implementations exist
Standard defines the syntax and semantics of a core of library routines used in C, C++ & Fortran [5] The MPI Standard
OpenMPI Implementation
Open source license based on the BSD license
Full MPI (version 3) standards conformance [6] OpenMPI Web page
Developed & maintained by a consortium of
academic, research, & industry partners
Typically available as modules on HPC systems and used with mpicc compiler
Often built with the GNU compiler set and/or Intel compilers
Lecture 2 will provide a full introduction and many more examples of the Message Passing Interface (MPI) for parallel programming
What is MPI from a Technical Perspective?
‘Communication library’ abstracting from low‐level network view
Offers 500+ available functions to communicate between computing nodes
Practice reveals: parallel applications often require just ~12 (!) functions
Includes routines for efficient ‘parallel I/O’ (using underlying hardware)
Supports ‘different ways of communication’
'Point-to-point communication' between two computing nodes (process to process)
Collective functions involve 'N computing nodes in useful communication'
(Computing nodes are independent computing processors, which may also have N cores each, and are all part of one big parallel computer, e.g. a hybrid architecture, cf. Lecture 1)
Deployment on Supercomputers supporting Applications Portability
Installed on (almost) all parallel computers
Different languages: C, Fortran, Python, R, etc.
Careful: Different versions might be installed
Key reasons for requiring a standard programming library
Technical advancement in supercomputers is extremely fast
Parallel computing experts switch organizations and face another system
Applications using proprietary libraries were not portable
Whole applications had to be created from scratch or needed time-consuming code updates
MPI changed this & is the dominant parallel programming model
Works for different MPI standard implementations
E.g., MPICH
E.g., Parastation MPI
E.g., OpenMPI
Etc.
MPI is an open standard that significantly supports the portability of parallel applications across a wide variety of different HPC systems and supercomputer architectures
(Figure: porting a parallel MPI application from the MPI library on HPC Machine A to the MPI library on HPC Machine B)
TCP/IP and socket programming libraries are plentifully available – do we need a dedicated library for communication & network protocols over the Internet?
Goal: simplify parallel programming
Focus on scientific and engineering applications with mathematical calculations
Enable parallel and scalable machine and deep learning algorithms
Selected reasons
Designed for performance within large parallel computers (e.g. no security)
Supports various interconnects between 'computing nodes' (hardware)
Offers various benefits like 'reliable messages' or 'in-order arrivals'
MPI is not designed to handle arbitrary communication in computer networks and is thus very special
MPI is not good for clients that constantly establish/close connections again and again (this would have very slow performance in MPI)
MPI is not good for internet chat clients or Web service servers in the Internet (e.g. no security beyond firewalls, no message encryption directly available, etc.)
(Figure: an HPC machine with compute nodes, each with processors P and memories M; via MPI point-to-point communications, the value DATA: 17 in the memory of one processor is sent to another processor, where it arrives as NEW: 17)
Each processor has its own data in its memory that cannot be seen/accessed by other processors
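As a minimal sketch (not the exact code from the slides), a point-to-point exchange of the figure's value 17 between two processes could be written with the standard MPI_Send and MPI_Recv routines as follows:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, data;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        data = 17;                                        /* DATA: 17 in the memory of rank 0 */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received NEW: %d\n", data);        /* arrives as NEW: 17 on rank 1 */
    }

    MPI_Finalize();
    return 0;
}

MPI_Send and MPI_Recv are matched by communicator, source/destination rank, and message tag (here 0); only the memory of the receiving process is updated.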
(Figure: Broadcast example – the value DATA: 17 at one processor is distributed and arrives as NEW: 17 at the other processors)
Broadcast distributes the same data to many or even all other processors
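A minimal sketch using the standard MPI_Bcast routine (the value matches the figure; the code itself is illustrative, not taken from the course material):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank;
    int value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 17;                                    /* DATA: 17 exists only on the root at first */

    /* root rank 0 distributes the same value to all processes in MPI_COMM_WORLD */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d now has NEW: %d\n", rank, value);  /* every rank prints 17 */

    MPI_Finalize();
    return 0;
}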
(Figure: Scatter example – the values DATA: 10, 20, 30 at one processor are distributed and arrive as NEW: 10, NEW: 20, NEW: 30 at different processors)
Scatter distributes different data to many or even all other processors
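A minimal sketch using the standard MPI_Scatter routine; the values 10, 20, 30, ... are chosen here to echo the figure and are generated for however many processes are started:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, size, mine, i;
    int* senddata = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* root prepares one different value per process, e.g. 10, 20, 30, ... */
        senddata = malloc(size * sizeof(int));
        for (i = 0; i < size; i++)
            senddata[i] = 10 * (i + 1);
    }

    /* each process receives exactly one of the different values from root rank 0 */
    MPI_Scatter(senddata, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d received NEW: %d\n", rank, mine);

    if (rank == 0)
        free(senddata);

    MPI_Finalize();
    return 0;
}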
(Figure: Gather example – the values DATA: 17, 80, 06, 19 located at different processors are collected as NEW values at one specific processor)
Gather collects data from many or even all other processors to one specific processor
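A minimal sketch using the standard MPI_Gather routine (the local values are placeholders, not the figure's numbers):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, size, mine, i;
    int* recvdata = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    mine = (rank + 1) * 7;                         /* stand-in for the local DATA value of each process */

    if (rank == 0)
        recvdata = malloc(size * sizeof(int));     /* only the collecting root needs a receive buffer */

    /* all processes send their local value; root rank 0 collects them in rank order */
    MPI_Gather(&mine, 1, MPI_INT, recvdata, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (i = 0; i < size; i++)
            printf("NEW value from rank %d: %d\n", i, recvdata[i]);
        free(recvdata);
    }

    MPI_Finalize();
    return 0;
}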
(Figure: Reduce example – a global sum: the values DATA: 17, 80, 06, 19 located at different processors are combined into NEW: 122 at one processor)
Reduce combines collection with computation based on data from many or even all other processors
Usage of reduce includes finding a global minimum or maximum, or the sum or product of the different data located at different processors
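A minimal sketch of a global sum with the standard MPI_Reduce routine (the local values are placeholders, not the figure's numbers):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, local, global_sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local = rank + 1;     /* stand-in for the DATA value held by each process */

    /* combine the local values of all processes into a global sum at root rank 0 */
    MPI_Reduce(&local, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Global sum (NEW): %d\n", global_sum);

    MPI_Finalize();
    return 0;
}

Replacing MPI_SUM with MPI_MIN, MPI_MAX, or MPI_PROD yields the global minimum, maximum, or product mentioned above.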
Using MPI Ranks & Communicators
Answers the following question: How do I know where to send/receive to/from?
(numbers reflect the unique identity of a processor, named 'MPI rank')
Each MPI activity specifies the context in which a corresponding function is performed
MPI_COMM_WORLD (region/context of all processes)
Create (sub-)groups of the processes / virtual groups of processes
Perform communications only within these sub-groups easily with well-defined processes
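A minimal sketch of creating such sub-groups with the standard MPI_Comm_split routine; splitting the processes of MPI_COMM_WORLD into 'even' and 'odd' sub-communicators is chosen here purely for illustration:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int world_rank, sub_rank, sub_size;
    MPI_Comm sub_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* processes with the same 'color' (here: even or odd world rank) end up in the same sub-communicator */
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &sub_comm);

    MPI_Comm_rank(sub_comm, &sub_rank);
    MPI_Comm_size(sub_comm, &sub_size);

    printf("World rank %d is rank %d of %d in its sub-group\n",
           world_rank, sub_rank, sub_size);

    /* collectives such as MPI_Bcast or MPI_Reduce can now be performed within sub_comm only */

    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
    return 0;
}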
Lecture 4 on advanced MPI techniques will provide details about the often used MPI Cartesian communicator & its use in applications
[Video] Introducing MPI – Summary
[9] Introducing MPI, YouTube Video
Check access to the cluster machine
Check MPI standard implementation and its version
Often SSH is used to remotely access clusters
OpenMPI
‘Open Source High Performance Computing’
Using the module environment
(cf. Practical Lecture 0.2)
Other implementations exist
E.g., MPICH implementation
E.g., Parastation MPI implementation
(we don't use those in this course)
[6] OpenMPI Web page; [12] Icelandic HPC Machines & Community
4 Nodes
CPU: 2x Intel Xeon E5-2690 v3 (2.6 GHz, 12 cores)
Memory
128GB DDR4
Interconnect
10 Gb/s Ethernet
Ganglia monitoring
service
Shows usage of CPUs
[12] Icelandic HPC Machines & Community
We will have a visit to computing room of Jötunn to ‘touch metal’ and will meet our HPC System expert Hjörleifur Sveinbjörnsson
SSH Access to HPC System – Jötunn HPC System Example – Revisited
Example: first login via Hekla (if you are not in Uni network)
[12] Icelandic HPC Machines & Community
Jötunn HPC System
Hekla System
#include <stdio.h>    /* for printf() */

int main()
{
    printf("Hello, World!");
    return 0;
}

The main function is 'called' by the operating system when a user runs the C program – but it is essentially a usual C function with optional parameters that we will explore during the course of the lecture series.
The printf() function sends formatted text as output to stdout and is often used for simple debugging of C programs.
return provides a return value to the calling function; in the case of the main function this can be considered an exit status code for the OS. Mostly, a 0 exit code signifies a normal run (no errors) and a non-0 exit code (e.g., 1) usually means there was a problem and the program had to exit abnormally.
Simple C Program
The above file content is stored in the file hello.c
Although it has the .c file extension, it remains a normal text file
hello.c is not executable as a C program – it needs a compilation using a C compiler
Data exchange is key for the design of applications
Sending/receiving data at specific times in the program
No shared memory for sharing variables with other remote processes
Messages can be simple variables (e.g. a word) or complex structures
Start with the basic building blocks using MPI
Building up the 'parallel computing environment'
#include <stdio.h>

int main(int argc, char** argv)
{
    int rank, size;    /* will later be filled by the MPI library with rank and size information */

    printf("Hello World, I am %d out of %d\n", rank, size);

    return 0;
}

The two integer variables rank and size are later useful for working with specific data obtained from the MPI library that we need to add in the next step, in order to fill the integer variables with information about rank and size.
The printf() function sends formatted text as output to stdout and is often used for simple debugging of C programs.
Thinking in parallel in parallel programming means understanding that different processes have an identity and work on different elements of the program. In the example we want to give an output that shows the identity of each MPI process by using the rank and size information.
Extended Simple C Program (still C only)
The above file content is stored in the file hello.c
Selected changes to the basic C program structure prepare for MPI
hello.c is not executable as a C program – it needs a compilation using a C compiler
    MPI_Finalize();

    return 0;
}

[8] LLNL MPI Tutorial
Extended Simple C Program (now with MPI statements)
The above file content is stored in the file hello.c
hello.c is not executable as a C program – it needs a compilation using a C compiler
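A minimal sketch of the complete MPI version of hello.c, following the usual MPI_Init / MPI_Comm_rank / MPI_Comm_size / MPI_Finalize pattern from [8] LLNL MPI Tutorial (the exact code on the original slide may differ slightly):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                    /* set up the parallel computing environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* unique identity of this process (MPI rank) */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* total number of processes */

    printf("Hello World, I am %d out of %d\n", rank, size);

    MPI_Finalize();                            /* shut down the MPI environment */

    return 0;
}

Compiled with, e.g., mpicc hello.c -o hello and started via mpirun, each process prints its own rank; because output to the screen is a serial resource, the order of the lines can vary from run to run.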
[12] Icelandic HPC Machines & Community
Knowledge of installed compilers is essential (e.g. C, Fortran90, etc.)
Different versions and types of compilers exist (Intel, GNU, MPI, etc.)
E.g. mpicc pingpong.c -o pingpong
Module environment tool
Avoids having to manually set up environment information for every application
Simplifies shell initialization and lets users easily modify their environment
Modules can be loaded and unloaded
Enables the installation of software in different versions
module avail
Lists all available modules on the HPC system (e.g. compilers, MPI, etc.)
module load
Loads particular modules into the current work environment [12] Icelandic HPC Machines & Community
E.g. module load gnu openmpi
Using modules to get the right C compiler for compiling hello.c: 'module load gnu openmpi'
Note: there are many C compilers available; we here pick one for our particular HPC course that works with the Message Passing Interface (MPI)
(Figure: hello.c is compiled with mpicc into the executable hello)
Note: If there are no errors, the file hello is now a full C program executable that can be started by an OS
New: C program with MPI statements (cf. Practical Lecture 0.2 w/o MPI statements)
[12] Icelandic HPC Machines & Community
Step 6: Parallel Processing – Executing an MPI Program with MPIRun & Script (1)
Compilation done in Step 5
Compilers and linkers need various information on where include files and libraries can be found
E.g. C header files like 'mpi.h'
Compiling is different for each programming language
Example to understand the distribution of the program
E.g., executing the MPI program on 4 processors
Normally batch system allocations (cf. Practical Lecture 0.2)
Understanding the role of mpirun is important: mpirun creates 4 processes that produce 'hello' output in parallel
Output of the program: the order of the outputs can vary because I/O to the screen is a 'serial resource'
(Figure: mpirun starts the hello executable on processors P with memories M)
Step 6: Parallel Processing – Executing an MPI Program with MPIRun & Script (2)
Need for a job script
Example using mpirun: mpirun creates 4 processes that produce 'hello' output in parallel (processors P with memories M)
Step-wise walkthrough: all performed steps should be done in the same manner for all MPI jobs
Step 6: Parallel Processing – Executing an MPI Program with MPIRun & Script (3)
Submission using the scheduler
Example: SLURM on the Jötunn HPC system
The scheduler allocated 4 nodes as requested
mpirun and the scheduler distribute the executable to the right nodes
The output file consists of the combined output of all 4 requested nodes
(Figure: scheduler, Jötunn login node, Jötunn compute nodes, output file)
(Figure, revisited: an HPC machine with compute nodes, each with processors P and memories M; via MPI point-to-point communications, the value DATA: 17 in the memory of one processor is sent to another processor, where it arrives as NEW: 17)
Each processor has its own data in its memory that cannot be seen/accessed by other processors
Example: pingpong.c
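The contents of pingpong.c are not listed on the slide; as a hedged sketch (assumed for illustration, not taken from the course material), a simple ping-pong between ranks 0 and 1 that also times the round trips could look like:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, msg = 17, rounds = 10, i;
    double t_start, t_end;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t_start = MPI_Wtime();
    for (i = 0; i < rounds; i++) {
        if (rank == 0) {
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);                     /* ping to rank 1 */
            MPI_Recv(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);  /* pong back */
        } else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }
    t_end = MPI_Wtime();

    if (rank == 0)
        printf("Average round-trip time: %f seconds\n", (t_end - t_start) / rounds);

    MPI_Finalize();
    return 0;
}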
(Figure, revisited: Broadcast example – the value DATA: 17 at one processor is distributed and arrives as NEW: 17 at the other processors)
Broadcast distributes the same data to many or even all other processors
Example: broadcast.c
(Figure: processors P with their memories M forming a parallel computing environment, modified from [8] LLNL MPI Tutorial)
[Video] OpenMPI
[13] What is OpenMPI, YouTube Video
[1] K. Hwang, G. C. Fox, J. J. Dongarra, 'Distributed and Cloud Computing', Book, Online:
http://store.elsevier.com/product.jsp?locale=en_EU&isbn=9780128002049
[2] Georg Hager & Gerhard Wellein, 'Introduction to High Performance Computing for Scientists and Engineers', Chapman & Hall/CRC Computational Science, ISBN 143981192X, English, ~330 pages, 2010, Online:
http://www.amazon.de/Introduction-Performance-Computing-Scientists-Computational/dp/143981192X
[3] J. Haut, G. Cavallaro and M. Riedel et al., IEEE Transactions on Geoscience and Remote Sensing, 2019, Online:
https://www.researchgate.net/publication/335181248_Cloud_Deep_Networks_for_Hyperspectral_Image_Analysis
[4] Fran Berman, 'Maximising the Potential of Research Data'
[5] The MPI Standard, Online:
http://www.mpi-forum.org/docs/
[6] OpenMPI Web page, Online:
https://www.open-mpi.org/
[7] DEEP Projects Web page, Online:
http://www.deep-projects.eu/
[8] LLNL MPI Tutorial, Online:
https://computing.llnl.gov/tutorials/mpi/
[9] HPC – Introducing MPI, YouTube Video, Online:
http://www.youtube.com/watch?v=kHV6wmG35po
[10] Caterham F1 Team Races Past Competition with HPC, Online:
http://insidehpc.com/2013/08/15/caterham-f1-team-races-past-competition-with-hpc
[11] M. Goetz, C. Bodenstein, M. Riedel, 'HPDBSCAN – Highly Parallel DBSCAN', in proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC2015), Machine Learning in HPC Environments (MLHPC) Workshop, 2015, Online:
https://www.researchgate.net/publication/301463871_HPDBSCAN_highly_parallel_DBSCAN
Lecture Bibliography (2)
[12] Icelandic HPC Machines & Community, Online:
http://ihpc.is
[13] What is OpenMPI, YouTube Video, Online:
http://www.youtube.com/watch?v=D0-xSWBGNAw