
High Performance Computing

ADVANCED SCIENTIFIC COMPUTING

Prof. Dr.-Ing. Morris Riedel
Adjunct Associated Professor
School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
Research Group Leader, Juelich Supercomputing Centre, Forschungszentrum Juelich, Germany

@Morris Riedel @MorrisRiedel @MorrisRiedel


LECTURE 2

Parallel Programming with MPI
September 9, 2019
Room V02‐156
Review of Lecture 1 – High Performance Computing (HPC)

 HPC Basics  HPC Ecosystem Technologies
   multi‐core processors with high single‐thread performance, used in parallel computing
   many‐core processors with moderate single‐thread performance, used in parallel computing
   distributed memory architectures using the Message Passing Interface (MPI)
   modeling and simulation not only used in physical sciences today, but also for machine & deep learning (‘big data’)

[1] Distributed & Cloud Computing Book; [2] Introduction to High Performance Computing for Scientists and Engineers; [3] J. Haut, G. Cavallaro and M. Riedel et al.; [4] F. Berman: Maximising the Potential of Research Data

Lecture 2 – Parallel Programming with MPI 2 / 50


Outline of the Course

1. High Performance Computing
2. Parallel Programming with MPI
3. Parallelization Fundamentals
4. Advanced MPI Techniques
5. Parallel Algorithms & Data Structures
6. Parallel Programming with OpenMP
7. Graphical Processing Units (GPUs)
8. Parallel & Scalable Machine & Deep Learning
9. Debugging & Profiling & Performance Toolsets
10. Hybrid Programming & Patterns
11. Scientific Visualization & Scalable Infrastructures
12. Terrestrial Systems & Climate
13. Systems Biology & Bioinformatics
14. Molecular Systems & Libraries
15. Computational Fluid Dynamics & Finite Elements
16. Epilogue

+ additional practical lectures & webinars for our hands‐on assignments in context

 Practical Topics
 Theoretical / Conceptual Topics
Lecture 2 – Parallel Programming with MPI 3 / 50
Outline

 Message Passing Interface (MPI) Concepts
   Modular Supercomputing Architecture & Application Examples
   Distributed Memory Computers & MPI Standard for Portability
   Point‐to‐Point Message Passing Functions
   Understanding MPI Collectives
   Using MPI Ranks & Communicators

 MPI Parallel Programming Basics
   Jötunn HPC Environment with Libraries & Modules
   Thinking Parallel & Step‐wise Walkthrough for Parallel Programming
   Basic Building Blocks of a Parallel Program
   Code Compilation & Parallel Executions
   Simple PingPong Application Example

 Promises from previous lecture(s):
   Practical Lecture 0.2: Lecture 2 will provide a full introduction and many more examples of the Message Passing Interface (MPI) for parallel programming
   Lecture 1: Lecture 2 & 4 will give in‐depth details on the distributed‐memory programming model with the Message Passing Interface (MPI)
   Lecture 1: Lecture 2 will provide a full introduction and many more examples of the Message Passing Interface (MPI) for parallel programming

Lecture 2 – Parallel Programming with MPI 4 / 50


Selected Learning Outcomes

 Students understand…
   Latest developments in parallel processing & high performance computing (HPC)
   How to create and use high‐performance clusters
   What scalable networks & data‐intensive workloads are
   The importance of domain decomposition
   Complex aspects of parallel programming
   HPC environment tools that support programming or analyze behaviour
   Different abstractions of parallel computing on various levels
   Foundations and approaches of scientific domain‐specific applications
 Students are able to …
   Program and use HPC programming paradigms
   Take advantage of innovative scientific computing simulations & technology
   Work with technologies and tools to handle parallelism complexity
Lecture 2 – Parallel Programming with MPI 5 / 50 
Message Passing Interface (MPI) Concepts

Lecture 2 – Parallel Programming with MPI 6 / 50


Parallel Programming with MPI – Physics & Engineering Applications for HPC

 Parallel programming with MPI in physics and engineering applications is often based on known physical laws using iterative numerical methods; these fields are therefore often called simulation sciences or computational sciences
 Parallel programming with MPI in physics and engineering applications typically simulates or models a specific area (i.e., a model space) over a specific time (i.e., simulation time)
 Parallel programming with MPI can be considered as one sub‐area of scientific programming and/or scientific computing

(Figure: numerical calculations & simulation of a model over time; Experiment – ‘we observe the nature‘, Theory – ‘we create a model of nature‘)
 Lecture 12 – 15 will offer more insights into a wide variety of physics & engineering applications that take advantage of HPC with MPI
Lecture 2 – Parallel Programming with MPI 7 / 50
Parallel Programming with MPI – Data Science Applications for HPC

 Machine Learning Algorithms
 Example: Highly Parallel Density‐based spatial clustering of applications with noise (DBSCAN) 
 Selected Applications: Clustering different cortical layers in brain tissue & point cloud data analysis

Clustering

[11] M. Goetz and M. Riedel et al, 
Proceedings IEEE Supercomputing Conference, 2015

 Lecture 8 will provide more details on MPI application examples with a particular focus on parallel and scalable machine learning
Lecture 2 – Parallel Programming with MPI 8 / 50
Example: Modular Supercomputing Architecture – MPI Usage in Cluster Module

 The Cluster Module (CM) offers Cluster Nodes (CNs) with high single‐thread performance and a universal Infiniband interconnect (we focus in this lecture only on this module)
 Given the CM architecture setup, it works very well for applications that take advantage of MPI (the network interconnection is important)
 The modular supercomputing architecture (MSA) enables a flexible HPC system design, co‐designed by the needs of different application workloads
[7] DEEP Projects Web Page
Lecture 2 – Parallel Programming with MPI 9 / 50


Application Example: Formula Race Car Design & Room Heat Dissipation

 Pro: Network communication is relatively hidden and supported
 Contra: Programming with MPI still requires using ‘parallelization methods’
 Not easy: Write ‘technical code’ well integrated in ‘problem‐domain code’
 Example: Race Car Simulation & Heat Dissipation in a Room
   Apply a good parallelization method (e.g. domain decomposition)
   Write manually good MPI code for (technical) communication between processors (e.g. across 1024 cores)
   Integrate the technical code well with the problem‐domain code (e.g. computational fluid dynamics & airflow)
[10] Modified from Caterham F1 team; [2] Introduction to High Performance Computing for Scientists and Engineers
 Lecture 3 will provide more details on MPI application examples with a particular focus on parallelization fundamentals
Lecture 2 – Parallel Programming with MPI 10 / 50
Distributed‐Memory Computers – Revisited (cf. Lecture 1)

 A distributed-memory parallel computer establishes a ‘system view’ where no process can access another process’ memory directly
[2] Introduction to High Performance Computing for Scientists and Engineers

 The dominant programming model for this architecture is the Message Passing Interface (MPI)
 Features
   Processors communicate via Network Interfaces (NI)
   NI mediates the connection to a communication network
   This setup is rarely used  a programming model view today
Lecture 2 – Parallel Programming with MPI 11 / 50
Programming with Distributed Memory using MPI – Revisited (cf. Lecture 1)

 Distributed-memory programming enables explicit message passing as communication between processors
 Message Passing Interface (MPI) is the dominant distributed-memory programming standard today (available in many different versions)
 MPI is a standard defined and developed by the MPI Forum
[5] MPI Standard

 Features
   No remote memory access on distributed‐memory systems
   Requires ‘sending messages’ back and forth between processes PX
   Many free Message Passing Interface (MPI) libraries available
   Programming is tedious & complicated, but the most flexible method

 Lecture 4 will provide more details on advanced functions of the Message Passing Interface (MPI) standard and its use in applications
Lecture 2 – Parallel Programming with MPI 12 / 50
GNU OpenMPI Implementation

 Message Passing Interface (MPI)
 A standardized and portable message‐passing standard
 Designed to support different HPC architectures
 A wide variety of MPI implementations exist
 Standard defines the syntax and semantics 
of a core of library routines used in C, C++ & Fortran [5] MPI Forum

 OpenMPI Implementation
 Open source license based on the BSD license
 Full MPI (version 3) standards conformance [6] OpenMPI Web page
 Developed & maintained by a consortium of 
academic, research, & industry partners
 Typically available as modules on HPC systems and used with mpicc compiler
 Often built with the GNU compiler set and/or Intel compilers

 Lecture 2 will provide a full introduction and many more examples of the Message Passing Interface (MPI) for parallel programming
Lecture 2 – Parallel Programming with MPI 13 / 50
What is MPI from a Technical Perspective?

 ‘Communication library’ abstracting from the low‐level network view
   Offers 500+ available functions to communicate between computing nodes
   Practice reveals: parallel applications often require just ~12 (!) functions
   Includes routines for efficient ‘parallel I/O’ (using underlying hardware)

 Supports ‘different ways of communication’
   ‘Point‐to‐point communication’ between two computing nodes (P  P)
   Collective functions involve ‘N computing nodes in useful communication’

 Deployment on supercomputers supporting application portability
   Installed on (almost) all parallel computers
   Different languages: C, Fortran, Python, R, etc.
   Careful: Different versions might be installed

 Computing nodes are independent computing processors (that may also have N cores each) and are all part of one big parallel computer (e.g. hybrid architecture, cf. Lecture 1)
Lecture 2 – Parallel Programming with MPI 14 / 50


MPI Standard enables Portability of Applications

 Key reasons for requiring a standard programming library
   Technical advancement in supercomputers is extremely fast
   Parallel computing experts switch organizations and face another system
   Applications using proprietary libraries were not portable
   Whole applications had to be created from scratch, or needed time‐consuming code updates
   MPI changed this & is the dominant parallel programming model

 Works for different MPI standard implementations
   E.g., MPICH
   E.g., Parastation MPI
   E.g., OpenMPI
   Etc.

 MPI is an open standard that significantly supports the portability of parallel applications across a wide variety of different HPC systems and supercomputer architectures (porting a parallel MPI application from HPC Machine A with one MPI library to HPC Machine B with another MPI library)
Lecture 2 – Parallel Programming with MPI 15 / 50


Is MPI yet another Network Library?

 TCP/IP and socket programming libraries are plentifully available
   Do we need a dedicated communication & network protocols library (over the Internet)?
   Goal: simplify programming of parallel programs
   Focus on scientific and engineering applications with mathematical calculations
   Enable parallel and scalable machine and deep learning algorithms

 Selected reasons
   Designed for performance within large parallel computers (e.g. no security)
   Supports various interconnects between ‘computing nodes’ (hardware)
   Offers various benefits like ‘reliable messages’ or ‘in‐order arrivals’

 MPI is not designed to handle arbitrary communication in computer networks and is thus very special
 MPI is not good for clients that constantly establish/close connections again and again (this would result in very slow performance with MPI)
 MPI is not good for internet chat clients or Web service servers in the Internet (e.g. no security beyond firewalls, no message encryption directly available, etc.)
Lecture 2 – Parallel Programming with MPI 16 / 50


Message Passing: Exchanging Data with MPI Send/Receive

(Figure: two compute nodes of an HPC machine, each processor P with its own memory M holding different DATA values; MPI point‐to‐point communications transfer values between them, e.g. DATA: 17 arrives as NEW: 17 and DATA: 06 as NEW: 06)

 Each processor has its own data in its memory that cannot be seen/accessed by other processors
Lecture 2 – Parallel Programming with MPI 17 / 50


Collective Functions : Broadcast (one‐to‐many)

(Figure: broadcast example – the value DATA: 17 held by one processor arrives as NEW: 17 on all other processors)

 Broadcast distributes the same data to many or even all other processors
Lecture 2 – Parallel Programming with MPI 18 / 50


Collective Functions:  Scatter (one‐to‐many)

(Figure: scatter example – the root processor holds DATA: 10, 20, 30 and the other processors receive NEW: 10, NEW: 20, NEW: 30 respectively)

 Scatter distributes different data to many or even all other processors
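As a hedged illustration (not part of the original slides), a minimal MPI_Scatter sketch in which the root distributes one distinct integer to every process could look as follows:

/* scatter sketch: rank 0 hands out one integer per process */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, size, mine;
    int* senddata = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* only the root prepares one value per process, e.g. 10, 20, 30, ... */
        senddata = (int*)malloc(size * sizeof(int));
        for (int i = 0; i < size; i++)
            senddata[i] = (i + 1) * 10;
    }

    /* each process receives exactly one integer into 'mine' */
    MPI_Scatter(senddata, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Rank %d received %d\n", rank, mine);

    if (rank == 0)
        free(senddata);

    MPI_Finalize();
    return 0;
}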

Lecture 2 – Parallel Programming with MPI 19 / 50


Collective Functions: Gather (many‐to‐one)

(Figure: gather example – one specific processor collects the values of the other processors, e.g. DATA: 06, 19, 80 arrive as NEW: 06, NEW: 19, NEW: 80 on that processor)

 Gather collects data from many or even all other processors to one specific processor
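As a hedged illustration (not part of the original slides), a minimal MPI_Gather sketch in which every process sends its rank to the root could look as follows:

/* gather sketch: the root collects one integer from every process */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, size;
    int* recvdata = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        recvdata = (int*)malloc(size * sizeof(int));  /* only the root needs the buffer */

    /* each process contributes one integer; the root receives size integers */
    MPI_Gather(&rank, 1, MPI_INT, recvdata, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("Root gathered %d from rank %d\n", recvdata[i], i);
        free(recvdata);
    }

    MPI_Finalize();
    return 0;
}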

Lecture 2 – Parallel Programming with MPI 20 / 50


Collective Functions: Reduce (many‐to‐one)

(Figure: reduce example – a global sum 17 + 06 + 19 + 80 = 122 is computed from the data located at the different processors)

 Reduce combines collection with computation based on data from many or even all other processors
 Usage of reduce includes finding a global minimum or maximum, sum, or product of the different data located at different processors
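As a hedged illustration (not part of the original slides), a minimal MPI_Reduce sketch that computes such a global sum on the root process could look as follows:

/* reduce sketch: combine the local values of all processes into a global sum */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, size, local, global = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local = rank + 1;   /* some per-process value, e.g. 1, 2, 3, ... */

    /* combine all local values with the MPI_SUM operation on rank 0 */
    MPI_Reduce(&local, &global, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Global sum over %d processes: %d\n", size, global);

    MPI_Finalize();
    return 0;
}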
Lecture 2 – Parallel Programming with MPI 21 / 50
Using MPI Ranks & Communicators

 Answers the following question: How do I know where to send/receive to/from?
 Each MPI activity specifies the context in which a corresponding function is performed
   MPI_COMM_WORLD (region/context of all processes)
   Create (sub‐)groups of the processes / virtual groups of processes
   Perform communications only within these sub‐groups easily with well‐defined processes
 Using communicators wisely in collective functions can reduce the number of affected processors
 The MPI rank is a unique number for each processor (the numbers in the figure reflect the unique identity of a processor, named ‘MPI rank’)
[8] LLNL MPI Tutorial

 Lecture 4 on advanced MPI techniques will provide details about the often used MPI Cartesian communicator & its use in applications
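As a hedged illustration of such sub‐groups (this sketch is not part of the original slide), MPI_Comm_split() can derive a new communicator from MPI_COMM_WORLD, e.g. splitting the processes into two halves by a ‘color’ value:

/* sketch: split MPI_COMM_WORLD into two sub-communicators */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int world_rank, world_size, sub_rank, sub_size;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* color 0 = first half of the ranks, color 1 = second half */
    int color = (world_rank < world_size / 2) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

    /* each process now also has a rank within its own sub-group */
    MPI_Comm_rank(subcomm, &sub_rank);
    MPI_Comm_size(subcomm, &sub_size);
    printf("World rank %d is rank %d of %d in sub-group %d\n",
           world_rank, sub_rank, sub_size, color);

    MPI_Comm_free(&subcomm);  /* release the derived communicator */
    MPI_Finalize();
    return 0;
}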
Lecture 2 – Parallel Programming with MPI 22 / 50
[Video] Introducing MPI – Summary

[9] Introducing MPI, YouTube Video

Lecture 2 – Parallel Programming with MPI 23 / 50


MPI Parallel Programming Basics

Lecture 2 – Parallel Programming with MPI 24 / 50


Starting Parallel Programming – What do we need?

 Check access to the cluster machine
   Often SSH is used to remotely access clusters
 Check the MPI standard implementation and its version

 OpenMPI
   ‘Open Source High Performance Computing’ [6] OpenMPI Web page
   Using the module environment (cf. Practical Lecture 0.2)

 Other implementations exist
   E.g., MPICH implementation
   E.g., Parastation MPI implementation
   (we don‘t use those in this course)
[12] Icelandic HPC Machines & Community

Lecture 2 – Parallel Programming with MPI 25 / 50 


HPC System – Jötunn Cluster – Revisited (cf. Practical Lecture 0.1)

 4 Nodes
 CPU: 2x Intel Xeon CPU E5‐2690 v3 (2.6 GHz, 12 cores)
 Memory: 128 GB DDR4
 Interconnect: 10 Gb/s Ethernet
 Ganglia monitoring service shows the usage of the CPUs

[12] Icelandic HPC Machines & Community

 We will have a visit to the computing room of Jötunn to ‘touch metal’ and will meet our HPC system expert Hjörleifur Sveinbjörnsson
Lecture 2 – Parallel Programming with MPI 26 / 50 
SSH Access to HPC System – Jötunn HPC System Example – Revisited 

 Example: first login via Hekla (if you are not in Uni network)

[12] Icelandic HPC Machines & Community

Lecture 2 – Parallel Programming with MPI 27 / 50 


Step 1: SSH Access to HPC System – Jötunn HPC System Example

Jötunn HPC System

Hekla System

Lecture 2 – Parallel Programming with MPI 28 / 50 


Step 2: Edit a Text File – Simple Hello World C Programm – Revisited 
#include <stdio.h>

int main()
{
    printf("Hello, World!");

    return 0;
}

 #include is used for C header files; a header file contains function declarations for C built‐in library functions; stdio.h is the standard input and output library for C
 The main function is ‘called‘ by the operating system when a user runs the C program – but it is essentially a usual C function with optional parameters that we will explore during the course of the lecture series
 The printf() function sends formatted text as output to stdout and is often used for simple debugging of C programs
 return provides return values to the calling function; in the case of the main function this can be considered an exit status code for the OS. Mostly, a 0 exit code signifies a normal run (no errors) and a non‐0 exit code (e.g., 1) usually means there was a problem and the program had to exit abnormally.

 Simple C Program
   The above file content is stored in the file hello.c and compiled using a C compiler
   Despite the .c file extension it remains a normal text file
   hello.c is not executable as a C program  it needs a compilation

Lecture 2 – Parallel Programming with MPI 29 / 50 


New Steps Required: Start ‘Thinking’ Parallel

 Parallel Processing Approach
   Parallel MPI programs know about the existence of the other processes of the program and what their own role is in the bigger picture
   MPI programs are written in a sequential programming language, but executed in parallel
   Same MPI program runs on all processes (SPMD)
   SPMD stands for Single Program Multiple Data

 Data exchange is key for the design of applications
   Sending/receiving data at specific times in the program
   No shared memory for sharing variables with other remote processes
   Messages can be simple variables (e.g. a word) or complex structures

 Start with the basic building blocks using MPI
   Building up the ‘parallel computing environment’
Lecture 2 – Parallel Programming with MPI 30 / 50 


Step 3: Edit a Text File – (MPI) Basic Building Blocks: Variables & Output
#include <stdio.h>

int main(int argc, char** argv)
{
    int rank, size;

    printf("Hello World, I am %d out of %d\n",
           rank, size);

    return 0;
}

 The main function is ‘called‘ by the operating system when a user runs the C program – but it is essentially a usual C function with optional parameters that we added here to be used later in the initialization of the MPI environment
 Two integer variables that are later useful for working with specific data obtained from the MPI library that we need to add in the next step, in order to fill the integer variables with information about rank and size
 The printf() function sends formatted text as output to stdout and is often used for simple debugging of C programs
 Thinking in parallel in parallel programming means understanding that different processes have an identity and work on different elements of the program
 In the example we want to give an output that shows the identity of each MPI process by using the rank and size information

 Extended Simple C Program (still C only)
   The above file content is stored in the file hello.c and compiled using a C compiler
   Selected changes to the basic C program structure prepare for MPI
   hello.c is not executable as a C program  it needs a compilation

Lecture 2 – Parallel Programming with MPI 31 / 50 


Step 4: Edit a Text File – MPI Basic Building Blocks: Header & Init/Finalize
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);

    printf("Hello World, I am %d out of %d\n",
           rank, size);

    MPI_Finalize();

    return 0;
}

 Libraries can be used by including C header files; here the header for MPI is included in order to use several MPI functions in our extended C program
 The MPI_Init() function initializes the MPI environment and can take inputs via the main() function arguments
 MPI_Finalize() shuts down the MPI environment (after MPI_Finalize() no parallel execution of the code can take place)

 Extended Simple C Program
   hello.c is compiled using a C compiler; it is not executable as a C program  it needs a compilation

Lecture 2 – Parallel Programming with MPI 32 / 50 


Step 4: Edit a Text File – MPI Basic Building Blocks: Rank & Size Variables

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("Hello World, I am %d out of %d\n",
           rank, size);

    MPI_Finalize();

    return 0;
}

 The MPI_Comm_size() function determines the overall number of n processes in the parallel program and stores it in the variable size
 The MPI_Comm_rank() function determines the unique identifier of each processor and stores it in the variable rank with values (0 … n-1)
 The MPI_COMM_WORLD communicator constant denotes the ‘region of communication’, here all processes
[8] LLNL MPI Tutorial

 Extended Simple C Program
   hello.c is compiled using a C compiler; it is not executable as a C program  it needs a compilation

Lecture 2 – Parallel Programming with MPI 33 / 50 


Step 5: Load the right Modules for Compilers & Compile C Program (1)

(Figure: hello.c is compiled using a C compiler – terminal screenshot)
[12] Icelandic HPC Machines & Community

Lecture 2 – Parallel Programming with MPI 34 / 50 


HPC System Module Environment – Revisited (cf. Practical Lecture 0.1) 

 Knowledge of installed compilers essential (e.g. C, Fortran90, etc.)
 Different versions and types of compilers exist (Intel, GNU, MPI, etc.)
 E.g. mpicc pingpong.c -o pingpong
 Module environment tool
 Avoids manually setting up environment information for every application
 Simplifies shell initialization and lets users easily modify their environment
 Modules can be loaded and unloaded
 Enable the installation of software in different versions
 Module avail
 Lists all available modules on the HPC system (e.g. compilers, MPI, etc.)
 Module load 
 Loads particular modules into the current work environment [12] Icelandic HPC Machines & Community

 E.g. module load  gnu openmpi

Lecture 2 – Parallel Programming with MPI 35 / 50 


GNU OpenMPI Implementation – Revisited 

 Message Passing Interface (MPI)
 A standardized and portable message‐passing standard
 Designed to support different HPC architectures
 A wide variety of MPI implementations exist
 Standard defines the syntax and semantics 
of a core of library routines used in C, C++ & Fortran [5] MPI Forum

 OpenMPI Implementation
 Open source license based on the BSD license
 Full MPI (version 3) standards conformance [6] OpenMPI Web page
 Developed & maintained by a consortium of 
academic, research, & industry partners
 Typically available as modules on HPC systems and used with mpicc compiler
 Often built with the GNU compiler set and/or Intel compilers

Lecture 2 – Parallel Programming with MPI 36 / 50 


Step 5: Load the right Modules for Compilers & Compile C Program (2)

 Using modules to get the right C compiler for compiling hello.c
   ‘module load gnu openmpi‘
   Note: there are many C compilers available; here we pick one for our particular HPC course that works with the Message Passing Interface (MPI)
   Note: if there are no errors, the file hello is now a full C program executable that can be started by an OS
 New: C program with MPI statements (cf. Practical Lecture 0.2 w/o MPI statements)

(Figure: hello.c is compiled with the mpicc compiler wrapper into the executable hello)
[12] Icelandic HPC Machines & Community
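A minimal command sequence for this step could look like the sketch below; it assumes the module names used in this course and a small interactive test run with 4 processes (on Jötunn, production runs go through the batch system as shown in Step 6):

    # load the GNU compiler set and the OpenMPI module
    module load gnu openmpi

    # compile the MPI program with the MPI compiler wrapper
    mpicc hello.c -o hello

    # quick interactive test with 4 MPI processes
    mpirun -np 4 ./hello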
Lecture 2 – Parallel Programming with MPI 37 / 50 
Step 6: Parallel Processing – Executing an MPI Program with MPIRun & Script (1)

 Compilation done in Step 5
   Compilers and linkers need various information about where include files and libraries can be found
   E.g. C header files like ‘mpi.h’
   Compiling is different for each programming language
 Example to understand the distribution of a program
   E.g., executing the MPI program on 4 processors
   Normally batch system allocations (cf. Practical Lecture 0.2)
   Understanding the role of mpirun is important
 Output of the program
   The order of the outputs can vary because the I/O screen is a ‘serial resource’

(Figure: mpirun creates 4 processes that produce hello output in parallel, each process P with its own memory M)
Lecture 2 – Parallel Programming with MPI 38 / 50 
Step 6: Parallel Processing – Executing an MPI Program with MPIRun & Script (2)

 Need of a job script
   Example using mpirun
 Step‐Wise Walkthrough
   All performed steps should be done in the same manner for all MPI jobs

(Figure: mpirun creates 4 processes that produce hello output in parallel, each process P with its own memory M)
Lecture 2 – Parallel Programming with MPI 39 / 50 
Step 6: Parallel Processing – Executing an MPI Program with MPIRun & Script (3)

 Submission using the Scheduler
   Example: SLURM on the Jötunn HPC system
   The scheduler allocates 4 nodes as requested
   mpirun and the scheduler distribute the executable to the right nodes
   The output consists of the combined output of all 4 requested nodes

(Figure: the job is submitted from the Jötunn login node via the scheduler to the Jötunn compute nodes; the results appear in an output file)
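A minimal SLURM job script for this example could look like the following sketch; the resource values, module names, and file names are assumptions and must be adapted to the actual Jötunn configuration:

    #!/bin/bash
    #SBATCH --job-name=hello-mpi        # name shown in the queue
    #SBATCH --nodes=4                   # request 4 nodes as in the example
    #SBATCH --ntasks-per-node=1         # one MPI process per node (assumption)
    #SBATCH --time=00:05:00             # short wall-clock limit
    #SBATCH --output=hello-%j.out       # combined output file (%j = job id)

    module load gnu openmpi             # same modules as used for compiling

    mpirun ./hello                      # mpirun picks up the SLURM allocation

The script would be submitted with sbatch, monitored with squeue, and the combined output of all processes ends up in the requested output file.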

Lecture 2 – Parallel Programming with MPI 40 / 50 


Message Passing: Exchanging Data with MPI Send/Receive – Revisited

(Figure: two compute nodes of an HPC machine, each processor P with its own memory M holding different DATA values; MPI point‐to‐point communications transfer values between them, e.g. DATA: 17 arrives as NEW: 17 and DATA: 06 as NEW: 06)

 Each processor has its own data in its memory that cannot be seen/accessed by other processors

Lecture 2 – Parallel Programming with MPI 41 / 50


Message Passing: Exchanging Data with MPI Send/Receive – Example

 Example: pingpong.c
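The listing below is a minimal ping‐pong sketch (not necessarily the exact pingpong.c from the course material), assuming two processes that bounce a single integer back and forth with MPI_Send/MPI_Recv:

/* pingpong.c – minimal sketch: rank 0 sends a value to rank 1 and waits
   for the reply; rank 1 receives, increments the value, and sends it back */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, size, ball = 17;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0 && size >= 2) {
        /* ping: send the value to rank 1, then wait for the pong */
        MPI_Send(&ball, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&ball, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 0 received pong: %d\n", ball);
    } else if (rank == 1) {
        /* pong: receive from rank 0, modify the value, send it back */
        MPI_Recv(&ball, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        ball = ball + 1;
        MPI_Send(&ball, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

The sketch is compiled with mpicc and started with mpirun as shown in Steps 5 and 6.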

Lecture 2 – Parallel Programming with MPI 42 / 50


Collective Functions : Broadcast (one‐to‐many) – Revisited 

(Figure: broadcast example – the value DATA: 17 held by one processor arrives as NEW: 17 on all other processors)

 Broadcast distributes the same data to many or even all other processors

Lecture 2 – Parallel Programming with MPI 43 / 50


Collective Functions : Broadcast (one‐to‐many) – Example

 Example: broadcast.c
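The listing below is a minimal broadcast sketch (not necessarily the exact broadcast.c from the course material), assuming the root rank 0 distributes one integer to all other processes with MPI_Bcast:

/* broadcast.c – minimal sketch: rank 0 broadcasts a value to all ranks */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank, size, data = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        data = 17;   /* only the root holds the value initially */

    /* after the call, every process in MPI_COMM_WORLD holds data = 17 */
    MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d out of %d has data %d\n", rank, size, data);

    MPI_Finalize();
    return 0;
}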

Lecture 2 – Parallel Programming with MPI 44 / 50


Summary of the Parallel Environment & Message Passing

(Figure: summary of the parallel environment – many processes P, each with its own memory M, exchange data via message passing; modified from [8] LLNL MPI Tutorial)
Lecture 2 – Parallel Programming with MPI 45 / 50
[Video] OpenMPI

[13] What is OpenMPI, YouTube Video

Lecture 2 – Parallel Programming with MPI 46 / 50


Lecture Bibliography

Lecture 2 – Parallel Programming with MPI 47 / 50


Lecture Bibliography (1)

 [1]  K. Hwang, G. C. Fox, J. J. Dongarra, ‘Distributed and Cloud Computing’, Book, Online: 
http://store.elsevier.com/product.jsp?locale=en_EU&isbn=9780128002049
 [2] Introduction to High Performance Computing for Scientists and Engineers, Georg Hager & Gerhard Wellein, Chapman & Hall/CRC Computational Science, 
ISBN 143981192X, English, ~330 pages, 2010, Online:
http://www.amazon.de/Introduction‐Performance‐Computing‐Scientists‐Computational/dp/143981192X
 [3] J. Haut, G. Cavallaro and M. Riedel et al., IEEE Transactions on Geoscience and Remote Sensing, 2019, Online:
https://www.researchgate.net/publication/335181248_Cloud_Deep_Networks_for_Hyperspectral_Image_Analysis
 [4] Fran Berman, ‘Maximising the Potential of Research Data’
 [5] The MPI  Standard, Online: 
http://www.mpi‐forum.org/docs/
 [6] OpenMPI Web page, Online:
https://www.open‐mpi.org/
 [7] DEEP Projects Web page, Online: 
http://www.deep‐projects.eu/
 [8] LLNL MPI Tutorial, Online: 
https://computing.llnl.gov/tutorials/mpi/
 [9] HPC – Introducing MPI, YouTube Video, Online: 
http://www.youtube.com/watch?v=kHV6wmG35po
 [10] Caterham F1 Team Races Past Competition with HPC, Online: 
http://insidehpc.com/2013/08/15/caterham‐f1‐team‐races‐past‐competition‐with‐hpc
 [11] M. Goetz, C. Bodenstein, M. Riedel, ‘HPDBSCAN – Highly Parallel DBSCAN’, in proceedings of the ACM/IEEE International Conference for High Performance 
Computing, Networking, Storage, and Analysis (SC2015), Machine Learning in HPC Environments (MLHPC) Workshop, 2015, Online:
https://www.researchgate.net/publication/301463871_HPDBSCAN_highly_parallel_DBSCAN
Lecture 2 – Parallel Programming with MPI 48 / 50
Lecture Bibliography (2)

 [12] Icelandic HPC Machines & Community, Online: 
http://ihpc.is
 [13] YouTube Video, What is OpenMPI, Online: 
http://www.youtube.com/watch?v=D0‐xSWBGNAw

Lecture 2 – Parallel Programming with MPI 49 / 50


Lecture 2 – Parallel Programming with MPI 50 / 50
