Lecture 10-Introduction to MPI


Applied High-Performance Computing and Parallel Programming

Presenter: Liangqiong Qu

Assistant Professor

The University of Hong Kong


Outline

▪ Introduction to MPI
▪ Parallel Execution in MPI
▪ Communicator and Rank
▪ MPI Blocking Point-to-Point Communication
▪ Beginner’s MPI Toolbox
▪ Examples
Review of Previous Lecture: Dominant Architectures of HPC Systems
• Shared-memory computers: A shared-memory parallel computer is a system in
which a number of CPUs work on a common, shared physical address space.
Shared-memory programming enables immediate access to all data from all
processors without explicit communication.
• Distributed memory computers: A distributed-memory architecture is a system
where each processor or node has its own local memory, and they communicate
with each other through message passing.
• Hybrid (shared/distributed memory) computers: A hybrid architecture is a
system that combines the features of both shared-memory and distributed-memory
architectures.

Figure. Architecture of distributed-memory computers


Figure. Architecture of shared-memory computers
Distributed Memory and MPI

• Definition: A distributed-memory architecture is a system where each processor or
  node has its own local memory, and they communicate with each other through
  message passing.

• Features
  • No global shared address space
  • Data exchange and communication between processors is done by explicitly
    passing messages through NIs (network interfaces)

• Programming
  • No remote memory access on distributed-memory systems
  • Requires sending messages back and forth between processors
  • Many free Message Passing Interface (MPI) libraries are available
The Message Passing Paradigm
• A brief history of MPI: Before the 1990s, many libraries could facilitate building parallel
applications, but there was no standard, accepted way of doing it. At the Supercomputing
1992 conference, researchers gathered to define a standard interface for performing
message passing - the Message Passing Interface (MPI). This standard interface allows
programmers to write parallel applications that are portable to all major parallel
architectures.

• MPI is a widely accepted standard in HPC

• Process-based approach: all variables are local! No concept of shared memory.


• A basic principle of MPI: the same program runs on each processor/machine (SPMD). The
program is written in a sequential language such as C or Fortran.

• Data exchange between processes: Send/receive messages via MPI library calls
• No automatic workload distribution
The MPI Standard

• MPI forum – defines MPI standard / library subroutine interface

• Latest standard: MPI 4.1 (Nov. 2023), 1166 pages

• First version, MPI 1.0, was released in 1994
• MPI 5.0 is under development

• Members of MPI standard forum


• Application developers
• Research institutes & computing centers
• Manufacturers of supercomputers & software designers

• Successful free implementations (MPICH, OpenMPI, mvapich) and vendor


libraries (Intel, Cray, HP, …)

• Documents: https://www.mpi-forum.org/docs/
Serial Programming vs Parallel Programming (MPI) Terminologies
Serial Programming Parallel Programming (MPI)
Parallel Execution in MPI
• Processes run throughout program execution

• Program startup
  • MPI start mechanism:
    • Launches tasks/processes
    • Establishes communication context (“communicator”)

• MPI point-to-point communication
  • between pairs of tasks/processes

• MPI collective communication
  • between all processes or a subgroup

• Program shutdown
  • Clean shutdown by MPI

(Figure: timeline of tasks/threads # 0–4 from program startup to program shutdown)

C Interface for MPI

• Required header file:
  #include <mpi.h>

• Bindings:
  • MPI function calls follow a specific naming convention:
    error = MPI_Xxxxx(…);

• MPI constants (global/common) are written in all upper case in C
Initialization and Finalization

• Details of MPI startup are implementation defined

• First call in an MPI program: initialization of the parallel machine
  int MPI_Init(int *argc, char ***argv)
  (Must be called in every MPI program, before any other MPI function.)

• Last call: clean shutdown of the parallel machine
  int MPI_Finalize();
  (No other MPI routines may be called after it.)

• Only the “master” process is guaranteed to continue after MPI_Finalize
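A minimal sketch of this structure (an illustration, not taken from the slides; the printf line simply stands in for the parallel work):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);        /* first MPI call: start up the parallel machine */

        printf("parallel work happens here\n");   /* executed by every process */

        MPI_Finalize();                /* last MPI call: clean shutdown */
        return 0;
    }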
Communicator and Rank
• Two key questions arise early in a parallel program: how many processes are
  participating, and which one am I?
• MPI_Init() defines the communicator MPI_COMM_WORLD, comprising all processes

• MPI uses objects called communicators and groups to define which collection of
  processes may communicate with each other.
• Within a communicator, every process has its own unique integer identifier, its rank,
  assigned by the system when the process initializes. A rank is sometimes also called a
  “task ID”. Ranks are contiguous and begin at zero.
Communicator and Rank
• Communicator defines a set of processes (MPI_COMM_WORLD: all)

• The rank identifies each process within a communicator


• Obtain rank:
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  • rank = 0, 1, 2, ..., (number of processes in the communicator - 1)
  • One process may have different ranks if it belongs to different communicators

• Obtain the number of processes in a communicator:
  int size;
  MPI_Comm_size(MPI_COMM_WORLD, &size);
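A short sketch (an illustration only, assuming MPI_Init() has already been called and the headers from the skeleton above are included) of how rank and size are typically used to differentiate the otherwise identical SPMD processes:

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* only the process with rank 0 runs this branch, e.g. for I/O or coordination */
        printf("running with %d processes\n", size);
    }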
Communicator and Rank: & and * in C-Programming (Background)
• A computer memory location has an address and holds a content. The address is a
numerical number (often expressed in hexadecimal), which is hard for programmers to
use directly.

• To ease the burden of programming with numerical addresses, early programming
  languages (such as C) introduced the concept of variables.
• A variable is a named location that can store a value of a particular type. Instead of
  numerical addresses, names (or identifiers) are attached to certain addresses.
Communicator and Rank: & and * in C-Programming (Background)
• When a variable is created in C, a memory address is assigned to the variable. The
memory address is the location of where the variable is stored on the computer.
• When we assign a value to the variable, it is stored in this memory address. To access
it, use the reference operator (&), and the result represents where the variable is stored:

• A pointer is a variable that stores the memory address of another variable as its value.
• A pointer variable is declared with the * operator and points to a value of a particular
  data type (like int).
• You can also get the value of the variable the pointer points to by using the * operator
  (the dereference operator):
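A small standalone C example (an illustration, not from the slides) of the reference operator & and the dereference operator *:

    #include <stdio.h>

    int main(void) {
        int rank = 3;
        int *p = &rank;                          /* p stores the address of rank */

        printf("value:   %d\n", rank);           /* prints 3 */
        printf("address: %p\n", (void *)&rank);  /* prints where rank is stored */
        printf("via ptr: %d\n", *p);             /* dereference: prints 3 again */
        return 0;
    }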
Communicator and Rank: & and * in C-Programming (Background)
• In C, function arguments are passed by value by default. This means that when you pass a
  variable (e.g., rank) to a function, the function receives a copy of its value; that is, C
  allocates new memory for the parameter inside the function.
• Any modifications to the parameter inside the function do not affect the original
  variable outside.
Communicator and Rank: & and * in C-Programming (Background)
• In C, function arguments are passed by value by default. This means that when you pass a
  variable (e.g., rank) to a function, the function receives a copy of its value; that is, C
  allocates new memory for the parameter inside the function.
• We therefore need to pass a pointer, i.e., the memory address of the variable (see the
  sketch below).
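A sketch contrasting the two cases (the helper functions are hypothetical, chosen only for illustration):

    #include <stdio.h>

    void set_by_value(int r)    { r = 5; }    /* changes only a local copy */
    void set_by_pointer(int *r) { *r = 5; }   /* changes the caller's variable */

    int main(void) {
        int rank = 0;
        set_by_value(rank);      /* rank is still 0 afterwards */
        set_by_pointer(&rank);   /* rank is now 5 - same idea as MPI_Comm_rank(..., &rank) */
        printf("rank = %d\n", rank);
        return 0;
    }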
Communicator and Rank: & and * in C-Programming (Background)
• When a variable is created in C, a memory address is assigned to the variable. The
memory address is the location of where the variable is stored on the computer.
• When we assign a value to the variable, it is stored in this memory address. To access
it, use the reference operator (&), and the result represents where the variable is stored:

int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

int size;
MPI_Comm_size(MPI_COMM_WORLD, &size);   /* the "&size" syntax passes the address of
                                           the "size" variable to the function */

• The “&” symbol is used in MPI to pass the address of a variable to a function,
  allowing the function to directly modify the value at that memory location. MPI
  functions usually return their results through such pointer arguments rather than
  through the function’s return value (which is reserved for the error code).
General MPI Program Structure
• Header declarations
• Serial code

• Initialize MPI environment


• Launches tasks/processes
• Establishes communication context (“communicator”)

• Work and message passing calls

• Terminate the MPI environment

• Serial code
Step 1: Write the Code: MPI “Hello World!” in C
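The slide shows the source of hello_world.c as an image; a sketch of what it most likely contains (the exact text of the printf is an assumption):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;

        MPI_Init(&argc, &argv);                  /* initialize the MPI environment */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* who am I? */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* how many are we? */

        printf("Hello World from rank %d of %d\n", rank, size);

        MPI_Finalize();                          /* terminate the MPI environment */
        return 0;
    }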
Step 2: Compiling the Code
• If you try to compile this code locally with gcc, you might run into problems

• Instead, compiling with MPI wrapper compilers


• Most MPI implementations provide wrapper scripts
• Such as mpicc (MPICH) or mpiicc (Intel MPI; our HKU HPC system uses this)

• They behave like normal compilers


Step 2: Compiling the Code
• If you try to compile this code locally with gcc, you might run into problems

• Instead, compiling with MPI wrapper compilers


• Most MPI implementations provide wrapper scripts
• Such as mpicc (MPICH) or mpiicc (Intel MPI; our HKU HPC system uses this)
• They behave like normal compilers

• Directly running mpicc -o hello_world hello_world.c outputs an error


Review of Lecture 2: Batch System for Running Jobs with HPC
HPC system module environment: knowledge of installed compilers essential
▪ module list
List currently loaded modules under your account

▪ module avail
List all the available modules in the HPC system
▪ module avail X
List all installed versions of modules matching “X”

▪ module load X (e.g., module load python)

Load a specific module X into your current account (e.g., load module python/2.7.13)

▪ module unload X
Unload specific module X from your current account
Step 2: Compiling Code Basic: Load the Right Module for Compilers
• If you try to compile this code locally with gcc, you might run into problems

• Instead, compiling with MPI wrapper compilers


• Most MPI implementations provide wrapper scripts
• Such as mpicc (MPICH) or mpiicc (Intel MPI; our HKU HPC system uses this)
• They behave like normal compilers

• Directly running mpicc -o hello_world hello_world.c outputs errors


• We should load the MPI module into our current account
  • Our HKU HPC system uses Intel MPI, so use:
    • module load impi
  • Note: there are many compilers available; here we pick one for our particular HPC course
    that works with MPI
Step 3: Running the Code
• If you try to compile this code locally with gcc, you might run into problems

• Instead, compiling with MPI wrapper compilers


• Most MPI implementations provide wrapper scripts
• Such as mpicc (MPICH) or mpiicc (Intel MPI; our HKU HPC system uses this)

• They behave like normal compilers

• Running
• Startup wrappers: mpirun or mpiexec
  • mpirun -np 4 ./hello_world
• Details are implementation specific
Review of Lecture 2---Batch System for Running Jobs with HPC
Submitting job scripts
A job script must contain directives to inform the batch system about the
characteristics of the job. These directives appear as comments (#SBATCH) in the job
script and have to conform to the sbatch syntax.
Step 3: Running the Code
• Prepare a job script and run the code with the scheduler
  • Example: Slurm, the scheduler of our HKU HPC system
  • mpirun and the scheduler distribute the executable to the right nodes
  • After preparing the job script, submit it with the sbatch command: sbatch submit-hello.sh
Step 3: Running the Code
• Running the code
  • An example to understand the distribution of the program
  • E.g., executing the MPI program on 4 processors
  • Normally the batch system handles the node/processor allocation
  • Understanding the role of mpirun is important (the command below runs
    hello_world with 4 processes)

mpirun -np 4 ./hello_world
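With the hello_world sketch shown earlier, the output for 4 processes would look roughly like the following; the order of the lines is nondeterministic, since each process prints independently:

    Hello World from rank 2 of 4
    Hello World from rank 0 of 4
    Hello World from rank 3 of 4
    Hello World from rank 1 of 4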


Summarization of “Hello World!” in MPI

• All MPI programs begin with MPI_Init and end with MPI_Finalize
• When a program is run with MPI, all the processes are grouped in a
  communicator, MPI_COMM_WORLD
• Each statement executes independently in each process
Administration
• Assignment 1 has been released
- Due March 14, 2025, Friday, 11:59 PM
- Account information for accessing the HKU HPC system will be sent to your email
  late this week.
- Important: the first batch of accounts can be used from Feb. 27 to Mar. 12,
  11:59 PM.
- You cannot access the HPC system after Mar. 12!
• We need a volunteer class representative to attend the Staff-Student Consultative
Committee (SSCC) meeting for the 2nd semester of 2024-25! Thank you.
- Session 1: 2:30 - 3:40 p.m. on Wednesday, 26 March 2025 in Room 301, Run Run
Shaw Building
Take a break
MPI Point-to-Point Communication

▪ Sender
• Which processor is sending the message?
• Where is the data on the sending processor?
• What kind of data is being sent?
• How much data is there?

▪ Receiver
• Which processor is receiving the message?
• Where should the data be left on the receiving processor?
• How much data is the receiving processor prepared to accept?

▪ Sender and receiver must pass their information to MPI separately


MPI Point-to-Point Communication

▪ Processors communicate by sending and receiving messages


▪ MPI message: array of elements of a particular type

(Figure: a message is sent from the sender, rank i, to the receiver, rank j)

▪ Data types
▪ Basic
▪ MPI derived types
Predefined Data Types in MPI (Selection)

• Data type matching: the same type is required in the send and the receive call
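The table itself appears as an image on the slide; a selection of commonly used predefined MPI data types and the C types they correspond to:

    MPI_CHAR    ->  char
    MPI_INT     ->  int
    MPI_FLOAT   ->  float
    MPI_DOUBLE  ->  double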
MPI Blocking Point-to-Point Communication

▪ Point-to-point: one sender, one receiver


• identified by rank

▪ Blocking: After the MPI call returns,


• the source process can safely modify the send buffer
• the receive buffer (on the destination process) contains the
entire message

• This is not the “standard” definition of “blocking”


Standard Blocking Send

int MPI_Send(void *buf, int count , MPI_Datatype datatype, int dest, int tag, MPI_Comm
comm)

buf address of send buffer


count # of elements
datatype MPI data type
dest destination rank
tag message tag
comm communicator

• void* is a generic (“void”) pointer that can hold the address of any data type in C
Standard Blocking Send

int MPI_Send(void *buf, int count , MPI_Datatype datatype, int dest, int tag, MPI_Comm
comm)

buf address of send buffer


count # of elements
datatype MPI data type
dest destination rank
tag message tag
comm communicator

▪ At completion
• Send buffer can be reused as you see fit
• Status of destination is unknown - the message could be anywhere
Standard Blocking Send

int MPI_Send(void *buf, int count , MPI_Datatype datatype, int dest, int tag, MPI_Comm
comm)
Standard Blocking Receive
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag,
MPI_Comm comm, MPI_Status *status);
buf address of receive buffer
count maximum # of elements expected to be received
datatype MPI data type
source sending processor rank
tag message tag
comm communicator
status address of status object. It is a struct that you can access if
necessary to have more information on the message you just
received.

▪ At completion
  • The message has been received successfully
  • The actual message length (and, if wildcards were used, the tag and the sender) is not
    yet known directly - it can be obtained from the status object
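A minimal matched send/receive pair (a sketch assuming at least two processes in MPI_COMM_WORLD; the buffer size and the tag value 0 are arbitrary choices):

    double x[10];          /* send/receive buffer; contents filled in elsewhere */
    MPI_Status status;
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* rank 0 sends 10 doubles to rank 1, tag 0 */
        MPI_Send(x, 10, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* rank 1 receives up to 10 doubles from rank 0, tag 0 */
        MPI_Recv(x, 10, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
    }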
Source and Tag WildCards
▪ In certain cases, we might want to allow receiving messages from any sender or
with any tag.
▪ MPI_Recv accepts wildcards for the source and tag arguments:
MPI_ANY_SOURCE,MPI_ANY_TAG

▪ MPI_ANY_SOURCE and MPI_ANY_TAG indicate that a message from any source / with any
  tag will be accepted

▪ The actual source and tag values are available in the status object:
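For example (a sketch; in the C bindings the status object is a struct whose fields include MPI_SOURCE, MPI_TAG and MPI_ERROR):

    MPI_Status status;
    double buf[100];

    /* accept a message from any sender, with any tag */
    MPI_Recv(buf, 100, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

    int actual_source = status.MPI_SOURCE;   /* who actually sent the message */
    int actual_tag    = status.MPI_TAG;      /* which tag it actually carried */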
Received Message Length

▪ int MPI_Get_count(const MPI_Status *status, MPI_Datatype


datatype, int *count)

status address of status object


datatype MPI data type
count the address of the variable that will store the element count
after the function is executed

▪ Determines number of elements received

int count;
MPI_Get_count(&s,MPI_DOUBLE, &count);
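In context (a sketch continuing the wildcard receive above, where the status object s is filled in by MPI_Recv):

    MPI_Status s;
    double buf[100];
    int count;

    MPI_Recv(buf, 100, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &s);
    MPI_Get_count(&s, MPI_DOUBLE, &count);   /* count = number of doubles actually received */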
Standard Blocking Receive
Requirements for Point-to-Point Communication
▪ For a communication to succeed:
• The sender must specify a valid destination.
• The receiver must specify a valid source rank (or MPI_ANY_SOURCE).
• The communicator used by the sender and receiver must be the same (e.g.,
MPI_COMM_WORLD).
• The tags specified by the sender and receiver must match (or MPI_ANY_TAG for
receiver).
• The data types of the messages being sent and received must match.
• The receiver's buffer must be large enough to hold the received message.
Beginner’s MPI Toolbox

• MPI_Init( ): Let's get going. Initializes the MPI execution environment.


• MPI_Comm_size( ): How many are we?
• MPI_Comm_rank( ): Who am I?
• MPI_Send( ): Send data to someone else.
• MPI_Recv( ): Receive data from someone/anyone.
• MPI_Get_count( ): How many items have I received?
• MPI_Finalize( ): Finish off. Terminates the MPI execution environment.

• A send/receive buffer may safely be reused after the call has completed
• MPI_Send() must specify a definite destination rank and tag; MPI_Recv() may use
  wildcards for the source and tag
Example 1. Exchanging Data with MPI Send/Receive (Pingpong.c)
Example 1. Exchanging Data with MPI Send/Receive
• The MPI_Send() function is used to send a certain number of elements of some
  datatype to another MPI rank; this routine blocks until the send buffer can
  safely be reused
• The MPI_Recv() function is used to receive a certain number of elements of some
  datatype from another MPI rank; this routine blocks until the message has been
  received (and thus sent by the source process)
• This form of MPI communication is called ‘blocking’

• int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag,
  MPI_Comm comm);
• int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag,
  MPI_Comm comm, MPI_Status *status);
Example 1. Exchanging Data with MPI Send/Receive

▪ Spend a bit of time to really understand why source and dest take the same value
  on each rank (see the sketch below)
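The source of pingpong.c is shown on the slides only as an image; a sketch of what it likely looks like (the message contents, buffer size and tag value are assumptions). It also illustrates the point above: each process talks to one fixed partner, so source and dest carry the same value on each rank.

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv) {
        int rank, count;
        char buf[100];
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {                  /* partner is rank 1: dest == source == 1 */
            strcpy(buf, "ping");
            MPI_Send(buf, (int)strlen(buf) + 1, MPI_CHAR, 1, 17, MPI_COMM_WORLD);
            MPI_Recv(buf, 100, MPI_CHAR, 1, 17, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {           /* partner is rank 0: dest == source == 0 */
            MPI_Recv(buf, 100, MPI_CHAR, 0, 17, MPI_COMM_WORLD, &status);
            strcpy(buf, "pong");
            MPI_Send(buf, (int)strlen(buf) + 1, MPI_CHAR, 0, 17, MPI_COMM_WORLD);
        }

        if (rank == 0 || rank == 1) {
            MPI_Get_count(&status, MPI_CHAR, &count);   /* how many chars were transferred */
            printf("rank %d received %d chars from rank %d (tag %d)\n",
                   rank, count, status.MPI_SOURCE, status.MPI_TAG);
        }

        MPI_Finalize();
        return 0;
    }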
Example 1. Exchanging Data with MPI Send/Receive
• MPI_Status is a variable that includes a lot of information about the
  corresponding MPI function call
• We use the MPI_Status in our example to check how many chars we really
  transferred, using the MPI_Get_count() function
• As a simple debugging check, we can verify that the MPI_Status information
  about the source and tag of the messages corresponds to what we expect from
  our program
Example 1. Exchanging Data with MPI Send/Receive
• Load MPI into our account: module load impi
• Compile with the MPI wrapper compiler:
  • mpiicc pingpong.c -o pingpong
• Prepare the batch script submit-pingpong.sh

• Submit the job: sbatch submit-pingpong.sh


• View the results
Example 1. Extension of Scontrol Command

• scontrol show job


<job_id> used to
see the status of a
specific job ID
Summary of Beginner’s MPI Toolbox
▪ Starting up and shutting down the “parallel program” with MPI_Init() and
MPI_Finalize()
▪ MPI task (“process”) identified by rank (MPI_Comm_rank() )
▪ Number of MPI tasks: MPI_Comm_size()
▪ Startup process is very implementation dependent
▪ Simple, blocking point-to-point communication with MPI_Send() and
MPI_Recv()
• “Blocking” == buffer can be reused as soon as call returns
▪ Message matching
▪ Timing functions
Thank you very much for choosing this course!

Give us your feedback!

https://forms.gle/zDdrPGCkN7ef3UG5A

