CSC4005 Tutorial 3
Parallel Programming
Tutorial 3
・Non-blocking communication
Non-blocking communication is done using MPI_Isend() and MPI_Irecv().
These functions return immediately (i.e., they do not block) even if the
communication is not finished yet. You must call MPI_Wait() or MPI_Test() to
see whether the communication has finished.
Blocking Point-wise Communication
MPI_Send(
void* data,
int count,
MPI_Datatype datatype,
int destination,
int tag,
MPI_Comm communicator);
Blocking Point-wise Communication
MPI_Recv(
void* data,
int count,
MPI_Datatype datatype,
int source,
int tag,
MPI_Comm communicator,
MPI_Status* status);
MPI Supported Datatypes
Each C type used in a message maps to an MPI datatype, e.g. MPI_CHAR
(char), MPI_SHORT (short), MPI_INT (int), MPI_LONG (long), MPI_FLOAT
(float), MPI_DOUBLE (double), and MPI_BYTE for untyped raw bytes.
Blocking Point-wise Communication
Example 1
// Find out rank, size
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
int number;
if (world_rank == 0) {
    number = -1;
    // Rank 0 sends one int to rank 1 with tag 0.
    MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (world_rank == 1) {
    // Rank 1 blocks here until the matching message arrives.
    MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Process 1 received number %d from process 0\n", number);
}
Blocking Point-wise Communication
Example 2
// rank and size are obtained as in Example 1; data is an int.
if (0 == rank) {
    data = 1999'12'08;   // C++14 digit separators: the int 19991208
    for (int i = 1; i < size; ++i) {
        data = data + i;   // each destination rank gets a different value
        std::cout << "sending " << data << " to " << i << std::endl;
        MPI_Send(&data, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
    }
} else {
    MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::cout << "received " << data << " at " << rank << std::endl;
}
Blocking Point-wise Communication
Example 2 Output
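With three processes, the code above prints lines like the following
(the interleaving between ranks varies from run to run):
sending 19991209 to 1
sending 19991211 to 2
received 19991209 at 1
received 19991211 at 2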
Blocking Point-wise Communication
int MPI_Sendrecv(
const void *sendbuf,
int sendcount,
MPI_Datatype sendtype,
int dest,
int sendtag,
void *recvbuf,
int recvcount,
MPI_Datatype recvtype,
int source,
int recvtag,
MPI_Comm comm,
MPI_Status *status);
Blocking Point-wise Communication
Example 3
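MPI_Sendrecv combines a send and a receive in one call, which avoids
the deadlock that can occur when every rank does a blocking send
first. As a minimal sketch (assuming a ring exchange, which may
differ from the original example): every rank passes its own rank
number to its right neighbor and receives from its left neighbor.
int right = (rank + 1) % size;
int left = (rank + size - 1) % size;
int send_val = rank, recv_val = -1;
// One call posts both the send (to the right) and the receive (from the left).
MPI_Sendrecv(&send_val, 1, MPI_INT, right, 0,
             &recv_val, 1, MPI_INT, left, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
std::cout << "rank " << rank << " received " << recv_val
          << " from " << left << std::endl;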
Blocking Point-wise Communication
Example 3 Output
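With four processes, the ring sketch above prints lines like these
(order varies):
rank 0 received 3 from 3
rank 1 received 0 from 0
rank 2 received 1 from 1
rank 3 received 2 from 2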
Probing
MPI_Get_count(
MPI_Status* status,
MPI_Datatype datatype,
int* count);
MPI_Probe(
int source,
int tag,
MPI_Comm comm,
MPI_Status* status);
MPI_Status
If we pass an MPI_Status structure to the MPI_Recv function, it
will be populated with additional information about the receive
operation after it completes. The three primary pieces of information
are:
・The rank of the sender, stored in the structure's MPI_SOURCE field.
・The tag of the message, stored in the MPI_TAG field.
・The length of the message, retrieved by passing the status to
MPI_Get_count.
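MPI_Probe and the status fields combine into a common pattern (a
minimal sketch, assuming rank 0 sends a variable-length MPI_INT
message with tag 0, and that <vector> is included): probe for the
message, read its size from the status, allocate, then receive.
MPI_Status status;
// Block until a message from rank 0 with tag 0 is pending, without receiving it.
MPI_Probe(0, 0, MPI_COMM_WORLD, &status);
// Ask the status how many MPI_INT elements the pending message holds.
int count;
MPI_Get_count(&status, MPI_INT, &count);
// Allocate exactly enough space, then perform the actual receive.
std::vector<int> buf(count);
MPI_Recv(buf.data(), count, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
std::cout << "received " << count << " ints from rank "
          << status.MPI_SOURCE << std::endl;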
Immediate (Non-blocking) Point-wise
Communication
MPI_Wait blocks until the non-blocking operation identified by the
request has completed:
int MPI_Wait(
MPI_Request *request,
MPI_Status *status);
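The non-blocking counterparts of MPI_Send and MPI_Recv take the same
arguments plus a trailing MPI_Request handle, which MPI_Wait and
MPI_Test later consume:
MPI_Isend(
void* data,
int count,
MPI_Datatype datatype,
int destination,
int tag,
MPI_Comm communicator,
MPI_Request* request);
MPI_Irecv(
void* data,
int count,
MPI_Datatype datatype,
int source,
int tag,
MPI_Comm communicator,
MPI_Request* request);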
MPI_Test
MPI_Test is the nonblocking counterpart to MPI_Wait. Instead of
blocking until the specified message is complete, this function returns
immediately with a flag that says whether the requested message is complete
(true) or not (false). MPI_Test is basically a safe polling mechanism, and this
means we can again emulate blocking behavior by executing MPI_Test inside
of a while-loop.
int MPI_Test(
MPI_Request *request,
int *flag,
MPI_Status *status);
Immediate (Non-blocking) Point-wise
Communication Example
MPI_Request send_req, recv_req;
// Both calls return immediately; the transfers complete in the background.
MPI_Isend(send.data(), size, MPI_INT, target, 0, MPI_COMM_WORLD, &send_req);
MPI_Irecv(recv.data(), size, MPI_INT, source, 0, MPI_COMM_WORLD, &recv_req);
// Poll both requests until both have finished; a pair of MPI_Wait calls
// would achieve the same without busy-waiting.
int sflag = 0, rflag = 0;
do {
    MPI_Test(&send_req, &sflag, MPI_STATUS_IGNORE);
    MPI_Test(&recv_req, &rflag, MPI_STATUS_IGNORE);
} while (!sflag || !rflag);
Broadcast
A broadcast is one of the standard
collective communication techniques.
During a broadcast, one process sends the
same data to all processes in a
communicator. One of the main uses of
broadcasting is to send out user input to a
parallel program, or send out configuration
parameters to all processes.
For example, with process zero as the
root process holding the initial copy of the
data, a broadcast leaves every other
process with its own copy of that data.
Broadcast
MPI_Bcast(
void* data,
int count,
MPI_Datatype datatype,
int root,
MPI_Comm communicator);
Broadcast Example
#include <iostream>
#include <mpi.h>
int main(int argc, char **argv) {
    int data;
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (0 == rank) {
        std::cin >> data;   // only the root reads the input
    }
    // Every rank calls MPI_Bcast; afterwards, all ranks hold the value.
    MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);
    std::cout << data << std::endl;
    MPI_Finalize();
    return 0;
}
MPI_Scatter
MPI_Scatter is a collective routine
that is very similar to MPI_Bcast.
MPI_Scatter involves a designated root
process sending data to all processes in a
communicator.
The primary difference between
MPI_Bcast and MPI_Scatter is small but
important. MPI_Bcast sends the same
piece of data to all processes while
MPI_Scatter sends chunks of an array to
different processes.
MPI_Scatter
MPI_Scatter(
void* send_data,
int send_count,
MPI_Datatype send_datatype,
void* recv_data,
int recv_count,
MPI_Datatype recv_datatype,
int root,
MPI_Comm communicator);
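A minimal sketch (the names send, mine, rank, and size are
illustrative): note that send_count is the number of elements sent to
each process, not the total. Here rank 0 scatters one int to every
rank.
std::vector<int> send(size);           // contents only matter on the root
if (rank == 0)
    for (int i = 0; i < size; ++i) send[i] = i * i;
int mine = 0;
MPI_Scatter(send.data(), 1, MPI_INT,   // 1 element goes to EACH process
            &mine, 1, MPI_INT,         // 1 element arrives per process
            0, MPI_COMM_WORLD);
std::cout << "rank " << rank << " got " << mine << std::endl;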
MPI_Gather
MPI_Gather is the inverse of MPI_Scatter.
Instead of spreading elements from one process to
many processes, MPI_Gather takes elements from
many processes and gathers them to one single
process. This routine is highly useful to many parallel
algorithms, such as parallel sorting and searching.
Similar to MPI_Scatter, MPI_Gather takes
elements from each process and gathers them to the
root process. The elements are ordered by the rank
of the process from which they were received.
MPI_Gather
MPI_Gather(
void* send_data,
int send_count,
MPI_Datatype send_datatype,
void* recv_data,
int recv_count,
MPI_Datatype recv_datatype,
int root,
MPI_Comm communicator);
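A minimal sketch of the reverse direction (again with illustrative
names; the receive buffer only has to be valid on the root): every
rank contributes its rank number, and rank 0 collects them in rank
order.
int mine = rank;
std::vector<int> all;
if (rank == 0) all.resize(size);       // only the root needs the buffer
MPI_Gather(&mine, 1, MPI_INT,          // each rank sends one int
           all.data(), 1, MPI_INT,     // recv_count is per process, not total
           0, MPI_COMM_WORLD);
if (rank == 0)
    for (int v : all) std::cout << v << ' ';   // prints 0 1 2 ... size-1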
MPI_Allgather
Given a set of elements distributed across
all processes, MPI_Allgather will gather all of the
elements to all the processes. In the most basic
sense, MPI_Allgather is an MPI_Gather followed
by an MPI_Bcast.
Just like MPI_Gather, the elements from
each process are gathered in order of their rank,
except this time the elements are gathered to all
processes. The function declaration for
MPI_Allgather is almost identical to MPI_Gather
with the difference that there is no root process
in MPI_Allgather.
MPI_Allgather
MPI_Allgather(
void* send_data,
int send_count,
MPI_Datatype send_datatype,
void* recv_data,
int recv_count,
MPI_Datatype recv_datatype,
MPI_Comm communicator);
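A minimal sketch (the per-rank value is illustrative): unlike
MPI_Gather, every rank supplies a full-size receive buffer and every
rank ends up with the complete, rank-ordered array.
int mine = rank * 10;                  // illustrative per-rank value
std::vector<int> all(size);            // every rank needs the full buffer
MPI_Allgather(&mine, 1, MPI_INT, all.data(), 1, MPI_INT, MPI_COMM_WORLD);
// Every rank now holds {0, 10, 20, ...} without a separate broadcast.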
MPI_Barrier
MPI_Barrier blocks each calling process until every process in the
communicator has entered the barrier.
MPI_Barrier(MPI_COMM_WORLD);
MPI_Barrier
{
    // Without a barrier each rank sleeps rank * 100 ms before printing,
    // so the lines appear staggered (needs <thread> and <chrono>).
    std::this_thread::sleep_for(std::chrono::milliseconds{100} * rank);
    std::cout << "hi (no bar)" << std::endl;
}
MPI_Barrier(MPI_COMM_WORLD);
{
    std::this_thread::sleep_for(std::chrono::milliseconds{100} * rank);
    // With a barrier nobody prints until the slowest rank has slept,
    // so all "hi (bar)" lines appear together at the end.
    MPI_Barrier(MPI_COMM_WORLD);
    std::cout << "hi (bar)" << std::endl;
}
More Information
・Reduce:
https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/
・Group Division:
https://mpitutorial.com/tutorials/introduction-to-groups-and-communicators/
Debugging: Stacktrace
With Open MPI, for example, mpirun can abort a hung job after a
timeout and print the stack trace of every process:
mpirun --timeout 5 --get-stack-traces ./main
Pass in Arguments
#include <iostream>
using namespace std;
int main(int argc, char *argv[]) {
    // illustrative body: echo each command-line argument
    for (int i = 0; i < argc; ++i)
        cout << argv[i] << endl;
    return 0;
}
・Input:
./main CSC4005 Parallel Programming
Pass in Arguments
・argc (ARGument Count) is an int and stores the number of command-
line arguments passed by the user, including the name of the
program. The value of argc is nonnegative.
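For the input above, argc is 4: argv[0] is "./main", argv[1] is
"CSC4005", argv[2] is "Parallel", and argv[3] is "Programming".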