
P2P Communication

Lecture 4
Jan 15, 2025
MPI Program Execution

[Figure: four hosts (host1, host2, host3, host4); each node (also called compute node, host, system, or machine) has its own memory; communication within a node is intranode, between nodes internode]

mpiexec -n 8 -hosts host1,host2,host3,host4 ./exe
Execution on Beowulf/Unmanaged Cluster

[Figure: the four hosts host1-host4; how much load is there on each node?]

Network and Load-Aware Resource Manager for MPI Programs
https://dl.acm.org/doi/10.1145/3409390.3409406
MPI Program Execution: hostfile

[Figure: the same four-host setup, each node with its own memory, intranode and internode communication]

hostfile:
cn023
cn024
cn025
cn026
cn023
cn024
cn025
cn026

mpiexec -n 8 -hosts host1,host2,host3,host4 ./exe
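Process placement can also be steered from the mpiexec command line. With MPICH's Hydra launcher, for example, the -ppn option sets the number of processes launched per node (the value 2 here is illustrative):

mpiexec -n 8 -ppn 2 -f hostfile ./exe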
Homework

Analyze the communication endpoints for the optimized algorithm of parallel sum, for round-robin and sequential placement of 8 and 16 processes.
Process Placement - Parallel Sum (Optimized)

Sequential placement:
  host1: 0, 1    host2: 2, 3    host3: 4, 5    host4: 6, 7

Round-robin placement:
  host1: 0, 4    host2: 1, 5    host3: 2, 6    host4: 3, 7
Parallel Sum (Optimized) on 4 Processes

Communication step 1: 1 -> 0, 3 -> 2
Communication step 2: 2 -> 0

Sequential placement:
  host1: 0, 1    host2: 2, 3

Round-robin placement:
  host1: 0, 2    host2: 1, 3
Number of Hops on 4 Processes

Communication step 1: 1 -> 0, 3 -> 2
Communication step 2: 2 -> 0

Sequential placement (host1: 0, 1; host2: 2, 3):
  Communication step 1 #hops: 0, 0   Max: 0
  Communication step 2 #hops: 1      Max: 1
  Sum: 1

Round-robin placement (host1: 0, 2; host2: 1, 3):
  Communication step 1 #hops: 1, 1   Max: 1
  Communication step 2 #hops: 0      Max: 0
  Sum: 1
Homework: Analyze #Hops

Analyze the number of hops on hosts host1-host4 for np = 8 and 16 with ppn = 4, for sequential placement vs. round-robin placement.
CSE Lab Beowulf Cluster
• ~30 nodes connected via Ethernet
• Each node has 12/8/4 cores
  • Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
• NFS filesystem
  • Your home directories are NFS-mounted on all nodes
• Login with CSE login credentials to any machine (IP address range 172.27.19.1 – 172.27.19.30)
  • It's possible that some machines are not reachable/usable; try some other IP
MPI Installation on CSE Cluster

Install MPICH 4.2.3 (https://www.mpich.org/static/downloads/4.2.3/) in your home directory (from any node)
• Download mpich-4.2.3.tar.gz
• Follow the installation instructions from https://www.mpich.org/static/downloads/4.2.3/mpich-4.2.3-installguide.pdf
• DO NOT use /tmp
• If mpirun is already installed locally on the system, do not use that node to install (check using `which mpirun`)
• Verify after installation that `which mpirun` from any node points to your installation
MPI Installation – BYO Cluster

Install MPICH 4.2.3 (https://www.mpich.org/static/downloads/4.2.3/) in a directory that has the same path on all your systems of interest
• Download mpich-4.2.3.tar.gz
• Follow the installation instructions from https://www.mpich.org/static/downloads/4.2.3/mpich-4.2.3-installguide.pdf
• Use the same installation path on all systems (e.g., /home/test)
• Verify after installation that `which mpirun` from any node points to your installation
• Create a user name and enable passwordless ssh (ssh-keygen)
CSE Lab Cluster
• Enable passwordless ssh (ssh-keygen)
• ssh csewsX (from any csews*) passwordlessly
• for i in `seq 1 20`; do ssh csews$i uptime ; done
• Answer yes to "Are you sure you want to continue connecting?" on the first connection
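After generating a key with ssh-keygen, the public key still has to be added to ~/.ssh/authorized_keys on the target nodes; one common way (hostname illustrative) is:

ssh-copy-id csews5

Because home directories are NFS-mounted on all nodes, doing this once typically enables passwordless ssh to every node.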
MPI Reference Material
• Marc Snir, Steve W. Otto, Steven Huss-Lederman, David W. Walker, and Jack Dongarra, MPI – The Complete Reference, Second Edition, Volume 1: The MPI Core.
• William Gropp, Ewing Lusk, and Anthony Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, 3rd Ed., MIT Press, 2014.
• MPI 4.1 Standard: https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf
P2P/Direct Communication

Blocking send and receive (the tags must match):

SENDER:
int MPI_Send (const void *buf, int count, MPI_Datatype datatype,
              int dest, int tag, MPI_Comm comm)

RECEIVER:
int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source,
              int tag, MPI_Comm comm, MPI_Status *status)
MPI_Send Parameters

buf        initial address of send buffer (choice)
count      number of elements in send buffer (non-negative integer)
datatype   datatype of each send buffer element (handle)
dest       rank of destination (integer)
tag        message tag (integer)
comm       communicator (handle)

https://www.mpich.org/static/docs/latest/www3/MPI_Send.html
MPI Data Types
• MPI_BYTE
• MPI_CHAR
• MPI_INT
• MPI_FLOAT
• MPI_DOUBLE

Example

int MPI_Send (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

Send 1 INT from rank 0 to rank 1:

// Initialization

if (myrank == 0)
    MPI_Send (buf, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
else if (myrank == 1)
    MPI_Recv (buf, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
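A minimal complete program along these lines might look like the following sketch (the file name, buffer value, and output format are illustrative, not from the slides):

/* sendint.c - sketch: send one int from rank 0 to rank 1 */
#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int myrank, buf = 0;
    MPI_Status status;

    MPI_Init (&argc, &argv);                     /* initialization */
    MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

    if (myrank == 0) {
        buf = 42;                                /* arbitrary value to send */
        MPI_Send (&buf, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
    } else if (myrank == 1) {
        MPI_Recv (&buf, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
        printf ("rank %d received %d\n", myrank, buf);
    }

    MPI_Finalize ();
    return 0;
}

Compile and run with mpicc -o sendint sendint.c and mpiexec -np 2 ./sendint.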
Code

[Slide shows the full source code for this example]
Executing MPI programs

• Check `which mpicc` and `which mpiexec`
• Update the PATH environment variable
• Compile
  • mpicc -o filename filename.c
• Run
  • mpiexec -np 4 -f hostfile filename   [Often "No such file" error]
  • mpiexec -np 4 -f hostfile ./filename
Simple Send/Recv Code (sendmessage.c)

[Code screenshot: no runtime or compile-time error]

[Code screenshot: a variant that produces a runtime error]
Message Size

Sender: message of 13 bytes.  Receiver: message (buffer) of 10 bytes.

Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(200)...........................: MPI_Recv(buf=0x7ffccc37c610, count=10, MPI_CHAR, src=0, tag=99, MPI_COMM_WORLD, status=0x7ffccc37c5d0) failed
MPIDI_CH3_PktHandler_EagerShortSend(363): Message from rank 0 and tag 99 truncated; 13 bytes received but buffer size is 10
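A sketch of the mismatch behind this error (the 13-byte string, tag 99, and 10-byte receive buffer match the error output above; the rest of the scaffolding is illustrative):

/* truncate.c - sketch: receive buffer smaller than the incoming message */
#include <mpi.h>
#include <string.h>

int main (int argc, char *argv[])
{
    int myrank;
    char smsg[] = "Hello, there";   /* 12 chars + '\0' = 13 bytes */
    char rmsg[10];                  /* only 10 bytes of receive space */
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

    if (myrank == 0)
        MPI_Send (smsg, strlen (smsg) + 1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
    else if (myrank == 1)
        /* count = 10, but 13 bytes arrive -> "Message truncated" fatal error */
        MPI_Recv (rmsg, 10, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);

    MPI_Finalize ();
    return 0;
}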
[Code screenshot: no runtime or compile-time error]

Simple Send/Recv Code (sendmessage.c)

Output: received : Hello, there
Output

0 7 0
Received: Welcome
1 0 7

Output for 4 Processes

0 7 0
Received: Welcome
1 0 7
2 0 0
3 0 0
mpirun -np 2 ./send

0 12 0
Multiple Sends and Receives

if (myrank == 0)
    MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD),
    MPI_Send (buf, count, MPI_INT, 1, 2, MPI_COMM_WORLD);
else if (myrank == 1)
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status),
    MPI_Recv (buf, count, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);

printf ("%d %d\n", myrank, count);

$ mpirun -np 2 ./send 10
0 10
1 10
Multiple Sends and Receives

if (myrank == 0)
    MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD),
    MPI_Send (buf, count, MPI_INT, 1, 2, MPI_COMM_WORLD);
else if (myrank == 1)
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status),
    MPI_Recv (buf, count, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);

printf ("%d %d\n", myrank, count);

$ mpirun -np 4 ./send 10
0 10
1 10
2 10
3 10

(Ranks 2 and 3 do not communicate, but they still reach the printf.)
Multiple Sends and Receives

if (myrank == 0)
    MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD),
    MPI_Send (buf, count, MPI_INT, 1, 2, MPI_COMM_WORLD);
else if (myrank == 1)
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status),
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);

printf ("%d %d\n", myrank, count);

$ mpirun -np 2 ./send 10
0 10

(Rank 1's second receive waits for another message with tag 1, but the second send uses tag 2, so rank 1 never reaches the printf.)
Send and Receive

if (myrank == 0)
    MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD);
else if (myrank == 1)
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);

printf ("%d %d\n", myrank, count);

$ mpirun -np 2 ./send 10
0 10
1 10
Multiple Sends and Receives

if (myrank == 0)
    MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD),
    MPI_Send (buf, count, MPI_INT, 1, 2, MPI_COMM_WORLD);
else if (myrank == 1)
    MPI_Recv (buf, count, MPI_INT, 0, 2, MPI_COMM_WORLD, &status),
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);

printf ("%d %d\n", myrank, count);

$ mpirun -np 2 ./send 10
0 10
1 10

(The receives are posted in the opposite tag order, yet both ranks complete: the small messages can be buffered, so both sends return and each receive eventually finds its matching message.)
MPI_Send (Blocking, Standard Mode)

• Does not return until the send buffer can be reused
  • Message buffering can affect this
  • Implementation-dependent

[Figure: sender and receiver]
Buffering

[Figure: message buffering; source: Cray presentation]
Multiple Sends and Receives

if (myrank == 0)
    MPI_Send (buf, count, MPI_INT, 1, 1, MPI_COMM_WORLD),
    MPI_Send (buf, count, MPI_INT, 1, 2, MPI_COMM_WORLD);
else if (myrank == 1)
    MPI_Recv (buf, count, MPI_INT, 0, 2, MPI_COMM_WORLD, &status),
    MPI_Recv (buf, count, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);

printf ("%d %d\n", myrank, count);

$ mpirun -np 2 ./send 1000000

(No output: the 1,000,000-element messages are too large to be buffered, so rank 0's first send, with tag 1, blocks waiting for a matching receive while rank 1 is waiting first for tag 2, and the program deadlocks.)
Eager vs. Rendezvous Protocol
• Eager
  • Send completes without acknowledgement from the destination
  • MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE (check the output of mpivars)
  • Small messages, typically 128 KB (at least in MPICH)
• Rendezvous
  • Requires an acknowledgement from a matching receive
  • Large messages
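MPICH CVARs can usually also be set as environment variables of the same name, so one way to experiment with the protocol switch (this assumes the ch3 device named in the CVAR above; the threshold value is illustrative) is:

$ MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE=1024 mpiexec -np 2 ./send 1000000

which would force messages larger than 1024 bytes onto the rendezvous path.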
MPI_Status

int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

The status object carries:
• Source rank
• Message tag
• Message length
• MPI_Get_count

typedef struct _MPI_Status {
    int count;
    int cancelled;
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} MPI_Status, *PMPI_Status;
MPI_Get_count (status.c)

status.MPI_SOURCE
status.MPI_TAG

Output: Rank 1 of 2 received 100 elements from 0
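The status.c program itself appears only as a screenshot; a minimal sketch that would produce output like the line above (the element count, tag, and payload of the real program are not shown, so these values are illustrative) is:

/* status.c-style sketch: query source, tag, and element count from MPI_Status */
#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int myrank, nprocs, count, i;
    int buf[100];
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &myrank);
    MPI_Comm_size (MPI_COMM_WORLD, &nprocs);

    if (myrank == 0) {
        for (i = 0; i < 100; i++)
            buf[i] = i;                              /* arbitrary payload */
        MPI_Send (buf, 100, MPI_INT, 1, 7, MPI_COMM_WORLD);
    } else if (myrank == 1) {
        MPI_Recv (buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                  MPI_COMM_WORLD, &status);
        MPI_Get_count (&status, MPI_INT, &count);    /* elements actually received */
        printf ("Rank %d of %d received %d elements from %d\n",
                myrank, nprocs, count, status.MPI_SOURCE);
    }

    MPI_Finalize ();
    return 0;
}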
Communication – Message Passing

[Figure: message passing between Process 0 and Process 1]
Timing Send/Recv (timingSend.c)

[Slide shows the timing code]
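timingSend.c is shown only as a screenshot; a minimal sketch of timing a blocking send/receive pair with MPI_Wtime (the message size and output format are illustrative) is:

/* timingSend.c-style sketch: time a blocking Send/Recv with MPI_Wtime */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main (int argc, char *argv[])
{
    int myrank;
    static int buf[N];                       /* zero-initialized payload */
    double t0, t1;
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

    t0 = MPI_Wtime ();
    if (myrank == 0)
        MPI_Send (buf, N, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (myrank == 1)
        MPI_Recv (buf, N, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    t1 = MPI_Wtime ();

    printf ("rank %d: %f seconds\n", myrank, t1 - t0);
    MPI_Finalize ();
    return 0;
}

Each rank reports only its own elapsed time; the sender may return as soon as its buffer is reusable, so the two times need not be equal.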
Timing Output

[Slide shows the measured times]

What is the total time?
