Lecture 11 MPI Point-to-Point Communication Programming
Presenter: Liangqiong Qu
Assistant Professor
Process rank
• MPI uses objects called communicators and groups to define which collection of
processes may communicate with each other.
• Within a communicator, every process has its own unique integer identifier (rank),
assigned by the system when the process initializes.
Review of Lecture 10 --- Beginner’s MPI Toolbox
• Send/receive buffer may safely be reused after the call has completed
• MPI_Send() must specify a particular destination rank and tag; MPI_Recv() does not (it may use wildcards)
Review of Lecture 10 --- Point-to-Point Communication
▪ For a communication to succeed:
• The sender must specify a valid destination.
• The receiver must specify a valid source rank (or MPI_ANY_SOURCE).
• The communicator used by the sender and receiver must be the same
(e.g., MPI_COMM_WORLD).
• The tags specified by the sender and receiver must match (or
MPI_ANY_TAG for receiver).
• The data types of the messages being sent and received must match.
• The receiver's buffer must be large enough to hold the received message.
Outline
▪ Examples
MPI Blocking Point-to-Point Communication
▪ MPI_Send
• MPI_Send blocks until the whole message has arrived at the receiver or the whole message has
been copied into a system buffer.
• After MPI_Send returns, the send buffer is safe for reuse.
▪ MPI_Recv
• MPI_Recv blocks until the message has been received into the buffer specified by the buffer
argument.
• If no message is available, the process remains blocked until one becomes available.
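As an illustration, here is a minimal sketch of a blocking exchange between rank 0 and rank 1. The variable names and message value are made up for illustration; the lecture's own examples may differ.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Blocks until 'value' has been copied out (to the receiver
           or into a system buffer); afterwards it is safe to reuse. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocks until the message has actually arrived in 'value'. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}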
[Figure: data path of a blocking transfer through the system buffer and the network between sender and receiver]
Question: What happens with the above operation in MPI blocking point-to-point
communication?
Deadlock in MPI Blocking Point-to-Point Communication
• Two processes each initiate a blocking send to the other without posting a receive
• A deadlock occurs when two or more processes block indefinitely while each waits for the
other to act on the same set of resources (here: to post the matching receive)
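A minimal sketch of that situation, assuming a message too large for the system buffer (the buffer size N is a made-up parameter, and exactly two ranks are assumed):

#include <mpi.h>
#include <stdlib.h>

/* Deadlock sketch: both ranks post a blocking send first. For messages
 * too large for the system buffer, neither MPI_Send can complete, so
 * neither process ever reaches its MPI_Recv. */
#define N (1 << 22)

int main(int argc, char *argv[]) {
    int rank, other;
    double *sendbuf = malloc(N * sizeof(double));
    double *recvbuf = malloc(N * sizeof(double));

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;                       /* assumes exactly 2 ranks */

    MPI_Send(sendbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);  /* both ranks block here */
    MPI_Recv(recvbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);                                 /* never reached */

    MPI_Finalize();
    free(sendbuf); free(recvbuf);
    return 0;
}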
Deadlock Prevention
[Figure: code listings for Process 0 and Process 1]
To complete the first communication, both programs have to wait for 5 seconds. Then,
to complete the second communication, both have to wait an additional 6 seconds.
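One common prevention pattern, shown here as a sketch that reuses the buffers from the deadlock sketch above (it is not necessarily the exact code on the slide), is to order the blocking calls by rank, or to let MPI_Sendrecv handle both directions at once:

/* Rank-ordered send/receive: even ranks send first, odd ranks receive
 * first, so a matching receive is always posted for every send. */
if (rank % 2 == 0) {
    MPI_Send(sendbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
} else {
    MPI_Recv(recvbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(sendbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
}

/* Alternatively, MPI_Sendrecv combines both operations and lets the
 * MPI library order them internally. */
MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, other, 0,
             recvbuf, N, MPI_DOUBLE, other, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);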
Nonblocking Point-to-Point Communication
▪ Blocking point-to-point communication
• Limitations: may cause deadlock, idle time, etc
• Return only when the buffer is ready to be reused
▪ Nonblocking point-to-point communication
• A nonblocking MPI call returns immediately to the next statement without
waiting for the task to complete. This enables other work to proceed right away.
• You cannot safely reuse the send buffer until a wait/test has completed
successfully, or you otherwise know for certain that the message has been received.
Nonblocking Point-to-Point Communication
▪ Nonblocking point-to-point communication
▪ Nonblocking calls merely initiate the communication process
▪ The status of the data transfer, and the success of the communication, must be
verified at a later point in the program.
▪ The purpose of a nonblocking send is mostly to notify the system of the
existence of an outgoing message: the actual transfer might take place later.
▪ It is up to the programmer to keep the send buffer intact until it can be verified
that the message has actually been copied someplace else.
▪ In either case, before trusting the contents of the message buffer, the
programmer must check its status using the MPI_Wait or MPI_Test functions.
Standard Nonblocking Send/Receive
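The standard nonblocking send and receive have the following C signatures; both return immediately and hand back a request handle that is later passed to a wait/test call:

int MPI_Isend(const void *buf, int count, MPI_Datatype datatype,
              int dest, int tag, MPI_Comm comm, MPI_Request *request);

int MPI_Irecv(void *buf, int count, MPI_Datatype datatype,
              int source, int tag, MPI_Comm comm, MPI_Request *request);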
• MPI_Wait: you hang around the counter until your order number is definitely
ready. (Maybe your friends won't talk to you until you bring them some food!)
• MPI_Test: you go to the counter and ask whether your order number is ready. (If it
isn't, you can go back to your table and talk some more with your friends.)
MPI_Wait
• Waiting forces the process into "blocking mode": the sending process simply
waits for the request to finish. If a process calls MPI_Wait right after MPI_Isend, the send
behaves the same as calling MPI_Send.
Figure 1. Inserting the Wait call immediately after the Isend call emulates a blocking send, and the extra
call doesn't even add very much overhead.
• There are three ways to wait: MPI_Wait, MPI_Waitany, and MPI_Waitall. They can be
called as follows:
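Their C signatures are:

int MPI_Wait(MPI_Request *request, MPI_Status *status);

int MPI_Waitany(int count, MPI_Request array_of_requests[],
                int *index, MPI_Status *status);

int MPI_Waitall(int count, MPI_Request array_of_requests[],
                MPI_Status array_of_statuses[]);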
• MPI_Wait just waits for the completion of the given request (single request). As soon
as the request is complete an instance of MPI_Status is returned in status.
• MPI can handle multiple communication requests
• MPI_Waitany waits for the first completed request in an array of requests before
continuing. As soon as a request completes, index is set to the position of the
completed request within array_of_requests
• MPI_Waitall waits for the completion of an array of requests; it returns only after all
provided requests have completed
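For example, a sketch (a fragment from inside an MPI program; buffer names and message sizes are illustrative) of waiting on several outstanding receives at once:

/* Post several nonblocking receives, then wait for all of them. */
MPI_Request reqs[3];
MPI_Status  stats[3];
double buf0[10], buf1[10], buf2[10];

MPI_Irecv(buf0, 10, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &reqs[0]);
MPI_Irecv(buf1, 10, MPI_DOUBLE, 2, 0, MPI_COMM_WORLD, &reqs[1]);
MPI_Irecv(buf2, 10, MPI_DOUBLE, 3, 0, MPI_COMM_WORLD, &reqs[2]);

/* ... other work can proceed while the messages are in flight ... */

MPI_Waitall(3, reqs, stats);   /* returns only when all three have completed */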
MPI_Test
• flag: variable of type int used to test for completion. It tells whether the request had completed
at the time of the test. If flag != 0, the request has been completed.
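Its C signature, and a sketch of polling while doing other work (the request is assumed to come from an earlier MPI_Isend/MPI_Irecv, and do_other_work() is a hypothetical helper used only for illustration):

int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status);

/* Poll the request and keep computing until it completes. */
int flag = 0;
MPI_Status status;
while (!flag) {
    MPI_Test(&request, &flag, &status);  /* flag != 0 once the request completes */
    if (!flag)
        do_other_work();                 /* hypothetical helper */
}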
Example of Nonblocking Point-to-Point Communication
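The slide's code is not reproduced here; a minimal sketch of the same idea, with two ranks exchanging a value via MPI_Isend/MPI_Irecv and completing with MPI_Waitall (variable names are illustrative), might look like this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, other, sendval, recvval;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other   = 1 - rank;          /* assumes exactly 2 ranks */
    sendval = rank + 100;

    /* Both calls return immediately; the transfers proceed "in the background". */
    MPI_Irecv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Other work could be overlapped here. */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("Rank %d received %d\n", rank, recvval);

    MPI_Finalize();
    return 0;
}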
Nonblocking Point-to-Point Communication
• Standard nonblocking send/recv MPI_Isend()/MPI_Irecv()
• Return of call does not imply completion of operation
• Use MPI_Wait*() / MPI_Test*() to check for completion using request handles
• Potential benefits
• Overlapping of communication with work (not guaranteed by MPI standard)
• Overlapping send and receive
• Avoiding synchronization and idle times
▪ double MPI_Wtime();
Returns the current wall-clock time stamp, in seconds
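A typical usage sketch (a fragment for timing a code region; requires stdio.h for the printf):

double t_start, t_end;

t_start = MPI_Wtime();            /* wall-clock time in seconds */
/* ... code region to be timed ... */
t_end = MPI_Wtime();

printf("Elapsed time: %f seconds\n", t_end - t_start);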
• Split up interval [a, b] into equal disjoint chunks and compute partial results in parallel
Example 2. Parallel Integration in MPI
Task: calculate the integral over [a, b] in parallel,
using 4 processes; let a = 0, b = 2
• Split up interval [a, b] into equal
disjoint chunks
• Compute partial results in parallel
• Collect global sum at rank 0
Example 2. Parallel Integration in MPI
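A sketch of this master-worker scheme, assuming a midpoint rule and a placeholder integrand f(x) = x² (the lecture's actual code and integrand may differ):

#include <mpi.h>
#include <stdio.h>

/* Placeholder integrand -- the lecture's actual f(x) may differ. */
static double f(double x) { return x * x; }

int main(int argc, char *argv[]) {
    int rank, size, i, n = 100000;           /* points per local chunk */
    double a = 0.0, b = 2.0;
    double local_a, local_b, x, res = 0.0, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Split [a, b] into 'size' equal disjoint chunks. */
    local_a = a + rank * (b - a) / size;
    local_b = local_a + (b - a) / size;

    /* Midpoint rule on the local chunk: the partial result. */
    for (i = 0; i < n; i++) {
        x = local_a + (i + 0.5) * (local_b - local_a) / n;
        res += f(x) * (local_b - local_a) / n;
    }

    if (rank == 0) {
        sum = res;                            /* master's own partial result */
        for (i = 1; i < size; i++) {          /* one receive at a time */
            MPI_Recv(&res, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += res;
        }
        printf("Integral over [%g, %g] = %f\n", a, b, sum);
    } else {
        MPI_Send(&res, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}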
Remarks on Parallel Integration Example
• Gathering results from processes is a very common task in MPI - there are more
efficient and elegant ways to do this (covered in future lectures).
• This is a reduction operation (summation). There are more efficient and elegant ways
to do this (covered in future lectures).
• The “master” process waits for one receive operation to be completed before the next
one is initiated. There are more efficient ways... You guessed it!
• “Master-worker” schemes are quite common in MPI programming but scalability
to high process counts may be limited.
• Every process has its own res variable, but only the master process actually uses it;
it is typical for MPI codes to use more memory than actually needed.
Thank You!