
Parallel and Distributed Programming
Dr. Muhammad Naveed Akhtar
Lecture – 04a
Distributed Memory Programming with MPI



Roadmap
• Writing your first MPI program.
• Using the common MPI functions.
• The Trapezoidal Rule in MPI.
• Collective communication.
• MPI derived datatypes.
• Performance evaluation of MPI programs.
• Parallel sorting.
• Safety in MPI programs.



Distributed and Shared Memory Systems

Shared Memory System

Distributed Memory System



Hello World!

Identifying MPI processes


• Common practice to identify processes by nonnegative integer ranks.
• p processes are numbered 0, 1, 2, …, p-1



Our first MPI program
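A minimal sketch of what such a first program typically looks like; the buffer size and tag value are illustrative, and the greeting text matches the sample output shown on the next slide:

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(void) {
    char greeting[100];   /* message buffer (size is illustrative)      */
    int  comm_sz;         /* number of processes in MPI_COMM_WORLD      */
    int  my_rank;         /* rank of this process                       */

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank != 0) {
        /* Every process except 0 builds a greeting and sends it to 0.  */
        sprintf(greeting, "Greetings from process %d of %d !", my_rank, comm_sz);
        MPI_Send(greeting, strlen(greeting) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        /* Process 0 prints its own greeting, then the others' in rank order. */
        printf("Greetings from process %d of %d !\n", my_rank, comm_sz);
        for (int q = 1; q < comm_sz; q++) {
            MPI_Recv(greeting, 100, MPI_CHAR, q, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", greeting);
        }
    }

    MPI_Finalize();
    return 0;
}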



MPI Program Compilation and Execution
Compile
mpicc -g -Wall -o mpi_hello mpi_hello.c
• mpicc : wrapper script to compile
• -g : produce debugging information
• -Wall : turn on all warnings
• -o mpi_hello : create this executable file name
• mpi_hello.c : source file

Execute
mpiexec -n <np> <executable>
• mpiexec : wrapper script to execute
• -n <np> : specify the number of processes
• <executable> : executable file

Execute with 4 processes in parallel:
mpiexec -n 4 ./mpi_hello
Greetings from process 0 of 4 !
Greetings from process 1 of 4 !
Greetings from process 2 of 4 !
Greetings from process 3 of 4 !

Execute with only 1 process:
mpiexec -n 1 ./mpi_hello
Greetings from process 0 of 1 !
MPI Program Structure
• Written in C.
• Has a main function.
• Uses stdio.h, string.h, etc.

• Needs to include the <mpi.h> header file.


• Identifiers defined by MPI start with “MPI_”.
• First letter following underscore is uppercase.
• For function names and MPI-defined types.
• Helps to avoid confusion.



MPI Components
• MPI_Init
• Tells MPI to do all the necessary setup.

• MPI_Finalize
• Tells MPI we’re done, so clean up anything allocated for this program.
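
For reference, the two prototypes; no other MPI function should be called before MPI_Init or after MPI_Finalize:

int MPI_Init(int* argc_p, char*** argv_p);  /* usually MPI_Init(&argc, &argv), or MPI_Init(NULL, NULL) */
int MPI_Finalize(void);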



MPI Communicators
• A collection of processes that can send messages to each other.
• MPI_Init defines a communicator that consists of all the processes created when the program is
started.
• Called MPI_COMM_WORLD.

• MPI_Comm_size returns the number of processes in the communicator.
• MPI_Comm_rank returns my rank (the rank of the process making this call).



SPMD and Data Types
• SPMD – Single-Program Multiple-Data
• We compile one program.
• Process 0 does something different.
• It receives messages and prints them while the other processes do the work.

• The if-else construct makes our program SPMD.

• For communication we need a datatype: a message travels as a sequence of 0s and 1s, so MPI must be told what type of data is being communicated (e.g., characters, ints, doubles).



Communication (Send / Receive)
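
A sketch of the two prototypes; the parameter names follow the convention used on the next slide rather than the MPI standard's own names:

int MPI_Send(void* send_buf_p, int send_buf_sz, MPI_Datatype send_type,
             int dest, int send_tag, MPI_Comm send_comm);

int MPI_Recv(void* recv_buf_p, int recv_buf_sz, MPI_Datatype recv_type,
             int src, int recv_tag, MPI_Comm recv_comm, MPI_Status* status_p);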



Message Matching (Send / Receive)
MPI_Send(send_buf_p, send_buf_sz, send_type, dest, send_tag, send_comm);

MPI_Recv(recv_buf_p, recv_buf_sz, recv_type, src, recv_tag, recv_comm, &status);

• The first three arguments define the data being communicated.
• Suppose process q executes the MPI_Send and process r executes the MPI_Recv (q and r are rank numbers). The receive matches the send when:
• dest = r and src = q
• recv_comm = send_comm
• recv_tag = send_tag
• recv_type = send_type
• recv_buf_sz ≥ send_buf_sz



Receiving Messages (Incomplete Information)
• A receiver can get a message without knowing:
• The amount of data in the message (send_buf_sz)
• The sender of the message (src)
• The tag of the message (send_tag)

MPI_Recv(recv_buf_p, recv_buf_sz, recv_type, src, recv_tag, recv_comm, &status);

• How much data am I receiving?


MPI_Status status;   /* passed to MPI_Recv as &status */

status.MPI_SOURCE
status.MPI_TAG
status.MPI_ERROR
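
A sketch of how these pieces fit together, assuming MPI has already been initialized and the usual includes are present; MPI_ANY_SOURCE, MPI_ANY_TAG and MPI_Get_count are standard MPI, while the buffer size and element type here are illustrative:

double buf[100];
MPI_Status status;
int count;

/* Accept a message from any sender, with any tag. */
MPI_Recv(buf, 100, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

/* How much data am I receiving? Ask the status object. */
MPI_Get_count(&status, MPI_DOUBLE, &count);
printf("Received %d doubles from process %d with tag %d\n",
       count, status.MPI_SOURCE, status.MPI_TAG);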



Issues with Send and Receive
• Exact behavior is determined by the MPI implementation.
• MPI_Send may behave differently with regard to buffer size, cutoffs and blocking.
• MPI_Recv always blocks until a matching message is received.
• Know your implementation; Don’t make Assumptions!



Trapezoidal rule in MPI



The Trapezoidal Rule
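
As a reminder of the underlying formula (standard calculus, using the same a, b and n as the following slides): with n trapezoids of equal width

h = \frac{b - a}{n}, \qquad x_i = a + i\,h,

\int_a^b f(x)\,dx \;\approx\;
h\left[\frac{f(x_0)}{2} + f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + \frac{f(x_n)}{2}\right].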



Pseudo-Code for a serial program
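
A minimal C sketch of the serial algorithm; the function names Trap and f are illustrative, and f is whatever function we are integrating:

/* Approximate the integral of f over [a, b] using n trapezoids of width h. */
double Trap(double a, double b, int n, double h) {
    double approx = (f(a) + f(b)) / 2.0;
    for (int i = 1; i <= n - 1; i++)
        approx += f(a + i * h);
    return h * approx;
}

/* In the serial program:  h = (b - a) / n;  total = Trap(a, b, n, h);       */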



Parallelizing the Trapezoidal Rule
• Partition problem solution into tasks.
• Identify communication channels between tasks.
• Aggregate tasks into composite tasks.
• Map composite tasks to cores.



Parallel Pseudo-Code



First version of MPI Program
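
A condensed sketch of this first version, assuming the Trap function above and the usual includes; process 0 collects the partial integrals with point-to-point receives, and the values of a, b and n are illustrative:

int main(void) {
    int    my_rank, comm_sz, n = 1024, local_n, source;
    double a = 0.0, b = 3.0, h, local_a, local_b, local_int, total_int;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    h = (b - a) / n;          /* width of each trapezoid                    */
    local_n = n / comm_sz;    /* number of trapezoids handled per process   */
    local_a = a + my_rank * local_n * h;
    local_b = local_a + local_n * h;
    local_int = Trap(local_a, local_b, local_n, h);

    if (my_rank != 0) {       /* every other process sends its piece to 0   */
        MPI_Send(&local_int, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else {                  /* process 0 adds up all the pieces           */
        total_int = local_int;
        for (source = 1; source < comm_sz; source++) {
            MPI_Recv(&local_int, 1, MPI_DOUBLE, source, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total_int += local_int;
        }
        printf("With n = %d trapezoids, the estimate of the integral "
               "from %f to %f is %.15e\n", n, a, b, total_int);
    }

    MPI_Finalize();
    return 0;
}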



First version of MPI Program (contd.)



First version of MPI Program (contd.)



Dealing with Output in an MPI Program
Each process just prints a message.

Output of the program: the order in which the messages appear is unpredictable.



Handling Input in an MPI Program
• Most MPI implementations only allow process 0 in MPI_COMM_WORLD access to stdin.
• Process 0 must read the data (scanf) and send to the other processes.
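
A sketch of what that looks like for the trapezoid program's a, b and n; the function name Get_input is illustrative, and process 0 forwards the values with point-to-point sends:

/* Process 0 reads a, b and n with scanf and sends them to everyone else. */
void Get_input(int my_rank, int comm_sz, double* a_p, double* b_p, int* n_p) {
    if (my_rank == 0) {
        printf("Enter a, b, and n\n");
        scanf("%lf %lf %d", a_p, b_p, n_p);
        for (int dest = 1; dest < comm_sz; dest++) {
            MPI_Send(a_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
            MPI_Send(b_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
            MPI_Send(n_p, 1, MPI_INT,    dest, 0, MPI_COMM_WORLD);
        }
    } else {
        MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(n_p, 1, MPI_INT,    0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
}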



Collective communication



Tree-structured communication

Scenario 1 and Scenario 2: two alternative tree-structured global sums (figure).



MPI_Reduce
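
For reference, the signature (argument names follow the convention used on the next slide) and a typical call that replaces the receive-and-add loop of the trapezoid program; MPI_SUM is one of MPI's predefined reduction operators:

int MPI_Reduce(void* input_data_p, void* output_data_p, int count,
               MPI_Datatype datatype, MPI_Op operator,
               int dest_process, MPI_Comm comm);

/* Example: add every process's local_int into total_int on process 0. */
MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);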



Collective vs. Point-to-Point Communications
• All the processes in the communicator must call the same collective function.
• For example, a program that attempts to match a call to MPI_Reduce on one process with a call to MPI_Recv on another process is erroneous (the program will likely hang or crash).
• The arguments passed by each process to an MPI collective communication must be “compatible.”
• For example, if one process passes in 0 as the dest_process and another passes in 1, then the outcome of a call to MPI_Reduce is erroneous (the program will likely hang or crash).
• The output_data_p argument is only used on dest_process.
• However, all of the processes still need to pass in an actual argument corresponding to output_data_p, even if it’s just NULL.
• Point-to-point communications are matched on the basis of tags; collective communications don’t use tags.
• They’re matched solely on the basis of the communicator and the order in which they’re called.



Example (Multiple calls to MPI_Reduce)

• Suppose that each process calls MPI_Reduce with operator MPI_SUM, and destination process 0.
• At first glance, it might seem that after the two calls to MPI_Reduce, the value of b will be 3, and
the value of d will be 6.
• However, the names of the memory locations are irrelevant to the matching of the calls to
MPI_Reduce.
• The order of the calls will determine the matching so the value stored in b will be 1+2+1 = 4, and
the value stored in d will be 2+1+2 = 5.
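
A sketch that reproduces this situation with three processes; the per-process values (a = 1 and c = 2 on every process) and the swapped call order on process 1 are assumptions chosen to be consistent with the sums above:

int a = 1, b = 0, c = 2, d = 0;   /* assumed: same values on every process */

if (my_rank == 1) {
    /* Process 1 issues its two reductions in the opposite order ...       */
    MPI_Reduce(&c, &d, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&a, &b, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
} else {
    /* ... while processes 0 and 2 issue them in the "natural" order.      */
    MPI_Reduce(&a, &b, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&c, &d, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
}
/* The first reduction on each process matches the others, so on process 0
   b = 1 + 2 + 1 = 4; the second reductions match, so d = 2 + 1 + 2 = 5.   */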



MPI_Allreduce
• Useful in a situation in which all of the processes need the result of a global sum in order to complete some larger computation.
• In effect: a global sum followed by distribution of the result.
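
For reference, the signature is the same as MPI_Reduce's minus the destination argument, and a call might look like this (variable names are illustrative):

int MPI_Allreduce(void* input_data_p, void* output_data_p, int count,
                  MPI_Datatype datatype, MPI_Op operator, MPI_Comm comm);

/* Every process ends up with the global sum in global_sum. */
MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);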



Butterfly Structured Global Sum



Broadcast
• Data belonging to a single process is sent to all of the processes in the communicator.
• The figure shows a tree-structured broadcast.
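
For reference, the signature; the data in data_p on source_proc is copied into data_p on every other process in the communicator:

int MPI_Bcast(void* data_p, int count, MPI_Datatype datatype,
              int source_proc, MPI_Comm comm);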



A version of Get_input that uses MPI_Bcast
Function Prototype

Function Definition
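
A sketch of such a definition, assuming the trapezoid program's a, b and n and the usual includes; process 0 reads the input, and every process then calls MPI_Bcast with process 0 as the source:

void Get_input(int my_rank, int comm_sz, double* a_p, double* b_p, int* n_p) {
    if (my_rank == 0) {
        printf("Enter a, b, and n\n");
        scanf("%lf %lf %d", a_p, b_p, n_p);
    }
    /* Collective call: executed by all processes in the communicator. */
    MPI_Bcast(a_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(b_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(n_p, 1, MPI_INT,    0, MPI_COMM_WORLD);
}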



Data Distribution / Partition Strategies
• Block partitioning
• Assign blocks of consecutive components to each process.
• Cyclic partitioning
• Assign components in a round robin fashion.
• Block-cyclic partitioning
• Use a cyclic distribution of blocks of components.

Data distribution example (figure): a vector of 12 elements distributed among 3 processes.


Compute Vector Sum (Serial / Parallel)
• Vector sum math: each component of the result is the sum of the corresponding components, z_i = x_i + y_i.
• Serial approach: a single loop over all n components.
• Parallel sum: same prototype, but each process loops over only “my” (local) items. A sketch of both versions follows.


Scatter and Gather (Distribute and Collect)
• MPI_Scatter can be used in a function that reads in an entire vector on process 0 but only sends the needed components to each of the other processes.
• MPI_Gather collects all of the components of the vector onto process 0, and then process 0 can process all of the components.



Reading and distributing a vector (Scatter)
Function Prototype
Function Definition
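
A sketch of the definition, assuming a block distribution, the usual includes (stdio.h, stdlib.h, mpi.h), and illustrative names; process 0 reads the whole vector and scatters one block of local_n components to each process:

void Read_vector(double local_a[], int local_n, int n,
                 char vec_name[], int my_rank, MPI_Comm comm) {
    double* a = NULL;
    if (my_rank == 0) {
        a = malloc(n * sizeof(double));
        printf("Enter the vector %s\n", vec_name);
        for (int i = 0; i < n; i++)
            scanf("%lf", &a[i]);
        MPI_Scatter(a, local_n, MPI_DOUBLE,
                    local_a, local_n, MPI_DOUBLE, 0, comm);
        free(a);
    } else {
        /* Non-root processes still call MPI_Scatter; their send buffer
           argument is ignored.                                         */
        MPI_Scatter(a, local_n, MPI_DOUBLE,
                    local_a, local_n, MPI_DOUBLE, 0, comm);
    }
}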



Print a distributed vector (Gather)
Function Definition

Function Prototype
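
A sketch of the matching print routine, under the same assumptions; every process's block is gathered onto process 0, which then prints the whole vector:

void Print_vector(double local_b[], int local_n, int n,
                  char title[], int my_rank, MPI_Comm comm) {
    double* b = NULL;
    if (my_rank == 0) {
        b = malloc(n * sizeof(double));
        MPI_Gather(local_b, local_n, MPI_DOUBLE,
                   b, local_n, MPI_DOUBLE, 0, comm);
        printf("%s\n", title);
        for (int i = 0; i < n; i++)
            printf("%f ", b[i]);
        printf("\n");
        free(b);
    } else {
        /* Non-root processes call MPI_Gather too; their receive buffer
           argument is ignored.                                          */
        MPI_Gather(local_b, local_n, MPI_DOUBLE,
                   b, local_n, MPI_DOUBLE, 0, comm);
    }
}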



Allgather
• Concatenates the contents of each process’ send_buf_p and stores this in each process’
recv_buf_p.
• As usual, recv_count is the amount of data being received from each process.
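
For reference, the signature; recv_count is the count received from each process, not the total:

int MPI_Allgather(void* send_buf_p, int send_count, MPI_Datatype send_type,
                  void* recv_buf_p, int recv_count, MPI_Datatype recv_type,
                  MPI_Comm comm);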



Matrix-Vector Multiplication

• The i-th component of y is the dot product of the i-th row of A with x.
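
In symbols, for an m × n matrix A = (a_{ij}) and an n-component vector x:

y_i = a_{i0}\,x_0 + a_{i1}\,x_1 + \cdots + a_{i,n-1}\,x_{n-1}
    = \sum_{j=0}^{n-1} a_{ij}\,x_j, \qquad i = 0, 1, \ldots, m-1.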



Matrix-Vector Multiplication (Serial Version)
Serial Pseudocode

A C-style two-dimensional array is stored as a single contiguous one-dimensional array in row-major order, so element (i, j) is found at A[i*n + j].

Serial Program
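
A sketch of the serial routine with the matrix flattened into a one-dimensional row-major array (names are illustrative):

/* Serial matrix-vector multiply: y = A x, where the m-by-n matrix A is
   stored row-major in a 1-D array, so A[i][j] is A[i*n + j].           */
void Mat_vect_mult(double A[], double x[], double y[], int m, int n) {
    for (int i = 0; i < m; i++) {
        y[i] = 0.0;
        for (int j = 0; j < n; j++)
            y[i] += A[i * n + j] * x[j];
    }
}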



Matrix-Vector Multiplication (MPI Version)
Function Prototype

Function Definition

Serial Version Comparison
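
A sketch of the MPI version, assuming a block distribution of the rows of A and of the components of x and y (so n = comm_sz * local_n), stdlib.h for malloc, and illustrative names; MPI_Allgather gives every process the complete input vector before the local dot products:

void Mat_vect_mult(double local_A[], double local_x[], double local_y[],
                   int local_m, int n, int local_n, MPI_Comm comm) {
    double* x = malloc(n * sizeof(double));

    /* Every process ends up with the full input vector x. */
    MPI_Allgather(local_x, local_n, MPI_DOUBLE,
                  x, local_n, MPI_DOUBLE, comm);

    /* Each process computes only its own block of local_m rows. */
    for (int local_i = 0; local_i < local_m; local_i++) {
        local_y[local_i] = 0.0;
        for (int j = 0; j < n; j++)
            local_y[local_i] += local_A[local_i * n + j] * x[j];
    }
    free(x);
}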



Questions and comments?

