The University of Edinburgh

Exercises: Message-Passing Programming

David Henty

1 Hello World

1. Write an MPI program which prints the message “Hello World”.

2. Compile and run on several processes in parallel, using the backend compute nodes of ARCHER
(you will need to use qsub to run on the compute nodes).
3. Modify your program so that each process prints out both its rank and the total number of processes
P that the code is running on, i.e. the size of MPI_COMM_WORLD.
4. Modify your program so that only the master process (i.e. rank 0) prints out a message (very useful
when you run with hundreds of processes).
5. What happens if you omit the final MPI procedure call in your program?

1.1 Extra Exercises

Since both Cirrus and ARCHER are clusters of shared-memory nodes (36 and 24 cores per node respectively),
it can be interesting to know if two MPI processes are on the same or different nodes. This is not
specified by MPI – it is a function of the job launcher program (i.e. mpiexec_mpt or aprun).

1. Use the function MPI_Get_processor_name() to get each rank to print out where it is running.
What is the default distribution of processes across multiple nodes?
2. You have some control over how the processes are allocated to nodes via additional options, e.g.
-ppn for mpiexec_mpt or -N for aprun. Modify the arguments to the job launcher to check
that these behave as you expect.

2 Parallel calculation of π

An approximation to the value π can be obtained from the following expression

$$\frac{\pi}{4} = \int_0^1 \frac{dx}{1 + x^2} \approx \frac{1}{N} \sum_{i=1}^{N} \frac{1}{1 + \left( \frac{i - \frac{1}{2}}{N} \right)^2}$$

where the answer becomes more accurate with increasing N . Iterations over i are independent so the
calculation can be parallelised.
For the following exercises you should set N = 840. This number is divisible by 2, 3, 4, 5, 6, 7 and 8
which is convenient when you parallelise the calculation!
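
As a minimal sketch of the sum above (the function name `pi_partial_sum` and its arguments are illustrative assumptions, not part of the exercise), the terms for any range of i can be accumulated in plain C. The result is scaled by 4/N so that partial sums over disjoint ranges add up to the approximation of π.

```c
/* Sum the terms i = istart..istop (inclusive, counting from 1) of the
   series for pi, scaled by 4/N so that partial sums over disjoint
   ranges add up to the full approximation. */
double pi_partial_sum(long n, long istart, long istop)
{
    double sum = 0.0;
    for (long i = istart; i <= istop; i++) {
        double x = ((double)i - 0.5) / (double)n;  /* midpoint of strip i */
        sum += 1.0 / (1.0 + x * x);
    }
    return 4.0 * sum / (double)n;
}
```

Calling `pi_partial_sum(840, 1, 840)` gives the serial answer; splitting the range between callers mimics what each rank would compute in the parallel version.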

1. Modify your Hello World program so that each process independently computes the value of π and
prints it to the screen. Check that the values are correct (each process should print the same value).

2. Now arrange for different processes to do the computation for different ranges of i. For example,
on two processes: rank 0 would do i = 1, 2, . . . , N/2; rank 1 would do i = N/2 + 1, N/2 + 2, . . . , N.
Print the partial sums to the screen and check the values are correct by adding them up by hand.
3. Now we want to accumulate these partial sums by sending them to the master (rank 0) to add up:
• all processes (except the master) send their partial sum to the master
• the master receives the values from all the other processes, adding them to its own partial sum
You should use the MPI routines MPI_Ssend and MPI_Recv.
4. Use the function MPI_Wtime (see below) to record the time it takes to perform the calculation.
For a given value of N , does the time decrease as you increase the number of processes? Note that
to ensure that the calculation takes a sensible amount of time (e.g. more than a second) you will
probably have to perform the calculation of π several thousands of times.
5. Ensure your program works correctly if N is not an exact multiple of the number of processes P .
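
For the case where N is not an exact multiple of P, one possible way to divide the iterations is sketched below. This is only one of several reasonable distributions; the helper name `block_range` and the choice to give the extra iterations to the lowest ranks are assumptions made for illustration.

```c
/* Hypothetical helper: compute the inclusive range [*istart, *istop]
   (counting from 1) of iterations owned by `rank` out of `nproc`
   processes. The first (n % nproc) ranks each take one extra iteration,
   so every iteration is assigned exactly once. */
void block_range(long n, int nproc, int rank, long *istart, long *istop)
{
    long base = n / nproc;    /* iterations every rank receives */
    long extra = n % nproc;   /* left-over iterations to hand out */
    long count = base + (rank < extra ? 1 : 0);
    long offset = (long)rank * base + (rank < extra ? rank : extra);
    *istart = offset + 1;
    *istop = offset + count;
}
```

With N = 10 and P = 3 this assigns 1 to 4, 5 to 7 and 8 to 10 to ranks 0, 1 and 2. If a rank receives no iterations, `*istop` is one less than `*istart`, so a loop from `*istart` to `*istop` simply does nothing.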

2.1 Timing MPI Programs

The MPI_Wtime() routine returns a double-precision floating-point number which represents elapsed
wall-clock time in seconds. The timer has no defined starting-point, so in order to time a piece of code,
two calls are needed and the difference should be taken between them.
There are a number of important considerations when timing a parallel program:
1. Due to system variability, it is not possible to accurately time any program that runs for a very
short time. A rule of thumb is that you cannot trust any measurement much less than one second.
2. To ensure a reasonable runtime, you will probably have to repeat the calculation many times within
a do/for loop. Make sure that you remove any print statements from within the loop, otherwise there
will be far too much output and you will simply be measuring the time taken to print to screen.
3. Due to the SPMD nature of MPI, each process will report a different time as they are all running
independently. A simple way to avoid confusion is to synchronise all processes when timing, e.g.
MPI_Barrier(MPI_COMM_WORLD); // Line up at the start line
tstart = MPI_Wtime(); // Fire the gun and start the clock
... // Code to be timed in here ...
MPI_Barrier(MPI_COMM_WORLD); // Wait for everyone to finish
tstop = MPI_Wtime(); // Stop the clock
Note that the barrier is only needed to get consistent timings – it should not affect code correctness.
With synchronisation in place, all processes will record roughly the same time (the time of the
slowest process) and you only need to print it out on a single process (e.g. rank 0).
4. To get meaningful timings for more than a few processes you must run on the backend of morar
using qsub. If you run interactively then you will have more MPI processes than physical cores
and you will not see any speedup.

2.2 Extra Exercises

1. Write two versions of the code to sum the partial values: one where the master does explicit
receives from each of the P − 1 other processes in turn, the other where it issues P − 1 receives
each from any source (using wildcarding).
2. Print out the final value of π to its full precision (e.g. 10 decimal places for single precision, or 20
for double). Do your two versions give exactly the same result as each other? Does each version
give exactly the same value every time you run it?

3. To fix any problems for the wildcard version, you can receive the values from all the processors
first, then add them up in a specific order afterwards. The master should declare a small array and
place the result from process i in position i in the array (or i + 1 for Fortran!). Once all the slots
are filled, the final value can be calculated. Does this fix the problem?
4. You have to repeat the entire calculation many times if you want to time the code. When you do
this, print out the value of π after the final repetition. Do both versions get a reasonable answer?
Can you spot what the problem might be for the wildcard version? Can you think of a way to fix
this using tags?
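
When thinking about the questions above, it may help to recall that floating-point addition is not associative, so the order in which values are added can change the result. The sketch below is a serial illustration only: the function name `sum_in_order` is an assumption, and the values suggested afterwards are deliberately extreme compared with the partial sums of π.

```c
/* Add n values strictly from left to right, as a master process would
   if it accumulated each partial sum in the order it was received. */
double sum_in_order(const double *values, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}
```

With IEEE double-precision arithmetic, the three values 1.0e16, -1.0e16 and 1.0 summed in that order give 1.0, whereas the order 1.0e16, 1.0, -1.0e16 gives 0.0, because 1.0 is too small to change 1.0e16 when added to it.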

Size (bytes) | # Iterations | Total time (secs) | Time per message (secs) | Bandwidth (MB/s)

Table 1: Ping-Pong Results for Exercise 3

3 Ping Pong
1. Write a program in which two processes (say rank 0 and rank 1) repeatedly pass a message back
and forth. Use the synchronous mode MPI_Ssend to send the data. You should write your
program so that it operates correctly even when run on more than two processes, i.e. processes
with rank greater than one should simply do nothing. For simplicity, use a message that is an array
of integers. Remember that this is like a game of table-tennis:

• rank 0 should send a message to rank 1

• rank 1 should receive this message then send the same data back to rank 0
• rank 0 should receive the message from rank 1 and then return it
• etc. etc.

2. Insert timing calls to measure the time taken by all the communications. You will need to time
many ping-pong iterations to get a reasonable elapsed time, especially for small message lengths.
3. Investigate how the time taken varies with the size of the message. You should fill in your results
in Table 1. What is the asymptotic bandwidth for large messages?
4. Plot a graph of time against message size to determine the latency (i.e. the time taken for a message
of zero length); plot a graph of the bandwidth to see how this varies with message size.

The bandwidth and latency are key characteristics of any parallel machine, so it is always instructive to
run this ping-pong code on any new computers you may get access to.
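
The derived columns of Table 1 can be computed from the measured total time. The sketch below makes two assumptions that you should check against your own code: each ping-pong iteration consists of two messages (one in each direction), and 1 MB is taken to be 10^6 bytes. The function names are illustrative.

```c
/* Time for a single message, assuming each iteration is one ping plus
   one pong, i.e. two messages of the same size. */
double time_per_message(double total_seconds, long iterations)
{
    return total_seconds / (2.0 * (double)iterations);
}

/* Bandwidth in MB/s, taking 1 MB = 10^6 bytes (an assumption). */
double bandwidth_mb_per_s(double message_bytes, double seconds_per_message)
{
    return message_bytes / seconds_per_message / 1.0e6;
}
```

For example, 1000 iterations of a 1 000 000 byte message taking 2 seconds in total correspond to about 0.001 seconds per message and roughly 1000 MB/s.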

3.1 Extra exercises

1. How do the ping-pong bandwidth and latency figures vary when you use buffered or standard
modes (MPI_Bsend and MPI_Send)?
Note: to send large messages with buffered sends you will have to supply MPI with additional
buffer space using MPI_Buffer_attach().
2. Write a program in which the master process sends the same message to all the other processes
in MPI_COMM_WORLD and then receives the message back from all of them. How does the time
taken vary with the size of the messages and with the number of processes?

[Figure: four ring diagrams labelled Step 1, Step 2, Step 3 and Result, showing ranks 0 to 3 with their running totals and the values being passed between them.]

Figure 1: Global sum of four variables

4 Rotating information around a ring

Consider a set of processes arranged in a ring as shown in Figure 1. A simple way to perform a global
sum of data stored on each process (a parallel reduction operation) is to rotate each piece of data all the
way round the ring. At each iteration, a process receives some data from the left, adds the value to its
running total, then passes the data it has just received on to the right.
Figure 1 illustrates how this works for four processes (ranks 0, 1, 2 and 3) who hold values A, B, C and
D respectively. The running total on each process is shown in the square box, and the data being sent
between processes is shown next to the arrow. After three steps (P − 1 steps in general for P processes)
each process has computed the global sum A + B + C + D.
1. Write a program to perform a global sum using this simple ring method. Each process needs to
know the ranks of its two neighbours in the ring, which stay constant throughout the program. You
should use synchronous sends and avoid deadlock by using non-blocking forms for either the send
(MPI_Issend) or the receive (MPI_Irecv). Remember that you cannot assume that a non-
blocking operation has completed until you have issued an explicit wait. You can use non-blocking
calls for send and receive, but you will have to store two separate requests and wait on them both.
We need to initialise the local variables on each process with some process-dependent value. For
simplicity, we will just use the value of rank, i.e. in Figure 1 this would mean A = 0, B = 1,
C = 2 and D = 3. You should check that every process computes the sum correctly (e.g. print the
final value to the screen), which in this case is P (P − 1)/2.
2. Your program should compute the correct global sum for any set of input values. If you initialise
the local values to (rank + 1)^2, do you get the correct result P(P + 1)(2P + 1)/6?
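
Before writing the MPI version, the data movement of the ring method can be checked with a purely serial model in which entry r of each array plays the role of rank r. This is a sketch for reasoning about the algorithm, not an MPI implementation; the function name `ring_global_sum` is an assumption, and "left" is taken to mean rank − 1 (wrapping round).

```c
#include <stdlib.h>

/* Serial model of the ring sum: init[r] is the starting value on rank r
   and total[r] receives its running total. Returns 0 on success, -1 if
   memory allocation fails. */
int ring_global_sum(const long *init, long *total, int nproc)
{
    long *passon = malloc((size_t)nproc * sizeof *passon);
    long *incoming = malloc((size_t)nproc * sizeof *incoming);
    if (passon == NULL || incoming == NULL) {
        free(passon);
        free(incoming);
        return -1;
    }
    for (int r = 0; r < nproc; r++) {
        total[r] = init[r];
        passon[r] = init[r];   /* first value sent is the local one */
    }
    for (int step = 1; step < nproc; step++) {   /* P - 1 steps */
        for (int r = 0; r < nproc; r++) {
            int left = (r - 1 + nproc) % nproc;
            incoming[r] = passon[left];          /* receive from the left */
        }
        for (int r = 0; r < nproc; r++) {
            total[r] += incoming[r];             /* add to running total */
            passon[r] = incoming[r];             /* pass on what was received */
        }
    }
    free(passon);
    free(incoming);
    return 0;
}
```

Using a separate `incoming` array mirrors the fact that, in the real program, all processes send and receive at the same time within a step.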

4.1 Extra exercises

1. Measure the time taken for a global sum and investigate how it varies with increasing P . Plot a
graph of time against P — does the ring method scale as you would expect?
2. Using these timings, estimate how long it takes to send each individual message between processes.
How does this result compare with the latency figures from the ping-pong exercise?

3. The MPI_Sendrecv call is designed to avoid deadlock by combining the separate send and
receive operations into a single routine. Write a new version of the global sum using this routine
and compare the time taken with the previous implementation. Which is faster?
4. Investigate the time taken when you use standard and buffered sends rather than synchronous mode
(using MPI_Bsend you do not even need to use the non-blocking form as it is guaranteed to be
asynchronous). Which is the fastest? By comparing to the time taken by the combined send and
receive operation, can you guess how MPI_Sendrecv is actually being implemented?

5 Collective communications

1. Re-write the ring example using an MPI reduction operation to perform the global sum.
2. How does the execution time vary with P and how does this compare to your own implementation
which used the ring method?

5.1 Extra exercises

1. Compare the performance of a single reduction of an array of N values compared with N separate
calls to reductions of a single value. Can you explain the results (the latency and bandwidth values
from the pingpong code are useful to know)?
2. You can ensure all processes get the answer of a reduction by doing MPI_Reduce followed by
MPI_Bcast, or using MPI_Allreduce. Compare the performance of these two methods.
3. Imagine you want all the processes to write their output, in order, to a single file. The important
point is that only one process can have the file open at any given time. Using the MPI_Barrier
routine, modify your code so each process in turn opens, appends to, then closes the output file.
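
For item 1 above, a rough way to reason about the comparison is a linear cost model in which a message costs a fixed latency plus a term proportional to its size. The model, the function name `message_time` and the figures quoted afterwards are assumptions for illustration; real reductions involve several messages and the actual figures depend on the machine.

```c
/* Simple linear cost model (an assumption, not a measurement):
   time = latency + bytes / bandwidth. */
double message_time(double latency_s, double bandwidth_bytes_per_s,
                    double bytes)
{
    return latency_s + bytes / bandwidth_bytes_per_s;
}
```

With a hypothetical latency of 1 microsecond and bandwidth of 10^9 bytes per second, one 8000-byte message costs about 9 microseconds under this model, whereas 1000 separate 8-byte messages cost about 1000 microseconds in total, because the latency is paid every time.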

6 Rotating information using a Cartesian topology

For a 1D arrangement of processes it may seem a lot of effort to use a Cartesian topology rather than
managing the processes by hand, e.g. calculating the ranks of the nearest neighbours. It is, however,
worth learning how to use topologies as these book-keeping calculations become tedious to do by hand
in two and three dimensions. For simplicity we will only use the routines in one dimension. Even for this
simple case the exercise shows how easy it is to change the boundary conditions when using topologies.
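
The book-keeping that a one-dimensional topology does for you can be sketched by hand as follows. This is a plain C model, not the MPI routine itself: the function name `shifted_rank` is an assumption, and the constant `NO_NEIGHBOUR` is a hypothetical sentinel standing in for MPI_PROC_NULL, which MPI returns when a shift goes off the end of a non-periodic dimension.

```c
/* Hypothetical sentinel standing in for MPI_PROC_NULL. */
enum { NO_NEIGHBOUR = -1 };

/* Rank reached from `rank` by moving `disp` places along a line of
   `nproc` processes. If `periodic` is non-zero the line wraps round
   into a ring; otherwise stepping off either end gives NO_NEIGHBOUR. */
int shifted_rank(int rank, int nproc, int disp, int periodic)
{
    int target = rank + disp;
    if (periodic) {
        return ((target % nproc) + nproc) % nproc;
    }
    return (target >= 0 && target < nproc) ? target : NO_NEIGHBOUR;
}
```

Changing only the `periodic` argument switches between ring and line, which is the same kind of one-line change that the topology routines allow.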

1. Re-write the passing-around-a-ring exercise so that it uses a one-dimensional Cartesian topology,
computing the ranks of the nearest neighbours using MPI_Cart_shift. Remember to set the
periodicity of the boundary conditions appropriately for a ring rather than a line.
2. Alter the boundary conditions to change the topology from a ring to a line, and re-run your program.
Be sure to run using both of the initial values, i.e. rank and (rank + 1)^2. Do you understand the
output? What reduction operation is now being implemented? What are the neighbouring ranks
for the two processes at the extreme ends of the line?
3. Check the results agree with those obtained from calling an MPI_Scan collective routine.

6.1 Extra exercises

1. Measure the time taken for the global sum in both periodic and non-periodic topologies, and
investigate how it varies with P.

2. Extend the one-dimensional ring topology to a two-dimensional cylinder (periodic in one direction,
non-periodic in the other). Perform two separate reduction operations, one in each of the two
dimensions of the cylinder.

[Figure: an N × N matrix msg with elements labelled i,j, alongside shaded N × M (mcols) and M × N (mrows) subsections.]

Figure 2: Diagrammatic representation of the mcols and mrows matrix subsections

7 Derived Datatypes

We will extend exercise 4 to perform a global sum of a non-basic datatype. A simple example of this
is a compound type containing both an integer and a double-precision floating-point number. Such a
compound type can be declared as a structure in C, or a derived type in Fortran, as follows:

struct compound              type compound
{
  int ival;                    integer :: ival
  double dval;                 double precision :: dval
};                           end type compound

struct compound x,y;         type(compound) :: x,y

x.ival = 1;                  x%ival = 1
y.dval = 9.0;                y%dval = 9.0

If you are unfamiliar with using derived types in Fortran then I recommend that you go straight to exercise
number 2 which deals with defining MPI datatypes to map onto subsections of arrays. This is, in fact,
the most common use of derived types in scientific applications of MPI.

1. Modify the ring exercise so that it uses an MPI_Type_struct derived datatype to pass round
the above compound type, and computes separate integer and floating-point totals. You will need
to use MPI_Address to obtain the displacements of ival and dval. (Both routines were removed
in MPI-3.0; with a current MPI library use their replacements MPI_Type_create_struct and
MPI_Get_address.) Initialise the integer part to rank and the floating-point part to (rank + 1)^2
and check that you obtain the correct results.
2. Modify your existing ping-pong code to exchange N × N square matrices between the processes
(int msg[N][N] in C or INTEGER MSG(N,N) in Fortran). Initialise the matrix elements to
be equal to rank so that you can keep track of the messages. Define MPI_Type_contiguous
and MPI_Type_vector derived types to represent N × M (type mcols) and M × N (type
mrows) subsections of the matrix, where M ≤ N . Which datatype to use for each subsection
depends on whether you are using C or Fortran. You may find it helpful to refer to Figure 2 to
clarify this, where I draw the arrays in the standard fashion for matrices (indices start from one
rather than zero, first index goes down, second index goes across).

Set N = 10 and M = 3 and exchange columns 4, 5 and 6 of the matrix using the mcols type. Print
the entire matrix on each process after every stage of the ping-pong to check for correctness.
Now modify your program to exchange rows 4, 5 and 6 using the mrows type.
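
To see which elements a vector-style datatype would pick out, its layout can be modelled in plain C. The sketch below mirrors the three parameters of MPI_Type_vector (number of blocks, block length and stride, all counted in elements) but does not call MPI; the function name `vector_offsets` is an assumption.

```c
/* Model of the layout selected by a vector-style datatype: `count`
   blocks of `blocklen` elements, with the starts of successive blocks
   `stride` elements apart. Writes the element offsets (relative to the
   first selected element) to `out`, which must hold count * blocklen
   entries, and returns how many were written. */
int vector_offsets(int count, int blocklen, int stride, int *out)
{
    int n = 0;
    for (int b = 0; b < count; b++) {
        for (int e = 0; e < blocklen; e++) {
            out[n++] = b * stride + e;
        }
    }
    return n;
}
```

For a C array int msg[10][10], which is stored row by row, `vector_offsets(10, 3, 10, out)` gives offsets 0, 1, 2, 10, 11, 12 and so on. Starting from &msg[0][3] these are the elements of columns 4, 5 and 6 (counting from one). In Fortran, where arrays are stored column by column, the same three columns are a single contiguous run.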

7.1 Extra exercises

1. For the compound type, print out the values of the displacements. Do you understand the results?
2. Modify the program that performed a global sum on a compound datatype so that it uses an MPI
collective routine. You will have to register your own reduction operation so that the MPI library
knows what calculation you want to be performed. Remember that addition is not a pre-defined
operation on your compound type; it still has to be defined even in the native language.
3. Modify your ping-pong code for matrix subsections so that you send type mcols and receive type
mrows. Do things function as you would expect?
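
For item 1 above, the displacements reported by MPI can be compared with those the C compiler itself uses, obtained with the standard `offsetof` macro. This is a sketch in plain C; the two wrapper function names are assumptions.

```c
#include <stddef.h>

/* The compound type from the start of this section. */
struct compound {
    int ival;
    double dval;
};

/* Byte displacement of each member from the start of the structure. */
size_t ival_displacement(void) { return offsetof(struct compound, ival); }
size_t dval_displacement(void) { return offsetof(struct compound, dval); }
```

The first member always sits at displacement zero. The compiler is free to insert padding before dval so that it is suitably aligned, so its displacement may be larger than sizeof(int); the exact value depends on the platform.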

8 Global Summation Using a Hypercube Algorithm
Although you should always perform global summations by using MPI_Reduce or MPI_Allreduce
with MPI_Op=MPI_SUM, it is an interesting exercise to program your own version using a more efficient
algorithm than the previous naive “message-round-a-ring” approach.
A more efficient method, at least for a number of processes that is a power of two, is to imagine that the
processes are arranged in a cube. The coordinates of the processes in the cube are taken from the binary
representation of the rank, therefore ensuring that exactly one process sits at each vertex of the cube.
Processes operate in pairs, swapping partial sums between neighbouring processes in each dimension in
turn. Figure 3 illustrates how this works in three dimensions (i.e. 2^3 = 8 processes).

[Figure: a cube of 8 processes with ranks 0 to 7 at the vertices, labelled by their binary coordinates 000 to 111, showing partial sums being swapped between neighbours along the x, y and z dimensions in turn.]

Figure 3: Communications pattern for global sum on 8 processes

An elegant way to program this is to construct a periodic cartesian topology of the appropriate dimension
and compute neighbours using MPI_Cart_shift. When each process swaps data with its neighbour
you must ensure that your program does not deadlock. This can be done by a variety of methods,
including a ping-pong approach where each pair of processes agrees in advance who will do a send followed
by a receive, and who will do a receive followed by a send. How do you expect the time to scale with the
number of processes, and how does this compare to the measured time taken by MPI_Allreduce?
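
The pairing pattern can be checked with a serial model before any MPI is written. Note that the sketch below finds each partner by flipping one bit of the rank, which is an alternative to the Cartesian topology suggested above rather than the same method; it relies only on the coordinates being the binary digits of the rank. The function names are assumptions.

```c
#include <stdlib.h>

/* Partner of `rank` along dimension `dim`: flip bit `dim` of the rank. */
int hypercube_partner(int rank, int dim)
{
    return rank ^ (1 << dim);
}

/* Serial model of the hypercube sum: value[r] is the number held by
   rank r and is overwritten with the global sum. nproc must be a power
   of two. Returns the number of swap steps, or -1 on error. */
int hypercube_global_sum(double *value, int nproc)
{
    if (nproc < 1 || (nproc & (nproc - 1)) != 0) {
        return -1;   /* not a power of two */
    }
    double *recv = malloc((size_t)nproc * sizeof *recv);
    if (recv == NULL) {
        return -1;
    }
    int steps = 0;
    for (int dim = 0; (1 << dim) < nproc; dim++) {
        for (int r = 0; r < nproc; r++) {
            recv[r] = value[hypercube_partner(r, dim)];  /* swap partial sums */
        }
        for (int r = 0; r < nproc; r++) {
            value[r] += recv[r];
        }
        steps++;
    }
    free(recv);
    return steps;
}
```

For 8 processes the model finishes in 3 steps, compared with the 7 steps needed by the ring method for the same number of processes.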
