CS424. Parallel Computing Lab#4
1 MPI Communications
In MPI (Message Passing Interface), communication plays a crucial role in coordinating work among independent
processes. MPI processes are independent, and explicit communication is necessary for coordination. There are two
types of communication within MPI:
1. Point-to-Point Communication: involves sending and receiving messages between two specific processes
within the same communicator. It allows processes to exchange data directly with each other.
Types of Point-to-Point Operations:
- Blocking Send/Receive: the call does not return until it is safe to reuse the
message buffer (for a receive, until the message has arrived); depending on the message
size, a blocking send may or may not wait for the matching receive to start.
- Non-blocking Send/Receive: the call only initiates the transfer and returns
immediately; the process continues executing and completes the operation later
(e.g., with MPI_Wait). A short sketch of both styles is given after this list.
2. Collective Communication: involves multiple processes working together as a group. It enables coordinated
operations across all processes in a communicator. It uses collective MPI functions, such as:
- MPI_Scatter: Distributes data from one process to many processes.
- MPI_Reduce: Performs data reduction (e.g., sum, max) across all processes.
- MPI_Bcast: Broadcasts data from one process to all others.
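As a quick illustration of the two point-to-point styles in item 1 above, the following
minimal sketch (not one of the lab's provided codes; run it with at least two processes)
sends an integer from process 0 to process 1, once with blocking calls and once with
non-blocking calls.

#include <mpi.h>
#include <stdio.h>

int main(void) {
    int my_rank, value = 0;
    MPI_Request req;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Blocking exchange: MPI_Send returns once 'value' can safely be reused. */
    if (my_rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (my_rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Blocking: process 1 received %d\n", value);
    }

    /* Non-blocking exchange: the call only starts the transfer;
       MPI_Wait completes it before the buffer is touched again. */
    if (my_rank == 0) {
        value = 99;
        MPI_Isend(&value, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (my_rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("Non-blocking: process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}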
2 Examples
1. The following program implements a simple send/receive communication. Compile and run the program and
study the output of several runs.
Code 1
2. The following program demonstrates sending and receiving messages between two processes in a ping-pong
fashion. The communication only works when the communicator contains exactly two processes.
Note that:
Process 0: Sends a message to process 1 and then receives a message back.
Process 1: Receives a message from process 0, sends a response, and then receives another
message.
Compile and run the program and study the output of several runs.
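Before looking at Code 2 itself, the listing below is a minimal ping-pong sketch
consistent with the description above (one round trip between processes 0 and 1);
the provided Code 2 may differ in its details.

#include <mpi.h>
#include <stdio.h>

int main(void) {
    int my_rank, comm_sz, msg;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (comm_sz != 2) {
        if (my_rank == 0)
            printf("This program needs exactly 2 processes.\n");
    } else if (my_rank == 0) {
        msg = 1;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);                      /* ping */
        MPI_Recv(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);   /* pong */
        printf("Process 0 received reply %d from process 1\n", msg);
    } else {
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received %d from process 0\n", msg);
        msg = 2;
        MPI_Send(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}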
Code 2
3 Practice
1. Revisit Code 1 and change the behavior of the program so process 0 can receive the results in the order in
which the processes finish.
3. Compile and run the code in mpiSumArray.c (Code folder), study the behavior of the program across several
runs, and write the proper code lines to address the comments marked as "// ** display…". Your output
should be similar to the example below.
4. Write an MPI program in C to calculate the sum of two predefined arrays of at least length 12. The program
should ensure that it only runs when the number of processes is four or more. Use parallel execution to
compute partial sums and then combine them to get the final result.
CS424. Parallel Computing Lab#5
1 Collective Communication
MPI provides the following collective communication functions:
1. MPI_Bcast: One process (usually the root) has a piece of data. This data is sent to all other processes in
the communicator. All processes will have the same data after the broadcast.
2. MPI_Reduce: Each process has a local value (e.g., individual score). A mathematical operation (like sum,
max, min) is applied to all the values sent by each process. The result of the operation is stored on the
designated process (usually the root).
3. MPI_Allreduce: Similar to MPI_Reduce, but it distributes the reduced result to all processes.
4. MPI_Scatter: One process (usually the root) has a large dataset. The data is divided (scattered) into
smaller chunks and sent to all processes. Each process receives a portion of the original data.
5. MPI_Gather: Each process has a piece of data. All processes send their data to a designated process
(usually the root). The root process gathers all the data into a single collection.
6. MPI_Allgather: Gathers data from all processes and distributes the combined result to all processes.
7. MPI_Barrier: All processes in the communicator must reach this point before any can proceed. Useful
for synchronization, ensuring all processes have finished a specific task before moving on.
Advantages of collective communication:
• Simplifies code compared to sending and receiving messages individually between processes.
• Ensures efficient data exchange and synchronization within the communicator group.
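As a small illustration (not one of the lab's provided codes), the sketch below sums
one value per process with MPI_Reduce and then shares the total with every process
using MPI_Bcast; with point-to-point calls the same pattern would need a loop of
sends and receives.

#include <mpi.h>
#include <stdio.h>

int main(void) {
    int my_rank, comm_sz, local_val, total = 0;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    local_val = my_rank + 1;   /* each process contributes one value */

    /* rank 0 receives the sum of all contributions */
    MPI_Reduce(&local_val, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* rank 0 then broadcasts the result to everyone */
    MPI_Bcast(&total, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Process %d of %d: total = %d\n", my_rank, comm_sz, total);
    MPI_Finalize();
    return 0;
}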
Additional Notes:
To optimize code and potentially enhance performance, it is beneficial to explore which MPI functions can be
effectively substituted with other functions while maintaining the core functionality of the program.
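For instance, the MPI_Reduce followed by MPI_Bcast in the sketch above can normally be
replaced by a single MPI_Allreduce call, with the same result ending up on every process:

/* one collective instead of two: every process ends up with the sum */
MPI_Allreduce(&local_val, &total, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);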
2 Examples
Code 1 distributes array values among processes using MPI_Scatter. Compile and run the program and study the
output of several runs.
Explanation:
1. Include headers: Necessary header files for MPI communication (mpi.h), standard input/output (stdio.h),
and memory allocation (stdlib.h).
2. MPI Initialization: MPI_Init initializes the MPI environment and prepares it for communication.
3. Get Process Information:
o MPI_Comm_rank: Retrieves the rank (unique identifier) of the current process within the
communicator group.
o MPI_Comm_size: Retrieves the total number of processes participating in the communicator group.
4. Global Array (Root Process Only):
o Allocates memory for the entire array data on the root process (rank == 0) using malloc.
o You can modify the loop to initialize the data array with your desired values.
5. Scattering Data:
o send_count: Calculates the number of elements to send to each process, considering potential
uneven division.
o remainder: Tracks any remaining elements after dividing the total by the number of processes.
o MPI_Scatter: Distributes elements of the data array on the root process to all processes, including
the root itself. With a plain MPI_Scatter every process receives exactly send_count elements, so any
remainder elements must be handled separately (for example with MPI_Scatterv or an extra send).
local_data is allocated to store the received portion.
6. Printing Received Data: Each process prints the elements it received using a loop.
7. Memory Deallocation: Frees the allocated memory for local_data on all processes and data on the root
process.
8. Finalize MPI: MPI_Finalize cleans up the MPI environment and releases resources.
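A condensed sketch of steps 1-8 above (assuming, for simplicity, that the total number
of elements divides evenly among the processes; the element count n = 16 is only an
example, and Code 1 itself additionally computes send_count and remainder for the
uneven case):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int my_rank, comm_sz, n = 16;          /* assumed total array length */
    int *data = NULL, *local_data;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int send_count = n / comm_sz;          /* elements per process (no remainder here) */

    if (my_rank == 0) {                    /* root builds the full array */
        data = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) data[i] = i;
    }

    local_data = malloc(send_count * sizeof(int));
    MPI_Scatter(data, send_count, MPI_INT,
                local_data, send_count, MPI_INT, 0, MPI_COMM_WORLD);

    for (int i = 0; i < send_count; i++)
        printf("Process %d received %d\n", my_rank, local_data[i]);

    free(local_data);
    if (my_rank == 0) free(data);
    MPI_Finalize();
    return 0;
}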
Running MPI programs on a single-core system
• The --oversubscribe parameter in mpiexec (or mpirun) allows you to launch more MPI processes on a
single node than the number of available cores or hardware threads.
• The behavior of --oversubscribe might vary depending on the specific MPI implementation and system
configuration.
• Some MPI libraries might offer alternative options for process placement that provide more control over
oversubscription.
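For example, with Open MPI the provided programs can be started with more processes
than cores like this (the executable name code1 is just a placeholder, and the exact
flag may differ in other MPI implementations):

mpiexec --oversubscribe -n 8 ./code1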
Code 1
3 Practice
1. In MPI, achieving a desired outcome often involves multiple communication pathways. List the potential
substitutions between MPI functions that can achieve similar communication patterns.
2. Recall your version of the code in mpiSumArray.c (Code folder; a sample run is shown below) and
answer the following questions.
3. Make the necessary changes to the MPI program in Code 1 to perform the following.
a) Instead of printing the values of local_data, multiply the values of local_data by rank+1.
b) Gather the values of the arrays after multiplication into process 1 and show the result.
c) Use MPI_Reduce to calculate the minimum value of the array (after gathering) and let process 2 display the
result.
MPI identifiers
1. MPI_Bcast (5 arguments): distributes one data item to all processes.
   MPI_Bcast(&data, count, datatype, source, MPI_COMM_WORLD);
2. MPI_Scatter (8 arguments): distributes pieces of an array to all processes.
   MPI_Scatter(&sendbuf, sendcount, sendtype, &recvbuf, recvcount, recvtype, src, MPI_COMM_WORLD);
3. MPI_Gather (8 arguments): collects the pieces of an array from all processes into one complete array on the destination process.
   MPI_Gather(&sendbuf, sendcount, sendtype, &recvbuf, recvcount, recvtype, dest, MPI_COMM_WORLD);
4. MPI_Allgather (7 arguments): collects the pieces of an array from all processes into one complete array and distributes it to all processes.
   MPI_Allgather(&sendbuf, sendcount, sendtype, &recvbuf, recvcount, recvtype, MPI_COMM_WORLD);
5. MPI_Type_create_struct (5 arguments): creates a custom datatype made up of several variables.
   MPI_Type_create_struct(count, array_of_blocklengths, array_of_displacements, array_of_types, &newtype);
6. MPI_Get_count (3 arguments): finds how many elements were actually received.
   MPI_Get_count(&status, datatype, &count);
MPI constants
Name         Purpose
MPI_DOUBLE   Data type: double-precision floating-point numbers
MPI_INT      Data type: integers
MPI_CHAR     Data type: characters
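As a small illustration of MPI_Get_count from the table above (a sketch, not part of
the lab codes; the function name recv_and_count and the sender rank src are made up
for the example), the fragment below checks how many integers actually arrived when
the message length is not known in advance:

#include <mpi.h>
#include <stdio.h>

void recv_and_count(int src) {
    int buf[100];
    int count;
    MPI_Status status;

    /* receive up to 100 ints; the sender may have sent fewer */
    MPI_Recv(buf, 100, MPI_INT, src, 0, MPI_COMM_WORLD, &status);

    /* ask how many elements were actually delivered */
    MPI_Get_count(&status, MPI_INT, &count);
    printf("Received %d integers from process %d\n", count, src);
}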
Objective
To learn the basics of MPI programs.
To learn MPI point-to-point communication.
Lab Activities
1. Using the “hello, world” program in the previous lab, please do the following:
a. Include the MPI header file mpi.h at the top of your program.
b. Include the MPI function MPI_Init(), which initializes the program with the
MPI environment, and MPI_Finalize(), which cleans up the environment before
the program ends.
c. Edit the "hello, world" message so that it prints "Hello, world from process
with rank %d out of %d processes.", using my_rank and comm_sz for the two
format values.
d. Declare my_rank and comm_sz as type integer (int).
e. Include MPI function MPI_Comm_size() which returns the size of the
communicator (i.e. total number of processes).
f. Include MPI function MPI_Comm_rank() that tells the rank or id of the
process.
2. Compile and run the "MPI-hello, world" program with 1, 2, 4 and 8
processes. Write down the output, note the differences and explain.
Here is a sample with 4 processes:
4. Modify the above program by having process 0 send the value of year iteratively,
starting from 2018 up to 2021, to 4 different processes (i.e., 2018 to process 1, 2019 to
process 2, etc.).
Here is a sample:
Exercises
1. Please answer the following questions:
(i) Name the header file that you need to run MPI programs. Briefly explain the
content of the header file.
(ii) What are the two MPI functions that must be included in every MPI program?
What are they for?
2. Compile and run Program 3.1 on page 85 with 1, 2 and 4 processes. Explain what
the program does.
3. Modify the program so that it does the reverse, that is process 0 sends the greeting
to the rest of the processes and each prints the greeting.
Note:
All provided samples were written using VS2017, which cannot run the MPI programs directly
from the IDE, so "cmd.exe" was used to execute and run the programs; with VS2010 there is
no need for the command prompt, and the programs can be run directly from the IDE.
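On a typical Linux installation of MPICH or Open MPI, the same programs can instead be
built and launched from a terminal, for example (the file and executable names here are
only placeholders):

mpicc hello_mpi.c -o hello_mpi
mpiexec -n 4 ./hello_mpi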
#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define MAX_STRING 100

int main(void) {
    char greeting[MAX_STRING];
    int comm_sz;
    int my_rank;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank != 0) {
        /* every non-zero rank builds its greeting and sends it to process 0 */
        sprintf(greeting, "Hello, world from process with rank %d out of %d processes.\n",
                my_rank, comm_sz);
        MPI_Send(greeting, strlen(greeting) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
    else {
        /* process 0 prints its own greeting, then the greetings it receives */
        printf("Hello, world from process with rank %d out of %d processes.\n",
               my_rank, comm_sz);
        for (int q = 1; q < comm_sz; q++) {
            MPI_Recv(greeting, MAX_STRING, MPI_CHAR, q, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("%s", greeting);
        }
    }

    MPI_Finalize();
    //system("pause");
    return 0;
}
3.
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>
int main(void) {
int comm_sz;
int my_rank;
int year = 0;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
if (my_rank == 0) {
year = 2017;
for (int p = 1; p < comm_sz; p++) {
MPI_Send(&year, 1, MPI_INT, p, 0, MPI_COMM_WORLD);
}
}
else {
MPI_Recv(&year, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process %d received %d from process 0\n", my_rank, year);
}
MPI_Finalize();
//system("pause");
return 0;
}
4.
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>
int main(void) {
int comm_sz;
int my_rank;
int year = 2018;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
if (my_rank == 0) {
//0 sends that year to each
for (int p = 1; p < comm_sz; p++) {
MPI_Send(&year, 1, MPI_INT, p, 0, MPI_COMM_WORLD);
year++;
}
}
else {
MPI_Recv(&year, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process %d received %d from process 0\n", my_rank, year);
}
MPI_Finalize();
//system("pause");
return 0;
}
Objective
To learn MPI collective communication.
Lab Activities
1. Use MPI_Reduce() to sum up the process ranks of all the processes. The final
result should be in process 0.
Here is a sample with 4 processes:
2. In the same program, use MPI_Bcast() to send the result of the summation from
process 0 to all other processes.
Here is a sample with 4 processes:
Exercises
1. Modify the program in (5) so that process 0 reads the values into the vector, does
the summation, and prints the final results.
2. Examine the effect of multiple calls to MPI_Reduce() as presented in Table 3.3,
section 3.4.3 of Pacheco.
1:
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>
int main(void) {
int comm_sz;
int my_rank;
int sum;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
// we use MPI_Reduce instead of individual send/receive calls
MPI_Reduce(&my_rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (my_rank == 0) {
printf("Sum rank of %d processes is %d!\n", comm_sz, sum);
}
MPI_Finalize();
//system("pause");
return 0;
}
2.
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>
int main(void) {
int comm_sz;
int my_rank;
int sum;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
// sum the ranks on process 0 (as in listing 1) ...
MPI_Reduce(&my_rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
// ... then broadcast the result from process 0 to all other processes
MPI_Bcast(&sum, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf("Process %d: sum of all ranks is %d\n", my_rank, sum);
MPI_Finalize();
//system("pause");
return 0;
}
3.
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>
int main(void) {
int comm_sz;
int my_rank;
int sum;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Finalize();
//system("pause");
return 0;
}
4.
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>
/*
MPI_Scatter, MPI_Gather (vector addition)
*/
int main(void) {
int my_rank, comm_sz, n = 8;
int local_n;
int x[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };
int local_x[2];
int y[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };
int local_y[2];
int local_z[2];
int z[8];
MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
local_n = n / comm_sz;   // elements per process; local arrays are sized for 4 processes (local_n == 2)

// distribute x and y, add the local pieces, then collect the result in z on process 0
MPI_Scatter(x, local_n, MPI_INT, local_x, local_n, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Scatter(y, local_n, MPI_INT, local_y, local_n, MPI_INT, 0, MPI_COMM_WORLD);
for (int i = 0; i < local_n; i++)
    local_z[i] = local_x[i] + local_y[i];
MPI_Gather(local_z, local_n, MPI_INT, z, local_n, MPI_INT, 0, MPI_COMM_WORLD);

if (my_rank == 0) {
    for (int i = 0; i < n; i++)
        printf("%d: %d\n", i, z[i]);
}
MPI_Finalize();
//system("pause");
return 0;
}
//----------------------- serial (non-MPI) version for comparison
#include <stdio.h>
#include <string.h>
#include <iostream>
/*
matrix-vector multiplication
*/
void Mat_vect_mult(int A[] /*in*/, int x[] /*in*/, int y[] /*out*/,
                   int m /*in*/, int n /*in*/);
int main(void) {
int my_rank, comm_sz, m = 16, n = 4;
int A[16] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 5 };
int x[4] = { 0, 1, 1, 2 };
int y[4];
printf("A:\n");
for (int i = 0; i < m; i++) {
printf("%d:%d\n", i, A[i]);
}
printf("X:\n");
for (int i = 0; i < n; i++) {
printf("%d:%d\n", i, x[i]);
}
Mat_vect_mult(A, x, y, m, n);
printf("A*X:\n");
for (int i = 0; i < m / n; i++)
printf("%d: %d\n", i, y[i]);
//system("pause");
return 0;
}
// Serial matrix-vector multiplication
void Mat_vect_mult(int A[] /*in*/, int x[] /*in*/, int y[] /*out*/,
                   int m /*in*/, int n /*in*/) {
int i, j;
for (i = 0; i < m / n; i++) {
y[i] = 0;
for (j = 0; j < n; j++) {
y[i] += A[i * n + j] * x[j];
}
}
}
Lab Activities
1. Modify your sequential matrix multiplication from Lab 1 into a parallel matrix
multiplication.
2. Insert the necessary code to time the parallel matrix multiplication (please refer to
section 3.6.1; a timing sketch is given after this list).
3. Get the sequential run-time and parallel run-time for 2, 4, 8 and 16 processes for
varying matrix sizes as in Table 3.5 (Pacheco).
4. Calculate the speed up and the efficiency of the parallel program in (3) and plot a
graph for both.
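A minimal timing sketch in the style of section 3.6.1 (a fragment, assuming my_rank is
already set; the variable names are illustrative): every process times its own work,
and MPI_Reduce with MPI_MAX reports the slowest process, which is taken as the
parallel run-time.

double local_start, local_finish, local_elapsed, elapsed;

MPI_Barrier(MPI_COMM_WORLD);          /* start everyone at (roughly) the same time */
local_start = MPI_Wtime();

/* ... the parallel matrix multiplication being timed ... */

local_finish = MPI_Wtime();
local_elapsed = local_finish - local_start;

/* the parallel run-time is the time of the slowest process */
MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
if (my_rank == 0)
    printf("Elapsed parallel time = %e seconds\n", elapsed);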
Exercises
1. Obtain the speedup and efficiency for the following applications using varying numbers of processes
and data sizes:
a. parallel trapezoid program (section 3.2.2).
b. parallel pi computation (section 4.4).
1:
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>
/*
matrix-vector multiplication
*/
void Mat_vect_mult(int local_A[] /*in*/, int x[] /*in*/, int local_y[] /*out*/,
                   int local_m /*in*/, int n /*in*/, MPI_Comm comm /*in*/);
int main(void) {
int my_rank, comm_sz, m = 8/*number of rows*/, n = 4 /*number of columns*/,
local_m;
int A[32] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }; // m*n
int x[4] = { 0, 1, 3, 4 }; // n
int y[8]; // m
int local_A[8]; // local_m * n
int local_y[2]; // local_m
MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
local_m = m / comm_sz;   // rows per process; the local arrays are sized for 4 processes (local_m == 2)

// distribute the rows of A, broadcast x, multiply locally, then gather y on process 0
MPI_Scatter(A, local_m * n, MPI_INT, local_A, local_m * n, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(x, n, MPI_INT, 0, MPI_COMM_WORLD);
Mat_vect_mult(local_A, x, local_y, local_m, n, MPI_COMM_WORLD);
MPI_Gather(local_y, local_m, MPI_INT, y, local_m, MPI_INT, 0, MPI_COMM_WORLD);

//printf("local_y:\n");
//for (int i = 0; i<local_m; i++)
// printf("%d: %d\n", i, local_y[i]);
if (my_rank == 0) {
printf("A:\n");
for (int i = 0; i<m*n; i++) {
printf("%d:%d\n", i, A[i]);
}
printf("X:\n");
for (int i = 0; i<n; i++) {
printf("%d:%d\n", i, x[i]);
}
printf("A*X:\n");
for (int i = 0; i<m; i++)
printf("%d: %d\n", i, y[i]);
}
MPI_Finalize();
//system("pause");
return 0;
}

// Each process multiplies its local_m rows of A by the (broadcast) vector x.
void Mat_vect_mult(int local_A[] /*in*/, int x[] /*in*/, int local_y[] /*out*/,
                   int local_m /*in*/, int n /*in*/, MPI_Comm comm /*in*/) {
    int i, j;
    for (i = 0; i < local_m; i++) {
        local_y[i] = 0;
        for (j = 0; j < n; j++)
            local_y[i] += local_A[i * n + j] * x[j];
    }
}
2:
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>
/*
matrix-vector multiplication with timing
*/
void Mat_vect_mult(int local_A[] /*in*/, int x[] /*in*/, int local_y[] /*out*/, int
local_m /*in*/,
int n /*in*/, MPI_Comm comm /*in*/);
int main(void) {
int my_rank, comm_sz, m = 8 /*number of rows*/, n = 4 /*number of columns*/, local_m;
double local_start, local_finish, local_elapsed, elapsed;   // timing variables
int A[32] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }; // m*n
int x[4] = { 0, 1, 3, 4 }; // n
int y[8]; // m
int local_A[8]; // local_m * n
int local_y[2]; // local_m
MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
local_m = m / comm_sz;   // rows per process; the local arrays are sized for 4 processes (local_m == 2)

local_start = MPI_Wtime();
// the work being timed: distribute A, broadcast x, multiply locally, gather y
MPI_Scatter(A, local_m * n, MPI_INT, local_A, local_m * n, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(x, n, MPI_INT, 0, MPI_COMM_WORLD);
Mat_vect_mult(local_A, x, local_y, local_m, n, MPI_COMM_WORLD);
MPI_Gather(local_y, local_m, MPI_INT, y, local_m, MPI_INT, 0, MPI_COMM_WORLD);
local_finish = MPI_Wtime();
local_elapsed = local_finish - local_start;

// the parallel run-time is the time of the slowest process
MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

//printf("local_y:\n");
//for (int i = 0; i<local_m; i++)
// printf("%d: %d\n", i, local_y[i]);
if (my_rank == 0) {
printf("A:\n");
for (int i = 0; i<m*n; i++) {
printf("%d:%d\n", i, A[i]);
}
printf("X:\n");
for (int i = 0; i<n; i++) {
printf("%d:%d\n", i, x[i]);
}
printf("A*X:\n");
for (int i = 0; i<m; i++)
printf("%d: %d\n", i, y[i]);
MPI_Finalize();
//system("pause");
return 0;
}