
CS424.

Parallel Computing Lab#4

1 MPI Communications
In MPI (Message Passing Interface), communication plays a crucial role in coordinating work among independent
processes: because MPI processes are independent, they must communicate explicitly in order to coordinate. There are two
types of communication within MPI:

1. Point-to-Point Communication: involves sending and receiving messages between two specific processes
within the same communicator. It allows processes to exchange data directly with each other.
• Types of Point-to-Point Operations:
- Blocking Send/Receive: the call returns only when it is locally complete. MPI_Send returns once the send buffer can safely be reused (for a synchronous send, only after the matching receive has started), and MPI_Recv returns only after the message has been received.
- Non-blocking Send/Receive (MPI_Isend/MPI_Irecv): the call merely initiates the transfer and returns immediately; the process continues executing and later completes the operation with MPI_Wait or MPI_Test. (A minimal sketch of both styles appears after this list of communication types.)
2. Collective Communication: involves multiple processes working together as a group. It enables coordinated
operations across all processes in a communicator. It uses collective MPI functions, such as:
- MPI_Scatter: Distributes data from one process to many processes.
- MPI_Reduce: Performs data reduction (e.g., sum, max) across all processes.
- MPI_Bcast: Broadcasts data from one process to all others.
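For illustration only (this is neither Code 1 nor Code 2 below), the following minimal sketch contrasts the two point-to-point styles; it assumes at least two processes, and the non-blocking calls are completed with MPI_Wait:

#include <mpi.h>
#include <stdio.h>

/* Sketch: rank 0 sends an int to rank 1, first with blocking calls,
   then with non-blocking calls that are completed by MPI_Wait. */
int main(void) {
    int rank, value;
    MPI_Request req;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Blocking: each call returns only when its buffer is safe to reuse. */
    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Blocking receive got %d\n", value);
    }

    /* Non-blocking: the call only starts the transfer; MPI_Wait completes it. */
    if (rank == 0) {
        value = 43;
        MPI_Isend(&value, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, &req);
        /* ...other work could overlap with the communication here... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("Non-blocking receive got %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Run it with at least two processes, e.g. mpiexec -n 2 ./a.out.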

2 Examples
1. The following program implements a simple send/receive communication. Compile and run the program and
study the output of several runs.

Code 1

2. The following program demonstrates sending and receiving messages between two processes in a ping-pong
fashion. The communication does not work if the size of the communicator is not exactly 2.
Note that:
• Process 0: Sends a message to process 1 and then receives a message back.
• Process 1: Receives a message from process 0, sends a response, and then receives another message.
Compile and run the program and study the output of several runs.
Code 2

3 Practice
1. Revisit Code 1 and change the behavior of the program so process 0 can receive the results in the order in
which the processes finish.

2. Explain the execution of Code 2 if the size of the communicator is not 2.

3. Compile and run the code in mpiSumArray.c( Code folder), study the behavior of the program across several
runs, and write the proper code lines to address the comments marked as “// ** display…”. Your output
should be similar to the example below.

4. Write an MPI program in C to calculate the sum of two predefined arrays of at least length 12. The program
should ensure that it only runs when the number of processes is four or more. Use parallel execution to
compute partial sums and then combine them to get the final result.

CS424. Parallel Computing Lab#5

1 MPI Collective Communications


MPI's collective communication functions allow all processes in a communicator group to perform an operation
together. These functions offer efficient communication patterns for parallel programming. The following are some
common collective functions.

1. MPI_Bcast: One process (the root) has a piece of data. This data is sent to all other processes in
the communicator, so all processes hold the same data after the broadcast.
2. MPI_Reduce: Each process has a local value (e.g., an individual score). A mathematical operation (such as sum,
max, or min) is applied across the values contributed by all processes, and the result is stored on the
designated root process.
3. MPI_Allreduce: Similar to MPI_Reduce, but the reduced result is distributed to all processes.
4. MPI_Scatter: One process (the root) has a large dataset. The data is divided (scattered) into
smaller chunks and sent to all processes, so each process receives a portion of the original data.
5. MPI_Gather: Each process has a piece of data. All processes send their data to a designated root
process, which gathers all the pieces into a single collection.
6. MPI_Allgather: Gathers data from all processes and distributes the combined result to all processes.
7. MPI_Barrier: All processes in the communicator must reach this point before any can proceed. Useful
for synchronization, ensuring all processes have finished a specific task before moving on.

Benefits of Collective Communication:

• Simplifies code compared to sending and receiving messages individually between processes.
• Ensures efficient data exchange and synchronization within the communicator group.

Additional Notes:

• Not all collective functions require data exchange (e.g., MPI_Barrier).


• The designated "root" process can vary depending on the call; it is passed as an argument to the function.
• Unlike point-to-point operations, collective calls do not carry message tags; every process in the communicator must call the same collective function, in the same order.
• MPI_Allreduce gathers data from all processes, applies the chosen operation, and then distributes the result, while
MPI_Allgather simply gathers the data and distributes it unchanged.
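As a small illustration of the last note (a sketch, not part of the lab code), suppose every process contributes one integer, its own rank:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int rank, size, sum;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* MPI_Allreduce: every process ends up with the SUM of all ranks. */
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    /* MPI_Allgather: every process ends up with the LIST of all ranks, unchanged. */
    int *all_ranks = (int*)malloc(size * sizeof(int));
    MPI_Allgather(&rank, 1, MPI_INT, all_ranks, 1, MPI_INT, MPI_COMM_WORLD);

    printf("Process %d: reduced sum = %d, first gathered rank = %d\n",
           rank, sum, all_ranks[0]);

    free(all_ranks);
    MPI_Finalize();
    return 0;
}

After MPI_Allreduce every process holds the single reduced value; after MPI_Allgather every process holds the full array of contributed values.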

To optimize code and potentially enhance performance, it is beneficial to explore which MPI functions can be
effectively substituted with other functions while maintaining the core functionality of the program.

2 Examples
Code 1 distributes array values among processes using MPI_Scatter. Compile and run the program and study the
output of several runs.

Explanation:

1. Include headers: Necessary header files for MPI communication (mpi.h), standard input/output (stdio.h),
and memory allocation (stdlib.h).
2. MPI Initialization: MPI_Init initializes the MPI environment and prepares it for communication.
3. Get Process Information:
o MPI_Comm_rank: Retrieves the rank (unique identifier) of the current process within the
communicator group.
o MPI_Comm_size: Retrieves the total number of processes participating in the communicator group.
4. Global Array (Root Process Only):
o Allocates memory for the entire array data on the root process (rank == 0) using malloc.
o You can modify the loop to initialize the data array with your desired values.

1
5. Scattering Data:
o send_count: Calculates the number of elements to send to each process, considering potential
uneven division.
o remainder: Tracks any remaining elements after dividing the total by the number of processes.
o MPI_Scatter: Distributes elements of the data array on the root process to all processes, including
the root itself. With plain MPI_Scatter every process receives the same send_count elements; any
remainder must be handled separately (for example kept by the root, or distributed with MPI_Scatterv).
local_data is allocated to store the received portion.
6. Printing Received Data: Each process prints the elements it received using a loop.
7. Memory Deallocation: Frees the allocated memory for local_data on all processes and data on the root
process.
8. Finalize MPI: MPI_Finalize cleans up the MPI environment and releases resources.
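Code 1 itself is provided in the Code folder; the sketch below only mirrors the steps described above. The variable names data, local_data and send_count come from the explanation; the array length n, its initialization, and the decision to ignore any remainder (by choosing n divisible by the number of processes) are assumptions made here for brevity:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int rank, comm_sz, n = 16;      /* n: total array length (assumed) */
    int *data = NULL;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int send_count = n / comm_sz;   /* elements per process; remainder ignored here */

    if (rank == 0) {                /* global array exists on the root only */
        data = (int*)malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) data[i] = i;
    }

    int *local_data = (int*)malloc(send_count * sizeof(int));
    MPI_Scatter(data, send_count, MPI_INT, local_data, send_count, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Process %d received:", rank);
    for (int i = 0; i < send_count; i++) printf(" %d", local_data[i]);
    printf("\n");

    free(local_data);
    if (rank == 0) free(data);
    MPI_Finalize();
    return 0;
}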
Running MPI programs on a single core system
• The --oversubscribe parameter in mpiexec (or mpirun) allows you to launch more MPI processes on a
single node than the number of available cores or hardware threads.
• The behavior of --oversubscribe might vary depending on the specific MPI implementation and system
configuration.
• Some MPI libraries might offer alternative options for process placement that provide more control over
oversubscription.
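For example, with Open MPI a command of the form mpiexec --oversubscribe -n 8 ./program (or the equivalent mpirun invocation) launches 8 processes even on a machine with fewer cores; whether the flag is accepted, and how the extra processes are scheduled, depends on the MPI implementation, as noted above.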

Code 1

3 Practice
1. In MPI, achieving a desired outcome often involves multiple communication pathways. List the potential
substitutions between MPI functions that can achieve similar communication patterns.

2. Recall the version you wrote of the code in mpiSumArray.c (Code folder, sample run is shown below).
Answer the following questions.

a) Rewrite the program using MPI_Reduce.


b) Show, in four different ways (4 programs), how to calculate the total sum and send each process a copy of the result. Each
process must display the result it has received.

3. Make the necessary changes to the MPI program in Code 1 to perform the following.
a) Instead of printing the values of local_data, multiply the values of local_data by rank+1.
b) Gather the arrays after the multiplication into process 1 and show the result.
c) Use MPI_Reduce to calculate the minimum value of the gathered array and let process 2 display the
result.

MPI identifiers

Name | Purpose | Typical call | Arguments | Address (&) arguments
MPI_Init | Initializes and starts the MPI environment; called at the beginning of every MPI program | MPI_Init(NULL, NULL); or MPI_Init(&argc, &argv); | 2 | 2
MPI_Finalize | Shuts down the MPI environment; called at the end of the program | MPI_Finalize(); | 0 | 0
MPI_Comm_rank | Gets the rank of the calling process | MPI_Comm_rank(MPI_COMM_WORLD, &rank); | 2 | 1
MPI_Comm_size | Gets the number of processes inside the communicator | MPI_Comm_size(MPI_COMM_WORLD, &size); | 2 | 1
MPI_Send | Sends a message to another process | MPI_Send(&message, size, datatype, dest, tag, MPI_COMM_WORLD); | 6 | 1
MPI_Recv | Receives a message from another process | MPI_Recv(&message, size, datatype, src, tag, MPI_COMM_WORLD, &status); | 7 | 2
MPI_Reduce | Collects data from all processes and applies an arithmetic operation to it | MPI_Reduce(&sendbuf, &recvbuf, count, datatype, operator, dest, MPI_COMM_WORLD); | 7 | 2
MPI_Bcast | Distributes data to all processes | MPI_Bcast(&data, count, datatype, source, MPI_COMM_WORLD); | 5 | 1
MPI_Allreduce | Collects data from all processes, applies an arithmetic operation, and distributes the result to all processes | MPI_Allreduce(&sendbuf, &recvbuf, count, datatype, operator, MPI_COMM_WORLD); | 6 | 2
MPI_Scatter | Distributes chunks of an array to all processes | MPI_Scatter(&sendbuf, sendcount, sendtype, &recvbuf, recvcount, recvtype, src, MPI_COMM_WORLD); | 8 | 2
MPI_Gather | Collects the array chunks from all processes into one complete array | MPI_Gather(&sendbuf, sendcount, sendtype, &recvbuf, recvcount, recvtype, dest, MPI_COMM_WORLD); | 8 | 2
MPI_Allgather | Collects the array chunks from all processes into one complete array and distributes it to all processes | MPI_Allgather(&sendbuf, sendcount, sendtype, &recvbuf, recvcount, recvtype, MPI_COMM_WORLD); | 7 | 2
MPI_Type_create_struct | Creates a custom datatype made up of several variables | MPI_Type_create_struct(count, array_of_lengths, array_of_displacements, array_of_types, &newtype); | 5 | 1
MPI_Get_address | Gets the memory address of a variable | MPI_Get_address(&location, &address); | 2 | 2
MPI_Type_commit | Used right after MPI_Type_create_struct to activate the new type | MPI_Type_commit(&newtype); | 1 | 1
MPI_Type_free | Frees the datatype created with MPI_Type_create_struct | MPI_Type_free(&newtype); | 1 | 1
MPI_Get_count | Finds how many elements were actually received | MPI_Get_count(&status, datatype, &count); | 3 | 2
MPI_Barrier | Blocks all processes in the communicator until every process has reached this point | MPI_Barrier(MPI_COMM_WORLD); | 1 | 0
MPI_Wtime | Measures the time spent executing a part of the code | MPI_Wtime(); | 0 | 0
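The derived-datatype routines near the end of the table (MPI_Type_create_struct, MPI_Get_address, MPI_Type_commit, MPI_Type_free) do not appear in any of the lab codes; the following is a minimal sketch of how they are typically combined. The struct record_t and all variable names are illustrative only:

#include <mpi.h>

/* Illustrative struct, not taken from the labs. */
typedef struct { int id; double value; } record_t;

/* Builds an MPI datatype that describes record_t and returns it through *newtype. */
static void build_record_type(record_t *rec, MPI_Datatype *newtype) {
    int          lengths[2] = { 1, 1 };
    MPI_Datatype types[2]   = { MPI_INT, MPI_DOUBLE };
    MPI_Aint     displs[2], base;

    MPI_Get_address(rec, &base);               /* address of the whole struct */
    MPI_Get_address(&rec->id, &displs[0]);     /* address of each member...   */
    MPI_Get_address(&rec->value, &displs[1]);
    displs[0] -= base;                         /* ...turned into displacements */
    displs[1] -= base;

    MPI_Type_create_struct(2, lengths, displs, types, newtype);
    MPI_Type_commit(newtype);                  /* must be committed before use */
}
/* The committed type can then be used in MPI_Send, MPI_Bcast, etc.;
   when it is no longer needed, release it with MPI_Type_free(&newtype). */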

MPI Constants

Name | Purpose
MPI_DOUBLE | Floating-point (double) values — datatype specifiers
MPI_INT | Integer values
MPI_CHAR | Character values
MPI_SUM | Sum — reduction operations that can be applied to the data
MPI_PROD | Product
MPI_MAX | Maximum value
MPI_MIN | Minimum value
MPI_ANY_TAG | Used in the receive function to accept a message with any tag
MPI_ANY_SOURCE | Used in the receive function to accept a message from any source
MPI_Datatype | The type used to hold an MPI datatype
MPI_Aint | Used to store addresses
MPI_COMM_WORLD | The communicator that contains all the processes


KINGDOM OF SAUDI ARABIA
Ministry of Higher Education
Taibah University
College of Computer Science & Engineering

CS424 Introduction to Parallel Computing Semester II 2018-2019

LAB 5: Introduction to Message Passing Interface (MPI)

Objective
• To learn the basics of an MPI program.
• To learn MPI point-to-point communication.

Lab Activities
1. Using the “hello, world” program in the previous lab, please do the following:
a. Include the MPI header file mpi.h at the top of your program.
b. Include the MPI function MPI_Init(), which initializes the program with the
MPI environment, and MPI_Finalize(), which cleans up the
environment before the program ends.
c. Edit the “hello, world” message to be printf("Hello, world from process with
rank %d out of %d processes.\n", my_rank, comm_sz);
d. Declare my_rank and comm_sz as type integer.
e. Include MPI function MPI_Comm_size() which returns the size of the
communicator (i.e. total number of processes).
f. Include MPI function MPI_Comm_rank() that tells the rank or id of the
process.

2. Compile and run the “MPI-hello, world” program with 1 process, 2, 4 and 8
processes. Write down the output, note the differences and explain.
Here is a sample with 4 processes:



3. Write, compile and run an MPI program which does the following point-to-point
communication.
• Process 0 sends the value 2017 to processes 1, 2 and 3.
• Each receiving process prints a message upon getting the value.
Here is a sample:

4. Modify the above program by having process 0 send the value of year iteratively,
starting from 2018 to 2021, to 4 different processes (i.e., 2018 to process 1, 2019 to
process 2, etc.).
Here is a sample:

Exercises
1. Please answer the following questions:
(i) Name the header file that you need to run MPI programs. Briefly explain the
content of the header file.
(ii) What are the two MPI functions that must be included in every MPI program?
What are they for?



(iii) The rank of a process can be identified using which MPI function?
(iv) Name the default communicator in MPI.
(v) Are MPI_Send() and MPI_Recv() blocking or non-blocking operations?
(vi) Name the different variations of send and receive operations.

2. Compile and run Program 3.1 on page 85 with 1, 2 and 4 processes. Explain what
the program does.

3. Modify the program so that it does the reverse, that is process 0 sends the greeting
to the rest of the processes and each prints the greeting.

Note:

All provided samples were written using VS2017, which cannot run the MPI programs directly
from the IDE, so "cmd.exe" was used to execute and run the programs. With VS2010 there is no
need to use the command line; the programs can be run directly from the IDE.



1, 2:
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>

const int MAX_STRING = 100;

int main(void) {
    char greeting[MAX_STRING];
    int comm_sz;
    int my_rank;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank != 0) {
        /* Every non-zero rank builds its greeting and sends it to process 0 */
        sprintf(greeting, "Hello, world from process with rank %d out of %d processes.\n", my_rank, comm_sz);
        MPI_Send(greeting, strlen(greeting) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
    else {
        /* Process 0 prints its own greeting, then receives and prints the others in rank order */
        printf("Hello, world from process with rank %d out of %d processes.\n", my_rank, comm_sz);
        for (int q = 1; q < comm_sz; q++) {
            MPI_Recv(greeting, MAX_STRING, MPI_CHAR, q, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s", greeting);   /* print the greeting received from rank q */
        }
    }
    MPI_Finalize();
    //system("pause");
    return 0;
}
3.

#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>

const int MAX_STRING = 100;

int main(void) {
int comm_sz;
int my_rank;
int year = 0;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

if (my_rank == 0) {
year = 2017;
for (int p = 1; p < comm_sz; p++) {
MPI_Send(&year, 1, MPI_INT, p, 0, MPI_COMM_WORLD);
}
}
else {
MPI_Recv(&year, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process %d received %d from process 0\n", my_rank, year);
}

MPI_Finalize();
//system("pause");
return 0;
}
4.
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>

const int MAX_STRING = 100;

int main(void) {
int comm_sz;
int my_rank;
int year = 2018;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

if (my_rank == 0) {
//0 sends that year to each
for (int p = 1; p < comm_sz; p++) {
MPI_Send(&year, 1, MPI_INT, p, 0, MPI_COMM_WORLD);
year++;
}
}
else {
MPI_Recv(&year, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Process %d received %d from process 0\n", my_rank, year);
}

MPI_Finalize();
//system("pause");
return 0;
}
KINGDOM OF SAUDI ARABIA
Ministry of Higher Education
Taibah University
College of Computer Science & Engineering

CS424 Introduction to Parallel Computing Semester II 2018-2019

LAB 6: MPI – Collective Communication

Objective
• To learn MPI collective communication.

Lab Activities
1. Use MPI_Reduce() to sum up the ranks of all the processes. The final
result is in process 0.
Here is a sample with 4 processes:

2. In the same program, use MPI_Bcast() to send the result of the summation from
process 0 to all other processes.
Here is a sample with 4 processes:



3. Use only MPI_Allreduce() to do the operations in (1) and (2) above.
4. Modify the sequential vector summation of Lab 1 using MPI_Scatter() and
MPI_Gather().
Here is a sample with 4 processes:

5. Repeat number (4) using MPI_Allgather().

Exercises
1. Modify the program in (5) so that process 0 reads the values into the vector, does
the summation, and prints the final results.
2. Examine the effect of multiple calls to MPI_Reduce () as presented in Table 3.3
section 3.4.3 of Pacheco.
Note:

All provided samples were written using VS2017, which cannot run the MPI programs directly
from the IDE, so "cmd.exe" was used to execute and run the programs. With VS2010 there is no
need to use the command line; the programs can be run directly from the IDE.



LAB 6: MPI – Collective Communication

1:
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>

int main(void) {
int comm_sz;
int my_rank;
int sum;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
// use MPI_Reduce instead of individual send/receive calls
MPI_Reduce(&my_rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

if (my_rank == 0) {
printf("Sum rank of %d processes is %d!\n", comm_sz, sum);
}

MPI_Finalize();
//system("pause");
return 0;
}
2.
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>

int main(void) {
int comm_sz;
int my_rank;
int sum;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

MPI_Reduce(&my_rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);


MPI_Bcast(&sum, 1, MPI_INT, 0, MPI_COMM_WORLD);

printf("Process %d: Sum rank of %d processes is %d!\n", my_rank, comm_sz, sum);

MPI_Finalize();
//system("pause");
return 0;
}
3.

#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>

int main(void) {
int comm_sz;
int my_rank;
int sum;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

MPI_Allreduce(&my_rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

printf("Process %d: Sum rank of %d processes is %d!\n", my_rank, comm_sz, sum);

MPI_Finalize();
//system("pause");
return 0;
}
4.
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>

/*
MPI_Scatter, MPI_Gather (vector addition)
*/

void Parallel_vector_sum(int &my_rank, int local_x[] /*in*/, int local_y[] /*in*/, int local_z[] /*out*/, int local_n /*in*/);

int main(void) {
int my_rank, comm_sz, n = 8;
int local_n;
int x[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };
int local_x[2];
int y[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };
int local_y[2];
int local_z[2];
int z[8];

MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

local_n = n / comm_sz;

//printf("Proccess %d\n", my_rank);

MPI_Scatter(x, local_n, MPI_INT, local_x, local_n, MPI_INT, 0, MPI_COMM_WORLD);


MPI_Scatter(y, local_n, MPI_INT, local_y, local_n, MPI_INT, 0, MPI_COMM_WORLD);
Parallel_vector_sum(my_rank, local_x, local_y, local_z, local_n);
MPI_Gather(&local_z, 2, MPI_INT, &z, 2, MPI_INT, 0, MPI_COMM_WORLD);

if (my_rank == 0) {
for (int i = 0; i<n; i++)
printf("%d: %d\n", i, z[i]);
}
MPI_Finalize();
//system("pause");
return 0;
}

void Parallel_vector_sum(int &my_rank, int local_x[] /*in*/, int local_y[] /*in*/, int local_z[] /*out*/, int local_n /*in*/) {
    int local_i;

    for (local_i = 0; local_i < local_n; local_i++)
        local_z[local_i] = local_x[local_i] + local_y[local_i];
}
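5 (a sketch, not part of the original handout): Lab Activity 5 asks for the same computation using MPI_Allgather(); no sample solution is given above, so the following minimal sketch assumes the same 8-element vectors as in (4). The only substantive change is that the rooted MPI_Gather is replaced by MPI_Allgather, so every process ends up with the complete result and prints it.

#include <mpi.h>
#include <stdio.h>

int main(void) {
    int my_rank, comm_sz, n = 8;
    int x[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };
    int y[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };
    int z[8];
    int local_x[8], local_y[8], local_z[8];

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int local_n = n / comm_sz;

    MPI_Scatter(x, local_n, MPI_INT, local_x, local_n, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Scatter(y, local_n, MPI_INT, local_y, local_n, MPI_INT, 0, MPI_COMM_WORLD);

    for (int i = 0; i < local_n; i++)          /* local partial sums */
        local_z[i] = local_x[i] + local_y[i];

    /* MPI_Allgather: like MPI_Gather, but every process receives the full z. */
    MPI_Allgather(local_z, local_n, MPI_INT, z, local_n, MPI_INT, MPI_COMM_WORLD);

    printf("Process %d has the full result:", my_rank);
    for (int i = 0; i < n; i++)
        printf(" %d", z[i]);
    printf("\n");

    MPI_Finalize();
    return 0;
}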
// -------------------- Sequential reference version (Lab 1): vector addition
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <iostream>
/*
vector addition
*/
void Vector_sum(int x[], int y[], int z[], int n);
int main(void) {
int x[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };
int y[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };
int z[8];
int i, n = 8;
Vector_sum(x, y, z, n);
printf("X:\n");
for (i = 0; i < n; i++)
printf("%d: %d\n", i, x[i]);
printf("Y:\n");
for (i = 0; i < n; i++)
printf("%d: %d\n", i, y[i]);
printf("X+Y:\n");
for (i = 0; i < n; i++)
printf("%d: %d\n", i, z[i]);
//system("pause");
return 0;
}
void Vector_sum(int x[], int y[], int z[], int n) {
    int i;
    for (i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

// -------------------- Sequential reference version: matrix-vector multiplication
#include <stdio.h>
#include <string.h>
#include <iostream>
/*
matrix-vector multiplication
*/
void Mat_vect_mult(int A[] /*in*/, int x[] /*in*/, int y[] /*out*/, int m /*in*/, int n /*in*/);
int main(void) {
int my_rank, comm_sz, m = 16, n = 4;
int A[16] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 5 };
int x[4] = { 0, 1, 1, 2 };
int y[4];
printf("A:\n");
for (int i = 0; i < m; i++) {
printf("%d:%d\n", i, A[i]);
}
printf("X:\n");
for (int i = 0; i < n; i++) {
printf("%d:%d\n", i, x[i]);
}
Mat_vect_mult(A, x, y, m, n);
printf("A*X:\n");
for (int i = 0; i < m / n; i++)
printf("%d: %d\n", i, y[i]);
//system("pause");
return 0;
}
// Serial matrix-vector multiplication
void Mat_vect_mult(int A[] /*in*/, int x[] /*in*/, int y[] /*out*/, int m /*in*/, int n /*in*/) {
int i, j;
for (i = 0; i < m / n; i++) {
y[i] = 0;
for (j = 0; j < n; j++) {
y[i] += A[i * n + j] * x[j];
}
}
}
KINGDOM OF SAUDI ARABIA
Ministry of Higher Education
Taibah University
College of Computer Science & Engineering

CS424 Introduction to Parallel Computing Semester II 2018-2019

LAB 7: MPI - Performance Evaluation


Objective
• To write simple parallel applications using MPI.
• To evaluate the performance of parallel applications.

Lab Activities
1. Modify your sequential matrix multiplication from Lab 1 into a parallel matrix
multiplication.
2. Insert the necessary code to time the parallel matrix multiplication (please refer to
section 3.6.1).
3. Get the sequential run-time and parallel run-time for 2, 4, 8 and 16 processes for
varying matrix sizes as in Table 3.5 (Pacheco).
4. Calculate the speed up and the efficiency of the parallel program in (3) and plot a
graph for both.
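As a reminder (these formulas are not restated in this handout): with p processes, speedup is S = T_serial / T_parallel and efficiency is E = S / p = T_serial / (p * T_parallel). For example, if the serial run takes 8 seconds and the 4-process run takes 2.5 seconds, then S = 3.2 and E = 0.8.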

Exercises
1. Obtain the speedup and efficiency of the following applications using varying numbers of processes
and data sizes:
a. parallel trapezoid program (section 3.2.2).
b. parallel pi computation (section 4.4).



A sample with 4 processes for Lab Activity 1:



A sample with 4 processes for Lab Activity 2:



Note:

All provided samples were written using VS2017, which cannot run the MPI programs directly
from the IDE, so "cmd.exe" was used to execute and run the programs. With VS2010 there is no
need to use the command line; the programs can be run directly from the IDE.



LAB 7: MPI - Performance Evaluation

1:
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>

/*
matrix-vector multiplication
*/

void Mat_vect_mult(int local_A[] /*in*/, int x[] /*in*/, int local_y[] /*out*/, int local_m /*in*/, int n /*in*/, MPI_Comm comm /*in*/);

int main(void) {
int my_rank, comm_sz, m = 8/*number of rows*/, n = 4 /*number of columns*/,
local_m;
int A[32] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }; // m*n
int x[4] = { 0, 1, 3, 4 }; // n
int y[8]; // m
int local_A[8]; // local_m * n
int local_y[2]; // local_m

MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

local_m = m / comm_sz;

printf("Proccess %d\n", my_rank);

MPI_Scatter(A, local_m*n, MPI_INT, local_A, local_m*n, MPI_INT, 0, MPI_COMM_WORLD);
//printf("local_A\n");
//for (int i = 0; i<local_m*n; i++) {
// printf("%d:%d\n", i, local_A[i]);
//}

Mat_vect_mult(local_A, x, local_y, local_m, n, MPI_COMM_WORLD);

MPI_Gather(local_y, local_m, MPI_INT, y, local_m, MPI_INT, 0, MPI_COMM_WORLD);

//printf("local_y:\n");
//for (int i = 0; i<local_m; i++)
// printf("%d: %d\n", i, local_y[i]);

if (my_rank == 0) {
printf("A:\n");
for (int i = 0; i<m*n; i++) {
printf("%d:%d\n", i, A[i]);
}

printf("X:\n");
for (int i = 0; i<n; i++) {
printf("%d:%d\n", i, x[i]);
}

printf("A*X:\n");
for (int i = 0; i<m; i++)
printf("%d: %d\n", i, y[i]);
}

MPI_Finalize();
//system("pause");
return 0;
}

// An MPI matrix-vector multiplication function


void Mat_vect_mult(int local_A[] /*in*/, int x[] /*in*/, int local_y[] /*out*/, int local_m /*in*/, int n /*in*/, MPI_Comm comm /*in*/) {
int local_i, j;

MPI_Bcast(x, n, MPI_INT, 0, comm);

    for (local_i = 0; local_i < local_m; local_i++) {
        local_y[local_i] = 0;
        for (j = 0; j < n; j++)
            local_y[local_i] += local_A[local_i*n + j] * x[j];
    }
}
2.
#include "stdafx.h"
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <iostream>

/*
matrix-vector multiplication
*/

void Mat_vect_mult(int local_A[] /*in*/, int x[] /*in*/, int local_y[] /*out*/, int local_m /*in*/, int n /*in*/, MPI_Comm comm /*in*/);

int main(void) {
int my_rank, comm_sz, m = 8/*number of rows*/, n = 4 /*number of columns*/,
local_m;
int A[32] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }; // m*n
int x[4] = { 0, 1, 3, 4 }; // n
int y[8]; // m
int local_A[8]; // local_m * n
int local_y[2]; // local_m

MPI_Init(NULL, NULL);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

local_m = m / comm_sz;

printf("Proccess %d\n", my_rank);

MPI_Scatter(A, local_m*n, MPI_INT, local_A, local_m*n, MPI_INT, 0, MPI_COMM_WORLD);
//printf("local_A\n");
//for (int i = 0; i<local_m*n; i++) {
// printf("%d:%d\n", i, local_A[i]);
//}

double local_start, local_finish, local_elapsed, elapsed;


MPI_Barrier(MPI_COMM_WORLD);

local_start = MPI_Wtime();

Mat_vect_mult(local_A, x, local_y, local_m, n, MPI_COMM_WORLD);

local_finish = MPI_Wtime();
local_elapsed = local_finish - local_start;

MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

printf("Proc %d > Elapsed time = %f seconds\n", my_rank, local_elapsed);


MPI_Gather(local_y, local_m, MPI_INT, y, local_m, MPI_INT, 0, MPI_COMM_WORLD);

//printf("local_y:\n");
//for (int i = 0; i<local_m; i++)
// printf("%d: %d\n", i, local_y[i]);

if (my_rank == 0) {
printf("A:\n");
for (int i = 0; i<m*n; i++) {
printf("%d:%d\n", i, A[i]);
}

printf("X:\n");
for (int i = 0; i<n; i++) {
printf("%d:%d\n", i, x[i]);
}

printf("A*X:\n");
for (int i = 0; i<m; i++)
printf("%d: %d\n", i, y[i]);

printf("Max Elapsed time = %f seconds\n", elapsed);


}

MPI_Finalize();
//system("pause");
return 0;
}

// An MPI matrix-vector multiplication function


void Mat_vect_mult(int local_A[] /*in*/, int x[] /*in*/, int local_y[] /*out*/, int local_m /*in*/, int n /*in*/, MPI_Comm comm /*in*/) {
int local_i, j;

MPI_Bcast(x, n, MPI_INT, 0, comm);

    for (local_i = 0; local_i < local_m; local_i++) {
        local_y[local_i] = 0;
        for (j = 0; j < n; j++)
            local_y[local_i] += local_A[local_i*n + j] * x[j];
    }
}
