DEPARTMENT:
LABORATORY:
Name Roll No
Semester Course Branch
Subject Code & Title .
University Register No.
Certified that this is a bonafide record of work done by the above student in the laboratory
during the year 2022 - 2023.
Signature of the staff in-charge Head of the Department
Submitted for the Anna University practical examination held on _______________
INTERNAL EXAMINER EXTERNAL EXAMINER
S.NO | DATE | TITLE OF THE EXPERIMENT | PAGE No. | REMARKS
OpenMP Fork-Join Parallelism
Aim:
To write a simple C program to demonstrate OpenMP fork-join parallelism.
Algorithm:
Identify the task to be parallelized: Start by identifying a task or a set of tasks that can be
executed concurrently. These tasks should be independent or have minimal dependencies
on each other.
Fork phase: In this phase, the program splits into multiple concurrent threads or processes
to execute the identified tasks. The fork operation creates child threads or processes, each
responsible for executing a specific portion of the work.
Assign tasks to threads/processes: Divide the identified tasks among the available threads
or processes. Each thread/process should be assigned a distinct portion of the overall
work to execute independently.
Execute tasks concurrently: Each thread/process executes its assigned task(s)
independently and concurrently with other threads/processes. This parallel execution can
significantly improve performance by utilizing multiple computing resources.
Join phase: Once all the threads/processes have completed their respective tasks, they
rejoin the main thread/process. The join operation waits for all child threads/processes to
finish their execution before proceeding further.
Merge results: After the join phase, you can collect and merge the results obtained from
each thread/process. This step allows you to combine the individual outputs of parallel
tasks into a final result.
Continue program execution: With the merged results, you can continue with the
remaining sequential portion of the program, which might involve further processing or
output generation based on the parallel computations.
Program:
#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("Before: total thread number is %d\n", omp_get_num_threads());
    #pragma omp parallel
    {
        printf("Thread id is %d\n", omp_get_thread_num());
    }
    printf("After: total thread number is %d\n", omp_get_num_threads());
    return 0;
}
Output:
# gcc -fopenmp parallel.c
# ./a.out
Before: total thread number is 1
Thread id is 0
Thread id is 1
After: total thread number is 1
Result:
Thus the C program demonstrating OpenMP fork-join parallelism was written and the
output was obtained successfully.
Matrix-Vector Multiplication B=Ax
Aim:
To write a C++ program for a simple matrix-vector multiplication b = Ax.
Algorithm:
Check that the matrix and vector are compatible for multiplication: the number of
columns in the matrix must equal the number of elements in the vector. If they are not
equal, the product b = Ax is undefined.
Create a result vector to store b. Its length will be equal to the number of rows in the
matrix, and every element should start at zero.
Use nested loops to compute the result. The outer loop iterates over the rows of the
matrix, and the inner loop iterates over its columns.
Within the nested loops, multiply each element of the current row by the corresponding
element of the vector, and accumulate the sum into the corresponding element of the
result vector.
Repeat this process for each row until all elements of the result have been calculated.
Once the multiplication is complete, return the result vector b.
Program:
#include <iostream>
#include <vector>
#include <omp.h>
// Function to perform matrix-vector multiplication
std::vector<int> matrixVectorMultiplication(const std::vector<std::vector<int>>& matrix,
                                            const std::vector<int>& vector) {
    int matrixSize = matrix.size();
    std::vector<int> result(matrixSize, 0);
    #pragma omp parallel for
    for (int i = 0; i < matrixSize; i++) {
        for (int j = 0; j < matrixSize; j++) {
            result[i] += matrix[i][j] * vector[j];
        }
    }
    return result;
}
int main() {
    std::vector<std::vector<int>> matrix = {{1, 2, 3},
                                            {4, 5, 6},
                                            {7, 8, 9}};
    std::vector<int> vector = {1, 2, 3};
    std::vector<int> result = matrixVectorMultiplication(matrix, vector);
    // Print the result
    std::cout << "Result: ";
    for (std::size_t i = 0; i < result.size(); i++) {
        std::cout << result[i] << " ";
    }
    std::cout << std::endl;
    return 0;
}
Output:
Result: 14 32 50
Result:
Thus the C++ program for a simple matrix-vector multiplication b = Ax was written and
the output was obtained successfully.
SUM OF ALL THE ELEMENTS IN AN ARRAY AND LARGEST
NUMBER
Aim:
To write a C/C++ program to compute the sum of all numbers in an array and to find the
largest number in the array.
Algorithm:
1. Start by defining a variable to hold the sum of numbers. Set this variable to 0.
2. Define a variable to hold the largest number. Set this variable to the smallest possible
value, often represented as negative infinity.
3. If you have a list of numbers, iterate over each number in the list.
4. For each number, add it to the sum variable.
5. Update the largest number variable if the current number is greater than the previously
stored largest number.
6. Continue iterating over the remaining numbers in the list until all numbers have been
processed.
7. Once all the numbers have been processed, you can use the sum variable to get the sum
of the numbers.
8. You can also use the largest number variable to get the largest number.
9. Return the sum of the numbers and the largest number as the final result.
Program:
#include <iostream>
#include <vector>
#include <omp.h>
// Function to compute the sum of all elements in an array
int computeArraySum(const std::vector<int>& array) {
    int sum = 0;
    int arraySize = array.size();
    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < arraySize; i++) {
        sum += array[i];
    }
    return sum;
}

int main() {
    std::vector<int> array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int sum = computeArraySum(array);
    std::cout << "Sum: " << sum << std::endl;
    return 0;
}

//largest number
#include <iostream>
#include <vector>
#include <limits>
#include <omp.h>

// Function to find the largest number in an array
int findLargestNumber(const std::vector<int>& array) {
    int largestNumber = std::numeric_limits<int>::min();
    int arraySize = array.size();
    #pragma omp parallel for reduction(max: largestNumber)
    for (int i = 0; i < arraySize; i++) {
        if (array[i] > largestNumber) {
            largestNumber = array[i];
        }
    }
    return largestNumber;
}

int main() {
    std::vector<int> array = {10, 7, 22, 4, 15, 9, 17, 3, 12};
    int largestNumber = findLargestNumber(array);
    std::cout << "Largest Number: " << largestNumber << std::endl;
    return 0;
}
Output:
Sum: 55
Largest Number: 22
Result:
Thus the C++ program to compute the sum of all numbers and find the largest number was
written and the output was obtained successfully.
MESSAGE-PASSING LOGIC
Aim:
To write a simple C++ program demonstrating message-passing logic using MPI.
Algorithm:
1. Include the <iostream> and <mpi.h> header files in your program to use the MPI
functions and datatypes.
2. Initialize MPI at the beginning of your program using the MPI_Init() function. Pass the
argc and argv parameters to this function.
3. Declare variables to hold the rank and size of the communicator using the int data type.
4. Use MPI_Comm_rank(MPI_COMM_WORLD, &rank) to get the rank of the current process,
and MPI_Comm_size(MPI_COMM_WORLD, &size) to get the total number of processes in
the communicator.
5. Implement the main logic of your program inside an if statement. The example below
demonstrates sending a message from process 0 to process 1.
o In the if block for process 0:
Declare a character array to hold the message.
Use sprintf() to format the message string with relevant information.
Use MPI_Send() to send the message to process 1. Provide the following
arguments to the function: the message buffer, the length of the message
(calculated using strlen() + 1), the datatype (MPI_CHAR), the destination
rank (1 in this case), the tag (0 in this case), and the communicator
(MPI_COMM_WORLD).
Print a message indicating that the message has been sent.
o In the else if block for process 1:
Declare a character array to receive the message.
Use MPI_Recv() to receive the message from process 0. Provide the
following arguments to the function: the message buffer, the maximum
length of the message buffer, the datatype (MPI_CHAR), the source rank (0
in this case), the tag (0 in this case), and the communicator
(MPI_COMM_WORLD).
Print the received message.
6. Finally, call MPI_Finalize() at the end of your program to terminate MPI.
To compile and run this program using an MPI C++ compiler:
Save the code in a file (e.g., message_passing.cpp).
Open a terminal and navigate to the directory containing the file.
Compile the program using an MPI C++ compiler (e.g., mpicxx message_passing.cpp
-o message_passing).
Run the program using an MPI launcher with the desired number of processes (e.g.,
mpirun -n 2 ./message_passing).
The program will run and demonstrate message-passing between processes, printing the
messages sent and received.
Program:
#include <iostream>
#include <cstring>
#include <mpi.h>

int main(int argc, char** argv) {
    int rank, size;
    char message[100];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        sprintf(message, "Hello from process %d", rank);
        MPI_Send(message, strlen(message) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        std::cout << "Message Sent: " << message << std::endl;
    } else if (rank == 1) {
        MPI_Recv(message, 100, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::cout << "Message Received: " << message << std::endl;
    }
    MPI_Finalize();
    return 0;
}
Output:
Message Sent: Hello from process 0
Message Received: Hello from process 0
Result:
Thus the C++ program for message-passing logic was written and the output was obtained
successfully.
Floyd's Algorithm
Aim:
To write a C++ program for All-Pairs Shortest-Path Problem (Floyd's Algorithm).
Algorithm:
Initialize the distance matrix:
Create a 2D distance matrix of size n x n, where n is the number of vertices in the graph.
Set all matrix elements to INF (representing an unreachable path) except for the diagonal
elements, which should be set to 0 (representing the distance from a vertex to itself).
Input the edges and their weights:
Get the number of edges (m) from the user.
Iterate m times and for each iteration:
Input the source vertex (u), destination vertex (v), and weight of the edge.
Update the distance matrix with the provided weight at the corresponding matrix indices.
Perform the All-Pairs Shortest-Path calculation using Floyd's Algorithm:
Iterate over k from 0 to n-1 (representing intermediate vertices):
For each pair of vertices (i, j) in the distance matrix:
If the distance from i to k and the distance from k to j is smaller than the current distance
from i to j:
Update the distance from i to j with the sum of the distances from i to k and k to j.
Output the resulting distance matrix:
Print the distance matrix, showing the shortest distances between all pairs of vertices.
If a path is unreachable, display INF instead of the actual distance.
Program:
#include <iostream>
#include <vector>
#include <limits>
#define INF std::numeric_limits<int>::max()
void floydAlgorithm(std::vector<std::vector<int>>& graph, int n) {
    // Applying Floyd's Algorithm
    for (int k = 0; k < n; ++k) {
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                if (graph[i][k] != INF && graph[k][j] != INF &&
                    graph[i][k] + graph[k][j] < graph[i][j]) {
                    graph[i][j] = graph[i][k] + graph[k][j];
                }
            }
        }
    }
}
int main() {
    int n; // Number of vertices
    // Get the number of vertices from the user
    std::cout << "Enter the number of vertices: ";
    std::cin >> n;
    // Create the graph matrix and initialize it
    std::vector<std::vector<int>> graph(n, std::vector<int>(n));
    std::cout << "Enter the adjacency matrix (enter -1 for unreachable vertices):" << std::endl;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            std::cin >> graph[i][j];
            if (graph[i][j] == -1) {
                graph[i][j] = INF;
            }
        }
    }
    // Apply Floyd's Algorithm
    floydAlgorithm(graph, n);
    // Print the resulting shortest paths
    std::cout << "Shortest paths between vertices:" << std::endl;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (graph[i][j] == INF) {
                std::cout << "INF\t";
            } else {
                std::cout << graph[i][j] << "\t";
            }
        }
        std::cout << std::endl;
    }
    return 0;
}
Output:
Result:
Thus the C++ program for an All-Pairs Shortest-Path Problem (Floyd's Algorithm) was
written and the output is obtained successfully.
Parallel Random Number Generators using Monte Carlo
Methods
Aim:
To implement a program for parallel random number generation using Monte Carlo
methods in OpenMP.
Algorithm:
Determine the number of random points (numPoints) you want to generate and the
number of threads (numThreads) you want to use for parallel execution.
Initialize the random number generator. Use a suitable random number generator library
or function to generate random numbers. Make sure to seed the generator properly to
ensure different sequences of random numbers for each thread.
Set the number of threads for parallel execution using the appropriate function provided
by your parallel programming framework or library. For example, in OpenMP, you can
use the omp_set_num_threads() function.
Divide the total number of random points (numPoints) among the specified number of
threads (numThreads) using parallel loop constructs provided by your parallel
programming framework. For example, in OpenMP, you can use the omp for pragma.
Inside the parallel loop, each thread generates its portion of random points. Use the
random number generator to generate random numbers within the desired range.
For each generated random point, perform the desired computation or estimation using
the Monte Carlo method. This can involve checking if the random point falls within a
specific region or satisfies certain conditions.
If necessary, accumulate the results or statistics obtained by each thread using appropriate
synchronization mechanisms provided by your parallel programming framework. For
example, you can use atomic operations or reduction clauses to safely accumulate the
results from each thread.
Once the parallel computation is complete, combine the accumulated results or statistics
obtained from each thread to obtain the final result of your Monte Carlo estimation.
Output the result or perform any additional post-processing required.
Compile and run the program, making sure to enable the necessary parallel execution
support in your compiler or by linking against the appropriate parallel programming
libraries.
Program:
#include <iostream>
#include <random>
#include <vector>
#include <omp.h>
// Function to estimate Pi using the Monte Carlo method in parallel
double estimatePiParallel(int numPoints) {
    int numPointsInsideCircle = 0;
    double x, y;
    #pragma omp parallel private(x, y) reduction(+: numPointsInsideCircle)
    {
        // Each thread gets its own generator, seeded independently
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_real_distribution<double> dis(-1.0, 1.0);
        #pragma omp for
        for (int i = 0; i < numPoints; i++) {
            x = dis(gen);
            y = dis(gen);
            if ((x * x) + (y * y) <= 1.0) {
                numPointsInsideCircle++;
            }
        }
    }
    return 4.0 * numPointsInsideCircle / numPoints;
}

int main() {
    int numPoints;
    // Get the number of points from the user
    std::cout << "Enter the number of points: ";
    std::cin >> numPoints;
    // Estimate Pi using the Monte Carlo method in parallel
    double pi = estimatePiParallel(numPoints);
    std::cout << "Estimated Pi: " << pi << std::endl;
    return 0;
}
Output:
Enter the number of points: 1
Estimated Pi: 4
Result:
Thus the C++ program for parallel random number generation using the Monte Carlo
method was written and the output was obtained successfully.
MPI-Broadcast-And-Collective-Communication
Aim:
To write a C program for MPI broadcast and collective communication.
Algorithm:
MPI Broadcast and Collective Communication
So far in the MPI tutorials, we have examined point-to-point communication, which is
communication between two processes. This lesson is the start of the collective
communication section. Collective communication is a method of communication which
involves participation of all processes in a communicator. In this lesson, we will discuss
the implications of collective communication and go over a standard collective routine -
broadcasting.
Collective communication and synchronization points
One of the things to remember about collective communication is that it implies
a synchronization point among processes. This means that all processes must reach a
point in their code before they can all begin executing again.
Before going into detail about collective communication routines, let’s examine
synchronization in more detail. As it turns out, MPI has a special function that is
dedicated to synchronizing processes:
MPI_Barrier(MPI_Comm communicator)
The name of the function is quite descriptive - the function forms a barrier, and no
processes in the communicator can pass the barrier until all of them call the function.
Here’s an illustration. Imagine the horizontal axis represents execution of the program
and the circles represent different processes:
Process zero first calls MPI_Barrier at the first time snapshot (T1). While process zero is
hung up at the barrier, processes one and three eventually make it (T2). When process two
finally makes it to the barrier (T3), all of the processes then begin execution again (T4).
Program:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int rank, size;
    int data;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        data = 42;
    }
    // Broadcast the data from rank 0 to all other processes
    MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d received data: %d\n", rank, data);
    int sum = 0;
    // Collective communication: compute the sum of data across all processes
    MPI_Allreduce(&data, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("Process %d calculated sum: %d\n", rank, sum);
    MPI_Finalize();
    return 0;
}
Output:
mpirun -n 4 ./a.out
Process 0 received data: 42
Process 1 received data: 42
Process 2 received data: 42
Process 3 received data: 42
Process 0 calculated sum: 168
Process 1 calculated sum: 168
Process 2 calculated sum: 168
Process 3 calculated sum: 168
Result:
Thus the C program for MPI broadcast and collective communication was written and the
output was obtained successfully.
MPI-Scatter-Gather-And-All
Aim:
To write a C program to demonstrate MPI scatter, gather, and allgather.
Algorithm:
I. Introduction
A. Briefly explain the purpose and importance of MPI scatter, gather, and all gather operations.
B. Provide an overview of the procedure and its steps.
II. Prerequisites
A. List the necessary software and tools required for performing MPI scatter, gather, and all
gather.
B. Specify any specific hardware or network requirements, if applicable.
III. Step-by-step Procedure
A. Step 1: Initialize MPI environment and set up the necessary variables.
B. Step 2: Define the data to be scattered or gathered.
C. Step 3: Determine the sending and receiving buffers for each MPI process.
D. Step 4: Implement the scatter operation to distribute the data evenly among the MPI
processes.
E. Step 5: Perform necessary computations or operations on the scattered data.
F. Step 6: Implement the gather operation to collect the processed data from each MPI process.
G. Step 7: Perform necessary computations or operations on the gathered data.
H. Step 8: Implement the all gather operation to share the gathered data with all MPI processes.
I. Step 9: Finalize the MPI environment and clean up any allocated resources.
IV. Best Practices and Tips
A. Provide recommendations for optimizing performance and avoiding common pitfalls.
B. Include suggestions for error handling and debugging.
Program:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, size;
    int sendbuf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int recvbuf[2];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    // Scatter: distribute the send buffer elements to all processes
    MPI_Scatter(sendbuf, 2, MPI_INT, recvbuf, 2, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d received data: %d %d\n", rank, recvbuf[0], recvbuf[1]);
    // Gather: collect the data from all processes into the recv buffer of the root
    MPI_Gather(recvbuf, 2, MPI_INT, sendbuf, 2, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("Gathered data on root process: ");
        for (int i = 0; i < size * 2; i++) {
            printf("%d ", sendbuf[i]);
        }
        printf("\n");
    }
    // Allgather: gather data from all processes and distribute to all processes
    MPI_Allgather(recvbuf, 2, MPI_INT, sendbuf, 2, MPI_INT, MPI_COMM_WORLD);
    printf("Allgathered data on process %d: ", rank);
    for (int i = 0; i < size * 2; i++) {
        printf("%d ", sendbuf[i]);
    }
    printf("\n");
    MPI_Finalize();
    return 0;
}
Output:
mpirun -n 4 ./a.out
Process 0 received data: 1 2
Process 1 received data: 3 4
Process 2 received data: 5 6
Process 3 received data: 7 8
Gathered data on root process: 1 2 3 4 5 6 7 8
Allgathered data on process 0: 1 2 3 4 5 6 7 8
Allgathered data on process 1: 1 2 3 4 5 6 7 8
Allgathered data on process 2: 1 2 3 4 5 6 7 8
Allgathered data on process 3: 1 2 3 4 5 6 7 8
Result:
Thus the C program demonstrating MPI scatter, gather, and allgather was written and the
output was obtained successfully.
MPI - SEND AND RECEIVE
Aim:
To write a C program for MPI send and receive.
Algorithm:
MPI Send and Receive
Sending and receiving are the two foundational concepts of MPI. Almost every single function in
MPI can be implemented with basic send and receive calls. In this lesson, I will discuss how to
use MPI’s blocking sending and receiving functions, and I will also overview other basic
concepts associated with transmitting data using MPI.
Overview of sending and receiving with MPI
MPI’s send and receive calls operate in the following manner. First, process A decides a message
needs to be sent to process B. Process A then packs up all of its necessary data into a buffer for
process B. These buffers are often referred to as envelopes since the data is being packed into a
single message before transmission (similar to how letters are packed into envelopes before
transmission to the post office). After the data is packed into a buffer, the communication device
(which is often a network) is responsible for routing the message to the proper location. The
location of the message is defined by the process’s rank.
Even though the message is routed to B, process B still has to acknowledge that it wants to
receive A’s data. Once it does this, the data has been transmitted. Process A receives
acknowledgment that the data has been transmitted and may go back to work.
Sometimes there are cases when A might have to send many different types of messages to B.
Instead of B having to go through extra measures to differentiate all these messages, MPI allows
senders and receivers to also specify message IDs with the message (known as tags). When
process B only requests a message with a certain tag number, messages with different tags will
be buffered by the network until B is ready for them.
Program:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, size;
    int send_data = 42;
    int recv_data;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        // Process 0 sends data to process 1
        MPI_Send(&send_data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("Process 0 sent data: %d\n", send_data);
    } else if (rank == 1) {
        // Process 1 receives data from process 0
        MPI_Recv(&recv_data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received data: %d\n", recv_data);
    }
    MPI_Finalize();
    return 0;
}
Output:
mpirun -n 2 ./a.out
Process 0 sent data: 42
Process 1 received data: 42
Result:
Thus the C program for MPI send and receive was written and the output was obtained
successfully.
Parallel-Rank-With-MPI
Aim:
To write a C program for parallel rank with MPI.
Algorithm:
Performing Parallel Rank with MPI
In the previous lesson, we went over MPI_Scatter, MPI_Gather, and MPI_Allgather. We are
going to expand on basic collectives in this lesson by coding a useful function for your MPI
toolkit - parallel rank.
Parallel rank - problem overview
When processes all have a single number stored in their local memory, it can be useful to know
what order their number is in respect to the entire set of numbers contained by all processes. For
example, a user might be benchmarking the processors in an MPI cluster and want to know the
order of how fast each processor is relative to the others. This information can be used for
scheduling tasks and so on. As you can imagine, it is rather difficult to find out a number’s order
in the context of all other numbers if they are spread across processes. This problem - the parallel
rank problem - is what we are going to solve in this lesson.
An illustration of the input and output of parallel rank is below:
The processes in the illustration (labeled 0 through 3) start with four numbers - 5, 2, 7, and 4.
The parallel rank algorithm then computes that process 1 has rank 0 in the set of numbers (i.e.
the first number), process 3 has rank 1, process 0 has rank 2, and process 2 has the last rank in
the set of numbers. Pretty simple, right?
Parallel rank API definition
Before we dive into solving the parallel rank problem, let’s first decide on how our function is
going to behave. Our function needs to take a number on each process and return its associated
rank with respect to all of the other numbers across all processes. Along with this, we will need
other miscellaneous information, such as the communicator that is being used, and the datatype
of the number being ranked.
Program:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
Output:
mpirun -n 4 ./a.out
Hello from process 0 of 4
Hello from process 1 of 4
Hello from process 2 of 4
Hello from process 3 of 4
Result:
Thus the C program for parallel rank with MPI was written and the output was obtained
successfully.