DEPARTMENT:
LABORATORY:
Name Roll No
Semester Course Branch
Subject Code & Title .
University Register No.
Certified that this is a bonafide record of work done by the above student in the laboratory
during the year 2022 - 2023.
Signature of the staff in-charge Head of the Department
Submitted for the Anna University practical examination held on _______________
INTERNAL EXAMINER EXTERNAL EXAMINER
S.NO | DATE | TITLE OF THE EXPERIMENT | PAGE No. | REMARKS
OpenMP Fork-Join Parallelism
Aim:
To write a simple C program to demonstrate OpenMP fork-join parallelism.
Algorithm:
Identify the task to be parallelized: Start by identifying a task or a set of tasks that can be
executed concurrently. These tasks should be independent or have minimal dependencies
on each other.
Fork phase: In this phase, the program splits into multiple concurrent threads or processes
to execute the identified tasks. The fork operation creates child threads or processes, each
responsible for executing a specific portion of the work.
Assign tasks to threads/processes: Divide the identified tasks among the available threads
or processes. Each thread/process should be assigned a distinct portion of the overall
work to execute independently.
Execute tasks concurrently: Each thread/process executes its assigned task(s)
independently and concurrently with other threads/processes. This parallel execution can
significantly improve performance by utilizing multiple computing resources.
Join phase: Once all the threads/processes have completed their respective tasks, they
rejoin the main thread/process. The join operation waits for all child threads/processes to
finish their execution before proceeding further.
Merge results: After the join phase, you can collect and merge the results obtained from
each thread/process. This step allows you to combine the individual outputs of parallel
tasks into a final result.
Continue program execution: With the merged results, you can continue with the
remaining sequential portion of the program, which might involve further processing or
output generation based on the parallel computations.
Program:
#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("Before: total thread number is %d\n", omp_get_num_threads());
    #pragma omp parallel
    {
        printf("Thread id is %d\n", omp_get_thread_num());
    }
    printf("After: total thread number is %d\n", omp_get_num_threads());
    return 0;
}
Output:
# gcc -fopenmp parallel.c
# ./a.out
Before: total thread number is 1
Thread id is 0
Thread id is 1
After: total thread number is 1
Result:
Thus the C program demonstrating OpenMP fork-join parallelism was written and the
output was obtained successfully.
Matrix-Vector Multiplication B=Ax
Aim:
To write a C++ program for a simple matrix-vector multiplication b = Ax.
Algorithm:
Check that the matrix and vector are compatible for multiplication: the number of
columns in the matrix must equal the number of elements in the vector. If they are not
equal, the product b = Ax is undefined.
Create a result vector to store b. Its length will be equal to the number of rows in the
matrix, and every element should start at zero.
Use nested loops to compute the result. The outer loop iterates over the rows of the
matrix, and the inner loop iterates over its columns.
Within the nested loops, multiply each element of the current row by the corresponding
element of the vector, and accumulate the sum into the corresponding element of the
result vector.
Repeat this process for each row until all elements of the result have been calculated.
Once the multiplication is complete, return the result vector b.
Program:
#include <iostream>
#include <vector>
#include <omp.h>
// Function to perform matrix-vector multiplication
std::vector<int> matrixVectorMultiplication(const std::vector<std::vector<int>>& matrix,
                                            const std::vector<int>& vector) {
    int matrixSize = matrix.size();
    std::vector<int> result(matrixSize, 0);
    #pragma omp parallel for
    for (int i = 0; i < matrixSize; i++) {
        for (int j = 0; j < matrixSize; j++) {
            result[i] += matrix[i][j] * vector[j];
        }
    }
    return result;
}
int main() {
    std::vector<std::vector<int>> matrix = {{1, 2, 3},
                                            {4, 5, 6},
                                            {7, 8, 9}};
    std::vector<int> vector = {1, 2, 3};
    std::vector<int> result = matrixVectorMultiplication(matrix, vector);
    // Print the result
    std::cout << "Result: ";
    for (std::size_t i = 0; i < result.size(); i++) {
        std::cout << result[i] << " ";
    }
    std::cout << std::endl;
    return 0;
}
Output:
Result: 14 32 50
Result:
Thus the C++ program for a simple matrix-vector multiplication b = Ax was written and
the output was obtained successfully.
SUM OF ALL THE ELEMENTS IN AN ARRAY AND LARGEST
NUMBER
Aim:
To write a C/C++ program to compute the sum of all numbers in an array and to find the
largest number in the array.
Algorithm:
1. Start by defining a variable to hold the sum of numbers. Set this variable to 0.
2. Define a variable to hold the largest number. Set this variable to the smallest possible
value, often represented as negative infinity.
3. If you have a list of numbers, iterate over each number in the list.
4. For each number, add it to the sum variable.
5. Update the largest number variable if the current number is greater than the previously
stored largest number.
6. Continue iterating over the remaining numbers in the list until all numbers have been
processed.
7. Once all the numbers have been processed, you can use the sum variable to get the sum
of the numbers.
8. You can also use the largest number variable to get the largest number.
9. Return the sum of the numbers and the largest number as the final result.
Program:
#include <iostream>
#include <vector>
#include <omp.h>
// Function to compute the sum of all elements in an array
int computeArraySum(const std::vector<int>& array) {
    int sum = 0;
    int arraySize = array.size();
    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < arraySize; i++) {
        sum += array[i];
    }
    return sum;
}

int main() {
    std::vector<int> array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int sum = computeArraySum(array);
    std::cout << "Sum: " << sum << std::endl;
    return 0;
}

//largest number
#include <iostream>
#include <vector>
#include <limits>
#include <omp.h>

// Function to find the largest number in an array
int findLargestNumber(const std::vector<int>& array) {
    int largestNumber = std::numeric_limits<int>::min();
    int arraySize = array.size();
    #pragma omp parallel for reduction(max: largestNumber)
    for (int i = 0; i < arraySize; i++) {
        if (array[i] > largestNumber) {
            largestNumber = array[i];
        }
    }
    return largestNumber;
}

int main() {
    std::vector<int> array = {10, 7, 22, 4, 15, 9, 17, 3, 12};
    int largestNumber = findLargestNumber(array);
    std::cout << "Largest Number: " << largestNumber << std::endl;
    return 0;
}
Output:
Sum: 55
Largest Number: 22
Result:
Thus the C++ program to compute the sum of all numbers and find the largest number was
written and the output was obtained successfully.
MESSAGE-PASSING LOGIC
Aim:
To write a simple C++ program demonstrating message-passing logic using MPI.
Algorithm:
1. Include the <iostream> and <mpi.h> header files in your program to use the MPI
functions and datatypes.
2. Initialize MPI at the beginning of your program using the MPI_Init() function. Pass the
argc and argv parameters to this function.
3. Declare variables to hold the rank and size of the communicator using the int data type.
4. Use MPI_Comm_rank(MPI_COMM_WORLD, &rank) to get the rank of the current process,
and MPI_Comm_size(MPI_COMM_WORLD, &size) to get the total number of processes in
the communicator.
5. Implement the main logic of your program inside an if statement. The example below
demonstrates sending a message from process 0 to process 1.
o In the if block for process 0:
Declare a character array to hold the message.
Use sprintf() to format the message string with relevant information.
Use MPI_Send() to send the message to process 1. Provide the following
arguments to the function: the message buffer, the length of the message
(calculated using strlen() + 1), the datatype (MPI_CHAR), the destination
rank (1 in this case), the tag (0 in this case), and the communicator
(MPI_COMM_WORLD).
Print a message indicating that the message has been sent.
o In the else if block for process 1:
Declare a character array to receive the message.
Use MPI_Recv() to receive the message from process 0. Provide the
following arguments to the function: the message buffer, the maximum
length of the message buffer, the datatype (MPI_CHAR), the source rank (0
in this case), the tag (0 in this case), and the communicator
(MPI_COMM_WORLD).
Print the received message.
6. Finally, call MPI_Finalize() at the end of your program to terminate MPI.
To compile and run this program using an MPI C++ compiler:
Save the code in a file (e.g., message_passing.cpp).
Open a terminal and navigate to the directory containing the file.
Compile the program using an MPI C++ compiler (e.g., mpicxx message_passing.cpp
-o message_passing).
Run the program using an MPI launcher with the desired number of processes (e.g.,
mpirun -n 2 ./message_passing).
The program will run and demonstrate message-passing between processes, printing the
messages sent and received.
Program:
#include <iostream>
#include <cstring>
#include <mpi.h>

int main(int argc, char** argv) {
    int rank, size;
    char message[100];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        sprintf(message, "Hello from process %d", rank);
        MPI_Send(message, strlen(message) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        std::cout << "Message Sent: " << message << std::endl;
    } else if (rank == 1) {
        MPI_Recv(message, 100, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::cout << "Message Received: " << message << std::endl;
    }
    MPI_Finalize();
    return 0;
}
Output:
Message Sent: Hello from process 0
Message Received: Hello from process 0
Result:
Thus the C++ program for message-passing logic was written and the output was obtained
successfully.
Floyd's Algorithm
Aim:
To write a C++ program for All-Pairs Shortest-Path Problem (Floyd's Algorithm).
Algorithm:
Initialize the distance matrix:
Create a 2D distance matrix of size n x n, where n is the number of vertices in the graph.
Set all matrix elements to INF (representing an unreachable path) except for the diagonal
elements, which should be set to 0 (representing the distance from a vertex to itself).
Input the edges and their weights:
Get the number of edges (m) from the user.
Iterate m times and for each iteration:
Input the source vertex (u), destination vertex (v), and weight of the edge.
Update the distance matrix with the provided weight at the corresponding matrix indices.
Perform the All-Pairs Shortest-Path calculation using Floyd's Algorithm:
Iterate over k from 0 to n-1 (representing intermediate vertices):
For each pair of vertices (i, j) in the distance matrix:
If the distance from i to k and the distance from k to j is smaller than the current distance
from i to j:
Update the distance from i to j with the sum of the distances from i to k and k to j.
Output the resulting distance matrix:
Print the distance matrix, showing the shortest distances between all pairs of vertices.
If a path is unreachable, display INF instead of the actual distance.
Program:
#include <iostream>
#include <vector>
#include <limits>
#define INF std::numeric_limits<int>::max()
void floydAlgorithm(std::vector<std::vector<int>>& graph, int n) {
    // Applying Floyd's Algorithm
    for (int k = 0; k < n; ++k) {
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                if (graph[i][k] != INF && graph[k][j] != INF &&
                    graph[i][k] + graph[k][j] < graph[i][j]) {
                    graph[i][j] = graph[i][k] + graph[k][j];
                }
            }
        }
    }
}
int main() {
    int n; // Number of vertices
    // Get the number of vertices from the user
    std::cout << "Enter the number of vertices: ";
    std::cin >> n;
    // Create the graph matrix and initialize it
    std::vector<std::vector<int>> graph(n, std::vector<int>(n));
    std::cout << "Enter the adjacency matrix (enter -1 for unreachable vertices):" << std::endl;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            std::cin >> graph[i][j];
            if (graph[i][j] == -1) {
                graph[i][j] = INF;
            }
        }
    }
    // Apply Floyd's Algorithm
    floydAlgorithm(graph, n);
    // Print the resulting shortest paths
    std::cout << "Shortest paths between vertices:" << std::endl;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (graph[i][j] == INF) {
                std::cout << "INF\t";
            } else {
                std::cout << graph[i][j] << "\t";
            }
        }
        std::cout << std::endl;
    }
    return 0;
}
Output:
Result:
Thus the C++ program for an All-Pairs Shortest-Path Problem (Floyd's Algorithm) was
written and the output is obtained successfully.
Parallel Random Number Generators using Monte Carlo
Methods
Aim:
To implement a program for parallel random number generation using Monte Carlo
methods in OpenMP.
Algorithm:
Determine the number of random points (numPoints) you want to generate and the
number of threads (numThreads) you want to use for parallel execution.
Initialize the random number generator. Use a suitable random number generator library
or function to generate random numbers. Make sure to seed the generator properly to
ensure different sequences of random numbers for each thread.
Set the number of threads for parallel execution using the appropriate function provided
by your parallel programming framework or library. For example, in OpenMP, you can
use the omp_set_num_threads() function.
Divide the total number of random points (numPoints) among the specified number of
threads (numThreads) using parallel loop constructs provided by your parallel
programming framework. For example, in OpenMP, you can use the omp for pragma.
Inside the parallel loop, each thread generates its portion of random points. Use the
random number generator to generate random numbers within the desired range.
For each generated random point, perform the desired computation or estimation using
the Monte Carlo method. This can involve checking if the random point falls within a
specific region or satisfies certain conditions.
If necessary, accumulate the results or statistics obtained by each thread using appropriate
synchronization mechanisms provided by your parallel programming framework. For
example, you can use atomic operations or reduction clauses to safely accumulate the
results from each thread.
Once the parallel computation is complete, combine the accumulated results or statistics
obtained from each thread to obtain the final result of your Monte Carlo estimation.
Output the result or perform any additional post-processing required.
Compile and run the program, making sure to enable the necessary parallel execution
support in your compiler or by linking against the appropriate parallel programming
libraries.
Program:
#include <iostream>
#include <random>
#include <vector>
#include <omp.h>
// Function to estimate Pi using the Monte Carlo method in parallel
double estimatePiParallel(int numPoints) {
    int numPointsInsideCircle = 0;
    double x, y;
    #pragma omp parallel private(x, y) reduction(+: numPointsInsideCircle)
    {
        // Each thread gets its own generator, seeded independently
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_real_distribution<double> dis(-1.0, 1.0);
        #pragma omp for
        for (int i = 0; i < numPoints; i++) {
            x = dis(gen);
            y = dis(gen);
            if ((x * x) + (y * y) <= 1.0) {
                numPointsInsideCircle++;
            }
        }
    }
    return 4.0 * numPointsInsideCircle / numPoints;
}

int main() {
    int numPoints;
    // Get the number of points from the user
    std::cout << "Enter the number of points: ";
    std::cin >> numPoints;
    // Estimate Pi using the Monte Carlo method in parallel
    double pi = estimatePiParallel(numPoints);
    std::cout << "Estimated Pi: " << pi << std::endl;
    return 0;
}
Output:
Enter the number of points: 1
Estimated Pi: 4
Result:
Thus the C++ program for parallel random number generation using the Monte Carlo
method was written and the output was obtained successfully.
MPI-Broadcast-And-Collective-Communication
Aim:
To write a C program for MPI broadcast and collective communication.
Algorithm:
MPI Broadcast and Collective Communication
So far in the MPI tutorials, we have examined point-to-point communication, which is
communication between two processes. This lesson is the start of the collective
communication section. Collective communication is a method of communication which
involves participation of all processes in a communicator. In this lesson, we will discuss
the implications of collective communication and go over a standard collective routine -
broadcasting.
Collective communication and synchronization points
One of the things to remember about collective communication is that it implies
a synchronization point among processes. This means that all processes must reach a
point in their code before they can all begin executing again.
Before going into detail about collective communication routines, let’s examine
synchronization in more detail. As it turns out, MPI has a special function that is
dedicated to synchronizing processes:
MPI_Barrier(MPI_Comm communicator)
The name of the function is quite descriptive - the function forms a barrier, and no
processes in the communicator can pass the barrier until all of them call the function.
Here’s an illustration. Imagine the horizontal axis represents execution of the program
and the circles represent different processes:
Process zero first calls MPI_Barrier at the first time snapshot (T1). While process zero is
hung up at the barrier, processes one and three eventually make it (T2). When process two
finally makes it to the barrier (T3), all of the processes then begin execution again (T4).
Program:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int rank, size;
    int data;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        data = 42;
    }
    // Broadcast the data from rank 0 to all other processes
    MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d received data: %d\n", rank, data);
    int sum = 0;
    // Collective communication: compute the sum of data across all processes
    MPI_Allreduce(&data, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("Process %d calculated sum: %d\n", rank, sum);
    MPI_Finalize();
    return 0;
}
Output:
mpirun -n 4 ./a.out
Process 0 received data: 42
Process 1 received data: 42
Process 2 received data: 42
Process 3 received data: 42
Process 0 calculated sum: 168
Process 1 calculated sum: 168
Process 2 calculated sum: 168
Process 3 calculated sum: 168
Result:
Thus the C program for MPI broadcast and collective communication was written and the
output was obtained successfully.
MPI-Scatter-Gather-And-All
Aim:
To write a C program to demonstrate MPI scatter, gather, and allgather.
Algorithm:
I. Introduction
A. Briefly explain the purpose and importance of MPI scatter, gather, and all gather operations.
B. Provide an overview of the procedure and its steps.
II. Prerequisites
A. List the necessary software and tools required for performing MPI scatter, gather, and all
gather.
B. Specify any specific hardware or network requirements, if applicable.
III. Step-by-step Procedure
A. Step 1: Initialize MPI environment and set up the necessary variables.
B. Step 2: Define the data to be scattered or gathered.
C. Step 3: Determine the sending and receiving buffers for each MPI process.
D. Step 4: Implement the scatter operation to distribute the data evenly among the MPI
processes.
E. Step 5: Perform necessary computations or operations on the scattered data.
F. Step 6: Implement the gather operation to collect the processed data from each MPI process.
G. Step 7: Perform necessary computations or operations on the gathered data.
H. Step 8: Implement the all gather operation to share the gathered data with all MPI processes.
I. Step 9: Finalize the MPI environment and clean up any allocated resources.
IV. Best Practices and Tips
A. Provide recommendations for optimizing performance and avoiding common pitfalls.
B. Include suggestions for error handling and debugging.
Program:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, size;
    int sendbuf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int recvbuf[2];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    // Scatter: distribute the send buffer elements to all processes
    MPI_Scatter(sendbuf, 2, MPI_INT, recvbuf, 2, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d received data: %d %d\n", rank, recvbuf[0], recvbuf[1]);
    // Gather: collect the data from all processes into the recv buffer of the root
    MPI_Gather(recvbuf, 2, MPI_INT, sendbuf, 2, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("Gathered data on root process: ");
        for (int i = 0; i < size * 2; i++) {
            printf("%d ", sendbuf[i]);
        }
        printf("\n");
    }
    // Allgather: gather data from all processes and distribute to all processes
    MPI_Allgather(recvbuf, 2, MPI_INT, sendbuf, 2, MPI_INT, MPI_COMM_WORLD);
    printf("Allgathered data on process %d: ", rank);
    for (int i = 0; i < size * 2; i++) {
        printf("%d ", sendbuf[i]);
    }
    printf("\n");
    MPI_Finalize();
    return 0;
}
Output:
mpirun -n 4 ./a.out
Process 0 received data: 1 2
Process 1 received data: 3 4
Process 2 received data: 5 6
Process 3 received data: 7 8
Gathered data on root process: 1 2 3 4 5 6 7 8
Allgathered data on process 0: 1 2 3 4 5 6 7 8
Allgathered data on process 1: 1 2 3 4 5 6 7 8
Allgathered data on process 2: 1 2 3 4 5 6 7 8
Allgathered data on process 3: 1 2 3 4 5 6 7 8
Result:
Thus the C program demonstrating MPI scatter, gather, and allgather was written and the
output was obtained successfully.
MPI - SEND AND RECEIVE
Aim:
To write a C program for MPI send and receive.
Algorithm:
MPI Send and Receive
Sending and receiving are the two foundational concepts of MPI. Almost every single function in
MPI can be implemented with basic send and receive calls. In this lesson, I will discuss how to
use MPI’s blocking sending and receiving functions, and I will also overview other basic
concepts associated with transmitting data using MPI.
Overview of sending and receiving with MPI
MPI’s send and receive calls operate in the following manner. First, process A decides a message
needs to be sent to process B. Process A then packs up all of its necessary data into a buffer for
process B. These buffers are often referred to as envelopes since the data is being packed into a
single message before transmission (similar to how letters are packed into envelopes before
transmission to the post office). After the data is packed into a buffer, the communication device
(which is often a network) is responsible for routing the message to the proper location. The
location of the message is defined by the process’s rank.
Even though the message is routed to B, process B still has to acknowledge that it wants to
receive A’s data. Once it does this, the data has been transmitted. Process A receives
acknowledgment that the data has been transmitted and may go back to work.
Sometimes there are cases when A might have to send many different types of messages to B.
Instead of B having to go through extra measures to differentiate all these messages, MPI allows
senders and receivers to also specify message IDs with the message (known as tags). When
process B only requests a message with a certain tag number, messages with different tags will
be buffered by the network until B is ready for them.
Program:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, size;
    int send_data = 42;
    int recv_data;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        // Process 0 sends data to process 1
        MPI_Send(&send_data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("Process 0 sent data: %d\n", send_data);
    } else if (rank == 1) {
        // Process 1 receives data from process 0
        MPI_Recv(&recv_data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received data: %d\n", recv_data);
    }
    MPI_Finalize();
    return 0;
}
Output:
mpirun -n 2 ./a.out
Process 0 sent data: 42
Process 1 received data: 42
Result:
Thus the C program for MPI send and receive was written and the output was obtained
successfully.
Parallel-Rank-With-MPI
Aim:
To write a C program for parallel rank with MPI.
Algorithm:
Performing Parallel Rank with MPI
In the previous lesson, we went over MPI_Scatter, MPI_Gather, and MPI_Allgather. We are
going to expand on basic collectives in this lesson by coding a useful function for your MPI
toolkit - parallel rank.
Parallel rank - problem overview
When processes all have a single number stored in their local memory, it can be useful to know
what order their number is in respect to the entire set of numbers contained by all processes. For
example, a user might be benchmarking the processors in an MPI cluster and want to know the
order of how fast each processor is relative to the others. This information can be used for
scheduling tasks and so on. As you can imagine, it is rather difficult to find out a number’s order
in the context of all other numbers if they are spread across processes. This problem - the parallel
rank problem - is what we are going to solve in this lesson.
An illustration of the input and output of parallel rank is below:
The processes in the illustration (labeled 0 through 3) start with four numbers - 5, 2, 7, and 4.
The parallel rank algorithm then computes that process 1 has rank 0 in the set of numbers (i.e.
the first number), process 3 has rank 1, process 0 has rank 2, and process 2 has the last rank in
the set of numbers. Pretty simple, right?
Parallel rank API definition
Before we dive into solving the parallel rank problem, let’s first decide on how our function is
going to behave. Our function needs to take a number on each process and return its associated
rank with respect to all of the other numbers across all processes. Along with this, we will need
other miscellaneous information, such as the communicator that is being used, and the datatype
of the number being ranked.
Program:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
Output:
mpirun -n 4 ./a.out
Hello from process 0 of 4
Hello from process 1 of 4
Hello from process 2 of 4
Hello from process 3 of 4
Result:
Thus the C program for parallel rank with MPI was written and the output was obtained
successfully.