
Mohammed VI Polytechnic University

TP2 - OpenMP (Introduction)


Imad Kissami
February 16, 2025

Exercise 1:
In this very simple exercise, you need to:

1. Write an OpenMP program that displays the number of threads used for the execution and
the rank of each thread.

2. Compile the code manually to create a monoprocessor executable and a parallel executable.

3. Test the programs obtained with different numbers of threads for the parallel program,
without submitting them as batch jobs.

Output example for the parallel program with 4 threads:


Hello from the rank 2 thread
Hello from the rank 1 thread
Hello from the rank 3 thread
Hello from the rank 0 thread
Parallel execution of hello_world with 4 threads

Exercise 2: Parallelizing the PI calculation


#include <stdio.h>

static long num_steps = 100000;
double step;

int main()
{
    int i;
    double x, pi, sum = 0.0;
    step = 1.0 / (double) num_steps;
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum = sum + 4.0 / (1.0 + x * x);
    }
    pi = step * sum;
    printf("pi = %f\n", pi);
    return 0;
}

1. Create a parallel version of the pi program using a parallel construct.

2. Don't use #pragma omp parallel for.

3. Pay close attention to shared versus private variables.

4. Use omp_get_wtime() (which returns a double) to measure the elapsed time.

Exercise 3: Pi with loops


• Go back to the serial pi program and parallelize it with a loop construct

• Your goal is to minimize the number of changes made to the serial program (add only 1
line).

Exercise 4: Parallelizing Matrix Multiplication with OpenMP

// Allocate memory dynamically
double *a = (double *) malloc(m * n * sizeof(double));
double *b = (double *) malloc(n * m * sizeof(double));
double *c = (double *) malloc(m * m * sizeof(double));

// Initialize matrices
for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
        a[i * n + j] = (i + 1) + (j + 1); // Access via 1D indexing
    }
}

for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
        b[i * m + j] = (i + 1) - (j + 1);
    }
}

for (int i = 0; i < m; i++) {
    for (int j = 0; j < m; j++) {
        c[i * m + j] = 0;
    }
}

// Matrix multiplication
for (int i = 0; i < m; i++) {
    for (int j = 0; j < m; j++) {
        for (int k = 0; k < n; k++) {
            c[i * m + j] += a[i * n + k] * b[k * m + j];
        }
    }
}

The code calculates the matrix product:

C = A × B

• In this exercise, you must:

1. Insert the appropriate OpenMP directives and analyze the code performance.
2. Use the collapse clause to parallelize this matrix multiplication code.
3. Run the code using 1, 2, 4, 8, and 16 threads and plot the speedup and efficiency.
4. Test the loop iteration scheduling modes (STATIC, DYNAMIC, GUIDED) and vary the
chunk sizes.

Exercise 5: Parallelizing the Jacobi Method with OpenMP


The program solves a general linear system using the Jacobi iterative method.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <float.h>
#include <math.h>
#include <sys/time.h>
#include <omp.h> // Replaces time.h

// Default matrix size
#ifndef VAL_N
#define VAL_N 120
#endif
#ifndef VAL_D
#define VAL_D 80
#endif

// Random initialization of an array
void random_number(double* array, int size) {
    for (int i = 0; i < size; i++) {
        array[i] = (double) rand() / (double)(RAND_MAX - 1);
    }
}

int main() {
    int n = VAL_N, diag = VAL_D;
    int i, j, iteration = 0;
    double norme;

    // Correct 2D matrix allocation
    double *a = (double *) malloc(n * n * sizeof(double));
    double *x = (double *) malloc(n * sizeof(double));
    double *x_courant = (double *) malloc(n * sizeof(double));
    double *b = (double *) malloc(n * sizeof(double));

    if (!a || !x || !x_courant || !b) {
        fprintf(stderr, "Memory allocation failed!\n");
        exit(EXIT_FAILURE);
    }

    // Time measurement variables
    struct timeval t_elapsed_0, t_elapsed_1;
    double t_elapsed;
    double t_cpu_0, t_cpu_1, t_cpu;

    // Matrix and RHS initialization
    srand(421); // For reproducibility
    random_number(a, n * n);
    random_number(b, n);

    // Strengthening the diagonal
    for (i = 0; i < n; i++) {
        a[i * n + i] += diag; // Corrected indexing
    }

    // Initial solution
    for (i = 0; i < n; i++) {
        x[i] = 1.0;
    }

    // Start timing
    t_cpu_0 = omp_get_wtime();
    gettimeofday(&t_elapsed_0, NULL);

    // Jacobi iteration
    while (1) {
        iteration++;

        for (i = 0; i < n; i++) {
            x_courant[i] = 0;
            for (j = 0; j < i; j++) {
                x_courant[i] += a[j * n + i] * x[j]; // Corrected indexing
            }
            for (j = i + 1; j < n; j++) {
                x_courant[i] += a[j * n + i] * x[j]; // Corrected indexing
            }
            x_courant[i] = (b[i] - x_courant[i]) / a[i * n + i]; // Corrected indexing
        }

        // Convergence test
        double absmax = 0;
        for (i = 0; i < n; i++) {
            double curr = fabs(x[i] - x_courant[i]);
            if (curr > absmax)
                absmax = curr;
        }
        norme = absmax / n;

        if ((norme <= DBL_EPSILON) || (iteration >= n)) break;

        // Copy x_courant to x
        memcpy(x, x_courant, n * sizeof(double));
    }

    // End timing
    gettimeofday(&t_elapsed_1, NULL);
    t_elapsed = (t_elapsed_1.tv_sec - t_elapsed_0.tv_sec) +
                (t_elapsed_1.tv_usec - t_elapsed_0.tv_usec) / 1e6;

    t_cpu_1 = omp_get_wtime();
    t_cpu = t_cpu_1 - t_cpu_0;

    // Print result
    fprintf(stdout, "\n\n"
            "   System size          : %5d\n"
            "   Iterations           : %4d\n"
            "   Norm                 : %10.3E\n"
            "   Elapsed time         : %10.3E sec.\n"
            "   CPU time             : %10.3E sec.\n",
            n, iteration, norme, t_elapsed, t_cpu);

    // Free allocated memory
    free(a);
    free(x);
    free(x_courant);
    free(b);

    return EXIT_SUCCESS;
}

The system solved is:

A × x = b

1. In this exercise, you must solve the system in parallel.

2. Run the code using 1, 2, 4, 8, and 16 threads and plot the speedup and efficiency.
