Tp2 - Openmp (Introduction) : Imad Kissami
Exercise 1:
In this simple exercise, you need to:
1. Write an OpenMP program that displays the number of threads used for the execution and
the rank of each thread.
2. Compile the code manually to create a monoprocessor (sequential) executable and a parallel executable.
3. Test the programs with different numbers of threads for the parallel program,
without submitting a batch job.
• Your goal is to minimize the number of changes made to the serial program (add only one
line).
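A minimal sketch for Exercise 1, assuming GCC or Clang with OpenMP support; the function name `report_threads` is hypothetical. The `#ifdef _OPENMP` guard lets the same file build both ways: without `-fopenmp` the pragmas are ignored and the `omp_*` calls are compiled out, so the sequential executable reports rank 0 of 1.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: every thread prints its rank and the team size;
   the team size is returned to the caller. */
int report_threads(void) {
    int nthreads = 1;
    #pragma omp parallel
    {
        int rank = 0, size = 1;   /* values used when compiled without OpenMP */
#ifdef _OPENMP
        rank = omp_get_thread_num();
        size = omp_get_num_threads();
#endif
        #pragma omp single
        nthreads = size;          /* one thread records the team size */
        printf("Thread %d of %d\n", rank, size);
    }
    return nthreads;
}
```

Called from `main`, the file can be built twice, e.g. `gcc hello.c -o hello_seq` and `gcc -fopenmp hello.c -o hello_par`, and the parallel version run interactively with `OMP_NUM_THREADS=4 ./hello_par`.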
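Matrix multiplication (C = A × B):
// A is m x n, B is n x m and C is m x m, all stored row-major in 1D arrays.
// Initialize matrix a (the initialization of b is not shown here)
for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
        a[i * n + j] = (i + 1) + (j + 1); // access via 1D indexing
    }
}
// c accumulates the products below, so it must start at zero
for (int i = 0; i < m * m; i++) {
    c[i] = 0.0;
}
// Matrix multiplication
for (int i = 0; i < m; i++) {
    for (int j = 0; j < m; j++) {
        for (int k = 0; k < n; k++) {
            c[i * m + j] += a[i * n + k] * b[k * m + j];
        }
    }
}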
1. Insert the appropriate OpenMP directives and analyze the performance of the code.
2. Use the collapse clause to parallelize this matrix multiplication code.
3. Run the code using 1, 2, 4, 8 and 16 threads and plot the speedup and efficiency.
4. Test the loop iteration scheduling modes (STATIC, DYNAMIC, GUIDED) and vary the
chunk size.
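One possible approach to items 1, 2 and 4, sketched under the assumption that `a`, `b` and `c` are the row-major 1D arrays of the serial code; `matmul_omp` is a hypothetical name. `collapse(2)` merges the `i` and `j` loops into a single m × m iteration space, and `schedule(runtime)` defers the scheduling policy and chunk size to the `OMP_SCHEDULE` environment variable, so STATIC, DYNAMIC and GUIDED can be compared without recompiling.

```c
/* C (m x m) = A (m x n) * B (n x m), row-major 1D storage. */
void matmul_omp(int m, int n, const double *a, const double *b, double *c) {
    /* The i and j loops are perfectly nested, so they can be collapsed;
       the schedule is read from OMP_SCHEDULE at run time. */
    #pragma omp parallel for collapse(2) schedule(runtime)
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < m; j++) {
            double sum = 0.0;   /* private accumulator for element (i, j) */
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * m + j];
            c[i * m + j] = sum; /* each (i, j) is written by exactly one thread */
        }
    }
}
```

For example, `OMP_NUM_THREADS=8 OMP_SCHEDULE="dynamic,4" ./matmul` would run with 8 threads and dynamic chunks of 4 iterations. Accumulating in a local variable keeps the `k` loop sequential for each element, so there is no race on `c` and no need to zero it beforehand.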
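Jacobi method (A × x = b):
#include <float.h>     // DBL_EPSILON
#include <math.h>      // fabs
#include <omp.h>       // omp_get_wtime
#include <stdio.h>
#include <stdlib.h>    // EXIT_SUCCESS
#include <string.h>    // memcpy
#include <sys/time.h>  // gettimeofday

// a, b, x and x_courant are declared and allocated elsewhere (not shown);
// VAL_N and VAL_D are set at compile time, e.g. -DVAL_N=1000.
int main() {
    int n = VAL_N, diag = VAL_D;
    int i, j, iteration = 0;
    double norme;
    struct timeval t_elapsed_0, t_elapsed_1;
    double t_elapsed, t_cpu_0, t_cpu;

    // Initial solution
    for (i = 0; i < n; i++) {
        x[i] = 1.0;
    }

    // Start timing
    t_cpu_0 = omp_get_wtime();
    gettimeofday(&t_elapsed_0, NULL);

    // Jacobi iteration
    while (1) {
        iteration++;
        // Jacobi update computing x_courant from x (not shown in this excerpt)

        // Convergence test: largest component-wise change, scaled by n
        double absmax = 0;
        for (i = 0; i < n; i++) {
            double curr = fabs(x[i] - x_courant[i]);
            if (curr > absmax)
                absmax = curr;
        }
        norme = absmax / n;
        // Stop once converged (assumed criterion: DBL_EPSILON or n iterations)
        if (norme <= DBL_EPSILON || iteration >= n)
            break;

        // Copy x_courant to x
        memcpy(x, x_courant, n * sizeof(double));
    }

    // End timing (note: omp_get_wtime() measures wall-clock time)
    t_cpu = omp_get_wtime() - t_cpu_0;
    gettimeofday(&t_elapsed_1, NULL);
    t_elapsed = (t_elapsed_1.tv_sec - t_elapsed_0.tv_sec) +
                (t_elapsed_1.tv_usec - t_elapsed_0.tv_usec) / 1e6;

    // Print result
    fprintf(stdout, "\n\n"
            "   System size         : %5d\n"
            "   Iterations          : %4d\n"
            "   Norm                : %10.3E\n"
            "   Elapsed time        : %10.3E sec.\n"
            "   CPU time            : %10.3E sec.\n",
            n, iteration, norme, t_elapsed, t_cpu);
    return EXIT_SUCCESS;
}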
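A sketch of how the Jacobi loop could be parallelized with OpenMP, assuming A is stored row-major as a dense n × n array (the excerpt does not show the layout of `a`); `jacobi_step` and `max_abs_diff` are hypothetical names. Each row of the update depends only on the previous iterate, so the rows are independent, and the convergence test becomes a max reduction, which requires OpenMP 3.1 or later.

```c
#include <math.h>

/* One Jacobi sweep for A x = b (A row-major, n x n):
   x_new[i] = (b[i] - sum_{j != i} a[i][j] * x[j]) / a[i][i]. */
void jacobi_step(int n, const double *a, const double *b,
                 const double *x, double *x_new) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int j = 0; j < n; j++)
            if (j != i)
                s += a[i * n + j] * x[j];
        x_new[i] = (b[i] - s) / a[i * n + i];
    }
}

/* Same measure as the serial convergence test: max_i |x[i] - x_new[i]|. */
double max_abs_diff(int n, const double *x, const double *x_new) {
    double absmax = 0.0;
    #pragma omp parallel for reduction(max:absmax)
    for (int i = 0; i < n; i++) {
        double curr = fabs(x[i] - x_new[i]);
        if (curr > absmax)
            absmax = curr;
    }
    return absmax;
}
```

Inside the `while` loop, the update would become `jacobi_step(n, a, b, x, x_courant);` followed by `norme = max_abs_diff(n, x, x_courant) / n;`. Jacobi is only guaranteed to converge for suitable matrices, for example strictly diagonally dominant ones.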
2. Run the code using 1, 2, 4, 8, 16 threads and plot the speedup and efficiency.
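For the speedup and efficiency plots, the usual definitions are S(p) = T(1) / T(p) and E(p) = S(p) / p, where T(p) is the elapsed time measured with p threads. The helpers below are a hypothetical sketch; `print_scaling` assumes `times[0]` is the 1-thread run and writes whitespace-separated columns that a plotting tool such as gnuplot can read.

```c
#include <stdio.h>

/* Speedup with p threads relative to the 1-thread time t1. */
double speedup(double t1, double tp) { return t1 / tp; }

/* Efficiency: speedup divided by the number of threads. */
double efficiency(double t1, double tp, int p) { return speedup(t1, tp) / p; }

/* One line per run: threads, speedup, efficiency (times[0] = 1 thread). */
void print_scaling(const int *threads, const double *times, int count) {
    printf("# threads speedup efficiency\n");
    for (int r = 0; r < count; r++)
        printf("%d %.3f %.3f\n", threads[r],
               speedup(times[0], times[r]),
               efficiency(times[0], times[r], threads[r]));
}
```

An efficiency close to 1 means the threads are almost fully used; values that drop as p grows point to overheads such as load imbalance or memory bandwidth limits.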