
Introduction to OpenMP

CSC Training, 2020

Processes and threads

[Figure: serial regions are executed by a single execution unit; parallel regions are executed by several parallel processes or by several threads.]

Process
- Independent execution unit
- Has its own state information and its own memory address space

Thread
- A single process may contain multiple threads
- Threads have their own state information, but share the same memory address space
Processes and threads

[Figure: serial regions are executed by a single execution unit; parallel regions are executed by several parallel processes or by several threads.]

Process
- Long-lived: spawned when the parallel program starts, killed when the program finishes
- Explicit communication between processes

Thread
- Short-lived: created when entering a parallel region, destroyed (joined) when the region ends
- Communication through shared memory
OpenMP

What is OpenMP?

- A collection of compiler directives and library routines, together with a runtime system, for multi-threaded, shared-memory parallelization
- Fortran 77/9X/03 and C/C++ are supported
- The latest version of the standard is 5.0 (November 2018)
  - Full support for accelerators (GPUs)
  - Support for the latest versions of C, C++ and Fortran
  - Support for a fully descriptive loop construct
  - and more
- Compiler support for 5.0 is still incomplete
  - This course does not discuss any 5.0-specific features
Why would you want to learn OpenMP?

- An OpenMP-parallelized program can be run on your many-core workstation or on a node of a cluster
- Enables one to parallelize one part of the program at a time
  - Get some speedup with a limited investment in time
  - Efficient and well-scaling code still requires effort
- Serial and OpenMP versions can easily coexist
- Hybrid MPI+OpenMP programming
Three components of OpenMP

- Compiler directives, i.e. language extensions, for shared-memory parallelization
- Runtime library routines (Intel: libiomp5, GNU: libgomp)
  - Conditional compilation to build a serial version
- Environment variables
  - Specify the number of threads, thread affinity, etc.
OpenMP directives

- OpenMP directives consist of a sentinel, followed by the directive name and optional clauses
- C/C++:

#pragma omp directive [clauses]

- Fortran:

!$omp directive [clauses]

- Directives are ignored when code is compiled without OpenMP support
Compiling an OpenMP program

- Compilers that support OpenMP usually require an option that enables the feature
  - GNU: -fopenmp
  - Intel: -qopenmp
  - Cray: -h omp
    - OpenMP is enabled by default; -h noomp disables it
  - PGI: -mp[=nonuma,align,allcores,bind]
- Without these options a serial version is compiled!
Parallel construct

- Defines a parallel region (SPMD: Single Program Multiple Data)
- C/C++:

#pragma omp parallel [clauses]
  structured block

- Fortran:

!$omp parallel [clauses]
  structured block
!$omp end parallel

- Prior to the region there is only one thread, the master
- The construct creates a team of threads: master + slaves
- There is an implicit barrier at the end of the block
Example: "Hello world" with OpenMP

Fortran:

program omp_hello

  write(*,*) "Obey your master!"

!$omp parallel
  write(*,*) "Slave to the grind"
!$omp end parallel

  write(*,*) "Back with master"

end program omp_hello

> gfortran -fopenmp omp_hello.F90 -o omp
> OMP_NUM_THREADS=3 ./omp
Obey your master!
Slave to the grind
Slave to the grind
Slave to the grind
Back with master

C:

#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("Obey your master!\n");
    #pragma omp parallel
    {
        printf("Slave to the grind\n");
    }
    printf("Back with master\n");
}

> gcc -fopenmp omp_hello.c -o omp
> OMP_NUM_THREADS=3 ./omp
Obey your master!
Slave to the grind
Slave to the grind
Slave to the grind
Back with master
How to distribute work?

- Each thread executes the same code within the parallel region
- OpenMP provides several constructs for controlling work distribution
  - Loop construct
  - Single/Master construct
  - Sections construct
  - Task construct
  - Workshare construct (Fortran only)
- Thread id can be queried and used for distributing work manually (similar to MPI rank)

Loop construct

- Directive instructing the compiler to share the work of a loop among threads
  - Each thread executes only part of the loop

#pragma omp for [clauses]
...

!$omp do [clauses]
...
!$omp end do

- In C/C++ limited only to "canonical" for-loops; iterator-based loops are also possible in C++
- The construct must reside inside a parallel region
- Combined construct with omp parallel:
  #pragma omp parallel for / !$omp parallel do
Loop construct

Fortran:

!$omp parallel
!$omp do
do i = 1, n
  z(i) = x(i) + y(i)
end do
!$omp end do
!$omp end parallel

C/C++:

#pragma omp parallel
{
  #pragma omp for
  for (i = 0; i < n; i++)
    z[i] = x[i] + y[i];
}

Summary

- OpenMP parallelization is expressed with compiler directives
  - These are ignored when the code is built without OpenMP support
- Threads are launched within parallel regions
- for/do loops can be parallelized easily with the loop construct