OpenMP Workshop Day 1
Christian Terboven
Michael Klemm
Agenda
Programming OpenMP
An Overview Of OpenMP
Christian Terboven
Michael Klemm
History
What is OpenMP?
• Tasking
• SIMD / Vectorization
• Accelerator Programming
• …
Get your C/C++ and Fortran Reference Guide!
Covers all of OpenMP 5.0!
Recent Books About OpenMP
• A printed copy of the 5.0 specifications, 2019
• A book that covers all of the OpenMP 4.5 features, 2017
• A new book about the OpenMP Common Core, 2019
Programming OpenMP
Parallel Region
Christian Terboven
Michael Klemm
OpenMP's machine model
The OpenMP Memory Model
[Figure: threads/processing units (PU), each with its own private memory, attached to a common shared memory]
• No other thread sees the change(s) in private memory
The OpenMP Execution Model
• Concept: Fork-Join.
• Allows for incremental parallelization!
Parallel Region and Structured Blocks
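A minimal sketch of a parallel region (hello-world style, not taken from the slides):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* fork: the encountering thread spawns a team that executes the block */
        #pragma omp parallel
        {
            printf("Hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }   /* join: implicit barrier; only the initial thread continues */
        return 0;
    }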
Starting OpenMP Programs on Linux
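A sketch of the typical ways to control the number of threads; OMP_NUM_THREADS, omp_set_num_threads() and the num_threads clause are standard OpenMP, while the program around them is illustrative:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* typically started as:  OMP_NUM_THREADS=4 ./a.out  */
        printf("max threads: %d\n", omp_get_max_threads());

        omp_set_num_threads(2);              /* API call overrides the environment */

        #pragma omp parallel num_threads(3)  /* clause overrides both for this region */
        {
            #pragma omp single
            printf("this region uses %d threads\n", omp_get_num_threads());
        }
        return 0;
    }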
Demo
Programming OpenMP
Worksharing
Christian Terboven
Michael Klemm
For Worksharing
• If only the parallel construct is used, each thread executes the Structured Block.
• Program Speedup: Worksharing
• OpenMP's most common Worksharing construct: for
C/C++
    int i;
    #pragma omp for
    for (i = 0; i < 100; i++)
    {
        a[i] = b[i] + c[i];
    }

Fortran
    INTEGER :: i
    !$omp do
    DO i = 0, 99
        a(i) = b(i) + c(i)
    END DO
– Distribution of loop iterations over all threads in a team.
– Scheduling of the distribution can be influenced (a complete example is sketched below).
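Putting the construct into a complete, illustrative program (array contents are arbitrary):

    #include <stdio.h>
    #define N 100

    int main(void)
    {
        double a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

        #pragma omp parallel
        {
            /* the for construct distributes the iterations over the team */
            #pragma omp for
            for (int i = 0; i < N; i++)
                a[i] = b[i] + c[i];
        }   /* implicit barriers at the end of the loop and of the region */

        printf("a[%d] = %f\n", N - 1, a[N - 1]);
        return 0;
    }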
Worksharing illustrated
Pseudo-code (here: 4 threads)

Serial:
    do i = 0, 99
        a(i) = b(i) + c(i)
    end do

Worksharing splits the iteration space over the team:
    Thread 1: do i = 0, 24
    Thread 2: do i = 25, 49
    Thread 3: do i = 50, 74
    Thread 4: do i = 75, 99
    (each thread executes a(i) = b(i) + c(i) for its range)

Memory: all threads work on the same shared arrays A(0)...A(99), B(0)...B(99), C(0)...C(99).
The Barrier Construct
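A minimal sketch of an explicit barrier separating two phases of work (the per-thread flag array is illustrative; worksharing constructs already end with an implicit barrier unless nowait is given):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int done[256] = {0};                 /* assumes at most 256 threads */

        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            done[tid] = 1;                   /* phase 1 work */

            #pragma omp barrier              /* every thread waits here */

            /* phase 2: results written by other threads in phase 1 are visible */
            printf("thread %d sees done[0] = %d\n", tid, done[0]);
        }
        return 0;
    }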
The Single Construct
C/C++
    #pragma omp single [clause]
    ... structured block ...

Fortran
    !$omp single [clause]
    ... structured block ...
    !$omp end single
• The single construct specifies that the enclosed structured block is executed by only one thread of the team.
– It is up to the runtime which thread that is.
• Useful for:
– I/O
– Memory allocation and deallocation, etc. (in general: setup work)
– Implementation of the single-creator parallel-executor pattern as we will see later…
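An illustrative sketch, assuming one thread allocates a buffer that the whole team then uses:

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void)
    {
        double *buf = NULL;

        #pragma omp parallel shared(buf)
        {
            #pragma omp single
            {
                /* executed by exactly one (arbitrary) thread; the others wait
                   at the implicit barrier at the end of single */
                buf = malloc(100 * sizeof(double));
                printf("allocated by thread %d\n", omp_get_thread_num());
            }

            /* after the barrier, all threads can safely use buf */
            #pragma omp for
            for (int i = 0; i < 100; i++)
                buf[i] = i;
        }

        printf("buf[99] = %f\n", buf[99]);
        free(buf);
        return 0;
    }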
The Master Construct
C/C++
    #pragma omp master
    ... structured block ...

Fortran
    !$omp master
    ... structured block ...
    !$omp end master
• The master construct specifies that the enclosed structured block is executed only by the master thread of
a team.
• Note: The master construct is not a worksharing construct and does not contain an implicit barrier at the end.
Demo
Vector Addition
Influencing the For Loop Scheduling / 1
• for-construct: OpenMP allows you to influence how the iterations are scheduled among the threads of the team, via the schedule clause:
– schedule(static [, chunk]): Iteration space divided into blocks of chunk size, blocks are assigned to
threads in a round-robin fashion. If chunk is not specified: #threads blocks.
– schedule(dynamic [, chunk]): Iteration space divided into blocks of chunk (not specified: 1) size,
blocks are scheduled to threads in the order in which threads finish previous blocks.
– schedule(guided [, chunk]): Similar to dynamic, but block size starts with implementation-defined
value, then is decreased exponentially down to chunk.
• The default schedule is implementation defined; in practice it is typically schedule(static) (see the sketch below).
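An illustrative sketch of the schedule clause; the chunk sizes and loop bodies are arbitrary:

    #include <stdio.h>
    #define N 100

    double a[N], b[N], c[N];

    int main(void)
    {
        /* static,10: chunks of 10 iterations assigned round-robin up front */
        #pragma omp parallel for schedule(static, 10)
        for (int i = 0; i < N; i++)
            a[i] = b[i] + c[i];

        /* dynamic,4: threads grab the next chunk of 4 as they finish one;
           good for irregular work, at the cost of scheduling overhead */
        #pragma omp parallel for schedule(dynamic, 4)
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * a[i] + c[i];

        printf("a[0] = %f\n", a[0]);
        return 0;
    }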
Influencing the For Loop Scheduling / 2
◼ Static Schedule
→ schedule(static [, chunk])
→ Decomposition depending on chunk size
Influencing the For Loop Scheduling / 3
• Dynamic schedule
– schedule(dynamic [, chunk])
– Iteration space divided into blocks of chunk size
– Threads request a new block after finishing the previous one
– Default chunk size is 1
• Pros?
– Workload distribution
• Cons?
– Runtime Overhead
– Chunk size essential for performance
– No NUMA optimizations possible
Synchronization Overview
C/C++
int i, s = 0;
#pragma omp parallel for
for (i = 0; i < 100; i++)
{
s = s + a[i];
}
• Data Race: If between two synchronization points at least one thread writes to a memory location from
which at least one other thread reads, the result is not deterministic (race condition).
Synchronization: Critical Region
• A Critical Region is executed by all threads, but only by one thread at a time (Mutual Exclusion).
C/C++
#pragma omp critical (name)
{
... structured block ...
}
Programming OpenMP
Scoping
Christian Terboven
Michael Klemm
Scoping Rules
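An illustrative sketch of the most common data-scoping clauses (the variable names are made up):

    #include <stdio.h>

    int main(void)
    {
        int s = 0;        /* shared: one instance for the whole team */
        int offset = 42;  /* firstprivate: private copy, initialized from the original */

        #pragma omp parallel for shared(s) firstprivate(offset)
        for (int i = 0; i < 8; i++) {
            int tmp = offset + i;   /* declared inside the region: private to each thread */
            #pragma omp critical
            s += tmp;
        }

        printf("s = %d\n", s);      /* 8*42 + (0+1+...+7) = 364 */
        return 0;
    }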
Privatization of Global/Static Variables
C/C++
    static int i;
    #pragma omp threadprivate(i)

Fortran
    INTEGER, SAVE :: i
    !$omp threadprivate(i)
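An illustrative sketch of how a threadprivate variable behaves (the counter variable is made up):

    #include <stdio.h>
    #include <omp.h>

    static int counter = 0;                 /* one instance per thread */
    #pragma omp threadprivate(counter)

    int main(void)
    {
        #pragma omp parallel
        counter++;                          /* each thread increments its own copy */

        /* values persist between regions only if the team stays the same
           (same number of threads, no dynamic adjustment) */
        #pragma omp parallel
        printf("thread %d: counter = %d\n", omp_get_thread_num(), counter);

        return 0;
    }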
Back to our example
C/C++
int i, s = 0;
#pragma omp parallel for
for (i = 0; i < 100; i++)
{
#pragma omp critical
{ s = s + a[i]; }
}
It's your turn: Make It Scale!
(done)
The Reduction Clause
• In a reduction operation, the operator is applied to all variables in the list. The variables have to be shared.
– reduction(operator:list)
– The result is provided in the associated reduction variable
C/C++
int i, s = 0;
#pragma omp parallel for reduction(+:s)
for (i = 0; i < 100; i++)
{
s = s + a[i];
}
Example
PI
Example: Pi (1/2)
π = ∫₀¹ 4 / (1 + x²) dx

    double f(double x)
    {
        return (4.0 / (1.0 + x*x));
    }

    double CalcPi (int n)
    {
        const double fH = 1.0 / (double) n;
        double fSum = 0.0;
        double fX;
        int i;
        [...]
Example: Pi (2/2)
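A complete sketch of the kernel parallelized with a worksharing loop and a reduction on fSum; the loop body follows the standard midpoint rule and is reconstructed, not copied from the original code:

    #include <stdio.h>

    double f(double x)
    {
        return (4.0 / (1.0 + x*x));
    }

    double CalcPi (int n)
    {
        const double fH = 1.0 / (double) n;
        double fSum = 0.0;
        double fX;
        int i;

        /* fX stays private to each thread, fSum is combined by the reduction */
        #pragma omp parallel for private(fX) reduction(+:fSum)
        for (i = 0; i < n; i++)
        {
            fX = fH * ((double)i + 0.5);
            fSum += f(fX);
        }
        return fH * fSum;
    }

    int main(void)
    {
        printf("pi ~ %.12f\n", CalcPi(10000000));
        return 0;
    }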
Programming OpenMP
OpenMP Tasking Introduction
Christian Terboven
Michael Klemm
What is a Task in OpenMP?
◼ Tasks are work units whose execution
→ may be deferred or …
→ … executed immediately
◼ Tasks are created, e.g., when encountering a taskloop construct → explicit tasks per chunk are created
Tasking Execution Model
◼ Supports unstructured parallelism
→ unbounded loops

    while ( <expr> ) {
        ...
    }

→ recursive functions

    void myfunc( <args> )
    {
        ...; myfunc( <newargs> ); ...;
    }

◼ Example (unstructured parallelism)

    #pragma omp parallel
    #pragma omp master
    while (elem != NULL) {
        #pragma omp task
            compute(elem);
        elem = elem->next;
    }

[Figure: the parallel team executes the generated tasks]
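A self-contained sketch of this list-traversal pattern; the elem_t type, the list setup, and the per-element work are made up for illustration:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct elem {
        int value;
        struct elem *next;
    } elem_t;

    void compute(elem_t *e) { e->value *= 2; }    /* placeholder work */

    int main(void)
    {
        /* build a small list */
        elem_t *head = NULL;
        for (int i = 0; i < 10; i++) {
            elem_t *e = malloc(sizeof(elem_t));
            e->value = i;
            e->next = head;
            head = e;
        }

        #pragma omp parallel
        #pragma omp master
        {
            /* one thread creates a task per element; the whole team executes them */
            for (elem_t *elem = head; elem != NULL; elem = elem->next) {
                #pragma omp task firstprivate(elem)
                compute(elem);
            }
        }   /* the implicit barrier of the parallel region waits for all tasks */

        for (elem_t *e = head; e != NULL; e = e->next)
            printf("%d ", e->value);
        printf("\n");
        return 0;
    }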
OpenMP Tasking Idiom
◼ OpenMP programmers need a specific idiom to kick off task-parallel execution: parallel master
→ OpenMP version 5.0 introduced the parallel master construct
Using the master construct:

    int main(int argc, char* argv[])
    {
        [...]
        #pragma omp parallel
        {
            #pragma omp master
            {
                start_task_parallel_execution();
            }
        }
        [...]
    }

Using the single construct:

    int main(int argc, char* argv[])
    {
        [...]
        #pragma omp parallel
        {
            #pragma omp single
            {
                start_task_parallel_execution();
            }
        }
        [...]
    }
Fibonacci Numbers (in a Stupid Way ☺)
    int main(int argc, char* argv[])
    {
        [...]
        #pragma omp parallel
        {
            #pragma omp master
            {
                fib(input);
            }
        }
        [...]
    }

    int fib(int n) {
        if (n < 2) return n;
        int x, y;
        #pragma omp task shared(x)
        {
            x = fib(n - 1);
        }
        #pragma omp task shared(y)
        {
            y = fib(n - 2);
        }
        #pragma omp taskwait
        return x + y;
    }

[Figure: task queue holding the generated tasks, e.g. fib(3), fib(2), fib(1), fib(0)]
◼ T1 enters fib(4)
◼ T1 creates tasks for fib(3) and fib(2)
◼ T1 and T2 execute tasks from the queue
◼ T1 and T2 create 4 new tasks
◼ T1 - T4 execute tasks
◼ …
[Figure: task tree for fib(4), expanding into fib(3) and fib(2), then fib(2), fib(1), fib(1), fib(0), fib(1), fib(0)]
Programming OpenMP
Using OpenMP Compilers
Christian Terboven
Michael Klemm
Production Compilers w/ OpenMP Support
◼ GCC
◼ clang/LLVM
◼ Intel Classic and Next-gen Compilers
◼ AOCC, AOMP, ROCmCC
◼ IBM XL
◼ … and many more
Compiling OpenMP
◼ Enable OpenMP via the compiler’s command-line switches
→ GCC: -fopenmp
→ clang: -fopenmp
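For example, a minimal hello.c (file name and contents are illustrative) built with the switches listed above:

    /* hello.c -- build with:
         gcc   -fopenmp hello.c -o hello
         clang -fopenmp hello.c -o hello   */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel
        printf("hello from thread %d\n", omp_get_thread_num());
        return 0;
    }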
Programming OpenMP
Hands-on Exercises
Christian Terboven
Michael Klemm
Webinar Exercises
◼ We have implemented a series of small hands-on examples that you can use and play with.
→ Download: git clone https://github.com/cterboven/OpenMP-tutorial-PRACE.git
→ Build: make
→ You can then find the compiled code in the “bin” folder to run it
→ We mostly use the GCC compiler; some examples require Intel’s Math Kernel Library
→ You can use it to cheat ☺, or to check if you came up with the same solution