Parallel Programming Using OpenMP
Mike Bailey
[email protected]
Computer Graphics
openmp.pptx mjb – March 23, 2020
OpenMP Multithreaded Programming
What OpenMP Isn't:
• OpenMP doesn't check for data dependencies, data conflicts, deadlocks, or race conditions. You are responsible for avoiding those yourself.
• OpenMP doesn't guarantee the order in which threads execute, just that they do execute.
• OpenMP does not prevent you from writing code that triggers cache performance problems (such as false sharing); in fact, it makes doing so really easy.
Memory Allocation in a Multithreaded Program

[Diagram: memory layout, one-thread vs. multiple-threads. With one thread, the process has a Program Executable, Globals, a Heap, and a Stack. With multiple threads, the Program Executable, Globals, and Heap are common to all threads, but each thread gets its own Stack.]

Don't take this completely literally. The exact arrangement depends on the operating system and the compiler. For example, sometimes the stack and heap are arranged so that they grow towards each other.
Using OpenMP on Linux

Compile and link with the -fopenmp flag:

g++ -fopenmp -o prog prog.cpp

If you are instead using Visual Studio 2019 and get a compile message that looks like this:

1>c1xx: error C2338: two-phase name lookup is not supported for C++/CLI, C++/CX, or OpenMP; use /Zc:twoPhase-

then add /Zc:twoPhase- to the project's additional compiler command-line options, as the message suggests.
Numbers of OpenMP Threads
How to specify how many OpenMP threads you want to have available:
omp_set_num_threads( num );
Setting the number of available threads to the exact number of cores available:
omp_set_num_threads( omp_get_num_procs( ) );
Asking how many OpenMP threads this program is using right now:
num = omp_get_num_threads( );
Creating an OpenMP Team of Threads
#include <stdio.h>
#include <omp.h>

int
main( )
{
	omp_set_num_threads( 8 );

	#pragma omp parallel default(none)
	{
		printf( "Hello, World, from thread #%d !\n", omp_get_thread_num( ) );
	}

	return 0;
}
Uh-oh…

The #pragma omp parallel for directive tells the compiler to parallelize the for-loop into multiple threads. Each thread automatically gets its own personal copy of the variable i because it is defined within the for-loop.

The default(none) directive forces you to explicitly declare all variables declared outside the parallel region to be either private or shared while they are in the parallel region. Variables declared within the for-loop are automatically private.
OpenMP for-Loop Rules

a[102] = a[101] + 1.;	// what if this is the first of thread #1's work?
OpenMP For-Loop Rules

The for-loop must have this canonical form:

for( index = start ; test ; change )

where test is one of:
	index < end
	index <= end
	index > end
	index >= end

and change is one of:
	index++
	++index
	index--
	--index
	index += incr
	index = index + incr
	index = incr + index
	index -= decr
	index = index - decr
What to do about Variables Declared Before the for-loop Starts?

float x = 0.;
#pragma omp parallel for …
for( int i = 0; i < N; i++ )
{
	x = (float) i;
	float y = x*x;
	<< more code… >>
}

i and y are automatically private because they are defined within the loop. Good practice demands that x be explicitly declared to be shared or private!
private(x)
Means that each thread will get its own version of the variable
shared(x)
Means that all threads will share a common version of the variable
default(none)
I recommend that you include this in your OpenMP for-loop directive. This will
force you to explicitly flag all of your externally-declared variables as shared or
private. Don’t make a mistake by leaving it up to the default!
Example:
#pragma omp parallel for default(none), private(x)
Single Program Multiple Data (SPMD) in OpenMP

Static Threads
• All work is allocated and assigned at runtime
• "Round-robin" assignments

Dynamic Threads
• The pool is statically assigned some of the work at runtime, but not all of it
• When a thread from the pool becomes idle, it gets a new assignment

OpenMP Scheduling
schedule(static [,chunksize])
schedule(dynamic [,chunksize])
	The schedule defaults to static
	chunksize defaults to 1
OpenMP Allocation of Work to Threads

schedule(static,1), chunksize = 1: each thread is assigned one iteration, then the assignments start over:
	Thread 0: iterations 0, 3, 6, 9
	Thread 1: iterations 1, 4, 7, 10
	Thread 2: iterations 2, 5, 8, 11

schedule(static,2), chunksize = 2: each thread is assigned two iterations, then the assignments start over:
	Thread 0: iterations 0, 1, 6, 7
	Thread 1: iterations 2, 3, 8, 9
	Thread 2: iterations 4, 5, 10, 11

schedule(static,4), chunksize = 4: each thread is assigned four iterations, then the assignments start over:
	Thread 0: iterations 0, 1, 2, 3
	Thread 1: iterations 4, 5, 6, 7
	Thread 2: iterations 8, 9, 10, 11
Arithmetic Operations Among Threads – A Problem

• There is no guarantee when each thread will execute this line
• There is not even a guarantee that each thread will finish this line before some other thread interrupts it. (Remember that each line of code usually generates multiple lines of assembly.)
• This is non-deterministic!

Assembly code:
	Load	sum
	Add	myPartialSum
	Store	sum

What if the scheduler decides to switch threads right in the middle of this three-instruction sequence?
[Figure: 30 trial results of the parallel summation, ranging from 0.334996 to 0.672593; no two runs agree.]
Don’t do it this way! We’ll talk about how to do it correctly in the Trapezoid Integration noteset.
Here's a trapezoid integration example.

The partial sums are added up, as shown on the previous page.
The integration was done 30 times.
The answer is supposed to be exactly 2.
None of the 30 answers is even close.
And, not only are the answers bad, they are not even consistently bad!

[Figure: plot of sum versus trial #.]
Don’t do it this way! We’ll talk about how to do it correctly in the Trapezoid Integration noteset.
Synchronization
Critical sections
#pragma omp critical
Restricts execution to one thread at a time
Barriers
#pragma omp barrier
Forces each thread to wait here until all threads arrive
(Note: there is an implied barrier after parallel for loops and OpenMP sections,
unless the nowait clause is used)
Synchronization Examples

omp_lock_t Sync;
. . .
omp_init_lock( &Sync );
. . .
omp_set_lock( &Sync );
	<< code that needs the mutual exclusion >>
omp_unset_lock( &Sync );
. . .
omp_destroy_lock( &Sync );
Single-thread-execution Synchronization

#pragma omp single

Restricts execution to a single thread ever. This is used when an operation only makes sense for one thread to do. Reading data from a file is a good example.
Creating Sections of OpenMP Code
(Note: there is an implied barrier after parallel for loops and OpenMP sections,
unless the nowait clause is used)
What do OpenMP Sections do for You?

They decrease your overall execution time.

omp_set_num_threads( 1 );
	The one thread runs Sections 1, 2, and 3 one after the other.

omp_set_num_threads( 2 );
	Section 1 runs on one thread while Sections 2 and 3 run, one after the other, on the second thread.

omp_set_num_threads( 3 );
	Sections 1, 2, and 3 each run on their own thread, all at the same time.
A Functional Decomposition example of using Sections
omp_set_num_threads( 3 );