OpenMP 2pp

Mike Bailey
[email protected]
What OpenMP Doesn't Do For You

• OpenMP doesn't check for data dependencies, data conflicts, deadlocks, or race conditions. You are responsible for avoiding those yourself.
• OpenMP doesn't guarantee the order in which threads execute, just that they do execute.
• OpenMP does not prevent you from writing false-sharing code (in fact, it makes it really easy).
Memory Allocation in a Multithreaded Program

[Figure: memory layout of a one-thread program vs. a multiple-threads program. In both, the program executable, the globals, and the heap are common to the whole process; in the multithreaded case, each thread gets its own stack.]

Don't take this completely literally. The exact arrangement depends on the operating system and the compiler. For example, sometimes the stack and heap are arranged so that they grow towards each other.
Using OpenMP on Linux
Number of OpenMP Threads

Two ways to specify how many OpenMP threads you want to have available:
1. Set the OMP_NUM_THREADS environment variable
2. Call omp_set_num_threads( num );

Asking how many OpenMP threads this program is using right now:
num = omp_get_num_threads( );
Creating an OpenMP Team of Threads

#include <stdio.h>
#include <omp.h>

int
main( )
{
	omp_set_num_threads( 8 );

	#pragma omp parallel default(none)
	{
		printf( "Hello, World, from thread #%d !\n", omp_get_thread_num( ) );
	}
	return 0;
}
Uh-oh…

The printed lines come out in no particular order, and can even run into each other, because all of the threads are writing to the same output stream at once.
Creating OpenMP Threads in Loops

A #pragma omp parallel for directive tells the compiler to parallelize the for-loop that follows it into multiple threads. Each thread automatically gets its own personal copy of the variable i because it is declared within the for-loop.

The default(none) clause forces you to explicitly declare all variables declared outside the parallel region to be either private or shared while they are in the parallel region. Variables declared within the for-loop body are automatically private.

Oregon State University, Computer Graphics — mjb, February 28, 2017
OpenMP For-Loop Rules

The for-loop must have this canonical form:

for( index = start ; test ; change )

where the test is one of:
	index < end
	index <= end
	index > end
	index >= end

and the change is one of:
	index++
	++index
	index--
	--index
	index += incr
	index = index + incr
	index = incr + index
	index -= decr
	index = index - decr

Use default(none) in all your OpenMP directives. This will force you to explicitly flag all of your inside variables as shared or private. This will help prevent mistakes.

private(x)
Means that each thread will have its own copy of the variable x.

shared(x)
Means that all threads will share a common x. This is potentially dangerous.
Single Program Multiple Data (SPMD) in OpenMP

Static Threads
• All work is allocated and assigned to the threads before the loop starts executing

Dynamic Threads
• Consists of one Master and a pool of threads
• The pool is assigned some of the work at runtime, but not all of it
• When a thread from the pool becomes idle, the Master gives it a new assignment
• "Round-robin assignments"

OpenMP Scheduling

schedule(static [,chunksize])
schedule(dynamic [,chunksize])

Defaults to static; chunksize defaults to 1.
In static, the iterations are assigned to threads before the loop starts.
OpenMP Allocation of Work to Threads

With 3 threads and 12 iterations:

schedule(static,1) — chunksize = 1: each thread is assigned one iteration, then the assignments start over:
	thread 0:  0, 3, 6, 9
	thread 1:  1, 4, 7, 10
	thread 2:  2, 5, 8, 11

schedule(static,2) — chunksize = 2: each thread is assigned two iterations, then the assignments start over:
	thread 0:  0, 1, 6, 7
	thread 1:  2, 3, 8, 9
	thread 2:  4, 5, 10, 11

schedule(static,4) — chunksize = 4: each thread is assigned four iterations, then the assignments start over:
	thread 0:  0, 1, 2, 3
	thread 1:  4, 5, 6, 7
	thread 2:  8, 9, 10, 11
• There is no guarantee when each thread will execute a line like sum = sum + myPartialSum;
• There is not even a guarantee that each thread will finish this line before some other thread interrupts it. (Remember that each line of code usually generates multiple lines of assembly.)
• This is non-deterministic!

Assembly code:
	Load  sum
	Add   myPartialSum	; what if the scheduler decides to switch threads right here?
	Store sum
Here's a trapezoid integration example (covered in another note set). The partial sums are added up, as shown on the previous page. The integration was done 30 times. The answer is supposed to be exactly 2. None of the 30 answers is even close. And, not only are the answers bad, they are not even consistently bad!

0.469635   0.398893
0.517984   0.446419
0.438868   0.431204
0.437553   0.501783
0.398761   0.334996
0.506564   0.484124
0.489211   0.506362
0.584810   0.448226
0.476670   0.434737
0.530668   0.444919
0.500062   0.442432
0.672593   0.548837
0.411158   0.363092
0.408718   0.544778
0.523448   0.356299

Don't do it this way!
Arithmetic Operations Among Threads – Three Solutions

1. #pragma omp atomic
   sum = sum + myPartialSum;
	• Tries to use a built-in hardware instruction
	• Fixes the non-deterministic problem
	• But, serializes the code
	• Operators include +, -, *, /, ++, --, >>, <<, ^, |
	• Operators include +=, -=, *=, /=, etc.

2. #pragma omp critical
   sum = sum + myPartialSum;
	• Disables scheduler interrupts during the critical section
	• Also fixes it
	• But, serializes the code
Reduction vs. Atomic vs. Critical

Another approach is to give each thread its own element of a sums[ ] array, add the elements up afterwards, and then delete [ ] sums;

• This seems perfectly reasonable, it works, and it gets rid of the problem of multiple threads trying to write into the same reduction variable.
• The reason we don't do this is that this method provokes a problem called False Sharing. We will get to that when we discuss caching.
Synchronization

Critical sections
#pragma omp critical
Restricts execution to one thread at a time

Barriers
#pragma omp barrier
Forces each thread to wait here until all threads arrive
(Note: there is an implied barrier after parallel for-loops and OpenMP sections, unless the nowait clause is used)
Synchronization Examples

omp_lock_t Sync;
. . .
omp_init_lock( &Sync );
. . .
omp_set_lock( &Sync );
<< code that needs the mutual exclusion >>
omp_unset_lock( &Sync );
. . .
omp_destroy_lock( &Sync );	// release the lock's resources when you are done with it
Creating Sections of OpenMP Code

How the three tasks (sections) get assigned depends on how many threads are in the team:

omp_set_num_threads( 1 );
One thread executes Task 1, Task 2, and Task 3 one after the other.

omp_set_num_threads( 2 );
Two of the tasks start concurrently; the third is executed as soon as one of the two threads becomes free.

omp_set_num_threads( 3 );
Each of Task 1, Task 2, and Task 3 gets its own thread, and all three run concurrently.
OpenMP Tasks

• An OpenMP task is a single line of code or a structured block which is immediately assigned to one thread in the current thread team
• The task can be executed immediately, or it can be placed on its thread's list of things to do
• If the if clause is used and the argument evaluates to 0, then the task is executed immediately, superseding whatever else that thread is doing
• There has to be an existing parallel thread team for this to work. Otherwise one thread ends up doing all tasks
• One of the best uses of this is to make a function call. That function then runs concurrently until it completes

Tasks are very much like OpenMP Sections, but Sections are more static. That is, the number of sections is set when you write the code, whereas Tasks can be created anytime, and in any number, under control of your program's logic.