OpenMP-More Directives
single Directive
Purpose:
• The single directive specifies that the enclosed code is to be executed by only one thread in the team
Format:
#pragma omp single [clause ...] newline
structured_block
single Directive
Clauses and Restrictions
Clauses:
• Threads in the team that do not execute the single directive, wait at the end of the
enclosed code block, unless a nowait clause is specified.
Restrictions:
• It is illegal to branch into or out of a single block
OpenMP Directives: combined directives
Combined parallel directives
• OpenMP provides two directives that are merely conveniences:
• parallel for
• parallel sections
• For the most part, these directives behave identically to an individual
parallel directive being immediately followed by a separate work-
sharing directive
• Most of the rules, clauses and restrictions that apply to both directives
are in effect
• Let's look at an example using the parallel for directive
parallel for Directive Example

This has the same effect as using a parallel and then a for directive. This is sometimes the best way to do it; however, if you need to initialise variables in parallel first, using a parallel and then a separate for directive is required.

#include <stdio.h>
#include <omp.h>
#define N 1000
#define CHUNKSIZE 100

main () {
  int i, chunk;
  float a[N], b[N], c[N];

  /* Some initializations */
  for (i=0; i < N; i++) {
    a[i] = i * 1.0;
    b[i] = a[i];
  }
  chunk = CHUNKSIZE;

  #pragma omp parallel for \
      shared(a,b,c,chunk) private(i) \
      schedule(static,chunk)
  for (i=0; i < N; i++)
    c[i] = a[i] + b[i];
}
master Directive
Purpose:
• The master directive specifies a region that is to be executed only by the master thread of the team; all other threads skip it
Format:
#pragma omp master newline
structured_block
Restrictions:
• It is illegal to branch into or out of a master block
critical Directive
Purpose:
• The critical directive specifies a region of code that must be executed by only one
thread at a time
Format:
#pragma omp critical [name] newline
structured_block
Restrictions:
• It is illegal to branch into or out of a critical block
critical Directive
Notes
Notes:
• If a thread is currently executing inside a critical region and another
thread reaches that critical region and attempts to execute it, it will
block until the first thread exits that critical region
• The optional name enables multiple different critical regions to
exist:
• Names act as global identifiers
• Different critical regions with the same name are treated as the same
region
• All critical sections which are unnamed, are treated as the same section
critical Directive Example

All threads in the team will attempt to execute in parallel; however, because of the critical construct surrounding the increment of x, only one thread will be able to read/increment/write x at any time.

#include <omp.h>

main() {
  int x;
  x = 0;
  #pragma omp parallel shared(x)
  {
    #pragma omp critical
    x = x + 1;
  } /* end of parallel section */
}
barrier Directive
Purpose:
• The barrier directive synchronizes all threads in the team
• When a barrier directive is reached, a thread will wait at that point until all other threads
have reached that barrier
• All threads then resume executing in parallel the code that follows the barrier
Format:
#pragma omp barrier newline
Restrictions:
• Either all of the threads in a team must execute the barrier region, or none of them
atomic Directive
Purpose:
• The atomic directive specifies that a specific memory location must be updated
atomically, rather than letting multiple threads attempt to write to it. In essence, this
directive provides a one line-critical section
Format:
#pragma omp atomic newline
statement_expression
Restrictions:
• The directive applies only to a single, immediately following statement
• An atomic statement must follow a specific syntax (not covered in this course but see the
most recent OpenMP specs for details)
ordered Directive
Purpose:
• The ordered directive specifies that iterations of the enclosed loop will be executed
in the same order as if they were executed on a serial processor
• Threads will need to wait before executing their chunk of iterations if previous
iterations haven't completed yet
• Used within a for loop with an ordered clause
• The ordered directive provides a way to check against serial execution
Format:
#pragma omp for ordered [other clauses ...] newline
(loop region)
#pragma omp ordered newline
structured_block
(end of loop region)
ordered Directive
Restrictions
Restrictions:
• An ordered directive can only appear in the dynamic extent of the for or parallel
for directives
• Only one thread is allowed in an ordered section at any time
• It is illegal to branch into or out of an ordered block
• An iteration of a loop must not execute the same ordered directive more than once,
and it must not execute more than one ordered directive
• A loop which contains an ordered directive must also be a loop with an ordered
clause
Data Scope Attribute Clauses
•Also called Data-sharing Attribute Clauses
•An important consideration for OpenMP programming is the understanding and use of
data scoping
•Because OpenMP is based upon the shared memory programming model, most
variables are shared by default
•Global variables include: file-scope variables and static variables
•Private variables include:
• Loop index variables
• Stack variables in routines called from parallel regions
Data Scope Attribute Clauses
•The OpenMP Data Scope Attribute Clauses are used to explicitly define
how variables should be scoped. They include:
• private
• shared
• default
• reduction (covered later)
Data Scope Attribute Clauses
•Data Scope Attribute Clauses are used in conjunction with several directives (parallel, for, and
sections) to control the scoping of enclosed variables
•These constructs provide the ability to control the data environment during execution of parallel
constructs
•They define how and which data variables in the serial section of the program are transferred to
the parallel sections of the program (and back)
•They define which variables will be visible to all threads in the parallel sections and which
variables will be privately allocated to all threads
•Data Scope Attribute Clauses are effective only within their lexical/static extent
private Clause
Purpose:
• The private clause declares variables in its list to be private to each thread
Format:
private (list)
Notes:
private variables behave as follows:
• A new object of the same type is declared once for each thread in the team
• All references to the original object are replaced with references to the new object
• Variables declared private should be assumed to be uninitialized for each thread
shared Clause
Purpose:
• The shared clause declares variables in its list to be shared among all threads in the team
Format:
shared (list)
Notes:
• A shared variable exists in only one memory location and all threads can read or write to that
address
• It is the programmer's responsibility to ensure that multiple threads properly access shared
variables (such as via critical sections)
default Clause
Purpose and format
Purpose:
• The default clause allows the user to specify a default scope for all variables in the lexical
extent of any parallel region
Format:
default (shared | none)
default Clause
Notes and Restrictions
Notes:
• Specific variables can be exempted from the default using the private, shared, firstprivate,
lastprivate, and reduction clauses
• The OpenMP specification does not include private or firstprivate as a possible default.
However, actual implementations may provide this option.
• Using none as a default requires that the programmer explicitly scope all variables
Restrictions:
• Only one default clause can be specified on a parallel directive
Environment Variables
OMP_SCHEDULE
•OpenMP provides the environment variables for controlling the execution of
parallel code
•All environment variable names are uppercase
•The values assigned to them are not case sensitive
OMP_SCHEDULE
• applies only to for and parallel for directives which have their schedule clause set to runtime
• the value of this variable determines how iterations of the loop are scheduled on processors
for example:
setenv OMP_SCHEDULE "guided, 4"
setenv OMP_SCHEDULE "dynamic"
Environment Variables
OMP_NUM_THREADS and OMP_DYNAMIC
OMP_NUM_THREADS
•Sets the maximum number of threads to use during execution
for example:
setenv OMP_NUM_THREADS 8
OMP_DYNAMIC
•Enables or disables dynamic adjustment of the number of threads available for execution of
parallel regions
•Valid values are true or false
For example:
setenv OMP_DYNAMIC true
Environment Variables
OMP_NESTED
OMP_NESTED
•Enables or disables nested parallelism
•Valid values are true or false
For example:
setenv OMP_NESTED true
Implementation notes:
•If nested parallelism is supported, it is often only nominal, in that a nested parallel region may
only have one thread
•Experiment and find out for yourself the level of nested parallelism
Environment Variables
OMP_STACKSIZE
OMP_STACKSIZE
•New with OpenMP 3.0. Controls the size of the stack for created (non-Master) threads.
Examples include:
setenv OMP_STACKSIZE 10M
setenv OMP_STACKSIZE "16M"
sections Directive
Format:
#pragma omp sections [clauses ...] newline
{
#pragma omp section newline
structured_block
#pragma omp section newline
structured_block
}
sections Directive
Clauses and Restrictions
Clauses:
•There is an implied barrier at the end of a sections directive,
unless the nowait clause is used
Restrictions:
•It is illegal to branch into or out of section blocks
•section directives must occur within the lexical extent of an
enclosing sections directive
sections Directive
Q. What happens if the number of threads and the number of sections directives are different?
A. If there are more threads than sections, some threads will not execute a section and some
will
If there are more sections than threads, the implementation defines how the extra sections are
executed
OpenMP Directives
Threadprivate
threadprivate Directive
Purpose and Format
Purpose:
•The threadprivate directive is used to make global file scope variables local and persistent
to a thread through the execution of multiple parallel regions
Format:
#pragma omp threadprivate (list)
Notes:
•The directive must appear after the declaration of listed variables
•Each thread then gets its own copy of the variable so data written by one thread is not visible to
other threads
• On first entry to a parallel region, data in threadprivate variables should be assumed undefined
• threadprivate variables differ from private variables because they are able to persist between different parallel
sections of a code
threadprivate Directive
Restrictions
Restrictions :
•Data in threadprivate objects is guaranteed to persist only if the dynamic
threads mechanism is "turned off" and the number of threads in different
parallel regions remains constant. The default setting of dynamic threads is
undefined
reduction Clause
Purpose:
• The reduction clause performs a reduction on the variables that appear in its list
• A private copy of each list variable is created for each thread. At the end of the reduction, the reduction operation is applied to all private copies of the shared variable, and the final result is written to the global shared variable
Format:
reduction (operator:list)
reduction Clause Example

#include <stdio.h>
#include <omp.h>

main () {
  int i, n, chunk;
  float a[100], b[100], result;

  /* Some initializations */
  n = 100;
  result = 0.0;
  for (i=0; i < n; i++) {
    a[i] = i * 1.0;
    b[i] = i * 2.0;
  }

  #pragma omp parallel for \
      default(shared) private(i) \
      schedule(static) \
      reduction(+:result)
  for (i=0; i < n; i++)
    result = result + (a[i] * b[i]);

  printf("Final result= %f\n", result);
}

schedule(static): iterations of the parallel loop will be distributed in "equal"-sized blocks to each thread in the team.
At the end of the parallel loop construct, all threads will add their values of "result" to update the master thread's global copy.
reduction Clause
Restrictions
Restrictions:
•Variables in the list must be named scalar variables. They cannot be array or structure type variables. They must also be declared shared in the enclosing context
•Reduction operations may not be associative for real numbers.
•The reduction clause is intended to be used on a region or work-sharing construct in which the
reduction variable is used only in statements which have one of following forms:
x = x op expr
x = expr op x
x binop= expr
x++
++x
x--
--x

where:
x is a scalar variable in the list
expr is a scalar expression that does not reference x
op is not overloaded, and is one of +, *, -, /, &, ^, |, &&, ||
binop is not overloaded, and is one of +, *, -, /, &, ^, |