OpenMP - Reference Book
OpenMP
Table of Contents
1. Introduction
1. What is OpenMP?
2. History
3. Goals of OpenMP
2. OpenMP Programming Model
3. OpenMP Directives
1. Fortran Directive Format
2. C/C++ Directive Format
3. Directive Scoping
4. PARALLEL Construct
5. Work-Sharing Constructs
1. DO / for Directive
2. SECTIONS Directive
3. SINGLE Directive
6. Combined Parallel Work-Sharing Constructs
1. PARALLEL DO / parallel for Directive
2. PARALLEL SECTIONS Directive
7. Synchronization Constructs
1. MASTER Directive
2. CRITICAL Directive
3. BARRIER Directive
4. ATOMIC Directive
5. FLUSH Directive
6. ORDERED Directive
8. THREADPRIVATE Directive
9. Data Scope Attribute Clauses
1. PRIVATE Clause
2. SHARED Clause
3. DEFAULT Clause
4. FIRSTPRIVATE Clause
5. LASTPRIVATE Clause
6. COPYIN Clause
7. REDUCTION Clause
10. Clauses / Directives Summary
11. Directive Binding and Nesting Rules
4. Run-Time Library Routines
1. OMP_SET_NUM_THREADS
2. OMP_GET_NUM_THREADS
3. OMP_GET_MAX_THREADS
4. OMP_GET_THREAD_NUM
5. OMP_GET_NUM_PROCS
6. OMP_IN_PARALLEL
7. OMP_SET_DYNAMIC
8. OMP_GET_DYNAMIC
9. OMP_SET_NESTED
10. OMP_GET_NESTED
11. OMP_INIT_LOCK
12. OMP_DESTROY_LOCK
13. OMP_SET_LOCK
14. OMP_UNSET_LOCK
15. OMP_TEST_LOCK
5. Environment Variables
6. LLNL Specific Information and Recommendations
7. References and More Information
8. Exercise
Introduction
What is OpenMP?
OpenMP Is:
● An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory
parallelism
● Comprised of three primary API components:
❍ Compiler Directives
❍ Run-Time Library Routines
❍ Environment Variables
● Portable:
❍ The API is specified for C/C++ and Fortran
❍ Multiple platforms have been implemented including most Unix platforms and Windows NT
● Standardized:
❍ Jointly defined and endorsed by a group of major computer hardware and software vendors
OpenMP Is Not:
● Guaranteed to make the most efficient use of shared memory (currently there are no data locality constructs)
History
Ancient History
● In the early 90's, vendors of shared-memory machines supplied similar, directive-based, Fortran programming
extensions:
❍ The user would augment a serial Fortran program with directives specifying which loops were to be
parallelized
❍ The compiler would be responsible for automatically parallelizing such loops across the SMP
processors
● Implementations were all functionally similar, but were diverging (as usual)
● First attempt at a standard was the draft for ANSI X3H5 in 1994. It was never adopted, largely due to waning
interest as distributed memory machines became popular.
Recent History
● The OpenMP standard specification started in the spring of 1997, taking over where ANSI X3H5 had left off,
as newer shared memory machine architectures started to become prevalent. Partners in the standardization effort included:
❍ Compaq / Digital
❍ Hewlett-Packard Company
❍ Intel Corporation
❍ International Business Machines (IBM)
❍ Kuck & Associates, Inc. (KAI)
❍ Absoft Corporation
❍ Edinburgh Portable Compilers
❍ GENIAS Software GmBH
❍ Myrias Computer Technologies, Inc.
❍ The Portland Group, Inc. (PGI)
Release History
● The OpenMP Fortran API version 1.0 was released in October 1997; the C/C++ API version 1.0 followed in October 1998 (see the References section).
● Visit the OpenMP website at http://www.openmp.org/ for more information, including API specifications,
FAQ, presentations, discussions, media releases, calendar and membership application.
Goals of OpenMP
Standardization:
● Establish a simple and limited set of directives for programming shared memory machines. Significant
parallelism can be implemented by using just 3 or 4 directives.
Ease of Use:
● Provide capability to incrementally parallelize a serial program, unlike message-passing libraries which
typically require an all or nothing approach
OpenMP Programming Model
Thread Based Parallelism:
● A shared memory process can consist of multiple threads. OpenMP is based upon the existence of multiple
threads in the shared memory programming paradigm.
Explicit Parallelism:
● OpenMP is an explicit (not automatic) programming model, offering the programmer full control over
parallelization.
Fork-Join Model:
● All OpenMP programs begin as a single process: the master thread. The master thread executes sequentially
until the first parallel region construct is encountered.
● FORK: The master thread then creates a team of parallel threads.
● The statements in the program that are enclosed by the parallel region construct are then executed in parallel
among the various team threads.
● JOIN: When the team threads complete the statements in the parallel region construct, they synchronize and
terminate, leaving only the master thread.
● Virtually all of OpenMP parallelism is specified through the use of compiler directives which are embedded in
C/C++ or Fortran source code.
● The API provides for the placement of parallel constructs inside of other parallel constructs.
Dynamic Threads:
● The API provides for dynamically altering the number of threads which may be used to execute different parallel
regions.
Example - general code structure:

Fortran:

PROGRAM HELLO
! Serial code
      .
      .
      .
!$OMP PARALLEL
! Parallel section executed by all threads
!$OMP END PARALLEL
! Resume serial code
END

C/C++:

#include <omp.h>
main () {
/* Serial code */
      .
      .
      .
#pragma omp parallel
  {
  /* Parallel section executed by all threads */
  }
/* Resume serial code */
}
OpenMP Directives
Fortran Directive Format
Format:

sentinel  directive-name  [clause ...]

sentinel:
All Fortran OpenMP directives must begin with a sentinel. The accepted sentinels depend upon the
type of Fortran source. Possible sentinels are:
!$OMP
C$OMP
*$OMP
directive-name:
A valid OpenMP directive. Must appear after the sentinel and before any clauses.
[clause ...]:
Optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.
Example:

!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(BETA,PI)

Fixed Form Source:
● !$OMP C$OMP *$OMP are accepted sentinels and must start in column 1
● All Fortran fixed form rules for line length, white space, continuation and comment columns apply for the
entire directive line
Free Form Source:
● !$OMP is the only accepted sentinel. It can appear in any column, but must be preceded by white space only.
● All Fortran free form rules for line length, white space, continuation and comment columns apply for the
entire directive line
● Continuation lines must have an ampersand as the last non-blank character in a line. The following line must
begin with a sentinel and then the continuation directives.
General Rules:
● Fortran compilers which are OpenMP enabled generally include a command line option which instructs the
compiler to activate and interpret all OpenMP directives.
● Several Fortran OpenMP directives come in pairs and have the form:
!$OMP directive
      [ structured block of code ]
!$OMP END directive
OpenMP Directives
C/C++ Directive Format
Format:

#pragma omp directive-name [clause, ...] newline

#pragma omp:
Required for all OpenMP C/C++ directives.
directive-name:
A valid OpenMP directive. Must appear after the pragma and before any clauses.
[clause, ...]:
Optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.
newline:
Required. Precedes the structured block which is enclosed by this directive.
Example:

#pragma omp parallel default(shared) private(beta,pi)
General Rules:
● Case sensitive
● Only one directive-name may be specified per directive (true with Fortran also)
● Each directive applies to at most one succeeding statement, which must be a structured block.
● Long directive lines can be "continued" on succeeding lines by escaping the newline character with a
backslash ("\") at the end of a directive line.
OpenMP Directives
Directive Scoping
Static (Lexical) Extent:
● The code textually enclosed between the beginning and the end of a structured block following a directive.
● The static extent of a directive does not span multiple routines or code files.
Orphaned Directive:
● An OpenMP directive that appears independently from another enclosing directive is said to be an orphaned
directive. It exists outside of another directive's static (lexical) extent.
Dynamic Extent:
● The dynamic extent of a directive includes both its static (lexical) extent and the extents of its orphaned
directives.
Example:
(Figure: a PARALLEL region whose static extent is the code lexically enclosed by it, and whose dynamic extent
also includes orphaned directives in routines called from within the region; see the sketch below.)
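A minimal C sketch (not the tutorial's original figure) illustrating the three extents, using a hypothetical routine sub1:

#include <omp.h>
#include <stdio.h>

void sub1(void)
{
    /* Orphaned directive: it appears outside the lexical (static) extent of
       the PARALLEL construct, but inside its dynamic extent when sub1 is
       called from the parallel region below. */
    #pragma omp critical
    printf("thread %d in critical section\n", omp_get_thread_num());
}

int main(void)
{
    #pragma omp parallel    /* static extent = the following structured block */
    {                       /* dynamic extent also covers the call to sub1()  */
        sub1();
    }
    return 0;
}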
● OpenMP specifies a number of scoping rules on how directives may associate (bind) and nest within each
other
● Illegal and/or incorrect programs may result if the OpenMP binding and nesting rules are ignored
OpenMP Directives
PARALLEL Construct
Purpose:
● A parallel region is a block of code that will be executed by multiple threads. This is the fundamental
OpenMP parallel construct.
Format:

Fortran:
!$OMP PARALLEL [clause ...]
      block
!$OMP END PARALLEL

C/C++:
#pragma omp parallel [clause ...] newline
      structured_block
Notes:
● When a thread reaches a PARALLEL directive, it creates a team of threads and becomes the master of the
team. The master is a member of that team and has thread number 0 within that team.
● Starting from the beginning of this parallel region, the code is duplicated and all threads will execute that
code.
● There is an implied barrier at the end of a parallel section. Only the master thread continues execution past
this point.
● The number of threads in a parallel region is determined by the following factors, in order of precedence:
1. Use of the OMP_SET_NUM_THREADS library routine
2. Setting of the OMP_NUM_THREADS environment variable
3. Implementation default
Dynamic Threads:
● By default, a program with multiple parallel regions will use the same number of threads to execute each
region. This behavior can be changed to allow the run-time system to dynamically adjust the number of
threads that are created for a given parallel section. The two methods available for enabling dynamic threads
are:
1. Use of the OMP_SET_DYNAMIC library routine
2. Setting of the OMP_DYNAMIC environment variable
Nested Parallel Regions:
● A parallel region nested within another parallel region results in the creation of a new team, consisting of one
thread, by default.
● Implementations may allow more than one thread in nested parallel regions.
Clauses:
● IF clause: If present, it must evaluate to .TRUE. (Fortran) or non-zero (C/C++) in order for a team of threads
to be created. Otherwise, the region is executed serially by the master thread.
● The remaining clauses are described in detail later, in the Data Scope Attribute Clauses section.
Restrictions:
● A parallel region must be a structured block that does not span multiple routines or code files
● Unsynchronized Fortran I/O to the same unit by multiple threads has unspecified behavior
Example:
❍ A simple "Hello World" program, in which every thread executes the code enclosed in the parallel region
❍ OpenMP library routines are used to obtain thread identifiers and the total number of threads (see the C sketch below)

Fortran:
PROGRAM HELLO
      ...
END

C/C++:
#include <omp.h>
main () {
      ...
}
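A minimal runnable C sketch of such a "Hello World" program (illustrative; the variable names tid and nthreads are assumptions, not from the original listing):

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int nthreads, tid;

    /* Fork a team of threads, giving each thread its own copy of tid */
    #pragma omp parallel private(tid)
    {
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        if (tid == 0) {                        /* only the master thread does this */
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }   /* all threads join the master thread and disband */
    return 0;
}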
OpenMP Directives
Work-Sharing Constructs
● A work-sharing construct divides the execution of the enclosed code region among the members of the team
that encounter it.
● There is no implied barrier upon entry to a work-sharing construct, however there is an implied barrier at the
end of a work sharing construct.
DO / for - shares iterations of a SECTIONS - breaks work into SINGLE - serializes a section of
loop across the team. Represents a separate, discrete sections. Each code
type of "data parallelism". section is executed by a thread.
Can be used to implement a type
of "functional parallelism".
Restrictions:
● A work-sharing construct must be enclosed dynamically within a parallel region in order for the directive to
execute in parallel.
● Successive work-sharing constructs must be encountered in the same order by all members of a team
OpenMP Directives
Work-Sharing Constructs
DO / for Directive
Purpose:
● The DO / for directive specifies that the iterations of the loop immediately following it must be executed in
parallel by the team. This assumes a parallel region has already been initiated, otherwise it executes in serial
on a single processor.
Format:
do_loop
for_loop
Clauses:
● SCHEDULE clause: Describes how iterations of the loop are divided among the threads in the team. For
both C/C++ and Fortran:
STATIC:
Loop iterations are divided into pieces of size chunk and then statically assigned to threads. If chunk is
not specified, the iterations are evenly (if possible) divided contiguously among the threads.
DYNAMIC:
Loop iterations are divided into pieces of size chunk, and dynamically scheduled among the threads;
when a thread finishes one chunk, it is dynamically assigned another. The default chunk size is 1.
GUIDED:
The chunk size is exponentially reduced with each dispatched piece of the iteration space. The chunk
size specifies the minimum number of iterations to dispatch each time. The default chunk size is 1.
RUNTIME:
The scheduling decision is deferred until runtime by the environment variable OMP_SCHEDULE. It
is illegal to specify a chunk size for this clause.
The default schedule is implementation dependent. Implementations may also vary slightly in the way
the various schedules are implemented.
● ORDERED clause: Must be present when ORDERED directives are enclosed within the DO/for directive.
See Ordered Directive.
● NOWAIT (Fortran) / nowait (C/C++) clause: if specified, then threads do not synchronize at the end of the
parallel loop; they proceed directly to the statements following the loop. For Fortran, the END DO
directive is optional; if it is omitted, an END DO is assumed at the end of the loop.
● Other clauses are described in detail later, in the Data Scope Attribute Clauses section.
Restrictions:
● The DO loop can not be a DO WHILE loop, or a loop without loop control. Also, the loop iteration variable
must be an integer and the loop control parameters must be the same for all threads.
● Program correctness must not depend upon which thread executes a particular iteration.
● The chunk size must be specified as a loop invariant integer expression, as there is no synchronization during
its evaluation by different threads.
● The C/C++ for directive requires that the for-loop must have canonical shape. See the OpenMP API
specification for details.
● Example: Simple vector-add program
❍ Arrays A, B, and C will be shared by all threads.
❍ Variable I will be private to each thread; each thread will have its own unique copy.
❍ The iterations of the loop will be distributed dynamically in CHUNK sized pieces.
❍ Threads will not synchronize upon completing their individual pieces of work (NOWAIT).
PROGRAM VEC_ADD_DO
      INTEGER N, CHUNKSIZE, CHUNK, I
      PARAMETER (N=1000)
      PARAMETER (CHUNKSIZE=100)
      REAL A(N), B(N), C(N)
! Some initializations
      DO I = 1, N
        A(I) = I * 1.0
        B(I) = A(I)
      ENDDO
      CHUNK = CHUNKSIZE
!$OMP PARALLEL SHARED(A,B,C,CHUNK) PRIVATE(I)
!$OMP DO SCHEDULE(DYNAMIC,CHUNK)
      DO I = 1, N
        C(I) = A(I) + B(I)
      ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
      END
#include <omp.h>
#define CHUNKSIZE 100
#define N 1000
main ()
{
int i, chunk;
float a[N], b[N], c[N];
/* Some initializations */
for (i=0; i < N; i++)
  a[i] = b[i] = i * 1.0;
chunk = CHUNKSIZE;
#pragma omp parallel shared(a,b,c,chunk) private(i)
  {
  #pragma omp for schedule(dynamic,chunk) nowait
  for (i=0; i < N; i++)
    c[i] = a[i] + b[i];
  }  /* end of parallel section */
}
OpenMP Directives
Work-Sharing Constructs
SECTIONS Directive
Purpose:
● The SECTIONS directive is a non-iterative work-sharing construct. It specifies that the enclosed section(s) of
code are to be divided among the threads in the team.
● Independent SECTION directives are nested within a SECTIONS directive. Each SECTION is executed once
by a thread in the team. Different sections will be executed by different threads.
Format:

Fortran:
!$OMP SECTIONS [clause ...]
!$OMP SECTION
      block
!$OMP SECTION
      block
!$OMP END SECTIONS [ NOWAIT ]

C/C++:
#pragma omp sections [clause ...] newline
  {
  #pragma omp section newline
      structured_block
  #pragma omp section newline
      structured_block
  }
Clauses:
● There is an implied barrier at the end of a SECTIONS directive, unless the nowait (C/C++) or NOWAIT
(Fortran) clause is used.
● Clauses are described in detail later, in the Data Scope Attribute Clauses section.
Questions:
What happens if the number of threads and the number of SECTIONs are different? More threads than
SECTIONs? Fewer threads than SECTIONs?
Restrictions:
● SECTION directives must occur within the lexical extent of an enclosing SECTIONS directive
● Example: Simple vector-add program - similar to the example used previously for the DO / for directive.
❍ The first N/2 iterations of the DO loop are computed by the thread that executes the first SECTION, and
the remaining iterations by the thread that executes the second SECTION.
❍ When each thread finishes its block of iterations, it proceeds with whatever code comes next
(NOWAIT).
PROGRAM VEC_ADD_SECTIONS
INTEGER N, I
PARAMETER (N=1000)
REAL A(N), B(N), C(N)
! Some initializations
DO I = 1, N
A(I) = I * 1.0
B(I) = A(I)
ENDDO
!$OMP PARALLEL SHARED(A,B,C), PRIVATE(I)
!$OMP SECTIONS
!$OMP SECTION
DO I = 1, N/2
C(I) = A(I) + B(I)
ENDDO
!$OMP SECTION
DO I = 1+N/2, N
C(I) = A(I) + B(I)
ENDDO
!$OMP END SECTIONS NOWAIT
!$OMP END PARALLEL
END
#include <omp.h>
#define N 1000
main ()
{
int i;
float a[N], b[N], c[N];
/* Some initializations */
for (i=0; i < N; i++)
  a[i] = b[i] = i * 1.0;
#pragma omp parallel shared(a,b,c) private(i)
  {
  #pragma omp sections nowait
    {
    #pragma omp section
    for (i=0; i < N/2; i++)
      c[i] = a[i] + b[i];
    #pragma omp section
    for (i=N/2; i < N; i++)
      c[i] = a[i] + b[i];
    } /* end of sections */
  } /* end of parallel section */
}
OpenMP Directives
Work-Sharing Constructs
SINGLE Directive
Purpose:
● The SINGLE directive specifies that the enclosed code is to be executed by only one thread in the team.
● May be useful when dealing with sections of code that are not thread safe (such as I/O)
Format:

Fortran:
!$OMP SINGLE [clause ...]
      block
!$OMP END SINGLE [ NOWAIT ]

C/C++:
#pragma omp single [clause ...] newline
      structured_block
Clauses:
● Threads in the team that do not execute the SINGLE directive, wait at the end of the enclosed code block,
unless a nowait (C/C++) or NOWAIT (Fortran) clause is specified.
● Clauses are described in detail later, in the Data Scope Attribute Clauses section.
Restrictions:
● It is illegal to branch into or out of a SINGLE block.
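No SINGLE example appears in the text; a brief illustrative C sketch (not from the original) of using SINGLE for I/O inside a parallel region:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        /* every thread does its share of work here ... */

        #pragma omp single
        {
            /* exactly one thread performs the I/O; the others wait at the
               implied barrier at the end of the SINGLE block */
            printf("checkpoint written by thread %d\n", omp_get_thread_num());
        }

        /* ... and all threads continue together afterwards */
    }
    return 0;
}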
OpenMP Directives
Combined Parallel Work-Sharing Constructs
PARALLEL DO / parallel for Directive
Purpose:
● The PARALLEL DO (Fortran) and parallel for (C/C++) directives specify a parallel region that contains a single
DO / for directive. The accepted clauses can be any of those accepted by the PARALLEL and DO/for directives.
Example:
● Iterations of the DO/for loop will be distributed in equal sized blocks to each thread in the team (SCHEDULE
STATIC)
PROGRAM VECTOR_ADD
      INTEGER N, CHUNKSIZE, CHUNK, I
      PARAMETER (N=1000)
      PARAMETER (CHUNKSIZE=100)
      REAL A(N), B(N), C(N)
! Some initializations
DO I = 1, N
A(I) = I * 1.0
B(I) = A(I)
ENDDO
CHUNK = CHUNKSIZE
!$OMP PARALLEL DO
!$OMP& SHARED(A,B,C,CHUNK) PRIVATE(I)
!$OMP& SCHEDULE(STATIC,CHUNK)
DO I = 1, N
C(I) = A(I) + B(I)
ENDDO
END
#include <omp.h>
#define N 1000
#define CHUNKSIZE 100
main () {
int i, chunk;
float a[N], b[N], c[N];
/* Some initializations */
for (i=0; i < N; i++)
  a[i] = b[i] = i * 1.0;
chunk = CHUNKSIZE;
#pragma omp parallel for \
  shared(a,b,c,chunk) private(i) \
  schedule(static,chunk)
for (i=0; i < N; i++)
  c[i] = a[i] + b[i];
}
OpenMP Directives
PARALLEL SECTIONS Directive
Purpose:
● The PARALLEL SECTIONS directive specifies a parallel region containing a single SECTIONS directive.
The single SECTIONS directive must follow immediately as the very next statement.
Format:

Fortran:
!$OMP PARALLEL SECTIONS [clause ...]
!$OMP SECTION
      block
!$OMP SECTION
      block
!$OMP END PARALLEL SECTIONS

C/C++:
#pragma omp parallel sections [clause ...] newline
  {
  #pragma omp section newline
      structured_block
  #pragma omp section newline
      structured_block
  }
Clauses:
● The accepted clauses can be any of the clauses accepted by the PARALLEL and SECTIONS directives.
Clauses not previously discussed, are described in detail later, in the Data Scope Attribute Clauses section.
OpenMP Directives
Synchronization Constructs
● Consider a simple example where two threads on two different processors are both trying to increment a
variable x at the same time (assume x is initially 0):
THREAD 1: THREAD 2:
increment(x) increment(x)
{ {
x = x + 1; x = x + 1;
} }
● One possible execution sequence:
1. Thread 1 loads the value of x into register A
2. Thread 2 loads the value of x into register A
3. Thread 1 adds 1 to register A
4. Thread 2 adds 1 to register A
5. Thread 1 stores register A at location x
6. Thread 2 stores register A at location x
The resultant value of x will be 1, not 2 as it should be.
● To avoid a situation like this, the update of x must be synchronized between the two threads to ensure
that the correct result is produced.
● OpenMP provides a variety of Synchronization Constructs that control how the execution of each thread
proceeds relative to other team threads.
OpenMP Directives
Synchronization Constructs
MASTER Directive
Purpose:
● The MASTER directive specifies a region that is to be executed only by the master thread of the team. All
other threads on the team skip this section of code
Format:
!$OMP MASTER
block
Fortran
!$OMP END MASTER
C/C++ structured_block
Restrictions:
● It is illegal to branch into or out of a MASTER block.
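A brief illustrative C sketch (not from the original) of a MASTER block; unlike SINGLE, there is no implied barrier on entry to or exit from a MASTER block:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        #pragma omp master
        {
            /* only the master thread (thread 0) executes this block;
               the other threads simply skip it and continue */
            printf("master thread reporting progress\n");
        }
        #pragma omp barrier   /* explicit barrier, if the other threads must wait */
    }
    return 0;
}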
OpenMP Directives
Synchronization Constructs
CRITICAL Directive
Purpose:
● The CRITICAL directive specifies a region of code that must be executed by only one thread at a time.
Format:

Fortran:
!$OMP CRITICAL [ name ]
      block
!$OMP END CRITICAL

C/C++:
#pragma omp critical [ name ] newline
      structured_block
Notes:
● If a thread is currently executing inside a CRITICAL region and another thread reaches that CRITICAL
region and attempts to execute it, it will block until the first thread exits that CRITICAL region.
● The optional name enables multiple different CRITICAL regions to exist:
❍ Names act as global identifiers. Different CRITICAL regions with the same name are treated as the
same region.
❍ All CRITICAL sections which are unnamed, are treated as the same section.
Restrictions:
● It is illegal to branch into or out of a CRITICAL block.
Example:
● All threads in the team will attempt to execute in parallel; however, because of the CRITICAL construct
surrounding the increment of x, only one thread will be able to read/increment/write x at any time.
PROGRAM CRITICAL
INTEGER X
X = 0
!$OMP PARALLEL SHARED(X)
!$OMP CRITICAL
X = X + 1
!$OMP END CRITICAL
!$OMP END PARALLEL
END
#include <omp.h>
main()
{
int x;
x = 0;
#pragma omp parallel shared(x)
  {
  #pragma omp critical
  x = x + 1;
  }  /* end of parallel section */
}
OpenMP Directives
Synchronization Constructs
BARRIER Directive
Purpose:
● When a BARRIER directive is reached, a thread will wait at that point until all other threads have reached
that barrier. All threads then resume executing in parallel the code that follows the barrier.
Format:

Fortran:
!$OMP BARRIER

C/C++:
#pragma omp barrier newline
Restrictions:
● For C/C++, the smallest statement that contains a barrier must be a structured block. For example:
WRONG:
if (x == 0)
   #pragma omp barrier

RIGHT:
if (x == 0) {
   #pragma omp barrier
}
OpenMP Directives
Synchronization Constructs
ATOMIC Directive
Purpose:
● The ATOMIC directive specifies that a specific memory location must be updated atomically, rather than
letting multiple threads attempt to write to it. In essence, this directive provides a mini-CRITICAL section.
Format:

Fortran:
!$OMP ATOMIC
      statement_expression

C/C++:
#pragma omp atomic newline
      statement_expression
Restrictions:
● The directive applies only to the single, immediately following statement, which must update a scalar variable
x using one of a restricted set of forms (for example x = x operator expr in Fortran, or x binop= expr, x++, ++x,
x--, --x in C/C++).
● Note: Only the load and store of x are atomic; the evaluation of the expression is not atomic.
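A short illustrative C sketch (not from the original) contrasting an ATOMIC update with the CRITICAL example shown earlier:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int x = 0;
    #pragma omp parallel shared(x)
    {
        /* the update of x is performed atomically; only the single
           statement immediately following the directive is protected */
        #pragma omp atomic
        x++;
    }
    printf("x = %d\n", x);
    return 0;
}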
OpenMP Directives
Synchronization Constructs
FLUSH Directive
Purpose:
● The FLUSH directive identifies a synchronization point at which the implementation must provide a
consistent view of memory. Thread-visible variables are written back to memory at this point.
Format:

Fortran:
!$OMP FLUSH (list)

C/C++:
#pragma omp flush (list) newline

Notes:
● For Fortran, the thread-visible variables that are affected include:
❍ Local variables that do not have the SAVE attribute but have had their address used by another
subprogram
❍ Local variables that do not have the SAVE attribute that are declared shared in a parallel region within
the subprogram
❍ Dummy arguments
❍ All pointer dereferences
● The optional list contains a list of named variables that will be flushed in order to avoid flushing all variables.
For pointers in the list, note that the pointer itself is flushed, not the object it points to.
● Implementations must ensure that any prior modifications to thread-visible variables are visible to all threads after
this point; i.e., compilers must restore values from registers to memory, hardware might need to flush write
buffers, etc.
● The FLUSH directive is implied for the directives shown in the table below. The directive is not implied if a
NOWAIT clause is present.
Fortran:
BARRIER
CRITICAL and END CRITICAL
END DO
END PARALLEL
END SECTIONS
END SINGLE
ORDERED and END ORDERED

C / C++:
barrier
critical - upon entry and exit
ordered - upon entry and exit
parallel - upon exit
for - upon exit
sections - upon exit
single - upon exit
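An illustrative C sketch (not from the original) of handing a flag between two threads with FLUSH:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int data = 0, flag = 0;

    #pragma omp parallel sections shared(data, flag)
    {
        #pragma omp section
        {                             /* producer */
            data = 42;
            #pragma omp flush(data)   /* make data visible before the flag */
            flag = 1;
            #pragma omp flush(flag)
        }
        #pragma omp section
        {                             /* consumer: spin until the flag is visible */
            int ready = 0;
            while (!ready) {
                #pragma omp flush(flag)
                ready = flag;
            }
            #pragma omp flush(data)
            printf("data = %d\n", data);
        }
    }
    return 0;
}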
OpenMP Directives
Synchronization Constructs
ORDERED Directive
Purpose:
● The ORDERED directive specifies that iterations of the enclosed loop will be executed in the same order as if
they were executed on a serial processor.
Format:
!$OMP ORDERED
(block)
Fortran
!$OMP END ORDERED
C/C++ structured_block
Restrictions:
● An ORDERED directive can only appear in the dynamic extent of the following directives:
❍ DO or PARALLEL DO (Fortran)
❍ for or parallel for (C/C++)
● An iteration of a loop must not execute the same ORDERED directive more than once, and it must not
execute more than one ORDERED directive.
● A loop which contains an ORDERED directive must be a DO / for loop that has an ORDERED clause.
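A brief illustrative C sketch (not from the original) showing ordered output from a parallel loop:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int i;
    #pragma omp parallel for ordered schedule(dynamic)
    for (i = 0; i < 8; i++) {
        int sq = i * i;                          /* may execute out of order     */
        #pragma omp ordered
        printf("i = %d, i*i = %d\n", i, sq);     /* printed in loop order        */
    }
    return 0;
}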
OpenMP Directives
THREADPRIVATE Directive
Purpose:
● The THREADPRIVATE directive is used to make global file scope variables (C/C++) or common blocks
(Fortran) local and persistent to a thread through the execution of multiple parallel regions.
Format:

Fortran:
!$OMP THREADPRIVATE (/cb/, ...)    (cb is the name of a common block)

C/C++:
#pragma omp threadprivate (list)
Notes:
● The directive must appear after the declaration of listed variables/common blocks. Each thread then gets its
own copy of the variable/common block, so data written by one thread is not visible to other threads. For
example:
Fortran:
PROGRAM THREADPRIV
      INTEGER ALPHA(10), BETA(10)
      COMMON /A/ ALPHA
!$OMP THREADPRIVATE(/A/)
      ...
END

C/C++:
#include <omp.h>
int alpha[10], beta[10];
#pragma omp threadprivate(alpha)
main () {
      ...
}
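The example code above is elided; a minimal C sketch in the same spirit (threadprivate alpha versus private beta, with dynamic threads disabled - the names alpha and beta are taken from the Questions in the PRIVATE Clause section):

#include <omp.h>
#include <stdio.h>

int alpha[10], beta[10];
#pragma omp threadprivate(alpha)      /* alpha persists per thread             */

int main(void)
{
    omp_set_dynamic(0);               /* keep the thread count constant        */

    /* First parallel region: each thread writes its own alpha; the private
       copies of beta are discarded when the region ends.                      */
    #pragma omp parallel private(beta)
    {
        int i;
        for (i = 0; i < 10; i++) {
            alpha[i] = i;
            beta[i]  = i;
        }
    }

    /* Second parallel region: each thread still sees its own alpha.           */
    #pragma omp parallel
    printf("alpha[3] = %d, beta[3] = %d\n", alpha[3], beta[3]);
    return 0;
}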
● On first entry to a parallel region, data in THREADPRIVATE variables and common blocks should be
assumed undefined, unless a COPYIN clause is specified in the PARALLEL directive
● THREADPRIVATE variables differ from PRIVATE variables (discussed later) because they are able to
persist between different parallel sections of a code.
Restrictions:
● Data in THREADPRIVATE objects is guaranteed to persist only if the dynamic threads mechanism is "turned
off" and the number of threads in different parallel regions remains constant. The default setting of dynamic
threads is undefined.
● The THREADPRIVATE directive must appear after every declaration of a thread private variable/common
block.
OpenMP Directives
Data Scope Attribute Clauses
● An important consideration for OpenMP programming is the understanding and use of data scoping
● Because OpenMP is based upon the shared memory programming model, most variables are shared by
default
● The OpenMP Data Scope Attribute Clauses are used to explicitly define how variables should be scoped.
They include:
❍ PRIVATE
❍ FIRSTPRIVATE
❍ LASTPRIVATE
❍ SHARED
❍ DEFAULT
❍ REDUCTION
❍ COPYIN
● Data Scope Attribute Clauses are used in conjunction with several directives (PARALLEL, DO/for, and
SECTIONS) to control the scoping of enclosed variables.
● These constructs provide the ability to control the data environment during execution of parallel constructs.
❍ They define how and which data variables in the serial section of the program are transferred to the
parallel sections of the program (and back)
❍ They define which variables will be visible to all threads in the parallel sections and which variables
will be privately allocated to all threads.
● Note: Data Scope Attribute Clauses are effective only within their lexical/static extent.
● See the Clauses / Directives Summary Table for the associations between directives and clauses.
PRIVATE Clause
Purpose:
● The PRIVATE clause declares variables in its list to be private to each thread.
Format:

Fortran:
PRIVATE (list)

C/C++:
private (list)
Notes:
● PRIVATE variables behave as follows:
❍ A new object of the same type is declared once for each thread in the team
❍ All references to the original object are replaced with references to the new object
❍ Variables declared PRIVATE should be assumed to be uninitialized for each thread
● Comparison of PRIVATE and THREADPRIVATE:
                  PRIVATE     THREADPRIVATE
Persistent?       No          Yes
Questions:
For the C/C++ and Fortran THREADPRIVATE example codes, what output would you expect for
alpha[3] and beta[3]? Why?
SHARED Clause
Purpose:
● The SHARED clause declares variables in its list to be shared among all threads in the team.
Format:

Fortran:
SHARED (list)

C/C++:
shared (list)
Notes:
● A shared variable exists in only one memory location and all threads can read or write to that address
● It is the programmer's responsibility to ensure that multiple threads properly access SHARED variables (such
as via CRITICAL sections)
DEFAULT Clause
Purpose:
● The DEFAULT clause allows the user to specify a default PRIVATE, SHARED, or NONE scope for all
variables in the lexical extent of any parallel region.
Format:

Fortran:
DEFAULT (PRIVATE | SHARED | NONE)

C/C++:
default (shared | none)
Notes:
● Specific variables can be exempted from the default using the PRIVATE, SHARED, FIRSTPRIVATE,
LASTPRIVATE, and REDUCTION clauses.
● The C/C++ OpenMP specification does not include "private" as a possible default. However, actual
implementations may provide this option.
Restrictions:
● Only one DEFAULT clause can be specified on a PARALLEL directive.
FIRSTPRIVATE Clause
Purpose:
● The FIRSTPRIVATE clause combines the behavior of the PRIVATE clause with automatic initialization of
the variables in its list.
Format:

Fortran:
FIRSTPRIVATE (list)

C/C++:
firstprivate (list)
Notes:
● Listed variables are initialized according to the value of their original objects prior to entry into the parallel or
work-sharing construct.
LASTPRIVATE Clause
Purpose:
● The LASTPRIVATE clause combines the behavior of the PRIVATE clause with a copy from the last loop
iteration or section to the original variable object.
Format:

Fortran:
LASTPRIVATE (list)

C/C++:
lastprivate (list)
Notes:
● The value copied back into the original variable object is obtained from the last (sequentially) iteration or
section of the enclosing construct. For example, the team member that executes the final iteration of a DO/for
loop, or the team member that executes the last SECTION of a SECTIONS construct, performs the copy using
its own values.
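A combined illustrative C sketch (not from the original) of FIRSTPRIVATE and LASTPRIVATE on a work-shared loop:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int i, offset = 100, last_i = -1;

    /* offset: each thread starts with a copy initialized to 100 (firstprivate)
       last_i: after the loop, the value from the sequentially last iteration
               (i == 9) is copied back to the original variable (lastprivate)  */
    #pragma omp parallel for firstprivate(offset) lastprivate(last_i)
    for (i = 0; i < 10; i++) {
        last_i = i + offset;
    }
    printf("last_i = %d\n", last_i);   /* prints 109 */
    return 0;
}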
COPYIN Clause
Purpose:
● The COPYIN clause provides a means for assigning the same value to THREADPRIVATE variables for all
threads in the team.
Format:

Fortran:
COPYIN (list)

C/C++:
copyin (list)
Notes:
● List contains the names of variables to copy. In Fortran, the list can contain both the names of common blocks
and named variables.
● The master thread variable is used as the copy source. The team threads are initialized with its value upon
entry into the parallel construct.
REDUCTION Clause
Purpose:
● The REDUCTION clause performs a reduction on the variables that appear in its list.
● A private copy for each list variable is created for each thread. At the end of the reduction, the reduction
operation is applied to all private copies of the shared variable, and the final result is written to the global
shared variable.
Format:

Fortran:
REDUCTION (operator|intrinsic: list)

C/C++:
reduction (operator: list)

Example: Vector dot product
● Iterations of the parallel loop will be distributed in equal sized blocks to each thread in the team (SCHEDULE
STATIC)
● At the end of the parallel loop construct, all threads will add their values of "result" to update the master
thread's global copy.
PROGRAM DOT_PRODUCT
      INTEGER N, CHUNKSIZE, CHUNK, I
      PARAMETER (N=100)
      PARAMETER (CHUNKSIZE=10)
      REAL A(N), B(N), RESULT
! Some initializations
      DO I = 1, N
        A(I) = I * 1.0
        B(I) = I * 2.0
      ENDDO
      RESULT= 0.0
      CHUNK = CHUNKSIZE
!$OMP PARALLEL DO
!$OMP& DEFAULT(SHARED) PRIVATE(I)
!$OMP& SCHEDULE(STATIC,CHUNK)
!$OMP& REDUCTION(+:RESULT)
      DO I = 1, N
        RESULT = RESULT + (A(I) * B(I))
      ENDDO
!$OMP END PARALLEL DO
      PRINT *, 'Final Result= ', RESULT
      END
#include <omp.h>
#include <stdio.h>
main () {
int i, n, chunk;
float a[100], b[100], result;
/* Some initializations */
n = 100;
chunk = 10;
result = 0.0;
for (i=0; i < n; i++)
{
a[i] = i * 1.0;
b[i] = i * 2.0;
}
#pragma omp parallel for \
  default(shared) private(i) \
  schedule(static,chunk) \
  reduction(+:result)
for (i=0; i < n; i++)
  result = result + (a[i] * b[i]);
printf("Final result = %f\n", result);
}
Restrictions:
● Variables in the list must be named scalar variables. They can not be array or structure type variables. They
must also be declared SHARED in the enclosing context.
● The REDUCTION clause is intended to be used on a region or work-sharing construct in which the reduction
variable is used only in statements which have one of the following forms:
Fortran:
    x = x operator expr
    x = expr operator x   (except for subtraction)
    x = intrinsic(x, expr)
    x = intrinsic(expr, x)
C / C++:
    x = x op expr
    x = expr op x   (except for subtraction)
    x binop = expr
    x++, ++x, x--, --x
OpenMP Directives
Clauses / Directives Summary
● The table below summarizes which clauses are accepted by which OpenMP directives.
Clause         | PARALLEL | DO/for | SECTIONS | SINGLE | PARALLEL DO/for | PARALLEL SECTIONS
IF             |    x     |        |          |        |        x        |         x
PRIVATE        |    x     |   x    |    x     |   x    |        x        |         x
SHARED         |    x     |        |          |        |        x        |         x
DEFAULT        |    x     |        |          |        |        x        |         x
FIRSTPRIVATE   |    x     |   x    |    x     |   x    |        x        |         x
LASTPRIVATE    |          |   x    |    x     |        |        x        |         x
REDUCTION      |    x     |   x    |    x     |        |        x        |         x
COPYIN         |    x     |        |          |        |        x        |         x
SCHEDULE       |          |   x    |          |        |        x        |
ORDERED        |          |   x    |          |        |        x        |
NOWAIT         |          |   x    |    x     |   x    |                 |
● The following OpenMP directives do not accept clauses:
❍ MASTER
❍ CRITICAL
❍ BARRIER
❍ ATOMIC
❍ FLUSH
❍ ORDERED
❍ THREADPRIVATE
● Implementations may (and do) differ from the standard in which clauses are supported by each directive.
OpenMP Directives
Directive Binding and Nesting Rules
This section is provided mainly as a quick reference on rules which govern OpenMP directives and binding.
Users should consult their implementation documentation and the OpenMP standard for other rules and
restrictions.
● Unless indicated otherwise, rules apply to both Fortran and C/C++ OpenMP implementations.
● Note: the Fortran API also defines a number of Data Environment rules. Those have not been reproduced
here.
Directive Binding:
● The DO/for, SECTIONS, SINGLE, MASTER and BARRIER directives bind to the dynamically enclosing
PARALLEL, if one exists. If no parallel region is currently being executed, the directives have no effect.
● The ATOMIC directive enforces exclusive access with respect to ATOMIC directives in all threads, not just
the current team.
● The CRITICAL directive enforces exclusive access with respect to CRITICAL directives in all threads, not
just the current team.
● A directive can never bind to any directive outside the closest enclosing PARALLEL.
Directive Nesting:
● A PARALLEL directive dynamically inside another PARALLEL directive logically establishes a new team,
which is composed of only the current thread unless nested parallelism is enabled.
● DO/for, SECTIONS, and SINGLE directives that bind to the same PARALLEL are not allowed to be nested
inside of each other.
● DO/for, SECTIONS, and SINGLE directives are not permitted in the dynamic extent of CRITICAL,
ORDERED and MASTER regions.
● CRITICAL directives with the same name are not permitted to be nested inside of each other.
● BARRIER directives are not permitted in the dynamic extent of DO/for, ORDERED, SECTIONS, SINGLE,
MASTER and CRITICAL regions.
● MASTER directives are not permitted in the dynamic extent of DO/for, SECTIONS and SINGLE directives.
● ORDERED directives are not permitted in the dynamic extent of CRITICAL regions.
● Any directive that is permitted when executed dynamically inside a PARALLEL region is also legal when
executed outside a parallel region. When executed dynamically outside a user-specified parallel region, the
directive is executed with respect to a team composed of only the master thread.
Run-Time Library Routines
● The OpenMP standard defines an API for library calls that perform a variety of functions:
❍ Query the number of threads/processors, set the number of threads to use
❍ General purpose locking routines (semaphores)
❍ Set execution environment functions: nested parallelism, dynamic adjustment of threads
● For C/C++, it may be necessary to specify the include file "omp.h".
● For the lock routines/functions:
❍ The lock variable must be accessed only through the locking routines
❍ For Fortran, the lock variable should be of type integer and of a kind large enough to hold an address.
❍ For C/C++, the lock variable must have type omp_lock_t or type omp_nest_lock_t, depending
on the function being used.
● Implementation notes:
Current OpenMP implementations for the SP (IBM and KAI) do not implement nested parallelism
routines. KAI does implement dynamic threads library routines.
OMP_SET_NUM_THREADS
Purpose:
● Sets the number of threads that will be used in the next parallel region.
Format:

Fortran:
SUBROUTINE OMP_SET_NUM_THREADS(scalar_integer_expression)

C/C++:
void omp_set_num_threads(int num_threads)
● This routine can only be called from the serial portions of the code
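A small illustrative C sketch (not from the original) of calling this routine from the serial part of the code:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_set_num_threads(4);      /* request 4 threads for the next parallel region */

    #pragma omp parallel
    {
        #pragma omp master
        printf("team size = %d\n", omp_get_num_threads());
    }
    return 0;
}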
OMP_GET_NUM_THREADS
Purpose:
● Returns the number of threads that are currently in the team executing the parallel region from which it is
called.
Format:

Fortran:
INTEGER FUNCTION OMP_GET_NUM_THREADS()

C/C++:
int omp_get_num_threads(void)
● If this call is made from a serial portion of the program, or a nested parallel region that is serialized, it will
return 1.
OMP_GET_MAX_THREADS
Purpose:
● Returns the maximum value that can be returned by a call to the OMP_GET_NUM_THREADS function.
Format:

Fortran:
INTEGER FUNCTION OMP_GET_MAX_THREADS()

C/C++:
int omp_get_max_threads(void)
● Generally reflects the number of threads as set by the OMP_NUM_THREADS environment variable or the
OMP_SET_NUM_THREADS() library routine.
OMP_GET_THREAD_NUM
Purpose:
● Returns the thread number of the thread, within the team, making this call. This number will be between 0
and OMP_GET_NUM_THREADS-1. The master thread of the team is thread 0
Format:

Fortran:
INTEGER FUNCTION OMP_GET_THREAD_NUM()

C/C++:
int omp_get_thread_num(void)
● If called from a nested parallel region, or a serial region, this function will return 0.
Examples:
● Example 1 is the correct way to determine the number of threads in a parallel region.
● Example 2 is incorrect - the TID variable must be PRIVATE
● Example 3 is incorrect - the OMP_GET_THREAD_NUM call is outside the parallel region
Example 1: Correct
PROGRAM HELLO
      INTEGER TID, OMP_GET_THREAD_NUM
!$OMP PARALLEL PRIVATE(TID)
      TID = OMP_GET_THREAD_NUM()
      PRINT *, 'Hello World from thread = ', TID
      ...
!$OMP END PARALLEL
      END
Example 2: Incorrect
PROGRAM HELLO
!$OMP PARALLEL
TID = OMP_GET_THREAD_NUM()
PRINT *, 'Hello World from thread = ', TID
...
END
Example 3: Incorrect
PROGRAM HELLO
TID = OMP_GET_THREAD_NUM()
PRINT *, 'Hello World from thread = ', TID
!$OMP PARALLEL
...
END
OMP_GET_NUM_PROCS
Purpose:
● Returns the number of processors that are available to the program.
Format:

Fortran:
INTEGER FUNCTION OMP_GET_NUM_PROCS()

C/C++:
int omp_get_num_procs(void)
OMP_IN_PARALLEL
Purpose:
● May be called to determine if the section of code which is executing is parallel or not.
Format:

Fortran:
LOGICAL FUNCTION OMP_IN_PARALLEL()

C/C++:
int omp_in_parallel(void)
● For Fortran, this function returns .TRUE. if it is called from the dynamic extent of a region executing in
parallel, and .FALSE. otherwise. For C/C++, it will return a non-zero integer if parallel, and zero otherwise.
OMP_SET_DYNAMIC
Purpose:
● Enables or disables dynamic adjustment (by the run time system) of the number of threads available for
execution of parallel regions.
Format:

Fortran:
SUBROUTINE OMP_SET_DYNAMIC(scalar_logical_expression)

C/C++:
void omp_set_dynamic(int dynamic_threads)
● For Fortran, if called with .TRUE. then the number of threads available for subsequent parallel regions can be
adjusted automatically by the run-time environment. If called with .FALSE., dynamic adjustment is disabled.
● For C/C++, if dynamic_threads evaluates to non-zero, then the mechanism is enabled, otherwise it is disabled.
● The OMP_SET_DYNAMIC subroutine has precedence over the OMP_DYNAMIC environment variable.
OMP_GET_DYNAMIC
Purpose:
● Used to determine if dynamic thread adjustment is enabled or not.
Format:

Fortran:
LOGICAL FUNCTION OMP_GET_DYNAMIC()

C/C++:
int omp_get_dynamic(void)
● For Fortran, this function returns .TRUE. if dynamic thread adjustment is enabled, and .FALSE. otherwise.
● For C/C++, non-zero will be returned if dynamic thread adjustment is enabled, and zero otherwise.
OMP_SET_NESTED
Purpose:
● Used to enable or disable nested parallelism.
Format:

Fortran:
SUBROUTINE OMP_SET_NESTED(scalar_logical_expression)

C/C++:
void omp_set_nested(int nested)
● For Fortran, calling this function with .FALSE. will disable nested parallelism, and calling with .TRUE. will
enable it.
● For C/C++, if nested evaluates to non-zero, nested parallelism is enabled; otherwise it is disabled.
OMP_GET_NESTED
Purpose:
● Used to determine if nested parallelism is enabled or not.
Format:

Fortran:
LOGICAL FUNCTION OMP_GET_NESTED()

C/C++:
int omp_get_nested(void)
● For Fortran, this function returns .TRUE. if nested parallelism is enabled, and .FALSE. otherwise.
● For C/C++, non-zero will be returned if nested parallelism is enabled, and zero otherwise.
OMP_INIT_LOCK
Purpose:
● This subroutine initializes a lock associated with the lock variable.
Format:

Fortran:
SUBROUTINE OMP_INIT_LOCK(var)
SUBROUTINE OMP_INIT_NEST_LOCK(var)

C/C++:
void omp_init_lock(omp_lock_t *lock)
void omp_init_nest_lock(omp_nest_lock_t *lock)
OMP_DESTROY_LOCK
Purpose:
● This subroutine disassociates the given lock variable from any locks.
Format:

Fortran:
SUBROUTINE OMP_DESTROY_LOCK(var)
SUBROUTINE OMP_DESTROY_NEST_LOCK(var)

C/C++:
void omp_destroy_lock(omp_lock_t *lock)
void omp_destroy_nest_lock(omp_nest_lock_t *lock)
● It is illegal to call this routine with a lock variable that is not initialized.
OMP_SET_LOCK
Purpose:
● This subroutine forces the executing thread to wait until the specified lock is available. A thread is granted
ownership of a lock when it becomes available.
Format:

Fortran:
SUBROUTINE OMP_SET_LOCK(var)
SUBROUTINE OMP_SET_NEST_LOCK(var)

C/C++:
void omp_set_lock(omp_lock_t *lock)
void omp_set_nest_lock(omp_nest_lock_t *lock)
● It is illegal to call this routine with a lock variable that is not initialized.
OMP_UNSET_LOCK
Purpose:
● This subroutine releases the lock from the executing thread.
Format:

Fortran:
SUBROUTINE OMP_UNSET_LOCK(var)
SUBROUTINE OMP_UNSET_NEST_LOCK(var)

C/C++:
void omp_unset_lock(omp_lock_t *lock)
void omp_unset_nest_lock(omp_nest_lock_t *lock)
● It is illegal to call this routine with a lock variable that is not initialized.
OMP_TEST_LOCK
Purpose:
● This subroutine attempts to set a lock, but does not block if the lock is unavailable.
Format:

Fortran:
LOGICAL FUNCTION OMP_TEST_LOCK(var)
INTEGER FUNCTION OMP_TEST_NEST_LOCK(var)

C/C++:
int omp_test_lock(omp_lock_t *lock)
int omp_test_nest_lock(omp_nest_lock_t *lock)
● For Fortran, .TRUE. is returned if the lock was set successfully, otherwise .FALSE. is returned.
● For C/C++, non-zero is returned if the lock was set successfully, otherwise zero is returned.
● It is illegal to call this routine with a lock variable that is not initialized.
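A brief illustrative C sketch (not from the original) tying the lock routines together:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_lock_t lock;
    int sum = 0, i;

    omp_init_lock(&lock);                 /* lock starts out unlocked          */

    #pragma omp parallel for shared(sum, lock)
    for (i = 0; i < 100; i++) {
        omp_set_lock(&lock);              /* wait until the lock is granted    */
        sum += i;                         /* protected update                  */
        omp_unset_lock(&lock);            /* release the lock                  */
    }

    omp_destroy_lock(&lock);
    printf("sum = %d\n", sum);            /* prints 4950 */
    return 0;
}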
Environment Variables
● OpenMP provides four environment variables for controlling the execution of parallel code.
● All environment variable names are uppercase. The values assigned to them are not case sensitive.
OMP_SCHEDULE
Applies only to DO, PARALLEL DO (Fortran) and for, parallel for (C/C++) directives
which have their schedule clause set to RUNTIME. The value of this variable determines how
iterations of the loop are scheduled on processors. For example:
setenv OMP_SCHEDULE "guided, 4"
setenv OMP_SCHEDULE "dynamic"
OMP_NUM_THREADS
Sets the maximum number of threads to use during execution. For example:
setenv OMP_NUM_THREADS 8
OMP_DYNAMIC
Enables or disables dynamic adjustment of the number of threads available for execution of parallel
regions. Valid values are TRUE or FALSE. For example:
setenv OMP_DYNAMIC TRUE
OMP_NESTED
Enables or disables nested parallelism. Valid values are TRUE or FALSE. For example:
setenv OMP_NESTED TRUE
● Implementation notes:
The current IBM OpenMP implementations (IBM and KAI) for the SP do not implement nested
parallelism. The KAI implementation does implement dynamic threads.
LLNL Specific Information and Recommendations
LC OpenMP Implementations:
● OpenMP is fully supported in the native compilers of all IBM, Intel and Compaq systems. Additionally, the
KAI Guide products, which fully support OpenMP, are available on LC production machines.
● LC maintains different versions of compilers. For the most recent information, please see:
www.llnl.gov/asci/platforms/bluepac/CompsAvails.html
Compiling:
References and More Information
Documentation:
● "OpenMP C and C++ Application Program Interface, Version 1.0". OpenMP Architecture Review Board.
October 1998.
● "OpenMP Fortran Application Program Interface, Version 1.0". OpenMP Architecture Review Board.
October 1997.
● "OpenMP". Workshop presentation. John Engle, Lawrence Livermore National Laboratory. October, 1998.
● "Introduction to OpenMP Using the KAP/PRO Toolset". Kuck & Associates, Inc.
● "Guide Reference Manual (C/C++ Edition, Version 3.6". Kuck & Associates, Inc.
● "Guide Reference Manual (Fortran Edition, Version 3.6". Kuck & Associates, Inc.