04
Programming
Introduction to OpenMP
These slides were originally written by Dr. Barbara Chapman, University of Houston
Outline
• Introduction to OpenMP
• Parallel Programming with OpenMP
– Worksharing, tasks, data environment, synchronization
• OpenMP Performance and Best Practices
• Hybrid MPI/OpenMP
• Case Studies and Examples
• Reference Materials
OpenMP* Overview

OpenMP: An API for Writing Multithreaded Applications
• Stands for Open Multi-Processing
• A set of compiler directives and library routines for parallel application programmers
• Greatly simplifies writing multi-threaded (MT) programs in Fortran, C and C++
• Standardizes the last 20 years of SMP practice
• Widely available
• Single source code: parallel and sequential code
• Ease of use, incremental approach to programming
• Can be combined with MPI to create “hybrid” code

(Slide background: a collage of OpenMP syntax, e.g. C$OMP PARALLEL DO shared(a, b, c), #pragma omp critical, C$OMP THREADPRIVATE(/ABC/), CALL OMP_SET_NUM_THREADS(10), call omp_test_lock(jlok), call OMP_INIT_LOCK(ilok), C$OMP MASTER, C$OMP ATOMIC, C$OMP SINGLE PRIVATE(X), setenv OMP_SCHEDULE “dynamic”, C$OMP PARALLEL DO ORDERED PRIVATE(A, B, C), C$OMP ORDERED, C$OMP PARALLEL REDUCTION(+: A, B), C$OMP SECTIONS, #pragma omp parallel for private(A, B), !$OMP BARRIER, C$OMP PARALLEL COPYIN(/blk/), C$OMP DO lastprivate(XX).)

www.compunity.org
www.openmp.org
(Photos: The OpenMP ARB 2011; OpenMP Meeting 2013)
OpenMP Release History
• Oct 1997 – 1.0 Fortran
• Oct 1998 – 1.0 C/C++
• Nov 1999 – 1.1 Fortran (interpretations added)
• Nov 2000 – 2.0 Fortran
• Mar 2002 – 2.0 C/C++
• May 2005 – 2.5 Fortran/C/C++ (mostly a merge)
• Apr 2008 – 3.0 Fortran/C/C++ (extensions)
• July 2011 – 3.1 Fortran/C/C++ (extensions)
• March 2013 – 4.0 Fortran/C/C++ (extensions)
• Nov 2015 – 4.5 Fortran/C/C++ (extensions)
• Nov 2018 – 5.0 Fortran/C/C++ (extensions)
• Nov 2020 – 5.1 Fortran/C/C++ (extensions)
http://www.openmp.org
OpenMP Overview
• A set of compiler directives inserted in the source
program
• Also some library functions
• Ideally, compiler directives do not affect
sequential code
– pragmas in C / C++
– (specially written) comments in Fortran code
The OpenMP Shared Memory API
• High-level directive-based multithreaded programming
– The user makes strategic decisions
– Compiler figures out details
– Threads communicate by sharing variables
– Synchronization to order accesses and prevent data conflicts
– Structured programming to reduce likelihood of bugs
Summary: What is OpenMP?
• De-facto standard API to write shared memory
parallel applications in C, C++, and Fortran
• Consists of:
– Compiler directives
– Runtime routines
– Environment variables
• Initial version released end of 1997
– For Fortran only
– Subsequent releases for C, C++
• Version 2.5 merged specs for all three languages
OpenMP Components

Directives:
• Parallel region
• Worksharing constructs
• Tasking
• Synchronization
• Data-sharing attributes

Runtime environment (library routines):
• Number of threads
• Thread ID
• Dynamic thread adjustment
• Nested parallelism
• Schedule
• Active levels
• Thread limit
• Nesting level
• Ancestor thread
• Team size
• Locking
• Wallclock timer

Environment variables:
• Number of threads
• Scheduling type
• Dynamic thread adjustment
• Nested parallelism
• Stacksize
• Idle threads
• Active levels
• Thread limit
OpenMP Syntax
• Most OpenMP constructs are compiler directives using pragmas.
– For C and C++, the pragmas take the form:
#pragma omp construct [clause [clause]…]
– For Fortran, the directives take one of the forms:
• Fixed form
*$OMP construct [clause [clause]…]
C$OMP construct [clause [clause]…]
• Free form (but works for fixed form too)
!$OMP construct [clause [clause]…]
• Include file (C/C++) and OpenMP library module (Fortran):
#include <omp.h>
use omp_lib
Basic Idea of OpenMP

C/C++:
statement1;
#pragma omp <specific OpenMP directive>
statement2;
statement3;

Fortran:
statement1
!$OMP <specific OpenMP directive>
statement2
!$OMP END <specific OpenMP directive>
statement3

In both cases, statement 2 is (or may be) executed in parallel; statements 1 and 3 are executed sequentially.
Status of OpenMP Implementation

The same annotated Fortran/C/C++ source can be fed to two compilers: a sequential compiler ignores the directives and produces a sequential program, while an OpenMP-aware compiler (e.g. GCC with -fopenmp) honors them and produces a parallel program.
OpenMP Usage

(Diagram: two processors, p1 and p2, access a shared variable data, initialized to 0; an update by one processor is visible to the other.)

CS267 Lecture 6
OpenMP Memory Model
• OpenMP assumes a shared memory
• Threads communicate by sharing variables.
How do threads interact?
• OpenMP is a shared memory model.
• Threads interact (“communicate”) by sharing variables.
• Unintended sharing of data causes race conditions:
– the program’s outcome may change if the threads are scheduled differently.
• To prevent race conditions:
– use synchronization to order data accesses and protect against data conflicts.
• Synchronization is expensive, so:
– change how data is accessed to minimize the need for synchronization.
OpenMP Parallel Computing Solution Stack

(Layered diagram, top to bottom:)
• End User
• Application
• Directives, environment variables, OpenMP library
• Compiler and runtime library
• OS/system
Parallel Regions
• You create threads in OpenMP with the “omp parallel” pragma.
• For example, to create a 4-thread parallel region:

double A[1000];
omp_set_num_threads(4);  /* runtime function to request a certain number of threads */
#pragma omp parallel
{
  int ID = omp_get_thread_num();  /* runtime function returning a thread ID */
  pooh(ID, A);  /* each thread executes a copy of the code within the structured block */
}

Each thread calls pooh(ID, A) for ID = 0 to 3.
Parallel Regions (continued)
• Each thread executes the same code redundantly.

double A[1000];
omp_set_num_threads(4);
#pragma omp parallel
{
  int ID = omp_get_thread_num();
  pooh(ID, A);
}
printf("all done\n");

A single copy of A is shared between all threads: pooh(0,A), pooh(1,A), pooh(2,A) and pooh(3,A) all operate on the same array. At the closing brace, threads wait for all threads to finish before proceeding (i.e. an implicit barrier); only then is printf("all done\n") executed.
OpenMP: Structured blocks (C/C++)
• Most constructs apply to structured blocks.
• Structured block: a block with one point of entry at the top and one point of exit at the bottom.
• The only “branches” allowed are STOP statements in Fortran and exit() in C/C++.

OK:
#pragma omp parallel
{
  int id = omp_get_thread_num();
more:
  res[id] = do_big_job(id);
  if (!conv(res[id])) goto more;
}
printf(" All done \n");

NOT OK:
if (go_now()) goto more;          /* jumps into the block */
#pragma omp parallel
{
  int id = omp_get_thread_num();
more:
  res[id] = do_big_job(id);
  if (conv(res[id])) goto done;   /* jumps out of the block */
  goto more;
}
done:
if (!really_done()) goto more;
foo.f:
C$OMP PARALLEL
      call whoami
C$OMP END PARALLEL

bar.f:
      subroutine whoami
      external omp_get_thread_num
      integer iam, omp_get_thread_num
      iam = omp_get_thread_num()
C$OMP CRITICAL
      print*, 'Hello from ', iam
C$OMP END CRITICAL
      return
      end

The static (lexical) extent of the parallel region is the code between C$OMP PARALLEL and C$OMP END PARALLEL in foo.f. The dynamic extent of the parallel region includes the lexical extent plus the code executed inside whoami. Orphaned directives, such as the CRITICAL construct in bar.f, can appear outside the lexical extent of a parallel construct.
Exercise:
A multi-threaded “Hello world” program
• Write a multithreaded program where each thread
prints “hello world”.
#include <stdio.h>

int main()
{
  int ID = 0;
  printf(" hello(%d) ", ID);
  printf(" world(%d) \n", ID);
  return 0;
}
A multi-threaded “Hello world” program
• Write a multithreaded program where each thread prints “hello world”.

#include <stdio.h>
#include "omp.h"                    /* OpenMP include file */

int main()
{
  #pragma omp parallel              /* parallel region with default number of threads */
  {
    int ID = omp_get_thread_num();  /* runtime library function returning a thread ID */
    printf(" hello(%d) ", ID);
    printf(" world(%d) \n", ID);
  }                                 /* end of the parallel region */
  return 0;
}

Sample output (the interleaving varies from run to run):
hello(1) hello(0) world(1) world(0) hello(3) hello(2) world(2) world(3)