Module3
OPENACC DIRECTIVES
Dr. Volker Weinberg | LRZ
MODULE OVERVIEW
OpenACC Directives
A pragma in C/C++ gives instructions to the compiler on how to compile the code.
Compilers that do not understand a particular pragma can freely ignore it.
A directive in Fortran is a specially formatted comment that likewise instructs the
compiler in its compilation of the code and can be freely ignored.
“acc” informs the compiler that what follows is an OpenACC directive.
Directives are commands in OpenACC for altering our code.
Clauses are specifiers or additions to directives.
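To make the directive/clause distinction concrete, here is a minimal sketch in C. In it, "parallel loop" is the directive, while "reduction(+:sum)" and "copyin(a[0:n])" are clauses that refine it. The function name sum_array and the choice of these two clauses are illustrative assumptions, not taken from this module. A compiler without OpenACC support ignores the pragma and runs the loop sequentially, producing the same sum.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): sums the n floats in a.
 * Directive: "parallel loop".
 * Clauses:   "reduction(+:sum)" says how partial sums are combined,
 *            "copyin(a[0:n])" says which data the accelerator needs. */
float sum_array(const float *a, int n)
{
    assert(n >= 0);
    float sum = 0.0f;

    #pragma acc parallel loop reduction(+:sum) copyin(a[0:n])
    for (int i = 0; i < n; i++)
        sum += a[i];

    return sum;
}
```

Calling sum_array on an array holding 1, 2, 3, 4 returns 10 with or without OpenACC enabled.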
OPENACC PARALLEL DIRECTIVE
Explicit programming
The parallel directive instructs the compiler to create parallel gangs on the accelerator.
Gangs are independent groups of worker threads on the accelerator.
The code contained within a parallel directive is executed redundantly by all parallel gangs.
[Diagram: CPU and parallel hardware; each gang runs the loop]
<sequential code>
#pragma acc parallel
{
    for(int i = 0; i < N; i++)
    {
        // Do Something
    }
}
This loop will be executed redundantly on each gang.
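Redundant execution can be made visible by counting how often the body of a parallel region runs. The following is only a sketch: the function name count_region_executions is hypothetical, and the num_gangs, copy, and atomic features it uses are standard OpenACC but are not introduced in this module. The exact count depends on how many gangs the implementation actually launches, so no single value is guaranteed.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): returns how many times
 * the body of the parallel region was executed. */
int count_region_executions(int requested_gangs)
{
    assert(requested_gangs > 0);
    int count = 0;

    #pragma acc parallel num_gangs(requested_gangs) copy(count)
    {
        /* There is no loop directive here, so every gang executes
         * this statement; the atomic keeps the updates from colliding. */
        #pragma acc atomic update
        count++;
    }

    return count;
}
```

Built without OpenACC, the pragmas are ignored and the function returns 1. Built with OpenACC, the result is expected to lie between 1 and requested_gangs, one increment per gang that ran.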
OPENACC PARALLEL DIRECTIVE
Expressing parallelism
#pragma acc parallel
{
    for(int i = 0; i < N; i++)
        a[i] = 0;

    #pragma acc loop
    for(int j = 0; j < N; j++)
        a[j]++;
}
In this example, the first loop is not marked with the loop directive.
This means that the loop will be “redundantly parallelized”.
Redundant parallelization, in this case, means that the loop will be run in its entirety, multiple times, by the parallel hardware.
The second loop is marked with the loop directive, meaning that the loop iterations will be properly split across the parallel hardware.
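Redundant execution is rarely what is wanted, and gangs inside one parallel region are not synchronized with each other between loops. One way to avoid both issues is to give every loop its own combined parallel loop directive, the same form used later for loop nests. The sketch below assumes a hypothetical function zero_then_increment; the copyout and copy data clauses are standard OpenACC but are an addition beyond what this module covers.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): sets a[0..n) to 0 and
 * then adds 1 to every element. */
void zero_then_increment(int *a, int n)
{
    assert(n >= 0);

    /* Each loop has its own "parallel loop" directive, so its
     * iterations are divided among the gangs instead of repeated. */
    #pragma acc parallel loop copyout(a[0:n])
    for (int i = 0; i < n; i++)
        a[i] = 0;

    /* This region starts only after the previous one has finished. */
    #pragma acc parallel loop copy(a[0:n])
    for (int j = 0; j < n; j++)
        a[j]++;
}
```

After the call, every element of the array holds 1, regardless of its previous contents.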
OPENACC LOOP DIRECTIVE
Inside of a kernels compute region
#pragma acc kernels
{
    #pragma acc loop
    for(int i = 0; i < N; i++)
        a[i] = 0;

    #pragma acc loop
    for(int j = 0; j < M; j++)
        b[j] = 0;
}
With the kernels directive, the loop directive is implied.
The programmer can still explicitly define loops with the loop directive; however, this could affect the optimizations the compiler makes.
The loop directive is not needed, but does allow the programmer to optimize the loops themselves.
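As one illustration of a loop directive carrying programmer knowledge into a kernels region, the sketch below adds the independent clause, which asserts that the iterations of the loop do not depend on each other. The function name clear_two_arrays is hypothetical, and the independent and copyout clauses are standard OpenACC features not covered on this slide. With independent, responsibility for the correctness of that claim moves to the programmer.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): zeroes a[0..n) and b[0..m). */
void clear_two_arrays(float *a, int n, float *b, int m)
{
    assert(n >= 0 && m >= 0);

    #pragma acc kernels copyout(a[0:n], b[0:m])
    {
        /* "independent" tells the compiler the iterations of this
         * loop may run in any order; the programmer vouches for it. */
        #pragma acc loop independent
        for (int i = 0; i < n; i++)
            a[i] = 0.0f;

        #pragma acc loop independent
        for (int j = 0; j < m; j++)
            b[j] = 0.0f;
    }
}
```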
OPENACC LOOP DIRECTIVE
Parallelizing loop nests
C/C++
#pragma acc parallel loop
for(int i = 0; i < N; i++){
    #pragma acc loop
    for(int j = 0; j < M; j++){
        a[i][j] = 0;
    }
}
Fortran
!$acc parallel loop
do i = 1, N
    !$acc loop
    do j = 1, M
        a(i,j) = 0
    end do
end do
You are able to include multiple loop directives to parallelize multi-dimensional loop nests.
On some parallel hardware, this will allow you to express more levels of parallelism, and increase performance further.
Other parallel hardware has difficulties expressing enough parallelism for multi-dimensional loops. In this case, inner loop directives may be ignored.
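The loop nest above can be turned into a small compilable function. This is a sketch under assumptions: the names ROWS, COLS, and fill_grid are hypothetical, each element receives a distinct value rather than 0 so that the result can be checked, and the copyout clause is a standard OpenACC data clause not introduced in this module.

```c
#include <assert.h>

#define ROWS 4   /* hypothetical grid size, chosen for illustration */
#define COLS 6

/* Hypothetical helper (not from the slides): stores i * COLS + j
 * in a[i][j] for every element of the grid. */
void fill_grid(int a[ROWS][COLS])
{
    assert(a);

    #pragma acc parallel loop copyout(a[0:ROWS][0:COLS])
    for (int i = 0; i < ROWS; i++) {
        /* Inner loop directive: exposes a second level of parallelism
         * where the hardware can make use of it. */
        #pragma acc loop
        for (int j = 0; j < COLS; j++) {
            a[i][j] = i * COLS + j;
        }
    }
}
```

Every element is written by exactly one (i, j) iteration, so the nest has no dependences between iterations and both loops are safe to parallelize.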
OPENACC KERNELS DIRECTIVE
Compiler directed parallelization
The kernels directive instructs the compiler to search for parallel loops in the code.
The compiler will analyze the loops and parallelize those it finds safe and profitable to parallelize.
The kernels directive can be applied to regions containing multiple loop nests.
[Diagram: CPU and parallel hardware]
<sequential code>
#pragma acc kernels
{
    <for loop>
    <for loop>
}
OPENACC KERNELS DIRECTIVE
Parallelizing a single loop
C/C++
#pragma acc kernels
for(int i = 0; i < N; i++)
    a[i] = 0;
Fortran
!$acc kernels
do i = 1, N
    a(i) = 0
end do
!$acc end kernels
In this example, the kernels directive applies to the next for loop.
The compiler will take the loop, and attempt to parallelize it on the parallel hardware.
The compiler will also attempt to optimize the loop.
If the compiler decides that the loop is not parallelizable, it will not parallelize the loop.
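In C, one common reason for the compiler to decline is that it cannot prove two pointers refer to different memory. The sketch below shows one way to supply that information with the standard C99 restrict qualifier. The function name scale_into and its parameters are hypothetical, and the copyin/copyout data clauses are not part of this module. Whether a given compiler then parallelizes the loop is still its own decision.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): out[i] = factor * in[i].
 * "restrict" is the programmer's promise that out and in do not
 * overlap, which lets the compiler rule out a dependence between
 * the writes to out and the reads from in. */
void scale_into(float *restrict out, const float *restrict in,
                int n, float factor)
{
    assert(n >= 0);

    #pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; i++)
        out[i] = factor * in[i];
}
```

The promise must be true: passing overlapping arrays to a function declared this way is undefined behavior in C.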
OPENACC KERNELS DIRECTIVE
Parallelizing many loops
C/C++
#pragma acc kernels
{
    for(int i = 0; i < N; i++)
        a[i] = 0;

    for(int j = 0; j < M; j++)
        b[j] = 0;
}
Fortran
!$acc kernels
do i = 1, N
    a(i) = 0
end do
do j = 1, M
    b(j) = 0
end do
!$acc end kernels
In this example, we mark a region of code with the kernels directive.
The kernels region is defined by the curly braces in C/C++, and the !$acc kernels and !$acc end kernels in Fortran.
Each loop can be parallelized/optimized in a different way.
EXPRESSING PARALLELISM
Kernels
Compiler decides what to parallelize with direction from user
Compiler guarantees correctness
Can cover multiple loop nests
Parallel
Programmer decides what to parallelize and communicates that to the compiler
Programmer guarantees correctness
Must decorate each loop nest
[Diagram labels: compiler generated parallelism, programmer based optimizations, programmer based restrictions, programmer based parallelization]
COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c
FEEDBACK
main:
7, Generating Multicore code
8, #pragma acc loop gang
COMPILING PARALLEL CODE (PGI)
CODE
7: #pragma acc kernels
8: for(int i = 0; i < N; i++)
9: a[i] = 0;
COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c
FEEDBACK
main:
8, Loop is parallelizable
Generating Multicore code
8, #pragma acc loop gang
COMPILING PARALLEL CODE (PGI)
CODE
7: #pragma acc kernels
8: for(int i = 1; i < N; i++)    // Non-parallel loop
9: a[i] = a[i-1] + a[i];
COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c
FEEDBACK
main:
8, Loop carried dependence of a-> prevents parallelization
Loop carried backward dependence of a-> prevents vectorization
COMPILING PARALLEL CODE (PGI)
CODE
7: #pragma acc parallel loop
8: for(int i = 1; i < N; i++)    // Non-parallel loop
9: a[i] = a[i-1] + a[i];
COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c
FEEDBACK
main:
7, Generating Multicore code
8, #pragma acc loop gang
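The feedback for the last two cases differs because kernels checks the loop for dependences while parallel loop takes the programmer's word. The plain C sketch below shows why the order of iterations matters for a[i] = a[i-1] + a[i]. The function names scan_forward and scan_reversed are hypothetical, and the reversed order is only one of the many orders a parallel run could effectively produce.

```c
#include <assert.h>

/* The loop from the slides, executed in its original order. */
void scan_forward(int *a, int n)
{
    assert(n >= 0);
    for (int i = 1; i < n; i++)
        a[i] = a[i-1] + a[i];
}

/* The same loop body with the iterations visited last to first.
 * Parallel execution gives no ordering guarantee between iterations,
 * so this stands in for one order it might take. */
void scan_reversed(int *a, int n)
{
    assert(n >= 0);
    for (int i = n - 1; i >= 1; i--)
        a[i] = a[i-1] + a[i];
}
```

Starting from the array 1, 1, 1, 1, scan_forward produces 1, 2, 3, 4 while scan_reversed produces 1, 2, 2, 2. Because each iteration reads a value written by the previous one, the result changes when the order changes, which is exactly the loop-carried dependence the kernels feedback reports.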
KEY CONCEPTS
By the end of this module, you should understand:
The parallel, kernels, and loop directives
The key differences in functionality and use between the kernels and parallel directives
When and where to include loop directives
How the parallel and kernels directives conceptually generate parallelism
THANK YOU
OPENACC RESOURCES
Guides ● Talks ● Tutorials ● Videos ● Books ● Spec ● Code Samples ● Teaching Materials ● Events ● Success Stories ● Courses ● Slack ● Stack Overflow
Free Compilers
https://www.openacc.org/community#slack