
MODULE THREE:

OPENACC DIRECTIVES
Dr. Volker Weinberg | LRZ
MODULE OVERVIEW
OpenACC Directives

 The parallel directive
 The kernels directive
 The loop directive
 Fundamental differences between the kernels and parallel directives
 Expressing parallelism in OpenACC
OPENACC SYNTAX
Syntax for using OpenACC directives in code
C/C++
#pragma acc directive clauses
<code>

Fortran
!$acc directive clauses
<code>

 A pragma in C/C++ gives instructions to the compiler on how to compile the code.
Compilers that do not understand a particular pragma can freely ignore it.
 A directive in Fortran is a specially formatted comment that likewise instructs the
compiler in its compilation of the code and can be freely ignored.
 “acc” informs the compiler that what follows is an OpenACC directive.
 Directives are commands in OpenACC for altering our code.
 Clauses are specifiers for, or additions to, directives.
OPENACC PARALLEL DIRECTIVE
Explicit programming
 The parallel directive instructs the compiler to create parallel gangs on the accelerator
 Gangs are independent groups of worker threads on the accelerator
 The code contained within a parallel directive is executed redundantly by all parallel gangs

#pragma acc parallel
{
<sequential code>
}
OPENACC PARALLEL DIRECTIVE
Expressing parallelism

#pragma acc parallel
{
<sequential code>
}

When encountering the parallel directive, the compiler will generate 1 or more parallel gangs, which execute redundantly.
OPENACC PARALLEL DIRECTIVE
Expressing parallelism

#pragma acc parallel
{
for(int i = 0; i < N; i++)
{
// Do Something
}
}

This loop will be executed redundantly on each gang: each gang will execute the entire loop.
OPENACC PARALLEL DIRECTIVE
Parallelizing a single loop
C/C++
#pragma acc parallel
{
#pragma acc loop
for(int i = 0; i < N; i++)
a[i] = 0;
}

Fortran
!$acc parallel
!$acc loop
do i = 1, N
a(i) = 0
end do
!$acc end parallel

 Use a parallel directive to mark a region of code where you want parallel execution to occur
 This parallel region is marked by curly braces in C/C++ or a start and end directive in Fortran
 The loop directive is used to instruct the compiler to parallelize the iterations of the next loop to run across the parallel gangs
OPENACC PARALLEL DIRECTIVE
Parallelizing a single loop
C/C++
#pragma acc parallel loop
for(int i = 0; i < N; i++)
a[i] = 0;

Fortran
!$acc parallel loop
do i = 1, N
a(i) = 0
end do

 This pattern is so common that you can do all of this in a single line of code
 In this example, the parallel loop directive applies to the next loop
 This directive both marks the region for parallel execution and distributes the iterations of the loop
 When applied to a loop with a data dependency, parallel loop may produce incorrect results
OPENACC PARALLEL DIRECTIVE
Expressing parallelism

#pragma acc parallel
{
#pragma acc loop
for(int i = 0; i < N; i++)
{
// Do Something
}
}

The loop directive informs the compiler which loops to parallelize.
OPENACC PARALLEL DIRECTIVE
Parallelizing many loops
#pragma acc parallel loop
for(int i = 0; i < N; i++)
a[i] = 0;

#pragma acc parallel loop
for(int j = 0; j < M; j++)
b[j] = 0;

 To parallelize multiple loops, each loop should be accompanied by a parallel directive
 Each parallel loop can have different loop boundaries and loop optimizations
 Each parallel loop can be parallelized in a different way
 This is the recommended way to parallelize multiple loops. Attempting to parallelize multiple loops within the same parallel region may cause performance issues or unexpected results
OPENACC LOOP DIRECTIVE
Expressing parallelism
C/C++
#pragma acc loop
for(int i = 0; i < N; i++)
// Do something

Fortran
!$acc loop
do i = 1, N
! Do something
end do

 Marks a single loop for parallelization
 Allows the programmer to give additional information and/or optimizations about the loop
 Provides many different ways to describe the type of parallelism to apply to the loop
 Must be contained within an OpenACC compute region (either a kernels or a parallel region) to parallelize loops
OPENACC LOOP DIRECTIVE
Inside of a parallel compute region
#pragma acc parallel
{
for(int i = 0; i < N; i++)
a[i] = 0;

#pragma acc loop
for(int j = 0; j < N; j++)
a[j]++;
}

 In this example, the first loop is not marked with the loop directive
 This means that the loop will be “redundantly parallelized”
 Redundant parallelization, in this case, means that the loop will be run in its entirety, multiple times, by the parallel hardware
 The second loop is marked with the loop directive, meaning that the loop iterations will be properly split across the parallel hardware
OPENACC LOOP DIRECTIVE
Inside of a kernels compute region
#pragma acc kernels
{
#pragma acc loop
for(int i = 0; i < N; i++)
a[i] = 0;

#pragma acc loop
for(int j = 0; j < M; j++)
b[j] = 0;
}

 With the kernels directive, the loop directive is implied
 The programmer can still explicitly define loops with the loop directive; however, this could affect the optimizations the compiler makes
 The loop directive is not needed, but does allow the programmer to optimize the loops themselves
OPENACC LOOP DIRECTIVE
Parallelizing loop nests
C/C++
#pragma acc parallel loop
for(int i = 0; i < N; i++){
#pragma acc loop
for(int j = 0; j < M; j++){
a[i][j] = 0;
}
}

Fortran
!$acc parallel loop
do i = 1, N
!$acc loop
do j = 1, M
a(i,j) = 0
end do
end do

 You are able to include multiple loop directives to parallelize multi-dimensional loop nests
 On some parallel hardware, this will allow you to express more levels of parallelism, and increase performance further
 Other parallel hardware has difficulties expressing enough parallelism for multi-dimensional loops
 In this case, inner loop directives may be ignored
OPENACC KERNELS DIRECTIVE
Compiler directed parallelization
 The kernels directive instructs the compiler to search for parallel loops in the code
 The compiler will analyze the loops and parallelize those it finds safe and profitable to do so
 The kernels directive can be applied to regions containing multiple loop nests

#pragma acc kernels
{
<for loop>

<for loop>
}
OPENACC KERNELS DIRECTIVE
Parallelizing a single loop
C/C++
#pragma acc kernels
for(int i = 0; i < N; i++)
a[i] = 0;

Fortran
!$acc kernels
do i = 1, N
a(i) = 0
end do
!$acc end kernels

 In this example, the kernels directive applies to the next for loop
 The compiler will take the loop, and attempt to parallelize it on the parallel hardware
 The compiler will also attempt to optimize the loop
 If the compiler decides that the loop is not parallelizable, it will not parallelize the loop
OPENACC KERNELS DIRECTIVE
Parallelizing many loops
C/C++
#pragma acc kernels
{
for(int i = 0; i < N; i++)
a[i] = 0;

for(int j = 0; j < M; j++)
b[j] = 0;
}

Fortran
!$acc kernels
do i = 1, N
a(i) = 0
end do

do j = 1, M
b(j) = 0
end do
!$acc end kernels

 In this example, we mark a region of code with the kernels directive
 The kernels region is defined by the curly braces in C/C++, and by !$acc kernels and !$acc end kernels in Fortran
 The compiler will attempt to parallelize all loops within the kernels region
 Each loop can be parallelized/optimized in a different way
EXPRESSING PARALLELISM
Compiler generated parallelism
#pragma acc kernels
{
for(int i = 0; i < N; i++)
{
// Do Something
}

for(int i = 0; i < M; i++)
{
// Do Something Else
}
}

With the kernels directive, the loop directive is implied.
EXPRESSING PARALLELISM
Compiler generated parallelism
#pragma acc kernels
{
for(int i = 0; i < N; i++)
{
// Do Something
}

for(int i = 0; i < M; i++)
{
// Do Something Else
}
}

Each loop can have a different number of gangs, and those gangs can be organized/optimized completely differently. This process can happen multiple times within the kernels region.
OPENACC KERNELS DIRECTIVE
Fortran array syntax
!$acc kernels
a(:) = 1
b(:) = 2
c(:) = a(:) + b(:)
!$acc end kernels

!$acc parallel loop
c(:) = a(:) + b(:)

 One advantage that the kernels directive has over the parallel directive is Fortran array syntax
 The parallel directive must be paired with the loop directive, and the loop directive does not recognize the array syntax as a loop
 The kernels directive can correctly parallelize the array syntax
KERNELS VS PARALLEL
Kernels
 Compiler decides what to parallelize, with direction from the user
 Compiler guarantees correctness
 Can cover multiple loop nests

Parallel
 Programmer decides what to parallelize and communicates that to the compiler
 Programmer guarantees correctness
 Must decorate each loop nest

When fully optimized, both will give similar performance.


COMPILING PARALLEL CODE
COMPILING PARALLEL CODE (PGI)
CODE
7: #pragma acc parallel loop
8: for(int i = 0; i < N; i++)
9: a[i] = 0;

COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c

FEEDBACK
main:
7, Generating Multicore code
8, #pragma acc loop gang
COMPILING PARALLEL CODE (PGI)
CODE
7: #pragma acc kernels
8: for(int i = 0; i < N; i++)
9: a[i] = 0;

COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c

FEEDBACK
main:
8, Loop is parallelizable
Generating Multicore code
8, #pragma acc loop gang
COMPILING PARALLEL CODE (PGI)
CODE
7: #pragma acc kernels
8: for(int i = 1; i < N; i++)
9: a[i] = a[i-1] + a[i];
(non-parallel loop)

COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c

FEEDBACK
main:
8, Loop carried dependence of a-> prevents parallelization
Loop carried backward dependence of a-> prevents vectorization
COMPILING PARALLEL CODE (PGI)
CODE
7: #pragma acc parallel loop
8: for(int i = 1; i < N; i++)
9: a[i] = a[i-1] + a[i];
(non-parallel loop)

COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c

FEEDBACK
main:
7, Generating Multicore code
8, #pragma acc loop gang
KEY CONCEPTS
By the end of this module, you should now understand
 The parallel, kernels, and loop directives
 The key differences in functionality and use between the kernels and parallel directives
 When and where to include loop directives
 How the parallel and kernels directives conceptually generate parallelism
THANK YOU
OPENACC RESOURCES
Guides ● Talks ● Tutorials ● Videos ● Books ● Spec ● Code Samples ● Teaching Materials ● Events ● Success Stories ● Courses ● Slack ● Stack Overflow

Resources: https://www.openacc.org/resources
Success Stories: https://www.openacc.org/success-stories
Free Compilers and Tools: https://www.openacc.org/tools
Events: https://www.openacc.org/events
Slack: https://www.openacc.org/community#slack