
MODULE THREE:

OPENACC DIRECTIVES
Dr. Volker Weinberg | LRZ
MODULE OVERVIEW
OpenACC Directives

 The parallel directive
 The kernels directive
 The loop directive
 Fundamental differences between the kernels and parallel directives
 Expressing parallelism in OpenACC
OPENACC SYNTAX
Syntax for using OpenACC directives in code
C/C++
#pragma acc directive clauses
<code>

Fortran
!$acc directive clauses
<code>

 A pragma in C/C++ gives instructions to the compiler on how to compile the code.
Compilers that do not understand a particular pragma can freely ignore it.
 A directive in Fortran is a specially formatted comment that likewise instructs the
compiler in its compilation of the code and can be freely ignored.
 “acc” informs the compiler that what follows is an OpenACC directive.
 Directives are commands in OpenACC for altering our code.
 Clauses are specifiers for, or additions to, directives.
OPENACC PARALLEL DIRECTIVE
Explicit programming
 The parallel directive instructs the compiler to create parallel gangs on the accelerator
 Gangs are independent groups of worker threads on the accelerator
 The code contained within a parallel directive is executed redundantly by all parallel gangs

#pragma acc parallel
{
<sequential code>
}
OPENACC PARALLEL DIRECTIVE
Expressing parallelism

#pragma acc parallel
{
<sequential code>
}

When encountering the parallel directive, the compiler will generate 1 or more parallel gangs, which execute redundantly.
OPENACC PARALLEL DIRECTIVE
Expressing parallelism

#pragma acc parallel
{
for(int i = 0; i < N; i++)
{
// Do Something
}
}

This loop will be executed redundantly on each gang: each gang will execute the entire loop.
OPENACC PARALLEL DIRECTIVE
Parallelizing a single loop
C/C++
#pragma acc parallel
{
#pragma acc loop
for(int i = 0; i < N; i++)
a[i] = 0;
}

Fortran
!$acc parallel
!$acc loop
do i = 1, N
a(i) = 0
end do
!$acc end parallel

 Use a parallel directive to mark a region of code where you want parallel execution to occur
 This parallel region is marked by curly braces in C/C++ or a start and end directive in Fortran
 The loop directive is used to instruct the compiler to parallelize the iterations of the next loop to run across the parallel gangs
OPENACC PARALLEL DIRECTIVE
Parallelizing a single loop
C/C++
#pragma acc parallel loop
for(int i = 0; i < N; i++)
a[i] = 0;

Fortran
!$acc parallel loop
do i = 1, N
a(i) = 0
end do

 This pattern is so common that you can do all of this in a single line of code
 In this example, the parallel loop directive applies to the next loop
 This directive both marks the region for parallel execution and distributes the iterations of the loop
 When applied to a loop with a data dependency, parallel loop may produce incorrect results
OPENACC PARALLEL DIRECTIVE
Expressing parallelism

#pragma acc parallel
{
#pragma acc loop
for(int i = 0; i < N; i++)
{
// Do Something
}
}

The loop directive informs the compiler which loops to parallelize.
OPENACC PARALLEL DIRECTIVE
Parallelizing many loops
#pragma acc parallel loop
for(int i = 0; i < N; i++)
a[i] = 0;

#pragma acc parallel loop
for(int j = 0; j < M; j++)
b[j] = 0;

 To parallelize multiple loops, each loop should be accompanied by a parallel directive
 Each parallel loop can have different loop boundaries and loop optimizations
 Each parallel loop can be parallelized in a different way
 This is the recommended way to parallelize multiple loops. Attempting to parallelize multiple loops within the same parallel region may cause performance issues or unexpected results
OPENACC LOOP DIRECTIVE
Expressing parallelism
C/C++
#pragma acc loop
for(int i = 0; i < N; i++)
// Do something

Fortran
!$acc loop
do i = 1, N
! Do something
end do

 Marks a single loop for parallelization
 Allows the programmer to give additional information and/or optimizations about the loop
 Provides many different ways to describe the type of parallelism to apply to the loop
 Must be contained within an OpenACC compute region (either a kernels or a parallel region) to parallelize loops
OPENACC LOOP DIRECTIVE
Inside of a parallel compute region
#pragma acc parallel
{
for(int i = 0; i < N; i++)
a[i] = 0;

#pragma acc loop
for(int j = 0; j < N; j++)
a[j]++;
}

 In this example, the first loop is not marked with the loop directive
 This means that the loop will be “redundantly parallelized”
 Redundant parallelization, in this case, means that the loop will be run in its entirety, multiple times, by the parallel hardware
 The second loop is marked with the loop directive, meaning that the loop iterations will be properly split across the parallel hardware
OPENACC LOOP DIRECTIVE
Inside of a kernels compute region
#pragma acc kernels
{
#pragma acc loop
for(int i = 0; i < N; i++)
a[i] = 0;

#pragma acc loop
for(int j = 0; j < M; j++)
b[j] = 0;
}

 With the kernels directive, the loop directive is implied
 The programmer can still explicitly define loops with the loop directive; however, this could affect the optimizations the compiler makes
 The loop directive is not needed, but does allow the programmer to optimize the loops themselves
OPENACC LOOP DIRECTIVE
Parallelizing loop nests
C/C++
#pragma acc parallel loop
for(int i = 0; i < N; i++){
#pragma acc loop
for(int j = 0; j < M; j++){
a[i][j] = 0;
}
}

Fortran
!$acc parallel loop
do i = 1, N
!$acc loop
do j = 1, M
a(i,j) = 0
end do
end do

 You are able to include multiple loop directives to parallelize multi-dimensional loop nests
 On some parallel hardware, this will allow you to express more levels of parallelism, and increase performance further
 Other parallel hardware has difficulties expressing enough parallelism for multi-dimensional loops
 In this case, inner loop directives may be ignored
OPENACC KERNELS DIRECTIVE
Compiler directed parallelization
 The kernels directive instructs the compiler to search for parallel loops in the code
 The compiler will analyze the loops and parallelize those it finds safe and profitable to do so
 The kernels directive can be applied to regions containing multiple loop nests

#pragma acc kernels
{
<for loop>

<for loop>
}
OPENACC KERNELS DIRECTIVE
Parallelizing a single loop
C/C++
#pragma acc kernels
for(int i = 0; i < N; i++)
a[i] = 0;

Fortran
!$acc kernels
do i = 1, N
a(i) = 0
end do
!$acc end kernels

 In this example, the kernels directive applies to the next for loop
 The compiler will take the loop, and attempt to parallelize it on the parallel hardware
 The compiler will also attempt to optimize the loop
 If the compiler decides that the loop is not parallelizable, it will not parallelize the loop
OPENACC KERNELS DIRECTIVE
Parallelizing many loops
C/C++
#pragma acc kernels
{
for(int i = 0; i < N; i++)
a[i] = 0;

for(int j = 0; j < M; j++)
b[j] = 0;
}

Fortran
!$acc kernels
do i = 1, N
a(i) = 0
end do

do j = 1, M
b(j) = 0
end do
!$acc end kernels

 In this example, we mark a region of code with the kernels directive
 The kernels region is defined by the curly braces in C/C++, and by !$acc kernels and !$acc end kernels in Fortran
 The compiler will attempt to parallelize all loops within the kernels region
 Each loop can be parallelized/optimized in a different way
EXPRESSING PARALLELISM
Compiler generated parallelism
#pragma acc kernels
{
for(int i = 0; i < N; i++)
{
// Do Something
}

for(int i = 0; i < M; i++)
{
// Do Something Else
}
}

With the kernels directive, the loop directive is implied.
EXPRESSING PARALLELISM
Compiler generated parallelism
#pragma acc kernels
{
for(int i = 0; i < N; i++)
{
// Do Something
}

for(int i = 0; i < M; i++)
{
// Do Something Else
}
}

Each loop can have a different number of gangs, and those gangs can be organized/optimized completely differently. This process can happen multiple times within the kernels region.
OPENACC KERNELS DIRECTIVE
Fortran array syntax
!$acc kernels
a(:) = 1
b(:) = 2
c(:) = a(:) + b(:)
!$acc end kernels

!$acc parallel loop
c(:) = a(:) + b(:)

 One advantage that the kernels directive has over the parallel directive is Fortran array syntax
 The parallel directive must be paired with the loop directive, and the loop directive does not recognize the array syntax as a loop
 The kernels directive can correctly parallelize the array syntax
KERNELS VS PARALLEL
Kernels
 Compiler decides what to parallelize, with direction from the user
 Compiler guarantees correctness
 Can cover multiple loop nests

Parallel
 Programmer decides what to parallelize and communicates that to the compiler
 Programmer guarantees correctness
 Must decorate each loop nest

When fully optimized, both will give similar performance.


COMPILING PARALLEL CODE
COMPILING PARALLEL CODE (PGI)
CODE
7: #pragma acc parallel loop
8: for(int i = 0; i < N; i++)
9: a[i] = 0;

COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c

FEEDBACK
main:
7, Generating Multicore code
8, #pragma acc loop gang
COMPILING PARALLEL CODE (PGI)
CODE
7: #pragma acc kernels
8: for(int i = 0; i < N; i++)
9: a[i] = 0;

COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c

FEEDBACK
main:
8, Loop is parallelizable
Generating Multicore code
8, #pragma acc loop gang
COMPILING PARALLEL CODE (PGI)
CODE
7: #pragma acc kernels
8: for(int i = 1; i < N; i++)
9: a[i] = a[i-1] + a[i];
(non-parallel loop)

COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c

FEEDBACK
main:
8, Loop carried dependence of a-> prevents parallelization
Loop carried backward dependence of a-> prevents vectorization
COMPILING PARALLEL CODE (PGI)
CODE
7: #pragma acc parallel loop
8: for(int i = 1; i < N; i++)
9: a[i] = a[i-1] + a[i];
(non-parallel loop)

COMPILING
$ pgcc -fast -acc -ta=multicore -Minfo=accel main.c

FEEDBACK
main:
7, Generating Multicore code
8, #pragma acc loop gang
KEY CONCEPTS
By the end of this module, you should now understand
 The parallel, kernels, and loop directives
 The key differences in functionality and use between the kernels and parallel directives
 When and where to include loop directives
 How the parallel and kernels directives conceptually generate parallelism
THANK YOU
OPENACC RESOURCES
Guides ● Talks ● Tutorials ● Videos ● Books ● Spec ● Code Samples ● Teaching Materials ● Events ● Success Stories ● Courses ● Slack ● Stack Overflow

Resources: https://www.openacc.org/resources
Success Stories: https://www.openacc.org/success-stories
Free Compilers and Tools: https://www.openacc.org/tools
Events: https://www.openacc.org/events
Slack: https://www.openacc.org/community#slack