Module5

Uploaded by

singhguma86

MODULE FIVE:

DATA MANAGEMENT
Dr. Volker Weinberg | LRZ
MODULE OVERVIEW
OpenACC Data Management
• Explicit Data Management
• OpenACC Data Regions and Clauses
• Unstructured Data Lifetimes
• Data Synchronization
EXPLICIT MEMORY MANAGEMENT
Requirements
• Data must be visible on the device when we run our parallel code
• Data must be visible on the host when we run our sequential code
• When the host and device don’t share memory, data movement must occur
• To maximize performance, the programmer should avoid all unnecessary data transfers
[Figure: Host with Host Memory and Device with Device Memory]
EXPLICIT MEMORY MANAGEMENT
Key problems
• Many parallel accelerators (such as GPUs) have a separate memory space from the host
• These separate memories can become out-of-sync and contain completely different data
• Transferring between these two memories can be a very time-consuming process
[Figure: Host with Host Memory and Device with Device Memory]
OPENACC DATA DIRECTIVE
Definition
• The data directive defines a lifetime for data on the device
• During the region, data should be thought of as residing on the accelerator
• Data clauses allow the programmer to control the allocation and movement of data

C/C++
    #pragma acc data clauses
    {
        < Sequential and/or Parallel code >
    }

Fortran
    !$acc data clauses
        < Sequential and/or Parallel code >
    !$acc end data

DATA CLAUSES
copy( list ) Allocates memory on device and copies data from host to device
when entering region and copies data to the host when exiting region.

Principal use: For many important data structures in your code, this is a
logical default to input, modify and return the data.

copyin( list ) Allocates memory on device and copies data from host to device
when entering region.

Principal use: Think of this like an array that you would use as just an
input to a subroutine.

copyout( list ) Allocates memory on device and copies data to the host when exiting
region.

Principal use: A result that isn’t overwriting the input data structure.

create( list ) Allocates memory on device but does not copy.

Principal use: Temporary arrays.
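
The following is a minimal sketch of how three of the clauses above might be combined in one structured data region. The function name `square_plus_one` and its parameters are hypothetical and do not come from the slides. The input array is only read on the device (`copyin`), the scratch array never needs to reach the host (`create`), and the result is only written on the device (`copyout`). Without an OpenACC compiler the pragmas are ignored and the code runs sequentially with the same result.

```c
/* Hypothetical helper (not from the slides): out[i] = in[i]*in[i] + 1.
 *   in  is only read on the device     -> copyin
 *   tmp is device-only scratch storage -> create
 *   out is only written on the device  -> copyout */
void square_plus_one(const float *in, float *tmp, float *out, int n)
{
    #pragma acc data copyin(in[0:n]) create(tmp[0:n]) copyout(out[0:n])
    {
        /* First compute region: fill the scratch array on the device. */
        #pragma acc parallel loop
        for (int i = 0; i < n; i++) {
            tmp[i] = in[i] * in[i];
        }

        /* Second compute region: tmp is still on the device, no transfer. */
        #pragma acc parallel loop
        for (int i = 0; i < n; i++) {
            out[i] = tmp[i] + 1.0f;
        }
    }   /* out is copied back to the host here; in and tmp are just freed. */
}
```

Note that when the code runs on a device, the host copy of `tmp` is never updated, so only `out` should be read after the call.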

IMPLIED DATA REGIONS
Definition
• Every kernels and parallel region has an implicit data region surrounding it
• This allows data to exist solely for the duration of the region
• All data clauses usable on a data directive can be used on the parallel and kernels directives as well

    #pragma acc kernels copyin(a[0:100])
    {
        for( int i = 0; i < 100; i++ )
        {
            a[i] = 0;
        }
    }
IMPLIED DATA REGIONS
Explicit vs Implicit Data Regions

Explicit
    #pragma acc data copyin(a[0:100])
    {
        #pragma acc kernels
        {
            for( int i = 0; i < 100; i++ )
            {
                a[i] = 0;
            }
        }
    }

Implicit
    #pragma acc kernels copyin(a[0:100])
    {
        for( int i = 0; i < 100; i++ )
        {
            a[i] = 0;
        }
    }

These two codes are functionally the same.

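EXPLICIT VS. IMPLICIT DATA REGIONS
Limitation

Explicit (1 Data Copy)
    #pragma acc data copyout(a[0:100])
    {
        #pragma acc kernels
        {
            for( int i = 0; i < 100; i++ )
                a[i] = i;
        }
        #pragma acc kernels
        {
            for( int i = 0; i < 100; i++ )
                a[i] = 2 * a[i];
        }
    }

Implicit (2 Data Copies)
    #pragma acc kernels copyout(a[0:100])
    {
        for( int i = 0; i < 100; i++ )
            a[i] = i;
    }
    #pragma acc kernels copy(a[0:100])
    {
        for( int i = 0; i < 100; i++ )
            a[i] = 2 * a[i];
    }

The explicit version will perform better than the implicit version, because the array stays on the device between the two kernels regions.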
UNSTRUCTURED DATA DIRECTIVES
Enter Data Directive
• Data lifetimes aren’t always neatly structured.
• The enter data directive handles device memory allocation
• You may use either the create or the copyin clause for memory allocation
• The enter data directive is not the start of a data region, because you may have multiple enter data directives

C/C++
    #pragma acc enter data clauses
    < Sequential and/or Parallel code >
    #pragma acc exit data clauses

Fortran
    !$acc enter data clauses
    < Sequential and/or Parallel code >
    !$acc exit data clauses

UNSTRUCTURED DATA DIRECTIVES
Exit Data Directive
• The exit data directive handles device memory deallocation
• You may use either the delete or the copyout clause for memory deallocation
• You should have as many exit data directives for a given array as enter data directives
• These can exist in different functions

UNSTRUCTURED DATA CLAUSES

copyin( list ) Allocates memory on device and copies data from host to device on enter data.
copyout( list ) Copies data from device back to the host and deallocates memory on device on exit data.
create( list ) Allocates memory on device without data transfer on enter data.
delete( list ) Deallocates memory on device without data transfer on exit data.
UNSTRUCTURED DATA DIRECTIVES
Basic Example

#pragma acc parallel loop

for(int i = 0; i < N; i++){
c[i] = a[i] + b[i];
}
UNSTRUCTURED DATA DIRECTIVES
Basic Example

#pragma acc enter data copyin(a[0:N],b[0:N]) create(c[0:N])

#pragma acc parallel loop

for(int i = 0; i < N; i++){
c[i] = a[i] + b[i];
}

#pragma acc exit data copyout(c[0:N])

UNSTRUCTURED DATA DIRECTIVES
Basic Example

    #pragma acc enter data copyin(a[0:N],b[0:N]) create(c[0:N])

    #pragma acc parallel loop
    for(int i = 0; i < N; i++){
        c[i] = a[i] + b[i];
    }

    #pragma acc exit data copyout(c[0:N])

Actions
1. Allocate A, B and C on device
2. Copy A and B from CPU to device
3. Execute loop on device
4. Copy C from device to CPU
5. Deallocate C from device

[Figure: CPU memory and device memory. At the end, CPU memory holds A, B and the updated C’, while A and B are still allocated in device memory.]
UNSTRUCTURED DATA DIRECTIVES
Basic Example – proper memory deallocation

    #pragma acc enter data copyin(a[0:N],b[0:N]) create(c[0:N])

    #pragma acc parallel loop
    for(int i = 0; i < N; i++){
        c[i] = a[i] + b[i];
    }

    #pragma acc exit data copyout(c[0:N]) delete(a,b)

Action
• Deallocate A and B from device

[Figure: CPU memory and device memory. At the end, CPU memory holds A, B and the updated C’, and nothing remains allocated in device memory.]
UNSTRUCTURED VS STRUCTURED
With a simple code

Unstructured
• Can have multiple starting/ending points
• Can branch across multiple functions
• Memory exists until explicitly deallocated

    #pragma acc enter data copyin(a[0:N],b[0:N]) \
                           create(c[0:N])

    #pragma acc parallel loop
    for(int i = 0; i < N; i++){
        c[i] = a[i] + b[i];
    }

    #pragma acc exit data copyout(c[0:N]) \
                          delete(a,b)

Structured
• Must have explicit start/end points
• Must be within a single function
• Memory only exists within the data region

    #pragma acc data copyin(a[0:N],b[0:N]) \
                     copyout(c[0:N])
    {
        #pragma acc parallel loop
        for(int i = 0; i < N; i++){
            c[i] = a[i] + b[i];
        }
    }
UNSTRUCTURED DATA DIRECTIVES
Branching across multiple functions
• In this example enter data and exit data are in different functions
• This allows the programmer to put device allocation/deallocation with the matching host versions
• This pattern is particularly useful in C++, where structured scopes may not be possible.

    #include <stdlib.h>

    /* Allocate on the host, then create the matching device copy. */
    int* allocate_array(int N){
        int* ptr = (int *) malloc(N * sizeof(int));
        #pragma acc enter data create(ptr[0:N])
        return ptr;
    }

    /* Release the device copy first, then the host memory. */
    void deallocate_array(int* ptr){
        #pragma acc exit data delete(ptr)
        free(ptr);
    }

    int main(){
        int* a = allocate_array(100);
        #pragma acc kernels
        {
            a[0] = 0;
        }
        deallocate_array(a);
    }
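
As a rough sketch of the same idea with a small container type, the device lifetime can be tied to an init/free function pair. The `int_buffer` type and its three functions are hypothetical names introduced only for illustration. The sketch assumes the struct member is first copied into a local pointer so that the data clauses refer to a plain array section, and it uses `update self` (covered in the next section) to bring the result back to the host.

```c
#include <stdlib.h>

/* Hypothetical container type, not part of the original slides. */
typedef struct {
    int *data;
    int  n;
} int_buffer;

/* Allocate host memory and the matching device copy. Returns 0 on success. */
int int_buffer_init(int_buffer *buf, int n)
{
    buf->data = (int *) malloc((size_t) n * sizeof(int));
    buf->n = (buf->data != NULL) ? n : 0;
    if (buf->data == NULL) return -1;
    int *p = buf->data;               /* local alias used in the data clause */
    #pragma acc enter data create(p[0:n])
    return 0;
}

/* Set every element on the device, then refresh the host copy. */
void int_buffer_fill(int_buffer *buf, int value)
{
    int *p = buf->data;
    int n = buf->n;
    #pragma acc parallel loop present(p[0:n])
    for (int i = 0; i < n; i++) {
        p[i] = value;
    }
    #pragma acc update self(p[0:n])
}

/* Release the device copy first, then the host memory. */
void int_buffer_free(int_buffer *buf)
{
    int *p = buf->data;
    int n = buf->n;
    if (p == NULL || n <= 0) return;
    #pragma acc exit data delete(p[0:n])
    free(p);
    buf->data = NULL;
    buf->n = 0;
}
```

A caller would pair `int_buffer_init` with `int_buffer_free` in the same way that `allocate_array` is paired with `deallocate_array` above, and the compute functions in between can rely on the data already being present on the device.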
DATA SYNCHRONIZATION
OPENACC UPDATE DIRECTIVE
update: Explicitly transfers data between the host and the device
Useful when you want to synchronize data in the middle of a data region
Clauses:
• self: makes host data agree with device data
• device: makes device data agree with host data

C/C++
    #pragma acc update self(x[0:count])
    #pragma acc update device(x[0:count])

Fortran
    !$acc update self(x(1:end_index))
    !$acc update device(x(1:end_index))
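OPENACC UPDATE DIRECTIVE

    #pragma acc update device(A[0:N])
    #pragma acc update self(A[0:N])

The data must exist on both the CPU and device for the update directive to work.

[Figure: CPU memory and device memory. update device overwrites the stale device copy with the CPU copy; update self overwrites the stale CPU copy with the device copy.]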
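
Because the update clauses take array sections, only the part of an array that actually changed needs to be transferred. The sketch below illustrates this under stated assumptions: `prefix_update_sum` is a hypothetical function, the host zeroes only the first `count` elements, and only that prefix is sent to the device before a device-side sum.

```c
/* Hypothetical example (not from the slides): zero the first `count`
 * elements on the host, refresh only that prefix on the device, and
 * return the sum computed from the device copy. Assumes 0 <= count <= n. */
float prefix_update_sum(float *x, int n, int count)
{
    float sum = 0.0f;

    /* The array must be present on the device before update is used. */
    #pragma acc enter data copyin(x[0:n])

    /* Host-only change: the device copy of x[0:count] is now stale. */
    for (int i = 0; i < count; i++) {
        x[i] = 0.0f;
    }

    /* Transfer just the changed prefix, not the whole array. */
    #pragma acc update device(x[0:count])

    #pragma acc parallel loop present(x[0:n]) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += x[i];
    }

    #pragma acc exit data delete(x[0:n])
    return sum;
}
```

If the update directive were omitted, a device run would still sum the original values of the first `count` elements.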
SYNCHRONIZE DATA WITH UPDATE
• Inside the initialize function we alter the host copy of ‘A’
• This means that after calling initialize the host and device copy of ‘A’ are out-of-sync
• We use the update directive with the device clause to update the device copy of ‘A’
• Without the update directive later compute regions will use incorrect data.

    int* allocate_array(int N){
        int* A=(int*) malloc(N*sizeof(int));
        #pragma acc enter data create(A[0:N])
        return A;
    }

    void deallocate_array(int* A){
        #pragma acc exit data delete(A)
        free(A);
    }

    void initialize_array(int* A, int N){
        for(int i = 0; i < N; i++){
            A[i] = i;
        }
        #pragma acc update device(A[0:N])
    }
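
The pattern above can be extended into a full round trip, sketched here under a few assumptions: the function name `update_round_trip` is hypothetical, the host initializes the array, `update device` pushes it to the device, a compute region doubles it, and `update self` pulls the result back before the host reads it.

```c
#include <stdlib.h>

/* Hypothetical driver (not from the slides): returns the sum of 2*i for
 * i in [0, n), or -1 if the host allocation fails. Assumes n > 0. */
long update_round_trip(int n)
{
    int *x = (int *) malloc((size_t) n * sizeof(int));
    if (x == NULL) return -1;

    /* Device copy exists from here on, but holds no meaningful data yet. */
    #pragma acc enter data create(x[0:n])

    /* Host-side initialization: host and device are now out-of-sync. */
    for (int i = 0; i < n; i++) {
        x[i] = i;
    }
    #pragma acc update device(x[0:n])    /* host -> device */

    #pragma acc parallel loop present(x[0:n])
    for (int i = 0; i < n; i++) {
        x[i] = 2 * x[i];
    }

    #pragma acc update self(x[0:n])      /* device -> host */

    /* Sequential host code can now safely read the results. */
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += x[i];
    }

    #pragma acc exit data delete(x[0:n])
    free(x);
    return sum;
}
```

Both update directives sit between the enter data and exit data directives, so the requirement that the data exist on both the CPU and the device is met.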
COPYING DATA IN DATA REGIONS

    #pragma acc enter data copyin(A[:m*n],Anew[:m*n])

    #pragma acc parallel loop copy(A,Anew)
    for( int j = 1; j < n-1; j++)

• But wouldn't this code now result in my arrays being copied twice, once by the `enter data` directive and then again by the `parallel loop`? In fact, the OpenACC runtime is smart enough to handle exactly this case. Data will be copied in only the first time it is encountered in a data clause and copied out only the last time it is encountered in a data clause. This allows you to create fully working directives within your functions and then later "hoist" the data movement to a higher level without changing your code at all. This is part of incrementally accelerating your code to avoid incorrect results.
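
A minimal sketch of this hoisting idea follows. The names `add_one` and `add_many` are hypothetical. `add_one` carries its own `copy` clause, so it gives correct results when called on its own. When it is called inside the data region of `add_many`, the array is already present on the device, so the inner `copy` clause should trigger no additional transfers.

```c
/* Hypothetical kernel (not from the slides): correct when called alone,
 * because its copy clause moves the data in and out if needed. */
void add_one(float *a, int n)
{
    #pragma acc parallel loop copy(a[0:n])
    for (int i = 0; i < n; i++) {
        a[i] += 1.0f;
    }
}

/* Hypothetical caller that hoists the data movement: a is copied in once
 * before the loop and copied out once after it. add_one is unchanged. */
void add_many(float *a, int n, int times)
{
    #pragma acc data copy(a[0:n])
    {
        for (int t = 0; t < times; t++) {
            add_one(a, n);
        }
    }
}
```

The design choice here is to keep every function independently correct first and only afterwards move the data region outward, which matches the incremental approach described above.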
MODULE REVIEW
KEY CONCEPTS
In this module we discussed…
• Why explicit data management is necessary for best performance
• Structured and Unstructured Data Lifetimes
• Explicit and Implicit Data Regions
• The data, enter data, exit data, and update directives
• Data Clauses
LAB ASSIGNMENT
In this module’s lab you will…
• Update the code from the previous module to use explicit data directives
• Analyze the difference between using CUDA Managed Memory and explicit data management in the lab code.
