Module5

Uploaded by

singhguma86
Copyright
© © All Rights Reserved
MODULE FIVE:

DATA MANAGEMENT
Dr. Volker Weinberg | LRZ
MODULE OVERVIEW
OpenACC Data Management

• Explicit Data Management
• OpenACC Data Regions and Clauses
• Unstructured Data Lifetimes
• Data Synchronization
EXPLICIT MEMORY MANAGEMENT
EXPLICIT MEMORY MANAGEMENT
Requirements

• Data must be visible on the device when we run our parallel code
• Data must be visible on the host when we run our sequential code
• When the host and device don’t share memory, data movement must occur
• To maximize performance, the programmer should avoid all unnecessary data transfers

[Figure: host and device, each with its own memory (host memory, device memory)]
EXPLICIT MEMORY MANAGEMENT
Key problems

• Many parallel accelerators (devices) have a memory space separate from the host
• These separate memories can become out-of-sync and contain completely different data
• Transferring data between these two memories can be very time-consuming

[Figure: host and device, each with its own memory (host memory, device memory)]
OPENACC DATA DIRECTIVE
OPENACC DATA DIRECTIVE
Definition

• The data directive defines a lifetime for data on the device
• During the region, data should be thought of as residing on the accelerator
• Data clauses allow the programmer to control the allocation and movement of data

C/C++:
#pragma acc data clauses
{
    < Sequential and/or Parallel code >
}

Fortran:
!$acc data clauses
    < Sequential and/or Parallel code >
!$acc end data


DATA CLAUSES
copy( list ) Allocates memory on device and copies data from host to device
when entering region and copies data to the host when exiting region.

Principal use: For many important data structures in your code, this is a
logical default to input, modify and return the data.

copyin( list ) Allocates memory on device and copies data from host to device
when entering region.

Principal use: Think of this like an array that you would use as just an
input to a subroutine.

copyout( list ) Allocates memory on device and copies data to the host when exiting
region.

Principal use: A result that isn’t overwriting the input data structure.

create( list ) Allocates memory on device but does not copy.

Principal use: Temporary arrays.
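
As a rough illustration of how these clauses might be combined, the sketch below uses three of them in one structured data region. The function name `scale_and_shift` and its arguments are hypothetical and chosen only for this example; `copy` would be the natural choice instead if the input array were modified in place and returned.

```c
/* Hypothetical helper: computes out[i] = 2*in[i] + 1 using a scratch array.
 * Without an OpenACC compiler the pragmas are ignored and the code runs
 * sequentially on the host with the same result. */
void scale_and_shift(const float *in, float *out, float *tmp, int n)
{
    /* in  : only read on the device            -> copyin
     * out : only written, needed on the host   -> copyout
     * tmp : scratch space used between loops   -> create (never transferred) */
    #pragma acc data copyin(in[0:n]) copyout(out[0:n]) create(tmp[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; i++) {
            tmp[i] = 2.0f * in[i];
        }

        #pragma acc parallel loop
        for (int i = 0; i < n; i++) {
            out[i] = tmp[i] + 1.0f;
        }
    }
}
```

Calling `scale_and_shift(in, out, tmp, n)` with three host arrays of length `n` leaves the result in `out`. When the code runs on a device, the host copy of `tmp` is not updated, which is acceptable for a temporary.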


IMPLIED DATA REGIONS
IMPLIED DATA REGIONS
Definition
• Every kernels and parallel region has an implicit data region surrounding it
• This allows data to exist solely for the duration of the region
• All data clauses usable on a data directive can be used on parallel and kernels as well

#pragma acc kernels copyin(a[0:100])
{
    for( int i = 0; i < 100; i++ )
    {
        a[i] = 0;
    }
}
IMPLIED DATA REGIONS
Explicit vs Implicit Data Regions

Explicit:
#pragma acc data copyin(a[0:100])
{
    #pragma acc kernels
    {
        for( int i = 0; i < 100; i++ )
        {
            a[i] = 0;
        }
    }
}

Implicit:
#pragma acc kernels copyin(a[0:100])
{
    for( int i = 0; i < 100; i++ )
    {
        a[i] = 0;
    }
}

These two codes are functionally the same.


EXPLICIT VS. IMPLICIT DATA REGIONS
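Limitation

Explicit (1 Data Copy):
#pragma acc data copyout(a[0:100])
{
    #pragma acc kernels
    {
        for( int i = 0; i < 100; i++ ) a[i] = i;
    }

    #pragma acc kernels
    {
        for( int i = 0; i < 100; i++ ) a[i] = 2 * a[i];
    }
}

Implicit (2 Data Copies):
#pragma acc kernels copyout(a[0:100])
{
    for( int i = 0; i < 100; i++ ) a[i] = i;
}

#pragma acc kernels copy(a[0:100])
{
    for( int i = 0; i < 100; i++ ) a[i] = 2 * a[i];
}

The explicit version will perform better than the implicit version, because the array stays on the device between the two kernels regions.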
UNSTRUCTURED DATA DIRECTIVES
UNSTRUCTURED DATA DIRECTIVES
Enter Data Directive
• Data lifetimes aren’t always neatly structured.
• The enter data directive handles device memory allocation
• You may use either the create or the copyin clause for memory allocation
• The enter data directive is not the start of a data region, because you may have multiple enter data directives

C/C++:
#pragma acc enter data clauses
< Sequential and/or Parallel code >
#pragma acc exit data clauses

Fortran:
!$acc enter data clauses
< Sequential and/or Parallel code >
!$acc exit data clauses


UNSTRUCTURED DATA DIRECTIVES
Exit Data Directive
• The exit data directive handles device memory deallocation
• You may use either the delete or the copyout clause for memory deallocation
• You should have as many exit data for a given array as enter data
• These can exist in different functions

C/C++:
#pragma acc enter data clauses
< Sequential and/or Parallel code >
#pragma acc exit data clauses

Fortran:
!$acc enter data clauses
< Sequential and/or Parallel code >
!$acc exit data clauses

UNSTRUCTURED DATA CLAUSES

copyin ( list ) Allocates memory on device and copies data from host to device
on enter data.
copyout ( list ) Copies data from device back to the host and deallocates memory
on device on exit data.
create ( list ) Allocates memory on device without data transfer on enter data.
delete ( list ) Deallocates memory on device without data transfer on exit data.
UNSTRUCTURED DATA DIRECTIVES
Basic Example

#pragma acc parallel loop


for(int i = 0; i < N; i++){
c[i] = a[i] + b[i];
}
UNSTRUCTURED DATA DIRECTIVES
Basic Example

#pragma acc enter data copyin(a[0:N],b[0:N]) create(c[0:N])

#pragma acc parallel loop


for(int i = 0; i < N; i++){
c[i] = a[i] + b[i];
}

#pragma acc exit data copyout(c[0:N])


UNSTRUCTURED DATA DIRECTIVES
Basic Example

#pragma acc enter data copyin(a[0:N],b[0:N]) create(c[0:N])

#pragma acc parallel loop
for(int i = 0; i < N; i++){
    c[i] = a[i] + b[i];
}

#pragma acc exit data copyout(c[0:N])

[Animation: sequence of actions on CPU memory and device memory]
1. Allocate A, B, C on device
2. Copy A, B from CPU to device
3. Execute loop on device
4. Copy C from device to CPU
5. Deallocate C from device
A and B are still allocated on the device at the end.
UNSTRUCTURED DATA DIRECTIVES
Basic Example – proper memory deallocation

#pragma acc enter data copyin(a[0:N],b[0:N]) create(c[0:N])

#pragma acc parallel loop
for(int i = 0; i < N; i++){
    c[i] = a[i] + b[i];
}

#pragma acc exit data copyout(c[0:N]) delete(a,b)

[Animation: the delete clause deallocates A and B from the device]
UNSTRUCTURED VS STRUCTURED
With a simple code

Unstructured
• Can have multiple starting/ending points
• Can branch across multiple functions
• Memory exists until explicitly deallocated

#pragma acc enter data copyin(a[0:N],b[0:N]) \
    create(c[0:N])

#pragma acc parallel loop
for(int i = 0; i < N; i++){
    c[i] = a[i] + b[i];
}

#pragma acc exit data copyout(c[0:N]) \
    delete(a,b)

Structured
• Must have explicit start/end points
• Must be within a single function
• Memory only exists within the data region

#pragma acc data copyin(a[0:N],b[0:N]) \
    copyout(c[0:N])
{
    #pragma acc parallel loop
    for(int i = 0; i < N; i++){
        c[i] = a[i] + b[i];
    }
}
UNSTRUCTURED DATA DIRECTIVES
Branching across multiple functions

int* allocate_array(int N){
    int* ptr = (int *) malloc(N * sizeof(int));
    #pragma acc enter data create(ptr[0:N])
    return ptr;
}

void deallocate_array(int* ptr){
    #pragma acc exit data delete(ptr)
    free(ptr);
}

int main(){
    int* a = allocate_array(100);
    #pragma acc kernels
    {
        a[0] = 0;
    }
    deallocate_array(a);
}

• In this example enter data and exit data are in different functions
• This allows the programmer to put device allocation/deallocation with the matching host versions
• This pattern is particularly useful in C++, where structured scopes may not be possible.
DATA SYNCHRONIZATION
OPENACC UPDATE DIRECTIVE
update: Explicitly transfers data between the host and the device
Useful when you want to synchronize data in the middle of a data region
Clauses:
self: makes host data agree with device data
device: makes device data agree with host data

C/C++:
#pragma acc update self(x[0:count])
#pragma acc update device(x[0:count])

Fortran:
!$acc update self(x(1:end_index))
!$acc update device(x(1:end_index))
OPENACC UPDATE DIRECTIVE
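The data must exist on both the CPU and device for the update directive to work.

#pragma acc update device(A[0:N])
[Figure: the copy of A in CPU memory overwrites the copy in device memory]

#pragma acc update self(A[0:N])
[Figure: the copy in device memory overwrites the copy in CPU memory]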
SYNCHRONIZE DATA WITH UPDATE
int* allocate_array(int N){
    int* A=(int*) malloc(N*sizeof(int));
    #pragma acc enter data create(A[0:N])
    return A;
}

void deallocate_array(int* A){
    #pragma acc exit data delete(A)
    free(A);
}

void initialize_array(int* A, int N){
    for(int i = 0; i < N; i++){
        A[i] = i;
    }
    #pragma acc update device(A[0:N])
}

• Inside the initialize function we alter the host copy of ‘A’
• This means that after the host loop in initialize, the host and device copies of ‘A’ are out-of-sync
• We use the update directive with the device clause to update the device copy of ‘A’
• Without the update directive, later compute regions will use incorrect data.
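
The opposite direction, `update self`, can be sketched in the same spirit: the host inspects an intermediate value while the array stays on the device for the whole data region. The function name `iterate` and its parameters are hypothetical and only serve this illustration.

```c
/* Hypothetical example: doubles every element of x `steps` times on the
 * device and lets the host look at x[0] after each step. Without an
 * OpenACC compiler the pragmas are ignored and the code runs on the host. */
float iterate(float *x, int n, int steps)
{
    float last_seen = x[0];

    #pragma acc data copy(x[0:n])
    {
        for (int s = 0; s < steps; s++) {
            #pragma acc parallel loop
            for (int i = 0; i < n; i++) {
                x[i] *= 2.0f;
            }

            /* Bring only the element the host needs back from the device;
             * the rest of x is not transferred until the region ends. */
            #pragma acc update self(x[0:1])
            last_seen = x[0];
        }
    }
    return last_seen;
}
```

Updating a single element keeps the transfer small. Without the `update self`, the host would read its stale copy of `x[0]` inside the data region.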
COPYING DATA IN DATA REGIONS

#pragma acc enter data copyin(A[:m*n],Anew[:m*n])


#pragma acc parallel loop copy(A,Anew)
for( int j = 1; j < n-1; j++)
• But wouldn't this code now result in my arrays being copied twice, once by the `data`
region and then again by the `parallel loop`? In fact, the OpenACC runtime is smart
enough to handle exactly this case. Data will be copied _in_ only the first time it is
encountered in a data clause and _out_ only the last time it is encountered in a data
clause. This allows you to create fully working directives within your functions and
then later _"hoist"_ the data movement to a higher level without changing your code
at all. This is part of incrementally accelerating your code to avoid incorrect results.
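
A minimal sketch of this hoisting pattern, assuming OpenACC 2.5 or later, where `copy` transfers data only if it is not already present on the device. The function names `add_one` and `add_many` are hypothetical.

```c
/* Self-contained routine: its copy clause makes it correct when it is
 * called on its own. Without an OpenACC compiler the pragmas are ignored. */
void add_one(float *a, int n)
{
    #pragma acc parallel loop copy(a[0:n])
    for (int i = 0; i < n; i++) {
        a[i] += 1.0f;
    }
}

/* Caller with the data movement hoisted to a higher level. */
void add_many(float *a, int n, int times)
{
    /* a is copied in once here and copied out once at the closing brace.
     * Inside the region the copy clause in add_one finds a already present
     * on the device and performs no transfer. */
    #pragma acc data copy(a[0:n])
    {
        for (int t = 0; t < times; t++) {
            add_one(a, n);
        }
    }
}
```

`add_one` did not have to change when the data region was added in `add_many`, so the optimization can be introduced step by step while the results stay correct.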
MODULE REVIEW
KEY CONCEPTS
In this module we discussed…
• Why explicit data management is necessary for best performance
• Structured and Unstructured Data Lifetimes
• Explicit and Implicit Data Regions
• The data, enter data, exit data, and update directives
• Data Clauses
LAB ASSIGNMENT
In this module’s lab you will…
• Update the code from the previous module to use explicit data directives
• Analyze the difference between using CUDA Managed Memory and explicit data
management in the lab code.
