Module5
DATA MANAGEMENT
Dr. Volker Weinberg | LRZ
MODULE OVERVIEW
OpenACC Data Management
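copy( list ) Allocates memory on device, copies data from host to device when entering region, and copies data back to the host when exiting region.
Principal use: For many important data structures in your code, this is a logical default to input, modify and return the data.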
copyin( list ) Allocates memory on device and copies data from host to device when entering region.
Principal use: Think of this like an array that you would use as just an input to a subroutine.
copyout( list ) Allocates memory on device and copies data to the host when exiting region.
Principal use: A result that isn’t overwriting the input data structure.
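The three clauses can be combined on one structured data region. The following is a minimal sketch, assuming a hypothetical helper `scale_and_sum` (not part of the slides) in which `x` is input only, `y` is modified in place, and `z` is a pure result:

```c
#include <assert.h>

/* Hypothetical helper for illustration: computes y = alpha * y and z = x + y. */
void scale_and_sum(const float *x, float *y, float *z, int n, float alpha)
{
    assert(n > 0);

    /* x is only read on the device        -> copyin
       y is read and written on the device -> copy
       z is only written on the device     -> copyout */
    #pragma acc data copyin(x[0:n]) copy(y[0:n]) copyout(z[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; i++) {
            y[i] = alpha * y[i];
            z[i] = x[i] + y[i];
        }
    }
}
```

Without an OpenACC-enabled compiler the pragmas are ignored and the loop runs on the host with the same result.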
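Explicit:
#pragma acc data copyin(a[0:100])
{
    #pragma acc kernels
    {
        for( int i = 0; i < 100; i++ )
        {
            a[i] = 0;
        }
    }
}
Implicit:
#pragma acc kernels copyin(a[0:100])
{
    for( int i = 0; i < 100; i++ )
    {
        a[i] = 0;
    }
}
With a single compute region, as shown here, the two forms are functionally the same. The explicit form performs better once the data region encloses several compute regions, because the data is transferred once instead of once per region.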
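The advantage of an explicit data region shows up when it spans more than one compute region. The following is a minimal sketch, assuming a hypothetical function `fill_and_double`. The array stays on the device across both kernels regions and is copied back only once at the end:

```c
#include <assert.h>

/* Hypothetical example: two compute regions share one explicit data region. */
void fill_and_double(int *a, int n)
{
    assert(n > 0);

    /* a is written on the device and read on the host afterwards -> copyout. */
    #pragma acc data copyout(a[0:n])
    {
        #pragma acc kernels
        {
            for (int i = 0; i < n; i++) {
                a[i] = i;
            }
        }

        /* No transfer happens between the two regions. */
        #pragma acc kernels
        {
            for (int i = 0; i < n; i++) {
                a[i] = 2 * a[i];
            }
        }
    }
}
```

If each kernels region carried its own data clause instead, the array would be allocated and transferred separately for every region.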
UNSTRUCTURED DATA DIRECTIVES
Enter Data Directive
Data lifetimes aren’t always neatly structured.
The enter data directive handles device memory allocation.
You may use either the create or the copyin clause for memory allocation.
The enter data directive is not the start of a data region, because you may have multiple enter data directives.
C/C++:
    #pragma acc enter data clauses
    < Sequential and/or Parallel code >
    #pragma acc exit data clauses
Fortran:
    !$acc enter data clauses
    < Sequential and/or Parallel code >
    !$acc exit data clauses
copyin ( list ) Allocates memory on device and copies data from host to device on enter data.
copyout ( list ) Copies data from device back to the host and deallocates memory on device on exit data.
create ( list ) Allocates memory on device without data transfer on enter data.
delete ( list ) Deallocates memory on device without data transfer on exit data.
UNSTRUCTURED DATA DIRECTIVES
Basic Example
[Figure: contents of host and device memory (arrays A, B, C and the updated C’).]
UNSTRUCTURED DATA DIRECTIVES
Basic Example – proper memory deallocation
[Figure: contents of host and device memory (arrays A, B, C and the updated C’) after deallocation.]
UNSTRUCTURED VS STRUCTURED
With a simple code
Unstructured:
Can have multiple starting/ending points
Can branch across multiple functions
Memory exists until explicitly deallocated
#pragma acc enter data copyin(a[0:N],b[0:N]) \
    create(c[0:N])
#pragma acc parallel loop
for(int i = 0; i < N; i++){
    c[i] = a[i] + b[i];
}
#pragma acc exit data copyout(c[0:N]) \
    delete(a[0:N],b[0:N])
Structured:
Must have explicit start/end points
Must be within a single function
Memory only exists within the data region
#pragma acc data copyin(a[0:N],b[0:N]) \
    copyout(c[0:N])
{
    #pragma acc parallel loop
    for(int i = 0; i < N; i++){
        c[i] = a[i] + b[i];
    }
}
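Both styles can be written out as complete functions to compare them side by side. This is a minimal sketch, assuming hypothetical function names `add_unstructured` and `add_structured`. Both produce the same result and differ only in how the device lifetime of the arrays is expressed:

```c
#include <assert.h>

/* Unstructured: device data exists from enter data until exit data. */
void add_unstructured(const float *a, const float *b, float *c, int n)
{
    assert(n > 0);

    #pragma acc enter data copyin(a[0:n], b[0:n]) create(c[0:n])

    #pragma acc parallel loop present(a[0:n], b[0:n], c[0:n])
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }

    /* Copy the result back and free all three device arrays. */
    #pragma acc exit data copyout(c[0:n]) delete(a[0:n], b[0:n])
}

/* Structured: device data exists only inside the braces of the data region. */
void add_structured(const float *a, const float *b, float *c, int n)
{
    assert(n > 0);

    #pragma acc data copyin(a[0:n], b[0:n]) copyout(c[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }
}
```

In the unstructured version the enter data and exit data directives could equally sit in different functions, which the structured form does not allow.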
#pragma acc update self(A[0:N])
SYNCHRONIZE DATA WITH UPDATE
#include <stdlib.h>

int* allocate_array(int N){
    int* A=(int*) malloc(N*sizeof(int));
    /* Allocate the device copy; no data is transferred. */
    #pragma acc enter data create(A[0:N])
    return A;
}

void deallocate_array(int* A){
    /* Free the device copy before freeing the host copy. */
    #pragma acc exit data delete(A)
    free(A);
}

void initialize_array(int* A, int N){
    for(int i = 0; i < N; i++){
        A[i] = i;
    }
    /* Bring the device copy in sync with the host copy. */
    #pragma acc update device(A[0:N])
}

Inside the initialize function we alter the host copy of ‘A’.
This means that after calling initialize the host and device copy of ‘A’ are out-of-sync.
We use the update directive with the device clause to update the device copy of ‘A’.
Without the update directive, later compute regions will use incorrect data.
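The two directions of the update directive can be exercised in one short round trip. This is a minimal sketch, assuming a hypothetical function `sum_after_increment`. It initializes the array on the host, pushes it to the device with `update device`, modifies it there, and pulls it back with `update self` before summing on the host:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example: returns the sum of (i + 1) for i in [0, n), or -1 if allocation fails. */
int sum_after_increment(int n)
{
    assert(n > 0);

    int *A = (int *) malloc(n * sizeof(int));
    if (A == NULL) {
        return -1;
    }

    /* Allocate the device copy without transferring data. */
    #pragma acc enter data create(A[0:n])

    /* Host copy changes, so the device copy is now stale. */
    for (int i = 0; i < n; i++) {
        A[i] = i;
    }
    /* Host -> device. */
    #pragma acc update device(A[0:n])

    /* Device copy changes, so the host copy is now stale. */
    #pragma acc parallel loop present(A[0:n])
    for (int i = 0; i < n; i++) {
        A[i] += 1;
    }
    /* Device -> host. */
    #pragma acc update self(A[0:n])

    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += A[i];
    }

    #pragma acc exit data delete(A[0:n])
    free(A);
    return sum;
}
```

Leaving out either update would let the host or the device work on stale values when the code runs on an accelerator.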
COPYING DATA IN DATA REGIONS