
5. Graphics Interoperability

Contents
5.1. Vertex Buffer Objects
5.2. N-Body Dynamics

CUDA and OpenGL (or Direct3D) can be used simultaneously by a single host program,
working on the same device data. Mapping graphics resources into the address space of
CUDA is known as graphics interoperability.
Data exchange between CUDA and OpenGL on the device (i.e., the GPU) is possible via
1. vertex buffer objects (VBOs),
2. frame buffer objects (FBOs), and
3. textures.
In the following, we concentrate on vertex buffer objects.

5.1. Vertex Buffer Objects

5.1.1. Registering OpenGL VBOs

In order to use OpenGL interoperability, the device has to be selected first. In case of a
single GPU, the device number is 0.
#include <cuda_gl_interop.h>
...
// use CUDA device 0
cudaGLSetGLDevice(0);

An OpenGL VBO is registered with CUDA via cudaGraphicsGLRegisterBuffer(). The
third argument of this function can be used to indicate read-only or write-discard access
by CUDA (cudaGraphicsRegisterFlagsReadOnly or cudaGraphicsRegisterFlagsWriteDiscard,
cf. CUDA API documentation for details). A registered VBO is known as a
CUDA graphics resource; all required information is stored in a structure of the same
name.

Frauke Sprengel, Visual Computing – GPU Computing 59



GLuint vbo = 0;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, ...);
glBindBuffer(GL_ARRAY_BUFFER, 0);

struct cudaGraphicsResource *vboCudaRes = 0;
// register flags (not map flags) belong here; "None" means read-write access
cudaGraphicsGLRegisterBuffer(&vboCudaRes, vbo,
                             cudaGraphicsRegisterFlagsNone);

Similar functions are available for registering Direct3D resources. The following functions
are common for OpenGL and Direct3D interoperability.

5.1.2. Mapping CUDA Graphics Resources

The data of a registered VBO can be accessed by CUDA kernels like data in global device
memory via a device pointer.
size_t nBytes;
float4 *vboData; // or any other type
cudaGraphicsMapResources(1, &vboCudaRes, 0);
cudaGraphicsResourceGetMappedPointer((void **)&vboData,
                                     &nBytes, vboCudaRes);

myKernel<<<...>>>(vboData, ...);

cudaGraphicsUnmapResources(1, &vboCudaRes, 0);

In order to get the device pointer,
• the CUDA graphics resource has to be mapped via cudaGraphicsMapResources() and
• the pointer has to be queried via cudaGraphicsResourceGetMappedPointer()
before launching the kernel.
Afterwards,
• the CUDA resource has to be unmapped via cudaGraphicsUnmapResources(), which
acts as a synchronizing barrier to guarantee that all CUDA operations on the VBO
are finished before any subsequent graphics operations.
This code could, e.g., form the first section of the GLUT display() function, with the second
section containing the usual OpenGL rendering commands. We will illustrate this with a
simple example.


5.1.3. Example: Setting Point Positions

The following kernel sets the positions of 2048 points, which are stored in a VBO, according
to time-dependent cosine functions with the coprime integer frequencies 3 and 4 (i.e., having
no common divisor other than 1).
CUDA provides fast intrinsic mathematical functions, such as __cosf(), which is used here
to compute the cosines at the cost of reduced accuracy.
__global__
void setPositionsKernel(float4 *positions, float time,
                        float maxPosition) {
    unsigned int idx = blockIdx.x * blockDim.x
                     + threadIdx.x;
    unsigned int nParticles = gridDim.x * blockDim.x;
    positions[idx].x = maxPosition * 0.5f * (__cosf(3 *
        (3.14159f * idx / (float) nParticles + time))
        + 1.0f);
    positions[idx].y = maxPosition * 0.5f * (__cosf(4 *
        (3.14159f * idx / (float) nParticles + time))
        + 1.0f);
    positions[idx].z = maxPosition * idx /
        (float) nParticles;
}
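To check the kernel output without a GPU, the same formulas can be evaluated on the host. The following is a minimal sketch in plain C++, not part of the original example: the names Float4, referencePosition and referencePositions are hypothetical, std::cos replaces the __cosf() intrinsic (so results agree only up to the intrinsic's reduced accuracy), and setting w to 1 is an assumption, since the kernel does not write w at all.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical host-side stand-in for CUDA's float4.
struct Float4 { float x, y, z, w; };

// Computes what one thread of setPositionsKernel would write for index idx.
Float4 referencePosition(unsigned int idx, unsigned int nParticles,
                         float time, float maxPosition) {
    assert(nParticles > 0);
    // Common phase term shared by the x and y components.
    const float phase = 3.14159f * idx / static_cast<float>(nParticles) + time;
    Float4 p;
    p.x = maxPosition * 0.5f * (std::cos(3.0f * phase) + 1.0f);
    p.y = maxPosition * 0.5f * (std::cos(4.0f * phase) + 1.0f);
    p.z = maxPosition * idx / static_cast<float>(nParticles);
    p.w = 1.0f;  // assumption: the kernel leaves w untouched
    return p;
}

// Evaluates all positions, i.e. the whole grid of threads, sequentially.
std::vector<Float4> referencePositions(unsigned int nParticles, float time,
                                       float maxPosition) {
    std::vector<Float4> positions;
    positions.reserve(nParticles);
    for (unsigned int idx = 0; idx < nParticles; ++idx) {
        positions.push_back(referencePosition(idx, nParticles, time, maxPosition));
    }
    return positions;
}
```

Calling referencePositions(2048, time, maxPosition) and comparing against a copy of the mapped VBO contents is one way to validate the kernel; all coordinates should lie between 0 and maxPosition.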

In the display function, the kernel is launched first to set the point positions. The final
glutPostRedisplay() requests another redisplay to keep the animation running. Note that
display() is an ordinary host function and therefore carries no __global__ qualifier.
void display() {
    // launch kernel (map the VBO, launch, unmap; see Section 5.1.2)
    // clear OpenGL buffers, apply camera transformation

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glVertexPointer(3, GL_FLOAT, 4 * sizeof(float),
                    BUFFER_OFFSET(0));
    glEnableClientState(GL_VERTEX_ARRAY);
    glDrawArrays(GL_POINTS, 0, nParticles);
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBuffer(GL_ARRAY_BUFFER, 0);

    glutSwapBuffers();
    glutPostRedisplay();
}


Figure 5.1.: The result is a Lissajous figure moving in space.

5.2. N-Body Dynamics


As an example of OpenGL interoperability with CUDA, we will study N-body dynamics,
which has numerous applications, e.g.:
• Astrophysics: structure formation in galaxies and planetary rings
• Chemistry and biology: molecular dynamics simulation of gases and fluids, crystal
growth, protein folding
• Physics: simulation of granular material (such as sand or grain), pattern formation
in sand (ripples, dunes) or fluids (heat convection)
• Graph drawing: force-directed layout (node repulsion)
• Computer graphics: particle systems

5.2.1. Mathematical Modeling

Given are $N$ bodies (or particles) with
• masses $m_i \in \mathbb{R}$,
• positions $\mathbf{r}_i(t) \in \mathbb{R}^3$,
• velocities $\mathbf{v}_i(t) = \dot{\mathbf{r}}_i(t) \in \mathbb{R}^3$
for time $t$ and $i \in \{1, \dots, N\}$.


In certain applications (such as molecular dynamics), the masses can be set equal,

$$m_1 = \dots = m_N = m.$$

In other cases (such as structure formation), however, different masses are essential.
Considering the dynamics, it is easiest to treat the particles as point masses, thus
neglecting rigid body dynamics like particle rotation.
In the classical N-body problem, the particles are exposed to gravitational forces due
to body-body interactions,
$$\mathbf{f}_{ij} = G \, \frac{m_i m_j}{\|\mathbf{r}_{ij}\|^2} \cdot \frac{\mathbf{r}_{ij}}{\|\mathbf{r}_{ij}\|},$$

where $\mathbf{r}_{ij} = \mathbf{r}_j - \mathbf{r}_i$ is the difference vector between the positions of particles $i$ and $j$
and
$$G \approx 6.67 \cdot 10^{-11} \, \mathrm{m^3 \, kg^{-1} \, s^{-2}}$$
is the gravitational constant.
The total force on particle i is obtained by summing all interactions with the other N − 1
bodies,
$$\mathbf{F}_i = \sum_{j \neq i} \mathbf{f}_{ij} = G \, m_i \sum_{j \neq i} \frac{m_j \, \mathbf{r}_{ij}}{\|\mathbf{r}_{ij}\|^3}.$$

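In order to avoid forces that grow without limit as bodies approach each other, a
softening factor $\varepsilon > 0$ is introduced,
$$\mathbf{F}_i \approx G \, m_i \sum_{j=1}^{N} \frac{m_j \, \mathbf{r}_{ij}}{\left( \|\mathbf{r}_{ij}\|^2 + \varepsilon^2 \right)^{3/2}}.$$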
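Due to the softening factor, the condition $j \neq i$ in the sum index can be omitted: since
$\mathbf{r}_{ii} = \mathbf{0}$, the numerator vanishes while the denominator $\varepsilon^3$ stays positive, so $\mathbf{f}_{ii} = \mathbf{0}$ for $\varepsilon > 0$.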
The dynamics of particle $i$ is governed by Newton's second law,

$$\mathbf{F}_i(t) = m_i \, \mathbf{a}_i(t),$$

where $\mathbf{a}_i(t) = \ddot{\mathbf{r}}_i(t) \in \mathbb{R}^3$ is the acceleration.


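Solving for the acceleration, we obtain the ordinary differential equation (ODE)
$$\ddot{\mathbf{r}}_i(t) = \frac{\mathbf{F}_i(t)}{m_i} \approx G \sum_{j=1}^{N} \frac{m_j \, \mathbf{r}_{ij}(t)}{\left( \|\mathbf{r}_{ij}(t)\|^2 + \varepsilon^2 \right)^{3/2}}. \tag{5.1}$$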
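The right-hand side of Eq. (5.1) translates directly into a double loop over all particles. The following host-side C++ sketch is only an illustration under stated assumptions: the names Vec3 and accelerations are hypothetical, and G and the softening factor are passed in as parameters rather than fixed. The sum runs over all j including j = i, relying on the softening to make the self-interaction vanish.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 3D vector type for positions and accelerations.
struct Vec3 { double x, y, z; };

// Direct O(N^2) evaluation of the softened accelerations of Eq. (5.1).
std::vector<Vec3> accelerations(const std::vector<Vec3>& r,
                                const std::vector<double>& m,
                                double G, double eps) {
    assert(r.size() == m.size());
    assert(eps > 0.0);  // softening is required because j == i is not skipped
    const std::size_t n = r.size();
    std::vector<Vec3> a(n, Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // r_ij = r_j - r_i points from particle i to particle j.
            const double dx = r[j].x - r[i].x;
            const double dy = r[j].y - r[i].y;
            const double dz = r[j].z - r[i].z;
            const double d2 = dx * dx + dy * dy + dz * dz + eps * eps;
            // G * m_j / (|r_ij|^2 + eps^2)^(3/2)
            const double s = G * m[j] / (d2 * std::sqrt(d2));
            a[i].x += s * dx;
            a[i].y += s * dy;
            a[i].z += s * dz;
        }
    }
    return a;
}
```

For two bodies at distance 1 with a small softening factor, the accelerations are approximately G times the mass of the other body, pointing towards each other, and the mass-weighted accelerations cancel.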


5.2.2. Computer Simulation

The N-body problem requires the computation of

$$\binom{N}{2} = \frac{N (N - 1)}{2} = O(N^2)$$

body-body interactions, making the simulation computationally expensive for large $N$
(which is the typical case).
Setting a cut-off for the interaction over large distances $\|\mathbf{r}_{ij}\|$ can reduce the number
of interactions. For a CUDA implementation, memory bandwidth is an
additional issue, which has to be addressed by working on blocks (or tiles) of particles in
shared memory.
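The tiling idea can be sketched on the host: the inner loop over j is split into tiles, and each tile is first copied into a small buffer that stands in for shared memory, so that every particle in the tile is loaded once and reused for all i. This is a minimal C++ illustration, not a CUDA implementation; the names Body, kTile, addInteraction and tiledAccelerations are hypothetical, and G is set to 1 for brevity.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Position and mass in one record, like a float4 with w holding the mass.
struct Body { double x, y, z, m; };

// Hypothetical tile size; on the GPU this would be the thread-block size.
constexpr std::size_t kTile = 4;

// Adds the softened pull of body b on a body located at p to acc (G = 1).
inline void addInteraction(const Body& p, const Body& b, double eps2, double acc[3]) {
    const double dx = b.x - p.x, dy = b.y - p.y, dz = b.z - p.z;
    const double d2 = dx * dx + dy * dy + dz * dz + eps2;
    const double s = b.m / (d2 * std::sqrt(d2));
    acc[0] += s * dx;
    acc[1] += s * dy;
    acc[2] += s * dz;
}

// Evaluates all accelerations tile by tile.
std::vector<std::array<double, 3>> tiledAccelerations(const std::vector<Body>& bodies,
                                                      double eps) {
    assert(eps > 0.0);
    const std::size_t n = bodies.size();
    std::vector<std::array<double, 3>> acc(n);  // value-initialized to zero
    std::array<Body, kTile> tile;               // stands in for __shared__ memory
    for (std::size_t start = 0; start < n; start += kTile) {
        const std::size_t count = std::min(kTile, n - start);
        // One load per tile; on the GPU each thread would load one body.
        std::copy_n(bodies.begin() + start, count, tile.begin());
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t k = 0; k < count; ++k) {
                addInteraction(bodies[i], tile[k], eps * eps, acc[i].data());
            }
        }
    }
    return acc;
}
```

The result is the same as for an untiled double loop; only the order of memory accesses changes. In a CUDA kernel, the copy into the tile would be followed by a block-wide synchronization before the tile is used.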
In order to learn how CUDA can be used to integrate the ODE, we start with the simple
case of fixed velocities $\mathbf{v}_i$.
The familiar forward Euler integration method with a time step $h$ gives

$$\mathbf{r}_i(t + h) = \mathbf{r}_i(t) + h \, \mathbf{v}_i.$$

The following kernel updates the positions of 2048 particles with random initial positions
and velocities within a box of fixed size. The particle positions are stored in a VBO that is
registered with CUDA, while the velocities are stored in an array in global device
memory.
__global__
void updatePositionsKernel(float4 *positions,
                           float *velocityPtr, float timeDiff,
                           float maxPosition) {
    unsigned int idx = blockIdx.x * blockDim.x
                     + threadIdx.x;
    float4 *velocities = (float4 *) velocityPtr;
    stepIntegration(&(positions[idx]), &(velocities[idx]),
                    timeDiff);
    applyReflectiveBoundaryConditions(&(positions[idx]),
                    &(velocities[idx]), maxPosition);
}

The kernel calls two device functions for the Euler integration step and the application
of reflective boundary conditions, respectively. The latter ensure that the particles stay
inside the given box by mirroring their positions and velocities back into the box if they
have passed a boundary.
__device__
void stepIntegration(float4 *position, float4 *velocity,
                     float timeDiff) {
    position->x += timeDiff * velocity->x;
    position->y += timeDiff * velocity->y;
    position->z += timeDiff * velocity->z;
}

__device__
void applyReflectiveBoundaryConditions(float4 *position,
        float4 *velocity, float maxPosition) {
    if (position->x < 0.0f) {
        position->x *= -1;
        velocity->x *= -1;
    }
    else if (position->x > maxPosition) {
        position->x = maxPosition
                    - (position->x - maxPosition);
        velocity->x *= -1;
    }
    // same for y and z ...
}
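The two device functions can be tried out on the host with a single particle. The sketch below is a plain C++ rendering under stated assumptions: Float4, stepIntegrationHost, reflectAxis and applyReflectiveBoundaryConditionsHost are hypothetical names, and the y and z cases, which the original leaves as a comment, are assumed to mirror the x case exactly. A single reflection per step is only sufficient if a particle travels less than the box size in one time step.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for CUDA's float4.
struct Float4 { float x, y, z, w; };

// Forward Euler step with fixed velocity: r(t + h) = r(t) + h * v.
void stepIntegrationHost(Float4& position, const Float4& velocity, float timeDiff) {
    position.x += timeDiff * velocity.x;
    position.y += timeDiff * velocity.y;
    position.z += timeDiff * velocity.z;
}

// Mirrors one coordinate back into [0, maxPosition] and flips its velocity.
void reflectAxis(float& p, float& v, float maxPosition) {
    if (p < 0.0f) {
        p = -p;
        v = -v;
    } else if (p > maxPosition) {
        p = maxPosition - (p - maxPosition);
        v = -v;
    }
}

// Applies the reflection to all three axes (assumed identical for x, y, z).
void applyReflectiveBoundaryConditionsHost(Float4& position, Float4& velocity,
                                           float maxPosition) {
    assert(maxPosition > 0.0f);
    reflectAxis(position.x, velocity.x, maxPosition);
    reflectAxis(position.y, velocity.y, maxPosition);
    reflectAxis(position.z, velocity.z, maxPosition);
}
```

As a worked case, a particle at x = 0.9 with velocity 2 in a box of size 1 moves to 1.1 after a step of 0.1, is mirrored back to about 0.9, and continues with velocity -2.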

The result consists of particles moving independently of each other with fixed velocities,
being reflected by the boundaries of the box.

Figure 5.2.: Particles moving independently of each other with fixed velocities.

When considering the body-body interactions in the dynamics, the Verlet or leapfrog
integration method is used for most applications, as it provides a reasonable
compromise between accuracy and computational efficiency. (The Verlet method is named
after Loup Verlet, French physicist, born 1931.)
The leapfrog integration method with a time step $h$ consists of two interleaved steps for the
velocity and the position, respectively,

$$\mathbf{v}_i(t + h/2) = \mathbf{v}_i(t - h/2) + h \, \mathbf{a}_i(t),$$

$$\mathbf{r}_i(t + h) = \mathbf{r}_i(t) + h \, \mathbf{v}_i(t + h/2).$$

If the velocity at time $t$ is required, it can be calculated in an additional half-step,

$$\mathbf{v}_i(t) = \mathbf{v}_i(t - h/2) + \frac{h}{2} \, \mathbf{a}_i(t).$$

The accelerations $\mathbf{a}_i(t) = \ddot{\mathbf{r}}_i(t)$ are given by Eq. (5.1) above. The local errors of the
leapfrog method are of order $O(h^4)$ for the positions and $O(h^2)$ for the velocities. In
comparison, the local errors of the Euler method are of order $O(h^2)$ for the positions.
However, neither method is unconditionally stable, and errors accumulate from step to step,
so you cannot expect global convergence over long periods of time.
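The difference between the two schemes can be observed on a small test problem. The sketch below uses a one-dimensional unit harmonic oscillator with acceleration a(r) = -r, which is an assumption made purely for illustration and not the N-body force of Eq. (5.1); the names State, accel, energy, eulerRun and leapfrogRun are hypothetical. The leapfrog start-up value v(-h/2) is approximated by a backward half-step, and the final half-step formula from above synchronizes the velocity with the position.

```cpp
#include <cassert>
#include <cmath>

// Position and velocity of a single particle in one dimension.
struct State { double r, v; };

// Hypothetical test force: unit harmonic oscillator.
double accel(double r) { return -r; }

// Total energy of the oscillator; exactly conserved by the true solution.
double energy(State s) { return 0.5 * (s.v * s.v + s.r * s.r); }

// Forward Euler: position and velocity are both advanced from the old state.
State eulerRun(State s, double h, int steps) {
    assert(h > 0.0 && steps >= 0);
    for (int n = 0; n < steps; ++n) {
        const double a = accel(s.r);
        s.r += h * s.v;
        s.v += h * a;
    }
    return s;
}

// Leapfrog: velocities live at half steps, positions at full steps.
State leapfrogRun(State s, double h, int steps) {
    assert(h > 0.0 && steps >= 0);
    // Start-up: v(-h/2) is approximated as v(0) - (h/2) a(0).
    double vHalf = s.v - 0.5 * h * accel(s.r);
    for (int n = 0; n < steps; ++n) {
        vHalf += h * accel(s.r);  // v(t + h/2) = v(t - h/2) + h a(t)
        s.r += h * vHalf;         // r(t + h)   = r(t) + h v(t + h/2)
    }
    // Synchronize: v(t) = v(t - h/2) + (h/2) a(t).
    s.v = vHalf + 0.5 * h * accel(s.r);
    return s;
}
```

Starting from r = 1, v = 0 with h = 0.01 and 10000 steps, the Euler energy grows from 0.5 to roughly 1.36, because each Euler step multiplies the energy by 1 + h^2, while the leapfrog energy stays within a small band around 0.5.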

5.2.3. References

Nyland, L., Harris, M., Prins, J.: Fast N-Body Simulation with CUDA. In: Nguyen, H.
(ed.): GPU Gems 3, pp. 677–695 (Nguyen 2008).
Available as whitepaper of the N-Body demo in the NVIDIA CUDA Samples.
Rapaport, D. C.: The Art of Molecular Dynamics Simulation. Cambridge University Press,
Cambridge, 2nd ed., 2004.
