Graphics Interoperability
Contents
5.1. Vertex Buffer Objects
5.2. N-Body Dynamics
CUDA and OpenGL (or Direct3D) can be used simultaneously by a single host program,
working on the same device data. Mapping graphics resources into the address space of
CUDA is known as graphics interoperability.
Data exchange between CUDA and OpenGL on the device (i.e., the GPU) is possible via
1. vertex buffer objects (VBOs),
2. frame buffer objects (FBOs), and
3. textures.
In the following, we concentrate on vertex buffer objects.
In order to use OpenGL interoperability, the device has to be selected first. In case of a
single GPU, the device number is 0.
#include <cuda_gl_interop.h>
...
// use CUDA device 0
// (note: cudaGLSetGLDevice() is marked as deprecated in recent CUDA versions)
cudaGLSetGLDevice(0);

// create the VBO in OpenGL
GLuint vbo = 0;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, ...);
glBindBuffer(GL_ARRAY_BUFFER, 0);

// register the VBO as a CUDA graphics resource
struct cudaGraphicsResource *vboCudaRes = 0;
cudaGraphicsGLRegisterBuffer(&vboCudaRes, vbo,
                             cudaGraphicsMapFlagsNone);
Similar functions are available for registering Direct3D resources. The following functions
are common to OpenGL and Direct3D interoperability.
The data of a registered VBO can be accessed by CUDA kernels like data in global device
memory via a device pointer.
size_t nBytes;
float4 *vboData;  // or any other type

// map the resource and obtain a device pointer to its data
cudaGraphicsMapResources(1, &vboCudaRes, 0);
cudaGraphicsResourceGetMappedPointer((void **) &vboData,
                                     &nBytes, vboCudaRes);
// vboData may be used by CUDA kernels only while the resource is mapped
cudaGraphicsUnmapResources(1, &vboCudaRes, 0);
The following kernel sets the positions of 2048 points, which are stored in a VBO, according
to time-dependent cosine functions with the coprime integer frequencies 3 and 4 (i.e., having
no common divisor other than 1).
CUDA provides fast intrinsic mathematical functions, such as __cosf(), which is used here
to compute the cosines. The intrinsics trade accuracy for speed compared to the standard
functions.
__global__
void setPositionsKernel(float4 *positions, float time,
                        float maxPosition) {
    unsigned int idx = blockIdx.x * blockDim.x
                       + threadIdx.x;
    unsigned int nParticles = gridDim.x * blockDim.x;
    positions[idx].x = maxPosition * 0.5f * (__cosf(3 *
        (3.14159f * idx / (float) nParticles + time))
        + 1.0f);
    positions[idx].y = maxPosition * 0.5f * (__cosf(4 *
        (3.14159f * idx / (float) nParticles + time))
        + 1.0f);
    positions[idx].z = maxPosition * idx /
        (float) nParticles;
}
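To check the kernel output, the same formula can be evaluated on the host. The following is a minimal C++ sketch, not part of the original example: the struct `Float4` and the function `referencePositions` are hypothetical names, and `std::cos` stands in for the `__cosf()` intrinsic, so the values agree with the GPU result only approximately.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical host-side stand-in for CUDA's float4 (not the CUDA type itself).
struct Float4 { float x, y, z, w; };

// CPU reference for the position formula used in setPositionsKernel.
// Assumption: one entry per thread, i.e. nParticles = gridDim.x * blockDim.x.
std::vector<Float4> referencePositions(unsigned int nParticles, float time,
                                       float maxPosition) {
    assert(nParticles > 0);
    // Value-initialized, so w stays 0; the kernel does not write w either.
    std::vector<Float4> positions(nParticles);
    for (unsigned int idx = 0; idx < nParticles; ++idx) {
        // Same phase as in the kernel: pi * idx / nParticles + time.
        float phase = 3.14159f * idx / (float) nParticles + time;
        positions[idx].x = maxPosition * 0.5f * (std::cos(3 * phase) + 1.0f);
        positions[idx].y = maxPosition * 0.5f * (std::cos(4 * phase) + 1.0f);
        positions[idx].z = maxPosition * idx / (float) nParticles;
    }
    return positions;
}
```

Calling `referencePositions(2048, time, maxPosition)` yields points on a Lissajous-like curve in x and y, spread linearly along z, with all x and y values inside [0, maxPosition].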
In the display function, the kernel is launched first to set the point positions. The final
glutPostRedisplay() requests an immediate redisplay to keep the animation running.
// host function (GLUT display callback), not a kernel
void display() {
    // launch kernel
    // clear OpenGL buffers, apply camera transformation
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glVertexPointer(3, GL_FLOAT, 4 * sizeof(float),
                    BUFFER_OFFSET(0));
    glEnableClientState(GL_VERTEX_ARRAY);
    glDrawArrays(GL_POINTS, 0, nParticles);
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glutSwapBuffers();
    glutPostRedisplay();
}
$m_1 = \dots = m_N = m$.
In other cases (such as structure formation), however, different masses are essential.
Considering the dynamics, it is easiest to treat the particles as point masses, thus
neglecting rigid body dynamics like particle rotation.
In the classical N-body problem, the particles are exposed to gravitational forces due
to body-body interactions,
$$\mathbf{f}_{ij} = G \, \frac{m_i m_j}{\lVert \mathbf{r}_{ij} \rVert^2} \cdot \frac{\mathbf{r}_{ij}}{\lVert \mathbf{r}_{ij} \rVert}.$$
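In order to avoid forces that grow without limit as bodies approach each other, a
softening factor $\varepsilon > 0$ is introduced,
$$\mathbf{F}_i \approx G \, m_i \sum_{j=1}^{N} \frac{m_j \, \mathbf{r}_{ij}}{\left( \lVert \mathbf{r}_{ij} \rVert^2 + \varepsilon^2 \right)^{3/2}}.$$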
Due to the softening factor, the condition $j \neq i$ on the summation index can be omitted:
for $\varepsilon > 0$ the denominator never vanishes, and $\mathbf{r}_{ii} = 0$ gives
$\mathbf{f}_{ii} = 0$.
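The softened sum can be sketched on the host as a plain all-pairs loop. This is an illustrative C++ version under stated assumptions, not the GPU implementation: `Vec3` and `softenedForces` are hypothetical names, and the separation vector is taken as r_ij = r_j - r_i (pointing from body i to body j), which is the convention used by Nyland et al.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Softened all-pairs gravitational force F_i on every body.
// Assumption: r_ij = r_j - r_i, so the force on i points towards j.
std::vector<Vec3> softenedForces(const std::vector<Vec3>& r,
                                 const std::vector<double>& m,
                                 double G, double eps) {
    assert(r.size() == m.size());
    assert(eps > 0.0);
    std::vector<Vec3> F(r.size(), Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < r.size(); ++i) {
        for (std::size_t j = 0; j < r.size(); ++j) {
            // No j != i test needed: for j == i the separation is zero,
            // the denominator is eps^3 > 0, and the term contributes 0.
            double dx = r[j].x - r[i].x;
            double dy = r[j].y - r[i].y;
            double dz = r[j].z - r[i].z;
            double d2 = dx * dx + dy * dy + dz * dz + eps * eps;
            double s = G * m[i] * m[j] / (d2 * std::sqrt(d2));  // (...)^(3/2)
            F[i].x += s * dx;
            F[i].y += s * dy;
            F[i].z += s * dz;
        }
    }
    return F;
}
```

Dropping the branch on j == i keeps the inner loop uniform, which matters on the GPU where divergent branches within a warp cost performance. The loop is O(N^2) in the number of bodies.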
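The dynamics of particle $i$ is governed by Newton's second law,
$$\mathbf{F}_i(t) = m_i \, \mathbf{a}_i(t).$$
If the velocities $\mathbf{v}_i$ are kept fixed, an Euler integration step with timestep $h$
advances the positions according to
$$\mathbf{r}_i(t + h) = \mathbf{r}_i(t) + h \, \mathbf{v}_i.$$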
The following kernel updates the positions of 2048 particles with random initial positions
and velocities within a box of fixed size. The particle positions are stored in a VBO that is
registered with CUDA, while the velocities are stored in a linear array in global device
memory.
__global__
void updatePositionsKernel(float4 *positions,
                           float *velocityPtr, float timeDiff,
                           float maxPosition) {
    unsigned int idx = blockIdx.x * blockDim.x
                       + threadIdx.x;
    float4 *velocities = (float4 *) velocityPtr;
    stepIntegration(&(positions[idx]), &(velocities[idx]),
                    timeDiff);
    applyReflectiveBoundaryConditions(&(positions[idx]),
                                      &(velocities[idx]), maxPosition);
}
The kernel calls two device functions for the Euler integration step and the application
of reflective boundary conditions, respectively. The latter ensure that the particles stay
inside the given box by mirroring their positions and velocities back into the box if they
have passed a boundary.
__device__
void stepIntegration(float4 *position, float4 *velocity,
                     float timeDiff) {
    position->x += timeDiff * velocity->x;
    position->y += timeDiff * velocity->y;
    position->z += timeDiff * velocity->z;
}

__device__
void applyReflectiveBoundaryConditions(float4 *position,
        float4 *velocity, float maxPosition) {
    if (position->x < 0.0f) {
        position->x *= -1;
        velocity->x *= -1;
    }
    else if (position->x > maxPosition) {
        position->x = maxPosition
                      - (position->x - maxPosition);
        velocity->x *= -1;
    }
    // same for y and z ...
}
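The reflection logic can be tried out on the host for a single coordinate. The following C++ sketch mirrors the x branch shown above; the helper name `reflectCoordinate` is hypothetical and not part of the original code.

```cpp
#include <cassert>

// Reflect one coordinate back into [0, maxPosition] and flip the velocity.
// Mirrors the x branch of applyReflectiveBoundaryConditions.
void reflectCoordinate(float& position, float& velocity, float maxPosition) {
    assert(maxPosition > 0.0f);
    if (position < 0.0f) {
        // Passed the lower wall: mirror at 0.
        position = -position;
        velocity = -velocity;
    } else if (position > maxPosition) {
        // Passed the upper wall: mirror at maxPosition.
        position = maxPosition - (position - maxPosition);
        velocity = -velocity;
    }
}
```

For example, a particle at position -0.25 with velocity -1 ends up at 0.25 with velocity +1. Note that a single mirroring only brings the particle back into the box if it has overshot the wall by less than the box size, so the displacement per time step should stay below maxPosition.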
The result consists of particles moving independently of each other with fixed velocities,
being reflected by the boundaries of the box.
Figure 5.2.: Particles moving independently of each other with fixed velocities.
When considering the body-body interactions in the dynamics, the Verlet or leapfrog
integration method is used for most applications, as it provides a reasonable
compromise between accuracy and computational efficiency.
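The leapfrog integration method with a timestep $h$ consists of two interleaved steps for the
velocity and the position, respectively. In its common half-step formulation, these read
$$\mathbf{v}_i(t + h/2) = \mathbf{v}_i(t - h/2) + h \, \mathbf{a}_i(t),$$
$$\mathbf{r}_i(t + h) = \mathbf{r}_i(t) + h \, \mathbf{v}_i(t + h/2).$$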
The accelerations $\mathbf{a}_i(t) = \ddot{\mathbf{r}}_i(t)$ are given by Eq. (5.1) above. The local errors of the
leapfrog method are of order $O(h^4)$ for the positions and $O(h^2)$ for the velocities. In
comparison, the local errors of the Euler method are of order $O(h^2)$ for the positions.
However, both methods are known to be numerically unstable, so global convergence over
long periods of time cannot be expected.
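The difference between the two integrators can be illustrated on a one-dimensional test problem. The following C++ sketch is an assumption-laden illustration, not code from the N-body example: it uses a harmonic oscillator with a(x) = -x instead of gravitational forces, and the names `State`, `acceleration`, `eulerIntegrate`, `leapfrogIntegrate`, and `energy` are hypothetical. The leapfrog variant uses the common half-step form, with an extra half step at the start and at the end to synchronize the velocity with the position.

```cpp
#include <cassert>
#include <cmath>

struct State { double x; double v; };

// Assumed test problem: harmonic oscillator, exact solution x(t) = cos(t)
// for x(0) = 1, v(0) = 0.
double acceleration(double x) { return -x; }

// Total energy of the oscillator; exactly conserved by the true dynamics.
double energy(const State& s) { return 0.5 * (s.x * s.x + s.v * s.v); }

// Explicit Euler: position and velocity are both advanced from the old state.
State eulerIntegrate(State s, double h, int nSteps) {
    assert(h > 0.0 && nSteps >= 0);
    for (int n = 0; n < nSteps; ++n) {
        double a = acceleration(s.x);
        s.x += h * s.v;  // r(t + h) = r(t) + h v(t)
        s.v += h * a;    // v(t + h) = v(t) + h a(t)
    }
    return s;
}

// Leapfrog: velocities live at half steps, positions at full steps.
State leapfrogIntegrate(State s, double h, int nSteps) {
    assert(h > 0.0 && nSteps >= 0);
    // Start-up half step: v(h/2) = v(0) + (h/2) a(0).
    double vHalf = s.v + 0.5 * h * acceleration(s.x);
    for (int n = 0; n < nSteps; ++n) {
        s.x += h * vHalf;                // r(t + h) = r(t) + h v(t + h/2)
        vHalf += h * acceleration(s.x);  // v(t + 3h/2) = v(t + h/2) + h a(t + h)
    }
    // Bring the velocity back to the same time as the position.
    s.v = vHalf - 0.5 * h * acceleration(s.x);
    return s;
}
```

Starting from `State{1.0, 0.0}` with h = 0.01 and 10000 steps, comparing `energy()` of both results shows the typical behavior: the Euler energy grows steadily with every step, while the leapfrog energy stays close to its initial value of 0.5 for this test problem.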
5.2.3. References
Nyland, L., Harris, M., Prins, J.: Fast N-Body Simulation with CUDA. In: Nguyen, H.
(ed.): GPU Gems 3, pp. 677–695 (Nguyen 2008).
Available as a whitepaper with the N-Body demo in the NVIDIA CUDA Samples.
Rapaport, D. C.: The Art of Molecular Dynamics Simulation. Cambridge University Press,
Cambridge, 2nd ed., 2004.