CUDA OpenGL

CUDA and OpenGL can be used together to accelerate graphics applications. CUDA is used for calculations and data generation while OpenGL draws pixels or vertices to the screen. They share data through common memory in the framebuffer. The key steps are:
1. Create an OpenGL window and context
2. Generate OpenGL buffers and register them with CUDA
3. Map the OpenGL buffers into CUDA memory space for fast data transfer
4. Use CUDA to generate images/data and write to the mapped buffers
5. Draw with OpenGL using the updated buffer data
This allows CUDA to pre-process or generate graphics data very quickly, which OpenGL then displays on the screen.

Uploaded by

andizv

CUDA + OpenGL

• CUDA used for calculation, data generation, image manipulation
• OpenGL used to draw pixels or vertices on the screen
• Interop is very fast! They share data through common memory in the framebuffer

• CUDA C uses familiar C memory management techniques (malloc, pointers)
• OpenGL stores data in abstract generic buffers called buffer objects
• CUDA/OpenGL interop uses one simple concept:
  – Map/Unmap an OpenGL buffer into CUDA’s memory space

Setup Steps to OpenGL with CUDA

1 Create a window (OS specific)
2 Create a GL context (also OS specific)
3 Set up the GL viewport and coordinate system
4 Create the CUDA Context
5 Generate one or more GL buffers to be shared with CUDA
6 Register these buffers with CUDA

1) Creating the window

• Each OS does this differently. We’ll use Win32 for examples here:
  – CreateWindowEx() is the Win32 function to create a window. Returns an HWND.
  – Also need the window's HDC:
    HDC hDC;
    hDC=GetDC(hWnd);


1a) Set the Pixel Format for the Window

    static PIXELFORMATDESCRIPTOR pfd=
    {
        sizeof(PIXELFORMATDESCRIPTOR), // Size Of This Pixel Format Descriptor
        1,                             // Version Number
        PFD_DRAW_TO_WINDOW |           // Format Must Support Window
        PFD_SUPPORT_OPENGL |           // Format Must Support OpenGL
        PFD_DOUBLEBUFFER,              // Must Support Double Buffering
        PFD_TYPE_RGBA,                 // Request An RGBA Format
        8,                             // Select Our Color Depth, 8 bits / channel
        0, 0, 0, 0, 0, 0,              // Color Bits Ignored
        0,                             // No Alpha Buffer
        0,                             // Shift Bit Ignored
        0,                             // No Accumulation Buffer
        0, 0, 0, 0,                    // Accumulation Bits Ignored
        32,                            // 32 bit Z-Buffer (Depth Buffer)
        0,                             // No Stencil Buffer
        0,                             // No Auxiliary Buffer
        PFD_MAIN_PLANE,                // Main Drawing Layer
        0,                             // Reserved
        0, 0, 0                        // Layer Masks Ignored
    };
    GLuint PixelFormat;
    // choose the pixel format that best matches the descriptor
    PixelFormat=ChoosePixelFormat(hDC,&pfd);
    // set the pixel format for the window
    SetPixelFormat(hDC,PixelFormat,&pfd);

Note: Use the PFD_STEREO flag on NVIDIA Quadro cards for OpenGL Stereo Support!

2) Create the OpenGL Context

    // Create a WGL rendering context
    HGLRC hGLRC;
    hGLRC=wglCreateContext(hDC);
    // Activate the rendering context
    wglMakeCurrent(hDC,hGLRC);
    // Load the OpenGL extensions needed to support buffers
    glewInit();

Interested in off-screen rendering? Use GPU Affinity on NVIDIA Quadro cards to create an OpenGL context without a window.

3) Set Up Our Viewport

    // Set up which portion of the
    // window is being used
    glViewport(0, 0, width, height);
    // Just set up an orthographic system
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0,1.0f,0,1.0f,-1.0f,1.0f);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

[Figure: the resulting view volume, a box with corners (0,0,-1.0), (1.0,0,-1.0), (0,1.0,-1.0), (1.0,1.0,-1.0), (0,0,1.0), (1.0,0,1.0), (0,1.0,1.0), (1.0,1.0,1.0)]

More on OpenGL coordinates later; for now, we’ll just set up a simple orthographic view.

3) Set Up (Cont.)

• Enable depth testing
    glEnable(GL_DEPTH_TEST);
• Set the clear color and clear the viewport
    glClearColor(1.0f, 1.0f, 1.0f, 1.0f); // components are clamped to [0,1]
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
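
To make the glOrtho(0,1.0f,0,1.0f,-1.0f,1.0f) call from the viewport setup concrete, here is a minimal sketch of the per-axis mapping an orthographic projection performs on x and y. The function name ortho_axis_to_ndc is hypothetical and is not part of OpenGL; it only mirrors the arithmetic.

```c
#include <assert.h>

/* Map a coordinate v in [lo, hi] to normalized device coordinates in
   [-1, 1]. An orthographic projection applies this linear map to x
   (with left/right) and to y (with bottom/top).
   Hypothetical helper for illustration; not an OpenGL function. */
float ortho_axis_to_ndc(float v, float lo, float hi)
{
    assert(hi != lo); /* a zero-width range has no valid mapping */
    return 2.0f * (v - lo) / (hi - lo) - 1.0f;
}
```

With left = 0 and right = 1, the value 0 lands on the left edge of the viewport (-1), 0.5 on the center (0), and 1 on the right edge (+1), which is why the quad examples in this deck can use coordinates between 0 and 1.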

4) Create the CUDA Context

• OpenGL context must be created first
• To create the CUDA context:
  – Driver API: Use cuGLCtxCreate() instead of cuCtxCreate()
  – Runtime API: Call cudaGLSetGLDevice() before any other API calls
• CUDA/OpenGL interop functions defined in:
  – cudaGL.h for driver API
  – cuda_gl_interop.h in C Runtime for CUDA

5) Create OpenGL Buffer(s)

    GLuint bufferID;
    // Generate a buffer ID
    glGenBuffers(1,&bufferID);
    // Make this the current UNPACK buffer (OpenGL is state-based)
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, bufferID);
    // Allocate data for the buffer
    glBufferData(GL_PIXEL_UNPACK_BUFFER, width * height * 4,
                 NULL, GL_DYNAMIC_COPY);

6) Register Buffers for CUDA

• Driver API:
  – cuGLRegisterBufferObject( GLuint bufferobj );
  – Unregister before freeing buffer:
    cuGLUnregisterBufferObject( GLuint bufferobj );
• Runtime API:
  – cudaGLRegisterBufferObject( GLuint bufObj );
  – Unregister before freeing buffer:
    cudaGLUnregisterBufferObject( GLuint bufObj );

These commands simply inform the OpenGL and CUDA drivers that this buffer will be used by both.

Now, Let’s Actually Draw Something

• Common use case: drawing images
• Use Textures
• Textures are a ubiquitous feature of 3D graphics
• Simple case: Just draw a texture on a Quad

Textures

Steps to Draw an Image From CUDA
1 Allocate a GL buffer the size of the image
2 Allocate a GL texture the size of the image
3 Map the GL buffer to CUDA memory
4 Write the image from CUDA to the mapped memory
5 Unmap the GL buffer
6 Create the texture from the GL buffer
7 Draw a Quad, specify the texture coordinates for each corner
8 Swap front and back buffers to draw to the display

1) Allocate the GL Buffer

• Same as before, compute the number of bytes based upon the image data type (avoid 3 byte pixels)
• Do once at startup, don’t reallocate unless buffer needs to grow; this is expensive

    GLuint bufferID;
    // Generate a buffer ID
    glGenBuffers(1,&bufferID);
    // Make this the current UNPACK buffer (OpenGL is state-based)
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, bufferID);
    // Allocate data for the buffer. 4-channel 8-bit image
    glBufferData(GL_PIXEL_UNPACK_BUFFER, Width * Height * 4,
                 NULL, GL_DYNAMIC_COPY);
    cudaGLRegisterBufferObject( bufferID );

An OpenGL buffer used for pixels and bound as GL_PIXEL_UNPACK_BUFFER is commonly called a PBO (Pixel Buffer Object).

2) Create a GL Texture

    GLuint textureID;
    // Enable Texturing
    glEnable(GL_TEXTURE_2D);
    // Generate a texture ID
    glGenTextures(1,&textureID);
    // Make this the current texture (remember that GL is state-based)
    glBindTexture( GL_TEXTURE_2D, textureID);
    // Allocate the texture memory. The last parameter is NULL since we only
    // want to allocate memory, not initialize it
    glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA8, Width, Height, 0, GL_BGRA,
                  GL_UNSIGNED_BYTE, NULL);
    // Must set the filter mode, GL_LINEAR enables interpolation when scaling
    glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_MIN_FILTER,GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_MAG_FILTER,GL_LINEAR);

Note: GL_TEXTURE_RECTANGLE_ARB may be used instead of GL_TEXTURE_2D for improved performance if linear interpolation is not desired. Replace GL_LINEAR with GL_NEAREST in the glTexParameteri() call.
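
The Width * Height * 4 size passed to glBufferData when allocating the PBO can overflow for large images if computed carelessly. The following is a small sketch of a size helper; the name image_buffer_bytes and the choice to return 0 on overflow are assumptions made for illustration, not part of OpenGL or CUDA.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a tightly packed image of width x height pixels.
   Returns 0 if any argument is 0 or if the product would overflow
   size_t. Hypothetical helper for illustration. */
size_t image_buffer_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    if (width == 0 || height == 0 || bytes_per_pixel == 0)
        return 0;
    if (width > SIZE_MAX / height)
        return 0; /* width * height would overflow */
    size_t pixels = width * height;
    if (pixels > SIZE_MAX / bytes_per_pixel)
        return 0; /* pixels * bytes_per_pixel would overflow */
    return pixels * bytes_per_pixel;
}
```

For a 640 x 480 image with 4 bytes per pixel this yields 1,228,800 bytes, the value that would be handed to glBufferData.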

3) Map the GL Buffer to CUDA

• Provides a CUDA pointer to the GL buffer; on a single GPU no data is moved (Win & Linux)
• When mapped to CUDA, OpenGL should not use this buffer
• Driver API:
  – cuGLMapBufferObject( CUdeviceptr *dptr, unsigned int *size, GLuint bufferobj );
• C Runtime for CUDA:
  – cudaGLMapBufferObject(void **devPtr, GLuint bufObj);

4) Write to the Image

• CUDA C kernels may now use the mapped memory just like regular global memory (GMEM)
• CUDA copy functions can use the mapped memory as a source or destination
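
As a rough illustration of what gets written into the mapped buffer, the sketch below fills a 4-channel 8-bit image in the B, G, R, A byte order that matches GL_BGRA with GL_UNSIGNED_BYTE. It is a host-side C stand-in, not a CUDA kernel: in a kernel the two loops would typically be replaced by one thread per pixel writing through the mapped device pointer. The function name fill_gradient_bgra and the gradient pattern are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a tightly packed width x height BGRA image with a gradient:
   red increases left to right, green increases with the row index.
   Hypothetical host-side reference for the per-pixel work. */
void fill_gradient_bgra(uint8_t *pixels, int width, int height)
{
    /* Avoid dividing by zero for 1-pixel-wide or 1-pixel-tall images. */
    int x_span = (width > 1) ? width - 1 : 1;
    int y_span = (height > 1) ? height - 1 : 1;

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            uint8_t *p = pixels + ((size_t)y * (size_t)width + (size_t)x) * 4;
            p[0] = 0;                           /* blue  */
            p[1] = (uint8_t)(255 * y / y_span); /* green */
            p[2] = (uint8_t)(255 * x / x_span); /* red   */
            p[3] = 255;                         /* alpha */
        }
    }
}
```

The index expression (y * width + x) * 4 is the part that carries over unchanged to a kernel; only the source of x and y differs.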

5) Unmap the GL Buffer

• Driver API:
  – cuGLUnmapBufferObject( GLuint bufferobj );
• Runtime API:
  – cudaGLUnmapBufferObject( GLuint bufObj);

These functions wait for all previous GPU activity to complete (asynchronous versions also available).

6) Create a Texture From the Buffer

    // Select the appropriate buffer
    glBindBuffer( GL_PIXEL_UNPACK_BUFFER, bufferID);
    // Select the appropriate texture
    glBindTexture( GL_TEXTURE_2D, textureID);
    // Make a texture from the buffer
    glTexSubImage2D( GL_TEXTURE_2D, 0, 0, 0, Width, Height,
                     GL_BGRA, GL_UNSIGNED_BYTE, NULL);

The source parameter is NULL: the data is coming from a PBO, not host memory.

Note: glTexSubImage2D will perform a format conversion if the buffer is a different format from the texture. We created the texture with format GL_RGBA8. In glTexSubImage2D we specified GL_BGRA and GL_UNSIGNED_BYTE. This is a fast-path combination.

7) Draw the Image!

Just draw a single Quad with texture coordinates for each vertex:

    glBegin(GL_QUADS);
    glTexCoord2f(0,1.0f);
    glVertex3f(0,0,0);
    glTexCoord2f(0,0);
    glVertex3f(0,1.0f,0);
    glTexCoord2f(1.0f,0);
    glVertex3f(1.0f,1.0f,0);
    glTexCoord2f(1.0f,1.0f);
    glVertex3f(1.0f,0,0);
    glEnd();

8) Swap Buffers

Earlier we specified a double buffered pixel format (PFD_DOUBLEBUFFER). All drawing is done to an off-screen framebuffer. When finished just swap the front & back buffers.

    SwapBuffers(hDC);

Note: Buffer swapping normally occurs at the vertical refresh interval to avoid tearing (commonly 60 Hz). You can turn off v-sync in the control panel to make the swap instant (e.g., when benchmarking).

3D Geometry

The Camera Analogy
1. Position & Point the Camera at the Scene (View transform)
2. Arrange the scene composition (Model transform)
3. Adjust the camera zoom (Projection Transform)
4. Choose the final size (Viewport Transform)

See the OpenGL Red Book, page 106

Coordinate Matrices

OpenGL’s coordinate systems
• Model-View Matrix: defines the camera position and direction (alternatively the model’s position and orientation)
• Projection Matrix: defines the camera’s field-of-view and perspective

[Figure: View Frustum]

Matrices are states. Manipulating a matrix applies to subsequent calls.

Model-View Transform

• Select the Model-View Matrix
  – glMatrixMode(GL_MODELVIEW)
• Common Operations
  – glLoadIdentity() Resets the matrix
  – glRotatef()
  – glTranslatef()
  – glScalef()

Projection Transform

Just choose your lens!
• Select the projection Matrix
  – glMatrixMode(GL_PROJECTION)
• Useful Functions:
  – glLoadIdentity()
  – glOrtho()
  – glFrustum()
  – gluLookAt()
  – gluPerspective()
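
To show what "manipulating a matrix applies to subsequent calls" means numerically, here is a minimal sketch of glLoadIdentity() followed by glTranslatef() acting on a vertex. It relies on two facts about fixed-function OpenGL: matrices are 4x4 and stored column-major, and glTranslatef() multiplies the current matrix by a translation matrix on the right. The Mat4 type and the mat4_* function names are hypothetical; they are not OpenGL calls.

```c
/* 4x4 matrix in column-major order, as OpenGL stores it:
   element (row, col) lives at m[col * 4 + row]. Hypothetical type. */
typedef struct { float m[16]; } Mat4;

/* Counterpart of glLoadIdentity(): returns the identity matrix. */
Mat4 mat4_identity(void)
{
    Mat4 r = {{1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1}};
    return r;
}

/* Counterpart of glTranslatef(): returns a * T(tx, ty, tz).
   Only the fourth column of the product differs from a. */
Mat4 mat4_translate(Mat4 a, float tx, float ty, float tz)
{
    for (int row = 0; row < 4; ++row) {
        a.m[12 + row] += a.m[0 + row] * tx
                       + a.m[4 + row] * ty
                       + a.m[8 + row] * tz;
    }
    return a;
}

/* Apply the matrix to the point (in[0], in[1], in[2], 1). */
void mat4_transform_point(const Mat4 *a, const float in[3], float out[3])
{
    for (int row = 0; row < 3; ++row) {
        out[row] = a->m[0 + row] * in[0]
                 + a->m[4 + row] * in[1]
                 + a->m[8 + row] * in[2]
                 + a->m[12 + row];
    }
}
```

Translating the identity by (0.5, 0, -2) and applying the result to the point (1, 2, 3) gives (1.5, 2, 1); every vertex submitted after the translate call is shifted the same way.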

Drawing Simple Geometry

• glBegin() / glEnd(): lots of options:
  – GL_POINTS, GL_LINES, GL_LINE_STRIP, GL_LINE_LOOP, GL_TRIANGLES, GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN, GL_QUAD_STRIP, GL_POLYGON
• Use a glVertex*() function with glColor*(), glTexCoord*()
• Not very efficient, use only for simple geometry

Note: Many OpenGL functions, such as glVertex*(), actually refer to a group of functions with different parameter options, e.g., glVertex3f(), glVertex2f(), glVertex3i()…

Vertex Arrays

• Primitives are stored in an OpenGL buffer
  – Can be GL_POINTS, GL_LINES, GL_TRIANGLES, etc.
• Properties including Color, Texture Coordinates, Surface Normals can also be stored in the array
• glDrawArrays() is a very powerful mega-function; draws whatever is in the array to the screen
• Mapping the Vertex Buffer to CUDA allows arbitrary data creation or manipulation!

An OpenGL buffer used for vertices and bound as GL_ARRAY_BUFFER is commonly called a VBO (Vertex Buffer Object).

Using a Vertex Array With CUDA

1 Allocate the GL buffer for the Vertex array, Register it for CUDA
2 Use CUDA to create/manipulate the data
  • Map the GL Buffer to CUDA
  • Set the values for all vertices in the array
  • Unmap the GL Buffer
3 Use OpenGL to Draw the Vertex Data
  • Bind the buffer as the GL_ARRAY_BUFFER
  • Set the type and array pointers for the type of data in the array
  • Draw the array (glDrawArrays() )
4 Swap Buffers

1) Allocate & Register the Buffer

E.g., each vertex contains 3 floating point coordinates (x,y,z) and 4 color bytes (RGBA): total 16 bytes per vertex

    GLuint vertexArray;
    glGenBuffers( 1,&vertexArray);
    glBindBuffer( GL_ARRAY_BUFFER, vertexArray);
    glBufferData( GL_ARRAY_BUFFER, numVertices * 16, NULL,
                  GL_DYNAMIC_COPY );
    cudaGLRegisterBufferObject( vertexArray );

2) Use CUDA to Create the Data

    void * vertexPointer;
    // Map the buffer to CUDA
    cudaGLMapBufferObject(&vertexPointer, vertexArray);
    // Run a kernel to create/manipulate the data
    MakeVerticiesKernel<<<gridSz,blockSz>>>(vertexPointer,numVertices);
    // Unmap the buffer
    cudaGLUnmapBufferObject(vertexArray);

3) Use GL to Draw the Array

    // Bind the Buffer
    glBindBuffer( GL_ARRAY_BUFFER, vertexArray );
    // Enable Vertex and Color arrays
    glEnableClientState( GL_VERTEX_ARRAY );
    glEnableClientState( GL_COLOR_ARRAY );
    // Set the pointers to the vertices and colors.
    // With a buffer bound, the last argument is a byte offset into it.
    glVertexPointer(3,GL_FLOAT,16,(const GLvoid*)0);
    glColorPointer(4,GL_UNSIGNED_BYTE,16,(const GLvoid*)12);

This is how we tell OpenGL what type of data is in the buffer.

More on the Pointers

Each Vertex contains 3 coordinates + color:

    Field:        X    Y    Z    R G B A
    Byte offset:  0    4    8    12        (next vertex starts at 16)

glVertexPointer(3,GL_FLOAT,16,0)
  – 3 coordinates/vertex
  – float type
  – 16 bytes between elements
  – coordinates start at byte offset 0

glColorPointer(4,GL_UNSIGNED_BYTE,16,12)
  – 4 channels
  – 8-bit type
  – 16 bytes between elements
  – colors start at byte offset 12

Final Step to Draw

    glDrawArrays(GL_POINTS,0, numVertices);

  – Can also use: GL_LINES, GL_LINE_STRIP, GL_LINE_LOOP, GL_TRIANGLES, GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN, GL_QUADS, GL_QUAD_STRIP, GL_POLYGON

    SwapBuffers(hDC);

[Figure: NVIDIA APEX Turbulence Demo. CUDA fluid simulation creates particles which are rendered in OpenGL.]
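
The stride of 16 and the color offset of 12 used in glVertexPointer and glColorPointer can be derived from a struct instead of being hard-coded. The sketch below assumes a 4-byte float, which holds on the platforms CUDA targets; the ColoredVertex type and the helper names are hypothetical. fill_diagonal is a host-side stand-in for the kind of per-vertex work a kernel such as the deck's MakeVerticiesKernel might do, whose real contents the slides do not show.

```c
#include <stddef.h>
#include <stdint.h>

/* One vertex as laid out in the buffer: 3 floats then 4 color bytes.
   Hypothetical type; assumes sizeof(float) == 4. */
typedef struct {
    float   x, y, z;
    uint8_t r, g, b, a;
} ColoredVertex;

/* Stride argument for glVertexPointer / glColorPointer. */
size_t vertex_stride(void) { return sizeof(ColoredVertex); }

/* Byte offset of the color within a vertex, for glColorPointer. */
size_t color_offset(void) { return offsetof(ColoredVertex, r); }

/* Place n opaque red points evenly along the diagonal from (0,0,0)
   to (1,1,0). Host-side reference for illustration only. */
void fill_diagonal(ColoredVertex *v, int n)
{
    for (int i = 0; i < n; ++i) {
        float t = (n > 1) ? (float)i / (float)(n - 1) : 0.0f;
        v[i].x = t;
        v[i].y = t;
        v[i].z = 0.0f;
        v[i].r = 255;
        v[i].g = 0;
        v[i].b = 0;
        v[i].a = 255;
    }
}
```

Deriving the two numbers from the struct keeps the buffer allocation (numVertices * stride) and the pointer setup consistent if a field is later added to the vertex.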
