CUDA Programming Within Mathematica
Introduction
CUDA is short for Compute Unified Device Architecture. NVIDIA developed it to program graphics cards, letting you run C-like code on the graphics processing unit (GPU). CUDA promises significant performance increases, and CUDA-designed programs scale to multicore systems. Given CUDA's advanced technology, you might expect programming it to be enormously complicated. Enter Mathematica, the easiest way to program for CUDA and unlock the GPU's performance potential. Unlike programming in C or developing CUDA wrapper code, you don't have to be a programming wizard to use CUDA. Mathematica offers an intuitive environment, featuring built-in, ready-to-use examples for common application areas such as image processing, medical imaging, statistics, and finance, that makes CUDA programming a breeze, even if you've never used Mathematica before. And if you have used Mathematica, you'll be amazed by the massive boost in computational power and application performance, with speedups easily exceeding 100x. In this document, we describe the benefits of CUDA integration in Mathematica and present some applications for which it is well suited, such as image processing and financial engineering.
With Mathematica, memory and thread management for the GPU is completely automatic, eliminating the need to write extra code:
Memory between the host and the device is synchronized only as needed, avoiding data transfers (which are known to be very costly) unless they are absolutely necessary. For advanced applications, full control over how and when memory is copied between the host and GPU devices is provided.
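For the explicit-control case, a minimal sketch using CUDALink's memory functions (assuming the package has been loaded with Needs) might look like the following:

```mathematica
Needs["CUDALink`"]

(* copy a list into GPU memory explicitly *)
mem = CUDAMemoryLoad[Range[100]];

(* ... CUDA functions can now operate on mem without intermediate host transfers ... *)

(* copy the data back into Mathematica, then free the GPU memory *)
result = CUDAMemoryGet[mem];
CUDAMemoryUnload[mem]
```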
CUDA integration provides full access to Mathematica's native language and built-in functions. It also provides free exchange of data between Mathematica and users' CUDA programs.
With Mathematica's comprehensive symbolic and numerical functions, built-in application area support, and graphical interface building functions, users can not only combine the power of Mathematica and GPU computing, but also spend more time on developing and optimizing their core CUDA kernel algorithms.
Ready-to-use applications
CUDA integration in Mathematica provides several ready-to-use CUDA functions that cover a broad range of topics such as mathematics, image processing, financial engineering, and more. Examples will be given in Section 6.
Performance Improvements
GPUs have featured many cores for quite some time now. A current example is the NVIDIA GeForce GTX 480, a consumer card with 480 CUDA cores, double-precision support, and a retail price of less than $500. Mathematica provides support for multicore systems via built-in functions such as Parallelize and ParallelMap, which free you from the cumbersome details of launching multiple kernels and processes and let you concentrate on the core implementation of your algorithms. This provides considerable speedups in many areas, for example:
On an NVIDIA GT 240 paired with an Intel Core i7 950, FinancialDerivative shows improvements of up to 35x for binomial methods and 80x for Monte Carlo methods. Several random number generation algorithms have been implemented, with observed speedups of 50x. Volumetric rendering, implicit algebraic functions, and 3D fractals can be computed in real time at a high frame rate.
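As a simple illustration of the multicore functions mentioned above (this is a CPU-parallel example, not a CUDA one), ParallelMap distributes a computation across the available Mathematica kernels with no explicit process management:

```mathematica
(* test a batch of large numbers for primality, one kernel per chunk *)
ParallelMap[PrimeQ, Range[10^8, 10^8 + 20]]
```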
System Requirements
To use Mathematica's CUDALink, the following operating systems and software are required:

Operating system: Windows, Linux, or Mac OS X, in both 32- and 64-bit architectures (Mac OS X users need at least Mac OS X 10.6.3)
An NVIDIA CUDA-enabled product; see www.nvidia.com/object/cuda_gpus.html for further information
Mathematica 8.0 or higher
CUDA Toolkit 3.1 or higher
A recent NVIDIA driver
On Windows, Microsoft Visual Studio 2005 or higher
The following function verifies that the system has CUDA support:
CUDAQ[]
True
Now, we will create a simple example that negates colors of a 3-channel image. First, write a CUDA kernel function as a string, and assign it to a variable:
kernel = "
  __global__ void cudaColorNegate(int *img, int *dim, int channels)
  {
      int width = dim[0], height = dim[1];
      int xIndex = threadIdx.x + blockIdx.x * blockDim.x;
      int yIndex = threadIdx.y + blockIdx.y * blockDim.y;
      int index = channels * (xIndex + yIndex * width);
      if (xIndex < width && yIndex < height)
      {
          for (int c = 0; c < channels; c++)
              img[index + c] = 255 - img[index + c];
      }
  }";
Pass that string to the built-in function CUDAFunctionLoad, along with the kernel function name and the argument specification. The last argument denotes the dimensions of the thread block to be launched.
colorNegate = CUDAFunctionLoad[kernel, "cudaColorNegate",
  {{_Integer, _, "InputOutput"}, {_Integer, _, "Input"}, _Integer}, {16, 16}];
Several things are happening at this stage. Mathematica automatically compiles the kernel function as a dynamic library. There is no need for users to add system interface or memory management code. After compilation, the function is automatically bound to Mathematica and is ready to be called. Now you can apply this new CUDA function to any image format that Mathematica can handle.
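Assuming an image has been imported into a variable, the loaded function can be applied directly. A sketch (the argument order follows the specification passed to CUDAFunctionLoad above; the file name is illustrative):

```mathematica
i = Import["sample.jpg"];  (* hypothetical input image *)

(* pass the image, its dimensions, and its channel count to the kernel *)
colorNegate[i, ImageDimensions[i], ImageChannels[i]]
```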
CUDAInformation generates a detailed report on supported CUDA devices. The returned data is valid Mathematica input, which means it can be used to optimize CUDA kernel code programmatically. Several other functions return in-depth information about Mathematica, the operating system, hardware, and the C/C++ compilers currently used by CUDALink.
Mathematica also provides SymbolicC, which offers a hierarchical view of C code in Mathematica's own language. This makes it well suited to creating, manipulating, and optimizing C code. In conjunction with this capability, users can generate CUDA kernel code for several different targets, for greater portability, less platform dependency, and better code optimization. Several built-in functions perform code generation, depending on the target:
SymbolicC
CUDACodeGenerate takes a CUDA kernel function or program and generates SymbolicC output. That output can then be used to render CUDA code that calls the correct functions to bind the CUDA code to Mathematica. CUDASymbolicCGenerate produces SymbolicC output for the CUDA kernel as well as for C++ wrapper code.
Dynamic library
CUDALibraryGenerate generates CUDA interface code and compiles it into a library that can be loaded into Mathematica. CUDALibraryFunctionGenerate takes a CUDA kernel function and generates the code required to use it; it then compiles the source into a dynamic library and returns a Mathematica library function, which can be called via Mathematica's automatic dynamic library binding.
String
CUDACodeStringGenerate generates CUDA interface code in string form, which can be exported to other development platforms.
Manipulate[operation[…, x], {x, …}]
By specifying possible ranges for variables, Manipulate automatically chooses appropriate controls and creates a user interface around it.
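For instance, a self-contained sketch (not from the original document) that interactively varies a plot parameter:

```mathematica
(* the range spec {f, 1, 5} makes Manipulate create a slider for f *)
Manipulate[Plot[Sin[f x], {x, 0, 2 Pi}], {f, 1, 5}]
```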
Not only does Mathematica provide access to local resources, but any URL can be used to access data online. The following code imports an image from a given URL:
image = Import["http://gallery.wolfram.com/2d/popup/00_contourMosaic.pop.jpg"];
The function Import automatically recognizes the file format and converts it into a Mathematica expression. This can be used directly by CUDALink functions, such as CUDAImageAdd:
output = CUDAImageAdd[image, …]
All outputs from Mathematica functions, including those from CUDALink functions, are also expressions and can easily be exported to one of the supported formats using the Export function. For example, the following code exports CUDA output in PNG format:
Export["masked.png", output]
masked.png

$ImportFormats and $ExportFormats give full lists of supported import and export formats.
OpenCL Compatibility
In addition to CUDALink, Mathematica supports OpenCL through the built-in package OpenCLLink, which provides the same GPU programming benefits and functionality as CUDALink on the OpenCL architecture.
Image Processing
CUDALink offers many image processing functions that have been carefully tuned for the GPU. These include pixel operations such as image arithmetic and composition; morphological operators such as erosion, dilation, opening, and closing; and image convolution and filtering. All of these operations work on either images or arrays of real and integer numbers.
Image convolution
CUDALink's convolution is similar to Mathematica's ListConvolve and ImageConvolve functions. It operates on images, lists, or CUDA memory references, and it can use Mathematica's built-in filters as the kernel.
CUDAImageConvolve[image, {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}}]

Convolving a microscopic image with a Sobel mask to detect edges. CUDALink supports simple pixel operations on one or two images, such as adding or multiplying pixel values from two images.
CUDAImageMultiply[image1, image2]
Morphological operations
CUDALink supports fundamental operations such as erosion, dilation, opening, and closing. CUDAErosion, CUDADilation, CUDAOpening, and CUDAClosing are equivalent to Mathematica's built-in Erosion, Dilation, Opening, and Closing functions. More sophisticated morphological operations can be built from these fundamental ones.
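A sketch of how these might be combined (assuming CUDALink is loaded and img holds a binary or grayscale image; the radius argument mirrors Mathematica's built-in Erosion, and the file name is illustrative):

```mathematica
Needs["CUDALink`"]
img = Import["shapes.png"];      (* hypothetical input image *)

eroded  = CUDAErosion[img, 2];   (* erode with radius 2 *)
dilated = CUDADilation[img, 2];  (* dilate with radius 2 *)

(* a morphological gradient built from the fundamental operations *)
gradient = ImageDifference[dilated, eroded]
```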
Video Processing
CUDALink's built-in image processing functions can also be applied to videos to perform real-time filtering. Many common formats, such as H.264, QuickTime, and DivX, are supported. With GPU computing power, CUDALink's video processing functions can easily handle full high-definition video (1080p) filtering at 30 frames per second.
Linear Algebra
You can perform various linear algebra operations on the GPU. Examples include vector addition, products, and other operations; finding minimum or maximum elements; and transposing the rows and columns of an image.
Fourier Analysis
The Fourier analysis capabilities of the CUDALink package include forward and inverse Fourier transforms. The CUDAFourier function can operate on 1D, 2D, or 3D lists of real or complex numbers, or on data held in device memory registered with CUDALink's Fourier memory manager.
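A minimal sketch of the list case (assuming CUDALink is loaded; CUDAInverseFourier is the corresponding inverse transform):

```mathematica
Needs["CUDALink`"]
data = RandomReal[{0, 1}, 16];

spectrum  = CUDAFourier[data];            (* forward transform on the GPU *)
roundTrip = CUDAInverseFourier[spectrum]  (* should approximate data *)
```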
Fluid Dynamics
Computational fluid dynamics examples are included with CUDALink. They simulate a large number of particles in real time using the GPU's computing power.
Fluid simulation with many particles. CUDALink's option pricing function uses the binomial or Monte Carlo method, depending on the type of option selected. Computing options on the GPU can be dozens of times faster than using the CPU, even with parallel processing.
Volumetric Rendering
CUDALink includes functions to read and display volumetric data in 3D, with interactive interfaces for transfer functions and other volume-rendering parameter controls.
Summary
Thanks to Mathematica's integrated platform design, all functionality is included without the need to buy, learn, use, and maintain multiple tools and add-on packages. With its simplified development cycle, automatic memory management, multicore computing, and built-in functions for many applications, plus full integration with all of Mathematica's other computation, development, and deployment capabilities, Mathematica's built-in CUDALink package provides a powerful interface for GPU computing.
© 2010 Wolfram Research, Inc. Mathematica is a registered trademark and Mathematica Player is a trademark of Wolfram Research, Inc. Wolfram|Alpha is a registered trademark and "computational knowledge engine" is a trademark of Wolfram Alpha LLC, a Wolfram Research company. All other trademarks are the property of their respective owners. Mathematica is not associated with Mathematica Policy Research, Inc. or Mathtech, Inc.