
Computing with GPGPUs

Raj Singh
National Center for Microscopy and Imaging Research

GPGPUs and CUDA

Guest Lecture, CSE167, Fall 2008

Graphics Processing Unit (GPU)

- Development driven by the multi-billion-dollar game industry
  - Bigger than Hollywood
- Need for physics, AI and complex lighting models
- Impressive Flops / dollar performance
  - Hardware has to be affordable
- Evolution speed surpasses Moore's law
  - Performance doubles approximately every 6 months


GPU evolution curve

*Courtesy: Nvidia Corporation

GPGPUs (General Purpose GPUs)

- A natural evolution of GPUs to support a wider range of applications
- Widely accepted by the scientific community
- Cheap high-performance GPGPUs are now available
  - It's possible to buy a $500 card that provides almost 2 TFlops of computing


Teraflop computing

- Supercomputers are still rated in Teraflops
- Expensive and power-hungry
- Not exclusive; they have to be shared by several organizations
- Custom-built in several cases
- The National Center for Atmospheric Research, Boulder installed a 12 TFlop supercomputer in 2007


What does it mean for the scientist?

- Desktop supercomputers are possible
- Energy efficient
  - Approx. 200 Watts / Teraflop
- Turnaround time can be cut down by orders of magnitude
  - Simulations/jobs can take several days


GPU hardware

- Highly parallel architecture
  - Akin to SIMD
- Designed initially for efficient matrix operations and pixel-manipulation pipelines
- Computing core is a lot simpler
  - No memory management support
  - 64-bit native cores
  - Little or no cache
  - Double precision support

Multi-core Horsepower

- Latest Nvidia card has 480 cores for simultaneous processing
- Very high memory bandwidth
  - > 100 GBytes / sec and increasing
- Perfect for embarrassingly parallel, compute-intensive problems
- Clusters of GPGPUs available in GreenLight
Figures courtesy: Nvidia programming guide 2.0

CPU vs. GPU


Programming model

- The GPU is seen as a compute device to execute a portion of an application that:
  - Has to be executed many times
  - Can be isolated as a function
  - Works independently on different data
- Such a function can be compiled to run on the device; the resulting program is called a kernel
  - A C-like language helps in porting existing code
- Copies of the kernel execute simultaneously as threads
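As a minimal sketch of the model above, a CUDA kernel and its launch might look like this (the function and variable names are illustrative, not taken from the lecture):

```cuda
// Kernel: one copy of this function runs per thread, each thread
// working independently on a different element of the data.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique thread index
    if (i < n)                      // guard: the grid may overshoot n
        y[i] = a * x[i] + y[i];
}

// Host side: launch enough blocks of 256 threads to cover n elements.
// (cudaMalloc / cudaMemcpy setup for d_x and d_y is omitted.)
void run(int n, float *d_x, float *d_y)
{
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, d_x, d_y);
}
```

The `<<<blocks, threads>>>` syntax is the CUDA extension to C that specifies how many simultaneous copies of the kernel to run.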


Figure courtesy: Nvidia programming guide 2.0

Look Ma, no cache ...

- Cache is expensive
- By running thousands of fast-switching lightweight threads, large memory latency can be masked
- Context switching of threads is handled by CUDA
  - Users have little control; only synchronization
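The one synchronization primitive users do get is a barrier across the threads of a block. A minimal sketch, assuming a fixed block size of 256 (a power of two), is a shared-memory reduction:

```cuda
// Each block sums 256 input elements using fast on-chip shared
// memory rather than a cache; __syncthreads() is the barrier that
// keeps the threads of a block in step.
__global__ void blockSum(const float *in, float *out)
{
    __shared__ float s[256];                 // per-block scratch space
    int tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();                         // all loads done before summing

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            s[tid] += s[tid + stride];
        __syncthreads();                     // each round must finish first
    }
    if (tid == 0)
        out[blockIdx.x] = s[0];              // one partial sum per block
}
```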

CUDA / OpenCL

- A non-OpenGL-oriented API to program the GPUs
- Compiler and tools allow porting of existing C code fairly rapidly
- Libraries for common math functions like trigonometric, pow(), exp()
- Provides support for general DRAM memory addressing
  - Scatter / gather operations
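For example, a gather operation reads through an index table at arbitrary DRAM addresses while using the built-in device math library; this is a hypothetical sketch, not code from the lecture:

```cuda
// Gather: each thread fetches src[idx[i]] from an arbitrary DRAM
// address (general memory addressing), applies a device math
// library function, and writes its own output slot.
__global__ void gatherExp(const float *src, const int *idx,
                          float *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = expf(src[idx[i]]);   // expf() from the device math library
}
```

A scatter is the mirror image: each thread writes to an indexed location, e.g. `dst[idx[i]] = src[i]`.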


What do we do at NCMIR / CALIT2?

- Research on large-data visualization, optical networks and distributed systems
- Collaborate with Earth sciences, neuroscience, gene research, and the movie industry
- Large projects funded by NSF / NIH
  - NSF EarthScope


Electron and Light Microscopes


Cluster-driven high-resolution displays as data end-points


Electron Tomography

- Used for constructing a 3D view of thin biological samples
- The sample is rotated around an axis and images are acquired at each tilt angle
- Enables high-resolution views of cellular and neuronal structures
- 3D reconstruction is a complex problem due to a high noise-to-signal ratio, the curvilinear electron path, sample deformation, scattering, and magnetic lens aberrations

[Figure: tilt-series acquisition: biological sample, curvilinear electron path, tilt series images]


Challenges

- Use a bundle adjustment procedure to correct for the curvilinear electron path and sample deformation
- Evaluation of electron micrograph correspondences needs to be done with double precision when using high-order polynomial mappings
- Non-linear electron projection makes reconstruction computationally intensive
- Wide field of view for large datasets
  - CCD cameras are up to 8K x 8K


Reconstruction on GPUs

- Large datasets take up to several days to reconstruct on a fast serial processor; the goal is real-time reconstruction
- Computation is embarrassingly parallel at the tilt level
- GTX 280 with double-precision support and 240 cores has shown speedups between 10X and 50X for large data
- Tesla units with 4 TFlops are the next target for the code
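Tilt-level parallelism can be sketched as one thread per output voxel summing contributions from every tilt image. This toy back-projection kernel is purely illustrative and is not the NCMIR code; the real reconstruction also corrects for the curvilinear electron path and sample deformation:

```cuda
// Hypothetical nearest-neighbor back-projection over a W x W slice:
// each thread owns one voxel (x, z) and loops over all tilt images.
__global__ void backproject(const float *tilts,   // nTilts x W projections
                            const float *angles,  // tilt angles in radians
                            float *slice, int W, int nTilts)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // voxel column
    int z = blockIdx.y * blockDim.y + threadIdx.y;  // voxel depth
    if (x >= W || z >= W) return;

    float sum = 0.0f;
    for (int t = 0; t < nTilts; ++t) {
        // Project (x, z) onto the detector line for this tilt angle.
        float u = x * cosf(angles[t]) + z * sinf(angles[t]);
        int ui = (int)u;
        if (ui >= 0 && ui < W)
            sum += tilts[t * W + ui];
    }
    slice[z * W + x] = sum;
}
```

Because no two threads write the same voxel, the tilt loop needs no synchronization at all, which is what makes the problem embarrassingly parallel.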


Really? Free lunch?

- C-like language support
  - Missing support for function pointers and recursion; double precision not very accurate; no direct access to I/O
  - Cannot pass structures or unions
- Code has to be fairly simple and free of dependencies
  - Completely self-contained in terms of data and variables
- Speedups depend on efficient code
  - Programmers have to code the parallelism; no magic spells available for download
- Combining CPU and GPU code might be better in some cases


And more cons

- Performance is best for computation-intensive apps
  - Data-intensive apps can be tricky
- Bank conflicts hurt performance
- It's a black box with little support for runtime debugging


Resources

- http://www.gpgpu.org
- http://www.nvidia.com/object/cuda_home.html#
- http://www.nvidia.com/object/cuda_develop.html
- http://fastra.ua.ac.be/en/index.html

