Computing with GPGPUs

Raj Singh
National Center for Microscopy and Imaging Research

GPGPUs and CUDA

Guest Lecture, CSE167, Fall 2008

Graphics Processing Unit (GPU)

Development driven by the multi-billion-dollar game industry
  Bigger than Hollywood
Need for physics, AI and complex lighting models
Impressive Flops / dollar performance
  Hardware has to be affordable
Evolution speed surpasses Moore's law
  Performance doubles approximately every 6 months
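
To put the two growth rates side by side, the short Python sketch below compounds a six-month doubling period against an 18-month one. The 18-month figure for Moore's law is an assumption (it is often quoted as 18 to 24 months), and `growth_factor` is a hypothetical helper name. The numbers are illustrative, not measurements.

```python
def growth_factor(years, doubling_months):
    """Return the multiplicative growth after `years` for a given doubling period."""
    return 2.0 ** (years * 12.0 / doubling_months)

# Slide's claim: GPU performance doubles roughly every 6 months.
gpu_3y = growth_factor(3, 6)
# Assumption: Moore's law taken as one doubling every 18 months.
moore_3y = growth_factor(3, 18)

print(f"After 3 years: GPU x{gpu_3y:.0f}, Moore's law x{moore_3y:.0f}")
```

Over three years the six-month rate compounds six times while the 18-month rate compounds twice, which is why the gap widens so quickly.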


GPU evolution curve

*Courtesy: Nvidia Corporation

GPGPUs (General Purpose GPUs)

A natural evolution of GPUs to support a wider range of applications
Widely accepted by the scientific community
Cheap high-performance GPGPUs are now available
  It's possible to buy a $500 card which can provide almost 2 TFlops of computing.


Teraflop computing
Supercomputers are still rated in Teraflops
  Expensive and power hungry
  Not exclusive and have to be shared by several organizations
  Custom built in several cases
National Center for Atmospheric Research, Boulder installed a 12 Tflop supercomputer in 2007


What does it mean for the scientist?

Desktop supercomputers are possible
Energy efficient
  Approx. 200 Watts / Teraflop
Turnaround time can be cut down by orders of magnitude.
  Simulations/jobs can take several days
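
As a back-of-the-envelope check on the "desktop supercomputer" claim, the sketch below combines figures quoted in these slides: a $500 card at almost 2 TFlops, about 200 W per TFlop, and the 12 TFlop machine mentioned earlier. It assumes peak rates add up perfectly and ignores host machines, interconnect and sustained performance, so treat the output as an optimistic lower bound. All names are hypothetical.

```python
import math

CARD_PRICE_USD = 500      # from the slides: a $500 card
CARD_PEAK_TFLOPS = 2.0    # from the slides: "almost 2 TFlops" (peak, rounded up)
WATTS_PER_TFLOP = 200     # from the slides: approx. 200 W per TFlop
TARGET_TFLOPS = 12.0      # from the slides: the 12 TFlop machine installed in 2007

def cards_needed(target_tflops, card_tflops):
    """Smallest whole number of cards whose combined peak reaches the target."""
    return math.ceil(target_tflops / card_tflops)

cards = cards_needed(TARGET_TFLOPS, CARD_PEAK_TFLOPS)
cost_usd = cards * CARD_PRICE_USD
power_watts = TARGET_TFLOPS * WATTS_PER_TFLOP

print(f"{cards} cards, ${cost_usd}, {power_watts:.0f} W at peak")
```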


GPU hardware
Highly parallel architecture
  Akin to SIMD
Designed initially for efficient matrix operations and pixel-manipulation pipelines
Computing core is a lot simpler
  No memory management support
  64-bit native cores
  Little or no cache
  Double precision support

Multi-core Horsepower
Latest Nvidia card has 480 cores for simultaneous processing
Very high memory bandwidth
  > 100 GBytes / sec and increasing
Perfect for embarrassingly parallel, compute-intensive problems
Clusters of GPGPUs available in GreenLight
Figures courtesy: Nvidia programming guide 2.0

CPU v/s GPU


Programming model
The GPU is seen as a compute device to execute a portion of an application that
  Has to be executed many times
  Can be isolated as a function
  Works independently on different data
Such a function can be compiled to run on the device. The resulting program is called a kernel.
  C-like language helps in porting existing code.
Copies of the kernel execute simultaneously as threads.

Figure courtesy: Nvidia programming guide 2.0
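
The kernel idea can be sketched without a GPU. The Python below is a CPU-side emulation, not CUDA code: `saxpy_kernel` is an illustrative choice of per-element function, and `launch` is a hypothetical stand-in for a real kernel launch. It shows the three properties listed above: the function runs many times, is isolated from the rest of the program, and each copy touches only its own element.

```python
def saxpy_kernel(thread_id, a, x, y, out):
    """Work done by ONE thread: a single element of out = a * x + y."""
    out[thread_id] = a * x[thread_id] + y[thread_id]

def launch(kernel, num_threads, *args):
    """Hypothetical stand-in for a kernel launch.

    A GPU would run the copies concurrently; this loop runs them one after
    another, which gives the same result because the copies are independent.
    """
    for thread_id in range(num_threads):
        kernel(thread_id, *args)

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

Because no copy reads what another copy writes, the order in which the thread indices are visited does not matter, which is what lets the hardware schedule them freely.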

Look Ma, no cache ...

Cache is expensive
By running thousands of fast-switching lightweight threads, large memory latency can be masked
Context switching of threads is handled by CUDA
  Users have little control, only synchronization
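
A toy model makes the latency-masking argument concrete. Assume each thread alternates a few cycles of arithmetic with one memory request, and that switching between threads costs nothing. The cycle counts below are assumptions chosen for illustration, and both function names are hypothetical; real scheduling is more involved.

```python
import math

def core_utilization(num_threads, compute_cycles, latency_cycles):
    """Fraction of cycles the core spends computing in this toy model."""
    busy = num_threads * compute_cycles
    period = compute_cycles + latency_cycles
    return min(1.0, busy / period)

def threads_to_hide_latency(compute_cycles, latency_cycles):
    """Smallest thread count that keeps the core busy all the time."""
    return math.ceil((compute_cycles + latency_cycles) / compute_cycles)

# Assumed numbers, for illustration only.
COMPUTE = 4      # cycles of arithmetic between memory requests
LATENCY = 400    # cycles for one uncached memory access

for n in (1, 32, threads_to_hide_latency(COMPUTE, LATENCY)):
    print(n, round(core_utilization(n, COMPUTE, LATENCY), 3))
```

With a single thread the core idles about 99% of the time in this model; with enough resident threads, some thread always has arithmetic ready while the others wait on memory.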

CUDA / OpenCL
A non-OpenGL-oriented API to program the GPUs
Compiler and tools allow porting of existing C code fairly rapidly
Libraries for common math functions like trigonometric, pow(), exp()
Provides support for general DRAM memory addressing
  Scatter / gather operations
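
For readers new to the terms, gather means reading from arbitrary addresses and scatter means writing to arbitrary addresses. The sketch below shows both on plain Python lists; `gather` and `scatter` are hypothetical helper names, not CUDA or OpenCL API calls.

```python
def gather(src, indices):
    """Gather: out[i] = src[indices[i]] (reads from arbitrary addresses)."""
    return [src[j] for j in indices]

def scatter(values, indices, size, fill=0):
    """Scatter: out[indices[i]] = values[i] (writes to arbitrary addresses)."""
    out = [fill] * size
    for value, j in zip(values, indices):
        out[j] = value
    return out

data = [10, 20, 30, 40, 50]
picked = gather(data, [4, 0, 2])
placed = scatter([7, 8, 9], [3, 0, 1], 5)
print(picked, placed)
```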


What do we do at NCMIR / CALIT2?

Research on large data visualization, optical networks and distributed systems.
Collaborate with Earth sciences, neuroscience, gene research, movie industry
Large projects funded by NSF / NIH
  NSF EarthScope


Electron and Light Microscopes


Cluster Driven High-Resolution displays data end-points


Electron Tomography
Used for constructing a 3D view of thin biological samples
Sample is rotated around an axis and images are acquired for each tilt angle
Electron tomography enables high-resolution views of cellular and neuronal structures.
3D reconstruction is a complex problem due to high noise-to-signal ratio, curvilinear electron path, sample deformation, scattering, magnetic lens aberrations

[Figure labels: biological sample, curvilinear electron path, tilt series images]

Challenges
Use a bundle adjustment procedure to correct for curvilinear electron path and sample deformation
Evaluation of electron micrograph correspondences needs to be done with double precision when using high-order polynomial mappings
Non-linear electron projection makes reconstruction computationally intensive.
Wide field of view for large datasets
CCD cameras are up to 8K x 8K
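
The need for double precision with high-order polynomials can be seen in a small experiment. The sketch below evaluates a degree-7 polynomial with Horner's rule, once in Python's native double precision and once with every intermediate result rounded to single precision. The polynomial, the expanded form of (x - 1)^7, is a hypothetical stand-in chosen because its terms nearly cancel; it is not the mapping used in the reconstruction code, and `to_float32` and `horner` are illustrative helper names.

```python
import struct

def to_float32(value):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def horner(coeffs, x, rnd=lambda v: v):
    """Evaluate a polynomial (highest-degree coefficient first) with Horner's rule.

    `rnd` is applied after every multiply and add, so passing `to_float32`
    emulates single-precision arithmetic.
    """
    x = rnd(x)
    acc = rnd(coeffs[0])
    for c in coeffs[1:]:
        acc = rnd(rnd(acc * x) + rnd(c))
    return acc

# Hypothetical degree-7 polynomial: the expanded form of (x - 1)**7.
COEFFS = [1.0, -7.0, 21.0, -35.0, 35.0, -21.0, 7.0, -1.0]
X = 1.1
reference = (X - 1.0) ** 7          # about 1e-7

err_double = abs(horner(COEFFS, X) - reference)
err_single = abs(horner(COEFFS, X, to_float32) - reference)
print(f"double error: {err_double:.1e}, single error: {err_single:.1e}")
```

In single precision the rounding error is of the same order as the result itself, while the double-precision error is many orders of magnitude smaller.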


Reconstruction on GPUs
Large datasets take up to several days to reconstruct on a fast serial processor.
Goal is to achieve real-time reconstruction
Computation is embarrassingly parallel at the tilt level
GTX 280 with double-precision support and 240 cores has shown speedups between 10X and 50X for large data
Tesla units with 4 TFlops are the next target for the code.
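
"Embarrassingly parallel at the tilt level" means each tilt image contributes to the result independently, so the contributions can be computed in any order and summed. The Python sketch below illustrates only that structure. It assumes a straight-line (parallel-beam) back-projection with nearest-neighbour lookup, which is far simpler than the curvilinear model described in these slides, and the tilt series is made-up data. Python threads are used to show the task decomposition, not to demonstrate a speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def backproject_one_tilt(angle_deg, projection, size):
    """Smear one 1D projection back across a size x size slice.

    Assumption: straight-line geometry, nearest-neighbour lookup.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    centre = (size - 1) / 2.0
    out = [[0.0] * size for _ in range(size)]
    for z in range(size):
        for x in range(size):
            # Detector coordinate hit by the ray through pixel (x, z).
            s = (x - centre) * cos_t + (z - centre) * sin_t + centre
            k = int(round(s))
            if 0 <= k < size:
                out[z][x] = projection[k]
    return out

def add_slices(a, b):
    """Element-wise sum of two slices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def reconstruct(tilts, size, workers=1):
    """Sum the per-tilt contributions; each tilt is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda t: backproject_one_tilt(t[0], t[1], size), tilts))
    total = [[0.0] * size for _ in range(size)]
    for part in parts:
        total = add_slices(total, part)
    return total

# Made-up tilt series: three angles, each with a 5-sample projection.
TILTS = [(-60.0, [0, 1, 2, 1, 0]), (0.0, [0, 2, 4, 2, 0]), (60.0, [0, 1, 2, 1, 0])]
serial = reconstruct(TILTS, 5, workers=1)
parallel = reconstruct(TILTS, 5, workers=3)
print(serial == parallel)
```

The only shared step is the final sum, so the per-tilt work maps naturally onto independent GPU threads or blocks.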


Really? Free lunch?

C-like language support
  No support for function pointers or recursion; double precision is not very accurate; no direct access to I/O
  Cannot pass structures, unions
Code has to be fairly simple and free of dependencies
  Completely self-contained in terms of data and variables.
Speedups depend on efficient code
  Programmers have to code the parallelism.
  No magic spells available for download
Combining CPU and GPU code might be better in some cases
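
One practical consequence of the missing recursion support is that recursive algorithms have to be rewritten as loops before they can become kernels. A common way to do that is an explicit stack, sketched here in Python on a hypothetical nested-list tree. On a device the stack would typically be a fixed-size array rather than a growable list.

```python
def tree_sum_recursive(node):
    """Reference version: sums a nested-list tree using recursion."""
    if isinstance(node, list):
        return sum(tree_sum_recursive(child) for child in node)
    return node

def tree_sum_iterative(root):
    """Same result with an explicit stack instead of recursive calls."""
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)   # defer the children instead of recursing
        else:
            total += node
    return total

tree = [1, [2, 3, [4]], [[5], 6]]
print(tree_sum_recursive(tree), tree_sum_iterative(tree))
```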


And more cons

Performance is best for computation-intensive apps.
  Data-intensive apps can be tricky.
Bank conflicts hurt performance
It's a black box with little support for runtime debugging.
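
A bank conflict occurs when several threads in the same scheduling group access different words that live in the same on-chip memory bank, so the accesses are served one after another. The sketch below counts the worst-case collisions for a strided access pattern. The bank count and group size of 16 are assumptions matching GTX 280 class hardware, and `conflict_degree` is a hypothetical helper. The model ignores the broadcast case where all threads read the same word.

```python
from collections import Counter

NUM_BANKS = 16     # assumption: 16 banks of 32-bit words
GROUP_SIZE = 16    # assumption: threads whose requests are served together

def conflict_degree(stride_words, group_size=GROUP_SIZE, num_banks=NUM_BANKS):
    """Worst-case number of threads in one group that hit the same bank.

    Thread t reads the 32-bit word at index t * stride_words.
    A result of 1 means conflict-free; k means the access is serialised k ways.
    """
    banks = Counter((t * stride_words) % num_banks for t in range(group_size))
    return max(banks.values())

for stride in (1, 2, 3, 8, 16):
    print(stride, conflict_degree(stride))
```

Odd strides stay conflict-free in this model because they share no common factor with the bank count, which is why padding an array by one word is a common fix.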


Resources
http://www.gpgpu.org
http://www.nvidia.com/object/cuda_home.html#
http://www.nvidia.com/object/cuda_develop.html
http://fastra.ua.ac.be/en/index.html
