8.4 GPU Architecture and Programming

Uploaded by

Amir


GPU Architecture & Programming
System with simple CPU and Video Cards
System with GPU
Multicore CPU vs GPU
CUDA Applications on GPU
Example 1 : Hello World from GPU (1x10)

#include <stdio.h>

__global__ void helloFromGPU(void)
{
    printf("Hello World from GPU!\n");
}

int main(void)
{
    // hello from cpu
    printf("Hello World from CPU!\n");
    // launch 1 block of 10 threads; each thread prints one line
    helloFromGPU<<<1, 10>>>();
    // destroying the context also flushes the device-side printf buffer
    cudaDeviceReset();
    return 0;
}

Output:

$ ./hello
Hello World from CPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Memory Management Functions
Data Transfer between CPU and GPU
Example 2 : Data Transfer between CPU and GPU
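A minimal sketch of a host-to-device round trip, using the standard runtime calls cudaMalloc, cudaMemcpy and cudaFree. The array length N and the helper name roundTrip are assumptions made for illustration and are not taken from the slides.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 8  // hypothetical array length, chosen for illustration

// Copies N floats host -> device -> host.
// Returns the number of elements that differ after the round trip,
// or -1 if a CUDA call fails.
int roundTrip(void)
{
    float h_in[N], h_out[N];
    for (int i = 0; i < N; i++) {
        h_in[i]  = (float)i;
        h_out[i] = -1.0f;
    }

    float *d_buf = NULL;
    size_t bytes = N * sizeof(float);

    // Allocate device (GPU) global memory.
    if (cudaMalloc((void **)&d_buf, bytes) != cudaSuccess) return -1;

    // Host -> device, then device -> host.
    cudaError_t err = cudaMemcpy(d_buf, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_buf, bytes, cudaMemcpyDeviceToHost);

    // Release the device allocation.
    cudaFree(d_buf);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < N; i++)
        if (h_out[i] != h_in[i]) mismatches++;
    return mismatches;
}

int main(void)
{
    int r = roundTrip();
    printf("mismatches: %d\n", r);
    return r == 0 ? 0 : 1;
}
```

The last argument of cudaMemcpy selects the direction of the copy; the pointer naming (h_ for host, d_ for device) is only a convention.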
CUDA Threads (SIMT)
Thread Hierarchy

Threads launched for a parallel section are partitioned into thread blocks

• Grid = all blocks for a given launch
• Thread block is a group of threads that can:
    • Synchronize their execution
    • Communicate via shared memory
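The hierarchy above can be sketched with a small kernel in which each block reverses its own slice of an array. It uses the built-in variables blockIdx, blockDim and threadIdx, a __shared__ buffer for communication inside a block, and __syncthreads() for block-level synchronization. The sizes BLOCK_SIZE and NUM_BLOCKS and the function names are hypothetical, chosen only to keep the example small.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define BLOCK_SIZE 4  // threads per block (hypothetical, kept tiny for readability)
#define NUM_BLOCKS 2  // blocks in the grid (hypothetical)

// Each block reverses its own BLOCK_SIZE-element slice of data.
__global__ void reverseWithinBlock(int *data)
{
    __shared__ int tile[BLOCK_SIZE];               // shared by the threads of one block
    int local  = threadIdx.x;                      // index inside the block
    int global = blockIdx.x * blockDim.x + local;  // index inside the grid

    tile[local] = data[global];
    __syncthreads();                               // wait until the whole tile is loaded
    data[global] = tile[blockDim.x - 1 - local];
}

// Runs the kernel on h_data (BLOCK_SIZE * NUM_BLOCKS ints); returns 0 on success.
int reverseOnDevice(int *h_data)
{
    size_t bytes = BLOCK_SIZE * NUM_BLOCKS * sizeof(int);
    int *d_data = NULL;
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) return -1;

    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    reverseWithinBlock<<<NUM_BLOCKS, BLOCK_SIZE>>>(d_data);
    cudaError_t err = cudaGetLastError();          // catches launch errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : -1;
}

int main(void)
{
    int h[BLOCK_SIZE * NUM_BLOCKS];
    for (int i = 0; i < BLOCK_SIZE * NUM_BLOCKS; i++) h[i] = i;

    if (reverseOnDevice(h) != 0) return 1;

    for (int i = 0; i < BLOCK_SIZE * NUM_BLOCKS; i++) printf("%d ", h[i]);
    printf("\n");
    return 0;
}
```

Starting from 0 1 2 3 4 5 6 7, the two blocks work independently, so the result is 3 2 1 0 7 6 5 4. Threads in different blocks cannot see each other's tile, which is why the reversal stays local to each block.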
CUDA C Examples
Vector Addition
https://github.com/olcf-tutorials/vector_addition_cuda
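A minimal vector addition sketch, not the code from the linked repository. It assumes one thread per element, a block size of 256, and a hypothetical helper vecAddMismatches that checks the result on the host.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// One thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard: the last block may be only partly used
}

// Adds two n-element vectors on the GPU.
// Returns the number of wrong elements, or -1 on an allocation or CUDA error.
int vecAddMismatches(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    int bad = -1;

    if (h_a && h_b && h_c) {
        for (int i = 0; i < n; i++) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

        cudaMalloc((void **)&d_a, bytes);
        cudaMalloc((void **)&d_b, bytes);
        cudaMalloc((void **)&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        int threads = 256;                           // assumed block size
        int blocks  = (n + threads - 1) / threads;   // round up to cover all n elements
        vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

        cudaError_t err = cudaGetLastError();
        if (err == cudaSuccess)
            err = cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

        if (err == cudaSuccess) {
            bad = 0;
            for (int i = 0; i < n; i++)
                if (h_c[i] != 3.0f * i) bad++;
        }
    }

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return bad;
}

int main(void)
{
    int bad = vecAddMismatches(1000);
    printf("wrong elements: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

With n = 1000 and 256 threads per block the launch uses 4 blocks, so 24 threads in the last block fall outside the array and are masked by the bounds check.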
Finding Maximum Value in an array
https://www.geeksforgeeks.org/how-to-run-cuda-c-c-on-jupyter-notebook-in-google-colaboratory/
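One simple way to find the maximum of an integer array is sketched below. It is not the code from the linked page: it assumes each thread issues one atomicMax on a single global result, which is easy to read but is not a tuned parallel reduction. The names maxKernel and deviceMax are hypothetical.

```cuda
#include <stdio.h>
#include <limits.h>
#include <cuda_runtime.h>

// Every thread folds its element into *result with an atomic maximum.
__global__ void maxKernel(const int *data, int n, int *result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicMax(result, data[i]);
}

// Writes the maximum of h_data[0..n-1] to *h_max; returns 0 on success.
int deviceMax(const int *h_data, int n, int *h_max)
{
    int *d_data = NULL, *d_max = NULL;
    int init = INT_MIN;                       // identity element for max
    size_t bytes = (size_t)n * sizeof(int);

    cudaMalloc((void **)&d_data, bytes);
    cudaMalloc((void **)&d_max, sizeof(int));
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_max, &init, sizeof(int), cudaMemcpyHostToDevice);

    int threads = 128;                        // assumed block size
    int blocks  = (n + threads - 1) / threads;
    maxKernel<<<blocks, threads>>>(d_data, n, d_max);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_max, d_max, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_max);
    return err == cudaSuccess ? 0 : -1;
}

int main(void)
{
    int h[] = {3, -7, 42, 15, 0, 41};
    int n = (int)(sizeof(h) / sizeof(h[0]));
    int result = 0;

    if (deviceMax(h, n, &result) != 0) return 1;
    printf("max = %d\n", result);
    return 0;
}
```

For larger arrays, a shared-memory reduction inside each block followed by one atomic per block would usually reduce contention on the single result word.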
Matrix Addition
https://github.com/jcbacong/CUDA-matrix-addition
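A minimal matrix addition sketch using a two-dimensional grid of two-dimensional blocks. It is not the code from the linked repository; the matrix size (ROWS x COLS), the 4 x 4 block shape and the helper name matAddMismatches are assumptions for illustration. Matrices are stored row-major in flat arrays.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define ROWS 6   // hypothetical matrix height
#define COLS 10  // hypothetical matrix width

// One thread computes one element; x indexes columns, y indexes rows.
__global__ void matAdd(const float *a, const float *b, float *c, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        int idx = row * cols + col;  // row-major offset
        c[idx] = a[idx] + b[idx];
    }
}

// Returns the number of wrong elements, or -1 on a CUDA error.
int matAddMismatches(void)
{
    float h_a[ROWS * COLS], h_b[ROWS * COLS], h_c[ROWS * COLS];
    for (int i = 0; i < ROWS * COLS; i++) { h_a[i] = (float)i; h_b[i] = 100.0f; }

    size_t bytes = sizeof(h_a);
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    dim3 block(4, 4);                                // 16 threads per block
    dim3 grid((COLS + block.x - 1) / block.x,        // blocks along x (columns)
              (ROWS + block.y - 1) / block.y);       // blocks along y (rows)
    matAdd<<<grid, block>>>(d_a, d_b, d_c, ROWS, COLS);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < ROWS * COLS; i++)
        if (h_c[i] != (float)i + 100.0f) bad++;
    return bad;
}

int main(void)
{
    int bad = matAddMismatches();
    printf("wrong elements: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Here the grid is 3 x 2 blocks, which covers 12 x 8 threads for a 10 x 6 matrix; the bounds check discards the threads that fall outside.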
Matrix Multiplication
https://github.com/lzhengchun/matrix-cuda
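A naive matrix multiplication sketch, in which each thread computes one element of C = A x B by walking a row of A and a column of B. It is not the code from the linked repository and omits the shared-memory tiling that optimized versions use. The sizes M, K, N and the helper name matMulMismatches are assumptions for illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define M 5  // rows of A and C (hypothetical)
#define K 7  // columns of A, rows of B (hypothetical)
#define N 3  // columns of B and C (hypothetical)

// c (m x n) = a (m x k) * b (k x n), all row-major.
__global__ void matMul(const float *a, const float *b, float *c, int m, int k, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < m && col < n) {
        float sum = 0.0f;
        for (int p = 0; p < k; p++)
            sum += a[row * k + p] * b[p * n + col];
        c[row * n + col] = sum;
    }
}

// Compares the GPU result with a CPU reference.
// Returns the number of wrong elements, or -1 on a CUDA error.
int matMulMismatches(void)
{
    float h_a[M * K], h_b[K * N], h_c[M * N];
    // Small integer values keep every product and sum exact in float.
    for (int i = 0; i < M * K; i++) h_a[i] = (float)(i % 5);
    for (int i = 0; i < K * N; i++) h_b[i] = (float)(i % 3);

    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    cudaMalloc((void **)&d_a, sizeof(h_a));
    cudaMalloc((void **)&d_b, sizeof(h_b));
    cudaMalloc((void **)&d_c, sizeof(h_c));
    cudaMemcpy(d_a, h_a, sizeof(h_a), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, sizeof(h_b), cudaMemcpyHostToDevice);

    dim3 block(4, 4);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    matMul<<<grid, block>>>(d_a, d_b, d_c, M, K, N);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_c, d_c, sizeof(h_c), cudaMemcpyDeviceToHost);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int row = 0; row < M; row++) {
        for (int col = 0; col < N; col++) {
            float ref = 0.0f;
            for (int p = 0; p < K; p++)
                ref += h_a[row * K + p] * h_b[p * N + col];
            if (h_c[row * N + col] != ref) bad++;
        }
    }
    return bad;
}

int main(void)
{
    int bad = matMulMismatches();
    printf("wrong elements: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Exact comparison with the CPU reference is safe here only because the inputs are small integers; with general floating-point data a tolerance would be needed.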
GPU Compute Capability and Limits
• Technical Specifications per Compute Capability from 5.0 to 9.0
    Maximum dimensionality of a grid of thread blocks: 3
    Maximum x-dimension of a grid of thread blocks: 2^31 - 1
    Maximum y- or z-dimension of a grid of thread blocks: 65535
    Maximum dimensionality of a thread block: 3
    Maximum x- or y-dimension of a block: 1024
    Maximum z-dimension of a block: 64
    Maximum number of threads per block: 1024

• https://developer.nvidia.com/cuda-gpus
• https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications
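The limits listed above can also be read from the installed GPU at run time. The sketch below uses cudaGetDeviceProperties and assumes device index 0; the wrapper name queryLimits is hypothetical.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Fills *prop for the given device; returns 0 on success, -1 on a CUDA error.
int queryLimits(int device, cudaDeviceProp *prop)
{
    cudaError_t err = cudaGetDeviceProperties(prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return -1;
    }
    return 0;
}

int main(void)
{
    int device = 0;  // assumption: the first GPU in the system
    cudaDeviceProp prop;
    if (queryLimits(device, &prop) != 0) return 1;

    printf("Device %d: %s (compute capability %d.%d)\n",
           device, prop.name, prop.major, prop.minor);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Max block dimensions:  %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("Max grid dimensions:   %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return 0;
}
```

Note that the per-dimension block limits do not multiply: a block may be 1024 wide in x or in y, but the product of its three dimensions still cannot exceed the maximum number of threads per block.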
