8.4 GPU Architecture and Programming

Uploaded by Amir
Copyright © All Rights Reserved

GPU Architecture & Programming
System with simple CPU and Video Cards
System with GPU
Multicore CPU vs GPU
CUDA Applications on GPU
Example 1: Hello World from GPU (1 block × 10 threads)

#include <stdio.h>

// __global__ marks a kernel: called from the host (CPU), executed on the device (GPU)
__global__ void helloFromGPU(void)
{
    printf("Hello World from GPU!\n");
}

int main(void)
{
    // hello from CPU
    printf("Hello World from CPU!\n");

    // launch 1 block of 10 threads; every thread runs helloFromGPU once
    helloFromGPU<<<1, 10>>>();

    // destroy the device context; this also flushes the device-side printf buffer
    cudaDeviceReset();
    return 0;
}

Output:
$ ./hello
Hello World from CPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Hello World from GPU!
Memory Management Functions
Data Transfer between CPU and GPU
Example 2: Data Transfer CPU ↔ GPU
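A minimal sketch of a CPU → GPU → CPU round trip using the CUDA runtime's memory-management calls cudaMalloc, cudaMemcpy (with cudaMemcpyHostToDevice / cudaMemcpyDeviceToHost) and cudaFree. It is not the slide's original Example 2; the names roundTrip and CHECK are hypothetical, and the buffer size is an arbitrary choice.

```cuda
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Copies n floats CPU -> GPU -> CPU and returns the number of mismatches.
int roundTrip(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *h_src = (float *)malloc(bytes);   // host (CPU) input
    float *h_dst = (float *)malloc(bytes);   // host (CPU) output
    float *d_buf = NULL;                     // device (GPU) buffer

    for (int i = 0; i < n; i++) {
        h_src[i] = (float)i;
        h_dst[i] = -1.0f;
    }

    CHECK(cudaMalloc((void **)&d_buf, bytes));                       // allocate GPU memory
    CHECK(cudaMemcpy(d_buf, h_src, bytes, cudaMemcpyHostToDevice));  // CPU -> GPU
    CHECK(cudaMemcpy(h_dst, d_buf, bytes, cudaMemcpyDeviceToHost));  // GPU -> CPU
    CHECK(cudaFree(d_buf));                                          // release GPU memory

    int mismatches = 0;
    for (int i = 0; i < n; i++)
        if (h_dst[i] != h_src[i]) mismatches++;

    free(h_src);
    free(h_dst);
    return mismatches;
}

int main(void)
{
    int bad = roundTrip(1 << 20);
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Host pointers (h_) and device pointers (d_) live in separate address spaces, so the kernel-side code may only dereference d_ pointers and the host only h_ pointers; cudaMemcpy is the bridge between them.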
CUDA Threads (SIMT)
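Under SIMT (Single Instruction, Multiple Threads), the hardware groups threads into warps of warpSize threads (32 on NVIDIA GPUs to date) that issue the same instruction together, each on its own data. The sketch below, which assumes a single 1D block, only makes that grouping visible: each thread records its warp number and its lane within the warp. warpInfo and warpAndLane are hypothetical names; error checks are omitted for brevity.

```cuda
#include <stdio.h>

// Each thread stores which warp it belongs to and its lane (position) in that warp.
// warpSize is a built-in device variable.
__global__ void warpInfo(int *warpId, int *laneId)
{
    int t = threadIdx.x;          // thread index inside the single block
    warpId[t] = t / warpSize;
    laneId[t] = t % warpSize;
}

// Launches one block of `threads` threads and reads back the warp and lane of thread t.
void warpAndLane(int threads, int t, int *warp, int *lane)
{
    int *d_warp, *d_lane;
    cudaMalloc((void **)&d_warp, threads * sizeof(int));
    cudaMalloc((void **)&d_lane, threads * sizeof(int));
    warpInfo<<<1, threads>>>(d_warp, d_lane);
    // blocking copies: they also wait for the kernel to finish
    cudaMemcpy(warp, d_warp + t, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(lane, d_lane + t, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_warp);
    cudaFree(d_lane);
}

int main(void)
{
    int w, l;
    warpAndLane(128, 100, &w, &l);
    printf("thread 100 -> warp %d, lane %d\n", w, l);
    return 0;
}
```

With warpSize equal to 32, thread 100 of a 128-thread block falls in warp 3 at lane 4. If threads of one warp take different branches of an if, the warp executes both paths one after the other, which is why divergent branches cost performance.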
Thread Hierarchy
• Threads launched for a parallel section are partitioned into thread blocks.
• Grid = all blocks for a given launch.
• A thread block is a group of threads that can:
  - synchronize their execution
  - communicate via shared memory
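The hierarchy can be illustrated with a block-level sum: blockIdx.x, blockDim.x and threadIdx.x locate each thread in the grid, a __shared__ array lets the threads of one block communicate, and __syncthreads() synchronizes them. This is a minimal sketch assuming a power-of-two block size of 256; blockSum, gpuSum and THREADS are hypothetical names, and error checks are omitted for brevity.

```cuda
#include <stdio.h>
#include <stdlib.h>

#define THREADS 256   // threads per block; a power of two so each step halves cleanly

// Each block sums THREADS consecutive elements of `in` into one partial sum.
__global__ void blockSum(const int *in, int *partial, int n)
{
    __shared__ int buf[THREADS];                     // shared by the threads of this block only
    int tid = threadIdx.x;                           // index inside the block
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global index across the grid

    buf[tid] = (i < n) ? in[i] : 0;                  // out-of-range threads contribute 0
    __syncthreads();                                 // every value is stored before reading

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();                             // finish this level before the next
    }
    if (tid == 0)
        partial[blockIdx.x] = buf[0];
}

// Sums 0 + 1 + ... + (n - 1): the GPU reduces per block, the CPU adds the partial sums.
long long gpuSum(int n)
{
    int blocks = (n + THREADS - 1) / THREADS;
    int *h_in = (int *)malloc(n * sizeof(int));
    int *h_partial = (int *)malloc(blocks * sizeof(int));
    for (int i = 0; i < n; i++) h_in[i] = i;

    int *d_in, *d_partial;
    cudaMalloc((void **)&d_in, n * sizeof(int));
    cudaMalloc((void **)&d_partial, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, THREADS>>>(d_in, d_partial, n);
    cudaMemcpy(h_partial, d_partial, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    long long total = 0;
    for (int b = 0; b < blocks; b++) total += h_partial[b];

    cudaFree(d_in);
    cudaFree(d_partial);
    free(h_in);
    free(h_partial);
    return total;
}

int main(void)
{
    printf("sum = %lld\n", gpuSum(10000));
    return 0;
}
```

Shared memory is not visible across blocks, so the per-block results have to be combined in a second step; here the host does it.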
CUDA C Examples
Vector Addition
https://github.com/olcf-tutorials/vector_addition_cuda
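A minimal vector-addition sketch with one thread per element; it is not the code from the linked repository. The block size of 256 is an assumption, vecAdd and runVecAdd are hypothetical names, and error checks are omitted for brevity.

```cuda
#include <stdio.h>
#include <stdlib.h>

// c[i] = a[i] + b[i], one thread per element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // the last block may contain threads past the end
        c[i] = a[i] + b[i];
}

// Adds a[i] = i and b[i] = 2i on the GPU and returns the number of wrong results.
int runVecAdd(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes), *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;          // round up so every element is covered
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost); // blocking: waits for the kernel

    int errors = 0;
    for (int i = 0; i < n; i++)
        if (h_c[i] != 3.0f * i) errors++;                // exact while 3n < 2^24

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return errors;
}

int main(void)
{
    printf("errors: %d\n", runVecAdd(1 << 20));
    return 0;
}
```

The round-up division is the usual way to choose the grid size when n is not a multiple of the block size; the `if (i < n)` guard then keeps the surplus threads idle.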
Finding the Maximum Value in an Array
https://www.geeksforgeeks.org/how-to-run-cuda-c-c-on-jupyter-notebook-in-google-colaboratory/
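One simple way to find the maximum is to let every thread fold its element into a single global result with atomicMax. This is a swapped-in technique, not necessarily the one on the linked page: it is short and correct but serializes updates to one address, so a shared-memory reduction scales better for large arrays. maxKernel and gpuMax are hypothetical names; error checks are omitted for brevity.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

// Every in-range thread merges its element into *result atomically.
__global__ void maxKernel(const int *in, int n, int *result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicMax(result, in[i]);
}

// Returns the largest of the n host values h_in[0..n-1] (n must be > 0).
int gpuMax(const int *h_in, int n)
{
    int *d_in, *d_result;
    int init = INT_MIN, result;
    cudaMalloc((void **)&d_in, n * sizeof(int));
    cudaMalloc((void **)&d_result, sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_result, &init, sizeof(int), cudaMemcpyHostToDevice); // start from the smallest int

    int threads = 256;
    maxKernel<<<(n + threads - 1) / threads, threads>>>(d_in, n, d_result);
    cudaMemcpy(&result, d_result, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_result);
    return result;
}

int main(void)
{
    int n = 100000;
    int *data = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; i++)
        data[i] = (i * 37) % 1000;   // scrambled values in 0..999
    printf("max = %d\n", gpuMax(data, n)); // expected: 999
    free(data);
    return 0;
}
```

Starting from INT_MIN rather than 0 keeps the result correct for arrays that contain only negative numbers.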
Matrix Addition
https://github.com/jcbacong/CUDA-matrix-addition
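Matrix addition maps naturally onto a 2D grid of 2D blocks built with dim3. The sketch below assumes row-major storage and 16 × 16 blocks; it is not the code from the linked repository, matAdd and runMatAdd are hypothetical names, and error checks are omitted for brevity.

```cuda
#include <stdio.h>
#include <stdlib.h>

// C = A + B for row-major rows x cols matrices, one thread per element.
__global__ void matAdd(const float *A, const float *B, float *C, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;   // x dimension walks columns
    int row = blockIdx.y * blockDim.y + threadIdx.y;   // y dimension walks rows
    if (row < rows && col < cols) {
        int idx = row * cols + col;                    // row-major offset
        C[idx] = A[idx] + B[idx];
    }
}

// Fills A with 1 and B[i] with i, adds them on the GPU, returns the number of wrong elements.
int runMatAdd(int rows, int cols)
{
    int count = rows * cols;
    size_t bytes = (size_t)count * sizeof(float);
    float *h_A = (float *)malloc(bytes), *h_B = (float *)malloc(bytes), *h_C = (float *)malloc(bytes);
    for (int i = 0; i < count; i++) { h_A[i] = 1.0f; h_B[i] = (float)i; }

    float *d_A, *d_B, *d_C;
    cudaMalloc((void **)&d_A, bytes);
    cudaMalloc((void **)&d_B, bytes);
    cudaMalloc((void **)&d_C, bytes);
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);                                // 16 x 16 = 256 threads per block
    dim3 grid((cols + block.x - 1) / block.x,          // enough blocks for every column
              (rows + block.y - 1) / block.y);         // and for every row
    matAdd<<<grid, block>>>(d_A, d_B, d_C, rows, cols);
    cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);

    int errors = 0;
    for (int i = 0; i < count; i++)
        if (h_C[i] != (float)i + 1.0f) errors++;

    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    free(h_A); free(h_B); free(h_C);
    return errors;
}

int main(void)
{
    printf("errors: %d\n", runMatAdd(1000, 700));
    return 0;
}
```

Because the y-dimension of a grid is limited to 65535 blocks, very tall matrices need either taller blocks or a loop inside the kernel.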
Matrix Multiplication
https://github.com/lzhengchun/matrix-cuda
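The simplest matrix-multiplication kernel assigns one thread to each output element C[row][col] and computes the dot product of row `row` of A with column `col` of B. This naive sketch assumes square n × n row-major matrices and checks the GPU result against a CPU loop; it is not the code from the linked repository, matMul, runMatMul and BLOCK_DIM are hypothetical names, and error checks are omitted for brevity.

```cuda
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_DIM 16   // block edge: 16 x 16 = 256 threads

// Naive C = A * B for n x n row-major matrices: thread (row, col) computes one element.
__global__ void matMul(const float *A, const float *B, float *C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; k++)
            sum += A[row * n + k] * B[k * n + col];   // row of A dot column of B
        C[row * n + col] = sum;
    }
}

// Multiplies small-integer matrices on the GPU and the CPU; returns the number of mismatches.
int runMatMul(int n)
{
    size_t bytes = (size_t)n * n * sizeof(float);
    float *h_A = (float *)malloc(bytes), *h_B = (float *)malloc(bytes);
    float *h_C = (float *)malloc(bytes), *h_ref = (float *)malloc(bytes);
    for (int i = 0; i < n * n; i++) { h_A[i] = (float)(i % 7); h_B[i] = (float)(i % 5); }

    // CPU reference; values are small integers, so the float sums are exact and comparable.
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++) {
            float s = 0.0f;
            for (int k = 0; k < n; k++) s += h_A[r * n + k] * h_B[k * n + c];
            h_ref[r * n + c] = s;
        }

    float *d_A, *d_B, *d_C;
    cudaMalloc((void **)&d_A, bytes);
    cudaMalloc((void **)&d_B, bytes);
    cudaMalloc((void **)&d_C, bytes);
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);

    dim3 block(BLOCK_DIM, BLOCK_DIM);
    dim3 grid((n + BLOCK_DIM - 1) / BLOCK_DIM, (n + BLOCK_DIM - 1) / BLOCK_DIM);
    matMul<<<grid, block>>>(d_A, d_B, d_C, n);
    cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);

    int errors = 0;
    for (int i = 0; i < n * n; i++)
        if (h_C[i] != h_ref[i]) errors++;

    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    free(h_A); free(h_B); free(h_C); free(h_ref);
    return errors;
}

int main(void)
{
    printf("errors: %d\n", runMatMul(128));
    return 0;
}
```

Every thread re-reads a full row and column from global memory, so this version is memory-bound; loading tiles of A and B into shared memory, as in the shared-memory matrix multiplication example of the CUDA C++ Programming Guide, cuts that traffic substantially.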
GPU Compute Capability and Limits
• Technical specifications per compute capability, from 5.0 to 9.0:

  Limit                                                   Value
  Maximum dimensionality of a grid of thread blocks       3
  Maximum x-dimension of a grid of thread blocks          2^31 - 1 (2147483647)
  Maximum y- or z-dimension of a grid of thread blocks    65535
  Maximum dimensionality of a thread block                3
  Maximum x- or y-dimension of a block                    1024
  Maximum z-dimension of a block                          64
  Maximum number of threads per block                     1024
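Rather than hard-coding these limits, a program can read them at run time with cudaGetDeviceProperties, whose cudaDeviceProp result exposes maxThreadsPerBlock, maxThreadsDim[3], maxGridSize[3] and the compute capability as major.minor. A minimal sketch; queryDevice is a hypothetical helper name and error checks are omitted for brevity.

```cuda
#include <stdio.h>

// Reads the properties (including launch limits) of device `dev`.
cudaDeviceProp queryDevice(int dev)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    return prop;
}

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; d++) {
        cudaDeviceProp p = queryDevice(d);
        printf("Device %d: %s, compute capability %d.%d\n", d, p.name, p.major, p.minor);
        printf("  max threads per block: %d\n", p.maxThreadsPerBlock);
        printf("  max block dimensions:  %d x %d x %d\n",
               p.maxThreadsDim[0], p.maxThreadsDim[1], p.maxThreadsDim[2]);
        printf("  max grid dimensions:   %d x %d x %d\n",
               p.maxGridSize[0], p.maxGridSize[1], p.maxGridSize[2]);
    }
    return 0;
}
```

A launch that exceeds these limits, for example 2048 threads per block, does not run; the failure shows up as an invalid-configuration error from cudaGetLastError().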

• https://developer.nvidia.com/cuda-gpus
• https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications
References
• Book: CUDA C/C++ Programming Guide
https://docs.nvidia.com/cuda/cuda-c-programming-guide
• Book: Professional CUDA® C Programming (2014)
https://www.cs.utexas.edu/~rossbach/cs380p/papers/cuda-programming.pdf
• Tutorials (CMU & Pitt)
https://people.cs.pitt.edu/~melhem/courses/xx45p/cuda1.pdf
https://www.cs.cmu.edu/afs/cs/academic/class/15418-s18/www/lectures/06_gpuarch.pdf
• Workshop (Cornell)
https://cvw.cac.cornell.edu/GPUarch/default
