Unit 6 Chapter 1 Parallel Programming Tools CUDA - Programming
Device = GPU
Host = CPU
Kernel = function that runs on the device
CUDA Programming Model
A kernel is executed by a grid of thread blocks
Grid – a group of one or more blocks. A grid is created for each CUDA kernel function.
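As a rough sketch of what this looks like in code (the kernel name and launch sizes below are made up for illustration, not taken from the slides), each kernel launch creates one grid of thread blocks:

#include <cuda_runtime.h>

// Hypothetical kernel; the body is left empty for illustration.
__global__ void my_kernel(void)
{
    // Every thread in every block of the grid runs this same code.
}

int main(void)
{
    dim3 threads_per_block(256);   // threads in each block
    dim3 blocks_in_grid(4);        // the grid: a group of one or more blocks

    // One grid of thread blocks is created for this kernel launch.
    my_kernel<<<blocks_in_grid, threads_per_block>>>();
    cudaDeviceSynchronize();       // wait for the whole grid to finish
    return 0;
}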
Arrays of Parallel Threads
A CUDA kernel is executed by an array of threads
All threads run the same code
Each thread has an ID that it uses to compute memory addresses and make control decisions
Minimal Kernels
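A sketch of the kind of minimal kernels this heading refers to (function and variable names are illustrative assumptions): each one is a complete device function, differing only in what each thread writes.

// Kernel that does nothing.
__global__ void kernel_empty(void) { }

// Kernel in which every thread writes the same value.
__global__ void kernel_value(int *a_d)
{
    a_d[threadIdx.x] = 7;
}

// Kernel in which every thread writes its own thread ID.
__global__ void kernel_id(int *a_d)
{
    a_d[threadIdx.x] = threadIdx.x;
}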
Example: Increment Array Elements
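The example itself is not reproduced in the text, so here is a sketch of the usual CPU-versus-GPU version of this example (names and the launch configuration are illustrative): the CPU loops over all elements, while on the GPU each thread handles one element, using its ID to pick an address and to decide whether it is in bounds.

// CPU version: one loop walks over all N elements.
void increment_cpu(float *a, float b, int N)
{
    for (int idx = 0; idx < N; idx++)
        a[idx] = a[idx] + b;
}

// GPU version: each thread increments exactly one element.
__global__ void increment_gpu(float *a, float b, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // global thread ID
    if (idx < N)                                      // control decision based on the ID
        a[idx] = a[idx] + b;
}

// Launch: enough blocks of blockSize threads to cover all N elements.
// dim3 dimBlock(blockSize);
// dim3 dimGrid((N + blockSize - 1) / blockSize);
// increment_gpu<<<dimGrid, dimBlock>>>(a_d, b, N);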
Thread Cooperation
The Missing Piece: threads may need to cooperate
Thread cooperation is valuable
Share results to avoid redundant computation
Share memory accesses (drastic bandwidth reduction)
Thread cooperation is a powerful feature of CUDA
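One common way threads in a block cooperate is through shared memory; the following is a small sketch under that assumption (the block-level sum, names, and sizes are illustrative, not from the slides). Each thread loads one value, and the block then combines the values so threads reuse each other's partial results instead of re-reading global memory.

#define BLOCK_SIZE 256

// Each block sums up to BLOCK_SIZE elements cooperatively.
__global__ void block_sum(const float *in, float *block_out, int N)
{
    __shared__ float cache[BLOCK_SIZE];          // memory shared by the block's threads

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (idx < N) ? in[idx] : 0.0f;
    __syncthreads();                             // make all loads visible to the block

    // Tree reduction inside the block: threads share partial results
    // (assumes blockDim.x == BLOCK_SIZE and is a power of two).
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        block_out[blockIdx.x] = cache[0];        // one result per block
}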
Manage memory
Moving Data…
CUDA allows us to copy data from one memory type to another.
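For example, host (CPU) memory can be copied to device (GPU) memory and back with cudaMemcpy; a minimal sketch (array size and variable names are illustrative):

#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    const int N = 1024;
    size_t bytes = N * sizeof(float);

    float *h_a = (float *)malloc(bytes);   // host (CPU) memory
    float *d_a = 0;
    cudaMalloc((void **)&d_a, bytes);      // device (GPU) memory

    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);   // host -> device
    // ... launch kernels that work on d_a ...
    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);   // device -> host

    cudaFree(d_a);
    free(h_a);
    return 0;
}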