06-CUDA Thread Organization
Vishwesh Jatala
Assistant Professor
Department of EECS
Indian Institute of Technology Bhilai
[email protected]
2022-23M
Outline
CUDA Programming Flow
[Figure: CPU (host) and GPU (device). Data is copied between host memory and device memory; (2) the kernel executes across the device's streaming multiprocessors (SMs), and results are copied back.]
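The flow above can be sketched as a minimal host-side skeleton (error checking omitted; `kernel`, `h_data`, and `d_data` are illustrative names, not from the slides):

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

__global__ void kernel(int *data) { /* (2) runs on the device's SMs */ }

int main(void) {
    int n = 1024, size = n * sizeof(int);
    int *h_data = (int *)malloc(size);      // host (CPU) memory
    int *d_data;
    cudaMalloc((void **)&d_data, size);     // device memory
    cudaMemcpy(d_data, h_data, size, cudaMemcpyHostToDevice); // (1) copy in
    kernel<<<n / 256, 256>>>(d_data);                         // (2) launch
    cudaMemcpy(h_data, d_data, size, cudaMemcpyDeviceToHost); // (3) copy out
    cudaFree(d_data); free(h_data);
    return 0;
}
```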
VectorAdd in CUDA
#define N 512
int main(void) {
int *a, *b, *c; // host copies of a, b, c
int *dev_a, *dev_b, *dev_c; //device copies of a, b, c
int size = N * sizeof(int);
…
// Alloc space for host copies of a, b, c and
// setup input values
a = (int *)malloc(size); random_ints(a, N);
b = (int *)malloc(size); random_ints(b, N);
c = (int *)malloc(size);
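The elided part of main follows the classic CUDA vector-add pattern: allocate device memory, copy the inputs over, launch the kernel, and copy the result back. A sketch, assuming the simplest variant with one block per element:

```cuda
__global__ void add(int *a, int *b, int *c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];  // one element per block
}

// inside main(), after the host allocations:
cudaMalloc((void **)&dev_a, size);
cudaMalloc((void **)&dev_b, size);
cudaMalloc((void **)&dev_c, size);
cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);
add<<<N, 1>>>(dev_a, dev_b, dev_c);            // N blocks of 1 thread each
cudaMemcpy(c, dev_c, size, cudaMemcpyDeviceToHost);
free(a); free(b); free(c);
cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c);
```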
Practice Problem-1
■ Given two matrices M and N, each of size k*k (where k<=1024), write a CUDA program to compute M+N.
❑ Hint: allocate M and N as single-dimensional arrays of k*k elements each.
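A hedged sketch of one possible solution (the kernel name matAdd and output array P are illustrative): treat both matrices as flat arrays of k*k elements and let each thread add one pair.

```cuda
__global__ void matAdd(int *M, int *N, int *P, int total) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < total)            // guard: total may not divide the block count
        P[i] = M[i] + N[i];
}

// launch with enough blocks to cover all k*k elements, e.g.:
// matAdd<<<(k * k + 255) / 256, 256>>>(dev_M, dev_N, dev_P, k * k);
```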
Thread Configuration
Indexing Arrays with Threads and Thread Blocks
[Figure: Array A of 32 elements processed by 4 thread blocks of M = 8 threads each; within every block, threadIdx.x runs 0-7, while A's elements are numbered 0-31.]
Indexing Arrays with Threads and Thread Blocks
[Figure: the same Array A; with M = 8 threads per block, each thread's global element index is threadIdx.x + blockIdx.x * M, e.g. element 21 is handled by threadIdx.x = 5 in blockIdx.x = 2.]
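When the array length n is not a multiple of the block size M, a common pattern is to launch ceil(n/M) blocks and let out-of-range threads do nothing. A sketch:

```cuda
__global__ void add(int *a, int *b, int *c, int n) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < n)                    // threads past the end do nothing
        c[index] = a[index] + b[index];
}

// ceil(n / M) blocks of M threads each:
// add<<<(n + M - 1) / M, M>>>(dev_a, dev_b, dev_c, n);
```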
VectorAdd in CUDA with Thread and Blocks
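Combining blocks and threads, each thread computes its own global index. A sketch of the kernel this slide refers to, assuming N divides evenly by the block size:

```cuda
#define THREADS_PER_BLOCK 512

__global__ void add(int *a, int *b, int *c) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;  // unique global index
    c[index] = a[index] + b[index];
}

// launch: add<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(dev_a, dev_b, dev_c);
```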
Thread Block Configuration
Practice Problem-2
Thread Configuration
Exercise: 1D Thread Organization
■ Assumptions:
❑ The matrix is stored in a single-dimensional array.
Why Thread Blocks?
Why not launch a single thread block containing all the threads?
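One reason: a block runs entirely on one SM, and current GPUs cap a block at 1024 threads, so a single block can neither exceed that limit nor use more than one SM. The limits can be queried with the CUDA runtime API; a sketch:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // properties of device 0
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Number of SMs: %d\n", prop.multiProcessorCount);
    return 0;
}
```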
GPU Architecture
Few Constraints
Scalability
[Figure: the thread blocks of one grid distributed across the SMs of GPU-0 and GPU-1; because blocks execute independently, the same program scales across GPUs with different numbers of SMs.]
Compute Capabilities
Source: https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
Summary
■ CUDA Programming
❑ Thread organization (1D)
❑ Examples
■ Next Lecture
❑ Thread organization (2D & 3D)
❑ GPU Instruction Execution