GPU Architecture
Patrick Cozzi
Analytical Graphics, Inc.
16 fragment shader processors
Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Thread ID
Declaration Specifier
Execution Configuration
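A minimal sketch of where these three labels show up in CUDA code. The kernel name scaleKernel, the host helper scaleOnGpu, and the block size of 256 are assumptions made for illustration; they are not from the slides.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Declaration specifier: __global__ marks a kernel, a function that
// runs on the device and is launched from the host.
__global__ void scaleKernel(float* data, float factor, int n)
{
    // Thread ID: a unique global index built from the block index,
    // the block size, and the thread's index within its block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical host helper: copy to the device, launch, copy back.
std::vector<float> scaleOnGpu(const std::vector<float>& input, float factor)
{
    std::vector<float> output(input);
    int n = static_cast<int>(input.size());
    if (n == 0) return output;

    size_t bytes = n * sizeof(float);
    float* d_data = nullptr;
    cudaError_t err = cudaMalloc(&d_data, bytes);
    assert(err == cudaSuccess);
    cudaMemcpy(d_data, input.data(), bytes, cudaMemcpyHostToDevice);

    // Execution configuration: <<<blocks per grid, threads per block>>>.
    int threadsPerBlock = 256;  // assumed value
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    scaleKernel<<<blocksPerGrid, threadsPerBlock>>>(d_data, factor, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(output.data(), d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return output;
}
```

Calling scaleOnGpu({1.0f, 2.0f, 3.0f}, 2.0f) from a host main() should return the input with every element doubled. Error checking beyond the allocation is omitted to keep the sketch short.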
Mds[i] = Md[j];
__syncthreads();
func(Mds[i], Mds[i + 1]);
Image from: http://courses.engr.illinois.edu/ece498/al/textbook/Chapter2-CudaProgrammingModel.pdf
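The snippet above can be placed in a complete kernel, sketched here under stated assumptions: func is a hypothetical stand-in that adds its two arguments, the kernel runs as a single block, and the names neighborSum, neighborSumOnGpu, and MAX_THREADS are illustrative only. The barrier matters because each thread reads Mds[i + 1], which a different thread wrote.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

#define MAX_THREADS 256  // assumed upper bound on the block size

// Hypothetical stand-in for func() from the slide.
__device__ float func(float a, float b) { return a + b; }

__global__ void neighborSum(const float* Md, float* out)
{
    __shared__ float Mds[MAX_THREADS];
    int i = threadIdx.x;
    int j = threadIdx.x;  // single block, so the global index equals i

    Mds[i] = Md[j];

    // Every thread in the block waits here until all of them have
    // written their element of Mds. The call sits outside any branch
    // so that all threads reach it.
    __syncthreads();

    if (i + 1 < blockDim.x) {
        out[j] = func(Mds[i], Mds[i + 1]);  // Mds[i + 1] was written by thread i + 1
    } else {
        out[j] = Mds[i];  // last thread has no right-hand neighbor
    }
}

// Hypothetical host helper: runs the kernel on one block of input.size() threads.
std::vector<float> neighborSumOnGpu(const std::vector<float>& input)
{
    int n = static_cast<int>(input.size());
    assert(n > 0 && n <= MAX_THREADS);

    size_t bytes = n * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_in, bytes);
    assert(err == cudaSuccess);
    err = cudaMalloc(&d_out, bytes);
    assert(err == cudaSuccess);
    cudaMemcpy(d_in, input.data(), bytes, cudaMemcpyHostToDevice);

    neighborSum<<<1, n>>>(d_in, d_out);

    std::vector<float> output(n);
    cudaMemcpy(output.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return output;
}
```

Without the __syncthreads() call, a thread could read Mds[i + 1] before its neighbor has stored it, so the result would depend on scheduling.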
Thread Synchronization
Thread 0 and Thread 1 each execute:
Mds[i] = Md[j];
__syncthreads();
func(Mds[i], Mds[i+1]);
[Animation frames: the two threads step through this code at Time 0, 1, 2, 4, and 5.]
Thread Synchronization Thread Synchronization
Why is it important that execution time be similar among threads?
Why does it only synchronize within a block?