1 Tutorial Intro
– Heterogeneous computing
*Slide from GTC 2011, GPU Computing: Past, Present and Future, David Luebke, NVIDIA
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.7
1: Background on GPU Computing
Why GPU?
*Slide from AFDS 2011, The Programmer’s Guide to the APU Galaxy, Phil Rogers, AMD
• Correct viewpoint? (if you want 100x speedup)
– GPU = computation workhorse
– CPU = sequential code “accelerator” and I/O offload engine
[Figure: diagram not recovered; surviving label: Interconnection Network]
• Threads are scalar threads
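[Figure: execution of a branch over time by threads T1 to T4. Only part of the code listing survives:
else
D: v = 10;
E: w = bar[tid.x]+v;
Block C is executed by T1 T2, block D by T3 T4, and block E by T1 T2 T3 T4.]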
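The branch in the figure above can be sketched as a small kernel. Only the statements labeled D and E come from the slide; the load of v, the branch condition (v < 10), the body of C, the kernel name branch_example and the host helper run_branch_example are assumptions made for illustration, and tid here stands in for the slide's tid.x.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Labels C, D, E follow the slide. The load of v, the condition and the
// body of C are assumed; they are not recoverable from the slide text.
__global__ void branch_example(const int *foo, const int *bar, int *w, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int v = foo[tid];        // assumed: per-thread input value
    if (v < 10)              // assumed: branch condition
        v = 0;               // C: assumed body
    else
        v = 10;              // D: from the slide
    w[tid] = bar[tid] + v;   // E: from the slide, runs on every thread
}

// Hypothetical host helper: runs the kernel and returns w,
// or an empty vector if the inputs mismatch or a CUDA call fails.
std::vector<int> run_branch_example(const std::vector<int> &foo,
                                    const std::vector<int> &bar)
{
    std::vector<int> w;
    if (foo.size() != bar.size() || foo.empty()) return w;
    int n = static_cast<int>(foo.size());
    size_t bytes = n * sizeof(int);
    int *d_foo = nullptr, *d_bar = nullptr, *d_w = nullptr;
    bool ok = cudaMalloc(&d_foo, bytes) == cudaSuccess;
    ok = ok && cudaMalloc(&d_bar, bytes) == cudaSuccess;
    ok = ok && cudaMalloc(&d_w, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(d_foo, foo.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(d_bar, bar.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int nblocks = (n + 255) / 256;
        branch_example<<<nblocks, 256>>>(d_foo, d_bar, d_w, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    if (ok) {
        w.resize(n);
        // cudaMemcpy waits for the kernel to finish before copying.
        if (cudaMemcpy(w.data(), d_w, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
            w.clear();
    }
    cudaFree(d_foo);
    cudaFree(d_bar);
    cudaFree(d_w);
    return w;
}
```

With foo = {1, 2, 20, 30}, the first two threads take the C side and the last two take the D side, mirroring the T1 T2 / T3 T4 split in the figure; all four then execute E.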
CUDA code
__global__ void saxpy_parallel(int n, float a, float *x, float *y)
{
    // One scalar thread per element: compute this thread's global index
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // guard: the last block may extend past n
        y[i] = a * x[i] + y[i];
}

int main() {
    … // omitted: allocate and initialize memory
    // Invoke parallel SAXPY kernel with 256 threads/block
    int nblocks = (n + 255) / 256;  // round up so all n elements are covered
    saxpy_parallel<<<nblocks, 256>>>(n, 2.0f, x, y);
    … // omitted: transfer results from GPU to CPU
}
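The allocation and transfer steps omitted from main() could be filled in roughly as follows. This is a minimal sketch, not the tutorial's own code: the helper name saxpy_on_gpu, the use of std::vector on the host, and the error-handling style are assumptions. The kernel and the launch configuration (256 threads/block, rounded-up block count) are taken from the slide.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Kernel from the slide: each scalar thread updates one element of y.
__global__ void saxpy_parallel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// Hypothetical host helper (not from the slides): computes y = a*x + y on
// the GPU. Returns true only if the sizes match and every CUDA call succeeds.
bool saxpy_on_gpu(float a, const std::vector<float> &x, std::vector<float> &y)
{
    if (x.size() != y.size()) return false;
    int n = static_cast<int>(x.size());
    if (n == 0) return true;
    size_t bytes = n * sizeof(float);

    // Allocate device memory and copy the inputs to the GPU.
    float *d_x = nullptr, *d_y = nullptr;
    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess;
    ok = ok && cudaMalloc(&d_y, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // Invoke parallel SAXPY kernel with 256 threads/block.
        int nblocks = (n + 255) / 256;
        saxpy_parallel<<<nblocks, 256>>>(n, a, d_x, d_y);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // Transfer results from GPU to CPU; this copy waits for the kernel.
    ok = ok && cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_x);  // freeing a null pointer is a no-op
    cudaFree(d_y);
    return ok;
}
```

Calling saxpy_on_gpu(2.0f, x, y) with every x[i] = 1.0f and every y[i] = 3.0f should leave every y[i] equal to 5.0f.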
[Figure: scatter plot of GPGPU-Sim IPC (y-axis, 0 to 200) against Hardware IPC (x-axis, 0 to 200). Labeled kernels: Similarity Score, copyChunks_kernel(); Back Propagation, bpnn_layerforward_CUDA(); HotSpot, calculate_temp().]
Accuracy
RODINIA Benchmark Suite
Tesla C2050 (Fermi) SASS
GPGPU-Sim 3.1.0 – Correlation: 97.35%
[Figure: scatter plot of GPGPU-Sim IPC (y-axis, 0 to 500) against Hardware IPC (x-axis, 0 to 500).]
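A correlation figure like the one quoted on this slide can be computed from paired IPC samples. The slide does not state which formula was used; the sketch below assumes the Pearson correlation coefficient between simulated and hardware IPC, and the function name pearson_correlation is hypothetical. It is host-only code and needs no GPU.

```cuda
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient between two equally sized samples
// (assumed metric; the slide does not name the formula).
// Returns 0.0 if the sizes differ, there are fewer than two points,
// or either sample has zero variance.
double pearson_correlation(const std::vector<double> &sim_ipc,
                           const std::vector<double> &hw_ipc)
{
    std::size_t n = sim_ipc.size();
    if (n < 2 || hw_ipc.size() != n) return 0.0;

    double mean_s = 0.0, mean_h = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_s += sim_ipc[i];
        mean_h += hw_ipc[i];
    }
    mean_s /= static_cast<double>(n);
    mean_h /= static_cast<double>(n);

    double cov = 0.0, var_s = 0.0, var_h = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double ds = sim_ipc[i] - mean_s;
        double dh = hw_ipc[i] - mean_h;
        cov += ds * dh;
        var_s += ds * ds;
        var_h += dh * dh;
    }
    if (var_s == 0.0 || var_h == 0.0) return 0.0;
    return cov / std::sqrt(var_s * var_h);
}
```

Under this assumption, each point in the scatter plot contributes one (hardware IPC, simulated IPC) pair, and a result of 0.9735 would correspond to the 97.35% shown on the slide.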
Accuracy (Average Power)
NVIDIA GTX 480
[Figure: average power plot with both axes ranging from 0 to 250; axis labels not recovered.]