
GPGPU-Sim 3.x
A Performance Simulator for Many-Core Accelerator Research

Website: gpgpu-sim.org

Tor M. Aamodt
Wilson W. L. Fung
Tayler Hetherington

University of British Columbia

Version of simulator corresponding to these slides: GPGPU-Sim 3.1.2
Tutorial Goals
• Make you more effective in your research using GPGPU-Sim
– Feel free to ask questions when you have them

• After this tutorial, you will be able to:


– Describe what GPGPU-Sim simulates
– Set up GPGPU-Sim and run CUDA applications on it
– Do simple performance analysis of CUDA applications with AerialVision
– Extend GPGPU-Sim for your own research
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.2
1: Backgnd on GPU Computing
Quick Survey
• How many of you are:
– Graduate students?
– Faculty members?
– Working for government?
– Working for industry?
• Have you written a CUDA or OpenCL
program before?
• Have you used GPGPU-Sim?

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.3


1: Backgnd on GPU Computing
Overview
1 Brief Background on GPU Computing 40mins
2 GPGPU-Sim Overview 30mins
3 Demo 1: Setup and Run 15mins
Coffee Break (10:00 – 10:30am)
4 Microarchitecture Timing Model 85mins
Lunch (12:00 – 1:00pm)
5a Software Organization 25mins
5b Timing Model (Software) 50mins
5c Power Model: GPUWattch 45mins
Coffee Break (3:00 – 3:30pm)
6 The GPU Design Space 10mins
7a Demo 2: Debugging Tool 15mins
7b Demo 3: Visualizing Performance 30mins
8 Extending GPGPU-Sim (with GPUWattch) 30mins
9 Wrap Up and Discussion 15mins
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.4
1: Backgnd on GPU Computing
What is a GPU?
• GPU = Graphics Processing Unit
– Optimized for Highly Parallel Workloads
– Highly Programmable
– Commodity Hardware (“Desktop Supercomputing”)
• NVIDIA's GTX 580: 16 × 32-wide multiprocessors
– 512 ALUs in total
– 24,576 concurrent threads (up to 1,536 resident threads per multiprocessor)

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.5


1: Backgnd on GPU Computing
GPU Computing

• 4-core CPU + 1536-core GPU
– Heterogeneous computing

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.6


1: Backgnd on GPU Computing
Why GPU?

*Slide from GTC 2011, GPU Computing: Past, Present and Future, David Luebke, NVIDIA
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.7
1: Backgnd on GPU Computing
Why GPU?

*Slide from GTC 2011, GPU Computing: Past, Present and Future, David Luebke, NVIDIA
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.8
1: Backgnd on GPU Computing
Why GPU?

*Slide from AFDS 2011, The Programmer’s Guide to the APU Galaxy, Phil Rogers, AMD
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.9
1: Backgnd on GPU Computing
Why GPU?

• GPUs with OpenCL support (besides AMD and NVIDIA)
– Adreno™ 3xx GPU from Qualcomm
– Mali™-T600 Series GPUs from ARM
– HD 4000 on Intel's Ivy Bridge
– Intel Xeon Phi (Knights Corner)

• GPU computing is gaining broad industry support.
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.10
1: Backgnd on GPU Computing
Programming Model
• Traditional viewpoint
– CPU offloads data-parallel code sections onto the GPU

• Correct viewpoint? (if you want 100x speedup)
– GPU = computation workhorse
– CPU = sequential code “accelerator” and I/O offload engine

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.11
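
To make this pattern concrete, here is a minimal sketch (not from the slides; the scale() kernel and array sizes are illustrative) of the typical offload flow: the CPU allocates device memory, copies inputs to the GPU, launches the data-parallel kernel, and copies results back.

#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;                 // data-parallel work runs on the GPU
}

void run_on_gpu(float *host_data, int n) {
    float *dev_data;
    cudaMalloc((void **)&dev_data, n * sizeof(float));                            // allocate GPU memory
    cudaMemcpy(dev_data, host_data, n * sizeof(float), cudaMemcpyHostToDevice);   // CPU -> GPU
    scale<<<(n + 255) / 256, 256>>>(dev_data, 2.0f, n);                           // launch kernel on the GPU
    cudaMemcpy(host_data, dev_data, n * sizeof(float), cudaMemcpyDeviceToHost);   // GPU -> CPU
    cudaFree(dev_data);
}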


1: Backgnd on GPU Computing
GPU Microarchitecture Overview
(10,000 feet)
• Single-Instruction, Multiple-Threads (SIMT)

[Block diagram: the GPU comprises several SIMT Core Clusters, each containing multiple SIMT Cores; the clusters connect through an Interconnection Network to a set of Memory Partitions backed by off-chip GDDR3/GDDR5 DRAM.]
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.12


1: Backgnd on GPU Computing
CUDA and OpenCL
• Extensions of C to support a coprocessor model
• GPGPU-Sim supports both
– This tutorial focuses on CUDA
• More applications available today

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.13


1: Backgnd on GPU Computing
CUDA Thread Hierarchy
• Kernel launch = grid of blocks of threads

• Threads are scalar threads

Source: CUDA programming manual

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.14
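
As a minimal sketch (kernel and buffer names are illustrative, not from the slides), the launch configuration below defines a grid of blocks of threads, and each scalar thread derives a unique global index from the built-in variables:

__global__ void hierarchy_demo(int *out) {
    // blockIdx, blockDim, and threadIdx identify this scalar thread within the grid
    int global_id = blockIdx.x * blockDim.x + threadIdx.x;
    out[global_id] = global_id;
}

// Launch a grid of 4 blocks x 64 threads per block = 256 scalar threads.
// dim3 can also express 2D/3D grids and blocks, e.g. dim3 grid(4, 2).
hierarchy_demo<<<4, 64>>>(dev_out);   // dev_out: a device buffer of at least 256 ints (hypothetical)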


1: Backgnd on GPU Computing
CUDA Memory Model
• Memory Spaces
– Shared
– Global
– Local
– Constant
– Texture

Source: CUDA programming manual

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.15
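
A minimal sketch (illustrative names, not from the slides) of where these spaces appear in CUDA source; texture memory is omitted since it is set up through host-side API calls:

__constant__ float coeffs[16];                 // constant memory: read-only in kernels, written by the host

__global__ void spaces_demo(float *global_buf) // global_buf points into global memory
{
    __shared__ float tile[256];                // shared memory: visible to all threads in one block
    float local_val;                           // per-thread local/register storage

    tile[threadIdx.x] = global_buf[threadIdx.x] * coeffs[0];
    __syncthreads();                           // synchronize the block before reading tile[]
    local_val = tile[(threadIdx.x + 1) % 256];
    global_buf[threadIdx.x] = local_val;
}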


1: Backgnd on GPU Computing
SIMT Execution Model
• Programmer sees MIMD threads (scalar)
• GPU HW bundles threads into warps and runs them in lockstep on SIMD hardware

foo[] = {4, 8, 12, 16};
A: v = foo[tid.x];
B: if (v < 10)
C:     v = 0;
   else
D:     v = 10;
E: w = bar[tid.x] + v;

[Execution timeline: threads T1-T4 execute A and B together; at the branch, T1 and T2 execute C while T3 and T4 execute D (the two paths are serialized); the warp reconverges at E, where T1-T4 execute together again.]

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.16
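
The same example written as a minimal runnable CUDA sketch (foo, bar, and w are assumed to be device arrays with one element per thread; not from the slides):

__global__ void divergence_demo(const int *foo, const int *bar, int *w) {
    int v = foo[threadIdx.x];                  // A: all threads in the warp execute this
    if (v < 10)                                // B: the warp evaluates the branch together
        v = 0;                                 // C: taken only by threads whose v < 10
    else
        v = 10;                                // D: taken by the remaining threads (serialized with C)
    w[threadIdx.x] = bar[threadIdx.x] + v;     // E: the warp reconverges here
}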


1: Backgnd on GPU Computing
CUDA Syntax Highlights
• Declaration specifiers to indicate where things live
__global__ void foo(...); // runs on GPU, callable from CPU
__device__ void bar(...); // function callable from a GPU thread

• Parallel kernel launch


foo<<<500, 128>>>(...); // 500 blocks, 128 threads each

• Special variables for thread identification in kernels


dim3 threadIdx; dim3 blockIdx; dim3 blockDim;

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.17


1: Backgnd on GPU Computing
CUDA Example Code
Source: High Performance Computing with CUDA, SC09 Tutorial, David Luebke, NVIDIA
Standard C Code
void saxpy_serial(int n, float a, float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Invoke serial SAXPY kernel
saxpy_serial(n, 2.0, x, y);

CUDA code
__global__ void saxpy_parallel(int n, float a, float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    … // omitted: allocate and initialize memory
    // Invoke parallel SAXPY kernel with 256 threads/block
    int nblocks = (n + 255) / 256;
    saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
    … // omitted: transfer results from GPU to CPU
}

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.18


1: Backgnd on GPU Computing
GPGPU-Sim in a Nutshell
• Microarchitecture performance model of
contemporary GPUs
– New: Power Model: GPUWattch
• Runs unmodified CUDA/OpenCL
• BSD License
• Focus of this tutorial:
GPGPU-Sim version 3.1.2 and later

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.19


1: Backgnd on GPU Computing
GPGPU-Sim 3.1.2
• Since GPGPU-Sim 2.1.1b:
– Refactored for C++ Object-Oriented Implementation
– Redesigned Timing Models
• SIMT Core model, Cache models, GDDR5 timing … (later)
– Asynchronous Kernel Calls
– Concurrent Kernel Execution
– Support for CUDA 3.1
• Since GPGPU-Sim 3.0.1:
– Updated timing model to model Fermi more accurately
– Much more robust SASS support
– Support for CUDA 4.0 and later; OpenCL with newer drivers
– Power Model: GPUWattch
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.20
1: Backgnd on GPU Computing
Accuracy
RODINIA Benchmark Suite
Quadro FX5800 SASS
GPGPU-Sim 3.1.0 – Correlation: 98.37%

[Scatter plot: GPGPU-Sim IPC vs. hardware IPC, both axes 0–200, with per-kernel similarity scores; labeled points include copyChunks_kernel(), Back Propagation's bpnn_layerforward_CUDA(), and HotSpot's calculate_temp().]
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.21
1: Backgnd on GPU Computing
Accuracy
RODINIA Benchmark Suite
Tesla C2050 (Fermi) SASS
GPGPU-Sim 3.1.0 – Correlation: 97.35%
[Scatter plot: GPGPU-Sim IPC vs. hardware IPC, both axes 0–500.]
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.22
1: Backgnd on GPU Computing
Accuracy (Average Power)
NVIDIA GTX 480
• Average Absolute Error ≈ 12%

[Scatter plot: estimated power (W) vs. measured power (W), both axes 0–250.]

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.23


1: Backgnd on GPU Computing
Dependencies
• Linux
• CUDA Toolkit (3.1 / 4.0 / 4.2)
• Standard Development Environment
– GCC, Make, etc.
• No GPU hardware required to run CUDA applications

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.24


1: Backgnd on GPU Computing
Citation
• If you use GPGPU-Sim (either 2.x or 3.x) in your publication, please cite our ISPASS 2009 paper:

Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, and Tor M. Aamodt,
"Analyzing CUDA Workloads Using a Detailed GPU Simulator," in Proceedings of
the IEEE International Symposium on Performance Analysis of Systems and
Software (ISPASS), pp. 163-174, Boston, MA, April 26-28, 2009.

• Please indicate which version of GPGPU-Sim you used / extended
– E.g., “GPGPU-Sim version 3.1.2”
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.25
1: Backgnd on GPU Computing
Session Summary
• GPU Computing
• CUDA Programming Model Concepts
– Thread Hierarchy
– Memory Spaces
– SIMT Execution Model
• GPGPU-Sim:
Timing + power simulator of modern GPUs
– Good accuracy
– Runs on systems without HW GPUs
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.26
1: Backgnd on GPU Computing
Overview
1 Brief Background on GPU Computing 40mins
2 GPGPU-Sim Overview 30mins
3 Demo 1: Setup and Run 15mins
Coffee Break (10:00 – 10:30am)
4 Microarchitecture Timing Model 85mins
Lunch (12:00 – 1:00pm)
5a Software Organization 25mins
5b Timing Model (Software) 50mins
5c Power Model: GPUWattch 45mins
Coffee Break (3:00 – 3:30pm)
6 The GPU Design Space 10mins
7a Demo 2: Debugging Tool 15mins
7b Demo 3: Visualizing Performance 30mins
8 Extending GPGPU-Sim (with GPUWattch) 30mins
9 Wrap Up and Discussion 15mins
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.27
1: Backgnd on GPU Computing
