
GPGPU-Sim 3.x
A Performance Simulator for Many-Core Accelerator Research

Website: gpgpu-sim.org

Tor M. Aamodt
Wilson W. L. Fung
Tayler Hetherington

University of British Columbia

Version of simulator corresponding to these slides: GPGPU-Sim 3.1.2
Tutorial Goals
• Make you more effective in your research using GPGPU-Sim
– Feel free to ask questions when you have them

• After this tutorial, you will be able to:


– Describe what GPGPU-Sim simulates
– Set up GPGPU-Sim and run CUDA applications on it
– Do simple performance analysis of CUDA applications with AerialVision
– Extend GPGPU-Sim for your own research
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.2
1: Backgnd on GPU Computing
Quick Survey
• How many of you are:
– Graduate students?
– Faculty members?
– Working for government?
– Working for industry?
• Have you written a CUDA or OpenCL
program before?
• Have you used GPGPU-Sim?

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.3


1: Backgnd on GPU Computing
Overview
1 Brief Background on GPU Computing 40mins
2 GPGPU-Sim Overview 30mins
3 Demo 1: Setup and Run 15mins
Coffee Break (10:00 – 10:30am)
4 Microarchitecture Timing Model 85mins
Lunch (12:00 – 1:00pm)
5a Software Organization 25mins
5b Timing Model (Software) 50mins
5c Power Model: GPUWattch 45mins
Coffee Break (3:00 – 3:30pm)
6 The GPU Design Space 10mins
7a Demo 2: Debugging Tool 15mins
7b Demo 3: Visualizing Performance 30mins
8 Extending GPGPU-Sim (with GPUWattch) 30mins
9 Wrap Up and Discussion 15mins
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.4
1: Backgnd on GPU Computing
What is a GPU?
• GPU = Graphics Processing Unit
– Optimized for Highly Parallel Workloads
– Highly Programmable
– Commodity Hardware (“Desktop Supercomputing”)
• NVIDIA's GTX 580: 16 × 32-wide multiprocessors
– 512 ALUs in total
– 24,576 concurrent threads (up to 1,536 resident threads per multiprocessor)

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.5


1: Backgnd on GPU Computing
GPU Computing

• 4-core CPU + 1536-core GPU
– Heterogeneous computing

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.6


1: Backgnd on GPU Computing
Why GPU?

*Slide from GTC 2011, GPU Computing: Past, Present and Future, David Luebke, NVIDIA
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.7
1: Backgnd on GPU Computing
Why GPU?

*Slide from GTC 2011, GPU Computing: Past, Present and Future, David Luebke, NVIDIA
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.8
1: Backgnd on GPU Computing
Why GPU?

*Slide from AFDS 2011, The Programmer’s Guide to the APU Galaxy, Phil Rogers, AMD
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.9
1: Backgnd on GPU Computing
Why GPU?

• GPUs with OpenCL support (besides AMD and NVIDIA)
– Adreno™ 3xx GPU from Qualcomm
– Mali™-T600 Series GPUs from ARM
– HD 4000 on Intel's Ivy Bridge
– Intel Xeon Phi (Knights Corner)

• GPU computing is gaining broad industry support.
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.10
1: Backgnd on GPU Computing
Programming Model
• Traditional viewpoint
– CPU offloads data-parallel code sections onto the GPU

• Correct viewpoint? (if you want 100x speedup)
– GPU = computation workhorse
– CPU = sequential code “accelerator” and I/O offload engine

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.11
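
To make this pattern concrete, here is a minimal sketch (not from the slides; the scale() kernel and array sizes are illustrative) of the typical offload flow: the CPU allocates device memory, copies inputs to the GPU, launches the data-parallel kernel, and copies results back.

#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;                 // data-parallel work runs on the GPU
}

void run_on_gpu(float *host_data, int n) {
    float *dev_data;
    cudaMalloc((void **)&dev_data, n * sizeof(float));                            // allocate GPU memory
    cudaMemcpy(dev_data, host_data, n * sizeof(float), cudaMemcpyHostToDevice);   // CPU -> GPU
    scale<<<(n + 255) / 256, 256>>>(dev_data, 2.0f, n);                           // launch kernel on the GPU
    cudaMemcpy(host_data, dev_data, n * sizeof(float), cudaMemcpyDeviceToHost);   // GPU -> CPU
    cudaFree(dev_data);
}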


1: Backgnd on GPU Computing
GPU Microarchitecture Overview
(10,000 feet)
• Single-Instruction, Multiple-Threads (SIMT)

[Block diagram: the GPU comprises several SIMT Core Clusters, each containing multiple SIMT Cores; the clusters connect through an Interconnection Network to a set of Memory Partitions backed by off-chip GDDR3/GDDR5 DRAM.]
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.12


1: Backgnd on GPU Computing
CUDA and OpenCL
• Extensions of C to support a coprocessor model
• GPGPU-Sim supports both
– This tutorial focuses on CUDA
• More applications available today

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.13


1: Backgnd on GPU Computing
CUDA Thread Hierarchy
• Kernel launch = grid of blocks of threads

• Threads are scalar threads

Source: CUDA programming manual

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.14
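
As a minimal sketch (kernel and buffer names are illustrative, not from the slides), the launch configuration below defines a grid of blocks of threads, and each scalar thread derives a unique global index from the built-in variables:

__global__ void hierarchy_demo(int *out) {
    // blockIdx, blockDim, and threadIdx identify this scalar thread within the grid
    int global_id = blockIdx.x * blockDim.x + threadIdx.x;
    out[global_id] = global_id;
}

// Launch a grid of 4 blocks x 64 threads per block = 256 scalar threads.
// dim3 can also express 2D/3D grids and blocks, e.g. dim3 grid(4, 2).
hierarchy_demo<<<4, 64>>>(dev_out);   // dev_out: a device buffer of at least 256 ints (hypothetical)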


1: Backgnd on GPU Computing
CUDA Memory Model
• Memory Spaces
– Shared
– Global
– Local
– Constant
– Texture

Source: CUDA programming manual

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.15
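
A minimal sketch (illustrative names, not from the slides) of where these spaces appear in CUDA source; texture memory is omitted since it is set up through host-side API calls:

__constant__ float coeffs[16];                 // constant memory: read-only in kernels, written by the host

__global__ void spaces_demo(float *global_buf) // global_buf points into global memory
{
    __shared__ float tile[256];                // shared memory: visible to all threads in one block
    float local_val;                           // per-thread local/register storage

    tile[threadIdx.x] = global_buf[threadIdx.x] * coeffs[0];
    __syncthreads();                           // synchronize the block before reading tile[]
    local_val = tile[(threadIdx.x + 1) % 256];
    global_buf[threadIdx.x] = local_val;
}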


1: Backgnd on GPU Computing
SIMT Execution Model
• Programmer sees MIMD threads (scalar)
• GPU HW bundles threads into warps and runs them in lockstep on SIMD hardware

foo[] = {4, 8, 12, 16};
A: v = foo[tid.x];
B: if (v < 10)
C:     v = 0;
   else
D:     v = 10;
E: w = bar[tid.x] + v;

[Execution timeline: threads T1-T4 execute A and B together; at the branch, T1 and T2 execute C while T3 and T4 execute D (the two paths are serialized); the warp reconverges at E, where T1-T4 execute together again.]

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.16
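
The same example written as a minimal runnable CUDA sketch (foo, bar, and w are assumed to be device arrays with one element per thread; not from the slides):

__global__ void divergence_demo(const int *foo, const int *bar, int *w) {
    int v = foo[threadIdx.x];                  // A: all threads in the warp execute this
    if (v < 10)                                // B: the warp evaluates the branch together
        v = 0;                                 // C: taken only by threads whose v < 10
    else
        v = 10;                                // D: taken by the remaining threads (serialized with C)
    w[threadIdx.x] = bar[threadIdx.x] + v;     // E: the warp reconverges here
}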


1: Backgnd on GPU Computing
CUDA Syntax Highlights
• Declaration specifiers to indicate where things live
__global__ void foo(...); // runs on GPU, callable from CPU
__device__ void bar(...); // function callable from a GPU thread

• Parallel kernel launch


foo<<<500, 128>>>(...); // 500 blocks, 128 threads each

• Special variables for thread identification in kernels


dim3 threadIdx; dim3 blockIdx; dim3 blockDim;

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.17


1: Backgnd on GPU Computing
CUDA Example Code
Source: High Performance Computing with CUDA, SC09 Tutorial, David Luebke, NVIDIA
Standard C Code
void saxpy_serial(int n, float a, float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Invoke serial SAXPY kernel
saxpy_serial(n, 2.0, x, y);

CUDA code
__global__ void saxpy_parallel(int n, float a, float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    … // omitted: allocate and initialize memory
    // Invoke parallel SAXPY kernel with 256 threads/block
    int nblocks = (n + 255) / 256;
    saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
    … // omitted: transfer results from GPU to CPU
}

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.18


1: Backgnd on GPU Computing
GPGPU-Sim in a Nutshell
• Microarchitecture performance model of
contemporary GPUs
– New: Power Model: GPUWattch
• Runs unmodified CUDA/OpenCL
• BSD License
• Focus of this tutorial:
GPGPU-Sim version 3.1.2 and later

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.19


1: Backgnd on GPU Computing
GPGPU-Sim 3.1.2
• Since GPGPU-Sim 2.1.1b:
– Refactored for C++ Object-Oriented Implementation
– Redesigned Timing Models
• SIMT Core model, Cache models, GDDR5 timing … (later)
– Asynchronous Kernel Calls
– Concurrent Kernel Execution
– Support for CUDA 3.1
• Since GPGPU-Sim 3.0.1:
– Updated timing model to model Fermi more accurately
– Much more robust SASS support
– Support for CUDA 4.0 and later; OpenCL with newer drivers
– Power Model: GPUWattch
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.20
1: Backgnd on GPU Computing
Accuracy
RODINIA Benchmark Suite
Quadro FX5800 SASS
GPGPU-Sim 3.1.0 – Correlation: 98.37%

[Scatter plot: GPGPU-Sim IPC vs. hardware IPC, both axes 0–200, with per-kernel similarity scores; labeled points include copyChunks_kernel(), Back Propagation's bpnn_layerforward_CUDA(), and HotSpot's calculate_temp().]
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.21
1: Backgnd on GPU Computing
Accuracy
RODINIA Benchmark Suite
Tesla C2050 (Fermi) SASS
GPGPU-Sim 3.1.0 – Correlation: 97.35%
[Scatter plot: GPGPU-Sim IPC vs. hardware IPC, both axes 0–500.]
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.22
1: Backgnd on GPU Computing
Accuracy (Average Power)
NVIDIA GTX 480
• Average Absolute Error ≈ 12%

[Scatter plot: estimated power (W) vs. measured power (W), both axes 0–250.]

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.23


1: Backgnd on GPU Computing
Dependencies
• Linux
• CUDA Toolkit (3.1 / 4.0 / 4.2)
• Standard Development Environment
– GCC, Make, etc.
• No GPU hardware required to run CUDA applications

December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.24


1: Backgnd on GPU Computing
Citation
• If you use GPGPU-Sim (either 2.x or 3.x) in your publication, please cite our ISPASS 2009 paper:

Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, and Tor M. Aamodt,
"Analyzing CUDA Workloads Using a Detailed GPU Simulator," in Proceedings of
the IEEE International Symposium on Performance Analysis of Systems and
Software (ISPASS), pp. 163-174, Boston, MA, April 26-28, 2009.

• Please indicate which version of GPGPU-Sim you used / extended
– E.g., “GPGPU-Sim version 3.1.2”
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.25
1: Backgnd on GPU Computing
Session Summary
• GPU Computing
• CUDA Programming Model Concepts
– Thread Hierarchy
– Memory Spaces
– SIMT Execution Model
• GPGPU-Sim:
Timing + power simulator of modern GPUs
– Good accuracy
– Runs on systems without HW GPUs
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.26
1: Backgnd on GPU Computing
Overview
1 Brief Background on GPU Computing 40mins
2 GPGPU-Sim Overview 30mins
3 Demo 1: Setup and Run 15mins
Coffee Break (10:00 – 10:30am)
4 Microarchitecture Timing Model 85mins
Lunch (12:00 – 1:00pm)
5a Software Organization 25mins
5b Timing Model (Software) 50mins
5c Power Model: GPUWattch 45mins
Coffee Break (3:00 – 3:30pm)
6 The GPU Design Space 10mins
7a Demo 2: Debugging Tool 15mins
7b Demo 3: Visualizing Performance 30mins
8 Extending GPGPU-Sim (with GPUWattch) 30mins
9 Wrap Up and Discussion 15mins
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 1.27
1: Backgnd on GPU Computing
