GPGPU-sim Tutorial
Content
• Introduction
• Configurations of GPGPU-sim
• Experiment & Result
• References
2017/12/15 2
Introduction
What is GPU?
• Graphics Processing Unit
• Optimized for Highly Parallel Workloads
• Highly Programmable
• Commodity Hardware
Architecture of V100
Why GPU?
Architecture of GPU
Software
• CUDA and OpenCL
• Extensions of C to support coprocessor model
• GPGPU-Sim supports both
What is GPGPU-sim?
• Microarchitecture performance model of contemporary GPUs
• Functional model
• Timing model
• Power model: GPUWattch
• Runs unmodified CUDA/OpenCL
• BSD License
Modules Overview
[Figure: module overview diagram. Labels: CUDA/OpenCL API library entrypoint; GPGPU-Sim interface; abstract hardware model; timing model; power model (GPUWattch)]
Top Level Organization
Microarchitecture of GPU
GPGPU-Sim: Functional Model (2/2)
GPGPU-sim: Timing Model
GPGPU-Sim simulates the timing of a GPU running each launched CUDA kernel
• Reports the number of cycles spent running the kernels
• Excludes any time spent on data transfers over the PCIe bus
• The CPU may run concurrently with asynchronous kernel launches
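Because the timing model reports cycles rather than wall-clock time, a cycle count can be turned into an estimated kernel time once a core clock is chosen. The following is a minimal sketch; the function name, the 700 MHz clock, and the cycle count are illustrative assumptions, not values taken from these slides.

```python
def kernel_time_ms(cycles: int, core_clock_mhz: float) -> float:
    """Convert a simulated cycle count into milliseconds at the given core clock."""
    if core_clock_mhz <= 0:
        raise ValueError("core clock must be positive")
    # A clock of 1 MHz completes 1e3 cycles per millisecond.
    return cycles / (core_clock_mhz * 1e3)


# Hypothetical numbers: 1,400,000 simulated cycles at an assumed 700 MHz core clock.
print(kernel_time_ms(1_400_000, 700.0))
```

The estimate covers kernel execution only, since PCIe transfer time is not part of the reported cycle count.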
GPGPU-sim: Power Model
GPUWattch
• Estimates the power consumed by the GPU from its timing behavior
• Validated with power measurements from a real GTX 480
Debugging and Visualization
GPGPU-Sim provides tools to debug and visualize simulated GPU
behavior
• GDB macros: cycle-level debugging
• AerialVision: high-level performance dynamics
Configurations of GPGPU-sim
GPGPU-Sim Configurations
Change the configuration by editing ‘GTX480_run_dir/gpgpusim.config’
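To inspect the current settings before editing them, the configuration file can be read into a dictionary. This is a minimal sketch that assumes each option sits on its own line in the form `-name value` and that `#` starts a comment; the helper name and the sample values are hypothetical.

```python
def parse_gpgpusim_config(text: str) -> dict:
    """Return {option_name: value_string} for lines of the form '-name value'."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line.startswith("-"):
            continue  # skip blank lines and anything that is not an option
        parts = line[1:].split(None, 1)
        if not parts:
            continue  # a lone '-' carries no option name
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options


# Illustrative fragment; the values are assumptions, not recommended settings.
sample = """
# scheduler settings
-gpgpu_num_sched_per_core 2
-gpgpu_max_insn_issue_per_warp 1  # trailing comment
"""
config = parse_gpgpusim_config(sample)
print(config["gpgpu_num_sched_per_core"])
```

Values are kept as strings because several options use compound formats (colon- or comma-separated fields) that need their own parsing.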
Scheduler
Modify the properties of the warp schedulers
• Number of warp schedulers per core
• Maximum number of instructions issued per warp
Examples
• gpgpu_num_sched_per_core
• gpgpu_max_insn_issue_per_warp
Shader Core Pipeline
Modify the properties of the shader core
• Pipeline
• Number of registers
• Cooperative thread arrays (CTAs)
• Branch divergence
Examples
• gpgpu_shader_core_pipeline <# thread/shader core>:<warp size>:<pipeline SIMD width>
• gpgpu_shader_registers <# registers/shader core, default=8192>
• gpgpu_shader_cta <# CTA/shader core, default=8>
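The pipeline option packs several numbers into one colon-separated value, so a quick calculation helps check what a setting implies. The sketch below derives the number of warps a shader core can hold from the first two fields; the function name and the sample strings are hypothetical.

```python
def warps_per_core(pipeline: str) -> int:
    """Warps per shader core from '<threads>:<warp size>[:<SIMD width>]'."""
    fields = pipeline.split(":")
    if len(fields) < 2:
        raise ValueError("expected at least '<threads>:<warp size>'")
    threads, warp_size = int(fields[0]), int(fields[1])
    if warp_size <= 0:
        raise ValueError("warp size must be positive")
    # Ceiling division so a partially filled last warp is still counted.
    return -(-threads // warp_size)


# Hypothetical setting: 1536 threads per core with 32-thread warps.
print(warps_per_core("1536:32"))
```

Only the thread count and warp size are used; the optional third field is ignored by this sketch.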
Memory Sub-System Configuration
Set the size and behavior of several kinds of memory and cache
• Memory: shared memory
• Cache: Texture, constant, instruction, data cache
Examples:
• gpgpu_perfect_mem <0=off (default), 1=on>
• gpgpu_tex_cache:l1 <nsets>:<bsize>:<assoc>:<rep>:<wr>:<alloc>,<mshr>:<N>:<merge>,<mq>
• gpgpu_const_cache:l1 <nsets>:<bsize>:<assoc>:<rep>:<wr>:<alloc>,<mshr>:<N>:<merge>,<mq>
• gpgpu_cache:il1 <nsets>:<bsize>:<assoc>:<rep>:<wr>:<alloc>,<mshr>:<N>:<merge>,<mq>
• gpgpu_cache:dl2 <nsets>:<bsize>:<assoc>:<rep>:<wr>:<alloc>,<mshr>:<N>:<merge>,<mq>
• gpgpu_shmem_size <shared memory size, default=16kB>
• gpgpu_shmem_warp_parts
• gpgpu_flush_cache <0=off (default), 1=on>
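The cache options share a common leading triple, `<nsets>:<bsize>:<assoc>`, from which the cache capacity follows directly. The sketch below assumes `<bsize>` is the line size in bytes and reads only those three numeric fields; the function name and the sample string (including its letter codes) are hypothetical.

```python
def cache_capacity_bytes(cache_config: str) -> int:
    """Capacity implied by the leading '<nsets>:<bsize>:<assoc>' fields."""
    # Everything after the first comma (MSHR and queue settings) is ignored here.
    geometry = cache_config.split(",", 1)[0].split(":")
    if len(geometry) < 3:
        raise ValueError("expected '<nsets>:<bsize>:<assoc>' at the start")
    nsets, bsize, assoc = (int(field) for field in geometry[:3])
    # sets x ways gives the number of lines; each line holds bsize bytes (assumed).
    return nsets * bsize * assoc


# Hypothetical value: 64 sets, 128-byte lines, 6-way associative.
example = "64:128:6:L:R:m,A:16:4,4"
print(cache_capacity_bytes(example) // 1024, "KiB")
```

Doubling `<nsets>` or `<assoc>` doubles the capacity, which makes this a convenient sanity check when sweeping cache sizes.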
Power information
Enable and configure the GPGPU-Sim power model
Examples
• power_simulation_enabled 1 # Enable power model
• gpuwattch_xml_file gpuwattch_gtx480.xml # choose the configuration file
• power_trace_enabled 1 # Enable output: detailed average power traces
• steady_power_levels_enabled 1 # Enable output: steady-state average power levels and corresponding performance counters
Experiment & Result
Experiment Environment
• VirtualBox (recommended)
  • Install Oracle VM VirtualBox
  • Go to http://www.gpgpu-sim.org/
  • Download the fully set-up virtual machine
  • Double-click the downloaded file to import it
• GitHub
  • Go to the GPGPU-Sim GitHub repository
  • Follow the manual
Installation - AerialVision
Step1. Install AerialVision dependencies
$ sudo apt-get install python-pmw python-ply python-numpy libpng12-dev python-matplotlib
Hint
VM Password: gpgpu-sim
Benchmarks
63 executable CUDA benchmarks
Run a simple program
CUDA program
• Benchmark: vectorAdd.cu
$ ./run_gpgpu-sim.sh ~/cuda/sdk/4.2/C/bin/linux/release/vectorAdd
Host code
Device code
Simulation Result - Overall
Overall report
• Number of simulated GPU cycles
• Number of simulated GPU instructions
• IPC (instructions per cycle) of the GPU
•…
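The three headline numbers are related: IPC is the instruction count divided by the cycle count. The sketch below pulls both counters out of a captured simulator log and recomputes IPC. It assumes the log contains lines of the form `gpu_sim_cycle = N` and `gpu_sim_insn = N`; the function name and the sample log values are hypothetical.

```python
import re


def overall_stats(log_text: str) -> dict:
    """Extract cycle and instruction counts from a log and derive IPC."""
    stats = {}
    for name in ("gpu_sim_cycle", "gpu_sim_insn"):
        # Takes the first occurrence; a multi-kernel log repeats these lines.
        match = re.search(rf"^{name}\s*=\s*(\d+)", log_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"{name} not found in log")
        stats[name] = int(match.group(1))
    if stats["gpu_sim_cycle"] == 0:
        raise ValueError("cycle count is zero; IPC is undefined")
    stats["ipc"] = stats["gpu_sim_insn"] / stats["gpu_sim_cycle"]
    return stats


# Made-up log excerpt for illustration only.
sample_log = """
gpu_sim_cycle = 5000
gpu_sim_insn = 20000
"""
print(overall_stats(sample_log)["ipc"])
```

Recomputing IPC this way is a quick check when comparing runs made with different configuration settings.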
Simulation result - Cache
Cache report
Simulation result - Interconnect
Interconnect report
Simulation result - Power information
Source code – You can modify it!
Simulation result - Visualization