Power Point

This document provides an overview of GPGPU-sim, a cycle-accurate simulator for GPUs. It describes what GPUs are and their architecture. It then explains that GPGPU-sim can simulate the timing and power of GPUs running CUDA or OpenCL programs. It details how GPGPU-sim models the GPU functionality, timing, and power usage. It also describes how to configure GPGPU-sim's GPU parameters and run benchmarks. The document concludes by listing references for more information on GPGPU-sim and GPU architecture.

LAB5

GPGPU-sim Tutorial
Content
• Introduction
• Configurations of GPGPU-sim
• Experiment & Result
• References

2017/12/15 2
Introduction
What is a GPU?
• Graphics Processing Unit
• Optimized for Highly Parallel Workloads
• Highly Programmable
• Commodity Hardware

Architecture of V100

Why GPU?

Architecture of GPU

Software
• CUDA and OpenCL
• Extensions of C to support coprocessor model
• GPGPU-Sim supports both

What is GPGPU-sim?
• Microarchitecture performance model of contemporary GPUs
• Functional model
• Timing model
• Power model: GPUWattch
• Runs unmodified CUDA/OpenCL
• BSD License

Modules Overview

• CUDA/OpenCL API Library
• GPGPU-Sim Entrypoint
• PTX Emulator (CUDA-Sim): the functional simulator that executes PTX kernels
• Abstract HW Model Interface
• Timing Model
• Power Model: GPUWattch
Top Level Organization

Microarchitecture of GPU

See details in the micro-2013 slides


GPGPU-sim: Functional model (1/2)
• Functional model for PTX/SASS

PTX (Parallel Thread eXecution)
• Scalar PTX ISA (instruction level)
• Scalar control flow (if-branch, for-loops)
• Register allocation not done in PTX
• Intermediate representation in CUDA tool chain

SASS (native ISA for NVIDIA GPUs)
• Better correlation with HW GPU
• Extracted with NVIDIA’s cuobjdump

GPGPU-sim: Functional model (2/2)

GPGPU-sim: Timing Model
GPGPU-Sim models the timing of a GPU running each launched CUDA kernel
• Reports the number of cycles spent running the kernels
• Excludes any time spent on data transfers over the PCIe bus
• The CPU may run concurrently with asynchronous kernel launches

GPGPU-Sim with SASS is ~0.98 correlated to real hardware

GPGPU-sim: Power Model
GPUWattch
• Estimates the power consumed by the GPU based on its timing behavior
• Validated with power measurements from a real GTX 480

Debugging and Visualization
GPGPU-Sim provides tools to debug and visualize simulated GPU behavior
• GDB macros: cycle-level debugging
• AerialVision: high-level performance dynamics

Configurations of GPGPU-sim
GPGPU-Sim Configurations
Change the configuration by modifying ‘GTX480_run_dir/gpgpusim.conf’

1. Simulation Run
2. Statistics Collection
3. High-Level Architecture
4. Additional Architecture
5. Scheduler
6. Shader Core Pipeline
7. Memory Sub-System
8. Operand Collector
9. DRAM/Memory Controller
10. Interconnection
11. PTX
12. Power information

Scheduler
Modify properties of the scheduler
• Number of warp schedulers in a core
• Maximum number of instructions issued per warp

Examples
• gpgpu_num_sched_per_core
• gpgpu_max_insn_issue_per_warp

Shader Core Pipeline
Modify properties of the shader core
• Pipeline
• Number of registers
• Concurrent thread arrays (CTAs)
• Branch divergence
Examples
• gpgpu_shader_core_pipeline <# thread/shader core>:<warp size>:<pipeline SIMD width>
• gpgpu_shader_registers <# registers/shader core, default=8192>
• gpgpu_shader_cta <# CTA/shader core, default=8>
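The colon-separated pipeline value above can be unpacked with a few lines of Python. This is only a sketch: the helper names and the sample value "1536:32:32" are hypothetical, and the field order is taken from the slide, not from the simulator source.

```python
def parse_shader_core_pipeline(value):
    """Split '<threads per core>:<warp size>[:<SIMD width>]' into integers."""
    fields = [int(f) for f in value.split(":")]
    if len(fields) < 2:
        raise ValueError("expected at least <threads per core>:<warp size>")
    threads_per_core, warp_size = fields[0], fields[1]
    # Assumption: if the SIMD width field is absent, treat it as the warp size.
    simd_width = fields[2] if len(fields) > 2 else warp_size
    return threads_per_core, warp_size, simd_width


def max_warps_per_core(value):
    """Upper bound on resident warps implied by the thread and warp sizes."""
    threads_per_core, warp_size, _ = parse_shader_core_pipeline(value)
    return -(-threads_per_core // warp_size)  # ceiling division


# Hypothetical setting, not taken from a shipped configuration file.
print(max_warps_per_core("1536:32:32"))  # 48
```

A check like this is handy when changing the thread count, since the number of resident warps is what the warp schedulers actually choose from.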

Memory Sub-System Configuration
Set the size and behavior of several kinds of memory and cache
• Memory: shared memory
• Cache: Texture, constant, instruction, data cache
Examples:
• gpgpu_perfect_mem <0=off (default), 1=on>
• gpgpu_tex_cache:l1 <nsets>:<bsize>:<assoc>:<rep>:<wr>:<alloc>,<mshr>:<N>:<merge>,<mq>
• gpgpu_const_cache:l1 <nsets>:<bsize>:<assoc>:<rep>:<wr>:<alloc>,<mshr>:<N>:<merge>,<mq>
• gpgpu_cache:il1 <nsets>:<bsize>:<assoc>:<rep>:<wr>:<alloc>,<mshr>:<N>:<merge>,<mq>
• gpgpu_cache:dl2 <nsets>:<bsize>:<assoc>:<rep>:<wr>:<alloc>,<mshr>:<N>:<merge>,<mq>
• gpgpu_shmem_size <shared memory size, default=16kB>
• gpgpu_shmem_warp_parts
• gpgpu_flush_cache <0=off (default), 1=on>
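The first three fields of a cache option (`<nsets>`, `<bsize>`, `<assoc>`) determine its capacity as sets × block size × associativity. The Python sketch below illustrates that arithmetic; the function name and the sample string are hypothetical, and the remaining fields are ignored because their exact layout may differ between GPGPU-Sim versions.

```python
def cache_capacity_bytes(config):
    """Capacity implied by the leading '<nsets>:<bsize>:<assoc>' fields."""
    geometry = config.split(",")[0]  # part before the MSHR settings
    nsets, bsize, assoc = (int(f) for f in geometry.split(":")[:3])
    return nsets * bsize * assoc


# Hypothetical value that follows the field order shown on the slide.
example = "64:128:4:L:R:m,A:32:8,8"
print(cache_capacity_bytes(example))  # 32768 bytes, i.e. 32 kB
```

Only the leading integers are parsed, so the helper keeps working if the policy fields after them are separated differently.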
Power information
Enable and configure power simulation in GPGPU-Sim
Examples
• power_simulation_enabled 1 # Enable power model
• gpuwattch_xml_file gpuwattch_gtx480.xml # choose the configuration file
• power_trace_enabled 1 # Enable output: detailed average power traces
• steady_power_levels_enabled 1 # Enable output: steady state average power levels and corresponding performance counters

Experiment & Result
Experiment Environment
• VirtualBox (recommended)
• Install Oracle VM VirtualBox
• Go to http://www.gpgpu-sim.org/
• Download the fully set-up virtual machine
• Double-click the downloaded setup file

• GitHub
• Go to GPGPU-sim’s GitHub repository
• Follow the manual

Installation - AerialVision
Step 1. Install AerialVision dependencies
$ sudo apt-get install python-pmw python-ply python-numpy libpng12-dev python-matplotlib

Step 2. Run bin/aerialvision.py in the GPGPU-Sim distribution
$ python aerialvision.py

Hint
VM Password: gpgpu-sim

Benchmarks
63 executable CUDA benchmarks

Run a simple program
CUDA program
• Benchmark : vectorAdd.cu
$ ./run_gpgpu-sim.sh ~/cuda/sdk/4.2/C/bin/linux/release/vectorAdd

Host code

Device code

Simulation Result - Overall

Overall report
• Number of simulated GPU cycles
• Number of simulated GPU instructions
• IPC of the GPU
•…
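The IPC in the overall report is the simulated instruction count divided by the simulated cycle count. The Python sketch below recomputes it from a saved log. It assumes the log contains lines of the form `gpu_sim_cycle = N` and `gpu_sim_insn = N`; check these names against your own output. The sample text is made up for illustration.

```python
import re


def parse_overall_stats(log_text):
    """Collect integer counters written as 'name = value' at line start."""
    stats = {}
    for name in ("gpu_sim_cycle", "gpu_sim_insn"):  # assumed counter names
        match = re.search(rf"^{name}\s*=\s*(\d+)", log_text, re.MULTILINE)
        if match:
            stats[name] = int(match.group(1))
    return stats


def ipc(stats):
    """Instructions per cycle from the two counters."""
    return stats["gpu_sim_insn"] / stats["gpu_sim_cycle"]


# Made-up log excerpt, not real simulator output.
sample = "gpu_sim_cycle = 2000\ngpu_sim_insn = 5000\n"
print(ipc(parse_overall_stats(sample)))  # 2.5
```

Recomputing the ratio this way makes it easy to compare IPC across runs with different configuration settings.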

Simulation result - Cache

Cache report

Behavior of every cache in the GPU
• Number of accesses
• Number of misses
• Number of pending hits
• Number of reservation failures
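A miss rate can be derived from the access and miss counts in the cache report. The Python helper below is a minimal sketch; the function name and the numbers are hypothetical, and how pending hits are counted relative to misses depends on the simulator version, so they are left out here.

```python
def miss_rate(accesses, misses):
    """Fraction of cache accesses that missed; 0.0 for an unused cache."""
    if accesses == 0:
        return 0.0
    return misses / accesses


# Hypothetical counts read off a cache report.
print(miss_rate(1000, 250))  # 0.25
```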

Simulation result - Interconnect
Interconnect report

Statistics of the interconnect
• Packet latency
• Network latency

Simulation result - Power information

• Total power
• List of average power

Source code – You can modify it!

Simulation result - Visualization

• Time Lapse view
• Source code view


References
• GPGPU-sim Official Website
• GPGPU-sim Manual
• GPUWattch Manual
• Micro-45, Tutorial, GPGPU-Sim 3.x: A Performance Simulator for Manycore Accelerator Research
• McPAT
• NVIDIA OpenCL
• CUDA Toolkit
• Chinese Note
