CUDA Emulator
Ziyi Zheng

Content
  Before Start: CPU Emulator
  Installation
  Compilation Options
  Browsing SDK examples
Before Start: Emulation
For those who want to use CUDA but do not have a CUDA-enabled GPU.

CPU Emulator
  Aimed at debugging, to help code development
  Replaced by Parallel Nsight (which also requires a CUDA-enabled GPU)
  NVIDIA started removing CPU emulator support in CUDA 3.0 (March 2010)
  The latest CUDA version is CUDA 3.2 (September 2010)
  Need to install CUDA 2.3 (June 2009), both toolkit and SDK
  Older NVCC, older APIs
  Can use the emulation versions of CUFFT and CUBLAS
  No CUSPARSE or CURAND
CUDA for CPUs

MCUDA
  In research
  Developed by Wen-mei Hwu's group
  Aimed at comparing the performance of GPUs and optimized CPUs
  Translates CUDA code into optimized C++ code for multi-core CPUs
  Linux based
  Download: http://impact.crhc.illinois.edu/mcuda.php

CUDA C++ compiler
  Under development by the Portland Group (PGI)
  No GPU required
  Will be demonstrated at the SC10 Supercomputing conference, November 13-15, 2010

Papers: http://www.ifp.illinois.edu/~minhdo/publications/parallelvideo.pdf

Not required in the course. Use it only when you want a fair performance comparison between CPU and GPU.
1. http://developer.nvidia.com/object/cuda_2_3_downloads.html
2. Install CUDA Toolkit 2.3
3. Install CUDA SDK code examples 2.3
Available Resources
1. http://developer.nvidia.com/object/cuda_3_2_toolkit_rc.htm
2. Download the appropriate GPU driver
3. Install CUDA Toolkit 3.2
4. Install GPU Computing SDK code examples 3.2

  NVCC
  Visual Studio syntax highlighting
  CUDA BLAS (CUBLAS) and FFT (CUFFT) libraries
  CUDA Visual Profiler
  CUDA-GDB for Linux
ATI/AMD Card + CUDA
Convert the CUDA code into OpenCL code, then build the OpenCL code and execute it on the ATI/AMD card.

Additional steps
1. http://developer.amd.com/gpu/atistreamsdk/pages/default.aspx
2. Download ATI Stream SDK 2.2
3. http://www.multiscalelab.org/swan
4. Download Swan (27 May 2010)
Must be validated with a student ID.
Serves as an IDE (integrated development environment).
Serves as a C/C++ compiler and linker for the host program.
CUDA Build Rules 2.3
1. Right-click a project
2. Choose Custom Build Rules
3. Choose a CUDA rule 2.3 if it is available on your system (it becomes available after you install the CUDA SDK 2.3)
4. Right-click a .cu file
5. Choose Properties
6. Click CUDA rule 2.3
Setting Build Options by Command
1. Click General
2. For Tool, choose Custom Build Tool
3. Then choose Custom Build Step
4. Enter your build command
CUDA Project
  Create one from scratch?
  Modify existing projects in the SDK
  CUDA Visual Studio wizard: http://sourceforge.net/projects/cudavswizard/
    Third party, independent updates, no document support
CPU Emulation Mode for CUDA 2.3
For projects in the CUDA SDK 2.3: in the Visual Studio configuration, choose EmuRelease or EmuDebug instead of Release or Debug.
1. Add a build configuration
2. Change the build rule settings (or simply add -deviceemu -D_DEVICEEMU to the compilation command line)
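The -D_DEVICEEMU option named above defines a preprocessor macro, so host code can check at compile time which kind of build it is in. The following is a minimal sketch of that idea in plain C++; the function names builtForEmulation and buildModeLabel are hypothetical and are not part of the CUDA SDK.

```cpp
#include <string>

// _DEVICEEMU is the macro defined by the -D_DEVICEEMU option.
// Returns true only when this file was compiled with that define.
bool builtForEmulation() {
#ifdef _DEVICEEMU
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: a label a program could print at startup
// so that timings are not mistaken for GPU results.
std::string buildModeLabel() {
    return builtForEmulation() ? "CPU emulation build" : "GPU build";
}
```

Compiled without the define, buildModeLabel() returns "GPU build"; with -D_DEVICEEMU on the command line it returns "CPU emulation build".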
On an 8600M GT card
  Capability 1.1
  # multiprocessors
  # cores
  Block limit per dimension
  Maximum # threads per block
  Grid limit per dimension

Bandwidth Test
Memory transfer on an 8600M GT card
  CPU -> GPU: 1236 MB/s
  GPU -> GPU: 11836 MB/s
  GPU -> CPU: 380 MB/s
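A bandwidth figure in MB/s is simply the amount of data moved divided by the time taken. The sketch below shows that arithmetic in plain C++. It assumes 1 MB = 2^20 bytes, and the function names are hypothetical; it does not reproduce the SDK sample's own code.

```cpp
#include <cassert>
#include <cstddef>

// Converts a transfer of `bytes` bytes that took `elapsedMs` milliseconds
// into MB/s. Assumption: 1 MB = 2^20 bytes.
double bandwidthMBps(std::size_t bytes, double elapsedMs) {
    assert(elapsedMs > 0.0);
    const double megabytes = static_cast<double>(bytes) / (1 << 20);
    return megabytes / (elapsedMs / 1000.0);
}

// Inverse question: how many milliseconds does it take to move
// `bytes` bytes at a rate of `mbPerSecond` MB/s?
double transferTimeMs(std::size_t bytes, double mbPerSecond) {
    assert(mbPerSecond > 0.0);
    const double megabytes = static_cast<double>(bytes) / (1 << 20);
    return 1000.0 * megabytes / mbPerSecond;
}
```

For example, at 1236 MB/s a 32 MB buffer takes roughly 26 ms to move, while at 380 MB/s the same buffer takes roughly 84 ms.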
Matrix Multiplication
8600M GT vs. Core2 Duo 2.4 GHz
  GPU: 0.62 ms
  CPU in emulation mode: around 850 ms
  GPU is more than 1000x faster (850 / 0.62 is about 1370)
  Matrix A: 80x48, Matrix B: 48x128, Matrix C: 80x128
  Computationally intensive: the GPU is better than the CPU

Template
8600M GT vs. Core2 Duo 2.4 GHz
  GPU: 179 ms
  CPU in emulation mode: 66 ms
  GPU is 3 times slower?
  Multiply 32 numbers by another 32 numbers
  unsigned int num_threads = 32;
  dim3 grid(1, 1, 1);
  dim3 threads(num_threads, 1, 1);
32 multiplications
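The two workloads compared above differ enormously in the amount of arithmetic they do, which may explain why the GPU wins one comparison and loses the other. The sketch below is plain C++ and deliberately simplified: it runs the 32 "threads" of one block one after another in a loop (ignoring synchronization and shared memory, so it is not how the real emulator is implemented), and it computes a row-major reference matrix product. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a kernel body: each "thread" multiplies one pair of numbers.
// tid plays the role of threadIdx.x within a single block.
void multiplyKernel(std::size_t tid, const std::vector<float>& a,
                    const std::vector<float>& b, std::vector<float>& out) {
    out[tid] = a[tid] * b[tid];
}

// Runs one block of a.size() "threads" sequentially on the CPU.
std::vector<float> runBlockSequentially(const std::vector<float>& a,
                                        const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t tid = 0; tid < a.size(); ++tid) {
        multiplyKernel(tid, a, b, out);
    }
    return out;
}

// Row-major reference product C = A * B, with A (hA x wA) and B (wA x wB).
std::vector<float> matMulReference(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   std::size_t hA, std::size_t wA,
                                   std::size_t wB) {
    assert(A.size() == hA * wA && B.size() == wA * wB);
    std::vector<float> C(hA * wB, 0.0f);
    for (std::size_t i = 0; i < hA; ++i) {
        for (std::size_t j = 0; j < wB; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < wA; ++k) {
                sum += A[i * wA + k] * B[k * wB + j];
            }
            C[i * wB + j] = sum;
        }
    }
    return C;
}

// Number of multiplications performed by matMulReference.
std::size_t matMulMultiplications(std::size_t hA, std::size_t wA,
                                  std::size_t wB) {
    return hA * wA * wB;
}
```

With the sizes from the slides, the matrix product needs 80 x 48 x 128 = 491,520 multiplications, against 32 for the Template example. One plausible reading of the timings is that for 32 multiplications the fixed cost of using the GPU dominates, whereas the matrix product has enough work to hide it.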