G80 CUDA
Crysis Demo
Are GPUs Useful for Scientific Computing?
Molecular Dynamics & Monte Carlo
nVidia G80 GPU Architecture Overview
•16 multiprocessor (MP) blocks
•Each MP block has:
  •8 streaming processors (IEEE 754 single-precision compliant)
  •16 KB shared memory
  •64 KB constant memory (cached)
  •8 KB texture cache
•Device memory bandwidth is 86 GB/s; access latency depends on the memory space:
  •Shared: 2-cycle latency
  •Device: 300-cycle latency
Programming Interface
► Interface to the GPU via nVidia’s proprietary API, CUDA (very C-like)
► Looks a lot like UPC, Unified Parallel C (simplified CUDA below)
No bank conflicts if each thread indexes a different bank
Molecular Dynamics & Monte Carlo
HP XW9400 with Quad AMD CPU
& Dual nVidia Quadro FX 5600 GPUs
= A Teraflop Workstation?
Molecular Dynamics Trial
► Lennard-Jones interatomic potential
► Verlet integration
► Normalized coordinates
► FCC lattice in an N×N×N simulation cell
► Periodic boundary conditions
► Trials with cutoff radius Rc = ∞ and Rc = 3.0
► Tested an nVidia 8800 GPU vs. a 3.0 GHz Intel Pentium 4
► OpenGL used to implement MD on the GPU
MD Timing Tests (brute force, all N×N atom pairs)
Cells per axis (X = Y = Z)   Total atoms   GPU time/step (s)   CPU time/step (s)   Speedup (CPU time / GPU time)
2                            32            0.000308            0.000429            139%
3                            108           0.00039             0.004513            1157%
4                            256           0.000391            0.025295            6469%
5                            500           0.000596            0.092766            15565%
6                            864           0.001274            0.27681             21728%
7                            1372          0.002845            0.689375            24231%
8                            2048          0.005665            1.547               27308%
MD Timing Results (binning, Rc = 9 Å)