EnSPy: Python Library For Computations of Ensembles of Particles On GPU

This document describes EnSPy, a Python library for performing computations on ensembles of particles using GPUs. It discusses how GPUs are well-suited for problems with high data parallelism. The library combines the flexibility of Python with the efficiency of C++ and CUDA for N-body simulations. EnSPy generates and compiles C and CUDA code at runtime from user-specified expressions. Examples demonstrate its use for simulations of particle ensembles in potentials and N-body problems.

Glib Ivashkevych
Institute of Theoretical Physics, NSC KIPT, Kharkov, Ukraine

October 13, 2010

Why GPU?

GPU (Graphics Processing Unit):
- programmable
- manycore
- multithreaded
- with very high memory bandwidth

GPU programming gives us:
- high performance
- transparent scalability

... and is useful for problems with high data parallelism:
- large datasets
- portions of data can be processed independently
Outline

1. NVIDIA GPU Architecture and CUDA
2. Python and CUDA
3. EnSPy functionality
4. EnSPy architecture
5. Example: D5 potential
6. Example: Hill problem
7. Example: Hill problem, N-body version
8. Performance results
9. Future development

NVIDIA GPU Architecture and CUDA

Simplified GT200 architecture

- consists of multiprocessors (MPs)
- each MP has:
  - 8 stream processors
  - 1 unit for double-precision operations
  - shared memory
- global memory

Multiprocessors and threads

- an MP can launch numerous threads
- threads are lightweight: little creation and switching overhead
- threads run the same code
- thread synchronization within an MP
- cooperation via shared memory
- each thread has a unique identifier, the thread ID

Efficiency is achieved by hiding latency with computation, not by cache usage as on a CPU.

C for CUDA

- a set of extensions to C
- runtime library
- function and variable type qualifiers
- built-in vector types: float4, double2, etc.
- built-in variables

Kernels:
- map the parallel part of the program to the GPU
- execution: N times in parallel by N CUDA threads

CUDA Driver API:
- low-level control over the execution
- no need for the nvcc compiler if kernels are precompiled: only the driver is needed
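
The kernel model above can be illustrated without a GPU. The following is a minimal Python sketch, not CUDA and not part of EnSPy: the hypothetical `saxpy_kernel` is written from the point of view of a single thread, and the hypothetical `launch` helper calls it once per thread ID, one call after another, where a GPU would run those calls in parallel.

```python
def saxpy_kernel(thread_id, a, x, y, out):
    """Body executed by one thread: it handles only element thread_id."""
    if thread_id < len(x):  # guard for launches with more threads than elements
        out[thread_id] = a * x[thread_id] + y[thread_id]


def launch(kernel, n_threads, *args):
    """Emulate a kernel launch: run the kernel once per thread ID.

    On a GPU these calls run in parallel; here they run sequentially.
    """
    for thread_id in range(n_threads):
        kernel(thread_id, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, 8, 2.0, x, y, out)  # 8 threads, 4 elements: 4 threads idle
print(out)
```

The thread ID is the only thing that distinguishes one invocation from another, which is why every thread can run the same code.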

Execution model

[Figure: CUDA execution model]

Python and CUDA

Why Python?

Python: flexible multipurpose interpreted language
- easy to learn
- dynamically typed
- rich built-in functionality
- very well documented
- has a large and active community

Python scientific packages:

SciPy: modeling and simulation
- Fourier transforms
- ODE
- optimization
- scipy.weave.inline: C inlining with little or no overhead

NumPy: arrays, linear algebra, etc.
- flexible array creation routines
- sorting, random sampling and statistics

Python is a convenient way of interfacing C/C++ libraries.

We could interface Python with CUDA through:
- Python C API: low-level approach, overkill
- SWIG, Boost::Python: high-level approach, overkill
- PyCUDA: the simplest and most straightforward way, for CUDA only
- scipy.weave.inline: a simple and straightforward way for both CUDA and plain C/C++

EnSPy functionality

Motivation

Combine the flexibility of Python with the efficiency of C++/CUDA for N-body simulations:
- the interface of EnSPy is written in Python
- the core of EnSPy is written in C++
- the two are joined together by scipy.weave.inline

Further points:
- the C++ core can be used without Python: just include the header and link with the precompiled shared library
- easily extensible, both through the high-level Python interface and the low-level C++ core: new algorithms, initial distributions, etc.
- multi-GPU parallelization
- it's easy to experiment with EnSPy!

Types of ensembles:
- Simple ensemble: without interaction, only external potential
- N-body ensemble: both external potential and gravitational interaction between particles

Current algorithms:
- 4th-order Runge-Kutta for the simple ensemble
- Hermite scheme with shared time steps for the N-body ensemble
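
For the simple ensemble, every particle is advanced independently of the others. The following is a minimal pure-Python sketch of one classical 4th-order Runge-Kutta step applied to a small ensemble. It is not EnSPy's C++/CUDA implementation; the helper names `rk4_step` and `oscillator`, and the harmonic test potential $U = q^2/2$, are assumptions made for illustration.

```python
import math


def rk4_step(deriv, state, t, dt):
    """Advance one particle by one classical 4th-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]

    k1 = deriv(t, state)
    k2 = deriv(t + dt / 2, shifted(state, k1, dt / 2))
    k3 = deriv(t + dt / 2, shifted(state, k2, dt / 2))
    k4 = deriv(t + dt, shifted(state, k3, dt))
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def oscillator(t, state):
    """Hypothetical test potential U = q^2 / 2: dq/dt = p, dp/dt = -q."""
    q, p = state
    return [p, -q]


# Non-interacting particles: each (q, p) pair is integrated on its own.
ensemble = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0]]
dt, n_steps = 0.01, 100
for step in range(n_steps):
    ensemble = [rk4_step(oscillator, s, step * dt, dt) for s in ensemble]

q, p = ensemble[0]
print(q, p)  # close to cos(1) and -sin(1) for the first particle
```

Because no particle reads another particle's state, the list comprehension over the ensemble is exactly the kind of loop that maps to one GPU thread per particle.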

Predefined initial distributions:
- uniform, point and spherical for simple ensembles
- uniform sphere with $2T/|U| = 1$ for the N-body ensemble
- the user can supply functions (in Python) for initial ensemble generation

User-specified values and expressions:
- parameters of the initial distribution
- potential, forces, parameters of the integration scheme
- an arbitrary number of triggers: $N_i(t)$, the number of particles which do not cross the given hypersurface $F_i(q, p) = 0$ before time $t$
- an arbitrary number of averages: $F_i(q, p, t)$, quantities which should be averaged over the ensemble
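
The trigger definition can be made concrete with a small sketch. How EnSPy evaluates triggers internally is not described here, so the following pure-Python version is an assumption: a particle counts as having crossed the hypersurface once the sign of $F(q, p)$ differs from its initial sign. The names `update_trigger` and `trigger`, and the free-motion test data, are hypothetical.

```python
def update_trigger(trigger, states, alive, initial_sign):
    """Mark particles whose trigger value F(q, p) has changed sign.

    Returns N(t): how many particles have not crossed F(q, p) = 0 so far.
    """
    for i, (q, p) in enumerate(states):
        if alive[i] and trigger(q, p) * initial_sign[i] <= 0.0:
            alive[i] = False  # a crossing is permanent, even if the particle returns
    return sum(alive)


def trigger(q, p):
    """Hypothetical hypersurface |q0| - 1 = 0."""
    return abs(q[0]) - 1.0


# Free motion q0(t) = q0(0) + p0 * t for three hypothetical particles.
start = [([0.0], [1.0]), ([0.5], [-0.1]), ([-0.2], [0.0])]
alive = [True] * len(start)
initial_sign = [1.0 if trigger(q, p) > 0 else -1.0 for q, p in start]

history = []
for step in range(1, 31):
    t = 0.1 * step
    states = [([q[0] + p[0] * t], p) for q, p in start]
    history.append(update_trigger(trigger, states, alive, initial_sign))

print(history[0], history[-1])
```

Only the first particle reaches $|q_0| = 1$ within the simulated time, so the count drops by one and then stays constant.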

Runtime generation and compilation of C and CUDA code:
- user-specified expressions (as Python strings) are wrapped by the EnSPy template subpackage into C functions and a CUDA module
- compiled at runtime

High usability and calculation efficiency:
- flexible Python interface
- all actual calculations are performed by the runtime-generated C extension and the precompiled shared library

Drawback: extra time for generation and compilation of new code
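
The wrapping step can be sketched with the standard library alone. This is not the EnSPy template subpackage: the template text, the function names `to_c_expression` and `make_c_function`, and the generated signature are all assumptions. The sketch only produces C source as a string; compiling it (for example with scipy.weave.inline) is left out.

```python
import re
from string import Template

C_TEMPLATE = Template("""\
#include <math.h>

/* Generated at runtime; the function name and signature are hypothetical. */
double ${name}(const double *q, const double *p, double t)
{
    return ${expr};
}
""")


def to_c_expression(expr):
    """Rewrite q0, p1, ... as q[0], p[1], ... and abs( as fabs(."""
    expr = re.sub(r"\b([qp])(\d+)\b", r"\1[\2]", expr)
    return re.sub(r"\babs\(", "fabs(", expr)


def make_c_function(name, expr):
    """Wrap a user-supplied expression string into the source of a C function."""
    return C_TEMPLATE.substitute(name=name, expr=to_c_expression(expr))


source = make_c_function("trigger_0", "abs(q0) - 1.")
print(source)
```

Generating source text is cheap; the drawback named above comes from running the compiler on it each time an expression changes.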

EnSPy architecture

Execution flow and architecture

- input parameters
- ensemble population (predefined or user-specified distribution)
- code generation and compilation
- launching $N_{GPUs}$ threads
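
The last step of the flow, one host thread per GPU, can be sketched as follows. This is an assumption about the general pattern, not EnSPy code: `split_ensemble` and `worker` are hypothetical names, and the worker only sums its chunk where a real one would select a device and integrate its share of the ensemble.

```python
import threading


def split_ensemble(n_particles, n_workers):
    """Split particle indices into contiguous, nearly equal chunks."""
    base, extra = divmod(n_particles, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def worker(device_id, indices, positions, results):
    """Stand-in for the per-GPU computation: here it only sums its chunk."""
    results[device_id] = sum(positions[i] for i in indices)


n_gpus = 2  # hypothetical number of devices
positions = [float(i) for i in range(10240)]
results = [None] * n_gpus

threads = [threading.Thread(target=worker, args=(dev, chunk, positions, results))
           for dev, chunk in enumerate(split_ensemble(len(positions), n_gpus))]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(results, sum(results) == sum(positions))
```

Each thread writes only to its own slot of `results`, so no locking is needed in this sketch.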

[Figure: GPU parallelization scheme for N-body simulations]

[Figure: order of force calculation]

Example: D5 potential

Overview

Problem: escape from a potential well.

Watched values (trigger): $N(t)$, the number of particles remaining in the well at time $t$.

Potential: $U_{D5} = 2ay^2 - x^2 + xy^2 + \frac{x^4}{4}$

Critical energy: $E_{cr} = E_S = 0$
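
A potential and its forces are what a user would supply for this example. The sketch below writes them as plain Python functions, assuming the potential in the form $U_{D5} = 2ay^2 - x^2 + xy^2 + x^4/4$ (the form for which the saddle at the origin has energy 0) and a hypothetical parameter value a = 1.1. The function names are illustrative and not taken from examples/d5.py.

```python
import math


def u_d5(x, y, a):
    """D5 potential U = 2*a*y^2 - x^2 + x*y^2 + x^4/4."""
    return 2.0 * a * y**2 - x**2 + x * y**2 + x**4 / 4.0


def force_d5(x, y, a):
    """Force components (-dU/dx, -dU/dy)."""
    fx = 2.0 * x - y**2 - x**3
    fy = -4.0 * a * y - 2.0 * x * y
    return fx, fy


a = 1.1  # hypothetical parameter value
x_min = math.sqrt(2.0)
print(u_d5(0.0, 0.0, a), force_d5(0.0, 0.0, a))      # saddle: U = 0, zero force
print(u_d5(x_min, 0.0, a), force_d5(x_min, 0.0, a))  # well bottom: U close to -1
```

The origin is a saddle because the potential decreases along x and increases along y there, which is why a trigger at x = 0 detects particles leaving the well at x > 0.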

Potential and structure of phase space:

[Figure: level lines of the D5 potential in the (x, y) plane, and phase-space structure in the (x, p_x) plane]

Calculation setup:
- simple ensemble
- uniform initial distribution of N = 10240 particles in x > 0, U(x, y) < E
- 12 lines of simple Python code (examples/d5.py)
- specification of integration parameters
- trigger: x = 0, expressed as q0 = 0.

Results:
- regular particles are trapped in the well
- the initial mixed state splits

[Figure: N(t)/N(0) versus t (0 to 30) for E = 0.1 and E = 0.9]

Example: Hill problem

Overview

Problem: toy model of escape from a star cluster: escape of a star from the potential of a point rotating star cluster of mass $M_c$ and a point galaxy core of mass $M_g \gg M_c$.

Watched values (trigger): $N(t)$, the number of particles remaining in the cluster at time $t$.

Potential in the cluster frame of reference (tidal approximation): $U_{Hill} = -\frac{3}{2}\omega^2 x^2 - \frac{G M_c}{r}$

Critical energy: $E_{cr} = E_S = -4.5\,\omega^2$
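
The critical energy follows from the potential once the saddle points are located. The sketch below assumes the potential in the form $U_{Hill} = -\frac{3}{2}\omega^2 x^2 - GM_c/r$; setting $\partial U/\partial x = 0$ on the x axis gives the tidal radius $r_t = (GM_c / 3\omega^2)^{1/3}$ and $E_S = U(r_t, 0) = -4.5\,\omega^2 r_t^2$, which reduces to $-4.5\,\omega^2$ in units where $r_t = 1$. The function names and the unit choice $GM_c = 1$, $\omega = 1/\sqrt{3}$ are assumptions.

```python
import math


def u_hill(x, y, omega, gm_c):
    """Hill potential U = -(3/2)*omega^2*x^2 - G*M_c/r in the cluster frame."""
    return -1.5 * omega**2 * x**2 - gm_c / math.hypot(x, y)


def tidal_radius(omega, gm_c):
    """Distance of the saddle points from the cluster centre (dU/dx = 0, y = 0)."""
    return (gm_c / (3.0 * omega**2)) ** (1.0 / 3.0)


omega = 1.0 / math.sqrt(3.0)  # assumed units: with G*M_c = 1 this gives r_t = 1
gm_c = 1.0
r_t = tidal_radius(omega, gm_c)
e_s = u_hill(r_t, 0.0, omega, gm_c)
print(r_t, e_s, -4.5 * omega**2 * r_t**2)
```

Particles with energy below $E_S$ cannot pass the saddle points at $x = \pm r_t$, so escape is only possible above the critical energy.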

Potential:

[Figure: Hill curves]

Calculation setup:
- simple ensemble
- uniform initial distribution of N = 10240 particles in $|x| < r_t$, $U(x, y) < E$
- $\omega = 1/\sqrt{3}$, $r_t = 1$
- trigger: $|x| - r_t = 0$, expressed as abs(q0) - 1. = 0.
- 12 lines of simple Python code (examples/hill_plain.py)
- specification of integration parameters

Results:

Trapping of regular particles (some tricky physics here):

[Figure: N(t) versus the number of time steps n_t (0 to 1·10^5) for three values of the energy E]

Example: Hill problem, N-body version

Overview

Problem: simplified model of escape from a star cluster: escape of a star from the potential of a rotating star cluster with total mass $M_c$ and the point potential of a galaxy core with mass $M_g \gg M_c$ (2D).

Watched values: configuration of the cluster.

Potential of the galaxy core in the cluster frame of reference (tidal approximation): $U_{HillNB} = -\frac{3}{2}\omega^2 x^2$
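
In the N-body version the point-cluster term is replaced by the mutual gravity of the particles, while the tidal term stays. The following pure-Python sketch computes accelerations by direct O(N^2) summation and adds the tidal contribution $a_x = 3\omega^2 x$ that follows from $U = -\frac{3}{2}\omega^2 x^2$. It is an illustration only: the function name, the softening length `eps` and the unit choice G = 1 are assumptions, velocity-dependent (Coriolis) terms of the rotating frame are left out, and EnSPy's Hermite scheme would also need the time derivative of the acceleration.

```python
def accelerations(pos, masses, omega, g=1.0, eps=1e-3):
    """Direct O(N^2) summation of pairwise gravity plus the tidal term.

    eps is a softening length (an assumption) that keeps close pairs finite.
    The tidal potential -(3/2)*omega^2*x^2 contributes a_x = 3*omega^2*x.
    """
    acc = []
    for i, (xi, yi) in enumerate(pos):
        ax, ay = 3.0 * omega**2 * xi, 0.0
        for j, (xj, yj) in enumerate(pos):
            if i == j:
                continue  # no self-interaction
            dx, dy = xj - xi, yj - yi
            inv_r3 = (dx * dx + dy * dy + eps * eps) ** -1.5
            ax += g * masses[j] * dx * inv_r3
            ay += g * masses[j] * dy * inv_r3
        acc.append((ax, ay))
    return acc


# Two equal masses at distance 2, no tidal field, no softening.
pos = [(-1.0, 0.0), (1.0, 0.0)]
masses = [0.5, 0.5]
acc = accelerations(pos, masses, omega=0.0, eps=0.0)
print(acc)
```

The inner loop over j is the part that dominates the cost for large N, which is what makes the N-body ensemble a natural candidate for GPU parallelization.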

[Figure: toy Hill model vs N-body Hill model]

Calculation setup:
- N-body ensemble
- 2D (z = 0) initial distribution of N = 10240 particles inside a circle of radius R, with zero initial velocities
- 14 lines of simple Python code (examples/hill_nbody.py)
- specification of integration parameters
- $M_c = 1$, $R = 200$, $\omega = 1/\sqrt{3}$

Results: cluster configuration

[Figure: cluster configuration in the (x, y) plane at steps 201, 401, 601, 801, 1001 and 1201; both axes span -300 to 300]

Performance results

Not as good as it could be: subject to improvement. Estimate: 1 TFlops on 2x recent Fermi graphics processors.

[Figure: performance in GFlop/s (0 to 40) versus N (1·10^4 to 2·10^5) on a GTX260 in double precision, for the N-body ensemble and the simple ensemble]
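
How the GFlop/s figures were obtained is not stated. A common convention for direct-summation N-body codes is to count N^2 pairwise interactions per step and multiply by an agreed number of floating-point operations per interaction. The sketch below shows only that arithmetic; the function name, the value of `flops_per_interaction` and all input numbers are hypothetical and are not measurements.

```python
def nbody_gflops(n_particles, n_steps, seconds, flops_per_interaction):
    """Sustained GFlop/s of a direct-summation run: N^2 interactions per step."""
    interactions = n_particles * n_particles * n_steps
    return interactions * flops_per_interaction / seconds / 1e9


# Hypothetical numbers, chosen only to show the arithmetic.
rate = nbody_gflops(n_particles=1000, n_steps=10, seconds=1.0,
                    flops_per_interaction=60)
print(rate)
```

Since the result scales linearly with the assumed operation count, figures from different codes are comparable only when they use the same convention.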

Future development

Must-have features:
- MPI: shifting from a one host, multiple GPUs environment to a multiple hosts, multiple GPUs environment
- individual time steps for Hermite
- treecodes

Performance improvements:
- utilization of texture memory
- better load balancing between GPUs
