Python for Scientific and High Performance Computing
SC09
Portland, Oregon, United States
Monday, November 16, 2009 1:30PM - 5:00PM
http://www.cct.lsu.edu/~wscullin/sc09python/
Introductions
Your presenters:
William R. Scullin
[email protected]
James B. Snyder
[email protected]
Nick Romero
[email protected]
Massimo Di Pierro
[email protected]
Overview
We seek to cover:
Python language and interpreter basics
Popular modules and packages for scientific applications
How to improve performance in Python programs
How to visualize and share data using Python
Where to find documentation and resources
Do:
Feel free to interrupt
the slides are a guide - we're only successful if you learn
what you came for; we can go anywhere you'd like
Ask questions
Find us after the tutorial
About the Tutorial Environment
In [1]:
Python Basics
Interpreter
Built-in Types, keywords, functions
Control Structures
Exception Handling
I/O
Modules, Classes & OO
Interpreters
# start = 0, stop = 10
>>> for element in range(0,10):
...     print element,
...
0 1 2 3 4 5 6 7 8 9
>>> try:
...     1/0
... except ZeroDivisionError:
...     print "Oops! divide by zero!"
... except:
...     print "some other exception!"
I/O is relatively weak out of the box - luckily there are the
following alternatives:
h5py
Python bindings for HDF5
http://code.google.com/p/h5py/
netCDF4
Python bindings for NetCDF
http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.html
mpi4py allows for classic MPI-IO via MPI.File
Modules
import - load module, define in namespace
>>> import random # import module
>>> random.random() # execute module method
0.82585453878964787
# NOTE: arow & cols are NOT copies, they point to the original data
>>> arow[:] = 0
>>> arow
array([0, 0, 0, 0])
>>> a
array([[0, 0, 0, 0],
[9, 8, 7, 6],
[1, 6, 5, 4]])
# Copy data
>>> copyrow = arow.copy()
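The view-versus-copy behaviour above can be checked directly. This is a minimal sketch; the starting values of the array are assumptions made for illustration, and np.shares_memory is used only to confirm which arrays share a buffer.

```python
import numpy as np

# Illustrative 3x4 array; the values are made up for this sketch.
a = np.array([[1, 2, 3, 4],
              [9, 8, 7, 6],
              [1, 6, 5, 4]])

arow = a[0, :]            # basic slicing returns a view onto a's buffer
copyrow = a[1, :].copy()  # .copy() allocates independent storage

arow[:] = 0               # writes through to a
copyrow[:] = -1           # leaves a untouched

shares_memory = np.shares_memory(a, arow)
copy_is_independent = not np.shares_memory(a, copyrow)
print(a)
```

Only the first row of a changes: the assignment through arow reaches the original data, while the assignment through copyrow does not.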
Broadcasting with ufuncs
apply operations to many elements with a single call
>>> a = np.array(([1,2,3,4],[8,7,6,5]))
>>> a
array([[1, 2, 3, 4],
[8, 7, 6, 5]])
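A minimal sketch of what a single call can do with the array a defined above. The operands row and col are hypothetical and chosen only to show how shapes are stretched to match.

```python
import numpy as np

a = np.array(([1, 2, 3, 4], [8, 7, 6, 5]))   # shape (2, 4), as above

doubled = a * 2                        # the scalar is broadcast to every element
row = np.array([10, 20, 30, 40])       # shape (4,)
shifted = a + row                      # row is broadcast across both rows of a
col = np.array([[100], [200]])         # shape (2, 1)
offset = a + col                       # col is broadcast across all four columns
roots = np.sqrt(a)                     # a ufunc applied elementwise in one call
```

None of these lines contains a Python-level loop; the iteration happens inside NumPy.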
threading
useful for certain concurrency issues, not usable for parallel
computing due to Global Interpreter Lock (GIL)
subprocess
relatively low level control for spawning and managing
processes
MPI
mpi4py exposes your full local MPI API within Python
as scalable as your local MPI
Python Threading
Python threads
real POSIX threads
share memory and state with their parent processes
do not use IPC or message passing
lightweight
generally improve latency and throughput
there's a heck of a catch, one that kills performance...
The Infamous GIL
import random
from threading import Thread

def calcInside(nsamples, rank):
    global inside  # we need something everyone can share
    random.seed(rank)
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x*x)+(y*y) < 1:
            inside += 1

if __name__ == '__main__':
    nt = 4  # thread count
    inside = 0  # you need to initialize this
    samples = 100000
    threads = [Thread(target=calcInside, args=(samples//nt, i))
               for i in range(nt)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print((inside*4.0)/samples)
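The example above updates one shared counter from every thread. Here is a minimal sketch of a common alternative, assuming each thread counts locally with its own random.Random instance and merges its total once under a threading.Lock. The names count_inside and estimate_pi are hypothetical. Under the GIL the loops still do not run in parallel, so this addresses correctness of the shared total, not speed.

```python
import random
import threading

total_inside = 0
total_lock = threading.Lock()

def count_inside(nsamples, rank):
    """Count random points inside the unit quarter circle, then merge under a lock."""
    global total_inside
    rng = random.Random(rank)   # private generator: no shared RNG state
    local = 0
    for _ in range(nsamples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1:
            local += 1
    with total_lock:            # one short critical section per thread
        total_inside += local

def estimate_pi(samples=100000, nt=4):
    """Run nt threads and return the combined Monte Carlo estimate of pi."""
    global total_inside
    total_inside = 0
    per_thread = samples // nt
    threads = [threading.Thread(target=count_inside, args=(per_thread, i))
               for i in range(nt)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return 4.0 * total_inside / (per_thread * nt)

if __name__ == '__main__':
    print(estimate_pi())
```

Because every thread has its own seeded generator, repeated runs give the same estimate regardless of how the threads are scheduled.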
Subprocess
The subprocess module allows the Python interpreter to
spawn and control processes. It is unaffected by the GIL. Using
the subprocess.Popen() call, one may start any process
you'd like.
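A minimal sketch of subprocess.Popen in use. The child here is simply a second Python interpreter running a one-line program, chosen as an assumption so the example works anywhere Python does; any executable could be launched the same way.

```python
import subprocess
import sys

# Launch a second Python interpreter as a child process.
child = subprocess.Popen(
    [sys.executable, "-c", "print(sum(range(10)))"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = child.communicate()   # wait for the child to exit, collect its output
result = out.decode().strip()    # text the child wrote to stdout
print(result)
```

communicate() both waits for the child and drains its pipes, which avoids the deadlock that can occur when a child fills a pipe nobody is reading.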
import multiprocessing as mp
import numpy as np
import random

processes = mp.cpu_count()
nsamples = 120000//processes

def calcInside(rank):
    inside = 0
    random.seed(rank)
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x*x)+(y*y) < 1:
            inside += 1
    return (4.0*inside)/nsamples

if __name__ == '__main__':
    pool = mp.Pool(processes)
    result = pool.map(calcInside, range(processes))
    print(np.mean(result))
pi with multiprocessing, optimized
import multiprocessing as mp
import numpy as np

processes = mp.cpu_count()
nsamples = 120000//processes

def calcInsideNumPy(rank):
    np.random.seed(rank)
    xy = np.random.random((nsamples,2))**2  # "vectorized" sample gen
    return 4.0*np.sum(np.sum(xy,1)<1)/nsamples

if __name__ == '__main__':
    pool = mp.Pool(processes)
    result = pool.map(calcInsideNumPy, range(processes))
    print(np.mean(result))
mpi4py
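from mpi4py import MPI
import random

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
mpisize = comm.Get_size()
nsamples = 120000//mpisize

inside = 0
random.seed(rank)
for i in range(nsamples):
    x = random.random()
    y = random.random()
    if (x*x)+(y*y) < 1:
        inside += 1

mypi = (4.0*inside)/nsamples  # this rank's estimate
pi = comm.reduce(mypi, op=MPI.SUM, root=0)  # sum of all estimates, on rank 0

if rank == 0:
    print((1.0/mpisize)*pi)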
Performance
If at all possible...
Don't reinvent the wheel.
someone has probably already done a better job than
your first (and probably second) attempt
Build your own modules against optimized libraries
ESSL, ATLAS, FFTW, PyCUDA, PyOpenCL
Use NumPy data types instead of Python ones
Use NumPy functions instead of Python ones
"vectorize" operations on >1D data types.
avoid for loops, use single-shot operations
Pre-allocate arrays instead of repeated concatenation
use numpy.zeros, numpy.empty, etc.
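The last two tips can be sketched side by side. This is an illustration only: the array length and the values being stored are arbitrary assumptions.

```python
import numpy as np

n = 1000

# Slow pattern: growing an array by repeated concatenation copies it each time.
grown = np.array([], dtype=float)
for i in range(n):
    grown = np.concatenate((grown, [i * 0.5]))

# Pre-allocate once, then fill in place.
filled = np.empty(n, dtype=float)
for i in range(n):
    filled[i] = i * 0.5

# "Vectorized": a single-shot operation with no Python-level loop.
vectorized = np.arange(n, dtype=float) * 0.5
```

All three arrays hold the same values. The first rebuilds the array on every iteration, the second allocates once, and the third also removes the Python loop.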
Real-World Examples and Techniques:
GPAW
Outline
Introduction
NumPy
Memory
FLOPs
Parallel-Python Interpreter and Debugging
Profiling mixed Python-C code
Python Interface BLACS and ScaLAPACK
Concluding remarks
Overview
GPAW is an implementation of the projector augmented
wave (PAW) method for Kohn-Sham (KS)
Density Functional Theory (DFT)
Mean-field approach to the Schrödinger equation
Uniform real-space grid, multiple levels of parallelization
Non-linear sparse eigenvalue problem
10^6 grid points, 10^3 eigenvalues
Solved self-consistently using RMM-DIIS
Nobel prize in Chemistry to Walter Kohn (1998) for (KS)-DFT
Ab initio atomistic simulation for predicting material
properties
Massively parallel and written in Python-C using the
NumPy library.
GPAW Strong-scaling Results
GPAW code structure
for i in xrange(N):
    Y[i] += alpha*X[i]
    C[i] += A[i]*B[i]
NumPy - FLOPS
    if (x->descr->type_num == PyArray_DOUBLE)
        dscal_(&n, &(alpha.real), DOUBLEP(x), &incx);
    else
        zscal_(&n, &alpha, (void*)COMPLEXP(x), &incx);
    Py_RETURN_NONE;
}
Profiling Mixed Python-C code
2048 cores!
Python Interface to BLACS and ScaLAPACK
BLACS:              ScaLAPACK:
Cblacs_gridexit     numroc
Cblacs_gridinfo     Cpdgemr2d
Cblacs_gridinit     Cpzgemr2d
Cblacs_pinfo        Cpdgemr2do
Csys2blacs_handle   Cpzgemr2do
Python:
blacs_create
blacs_destroy
blacs_redist
Python Interface to BLACS and ScaLAPACK
GPAW will allow you to run multiple concurrent DFT calculations with
a single executable.
High-throughput computing (HTC) for catalytic materials screening.
Perform compositional sweeps trivially.
Manage the dispatch of many tasks without 3rd party software
Suppose 512-node partition, 2048 cores
Each DFT calculation requires 128 cores, no guarantee that
they all finish at the same time
Set up N >> 2048/128 calculations. As soon as one DFT
calculation finishes, start another one until the job runs out of
wall-clock time.
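The dispatch idea above can be sketched as a small simulation. This is not GPAW's implementation: simulate_dispatch, its parameters, and the task runtimes are hypothetical, and no MPI is involved. It only shows the bookkeeping of 2048/128 = 16 slots being refilled as tasks finish.

```python
import heapq

def simulate_dispatch(runtimes, total_cores=2048, cores_per_task=128,
                      wall_clock=None):
    """Greedy dispatch: start the next task as soon as a slot frees up.

    runtimes   -- hypothetical per-task durations, in arbitrary time units
    wall_clock -- optional limit; no task is started at or after this time
    Returns a list of (task_index, start_time, end_time).
    """
    slots = total_cores // cores_per_task      # 2048 // 128 = 16 concurrent tasks
    free_at = [0.0] * slots                    # min-heap of times each slot frees up
    heapq.heapify(free_at)
    schedule = []
    for index, runtime in enumerate(runtimes):
        start = heapq.heappop(free_at)         # earliest available slot
        if wall_clock is not None and start >= wall_clock:
            heapq.heappush(free_at, start)
            break                              # the job is out of wall-clock time
        end = start + runtime
        schedule.append((index, start, end))
        heapq.heappush(free_at, end)
    return schedule

if __name__ == '__main__':
    # 40 made-up tasks whose runtimes cycle through 1.0, 1.5 and 2.0
    demo = simulate_dispatch([1.0 + 0.5 * (i % 3) for i in range(40)])
    print(len(demo))
```

Because tasks finish at different times, a slot is reused the moment it frees up rather than waiting for a whole batch of 16 to complete.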
Python for plotting and visualization
Overview of matplotlib
Example of MC analysis tool written in Python
Looking at data sharing on the web
From a Scientific Library
To a Scientific Application
Massimo Di Pierro
From Lib to App
(overview)
Numerical Algorithms
Plotting: gnuplot.py, r.py, Chaco, Dislin, ... matplotlib
user (web frameworks): Ruby on Rails, Django, TurboGears, Pylons, ... web2py
Why?
code project
web2py and MVC
code project: application 1, application="dna", application 3
Models: data representation
Controllers: logic/workflow
Views: data presentation
Minimal Complete Application (application="dna")
Model:
db.define_table('dna', Field('sequence'))
Controller:
def upload_dna():
    return dict(form=crud.create(db.dna))
View:
<h1>Upload DNA Seq.</h1>
{{=form}}
web2py and Dispatching
<h1>
Upload DNA Seq.
</h1>
{{=form}}
URL parts, in order: hostname, app name, controller, action name
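A web2py URL is made of the hostname, the app name, the controller and the action name, in that order. Here is a minimal sketch of that decomposition. split_url is a hypothetical helper written for illustration, not web2py's dispatcher, and the host and the "default" controller in the example URL are assumptions.

```python
def split_url(url):
    """Split 'http://host/app/controller/action/arg...' into named parts."""
    without_scheme = url.split('://', 1)[-1]
    pieces = [p for p in without_scheme.split('/') if p]
    hostname = pieces[0]
    app, controller, action = pieces[1:4]
    args = pieces[4:]                    # anything left is passed as arguments
    return dict(hostname=hostname, app=app, controller=controller,
                action=action, args=args)

parts = split_url('http://127.0.0.1:8000/dna/default/upload_dna')
print(parts['app'])
```

Here the app is dna, the controller file is default, and the action is the function upload_dna inside it.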
web2py and Views
<h1>
Upload DNA Seq.
</h1>
{{=form}}
web2py and Authentication
authentication
web2py and AppAdmin
database interface
web2py web based IDE
web based IDE
Goal
import math, random, uuid

db.define_table('dna',
    Field('name'),
    Field('sequence','text'))

def random_gene(n):
    return ''.join(['ATGC'[int(n+10*math.sin(n*k)) % 4] \
                    for k in range(10+n)])+'UAA'

def random_dna():
    return ''.join([random_gene(random.randint(0,10)) \
                    for k in range(50)])

# populate the table once with 100 random sequences
if not db(db.dna.id>0).count():
    for k in range(100):
        db.dna.insert(name=uuid.uuid4(),sequence=random_dna())
Define some algorithms
import re

def find_gene_size(a):
    r=re.compile('(UAA|UAG|UGA)(?P<gene>.*?)(UAA|UAG|UGA)')
    return [(g.start(),len(g.group('gene'))) \
            for g in r.finditer(a)]

def needleman_wunsch(a,b,p=0.97):
    """Needleman-Wunsch and Smith-Waterman"""
    z=[]
    for i,r in enumerate(a):
        z.append([])
        for j,c in enumerate(b):
            if r==c:
                z[-1].append(z[i-1][j-1]+1 if i*j>0 else 1)
            else:
                z[-1].append(p*max(z[i-1][j] if i>0 else 0,
                                   z[i][j-1] if j>0 else 0))
    return z
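To see what the stop-codon pattern in find_gene_size picks up, here is a self-contained sketch on a short made-up sequence. The sequence is an assumption for illustration, and the function body repeats the definition above. Because the regular expression consumes the closing stop codon, the next match can only begin after it, so two stretches that share a stop codon are not both reported.

```python
import re

def find_gene_size(a):
    # (start position of the opening stop codon, length of the stretch after it)
    r = re.compile('(UAA|UAG|UGA)(?P<gene>.*?)(UAA|UAG|UGA)')
    return [(g.start(), len(g.group('gene'))) for g in r.finditer(a)]

# Made-up sequence: stop codon, 4 bases, stop codon, 2 bases, stop codon
sequence = 'UAA' + 'AUGC' + 'UAG' + 'GG' + 'UGA'
sizes = find_gene_size(sequence)
print(sizes)
```

Only the first stretch (4 bases, starting at position 0) is returned; the 2-base stretch is skipped because its opening stop codon was already consumed by the first match.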
in models/matplotlib_helpers.py
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
import cStringIO

def hist(title='title',xlab='x',ylab='y',data={}):
    fig=Figure()
    fig.set_facecolor('white')
    ax=fig.add_subplot(111)
    if title: ax.set_title(title)
    if xlab: ax.set_xlabel(xlab)
    if ylab: ax.set_ylabel(ylab)
    legend=[]
    keys=sorted(data)
    for key in keys:
        stream = data[key]
        (x,y)=([],[])
        for point in stream:
            x.append(point[0])
            y.append(point[1])
        ell=ax.hist(y,20)  # 20-bin histogram of the second value of each point
    canvas=FigureCanvas(fig)
    response.headers['Content-Type']='image/png'
    stream=cStringIO.StringIO()
    canvas.print_png(stream)
    return stream.getvalue()
Define actions
in controllers/default.py
def index():
    rows=db(db.dna.id).select(db.dna.id,db.dna.name)
    return dict(rows=rows)

@auth.requires_login()
def gene_size():
    dna = db.dna[request.args(0)] or \
        redirect(URL(r=request,f='index'))
    lengths = find_gene_size(dna.sequence)
    return hist(data={'Lengths':lengths})
Define Views
in views/default/index.html
{{extend 'layout.html'}}
<a href="{{=URL(r=request,f='compare')}}">compare</a>
<ul>
{{for row in rows:}}
<li>{{=row.name}}
[<a href="{{=URL(r=request,f='gene_size',args=row.id)}}">gene sizes</a>]
</li>
{{pass}}
</ul>
Try it
in models/matplotlib_helpers.py
def pcolor2d(title='title',xlab='x',ylab='y',
             z=[[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7]]):
    fig=Figure()
    fig.set_facecolor('white')
    ax=fig.add_subplot(111)
    if title: ax.set_title(title)
    if xlab: ax.set_xlabel(xlab)
    if ylab: ax.set_ylabel(ylab)
    image=ax.imshow(z)
    image.set_interpolation('bilinear')
    canvas=FigureCanvas(fig)
    response.headers['Content-Type']='image/png'
    stream=cStringIO.StringIO()
    canvas.print_png(stream)
    return stream.getvalue()
Define Actions
in controllers/default.py
def needleman_wunsch_plot():
    dna1 = db.dna[request.vars.sequence1]
    dna2 = db.dna[request.vars.sequence2]
    z = needleman_wunsch(dna1.sequence,dna2.sequence)
    return pcolor2d(z=z)

def compare():
    form = SQLFORM.factory(
        Field('sequence1',db.dna,
              requires=IS_IN_DB(db,'dna.id','%(name)s')),
        Field('sequence2',db.dna,
              requires=IS_IN_DB(db,'dna.id','%(name)s')))
    if form.accepts(request.vars):
        image=URL(r=request,f='needleman_wunsch_plot',
                  vars=form.vars)
    else:
        image=None
    return dict(form=form, image=image)
Define Views
in views/default/compare.html
{{extend 'layout.html'}}
{{=form}}
{{if image:}}
Sequence1 = {{=db.dna[request.vars.sequence1].name}}<br/>
Sequence2 = {{=db.dna[request.vars.sequence2].name}}<br/>
<img src="{{=image}}"/>
{{pass}}
Python
http://www.python.org/
all the current documentation, software, tutorials, news, and pointers to advice you'll ever need
GPAW
https://wiki.fysik.dtu.dk/gpaw/
GPAW documentation and code
SciPy and NumPy
http://numpy.scipy.org/
The official NumPy website
http://conference.scipy.org/
The annual SciPy conference
http://www.enthought.com/
Enthought, Inc., the commercial sponsors of SciPy, NumPy, Chaco, EPD and more
Matplotlib
http://matplotlib.sourceforge.net/
best 2D package on the planet
mpi4py
http://mpi4py.scipy.org/
Yet More Resources
Tau
http://www.cs.uoregon.edu/research/tau/home.php
official open source site
http://www.paratools.com/index.php
commercial tools and support for Tau
web2py
http://www.web2py.com/
web framework used in this tutorial
Hey! There's a Python BOF
Python for High Performance and Scientific Computing
Primary Session Leader:
Andreas Schreiber (German Aerospace Center)
This work is supported in part by the resources of the Argonne Leadership
Computing Facility at Argonne National Laboratory, which is supported by the
Office of Science of the U.S. Department of Energy under contract
DE-AC02-06CH11357.
Extended thanks to
Northwestern University
DePaul University
the families of the presenters
Sameer Shende, ParaTools, Inc.
Enthought, Inc. for their continued support and sponsorship of SciPy and NumPy
Lisandro Dalcin for his work on mpi4py and tolerating a lot of questions
the members of the Chicago Python User's Group (ChiPy) for allowing us to ramble on about science and HPC
the Python community for their feedback and support
CCT at LSU
numerous others at HPC centers nationwide