Advanced Python for Scientific Computing
Michael Milligan
[email protected]
Follow along!
https://www.msi.umn.edu/content/programming
Files in: /home/support/public/tutorials/PythonSciComp/
To get the most out of this…
• Basic knowledge of Python
• Working Python install (feel free to use ours!)
– Enthought Python Distribution / Canopy provides scientific and math libraries pre-installed
• MSI login + SSH or NX
• Follow along!
Why Python for Scientific Computing?
• Rapid development
– Easy, readable syntax
– Versatile tools for experimentation/learning
– Comprehensive libraries
• Powerful Features
– Process data at near “native code” speeds
– Excellent visualization packages
– Comprehensive libraries
When you leave today, you should be able to…
• Program interactively with ipython
• Understand the basics of numpy and scipy
• Efficiently compute with large arrays of data
• Load and save data to/from files on disk
• Use matplotlib to plot data
• Take advantage of supercomputing resources with parallel computing
• Know where to turn for more help with these topics
Details
• We are describing Enthought Python Distribution.
– Essentially: Pre-assembled compilation of Python 2.7 + numpy, scipy, other useful libraries
– Free for academic use, a basic version is free for non-commercial use
– Your computers, departments, etc. may have a different version of Python installed. Everything we will see today is open source.
• In MSI: module load python-epd
Workshop Conventions
• UNIX shell commands are indicated with the percent sign (%).
• IPython interpreter commands have In/Out labels
• Lines with neither marker are Python code that should be entered into a text file.
IPython: Interactive Python
• Powerful environment for interactive work
• Run as “ipython” from any terminal
• --pylab option auto-loads numpy, sets up graphics for plotting
• Inspect any object with “?” or help()
IPython: Interactive Python
• Build up a workspace of objects and functions
• Full history access through Out[], %recall, up/down arrow keys
• %load, %edit, or %run external files
• Lots more, type %magic
NumPy and SciPy
• NumPy provides:
– The basic array and matrix data types
– Efficient implementations of low-level math operations
– A large library of high-level math functions built from efficient primitives
• SciPy provides:
– A home for a wide variety of open-source mathematical and scientific algorithms
– Modules for optimization, signal processing, linear algebra, statistics, interpolation, and more
NumPy arrays
• Array data type with vectorized operations (similar to Matlab or IDL)
• Supports much the same indexing, slicing, and iteration as the Python list type
• …except every element is of the same data type
• …so they can be stored in memory packed like C arrays
NumPy arrays are fast
Here we are comparing a “pure Python” loop to the equivalent in numpy
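
The comparison itself did not survive in the text, so the following is only a minimal sketch of what such a timing test might look like. The array size `N`, the sum-of-squares workload, and both function names are assumptions chosen for illustration; actual timings depend on the machine.

```python
import timeit

import numpy as np

N = 100000  # illustrative size, not taken from the slides


def python_loop_sum_squares(values):
    """Sum of squares using an explicit pure-Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def numpy_sum_squares(array):
    """Same computation expressed as one vectorized NumPy call."""
    return float(np.dot(array, array))


data_list = [float(i) for i in range(N)]
data_array = np.arange(N, dtype=np.float64)

# Time five repetitions of each approach on the same data.
loop_time = timeit.timeit(lambda: python_loop_sum_squares(data_list), number=5)
numpy_time = timeit.timeit(lambda: numpy_sum_squares(data_array), number=5)

print("pure Python loop: %.4f s" % loop_time)
print("numpy vectorized: %.4f s" % numpy_time)
```

In IPython the same measurement can be made interactively with the %timeit magic.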
Multidimensional arrays
• NumPy arrays are rectangles in arbitrarily many dimensions
• + - * / operate element-by-element for same-shape arrays
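
A small sketch of element-by-element arithmetic on same-shape arrays. The array names and values are made up for illustration.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # 2x3 array: [[0, 1, 2], [3, 4, 5]]
b = np.full((2, 3), 2.0)         # same shape, every element is 2.0

total = a + b      # element-by-element sum
product = a * b    # element-by-element product, not a matrix product
ratio = a / b      # element-by-element division

cube = np.zeros((2, 3, 4))       # arrays can have any number of dimensions
print(total)
print(product)
print(cube.ndim, cube.shape)
```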
Array slicing
• Index notation gives access to any “slice” of an array
• Array slices can be assigned: this changes the original array
• X = M[1,:,:].copy() would avoid changing M
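
The view-versus-copy behaviour described above can be sketched as follows. The shape of `M` and the fill values are arbitrary choices for illustration.

```python
import numpy as np

M = np.zeros((3, 4, 5))

# A plain slice is a view: writing through it changes M itself.
view = M[1, :, :]
view[:] = 7.0

# An explicit copy owns its own memory: writing to it leaves M alone.
X = M[2, :, :].copy()
X[:] = 9.0

print(M[1].sum())   # the slice assignment reached M
print(M[2].sum())   # the copy did not
```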
Other common methods
• NumPy arrays have many useful built-in methods
Other common methods
• …and the numpy module provides more
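
The slides' own examples are not preserved in the text, so here is a short, assumed selection of array methods and module-level functions. The sample array is made up.

```python
import numpy as np

a = np.array([[3.0, 1.0, 2.0],
              [6.0, 5.0, 4.0]])

# Methods attached to the array object itself
total = a.sum()                 # sum over all elements
column_means = a.mean(axis=0)   # mean down each column
largest = a.max()
transposed = a.T                # shape (3, 2)
flat = a.reshape(6)             # same data, one dimension

# Functions provided by the numpy module
row_sorted = np.sort(a, axis=1)        # sort within each row
grid = np.linspace(0.0, 1.0, 5)        # 5 evenly spaced points from 0 to 1
stacked = np.concatenate([a, a], axis=0)

print(total, column_means, largest)
print(row_sorted)
```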
Conditions and tests
• Vectorized logical operators + indexing functions
• Output of index functions can be used to slice arrays
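
A minimal sketch of vectorized conditions and index functions, using a made-up array `x`:

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2, 3])

# Vectorized comparison and logical operators produce boolean arrays.
positive_odd = (x > 0) & (x % 2 == 1)

# np.where with one argument returns the indices where the test is true.
positive_idx = np.where(x > 0)

# Both boolean masks and index arrays can be used to select elements.
selected_by_mask = x[positive_odd]
selected_by_index = x[positive_idx]

# np.where with three arguments chooses element-by-element between two values.
clipped = np.where(x < 0, 0, x)

print(selected_by_mask, selected_by_index, clipped)
```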
More useful numpy modules
• numpy.fft: FFTs, forward/inverse, 1-D and N-D
• numpy.random: generate random numbers, many distributions to choose from
• numpy.matrix: special arrays that obey matrix math
• numpy.polynomial: module for representing and manipulating arbitrary polynomials
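
As a rough illustration of three of these modules, the sketch below draws seeded random samples, runs an FFT round trip on a made-up sine signal, and finds the roots of an example polynomial. All values are assumptions chosen so the results are easy to check.

```python
import numpy as np

# numpy.random: seeded generator so the run is repeatable
rng = np.random.RandomState(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# numpy.fft: forward and inverse transform of a sine with 5 cycles in 64 samples
n = 64
signal = np.sin(2.0 * np.pi * 5.0 * np.arange(n) / n)
spectrum = np.fft.fft(signal)
recovered = np.fft.ifft(spectrum).real
peak_bin = int(np.argmax(np.abs(spectrum[: n // 2])))

# numpy.polynomial: p(x) = 2 - 3x + x^2, coefficients listed from low to high order
p = np.polynomial.Polynomial([2.0, -3.0, 1.0])
roots = p.roots()

print(samples.mean(), peak_bin, roots)
```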
Plotting made easy
• Matplotlib provides high-quality 2-D (and some 3-D) plotting
• Display in window or output to PDF, SVG, PNG, etc.
• Implemented as modular object-oriented system
• Pylab provides a Matlab-ish interactive interface to Matplotlib
– Access with ipython --pylab
– Defaults to popping up plots in a separate window
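
A minimal sketch of the object-oriented Matplotlib interface writing a plot to a PNG file. The non-interactive "Agg" backend is selected here only so the script runs without a display; the output file name and the plotted functions are arbitrary choices for illustration.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # assumption: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.legend()

# Hypothetical output location; the file extension selects the format.
out_path = os.path.join(tempfile.mkdtemp(), "trig.png")
fig.savefig(out_path)
plt.close(fig)
print("wrote", out_path)
```

Changing the extension to .pdf or .svg selects one of the other output formats listed above.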
Some basic examples…
Some advanced examples…
• These examples are from the matplotlib.org examples section…
SciPy expands the menu
• Clustering algorithms (scipy.cluster)
• Integration and ODEs (scipy.integrate)
• Interpolation (scipy.interpolate)
• Input and output (scipy.io)
• Linear algebra (scipy.linalg)
• Multi-dimensional image processing (scipy.ndimage)
• Optimization and root finding (scipy.optimize)
• Signal processing (scipy.signal)
• Sparse matrices (scipy.sparse)
• Spatial algorithms and data structures (scipy.spatial)
• Special functions (scipy.special)
• Statistical functions (scipy.stats)
• And then some…
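
To give a flavour of two of these modules, here is a small sketch using scipy.integrate and scipy.optimize. The integrand and the equation being solved are made-up examples with known answers.

```python
import numpy as np
from scipy import integrate, optimize

# scipy.integrate: numerically integrate sin(x) from 0 to pi (exact answer is 2)
area, abs_error_estimate = integrate.quad(np.sin, 0.0, np.pi)


# scipy.optimize: find the root of x^2 - 2 between 0 and 2 (exact answer is sqrt(2))
def f(x):
    return x ** 2 - 2.0


root = optimize.brentq(f, 0.0, 2.0)

print(area, abs_error_estimate)
print(root)
```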
SciPy is also fast
• Most SciPy routines use fast NumPy low-level math operations
• Some SciPy routines use highly optimized external libraries
– E.g. scipy.linalg links to BLAS, LAPACK or MKL behind the scenes
Data on disk
• Chances are you want to load and save data
• numpy and scipy.io offer a variety of facilities
Data on disk: text files
• Very common for smaller data sets: simple columns of numbers
• numpy.loadtxt(): simple interface, good defaults
• numpy.genfromtxt(): more complex, handles unusual formatting, comments, missing values, etc.
Data on disk: text files
• numpy.savetxt(): write to columns of numbers
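
A minimal round trip through a text file, plus a missing-value case for numpy.genfromtxt(). The file name, column contents, and the inline CSV text are all made up for illustration.

```python
import io
import os
import tempfile

import numpy as np

# Two columns of numbers: x and x squared
table = np.column_stack([np.arange(5.0), np.arange(5.0) ** 2])

path = os.path.join(tempfile.mkdtemp(), "columns.txt")  # hypothetical file name
np.savetxt(path, table, fmt="%.6f", header="x x_squared")

# loadtxt skips the '#'-prefixed header line by default.
loaded = np.loadtxt(path)

# genfromtxt tolerates missing fields; by default they become nan.
raw = io.StringIO("1.0,2.0\n3.0,\n")
with_gap = np.genfromtxt(raw, delimiter=",")

print(loaded)
print(with_gap)
```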
Data on disk: binary formats
• Binary data is much more scalable
– Smaller files on disk
– Faster to load and save
• May be necessary to exchange data with other software
• Stick to portable (machine-independent) formats
Data on disk: binary formats
• NumPy native format (.npy)
– numpy.load() and numpy.save()
– Or use numpy.savez() to store many arrays in one .npz archive (numpy.savez_compressed() to compress it)
– Fast, portable, but mostly only supported by Python
• scipy.io.matlab: support for Matlab (.mat)
– scipy.io.loadmat() and scipy.io.savemat()
• scipy.io.idl: read (no save) IDL .sav files
– scipy.io.readsav()
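
A short sketch of the NumPy native formats. Note that plain numpy.savez() writes an uncompressed archive, so numpy.savez_compressed() is used here for the compressed case. Array contents and file names are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

workdir = tempfile.mkdtemp()
a = np.arange(12, dtype=np.float64).reshape(3, 4)
b = np.linspace(0.0, 1.0, 5)

# Single array in .npy format
npy_path = os.path.join(workdir, "a.npy")
np.save(npy_path, a)
a_loaded = np.load(npy_path)

# Several named arrays in one compressed .npz archive
npz_path = os.path.join(workdir, "bundle.npz")
np.savez_compressed(npz_path, a=a, b=b)
with np.load(npz_path) as bundle:
    names = sorted(bundle.files)   # keyword names used when saving
    b_loaded = bundle["b"]

print(names)
print(a_loaded.shape, b_loaded.shape)
```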
Data on disk: binary formats
• Many standard formats supported
• scipy.io.netcdf: NetCDF3 interface
• h5py exposes HDF5 API
• PyTables is an excellent high-level interface to HDF5
• pyfits for FITS datasets
• Etc…
Scaling up with parallelization
• For big jobs you will eventually want to parallelize your code
• The standard Python interpreter has trouble with multithreading (its global interpreter lock lets only one thread run Python code at a time), so multi-process is usually best
• Approach depends on the problem you need to solve
Parallel processes
• Many jobs need to process lots of data, don’t need to communicate amongst themselves
• Sometimes called “embarrassingly parallel”
• GNU Parallel: a simple way to launch jobs
• Launch one job for every file in a dir, line in a file, etc.
• Can work with PBS on itasca to use many nodes
GNU Parallel example
• -j should match ppn (unless you know what you’re doing): this is processes per node
• Will run one job per line of input on stdin or in argfile: max of nodes * ppn running at once
• See “man parallel” for more features
MPI for Python
• MPI “Message Passing Interface” enables parallel processes to communicate efficiently
• Commonly one process will be “controller” and manage worker processes
• Inherent support for scatter-gather operations
• MPI is well-supported on our clusters
• mpi4py interfaces to MPI from inside Python
• Caveat for MPI gurus: numpy does not have distributed arrays yet, complicates some algorithms
Example with mpi4py
• Simple “Hello world” script
More with mpi4py
• Possible to pass numpy arrays like buffers
More with mpi4py
• Also works with (pickle-able) Python objects
• Much slower than C-based arrays, but very convenient
Too much to cover…
• IPython notebook: connect to ipython with a browser for a Mathematica-like notebook interface
• PyCUDA and PyOpenCL: GPU computing
• SymPy: Mathematica-style symbolic math
• Databases are easy to connect to Python; or use an advanced big data toolkit like Pandas or PyTables
IPython notebook example
• Example: IPython notebook with pylab and sympy
• notebook creates graphical log in a browser
• sympy: symbolic CAS
• To try this:
Community and Documentation
• Actively developed and supported
• Excellent documentation
• www.python.org/doc
• Scipy.org
• wiki.scipy.org
• Ipython.org
• matplotlib.org
• Mpi4py.scipy.org
Next Step
• Hands-on
• You can also run the examples on your laptop’s Python distribution
• Enthought is installed in all labs and on supercomputers at MSI
• Full academic version of Enthought Canopy installed (not default yet)
• Questions!