1st ASTERICS-OBELICS International School
6-9 June 2017, Annecy, France.
PYTHON LIBRARIES
Tamás Gál
[email protected]
@tamasgal
https://github.com/tamasgal
OVERVIEW
• Who is this clown?
• Python Introduction
• Basic Python Internals
• Libraries and Tools for Scientific Computing
• Make it faster: NumPy, Numba, NumExpr
• Tools for scientists: SciPy, AstroPy, Pandas, SymPy, Matplotlib, Jupyter, IPython
WHO IS THIS CLOWN?
• Tamás Gál, born 1985 in Debrecen (Hungary)
• PhD candidate in astro particle physics at
Erlangen Centre for Astroparticle Physics (ECAP) working on the KM3NeT project
• Programming background:
• Coding enthusiast since ~1993
• First real application written in Amiga Basic (toilet manager, tons of GOTOs)
• Python, JuliaLang, JavaScript and C/C++/Obj-C for work
• Haskell for fun
• Earlier also Java, Perl, PHP, Delphi, MATLAB, whatsoever…
• I also like playing around with integrated circuits and Arduino
• Some related projects:
KM3Pipe (core analysis framework in the KM3NeT experiment),
RainbowAlga (interactive 3D neutrino event display),
ROyWeb (interactive realtime visualisation/graphing)
PYTHON
BRIEF HISTORY OF PYTHON
• Rough idea in the late 1980s
• Meant as a successor to the ABC language
• First line of code in December 1989 by Guido van Rossum
• Python 2.0 in October 2000
• Python 3.0 in December 2008
PYTHON'S POPULARITY
“Programming language of the year” in 2007 and 2010.
POPULAR LANGUAGES
Python is currently the fourth most popular language
and has been rocking the top 10 since 2003.
YOUR JOURNEY THROUGH PYTHON?
(JUST A VERY ROUGH GUESS, NOT A MEAN GAME)
Raise your hand and keep it up until you answer a question with “no”.
• Have you ever launched the Python interpreter?
Explorer
• Wrote for/while-loops or if/else statements?
• …your own functions?
Novice
• …classes?
• …list/dict/set comprehensions?
Intermediate
• Do you know what a generator is?
• Have you ever implemented a decorator?
Advanced
• …a metaclass?
• …a C-extension?
• Do you know and can you explain the output of the following line?
print(5 is 7 - 2, 300 is 302 - 2)
Are you kidding me???
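A hedged sketch of what the last quiz line probes, assuming CPython: `is` compares object identity, and CPython happens to keep one shared object per small integer (an implementation detail, commonly quoted as the range -5 to 256). The numbers are built with `int(...)` here only to avoid compile-time constant folding, which can change the outcome for literals. The variable names are made up for this illustration.

```python
# Identity (`is`) versus equality (`==`) for integers on CPython.
# Building the numbers at runtime avoids constant folding by the compiler.
five_a = int("5")
five_b = int("7") - 2
big_a = int("300")
big_b = int("302") - 2

# Equality compares values and is what you want for numbers.
print(five_a == five_b, big_a == big_b)

# Identity compares objects. CPython shares one object per small integer,
# so the first comparison is True there. For 300 the result depends on the
# interpreter and must never be relied upon.
print(five_a is five_b, big_a is big_b)
```

The practical rule: compare numbers with `==` and reserve `is` for singletons such as `None`.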
BASIC PYTHON INTERNALS
to understand the performance issues
FROM SOURCE TO RUNTIME
foo.py (source) → compiler → foo.pyc (bytecode) → interpreter → runtime
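The compile step can be inspected from Python itself. The following is a minimal sketch using the standard-library `dis` module. The function `add` is a made-up example, and the exact instruction names differ between Python versions.

```python
import dis


def add(a, b):
    return a + b


# Every function carries a code object that holds the compiled bytecode.
raw_bytecode = add.__code__.co_code  # a bytes object
opnames = [instruction.opname for instruction in dis.get_instructions(add)]

print(type(raw_bytecode), len(raw_bytecode))
print(opnames)  # instruction names vary between Python versions
```

For imported modules, CPython caches this bytecode on disk as `.pyc` files so that the compile step can be skipped on the next import.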
DATA IN PYTHON
• Every piece of data is a PyObject
• PyObject: ref. count, type
• Structural subtype PyIntObject: ref. count, type, value (here 42)
• The type field points to a PyTypeObject (_typeobject): ref. count, type, plus fields and attributes
>>> dir(42)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__',
'__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__',
'__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__',
'__getnewargs__', '__gt__', '__hash__', '__index__', '__init__',
'__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__',
'__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__',
'__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__',
'__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__',
'__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__',
'__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__',
'__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length',
'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real',
'to_bytes']
THE TYPE OF A PyObject
“An object has a ‘type’ that determines what it
represents and what kind of data it contains.
An object’s type is fixed when it is created. Types
themselves are represented as objects. The type itself
has a type pointer pointing to the object representing
the type ‘type’, which contains a pointer to itself!”
— object.h
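The self-referencing type pointer described in the quote can be observed from the interpreter. This is a small sketch, assuming CPython. The reference count and size it prints are implementation details and differ between versions and platforms.

```python
import sys

value = 42

print(type(value))        # the type of the object
print(type(type(value)))  # the type of that type
print(type(type))         # `type` is its own type

# Every object carries a header (reference count and type pointer),
# which is why even a small integer needs more than 8 bytes.
print(sys.getrefcount(value))
print(sys.getsizeof(value))
```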
YOUR BEST FRIEND AND WORST ENEMY:
GIL - Global Interpreter Lock
• The GIL prevents parallel execution of (Python) bytecode
• Even though Python has real threads, they never execute
code at the same time
• Context switching between threads creates overhead
(the user cannot control thread-priority)
• Threads perform pretty badly on CPU bound tasks
• They do a great job speeding up I/O heavy tasks
THREADS AND CPU BOUND TASKS
single thread: two threads:
This is probably not really what you expected…
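The code behind this slide is not preserved in the text, so the following is only a sketch of the kind of experiment it describes: a CPU-bound countdown run twice in sequence and then in two threads. The function name `countdown` and the loop size are assumptions. On a standard CPython build with the GIL the threaded run is typically no faster, while builds without the GIL may behave differently.

```python
import threading
import time


def countdown(n):
    """A purely CPU-bound loop with no I/O."""
    while n > 0:
        n -= 1


N = 1_000_000

# Run the work twice in the main thread.
start = time.perf_counter()
countdown(N)
countdown(N)
sequential = time.perf_counter() - start

# Run the same work in two threads.
threads = [threading.Thread(target=countdown, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.3f} s, two threads: {threaded:.3f} s")
```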
THREADS FIGHTING FOR THE GIL
OS X: 4 threads on 1 CPU (Python 2.6)
By David M Beazley: http://dabeaz.com/GIL/gilvis
THREADS FIGHTING FOR THE GIL
OS X: 4 threads on 4 CPUs (Python 2.6)
By David M Beazley: http://dabeaz.com/GIL/gilvis
OK, but then: how should Python ever compete with
all those super fast C/Fortran libraries?
C-extensions and interfacing C/Fortran!
Those can release the GIL and do the heavy stuff in
the background.
A DUMB SPEED COMPARISON
CALCULATING THE MEAN OF 1000000 RANDOM NUMBERS
• pure Python (baseline)
• NumPy: ~13x faster
• Julia: ~16x faster
• Numba: ~8x faster
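The benchmark code from the slide is not preserved, so here is a rough sketch of how the pure Python and NumPy variants could be compared. The helper `pure_python_mean` and the seed are assumptions, and the measured ratio depends on the machine, so it will not necessarily match the factors quoted above.

```python
import time

import numpy as np

rng = np.random.default_rng(seed=0)
numbers = rng.random(1_000_000)   # NumPy array of floats
numbers_list = numbers.tolist()   # the same values as a plain Python list


def pure_python_mean(values):
    """Mean computed with an explicit Python loop."""
    total = 0.0
    for value in values:
        total += value
    return total / len(values)


start = time.perf_counter()
py_mean = pure_python_mean(numbers_list)
py_time = time.perf_counter() - start

start = time.perf_counter()
np_mean = numbers.mean()
np_time = time.perf_counter() - start

print(f"pure Python: {py_time * 1e3:.2f} ms, NumPy: {np_time * 1e3:.2f} ms")
```

For serious measurements, `timeit` or the IPython `%timeit` magic gives more reliable numbers than a single run.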
CRAZY LLVM COMPILER OPTIMISATIONS
SUMMING UP NUMBERS FROM 0 TO N=100,000,000
• pure Python (baseline)
• NumPy: ~80x faster
• Julia: ~7000000x faster
• Numba: ~300000x faster
pushq %rbp
movq %rsp, %rbp
xorl %eax, %eax
Source line: 3
testq %rdi, %rdi
jle L32
leaq -1(%rdi), %rax
leaq -2(%rdi), %rcx
mulq %rcx
shldq $63, %rax, %rdx
leaq -1(%rdx,%rdi,2), %rax
Source line: 6
L32:
popq %rbp
retq
nopw %cs:(%rax,%rax)
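One way to read the assembly listing above (an interpretation, not something the slide states) is that it contains no loop at all: it computes (n-1)(n-2)/2 + 2n - 1, which equals n(n+1)/2, the closed form of the sum. The sketch below mirrors that arithmetic in Python and checks it against an explicit loop. Both function names are made up for the illustration.

```python
def closed_form_from_listing(n):
    """Mirror the arithmetic visible in the listing."""
    if n <= 0:  # corresponds to the early exit (testq / jle)
        return 0
    # (n - 1) * (n - 2) is always even, so integer division is exact.
    return ((n - 1) * (n - 2)) // 2 + 2 * n - 1


def loop_sum(n):
    """The naive version: add up 1..n one by one."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


for n in (0, 1, 10, 1000):
    print(n, loop_sum(n), closed_form_from_listing(n))
```

This would explain the enormous speed-up factors: a constant number of instructions replaces 100,000,000 loop iterations.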
PYTHON LIBRARIES
for scientific computing
pandas, NumPy, Jupyter, Matplotlib, SymPy, IPython, AstroPy, SciPy
Not part of NumFocus but covered in this talk: Numba, Numexpr
SCIPY
Scientific Computing Tools for Python
THE SCIPY STACK
• Core packages
• SciPy Library: numerical algorithms, signal processing, optimisation, statistics etc.
• NumPy
• Matplotlib: 2D/3D plotting library
• pandas: high performance, easy to use data structures
• SymPy: symbolic mathematics and computer algebra
• IPython: a rich interactive interface to process data and test ideas
• nose: testing framework for Python code
• Other packages:
• Chaco, Mayavi, Cython, Scikits (scikit-learn, scikit-image), h5py, PyTables and
much more
https://www.scipy.org
SCIPY CORE LIBRARY
• Clustering package (scipy.cluster)
• Constants (scipy.constants)
• Discrete Fourier transforms (scipy.fftpack)
• Integration and ODEs (scipy.integrate)
• Interpolation (scipy.interpolate)
• Input and output (scipy.io)
• Linear algebra (scipy.linalg)
• Miscellaneous routines (scipy.misc)
• Multi-dimensional image processing (scipy.ndimage)
• Orthogonal distance regression (scipy.odr)
• Optimization and root finding (scipy.optimize)
• Signal processing (scipy.signal)
• Sparse matrices (scipy.sparse)
• Sparse linear algebra (scipy.sparse.linalg)
• Compressed Sparse Graph Routines
(scipy.sparse.csgraph)
• Spatial algorithms and data structures (scipy.spatial)
• Special functions (scipy.special)
• Statistical functions (scipy.stats)
• Statistical functions for masked arrays (scipy.stats.mstats)
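To give a feeling for how these subpackages are used, here is a small sketch that touches three of them. The integrand and the root-finding problem are arbitrary choices made for this illustration.

```python
import numpy as np
from scipy import constants, integrate, optimize

# scipy.integrate: numerically integrate sin(x) from 0 to pi.
area, error_estimate = integrate.quad(np.sin, 0.0, np.pi)

# scipy.optimize: find the root of x**2 - 2 in the bracket [0, 2].
root = optimize.brentq(lambda x: x**2 - 2.0, 0.0, 2.0)

# scipy.constants: physical constants in SI units.
speed_of_light = constants.c

print(area, error_estimate)
print(root)
print(speed_of_light)
```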
SCIPY INTERPOLATE
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
x = np.linspace(0, 10, 10)
y = np.sin(x)
x_fine = np.linspace(0, 10, 500)
f_linear = interpolate.interp1d(x, y, kind='linear')
f_cubic = interpolate.interp1d(x, y, kind='cubic')
plt.plot(x, y, 'o',
         x_fine, f_linear(x_fine), '--',
         x_fine, f_cubic(x_fine), '-.');
NUMPY
Numerical Python
NUMPY
NumPy is the fundamental package for scientific computing with Python.
• gives us a powerful N-dimensional array object: ndarray
• broadcasting functions
• tools for integrating C/C++ and Fortran
• linear algebra, Fourier transform and random number capabilities
• most of the scientific libraries build upon NumPy
NUMPY: ndarray
ndim: 1
shape: (6,)
1 2 3 4 5 6
Contiguous array in memory with a fixed type,
no pointer madness!
C/Fortran compatible memory layout,
so arrays can be passed to those languages
without any further effort.
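The properties shown on this slide can be inspected directly on an array. This is a minimal sketch that fixes the dtype to `int64` explicitly, since the default integer type depends on the platform.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6], dtype=np.int64)

print(a.ndim)    # number of dimensions
print(a.shape)   # extent along each dimension
print(a.dtype)   # one fixed element type for the whole array
print(a.nbytes)  # itemsize * number of elements, one contiguous buffer
print(a.flags["C_CONTIGUOUS"])  # whether the layout is C-contiguous
```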
NUMPY: ARRAY OPERATIONS AND ufuncs
easy and intuitive element-wise operations
a ufunc can operate both on scalars and arrays (element-wise)
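The slide's code screenshot is not preserved. As a stand-in, the following sketch shows element-wise arithmetic, a ufunc applied to a scalar and to an array, and a simple case of broadcasting. The array values are arbitrary.

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([10.0, 20.0, 30.0])

# Arithmetic operators work element by element.
print(a + b)
print(a * 2)

# np.sqrt is a ufunc: it accepts scalars as well as arrays.
print(np.sqrt(16.0))
print(np.sqrt(a))

# Broadcasting: shapes (2, 1) and (3,) combine to (2, 3).
column = np.array([[0.0], [100.0]])
print(column + a)
```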
RESHAPING ARRAYS
ndim: 1
shape: (6,)
1 2 3 4 5 6
a[0] a[1]
No rearrangement of the elements
but setting the iterator limits internally!
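That reshaping only changes the bookkeeping can be checked with a short sketch: the reshaped array shares its memory with the original, and only the shape and strides differ. The dtype is fixed to `int64` here so that the stride values are the same on every platform.

```python
import numpy as np

a = np.arange(1, 7, dtype=np.int64)  # 1 2 3 4 5 6
b = a.reshape(2, 3)                  # a view on the same buffer

print(np.shares_memory(a, b))
print(a.strides, b.strides)  # bytes to step per dimension

# Writing through the view changes the original array as well.
b[0, 0] = 100
print(a)
```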
RESHAPING ARRAYS IS CHEAP
Don’t worry, we will discover NumPy in the hands-on workshop!
MATPLOTLIB
A Python plotting library which produces publication quality figures in a variety
of hardcopy formats and interactive environments.
• Integrates well with IPython and Jupyter
• Plots, histograms, power spectra, bar charts, error charts, scatterplots, etc. with
an easy to use API
• Full control of line styles, font properties, axes properties etc.
• The easiest way to get started is browsing its wonderful gallery full of
thumbnails and copy&paste examples:
http://matplotlib.org/gallery.html
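A minimal plotting sketch, assuming a non-interactive environment: it selects the `Agg` backend and renders into an in-memory buffer instead of opening a window. In a notebook or an interactive session you would normally skip the backend line and call `plt.show()`.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render without a display (assumption for this sketch)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), "--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Save the figure as PNG into memory; a file name works the same way.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```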
MATPLOTLIB EXAMPLE
PANDAS
A Python Data Analysis Library inspired by data frames in R, which
• gives us a powerful data structure: DataFrame
• database-like handling of data
• integrates well with NumPy
• wraps the Matplotlib API
• has a huge number of I/O related functions to parse data:
CSV, HDF5, SQL, Feather, JSON, HTML, Excel, and more…
THE DataFrame
A table-like structure, where you can access elements
by row and column.
THE DataFrame
Lots of functions to allow filtering, manipulating
and aggregating the data to fit your needs.
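The tables shown on these slides are not preserved, so here is a small sketch with invented data. The column names (`detector`, `energy`, `hits`) and the event labels are purely illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "detector": ["A", "A", "B", "B"],
        "energy": [1.5, 2.5, 10.0, 30.0],
        "hits": [12, 7, 40, 55],
    },
    index=["evt0", "evt1", "evt2", "evt3"],
)

# Access a single element by row label and column name.
single = df.loc["evt2", "energy"]

# Filter rows with a boolean condition.
many_hits = df[df["hits"] > 10]

# Aggregate: mean energy per detector.
mean_energy = df.groupby("detector")["energy"].mean()

print(single)
print(many_hits)
print(mean_energy)
```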
Don’t worry, we will discover Pandas in the hands-on workshop!
sponsored by
NUMBA
JIT (LLVM) compiler for Python
NUMBA
Numba is a compiler for Python array and numerical functions that gives you the
power to speed up code written directly in Python.
• uses LLVM to boil down pure Python code to JIT optimised machine code
• only accelerates the functions you select by decorating them
• native code generation for CPU (default) and GPU
• integration with the Python scientific software stack (thanks to NumPy)
• runs side by side with regular Python code or third-party C extensions and libraries
• great CUDA support
• N-core scalability by releasing the GIL (beware: no protection from race conditions!)
• create NumPy ufuncs with the @[gu]vectorize decorator(s)
FROM SOURCE TO RUNTIME
Python: foo.py (source) → compiler → foo.pyc (bytecode) → interpreter → runtime
Numba: bytecode → bytecode interpretation (control flow graph, data flow graph) → Numba IR → type inference → typed Numba IR → lowering → LLVM IR → codegen via LLVM
NUMBA JIT-EXAMPLE
import numpy as np
import numba as nb
numbers = np.arange(1000000).reshape(2500, 400)
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result
289 ms ± 3.02 ms per loop
@nb.jit
def sum2d_jit(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result
2.13 ms ± 42.6 µs per loop
~135x faster, with a single line of code
NUMBA VECTORIZE-EXAMPLE
a = np.arange(1000000, dtype='f8')
b = np.arange(1000000, dtype='f8') + 23
NumPy:
np.abs(a - b) / (np.abs(a) + np.abs(b))
23 ms ± 845 µs per loop
Numba @vectorize:
@nb.vectorize
def nb_rel_diff(a, b):
    return abs(a - b) / (abs(a) + abs(b))
nb_rel_diff(a, b)
3.56 ms ± 43.2 µs per loop
~6x faster
NUMEXPR
initially written by David Cooke
Routines for the fast evaluation of array expressions elementwise
by using a vector-based virtual machine.
NUMEXPR USAGE EXAMPLE
import numpy as np
import numexpr as ne
a = np.arange(5)
b = np.linspace(0, 2, 5)
ne.evaluate("a**2 + 3*b")
array([ 0. , 2.5, 7. , 13.5, 22. ])
NUMEXPR SPEED-UP
a = np.random.random(1000000)
NumPy:
2 * a**3 - 4 * a**5 + 6 * np.log(a) 82.4 ms ± 1.88 ms per loop
Numexpr with 4 threads:
ne.set_num_threads(4)
ne.evaluate("2 * a**3 - 4 * a**5 + 6 * log(a)") 7.85 ms ± 103 µs per loop
~10x faster
NUMEXPR - SUPPORTED OPERATORS
• Logical operators: &, |, ~
• Comparison operators:
<, <=, ==, !=, >=, >
• Unary arithmetic operators: -
• Binary arithmetic operators:
+, -, *, /, **, %, <<, >>
NUMEXPR - SUPPORTED FUNCTIONS
• where(bool, number1, number2): number -- number1 if the bool condition is true, number2 otherwise.
• {sin,cos,tan}(float|complex): float|complex -- trigonometric sine, cosine or tangent.
• {arcsin,arccos,arctan}(float|complex): float|complex -- trigonometric inverse sine, cosine or tangent.
• arctan2(float1, float2): float -- trigonometric inverse tangent of float1/float2.
• {sinh,cosh,tanh}(float|complex): float|complex -- hyperbolic sine, cosine or tangent.
• {arcsinh,arccosh,arctanh}(float|complex): float|complex -- hyperbolic inverse sine, cosine or tangent.
• {log,log10,log1p}(float|complex): float|complex -- natural, base-10 and log(1+x) logarithms.
• {exp,expm1}(float|complex): float|complex -- exponential and exponential minus one.
• sqrt(float|complex): float|complex -- square root.
• abs(float|complex): float|complex -- absolute value.
• conj(complex): complex -- conjugate value.
• {real,imag}(complex): float -- real or imaginary part of complex.
• complex(float, float): complex -- complex from real and imaginary parts.
• contains(str, str): bool -- returns True for every string in `op1` that contains `op2`.
• sum(number, axis=None): Sum of array elements over a given axis. Negative axis are not supported.
• prod(number, axis=None): Product of array elements over a given axis. Negative axis are not supported.
THE HISTORY OF ASTROPY
(standard situation back in 2011)
• Example Problem: convert from EQ J2000 RA/Dec to Galactic
coordinates
• Solution in Python
• pyast
• Astrolib
• Astrophysics
• PyEphem
• PyAstro
• Kapteyn
• ???
• Huge discussion started in June 2011, followed by a series of votes
First public version (v0.2) presented and described in the following paper:
http://adsabs.harvard.edu/abs/2013A%26A...558A..33A
ASTROPY CORE PACKAGE
A community-driven package intended to contain much of the core functionality and
some common tools needed for performing astronomy and astrophysics with Python.
• Data structures and transformations
• constants, units and quantities, N-dimensional datasets, data tables, times and dates,
astronomical coordinate system, models and fitting, analytic functions
• Files and I/O
• unified read/write interface
• FITS, ASCII tables, VOTable (XML), Virtual Observatory access, HDF5, YAML, …
• Astronomy computations and utilities
• cosmological calculations, convolution and filtering, data visualisations, astrostatistics
tools
ASTROPY
AFFILIATED PACKAGES
• Tons of astronomy related packages
• which are not part of the core package,
• but have requested to be included as part of the
Astropy project’s community
ASTROPY EXAMPLE
downloading via HTTP
checking some FITS meta
extracting image data
plotting via Matplotlib
ASTROPY EXAMPLE
Don’t worry, we will discover AstroPy in the hands-on workshop!
SYMPY
A Python library for symbolic mathematics.
• It aims to become a full-featured computer algebra system (CAS)
• while keeping the code as simple as possible
• in order to be comprehensible and easily extensible.
• SymPy is written entirely in Python.
• It only depends on mpmath, a pure Python library for arbitrary-precision
floating-point arithmetic
SYMPY
• solving equations
• solving differential equations
• simplifications: trigonometry, polynomials
• substitutions
• factorisation, partial fraction decomposition
• limits, differentiation, integration, Taylor series
• combinatorics, statistics, …
• much much more
SYMPY EXAMPLE
Base Python:
In [1]: import math
In [2]: math.sqrt(8)
Out[2]: 2.8284271247461903
In [3]: math.sqrt(8)**2
Out[3]: 8.000000000000002
SymPy:
In [4]: import sympy
In [5]: sympy.sqrt(8)
Out[5]: 2*sqrt(2)
In [6]: sympy.sqrt(8)**2
Out[6]: 8
SYMPY EXAMPLE
In [15]: x, y = sympy.symbols('x y')
In [16]: expr = x + 2*y
In [17]: expr
Out[17]: x + 2*y
In [18]: expr + 1
Out[18]: x + 2*y + 1
In [19]: expr * x
Out[19]: x*(x + 2*y)
In [20]: sympy.expand(expr * x)
Out[20]: x**2 + 2*x*y
SYMPY EXAMPLE
In [1]: import sympy
In [2]: from sympy import init_printing, integrate, diff, exp, cos, sin, oo
In [3]: init_printing(use_unicode=True)
In [4]: x = sympy.symbols('x')
In [5]: diff(sin(x)*exp(x), x)
Out[5]:
 x           x
ℯ ⋅sin(x) + ℯ ⋅cos(x)
In [6]: integrate(exp(x)*sin(x) + exp(x)*cos(x), x)
Out[6]:
 x
ℯ ⋅sin(x)
In [7]: integrate(sin(x**2), (x, -oo, oo))
Out[7]:
√2⋅√π
─────
  2
IPYTHON
• The interactive Python shell!
• Object introspection
• Input history, persistent across sessions
• Extensible tab completion
• “Magic” commands (basically macros)
• Easily embeddable in other Python programs and GUIs
• Integrated access to the pdb debugger and the Python profiler
• Syntax highlighting
• real multi-line editing
• Provides a kernel for Jupyter
• …and much more!
JUPYTER
Project Jupyter is an open source project that offers a set of tools
for interactive and exploratory computing.
• Born out of the IPython project in 2014
• Jupyter provides a console and a notebook server for all kinds of languages
(the name Jupyter comes from Julia, Python and R)
• An easy way to explore and prototype
• Notebooks support Markdown and LaTeX-like input and rendering
• Allows sharing code and analysis results
• Extensible (slideshow plugins, JupyterLab, VIM binding, …)
JUPYTER CONSOLE
A terminal frontend for kernels which use the Jupyter protocol.
JUPYTER NOTEBOOK
• A Web-based application suitable for capturing the whole computation process:
• developing
• documenting
• and executing code
• as well as communicating the results.
• Two main components:
• a web application: a browser-based tool for interactive authoring of documents
which combine explanatory text, mathematics, computations and their rich
media output.
• notebook documents: a representation of all content visible in the web
application, including inputs and outputs of the computations, explanatory text,
mathematics, images, and rich media representations of objects.
JUPYTER NOTEBOOK
cells for code/markup input
rendered output
for text/images/tables etc.
JUPYTERLAB
• The next level of interacting with notebooks
• Extensible: terminal, text editor, image viewer, etc.
• Supports editing multiple notebooks at once
• Drag and drop support to arrange panes
JUPYTERHUB
• JupyterHub creates a multi-user Hub which
spawns, manages, and proxies multiple instances of
the single-user Jupyter notebook server
• A nice environment for teaching
• Great tool for collaborations
DOCOPT
creates beautiful command-line interfaces
by Vladimir Keleshev
https://github.com/docopt/docopt
ARGPARSE/OPTPARSE
Many classes and functions,
default values,
extensive documentation,
very hard to memorise
a basic setup.
DOCOPT
#!/usr/bin/env python
"""
Naval Fate.

Usage:
  naval_fate ship new <name>...
  naval_fate ship <name> move <x> <y> [--speed=<kn>]
  naval_fate ship shoot <x> <y>
  naval_fate mine (set|remove) <x> <y> [--moored|--drifting]
  naval_fate -h | --help
  naval_fate --version

Options:
  -h --help     Show this screen.
  --version     Show version.
  --speed=<kn>  Speed in knots [default: 10].
  --moored      Moored (anchored) mine.
  --drifting    Drifting mine.
"""
from docopt import docopt
arguments = docopt(__doc__, version='Naval Fate 2.0')
DOCOPT
naval_fate ship Guardian move 10 50 --speed=20
arguments =
{
"--drifting": false,
"--help": false,
"--moored": false,
"--speed": "20",
"--version": false,
"<name>": [
"Guardian"
],
"<x>": "10",
"<y>": "50",
"mine": false,
"move": true,
"new": false,
"remove": false,
"set": false,
"ship": true,
"shoot": false
}
ACKNOWLEDGEMENT
H2020-Astronomy ESFRI and Research Infrastructure Cluster
(Grant Agreement number: 653477)
And many thanks to Vincent, Jayesh, Nicolas and all the
others in the organising committee!