Python For Sciences and Engineering
Dr Edward Schofield
A*STAR / Singapore Computational Sciences Club Seminar
June 14, 2011
1. Why Python?
Introducing Python
What is Python?
interpreted; strongly but dynamically typed; object-oriented; intuitive, readable; open source, free; batteries included
Batteries included
Python's standard library is: very large, well supported, well documented
Native Python code executes 10x more slowly than C and FORTRAN
Date         Cost per GFLOPS (US $)   Technology
1961         $1.1 trillion            17 million IBM 1620s
1984         $15,000,000              Cray X-MP
1997         $30,000                  Two 16-CPU clusters of Pentiums
2000, Apr    $1,000                   Bunyip Beowulf cluster
2003, Aug    $82                      KASY0
2007, Mar    $0.42                    Ambric AM2045
2009, Sep    $0.13                    ATI Radeon R800
Source: Wikipedia: FLOPS
Efficiency
When FORTRAN was invented, computer time was more expensive than programmer time. In the 1980s and 1990s that reversed.
Efficient programming
What if ...
... you now need to reach Sydney?
Advantages of Python
Easy to write; easy to maintain; great standard libraries; thriving ecosystem of third-party packages; open source
Question
What is the date 177 days from now?
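The standard library's datetime module answers this in a few lines. The following is a minimal sketch; the helper name `date_after` is an assumption introduced here for illustration and does not come from the talk.

```python
from datetime import date, timedelta


def date_after(days, start=None):
    """Return the calendar date `days` days after `start` (default: today)."""
    if start is None:
        start = date.today()
    return start + timedelta(days=days)


# 177 days after the date of this seminar (June 14, 2011)
print(date_after(177, start=date(2011, 6, 14)))

# 177 days from whenever the script is run
print(date_after(177))
```

Date arithmetic with `timedelta` handles month lengths and leap years, so no manual calendar logic is needed.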
C C++ C#
A different language for each task? A language you know? A language others in your team are using: support and help?
                              Python
Interpreted                   Yes
Powerful data input/output    Yes
Great plotting                Yes
General-purpose language      Powerful
Cost                          Free
Open source                   Yes
                              Python
Powerful                      Yes
Portable                      Yes
Standard libraries            Vast
Easy to write and maintain    Yes
Easy to learn                 Yes
                                                                       Python    Other language
Fast to write                                                          Yes       No
Good for embedded systems, device drivers and operating systems        No        Yes
Good for most other high-level tasks                                   Yes       No
Standard library                                                       Vast      Limited
                                     Python
Powerful, well-designed language     Yes
Standard libraries                   Vast
Easy to learn                        Yes
Code brevity                         Short
Easy to write and maintain           Yes
Open source
Python is open source software. Benefits: no vendor lock-in; cross-platform; insurance against bugs in the platform; free.
A common sentiment: "We achieve immediate functioning code so much faster in Python than in any other language that it's staggering." - Robin Friedrich, Senior Project Engineer
Metaslash, Inc: 1999 to 2001
Mission-critical system for air-traffic control
Replicated, fault-tolerant data storage
See http://www.python.org/about/success/ for lots more case studies and success stories
Small beginnings Piecemeal growth, quirky interfaces ... Large, cumbersome systems
NumPy
An n-dimensional array/matrix package
Centre of Python's numerical computing ecosystem
The most fundamental tool for numerical computing in Python Fast multi-dimensional array capability
A rich set of numerical data types; nearly 400 functions and methods on arrays: type conversions, mathematical, logical
NumPy's features
Fast. Written in C with BLAS/LAPACK hooks.
Rich set of data types
Linear algebra: matrix inversion, decompositions, ...
Discrete Fourier transforms
Random number generation
Trig, hypergeometric functions, etc.
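A few of these features can be sketched in one short script. The matrix, signal and seed below are arbitrary values chosen for illustration, not data from the talk, and `numpy.random.default_rng` assumes a reasonably recent NumPy (1.17 or later).

```python
import numpy

# Linear algebra: solve A x = b and invert A
A = numpy.array([[4.0, 1.0], [2.0, 3.0]])
b = numpy.array([1.0, 2.0])
x = numpy.linalg.solve(A, b)
A_inv = numpy.linalg.inv(A)

# Discrete Fourier transform of a sine wave with 5 cycles in the window
n = 64
t = numpy.arange(n) / n
signal = numpy.sin(2 * numpy.pi * 5 * t)
spectrum = numpy.abs(numpy.fft.rfft(signal))
peak_bin = int(numpy.argmax(spectrum))   # frequency bin with the most energy

# Random number generation with a fixed seed for repeatability
rng = numpy.random.default_rng(0)
samples = rng.normal(size=1000)

print(x)
print(peak_bin)
print(samples.mean(), samples.std())
```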
Universal functions
NumPy defines 'ufuncs' that operate on entire arrays and other sequences (hence 'universal'). Example: sin()
>>> import numpy
>>> a = numpy.array([20, 30, 40, 50])
>>> c = 10 * numpy.sin(a)
>>> c
array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
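To illustrate the "other sequences" part, the sketch below passes a plain Python list to the same ufunc and compares the result with an explicit element-by-element loop. The variable names are chosen here for illustration.

```python
import math

import numpy

values = [20, 30, 40, 50]   # a plain Python list, not an array

# The ufunc accepts the list directly and returns a NumPy array
vectorised = 10 * numpy.sin(values)

# Equivalent element-by-element loop in pure Python
looped = [10 * math.sin(v) for v in values]

print(vectorised)
print(numpy.allclose(vectorised, looped))
```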
Array slicing
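A minimal sketch of slicing on a small 2-D array; the array `a` and the variable names are chosen here for illustration.

```python
import numpy

a = numpy.arange(12).reshape(3, 4)   # 3 rows, 4 columns, values 0..11

first_row = a[0, :]        # all columns of row 0
last_column = a[:, -1]     # last element of every row
block = a[1:, 1:3]         # rows 1-2, columns 1-2
every_other = a[:, ::2]    # columns 0 and 2

# Basic slices are views: writing through one changes the original array
block[0, 0] = 100
print(a[1, 1])
```

Because slices are views rather than copies, they cost no extra memory, but an explicit `.copy()` is needed when the original must stay untouched.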
Fancy indexing
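A minimal sketch of fancy indexing, that is, indexing with a list of positions or with a boolean mask. The array and names are chosen here for illustration.

```python
import numpy

a = numpy.array([10, 20, 30, 40, 50])

# Index with a list of positions (any order, repeats allowed)
picked = a[[4, 0, 0, 2]]

# Index with a boolean mask built from a comparison
mask = a > 25
large = a[mask]

# Unlike basic slices, fancy indexing returns a copy
picked[0] = -1
print(picked, large, a)
```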
What is SciPy?
Back-end: computational work
Front-end: input / output, visualization, GUIs
Dozens of great scientific packages exist
NumPy: numerical / array module
Matplotlib: great 2D and 3D plotting library
IPython: nice interactive Python shell
SciPy: set of scientific libraries: sparse matrices, signal processing, ...
RPy: integration with the R statistical environment
Cython: C language extensions
Mayavi: 3D graphics, volumetric rendering
Nitime, Nipype: Python tools for neuroimaging
SymPy: symbolic mathematics library
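import numpy
import pylab
from scipy.optimize import leastsq

# myfunc, err, x, y, true_params, xmin, xmax and n are assumed to be defined elsewhere (not shown here)
v0 = [3., 1., 4.]  # initial parameter estimate

# Fit the parameters by least squares
v, success = leastsq(err, v0, args=(x, y), maxfev=10000)
print('Estimated parameters: ', v)
print('True parameters: ', true_params)

# Plot the data (red dots) and the fitted curve on a finer grid
X = numpy.linspace(xmin, xmax, 5 * n)
pylab.plot(x, y, 'ro', X, myfunc(v, X))
pylab.show()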
Sparse matrices
Sparse matrices are mostly zeros. They can be symmetric or asymmetric. Sparsity patterns vary: block sparse, band matrices, ... They can be huge! Only non-zeros are stored.
SciPy supports seven sparse storage schemes ... and sparse solvers in Fortran.
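As a rough sketch of what this looks like in practice, the example below builds a band (tridiagonal) matrix in CSR format, one of the storage schemes, and solves a linear system with `scipy.sparse.linalg.spsolve`. The matrix, its size and the right-hand side are arbitrary choices made here for illustration.

```python
import numpy
from scipy import sparse
from scipy.sparse.linalg import spsolve

n = 1000

# Tridiagonal band matrix: 2 on the diagonal, -1 on the two neighbouring diagonals
A = sparse.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

# Only the non-zeros are stored: 3n - 2 values instead of n * n
print(A.nnz, "stored non-zeros out of", n * n, "entries")

# Solve A x = b with a sparse direct solver
b = numpy.ones(n)
x = spsolve(A, b)
print(numpy.allclose(A @ x, b))
```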
Matplotlib
Great plotting package in Python
MATLAB-like syntax
Great rendering: anti-aliasing etc.
Many backends: Cairo, GTK, Cocoa, PDF
Flexible output: to EPS, PS, PDF, TIFF, PNG, ...
3. Scaling
HPC
High-performance computing
Aspects of HPC
Supercomputers; parallel programming; caches, shared memory; code porting; distributed clusters / grids; scripting; job control; specialized hardware
Hierarchical data
Databases without the relational baggage
Applications of PyTables
aeronautics, drug discovery, financial analysis, climate prediction, telecommunications, data mining, statistical analysis, etc.
PyTables performance
OPSI indexing engine speed: Querying 10 billion rows can take hundredths of a second! Target use-case: mostly read-only or append-only data
Important principles
1. "Premature optimization is the root of all evil". Don't write cryptic code just to make it more efficient!
2. 1-5% of the code takes up the vast majority of the computing time! ... and it might not be the 1-5% that you think!
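One way to find out which 1-5% actually matters is to measure before optimizing. The following is a minimal sketch using the standard library's cProfile and pstats modules; the functions `slow_sum_of_squares`, `cheap_setup` and `main` are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately slow hotspot: a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    """Cheap step that might wrongly be suspected as the bottleneck."""
    return list(range(100))


def main():
    cheap_setup()
    return slow_sum_of_squares(200_000)


# Profile one run of main()
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Report the functions with the largest cumulative time first
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists time per function, so optimization effort can go to the measured hotspot rather than to a guess.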