#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass amsbook
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default
\layout Chapter
Why Python?
\layout Section
Who is using Python?
\layout Standard
The use of Python in scientific computing is as wide as the field itself.
A sampling of current work is provided here to indicate the breadth of
disciplines represented and the scale of the problems addressed.
The NASA Jet Propulsion Laboratory (JPL) uses Python as an interface language
to
\shape smallcaps
FORTRAN
\shape default
and C++ libraries which form a suite of tools for plotting and visualization
of spacecraft trajectory parameters in mission design and navigation.
The Space Telescope Science Institute (STScI) uses Python in many phases
of their pipeline: scheduling Hubble data acquisitions, managing volumes
of data, and analyzing astronomical images
\begin_inset LatexCommand \cite{BarrettEtal2004}
\end_inset
.
The National Oceanic and Atmospheric Administration (NOAA) uses Python for 
a wide variety of scientific computing tasks including simple scripts to
parse and translate data files, prototyping of computational algorithms,
writing user interfaces, web front ends, and the development of models
\begin_inset LatexCommand \cite{NOAA2000,BarkerHealy2001,ParkerHallBarker2001}
\end_inset
.
At the Fundamental Symmetries Lab at Princeton University, Python is used
to efficiently analyze large data sets from an experiment that searches
for CPT and Lorentz Violation using an atomic magnetometer
\begin_inset LatexCommand \cite{Kornack2002,Kominis2003}
\end_inset
.
The Pediatric Clinical Electrophysiology unit at The University of Chicago,
which collects approximately 100
\begin_inset ERT
status Collapsed
\layout Standard
\backslash
,
\end_inset
GB of data per week, uses Python to explore novel approaches to the localization
and detection of epileptic seizures
\begin_inset LatexCommand \cite{HunterEtal2005}
\end_inset
.
The Enthought Corporation uses Python to build customized applications 
for oil exploration in the petroleum industry.
At the world's largest radio telescopes, e.g., Arecibo and the Green Bank
Telescope, Python is used for data processing, modelling, and scripting
high-performance computing jobs in order to search for and monitor binary
and millisecond pulsars in terabyte datasets
\begin_inset LatexCommand \cite{Ransometal2004a,Ransom2005}
\end_inset
.
At the Computational Genomics Laboratory at the Australian National University,
researchers are using Python to build a toolkit which enables the specification
of novel statistical models of sequence evolution on parallel hardware
\begin_inset LatexCommand \cite{Huttley2004,Butterfield2004}
\end_inset
.
Michel Sanner's group at the Scripps Research Institute uses Python extensively
to build a suite of applications for molecular visualization and exploration
of drug/molecule interactions using virtual reality and 3D printing technology
\begin_inset LatexCommand \cite{Sanner2005a,Sanner2005b}
\end_inset
.
Engineers at Google use Python in automation, control and tuning of their
computational grid, and use
\family typewriter
SWIG
\family default
generated Python wrappers of their in-house C++ libraries in virtually all facets 
of their work
\begin_inset LatexCommand \cite{Beazley1998,Stein2005}
\end_inset
.
Many other use cases -- ranging from animation at Industrial Light and
Magic, to space shuttle mission control, to grid monitoring and control
at Rackspace, to drug discovery, meteorology and air traffic control --
are detailed in O'Reilly's two volumes of
\emph on
Python Success Stories
\emph default
\begin_inset LatexCommand \cite{PySuccess2002,PySuccess2005}
\end_inset
.
\layout Section
Advantages of Python
\layout Quotation
\shape italic
The canonical, "Python is a great first language", elicited, "Python is
a great last language!"
\shape default
-- Noah Spurrier
\layout Standard
This quotation summarizes an important reason scientists migrate to Python
as a programming language.
As a
\begin_inset Quotes eld
\end_inset
great first language
\begin_inset Quotes erd
\end_inset
Python has a simple, expressive syntax that is accessible to the newcomer.
\begin_inset Quotes eld
\end_inset
Python as executable pseudocode
\begin_inset Quotes erd
\end_inset
reflects the fact that Python syntax mirrors the obvious and intuitive
pseudo-code syntax used in many journals
\begin_inset LatexCommand \cite{Strous2001}
\end_inset
.
As a great first language, it does not impose a single programming paradigm
on scientists, as Java does with object oriented programming, but rather
allows one to code at many levels of sophistication, including BASIC/FORTRAN/Matlab 
style procedural programming familiar to many scientists.
Here is the canonical first program
\begin_inset Quotes eld
\end_inset
hello world
\begin_inset Quotes erd
\end_inset
in Python:
\layout Standard
\noindent
\size small
\begin_inset ERT
status Collapsed
\layout Standard
\backslash
begin{verbatim}
\newline
# Python
\newline
print 'hello world'
\newline
\backslash
end{verbatim}
\end_inset
\size default
Contrast the simplicity of that program with the complexity of 
\begin_inset Quotes eld
\end_inset
hello world
\begin_inset Quotes erd
\end_inset
in Java:
\size small
\begin_inset ERT
status Collapsed
\layout Standard
\backslash
begin{verbatim}
\newline
// java
\newline
class myfirstjavaprog
\newline
{
\newline
    public static void main(String args[])
\newline
    {
\newline
        System.out.println("Hello World!");
\newline
    }
\newline
}
\newline
\backslash
end{verbatim}
\end_inset
\size default
\layout Standard
\noindent
In addition to being accessible to new programmers and scientists, Python
is powerful enough to manage the complexity of large applications, supporting
functional programming, object oriented programming, generic programming 
and metaprogramming.
That Python supports these paradigms suggests why it is also a
\begin_inset Quotes eld
\end_inset
great last language
\begin_inset Quotes erd
\end_inset
: as one's programming sophistication increases, the language scales 
naturally.
By contrast, commercial languages like Matlab and IDL, which also support 
a simple syntax for simple programs, do not scale well to complex programming 
tasks.
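\layout Standard
As a small illustration of this range of styles, the following sketch computes 
the same sum of squares procedurally, functionally, and with a class.
It is only a sketch: the names data and SquareSummer are hypothetical and 
chosen purely for this example.
\begin_inset ERT
status Collapsed
\layout Standard
\backslash
begin{verbatim}
\newline
# illustrative input; all names here are hypothetical
\newline
data = [1, 2, 3, 4]
\newline
# procedural style: an explicit loop and accumulator
\newline
total = 0
\newline
for x in data:
\newline
    total += x * x
\newline
# functional style: map a function over the data, then reduce with sum
\newline
total_functional = sum(map(lambda x: x * x, data))
\newline
# object oriented style: data and behavior bundled in a class
\newline
class SquareSummer:
\newline
    def __init__(self, values):
\newline
        self.values = values
\newline
    def total(self):
\newline
        return sum([x * x for x in self.values])
\newline
total_oo = SquareSummer(data).total()
\newline
\backslash
end{verbatim}
\end_inset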
\layout Standard
The built-in Python data-types and standard library provide a powerful platform
in every distribution
\begin_inset LatexCommand \cite{PyLibRef,Lundh2001}
\end_inset
.
The standard data types encompass regular and arbitrary length integers,
complex numbers, floating point numbers, strings, lists, associative arrays,
sets and more.
In the standard library included with every Python distribution are modules
for regular expressions, data encodings, multimedia formats, math, networking
protocols, binary arrays and files, and much more.
Thus one can open a file on a remote web server and work with it as easily
as with a local file
\begin_inset ERT
status Collapsed
\layout Standard
\backslash
begin{verbatim}
\newline
# this 3 line script downloads and prints the yahoo web site
\newline
from urllib import urlopen
\newline
for line in urlopen('http://yahoo.com').readlines():
\newline
    print line
\newline
\backslash
end{verbatim}
\end_inset
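\layout Standard
A minimal sketch of a few of the built-in data types and one standard library 
module mentioned above follows.
The variable names and sample values are illustrative assumptions, not part 
of any library.
\begin_inset ERT
status Collapsed
\layout Standard
\backslash
begin{verbatim}
\newline
import re
\newline
# arbitrary length integers: 2 to the power 100 does not overflow
\newline
big = 2 ** 100
\newline
# complex numbers are built in; this product is (5+5j)
\newline
z = (1 + 2j) * (3 - 1j)
\newline
# an associative array (dictionary) mapping a label to a value
\newline
samples = {'0.0000': 0.4911, '0.0500': 0.5012}
\newline
# sets support intersection with the & operator
\newline
shared = set([1, 2, 3]) & set([2, 3, 4])
\newline
# regular expressions: extract the numeric fields from a line of text
\newline
fields = re.findall('[0-9.]+', '0.0500 0.5012')
\newline
\backslash
end{verbatim}
\end_inset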
\layout Standard
Complementing these built-in features, Python is also readily extensible,
giving it a wealth of libraries for scientific computing that have been
in development for many years
\begin_inset LatexCommand \cite{Dubois1996b,Dubois1996c}
\end_inset
.
\family typewriter
Numeric Python
\family default
supports large array manipulations, math, optimized linear algebra, efficient
Fourier transforms and random numbers.
\family typewriter
scipy
\family default
is a collection of Python wrappers of high performance FORTRAN code (e.g., 
LAPACK, ODEPACK) for numerical analysis
\begin_inset LatexCommand \cite{LAPACK}
\end_inset
.
\family typewriter
IPython
\family default
is a command shell in the style of Mathematica, Matlab and IDL for interactive 
programming, data exploration and visualization, with support for command 
history, completion, debugging and more.
\family typewriter
Matplotlib
\family default
is a 2D graphics package for making publication quality graphics with a
Matlab compatible syntax that is also embeddable in applications.
\family typewriter
f2py
\family default
,
\family typewriter
SWIG
\family default
,
\family typewriter
weave
\family default
, and
\family typewriter
pyrex
\family default
are tools for rapidly building Python interfaces to high performance compiled
code.
\family typewriter
MayaVi
\family default
is a user friendly graphical user interface for 3D visualizations built
on top of the state-of-the-art Visualization Toolkit
\begin_inset LatexCommand \cite{SchroederEtal2002}
\end_inset
.
\family typewriter
pympi
\family default
,
\family typewriter
pypar
\family default
,
\family typewriter
pyro
\family default
,
\family typewriter
scipy.cow
\family default
, and
\family typewriter
pyxg
\family default
are tools for building clusters and for performing parallel, remote and 
distributed computations.
This is a sampling of general purpose libraries for scientific computing
in Python, and does not begin to address the many high quality, domain
specific libraries that are also available.
\layout Standard
All of the infrastructure described above is open source software that is
freely distributable for academic and commercial use.
In both the educational and scientific arenas, this is a critical point.
For education, this platform provides students with tools that they can
take with them outside the classroom to their homes and jobs and careers
beyond.
By contrast, the use of commercial tools such as Matlab and IDL limits access 
to major institutions.
For scientists, the use of open source tools is consistent with the scientific
principle that all of the steps in an analysis or simulation should be
open for review, and with the principle of reproducible research
\begin_inset LatexCommand \cite{BuckheitDonoho1995}
\end_inset
.
\layout Section
Mixed Language Programming
\layout Standard
The programming languages of each generation evolve in part to fix the problems
of those that came before
\begin_inset LatexCommand \cite{BerginEtal1996}
\end_inset
.
\shape smallcaps
FORTRAN
\shape default
, the original high level language of scientific computing
\begin_inset LatexCommand \cite{Rosen1967}
\end_inset
, was designed to allow scientists to express code at a level closer to
the language of the problem domain.
\shape smallcaps
ALGOL
\shape default
and its successor Pascal, widely used in education in the 1970s, were designed
to alleviate some of the perceived problems with
\shape smallcaps
FORTRAN
\shape default
and to create a language with a simpler and more expressive syntax
\begin_inset LatexCommand \cite{Backus1963,Naur1963}
\end_inset
.
Object oriented programming languages evolved to allow a closer correspondence
between the code and the physical system it models
\begin_inset LatexCommand \cite{GoldbergRobson1989}
\end_inset
, and C++ provided a relatively high performance object oriented implementation 
compatible with the popular C programming language
\begin_inset LatexCommand \cite{Stroustrup1994,Stroustrup2000}
\end_inset
.
But implementing object orientation efficiently requires that programmers stay 
close to the machine, managing memory and pointers, and this created a
lot of complexity in programs while limiting portability.
Interpreted languages such as Tcl, Perl, Python, and Java evolved to manage
some of the low-level and platform specific details, making programs easier
to write and maintain, but with a performance penalty
\begin_inset LatexCommand \cite{Ousterhout1998,ArnoldEtal2005}
\end_inset
.
For many scientists, however, pure object oriented systems like Java are
unfamiliar, and languages like Matlab and Python provide the safety, portability 
and ease of use of an interpreted language without imposing an object 
oriented approach to coding
\begin_inset LatexCommand \cite{VanRossumDrake2003,HanselmanLittlefield2004}
\end_inset
.
\layout Standard
The result of these several decades is that there are many platforms for
scientific computing in use today.
The number of man hours invested in numerical methods in
\shape smallcaps
FORTRAN
\shape default
, visualization libraries in C++, bioinformatics toolkits in Perl, object
frameworks in Java, domain specific toolkits in Matlab, etc
\begin_inset ERT
status Collapsed
\layout Standard
\backslash
dots
\end_inset
requires an approach that integrates this work.
Python is the language that provides maximal integration with other languages,
with tools for transparently and semi-automatically interfacing with
\shape smallcaps
FORTRAN
\shape default
, C, C++, Java, .NET, Matlab, and Mathematica code
\begin_inset LatexCommand \cite{Hugunin1997,Beazley1998}
\end_inset
.
In our view, the ability to work seamlessly with code from many languages
is the present and the future of scientific computing, and Python effectively
integrates these languages into a single environment.
\layout Section
Getting started
\layout Standard
We'll get started with Python by introducing arrays and plotting, working 
with a simple ASCII text file 
\family typewriter
mydata.dat
\family default
 of two columns: the first column contains the times at which some measurement 
was acquired, and the second column contains the voltages sampled at those times.
The file looks like
\layout LyX-Code
0.0000 0.4911
\layout LyX-Code
0.0500 0.5012
\layout LyX-Code
0.1000 0.7236
\layout LyX-Code
0.1500 1.1756
\layout LyX-Code
...
and so on
\layout Standard
\noindent
While it would be easy enough to process this file by writing a Python function 
to do it, there is no need to, since the matplotlib pylab module has a 
Matlab-compatible 
\family typewriter
load
\family default
function for loading ASCII array data.
To complete these exercises, you should have ipython and matplotlib installed,
and start ipython in pylab mode with
\layout LyX-Code
> ipython -pylab
\layout Standard
\begin_inset ERT
status Open
\layout Standard
\backslash
lstinputlisting[caption={Loading an ASCII text file and plotting the columns}]{snippets/load_data.ipy}
\end_inset
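\layout Standard
For comparison, the hand-written alternative mentioned above might look 
like the following minimal sketch.
The function name load_two_columns is hypothetical and is not part of matplotlib 
or pylab; the sample rows are the first four lines of the data file shown 
earlier.
\begin_inset ERT
status Collapsed
\layout Standard
\backslash
begin{verbatim}
\newline
# hypothetical helper: parse two whitespace separated numeric columns
\newline
def load_two_columns(lines):
\newline
    times = []
\newline
    volts = []
\newline
    for line in lines:
\newline
        fields = line.split()
\newline
        if len(fields) != 2:
\newline
            continue  # skip blank or malformed lines
\newline
        times.append(float(fields[0]))
\newline
        volts.append(float(fields[1]))
\newline
    return times, volts
\newline
# any iterable of lines works, for example open('mydata.dat')
\newline
sample = ['0.0000 0.4911', '0.0500 0.5012', '0.1000 0.7236', '0.1500 1.1756']
\newline
t, v = load_two_columns(sample)
\newline
\backslash
end{verbatim}
\end_inset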
\the_end