#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass amsbook
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default

\layout Chapter

Why Python?
\layout Section

Who is using Python?
\layout Standard

The use of Python in scientific computing is as wide as the field itself.
 A sampling of current work is provided here to indicate the breadth of
 disciplines represented and the scale of the problems addressed.
 The NASA Jet Propulsion Laboratory (JPL) uses Python as an interface language
 to 
\shape smallcaps 
FORTRAN
\shape default 
 and C++ libraries which form a suite of tools for plotting and visualization
 of spacecraft trajectory parameters in mission design and navigation.
 The Space Telescope Science Institute (STScI) uses Python in many phases
 of their pipeline: scheduling Hubble data acquisitions, managing volumes
 of data, and analyzing astronomical images 
\begin_inset LatexCommand \cite{BarrettEtal2004}

\end_inset 

.
 The National Oceanic and Atmospheric Administration (NOAA) uses Python for
 a wide variety of scientific computing tasks including simple scripts to
 parse and translate data files, prototyping of computational algorithms,
 writing user interfaces, web front ends, and the development of models
 
\begin_inset LatexCommand \cite{NOAA2000,BarkerHealy2001,ParkerHallBarker2001}

\end_inset 

.
 At the Fundamental Symmetries Lab at Princeton University, Python is used
 to efficiently analyze large data sets from an experiment that searches
 for CPT and Lorentz Violation using an atomic magnetometer 
\begin_inset LatexCommand \cite{Kornack2002,Kominis2003}

\end_inset 

.
 The Pediatric Clinical Electrophysiology unit at The University of Chicago,
 which collects approximately 100
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
,
\end_inset 

GB of data per week, uses Python to explore novel approaches to the localization
 and detection of epileptic seizures 
\begin_inset LatexCommand \cite{HunterEtal2005}

\end_inset 

.
 The Enthought Corporation is using Python to build customized applications
 for oil exploration for the petroleum industry.
 At the world's largest radio telescopes, e.g., Arecibo and the Green Bank
 Telescope, Python is used for data processing, modelling, and scripting
 high-performance computing jobs in order to search for and monitor binary
 and millisecond pulsars in terabyte datasets 
\begin_inset LatexCommand \cite{Ransometal2004a,Ransom2005}

\end_inset 

.
 At the Computational Genomics Laboratory at the Australian National University,
 researchers are using Python to build a toolkit which enables the specification
 of novel statistical models of sequence evolution on parallel hardware
 
\begin_inset LatexCommand \cite{Huttley2004,Butterfield2004}

\end_inset 

.
 Michel Sanner's group at the Scripps Research Institute uses Python extensively
 to build a suite of applications for molecular visualization and exploration
 of drug/molecule interactions using virtual reality and 3D printing technology
\begin_inset LatexCommand \cite{Sanner2005a,Sanner2005b}

\end_inset 

.
 Engineers at Google use Python in automation, control and tuning of their
 computational grid, and use 
\family typewriter 
SWIG
\family default 
-generated Python wrappers for their in-house C++ libraries in virtually all facets
 of their work 
\begin_inset LatexCommand \cite{Beazley1998,Stein2005}

\end_inset 

.
 Many other use cases -- ranging from animation at Industrial Light and
 Magic, to space shuttle mission control, to grid monitoring and control
 at Rackspace, to drug discovery, meteorology and air traffic control --
 are detailed in O'Reilly's two volumes of 
\emph on 
Python Success Stories
\emph default 
 
\begin_inset LatexCommand \cite{PySuccess2002,PySuccess2005}

\end_inset 

.
\layout Section

Advantages of Python
\layout Quotation


\shape italic 
The canonical, "Python is a great first language", elicited, "Python is
 a great last language!"
\shape default 
 -- Noah Spurrier 
\layout Standard

This quotation summarizes an important reason scientists migrate to Python
 as a programming language.
 As a 
\begin_inset Quotes eld
\end_inset 

great first language
\begin_inset Quotes erd
\end_inset 

, Python has a simple, expressive syntax that is accessible to the newcomer.
 
\begin_inset Quotes eld
\end_inset 

Python as executable pseudocode
\begin_inset Quotes erd
\end_inset 

 reflects the fact that Python syntax mirrors the intuitive pseudocode
 notation used in many journals 
\begin_inset LatexCommand \cite{Strous2001}

\end_inset 

.
 As a great first language, it does not impose a single programming paradigm
 on scientists, as Java does with object oriented programming, but rather
 allows one to code at many levels of sophistication, including BASIC/FORTRAN/Ma
tlab style procedural programming familiar to many scientists.
 Here is the canonical first program 
\begin_inset Quotes eld
\end_inset 

hello world
\begin_inset Quotes erd
\end_inset 

 in Python:
\layout Standard
\noindent 

\size small 

\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
begin{verbatim}
\newline 
# Python
\newline 
print 'hello world'
\newline 

\backslash 
end{verbatim}
\end_inset 


\size default 
  Contrast the simplicity of that program with the complexity of 
\begin_inset Quotes eld
\end_inset 

hello world
\begin_inset Quotes erd
\end_inset 

 in Java  
\size small 

\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
begin{verbatim}
\newline 
// java 
\newline 
class myfirstjavaprog
\newline 
{  
\newline 
  public static void main(String args[])
\newline 
  {
\newline 
    System.out.println("Hello World!");
\newline 
  }
\newline 
} 
\newline 

\backslash 
end{verbatim}
\end_inset 


\size default 
 
\layout Standard
\noindent 
In addition to being accessible to new programmers and scientists, Python
 is powerful enough to manage the complexity of large applications, supporting
 functional programming, object oriented programming, generic programming
 and metaprogramming.
 That Python supports these paradigms suggests why it is also a 
\begin_inset Quotes eld
\end_inset 

great last language
\begin_inset Quotes erd
\end_inset 

: as one increases their programming sophistication, the language scales
 naturally.
 By contrast, commercial languages like Matlab and IDL, which also support
 a simple syntax for simple programs, do not scale well to complex programming
 tasks.
\layout Standard

The built-in Python data-types and standard library provide a powerful platform
 in every distribution 
\begin_inset LatexCommand \cite{PyLibRef,Lundh2001}

\end_inset 

.
 The standard data types encompass regular and arbitrary length integers,
 complex numbers, floating point numbers, strings, lists, associative arrays,
 sets and more.
 In the standard library included with every Python distribution are modules
 for regular expressions, data encodings, multimedia formats, math, networking
 protocols, binary arrays and files, and much more.
 Thus one can open a file on a remote web server and work with it as easily
 as with a local file 
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
begin{verbatim}
\newline 
# this 3 line script downloads and prints the Yahoo web site
\newline 
from urllib import urlopen
\newline 
for line in urlopen('http://yahoo.com').readlines():
\newline 
   print line
\newline 

\backslash 
end{verbatim}
\end_inset 
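\layout Standard

The built-in data types listed above can be illustrated with a minimal sketch.
 The variable names and values below are illustrative assumptions, not taken
 from any particular application, and the parenthesized print calls are used
 so that the sketch also runs on current interpreters.

```python
# Illustrative values only; every name here is hypothetical.
big = 2 ** 100                          # integers grow to arbitrary length
z = complex(1, 2) * complex(3, -1)      # complex arithmetic is built in
readings = [0.4911, 0.5012, 0.7236]     # list of floating point numbers
units = {'time': 's', 'voltage': 'V'}   # associative array (dictionary)
unique = set([1, 2, 2, 3])              # set: duplicates are dropped

print(big)
print(z)
print(max(readings))
print(units['voltage'])
print(len(unique))
```

None of these types requires an import; they are available in every interpreter
 session.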


\layout Standard

Complementing these built-in features, Python is also readily extensible,
 giving it a wealth of libraries for scientific computing that have been
 in development for many years 
\begin_inset LatexCommand \cite{Dubois1996b,Dubois1996c}

\end_inset 

.
 
\family typewriter 
Numeric Python
\family default 
 supports large array manipulations, math, optimized linear algebra, efficient
 Fourier transforms and random numbers.
 
\family typewriter 
scipy
\family default 
 is a collection of Python wrappers of high performance FORTRAN code (e.g.,
 LAPACK, ODEPACK) for numerical analysis 
\begin_inset LatexCommand \cite{LAPACK}

\end_inset 

.
 
\family typewriter 
IPython
\family default 
 is a command shell in the style of Mathematica, Matlab and IDL for interactive programming,
 data exploration and visualization with support for command history, completion
, debugging and more.
 
\family typewriter 
Matplotlib
\family default 
 is a 2D graphics package for making publication quality graphics with a
 Matlab compatible syntax that is also embeddable in applications.
 
\family typewriter 
f2py
\family default 
, 
\family typewriter 
SWIG
\family default 
, 
\family typewriter 
weave
\family default 
, and 
\family typewriter 
pyrex
\family default 
 are tools for rapidly building Python interfaces to high performance compiled
 code. 
\family typewriter 
MayaVi
\family default 
 is a user friendly graphical user interface for 3D visualizations built
 on top of the state-of-the-art Visualization Toolkit 
\begin_inset LatexCommand \cite{SchroederEtal2002}

\end_inset 

.
 
\family typewriter 
pympi
\family default 
, 
\family typewriter 
pypar
\family default 
, 
\family typewriter 
pyro
\family default 
, 
\family typewriter 
scipy.cow
\family default 
, and 
\family typewriter 
pyxg
\family default 
 are tools for building clusters and performing parallel, remote and distributed
 computations.
 This is a sampling of general purpose libraries for scientific computing
 in Python, and does not begin to address the many high quality, domain
 specific libraries that are also available.
\layout Standard

All of the infrastructure described above is open source software that is
 freely distributable for academic and commercial use.
 In both the educational and scientific arenas, this is a critical point.
 For education, this platform provides students with tools that they can
 take with them outside the classroom to their homes and jobs and careers
 beyond.
 By contrast, the use of commercial tools such as Matlab and IDL limits access
 to major institutions.
 For scientists, the use of open source tools is consistent with the scientific
 principle that all of the steps in an analysis or simulation should be
 open for review, and with the principle of reproducible research 
\begin_inset LatexCommand \cite{BuckheitDonoho1995}

\end_inset 

.
\layout Section

Mixed Language Programming
\layout Standard

The programming languages of each generation evolve in part to fix the problems
 of those that came before 
\begin_inset LatexCommand \cite{BerginEtal1996}

\end_inset 

.
 
\shape smallcaps 
FORTRAN
\shape default 
, the original high level language of scientific computing 
\begin_inset LatexCommand \cite{Rosen1967}

\end_inset 

, was designed to allow scientists to express code at a level closer to
 the language of the problem domain.
 
\shape smallcaps 
ALGOL
\shape default 
 and its successor Pascal, widely used in education in the 1970s, were designed
 to alleviate some of the perceived problems with 
\shape smallcaps 
FORTRAN
\shape default 
 and to create a language with a simpler and more expressive syntax 
\begin_inset LatexCommand \cite{Backus1963,Naur1963}

\end_inset 

.
 Object oriented programming languages evolved to allow a closer correspondence
 between the code and the physical system it models 
\begin_inset LatexCommand \cite{GoldbergRobson1989}

\end_inset 

, and C++ provided a relatively high performance object oriented implementation
 compatible with the popular C programming language 
\begin_inset LatexCommand \cite{Stroustrup1994,Stroustrup2000}

\end_inset 

.
 But implementing object orientation efficiently requires that programmers stay
 close to the machine, managing memory and pointers, and this created a
 lot of complexity in programs while limiting portability.
 Interpreted languages such as Tcl, Perl, Python, and Java evolved to manage
 some of the low-level and platform specific details, making programs easier
 to write and maintain, but with a performance penalty 
\begin_inset LatexCommand \cite{Ousterhout1998,ArnoldEtal2005}

\end_inset 

.
 For many scientists, however, pure object oriented systems like Java are
 unfamiliar, and languages like Matlab and Python provide the safety, portabilit
y and ease of use of an interpreted language without imposing an object
 oriented approach to coding 
\begin_inset LatexCommand \cite{VanRossumDrake2003,HanselmanLittlefield2004}

\end_inset 

.
\layout Standard

The result of these several decades is that there are many platforms for
 scientific computing in use today.
 The number of man hours invested in numerical methods in 
\shape smallcaps 
FORTRAN
\shape default 
, visualization libraries in C++, bioinformatics toolkits in Perl, object
 frameworks in Java, domain specific toolkits in Matlab, etc
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
dots 
\end_inset 

requires an approach that integrates this work.
 Python is the language that provides maximal integration with other languages,
 with tools for transparently and semi-automatically interfacing with 
\shape smallcaps 
FORTRAN
\shape default 
, C, C++, Java, .NET, Matlab, and Mathematica code 
\begin_inset LatexCommand \cite{Hugunin1997,Beazley1998}

\end_inset 

.
 In our view, the ability to work seamlessly with code from many languages
 is the present and the future of scientific computing, and Python effectively
 integrates these languages into a single environment.
\layout Section

Getting started
\layout Standard

We'll get started with Python by introducing arrays and plotting, working
 with a simple ASCII text file 
\family typewriter 
mydata.dat
\family default 
 of two columns; the first column contains the times at which some measurement
 was acquired, and the second column contains the voltages sampled at those times.
 The file looks like
\layout LyX-Code

0.0000 0.4911
\layout LyX-Code

0.0500 0.5012
\layout LyX-Code

0.1000 0.7236
\layout LyX-Code

0.1500 1.1756
\layout LyX-Code

...
 and so on
\layout Standard
\noindent 
While it would be easy enough to process this file by writing a Python function
 to do it, there is no need to, since the matplotlib pylab module has a
 Matlab-compatible 
\family typewriter 
load
\family default 
 function for loading ASCII array data.
 To complete these exercises, you should have ipython and matplotlib installed,
 and start ipython in pylab mode with 
\layout LyX-Code

> ipython -pylab
\layout Standard


\begin_inset ERT
status Open

\layout Standard

\backslash 
lstinputlisting[caption={Loading an ASCII text file and plotting the columns}]{snippets/load_data.ipy}
\end_inset 
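\layout Standard

For comparison, the hand-written alternative mentioned earlier might look like
 the following minimal sketch.
 The function name load_two_columns is a hypothetical helper, not part of
 matplotlib, and the sample rows are the four lines of mydata.dat shown above;
 the sketch assumes whitespace-separated columns and silently skips blank or
 malformed lines.

```python
# Hypothetical helper (not part of matplotlib): parse two whitespace-separated
# columns of numbers into two lists of floats.
def load_two_columns(lines):
    times, volts = [], []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        times.append(float(fields[0]))
        volts.append(float(fields[1]))
    return times, volts

# The first four rows of mydata.dat, embedded here so the sketch is
# self-contained.
sample = """
0.0000 0.4911
0.0500 0.5012
0.1000 0.7236
0.1500 1.1756
"""

times, volts = load_two_columns(sample.splitlines())
print(times)
print(volts)
```

With a file on disk, the same function can be given an open file object, for
 example load_two_columns(open('mydata.dat')), since iterating over a file
 yields its lines.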


\the_end