#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass amsbook
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default
\layout Chapter
A whirlwind tour of python and the standard library
\layout Standard
This is a quick-and-dirty introduction to the python language for the impatient
scientist.
There are many top notch, comprehensive introductions and tutorials for
python.
For absolute beginners, there is the
\shape italic
Python Beginner's Guide
\shape default
.
\begin_inset Foot
collapsed true
\layout Standard
http://www.python.org/moin/BeginnersGuide
\end_inset
The official
\shape italic
Python Tutorial
\shape default
can be read online
\begin_inset Foot
collapsed true
\layout Standard
http://docs.python.org/tut/tut.html
\end_inset
or downloaded
\begin_inset Foot
collapsed true
\layout Standard
http://docs.python.org/download.html
\end_inset
in a variety of formats.
There are over 100 python tutorials collected online.
\begin_inset Foot
collapsed true
\layout Standard
http://www.awaretek.com/tutorials.html
\end_inset
\layout Standard
There are also many excellent books.
Targeting newbies is Mark Pilgrim's
\shape italic
Dive into Python
\shape default
which is available in print and for free online
\begin_inset Foot
collapsed true
\layout Standard
http://diveintopython.org/toc/index.html
\end_inset
, though for absolute newbies even this may be too hard
\begin_inset LatexCommand \cite{Dive}
\end_inset
.
For experienced programmers, David Beazley's
\shape italic
Python Essential Reference
\shape default
is an excellent introduction to python, but is a bit dated since it only
covers python2.1
\begin_inset LatexCommand \cite{Beasley}
\end_inset
.
Likewise Alex Martelli's
\shape italic
Python in a Nutshell
\shape default
is highly regarded and a bit more current -- a 2nd edition is in the works
\begin_inset LatexCommand \cite{Nutshell}
\end_inset
.
And
\shape italic
The Python Cookbook
\shape default
is an extremely useful collection of python idioms, tips and tricks
\begin_inset LatexCommand \cite{Cookbook}
\end_inset
.
\layout Standard
But the typical scientist I encounter wants to solve a specific problem,
eg, to make a certain kind of graph, to numerically integrate an equation,
or to fit some data to a parametric model, and doesn't have the time or
interest to read several books or tutorials to get what they want.
This guide is for them: a short overview of the language to help them get
to what they want as quickly as possible.
\layout Section
Hello Python
\layout Standard
Python is a dynamically typed, object oriented, interpreted language.
Interpreted means that your program interacts with the python interpreter,
similar to Matlab, Perl, Tcl and Java, and unlike FORTRAN, C, or C++ which
are compiled.
So let's fire up the python interpreter and get started.
I'm not going to cover installing python -- it's standard on most linux
boxes and for windows there is a friendly GUI installer.
To run the python interpreter, on windows, you can click
\family typewriter
Start->All Programs->Python 2.4->Python (command line)
\family default
or better yet, install
\family typewriter
ipython
\family default
, a python shell on steroids, and use that.
On linux / unix systems, you just need to type
\family typewriter
python
\family default
or
\family typewriter
ipython
\family default
at the command line.
The
\family typewriter
>>>
\family default
is the default python shell prompt, so don't type it in the examples below
\layout LyX-Code
>>> print 'hello world'
\layout LyX-Code
hello world
\layout LyX-Code
\layout Standard
As this example shows,
\shape italic
hello world
\shape default
in python is pretty easy -- one common phrase you hear in the python community
is that
\begin_inset Quotes eld
\end_inset
it fits your brain
\begin_inset Quotes erd
\end_inset
-- the basic idea is that coding in python feels natural.
Compare python's version with
\shape italic
hello world
\shape default
in C++
\layout LyX-Code
// C++
\layout LyX-Code
#include <iostream>
\layout LyX-Code
int main ()
\layout LyX-Code
{
\layout LyX-Code
std::cout << "Hello World" << std::endl;
\layout LyX-Code
return 0;
\layout LyX-Code
}
\layout Section
Python is a calculator
\layout Standard
Aside from my daughter's solar powered cash-register calculator, Python
is the only calculator I use.
From the python shell, you can type arbitrary arithmetic expressions.
\layout LyX-Code
>>> 2+2
\layout LyX-Code
4
\layout LyX-Code
>>> 2**10
\layout LyX-Code
1024
\layout LyX-Code
>>> 10/5
\layout LyX-Code
2
\layout LyX-Code
>>> 2+(24.3 + .9)/.24
\layout LyX-Code
107.0
\layout LyX-Code
>>> 2/3
\layout LyX-Code
0
\layout Standard
The last line is a standard newbie gotcha -- if both the left and right
operands are integers, python returns an integer.
To do floating point division, make sure at least one of the numbers is
a float
\layout LyX-Code
>>> 2.0/3
\layout LyX-Code
0.66666666666666663
\layout Standard
The distinction between integer and floating point division is a common
source of frustration among newbies and is slated for destruction in the
mythical Python 3000.
\begin_inset Foot
collapsed true
\layout Standard
Python 3000 is a future python release that will clean up several things
that Guido considers to be warts.
\end_inset
Since the removal of the distinction is slated, you can invoke the time
machine with the
\family typewriter
from __future__
\family default
directive; these directives allow python programmers today to use features
that will become standard in future releases but are not included by default
because they would break existing code.
These directives should be among the first lines you type in your python
code if you are going to use them; in a script they must precede all other
statements (apart from the module docstring and comments), or python raises
a SyntaxError.
The future division operator will assume floating point division by default,
\begin_inset Foot
collapsed false
\layout Standard
The astute reader will note that 2/3 was represented as 0.66666666666666663
and not 0.66666666666666666 as might be expected.
This is, of course, because computers are binary calculators, and there
is no exact binary representation of 2/3, just as there is no exact binary
representation of 0.1
\layout LyX-Code
>>> 0.1
\layout LyX-Code
0.10000000000000001
\layout Standard
Some languages try to hide this from you, but python is explicit.
\end_inset
and provides another operator // to do classic integer division.
\layout LyX-Code
>>> from __future__ import division
\layout LyX-Code
>>> 2/3
\layout LyX-Code
0.66666666666666663
\layout LyX-Code
>>> 2//3
\layout LyX-Code
0
\layout Standard
python has four basic numeric types: int, long, float and complex, but unlike
C++, BASIC, FORTRAN or Java, you don't have to declare these types.
python can infer them
\layout LyX-Code
>>> type(1)
\layout LyX-Code
<type 'int'>
\layout LyX-Code
>>> type(1.0)
\layout LyX-Code
<type 'float'>
\layout LyX-Code
>>> type(2**200)
\layout LyX-Code
<type 'long'>
\layout LyX-Code
\layout Standard
\begin_inset Formula $2^{200}$
\end_inset
is a huge number!
\layout LyX-Code
>>> 2**200
\layout LyX-Code
1606938044258990275541962092341162602522202993782792835301376L
\layout Standard
but python will blithely compute it and much larger numbers for you as long
as you have CPU and memory to handle them.
The integer type, if it overflows, will automatically convert to a python
\family typewriter
long
\family default
(as indicated by the appended
\family typewriter
L
\family default
in the output above) and has no built-in upper bound on size, unlike C/C++
longs.
\layout Standard
Python has built in support for complex numbers.
Eg, we can verify
\begin_inset Formula $i^{2}=-1$
\end_inset
\layout LyX-Code
>>> x = complex(0,1)
\layout LyX-Code
>>> x*x
\layout LyX-Code
(-1+0j)
\layout Standard
To access the real and imaginary parts of a complex number, use the
\family typewriter
real
\family default
and
\family typewriter
imag
\family default
attributes
\layout LyX-Code
>>> x.real
\layout LyX-Code
0.0
\layout LyX-Code
>>> x.imag
\layout LyX-Code
1.0
\layout Standard
If you come from other languages like Matlab, the above may be new to you.
In Matlab, you might do something like this (>> is the standard Matlab
shell prompt)
\layout LyX-Code
>> x = 0+j
\layout LyX-Code
x =
\layout LyX-Code
0.0000 + 1.0000i
\layout LyX-Code
\layout LyX-Code
>> real(x)
\layout LyX-Code
ans =
\layout LyX-Code
0
\layout LyX-Code
\layout LyX-Code
>> imag(x)
\layout LyX-Code
ans =
\layout LyX-Code
1
\layout LyX-Code
\layout LyX-Code
\layout Standard
That is, in Matlab, you use a
\shape italic
function
\shape default
to access the real and imaginary parts of the data, but in python these
are attributes of the complex object itself.
This is a core feature of python and other object oriented languages: an
object carries its data and methods around with it.
One might say:
\begin_inset Quotes eld
\end_inset
a complex number knows its real and imaginary parts
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
a complex number knows how to take its conjugate
\begin_inset Quotes erd
\end_inset
, you don't need external functions for these operations
\layout LyX-Code
>>> x.conjugate
\layout LyX-Code
<built-in method conjugate of complex object at 0xb6a62368>
\layout LyX-Code
>>> x.conjugate()
\layout LyX-Code
-1j
\layout Standard
On the first line, I just followed along from the example above with
\family typewriter
real
\family default
and
\family typewriter
imag
\family default
and typed
\family typewriter
x.conjugate
\family default
and python printed the representation
\family typewriter
<built-in method conjugate of complex object at 0xb6a62368>.
\family default
This means that
\family typewriter
conjugate
\family default
is a
\shape italic
method
\shape default
, a.k.a. a function, and in python we need to use parentheses to call a function.
If the method has arguments, like the
\family typewriter
x
\family default
in
\family typewriter
sin(x)
\family default
, you place them inside the parentheses, and if it has no arguments, like
\family typewriter
conjugate
\family default
, you simply provide the open and closing parentheses.
\family typewriter
real
\family default
,
\family typewriter
imag
\family default
and
\family typewriter
conjugate
\family default
are attributes of the complex object, and
\family typewriter
conjugate
\family default
is a
\shape italic
callable
\shape default
attribute, known as a
\shape italic
method
\shape default
.
\layout Standard
OK, now you are an object oriented programmer.
There are several key ideas in object oriented programming, and this is
one of them: an object carries around with it data (simple attributes)
and methods (callable attributes) that provide additional information about
the object and perform services.
It's one stop shopping -- no need to go to external functions and libraries
to deal with it -- the object knows how to deal with itself.
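\layout Standard
The distinction between simple attributes and callable attributes can be
checked directly.
The following is a minimal sketch, not part of the session above: the names
z, real_part, imag_part, is_method, z_bar and magnitude are chosen here
for illustration only, and print is written with parentheses around a single
value so that the snippet runs on the python 2 interpreters used in this
chapter as well as on later versions.

```python
# A complex number carries its data (real, imag) and its methods (conjugate).
z = complex(3, 4)

# Simple attributes are read without parentheses.
real_part = z.real
imag_part = z.imag

# conjugate is a callable attribute (a method); calling it needs parentheses.
is_method = callable(z.conjugate)
z_bar = z.conjugate()

# The built-in abs() asks the object for its magnitude via its __abs__ method.
magnitude = abs(z)

print((real_part, imag_part, is_method, z_bar, magnitude))
```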
\layout Section
Accessing the standard library
\layout Standard
Arithmetic is fine, but before long you may find yourself tiring of it and
wanting to compute logarithms and exponents, sines and cosines
\layout LyX-Code
>>> log(10)
\layout LyX-Code
Traceback (most recent call last):
\layout LyX-Code
File "<stdin>", line 1, in ?
\layout LyX-Code
NameError: name 'log' is not defined
\layout Standard
These functions are not built into python, but don't despair, they are built
into the python standard library.
To access a function from the standard library, or an external library
for that matter, you must import it.
\layout LyX-Code
>>> import math
\layout LyX-Code
>>> math.log(10)
\layout LyX-Code
2.3025850929940459
\layout LyX-Code
>>> math.sin(math.pi)
\layout LyX-Code
1.2246063538223773e-16
\layout Standard
Note that the default
\family typewriter
log
\family default
function is the natural (base e) logarithm (use
\family typewriter
math.log10
\family default
for base 10 logs) and that floating point math is inherently imprecise,
since analytically
\begin_inset Formula $\sin(\pi)=0$
\end_inset
.
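\layout Standard
As a small sketch of the logarithm variants in the math module:
\family typewriter
math.log
\family default
with one argument is the natural logarithm, and an optional second argument
selects another base.
The variable names and the tolerance 1e-12 below are arbitrary choices made
for this illustration.

```python
import math

x = 1000.0
natural = math.log(x)      # natural logarithm (base e)
base10 = math.log10(x)     # base 10 logarithm
base2 = math.log(x, 2)     # the optional second argument picks the base

# sin(pi) comes out tiny but not exactly zero, so compare with a tolerance
# rather than testing for equality with 0.
close_to_zero = abs(math.sin(math.pi)) < 1e-12

print((natural, base10, base2, close_to_zero))
```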
\layout Standard
It's kind of a pain to keep typing
\family typewriter
math.log
\family default
and
\family typewriter
math.sin
\family default
and
\family typewriter
math.pi
\family default
, and python is accommodating.
There are additional forms of
\family typewriter
import
\family default
that will let you save more or less typing depending on your desires
\layout LyX-Code
\color blue
# Abbreviate the module name: m is an alias
\layout LyX-Code
>>> import math as m
\layout LyX-Code
>>> m.cos(2*m.pi)
\layout LyX-Code
1.0
\layout LyX-Code
\layout LyX-Code
\color blue
# Import just the names you need
\layout LyX-Code
>>> from math import exp, log
\layout LyX-Code
>>> log(exp(1))
\layout LyX-Code
1.0
\layout LyX-Code
\layout LyX-Code
\color blue
# Import everything - use with caution!
\layout LyX-Code
>>> from math import *
\layout LyX-Code
>>> sin(2*pi*10)
\layout LyX-Code
-2.4492127076447545e-15
\layout Standard
To help you learn more about what you can find in the math library, python
has nice introspection capabilities -- introspection is a way of asking
an object about itself.
For example, to find out what is available in the math library, we can
get a directory of everything available with the
\family typewriter
dir
\family default
command
\begin_inset Foot
collapsed false
\layout Standard
In addition to the introspection and help provided in the python interpreter,
the official documentation of the python standard library is very good
and up-to-date http://docs.python.org/lib/lib.html .
\end_inset
\layout LyX-Code
>>> dir(math)
\layout LyX-Code
['__doc__', '__file__', '__name__', 'acos', 'asin', 'atan', 'atan2', 'ceil',
'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'floor', 'fmod', 'frexp',
'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow', 'radians', 'sin',
'sinh', 'sqrt', 'tan', 'tanh']
\layout Standard
This gives us just a listing of the names that are in the math module --
they are fairly self descriptive, but if you want more, you can call
\family typewriter
help
\family default
on any of these functions for more information
\layout LyX-Code
>>> help(math.sin)
\layout LyX-Code
Help on built-in function sin:
\layout LyX-Code
sin(...)
\layout LyX-Code
sin(x)
\layout LyX-Code
Return the sine of x (measured in radians).
\layout Standard
and for the whole math library
\layout LyX-Code
>>> help(math)
\layout LyX-Code
Help on module math:
\layout LyX-Code
\layout LyX-Code
NAME
\layout LyX-Code
math
\layout LyX-Code
\layout LyX-Code
FILE
\layout LyX-Code
/usr/local/lib/python2.3/lib-dynload/math.so
\layout LyX-Code
\layout LyX-Code
DESCRIPTION
\layout LyX-Code
This module is always available.
It provides access to the
\layout LyX-Code
mathematical functions defined by the C standard.
\layout LyX-Code
\layout LyX-Code
FUNCTIONS
\layout LyX-Code
acos(...)
\layout LyX-Code
acos(x)
\layout LyX-Code
\layout LyX-Code
Return the arc cosine (measured in radians) of x.
\layout LyX-Code
\layout LyX-Code
asin(...)
\layout LyX-Code
asin(x)
\layout LyX-Code
\layout LyX-Code
Return the arc sine (measured in radians) of x.
\layout LyX-Code
\layout Standard
And much more which is snipped.
Likewise, we can get information on the complex object in the same way
\layout LyX-Code
>>> x = complex(0,1)
\layout LyX-Code
>>> dir(x)
\layout LyX-Code
['__abs__', '__add__', '__class__', '__coerce__', '__delattr__', '__div__',
'__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__ge__',
'__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
'__int__', '__le__', '__long__', '__lt__', '__mod__', '__mul__', '__ne__',
'__neg__', '__new__', '__nonzero__', '__pos__', '__pow__', '__radd__',
'__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloor
div__', '__rmod__', '__rmul__', '__rpow__', '__rsub__', '__rtruediv__',
'__setattr__', '__str__', '__sub__', '__truediv__', 'conjugate', 'imag',
'real']
\layout LyX-Code
\layout Standard
Notice that we called
\family typewriter
dir
\family default
or
\family typewriter
help
\family default
on the
\family typewriter
math
\family default
\shape italic
module
\shape default
, the
\family typewriter
math.sin
\family default
\shape italic
function
\shape default
, and the
\family typewriter
complex
\family default
\shape italic
number
\shape default
\family typewriter
x
\family default
.
That's because modules, functions and numbers are all
\shape italic
objects
\shape default
, and we use the same object introspection and help capabilities on them.
We can find out what type of object they are by calling
\family typewriter
type
\family default
on them, which is another function in python's introspection arsenal
\layout LyX-Code
>>> type(math)
\layout LyX-Code
<type 'module'>
\layout LyX-Code
>>> type(math.sin)
\layout LyX-Code
<type 'builtin_function_or_method'>
\layout LyX-Code
>>> type(x)
\layout LyX-Code
<type 'complex'>
\layout LyX-Code
\layout Standard
Now, you may be wondering: what were all those god-awful looking double
underscore methods, like
\family typewriter
__abs__
\family default
and
\family typewriter
__mul__
\family default
in the
\family typewriter
dir
\family default
listing of the complex object above? These are methods that define what
it means to be a numeric type in python, and the complex object implements
these methods so that complex numbers act the way they should, eg
\family typewriter
__mul__
\family default
implements the rules of complex multiplication.
The nice thing about this is that python specifies an application programming
interface (API) that is the definition of what it means to be a number
in python.
And this means you can define your own numeric types, as long as you implement
the required special double underscore methods for your custom type.
Double underscore methods are very important in python; although the typical
newbie never sees them or thinks about them, they are there under the hood
providing all the python magic, and more importantly, showing the way to
let you make magic.
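\layout Standard
To make this concrete, here is a minimal sketch of a custom type that
supports + and * by implementing a few double underscore methods.
The class Vec2 and everything in it are hypothetical, invented for this
illustration, and it implements only a small part of what a full numeric
type would provide; the class syntax may be unfamiliar at this point, so
focus on which method python calls for which operator.

```python
class Vec2(object):
    """Hypothetical 2-D vector type, used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # called for: self + other
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        # called for: self * scalar
        return Vec2(self.x * scalar, self.y * scalar)

    def __abs__(self):
        # called for: abs(self)
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def __eq__(self, other):
        # called for: self == other
        return self.x == other.x and self.y == other.y

    def __repr__(self):
        # called when the interpreter displays the object
        return 'Vec2(%r, %r)' % (self.x, self.y)


# * binds tighter than +, exactly as for built-in numbers
v = Vec2(3, 4) + Vec2(1, 1) * 2
length = abs(Vec2(3, 4))
print((v, length))
```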
\layout Section
Strings
\layout Standard
We've encountered a number of types of objects above: int, float, long,
complex, method/function and module.
We'll continue our tour with an introduction to strings, which are critical
components of almost every program.
You can create strings in a number of different ways, with single quotes,
double quotes, or triple quotes -- this diversity of methods makes it easy
if you need to embed quote characters in the string itself
\layout LyX-Code
\color blue
# single, double and triple quoted strings
\layout LyX-Code
>>> s = 'Hi Mom!'
\layout LyX-Code
>>> s = "Hi Mom!"
\layout LyX-Code
>>> s = """Porky said, "That's all folks!" """
\layout Standard
You can add strings together to concatenate them
\layout LyX-Code
\color blue
# concatenating strings
\layout LyX-Code
>>> first = 'John'
\layout LyX-Code
>>> last = 'Hunter'
\layout LyX-Code
>>> first+last
\layout LyX-Code
'JohnHunter'
\layout Standard
or call string methods to process them: upcase them or downcase them, or
replace one character with another
\layout LyX-Code
\color blue
# string methods
\layout LyX-Code
>>> last.lower()
\layout LyX-Code
'hunter'
\layout LyX-Code
>>> last.upper()
\layout LyX-Code
'HUNTER'
\layout LyX-Code
>>> last.replace('h', 'p')
\layout LyX-Code
'Hunter'
\layout LyX-Code
>>> last.replace('H', 'P')
\layout LyX-Code
'Punter'
\layout Standard
Note that in all of these examples, the string
\family typewriter
last
\family default
is unchanged.
All of these methods operate on the string and return a new string, leaving
the original unchanged.
In fact, python strings cannot be changed by any python code at all: they
are
\shape italic
immutable
\shape default
(unchangeable).
The concept of mutable and immutable objects in python is an important
one, and it will come up again, because only immutable objects can be used
as keys in python dictionaries and elements of python sets.
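\layout Standard
A short sketch of what immutability means in practice, assuming nothing
beyond the built-in string and dictionary types.
The names changed, punter and lookup are made up for this illustration,
and the try/except form is used only to capture the error instead of stopping
the script.

```python
last = 'Hunter'

# Assigning to a character of a string is not allowed.
try:
    last[0] = 'P'
    changed = True
except TypeError:
    changed = False

# String methods return a new string; the original is untouched.
punter = last.replace('H', 'P')

# Because strings are immutable, they can serve as dictionary keys.
lookup = {last: 'original', punter: 'modified copy'}

print((changed, last, punter, lookup[last]))
```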
\layout Standard
You can access individual characters, or slices of the string (substrings),
using indexing.
A string is a sequence of characters, and strings implement the sequence
protocol in python -- we'll see more examples of python sequences later
-- and all sequences have the same syntax for accessing their elements.
Python uses 0 based indexing which means the first element is at index
0; you can use negative indices to access the last elements in the sequence
\layout LyX-Code
\color blue
# string indexing
\layout LyX-Code
>>> last = 'Hunter'
\layout LyX-Code
>>> last[0]
\layout LyX-Code
'H'
\layout LyX-Code
>>> last[1]
\layout LyX-Code
'u'
\layout LyX-Code
>>> last[-1]
\layout LyX-Code
'r'
\layout Standard
To access substrings, or generically in terms of the sequence protocol,
slices, you use a colon to indicate a range
\layout LyX-Code
\color blue
# string slicing
\layout LyX-Code
>>> last[0:2]
\layout LyX-Code
'Hu'
\layout LyX-Code
>>> last[2:4]
\layout LyX-Code
'nt'
\layout Standard
As this example shows, python uses
\begin_inset Quotes eld
\end_inset
one-past-the-end
\begin_inset Quotes erd
\end_inset
indexing when defining a range; eg, in the range
\family typewriter
indmin:indmax
\family default
, the element at index
\family typewriter
indmax
\family default
is not included.
You can use negative indices when slicing too; eg, to get everything before
the last character
\layout LyX-Code
>>> last[0:-1]
\layout LyX-Code
'Hunte'
\layout Standard
You can also leave out either the min or max indicator; if they are left
out, 0 is assumed to be the
\family typewriter
indmin
\family default
and one past the end of the sequence is assumed to be
\family typewriter
indmax
\layout LyX-Code
>>> last[:3]
\layout LyX-Code
'Hun'
\layout LyX-Code
>>> last[3:]
\layout LyX-Code
'ter'
\layout Standard
There is a third number that can be placed in a slice, a step, with syntax
indmin:indmax:step; eg, a step of 2 will skip every second letter
\layout LyX-Code
>>> last[1:6:2]
\layout LyX-Code
'utr'
\layout Standard
Although this may be more than you want to know about slicing strings, the
time spent here is worthwhile.
As mentioned above, all python sequences obey these rules.
In addition to strings, lists and tuples, which are built-in python sequence
data types and are discussed in the next section, the numeric arrays widely
used in scientific computing also implement the sequence protocol, and
thus have the same slicing rules.
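\layout Standard
As a quick check of that claim, the sketch below applies the same slice
expressions to a string, a list and a tuple holding the same characters.
The variable names are chosen for this illustration only; lists and tuples
themselves are introduced properly in the next section.

```python
name = 'Hunter'
letters = list(name)       # ['H', 'u', 'n', 't', 'e', 'r']
as_tuple = tuple(name)     # ('H', 'u', 'n', 't', 'e', 'r')

# The same slice syntax works on all three; each returns its own type.
first_two = (name[0:2], letters[0:2], as_tuple[0:2])
every_other = (name[1:6:2], letters[1:6:2], as_tuple[1:6:2])
all_but_last = (name[:-1], letters[:-1], as_tuple[:-1])

print(first_two)
print(every_other)
print(all_but_last)
```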
\layout Exercise
What would you expect last[:] to return?
\layout Standard
One thing that comes up all the time is the need to create strings out of
other strings and numbers, eg to create filenames from a combination of
a base directory, some base filename, and some numbers.
Scientists like to create lots of data files like
\layout LyX-Code
data/myexp01.dat
\layout LyX-Code
data/myexp02.dat
\layout LyX-Code
data/myexp03.dat
\layout LyX-Code
data/myexp04.dat
\layout Standard
and then write code to loop over these files and analyze them.
We're going to show how to do that, starting with the newbie way and progressively
building up to the way of the python zen master.
All of the methods below
\shape italic
work
\shape default
, but the zen master way will be more efficient, more scalable (eg to larger
numbers of files) and cross-platform.
\begin_inset Foot
collapsed false
\layout Standard
\begin_inset Quotes eld
\end_inset
But it works
\begin_inset Quotes erd
\end_inset
is a common defense of bad code; my rejoinder to this is
\begin_inset Quotes eld
\end_inset
A computer scientist is someone who fixes things that aren't broken
\begin_inset Quotes erd
\end_inset
.
\end_inset
Here's the newbie way: we also introduce the for-loop here in the spirit
of diving into python -- note that python uses whitespace indentation to
delimit the for-loop code block
\layout LyX-Code
\color blue
# The newbie way
\layout LyX-Code
for i in (1,2,3,4):
\layout LyX-Code
fname = 'data/myexp0' + str(i) + '.dat'
\layout LyX-Code
print fname
\layout Standard
Now as promised, this will print out the 4 file names above, but it has
three flaws: it doesn't scale to 10 or more files, it is inefficient, and
it is not cross platform.
It doesn't scale because it hard-codes the '
\family typewriter
0
\family default
' after
\family typewriter
myexp
\family default
, it is inefficient because to add several strings requires the creation
of temporary strings, and it is not cross-platform because it hard-codes
the directory separator '/'.
\layout LyX-Code
\color blue
# On the path to enlightenment
\layout LyX-Code
for i in (1,2,3,4):
\layout LyX-Code
fname = 'data/myexp%02d.dat'%i
\layout LyX-Code
print fname
\layout Standard
This example uses string interpolation, the funny % thing.
If you are familiar with C programming, this will be no surprise to you
(on linux/unix systems do
\family typewriter
man sprintf
\family default
at the unix shell).
The percent character is a string formatting character:
\family typewriter
%02d
\family default
means to take an integer (the
\family typewriter
d
\family default
part) and print it with two digits, padding zero on the left (the
\family typewriter
%02
\family default
part).
There is more to be said about string interpolation, but let's finish the
job at hand.
This example is better than the newbie way because it scales up to files
numbered 0-99, and it is more efficient because it avoids the creation
of temporary strings.
For the platform independent part, we go to the python standard library
\family typewriter
os.path
\family default
, which provides a host of functions for platform-independent manipulations
of filenames, extensions and paths.
Here we use
\family typewriter
os.path.join
\family default
to combine the directory with the filename in a platform independent way.
On windows, it will use the windows path separator '
\backslash
' and on unix it will use '/'.
\layout LyX-Code
\color blue
# the zen master approach
\layout LyX-Code
import os
\layout LyX-Code
for i in (1,2,3,4):
\layout LyX-Code
fname = os.path.join('data', 'myexp%02d.dat'%i)
\layout LyX-Code
print fname
\layout LyX-Code
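\layout Standard
The os.path module can also take a path apart again.
The sketch below builds one of the file names from above and splits it into
directory, base name and extension; the file number 7 and the variable names
are arbitrary choices for this illustration, and no file needs to exist
for these functions to work, since they only manipulate strings.

```python
import os

# Build the path in a platform independent way.
fname = os.path.join('data', 'myexp%02d.dat' % 7)

# Take it apart again.
directory = os.path.dirname(fname)        # the directory part
basename = os.path.basename(fname)        # the file name without directory
root, ext = os.path.splitext(basename)    # name and extension

print((fname, directory, basename, root, ext))
```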
\layout Standard
OK, I promised to torture you a bit more with string interpolation -- don't
worry, I remembered.
The ability to properly format your data when printing it is crucial in
scientific endeavors: how many significant digits do you want, do you want
to use integer, floating point representation or exponential notation?
These three choices are provided with
\family typewriter
%d
\family default
,
\family typewriter
%f
\family default
and
\family typewriter
%e
\family default
, with lots of variations on the theme to indicate precision and more
\layout LyX-Code
>>> 'warm for %d minutes at %1.1f C' % (30, 37.5)
\layout LyX-Code
'warm for 30 minutes at 37.5 C'
\layout LyX-Code
\layout LyX-Code
>>> 'The mass of the sun is %1.4e kg'% (1.98892*10**30)
\layout LyX-Code
'The mass of the sun is 1.9889e+30 kg'
\layout LyX-Code
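\layout Standard
A few of those variations, as a sketch: the number before the dot sets the
minimum field width, the number after it sets the precision, and a leading
0 pads with zeros.
The value 1234.56789 and the variable names are arbitrary and used only
for illustration.

```python
value = 1234.56789

fixed = '%.2f' % value        # two digits after the decimal point
sci = '%.3e' % value          # exponential notation, three digits
padded = '%10.1f' % value     # right-aligned in a field 10 characters wide
zero_padded = '%05d' % 42     # integer padded with zeros to 5 characters
mixed = '%s has %d points, mean %.1f' % ('run1', 12, 3.14159)

print((fixed, sci, padded, zero_padded, mixed))
```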
\layout Standard
There are two string methods,
\family typewriter
split
\family default
and
\family typewriter
join
\family default
, that arise frequently in Numeric processing, specifically in the context
of processing data files that have comma, tab, or space separated numbers
in them.
\family typewriter
split
\family default
takes a single string, and splits it on the indicated character to a sequence
of strings.
This is useful to take a single line of space or comma separated values
and split them into individual numbers
\layout LyX-Code
\color blue
# s is a single string and we split it into a list of strings
\layout LyX-Code
\color blue
# for further processing
\layout LyX-Code
>>> s = '1.0 2.0 3.0 4.0 5.0'
\layout LyX-Code
>>> s.split(' ')
\layout LyX-Code
['1.0', '2.0', '3.0', '4.0', '5.0']
\layout Standard
The return value, with square brackets, indicates that python has returned
a list of strings.
These individual strings need further processing to convert them into actual
floats, but that is the first step.
The conversion to floats will be discussed in the next section, when we
learn about list comprehensions.
The converse method is join, which is often used to create string output
to an ASCII file from a list of numbers.
In this case you want to join a list of numbers into a single line for
printing to a file.
The example below will be clearer after the next section, in which lists
are discussed
\layout LyX-Code
\color blue
# vals is a list of floats and we convert it to a single
\layout LyX-Code
\color blue
# space separated string
\layout LyX-Code
>>> vals = [1.0, 2.0, 3.0, 4.0, 5.0]
\layout LyX-Code
>>> ' '.join([str(val) for val in vals])
\layout LyX-Code
'1.0 2.0 3.0 4.0 5.0'
\layout Standard
There are two new things in the example above.
One, we called the join method directly on a string itself, and not on
a variable name.
Eg, in the previous examples, we always used the name of the object when
accessing attributes, eg
\family typewriter
x.real
\family default
or
\family typewriter
s.upper()
\family default
.
In this example, we call the
\family typewriter
join
\family default
method on the string which is a single space.
The second new feature is that we use a list comprehension
\family typewriter
[str(val) for val in vals]
\family default
as the argument to
\family typewriter
join
\family default
.
\family typewriter
join
\family default
requires a sequence of strings, and the list comprehension converts a list
of floats to a list of strings.
This can be confusing at first, so don't despair if it is.
But it is worth bringing up early because list comprehensions are a very
useful feature of python.
To help elucidate, compare
\family typewriter
vals
\family default
, which is a list of floats, with the conversion of
\family typewriter
vals
\family default
to a list of strings using list comprehensions in the next line
\layout LyX-Code
\color blue
# converting a list of floats to a list of strings
\layout LyX-Code
>>> vals
\layout LyX-Code
[1.0, 2.0, 3.0, 4.0, 5.0]
\layout LyX-Code
>>> [str(val) for val in vals]
\layout LyX-Code
['1.0', '2.0', '3.0', '4.0', '5.0']
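\layout Standard
Putting split, a list comprehension and join together gives a complete round
trip from a line of text to numbers and back.
This is a minimal sketch with made-up variable names; it also shows that
calling split with no argument splits on any run of whitespace, which is
more forgiving than splitting on a single space when columns are separated
by several spaces.

```python
line = '1.0 2.0 3.0 4.0 5.0'

# split the line into strings, then convert each string to a float
fields = line.split()
vals = [float(field) for field in fields]
total = sum(vals)

# process the numbers and join them back into one line of text
doubled = ' '.join(['%1.1f' % (2 * val) for val in vals])
comma_separated = ','.join([str(val) for val in vals])

# with two spaces between values, split(' ') yields an empty string
uneven = '1.0  2.0'
by_space = uneven.split(' ')
by_whitespace = uneven.split()

print((total, doubled, comma_separated, by_space, by_whitespace))
```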
\layout Section
The basic python data structures
\layout Standard
Strings, covered in the last section, are sequences of characters.
python has two additional built-in sequence types which can hold arbitrary
elements: tuples and lists.
tuples are created using parentheses, and lists are created using square
brackets
\layout LyX-Code
\color blue
# a tuple and a list of elements of the same type
\layout LyX-Code
\color blue
# (homogeneous)
\layout LyX-Code
>>> t = (1,2,3,4) # tuple
\layout LyX-Code
>>> l = [1,2,3,4] # list
\layout Standard
Both tuples and lists can also be used to hold elements of different types
\layout LyX-Code
\color blue
# a tuple and list of int, string, float
\layout LyX-Code
>>> t = (1,'john', 3.0)
\layout LyX-Code
>>> l = [1,'john', 3.0]
\layout Standard
Tuples and lists have the same indexing and slicing rules as each other,
and as strings discussed above, because both implement the python sequence
protocol, with the only difference being that tuple slices return tuples
(indicated by the parentheses below) and list slices return lists (indicated
by the square brackets)
\layout LyX-Code
# indexing and slicing tuples and lists
\layout LyX-Code
>>> t[0]
\layout LyX-Code
1
\layout LyX-Code
>>> l[0]
\layout LyX-Code
1
\layout LyX-Code
>>> t[:-1]
\layout LyX-Code
(1, 'john')
\layout LyX-Code
>>> l[:-1]
\layout LyX-Code
[1, 'john']
\layout Standard
So why the difference between tuples and lists? A number of explanations
have been offered on the mailing lists, but the only one that makes a difference
to me is that tuples are immutable, like strings, and hence can be used
as keys to python dictionaries and included as elements of sets, whereas
lists are mutable and cannot.
So a tuple, once created, can never be changed, but a list can.
For example, if we try to reassign the first element of the tuple above,
we get an error
\layout LyX-Code
>>> t[0] = 'why not?'
\layout LyX-Code
Traceback (most recent call last):
\layout LyX-Code
File "<stdin>", line 1, in ?
\layout LyX-Code
TypeError: object doesn't support item assignment
\layout Standard
But the same operation is perfectly acceptable for lists
\layout LyX-Code
>>> l[0] = 'why not?'
\layout LyX-Code
>>> l
\layout LyX-Code
['why not?', 'john', 3.0]
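\layout Standard
The practical consequence of immutability for dictionary keys and sets can
be sketched as follows.
This is only an illustration: the names point, labels, seen and list_key_ok
are hypothetical and do not appear elsewhere in this chapter, and the code
is written to run on current python versions.

```python
# Tuples are immutable and hashable, so they can be dictionary keys
# and set members.  All names here are hypothetical example data.
point = (1, 2)
labels = {point: 'corner'}
seen = set([point])

# Lists are mutable and therefore unhashable; using one as a key
# raises a TypeError.
try:
    labels[[1, 2]] = 'corner'
    list_key_ok = True
except TypeError:
    list_key_ok = False
```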
\layout Standard
Lists also have a lot of methods; tuples have none, save the special double
underscore methods that are required for python objects and sequences
\layout LyX-Code
\color blue
# tuples contain only
\begin_inset Quotes eld
\end_inset
hidden
\begin_inset Quotes erd
\end_inset
double underscore methods
\layout LyX-Code
>>> dir(t)
\layout LyX-Code
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
'__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__',
'__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
'__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__rmul__', '__setattr__', '__str__']
\layout LyX-Code
\layout LyX-Code
\color blue
# but lists contain other methods, eg append, extend and
\layout LyX-Code
\color blue
# reverse
\layout LyX-Code
>>> dir(l)
\layout LyX-Code
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
'__delslice__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__',
'__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__',
'__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__setitem__',
'__setslice__', '__str__', 'append', 'count', 'extend', 'index', 'insert',
'pop', 'remove', 'reverse', 'sort']
\layout Standard
Many of these list methods change, or mutate, the list, e.g.,
\family typewriter
append
\family default
adds an element to the list,
\family typewriter
extend
\family default
extends the list with a sequence of elements,
\family typewriter
sort
\family default
sorts the list in place,
\family typewriter
reverse
\family default
reverses it in place,
\family typewriter
pop
\family default
takes an element off the list and returns it.
\layout Standard
We've seen a couple of examples of creating a list above -- let's look at
some more using list methods
\layout LyX-Code
>>> x = []
\color blue
# create the empty list
\layout LyX-Code
>>> x.append(1)
\color blue
# add the integer one to it
\layout LyX-Code
>>> x.extend(['hi', 'mom'])
\color blue
# append two strings to it
\layout LyX-Code
>>> x
\layout LyX-Code
[1, 'hi', 'mom']
\layout LyX-Code
>>> x.reverse()
\color blue
# reverse the list, in place
\layout LyX-Code
>>> x
\layout LyX-Code
['mom', 'hi', 1]
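\layout Standard
The sort and pop methods mentioned above mutate the list in the same way.
A minimal sketch, using a hypothetical list of integers, is given here;
note that sort returns None rather than a new list, while pop returns the
element it removes.

```python
x = [3, 1, 2]      # hypothetical example data
x.sort()           # sorts in place; the return value is None
last = x.pop()     # removes and returns the last element
first = x.pop(0)   # removes and returns the element at index 0
```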
\layout Standard
We mentioned list comprehensions in the last section when discussing string
methods.
List comprehensions are a way of creating a list using a for loop in a
single line of python.
Let's create a list of the cubes of the integers from 1 to 9, first with
a for loop and then with a list comprehension (note that range(1,10) stops
before 10).
The list comprehension code is not only shorter and more elegant, it can
also be much faster (the dots are the continuation prompt that the python
shell shows inside an indented block and should not be typed)
\layout LyX-Code
\color blue
# a list of perfect cubes using a for-loop
\layout LyX-Code
>>> cubes = []
\layout LyX-Code
>>> for i in range(1,10):
\layout LyX-Code
...
cubes.append(i**3)
\layout LyX-Code
...
\layout LyX-Code
>>> cubes
\layout LyX-Code
[1, 8, 27, 64, 125, 216, 343, 512, 729]
\layout LyX-Code
\layout LyX-Code
\color blue
# functionally equivalent code using list comprehensions
\layout LyX-Code
>>> cubes = [i**3 for i in range(1,10)]
\layout LyX-Code
>>> cubes
\layout LyX-Code
[1, 8, 27, 64, 125, 216, 343, 512, 729]
\layout Standard
The list comprehension is usually faster, but not because it bypasses the
interpreter: the expression that cubes
\family typewriter
i
\family default
is still evaluated by the python interpreter for each element.
The saving comes from the append step.
In the simple for-loop version, every iteration has to look up the
\family typewriter
append
\family default
method of
\family typewriter
cubes
\family default
and call it, whereas the list comprehension builds the list without that
repeated lookup.
The difference in speed can be noticeable for simple expressions like this
one.
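\layout Standard
Timings vary between machines and python versions, so it is worth measuring
rather than assuming.
The following is a minimal sketch using the standard library timeit module;
the function names cubes_loop and cubes_comp and the repeat counts are
hypothetical choices made for illustration.

```python
import timeit

def cubes_loop(n):
    'cubes of 1..n-1 built with an explicit for-loop'
    cubes = []
    for i in range(1, n):
        cubes.append(i**3)
    return cubes

def cubes_comp(n):
    'cubes of 1..n-1 built with a list comprehension'
    return [i**3 for i in range(1, n)]

# total time in seconds for 200 calls of each version
t_loop = timeit.timeit(lambda: cubes_loop(1000), number=200)
t_comp = timeit.timeit(lambda: cubes_comp(1000), number=200)
```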
\layout Standard
The remaining essential built-in data structure in python is the dictionary,
which is an associative array that maps arbitrary immutable objects to
arbitrary objects.
int, long, float, string and tuple are all immutable and can be used as
keys to a dictionary; list and dict are mutable and cannot.
A dictionary takes one kind of object as the key, and this key points to
another object which is the value.
In a contrived but easy to comprehend example, one might map names to
ages
\layout LyX-Code
>>> ages = {}
\color blue
# create an empty dict
\layout LyX-Code
>>> ages['john'] = 36
\layout LyX-Code
>>> ages['fernando'] = 33
\layout LyX-Code
>>> ages
\color blue
# view the whole dict
\layout LyX-Code
{'john': 36, 'fernando': 33}
\layout LyX-Code
>>> ages['john']
\layout LyX-Code
36
\layout LyX-Code
>>> ages['john'] = 37
\color blue
# reassign john's age
\layout LyX-Code
>>> ages['john']
\layout LyX-Code
37
\layout Standard
Dictionary lookup is very fast; Tim Peters once joked that any python program
which uses a dictionary is automatically 10 times faster than any C program,
which is of course false, but makes two worthy points in jest: dictionary
lookup is fast, and dictionaries can be used for important optimizations,
e.g., creating a cache of frequently used values.
As a simple example, suppose you needed to compute the product of two
non-negative integers less than 100 in an inner loop; you could use a
dictionary to cache the product of every pair (i,j) of such integers with
i <= j, since the product does not depend on the order of the pair.
\layout LyX-Code
\layout LyX-Code
>>> prod = dict([ ( (i,j), i*j ) for i in range(100)
\layout LyX-Code
for j in range(i,100)] )
\layout LyX-Code
>>> prod[(8,12)]
\layout LyX-Code
96
\layout Standard
The last example is syntactically a bit challenging, but bears careful study.
We are initializing a dictionary with a list comprehension.
The list comprehension is made up of length 2 tuples
\family typewriter
( (i,j), i*j )
\family default
.
When a dictionary is initialized with a sequence of length 2 tuples, it
assumes the first element of the tuple
\family typewriter
(i,j)
\family default
is the key and the second element i*j is the value.
Thus we have a lookup table from pairs of numbers
\family typewriter
i,j
\family default
to their product.
Creating dictionaries from list comprehensions as in this example is something
that hard-core python programmers do almost every day, and you should too.
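\layout Standard
Because the comprehension above only stores keys with i <= j, a lookup such
as prod[(12,8)] would raise a KeyError.
One way around this, sketched here with a hypothetical helper named cached_product,
is to put the pair in order before looking it up.

```python
# lookup table of products for all pairs (i, j) with 0 <= i <= j < 100
prod = dict([((i, j), i*j) for i in range(100)
             for j in range(i, 100)])

def cached_product(i, j):
    'look up i*j in prod, accepting the pair in either order (hypothetical helper)'
    if i > j:
        i, j = j, i
    return prod[(i, j)]

result = cached_product(12, 8)
```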
\layout Exercise
Create a dictionary mapping integers from 0-1000 to their cube using list
comprehensions.
\layout Section
The Zen of Python
\layout Exercise
\family typewriter
>>> import this
\layout Section
Functions and classes
\layout Standard
You can define functions just about anywhere in python code.
The typical function definition takes zero or more positional arguments
and zero or more keyword arguments, and is followed by a documentation
string and the function body, which optionally returns a value.
Here is a function to compute the hypotenuse of a right triangle
\layout LyX-Code
def hypot(base, height):
\layout LyX-Code
'compute the hypotenuse of a right triangle'
\layout LyX-Code
import math
\layout LyX-Code
return math.sqrt(base**2 + height**2)
\layout Standard
As in the case of the for-loop, leading white space is significant and is
used to delimit the start and end of the function.
In the example below, x = 1 is not in the function, because it is not indented
\layout LyX-Code
def growone(l):
\layout LyX-Code
'append 1 to a list l'
\layout LyX-Code
l.append(1)
\layout LyX-Code
x = 1
\layout Standard
Note that this function does not return anything, because the append method
modifies the list that was passed in.
python is pretty flexible with functions: you can define functions within
function definitions (just be mindful of your indentation), you can attach
attributes to functions (like other objects), you can pass functions as
arguments to other functions.
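\layout Standard
These three kinds of flexibility can be sketched in a few lines.
The names make_scaler, scale, apply_to_all, double and the units attribute
are all hypothetical, chosen only for illustration.

```python
def make_scaler(factor):
    'return a function that multiplies its argument by factor'
    def scale(x):                 # a function defined inside a function
        return factor * x
    return scale

def apply_to_all(func, seq):
    'call func on every element of seq and return the results as a list'
    return [func(val) for val in seq]

double = make_scaler(2)
double.units = 'volts'            # attach an attribute to a function
doubled = apply_to_all(double, [1, 2, 3])   # pass a function as an argument
```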
\layout Standard
A function keyword argument defines a default value for an argument, which
the caller can override.
Here is an example which provides a normalize keyword argument.
The default argument is
\family typewriter
normalize=None
\family default
; the value None is a standard python idiom which usually means either do
the default thing or do nothing.
If
\family typewriter
normalize
\family default
is not
\family typewriter
None
\family default
, we assume it is a function that can be called to normalize our data
\layout LyX-Code
def psd(x, normalize=None):
\layout LyX-Code
'compute the power spectral density of x'
\layout LyX-Code
if normalize is not None: x = normalize(x)
\layout LyX-Code
\color blue
# compute the power spectra of x and return it
\layout Standard
This function could be called with or without a
\family typewriter
normalize
\family default
keyword argument, since if the argument is not passed, the default of
\family typewriter
None
\family default
is assumed
\layout LyX-Code
\layout LyX-Code
\color blue
# no normalize argument; do the default thing
\layout LyX-Code
>>> psd(x)
\layout LyX-Code
\layout LyX-Code
\color blue
# define a custom normalize function and pass it to psd
\layout LyX-Code
>>> def unitstd(x): return x/std(x)
\layout LyX-Code
>>> psd(x, normalize=unitstd)
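\layout Standard
The psd example leaves out the spectral computation and relies on a std
function defined elsewhere.
To see the keyword argument pattern run end to end, here is a self-contained
sketch that works on plain lists.
It is an assumption-laden stand-in rather than a real power spectrum: total_power
simply sums the squares of the data, and the std and unitstd helpers shown
here are hypothetical pure python versions.

```python
import math

def std(x):
    'population standard deviation of a sequence (hypothetical helper)'
    mean = sum(x) / float(len(x))
    return math.sqrt(sum([(val - mean)**2 for val in x]) / len(x))

def unitstd(x):
    'scale a sequence so that its standard deviation is 1'
    s = std(x)
    return [val / s for val in x]

def total_power(x, normalize=None):
    'sum of squares of x, optionally normalizing first (stand-in for psd)'
    if normalize is not None:
        x = normalize(x)
    return sum([val**2 for val in x])

x = [1.0, 2.0, 3.0, 4.0]
raw = total_power(x)                         # default: no normalization
scaled = total_power(x, normalize=unitstd)   # override the default
```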
\the_end