#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass amsbook
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default
\layout Chapter
A whirlwind tour of python and the standard library
\begin_inset OptArg
collapsed true
\layout Standard
Python intro
\end_inset 
\layout Standard
This is a quick-and-dirty introduction to the python language for the impatient
 scientist.
 There are many top notch, comprehensive introductions and tutorials for
 python.
 For absolute beginners, there is the 
\shape italic 
Python Beginner's Guide
\shape default 
.
\begin_inset Foot
collapsed true
\layout Standard
http://www.python.org/moin/BeginnersGuide
\end_inset 
 The official 
\shape italic 
Python Tutorial
\shape default 
 can be read online
\begin_inset Foot
collapsed true
\layout Standard
http://docs.python.org/tut/tut.html
\end_inset 
 or downloaded
\begin_inset Foot
collapsed true
\layout Standard
http://docs.python.org/download.html
\end_inset 
 in a variety of formats.
 There are over 100 python tutorials collected online.
\begin_inset Foot
collapsed true
\layout Standard
http://www.awaretek.com/tutorials.html
\end_inset 
\layout Standard
There are also many excellent books.
 Targeting newbies is Mark Pilgrim's 
\shape italic 
Dive into Python
\shape default 
 which is available in print and for free online
\begin_inset Foot
collapsed true
\layout Standard
http://diveintopython.org/toc/index.html
\end_inset 
, though for absolute newbies even this may be too hard 
\begin_inset LatexCommand \cite{Dive}
\end_inset 
.
 For experienced programmers, David Beazley's 
\shape italic 
Python Essential Reference
\shape default 
 is an excellent introduction to python, but is a bit dated since it only
 covers python2.1 
\begin_inset LatexCommand \cite{Beasley}
\end_inset 
.
 Likewise Alex Martelli's 
\shape italic 
Python in a Nutshell
\shape default 
 is highly regarded and a bit more current -- a 2nd edition is in the works
\begin_inset LatexCommand \cite{Nutshell}
\end_inset 
.
 And 
\shape italic 
The Python Cookbook
\shape default 
 is an extremely useful collection of python idioms, tips and tricks 
\begin_inset LatexCommand \cite{Cookbook}
\end_inset 
.
\layout Standard
But the typical scientist I encounter wants to solve a specific problem,
 eg, to make a certain kind of graph, to numerically integrate an equation,
 or to fit some data to a parametric model, and doesn't have the time or
 interest to read several books or tutorials to get what they want.
 This guide is for them: a short overview of the language to help them get
 to what they want as quickly as possible.
 We get to advanced material pretty quickly, so it may be tough sledding
 if you are a python newbie.
 Take in what you can, and if you start getting dizzy, skip ahead to the
 next section; you can always come back to absorb more detail later, after
 you get your real work done.
\layout Section
Hello Python
\layout Standard
Python is a dynamically typed, object oriented, interpreted language.
 Interpreted means that your program interacts with the python interpreter,
 similar to Matlab, Perl, Tcl and Java, and unlike FORTRAN, C, or C++ which
 are compiled.
 So let's fire up the python interpreter and get started.
 I'm not going to cover installing python -- it's standard on most linux
 boxes and for windows there is a friendly GUI installer.
 To run the python interpreter, on windows, you can click 
\family typewriter 
Start->All Programs->Python 2.4->Python (command line)
\family default 
 or better yet, install 
\family typewriter 
ipython
\family default 
, a python shell on steroids, and use that.
 On linux / unix systems, you just need to type 
\family typewriter 
python
\family default 
 or 
\family typewriter 
ipython
\family default 
 at the command line.
 The 
\family typewriter 
>>>
\family default 
 is the default python shell prompt, so don't type it in the examples below
\layout LyX-Code
>>> print 'hello world'
\layout LyX-Code
hello world
\layout LyX-Code
\layout Standard
As this example shows, 
\shape italic 
hello world
\shape default 
 in python is pretty easy -- one common phrase you hear in the python community
 is that 
\begin_inset Quotes eld
\end_inset 
it fits your brain
\begin_inset Quotes erd
\end_inset 
; the basic idea is that coding in python feels natural.
 Compare python's version with 
\shape italic 
hello world
\shape default 
 in C++
\layout LyX-Code
// C++
\layout LyX-Code
#include <iostream>
\layout LyX-Code
int main ()
\layout LyX-Code
{   
\layout LyX-Code
  std::cout << "Hello World" << std::endl;
\layout LyX-Code
  return 0;
\layout LyX-Code
}
\layout Section
\begin_inset LatexCommand \label{sec:into_calculator}
\end_inset 
Python is a calculator
\begin_inset OptArg
collapsed true
\layout Standard
Calculator
\end_inset 
\layout Standard
Aside from my daughter's solar powered cash-register calculator, Python
 is the only calculator I use.
 From the python shell, you can type arbitrary arithmetic expressions.
\layout LyX-Code
>>> 2+2
\layout LyX-Code
4
\layout LyX-Code
>>> 2**10
\layout LyX-Code
1024
\layout LyX-Code
>>> 10/5
\layout LyX-Code
2
\layout LyX-Code
>>> 2+(24.3 + .9)/.24
\layout LyX-Code
107.0
\layout LyX-Code
>>> 2/3
\layout LyX-Code
0
\layout Standard
The last line is a standard newbie gotcha -- if both the left and right
 operands are integers, python returns an integer.
 To do floating point division, make sure at least one of the numbers is
 a float
\layout LyX-Code
>>> 2.0/3
\layout LyX-Code
0.66666666666666663
\layout Standard
The distinction between integer and floating point division is a common
 source of frustration among newbies and is slated for destruction in the
 mythical Python 3000.
\begin_inset Foot
collapsed true
\layout Standard
Python 3000 is a future python release that will clean up several things
 that Guido considers to be warts.
\end_inset 
 Since default integer division will be removed in the future, you can invoke
 the time machine with the 
\family typewriter 
from __future__
\family default 
 directives; these directives allow python programmers today to use features
 that will become standard in future releases but are not included by default
 because they would break existing code.
 In a python file, from future directives must come before any other statements
 (only comments and the docstring may precede them), otherwise python raises
 a SyntaxError.
 The future division operator will assume floating point division by default,
\begin_inset Foot
collapsed false
\layout Standard
You may have noticed that 2/3 was represented as 0.66666666666666663 and
 not 0.66666666666666666 as might be expected.
 This is because computers are binary calculators, and there is no exact
 binary representation of 2/3, just as there is no exact binary representation
 of 0.1
\layout LyX-Code
>>> 0.1
\layout LyX-Code
0.10000000000000001
\layout Standard
Some languages try and hide this from you, but python is explicit.
\end_inset 
 and provides another operator // to do classic integer division.
\layout LyX-Code
>>> from __future__ import division
\layout LyX-Code
>>> 2/3
\layout LyX-Code
0.66666666666666663
\layout LyX-Code
>>> 2//3
\layout LyX-Code
0
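\layout Standard
As a rough sketch of the same distinction, the lines below use only expressions
 that give the same result with or without the future division directive.
 The variable names are made up for this illustration, and print is written
 with parentheses on the assumption that the code may also be run on newer
 python versions.
\layout LyX-Code
```python
# Floor division and floating point division, written so the results
# do not depend on whether "from __future__ import division" is active.
floor_quotient = 7 // 2             # floor division of two integers
true_quotient = 7 / 2.0             # a float operand forces floating point division
negative_floor = -7 // 2            # floor division rounds toward minus infinity
quotient, remainder = divmod(7, 2)  # quotient and remainder in one call

print(floor_quotient)
print(true_quotient)
print(negative_floor)
print(quotient)
print(remainder)
```
\layout Standard
Note that floor division of a negative number rounds down rather than toward
 zero, which can surprise people coming from C.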
\layout Standard
Python has four basic numeric types: int, long, float and complex, but unlike
 C++, BASIC, FORTRAN or Java, you don't have to declare these types.
 Python can infer them
\layout LyX-Code
>>> type(1)
\layout LyX-Code
<type 'int'>
\layout LyX-Code
>>> type(1.0)
\layout LyX-Code
<type 'float'>
\layout LyX-Code
>>> type(2**200)
\layout LyX-Code
<type 'long'>
\layout LyX-Code
\layout Standard
\begin_inset Formula $2^{200}$
\end_inset 
 is a huge number!
\layout LyX-Code
>>> 2**200
\layout LyX-Code
1606938044258990275541962092341162602522202993782792835301376L
\layout Standard
but python will blithely compute it and much larger numbers for you as long
 as you have CPU and memory to handle them.
 The integer type, if it overflows, will automatically convert to a python
 
\family typewriter 
long
\family default 
 (as indicated by the appended 
\family typewriter 
L
\family default 
 in the output above) and has no built-in upper bound on size, unlike C/C++
 longs.
\layout Standard
Python has built in support for complex numbers.
 Eg, we can verify 
\begin_inset Formula $i^{2}=-1$
\end_inset 
 
\layout LyX-Code
>>> x = complex(0,1)
\layout LyX-Code
>>> x*x
\layout LyX-Code
(-1+0j)
\layout Standard
To access the real and imaginary parts of a complex number, use the 
\family typewriter 
real
\family default 
 and 
\family typewriter 
imag
\family default 
 attributes
\layout LyX-Code
>>> x.real
\layout LyX-Code
0.0
\layout LyX-Code
>>> x.imag
\layout LyX-Code
1.0
\layout Standard
If you come from other languages like Matlab, the above may be new to you.
 In Matlab, you might do something like this (>> is the standard Matlab
 shell prompt)
\layout LyX-Code
>> x = 0+j
\layout LyX-Code
x =
\layout LyX-Code
   0.0000 + 1.0000i
\layout LyX-Code
\layout LyX-Code
>> real(x)
\layout LyX-Code
ans =
\layout LyX-Code
     0
\layout LyX-Code
\layout LyX-Code
>> imag(x)
\layout LyX-Code
ans =
\layout LyX-Code
     1
\layout LyX-Code
\layout LyX-Code
\layout Standard
That is, in Matlab, you use a 
\shape italic 
function
\shape default 
 to access the real and imaginary parts of the data, but in python these
 are attributes of the complex object itself.
 This is a core feature of python and other object oriented languages: an
 object carries its data and methods around with it.
 One might say: 
\begin_inset Quotes eld
\end_inset 
a complex number knows its real and imaginary parts
\begin_inset Quotes erd
\end_inset 
 or 
\begin_inset Quotes eld
\end_inset 
a complex number knows how to take its conjugate
\begin_inset Quotes erd
\end_inset 
; you don't need external functions for these operations
\layout LyX-Code
>>> x.conjugate
\layout LyX-Code
<built-in method conjugate of complex object at 0xb6a62368>
\layout LyX-Code
>>> x.conjugate()
\layout LyX-Code
-1j
\layout Standard
On the first line, I just followed along from the example above with 
\family typewriter 
real
\family default 
 and 
\family typewriter 
imag
\family default 
 and typed 
\family typewriter 
x.conjugate
\family default 
 and python printed the representation 
\family typewriter 
<built-in method conjugate of complex object at 0xb6a62368>.
 
\family default 
This means that 
\family typewriter 
conjugate
\family default 
 is a 
\shape italic 
method
\shape default 
, a.k.a. a function, and in python we need to use parentheses to call a function.
 If the method has arguments, like the 
\family typewriter 
x
\family default 
 in 
\family typewriter 
sin(x)
\family default 
, you place them inside the parentheses, and if it has no arguments, like
 
\family typewriter 
conjugate
\family default 
, you simply provide the open and closing parentheses.
 
\family typewriter 
real
\family default 
, 
\family typewriter 
imag
\family default 
 and 
\family typewriter 
conjugate
\family default 
 are attributes of the complex object, and 
\family typewriter 
conjugate
\family default 
 is a 
\shape italic 
callable
\shape default 
 attribute, known as a 
\shape italic 
method
\shape default 
.
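\layout Standard
The difference between simple attributes and callable attributes can be
 sketched in a few lines.
 This is only an illustration: the variable names are invented here, and
 the built-in callable function is used on the assumption that your python
 version provides it.
\layout LyX-Code
```python
z = complex(3, 4)

# simple (data) attributes: no parentheses
real_part = z.real
imag_part = z.imag

# callable attribute (method): parentheses are required to call it
conj = z.conjugate()

# referring to the method without calling it just gives the method object
method_object = z.conjugate
is_callable = callable(method_object)

print(real_part)
print(imag_part)
print(conj)
print(is_callable)
```
\layout Standard
Forgetting the parentheses is not an error in python; you simply get the
 method object back instead of the result of calling it.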
\layout Standard
OK, now you are an object oriented programmer.
 There are several key ideas in object oriented programming, and this is
 one of them: an object carries around with it data (simple attributes)
 and methods (callable attributes) that provide additional information about
 the object and perform services.
 It's one stop shopping -- no need to go to external functions and libraries
 to deal with it -- the object knows how to deal with itself.
\layout Section
Accessing the standard library
\begin_inset OptArg
collapsed true
\layout Standard
Standard Library
\end_inset 
\layout Standard
Arithmetic is fine, but before long you may find yourself tiring of it and
 wanting to compute logarithms and exponents, sines and cosines
\layout LyX-Code
>>> log(10)
\layout LyX-Code
Traceback (most recent call last):
\layout LyX-Code
  File "<stdin>", line 1, in ?
\layout LyX-Code
NameError: name 'log' is not defined
\layout Standard
These functions are not built into python, but don't despair, they are built
 into the python standard library.
 To access a function from the standard library, or an external library
 for that matter, you must import it.
\layout LyX-Code
>>> import math
\layout LyX-Code
>>> math.log(10)
\layout LyX-Code
2.3025850929940459
\layout LyX-Code
>>> math.sin(math.pi)
\layout LyX-Code
1.2246063538223773e-16
\layout Standard
Note that the default 
\family typewriter 
log
\family default 
 function is the natural (base e) logarithm (use 
\family typewriter 
math.log10
\family default 
 for base 10 logs) and that floating point math is inherently imprecise,
 since analytically 
\begin_inset Formula $\sin(\pi)=0$
\end_inset 
.
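\layout Standard
A short sketch of the different logarithms, and of one common way to live
 with floating point imprecision, follows.
 The tolerance value is an arbitrary choice made for this example, not a
 recommendation, and the variable names are illustrative.
\layout LyX-Code
```python
import math

natural = math.log(math.e)       # natural logarithm (base e)
base_ten = math.log10(1000.0)    # base 10 logarithm
base_two = math.log(8.0, 2)      # optional second argument selects the base

# sin(pi) is tiny but not exactly zero, so compare against a tolerance
# instead of testing for equality with 0.0
tolerance = 1e-12                # hypothetical tolerance chosen for this sketch
sin_pi_is_zero = abs(math.sin(math.pi)) < tolerance

print(natural)
print(base_ten)
print(base_two)
print(sin_pi_is_zero)
```
\layout Standard
Comparing against a small tolerance, rather than testing floats for exact
 equality, is usually the safer habit in numerical code.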
\layout Standard
It's kind of a pain to keep typing 
\family typewriter 
math.log
\family default 
 and 
\family typewriter 
math.sin
\family default 
 and 
\family typewriter 
math.pi
\family default 
, and python is accommodating.
 There are additional forms of 
\family typewriter 
import
\family default 
 that will let you save more or less typing depending on your desires
\layout LyX-Code
\color blue
# Abbreviate the module name: m is an alias
\layout LyX-Code
>>> import math as m
\layout LyX-Code
>>> m.cos(2*m.pi)
\layout LyX-Code
1.0
\layout LyX-Code
\layout LyX-Code
\color blue
# Import just the names you need
\layout LyX-Code
>>> from math import exp, log
\layout LyX-Code
>>> log(exp(1))
\layout LyX-Code
1.0
\layout LyX-Code
\layout LyX-Code
\color blue
# Import everything - use with caution!
\layout LyX-Code
>>> from math import *
\layout LyX-Code
>>> sin(2*pi*10)
\layout LyX-Code
-2.4492127076447545e-15
\layout Standard
To help you learn more about what you can find in the math library, python
 has nice introspection capabilities -- introspection is a way of asking
 an object about itself.
 For example, to find out what is available in the math library, we can
 get a directory of everything available with the 
\family typewriter 
dir
\family default 
 command
\begin_inset Foot
collapsed false
\layout Standard
In addition to the introspection and help provided in the python interpreter,
 the official documentation of the python standard library is very good
 and up-to-date http://docs.python.org/lib/lib.html .
\end_inset 
\layout LyX-Code
>>> dir(math)
\layout LyX-Code
['__doc__', '__file__', '__name__', 'acos', 'asin', 'atan', 'atan2', 'ceil',
 'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'floor', 'fmod', 'frexp',
 'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow', 'radians', 'sin',
 'sinh', 'sqrt', 'tan', 'tanh']
\layout Standard
This gives us just a listing of the names that are in the math module --
 they are fairly self descriptive, but if you want more, you can call 
\family typewriter 
help
\family default 
 on any of these functions for more information
\layout LyX-Code
>>> help(math.sin) 
\layout LyX-Code
Help on built-in function sin:
\layout LyX-Code
sin(...)
\layout LyX-Code
sin(x)
\layout LyX-Code
Return the sine of x (measured in radians).
\layout Standard
and for the whole math library
\layout LyX-Code
>>> help(math) 
\layout LyX-Code
Help on module math:
\layout LyX-Code
 
\layout LyX-Code
NAME
\layout LyX-Code
    math
\layout LyX-Code
 
\layout LyX-Code
FILE
\layout LyX-Code
    /usr/local/lib/python2.3/lib-dynload/math.so
\layout LyX-Code
 
\layout LyX-Code
DESCRIPTION
\layout LyX-Code
    This module is always available.
  It provides access to the
\layout LyX-Code
    mathematical functions defined by the C standard.
\layout LyX-Code
 
\layout LyX-Code
FUNCTIONS
\layout LyX-Code
    acos(...)
\layout LyX-Code
        acos(x)
\layout LyX-Code
         
\layout LyX-Code
        Return the arc cosine (measured in radians) of x.
\layout LyX-Code
     
\layout LyX-Code
    asin(...)
\layout LyX-Code
        asin(x)
\layout LyX-Code
         
\layout LyX-Code
        Return the arc sine (measured in radians) of x.
\layout LyX-Code
     
\layout Standard
And much more which is snipped.
 Likewise, we can get information on the complex object in the same way
\layout LyX-Code
>>> x = complex(0,1)
\layout LyX-Code
>>> dir(x)
\layout LyX-Code
['__abs__', '__add__', '__class__', '__coerce__', '__delattr__', '__div__',
 '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__ge__',
 '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
 '__int__', '__le__', '__long__', '__lt__', '__mod__', '__mul__', '__ne__',
 '__neg__', '__new__', '__nonzero__', '__pos__', '__pow__', '__radd__',
 '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloor
div__', '__rmod__', '__rmul__', '__rpow__', '__rsub__', '__rtruediv__',
 '__setattr__', '__str__', '__sub__', '__truediv__', 'conjugate', 'imag',
 'real']
\layout LyX-Code
\layout Standard
Notice that we called 
\family typewriter 
dir
\family default 
 or 
\family typewriter 
help
\family default 
 on the 
\family typewriter 
math
\family default 
 
\shape italic 
module
\shape default 
, the 
\family typewriter 
math.sin
\family default 
 
\shape italic 
function
\shape default 
, and the 
\family typewriter 
complex
\family default 
 
\shape italic 
number
\shape default 
 
\family typewriter 
x
\family default 
.
 That's because modules, functions and numbers are all 
\shape italic 
objects
\shape default 
, and we use the same object introspection and help capabilities on them.
 We can find out what type of object they are by calling 
\family typewriter 
type
\family default 
 on them, which is another function in python's introspection arsenal
\layout LyX-Code
>>> type(math)
\layout LyX-Code
<type 'module'>
\layout LyX-Code
>>> type(math.sin)
\layout LyX-Code
<type 'builtin_function_or_method'>
\layout LyX-Code
>>> type(x)
\layout LyX-Code
<type 'complex'>
\layout LyX-Code
\layout Standard
Now, you may be wondering: what were all those god-awful looking double
 underscore methods, like 
\family typewriter 
__abs__ 
\family default 
and 
\family typewriter 
__mul__
\family default 
 in the 
\family typewriter 
dir
\family default 
 listing of the complex object above? These are methods that define what
 it means to be a numeric type in python, and the complex object implements
 these methods so that complex numbers act the way they should, eg 
\family typewriter 
__mul__
\family default 
 implements the rules of complex multiplication.
 The nice thing about this is that python specifies an application programming
 interface (API) that is the definition of what it means to be a number
 in python.
 And this means you can define your own numeric types, as long as you implement
 the required special double underscore methods for your custom type.
 Double underscore methods are very important in python; although the typical
 newbie never sees them or thinks about them, they are there under the hood
 providing all the python magic, and more importantly, showing the way to
 let you make magic.
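\layout Standard
To make this concrete, here is a minimal sketch of a custom numeric type.
 The class name GaussianInt is hypothetical and invented for this example,
 and it implements only a handful of the special methods, far fewer than
 a complete numeric type would need.
\layout LyX-Code
```python
class GaussianInt(object):
    """Hypothetical toy numeric type: a complex number with integer parts."""

    def __init__(self, real, imag):
        self.real = real
        self.imag = imag

    def __add__(self, other):
        # called for self + other
        return GaussianInt(self.real + other.real, self.imag + other.imag)

    def __mul__(self, other):
        # called for self * other: (a+bi)(c+di) = (ac-bd) + (ad+bc)i
        return GaussianInt(self.real * other.real - self.imag * other.imag,
                           self.real * other.imag + self.imag * other.real)

    def __eq__(self, other):
        # called for self == other
        return (self.real, self.imag) == (other.real, other.imag)

    def __repr__(self):
        # called when the interpreter displays the object
        return 'GaussianInt(%d, %d)' % (self.real, self.imag)


i = GaussianInt(0, 1)
print(i * i)
print(i + i)
```
\layout Standard
Because the class defines the multiplication and addition methods, the ordinary
 * and + operators work on its instances, and the product of i with itself
 is displayed as GaussianInt(-1, 0).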
\layout Section
\begin_inset LatexCommand \label{sec:intro_string}
\end_inset 
Strings
\layout Standard
We've encountered a number of types of objects above: int, float, long,
 complex, method/function and module.
 We'll continue our tour with an introduction to strings, which are critical
 components of almost every program.
 You can create strings in a number of different ways, with single quotes,
 double quotes, or triple quotes -- this diversity of methods makes it easy
 if you need to embed quote characters in the string itself
\layout LyX-Code
\color blue
# single, double and triple quoted strings
\layout LyX-Code
>>> s = 'Hi Mom!'
\layout LyX-Code
>>> s = "Hi Mom!"
\layout LyX-Code
>>> s = """Porky said, "That's all folks!" """
\layout Standard
You can add strings together to concatenate them
\layout LyX-Code
\color blue
# concatenating strings
\layout LyX-Code
>>> first = 'John'
\layout LyX-Code
>>> last = 'Hunter'
\layout LyX-Code
>>> first+last
\layout LyX-Code
'JohnHunter'
\layout Standard
or call string methods to process them: upcase them or downcase them, or
 replace one character with another
\layout LyX-Code
\color blue
# string methods
\layout LyX-Code
>>> last.lower()
\layout LyX-Code
'hunter'
\layout LyX-Code
>>> last.upper()
\layout LyX-Code
'HUNTER'
\layout LyX-Code
>>> last.replace('h', 'p')
\layout LyX-Code
'Hunter'
\layout LyX-Code
>>> last.replace('H', 'P')
\layout LyX-Code
'Punter' 
\layout Standard
Note that in all of these examples, the string 
\family typewriter 
last
\family default 
 is unchanged.
 All of these methods operate on the string and return a new string, leaving
 the original unchanged.
 In fact, python strings cannot be changed by any python code at all: they
 are 
\shape italic 
immutable
\shape default 
 (unchangeable).
 The concept of mutable and immutable objects in python is an important
 one, and it will come up again, because only immutable objects can be used
 as keys in python dictionaries and elements of python sets.
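\layout Standard
A small sketch of what immutability means in practice is given below.
 The dictionary and its contents are made up purely for illustration, and
 dictionaries themselves are only covered later.
\layout LyX-Code
```python
last = 'Hunter'

# item assignment is not supported by strings
try:
    last[0] = 'P'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# the way to "change" a string is to build a new one and bind it to a name
punter = last.replace('H', 'P')

# immutable objects such as strings can serve as dictionary keys
counts = {last: 1}    # hypothetical data for illustration

print(assignment_failed)
print(last)
print(punter)
print(counts['Hunter'])
```
\layout Standard
The attempted assignment raises a TypeError, and the original string is still
 intact afterwards.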
\layout Standard
You can access individual characters, or slices of the string (substrings),
 using indexing.
 A string is a sequence of characters, and strings implement the sequence
 protocol in python -- we'll see more examples of python sequences later
 -- and all sequences have the same syntax for accessing their elements.
 Python uses 0 based indexing which means the first element is at index
 0; you can use negative indices to access the last elements in the sequence
\layout LyX-Code
\color blue
# string indexing
\layout LyX-Code
>>> last = 'Hunter'
\layout LyX-Code
>>> last[0]
\layout LyX-Code
'H'
\layout LyX-Code
>>> last[1]
\layout LyX-Code
'u'
\layout LyX-Code
>>> last[-1] 
\layout LyX-Code
'r' 
\layout Standard
To access substrings, or generically in terms of the sequence protocol,
 slices, you use a colon to indicate a range
\layout LyX-Code
\color blue
# string slicing
\layout LyX-Code
>>> last[0:2]
\layout LyX-Code
'Hu'
\layout LyX-Code
>>> last[2:4]
\layout LyX-Code
'nt'
\layout Standard
As this example shows, python uses 
\begin_inset Quotes eld
\end_inset 
one-past-the-end
\begin_inset Quotes erd
\end_inset 
 indexing when defining a range; eg, in the range 
\family typewriter 
indmin:indmax
\family default 
, the element at index 
\family typewriter 
indmax
\family default 
 is not included.
 You can use negative indices when slicing too; eg, to get everything before
 the last character
\layout LyX-Code
>>> last[0:-1]
\layout LyX-Code
'Hunte'
\layout Standard
You can also leave out either the min or max indicator; if they are left
 out, 0 is assumed to be the 
\family typewriter 
indmin
\family default 
 and one past the end of the sequence is assumed to be 
\family typewriter 
indmax
\layout LyX-Code
>>> last[:3]
\layout LyX-Code
'Hun'
\layout LyX-Code
>>> last[3:]
\layout LyX-Code
'ter'
\layout Standard
There is a third number that can be placed in a slice, a step, with syntax
 indmin:indmax:step; eg, a step of 2 will skip every second letter
\layout LyX-Code
>>> last[1:6:2]
\layout LyX-Code
'utr'
\layout Standard
Although this may be more than you want to know about slicing strings, the
 time spent here is worthwhile.
 As mentioned above, all python sequences obey these rules.
 In addition to strings, lists and tuples, which are built-in python sequence
 data types and are discussed in the next section, the numeric arrays widely
 used in scientific computing also implement the sequence protocol, and
 thus have the same slicing rules.
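\layout Standard
The following sketch applies the same slice syntax to a string, a list and
 a tuple.
 The variable names are illustrative, and lists and tuples are only introduced
 properly in the next section.
\layout LyX-Code
```python
last = 'Hunter'
letters = list(last)          # the characters of the string as a list
pair = ('John', 'Hunter', 3.0)

print(last[-3:])              # the last three characters
print(last[::2])              # every second character, starting at index 0
print(last[::-1])             # a negative step walks the sequence backwards
print(letters[1:6:2])         # the same slice syntax on a list gives a list
print(pair[:2])               # and on a tuple gives a tuple
```
\layout Standard
In each case the slice has the same type as the sequence it was taken from.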
\layout Exercise
What would you expect last[:] to return?
\layout Standard
One thing that comes up all the time is the need to create strings out of
 other strings and numbers, eg to create filenames from a combination of
 a base directory, some base filename, and some numbers.
 Scientists like to create lots of data files and then write code to
 loop over these files and analyze them.
 We're going to show how to do that, starting with the newbie way and
 progressively building up to the way of the python zen master.
 All of the methods below 
\shape italic 
work
\shape default 
, but the zen master way will be more efficient, more scalable (eg to larger
 numbers of files) and cross-platform.
\begin_inset Foot
collapsed false
\layout Standard
\begin_inset Quotes eld
\end_inset 
But it works
\begin_inset Quotes erd
\end_inset 
 is a common defense of bad code; my rejoinder to this is 
\begin_inset Quotes eld
\end_inset 
A computer scientist is someone who fixes things that aren't broken
\begin_inset Quotes erd
\end_inset 
.
 
\end_inset 
 Here's the newbie way: we also introduce the for-loop here in the spirit
 of diving into python -- note that python uses whitespace indentation to
 delimit the for-loop code block
\layout LyX-Code
\color blue
# The newbie way
\layout LyX-Code
for i in (1,2,3,4):
\layout LyX-Code
    fname = 'data/myexp0' + str(i) + '.dat'
\layout LyX-Code
    print fname
\layout Standard
Now as promised, this will print out the 4 file names, but it has
 three flaws: it doesn't scale to 10 or more files, it is inefficient, and
 it is not cross platform.
 It doesn't scale because it hard-codes the '
\family typewriter 
0
\family default 
' after 
\family typewriter 
myexp
\family default 
, it is inefficient because to add several strings requires the creation
 of temporary strings, and it is not cross-platform because it hard-codes
 the directory separator '/'.
\layout LyX-Code
\color blue
# On the path to enlightenment
\layout LyX-Code
for i in (1,2,3,4):
\layout LyX-Code
    fname = 'data/myexp%02d.dat'%i
\layout LyX-Code
    print fname
\layout Standard
This example uses string interpolation, the funny % thing.
 If you are familiar with C programming, this will be no surprise to you
 (on linux/unix systems do 
\family typewriter 
man sprintf 
\family default 
at the unix shell).
 The percent character is a string formatting character: 
\family typewriter 
%02d
\family default 
 means to take an integer (the 
\family typewriter 
d
\family default 
 part) and print it with at least two digits, padding with zeros on the left (the
\family typewriter 
 %02
\family default 
 part).
 There is more to be said about string interpolation, but let's finish the
 job at hand.
 This example is better than the newbie way because it scales up to files
 numbered 0-99, and it is more efficient because it avoids the creation
 of temporary strings.
 For the platform independent part, we go to the python standard library
 
\family typewriter 
os.path
\family default 
, which provides a host of functions for platform-independent manipulations
 of filenames, extensions and paths.
 Here we use 
\family typewriter 
os.path.join
\family default 
 to combine the directory with the filename in a platform independent way.
 On windows, it will use the windows path separator '
\backslash 
' and on unix it will use '/'.
\layout LyX-Code
\color blue
# the zen master approach
\layout LyX-Code
import os
\layout LyX-Code
for i in (1,2,3,4):
\layout LyX-Code
    fname = os.path.join('data', 'myexp%02d.dat'%i)
\layout LyX-Code
    print fname
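\layout Standard
Besides joining paths, the same module can take them apart again.
 The sketch below shows a few of these functions on one file name; the file
 name is the one used in the examples above, and the variable names are
 invented for the illustration.
\layout LyX-Code
```python
import os

fname = os.path.join('data', 'myexp%02d.dat' % 3)

dirname = os.path.dirname(fname)         # directory part
basename = os.path.basename(fname)       # file name without the directory
root, ext = os.path.splitext(basename)   # split off the extension

print(fname)
print(dirname)
print(basename)
print(root)
print(ext)
```
\layout Standard
Only the first printed line depends on the platform, since it contains the
 path separator.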
\layout Exercise
Suppose you have data files named like
\layout LyX-Code
data/2005/exp0100.dat
\layout LyX-Code
data/2005/exp0101.dat
\layout LyX-Code
data/2005/exp0102.dat
\layout LyX-Code
...
\layout LyX-Code
data/2005/exp1000.dat
\layout Standard
Write the python code that iterates over these files, constructing the filenames
 as strings and using 
\family typewriter 
os.path.join
\family default 
 to construct the paths in a platform-independent way.
 
\shape italic 
Hint
\shape default 
: read the help for 
\family typewriter 
os.path.join
\family default 
!
\layout Standard
OK, I promised to torture you a bit more with string interpolation -- don't
 worry, I remembered.
 The ability to properly format your data when printing it is crucial in
 scientific endeavors: how many significant digits do you want, do you want
 to use integer, floating point representation or exponential notation?
 These three choices are provided with 
\family typewriter 
%d
\family default 
, 
\family typewriter 
%f
\family default 
 and 
\family typewriter 
%e
\family default 
, with lots of variations on the theme to indicate precision and more
\layout LyX-Code
>>> 'warm for %d minutes at %1.1f C' % (30, 37.5)
\layout LyX-Code
'warm for 30 minutes at 37.5 C'
\layout LyX-Code
\layout LyX-Code
>>> 'The mass of the sun is %1.4e kg'% (1.98892*10**30)
\layout LyX-Code
'The mass of the sun is 1.9889e+30 kg'
\layout LyX-Code
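\layout Standard
A few of the variations on the theme are sketched below: field width, precision,
 zero padding, left justification and an explicit sign.
 The numbers are arbitrary values chosen for the example.
\layout LyX-Code
```python
value = 3.14159
count = 42
big = 12345.678

fixed = '%5.2f' % value        # width 5, two digits after the decimal point
padded = '%08.3f' % value      # width 8, zero padded, three decimals
scientific = '%.3e' % big      # exponential notation, three decimals
left = '%-6d|' % count         # left justified in a field of width 6
signed = '%+d' % count         # always show the sign

for text in (fixed, padded, scientific, left, signed):
    print(repr(text))
```
\layout Standard
Printing the repr of each string keeps the padding spaces visible in the
 output.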
\layout Standard
There are two string methods, 
\family typewriter 
split
\family default 
 and 
\family typewriter 
join
\family default 
, that arise frequently in numeric processing, specifically in the context
 of processing data files that have comma, tab, or space separated numbers
 in them.
 
\family typewriter 
split
\family default 
 takes a single string and splits it on the indicated character into a sequence
 of strings.
 This is useful to take a single line of space or comma separated values
 and split them into individual numbers
\layout LyX-Code
\color blue
# s is a single string and we split it into a list of strings
\layout LyX-Code
\color blue
# for further processing
\layout LyX-Code
>>> s = '1.0 2.0 3.0 4.0 5.0'
\layout LyX-Code
>>> s.split(' ')
\layout LyX-Code
['1.0', '2.0', '3.0', '4.0', '5.0']
\layout Standard
The return value, with square brackets, indicates that python has returned
 a list of strings.
 These individual strings need further processing to convert them into actual
 floats, but that is the first step.
 The conversion to floats will be discussed in the next section, when we
 learn about list comprehensions.
 The converse method is join, which is often used to create string output
 to an ASCII file from a list of numbers.
 In this case you want to join a list of numbers into a single line for
 printing to a file.
 The example below will be clearer after the next section, in which lists
 are discussed
\layout LyX-Code
\color blue
# vals is a list of floats and we convert it to a single
\layout LyX-Code
\color blue
# space separated string
\layout LyX-Code
>>> vals = [1.0, 2.0, 3.0, 4.0, 5.0]
\layout LyX-Code
>>> ' '.join([str(val) for val in vals])
\layout LyX-Code
'1.0 2.0 3.0 4.0 5.0'
\layout Standard
There are two new things in the example above.
 One, we called the join method directly on a string itself, and not on
 a variable name.
 In the previous examples, we always used the name of the object when
 accessing attributes, eg 
\family typewriter 
x.real
\family default 
 or 
\family typewriter 
s.upper()
\family default 
.
 In this example, we call the 
\family typewriter 
join
\family default 
 method on the string which is a single space.
 The second new feature is that we use a list comprehension 
\family typewriter 
[str(val) for val in vals] 
\family default 
as the argument to 
\family typewriter 
join
\family default 
.
 
\family typewriter 
join
\family default 
 requires a sequence of strings, and the list comprehension converts a list
 of floats to a list of strings.
 This can be confusing at first, so don't despair if it is.
 But it is worth bringing up early because list comprehensions are a very
 useful feature of python.
 To help elucidate, compare 
\family typewriter 
vals
\family default 
, which is a list of floats, with the conversion of 
\family typewriter 
vals
\family default 
 to a list of strings using list comprehensions in the next line
\layout LyX-Code
\color blue
# converting a list of floats to a list of strings
\layout LyX-Code
>>> vals
\layout LyX-Code
[1.0, 2.0, 3.0, 4.0, 5.0]
\layout LyX-Code
>>> [str(val) for val in vals] 
\layout LyX-Code
['1.0', '2.0', '3.0', '4.0', '5.0']
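\layout Standard
The round trip from a line of text to floats and back to text can be sketched
 as follows.
 This is only an illustration: the sample line and the names line, strings,
 vals and out are made up for the example.

```python
# a made-up line of space separated numbers, as it might be read from a file
line = '1.0 2.0 3.0 4.0 5.0'

# split on white space to get a list of strings
strings = line.split()

# convert each string to a float with a list comprehension
vals = [float(s) for s in strings]

# double the values, then join them back into a single output line
out = ' '.join([str(2 * val) for val in vals])
print(out)
```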
\layout Section
The basic python data structures
\begin_inset OptArg
collapsed true
\layout Standard
Data Structures
\end_inset 
\layout Standard
Strings, covered in the last section, are sequences of characters.
 Python has two additional built-in sequence types which can hold arbitrary
 elements: tuples and lists.
 Tuples are created using parentheses, and lists are created using square
 brackets
\layout LyX-Code
\color blue
# a tuple and a list of elements of the same type
\layout LyX-Code
\color blue
# (homogeneous)
\layout LyX-Code
>>> t = (1,2,3,4)  # tuple
\layout LyX-Code
>>> l = [1,2,3,4]  # list
\layout Standard
Both tuples and lists can also be used to hold elements of different types
\layout LyX-Code
\color blue
# a tuple and list of int, string, float
\layout LyX-Code
>>> t = (1,'john', 3.0)
\layout LyX-Code
>>> l = [1,'john', 3.0]
\layout Standard
Tuples and lists have the same indexing and slicing rules as each other,
 and as the strings discussed above, because all three implement the python sequence
 protocol, with the only difference being that tuple slices return tuples
 (indicated by the parentheses below) and list slices return lists (indicated
 by the square brackets)
\layout LyX-Code
# indexing and slicing tuples and lists
\layout LyX-Code
>>> t[0]
\layout LyX-Code
1
\layout LyX-Code
>>> l[0]
\layout LyX-Code
1
\layout LyX-Code
>>> t[:-1]
\layout LyX-Code
(1, 'john')
\layout LyX-Code
>>> l[:-1]
\layout LyX-Code
[1, 'john']
\layout Standard
So why the difference between tuples and lists? A number of explanations
 have been offered on the mailing lists, but the only one that makes a difference
 to me is that tuples are immutable, like strings, and hence can be used
 as keys to python dictionaries and included as elements of sets, whereas
 lists are mutable and cannot.
 So a tuple, once created, can never be changed, but a list can.
 For example, if we try to reassign the first element of the tuple above,
 we get an error
\layout LyX-Code
>>> t[0] = 'why not?'
\layout LyX-Code
Traceback (most recent call last):
\layout LyX-Code
 File "<stdin>", line 1, in ?
\layout LyX-Code
TypeError: object doesn't support item assignment
\layout Standard
But the same operation is perfectly acceptable for lists
\layout LyX-Code
>>> l[0] = 'why not?'
\layout LyX-Code
>>> l
\layout LyX-Code
['why not?', 'john', 3.0]
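\layout Standard
The remark above that tuples can be used as dictionary keys while lists cannot
 is easy to check.
 Dictionaries are covered later in this section; the sketch below only shows
 the difference in behavior, and the name grid and its contents are made
 up for illustration.

```python
# tuples are immutable and hashable, so they can serve as dictionary keys
grid = {}
grid[(0, 1)] = 'north'
grid[(1, 0)] = 'east'

# lists are mutable, so python refuses to use them as keys
try:
    grid[[0, 1]] = 'north'
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(list_key_ok)
```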
\layout Standard
Lists also have a lot of methods; tuples have none, save the special double
 underscore methods that are required for python objects and sequences
\layout LyX-Code
\color blue
# tuples contain only 
\begin_inset Quotes eld
\end_inset 
hidden
\begin_inset Quotes erd
\end_inset 
 double underscore methods
\layout LyX-Code
>>> dir(t)
\layout LyX-Code
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
 '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__',
 '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
 '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
 '__rmul__', '__setattr__', '__str__']
\layout LyX-Code
\layout LyX-Code
\color blue
# but lists contain other methods, eg append, extend and
\layout LyX-Code
\color blue
# reverse
\layout LyX-Code
>>> dir(l)
\layout LyX-Code
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delsli
ce__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__',
 '__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__',
 '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__',
 '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__setite
m__', '__setslice__', '__str__', 'append', 'count', 'extend', 'index', 'insert',
 'pop', 'remove', 'reverse', 'sort']
\layout Standard
Many of these list methods change, or mutate, the list: 
\family typewriter 
append
\family default 
 adds an element to the list, 
\family typewriter 
extend
\family default 
 extends the list with a sequence of elements, 
\family typewriter 
sort
\family default 
 sorts the list in place, 
\family typewriter 
reverse
\family default 
 reverses it in place, 
\family typewriter 
pop
\family default 
 takes an element off the list and returns it.
\layout Standard
We've seen a couple of examples of creating a list above -- let's look at
 some more using list methods
\layout LyX-Code
>>> x = []                   
\color blue
# create the empty list
\layout LyX-Code
>>> x.append(1)              
\color blue
# add the integer one to it
\layout LyX-Code
>>> x.extend(['hi', 'mom'])  
\color blue
# append two strings to it
\layout LyX-Code
>>> x
\layout LyX-Code
[1, 'hi', 'mom']
\layout LyX-Code
>>> x.reverse()              
\color blue
# reverse the list, in place
\layout LyX-Code
>>> x
\layout LyX-Code
['mom', 'hi', 1]
\layout LyX-Code
>>> len(x)
\layout LyX-Code
3
\layout Standard
We mentioned list comprehensions in the last section when discussing string
 methods.
 List comprehensions are a way of creating a list using a for loop in a
 single line of python.
 Let's create a list of the cubes of the integers from 1 to 9, first with a for
 loop and then with a list comprehension.
 The list comprehension code will not only be shorter and more elegant,
 it is often faster too (the dots are the indentation block indicator from
 the python shell and should not be typed)
\layout LyX-Code
\color blue
# a list of perfect cubes using a for-loop
\layout LyX-Code
>>> cubes = []
\layout LyX-Code
>>> for i in range(1,10):
\layout LyX-Code
...
     cubes.append(i**3)
\layout LyX-Code
...
 
\layout LyX-Code
>>> cubes
\layout LyX-Code
[1, 8, 27, 64, 125, 216, 343, 512, 729]
\layout LyX-Code
\layout LyX-Code
\color blue
# functionally equivalent code using list comprehensions
\layout LyX-Code
>>> cubes = [i**3 for i in range(1,10)]
\layout LyX-Code
>>> cubes
\layout LyX-Code
[1, 8, 27, 64, 125, 216, 343, 512, 729]
\layout Standard
The list comprehension code is usually faster, although both versions are
 executed by the python interpreter.
 In the simple for-loop version, python has to look up and call the list's
 append method to add the cube of 
\family typewriter 
i
\family default 
 on every pass through the loop.
 In the list comprehension example, the interpreter builds the list with a
 dedicated instruction and skips that method lookup and call.
 The difference in speed depends on how much work is done per element and is
 often modest, but the list comprehension example is shorter and more elegant
 to boot.
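\layout Standard
Speed claims like this one are best checked on your own machine.
 Below is a minimal sketch using the timeit module from the standard library;
 the function names cubes_loop and cubes_comp are made up for the example,
 and the measured times will vary with the machine and the python version.

```python
import timeit

def cubes_loop(n):
    'build the cubes of 1..n-1 with an explicit for-loop'
    cubes = []
    for i in range(1, n):
        cubes.append(i**3)
    return cubes

def cubes_comp(n):
    'build the same list with a list comprehension'
    return [i**3 for i in range(1, n)]

# time 200 calls of each version; the numbers depend on machine and version
t_loop = timeit.timeit(lambda: cubes_loop(1000), number=200)
t_comp = timeit.timeit(lambda: cubes_comp(1000), number=200)
print('loop %1.4f s, comprehension %1.4f s' % (t_loop, t_comp))
```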
\layout Standard
The remaining essential built-in data structure in python is the dictionary,
 which is an associative array that maps arbitrary immutable objects to
 arbitrary objects.
 int, long, float, string and tuple are all immutable and can be used as
 keys to a dictionary; list and dict are mutable and cannot.
 A dictionary takes one kind of object as the key, and this key points to
 another object which is the value.
 In a contrived but easy to comprehend example, one might map names to
 ages
\layout LyX-Code
>>> ages = {}            
\color blue
# create an empty dict
\layout LyX-Code
>>> ages['john'] = 36
\layout LyX-Code
>>> ages['fernando'] = 33
\layout LyX-Code
>>> ages                 
\color blue
# view the whole dict
\layout LyX-Code
{'john': 36, 'fernando': 33}
\layout LyX-Code
>>> ages['john']
\layout LyX-Code
36
\layout LyX-Code
>>> ages['john'] = 37    
\color blue
# reassign john's age
\layout LyX-Code
>>> ages['john']
\layout LyX-Code
37
\layout Standard
Dictionary lookup is very fast; Tim Peters once joked that any python program
 which uses a dictionary is automatically 10 times faster than any C program,
 which is of course false, but makes two worthy points in jest: dictionary
 lookup is fast, and dictionaries can be used for important optimizations,
 eg, creating a cache of frequently used values.
 As a simple example, suppose you needed to compute the cubes of
 numbers between 1 and 100 in an inner loop -- you could use a dictionary
 to cache the cubes of all odd numbers < 100; if you were interested
 in all numbers, you might simply use a list to store the cached cubes --
 I am caching only the odd numbers to show you how a dictionary can be
 used to represent a sparse data structure
\layout LyX-Code
\layout LyX-Code
>>> cubes = dict([ ( i, i**3 ) for i in range(1,100,2)])
\layout LyX-Code
>>> cubes[5]
\layout LyX-Code
125
\layout Standard
The last example is syntactically a bit challenging, but bears careful study.
 We are initializing a dictionary with a list comprehension.
 The list comprehension is made up of length 2 tuples 
\family typewriter 
(i, i**3)
\family default 
.
 When a dictionary is initialized with a sequence of length 2 tuples, it
 assumes the first element of the tuple 
\family typewriter 
i
\family default 
 is the 
\shape italic 
key
\shape default 
 and the second element 
\family typewriter 
i**3
\family default 
 is the 
\shape italic 
value
\shape default 
.
 Thus we have a lookup table from odd integers to their cubes.
 Creating dictionaries from list comprehensions as in this example is something
 that hard-core python programmers do almost every day, and you should too.
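\layout Standard
Because the table is sparse, only the odd keys are present, and asking for
 a missing key raises a KeyError.
 One way to use such a cache safely is sketched below; the helper function
 cube is hypothetical and is only meant to show the membership test with
 the in operator.

```python
# cache the cubes of the odd numbers below 100, as in the text
cubes = dict([(i, i**3) for i in range(1, 100, 2)])

def cube(i):
    'return i**3, using the cache when i is a cached key'
    if i in cubes:
        return cubes[i]      # cache hit: odd i below 100
    return i**3              # cache miss: compute directly

print(len(cubes))
```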
\layout Exercise
Create a lookup table of the product of all pairs of numbers less than 100.
 The key will be a tuple of the two numbers 
\family typewriter 
(i,j)
\family default 
 and the value will be the product.
 Hint: you can loop over multiple ranges in a list comprehension, eg 
\family typewriter 
[ something for i in range(Ni) for j in range(Nj)]
\layout Section
The Zen of Python
\begin_inset OptArg
collapsed true
\layout Standard
Zen
\end_inset 
\layout Exercise
\family typewriter 
>>> import this
\layout Section
Functions and classes
\layout Standard
You can define functions just about anywhere in python code.
 The typical function definition takes zero or more arguments, zero or more
 keyword arguments, and is followed by a documentation string and the function
 body, optionally returning a value.
 Here is a function to compute the hypotenuse of a right triangle
\layout LyX-Code
def hypot(base, height):
\layout LyX-Code
   'compute the hypotenuse of a right triangle'
\layout LyX-Code
   import math
\layout LyX-Code
   return math.sqrt(base**2 + height**2)
\layout Standard
As in the case of the for-loop, leading white space is significant and is
 used to delimit the start and end of the function.
 In the example below, x = 1 is not in the function, because it is not indented
\layout LyX-Code
def growone(l):
\layout LyX-Code
   'append 1 to a list l'
\layout LyX-Code
   l.append(1)
\layout LyX-Code
x = 1
\layout Standard
Note that this function does not return anything, because the append method
 modifies the list that was passed in.
 You should be careful when designing functions that have side effects such
 as modifying the structures that are passed in; they should be named and
 documented in such a way that these side effects are clear.
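\layout Standard
The difference between a function with a side effect and one without can
 be sketched as follows.
 The second function, withone, is a hypothetical alternative added for
 comparison; it returns a new list and leaves its argument alone.

```python
def growone(l):
    'append 1 to the list l in place; returns nothing'
    l.append(1)

def withone(l):
    'return a new list equal to l with 1 appended; l is not modified'
    return l + [1]

a = [0]
result = growone(a)     # a is modified in place, result is None
b = [0]
c = withone(b)          # b is untouched, c is a new list
```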
\layout Standard
Python is pretty flexible with functions: you can define functions within
 function definitions (just be mindful of your indentation), you can attach
 attributes to functions (like other objects), and you can pass functions as
 arguments to other functions.
 A keyword argument defines a default value for a function argument that
 the caller can override.
 Below is an example which provides a normalize keyword argument.
 The default argument is 
\family typewriter 
normalize=None
\family default 
; the value None is a standard python idiom which usually means either do
 the default thing or do nothing.
 If 
\family typewriter 
normalize
\family default 
 is not 
\family typewriter 
None
\family default 
, we assume it is a function that can be called to normalize our data
\layout LyX-Code
def psd(x, normalize=None):
\layout LyX-Code
    'compute the power spectral density of x'
\layout LyX-Code
    if normalize is not None: x = normalize(x)
\layout LyX-Code
   
\color blue
 # compute the power spectra of x and return it
\layout Standard
This function could be called with or without a 
\family typewriter 
normalize
\family default 
 keyword argument, since if the argument is not passed, the default of 
\family typewriter 
None
\family default 
 is used and no normalization is done.
\layout LyX-Code
\layout LyX-Code
\color blue
# no normalize argument; do the default thing
\layout LyX-Code
>>> psd(x)   
\layout LyX-Code
\layout LyX-Code
\color blue
# define a custom normalize function unitstd and pass it
\layout LyX-Code
\color blue
# to psd
\layout LyX-Code
>>> def unitstd(x): return x/std(x)
\layout LyX-Code
>>> psd(x, normalize=unitstd)
\layout LyX-Code
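\layout Standard
Since the psd function above is only outlined, here is a small self-contained
 sketch of the same keyword argument idiom that can actually be run.
 The functions meanpower and unitmax are made up for the example and work
 on plain lists of floats; they are not the spectral computation itself.

```python
def meanpower(x, normalize=None):
    'return the mean of the squared values of the sequence x'
    if normalize is not None:
        x = normalize(x)
    return sum([val**2 for val in x]) / len(x)

def unitmax(x):
    'scale the sequence x so that its largest absolute value is 1'
    peak = max([abs(val) for val in x])
    return [val / peak for val in x]

data = [1.0, -2.0, 4.0]
raw = meanpower(data)                        # default: no normalization
scaled = meanpower(data, normalize=unitmax)  # normalize first
```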
\layout Standard
In Section
\begin_inset LatexCommand \ref{sec:into_calculator}
\end_inset 
 we noticed that complex objects have the real and imag data attributes,
 and the conjugate method.
 An object is an instance of a class that defines it, and in python you
 can easily define your own classes.
 In that section, we emphasized that one of the important features of
 classes/objects is that they carry around their data and methods in a single
 bundle.
 Let's look at the mechanics of defining classes, and creating instances
 (a.k.a.
 objects) of these classes.
 Classes have a special double underscore method __init__ that is used as
 the function to initialize each new instance.
 For this example, we'll continue with the normalize theme above, but in
 this case the normalization requires some data parameters.
 This example arises when you want to normalize an image which may range
 over 0-255 (8 bit image) or from 0-65535 (16 bit image) to the 0-1 interval.
 For 16 bit images, you would normally divide everything by 65535, but you
 might want to configure this to a smaller number to enhance contrast if your
 data doesn't use the whole intensity range.
 For simplicity, let's suppose our normalize class is only interested in
 the pixel maximum, and will divide all the data by that value.
\layout LyX-Code
from __future__ import division  
\color blue
# make sure we do float division
\layout LyX-Code
class Normalize:
\layout LyX-Code
    """
\layout LyX-Code
    A class to normalize data by dividing it by a maximum value
\layout LyX-Code
    """
\layout LyX-Code
    def __init__(self, maxval):
\layout LyX-Code
        'maxval will be mapped to 1'
\layout LyX-Code
        self.maxval = maxval
\layout LyX-Code
    def __call__(self, data):
\layout LyX-Code
        'do the normalization'
\layout LyX-Code
        
\color blue
# in real life you would also want to clip all values of
\layout LyX-Code
\color blue
        # data>maxval so that the returned value will be in the unit
\layout LyX-Code
\color blue
        # interval
\layout LyX-Code
        return data/self.maxval
\layout Standard
The triple quoted string following the definition of class Normalize is
 the class documentation string, and it will be shown to the user when
 they do 
\family typewriter 
help(Normalize)
\family default 
.
 A commonly used convention is to name classes with 
\shape italic 
UpperCase
\shape default 
, but this is not required.
 self is a special variable that a class can use to refer to its own data
 and methods, and must be the first argument to all the class methods.
 The 
\family typewriter 
__init__
\family default 
 method stores the normalization value maxval as an instance attribute in 
\family typewriter 
self.maxval
\family default 
, and this value can later be reused by other class methods (as it is in
 
\family typewriter 
__call__
\family default 
) and it can be altered by the user of the class, as we will illustrate below.
 The 
\family typewriter 
__call__
\family default 
 method is another piece of python double underscore magic; it allows class
 instances to be used as 
\shape italic 
functions
\shape default 
, ie you can call them just like you can call any function.
 OK, now let's see how you could use this.
 
\layout Standard
The first line below creates an 
\shape italic 
instance
\shape default 
 of the 
\shape italic 
class
\shape default 
 
\family typewriter 
Normalize
\family default 
, and the special method 
\family typewriter 
__init__
\family default 
 is implicitly called.
 The second line implicitly calls the special 
\family typewriter 
__call__
\family default 
method
\layout LyX-Code
>>> norm = Normalize(65535) 
\color blue
# good for 16 bit images
\layout LyX-Code
>>> norm(255)               
\color blue
# call this function
\layout LyX-Code
0.0038910505836575876
\layout LyX-Code
\layout LyX-Code
\color blue
# We can reset the maxval attribute, and the call method 
\layout LyX-Code
\color blue
# is automagically updated
\layout LyX-Code
>>> norm.maxval = 255       
\color blue
# reset the maxval
\layout LyX-Code
>>> norm(255)               
\color blue
# and call it again
\layout LyX-Code
1.0
\layout LyX-Code
\layout LyX-Code
\color blue
# We can pass the norm instance to the psd function we defined above, which
 
\layout LyX-Code
\color blue
# is expecting a function
\layout LyX-Code
>>> psd(x, normalize=norm)            
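\layout Standard
The comment in the class above notes that in real life you would also clip
 values larger than maxval.
 A minimal sketch of that idea follows, assuming the data is a plain python
 list of numbers rather than an array; the class name ClipNormalize is made
 up for the example.

```python
class ClipNormalize:
    'normalize a sequence of numbers by maxval, clipping to the unit interval'
    def __init__(self, maxval):
        'maxval will be mapped to 1'
        self.maxval = maxval
    def __call__(self, data):
        'return a list with each value of data scaled and clipped to 0-1'
        scaled = [val / float(self.maxval) for val in data]
        return [min(max(val, 0.0), 1.0) for val in scaled]

norm = ClipNormalize(255)
pixels = norm([0, 51, 255, 300])   # 300 exceeds maxval and is clipped
```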
\layout Exercise
Pretend that 
\family typewriter 
complex
\family default 
 were not built-in to the python core, and write your own complex class
 
\family typewriter 
MyComplex
\family default 
.
 Provide 
\family typewriter 
real
\family default 
 and 
\family typewriter 
imag
\family default 
 attributes and the 
\family typewriter 
conjugate
\family default 
 method.
 Define
\family typewriter 
 __abs__
\family default 
, 
\family typewriter 
__mul__
\family default 
 and 
\family typewriter 
__add__
\family default 
 to implement the absolute value of complex numbers, multiplication of complex
 numbers and addition of complex numbers.
 See the API definition of the python number protocol; although this is
 written for C programmers, it contains information about the required function
 call signatures for each of the double underscore methods that define the
 number protocol in python; where they use 
\family typewriter 
o1
\family default 
 on that page, you would use 
\family typewriter 
self
\family default 
 in python, and where they use 
\family typewriter 
o2
\family default 
 you might use 
\family typewriter 
other
\family default 
 in python.
\begin_inset Foot
collapsed true
\layout Standard
http://www.python.org/doc/current/api/number.html
\end_inset 
 To get you started, I'll show you what the 
\family typewriter 
__add__
\family default 
 method should look like
\layout LyX-Code
\color blue
# An example double underscore method required in your MyComplex
\layout LyX-Code
\color blue
# implementation
\layout LyX-Code
def __add__(self, other):
\layout LyX-Code
    'add self to other and return a new MyComplex instance'
\layout LyX-Code
    r = self.real + other.real
\layout LyX-Code
    i = self.imag + other.imag
\layout LyX-Code
    return MyComplex(r,i)
\layout LyX-Code
\layout LyX-Code
\color blue
# When you are finished, test your implementation with 
\layout LyX-Code
>>> x = MyComplex(2,3)
\layout LyX-Code
>>> y = MyComplex(0,1)
\layout LyX-Code
>>> x.real
\layout LyX-Code
2.0
\layout LyX-Code
>>> y.imag
\layout LyX-Code
1.0
\layout LyX-Code
>>> x.conjugate()
\layout LyX-Code
(2-3j)
\layout LyX-Code
>>> x+y
\layout LyX-Code
(2+4j)
\layout LyX-Code
>>> x*y
\layout LyX-Code
(-3+2j)
\layout LyX-Code
>>> abs(x*y)
\layout LyX-Code
3.6055512754639891
\layout LyX-Code
\layout Section
Files and file like objects
\begin_inset OptArg
collapsed true
\layout Standard
Files
\end_inset 
\layout Standard
Working with files is one of the most common and important things we do
 in scientific computing because that is usually where the data lives.
 In Section
\begin_inset LatexCommand \ref{sec:intro_string}
\end_inset 
, we went through the mechanics of automatically building file names like
\layout LyX-Code
data/myexp01.dat
\layout LyX-Code
data/myexp02.dat
\layout LyX-Code
data/myexp03.dat
\layout LyX-Code
data/myexp04.dat
\layout Standard
but we didn't actually do anything with these files.
 Here we'll show how to read in the data and do something with it.
 Python makes working with files easy and dare I say fun.
 The test data set lives in 
\family typewriter 
data/family.csv
\family default 
 and is a standard comma separated value file that contains information
 about my family: first name, last name, age, weight in kg, height in cm
 and birthdate.
 We'll open this file and parse it -- note that python has a standard module
 for parsing CSV files that is much more sophisticated than what I am doing
 here.
 Nevertheless, it serves as an easy to understand example that is close
 enough to real life that it is worth doing.
 Here is what the data file looks like
\layout LyX-Code
First,Last,Age,Weight,Height,Birthday
\layout LyX-Code
John,Hunter,36,175,180,1968-03-05
\layout LyX-Code
Miriam,Sierig,33,135,177,1971-05-04
\layout LyX-Code
Rahel,Hunter,7,55,134,1998-02-25
\layout LyX-Code
Ava,Hunter,3,45,121,2001-04-26
\layout LyX-Code
Clara,Hunter,0,15,55,2004-10-02
\layout Standard
Here is the code to parse that file
\layout LyX-Code
\color blue
# open the file for reading
\layout LyX-Code
fh = file('../data/family.csv', 'r')
\layout LyX-Code
\color blue
# slurp the header, splitting on the comma
\layout LyX-Code
headers = fh.readline().split(',')
\layout LyX-Code
\color blue
# now loop over the remaining lines in the file and parse them
\layout LyX-Code
for line in fh:
\layout LyX-Code
    
\color blue
# remove any leading or trailing white space
\layout LyX-Code
    line = line.strip()
\layout LyX-Code
    
\color blue
# split the line on the comma into separate variables
\layout LyX-Code
    first, last, age, weight, height, dob = line.split(',')
\layout LyX-Code
    
\color blue
# convert some of these strings to floats
\layout LyX-Code
    age, weight, height = [float(val) for val in (age, weight, height)]
\layout LyX-Code
    print first, last, age, weight, height, dob
\layout Standard
This example illustrates several interesting things.
 The syntax for opening a file is 
\family typewriter 
file(filename, mode)
\family default 
 and the 
\family typewriter 
mode
\family default 
 is a string like 
\family typewriter 
'r'
\family default 
 or 
\family typewriter 
'w'
\family default 
 that determines whether you are opening in read or write mode.
 You can also read and write binary files with 
\family typewriter 
'rb'
\family default 
 and
\family typewriter 
 'wb'
\family default 
.
 There are more options and you should do 
\family typewriter 
help(file)
\family default 
 to learn about them.
 We then use the file 
\family typewriter 
readline
\family default 
 method to read in the first line of the file.
 This returns a string (the line of text) and we call the string method
 
\family typewriter 
split(',')
\family default 
 to split that string wherever it sees a comma, and this returns a list
 of strings which are the headers
\layout LyX-Code
>>> headers
\layout LyX-Code
['First', 'Last', 'Age', 'Weight', 'Height', 'Birthday
\backslash 
n']
\layout Standard
The new line character 
\family typewriter 
'
\backslash 
n'
\family default 
 at the end of 
\family typewriter 
'Birthday
\backslash 
n'
\family default 
 indicates we forgot to strip the string of whitespace.
 To fix that, we should have done
\layout LyX-Code
>>> headers = fh.readline().strip().split(',')
\layout LyX-Code
>>> headers
\layout LyX-Code
['First', 'Last', 'Age', 'Weight', 'Height', 'Birthday'] 
\layout Standard
Notice how this works like a pipeline: 
\family typewriter 
fh.readline 
\family default 
returns a line of text as a string; we call the string method 
\family typewriter 
strip
\family default 
 which returns a string with all white space (spaces, tabs, newlines) removed
 from the left and right; we then call the 
\family typewriter 
split
\family default 
 method on this stripped string to split it into a list of strings.
\layout Standard
Next we start to loop over the file -- this is a nice feature of python
 file handles, you can iterate over them as a sequence.
 We've learned our lesson about trailing newlines, so we first strip the
 line with 
\family typewriter 
line = line.strip()
\family default 
.
 The rest is string processing, splitting the line on a comma as we did
 for the headers, and converting the strings to numbers where appropriate
 by calling 
\family typewriter 
float(val)
\family default 
 for each of 
\family typewriter 
age
\family default 
, 
\family typewriter 
weight
\family default 
 and 
\family typewriter 
height
\family default 
.
 Notice how we use list comprehensions and tuple unpacking in the line 
\family typewriter 
age, weight, height = [float(val) for val in (age, weight, height)] 
\family default 
to convert several values at once.
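\layout Standard
The standard csv module mentioned earlier can do the splitting for us.
 Here is a sketch of the same parsing with csv.reader, assuming a Python
 3 interpreter; to keep it self-contained, the first rows of the sample
 data are held in a string and wrapped in io.StringIO instead of being read
 from the data file.

```python
import csv
import io

# the first rows of the sample data, held in a string instead of a file
text = """First,Last,Age,Weight,Height,Birthday
John,Hunter,36,175,180,1968-03-05
Miriam,Sierig,33,135,177,1971-05-04
"""

fh = io.StringIO(text)          # a file-like object wrapping the string
reader = csv.reader(fh)
headers = next(reader)          # the first row is the header
rows = []
for first, last, age, weight, height, dob in reader:
    # convert the numeric fields from strings to floats
    age, weight, height = [float(val) for val in (age, weight, height)]
    rows.append((first, last, age, weight, height, dob))
```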
\layout Standard
Now that we have all this data, how might we store it?
 We could store it in a 
\family typewriter 
results
\family default 
 list
\layout LyX-Code
results = []
\layout LyX-Code
for line in fh:
\layout LyX-Code
    
\color blue
# process the line as above to get the variables
\layout LyX-Code
    results.append( (first, last, age, weight, height, dob) )
\layout LyX-Code
\layout LyX-Code
\layout LyX-Code
\color blue
# and later when we want to analyze the data
\layout LyX-Code
for first, last, age, weight, height, dob in results:
\layout LyX-Code
    
\color blue
# do something with the data
\layout Exercise
\family typewriter 
zip
\family default 
 magic.
  Python has a nice function 
\family typewriter 
zip
\family default 
 that lets you do very useful things with lists of tuples.
  
\family typewriter 
results
\family default 
 above is a list of tuples -- each tuple is the 
\family typewriter 
first
\family default 
, 
\family typewriter 
last
\family default 
, 
\family typewriter 
age
\family default 
, 
\family typewriter 
weight
\family default 
, 
\family typewriter 
height
\family default 
, 
\family typewriter 
dob
\family default 
 for a family member.
  What happens if you do 
\layout LyX-Code
>>> first, last, age, weight, height, dob = zip(*results)
\layout Standard
What is 
\family typewriter 
age
\family default 
 now?
\layout Exercise
Write a class 
\family typewriter 
Person
\family default 
 and store the attributes 
\family typewriter 
first
\family default 
, 
\family typewriter 
last
\family default 
, 
\family typewriter 
age
\family default 
, 
\family typewriter 
weight
\family default 
, 
\family typewriter 
height
\family default 
, 
\family typewriter 
dob
\family default 
 in that class.
  Add a class instance to the results list, eg
\layout LyX-Code
results.append(Person(first, last, age, weight, height, dob))
\layout Standard
Python also has a special syntax for printing to an open writable file object
\layout LyX-Code
\color blue
# open the file for writing
\layout LyX-Code
outfile = file('mydata.data', 'w') 
\layout LyX-Code
for x,y,z in myresults:
\layout LyX-Code
    print >> outfile, '%1.3f %1.3f %1.3f'%(x,y,z)
\layout Standard
Another really nice thing about file objects is that other classes can implement
 the file protocol and allow you to use them as if they were files.
 For example, the StringIO module in the standard library allows you to
 read and write to strings as if they were files.
 The urllib.urlopen function allows you to open a remote web page as a file
 object.
 Try this
\layout LyX-Code
\color blue
# loop over the lines in google's html
\layout LyX-Code
from urllib import urlopen
\layout LyX-Code
for line in urlopen('http://www.google.com').readlines():
\layout LyX-Code
    print line,
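\layout Standard
The file protocol can be tried without touching the disk or the network.
 The sketch below assumes a Python 3 interpreter, where print is a function
 that takes a file argument and the in-memory text buffer lives in io.StringIO;
 the helper count_lines is made up for the example and accepts any object
 that can be iterated over line by line.

```python
import io

def count_lines(fh):
    'count the non-empty lines in any object that iterates like a file'
    n = 0
    for line in fh:
        if line.strip():
            n += 1
    return n

# an in-memory text buffer stands in for a real file opened for writing
buf = io.StringIO()
for x, y, z in [(1, 2, 3), (4, 5, 6)]:
    print('%1.3f %1.3f %1.3f' % (x, y, z), file=buf)

buf.seek(0)                  # rewind to the start before reading
nlines = count_lines(buf)
```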
\the_end