#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass amsbook
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default
\layout Chapter
A whirlwind tour of python and the standard library
\begin_inset OptArg
collapsed true
\layout Standard
Python intro
\end_inset
\layout Standard
This is a quick-and-dirty introduction to the python language for the impatient
scientist.
There are many top notch, comprehensive introductions and tutorials for
python.
For absolute beginners, there is the
\shape italic
Python Beginner's Guide
\shape default
.
\begin_inset Foot
collapsed true
\layout Standard
http://www.python.org/moin/BeginnersGuide
\end_inset
The official
\shape italic
Python Tutorial
\shape default
can be read online
\begin_inset Foot
collapsed true
\layout Standard
http://docs.python.org/tut/tut.html
\end_inset
or downloaded
\begin_inset Foot
collapsed true
\layout Standard
http://docs.python.org/download.html
\end_inset
in a variety of formats.
There are over 100 python tutorials collected online.
\begin_inset Foot
collapsed true
\layout Standard
http://www.awaretek.com/tutorials.html
\end_inset
\layout Standard
There are also many excellent books.
Targeting newbies is Mark Pilgrim's
\shape italic
Dive into Python
\shape default
which is available in print and for free online
\begin_inset Foot
collapsed true
\layout Standard
http://diveintopython.org/toc/index.html
\end_inset
, though for absolute newbies even this may be too hard
\begin_inset LatexCommand \cite{Dive}
\end_inset
.
For experienced programmers, David Beazley's
\shape italic
Python Essential Reference
\shape default
is an excellent introduction to python, but is a bit dated since it only
covers python2.1
\begin_inset LatexCommand \cite{Beasley}
\end_inset
.
Likewise Alex Martelli's
\shape italic
Python in a Nutshell
\shape default
is highly regarded and a bit more current -- a 2nd edition is in the works
\begin_inset LatexCommand \cite{Nutshell}
\end_inset
.
And
\shape italic
The Python Cookbook
\shape default
is an extremely useful collection of python idioms, tips and tricks
\begin_inset LatexCommand \cite{Cookbook}
\end_inset
.
\layout Standard
But the typical scientist I encounter wants to solve a specific problem,
eg, to make a certain kind of graph, to numerically integrate an equation,
or to fit some data to a parametric model, and doesn't have the time or
interest to read several books or tutorials to get what they want.
This guide is for them: a short overview of the language to help them get
to what they want as quickly as possible.
We get to advanced material pretty quickly, so it may be tough sledding
if you are a python newbie.
Take in what you can, and if you start getting dizzy, skip ahead to the
next section; you can always come back to absorb more detail later, after
you get your real work done.
\layout Section
Hello Python
\layout Standard
Python is a dynamically typed, object oriented, interpreted language.
Interpreted means that your program interacts with the python interpreter,
similar to Matlab, Perl, Tcl and Java, and unlike FORTRAN, C, or C++ which
are compiled.
So let's fire up the python interpreter and get started.
I'm not going to cover installing python -- it's standard on most linux
boxes and for windows there is a friendly GUI installer.
To run the python interpreter, on windows, you can click
\family typewriter
Start->All Programs->Python 2.4->Python (command line)
\family default
or better yet, install
\family typewriter
ipython
\family default
, a python shell on steroids, and use that.
On linux / unix systems, you just need to type
\family typewriter
python
\family default
or
\family typewriter
ipython
\family default
at the command line.
The
\family typewriter
>>>
\family default
is the default python shell prompt, so don't type it in the examples below
\layout LyX-Code
>>> print 'hello world'
\layout LyX-Code
hello world
\layout LyX-Code
\layout Standard
As this example shows,
\shape italic
hello world
\shape default
in python is pretty easy -- one common phrase you hear in the python community
is that
\begin_inset Quotes eld
\end_inset
it fits your brain
\begin_inset Quotes erd
\end_inset
: the basic idea is that coding in python feels natural.
Compare python's version with
\shape italic
hello world
\shape default
in C++
\layout LyX-Code
// C++
\layout LyX-Code
#include <iostream>
\layout LyX-Code
int main ()
\layout LyX-Code
{
\layout LyX-Code
std::cout << "Hello World" << std::endl;
\layout LyX-Code
return 0;
\layout LyX-Code
}
\layout Section
\begin_inset LatexCommand \label{sec:into_calculator}
\end_inset
Python is a calculator
\begin_inset OptArg
collapsed true
\layout Standard
Calculator
\end_inset
\layout Standard
Aside from my daughter's solar powered cash-register calculator, Python
is the only calculator I use.
From the python shell, you can type arbitrary arithmetic expressions.
\layout LyX-Code
>>> 2+2
\layout LyX-Code
4
\layout LyX-Code
>>> 2**10
\layout LyX-Code
1024
\layout LyX-Code
>>> 10/5
\layout LyX-Code
2
\layout LyX-Code
>>> 2+(24.3 + .9)/.24
\layout LyX-Code
107.0
\layout LyX-Code
>>> 2/3
\layout LyX-Code
0
\layout Standard
The last line is a standard newbie gotcha -- if both the left and right
operands are integers, python returns an integer.
To do floating point division, make sure at least one of the numbers is
a float
\layout LyX-Code
>>> 2.0/3
\layout LyX-Code
0.66666666666666663
\layout Standard
The distinction between integer and floating point division is a common
source of frustration among newbies and is slated for destruction in the
mythical Python 3000.
\begin_inset Foot
collapsed true
\layout Standard
Python 3000 is a future python release that will clean up several things
that Guido considers to be warts.
\end_inset
Since default integer division will be removed in the future, you can invoke
the time machine with the
\family typewriter
from __future__
\family default
directives; these directives allow python programmers today to use features
that will become standard in future releases but are not included by default
because they would break existing code.
From future directives should be among the first lines you type in your
python code if you are going to use them, otherwise they may not work.
The future division operator will assume floating point division by default,
\begin_inset Foot
collapsed false
\layout Standard
You may have noticed that 2/3 was represented as 0.66666666666666663 and
not 0.66666666666666666 as might be expected.
This is because computers are binary calculators, and there is no exact
binary representation of 2/3, just as there is no exact binary representation
of 0.1
\layout LyX-Code
>>> 0.1
\layout LyX-Code
0.10000000000000001
\layout Standard
Some languages try to hide this from you, but python is explicit.
\end_inset
and provides another operator // to do classic integer division.
\layout LyX-Code
>>> from __future__ import division
\layout LyX-Code
>>> 2/3
\layout LyX-Code
0.66666666666666663
\layout LyX-Code
>>> 2//3
\layout LyX-Code
0
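\layout Standard
As a small sketch of how the floor division operator behaves (the variable names are illustrative and not part of the sessions above): it rounds toward negative infinity rather than toward zero, and it returns a float when either operand is a float. These lines should behave the same with or without the future division import.

```python
# Floor division sketch; all names here are illustrative.
q_pos = 7 // 2                       # floor of 3.5
q_neg = -7 // 2                      # floor of -3.5, rounds toward negative infinity
q_float = 7.0 // 2                   # floor division with a float operand gives a float
quotient, remainder = divmod(7, 2)   # quotient and remainder in one call
two_thirds = float(2) / 3            # explicit conversion forces floating point division
```

Converting one operand with float() is a simple way to get floating point division without relying on the import.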
\layout Standard
python has four basic numeric types: int, long, float and complex, but unlike
C++, BASIC, FORTRAN or Java, you don't have to declare these types.
python can infer them
\layout LyX-Code
>>> type(1)
\layout LyX-Code
<type 'int'>
\layout LyX-Code
>>> type(1.0)
\layout LyX-Code
<type 'float'>
\layout LyX-Code
>>> type(2**200)
\layout LyX-Code
<type 'long'>
\layout LyX-Code
\layout Standard
\begin_inset Formula $2^{200}$
\end_inset
is a huge number!
\layout LyX-Code
>>> 2**200
\layout LyX-Code
1606938044258990275541962092341162602522202993782792835301376L
\layout Standard
but python will blithely compute it and much larger numbers for you as long
as you have CPU and memory to handle them.
The integer type, if it overflows, will automatically convert to a python
\family typewriter
long
\family default
(as indicated by the appended
\family typewriter
L
\family default
in the output above) and has no built-in upper bound on size, unlike C/C++
longs.
\layout Standard
Python has built in support for complex numbers.
Eg, we can verify
\begin_inset Formula $i^{2}=-1$
\end_inset
\layout LyX-Code
>>> x = complex(0,1)
\layout LyX-Code
>>> x*x
\layout LyX-Code
(-1+0j)
\layout Standard
To access the real and imaginary parts of a complex number, use the
\family typewriter
real
\family default
and
\family typewriter
imag
\family default
attributes
\layout LyX-Code
>>> x.real
\layout LyX-Code
0.0
\layout LyX-Code
>>> x.imag
\layout LyX-Code
1.0
\layout Standard
If you come from other languages like Matlab, the above may be new to you.
In matlab, you might do something like this (>> is the standard matlab
shell prompt)
\layout LyX-Code
>> x = 0+j
\layout LyX-Code
x =
\layout LyX-Code
0.0000 + 1.0000i
\layout LyX-Code
\layout LyX-Code
>> real(x)
\layout LyX-Code
ans =
\layout LyX-Code
0
\layout LyX-Code
\layout LyX-Code
>> imag(x)
\layout LyX-Code
ans =
\layout LyX-Code
1
\layout LyX-Code
\layout LyX-Code
\layout Standard
That is, in Matlab, you use a
\shape italic
function
\shape default
to access the real and imaginary parts of the data, but in python these
are attributes of the complex object itself.
This is a core feature of python and other object oriented languages: an
object carries its data and methods around with it.
One might say:
\begin_inset Quotes eld
\end_inset
a complex number knows its real and imaginary parts
\begin_inset Quotes erd
\end_inset
or
\begin_inset Quotes eld
\end_inset
a complex number knows how to take its conjugate
\begin_inset Quotes erd
\end_inset
, you don't need external functions for these operations
\layout LyX-Code
>>> x.conjugate
\layout LyX-Code
<built-in method conjugate of complex object at 0xb6a62368>
\layout LyX-Code
>>> x.conjugate()
\layout LyX-Code
-1j
\layout Standard
On the first line, I just followed along from the example above with
\family typewriter
real
\family default
and
\family typewriter
imag
\family default
and typed
\family typewriter
x.conjugate
\family default
and python printed the representation
\family typewriter
<built-in method conjugate of complex object at 0xb6a62368>.
\family default
This means that
\family typewriter
conjugate
\family default
is a
\shape italic
method
\shape default
, a.k.a. a function, and in python we need to use parentheses to call a function.
If the method has arguments, like the
\family typewriter
x
\family default
in
\family typewriter
sin(x)
\family default
, you place them inside the parentheses, and if it has no arguments, like
\family typewriter
conjugate
\family default
, you simply provide the open and closing parentheses.
\family typewriter
real
\family default
,
\family typewriter
imag
\family default
and
\family typewriter
conjugate
\family default
are attributes of the complex object, and
\family typewriter
conjugate
\family default
is a
\shape italic
callable
\shape default
attribute, known as a
\shape italic
method
\shape default
.
\layout Standard
OK, now you are an object oriented programmer.
There are several key ideas in object oriented programming, and this is
one of them: an object carries around with it data (simple attributes)
and methods (callable attributes) that provide additional information about
the object and perform services.
It's one stop shopping -- no need to go to external functions and libraries
to deal with it -- the object knows how to deal with itself.
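\layout Standard
To make the distinction between simple attributes and callable attributes concrete, here is a minimal sketch using a complex number; the variable names are illustrative, and callable is a built-in function that reports whether an object can be called with parentheses.

```python
# Data attributes versus methods on a complex number (names are illustrative).
z = complex(3, 4)

parts = (z.real, z.imag)               # plain data attributes, no parentheses
zbar = z.conjugate()                   # a method must be called with parentheses
modulus_squared = (z * zbar).real      # z times its conjugate is |z| squared

real_is_callable = callable(z.real)            # a float is data, not a function
conjugate_is_callable = callable(z.conjugate)  # a bound method is callable
```

The object supplies both its data and the operations on that data, so no external helper function is needed for either.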
\layout Section
Accessing the standard library
\begin_inset OptArg
collapsed true
\layout Standard
Standard Library
\end_inset
\layout Standard
Arithmetic is fine, but before long you may find yourself tiring of it and
wanting to compute logarithms and exponents, sines and cosines
\layout LyX-Code
>>> log(10)
\layout LyX-Code
Traceback (most recent call last):
\layout LyX-Code
File "<stdin>", line 1, in ?
\layout LyX-Code
NameError: name 'log' is not defined
\layout Standard
These functions are not built into python, but don't despair, they are built
into the python standard library.
To access a function from the standard library, or an external library
for that matter, you must import it.
\layout LyX-Code
>>> import math
\layout LyX-Code
>>> math.log(10)
\layout LyX-Code
2.3025850929940459
\layout LyX-Code
>>> math.sin(math.pi)
\layout LyX-Code
1.2246063538223773e-16
\layout Standard
Note that the default
\family typewriter
log
\family default
function is the natural (base e) logarithm (use
\family typewriter
math.log10
\family default
for base 10 logs) and that floating point math is inherently imprecise,
since analytically
\begin_inset Formula $\sin(\pi)=0$
\end_inset
.
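\layout Standard
A short sketch of the logarithm variants in the math module, plus the usual way to cope with floating point round-off: compare against a tolerance instead of testing for exact equality. The tolerance value is an arbitrary choice made for this illustration.

```python
import math

natural = math.log(math.e)     # one argument: natural (base e) logarithm
common = math.log10(1000.0)    # base 10 logarithm
binary = math.log(8.0, 2)      # optional second argument selects the base

# sin(pi) is only approximately zero in floating point, so use a tolerance
tolerance = 1e-12              # arbitrary threshold chosen for this sketch
sin_pi_is_zero = abs(math.sin(math.pi)) < tolerance
```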
\layout Standard
It's kind of a pain to keep typing
\family typewriter
math.log
\family default
and
\family typewriter
math.sin
\family default
and
\family typewriter
math.pi
\family default
, and python is accommodating.
There are additional forms of
\family typewriter
import
\family default
that will let you save more or less typing depending on your desires
\layout LyX-Code
\color blue
# Abbreviate the module name: m is an alias
\layout LyX-Code
>>> import math as m
\layout LyX-Code
>>> m.cos(2*m.pi)
\layout LyX-Code
1.0
\layout LyX-Code
\layout LyX-Code
\color blue
# Import just the names you need
\layout LyX-Code
>>> from math import exp, log
\layout LyX-Code
>>> log(exp(1))
\layout LyX-Code
1.0
\layout LyX-Code
\layout LyX-Code
\color blue
# Import everything - use with caution!
\layout LyX-Code
>>> from math import *
\layout LyX-Code
>>> sin(2*pi*10)
\layout LyX-Code
-2.4492127076447545e-15
\layout Standard
To help you learn more about what you can find in the math library, python
has nice introspection capabilities -- introspection is a way of asking
an object about itself.
For example, to find out what is available in the math library, we can
get a directory of everything available with the
\family typewriter
dir
\family default
command
\begin_inset Foot
collapsed false
\layout Standard
In addition to the introspection and help provided in the python interpreter,
the official documentation of the python standard library is very good
and up-to-date: http://docs.python.org/lib/lib.html .
\end_inset
\layout LyX-Code
>>> dir(math)
\layout LyX-Code
['__doc__', '__file__', '__name__', 'acos', 'asin', 'atan', 'atan2', 'ceil',
'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'floor', 'fmod', 'frexp',
'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow', 'radians', 'sin',
'sinh', 'sqrt', 'tan', 'tanh']
\layout Standard
This gives us just a listing of the names that are in the math module --
they are fairly self descriptive, but if you want more, you can call
\family typewriter
help
\family default
on any of these functions for more information
\layout LyX-Code
>>> help(math.sin)
\layout LyX-Code
Help on built-in function sin:
\layout LyX-Code
sin(...)
\layout LyX-Code
sin(x)
\layout LyX-Code
Return the sine of x (measured in radians).
\layout Standard
and for the whole math library
\layout LyX-Code
>>> help(math)
\layout LyX-Code
Help on module math:
\layout LyX-Code
\layout LyX-Code
NAME
\layout LyX-Code
math
\layout LyX-Code
\layout LyX-Code
FILE
\layout LyX-Code
/usr/local/lib/python2.3/lib-dynload/math.so
\layout LyX-Code
\layout LyX-Code
DESCRIPTION
\layout LyX-Code
This module is always available.
It provides access to the
\layout LyX-Code
mathematical functions defined by the C standard.
\layout LyX-Code
\layout LyX-Code
FUNCTIONS
\layout LyX-Code
acos(...)
\layout LyX-Code
acos(x)
\layout LyX-Code
\layout LyX-Code
Return the arc cosine (measured in radians) of x.
\layout LyX-Code
\layout LyX-Code
asin(...)
\layout LyX-Code
asin(x)
\layout LyX-Code
\layout LyX-Code
Return the arc sine (measured in radians) of x.
\layout LyX-Code
\layout Standard
And much more which is snipped.
Likewise, we can get information on the complex object in the same way
\layout LyX-Code
>>> x = complex(0,1)
\layout LyX-Code
>>> dir(x)
\layout LyX-Code
['__abs__', '__add__', '__class__', '__coerce__', '__delattr__', '__div__',
'__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__ge__',
'__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
'__int__', '__le__', '__long__', '__lt__', '__mod__', '__mul__', '__ne__',
'__neg__', '__new__', '__nonzero__', '__pos__', '__pow__', '__radd__',
'__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__',
'__rfloordiv__', '__rmod__', '__rmul__', '__rpow__', '__rsub__', '__rtruediv__',
'__setattr__', '__str__', '__sub__', '__truediv__', 'conjugate', 'imag',
'real']
\layout LyX-Code
\layout Standard
Notice that we called
\family typewriter
dir
\family default
or
\family typewriter
help
\family default
on the
\family typewriter
math
\family default
\shape italic
module
\shape default
, the
\family typewriter
math.sin
\family default
\shape italic
function
\shape default
, and the
\family typewriter
complex
\family default
\shape italic
number
\shape default
\family typewriter
x
\family default
.
That's because modules, functions and numbers are all
\shape italic
objects
\shape default
, and we use the same object introspection and help capabilities on them.
We can find out what type of object they are by calling
\family typewriter
type
\family default
on them, which is another function in python's introspection arsenal
\layout LyX-Code
>>> type(math)
\layout LyX-Code
<type 'module'>
\layout LyX-Code
>>> type(math.sin)
\layout LyX-Code
<type 'builtin_function_or_method'>
\layout LyX-Code
>>> type(x)
\layout LyX-Code
<type 'complex'>
\layout LyX-Code
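\layout Standard
The introspection tools can also be used from code rather than just at the prompt. The following is a sketch, with illustrative variable names, that filters the dir listing and uses the built-in functions getattr and hasattr to look attributes up by name.

```python
import math

# Names in the module that do not start with an underscore
public_names = [name for name in dir(math) if not name.startswith('_')]

# getattr looks an attribute up by its name, given as a string
sin_func = getattr(math, 'sin')
same_object = sin_func is math.sin

# hasattr reports whether an object has an attribute of that name
has_conjugate = hasattr(complex(0, 1), 'conjugate')
has_misspelled = hasattr(complex(0, 1), 'conjugat')   # misspelled on purpose
```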
\layout Standard
Now, you may be wondering: what were all those god-awful looking double
underscore methods, like
\family typewriter
__abs__
\family default
and
\family typewriter
__mul__
\family default
in the
\family typewriter
dir
\family default
listing of the complex object above? These are methods that define what
it means to be a numeric type in python, and the complex object implements
these methods so that complex numbers act the way they should, eg
\family typewriter
__mul__
\family default
implements the rules of complex multiplication.
The nice thing about this is that python specifies an application programming
interface (API) that is the definition of what it means to be a number
in python.
And this means you can define your own numeric types, as long as you implement
the required special double underscore methods for your custom type.
Double underscore methods are very important in python; although the typical
newbie never sees them or thinks about them, they are there under the hood
providing all the python magic, and more importantly, showing the way to
let you make magic.
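\layout Standard
As a minimal sketch of defining your own numeric-like type, the hypothetical class Vec2 below implements only a handful of the double underscore methods; a complete numeric type would implement many more. The class name and its attributes are invented for this illustration.

```python
class Vec2(object):
    """Hypothetical 2D vector that supports + and scaling by a number."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # called for: self + other
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        # called for: self * scalar
        return Vec2(self.x * scalar, self.y * scalar)

    def __eq__(self, other):
        # called for: self == other
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        # called when the interpreter displays the object
        return 'Vec2(%r, %r)' % (self.x, self.y)


total = Vec2(1, 2) + Vec2(3, 4)
scaled = Vec2(1, 2) * 3
```

Once these methods exist, the ordinary operators work on the new type just as they do on the built-in numbers.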
\layout Section
\begin_inset LatexCommand \label{sec:intro_string}
\end_inset
Strings
\layout Standard
We've encountered a number of types of objects above: int, float, long,
complex, method/function and module.
We'll continue our tour with an introduction to strings, which are critical
components of almost every program.
You can create strings in a number of different ways, with single quotes,
double quotes, or triple quotes -- this diversity of methods makes it easy
if you need to embed quote characters in the string itself
\layout LyX-Code
\color blue
# single, double and triple quoted strings
\layout LyX-Code
>>> s = 'Hi Mom!'
\layout LyX-Code
>>> s = "Hi Mom!"
\layout LyX-Code
>>> s = """Porky said, "That's all folks!" """
\layout Standard
You can add strings together to concatenate them
\layout LyX-Code
\color blue
# concatenating strings
\layout LyX-Code
>>> first = 'John'
\layout LyX-Code
>>> last = 'Hunter'
\layout LyX-Code
>>> first+last
\layout LyX-Code
'JohnHunter'
\layout Standard
or call string methods to process them: upcase them or downcase them, or
replace one character with another
\layout LyX-Code
\color blue
# string methods
\layout LyX-Code
>>> last.lower()
\layout LyX-Code
'hunter'
\layout LyX-Code
>>> last.upper()
\layout LyX-Code
'HUNTER'
\layout LyX-Code
>>> last.replace('h', 'p')
\layout LyX-Code
'Hunter'
\layout LyX-Code
>>> last.replace('H', 'P')
\layout LyX-Code
'Punter'
\layout Standard
Note that in all of these examples, the string
\family typewriter
last
\family default
is unchanged.
All of these methods operate on the string and return a new string, leaving
the original unchanged.
In fact, python strings cannot be changed by any python code at all: they
are
\shape italic
immutable
\shape default
(unchangeable).
The concept of mutable and immutable objects in python is an important
one, and it will come up again, because only immutable objects can be used
as keys in python dictionaries and elements of python sets.
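\layout Standard
A small sketch of what immutability means in practice: item assignment on a string raises a TypeError, while methods such as replace hand back a new string. The flag variable is an illustrative name.

```python
last = 'Hunter'

# Attempting to change a character in place fails
try:
    last[0] = 'P'
    assignment_worked = True
except TypeError:
    assignment_worked = False

# String methods return a new string and leave the original alone
punter = last.replace('H', 'P')
```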
\layout Standard
You can access individual characters, or slices of the string (substrings),
using indexing.
A string is a sequence of characters, and strings implement the sequence
protocol in python -- we'll see more examples of python sequences later
-- and all sequences have the same syntax for accessing their elements.
Python uses 0 based indexing which means the first element is at index
0; you can use negative indices to access the last elements in the sequence
\layout LyX-Code
\color blue
# string indexing
\layout LyX-Code
>>> last = 'Hunter'
\layout LyX-Code
>>> last[0]
\layout LyX-Code
'H'
\layout LyX-Code
>>> last[1]
\layout LyX-Code
'u'
\layout LyX-Code
>>> last[-1]
\layout LyX-Code
'r'
\layout Standard
To access substrings, or generically in terms of the sequence protocol,
slices, you use a colon to indicate a range
\layout LyX-Code
\color blue
# string slicing
\layout LyX-Code
>>> last[0:2]
\layout LyX-Code
'Hu'
\layout LyX-Code
>>> last[2:4]
\layout LyX-Code
'nt'
\layout Standard
As this example shows, python uses
\begin_inset Quotes eld
\end_inset
one-past-the-end
\begin_inset Quotes erd
\end_inset
indexing when defining a range; eg, in the range
\family typewriter
indmin:indmax
\family default
, the element at index
\family typewriter
indmax
\family default
is not included.
You can use negative indices when slicing too; eg, to get everything before
the last character
\layout LyX-Code
>>> last[0:-1]
\layout LyX-Code
'Hunte'
\layout Standard
You can also leave out either the min or max indicator; if they are left
out, 0 is assumed to be the
\family typewriter
indmin
\family default
and one past the end of the sequence is assumed to be
\family typewriter
indmax
\layout LyX-Code
>>> last[:3]
\layout LyX-Code
'Hun'
\layout LyX-Code
>>> last[3:]
\layout LyX-Code
'ter'
\layout Standard
There is a third number that can be placed in a slice, a step, with syntax
indmin:indmax:step; eg, a step of 2 will skip every second letter
\layout LyX-Code
>>> last[1:6:2]
\layout LyX-Code
'utr'
\layout Standard
Although this may be more than you want to know about slicing strings, the
time spent here is worthwhile.
As mentioned above, all python sequences obey these rules.
In addition to strings, lists and tuples, which are built-in python sequence
data types and are discussed in the next section, the numeric arrays widely
used in scientific computing also implement the sequence protocol, and
thus have the same slicing rules.
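\layout Standard
To illustrate that the slicing rules are shared by all sequences, this sketch applies one slice to a string, a list and a tuple built from the same characters. The variable names are illustrative.

```python
last = 'Hunter'
as_list = list(last)      # a list of the six characters
as_tuple = tuple(last)    # a tuple of the six characters

# The same start:stop:step slice works on every sequence type
str_slice = last[1:6:2]
list_slice = as_list[1:6:2]
tuple_slice = as_tuple[1:6:2]

# Negative indices count from the end in every sequence type
last_two = last[-2:]
```

Each slice returns an object of the same type as the sequence it was taken from.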
\layout Exercise
What would you expect last[:] to return?
\layout Standard
One thing that comes up all the time is the need to create strings out of
other strings and numbers, eg to create filenames from a combination of
a base directory, some base filename, and some numbers.
Scientists like to create lots of data files and then write code to
loop over these files and analyze them.
We're going to show how to do that, starting with the newbie way and progressively
building up to the way of the python zen master.
All of the methods below
\shape italic
work
\shape default
, but the zen master way will be more efficient, more scalable (eg to larger
numbers of files) and cross-platform.
\begin_inset Foot
collapsed false
\layout Standard
\begin_inset Quotes eld
\end_inset
But it works
\begin_inset Quotes erd
\end_inset
is a common defense of bad code; my rejoinder to this is
\begin_inset Quotes eld
\end_inset
A computer scientist is someone who fixes things that aren't broken
\begin_inset Quotes erd
\end_inset
.
\end_inset
Here's the newbie way: we also introduce the for-loop here in the spirit
of diving into python -- note that python uses whitespace indentation to
delimit the for-loop code block
\layout LyX-Code
\color blue
# The newbie way
\layout LyX-Code
for i in (1,2,3,4):
\layout LyX-Code
fname = 'data/myexp0' + str(i) + '.dat'
\layout LyX-Code
print fname
\layout Standard
Now as promised, this will print out the 4 file names above, but it has
three flaws: it doesn't scale to 10 or more files, it is inefficient, and
it is not cross platform.
It doesn't scale because it hard-codes the '
\family typewriter
0
\family default
' after
\family typewriter
myexp
\family default
, it is inefficient because to add several strings requires the creation
of temporary strings, and it is not cross-platform because it hard-codes
the directory separator '/'.
\layout LyX-Code
\color blue
# On the path to enlightenment
\layout LyX-Code
for i in (1,2,3,4):
\layout LyX-Code
fname = 'data/myexp%02d.dat'%i
\layout LyX-Code
print fname
\layout Standard
This example uses string interpolation, the funny % thing.
If you are familiar with C programming, this will be no surprise to you
(on linux/unix systems do
\family typewriter
man sprintf
\family default
at the unix shell).
The percent character is a string formatting character:
\family typewriter
%02d
\family default
means to take an integer (the
\family typewriter
d
\family default
part) and print it with two digits, padding zero on the left (the
\family typewriter
%02
\family default
part).
There is more to be said about string interpolation, but let's finish the
job at hand.
This example is better than the newbie way because it scales up to files
numbered 0-99, and it is more efficient because it avoids the creation
of temporary strings.
For the platform independent part, we go to the python standard library
\family typewriter
os.path
\family default
, which provides a host of functions for platform-independent manipulations
of filenames, extensions and paths.
Here we use
\family typewriter
os.path.join
\family default
to combine the directory with the filename in a platform independent way.
On windows, it will use the windows path separator '
\backslash
' and on unix it will use '/'.
\layout LyX-Code
\color blue
# the zen master approach
\layout LyX-Code
import os
\layout LyX-Code
for i in (1,2,3,4):
\layout LyX-Code
fname = os.path.join('data', 'myexp%02d.dat'%i)
\layout LyX-Code
print fname
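\layout Standard
Besides join, the os.path module can take a path apart again. The sketch below, using an invented file number, builds one filename and then splits it into directory, base name and extension without ever mentioning the separator character.

```python
import os

# Build a path, then take it apart again (the file number 7 is arbitrary)
fname = os.path.join('data', 'myexp%02d.dat' % 7)

directory, basename = os.path.split(fname)   # directory part and file part
root, ext = os.path.splitext(basename)       # name without extension, extension
```

Because the code never hard-codes a separator, the same lines work on windows and unix.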
\layout Exercise
Suppose you have data files named like
\layout LyX-Code
data/2005/exp0100.dat
\layout LyX-Code
data/2005/exp0101.dat
\layout LyX-Code
data/2005/exp0102.dat
\layout LyX-Code
...
\layout LyX-Code
data/2005/exp1000.dat
\layout Standard
Write the python code that iterates over these files, constructing the filenames
as strings and using
\family typewriter
os.path.join
\family default
to construct the paths in a platform-independent way.
\shape italic
Hint
\shape default
: read the help for
\family typewriter
os.path.join
\family default
!
\layout Standard
OK, I promised to torture you a bit more with string interpolation -- don't
worry, I remembered.
The ability to properly format your data when printing it is crucial in
scientific endeavors: how many significant digits do you want, do you want
to use integer, floating point representation or exponential notation?
These three choices are provided with
\family typewriter
%d
\family default
,
\family typewriter
%f
\family default
and
\family typewriter
%e
\family default
, with lots of variations on the theme to indicate precision and more
\layout LyX-Code
>>> 'warm for %d minutes at %1.1f C' % (30, 37.5)
\layout LyX-Code
'warm for 30 minutes at 37.5 C'
\layout LyX-Code
\layout LyX-Code
>>> 'The mass of the sun is %1.4e kg'% (1.98892*10**30)
\layout LyX-Code
'The mass of the sun is 1.9889e+30 kg'
\layout LyX-Code
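\layout Standard
Some of the variations mentioned above, sketched with made-up values: a number before the dot sets the minimum field width and a number after the dot sets the precision. The variable names and values are illustrative only.

```python
# width.precision variations on the three basic format codes
int_field = '%5d' % 42             # integer, right-aligned in 5 characters
float_field = '%8.3f' % 3.14159    # 3 digits after the decimal point, width 8
exp_field = '%10.2e' % 12345.678   # exponential notation, 2 digits, width 10

# Several values are passed as a tuple, one per format code
row = '%5d %8.3f %10.2e' % (42, 3.14159, 12345.678)
```

Fixed widths like these are handy for writing columns that line up in a text file.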
\layout Standard
There are two string methods,
\family typewriter
split
\family default
and
\family typewriter
join
\family default
, that arise frequently in numeric processing, specifically in the context
of processing data files that have comma, tab, or space separated numbers
in them.
\family typewriter
split
\family default
takes a single string, and splits it on the indicated character to a sequence
of strings.
This is useful to take a single line of space or comma separated values
and split them into individual numbers
\layout LyX-Code
\color blue
# s is a single string and we split it into a list of strings
\layout LyX-Code
\color blue
# for further processing
\layout LyX-Code
>>> s = '1.0 2.0 3.0 4.0 5.0'
\layout LyX-Code
>>> s.split(' ')
\layout LyX-Code
['1.0', '2.0', '3.0', '4.0', '5.0']
\layout Standard
The return value, with square brackets, indicates that python has returned
a list of strings.
These individual strings need further processing to convert them into actual
floats, but that is the first step.
The conversion to floats will be discussed in the next section, when we
learn about list comprehensions.
The converse method is join, which is often used to create string output
to an ASCII file from a list of numbers.
In this case you want to join a list of numbers into a single line for
printing to a file.
The example below will be clearer after the next section, in which lists
are discussed
\layout LyX-Code
\color blue
# vals is a list of floats and we convert it to a single
\layout LyX-Code
\color blue
# space separated string
\layout LyX-Code
>>> vals = [1.0, 2.0, 3.0, 4.0, 5.0]
\layout LyX-Code
>>> ' '.join([str(val) for val in vals])
\layout LyX-Code
'1.0 2.0 3.0 4.0 5.0'
\layout Standard
There are two new things in the example above.
One, we called the join method directly on a string itself, and not on
a variable name.
Eg, in the previous examples, we always used the name of the object when
accessing attributes, eg
\family typewriter
x.real
\family default
or
\family typewriter
s.upper()
\family default
.
In this example, we call the
\family typewriter
join
\family default
method on the string which is a single space.
The second new feature is that we use a list comprehension
\family typewriter
[str(val) for val in vals]
\family default
as the argument to
\family typewriter
join
\family default
.
\family typewriter
join
\family default
requires a sequence of strings, and the list comprehension converts a list
of floats to a list of strings.
This can be confusing at first, so don't despair if it is.
But it is worth bringing up early because list comprehensions are a very
useful feature of python.
To help elucidate, compare
\family typewriter
vals
\family default
, which is a list of floats, with the conversion of
\family typewriter
vals
\family default
to a list of strings using list comprehensions in the next line
\layout LyX-Code
\color blue
# converting a list of floats to a list of strings
\layout LyX-Code
>>> vals
\layout LyX-Code
[1.0, 2.0, 3.0, 4.0, 5.0]
\layout LyX-Code
>>> [str(val) for val in vals]
\layout LyX-Code
['1.0', '2.0', '3.0', '4.0', '5.0']
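\layout Standard
Putting split, a list comprehension and join together, here is a sketch of a round trip from a line of comma separated text to floats and back to a space separated string. The sample line and the variable names are invented for the illustration.

```python
line = '1.0, 2.0,3.0'                          # made-up comma separated input
fields = line.split(',')                       # list of strings, some with spaces
values = [float(field) for field in fields]    # float() ignores surrounding spaces
rejoined = ' '.join(['%.1f' % value for value in values])

# split() with no argument splits on any run of whitespace, whereas
# split(' ') yields an empty string between two adjacent spaces
loose = '1.0  2.0'.split()
strict = '1.0  2.0'.split(' ')
```

Calling split with no argument is usually the safer choice for whitespace separated data, since column padding does not produce empty fields.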
\layout Section
The basic python data structures
\begin_inset OptArg
collapsed true
\layout Standard
Data Structures
\end_inset
\layout Standard
Strings, covered in the last section, are sequences of characters.
python has two additional built-in sequence types which can hold arbitrary
elements: tuples and lists.
tuples are created using parentheses, and lists are created using square
brackets
\layout LyX-Code
\color blue
# a tuple and a list of elements of the same type
\layout LyX-Code
\color blue
# (homogeneous)
\layout LyX-Code
>>> t = (1,2,3,4) # tuple
\layout LyX-Code
>>> l = [1,2,3,4] # list
\layout Standard
Both tuples and lists can also be used to hold elements of different types
\layout LyX-Code
\color blue
# a tuple and list of int, string, float
\layout LyX-Code
>>> t = (1,'john', 3.0)
\layout LyX-Code
>>> l = [1,'john', 3.0]
\layout Standard
Tuples and lists have the same indexing and slicing rules as each other,
and as the strings discussed above, because all three implement the python
sequence protocol.
The only difference is that tuple slices return tuples
(indicated by the parentheses below) and list slices return lists (indicated
by the square brackets)
\layout LyX-Code
# indexing and slicing tuples and lists
\layout LyX-Code
>>> t[0]
\layout LyX-Code
1
\layout LyX-Code
>>> l[0]
\layout LyX-Code
1
\layout LyX-Code
>>> t[:-1]
\layout LyX-Code
(1, 'john')
\layout LyX-Code
>>> l[:-1]
\layout LyX-Code
[1, 'john']
\layout Standard
So why the difference between tuples and lists? A number of explanations
have been offered on the mailing lists, but the only one that makes a
difference to me is that tuples are immutable, like strings, and hence can
be used as keys to python dictionaries and included as elements of sets,
while lists are mutable and cannot.
So a tuple, once created, can never be changed, but a list can.
For example, if we try to reassign the first element of the tuple above,
we get an error
\layout LyX-Code
>>> t[0] = 'why not?'
\layout LyX-Code
Traceback (most recent call last):
\layout LyX-Code
File "<stdin>", line 1, in ?
\layout LyX-Code
TypeError: object doesn't support item assignment
\layout Standard
But the same operation is perfectly acceptable for lists
\layout LyX-Code
>>> l[0] = 'why not?'
\layout LyX-Code
>>> l
\layout LyX-Code
['why not?', 'john', 3.0]
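\layout Standard
Dictionaries are covered later in this section, but a short sketch makes
the point about keys concrete.
The dictionary name
\family typewriter
d
\family default
and the strings stored in it are invented for illustration.
A tuple works as a key, while using a list as a key raises a
\family typewriter
TypeError
\layout LyX-Code
>>> d = {}
\layout LyX-Code
>>> d[(1, 2)] = 'a tuple key'
\layout LyX-Code
>>> d[(1, 2)]
\layout LyX-Code
'a tuple key'
\layout LyX-Code
>>> try:
\layout LyX-Code
...     d[[1, 2]] = 'a list key'
\layout LyX-Code
... except TypeError:
\layout LyX-Code
...     print('lists cannot be used as keys')
\layout LyX-Code
...
\layout LyX-Code
lists cannot be used as keys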
\layout Standard
Lists also have a lot of methods; tuples have none, save the special double
underscore methods that are required for python objects and sequences
\layout LyX-Code
\color blue
# tuples contain only
\begin_inset Quotes eld
\end_inset
hidden
\begin_inset Quotes erd
\end_inset
double underscore methods
\layout LyX-Code
>>> dir(t)
\layout LyX-Code
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
'__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__',
'__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
'__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__rmul__', '__setattr__', '__str__']
\layout LyX-Code
\layout LyX-Code
\color blue
# but lists contain other methods, eg append, extend and
\layout LyX-Code
\color blue
# reverse
\layout LyX-Code
>>> dir(l)
\layout LyX-Code
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delsli
ce__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__',
'__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__',
'__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__setite
m__', '__setslice__', '__str__', 'append', 'count', 'extend', 'index', 'insert',
'pop', 'remove', 'reverse', 'sort']
\layout Standard
Many of these list methods change, or mutate, the list, e.g.,
\family typewriter
append
\family default
adds an element to the list,
\family typewriter
extend
\family default
extends the list with a sequence of elements,
\family typewriter
sort
\family default
sorts the list in place,
\family typewriter
reverse
\family default
reverses it in place,
\family typewriter
pop
\family default
takes an element off the list and returns it.
\layout Standard
We've seen a couple of examples of creating a list above -- let's look at
some more using list methods
\layout LyX-Code
>>> x = []
\color blue
# create the empty list
\layout LyX-Code
>>> x.append(1)
\color blue
# add the integer one to it
\layout LyX-Code
>>> x.extend(['hi', 'mom'])
\color blue
# append two strings to it
\layout LyX-Code
>>> x
\layout LyX-Code
[1, 'hi', 'mom']
\layout LyX-Code
>>> x.reverse()
\color blue
# reverse the list, in place
\layout LyX-Code
>>> x
\layout LyX-Code
['mom', 'hi', 1]
\layout LyX-Code
>>> len(x)
\layout LyX-Code
3
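\layout Standard
The
\family typewriter
sort
\family default
and
\family typewriter
pop
\family default
methods mentioned above behave the same way.
Here is a short sketch using a throwaway list named
\family typewriter
y
\family default
; note that
\family typewriter
sort
\family default
changes the list in place and that
\family typewriter
pop
\family default
, called with no argument, removes and returns the last element
\layout LyX-Code
>>> y = [3, 1, 2]
\layout LyX-Code
>>> y.sort()
\color blue
# sort the list, in place
\layout LyX-Code
>>> y
\layout LyX-Code
[1, 2, 3]
\layout LyX-Code
>>> y.pop()
\color blue
# remove and return the last element
\layout LyX-Code
3
\layout LyX-Code
>>> y
\layout LyX-Code
[1, 2]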
\layout Standard
We mentioned list comprehensions in the last section when discussing string
methods.
List comprehensions are a way of creating a list using a for loop in a
single line of python.
Let's create a list of the cubes of the integers from 1 to 9, first with a
for loop and then with a list comprehension.
The list comprehension code is not only shorter and more elegant, it is
often faster as well (the dots are the indentation block indicator from
the python shell and should not be typed)
\layout LyX-Code
\color blue
# a list of perfect cubes using a for-loop
\layout LyX-Code
>>> cubes = []
\layout LyX-Code
>>> for i in range(1,10):
\layout LyX-Code
...     cubes.append(i**3)
\layout LyX-Code
...
\layout LyX-Code
>>> cubes
\layout LyX-Code
[1, 8, 27, 64, 125, 216, 343, 512, 729]
\layout LyX-Code
\layout LyX-Code
\color blue
# functionally equivalent code using list comprehensions
\layout LyX-Code
>>> cubes = [i**3 for i in range(1,10)]
\layout LyX-Code
>>> cubes
\layout LyX-Code
[1, 8, 27, 64, 125, 216, 343, 512, 729]
\layout Standard
The list comprehension is usually faster, though not because it bypasses
the interpreter: both versions are compiled to bytecode and executed by
the python virtual machine.
In the simple for-loop version, the interpreter has to look up the
\family typewriter
append
\family default
method and call it for each value of
\family typewriter
i
\family default
; the list comprehension adds each element to the new list directly and
avoids that overhead.
How much this matters depends on the work done per element, so it is worth
timing your own code, and the list comprehension
example is shorter and more elegant to boot.
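\layout Standard
If speed matters, it is better to measure than to guess.
The sketch below times both versions with the standard library
\family typewriter
timeit
\family default
module.
The function names
\family typewriter
cubes_loop
\family default
and
\family typewriter
cubes_comp
\family default
are invented for illustration, and passing a function to
\family typewriter
Timer
\family default
assumes python 2.6 or later.
The timings depend on your machine and python version, so no output is
shown
\layout LyX-Code
import timeit
\layout LyX-Code
def cubes_loop():
\layout LyX-Code
    'build the list of cubes with a for-loop'
\layout LyX-Code
    cubes = []
\layout LyX-Code
    for i in range(1, 10):
\layout LyX-Code
        cubes.append(i**3)
\layout LyX-Code
    return cubes
\layout LyX-Code
def cubes_comp():
\layout LyX-Code
    'build the same list with a list comprehension'
\layout LyX-Code
    return [i**3 for i in range(1, 10)]
\layout LyX-Code
\color blue
# run each function 100000 times and report the total seconds
\layout LyX-Code
tloop = timeit.Timer(cubes_loop).timeit(100000)
\layout LyX-Code
tcomp = timeit.Timer(cubes_comp).timeit(100000)
\layout LyX-Code
print('loop: %1.3f s, comprehension: %1.3f s' % (tloop, tcomp))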
\layout Standard
The remaining essential built-in data structure in python is the dictionary,
which is an associative array that maps arbitrary immutable objects to
arbitrary objects.
int, long, float, string and tuple are all immutable and can be used as
keys to a dictionary; list and dict are mutable and cannot.
A dictionary takes one kind of object as the key, and this key points to
another object which is the value.
In a contrived but easy to comprehend example, one might map names to
ages
\layout LyX-Code
>>> ages = {}
\color blue
# create an empty dict
\layout LyX-Code
>>> ages['john'] = 36
\layout LyX-Code
>>> ages['fernando'] = 33
\layout LyX-Code
>>> ages
\color blue
# view the whole dict
\layout LyX-Code
{'john': 36, 'fernando': 33}
\layout LyX-Code
>>> ages['john']
\layout LyX-Code
36
\layout LyX-Code
>>> ages['john'] = 37
\color blue
# reassign john's age
\layout LyX-Code
>>> ages['john']
\layout LyX-Code
37
\layout Standard
Dictionary lookup is very fast; Tim Peters once joked that any python program
which uses a dictionary is automatically 10 times faster than any C program,
which is of course false, but makes two worthy points in jest: dictionary
lookup is fast, and dictionaries can be used for important optimizations,
e.g., creating a cache of frequently used values.
As a simple example, suppose you needed to compute the cubes of numbers
between 1 and 100 in an inner loop -- you could use a dictionary
to cache the cubes of all odd numbers < 100; if you were interested
in all numbers, you might simply use a list to store the cached cubes --
I am caching only the odd numbers to show you how a dictionary can be
used to represent a sparse data structure
\layout LyX-Code
\layout LyX-Code
>>> cubes = dict([ ( i, i**3 ) for i in range(1,100,2)])
\layout LyX-Code
>>> cubes[5]
\layout LyX-Code
125
\layout Standard
The last example is syntactically a bit challenging, but bears careful study.
We are initializing a dictionary with a list comprehension.
The list comprehension is made up of length 2 tuples
\family typewriter
( i, i**3 )
\family default
.
When a dictionary is initialized with a sequence of length 2 tuples, it
assumes the first element of the tuple
\family typewriter
i
\family default
is the
\shape italic
key
\shape default
and the second element
\family typewriter
i**3
\family default
is the
\shape italic
value
\shape default
.
Thus we have a lookup table from odd integers to their cubes.
Creating dictionaries from list comprehensions as in this example is something
that hard-core python programmers do almost every day, and you should too.
\layout Exercise
Create a lookup table of the product of all pairs of numbers less than 100.
The key will be a tuple of the two numbers
\family typewriter
(i,j)
\family default
and the value will be the product.
Hint: you can loop over multiple ranges in a list comprehension, eg
\family typewriter
[ something for i in range(Ni) for j in range(Nj)]
\layout Section
The Zen of Python
\begin_inset OptArg
collapsed true
\layout Standard
Zen
\end_inset
\layout Exercise
\family typewriter
>>> import this
\layout Section
Functions and classes
\layout Standard
You can define functions just about anywhere in python code.
The typical function definition takes zero or more arguments, zero or more
keyword arguments, and is followed by a documentation string and the function
body, optionally returning a value.
Here is a function to compute the hypotenuse of a right triangle
\layout LyX-Code
def hypot(base, height):
\layout LyX-Code
    'compute the hypotenuse of a right triangle'
\layout LyX-Code
    import math
\layout LyX-Code
    return math.sqrt(base**2 + height**2)
\layout Standard
As in the case of the for-loop, leading white space is significant and is
used to delimit the start and end of the function.
In the example below, x = 1 is not in the function, because it is not indented
\layout LyX-Code
def growone(l):
\layout LyX-Code
    'append 1 to a list l'
\layout LyX-Code
    l.append(1)
\layout LyX-Code
x = 1
\layout Standard
Note that this function does not return anything, because the append method
modifies the list that was passed in.
You should be careful when designing functions that have side effects such
as modifying the structures that are passed in; they should be named and
documented in such a way that these side effects are clear.
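\layout Standard
A quick check at the python prompt shows both effects.
The sketch restates
\family typewriter
growone
\family default
so that it is self-contained, and uses a throwaway list named
\family typewriter
data
\family default
; the list that was passed in is changed, and the function itself returns
\family typewriter
None
\layout LyX-Code
>>> def growone(l):
\layout LyX-Code
...     'append 1 to a list l'
\layout LyX-Code
...     l.append(1)
\layout LyX-Code
...
\layout LyX-Code
>>> data = [3, 2]
\layout LyX-Code
>>> result = growone(data)
\layout LyX-Code
>>> data
\color blue
# the caller's list was modified
\layout LyX-Code
[3, 2, 1]
\layout LyX-Code
>>> result is None
\color blue
# nothing was returned
\layout LyX-Code
True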
\layout Standard
Python is pretty flexible with functions: you can define functions within
function definitions (just be mindful of your indentation), you can attach
attributes to functions (like other objects), you can pass functions as
arguments to other functions.
A keyword argument gives a function parameter a default value that the
caller can override.
Below is an example which provides a normalize keyword argument.
The default argument is
\family typewriter
normalize=None
\family default
; the value None is a standard python idiom which usually means either do
the default thing or do nothing.
If
\family typewriter
normalize
\family default
is not
\family typewriter
None
\family default
, we assume it is a function that can be called to normalize our data
\layout LyX-Code
def psd(x, normalize=None):
\layout LyX-Code
    'compute the power spectral density of x'
\layout LyX-Code
    if normalize is not None: x = normalize(x)
\layout LyX-Code
\color blue
    # compute the power spectrum of x and return it
\layout Standard
This function could be called with or without a
\family typewriter
normalize
\family default
keyword argument, since if the argument is not passed, the default of
\family typewriter
None
\family default
is used and no normalization is done.
\layout LyX-Code
\layout LyX-Code
\color blue
# no normalize argument; do the default thing
\layout LyX-Code
>>> psd(x)
\layout LyX-Code
\layout LyX-Code
\color blue
# define a custom normalize function unitstd and pass it
\layout LyX-Code
\color blue
# to psd
\layout LyX-Code
>>> def unitstd(x): return x/std(x)
\layout LyX-Code
>>> psd(x, normalize=unitstd)
\layout LyX-Code
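\layout Standard
Since
\family typewriter
psd
\family default
is only outlined above, here is a smaller sketch of the same pattern that
can actually be run.
The functions
\family typewriter
total
\family default
and
\family typewriter
halve
\family default
are invented for illustration and are not part of any library;
\family typewriter
total
\family default
sums a list, and applies the
\family typewriter
normalize
\family default
function first if one is passed in
\layout LyX-Code
def total(x, normalize=None):
\layout LyX-Code
    'sum the values in x, optionally normalizing them first'
\layout LyX-Code
    if normalize is not None: x = normalize(x)
\layout LyX-Code
    return sum(x)
\layout LyX-Code
def halve(x): return [val/2.0 for val in x]
\layout LyX-Code
\layout LyX-Code
>>> total([2, 4])
\color blue
# no normalize argument
\layout LyX-Code
6
\layout LyX-Code
>>> total([2, 4], normalize=halve)
\color blue
# halve is called first
\layout LyX-Code
3.0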
\layout Standard
In Section
\begin_inset LatexCommand \ref{sec:into_calculator}
\end_inset
we noticed that complex objects have the real and imag data attributes,
and the conjugate method.
An object is an instance of a class that defines it, and in python you
can easily define your own classes.
In that section, we emphasized that one of the important features of
classes and objects is that they carry around their data and methods in a
single bundle.
Let's look at the mechanics of defining classes, and creating instances
(a.k.a.
objects) of these classes.
Classes have a special double underscore method __init__ that is used as
the function to initialize each new instance of the class.
For this example, we'll continue with the normalize theme above, but in
this case the normalization requires some data parameters.
This example arises when you want to normalize an image which may range
over 0-255 (8 bit image) or from 0-65535 (16 bit image) to the 0-1 interval.
For 16 bit images, you would normally divide everything by 65535, but to
enhance contrast you might want to configure this to a smaller number if
your data doesn't use the whole intensity range.
For simplicity, let's suppose our normalize class is only interested in
the pixel maximum, and will divide all the data by that value.
\layout LyX-Code
from __future__ import division
\color blue
# make sure we do float division
\layout LyX-Code
class Normalize:
\layout LyX-Code
    """
\layout LyX-Code
    A class to normalize data by dividing it by a maximum value
\layout LyX-Code
    """
\layout LyX-Code
    def __init__(self, maxval):
\layout LyX-Code
        'maxval will be mapped to 1'
\layout LyX-Code
        self.maxval = maxval
\layout LyX-Code
    def __call__(self, data):
\layout LyX-Code
        'do the normalization'
\layout LyX-Code
\color blue
        # in real life you would also want to clip all values of
\layout LyX-Code
\color blue
        # data>maxval so that the returned value will be in the unit
\layout LyX-Code
\color blue
        # interval
\layout LyX-Code
        return data/self.maxval
\layout Standard
The triple quoted string following the definition of class Normalize is
the class documentation string, and it will be shown to the user when
they do
\family typewriter
help(Normalize)
\family default
.
A commonly used convention is to name classes with
\shape italic
UpperCase
\shape default
, but this is not required.
self is the conventional name for the first argument of every method; it
refers to the instance itself, and gives the method access to the instance's
data and other methods.
The
\family typewriter
__init__
\family default
method stores the normalization value maxval as an instance attribute in
\family typewriter
self.maxval
\family default
, and this value can later be reused by other class methods (as it is in
\family typewriter
__call__
\family default
) and it can be altered by the user of the class, as we will illustrate below.
The
\family typewriter
__call__
\family default
method is another piece of python double underscore magic: it allows class
instances to be used as
\shape italic
functions
\shape default
, i.e., you can call them just like you can call any function.
OK, now let's see how you could use this.
\layout Standard
In the example below, the first line creates an
\shape italic
instance
\shape default
of the
\shape italic
class
\shape default
\family typewriter
Normalize
\family default
, and the special method
\family typewriter
__init__
\family default
is implicitly called.
The second line implicitly calls the special
\family typewriter
__call__
\family default
method
\layout LyX-Code
>>> norm = Normalize(65535)
\color blue
# good for 16 bit images
\layout LyX-Code
>>> norm(255)
\color blue
# call this function
\layout LyX-Code
0.0038910505836575876
\layout LyX-Code
\layout LyX-Code
\color blue
# We can reset the maxval attribute, and the call method
\layout LyX-Code
\color blue
# is automagically updated
\layout LyX-Code
>>> norm.maxval = 255
\color blue
# reset the maxval
\layout LyX-Code
>>> norm(255)
\color blue
# and call it again
\layout LyX-Code
1.0
\layout LyX-Code
\layout LyX-Code
\color blue
# We can pass the norm instance to the psd function we defined above, which
\layout LyX-Code
\color blue
# is expecting a function
\layout LyX-Code
>>> psd(x, normalize=norm)
\layout Exercise
Pretend that
\family typewriter
complex
\family default
were not built-in to the python core, and write your own complex class
\family typewriter
MyComplex
\family default
.
Provide
\family typewriter
real
\family default
and
\family typewriter
imag
\family default
attributes and the
\family typewriter
conjugate
\family default
method.
Define
\family typewriter
__abs__
\family default
,
\family typewriter
__mul__
\family default
and
\family typewriter
__add__
\family default
to implement the absolute value of complex numbers, multiplication of complex
numbers and addition of complex numbers.
See the API definition of the python number protocol; although this is
written for C programmers, it contains information about the required function
call signatures for each of the double underscore methods that define the
number protocol in python; where they use
\family typewriter
o1
\family default
on that page, you would use
\family typewriter
self
\family default
in python, and where they use
\family typewriter
o2
\family default
you might use
\family typewriter
other
\family default
in python.
\begin_inset Foot
collapsed true
\layout Standard
http://www.python.org/doc/current/api/number.html
\end_inset
To get you started, I'll show you what the
\family typewriter
__add__
\family default
method should look like
\layout LyX-Code
\color blue
# An example double underscore method required in your MyComplex
\layout LyX-Code
\color blue
# implementation
\layout LyX-Code
def __add__(self, other):
\layout LyX-Code
    'add self to other and return a new MyComplex instance'
\layout LyX-Code
    r = self.real + other.real
\layout LyX-Code
    i = self.imag + other.imag
\layout LyX-Code
    return MyComplex(r,i)
\layout LyX-Code
\layout LyX-Code
\color blue
# When you are finished, test your implementation with
\layout LyX-Code
>>> x = MyComplex(2,3)
\layout LyX-Code
>>> y = MyComplex(0,1)
\layout LyX-Code
>>> x.real
\layout LyX-Code
2.0
\layout LyX-Code
>>> y.imag
\layout LyX-Code
1.0
\layout LyX-Code
>>> x.conjugate()
\layout LyX-Code
(2-3j)
\layout LyX-Code
>>> x+y
\layout LyX-Code
(2+4j)
\layout LyX-Code
>>> x*y
\layout LyX-Code
(-3+2j)
\layout LyX-Code
>>> abs(x*y)
\layout LyX-Code
3.6055512754639891
\layout LyX-Code
\layout Section
Files and file-like objects
\begin_inset OptArg
collapsed true
\layout Standard
Files
\end_inset
\layout Standard
Working with files is one of the most common and important things we do
in scientific computing because that is usually where the data lives.
In Section
\begin_inset LatexCommand \ref{sec:intro_string}
\end_inset
, we went through the mechanics of automatically building file names like
\layout LyX-Code
data/myexp01.dat
\layout LyX-Code
data/myexp02.dat
\layout LyX-Code
data/myexp03.dat
\layout LyX-Code
data/myexp04.dat
\layout Standard
but we didn't actually do anything with these files.
Here we'll show how to read in the data and do something with it.
Python makes working with files easy and dare I say fun.
The test data set lives in
\family typewriter
data/family.csv
\family default
and is a standard comma separated value file that contains information
about my family: first name, last name, age, weight in kg, height in cm
and birthdate.
We'll open this file and parse it -- note that python has a standard module
for parsing CSV files that is much more sophisticated than what I am doing
here.
Nevertheless, it serves as an easy to understand example that is close
enough to real life that it is worth doing.
Here is what the data file looks like
\layout LyX-Code
First,Last,Age,Weight,Height,Birthday
\layout LyX-Code
John,Hunter,36,175,180,1968-03-05
\layout LyX-Code
Miriam,Sierig,33,135,177,1971-05-04
\layout LyX-Code
Rahel,Hunter,7,55,134,1998-02-25
\layout LyX-Code
Ava,Hunter,3,45,121,2001-04-26
\layout LyX-Code
Clara,Hunter,0,15,55,2004-10-02
\layout Standard
Here is the code to parse that file
\layout LyX-Code
\color blue
# open the file for reading
\layout LyX-Code
fh = file('../data/family.csv', 'r')
\layout LyX-Code
\color blue
# slurp the header, splitting on the comma
\layout LyX-Code
headers = fh.readline().split(',')
\layout LyX-Code
\color blue
# now loop over the remaining lines in the file and parse them
\layout LyX-Code
for line in fh:
\layout LyX-Code
\color blue
    # remove any leading or trailing white space
\layout LyX-Code
    line = line.strip()
\layout LyX-Code
\color blue
    # split the line on the comma into separate variables
\layout LyX-Code
    first, last, age, weight, height, dob = line.split(',')
\layout LyX-Code
\color blue
    # convert some of these strings to floats
\layout LyX-Code
    age, weight, height = [float(val) for val in (age, weight, height)]
\layout LyX-Code
    print first, last, age, weight, height, dob
\layout Standard
This example illustrates several interesting things.
The syntax for opening a file is
\family typewriter
file(filename, mode)
\family default
and the
\family typewriter
mode
\family default
is a string like
\family typewriter
'r'
\family default
or
\family typewriter
'w'
\family default
that determines whether you are opening in read or write mode.
You can also read and write binary files with
\family typewriter
'rb'
\family default
and
\family typewriter
'wb'
\family default
.
There are more options and you should do
\family typewriter
help(file)
\family default
to learn about them.
We then use the file
\family typewriter
readline
\family default
method to read in the first line of the file.
This returns a string (the line of text) and we call the string method
\family typewriter
split(',')
\family default
to split that string wherever it sees a comma, and this returns a list
of strings which are the headers
\layout LyX-Code
>>> headers
\layout LyX-Code
['First', 'Last', 'Age', 'Weight', 'Height', 'Birthday
\backslash
n']
\layout Standard
The new line character
\family typewriter
'
\backslash
n'
\family default
at the end of
\family typewriter
'Birthday
\backslash
n'
\family default
indicates we forgot to strip the string of whitespace.
To fix that, we should have done
\layout LyX-Code
>>> headers = fh.readline().strip().split(',')
\layout LyX-Code
>>> headers
\layout LyX-Code
['First', 'Last', 'Age', 'Weight', 'Height', 'Birthday']
\layout Standard
Notice how this works like a pipeline:
\family typewriter
fh.readline
\family default
returns a line of text as a string; we call the string method
\family typewriter
strip
\family default
which returns a string with all white space (spaces, tabs, newlines) removed
from the left and right; we then call the
\family typewriter
split
\family default
method on this stripped string to split it into a list of strings.
\layout Standard
Next we start to loop over the file -- this is a nice feature of python
file handles, you can iterate over them as a sequence.
We've learned our lesson about trailing newlines, so we first strip the
line with
\family typewriter
line = line.strip()
\family default
.
The rest is string processing, splitting the line on a comma as we did
for the headers, and converting the strings to numbers where appropriate
by calling
\family typewriter
float(val)
\family default
for each of
\family typewriter
age
\family default
,
\family typewriter
weight
\family default
and
\family typewriter
height
\family default
.
Notice how we use list comprehensions and tuple unpacking -- the
\family typewriter
age, weight, height = [float(val) for val in (age, weight, height)]
\family default
line -- to convert several values at once.
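\layout Standard
For comparison, here is a minimal sketch of the standard library
\family typewriter
csv
\family default
module mentioned above.
The
\family typewriter
csv.reader
\family default
function accepts any iterable that yields lines of text, such as an open
file; to keep the sketch self-contained, it is fed a short list of strings
named
\family typewriter
lines
\family default
, which is invented for illustration.
Each
\family typewriter
row
\family default
is a list of strings, already split on the comma, so the numeric fields
would still need to be converted with
\family typewriter
float
\layout LyX-Code
import csv
\layout LyX-Code
\color blue
# stand-in for the lines of a file like data/family.csv
\layout LyX-Code
lines = ['First,Last,Age', 'John,Hunter,36']
\layout LyX-Code
for row in csv.reader(lines):
\layout LyX-Code
    print(row)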
\layout Standard
Now that we have all this data, how might we store it?
We could store it in a
\family typewriter
results
\family default
list
\layout LyX-Code
results = []
\layout LyX-Code
for line in fh:
\layout LyX-Code
\color blue
    # process the line as above to get the variables
\layout LyX-Code
    results.append( (first, last, age, weight, height, dob) )
\layout LyX-Code
\layout LyX-Code
\layout LyX-Code
\color blue
# and later when we want to analyze the data
\layout LyX-Code
for first, last, age, weight, height, dob in results:
\layout LyX-Code
\color blue
    # do something with the data
\layout LyX-Code
    pass
\layout Exercise
\family typewriter
zip
\family default
magic.
Python has a nice function
\family typewriter
zip
\family default
that lets you do very useful things with lists of tuples.
\family typewriter
results
\family default
above is a list of tuples -- each tuple is the
\family typewriter
first
\family default
,
\family typewriter
last
\family default
,
\family typewriter
age
\family default
,
\family typewriter
weight
\family default
,
\family typewriter
height
\family default
,
\family typewriter
dob
\family default
for a family member.
What happens if you do
\layout LyX-Code
>>> first, last, age, weight, height, dob = zip(*results)
\layout Standard
What is
\family typewriter
age
\family default
now?
\layout Exercise
Write a class
\family typewriter
Person
\family default
and store the attributes
\family typewriter
first
\family default
,
\family typewriter
last
\family default
,
\family typewriter
age
\family default
,
\family typewriter
weight
\family default
,
\family typewriter
height
\family default
,
\family typewriter
dob
\family default
in that class.
Add a class instance to the results list, eg
\layout LyX-Code
results.append(Person(first, last, age, weight, height, dob))
\layout Standard
Python also has a special syntax for printing to an open writable file object
\layout LyX-Code
\color blue
# open the file for writing
\layout LyX-Code
outfile = file('mydata.data', 'w')
\layout LyX-Code
for x,y,z in myresults:
\layout LyX-Code
    print >> outfile, '%1.3f %1.3f %1.3f'%(x,y,z)
\layout Standard
Another really nice thing about file objects is that other classes can implement
the file protocol and allow you to use them as if they were files.
For example, the StringIO module in the standard library allows you to
read and write to strings as if they were files.
The urllib.urlopen function allows you to open a remote web page as a file
object.
Try this
\layout LyX-Code
\color blue
# loop over the lines in google's html
\layout LyX-Code
from urllib import urlopen
\layout LyX-Code
for line in urlopen('http://www.google.com').readlines():
\layout LyX-Code
    print line,
\the_end