#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass amsbook
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default

\layout Chapter

A whirlwind tour of python and the standard library
\begin_inset OptArg
collapsed true

\layout Standard

Python intro
\end_inset 


\layout Standard

This is a quick-and-dirty introduction to the python language for the impatient
 scientist.
 There are many top notch, comprehensive introductions and tutorials for
 python.
 For absolute beginners, there is the 
\shape italic 
Python Beginner's Guide
\shape default 
.
\begin_inset Foot
collapsed true

\layout Standard

http://www.python.org/moin/BeginnersGuide
\end_inset 

 The official 
\shape italic 
Python Tutorial
\shape default 
 can be read online
\begin_inset Foot
collapsed true

\layout Standard

http://docs.python.org/tut/tut.html
\end_inset 

 or downloaded
\begin_inset Foot
collapsed true

\layout Standard

http://docs.python.org/download.html
\end_inset 

 in a variety of formats.
 There are over 100 python tutorials collected online.
\begin_inset Foot
collapsed true

\layout Standard

http://www.awaretek.com/tutorials.html
\end_inset 


\layout Standard

There are also many excellent books.
 Targeting newbies is Mark Pilgrim's 
\shape italic 
Dive into Python
\shape default 
 which is available in print and for free online
\begin_inset Foot
collapsed true

\layout Standard

http://diveintopython.org/toc/index.html
\end_inset 

, though for absolute newbies even this may be too hard 
\begin_inset LatexCommand \cite{Dive}

\end_inset 

.
 For experienced programmers, David Beazley's 
\shape italic 
Python Essential Reference
\shape default 
 is an excellent introduction to python, but is a bit dated since it only
 covers python2.1 
\begin_inset LatexCommand \cite{Beasley}

\end_inset 

.
 Likewise Alex Martelli's 
\shape italic 
Python in a Nutshell
\shape default 
 is highly regarded and a bit more current -- a 2nd edition is in the works
\begin_inset LatexCommand \cite{Nutshell}

\end_inset 

.
 And 
\shape italic 
The Python Cookbook
\shape default 
 is an extremely useful collection of python idioms, tips and tricks 
\begin_inset LatexCommand \cite{Cookbook}

\end_inset 

.
\layout Standard

But the typical scientist I encounter wants to solve a specific problem,
 eg, to make a certain kind of graph, to numerically integrate an equation,
 or to fit some data to a parametric model, and doesn't have the time or
 interest to read several books or tutorials to get what they want.
 This guide is for them: a short overview of the language to help them get
 to what they want as quickly as possible.
 We get to advanced material pretty quickly, so it may be tough sledding
 if you are a python newbie.
 Take in what you can, and if you start getting dizzy, skip ahead to the
 next section; you can always come back to absorb more detail later, after
 you get your real work done.
\layout Section

Hello Python
\layout Standard

Python is a dynamically typed, object oriented, interpreted language.
 Interpreted means that your program interacts with the python interpreter,
 similar to Matlab, Perl, Tcl and Java, and unlike FORTRAN, C, or C++ which
 are compiled.
 So let's fire up the python interpreter and get started.
 I'm not going to cover installing python -- it's standard on most linux
 boxes and for windows there is a friendly GUI installer.
 To run the python interpreter, on windows, you can click 
\family typewriter 
Start->All Programs->Python 2.4->Python (command line)
\family default 
 or better yet, install 
\family typewriter 
ipython
\family default 
, a python shell on steroids, and use that.
 On linux / unix systems, you just need to type 
\family typewriter 
python
\family default 
 or 
\family typewriter 
ipython
\family default 
 at the command line.
 The 
\family typewriter 
>>>
\family default 
 is the default python shell prompt, so don't type it in the examples below
\layout LyX-Code

>>> print 'hello world'
\layout LyX-Code

hello world
\layout LyX-Code

\layout Standard

As this example shows, 
\shape italic 
hello world
\shape default 
 in python is pretty easy -- one common phrase you hear in the python community
 is that 
\begin_inset Quotes eld
\end_inset 

it fits your brain
\begin_inset Quotes erd
\end_inset 

 -- the basic idea is that coding in python feels natural.
 Compare python's version with 
\shape italic 
hello world
\shape default 
 in C++
\layout LyX-Code

// C++
\layout LyX-Code

#include <iostream>
\layout LyX-Code

int main ()
\layout LyX-Code

{   
\layout LyX-Code

  std::cout << "Hello World" << std::endl;
\layout LyX-Code

  return 0;
\layout LyX-Code

}
\layout Section


\begin_inset LatexCommand \label{sec:into_calculator}

\end_inset 

Python is a calculator
\begin_inset OptArg
collapsed true

\layout Standard

Calculator
\end_inset 


\layout Standard

Aside from my daughter's solar powered cash-register calculator, Python
 is the only calculator I use.
 From the python shell, you can type arbitrary arithmetic expressions.
\layout LyX-Code

>>> 2+2
\layout LyX-Code

4
\layout LyX-Code

>>> 2**10
\layout LyX-Code

1024
\layout LyX-Code

>>> 10/5
\layout LyX-Code

2
\layout LyX-Code

>>> 2+(24.3 + .9)/.24
\layout LyX-Code

107.0
\layout LyX-Code

>>> 2/3
\layout LyX-Code

0
\layout Standard

The last line is a standard newbie gotcha -- if both the left and right
 operands are integers, python returns an integer.
 To do floating point division, make sure at least one of the numbers is
 a float
\layout LyX-Code

>>> 2.0/3
\layout LyX-Code

0.66666666666666663
\layout Standard

The distinction between integer and floating point division is a common
 source of frustration among newbies and is slated for destruction in the
 mythical Python 3000.
\begin_inset Foot
collapsed true

\layout Standard

Python 3000 is a future python release that will clean up several things
 that Guido considers to be warts.
\end_inset 

 Since default integer division will be removed in the future, you can invoke
 the time machine with the 
\family typewriter 
from __future__
\family default 
 directives; these directives allow python programmers today to use features
 that will become standard in future releases but are not included by default
 because they would break existing code.
 In a script or module, from future directives must come before any other
 code (only comments and the docstring may precede them); otherwise python
 raises a syntax error.
 The future division operator will assume floating point division by default,
\begin_inset Foot
collapsed false

\layout Standard

You may have noticed that 2/3 was represented as 0.66666666666666663 and
 not 0.66666666666666666 as might be expected.
 This is because computers are binary calculators, and there is no exact
 binary representation of 2/3, just as there is no exact binary representation
 of 0.1
\layout LyX-Code

>>> 0.1
\layout LyX-Code

0.10000000000000001
\layout Standard

Some languages try to hide this from you, but python is explicit.
\end_inset 

and provides another operator // to do classic integer division.
\layout LyX-Code

>>> from __future__ import division
\layout LyX-Code

>>> 2/3
\layout LyX-Code

0.66666666666666663
\layout LyX-Code

>>> 2//3
\layout LyX-Code

0
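\layout Standard

If you would rather not depend on the from future directive, you can spell
 out which kind of division you mean.
 The sketch below is only an illustration: the function names 
\family typewriter 
true_divide
\family default 
 and 
\family typewriter 
floor_divide
\family default 
 are made up for this example, and 
\family typewriter 
print
\family default 
 is written with parentheses so the code runs on the python used in this
 chapter as well as on later versions.

```python
def true_divide(a, b):
    """Floating point division, even when a and b are both integers."""
    return float(a) / b


def floor_divide(a, b):
    """Integer division: the result is rounded down to a whole number."""
    return a // b


print(true_divide(2, 3))
print(floor_divide(2, 3))     # 0
print(floor_divide(-7, 2))    # -4, because // rounds toward negative infinity
```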
\layout Standard

python has four basic numeric types: int, long, float and complex, but unlike
 C++, BASIC, FORTRAN or Java, you don't have to declare these types.
 python can infer them
\layout LyX-Code

>>> type(1)
\layout LyX-Code

<type 'int'>
\layout LyX-Code

>>> type(1.0)
\layout LyX-Code

<type 'float'>
\layout LyX-Code

>>> type(2**200)
\layout LyX-Code

<type 'long'>
\layout LyX-Code

\layout Standard


\begin_inset Formula $2^{200}$
\end_inset 

is a huge number!
\layout LyX-Code

>>> 2**200
\layout LyX-Code

1606938044258990275541962092341162602522202993782792835301376L
\layout Standard

but python will blithely compute it and much larger numbers for you as long
 as you have CPU and memory to handle them.
 The integer type, if it overflows, will automatically convert to a python
 
\family typewriter 
long
\family default 
 (as indicated by the appended 
\family typewriter 
L
\family default 
 in the output above) and has no built-in upper bound on size, unlike C/C++
 longs.
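\layout Standard

A small sketch of what unbounded integers buy you, using nothing beyond
 the built-in 
\family typewriter 
str
\family default 
, 
\family typewriter 
len
\family default 
 and 
\family typewriter 
float
\family default 
 functions (the variable names are arbitrary).
 Integer arithmetic stays exact no matter how large the numbers get, whereas
 a float of the same magnitude carries only about 16 significant digits.

```python
big = 2**200
ndigits = len(str(big))
print(ndigits)                          # 61 decimal digits

# integer arithmetic is exact, however large the operands
print((big + 1) - big)                  # 1

# the same computation in floating point loses the +1 entirely
print((float(big) + 1) - float(big))    # 0.0
```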
\layout Standard

Python has built in support for complex numbers.
 Eg, we can verify 
\begin_inset Formula $i^{2}=-1$
\end_inset 

 
\layout LyX-Code

>>> x = complex(0,1)
\layout LyX-Code

>>> x*x
\layout LyX-Code

(-1+0j)
\layout Standard

To access the real and imaginary parts of a complex number, use the 
\family typewriter 
real
\family default 
 and 
\family typewriter 
imag
\family default 
 attributes
\layout LyX-Code

>>> x.real
\layout LyX-Code

0.0
\layout LyX-Code

>>> x.imag
\layout LyX-Code

1.0
\layout Standard

If you come from other languages like Matlab, the above may be new to you.
 In Matlab, you might do something like this (>> is the standard Matlab
 shell prompt)
\layout LyX-Code

>> x = 0+j
\layout LyX-Code

x =
\layout LyX-Code

   0.0000 + 1.0000i
\layout LyX-Code

\layout LyX-Code

>> real(x)
\layout LyX-Code

ans =
\layout LyX-Code

     0
\layout LyX-Code

\layout LyX-Code

>> imag(x)
\layout LyX-Code

ans =
\layout LyX-Code

     1
\layout LyX-Code

\layout LyX-Code

\layout Standard

That is, in Matlab, you use a 
\shape italic 
function
\shape default 
 to access the real and imaginary parts of the data, but in python these
 are attributes of the complex object itself.
 This is a core feature of python and other object oriented languages: an
 object carries its data and methods around with it.
 One might say: 
\begin_inset Quotes eld
\end_inset 

a complex number knows its real and imaginary parts
\begin_inset Quotes erd
\end_inset 

 or 
\begin_inset Quotes eld
\end_inset 

a complex number knows how to take its conjugate
\begin_inset Quotes erd
\end_inset 

; you don't need external functions for these operations
\layout LyX-Code

>>> x.conjugate
\layout LyX-Code

<built-in method conjugate of complex object at 0xb6a62368>
\layout LyX-Code

>>> x.conjugate()
\layout LyX-Code

-1j
\layout Standard

On the first line, I just followed along from the example above with 
\family typewriter 
real
\family default 
 and 
\family typewriter 
imag
\family default 
 and typed 
\family typewriter 
x.conjugate
\family default 
 and python printed the representation 
\family typewriter 
<built-in method conjugate of complex object at 0xb6a62368>.
 
\family default 
This means that 
\family typewriter 
conjugate
\family default 
 is a 
\shape italic 
method
\shape default 
, a.k.a. a function, and in python we need to use parentheses to call a function.
 If the method has arguments, like the 
\family typewriter 
x
\family default 
 in 
\family typewriter 
sin(x)
\family default 
, you place them inside the parentheses, and if it has no arguments, like
 
\family typewriter 
conjugate
\family default 
, you simply provide the open and closing parentheses.
 
\family typewriter 
real
\family default 
, 
\family typewriter 
imag
\family default 
 and 
\family typewriter 
conjugate
\family default 
 are attributes of the complex object, and 
\family typewriter 
conjugate
\family default 
 is a 
\shape italic 
callable
\shape default 
 attribute, known as a 
\shape italic 
method
\shape default 
.
\layout Standard

OK, now you are an object oriented programmer.
 There are several key ideas in object oriented programming, and this is
 one of them: an object carries around with it data (simple attributes)
 and methods (callable attributes) that provide additional information about
 the object and perform services.
 It's one stop shopping -- no need to go to external functions and libraries
 to deal with it -- the object knows how to deal with itself.
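\layout Standard

To make the distinction between simple and callable attributes concrete,
 here is a short sketch with a different complex number.
 The built-in 
\family typewriter 
callable
\family default 
 function is not discussed in this chapter; it is used here only to ask
 an attribute whether it needs parentheses.

```python
z = complex(3, 4)

# simple attributes hold data: no parentheses
print(z.real)                   # 3.0
print(z.imag)                   # 4.0

# callable attributes (methods) do work: parentheses required
zbar = z.conjugate()
print(zbar)                     # (3-4j)

# z times its conjugate is the squared magnitude
print(z * zbar)                 # (25+0j)
print(abs(z))                   # 5.0

# asking the attributes themselves
print(callable(z.conjugate))    # True
print(callable(z.real))         # False
```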
\layout Section

Accessing the standard library
\begin_inset OptArg
collapsed true

\layout Standard

Standard Library
\end_inset 


\layout Standard

Arithmetic is fine, but before long you may find yourself tiring of it and
 wanting to compute logarithms and exponents, sines and cosines
\layout LyX-Code

>>> log(10)
\layout LyX-Code

Traceback (most recent call last):
\layout LyX-Code

  File "<stdin>", line 1, in ?
\layout LyX-Code

NameError: name 'log' is not defined
\layout Standard

These functions are not built into python, but don't despair, they are built
 into the python standard library.
 To access a function from the standard library, or an external library
 for that matter, you must import it.
\layout LyX-Code

>>> import math
\layout LyX-Code

>>> math.log(10)
\layout LyX-Code

2.3025850929940459
\layout LyX-Code

>>> math.sin(math.pi)
\layout LyX-Code

1.2246063538223773e-16
\layout Standard

Note that the default 
\family typewriter 
log
\family default 
 function is the natural, base e, logarithm (use 
\family typewriter 
math.log10
\family default 
 for base 10 logs) and that floating point math is inherently imprecise,
 since analytically
\begin_inset Formula $\sin(\pi)=0$
\end_inset 

.
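\layout Standard

Because of this imprecision, it is usually a mistake to test floating point
 results for exact equality.
 A minimal sketch of the usual workaround follows; the helper 
\family typewriter 
nearly_equal
\family default 
 and its default tolerance are assumptions made for this example, not part
 of the standard library.

```python
import math


def nearly_equal(a, b, tol=1e-12):
    """True if a and b differ by less than the absolute tolerance tol."""
    return abs(a - b) < tol


print(math.sin(math.pi) == 0.0)                # False: off by roughly 1e-16
print(nearly_equal(math.sin(math.pi), 0.0))    # True
print(nearly_equal(math.log(math.e), 1.0))     # True: log is the natural log
```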
\layout Standard

It's kind of a pain to keep typing 
\family typewriter 
math.log
\family default 
 and 
\family typewriter 
math.sin
\family default 
 and 
\family typewriter 
math.pi
\family default 
, and python is accommodating.
 There are additional forms of 
\family typewriter 
import
\family default 
 that will let you save more or less typing depending on your desires
\layout LyX-Code


\color blue
# Abbreviate the module name: m is an alias
\layout LyX-Code

>>> import math as m
\layout LyX-Code

>>> m.cos(2*m.pi)
\layout LyX-Code

1.0
\layout LyX-Code

\layout LyX-Code


\color blue
# Import just the names you need
\layout LyX-Code

>>> from math import exp, log
\layout LyX-Code

>>> log(exp(1))
\layout LyX-Code

1.0
\layout LyX-Code

\layout LyX-Code


\color blue
# Import everything - use with caution!
\layout LyX-Code

>>> from math import *
\layout LyX-Code

>>> sin(2*pi*10)
\layout LyX-Code

-2.4492127076447545e-15
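\layout Standard

The reason for caution is that imported names can silently replace one another.
 The sketch below shows the effect with a single name; it assumes the standard
 library module 
\family typewriter 
cmath
\family default 
 (complex math), which is not otherwise covered here and which also defines
 a 
\family typewriter 
sqrt
\family default 
 function.

```python
from math import sqrt
print(sqrt(4))           # 2.0

# a later import of the same name silently replaces the earlier one
from cmath import sqrt
print(sqrt(4))           # (2+0j)

# keeping the module prefix avoids the collision
import math
import cmath
print(math.sqrt(4))      # 2.0
print(cmath.sqrt(-4))    # 2j
```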
\layout Standard

To help you learn more about what you can find in the math library, python
 has nice introspection capabilities -- introspection is a way of asking
 an object about itself.
 For example, to find out what is available in the math library, we can
 get a directory of everything available with the 
\family typewriter 
dir
\family default 
 command
\begin_inset Foot
collapsed false

\layout Standard

In addition to the introspection and help provided in the python interpreter,
 the official documentation of the python standard library is very good
 and up-to-date http://docs.python.org/lib/lib.html .
\end_inset 


\layout LyX-Code

>>> dir(math)
\layout LyX-Code

['__doc__', '__file__', '__name__', 'acos', 'asin', 'atan', 'atan2', 'ceil',
 'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'floor', 'fmod', 'frexp',
 'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow', 'radians', 'sin',
 'sinh', 'sqrt', 'tan', 'tanh']
\layout Standard

This gives us just a listing of the names that are in the math module --
 they are fairly self descriptive, but if you want more, you can call 
\family typewriter 
help
\family default 
 on any of these functions for more information
\layout LyX-Code

>>> help(math.sin) 
\layout LyX-Code

Help on built-in function sin:
\layout LyX-Code

sin(...)
\layout LyX-Code

sin(x)
\layout LyX-Code

Return the sine of x (measured in radians).
\layout Standard

and for the whole math library
\layout LyX-Code

>>> help(math) 
\layout LyX-Code

Help on module math:
\layout LyX-Code

 
\layout LyX-Code

NAME
\layout LyX-Code

    math
\layout LyX-Code

 
\layout LyX-Code

FILE
\layout LyX-Code

    /usr/local/lib/python2.3/lib-dynload/math.so
\layout LyX-Code

 
\layout LyX-Code

DESCRIPTION
\layout LyX-Code

    This module is always available.
  It provides access to the
\layout LyX-Code

    mathematical functions defined by the C standard.
\layout LyX-Code

 
\layout LyX-Code

FUNCTIONS
\layout LyX-Code

    acos(...)
\layout LyX-Code

        acos(x)
\layout LyX-Code

         
\layout LyX-Code

        Return the arc cosine (measured in radians) of x.
\layout LyX-Code

     
\layout LyX-Code

    asin(...)
\layout LyX-Code

        asin(x)
\layout LyX-Code

         
\layout LyX-Code

        Return the arc sine (measured in radians) of x.
\layout LyX-Code

     
\layout Standard

And much more which is snipped.
 Likewise, we can get information on the complex object in the same way
\layout LyX-Code

>>> x = complex(0,1)
\layout LyX-Code

>>> dir(x)
\layout LyX-Code

['__abs__', '__add__', '__class__', '__coerce__', '__delattr__', '__div__',
 '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__ge__',
 '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
 '__int__', '__le__', '__long__', '__lt__', '__mod__', '__mul__', '__ne__',
 '__neg__', '__new__', '__nonzero__', '__pos__', '__pow__', '__radd__',
 '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloor
div__', '__rmod__', '__rmul__', '__rpow__', '__rsub__', '__rtruediv__',
 '__setattr__', '__str__', '__sub__', '__truediv__', 'conjugate', 'imag',
 'real']
\layout LyX-Code

\layout Standard

Notice that we called 
\family typewriter 
dir
\family default 
 or 
\family typewriter 
help
\family default 
 on the 
\family typewriter 
math
\family default 
 
\shape italic 
module
\shape default 
, the 
\family typewriter 
math.sin
\family default 
 
\shape italic 
function
\shape default 
, and the 
\family typewriter 
complex
\family default 
 
\shape italic 
number
\shape default 
 
\family typewriter 
x
\family default 
.
 That's because modules, functions and numbers are all 
\shape italic 
objects
\shape default 
, and we use the same object introspection and help capabilities on them.
 We can find out what type of object they are by calling 
\family typewriter 
type
\family default 
 on them, which is another function in python's introspection arsenal
\layout LyX-Code

>>> type(math)
\layout LyX-Code

<type 'module'>
\layout LyX-Code

>>> type(math.sin)
\layout LyX-Code

<type 'builtin_function_or_method'>
\layout LyX-Code

>>> type(x)
\layout LyX-Code

<type 'complex'>
\layout LyX-Code
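\layout Standard

These introspection tools can be combined.
 As a rough sketch, the loop below sorts the public names in the math module
 into functions and plain data.
 It relies on two built-ins not discussed above, 
\family typewriter 
getattr
\family default 
 (look up an attribute by name) and 
\family typewriter 
callable
\family default 
, and on a for-loop and list methods that are introduced later in this
 chapter.

```python
import math

functions = []
constants = []
for name in dir(math):
    if name.startswith('_'):
        continue                      # skip the double underscore names
    if callable(getattr(math, name)):
        functions.append(name)        # callable: a function such as sin
    else:
        constants.append(name)        # not callable: data such as pi

print(constants)
print('sin' in functions)             # True
print('pi' in constants)              # True
```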

\layout Standard

Now, you may be wondering: what were all those god-awful looking double
 underscore methods, like 
\family typewriter 
__abs__ 
\family default 
and 
\family typewriter 
__mul__
\family default 
 in the 
\family typewriter 
dir
\family default 
 listing of the complex object above? These are methods that define what
 it means to be a numeric type in python, and the complex object implements
 these methods so that complex numbers act the way they should, eg 
\family typewriter 
__mul__
\family default 
 implements the rules of complex multiplication.
 The nice thing about this is that python specifies an application programming
 interface (API) that is the definition of what it means to be a number
 in python.
 And this means you can define your own numeric types, as long as you implement
 the required special double underscore methods for your custom type.
 Double underscore methods are very important in python; although the typical
 newbie never sees them or thinks about them, they are there under the hood
 providing all the python magic, and more importantly, showing the way to
 let you make magic.
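\layout Standard

As a taste of that magic, here is a toy two-dimensional vector that implements
 a handful of the double underscore methods.
 The class 
\family typewriter 
Vec2
\family default 
 is purely hypothetical and implements only a small part of the numeric
 interface; it is meant to show which operator calls which method, not
 to be a complete numeric type.

```python
import math


class Vec2(object):
    """A toy 2D vector; a made-up example, not part of python."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):       # called for  a + b
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):      # called for  a * 2
        return Vec2(self.x * scalar, self.y * scalar)

    def __abs__(self):              # called for  abs(a)
        return math.hypot(self.x, self.y)

    def __eq__(self, other):        # called for  a == b
        return self.x == other.x and self.y == other.y

    def __repr__(self):             # called when python displays a
        return 'Vec2(%r, %r)' % (self.x, self.y)


a = Vec2(3, 4)
b = Vec2(1, 1)
print(a + b)      # Vec2(4, 5)
print(a * 2)      # Vec2(6, 8)
print(abs(a))     # 5.0
```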
\layout Section


\begin_inset LatexCommand \label{sec:intro_string}

\end_inset 

Strings
\layout Standard

We've encountered a number of types of objects above: int, float, long,
 complex, method/function and module.
 We'll continue our tour with an introduction to strings, which are critical
 components of almost every program.
 You can create strings in a number of different ways, with single quotes,
 double quotes, or triple quotes -- this variety makes it easy to embed
 quote characters in the string itself
\layout LyX-Code


\color blue
# single, double and triple quoted strings
\layout LyX-Code

>>> s = 'Hi Mom!'
\layout LyX-Code

>>> s = "Hi Mom!"
\layout LyX-Code

>>> s = """Porky said, "That's all folks!" """
\layout Standard

You can add strings together to concatenate them
\layout LyX-Code


\color blue
# concatenating strings
\layout LyX-Code

>>> first = 'John'
\layout LyX-Code

>>> last = 'Hunter'
\layout LyX-Code

>>> first+last
\layout LyX-Code

'JohnHunter'
\layout Standard

or call string methods to process them: upcase them or downcase them, or
 replace one character with another
\layout LyX-Code


\color blue
# string methods
\layout LyX-Code

>>> last.lower()
\layout LyX-Code

'hunter'
\layout LyX-Code

>>> last.upper()
\layout LyX-Code

'HUNTER'
\layout LyX-Code

>>> last.replace('h', 'p')
\layout LyX-Code

'Hunter'
\layout LyX-Code

>>> last.replace('H', 'P')
\layout LyX-Code

'Punter' 
\layout Standard

Note that in all of these examples, the string 
\family typewriter 
last
\family default 
 is unchanged.
 All of these methods operate on the string and return a new string, leaving
 the original unchanged.
 In fact, python strings cannot be changed by any python code at all: they
 are 
\shape italic 
immutable
\shape default 
 (unchangeable).
 The concept of mutable and immutable objects in python is an important
 one, and it will come up again, because only immutable objects can be used
 as keys in python dictionaries and elements of python sets.
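\layout Standard

A short sketch of what immutability means in practice.
 The variable names and the dictionary are invented for the example, and
 the 
\family typewriter 
try
\family default 
 / 
\family typewriter 
except
\family default 
 statement, which catches the error python raises, is not otherwise covered
 in this section.

```python
last = 'Hunter'

# item assignment is refused: strings are immutable
try:
    last[0] = 'P'
except TypeError:
    print('cannot modify a string in place')

# string methods hand back a new string; bind it to a name to keep it
punter = last.replace('H', 'P')
print(last)        # Hunter
print(punter)      # Punter

# an immutable string works as a dictionary key, a mutable list does not
counts = {last: 1}
try:
    counts[['H', 'u']] = 2
except TypeError:
    print('a list cannot be a dictionary key')
```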
\layout Standard

You can access individual characters, or slices of the string (substrings),
 using indexing.
 A string is a sequence of characters, and strings implement the sequence
 protocol in python -- we'll see more examples of python sequences later
 -- and all sequences have the same syntax for accessing their elements.
 Python uses 0 based indexing which means the first element is at index
 0; you can use negative indices to access the last elements in the sequence
\layout LyX-Code


\color blue
# string indexing
\layout LyX-Code

>>> last = 'Hunter'
\layout LyX-Code

>>> last[0]
\layout LyX-Code

'H'
\layout LyX-Code

>>> last[1]
\layout LyX-Code

'u'
\layout LyX-Code

>>> last[-1] 
\layout LyX-Code

'r' 
\layout Standard

To access substrings, or generically in terms of the sequence protocol,
 slices, you use a colon to indicate a range
\layout LyX-Code


\color blue
# string slicing
\layout LyX-Code

>>> last[0:2]
\layout LyX-Code

'Hu'
\layout LyX-Code

>>> last[2:4]
\layout LyX-Code

'nt'
\layout Standard

As this example shows, python uses 
\begin_inset Quotes eld
\end_inset 

one-past-the-end
\begin_inset Quotes erd
\end_inset 

 indexing when defining a range; eg, in the range 
\family typewriter 
indmin:indmax
\family default 
, the element at index 
\family typewriter 
indmax
\family default 
 is not included.
 You can use negative indices when slicing too; eg, to get everything before
 the last character
\layout LyX-Code

>>> last[0:-1]
\layout LyX-Code

'Hunte'
\layout Standard

You can also leave out either the min or max indicator; if they are left
 out, 0 is assumed to be the 
\family typewriter 
indmin
\family default 
 and one past the end of the sequence is assumed to be 
\family typewriter 
indmax
\layout LyX-Code

>>> last[:3]
\layout LyX-Code

'Hun'
\layout LyX-Code

>>> last[3:]
\layout LyX-Code

'ter'
\layout Standard

There is a third number that can be placed in a slice, a step, with syntax
 indmin:indmax:step; eg, a step of 2 will skip every second letter
\layout LyX-Code

>>> last[1:6:2]
\layout LyX-Code

'utr'
\layout Standard

Although this may be more than you want to know about slicing strings, the
 time spent here is worthwhile.
 As mentioned above, all python sequences obey these rules.
 In addition to strings, lists and tuples, which are built-in python sequence
 data types and are discussed in the next section, the numeric arrays widely
 used in scientific computing also implement the sequence protocol, and
 thus have the same slicing rules.
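\layout Standard

To see that the rules really do carry over, here is a minimal sketch applying
 the same indexing and slicing expressions used on the string above to a
 list and a tuple (the contents are arbitrary).

```python
letters = ['H', 'u', 'n', 't', 'e', 'r']    # a list
numbers = (10, 20, 30, 40, 50, 60)          # a tuple

print(letters[0:2])       # ['H', 'u']
print(letters[-1])        # r
print(numbers[2:4])       # (30, 40)
print(numbers[0:-1])      # (10, 20, 30, 40, 50)
print(numbers[:3])        # (10, 20, 30)
print(numbers[3:])        # (40, 50, 60)
print(numbers[1:6:2])     # (20, 40, 60)
```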
\layout Exercise

What would you expect last[:] to return?
\layout Standard

One thing that comes up all the time is the need to create strings out of
 other strings and numbers, eg to create filenames from a combination of
 a base directory, some base filename, and some numbers.
 Scientists like to create lots of data files like this and then write code to
 loop over these files and analyze them.
 We're going to show how to do that, starting with the newbie way and progressively
 building up to the way of the python zen master.
 All of the methods below 
\shape italic 
work
\shape default 
, but the zen master way will be more efficient, more scalable (eg to larger
 numbers of files) and cross-platform.
\begin_inset Foot
collapsed false

\layout Standard


\begin_inset Quotes eld
\end_inset 

But it works
\begin_inset Quotes erd
\end_inset 

 is a common defense of bad code; my rejoinder to this is 
\begin_inset Quotes eld
\end_inset 

A computer scientist is someone who fixes things that aren't broken
\begin_inset Quotes erd
\end_inset 

.
 
\end_inset 

 Here's the newbie way: we also introduce the for-loop here in the spirit
 of diving into python -- note that python uses whitespace indentation to
 delimit the for-loop code block
\layout LyX-Code


\color blue
# The newbie way
\layout LyX-Code

for i in (1,2,3,4):
\layout LyX-Code

    fname = 'data/myexp0' + str(i) + '.dat'
\layout LyX-Code

    print fname
\layout Standard

Now as promised, this will print out the 4 file names, but it has
 three flaws: it doesn't scale to 10 or more files, it is inefficient, and
 it is not cross platform.
 It doesn't scale because it hard-codes the '
\family typewriter 
0
\family default 
' after 
\family typewriter 
myexp
\family default 
, it is inefficient because to add several strings requires the creation
 of temporary strings, and it is not cross-platform because it hard-codes
 the directory separator '/'.
\layout LyX-Code


\color blue
# On the path to enlightenment
\layout LyX-Code

for i in (1,2,3,4):
\layout LyX-Code

    fname = 'data/myexp%02d.dat'%i
\layout LyX-Code

    print fname
\layout Standard

This example uses string interpolation, the funny % thing.
 If you are familiar with C programming, this will be no surprise to you
 (on linux/unix systems do 
\family typewriter 
man sprintf 
\family default 
at the unix shell).
 The percent character is a string formatting character: 
\family typewriter 
%02d
\family default 
 means to take an integer (the 
\family typewriter 
d
\family default 
 part) and print it with two digits, padding zero on the left (the
\family typewriter 
 %02
\family default 
 part).
 There is more to be said about string interpolation, but let's finish the
 job at hand.
 This example is better than the newbie way because it scales up to files
 numbered 0-99, and it is more efficient because it avoids the creation
 of temporary strings.
 For the platform independent part, we go to the python standard library
 
\family typewriter 
os.path
\family default 
, which provides a host of functions for platform-independent manipulations
 of filenames, extensions and paths.
 Here we use 
\family typewriter 
os.path.join
\family default 
 to combine the directory with the filename in a platform independent way.
 On windows, it will use the windows path separator '
\backslash 
' and on unix it will use '/'.
\layout LyX-Code


\color blue
# the zen master approach
\layout LyX-Code

import os
\layout LyX-Code

for i in (1,2,3,4):
\layout LyX-Code

    fname = os.path.join('data', 'myexp%02d.dat'%i)
\layout LyX-Code

    print fname
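\layout Standard

The same module can take a path apart again.
 The sketch below uses 
\family typewriter 
os.path.split
\family default 
, 
\family typewriter 
os.path.splitext
\family default 
 and 
\family typewriter 
os.sep
\family default 
 from the standard library; the file number 7 is arbitrary.

```python
import os

fname = os.path.join('data', 'myexp%02d.dat' % 7)

# split into directory and file name, then file name into root and extension
dirname, basename = os.path.split(fname)
root, ext = os.path.splitext(basename)
print(dirname)      # data
print(basename)     # myexp07.dat
print(root)         # myexp07
print(ext)          # .dat

# os.sep is the separator that join used on this platform
print(fname == 'data' + os.sep + 'myexp07.dat')    # True
```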
\layout Exercise

Suppose you have data files named like
\layout LyX-Code

data/2005/exp0100.dat
\layout LyX-Code

data/2005/exp0101.dat
\layout LyX-Code

data/2005/exp0102.dat
\layout LyX-Code

...
\layout LyX-Code

data/2005/exp1000.dat
\layout Standard

Write the python code that iterates over these files, constructing the filenames
 as strings and using 
\family typewriter 
os.path.join
\family default 
 to construct the paths in a platform-independent way.
 
\shape italic 
Hint
\shape default 
: read the help for 
\family typewriter 
os.path.join
\family default 
!
\layout Standard

OK, I promised to torture you a bit more with string interpolation -- don't
 worry, I remembered.
 The ability to properly format your data when printing it is crucial in
 scientific endeavors: how many significant digits do you want, do you want
 to use integer, floating point representation or exponential notation?
 These three choices are provided with 
\family typewriter 
%d
\family default 
, 
\family typewriter 
%f
\family default 
 and 
\family typewriter 
%e
\family default 
, with lots of variations on the theme to indicate precision and more
\layout LyX-Code

>>> 'warm for %d minutes at %1.1f C' % (30, 37.5)
\layout LyX-Code

'warm for 30 minutes at 37.5 C'
\layout LyX-Code

\layout LyX-Code

>>> 'The mass of the sun is %1.4e kg'% (1.98892*10**30)
\layout LyX-Code

'The mass of the sun is 1.9889e+30 kg'
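\layout Standard

A few of those variations, applied to one arbitrary number so the effect
 of each format code is easy to compare.
 This is only a sampler, not a full account of the format syntax.

```python
value = 1234.56789

print('%d' % value)         # 1234         integer part only
print('%f' % value)         # 1234.567890  six decimals by default
print('%.2f' % value)       # 1234.57      two decimals
print('%10.2f|' % value)    #    1234.57|  right justified in 10 columns
print('%.3e' % value)       # 1.235e+03    exponential notation
print('%04d' % 7)           # 0007         zero padded to 4 digits
```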
\layout LyX-Code

\layout Standard

There are two string methods, 
\family typewriter 
split
\family default 
 and 
\family typewriter 
join
\family default 
, that arise frequently in Numeric processing, specifically in the context
 of processing data files that have comma, tab, or space separated numbers
 in them.
 
\family typewriter 
split
\family default 
 takes a single string, and splits it on the indicated character to a sequence
 of strings.
 This is useful to take a single line of space or comma separated values
 and split them into individual numbers
\layout LyX-Code


\color blue
# s is a single string and we split it into a list of strings
\layout LyX-Code


\color blue
# for further processing
\layout LyX-Code

>>> s = '1.0 2.0 3.0 4.0 5.0'
\layout LyX-Code

>>> s.split(' ')
\layout LyX-Code

['1.0', '2.0', '3.0', '4.0', '5.0']
\layout Standard

The return value, with square brackets, indicates that python has returned
 a list of strings.
 These individual strings need further processing to convert them into actual
 floats, but that is the first step.
  The conversion to floats will be discussed in the next section, when we
 learn about list comprehensions.
 The converse method is join, which is often used to create string output
 to an ASCII file from a list of numbers.
 In this case you want to join a list of numbers into a single line for
 printing to a file.
 The example below will be clearer after the next section, in which lists
 are discussed
\layout LyX-Code


\color blue
# vals is a list of floats and we convert it to a single
\layout LyX-Code


\color blue
# space separated string
\layout LyX-Code

>>> vals = [1.0, 2.0, 3.0, 4.0, 5.0]
\layout LyX-Code

>>> ' '.join([str(val) for val in vals])
\layout LyX-Code

'1.0 2.0 3.0 4.0 5.0'
\layout Standard

There are two new things in the example above.
 One, we called the join method directly on a string itself, and not on
 a variable name.
 Eg, in the previous examples, we always used the name of the object when
 accessing attributes, eg 
\family typewriter 
x.real
\family default 
 or 
\family typewriter 
s.upper()
\family default 
.
 In this example, we call the 
\family typewriter 
join
\family default 
 method on the string which is a single space.
 The second new feature is that we use a list comprehension 
\family typewriter 
[str(val) for val in vals] 
\family default 
as the argument to 
\family typewriter 
join
\family default 
.
 
\family typewriter 
join
\family default 
 requires a sequence of strings, and the list comprehension converts a list
 of floats to a list of strings.
 This can be confusing at first, so don't despair if it is.
 But it is worth bringing up early because list comprehensions are a very
 useful feature of python.
 To help elucidate, compare 
\family typewriter 
vals
\family default 
, which is a list of floats, with the conversion of 
\family typewriter 
vals
\family default 
 to a list of strings using list comprehensions in the next line
\layout LyX-Code


\color blue
# converting a list of floats to a list of strings
\layout LyX-Code

>>> vals
\layout LyX-Code

[1.0, 2.0, 3.0, 4.0, 5.0]
\layout LyX-Code

>>> [str(val) for val in vals] 
\layout LyX-Code

['1.0', '2.0', '3.0', '4.0', '5.0']
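\layout Standard

As a minimal sketch of the round trip described above, the script below splits
 a line of text, converts each piece to a float, and joins the values back
 into a single string.
 It is written as a plain script rather than an interpreter session, and
 the names tok and line are illustrative choices only.

```python
# split a space separated line into strings, then convert each to a float
s = '1.0 2.0 3.0 4.0 5.0'
vals = [float(tok) for tok in s.split(' ')]

# convert the floats back to strings and join them with a single space
line = ' '.join([str(val) for val in vals])
```

For these particular values the joined line is identical to the original
 string, which makes the two methods easy to check against each other.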
\layout Section

The basic python data structures
\begin_inset OptArg
collapsed true

\layout Standard

Data Structures
\end_inset 


\layout Standard

Strings, covered in the last section, are sequences of characters.
 Python has two additional built-in sequence types which can hold arbitrary
 elements: tuples and lists.
 Tuples are created using parentheses, and lists are created using square
 brackets
\layout LyX-Code


\color blue
# a tuple and a list of elements of the same type
\layout LyX-Code


\color blue
# (homogeneous)
\layout LyX-Code

>>> t = (1,2,3,4)  # tuple
\layout LyX-Code

>>> l = [1,2,3,4]  # list
\layout Standard

Both tuples and lists can also be used to hold elements of different types
\layout LyX-Code


\color blue
# a tuple and list of int, string, float
\layout LyX-Code

>>> t = (1,'john', 3.0)
\layout LyX-Code

>>> l = [1,'john', 3.0]
\layout Standard

Tuples and lists have the same indexing and slicing rules as each other,
 and as the strings discussed above, because both implement the python sequence
 protocol, with the only difference being that tuple slices return tuples
 (indicated by the parentheses below) and list slices return lists (indicated
 by the square brackets)
\layout LyX-Code

# indexing and slicing tuples and lists
\layout LyX-Code

>>> t[0]
\layout LyX-Code

1
\layout LyX-Code

>>> l[0]
\layout LyX-Code

1
\layout LyX-Code

>>> t[:-1]
\layout LyX-Code

(1, 'john')
\layout LyX-Code

>>> l[:-1]
\layout LyX-Code

[1, 'john']
\layout Standard

So why the difference between tuples and lists? A number of explanations
 have been offered on the mailing lists, but the only one that makes a differenc
e to me is that tuples are immutable, like strings, and hence can be used
 as keys to python dictionaries and included as elements of sets, and lists
 are mutable, and cannot.
 So a tuple, once created, can never be changed, but a list can.
 For example, if we try to reassign the first element of the tuple above,
 we get an error
\layout LyX-Code

>>> t[0] = 'why not?'
\layout LyX-Code

Traceback (most recent call last):
\layout LyX-Code

 File "<stdin>", line 1, in ?
\layout LyX-Code

TypeError: object doesn't support item assignment
\layout Standard

But the same operation is perfectly acceptable for lists
\layout LyX-Code

>>> l[0] = 'why not?'
\layout LyX-Code

>>> l
\layout LyX-Code

['why not?', 'john', 3.0]
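\layout Standard

The practical consequence of immutability mentioned above, that tuples can
 be dictionary keys and lists cannot, can be sketched as follows.
 The names point, lookup and list_key_ok are hypothetical and chosen only
 for this illustration.

```python
# a tuple is immutable, so it can be used as a dictionary key
point = (1, 2)
lookup = {point: 'a tuple key works'}

# a list is mutable, so python refuses to use it as a key
try:
    lookup[[1, 2]] = 'a list key does not'
    list_key_ok = True
except TypeError:
    list_key_ok = False
```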
\layout Standard

Lists also have a lot of methods; tuples have none, save the special double
 underscore methods that are required for python objects and sequences
\layout LyX-Code


\color blue
# tuples contain only 
\begin_inset Quotes eld
\end_inset 

hidden
\begin_inset Quotes erd
\end_inset 

 double underscore methods
\layout LyX-Code

>>> dir(t)
\layout LyX-Code

['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
 '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__',
 '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
 '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
 '__rmul__', '__setattr__', '__str__']
\layout LyX-Code

\layout LyX-Code


\color blue
# but lists contain other methods, eg append, extend and
\layout LyX-Code


\color blue
# reverse
\layout LyX-Code

>>> dir(l)
\layout LyX-Code

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delsli
ce__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__',
 '__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__',
 '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__',
 '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__setite
m__', '__setslice__', '__str__', 'append', 'count', 'extend', 'index', 'insert',
 'pop', 'remove', 'reverse', 'sort']
\layout Standard

Many of these list methods change, or mutate, the list, eg 
\family typewriter 
append
\family default 
 adds an element to the list, 
\family typewriter 
extend
\family default 
 extends the list with a sequence of elements, 
\family typewriter 
sort
\family default 
 sorts the list in place, 
\family typewriter 
reverse
\family default 
 reverses it in place, 
\family typewriter 
pop
\family default 
 takes an element off the list and returns it.
\layout Standard

We've seen a couple of examples of creating a list above -- let's look at
 some more using list methods
\layout LyX-Code

>>> x = []                   
\color blue
# create the empty list
\layout LyX-Code

>>> x.append(1)              
\color blue
# add the integer one to it
\layout LyX-Code

>>> x.extend(['hi', 'mom'])  
\color blue
# append two strings to it
\layout LyX-Code

>>> x
\layout LyX-Code

[1, 'hi', 'mom']
\layout LyX-Code

>>> x.reverse()              
\color blue
# reverse the list, in place
\layout LyX-Code

>>> x
\layout LyX-Code

['mom', 'hi', 1]
\layout LyX-Code

>>> len(x)
\layout LyX-Code

3
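\layout Standard

The session above does not exercise sort or pop, so here is a small sketch
 of those two methods alongside extend and reverse.
 The starting values are arbitrary, and the comments show the state of the
 list after each call.

```python
x = [3, 1, 2]
x.sort()             # sorts in place: x is now [1, 2, 3]
last = x.pop()       # removes and returns the final element, 3
x.extend([10, 20])   # x is now [1, 2, 10, 20]
x.reverse()          # reverses in place: x is now [20, 10, 2, 1]
```

Note that sort and reverse return None rather than a new list, because they
 modify the list they are called on.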
\layout Standard

We mentioned list comprehensions in the last section when discussing string
 methods.
  List comprehensions are a way of creating a list using a for loop in a
 single line of python.
 Let's create a list of the perfect cubes of the integers from 1 to 9, first
 with a for loop and then with a list comprehension.
 The list comprehension code will not only be shorter and more elegant,
 it can also be faster (the dots are the indentation block indicator from
 the python shell and should not be typed)
\layout LyX-Code


\color blue
# a list of perfect cubes using a for-loop
\layout LyX-Code

>>> cubes = []
\layout LyX-Code

>>> for i in range(1,10):
\layout LyX-Code

...
     cubes.append(i**3)
\layout LyX-Code

...
 
\layout LyX-Code

>>> cubes
\layout LyX-Code

[1, 8, 27, 64, 125, 216, 343, 512, 729]
\layout LyX-Code

\layout LyX-Code


\color blue
# functionally equivalent code using list comprehensions
\layout LyX-Code

>>> cubes = [i**3 for i in range(1,10)]
\layout LyX-Code

>>> cubes
\layout LyX-Code

[1, 8, 27, 64, 125, 216, 343, 512, 729]
\layout Standard

The list comprehension code is usually faster, but not because the whole
 loop runs at the C level.
  In both versions the python interpreter evaluates the cube of 
\family typewriter 
i
\family default 
 for each element of the loop.
 The for-loop version must also look up and call the append method on every
 iteration, whereas the list comprehension adds each element to the new list
 with less overhead.
  The difference in speed is often measurable, and the list comprehension
 example is shorter and more elegant to boot.
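\layout Standard

One way to check the speed difference on your own machine is the timeit
 module from the standard library.
 The sketch below assumes a python version in which timeit.timeit accepts
 a function object; the function names and the repeat count of 10000 are
 arbitrary choices, and no timings are quoted here because they depend on
 the machine and the python version.

```python
import timeit

def cubes_loop():
    'build the list of cubes with an explicit for-loop'
    cubes = []
    for i in range(1, 10):
        cubes.append(i**3)
    return cubes

def cubes_comprehension():
    'build the same list with a list comprehension'
    return [i**3 for i in range(1, 10)]

# total time in seconds for 10000 calls of each function
t_loop = timeit.timeit(cubes_loop, number=10000)
t_comp = timeit.timeit(cubes_comprehension, number=10000)
```

Comparing t_loop with t_comp gives a rough measure of how much the comprehension
 saves for a list this small.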
\layout Standard

The remaining essential built-in data structure in python is the dictionary,
 which is an associative array that maps arbitrary immutable objects to
 arbitrary objects.
 int, long, float, string and tuple are all immutable and can be used as
 keys to a dictionary; list and dict are mutable and cannot.
 A dictionary takes one kind of object as the key, and this key points to
 another object which is the value.
 In a contrived but easy to comprehend example, one might map names to
 ages
\layout LyX-Code

>>> ages = {}            
\color blue
# create an empty dict
\layout LyX-Code

>>> ages['john'] = 36
\layout LyX-Code

>>> ages['fernando'] = 33
\layout LyX-Code

>>> ages                 
\color blue
# view the whole dict
\layout LyX-Code

{'john': 36, 'fernando': 33}
\layout LyX-Code

>>> ages['john']
\layout LyX-Code

36
\layout LyX-Code

>>> ages['john'] = 37    
\color blue
# reassign john's age
\layout LyX-Code

>>> ages['john']
\layout LyX-Code

37
\layout Standard

Dictionary lookup is very fast; Tim Peters once joked that any python program
 which uses a dictionary is automatically 10 times faster than any C program,
 which is of course false, but makes two worthy points in jest: dictionary
 lookup is fast, and dictionaries can be used for important optimizations,
 eg, creating a cache of frequently used values.
 As a simple example, suppose you needed to compute the cubes of numbers
 between 1 and 100 in an inner loop -- you could use a dictionary
 to cache the cubes of all odd numbers < 100; if you were interested
 in all numbers, you might simply use a list to store the cached cubes --
 I am caching only the odd numbers to show you how a dictionary can be
 used to represent a sparse data structure
\layout LyX-Code

\layout LyX-Code

>>> cubes = dict([ ( i, i**3 ) for i in range(1,100,2)])
\layout LyX-Code

>>> cubes[5]
\layout LyX-Code

125
\layout Standard

The last example is syntactically a bit challenging, but bears careful study.
  We are initializing a dictionary with a list comprehension.
  The list comprehension is made up of length 2 tuples 
\family typewriter 
( i, i**3 )
\family default 
.
  When a dictionary is initialized with a sequence of length 2 tuples, it
 assumes the first element of the tuple
\family typewriter 
 i
\family default 
 is the 
\shape italic 
key
\shape default 
 and the second element 
\family typewriter 
i**3
\family default 
 is the 
\shape italic 
value
\shape default 
.
  Thus we have a lookup table from odd integers to their cubes.
  Creating dictionaries from list comprehensions as in this example is something
 that hard-core python programmers do almost every day, and you should too.
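\layout Standard

To show how such a sparse cache might be used inside a loop, here is a small
 sketch.
 The helper function cube is hypothetical; it looks in the cache first and
 falls back to computing the value when the key is missing.

```python
# sparse cache: cubes of the odd integers below 100
cubes = dict([(i, i**3) for i in range(1, 100, 2)])

def cube(n):
    'return n cubed, using the cached value when one exists'
    if n in cubes:
        return cubes[n]
    return n**3

cached = cube(5)      # found in the cache
computed = cube(4)    # even, so not cached and computed directly
```

The test n in cubes is itself a dictionary lookup, so checking the cache
 is cheap.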
\layout Exercise

Create a lookup table of the product of all pairs of numbers less than 100.
 The key will be a tuple of the two numbers 
\family typewriter 
(i,j)
\family default 
 and the value will be the product.
 Hint: you can loop over multiple ranges in a list comprehension, eg 
\family typewriter 
[ something for i in range(Ni) for j in range(Nj)]
\layout Section

The Zen of Python
\begin_inset OptArg
collapsed true

\layout Standard

Zen
\end_inset 


\layout Exercise


\family typewriter 
>>> import this
\layout Section

Functions and classes
\layout Standard

You can define functions just about anywhere in python code.
 The typical function definition takes zero or more arguments, zero or more
 keyword arguments, and is followed by a documentation string and the function
 body, which optionally returns a value.
 Here is a function to compute the hypotenuse of a right triangle
\layout LyX-Code

def hypot(base, height):
\layout LyX-Code

   'compute the hypotenuse of a right triangle'
\layout LyX-Code

   import math
\layout LyX-Code

   return math.sqrt(base**2 + height**2)
\layout Standard

As in the case of the for-loop, leading white space is significant and is
 used to delimit the start and end of the function.
 In the example below, x = 1 is not in the function, because it is not indented
\layout LyX-Code

def growone(l):
\layout LyX-Code

   'append 1 to a list l'
\layout LyX-Code

   l.append(1)
\layout LyX-Code

x = 1
\layout Standard

Note that this function does not return anything, because the append method
 modifies the list that was passed in.
 You should be careful when designing functions that have side effects such
 as modifying the structures that are passed in; they should be named and
 documented in such a way that these side effects are clear.
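\layout Standard

As a sketch of the distinction drawn above, compare growone with a variant
 that leaves its argument alone.
 The second function, with_one, is a hypothetical name introduced only for
 this comparison.

```python
def growone(l):
    'append 1 to the list l in place; nothing is returned'
    l.append(1)

def with_one(l):
    'return a new list equal to l plus a trailing 1; l is left untouched'
    return l + [1]

data = [5, 6]
copy = with_one(data)    # data is still [5, 6]
result = growone(data)   # data is now [5, 6, 1] and result is None
```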
\layout Standard

Python is pretty flexible with functions: you can define functions within
 function definitions (just be mindful of your indentation), you can attach
 attributes to functions (like other objects), you can pass functions as
 arguments to other functions.
 A function keyword argument defines a default value for a function that
 can be overridden.
 Below is an example which provides a normalize keyword argument.
 The default argument is 
\family typewriter 
normalize=None
\family default 
; the value None is a standard python idiom which usually means either do
 the default thing or do nothing.
 If 
\family typewriter 
normalize
\family default 
 is not 
\family typewriter 
None
\family default 
, we assume it is a function that can be called to normalize our data
\layout LyX-Code

def psd(x, normalize=None):
\layout LyX-Code

    'compute the power spectral density of x'
\layout LyX-Code

    if normalize is not None: x = normalize(x)
\layout LyX-Code

   
\color blue
 # compute the power spectra of x and return it
\layout Standard

This function could be called with or without a 
\family typewriter 
normalize
\family default 
 keyword argument, since if the argument is not passed, the default of 
\family typewriter 
None
\family default 
 is used and no normalization is done.
\layout LyX-Code

\layout LyX-Code


\color blue
# no normalize argument; do the default thing
\layout LyX-Code

>>> psd(x)   
\layout LyX-Code

\layout LyX-Code


\color blue
# define a custom normalize function unitstd and pass it
\layout LyX-Code


\color blue
# to psd
\layout LyX-Code

>>> def unitstd(x): return x/std(x)
\layout LyX-Code

>>> psd(x, normalize=unitstd)
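\layout Standard

Because the body of psd is not spelled out above, here is a runnable sketch
 of the same keyword argument pattern using a deliberately simple stand-in.
 The functions total_power and unitmax are hypothetical and are not the power
 spectral density computation itself; they only show how an optional function
 argument is applied.

```python
def total_power(x, normalize=None):
    'hypothetical stand-in for psd: return the sum of squares of x'
    if normalize is not None:
        x = normalize(x)
    return sum([val**2 for val in x])

def unitmax(x):
    'illustrative normalization: divide every element by the largest one'
    biggest = float(max(x))
    return [val / biggest for val in x]

raw = total_power([1, 2, 3])                       # no normalization
scaled = total_power([1, 2, 3], normalize=unitmax) # normalized first
```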
\layout LyX-Code

\layout Standard

In Section
\begin_inset LatexCommand \ref{sec:into_calculator}

\end_inset 

 we noticed that complex objects have the real and imag data attributes,
 and the conjugate method.
 An object is an instance of a class that defines it, and in python you
 can easily define your own classes.
 In that section, we emphasized that one of the important features of
 classes/objects is that they carry around their data and methods in a single
 bundle.
 Let's look at the mechanics of defining classes, and creating instances
 (a.k.a.
 objects) of these classes.
 Classes have a special double underscore method __init__ that is used as
 the function to initialize each new instance of the class.
 For this example, we'll continue with the normalize theme above, but in
 this case the normalization requires some data parameters.
 This example arises when you want to normalize an image which may range
 over 0-255 (8 bit image) or from 0-65535 (16 bit image) to the 0-1 interval.
 For 16 bit images, you would normally divide everything by 65535, but you
 might want to configure this to a smaller number to enhance contrast if
 your data doesn't use the whole intensity range.
 For simplicity, let's suppose our normalize class is only interested in
 the pixel maximum, and will divide all the data by that value.
\layout LyX-Code

from __future__ import division  
\color blue
# make sure we do float division
\layout LyX-Code

class Normalize:
\layout LyX-Code

    """
\layout LyX-Code

    A class to normalize data by dividing it by a maximum value
\layout LyX-Code

    """
\layout LyX-Code

    def __init__(self, maxval):
\layout LyX-Code

        'maxval will be mapped to 1'
\layout LyX-Code

        self.maxval = maxval
\layout LyX-Code

    def __call__(self, data):
\layout LyX-Code

        'do the normalization'
\layout LyX-Code

        
\color blue
# in real life you would also want to clip all values of
\layout LyX-Code


\color blue
        # data>maxval so that the returned value will be in the unit
\layout LyX-Code


\color blue
        # interval
\layout LyX-Code

        return data/self.maxval
\layout Standard

The triple quoted string following the definition of class Normalize is
 the class documentation string, and it will be shown to the user when
 they do 
\family typewriter 
help(Normalize)
\family default 
.
 A commonly used convention is to name classes with 
\shape italic 
UpperCase
\shape default 
, but this is not required.
 self is a special variable that a class can use to refer to its own data
 and methods, and must be the first argument to all the class methods.
 The 
\family typewriter 
__init__
\family default 
 method stores the normalization value maxval as an instance attribute in 
\family typewriter 
self.maxval
\family default 
, and this value can later be reused by other class methods (as it is in
 
\family typewriter 
__call__
\family default 
) and it can be altered by the user of the class, as we will illustrate below.
 The 
\family typewriter 
__call__
\family default 
 method is another piece of python double underscore magic; it allows class
 instances to be used as 
\shape italic 
functions
\shape default 
, ie you can call them just like you can call any function.
 OK, now let's see how you could use this.
 
\layout Standard

The first line is used to create an 
\shape italic 
instance
\shape default 
 of the 
\shape italic 
class
\shape default 
 
\family typewriter 
Normalize
\family default 
, and the special method 
\family typewriter 
__init__
\family default 
 is implicitly called.
 The second line implicitly calls the special 
\family typewriter 
__call__
\family default 
 method
\layout LyX-Code

>>> norm = Normalize(65356) 
\color blue
# good for 16 bit images
\layout LyX-Code

>>> norm(255)               
\color blue
# call this function
\layout LyX-Code

0.0039017075708427688
\layout LyX-Code

\layout LyX-Code


\color blue
# We can reset the maxval attribute, and the call method 
\layout LyX-Code


\color blue
# is automagically updated
\layout LyX-Code

>>> norm.maxval = 255       
\color blue
# reset the maxval
\layout LyX-Code

>>> norm(255)               
\color blue
# and call it again
\layout LyX-Code

1.0
\layout LyX-Code

\layout LyX-Code


\color blue
# We can pass the norm instance to the psd function we defined above, which
 
\layout LyX-Code


\color blue
# is expecting a function
\layout LyX-Code

>>> psd(x, normalize=norm)
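\layout Standard

The comment inside __call__ above notes that real code would also clip values
 larger than maxval.
 A minimal sketch of that idea for a single number follows; the class name
 ClippedNormalize is hypothetical, and array data would need an element-wise
 clip instead of min.

```python
class ClippedNormalize:
    """
    Normalize a number by maxval, clipping anything above maxval to 1
    """
    def __init__(self, maxval):
        'maxval will be mapped to 1'
        self.maxval = maxval

    def __call__(self, data):
        'clip data to maxval, then divide by maxval'
        clipped = min(data, self.maxval)
        return float(clipped) / self.maxval

norm = ClippedNormalize(255)
inside = norm(51)       # within range, scaled normally
outside = norm(1000)    # above maxval, clipped to the top of the interval
```

Converting to float before dividing keeps the result a float even where
 integer division would otherwise apply.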
\layout Exercise

Pretend that 
\family typewriter 
complex
\family default 
 were not built-in to the python core, and write your own complex class
 
\family typewriter 
MyComplex
\family default 
.
 Provide 
\family typewriter 
real
\family default 
 and 
\family typewriter 
imag
\family default 
 attributes and the 
\family typewriter 
conjugate
\family default 
 method.
 Define
\family typewriter 
 __abs__
\family default 
, 
\family typewriter 
__mul__
\family default 
 and 
\family typewriter 
__add__
\family default 
 to implement the absolute value of complex numbers, multiplication of complex
 numbers and addition of complex numbers.
 See the API definition of the python number protocol; although this is
 written for C programmers, it contains information about the required function
 call signatures for each of the double underscore methods that define the
 number protocol in python; where they use 
\family typewriter 
o1
\family default 
 on that page, you would use 
\family typewriter 
self
\family default 
 in python, and where they use 
\family typewriter 
o2
\family default 
 you might use 
\family typewriter 
other
\family default 
 in python.
\begin_inset Foot
collapsed true

\layout Standard

http://www.python.org/doc/current/api/number.html
\end_inset 

 To get you started, I'll show you what the 
\family typewriter 
__add__
\family default 
 method should look like
\layout LyX-Code


\color blue
# An example double underscore method required in your MyComplex
\layout LyX-Code


\color blue
# implementation
\layout LyX-Code

def __add__(self, other):
\layout LyX-Code

    'add self to other and return a new MyComplex instance'
\layout LyX-Code

    r = self.real + other.real
\layout LyX-Code

    i = self.imag + other.imag
\layout LyX-Code

    return MyComplex(r,i)
\layout LyX-Code

\layout LyX-Code


\color blue
# When you are finished, test your implementation with 
\layout LyX-Code

>>> x = MyComplex(2,3)
\layout LyX-Code

>>> y = MyComplex(0,1)
\layout LyX-Code

>>> x.real
\layout LyX-Code

2.0
\layout LyX-Code

>>> y.imag
\layout LyX-Code

1.0
\layout LyX-Code

>>> x.conjugate()
\layout LyX-Code

(2-3j)
\layout LyX-Code

>>> x+y
\layout LyX-Code

(2+4j)
\layout LyX-Code

>>> x*y
\layout LyX-Code

(-3+2j)
\layout LyX-Code

>>> abs(x*y)
\layout LyX-Code

3.6055512754639891
\layout LyX-Code

\layout Section

Files and file like objects
\begin_inset OptArg
collapsed true

\layout Standard

Files
\end_inset 


\layout Standard

Working with files is one of the most common and important things we do
 in scientific computing because that is usually where the data lives.
 In Section
\begin_inset LatexCommand \ref{sec:intro_string}

\end_inset 

, we went through the mechanics of automatically building file names like
\layout LyX-Code

data/myexp01.dat
\layout LyX-Code

data/myexp02.dat
\layout LyX-Code

data/myexp03.dat
\layout LyX-Code

data/myexp04.dat
\layout Standard

but we didn't actually do anything with these files.
 Here we'll show how to read in the data and do something with it.
 Python makes working with files easy and dare I say fun.
 The test data set lives in 
\family typewriter 
data/family.csv
\family default 
 and is a standard comma separated value file that contains information
 about my family: first name, last name, age, weight in kg, height in cm
 and birthdate.
 We'll open this file and parse it -- note that python has a standard module
 for parsing CSV files that is much more sophisticated than what I am doing
 here.
 Nevertheless, it serves as an easy to understand example that is close
 enough to real life that it is worth doing.
 Here is what the data file looks like
\layout LyX-Code

First,Last,Age,Weight,Height,Birthday
\layout LyX-Code

John,Hunter,36,175,180,1968-03-05
\layout LyX-Code

Miriam,Sierig,33,135,177,1971-05-04
\layout LyX-Code

Rahel,Hunter,7,55,134,1998-02-25
\layout LyX-Code

Ava,Hunter,3,45,121,2001-04-26
\layout LyX-Code

Clara,Hunter,0,15,55,2004-10-02
\layout Standard

Here is the code to parse that file
\layout LyX-Code


\color blue
# open the file for reading
\layout LyX-Code

fh = file('../data/family.csv', 'r')
\layout LyX-Code


\color blue
# slurp the header, splitting on the comma
\layout LyX-Code

headers = fh.readline().split(',')
\layout LyX-Code


\color blue
# now loop over the remaining lines in the file and parse them
\layout LyX-Code

for line in fh:
\layout LyX-Code

    
\color blue
# remove any leading or trailing white space
\layout LyX-Code

    line = line.strip()
\layout LyX-Code

    
\color blue
# split the line on the comma into separate variables
\layout LyX-Code

    first, last, age, weight, height, dob = line.split(',')
\layout LyX-Code

    
\color blue
# convert some of these strings to floats
\layout LyX-Code

    age, weight, height = [float(val) for val in (age, weight, height)]
\layout LyX-Code

    print first, last, age, weight, height, dob
\layout Standard

This example illustrates several interesting things.
 The syntax for opening a file is 
\family typewriter 
file(filename, mode)
\family default 
 and the 
\family typewriter 
mode
\family default 
 is a string like 
\family typewriter 
'r'
\family default 
 or 
\family typewriter 
'w'
\family default 
 that determines whether you are opening in read or write mode.
 You can also read and write binary files with 
\family typewriter 
'rb'
\family default 
 and
\family typewriter 
 'wb'
\family default 
.
 There are more options and you should do 
\family typewriter 
help(file)
\family default 
 to learn about them.
 We then use the file 
\family typewriter 
readline
\family default 
 method to read in the first line of the file.
 This returns a string (the line of text) and we call the string method
 
\family typewriter 
split(',')
\family default 
 to split that string wherever it sees a comma, and this returns a list
 of strings which are the headers
\layout LyX-Code

>>> headers
\layout LyX-Code

['First', 'Last', 'Age', 'Weight', 'Height', 'Birthday
\backslash 
n']
\layout Standard

The new line character 
\family typewriter 
'
\backslash 
n'
\family default 
 at the end of 
\family typewriter 
'Birthday
\backslash 
n'
\family default 
 indicates we forgot to strip the string of whitespace.
 To fix that, we should have done
\layout LyX-Code

>>> headers = fh.readline().strip().split(',')
\layout LyX-Code

>>> headers
\layout LyX-Code

['First', 'Last', 'Age', 'Weight', 'Height', 'Birthday'] 
\layout Standard

Notice how this works like a pipeline: 
\family typewriter 
fh.readline 
\family default 
returns a line of text as a string; we call the string method 
\family typewriter 
strip
\family default 
 which returns a string with all white space (spaces, tabs, newlines) removed
 from the left and right; we then call the 
\family typewriter 
split
\family default 
 method on this stripped string to split it into a list of strings.
\layout Standard

Next we start to loop over the file -- this is a nice feature of python
 file handles, you can iterate over them as a sequence.
 We've learned our lesson about trailing newlines, so we first strip the
 line with 
\family typewriter 
line = line.strip()
\family default 
.
 The rest is string processing, splitting the line on a comma as we did
 for the headers, and converting the strings to numbers where appropriate
 by calling 
\family typewriter 
float(val)
\family default 
 for each of 
\family typewriter 
age
\family default 
, 
\family typewriter 
weight
\family default 
 and 
\family typewriter 
height
\family default 
.
 Notice how we use list comprehensions and tuple unpacking in the line 
\family typewriter 
age, weight, height = [float(val) for val in (age, weight, height)] 
\family default 
to convert several values at once.
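\layout Standard

If you do not have the data file at hand, the same parsing steps can be tried
 on an in-memory file-like object.
 The sketch below assumes the io module is available and uses io.StringIO
 in place of the real file; the variable text simply repeats the first rows
 of the data shown above.

```python
import io

# the header and first three rows of the data file, held in a string
text = u"""First,Last,Age,Weight,Height,Birthday
John,Hunter,36,175,180,1968-03-05
Miriam,Sierig,33,135,177,1971-05-04
Rahel,Hunter,7,55,134,1998-02-25
"""

fh = io.StringIO(text)   # behaves like an open file
headers = fh.readline().strip().split(',')

results = []
for line in fh:
    line = line.strip()
    if not line:
        continue   # skip blank lines
    first, last, age, weight, height, dob = line.split(',')
    age, weight, height = [float(val) for val in (age, weight, height)]
    results.append((first, last, age, weight, height, dob))
```

Each entry of results is a tuple holding two strings, three floats and the
 birthday string, in the same order as the headers.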
\layout Standard

Now that we have all this data, how might we store it?
 We could store it in a 
\family typewriter 
results
\family default 
 list
\layout LyX-Code

results = []
\layout LyX-Code

for line in fh:
\layout LyX-Code

    
\color blue
# process the line as above to get the variables
\layout LyX-Code

    results.append( (first, last, age, weight, height, dob) )
\layout LyX-Code

\layout LyX-Code

\layout LyX-Code


\color blue
# and later when we want to analyze the data
\layout LyX-Code

for first, last, age, weight, height, dob in results:
\layout LyX-Code

    
\color blue
# do something with the data
\layout Exercise


\family typewriter 
zip
\family default 
 magic.
  Python has a nice function 
\family typewriter 
zip
\family default 
 that lets you do very useful things with lists of tuples.
  
\family typewriter 
results
\family default 
 above is a list of tuples -- each tuple is the 
\family typewriter 
first
\family default 
, 
\family typewriter 
last
\family default 
, 
\family typewriter 
age
\family default 
, 
\family typewriter 
weight
\family default 
, 
\family typewriter 
height
\family default 
, 
\family typewriter 
dob
\family default 
 for a family member.
  What happens if you do 
\layout LyX-Code

>>> first, last, age, weight, height, dob = zip(*results)
\layout Standard

What is 
\family typewriter 
age
\family default 
 now?
\layout Exercise

Write a class 
\family typewriter 
Person
\family default 
 and store the attributes 
\family typewriter 
first
\family default 
, 
\family typewriter 
last
\family default 
, 
\family typewriter 
age
\family default 
, 
\family typewriter 
weight
\family default 
, 
\family typewriter 
height
\family default 
, 
\family typewriter 
dob
\family default 
 in that class.
  Add a class instance to the results list, eg
\layout LyX-Code

results.append(Person(first, last, age, weight, height, dob))
\layout Standard

Python also has a special syntax for printing to an open writable file object
\layout LyX-Code


\color blue
# open the file for writing
\layout LyX-Code

outfile = file('mydata.data', 'w') 
\layout LyX-Code

for x,y,z in myresults:
\layout LyX-Code

    print >> outfile, '%1.3f %1.3f %1.3f'%(x,y,z)
\layout Standard

Another really nice thing about file objects is that other classes can implement
 the file protocol and allow you to use them as if they were files.
 For example, the StringIO module in the standard library allows you to
 read and write to strings as if they were files.
 The urllib.urlopen function allows you to open a remote web page as a file
 object.
 Try this
\layout LyX-Code


\color blue
# loop over the lines in google's html
\layout LyX-Code

from urllib import urlopen
\layout LyX-Code

for line in urlopen('http://www.google.com').readlines():
\layout LyX-Code

    print line,
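\layout Standard

The file protocol can also be tried without touching the disk or the network.
 The sketch below assumes a python version that provides the io module and
 the print function with a file argument, which differs from the print >>
 syntax used above; myresults holds made-up values for illustration.

```python
import io

# made-up rows of three numbers each
myresults = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]

# an in-memory object that can be written to like a file
outfile = io.StringIO()
for x, y, z in myresults:
    print('%1.3f %1.3f %1.3f' % (x, y, z), file=outfile)

# retrieve everything that was written
contents = outfile.getvalue()

# read the text back, one line at a time, as if from a file
lines = [line.strip() for line in io.StringIO(contents)]
```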
\the_end