Python&Fortran SIO
Peter Shearer
Scripps Institution of Oceanography
University of California, San Diego
1 Introduction
1.1 Scientific Computing at SIO
1.1.1 Hardware
1.1.2 Software
1.2 Why you should learn a “real” language
1.3 Learning to program
1.4 FORTRAN vs. C
1.5 Python
2 UNIX introduction
2.1 Getting started
2.2 Basic commands
2.3 Files and editing
2.4 Basic commands, continued
2.4.1 Wildcards
2.4.2 The .login and .cshrc files
2.5 Scripts
2.6 File transfer and compression
2.6.1 FTP command
2.6.2 File compression
2.6.3 Using the tar command
2.6.4 Remote logins and job control
2.7 Miscellaneous commands
2.7.1 Common sense
2.8 Advanced UNIX
2.8.1 Some sed and awk examples
2.9 Example of UNIX script to process data
2.10 Common UNIX command summary
4 Fortran
4.1 Fortran history
4.2 Texts and manuals
4.3 Compiling and running F90 programs
4.3.1 The first program explained
4.3.2 How to multiply two integers
4.4 An important historical digression
8 LaTeX
8.1 A simple example
8.2 Example with equations
8.3 Changing the default parameters
8.3.1 Font size
8.3.2 Font attributes
8.3.3 Line spacing
8.4 Including graphics
8.5 Want to know more?
Introduction
This course is intended to help incoming students get up to speed on the various
computing tools that will help them with their research and some of the homework
assignments for other classes. The perspective is largely that of the Geoscience
program at SIO, but we hope that the course is general enough to be useful for
other students as well.
All students should have access to a Mac and an account on the IGPP Mac
network. Please let me know if you do not already have an account. If you are using
your own Mac, you will need to install the following:
• Fink
• gfortran
• Xcode
• Python
• TeXShop
• TextWrangler
Before 2004 or so, computer hardware used at SIO was of two main types:
However, these boundaries are now blurred because many PCs run Linux (an
open source form of UNIX; the name is pronounced similar to “linen”, see
http://www.paul.sladen.org/pronunciation/) and Apple has adopted UNIX for their
operating system (starting with OS X).
This permits these machines to be used for both serious scientific computations
and word processing and more traditional PC activities.
1.1. SCIENTIFIC COMPUTING AT SIO 3
Ten to twenty years ago, most geophysics departments had networks of Sun
workstations. Today these have largely been replaced with networks of cheap PCs
running Linux or with Apple machines. At IGPP we now use Macs for everything
except for certain specialized or high-performance computers maintained by individ-
ual PIs. For example, David Sandwell’s group maintains some Suns because some
of their processing software runs only on that platform.
1.1.2 Software
Here is a list of programming languages and software used at IGPP (Items with *
will be discussed in this class).
1. Programming
(a) *FORTRAN (most IGPP faculty use this, large library of existing code)
(b) C (more widely taught and used by computer scientists)
(c) C++ (extended version of C)
(d) Java (used a lot for web programming, has portability advantages, but
often not fast enough for serious computing)
(e) *Python (lots of buzz, Lisa Tauxe likes it)
(f) MatLab (widely used but commercial product, previously presented in
this class, but our aim is to replace it with Python)
(g) HTML for web sites (most people now don’t program in raw html, but
use web designer software, such as DreamWeaver)
2. Community Programs
Commercial Programs
The final project in this class will be to write a program that plays tic-tac-toe. This
is a daunting task if one tries to write it all at once. But we will divide it into
smaller parts for separate assignments over the weeks, and only put all the pieces
together for the final program at the end.
Also, be aware that most languages have far more features than you actually need.
Few of us except professional programmers learn them all. But we do not need to
write professional programs—we only need to learn to write practical programs that
serve our own needs. For example, I probably only know about 20 UNIX commands.
A real computer nerd would find pathetic some of the ways that I do things. But
it’s been enough for me to get by and to do the science that I want to do. (Having
said that, it wouldn’t hurt for me to learn more and I’m going to look over some of
Duncan Agnew’s notes from the 2009 class to find some new tricks.)
Some years ago when I (Peter) first started teaching programming in this class, I had
to decide whether to teach C or Fortran. Like most SIO faculty of my generation or
older, I was experienced in FORTRAN but had little exposure to C. After talking to
some people who know both FORTRAN and C, however, I decided to bite the bullet
and learn enough C to teach this class. This was motivated in part by the fact that
C is now one of the standard languages taught in computer science departments;
many of our incoming students have C experience but few have Fortran experience.
Here is my summary of the advantages and disadvantages of either choice:
• FORTRAN
  – Advantages
    ∗ Large amount of existing code
    ∗ Preferred language of most SIO faculty (most faculty are old)
    ∗ Complex numbers are built in
    ∗ Choice of single or double precision math functions
  – Disadvantages
    ∗ Column sensitive format in older versions
    ∗ Dead language in computer science departments
• C
  – Advantages
    ∗ Large amount of existing code
With reasonable compilers, both languages are equally fast. Ultimately, there
are fixes for both languages that permit them to assume the advantages of the other
language. For example, you can use pointers in FORTRAN90 and you can define
an add-on set of functions to do complex arithmetic in C. However, after learning
enough C to teach the class, I concluded that C is not a very user friendly language
for those without programming experience. So I switched to Fortran90, an improved
version of Fortran77, that combines the advantages of Fortran and C into a more
user-friendly package.
1.5 Python
To get a full list of options, simply type the command name alone. But for now,
let us examine the arguments in our example:
-R0/360/-90/90
This sets the map limits of longitude (lon1=0, lon2=360), and latitude (lat1=-
90, lat2=90). Note that we could have written this with a space between the “R”
and the zero (“0”) immediately following. Most people leave this space out to more
easily separate the different commands.
-JQ180/6i
-B60g30
This sets the labeled lat/lon lines to 60 degree intervals and the unlabeled lines
to 30 degree intervals
-P
-Dc
Sets the resolution of the coastline to “c” for crude. This is all that is required
for a small map of the whole globe. For larger maps or closeups, higher resolution
will be required. The available options are: (f)ull, (h)igh, (i)ntermediate, (l)ow and
(c)rude. Note that the full resolution files require over 55Mb of data and provide
great detail. It takes more time to generate the plot, too, so generally should only
be used for extreme close ups. The default is (l)ow resolution.
3.1. INSTALLATION OF FINK, GMT OTHER USEFUL TOOLS 45
#!/bin/csh
gmtset ANOT_FONT_SIZE 10
pscoast -R-180/180/-90/90 -JH0/6i -Bg0:."IRIS FARM stations": \
-P -Dc -G150 -X1.2i -Y4i -K >! map.ps
psxy -O -R -JH -St0.06i -G255/0/0 -: station.list >> map.ps
-JH0/6i
The 0 results in no grid lines or labels. The title is set off with :." and ": (weird!)
-G255/0/0
#!/bin/csh
gmtset HEADER_FONT_SIZE 20
gmtset ANOT_FONT_SIZE 10
pscoast -R-121/-114/32/37 -JM6i -B1g1:."IRIS FARM stations": \
-P -Di -I1 -I2 -I3 -N1 -N2 -G255/200/200 -X1.2i -Y4i -K >! map.ps
Changes are:
gmtset HEADER_FONT_SIZE 20
-R-121/-114/32/37
-JM6i
Label and draw grid lines every 1 degree. Use same title.
-I1
-I2
-I3
-N1
-N2
-G255/200/200
Fill land areas with red=255, green=200, blue=200 - a ghastly shade of pink:
[Figure: map of southern California with land areas filled in pink. Latitude axis labeled from 32° to 36°; longitude axis labeled from 239° to 246°.]
Next, let’s add lines that show the traces of mapped faults in southern California.
For this, we use a file called calif.flts that has the following format:
370.0000 99.0000
-115.5496 32.9312
-115.5419 32.9142
-115.5358 32.9029
-115.5276 32.8890
370.0000 99.0000
-115.9218 32.9916
-115.9096 32.9849
-115.8936 32.9745
-115.8729 32.9655
-115.8398 32.9498
-115.8216 32.9410
-115.7983 32.9295
48 CHAPTER 3. GENERIC MAPPING TOOLS
-115.7796 32.9228
370.0000 99.0000
-115.8391 33.0127
-115.8205 33.0069
-115.8017 33.0015
-115.7892 32.9973
etc.
The faults are defined as (lon,lat) pairs. A value of (370,99) is used to separate
the different faults because 370 is off the map (only goes to 360!). To plot these
faults on our map of the southern California stations, we can use the psxy command
a second time in the script do.gmt6:
#!/bin/csh
gmtset HEADER_FONT_SIZE 20
gmtset ANOT_FONT_SIZE 10
pscoast -R-121/-114/32/37 -JM6i -B1:."IRIS FARM stations": \
-P -Di -I1 -I2 -I3 -N1 -N2 -G255/200/200 -X1.2i -Y4i -K >! map.ps
psxy -O -R -JM -M'370' -W8/255/0/0 calif.flts -K >> map.ps
psxy -O -R -JM -St0.15i -G0/0/255 -: station.list >> map.ps
Changes are:
We removed the g0 so we don’t plot grid lines which might get confused with the
faults
We plot the faults using:
-M'370'
Lines starting with '370' mark segment boundaries. Without this option the plot
would look like a mess because lines would be drawn to the (370,99) points.
-W8/255/0/0
This draws the line with linewidth=8 (thicker than normal) and color red=255,
green=0, blue=0
Note that we do not need the -: option because the points are already given as
(lon,lat).
We plot the stations last so that they will go on top of the faults. We change
the symbol size and the color:
-St0.15i
-G0/0/255
[Figure: the southern California map with the faults and stations added. Latitude axis labeled from 32° to 36°; longitude axis labeled from 239° to 246°.]
Install Fink and GMT if you haven’t already. Write a script that plots your
birthplace as a big (.2in) red star. Use an orthographic projection with the star at
lon0/lat0. Draw all rivers, national and state boundaries. Make a title in 20pt font
with your name. Look up how to annotate the point with text (HINT: pstext) and
label the star with the name of the place you were born. Turn in the script and the
.eps file produced by it (via e-mail to [email protected]).
Chapter 4
Fortran
Why learn Fortran? The answer for geophysics students at SIO is obvious—most
IGPP faculty program in Fortran. However, let me make a broader pitch for its
usefulness. For many years, Fortran was the language of choice for scientific pro-
gramming. Recently, C has emerged as a competitor and seems to now be much
more widely taught to students, perhaps because it is favored by computer science
departments (who worry more about writing compilers than how to handle complex
numbers).
Why learn Fortran if you already know C? Several reasons come to mind:
1. You will communicate and exchange software more readily with most IGPP
faculty, who are generally proficient in Fortran but rather challenged in C.
4. It’s fun!
The name Fortran is derived from IBM Mathematical FORmula TRANslation System.
Originally it was spelled in all caps as FORTRAN, but the more modern
usage is to only capitalize the first letter. Fortran is one of the oldest computer
programming languages and was begun in 1954 by IBM programmers led by John
Backus (no relation to George). It is updated and “improved” by some committee
of computer scientists every ten years or so. Important versions include:
Fortran IV (released in 1962)
Fortran 77 (released in 1978), major revision
Fortran 90 (released in 1991), major revision
Fortran 95 (released in 1997), minor revision
Wikipedia also lists Fortran 2003 and 2008, which I don’t know anything about. I
would not recommend using these at this point in time. Most versions are designed to
be fully backward compatible with previous versions. However, Fortran90 departed
from the column sensitive format of older Fortran, resulting in a more modern
approach that necessitated some slight incompatibilities with older Fortran. I will
try to teach this class entirely in Fortran90, but will describe the differences with
Fortran77 when they are important because you are likely to need to work with
existing Fortran77 code at some point.
This class will be a tutorial on Fortran90 and will not be comprehensive. Thus, I
recommend that you buy a textbook that will serve as a more complete reference.
I just checked on Amazon and there are 4 or 5 books out there. One that I have
used and recommend is: Fortran 90/95 Explained (Metcalf and Ker Reid, Oxford
Univ. Press, 1999), which has some favorable reviews. It’s only $47. There is an
updated version that I have not seen for $53. You may want to check around to see
which book you like best.
In class, we will use gfortran, which can be downloaded free for the Macs. To
make sure you have it in your path, enter: ‘which gfortran’ and you should get
something like /usr/local/gfortran/bin/gfortran
I recommend that you add the following to your .cshrc file:
alias f90 gfortran
This will allow you to just type f90 instead of gfortran when you want to compile
programs. The examples that follow assume that you have done this.
Fortran90 programs are written as ascii files that end in ‘.f90’. This is called the
source code. Programs in older versions of Fortran end in simply ‘.f’. Because of the
slight incompatibilities between the versions, be sure to use the appropriate suffix
so that both you and the compiler know which version to use. It is in fact possible
to set things up so that your F90 programs end in .f rather than .f90. I do not
recommend this because there is so much existing code around in Fortran77 that
confusion is likely to result if you do not explicitly identify your code as F90.
Before the program can be run, it must be compiled using the Fortran90 com-
piler on your computer, creating the executable file that you use to actually run
the program. For example, suppose you have a program called printmess.f90 with
contents:
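A minimal version of such a file, consistent with the line-by-line description given later in this chapter (a comment line, a blank line, the program statement, an indented print statement, and the end statement), might look like the following. The wording of the comment is an assumption; the printed text matches the output shown when the program is run.

```fortran
! printmess.f90: print a short test message

program printmess
   print *, "test message"
end program printmess
```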
To compile this program from my computer (rock in this case), I enter ‘f90
printmess.f90 -o printmess’ and get the following response:
No news is good news! If there was a syntax or other error detected during the
compilation, we would get an error message at this point. This creates an executable
version of the program called ‘printmess’ (you should ALWAYS use the name of the
program without the “.f90” for the executable) which can then be run simply by
entering ./printmess:
rock% ./printmess
test message
rock%
Note that because . is not in our path, we need to type ./printmess and not just
printmess.
If you are like me, you will quickly tire of typing in the line ‘f90 printmess.f90 -o
printmess’. Thus, I recommend that you create a Makefile (the name is conventionally
capitalized, although make also accepts makefile) in the same directory as the
program. Include in the Makefile
the following lines:
%: %.f90
	gfortran $< -o $*
The tricky part of this is that the space before the gfortran MUST be entered
as a tab, not as a series of spaces.
Once you have set up this Makefile, then you can just enter:
make printmess
in order to compile the program. Makefiles are very useful to keep track of compiler
options and to bind with subroutines. (more about this later).
The difference between source code and executables is fundamental to languages
like FORTRAN and C. It is why they generally run much faster than uncompiled
languages like BASIC or Matlab or Python scripts.
OK, now let’s examine our simple F90 test program again:
The first line is a comment line. Anything following an exclamation mark (!) is
a comment. The end of the line terminates the comment. There is no need (as in
many languages) to terminate the comment with another flag. We can also use ! to
add an inline comment following a statement, e.g.,
would be a valid line. The next line is blank. You can put in as many blank lines as
you want and they will be ignored by the compiler. Blank lines provide a good way
to improve the readability of longer programs by breaking them up into coherent
blocks of code.
4.3. COMPILING AND RUNNING F90 PROGRAMS 55
The next line is used to name the program. The name printmess is not used by
the program at all. This line is optional but is considered good programming style.
Good style also suggests that lines in the body of the program are indented, in this
case by three spaces:
print * will output to the screen. The desired output is enclosed in quotes (apos-
trophes would also work). Finally, all F90 programs must terminate with an ‘end’
statement. The ‘program printmess’ following the ‘end’ is optional, but is considered
good programming practice. These style points don’t matter much for a short program
like this, which would work just as well if it were written as:
but are helpful for longer programs (hundreds to thousands of lines of code) where
the label following the ‘end’ would remind a reader of which program is actually
ending.
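As a sketch of the stripped-down style mentioned above, the following two lines form a complete, valid F90 program with no program statement, no indentation, and a bare end. The inline comment is only there to illustrate the ! syntax and its wording is an assumption.

```fortran
print *, "test message"   !an inline comment following a statement
end
```

This compiles and runs exactly like the longer version, which is why the extra lines are a matter of style rather than necessity.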
ASSIGNMENT F1
Write a F90 program to print your favorite pithy phrase.
program multint
integer :: a, b, c !declare variables
a = 2
b = 3
c = a * b
print *, "Product = ", c
end program multint
The program uses three variables, the letters a, b and c. Variable names can
be from 1 to 31 characters long. The first character must be a letter. The remaining
characters can be any combination of letters, numbers, and underscores (_).
Variables in Fortran can be of many types, including real (floating point), integer,
complex, double precision, character and logical. In our case, we want them to be
integers so we define them using a ‘type statement’:
integer :: a, b, c
Older Fortran programs do not include the :: in these statements; this convention
still works under F90 but is discouraged. For the Sun F90 compiler, these are 4-
byte integers that can range from -2,147,483,648 to 2,147,483,647. Alternatively,
they could have been defined as real numbers:
real :: a, b, c
in which case they would be floating point (real) variables. For the Sun compiler,
real numbers range from 1.175494e-38 to 3.402823e+38.
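The limits quoted above are for one particular compiler. A short sketch like the following, using the F90 inquiry functions huge and tiny, should report the corresponding limits for whatever compiler you are using. The program name ranges is an assumption made for illustration.

```fortran
program ranges
   implicit none
   integer :: i
   real :: x
   ! huge and tiny only inquire about the type of their argument,
   ! so i and x do not need to be assigned values first
   print *, "largest integer:        ", huge(i)
   print *, "smallest positive real: ", tiny(x)
   print *, "largest real:           ", huge(x)
end program ranges
```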
The lines that follow are pretty self explanatory:
a = 2
b = 3
c = a * b
print *, "Product = ", c
It is not required that variables be declared in Fortran. If they are not declared,
then the compiler assigns them as integer or real, depending upon their first letter.
Variable names beginning with i through n (INteger, get it?) are assumed to be
integers; all others are assumed to be real. Many, if not most, older Fortran programs
adopt this convention. Often they only declare variables when the rule is broken,
for example when it is desired that the variable ‘year’ be an integer. A prominent
example of this type of Fortran programming may be found in the first edition (1986)
of Numerical Recipes (but by the time of the second edition in 1992, the authors
had been shamed into declaring all variables).
Such undeclared variables are said to be declared implicitly based upon their
first letter. An ‘implicit’ statement can be used to modify the default rules, for
example
will make all undeclared variables real. However, modifications such as this will only
lead to more confusing code. Modern programming practice is to explicitly declare
ALL variables and to have the compiler warn us when variables are not declared.
This can be done by including the statement
implicit none
at the beginning of every Fortran program. I will admit that, as a long time Fortran
programmer, I do not always follow this practice. I must concede, however, that it is
a good idea and is likely to save more time in the long run (by eliminating program
bugs that are often caused by undeclared variables) than is lost while writing the
program.
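To make the effect of an implicit statement concrete, here is a small sketch. It assumes the form implicit real (a-z), which makes every undeclared name real regardless of its first letter; the program name impdemo and the variable are chosen only for illustration.

```fortran
program impdemo
   implicit real (a-z)     !all undeclared names are real, whatever their first letter
   kmdeg = 111.19          !would be an integer under the default i-n rule
   print *, "kmdeg = ", kmdeg
end program impdemo
```

Replacing the implicit line with implicit none should instead make the compiler reject the program, because kmdeg is never declared.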
Example 1: Suppose you should have the following line in your program:
x2 = a1 + a2*sin(theta)*scale1/scale2
but you make a typo and enter scalc1 instead of scale1:
x2 = a1 + a2*sin(theta)*scalc1/scale2
If you don’t follow the practice of declaring all of your variables, then the error
will not be detected during compilation. The program will run, using whatever value
the otherwise unused variable scalc1 happens to hold (often zero), and you will get
wrong answers. On the other
hand, if you use ‘implicit none’ and declare your variables, the compiler will flag
scalc1 as an undeclared variable and you can fix it before it causes any more trouble.
Example 2: Suppose you have the following lines in your program:
kmdeg = 111.19
dist = (delta2 - delta1)*kmdeg
but you have not declared kmdeg explicitly. Because the letter k is between i and n,
the program will declare kmdeg as an integer. The value 111.19 will be truncated
to 111 and you will get values for dist that are slightly, but not obviously, wrong
(the worst kind of program bug to have).
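The truncation can be seen directly with a small test program. This is only a sketch: the program name truncdemo and the values of delta1 and delta2 are assumptions, and implicit none is left out on purpose so that the default first-letter rule applies to kmdeg.

```fortran
program truncdemo
   ! no implicit none here, so the default first-letter rule is in effect
   real :: dist, delta1, delta2
   delta1 = 10.0
   delta2 = 25.0
   kmdeg = 111.19      !k lies between i and n, so kmdeg is an integer and holds 111
   dist = (delta2 - delta1)*kmdeg
   print *, "kmdeg = ", kmdeg
   print *, "dist  = ", dist
end program truncdemo
```

With these values dist comes out as 1665 rather than the intended 1667.85, an error small enough to go unnoticed.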
To save you from these embarrassments and to encourage good programming
habits, we will declare all variables for the programs in this class and I will dock
you points if you fail to do so in assignments.
ASSIGNMENT F2
Copy the program multint but leave out the integer :: a, b, c statement. What
happens when you run the program? Why?
ASSIGNMENT F3
Cut and paste the following defective program onto your computer:
program longjump
beamon_long = 8.90 !distance in meters
powell_long = 8.95
dif_inch = (powell_long - beamon_1ong) * 39.37
print *,"Powell jumped ",dif_inch," inches more than Beamon"
end program longjump
Compile and run the program. Then explain why it gives the wrong answer.
There are always lots of ways to write the same program. Here is another way to
write the multint.f90 code:
program multint2
implicit none
integer :: a=2, b=3, c !declare variables
c = a * b; print *, "Product = ", c
end program multint2
First, notice that variables can be assigned values when they are declared (OK
in F90, don’t try this in F77). Second, notice that more than one command can be
included on a line if the commands are separated by a semicolon (again, only OK in
F90). Both of these changes make the code more similar to C. The variable assigning
option is a reasonable convenience, but the multiple command option is certainly
misused in this case because it makes the code much harder to read. Unless you
have a really good reason to put more than one command on a line (saving space is
NOT a good reason!), I suggest that you never use semicolons in this way.
4.5 Making a formatted trig table using a do loop
Any reasonable programming language must provide a way to loop over a series
of values for a variable. In C, this is most naturally implemented with the ‘for’
statement. In FORTRAN this is done with the ‘do loop’.
Here is an example program that generates a table of trig functions:
program trigtable
implicit none
integer :: itheta
real :: theta, stheta, ctheta, ttheta, degrad
degrad = 180./3.1415927
do itheta = 0, 89, 1
theta = real(itheta)
ctheta = cos(theta/degrad)
stheta = sin(theta/degrad)
ttheta = tan(theta/degrad)
print "(f5.1,1x,f6.4,1x,f6.4,1x,f7.4)", theta, ctheta, stheta, ttheta
enddo
end program trigtable
Fortran (like C and Matlab) uses radians (not degrees) as the arguments of the
trig functions. Thus, following the definitions of the variables as real, we assign
degrad to 180/pi so that we can easily make this conversion.
do itheta = 0, 89, 1
This begins the do loop which must eventually be closed with the enddo statement.
For clarity, we indent the inside of the loop. This is a loop over values of itheta
from a starting value of 0, incremented by 1 each time, until itheta is greater than
89. The 1 is actually optional as it is the default increment. Thus itheta will assume
the values (0, 1, 2, ...., 88, 89) inside the loop. Notice that we use an integer for the
do loop. Older versions of Fortran (e.g., F77) permitted the use of real variables
in do loops. This is not recommended and roundoff peculiarities mean that the do
loop would need to be written in the form:
Note that we use 89.1 rather than 89.0 as the ending value to avoid the possibility
that roundoff error might cause the desired ending value (computed by successively
adding 1.0 to theta) to slightly exceed 89 and thus be excluded from the loop. I
have a lot of old code of this form, but I’m going to slowly try to get rid of it.
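When a loop needs to step through non-integer values, one way to avoid the roundoff issue entirely is to keep an integer loop counter and compute the real value from it inside the loop. The following is a sketch of that idea; the program name stepdemo and the step of 0.25 are assumptions chosen for illustration.

```fortran
program stepdemo
   implicit none
   integer :: i
   real :: x
   do i = 0, 8
      x = 0.25*real(i)      !0.00, 0.25, ..., 2.00 computed directly from i
      print "(f5.2)", x
   enddo
end program stepdemo
```

Because x is recomputed from i on each pass rather than built up by repeated addition, the number of passes through the loop is exact and no padded ending value is needed.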
Inside the do loop we first convert from the integer itheta to the real theta using:
theta = real(itheta)
We then compute the cosine, sine and tangent of theta, after converting from
degrees to radians by dividing by degrad (anybody see how we could make the
program slightly more efficient?). To make the output look nice, we do not use
print *, theta, ctheta, stheta, ttheta
which would space the numbers irregularly among the columns. Instead, we explic-
itly specify the output format using a format specification:
print "(f5.1,1x,f6.4,1x,f6.4,1x,f7.4)", theta, ctheta, stheta, ttheta
where f5.1 specifies that theta will be output as a real number into 5 total spaces,
with 1 digit to the right of the decimal place. Similarly, f6.4 specifies that ctheta is
output into 6 spaces with 4 digits to the right of the decimal place. The numbers will
be right justified, with leading blanks used as necessary. The 1x specifies that one
blank character will be output between each of the numbers.
An alternative way to write this:
Here we have used the continuation character & to split the statement into two
lines. Blanks are ignored so we can space things out neatly to line up the variables
with their formats. Notice that we have removed the need for the 1x between formats
by adding an additional column to the appropriate formats (i.e., writing f7.4 rather
than f6.4). Because the numbers are right justified, this will add an additional space
to the left of each number. Aligning things this neatly is probably more trouble than
it’s worth, but it certainly makes the code easier to understand.
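A sketch of the continuation-line form described above, written out as a complete program. The program name trigtable2 is an assumption; the widths f7.4 and f8.4 are the earlier f6.4 and f7.4 fields, each widened by one column to take the place of the 1x.

```fortran
program trigtable2
   implicit none
   integer :: itheta
   real :: theta, stheta, ctheta, ttheta, degrad
   degrad = 180./3.1415927
   do itheta = 0, 89
      theta = real(itheta)
      ctheta = cos(theta/degrad)
      stheta = sin(theta/degrad)
      ttheta = tan(theta/degrad)
      ! the & continues the statement so each variable sits under its format
      print "(f5.1,  f7.4,   f7.4,   f8.4)", &
             theta,  ctheta, stheta, ttheta
   enddo
end program trigtable2
```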
Notice that in this case the f7.4 appears twice in a row. Often programmers will
write this more compactly as:
This convention is still allowed in F90 but should not be used unless you want to
use the format statement more than once, e.g.,
117 is termed a line label and must consist of digits. It is best not to refer to it as
a line number (the old usage) because it normally has nothing to do with the line
numbers and the labels need not be sequential (more about this later). Note that
the format line need not immediately follow the print line(s); it can come before.
Many older programs put all of the format statements at the end of the code.
An alternative way in F90 to use the same format specification more than once
is to define it as a character variable, i.e.,
but we are getting ahead of ourselves because we have not yet shown how to use
character variables.
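The three ways of writing a format discussed above can be compared side by side in one short program. This is a sketch under stated assumptions: the program name fmtways, the variable name fmt, and the single test angle of 30 degrees are all chosen for illustration; the label 117 follows the text.

```fortran
program fmtways
   implicit none
   real :: theta, ctheta, stheta, ttheta, degrad
   character (len=30) :: fmt
   degrad = 180./3.1415927
   theta = 30.
   ctheta = cos(theta/degrad)
   stheta = sin(theta/degrad)
   ttheta = tan(theta/degrad)

   ! repeat count: 2f7.4 means the same as f7.4,f7.4
   print "(f5.1,2f7.4,f8.4)", theta, ctheta, stheta, ttheta

   ! labeled format statement, which several print statements can share
   print 117, theta, ctheta, stheta, ttheta
117 format (f5.1,2f7.4,f8.4)

   ! the same format stored in a character variable
   fmt = "(f5.1,2f7.4,f8.4)"
   print fmt, theta, ctheta, stheta, ttheta
end program fmtways
```

All three print statements should produce identical lines of output, so the choice between them is a matter of readability and reuse.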
We used the Fortran sine, cosine and tangent functions in the trigtable program.
Here is a complete list of math functions:
acos(x) arccosine
asin(x) arcsine
atan(x) arctangent
atan2(y,x) arctangent of y/x in correct quadrant (***very useful!)
cos(x) cosine
cosh(x) hyperbolic cosine
exp(x) exponential
log(x) natural logarithm
log10(x) base 10 log
sin(x) sine
sinh(x) hyperbolic sine
sqrt(x) square root
tan(x) tangent
tanh(x) hyperbolic tangent
In the trigtable program, we wrote
degrad = 180./3.1415927
rather than
degrad = 180/3.1415927
The reason is to make completely sure that the program will compute a real
quotient and not an integer quotient. In fact, this caution is not needed in this case,
as the following program demonstrates:
program testfrac
implicit none
real :: c
c = 2/3
print *,'2/3 = ', c
c = 2/3.
print *,'2/3. = ', c
c = 2./3
print *,'2./3 = ', c
c = 2./3.
print *,'2./3. = ', c
end program testfrac
2/3 = 0.0E+0
2/3. = 0.6666667
2./3 = 0.6666667
2./3. = 0.6666667
As long as one part of the fraction is real, the program will compute a real
quotient. It is only when both numbers are written as integers that the result is
truncated. However, I have gotten into the habit of always including the decimal
point in real expressions to avoid someday accidentally writing something like:
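The kind of slip that this habit guards against is easiest to see when an integer variable is involved. The following sketch (the program name intdiv and the variable names are assumptions) contrasts an expression containing an all-integer quotient with the same expression written with a decimal point.

```fortran
program intdiv
   implicit none
   real :: frac
   integer :: n
   n = 3
   frac = 1/n * 6.0      !1/n is an integer quotient (zero), so frac is 0.0
   print *, "1/n * 6.0  = ", frac
   frac = 1./n * 6.0     !1./n is a real quotient, so frac is about 2.0
   print *, "1./n * 6.0 = ", frac
end program intdiv
```

The presence of the real constant 6.0 later in the expression does not help, because the quotient 1/n is evaluated first using integer arithmetic.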
There are many different formats that can be specified. Here are some common exam-
ples:
4.6. INPUT USING THE KEYBOARD 63
If a number does not fit into the allocated spaces, it will appear as a series of
asterisks (*****)
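A few commonly used edit descriptors can be tried out with a short program like the one below. This is a sketch, not a complete list; the program name fmtexamples and the values printed are assumptions chosen for illustration. The last line deliberately uses a field that is too narrow in order to show the asterisks.

```fortran
program fmtexamples
   implicit none
   print "(i5)", 42             !integer in 5 spaces
   print "(f8.3)", 3.14159      !real in 8 spaces, 3 digits after the decimal point
   print "(e12.4)", 6371000.    !real in exponential notation, 12 spaces
   print "(a)", "some text"     !character string
   print "(i3)", 12345          !does not fit in 3 spaces, so asterisks are printed
end program fmtexamples
```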
ASSIGNMENT F4
Write a F90 program to print a table of x, sinh(x), and cosh(x) (the hyperbolic
sine and cosine, these are built-in functions in Fortran) for values of x ranging from
0.0 to 6.0 at increments of 0.5 (use these x values directly in sinh(x) and cosh(x), do
not convert them to radians). Use a suitable format to make nicely aligned columns
of numbers.
So far all of our example programs have run without prompting the user for any in-
formation. To expand our abilities, let’s learn how to input data from the keyboard.
In most programs, we will want to first prompt the user to input the data, so here
is an example of how to input two numbers:
Pretty easy, isn’t it? (Compare this section with the corresponding input section
in the C or Python notes and you will see why I think Fortran is more user friendly
than C or Python)
program usermult
implicit none
integer :: a, b, c
print *, "Enter two integers"
read *, a, b
c = a * b
print *, "Product = ", c
end program usermult
rock% usermult
Enter two integers
2 5
Product = 10
rock% usermult
Enter two integers
2
5
Product = 10
Often in cases like this I will forget that I’m supposed to input more than one
number. When the program just sits there, I then realize that I need to input more
numbers (or I wonder why it’s taking so long to finish!).
What happens if we make a mistake and try entering a real number? Let’s check:
rock% usermult
Enter two integers
3.1 15
This is what happens on my old Sun computer, but most Fortran compilers will
produce a similar message. The error message in this case is quite informative and
tells us exactly what the problem is. This is better than performing the computation
and returning the wrong answer (the default in C for this example unless you are
quite careful).
4.7 If statements
Next, let’s modify this program so that it will allow the user to continue entering
numbers until he/she wants to stop:
program usermult2
   implicit none
   integer :: a, b, c
   do
      print *, "Enter two integers (zeros to stop)"
      read *, a, b
      if (a == 0 .and. b == 0) exit
      c = a * b
      print *, "Product = ", c
   enddo
end program usermult2
Here the do loop has no arguments and the block of code inside the do loop will
be repeatedly executed until an ‘exit’ command is executed. ‘Exit’ (a new feature
in F90) means to leave the do loop entirely and go to the next line after the ‘enddo’
statement. As in our previous example, we indent the block inside the do loop to
make the code easier to read.
The program will allow the user to continue entering numbers to be multiplied.
When the user wishes to stop the program (in a more elegant way than hitting
CNTRL-C), he/she enters zeros for both arguments. The if statement checks for
this and exits the do loop in this case:
if (a == 0 .and. b == 0) exit
   FORTRAN
 77      90      C       PYTHON  MATLAB  meaning
 .eq.    ==      ==      ==      ==      equals
 .ne.    /=      !=      !=      ~=      does not equal
 .lt.    <       <       <       <       less than
 .le.    <=      <=      <=      <=      less than or equal to
 .gt.    >       >       >       >       greater than
 .ge.    >=      >=      >=      >=      greater than or equal to
The F77 syntax will still work under F90 and you are likely to see this in many
of the older programs, e.g.,
will also work. These operators can be combined to make complex tests, e.g.,
There is, of course, an order of operations for these things which I can’t remember
very well. Look it up in a book if you are unsure or, better, just put in enough
parentheses to make it completely clear to anyone reading your code.
One nice aspect of Fortran compared to C is that if you make a mistake and
type, for example,
if (a = 0) exit
you will get an error message during compilation. In C this is a valid statement
with a completely different meaning than is intended!
In the above example, a single statement is executed when the if condition is true.
A more versatile form is as follows:
The blocks of code can contain many lines if desired. As many ‘else if’ statements
as required can be used. At most, one block of code will be executed (once one of the
‘if’ tests is satisfied, it does not check the others). The final ‘else’ will be executed
if none of the preceding if statements is true. The final ‘else’ is optional.
Here is a demonstration program that repeatedly prompts the user for a positive
real number. If it is negative, it asks the user to try again. If it is positive, it computes
and displays the square root using the sqrt() function. If the user enters zero, the
program stops.
program usersqrt
   implicit none
   real :: a, b
   do
      print *, 'Enter positive real number (0 to stop)'
      read *, a
      if (a < 0) then
         print *, 'This number is negative!'
         cycle
      else if (a == 0) then
         exit
      else
         b = sqrt(a)
         print *, 'sqrt = ', b
      end if
   end do
end program usersqrt
Notice the use of the ‘cycle’ command (also new in F90), which directs the
program to the next iteration of the do loop. In contrast, the ‘exit’ command exits
the do loop entirely. The ‘cycle’ and ‘exit’ commands permit code to be written that
is free of the ‘go to’ statements that would likely have been present in a F77 version
of this program (‘go to’ statements are considered mortal sins by the programming
style police).
ASSIGNMENT F5
Write a program to repeatedly ask the user for the constants a, b, and c in
the quadratic equation a*x**2+b*x+c=0. Using the quadratic formula, have the
program identify and compute any real roots. Output the number of real roots and
their values. Stop the program if the user enters zeros for all three values. HINT:
Test your program for some simple examples to make sure it is working correctly
(a=1, b=2, c=-3 should return -3 and 1).
Here is an example program that uses some of the concepts that we have just learned:
! compute the greatest common factor of two integers
program gcf
   implicit none
   integer :: a, b, i, imax
   do
      print *, 'Enter two integers (zeros to stop)'
      read *, a, b
      if (a == 0 .and. b == 0) then
         exit
      else
         do i = 1, min(a, b)
            if (mod(a, i) == 0 .and. mod(b, i) == 0) imax=i
         end do
         print *, "Greatest common factor = ", imax
      end if
   enddo
end program gcf
This is not a particularly efficient algorithm, but it runs plenty fast for small
numbers. There are some new things here:
1. min(a,b) computes the minimum of a and b using the intrinsic Fortran ‘min’
function. Naturally there is also a ‘max’ function. min and max can have
more than 2 arguments if desired.
ASSIGNMENT F6
Modify gcf.f90 to compute the least common multiple of two integers.
4.9 User defined functions and subroutines
1. You can test these pieces individually to see if they work before trying to get
the complete program to work.
To illustrate how to define your own function, here again is the greatest common
factor program:
program gcf2
   implicit none
   integer :: a, b, gcf
   integer, external :: getgcf
   do
      print *, 'Enter two integers (zeros to stop)'
      read *, a, b
      if (a == 0 .and. b == 0) then
         exit
      else
         gcf = getgcf(a,b)
         print *, "Greatest common factor = ", gcf
      end if
   enddo
end program gcf2
We now perform the gcf calculation in the function getgcf. The variables a and
b in the main program are the function arguments. They are passed to the function
in the statement:
gcf = getgcf(a, b)
The variables x and y in the function will assume the values passed to the
function by the main program. These arguments must match in number and type
(real vs. integer, etc.) with the variables in the main program. Note, however, that
they do not need to have the same names.
Notice that we must declare getgcf in the calling program:
integer, external :: getgcf
This is the preferred F90 syntax, although the following will also work:
integer :: getgcf
The name of the function is by default the value that will be passed back to
the main program. In some cases involving recursive functions (a more complicated
topic that we may cover later), it may be desirable to have the value returned to the
main program be specified by a different variable name. To do this, we can use the
optional ‘result’ specifier in the function statement:
The result(z) indicates that the result will be passed back to the calling program as
the variable z. Thus, gcf in the calling program will assume the value of z in the
function.
ASSIGNMENT F7
Modify your program from F6 to compute the least-common multiple as a user-
defined function.
4.9.1 Subroutines
Functions are limited in their usefulness because they are designed to pass only
one value back to the calling program. A more general construct is the Fortran
subroutine, which allows unlimited numbers of values to be passed to and from the
calling program.
Here is our first geophysically useful example, a subroutine to compute the dis-
tance and azimuth between any two points on the Earth’s surface:
program userdist
   implicit none
   real lat1, lon1, lat2, lon2, del, azi
   do
      print *, 'Enter 1st point lat, lon'
      read *, lat1, lon1
      print *, 'Enter 2nd point lat, lon'
      read *, lat2, lon2
      call SPH_AZI(lat1, lon1, lat2, lon2, del, azi)
      print *, 'del, azi = ', del, azi
   end do
end program userdist
azi=0.
return
end if
pi=3.141592654
raddeg=pi/180.
theta1=(90.-flat1)*raddeg
theta2=(90.-flat2)*raddeg
phi1=flon1*raddeg
phi2=flon2*raddeg
stheta1=sin(theta1)
stheta2=sin(theta2)
ctheta1=cos(theta1)
ctheta2=cos(theta2)
cang=stheta1*stheta2*cos(phi2-phi1)+ctheta1*ctheta2
ang=acos(cang)
del=ang/raddeg
sang=sqrt(1.-cang*cang)
caz=(ctheta2-ctheta1*cang)/(sang*stheta1)
saz=-stheta2*sin(phi1-phi2)/sang
az=atan2(saz,caz)
azi=az/raddeg
if (azi.lt.0.) azi=azi+360.
end subroutine SPH_AZI
In this case, the lat/lon values are passed to the subroutine while del and azi are
passed back to the main program. However, note that if flat1, etc., were changed in
the subroutine, then the corresponding variable would also be changed in the main
program as well. lat1 in the main program and flat1 in the subroutine point to the
same memory location. If one is changed, the other automatically changes as well.
Fortran is not case-sensitive so sph_azi and SPH_AZI have the same meaning. I
like to put subroutine names in all caps so they are more visible. Note that sph_azi
does not have to be declared in the main program.
The subroutine is well-documented in this case, explaining exactly what is going
into the subroutine and what is going out, as well as some of the limitations of
the routine. This may seem like overkill, but documenting your subroutines as
completely as possible is likely to save you considerable time later if you ever want
to use the routine again. It is good to document the routine well enough that you,
or someone else, can use it correctly without having to study the code itself. Clarity
is important: I have seen versions of this routine that do not make clear whether
the azimuth is measured at the first point to the second point, or vice versa. Listing
the limits of the subroutine may help prevent future misuse of the routine; in this
example it may prevent the naive user from assuming that the distance returned
is accurate when used with geographic latitude and longitude on the Earth. The
routine is also designed to be robust with respect to pathological inputs, such as
when the two points have the same coordinates.
ASSIGNMENT F8
Write a single subroutine that computes the volume, surface area, and circumfer-
ence of a sphere, given its radius, together with a main program that inputs different
values for the radius from the keyboard and prints the results. Allow the user to
terminate the program by entering zero for the radius. E-mail me the source code
in a single file containing both the main program and the subroutine.
A powerful aspect of subroutines is that one can link to compiled versions of existing
subroutines without having to recompile them. This means that you only have to
maintain one version of a subroutine; it need not be listed along with the source
code of the main program. For example, the SPH_AZI subroutine is also part of a
F77 package of spherical geometry subroutines contained in:
~shearer/PROG/SUBS/sphere_subs.f
program userdist2
   implicit none
   real flat1,flon1,flat2,flon2,del,azi
   do
      print *,'Enter 1st point lat,lon'
      read *,flat1,flon1
      print *,'Enter 2nd point lat,lon'
      read *,flat2,flon2
      call SPH_AZI(flat1,flon1,flat2,flon2,del,azi)
      print *,'del, azi = ', del, azi
   end do
end program userdist2
When we compile this program, we need to indicate where the SPH_AZI
subroutine can be found. We want to link with what is called an ‘object file’ for the
subroutines. Object files end in .o and there is a sphere_subs_f90.o file in the same
directory (~shearer/PROG/SUBS) as the source file sphere_subs.f.
You must link with object files that have been compiled using the same Fortran
compiler as you use for the main program. When I switched to using gfortran from
g77 a year or two ago, I had to recompile my subroutines in order for them to link
properly with gfortran compiled code. I still occasionally run into this problem when
I link to a subroutine I have not used in a while.
To create a F90 object file, enter, for example:
You can also create a F90 object file from F77 source code:
This will work on native F77 code that includes non-F90 compatible features
(e.g., comment lines starting with C), because the F90 compiler sees that the source
file ends in .f rather than .f90.
To compile userdist2 and link to sphere_subs.o, we enter:
This quickly becomes cumbersome to write so you will find it convenient to set
up a Makefile to keep track of all this for you. Here is part of a Makefile that does
this for this program:
OBJS1= /home/shearer/PROG/SUBS/sphere_subs_f90.o
Note that you MUST use a tab to generate the spaces before ’gfortran’ for the
Makefile to work correctly! This is a leading source of confusion for novice Makefile
users.
You can list the full path names for any number of subroutine object files in this
way. To compile the program, simply enter:
make userdist2
ASSIGNMENT F9
Study the SPH_MID subroutine contained in sphere_subs.f (in ~shearer/PROG/SUBS).
Write a program that uses this subroutine to compute the midpoint between two
(lat,lon) points input by the user. Print out the latitude and longitude of this point.
Do not attempt to copy the SPH_MID source code; just link to the sphere_subs_f90.o
object file. Make sure that the argument list for SPH_MID in your program matches
the subroutine arguments. E-mail me the source code and also tell me where the
working executable version of the program is located. Be sure to give me execute
permission so that I can try running your program. (Remember: ‘ls -l’ shows the
current permissions, ‘chmod’ is how you change the permissions, ‘man chmod’ will
explain this.) If you are at sea (!), then you may have to copy the sphere_subs.f
routines to your local machine. You can make an object file from them using
gfortran -c sphere_subs.f -o sphere_subs.o
4.10 Internal procedures
The functions and subroutines that we discussed above are called ‘external’ because
they are located outside of the main program. With external procedures all of the values
that are to be passed into and out of the procedure must be part of the argument list.
All other variables are local to the procedure or to the main program, even if they
have the same name as a variable in a different procedure. External procedures are
the only kind of procedure allowed in F77 and are what you will find in books like
Numerical Recipes or in the subroutine libraries maintained by various scientists at
SIO. Their great advantage is their portability—everything you need to know about
what they do is contained in their argument list.
However, in some cases it may be more convenient to use an ‘internal’ subroutine
or function, a method that is new to F90. Internal procedures are listed immediately
BEFORE the end statement in the main program. All variable names are shared
within internal procedures; thus argument lists are not necessarily required for them
to work.
Here is an example adapted from the Brainard text that illustrates how internal
subroutines work:
program sort3
   implicit none
   real :: a1, a2, a3
   call read_numbers
   call sort_numbers
   call print_numbers
contains
   subroutine read_numbers
      print *,'Enter three numbers'
      read *, a1, a2, a3
   end subroutine read_numbers
   subroutine sort_numbers
      if (a1 > a2) call swap(a1,a2)
      if (a1 > a3) call swap(a1,a3)
      if (a2 > a3) call swap(a2,a3)
   end subroutine sort_numbers
   subroutine print_numbers
      print *,'The sorted numbers are: '
      print *, a1, a2, a3
   end subroutine print_numbers
   subroutine swap(x,y)
      real :: x, y, temp
      temp = x
      x = y
      y = temp
   end subroutine swap
end program sort3
Internal procedures are listed following a ‘contains’ statement and before the end
statement for the main program. Variables already declared in the main program
need not be declared in the internal procedure. Arguments are optional; they are
used here in the swap routine to permit it to be used for different pairs from a1, a2
and a3. Note that the procedures are not nested (e.g., one might have tried to put
swap internal to sort_numbers); internal procedures may not contain other internal
procedures. Note also that variables declared within a subroutine are purely local
to the subroutine. For example, the value for temp (in swap) is not available in the
main program. Even if temp were declared in the main program, its value will not
correspond to the value of temp in the swap subroutine (try it!).
The use of internal subroutines is rather pointless in this case because the code
would probably be clearer without them. However, for longer programs they may
well be useful for making the code more structured. If you write a program and
notice that you are using the same block of code over and over again, this would be a
situation where using a subroutine would make sense. The advantage of an internal
subroutine is that you don’t have to match the argument lists and declare all of the
variables. External subroutines can become cumbersome when they involve a large
number of variables. Often common blocks are used to avoid long argument lists in
this case (more about this later). In many cases, internal subroutines may provide a
neater way to handle this situation.
4.11 Extended precision
program testdouble
   implicit none
   character (len = 30) :: fmt = "(a5,f44.40)"
   real a4
   real*8 a8
   double precision aa
   real (kind=4) :: k4
   real (kind=8) :: k8
   real (kind=16) :: k16 !only include if using 64-bit machine + compiler
   a4 = 8.9
   print fmt, 'a4 = ', a4
   a8 = 8.9d0
   print fmt, 'a8 = ', a8
   aa = 8.9d0
   print fmt, 'aa = ', aa
   k4 = 8.9
   print fmt, 'k4 = ', k4
   k8 = 8.9_8
   print fmt, 'k8 = ', k8
   print fmt, 'k16= ', k16 !only include if using 64-bit machine + compiler
end program testdouble
and which produces the following output on my Mac running 32-bit gfortran:
a4 = 8.8999996185302734375000000000000000000000
a8 = 8.9000000000000003552714000000000000000000
aa = 8.9000000000000003552714000000000000000000
k4 = 8.8999996185302734375000000000000000000000
k8 = 8.9000000000000003552714000000000000000000
you will get single precision accuracy. Even the ‘dble’ operator (which works in F77,
at least on the Suns) will not work in F90 in assigning variables:
or
k8 = 8.9_8
Note that the 0 following the d is the power of 10, thus 1.23d3 is 1230, 0.84d-2 is
0.0084, etc.
To get the full 16-byte precision (only on the Suns or 64-bit Macs), you must
say
k16 = 8.9_16
rather than k16=8.9d0 or k16=8.9_8, which will assign k16 only to double precision
accuracy.
Many of my existing F77 routines use dble( ) to define double precision numbers.
These will not work correctly if they are changed to F90, unless all of the dble( )
commands are rewritten. They are fine, however, if they continue to be compiled
using F77. I don’t know if this is a bug in Sun F90, or if the use of dble( ) in this
way is non-standard Fortran.
ASSIGNMENT F10
Examine the datetime.f source code in ~shearer/PROG/SUBS (also see class
website) and write a F90 program to compute the number of seconds that have
elapsed since noon on your birth date (or the exact time if you know it) until a
user specified date and time. Have the program print out this number of seconds.
You will want to use the DT_TIMEDIF subroutine. Make sure that all of the
variables match and are of the same type (integer, real, and real*8 for timdif, the
final argument). You should also be aware that & in column 6 of F77 code is how
lines are continued, that is:
subroutine DT_TIMEDIF(iyr1,imon1,idy1,ihr1,imn1,sc1,
& iyr2,imon2,idy2,ihr2,imn2,sc2,timdif)
Extra credit: Determine the date and time when you will be (or were) exactly one
billion seconds old. This will require using one of the other subroutines in datetime.f.
Mark your calendar for a party!
Just as you can set aside fixed numbers of bytes for real numbers, you can do
the same to store integers of varying sizes. The standard is 4-byte integers, which
have 32 bits and thus will go (approximately) from −2^31 to 2^31. However, 2-byte
and 8-byte integers are also allowed. These are sometimes called short and long
integers, respectively. The following example program shows how this works, using
two different ways to define the integers:
program testinteger
implicit none
integer*2 i2
integer i4
integer*8 i8
integer (kind=2) :: k2
integer (kind=4) :: k4
integer (kind=8) :: k8
i8 = 32000
i4 = i8
i2 = i8
print *, i2, i4, i8
i8 = 32000**2
i4 = i8
i2 = i8
print *, i2, i4, i8
i8 = 32000
i8 = i8**4
i4 = i8
i2 = i8
print *, i2, i4, i8
end program testinteger
The zero fields in the output are incorrect and indicate that the number was too
large to be stored in the given variable type. Note that k2, k4 and k8 (defined using
the kind statement) will give the same results, even though they are not explicitly
tested in the program.
Regular integers suffice for most purposes. Short integers are useful to save space
for data sets that don’t require bigger numbers (i.e., beyond about ±32000).
4.12 Arrays
real, dimension(100) :: a, b
integer, dimension(50,2) :: index
which defines a and b as 100 element arrays (a(1) to a(100)) and index as a 50x2
element array (index(1,1) to index(50,2)). Note that the default starting array num-
ber is 1 (not 0 as in C). Also note that array elements are written using parentheses,
not brackets as in C.
Older Fortran programs would define the same arrays as follows:
real a(100), b(100)
integer index(50,2)
This syntax will still work and is easier to read for short programs. The new standard
has the advantage, however, of being able to easily define many arrays with identical
dimensions.
A nice feature of Fortran is that we can specify the lower and upper array
boundaries explicitly in the declaration, e.g.,
real, dimension(-100:100) :: a, b
integer, dimension(0:50, 2) :: index
In this case there are 201 values for a and b (from a(-100) to a(100)) and 102
values for index (from index(0,1) to index(50,2)).
Here is an example program that uses an array to compute prime numbers less
than 100:
program prime
   implicit none
   integer, parameter :: maxnum=100
   integer :: i, j, prod(maxnum), max_i, max_j, nprime=0
   do i = 1, maxnum
      prod(i) = 0
   enddo
   max_i = floor(sqrt(real(maxnum)))
   do i = 2, max_i
      if (prod(i) == 0) then
         max_j = maxnum/i
         do j = 2, max_j
            prod(i*j) = 1
         enddo
      end if
   end do
   do i = 2, maxnum
      if (prod(i) == 0) then
         nprime = nprime + 1
         print "(i4)", i
      end if
   enddo
end program prime
The method is sometimes called the sieve of Eratosthenes, named after a Greek
mathematician from the 3rd century BC. We start with a list of numbers from 2
to 100 and consider them all possible primes. In the program, this list is the array
prod. We initialize the array by setting its values to zero. The strategy is to then
flag numbers that are not prime by setting the corresponding array values to one.
We then start with the number 2 and eliminate all multiples of 2 up to the
maximum value of 100. We then move to 3 and eliminate (i.e., set prod to 1) all
multiples of 3. We can skip 4 because 4 and all its multiples were already eliminated.
We need to check numbers, i, only up to sqrt(100) because larger factors would already
have been eliminated.
When we are finished, we simply print out all numbers that were not eliminated.
In our case, this is all i such that prod(i) = 0. We count the number of primes using
the counter variable nprime.
A new aspect of this program is the defining of maxnum as a parameter:
integer, parameter :: maxnum=100
This tells the program to set maxnum to 100 and that this value will never be
changed during the program. It also allows the array dimension for prod (in the
next line) to be set by maxnum. This makes it easy for us to change the size of our
prime search by changing the value of maxnum without having to change anything
else in the program.
Note that in the statement
max_i = floor(sqrt(real(maxnum)))
it is necessary to convert maxnum to a real number before taking the square root.
This is because the argument to the Fortran sqrt function must be real; if we had
written sqrt(maxnum) we would have gotten an error during compilation.
This program lists the prime numbers with one number per output line and thus
will become cumbersome for larger values of maxnum. Let’s modify the program to
list 10 primes per line. To do this, we save the prime numbers in a separate array
called pnum. Here is the code:
program prime2
   implicit none
   integer, parameter :: maxnum=1000
   integer, dimension(maxnum) :: prod, pnum
   integer :: i, j, max_i, max_j, nprime=0
   do i = 1, maxnum
      prod(i) = 0
   enddo
   max_i = floor(sqrt(real(maxnum)))
   do i = 2, max_i
      if (prod(i) == 0) then
         max_j = maxnum/i
         do j = 2, max_j
            prod(i*j) = 1
         enddo
      end if
   end do
   do i = 2, maxnum
      if (prod(i) == 0) then
         nprime = nprime + 1
         pnum(nprime) = i
      end if
6.4 Variable types
The time has come to talk about variable types. We’ve been very relaxed up to
now, because we don’t have to declare them up front and we can often even change
them from one type to another on the fly. But - variable types matter, so here goes.
Like Fortran, Python has integer, floating point, string and complex variable types
(Python 2 also distinguishes ordinary and long integers; floats are always double
precision). It is pretty clever about figuring out what is required. Here
are some examples:
Lesson learned: you can’t add a number and a string, and string addition is different!
But you really have to be careful with this. Multiplying a float by an integer gives a
float, but in Python 2 dividing one integer by another truncates the result to an integer
(3/2 is 1) when you really wanted all those numbers after the decimal! So, if you want
a float, use a float.
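Here is a minimal sketch of the rules just described. The variable names (count, ratio, label and so on) are made up for illustration, and the script is written with print() and // so that it should behave the same under Python 2 and Python 3:

```python
count = 3                      # an integer
ratio = 2.5                    # a float
label = 'spam'                 # a string

product = count * ratio        # int times float gives a float
joined = label + 'alot'        # "adding" strings concatenates them
repeated = label * 2           # a string times an integer repeats the string

# Adding a number to a string is an error; catch it so the script keeps going
try:
    total = count + label
except TypeError:
    total = None               # the addition was refused

halved = 7 // 2                # floor division truncates to an integer
exact = 7 / 2.0                # using a float keeps the decimals

print(product)
print(joined)
print(halved)
print(exact)
```

The try/except is only there so the TypeError does not stop the script; at the interactive prompt you would simply see the error message.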
You can convert from one type to another (if appropriate) with:
int(number); str(number); float(number);
long(number); complex(real,imag)
In Python 2, long() converts to a long (arbitrary-precision) integer, and complex()
converts the two parts to a complex number.
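A short sketch of these conversions, with illustrative variable names of my own choosing. long() is left out on purpose because it exists only in Python 2; int() handles large values in both versions:

```python
whole = int(3.7)                    # drops the fraction (it does not round)
from_text = float('2.5')            # parses the text of a number
as_text = str(42)                   # number to string
z = complex(1, 2)                   # real and imaginary parts
huge = int('12345678901234567890')  # too big for 64 bits, but still exact

print(whole)
print(from_text)
print(as_text)
print(z)
print(huge)
```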
There is another kind of variable called “boolean”. These take the values True and
False and are combined with the operators and, or, and not. For the record, the
integer ‘1’ is true and ‘0’ is false. These can be used
to control the flow of the program as we shall learn later.
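As a small illustration (the names a, is_positive, etc. are invented for this sketch), comparisons produce booleans, and the boolean operators combine them:

```python
a = 5

is_positive = a > 0            # True
is_small = a < 3               # False

both = is_positive and is_small       # True only if both are True
either = is_positive or is_small      # True if at least one is True
neither = not either                  # flips True to False

one_is_true = (1 == True)      # the integer 1 compares equal to True
zero_is_false = (0 == False)   # and 0 compares equal to False

# A boolean expression deciding which branch runs
if is_positive and not is_small:
    message = 'a is positive and at least 3'
else:
    message = 'something else'

print(message)
```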
6.5 Data structures
In Fortran, you encountered arrays, which were a nice way to group a sequence of
numbers that belonged together. In Python of course we also have arrays, but we
also have more complicated data structures, like lists, tuples, and dictionaries, that
group arbitrary variables together, like strings and integers and floats - whatever
you want really. We’ll go through some attributes of the various data structures,
starting with lists.
6.5.1 Lists
• Lists are denoted with [] and can contain any arbitrary set of items, including
other lists!
• Items in the list are referred to by an index number, starting with 0. Note
that this is different from Fortran which starts counting in arrays with the
number 1.
• You can count from the end to the beginning by starting with -1 (the last item
in the list), -2 (second to last), etc.
Examples:
>>> newlist=mylist[1:3]
This takes items 2 and 3 out (note it takes out up to but not including the last item
number - don’t ask me why). Or, we can slice it this way:
>>> newlist=mylist[3:]
which takes from the fourth item (index 3, since counting starts from 0!) to the end.
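To see indexing and slicing in one place, here is a sketch using a made-up list; the contents of mylist below are hypothetical and chosen only so that each slice is easy to check by eye:

```python
mylist = ['a', 2.0, 400, 'spam', 42, [24, 2]]   # hypothetical contents

first = mylist[0]          # index 0 is the first item
last = mylist[-1]          # -1 counts from the end
newlist = mylist[1:3]      # indices 1 and 2 (stops before index 3)
tail = mylist[3:]          # index 3 through the end

print(first)
print(last)
print(newlist)
print(tail)
```

One way to remember the "up to but not including" rule: the length of mylist[i:j] is just j - i.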
To copy a list BEWARE! You can make what looks like a copy, but unlike in
Fortran it is just another name for the SAME OBJECT, so:
>>> mycopy=mylist
>>> mylist[2]='new'
>>> mycopy[2]
'new'
See how mycopy got changed when we changed mylist? To spawn a new list that is
a copy, but an independent entity:
>>> mycopy=mylist[:]
Now try:
>>> mylist[2]=1003
>>> mycopy[2]
'new'
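The same experiment as a script, as a sketch with a made-up four-item list. The is operator is used here to ask Python whether two names refer to the very same object:

```python
mylist = ['a', 'b', 'c', 'd']  # hypothetical starting list

alias = mylist                 # a second name for the SAME list
mycopy = mylist[:]             # a new list holding the same items

mylist[2] = 'new'              # change the original

same_object = alias is mylist      # True: one object, two names
copy_is_same = mycopy is mylist    # False: an independent list
alias_item = alias[2]              # follows the change
copy_item = mycopy[2]              # keeps the old value

print(alias_item)
print(copy_item)
```

Note that the slice makes a copy of the outer list only; any lists nested inside it are still shared between the two.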
150 CHAPTER 6. LISA’S PYTHON NOTES
"""
Hi there I can type as
many lines as I want
"""
Strings can be added together (newstring = 'spam' + 'alot'). They can be sliced
(newerstring = newstring[0:3]), but they CANNOT be changed in place - you can’t
do this: newstring[0]='b'. To find more of the things you can and cannot do to
strings, see: http://docs.python.org/tutorial/introduction.html#strings
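A quick sketch of these three points. The workaround at the end (building a new string from a slice) is my own addition, shown only as one common way around the restriction:

```python
newstring = 'spam' + 'alot'        # concatenation
newerstring = newstring[0:3]       # slicing gives a new string

# Strings cannot be changed in place, so this assignment raises TypeError
try:
    newstring[0] = 'b'
    changed_in_place = True
except TypeError:
    changed_in_place = False

# Instead, build a new string out of pieces of the old one
bstring = 'b' + newstring[1:]

print(newstring)
print(newerstring)
print(bstring)
```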
Tuples are sort of like lists, but like strings, their elements cannot be changed.
However, you can slice, concatenate, etc. For more see:
http://docs.python.org/tutorial/datastructures.html#tuples-and-sequences
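A brief sketch of tuple behavior; the tuple point and its values are hypothetical (a made-up lat, lon pair). Note the trailing comma needed to write a one-item tuple:

```python
point = (32.87, -117.25)       # hypothetical (lat, lon) pair

lat = point[0]                 # indexing works as for lists
longer = point + (0.0,)        # concatenation; (0.0,) is a one-item tuple
firsttwo = longer[0:2]         # slicing gives a new tuple

# Like strings, tuples cannot be changed in place
try:
    point[0] = 0.0
    changed_in_place = True
except TypeError:
    changed_in_place = False

print(longer)
print(firsttwo)
```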
6.5.5 Dictionaries!
Dictionaries are denoted by {}. They are also somewhat like lists, but instead of
integer indices, they use alphanumeric ‘keys’. I love dictionaries, so here is a bit
more about them.
To define one:
>>> Telnos={'lisa':46084,'lab':46531,'jeff':44707} # defines a dictionary
To return the value associated with a specific key:
>>> Telnos['lisa']
46084
To change the value associated with a key:
>>> Telnos['lisa']=46048
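Here is a sketch that gathers the common dictionary operations in one script. The Telnos entries come from the example above; the 'pete' entry and its number are invented for illustration:

```python
Telnos = {'lisa': 46084, 'lab': 46531, 'jeff': 44707}

number = Telnos['lisa']            # look up a value by its key
Telnos['lisa'] = 46048             # change an existing value
Telnos['pete'] = 12345             # hypothetical new entry: just assign to a new key

names = sorted(Telnos.keys())      # keys have no guaranteed order, so sort them
has_lab = 'lab' in Telnos          # test whether a key is present
missing = Telnos.get('nobody', 0)  # get() returns a default instead of raising KeyError

print(names)
print(Telnos['lisa'])
```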
Arrays in Python are more similar to arrays in Fortran than they are to lists. Unlike
lists, arrays have to be all of the same data type (dtype), usually numbers (integers
or floats), although there appears to be something called a character array. Also,
the size and shape of an array must be known a priori and not determined on the fly
like lists. For example we can define a list with L=[], then append to it as desired,
but not so arrays - they are much pickier and we’ll see how to set them up later.
Why use arrays when you can use lists? They are far more efficient than lists
particularly for things like matrix math. But just to make things a little confusing,
there are several different data objects that are loosely called arrays, e.g., arrays,
character arrays and matrices. These are all subclasses of ndarray. I’m just going
to briefly introduce arrays and matrices here.
Here are a few ways of making arrays:
don’t? The answer is that some of these are functions and some are classes, both of
which we will get to later.
Let’s see what the methods can do. First, arrays made in the above example are
of different data types. To find out what data type an array is, just use the
dtype attribute as in:
>>> D.dtype
dtype(’float64’)
>>>
And of course arrays, unlike lists have dimensions and shape. Dimensions tell
us how many axes there are with axes defined as in this illustration:
[Figure: panel a) shows the A array, [[1, 2, 3], [4, 2, 0], [1, 1, 2]], with axis = 0 and axis = 1 labeled; panel b) shows a second array with axis = 0, axis = 1 and axis = 2 labeled.]
As shown above our A array has two dimensions (axis 0 and 1). To get Python
to tell us this, we use the ndim attribute:
Notice how zeros, ones and ndarray used a shape tuple in order to define the
arrays in the examples above. The shape of an array is how many elements are along
each axis. So, naturally we see that the C array is a 2x3 array. Python returns a
tuple with the shape information using the shape attribute:
>>> C.shape
(2, 3)
Let’s say we don’t want a 2x3 array for the sequence in the array C, but we want
a 3x2 array. Python can reshape an array with a different shape tuple like this:
>>> C.reshape((3,2))
array([[1, 2],
[3, 4],
[5, 6]])
And sometimes we just want all the elements lined up along one axis. We could
do that with reshape of course using a tuple with the size of the array (the total
number of elements). You can see that this is 6 here. We could even get python to
tell us what the size is (C.size) and use that in the reshape size tuple. Alternatively
we can use the ravel() method which doesn’t require us to know the size in advance:
>>> C.ravel()
array([1, 2, 3, 4, 5, 6])
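The pieces above can be collected into one short script. This is a sketch that assumes NumPy is installed and that C was built as shown on the first line (that definition is my assumption, chosen to match the 2x3 shape and the reshape output above):

```python
import numpy as np

C = np.array([[1, 2, 3], [4, 5, 6]])   # assumed definition of C

ndim = C.ndim                  # number of axes
shape = C.shape                # elements along each axis, as a tuple
size = C.size                  # total number of elements

reshaped = C.reshape((3, 2))   # same six numbers, 3 rows by 2 columns
flat1 = C.reshape((C.size,))   # one axis, using the size we asked for
flat2 = C.ravel()              # one axis, no need to know the size

print(shape)
print(reshaped)
print(flat2)
```

reshape only works when the new shape holds exactly the same number of elements; asking for (4, 2) here would raise an error.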
There are other ways to reshape, slice and dice arrays. The syntax for slicing of
arrays is similar to that for lists:
>>> B=A[0:2] # carve the top two lines off of matrix A from above
>>> B
array([[1, 2, 3],
[4, 5, 6]])
Lots of applications in Earth Science require the transpose of an array:
>>> A.transpose() # this is the same as A.T
array([[1, 4, 7],
[2, 5, 8],
[3, 6, 9]])
Also, we can concatenate two arrays together with the - you guessed it -
concatenate() function. For a lot more tricks with arrays, go to the NumPy Reference
website here: http://docs.scipy.org/doc/numpy/reference/.
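A sketch of slicing, transposing and concatenating, assuming NumPy is installed and that A holds the 3x3 values implied by the transpose output above (the definition of A on the first line is my assumption):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # assumed contents of A

B = A[0:2]                     # first two rows
column = A[:, 0]               # every row, first column
At = A.transpose()             # rows become columns (same as A.T)

stacked = np.concatenate((B, B))               # join along axis 0: 4 rows
side_by_side = np.concatenate((B, B), axis=1)  # join along axis 1: 6 columns

print(B)
print(column)
print(stacked.shape)
print(side_by_side.shape)
```

concatenate takes the arrays to join as a single tuple (or list), which is why there are double parentheses in the calls.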
We promised to tell you about matrix objects, so here goes. A matrix is another
subclass of ndarray with special advantages for particular applications. While arrays
are intended to be general purpose n-dimensional arrays for numerical computing,
matrices are better for linear algebra type problems: you can take the inverse and
find the Hermitian (conjugate transpose) more easily (the .I and .H attributes) and
matrix multiplication works like it does in Matlab for matrix objects, whereas arrays
do element-wise computations (we’ll learn more about this later). Check out this
website for more differences between the two: http://www.scipy.org/NumPy_for_Matlab_Users.
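To make the element-wise point concrete, here is a sketch that uses plain arrays only (matrix objects are not shown). It assumes NumPy is installed; X and Y are made-up 2x2 arrays, and numpy.dot is used for the true matrix product:

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])     # hypothetical 2x2 arrays
Y = np.array([[0, 1], [1, 0]])

elementwise = X * Y                # multiplies matching elements
matrix_product = np.dot(X, Y)      # rows of X times columns of Y

print(elementwise)
print(matrix_product)
```

Because Y swaps the two columns, the matrix product is just X with its columns exchanged, which makes the difference from the element-wise result easy to spot.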
To convert the A array to a list: L=A.tolist(); from a list or tuple to an
array: A=numpy.array(L); or from a list, a tuple or an array to a NumPy array:
a=numpy.asarray(L).
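A small sketch of these conversions, assuming NumPy is installed; the array contents and the names A2, A3 and T are made up. The last line shows the practical difference between array() and asarray() when the input is already an array:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])   # hypothetical starting array

L = A.tolist()                 # array to (nested) Python list
A2 = np.array(L)               # list to a brand new array
T = np.asarray((1, 2, 3))      # tuple to array
A3 = np.asarray(A)             # already an array, so no copy is made

same = A3 is A                 # True: asarray handed back A itself

print(L)
print(A2.shape)
print(same)
```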
6.6 Python scripts
Are you tired of typing yet? Like UNIX shell scripts, Python scripts are programs
that can be run and re-run from the command line. You can type in the same
stuff you’ve been doing interactively into a script file (ending in .py). You can edit
scripts with: vi, TextWrangler, Xcode, emacs, etc. NOT Word! And then you can
run them like this:
% python < myscript.py
Or you can put in a header line identifying the script as python (#!/usr/bin/env
python), make it executable (chmod a+x), and run it like this:
% myscript.py
Here is a familiar example that creates a script using the UNIX cat command,
makes it executable and then runs it:
% cat > printmess.py
#!/usr/bin/env python
# simple Python test program (printmess.py)
print 'test message'
^D
% chmod a+x printmess.py
% ./printmess.py
test message
The first line (#!/usr/bin/env python) must come first so that the file is
interpreted as Python. Unlike Fortran or C, you CANNOT start with a comment line
(try switching lines 1 and 2 and see what happens).
The second line is a comment line. Anything to the right of # is assumed to be
a comment (Remember that in Fortran ! serves the same function).
Notice that print goes by default to your screen. Because the message is a string,
you can use single or double quotes for the test message. You can get an apostrophe
in your output by using double quotes and quote marks by using single quotes, i.e.,
#!/usr/bin/env python
# simple Python test program 2 (printmess2.py)
print "The pump don't work 'cuz the vandals took the handles"
print 'She said "I know what it\'s like to be dead"'
produces:
% ./printmess2.py
The pump don't work 'cuz the vandals took the handles
She said "I know what it's like to be dead"
%
In the second print statement, the \’ is necessary to prevent an error (try it). This
is an example of a Python ‘escape code’. These are used to escape some special
meaning, as in an end-quote for a string in this example. We use the backslash to
say that we really really want a quote mark here. Other escape codes are listed
here:
http://www.python-course.eu/variables.php
Here’s another example of a program - this one has a typo in line 4:
#!/usr/bin/env python
abeg = 2.1
aend = 3.9
adif = aend - abge
print 'adif = ', adif
You had intended to type ’abeg’ but typed ’abge’ instead. When you run the
program, you get an error message:
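A sketch of what happens, written so that it runs under Python 2 or 3. The exact traceback text varies between versions, so only the exception type (NameError) and the offending name are relied on here. The try/except wrapper is an addition for illustration; the script above has none, so it simply stops at the bad line.

```python
abeg = 2.1
aend = 3.9
try:
    adif = aend - abge        # the typo: abge was never assigned
except NameError as err:
    message = str(err)        # names the variable Python could not find
    print('NameError: ' + message)
```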
Error messages are a desirable feature of Python. You don’t want the program to
run by assigning some arbitrary value to abge and giving you a wrong answer. Yet
many languages will do exactly that, including Fortran (we can avoid this potential
problem in Fortran by using the ’implicit none’ statement at the beginning of our
programs).
ASSIGNMENT P1
Write a Python script to print your favorite pithy phrase. e-mail your script to
[email protected]
6.7 A first look at code blocks
Any reasonable programming language must provide a way to group blocks of code
together, to be executed under certain conditions. In Fortran, for example, there
are if statements and do loops which are bounded by the statements if, endif and
do, enddo respectively. Many of these languages encourage the use of indentation
to make the code more readable, but do not require it. In Python, indentation is
the way that code blocks are defined - there are no terminating statements. Also,
the initiating statement terminates in a colon. The trick is that all code indented
the same number of spaces (or tabs) to the right belongs together. The code block
terminates when the next line is less indented. A typical Python program looks like
this:
program statement
block 1 top statement:
    block 1 statement
    block 1 statement \
  ha-ha i can break the indentation convention!
    block 1 statement
    block 2 top statement:
        block 2 statement
        block 2 statement
        block 3 top statement:
            block 3 statement
            block 3 statement
            block 4 top statement: block 4 single line of code
        block 2 statement
        block 2 statement
    block 1 statement
    block 1 statement
program statement
• Any statement can be continued on the next line with the continuation char-
acter \ and the indentation of the following line is arbitrary.
• If a code block consists of a single statement, then that may be placed on the
same line as the colon.
• The command break breaks you out of the enclosing for or while loop (it does
not apply to an if block on its own). Use with caution!
• There is a cheat that comes in handy when you are writing a complicated
program and want to put in the code blocks but don’t want them to DO
anything yet: the command pass does nothing and can be used to stand in for
a code block.
Good housekeeping Tip #2: Always use only spaces or only tabs in your code
indentation. I use only spaces because I use vi to write my code. Others use Xcode,
the Python IDLE program, or TextWrangler to write their code and some of these
things use tabs by default. Whatever you do BE CONSISTENT because tabs are
not the same as spaces in Python even if you can’t tell the difference just by looking
at it.
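To make the rules above concrete, here is a small runnable sketch (the numbers and variable names are made up for illustration). It uses a one-line block after the colon, a continuation line with arbitrary indentation, pass as a placeholder and break to leave the loop.

```python
results = []
for number in range(1, 8):        # block 1 top statement
    if number % 2 == 0:           # block 2 top statement
        pass                      # placeholder: even numbers not handled yet
    elif number > 5: break        # single statement on the same line as the colon
    else:
        total = number + \
      100                         # continuation line: its indentation is arbitrary
        results.append(total)
print(results)
```

Only the spaces in front of the continued line are free; every other line in a block has to line up with its neighbours.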
In the following, I’ll show you how Python uses code blocks to create “do” and
“while” loops, and “if” statements.
Here is an example of a “for loop” that is similar to the way you would do it in
Fortran:
#!/usr/bin/env python
mylist=[42,'spam','ocelot']
for i in range(0,len(mylist),1): # range(start, stop, step) supplies the indices
    print mylist[i]
print 'All done'
This script creates the list mylist with the line mylist=[42,'spam','ocelot']. The
length of mylist is an integer value returned by len(mylist). The script uses this
integer as the ‘stop’ value in the range() function, which returns a list of integers
from 0 to the stop value MINUS ONE at intervals of one. [The minus one
convention is hard to get used to for Fortran programmers, but it is typical of Python
syntax (and also of C) so just deal with it.] Anyway, range(start,stop,step) is just
like numpy.arange(start,stop,step) but only works with integers and returns a list
rather than a NumPy array. Also, like numpy.arange(), there is a short hand form
when the minimum is zero and the interval is one, so we could (and will) just use
the command range(stop).
Python makes i step through the list of numbers from 0 to 2, printing the ith
element of mylist. Note how the print command is indented - this is the program
block that is executed for each i. Note also that the print statement could have been
on the previous line after the colon, because there is only one line in the program
block. But never mind, this way works too and is more Fortran like. When i finishes
its business, the program block terminates. At that point, the program prints out the
’All done’ string. There is no “enddo” statement or equivalent in Python.
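A quick check of the range() conventions just described. The list() wrapper is only needed under Python 3, where range() hands out its integers one at a time instead of building a list; under Python 2 it is harmless.

```python
mylist = [42, 'spam', 'ocelot']

long_form = list(range(0, len(mylist), 1))   # start, stop, step
short_form = list(range(len(mylist)))        # stop only: start=0, step=1

picked = []
for i in long_form:
    picked.append(mylist[i])                 # the last index is len(mylist) MINUS ONE

print(long_form)
```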
But, Python is far more fun than the Fortran-like for i in syntax in the above
code snippet. In Python we can just step through a list directly. Here is another
script which does just that (why not?):
#!/usr/bin/env python
mylist=[42,'spam','ocelot']
for item in mylist: # note absence of range statement
    print item
print 'All done'
Note that of course we could have used any variable name instead of ‘item’, but it
makes sense to use variable names that mean what they do. It is easier to understand
what ‘item’ stands for than just the Fortran style of i.
Here is an example with a little more heft to it. It creates a table of trigonometry
functions, spitting them out with a formatted print statement:
#!/usr/bin/env python
import numpy as np
deg2rad = np.pi/180. # remember conversion to radians
for theta in range(90): # short form of range, returns [0,1,2...89]
    ctheta = np.cos(theta*deg2rad) # define ctheta as cosine of theta
    stheta = np.sin(theta*deg2rad) # define stheta as sine of theta
    ttheta = np.tan(theta*deg2rad) # define ttheta as tangent of theta
    print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)
Let’s pick this one apart a bit. First, notice the use of the variable deg2rad to
convert from degrees to radians. Also notice how deg2rad is defined: deg2rad =
np.pi/180. using the NumPy function for π and the decimal point after 180. While
in this case, it makes absolutely no difference (try it!), it is a good practice to use
real numbers if you want your variable to stay real. In fact:
Good housekeeping Tip #3: Always use a decimal if you want your variable to
be a floating point variable.
The expression ctheta = np.cos(theta*deg2rad) uses the numpy cosine function.
Ideally theta should be a real variable while in fact it is an integer in this expression,
but fortunately Python figures that out and converts it to a real. Note that we could
have also converted theta to a float first with the command float(theta).
Finally, look at the print statement:
print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)
We could have used a plain print theta, ctheta, stheta, ttheta,
which would space the numbers irregularly among the columns and put out really
long numbers. Instead, we explicitly specify the output format. The output format
is given in the quotes. The format for each number follows the %; 5.1f is for 5
spaces of floating point output, with 1 space to the right of the decimal point (in
Fortran this is f5.1). The single blank space between %5.1f and %8.4f is included in
the output; in fact any text there is reproduced exactly in the output. Thus, to put
commas between the output numbers, write:
print '%5.1f, %8.4f, %8.4f, %8.4f' %(theta, ctheta, stheta, ttheta)
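Here is a small sketch of the comma-separated format at work. It uses the standard math module instead of NumPy, and only three angles, purely to keep the example short; the formatted strings are stored in a list so that they can be inspected as well as printed.

```python
import math

rows = []
for theta in (0, 30, 60):
    rad = math.radians(theta)     # degrees to radians
    rows.append('%5.1f, %8.4f, %8.4f, %8.4f'
                % (theta, math.cos(rad), math.sin(rad), math.tan(rad)))

for row in rows:
    print(row)
```

Every row comes out the same width, which is the whole point of giving explicit field widths.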
The “for loop” is just one way of controlling flow in Python. There are also if and
while code blocks. These execute code blocks the same way as for loops (colon
terminated top statements, indented text, etc.). For both of these, the code block is
executed if the top statement is TRUE. For the “if” block, the code is executed once
but in a “while” block, the code keeps executing as long as the statement remains
TRUE.
The key to flow control therefore is in the top statement of each code block; if
it is TRUE, then execute, otherwise skip it. To decide if something is TRUE or not
(in the boolean sense), we need to evaluate a statement using comparisons. You
know all about comparisons from Fortran. Python of course also has comparisons
and they work in similar ways with a few differences. Here’s a handy table with
comparisons (relational operators) in different languages:
Comparisons
F77     F90     C     MATLAB   PYTHON   meaning
.eq.    ==      ==    ==       ==       equals
.ne.    /=      !=    ~=       !=       does not equal
.lt.    <       <     <        <        less than
.le.    <=      <=    <=       <=       less than or equal to
.gt.    >       >     >        >        greater than
.ge.    >=      >=    >=       >=       greater than or equal to
.and.   .and.   &&    &        and      and
.or.    .or.    ||    |        or       or
These operators can be combined to make complex tests. Here is a juicy com-
plicated statement:
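As one illustrative possibility (the variable names and values here are invented for the example, not taken from the class data):

```python
depth_km = 35.0      # hypothetical values, for illustration only
magnitude = 6.2
station_ok = True

# shallow AND big, OR (huge AND the station is NOT ok)
keep_event = ((depth_km < 70.0) and (magnitude >= 6.0)) or \
             ((magnitude >= 7.5) and (not station_ok))

# Without parentheses, 'and' is evaluated before 'or':
implicit = True or False and False      # read as True or (False and False)
explicit = (True or False) and False    # parentheses change the answer

print(keep_event)
```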
There are rules for the order of operations for these things; for example,
multiplication gets done before addition. But these are easy to forget. You can look
it up in the documentation if you are unsure or, better, just put in enough
parentheses to make it completely clear to anyone reading your code.
Good housekeeping Tip #4: Use parentheses liberally - make the order of operation
completely unambiguous even if you could get away with fewer.
One nice aspect of Python compared to C is that if you make a mistake and
type, for example,
if (a = 0):
you will get a syntax error before the program even starts to run. In C this is a
valid statement with a completely different meaning than is intended!
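You can convince yourself of this without wrecking a script by handing the offending line to the built-in compile() function, which only parses the text and runs nothing. This is a small sketch for illustration; in practice you would simply see the SyntaxError when you run your script.

```python
source = "if (a = 0):\n    pass\n"
try:
    compile(source, '<example>', 'exec')   # parse only; nothing is executed
    rejected = False
except SyntaxError:
    rejected = True                        # assignment is not allowed in the test

print(rejected)
```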
#!/usr/bin/env python
mylist=['jane','doug','denise']
if 'susie' in mylist:
    pass # don't do anything
if 'susie' not in mylist:
    print 'call susie and apologize!'
    mylist.append('susie')
elif 'george' in mylist: # if first statement is false, try this one
    print 'susie and george both in list'
else: # if both statements are false, do this:
    print "susie in list but george isn't"
While loops
As already mentioned, the ‘while’ block continues executing as long as the while
top statement is TRUE. In other words, the if block is only executed once, while
the while block keeps looping until the statement turns FALSE. Here is an
example:
#!/usr/bin/env python
a=1
while a < 10:
    print a
    a+=1
print "I'm done counting!"
All of these program blocks can also be done in an interactive session, again using
indentation. The interactive shell responds with ’...’ instead of ’>>>’ once you
type a statement it recognizes as a top statement. To signal that you are done with
the program block, simply hit return:
>>> a=1
>>> while a<10:
...     print a
...     a+=1
... [return to execute block]
ASSIGNMENT P2
Rewrite your Fortran Assignment F4 in Python. E-mail it to [email protected]
6.8 File I/O in Python
If you are using Python interactively or want interactivity in a script, use the
command raw_input(). It acts as a prompt and reads in whatever is supplied prior to
a return as a string.
In a typical use, the variable ans is read in as a string variable, converted to a
float and appended to a list, X. raw_input() is a simple but rather annoying way
to enter things into a program. Another (less annoying) way is to put the data in a
file (e.g., myfile.txt) with cat, paste, Excel (saved as a text file), or whatever and
read it into Python. The approach to this is similar to Fortran: we must first open
the file, then read it in and parse lines into the desired variables.
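The raw_input() pattern described above (read a string, convert it to a float, append it to the list X) can be sketched as follows. So that the sketch runs unattended, the prompt function is passed in as an argument and a canned stand-in is used here; read_values and fake_prompt are hypothetical names made up for this example. In a real script you would pass raw_input (Python 2) or input (Python 3) instead.

```python
def read_values(ask, count):
    """
    prompts count times using the function ask and returns the answers as floats
    """
    X = []
    for _ in range(count):
        ans = ask('Input value: ')   # whatever is typed arrives as a string
        X.append(float(ans))         # convert to float, then append to the list
    return X

# Stand-in for raw_input()/input(): hands back canned answers one at a time.
canned = iter(['3.5', '42'])
def fake_prompt(prompt):
    return next(canned)

X = read_values(fake_prompt, 2)
print(X)
```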
To open a file we use the command open(), one of Python’s built-in functions.
For a complete list of these, see:
http://docs.python.org/library/functions.html
The open() function returns an object, complete with methods, like readlines()
which, yes, reads all the lines. Here is a script (ReadStations.py) that will open the
file station.list from the chapter on GMT, read in the data and print it out line by
line.
#!/usr/bin/env python
f=open('station.list')
StationNFO=f.readlines()
for line in StationNFO:
    print line
% ReadStations.py
9.02920 38.76560 2442 AAE
The function open() has some bells and whistles to it and has the form open(name[,
mode[, buffering]]) where the stuff in square brackets is optional. The ‘name’
argument is the file name to open and ‘mode’ is the way in which it should be opened,
most commonly for reading ’r’, writing ’w’ or appending ’a’. I use the form ’rU’
(universal newline mode) for reading because I often want to read in files that were
saved with Dos, Mac OR Unix line endings and ’rU’ figures all that out for you. Just
in case you are curious, Unix lines end in ’\n’, (old) Mac files in ’\r’ and Dos (and
Windows) lines end in ’\r\n’. I never use the ’buffering’ argument and don’t know
what it does.
If you are curious about the line endings, try typing out the ‘representation’
of the line repr(line) in the above script and you get all the stuff that is normally
invisible like the apostrophes and the line terminations:
% ReadStations.py
’ 9.02920 38.76560 2442 AAE \n’
’ 42.63900 74.49400 1645 AAK \n’
’ 37.93040 58.11890 678 ABKT\n’
’ 51.88370 -176.68440 116 ADK \n’
’ -13.90930 -171.77730 706 AFI \n’
etc.
Notice how in our first version, printing the line also printed the line feed (\n) as
an extra line. To clean this off of each line, we can use the string strip() method:
print line.strip('\n')
% ReadStations.py
9.02920 38.76560 2442 AAE
42.63900 74.49400 1645 AAK
37.93040 58.11890 678 ABKT
Let’s say you want to read in the data table into lists called Lats, Lons, and
StaIDs (the first three columns). You need to split each line into its columns and
append the correct column into the appropriate list. Fortran automatically splits
on the spaces so you probably didn’t have to worry about this sort of thing yet, but
Python reads in the entire line as a string and ignores the spaces or other possible
delimiters (commas, semi-colons, tabs, etc.). To split the line, we use the string
method split([sep]) where [sep] is an optional separator. If no separator is specified
(e.g., line.split()), it will split on any run of whitespace. Anything could be a
separator, but the most common ones are ’,’, ’;’, and ’\t’. The latter is how a tab
appears if you were to, say, print out the representation of the line, which shows all
the invisibles.
Here is a slightly modified version of ReadStations.py, ParseStations.py, which
parses out the lines and puts numbers (floats or integers) in the right lists:
#!/usr/bin/env python
Lats,Lons,StaIDs,StaName=[],[],[],[] # creates lists to put things in
StationNFO=open('station.list').readlines() # combines the open and readlines methods!
As in Fortran, Python can also read from standard input. To do this, we need a
system-specific module called sys, which among other things provides the stdin
file object. So, instead of specifying a file name in the open command, we could
read from sys.stdin, as in the following script:
#!/usr/bin/env python
import sys
Lats,Lons,StaIDs,StaName=[],[],[],[] # creates lists to put things in
StationNFO=sys.stdin.readlines() # reads from standard input
for line in StationNFO:
    nfo=line.strip('\n').split() # strips off the line ending and splits on spaces
    Lats.append(float(nfo[0])) # puts float of 1st column into Lats
    Lons.append(float(nfo[1])) # puts float of 2nd column into Lons
    StaIDs.append(int(nfo[2])) # puts integer of 3rd column into StaIDs
    StaName.append(nfo[3]) # puts the ID string into StaName
    print Lats[-1],Lons[-1],StaIDs[-1] # prints out last thing appended
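The same parsing logic can be wrapped up so that it works on any list of lines, whether they came from a file, from standard input or, as in this sketch, from a couple of sample lines typed into the script. The function name parse_stations and the check that skips short lines are additions for this example and are not part of ParseStations.py.

```python
def parse_stations(lines):
    """
    splits station lines into latitude, longitude, integer and name lists
    """
    Lats, Lons, StaIDs, StaName = [], [], [], []
    for line in lines:
        nfo = line.strip('\n').split()   # strip the line ending, split on whitespace
        if len(nfo) < 4:
            continue                     # skip blank or incomplete lines
        Lats.append(float(nfo[0]))
        Lons.append(float(nfo[1]))
        StaIDs.append(int(nfo[2]))
        StaName.append(nfo[3])
    return Lats, Lons, StaIDs, StaName

sample = [' 9.02920 38.76560 2442 AAE \n',
          ' 42.63900 74.49400 1645 AAK \n',
          '\n']
Lats, Lons, StaIDs, StaName = parse_stations(sample)
print(StaName)
```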
We could also use command line switches by reading in arguments from the
command line. In the following example, we use the switch ’-f’ with the following
argument being the file name:
% ReadStations.py -f station.list
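One way to pick the file name out of the command line is sketched below. The helper get_filename is a made-up name for this illustration; in a real script you would hand it sys.argv, which is the list of words on the command line (the script name first).

```python
def get_filename(argv):
    """
    returns the argument following '-f', or None if the switch is absent
    """
    if '-f' in argv:
        ind = argv.index('-f')       # position of the switch
        if ind + 1 < len(argv):
            return argv[ind + 1]     # the word right after it
    return None

# In a script: filename = get_filename(sys.argv)   (after 'import sys')
filename = get_filename(['ReadStations.py', '-f', 'station.list'])
print(filename)
```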
In the special case where the data in a file are entirely numeric, you can read in the
file with a special numpy function loadtxt(). This reads the data into a
two-dimensional array in which each row holds the numbers from one line of the file.
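A minimal sketch of loadtxt() in action. To keep it self-contained, the "file" here is a string wrapped in io.StringIO (loadtxt accepts a file name or a file-like object); the two rows are the first three columns of the station list shown earlier.

```python
import io
import numpy as np

text = u"9.02920 38.76560 2442\n42.63900 74.49400 1645\n"
data = np.loadtxt(io.StringIO(text))   # one row per line, one column per number

lats = data[:, 0]                      # first column
lons = data[:, 1]                      # second column
print(data.shape)
```

Note that everything comes back as floats, including the integer third column.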
Let’s say I have a Python module that will convert latitudes and longitudes to UTM
coordinates. O.K. I really do have one that I downloaded from here:
http://code.google.com/p/pyproj/issues/attachmentText?id=27&aid=-80884174771817564&name=UTM.py&token=46ab62caa041c3f240ca0e55b7b25ad6
I wrote a script (ConvertStations.py) to convert each of the stations in my list
to their UTM equivalents (assuming these were in a WGS-84 ellipsoid). It would
be nice if after having done this to the data, I could then write it out somehow,
preferably to a file. Of course I could use the print command like this:
#!/usr/bin/env python
import UTM # imports the UTM module
Ellipsoid=23-1 # UTMs code for WGS-84
StationNFO=open('station.list').readlines()
for line in StationNFO:
    nfo=line.strip('\n').split()
    lat=float(nfo[0])
    lon=float(nfo[1])
    StaName= nfo[3]
    Zone,Easting, Northing=UTM.LLtoUTM(Ellipsoid,lon,lat)
    print StaName, ': ', Easting, Northing, Zone
% ConvertStations.py
AAE : 474238.170087 998088.469113 37P
But we yearn for more. So, more elegantly, I can open an output file [for
appending ’a’ or (over)writing ’w’] and write a formatted string using the write
method on the output file object with a format string:
#!/usr/bin/env python
import UTM # imports the UTM module
outfile=open('mynewfile','w') # creates outfile object
Ellipsoid=23-1 # UTMs code for WGS-84
StationNFO=open('station.list').readlines()
for line in StationNFO:
    nfo=line.strip('\n').split()
    lat=float(nfo[0])
    lon=float(nfo[1])
    StaName= nfo[3]
    Zone,Easting, Northing=UTM.LLtoUTM(Ellipsoid,lon,lat)
    outfile.write('%s %s %s %s\n'%(StaName, Easting, Northing, Zone))
outfile.close() # makes sure everything is written to disk
The only significant changes are 1) the object outfile is opened for writing (note
that this will clobber anything in a pre-existing file by that name) and 2) the output
file gets written to in the statement with a write method on the output file object:
outfile.write('%s %s %s %s\n'%(StaName, Easting, Northing, Zone))
The write statement uses the syntax: ’format string’%(tuple of variables).
Format strings have these rules:
• For each variable in the tuple you need a format: %s for string, %i for
integer, %f for float, %e for exponential notation
• you can also specify further, e.g.: %7.1f for 7 characters with 1 after the
decimal, %10.3e for 10 characters with 3 after the decimal
• where the number of characters includes the decimal point and padded spaces:
x,y=4.82,2.3e3
print '%7.1f,%s\t%10.3e'%(x,'hi there',y)
    4.8,hi there     2.300e+03
• In the ConvertStations2.py script, the ’\n’ string puts a UNIX line ending on
it. Without that, the whole file is but a single line (very annoying).
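Here is a self-contained sketch of writing formatted lines to a file and reading them back. It does not need the UTM module: the two records are simply copied from the ConvertStations output above, and the file is written to a temporary directory (an assumption made so that the example does not clobber anything of yours). The field widths are one possible choice, not the ones used in ConvertStations2.py.

```python
import os
import tempfile

records = [('AAE', 474238.170087, 998088.469113, '37P'),
           ('AAK', 458516.115522, 4720850.45385, '43T')]

path = os.path.join(tempfile.mkdtemp(), 'mynewfile')
outfile = open(path, 'w')                    # 'w' clobbers any existing file
for name, easting, northing, zone in records:
    # left-justified name, two fixed-width floats, then the zone and a line ending
    outfile.write('%-5s %12.2f %12.2f %s\n' % (name, easting, northing, zone))
outfile.close()

lines = open(path).readlines()
print(lines[0].strip('\n'))
```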
A session using the script (ConvertStations2.py) and a peek at the resulting file
could look like this:
% ConvertStations2.py
% head mynewfile
AAE 474238.170087 998088.469113 37P
AAK 458516.115522 4720850.45385 43T
ABKT 598330.712671 4198681.92944 40S
ADK 521722.179764 5748148.625 1U
AFI 416023.683618 8462168.07766 2L
ALE 509467.666259 9161062.29194 20X
ALQ 366981.843985 3868044.56906 13S
ANMO 366981.843985 3868044.56906 13S
ANTO 482347.254856 4413225.7807 36S
AQU 368638.770654 4690300.1797 33T
I’m assuming you know what the UNIX head command does or at least how to find
out!
6.9 Functions
So far you have learned how to use functions from program modules like NumPy.
You can imagine that there are many bits of code that you might write that you
will want to use again and again, say converting between degrees and radians and
back, or finding the great circle distance between two points on Earth, or converting
between UTM and latitude/longitude coordinates (as in UTM.py, my new favorite
package). In Fortran, you learned about subroutines and functions which do this
sort of work for you. In Python, we also have a way to do this of course. The basic
structure of a program with a Python function is:
#!/usr/bin/env python
def FUNCNAME(in_args):
    """
    DOC STRING
    """
    some code that does something
    return out_args
The first line of the function must start with ’def’, followed by a function name
with parentheses and a terminal colon. If you want to pass some variables to the
function, they go where in_args sits, separated by commas. Unlike in Fortran, there
are no output variables here.
There are four different ways to handle argument passing.
1) You could have a function that doesn’t need any arguments at all:
#!/usr/bin/env python
def gimmepi():
    """
    returns pi
    """
    return 3.141592653589793
print gimmepi()
2) You could use a Fortran like style, where there is a set list of what are called
‘formal’ variables that must be passed:
#!/usr/bin/env python
def deg2rad(degrees):
    """
    converts degrees to radians
    """
    return degrees*3.141592653589793/180.
print '42 degrees in radians is: ',deg2rad(42.)
3) You could have a more flexible need for variables. You signal this by putting
*args in the in_args list (along with any formal variables you want):
#!/usr/bin/env python
def print_args(*args):
    """
    prints argument list
    """
    print 'You sent me these arguments: '
    for arg in args:
        print arg
print_args(1,4,'hi there')
print_args(42)
4) You can use a keyworded, variable-length list by putting **kwargs in for in_args:
#!/usr/bin/env python
def print_kwargs(**kwargs):
    """
    prints keyworded argument list
    """
    for key in kwargs:
        print '%s %s' %(key, kwargs[key])
print_kwargs(arg1='one',arg2=42,arg3='ocelot')
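The different styles can be mixed in one function, as long as the formal variables come first, then *args, then **kwargs. A minimal sketch, in which the function name describe and the keyword values are made up for illustration:

```python
def describe(station, *args, **kwargs):
    """
    returns a one-line summary built from all three kinds of argument
    """
    parts = [station]                    # the formal variable
    for arg in args:                     # any extra positional arguments
        parts.append(str(arg))
    for key in sorted(kwargs):           # any keyworded arguments, in sorted order
        parts.append('%s=%s' % (key, kwargs[key]))
    return ' '.join(parts)

summary = describe('AAE', 9.0292, 38.7656, elev=2442, net='IU')
print(summary)
```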
Doc String
Although you can certainly write functional code without a document string, make
a habit of always including one. Trust me - you’ll be glad you did. It can be used
years later to remind you of what you thought you were doing. It can be
used to print out a help message by the calling program and it also lets others
know what you intended. Notice the use of the triple quotes before and after the
documentation string - that means that you can write as many lines as you want.
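The document string is stored with the function in its __doc__ attribute, which is what the built-in help() function displays. A minimal sketch using the deg2rad function from above:

```python
def deg2rad(degrees):
    """
    converts degrees to radians
    """
    return degrees * 3.141592653589793 / 180.

doc = deg2rad.__doc__        # the text between the triple quotes
print(doc.strip())
# help(deg2rad) shows the same text in an interactive session
```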
Function body
This part of the code must be indented, just like in a for loop, or other block of
code.
Return statement
You don’t need this unless you want to pass back information to the calling body
(see, for example, print_kwargs() above, which has none). But unlike in Fortran,
where variables are passed back the same way they get in, through the first line,
Python separates the entrance and the exit. See how it can be done in the gimmepi()
example above.
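A function can hand back more than one thing by separating the values with commas in the return statement; the caller unpacks them the same way. A minimal sketch (minmax is a made-up example function):

```python
def minmax(values):
    """
    returns the smallest and largest of a list of numbers
    """
    return min(values), max(values)   # two values go out through the return

lowest, highest = minmax([3.9, 2.1, 7.4])
print('%s %s' % (lowest, highest))
```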
It is considered good Python style to treat your main program block as a function
too. (This helps with using the document string as a help function and building
program documentation in general.) In any case, I recommend that you just start
doing it that way too. In this case, we have to call the main program with the final
(not indented) line main():
#!/usr/bin/env python
def print_kwargs(**kwargs):
    """
    prints keyworded argument list
    """
    for key in kwargs:
        print '%s %s' %(key, kwargs[key])
def main():
    """
    calls function print_kwargs
    """
    print_kwargs(arg1='one',arg2=42,arg3='ocelot')
main() # runs the main program
Notice how in the above examples, all the functions preceded the main function.
This is because Python is an interpreter and not compiled - so it won’t know about
anything declared below as it goes through the script line by line. On the other hand,
we’ve been running lots of functions and they were not in the program we used to
call them. The trick here is that you can put a bunch of functions in a separate file
(in your path) and import it, just like we did with NumPy. Your functions can then
be called from within your program in the same way as for NumPy.
So let’s say I put all the above functions in a file called myfuncs.py:
def gimmepi():
    """
    returns pi
    """
    return 3.141592653589793
def deg2rad(degrees):
    """
    converts degrees to radians
    """
    return degrees*3.141592653589793/180.
def print_args(*args):
    """
    prints argument list
    """
    print 'You sent me these arguments: '
    for arg in args:
        print arg
I could then just import the module myfuncs from within another program, or just
interactively. I can use the functions, or just call for help:
% python
>>>
As in Fortran, inside a function, variable names have their own meaning which in
many cases will be different from inside the calling function. So, variable names
declared inside a function stay in the function. This is true unless you declare them
to be “global”. Here is an example in which the main program “knows” about the
function’s variable V:
def myfunc():
    global V
    V=123
def main():
    myfunc()
    print V
main()
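To see the difference the global statement makes, here is a minimal sketch that sets V in two functions, one with and one without the declaration (the function names are made up for this example):

```python
V = 1

def local_change():
    V = 123          # a new local V; the outer V is untouched

def global_change():
    global V
    V = 123          # this really changes the outer V

local_change()
after_local = V
global_change()
after_global = V
print('%s %s' % (after_local, after_global))
```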
In addition to being able to write your own functions, of course Python has
LOTS of modules and a gazillion functions. The Enthought distribution that you
are using includes plotting, numerical recipes, trig functions, image manipulation,
animation, and many more. We will explore some of these in the rest of the class.
ASSIGNMENT P3:
Write a subroutine module that has these functions:
• a function that returns the bulk parameters relating to the Earth from this
website:
http://nssdc.gsfc.nasa.gov/planetary/factsheet/earthfact.html
Also include the average radius of: 6,371 km
• a function that converts degrees to radians
• one that converts radians to degrees
• one that converts longitude and latitude to cartesian coordinates, assuming a
radius of unity [hint: x = cos(az) cos(pl), y = sin(az) cos(pl), z = sin(pl)]
• one that converts cartesian coordinates back to longitude and latitude
• and one that calculates the great circle distance between two points using the
numpy.dot() function to get the angular separation and the radius of the Earth
to get the arc length. Assume the average radius of the Earth (gotten from
the first function).
Then write a program that takes keyboard entry for a longitude and latitude pair
and prints out the X,Y,Z, converts back and prints the new longitude and latitude
out (as a check) and gives you the great circle distance in km. [HINT: Take a look at
the function SPH_AZI in the Fortran chapter.] E-mail your code to [email protected]
6.10 Combining F90 code with Python
Now that you have the very basics of Python programming under your belt, we can
try to pull together the different threads you have been learning by investigating how
to combine F90 with Python. You might ask why not just use Python? or Fortran?
The answer is that there is a lot of Fortran code lying around that you might want
to use, but no way to visualize the data, so you want Python for that. Or you are
solving an extremely computationally intensive problem (global climate model, or
convection in the Earth’s core or mantle, or ....) and you need code that is as fast as
possible. Although NumPy is compiled and therefore very fast, Fortran is still the
fastest thing around. For whatever reason, you are taking a class in both Fortran
and Python so it makes pedagogical sense to try to tie the two halves together.
The next question is “How?”. There are several different approaches to this
ranging from the simple to the sophisticated. The simple approach would be to
create output files with Fortran and then read them in and plot them or whatever
with Python. A more sophisticated approach would be to call subroutines and functions
from a Fortran module which is imported like any other module into Python. This
is more tricky and involves a Python package called f2py which came with your
Python distribution.
I have tested f2py using the gfortran and python packages that were recom-
mended for this class and it worked fine. But I also had to get rid of my beloved
antique /usr/local/bin/g77 compiler, but you probably don’t have one. This is just
a trouble shooting tip. For reference, the URLs for these are:
http://gcc.gnu.org/wiki/GFortranBinaries#MacOS
and
http://www.enthought.com/products/getepd.php
Because these websites change quickly, I also put the .dmg files on the class website:
http://mahi.ucsd.edu/class233/gfortran-and-gcc-4-6-2-RC20111019-Sno-x86-64.dmb
and
http://mahi.ucsd.edu/class233/epd-7.1-2-macosx-i386.dmb
Assuming you have installed everything properly, we will cover three ways
to use f2py in the following. These are:
• Brute force: Try to get f2py to create a compiled module (ending in .so) from
a standard F90 program, which can then be imported like any other module.
Because Fortran subroutines have arguments that both go into the subroutine
and come out, and Python doesn’t, f2py has to make guesses as to the use of
variables and does its best. This works in simple cases, but can fail badly at
times.
• Signature File: Ask f2py to read through the code, picking out all the variables
and create what is called a “signature file” (ending in .pyf). This has f2py’s
guesses as to what the variables are supposed to do. The signature file can
be edited to supply the correct intent of variables. Then f2py can create the
compiled module based on what is in the signature file, which functions with
fewer errors (from bad guesses).
• F90 surgery: The Fortran code can be modified itself to help f2py in inter-
preting variables.
Let’s start with an example using the program gcf2.f90 from the chapter on Fortran.
This has a function to calculate greatest common factors. In fact we don’t need all
the fiddly bits at the top - just the function getgcf. Let’s save it in a file called
gcf.f90:
function getgcf(x, y)
   implicit none
   integer :: getgcf, x, y, i, z
   do i = 1, min(x,y)
      if (mod(x, i) == 0 .and. mod(y, i) == 0) getgcf = i
   end do
end function getgcf
To compile the function (or the whole program!), run f2py with the -c and -m
switches, using a command of the form f2py -c gcf.f90 -m gcf.
The -c switch specifies which file to compile and the -m switch specifies the stem of
the output file. f2py will create something called that stem with .so appended to
it, so if all is well you will get a file called gcf.so. The output file gcf.so is a module
that can be imported from within Python (import gcf) and its function called as
gcf.getgcf().
The brute force method is not without potential problems. For example, we
could use the function without checking for variable type and send the F90 func-
tion something with the wrong type, getting back the wrong answer with no error
message. Also, f2py assumes that all variables are going IN to the subroutine and
doesn’t know that some are intended to come out. So we need a way to “teach” f2py
which variables are intended to go in and which are intended to come out. This can
be done either with a “signature file” if you can’t modify the Fortran code itself, or
by inserting a few python hints into the Fortran code itself.
To illustrate the problem, let’s consider another F90 subroutine, sph_azi, from the
Fortran chapter:
cang=stheta1*stheta2*cos(phi2-phi1)+ctheta1*ctheta2
ang=acos(cang)
del=ang/raddeg
sang=sqrt(1.-cang*cang)
caz=(ctheta2-ctheta1*cang)/(sang*stheta1)
saz=-stheta2*sin(phi1-phi2)/sang
az=atan2(saz,caz)
azi=az/raddeg
if (azi.lt.0.) azi=azi+360.
end subroutine SPH_AZI
%python
>>> import sph_azi
>>> delta,azi=0.,0. # have to declare these..
>>> sph_azi.sph_azi(33,-117,41,-72,delta,azi) # note module/function name are same.
>>> print delta, azi
0.0 0.0
Well, that didn’t work. The problem is that Python can’t get the variables delta,
azi back out from the subroutine. The Fortran style of using the entrance as the exit
makes it tough to figure out which variables are supposed to go in and which are
supposed to come out, so f2py doesn’t even try. The solution is to use the “signature
file” method.
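To see why delta and azi came back unchanged, it can help to mimic the situation in pure Python. The sketch below is not f2py itself; fake_sph_azi and fake_sph_azi_out are made-up stand-ins, and the numbers inside them are placeholders rather than a real calculation. Because Python numbers cannot be changed in place, assignments made inside the function never reach the caller, so the usual Python idiom is to return the outputs instead.

```python
# Pure-Python stand-in (hypothetical; not generated by f2py) showing why
# "output" arguments do not come back to the caller.

def fake_sph_azi(flat1, flon1, flat2, flon2, delta, azi):
    """Assigns to delta and azi the way a Fortran routine would."""
    delta = 36.4   # placeholder numbers, not a real calculation
    azi = 64.1
    # the new values exist only inside this function


def fake_sph_azi_out(flat1, flon1, flat2, flon2):
    """The Python way: hand the results back as return values."""
    delta = 36.4   # same placeholder numbers
    azi = 64.1
    return delta, azi


delta, azi = 0., 0.
fake_sph_azi(33, -117, 41, -72, delta, azi)
unchanged = (delta, azi)             # the caller's values were never touched

delta, azi = fake_sph_azi_out(33, -117, 41, -72)
returned = (delta, azi)              # the values handed back by return
```

This is the behavior that intent(out) asks f2py to arrange: the output variables are dropped from the argument list and come back as return values.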
We can use f2py to create a signature file sph_azi.pyf with the command:
The syntax is a little different from the brute force method in that we are not
compiling sph_azi yet; we are just creating the .pyf file (specified by the -h switch)
for use in the eventual sph_azi module (specified by the -m switch).
If we look inside the sph_azi.pyf file, we find:
interface ! in :sph_azi
subroutine sph_azi(flat1,flon1,flat2,flon2,del,azi) ! in :sph_azi:sph_azi.f90
real :: flat1
real :: flon1
real :: flat2
real :: flon2
real :: del
real :: azi
end subroutine sph_azi
end interface
end python module sph_azi
At this point, all we have is a list of all the variables f2py found. Note that while
Fortran doesn’t care about case, Python does, so the variable names within the .f90
code are converted to lower case by default. (You can suppress this with the
--no-lower switch.) The default assumption is that all of the variables found are headed
IN to the subroutine and not intended to come out. Because in the case of sph_azi
this is not true (del and azi are coming out), we must edit the .pyf file to explicitly
say which variables are coming in and which are coming out. This is done by
inserting the attribute intent(in) or intent(out) after the variable type:
If we save the modified .pyf file as sph_azi1.pyf, we can compile sph_azi with the
command:
which creates a new file sph_azi.so. Let’s see if that one works better:
% python
>>> import sph_azi
178 CHAPTER 6. LISA’S PYTHON NOTES
>>> delta,azi=sph_azi.sph_azi(33,-117,41,-72)
>>> print delta, azi
36.4012794495 64.0623321533
And it does!
In the last section we saw how to use F90 code, without touching it, just by clarifying
a few things for poor old f2py. If you can modify the F90 source, you can skip the
signature file by inserting special comments, called ‘f2py directives’, that the Fortran
compiler will ignore (because they start with a ‘!’), but f2py will recognize (sneaky,
huh?). Here is an example of a duly modified code, saved as sph_azi2.f90:
subroutine SPH_AZI(flat1, flon1, flat2, flon2, del, azi)
implicit none
real :: flat1,flon1,flat2,flon2,del,azi,pi,raddeg,theta1,theta2, &
phi1,phi2,stheta1,stheta2,ctheta1,ctheta2, &
sang,cang,ang,caz,saz,az
!f2py intent(in) flat1
!f2py intent(in) flon1
!f2py intent(in) flat2
!f2py intent(in) flon2
!f2py intent(out) del
!f2py intent(out) azi
if ( (flat1 == flat2 .and. flon1 == flon2) .or. &
(flat1 == 90. .and. flat2 == 90.) .or. &
(flat1 == -90. .and. flat2 == -90.) ) then
del=0.
azi=0.
return
end if
pi=3.141592654
etc.
This works just like the signature file example but using the module name
sph_azi2 instead of sph_azi.
• NB: If you are modifying Fortran 77, then use ‘Cf2py’ instead of ‘!f2py’, but this
isn’t included in the Enthought Python you installed, and I have never used
it, so you are on your own here.
• Watch out with arrays – Fortran stores them column by column and NumPy
(by default) row by row, so each looks like the transpose of the other! There are
ways spelled out in the NumPy documentation for making your Python arrays
behave the same as the Fortran ones - so go read that before you get fancy ideas.
• f2py converts all the Fortran subroutine names to lower case. You can suppress
this with the --no-lower option.
• For more detailed information on what f2py is really doing, see the documen-
tation available here:
http://www.scipy.org/F2py
There is also a helpful, but somewhat dated reference manual available here:
http://cens.ioc.ee/projects/f2py2e/usersguide/
Note that in the latter, there are many references to a module named Numeric,
which is a predecessor of NumPy - so don’t try to call it because you don’t have it
installed.
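As a small illustration of the array warning above, here is a sketch using standard NumPy calls (numpy.asfortranarray and the flags attribute). The array a is just made-up data. The values seen from Python are the same either way; only the order in memory differs, which is what matters once the array is handed to Fortran.

```python
import numpy

a = numpy.arange(6.).reshape(2, 3)   # default NumPy layout: row by row ("C" order)
f = numpy.asfortranarray(a)          # same values, stored column by column

c_order = a.flags['C_CONTIGUOUS']    # True for the default layout
f_order = f.flags['F_CONTIGUOUS']    # True after the conversion
same_values = bool((a == f).all())   # indexing from Python is unaffected

# The transpose of a C-ordered array is Fortran-ordered without any copying.
t_order = a.T.flags['F_CONTIGUOUS']
```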
ASSIGNMENT P4
Modify your ASSIGNMENT F8 so that the main program is in Python and
calls your F90 subroutine. Use the F90 surgery method above to modify your
subroutine such that it compiles without the use of signature files. E-mail both files
plus the script you would use to compile them to [email protected]
6.11 Classes
Before we go any further, we need to learn some basic concepts about classes. These
are the basis of “object oriented programming” OOP (that again!). Class objects lie
behind plotting, for example and a rudimentary understanding of what they are and
how they work will come in handy when we start doing anything but the simplest
plotting exercises.
A class object is created by a call to a “class definition”, which can be
thought of as a blueprint for the class object. Here is a simple example of a class
definition:
class Circle:
    """
    This is a simple example of a class
    """
    pi=3.141592653589793
    def __init__(self,r):
        self.r=r
    def area(self):
        return self.pi*self.r**2 # area of a circle is pi*r**2
    def circumference(self):
        return 2.*self.pi*self.r # pi is a class attribute, so it needs self.
Saving this class in a file called Shapes.py we can use it in a Python session in a
manner similar to function modules:
In spite of superficial similarities, classes are not the same as functions. Although
the Shapes module is imported just the same as any other, to use it, we first have
to create a class “instance” (C=Shapes.Circle(r)). C is an object with “attributes”
(variables) and “methods”. All methods (parts that start with “def”) have an
argument list. The first argument has to be a reference to the class instance itself,
or “self”, followed by any variables you want to pass into the method. The __init__
method initializes the instance attributes of an object. In the above case, it defined
the attribute r, which gets passed in when the class is first called. Asking for any
attribute (note the lack of parentheses) retrieves the current value of that attribute.
Attributes can be changed (as in C.r=2.0).
The other methods (area and circumference) are defined like any function except
note the use of ’self’ as the first argument. This is required in all class method
definitions. In our case, no other parameters are passed in because the only one
used is r, so the argument list consists of only self. Calling these methods returns
values computed from the current value of r.
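Here is a sketch of what a session with the Circle class might look like. The class is repeated inside the block (rather than imported from Shapes.py) so that it runs on its own, with the area taken as pi*r**2; the radius values are arbitrary.

```python
class Circle:
    """A simple example of a class (repeated here so the block is self-contained)."""
    pi = 3.141592653589793

    def __init__(self, r):
        self.r = r                     # instance attribute set at creation

    def area(self):
        return self.pi * self.r**2

    def circumference(self):
        return 2. * self.pi * self.r


C = Circle(1.0)        # create an instance with radius 1
r1 = C.r               # attribute: no parentheses
a1 = C.area()          # method: parentheses, self is supplied automatically
C.r = 2.0              # attributes can be changed...
a2 = C.area()          # ...and the methods use the current value
circ2 = C.circumference()
```

Note that self never appears in the calls: Python fills it in with the instance (C) that the method was called on.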
You can make a subclass (child) of the parent class which has all the attributes
and methods of the parent, but may have a few attributes and methods of its own.
You do this by naming the parent class in parentheses in the new class definition.
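A minimal sketch of a subclass follows. Cylinder is a hypothetical child of Circle (it is not from these notes); the parent class is named in parentheses in the child's definition, and the child inherits area() while adding an attribute h and a method volume() of its own.

```python
class Circle:
    """Parent class, repeated so the block runs on its own."""
    pi = 3.141592653589793

    def __init__(self, r):
        self.r = r

    def area(self):
        return self.pi * self.r**2


class Cylinder(Circle):                # Circle is the parent
    """Hypothetical child class: a circle with a height."""

    def __init__(self, r, h):
        Circle.__init__(self, r)       # let the parent set up r
        self.h = h                     # attribute that only the child has

    def volume(self):
        return self.area() * self.h    # area() is inherited from Circle


cyl = Cylinder(1.0, 2.0)
base = cyl.area()                      # inherited method
vol = cyl.volume()                     # method defined in the child
```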
So, the bottom line about classes is that they are in the same category of things
as variables, lists, dictionaries, etc. That is, they are ‘data structures’ - they hold
data, and the methods to process that data. If you are curious, there’s
lots more to know about classes that we don’t have time to get into, but you can
find useful tutorials online
(e.g., http://www.sthurlow.com/python/lesson08/).
ASSIGNMENT P5
• Write a module called Shapes
• Shapes should have classes for: circle, sphere, cylinder, rectangle, and cube.
• these classes should have methods that return things like volume, circumference,
area, mass (pass the argument density), where appropriate.
• write a class called Earth that has attributes using the bulk parameters from
assignment P3
• write a program that uses the Earth density, radius from the Earth class, passes
these to the sphere class and calculates the mass. How does this compare with
the mass given by the Earth class?
6.12 Matplotlib
So far you have learned the basics of Python, NumPy and how to link F90 code with
Python. But Python was sold as a way of visualizing data and we haven’t yet seen
a single plot! There are many plotting options within the Python umbrella. The
most mature and the one I am most familiar with is matplotlib, a popular graphics
module of Python. Actually matplotlib is a collection of a bunch of other modules,
toolkits, methods and classes. For a fairly complete and readable tour of matplotlib,
check out these links:
http://matplotlib.sourceforge.net/Matplotlib.pdf
and here:
http://matplotlib.sourceforge.net/
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg") # my favorite backend
import pylab # module with matplotlib
pylab.plot([1,2,3]) # plot some numbers
pylab.ylabel('Y') # label the y-axis
pylab.show() # reveal the plot
The first step should be obvious by now: it imports matplotlib. Figures are
rendered on “backends” so they appear on screen. There are a lot of different
backends with slightly different looks. Some work better on different operating systems.
I use the very old school backend called “TkAgg” because it “works”. So
step 2 sets the backend: matplotlib.use("TkAgg"). The module matplotlib itself
contains a lot of other modules. One of these, pylab, is the “business end” that has
a lot of plotting methods and classes. It must be loaded alongside matplotlib, so
step 3 is: import pylab. After that the fun starts.
In the above example, we call the plot method with a list as an argument. As
I mentioned, matplotlib uses the concept of “classes” to make plots and this has
just happened behind the scenes. We could have named the figure instance with
the figure() method (e.g., fig=pylab.figure()), added axes to it (e.g.,
ax=fig.add_subplot(111)) and then plotted with the command ax.plot([1,2,3]),
but we don’t have to in this simple case - the class
instance is implied and is the “current plot”. You can tell this, if you do the above
example in interactive mode:
Once that happens, we won’t be able to change the plot any more and in fact, we
won’t get our terminal back until the little plot window is closed. You can save your
plot with the little disk icon in a variety of formats. Adobe Illustrator likes .svg, or
.eps while Microsoft products like .png file formats.
If you find it annoying to always have to close figures with the little red button,
or save them with the disk icon, you can tweak the program like this:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
pylab.ion() # turn on interactivity
pylab.plot([1,2,3])
pylab.ylabel('Y')
pylab.draw() # draw the current plot
ans=raw_input('press [s] to save figure, any other key to quit: ')
if ans=='s':
    pylab.savefig('myfig.eps')
r=x*numpy.pi/180.
c=numpy.cos(r)
s=numpy.sin(r)
s2=numpy.sin(r)**2
pylab.plot(x,c,'r--',x,s,'g^',x,s2,'k-')
pylab.title('Fun with trig')
pylab.text(250,-.5,'pithy note')
pylab.legend(['cos(x)',\
    'sin(x)',r'$\sin^2(x)$'],loc='lower left') # s2 is sin(r)**2
pylab.xlabel(r'$\theta$') # LaTeX needs the closing $
pylab.annotate('triangles!',\
    xy=(175,0),xytext=(110,-.25),\
    arrowprops=dict(facecolor='black',\
    shrink=0.05))
pylab.show()
[Figure: the three curves cos(x), sin(x) and the squared sine plotted for x from 0 to 350, with the “pithy note” text label, the “triangles!” annotation arrow and the legend at lower left.]
The title appears at the top of the plot. Text labels get placed at the x and
y coordinates on the plot and the legend will appear in the upper/lower right/left
corner as specified in the string. The pylab.text(x,y,string, kwargs) method also has
optional keyword arguments, specifying font, size, color and the like. The legend
takes a list of labels, one for each plot element, in the order in which the elements
were plotted. Also note that the legend and xlabel methods use a special format for
strings (r'$LaTeX string$') which allows embedded LaTeX equation syntax to make
scientific equations look right - so now you have to learn LaTeX! Finally, the arrow
gets drawn with the annotate method, which has a lot of other keyword arguments as well.
$\delta^{18}O$
6.12.4 Histograms
I downloaded a week’s worth of earthquake location, magnitude etc. from the web-
site:
http://earthquake.usgs.gov/earthquakes/catalogs/index.php
by clicking on the “XML merged catalog, past 7 days” link.
This is a compressed (gnu zip) XML file. After unzipping it (by clicking on it), the
file (called merged_catalog.xml) looked something like this:
Reading in all the data, I can plot them various ways. In this example, I plot a
histogram of the magnitudes:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
def readEQs(infile):
    input=open(infile,'rU').readlines()
    EQs=[] # list to put EQ dictionaries in
    linenum=0
    while linenum <len(input):
        if 'event id' in input[linenum]: # new event
            EQ={} # define a dictionary
            linenum+=2 # increment past time-stamp
            while 'param' in input[linenum]:
                record=input[linenum].split('=')
                datakey=record[1].split()[0].strip('"')
                EQ[datakey]=record[2].strip('\n').strip('/>').strip('"')
                linenum+=1 # keep going until </event>
        if '</event>' in input[linenum]: # done with event
            EQs.append(EQ)
        linenum+=1 # look for next event id
    return EQs
EQs=readEQs('merged_catalog.xml')
Magnitudes=[] # set up container
for eq in EQs: # step through earthquake dictionaries
    Magnitudes.append(float(eq['magnitude'])) # collect magnitudes
pylab.hist(Magnitudes,bins=50,normed=True) # plot 'em
pylab.xlabel('Richter Magnitude')
pylab.ylabel('Frequency')
pylab.show()
[Figure: normalized histogram of the earthquake magnitudes; x-axis “Richter Magnitude” (0 to 7), y-axis “Frequency” (0.0 to 0.8).]
In the example, please notice the way in which we parse the XML code.
First, we look for the new event marker, then split each ’param’ record on its ’=’.
This gives a three-element list with what we need. The second element contains the
key and the third element the value. To get the key name, we split the second element
on the space, which puts the key name (enclosed in quotes) in the first element of a list,
which we select and strip of its quotes. This we use as the key in the dictionary,
EQ. To get the value, we use the third element in the ’record’ list (record[2]), strip
off the end of line character, the closing ’/>’ and the quotes. We pair the value with
the key in the EQ dictionary and continue, incrementing linenum until we have all
the keys picked off and hit the ’</event>’ line. When we are done with a dictionary,
we append it to a list of all the dictionaries, increment linenum and press on. We
keep going until we have read all the data.
After we have parsed the data file into a list of dictionaries, we make a list for the
thing we want to plot (Magnitudes), hunt through the list of dictionaries, picking out
the magnitude data and, after turning it into a float, append it to the list. We plot the
histogram using the pylab.hist method. We can label the plot as per usual and display
it with show(). Because each earthquake is kept as a dictionary, any other key could
easily get fished out later for plotting.
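The parsing steps can be tried out on a single record. In the sketch below the sample line is an assumption: it is reconstructed from what the parsing code expects, not copied from the catalog file, so the real records may carry different names or extra fields.

```python
# One line of the kind readEQs expects. The exact text is an assumption,
# reconstructed from the parsing steps rather than copied from the catalog.
line = '<param name="magnitude" value="4.5"/>\n'

record = line.split('=')                 # three pieces, split on '='
datakey = record[1].split()[0].strip('"')                # the key name
value = record[2].strip('\n').strip('/>').strip('"')     # the value, still a string

EQ = {}                                  # one earthquake dictionary
EQ[datakey] = value
magnitude = float(EQ['magnitude'])       # convert only when a number is needed
```

Keeping the values as strings in the dictionary and converting at the point of use is the same choice made in the histogram script.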
I mostly think pie charts are silly, but some people love them. So if you want to see
data as a pie chart, say the fraction of earthquakes in each magnitude bin from the
last example, we can modify the script thusly:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
from readEQs import *
EQs=readEQs('merged_catalog.xml')
Fracs,Labels=[],[]
bin0=0
for m in range(1,8): # assume no magnitudes of 7 or bigger last week!
    num=0 # initialize count
    for eq in EQs:
        eqm=float(eq['magnitude'])
        if eqm>=bin0 and eqm<m:num+=1 # count all magnitudes in this bin (lower edge included)
    Fracs.append(float(num))
    Labels.append(str(bin0)+'-'+str(m))
    bin0=m # increment to next bin
pylab.pie(Fracs, labels=Labels)
pylab.axis('equal') # make the pie round!
pylab.title('Silly Pie Chart')
pylab.show()
[Figure: “Silly Pie Chart” with one wedge per magnitude bin (labels 1-2, 2-3, 3-4, 4-5, 5-6, 6-7).]
Notice how the function readEQs from the histogram example has been put into a
module by itself and then called within this program.
6.12.6 Basemap
Most of your maps can be made with GMT, but Python can make maps too. Once
you know Python, it can be much easier to use than GMT because the same principles
apply and all of the power of matplotlib is available for enhancing your plots. We
generate maps with a special “toolkit” of matplotlib called Basemap.
Here is a simple example, using our earthquake data from the histogram example
to make a Mollweide projection with the earthquake locations as red dots.
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,numpy
from readEQs import *
from mpl_toolkits.basemap import Basemap
EQs=readEQs('merged_catalog.xml') # read in data (see histogram example)
Lats,Lons=[],[] # set up lists for location data
for eq in EQs: # step through the list of earthquakes
    Lats.append(float(eq['latitude'])) # collect the latitudes (as floats)
    Lons.append(float(eq['longitude']))
map=Basemap(projection='moll',lon_0=0,resolution='c') # create a map instance
map.drawcoastlines()
map.drawmapboundary()
map.drawmeridians(numpy.arange(0,360,30)) # draws longitudes from list
map.drawparallels(numpy.arange(-60,90,30)) # draws latitudes from list
X,Y=map(Lons,Lats) # projects the longitudes, latitudes to map X,Y
pylab.plot(X,Y,'ro') # uses pylab's plot to plot these arrays
pylab.savefig('map.eps') # save the figure in EPS format
In this example, the list of earthquake dictionaries, EQs, is read in with the same
file reading function from the histogram example. Then we fish out the latitudes
and longitudes from each earthquake dictionary (eq) and append the floating point
equivalent (remember they are strings in the dictionary!) into the relevant lists
(Lats, Lons).
The basic concept of Basemap is that you create a map class instance with a call
to Basemap. The attributes of your map (e.g., resolution, projection, central
longitudes, map boundaries, map center, etc.), i.e., all the things you set with switches
in pscoast, are set with keyword arguments in the call to Basemap. The details of
these keyword arguments depend on the projection you choose.
After you create the map instance (called map in the above example), you can
modify it using the methods available to this particular class. To draw the
coastlines, use the method drawcoastlines(). To draw the lines of longitude we use
the method drawmeridians() with an array (or list) containing the longitudes you
want to plot. In this case we generated the array with numpy.arange, but we could
have used range(0,360,30) or any arbitrary integer or floating point list. Drawing
the lines of latitude works the same way (with the drawparallels() method).
Now we want to plot a bunch of points on the map. By sending in the longitudes
and latitudes of the earthquake locations as arguments to the map instance, the instance
chews on them and spits out x,y values based on the projection we specified when
creating the map instance with Basemap. Now we can plot the returned X, Y lists
using the ‘regular’ pylab plotting functions. We can go on and decorate the map
with anything available in pylab. The advantage of using Basemap over, say, GMT
is you can build on your knowledge of matplotlib, instead of struggling with two
completely different and incompatible plotting packages.
For more examples, check the documentation available here:
http://matplotlib.github.com/basemap/users/examples.html
ASSIGNMENT P7
Redo your GMT1 assignment using the Basemap toolkit! Add a point labelled
“Now I’m here!” and the location of SIO.
Of the many plotting methods available in matplotlib, one of the more useful for
Earth Scientists is the contour plot. Here is an example of the gravity anomaly
[Figure: filled contour plot of the gravity anomaly g, with axes running 0 to 120 and 0 to 100 and a color bar from 0.15 to 1.20 (scale factor printed as “1e 7”).]
The first new function for us in the above example is meshgrid. This makes 2D
arrays X and Y for 3D mesh/surface plots. Both have the dimensions len(y) by len(x).
Each row of X is the same as x and there is a row for every element in y. Similarly,
each column in Y is a copy of y and there is a column for every element in x.
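A tiny example makes the shapes easier to see. The values of x and y below are made up; only numpy.meshgrid itself comes from the text.

```python
import numpy

x = numpy.array([0., 10., 20.])     # three x positions (made-up values)
y = numpy.array([0., 5.])           # two y positions (made-up values)
X, Y = numpy.meshgrid(x, y)

shape = X.shape                     # one row per element of y, one column per element of x
first_row_X = X[0]                  # every row of X is a copy of x
first_col_Y = Y[:, 0]               # every column of Y is a copy of y
```

So X[i,j] and Y[i,j] together give the coordinates of grid point (i,j), which is what lets a single array expression be evaluated over the whole grid at once.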
The line h=numpy.sqrt(X**2+Y**2) calculates the horizontal distance h from
the origin for every point on the X,Y grid. Then we calculate the gravity anomaly
g for a spherical mass with radius (R) at depth of z with the command:
g=(G*4.*numpy.pi*R**3.*drho)/(3.*(h**2+z**2))
6.13. DEEPER INTO NUMPY AND SCIPY 195
You can find the formula for this in any geophysics text book. Anyway, now we
have g which if you print it out, looks like this:
It is a 2D array in which every element is the value of g at that grid point. There are
a bunch of ways to visualize this, but here we plot it as a contour plot. To do this we
interpolate between all the grid points and choose a color map translating the value
of g into a color from blue to orange. When choosing color maps, be aware that a
lot of people are red-green color blind and appreciate some other color contrast, like
the blue-orange one chosen here.
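The grid calculation can be sketched as follows. The expression for g is the one quoted in the text; everything else (the values of R, z and drho, and the extent and spacing of the grid) is assumed for illustration and is not taken from the original script.

```python
import numpy

G = 6.67e-11          # gravitational constant (SI units)
R = 20.               # sphere radius in m (assumed value)
z = 50.               # depth to the sphere's center in m (assumed value)
drho = 500.           # density contrast in kg/m^3 (assumed value)

x = numpy.arange(-100., 101., 10.)    # assumed survey grid, in m
y = numpy.arange(-100., 101., 10.)
X, Y = numpy.meshgrid(x, y)

h = numpy.sqrt(X**2 + Y**2)           # horizontal distance from the origin
g = (G * 4. * numpy.pi * R**3. * drho) / (3. * (h**2 + z**2))   # expression from the text

peak = g.max()                        # largest value, directly above the sphere
i_peak = numpy.unravel_index(g.argmax(), g.shape)   # grid indices of that point
```

The array g can then be handed to a contouring function such as pylab.contourf(X, Y, g) to get a filled contour plot.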
Now that we have some plotting skills under our belt, we can take advantage of the
computational tools available in NumPy and another scientific package called SciPy. We
have already met these: cos(), sin(), pi, arctan2(), arange(), among others. Now
meet polyfit(x,y,order) and polyval(coeffs,x). The former fits a polynomial of the given
order to input data and returns an array of the coefficients (highest power first) and the
latter evaluates the polynomial at x using coefficients returned from polyfit.
To find a best fit line (y = mx + b where m is the slope and b is the intercept) we
can use polyfit() by setting order to one. Order is two for a quadratic polynomial
(y = ax^2 + bx + c) and three for cubic (y = ax^3 + bx^2 + cx + d). See how these work
in the following example:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import numpy,pylab
from numpy import random
x,y=[1.,2.,3.,4.],[1.,1.8,3.4,4.2] # defines two lists
X=numpy.arange(x[0],x[-1]+.5,.1)
pylab.plot(x,y,'ro')
coeffs=numpy.polyfit(x,y,1)
Y=numpy.polyval(coeffs,X)
pylab.plot(X,Y,'r-')
y2=[3.42, 11.24, 19.86, 34.87]
pylab.plot(x,y2,'bs')
coeffs2=numpy.polyfit(x,y2,2)
Y2=numpy.polyval(coeffs2,X)
pylab.plot(X,Y2,'b-')
pylab.show()
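[Figure: the data (x,y) as red dots with the best-fit line in red, and (x,y2) as blue squares with the quadratic fit in blue; x runs from 1.0 to 4.5, y from 0 to 45.]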
In the first call to the function polyfit(), it returns an array coeffs=[ 1.12 -0.2 ] with
the coefficients m and b; it prints without commas because it is a NumPy array rather
than a list. This array can be
passed directly into polyval(), which returns the y values evaluated at the positions
in the X list (or array) that is passed in. We plot these values as the red line,
compared to the original data points (x, y) which are red dots.
In the second call to the function polyfit(), we sent it new values for y (y2) and
asked for a second order polynomial fit. coeffs2 is [ 1.7975 1.3095 0.5925] which are
a, b, c respectively. This in turn gets churned through polyval() and plotted as the
blue curve (as compared to y2 which are the blue squares).
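The same fits can be checked without any plotting. This sketch reuses the data from the example; the names m, b, a2, b2, c2, y_at_2p5 and resid2 are just labels chosen here.

```python
import numpy

x = [1., 2., 3., 4.]
y = [1., 1.8, 3.4, 4.2]
y2 = [3.42, 11.24, 19.86, 34.87]

m, b = numpy.polyfit(x, y, 1)          # slope and intercept of the best-fit line
a2, b2, c2 = numpy.polyfit(x, y2, 2)   # quadratic: highest power first

# polyval evaluates the polynomial; here at a point between the data
y_at_2p5 = numpy.polyval([m, b], 2.5)

# residuals show how far each data point sits from the fitted curve
resid2 = numpy.array(y2) - numpy.polyval([a2, b2, c2], x)
```

Unpacking the returned array into named coefficients, as done here, is one way to avoid mixing up their order.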
We zoomed by the finer points of list and array indexing earlier in the chapter, eager
to get to plotting. Now we delve deeper.
First a review of lists. You will recall that in python, indexing starts with 0,
so for the list L=[0,2,4,6,8], L[1] is 2. The index of the last item is -1, so L[-1]=8.
To find out what the index for the number 4 is, for example, we have the index()
method: L.index(4), which will return the number 2. We actually already used
this method when we implemented command line arguments, but it wasn’t really
explained. We know that to reassign a given index a new value we use the syntax
L[1]=2.5. And to use a part of a list (a slice) we use, e.g., B=L[2:4], which defines
B as a list with L’s elements 2 and 3 (4 and 6). And you also know that B=L[2:]
takes all the elements from 2 to the end. From these examples, you can infer that
the basic syntax for slicing is [start:stop:step]; if the step is omitted it is assumed
to be 1.
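The list operations just reviewed, gathered into one block that can be pasted into a session. The names on the left (second, last, and so on) are just labels chosen here.

```python
L = [0, 2, 4, 6, 8]

second = L[1]          # 2: indexing starts at 0
last = L[-1]           # 8: negative indices count from the end
where4 = L.index(4)    # 2: position of the value 4

B = L[2:4]             # [4, 6]: start at 2, stop before 4
tail = L[2:]           # [4, 6, 8]: from 2 to the end
every_other = L[::2]   # [0, 4, 8]: the step is the third number
backwards = L[::-1]    # [8, 6, 4, 2, 0]: a negative step runs in reverse

L[1] = 2.5             # reassign one element in place
```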
Arrays (and matrices) work in a similar fashion to lists, but these are
multidimensional objects, so things get hairy fast. The basic syntax is the same:
[start:stop:step], or i:j:k, but with NumPy arrays we give one such slice for each
dimension, separated by commas: rows first, then columns. This is best shown with examples:
Let’s pick apart the statement B[1:3,:-1:2] to see if we can understand what it
does. The first part alone returns rows 1 and 2 (the second and third rows):
>>> B[1:3]
array([[ 6., 7., 8., 9., 10., 11.],
[ 12., 13., 14., 15., 16., 17.]])
The second slice, [:-1], takes all but the last element of each row:
>>> B[1:3,:-1]
array([[ 6., 7., 8., 9., 10.],
[ 12., 13., 14., 15., 16.]])
And finally, we have the step of 2, which takes every other element:
>>> B[1:3,:-1:2]
array([[ 6., 8., 10.],
[ 12., 14., 16.]])
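To reproduce these slices yourself you need an array B. The definition below is an assumption: it builds the numbers 0 to 29 as floats in 5 rows of 6, which matches the rows printed in the text.

```python
import numpy

# B is assumed to be the numbers 0 to 29 as floats, arranged in 5 rows of 6,
# which matches the rows printed in the text.
B = numpy.arange(30.).reshape(5, 6)

rows = B[1:3]            # rows 1 and 2 (the second and third rows)
trimmed = B[1:3, :-1]    # the same rows, without the last column
stepped = B[1:3, :-1:2]  # the same again, keeping every other column

one_value = B[2, 3]      # a single element: row 2, column 3
column = B[:, 0]         # first column, all rows
```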
Earlier in the course, we learned that for loops with lists just step through item by
item. In n-dimensional arrays, they step through row by row (like in slicing). For
example,
>>> for r in B:
... print r
...
[ 0. 1. 2. 3. 4. 5.]
[ 6. 7. 8. 9. 10. 11.]
[ 12. 13. 14. 15. 16. 17.]
[ 18. 19. 20. 21. 22. 23.]
[ 24. 25. 26. 27. 28. 29.]
If you really want to step through element by element, you can use the ravel()
method which flattens an N-dimensional array to a single dimension:
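A minimal sketch of ravel() in action, using a small made-up array rather than the B from above:

```python
import numpy

B = numpy.arange(6.).reshape(2, 3)   # small made-up array: [[0,1,2],[3,4,5]]

flat = B.ravel()                     # one dimension, rows laid end to end

total = 0.
for value in flat:                   # now the loop visits single elements
    total += value
```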
of distributions. Python does this with the numpy.random module. To learn more,
read the documentation at:
http://docs.scipy.org/doc/numpy/reference/routines.random.html
Let’s look at a few of these:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
from numpy import random
N,min,max=500,10,20
bins=(max-min)*2
Nums=[]
for n in range(N):
    Nums.append(random.uniform(min,max))
pylab.hist(Nums,bins=bins,facecolor='orange')
pylab.title('Uniform distribution')
pylab.show()
[Figure: histogram titled “Uniform distribution”; x-axis from 10 to 20, counts up to about 35.]
The new twist here is the call to the uniform() function of the random module
of NumPy. This returns uniformly distributed numbers between the min and max
values specified. The other embellishments were to the hist() function in matplotlib:
increasing the number of bins (the default is too few in my opinion) and changing
the color of the columns to a pretty shade of orange.
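The loop is not strictly needed: uniform() also takes a size argument and returns all the numbers at once as an array. The sketch below uses the names lo and hi in place of min and max (so as not to hide Python's built-in functions), fixes the seed only to make the run repeatable, and uses numpy.histogram to get the counts without opening a plot window.

```python
import numpy
from numpy import random

random.seed(0)                       # fix the seed so the run is repeatable
N, lo, hi = 500, 10, 20              # lo and hi play the role of min and max

Nums = random.uniform(lo, hi, size=N)   # N draws at once, returned as an array

smallest, largest = Nums.min(), Nums.max()
counts, edges = numpy.histogram(Nums, bins=(hi - lo) * 2)   # the counts hist() would plot
```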
There are dozens of other distributions available in the random module, but the all
time favorite is the normal, or Gaussian distribution. Here we have an example of
how to use it to retrieve numbers from a normal distribution with a mean of 10 and
a standard deviation of 2:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
from numpy import random
N,mu,sigma=500,10,2
Nums=[]
for i in range(N):
    Nums.append(random.normal(mu,sigma))
pylab.hist(Nums,bins=20,facecolor='orange')
pylab.title('Normal distribution')
pylab.show()
[Figure: histogram titled “Normal distribution”; x-axis from 2 to 16, counts up to about 60.]
All this talk about normal distributions makes me hungry for some statistical anal-
ysis. There are a large number of statistical packages available in the Enthought
Python Distribution we are using for this class. Numpy has a number of useful
functions and there are many more hidden in the stats module of the Scipy package.
The documentation is still a bit thin, but to get you started, see:
http://docs.scipy.org/doc/numpy/reference/routines.statistics.html
and
http://docs.scipy.org/doc/scipy/reference/stats.html
Here we will consider a few useful functions to give you a feel for how they work.
Let’s start with 1D arrays and find the mean, standard deviation and sum. For
fun, we can use some of the fake data generated by the random.normal() function
and see how the actual means and standard deviations of a data set drawn from a
normal distribution compare with the true mean of µ and standard deviation σ.
As in the normal distribution example, we generate a list of numbers drawn
from a distribution with µ = 10 and σ = 2:
#!/usr/bin/env python
import numpy
from numpy import random
N,mu,sigma=500,10,2
Nums=[]
for i in range(N):
    Nums.append(random.normal(mu,sigma))
ANums=numpy.array(Nums)
print numpy.mean(ANums), numpy.std(ANums),numpy.sum(ANums)
We wanted to calculate the mean, standard deviation and sum of our sample (Nums)
using the functions numpy.mean(), numpy.std(), numpy.sum(). These are designed for
arrays (they will convert a list for you), so we first convert to an array:
ANums=numpy.array(Nums). When we run this code, we get:
Note that you will get a different answer every time you run this because the random
sample really is pretty random. The mean and standard deviation here are 10.042...
and 2.0027, which are close to the true values of 10 and 2 respectively.
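A variation on the same calculation, sketched with a fixed seed so that it gives the same numbers on every run (the seed value itself is arbitrary). It also shows the ddof keyword of numpy.std(), which switches between dividing by N and by N-1, and the standard error of the mean.

```python
import numpy
from numpy import random

random.seed(1)                        # any fixed seed; only there for repeatability
N, mu, sigma = 500, 10, 2
ANums = random.normal(mu, sigma, size=N)   # array of N draws, no loop needed

mean = numpy.mean(ANums)
std_pop = numpy.std(ANums)            # divides by N (the default)
std_samp = numpy.std(ANums, ddof=1)   # divides by N-1
stderr = std_samp / numpy.sqrt(N)     # standard error of the mean
```

For N=500 the two standard deviations differ only slightly; the difference matters more for small samples.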
ASSIGNMENT P8:
Modify the statistics example above to calculate 1000 versions of Nums, with an
N of 10. Calculate the mean and standard error (standard deviation/sqrt(N)) values
for each sample. Plot a histogram of the means. The standard error times 1.96
is the 95% confidence bound for the mean, i.e., the mean ± these bounds should
contain the true mean (10) 95% of the time. For what fraction of the 1000 samples
is this true?
We can also use the statistics methods on n-dimensional arrays. Here, the argu-
ment can specify the axis along which we want to do the calculation. Recalling the
arrays from before, we can illustrate the use of the numpy.sum() function as follows:
>>> A= numpy.array([[1,2,3],[4,2,0],[1,1,2]])
>>> A.sum(axis=0)
array([6, 5, 5])
>>> A.sum(axis=1)
array([6, 6, 4])
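The axis keyword works the same way for the other statistics. A short sketch using the same array A:

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 2, 0], [1, 1, 2]])

col_sums = A.sum(axis=0)      # down the columns
row_sums = A.sum(axis=1)      # across the rows
total = A.sum()               # no axis: every element

col_means = A.mean(axis=0)    # same axis keyword for the mean...
row_stds = A.std(axis=1)      # ...and the standard deviation
```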
Having introduced you to the joys of command lines, why now a lecture on GUIs?
GUIs work differently than all the other scripts we have been writing in that they
don’t just start at one end and proceed through to the end - the program flow can
be controlled by the user. GUIs let command line phobic people use your software.
They allow a greater degree of interactivity with your data visualization. They can
streamline data analysis in an intuitive way. And, they are fun.
Okay, how do I make a GUI? GUIs are composed of a number of things like
text boxes, radio buttons, sliders, etc. called widgets. When the program starts, it
first builds the GUI then waits for something to happen (a menu to be selected or a
mouse click or some data to be entered...) - a state called an event loop. There are
a number of ways of creating GUIs in Python. The oldest and most standard way
is called Tkinter, which is based on the even older Tk toolkit (from the Tcl language).
The newer wxPython has now reached some maturity and comes standard with the
Enthought Python Distribution. See:
http://wiki.wxpython.org/Getting%20Started
http://zetcode.com/wxpython/
and
http://wiki.wxpython.org/wxPython%20by%20Example
It seems to be the way things are going, so we’ll use wxPython in this class.
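Before meeting wxPython itself, the idea of an event loop can be sketched in a few lines of plain Python. Nothing below is wxPython; the names handlers, bind, on_open and on_quit are all made up for illustration. A real toolkit waits for the mouse and keyboard, whereas this sketch just works through a list of pretend events.

```python
# Toy event loop in plain Python (no GUI toolkit; all names are hypothetical).

handlers = {}                      # maps an event name to the function to call
log = []                           # record of what happened, in order


def bind(event_name, function):
    """Register a function to run when the named event arrives."""
    handlers[event_name] = function


def on_open(event_name):
    log.append('opening a file')


def on_quit(event_name):
    log.append('quitting')
    return 'stop'                  # tells the loop to finish


bind('menu_open', on_open)
bind('menu_quit', on_quit)

# Pretend the user did these things, in this order.
pending = ['menu_open', 'mouse_move', 'menu_quit', 'menu_open']

while pending:                     # the "event loop"
    event = pending.pop(0)
    handler = handlers.get(event)
    if handler is None:
        continue                   # nothing bound to this event: ignore it
    if handler(event) == 'stop':
        break
```

The point to notice is that the order in which on_open and on_quit run is decided by the events, not by the order in which the functions appear in the file.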
6.14. GRAPHICAL USER INTERFACES - GUIS 203
[Annotated listing; callouts: wxPython package; describes the application subclass; initializes the App subclass; makes an image object; resizes the window around the image; makes a status bar.]
Elements of a GUI:
Widgets: Things you put on your GUI like text boxes, buttons, radio buttons,
    check boxes, pull down menus, etc. (see figure below).
Events: Things that trigger actions like mouse movements and clicks, menu
    selections, button clicks, etc.
Dialogs and Messages: Pop up windows with information or action items.
Layout: Setting up your GUI with a nice layout.
Input and Output: Getting data in and out.
(Figure: examples of widgets: check boxes, buttons, text entry, and radio buttons.)
The example is for a simple editor illustrating widgets, events, dialogs, and layout.
206 CHAPTER 6. LISA’S PYTHON NOTES
(Figure: annotated listing of the editor program. The annotations read: define some
ID numbers for future use; make a Menu object and append two options with a
separator; make a menuBar object, append the Menu object under the label “File” and
stick it on the top of the frame; bind actions to each menu item (e.g., OnOpen to
ID_OPEN); get the file name, open it and read it into the text editor (named control).)
which creates a text box and a file menu. You can open a file for editing (but saving
it costs extra.)
1. read in the mean and standard deviation using raw_input() queries (annoying!)
2. read them in from standard input
3. read them in as command line switches
4. read them in from Entry boxes using a GUI, construct a command, and run
the program automatically through the command line as in the above ideas.
5. read them in from Entry boxes using a GUI, and call a method from within
the GUI
You should already know how to do the first three options from what you have
already learned. But here are a few examples for fun.
Option 1) Program queries with raw_input():
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,sys
import exceptions # module that defines the standard error types
from numpy import random
"""random_2_i"""
N,mu,sigma=500,10,2
pylab.ion()
fig=pylab.figure(1,figsize=(5,4))
while 1:
    Nums=[]
    try: # executes unless error!
        mu=float(raw_input('Enter mean: '))
        sigma=float(raw_input('Enter sigma: '))
        for i in range(N):
            Nums.append(random.normal(mu,sigma))
        pylab.hist(Nums,bins=20,facecolor='orange')
        pylab.title('Normal distribution')
        pylab.draw()
        ans=raw_input("Press return to continue, q to quit: ")
        if ans=='q': break
        pylab.clf() # clears figure so we can plot a new one
    except ValueError: # float() could not convert what was typed
        print 'Not a valid number, try again'
The program will keep updating the figure until you type ‘q’. There are two new
things in this code snippet, which could come in handy: 1) the pylab.clf() command,
which clears the figure and allows us to plot a new one and 2) error trapping with
try and except (the module exceptions defines the standard error types). In this
example, when you enter an invalid number, the error trap syntax:
try:
    execute some code if no error happens
except:
    do this if you get an error - note you can specify the
    error type.
will give you a warning and then go back up to the top of the while loop.
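To see the error trap on its own, without the plotting, here is a minimal sketch (written with print() as a function so that it also runs under Python 3). The helper name read_float is made up for this illustration; it traps only ValueError, which is the error float() raises when the text is not a number:

```python
def read_float(text, default):
    """Return text converted to a float, or default if text is not a number."""
    try:
        return float(text)       # runs normally when text is a valid number
    except ValueError:           # trap only the error float() raises for bad text
        print('Invalid number %r, using %s instead' % (text, default))
        return default

mu = read_float('10.5', 10.0)    # valid input
sigma = read_float('two', 2.0)   # invalid input triggers the except branch
print(mu, sigma)
```

Naming the error type keeps the trap from hiding unrelated problems, such as a typo in a variable name.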
Option 2) Read from standard input
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,sys
from numpy import random
"""random_2_file.py"""
fig=pylab.figure(1,figsize=(5,4))
pylab.title('Normal distribution')
line=sys.stdin.readline() # one line holding the mean and the standard deviation
N,Nums=500,[]
params=line.split()
mu,sigma=float(params[0]),float(params[1])
for i in range(N):
    Nums.append(random.normal(mu,sigma))
pylab.hist(Nums,bins=20,facecolor='orange')
pylab.show()
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,sys
from numpy import random
"""random_2_switch.py"""
N,mu,sigma=500,10,2 # defaults used when a switch is absent
pylab.ion()
if '-mu' in sys.argv: mu=float(sys.argv[sys.argv.index('-mu')+1])
if '-sig' in sys.argv: sigma=float(sys.argv[sys.argv.index('-sig')+1])
Nums=[]
for i in range(N):
    Nums.append(random.normal(mu,sigma))
pylab.hist(Nums,bins=20,facecolor='orange')
pylab.title('Normal distribution')
pylab.draw()
raw_input("Any key to quit")
This uses the already familiar command line switches: the program is run by typing
its name followed by -mu and the mean and by -sig and the standard deviation.
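The switch lookup can be tried without the plotting. The following is a minimal sketch (print() is used as a function so that it also runs under Python 3); the helper get_switch and the argument list argv are made up for this illustration, with argv standing in for sys.argv:

```python
def get_switch(argv, flag, default):
    """Return the number that follows flag in argv, or default if flag is absent."""
    if flag in argv:
        return float(argv[argv.index(flag) + 1])   # the value sits right after the flag
    return default

# Hypothetical argument list standing in for sys.argv
argv = ['random_2_switch.py', '-mu', '12', '-sig', '0.5']
mu = get_switch(argv, '-mu', 10.0)
sigma = get_switch(argv, '-sig', 2.0)
print(mu, sigma)
```

Wrapping the lookup in one function makes it hard to read a switch into one name and then use another.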
#!/usr/bin/env python
import os,wx
""" gauss_GUI_cmd.py"""
ID_MEAN,ID_STD=101,102
class App(wx.App):
    def OnInit(self):
        self.frame=MyFrame(None,-1,'Gauss command line')
        self.frame.Center()
        self.frame.Show()
        return True
class MyFrame(wx.Frame):
    def __init__(self, parent,id,title):
        wx.Frame.__init__(self,parent,id,title,(-1,-1),wx.Size(300,300))
        panel=wx.Panel(self,-1)
        self.mean_label=wx.StaticText(panel,-1," Mean",wx.DLG_PNT(panel,15,5))
        self.mean_entry=wx.TextCtrl(panel,ID_MEAN,"",wx.DLG_PNT(panel,40,5),\
            wx.DLG_SZE(panel,80,12))
        self.std_label=wx.StaticText(panel,-1," STD",wx.DLG_PNT(panel,15,20))
        self.std_entry=wx.TextCtrl(panel,ID_STD,"",wx.DLG_PNT(panel,40,20),\
            wx.DLG_SZE(panel,80,12))
        button=wx.Button(panel,-1,"Plot",wx.DLG_PNT(panel,40,45),wx.DLG_SZE(panel,25,12))
        button.Bind(wx.EVT_BUTTON,self.PlotGauss) # run PlotGauss on a button click
    def PlotGauss(self,event):
        # build the command line from the two text entry boxes
        commandstring='random_2_switch.py -mu '+self.mean_entry.GetValue()\
            +' -sig '+self.std_entry.GetValue()
        print 'running: ',commandstring
        os.system(commandstring)
app=App(False)
app.MainLoop()
And we get:
#!/usr/bin/env python
import matplotlib; matplotlib.use('TkAgg')
from matplotlib.backends.backend_wx import FigureCanvasWx,\
    FigureManager, NavigationToolbar2Wx # import some matplotlib tools for wxPython
import numpy,pylab,wx
ID_MEAN,ID_STD,ID_PLOT=wx.NewId(),wx.NewId(),wx.NewId() # makes wxPython assign ids
class PlotFigure(wx.Frame): # initializes plot window
    def __init__(self):
        wx.Frame.__init__(self, None, -1, "matplotlib in wxFrame")
        self.Destroy()
Let’s take a closer look at the PlotFigure class. This is where I design the basic
GUI. I want a plot window in which to stick my matplotlib figure. I want a toolbar
under my figure. I want some text entry boxes for putting in the mean and standard
deviation. I want a button to trigger a new plot, and I want a button to quit the
program nicely. These all need to be laid out in a reasonable fashion.
So, first, I need to make some boxes to put things in. Actually, I need two
boxes for this: one to put all the text entry and plot widgets in and one to assemble
everything nicely in.
To create the entry and plot widgets and put them in a box, I can use the
following code:
sizer = wx.BoxSizer(wx.VERTICAL) # make a box called sizer, things will add vertically
sizer.Add(self.canvas, 1, wx.LEFT|wx.TOP|wx.GROW) # put in the canvas
sizer.Add(box, 0, wx.GROW) # put in the widget box (with text entry boxes, etc.)
sizer.Add(self.quitbutton, 0, wx.GROW) # put in the quit button
sizer.Add(self.toolbar, 0, wx.GROW) # put in the toolbar
self.SetSizer(sizer) # resize things nicely
self.Fit() # make things fit in properly
        self.plotbutton=wx.Button(self,-1,"Plot")
        self.plotbutton.Bind(wx.EVT_BUTTON,self.PlotGauss)
        self.quitbutton=wx.Button(self,-1,"Quit")
        self.quitbutton.Bind(wx.EVT_BUTTON,self.OnQuit)
        box.Add(self.mean_label, 0, wx.GROW)
        box.Add(self.mean_entry, 0, wx.GROW)
        box.Add(self.std_label, 0, wx.GROW)
        box.Add(self.std_entry, 0, wx.GROW)
        box.Add(self.plotbutton, 0, wx.GROW)
        sizer.Add(box, 0, wx.GROW)
        sizer.Add(self.quitbutton, 0, wx.GROW)
        sizer.Add(self.toolbar, 0, wx.GROW)
        self.SetSizer(sizer)
        self.Fit()
        self.fig.add_subplot(111)
        self.PlotGauss(self)
    def PlotGauss(self, event):
        pylab.figure(num=1)
        pylab.clf()
        mu,sigma=float(self.mean_entry.GetValue()),float(self.std_entry.GetValue())
        N,Nums=500,[]
        for i in range(N):
            Nums.append(numpy.random.normal(mu,sigma)) # numpy is imported, random is not
        pylab.hist(Nums,bins=20,facecolor='orange')
        self.canvas.draw()
        self.canvas.gui_repaint()
    def OnQuit(self,event):
        self.Destroy()
app = wx.PySimpleApp()
frame = PlotFigure()
frame.Show()
app.MainLoop()
This gives a completed GUI that responds to the text entry and replots the
histogram when the plot button is clicked. It also quits nicely when you click on
‘quit’.
In the section on matplotlib, we learned about the pylab module. Behind the scenes,
pylab is an interface to three layers in matplotlib: the FigureCanvas (the area onto
which the figure gets drawn), the Renderer (the thing that does the drawing) and
the Artist (controls the Renderer to paint on the FigureCanvas). Up to now, we let
pylab handle the details for us. To take control of the plots (placement of figures,
fonts, tickmarks, axes....) you need to know more about Artist containers (Figure,
Axis, and Axes) and things that get drawn in them called “primitives” (lines,
rectangles, text, images). For a nice tutorial look in the matplotlib documentation:
http://matplotlib.sourceforge.net/users/artists.html#artist-tutorial
Artist
Artist objects (like lines, tick marks, axes, text) are all configurable. When you use
a command like add_axes or add_subplot you create an Axes instance. Remember
how we made an Axes instance and called it ax?
fig=pylab.figure()
ax=fig.add_subplot(111)
Every time you plot something on fig with, e.g., the plot command, you create an
object of the Line2D class (yes, even points). pylab keeps a record of each of these
plot instances by adding to a list associated with the Axes instance and retrieved by
the attribute ’lines’ (e.g., ax.lines is the list of Line2D objects on the Axes instance ax).
BTW: This is how the legend command works, if you remember. So by identifying
the line you want in the list, you can change its attributes (e.g., color, linewidth,
linestyle or marker). I realize that was way more than you wanted to know right
now, but we will need some idea of what FigureCanvas does in the following.
Line color
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab, numpy
""" Program linecolor.py"""
pylab.ion() # makes plot interactive
fig=pylab.figure() # makes a figure instance
ax=fig.add_subplot(111) # Axes instance
t=numpy.arange(0,1,.01)
s=numpy.sin(2*numpy.pi*t)
c=numpy.cos(2*numpy.pi*t)
ax.plot(t,s,color='blue',lw=2) #Line2D instance
ax.plot(t,c,color='magenta',lw=2)
pylab.draw()
print ax.lines # prints all your plot instances
print 'last line color: ',ax.lines[-1].get_color()
raw_input("Any key to change last line to red ")
ax.lines[-1].set_color('red') # sets last line to red
pylab.draw()
raw_input()
Mouse events
Earlier, we learned how to make GUIs using wxPython and even managed to embed
a matplotlib figure into a wxPython Frame. But we couldn’t interact with the
plot directly, say by clicking on an individual box in the tic-tac-toe example and
have the program place an ’X’ or an ’O’ there. To do this, we need the program to
recognize key or mouse events and return these to the program. In wxPython we had
events (EVT_BUTTON) which were connected to callback functions (like OnQuit)
using the method Bind. In matplotlib we have a similar event 'button_press_event'
which can be connected to a function (e.g., onclick) using the FigureCanvas method
mpl_connect, e.g.:
fig.canvas.mpl_connect('button_press_event', \
    onclick)
Mouse events are the most common type of interaction with plots. We can use
them to identify data points, say in a digitizer, or to flag them as bad, or pick
them as special (e.g., P wave arrival, stratigraphic tie point, start or end point for
a calculation). matplotlib supports several mouse events:
Event Name                 Description
'button_press_event'       mouse button is pressed
'button_release_event'     mouse button is released
'motion_notify_event'      mouse motion
'scroll_event'             mouse scroll wheel is rolled
'pick_event'               an Artist object is selected
Here is an example of a button press event:
#!/usr/bin/env python
""" Program onclick.py"""
import matplotlib, numpy
matplotlib.use("TkAgg")
import pylab
from matplotlib.backend_bases import FigureCanvasBase # imports canvas tools
pylab.ion()
fig=pylab.figure()
ax=fig.add_subplot(111)
data=numpy.random.rand(10) # get 10 random numbers
ax.plot(data)
ax.plot(data,'ro')
pylab.draw()
def onclick(event):
    # x, y are pixel coordinates; xdata, ydata are in the units of the plot axes
    print 'button=%d, x=%d, y=%d, xdata=%f, ydata=%f'%(event.button, event.x, event.y, \
        event.xdata, event.ydata)
cid=fig.canvas.mpl_connect('button_press_event',\
    onclick) # connect the button press to the function onclick
raw_input() #pauses the program
You could combine this with the editor we wrote before to make a digitizer!
In the last example, we just identified the location of the mouse click, but didn’t
identify any particular plot object. In principle, each object (line, text, rectangle,
axes) could be picked. There is a catch however. When you create the object, you
have to do these things too:
1. set the ‘picker’ to True (and usually some floating point tolerance).
#!/usr/bin/env python
import matplotlib; matplotlib.use("TkAgg")
import pylab,numpy
class LineColor:
    """connects the picker to an Artist Line2D object and changes line color"""
    def __init__(self,line):
        self.line=line
        self.connect=\
            self.line.figure.canvas.mpl_connect(\
            'pick_event',self.on_pick)
    def on_pick(self,event): # finds right line
        if event.artist!=self.line: return
        self.line.set_color('red') # makes red
        self.line.set_linewidth(2) # makes fatter
        self.line.figure.canvas.draw() # redraws line
def main():
    """
    Program clickme.py
    Plots some lines that are clickable, connects them to the LineColor action.
    """
    fig=pylab.figure()
    ax=fig.add_subplot(111)
    t=numpy.arange(0,1,.01)
    s=numpy.sin(2*numpy.pi*t)
    ax.plot(t,s,color='blue',picker=True)
    ax.plot(t+.25*numpy.pi,s,color='magenta',picker=True)
    ax.plot(t+.5*numpy.pi,s,color='cyan',picker=True)
    lines=[] # makes a list to store clickable line objects
The class LineColor is the action that happens when a plot object gets clicked on.
On initialization, it connects the click action to the function on_pick which turns
the object red and fattens it up a bit. The main program draws some lines. Each
plot instance gets stored in the list ax.lines. Then the objects in ax.lines get turned
into clickable LineColor objects. When you click on a line, it turns red and fattens
up a bit.
Your final is to do a tic-tac-toe program, and it would be fun to make it work
inside a GUI. To get you most of the way there, I wrote a really silly Tic-Tac-Toe
program. It is not a GUI, just a clickable matplotlib window:
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
import numpy,sys,exceptions
from numpy import random
def finish(who,xline,yline):
    """ Check who won"""
    if len(xline)>0:
        if who==0: # player wins
            pylab.plot(xline,yline,'r-')
            print 'you have won!'
        else: # computer wins
            pylab.plot(xline,yline,'g-')
            raw_input('you have lost to a stupid computer!')
        pylab.draw()
        cid = fig.canvas.mpl_connect('button_press_event', quit) # quit on mouse click
        print 'click anywhere on plot to quit'
def quit(event): # graceful exit
    sys.exit()
def winner(who,myboxes):
    """ checks to see if boxes selected have won """
    # check for diagonals first
    if '11' in myboxes and '22' in myboxes and '33' in myboxes: \
        finish(who, [.5,2.5],[.5,2.5])
    if '13' in myboxes and '22' in myboxes and '31' in myboxes: \
        finish(who, [.5,2.5],[2.5,.5])
    # check for rows
    if '11' in myboxes and '21' in myboxes and '31' in myboxes: \
        finish(who, [.5,2.5],[.5,.5])
    if '12' in myboxes and '22' in myboxes and '32' in myboxes: \
        finish(who, [.5,2.5],[1.5,1.5])
    if '13' in myboxes and '23' in myboxes and '33' in myboxes: \
        finish(who, [.5,2.5],[2.5,2.5])
    # check for columns
    if '11' in myboxes and '12' in myboxes and '13' in myboxes: \
        finish(who, [.5,.5],[.5,2.5])
    if '21' in myboxes and '22' in myboxes and '23' in myboxes: \
        finish(who, [1.5,1.5],[.5,2.5])
    if '31' in myboxes and '32' in myboxes and '33' in myboxes: \
        finish(who, [2.5,2.5],[.5,2.5])
Contour plots are really just a way to visualize something that is inherently 3D on
a 2D surface. Think about a topographic map - the contour intervals are elevations
and our brains can reconstruct the 3D world by looking at the contours on the map.
6.15. 3D PLOTTING WITH PYTHON 221
But with computers we can visualize the 3D world in a more realistic manner. There
are lots of 3D plotting packages, and even within Python there are several different
approaches, one using a 3D toolkit of matplotlib that uses the same logic as for
‘regular’ matplotlib. For more on this module, see:
http://matplotlib.sourceforge.net/mpl_toolkits/mplot3d/index.html
But for more 3D horsepower, there is a module called mlab, which is part of the
enthought.mayavi module. See:
http://github.enthought.com/mayavi/mayavi/mlab.html
And then there is Mayavi itself, which comes with the Enthought Python Distribution.
This is way beyond what I know, but if you are curious, check out this website:
http://github.enthought.com/mayavi/mayavi/examples.html
#!/usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab,numpy
from mpl_toolkits.mplot3d import axes3d
G=6.67e-11 # grav constant in Nm^2/kg^2 (SI)
R=2. # radius in meters
z=3. # depth of burial
drho=500 # density contrast in kg/m^3
x=numpy.arange(-2.*z,2.*z,0.1)
y=numpy.arange(-2.*z,2.*z,0.1)
X,Y=pylab.meshgrid(x,y)
h=numpy.sqrt(X**2+Y**2)
g= (G*4.*numpy.pi*R**3.*drho)/(3.*(h**2+z**2))
fig=pylab.figure()
ax=axes3d.Axes3D(fig)
ax.plot_surface(X,Y,g)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
pylab.show()
The call to mlab.figure creates a figure instance with the background
color (bgcolor) set to white. In mlab, color gets set with the familiar r,g,b but here
the colors run from 0 (black) to 1 (full strength), so a color of (1,1,1) is white,
(1,0,0) is red, and so on. The default is for a black background. Then the surface
gets drawn with a call to mlab.surf().
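If your colors come from a source that uses the 0 to 255 convention, they have to be rescaled before being handed to mlab. Here is a minimal sketch (print() is used as a function so that it also runs under Python 3); the helper name rgb255_to_unit is made up for this illustration and is not part of mlab:

```python
def rgb255_to_unit(r, g, b):
    """Rescale 0-255 color components to the 0-1 range described above."""
    return (r / 255.0, g / 255.0, b / 255.0)

white = rgb255_to_unit(255, 255, 255)   # full strength in all three channels
red = rgb255_to_unit(255, 0, 0)         # full strength red only
print(white, red)
```

The resulting tuples are in the form used for bgcolor in the text.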
There are lots more 3D plotting functions available in the two packages described
here. To whet your appetite, I’ve picked out a few:
Here are some considerations for you to help you decide which way you want to
go:
mplot3d versus mlab
        mplot3d (matplotlib)                       mlab (mayavi)
Pros:   mplot3d is a natural extension of pylab    Prettier
        it is easier to learn                      Interactivity
        pylab functions work for mplot3d too       More functions
                                                   Can be animated :)
Cons:   Limited plotting styles                    no svg output
                                                   ps and eps are buggy
                                                   slower
                                                   harder to learn for pylab masters
Lines of flux
Professor R.L. Parker (our hero and professor emeritus of the SIO department)
wrote a Fortran (f77) program called ‘force.f’. This was a slight modification of his
‘magmap’ program that calculates magnetic field vectors given a geomagnetic
reference model (see Bob’s software website http://igppweb.ucsd.edu/~parker/Software/).
The program force.f has disappeared from Professor Parker’s website in the
meantime but is available on the class website at:
http://mahi.ucsd.edu/class233/force.zip
It draws the magnetic lines of flux from the core outward and saves the data to a
file. You run it with a session like this:
% force
=================
igrf 2005
lines 200
radius 0.547
output lines.f05
exec
===================
igrf 2005
lines 200
radius 0.547
output lines.f05
===================
=================
quit
The red commands are typed by the user and the blue stuff is the program response.
This will create an output file lines.f05 which looks something like this:
The first three columns are x, y, z on a magnetic flux line and the fourth is R in
units of core mantle boundary radii. Each field line is separated from the rest by an
entry with the number of points in the previous line and 3 “50.000”s. Some of the
field lines go WAY out in space (100 CMB radii); we’ll come back to this later.
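The file layout just described can be split into individual field lines with a few lines of code. This is only a sketch under stated assumptions: the function name split_field_lines is made up, the numbers are invented for illustration, and the separator entry is assumed to hold the point count in the first column followed by the three 50.000s (the column order is a guess, not something taken from force.f). print() is used as a function so that the sketch also runs under Python 3.

```python
def split_field_lines(records, marker=50.0):
    """Group (x, y, z, r) records into separate field lines.

    Assumes each separator record is (npoints, 50., 50., 50.); the position
    of the point count is an assumption, not taken from force.f.
    """
    lines, current = [], []
    for x, y, z, r in records:
        if y == marker and z == marker and r == marker:   # separator record
            if current:
                lines.append(current)    # close the line that just ended
            current = []
        else:
            current.append((x, y, z, r))
    if current:                          # keep a trailing line with no separator
        lines.append(current)
    return lines

# Made-up numbers, for illustration only
records = [(0.1, 0.2, 0.5, 0.55), (0.2, 0.3, 0.7, 0.79), (2.0, 50.0, 50.0, 50.0),
           (0.0, 0.5, 0.3, 0.58), (1.0, 50.0, 50.0, 50.0)]
field_lines = split_field_lines(records)
print(len(field_lines), [len(fl) for fl in field_lines])
```

Each element of field_lines is then one flux line that can be truncated or plotted on its own.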
Parker also provided a script (look) which chops off the parts of the lines that are
more than 4 radii away, projects the 3D lines onto a plane specified by the user and
saves the data in a new file. It also invokes Parker’s most famous program plotxy
(which IS available on his website). Plotxy produces a postscript file mypost.
When run with the command look 45 75 < lines.f05, we get:
The thought occurs, wouldn’t this look cool in 3D? Here is a little script inspired
by the plot3d() example from the Mlab gallery.
#!/usr/bin/env python
import numpy as np
from enthought.mayavi import mlab
lines=np.fromstring(open('lines.f05').read(),dtype=float,sep=' ')
Xs,Ys,Zs,Rs=lines[0:-4:4],lines[1:-3:4],lines[2:-2:4],lines[3:-1:4]
line=0
lx,ly,lz,lr=[],[],[],[]
mlab.figure(bgcolor=(1,1,1)) # sets the background to white
while line<len(Xs):
    if Rs[line]<5: # truncates far away field lines
        lx.append(Xs[line])
        ly.append(Ys[line])
        lz.append(Zs[line])
        lr.append(Rs[line])
    elif Rs[line]>=5 and Ys[line]!=50.: # far away point that is not one of the 50's
        while line<len(Rs) and Rs[line]>5 and Rs[line]!=50: # skip the far away points
            line+=1
        x,y,z,r=np.array(lx),np.array(ly),np.array(lz),np.array(lr)
        mlab.plot3d(x,y,z,r,colormap='Spectral')
        lx,ly,lz,lr=[],[],[],[]
    if line<len(Rs) and Ys[line]==50.: # detects the 50's that end a field line
        x,y,z,r=np.array(lx),np.array(ly),np.array(lz),np.array(lr)
        mlab.plot3d(x,y,z,r,colormap='Spectral')
        lx,ly,lz,lr=[],[],[],[]
    line+=1
mlab.show()
which produces something like this (but in 3D you can wiggle it around - much
more fun!):
Eigenvectors
Linear algebra has a lot of applications in the geosciences. One of the most useful
tricks is to calculate what are called “eigenparameters”. Say you have a bunch of
points and want to calculate a best-fit line through them - but they are in three
dimensions. Or you want a best fit plane, say the fault surface through a bunch of
earthquakes. Or you want to know the principal axis of the moment of inertia tensor.
Or the orientations of stress or strain tensors. Or the preferred orientation of
mineral grains or clasts in a sedimentary deposit. Or the directions of the anisotropy
of just about anything. And the list goes on. Here is an example that finds the
eigenvectors and eigenvalues of what is called the covariance matrix of a bunch of 3D
points. These could be the end points of unit vectors (directions), or point masses
in space, for example.
#!/usr/bin/env python
import numpy
from numpy import linalg
from enthought.mayavi import mlab
dat=open('points.xyz','rU').readlines()
x,y,z=[],[],[]
for line in dat:
    rec=line.strip('\n').split()
    x.append(float(rec[0]))
    y.append(float(rec[1]))
    z.append(float(rec[2]))
X,Y,Z=numpy.array(x),numpy.array(y),numpy.array(z)
T=numpy.array([[numpy.sum(X*X),numpy.sum(X*Y),numpy.sum(X*Z)],\
    [numpy.sum(Y*X),numpy.sum(Y*Y),numpy.sum(Y*Z)],\
    [numpy.sum(Z*X),numpy.sum(Z*Y),numpy.sum(Z*Z)]])
evals,evects=linalg.eig(T)
print 'principal axis: ',evects.transpose()[0], ' with variance of ',evals[0]
print 'major axis: ',evects.transpose()[1], ' with variance of ',evals[1]
print 'minor axis: ',evects.transpose()[2], ' with variance of ',evals[2]
pv=evects.transpose()[0]*3.
mlab.figure(bgcolor=(1,1,1))
mlab.points3d(X,Y,Z,color=(0,0,0),scale_factor=0.25,opacity=.5)
mlab.outline(color=(.7,0,0))
mlab.plot3d([pv[0],-pv[0]],[pv[1],-pv[1]],\
    [pv[2],-pv[2]],tube_radius=0.1,color=(0,1,0))
mlab.show()
This script opens a file called points.xyz which looks like this:
It then parses the data into lists of floating point variables. These get turned into
arrays. Then the sums of the products and squares get put into a covariance matrix
of the form:
$$T = \begin{pmatrix} \sum x^2 & \sum xy & \sum xz \\ \sum xy & \sum y^2 & \sum yz \\ \sum xz & \sum yz & \sum z^2 \end{pmatrix}.$$
The function linalg.eig() returns the eigenvalues and eigenvectors of this “T” matrix.
The largest (principal) eigenvalue corresponds to the principal eigenvector, which
is the axis along which the variance (spread) is the greatest. The minor eigenvector
corresponds to the axis along which the variance is least. One “feature” of this
function is that each eigenvector is stored as a column of the evects array
(evects[:,i] goes with evals[i]), which is why the script transposes evects before
picking out an axis:
evects=
[[ 0.70464008 -0.70917141 0.02362748]
% points.py
principal axis: [ 0.70464008 -0.001072 0.70956409] with variance of 733.172618519
major axis: [-0.70917141 0.03223454 0.70429883] with variance of 30.501077006
minor axis: [ 0.02362748 0.99947976 -0.02195351] with variance of 8.55431560739
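One caution: numpy.linalg.eig does not promise to return the eigenvalues in any particular order, so the largest one need not come first as it happens to in the output above. A hedged sketch of one way to make the ordering explicit is given below; it swaps in numpy.linalg.eigh (meant for symmetric matrices like T) plus a sort, uses made-up points rather than the points.xyz file, and uses print() as a function so that it also runs under Python 3.

```python
import numpy

# Made-up points scattered mostly along the x axis (illustration only)
pts = numpy.array([[3., 0.1, 0.], [-3., -0.1, 0.], [2., 0., 0.2],
                   [-2., 0., -0.2], [1., 0.3, 0.], [-1., -0.3, 0.]])

T = numpy.dot(pts.T, pts)               # 3x3 matrix of sums of squares and products
evals, evects = numpy.linalg.eigh(T)    # eigh: for symmetric matrices
order = numpy.argsort(evals)[::-1]      # indices from largest to smallest eigenvalue
evals, evects = evals[order], evects[:, order]

principal = evects[:, 0]                # column 0 now goes with the largest eigenvalue
print(evals)
print(principal)
```

After the sort, columns 0, 1 and 2 of evects go with the largest, intermediate and smallest eigenvalue, whatever order the solver returned them in.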
These notes will reflect my attempt to learn Python from Lisa’s lecture notes. My
strategy will be to write a series of programs that mimic the programs in my For-
tran90 notes.
This assumes that Enthought Python is installed. To verify this, enter:
% which python
#! /usr/bin/env python
# simple Python test program (printmess.py)
print 'test message'
Provided one has execute permission on this file (chmod 755 printmess.py), this can
be run by entering:
% ./printmess.py
test message
%
234 CHAPTER 7. PETER’S PYTHON NOTES
Notice that we need to start with ./ because . is not in our path (for security
reasons).
The first line MUST be:
#! /usr/bin/env python
so that the file is interpreted as Python. Unlike Fortran or C, you CANNOT start
with a comment line (try switching lines 1 and 2 and see what happens).
The second line is a comment line. Anything to the right of # is assumed to be
a comment (in Fortran ! serves the same function).
Notice that print goes by default to your screen. You can use single or double
quotes for the test message. You can get an apostrophe in your output by using
double quotes and quote marks by using single quotes, i.e.,
#! /usr/bin/env python
# simple Python test program 2 (printmess2.py)
print "The pump don't work 'cuz the vandals took the handles"
print 'She said "I know what it\'s like to be dead"'
produces:
% ./printmess2.py
The pump don't work 'cuz the vandals took the handles
She said "I know what it's like to be dead"
%
In the second print statement, the backslash in front of the apostrophe is necessary
to prevent an error (try it).
ASSIGNMENT P1
Write a Python script to print your favorite pithy phrase.
#! /usr/bin/env python
a = 2
b = 3
c = a*b
print 'product = ', c
7.1. HOW TO MULTIPLY TWO INTEGERS 235
The program uses three variables, the letters a, b and c. Variable names should
always start with a letter. The remaining characters can be any combination of
letters, numbers, and underscores (_); dashes (-) are not allowed because Python
reads them as minus signs. Unlike Fortran, variables are
CASE SENSITIVE (x is different from X). Variables in Python can be of many
types, including real (floating point), integer, complex, string, and logical. However,
unlike C or Fortran their type is not explicitly declared. Instead it is determined by
the value assigned, i.e.,
x = 2 defines x as integer
x = 2. defines x as real
x = ’2’ defines x as a string
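You can check this for yourself with the built-in function type(). The following minimal sketch (print() is used as a function so that it also runs under Python 3) reassigns the same name three times and collects the type each time:

```python
names = []
x = 2                              # integer
names.append(type(x).__name__)
x = 2.                             # real (Python calls this float)
names.append(type(x).__name__)
x = '2'                            # string
names.append(type(x).__name__)
print(names)
```

Note that the same name x changes type each time it is assigned a new value.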
Basic Python has a very limited number of math operations, which include +, -,
* (multiply), / (divide), ** (to power of), and % (remainder). To get more advanced
math operations, use the numpy module (see below).
#! /usr/bin/env python
abeg = 2.1
aend = 3.9
adif = aend - abge
print 'adif = ', adif
You had intended to type ’abeg’ but typed ’abge’ instead. When you run the
program, you get an error message:
This is a desirable feature of Python. You don’t want the program to run
by assigning some arbitrary value to abge and giving you a wrong answer. Yet
many languages will do exactly that, including Fortran (we can avoid this potential
problem in Fortran by using the ’implicit none’ statement at the beginning of our
programs).
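The error that Python raises for an unassigned name is a NameError. As a minimal sketch (print() is used as a function so that it also runs under Python 3), the typo can be trapped and the message inspected rather than letting the program stop:

```python
abeg = 2.1
aend = 3.9
try:
    adif = aend - abge            # deliberate typo: abge was never assigned
except NameError as err:
    message = str(err)            # the text Python attaches to the error
    print('NameError:', message)
```

In a real program you would normally fix the typo rather than trap the error; the trap is here only to show what Python reports.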
There are always lots of ways to write the same program. Here is another way to
write the multint.py code (multint2.py):
#! /usr/bin/env python
a = 2; b = 3; print 'product = ', a*b
Notice that more than one command can be included on a line if the commands are
separated by a semicolon (also OK in F90). This makes the code more like C. This
option is rarely a good idea because it makes the code harder to read. Unless you
have a really good reason to put more than one command on a line (saving space is
NOT a good reason!), I suggest that you never use semicolons in this way.
7.2 Numpy
#! /usr/bin/env python
import numpy
print 'pi = ', numpy.pi
print 'sin(pi/6) = ', numpy.sin(numpy.pi/6.)
produces:
% ./testnumpy.py
pi = 3.14159265359
sin(pi/6) = 0.5
Notice that, like almost all languages, the trig arguments are in radians.
It can be annoying to have to type numpy. lots of times. We can save typing by
importing numpy as a name that we assign, i.e.,
#! /usr/bin/env python
import numpy as np #or anything else
print 'pi = ', np.pi
print 'sin(pi/6) = ', np.sin(np.pi/6.)
You can avoid having to type even the np. by importing everything:
from numpy import *
in which case one can use pi and sqrt with no prefixes. However, this is NOT
recommended because one can lose track of where things come from. Your code will
be clearer and less error prone if you include the prefix to show which functions and
variables are coming from numpy.
Any advanced programming language must provide a way to loop over a series
of values for a variable. In C, this is most naturally implemented with the “for”
statement. In FORTRAN this is done with the “do” loop. Python uses its own
version of the for loop.
Here is an example program that generates a table of trig functions:
#! /usr/bin/env python
import numpy as np
degrad = 180./3.1415927
for theta in range(0, 90, 1):
    ctheta = np.cos(theta/degrad)
    stheta = np.sin(theta/degrad)
    ttheta = np.tan(theta/degrad)
    print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)
Notice the use of the variable degrad to convert from degrees to radians. Next comes
for theta in range(0, 90, 1):
This begins the “for” loop and contains a lot of interesting syntax. Unlike most
languages (e.g., C or Fortran), the lines that follow that are inside of the for loop
MUST be indented. Statements that expect a subsequent indentation level end
in a colon (:). For clarity, most people indent loops in other languages, but the
indentation is optional. Here the indentation is how Python knows what is inside
the loop and where the loop ends, because Python has NO STATEMENT TO END
THE LOOP (e.g., enddo in Fortran). The key point is that the 4 lines following
the for statement must ALL BE INDENTED EXACTLY THE SAME. Tabs and
blanks are treated differently in Python, so two lines may appear to have the same
indentation but actually be different. Thus, it is safest to NEVER USE TABS so
that one can always see exactly what the true indentation is.
Within the for loop, theta will assume a series of specified values, in this case
given by range(0, 90, 1). Range is an integer function that in this case will assume
the values from 0 to 89 (!), step 1. Why 89? Because it is the last number less than
90. You’re right to think this makes no sense but it is typical of Python syntax; the
upper limit is always one more than the last value used. The 1 for step is optional
because 1 is the default step size, i.e., range(0, 90) would give the same numbers.
Note that because range is an integer function, you cannot enter real values as limits
to define theta as real in the for loop. This is consistent with F90, which does not
allow real do loops, although they were OK in older versions of Fortran. Loops
with real increments are not considered good programming practice because of the
precision problems that repeated sums can cause.
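To get real values out of an integer loop, compute each value from the loop counter instead of adding the increment over and over. The following minimal sketch (print() is used as a function so that it also runs under Python 3; the names n, step, x_sum and x_values are made up for the illustration) shows both approaches for steps of 0.1:

```python
n = 10
step = 0.1

# Risky: build x by repeated addition, so round-off accumulates
x_sum = 0.0
for i in range(n):
    x_sum += step

# Safer: loop over integers and compute each real value from the counter
x_values = [i * step for i in range(n + 1)]

print(x_sum == 1.0)     # the repeated sum does not land exactly on 1.0
print(x_values[-1])     # the last value is computed as 10 * 0.1
```

Note the n + 1 in the second loop: because range stops one short of its upper limit, it is needed to include the end point.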
Next comes
ctheta = np.cos(theta/degrad)
This uses the numpy cosine function. Ideally theta should be a real variable in this
expression, but fortunately Python figures out that int/real should be real (see more
discussion of this later). Better style is to write
ctheta = np.cos(float(theta)/degrad)
where ‘float’ converts from integer to real (‘int’ converts from real to integer), or for
maximum efficiency we could write
rtheta_deg = float(theta)/degrad
ctheta = np.cos(rtheta_deg)
stheta = np.sin(rtheta_deg)
etc.
which would space the numbers irregularly among the columns. Instead, we
explicitly specify the output format using a format specification:
print '%5.1f %8.4f %8.4f %8.4f' %(theta, ctheta, stheta, ttheta)
Here the output format is given in the single quotes. Each % begins the format for
one number: in %5.1f the number is written as a real (f) in a field 5 characters wide
with 1 digit to the right of the decimal point (in Fortran this is f5.1). The
single blank space between the formats is reproduced exactly in the output; thus to
put commas between the output numbers, write:
7.3. MAKING A TRIG TABLE USING A ‘FOR’ STATEMENT 239
Here we have used the continuation character to split the statement into two lines.
Notice that in this case the 2nd line does not have to be indented to match the
other lines, because this is all considered to be one line by the Python interpreter.
This example highlights a (minor) disadvantage of Python. Because it cares about
the spaces between the different formats, we cannot add spaces in the top line so
that the formats and their variables line up perfectly. Aligning things this neatly
is probably more trouble than it’s worth, but it certainly makes the code easier to
understand.
We used the numpy sine, cosine and tangent functions in the trigtable program. Here
are more numpy math functions:
Note that the arcsin, etc., functions have different names than Fortran and C (which
use asin, acos, etc.)
degrad = 180./3.1415927
degrad = 180/3.1415927
The reason is to make completely sure that the program will compute a real
quotient and not an integer quotient. In fact, this caution is not needed in this case,
as the following program demonstrates:
#! /usr/bin/env python
print '2/3 = ', 2/3
print '2./3 = ', 2./3
print '2/3. = ', 2/3.
print '2./3. = ', 2./3.
As long as one part of the fraction is real, the program will compute a real
quotient. It is only when both numbers are written as integers that the result is
truncated. However, I have gotten into the habit of always including the decimal
point in real expressions to avoid someday accidentally writing something like:
mimics the do loop syntax in Fortran. However, the Python for loop is a much more
versatile operator than the Fortran do loop. Here are some examples:
for x in [1, 10, 100, 1000]: # x will assume these 4 values in the loop
for name in ["Sue", "Dave", "Mary"]: # name will assume these 3 names
There are many different formats that can be specified. Here are some common
examples:
ASSIGNMENT P2
Write a Python program to print a table of x, sinh(x), and cosh(x) (the hyper-
bolic sine and cosine) for values of x ranging in radians from 0.0 to 6.0 at increments
of 0.5. Use a suitable format to make nicely aligned columns of numbers. NOTE:
sinh and cosh are available in numpy (np.sinh, np.cosh) and also in the ‘math’ module.
To use the latter, include the line ‘import math’ and then write math.sinh, etc.,
to get the functions.
So far all of our example programs have run without prompting the user for any in-
formation. To expand our abilities, let’s learn how to input data from the keyboard.
In most programs, we will want to first prompt the user to input the data, so here
is an example of how to input a number:
‘raw_input’ is preferred over ‘input’ because, in the Python 2 used here, ‘input’
evaluates whatever the user types as a Python expression, which is a security risk.
The prompt “Enter number here: ” will print on the
screen and the user types the number to the right of this and hits return. The
input is ALWAYS a string variable, so we convert it to a real number using the float
function.
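As a minimal sketch of the steps just described (prompt, read a string, convert with float), the lines below wrap them in two small functions. The function names are hypothetical, and the try/except at the top is only there so that the same lines also run under Python 3, where raw_input was renamed input.

```python
try:
    read_line = raw_input        # Python 2, as used in these notes
except NameError:
    read_line = input            # Python 3 renamed raw_input to input

def to_number(text):
    """Convert the string typed by the user into a real number."""
    return float(text)

def ask_number(prompt="Enter number here: "):
    """Print the prompt, read one line from the keyboard, return a float."""
    return to_number(read_line(prompt))

# the conversion step can be tried without any typing
print("%.1f" % (to_number("3.5") * 2.0))
```

In a script one would simply write a = ask_number(); the split into two functions is just a convenience so the conversion can be tested on its own.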
We can write a program to multiply two numbers as follows (usermult.py):
#! /usr/bin/env python
a = float(raw_input("Enter first number: "))
b = float(raw_input("Enter second number: "))
c = a*b
print 'Product = ', c
This is a little clunky because ideally we might want to input the numbers on
the same line. This is surprisingly complicated to do in Python (at least as far as
I could tell, more experienced users can correct me). Here is one way to do this
(usermult2.py):
#! /usr/bin/env python
a, b = [float(x) for x in raw_input("Enter two numbers: ").split()]
c = a*b
print 'Product = ', c
The logic is as follows: raw_input returns a string, such as "5 23", which we
then need to split into two parts. This is done using the split method by appending
.split() to the string. We now have the two strings "5" and "23" and we use a ‘for
loop’ to assign x to each in turn and convert to the real variables a and b using the
float function. Note that a and b can be assigned together, i.e., in Python you can
write
a, b = 1, 2
to assign two numbers in one line. This program will work correctly as long as
a blank is used to separate the numbers being input. If the user enters a comma
following the first number, it will crash because it will try to convert something like
'2,' to a number. That is, .split() with no argument splits on blanks, not commas.
As a long-time Fortran programmer, I will be hard to convince
that this syntax is easier to learn than:
print *, "Enter two numbers"
read *, a, b
which is how you input two numbers in Fortran. In addition, the Fortran version
will accept a comma between the numbers without giving an error.
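The Python version can be made to accept commas as well. Here is one possible sketch (the function name parse_two_numbers is made up for illustration): replace any commas with blanks before splitting, so that "2, 3", "2,3" and "2 3" are all treated the same way.

```python
def parse_two_numbers(line):
    """Return two floats from a string such as '5 23', '5, 23' or '5,23'."""
    parts = line.replace(",", " ").split()   # commas become blanks, then split
    if len(parts) != 2:
        raise ValueError("expected two numbers, got %d" % len(parts))
    a, b = [float(p) for p in parts]
    return a, b

a, b = parse_two_numbers("5, 23")
print("Product = %g" % (a * b))
```

In usermult2.py, the string passed to this function would be the one returned by raw_input.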
7.6 If statements
Next, let’s modify this program so that it will allow the user to continue entering
numbers until he/she wants to stop (usermult3.py):
#! /usr/bin/env python
while (1 < 2):
    a, b = [float(x) for x in raw_input \
            ("Enter two numbers: ").split()]
    if (a==0 and b==0): break
    c = a*b
    print 'Product = ', c
In this case we use a while loop, which repeats as long as the condition following
‘while’ is true. Since 1 is always less than 2, the loop will continue forever, unless we
explicitly break out of it. But of course we could include a variable in the expression
as in:
x = 1
while (x < 11):
    ...
    ...
    x = x + 1
in which case x will assume the values from 1 to 10 (easy to also do with a for loop).
The program will allow the user to continue entering numbers to be multiplied.
When the user wishes to stop the program (in a more elegant way than hitting
[CNTRL] C!), he/she enters zeros for both arguments. The "if" statement checks
for this and exits the loop in this case:
The break statement (exit in Fortran) breaks out of the while loop. In this case,
we just execute a single command, but we could also execute a block of code, e.g.
(usermult4.py):
#! /usr/bin/env python
while (1 < 2):
    a, b = [float(x) for x in raw_input \
            ("Enter two numbers (zeros to stop): ").split()]
    if (a==0 and b==0):
        print 'You entered two zeros'
        print 'so the program will now end'
        break
    c = a*b
    print 'Product = ', c
Notice that this block must be indented relative to the if statement line, and that
there is no ‘end if’ line (just like there is no ‘end do’ line for the while loop). The
end of the if block is shown simply by the end of the indentation.
The parentheses in the while and if statements are optional, i.e.,
while 1 < 2:
will work the same. But I think it looks nicer with them, and you will get less
confused when you switch between computer languages if you always leave them in.
Getting the while loop to go forever by using ‘1 < 2’ is a bit of a kludge. It’s
probably better to write
while (True):
which uses the boolean expression ‘True’ (similarly ‘False’ is also permitted) . Note
that we could use a variable for this purpose, i.e.,
b = True
while (b):
or
b = 1 < 2
while (b):
  FORTRAN
77     90     C     MATLAB  PYTHON  meaning
.eq.   ==     ==    ==      ==      equals
.ne.   /=     !=    ~=      !=      does not equal
.lt.   <      <     <       <       less than
.le.   <=     <=    <=      <=      less than or equal to
.gt.   >      >     >       >       greater than
.ge.   >=     >=    >=      >=      greater than or equal to
.and.  .and.  &&    &       and     and
.or.   .or.   ||    |       or      or
There is very likely a specific order of operations for these things which I can’t
remember very well. Look it up in a book if you are unsure or, better, just put in
enough parentheses to make it completely clear to anyone reading your code.
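For the record, in Python the comparison operators are evaluated first, then ‘not’, then ‘and’, and finally ‘or’. The short sketch below (the variable names are arbitrary) shows cases where the parentheses change the answer, which is a good argument for always putting them in.

```python
a, b, c = True, False, False

no_parens = a or b and c          # parsed as  a or (b and c)
with_parens = (a or b) and c      # a different expression, different result

x = 5
in_range = 0 < x and x < 10       # both comparisons are done before 'and'
negated = not x == 5              # parsed as  not (x == 5)

print("%s %s %s %s" % (no_parens, with_parens, in_range, negated))
```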
One nice aspect of Python compared to C is that if you make a mistake and
type, for example,
if (a = 0):
you will get a syntax error as soon as you try to run the script. In C this is a valid statement
with a completely different meaning than is intended!
if (logical expression):
    (block of code)
elif (logical expression):
    (block of code)
elif (logical expression):
    (block of code)
    .
    .
else:
    (block of code)
Note that ‘elif’ is the Python version of the F90 ‘else if’ and that the blocks of
code can contain many lines. As many elif statements as required can be used. At
most, one block of code will be executed (once one of the ‘if’ tests is satisfied, it
does not check the others). The final ‘else’ will be executed if none of the preceding
if statements is true. The final ‘else’ is optional.
Here is a demonstration program (usersqrt.py) that repeatedly prompts the user
for a positive real number. If it is negative, it asks the user to try again. If it is positive,
it computes and displays the square root using the numpy.sqrt() function. If the
user enters zero, the program stops.
#! /usr/bin/env python
import numpy as np
while (True):
    a = float(raw_input("Enter positive real number (0 to stop) "))
    if (a < 0):
        print 'This number is negative!'
        continue
    elif (a == 0):
        break
    else:
        b = np.sqrt(a)
        print 'sqrt = ', b
This program is very similar to the F90 version we saw earlier. The ‘continue’ line
(optional in this case) continues the ‘while loop’ and serves the same function as
the F90 ‘cycle’ command. The ‘break’ line leaves the while loop and thus ends the
program. Recall that ‘exit’ is the F90 equivalent.
ASSIGNMENT P3
Write a Python program to repeatedly ask the user for the constants a, b, and c
in the quadratic equation a*x**2+b*x+c=0. Using the quadratic formula, have the
program identify and compute any real roots. Output the number of real roots and
their values. Stop the program if the user enters zeros for all three values. HINTS:
If you have trouble getting Python to read in all three numbers on one line, feel free
to enter them on three separate lines. Test your program for some simple examples
to make sure it is working correctly (a=1, b=2, c=-3 should return -3 and 1).
The structure is similar to the corresponding F90 gcf program, but it has fewer
lines due to the lack of enddo and endif statements. Note that ‘a % i’ computes the
modulus (remainder) of a/i. Note that a and b must be integers, so we changed the
user input line to ‘int(x)’ from the ‘float(x)’ that we used for the earlier programs.
Finally, note that we add one to min(a,b) because the Python upper limit is one
less than the number in the range argument (this is hard to remember for F90
programmers!).
ASSIGNMENT P4
Modify gcf.py to compute the least common multiple of two integers.
(To motivate this, let’s repeat the F90 notes here:) As the length and complexity
of a computer program grows, it is a good strategy to break the problem down into
smaller pieces by defining functions or subroutines to perform smaller tasks. This
provides several advantages:
1. You can test these pieces individually to see if they work before trying to get
the complete program to work.
To illustrate how to define your own function in Python, here again is the greatest
common factor program:
#! /usr/bin/env python
# use a function to compute the greatest common factor of two integers
while (True):
    a, b = [int(x) for x in raw_input \
            ("Enter two numbers (zeros to stop): ").split()]
    if (a == 0 and b == 0):
        break
    else:
        imax = GETGCF(a, b)
        print 'Greatest common factor = ', imax
Because Python is not compiled (it just interprets your script line by line), you
define your functions BEFORE the main program. This is opposite to the convention
in Fortran, where functions and subroutines normally follow the main program. To
define a function, you must start with ‘def’ and then the name of the function and
any arguments:
Notice the syntax: a and b are passed into the function from the calling program.
imax is returned to the calling program with the ‘return’ statement. The contents of
GETGCF are indented; we know the function ends when the indentation ends. Of
course, we don’t really need the imax in the main program—we could have simply
written:
    else:
        print 'Greatest common factor = ', GETGCF(a, b)
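For reference, a function with this calling sequence might look like the sketch below. The body is a reconstruction from the description of gcf.py above (loop up to min(a,b) and test each candidate with %), so treat it as an assumption rather than the original listing. The point to notice is the layout: the def block comes first, and the code that calls it follows.

```python
def GETGCF(a, b):
    """Return the greatest common factor of two positive integers."""
    imax = 1
    # range stops one short of its upper limit, hence the + 1
    for i in range(1, min(a, b) + 1):
        if a % i == 0 and b % i == 0:    # i divides both a and b
            imax = i
    return imax

# "main program": it can only call GETGCF after the def above has been read
print("Greatest common factor = %d" % GETGCF(12, 18))
```

This brute-force loop is easy to read but slow for large numbers; Euclid's algorithm would be the usual faster alternative.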
In the F90 notes, we introduce subroutines at this point. I’m not sure if there
is an exact equivalent in Python, that is, something you call with variables in an
argument list that can be used for both input and output from the subroutine.
However, Python functions are more versatile than Fortran functions because they
can return more than one value. Thus, we will do the SPH_AZI example using a
function in userdist.py:
#! /usr/bin/env python
while (True):
    lat1, lon1 = [float(x) for x in raw_input \
                  ("Enter first point lat, lon: ").split()]
    lat2, lon2 = [float(x) for x in raw_input \
                  ("Enter second point lat, lon: ").split()]
    delta, azi = SPH_AZI(lat1, lon1, lat2, lon2)
    print 'delta, azi = ', delta, azi
Notice that delta and azi are not part of the argument list, but both are returned
from SPH_AZI when we write:
When I first tried translating this program from the F90 version, I had trouble
because it turns out that ‘del’ is a special word in Python and is used to delete
items from lists (we have not talked about this yet). So I had to change the variable
name ‘del’ to ‘delta’ to get it to work. If you use Xcode to do your editing, then the
key words show up in different colors, so you have a clue that you should not use
them for variable names. Here are the keywords in basic Python:
DO NOT USE THESE AS VARIABLE NAMES! Most of these words kind of sound
like things that make sense at some level, although we aren’t going to discuss them
until we need to. But the one that really bugs me is ‘lambda’ as one would like to
think that a Greek letter would always be OK to use in mathematical expressions1 .
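Python can also report its own reserved words, which gives a quick check before choosing a variable name. The sketch below uses the standard ‘keyword’ module; the helper name is_safe_name is made up for illustration.

```python
import keyword

# the full list of reserved words for the Python version being run
reserved = keyword.kwlist

def is_safe_name(name):
    """True if 'name' is not a reserved word (so it may be used as a variable)."""
    return not keyword.iskeyword(name)

for candidate in ["del", "delta", "lambda", "theta"]:
    print("%-8s %s" % (candidate, is_safe_name(candidate)))
```

Note that the list depends on the Python version, so it is worth running this with the interpreter you actually use.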
ANOTHER WARNING: Another potential pitfall is using certain ‘special’ names
for your Python script. I ran into this recently when I called one of my programs
‘string.py’. The program worked fine, but created a module that prevented numpy
from working in my other Python scripts! The problem is that the standard library
already has a module called ‘string’; a script named string.py in the working directory
is found first by ‘import string’ and hides the real one. Note that ‘string’ is not one of
the Python keywords. It would be good to find a complete list of program names
to avoid in Python, but I have not found this yet. Can any of you find such a list
or a way to generate it?
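One possible way to generate such a list is sketched below. It collects the names of the modules compiled into the interpreter (sys.builtin_module_names) plus every module that can currently be found on the import path (pkgutil.iter_modules). The function name is hypothetical, and the result depends on which packages are installed, so this is a rough guide rather than an official list.

```python
import pkgutil
import sys

def importable_module_names():
    """Collect the names of all top-level modules Python can currently import."""
    names = set(sys.builtin_module_names)     # modules built into the interpreter
    for entry in pkgutil.iter_modules():      # modules found on sys.path
        names.add(entry[1])                   # entry is (finder, name, ispkg)
    return names

taken = importable_module_names()
string_is_taken = "string" in taken           # string.py would be a bad script name
print("number of names to avoid: %d" % len(taken))
```

Before saving a new script as, say, foo.py, one can check whether "foo" is in this set.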
If we want to use SPH_AZI in multiple programs, it’s annoying and inefficient to
have to include it in each script. To avoid this, we can simply include the function
in a separate script, named, for example, sph_subs.py:
import sph_subs as ss
while (True):
    lat1, lon1 = [float(x) for x in raw_input \
                  ("Enter first point lat, lon: ").split()]
    lat2, lon2 = [float(x) for x in raw_input \
                  ("Enter second point lat, lon: ").split()]
    delta, azi = ss.SPH_AZI(lat1, lon1, lat2, lon2)
    print 'delta, azi = ', delta, azi
Note that we drop the .py suffix when we import sph_subs and that we choose ‘ss’
as the prefix to use before the function names (analogous to using ‘np’ as the prefix
for the numpy functions).
The sph_subs.py script must be in the same directory as userdist2.py. When you
run the userdist2.py script, a file called sph_subs.pyc is automatically created. This
is a binary file (don’t try to edit it), which helps future scripts to load faster.
7.10 Arrays
Here are some examples of how to define arrays in Python using numpy:
#! /usr/bin/env python
import numpy as np
a = np.ndarray(shape=(100), dtype=float)
ii = np.ndarray(shape=(50,2), dtype=int)
In this case, a is defined as a vector (1-D array) with 100 real elements and ii
is defined as a 50 x 2 matrix of integers. Annoyingly, like C, array indices start
with zero, not one. Thus, in the above example the ‘a’ array has values from a[0]
to a[99], and the ‘ii’ array includes ii[0,0] but not ii[50,2]. Notice that in Python,
unlike Fortran, we use brackets, not parentheses, to refer to specific array values.
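One caution: np.ndarray only reserves the memory, so the initial contents of the array are whatever happened to be there. The sketch below uses np.zeros instead (my substitution, not what the listing above uses), which gives the same shapes but starts every element at zero, and it also demonstrates the index limits just described.

```python
import numpy as np

a = np.zeros(100, dtype=float)        # 100 reals, all set to 0.0
ii = np.zeros((50, 2), dtype=int)     # 50 x 2 integers, all set to 0

a[0] = 1.5                            # first element
a[99] = 2.5                           # last element
ii[49, 1] = 7                         # last row, last column

try:
    a[100] = 0.0                      # one past the end
    out_of_range = False
except IndexError:
    out_of_range = True               # numpy refuses the out-of-range index

print("shapes: %s %s, out of range caught: %s" % (a.shape, ii.shape, out_of_range))
```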
Here is an example program that uses an array to compute prime numbers less
than 100:
#! /usr/bin/env python
import numpy as np
maxnum = 100
max_i = int(np.sqrt(maxnum))
nprime = 0
for i in range(2, maxnum+1):
    if (prod[i] == 0):
        nprime = nprime + 1
        print i
print 'Number of primes found = ', nprime
Note that we set the upper limit of the prod array to maxnum+1, so that the actual
array goes from prod[0] to prod[maxnum] (recall that array indices start with zero).
We don’t use prod[0] for anything. Actually, we also don’t use prod[1] for anything,
just like in the F90 version.
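To see the whole method in one piece, here is a self-contained sketch of the same sieve wrapped in a function. The function name and the use of np.zeros to create prod are assumptions made for this illustration; it runs under both Python 2 and 3.

```python
import numpy as np

def primes_up_to(maxnum):
    """Return a list of the primes <= maxnum using the prod-array sieve."""
    prod = np.zeros(maxnum + 1, dtype=int)   # prod[n] = 1 marks n as a product
    max_i = int(np.sqrt(maxnum))
    for i in range(2, max_i + 1):
        if prod[i] == 0:                     # i is prime: flag its multiples
            for j in range(2, maxnum // i + 1):
                prod[i * j] = 1
    # indices 0 and 1 are never used, just as in the text
    return [i for i in range(2, maxnum + 1) if prod[i] == 0]

primes = primes_up_to(100)
print("Number of primes found = %d" % len(primes))
```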
Now let’s change the code to print out many numbers per line by saving the
prime numbers in a separate array called pnum. Here is the code:
#! /usr/bin/env python
import numpy as np
maxnum = 1000
max_i = int(np.sqrt(maxnum))
for i in range(2, max_i+1):
    if (prod[i] == 0):
        max_j = maxnum//i      # integer division (same result in Python 2 and 3)
        for j in range(2, max_j+1):
            prod[i*j] = 1
nprime = 0
for i in range(2, maxnum+1):
    if (prod[i] == 0):
        nprime = nprime + 1
        pnum[nprime] = i
print pnum[1:nprime+1]
7.11. CHARACTER STRINGS 253
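Notice that we can define the second array, pnum, the same size as the first array
by simply saying
pnum = prod
Be aware that for numpy arrays this makes pnum a second name for the same array, not
a separate copy, so assigning to pnum[nprime] also overwrites prod[nprime]. That is
harmless here only because index nprime has already been tested by the time it is
overwritten; writing pnum = prod.copy() is safer.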
The output of this code is:
Number of primes found = 168
[ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61
67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151
157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251
257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359
367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463
467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593
599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701
709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827
829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953
967 971 977 983 991 997]
Notice that, like in F90, we can specify a range of array indices using pnum[1:nprime+1]
and how Python adds an opening and closing bracket around the array contents
upon output. Why is the upper limit nprime+1 and not nprime? Because of the
non-intuitive way Python sets the upper limits. Recall that we ran into this before
with ‘range’ and ‘shape’ in (np.ndarray). It’s confusing because of the difference
from the F90 convention.
The output looks pretty nice, but for completeness, at some point it would be
nice to know how to make an explicitly formatted output table of the primes like we
did in the F90 notes. I was not able to easily find an example of this by Googling
around.
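One way to do this, sketched below with a made-up function name, is to build each output line from a fixed-width integer format. In "%*d" the field width is taken from the argument list, so the same code works for any column width.

```python
def format_table(values, per_line=10, width=5):
    """Return a string with 'per_line' right-justified integers on each line."""
    rows = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]          # next group of numbers
        rows.append("".join("%*d" % (width, v) for v in chunk))
    return "\n".join(rows)

small_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
print(format_table(small_primes, per_line=5, width=4))
```

With the pnum array from the program above, the argument would be pnum[1:nprime+1].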
s1 = 'Peter'
s1 = 'Peter'
s2 = 'Shearer'
s3 = s1 + ' ' + s2
There are lots of built in functions to manipulate strings in Python. Here are some
examples:
#! /usr/bin/env python
s1 = 'Peter Shearer'
print s1[0:5]
print len(s1)
print s1.find('Sh')
print s1.split()
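To make clear what each of these calls returns, here is the same example with the results stored in variables (the variable names are arbitrary) and the values noted in the comments. Recall that string positions, like array indices, start at zero.

```python
s1 = 'Peter Shearer'

first = s1[0:5]           # 'Peter'  (characters 0 through 4)
n = len(s1)               # 13       (the blank counts as a character)
pos = s1.find('Sh')       # 6        (position where 'Sh' starts)
words = s1.split()        # ['Peter', 'Shearer']
missing = s1.find('xyz')  # -1       (find returns -1 when there is no match)

print("%s %d %d %s %d" % (first, n, pos, words, missing))
```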
Here is the Python version of fileinout.f90, a program that reads pairs of numbers
from an input file, computes their product, and outputs the original numbers and
their product to an output file:
#! /usr/bin/env python
while (True):
    line = filein.readline()
    if not line: break
    x, y = [float(a) for a in line.split()]
    z = x * y
    line2 = "%10.3f %10.3f %10.3f \n" % (x, y, z)
    fileout.write(line2)
filein.close()      # the parentheses are needed to actually call close
fileout.close()
Let’s examine how it works. First we read and open the input file:
The first line here is similar to what we used before to input numbers except this
time all we need is the string returned by raw_input. Next we open the file and assign
it to the object filein (this is like a unit number in F90). The optional 2nd argument
‘r’ opens the file for read only. This is good to specify because it will prevent our
accidentally writing on top of it. It also means the file must already exist. Similarly
we read in and open the output file:
This is almost identical to the input version, except we specify ‘w’ for write.
Next, we use a while loop to read the input file one line at a time:
while (True):
    line = filein.readline()
    if not line: break
We use the readline method to read line (a string) from the object filein. If there is
nothing more to read, then ‘not line’ will be true and we exit the while loop. Next
we split line into two parts (assuming the numbers are separated with blanks and no
commas), convert them to two real numbers (we saw this before when we input
two numbers on one line), and compute their product:
Next we perform a formatted write of the three numbers to line2 (another string
variable) and then write this to the object fileout:
When we are done (having left the while loop with the break), we close the two
files:
filein.close()
fileout.close()
There are typically many ways to write the same program. This is even more
true in Python than it was in Fortran. Here is a more compact version of fileinout.py:
#! /usr/bin/env python
which is a new syntax and shows how Python will automatically grab one line at a
time from the input file object.
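As an illustration of looping directly over a file object, here is a hedged sketch of what such a compact version might look like. The function name multiply_pairs is made up, the file names are passed in as arguments rather than read from the keyboard, and the ‘with’ statement (my addition) closes both files automatically, even if an error occurs.

```python
def multiply_pairs(in_name, out_name):
    """Read pairs of numbers from in_name; write x, y and x*y to out_name."""
    with open(in_name, 'r') as filein, open(out_name, 'w') as fileout:
        for line in filein:                  # grabs one line at a time
            if not line.strip():             # skip blank lines
                continue
            x, y = [float(a) for a in line.split()]
            fileout.write("%10.3f %10.3f %10.3f \n" % (x, y, x * y))
```

A call such as multiply_pairs('in.txt', 'out.txt') then does the whole job; the file names here are only examples.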
Alternatively, we could read the entire input file in at once:
#! /usr/bin/env python
lines = filein.readlines()
for line in lines:
    x, y = [float(a) for a in line.split()]
    z = x * y
    fileout.write("%10.3f %10.3f %10.3f \n" % (x, y, z))
filein.close()
fileout.close()
where we use ‘readlines’ rather than ‘readline’ in order to input the entire file. Then
notice how we can pull out one line at a time by writing:
There is probably even a way to do this where we don’t loop over the lines, but
somehow directly read x and y as vectors, compute z as a vector, and then output
the vectors directly to the output file. Any Python experts want to try to find this
approach?
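One candidate, sketched below, uses the numpy routines loadtxt and savetxt, which read and write whole tables of numbers. The function name is made up, and the sketch assumes the input file has exactly two columns separated by blanks.

```python
import numpy as np

def multiply_columns(in_name, out_name):
    """Vector version: read two columns, write x, y and x*y with no loop."""
    data = np.loadtxt(in_name, ndmin=2)     # 2-D array, one row per input line
    x = data[:, 0]                          # first column as a vector
    y = data[:, 1]                          # second column as a vector
    z = x * y                               # element-by-element product
    np.savetxt(out_name, np.column_stack((x, y, z)), fmt="%10.3f")
```

The ndmin=2 argument keeps data two-dimensional even when the file contains a single line, so the column slices always work.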
A big advantage of Python over Fortran is the ability to make plots directly from
Python scripts. There are a variety of plot packages available for Python. We are
going to use matplotlib because this is what Lisa uses. Please consult her notes for
more details.
Let’s start with an example program (xyplot.py):
#! /usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
pylab.plot(x, y)
pylab.show()
which produces Figure 7.1. The first three lines are necessary to import the plotting
packages and define some of their attributes. Next, we define two lists of 4 numbers
each:
These are not the same as the arrays that we used earlier. In fact they could be
a mixture of different types of variables (e.g., int, float, string), but in this example
they need to be of the same type and size. Next, we plot the points and display the
result:
pylab.plot(x, y)
pylab.show()
which brings up the pop-up window. From the window, you can save the plot by
clicking on the little disk icon. You can save it in different formats (e.g., .eps,
.png) by using the appropriate suffix on the file name. You must close the window
(click on the red button in the far upper left) to get back to your terminal window.
Now let’s improve the appearance of this plot and add some labels (xyplot2.py):
#! /usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
pylab.xlim(0.5, 4.5)
pylab.ylim(0.5, 4.5)
pylab.plot(x, y, 'bs')
pylab.xlabel('X axis')
pylab.ylabel('Y axis')
pylab.title('My x-y points')
pylab.show()
which produces Figure 7.2. In this case we define the plot limits so that the endpoints
do not sit on the plot axes:
pylab.xlim(0.5, 4.5)
pylab.ylim(0.5, 4.5)
pylab.plot(x, y, 'bs')
where the optional pylab.plot 3rd argument can indicate various colors, symbols,
and line types (r=red, b=blue, g=green, c=cyan, m=magenta, y=yellow, k=black,
w=white; +=plus, .=dot, o=circle, *=star, p=pentagon, s=square, x=x, D=diamond,
h=hexagon, ^=triangle; -=solid line, --=dashed line, :=dotted line, -.=dash-dotted
line, None=no connecting lines).
As an example to illustrate both Python matrix math and plotting capabilities, let’s
compute and plot the best-fitting line to the points from the last section, i.e., (1, 1),
(2, 1.9), (3, 3.2), and (4, 4). Let the equation of the line be y = a + bx, where a is
the y-intercept and b is the slope. Then we can express the desired fit to the data
as the matrix equation:
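\[
\left[ \begin{array}{c} 1 \\ 1.9 \\ 3.2 \\ 4 \end{array} \right]
=
\left[ \begin{array}{cc} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{array} \right]
\left[ \begin{array}{c} a \\ b \end{array} \right]
\qquad (7.1)
\]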
where the goal is to adjust a and b so that the r.h.s. best fits the l.h.s. Note that the
4x2 matrix in this case consists of a column vector with ones (that are multiplied
by a) and a column vector containing the x values of the data points (which are
multiplied by b, the slope).
This is a standard form for linear problems: d = Gm where d is a data vector
with the y-values of the data, G is the linear operator for predicting the data
from the model, and m is the desired model. In this case the model is simply the
coefficients a and b of the line.
Our program will need to import numpy to do basic matrix arithmetic. To do
more advanced matrix operations, such as computing inverses and eigenvalues, we
also import the linear algebra package numpy.linalg
(see, e.g., http://docs.scipy.org/doc/numpy/reference/routines.linalg.html):
import numpy as np
import numpy.linalg as lin
In the last section, we defined the x and y vectors as lists, but we now need to
make them matrices, using the numpy mat method:
x = np.mat([1, 2, 3, 4])
y = np.mat([1.0, 1.9, 3.2, 4.0])
d = y.T
where .T is the transpose operator. In our case, m will also be a 2-vector that
contains the y-intercept and slope of the best fitting line. To solve for m, we apply
the standard formula for the least-squares solution2 to d = Gm.
\[
m = \left( G^T G \right)^{-1} G^T d \qquad (7.2)
\]
m = lin.inv(G.T * G) * G.T * d
print 'm = \n', m
where lin.inv is the linalg matrix inverse function. The print statement is to show
the coefficients a and b in the best-fitting line, i.e.,
m =
[[-0.05]
[ 1.03]]
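As a cross-check on equation 7.2, the same coefficients can be computed with ordinary numpy arrays rather than the mat type. This is only a sketch under my own choices: the names m_normal and m_lstsq are made up, np.linalg.solve is used in place of an explicit inverse, and np.linalg.lstsq is numpy's built-in least-squares routine.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
d = np.array([1.0, 1.9, 3.2, 4.0])              # the y-values of the data

# G has a column of ones (multiplies a) and a column of x values (multiplies b)
G = np.column_stack((np.ones_like(x), x))

# equation 7.2: solve (G^T G) m = G^T d instead of forming the inverse
m_normal = np.linalg.solve(G.T.dot(G), G.T.dot(d))

# the same problem handed to numpy's least-squares solver
m_lstsq = np.linalg.lstsq(G, d, rcond=None)[0]

print("intercept = %.2f, slope = %.2f" % (m_normal[0], m_normal[1]))
```

Solving the linear system is generally preferred to computing the inverse explicitly, although for a 2x2 problem like this one the difference is negligible.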
The best-fitting line is of the form y = −0.05 + 1.03 ∗ x. To plot this line, we
compute the y values for x values of 0.8 and 4.2:
x2 = np.mat([0.8, 4.2])
d2 = np.mat([[1, 0.8], [1, 4.2]]) * m
#! /usr/bin/env python
import matplotlib
matplotlib.use("TkAgg")
import pylab
import numpy as np
import numpy.linalg as lin #matrix ops, such as inverse and determinant
x = np.mat([1, 2, 3, 4])
y = np.mat([1.0, 1.9, 3.2, 4.0])
G = np.mat([[1,1], [1,2], [1, 3], [1, 4]])
x2 = np.mat([0.8, 4.2])
d = y.T
m = lin.inv(G.T * G) * G.T * d
print 'm = \n', m
d2 = np.mat([[1, 0.8], [1, 4.2]]) * m
pylab.xlim(0.5, 4.5)
pylab.ylim(0.5, 4.5)
pylab.plot(x, y, 'sb')
pylab.plot(x2.T, d2, 'r-') #(x2, d2.T) fails for lines, but works for symbols (bug?)
pylab.xlabel('X axis')
pylab.ylabel('Y axis')
pylab.title('My best-fitting line')
pylab.show()
which produces the plot shown in Figure 7.3. I had trouble getting the red line to plot
using the matrix input to pylab.plot. For some weird reason (bug?), pylab.plot(x2,
d2, 'sr') works, but pylab.plot(x2, d2, 'r-') fails. In addition, pylab.plot(x2, d2.T,
'sr') works and pylab.plot(x2, d2.T, 'r-') fails. The only way I succeeded in plotting
the line is pylab.plot(x2.T, d2, 'r-'). This made no sense to me at first; a likely
explanation is that pylab.plot treats each column of a 2-D input as a separate curve,
so a 1x2 row matrix becomes two one-point curves (symbols appear, but there is no
segment to draw), whereas the 2x1 column x2.T gives a single two-point curve.
2. Note to Geophysics Graduate students: The formula for m given here is the sort of thing that
you are expected to know how to derive before you take your departmental exam. See a linear
algebra or statistics text for details.
Note that this problem can also be solved using the numpy.linalg least-squares
8.2. EXAMPLE WITH EQUATIONS 267
Notice that Latex automatically indented the first line of the paragraph. To
avoid this, use the \noindent command for the target paragraph:
\documentclass{article}
\begin{document}
\noindent
Today (\today) the rate of exchange between the British pound
and the American dollar is \pounds 1 = \$1.63, an increase of
1\% over yesterday.
\end{document}
To globally change the paragraph indenting, you can change the default for this
directly:
\documentclass{article}
\begin{document}
\setlength{\parindent}{0.5in}
Today (\today) the rate of exchange between the British pound
and the American dollar is \pounds 1 = \$1.63, an increase of
1\% over yesterday.
\end{document}
This will now indent all paragraphs by 0.5 inches. To remove paragraph indent-
ing, just set this parameter to zero. Latex ignores the carriage returns at the ends
of each line in the input file. It also ignores extra blanks; it only considers the first
blank. New paragraphs are defined by adding a blank line between blocks of text.
\documentclass{article}
\begin{document}
\setlength{\parindent}{0.0in}
The function $X(p)$ is more nicely behaved than $T(X)$ since it does not
cross itself (there is a single value of $X$ for each value of $p$), but
the inverse function $p(x)$ is multi valued. An even nicer function is
the combination
\begin{equation}
268 CHAPTER 8. LATEX
\end{document}
The function X(p) is more nicely behaved than T (X) since it does not cross itself
(there is a single value of X for each value of p), but the inverse function p(x) is
multi valued. An even nicer function is the combination
Note that equations within the text are enclosed with $ signs. \begin{equation}
and \end{equation} are used to put the equation on a separate line. Variables
within the equations are automatically put into italics. Greek letters are defined
as \tau, \eta, etc. Equations are automatically numbered (this can be changed if
desired). Subscripts are defined with the underscore (_), superscripts with the caret (^).
Fractions are written as, for example, {x \over y}. \int is for the integral symbol;
note how the limits are written. Curly brackets are used to separate things—they
do not appear in the typeset version.
8.3. CHANGING THE DEFAULT PARAMETERS 269
The \begin{eqnarray} section is used to align the = signs in the two lines of
equations. Note that &=& is used to define what it is that is being aligned. Every
line except the last line has a carriage return (\\). Although TeX generally does
an excellent job of spacing equations, sometimes some fine tuning will help. In this
case \, is used to place a tiny amount of space before the dz.
There is a standard font size declared for each document class. In most cases this
is Roman 10 pt. One easy way to change the size is with the following commands,
arranged in increasing order of size:
\tiny
\scriptsize
\footnotesize
\small
\normalsize
\large
\Large
\LARGE
\huge
\Huge
These commands can be invoked in several different ways to put “in situ” in italics
and then return to normal font:
OR
{\itshape in situ}
OR
\begin{itshape}
in situ
\end{itshape}
These different options also apply to the font sizes listed above. Here is an
example of how these options can be used:
\documentclass{article}
\begin{document}
\end{document}
which produces:
Let’s test different ways to say in situ in italics. There are several different ways
to say in situ in italics; indeed in situ can be written in italics in at least three
different ways.
Next let’s experiment with large, Large, and LARGE type, as well as small,
footnotesize, and scriptsize type.
There is a default line spacing created whenever a new font is selected. Here are
examples of two different ways that this can be changed:
Here is an example:
\documentclass{article}
\begin{document}
\setlength{\baselineskip}{20pt}
Today (\today) the rate of exchange between the British pound
and the American dollar is \pounds 1 = \$1.63, an increase of
1\% over yesterday. Let’s write one more line here so that we
get up to three lines and can see the spacing better.
\end{document}
which produces:
Today (November 30, 2012) the rate of exchange between the British pound and
the American dollar is £1 = $1.63, an increase of 1% over yesterday. Let’s write
one more line here so that we get up to three lines and can see the spacing better.
Postscript, EPS, and PDF files can easily be embedded as figures in a Latex docu-
ment by including Latex extension packages such as graphics or graphicx. Here is
an example that uses graphicx:
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\setlength{\baselineskip}{20pt}
We are now going to show how to embed a Postscript file into
a LaTex document using the includegraphics command.
\begin{figure}[h]
\begin{center}
\includegraphics[scale=0.7]{plot_1.pdf}
\end{center}
\caption{Here is the caption for this plot. This will be automatically
positioned below the plot.}
\end{figure}
And then here is some more text to show where the next block of text
will appear. Blah, blah, blah...
\end{document}
Figure 8.1: Here is the caption for this plot. This will be automatically positioned
below the plot.
And then here is some more text to show where the next block of text will
appear. Blah, blah, blah...
Note that the graphicx package is loaded with the \usepackage{graphicx} command
at the start of the file. The \begin{figure} macro has various options for
where the figure will be positioned, including at the present location in the text [h],
or at the top [t] or bottom [b] of the page. These options can be combined, e.g., [tb]
will position the figure at either the top or the bottom of the page. In this example,
the figure is scaled to 70% of its original size. Depending upon how the figure is
positioned in the PDF file, you also may need to apply a “bounding box” to remove
the surrounding white space using the bb= option, which specifies exactly what
part of the page will be windowed and displayed. This is necessary for PDF figures
that occupy only part of a page, and it can be tedious to find the right bounding
box. In my experience, an easier option is to use
the Mac Preview program to open Postscript or EPS files (from Adobe Illustrator
or other graphics programs) and then save them within Preview as PDF files. In
this case, they are tightly windowed and the bb option is not needed.
Figures are automatically numbered; users have control over the starting figure
number.
There is a huge amount of material about Latex on the web. Check out David
McMillan’s great Latex example file, ex.tex, which you can find in ~shearer/CLASS/COMP/LATEX.
The accompanying files psfig.tex, hobbes.ps, and gnufig.tex are also included.
A good Latex reference book is:
Kopka, H., and P.W. Daly, A Guide to Latex 2e, Addison-Wesley, New York,
1995.
Chapter 9
Postscript plotting
newpath
144 72 moveto
288 432 lineto
stroke
showpage
We start with the NEWPATH operator. This empties the current path and
declares we are starting a new path (because we are at the beginning of the file, this
command is not actually necessary here but it’s a good idea to start with this).
Next we move to the point (144, 72):
144 72 moveto
The default coordinate system for Postscript files is in units of 1/72 of an inch,
measured from the lower left corner of the page. The arguments for moveto are the
x and y coordinates. These numbers move onto a “stack” and are then read by the
moveto command. Thus, this command moves us to a point 2 inches to the right of
the left page edge and 1 inch above the page bottom. This is the “current point”
and we can think of this as a pen location.
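As a quick check of this arithmetic, here is a minimal Python sketch of the conversion from default Postscript units to inches. The name points_to_inches is made up for illustration and is not part of any library.

```python
POINTS_PER_INCH = 72  # default Postscript user space: 1 unit = 1/72 inch


def points_to_inches(x, y):
    """Convert default Postscript coordinates to inches from the lower left corner."""
    return x / POINTS_PER_INCH, y / POINTS_PER_INCH


print(points_to_inches(144, 72))   # the moveto point in the example file
print(points_to_inches(288, 432))  # the lineto point in the example file
```

The two points come out at (2.0, 1.0) and (4.0, 6.0) inches, which is a handy sanity check before sending anything to the printer.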
Next we add a line segment between the current point and a new point at (288,
432), i.e., (4 inches, 6 inches):
288 432 lineto
You can think of this as a “draw” command except the draw is not actually
executed yet—it is just added to the currently defined path. To draw the path on
the page, we use the “stroke” command:
stroke
Finally, the “showpage” command tells the printer to output the finished page:
showpage
We can preview this file with pageview or ghostview. We can also send it to
a Postscript printer. However, to be sure that the printer recognizes that it is a
Postscript file, we should always start the file with the line “%! PostScript file”,
as in this version (mypost2):
%! PostScript file
newpath
144 72 moveto
288 432 lineto
stroke
showpage
If you don’t start with this, then the printer will just print out the ASCII text.
This is no big deal with a small file like this but is a big problem for a large Postscript
file. You don’t want 200 pages of text to come out of the printer when you really
wanted a single page of graphics! If you ever do make this mistake, you will need to
cancel the print job on the printer and/or kill the job on your computer.
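If you write Postscript from a program, it is easy to build this check into your own code. Here is a minimal Python sketch that assembles the same file as a string and tests for the leading %! marker. The function names line_plot_ps and looks_like_postscript are hypothetical, chosen only for this illustration.

```python
def line_plot_ps(x1, y1, x2, y2):
    """Return the text of a Postscript file that draws one line segment."""
    lines = [
        "%! PostScript file",   # must be the very first line of the file
        "newpath",
        f"{x1} {y1} moveto",
        f"{x2} {y2} lineto",
        "stroke",
        "showpage",
    ]
    return "\n".join(lines) + "\n"


def looks_like_postscript(text):
    """True if the text begins with the %! marker that the printer looks for."""
    return text.startswith("%!")


ps_text = line_plot_ps(144, 72, 288, 432)
print(looks_like_postscript(ps_text))
print(ps_text, end="")
```

Redirecting the printed text to a file gives something you can preview before printing; a file that fails the check is one that would come out of the printer as plain text.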
If you are like me, you will find the 1/72 inch coordinate system an annoyance.
Both Bob Parker and I prefer to use 1/1000 inch coordinates. This can be done by
adding the following line at the beginning of your file:
0.072 0.072 scale % Coords are 1/1000 th inch
This means that objects will be drawn 0.072 times as large as they would have
been drawn with the old coordinate system. The coordinate 1000 thus now indicates
1000*(1/72)*0.072 = 1 inch.
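The same arithmetic can be checked in a few lines of Python. This is only a sketch; the function names are invented for the example, and the rounding step assumes we want whole-number coordinates.

```python
import math

POINTS_PER_INCH = 72
SCALE = 0.072   # factor given to the Postscript scale operator


def units_to_inches(u):
    """Length in inches of u scaled units (1 unit = SCALE default points)."""
    return u * SCALE / POINTS_PER_INCH


def points_to_units(p):
    """Convert a default (1/72 inch) coordinate to the nearest 1/1000 inch unit."""
    return round(p / SCALE)


print(math.isclose(units_to_inches(1000), 1.0))
print(points_to_units(144), points_to_units(72))
print(points_to_units(288), points_to_units(432))
```

With the new scale, the earlier line from (144, 72) to (288, 432) runs from (2000, 1000) to (4000, 6000), so the coordinates can be read directly as thousandths of an inch.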
Note that % is used to add comments. Anything to the right of the % is assumed
to be a comment (the first line is a special “comment” that the printer recognizes).
Why use 1/1000 inch as the coordinate instead of inches directly? The nice thing
about 1/1000 inch is that this is roughly the limit of what the human eye can resolve
on the page so it is not likely that one will need to divide things smaller than this.
Thus, one can avoid decimal places in the numbers and can save a little bit of space
in the file.
We can save more space if we shorten the “moveto” and “lineto” commands
since they will be used many times in a complicated file. We can shorten them by
defining our own shorter operators:
/m {moveto} def
/d {lineto} def
This tells the Postscript interpreter to replace “m” with “moveto” and “d” with
“lineto”. Our revised Postscript file thus becomes (mypost3):
%! PostScript file
/m {moveto} def
/d {lineto} def
0.072 0.072 scale % Coords are 1/1000 th inch
newpath
2000 1000 m
4000 6000 d
stroke
showpage
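To get a feel for how much space the short operator names save, here is a rough Python sketch that formats the same path both ways and compares the sizes. The 1000-point path is made-up data and path_commands is a hypothetical helper, not part of any plotting package.

```python
def path_commands(points, move="moveto", draw="lineto"):
    """Format a polyline as Postscript: one move, then one draw per extra point."""
    (x0, y0), rest = points[0], points[1:]
    out = [f"{x0} {y0} {move}"]
    out += [f"{x} {y} {draw}" for x, y in rest]
    return "\n".join(out) + "\n"


# A made-up 1000-point polyline in 1/1000 inch units
points = [(1000 + 5 * i, 2000 + 3 * i) for i in range(1000)]

long_form = path_commands(points)
short_form = path_commands(points, move="m", draw="d")
saved = len(long_form) - len(short_form)
print(len(long_form), len(short_form), saved)
```

Each command shrinks by five characters, so the saving grows with the number of points while the two definitions at the top of the file cost only a few dozen characters once.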
Now let’s draw a box with thicker lines than our starting examples (mypost4):
%! PostScript file
/m {moveto} def
/d {lineto} def
0.072 0.072 scale % Coords are 1/1000 th inch
newpath
4000 4000 m
4000 5000 d
5000 5000 d
5000 4000 d
4000 4000 d
30 setlinewidth
stroke
showpage
We draw the four sides with four “d” (lineto) commands. Before actually drawing
the box, we set the line width to 30/1000 inch. The resulting plot will have a slight
notch in the starting corner. To avoid this, we can use the “closepath” command
(mypost5):
%! PostScript file
/m {moveto} def
/d {lineto} def
0.072 0.072 scale % Coords are 1/1000 th inch
newpath
4000 4000 m
4000 5000 d
5000 5000 d
5000 4000 d
closepath
30 setlinewidth
stroke
showpage
CLOSEPATH adds a line segment from the current point to the initial point in
the path, so we don’t need the fourth LINETO command. It also closes the path
with a mitered join so we don’t have the notch as in the previous example.
Once a closed path is defined, we can fill in the box using the FILL command
(mypost6):
%! PostScript file
/m {moveto} def
/d {lineto} def
0.072 0.072 scale % Coords are 1/1000 th inch
newpath
4000 4000 m
4000 5000 d
5000 5000 d
5000 4000 d
closepath
0.5 setgray
fill
showpage
Before doing the fill, we set the gray level for printing at 0.5. The gray levels
range from zero (black) to one (white).
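The closed-path and fill steps are easy to generate from a program. Here is a minimal Python sketch that writes a filled polygon in the same style as the file above. The function name filled_polygon_ps is hypothetical, and the range check on the gray level is my own addition for safety.

```python
def filled_polygon_ps(corners, gray=0.5):
    """Return Postscript text that fills the polygon through the given corners."""
    if not 0.0 <= gray <= 1.0:
        raise ValueError("gray level must lie between 0 (black) and 1 (white)")
    (x0, y0), rest = corners[0], corners[1:]
    lines = [
        "%! PostScript file",
        "/m {moveto} def",
        "/d {lineto} def",
        "0.072 0.072 scale % Coords are 1/1000 th inch",
        "newpath",
        f"{x0} {y0} m",
    ]
    lines += [f"{x} {y} d" for x, y in rest]
    # closepath supplies the final side, so the first corner is not repeated
    lines += ["closepath", f"{gray} setgray", "fill", "showpage"]
    return "\n".join(lines) + "\n"


box = [(4000, 4000), (4000, 5000), (5000, 5000), (5000, 4000)]
print(filled_polygon_ps(box), end="")
```

Called with the four corners of the box, this reproduces the file above line for line; changing the gray argument gives lighter or darker fills.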
Of course we can also print text of various sizes and types. Here is an example
(mypost7):
%! PostScript file
/m {moveto} def
/d {lineto} def
0.072 0.072 scale % Coords are 1/1000 th inch
/Times-Roman findfont
300 scalefont
setfont
2000 3000 moveto
(Example text) show
showpage
Here we set the type of font with “/Times-Roman findfont” and then set the
font size with:
300 scalefont
This sets the font height to 0.3 inches. The scaled font must then be made
the current font with the SETFONT command. We then move to the point (2000,
3000). Finally, we print the string “Example text” at the current location using the
SHOW command:
(Example text) show
We must enclose the text in parentheses to denote it as a string. There are lots
of different choices for the fonts. The main choices for science graphics are Helvetica
and Times and their bold and italic variations.
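One detail to watch when generating text from a program is that the label itself may contain parentheses or backslashes, which are special inside a Postscript string and can be escaped with a backslash. Here is a minimal Python sketch of the font and text steps above; ps_string and text_commands are hypothetical helper names, not part of any package.

```python
def ps_string(text):
    """Wrap text in parentheses as a Postscript string, escaping special characters."""
    escaped = text.replace("\\", "\\\\")   # escape backslashes first
    escaped = escaped.replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def text_commands(x, y, text, font="Times-Roman", size=300):
    """Return the Postscript lines that select a font and print text at (x, y)."""
    return "\n".join([
        f"/{font} findfont",
        f"{size} scalefont",   # font height in current units (300 = 0.3 inch here)
        "setfont",
        f"{x} {y} moveto",
        f"{ps_string(text)} show",
    ]) + "\n"


print(text_commands(2000, 3000, "Example text"), end="")
print(ps_string("depth (km)"))
```

The first call reproduces the text lines of the example, and the second shows an axis label such as depth (km) coming out as (depth \(km\)).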
Well, I could go on with these examples but it would be better to just consult a
Postscript book to learn more about all the things one can do. Two good ones are
PostScript Language Tutorial and Cookbook and PostScript Language Reference
Manual, both by Adobe Systems and published by Addison-Wesley.
It is useful to be able to generate Postscript files directly from within a Fortran
program. Given knowledge of the Postscript language, one could write the appropriate
commands to a file. For convenience, I have written a set of Fortran subroutines for
making Postscript files that handle a lot of the bookkeeping details and allow the
user to concentrate on the graphics. These routines are contained in:
/users/shearer/PROG/PLOT/psplot.f
To call these routines from an F90 program, link your program with:
/users/shearer/PROG/PLOT/psplotlib.a
The routines are documented in:
/users/shearer/PROG/PLOT/psplot.man
program testpsplot
implicit none
call PSFILE('mypost')
call PSWIND(1.5, 7.5, 3.0, 7.0, 0., 10., 0., 20.)
call PSEND
end program testpsplot
call PSFILE('mypost')
This opens the Postscript file with name “mypost” (assigning it to Fortran unit
number 17; units 17 and 18 should not be used in any program that uses the psplot
routines). We could of course give the file any name we want at this stage. The
argument could also be a string variable, allowing the user to input the name.
call PSWIND(1.5, 7.5, 3.0, 7.0, 0., 10., 0., 20.)
This defines a user coordinate system that will be used for subsequent psplot
commands. The first four numbers (x1,x2,y1,y2) define the location of a coordinate
“box” on the page in inches. In this case, (1.5, 3.0) and (7.5, 7.0) define the (x,y)
coordinates of the lower left and upper right corners of this rectangle in inches.
The next four numbers (x3,x4,y3,y4) define the corresponding values at these points
for the user scale. Thus in this case, the lower left point in user coordinates is (0, 0)
while the upper right point is (10,20). All other points will be linearly interpolated
(or extrapolated) using these values. Often we will want to plot the frame defined
by this imaginary box (as in the PSAXES command below) but this is not required
to define the coordinate transformation.
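The linear mapping described here can be sketched in a few lines of Python. This is only an illustration of the transformation as described in the text, not the psplot source itself, and make_window is a made-up name.

```python
def make_window(x1, x2, y1, y2, x3, x4, y3, y4):
    """Return a function mapping user (x, y) to a page position in inches.

    (x1, y1) and (x2, y2) are the box corners on the page in inches;
    (x3, y3) and (x4, y4) are the user-coordinate values at those corners.
    """
    def to_page(x, y):
        px = x1 + (x - x3) * (x2 - x1) / (x4 - x3)
        py = y1 + (y - y3) * (y2 - y1) / (y4 - y3)
        return px, py
    return to_page


to_page = make_window(1.5, 7.5, 3.0, 7.0, 0.0, 10.0, 0.0, 20.0)
print(to_page(0.0, 0.0))     # lower left corner of the box
print(to_page(10.0, 20.0))   # upper right corner of the box
print(to_page(5.0, 10.0))    # center of the box
```

The corners map to (1.5, 3.0) and (7.5, 7.0) inches and the center to (4.5, 5.0). Points outside the user range are extrapolated with the same formula, so they land outside the box rather than being clipped.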
This draws a frame with tics and numbered tic labels as specified. The position
of the frame is assumed to be defined by the corners set in PSWIND. Here is the
documentation for the PSAXES command:
The labels are offset by 0.5 inches from the frame. Next we draw a line between
user-coordinate points at (2,2) and (5,8):
call PSEND
If you leave out this step, the program will not put a showpage command at the
end of the file and your Postscript plot will not work!
WARNING: Most of the numerical inputs to the PSPLOT routines must be
real numbers! Do not use 2 instead of 2. because the integer and real binary
representations of 2 are different (the subroutines have no way of knowing that you
used 2 rather than 2.) Note, however, that you can use ‘i3’ for the format for the
axes numbers if you know that there won’t be decimal places in the numbers.
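To see how different the two representations are, here is a small Python sketch using the standard struct module. It assumes 4-byte integers and 4-byte IEEE reals, which is the usual default for Fortran compilers but is an assumption here.

```python
import struct

int_bytes = struct.pack("<i", 2)      # 32-bit integer 2
real_bytes = struct.pack("<f", 2.0)   # 32-bit real 2.0

print(int_bytes.hex())
print(real_bytes.hex())

# What a routine expecting a real sees if it is handed the integer's bytes
misread = struct.unpack("<f", int_bytes)[0]
print(misread)
```

The two byte patterns have nothing in common, and the integer 2 read as a real comes out as a tiny number of order 1e-45 rather than 2, which is why a plot made with the wrong argument type usually looks badly wrong or empty.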
Here is a more complicated example that demonstrates many of the PSPLOT
subroutines. See the psplot.doc file for details.
program testpsplot2
implicit none
real :: x, y
integer :: i, icol
character (len=10) :: text
call PSFILE('mypost')
call PSWIND(1.5, 7.5, 3.0, 7.0, 0., 10., 0., 20.)
do icol=1,15
x=2.5
y=float(icol)+2.
call PSMOVE(x,y)
call PSCOL(icol)
call PSSYMB(-3,0.15)
call PSMOVE(x,y)
call PSTIC(0.2, 0., 0)
call PSNUMB(float(icol),'i3')
enddo
call PSGRAY(0.)
do i=1,9
call PSLORG(i)
x=8.5
y=float(i)*2
call PSMOVE(x, y)
call PSSYMB(1, 0.1)
write (text,'(a4,i1)') 'lorg', i
call PSLAB(text)
enddo
call PSEND
end program testpsplot2