Python and Matplotlib Essentials

for Scientists and Engineers


Matt A Wood
Department of Physics and Astronomy
Texas A&M University-Commerce

Morgan & Claypool Publishers


Copyright © 2015 Morgan & Claypool Publishers
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without the prior permission of the publisher, or as expressly
permitted by law or under terms agreed with the appropriate rights organization. Multiple
copying is permitted in accordance with the terms of licences issued by the Copyright
Licensing Agency, the Copyright Clearance Centre and other reproduction rights
organisations.
Rights & Permissions
To obtain permission to re-use copyrighted material from Morgan & Claypool Publishers,
please contact [email protected].
ISBN 978-1-6270-5620-5 (ebook)
ISBN 978-1-6270-5619-9 (print)
ISBN 978-1-6270-5622-9 (mobi)
DOI 10.1088/978-1-6270-5620-5
Version: 20150601
IOP Concise Physics
ISSN 2053-2571 (online)
ISSN 2054-7307 (print)
A Morgan & Claypool publication as part of IOP Concise Physics
Published by Morgan & Claypool Publishers, 40 Oak Drive, San Rafael, CA, 94903, USA
IOP Publishing, Temple Circus, Temple Way, Bristol BS1 6HG, UK
Contents
Preface
Acknowledgements
About the author
1 Introduction: why Python and Matplotlib?
1.1 Numerical analysis and publication-quality plots
1.2 Enter Python
1.3 Resources
2 Downloading and installation
3 First steps
3.1 Working with strings
3.1.1 Hello, World!
3.1.2 Introduction to string methods
3.1.3 String concatenation
3.1.4 Slicing strings
3.1.5 Example: a sequence of file names
3.1.6 Error messages
3.2 Accessing user input
3.3 Your first Python program file
4 Working with numbers
4.1 A powerful calculator
4.2 Lists, tuples and arrays
4.2.1 Lists
4.2.2 Slicing lists
4.2.3 List comprehension
4.2.4 Tuples
4.2.5 Lists caution #1: copying lists
4.2.6 Lists caution #2: multiplying lists by a constant
5 NumPy arrays
5.1 Creating and reshaping arrays
5.1.1 NumPy
5.1.2 NumPy
5.1.3 Other array creation methods
5.2 Basic operations with arrays
5.2.1 Copying arrays
5.3 Dictionaries
5.4 Basic statistics
5.5 Universal functions
5.6 Precision and round-off error
5.7 NumPy matrix objects
6 File input and output
6.1 Reading from a file
6.1.1 General form: numbers and text
6.1.2 NumPy and
6.1.3 Reading and working with dates and times
6.1.4 Reading files with Astropy
6.2 Writing to a file
6.2.1 Formatted output
6.2.2 Writing text and numbers to a file
6.2.3 NumPy
6.2.4 Astropy
7 Simple programming: flow control
7.1 Conditionals
7.2 statements
7.3 loops
7.4 statements
7.5 and statements
8 Functions and modules
8.1 Introduction: coding best practices
8.2 Simple Python functions and modules
8.3 Functions with keyword arguments
8.4 Functional programming: list comprehension, and
8.4.1 Introduction
8.4.2 List comprehension and generator comprehension
8.4.3 The function
8.4.4 The function
8.4.5 The function
9 Classes and class methods
9.1 Introduction
9.2 Class attributes
9.3 Copying and deep copying
9.4 Methods
10 Making plots with Matplotlib
10.1 Simple line and point plots
10.2 Including error bars
10.3 Multiple plots on a page
10.4 Histogram plots
10.5 Quick and easy plotting routines for two-column data
10.6 Customization: text on plots, and inset figures
10.7 Image plots with
10.8 3D plots
10.8.1 3D scatter plots
10.8.2 3D wireframe and surface plots
11 Applications
11.1 Fits to data
11.1.1 Linear least squares: fitting a polynomial
11.1.2 Non-linear least squares
11.1.3 Linear systems of equations
11.2 Numerical integration
11.3 Integrating ordinary differential equations
11.4 Fourier transforms
11.5 Writing sound files
12 Visualization and animations
12.1 VPython
12.2 Making figures with Mayavi
12.3 Animations
13 Interfacing with other languages
For Janie—with love.
Preface
Python and Matplotlib Essentials for Scientists and Engineers is intended to provide a
starting point for scientists or engineers (or students of either discipline) who want to
explore using Python and Matplotlib to work with data and/or simulations, and to make
publication-quality plots. The active user base of Python and Matplotlib has been growing
rapidly in recent years as people realize these packages have a very high level of
functionality, are freely available for any likely operating system and are relatively simple
to learn and use compared to similar software solutions.
No previous programming experience is needed before beginning this book, as my aim
is to make this a stand-alone introduction to Python and Matplotlib. Indeed, my hope is
that you the reader can take this introduction and discover for yourself in just a few hours
whether Python and Matplotlib provide most if not all of the tools you need to get your
work done and your publication-quality plots rendered.
The examples given in this book are available for download at the companion website
pythonessentials.com.
Acknowledgements
I would like to thank first of all my wife Janie, for her encouragement and support. I am
grateful to Pim Schellart of the Astrophysics Department of Radboud University,
Nijmegen, The Netherlands, for first introducing me to the Python language, and to Martin
D Still of the Science Mission Directorate at NASA for introducing me to more advanced
Matplotlib capabilities and helping me to get up to speed. Finally, thank you to the
students and SARA colleagues who have commented on earlier versions of this
manuscript.
About the author
Matt A Wood

Matt A Wood graduated with a BS degree in physics from Iowa State University, and
Master’s and PhD degrees in astronomy from the University of Texas at Austin. He spent
a year as a NATO postdoctoral fellow at the Université de Montréal in Quebec before
accepting a position as assistant professor at The Florida Institute of Technology. He spent
the 2008–2009 academic year on sabbatical at Radboud University in Nijmegen, The
Netherlands, where he was first introduced to the Python programming language. In 2012
he joined the Department of Physics and Astronomy at Texas A&M University-Commerce
as department head. His current research focuses on mass-transfer binary star systems
known as cataclysmic variables. He has been an author on more than 80 peer-reviewed
publications and a similar number of non-refereed publications. He lives in Greenville,
Texas, and when not doing astronomy or administrative tasks he enjoys playing guitar and
bass, walking his doberman Dexter and exploring the world with his wife Janie.
IOP Concise Physics
Python and Matplotlib Essentials
for Scientists and Engineers
Matt A Wood
Chapter 1

Introduction: why Python and Matplotlib?


1.1 Numerical analysis and publication-quality plots
As a scientist or engineer, you need a software package that will allow you to quickly and
accurately analyze (including perhaps generating) your data and plot the results in a form
that you can use in peer-reviewed publications or formal presentations—what I will be
calling publication-quality plots. The software needs to be easy to learn and use, versatile,
run on whatever computer operating system (OS) you are using and preferably be free of
charge. Many people—at least initially—default to using Microsoft Excel (or some other
spreadsheet application) because they have some familiarity with it. But while Excel can
be useful for simple analyses and visualization of data, it is clunky and fairly limited in
what it can achieve. And while it is possible—with effort—to produce a publication-
quality plot with Excel, the default plots are not going to win any style awards and are not
suitable for publication in professional journals or for presentations at professional
conferences.
For many years, MATLAB¹ and IDL² have been industry standards for this kind of
work, providing sophisticated development and visualization environments that allow
rapid design and implementation of algorithms for use in computationally intensive data
analysis, numeric computation and visualization projects. Both are high-level languages
that are substantially easier to code for a typical problem than, for example, C/C++ or
Fortran, and the visualization and data-mining capabilities of both are excellent. However,
all of this comes at a substantial cost. Node-locked commercial licenses for MATLAB
start at over $2000 and academic pricing starts at $500 per license, with additional charges
for the various ‘toolboxes’ that may be required.
In the fields of physics and astronomy, the package SuperMongo³ (SM) is fairly
widely used for making publication-quality plots. It is a powerful interactive plotting
package that allows the user to generate beautiful plots with a minimum number of simple
commands or user-defined macros, to easily include LaTeX symbols in strings, and to
save the plots as PostScript files. If you know what this all means, then you know this is
very useful for technical publishing, and if not, do not worry—Python/Matplotlib can do
all this and more. I still use SM for some of my plots and, while it also is not free, it is
reasonably priced at $300 for a departmental site license with unlimited free upgrades and
personal technical support from the authors. SM also only really runs on Unix-like
systems (e.g. Linux and MacOS), so it is not cross-platform.
There are several good plotting packages available that are open source (i.e. free),
including Gnuplot⁴ and GNU Octave⁵. Gnuplot is a command-line-driven plotting utility
that is cross-platform and widely used, but which typically requires more time and effort
on the part of the user to prepare publication-quality plots. The Octave language is very
similar to MATLAB, such that code developed for MATLAB will typically run under
Octave with little modification required. However, Octave uses Gnuplot to render output,
with results that may not have the polish that MATLAB yields by default.
There are many, many other choices as well, and it is probably safe to say that any
plotting package that has been around for more than a few years is capable of producing
publication-quality plots, given enough time and effort on the part of the user. And this is
really the key—you are busy, and if you are using a plotting package that requires an hour
or more of your time to produce a publication-quality plot, and then another hour to
produce a similar plot, and so on, then that package is keeping you from getting other
important tasks done.

1.2 Enter Python


You are reading this book because you have heard or read that Python might be the
solution you are looking for and I am here to tell you it very probably is! The Python
language can probably do far more than you or I will ever need or want to do—visit
python.org to explore in depth. My purpose here is not to give a comprehensive overview
of the language, but to introduce you to the essential core features that will get you up to
speed on simple data manipulation and analysis, and making the kinds of plots you need
for your work. If you decide you want to mine websites for most commonly used words
and phrases, or set up a beautiful graphical user interface (GUI), or write a fully
functioning game, you can do that with Python, but you will need to go elsewhere to find
out how.
Python was developed in the early 1990s by Guido van Rossum while at Stichting
Mathematisch Centrum in the Netherlands. He is fondly known in the Python community
as the Benevolent Dictator for Life (BDFL), but countless others have contributed to the
development of the language and community packages. He chose the name Python
because he was feeling irreverent and was a big fan of Monty Python’s Flying Circus.
Python is a well-executed, object-oriented programming language comparable with
Perl, Ruby, or Java. It uses a syntax that renders programs easy to read as well as to write.
It comes with a large (and growing) standard library with extensive capabilities and can
call modules that were written in a compiled language such as C/C++ or Fortran. Python
can be used in interactive mode to test short pieces of code and includes a bundled
development environment, IDLE (Integrated DeveLopment Environment). Python is free
to download and use, and runs on all major OSs. The language is copyrighted, but is freely
re-distributable after modification as all releases of the language are open source⁶.
In addition to basic data types such as numbers (integer, floating point, complex),
strings and lists, Python also supports object-oriented programming with classes, allowing
you to define your own object types. Exception handling of errors is cleanly implemented
and the language is well suited to grouping code into modules and packages. Data types
can be dynamically assigned and Python implements automatic memory management so
you do not have to worry about allocating and freeing memory in your code. Some say
entering at the Python prompt allows one to fly⁷, but that may
be stretching reality just a bit.
To sum up this introduction: if you are a scientist or engineer looking for a numerical
analysis and plotting system that is easy to learn and use, cross-platform and free, the
combination of Python and Matplotlib is the grail you seek.

1.3 Resources
There are now countless books and websites devoted to Python and associated packages.
The webpage wiki.python.org/moin/PythonBooks includes a long list of book titles sorted
by category, as well as links to reviews. Some of the specific books that I have found
useful in preparing this monograph include:
Langtangen H P 2012 A Primer on Scientific Programming with Python 3rd edn
(Berlin: Springer)
Downey A 2012 Think Python: How to Think Like a Computer Scientist (Needham,
MA: Green Tea)
Fangohr H 2014 Introduction to Python for Computational Science and Engineering
(free download at www.southampton.ac.uk/~fangohr/software).
Many online resources exist as well. To list just a few that I have found useful:
python.org is of course the definitive Python resource on the web. A good starting
point is the Beginner’s Guide at wiki.python.org/moin/BeginnersGuide.
stackoverflow.com is a great question and answer site for programmers. Users vote
up the best answers, so they show up first and are easiest to find.
The Python course available at www.python-course.eu/index.php is very
comprehensive and includes many tutorials and examples.
Google has a Python class available at developers.google.com/edu/python/. The class
includes text, lecture videos and many coding examples.
1 www.mathworks.com/products/matlab
2 www.exelisvis.com/IDL
3 www.astro.princeton.edu/~rhl/sm
4 www.gnuplot.info
5 www.gnu.org/software/octave
6 See www.opensource.org for the open source definition.
7 Source: xkcd.com/353.
Chapter 2

Downloading and installation


There are a large number of Python distributions to choose from, depending on your OS.
The page docs.python.org/2.7/using/index.html contains information on installation, set-
up and usage for all major OSs. My recommendation at the time of writing is that you
consider the Anaconda Python distribution available from Continuum Analytics¹.
Anaconda Python includes Matplotlib, NumPy, SciPy, IPython, etc—over 195 of the most
popular Python packages for use in science, math, engineering and data analysis. The
Anaconda distribution has the feature that it installs into a single directory and does not
affect other Python installations on the system. It also does not require root or local
administrator privileges to install and updates are very easily completed using their online
repository.
Python currently has two major versions. Version 2.7 is now considered legacy and no
further major releases are expected, but remains popular. Python 3.0, introduced in 2008,
is under active development and is the future of the language. The current production
version is 3.4. There are some differences between the two versions and some software
packages have not yet been updated to work with version 3.x. Most Python programmers
are still using Python 2.x and Mac and Linux systems both default to Python 2.x, at the
time of writing. Guido van Rossum explains the changes from 2.x to 3.0 in What’s New in
Python 3.0 (docs.python.org/3/whatsnew/3.0.html). Relevant to this discussion, one final
advantage of the Anaconda Python distribution is that it makes it trivial to switch between
the Python 2.x and 3.x environments.
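Because the two versions differ, it can help to confirm which interpreter a script is actually running under. The following is a minimal sketch using the standard library's sys.version_info. The helper name describe_python is our own and is used only for illustration. The sketch runs under both 2.7 and 3.x:

```python
import sys


def describe_python():
    """Return a short label such as 'Python 3.11' for the running interpreter."""
    major, minor = sys.version_info[:2]
    return "Python %d.%d" % (major, minor)


print(describe_python())
if sys.version_info[0] < 3:
    # Python 2.x: print is a statement and 7 / 2 gives 3 (see chapter 4)
    print("Running legacy Python 2.x")
```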
1 Visit store.continuum.io/cshop/anaconda. Another popular choice is the Enthought Canopy Express Python distribution
available from the site store.enthought.com/downloads.
Chapter 3

First steps
3.1 Working with strings

3.1.1 Hello, World!


Let us keep with tradition for our first example—start up your Python session by (a)
entering (or ) in a terminal window (MacOS or Linux) or (b) double
clicking to start the Python interface (or Anaconda launcher) and enter the following¹:

If you are using Python 3.x, you would enter ). You
might have noticed when you invoked the Python shell in interactive mode that it prints
the version number and copyright notice before the primary prompt, which defaults to the
chevron (>>>). Continuation lines begin with the secondary prompt, which is three dots
(...) by default.
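The snippet itself is not reproduced here. A minimal equivalent, assuming only the built-in print, is shown below. Wrapping the single argument in parentheses lets the same line work in both Python 2.7 and 3.x:

```python
# The traditional first program
greeting = "Hello, World!"
print(greeting)
```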
Note that you can use either single or double quotes, allowing you to have an
unmatched quote in your string:
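Here is an illustration using example strings of our own. Enclosing a string in double quotes lets it contain an apostrophe, and enclosing it in single quotes lets it contain double quotes:

```python
s1 = "It's a plot"                 # apostrophe inside double quotes
s2 = 'She said "fit the data"'     # double quotes inside single quotes
print(s1)
print(s2)
```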

If you need to, you can always escape quote marks with a backslash (\), and if you
need to include comments, just lead with a hash sign at the prompt or in your programs:
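The following short sketch shows both ideas, again with a made-up string. The backslash lets a quote of the same kind as the enclosing quotes appear inside the string, and the interpreter ignores everything after a hash sign:

```python
# This whole line is a comment
s = 'It\'s a "simple" example'   # \' is an escaped single quote
print(s)                          # It's a "simple" example
```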

Strings can include the newline character (\n) in them to give a line break:
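In the illustrative example below, the two-character escape sequence \n is stored as a single newline character, which print renders as a line break:

```python
label = "Line one\nLine two"
print(label)         # prints on two lines
print(len(label))    # 17: the escape counts as one character
```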


3.1.2 Introduction to string methods
It is sometimes useful to and strings². For example, if we wished to
take the string and convert this to a three-element
list with the newlines and extraneous white space removed, we can accomplish this with

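The original string is not reproduced here, so the sketch below uses a made-up one. It relies on two standard str methods: strip(), which removes leading and trailing white space, and split(), which breaks a string at a given separator:

```python
raw = "  alpha \n beta\n   gamma  "    # hypothetical input with stray spaces

pieces = raw.strip().split("\n")       # ['alpha ', ' beta', '   gamma']
words = [p.strip() for p in pieces]    # strip each piece (list comprehension, section 4.2.3)
print(words)                           # ['alpha', 'beta', 'gamma']

# split() with no argument splits on any run of white space in one step
print(raw.split())                     # ['alpha', 'beta', 'gamma']
```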
We can also the list elements back into a single string with spaces as a
separator:
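Here is a minimal sketch with an illustrative list of words. Note that join() is called on the separator string and takes the list as its argument:

```python
words = ["alpha", "beta", "gamma"]
sentence = " ".join(words)    # the separator comes first
print(sentence)               # alpha beta gamma
print(", ".join(words))       # alpha, beta, gamma
```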

3.1.3 String concatenation


In your programs for data analysis and plot making, you will probably mostly use strings
to access files and to hold text data for plot labels. The following example shows how you
might store your directory location in one variable and an input file name in another.
Concatenation is as simple as putting a + symbol between the string variable names:
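For example, with a hypothetical directory and file name:

```python
import os

data_dir = "/home/user/data/"      # hypothetical directory (note the trailing slash)
infile = "run01.dat"               # hypothetical file name
path = data_dir + infile
print(path)                        # /home/user/data/run01.dat

# os.path.join inserts the separator appropriate to the current OS
path2 = os.path.join("/home/user/data", infile)
```

Plain + is the simplest approach. However, os.path.join avoids a missing or doubled slash and uses the correct separator on Windows.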
3.1.4 Slicing strings
Strings can be sliced, where we pull out just some particular character or subset of
characters (we will do the same thing soon with lists). As in the C/C++ languages,
indexing starts with subscript 0 (Fortran starts with index 1):
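The following sketch uses an illustrative string. A single index selects one character, while start:stop selects a range that includes start but excludes stop. Negative indices count back from the end:

```python
s = "Matplotlib"
print(s[0])      # M
print(s[0:4])    # Matp  (indices 0, 1, 2 and 3)
print(s[4:])     # lotlib (from index 4 to the end)
print(s[-3:])    # lib   (the last three characters)
```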

The length of a string is returned by :
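Here is the built-in len() function applied to the same illustrative string:

```python
s = "Matplotlib"
n = len(s)
print(n)          # 10
print(s[n - 1])   # b: the last valid index is len(s) - 1
```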

3.1.5 Example: a sequence of file names


Let us pause to explore a slightly more complex example, which also gives a sneak peek at
the Python loop syntax and the function. If we want to generate a
sequence of file names (e.g., so that we might output intermediate simulation results) we
first need to know how to convert an integer to a string type so we can concatenate it into
the file name. This is as simple as where is a variable referring to an integer:
And we can create a file name using string concatenation:
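Here is a minimal sketch that uses the built-in str() conversion with a hypothetical file-name stem, snapshot:

```python
i = 7
istr = str(i)                          # '7', now a string
fname = "snapshot" + istr + ".dat"     # hypothetical naming pattern
print(fname)                           # snapshot7.dat
```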

A potential problem with this direct approach is that if we create files
through , a directory listing will return all files beginning with the
string ( , , ,… )
before any files that start with , and so on. We can solve this problem by left
padding the string created from integer with zeros, so that instead of to , we have
, , . We accomplish this with the method:
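The sketch below uses the str method zfill(width), which left-pads a string of digits with zeros. It then compares how padded and unpadded names sort. The names are hypothetical:

```python
unpadded = ["snap" + str(i) + ".dat" for i in (1, 2, 10)]
padded = ["snap" + str(i).zfill(3) + ".dat" for i in (1, 2, 10)]

print(str(7).zfill(3))      # 007
print(sorted(unpadded))     # ['snap1.dat', 'snap10.dat', 'snap2.dat']
print(sorted(padded))       # ['snap001.dat', 'snap002.dat', 'snap010.dat']
```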

Next, we need our list of integers, which we obtain with the function³. The
form of is where if omitted
defaults to 0 and if omitted defaults to 1. If all three arguments are present, the
function returns a list of integers
. If is positive, the last element is the largest less than
. If is negative, then the last element is the smallest
greater than . For example
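The following are a few illustrative calls to the built-in range(). In Python 3.x, range() returns a lazy object, so it is wrapped in list() here to display the values. In Python 2.x, range() already returns a list:

```python
r1 = list(range(5))            # [0, 1, 2, 3, 4]
r2 = list(range(2, 10, 3))     # [2, 5, 8]
r3 = list(range(10, 0, -3))    # [10, 7, 4, 1]
r4 = list(range(5, 5))         # []: stop is never included
print(r1, r2, r3, r4)
```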

Finally, we put it all together with a loop. Note that when entering these
commands, you will need to indent the lines beginning with and to
indicate they are within the loop. Python simply uses the indentation level to indicate
which lines of code belong in a given block—curly brackets or statements are
therefore not required. The standard indentation is four spaces, but you can use whatever
you like, as long as you are consistent within a given block:
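The following sketch puts the pieces together in such a loop, using the hypothetical stem snapshot and three-digit padding. The indented lines form the loop body. Appending each name to a list is our own addition, so that the result can be inspected afterwards:

```python
fnames = []
for i in range(0, 30, 10):
    fname = "snapshot" + str(i).zfill(3) + ".dat"
    fnames.append(fname)
    print(fname)
# snapshot000.dat
# snapshot010.dat
# snapshot020.dat
```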

3.1.6 Error messages


Finally, note that if an error occurs, the interpreter prints an error message (which is
usually helpful) and a stack trace. If in interactive mode using the interpreter, it returns to
the primary prompt ( ), and if the input came from a file (e.g.,
) it prints the stack trace and then exits with a non-zero status. More on
exception handling in section 3.2 below.

3.2 Accessing user input


If you need to access user input when running your code, use the command
, which returns a string object, even if the response is a number:
We use the function to confirm that and are both string
variables. Note that Python version 2.7 and earlier contains the function , which
expects a valid Python expression, and evaluates it. This could be a (potentially complex)
number ( or or ), or an expression ( ), or a quote-enclosed string
( ). Thus, the statement
could result in being of type , , , or ! If you are writing
code just for your own use and want results as quickly as possible, you might opt to use
rather than , but the extra overhead for using
is small and offers protection against what should be invalid inputs to your programs. In
addition, in Python 3.x the function behaves as does in Python
2.7, so you might as well use the current standard from the outset.
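The interactive example is not reproduced here. The sketch below is written for Python 3.x, where the function is input(). It simulates a user typing 42 by temporarily replacing standard input with an io.StringIO object, which is a testing trick rather than something you would do in everyday code. It then uses type() to confirm that the result is a string:

```python
import io
import sys

saved_stdin = sys.stdin
sys.stdin = io.StringIO("42\n")          # pretend the user typed 42 and pressed Enter
try:
    answer = input("Enter a number: ")   # Python 3.x; raw_input() in Python 2.7
finally:
    sys.stdin = saved_stdin              # always restore the real standard input

print()                                  # end the prompt line
print(type(answer))                      # <class 'str'>
print(answer + answer)                   # 4242: string concatenation, not addition
```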
If we need our input to be assigned to a variable of type or we can use the
or functions to perform the conversion for us:
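For instance, suppose the string answer came back from input(). Here it is assigned directly, so the sketch runs on its own:

```python
answer = "42"          # stands in for the string returned by input()
n = int(answer)        # convert to an integer
x = float(answer)      # or to a floating-point number
print(n + 1)           # 43
print(x / 8)           # 5.25
```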

If you do not pass an integer (for example, you enter or ), you will receive an error message and be
dumped back to the Python prompt:
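At the prompt, an invalid entry produces a ValueError traceback. The sketch below catches the exception instead, so that the message can be seen for a few illustrative inputs without leaving the program:

```python
results = {}
for text in ("12", "3.5", "twelve"):
    try:
        results[text] = int(text)
    except ValueError as err:
        results[text] = None          # record the failure
        print("Not an integer:", err)
print(results)
```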
If instead of an integer you wanted a floating point (decimal) number, you could use
instead . If the user
enters an integer when prompted, the integer will be promoted to a floating point number
(e.g., entering either or will set to 2.0).
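Here are a few illustrative conversions with float(). Whether the text looks like an integer or a decimal, the result is a float:

```python
values = [float(t) for t in ("2", "2.0", "1e3")]
print(values)      # [2.0, 2.0, 1000.0]
```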
If you want to write user-tolerant code, you might include a test and let the user try
again if he/she fumbles when typing the input:
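The original snippet is not reproduced here. The sketch below has the same structure: a while loop that exits with break once int() succeeds. So that it can run without a keyboard, the hypothetical helper read_int takes an iterator of strings in place of repeated calls to input():

```python
def read_int(responses):
    """Return the first response that converts to an int.

    `responses` is an iterator of strings standing in for successive
    calls to input("Enter an integer: ") in an interactive program.
    """
    while True:
        text = next(responses)
        try:
            value = int(text)
            break                      # a valid integer: leave the loop
        except ValueError:
            print("'%s' is not an integer, please try again." % text)
    return value


n = read_int(iter(["three", "3.0", "3"]))
print(n)   # 3
```

In an interactive program you would replace next(responses) with a call to input().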

This code snippet is an example of handling an error exception⁴. When an integer is
entered, the statement breaks out of the loop. If you are writing programs
that will be widely used, then you will need to spend a fair amount of time worrying about
error handling, but as our focus here is on getting you up to speed in Python and
Matplotlib, let me simply direct you to section 8 of The Python Tutorial at
docs.python.org⁵ if you need or want a more in-depth treatment of this subject.

3.3 Your first Python program file


Using Python interactively is useful for very small tasks, but for anything substantial you
will probably want to save your program to a file so you can tweak it until it does what
you want and reuse it in the future. Remember that when you exit the interactive shell
everything disappears.
While developing your code you have three primary methods to choose from. First,
you can use an integrated development environment (IDE) for your code development. Anaconda Python includes the Spyder⁶
(Scientific PYthon Development EnviRonment) IDE, which provides an interactive
development environment with advanced editing features, interactive testing and
debugging features. Second, you can simply have two windows open, one of which has
your code file open in your favorite text editor (e.g., , , ,
and , to name just a few of the available options) and the other of
which is running IPython. This is the method I typically use, as shown in figure 3.1, since
IPython is far more capable than the standard Python interpreter. Third, you can use an
IPython Notebook⁷, which provides a web-based interactive environment where you can
combine code (and results), explanatory text, mathematics, plots and animations into a
single document that you can share with collaborators, students and/or instructors. Some
users find it has all the features they need for their research notebooks (see figure 3.2).
IPython Notebook is included with the Anaconda Python distribution. My
recommendation is to try all three methods and see which works best for your coding
style.
Figure 3.1. A simple terminal-based IPython development environment.
Figure 3.2. A sample IPython Notebook showing error bars.

Python is often referred to as a scripting language, but the homepage of
states ‘Python is a programming language that lets you work quickly and
integrate systems more effectively’. You may be wondering what the difference is between
a script and a program. The answer is: not much, really. Traditionally a program was a
body of code written to be compiled (Fortran, C/C++, etc) and run independently, whereas
scripts are typically run from within another program (Python, Perl, IDL, MATLAB).
Scripts are usually smaller in scale than traditional programs as well, but scripts are
certainly programs in that they are sets of instructions to be executed by the computer. So,
feel free to call your codes either scripts or programs, as you prefer.
Our first program file (figure 3.1, top panel) simply prints and
exits when it is run. Our program file could simply contain the single line
, and the result would be the same. The triple quotes set off a multi-
line comment. This is used at the top of programs and functions to describe the workings
of the routine, as we will discuss further below. The very first line
is a ‘shebang’ line that tells the operating system to run the Python interpreter given by
your environment variable and should make the program portable between different
machines and different OSs.
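The figure's listing is not reproduced in this text. A minimal program file in the same spirit might look like the following. The file name hello.py is our assumption:

```python
#!/usr/bin/env python
"""hello.py: print a greeting and exit.

This triple-quoted string is the program's docstring, describing what it does.
"""
message = "Hello, World!"
print(message)
```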
Our program as it stands is a bit inflexible. We can make it a little more versatile if we
add a command line argument. In general, command line arguments allow us to input file
names, variables, etc, into our program ( ) without having to prompt for them
within the program itself. For example:
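The following is a sketch of such a program. The file name hello_name.py and the fallback name are hypothetical. It uses the standard sys module, whose argv list holds the program name followed by any arguments:

```python
#!/usr/bin/env python
"""hello_name.py: greet the name given on the command line."""
import sys

# sys.argv[0] is the program name; sys.argv[1:] are the arguments
if len(sys.argv) > 1:
    name = sys.argv[1]
else:
    name = "World"      # fallback when no argument is supplied

print("Hello, " + name + "!")
print("len(sys.argv) =", len(sys.argv))
```

Run as python hello_name.py Alice, this should print Hello, Alice!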

Note that here we are importing the module to pass the command line arguments.
is the program name and is our command line argument.
The total number of command line arguments is given by .
Now would be a good time to save this code to a file called 8, after which
you can run it using the command

but for a program you will use frequently, it will be more convenient to make it an
executable. In Unix-like systems, this is accomplished with the (change mode)
function, where makes a script executable directly from the command line:

As noted above, when I am developing code that uses Python/Matplotlib to analyze
data and generate plots, I often have the code open in an editor ( ) in one window and
an IPython shell running in another window in the same directory. Edit the code, save the
file (without exiting), run the code by typing (note that we are using the IPython
interpreter this time)

The IPython shell numbers your commands for later use. In much of the rest of this
book, I will continue to show the standard Python prompt for examples, but when
you are actually doing your own work, my strong recommendation is that you use IPython
for everything (or an IDE or IPython Notebook) whenever you are developing code.
You can also run an external program from the Python prompt if IPython is not
available. If there are no command line arguments, then you can use the
function

If you need to pass command line arguments, then things are not so straightforward:
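The original approach is not reproduced here. One option in Python 3.7 or later is the standard subprocess module, which we substitute here. It is not necessarily the method the original used. To keep the sketch self-contained, it runs a one-line Python program via sys.executable. For your own script, you would put its file name (for example, the hypothetical hello_name.py) after sys.executable:

```python
import subprocess
import sys

# Each command-line item is a separate list element, so no shell quoting is needed
cmd = [sys.executable, "-c", "import sys; print(sys.argv[1:])", "Alice", "42"]
result = subprocess.run(cmd, capture_output=True, text=True)

print(result.returncode)        # 0 on success
print(result.stdout.strip())    # ['Alice', '42']
```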
1 In this book, text that is in represents text that you enter or that is returned by the computer. Text
that is entered is colored black, and text that is returned is colored dark blue. Code snippets are in ivory-colored boxes
with light gray borders and complete standalone programs are in ivory-colored boxes with gold borders.
2 The functions and are methods of the class. Methods are called using dot notation, for
example . Methods, including how to define them, are discussed more fully in chapter 9.
3 Python 2.x includes the function which returns a lazy sequence object that produces the integers one at a time and so
conserves memory when the calling argument is a very large integer. In Python 3.x, is a lazy sequence object that
behaves like in Python 2.x, and the original list-returning version has been removed.
4 This code snippet also gives a sneak peek at the statement and flow control, discussed further in chapter 7.
5 The link defaults to Python 3.x documentation, but the 2.x tutorial is just a click away.
6 code.google.com/p/spyderlib/
7 See ipython.org/notebook.html.
8 Example codes named in the text are available for download at pythonessentials.com.
Chapter 4

Working with numbers


4.1 A powerful calculator
We will do more with strings as we go, but now let us begin working with numbers. When
you have some simple arithmetic to do and your calculator is not handy, just start up the
Python interpreter. We start with integer arithmetic, since this is where the Python 2.x
interpreter can give unexpected answers.

Note the first two lines give the expected results, but that yields when in the
Python 2.x interpreter, because the result was rounded down to the nearest integer (floor).
This results because Guido ‘BDFL’ van Rossum adopted the typical rule from C (also true
in Fortran) that the result of an arithmetic operation is always of the same type as the operands. So,
divide a float by a float, you obtain a float; divide an integer by an integer, you obtain an
integer. The former case is fine, but the BDFL now considers the latter to be a design
bug¹. Although there are many codes out there that rely on this behavior, for the rest of us
it is an annoyance we have to know about and avoid. The good news is that Python 3.x
implements true division for both integers and floats², so will return . Python 2.x
allows you to execute the command to
yield this same behavior. If using Python 2.x and you have not run this command, you can
simply put a decimal point after at least one of the integers in a division operation, which
will promote the result to a floating-point number:
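Here is a short illustration that assumes Python 3.x semantics, with the default Python 2.x results noted in the comments. The // operator performs floor division in both versions:

```python
q1 = 7 / 2      # 3.5 in Python 3.x; 3 in Python 2.x by default
q2 = 7 // 2     # 3: floor division in both versions
q3 = -7 // 2    # -4: the floor rounds toward minus infinity, not toward zero
q4 = 7.0 / 2    # 3.5 in both versions: one float operand promotes the result
print(q1, q2, q3, q4)
```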
The order in which operations are completed in a statement with multiple
mathematical operators is similar to other programming languages and can be remembered
with the acronym PEMDAS (Parentheses, Exponentiation, Multiplication/Division,
Addition/Subtraction).
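The following illustrative expressions show the precedence rules in action. The results are for Python 3.x:

```python
r1 = 2 + 3 * 4 ** 2     # 50: 4**2 = 16, then 3*16 = 48, then 2 + 48
r2 = (2 + 3) * 4 ** 2   # 80: the parentheses are evaluated first
r3 = -2 ** 2            # -4: ** binds more tightly than the unary minus
r4 = 8 / 4 * 2          # 4.0: * and / have equal rank and run left to right
print(r1, r2, r3, r4)
```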
Variables are easy to assign and work with. The statement

is an assignment statement. What happens when this is entered into the interpreter is that
Python creates the float object and binds the name to that object.
In computer programs it is very common to iterate and so lines of code of the form

are common. As an algebraic statement, this makes no sense, but what this line of code is
telling the interpreter (or compiler in a compiled language) is to evaluate the expression
on the right-hand side of the equals sign and then assign the resulting value to the variable
name on the left-hand side. Because this is such a common operation,
Python has available the compact notation so, for example,

Similarly, Python makes available subtraction of a constant with and division by a
constant with . Note that the order is important: the expression increments the
variable by 1, while the expression assigns the value +1 to !
Here are a few trivial examples of assigning variables and using them to calculate
results:
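The book's own examples are not reproduced here. This sketch stands in for them, and all names and values in it are hypothetical. It assigns a few variables and computes a result from them. It then uses the augmented operators, including the =+ pitfall described above.

```python
# Hypothetical values: a mass in kg and a speed in m/s
mass = 2.0
speed = 3.0
kinetic_energy = 0.5 * mass * speed ** 2   # 0.5 * 2.0 * 9.0 = 9.0 (joules)

count = 0
count += 1        # same as count = count + 1 -> 1
count += 1        # -> 2
count -= 1        # -> 1
count =+ 5        # NOT an increment: parsed as count = +5 -> 5

share = kinetic_energy
share /= 3.0      # same as share = share / 3.0 -> 3.0
print(kinetic_energy, count, share)
```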

When in interactive mode, the previous result is available through the variable _ (a single
underscore). This can be very useful when using Python as a calculator.

A value can be assigned to more than one variable at a time:
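The sketch below shows chained assignment, which binds several names to one value; the names and values are hypothetical. It also shows tuple unpacking, a closely related form not described above, which assigns several values in one statement.

```python
# Chained assignment: all three names refer to the same value
x = y = z = 0.0

# Tuple unpacking: several names, several values
a, b = 1, 2
a, b = b, a      # swap without a temporary variable
print(x, y, z, a, b)
```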

The modulus (remainder) operator % can be very useful in certain situations (for example,
printing diagnostics every 100 time steps of a simulation). Our code from section 3.1.5 can
be adapted for this purpose.
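The code from section 3.1.5 is not reproduced here, so here is a stand-alone sketch built around a hypothetical simulation loop. The % operator returns the remainder of a division. That remainder is zero exactly on every hundredth step, which is when the diagnostics print.

```python
n_steps = 1000        # hypothetical length of the simulation
report_every = 100    # print diagnostics every 100 steps
reported = []

for step in range(n_steps):
    # ... the simulation update would go here ...
    if step % report_every == 0:    # remainder is zero on steps 0, 100, 200, ...
        reported.append(step)
        print('step', step)

print(17 % 5)   # 2, the remainder of 17 divided by 5
```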

It is probable that you will often want to use the value of π in your programs. You can
enter it yourself, but if you execute from math import pi at the command line or at
the top of your program, you will have it available to machine precision when you need it.
The math module is an example of a module; modules are extensions to Python that can be
imported to extend the capabilities of the base language:

The math module provides access to the standard mathematical functions defined by
the C standard and you have the option to simply include everything in the module with
from math import * at the top of your programs, although as discussed below it is
generally safer to just import what you need to avoid conflicts in the namespace.
Here is a selected list of a few useful functions from the math module:
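The book's list is not reproduced here. Instead, this short sketch calls a handful of functions that the math module does provide, using a hypothetical radius. It imports pi by name and reaches everything else through the qualified math. prefix.

```python
import math
from math import pi   # import just the name you need

radius = 2.0                       # hypothetical radius
area = pi * radius ** 2            # pi to machine precision
hyp = math.sqrt(3.0**2 + 4.0**2)   # square root: 5.0
angle = math.radians(180.0)        # degrees -> radians (equal to pi, within round-off)

values = {
    'sin(pi/2)': math.sin(pi / 2),
    'exp(1)': math.exp(1.0),
    'log(e)': math.log(math.e),        # natural logarithm
    'log10(100)': math.log10(100.0),   # base-10 logarithm
    'floor(2.7)': math.floor(2.7),
    'factorial(5)': math.factorial(5),
}
for name, val in values.items():
    print(name, val)
```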

Complex numbers are also straightforward to work with, where the suffix j indicates the
imaginary part of the value:

Complex numbers can also be created with the complex() function.

It is simple to extract the real and imaginary parts of a complex number, if needed
separately, using its real and imag attributes:
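Here is a brief sketch with arbitrary values. It builds complex numbers both ways, multiplies them, and pulls out the parts, the magnitude and the conjugate.

```python
z = 3 + 4j                # j (or J) marks the imaginary part
w = complex(1, -2)        # the same kind of object, built with complex()
product = z * w           # (3 + 4j)(1 - 2j) = 11 - 2j
print(product)
print(z.real, z.imag)     # 3.0 4.0: both parts are floats
print(abs(z))             # magnitude: 5.0
print(z.conjugate())      # (3-4j)
```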
4.2 Lists, tuples and arrays

4.2.1 Lists
Python includes several compound data types. The list is a very useful sequence construct.
It can be written as a list of comma-separated values between square brackets:

Note that the elements of a list do not have to all be of the same type—in the example,
we have strings, an integer and a float all in the same list.
You can add elements on to the end of your list, replace elements within your list,
remove items from your list, insert some, reverse the list, or clear the entire list:
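The book's list is not shown here, so the sketch below uses a hypothetical one that mixes types. It then applies each of the operations just listed with the standard list methods. Note that clear() needs Python 3.3 or later; in Python 2.x, del items[:] does the same job.

```python
# A list may mix strings, an integer and a float (hypothetical values)
items = ['spam', 'eggs', 100, 1234.5]

items.append('toast')       # add to the end
items[1] = 'ham'            # replace the element at index 1
items.remove(100)           # remove the first element equal to 100
items.insert(0, 'bacon')    # insert before index 0
print(items)                # ['bacon', 'spam', 'ham', 1234.5, 'toast']
edited = list(items)        # keep a copy of this stage for comparison
items.reverse()             # reverse in place
print(items)                # ['toast', 1234.5, 'ham', 'spam', 'bacon']
items.clear()               # empty the list
print(items)                # []
```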
As noted briefly in the previous chapter, strings can be concatenated with the + sign; the same operator concatenates lists:

Lists can also contain other lists, which is useful for some applications:
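A quick sketch with hypothetical values shows + on strings and on lists. It also builds a list of lists and indexes into it twice.

```python
greeting = 'Hello, ' + 'world'      # 'Hello, world'
combined = [1, 2, 3] + [4, 5]       # [1, 2, 3, 4, 5]

# A list of lists, e.g. hypothetical (x, y) pairs
points = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
second_point = points[1]            # [2.0, 3.0]
y_of_third = points[2][1]           # first index picks the inner list: 5.0
```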
4.2.2 Slicing lists
Lists, like strings, can be sliced and indexed as needed:
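The common indexing and slicing forms are sketched below on a hypothetical list. Keep in mind that indexing starts at zero and that the stop index of a slice is excluded.

```python
values = [10, 20, 30, 40, 50]   # hypothetical data
first = values[0]               # 10
last = values[-1]               # 50: negative indices count from the end
middle = values[1:4]            # [20, 30, 40]: index 4 is not included
head = values[:2]               # [10, 20]
every_other = values[::2]       # [10, 30, 50]: a step of 2
backwards = values[::-1]        # [50, 40, 30, 20, 10]
word = 'Python'[1:4]            # 'yth': strings slice the same way
```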

4.2.3 List comprehension


Creating lists is a frequently encountered task in Python programs, so Python has a useful
and powerful compact syntax for accomplishing this, called list comprehension. The
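general syntax is [expression for item in iterable], which builds the same list as
starting from an empty list and, in a for loop over iterable, appending the value of
expression for each item.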
Here is a simple example to demonstrate the concept:


and if we only wanted to keep the even values, we can add a conditional statement:

We can also use list comprehensions for string objects:

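List comprehensions can be nested, with syntax [expression for item1 in iterable1
for item2 in iterable2], where the rightmost for clause varies fastest.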
For example, to assign coordinate pairs for a rectangular grid:
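The book's examples are not shown here, so this sketch stands in for them with hypothetical values. It first checks that a comprehension builds the same list as the equivalent loop. It then adds an if clause (here keeping the squares of the even numbers), works on strings, and nests two for clauses to build grid coordinates.

```python
# [expression for item in iterable] gives the same list as this loop
squares_loop = []
for n in range(10):
    squares_loop.append(n ** 2)
squares = [n ** 2 for n in range(10)]

# An if clause keeps only the items that pass the test
even_squares = [n ** 2 for n in range(10) if n % 2 == 0]

# Comprehensions over strings
words = ['alpha', 'beta', 'gamma']
upper = [w.upper() for w in words]
initials = [w[0] for w in words]

# Nested: the rightmost for clause varies fastest
grid = [(x, y) for x in range(3) for y in range(2)]
print(grid)   # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```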

We will see additional uses of list comprehension below.

4.2.4 Tuples
Python also includes a data type called a tuple (in fact, our most recent example returned a
list of tuples). Tuples, like lists, are sequences. What is special about tuples is that they
cannot be changed—they are immutable (lists are mutable). You might think of a tuple as
a ‘constant list’. Tuples are indicated by simply separating some values with commas and
are often enclosed in parentheses:
The empty tuple is written as a pair of empty parentheses, (). You can slice tuples just as you can lists and you
can convert a list to a tuple or vice versa if need be:
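This sketch walks through creating, slicing and converting tuples, using hypothetical values. It ends by trying to change an element, which raises a TypeError. One detail not mentioned above: a tuple with a single element needs a trailing comma.

```python
t = (1.0, 2.0, 3.0)      # the parentheses are optional: t = 1.0, 2.0, 3.0
empty = ()               # the empty tuple
single = (5,)            # one element needs a trailing comma; (5) is just the int 5
part = t[1:]             # slicing works as for lists: (2.0, 3.0)
as_list = list(t)        # tuple -> list
back = tuple(as_list)    # list -> tuple

try:
    t[0] = 10.0          # tuples are immutable...
    changed = True
except TypeError:
    changed = False      # ...so item assignment raises TypeError
print(changed)           # False
```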

Tuples are used behind the scenes in Python and you may use them when calling
functions. Again, they work very much like lists, with the exception that they cannot be
changed.
The zip() function iterates over two or more sequences or iterables in parallel. Most
commonly, it takes two or more lists and returns a list of tuples (in Python 3.x, an iterator
over those tuples, which list() converts to a list), where the ith tuple
contains the ith element from each of the argument lists:

It is possible to unzip a sequence of zipped tuples by passing it to zip() with the * operator:

The zip() function is commonly used in list comprehensions when two or more lists
are involved in the expression:
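Here is a short sketch with hypothetical lists covering all three uses of zip() just described. It assumes Python 3.x, which is why the first zip() is wrapped in list().

```python
names = ['x', 'y', 'z']
coords = [1.0, 2.0, 3.0]

pairs = list(zip(names, coords))   # [('x', 1.0), ('y', 2.0), ('z', 3.0)]

# Unzip: * passes each pair to zip() as a separate argument
unzipped_names, unzipped_coords = zip(*pairs)   # results are tuples

# zip() in a list comprehension: elementwise sum of two lists
a = [1, 2, 3]
b = [10, 20, 30]
sums = [p + q for p, q in zip(a, b)]   # [11, 22, 33]
```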

4.2.5 Lists caution #1: copying lists


An assignment statement such as b = a in Python does not make independent copies of
lists. Instead, after the statement b = a, both a and b refer to the same list object, so if
we change an element in b we have also changed a!

To make a copy of a list that does not refer to the same object, you can use the full slice
b = a[:] or copy.copy(a) from the copy module for simple lists3:
There may be situations where your list itself contains other objects like lists or class
instances. If you have such a situation, you can use
copy.deepcopy(a). See section 9.3 below for more on when you would
need to make a deep copy.
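The three cases just described are sketched below with hypothetical lists. First, b = a only creates a second name for the same list. Second, a full slice gives an independent copy of a simple list. Third, with nested lists only copy.deepcopy() also copies the inner lists.

```python
import copy

a = [1, 2, 3]
b = a                 # b is a second name for the SAME list object
b[0] = 99             # ...so this changes a as well
print(a, b is a)      # [99, 2, 3] True

c = [1, 2, 3]
d = c[:]              # full-slice copy (copy.copy(c) or list(c) also work)
d[0] = 99
print(c)              # [1, 2, 3]: the original is untouched

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)      # new outer list, inner lists still shared
deep = copy.deepcopy(nested)     # inner lists copied too
nested[0][0] = 99
print(shallow[0][0], deep[0][0]) # 99 1
```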

4.2.6 Lists caution #2: multiplying lists by a constant


When working with sequences (lists or tuples) of numbers we have to be careful. This is
one instance where the default behavior of Python will give an unexpected result to those
of us familiar with linear algebra. For example, you might have a list of numbers that you
would like to have multiplied by a constant and so you might try the following:

That is probably not what you expected! What you obtained was three copies of the list
concatenated together. You can achieve the behavior you want with the following list
comprehension:
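This sketch, using a hypothetical list, contrasts multiplying a list by 3 with scaling each element by 3 in a list comprehension.

```python
numbers = [1.0, 2.0, 3.0]            # hypothetical data
repeated = numbers * 3               # repetition, not scaling!
scaled = [3 * x for x in numbers]    # elementwise scaling
print(repeated)   # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
print(scaled)     # [3.0, 6.0, 9.0]
```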
This works, and you can use similar constructions for other arithmetic operations, but
is a bit cumbersome. In science and engineering disciplines, we are mostly going to be
dealing with arrays of numbers, which are not included as a core feature in Python. So, let
us introduce the package NumPy, which you will probably import at the beginning of
nearly all of your codes that work with numerical data.
1 See python-history.blogspot.com/2009/03/problem-with-integer-division.html.
2 Integers and floating point numbers (floats) are stored differently in the computer memory.
3 For more information on the copy module, see docs.python.org/2/library/copy.html.
Chapter 5

NumPy arrays
5.1 Creating and reshaping arrays
We just saw that the default Python behavior for dealing with lists of numbers was not
what we really wanted. The NumPy module, however, will give us the tools we need.
NumPy’s main object is the array, which is a table of elements all of the same type, with
an arbitrary number of dimensions (or axes) as needed. The number of dimensions or axes
is called the rank of the array (NumPy reports it as the array’s ndim attribute). For example,
the coordinates of a single point in 3D space, (x, y, z), are represented by an array of rank 1,
and that axis has a length of 3. An array that holds the coordinates of four point masses
might be given by

This is a 2D (rank 2) array. In this example, the first dimension (axis) has a length of 4 and
the second has a length of 3.
We can access rows, columns and individual elements in the array just as we did for
lists using indexing and slicing:
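The book's array is not shown here. This sketch stands in for it with hypothetical coordinates for four point masses, giving an array of shape (4, 3). It uses the standard import numpy as np convention discussed next and shows row, column and element access.

```python
import numpy as np

# Hypothetical (x, y, z) coordinates of four point masses: shape (4, 3)
pos = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0],
                [0.0, 0.0, 3.0]])

print(pos.ndim, pos.shape)   # 2 (4, 3)
second_mass = pos[1]         # a row: the coordinates of mass 1
all_y = pos[:, 1]            # a column: every y coordinate
z_of_last = pos[3, 2]        # a single element: 3.0
first_two = pos[:2]          # slices work along each axis
```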

The recommended standard method for importing the module is import numpy as np.

The array class is called ndarray. You can create an array from an existing
(numerical-only) list using np.array().

While it may be more convenient to import everything from the module with
from numpy import *
to accomplish the same task, experienced Python programmers usually advise against this,
as it introduces many other definitions into your current namespace and this can
potentially cause conflicts. For example, if you first imported another module that also
defined an object with the same name as one in NumPy and then ran from numpy import *, the
NumPy definition would silently override the other. The use of qualified names (e.g. np.array) helps to avoid
such collisions and also helps make clear where those definitions are coming from.
You can easily find and change the attributes of your arrays using the attributes and methods of the ndarray class:

Arrays must be homogeneous, meaning all values have the same data type. So if you
enter a mix of integers and floating point numbers the integers will be converted to
floating point values and if there is a single complex number input all entries will be
converted to complex numbers:
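The sketch below inspects the standard ndarray attributes (ndim, shape, size, dtype) and reshapes an array in a few ways. It then shows the type conversion just described, where mixed input is converted to one common type. It uses np.arange(), which is covered in the next subsection.

```python
import numpy as np

a = np.arange(12)                # 0, 1, ..., 11
print(a.ndim, a.shape, a.size, a.dtype)   # the integer dtype is platform dependent

b = a.reshape(3, 4)              # same data viewed as 3 rows by 4 columns
c = b.T                          # transpose: shape (4, 3)
d = a.reshape(2, -1)             # -1 lets NumPy infer that axis: shape (2, 6)

# Mixed input is converted to one common type
mixed = np.array([1, 2, 3.5])    # the integers become floats
cplx = np.array([1, 2.5, 3j])    # one complex entry makes every entry complex
print(mixed.dtype, cplx.dtype)   # float64 complex128
```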
5.1.1 NumPy arange()
Note that np.arange() works like the range() function we saw in section 3.1 above, but
returns an array instead of a list. As with range(), you specify at a minimum
the stop value. You may not want your sequence to start at zero, or you may want your
sequence to be floating point values and/or you may want to count down instead of up. As
with range(), these requirements are easy to accomplish:
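A few np.arange() calls covering the cases just listed are sketched below. The floating point step here is 0.25, which is exact in binary. The next subsection explains why steps that are not exact can cause trouble.

```python
import numpy as np

a = np.arange(5)                 # like range(5): [0 1 2 3 4]
b = np.arange(2, 10, 3)          # start, stop, step: [2 5 8]
c = np.arange(10, 0, -2)         # counting down: [10 8 6 4 2]
d = np.arange(0.0, 1.0, 0.25)    # floats: [0.   0.25 0.5  0.75]
```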

5.1.2 NumPy linspace()
As a general rule, it is not a good idea to use np.arange() with floating point values,
because round-off in the floating point representation of the step means
there may or may not be an extra value tagged on at the end (see section 5.6). It is much
better practice to use the np.linspace() function, which takes as an argument the
number of elements to return instead of the desired step size. Also, unlike range() or
np.arange(), the np.linspace() function fills the array to include both end points. If this is
not what you want you can pass endpoint=False as a keyword argument:
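This sketch shows np.linspace() with and without the end point. It also uses the retstep option, not mentioned above, which returns the spacing along with the array.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)                        # [0.   0.25 0.5  0.75 1.  ]
x_open = np.linspace(0.0, 1.0, 5, endpoint=False)   # five values from 0.0 up to, not including, 1.0
x2, dx = np.linspace(0.0, 1.0, 5, retstep=True)     # also returns the spacing, 0.25
```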

5.1.3 Other array creation methods


The array type can be declared explicitly with the dtype keyword when the array is created:
It may be useful in some situations to initialize the array to be filled with 1s or 0s
(np.ones() or np.zeros()). These fill with floating point values by default, but you can specify
other data types if needed. Note that the parameter specifying the shape is a
tuple. There also exists the NumPy function np.empty(), which creates an array without
initializing the values of the array elements:
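Here is a brief sketch of these creation functions, all with hypothetical shapes. The contents of the np.empty() array are arbitrary, so only its shape can be relied on.

```python
import numpy as np

as_float = np.array([1, 2, 3], dtype=float)      # request floats explicitly
as_cplx = np.array([[1, 2], [3, 4]], dtype=complex)
z = np.zeros((2, 3))                             # shape given as a tuple; float zeros
o = np.ones((3,), dtype=int)                     # integer ones
e = np.empty((2, 2))                             # uninitialized: contents are arbitrary
```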

5.2 Basic operations with arrays


Let us have a look at how to work with NumPy arrays. Arithmetic operations are
performed elementwise and a new array is created and populated with the result:
Note you could use the final operation to create a mask for another operation.
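The book's arrays are not shown, so this sketch uses two small hypothetical ones. It assumes the final operation referred to above is a comparison, which returns a Boolean array. The sketch then uses that array as a mask to select elements.

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)           # [0 1 2 3]
c = a - b                  # [20 29 38 47]
sq = b ** 2                # [0 1 4 9]
s = 10 * np.sin(a)         # functions also act elementwise
mask = a < 35              # Boolean array: [ True  True False False]
small = a[mask]            # the mask selects elements: [20 30]
```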
The product operator * operates elementwise in NumPy arrays; it is not standard
matrix multiplication1. To obtain a matrix product use the np.dot() function. For
NumPy arrays a and b, for example, np.dot(a, b) gives standard matrix multiplication if a
and b are 2D arrays. If instead a and b are 1D arrays (i.e., vectors) then np.dot(a, b) returns the
standard inner product of the vectors (without complex conjugation). The function np.cross(a, b)
returns the cross product of vectors a and b:
You may sometimes want to perform operations and overwrite the original array:
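This sketch, with hypothetical values, contrasts * with np.dot() and computes a cross product. It then finishes with in-place updates using *= and +=. The @ operator mentioned in a comment needs Python 3.5 or later.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])
elementwise = A * B          # [[2 0] [0 4]]
matrix_prod = np.dot(A, B)   # [[5 4] [3 4]]; A @ B is equivalent in Python 3.5+

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
inner = np.dot(u, v)         # 1*4 + 2*5 + 3*6 = 32.0
cross = np.cross(u, v)       # [-3.  6. -3.]

# In-place operations overwrite an existing array instead of making a new one
w = np.ones(3)
w *= 3                       # w is now [3. 3. 3.]
w += u                       # w is now [4. 5. 6.]
```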

It will often be useful to find the minimum or maximum value of an array. The array
class provides methods min() and max() that return these. Having already imported
numpy as np, we will use the np.random.random() function to return an array of pseudo-random
numbers in the half-open interval [0.0, 1.0) with a uniform distribution2:
It might be that you need the maximum or minimum of a given row or column, in
which case you could specify which axis to search:
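Here is a small sketch of overall and per-axis extrema. It fixes the random seed with np.random.seed() so repeated runs give the same numbers (an addition not in the text). The random values themselves carry no meaning.

```python
import numpy as np

np.random.seed(42)                 # fixed seed so the run is repeatable
a = np.random.random((2, 3))       # uniform values in [0.0, 1.0)
print(a.min(), a.max())            # smallest and largest of all six values
col_max = a.max(axis=0)            # maximum of each column: 3 values
row_min = a.min(axis=1)            # minimum of each row: 2 values
```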

5.2.1 Copying arrays


As when copying lists as discussed above, the statement b = a does not make an
independent copy of a in the variable b; they both point to the same object. If we are
working with arrays and need to make a copy of an array, we use b = a.copy() or
b = np.copy(a).
If you simply use b = a, then changing any element of a also changes the
corresponding element in b since they both refer to the same object, so be careful out there.
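The difference between b = a and an explicit copy is sketched below with a hypothetical array.

```python
import numpy as np

a = np.arange(4)
b = a               # another name for the same array
c = a.copy()        # an independent copy (np.copy(a) is equivalent)
a[0] = 100
print(b[0], c[0])   # 100 0
```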

5.3 Dictionaries
The Python dictionary object provides a very flexible means of storing information.
Perhaps you have a list that has the mass densities in units of g cm⁻³ for selected
substances:

For this to be useful we need to know what substance each of these list items represents.
This is where a dictionary can be useful. A dictionary object can be created as follows
using curly brackets {} and key-value pairs (or simply items), with each key separated from
its value by a colon and the items separated by commas, where, in our example, the substance name is the key:

Note that the printed order of the key-value pairs is not the same as what we input,
because the information is stored as a hash table. If you enter the same statements on your
computer, the order may be different from that above (in Python 3.7 and later, dictionaries
preserve insertion order). This behavior is not a problem
because the dictionary is not accessed using an index as a sequence is, but rather by the
key.
With the above definition, we can retrieve the density of ice by indexing the dictionary
with the key for ice, placed in square brackets:

We can add to the dictionary and print it, and can return the length of the dictionary
using len():

We can check whether a dictionary contains an item of interest (or not) with the in (or not in) operator:

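The keys and values can each be extracted into new lists with the keys() and values() methods (in Python 3.x these return views, which list() converts to lists):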
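The book's dictionary is not shown, so this sketch builds one with approximate, illustrative densities in g cm⁻³; the key names are hypothetical. It covers creation, look-up by key, adding an item, len(), membership tests and extracting keys and values.

```python
# Approximate densities in g cm^-3 (illustrative values only)
density = {'water': 1.00, 'ice': 0.92, 'aluminum': 2.70, 'iron': 7.87}

rho_ice = density['ice']         # look up by key, not by position
density['gold'] = 19.3           # add a new key-value pair
print(len(density))              # 5

has_iron = 'iron' in density     # True
no_lead = 'lead' not in density  # True
keys = list(density.keys())      # list() needed in Python 3.x, where keys() is a view
values = list(density.values())
```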

We can list the keys of a dictionary in alphabetical order using the sorted()
function:
Key-value pairs can be removed from a dictionary using the del statement:

As with lists and arrays, to make an independent copy of a dictionary use its copy() method:
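This sketch, again with illustrative values and hypothetical keys, walks through the keys in sorted order and removes items two ways. The pop() method, not mentioned above, also returns the removed value. Finally it makes a copy that can be changed independently. copy() is a shallow copy, so any nested objects would still be shared.

```python
density = {'water': 1.00, 'ice': 0.92, 'iron': 7.87, 'gold': 19.3}

for name in sorted(density):     # keys in alphabetical order
    print(name, density[name])

del density['ice']               # remove a key-value pair
iron = density.pop('iron')       # remove it and return the value

backup = density.copy()          # independent (shallow) copy
backup['water'] = 0.0
print(density['water'])          # 1.0: the original is unchanged
```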

For more information on dictionaries in Python see, for example, chapter 6 of
Langtangen H P 2012 A Primer on Scientific Programming with Python 3rd edn (Berlin:
Springer).

5.4 Basic statistics


Statistics are easy to compute using NumPy. The function np.random.random()
returns an array of pseudo-random numbers in the half-open interval [0, 1):
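The book's example is not reproduced here. This sketch first computes common summary statistics on a small hypothetical data set, chosen so the results are easy to check by hand. It then applies the same methods to a seeded random sample, whose mean should be close to 0.5. The ddof=1 option gives the sample (N − 1) standard deviation instead of the default population value.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # hypothetical data
mean = np.mean(data)                # 5.0
median = np.median(data)            # 4.5
std = np.std(data)                  # population standard deviation: 2.0
std_sample = np.std(data, ddof=1)   # sample standard deviation (N - 1 denominator)
var = np.var(data)                  # 4.0

np.random.seed(0)                   # repeatable pseudo-random numbers
r = np.random.random(10000)         # uniform in [0, 1)
print(r.mean(), r.std())            # near 0.5 and 0.29
```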
5.5 Universal functions
The numpy module includes vectorized versions of the trigonometric and exponential
families of functions, for example, np.sin(), np.cos() and np.exp(). These vectorized
versions can take arrays as arguments and will return arrays corresponding elementwise to
the values of the passed arrays:
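The book's example is not reproduced here. As a stand-in, this sketch samples sin() at the five points [0, π/2, π, 3π/2, 2π] that section 5.6 refers to, and applies np.exp() to a small array. The two ‘zero’ results at π and 2π come out as tiny nonzero numbers rather than exact zeros.

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 5)   # 0, pi/2, pi, 3pi/2, 2pi
y = np.sin(x)                          # one sine per element of x
print(y)        # approximately [0, 1, 0, -1, 0]
e = np.exp(np.array([0.0, 1.0]))       # [1.0, 2.718...]
```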

We will use vectorized functions in several examples below.

5.6 Precision and round-off error


Before moving on, note that we sampled our array in the example above at the five
values [0, π/2, π, 3π/2, 2π], which should give for the sine the values [0, 1, 0, −1, 0].
While three of the results were as expected, two of the ‘zero’ values are instead returned as
numbers of order 10⁻¹⁶. This is the result of round-off error, also called representation
error. The cause is that on most machines running Python, the mantissas (or ‘significands’)
of floating point numbers are represented as binary numbers with 53 bits of precision,
which translates to approximately 16 decimal digit precision. Pi itself is an irrational
number, so of our sample points only zero is represented exactly in binary. The others are
represented internally with binary values that are very close to, but not exactly equal to,
the desired values. The effects of round-off error can be clearly seen in the following
example:
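The book's example is not reproduced here; two classic cases stand in for it. Neither 0.1 nor 0.2 can be represented exactly in binary, so their sum is not exactly 0.3. Similarly, adding ten copies of 0.1 does not give exactly 1.0.

```python
total = 0.1 + 0.2
print(total == 0.3)           # False
print(repr(total))            # 0.30000000000000004
ten_tenths = sum([0.1] * 10)
print(ten_tenths)             # 0.9999999999999999, not 1.0
```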

Round-off error is a fact of life when programming and is the reason why it is best to
avoid testing floats for equality in conditional statements. The following example code
would seem to print the numbers from 0.0 to 0.9 in increments of 0.1 and then stop when
t = 1.0. The actual behavior is that the conditional expression never tests as True
and so the loop is infinite. The built-in function repr() returns a string containing
the full (‘official’) string representation of an object, whereas the function str() returns
an ‘informal’ string representation of the object, which for floats in Python 2.x can be less precise:
Rather than testing for equality, it is much safer to check that you have reached the target
value within some tolerance. For example, the following code terminates when t reaches 1.0
to within the tolerance, as intended:
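This sketch stands in for the book's listing and assumes a hypothetical tolerance of 1e-9, far larger than the round-off error but far smaller than the step. The loop stops once t is within that tolerance of 1.0. Because the final value of t is not exactly 1.0, an equality test would never have stopped the loop. In Python 3.5 and later, math.isclose() offers a ready-made version of this test.

```python
t = 0.0
step = 0.1
tolerance = 1e-9      # hypothetical tolerance: much larger than round-off
printed = []

while abs(t - 1.0) > tolerance:   # stop when t is within tolerance of 1.0
    print(repr(t))
    printed.append(t)
    t += step

print('stopped at', repr(t))      # 0.9999999999999999: close to, but not equal to, 1.0
```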

5.7 NumPy matrix objects


The NumPy module includes the matrix object, which is a subclass of the array class. Matrix
objects inherit the attributes and methods of array objects, but matrix objects are strictly
2D, unlike arrays which can have any dimension. Matrix objects will be most useful for
those performing linear algebra. The primary difference between matrix objects and
ndarray objects is that for matrices the multiplication operator * yields matrix–matrix, vector–matrix,
or matrix–vector multiplication. If you are coming from a MATLAB background, you
may find the behavior of matrix objects comfortably familiar. (Note, however, that current
NumPy documentation no longer recommends the matrix class, even for linear algebra, and
advises using regular arrays instead.)

Matrices can also be created using a MATLAB-like syntax:

The numpy.linalg module contains routines for linear
algebra. For example, it is trivial to solve a matrix equation of the form ax = b for vector x.
If you are working with very large matrices, you should consider instead using the
scipy.linalg module, because typically SciPy is built using the optimized
ATLAS3 LAPACK and BLAS libraries, which results in very fast linear algebra
performance. However, in this case you will need to use the array class instead of the
matrix class.
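The sketch below solves a hypothetical 2 × 2 system ax = b with np.linalg.solve() using plain arrays. It then builds a matrix object from MATLAB-style string input, where * is the matrix product. Current NumPy may issue a PendingDeprecationWarning when np.matrix is used.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([5.0, 6.0])

x = np.linalg.solve(a, b)         # solves a x = b
print(x)                          # approximately [-4.   4.5]
print(np.allclose(a @ x, b))      # True; use np.dot(a, x) before Python 3.5

# The matrix class, created with MATLAB-like syntax
M = np.matrix('1 2; 3 4')
print(M * M)                      # matrix product: [[ 7 10] [15 22]]
```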
We have not discussed SciPy up to this point, but it is worth mentioning that
essentially everything available in NumPy is also available in SciPy. Often the routines are
identical, but when they differ the SciPy routines are usually faster. To quote the SciPy
FAQ4:
In an ideal world, NumPy would contain nothing but the array data type and the
most basic operations: indexing, sorting, reshaping, basic elementwise functions, et
cetera. All numerical code would reside in SciPy. However, one of NumPy’s
important goals is compatibility, so NumPy tries to retain all features supported by
either of its predecessors. Thus NumPy contains some linear algebra functions,
even though these more properly belong in SciPy. In any case, SciPy contains
more fully featured versions of the linear algebra modules, as well as many other
numerical algorithms. If you are doing scientific computing with Python, you
should probably install both NumPy and SciPy. Most new features belong in SciPy
rather than NumPy.
1 MATLAB, for example, calculates the matrix product when the * operator is used.
2 If you run this example, your numbers will differ from what is shown.
3 math-atlas.sourceforge.net
4 www.scipy.org/scipylib/faq.html
Chapter 6

File input and output


6.1 Reading from a file

6.1.1 General form: numbers and text


Up to this point we have been focusing on Python language basics and have primarily
been working with the interpreter directly. For actual practical use, however, you are
likely to need a program that will read something from a file, manipulate those data and
output the results to a new file (which may be a new data file, or a plot output file, or
both).
The open() function returns a file object and most often takes two arguments, the
first being the filename and the second the mode to be used. The mode will be ‘r’ if
reading from the file. If writing to the file, use ‘w’ if any existing contents are to be
replaced or ‘a’ if appending to the existing contents. The mode ‘r+’ is available if you will
be both reading from and writing to the file. If the mode argument is missing, ‘r’ is
assumed. If a binary file is to be written or read, append a ‘b’ to the mode (e.g., ‘wb’, ‘rb’,
or ‘r+b’). In Python 2.x the ‘b’ is not required on Unix-based systems (e.g., Mac and Linux), but is
on Windows, so include it for platform independence; in Python 3.x the ‘b’ matters on every
platform, because it makes reads return bytes rather than decoded text. After reading is
completed, close the file with the file object’s close() method.
Let us assume we have a text file that contains the lines

The read() method will return the entire file as a single string (including the
newline character \n) if no argument is passed, or at most the number of bytes passed as an
argument (characters, for a file opened in text mode in Python 3.x), which for example can be
useful for reading binary files. The seek(0) method resets the current position back to the
beginning of the file and the close() method closes the file:
The file can alternatively be read using readlines(), which returns a list
containing the lines in the file:
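The book's file is not reproduced here. This sketch first writes a small file with hypothetical contents to a temporary directory, so it runs on its own. It then reads the file back with read(), read(n), seek(0) and readlines().

```python
import os
import tempfile

# Write a small hypothetical text file to read back
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as f:
    f.write('1.0 2.0 three\n4.0 5.0 six\n')

f = open(path, 'r')
whole = f.read()         # the entire file as one string, newlines included
f.seek(0)                # rewind to the start of the file
first5 = f.read(5)       # at most 5 characters (text mode): '1.0 2'
f.seek(0)
lines = f.readlines()    # a list of newline-terminated strings
f.close()
print(repr(whole))
print(lines)             # ['1.0 2.0 three\n', '4.0 5.0 six\n']
```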

Using readlines(), each line is returned as a newline-terminated string, and so the result
really needs some additional processing to be useful. If we wanted each element on each
line to be stored as an element in a 2D (nested) list, we might use something like
the following code, made executable with chmod +x.

When we run the code at the terminal prompt, we obtain


This method is both simple and general, and can be used for essentially any kind of
file. What this code does is to open the file and then step through it line by line with a
for loop over the list returned by readlines(). For each line, the newline character is
stripped with a string method, the elements are split into a list with the split() method,
assuming a space as the separation character, and the first element of each
line is converted to a floating point value using float(). Finally, the new rank 1 list is
appended onto the end of the nested list.
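The program itself is not reproduced above, so here is a minimal sketch of the steps just described. It assumes a space-separated file; the file name and contents are hypothetical, and the file is written first so the example runs on its own. It uses rstrip('\n') as the stripping method.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')   # hypothetical file
with open(path, 'w') as f:
    f.write('1.0 2.0 three\n4.0 5.0 six\n')

data = []
f = open(path, 'r')
for line in f.readlines():
    fields = line.rstrip('\n').split(' ')   # strip the newline, split on spaces
    fields[0] = float(fields[0])            # convert the first element to a float
    data.append(fields)                     # append this row to the nested list
f.close()
print(data)   # [[1.0, '2.0', 'three'], [4.0, '5.0', 'six']]
```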
Now, it turns out we can improve on the above code. First, we did not have an explicit
close() statement in our example. The file will be closed when we exit the program,
but in a more complicated code that retrieves data from hundreds of files it is possible that,
even with close() statements, the files might not be closed ‘quickly enough’, leading to a
‘too many files open’ error from the OS. If instead we use a with block as shown in the
following example, then the file is closed as soon as the block is exited, even if an error occurs.
Using a with block is now considered the preferred method of accessing files. Next, it
turns out that readlines() is not even needed and can slow down the execution of
your code significantly, because it reads the entire file into memory at once,
which can be a problem if your data files are huge. Instead, you can just iterate on the file
object itself, as it is already an iterable object! This is memory efficient, fast and yields
simpler code.
So, if we wanted to read in a data file that we knew contained (any number of)
columns of numbers, we could use the following program, which
makes use of a list comprehension:

This program takes the file name from the command line, iterates on the file itself and
iterates within the line. Indeed, this can be made even more compact with the use
of a nested list comprehension:

and for example if we know there are two columns of data in our file and we want to put
these into array objects, we could add the lines
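The sketch below combines the improvements just described: a with block, iteration over the file object itself and a nested list comprehension. It then converts two known columns to arrays. It writes a hypothetical two-column file to a temporary directory first. In a stand-alone script the file name would come from the command line instead, for example sys.argv[1].

```python
import os
import tempfile
import numpy as np

# Hypothetical two-column numeric file
path = os.path.join(tempfile.mkdtemp(), 'columns.txt')
with open(path, 'w') as f:
    f.write('0.0 1.5\n1.0 2.5\n2.0 3.5\n')

# Iterate on the file object inside a with block (closed automatically)
with open(path) as f:
    rows = [[float(v) for v in line.split()] for line in f]

# Two known columns -> two 1D arrays
data = np.array(rows)
x = data[:, 0]
y = data[:, 1]
print(rows)   # [[0.0, 1.5], [1.0, 2.5], [2.0, 3.5]]
```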

Now that we have discussed the ‘hard’ way to accomplish this, let us discuss the easier
path that NumPy affords for reading files containing numerical data.

6.1.2 NumPy loadtxt() and genfromtxt()


Perhaps the most common situation you will encounter is needing to read a multi-column
file of numerical data. NumPy provides the np.loadtxt() function, which does exactly
this. Let us say we have a two-column file containing the
mean monthly carbon dioxide (CO2) concentration in parts per million (ppm) since 1980 as measured at
Mauna Loa Observatory, Hawaii1,

where you will notice that the first two lines are comments serving as column headers.
To read this file, all we need do is pass its name to np.loadtxt().
Note that because the first two rows of the file start with the comment symbol # they
are ignored by np.loadtxt(). If you have some number of rows at the top of your file that
you want to skip but that do not start with #, you can simply include the skiprows
keyword when calling np.loadtxt(). When using skiprows, comment lines are included
in the count of skipped lines, so it will behave as you want it to, no matter if the rows you
want to skip begin with # or not.
For example, given our CO2 data file of monthly averages, if we did not want to read in
the first ten years of data, we would need to skip the two header lines and 120 data lines,
for a total of 122 lines:
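The CO2 file is not reproduced here. This sketch writes a small stand-in file with two ‘#’ header lines and placeholder numbers (not real measurements). It then checks that skiprows counts the comment lines along with the data lines it skips.

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'two_columns.txt')
with open(path, 'w') as f:
    f.write('# column 1: x\n# column 2: y\n')
    for i in range(5):
        f.write('{0} {1}\n'.format(i, 10 * i))   # placeholder values

data = np.loadtxt(path)               # '#' lines are skipped as comments
print(data.shape)                     # (5, 2)
later = np.loadtxt(path, skiprows=4)  # skip 2 header lines + 2 data lines
print(later.shape)                    # (3, 2)
```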

The actual data file from NOAA we are referring to contains
682 rows of measurements (at the time of writing) dating back to 1958, where a typical
line looks like

Here column 1 is the year, column 2 is the month, column 3 is the decimal date, columns
4, 5 and 6 are different estimations of the CO2 concentration averages and the last column
is the number of days going into the monthly average.
If we want to read this file directly, we can simply read everything with np.loadtxt()

and this would give us an array with 682 rows and 7 columns.

We could then copy the data of interest (the 3rd and 4th columns) to two 1D arrays as
follows:

but np.loadtxt() provides a more direct solution. If we just want the decimal date and the
direct average concentration, we can use the usecols keyword to specify which of these
columns we want to read, where the index starts at zero for the first column. Even more
useful, we can unpack the data and load them directly into 1D arrays that we can later
pass to (for example) the Matplotlib functions:

The unpack parameter, if set to True, transposes the returned array, allowing a
statement of the form x, y = np.loadtxt(...) to be used. The np.loadtxt() function
has additional parameters that may be useful in special circumstances, but
the above will probably work for most files you will need to read in practice.
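Here is a sketch of both routes to the 3rd and 4th columns. It uses a three-line stand-in file with the same seven-column layout, filled with placeholder numbers rather than NOAA data. The first route slices the full array; the second uses usecols with unpack=True.

```python
import os
import tempfile
import numpy as np

# Stand-in: year, month, decimal date, three averages, days (placeholder numbers)
path = os.path.join(tempfile.mkdtemp(), 'seven_columns.txt')
with open(path, 'w') as f:
    f.write('# year month decdate avg1 avg2 avg3 days\n')
    f.write('2000 1 2000.042 1.0 1.1 1.2 30\n')
    f.write('2000 2 2000.125 2.0 2.1 2.2 28\n')
    f.write('2000 3 2000.208 3.0 3.1 3.2 31\n')

table = np.loadtxt(path)
print(table.shape)             # (3, 7)
dates = table[:, 2]            # 3rd column (index 2)
conc = table[:, 3]             # 4th column (index 3)

# The same two arrays in one call
dates2, conc2 = np.loadtxt(path, usecols=(2, 3), unpack=True)
```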
If your data file contains missing data and/or if you just want more control over how
your data file is read, the NumPy function np.genfromtxt() is a good choice2. It can take
missing data into account because it loops twice over the data. On the first pass, it
converts each line into a sequence of strings and on the second it converts them to the
appropriate data type.
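This sketch reads hypothetical comma-separated data that has one empty field. np.genfromtxt() fills the gap with nan, and np.nanmean() then averages the column while ignoring it. The data come from an in-memory io.StringIO object rather than a file, so nothing is written to disk.

```python
import io
import numpy as np

# Hypothetical comma-separated data with one missing entry
text = io.StringIO('1.0,2.0\n3.0,\n5.0,6.0\n')
data = np.genfromtxt(text, delimiter=',')
print(data)                         # the missing value becomes nan
missing = np.isnan(data)            # locate the gaps
col_mean = np.nanmean(data[:, 1])   # mean that ignores nan: 4.0
```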

6.1.3 Reading and working with dates and times


Python’s standard library includes modules such as datetime for working with date and time
values. If you have a file that includes dates in the ISO 8601 international standard
format3 (e.g. YYYY-MM-DDThh:mm:ss), you can read this into a string using the
general method of section 6.1.1 and then parse the string to a datetime object, for
example with datetime.strptime():

Then you can use attributes and methods of the datetime object to return useful information or
to reformat the date and time using strftime()4:

The datetime module can return the current date and time (datetime.now()),
can calculate the difference between two dates, etc., but this is beyond the scope of this
book. For more information, see docs.python.org/2/library/datetime.html and also section
6.1.4 below.
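This sketch parses a hypothetical ISO 8601 string into a datetime object and reads off some of its fields. It then reformats the value with strftime() and takes the difference between two dates. The month name produced by %B depends on your locale.

```python
from datetime import datetime

stamp = '2015-06-01T14:30:05'                       # hypothetical ISO 8601 string
t = datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%S')   # string -> datetime object
print(t.year, t.month, t.day, t.hour)               # 2015 6 1 14
print(t.strftime('%d %B %Y, %H:%M'))                # e.g. 01 June 2015, 14:30

later = datetime(2015, 6, 2, 14, 30, 5)
elapsed = later - t                                 # a timedelta object
print(elapsed.total_seconds())                      # 86400.0
now = datetime.now()                                # current local date and time
```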

6.1.4 Reading files with Astropy


Members of the astronomical research community have contributed the Astropy5 package,
which has a useful general-purpose read function that reads in tabular data and attempts to
guess the format by trying the known supported formats (which include basic ASCII,
HTML, LaTeX, CSV, FITS, HDF5 and many others). For example, if you have a file
that contains the last three lines of the CO2 data with month
columns added as both text and integer:

you can read this file and examine the results using the following:
Note that the function was able to determine that the first line
contains the column names, and it determined the data type of each of the four columns.
The full utility of the data table object is
beyond the scope of this text, but note that the column names can be accessed via
the table’s colnames attribute,
and data can be assigned to NumPy array objects by indexing the table with a column name.

Tables, like lists, are mutable, so data in them can be changed in place, and rows and
columns can be deleted or added as needed. As noted above, the read function from the
Astropy library can also read LaTeX tables directly, so if we had found our data in the
LaTeX source code of a paper and saved it to a file on our local disk,
we could read that file in the same way and obtain exactly the same table object as we
obtained above. A related write function, discussed below, may also be useful to you,
as it can write your data as a LaTeX table6.

Reading dates and times with Astropy


As you might imagine, astronomers are particularly interested in time; it is required for
someone who, for example, wants to measure a change in the spin period of a pulsar over
decades or the exact arrival time of a gamma-ray burst. So, it is not surprising that the
Astropy module contains features that surpass those available in the standard-library
modules discussed in section 6.1.3 above7, with a specific focus on time formats
(e.g., ISO 8601 and Julian date) and time standards (e.g., UTC, TAI, etc.) used in
astronomy.
Let us assume that we have read the time string in ISO 8601 format with nanosecond
accuracy. We can then convert it to an Astropy Time object (from astropy.time):

and then can easily print out the equivalent Julian date or modified Julian date, or convert it to
another time scale. If you do not need this functionality, then jump to the next section, but
if you do need this functionality, then something like this module may have been on your
wish list for years. See the documentation for more information and send your thanks to
the developers.
You have control over the precision of the printed output with the precision
attribute, which gives the number of digits after the decimal point when outputting a value
that includes seconds. The default is three and the maximum precision is nine:

Finally, note that the Astropy module can also read and write files using the binary
HDF5 and FITS formats, as we discuss in section 6.2.4 below.

6.2 Writing to a file

6.2.1 Formatted output


When writing output from your Python codes, the default formatting will often not be
attractive. For example, printing a floating point value will often output many more digits than are
needed:
If you are saving measurements that are only significant to four digits (e.g., 101.2,
3.002, 1.734×10⁵) then the default behavior will not only take up unnecessary disk space,
but will be misleading, since someone opening that file at a later time would not know the
true precision of the measurements from the file contents alone.

The old way: the string modulo operator


Before Python 2.6, string formatting was accomplished using the string modulo operator
‘%’. This method is still in wide use (and is available in Python 3.x), so you should be
familiar with it, but there may come a time when this old style of formatting will be
removed from the language.
In this method, the format string is on the left side of the string modulo operator and
on the right side is a tuple containing the values to be used, in order, in the format string.
Note that if only a single item is to the right of the modulo operator no parentheses are
required.

If you are using Python 3.x, print is a function, so the same statement needs parentheses
around its argument.

If you have programmed in any other languages, the format codes for this method
should be easily understandable. The general form is
%[flags][width][.precision]type, where the optional flags (such as - for left alignment), the
field width and the number of digits after the decimal point precede a type code such as
d (integer), f (fixed-point float), e (exponential notation) or s (string).
So, a code such as %4.2f means print a floating point number with a width of four
characters in the format x.xx (the decimal point counts as one character). The code %3d
means print an integer with three character spaces, including a sign if negative. There are
additional specifiers not listed here for octal and hexadecimal numbers.
Although it might appear that the formatting is part of the print function, this is not
the case. Instead the string object is acted upon by the modulo operator, which returns
another string, and it is this returned string that is passed to the print function:
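The book's examples are not reproduced here; this sketch stands in for them with hypothetical values. It applies the format codes just described and shows that % produces an ordinary string before anything is printed. It uses Python 3 print() syntax.

```python
x = 3.14159
n = 42
s = 'Result: %4.2f and %3d' % (x, n)   # values taken from the tuple in order
print(s)                               # Result: 3.14 and  42
one = 'n = %d' % n                     # a single value needs no parentheses
sci = '%10.3e' % 12345.678             # exponential notation in a 10-character field
print(sci)                             #  1.235e+04
```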

The new (Pythonic) way: str.format()


A new method was added in Python 2.6 and is intended to be the default method going
forward. If you are new to Python, this is the method you should learn and use in your
codes. The general syntax is
'format string'.format(*args, **kwargs)
which perhaps is not terribly clear, so let us convert our examples from above to the new
method:

As before with the string modulo operator, we again have a format string on the left
which has fields that will be replaced; here, however, we indicate these fields with curly
brackets {}. The curly brackets and any format codes within will be replaced by the
formatted value of one of the arguments to the format() method. In the examples
above, the positional indices of the arguments were explicitly stated, along with
format codes. If the arguments are in the same order as you want things printed, then you
can leave the indices out (this requires Python 2.7 or later). Similarly, if you do not care
about the exact formatting of the arguments, you can also leave out those specifiers:

However, if you want to use the arguments in a different order, or if you want to use an
argument more than once, then you do need to specify the positional indices:
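This sketch repeats the hypothetical values from the string-modulo example, this time with str.format(). It shows explicit positions, automatic numbering, default formatting, and reordering and reusing arguments.

```python
x = 3.14159
n = 42
a = '{0:4.2f} and {1:3d}'.format(x, n)       # explicit positions with format codes
b = '{:4.2f} and {:3d}'.format(x, n)         # positions omitted: used in order
c = '{} and {}'.format(x, n)                 # no format codes: default formatting
d = '{1} before {0}, {1} again'.format(x, n) # reorder and reuse arguments
```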

You may have noticed that the general syntax for format() allowed keyword
arguments. This feature could be quite useful for complicated print statements, as it makes
it easier to map from the arguments back to the format string:

Fields can be left-, right-, or center-aligned if needed:


Here is a slightly more complex example where we have a list we would like to print
centered one item per line between vertical bars. We use the join() method from
section 3.1 above, using the newline character as the separator and iterating over the
elements in the list:
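The book's examples are not reproduced here, so the sketch below stands in for them; its names, values and list items are hypothetical. It uses keyword arguments, the <, > and ^ alignment codes, and join() to build the centered, one-item-per-line output.

```python
msg = '{name} = {value:.3f} {unit}'.format(name='g', value=9.80665, unit='m/s^2')
print(msg)                            # g = 9.807 m/s^2

left = '[{:<8}]'.format('left')       # '[left    ]'
right = '[{:>8}]'.format('right')     # '[   right]'
center = '[{:^8}]'.format('mid')      # 'mid' centered in 8 spaces

# One item per line, centered between vertical bars, joined with newlines
items = ['H', 'He', 'Li']
table = '\n'.join('|{:^6}|'.format(item) for item in items)
print(table)
```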

Printing unicode characters


In science and engineering, you often want to output the results of a fit to data, and the fit
parameters will have formal errors associated with them, something of the form a±σ_a. You
can of course accomplish this with a ‘+/-’ in the format string,

but if you employ unicode characters you can output the ‘±’ to the terminal (or file). The
unicode character for the ‘±’ symbol is \u00B1. If you include this in your string with a
‘u’ in front of the string to tell Python 2.x to interpret the string as unicode (in Python 3.x all
strings are unicode, so the prefix is optional), you can obtain the result in this form using

which outputs a=5.34±0.02 to the terminal or your file. Note, it may still be preferable to
use the ‘+/-’ combination depending on your application. A full list of unicode characters
is available at unicode-table.com.
Printing integers with commas
Finally and somewhat randomly, if you are printing large integers, you might prefer to
make them more easily readable for humans. Python lets you use a thousands separator:
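A one-line sketch of the thousands separator (the ',' option in the format specification, Python 2.7+):

```python
n = 6022140857
readable = '{:,}'.format(n)
print(readable)  # 6,022,140,857
```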

6.2.2 Writing text and numbers to a file


As discussed above, we can open a file for writing using
open() with the mode 'w'. We can then write to the file using the write()
method, which takes a string as its argument, optionally formatted as described
above:

As noted above, it is preferable to use a with statement, which automatically closes the
file after the enclosed block of code finishes executing:
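Here is a sketch of both approaches. The file name and the written values are made up, and the file goes into a scratch directory so that your working directory stays clean. Both versions leave the same two lines in the file:

```python
import os
import tempfile

outdir = tempfile.mkdtemp()                  # scratch directory for this example
fname = os.path.join(outdir, 'output.txt')   # hypothetical file name

# Version 1: open, write and close explicitly
f = open(fname, 'w')
f.write('The value of pi is {0:.5f}\n'.format(3.14159265))
f.write('h = {0:.4e} J s\n'.format(6.62607015e-34))
f.close()

# Version 2: the with statement closes the file when the block ends,
# even if an exception is raised inside it
with open(fname, 'w') as f:
    f.write('The value of pi is {0:.5f}\n'.format(3.14159265))
    f.write('h = {0:.4e} J s\n'.format(6.62607015e-34))

with open(fname) as f:
    print(f.read())
```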

After executing either of these statement blocks from the interpreter (or a file), your
working directory will contain the file with the following lines:

If you have numerical data that you want to write in columns to a file, there are several
ways you can do this, four of which are shown in the example below. All four give
identical output. The first example is the most
straightforward and perhaps the first thing you would think of if coming to Python with
previous experience in C/C++ or Fortran. Example 2 brings the loop inside a list
comprehension, saving one line (‘Flat is better than nested’). Examples 3 and 4 both use
the with statement, saving another line. These use write() and
writelines(), respectively, where write() writes a single line to a file and
writelines() writes a sequence of strings to the file. In example 4, the entire
sequence passed to writelines() is created in memory before anything is written to the file, and
so example 4 is less memory efficient than example 3. Therefore, of the examples shown,
I recommend example 3 as the best; however, NumPy provides the savetxt() function,
which is in practice what you will probably use to write columns of numbers to a file.
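This sketch shows what the four variants might look like, writing two columns of made-up data. It is not the book's listing, and the file names are hypothetical:

```python
import os
import tempfile

x = [0.0, 0.5, 1.0, 1.5]
y = [v**2 for v in x]
outdir = tempfile.mkdtemp()  # scratch directory for the four output files

# Example 1: explicit index loop, C/Fortran style
f = open(os.path.join(outdir, 'ex1.txt'), 'w')
for i in range(len(x)):
    f.write('{0:6.2f} {1:8.4f}\n'.format(x[i], y[i]))
f.close()

# Example 2: the loop moved inside a list comprehension
# (the list of write() return values is built and then discarded)
f = open(os.path.join(outdir, 'ex2.txt'), 'w')
[f.write('{0:6.2f} {1:8.4f}\n'.format(xi, yi)) for xi, yi in zip(x, y)]
f.close()

# Example 3: with statement and write(), one line at a time
with open(os.path.join(outdir, 'ex3.txt'), 'w') as f:
    for xi, yi in zip(x, y):
        f.write('{0:6.2f} {1:8.4f}\n'.format(xi, yi))

# Example 4: with statement and writelines() on a list built in memory first
with open(os.path.join(outdir, 'ex4.txt'), 'w') as f:
    f.writelines(['{0:6.2f} {1:8.4f}\n'.format(xi, yi) for xi, yi in zip(x, y)])
```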

6.2.3 NumPy
Saving an array to a text file is straightforward using the NumPy savetxt() function.
To save your two-column data to a file you could simply enter

which will output a file containing the values of the first array on the first line of the file and
the values of the second array on the second line, with all values by default printed in exponential format to
machine precision, which is not convenient.
We can write our arrays in columns by passing the data to savetxt() arranged as columns
(for example, as the transpose of the two-row array), and we can format our data to the
appropriate number of significant figures by including the fmt keyword argument.
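Here is a sketch of both calls with small made-up arrays. np.column_stack() arranges x and y as columns; this is one of several ways to do it (np.array([x, y]).T also works). The book's own call is not shown, and the file names are hypothetical:

```python
import os
import tempfile
import numpy as np

x = np.linspace(0.0, 1.0, 5)
y = x**2
outdir = tempfile.mkdtemp()

# Default: each array becomes one row, written with the '%.18e' format
rows_file = os.path.join(outdir, 'rows.txt')
np.savetxt(rows_file, (x, y))

# Columns, with one format code per column
cols_file = os.path.join(outdir, 'columns.txt')
np.savetxt(cols_file, np.column_stack((x, y)), fmt=('%5.2f', '%8.4f'))

with open(cols_file) as f:
    print(f.read())
```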
Using the same two arrays, we call savetxt() as

then our output file contains

If you would like to include header or footer lines, you can pass strings to the header
and footer keyword arguments. This example also demonstrates that if you only include
a single format specifier, it will be used for all values:

Then the output file contains

If you would like to include multiple comment lines at the top of your file, you can do
something similar to the following:
Then the output file contains

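This sketch covers the header, footer and multiple-comment-line options, using made-up data. Joining the comment lines with newline characters is one approach; savetxt() prefixes each resulting line with its comment string, '# ' by default. The original listing may have done this differently:

```python
import os
import tempfile
import numpy as np

x = np.linspace(0.0, 1.0, 5)
data = np.column_stack((x, x**2))
outdir = tempfile.mkdtemp()

# A single fmt string is reused for every column
fname1 = os.path.join(outdir, 'with_header.txt')   # hypothetical name
np.savetxt(fname1, data, fmt='%8.4f', header='x y', footer='end of data')

# Several comment lines at the top: join them with newline characters
comment_lines = ['hypothetical run 1', 'column 1: x (m)', 'column 2: y (m^2)']
fname2 = os.path.join(outdir, 'multi_header.txt')  # hypothetical name
np.savetxt(fname2, data, fmt='%8.4f', header='\n'.join(comment_lines))
```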
6.2.4 Astropy

Writing ASCII tabular data with Astropy


If you are using the astropy module, you may wish to use its included
astropy.io.ascii.write() function instead of numpy.savetxt():

The new file will contain

As noted above, the Astropy function is very flexible and can also write
your data in several useful formats. If you want to write your file with the column
headings written as a comment line (so that the file can be read directly by readers that
skip comment lines, such as NumPy's loadtxt()):
then the resulting file will contain

Writing LaTeX files


If you want to write your data as a LaTeX table,

then the output file will contain

Other ASCII formats


You can also write ASCII files in one of several other supplied formats. If you want to
format your output to keep only a certain precision, use the formats keyword:

Now the file contains


Writing binary files with Astropy
The Astropy module can also read and write files in the widely used HDF5
(Hierarchical Data Format)8 as well as the FITS format9. The HDF was designed to store
and organize large volumes of numerical data efficiently and flexibly. A significant
advantage of using HDF5 files is that because the files are stored in a binary format, there
is no loss in precision and the read/write times are significantly improved over ASCII
files. A disadvantage is that the file itself is not directly human-readable.
Another significant advantage of HDF5 is that HDF5 files can contain multiple tables.
This feature could be useful in the very common situation where you need to write
snapshots of the state of a computational simulation as it evolves some system through
time. Each snapshot can be written as a new table in the file. In the example here, we write
two tables to our output file. Note that each table needs a path defined in the write
statement to label the table within the file. The example is contained in the file
_ .

When we execute this code, it creates the new output file. The default behavior of the write() call is that if the
specified file already exists, the program will exit with an error message. Include
overwrite=True if you simply want to overwrite the data in an existing file, but use
this with caution if your computations take hours or days to complete. The append=True
keyword allows adding a new table to an existing file.
To read files in HDF5, the path keyword must again be specified to retrieve the
desired table. If we execute the following code in the file _

we obtain the output

There are other available packages for reading and writing HDF5 files in Python,
including the h5py package available at www.h5py.org. Quoting from the website, ‘[t]he
h5py package provides a Pythonic interface to the HDF5 binary data format’, and there is
also the book Python and HDF5 written by Andrew Collette, the lead author of the h5py
package. The h5py package provides a lower-level interface to HDF5 files, but may
include features you need that the Astropy package does not.
1www.esrl.noaa.gov/gmd/ccgg/trends/
2See docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html.
3See, e.g., en.wikipedia.org/wiki/ISO_8601 and perhaps also xkcd.com/1179.
4For a list of all the format codes, see strftime.org.
5Visit www.astropy.org. If using Anaconda Python, installation is as simple as typing at
a terminal command prompt.
6In the current example, the table above was created with the command
.
7See astropy.readthedocs.org/en/latest/time. From the documentation, ‘All time manipulations and arithmetic operations
are done internally using two 64-bit floats to represent time… [T]he Time object maintains sub-nanosecond precision
over times spanning the age of the universe’.
8See www.hdfgroup.org/HDF5.
9FITS stands for Flexible Image Transport System. The FITS format is widely used by astronomers and, although a
binary file format, has the advantage that the metadata are included in a human-readable (ASCII) header. See
fits.gsfc.nasa.gov/fits_home.html.
IOP Concise Physics
Python and Matplotlib Essentials
for Scientists and Engineers
Matt A Wood
Chapter 7

Simple programming: flow control


To write more advanced programs, you will need to use basic flow control commands,
some of which we have used in previous examples. In this chapter, we will first discuss
conditionals, then if blocks, for blocks, while loops and related items.
In the following chapter, we will introduce functions. As noted above, Python uses the
indentation level to indicate which lines of code belong in a block, obviating the need for
the curly brackets used in, for example, the C/C++ programming languages or the
end statements used in Fortran and other languages. The convention among
Python coders is to indent four spaces per new block of code.

7.1 Conditionals
Python uses the boolean True and False objects in conditional statements:

The True and False objects behave as expected with the and, or and not boolean
logic statements:
In our programs, we often need to test whether some condition is True or False
before executing some block of code:
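Here is a short sketch of these boolean operations and of a condition guarding a block. The temperature value is made up:

```python
both = True and False    # False
either = True or False   # True
neither = not True       # False

temperature = 310.0                      # hypothetical value in kelvin
is_boiling = temperature >= 373.15       # a comparison produces True or False
if not is_boiling:
    print('Water is not boiling at {0} K'.format(temperature))
```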

7.2 if statements
One of the most basic statement types is the if statement, used to choose between different
code blocks depending on the result of a conditional test. For example,

The elif statement is short for else if, and these, as well as the else statement, are
optional.
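Here is a minimal if/elif/else sketch wrapped in an illustrative helper function. The name classify is hypothetical:

```python
def classify(n):
    """Describe the sign of n using if, elif and else."""
    if n < 0:
        return 'negative'
    elif n == 0:
        return 'zero'
    else:
        return 'positive'

for value in (-2, 0, 5):
    print('{0} is {1}'.format(value, classify(value)))
```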
7.3 for loops
In Python, the for statement loops over the elements in a sequence. While that sequence
can be a sequence of numbers,

it may be that you want to loop over the elements in a list of strings:

If you need to iterate over the indices of a sequence, you can do so by using the
range() and len() functions:

You can also accomplish this behavior with the enumerate() function:
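These for-loop forms are collected in one sketch with made-up data:

```python
# Loop over a sequence of numbers
total = 0
for n in [1, 2, 3, 4]:
    total += n

# Loop over a list of strings
elements = ['hydrogen', 'helium', 'lithium']
for name in elements:
    print(name)

# Loop over the indices with range() and len()
for i in range(len(elements)):
    print('{0} {1}'.format(i, elements[i]))

# The same thing with enumerate(), which yields (index, item) pairs
pairs = []
for i, name in enumerate(elements):
    pairs.append((i, name))
```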


The itertools.combinations() function can greatly simplify the situation when you
have a collection of items and want to know all the unique pairs. For example, to calculate
the gravitational potential energy of N point masses, we use the formula

$$U_{\rm grav} = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} U_{ij} = -\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} \frac{G M_i M_j}{r_{ij}}, \qquad (7.1)$$

where G is the gravitational constant and $r_{ij}$ is the distance between masses $M_i$ and $M_j$. So,
given some object that contains the mass and position vectors for all
particles in the system (see section 9.1 below), we could implement this double sum in
Python, identifying all the unique pairs with combinations(), as in the
following example (where we assume N = 3 particles):
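Here is a sketch of equation (7.1) using itertools.combinations(). To keep it self-contained, the three particles are plain (mass, position) tuples with made-up values rather than the particle objects of chapter 9:

```python
import itertools
import math

G = 6.674e-11  # gravitational constant in SI units

# Three hypothetical point masses: (mass in kg, (x, y, z) in m)
particles = [
    (1.0e3, (0.0, 0.0, 0.0)),
    (2.0e3, (1.0, 0.0, 0.0)),
    (3.0e3, (0.0, 2.0, 0.0)),
]

U_grav = 0.0
# combinations(..., 2) yields each unordered pair (i < j) exactly once
for (m_i, r_i), (m_j, r_j) in itertools.combinations(particles, 2):
    r_ij = math.sqrt(sum((a - b)**2 for a, b in zip(r_i, r_j)))
    U_grav -= G * m_i * m_j / r_ij   # each pair contributes -G M_i M_j / r_ij

print('U_grav = {0:.4e} J'.format(U_grav))
```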

Should you actually want to implement this, you may find it useful to know that you can
find the distance between two vector positions in a single line:
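One way to do this, assuming NumPy is available, is to apply numpy.linalg.norm() to the difference of the two position vectors. This is a sketch; the book's own one-liner is not reproduced here:

```python
import numpy as np

r1 = np.array([0.0, 0.0, 0.0])
r2 = np.array([3.0, 4.0, 12.0])
distance = np.linalg.norm(r2 - r1)   # Euclidean length of the separation vector
print(distance)  # 13.0
```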

7.4 while statements
An alternative method which can be useful is to repeat a block of code while some
condition is True and to stop when that condition becomes False:
The potential problem is that, if we are not careful, the stopping condition might never be met and
we then have an infinite loop. Generally, it is safer to use for loops.
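This sketch uses a while loop that halves a made-up step size until it drops below a tolerance. The loop is guaranteed to end because step shrinks on every pass:

```python
step = 1.0
tolerance = 1.0e-3
n_halvings = 0
while step > tolerance:
    step /= 2.0
    n_halvings += 1

print('{0} halvings, final step {1:.6f}'.format(n_halvings, step))
```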

7.5 break, continue and pass statements

Just as in other languages, a break statement breaks out of the current loop. In section
3.2 we saw how to use a loop to handle an error exception on user input, using the
break statement to break out of the loop. Here is a trivial example using the break
statement:
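Here is a trivial break sketch with made-up data. It stops scanning at the first negative value:

```python
values = [3, 8, -1, 7, 5]
first_negative = None
for v in values:
    if v < 0:
        first_negative = v
        break  # leave the loop immediately; 7 and 5 are never examined
print(first_negative)  # -1
```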

The continue statement skips the rest of the statements in the current loop block
and continues with the next iteration of the loop. The following example program
demonstrates the use of these statements:
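This example program is an illustration, not the book's. It uses continue to skip invalid entries and break to stop at a sentinel value. Both the data and the sentinel -999.0 are assumptions:

```python
#!/usr/bin/env python
"""Skip bad entries with continue; stop at a sentinel with break."""
import math

data = [2.0, -1.0, 4.0, 0.0, 9.0, -999.0, 16.0]  # -999.0 marks the end of valid data
roots = []
for value in data:
    if value == -999.0:
        break        # sentinel reached: leave the loop (16.0 is never read)
    if value < 0.0:
        continue     # invalid entry: skip the rest of this iteration
    roots.append(math.sqrt(value))

print(roots)
```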
Make the code executable (for example, with chmod +x at the shell prompt) and then run it:

The pass statement is a null operation. It is useful as a placeholder when a statement
is required by the syntax of the language, but no code needs to be executed. An example
of its use is given in the following chapter.
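Here is a small sketch of places where the syntax requires a statement but nothing needs to happen. All names are hypothetical:

```python
class EmptyRecord(object):
    pass  # a class body with nothing in it yet

def not_written_yet():
    pass  # placeholder function body

for i in range(3):
    pass  # a loop that deliberately does nothing
```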
Chapter 8

Functions and modules


8.1 Introduction: coding best practices
Functions and modules allow us to gather code statements into a single block that can be
called from anywhere else in the program. Functions are useful for avoiding code
duplication and also make the overall program easier to modify and easier to understand.
For example, you might write a function that searches through a list of alphabetized names
to return a person’s phone number. Your first working code might simply perform a linear
search, starting at the beginning of the list each time the function is called and comparing
item-by-item until the desired name is found—fine for prototyping with only a few
hundred names and easy to write. But if this function needs to be called thousands to
millions of times and the list holds a million names, you will need to replace it with a
function that implements a more efficient search strategy (e.g., a binary search, which exploits the alphabetical ordering) with the same interface to the
calling program. Indeed, it is good programming style to write your program so that, as
much as is practical, all tasks are put into functions where each function does one thing
and is simple to understand. When writing a larger code than the simple examples we have
discussed so far, your overall development time will generally be reduced if you follow
some ‘best practices’ for coding:
1. Know exactly what the overall code, and each function within the code, is supposed
to do.
2. Include a comment block at the top that explains the purpose of the function, and the
inputs and outputs, and that includes example results from calling the function.
3. Debug as you develop your code. Start with something very simple that works and
build in one new feature at a time, always maintaining a working code.
4. Keep your code simple so others can understand it.
5. Use variable names that are descriptive and maintain consistent naming conventions.
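Here is a sketch of the phone-book example above. A linear search is swapped for a binary search (via the standard bisect module) without changing the interface. The names, numbers and function names are hypothetical:

```python
import bisect

# A hypothetical alphabetized phone book stored as parallel lists
names = ['Adams', 'Baker', 'Chen', 'Diaz', 'Evans']
numbers = ['555-0101', '555-0102', '555-0103', '555-0104', '555-0105']

def lookup_linear(name):
    """Return the number for name by checking each entry in turn (O(N))."""
    for i in range(len(names)):
        if names[i] == name:
            return numbers[i]
    return None

def lookup_binary(name):
    """Same interface, but bisects the sorted names list (O(log N))."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return numbers[i]
    return None

print(lookup_binary('Chen'))  # 555-0103
```

Because both functions have the same arguments and return values, the calling program does not need to change when the faster version replaces the prototype.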

Some blocks of code will be useful in multiple projects. An example would be the
standard math routines (square roots, trigonometric functions, etc.). These of course are so
useful that they are part of the distribution of any language. But perhaps your simulation
program writes output files in a specific format and you have several different programs
that you use to analyze and visualize the results. You could cut and paste the lines of code
that read your file format from program 1 to program 2, to program 3, etc., but then if you
change the output format of your simulation program you have to update all of your other
programs. Instead, it is more efficient to put your file-reading code into a module that
you can import into any program. Then if you change the file format, you only need to
change the code in the module, not in each program separately.
Tim Peters posted the following on June 4, 1999, with the title
‘The Python Way’1. It has since come to be known as The Zen of Python and is an ‘easter
egg’ available at any time using import this at the interpreter prompt:

8.2 Simple Python functions and modules


You can define a function interactively,

but of course you will generally want to save your code in files for later use or simply for
a more efficient code development process. A module is simply a file that contains one or
more function definitions and associated statements.
The simple form of a function definition is
The function takes one or more arguments, does something with them and returns
a result. The returned result is called the return value. Note that it is possible for a function
to take no arguments and return no explicit result, but when defining a function you must
include the empty parentheses after the function name, even if you do not pass any
arguments to the function.
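Here is a minimal sketch of both cases: a function with arguments and a return value, and one with no arguments. Both functions are hypothetical examples:

```python
def kinetic_energy(mass, speed):
    """Return the kinetic energy (J) of a mass (kg) moving at speed (m/s)."""
    return 0.5 * mass * speed**2

def say_hello():
    """No arguments, but the empty parentheses are still required."""
    print('Hello!')

energy = kinetic_energy(2.0, 3.0)
say_hello()
```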
The following example shows a common use of the pass statement introduced in the
previous chapter. A valid function definition must have at least one statement following
the definition line, so when developing a program or module, you may use the pass
statement as a placeholder for a function you have not yet written (sometimes
called a program stub), or instead you might opt to include a docstring that describes what
the function will eventually do (a # comment on its own does not count as a statement):

Neither of the functions saved in the module file explicitly returns anything, but even
functions without an explicit return statement still return the object None. The None
object has its own type, NoneType; it is not the string 'None'.
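Here is a sketch of the two kinds of stub just described, with hypothetical function names, followed by a check of what they return:

```python
def read_data(filename):
    pass  # stub: pass satisfies the requirement for a function body

def analyze(data):
    """Fit a model to data (to be written; a docstring alone is a valid body)."""

result = read_data('hypothetical_file.txt')
print(result)            # None
print(type(result))      # NoneType (shown as <class ...> or <type ...>)
print(result == 'None')  # False: None is not the string 'None'
```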
As another example, here is a function that converts miles per hour (mph) to meters
per second (m/s), saved in its own module file:

If we import and call this code interactively, the interpreter prints the result to the
terminal, but no variables are actually set. In order to set a variable to the result, we have
to call the function with something of the form

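Here is a sketch of such a conversion function and of assigning its result to a variable. The function and constant names are hypothetical. The conversion factor is exact because 1 mile is 1609.344 m and 1 hour is 3600 s:

```python
MPS_PER_MPH = 0.44704  # meters per second in one mile per hour (exact)

def mph2mps(mph):
    """Convert a speed in miles per hour to meters per second."""
    return mph * MPS_PER_MPH

speed = mph2mps(60.0)   # the return value is now stored in a variable
print(speed)            # about 26.8224
```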
The special variable __name__ is set to the module name if the module is imported and to '__main__'
if it is run as a program. For example, consider a single-line module file:

Notice how the output changes between importing the code and running the code from
the IPython interpreter with %run or from the shell prompt:
This behavior is very useful, because it allows us to include a block of code in a
module that only executes if the module is run directly, but does not run if the module is
imported by another module or the main module (the top-level program or when in the
interpreter). It is considered good programming practice to include statements that
demonstrate the functionality of the module if the module is run directly, using an
if __name__ == '__main__': statement. For example, let us revisit our conversion
module, where we now also include the inverse of our original function and a
function that gives examples of running the functions in the module:
Accessing the module both by importing it and by running it directly gives the following, where we
also demonstrate how to access the docstrings using __doc__:
8.3 Functions with keyword arguments
Python also has the ability to call functions with keyword arguments. These are useful for
two related reasons. First, a parameter can be assigned a default value, but
that value can be overridden if needed. For example, you may have a function that
calculates the free fall time t from a given height y and the final speed v at the end of that
fall, assuming the acceleration a is constant. The 1D equation for the speed is $v = at$, and we solve
the familiar equation

$$y = \frac{1}{2} a t^2$$

for the time:

$$t = \sqrt{\frac{2y}{a}}.$$

So our function definition might be as follows, in its own module file:
which we could run with

We could use keyword arguments to make this slightly more interesting. We would
like the default height to be 1 m, and the default acceleration to be the gravitational
acceleration at the surface of the Earth, but would also like the ability to use different
values for the height and acceleration if desired. We then have for our function

We can now call this function without any arguments and the defaults will be used. We
can also call it using the keyword arguments, in which case the order
does not matter:
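Here is a sketch of such a function with keyword defaults. The names fall, height and accel are hypothetical, and 9.81 m/s^2 is an assumed value for the Earth's surface gravity:

```python
import math

def fall(height=1.0, accel=9.81):
    """Return (time in s, final speed in m/s) for a fall from rest
    through height (m) at constant acceleration accel (m/s^2)."""
    t = math.sqrt(2.0 * height / accel)  # from y = a t^2 / 2
    v = accel * t                        # from v = a t
    return t, v

print(fall())                        # both defaults: 1 m on Earth
print(fall(height=10.0))             # override one default
print(fall(accel=1.62, height=2.0))  # keyword order does not matter (lunar gravity)
```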
Keyword arguments are used a great deal in calls to the Matplotlib plotting functions
discussed in chapter 10 below.

8.4 Functional programming: list comprehension, lambda, map() and filter()

8.4.1 Introduction
Python includes functional programming capabilities in addition to procedural
programming capabilities. Many common programming languages (e.g., C, Fortran,
Pascal, BASIC) are procedural, meaning that the program contains a series of
computational steps to be completed. Procedures (also variously called routines,
subroutines, or functions) can call other procedures, but the net result is a series of
programming statements that are completed one after the other. Functional programming
is also modular in approach, but functional programming languages tend to de-emphasize
or even remove the imperative elements of procedural programming2. Python is one of
several languages (e.g., C++, Java, Perl, MATLAB, Visual Basic .NET) that are multi-paradigm.
Here we will briefly introduce the most useful functional programming features
of Python.

8.4.2 List comprehension and generator comprehension


We introduced list comprehension in section 4.2.3 above, but repeat the essentials here for
completeness. The general form of a list comprehension is

where the brackets ‘[’ and ‘]’ help remind us that the result will be a list object. A simple
example using a list comprehension is
Here is a more complex example using a nested list comprehension that returns a list
of the prime numbers below 50. A first list contains all the numbers from 4 to 49
that are divisible by 2, 3, 4, … 7 (many of them more than once, e.g., 12 occurs four
times). The list of primes is then created by finding the integers between 2 and 49 that are not
contained in the first list:
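Here is a sketch of the two comprehensions just described. The list names are our own. Every composite number below 50 has a factor no larger than 7 (since 7 × 7 = 49), so factors 2 through 7 suffice:

```python
# Composite numbers below 50: multiples of i = 2..7, starting at 2*i
noprimes = [j for i in range(2, 8) for j in range(i * 2, 50, i)]
# Primes: the integers 2..49 that never appear in noprimes
primes = [x for x in range(2, 50) if x not in noprimes]
print(primes)
```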

This method is reasonably efficient for finding small primes, but for finding very large
primes the above code would quickly fill all available system memory with the list of
composite numbers. In such a case, a generator comprehension would be more appropriate, since
generator objects simplify the task of writing iterators and maintain their state between
calls. That is, instead of storing the entire list, a generator returns only one item
each time it is called, and so it is more memory-efficient than a list comprehension when the list is
just an intermediate step and does not actually need to be stored.
Here is a trivial example of a generator comprehension statement. Note that the
surrounding parentheses indicate that the statement returns a generator object:

The same behavior is available if we write a generator function saved to a module file:

This can be used as

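Here is a sketch of a generator comprehension and an equivalent generator function. The name square_gen is hypothetical. Each call to next() produces one more value on demand:

```python
# Generator comprehension: parentheses instead of square brackets
squares = (n * n for n in range(5))
print(next(squares))   # 0
print(next(squares))   # 1
print(list(squares))   # [4, 9, 16], the items not yet consumed

def square_gen(nmax):
    """Generator function equivalent to (n * n for n in range(nmax))."""
    for n in range(nmax):
        yield n * n    # execution pauses here until the next value is requested

for value in square_gen(3):
    print(value)
```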

There is a lot more to say about generators, but they are not essential here and so are
arguably beyond the scope of this book. For more on generators, see
www.python.org/dev/peps/pep-0255 and the discussion of generators on the jeffknupp.com blog.

8.4.3 The lambda function

The lambda function (or operator) provides a convenient method to create
small unnamed functions that are used in place, when needed and where created. They are
often used in conjunction with the filter(), map() and reduce() functions, and
can improve both the conciseness and readability of code if used well. Lambda functions
can take any number of arguments but return just one value, the result of evaluating a
single expression. Like functions defined with def, lambda functions have their own local
namespace; they can use the variables in their parameter list as well as variables from
enclosing function scopes and from the global namespace.
The syntax of a lambda function is

For example,

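Here is a short sketch of the lambda syntax, lambda arguments: expression, with hypothetical names. It also shows a lambda reading a variable from an enclosing function:

```python
# lambda arg1, arg2, ...: expression
area = lambda width, height: width * height

def area_def(width, height):
    """The equivalent named function."""
    return width * height

def make_multiplier(factor):
    # The lambda uses factor from the enclosing function's scope
    return lambda x: factor * x

triple = make_multiplier(3)
print(area(3.0, 4.0))  # 12.0
print(triple(5))       # 15
```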
The lambda function can be useful in situations where a simple function needs to be
passed as an argument to an equation solver or minimizer. For example, the
scipy.optimize module contains many useful optimization algorithms. Here is a
simple example using Brent’s method as implemented in scipy.optimize to find the local
minimum of the univariate function $f(x) = (x+4)(x-2)^2$, which is shown in figure 8.1:
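SciPy's Brent routine is not used in this sketch. To keep the example self-contained, the lambda is passed to a small hand-written golden-section search instead. Golden-section search is a simpler relative of Brent's method and is swapped in here purely for illustration. The bracket [0, 4] is an assumption: on that interval the function has a single minimum. This matters because the cubic decreases without bound as x becomes large and negative:

```python
import math

def golden_section_min(f, a, b, tol=1e-8):
    """Locate the minimum of f on [a, b], assuming f has a single minimum there."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/golden ratio, about 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
    return 0.5 * (a + b)

# The function to minimize is passed in as an unnamed lambda
xmin = golden_section_min(lambda x: (x + 4.0) * (x - 2.0)**2, 0.0, 4.0)
print('{0:.6f}'.format(xmin))  # 2.000000
```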

Figure 8.1. The function $(x+4)(x-2)^2$ has a local minimum at x = 2.0.

Another use of the lambda function is for those who use Tkinter3 or
wxPython4 to make GUIs. This example, written by Michael
Driscoll5, demonstrates how useful the lambda function can be in this context. A full
discussion of Tkinter is beyond the scope of this book, but the key lines in this example
are those that create the buttons. Note that these statements create a Button
instance and bind it to a lambda function in a single line. The lambda
function is assigned to the button’s command parameter.
In this case, the button object is said to ‘call back’ to the function object specified as its
command. It turns out that using the lambda function to implement so-called callbacks
to the Tkinter (or wxPython) GUI frameworks is one of the most frequent uses of the lambda
function:
8.4.4 The map() function
The map() function has similar functionality to list comprehension. It allows us to map
one list onto another using some function. In other words,

calls the function for each of the sequence’s elements and assigns the
resulting list to a variable.
So, recalling our conversion functions, we have the following example that uses
map() with lambda and avoids the separate function definition altogether:
And this can be reduced to a single line using a list comprehension as follows:

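Here is a sketch of map() with a lambda, alongside the equivalent list comprehension, using made-up speeds. In Python 3, map() returns an iterator rather than a list, so the result is wrapped in list():

```python
speeds_mph = [10.0, 30.0, 60.0]

# map() applies the lambda to each element in turn
speeds_mps = list(map(lambda mph: mph * 0.44704, speeds_mph))

# The same result with a list comprehension
speeds_mps_lc = [mph * 0.44704 for mph in speeds_mph]
print(speeds_mps)
```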
The map() function can also be applied to more than one list at a time, with the
proviso that the lists should have the same length:

This can also be accomplished with a list comprehension, as noted above:

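Here is a sketch of map() over two lists, with made-up masses and velocities, and the zip()-based list comprehension that does the same thing:

```python
masses = [1.0, 2.0, 3.0]       # kg
velocities = [4.0, 5.0, 6.0]   # m/s

# The lambda takes one argument from each list
momenta = list(map(lambda m, v: m * v, masses, velocities))
momenta_lc = [m * v for m, v in zip(masses, velocities)]
print(momenta)  # [4.0, 10.0, 18.0]
```

If the lists differ in length, Python 3's map() stops at the shortest list, while Python 2's map() pads the shorter lists with None. Keeping the lengths equal avoids surprises in either version.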
8.4.5 The filter() function

The filter() function extracts the elements of a sequence for which a given function
returns True. For example, to keep only the positive values of a list,

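Here is a minimal filter() sketch with made-up values. As with map(), Python 3 returns an iterator, so the result is wrapped in list():

```python
values = [3.2, -1.5, 0.0, 7.1, -0.2]
positives = list(filter(lambda v: v > 0, values))
print(positives)  # [3.2, 7.1]
```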
To extend this, we might have a data set for which we would like to remove data
points that are more than some number of standard deviations away from the mean, a
process called sigma clipping. For a true normal (i.e., Gaussian) distribution, we expect
99.7% of the samples to fall within ±3σ of the mean. For 1000 numbers drawn from a normal
distribution, we therefore expect roughly three samples to fall outside this range. In the following
example, we draw 1000 samples from a normal distribution with mean 0.0 and σ = 1.0,
then filter those to return the values that are more than 3σ from the mean:
The comma at the end of the print statement suppresses the default newline. Note that
the same result can be obtained with a list comprehension, which is arguably easier to read:

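Here is a self-contained sketch of this one-pass outlier search. It uses the standard random and statistics modules (Python 3.4+) in place of NumPy, with an arbitrary seed so the draw is reproducible:

```python
import random
import statistics

random.seed(42)                                   # reproducible draw
samples = [random.gauss(0.0, 1.0) for _ in range(1000)]
mean = statistics.mean(samples)
sigma = statistics.pstdev(samples)                # population standard deviation

# filter() with a lambda ...
outliers = list(filter(lambda v: abs(v - mean) > 3.0 * sigma, samples))
# ... and the equivalent list comprehension
outliers_lc = [v for v in samples if abs(v - mean) > 3.0 * sigma]
print('{0} samples lie more than 3 sigma from the mean'.format(len(outliers)))
```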
Should you actually need to sigma clip your data, you can use an existing library routine
(scipy.stats, for example, provides sigmaclip()). Applied to the same array from the previous
example, such a routine will return an array with the four outliers removed. Note that
this routine is iterative, so after having removed outliers, the mean and standard deviation
of the culled sample are again computed and any outliers removed. This continues until no
outliers remain, so use it with caution:
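Here is a hand-written sketch in pure Python (3.4+ for statistics) that makes the iterative procedure concrete. It is not the library routine mentioned above. Its details are assumptions: it uses the population standard deviation and keeps points at exactly nsigma:

```python
import random
import statistics

def iterative_sigma_clip(values, nsigma=3.0):
    """Repeatedly drop points more than nsigma standard deviations from the
    mean, recomputing mean and standard deviation after each pass, until a
    pass removes nothing."""
    kept = list(values)
    while True:
        mean = statistics.mean(kept)
        sigma = statistics.pstdev(kept)
        survivors = [v for v in kept if abs(v - mean) <= nsigma * sigma]
        if len(survivors) == len(kept):
            return survivors
        kept = survivors

random.seed(2015)
samples = [random.gauss(0.0, 1.0) for _ in range(1000)]
clipped = iterative_sigma_clip(samples)
print('{0} of {1} samples kept'.format(len(clipped), len(samples)))
```

Each pass shrinks sigma, so later passes can remove points that survived earlier ones. This is why an iterative clip deserves the caution noted above.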
1Source: www.wefearchange.org/2010/06/import-this-and-zen-of-python.html.
2For a more in-depth treatment of the topics in this subsection, see docs.python.org/2/howto/functional.html.
3Tkinter is the most commonly used GUI programming toolkit for Python, but there are others. See
wiki.python.org/moin/TkInter for links to more information and tutorials.
4www.wxpython.org
5www.blog.pythonlibrary.org/2010/07/19/the-python-lambda
Chapter 9

Classes and class methods


9.1 Introduction
Up to this point, we have used many of Python’s built-in object types: strings, lists,
tuples, etc. The class object lets us define our own object type. This is a very powerful and
flexible feature, but it is also a bit more involved than what we have discussed so far1. As
we have discussed, Python supports object-oriented programming. Everything in Python
is an object, and the program executes by operating on the objects.
For our example, let us consider what object type would be useful for a code that
computed the time evolution of a system of point particles subject to physical forces
(typically pairwise between the particles). Such a code is called an N-body code and these
are used widely in physics and astrophysics. For our specific example, let us assume that
we have N point masses interacting via the gravitational force2.
Conceptually, the method is straightforward. A single step in time $\delta t$ in the simulation
consists of:
1. Calculate the net gravitational force $\vec{F}_i$ on each particle i resulting from the other
N − 1 particles.
2. Update each particle’s velocity using $\vec{v}_i^{\rm new} = \vec{v}_i^{\rm old} + (\vec{F}_i / M_i)\,\delta t$.
3. Update each particle’s position using $\vec{r}_i^{\rm new} = \vec{r}_i^{\rm old} + \vec{v}_i^{\rm new}\,\delta t$.
Once the particle positions are updated, the code again calculates the forces and so on.
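Here is a sketch of one time step following the three steps above. It uses plain lists rather than the class developed below, so it illustrates the array-based approach described next. G = 1 (code units), the function name and the two-particle setup are all assumptions:

```python
import math

G = 1.0  # gravitational constant in hypothetical code units

def nbody_step(masses, positions, velocities, dt):
    """Advance N point masses by one time step dt, in place.
    positions and velocities are lists of [x, y, z] lists."""
    n = len(masses)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    # Step 1: net gravitational force on each particle from the other N - 1
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r = math.sqrt(d[0]**2 + d[1]**2 + d[2]**2)
            f = G * masses[i] * masses[j] / r**3   # |F| / r, so f * d[k] is a component
            for k in range(3):
                forces[i][k] += f * d[k]   # i is pulled toward j
                forces[j][k] -= f * d[k]   # equal and opposite force on j
    # Step 2: new velocities; step 3: new positions using the new velocities
    for i in range(n):
        for k in range(3):
            velocities[i][k] += forces[i][k] / masses[i] * dt
            positions[i][k] += velocities[i][k] * dt

masses = [1.0, 2.0]
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
velocities = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
nbody_step(masses, positions, velocities, dt=0.01)
```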
Thus, for each particle, we must keep track of the position, velocity and mass. The
way this is typically accomplished in a procedural programming language is to use
seven arrays, each of length N, which hold the positions ($x_i$, $y_i$, $z_i$), velocities ($v_{x,i}$, $v_{y,i}$, $v_{z,i}$)
and masses ($M_i$) of the N particles. However, making use of the object-oriented
programming features of Python, we can define a new data type, Particle. A user-defined
type is also called a class. We can define this new class and assign an object to it
as follows:
The new object is called an instance of the class, and creating the new object is called
instantiation.
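Here is a minimal sketch of defining an empty Particle class and instantiating it. Inheriting from object gives a new-style class in Python 2 and is harmless in Python 3:

```python
class Particle(object):
    """A point particle; attributes will be attached after instantiation."""
    pass

p1 = Particle()                   # instantiation
print(isinstance(p1, Particle))   # True: p1 is an instance of Particle
```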

9.2 Class attributes


We can assign positions, velocities and masses using dot notation. These elements are
called attributes of the instance.

The values of the attributes can be retrieved again with dot notation and we can use
them in any valid expression:
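Here is a sketch of attaching attributes with dot notation and using them in an expression. The attribute names and values are hypothetical:

```python
class Particle(object):
    pass

p1 = Particle()
p1.x, p1.y, p1.z = 1.0, 2.0, 0.0       # position (m)
p1.vx, p1.vy, p1.vz = 0.5, 0.0, 0.0    # velocity (m/s)
p1.m = 2.0                             # mass (kg)

# Attributes can be used in any valid expression
kinetic = 0.5 * p1.m * (p1.vx**2 + p1.vy**2 + p1.vz**2)
print(kinetic)  # 0.25
```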

An instance can be passed as an argument to a function. For example, we probably
want a function that prints the attributes of a particle:
Then we would have

Note that even without having implemented a function to print the values of the
attributes of our instance, we can always print all the attributes of our object using the
pprint module, which ‘pretty prints’ any Python data structure in a form that could be
used as input to the interpreter:
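Here is a sketch of both approaches with hypothetical names. Calling pprint(vars(p1)) is one way to pretty-print an instance's attributes, since vars() returns the instance's attribute dictionary. The book's exact call is not shown:

```python
from pprint import pprint

class Particle(object):
    pass

def print_particle(p):
    """Print the position, velocity and mass of a particle."""
    print('r = ({0}, {1}, {2})'.format(p.x, p.y, p.z))
    print('v = ({0}, {1}, {2})'.format(p.vx, p.vy, p.vz))
    print('m = {0}'.format(p.m))

p1 = Particle()
p1.x, p1.y, p1.z = 1.0, 2.0, 0.0
p1.vx, p1.vy, p1.vz = 0.5, 0.0, 0.0
p1.m = 2.0

print_particle(p1)
pprint(vars(p1))
```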

Functions can return instances. Here is an example function that returns a particle with
the center of mass position and velocity of two
particles, with the mass equal to the sum of the two particle masses:

When we run this we obtain the expected result, and it is an instance of the Particle
class:
Like lists, instances are mutable, so, for example, we could have a function
that adds to the mass of one of our particles:

When run, assuming our particle is already an instance of the Particle class,

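Here is a sketch of a function that returns a new instance and a function that modifies an existing one. Positions and velocities are stored as 3-element lists, and the helper make_particle and the other names are hypothetical:

```python
class Particle(object):
    pass

def make_particle(m, r, v):
    """Hypothetical helper: build a Particle from mass, position and velocity."""
    p = Particle()
    p.m, p.r, p.v = m, list(r), list(v)
    return p

def center_of_mass(p1, p2):
    """Return a new Particle at the center of mass of p1 and p2, moving with
    the center-of-mass velocity and carrying the total mass."""
    mtot = p1.m + p2.m
    r = [(p1.m * a + p2.m * b) / mtot for a, b in zip(p1.r, p2.r)]
    v = [(p1.m * a + p2.m * b) / mtot for a, b in zip(p1.v, p2.v)]
    return make_particle(mtot, r, v)

def add_mass(p, dm):
    """Modify p in place by adding mass dm; nothing is returned."""
    p.m += dm

a = make_particle(1.0, (0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
b = make_particle(3.0, (4.0, 0.0, 0.0), (0.0, 0.0, 0.0))
cm = center_of_mass(a, b)
print('m = {0}, r = {1}, v = {2}'.format(cm.m, cm.r, cm.v))
add_mass(a, 0.5)   # a itself is changed
```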
9.3 Copying and deep copying


To obtain an independent copy of an instance, again use the copy module. As with lists, a simple
assignment statement makes both names refer to the same object, so changing
the value of an attribute through one name changes it as seen through the other:
Instead of having the position and velocity attributes defined as we did above, we
might have opted to create individual classes for each of
these vector quantities and then used them within the container object:

Our initialization would be

In this case, if we use copy.copy() in an attempt to make an independent copy of the
particle, we run into a problem. The embedded position and velocity instances are not copied; they still refer to
the original objects, so changing them in the copy also changes them in the original:

In such a case, we must make use of the deepcopy() function available in the copy
module. This function copies all levels of objects and so does return a completely
independent copy of the object. It is slower than copy.copy(), but there are times when it is
unavoidable:
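Here is a sketch contrasting assignment, copy.copy() and copy.deepcopy() for a particle that holds a separate vector object. The Vector class and the attribute names are hypothetical:

```python
import copy

class Vector(object):
    def __init__(self, x=0.0, y=0.0, z=0.0):
        self.x, self.y, self.z = x, y, z

class Particle(object):
    pass

p1 = Particle()
p1.m = 1.0
p1.pos = Vector(1.0, 2.0, 3.0)

alias = p1                  # same object, not a copy at all
shallow = copy.copy(p1)     # new Particle, but it shares the same Vector
deep = copy.deepcopy(p1)    # new Particle with its own new Vector

shallow.m = 5.0             # top-level attribute: p1.m is unaffected
p1.pos.x = 99.0             # nested object: visible through shallow, not deep
print('{0} {1} {2}'.format(alias.pos.x, shallow.pos.x, deep.pos.x))  # 99.0 99.0 1.0
```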
9.4 Methods
Methods are functions associated with a particular class. For example, the string class
has many methods associated with it:

Returning to our original definition of the Particle class3, we can bring our print
function into the class definition and make it a print method:

The convention is to use self as the first parameter of a method. Note that a method
is invoked using dot notation, as in the example above:

If we include our other functions as methods, we can invoke them in the same way, since self is
automatically assigned to the instance on which the method is invoked:
When a class is instantiated, the __init__ method is invoked, if present. This is
where you should set default values for the attributes of your class. For our particle
example, we might have inside class Particle:

Now when we create a particle, the values of the attributes are set, even if we instantiate the class with
no arguments:

We can create a particle using some or all of the arguments, passed as positional
arguments or keyword arguments:
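Here is a sketch of a Particle class with an __init__ method that sets default attribute values, plus two ordinary methods. The attribute layout and the default mass of 1.0 are assumptions, not the book's definition:

```python
class Particle(object):
    """Point particle with position, velocity and mass."""

    def __init__(self, x=0.0, y=0.0, z=0.0, vx=0.0, vy=0.0, vz=0.0, m=1.0):
        # Called automatically on instantiation; sets the attribute values
        self.x, self.y, self.z = x, y, z
        self.vx, self.vy, self.vz = vx, vy, vz
        self.m = m

    def print_particle(self):
        """Method version of the print function; self is the instance."""
        print('r = ({0}, {1}, {2})'.format(self.x, self.y, self.z))
        print('v = ({0}, {1}, {2})'.format(self.vx, self.vy, self.vz))
        print('m = {0}'.format(self.m))

    def kinetic_energy(self):
        return 0.5 * self.m * (self.vx**2 + self.vy**2 + self.vz**2)

p0 = Particle()                 # all defaults
p1 = Particle(1.0, 2.0, 3.0)    # positional arguments: x, y, z
p2 = Particle(m=5.0, vx=2.0)    # keyword arguments, in any order
p2.print_particle()             # equivalent to Particle.print_particle(p2)
```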
The __str__ method is another special method that is designed to return a string
representation of an object. We can modify our print function to fill this role. The
__str__ method is invoked when you print an object. Within class Particle, we have
You can overload operators if needed, such as +, −, ×, ÷. For example, if we wanted
the + symbol to represent an inelastic collision as discussed above, we could add the
method __add__ to our class:
Then invoking this method is as simple as

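Here is a sketch of __str__ and __add__ on a one-dimensional version of the class, kept 1D for brevity. Here + merges two particles at their center of mass, conserving mass and momentum as in a perfectly inelastic collision. The names are hypothetical:

```python
class Particle(object):
    def __init__(self, x=0.0, vx=0.0, m=1.0):
        self.x, self.vx, self.m = x, vx, m

    def __str__(self):
        # Used automatically by print() and str()
        return 'Particle(x={0:g}, vx={1:g}, m={2:g})'.format(self.x, self.vx, self.m)

    def __add__(self, other):
        """Perfectly inelastic collision: merge at the center of mass."""
        mtot = self.m + other.m
        x = (self.m * self.x + other.m * other.x) / mtot
        vx = (self.m * self.vx + other.m * other.vx) / mtot
        return Particle(x, vx, mtot)

a = Particle(x=0.0, vx=2.0, m=1.0)
b = Particle(x=4.0, vx=0.0, m=3.0)
c = a + b      # calls a.__add__(b)
print(c)       # Particle(x=3, vx=0.5, m=4)
```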
This ends our discussion of Python class objects but, as you can imagine, we have
barely scratched the surface of the discussion of classes or object-oriented programming.
For more information, see chapter 9 of The Python Tutorial
(docs.python.org/2/tutorial/classes.html).
1For an excellent discussion of Python classes and their implementation, see Downey A 2012 Think Python: How to
Think Like a Computer Scientist (Needham, MA: Green Tea) www.greenteapress.com/thinkpython.
2In many N-body codes, the masses of all particles are identical, which reduces computation time. For more on N-body
methods (and more), see Bodenheimer P, Laughlin G P, Różyczka M and Yorke H W 2007 Numerical Methods in
Astrophysics: An Introduction (Boca Raton, FL: Taylor and Francis).
3Note: the complete definition of the Particle class as discussed in this chapter is available at the companion website
pythonessentials.com in the file _ .
Chapter 10

Making plots with Matplotlib


It is time to make some plots. The matplotlib.pyplot module is a large and
growing collection of functions that make Matplotlib work similarly to MATLAB. Each
function changes some aspect of the plot area, such as creating the figure and a
plotting area within the figure, adding lines, points, text, changing the fonts and axis
labels, etc. We will explore several demonstrative examples below, but when it comes time
for you to make your own publication-quality plots, I recommend you visit the Matplotlib
thumbnail gallery at matplotlib.org/gallery.html for examples that may be more closely
related to the vision that you have for your own figures.

10.1 Simple line and point plots


Basic plots are almost trivial to create with the function. If you pass a
1D list of numbers, it assumes the values are y values and the x values are assumed to be
integers starting with 0. The default is to connect points with lines. For example, start an
IPython shell and enter the commands

and you should obtain a plot that looks something like figure 10.1.
Figure 10.1. A simple line plot.

For a simple point plot (see figure 10.2), the code file ( _ _ )
might be
Figure 10.2. A simple point plot.

Note that to leave a little white space between the plotted points and the plot frame we
have specified the plot axes with the
command, which expects a list as an argument (you do not pass the values directly).
The complete list of arguments to for setting the line/point type and
color is too long to include here, but a useful subset includes
You can also specify grayscale intensities as a string (‘0.6’), or specify the color as a
hex string (‘#5D0603’). You can also specify the .
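As an illustration of these options, here is a small sketch (the data values are made up) that combines a format string, the single-list form of `plt.axis()`, a grayscale color string and a hex color string. The `'ro'`, `'s'` and `'^'` codes are standard Matplotlib format characters; we call `plt.axis()` last so that it sets the final ranges.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit for an interactive window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

plt.figure()
plt.plot(x, y, "ro")                                     # red circles, no line
plt.plot(x, [v - 1.0 for v in y], "s", color="0.6")      # gray squares
plt.plot(x, [v + 1.0 for v in y], "^", color="#5D0603")  # hex-colored triangles
# One list [xmin, xmax, ymin, ymax], padded to leave white space around the points
plt.axis([0, 6, 0, 12])
```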
To plot multiple data sets on the same axis, just call multiple times.
The axes will autoscale to fit all of the data as shown in figure 10.3, but because
does not automatically insert white space between the data sets and the
axes, you will usually want to tweak your final plot ranges manually, as in the previous
example. Note this code ( _ _ ) also demonstrates the use of the
function:
Figure 10.3. A figure showing multiple data sets and a legend.
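A comparable sketch with three curves and a legend might look like the following; the functions plotted and the legend location are our choices, not necessarily those of the original example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit for an interactive window
import matplotlib.pyplot as plt

x = np.linspace(0.0, 2.0 * np.pi, 50)
plt.figure()
plt.plot(x, np.sin(x), "b-", label="sin(x)")
plt.plot(x, np.cos(x), "r--", label="cos(x)")
plt.plot(x, 0.5 * np.sin(2.0 * x), "go", label="0.5 sin(2x)")

# Autoscaling fits the data tightly, so pad the ranges by hand
plt.axis([-0.2, 2.0 * np.pi + 0.2, -1.2, 1.2])
plt.xlabel("x")
plt.ylabel("y")
leg = plt.legend(loc="lower left")   # uses the label= strings above
```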

10.2 Including error bars


If you are plotting data values, there is a good chance you will also want to include error
bars on your plots. Here is an example ( ) using
to generate pseudo-random numbers from the standard normal distribution—i.e., a
Gaussian distribution with mean 0 and variance 1—as well as the
function that returns values from a uniform distribution in the
half-open interval [0.0,1.0) (see figure 10.4):

Figure 10.4. An example demonstrating the use of error bars.
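A minimal sketch along these lines uses `np.random.randn` and `np.random.rand` with `plt.errorbar`. The straight-line model, the error magnitudes and the fixed seed are assumptions for illustration. The last call shows the case discussed next, a single σ = 0.2 applied to every y value by passing a scalar to `yerr`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit for an interactive window
import matplotlib.pyplot as plt

np.random.seed(42)                       # reproducible pseudo-random numbers
x = np.arange(1.0, 11.0)                 # 10 points
y = 2.0 * x + np.random.randn(10)        # randn: standard normal (mean 0, variance 1)
xerr = 0.1 + 0.2 * np.random.rand(10)    # rand: uniform on [0.0, 1.0), so xerr in [0.1, 0.3)
yerr = 0.5 + np.random.rand(10)

plt.figure()
cont = plt.errorbar(x, y, xerr=xerr, yerr=yerr, fmt="o", capsize=3)
plt.axis([0, 11, 0, 24])
plt.xlabel("x")
plt.ylabel("y")

# Same sigma = 0.2 for every point, y errors only: a scalar is enough
plt.figure()
plt.errorbar(x, y, yerr=0.2, fmt="s")
```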


If you only have error bars in your y values and they are the same (e.g., σ = 0.2) for
all data points, you could simply use

10.3 Multiple plots on a page


It will often be useful to have multiple subplots together in a single figure. You might
show raw data in a top panel and processed data (with the same x values) below, or a long
time series that spans several panels, etc. To show multiple panels within a given figure,
you will use the function as in the following example
( _ ; see figure 10.5):
Figure 10.5. Demonstrating multiple plots on a page.

Note that here we first call to initialize our figure space. This is
optional but good practice. We next specify where the argument
says to arrange the plots in two rows and one column, and that we are plotting in the first
(top) panel until we give a new command. If you had, for example, a 2×2
grid of subplots, then the order , , and would be top left, top right,
bottom left and bottom right, respectively.
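The sketch below shows both layouts: a two-row, one-column figure for raw and processed data, and a 2×2 grid whose panels are titled with their three-digit subplot codes so the ordering is visible. The data are synthetic stand-ins of our own.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit for an interactive window
import matplotlib.pyplot as plt

plt.close("all")                          # start from a clean slate
t = np.linspace(0.0, 10.0, 200)
raw = np.sin(t) + 0.3 * np.cos(7.0 * t)  # synthetic "raw" data
smooth = np.sin(t)                        # synthetic "processed" data

plt.figure(1)            # initialize the figure space
plt.subplot(211)         # 2 rows, 1 column, panel 1 (top)
plt.plot(t, raw, "k-")
plt.ylabel("raw")
plt.subplot(212)         # panel 2 (bottom), same x values
plt.plot(t, smooth, "b-")
plt.xlabel("time")
plt.ylabel("processed")

# 2x2 grid: 221, 222, 223, 224 = top left, top right, bottom left, bottom right
plt.figure(2)
for code in (221, 222, 223, 224):
    plt.subplot(code)
    plt.title(str(code))
```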

10.4 Histogram plots


There are times when you will want to present your data as a histogram. In general you
will read in data, and you will know what kind of spacing you need. The routine
plots histograms in Matplotlib. If you only pass the array of data, the routine will pick the
minimum and maximum data values, the spacing and the number of bins to use. Most
often, you will want to specify the bin widths and boundaries. The following example
( ) shows how you might do this using to load an array with
random numbers drawn from a normal distribution with a standard deviation of 1.0 (see figure 10.6):
Figure 10.6. A simple histogram.
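A sketch of this approach, assuming 10 000 samples and bins of width 0.25 spanning −4 to +4 (both our choices), passes explicit bin edges to `plt.hist`, which returns the counts, the edges and the drawn bar patches.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit for an interactive window
import matplotlib.pyplot as plt

np.random.seed(1)
data = np.random.normal(loc=0.0, scale=1.0, size=10000)  # width (sigma) = 1.0

bins = np.linspace(-4.0, 4.0, 33)      # 33 edges -> 32 bins of width 0.25
plt.figure()
counts, edges, patches = plt.hist(data, bins=bins, color="0.7", edgecolor="k")
plt.xlabel("value")
plt.ylabel("number per bin")
```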

10.5 Quick and easy plotting routines for two-column data


Often when working on a project, you will just need a quick way to display two-column
data as a line plot. The example below ( ) is a code I keep as an executable in
my directory for just such a purpose1. This example demonstrates passing the
input file name on the command line and also outputs a syntax usage message if the
program is called with no arguments. The program adds a bit of white space between the
displayed data and the plot frame for aesthetic appeal. If a second command line argument
is passed, this is used as the title of the plot:
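We do not reproduce the original listing here; the following is a sketch of such a tool under our own assumptions. The file name `quickplot.py`, the helper names `padded_limits` and `quickplot`, and the 5% padding are hypothetical, and the usage message is written to standard error.

```python
#!/usr/bin/env python3
"""Quick look at two-column data.  Usage: quickplot.py datafile [title]"""
import sys

import numpy as np
import matplotlib.pyplot as plt


def padded_limits(v, frac=0.05):
    """Return (lo, hi) widened by frac of the data range on each side."""
    lo, hi = float(np.min(v)), float(np.max(v))
    pad = frac * (hi - lo) if hi > lo else 1.0
    return lo - pad, hi + pad


def quickplot(fname, title=None):
    """Read columns 1 and 2 of fname and draw them as a line plot."""
    x, y = np.loadtxt(fname, usecols=(0, 1), unpack=True)
    plt.figure()
    plt.plot(x, y, "b-")
    plt.xlim(*padded_limits(x))     # a little white space around the data
    plt.ylim(*padded_limits(y))
    if title:
        plt.title(title)
    return x, y


def main(argv):
    if len(argv) < 2:
        sys.stderr.write("usage: %s datafile [title]\n" % argv[0])
        return
    quickplot(argv[1], argv[2] if len(argv) > 2 else None)
    plt.show()


if __name__ == "__main__":
    main(sys.argv)
```

Saved with its `#!` line and made executable (`chmod +x quickplot.py`), it could be run as `quickplot.py data.txt "My title"`.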
10.6 Customization: text on plots and inset figures
The default fonts are sufficient for many plots, but for making publication-quality plots,
you will often want to increase the font size, you may want to use a different font and you
may find it useful to include LaTeX commands for special symbols. These changes can be
made using as shown in the example below ( _ ).
Inset plots can also be useful on occasion and these are straightforward to add using the
function. Finally, figures can be saved using the
function, with a user-defined value of dots per inch ( ). The example reads in some
NASA Kepler mission data for the binary star system V344 Lyrae, plots these data as
points and then uses inset plots to ‘zoom in’ and show the character of the light curve at
higher resolution (see figure 10.7):
Figure 10.7. A plot with inset figures and text.
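The V344 Lyrae data are not reproduced here, so the sketch below uses a synthetic light curve as a stand-in. It raises the default font size through `plt.rcParams`, uses Matplotlib's built-in mathtext (`$...$`) for symbols, adds an inset with `fig.add_axes`, and saves the figure with an explicit `dpi`. Writing to an in-memory buffer is only a convenience for this sketch; pass a file name such as `"lc.png"` instead.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit for an interactive window
import matplotlib.pyplot as plt

plt.rcParams.update({"font.size": 14, "font.family": "serif"})   # larger, serif text

# Synthetic stand-in for a light curve (not the Kepler V344 Lyr data)
t = np.linspace(0.0, 20.0, 4000)
flux = 1.0 + 0.02 * np.sin(2 * np.pi * t / 2.1) + 0.005 * np.sin(2 * np.pi * t / 0.09)

fig = plt.figure(figsize=(8, 5))
ax = fig.add_axes([0.12, 0.12, 0.83, 0.8])            # [left, bottom, width, height]
ax.plot(t, flux, "k.", markersize=1)
ax.set_xlabel(r"Time $t$ (d)")                        # mathtext between $...$
ax.set_ylabel(r"Relative flux $F/\bar{F}$")
ax.text(0.02, 0.92, r"$\Delta F/F \approx 0.02$", transform=ax.transAxes)

inset = fig.add_axes([0.62, 0.62, 0.3, 0.25])         # inset, in figure coordinates
zoom = (t > 5.0) & (t < 6.0)
inset.plot(t[zoom], flux[zoom], "b-")                 # 'zoom in' on one day
inset.tick_params(labelsize=9)

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150)   # e.g. fig.savefig("lc.png", dpi=300)
```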

10.7 Image plots with


Color maps can be a very effective means of conveying information. The Matplotlib
function provides the means to work with images. The following example
( _ ) defines an (x,y) grid using the NumPy function and
then evaluates z=−sin(x)sin(y) over that grid. The resulting data are rendered with
and a color bar is added and labeled. When run, the code produces figure 10.8.
Figure 10.8. An demonstration plot.
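A sketch consistent with this description follows, assuming `np.meshgrid` for the grid and a range of ±π in each direction (our choices). The `origin="lower"` and `extent` arguments make the image axes match the x and y values.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit for an interactive window
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 200)
y = np.linspace(-np.pi, np.pi, 200)
X, Y = np.meshgrid(x, y)              # 2D grids: X[j, i] = x[i], Y[j, i] = y[j]
Z = -np.sin(X) * np.sin(Y)

plt.figure()
im = plt.imshow(Z, origin="lower", extent=[x[0], x[-1], y[0], y[-1]], cmap="RdBu_r")
cb = plt.colorbar(im)
cb.set_label(r"$z = -\sin x \, \sin y$")
plt.xlabel("x")
plt.ylabel("y")
```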

10.8 3D plots

10.8.1 3D scatter plots


If you have data of three variables you would like to plot in 3D, use functions within the
module2. Here is a simple example ( ) where 200 random
points are colored by their distance from the origin:
When this file is run, the result will be something that looks like figure 10.9. The plot
shown on the screen is interactive—you can use the mouse (click–drag) to change the
viewing angle.
Figure 10.9. A 3D scatter plot.
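A sketch of such a script follows, assuming points drawn uniformly from a cube of side 2 (our choice). The `projection="3d"` option comes from `mpl_toolkits.mplot3d`; the explicit `Axes3D` import registers it on older Matplotlib versions and is harmless on newer ones.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit for an interactive, rotatable window
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older versions)

np.random.seed(0)
xyz = np.random.uniform(-1.0, 1.0, size=(200, 3))   # 200 random points in a cube
r = np.sqrt((xyz**2).sum(axis=1))                     # distance from the origin

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
sc = ax.scatter(xyz[:, 0], xyz[:, 1], xyz[:, 2], c=r, cmap="viridis")
fig.colorbar(sc, label="distance from origin")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```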

10.8.2 3D wireframe and surface plots


Wireframe and surface plots are similarly straightforward to generate. Here is a simple
code ( ) demonstrating the generation of a wireframe plot (figure 10.10):
Figure 10.10. A 3D wireframe plot.

Note that we use the NumPy function in the previous example. This
function makes it easy to generate a mesh of x–y values from 1D arrays that can then be
used in statements that assign values to a 2D z array. For example:
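A tiny, purely illustrative example makes the shapes concrete: `np.meshgrid` returns arrays with one row per y value and one column per x value, so ordinary element-wise arithmetic fills the 2D array.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])     # 3 x values
y = np.array([10.0, 20.0])        # 2 y values
X, Y = np.meshgrid(x, y)          # both have shape (len(y), len(x)) = (2, 3)
print(X)
# [[0. 1. 2.]
#  [0. 1. 2.]]
print(Y)
# [[10. 10. 10.]
#  [20. 20. 20.]]
Z = X + Y                          # element-wise: Z[j, i] = x[i] + y[j]
```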

For the surface plot, let us step up the complexity just a bit by using a color map to
shade the surface according to the z values, making the surface slightly transparent with
, adding a contour plot on the base plane and adding a colorbar to the right of
the plot (see figure 10.11). The following lines of code replace the line beginning
_ in the wire3D.py example:
Figure 10.11. A 3D surface plot with a contour plot base and semi-transparent
surface.
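The wire3D.py listing is not reproduced here, so the following is a self-contained sketch of the same ideas under our own assumptions: an illustrative surface z = sin(√(x² + y²)), `alpha=0.7` for transparency, a contour set projected onto a plane below the surface with `offset`, and a colorbar. The commented line shows the wireframe call that the surface call would replace.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit for an interactive window
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401

x = np.linspace(-3.0, 3.0, 60)
X, Y = np.meshgrid(x, x)
Z = np.sin(np.sqrt(X**2 + Y**2))            # illustrative surface

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
# Wireframe version: ax.plot_wireframe(X, Y, Z, rstride=3, cstride=3)
surf = ax.plot_surface(X, Y, Z, rstride=2, cstride=2, cmap="coolwarm",
                       linewidth=0, alpha=0.7)     # colored by z, semi-transparent
zbase = Z.min() - 0.5
ax.contour(X, Y, Z, zdir="z", offset=zbase, cmap="coolwarm")  # contours on the base plane
ax.set_zlim(zbase, Z.max())
fig.colorbar(surf, shrink=0.6, label="z")
```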
1
I also keep a similar code that plots points instead of lines.
2
For more examples, see the mplot3d tutorial at matplotlib.org/mpl_toolkits/mplot3d/tutorial.html.
Chapter 11

Applications
Python has a substantial body of existing packages that can be imported to efficiently
solve many classes of problems. In this chapter, I highlight just a few of these that I find
particularly useful.

11.1 Fits to data

11.1.1 Linear least squares: fitting a polynomial


NumPy and SciPy include several functions for fitting data. The NumPy function
will return the coefficients of the best fit polynomial of degree . The
coefficients are given in decreasing powers. In the example below, we create data using a
polynomial of order 4 and then fit that with polynomials of order 1 through 4. The
example ( ) also makes use of the NumPy function that takes a list
of coefficients as argument and allows evaluation of the polynomial:
When we execute the code, it prints out the fitted coefficients. As expected, the final
(fourth-order) fit returns the original coefficients and a perfect fit (see figure 11.1):
Figure 11.1. A polynomial fit example.
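A sketch of this procedure with degree-4 coefficients of our own choosing follows. `np.polyval` generates the noise-free data, `np.polyfit` returns coefficients in decreasing powers, and `np.poly1d` wraps them in a callable polynomial.

```python
import numpy as np

true_coeffs = [0.5, -1.0, 2.0, 3.0, -4.0]     # degree 4, highest power first (illustrative)
x = np.linspace(-2.0, 2.0, 41)
y = np.polyval(true_coeffs, x)                # noise-free data

for deg in range(1, 5):
    coeffs = np.polyfit(x, y, deg)            # least-squares fit, decreasing powers
    p = np.poly1d(coeffs)                     # callable polynomial
    rms = np.sqrt(np.mean((p(x) - y) ** 2))
    print(deg, np.round(coeffs, 4), "rms residual =", rms)
```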

11.1.2 Non-linear least squares


SciPy _
Probably more useful than fitting a polynomial is generalized non-linear least squares
fitting. Here we will initialize some noisy data around the function $y = a + bx\,e^{-cx}$. We use
the _ function of the SciPy module, which actually calls the
function but presents a slightly simpler interface for the programmer. In the
example ( _ _ ), we use a function, set our weights equal to
unity for all points and initialize our guesses for all parameters to 1.0. The function
_ uses the Levenberg–Marquardt algorithm, which blends Gauss–Newton steps with
gradient (steepest-descent) steps. It returns the best-fit values for the parameters and the
covariance matrix. The standard errors are given by the square roots of the diagonal
elements of the covariance matrix:
When we run the code, it prints the best fit solution and plots the fit over the generated
data (see figure 11.2):
Figure 11.2. A non-linear least squares example with _ .
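A minimal sketch using `scipy.optimize.curve_fit` follows, assuming true parameters a = 1, b = 3, c = 0.5 and Gaussian noise with standard deviation 0.05 (all our choices). The unit weights are passed through `sigma`, the starting guesses are all 1.0, and the standard errors come from the diagonal of the returned covariance matrix.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, a, b, c):
    """y = a + b x exp(-c x)"""
    return a + b * x * np.exp(-c * x)


rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 100)
a0, b0, c0 = 1.0, 3.0, 0.5                             # "true" values for the fake data
y = model(x, a0, b0, c0) + rng.normal(0.0, 0.05, x.size)
sigma = np.ones_like(y)                                # unit weights for all points

popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0, 1.0], sigma=sigma)
perr = np.sqrt(np.diag(pcov))                          # standard errors
for name, val, err in zip("abc", popt, perr):
    print(f"{name} = {val:.4f} +/- {err:.4f}")
```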

SciPy
Next we present an example of fitting a sine curve to generated data with added noise
using the function (see figure 11.3). In this example ( ),
all of the data have the same weight, but it should be obvious from the code how to
include point-by-point weighting if appropriate for your data. The code for this is a bit
long.
Figure 11.3. An example of a non-linear least squares fit to sinusoidal data with
noise.
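The original listing is not reproduced here; the sketch below shows one way to do such a fit with `scipy.optimize.leastsq`, under our own assumptions for the model parameters, noise level and starting guess. The weights multiply the residuals, so point-by-point weighting only requires replacing `w` with 1/σ_i. With `full_output=True`, `leastsq` returns a matrix that must be multiplied by the residual variance to give the parameter covariances.

```python
import numpy as np
from scipy.optimize import leastsq


def sine(p, t):
    amp, freq, phase, offset = p
    return amp * np.sin(2.0 * np.pi * freq * t + phase) + offset


def residuals(p, t, y, w):
    """Weighted residuals; leastsq minimizes the sum of their squares."""
    return w * (y - sine(p, t))


rng = np.random.default_rng(7)
t = np.linspace(0.0, 10.0, 300)
p_true = [2.0, 0.3, 0.8, 0.5]                   # amplitude, frequency, phase, offset
y = sine(p_true, t) + rng.normal(0.0, 0.3, t.size)
w = np.ones_like(t)                             # equal weights; use 1/sigma_i per point

p0 = [1.5, 0.29, 0.5, 0.0]                      # sine fits need a decent frequency guess
pbest, cov_x, info, mesg, ier = leastsq(residuals, p0, args=(t, y, w), full_output=True)

dof = t.size - len(pbest)
s2 = np.sum(residuals(pbest, t, y, w) ** 2) / dof   # residual variance
perr = np.sqrt(np.diag(cov_x) * s2)                 # standard errors
print("best fit:", pbest, "+/-", perr)
```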
11.1.3 Linear systems of equations
Both SciPy and NumPy have linear algebra solvers. The SciPy routines may be faster
because SciPy is always compiled with BLAS/LAPACK support, whereas this is optional
for NumPy, and they can be much faster still if SciPy is built against optimized BLAS and
LAPACK libraries such as ATLAS. Solving a system of linear equations is quite
straightforward. Say you have the system of equations
$$
\begin{aligned}
1x + 2y + 3z + 4w &= 9\\
3x + 4y + 9z + 6w &= 8\\
8x + 3y + 8z + 1w &= 7\\
7x + 4y + 3z + 2w &= 7.
\end{aligned}
$$

We can solve this by initializing a nested NumPy array to the coefficients on the left-
hand side and to the values on the right-hand side of the equals signs. The code
( ) to solve this particular problem is

which when run gives

The package contains many more linear algebra functions, including functions for
matrix inversion, banded and triangular systems, eigenvalue problems, decompositions, etc.
Low-level BLAS functions are available using and low-level LAPACK functions are
available using .
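A sketch of the solution follows, assuming the four equations read 1x + 2y + 3z + 4w = 9, 3x + 4y + 9z + 6w = 8, 8x + 3y + 8z + 1w = 7 and 7x + 4y + 3z + 2w = 7 (our reading of the system above). It uses `scipy.linalg.solve` and then checks the answer by substitution.

```python
import numpy as np
from scipy import linalg

# Rows: coefficients of x, y, z, w in each equation; b: right-hand sides
A = np.array([[1.0, 2.0, 3.0, 4.0],
              [3.0, 4.0, 9.0, 6.0],
              [8.0, 3.0, 8.0, 1.0],
              [7.0, 4.0, 3.0, 2.0]])
b = np.array([9.0, 8.0, 7.0, 7.0])

sol = linalg.solve(A, b)        # np.linalg.solve(A, b) accepts the same arguments
print("x, y, z, w =", sol)
print("check:", np.allclose(A @ sol, b))
```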

11.2 Numerical integration


SciPy provides several choices for integration routines in the
module. The function integrates a function from limits to (which can be ±∞)
using a scheme from the Fortran library QUADPACK. It returns the result of the integral
and an estimate of the absolute error in the result. Higher dimensional integrals are
available using , , and for 2, 3 and N dimensions,
respectively. In this example ( _ ) we integrate $\int_0^\infty e^{-2x}\,dx$, which of course
has the analytical solution 1/2:
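A minimal sketch of this integral with `scipy.integrate.quad` follows, together with a 2D example using `dblquad` (the 2D integrand is our own illustration). Note that `dblquad` expects the integrand as `func(y, x)` and the inner limits as functions of x.

```python
import numpy as np
from scipy import integrate

# quad returns (integral, estimate of the absolute error); limits may be infinite
result, abserr = integrate.quad(lambda x: np.exp(-2.0 * x), 0.0, np.inf)
print(result, abserr)

# dblquad integrates func(y, x) for a <= x <= b, gfun(x) <= y <= hfun(x);
# here the integral of x*y over 0 <= x <= 1, 0 <= y <= 2, which equals 1
val2, err2 = integrate.dblquad(lambda y, x: x * y, 0.0, 1.0, lambda x: 0.0, lambda x: 2.0)
```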

11.3 Integrating ordinary differential equations


Ordinary differential equations can be solved with ,
which uses LSODA from the Fortran library odepack. Implementation is straightforward,
as all that is needed are initial conditions and a function that returns the derivatives. For
example, the Lorenz system consists of three coupled ordinary differential equations that
model the essentials of atmospheric convection and provide a now-classic demonstration
of chaotic behavior in a deterministic system1:
$$
\frac{dx}{dt} = \sigma(y - x), \qquad
\frac{dy}{dt} = x(\rho - z) - y, \qquad
\frac{dz}{dt} = xy - \beta z
$$
where x, y and z indicate the state of the system, and σ, ρ and β are system parameters.
For our example ( _ ), the function definition returns the derivatives
of the Lorenz equations, the initial position is set in , the time array is assigned to and
the function integrates the trajectory. The result is shown in figure 11.4.
Figure 11.4. A 3D plot of the Lorenz attractor.
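A sketch of such an integration follows. The parameter values σ = 10, ρ = 28, β = 8/3, the initial state and the time grid are our assumptions (the first three are the values commonly used to show chaotic behavior). `odeint` returns one row per output time, with columns x, y and z, which can be passed to a 3D `ax.plot` to draw the attractor.

```python
import numpy as np
from scipy.integrate import odeint

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0      # classic chaotic values (assumed)


def lorenz(state, t):
    """Return [dx/dt, dy/dt, dz/dt] for the Lorenz system."""
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]


state0 = [1.0, 1.0, 1.0]                      # initial position
t = np.linspace(0.0, 40.0, 8001)              # output times
traj = odeint(lorenz, state0, t)              # shape (len(t), 3): columns x, y, z
x, y, z = traj.T                              # e.g. ax.plot(x, y, z) on a 3D axes
```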
11.4 Fourier transforms
Fourier transforms are useful for exploring the harmonic content of time series (or spatial)
data2. The common convention is
$$
F(k) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i k t}\, dt
$$
where F(k) is the Fourier transform of series f(t). When this integral is expressed as a sum
suitable for numerical evaluation3, we have the discrete Fourier transform (DFT)
$$
c_k = \sum_{n=0}^{N-1} y_n \exp\left(-i\, 2\pi f_k t_n\right)
$$
where the coefficients $c_k$ provide an estimate of the power in the time series at
frequency $f_k$. Directly coding the above gives the following example, where in
addition we have applied a normalization such that an input sine curve with an
amplitude of A will return a peak in the Fourier transform with an amplitude of A. In the
example, the DFT is implemented in the function , and we use this to demonstrate
the calculation of the amplitude spectrum on a time series consisting of two sinusoids
where we have randomly deleted 80% of the points. For large data sets, the DFT can be
very slow, as it requires $O(N^2)$ operations. If your data are equally spaced, then you will
be able to use a fast Fourier transform (FFT) to compute your estimate of the amplitude
spectrum. The FFT requires only $O(N \log N)$ operations.
In the example ( _ ) we include a call to the NumPy function
from the module (using the gapless data set), as well as the function
which returns the frequencies of the calculated amplitudes (see figure 11.5):
Figure 11.5. A demonstration of a Fourier transform. The upper panel shows the
original time series (red) and the randomly selected points (blue points). The
middle panel shows the DFT amplitude spectrum of the randomly selected points
and the lower panel shows the FFT amplitude spectrum of the red curve.
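A sketch of both calculations follows. The helper name `dft_amplitude`, the two frequencies (0.5 and 1.3 Hz), the sampling and the random seed are our assumptions. The DFT is scaled by 2/N so that a sine of amplitude A gives a peak of height close to A; for the evenly spaced series the same scaling is applied to `np.fft.rfft`, with `np.fft.rfftfreq` supplying the frequencies.

```python
import numpy as np


def dft_amplitude(t, y, freqs):
    """Amplitude spectrum by direct DFT; t need not be evenly spaced.

    Scaled by 2/N so a sine of amplitude A gives a peak of height ~A.
    Cost is O(N * len(freqs)).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float) - np.mean(y)
    amps = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        ck = np.sum(y * np.exp(-2j * np.pi * f * t))
        amps[k] = 2.0 * np.abs(ck) / len(t)
    return amps


rng = np.random.default_rng(11)
N, dt = 1000, 0.1
t = np.arange(N) * dt                                     # 100 s, evenly spaced
y = 1.0 * np.sin(2 * np.pi * 0.5 * t) + 0.5 * np.sin(2 * np.pi * 1.3 * t)

# DFT of a random 20% of the points (80% deleted)
keep = np.sort(rng.choice(N, size=N // 5, replace=False))
freqs = np.linspace(0.01, 2.0, 2000)
amp_dft = dft_amplitude(t[keep], y[keep], freqs)

# FFT of the complete, evenly spaced series
amp_fft = 2.0 * np.abs(np.fft.rfft(y)) / N
f_fft = np.fft.rfftfreq(N, d=dt)
```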

The SciPy module contains the module which contains a long list of useful
functions, including which returns an estimate of the power spectral
density using the Lomb–Scargle periodogram. To call this function, we only need to
calculate the angular frequencies and call as follows using our previous definitions:
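Since the original call is not reproduced here, the following is a self-contained sketch with its own synthetic, unevenly sampled data (our assumption). `scipy.signal.lombscargle(x, y, freqs)` expects angular frequencies, so we multiply the cyclic frequencies by 2π, and we subtract the mean of the data first.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(5)
t = np.sort(rng.uniform(0.0, 100.0, 300))                # unevenly spaced times
y = np.sin(2 * np.pi * 0.5 * t) + 0.2 * rng.normal(size=t.size)

freqs = np.linspace(0.01, 2.0, 4000)                      # cyclic frequencies
omega = 2.0 * np.pi * freqs                               # angular frequencies for lombscargle
pgram = lombscargle(t, y - y.mean(), omega)               # periodogram at each omega
best = freqs[np.argmax(pgram)]
print("strongest periodicity at", best)
```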
11.5 Writing sound files
It can be fun and perhaps even useful to turn your data into a sound file. The following
example4 does just that ( ). It reads in two-column data, normalizes to the
range -16384 to +16384, then uses the module functions to create and write a WAV
file:
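The original listing is not reproduced here; the sketch below does the same job with the standard-library `wave` module. The function name `data_to_wav`, the 44.1 kHz default rate and the choice to remove the mean and scale by the peak absolute value (so the samples lie within ±16384) are our assumptions.

```python
import wave

import numpy as np


def data_to_wav(y, fname, rate=44100):
    """Write the values y as a 16-bit mono WAV file, scaled to lie within +/-16384."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                         # center on zero
    peak = np.max(np.abs(y))
    if peak > 0:
        y = y / peak                         # now within [-1, 1]
    samples = np.round(16384 * y).astype("<i2")   # little-endian 16-bit integers
    with wave.open(fname, "wb") as w:
        w.setnchannels(1)                    # mono
        w.setsampwidth(2)                    # 2 bytes per sample
        w.setframerate(rate)                 # samples per second
        w.writeframes(samples.tobytes())
    return samples


# Typical use with a two-column text file (hypothetical name):
# t, flux = np.loadtxt("lightcurve.dat", unpack=True)
# data_to_wav(flux, "lightcurve.wav")
```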
1
See, e.g., mathworld.wolfram.com/LorenzAttractor.html.
2
A classic and comprehensive text introducing Fourier transforms is Bracewell R 1999 The Fourier Transform & Its
Applications (New York: McGraw-Hill).
3
See chapter 7 of Newman M 2012 Computational Physics (Scotts Valley, CA: CreateSpace) for a thorough discussion
of applications of Fourier transforms.
4
Based on codingmess.blogspot.com/2010/02/how-to-make-wav-file-with-python.html.
Chapter 12

Visualization and animations


12.1 VPython
VPython1 is based on the 3D graphics module contributed by David Scherer in
2000. The package allows you to place 3D objects in a scene. The scene is
updated many times per second, allowing animations and user-controlled scene rotations
and scalings that are fluid. The website contains links to tutorial videos
that provide a good introduction. Here is a simple example2 ( ) that
simulates a ball bouncing between two walls. Figure 12.1 shows a screen capture from a
slightly modified version of this code that creates a new sphere every 15 time steps. In the
code, the module is imported on the first line. Next, the scene is initialized,
where indicates the coordinates to which the camera points. The next
three lines initialize the wall and floor objects. The ball object is initialized next at
position , radius = 0.5 and with a red color. The next line initializes the
vector velocity. The loop is infinite, so the program is terminated by closing the
plot window. The command indicates that no more than 100 time steps
should be taken per second, which is needed for fast computers to prevent the entire
simulation from zipping by too fast to register. The ball object’s position is updated with
the velocity as a vector statement. Normal velocity components are mirrored upon
collision with the walls or floor and the y velocity is updated using $v_y^{\mathrm{new}} = v_y^{\mathrm{old}} - g\,dt$ on
all time steps except when there is a floor collision. Note that there are no graphics calls in
the loop—just calculations updating positions and velocities:
Figure 12.1. A VPython simulation of a ball bouncing between walls.
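Because the loop contains only arithmetic, its logic can be sketched without VPython at all. The headless version below uses NumPy arrays in place of VPython vectors; the wall positions, time step and initial conditions are hypothetical values of our own, and the loop comment marks where the VPython version would call `rate(100)`.

```python
import numpy as np

g, dt = 9.8, 0.005                        # m/s^2 and time step (illustrative)
wall_x, floor_y, radius = 6.0, -4.0, 0.5  # walls at x = +/-6, floor at y = -4 (hypothetical)

pos = np.array([-5.0, 4.0, 0.0])          # ball position
vel = np.array([2.0, 0.0, 0.0])           # ball velocity


def step(pos, vel):
    """One time step: move, mirror the normal velocity at walls/floor, apply gravity."""
    pos = pos + vel * dt                  # vector update, as in the VPython loop
    vel = vel.copy()
    if abs(pos[0]) >= wall_x - radius:
        vel[0] = -vel[0]                  # bounce off a side wall
    if pos[1] <= floor_y + radius:
        vel[1] = -vel[1]                  # bounce off the floor (no gravity this step)
    else:
        vel[1] -= g * dt                  # vy_new = vy_old - g dt
    return pos, vel


for _ in range(5000):                     # VPython: "while True:" with rate(100)
    pos, vel = step(pos, vel)
```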
Here is an example of a binary star simulator ( _ ). The
calculations are performed in MKS units to simplify the programming. Here we have
increased the ambient and directional lighting, and made use of the (magnitude)
function when creating our $\hat{r}_{12}$ unit vector. We include an object, the tail of
which is at the center of mass, to indicate the direction of system angular momentum. In
figure 12.2 we show three positions of the simulation.

Figure 12.2. A VPython binary star simulation.
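The simulator listing is not reproduced here. As a rough, headless sketch of its physics, the code below integrates two stars in MKS units with a leapfrog scheme (our choice of integrator), forms the unit vector from the separation divided by its magnitude as VPython's `mag()` would, and computes the system angular momentum that the arrow would display. The masses, separation and circular-orbit initial conditions are illustrative assumptions.

```python
import numpy as np

G = 6.674e-11                        # gravitational constant, m^3 kg^-1 s^-2
M_SUN, AU = 1.989e30, 1.496e11       # kg, m
m1, m2 = 1.0 * M_SUN, 0.5 * M_SUN    # illustrative masses
a = 1.0 * AU                         # illustrative separation
M = m1 + m2

# Circular orbit about the center of mass (at the origin), counterclockwise
r1 = np.array([-a * m2 / M, 0.0, 0.0])
r2 = np.array([a * m1 / M, 0.0, 0.0])
v_rel = np.sqrt(G * M / a)
v1 = np.array([0.0, -v_rel * m2 / M, 0.0])
v2 = np.array([0.0, v_rel * m1 / M, 0.0])


def accelerations(r1, r2):
    r12 = r2 - r1
    d = np.linalg.norm(r12)          # magnitude, like VPython's mag()
    rhat12 = r12 / d                 # unit vector from star 1 toward star 2
    force = G * m1 * m2 / d**2
    return force * rhat12 / m1, -force * rhat12 / m2


period = 2.0 * np.pi * np.sqrt(a**3 / (G * M))
nsteps = 2000
dt = period / nsteps
a1, a2 = accelerations(r1, r2)
for _ in range(nsteps):              # leapfrog: half kick, drift, half kick
    v1 += 0.5 * dt * a1
    v2 += 0.5 * dt * a2
    r1 += dt * v1
    r2 += dt * v2
    a1, a2 = accelerations(r1, r2)
    v1 += 0.5 * dt * a1
    v2 += 0.5 * dt * a2

L = m1 * np.cross(r1, v1) + m2 * np.cross(r2, v2)   # system angular momentum (+z here)
```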


12.2 Making figures with Mayavi
Mayavi3 is intended to provide an interface for making interactive visualizations of 3D
data and is an alternative to Matplotlib. It is available as a stand-alone program, but,
more relevant to us, it is also callable from Python programs and interactively from IPython.
Mayavi is a very powerful package and a full description of it is outside the scope of this
book. My intent here is to simply bring it to your attention with a couple of small
examples. Should Mayavi be useful to you, there are a large number of examples on the
website to get you started. For our first example, the code below
( _ ) uses the NumPy function to generate a 3D
mesh, then plots points on that mesh, scaling both the color and size of the spherical
glyphs by the distance to the origin (see figure 12.3):

Figure 12.3. A Mayavi figure of points on a 3D mesh.


As our second Mayavi example, the code below ( _ _ ) generates a
plot of the $Y_{10}^{5}(\theta,\phi)$ spherical harmonic and saves the image to a file. The argument
instructs Mayavi to use the values to scale the color map on the surface
(see figure 12.4):

Figure 12.4. A Mayavi figure showing the spherical harmonic $Y_{10}^{5}(\theta,\phi)$.


12.3 Animations
Matplotlib contains the module which can be useful for several purposes. As
discussed above, animations are excellent for visualizing physical phenomena. In addition,
animations can be very effectively used during public talks or for web publications. Even
a simple value versus time plot can be brought to life by animation. For example, the
following code4 reads in the CO2 data discussed in section 6.1.2 (see figure 12.5). The line
object is created with and is of type . It is
the only element which changes during the animation. The line object is updated with the
function _ , which is what is called repeatedly to generate the animation.
Here, the ellipsis ‘ ’ serves as a placeholder for a variable number of ‘ ’ slices, and
of course indicates to return only the first values of the array. The result of this
function definition is that each subsequent call sets the data for the line object to contain
one additional data pair, up to the complete data set for the final call. The
keyword argument specifies a delay of 20 ms between frames. The keyword if set
tells to only redraw the pixels of the plot that have changed,
which can speed up the animations considerably5. Finally, the animation is saved in the
file _ 6.
Figure 12.5. The final frame of the animated CO2 movie.
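The CO2 file is not reproduced here, so the following sketch substitutes a synthetic rising, seasonally modulated curve (our assumption) while keeping the structure described above: a single line artist created empty, an update function that shows one more data pair per frame, `interval=20` and `blit=True`. Saving to MP4 requires an external encoder such as ffmpeg, so that call is left commented out.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit to watch the animation in a window
import matplotlib.pyplot as plt
from matplotlib import animation

# Synthetic stand-in for the CO2 record
x = np.arange(1960.0, 2015.0, 1.0 / 12.0)
y = 315.0 + 1.5 * (x - 1960.0) + 3.0 * np.sin(2.0 * np.pi * x)

fig = plt.figure()
ax = plt.axes(xlim=(x[0], x[-1]), ylim=(y.min() - 5.0, y.max() + 5.0))
line, = ax.plot([], [], "b-")          # the only artist that changes


def init():
    line.set_data([], [])
    return line,


def update(i):
    """Frame i shows the first i + 1 data pairs."""
    line.set_data(x[:i + 1], y[:i + 1])
    return line,


anim = animation.FuncAnimation(fig, update, init_func=init, frames=len(x),
                               interval=20, blit=True)   # 20 ms between frames
# anim.save("co2.mp4", writer="ffmpeg", fps=30)           # needs ffmpeg installed
```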
1
Tagline: 3D Programming for Ordinary Mortals. VPython is available from vpython.org. To install VPython in
Anaconda, type at a terminal prompt. To build VPython and
dependencies from source, see the instructions at github.com/mwcraig/conda-vpython-recipes.
2
Based on the bounce example from vpython.org/contents/bounce_example.html.
3
For download, installation, documentation and a gallery of examples, visit code.enthought.com/projects/mayavi.
4
Based on _ ( _ ) from the matplotlib.org animation examples page
matplotlib.org/1.4.2/examples/animation/index.html. A nice tutorial is available at
jakevdp.github.io/blog/2012/08/18/matplotlib-animation-tutorial.
5
If using OS X, you will need to specify due to the way in which the OS works. This has been a known
issue for several years now and apparently is not a simple fix. For more information, see
github.com/matplotlib/matplotlib/issues/531.
6
Available at pythonessentials.org/anim_co2.mp4.
Chapter 13

Interfacing with other languages


It is possible to call C/C++ and Fortran routines from within a Python program and vice
versa. You might want to do this if you have a working and often-used computationally
intensive routine in one of these languages, but you prefer the more user-friendly interface
that Python can present. Perhaps surprisingly, calling a Fortran routine is less involved
than interfacing with C/C++ as long as we use 1, which is now part of the NumPy
distribution and is what we will explore in this chapter. If you have the need to interface
with C/C++, see the documentation at docs.python.org/2/extending/extending.html.
Here we will use the specific example of a Fortran subroutine ( in file
) that calculates a simple DFT, as discussed in section 11.4. The subroutine
calculates the normalized amplitudes, such that when passed a noise-free sine curve with
amplitude 1.0, the calculated amplitude will equal 1.0 at the appropriate frequency. The
implementation here does not require that the input data be equally spaced, as is required
when using an FFT.
The equivalent Python implementation of this is function from our module
_ ,
where both implementations return the same values to six decimal places.
Given our Fortran subroutine, we can use to create a module that can be
imported and used:

This creates a file on your system with the basename and an extension that is
the appropriate extension for a Python extension module on your platform (e.g., ,
. , etc). The module is now importable, but all array dimensions must be
declared in the calling function.
In this example ( ) we generate a time series of 10 000 points
consisting of two sinusoids with periods of ten and three seconds and different amplitudes. We
calculate the DFT at 1000 frequency points using both the Python code and the code in the
Fortran-derived module ( ). We use the function from the
module to determine how much time is spent in each of the two DFT routines and find that
the Python DFT function takes some 150 times longer to complete than the Fortran
subroutine:
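The Fortran-derived module cannot be reproduced here, so the sketch below only illustrates the timing pattern. It uses `time.perf_counter` (our choice of timer) to compare an explicit-loop Python DFT with a NumPy-vectorized one that stands in for the compiled routine. The smaller data sizes keep the run short, and the speed ratio you see will differ from the factor of about 150 quoted above. With f2py, the compiled module itself would typically be built with a command of the form `f2py -c -m modname source.f90`, where both names are placeholders.

```python
import math
import time

import numpy as np


def dft_python(t, y, freqs):
    """Amplitude spectrum with explicit Python loops (the slow reference)."""
    t, y = [float(v) for v in t], [float(v) for v in y]
    n = len(t)
    amps = []
    for f in map(float, freqs):
        re = im = 0.0
        for tj, yj in zip(t, y):
            arg = 2.0 * math.pi * f * tj
            re += yj * math.cos(arg)
            im -= yj * math.sin(arg)
        amps.append(2.0 * math.hypot(re, im) / n)
    return np.array(amps)


def dft_vectorized(t, y, freqs):
    """Same spectrum; here it stands in for the compiled routine."""
    phase = 2.0 * np.pi * np.outer(freqs, t)
    c = (y * np.exp(-1j * phase)).sum(axis=1)
    return 2.0 * np.abs(c) / len(t)


def timed(func, *args):
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


t = np.linspace(0.0, 100.0, 2000, endpoint=False)
y = 1.0 * np.sin(2 * np.pi * t / 10.0) + 0.5 * np.sin(2 * np.pi * t / 3.0)  # 10 s and 3 s periods
freqs = np.linspace(0.01, 0.5, 200)

amp_py, t_py = timed(dft_python, t, y, freqs)
amp_vec, t_vec = timed(dft_vectorized, t, y, freqs)
print(f"loops: {t_py:.3f} s   vectorized: {t_vec:.4f} s   ratio ~ {t_py / t_vec:.0f}")
```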
1
See, for example, docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html.
