Learning SciPy For Numerical and Scientific Computing - Second Edition - Sample Chapter
Introduction to SciPy
There is no doubt that the work of scientists in the twenty-first century is more
comprehensive and interdisciplinary than in previous generations. Members of
scientific communities connect in larger teams and work together on mission-oriented
goals that cut across their fields. This research paradigm is also reflected in
the computational resources employed by researchers. No longer are researchers
restricted to one type of commercial software, operating system, or vendor; inspired
by open source contributions made available and tested by research institutions and
open source communities, research work often spans various platforms and technologies.
This book presents the most highly recognized open source programming environment
to date: a system based on two libraries of the computer language Python, NumPy
and SciPy. In the following sections, we will guide you through examples from
science and engineering on the usage of this system.
What is SciPy?
The ideal programming environment for computational mathematics enjoys the
following characteristics:
It should be open source software that gives the user access to the raw source
code and allows the user to modify basic algorithms if so desired. With
commercial software, the inclusion of improved algorithms is applied
at the discretion of the seller, and it usually comes at a cost to the end
user. In the open source universe, the community usually performs these
improvements and releases new versions as they are published, at no cost.
On top of NumPy, we have yet another open source library, SciPy. This library
contains algorithms and mathematical tools to manipulate NumPy objects with
very definite scientific and engineering objectives.
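As a small illustration of SciPy algorithms operating on NumPy objects, the following sketch solves a 2x2 linear system. The matrix A and vector b are made-up values chosen for illustration; they do not come from the book.

```python
import numpy as np
from scipy import linalg

# Hypothetical system (illustrative values only):
#   3x +  y = 9
#    x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# scipy.linalg.solve takes NumPy arrays and returns a NumPy array.
x = linalg.solve(A, b)
print(x)

# Check the solution by substituting it back into the system.
print(np.allclose(A @ x, b))
```

The data lives in NumPy arrays throughout; SciPy only contributes the algorithm.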
The combination of Python, NumPy, and SciPy (henceforth referred to collectively
as "SciPy" for brevity) has been the environment of choice of many applied
mathematicians for years; we work on a daily basis with both pure mathematicians
and hardcore engineers. One of the challenges of this trade is to bring together
the scientific production of professionals with different visions, techniques, tools,
and software on a single workstation. SciPy is the perfect solution to coordinate
computations in a smooth, reliable, and coherent manner.
Constantly, we are required to produce scripts with, for example, combinations
of experiments written and performed in SciPy itself, C/C++, Fortran, and/or
MATLAB. Often, we receive large amounts of data from some signal acquisition
devices. From all this heterogeneous material, we employ Python to retrieve and
manipulate the data, and once finished with the analysis, to produce high-quality
documentation with professional-looking diagrams and visualization aids. SciPy
allows performing all these tasks with ease.
This is partly because many dedicated software tools easily extend the core features
of SciPy. For example, although graphing and plotting are usually taken care of with
the Python libraries of matplotlib, there are also other packages available, such as
Biggles (http://biggles.sourceforge.net/), Chaco (https://pypi.python.org/pypi/chaco),
HippoDraw (https://github.com/plasmodic/hippodraw),
MayaVi for 3D rendering (http://mayavi.sourceforge.net/), the Python
Imaging Library or PIL (http://pythonware.com/products/pil/), and the online
analytics and data visualization tool Plotly (https://plot.ly/).
Interfacing with non-Python packages is also possible. For example, the interaction
of SciPy with the R statistical package can be done with RPy
(http://rpy.sourceforge.net/rpy2.html). This allows for much more robust data analysis.
Installing SciPy
At the time of writing, the stable production releases of Python were 2.7.9 and 3.4.2.
Python 2.7 is still more convenient if the user needs to communicate with third-party
applications. No new releases are planned for Python 2; Python 3 is considered
the present and the future of Python. For the purposes of SciPy applications, we do
recommend you hold on to the 2.7 version, as there are still some packages using
SciPy that have not been ported to Python 3 yet. Nevertheless, the companion
software of this book was tested to work on both Python 2.7 and Python 3.4.
The Python software package can be downloaded from the official site
(https://www.python.org/downloads/) and can be installed on all major
systems such as Windows, Mac OS X, Linux, and Unix. It has also been ported
to other platforms, including Palm OS, iOS, PlayStation, PSP, Psion, and so on.
The following screenshot shows two popular options for coding in Python on an
iPad: PythonMath and Sage Math. While the first application allows only the use of
simple math libraries, the second permits the user to load and use both NumPy and
SciPy remotely.
PythonMath and Sage Math bring Python coding to iOS devices. Sage Math allows
importing NumPy and SciPy.
We shall not go into detail about the installation of Python on your system, since we
already assume familiarity with this language. In case of doubt, we advise browsing
the excellent book Expert Python Programming, Tarek Ziadé, Packt Publishing, where
detailed explanations are given for installing many of the different implementations
on different systems. It is usually a good idea to follow the directions given on the
official Python website. We will also assume familiarity with carrying out interactive
sessions in Python, as well as writing standalone scripts.
The latest libraries for both NumPy and SciPy can be downloaded from the official
SciPy site (http://scipy.org/). Both require Python 2.4 or newer,
so we should be in good shape at this point. We may choose to download the
package from SourceForge (http://sourceforge.net/projects/scipy/),
Gohlke (http://www.lfd.uci.edu/~gohlke/pythonlibs/), or Git repositories
(for instance, the superpack from http://stronginference.com/ScipySuperpack/).
This presents a list of all ports that either install SciPy or use SciPy as a
requirement. For Python 2.7, we need to install py27-scipy by issuing the
following command:
% port install py27-scipy
A few minutes later, the libraries are properly installed and ready to use. Note
how MacPorts also installs all needed requirements for us (including the NumPy
libraries) without any extra effort on our part.
2. Once built, and in the same folder, issue the installation command.
This should be all:
% python setup.py install
The procedure for the installation of the SciPy libraries is exactly the same,
that is, downloading and building before installing under Unix/Linux or
downloading and running under Microsoft Windows. Note that different
implementations of Python might have different requirements before
installing NumPy and SciPy.
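Whichever installation route is taken, a quick sanity check is to import both libraries and print their versions. The following is a minimal sketch; the version strings it prints depend entirely on your system.

```python
import sys

import numpy
import scipy

# Report the interpreter and library versions that are actually in use.
print("Python:", sys.version.split()[0])
print("NumPy: ", numpy.__version__)
print("SciPy: ", scipy.__version__)
```

If either import fails, the corresponding library is not visible to the Python interpreter being used, which often means it was installed for a different Python implementation or version.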
The reader should be aware that the execution of this test will take some time to
finish. It should end with something like this:
This means that at the basic level, your SciPy installation is fine. Alternatively, the
test could end in the form:
In this case, one needs to carefully review the errors and the failed tests. A place to
get help is the SciPy mailing list (http://mail.scipy.org/pipermail/scipy-user/),
to which one can subscribe. We have included a Python script that the reader
can use to run these tests; it can be found in the companion software for this
chapter that comes with the book.
SciPy organization
SciPy is organized as a family of modules. We like to think of each module as a
different field of mathematics and, as such, each has its own particular techniques
and tools. You can find a list of some of the different modules included in SciPy at
http://docs.scipy.org/doc/scipy-0.14.0/reference/py-modindex.html.
Let's use some of its functions to solve a simple problem.
The following table shows the IQ test scores of 31 individuals:
114  100  104   89  102   91  114  114
103  105  108  130  120  132  111  128
118  119   86   72  111  103   74  112
107  103   98   96  112  112   93
A stem plot of the distribution of these 31 scores (refer to the IPython Notebook for
this chapter) shows that there are no major departures from normality, and thus we
assume the distribution of the scores to be close to normal. We now estimate the mean
IQ score for this population, using a 99 percent confidence interval.
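For readers without the notebook at hand, the following sketch prints a text version of such a plot. It assumes that "stem plot" here means the classic stem-and-leaf display (tens digit as the stem, units digit as the leaf); the helper stem_and_leaf is a hypothetical name introduced for illustration and is not part of NumPy or SciPy.

```python
from collections import defaultdict

scores = [114, 100, 104, 89, 102, 91, 114, 114, 103, 105,
          108, 130, 120, 132, 111, 128, 118, 119, 86, 72, 111, 103, 74, 112,
          107, 103, 98, 96, 112, 112, 93]


def stem_and_leaf(values):
    """Return the lines of a text stem-and-leaf display.

    The stem is the value with its last digit removed (value // 10);
    the leaf is the last digit (value % 10).
    """
    leaves = defaultdict(list)
    for value in sorted(values):
        leaves[value // 10].append(value % 10)
    lines = []
    # Include empty stems so that gaps in the data remain visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(digit) for digit in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines


for line in stem_and_leaf(scores):
    print(line)
```

Read sideways, the display is a rough histogram of the scores, which makes it easy to judge by eye whether the distribution is roughly bell-shaped.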
We start by loading the data into memory, as follows:
>>> import numpy
>>> scores = numpy.array([114, 100, 104, 89, 102, 91, 114, 114, 103, 105,
... 108, 130, 120, 132, 111, 128, 118, 119, 86, 72, 111, 103, 74, 112, 107,
... 103, 98, 96, 112, 112, 93])
At this point, if we type dir(scores) and hit the return key, or type scores followed
by a dot (.) and press the tab key, the system lists all the methods that the data
inherits from the NumPy library, as is customary in Python. Technically, we could go
ahead and compute the required mean, xmean, and the corresponding confidence interval
according to the formula xmean ± zcrit * sigma / sqrt(n), where sigma and n are
respectively the standard deviation and size of the data, and zcrit is the critical
value corresponding to the confidence level (http://en.wikipedia.org/wiki/Confidence_interval).
In this case, we could look up a table in any statistics book to obtain a crude
approximation to its value, zcrit = 2.576. The remaining values may be computed in
our session and properly combined, as follows:
>>> # NumPy provides mean, std, size, and sqrt directly
>>> xmean = numpy.mean(scores)
>>> sigma = numpy.std(scores)
>>> n = numpy.size(scores)
>>> xmean, xmean - 2.576*sigma / numpy.sqrt(n), \
...     xmean + 2.576*sigma / numpy.sqrt(n)
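Rather than reading zcrit from a printed table, the critical value can be computed from the standard normal distribution. The sketch below does this with scipy.stats.norm.ppf and then applies the formula above; the variable names confidence, half_width, lower, and upper are introduced here for illustration.

```python
import numpy as np
from scipy import stats

scores = np.array([114, 100, 104, 89, 102, 91, 114, 114, 103, 105,
                   108, 130, 120, 132, 111, 128, 118, 119, 86, 72, 111, 103,
                   74, 112, 107, 103, 98, 96, 112, 112, 93])

confidence = 0.99
# Two-sided critical value: leave (1 - confidence) / 2 in each tail.
zcrit = stats.norm.ppf(1 - (1 - confidence) / 2)

xmean = scores.mean()
sigma = scores.std()   # default ddof=0, as in the session above
n = scores.size

half_width = zcrit * sigma / np.sqrt(n)
lower, upper = xmean - half_width, xmean + half_width
print(zcrit)
print(xmean, lower, upper)
```

The computed zcrit agrees with the tabulated 2.576 to three decimal places, so the resulting interval should match the one obtained in the session up to rounding of the critical value.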
The variable result contains the solution to our problem with some additional
information. Note that result is a tuple with three elements as the help
documentation suggests:
>>> help(scipy.stats.bayes_mvs)
The output of this command depends on the installed version of SciPy. To see the
actual output on your system, run the companion IPython Notebook for this chapter,
or run the command in a Python console.
Our solution is the first element of the tuple result; to see its contents, type:
>>> result[0]
Note how this output gives us the same average as before, but a slightly different
confidence interval, due to more accurate computations through SciPy (the output
might be different depending on the SciPy version available on your computer).
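As a point of reference, the following sketch shows one way a tuple such as result can be produced and unpacked. It assumes that result comes from scipy.stats.bayes_mvs applied to scores with alpha=0.99 to match the 99 percent confidence level; the exact call is not shown in this excerpt, so treat that argument as an assumption. For comparison, the sketch also computes a classical interval based on Student's t distribution; our understanding is that bayes_mvs bases its interval for the mean on a t distribution, which would explain why it differs slightly from the normal-based interval computed by hand.

```python
import numpy as np
from scipy import stats

scores = np.array([114, 100, 104, 89, 102, 91, 114, 114, 103, 105,
                   108, 130, 120, 132, 111, 128, 118, 119, 86, 72, 111, 103,
                   74, 112, 107, 103, 98, 96, 112, 112, 93])

# Assumption: alpha=0.99 is our choice to match the 99 percent interval.
result = stats.bayes_mvs(scores, alpha=0.99)

# result holds three entries: estimates for the mean, variance, and
# standard deviation. Each entry is (center, (lower, upper)).
center, (lo, hi) = result[0][0], result[0][1]
print(center, lo, hi)

# Classical t-based interval for the mean, for comparison.
n = scores.size
t_lo, t_hi = stats.t.interval(0.99, n - 1,
                              loc=scores.mean(),
                              scale=stats.sem(scores))
print(t_lo, t_hi)
```

Indexing with result[0][0] and result[0][1] is used instead of attribute access because it works whether the entries are plain tuples or named tuples, which varies with the SciPy version.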
After executing this command, the system provides the necessary information.
Equivalently, both NumPy and SciPy come bundled with their own help system,
info. For instance, look at the following command:
>>> import numpy
>>> numpy.info('random')
This will offer a summary of all information parsed from the contents of all
docstrings from the NumPy library associated with the given keyword (note it must
be quoted). The user may navigate the output scrolling up and down, without the
possibility of further interaction.
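Besides keyword strings, numpy.info also accepts an object such as a function. The following sketch captures that help text in a string buffer through the output parameter instead of printing it to the screen, which can be handy when the text should be searched or stored; the names buffer and text are ours.

```python
import io

import numpy

# Collect the help text in memory instead of writing it to stdout.
buffer = io.StringIO()
numpy.info(numpy.linspace, output=buffer)
text = buffer.getvalue()

# Show only the beginning of the documentation.
print(text[:300])
```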
This is convenient when we already know which function we want to use but are
unsure of its usage. But what should we do if we don't know whether such a
procedure exists, and only suspect that it may? The usual Python way is to invoke
the dir() command on a module, which lists all possible attributes.
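Since dir() returns an ordinary list of strings, it can be filtered with a keyword. The following is a minimal sketch; search_module is a hypothetical helper written for this example, not a NumPy or SciPy function.

```python
import scipy.stats


def search_module(module, keyword):
    """Return the attribute names of module that contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [name for name in dir(module) if keyword in name.lower()]


# Look for anything in scipy.stats whose name mentions "kurt".
matches = search_module(scipy.stats, "kurt")
print(matches)
```

Any name found this way can then be passed to help() for details.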
Interactive Python sessions make it easier to search for such information, with the
possibility of navigating and performing further searches inside the output of help
sessions. For instance, type in the following command at the prompt:
>>> import scipy.stats
>>> help(scipy.stats)
The output of this command depends on the installed version of SciPy. To see the
actual output on your system, run the companion IPython Notebook for this chapter,
or run the command in a Python console.
Note the colon (:) at the end of the screen; this is an old-school prompt. The system
is in stand-by mode, expecting the user to issue a command (in the form of a single
key). This also indicates that there are a few more pages of help following the given
text. If we intend to read the rest of the help file, we may press spacebar to scroll to
the next page.
In this way, we can visit the following manual pages on this topic. It is also possible
to navigate the manual pages scrolling one line of text at a time using the up and
down arrow keys. When we are ready to quit the help session, we simply press
(the keyboard letter) Q.
It is also possible to search the help contents for a given string. In that case, at the
prompt, we press the (/) slash key. The prompt changes from a colon into a slash,
and we proceed to input the keyword we would like to search for.
For example, is there a SciPy function that computes the Pearson kurtosis of a given
dataset? At the slash prompt, we type in kurtosis and press enter. The help system
takes us to the first occurrence of that string. To access successive occurrences of
the string kurtosis, we press the N key (for next) until we find what we require. At
that stage, we proceed to quit this help session (by pressing Q) and request more
information on the function itself:
>>> help(scipy.stats.kurtosis)
The output of this command depends on the installed version of SciPy. To see the
actual output on your system, run the companion IPython Notebook for this chapter,
or run the command in a Python console.
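Once the function has been located, using it is straightforward. The following sketch applies scipy.stats.kurtosis to the IQ scores from earlier in the chapter. By default the function returns Fisher's (excess) kurtosis; passing fisher=False requests Pearson's definition, under which a normal distribution has kurtosis 3.

```python
import numpy as np
from scipy import stats

scores = np.array([114, 100, 104, 89, 102, 91, 114, 114, 103, 105,
                   108, 130, 120, 132, 111, 128, 118, 119, 86, 72, 111, 103,
                   74, 112, 107, 103, 98, 96, 112, 112, 93])

# Pearson's definition: a normal distribution has kurtosis 3.
pearson = stats.kurtosis(scores, fisher=False)

# Fisher's definition (the default): Pearson's value minus 3.
fisher = stats.kurtosis(scores)

print(pearson, fisher)
```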
Scientific visualization
At this point, we would like to introduce you to another resource that we will be using
to generate graphs, namely the matplotlib libraries. It may be downloaded from its
official web page, http://matplotlib.org/, and installed following the standard
Python commands. There is good online documentation on the official web page,
and we encourage the reader to dig deeper than the few commands that we will use
in this book. For instance, the excellent monograph Matplotlib for Python Developers,
Sandro Tosi, Packt Publishing, provides all that we would need and more. Other plotting
libraries are available (commercial or otherwise) that aim at very different and specific
applications. The degree of sophistication and ease of use of matplotlib makes it one of
the best options to generate graphics in scientific computing.
Once installed, it may be imported using import matplotlib. Among all its
modules, we will focus on pyplot that provides a comfortable interface with the
plotting libraries. For example, if we desire to plot a cycle of the sine function,
we could execute the following code snippet:
>>> import numpy
>>> import matplotlib.pyplot as plt
>>> x = numpy.linspace(0, 2*numpy.pi, 32)
>>> fig = plt.figure()
>>> plt.plot(x, numpy.sin(x))
>>> plt.show()
>>> fig.savefig('sine.png')
[Figure: plot of one cycle of the sine function produced by the preceding session]
Let us explain each command from the previous session. The first two commands
are used to import numpy and matplotlib.pyplot as usual. We define an array x of
32 uniformly spaced floating point values from 0 to 2π, and later evaluate the sine
of these values with numpy.sin(x). The command figure creates space in
memory to store the subsequent plots and puts in place an object of the
matplotlib.figure.Figure form. The plt.plot(x, numpy.sin(x)) command creates an object
of the matplotlib.lines.Line2D form containing data with the plot of x against
numpy.sin(x), together with a set of axes attached to it and labeled according to
the ranges of the variables. This object is stored in the previous Figure object and
is displayed on the screen via the plt.show() command. The last command in the
session, fig.savefig(), saves the Figure object to whatever valid image format
we desire (in this case, a Portable Network Graphics (PNG) image). From now
on, in any code that deals with matplotlib commands, we will leave the option of
showing/saving open.
There are, of course, commands that control the style of axes, aspect ratio between
axes, labeling, colors, legends, the possibility of managing several figures at the
same time (subplots), and many more features to display all sorts of data. We will be
discovering these as we progress with examples throughout the book.
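To give a first taste of these features, the following sketch draws two subplots with axis labels, titles, colors, and legends. It is only an illustration under our own assumptions: the Agg backend is selected so that the code runs without a display, and the output file name sine_cosine.png and its location in the system temporary folder are arbitrary choices.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 32)

# One figure holding two side-by-side sets of axes (subplots).
fig, (ax_sin, ax_cos) = plt.subplots(1, 2, figsize=(8, 3))

ax_sin.plot(x, np.sin(x), color="tab:blue", label="sin(x)")
ax_sin.set_title("Sine")
ax_sin.set_xlabel("x")
ax_sin.set_ylabel("value")
ax_sin.legend()

ax_cos.plot(x, np.cos(x), color="tab:red", linestyle="--", label="cos(x)")
ax_cos.set_title("Cosine")
ax_cos.set_xlabel("x")
ax_cos.legend()

fig.tight_layout()

# Save to the temporary folder; the image format follows the file extension.
out_path = os.path.join(tempfile.gettempdir(), "sine_cosine.png")
fig.savefig(out_path)
plt.close(fig)
print(out_path)
```

In an interactive session, the matplotlib.use("Agg") line can be dropped and plt.show() called before closing the figure in order to display it on screen.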
After hitting the enter key, the file should be displayed in the default web browser.
In case that does not happen, please note that the IPython Notebook is officially
supported on the browsers Chrome, Safari, and Firefox. For additional details,
refer to the Browser Compatibility section of the documentation, currently at
http://ipython.org/ipython-doc/stable/install/install.html.
Once the .ipynb file has been opened, press and hold the shift key and hit enter to
start executing the notebook cell by cell. Another way to execute the notebook cell by
cell is via the player icon on the menu near the left of the cell labeled as markdown.
Alternatively, from the Cell menu (on the top of the browser) you could choose
among several options to execute the contents of the notebook.
To leave the notebook you could choose Close and halt, from the File menu on top
of the browser below the label Notebook. Options to save the notebook can also be
found under the File menu. To completely close the notebook browser you need
to hit the keys ctrl and C simultaneously on the terminal where the notebook was
started and follow the instructions after that.
Summary
In this chapter, you have learned the benefits of using the combination of Python,
NumPy, SciPy, and matplotlib as a programming environment for any scientific
endeavor that requires mathematics; in particular, anything related to numerical
computations. You have explored the environment, learned how to download,
install, and test the required libraries, used them for some quick computations,
and figured out a few good ways to search for help.
In Chapter 2, Working with the NumPy Array As a First Step to SciPy, we will guide you
through basic object creation in SciPy, including the best methods to manipulate
data, or obtain information from it.