SciPy Cookbook
Various authors
This is the “SciPy Cookbook” — a collection of various user-contributed recipes, which once lived under wiki.scipy.org. If you have a nice notebook you’d like to add here, or you’d like to make some other edits, please see the SciPy-CookBook repository.
CHAPTER 1

Data acquisition with NI-DAQmx
These are quick examples of using ctypes and numpy to do data acquisition and playback using National Instruments’
NI-DAQmx library. This library allows access to their wide range of data acquisition devices. By using ctypes, we
bypass the need for a C compiler. The code below assumes a Windows platform. NI-DAQmx is also available for
Linux, but the code below would require a few minor changes, namely loading the shared library and setting the
function signatures.
See also Data acquisition with PyUniversalLibrary.
See also projects that wrap NI-DAQmx library with Linux support: pylibnidaqmx, pydaqmx, daqmxbase-swig.
OK, enough talk, let’s see the code!
#!python numbers=disable

# Acq_IncClk.py
# This is a near-verbatim translation of the example program
# C:\Program Files\National Instruments\NI-DAQ\Examples\DAQmx ANSI C\Analog In\Measure Voltage\Acq-Int Clk\Acq-IntClk.c
import ctypes
import numpy
nidaq = ctypes.windll.nicaiu # load the DLL
##############################
# Setup some typedefs and constants
# to correspond with values in
# C:\Program Files\National Instruments\NI-DAQ\DAQmx ANSI C Dev\include\NIDAQmx.h
# the typedefs
int32 = ctypes.c_long
uInt32 = ctypes.c_ulong
uInt64 = ctypes.c_ulonglong
float64 = ctypes.c_double
TaskHandle = uInt32
# the constants
DAQmx_Val_Cfg_Default = int32(-1)
DAQmx_Val_Volts = 10348
DAQmx_Val_Rising = 10280
DAQmx_Val_FiniteSamps = 10178
DAQmx_Val_GroupByChannel = 0
##############################
def CHK(err):
    """a simple error checking routine"""
    if err < 0:
        buf_size = 100
        buf = ctypes.create_string_buffer('\000' * buf_size)
        nidaq.DAQmxGetErrorString(err,ctypes.byref(buf),buf_size)
        raise RuntimeError('nidaq call failed with error %d: %s'%(err,repr(buf.value)))
# initialize variables
taskHandle = TaskHandle(0)
max_num_samples = 1000
data = numpy.zeros((max_num_samples,),dtype=numpy.float64)
# now, on with the program
CHK(nidaq.DAQmxCreateTask("",ctypes.byref(taskHandle)))
CHK(nidaq.DAQmxCreateAIVoltageChan(taskHandle,"Dev1/ai0","",
DAQmx_Val_Cfg_Default,
float64(-10.0),float64(10.0),
DAQmx_Val_Volts,None))
CHK(nidaq.DAQmxCfgSampClkTiming(taskHandle,"",float64(10000.0),
DAQmx_Val_Rising,DAQmx_Val_FiniteSamps,
uInt64(max_num_samples)));
CHK(nidaq.DAQmxStartTask(taskHandle))
read = int32()
CHK(nidaq.DAQmxReadAnalogF64(taskHandle,max_num_samples,float64(10.0),
DAQmx_Val_GroupByChannel,data.ctypes.data,
max_num_samples,ctypes.byref(read),None))
print "Acquired %d points"%(read.value)
if taskHandle.value != 0:
    nidaq.DAQmxStopTask(taskHandle)
    nidaq.DAQmxClearTask(taskHandle)
print "End of program, press Enter key to quit"
raw_input()
#!python numbers=disable
"""
This is an interpretation of the example program
C:\Program Files\National Instruments\NI-DAQ\Examples\DAQmx ANSI C\Analog Out\Generate Voltage\Cont Gen Volt Wfm-Int Clk\ContGen-IntClk.c
"""
import ctypes
import numpy
import threading

nidaq = ctypes.windll.nicaiu    # load the DLL
# the typedefs
int32 = ctypes.c_long
uInt32 = ctypes.c_ulong
uInt64 = ctypes.c_ulonglong
float64 = ctypes.c_double
TaskHandle = uInt32
# the constants
DAQmx_Val_Cfg_Default = int32(-1)
DAQmx_Val_Volts = 10348
DAQmx_Val_Rising = 10280
DAQmx_Val_FiniteSamps = 10178
DAQmx_Val_ContSamps = 10123
DAQmx_Val_GroupByChannel = 0
##############################
class WaveformThread( threading.Thread ):
    """
    This class performs the necessary initialization of the DAQ hardware and
    spawns a thread to handle playback of the signal.
    It takes as input arguments the waveform to play and the sample rate at which
    to play it.
    This will play an arbitrary-length waveform file.
    """
    def __init__( self, waveform, sampleRate ):
        self.running = True
        self.sampleRate = sampleRate
        self.periodLength = len( waveform )
        self.taskHandle = TaskHandle( 0 )
        self.data = numpy.zeros( ( self.periodLength, ), dtype=numpy.float64 )
        # convert waveform to a numpy array
        for i in range( self.periodLength ):
            self.data[ i ] = waveform[ i ]
        # setup the DAQ hardware
        self.CHK(nidaq.DAQmxCreateTask("",
                                       ctypes.byref( self.taskHandle )))
        self.CHK(nidaq.DAQmxCreateAOVoltageChan( self.taskHandle,
                                                 "Dev1/ao0",
                                                 "",
                                                 float64(-10.0),
                                                 float64(10.0),
                                                 DAQmx_Val_Volts,
                                                 None))
        self.CHK(nidaq.DAQmxCfgSampClkTiming( self.taskHandle,
                                              "",
                                              float64(self.sampleRate),
                                              DAQmx_Val_Rising,
                                              DAQmx_Val_FiniteSamps,
                                              uInt64(self.periodLength)))
        self.CHK(nidaq.DAQmxWriteAnalogF64( self.taskHandle,
                                            int32(self.periodLength),
                                            0,
                                            float64(-1),
                                            DAQmx_Val_GroupByChannel,
                                            self.data.ctypes.data,
                                            None,
                                            None))
        threading.Thread.__init__( self )

    def CHK( self, err ):
        """a simple error checking routine"""
        if err < 0:
            buf_size = 100
            buf = ctypes.create_string_buffer('\000' * buf_size)
            nidaq.DAQmxGetErrorString(err,ctypes.byref(buf),buf_size)
            raise RuntimeError('nidaq call failed with error %d: %s'%(err,repr(buf.value)))
        if err > 0:
            buf_size = 100
            buf = ctypes.create_string_buffer('\000' * buf_size)
            nidaq.DAQmxGetErrorString(err,ctypes.byref(buf),buf_size)
            raise RuntimeError('nidaq generated warning %d: %s'%(err,repr(buf.value)))

    def run( self ):
        counter = 0
        self.CHK(nidaq.DAQmxStartTask( self.taskHandle ))

    def stop( self ):
        self.running = False
        nidaq.DAQmxStopTask( self.taskHandle )
        nidaq.DAQmxClearTask( self.taskHandle )
if __name__ == '__main__':
    import time
    # generate a time signal 5 seconds long with 250Hz sample rate
    t = numpy.arange( 0, 5, 1.0/250.0 )
    # generate sine wave
    x = numpy.sin( t )
    mythread = WaveformThread( x, 250 )
    # start playing waveform
    mythread.start()
    # wait 5 seconds then stop
    time.sleep( 5 )
    mythread.stop()
Data acquisition with PyUniversalLibrary

This page illustrates the use of the inexpensive (about $150) PMD USB-1208FS data acquisition device from Measurement Computing. It makes use of PyUniversalLibrary, an open-source wrapper of Measurement Computing’s Universal Library.

See also Data acquisition with NI-DAQmx.

The following examples were made with PyUL Release 20050624. The pre-compiled win32 binaries of this version are compatible with the Enthought Edition of Python 2.4 (Release 1.0.0, 2006-08-02 12:20), which is what was used to run these examples.
The first example illustrates the use of the unbuffered analog input:
# example1.py
import UniversalLibrary as UL
import time
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0
tstart = time.time()
data = []
times = []
while 1:
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    data.append( DataValue )
    times.append( time.time()-tstart )
    if times[-1] > 1.0:
        break
import pylab
pylab.plot(times,data,'o-')
pylab.xlabel('time (sec)')
pylab.ylabel('ADC units')
pylab.show()
When I ran this, I had a function generator generating a sine wave connected to pins 1 and 2 of my device. This should
produce a figure like the following:
The values recorded in example 1 are “ADC units”, the values recorded directly by the Analog-to-Digital hardware. In fact, this device has a 12-bit A-to-D converter, but the values are stored as 16-bit signed integers. To convert these values to Volts, we use Measurement Computing’s cbToEngUnits function. Here we do that for each piece of data and plot the results.
#example2.py
import UniversalLibrary as UL
import time
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0
tstart = time.time()
data = []
times = []
while 1:
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    data.append( EngUnits )
    times.append( time.time()-tstart )
    if times[-1] > 1.0:
        break
import pylab
pylab.plot(times,data,'o-')
pylab.xlabel('time (sec)')
pylab.ylabel('Volts')
#pylab.savefig('example2.png',dpi=72)
pylab.show()
As you have no doubt noticed, the plots above aren’t very “pure” sine waves. This is undoubtedly due to the way we’re
sampling the data. Rather than relying on a steady clock to do our acquisition, we’re simply polling the device as fast
as it (and the operating system) will let us go. There’s a better way - we can use the clock on board the Measurement
Computing device to acquire a buffer of data at evenly spaced samples.
#example3.py
import UniversalLibrary as UL
import Numeric
import pylab
BoardNum = 0
Gain = UL.BIP5VOLTS
LowChan = 0
HighChan = 0
Count = 2000
Rate = 3125
Options = UL.CONVERTDATA
ADData = Numeric.zeros((Count,), Numeric.Int16)
# convert to Volts
data_in_volts = [ UL.cbToEngUnits(BoardNum, Gain, y) for y in ADData]
#example4.py
import UniversalLibrary as UL
import Numeric
import pylab
BoardNum = 0
Gain = UL.BIP5VOLTS
LowChan = 0
HighChan = 0
Count = 2000
Rate = 10000
Options = UL.CONVERTDATA
ADData = Numeric.zeros((Count,), Numeric.Int16)
# convert to Volts
data_in_volts = [ UL.cbToEngUnits(BoardNum, Gain, y) for y in ADData]
data_in_volts = Numeric.array(data_in_volts) # convert to Numeric array
For this example, I’ve turned up the frequency on the function generator to 480 Hz. You can see, indeed, that’s what
the psd() function tells us:
Attachments
• example1.png
• example2.png
• example3.png
• example4.png
>>> A = N.fromfile("/tmp/tmp_i7j_a/resid2.tmp",
... N.dtype([('pad1','i4'),
... ('TOA','f8'),
... ('resid_p','f8'),
... ('resid_s','f8'),
... ('orb_p','f8'),
... ('f','f8'),
... ('wt','f8'),
... ('sig','f8'),
... ('preres_s','f8'),
... ('pad3','i8'),
... ('pad2','i4')]))
This page gives examples of how to read or write a NumPy array to or from a file, be it ASCII or binary. The various methods demonstrated all have copious and sometimes sophisticated options; call help to get details.
We will consider a trivial example where we create an array of zeros called ‘data‘, write it to a file ‘myfile.txt‘ (myfile.dat for the binary case), and read it into ‘read_data‘.
This documentation could be improved by discussing more sophisticated cases (e.g. multiple arrays), and discussing the costs/benefits of the various approaches presented.
SciPy
Writing files can be accomplished using ‘savetxt‘. By far the easiest way to read text data is via genfromtxt (or the derivative convenience functions recfromtxt and recfromcsv).
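A minimal sketch of the trivial example described above (the (3, 3) shape is arbitrary; ‘myfile.txt‘, ‘data‘ and ‘read_data‘ are the names used on this page):

import numpy as np

data = np.zeros((3, 3))                   # the trivial example array
np.savetxt('myfile.txt', data)            # write it as text
read_data = np.genfromtxt('myfile.txt')   # read it back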
Matplotlib (pylab)
Matplotlib provides an easy solution which seems to load data faster than read_array:
#!python numbers=disable
>>> from numpy import *
>>> from pylab import load # warning, the load() function of numpy will be shadowed
NumPy
csv files
Note that csv stands for “comma separated value”. This means that the separator (also called a delimiter), i.e. the
character which is used to separate individual values in a file, is a comma. In the examples above, the default delimiter
is a space, but all of the above methods have an option (see their respective help for details), which can be set to a
comma in order to read or write a csv file instead.
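For instance, continuing the sketch above (the delimiter keyword shown here is the one accepted by both savetxt and genfromtxt):

np.savetxt('myfile.csv', data, delimiter=',')            # write a csv file
read_data = np.genfromtxt('myfile.csv', delimiter=',')   # read it back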
Or, assuming you have imported numpy as N, you may want to read arbitrary column types. You can also return a recarray, which lets you assign ‘column headings’ to your array.
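A sketch of that (with numpy imported as N, as assumed above; the column names 'a', 'b', 'c' are made up for illustration, and np.recfromcsv is the convenience variant that returns a true recarray):

import numpy as N

r = N.genfromtxt('myfile.csv', delimiter=',', names=['a', 'b', 'c'])
print(r.dtype.names)   # ('a', 'b', 'c')
print(r['a'])          # access a column by its 'heading'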
The advantage of binary files is the huge reduction in file size. The price paid is losing human readability, and in some
formats, losing portability.
Let us consider the array in the previous example.
The simplest possibility is to use NumPy’s own binary file format. See numpy.save, numpy.savez, and numpy.load.
You can save several arrays in a single file using numpy.savez. When loading an .npz file you get an object of type NpzFile. You can obtain a list of arrays and load individual arrays like this:
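A minimal sketch (file names chosen only for illustration, reusing the ‘data‘ array from above):

np.save('myfile.npy', data)                    # single array, numpy's .npy format
np.savez('myfiles.npz', a=data, b=2 * data)    # several arrays in one .npz file

npz = np.load('myfiles.npz')                   # an NpzFile object
print(npz.files)                               # names of the arrays stored inside
a = npz['a']                                   # load one array by name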
On older systems, the standard was to use python’s pickle module to pickle the arrays.
Raw binary
These file formats simply write out the internal representation of the arrays. This is platform-dependent and includes
no information about array shape or datatype, but is quick and easy.
SciPy provides fwrite() from scipy.io.numpyio. You have to set the size of your data, and optionally, its type (integer, short, float, etc.).
For reading binary files, scipy.io.numpyio provides fread(). You have to know the datatype of your array, its size and
its shape.
Or, you can simply use ndarray.tofile and numpy.fromfile. Following the previous example:
>>> data.tofile('myfile.dat')
>>> fd = open('myfile.dat', 'rb')
>>> read_data = numpy.fromfile(file=fd, dtype=numpy.uint8).reshape(shape)
Here the dtype argument (numpy.uint8 in this example) specifies the NumPy data type. The option fromfile(..., count=<number>) specifies the number of data entries of that type you want to read in (the default -1 means read in the whole file, which is what you usually want). However, the method is not recommended for data storage and transfer between different platforms, since no byteorder and datatype information is stored. If you want that, use numpy’s own binary file format. See numpy.save, numpy.savez and numpy.load.
Another, but deprecated, way to fully control endianness (byteorder), storage order (row-major, column-major) for rank > 1 arrays and datatypes that are written and read back is scipy.io.npfile. Writing:
libnpy is a small library that provides simple routines for saving a C or Fortran array to a data file using NumPy’s own
binary format. For a description of this format, do
#include "npy.h"
int main() {
    double a[2][4] = { { 1, 2, 3, 4 },
                       { 5, 6, 7, 8 } };
    int shape[2] = { 2, 4 }, fortran_order = 0;
The program creates a file ‘ca.npy‘ that you can load into python in the usual way.
>>> ca = np.load('ca.npy')
>>> print ca
[[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]]
program fex
use fnpy
use iso_c_binding
implicit none
integer :: i
real(C_DOUBLE) :: a(2,4) = reshape([(i, i=1,8)], [2,4])
but the entries of the NumPy array now follow the Fortran (column-major) ordering.
>>> fa = np.load('fa.npy')
>>> print fa
[[ 1. 3. 5. 7.]
[ 2. 4. 6. 8.]]
The ‘README‘ file in the source distribution explains how to compile the library using ‘make‘.
If you put ‘npy.h‘ and ‘libnpy.a‘ in the same directory as ‘cex.c‘, then you can build the executable ‘cex‘ with the
command
Similarly, with ‘npy.mod‘ and ‘libnpy.a‘ in the same directory as ‘fex.f95‘, build ‘fex‘ with the command
Section author: Unknown[5], VincentNijs, Unknown[56], FredericPetit, Elby, MartinSpacek, Unknown[57], Unknown[58], Unknown[53], AMArchibald, Unknown[59], Unknown[60], Unknown[61], Unknown[62], MikeToews
LAS reader
This cookbook example contains a module that implements a reader for a LAS (Log ASCII Standard) well log file
(LAS 2.0). See the Canadian Well Logging Society page about this format for more information.
#!python
"""LAS File Reader
The main class defined here is LASReader, a class that reads a LAS file
and makes the data available as a Python object.
"""
import re
import keyword
import numpy as np
def isidentifier(s):
    if s in keyword.kwlist:
        return False
    return re.match(r'^[a-z_][a-z0-9_]*$', s, re.I) is not None

def _convert_to_value(s):
    try:
        value = int(s)
    except ValueError:
        try:
            value = float(s)
        except ValueError:
            value = s
    return value
class LASError(Exception):
    pass
class LASItem(object):
    """This class is just a namespace, holding the attributes 'name',
    'units', 'data', 'value', and 'descr'. 'value' is the numerical
    value of 'data', if it has a numerical value (specifically, if
    int() or float() don't raise an exception when given the value
    of the 'data' attribute).
    """
    def __init__(self, name, units='', data='', descr=''):
        self.name = name
        self.units = units
        self.data = data
        self.value = _convert_to_value(data)
        self.descr = descr

    def __str__(self):
        s = ("name='%s', units='%s', data='%s', descr='%s'" %
             (self.name, self.units, self.data, self.descr))
        return s

    def __repr__(self):
        s = str(self)
        return "LASItem(%s)" % s

    @classmethod
    def from_line(cls, line):
        first, descr = line.rsplit(':', 1)
        descr = descr.strip()
        name, mid = first.split('.', 1)
        name = name.strip()
        if mid.startswith(' '):
            # No units
            units = ''
            data = mid
        else:
            units_data = mid.split(None, 1)
            if len(units_data) == 1:
                units = units_data[0]
                data = ''
            else:
                units, data = units_data
        return LASItem(name=name, units=units, data=data.strip(),
                       descr=descr.strip())
        try:
            row = _read_wrapped_row(f, ncols)
        except Exception:
            break
        data.append(tuple(row))
    data = np.array(data, dtype=dt)
    return data
class LASSection(object):
    """Represents a "section" of a LAS file.
    then the item may be referred to as `s.FD` (in addition to the longer
    `s.items['FD']`).

    items : dict
        The keys are the item names, and the values are LASItem instances.
    names : list
        List of item names, in the order they were read from the LAS file.
    """
    def __init__(self):
        # Note: In Python 2.7, 'items' could be an OrderedDict, and
        # then 'names' would not be necessary--one could use items.keys().
        self.items = dict()
        self.names = []

    def display(self):
        for name in self.names:
            item = self.items[name]
            namestr = name
            if item.units != '':
                namestr = namestr + (" (%s)" % item.units)
            print "%-16s %-30s [%s]" % (namestr, "'" + item.data + "'",
                                        item.descr)
class LASReader(object):
    """The LASReader class holds data from a LAS file.

    This reader only handles LAS 2.0 files (as far as I know).

    LASReader(f, null_subs=None)

    other : str
        Holds the contents of the '~O' section as a single string.
    wrap : bool
        True if the LAS file was wrapped. (More specifically, this
        attribute is True if the data field of the item with the
        name 'WRAP' in the '~V' section has the value 'YES'.)
    vers : str
        The LAS version. (More specifically, the value of the data
        field of the item with the name 'VERS' in the '~V' section).
    start_units : str
        Units of the 'STRT' item from the '~W' section.
        The value will be None if 'STRT' was not given in the file.
    stop : float
        Numerical value of the 'STOP' item from the '~W' section.
        The value will be None if 'STOP' was not given in the file.
    stop_units : str
        Units of the 'STOP' item from the '~W' section.
        The value will be None if 'STOP' was not given in the file.
    step : float
        Numerical value of the 'STEP' item from the '~W' section.
        The value will be None if 'STEP' was not given in the file.
    step_units : str
        Units of the 'STEP' item from the '~W' section.
        The value will be None if 'STEP' was not given in the file.
    """
If 'null_subs' is not None, its value replaces any values in the data
that matches the NULL value specified in the Version section of the LAS
file.
"""
self.null = None
self.null_subs = null_subs
self.start = None
self.start_units = None
self.stop = None
self.stop_units = None
self.step = None
self.step_units = None
self.version = LASSection()
self.well = LASSection()
self.curves = LASSection()
self.parameters = LASSection()
self.other = ''
self.data = None
self._read_las(f)
Returns a dictionary with keys 'V', 'W', 'C', 'P', 'O' and 'A',
corresponding to the sections of a LAS file. The values associated
with keys 'V', 'W', 'C' and 'P' will be lists of Item instances. The
value associated with the 'O' key is a list of strings. The value
associated with the 'A' key is a numpy structured array containing the
log data. The field names of the array are the mnemonics from the
Curve section of the file.
"""
        opened_here = False
        if isinstance(f, basestring):
            opened_here = True
            f = open(f, 'r')

        self.wrap = False

        line = f.readline()
        current_section = None
        current_section_label = ''
        while not line.startswith('~A'):
            if not line.startswith('#'):
                if line.startswith('~'):
                    if len(line) < 2:
                        raise LASError("Missing section character after '~'.")
                    current_section_label = line[1:2]
                    other = False
                    if current_section_label == 'V':
                        current_section = self.version
                    elif current_section_label == 'W':
                        current_section = self.well
                    elif current_section_label == 'C':
                        current_section = self.curves
                    elif current_section_label == 'P':
                        current_section = self.parameters
                    elif current_section_label == 'O':
                        current_section = self.other
                        other = True
                    else:
                        raise LASError("Unknown section '%s'" % line)
                elif current_section is None:
                    raise LASError("Missing first section.")
                else:
                    if other:
                        # The 'Other' section is just lines of text, so we
                        # assemble them into a single string.
                        self.other += line
                        current_section = self.other
                    else:
                        # Parse the line into a LASItem and add it to the
                        # current section.
                        m = LASItem.from_line(line)
                        current_section.add_item(m)
                        # Check for the required items whose values we'll
                        # store as attributes of the LASReader instance.
                        if current_section == self.version:
                            if m.name == 'WRAP':
                                if m.data.strip() == 'YES':
                                    self.wrap = True
                            if m.name == 'VERS':
                                self.vers = m.data.strip()
                        if current_section == self.well:
                            if m.name == 'NULL':
                                self.null = float(m.data)
                            elif m.name == 'STRT':
                                self.start = float(m.data)
                                self.start_units = m.units

        if opened_here:
            f.close()
if __name__ == "__main__":
    import sys
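A minimal sketch of how LASReader itself might be used (sample3.las is the attachment listed below; the 'DEPT' curve mnemonic is only an assumption about that file):

import numpy as np

las = LASReader('sample3.las', null_subs=np.nan)
print(las.vers, las.start, las.stop, las.step)
print(las.curves.names)      # curve mnemonics read from the '~C' section
depth = las.data['DEPT']     # assumed mnemonic; use a name from las.curves.names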
Attachments
• las.py
• sample3.las
• sample3plot.png
Reading SPE files

Some charge-coupled device (CCD) cameras (Princeton and the like) produce SPE files. This page suggests how to read such binary files with NumPy, but the code is not robust. The following code is only able to read files having the same format as the example, ‘lampe_dt.spe’ (unfortunately the only SPE file on the wiki).

Only NumPy is required for loading an SPE file; the result will be an array made of colors. The image size is stored at positions 42 and 656, and the data start at 4100. There are many other data in an SPE file header; one of them must be the data type (you are welcome to edit this page if you know where). Finally, note that here the image is always made of colors coded as unsigned 16-bit integers, but it might not be the case in your input file.
#!python numbers=disabled
# read_spe.py
import time

import numpy as N


class File(object):

    def _load_size(self):
        self._xdim = N.int64(self.read_at(42, 1, N.int16)[0])
        self._ydim = N.int64(self.read_at(656, 1, N.int16)[0])

    def _load_date_time(self):
        rawdate = self.read_at(20, 9, N.int8)
        rawtime = self.read_at(172, 6, N.int8)
        strdate = ''
        for ch in rawdate:
            strdate += chr(ch)
        for ch in rawtime:
            strdate += chr(ch)
        self._date_time = time.strptime(strdate, "%d%b%Y%H%M%S")

    def get_size(self):
        return (self._xdim, self._ydim)

    def load_img(self):
        img = self.read_at(4100, self._xdim * self._ydim, N.uint16)
        return img.reshape((self._ydim, self._xdim))

    def close(self):
        self._fid.close()


def load(fname):
    fid = File(fname)
    img = fid.load_img()
    fid.close()
    return img


if __name__ == "__main__":
    import sys
    img = load(sys.argv[-1])
The ‘read_spe.py’ script from above and the ‘lampe_dt.spe’ example are provided in the archive read_spe.zip. Once decompressed, you can then start ipython in the directory where the script lives:
The following first line will show the image in a new window. The second line will change the colormap (try ‘help(pylab.colormaps)’ for listing them).
#!python
>>> pylab.imshow(img)
>>> pylab.hot()
Attachments
• lampe_dt.png
• read_spe.zip
Reading mat files

Here are examples of how to read two variables lat and lon from a mat file called "test.mat".

Matlab up to 7.1

mat files created with Matlab up to version 7.1 can be read using the mio module, part of scipy.io. Reading structures (and arrays of structures) is supported; elements are accessed with the same syntax as in Matlab: after reading a structure called e.g. struct, its lat element can be obtained with struct.lat, or with getattr(struct, 'lat') if the element name comes from a string.
#!python
#!/usr/bin/env python
from scipy.io import loadmat
x = loadmat('test.mat')
lon = x['lon']
lat = x['lat']
# one-liner to read a single variable
lon = loadmat('test.mat')['lon']
Matlab 7.3 and greater

Beginning at release 7.3 of Matlab, mat files are actually saved using the HDF5 format by default (except if you use the -vX flag at save time; see help save in Matlab). These files can be read in Python using, for instance, the PyTables or h5py package. Reading Matlab structures in mat files does not seem supported at this point.
#!python
#!/usr/bin/env python
import tables
file = tables.openFile('test.mat')
lon = file.root.lon[:]
lat = file.root.lat[:]
# Alternate syntax if the variable name is in a string
varname = 'lon'
lon = file.getNode('/' + varname)[:]
hdf5 in Matlab
Python can save rich hierarchical datasets in hdf5 format. Matlab can read hdf5, but the api is so heavy it is almost
unusable. Here are some matlab scripts (written by Gaël Varoquaux) to load and save data in hdf5 format under Matlab
with the same signature as the standard matlab load/save function.
hdf5matlab.zip
These Matlab scripts cannot load every type allowed in hdf5. Feel free to provide python scripts to use pytables to
implement simple load/save functions compatible with this hdf5 subset.
One note: these scripts use the “Workspace” namespace to store some variables; they will pollute your workspace when saving data from Matlab. Nothing that I find unacceptable.
Here is a second HDF5 loader script, which loads (optionally partial) data from a HDF5 file to a Matlab structure
h5load.m
It can deal with more varied HDF5 datasets than the Matlab high-level functions (at least R2008a hdf5info fails with chunked compressed datasets), by using only the low-level HDF5 API.
The script also recognizes complex numbers in the Pytables format, and permutes array dimensions to match the logical order in the file (i.e. to match Python; the builtin Matlab functions by default return data in the opposite order, so the first dimension in Python would be the last in Matlab).
Attachments
• h5load.m
• hdf5matlab.zip
C Extensions for Using NumPy Arrays

I’ve written several C extensions that handle NumPy arrays. They are simple, but they seem to work well. They will show you how to pass Python variables and NumPy arrays to your C code. Once you learn how to do it, it’s pretty straightforward. I suspect they will suffice for most numerical code. I’ve written it up as a draft and have made the code and document file available. I found for my numerical needs I really only need to pass a limited set of things (integers, floats, strings, and NumPy arrays). If that’s your category, this code might help you.
I have tested the routines and so far, so good, but I cannot guarantee anything. I am a bit new to this. If you find any errors put up a message on the SciPy mailing list.
A link to the tar ball that holds the code and docs is given below.
I have recently updated some information and included more examples. The document presented below is the original
documentation which is still useful. The link below holds the latest documentation and source code.
• Cext_v2.tar.gz
– Lou Pecora
By Lou Pecora - 2006-12-07 (Draft version 0.1)
In my use of Python I came across a typical problem: I needed to speed up particular parts of my code. I am not
a Python guru or any kind of coding/computer guru. I use Python for numerical calculations and I make heavy use
of Numeric/NumPy. Almost every Python book or tutorial tells you to build C extensions to Python when you need a routine to run fast. C extensions are C code that can be compiled and linked to a shared library that can be imported like any Python module, and you can call specified C routines like they were Python functions.
Sounds nice, but I had reservations. It looked non-trivial (it is, to an extent). So I searched for other solutions. I found
them. They are such approaches as SWIG, Pyrex, ctypes, Psyco, and Weave. I often got the simple examples given
to work (not all, however) when I tried these. But I hit a barrier when I tried to apply them to NumPy. Then one gets
into typemaps or other hybrid constructs. I am not knocking these approaches, but I could never figure them out and
29
SciPy Cookbook, Release
get going on my own code despite lots of online tutorials and helpful suggestions from various Python support groups
and emailing lists.
So I decided to see if I could just write my own C extensions. I got help in the form of some simple C extension
examples for using Numeric written about 2000 from Tom Loredo of Cornell university. These sat on my hard drive
until 5 years later out of desperation I pulled them out and using his examples, I was able to quickly put together
several C extensions that (at least for me) handle all of the cases (so far) where I want a speedup. These cases mostly
involve passing Python integers, floats (=C doubles), strings, and NumPy 1D and 2D float and integer arrays. I rarely
need to pass anything else to a C routine to do a calculation. If you are in the same situation as me, then this package
I put together might help you. It turns out to be fairly easy once you get going.
Please note, Tom Loredo is not responsible for any errors in my code or instructions although I am deeply indebted to
him. Likewise, this code is for research only. It was tested by only my development and usage. It is not guaranteed,
and comes with no warranty. Do not use this code where there are any threats of loss of life, limb, property, or money
or anything you or others hold dear.
I developed these C extensions and their Python wrappers on a Macintosh G4 laptop using system OS X 10.4 (essentially BSD Unix), Python 2.4, NumPy 0.9x, and the gnu compiler and linker gcc. I think most of what I tell you here will
be easily translated to Linux and other Unix systems beyond the Mac. I am not sure about Windows. I hope that my
low-level approach will make it easy for Windows users, too.
The code (both C and Python) for the extensions may look like a lot, but it is very repetitious. Once you get the
main scheme for one extension function you will see that repeated over and over again in all the others with minor
variations to handle different arguments or return different objects to the calling routine. Don’t be put off by the code.
The good news is that for many numerical uses extensions will follow the same format so you can quickly reuse what
you already have written for new projects. Focus on one extension function and follow it in detail (in fact, I will do
this below). Once you understand it, the other routines will be almost obvious. The same is true of the several utility
functions that come with the package. They help you create, test, and manipulate the data and they also have a lot of
repetition. The utility functions are also very short and simple so nothing to fear there.
This will be covered in detail below, but first I wanted to give you a sense of how each extension is organized.
Three things must be done before your C extension functions in the C source file:
1. You must include Python and NumPy headers.
2. Each extension must be named in a defining structure at the beginning of the file. This is a name used to access
the extension from a Python function.
3. Next an initialization set of calls is made to set up the Python and NumPy calls and interface. It will be the same
for all extensions involving NumPy and Python unless you add extensions to access other Python packages or
classes beyond NumPy arrays. I will not cover any of that here (because I don’t know it). So the init calls can
be copied to each extension file.
Each C extension will have the following form.
• The arguments will always be the same: (PyObject *self, PyObject *args) - Don’t worry if you don’t
know what exactly these are. They are pointers to general Python objects and they are automatically provided
by the header files you will use from NumPy and Python itself. You need know no more than that.
• The args get processed by a function call that parses them and assigns them to C defined objects.
• Next the results of that parse might be checked by a utility routine that reaches into the structure representing
the object and makes sure the data is the right kind (float or int, 1D or 2D array, etc.). Although I included some
of these C-level checks, you will see that I think they are better done in Python functions that are used to wrap
the C extensions. They are also a lot easier to do in Python. I have plenty of data checks in my calling Python
wrappers. Usually this does not lead to much overhead since you are not calling these extensions billions of
times in some loop, but using them as a portal to a C or C++ routine to do a long, complex, repetitive, specialized
calculation.
• After some possible data checks, C data types are initialized to point to the data part of the NumPy arrays with
the help of utility functions.
• Next dimension information is extracted so you know the number of columns, rows, vector dimensions, etc.
• Now you can use the C arrays to manipulate the data in the NumPy arrays. The C arrays and C data from the
above parse point to the original Python/NumPy data so changes you make affect the array values when you go
back to Python after the extension returns. You can pass the arrays to other C functions that do calculations, etc.
Just remember you are operating on the original NumPy matrices and vectors.
• After your calculation you have to free any memory allocated in the construction of your C data for the NumPy
arrays. This is done again by Utility functions. This step is only necessary if you allocated memory to handle
the arrays (e.g. in the matrix routines), but is not necessary if you have not allocated memory (e.g. in the vector
routines).
• Finally, you return to the Python calling function, by returning a Python value or NumPy array. I have C
extensions which show examples of both.
It is best to call the C extensions by calling a Python function that then calls the extension. This is called a Python
wrapper function. It puts a more pythonic look to your code (e.g. you can use keywords easily) and, as I pointed out
above, allows you to easily check that the function arguments and data are correct before you hand them over to the C
extension and other C functions for that big calculation. It may seem like an unnecessary extra step, but it’s worth it.
In this section I refer to the code in the source files C_arraytest.h, C_arraytest.c, C_arraytest.py, and
C_arraytest.mak. You should keep those files handy (probably printed out) so you can follow the explanations
of the code below.
First, I will use the example of code from C_arraytest.h, C_arraytest.c for the routine called matsq. This function takes
a (NumPy) matrix A, integer i, and (Python) float y as input and outputs a return (NumPy) matrix B each of whose
components is equal to the square of the input matrix component times the integer times the float. Mathematically:

$B_{ij} = i \, y \, (A_{ij})^2$
The Python code to call the matsq routine is A=matsq(B,i,y). Here is the relevant code in one place:
The Header file, C_arraytest.h:
The Source file, C_arraytest.c:
Now, lets look at the source code in smaller chunks.
Headers
You must include the following headers with Python.h always the first header included.
#include "Python.h"
#include "arrayobject.h"
I also include the header C_arraytest.h which contains the prototype of the matsq function:
The static keyword in front of a function declaration makes this function private to your extension module. The linker just won’t see it. This way you can use the same intuitive function names (e.g. sum, check, trace) for all extension
modules without having name clashes between them at link time. The type of the function is PyObject * because it
will always be returning to a Python calling function so you can (must, actually) return a Python object. The arguments
are always the same,
The first one self is never used, but necessary because of how Python passes arguments. The second args is a pointer
to a Python tuple that contains all of the arguments (B,i,x) of the function.
Method definitions
This sets up a table of function names that will be the interface from your Python code to your C extension. The name
of the C extension module will be _C_arraytest (note the leading underscore). It is important to get the name right
each time it is used because there are strict requirements on using the module name in the code. The name appears
first in the method definitions table as the first part of the table name:
where I used ellipses (...) to ignore other code not relevant to this function. The METH_VARARGS parameter tells
the compiler that you will pass the arguments the usual way without keywords as in the example A=matsq(B,i,x)
above. There are ways to use Python keywords, but I have not tried them out. The table should always end with
{NULL, NULL} which is just a “marker” to note the end of the table.
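A sketch of that table, using only the names given in the text (the real table in C_arraytest.c also lists the other extension functions):

static PyMethodDef _C_arraytestMethods[] = {
    {"matsq", matsq, METH_VARARGS},   /* entries for the other extensions go here */
    {NULL, NULL}                      /* "marker" noting the end of the table */
};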
Initializations
These functions tell the Python interpreter what to call when the module is loaded. Note the name of the module
(_C_arraytest) must come directly after the init in the name of the initialization structure.
void init_C_arraytest() {
    (void) Py_InitModule("_C_arraytest", _C_arraytestMethods);
    import_array(); // Must be present for NumPy. Called first after above line.
}
The order is important and you must call these two initialization functions first.
Now here is the actual function that you will call from Python code. I will split it up and explain each section.
The function name and type:
The PyArrayObjects are structures defined in the NumPy header file and they will be assigned pointers to the
actual input and output NumPy arrays (A and B). The C arrays cin and cout are C pointers that will point (eventually)
to the actual data in the NumPy arrays and allow you to manipulate it. The variable dfactor will be the Python float
y, ifactor will be the Python int i, the variables i,j,n, and m will be loop variables (i and j) and matrix dimensions
(n= number of rows, m= number of columns) in A and B. The array dims will be used to access n and m from the
NumPy array. All this happens below. First we have to extract the input variables (A, i, y) from the args tuple. This is
done by the call,
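A sketch of that call, using the variable names introduced above (matin, ifactor, dfactor) and the "O!id" format string discussed next:

if (!PyArg_ParseTuple(args, "O!id",
                      &PyArray_Type, &matin, &ifactor, &dfactor))
    return NULL;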
The PyArg_ParseTuple function takes the args tuple and using the format string that appears next (“O!id” ) it
assigns each member of the tuple to a C variable. Note you must pass all C variables by reference. This is true even if
the C variable is a pointer to a string (see code in vecfcn1 routine). The format string tells the parsing function what
type of variable to use. The common variables for Python all have letter names (e.g. s for string, i for integer, d for
(double - the Python float)). You can find a list of these and many more in Guido’s tutorial (http://docs.python.org/ext/ext.html). For data types that are not in standard Python like the NumPy arrays you use the O! notation which tells
the parser to look for a type structure (in this case a NumPy structure PyArray_Type) to help it convert the tuple
member that will be assigned to the local variable ( matin ) pointing to the NumPy array structure. Note these are also
passed by reference. The order must be maintained and match the calling interface of the Python function you want.
The format string defines the interface and if you do not call the function from Python so the number of arguments
match the number in the format string, you will get an error. This is good since it will point to where the problem is.
If this doesn’t work we return NULL which will cause a Python exception.
Next we have a check that the input matrix really is a matrix of NumPy type double. This test is also done in the
Python wrapper for this C extension. It is better to do it there, but I include the test here to show you that you can
do testing in the C extension and you can “reach into” the NumPy structure to pick out its parameters. The utility
function not_doublematrix is explained later.
Here’s an example of reaching into the NumPy structure to get the dimensions of the matrix matin and assign them to
local variables as mentioned above.
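A sketch of what that looks like (n, m and dims are the local variables named earlier):

n = dims[0] = matin->dimensions[0];   /* number of rows */
m = dims[1] = matin->dimensions[1];   /* number of columns */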
Now we use these matrix parameters to generate a new NumPy matrix matout (our output) right here in our C extension. PyArray_FromDims(2,dims,NPY_DOUBLE) is a utility function provided by NumPy (not me) and its
arguments tell NumPy the rank of the NumPy object (2), the size of each dimension (dims), and the data type
(NPY_DOUBLE). Other examples of creating different NumPy arrays are in the other C extensions.
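As a sketch, the call that creates the output matrix would look roughly like this (the cast reflects how matout is declared as a PyArrayObject pointer):

matout = (PyArrayObject *) PyArray_FromDims(2, dims, NPY_DOUBLE);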
To actually do our calculations we need C structures to handle our data so we generate two C 2-dimensional arrays
(cin and cout) which will point to the data in matin and matout, respectively. Note, here memory is allocated since we
need to create an array of pointers to C doubles so we can address cin and cout like usual C matrices with two indices.
This memory must be released at the end of this C extension. Memory allocation like this is not always necessary.
See the routines for NumPy vector manipulation and treating NumPy matrices like contiguous arrays (as they are in
NumPy) in the C extension (the routine contigmat).
/* Change contiguous arrays into C ** arrays (Memory is Allocated!) */
cin=pymatrix_to_Carrayptrs(matin);
cout=pymatrix_to_Carrayptrs(matout);
Finally, we get to the point where we can manipulate the matrices and do our calculations. Here is the part where
the original equation operations $B_{ij} = i \, y \, (A_{ij})^2$ are carried out. Note, we are directly manipulating the data in the
original NumPy arrays A and B passed to this extension. So anything you do here to the components of cin or cout
will be done to the original matrices and will appear there when you return to the Python code.
/* Do the calculation. */
for ( i=0; i<n; i++) {
for ( j=0; j<m; j++) {
cout[i][j]= ifactor*dfactor*cin[i][j]*cin[i][j];
}
}
We are ready to go back to the Python calling routine, but first we release the memory we allocated for cin and cout.
/* Free memory, close file and return */
free_Carrayptrs(cin);
free_Carrayptrs(cout);
If you look at the other C extensions you can see that you can also return regular Python variables (like ints) using
another Python-provided function Py_BuildValue("i", 1) where the string “i” tells the function the data type
and the second argument (1 here) is the data value to be returned. If you decide to return nothing, you ‘’‘must’‘’ return
the Python keyword None like this:
Py_INCREF(Py_None);
return Py_None;
The Py_INCREF function increases the number of references to None (remember Python collects allocated memory
automatically when there are no more references to the data). You must be careful about this in the C extensions. For
more info see Guido’s tutorial.
Here are some quick descriptions of the matrix utility functions. They are pretty much self-explanatory. The vector
and integer array utility functions are very similar.
The first utility function is not used in any of the C extensions here, but I include it because a helpful person sent it
along with some code and it does show how one might convert a python object to a NumPy array. I have not tried it.
Use at your own risk.
The next one creates the C arrays that are used to point to the rows of the NumPy matrices. This allocates arrays of
pointers which point into the NumPy data. The NumPy data is contiguous and strides (m) are used to access each
row. This function calls ptrvector(n) which does the actual memory allocation. Remember to deallocate memory after
using this one.
double **pymatrix_to_Carrayptrs(PyArrayObject *arrayin) {
    double **c, *a;
    int i, n, m;

    n = arrayin->dimensions[0];
    m = arrayin->dimensions[1];
    c = ptrvector(n);
    a = (double *) arrayin->data;  /* pointer to arrayin data as double */
    for (i = 0; i < n; i++) {
        c[i] = a + i*m;
    }
    return c;
}
Here is where the memory for the C arrays of pointers is allocated. It’s a pretty standard memory allocator for arrays.
double **ptrvector(long n) {
    double **v;
    v = (double **)malloc((size_t) (n*sizeof(double)));
    if (!v) {
        printf("In **ptrvector. Allocation of memory for double array failed.");
        exit(0);
    }
    return v;
}
Note: There is a standard C-API for converting from Python objects to C-style arrays-of-pointers called
PyArray_AsCArray
Here is a utility function that checks to make sure the object produced by the parse is a NumPy matrix. You can see
how it reaches into the NumPy object structure.
int not_doublematrix(PyArrayObject *mat) {
    if (mat->descr->type_num != NPY_DOUBLE || mat->nd != 2) {
        PyErr_SetString(PyExc_ValueError,
            "In not_doublematrix: array must be of type Float and 2 dimensional (n x m).");
        return 1; }
    return 0;
}
As I mentioned in the introduction the functions are repetitious. All the other functions follow a very similar pattern.
They are given a line in the methods structure, they have the same arguments, they parse the arguments, they may
check the C structures after the parsing, they set up C variables to manipulate which point to the input data, they do
the actual calculation, they deallocate memory (if necessary) and they return something for Python (either None or a
Python object). I’ll just mention some of the differences in the code from the above matrix C extension matsq.
vecfcn1:
The format string for the parse function specifies that a variable from Python is a string (s).
if (!PyArg_ParseTuple(args, "O!O!sd", &PyArray_Type, &vecin,
&PyArray_Type, &vecout, &str, &dfac)) return NULL;
No memory is allocated in the pointer assignments for the local C arrays because they are already vectors.
cin=pyvector_to_Carrayptrs(vecin);
cout=pyvector_to_Carrayptrs(vecout);
rowx2:
In this routine we “pass back” the output using the fact that it is passed by reference in the argument tuple list and is
changed in place by the manipulations. Compare this to returning an array from the C extension in matsq. Either one
gets the job done.
contigmat:
Here the matrix data is treated like a long vector (just like stacking the rows end to end). This is useful if you have array
classes in C++ which store the data as one long vector and then use strides to access it like an array (two-dimensional,
three-dimensional, or whatever). Even though matin and matout are “matrices” we treat them like vectors and use the
vector utilities to get our C pointers cin and cout.
/* Change contiguous arrays into C * arrays pointers to PyArrayObject data */
cin=pyvector_to_Carrayptrs(matin);
cout=pyvector_to_Carrayptrs(matout);
For other utility functions note that we use different rank, dimensions, and NumPy parameters (e.g. NPY_LONG) to
tell the routines we are calling what the data types are.
The make file is very simple. It is written for Mac OS X 10.4, which is essentially BSD Unix.
# ---- Link ---------------------------
_C_arraytest.so: C_arraytest.o C_arraytest.mak
    gcc -bundle -flat_namespace -undefined suppress -o _C_arraytest.so C_arraytest.o

-I/Library/Frameworks/Python.framework/Versions/2.4/include/python2.4
-I/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/core/include/numpy/
The compile step is pretty standard. You do need to add paths to the Python headers:
-I/Library/Frameworks/Python.framework/Versions/2.4/include/python2.4
and paths to NumPy headers:
-I/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/
site-packages/numpy/core/include/numpy/
These paths are for a Framework Python 2.4 install on Mac OS X. You need to supply paths to the headers installed on your computer. They may be different. My guess is the gcc flags will be the same for the compile.
The link step produces the actual module (_C_arraytest.so) which can be imported to Python code. This is
specific to the Mac OS X system. On Linux or Windows you will need something different. I have been searching for generic examples, but I’m not sure what I found would work for most people, so I chose not to display the findings here. I cannot judge whether the code is good for those systems.
Note, again the name of the produced shared library ‘’‘must’‘’ match the name in the initialization and methods
definition calls in the C extension source code. Hence the leading underline in the name _C_arraytest.so.
Here’s my modified Makefile which compiles this code under Linux (save it as Makefile in the same directory, and then run ‘make’):
– PaulIvanov
# ---- Link ---------------------------
_C_arraytest.so: C_arraytest.o
gcc -shared C_arraytest.o -o _C_arraytest.so
Here as in the C code I will only show detailed description of one wrapper function and its use. There is so much
repetition that the other wrappers will be clear if you understand one. I will again use the matsq function. This is the
code that will first be called when you invoke the matsq function in your own Python code after importing the wrapper
module (C_arraytest.py) which automatically imports and uses ( in a way hidden from the user) the actual C
extensions in _C_arraytest.so (note the leading underscore which keeps the names separate).
imports:
Import the C extensions, NumPy, and the system module (used for the exit statement at the end which is optional).
import _C_arraytest
import numpy as NP
import sys
from C_arraytest import * # Note, NO underscore since you are importing the wrapper.
import numpy as NP
import sys
def mattest2():
    print "\n--- Test matsq ------------------------------"
    print " Each 2nd matrix component should = square of 1st matrix component x ifac x dfac \n"

if __name__ == '__main__':
    mattest2()
x= [[ 0. 1. 2. 3.]
[ 0. 3. 6. 9.]]
y= [[ 0. 3. 12. 27.]
[ 0. 27. 108. 243.]]
The output of all the test functions is in the file C_arraytest output.
This is the first draft explaining the C extensions I wrote (with help). If you have comments on the code, mistakes, etc., please post them on the pythonmac email list. I will see them.
I wrote the C extensions to work with NumPy although they were originally written for Numeric. If you must use
Numeric you should test them to see if they are compatible. I suspect names like NPY_DOUBLE, for example, will not be. I strongly suggest you upgrade to NumPy since it is the future of Numeric in Python. It’s worth the effort.
Note that this line, while in the header file above, is missing from the .h in the tar-ball.
– PaulIvanov
The output file name should be _C_arraytest_d.pyd for Debug version and _C_arraytest.pyd for Release version.
– Geoffrey Zhu
ptrvector() allocates n*sizeof(double), but should really allocate pointers to double; so: n*sizeof(double *)
– Peter Meerwald
In vecsq(), line 86, since you will be creating a 1-dimensional vector:
should be:
Attachments
• C_arraytest.c
• C_arraytest.c_v2
• C_arraytest.h
• C_arraytest.h_v2
• Cext_v2.tar.gz
C extensions
extmodule.h:
#ifndef EXTMODULE_H
#define EXTMODULE_H

#ifdef __cplusplus
extern "C" {
#endif

/* Python.h must be included before everything else */
#include "Python.h"

/* include system headers here */

#if !defined(EXTMODULE_IMPORT_ARRAY)
#define NO_IMPORT_ARRAY
#endif
#include "numpy/arrayobject.h"
Compiling extensions on Windows with MinGW

1. Python 2.5 (with earlier versions of Python, it is much more complicated to compile extensions with mingw. Use Visual Studio 2003 if you cannot upgrade to Python 2.5)
2. Local admin access on your machine
1. Go to www.mingw.org and go to the “downloads” section of the web-site. Look for the link to the sourceforge
page and go there. Grab the latest version of the “Automated MinGW Installer” and run the installer.
2. When asked “which MinGW package” you would like to install, select “Current” or “Candidate” depending on
whether you want to be on the bleeding edge or not.
3. When you get to the section where you need to select the various components you want to install, include the
“base tools” and the “g++ compiler”, and possibly the g77 compiler too if you are going to be compiling some
extensions that use Fortran.
4. Install it to "C:\MinGW" (the default location)
5. Add "C:\MinGW\bin" to your PATH environment variable if the installer did not do this for you. If you are running Vista, you will need to add "C:\MinGW\libexec\gcc\mingw32\3.4.5" to your PATH as well
6. In notepad (or your favourite text editor) create a new text file and enter the following in the file:

[build]
compiler = mingw32

7. Save the file as "C:\Python25\Lib\distutils\distutils.cfg". This will tell python to use the MinGW compiler when compiling extensions
8. Close down any command prompts you have open (they need to be reopened to see the new environment variable
values)
Note: this will only work for modules which have a setup.py script that uses distutils
1. So let's say you have downloaded code for some module to the folder "xyz". Open a command prompt and cd to this directory (e.g. type "c:" then press enter to change to the C drive, then type "cd c:\xyz" to change to directory xyz)
2. To create a binary installer, type “python setup.py bdist_wininst”
3. If everything worked fine, then there should now be a subfolder called “dist” which contains an exe file that you
can run to install the module.
Section author: MattKnox
Ctypes
ctypes is an advanced Foreign Function Interface package for Python 2.3 and higher. It is included in the standard
library for Python 2.5.
ctypes allows you to call functions exposed from DLLs/shared libraries and has extensive facilities to create, access and
manipulate simple and complicated C data types in Python - in other words: wrap libraries in pure Python. It is even
possible to implement C callback functions in pure Python.
ctypes also includes a code generator tool chain which allows automatic creation of library wrappers from C header
files. ctypes works on Windows, Mac OS X, Linux, Solaris, FreeBSD, OpenBSD and other systems.
Ensure that you have at least ctypes version 1.0.1 or later.
Other possibilities to call or run C code in python include: SWIG, Cython, Weave, cffi, etc.
The ctypes tutorial and the ctypes documentation for Python provide extensive information on getting started with
ctypes.
Assuming you’ve built a library called ‘foo.dll‘ or ‘libfoo.so‘ containing a function called ‘bar‘ that takes a pointer to
a buffer of doubles and an int as arguments and returns an int, the following code should get you up and running. The
following sections cover some possible build scripts, C code and Python code.
If you would like to build your DLL/shared library with distutils, take a look at the SharedLibrary distutils extension
included with OOF2. This should probably be included in numpy.distutils at some point.
Run nmake inside the Visual Studio Command Prompt to build with the following file.
You should be able to build the DLL with any version of the Visual Studio compiler regardless of the compiler used
to compile Python. Keep in mind that you shouldn’t allocate/deallocate memory across different debug/release and
single-threaded/multi-threaded runtimes or operate on FILE*s from different runtimes.
CXX = cl.exe
LINK = link.exe
all: foo.dll

foo.dll: foo.obj
    $(LINK) $(LINKFLAGS) foo.obj /OUT:foo.dll

clean:
    -erase /Q *.obj *.dll *.exp *.lib
SConstruct (GCC)
You can use the following file with SCons to build a shared library.
env = Environment()
env.Replace(CFLAGS=['-O2','-Wall','-ansi','-pedantic'])
env.SharedLibrary('foo', ['foo.cpp'])
foo.cpp
#include <stdio.h>
#ifdef FOO_DLL
#ifdef FOO_EXPORTS
#define FOO_API __declspec(dllexport)
#else
#define FOO_API __declspec(dllimport)
#endif /* FOO_EXPORTS */
#else
#define FOO_API extern /* XXX confirm this */
#endif /* FOO_DLL */
#ifdef __cplusplus
extern "C" {
#endif
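/* The body of bar itself is not shown in this recipe. A minimal sketch, assuming
   only the signature described above (a pointer to a buffer of doubles and an int,
   returning an int): */
FOO_API int bar(double *data, int len)
{
    int i;
    double sum = 0.0;
    for (i = 0; i < len; ++i)
        sum += data[i];
    printf("bar: sum of the %d doubles = %g\n", len, sum);
    return len;   /* e.g. report how many values were processed */
}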
#ifdef __cplusplus
}
#endif
When building the DLL for foo on Windows, define ‘FOO_DLL‘ and ‘FOO_EXPORTS‘ (this is what you want to do
when building a DLL for use with ctypes). When linking against the DLL, define ‘FOO_DLL‘. When linking against
a static library that contains foo, or when including foo in an executable, don’t define anything.
If you’re unclear about what extern "C" is for, read section 3 of the C++ dlopen mini HOWTO. This allows you to write function
wrappers with C linkage on top of a bunch of C++ classes so that you can use them with ctypes. Alternatively, you
might prefer to write C code.
foo.py
import numpy as N
import ctypes as C
_foo = N.ctypeslib.load_library('libfoo', '.')
_foo.bar.restype = C.c_int
_foo.bar.argtypes = [C.POINTER(C.c_double), C.c_int]
def bar(x):
return _foo.bar(x.ctypes.data_as(C.POINTER(C.c_double)), len(x))
x = N.random.randn(10)
n = bar(x)
In [18]: x = N.random.randn(2,3,4)
In [19]: x.ctypes.data
Out[19]: c_void_p(14394256)
In [21]: x.ctypes.data_as(ctypes.POINTER(c_double))
In [24]: x.ctypes.shape
Out[24]: <ctypes._endian.c_long_Array_3 object at 0x00DEF2B0>
In [25]: x.ctypes.shape[:3]
Out[25]: [2, 3, 4]
In [26]: x.ctypes.strides
Out[26]: <ctypes._endian.c_long_Array_3 object at 0x00DEF300>
In [27]: x.ctypes.strides[:3]
Out[27]: [96, 32, 8]
In general, a C function might take a pointer to the array’s data, an integer indicating the number of array dimensions,
(pass the value of the ndim property here) and two int pointers to the shapes and stride information.
If your C function assumes contiguous storage, you might want to wrap it with a Python function that calls !NumPy’s
‘ascontiguousarray‘ function on all the input arrays.
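For example, a sketch of such a wrapper around the bar function from foo.py above (bar_contiguous is a made-up name):

def bar_contiguous(x):
    # make sure the C code sees a contiguous buffer of C doubles
    x = N.ascontiguousarray(x, dtype=N.float64)
    return _foo.bar(x.ctypes.data_as(C.POINTER(C.c_double)), len(x))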
Starting with ctypes 0.9.9.9, any class implementing the from_param method can be used in the argtypes list of a function. Before ctypes calls a C function, it uses the argtypes list to check each parameter.
Using NumPy’s ndpointer function, some very useful argtypes classes can be constructed, for example:
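A sketch (the dtype, ndim and flags shown are just one plausible choice for the bar function above):

from numpy.ctypeslib import ndpointer

_foo.bar.argtypes = [ndpointer(dtype=N.float64, ndim=1, flags='C_CONTIGUOUS'),
                     C.c_int]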
Now, if an argument doesn’t meet the requirements, a TypeError is raised. This allows one to make sure that arrays passed to the C function are in a form that the function can handle.
See also the mailing list thread on ctypes and ndpointer.
ctypes supports the idea of callbacks, allowing C code to call back into Python through a function pointer. This is
possible because ctypes releases the Python Global Interpreter Lock (GIL) before calling the C function.
We can use this feature to allocate NumPy arrays if and when we need a buffer for C code to operate on. This could
avoid having to copy data in certain cases. You also don't have to worry about freeing the C data after you're done
with it. By allocating your buffers as NumPy arrays, the Python garbage collector can take care of this.
Python code:
from ctypes import *
ALLOCATOR = CFUNCTYPE(c_long, c_int, POINTER(c_int))
# load your library as lib
lib.baz.restype = None
lib.baz.argtypes = [c_float, c_int, ALLOCATOR]
This isn’t the prettiest way to define the allocator (I’m also not sure if c_long is the right return type), but there are a
few bugs in ctypes that seem to make this the only way at present. Eventually, we’d like to write the allocator like this
(but it doesn’t work yet):
from numpy.ctypeslib import ndpointer
ALLOCATOR = CFUNCTYPE(ndpointer('f4'), c_int, POINTER(c_int))
Possible failures include a SystemError exception being raised, the interpreter crashing or the interpreter hanging.
Check these mailing list threads for more details:
• Pointer-to-pointer unchanged when assigning in callback
• Hang with callback returning POINTER(c_float)
• Error with callback function and as_parameter with NumPy ndpointer
Time for an example. The C code for the example:
#ifndef CSPKREC_H
#define CSPKREC_H
#ifdef FOO_DLL
#ifdef FOO_EXPORTS
#define FOO_API __declspec(dllexport)
#else
#define FOO_API __declspec(dllimport)
#endif
#else
#define FOO_API
#endif
#endif
#include <stdio.h>
#ifdef __cplusplus
extern "C" {
#endif

/* The example body was lost in conversion. Judging from the Python code and
   the expected output below, foo() asks the allocator callback for a 2x3 float
   array and fills it with (i+1)*(j+1); a sketch: */
typedef void*(*allocator_t)(int, int*);

FOO_API void foo(allocator_t allocator)
{
    int i, j, dim = 2;
    int shape[] = {2, 3};
    float *data;

    printf("foo calling allocator\n");
    data = (float *) allocator(dim, shape);
    printf("allocator returned in foo\n");
    printf("data = %p\n", (void *) data);
    for (i = 0; i < shape[0]; i++)
        for (j = 0; j < shape[1]; j++)
            *data++ = (float) ((i + 1) * (j + 1));
}
#ifdef __cplusplus
}
#endif
Check the Function Pointer Tutorials if you're new to function pointers in C or C++. And the Python code:
from ctypes import *
import numpy as N

allocated_arrays = []
def allocate(dim, shape):
    print 'allocate called'
    x = N.zeros(shape[:dim], 'f4')
    allocated_arrays.append(x)
    ptr = x.ctypes.data_as(c_void_p).value
    print hex(ptr)
    print 'allocate returning'
    return ptr

lib = cdll['callback.dll']
lib.foo.restype = None
ALLOCATOR = CFUNCTYPE(c_long, c_int, POINTER(c_int))
lib.foo.argtypes = [ALLOCATOR]

# the actual call was lost in conversion; it presumably looked like this:
print 'calling foo'
lib.foo(ALLOCATOR(allocate))
print 'foo returned'

print allocated_arrays[0]
The allocate function creates a new NumPy array and puts it in a list so that we keep a reference to it after the callback
function returns. Expected output:
calling foo
foo calling allocator
allocate called
0xaf5778
allocate returning
allocator returned in foo
data = 0x00AF5778
foo returned
[[ 1. 2. 3.]
[ 2. 4. 6.]]
Here’s another idea for an Allocator class to manage this kind of thing. In addition to dimension and shape, this alloca-
tor function takes a char indicating what type of array to allocate. You can get these typecodes from the ndarrayobject.h
header, in the ‘NPY_TYPECHAR‘ enum.
class Allocator:
    CFUNCTYPE = CFUNCTYPE(c_long, c_int, POINTER(c_int), c_char)

    def __init__(self):
        self.allocated_arrays = []

    # the __call__ method was lost in conversion; a sketch consistent with the
    # description above (dimension, shape and a NumPy typecode char):
    def __call__(self, dims, shape, typecode):
        x = N.empty(shape[:dims], N.dtype(typecode))
        self.allocated_arrays.append(x)
        return x.ctypes.data_as(c_void_p).value

    def getcfunc(self):
        return self.CFUNCTYPE(self)
    cfunc = property(getcfunc)
Corresponding C code:
None of the allocators presented above are thread safe. If you have multiple Python threads calling the C code that
invokes your callbacks, you will have to do something a bit smarter.
Suppose you have a C function like the following, which operates on a pointer-to-pointers data structure.
You can create the necessary structure from an existing 2-D NumPy array using the following code:
`f4ptr*len(x)` creates a ctypes array type that is just large enough to contain a pointer to every row of the array.
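A sketch of that code (the array contents and the commented-out C function signature are only illustrative):

import numpy as N
from ctypes import POINTER, c_float, c_int

x = N.array([[1, 2, 3], [4, 5, 6]], dtype='f4')

f4ptr = POINTER(c_float)
# one pointer per row, collected in a ctypes array of pointers
rows = (f4ptr*len(x))(*[row.ctypes.data_as(f4ptr) for row in x])

# assuming a C function  void foo(float **rows, int nrows):
# lib.foo.argtypes = [POINTER(f4ptr), c_int]
# lib.foo(rows, len(x))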
Here’s a simple example when using heterogeneous dtypes (record arrays).
But, be warned that NumPy recarrays and corresponding structs in C *may not* be congruent.
Also structs are not standardized across platforms ...In other words, ‘’ be aware of padding issues!‘’
sample.c
#include <stdio.h>
SConstruct
env = Environment()
env.Replace(CFLAGS=['-O2','-Wall','-ansi','-pedantic'])
env.SharedLibrary('sample', ['sample.c'])
sample.py
import numpy as N
import ctypes as C
dat_dtype = N.dtype([('timestamp','i4'),('desc','|S12')])
# example records (the actual values were lost in conversion; these are illustrative)
dat = [(1, 'sunny'), (2, 'rain'), (3, 'fog')]
arr = N.rec.fromrecords(dat, dtype=dat_dtype)

_sample = N.ctypeslib.load_library('libsample', '.')
_sample.print_weather.restype = None
_sample.print_weather.argtypes = [N.ctypeslib.ndpointer(dat_dtype, flags='aligned,contiguous'), C.c_int]

def print_weather(x):
    _sample.print_weather(x, x.size)

if __name__ == '__main__':
    print_weather(arr)
The following was tested and works on Windows (using MinGW) and GNU/Linux 32-bit OSs (last tested 13-08-2009).
Copy all three files to the same directory.
The C code (this calculates the Fibonacci number recursively):
/*
Filename: fibonacci.c
To be used with fibonacci.py, as an imported library. Use Scons to compile,
simply type 'scons' in the same directory as this file (see www.scons.org).
*/
/* Function prototypes */
int fib(int a);
void fibseries(int *a, int elements, int *series);
void fibmatrix(int *a, int rows, int columns, int *matrix);
int fib(int a)
{
    if (a <= 0) /* Error -- wrong input will return -1. */
        return -1;
    else if (a==1)
        return 0;
    else if ((a==2)||(a==3))
        return 1;
    else
        return fib(a - 2) + fib(a - 1);
}

/* The bodies of the two array routines were lost in conversion; as the Python
   docstrings below describe, they simply apply fib() element-wise: */
void fibseries(int *a, int elements, int *series)
{
    int i;
    for (i = 0; i < elements; i++) series[i] = fib(a[i]);
}

void fibmatrix(int *a, int rows, int columns, int *matrix)
{
    int i;
    for (i = 0; i < rows * columns; i++) matrix[i] = fib(a[i]);
}
"""
Filename: fibonacci.py
Demonstrates the use of ctypes with three functions:
(1) fib(a)
(2) fibseries(b)
(3) fibmatrix(c)
"""
import numpy as nm
import ctypes as ct

_libfibonacci = nm.ctypeslib.load_library('libfibonacci', '.')

# The declarations below were lost in conversion; this is a minimal
# reconstruction matching the C prototypes above.
_libfibonacci.fib.restype = ct.c_int
_libfibonacci.fib.argtypes = [ct.c_int]
_libfibonacci.fibseries.restype = None
_libfibonacci.fibseries.argtypes = [nm.ctypeslib.ndpointer(dtype=nm.intc),
                                    ct.c_int,
                                    nm.ctypeslib.ndpointer(dtype=nm.intc)]
_libfibonacci.fibmatrix.restype = None
_libfibonacci.fibmatrix.argtypes = [nm.ctypeslib.ndpointer(dtype=nm.intc),
                                    ct.c_int, ct.c_int,
                                    nm.ctypeslib.ndpointer(dtype=nm.intc)]
def fib(a):
    """Compute the n'th Fibonacci number.

    ARGUMENT(S):
        An integer.

    RESULT(S):
        The n'th Fibonacci number.

    EXAMPLE(S):
    >>> fib(8)
    13
    >>> fib(23)
    17711
    >>> fib(0)
    -1
    """
    return _libfibonacci.fib(int(a))
def fibseries(b):
    """Compute an array containing the n'th Fibonacci number of each entry.

    ARGUMENT(S):
        A list or NumPy array (dim = 1) of integers.

    RESULT(S):
        NumPy array containing the n'th Fibonacci number of each entry.

    EXAMPLE(S):
    >>> fibseries([1,2,3,4,5,6,7,8])
    array([ 0,  1,  1,  2,  3,  5,  8, 13])
    >>> fibseries(range(1,12))
    array([ 0,  1,  1,  2,  3,  5,  8, 13, 21, 34, 55])
    """
    b = nm.asarray(b, dtype=nm.intc)
    result = nm.empty(len(b), dtype=nm.intc)
    _libfibonacci.fibseries(b, len(b), result)
    return result
def fibmatrix(c):
    """Compute a matrix containing the n'th Fibonacci number of each entry.

    ARGUMENT(S):
        A nested list or NumPy array (dim = 2) of integers.

    RESULT(S):
        NumPy array containing the n'th Fibonacci number of each entry.

    EXAMPLE(S):
    >>> from numpy import array
    >>> fibmatrix([[3,4],[5,6]])
    array([[1, 2],
           [3, 5]])
    >>> fibmatrix(array([[1,2,3],[4,5,6],[7,8,9]]))
    array([[ 0,  1,  1],
           [ 2,  3,  5],
           [ 8, 13, 21]])
    """
    tmp = nm.asarray(c)
    rows, cols = tmp.shape
    c = tmp.astype(nm.intc)
    result = nm.empty(c.shape, dtype=nm.intc)
    _libfibonacci.fibmatrix(c, rows, cols, result)
    return result
env = Environment()
env.Replace(CFLAGS=['-O2', '-Wall', '-ansi', '-pedantic'])
env.SharedLibrary('libfibonacci', ['fibonacci.c'])
F2py
This page provides examples on how to use the F2py fortran wrapping program.
It is possible to start with simple routines or even to wrap full Fortran modules. F2py is used in SciPy itself and you
can find some examples in the source code of SciPy.
* Then one has to download the fortran file for the lapack routine of interest.
* Generate library
Then modify
Create the module:
Simple test:
import hermite
hermite.vahepo(2,2.0)
import scipy
scipy.special.hermite(2)(2.0)
A more complicated example about how to wrap routines for band matrices can be found at http://www.physik.
tu-dresden.de/~baecker/comp_talks.html under “Python and Co - some recent developments”.
f2py is also capable of handling C code. An example is found on the wiki: [”Cookbook/f2py_and_NumPy”].
http://websrv.cs.umt.edu/isis/index.php/F2py_example
Section author: DavidCooke, mauro, Pierre_deBuyl
Python and Numpy are designed to express general statements that work transparently on many sizes of incoming
data. Using inline Weave with Blitz conversion can dramatically speed up many numerical operations (e.g., addition of
a series of arrays) because in some ways it bypasses generality. How can you speed up your algorithms with inline C
code while maintaining generality? One tool provided by Numpy is the iterator. Because an iterator keeps track of
memory indexing for you, its operation is very analogous to the concept of iteration in Python itself. You can write a
loop in C that simply says "take the next element in the serial object (the PyArrayObject) and operate on it, until there
are no more elements."
This is a very simple example of multi-dimensional iterators, and their power to "broadcast" arrays of compatible
shapes. It shows that the very same code that is entirely ignorant of dimensionality can achieve completely different
computations based on the rules of broadcasting. I have assumed in this case that *a* has at least as many dimensions
as *b*. It is important to know that the weave array conversion of *a* gives you access in C++ to:
• py_a -- PyObject*
• a_array -- PyArrayObject*
• a -- (c-type *) py_array->data
import numpy as npy
from scipy.weave import inline

def multi_iter_example():
    # 'code' should hold the inline C++ string that multiplies a by b using
    # numpy's multi-iterator; the original string was lost in conversion.
    a = npy.ones((4,4), npy.float64)
    # for the sake of driving home the "dynamic code" approach...
    dtype2ctype = {
        npy.dtype(npy.float64): 'double',
        npy.dtype(npy.float32): 'float',
        npy.dtype(npy.int32): 'int',
        npy.dtype(npy.int16): 'short',
    }
    dt = dtype2ctype.get(a.dtype)

    b = npy.arange(4, dtype=a.dtype)
    print '\n A B '
    print a, b
    # this reshaping is redundant, it would be the default broadcast
    b.shape = (1,4)
    inline(code, ['a', 'b'])
    print "\ninline version of a*b[None,:],"
    print a

    a = npy.ones((4,4), npy.float64)
    b = npy.arange(4, dtype=a.dtype)
    b.shape = (4,1)
    inline(code, ['a', 'b'])
    print "\ninline version of a*b[:,None],"
    print a
The docstring for inline is enormous, and indicates that all kinds of compiling options are supported when integrating
your inline code. I’ve taken advantage of this to make some specialized FFTW calls a lot more simple, and in only
a few additional lines add support for inline FFTs. In this example, I read in a file of pure C code and use it as
*support_code* in my inline statement. I also use a tool from Numpy’s distutils to locate my FFTW libraries and
headers.
import numpy as N
from scipy.weave import inline
from os.path import join, split
from numpy.distutils.system_info import get_info
fft1_code = \
"""
char *i, *o;
i = (char *) a;
o = inplace ? i : (char *) b;
if(isfloat) {
cfft1d(reinterpret_cast<fftwf_complex*>(i),
reinterpret_cast<fftwf_complex*>(o),
xdim, len_array, direction, shift);
} else {
zfft1d(reinterpret_cast<fftw_complex*>(i),
reinterpret_cast<fftw_complex*>(o),
xdim, len_array, direction, shift);
}
"""
extra_code = open(join(split(__file__)[0],'src/cmplx_fft.c')).read()
fftw_info = get_info('fftw3')
Attachments
• fftmod.tar.gz
• iterators.py
• iterators_example.py
Please note that the code described here is slightly out of date, since today cython is the actively maintained version
of Pyrex, and numpy now ships with Cython examples.
Rather than maintaining both the wiki and the source dir, we’ll continue to update the sources, kept here.
Pyrex is a language for writing C extensions to Python. Its syntax is very similar to writing Python. A .pyx file is compiled
to a .c file, which is then compiled like a standard C extension module for Python. Many people find writing extension
modules with Pyrex preferable to writing them in C or using other tools, such as SWIG.
This page is a starting point for accessing numpy arrays natively with Pyrex. Please note that with current versions of
NumPy (SVN), the directory contains a complete working example with the code in this page, including also a proper
file so you can install it with the standard Python mechanisms. This should help you get up and running quickly.
Here’s a file I call “c_python.pxd”:
cimport c_python
cimport c_numpy
cimport c_python
import numpy
c_numpy.import_array()
print '-='*10
print 'printing array info for ndarray at 0x%0lx'%(<c_python.Py_intptr_t>arr,)
print 'print number of dimensions:',arr.nd
print 'address of strides: 0x%0lx'%(<c_python.Py_intptr_t>arr.strides,)
print 'strides:'
for i from 0<=i<arr.nd:
# print each stride
print ' stride %d:'%i,<c_python.Py_intptr_t>arr.strides[i]
print 'memory dump:'
print_elements( arr.data, arr.strides, arr.dimensions, arr.nd, sizeof(double),
˓→arr.dtype )
print '-='*10
print
if nd ==0:
if dtype==numpy.dtype(numpy.object_):
elptr = (<void**>data)[0] #[0] dereferences pointer in Pyrex
print ' ',<object>elptr
elif dtype==numpy.dtype(numpy.float64):
print ' ',(<double*>data)[0]
elif nd == 1:
for i from 0<=i<dimensions[0]:
if dtype==numpy.dtype(numpy.object_):
elptr = (<void**>data)[0]
print ' ',<object>elptr
elif dtype==numpy.dtype(numpy.float64):
print ' ',(<double*>data)[0]
data = data + strides[0]
else:
for i from 0<=i<dimensions[0]:
print_elements(data, strides+1, dimensions+1, nd-1, elsize, dtype)
data = data + strides[0]
def test():
"""this function is pure Python"""
arr1 = numpy.array(-1e-30,dtype=numpy.Float64)
arr2 = numpy.array([1.0,2.0,3.0],dtype=numpy.Float64)
arr3 = numpy.arange(9,dtype=numpy.Float64)
arr3.shape = 3,3
four = 4
arr4 = numpy.array(['one','two',3,four],dtype=numpy.object_)
Now, if you compile and install the above test.pyx, the output of test() should be something like the following:
-=-=-=-=-=-=-=-=-=-=
printing array info for ndarray at 0x8184508
print number of dimensions: 0
address of strides: 0xb764f7ec
strides:
memory dump:
-1e-30
-=-=-=-=-=-=-=-=-=-=
-=-=-=-=-=-=-=-=-=-=
printing array info for ndarray at 0x8190060
print number of dimensions: 1
address of strides: 0x818453c
strides:
stride 0: 8
memory dump:
1.0
2.0
3.0
-=-=-=-=-=-=-=-=-=-=
-=-=-=-=-=-=-=-=-=-=
printing array info for ndarray at 0x82698a0
print number of dimensions: 2
address of strides: 0x8190098
strides:
stride 0: 24
stride 1: 8
memory dump:
0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
-=-=-=-=-=-=-=-=-=-=
-=-=-=-=-=-=-=-=-=-=
printing array info for ndarray at 0x821d6e0
print number of dimensions: 1
address of strides: 0x818ed74
strides:
stride 0: 4
memory dump:
one
two
3
4
-=-=-=-=-=-=-=-=-=-=
-=-=-=-=-=-=-=-=-=-=
printing array info for ndarray at 0x821d728
print number of dimensions: 1
address of strides: 0x821d75c
strides:
stride 0: 4
memory dump:
print_elements() not (yet) implemented for dtype int32
-=-=-=-=-=-=-=-=-=-=
The pytables project (http://pytables.sourceforge.net/) makes extensive use of Pyrex and numarray. See the pytables
source code for more ideas.
See also: ["Cookbook/ArrayStruct_and_Pyrex"]
Section author: AndrewStraw, DavidLinke, Unknown[138], Unknown[25], FernandoPerez
These are simple NumPy and SWIG examples which use the numpy.i interface file. There is also a MinGW section
for people who may want to use these in a Win32 environment. The following code is C, rather than C++.
The information contained here was first made available by Bill Spotz in his article numpy.i: a SWIG Interface File
for NumPy, and the NumPy SVN, which can be checked out using the following command:
* These two files (and others) are also available in the numpy source tarball:
  http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=175103
Check that both gcc and SWIG are available (paths known):
swig -version
and
gcc -v
This is from my own tests, running SWIG Version 1.3.36 and gcc version 3.4.5 (mingw-vista special r3). I had to
remove the ‘static’ statements from the source, otherwise your SWIGed sources won’t compile. There are only two
‘static’ statements in the file, both will need removing. Here is my modified version: pyfragments.swg
A setup.py file specific to each module must be written first. I based mine on the reference setup.py available in
http://scipy.org/svn/numpy/trunk/doc/swig/test/ with added automatic handling of swig.
On a un*x like system, the command-line is:

python setup.py build

In a Win32 environment (either cygwin or cmd), the setup command-line is (for use with MinGW):

python setup.py build --compiler=mingw32
The command handles both the SWIG process (generation of wrapper C and Python code) and gcc compilation. The
resulting module (a pyd file) is built in the ‘build\lib.XXX‘ directory (e.g. for a Python 2.5 install and on a Win32
machine, the ‘build\lib.win32-2.5‘ directory).
This is a re-implementation of the range function. The module is called ezrange. One thing to remember with
`ARGOUT_ARRAY1` is that the dimension of the array must be passed from Python.
From Bill Spotz’s article: The python user does not pass these arrays in, they simply get returned. For the case where
a dimension is specified, the python user must provide that dimension as an argument.
This is useful for functions like ‘numpy.arange(N)‘, for which the size of the returned array is known in advance and
passed to the C function.
For functions that follow ‘array_out = function(array_in)‘ where the size of array_out is not known in advance and
depends on memory allocated in C, see the example given in [:Cookbook/SWIG Memory Deallocation].
%module ezrange
%{
#define SWIG_FILE_WITH_INIT
#include "ezrange.h"
%}
%include "numpy.i"
%init %{
import_array();
%}
%include "ezrange.h"
Don’t forget that you will also need the numpy.i file in the same directory.
#! /usr/bin/env python
# System imports
from distutils.core import *
from distutils import sysconfig
import numpy

# Obtain the numpy include directory. This logic works across numpy versions.
try:
    numpy_include = numpy.get_include()
except AttributeError:
    numpy_include = numpy.get_numpy_include()
# ezrange setup
# (the Extension definition was lost in conversion; a minimal version
# following the other examples in this recipe would be)
_ezrange = Extension("_ezrange",
                     ["ezrange.i", "ezrange.c"],
                     include_dirs=[numpy_include])

setup(name        = "range function",
      description = "range takes an integer and returns an n element int array where each element is equal to its index",
      ext_modules = [_ezrange])

python setup.py build

or

python setup.py build --compiler=mingw32
If everything goes according to plan, there should be a ‘_ezrange.pyd‘ file available in the ‘build\lib.XXX‘ directory.
You will need to copy the file in the directory where the ‘ezrange.py‘ file is (generated by swig), in which case, the
following will work (in python):
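A quick interactive check might look like this (the wrapped function is assumed here to be called range, mirroring the description above; check ezrange.h for the actual name):

import ezrange

a = ezrange.range(10)   # the dimension is passed in from Python
print(a)                # expected: an int array [0 1 2 ... 9]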
This example doubles the elements of the 1-D array passed to it. The operation is done in-place, which means that the
array passed to the function is changed.
%{
#define SWIG_FILE_WITH_INIT
#include "inplace.h"
%}
%include "numpy.i"
%init %{
import_array();
%}
# System imports
from distutils.core import *
from distutils import sysconfig
# Obtain the numpy include directory. This logic works across numpy versions.
try:
numpy_include = numpy.get_include()
except AttributeError:
numpy_include = numpy.get_numpy_include()
# NumyTypemapTests setup
setup( name = "inplace function",
description = "inplace takes a double array and doubles each of its elements
˓→in-place.",
or
If everything goes according to plan, there should be a ‘_inplace.pyd‘ file available in the ‘build\lib.XXX‘ directory.
You will need to copy the file in the directory where the ‘inplace.py‘ file is (generated by swig), in which case, the
following will work (in python):
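A quick check might look like this (the wrapped function is assumed to be called inplace, matching the module name; check inplace.h for the actual name):

import numpy as np
import inplace

a = np.array([1.0, 2.0, 3.0])
inplace.inplace(a)
print(a)                # expected: [ 2.  4.  6.] -- the array was changed in place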
Please note, Bill Spotz advises against the use of argout_view arrays, unless absolutely necessary:

Argoutview arrays are for when your C code provides you with a view of its internal data and does not
require any memory to be allocated by the user. This can be dangerous. There is almost no way to
guarantee that the internal data from the C code will remain in existence for the entire lifetime of the
NumPy array that encapsulates it. If the user destroys the object that provides the view of the data before
destroying the NumPy array, then using that array may result in bad memory references or segmentation
faults. Nevertheless, there are situations, working with large data sets,
where you simply have no other choice.
Python does not take care of memory de-allocation, as stated here by Travis Oliphant:

The tricky part, however, is memory management. How does the memory get deallocated? The
suggestions have always been something similar to "make sure the memory doesn't get deallocated before
the NumPy array disappears." This is nice advice, but not generally helpful as it basically just tells you to
create a memory leak.
Memory deallocation is also difficult to handle automatically as there is no easy way to do module "finalization".
There is a `Py_InitModule()` function, but nothing to handle deletion/destruction/finalization (this will be addressed
in Python 3000 as stated in PEP 3121). In my example, I use the python module atexit, but there must be a better way.
Having said all that, if you have no other choice, here is an example that uses ARGOUTVIEW_ARRAY1. As usual,
comments welcome!
The module declares a block of memory and a couple of functions:
* ezview.set_ones() sets all the elements (doubles) in the memory block to one and returns a numpy array that is a VIEW of the memory block.
* ezview.get_view() simply returns a view of the memory block.
* ezview.finalize() takes care of the memory deallocation (this is the weak part of this example).
#include "ezview.h"
if (array == NULL)
return;
for (i=0;i<n;i++)
array[i] = 1.;
}
%{
#define SWIG_FILE_WITH_INIT
#include "ezview.h"
void __call_at_begining()
{
printf("__call_at_begining...\n");
my_array = (double *)malloc(my_n*sizeof(double));
}
void __call_at_end(void)
{
printf("__call_at_end...\n");
if (my_array != NULL)
free(my_array);
}
%}
%include "numpy.i"
%init %{
import_array();
__call_at_begining();
%}
%include "ezview.h"
%rename (set_ones) my_set_ones;
%inline %{
void finalize(void){
__call_at_end();
}
Don’t forget that you will also need the numpy.i file in the same directory.
# System imports
from distutils.core import *
from distutils import sysconfig
# Obtain the numpy include directory. This logic works across numpy versions.
try:
numpy_include = numpy.get_include()
except AttributeError:
numpy_include = numpy.get_numpy_include()
# NumyTypemapTests setup
setup( name = "ezview module",
description = "ezview provides 3 functions: set_ones(), get_view() and
˓→finalize(). set_ones() and get_view() provide a view on a memory block allocated in
python setup.py build

or
python setup.py build --compiler=mingw32
If everything goes according to plan, there should be a ‘_ezview.pyd‘ file available in the ‘build\lib.XXX‘ directory.
You will need to copy the file in the directory where the ‘ezview.py‘ file is (generated by swig), in which case, the
following will work (in python):
The test code test_ezview.py follows:
import atexit
import numpy
print "first message is from __call_at_begining()"
import ezview
a = ezview.set_ones()
print "\ncalling ezview.set_ones() - now the memory block is all ones.\nReturned
˓→array (a view on the allocated memory block) is:"
print a
a[:] = numpy.arange(a.shape[0])
print a
print "\nwe're now deleting the array - this only deletes the view,\nnot the
˓→allocated memory!"
del a
print "\nlet's get a new view on the allocated memory, should STILL contain [0,1,2,3..
˓→.]"
b = ezview.get_view()
print b
print "\nnext message from __call_at_end() - finalize() registered via module atexit"
let's get a new view on the allocated memory, should STILL contain [0,1,2,3...]
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
I have been testing this for a few months now and this is the best I've come up with. If anyone knows of a better way,
please let me know.
From the opengroup website, the lvalue errno is used by many functions to return error values. The idea is that the
global variable errno is set to 0 before a function is called (in swig parlance: $action), and checked afterwards. If errno
is non-zero, a python exception with a meaningful message is generated, depending on the value of errno.
The code below comprises two examples: the first uses errno to check whether an array index is
valid; the second uses errno to notify the user of a malloc() problem.
#include <stdlib.h>
#include <errno.h>
#include "ezerr.h"
value = array[index];
end:
return value;
}
end:
return;
}
%module ezerr
%{
#include <errno.h>
#include "ezerr.h"
#define SWIG_FILE_WITH_INIT
%}
%include "numpy.i"
%init %{
import_array();
%}
%exception
{
errno = 0;
$action
if (errno != 0)
{
switch(errno)
{
case EPERM:
PyErr_Format(PyExc_IndexError, "Index out of range");
break;
case ENOMEM:
PyErr_Format(PyExc_MemoryError, "Failed malloc()");
break;
default:
PyErr_Format(PyExc_Exception, "Unknown exception");
}
SWIG_fail;
}
}
%include "ezerr.h"
Note the SWIG_fail, which is a macro for goto fail in case there is any other cleanup code to execute (thanks Bill!).
Don’t forget that you will also need the numpy.i file in the same directory.
# System imports
from distutils.core import *
from distutils import sysconfig
# Obtain the numpy include directory. This logic works across numpy versions.
try:
numpy_include = numpy.get_include()
except AttributeError:
numpy_include = numpy.get_numpy_include()
extra_compile_args = ["--verbose"]
)
# NumyTypemapTests setup
setup( name = "err test",
description = "A simple test to demonstrate the use of errno and python
˓→exceptions",
or
If everything goes according to plan, there should be a ‘_ezerr.pyd‘ file available in the ‘build\lib.XXX‘ directory. You
will need to copy the file in the directory where the ‘ezerr.py‘ file is (generated by swig), in which case, the following
will work (in python):
The test code test_err.py follows:
import traceback,sys
import numpy
import ezerr
for i in indexes:
try:
value = ezerr.val(a,i)
except:
print ">> failed for index=%d" % i
traceback.print_exc(file=sys.stdout)
else:
print "using ezerr.val() a[%d]=%d - should be %d" % (i,value,a[i])
for n in amounts:
try:
ezerr.alloc(n)
except:
print ">> could not allocate %d bytes" % n
traceback.print_exc(file=sys.stdout)
else:
print "allocated (and deallocated) %d bytes" % n
The last example given in Bill Spotz's article is for a dot product function. Here is a fleshed-out version.

/* the function header was lost in conversion; from the wrapper below the
   prototype is: double dot(int len, double *vec1, double *vec2) */
double dot(int len, double *vec1, double *vec2)
{
    int i;
    double d;

    d = 0;
    for(i=0;i<len;i++)
        d += vec1[i]*vec2[i];
    return d;
}
%{
#define SWIG_FILE_WITH_INIT
#include "dot.h"
%}
%include "numpy.i"
%init %{
import_array();
%}
%apply (int DIM1, double* IN_ARRAY1) {(int len1, double* vec1), (int len2, double*
˓→vec2)}
%include "dot.h"
%rename (dot) my_dot;
%inline %{
double my_dot(int len1, double* vec1, int len2, double* vec2) {
if (len1 != len2) {
PyErr_Format(PyExc_ValueError, "Arrays of lengths (%d,%d) given", len1, len2);
return 0.0;
}
return dot(len1, vec1, vec2);
}
%}
#! /usr/bin/env python
# System imports
from distutils.core import *
from distutils import sysconfig
import numpy

# Obtain the numpy include directory. This logic works across numpy versions.
try:
    numpy_include = numpy.get_include()
except AttributeError:
    numpy_include = numpy.get_numpy_include()

# dot extension module (this definition was lost in conversion; a minimal
# version following the numpy.i examples is)
_dot = Extension("_dot",
                 ["dot.i", "dot.c"],
                 include_dirs=[numpy_include])

# dot setup
setup(name        = "Dot product",
      description = "Function that performs a dot product (numpy.i: a SWIG Interface File for NumPy)",
      author      = "Egor Zindy (based on the setup.py file available in the numpy tree)",
      version     = "1.0",
      ext_modules = [_dot])
python setup.py build

or

python setup.py build --compiler=mingw32
Testing
If everything goes according to plan, there should be a ‘_dot.pyd‘ file available in the ‘build\lib.XXX‘ directory. You
will need to copy the file in the directory where the ‘dot.py‘ file is (generated by swig), in which case, the following
will work (in python):
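For instance (assuming the module built above is on the Python path; dot.dot is the name exposed by the %rename directive in the interface file):

import numpy as np
import dot

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(dot.dot(a, b))     # 1*4 + 2*5 + 3*6 = 32.0

# arrays of different lengths raise the ValueError set up in the %inline wrapper:
# dot.dot(a, np.ones(4))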
Please note that with current versions of NumPy (SVN), the directory contains a complete working example with
simple SWIG typemaps, including also a proper file so you can install it with the standard Python mechanisms. This
should help you get up and running quickly.
To get the feel how to write a truly minimalist interface, below is a relevant part of the simple SWIG interface file
umfpack.i (this is for SWIG < version 1.3.29) used to wrap the UMFPACK sparse linear solver libraries. The full
interface can be found in the directory in the SciPy SVN repository. If you’re using SWIG > version 1.3.29, refer to
the file in SciPy SVN repository, which is slightly different.
/*!
Gets PyArrayObject from a PyObject.
*/
PyArrayObject *helper_getCArrayObject( PyObject *input, int type,
int minDim, int maxDim ) {
PyArrayObject *obj;
if (PyArray_Check( input )) {
obj = (PyArrayObject *) input;
if (!PyArray_ISCARRAY( obj )) {
PyErr_SetString( PyExc_TypeError, "not a C array" );
return NULL;
}
obj = (PyArrayObject *)
PyArray_ContiguousFromAny( input, type, minDim, maxDim );
if (!obj) return NULL;
} else {
PyErr_SetString( PyExc_TypeError, "not an array" );
return NULL;
}
return obj;
}
%}
/*!
Use for arrays as input arguments. Could be also used for changing an array
in place.
Attachments
• umfpack.i
Recipe description
This cookbook recipe describes the automatic deallocation of memory blocks allocated via ‘malloc()‘ calls in C, when
the corresponding Python numpy array objects are destroyed. The recipe uses SWIG and a modified ‘numpy.i‘ helper
file.
To be more specific, new fragments were added to the existing ‘numpy.i‘ to handle automatic deallocation of arrays,
the size of which is not known in advance. As with the original fragments, a block of ‘malloc()‘ memory can be
converted into a returned numpy python object via a call to ‘PyArray_SimpleNewFromData()‘. However, the returned
python object is created using ‘PyCObject_FromVoidPtr()‘, which ensures that the allocated memory is automatically
disposed of when the Python object is destroyed. Examples below show how using these new fragments avoids leaking
memory.
Since the new fragments are based on the ‘_ARGOUTVIEW_‘ ones, the name ‘_ARGOUTVIEWM_‘ was chosen,
where ‘M‘ stands for managed. All managed fragments (ARRAY1, 2 and 3, FARRAY1, 2 and 3) were implemented,
and have now been extensively tested.
At the moment, the modified numpy.i file is available here (last updated 2012-04-22): * http://ezwidgets.googlecode.
com/svn/trunk/numpy/numpy.i * http://ezwidgets.googlecode.com/svn/trunk/numpy/pyfragments.swg
The original memory deallocation code was written by Travis Oliphant (see http://blog.enthought.com/?p=62 ) and as
far as I know, these clever people were the first ones to use it in a swig file (see http://niftilib.sourceforge.net/pynifti,
file nifticlib.i). Lisandro Dalcin then pointed out a simplified implementation using CObjects, which Travis details in
this updated blog post.
Important steps
In yourfile.i, the %init function uses the same ‘import_array()‘ call you already know:
%init %{
import_array();
%}
... then just use ARGOUTVIEWM_ARRAY1 instead of ARGOUTVIEW_ARRAY1 and memory deallocation is
handled automatically when the python array is destroyed (see examples below).
The SWIG-wrapped function in C creates an N integers array, using `malloc()` to allocate memory. From python, this
function is called repeatedly and the resulting array created and destroyed (M times).
Using the ARGOUTVIEW_ARRAY1 provided in numpy.i, this will create memory leaks (I know
ARGOUTVIEW_ARRAY1 has not been designed for this purpose, but it's tempting!).
Using the ARGOUTVIEWM_ARRAY1 fragment instead, the memory allocated with ‘malloc()‘ will be automatically
deallocated when the array is deleted.
The python test program creates and deletes a 1024^2 ints array 2048 times using both ARGOUTVIEW_ARRAY1
and ARGOUTVIEWM_ARRAY1 and when memory allocation fails, an exception is generated in C and caught in
Python, showing which iteration finally caused the allocation to fail.
#include <stdio.h>
#include <errno.h>
#include "ezalloc.h"
if (temp == NULL)
errno = ENOMEM;
The file (available here: ezalloc.i) does a couple of interesting things:
* Like I said in the introduction, calling the `import_array()` function in the `%init` section now also initialises the memory deallocation code. There is nothing else to add here.
* An exception is generated if memory allocation fails. After a few iterations of the code construct, using `errno` and `SWIG_fail` is the simplest I've come up with.
* In this example, two inline functions are created, one using ARGOUTVIEW_ARRAY1 and the other ARGOUTVIEWM_ARRAY1. Both functions use the `alloc()` function (see ezalloc.h and ezalloc.c).
%module ezalloc
%{
#include <errno.h>
#include "ezalloc.h"
#define SWIG_FILE_WITH_INIT
%}
%include "numpy.i"
%init %{
import_array();
%}
%include "ezalloc.h"
%exception
{
errno = 0;
$action
if (errno != 0)
{
switch(errno)
{
case ENOMEM:
PyErr_Format(PyExc_MemoryError, "Failed malloc()");
break;
default:
PyErr_Format(PyExc_Exception, "Unknown exception");
}
SWIG_fail;
}
}
%inline %{
%}
Don’t forget that you will need the numpy.i file in the same directory for this to compile.
#! /usr/bin/env python
# System imports
from distutils.core import *
from distutils import sysconfig
# Obtain the numpy include directory. This logic works across numpy versions.
try:
numpy_include = numpy.get_include()
except AttributeError:
numpy_include = numpy.get_numpy_include()
extra_compile_args = ["--verbose"]
)
# NumyTypemapTests setup
setup( name = "alloc functions",
description = "Testing managed arrays",
author = "Egor Zindy",
version = "1.0",
ext_modules = [_ezalloc]
)
$> python setup_alloc.py build --compiler=mingw32

or in UN*X, simply
$> python setup_alloc.py build
If everything goes according to plan, there should be a ‘_ezalloc.pyd‘ file available in the ‘build\lib.XXX‘ directory.
The file needs to be copied in the directory with the ‘ezalloc.py‘ file (generated by swig).
A python test program is provided in the SVN repository (test_alloc.py) and reproduced below:
import ezalloc
n = 2048
err = 0
if err == 0:
print "Done!\n"
for i in range(n):
try:
#allocating some memory
a = ezalloc.alloc_leaking(m)
#deleting the array
del a
except:
err = 1
print "Step %d failed" % i
break
if err == 0:
print "Done? Increase n!\n"
Then, a
The unmanaged array leaks memory every time the array view is deleted. The managed one will delete the memory
block seamlessly. This was tested both in Windows XP and Linux.
The following example shows how to return a two-dimensional array from C which also benefits from the automatic
memory deallocation.
A naive "crop" function is wrapped using SWIG/numpy.i and returns a slice of the input array. When used as
`array_out = crop.crop(array_in, d1_0,d1_1, d2_0,d2_1)`, it is equivalent to the native numpy slicing
`array_out = array_in[d1_0:d1_1, d2_0:d2_1]`.
void crop(int *arr_in, int dim1, int dim2, int d1_0, int d1_1, int d2_0, int d2_1,
          int **arr_out, int *dim1_out, int *dim2_out);
#include <stdlib.h>
#include <errno.h>
#include "crop.h"
void crop(int *arr_in, int dim1, int dim2, int d1_0, int d1_1, int d2_0, int d2_1,
          int **arr_out, int *dim1_out, int *dim2_out)
{
int *arr=NULL;
int dim1_o=0;
int dim2_o=0;
int i,j;
//value checks
if ((d1_1 < d1_0) || (d2_1 < d2_0) ||
(d1_0 >= dim1) || (d1_1 >= dim1) || (d1_0 < 0) || (d1_1 < 0) ||
(d2_0 >= dim2) || (d2_1 >= dim2) || (d2_0 < 0) || (d2_1 < 0))
{
errno = EPERM;
goto end;
}
//output sizes
dim1_o = d1_1-d1_0;
dim2_o = d2_1-d2_0;
//memory allocation
arr = (int *)malloc(dim1_o*dim2_o*sizeof(int));
if (arr == NULL)
{
errno = ENOMEM;
goto end;
}
end:
*dim1_out = dim1_o;
*dim2_out = dim2_o;
*arr_out = arr;
}
The file (available here: crop.i) does a couple of interesting things:
* The array dimensions DIM1 and DIM2 are in the same order as array.shape on the Python side. In a row-major array definition for an image, DIM1 would be the number of rows and DIM2 the number of columns.
* Using the errno library, an exception is generated when memory allocation fails (ENOMEM) or when a problem occurs with the indexes (EPERM).
%module crop
%{
#include <errno.h>
#include "crop.h"
#define SWIG_FILE_WITH_INIT
%}
%include "numpy.i"
%init %{
import_array();
%}
%exception crop
{
errno = 0;
$action
if (errno != 0)
{
switch(errno)
{
case EPERM:
PyErr_Format(PyExc_IndexError, "Index error");
break;
case ENOMEM:
PyErr_Format(PyExc_MemoryError, "Not enough memory");
break;
default:
PyErr_Format(PyExc_Exception, "Unknown exception");
}
SWIG_fail;
}
}
%apply (int* IN_ARRAY2, int DIM1, int DIM2) {(int *arr_in, int dim1, int dim2)}
%apply (int** ARGOUTVIEWM_ARRAY2, int* DIM1, int* DIM2) {(int **arr_out, int *dim1_out, int *dim2_out)}
%include "crop.h"
Don’t forget that you will need the numpy.i file in the same directory for this to compile.
#! /usr/bin/env python
# System imports
from distutils.core import *
from distutils import sysconfig
# Obtain the numpy include directory. This logic works across numpy versions.
try:
numpy_include = numpy.get_include()
except AttributeError:
numpy_include = numpy.get_numpy_include()
extra_compile_args = ["--verbose"]
)
# NumyTypemapTests setup
setup( name = "crop test",
description = "A simple crop test to demonstrate the use of ARGOUTVIEWM_ARRAY2
˓→",
If everything goes according to plan, there should be a ‘_crop.pyd‘ file available in the ‘build\lib.XXX‘ directory. The
file needs to be copied in the directory with the ‘crop.py‘ file (generated by swig).
A python test program is provided in the SVN repository (test_crop.py) and reproduced below:
import crop
import numpy
a = numpy.zeros((5,10),numpy.int)
a[numpy.arange(5),:] = numpy.arange(10)
b = numpy.transpose([(10 ** numpy.arange(5))])
a = (a*b)[:,1:] #this array is most likely NOT contiguous
print a
print "dim1=%d dim2=%d" % (a.shape[0],a.shape[1])
d1_0 = 2
d1_1 = 4
d2_0 = 1
d2_1 = 5

# the calls themselves were lost in conversion; they presumably looked like this:
b = crop.crop(a, d1_0, d1_1, d2_0, d2_1)
print "returned array:"
print b
print "native slicing:"
print a[d1_0:d1_1, d2_0:d2_1]
returned array:
[[ 200 300 400 500]
[2000 3000 4000 5000]]
native slicing:
[[ 200 300 400 500]
[2000 3000 4000 5000]]
numpy.i takes care of making the array contiguous if needed, so the only thing left to take care of is the array orientation.
That’s all folks! Files are available on the [http://code.google.com/p/ezwidgets/source/browse/#svn/trunk/numpy
Google code SVN]. As usual, comments welcome!
Regards, Egor
Section author: EgorZindy
While initially f2py was developed for wrapping Fortran codes for Python, it can be easily used for wrapping C codes
as well. Signature files describing the interface to wrapped functions must be created manually, and the functions and
their arguments must have the attribute intent(c). See the f2py Users Guide for more information about the syntax of signature files.
Here follows a simple C code:
/* File foo.c */
void foo(int n, double *x, double *y) {
int i;
for (i=0;i<n;i++) {
y[i] = x[i] + i;
}
}
! File m.pyf
python module m
interface
subroutine foo(n,x,y)
intent(c) foo !foo is a C function
intent(c) !all foo arguments are
!considered as C based
integer intent(hide), depend(x):: n=len(x) ! n is the length
! of input array x
double precision intent(in) :: x(n) ! x is input array
! (or arbitrary sequence)
double precision intent(out) :: y(n) ! y is output array,
! see code in foo.c
end subroutine foo
end interface
end python module m
# File setup.py
def configuration(parent_package='',top_path=None):
from numpy.distutils.misc_util import Configuration
config = Configuration('',parent_package,top_path)
config.add_extension('m',
sources = ['m.pyf','foo.c'])
return config
if __name__ == "__main__":
from numpy.distutils.core import setup
setup(**configuration(top_path='').todict())
and execute:
python setup.py build_src build_ext --inplace
Or one can call f2py directly in command line to build the wrapper as follows:
f2py m.pyf foo.c -c
In both cases an extension module will be created to current directory that can be imported to python:
>>> import m
>>> print m.__doc__
This module 'm' is auto-generated with f2py (version:2_2130).
Functions:
y = foo(x)
.
>>> print m.foo.__doc__
foo - Function signature:
y = foo(x)
Required arguments:
x : input rank-1 array('d') with bounds (n)
Return objects:
y : rank-1 array('d') with bounds (n)
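Since foo() computes y[i] = x[i] + i, a quick sanity check of the wrapper (continuing the session above) looks like this:

>>> from numpy import arange
>>> x = arange(5, dtype='d')
>>> m.foo(x)
array([ 0.,  2.,  4.,  6.,  8.])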
Interpolation
Interpolation
Example showing how to use B-splines in scipy.signal to do interpolation. The input points must be equally spaced to
use these routines.
from numpy import r_, sin
from scipy.signal import cspline1d, cspline1d_eval

x = r_[0:10]
dx = x[1]-x[0]
newx = r_[-3:13:0.1]  # notice outside the original domain
y = sin(x)
cj = cspline1d(y)
newy = cspline1d_eval(cj, newx, dx=dx, x0=x[0])
from pylab import plot, show
plot(newx, newy, x, y, 'o')
show()
The scipy.ndimage package also contains spline_filter and map_coordinates which can be used to perform
N-dimensional interpolation for equally-spaced data. A two-dimensional example is given below:
from numpy import ogrid, mgrid, sin, array
from scipy import ndimage
import matplotlib.pyplot as plt

x,y = ogrid[-1:1:5j,-1:1:5j]
fvals = sin(x)*sin(y)
newx,newy = mgrid[-1:1:100j,-1:1:100j]
x0 = x[0,0]
y0 = y[0,0]
dx = x[1,0] - x0
dy = y[0,1] - y0
ivals = (newx - x0)/dx
jvals = (newy - y0)/dy
coords = array([ivals, jvals])
newf1 = ndimage.map_coordinates(fvals, coords)
To pre-compute the weights (for multiple interpolation results), you would use
coeffs = ndimage.spline_filter(fvals)
newf2 = ndimage.map_coordinates(coeffs, coords, prefilter=False)
plt.subplot(1,2,1)
plt.imshow(newf1)
plt.subplot(1,2,2)
plt.imshow(newf2)
plt.show()
The scipy.interpolate package wraps the netlib FITPACK routines (Dierckx) for calculating smoothing splines for
various kinds of data and geometries. Although the data is evenly spaced in this example, it need not be so to use this
routine.
from numpy import linspace, sin, cos, pi, random
from scipy.interpolate import splprep, splev

# make an ascending spiral in 3-space (the original definition of t was lost
# in conversion; this is a plausible reconstruction)
t = linspace(0, 1.75*2*pi, 100)

x = sin(t)
y = cos(t)
z = t
# add noise
x+= random.normal(scale=0.1, size=x.shape)
y+= random.normal(scale=0.1, size=y.shape)
z+= random.normal(scale=0.1, size=z.shape)
# spline parameters
s=3.0 # smoothness parameter
k=2 # spline order
nest=-1 # estimate of number of knots needed (-1 = maximal)

# find the knot points and evaluate the spline (these calls were lost in
# conversion; xnew, ynew, znew are used in the plots below)
tckp, u = splprep([x, y, z], s=s, k=k, nest=nest)
xnew, ynew, znew = splev(linspace(0, 1, 400), tckp)
import pylab
pylab.subplot(2,2,1)
data,=pylab.plot(x,y,'bo-',label='data')
fit,=pylab.plot(xnew,ynew,'r-',label='fit')
pylab.legend()
pylab.xlabel('x')
pylab.ylabel('y')
pylab.subplot(2,2,2)
data,=pylab.plot(x,z,'bo-',label='data')
fit,=pylab.plot(xnew,znew,'r-',label='fit')
pylab.legend()
pylab.xlabel('x')
pylab.ylabel('z')
pylab.subplot(2,2,3)
data,=pylab.plot(y,z,'bo-',label='data')
fit,=pylab.plot(ynew,znew,'r-',label='fit')
pylab.legend()
pylab.xlabel('y')
pylab.ylabel('z')
plt.show()
Radial basis functions can be used for smoothing/interpolating scattered data in n-dimensions, but should be used with
caution for extrapolation outside of the observed data range.
This example compares the usage of the Rbf and UnivariateSpline classes from the scipy.interpolate module.
import numpy as np
from scipy.interpolate import Rbf, InterpolatedUnivariateSpline
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
# setup data
x = np.linspace(0, 10, 9)
y = np.sin(x)
xi = np.linspace(0, 10, 101)

# interpolate with an InterpolatedUnivariateSpline; yi is plotted in the first
# subplot below (this part of the example was lost in conversion)
ius = InterpolatedUnivariateSpline(x, y)
yi = ius(xi)
plt.subplot(2, 1, 1)
plt.plot(x, y, 'bo')
plt.plot(xi, yi, 'g')
plt.plot(xi, np.sin(xi), 'r')
plt.title('Interpolation using univariate spline')
# interpolate the same data with an RBF (multiquadric by default); this part
# of the example was also lost in conversion
rbf = Rbf(x, y)
yi = rbf(xi)

plt.subplot(2, 1, 2)
plt.plot(x, y, 'bo')
plt.plot(xi, yi, 'g')
plt.plot(xi, np.sin(xi), 'r')
plt.title('Interpolation using RBF - multiquadrics')
plt.savefig('rbf1d.png')
(See the attached rbf1dnew.png for the resulting plot.)
This example shows how to interpolate scattered 2d data.
import numpy as np
from scipy.interpolate import Rbf
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from matplotlib import cm
# 2-d example: set up scattered data (this setup was lost in conversion;
# a plausible reconstruction)
x = np.random.rand(100)*4.0 - 2.0
y = np.random.rand(100)*4.0 - 2.0
z = x*np.exp(-x**2 - y**2)
ti = np.linspace(-2.0, 2.0, 100)
XI, YI = np.meshgrid(ti, ti)

# use RBF
rbf = Rbf(x, y, z, epsilon=2)
ZI = rbf(XI, YI)
Attachments
• rbf1d.png
• rbf1dnew.png
• rbf2d.png
• rbf2dnew.png
CHAPTER 4
Linear Algebra
The following module, rank_nullspace.py, provides the functions rank() and nullspace(). (Note that NumPy already
provides the function matrix_rank(); the function given here allows an absolute tolerance to be specified along with a
relative tolerance.)
rank_nullspace.py
#!python
import numpy as np
from numpy.linalg import svd
# (the function signature, summary line and section headers below were lost in
# conversion and have been reconstructed; the parameter descriptions are original)
def rank(A, atol=1e-13, rtol=0):
    """Estimate the rank of the matrix A using its singular values.

    Parameters
    ----------
    A : ndarray
        A should be at most 2-D. A 1-D array with length n will be treated
        as a 2-D with shape (1, n)
    atol : float
        The absolute tolerance for a zero singular value. Singular values
        smaller than `atol` are considered to be zero.
    rtol : float
        The relative tolerance. Singular values less than rtol*smax are
        considered to be zero, where smax is the largest singular value.

    If both `atol` and `rtol` are positive, the combined tolerance is the
    maximum of the two; that is::

        tol = max(atol, rtol * smax)

    Singular values smaller than `tol` are considered to be zero.

    Returns
    -------
    r : int
        The estimated rank of the matrix.

    See also
    --------
    numpy.linalg.matrix_rank
        matrix_rank is basically the same as this function, but it does not
        provide the option of the absolute tolerance.
    """
    A = np.atleast_2d(A)
    s = svd(A, compute_uv=False)
    tol = max(atol, rtol * s[0])
    rank = int((s >= tol).sum())
    return rank
# (the function signature, summary line and section headers below were lost in
# conversion and have been reconstructed; the parameter descriptions are original)
def nullspace(A, atol=1e-13, rtol=0):
    """Compute an approximate basis for the nullspace of A.

    Parameters
    ----------
    A : ndarray
        A should be at most 2-D. A 1-D array with length k will be treated
        as a 2-D with shape (1, k)
    atol : float
        The absolute tolerance for a zero singular value. Singular values
        smaller than `atol` are considered to be zero.
    rtol : float
        The relative tolerance. Singular values less than rtol*smax are
        considered to be zero, where smax is the largest singular value.

    If both `atol` and `rtol` are positive, the combined tolerance is the
    maximum of the two; that is::

        tol = max(atol, rtol * smax)

    Singular values smaller than `tol` are considered to be zero.

    Returns
    -------
    ns : ndarray
        If `A` is an array with shape (m, k), then `ns` will be an array
        with shape (k, n), where n is the estimated dimension of the
        nullspace of `A`. The columns of `ns` are a basis for the
        nullspace; each element in numpy.dot(A, ns) will be approximately
        zero.
    """
    A = np.atleast_2d(A)
    u, s, vh = svd(A)
    tol = max(atol, rtol * s[0])
    nnz = (s >= tol).sum()
    ns = vh[nnz:].conj().T
    return ns
#!python
import numpy as np
from rank_nullspace import rank, nullspace


def checkit(a):
    print "a:"
    print a
    r = rank(a)
    print "rank is", r
    ns = nullspace(a)
    print "nullspace:"
    print ns
    if ns.size > 0:
        res = np.abs(np.dot(a, ns)).max()
        print "max residual is", res
print "-"*25
print "-"*25
print "-"*25
print "-"*25
print "-"*25
a:
[[ 1. 2. 3.]
[ 4. 5. 6.]
[ 7. 8. 9.]]
rank is 2
nullspace:
[[-0.40824829]
[ 0.81649658]
[-0.40824829]]
a:
[[ 0. 2. 3.]
[ 4. 5. 6.]
[ 7. 8. 9.]]
rank is 3
nullspace:
a:
[[ 0. 1. 2. 4.]
[ 1. 2. 3. 4.]]
rank is 2
nullspace:
[[-0.48666474 -0.61177492]
[-0.27946883 0.76717915]
[ 0.76613356 -0.15540423]
[-0.31319957 -0.11409267]]
a:
[[ 1.0+0.j 0.0+1.j 2.0+2.j ]
[ 0.0+1.j -1.0+0.j -2.0+2.j ]
[ 0.5+0.j 0.0+0.5j 1.0+1.j ]]
rank is 1
nullspace:
[[ 0.00000000-0.j -0.94868330-0.j ]
[ 0.13333333+0.93333333j 0.00000000-0.10540926j]
[ 0.20000000-0.26666667j 0.21081851-0.21081851j]]
Matplotlib
3D Plotting
Just in case you would ever like to incorporate matplotlib plots into your vtk application, vtk provides a really easy
way to import them.
Here is a full example for now:
import matplotlib
matplotlib.use('Agg')
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg
import pylab as p
cubeMapper = vtkPolyDataMapper()
cubeMapper.SetInput(cube.GetOutput())
cubeActor = vtkActor()
cubeActor.SetMapper(cubeMapper)
ren = vtkRenderer()
ren.AddActor(cubeActor)
renWin = vtkRenderWindow()
renWin.AddRenderer(ren)
iren = vtkRenderWindowInteractor()
iren.SetRenderWindow(renWin)
# This is where we tell the image importer about the mpl image
extent = (0, w - 1, 0, h - 1, 0, 0)
importer.SetWholeExtent(extent)
importer.SetDataExtent(extent)
importer.SetImportVoidPointer(canvas.buffer_rgba(0,0), 1)
importer.Update()
iren.Initialize()
iren.Start()
bbact = vtkActor2D()
bbact.SetMapper(hmap)
from http://sourceforge.net/mailarchive/forum.php?thread_id=7884469&forum_id=33405
The reason for this is that vtk comes with its own freetype library, and this is the one being used if vtk is loaded
first. Worse symptoms could be errors about fonts not being found. This is typically solved by importing vtk after other packages.
I'm using VTK from CVS using the 5_0 Branch from around November 2005
The above code didn’t run on my system. I had to change the following line: fig.set_figsize_inches(w / dpi, h / dpi)
into: fig.set_figsize_inches(1.0w / dpi, 1.0h / dpi)
Section author: AndrewStraw, Unknown[41], Unknown[122]
Matplotlib: mplot3d
The examples below show simple 3D plots using matplotlib. matplotlib’s 3D capabilities were added by incorporating
John Porter’s mplot3d module, thus no additional download is required any more, the following examples will run
with an up to date matplotlib installation. ‘’‘Note, this code is not supported in the matplotlib-0.98 branch, but you
can use either the latest 0.99 release or the 0.91 maintenance version if you need this functionality. ‘” Alternatively,
the Mayavi2 project provides a pylab-like API for extensive 3D plotting: http://code.enthought.com/projects/mayavi/
docs/development/html/mayavi/mlab.html
Note that not all examples on this page are up to date, so some of them might not be working. For other examples, see
http://matplotlib.sourceforge.net/examples/mplot3d/
3D Plotting examples:
#!python
from numpy import *
import pylab as p
#import matplotlib.axes3d as p3
import mpl_toolkits.mplot3d.axes3d as p3
#!python
fig=p.figure()
ax = p3.Axes3D(fig)
ax.plot_wireframe(x,y,z)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
p.show()
3D Plot:
#!python
# this connects each of the points with lines
fig=p.figure()
ax = p3.Axes3D(fig)
# plot3D requires a 1D array for x, y, and z
# ravel() converts the 100x100 array into a 1x10000 array
ax.plot3D(ravel(x),ravel(y),ravel(z))
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
fig.add_axes(ax)
p.show()
#!python
fig=p.figure()
ax = p3.Axes3D(fig)
# scatter3D requires a 1D array for x, y, and z
# ravel() converts the 100x100 array into a 1x10000 array
ax.scatter3D(ravel(x),ravel(y),ravel(z))
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
p.show()
fig=p.figure()
ax = p3.Axes3D(fig)
ax.contour3D(X,Y,Z)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
p.show()
Contourf3D:
#!python
# in mplt3D change:
# levels, colls = self.contourf(X, Y, Z, 20)
# to:
# C = self.contourf(X, Y, Z, *args, **kwargs)
# levels, colls = (C.levels, C.collections)
fig=p.figure()
ax = p3.Axes3D(fig)
ax.contourf3D(X,Y,Z)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
fig.add_axes(ax)
p.show()
#!python
x=r_[-10:10:100j]
y=r_[-10:10:100j]
z= add.outer(x*x, y*y)
### Contour plot of z = x**2 + y**2
p.contour(x,y,z)
For some other examples of 3d plotting capability, run the following commands. See the source of mat-
plotlib/axes3d.py for more information:
#!python
# note that for the following to work you have to modify the test functions in your
# site-packages/matplotlib/axes3d.py like this:
#def test_xxxx():
# import pylab
# ax = Axes3D(pylab.figure())
# ....
# ....
# pylab.show()
# the following then work on 0.87.5
p3.test_bar2D()
p3.test_contour()
p3.test_scatter()
p3.test_scatter2D()
p3.test_surface()
# the following fail on 0.87.5
p3.test_plot()
p3.test_polys()
p3.test_wir
Attachments
• contour.jpg
• contour.png
• contour3D.jpg
• contour3D.png
• contourf.jpg
• contourf.png
• contourf3D.jpg
• contourf3D.png
• plot.jpg
• plot.png
• scatter.jpg
• scatter.png
• surface.jpg
• surface.png
• test1.jpg
• test1.png
• test2.jpg
• test2.png
• test3.jpg
• test3.png
• wireframe.jpg
• wireframe.png
Embedding In Wx
Matplotlib can be embedded in wxPython applications to provide high quality data visualization. There are two
approaches to this, direct embedding and using an embedding library.
"Direct embedding" is where you put one of the wxPython backend widgets (which subclass wx.Panel) directly
into your application and draw plots on it using matplotlib's object-oriented API. This approach is demonstrated
by the embedding_in_wx*.py examples that come with matplotlib
(http://cvs.sourceforge.net/viewcvs.py/matplotlib/matplotlib/examples/). Neither FigureCanvasWx nor
FigureCanvasWxAgg provides any facilities for user interactions like displaying the coordinates under the mouse, so
you'll have to implement such things yourself. The matplotlib example wxcursor_demo.py
(http://cvs.sourceforge.net/viewcvs.py/%2Acheckout%2A/matplotlib/matplotlib/examples/wxcursor_demo.py?content-type=text%2Fplain)
should help you get started.
An "embedding library" saves you a lot of time and effort by providing plotting widgets that already support user
interactions and other bells and whistles. There are two such libraries that I am aware of:
1. Matt Newville's MPlot package (http://cars9.uchicago.edu/~newville/Python/MPlot/) supports drawing 2D line
plots using pylab-style plot() and oplot() methods.
2. Ken McIvor's WxMpl module (http://agni.phys.iit.edu/~kmcivor/wxmpl/) supports drawing all plot types using
matplotlib's object-oriented API.
Each of these libraries has different benefits and drawbacks, so I encourage you to evaluate each of them and select
the one that best meets your needs.
If you’re embedding matplotlib in a wxPython program, you’re probably going to have to use Matplotlib’s Object-
Oriented API to at some point. Take heart, as it matches the pylab API closely and is easy to pick up. There are more
nuts and bolts to deal with, but that’s no problem to someone already programming with wxPython! ;-)
The matplotlib FAQ [http://matplotlib.sourceforge.net/faq.html#OO” links] to several resources for learning about
the OO API. Once you’ve got your feet wet, reading the classdocs is the most helpful source of information. The
[http://matplotlib.sourceforge.net/matplotlib.axes.html#Axes” matplotlib.axes.Axes] class is where most of the plot-
ting methods live, so it’s a good place to start after you’ve conquored creating a Figure.
For your edification, a series of pylab examples have been translated to the OO API. They are available in a demon-
stration script that must be run from a command line. You may use any interactive matplotlib backend to display these
plots.
Here is a simple example of an application written in wx that embeds a [”Matplotlib figure in a wx panel”]. No
toolbars, mouse clicks or any of that, just a plot drawn in a panel. Some work has been put into it to make sure that
the figure is only redrawn once during a resize. For plots with many points, the redrawing can take some time, so it is
best to only redraw when the figure is released. Have a read of the code.
Section author: AndrewStraw, Unknown[94]
Traits, part of the Enthought Tool Suite, provides a great framework for creating GUI apps without a lot of the normal
boilerplate required to connect the UI to the rest of the application logic. A brief introduction to Traits can be found here.
Although ETS comes with its own traits-aware plotting framework (Chaco), if you already know matplotlib it is just as
easy to embed this instead. The advantages of Chaco (IMHO) are its interactive "tools", an (in development) OpenGL
rendering backend and an easy-to-understand codebase. However, matplotlib has more and better documentation and
better defaults; it just works. The key to getting TraitsUI and matplotlib to play nicely is to use the mpl object-oriented
API, rather than pylab / pyplot. This recipe requires the following packages:
• numpy
• wxPython
• matplotlib
• Traits > 3.0
• TraitsGUI > 3.0
• TraitsBackendWX > 3.0
For this example, we will display a function (y, a sine wave) of one variable (x, a numpy ndarray) and one parameter
(scale, a float value with bounds). We want to be able to vary the parameter from the UI and see the resulting changes
to y in a plot window. Here's what the final result looks like (see the attached mpl_in_traits_view.png).
The TraitsUI CustomEditor can be used to display any wxPython window as the editor for the object. You simply pass
the CustomEditor a callable which, when called, returns the wxPython window you want to display. In this case, our
MakePlot() function returns a wxPanel containing the mpl FigureCanvas and Navigation toolbar.
This example exploits a few of Traits' features. We use "dynamic initialisation" to create the Axes and Line2D objects
on demand (using the _xxx_default methods). We use Traits "notification" to call update_line(...) whenever the x- or
y-data is changed. Further, the y-data is declared as a Property trait which depends on both the 'scale' parameter and
the x-data. 'y' is then recalculated on demand, whenever either 'scale' or 'x' change. The 'cached_property' decorator
prevents recalculation of y if its dependencies are not modified.
Finally, there's a bit of wx-magic in the redraw() method to limit the redraw rate by delaying the actual drawing by
50ms. This uses the wx.CallLater class. This prevents excessive redrawing as the slider is dragged, keeping the UI
from lagging. Here's the full listing:
#!python
"""
A simple demonstration of embedding a matplotlib plot window in
a traits-application. The CustomEditor allow any wxPython window
Attachments
• mpl_editor.py
• mpl_in_traits_view.png
Matplotlib provides event handling to determine things like key presses, mouse position, and button clicks. Mat-
plotlib supports a number of GUIs, and provides an interface to the GUI event handling via the mpl_connect and
mpl_disconnect methods.
This page gives an example of use of these facilities by adding a Drag'n'Drop handler for text objects. You can get
the source code for this example here: Text_DragnDrop_v0.1.py.
#!python numbers=disable
from matplotlib import pylab as p
from matplotlib.text import Text
class DragHandler(object):
    """ A simple class to handle Drag n Drop. """
    def __init__(self, figure=None):
        if figure is None: figure = p.gcf()
        figure.canvas.mpl_connect("pick_event", self.on_pick_event)
        # the full attached script also connects "button_release_event" to an
        # on_release_event method, which moves the picked text and redraws
        self.dragged = None

    def on_pick_event(self, event):
        " Store which Text object was picked and the pick position. "
        if isinstance(event.artist, Text):
            self.dragged = event.artist
            self.pick_pos = (event.mouseevent.xdata, event.mouseevent.ydata)
        return True
#! python numbers=disable
# Usage example
from numpy import arange, sin
p.plot(arange(10), sin(arange(10)))
t = p.text(4, 0.5, "Drag me")
t.set_picker(True)        # the text must have a picker set to generate pick events
DragHandler()             # install the handler on the current figure
p.show()
Attachments
• Text_DragnDrop.py
• Text_DragnDrop_v0.1.py
• Text_DragnDrop_v2.py
Matplotlib: pyside
This is a very basic example showing how to display a matplotlib plot within a Qt application using PySide. In case
of problems, try changing the rcParam entry "backend.qt4" to "PySide" (e.g. in the matplotlibrc file).
#!/usr/bin/env python
import sys
import matplotlib
matplotlib.use('Qt4Agg')
matplotlib.rcParams['backend.qt4'] = 'PySide'   # use PySide rather than PyQt4
from matplotlib.backends.backend_qt4agg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.figure import Figure
from PySide import QtGui

if __name__ == '__main__':
    app = QtGui.QApplication(sys.argv)

    # generate a simple plot on a Figure and wrap it in a Qt canvas widget
    fig = Figure()
    ax = fig.add_subplot(111)
    ax.plot([1, 2, 3, 4])
    canvas = FigureCanvas(fig)

    win = QtGui.QMainWindow()
    # add the plot canvas to a window
    win.setCentralWidget(canvas)
    win.show()
    sys.exit(app.exec_())
When plotting a very long sequence in a matplotlib canvas embedded in a wxPython application, it sometimes is
useful to be able to display a portion of the sequence without resorting to a scrollable window so that both axes remain
visible. Here is how to do so:
from numpy import arange, sin, pi, float, size

import matplotlib
matplotlib.use('WXAgg')
from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg
from matplotlib.figure import Figure
import wx

class MyFrame(wx.Frame):
    def __init__(self, parent, id):
        wx.Frame.__init__(self,parent, id, 'scrollable plot',
                style=wx.DEFAULT_FRAME_STYLE ^ wx.RESIZE_BORDER,
                size=(800, 400))
        self.panel = wx.Panel(self, -1)

        # create the figure, the canvas and a horizontal scrollbar on the canvas
        self.fig = Figure((5, 4), 75)
        self.canvas = FigureCanvasWxAgg(self.panel, -1, self.fig)
        self.scroll_range = 400
        self.canvas.SetScrollbar(wx.HORIZONTAL, 0, 5, self.scroll_range)

        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.canvas, -1, wx.EXPAND)
        self.panel.SetSizer(sizer)
        self.panel.Fit()
        self.init_data()
        self.init_plot()
        self.canvas.Bind(wx.EVT_SCROLLWIN, self.OnScrollEvt)

    def init_data(self):
        # generate some data to plot and the indices of the visible window
        self.dt = 0.01
        self.t = arange(0, 5, self.dt)
        self.x = sin(2*pi*self.t)
        self.i_min = 0
        self.i_max = len(self.t)
        self.i_window = 100                    # number of samples shown at once
        self.i_start = 0
        self.i_end = self.i_start + self.i_window

    def init_plot(self):
        self.axes = self.fig.add_subplot(111)
        self.plot_data = \
                  self.axes.plot(self.t[self.i_start:self.i_end],
                                 self.x[self.i_start:self.i_end])[0]

    def draw_plot(self):
        # Update the data and the axes limits for the currently visible window:
        self.plot_data.set_xdata(self.t[self.i_start:self.i_end])
        self.plot_data.set_ydata(self.x[self.i_start:self.i_end])
        self.axes.set_xlim((min(self.t[self.i_start:self.i_end]),
                            max(self.t[self.i_start:self.i_end])))
        self.axes.set_ylim((min(self.x[self.i_start:self.i_end]),
                            max(self.x[self.i_start:self.i_end])))
        # Redraw:
        self.canvas.draw()

    def OnScrollEvt(self, event):
        # Shift the plotted window according to the scrollbar position:
        self.i_start = self.i_min + event.GetPosition()
        self.i_end = self.i_min + self.i_window + event.GetPosition()
        self.draw_plot()

class MyApp(wx.App):
    def OnInit(self):
        self.frame = MyFrame(parent=None,id=-1)
        self.frame.Show()
        self.SetTopWindow(self.frame)
        return True

if __name__ == '__main__':
    app = MyApp()
    app.MainLoop()
Misc
Load image
Image processing often works on grayscale images that were stored as PNG files. How do we import / export such a
file in Python?
• Here is a recipe to do this with Matplotlib, using the imread function (your image is called lena.png). This
permits some processing for further exporting, such as for [:Cook-
book/Matplotlib/converting_a_matrix_to_a_raster_image:converting a matrix to a raster image]. In recent
versions of pylab (check that your pylab.matplotlib.__version__ is greater than '0.98.0') you directly get a 2D
numpy array if the image is grayscale. A short sketch is given after this list.
• To write an image from an array, use PIL directly:

  import Image
  import numpy
  mode = 'L'
  size = (256, 256)
  imNew = Image.new(mode, size)
  mat = numpy.random.uniform(size=size)
  data = numpy.ravel(mat)
  data = numpy.floor(data * 256)
  imNew.putdata(data)
  imNew.save("rand.png")

• This kind of function also lives under scipy.misc; see for instance scipy.misc.imsave to create a color image:

  from scipy.misc import imsave
  import numpy
  a = numpy.zeros((4, 4, 3))
  a[0, 0, :] = [128, 0, 255]
  imsave('graybackground_with_a_greyish_blue_square_on_top.png', a)

• To define the range, use scipy.misc.toimage:

  from scipy.misc import toimage
  import numpy
  a = numpy.random.rand(25, 50)   # between 0. and 1.
  toimage(a, cmin=0., cmax=2.).save('low_contrast_snow.png')

  (adapted from http://telin.ugent.be/~slippens/drupal/scipy_unscaledimsave )

• There was another (more direct) method suggested by http://jehiah.cz/archive/creating-images-with-numpy
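For the first bullet, the imread call itself is only a couple of lines; a minimal sketch (assuming a grayscale lena.png sits in the working directory) is:

import pylab
lena = pylab.imread('lena.png')       # a 2-D array for a grayscale PNG (recent pylab)
print lena.shape, lena.dtype
pylab.imshow(lena, cmap=pylab.cm.gray)
pylab.show()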
Section author: LaurentPerrinet
This is a small demo file that helps teach how to adjust figure sizes for matplotlib
There are three parameters that define an image size (this is not MPL specific):
When displaying on a computer screen (or saving to a PNG), the size in length units is irrelevant; the pixels are simply
displayed. When printing, or saving to PS, EPS or PDF (all designed to support printing), the size and dpi are used to
determine how to scale the image.
1. The size of a figure is defined in length units (inches), and can be set when the figure is created (the figsize
argument) or changed afterwards (set_figsize_inches in the script below).
2. The layout of the figure is defined in 'figure units', so that as the figure size is changed, the layout (eg axes
positions) will update accordingly.
3. The size of text, width of lines, etc is defined in terms of length units (points).
4. When displaying to the screen, or creating an image (PNG), the pixel size of text and line widths, etc is
determined by the dpi setting, which is set when the figure is created and can be overridden when saving (see below).
The trick here is that when printing, it's natural to think in terms of inches, but when creating an image (for a web
page, for instance), it is natural to think in terms of pixel size. However, as of 0.84, pixel size can only be set directly
in the GTK* back-ends, with the canvas.resize(w,h) method. (Remember that you can only set two of the three size
parameters; the third must be calculated from the other two.)
Figure.savefig() overrides the dpi setting in the figure, and uses a default (which on my system at least is 100 dpi). If you
want to override it, you can specify the 'dpi' in the savefig call:
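For example (F being the figure object, as in the script below):

F.savefig("test_dpi.png", dpi=200)   # ignore the figure's own dpi and render at 200 dpi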
The following code will hopefully make this more clear, at least for generating PNGs for web pages and the like.
MPL_size_test.py
#!python
"""
This is a small demo file that helps teach how to adjust figure sizes
for matplotlib
"""
import matplotlib
print "using MPL version:", matplotlib.__version__
matplotlib.use("WXAgg") # do this before pylab so you don't get the default back end.
import pylab
import matplotlib.numerix as N

# generate a simple sine curve to plot
x = N.arange(0, 2*N.pi, 0.01)
y = N.sin(x)

pylab.plot(x,y)
F = pylab.gcf()

# check the default size and save a reference image
DefaultSize = F.get_size_inches()
print "Default size in Inches", DefaultSize
F.savefig("test1.png")

# Now make the image twice as big, while keeping the fonts and all the
# same size
F.set_figsize_inches( (DefaultSize[0]*2, DefaultSize[1]*2) )
Size = F.get_size_inches()
print "Size in Inches", Size
F.savefig("test2.png")
# this results in a 1595x1132 image

# Now make the image twice as big, making all the fonts and lines
# bigger too.
F.set_figsize_inches( DefaultSize )    # resize back to the default
F.savefig("test3.png", dpi=200)        # change the dpi instead of the size
# this gives an image with the same pixel size as test2.png,
# but with larger fonts and lines
Suppose you have two images: 100x100 and 100x50 that you want to display in a figure with a buffer of 20 pixels
(relative to image pixels) between them and a border of 10 pixels all around.
The solution isn’t particularly object oriented, but at least it gets to the practical details.
#!python
def _calcsize(matrix1, matrix2, top=10, left=10, right=10, bottom=10,
              buffer=20, height=4, scale=1.):
    # ... (the body, which computes the figure size and the two axes rectangles
    #      from the image shapes and the margins, was truncated in this copy) ...
    pylab.imshow(X2, aspect='preserve')
    return axes1, axes2, figure
Attachments
• MPL_size_test.py
how to install sunstudio and build matplotlib on solaris 10 might give some hints.
JDH said:
Hi Erik – if you succeed, then we’ll have convincing proof that compiling mpl on solaris is easier than
giving up the sauce.
Well, it has turned out to be easier than giving up the sauce (at least for me), but only by a whisker. In the end,
the fix is incredibly simple (if you consider recompiling python and manually adjusting the auto-produced pyconfig.h
incredibly simple, anyway). After two solid days of commenting this and that out, recompiling everything and its
mother 76 different ways from Sunday, poring over a legion of Solaris sys includes, slaughtering a few spotlessly
white lambs and one pure black sheep, wrapping the bones and tendons and viscera in a double layer of fat and
burning the offering to Delphic Apollo, I found the answer:
1 download Python 2.4.2
2 after extracting it and running ./configure, edit the generated pyconfig.h as follows:
Each axes instance contains a lines attribute, which is a list of the data series in the plot, added in chronological
order. To delete a particular data series, one must simply delete the appropriate element of the lines list and redraw if
necessary.
This is illustrated in the following example from an interactive session:
>>> import numpy as N
>>> import pylab as P
>>> x = N.arange(10)
>>> ax = P.subplot(111)
>>> ax.plot(x)
[<matplotlib.lines.Line2D instance at 0x427ce7ec>]
>>> ax.plot(x+10)
[<matplotlib.lines.Line2D instance at 0x427ce88c>]
>>> ax.plot(x+20)
[<matplotlib.lines.Line2D instance at 0x427ce9ac>]
>>> P.show()
>>> ax.lines
[<matplotlib.lines.Line2D instance at 0x427ce7ec>,
<matplotlib.lines.Line2D instance at 0x427ce88c>,
<matplotlib.lines.Line2D instance at 0x427ce9ac>]
>>> del ax.lines[1]
>>> P.show()
which will plot three lines, and then delete the second.
Section author: AngusMcMorland
Matplotlib: django
You need to have a working Django installation, plus matplotlib.
# file charts.py
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure
from matplotlib.dates import DateFormatter

def simple(request):
    import random
    import django
    import datetime

    fig = Figure()
    ax = fig.add_subplot(111)
    x = []
    y = []
    now = datetime.datetime.now()
    delta = datetime.timedelta(days=1)
    for i in range(10):
        x.append(now)
        now += delta
        y.append(random.randint(0, 1000))
    ax.plot_date(x, y, '-')
    ax.xaxis.set_major_formatter(DateFormatter('%Y-%m-%d'))
    fig.autofmt_xdate()
    canvas = FigureCanvas(fig)
    response = django.http.HttpResponse(content_type='image/png')
    canvas.print_png(response)
    return response
Since some versions of Internet Explorer ignore the content_type, the URL should end with ".png". You can create
an entry in your urls.py like this:
...
(r'^charts/simple.png$', 'myapp.views.charts.simple'),
...
I find it often quite useful to be able to identify points within a plot simply by clicking. This recipe provides a fairly
simple functor that can be connected to any plot. I’ve used it with both scatter and standard plots.
Because often you’ll have multiple views of a dataset spread across either multiple figures, or at least multiple axis,
I’ve also provided a utility to link these plots together so clicking on a point in one plot will highlight and identify that
data point on all other linked plots.
import math
import matplotlib.pyplot as plt

class AnnoteFinder(object):
    """callback for matplotlib to display an annotation when points are
    clicked on. The point which is closest to the click and within
    xtol and ytol is identified.

    Register this function like this:

    scatter(xdata, ydata)
    af = AnnoteFinder(xdata, ydata, annotes)
    connect('button_press_event', af)
    """

    def __init__(self, xdata, ydata, annotes, ax=None, xtol=None, ytol=None):
        self.data = list(zip(xdata, ydata, annotes))
        if xtol is None: xtol = ((max(xdata) - min(xdata)) / float(len(xdata))) / 2
        if ytol is None: ytol = ((max(ydata) - min(ydata)) / float(len(ydata))) / 2
        self.xtol = xtol
        self.ytol = ytol
        self.ax = plt.gca() if ax is None else ax
        self.drawnAnnotations = {}
        self.links = []

    def distance(self, x1, x2, y1, y2):
        return math.hypot(x1 - x2, y1 - y2)

    def __call__(self, event):
        if event.inaxes:
            clickX = event.xdata
            clickY = event.ydata
            if (self.ax is None) or (self.ax is event.inaxes):
                annotes = []
                for x, y, a in self.data:
                    if ((clickX-self.xtol < x < clickX+self.xtol) and
                            (clickY-self.ytol < y < clickY+self.ytol)):
                        annotes.append(
                            (self.distance(x, clickX, y, clickY), x, y, a))
                if annotes:
                    annotes.sort()
                    distance, x, y, annote = annotes[0]
                    self.drawAnnote(event.inaxes, x, y, annote)
                    for l in self.links:
                        l.drawSpecificAnnote(annote)

    def drawAnnote(self, ax, x, y, annote):
        """ Draw the annotation text and a marker at the given point. """
        t = ax.text(x, y, " - %s" % annote)
        m = ax.scatter([x], [y], marker='d', c='r', zorder=100)
        self.drawnAnnotations[(x, y)] = (t, m)
        ax.figure.canvas.draw()

    def drawSpecificAnnote(self, annote):
        for x, y, a in self.data:
            if a == annote:
                self.drawAnnote(self.ax, x, y, a)
x = range(10)
y = range(10)
annotes = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
fig, ax = plt.subplots()
ax.scatter(x,y)
af = AnnoteFinder(x,y, annotes, ax=ax)
fig.canvas.mpl_connect('button_press_event', af)
plt.show()
This is fairly useful, but sometimes you'll have multiple views of a dataset and it is useful to click and identify a point
in one plot and find it in another. The code below demonstrates this linkage and should work between multiple axes or
figures.
def linkAnnotationFinders(afs):
for i in range(len(afs)):
allButSelfAfs = afs[:i]+afs[i+1:]
afs[i].links.extend(allButSelfAfs)
subplot(121)
scatter(x,y)
af1 = AnnoteFinder(x,y, annotes)
connect('button_press_event', af1)
subplot(122)
scatter(x,y)
af2 = AnnoteFinder(x,y, annotes)
connect('button_press_event', af2)
linkAnnotationFinders([af1, af2])
I find this fairly useful. By subclassing and redefining drawAnnote this simple framework could be used to drive a
more sophisticated user interface.
Currently this implementation is a little slow when the number of datapoints becomes large. I’m particularly interested
in suggestions people might have for making this faster and better.
Often, you don’t want to respond to click events while zooming or panning (selected using the toolbar mode buttons).
You can avoid responding to those events by checking the attribute of the toolbar instance. The first example below
shows how to do this using the pylab interface.
def click(event):
"""If the left mouse button is pressed: draw a little square. """
tb = get_current_fig_manager().toolbar
if event.button==1 and event.inaxes and tb.mode == '':
x,y = event.xdata,event.ydata
plot([x],[y],'rs')
draw()
plot((arange(100)/99.0)**3)
gca().set_autoscale_on(False)
connect('button_press_event',click)
show()
If your application is in an wxPython window, then chances are you created a handle to the toolbar during setup,
as shown in the {{{add_toolbar}}} method of the embedding_in_wx2.py example script, and can then access the
{{{mode}}} attribute of that object (self.toolbar.mode in that case) in your click handling method.
Section author: AndrewStraw, GaelVaroquaux, Unknown[99], AngusMcMorland, newacct, danielboone
0. Prerequisites: You need to have the following installed to successfully run this example:
Matplotlib (on top of Zope's Python) and the Python Imaging Library (PIL).
And one more thing - probably everybody does this right,
but just in case - the Zope instance home directory has to be writable,
for the following to work.
1. Create a file (e.g. mpl.py) in INSTANCEHOME\Extensions:
import matplotlib
matplotlib.use('Agg')
from pylab import *
from os import *
from StringIO import StringIO
from PIL import Image as PILImage
from matplotlib.backends.backend_agg import FigureCanvasAgg
def chart(self):
clf()
img_dpi=72
width=400
height=300
fig=figure(dpi=img_dpi, figsize=(width/img_dpi, height/img_dpi))
x=arange(0, 2*pi+0.1, 0.1)
sine=plot(x, sin(x))
legend(sine, ["y=sin x"], "upper right")
xlabel('x')
ylabel('y=sin x')
grid(True)
canvas = FigureCanvasAgg(fig)
canvas.draw()
size = (int(canvas.figure.get_figwidth())*img_dpi, int(canvas.figure.get_figheight())*img_dpi)
buf=canvas.tostring_rgb()
im=PILImage.fromstring('RGB', size, buf, 'raw', 'RGB', 0, 1)
imgdata=StringIO()
im.save(imgdata, 'PNG')
self.REQUEST.RESPONSE.setHeader('Pragma', 'no-cache')
self.REQUEST.RESPONSE.setHeader('Content-Type', 'image/png')
return imgdata.getvalue()
2. Then create an external method in ZMI (e.g. Id -> mplchart, module name -> mpl, function name -> chart).
3. Click the Test tab and you should see the sine plot.
Section author: AndrewStraw, Unknown[106], Unknown[107]
When using multiple subplots with the same axis units, it is redundant to label each axis individually, and makes the
graph overly complex. You can use a single axis label, centered in the plot frame, to label multiple subplot axes. Here
is how to do it:
#!python
# note that this a code fragment...you will have to define your own data to plot
# Set up a whole-figure axes, with invisible axis, ticks, and ticklabels,
# which we use to get the xlabel and ylabel in the right place
bigAxes = pylab.axes(frameon=False) # hide frame
pylab.xticks([]) # don't want to see any ticks on this axis
pylab.yticks([])
# I'm using TeX for typesetting the labels--not necessary
pylab.ylabel(r'\textbf{Surface Concentration $(nmol/m^2)$}', size='medium')
pylab.xlabel(r'\textbf{Time (hours)}', size='medium')
# Create subplots and shift them up and to the right to keep tick labels
# from overlapping the axis labels defined above
topSubplot = pylab.subplot(2,1,1)
position = topSubplot.get_position()
position[0] = 0.15
position[1] = position[1] + 0.01
topSubplot.set_position(position)
pylab.errorbar(times150, average150)
bottomSubplot = pylab.subplot(2,1,2)
position = bottomSubplot.get_position()
position[0] = 0.15
position[1] = position[1] + 0.03
bottomSubplot.set_position(position)
pylab.errorbar(times300, average300)
Alternatively, you can use the following snippet to have shared ylabels on your subplots. Also see the attached figure
output.
#!python
import pylab

figprops = dict(figsize=(8., 8. / 1.618), dpi=128)             # Figure properties
adjustprops = dict(left=0.1, bottom=0.1, right=0.97, top=0.93,
                   wspace=0.2, hspace=0.2)                     # Subplot properties

fig = pylab.figure(**figprops)                                 # New figure
fig.subplots_adjust(**adjustprops)                             # Tunes the subplot layout

ax = fig.add_subplot(3, 1, 1)
bx = fig.add_subplot(3, 1, 2, sharex=ax, sharey=ax)
cx = fig.add_subplot(3, 1, 3, sharex=ax, sharey=ax)

pylab.setp(ax.get_xticklabels(), visible=False)
pylab.setp(bx.get_xticklabels(), visible=False)
#!python
def rem_x():
'''Removes superfluous x ticks when multiple subplots share
their axis works only in pylab mode but can easily be rewritten
for api use'''
nr_ax=len(gcf().get_axes())
count=0
for z in gcf().get_axes():
if count == nr_ax-1: break
setp(z.get_xticklabels(),visible=False)
count+=1
The first one above doesn’t work for me. The subplot command overwrites the bigaxes. However, I found a much
simpler solution to do a decent job for two axes and one ylabel:
yyl=plt.ylabel(r'My longish label that I want vertically centred')
yyl.set_position((yyl.get_position()[0],1)) # This says use the top of the bottom axis as the reference point.
yyl.set_verticalalignment('center')
Section author: Unknown[78], DavidLinke, Unknown[80], Unknown[109], Unknown[110]
Attachments
• Same_ylabel_subplots.png
Example code for embedding Matplotlib in Qt applications is given at embedding_in_qt.py. This recipe extends that
basic formula for integration with other powerful tools. In particular, we bring together the use of the GUI creation
tool made by Trolltech (the creators of Qt) and the ability to interact with a running Qt application via IPython.
A basic tutorial for using Designer (along with the pertinent system requirements) has been posted by Alex Fedosov.
Please review it before continuing.
However, for our purposes, we will create a much simpler design in Designer.
Open Designer, and create a new “Main Window”:
When the wizard appears, remove all of the menus and toolbars that it suggests to generate:
Now add a custom widget to your project as:
Give this new widget a name and set the size to 400, 300:
The new widget should now appear in your toolbox. Click on it and then click in the Form that has been created. Also perform
“Lay Out Vertically (Ctrl+L)” and “Adjust Size (Ctrl+J)” operations on the Form (without the widget being selected)
if you know what those are. At this point your workspace might look something like this:
Now we need to get the “import” setting put in, as specified here:
Of course, you’ll also need the file that will then be included: mplwidget.py.
The product of all of these operations in Designer is a .ui file. So, save the Form we’ve been working on and call it
“mplwidget_tutorial.ui”.
One of the things that can mess this up is that Designer automatically increments the names given to widget instances
(Form1, Form2, matplotlibWidget1, matplotlibWidget2, etc.) so if those don’t match up with what I used, you may
need to make some logical adjustments to your procedure.
Now, we use the tool pyuic (included with PyQt) to convert this .ui file into a Python class. This is easily accomplished
with:

pyuic mplwidget_tutorial.ui > mplwidget_tutorial.py

Go ahead and view the contents of mplwidget_tutorial.py and compare it with what I got. You can also look at my
mplwidget_tutorial.ui
Now, it's nice to have the invocation written up in its own main_mpl_tutorial.py file, which is amazingly short:

from mplwidget_tutorial import Form1   # the class generated by pyuic

f = Form1()
f.show()
Then, we want to start ipython and instantiate the window. In order to succeed in this endeavor, there are a bunch of
packages (each with minimum version requirements) you’ll need, like python2.3-ipython, python2.3-qt3, and so on.
The most important one, though, is ipython >= 0.6.13 (I think). From that version on, there is a super-great feature
that adds an invocation switch {{{-qthread}}} that starts a QApplication in a separate thread, in such a way that the
ipython prompt can still interact with it.
So, if the above command doesn’t work for you, check the version. It may also be invoked as “python2.3-ipython”
depending on your configuration.
Okay, now comes the coolest part: the interaction.
And, it is very easy to go back to Designer, add a button, re-run pyuic, and you’ve got another version.
Section author: AndrewStraw, KristjanOnu
Attachments
• designer_edit_custom_widgets.png
• designer_form_settings_comment.png
• designer_full_workspace.png
• designer_new_widget.png
• designer_newopen.png
• designer_wizard.png
• ipython_interacted.png
• ipython_invoked.png
• main_mpl_tutorial.py
• mplwidget.py
• mplwidget_tutorial.py
• mplwidget_tutorial.ui
Trying to use matplotlib in a python CGI script naïvely will most likely result in the following error:
...
352, in _get_configdir
raise RuntimeError("'%s' is not a writable dir; you must set
environment variable HOME to be a writable dir "%h)
RuntimeError: '<WebServer DocumentRoot>' is not a writable dir; you must set
environment variable HOME to be a writable dir
Matplotlib needs the environment variable HOME to point to a writable directory. One way to accomplish this is to
set this environment variable from within the CGI script at runtime (another way would be to modify the file, but that
would not be as portable). The following template can be used for a CGI script that uses matplotlib to create a png image:
#!/usr/bin/python
import os,sys
import cgi
import cgitb; cgitb.enable()

# set HOME environment variable to a directory the httpd server can write to
os.environ[ 'HOME' ] = '/tmp/'

import matplotlib
# chose a non-GUI backend
matplotlib.use( 'Agg' )
import pylab

#Deals with inputing data into python from the html form
form = cgi.FieldStorage()

# construct a simple plot
pylab.plot([1, 2, 3, 4])

print "Content-Type: image/png\n"

# save the figure as png and write it directly to the webserver's output
pylab.savefig(sys.stdout)
This image can then be accessed with a URL such as: http://localhost/showpng.py
As documented, some backends will not allow the output to be sent to sys.stdout. It is possible to replace the last line
with the following to work around this:
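The exact replacement did not survive in this copy; a sketch of the usual workaround (write to a temporary file, then copy its bytes to standard output; the fixed /tmp path is only illustrative) is:

# instead of pylab.savefig(sys.stdout):
pylab.savefig("/tmp/plot.png")
sys.stdout.write(open("/tmp/plot.png", "rb").read())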
(Of course it is necessary to create and delete proper temp files to use this in production.)
Section author: AndrewStraw, Unknown[120], Unknown[121]
Ever wanted to reverse a colormap, or to desaturate one? Here is a routine to apply a function to the look up table of
a colormap:
def cmap_map(function, cmap):
    """ Applies function (which should operate on vectors of shape 3:
    [r, g, b]) on colormap cmap. This routine will break any discontinuous
    points in a colormap.
    """
    cdict = cmap._segmentdata
    step_dict = {}
    # First get the list of points where the segments start or end
    for key in ('red', 'green', 'blue'):
        step_dict[key] = map(lambda x: x[0], cdict[key])
    # ... (the part of the routine that applies `function` to the colormap
    #      values was truncated in this copy; see the original recipe) ...
    return matplotlib.colors.LinearSegmentedColormap('colormap', cdict, 1024)
Let's try it out: I want a jet colormap, but lighter, so that I can plot things on top of it (the result is light_jet4.png in
the attachments). As a comparison, the original jet colormap is shown in jet.png.
OK, but what if you want to change the indices of a colormap, but not its colors?
def cmap_xmap(function, cmap):
    """ Applies function on the indices of colormap cmap. Beware, function
    should map the [0, 1] segment to itself, or you are in for surprises.
    """
    cdict = cmap._segmentdata.copy()
    function_to_map = lambda x: (function(x[0]), x[1], x[2])
    for key in ('red', 'green', 'blue'):
        cdict[key] = map(function_to_map, cdict[key])
        cdict[key].sort()
        # the remapped indices must stay inside the [0, 1] segment
        assert (cdict[key][0][0] >= 0 and cdict[key][-1][0] <= 1), \
            "Resulting indices extend out of the [0, 1] segment."
    return matplotlib.colors.LinearSegmentedColormap('colormap',cdict,1024)
def cmap_discretize(cmap, N):
    """Return a discrete colormap from the continuous colormap cmap.

        cmap: colormap instance, eg. cm.jet.
        N: number of colors.

    Example
        x = resize(arange(100), (5,100))
        djet = cmap_discretize(cm.jet, 5)
        imshow(x, cmap=djet)
    """
    if type(cmap) == str:
        cmap = get_cmap(cmap)
    colors_i = concatenate((linspace(0, 1., N), (0.,0.,0.,0.)))
    colors_rgba = cmap(colors_i)
    indices = linspace(0, 1., N+1)
    cdict = {}
    for ki,key in enumerate(('red','green','blue')):
        cdict[key] = [ (indices[i], colors_rgba[i-1,ki], colors_rgba[i,ki])
                       for i in xrange(N+1) ]
    # return the discretized colormap
    return matplotlib.colors.LinearSegmentedColormap(cmap.name + "_%d" % N, cdict, 1024)
So for instance, this is what you would get by doing {{{cmap_discretize(cm.jet, 6)}}}.
Attachments
• dicrete_jet1.png
• discrete_jet.png
• jet.png
• light_jet.png
• light_jet2.png
• light_jet3.png
• light_jet4.png
Scipy provides a command (imsave) to make a raster (png, jpg...) image from a 2D array, each pixel corresponding to
one value of the array. Yet the image is black and white.
Here is a recipe to do this with Matplotlib, using a colormap to give color to the image.

from pylab import *

X, Y = mgrid[-5:5:0.1, -5:5:0.1]
Z = sin(X**2 + Y**2 + 1e-4)/(X**2 + Y**2 + 1e-4) # Create the data to be plotted
imsave('imsave.png', Z, cmap=cm.hot)
imshow(imread('imsave.png'))
<matplotlib.image.AxesImage at 0x7f1733edf610>
Attachments
• imsave.jpg
• imsave.png
A commonly asked question on the matplotlib mailing lists is "how do I make a contour plot of my irregularly spaced
data?". The answer is, first you interpolate it to a regular grid. As of version 0.98.3, matplotlib provides a griddata
function that behaves similarly to the matlab version. It performs "natural neighbor interpolation" of irregularly spaced
data to a regular grid, which you can then plot with contour, imshow or pcolor.
This requires Scipy 0.9:
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
import numpy.ma as ma
from numpy.random import uniform, seed
# make up some randomly distributed data
seed(1234)
npts = 200
x = uniform(-2,2,npts)
y = uniform(-2,2,npts)
z = x*np.exp(-x**2-y**2)
# define grid.
xi = np.linspace(-2.1,2.1,100)
yi = np.linspace(-2.1,2.1,100)
# grid the data.
zi = griddata((x, y), z, (xi[None,:], yi[:,None]), method='cubic')
# contour the gridded data, plotting dots at the randomly spaced data points.
CS = plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')
CS = plt.contourf(xi,yi,zi,15,cmap=plt.cm.jet)
plt.colorbar() # draw colorbar
# plot data points.
plt.scatter(x,y,marker='o',c='b',s=5)
plt.xlim(-2,2)
plt.ylim(-2,2)
plt.title('griddata test (%d points)' % npts)
plt.show()
import numpy as np
from matplotlib.mlab import griddata
import matplotlib.pyplot as plt
import numpy.ma as ma
from numpy.random import uniform
# make up some randomly distributed data
npts = 200
x = uniform(-2,2,npts)
y = uniform(-2,2,npts)
z = x*np.exp(-x**2-y**2)
# define grid.
xi = np.linspace(-2.1,2.1,100)
yi = np.linspace(-2.1,2.1,100)
# grid the data.
zi = griddata(x,y,z,xi,yi)
# contour the gridded data, plotting dots at the randomly spaced data points.
CS = plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')
CS = plt.contourf(xi,yi,zi,15,cmap=plt.cm.jet)
plt.colorbar() # draw colorbar
# plot data points.
plt.scatter(x,y,marker='o',c='b',s=5)
plt.xlim(-2,2)
plt.ylim(-2,2)
plt.title('griddata test (%d points)' % npts)
plt.show()
By default, griddata uses the scikits delaunay package (included in matplotlib) to do the natural neighbor interpolation.
Unfortunately, the delaunay package is known to fail for some nearly pathological cases. If you run into one of those
cases, you can install the matplotlib natgrid toolkit. Once that is installed, the griddata function will use it instead
of delaunay to do the interpolation. The natgrid algorithm is a bit more robust, but cannot be included in matplotlib
proper because of licensing issues.
The radial basis function module in the scipy sandbox can also be used to interpolate/smooth scattered data in n
dimensions. See [”Cookbook/RadialBasisFunctions”] for details.
A less robust but perhaps more intuitive method is presented in the code below. This function takes three 1D arrays,
namely two independent data arrays and one dependent data array and bins them into a 2D grid. On top of that, the
code also returns two other grids, one where each binned value represents the number of points in that bin and another
in which each bin contains the indices of the original dependent array which are contained in that bin. These can be
further used for interpolation between bins if necessary.
This is essentially an Occam's Razor approach to the matplotlib.mlab griddata function, as both produce similar results.
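The implementation itself lives in the attached files; as a rough, hypothetical sketch of the same binning idea using numpy.histogram2d (the helper name bin_2d is made up here, and per-bin index tracking is omitted), one could write:

import numpy as np

def bin_2d(x, y, z, xbins=50, ybins=50):
    """Average z onto a regular (xbins, ybins) grid; also return the counts."""
    x_edges = np.linspace(x.min(), x.max(), xbins + 1)
    y_edges = np.linspace(y.min(), y.max(), ybins + 1)
    counts, _, _ = np.histogram2d(x, y, bins=(x_edges, y_edges))
    sums, _, _ = np.histogram2d(x, y, bins=(x_edges, y_edges), weights=z)
    with np.errstate(divide='ignore', invalid='ignore'):
        means = sums / counts          # NaN where a bin holds no points
    return means, counts, x_edges, y_edges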
import numpy as np
import matplotlib.pyplot as plt
import griddata
npr = np.random
npts = 3000                                # the total number of data points
x = npr.normal(size=npts)                  # create some normally distributed dependent data in x
y = npr.normal(size=npts)                  # ... and in y
# now show the dependent data (x vs y). we could represent the z data
# as a third axis by either a 3d plot or contour plot, but we need to
# grid it first....
plt.plot(x, y, '.')
plt.title('X vs Y')
plt.xlabel('X')
plt.ylabel('Y')
# minimum values for colorbar. filter our nans which are in the grid
zmin = grid[np.where(np.isnan(grid) == False)].min()
zmax = grid[np.where(np.isnan(grid) == False)].max()
# colorbar stuff
palette = plt.matplotlib.colors.LinearSegmentedColormap('jet3', plt.cm.datad['jet'], 2048)
palette.set_under(alpha=0.0)
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('Z = F(X, Y)')
plt.colorbar()
# now show the number of points in each bin. since the independent data are
# Gaussian distributed, we expect a 2D Gaussian.
plt.subplot(1, 2, 2)
plt.imshow(bins, extent=extent, cmap=palette, origin='lower', vmin=0, vmax=bins.max(), aspect='auto', interpolation='bilinear')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('X, Y vs The No. of Pts Per Bin')
plt.colorbar()
Attachments
• bin.png
• bin_small
• bin_small.png
• binned_data.png
• griddataexample1.png
• ppb.png
• ppb_raw.png
• raw.png
• raw_small
• raw_small.png
• unbinned_data.png
In a thread on the matplotlib mailing list, James Boyle posted a way to load colormaps from a file. Here it is slightly
modified.
gmtColormap.py
def gmtColormap(fileName,GMTPath = None):
import colorsys
import Numeric
N = Numeric
if type(GMTPath) == type(None):
filePath = "/usr/local/cmaps/"+ fileName+".cpt"
else:
filePath = GMTPath+"/"+ fileName +".cpt"
try:
f = open(filePath)
except:
print "file ",filePath, "not found"
return None
lines = f.readlines()
f.close()
x = []
r = []
g = []
b = []
colorModel = "RGB"
for l in lines:
ls = l.split()
if l[0] == "#":
if ls[-1] == "HSV":
colorModel = "HSV"
continue
else:
continue
if ls[0] == "B" or ls[0] == "F" or ls[0] == "N":
pass
else:
x.append(float(ls[0]))
r.append(float(ls[1]))
g.append(float(ls[2]))
b.append(float(ls[3]))
xtemp = float(ls[4])
rtemp = float(ls[5])
gtemp = float(ls[6])
btemp = float(ls[7])
x.append(xtemp)
r.append(rtemp)
g.append(gtemp)
b.append(btemp)
nTable = len(r)
x = N.array( x , N.Float)
r = N.array( r , N.Float)
g = N.array( g , N.Float)
b = N.array( b , N.Float)
if colorModel == "HSV":
for i in range(r.shape[0]):
rr,gg,bb = colorsys.hsv_to_rgb(r[i]/360.,g[i],b[i])
r[i] = rr ; g[i] = gg ; b[i] = bb
if colorModel == "RGB":
r = r/255.
g = g/255.
b = b/255.
xNorm = (x - x[0])/(x[-1] - x[0])
red = []
blue = []
green = []
for i in range(len(x)):
red.append([xNorm[i],r[i],r[i]])
green.append([xNorm[i],g[i],g[i]])
blue.append([xNorm[i],b[i],b[i]])
colorDict = {"red":red, "green":green, "blue":blue}
return (colorDict)
Image plotting requires data, a colormap, and a normalization. A common desire is to show missing data or other
values in a specified color. The following code shows an example of how to do this.
The code creates a new Colormap subclass and a norm subclass.
The initialization takes a dictionary of value, color pairs. The data is already assumed to be normalized (except for the
sentinels which are preserved). The RGB values at the sentinel values are replaced by the specified colors.
The norm class normalizes the data in the standard way except for one subtlety: its constructor takes an "ignore" argument.
The ignored values need to be excluded from the normalization so that they do not skew the results.
I use a not particularly wonderful algorithm of explicitly sorting the data and using the first non-sentinel values to
define the min and max. This can probably be improved, but for my purposes was easy and sufficient. The data is then
normalized, including the sentinels. Finally, the sentinels are replaced.
from matplotlib.colors import Colormap, normalize
import matplotlib.numerix as nx
from types import IntType, FloatType, ListType
class SentinelMap(Colormap):
def __init__(self, cmap, sentinels={}):
# boilerplate stuff
self.N = cmap.N
self.name = 'SentinelMap'
self.cmap = cmap
self.sentinels = sentinels
for rgb in sentinels.values():
if len(rgb)!=3:
raise ValueError('sentinel color must be RGB')
rgbaValues[:,:,1] = nx.where(scaledImageData==sentinel, g, rgbaValues[:,:,1])
rgbaValues[:,:,2] = nx.where(scaledImageData==sentinel, b, rgbaValues[:,:,2])
rgbaValues[:,:,3] = nx.where(scaledImageData==sentinel, alpha, rgbaValues[:,:,3])
return rgbaValues
class SentinelNorm(normalize):
"""
Leave the sentinel unchanged
"""
def __init__(self, ignore=[], vmin=None, vmax=None):
self.vmin=vmin
self.vmax=vmax
vmin = self.vmin
vmax = self.vmax
sortValues=nx.sort(rval)
if vmin is None:
# find the lowest non-sentinel value
for thisVal in sortValues:
if thisVal not in self.ignore:
vmin=thisVal   # vmin is the lowest non-sentinel value
break
else:
vmin=0.
if vmax is None:
for thisVal in sortValues[::-1]:
if thisVal not in self.ignore:
vmax=thisVal   # vmax is the greatest non-sentinel value
break
else:
vmax=0.
else:
if vmin is None: vmin = min(rval)
if vmax is None: vmax = max(rval)
if vmin > vmax:
raise ValueError("minvalue must be less than or equal to maxvalue")
elif vmin==vmax:
return 0.*value
else:
if needs_clipping:
val = nx.clip(val,vmin, vmax)
result = (1.0/(vmax-vmin))*(val-vmin)
if vtype == 'scalar':
result = result[0]
return result
if __name__=="__main__":
import pylab
import matplotlib.colors
n=100
pylab.show()
If the preceding code is run from a prompt, two images are generated. The first is a pristine image of random data.
The second image is the data modified by setting some blocks to sentinel values and then plotting the sentinels in
specific colors. A sample result is shown below.
Section author: AndrewStraw
Attachments
• sentinel.png
• sentinel_pristine.png
#!python
from pylab import *
from numpy import outer
rc('text', usetex=False)
a=outer(arange(0,1,0.01),ones(10))
figure(figsize=(10,5))
subplots_adjust(top=0.8,bottom=0.05,left=0.01,right=0.99)
maps=[m for m in cm.datad if not m.endswith("_r")]
maps.sort()
l=len(maps)+1
for i, m in enumerate(maps):
subplot(1,l,i+1)
axis("off")
imshow(a,aspect='auto',cmap=get_cmap(m),origin="lower")
title(m,rotation=90,fontsize=10)
savefig("colormaps.png",dpi=100,facecolor='gray')
But, what if I think those colormaps are ugly? Well, just make your own using
matplotlib.colors.LinearSegmentedColormap.
First, create a script that will map the range (0,1) to values in the RGB spectrum. In this dictionary, you will have a
series of tuples for each color 'red', 'green', and 'blue'. The first element in each tuple of a color series must be
ordered from 0 to 1, with arbitrary spacing in between. Now, consider (0.5, 1.0, 0.7) in the 'red' series below. This
tuple says that at 0.5 in the range from (0,1), the red value interpolated from below reaches 1.0, while from above it
starts at 0.7. Often, the last two values in each tuple will be the same, but using different values is helpful for putting
breaks in your colormap. This is easier to understand than it might sound, as demonstrated by this simple script:
#!python
from pylab import *
cdict = {'red': ((0.0, 0.0, 0.0),
(0.5, 1.0, 0.7),
(1.0, 1.0, 1.0)),
'green': ((0.0, 0.0, 0.0),
(0.5, 1.0, 0.0),
(1.0, 1.0, 1.0)),
'blue': ((0.0, 0.0, 0.0),
(0.5, 1.0, 0.0),
(1.0, 0.5, 1.0))}
my_cmap = matplotlib.colors.LinearSegmentedColormap('my_colormap',cdict,256)
pcolor(rand(10,10),cmap=my_cmap)
colorbar()
As you see, the colormap has a break halfway through. Please use this new power responsibly.
Here is a slightly modified version of the above code which allows for displaying a selection of the pre-defined colormaps
as well as self-created registered colormaps. Note that the cmap_d dictionary in the cm module is not documented.
The choice of indexed colors in discrete_cmap is somewhat haphazard...
The script includes examples for registering your own color maps, and a utility for showing all or selected named
colormaps, including self-defined ones.
import matplotlib
import matplotlib.colors as col
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import numpy as np
def register_own_cmaps():
"""define two example colormaps as segmented lists and register them"""
# a good guide for choosing colors is provided at
# http://geography.uoregon.edu/datagraphics/color_scales.htm
#
# example 1:
# create own colormap from purple, blue, green, orange to red
# cdict contains a tuple structure for 'red', 'green', and 'blue'.
# Each color has a list of (x,y0,y1) tuples, where
# x defines the "index" in the colormap (range 0..1), y0 is the
# color value (0..1) left of x, and y1 the color value right of x.
# The LinearSegmentedColormap method will linearly interpolate between
# (x[i],y1) and (x[i+1],y0)
# The gamma value denotes a "gamma curve" value which adjusts the brightness
# at the bottom and top of the colormap. According to matlab documentation
# this means:
# colormap values are modified as c^gamma, where gamma is (1-beta) for
# beta>0 and 1/(1+beta) for beta<=0
cdict = {'red': ((0.0, 0.0, 0.0),
(0.3, 0.5, 0.5),
(0.6, 0.7, 0.7),
(0.9, 0.8, 0.8),
(1.0, 0.8, 0.8)),
'green': ((0.0, 0.0, 0.0),
(0.3, 0.8, 0.8),
(0.6, 0.7, 0.7),
(0.9, 0.0, 0.0),
(1.0, 0.7, 0.7)),
'blue': ((0.0, 1.0, 1.0),
(0.3, 1.0, 1.0),
(0.6, 0.0, 0.0),
(0.9, 0.0, 0.0),
(1.0, 1.0, 1.0))}
cmap1 = col.LinearSegmentedColormap('my_colormap',cdict,N=256,gamma=0.75)
cm.register_cmap(name='own1', cmap=cmap1)
def discrete_cmap(N=8):
"""create a colormap with N (N<15) discrete colors and register it"""
# define individual colors as hex values
cpool = [ '#bd2309', '#bbb12d', '#1480fa', '#14fa2f', '#000000',
'#faf214', '#2edfea', '#ea2ec4', '#ea2e40', '#cdcdcd',
'#577a4d', '#2e46c0', '#f59422', '#219774', '#8086d9' ]
cmap3 = col.ListedColormap(cpool[0:N], 'indexed')
cm.register_cmap(cmap=cmap3)
def show_cmaps(names=None):
"""display all colormaps included in the names list. If names is None, all
defined colormaps will be shown."""
# base code from http://www.scipy.org/Cookbook/Matplotlib/Show_colormaps
matplotlib.rc('text', usetex=False)
a=np.outer(np.arange(0,1,0.01),np.ones(10)) # pseudo image data
f=plt.figure(figsize=(10,5))
f.subplots_adjust(top=0.8,bottom=0.05,left=0.01,right=0.99)
# get list of all colormap names
# this only obtains names of built-in colormaps:
maps=[m for m in cm.datad if not m.endswith("_r")]
# use undocumented cmap_d dictionary instead
maps = [m for m in cm.cmap_d if not m.endswith("_r")]
maps.sort()
# determine number of subplots to make
l=len(maps)+1
if names is not None: l=len(names) # assume all names are correct!
# loop over maps and plot the selected ones
i=0
for m in maps:
if names is None or m in names:
i+=1
ax = plt.subplot(1,l,i)
ax.axis("off")
plt.imshow(a,aspect='auto',cmap=cm.get_cmap(m),origin="lower")
plt.title(m,rotation=90,fontsize=10,verticalalignment='bottom')
plt.savefig("colormaps.png",dpi=100,facecolor='gray')
if __name__ == "__main__":
register_own_cmaps()
discrete_cmap(8)
show_cmaps(['indexed','Blues','OrRd','PiYG','PuOr',
'RdYlBu','RdYlGn','afmhot','binary','copper',
'gist_ncar','gist_rainbow','own1','own2'])
Attachments
• cmap_example.png
• colormaps3.png
Simple Plotting
Histograms
Here is an example for creating a 2D histogram with variable bin size and displaying it.
#!/usr/bin/python
import numpy as np
import matplotlib as ml
import matplotlib.pyplot as plt
ml.rcParams['image.cmap'] = 'gist_heat'
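The rest of that listing did not survive here; a minimal sketch of the same idea (variable-width bin edges passed to numpy.histogram2d, then displayed with pcolormesh; the bin edges and data are illustrative) is:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(2, 1, 1000)
y = np.random.normal(1, 1, 1000)

# variable-width bin edges
xedges = [0, 1, 1.5, 3, 5]
yedges = [0, 2, 3, 4, 6]

H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))

# pcolormesh understands non-uniform bin edges directly
X, Y = np.meshgrid(xedges, yedges)
plt.pcolormesh(X, Y, H.T)        # transpose so rows correspond to y bins
plt.colorbar()
plt.show()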
Attachments
• histogram2d.png
Matplotlib: animations
*Note:* Some of the matplotlib code in this cookbook entry may be deprecated or obsolete. For example,
the file anim.py mentioned below no longer exists in matplotlib. Examples of animation in matplotlib are at
http://matplotlib.sourceforge.net/examples/animation/index.html, and the main animation API documentation is http:
//matplotlib.sourceforge.net/api/animation_api.html.
matplotlib supports animated plots, and provides a number of demos. An important question when considering whether
to use matplotlib for animation is what kind of speed you need. matplotlib is not the fastest plotting library in the west,
and may be too slow for some animation applications. Nonetheless, it is fast enough for many if not most, and this
tutorial is geared to showing you how to make it fast enough for you. In particular, the section *Animating selected
plot elements* shows you how to get top speed out of matplotlib animations.
matplotlib supports 5 different graphical user interfaces (GTK, WX, Qt, Tkinter, FLTK) and for some of those GUIs,
there are various ways to draw to the canvas. For example, for GTK, you can use native GDK drawing, antigrain, or
cairo. A GUI toolkit combined with some method of drawing comprises a backend. For example, drawing to a GTK
canvas with the antigrain drawing toolkit is called the GTKAgg backend. This is important because different backends
have different performance characteristics, and the difference can be considerable.
When considering performance, the typical measure is frames per second. Television is 30 frames per second, and
for many applications, if you can get 10 or more frames per second the animation is smooth enough to "look good".
Monitors typically refresh at 75-80 frames per second, and so this is an upper limit for performance. Any faster is
probably wasted CPU cycles.
Here are some numbers for the animated script anim.py, which simply updates a sine wave on various backends,
run under linux on a 3GHz Pentium IV. To profile a script under different backends, you can use the "GUI neutral"
animation technique described below and then run it with the backend selection flag (for example -dGTKAgg or -dWXAgg).
Here are the results. Note that these should be interpreted cautiously, because some GUIs might call a drawing
operation in a separate thread and return before it is complete, or drop a drawing operation while one is in the queue.
The most important assessment is qualitative.
Backend Frames/second
GTK 43
GTKAgg 36
TkAgg 20
WX 11
WXAgg 27
The pylab interface supports animation that does not depend on a specific GUI toolkit. This is not recommended for
production use, but is often a good way to make a quick-and-dirty, throw away animation. After importing pylab, you
need to turn interaction on with the [http://matplotlib.sf.net/matplotlib.pylab.html#-ion ion] command. You can then
force a draw at any time with [http://matplotlib.sf.net/matplotlib.pylab.html#-draw draw]. In interactive mode, a new
draw command is issued after each pylab command, and you can also temporarily turn off this behavior for a block of
plotting commands in which you do not want an update with the [http://matplotlib.sf.net/matplotlib.pylab.html#-ioff
ioff] commands. This is described in more detail on the interactive page.
Here is the anim.py script that was used to generate the profiling numbers across backends in the table above
ion()
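Only the ion() call of that script survived extraction here; the pattern it used looked roughly like the following reconstruction (not the original file):

import time
from pylab import *

ion()                                 # interactive mode on

tstart = time.time()                  # for profiling
x = arange(0, 2*pi, 0.01)
line, = plot(x, sin(x))

frames = 200
for i in range(frames):
    line.set_ydata(sin(x + i/10.0))   # update the data
    draw()                            # redraw the canvas

print 'FPS:', frames/(time.time()-tstart)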
Note the technique of creating a line with the call to [http://matplotlib.sf.net/matplotlib.pylab.html#-plot plot]:
line, = plot(x,sin(x))
and then setting its data with the set_ydata method and calling draw:
line.set_ydata(sin(x+i/10.0)) # update the data
draw() # redraw the canvas
This can be much faster than clearing the axes and/or creating new objects for each plot command.
To animate a pcolor plot use:
p = pcolor(XI,YI,C[0])
for i in range(1,len(C)):
p.set_array(C[i,0:-1,0:-1].ravel())
p.autoscale()
draw()
This assumes C is a 3D array where the first dimension is time and that XI,YI and C[i,:,:] have the same shape. If
C[i,:,:] is one row and one column smaller simply use C.ravel().
If you are doing production code or anything semi-serious, you are advised to use the GUI event handling specific to
your toolkit for animation, because this will give you more control over your animation than matplotlib can provide
through the GUI neutral pylab interface to animation. How you do this depends on your toolkit, but there are examples
for several of the backends in the matplotlib examples directory, eg, anim_tk.py for Tkinter, dynamic_image_gtkagg.py
for GTKAgg and dynamic_image_wxagg.py for WXAgg.
The basic idea is to create your figure and a callback function that updates your figure. You then pass that callback to
the GUI idle handler or timer. A simple example in GTK looks like
def callback(*args):
line.set_ydata(get_next_plot())
canvas.draw() # or simply "draw" in pylab
gtk.idle_add(callback)
wx.EVT_IDLE(wx.GetApp(), callback)
One limitation of the methods presented above is that all figure elements are redrawn with every call to draw, but
we are only updating a single element. Often what we want to do is draw a background, and animate just one or
two elements on top of it. As of matplotlib-0.87, GTKAgg, TkAgg, WXAgg, and FLTKAgg support the methods
discussed here.
The basic idea is to set the 'animated' property of the Artist you want to animate (all figure elements from Figure to
Axes to Line2D to Text derive from the base class Artist). Then, when the standard canvas draw operation is called,
all the artists except the animated one will be drawn. You can then use the canvas method copy_from_bbox to copy a
rectangular region (eg the axes bounding box) into a pixel buffer. In animation, you restore the background with
restore_region, then call draw_artist to draw your animated artist onto the clean background, and blit the updated axes
rectangle to the figure. When I run the example below in the same environment that produced 36 FPS for GTKAgg
above, I measure 327 FPS with the techniques below. See the caveats on performance numbers mentioned above.
Suffice it to say, quantitatively and qualitatively it is much faster.
import sys
import gtk, gobject
import pylab as p
import matplotlib.numerix as nx
import time
ax = p.subplot(111)
canvas = ax.figure.canvas

# for profiling
tstart = time.time()

# create the initial line
x = nx.arange(0, 2*nx.pi, 0.01)
line, = p.plot(x, nx.sin(x), animated=True)

# save the clean slate background -- everything but the animated line
# is drawn and saved in the pixel buffer background
background = canvas.copy_from_bbox(ax.bbox)
def update_line(*args):
# restore the clean slate background
canvas.restore_region(background)
# update the data
line.set_ydata(nx.sin(x+update_line.cnt/10.0))
# just draw the animated artist
ax.draw_artist(line)
# just redraw the axes rectangle
canvas.blit(ax.bbox)
if update_line.cnt==50:
# print the timing info and quit
print 'FPS:' , update_line.cnt/(time.time()-tstart)
sys.exit()
update_line.cnt += 1
return True
update_line.cnt = 0
gobject.idle_add(update_line)
p.show()
matplotlib 0.83.2 introduced a cursor class which can utilize these blit methods for no-lag cursoring. The class takes a
useblit argument in the constructor. For backends that support the new API (GTKAgg) set useblit=True:

from matplotlib.widgets import Cursor
import pylab
fig = pylab.figure(figsize=(8,6))
ax = fig.add_axes([0.075, 0.25, 0.9, 0.725], axisbg='#FFFFCC')
x,y = 4*(pylab.rand(2,100)-.5)
ax.plot(x,y,'o')
ax.set_xlim(-2,2)
ax.set_ylim(-2,2)
# set useblit=True on gtkagg for enhanced performance
cursor = Cursor(ax, useblit=True, color='red', linewidth=2)
pylab.show()
As noted above, only GTKAgg supports the methods above for animating selected artists. To support them, a backend
needs to provide three canvas methods:

* copy_from_bbox(ax.bbox) - copy the region in ax.bbox into a pixel buffer and return it in an object of your
  choosing. bbox is a matplotlib BBox instance from the transforms module <http://matplotlib.sf.net/transforms.html>.
  The returned object is not used by the matplotlib frontend, but it stores it and passes it back to the backend in the
  restore_region method.
* restore_region(background) - restore the saved pixel buffer back onto the canvas.
* blit(ax.bbox) - transfer the contents of the axes rectangle to the display.

You will probably want to store not only the pixel buffer but also the rectangular region of the canvas from which it
was copied. If a backend implements these methods, the rest will happen automagically. Since some backends do not
currently implement the required methods, I am making them available to the users to manage themselves but am not
assuming them in the axes drawing code.
Section author: AndrewStraw, Unknown[83], Unknown[84], Unknown[85], Unknown[86], KristjanOnu, Warren-
Weckesser
Matplotlib: arrows
Some example code for how to plot an arrow using the Arrow function.
x = arange(10)
y = x
ax = gca()
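The remainder of that listing was lost here; a small self-contained sketch using matplotlib.patches.Arrow (the coordinates and colors are illustrative) is:

from pylab import arange, plot, gca, show
from matplotlib.patches import Arrow

x = arange(10)
y = x
plot(x, y, 'b-')
ax = gca()
# Arrow(x, y, dx, dy): an arrow starting at (2, 2) with components (4, 4)
arr = Arrow(2, 2, 4, 4, width=1.0, edgecolor='white', facecolor='green')
ax.add_patch(arr)
show()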
Attachments
• plot_arrow.png
xlocations = na.array(range(len(data)))+0.5
width = 0.5
show()
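Only those three lines of the bar-chart listing survived; a minimal self-contained sketch of the same idea (the data and labels are made up) is:

from pylab import bar, xticks, xlim, title, show

data = [5., 25., 50., 20.]
labels = ['A', 'B', 'C', 'D']
width = 0.5
xlocations = [x + 0.5 for x in range(len(data))]

bar(xlocations, data, width=width)
# center the category labels under the bars
xticks([x + width/2 for x in xlocations], labels)
xlim(0, xlocations[-1] + width*2)
title("Bar chart")
show()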
Attachments
• barchart.png
Example of how to replace the default log-plot exponential labels with integer labels. The same method will work for
any kind of custom labeling. This example was pulled from the Python-list mailing list and the original can be found
here.
from pylab import *
from matplotlib.ticker import FuncFormatter

def log_10_product(x, pos):
    """Label log-scale ticks with plain integers (the args are the value and the tick position)."""
    return '%1i' % (x)

ax = subplot(111)
# The axis scale must be set before declaring the Formatter.
# If it is not, the Formatter will use the default log labels for ticks.
ax.set_xscale('log')
ax.set_yscale('log')
formatter = FuncFormatter(log_10_product)
ax.xaxis.set_major_formatter(formatter)
ax.yaxis.set_major_formatter(formatter)
# Plot
ax.scatter(result1, predict1, s=40, c='b', marker='s', faceted=False)
ax.scatter(result2, predict2, s=40, c='r', marker='s', faceted=False)
ax.set_xlim(1e-1, 1e4)
ax.set_ylim(1e-1, 1e4)
grid(True)
/usr/lib/python2.7/dist-packages/matplotlib/cbook.py:137: MatplotlibDeprecationWarning: The faceted option was deprecated in version 1.2. Use edgecolor instead.
<matplotlib.text.Text at 0x7f07874f2410>
Attachments
• log_labels.png
Hinton diagrams are a way of visualizing numerical values in a matrix/vector, popular in the neural networks and
machine learning literature. The area occupied by a square is proportional to a value’s magnitude, and the colour
(black or white in this case) indicates its sign (positive/negative).
import numpy as N
import pylab as P
def _blob(x, y, area, colour):
    """
    Draws a square-shaped blob with the given area (< 1) at
    the given coordinates.
    """
    hs = N.sqrt(area) / 2
    xcorners = N.array([x - hs, x + hs, x + hs, x - hs])
    ycorners = N.array([y - hs, y - hs, y + hs, y + hs])
    P.fill(xcorners, ycorners, colour, edgecolor=colour)

def hinton(W, maxWeight=None):
    """
    Draws a Hinton diagram for the weight matrix W.
    """
    reenable = False
    if P.isinteractive():
        P.ioff()
        reenable = True
    P.clf()
    height, width = W.shape
    if not maxWeight:
        maxWeight = 2**N.ceil(N.log(N.max(N.abs(W)))/N.log(2))

    # gray background square covering the whole diagram
    P.fill(N.array([0,width,width,0]),N.array([0,0,height,height]),'gray')
    P.axis('off')
    P.axis('equal')
    for x in xrange(width):
        for y in xrange(height):
            _x = x+1
            _y = y+1
            w = W[y,x]
            if w > 0:
                _blob(_x - 0.5, height - _y + 0.5, min(1,w/maxWeight),'white')
            elif w < 0:
                _blob(_x - 0.5, height - _y + 0.5, min(1,-w/maxWeight),'black')
    if reenable:
        P.ion()
    P.show()
Attachments
• hinton.png
Matplotlib: legend
If you have a lot of points to plot, you may want to set a marker only once every n points, e.g.
plot(x, y, '-r')
plot(x[::20], y[::20], 'ro')
Then the automatic legend sees this as two different plots. One approach is to create an extra line object that is not
plotted anywhere but used only for the legend:
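A minimal sketch of that proxy-line trick (the data, labels and styles are illustrative):

from pylab import plot, legend, show
from matplotlib.lines import Line2D

x = range(100)
y = [v**2 for v in x]
plot(x, y, '-r')
plot(x[::20], y[::20], 'ro')

# a proxy line that is never added to the axes, used only for the legend
proxy = Line2D(range(1), range(1), color='r', marker='o')
legend([proxy], ['my data'], numpoints=1)
show()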
There is no nice easy way to add a legend outside (to the right of) your plot, but if you set the axes right in the first
place, it works OK:
figure()
axes([0.1,0.1,0.71,0.8])
plot([0,1],[0,1],label="line 1")
hold(1)
plot([0,1],[1,0.5],label="line 2")
legend(loc=(1.03,0.2))
show()
One can simply set the legend_ attribute of the axes to None and redraw:
ax = gca()
ax.legend_ = None
draw()
Matplotlib: maps
Plotting data on map projections is easy with the basemap toolkit. Toolkits are collections of application-specific
functions that extend matplotlib.
The basemap toolkit is not in the default matplotlib install - you can download it from the matplotlib sourceforge
download page.
Suppose you’d like to make a map of the world using an orthographic, or satellite projection and plot some data on it.
Here’s how to make the map (using matplotlib >= 0.98.0 and basemap >= 0.99):
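The script itself is missing from this copy; a minimal sketch of an orthographic map (assuming the basemap toolkit is importable as mpl_toolkits.basemap) is:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# orthographic ("satellite") projection centered on lat=50, lon=-100
map = Basemap(projection='ortho', lat_0=50, lon_0=-100, resolution='l')
map.drawcoastlines()
map.drawcountries()
map.fillcontinents(color='coral')
map.drawmapboundary()
# draw lat/lon grid lines every 30 degrees
map.drawmeridians(np.arange(0, 360, 30))
map.drawparallels(np.arange(-90, 90, 30))
plt.show()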
There are many other map projections available, probably more than you've even heard of before. A complete list is
available in the basemap docstrings. Coastlines, political boundaries and rivers are available in four resolutions: crude,
low, intermediate and high. Here's what the crude resolution coastlines look like.
(see basemap0.png in the attachments)
Now, suppose you would like to plot the locations of five cities on this map. Add the following to the above script,
just before the final show():
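The city-plotting snippet is also missing here; a sketch (the city coordinates are approximate, and map, np and plt come from the previous sketch) is:

# lat/lon coordinates of five cities
lats = [40.02, 32.73, 38.55, 48.25, 17.29]
lons = [-105.16, -117.16, -77.00, -114.21, -88.10]
cities = ['Boulder, CO', 'San Diego, CA', 'Washington, DC',
          'Whitefish, MT', 'Belize City, Belize']
# convert lon/lat (in degrees) to native map projection coordinates
x, y = map(lons, lats)
map.plot(x, y, 'bo')                       # filled circles at the city locations
for name, xpt, ypt in zip(cities, x, y):
    plt.text(xpt + 50000, ypt + 50000, name)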
(see basemap1b.png in the attachments)
Calling a basemap class instance with arrays of longitudes and latitudes returns those locations in native map projection
coordinates using the proj4 library. Now suppose you have some data on a regular latitude/longitude grid and you
would like to plot contours of that data over the map. Try adding the following lines just before
(see basemap2b.png in the attachments)
Instead of plotting the continents and coastlines, you can use an image as a map background with the bluemarble
method. The default background image is the NASA 'blue marble' image, which you can apply by using
map.bluemarble()
in place of
map.drawcoastlines()
map.drawcountries()
map.fillcontinents(color='coral')
Here’s what the resulting plot looks like (using white text instead of black, and white dots instead of blue)
You can also plot images, pcolor plots and vectors over map projections. Examples that illustrate this and more can be
found in the examples directory of the basemap source distribution.
Section author: AndrewStraw, Unknown[97], GaelVaroquaux, Unknown[16], Unknown[105]
Attachments
• basemap0.png
• basemap1.png
• basemap1b.png
• basemap2.png
• basemap2b.png
• basemap3.png
• basemap3b.png
• basemap3c.png
colored_line.py is a simple illustration of how to make the color of each segment of a line depend on some
property of the data being plotted.
An up-to-date version of the script can be found here.
# Create a colormap for red, green and blue and a norm to color
# f' < -0.5 red, f' > 0.5 blue, and the rest green
cmap = ListedColormap(['r', 'g', 'b'])
norm = BoundaryNorm([-1, -0.5, 0.5, 1], cmap.N)
plt.xlim(x.min(), x.max())
plt.ylim(-1.1, 1.1)
plt.show()
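Since the segment-building part of the listing is missing above, here is a self-contained sketch of the same idea (the data and color boundaries are illustrative):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.colors import ListedColormap, BoundaryNorm

x = np.linspace(0, 3 * np.pi, 500)
y = np.sin(x)
dydx = np.cos(x)                         # the quantity used for coloring

# build the list of segments: one (2, 2) segment per consecutive pair of points
points = np.array([x, y]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

cmap = ListedColormap(['r', 'g', 'b'])
norm = BoundaryNorm([-1, -0.5, 0.5, 1], cmap.N)
lc = LineCollection(segments, cmap=cmap, norm=norm)
lc.set_array(dydx[:-1])                  # one color value per segment

plt.gca().add_collection(lc)
plt.xlim(x.min(), x.max())
plt.ylim(-1.1, 1.1)
plt.show()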
Note that the number of segments is one less than the number of points.
An alternative strategy would be to generate one segment for each contiguous region of a given color.
If you have a parametric curve to display, and want to represent the parameter using color.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
plt.gca().add_collection(lc)
plt.xlim(-1, 1)
plt.ylim(-1, 1)
plt.show()
Attachments
• colored_line.png
• colored_line.py
• colored_line2.png
Often one wants to plot many signals over one another. There are a few ways to do this. The naive implementation is
just to add a constant offset to each signal:
from pylab import plot, show, ylim, yticks
from matplotlib.numerix import sin, cos, exp, pi, arange
show()
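The body of that listing is missing here; a minimal sketch of the constant-offset idea (the signals and offsets are illustrative; the numerix import mirrors the fragment above, with a recent matplotlib you would import from numpy instead) is:

from pylab import plot, show, ylim, yticks
from matplotlib.numerix import sin, exp, pi, arange

t = arange(0.0, 2.0, 0.01)
s1 = sin(2*pi*t)
s2 = exp(-t)
s3 = sin(2*pi*t)*exp(-t)

# plot each signal with a constant vertical offset
plot(t, s1, t, s2 + 2, t, s3 + 4)
ylim(-1, 5)
yticks([0, 2, 4], ['S1', 'S2', 'S3'])   # label the offsets instead of the values
show()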
but then it is difficult to change the y scale in a reasonable way. For example, when you zoom in on y, the signals
on top and bottom will go off the screen. Often what one wants is for the y location of each signal to remain in place
and the gain of the signal to be changed.
If you have just a few signals, you could make each signal a separate axes and make the y label horizontal. This
works fine for a small number of signals (4-10, say), except that the extra horizontal lines and ticks around the axes may
be annoying. It's on our list of things to change the way these axes lines are drawn so that you can remove them, but it
isn't done yet:
from pylab import figure, show, setp
from matplotlib.numerix import sin, cos, exp, pi, arange

t = arange(0.0, 2.0, 0.01)
s1 = sin(2*pi*t)
s2 = exp(-t)

fig = figure()

yprops = dict(rotation=0,
              horizontalalignment='right',
              verticalalignment='center',
              x=-0.01)

axprops = dict(yticks=[])

ax1 = fig.add_axes([0.1, 0.7, 0.8, 0.2], **axprops)
ax1.plot(t, s1)
ax1.set_ylabel('S1', **yprops)

axprops['sharex'] = ax1
axprops['sharey'] = ax1
# force x axes to remain in register, even with toolbar navigation
ax2 = fig.add_axes([0.1, 0.5, 0.8, 0.2], **axprops)
ax2.plot(t, s2)
ax2.set_ylabel('S2', **yprops)

show()
(see multipleaxes.png in the attachments)
For large numbers of lines the approach above is inefficient because creating a separate axes for each line creates a
lot of useless overhead. The application that gave birth to matplotlib is an EEG viewer which must efficiently handle
hundreds of lines; this is available as part of the pbrain package.
Here is an example of how that application does multiline plotting with “in place” gain changes. Note that this
will break the y behavior of the toolbar because we have changed all the default transforms. In my application I
have a custom toolbar to increase or decrease the y scale. In this example, I bind the plus/minus keys to a function
which increases or decreases the y gain. Perhaps I will take this and wrap it up into a function called plot_signals
or something like that because the code is a bit hairy since it makes heavy use of the somewhat arcane matplotlib
transforms. I suggest reading up on the transforms module before trying to understand this example:
s4 = sin(2*pi*t)*cos(4*pi*t)
s5 = s1*s2
s6 = s1-s4
s7 = s3*s4-s1
# This bounding box reuses the x data of the viewLim for the x scale, and
# -10 to 10 on the y scale.  -10 to 10 means that a signal with a min/max
# amplitude of 10 will span the entire vertical extent of the axes
scale = 10
boxin = Bbox(
    Point(ax.viewLim.ll().x(), Value(-scale)),
    Point(ax.viewLim.ur().x(), Value(scale)))

# height of the axes bounding box in display units (needed below)
height = ax.bbox.ur().y() - ax.bbox.ll().y()
boxout = Bbox(
    Point(ax.bbox.ll().x(), Value(-0.5) * height),
    Point(ax.bbox.ur().x(), Value( 0.5) * height))

# now add the signals, set the transform, and set the offset of each line
ticklocs = []
for i, s in enumerate(signals):
    trans = get_bbox_transform(boxin, boxout)
    offset = (i+1.)/(len(signals)+1.)
    trans.set_offset((0, offset), transOffset)
    # (creation of the Line2D for s, ax.add_line(...), and
    # ticklocs.append(offset) are omitted in this excerpt)
ax.set_yticks(ticklocs)
ax.set_yticklabels(['S%d'%(i+1) for i in range(len(signals))])
setp(all, transform=ax.transAxes)
setp(labels, x=-0.01)
ax.set_xlabel('time (s)')
def keypress(event):
    if event.key in ('+', '='): set_ygain(-1)
    elif event.key in ('-', '_'): set_ygain(1)
connect('key_press_event', keypress)
ax.set_title('Use + / - to change y gain')
show()
Attachments
• multiline.png
• multipleaxes.png
From time to time one might end up with "meaningless" data in an array, be it because a detector didn't work properly
or for another reason. Or one has to deal with data in completely different ranges. In both cases plotting all values
will screw up the plot. This brief example script addresses this problem and shows one possible solution using masked
arrays. See 'masked_demo.py' in the matplotlib examples for a reference, too.
import numpy as np
import matplotlib.pyplot as plt
y_values = [0,0,100,97,98,0,99,101,0,102,99,105,101]
x_values = [0,1,2,3,4,5,6,7,8,9,10,11,12]
# give a threshold
threshold = 1
# mask values below the threshold (the rest of this script is a reconstructed sketch)
y_masked = np.ma.masked_where(np.array(y_values) < threshold, y_values)
plt.subplot(3, 1, 1); plt.plot(x_values, y_values)
plt.subplot(3, 1, 2); plt.plot(x_values, y_masked)
plt.subplot(3, 1, 3); plt.plot(x_values, y_masked); plt.ylim(80, 110)
plt.savefig('masked_test.png')
The resulting figure might illustrate the problem - note the different scales in all three subplots:
Section author: AndrewStraw, UnuTbu
Attachments
• masked_test.png
Use the fill function to make shaded regions of any color tint. Here is an example.
from pylab import arange, fill, show
x = arange(10)
y = x
# shade the region under y = x (the fill call is a reconstructed sketch)
fill(x, y, facecolor='blue', alpha=0.3)
show()
Attachments
• shaded.png
matplotlib's approach to plotting functions requires you to compute the x and y vertices of the curves you want to plot
and then pass them off to plot. E.g., for a normal pdf, matplotlib.mlab provides such a function:
import matplotlib.numerix as nx
import pylab as p
from matplotlib.mlab import normpdf

x = nx.arange(-4, 4, 0.01)
y = normpdf(x, 0, 1) # unit normal
p.plot(x,y, color='red', lw=2)
p.show()
Of course, some curves do not have closed form expressions and are not amenable to such treatment. Some of the
matplotlib backends have the capability to draw arbitrary paths with splines (cubic and quartic), but this functionality
hasn't been exposed to the user yet (as of 0.83). If you need this, please post to the mailing list or submit a sourceforge
support request.
Rich Shepard was interested in plotting “S curves” and “Z curves”, and a little bit of googling suggests that the S curve
is a sigmoid and the Z curve is simply 1.0-sigmoid. There are many simple forms for sigmoids: eg, the hill, boltzman,
and arc tangent functions. Here is an example of the boltzman function:
import matplotlib.numerix as nx
import pylab as p

def boltzman(x, xmid, tau):
    """evaluate the boltzman function with midpoint xmid and time constant tau over x"""
    return 1. / (1. + nx.exp(-(x-xmid)/tau))

x = nx.arange(-6, 6, .01)
S = boltzman(x, 0, 1)
Z = 1-boltzman(x, 0.5, 1)
p.plot(x, S, x, Z, color='red', lw=2)
p.show()
import matplotlib.numerix as nx
import pylab as p

def fill_below_intersection(x, S, Z):
    """fill the region below the intersection of S and Z (a reconstructed sketch)"""
    Y = nx.where(S < Z, S, Z)   # the lower of the two curves at each x
    p.fill(x, Y, facecolor='blue', alpha=0.5)

x = nx.arange(-6, 6, .01)
S = boltzman(x, 0, 1)
Z = 1-boltzman(x, 0.5, 1)
p.plot(x, S, x, Z, color='red', lw=2)
fill_below_intersection(x, S, Z)
p.show()
As these examples illustrate, matplotlib doesn’t come with helper functions for all the kinds of curves people want to
plot, but along with numerix and python, provides the basic tools to enable you to build them yourself.
Section author: AndrewStraw, RobManagan
Attachments
• sigmoids.png
• sigmoids2.png
Example of how to thicken the lines around your plot (axes lines) and to get big bold fonts on the tick and axis labels.
# Save figure
savefig('thick_axes.png')
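A minimal sketch of such a script (sizes and weights are illustrative), ending with the same savefig call as above:
from pylab import rc, plot, gca, xlabel, ylabel, savefig

# thicken all axes lines
rc('axes', linewidth=2)

# a dummy plot
plot([0, 1], [0, 1])

# big bold tick labels
ax = gca()
for tick in ax.xaxis.get_major_ticks() + ax.yaxis.get_major_ticks():
    tick.label1.set_fontsize(14)
    tick.label1.set_fontweight('bold')

# big bold axis labels
xlabel('X Axis', fontsize=16, fontweight='bold')
ylabel('Y Axis', fontsize=16, fontweight='bold')

savefig('thick_axes.png')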
Attachments
• thick_axes.png
Matplotlib: transformations
Whenever you pass coordinates to matplotlib, the question arises, what kind of coordinates you mean. Consider the
following example
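In sketch form, the example amounts to a call like this (names and values are illustrative):
from pylab import figure, show

fig = figure()
ax = fig.add_subplot(111)
x, y = 0.5, 0.5
ax.text(x, y, 'my label')
show()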
A label ‘my label’ is added to the axes at the coordinates x,y, or stated more clearly: The text is placed at the theoretical
position of a data point (x,y). Thus we would speak of “data coords”. There are however other coordinates one can
think of. You might e.g. want to put a label in the exact middle of your graph. If you specified this by the method
above, then you would need to determine the minimum and maximum values of x and y to determine the middle.
However, using transforms, you can simply use
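(a one-line sketch; ax is the Axes instance from the example above)
ax.text(0.5, 0.5, 'my label', transform=ax.transAxes)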
There are four built-in transforms that you should be aware of (let ax be an Axes instance and fig a Figure instance): ax.transData (data coordinates), ax.transAxes (axes coordinates, where (0, 0) is the bottom left and (1, 1) the top right of the axes), fig.transFigure (the analogous figure coordinates), and the identity/display transform (raw pixel coordinates).
These transformations can be used for any kind of Artist, not just for text objects.
The default transformation for ax.text is ax.transData and the default transformation for fig.text is fig.transFigure.
Of course, you can define more general transformations, e.g. matplotlib.transforms.Affine, but the four listed above
arise in a lot of applications.
xy_tup() is no more. Please see the official Matplotlib documentation at http://matplotlib.sourceforge.net/users/
transforms_tutorial.html for further reference.
If you find that the built-in tick labels of Matplotlib are not enough for you, you can use transformations to implement
something similar. Here is an example that draws annotations below the tick labels, and uses a transformation to
guarantee that the x coordinates of the annotation correspond to the x coordinates of the plot, but the y coordinates are
at a fixed position, independent of the scale of the plot:
import matplotlib as M
import Numeric as N
import pylab as P
blend = M.transforms.blend_xy_sep_transform

# (the doplot(fig, subplot_num, func) helper, which plots func and places the
# annotations below the tick labels using the blended transform, is omitted
# in this excerpt)
fig = P.figure()
doplot(fig, 121, N.sin)
doplot(fig, 122, lambda x: 10*N.sin(x))
P.show()
Sometimes you want to specify that a label is shown a fixed pixel offset from the corresponding data point, regardless
of zooming. Here is one way to do it; try running this in an interactive backend, and zooming and panning the figure.
The way this works is by first taking a shallow copy of ‘transData‘ and then adding an offset to it. All transformations
can have an offset which can be modified with ‘set_offset‘, and the copying is necessary to avoid modifying the
transform of the data itself. New enough versions of matplotlib (currently only the svn version) have an ‘offset_copy‘
function which does this automatically.
import matplotlib
import matplotlib.transforms
from pylab import figure, show

# offset() returns a copy of ax.transData shifted by (x, y) screen dots;
# with a new enough matplotlib this is simply offset_copy (a sketch):
from matplotlib.transforms import offset_copy
def offset(ax, x, y):
    return offset_copy(ax.transData, x=x, y=y, units='dots')

fig=figure()
ax=fig.add_subplot(111)
# some illustrative data points
x = (3, 1, 4, 1, 5, 9, 2, 6)
y = (2, 7, 1, 8, 2, 8, 1, 8)
ax.plot(x, y, '.')
# add labels
trans=offset(ax, 10, 5)
for a,b in zip(x,y):
    ax.text(a, b, '(%d,%d)'%(a,b), transform=trans)
show()
Matplotlib: treemap
Treemaps are a nice way of showing tree information not based on a connected node approach.
See http://www.cs.umd.edu/hcil/treemap/
[](files/attachments/Matplotlib_TreeMap/TreeMap.png
"""
Treemap builder using pylab.
import pylab
from matplotlib.patches import Rectangle
class Treemap:
def __init__(self, tree, iter_method, size_method, color_method):
"""create a tree map from tree, using itermethod(node) to walk tree,
self.ax = pylab.subplot(111,aspect='equal')
pylab.subplots_adjust(left=0, right=1, top=1, bottom=0)
self.ax.set_xticks([])
self.ax.set_yticks([])
self.size_method = size_method
self.iter_method = iter_method
self.color_method = color_method
self.addnode(tree)
except TypeError:
pass
if __name__ == '__main__':
    # example using nested lists, iter to walk and random colors
    size_cache = {}

    def size(thing):
        if isinstance(thing, int):
            return thing
        if thing in size_cache:
            return size_cache[thing]
        else:
            size_cache[thing] = reduce(int.__add__, [size(x) for x in thing])
            return size_cache[thing]

    import random

    def random_color(thing):
        return (random.random(), random.random(), random.random())
Attachments
• TreeMap.png
Here’s some template code for plotting histograms that don’t look like bar charts, but instead have only outlines (like
IDL creates).
First define a function that does the bulk of the heavy lifting.
import numpy as np
bins[0] = bins[1]
bins[-1] = bins[-2]
data[0] = 0
data[-1] = 0
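The fragment above is only the tail end of that helper; a sketch of the whole histOutline function, assuming np.histogram-style bins, is:
import numpy as np

def histOutline(dataIn, *args, **kwargs):
    """Compute a histogram and return (bins, data) arrays that, when passed
    to plot(), draw only the outline of the histogram."""
    histIn, binsIn = np.histogram(dataIn, *args, **kwargs)
    stepSize = binsIn[1] - binsIn[0]

    bins = np.zeros(len(binsIn)*2 + 2, dtype=float)
    data = np.zeros(len(binsIn)*2 + 2, dtype=float)
    for bb in range(len(binsIn)):
        bins[2*bb + 1] = binsIn[bb]
        bins[2*bb + 2] = binsIn[bb] + stepSize
        if bb < len(histIn):
            data[2*bb + 1] = histIn[bb]
            data[2*bb + 2] = histIn[bb]

    # the clean-up step shown above: bring the outline down to zero at both ends
    bins[0] = bins[1]
    bins[-1] = bins[-2]
    data[0] = 0
    data[-1] = 0
    return (bins, data)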
##########
#
# First make a normal histogram
#
##########
subplot(1, 2, 1)
(n, bins, patches) = hist(data)
# Boundaries
xlo = -max(abs(bins))
xhi = max(abs(bins))
ylo = 0
yhi = max(n) * 1.1
##########
#
# Now make a histogram in outline format
#
##########
(bins, n) = histOutline(data)
subplot(1, 2, 2)
plot(bins, n, 'k-')
axis([xlo, xhi, ylo, yhi])
Attachments
• histNofill.py
• histOutline.py
• hist_outline.png
Typesetting
This page describes several ways to produce publication quality graphics with LaTeX.
LaTeX UsingTex
\documentclass[prl,10pt,twocolumn]{revtex4}
\usepackage{graphicx} % Used to import the graphics
\begin{document}
%...
\begin{figure}[t]
\begin{center}
\showthe\columnwidth % Use this to determine the width of the figure.
\includegraphics[width=\columnwidth]{fig1.eps}
\caption{\label{fig:sin_cos} Plot of the sine and cosine functions.}
\end{center}
\end{figure}
%...
\end{document}
The first step is to determine the size of the figure: this way, when the graphic is included, it will not be resized, and
the fonts etc. will be exactly as you set them rather than scaled (and possibly distorted). This can be done in LaTeX by
explicitly setting the width of the figure and using the ‘\showthe‘ command to print this width. (In the example above,
the figure width is set to the ‘\columnwidth‘.)
When the file is processed by LaTeX, look at the output. The example above produces the following output (Note:
LaTeX will pause after the ‘\showthe‘ command, press enter to continue):
This is TeX, Version 3.14159 (Web2C 7.4.5)
LaTeX2e <2001/06/01>
...
> 246.0pt.
l.8 \showthe\columnwidth
% Use this to determine the width of the figure.
?
<fig1.eps> [1] (./tst.aux) )
...
Thus, the figure will be 246.0pt wide. There are 72.27pt per inch (in [La]TeX), so this means that the figure width
should be 3.40390 inches. The height depends on the content of the figure, but the golden mean may be used to make
a pleasing figure. Once this is determined, the 'figure.figsize' property can be used to set the default figure size.
#!python numbers=disable
fig_width_pt = 246.0 # Get this from LaTeX using \showthe\columnwidth
inches_per_pt = 1.0/72.27 # Convert pt to inches
golden_mean = (sqrt(5)-1.0)/2.0 # Aesthetic ratio
fig_width = fig_width_pt*inches_per_pt # width in inches
fig_height =fig_width*golden_mean # height in inches
fig_size = [fig_width,fig_height]
Since the figure will not be scaled down, we may explicitly set the font sizes.
#!python numbers=disable
params = {'axes.labelsize' : 10,
          'font.size' : 10,
          'text.fontsize' : 10,
          'legend.fontsize': 10,
          'xtick.labelsize' : 8,
          'ytick.labelsize' : 8}
pylab.rcParams.update(params)
Fine Tuning
With these smaller plot sizes, the default margins are not enough to display the axis labels, so we need to specify large
margins. We do this with an explicit call to the ‘axes()‘ function. In this example, we only have one axis. The typeset
LaTeX document will have whitespace on either side of the figure, so we do not need to include this in the figure.
Thus, we keep just a bit of whitespace at the top and to the right so that the labels do not extend beyond the bounding
box, and add more space to the bottom for the x label:
#!python numbers=disable
pylab.axes([0.125,0.2,0.95-0.125,0.95-0.2])
#!python
import pylab
from pylab import arange,pi,sin,cos,sqrt
fig_width_pt = 246.0 # Get this from LaTeX using \showthe\columnwidth
inches_per_pt = 1.0/72.27 # Convert pt to inch
golden_mean = (sqrt(5)-1.0)/2.0 # Aesthetic ratio
fig_width = fig_width_pt*inches_per_pt # width in inches
fig_height = fig_width*golden_mean # height in inches
fig_size = [fig_width,fig_height]
params = {'backend': 'ps',
'axes.labelsize': 10,
'text.fontsize': 10,
'legend.fontsize': 10,
'xtick.labelsize': 8,
'ytick.labelsize': 8,
'text.usetex': True,
'figure.figsize': fig_size}
pylab.rcParams.update(params)
# Generate data
x = pylab.arange(-2*pi,2*pi,0.01)
y1 = sin(x)
y2 = cos(x)
# Plot data
pylab.figure(1)
pylab.clf()
pylab.axes([0.125,0.2,0.95-0.125,0.95-0.2])
pylab.plot(x,y1,'g:',label='$\sin(x)$')
pylab.plot(x,y2,'-b',label='$\cos(x)$')
pylab.xlabel('$x$ (radians)')
pylab.ylabel('$y$')
pylab.legend()
pylab.savefig('fig1.eps')
[](files/attachments/Matplotlib_LaTeX_Examples/naka-rushton.png
Note: This section is obsolete. Recent versions of matplotlib break the psfrag functionality (see, for example, this
discussion). That being said, one can use the usetex feature to render the LaTeX text directly with very good results (if
you are careful about choosing fonts). I will try to discuss this here further in the near future. – MichaelMcNeilForbes
To ensure that your graphics use exactly the same fonts as your document, you can have LaTeX generate and substitute
the text for your graph using the psfrag package. This is a good option if you have problems with the ‘text.usetex‘
method (for example, if the appropriate fonts cannot be found.)
To do this, simply use plain text for the labels and then replace them using the psfrag package. Here are the modified
files to make use of this method:
\documentclass[prl,10pt,twocolumn]{revtex4}
\usepackage{graphicx} % Used to import the graphics
\usepackage{psfrag}
\begin{document}
%...
\begin{figure}[t]
\begin{center}
\psfrag{sin(x)}{$\sin(x)$}
\psfrag{cos(x)}{$\cos(x)$}
\psfrag{x (radians)}{$x$ (radians)}
\psfrag{y}{$y$}
{\footnotesize % Replace tick-labels with smaller font.
\psfrag{1.0}{1.0}
\psfrag{0.5}{0.5}
\psfrag{0.0}{0.0}
\psfrag{-0.5}{-0.5}
\psfrag{-1.0}{-1.0}
\psfrag{-8}{-8}
\psfrag{-6}{-6}
\psfrag{-4}{-4}
\psfrag{-2}{-2}
\psfrag{0}{0}
\psfrag{2}{2}
\psfrag{4}{4}
\psfrag{6}{6}
\psfrag{8}{8}
\showthe\columnwidth % Use this to determine the width of the figure.
\includegraphics[width=\columnwidth]{fig2.eps}
} % Note that the psfrag commands only work in the top-most environment.
\caption{\label{fig:sin_cos} Plot of the sine and cosine functions.}
\end{center}
\end{figure}
%...
\end{document}
#!python
import pylab
# (the rest of the modified script is essentially fig1.py above, with plain-text
# labels and pylab.savefig('fig2.eps') at the end)
Another way to set the legend fonts after the legend has been drawn is to use, for example:
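for instance, something along these lines (a sketch, not necessarily the snippet originally posted):
import pylab

pylab.plot([1, 2, 3], label='example')
leg = pylab.legend()
# shrink the legend text after the legend has been drawn
pylab.setp(leg.get_texts(), fontsize='small')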
– DavidDonovan
If you have very complex images such as high-resolution contour plots or three-dimensional plots, saving these to
vector graphic formats like PDF or EPS can result in unacceptably large files (though with the ability for stunning
zooms). One solution is to selectively convert parts of the plot (not the text labels) to a rasterized image. This can be
done with the so-called “mixed-mode rendering” capability in recent versions of matplotlib. Here is an example as it
should probably work:
#!python
from pylab import meshgrid, sin, cos, linspace, contourf, savefig, clf
x, y = meshgrid(*(linspace(-1,1,500),)*2)
z = sin(20*x**2)*cos(30*y)
c = contourf(x,y,z,50)
savefig('full_vector.pdf')
clf()
c = contourf(x,y,z,50,rasterized=True)
savefig('rasterized.pdf')
however, contourf currently does not support the rasterized option (which is silently ignored). Some
other plot elements do, however. I am looking into a solution for this. – MichaelMcNeilForbes
Section author: MichaelMcNeilForbes, Unknown[100], GaelVaroquaux, Unknown[101], EmmanuelleGouillart, Un-
known[102], LaurentPerrinet, Unknown[8]
Attachments
• fig.png
• fig1.png
• naka-rushton.png
• psfrag_example.png
Matplotlib can use LaTeX to handle the text layout in your figures. This option (which is still somewhat experimental)
can be activated by setting text.usetex : true in your rc settings. Text handling with matplotlib’s LaTeX support
is slower than standard text handling, but is more flexible, and produces publication-quality plots. The results are
striking, especially when you take care to use the same fonts in your figures as in the main document.
Matplotlib’s LaTeX support is still under development, although at least two individuals have relied upon it to generate
the figures for their doctoral dissertations. Many improvements have been made beginning with matplotlib-0.87; please
update matplotlib if you have an earlier version. This option requires a working LaTeX installation, dvipng (which
may be included with your TeX installation), and ghostscript (AFPL, GPL, or ESP ghostscript should all work, but
GPL ghostscript-8.60 or later is recommended). The executables for these external dependencies must be located on
your PATH.
There are a couple of options to mention, which can be changed using rc settings, either using a matplotlibrc file, or
the rcParams dict in your program. Here is an example matplotlibrc file:
font.family : serif
font.serif : Times, Palatino, New Century Schoolbook, Bookman, Computer Modern Roman
text.usetex : true
The first valid font in each family is the one that will be loaded. If the fonts are not specified, the Computer Modern
fonts are used by default. All of the other fonts are Adobe fonts. Times and Palatino each have their own accompanying
math fonts, while the other Adobe serif fonts make use of the Computer Modern math fonts. See psnfss2e.pdf for more
details.
To use tex and select e.g. Helvetica as the default font, without editing matplotlibrc, use:
#!python
from matplotlib import rc
rc('font',**{'family':'sans-serif','sans-serif':['Helvetica']})
## for Palatino and other serif fonts use:
#rc('font',**{'family':'serif','serif':['Palatino']})
rc('text', usetex=True)
Here is the complete tex_demo.py script:
from matplotlib import rc
from pylab import figure, axes, arange, cos, pi, plot, xlabel, ylabel, title, grid, savefig, show

rc('text', usetex=True)
figure(1)
ax = axes([0.1, 0.1, 0.8, 0.7])
t = arange(0.0, 1.0+0.01, 0.01)
s = cos(2*2*pi*t)+2
plot(t, s)
xlabel(r'\textbf{time (s)}')
ylabel(r'\textit{voltage (mV)}',fontsize=16)
title(r"\TeX\ is Number $\displaystyle\sum_{n=1}^\infty\frac{-e^{i\pi}}{2^n}$!",
      fontsize=16, color='r')
grid(True)
savefig('tex_demo')
show()
Note that when TeX/LaTeX support is enabled, you can mix text and math modes. Display math mode ($$ e=mc^2 $$)
is not supported, but adding the command \displaystyle, as in tex_demo.py, will produce the same results.
In order to produce encapsulated postscript files that can be embedded in a new LaTeX document, the default behavior
of matplotlib is to distill the output, which removes some postscript operators used by LaTeX that are illegal in an eps
file. This step produces fonts which may be unacceptable to some users. One workaround is to set ps.distiller.res
to a higher value (perhaps 6000) in your rc settings. A better workaround, which requires xpdf
(http://www.foolabs.com/xpdf/download.html) or poppler (http://poppler.freedesktop.org/, the new backend to xpdf),
can be activated by changing the rc ps.usedistiller setting to xpdf. The xpdf alternative produces postscript with text
that can be edited in Adobe Illustrator, or searched for once converted to pdf.
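In rcParams form that setting is simply (a minimal sketch):
from matplotlib import rc
rc('ps', usedistiller='xpdf')   # requires the xpdf/poppler utilities on your PATH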
• On Windows, the PATH environment variable may need to be modified to find the latex, dvipng and ghostscript
executables. This is done by going to the control panel, selecting the “system” icon, selecting the “advanced”
tab, and clicking the “environment variables” button (and people think Linux is complicated. Sheesh.) Select
the PATH variable, and add the appropriate directories.
• Using MiKTeX with Computer Modern fonts, if you get odd -Agg and PNG results, go to MiKTeX/Options and
update your format files
• The fonts look terrible on screen. You are probably running Mac OS, and there is some funny business with
dvipng on the mac. Set text.dvipnghack : True in your matplotlibrc file.
• On Ubuntu and Gentoo, the base texlive install does not ship with the type1cm package. You may need to install
some of the extra packages to get all the goodies that come bundled with other latex distributions.
• Some progress has been made so Matplotlib uses the dvi files directly for text layout. This allows latex to be
used for text layout with the pdf and svg backends, as well as the *Agg and PS backends. In the future, a latex
installation may be the only external dependency.
• Try rm -r ~/.matplotlib/*cache
• Make sure LaTeX, dvipng and ghostscript are each working and on your PATH.
• Run your script with verbose mode enabled: python example.py --verbose-helpful (or --verbose-debug-annoying)
and inspect the output. Most likely there is a problem reported there. If you need help, post a short example that
reproduces the behavior, explicitly noting any changes to your rc settings, along with what version of matplotlib
you have installed, your OS, and the --verbose-* output.
Section author: AndrewStraw, Unknown[117], Unknown[118], Unknown[119]
Attachments
• tex_demo.png
Mayavi
Mayavi: mlab
Note: This page is about the tvtk.tools.mlab module. You are strongly advised to use the more recent
mayavi.mlab module, which can also be used from ipython -wthread or as a library inside another application.
The ‘‘mlab.py‘‘ module allows for simple 3D plotting of objects. It provides an object-oriented approach to 3D
visualization.
It relies on the simple TVTK module, rather than on the full-blown Mayavi application. As a result it has fewer
dependencies. However, it is harder to extend and more limited. The developers are not planning any feature additions
to this module, and we strongly advise you to use the Mayavi mlab module, which provides the same use cases, but
with more functionality and is under constant development. For more information, you can read the relevant section
of the Mayavi user guide.
Important: All these examples must be run in "ipython -wthread" or in a Wx application (like pycrust,
or the Mayavi2 application). They will not work if you don't use this option.
Start with ‘ipython -wthread‘ and paste the following code:
import scipy

# prepare some interesting function:
def f(x, y):
    return 3.0*scipy.sin(x*y+1e-4)/(x*y+1e-4)

x = scipy.arange(-7., 7.05, 0.1)
y = scipy.arange(-5., 5.05, 0.1)

# 3D visualization of f:
from enthought.tvtk.tools import mlab
fig = mlab.figure()
s = mlab.SurfRegular(x, y, f)
fig.add(s)
[](files/attachments/MayaVi_mlab/simple_example.png
from scipy import mgrid, sqrt, sin
from enthought.tvtk.tools import mlab

f = mlab.figure()
[x,y]=mgrid[-5:5:0.1,-5:5:0.1]
r=sqrt(x**2+y**2)
z=sin(3*r)/(r)
s=mlab.Surf(x,y,z,z)
f.add(s)
s.scalar_bar.title='sinc(r)'
s.show_scalar_bar=True
# LUT means "Look-Up Table", it gives the mapping between scalar value and color
s.lut_type='blue-red'
# The current figure has two objects, the outline object originally present,
# and the surf object that we added.
f.objects[0].axis.z_label='value'
t=mlab.Title()
t.text='Sampling function'
f.add(t)
# Edit the title properties with the GUI:
t.edit_traits()
[](files/attachments/MayaVi_mlab/tvtk.mlab_example.png
The implementation provided here is object oriented and each visualization capability is implemented as a class that
has traits. So each of these may be configured. Each visualization class derives (ultimately) from MLabBase which is
responsible for adding/removing its actors into the render window. The classes all require that the RenderWindow be
a ‘pyface.tvtk.scene.Scene‘ instance (this constraint can be relaxed if necessary later on).
This module offers the following broad class of functionality:
* ‘‘Figure‘‘
This basically manages all of the objects rendered. Just like
figure in any Matlab like environment. A convenience function
called `figure` may be used to create a nice Figure instance.
* ‘‘Glyphs‘‘
This and its subclasses let one place glyphs at points specified
as inputs. The subclasses are: `Arrows`, `Cones`, `Cubes`,
`Cylinders`, `Spheres`, and `Points`.
* ‘‘Line3‘‘
Draws lines between the points specified at initialization time.
* ‘‘Outline‘‘
Draws an outline for the contained objects.
* ‘‘Title‘‘
Draws a title for the entire figure.
* ‘‘LUTBase‘‘
Manages a lookup table and a scalar bar (legend) for it. This
is subclassed by all classes that need a LUT.
* ‘‘SurfRegular‘‘
MayaVi1's imv.surf like functionality that plots surfaces given x (1D), y(1D)
and z (or a callable) arrays.
* ‘‘SurfRegularC‘‘
Also plots contour lines.
* ‘‘TriMesh‘‘
Given triangle connectivity and points, plots a mesh of them.
* ‘‘FancyTriMesh‘‘
    Plots the mesh using tubes and spheres so it's fancier.
* ‘‘Mesh‘‘
    Given x, y generated from scipy.mgrid, and a z to go with it, along with
    optional scalars.  This class builds the triangle connectivity (assuming
    that x, y are from scipy.mgrid) and builds a mesh and shows it.
* ‘‘FancyMesh‘‘
Like mesh but shows the mesh using tubes and spheres.
* ‘‘Surf‘‘
This generates a surface mesh just like Mesh but renders the mesh as
a surface.
* ‘‘Contour3‘‘
Shows contour for a mesh.
* ‘‘ImShow‘‘
Allows one to view large numeric arrays as image data using an image
actor. This is just like MayaVi1's `mayavi.tools.imv.viewi`.
To see nice examples of all of these look at the ‘test_*‘ functions at the end of this file. Here is a quick example that
uses these test functions:
Attachments
• simple_example.png
• simpleexample.png
• tvtk.mlab_example.png
Mayavi
Mayavi2 is an interactive program allowing elaborate 3D plots of scientific data in Python with scipy. It is the successor
of MayaVi for 3D visualization.
Note: The information on this page is a bit old and outdated. You are invited to refer to the Mayavi2 user guide
as a reference. For a quick introduction to scripting, see mlab. In recent versions of Mayavi2, the user guide can be
accessed in the help menu, but the user guide for the latest version of Mayavi can also be found on line.
If you need more help, you are invited to ask questions on the Enthought-dev mailing list.
import numpy

def f(x, y):
    return numpy.sin(x*y)/(x*y)

x = numpy.arange(-7., 7.05, 0.1)
y = numpy.arange(-5., 5.05, 0.05)

from enthought.tvtk.tools import mlab
s = mlab.SurfRegular(x, y, f)
from enthought.mayavi.sources.vtk_data_source import VTKDataSource
d = VTKDataSource()
d.data = s.data
mayavi.add_source(d)
from enthought.mayavi.filters.warp_scalar import WarpScalar
w = WarpScalar()
mayavi.add_filter(w)
from enthought.mayavi.modules.outline import Outline
from enthought.mayavi.modules.surface import Surface
o = Outline()
s = Surface()
mayavi.add_module(o)
mayavi.add_module(s)
You can run this script by running "mayavi2 -n -x script.py", loading it through the menu (File -> Open File) and
pressing Ctrl+R, or entering "execfile('script.py')" in the python shell.
Section author: GaelVaroquaux, Unknown[41], Unknown[133]
Attachments
• surf.png
Mayavi tips
First, enable the python shell by clicking on Python in the "View" menu. A python shell should appear at the bottom
of the MayaVi2 window:
Then drag any object from the tree view on the left and drop it on the python shell and you’ll get the object. Say you
want to get the Outline module:
Now, you can use your object following two ways: typing directly in the python shell or using the explore() method.
You can create an instance of your object in the python shell window.
Note that you benefit from "word completion" in this python shell, i.e. a little window pops up letting you choose the
name of the expected object or method.
Thus, you can display the RGB values of the outline color for instance:
However, finding out objects or methods can be not so easy: you may not know how they depend on each other. An
easier way is to use the explore() method.
Simply type
explore(_)
or
explore(foobar)
On the embedded Python shell in the !MayaVi2 application, the name ‘mayavi’ is bound to the !MayaVi scripting
interface which can be used to script mayavi easily as shown here:
mayavi.save_visualization('foo.mv2')
mayavi.load_visualization('foo.mv2')
Note that the Mayavi engine lets you script mayavi in powerful ways. For example:
e = mayavi.engine
Here ‘e’ is the running Engine instance (mayavi/engine.py) and has the same hierarchy that you see on the mayavi tree
view. For example:
Another example, just run examples/contour.py and type/cut/paste the following to get the scalar cut plane module:
e = mayavi.engine
s = e.scenes[0].children[0].children[0].children[-1]
# Think scene.datafile.module_manager.last_module
It is possible to use Mayavi2 from within IPython and script it. You must first start IPython with the ‘-wthread’
command line option. Here is an example:
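A minimal sketch (using the Mayavi application class from enthought.mayavi.app, as in the scripts elsewhere on this page):
from enthought.mayavi.app import Mayavi

m = Mayavi()
m.main()               # brings up the Mayavi2 UI; run this under ipython -wthread
m.script.new_scene()   # m.script is the scripting interface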
In the above, ‘m.script’ is the same as the ‘mayavi’ instance on the embedded shell.
If you are writing a stand-alone script to visualize something like the scripts in the examples/ directory, please use any
of the examples as a template.
Saving snapshots within a python script is very easy:
s = script.engine.current_scene
s.scene.save('test.jpg', size=(width, height))   # e.g. width, height = 800, 600
You can also save images in a lot of other formats: PostScript (ps), Encapsulated PostScript (eps), PDF (pdf), Bitmap
(bmp), TIFF (tiff), PNG (png), OpenInventor (iv), Virtual Reality Markup Language (wrl, vrml), Geomview (oogl),
RenderMan RIB (rib), Wavefront (obj).
The obvious corollary of saving snapshots is saving a lot of snapshots in order to make a movie for example, without
!MayaVi2 window popup for each snapshot recorded.
The answer is straightforward (only under UN*X boxes): use the ‘X virtual framebuffer’.
The following lines give you the trick. You can improve it, of course, scripting it in shell, python, and so on.
* create your X virtual framebuffer with the following command: 'Xvfb :1 -screen 0 1280x1024x24'.
  It will use the display #1, with a size of 1280x1024 and 24 bit depth color;
* export your display: 'export DISPLAY=:1' (sh/bash syntax) or 'setenv DISPLAY :1' (csh/tcsh syntax);
* run your MayaVi2 script as usual;
* once finished, and all your snapshots have been created, don't forget to kill the X virtual
  framebuffer and reset your display to its previous value. If not, you won't be able to see your movie ;-)
Drag the module manager to the python shell and you will be able to enable alpha transparency in the colormap via:
dragged.scalar_lut_manager.lut.alpha_range=(0,1)
Also, if you want to set the foreground color, it will be applied to all modules and filters, i.e. outline color, text color,
axes labels, and so on.
Your preferences will be saved in a !MayaVi2 configuration file, so you’ll get these colors each time you run a
!MayaVi2 session.
Writing VTK data files using TVTK
Coming soon...
Section author: Unknown[6], GaelVaroquaux, FredericPetit, Pauli Virtanen
Attachments
• ps1.png
• ps2.png
• ps3.png
• ps4.png
• ps5.png
• ps6.png
• ps7.png
• ps8.png
• setcolors.png
Mayavi tvtk
Note: This page is not the main source of documentation. You are invited to refer to the Mayavi2 home page
for up-to-date documentation on TVTK. In particular, bear in mind that if you are looking for a high-level Python 3D
plotting library, Mayavi also provides the right API, and can be embedded (see the user guide).
tvtk is a traits-enabled (http://www.enthought.com/traits) version of VTK (http://www.vtk.org) for 3D graphics and
visualization.
It provides an exact match to the VTK objects, but with a Pythonic feel, unlike Mayavi
(http://code.enthought.com/projects/mayavi), which aims to provide new APIs.
The most important features are:
• All VTK classes are wrapped.
• Support for traits.
• Elementary pickle support.
• Pythonic feel.
• Handles numpy/Numeric/numarray arrays/Python lists transparently.
• Support for a pipeline browser, ivtk
• High-level mlab module.
• Envisage plugins for a tvtk scene and the pipeline browser.
• MayaVi2 is built on top of tvtk
See the enthought TVTK page (http://www.enthought.com/enthought/wiki/TVTK) for more details, in particular the
tvtk introduction (http://www.enthought.com/enthought/wiki/TVTKIntroduction).
cone
The following example displays a cone which can be rotated/scaled/... with the mouse.
[Simple_tvtk_cone.py](files/attachments/MayaVi_tvtk/Simple_tvtk_cone.py)
quadrics
A more interesting example is the generation of some contour surfaces of an implicit function (see vis_quad.py in the attachments below).
This displays the following scene on screen and also saves it to a file.
The module tools.ivtk makes VTK/TVTK easier to use from the Python interpreter. For example, start IPython
with:
ipython -wthread
(if you have both wxPython 2.4 and wxPython 2.6 installed you will need a recent IPython and do ipython
-wthread -wxversion 2.6).
Then you can paste the following lines:
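For example, the following sketch builds a trivial cone pipeline and shows it in an ivtk viewer (the cone itself is only an illustration):
from enthought.tvtk.tools import ivtk
from enthought.tvtk.api import tvtk

# a minimal VTK pipeline: source -> mapper -> actor
cs = tvtk.ConeSource(resolution=64)
mapper = tvtk.PolyDataMapper(input=cs.output)
actor = tvtk.Actor(mapper=mapper)

# create the interactive viewer and add the actor to its scene
v = ivtk.viewer()
v.scene.add_actors(actor)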
You can then explore the visualization pipeline and modify any settings.
For creating the viewer there are different options:
• v = ivtk.viewer() - this one does not need the v.open() and the size=(600,600)
• v = ivtk.IVTK()
• v = ivtk.IVTKWithCrust()
• v = ivtk.IVTKWithBrowser()
• v = ivtk.IVTKWithCrustAndBrowser()
For viewers with Crust you can use the python command line window to modify the pipeline.
Section author: ArndBaecker, Unknown[128], GaelVaroquaux, Unknown[6], Pauli Virtanen
Attachments
• ivtk_example.png
• simple_tvtk_cone.py
• vis_quad.png
• vis_quad.py
Mayavi: examples
Note: This page presents scripting Mayavi2 using the advanced, object-oriented API. Mayavi2 has recently
acquired an easy-to-use, though maybe not as powerful, scripting module: mlab. You are invited to refer to the
corresponding section of the Mayavi2 user guide.
Here, you will be presented some examples of rendering scenes you can get with !MayaVi2. You are advised to
read [:Cookbook/MayaVi/ScriptingMayavi2] to understand what you see, although most examples given here are
self-explanatory.
Important: Please note that these examples are not up to date. The example gallery for the latest version of Mayavi
can be found at http://enthought.github.com/mayavi/mayavi/auto/examples.html.
#!/usr/bin/env mayavi2
"""This script demonstrates how one can script MayaVi and use its
contour related modules. Notice the magic line at the top.
"""
# Author: Prabhu Ramachandran <[email protected]>
# Copyright (c) 2005-2007, Enthought, Inc.
# License: BSD Style.
def contour():
    """The script itself.  We needn't have defined a function but
    having a function makes this more reusable.
    """
    # 'mayavi' is always defined on the interpreter.
    # Create a new scene.
    mayavi.new_scene()
    # (loading of the data source and of the Outline/ContourGridPlane modules
    # is omitted in this excerpt; cgp is a ContourGridPlane added above.)
    cgp.grid_plane.position = 15
    # An isosurface module.
    iso = IsoSurface(compute_normals=True)
    mayavi.add_module(iso)
    iso.contour.contours = [220.0]

if __name__ == '__main__':
    contour()
[](files/attachments/MayaVi_examples/contour.png
#!/usr/bin/env mayavi2
def glyph():
    """The script itself.  We needn't have defined a function but
    having a function makes this more reusable.
    """
    # 'mayavi' is always defined on the interpreter.
    # Create a new VTK scene.
    mayavi.new_scene()
    # (imports of VTKXMLFileReader, VectorCutPlane, MaskPoints and Glyph are
    # omitted in this excerpt.)
    r = VTKXMLFileReader()
    r.initialize(join(dirname(enthought.mayavi.__file__),
                      'examples', 'data', 'fire_ug.vtu'))
    mayavi.add_source(r)
    v = VectorCutPlane()
    mayavi.add_module(v)
    v.glyph.color_mode = 'color_by_scalar'
    # Now mask the points and show glyphs (we could also use
    # Vectors but glyphs are a bit more generic)
    m = MaskPoints()
    m.filter.set(on_ratio=10, random_mode=True)
    mayavi.add_filter(m)
    g = Glyph()
    mayavi.add_module(g)
    # Note that this adds the module to the filtered output.
    g.glyph.scale_mode = 'scale_by_vector'
    # Use arrows to view the scalars.
    g.glyph.glyph_source = g.glyph.glyph_list[1]

if __name__ == '__main__':
    glyph()
[](files/attachments/MayaVi_examples/glyph.png
#!/usr/bin/env python
"""This script demonstrates how one can use the MayaVi framework
without displaying MayaVi's UI. Note: look at the end of this file to
see how the non gui plugin is chosen instead of the default gui
mayavi plugin.
"""
# Author: Prabhu Ramachandran <[email protected]>
# Copyright (c) 2005, Enthought, Inc.
# License: BSD Style.
class MyApp(Mayavi):
    def run(self):
        """This is executed once the application GUI has started.
        *Make sure all other MayaVi specific imports are made here!*
        """
        script = self.script
        # (loading of the data source, the Outline module and the
        # ImagePlaneWidget ipw is omitted in this excerpt.)
        ipw.ipw.slice_position = 16
        # An isosurface module.
        iso = IsoSurface(compute_normals=True)
        script.add_module(iso)
        iso.contour.contours = [200.0]

if __name__ == '__main__':
    m = MyApp()
    # Note how we change the plugins that are loaded only here.
    m.main(plugin_defs=NONGUI_PLUGIN_DEFINITIONS)
[](files/attachments/MayaVi_examples/nongui.png
#!/usr/bin/env mayavi2
"""
# Author: Prabhu Ramachandran <[email protected]>
# Copyright (c) 2005-2007, Enthought, Inc.
# License: BSD Style.
scalars = (scipy.sin(x*y*z)/(x*y*z))
return scipy.transpose(scalars).copy() # This makes the data contiguous.
def view_numpy():
"""Example showing how to view a 3D numpy array in mayavi2.
"""
# 'mayavi' is always defined on the interpreter.
mayavi.new_scene()
# Make the data and add it to the pipeline.
data = make_data()
src = ArraySource(transpose_input_array=False)
src.scalar_data = data
mayavi.add_source(src)
# Visualize the data.
o = Outline()
mayavi.add_module(o)
ipw = ImagePlaneWidget()
mayavi.add_module(ipw)
ipw.module_manager.scalar_lut_manager.show_scalar_bar = True
ipw_y = ImagePlaneWidget()
mayavi.add_module(ipw_y)
ipw_y.ipw.plane_orientation = 'y_axes'
if __name__ == '__main__':
view_numpy()
[](files/attachments/MayaVi_examples/numeric_source.png
#!/usr/bin/env mayavi2
"""This script demonstrates how one can script MayaVi to display
streamlines and an iso surface.
"""
# Author: Prabhu Ramachandran <[email protected]>
# Copyright (c) 2005-2007, Enthought, Inc.
# License: BSD Style.
def setup_data(fname):
    """Given a VTK XML file name `fname`, this creates a mayavi2
    reader for it and adds it to the pipeline.  It returns the reader
    created.
    """
    mayavi.new_scene()
    r = VTKXMLFileReader()
    r.initialize(fname)
    mayavi.add_source(r)
    return r

def streamline():
    """Sets up the mayavi pipeline for the visualization.
    """
    # Create an outline for the data.
    o = Outline()
    mayavi.add_module(o)
    s = Streamline(streamline_type='tube')
    mayavi.add_module(s)
    s.stream_tracer.integration_direction = 'both'
    s.seed.widget.center = 3.5, 0.625, 1.25
    s.module_manager.scalar_lut_manager.show_scalar_bar = True
    i = IsoSurface()
    mayavi.add_module(i)
    i.contour.contours[0] = 550
    i.actor.property.opacity = 0.5

if __name__ == '__main__':
    import enthought.mayavi
    fname = join(dirname(enthought.mayavi.__file__),
                 'examples', 'data', 'fire_ug.vtu')
    r = setup_data(fname)
    streamline()
[](files/attachments/MayaVi_examples/streamline.png
#!/usr/bin/env python
"""This script demonstrates how one can script MayaVi, set its size,
create a new VTK scene and create a few simple modules.
"""
# Author: Prabhu Ramachandran <[email protected]>
# Copyright (c) 2005, Enthought, Inc.
# License: BSD Style.
class MyApp(Mayavi):
    def run(self):
        """This is executed once the application GUI has started.
        *Make sure all other MayaVi specific imports are made here!*
        """
        # Various imports to do different things.
        from enthought.mayavi.sources.vtk_file_reader import VTKFileReader
        from enthought.mayavi.filters.threshold import Threshold
        from enthought.mayavi.modules.outline import Outline
        from enthought.mayavi.modules.axes import Axes
        from enthought.mayavi.modules.grid_plane import GridPlane
        from enthought.mayavi.modules.image_plane_widget import ImagePlaneWidget
        from enthought.mayavi.modules.text import Text

        script = self.script
        # Create a new scene.
        script.new_scene()

        # Create an orientation axes for the scene.  This only works with
        # VTK-4.5 and above which is why we have the try block.
        try:
            from enthought.mayavi.modules.orientation_axes import OrientationAxes
        except ImportError:
            pass
        else:
            a = OrientationAxes()
            a.marker.set_viewport(0.0, 0.8, 0.2, 1.0)
            script.add_module(a)

        # (creation of the VTKFileReader source and of the GridPlane gp is
        # omitted in this excerpt.)
        gp.grid_plane.axis = 'z'

if __name__ == '__main__':
    a = MyApp()
    a.main()
[](files/attachments/MayaVi_examples/test.png
See also [:Cookbook/MayaVi/Surf] for another way of doing this.
#!/usr/bin/env mayavi2
"""Shows how to view data created by `enthought.tvtk.tools.mlab` with
mayavi2.
"""
import numpy

def make_data():
    """Make some test numpy data and create a TVTK data object from it
    that we will visualize.
    """
    x = numpy.arange(-7., 7.05, 0.1)
    y = numpy.arange(-5., 5.05, 0.05)
    # (the function f(x, y) being plotted is omitted in this excerpt)
    s = mlab.SurfRegular(x, y, f)
    return s.data

def add_data(tvtk_data):
    """Add a TVTK data object `tvtk_data` to the mayavi pipeline.
    """
    d = VTKDataSource()
    d.data = tvtk_data
    mayavi.add_source(d)

def surf_regular():
    """Visualize the data (the body, which adds a WarpScalar filter and
    Outline/Surface modules, is omitted in this excerpt)."""

if __name__ == '__main__':
    mayavi.new_scene()
    d = make_data()
    add_data(d)
    surf_regular()
Attachments
• contour.png
• glyph.png
• nongui.png
• numeric_source.png
• streamline.png
• surf_regular_mlab.png
• test.png
mayavi2
* Now that we have data, we can visualize it. First let's create an outline.
  Click on "VTK File (heart.vtk)" in the upper left window.
  Go into the menu item Visualize/Modules/Outline. This creates an Outline object.
  We can edit the properties of the outline in the object editor,
  and we can rotate and manipulate the object in the scene editor.
* Now let's add three grid planes.
  Select Visualize/Modules/GridPlane; this creates a grid plane in the X axis.
  We can create two more grid planes. By selecting Axis in the object editor,
  we can set those grid planes in the Y and Z axis.
* We can project the object onto the grid planes using contour grid planes.
  Select Visualize/Modules/Contour grid plane. Do this for the Y and Z planes as well.
* Now we can look at our data by selecting Visualize/Modules/Isosurface.
* We can now finish by adding a scalar cut plane: select Visualize/Modules/Scalar Cut Plane.
  By clicking on the white arrow we can move the plane around.
The result should look like this (for issues concerning background/foreground colors, see [:Cookbook/MayaVi/Tips:
Cookbook/MayaVi/Tips]):
[](files/attachments/MayaVi_RunningMayavi2/mv2.png
In order to know what you can set as parameters on the command line, type
mayavi2 -h
This will list all the available options. Amongst all these options are:
* -d file.vtk: set the VTK file to load into MayaVi2;
* -m ‘‘module‘‘: set the module to use. The following list displays all available modules,
  in alphabetical order with a short description (taken from Python doc files):
    * ‘‘Axes‘‘: draws a simple axes using tvtk.CubeAxesActor;
    * ‘‘ContourGridPlane‘‘: a contour grid plane module. This module lets one take a slice of input
      data. The module only works for structured points, rectilinear grid and structured grid input;
    * ‘‘IsoSurface‘‘: an IsoSurface module that allows the user to make contours of input point data;
    * ‘‘OrientationAxes‘‘: creates a small axes on the side that indicates the position of the
      co-ordinate axes. It uses the OrientationMarkerWidget which requires VTK-4.5 and above;
    * ‘‘Outline‘‘: an outline module that draws an outline for the given data;
    * ‘‘ScalarCutPlane‘‘: takes a cut plane of any input data set using an implicit plane and plots
      the data on it;
    * ‘‘SliceUnstructuredGrid‘‘: this module takes a slice of the unstructured grid data and shows the
      cells that intersect or touch the slice;
    * ‘‘Streamline‘‘: allows the user to draw streamlines for given vector data.
      This supports various types of seed objects (line, sphere, plane and point seeds).
      It also allows the user to draw ribbons or tubes and further supports different types of
      interactive modes of calculating the streamlines;
    * ‘‘Vectors‘‘: displays different types of glyphs oriented and colored as per vector data at the
      input points. This is merely a convenience module that is entirely based on the Glyph module;
    * ‘‘Volume‘‘: the Volume module visualizes scalar fields using volumetric visualization techniques.
      This supports ImageData and UnstructuredGrid data. It also supports the FixedPointRenderer for
      ImageData. However, the performance is slow so your best bet is probably with the
      ImageData based renderers;
    * ‘‘WarpVectorCutPlane‘‘: takes an arbitrary slice of the input data using an implicit cut plane
      and warps it according to the vector field data. The scalars are displayed on the warped
      surface as colors.
* -f ‘‘filter‘‘: set the filter to use (load it before the module if you want to see your data filtered).
  Available filters are:
    * ‘‘CellToPointData‘‘: transforms cell attribute data to point data by averaging the cell data
      from the cells at the point;
    * ‘‘ExtractVectorNorm‘‘: computes the (Euclidean) norm of the input vector data (with optional
      scaling between [0, 1]). This is useful when the input data has vector input but no scalar
      data for the magnitude of the vectors.
Quite long to type the first time, but once typed, no more effort to load the module or filter you want to use! ;-)
Thus, typing (in MayaVi2's examples directory, see [:Cookbook/MayaVi/Installation: Cookbook/MayaVi/Installation]):
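an illustrative command (not necessarily the original one) would be:
mayavi2 -d heart.vtk -m Outline -m GridPlane -m ContourGridPlane -m IsoSurface -m ScalarCutPlane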
• and moving the mouse with the right button pressed zooms in/out (note: you can also use the wheel of your mouse,
if you have one).
Note: You can get a “predefined” angle of view (normal to x axis, y or z) clicking on the “View” item or on each small
icons (first X: Ox axis points backwards you, second X: Ox axis points towards you, etc...)
On the left of the window, you can see which modules are loaded (“TVTK Scene” is the “root” of the scene, and “VTK
file” is the data source, i.e. the heart.vtk file):
• “Outline” module displays the box surrounding the scene;
• you have three grid planes (”!GridPlane” module), at coordinates x = 0, y = 0, and z = 0;
• two contour grid planes (”!ContourGridPlane” module): the first displays contour lines (vertically), the second,
the cutplane at z = const;
• ”!IsoSurface” module displays the heart as yellow;
• the last cutplane (vertically, at y = const) is displayed by the ”!ScalarCutPlane” module.
Note that no filter is used in these scenes.
Using each module/filter is very intuitive. Click on your module/filter in the tree and set some parameters with the
mouse, or enter some values at the keyboard, in the window at the bottom left of the GUI.
If you want to copy/paste/delete/cut a given module/filter, click on it with the right button. A small window pops up,
with an item list.
Note: You can get a larger window by double-clicking on the chosen module/filter in the tree.
To load other modules or add filters, click on the “Visualize” item at the top of the window.
At last, you can also load another VTK data file, load/save the scene in a ”!MayaVi2” file (with extension .mv2), or
convert it to the image format you want (PNG, JPEG...), clicking on the “File” item or on the appropriate icon (small
floppy disk).
You can also get your scene in full-screen clicking on the small icon representing four red arrows in a little square. To
disable full-screen mode, type ‘e’ or ‘q’.
This is the simplest way to use MayaVi2. Remember that you can also try "mayavi2 -h" to see what options
and arguments you can add to the MayaVi2 command line.
Section author: GaelVaroquaux, FredericPetit, Unknown[130]
Attachments
• mv2.png
• mv2_cmdline.png
Scripting Mayavi 2
Note: This page presents scripting Mayavi2 using the advanced, object-oriented API. Mayavi2 has recently
acquired an easy-to-use, though maybe not as powerful, scripting module: mlab. You are invited to refer to the
corresponding section of the Mayavi2 user guide. Reading this page will give you a deeper understanding of how
Mayavi2 works, and it complements the user guide.
To script !MayaVi2, you need (at least):
Scripting MayaVi2 is quite simple because MayaVi2 is written in python and based on TVTK, which eases the use
of all VTK objects.
In the following, you'll learn how to script and use MayaVi2 modules and filters.
Modules can be split in two parts:
* modules which do not interact with VTK data, and are seldom modified/handled (Outline, Axes,
  OrientationAxes and Text). These are called the "basic" modules. Although the color bar is not
  strictly speaking a module, it will be presented here. Setting a few parameters for rendering
  your scene will also be presented.
* modules which do interact with VTK data, and those you want to play with (i.e. all the
  remaining modules ;-)).
Before starting, let’s see the “main template” of a !MayaVi2 script written in python.
A !MayaVi2 script should contain at least the following few lines:
#! /usr/bin/env python

from enthought.mayavi.app import Mayavi

class MyClass(Mayavi):

    def run(self):
        script = self.script
        # `self.script` is the MayaVi Script interface (an instance of
        # enthought.mayavi.script.Script) that is created by the
        # base `Mayavi` class.  Here we save a local reference for
        # convenience.

        ## import any Mayavi modules and filters you want (they must be done here!)
        .../...

if __name__ == '__main__':
    mc = MyClass()
    mc.main()
Adding modules or filters is quite simple: you have to import it, and then add it to your !MayaVi2 class.
To add a module, type:
from enthought.mayavi.modules.foo_module import FooModule
.../...
mymodule = FooModule()
script.add_module(mymodule)
Notice the syntax used: for modules, for example, foo_module is the foo_module python file (without the extension
.py) in the subdirectory modules/ of the mayavi/ directory (lower case, underscore); this file contains the class
FooModule (no underscore, capitalized name).
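Adding a filter follows the same pattern (FooFilter is a placeholder name, just like FooModule above):
from enthought.mayavi.filters.foo_filter import FooFilter
.../...
myfilter = FooFilter()
script.add_filter(myfilter)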
But first of all, before rendering your scene with the modules and the filters you want to use, you have to load some
data, of course.
You have the choice between:
* create a 3D data array, for scalar data (for vector data, you have to use a 4D scalar data
  array, i.e. a 3D scalar data array for each component) and load it with the ArraySource method;
* load a data file with the FileReader methods.
For example, we will create a 50*50*50 3D (scalar) array of a product of cosine and sine functions.
To do this, we need to load the appropriate modules:
import scipy
from scipy import ogrid, sin, cos, sqrt, pi
Nx = 50
Ny = 50
Nz = 50
Lx = 1
Ly = 1
Lz = 1
x, y, z = ogrid[0:Lx:(Nx+1)*1j,0:Ly:(Ny+1)*1j,0:Lz:(Nz+1)*1j]
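The scalar field itself and its ArraySource can then be set up along the following lines (the formula is only an illustrative product of sines and cosines, and the ArraySource import path follows the naming convention used by the other sources on this page):
from enthought.mayavi.sources.array_source import ArraySource

scalars = cos(2*pi*x/Lx) * sin(2*pi*y/Ly) * cos(2*pi*z/Lz)   # illustrative field

src = ArraySource()
src.scalar_data = scalars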
To load a VTK data file, say heart.vtk file in mayavi/examples/data/ directory, simply type:
from enthought.mayavi.sources.vtk_file_reader import VTKFileReader
src = VTKFileReader()
src.initialize("heart.vtk")
Note: Files with .vtk extension are called "legacy VTK" files. MayaVi2 can read a lot of other file formats (XML,
files from Ensight, Plot3D and so on). For example, you can load an XML file (with extension .vti, .vtp, .vtr, .vts, .vtu,
etc) using the VTKXMLFileReader method.
Then, once your data are loaded using one of the two methods above, add the source with the add_source() method in
the body of the class !MyClass (after script.new_scene):
script.add_source(src)
The four basic modules Outline, Axes, !OrientationAxes and Text will be presented now.
See the [:Cookbook/MayaVi/ScriptingMayavi2/BasicModules: Basic Modules] wiki page.
See the [:Cookbook/MayaVi/ScriptingMayavi2/MainModules: Main Modules] wiki page.
See the [:Cookbook/MayaVi/ScriptingMayavi2/Filters: Filters] wiki page.
Section author: FredericPetit, Unknown[131], GaelVaroquaux, Unknown[6], Unknown[132]
These modules are called "basic" because they are general and independent of the kind of data.
Before using a module, remember that you have to import it.
In general, if you want to change the color of an object, you have to type:
module.actor.property.color = fg_color
(remembering to import the module first, e.g. from enthought.mayavi.modules.outline import Outline), then
fg_color = (0, 0, 0) # black foreground color
o = Outline()
script.add_module(o)
o.actor.property.color = fg_color
[](files/attachments/MayaVi_ScriptingMayavi2_BasicModules/basic_outline.png
For the axes, you can set several parameters (after importing the module with
from enthought.mayavi.modules.axes import Axes); then
a = Axes()
script.add_module(a)
a.axes.property.color = fg_color # color for axes
a.axes.axis_title_text_property.color = fg_color # color for axes title
a.axes.x_label = "Lx" # label for Ox axis
a.axes.y_label = "Ly" # label for Oy axis
a.axes.z_label = "Lz" # label for Oz axis
a.axes.label_format = "" # no dimensions displayed
a.axes.axis_label_text_property.color = fg_color # in case we want to display them
Label format is the format for the dimensions along Ox, Oy and Oz axis. By default, it is set to %6.3g.
[](files/attachments/MayaVi_ScriptingMayavi2_BasicModules/basic_axes.png
!OrientationAxes module will display a little trihedron in the corner of the render window, showing the orientation of
the three axes, Ox, Oy, Oz.
Note: VTK > 5.0 must be installed in order to use this module.
Nothing special here; as with the Outline module, you can set the labels color (after importing the module with
from enthought.mayavi.modules.orientation_axes import OrientationAxes):
oa = OrientationAxes()
script.add_module(oa)
oa.text_property.color = fg_color
[](files/attachments/MayaVi_ScriptingMayavi2_BasicModules/basic_orientationaxes.png
You can use this module to display a title for your rendering window.
You need a text (a string), and you have to set up its position in the window (coordinates go from 0 to 1 in x and y)
with the x_position and y_position parameters.
Then you can set up the height and the width of your text, as in the sketch below.
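A sketch combining these settings (the import path follows the convention used elsewhere on this page, and the values are illustrative):
from enthought.mayavi.modules.text import Text

t = Text()
t.text = "My amazing title"
script.add_module(t)
t.x_position = 0.25    # window coordinates go from 0 to 1
t.y_position = 0.9
t.width = 0.5          # width (and, analogously, height) of the text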
[](files/attachments/MayaVi_ScriptingMayavi2_BasicModules/basic_title.png
Now, we will present how to set up the color bar, called the "scalar" or "vector" color bar; which one depends on the
data you have in your file.
Strictly speaking, the color bar is not a module, i.e. you don’t need to add it with an add_module() command: you
have to associate the “module_manager” object to a module, previously loaded, say the Text module for example.
Then, you can configure the color bar as follows (keywords are self-explanatory):
mmsclut = t.module_manager.scalar_lut_manager
mmsclut.show_scalar_bar = True
mmsclutsc = mmsclut.scalar_bar
mmsclutsc.orientation = "vertical" # or "horizontal"
mmsclutsc.width = 0.1
mmsclutsc.height = 0.8
mmsclutsc.position = (0.01, 0.15) # color bar located to the left of the
˓→rendering window
mmsclutsc.label_text_property.color = fg_color
mmsclutsc.title_text_property.color = fg_color
mmsclut.number_of_labels = 10
mmsclut.number_of_colors = 64
mmsclut.data_name = "My Label"
[](files/attachments/MayaVi_ScriptingMayavi2_BasicModules/basic_colorbar.png
Note: To configure a color bar for vectors instead of scalars, replace “scalar_lut_manager” by “vector_lut_manager”
above.
At last, to close the "basic" modules section, let's see how we can set up the scene.
By "setting up the scene", read "how the scene will be seen": for example, setting the background color
and the point of view of the scene.
As usual, setting these parameters using python & TVTK is very easy.
If you want to change background color, you may need to also change foreground color for all modules. We recall
them here.
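A minimal sketch (assuming the scene object exposes a background trait, as the TVTK scenes do):
fg_color = (0., 0., 0.)    # black foreground, to be passed to the modules as above
bg_color = (1., 1., 1.)    # white background
s = script.engine.current_scene
s.scene.background = bg_color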
* for an isometric view (the camera normal is (1, 1, 1)), use the isometric_view method.
t.scene.x_plus_view()
t.scene.camera.azimuth(62)
t.scene.camera.elevation(19.5)
t.scene.camera.zoom(1.5)
At last, you can choose if you want a perspective view or a parallel projection for your scene:
t.scene.camera.parallel_projection = True
[](files/attachments/MayaVi_ScriptingMayavi2_BasicModules/basic_scene_parall.png
for a parallel projection, or:
t.scene.camera.parallel_projection = False
Attachments
• basic_axes.png
• basic_colorbar.png
• basic_orientationaxes.png
• basic_outline.png
• basic_scene_parall.png
• basic_scene_persp.png
• basic_title.png
Well, until now, the examples above are quite simple: scalar or vector data are presented in a "vacuum", i.e. there is
no object or material whatsoever.
In other words, how can one display some object, say a metallic parallelepiped, for example, in a scalar or vector
field?
The first filter presented here deals with this problem.
For this example, we make several assumptions:
* the mesh of the volume data is made of 76x60x72 cubic cells (8 vertices); thus, the VTK cell type
  used for the "vacuum" is called VTK_HEXAHEDRON (#12). Cell ids begin at 0 and end at #342880;
* a metallic parallelepiped is immersed in an EM field spreading over the whole volume data;
  this object is made of faceted cells (4 vertices), called VTK_QUAD (#9) cells in the VTK
  denomination. These faceted cells are used here because of the null EM field within the
  parallelepiped. Cell ids begin at #342881 and end at #345966;
* as there are different kinds of cells, the UnstructuredGrid data set must be used (see
  http://www.vtk.org/pdf/file-formats.pdf to know how UnstructuredGrid files must be written,
  the VTK cells syntax, etc).
To display the metallic parallelepiped as a separate object from the vacuum, you have to extract the cells corresponding
to this object. Thus, you will be able to display this object using the Surface module, for example.
First, import the ExtractUnstructuredGrid filter and the Surface module, as usual (the filter import path follows the same naming convention):
from enthought.mayavi.filters.extract_unstructured_grid import ExtractUnstructuredGrid
from enthought.mayavi.modules.surface import Surface
then
### for the metallic parallelepiped
script.engine.current_object = src # current object must be set to the source
eug1 = ExtractUnstructuredGrid()
script.add_filter(eug1)
eug1.filter.cell_clipping = True
eug1.filter.cell_minimum = 342881
eug1.filter.cell_maximum = 345966
s = Surface() # the metallic parallelepiped is displayed using the Surface module
eug1.add_module(s) # the module must be added to the current filter
s.actor.mapper.scalar_visibility = False
s.actor.property.color = (0.509804, 0.5098040, 0.5490196) # grey color for the metallic parallelepiped
### we also need to extract the required cells for, and only for, the vacuum
script.engine.current_object = src # current object must be set to the source
eug2 = ExtractUnstructuredGrid()
script.add_filter(eug2)
eug2.filter.cell_clipping = True
eug2.filter.cell_minimum = 0
eug2.filter.cell_maximum = 342880
### we also need to extract the required cells for, and only for, the vacuum
script.engine.current_object = src # current object must be set to the source
eug3 = ExtractUnstructuredGrid()
script.add_filter(eug3)
eug3.filter.cell_clipping = True
eug3.filter.cell_minimum = 0
eug3.filter.cell_maximum = 342880
.../...
eg = ExtractGrid()
script.add_filter(eg)
eg.x_min, eg.x_max = 10, 40
eg.y_min, eg.y_max = 10, 40
eg.z_min, eg.z_max = 10, 40
# eg.x_ratio = 2
# eg.y_ratio = 2
# eg.z_ratio = 2
[](files/attachments/MayaVi_ScriptingMayavi2_Filters/filter_eg.png)
Using this filter, you can consider only the scalar values contained in a specific range.
Suppose that your scalars data spread from 0 to 1, but you are only interested in the values in the range [0.4, 0.6] and you want to play with the slidebar of the !IsoSurface module within this range, around 0.5. By default, the min & max values of the slidebar will be set to 0 & 1, because of your data range:
[](files/attachments/MayaVi_ScriptingMayavi2_Filters/filter_thrld1.png)
To play more accurately with the slidebar of the !IsoSurface module, you have to set the min & max values to the required values, i.e. 0.4 & 0.6. Thus, if you want to see your scalars data around 0.5, you can set the slidebar from 0.4 to 0.6 more easily than in the case where the slidebar goes from 0 to 1.
The Threshold filter can help you to do this.
Begin to import the module and the filter first:
from enthought.mayavi.modules.iso_surface import IsoSurface
from enthought.mayavi.filters.threshold import Threshold # assumed path; the filter import is missing from this extract
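The code that actually applies the filter is not reproduced in this extract. A minimal sketch, reusing the script object and the add_filter/add_module pattern seen above (the lower_threshold and upper_threshold trait names are assumptions):
thrld = Threshold()
script.add_filter(thrld) # add the filter to the current source
thrld.lower_threshold = 0.4 # keep only the scalars in [0.4, 0.6]
thrld.upper_threshold = 0.6
isosurf = IsoSurface()
thrld.add_module(isosurf) # the module is added to the filter, as before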
You can notice on the two previous figures that the Threshold filter approximates the bounds to the nearest values (they are not strictly equal to 0.4 & 0.6).
Generally, data are interpolated between each point. Thus, they look nicer.
But maybe in some cases you don't want them to be interpolated, and want to see the data "as they are": then they are not displayed as points, but as cells. In this case, you can use the !PointToCellData filter.
Let’s see again the example using the !ScalarCutPlane module, and import the !PointToCellData filter:
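The import lines are not shown in this extract; assuming the same enthought.mayavi layout as the other imports in these recipes, they would be:
from enthought.mayavi.filters.point_to_cell_data import PointToCellData
from enthought.mayavi.modules.scalar_cut_plane import ScalarCutPlane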
then add the !ScalarCutPlane module “above” the !PointToCellData filter, as usual:
ptocd = PointToCellData()
script.add_filter(ptocd)
scp = ScalarCutPlane()
ptocd.add_module(scp)
scp.implicit_plane.normal = (1, 0, 0)
scp.implicit_plane.origin = (10, 25, 25)
scp.implicit_plane.widget.enabled = False
scp.actor.property.diffuse = 0.0
scp.actor.property.ambient = 1.0
scp.actor.property.opacity = 1.0
scp.module_manager.scalar_lut_manager.data_range = [0, 1]
Thus, you can see your data on each cell and not as points (compare to the first figure showing the use of the !ScalarCutPlane module):
You can use the !WarpScalar filter to warp a 2D surface for example. See [:Cookbook/MayaVi/Examples: Example
using mlab (surf_regular_mlab.py)].
Section author: FredericPetit
Attachments
• filter_eg.png
• filter_eug1.png
• filter_eug2.png
• filter_p2c.png
• filter_thrld1.png
• filter_thrld2.png
Note2: In the !MayaVi2 tree view, the “Main Modules” (called “Modules”) have been separated from the “Basic
Modules") have been separated from the "Basic Modules" loading the !ModuleManager. Of course, you can load all your modules and filters without using the !ModuleManager.
The simplest (and easiest, but not the most impressive ;-) ) way to display 3D data is doubtless to slice it on some planes, normal to the Ox, Oy or Oz axes, or oblique.
One of the modules you can use to do this is called !ScalarCutPlane. It works for any data.
Note: As the !ImagePlaneWidget module also displays scalar data on a plane (but it does not "cut" the data), please see [:Cookbook/MayaVi/Examples: Example with a 3D array as numerical source (numeric_source.py)] or [:Cookbook/MayaVi/Examples: Example using ImagePlaneWidget Module (test.py)] to get more information on how you can use this module.
You have to set several parameters:
* plane normal;
* its origin;
* widget enabled or not: if enabled,
you can play with the mouse to modify normal orientation, position, etc.;
* some color properties.
A typical setup is sketched below, followed by the resulting scene:
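This sketch is not taken verbatim from the recipe; the import path follows the other module imports, and the normal/origin values mirror those used later on this page:
from enthought.mayavi.modules.scalar_cut_plane import ScalarCutPlane

scp = ScalarCutPlane() # set up the cut plane module
script.add_module(scp)
scp.implicit_plane.normal = (1, 0, 0) # plane normal to the Ox axis
scp.implicit_plane.origin = (10, 25, 25) # plane origin
scp.implicit_plane.widget.enabled = True # widget enabled: the plane can be moved with the mouse
scp.actor.property.diffuse = 0.0 # some color properties
scp.actor.property.ambient = 1.0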
[](files/attachments/MayaVi_ScriptingMayavi2_MainModules/module_scp.png)
Note that if you enable the widget, you will be able to translate (move the mouse to the red frame) and change the normal (move the mouse to the grey arrow) of the cutplane in "real-time":
[](files/attachments/MayaVi_ScriptingMayavi2_MainModules/module_scp_widg_en.png)
You can also display the cutplanes as “warped surfaces”, just adding a few lines, setting the scale factor and the
normals to be computed (smoother surface) or not:
scp.enable_warp_scalar = True
scp.compute_normals = True
scp.warp_scalar.filter.scale_factor = 20
[](files/attachments/MayaVi_ScriptingMayavi2_MainModules/module_scp_warp.png)
Of course, you can add as many cutplanes as you want, oblique or not.
Let’s see now a little more complex example : we want opacity to be set to 0.2 for each cutplane, and contours (#10)
for the same cutplanes added. Lines above have been changed as below:
Note: I agree, this is not the best way to write such code. You can obviously write a method to do the same stuff. But this is not the purpose here.
scp.implicit_plane.widget.enabled = False
scp.actor.property.diffuse = 0.0
scp.actor.property.ambient = 1.0
scp.enable_contours = True
scp.contour.number_of_contours = 10
scp.contour.minimum_contour, scp.contour.maximum_contour = [0, 1]
scp.module_manager.scalar_lut_manager.data_range = [0, 1]
Another module which slices data is !SliceUnstructuredGrid which, as its name says, works on unstructured grids:
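Its import is not shown in this extract; by analogy with the other module imports it would be:
from enthought.mayavi.modules.slice_unstructured_grid import SliceUnstructuredGrid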
sug = SliceUnstructuredGrid()
script.add_module(sug)
# unstructured grid so origin coordinates are reals
sug.implicit_plane.origin = (25., 25., 25.)
sug.implicit_plane.normal = (1, 1, 1)
sug.implicit_plane.widget.enabled = False
sug.extract_geometry.extract_boundary_cells = False
sug.extract_geometry.extract_inside = True
sug.extract_geometry.extract_only_boundary_cells = False
sug.geom_filter.cell_clipping = False
sug.geom_filter.extent_clipping = False
sug.geom_filter.merging = True
sug.geom_filter.point_clipping = False
sug.actor.property.representation = 'wireframe'
sug.actor.property.diffuse = 0.0
sug.actor.property.ambient = 1.0
sug.actor.property.opacity = 1.0
sug.module_manager.scalar_lut_manager.data_range = [0, 1]
Using the !GridPlane module also cuts your grid, but quite differently from the !ScalarCutPlane module. You can only get planes normal to the Ox, Oy and Oz axes, and it works only for structured grids. But unlike the !ScalarCutPlane module, which always cuts your mesh with a plane, !GridPlane cuts through your mesh: if it's a conformal mesh, the cut won't be a plane, but something following the curvature of your mesh.
The !StructuredGridOutline module does the same as Outline module, but for conformal mesh.
To illustrate how we can use these modules, let's consider an example provided in the VTKData directory: the combxyz.bin & combq.bin files (Plot3D format) from the tarball vtkdata-5.0.3.tar.gz, which you can download here.
So, type:
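The import block is not reproduced here; following the layout of the other imports in these recipes, it would be something like:
from enthought.mayavi.sources.plot3d_reader import PLOT3DReader
from enthought.mayavi.modules.structured_grid_outline import StructuredGridOutline
from enthought.mayavi.modules.grid_plane import GridPlane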
and
src = PLOT3DReader()
src.initialize('combxyz.bin', 'combq.bin')
script.add_source(src)
sgo = StructuredGridOutline()
script.add_module(sgo)
gp = GridPlane()
script.add_module(gp)
gp.grid_plane.axis = 'x'
gp.grid_plane.position = 2
gp.actor.mapper.scalar_visibility = True
gp.actor.property.representation = 'surface'
gp.actor.property.diffuse = 0.0
gp.actor.property.ambient = 1.0
gp.actor.property.opacity = 1
gp = GridPlane()
script.add_module(gp)
gp.grid_plane.axis = 'x'
gp.grid_plane.position = 25
gp.actor.mapper.scalar_visibility = True
gp.actor.property.representation = 'surface'
gp.actor.property.diffuse = 0.0
gp.actor.property.ambient = 1.0
gp.actor.property.opacity = 1
gp = GridPlane()
script.add_module(gp)
gp.grid_plane.axis = 'x'
gp.grid_plane.position = 55
gp.actor.mapper.scalar_visibility = True
gp.actor.property.representation = 'surface'
gp.actor.property.diffuse = 0.0
gp.actor.property.ambient = 1.0
gp.actor.property.opacity = 1
Other modules are Surface and !IsoSurface. These modules work with any data.
Surface module does the same as !IsoSurface but displays, automatically, several isosurfaces for a given number of
values in a given range.
In fact, you can get the same result with !IsoSurface module, but you will have to set each isovalue.
When several isosurfaces are displayed, using Surface or !IsoSurface module, you should set opacity to a value below
1, in order to see all isosurfaces.
Using Surface module is straightforward:
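Its import is not shown in this extract; it is presumably the same as the one used earlier for the Surface module:
from enthought.mayavi.modules.surface import Surface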
then
s = Surface()
s.enable_contours = True # we want contours enabled
s.contour.auto_contours = True # we want isovalues automatically well-defined
s.contour.number_of_contours = 10 # self-explanatory ;-)
s.actor.property.opacity = 0.2
script.add_module(s)
s.contour.minimum_contour = 0
s.contour.maximum_contour = 1
s.module_manager.scalar_lut_manager.data_range = [0, 1]
The equivalent scene can be obtained with the !IsoSurface module:
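Again, the import is not shown here; presumably:
from enthought.mayavi.modules.iso_surface import IsoSurface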
isosurf = IsoSurface()
script.add_module(isosurf)
isosurf.contour.contours = [0.1111, 0.2222, 0.3333, 0.4444, 0.5555, 0.6666, 0.7777, 0.8888]
isosurf.compute_normals = True
isosurf.actor.property.opacity = 0.2
isosurf.module_manager.scalar_lut_manager.data_range = [0, 1]
This yields the same scene as the previous one, of course, but now you can control each isovalue separately.
[](files/attachments/MayaVi_ScriptingMayavi2_MainModules/module_isosurface.png)
The funny part is that you can set the minimum/maximum contour for Surface, or the Contours for !IsoSurface, in "real-time" by moving the slide-bar. This is a very useful feature, and it can render a very nice "dynamic" scene! :-)
It is still quite experimental for me (you can set a lot of parameters), so this section will be very short ;-)
Instead of viewing surfaces, data are displayed in the whole volume.
Begin to import the required module:
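The import line is missing from this extract; following the naming of the other modules, it should be:
from enthought.mayavi.modules.volume import Volume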
v = Volume()
script.add_module(v)
v.lut_manager.show_scalar_bar = True
v.lut_manager.scalar_bar.orientation = 'vertical'
v.lut_manager.scalar_bar.width = 0.1
v.lut_manager.scalar_bar.height = 0.8
v.lut_manager.scalar_bar.position = (0.01, 0.15)
v.lut_manager.scalar_bar.label_text_property.color = fg_color
v.lut_manager.scalar_bar.title_text_property.color = fg_color
v.lut_manager.number_of_labels = 10
v.lut_manager.data_name = ""
Note that the Volume module has a "Color Transfer Function", which is quite different from the !LookUp Table used by the other modules.
The rendered scene should look like this (thanks to Prabhu for having made the CTF similar to the LUT):
[](files/attachments/MayaVi_ScriptingMayavi2_MainModules/module_volume.png)
Until now, we have only dealt with scalar values. You can also display values as vectors. You can use one of the three
following modules:
* Vectors module: scale and color are set by vectors data, i.e.
a 3D array vectors field;
* Glyph module: scale and color are set by scalar data;
* !VectorCutPlane module; in this case, vectors are not displayed in the whole volume,
but only on cutplanes, as !ScalarCutPlane module does with scalar values.
You can set several parameters for these modules concerning arrow shape, etc.
First, it depends on the number of points in your volume, but you are advised to decimate your data. If you don't, you will probably see nothing but a lot of arrows everywhere, and thus lose the pertinent information. The decimation can be done randomly or not.
Second, you can choose the shape of your vectors, amongst the following list: 2D Glyph or Arrow, Cone, Cylinder,
Sphere and Cube 3D vector shapes.
Third, you can set some parameters for the chosen shape. For example, using the Arrow shape, you can set the
following properties for the shaft and the tip:
* the shaft radius;
* the shaft resolution (number of polygons);
* the tip length;
* the tip radius;
* the tip resolution;
You can also set the vector position, between “tail”, “centered” and “head”, the scale mode, the color mode, the scale
factor (how big your vectors will be displayed), etc.
Let’s see now how one can do this.
First, import the required module.
For Vectors module,
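the import would presumably follow the same pattern as the other modules; the Glyph and !VectorCutPlane imports are added here by analogy, for the variants discussed below:
from enthought.mayavi.modules.vectors import Vectors
from enthought.mayavi.modules.glyph import Glyph
from enthought.mayavi.modules.vector_cut_plane import VectorCutPlane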
In fact, you will see that these three modules use the same objects and methods. Only default values differ.
For instance, for Vectors module, you can type:
v = Vectors()
script.add_module(v)
v.glyph.mask_input_points = True # we want to decimate our data...
v.glyph.mask_points.on_ratio = 100 # ...by a ratio of 100
v.glyph.mask_points.random_mode = True # I want a random decimation
v.glyph.glyph_source = v.glyph.glyph_list[1] # I like ArrowSource ;-)
# following values are the default values: tweak your own !
v.glyph.glyph_source.shaft_radius = 0.03
v.glyph.glyph_source.shaft_resolution = 6
v.glyph.glyph_source.tip_length = 0.35
v.glyph.glyph_source.tip_radius = 0.1
v.glyph.glyph_source.tip_resolution = 6
v.glyph.glyph.scale_factor = 10
v.glyph.glyph_position = 'tail'
v.glyph.scale_mode = 'scale_by_vector'
v.glyph.color_mode = 'color_by_vector'
### if you use Glyph module, here are the default values
# v.glyph.glyph_position = 'center'
# v.glyph.scale_mode = 'scale_by_scalar'
# v.glyph.color_mode = 'color_by_scalar'
If we consider, once again ;-), the same 3D data already shown before, but this time with vector instead of scalar data, the scene should look like this:
[](files/attachments/MayaVi_ScriptingMayavi2_MainModules/module_vectors.png)
For the !VectorCutPlane module, you can set the same properties as above plus the properties of the !ScalarCutPlane
module such as implicit_plane.normal, implicit_plane.origin, implicit_plane.widget.enabled, etc:
vcp = VectorCutPlane()
script.add_module(vcp)
vcp.glyph.mask_input_points = True
vcp.glyph.mask_points.on_ratio = 5
vcp.glyph.mask_points.random_mode = False
vcp.glyph.glyph_source = vcp.glyph.glyph_list[1]
vcp.glyph.glyph_source.shaft_radius = 0.03
vcp.glyph.glyph_source.shaft_resolution = 6
vcp.glyph.glyph_source.tip_length = 0.35
vcp.glyph.glyph_source.tip_radius = 0.1
vcp.glyph.glyph_source.tip_resolution = 6
vcp.glyph.glyph.scale_factor = 20
vcp.glyph.glyph_position = 'tail'
vcp.glyph.scale_mode = 'scale_by_vector'
vcp.glyph.color_mode = 'color_by_vector'
vcp.implicit_plane.normal = (1, 0, 0) # set normal to Ox axis
vcp.implicit_plane.origin = (10, 25, 25) # set origin to (i=10, j=25, k=25) for a structured grid
vcp.implicit_plane.widget.enabled = True
vcp.actor.property.diffuse = 0.0 # set some color properties
vcp.actor.property.ambient = 1.0 #
vcp.actor.property.opacity = 1.0 #
vcp.module_manager.vector_lut_manager.data_range = [0, 1]
To display the cut plane as a warped surface instead, use the !WarpVectorCutPlane module:
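Its import is assumed here by analogy with the other module imports:
from enthought.mayavi.modules.warp_vector_cut_plane import WarpVectorCutPlane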
wvcp = WarpVectorCutPlane()
script.add_module(wvcp)
wvcp.implicit_plane.normal = (1, 0, 0) # set normal to Ox axis
wvcp.implicit_plane.origin = (10, 25, 25) # set origin to (i=10, j=25, k=25) for a structured grid
wvcp.implicit_plane.widget.enabled = True
wvcp.compute_normals = True
wvcp.warp_vector.filter.scale_factor = 10
You should get this (compare to the warped surface with !ScalarCutPlane module):
[](files/attachments/MayaVi_ScriptingMayavi2_MainModules/module_warpvcp.png)
Another way to display vectors fields is to use the Streamline module.
We consider here other Plot3D files: postxyz.bin & postq.bin, which you can download here. You can find some
screenshots using these files on the VTK home page here.
You can set several parameters for this module: for instance, the type of the streamline (tube, ribbon or line) with its
properties, and the “seed”.
We also use the !GridPlane module in this example:
Begin to import the required module:
from enthought.mayavi.sources.plot3d_reader import PLOT3DReader
from enthought.mayavi.modules.streamline import Streamline
from enthought.mayavi.modules.grid_plane import GridPlane
In this example, we want streamlines displayed as tubes, with 10 sides, and the seed set to the line seed. We also
choose to display the “Kinetic Energy” part of the Plot3D files.
src = PLOT3DReader()
src.initialize('postxyz.bin', 'postq.bin')
src.scalars_name = "kinetic energy"
script.add_source(src)
gp = GridPlane()
script.add_module(gp)
gp.grid_plane.axis = 'x'
gp.actor.mapper.scalar_visibility = True
gp.actor.property.representation = 'surface'
gp.actor.property.diffuse = 0.0
gp.actor.property.ambient = 1.0
gp.actor.property.opacity = 1
gp = GridPlane()
script.add_module(gp)
gp.grid_plane.axis = 'z'
gp.actor.mapper.scalar_visibility = False
gp.actor.property.representation = 'wireframe'
gp.actor.property.diffuse = 0.0
gp.actor.property.ambient = 1.0
gp.actor.property.opacity = 1
strl = Streamline()
script.add_module(strl)
strl.streamline_type = "tube" # tube, ribbon or line
strl.tube_filter.number_of_sides = 10
strl.tube_filter.radius = 0.1
strl.seed.widget = strl.seed.widget_list[1] # [Sphere, Line, Plane, Point]
strl.seed.widget.align = "z_axis" # or "x_axis", "y_axis"
strl.seed.widget.point1 = (-0.7, 0, 0)
strl.seed.widget.point2 = (-0.7, 0, 4.82)
strl.seed.widget.resolution = 10
strl.seed.widget.enabled = False
Attachments
• module_isosurface.png
• module_scp.png
• module_scp2.png
• module_scp_warp.png
• module_scp_widg_en.png
• module_sgo_gp.png
• module_streamline.png
• module_sug.png
• module_surface.png
• module_vcp.png
• module_vectors.png
• module_volume.png
• module_warpvcp.png
Numpy
There are two very closely related ways to access array columns by name: recarrays and structured arrays. Structured
arrays are just ndarrays with a complicated data type:
#!python numbers=disable
In [1]: from numpy import *
In [2]: ones(3, dtype=dtype([('foo', int), ('bar', float)]))
Out[2]:
array([(1, 1.0), (1, 1.0), (1, 1.0)],
dtype=[('foo', '<i4'), ('bar', '<f8')])
In [3]: r = _
In [4]: r['foo']
Out[4]: array([1, 1, 1])
recarray is a subclass of ndarray that just adds attribute access to structured arrays:
#!python numbers=disable
In [10]: r2 = r.view(recarray)
In [11]: r2
Out[11]:
recarray([(1, 1.0), (1, 1.0), (1, 1.0)],
dtype=[('foo', '<i4'), ('bar', '<f8')])
In [12]: r2.foo
Out[12]: array([1, 1, 1])
One downside of recarrays is that the attribute access feature slows down all field accesses, even the r['foo'] form,
because it sticks a bunch of pure Python code in the middle. Much code won’t notice this, but if you end up having to
iterate over an array of records, this will be a hotspot for you.
Structured arrays are sometimes confusingly called record arrays.
(Lightly adapted from a Robert Kern post of Thu, 26 Jun 2008 15:25:11 -0500.)
A little script showing how to efficiently reformat structured arrays into normal ndarrays.
#!python numbers=disable
import numpy as np
print('\n ndarray')
nd = np.array(data)
print nd
print struct_1dtype_float_alt
print('\n structured array with different dtypes to reshaped float array omitting string columns')
print struct_diffdtype_float
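The variables above belong to a longer script that is not reproduced here. As a minimal, self-contained sketch of the idea (the field names and values below are invented for illustration), a structured array whose fields all share one dtype can be reinterpreted as a plain 2-D float array with a view:
#!python numbers=disable
import numpy as np

# a small structured array in which every field is a float64
struct = np.array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],
                  dtype=[('x', 'f8'), ('y', 'f8')])

# reinterpret the same memory as an ordinary (3, 2) float ndarray
plain = struct.view(np.float64).reshape(len(struct), -1)
print(plain)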
Building arrays
This is a brief introduction to array objects, their declaration and use in scipy. A comprehensive list of examples of
Numpy functions for arrays can be found at Numpy Example List With Doc
Numerical arrays are not yet defined in the standard python language. To load the array object and its methods into
the namespace, the numpy package must be imported:
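The import line itself does not appear in this extract; the examples below use the star-import style (as in the recipe above), i.e.:
from numpy import *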
Arrays can be created from the usual python lists and tuples using the array function. For example,
a = array([1,2,3])
returns a one dimensional array of integers. The array instance has a large set of methods and properties attached to it. For example, a.ndim is the dimension of the array; in this case, it would simply be 1.
One big difference between array objects and python's sequence objects is the definition of the mathematical operators. Whereas the addition of two lists concatenates those lists, the addition of two arrays adds the arrays element-wise. For example:
b = array((10,11,12))
a + b
array([11, 13, 15])
a.dtype
dtype('int64')
Accordingly, mathematical operations such as division will operate as usual in python, that is, will return an integer
answer :
a/3
array([0, 0, 1])
To obtain the expected answer, one solution is to force the casting of integers into real numbers by dividing by a real number (e.g. a/3.). A more careful approach is to define the type at initialization time:
a = array([1,2,3], dtype=float)
Another way to cast is by using Numpy’s built-in cast functions astype and cast. These allow you to change the type
of data you’re working with:
a = array([1,2,3], dtype=int)
b = a.astype('float')
The elements of an array are accessed using the bracket notation a[i], where i is an integer index starting at 0. Sub-arrays can be accessed by using general indexes of the form start:stop:step. a[start:stop:step] will return a reference to a sub-array of array a starting with (including) the element at index start, going up to (but not including) the element at index stop, in steps of step. e.g.:
data = array([0.5, 1.2, 2.2, 3.4, 3.5, 3.4, 3.4, 3.4], float)
t = arange(len(data), dtype='float') * 2*pi/(len(data)-1)
t[:] # get all t-values
Furthermore, there is the possibility to access array-elements using bool-arrays. The bool-array has the indexes of
elements which are to be accessed set to True.
i = array(len(t)*[False], bool) # create a bool-array for indexing
i[2] = True; i[4] = True; i[6] = True # we want elements with indexes 2,4 and 6
t[i]
We can use this syntax to make slightly more elaborate constructs. Consider the data[:] and t[:] arrays defined
before. Suppose we want to get the four (t[i]/data[i])-pairs with the four t[i]-values being closest to a
point p=1.8. We could proceed as follows:
p=1.8 # set our point
abs(t-p) # how much do the t[:]-values differ from p?
dt_m = sort(abs(t-p))[3] # distance between p and the 4th-closest t-value (this defining line is implied by its use below)
y_p = data[abs(t-p) <= dt_m] # construct the sub-arrays; (1) get the data-values at the 4 closest t[:]-values
t_p = t[abs(t-p) <= dt_m] # (2) get those 4 t[:]-values themselves
y_p
t_p
It has to be kept in mind that slicing returns a reference to the data. As a consequence, changes in the returned sub-array cause changes in the original array and vice versa. If one wants to copy only the values, one can use the copy()-method of the array object. For example (the input lines below are reconstructed to match the outputs, which are all that remains of the original session):
a = arange(16).reshape(4, 4)
a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
b = a[1:3, 0:2] # a slice: b is a view on a's data
b
array([[4, 5],
[8, 9]])
c = a[1:3, 0:2].copy() # an independent copy of the same values
c
array([[4, 5],
[8, 9]])
b[:, :] = 42 # modifying the view also changes a...
b
array([[42, 42],
[42, 42]])
c # ...but the copy is unaffected
array([[4, 5],
[8, 9]])
The next example creates two matrices, a and b, and computes their dot product (in other words, the standard matrix product):
a = array([[1,2], [2,3]])
b = array([[7,1], [0,1]])
dot(a, b)
array([[ 7, 3],
[14, 5]])
Scipy (via Numpy) provides numerous ways to create arrays automatically. For example, to create a vector of evenly
spaced numbers, the linspace function can be called. This is often useful to compute the result of a function on some
domain. For example, to compute the value of the function on one period, we would define a vector going from 0 to 2
pi and compute the value of the function for all values in this vector :
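As a concrete sketch (the recipe does not name the function; sin is used here purely as an illustration):
x = linspace(0, 2*pi, 100) # 100 evenly spaced points covering one period
y = sin(x) # value of the function at each of those points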
The same can be done on an N-dimensional grid using the mgrid and ogrid objects. For example,
x, y = mgrid[0:10:.1, 0:10:.2]
returns two matrices, x and y, whose elements range from 0 to 10 (non-inclusively) in .1 and .2 increments respectively.
These matrices can be used to compute the value of a function at the points (x_i, y_i) defined by those grids :
z = (x+y)**2
The ogrid object has the exact same behavior, but instead of storing an N-D matrix into memory, it stores only the 1-D
vector that defines it. For large matrices, this can lead to significant economy of memory space.
Other useful functions to create matrices are zeros and ones, which initialize arrays full of zeros and ones. Note that those will be float arrays by default. This may lead to curious behaviour for the unwary. For example, let's initialize a matrix with zeros (of integer type, in this example), and then place values in it element by element.
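The code of this example is missing from this extract; a plausible reconstruction, consistent with the integer output shown below, is:
mz = zeros((2, 2), dtype=int) # an integer array, not the default float one
mz[0, 0] = 0.25 # a float value...
mz[1, 1] = 2.56 # ...is silently truncated to an integer on assignment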
In this example, we are trying to store floating point numbers in an integer array. Thus, the numbers are then recast to
integers, so that if we print the matrix, we obtain :
mz
array([[0, 0],
[0, 2]])
To create real number arrays, one simply needs to state the type explicitly in the call to the function:
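For example (a minimal illustration):
mz = zeros((2, 2), dtype=float) # float zeros; assigned values keep their decimals
mz[1, 1] = 2.56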
The ndarray.repeat() method returns a new array with dimensions repeated from the old one.
a = array([[0, 1],
... [2, 3]])
a.repeat(2, axis=0) # repeats each row twice in succession
array([[0, 1],
[0, 1],
[2, 3],
[2, 3]])
a.repeat(3, axis=1) # repeats each column three times in succession
array([[0, 0, 0, 1, 1, 1],
[2, 2, 2, 3, 3, 3]])
a.repeat(2) # with no axis argument, the array is flattened first
array([0, 0, 1, 1, 2, 2, 3, 3])
These can be combined to do some useful things, like enlarging image data stored in a 2D array:
def enlarge(a, x=2, y=None):
    """Enlarges 2D image array a using simple pixel repetition in both dimensions.
    Enlarges by factor x horizontally and factor y vertically.
    If y is left as None, uses factor x for both dimensions."""
    a = asarray(a)
    assert a.ndim == 2
    if y is None:
        y = x
    for factor in (x, y):
        assert isinstance(factor, int)
        assert factor > 0
    return a.repeat(y, axis=0).repeat(x, axis=1)
enlarge(a)
array([[0, 0, 1, 1],
[0, 0, 1, 1],
[2, 2, 3, 3],
[2, 2, 3, 3]])
Convolution-like operations
Users frequently want to break an array up into overlapping chunks, then apply the same operation to each chunk.
You can generate a dynamical power spectrum, for example, by taking an FFT of each chunk, or you can construct a
convolution using a dot product. Some of these operations already exist in numpy and scipy, but others don’t. One
way to attack the problem would be to make a matrix in which each column was a starting location, and each row was
a chunk. This would normally require duplicating some data, potentially a lot of data if there’s a lot of overlap, but
numpy's striding can be used to do this without actually copying. This trick doesn't come for free: since the chunks share memory, if you modify the array, all shared elements will be modified. Nevertheless, it's a useful operation. Find attached the code, segmentaxis.py.
Example usage:
In [1]: import numpy as N
In [2]: import segmentaxis
In [3]: a = N.zeros(30)
In [4]: a[15] = 1
In [5]: filter = N.array([0.1,0.5,1,0.5,0.1])
In [6]: sa = segmentaxis.segment_axis(a,len(filter),len(filter)-1)
In [7]: sa
Out[7]:
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
Attachments
• segmentaxis.py
The whole point of numpy is to introduce a multidimensional array object for holding homogeneously-typed numerical
data. This is of course a useful tool for storing data, but it is also possible to manipulate large numbers of values without
writing inefficient python loops. To accomplish this, one needs to be able to refer to elements of the arrays in many
different ways, from simple “slices” to using arrays as lookup tables. The purpose of this page is to go over the various
different types of indexing available. Hopefully the sometimes-peculiar syntax will also become more clear.
We will use the same arrays as examples wherever possible:
import numpy as np
A = np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
B = np.reshape(np.arange(9),(3,3))
B
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
C = np.reshape(np.arange(2*3*4),(2,3,4))
C
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
The simplest way to pick one or some elements of an array looks very similar to python lists:
A[1]
1
B[1,0]
3
C[1,0,2]
14
That is, to pick out a particular element, you simply put the indices into square brackets after it. As is standard for
python, element numbers start at zero.
If you want to change an array value in-place, you can simply use the syntax above in an assignment:
T = A.copy()
T[3] = -5
T
array([ 0, 1, 2, -5, 4, 5, 6, 7, 8, 9])
T[0] += 7
T
array([ 7, 1, 2, -5, 4, 5, 6, 7, 8, 9])
(The business with .copy() is to ensure that we don’t actually modify A, since that would make further examples
confusing.) Note that numpy also supports python’s “augmented assignment” operators, +=, -=, *=, and so on.
Be aware that the type of array elements is a property of the array itself, so that if you try to assign an element of
another type to an array, it will be silently converted (if possible):
T = A.copy()
T[3] = -1.5
T
array([ 0, 1, 2, -1, 4, 5, 6, 7, 8, 9])
T[3] = -0.5j
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-513dc3e86c66> in <module>()
----> 1 T[3] = -0.5j
TypeError: can't convert complex to int
Note that the conversion that happens is a default conversion; in the case of float to int conversion, it’s truncation.
If you wanted something different, say taking the floor, you would have to arrange that yourself (for example with
np.floor()). In the case of converting complex values to integers, there’s no resonable default way to do it, so numpy
raises an exception and leaves the array unchanged.
Finally, two slightly more technical matters.
If you want to manipulate indices programmatically, you should know that when you write something like
C[1,0,1]
13
C[(1,0,1)]
13
This peculiar-looking syntax is constructing a tuple, python’s data structure for immutable sequences, and using that
tuple as an index into the array. (Under the hood, C[1,0,1] is converted to C.__getitem__((1,0,1)).) This means you
can whip up tuples if you want to:
i = (1,0,1)
C[i]
13
If it doesn’t seem likely you would ever want to do this, consider iterating over an arbitrarily multidimensional array:
for i in np.ndindex(B.shape):
    print i, B[i]
(0, 0) 0
(0, 1) 1
(0, 2) 2
(1, 0) 3
(1, 1) 4
(1, 2) 5
(2, 0) 6
(2, 1) 7
(2, 2) 8
Indexing with tuples will also become important when we start looking at fancy indexing and the function np.where().
The last technical issue I want to mention is that when you select an element from an array, what you get back has the
same type as the array elements. This may sound obvious, and in a way it is, but keep in mind that even innocuous
numpy arrays like our A, B, and C often contain types that are not quite the python types:
a = C[1,2,3]
a
23
type(a)
numpy.int64
type(int(a))
int
a**a
8450172506621111015
int(a)**int(a)
20880467999847912034355032910567L
numpy scalars also support certain indexing operations, for consistency, but these are somewhat subtle and under
discussion.
It is obviously essential to be able to work with single elements of an array. But one of the selling points of numpy is
the ability to do operations “array-wise”:
2*A
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
This is handy, but one very often wants to work with only part of an array. For example, suppose one wants to
compute the array of differences of A, that is, the array whose elements are A[1]-A[0], A[2]-A[1], and so on. (In fact,
the function np.diff does this, but let’s ignore that for expositional convenience.) numpy makes it possible to do this
using array-wise operations:
A[1:]
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
A[:-1]
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
A[1:] - A[:-1]
array([1, 1, 1, 1, 1, 1, 1, 1, 1])
This is done by making an array that is all but the first element of A, an array that is all but the last element of A, and
subtracting the corresponding elements. The process of taking subarrays in this way is called “slicing”.
One-dimensional slices
The general syntax for a slice is array[start:stop:step]. Any or all of the values start, stop, and step may be left out
(and if step is left out the colon in front of it may also be left out):
A[5:]
array([5, 6, 7, 8, 9])
A[:5]
array([0, 1, 2, 3, 4])
A[::2]
array([0, 2, 4, 6, 8])
A[1::2]
array([1, 3, 5, 7, 9])
A[1:8:2]
array([1, 3, 5, 7])
As usual for python, the start index is included and the stop index is not included. Also as usual for python, negative
numbers for start or stop count backwards from the end of the array:
A[-3:]
array([7, 8, 9])
A[:-3]
array([0, 1, 2, 3, 4, 5, 6])
If stop comes before start in the array, then an array of length zero is returned:
A[5:3]
array([], dtype=int64)
(The "dtype=int64" is present in the printed form because in an array with no elements, one cannot tell what type the
elements have from their printed representation. It nevertheless makes sense to keep track of the type that they would
have if the array had any elements.)
If you specify a slice that happens to have only one element, you get an array in return that happens to have only one
element:
A[5:6]
array([5])
A[5]
5
This seems fairly obvious and reasonable, but when dealing with fancy indexing and multidimensional arrays it can
be surprising.
If the number step is negative, the step through the array is negative, that is, the new array contains (some of) the
elements of the original in reverse order:
A[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
This is extremely useful, but it can be confusing when start and stop are given:
A[5:3:-1]
array([5, 4])
A[3:5:1]
array([3, 4])
The rule to remember is: whether step is positive or negative, start is always included and stop never is.
Just as one can retrieve elements of an array as a subarray rather than one-by-one, one can modify them as a subarray
rather than one-by-one:
#!python numbers=disable
>>> T = A.copy()
>>> T
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> T[1::2]
array([1, 3, 5, 7, 9])
>>> T[1::2] = -np.arange(5)
>>> T[1::2]
array([ 0, -1, -2, -3, -4])
>>> T
array([ 0, 0, 2, -1, 4, -2, 6, -3, 8, -4])
If the array you are trying to assign is the wrong shape, an exception is raised:
#!python numbers=disable
>>> T = A.copy()
>>> T[1::2] = np.arange(6)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape
>>> T[:4] = np.array([[0,1],[1,0]])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape
If you think the error message sounds confusing, I have to agree, but there is a reason. In the first case, we tried to stuff
six elements into five slots, so numpy refused. In the second case, there were the right number of elements - four - but
we tried to stuff a two-by-two array where there was supposed to be a one-dimensional array of length four. While
numpy could have coerced the two-by-two array into the right shape, instead the designers chose to follow the python
philosophy “explicit is better than implicit” and leave any coercing up to the user. Let’s do that, though:
#!python numbers=disable
>>> T = A.copy()
>>> T[:4] = np.array([[0,1],[1,0]]).ravel()
>>> T
array([0, 1, 1, 0, 4, 5, 6, 7, 8, 9])
So in order for assignment to work, it is not simply enough to have the right number of elements - they must be
arranged in an array of the right shape.
There is another issue complicating the error message: numpy has some extremely convenient rules for converting
lower-dimensional arrays into higher-dimensional arrays, and for implicitly repeating arrays along axes. This process
is called “broadcasting”. We will see more of it elsewhere, but here it is in its simplest possible form:
#!python numbers=disable
>>> T = A.copy()
>>> T[1::2] = -1
>>> T
array([ 0, -1, 2, -1, 4, -1, 6, -1, 8, -1])
We told numpy to take a scalar, -1, and put it into an array of length five. Rather than signal an error, numpy’s
broadcasting rules tell it to convert this scalar into an effective array of length five by repeating the scalar five times.
(It does not, of course, actually create a temporary array of this size; in fact it uses a clever trick of telling itself that
the temporary array has its elements spaced zero bytes apart.) This particular case of broadcasting gets used all the
time:
#!python numbers=disable
>>> T = A.copy()
>>> T[1::2] -= 1
>>> T
array([0, 0, 2, 2, 4, 4, 6, 6, 8, 8])
#!python numbers=disable
>>> T = A.copy()
>>> T[:] = -1
>>> T
array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1])
>>> T = A.copy()
>>> T = -1
>>> T
-1
What happened here? Well, in the first case we told numpy to assign -1 to all the elements of T, so that’s what it
did. In the second case, we told python “T = -1”. In python, variables are just names that can be attached to objects
in memory. This is in sharp contrast with languages like C, where a variable is a named region of memory where
data can be stored. Assignment to a variable name - T in this case - simply changes which object the name refers
to, without altering the underlying object in any way. (If the name was the only reference to the original object, it
becomes impossible for your program ever to find it again after the reassignment, so python deletes the original object
to free up some memory.) In a language like C, assigning to a variable changes the value stored in that memory region.
If you really must think in terms of C, you can think of all python variables as holding pointers to actual objects;
assignment to a python variable is just modification of the pointer, and doesn’t affect the object pointed to (unless
garbage collection deletes it). In any case, if you want to modify the contents of an array, you can’t do it by assigning
to the name you gave the array; you must use slice assignment or some other approach.
Finally, a technical point: how can a program work with slices programmatically? What if you want to, say, save a
slice specification to apply to many arrays later on? The answer is to use a slice object, which is constructed using
slice():
#!python numbers=disable
>>> A[1::2]
array([1, 3, 5, 7, 9])
>>> s = slice(1,None,2)
>>> A[s]
array([1, 3, 5, 7, 9])
(Regrettably, you can’t just write “s = 1::2”. But within square brackets, 1::2 is converted internally to slice(1,None,2).)
You can leave out arguments to slice() just like you can with the colon notation, with one exception:
#!python numbers=disable
>>> A[slice(-3)]
array([0, 1, 2, 3, 4, 5, 6])
>>> A[slice(None,3)]
array([0, 1, 2])
>>> A[slice()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: slice expected at least 1 arguments, got 0
>>> A[slice(None,None,None)]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Multidimensional slices
One-dimensional arrays are extremely useful, but often one has data that is naturally multidimensional - image data
might be an N by M array of pixel values, or an N by M by 3 array of colour values, for example. Just as it is
useful to take slices of one-dimensional arrays, it is useful to take slices of multidimensional arrays. This is fairly
straightforward:
#!python numbers=disable
>>> B
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> B[:2,:]
array([[0, 1, 2],
[3, 4, 5]])
>>> B[:,::-1]
array([[2, 1, 0],
[5, 4, 3],
[8, 7, 6]])
Essentially one simply specifies a one-dimensional slice for each axis. One can also supply a number for an axis rather
than a slice:
#!python numbers=disable
>>> B[0,:]
array([0, 1, 2])
>>> B[0,::-1]
array([2, 1, 0])
>>> B[:,0]
array([0, 3, 6])
Notice that when one supplies a number for (say) the first axis, the result is no longer a two-dimensional array; it’s
now a one-dimensional array. This makes sense:
#!python numbers=disable
>>> B[:,:]
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> B[0,:]
array([0, 1, 2])
>>> B[0,0]
0
If you supply no numbers, you get a two-dimensional array; if you supply one number, the dimension drops by one,
and you get a one-dimensional array; and if you supply two numbers the dimension drops by two and you get a scalar.
(If you think you should get a zero-dimensional array, you are opening a can of worms. The distinction, or lack thereof,
between scalars and zero-dimensional arrays is an issue under discussion and development.)
If you are used to working with matrices, you may want to preserve a distinction between “row vectors” and “column
vectors”. numpy supports only one kind of one-dimensional array, but you could represent row and column vectors
as two-dimensional arrays, one of whose dimensions happens to be one. Unfortunately indexing of these objects then
becomes cumbersome.
As with one-dimensional arrays, if you specify a slice that happens to have only one element, you get an array one of
whose axes has length 1 - the axis doesn’t “disappear” the way it would if you had provided an actual number for that
axis:
#!python numbers=disable
>>> B[:,0:1]
array([[0],
[3],
[6]])
>>> B[:,0]
array([0, 3, 6])
numpy also has a few shortcuts well-suited to dealing with arrays with an indeterminate number of dimensions.
If this seems like something unreasonable, keep in mind that many of numpy’s functions (for example np.sort(),
np.sum(), and np.transpose()) must work on arrays of arbitrary dimension. It is of course possible to extract the
number of dimensions from an array and work with it explicitly, but one’s code tends to fill up with things like
(slice(None,None,None),)*(C.ndim-1), making it unpleasant to read. So numpy has some shortcuts which often sim-
plify things.
First the Ellipsis object:
#!python numbers=disable
>>> A[...]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> B[...]
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> B[0,...]
array([0, 1, 2])
>>> B[0,...,0]
array(0)
>>> C[0,...,0]
array([0, 4, 8])
>>> C[0,Ellipsis,0]
array([0, 4, 8])
The ellipsis (three dots) indicates “as many ‘:’ as needed”. (Its name for use in index-fiddling code is Ellipsis, and it’s
not numpy-specific.) This makes it easy to manipulate only one dimension of an array, letting numpy do array-wise
operations over the “unwanted” dimensions. You can only really have one ellipsis in any given indexing expression,
or else the expression would be ambiguous about how many ‘:’ should be put in each. (In fact, for some reason it is
allowed to have something like “C[...,...]”; this is not actually ambiguous.)
In some circumstances, it is convenient to omit the ellipsis entirely:
#!python numbers=disable
>>> B[0]
array([0, 1, 2])
>>> C[0]
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> C[0,0]
array([0, 1, 2, 3])
>>> B[0:2]
array([[0, 1, 2],
[3, 4, 5]])
If you don’t supply enough indices to an array, an ellipsis is silently appended. This means that in some sense you
can view a two-dimensional array as an array of one-dimensional arrays. In combination with numpy’s array-wise
operations, this means that functions written for one-dimensional arrays can often just work for two-dimensional
arrays. For example, recall the difference operation we wrote out in the section on one-dimensional slices:
#!python numbers=disable
>>> A[1:] - A[:-1]
array([1, 1, 1, 1, 1, 1, 1, 1, 1])
>>> B[1:] - B[:-1]
array([[3, 3, 3],
[3, 3, 3]])
It works, unmodified, to take the differences along the first axis of a two-dimensional array.
Writing to multidimensional slices works just the way writing to one-dimensional slices does:
>>> T = B.copy()
>>> T[1,:] = -1
>>> T
array([[ 0, 1, 2],
[-1, -1, -1],
[ 6, 7, 8]])
>>> T[:,:2] = -2
>>> T
array([[-2, -2, 2],
[-2, -2, -1],
[-2, -2, 8]])
Slices cannot express every selection one might want. Suppose, for example, that one wants to square only the negative elements of an array:
#!python numbers=disable
>>> T = A.copy() - 5
>>> T[T<0] **= 2
>>> T
array([25, 16, 9, 4, 1, 0, 1, 2, 3, 4])
Or suppose one wants to use an array as a lookup table, that is, for an array B, produce an array whose i,j th element
is LUT[B[i,j]]: FIXME: argsort is a better example
#!python numbers=disable
>>> LUT = np.sin(A)
>>> LUT
array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ,
-0.95892427, -0.2794155 , 0.6569866 , 0.98935825, 0.41211849])
>>> LUT[B]
array([[ 0. , 0.84147098, 0.90929743],
[ 0.14112001, -0.7568025 , -0.95892427],
[-0.2794155 , 0.6569866 , 0.98935825]])
For this sort of thing numpy provides what is called “fancy indexing”. It is not nearly as quick and lightweight as
slicing, but it allows one to do some rather sophisticated things while letting numpy do all the hard work in C.
Boolean indexing
It frequently happens that one wants to select or modify only the elements of an array satisfying some condition.
numpy provides several tools for working with this sort of situation. The first is boolean arrays. Comparisons - equal
to, less than, and so on - between numpy arrays produce arrays of boolean values:
#!python numbers=disable
>>> A<5
array([ True, True, True, True, True, False, False, False, False, False], dtype=bool)
These are normal arrays. The actual storage type is normally a single byte per value, not bits packed into a byte, but
boolean arrays offer the same range of indexing and array-wise operations as other arrays. Unfortunately, python’s
“and” and “or” cannot be overridden to do array-wise operations, so you must use the bitwise operations “&”, “|”,
and "^" (for exclusive-or). Similarly python's chained inequalities cannot be overridden. Also, regrettably, one cannot change the precedence of the bitwise operators:
#!python numbers=disable
>>> c = A<5 & A>1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
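With parentheses added, the boolean array can be used directly as an index (a short illustration using the A defined above):
>>> c = (A<5) & (A>1)
>>> c
array([False, False, True, True, True, False, False, False, False, False], dtype=bool)
>>> A[c]
array([2, 3, 4])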
The result is necessarily a copy of the original array, rather than a view, since it will not normally be the case that the elements of c that are True select an evenly-strided memory layout. Nevertheless it is also possible to use boolean
arrays to write to specific elements:
>>> T = A.copy()
>>> c = (A<5) & (A>1)
>>> T[c] = -7
>>> T
array([ 0, 1, -7, -7, -7, 5, 6, 7, 8, 9])
Boolean indexing works for multidimensional arrays as well. In its simplest (and most common) incarnation, you
simply supply a single boolean array as index, the same shape as the original array:
>>> C[C%5==0]
array([ 0, 5, 10, 15, 20])
You then get back a one-dimensional array of the elements for which the condition is True. (Note that the result must be one-dimensional, since the True values can be arranged arbitrarily around the array. If you want to keep track
of the arrangement of values in the original array, look into using numpy’s “masked array” tools.) You can also use
boolean indexing for assignment, just as you can for one-dimensional arrays.
Two very useful operations on boolean arrays are np.any() and np.all():
>>> np.any(B<5)
True
>>> np.all(B<5)
False
They do just what they say on the tin, evaluate whether any entry in the boolean matrix is True, or whether all elements
in the boolean matrix are True. But they can also be used to evaluate “along an axis”, for example, to produce a boolean
array saying whether any element in a given row is True:
>>> B<5
array([[ True, True, True],
[ True, True, False],
[False, False, False]], dtype=bool)
>>> np.any(B<5, axis=1)
array([ True, True, False], dtype=bool)
>>> np.all(B<5, axis=1)
array([ True, False, False], dtype=bool)
One can also use boolean indexing to pull out rows or columns meeting some criterion:
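The original example is not reproduced here; a sketch using the B defined above, keeping only the rows that contain at least one element smaller than 5:
>>> B[np.any(B<5, axis=1), :]
array([[0, 1, 2],
       [3, 4, 5]])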
The result here is two-dimensional because there is one dimension for the results of the boolean indexing, and one
dimension because each row is one-dimensional.
This works with higher-dimensional boolean arrays as well:
>>> c = np.any(C<5,axis=2)
>>> c
array([[ True, True, False],
[False, False, False]], dtype=bool)
>>> C[c,:]
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
Here too the result is two-dimensional, though that is perhaps a little more surprising. The boolean array is two-
dimensional, but the part of the return value corresponding to the boolean array must be one-dimensional, since the
True values may be distributed arbitrarily. The subarray of C corresponding to each True or False value is one-
dimensional, so we get a return array of dimension two.
Finally, if you want to apply boolean conditions to the rows and columns simultaneously, beware:
The obvious approach doesn’t give the right answer. I don’t know why not, or why it produces the value that it does.
You can get the right answer by indexing twice, but that’s clumsy and inefficient and doesn’t allow assignment.
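A sketch of what is meant (the boolean index arrays are paired up element-wise by numpy's fancy indexing rather than selecting a submatrix, which is why the obvious form surprises):
>>> r = np.array([True, False, True]) # keep rows 0 and 2
>>> c = np.array([True, False, True]) # keep columns 0 and 2
>>> B[r, c] # the obvious approach: not the 2x2 submatrix
array([0, 8])
>>> B[r, :][:, c] # indexing twice gives the expected result
array([[0, 2],
       [6, 8]])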
FIXME: works with too-small boolean arrays for some reason?
List-of-locations indexing
It happens with some frequency that one wants to pull out values at a particular location in an array. If one wants
a single location, one can just use simple indexing. But if there are many locations, you need something a bit more
clever. Fortunately numpy supports a mode of fancy indexing that accomplishes this:
When you index with an array that is not an array of booleans, or with a list, numpy views it as an array of indices.
The array can be any shape, and the returned array has the same shape:
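For example, looking values up with a two-dimensional index array (a small sketch built from the A defined above):
>>> T = A**2 # something to look values up in
>>> idx = np.array([[1, 3], [5, 0]])
>>> T[idx] # the result has the same shape as idx
array([[ 1,  9],
       [25,  0]])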
>>> T = A.copy()
>>> T[ [1,3,5,0] ] = -np.arange(4)
>>> T
array([-3, 0, 2, -1, 4, -2, 6, 7, 8, 9])
Warning: Augmented assignment - the operators like “+=” - works, but it does not necessarily do what you would
expect. In particular, repeated indices do not result in the value getting added twice:
>>> T = A.copy()
>>> T[ [0,1,2,3,3,3] ] += 10
>>> T
array([10, 11, 12, 13, 4, 5, 6, 7, 8, 9])
This is surprising, inconvenient, and unfortunate, but it is a direct result of how python implements the “+=” operators.
The most common case for doing this is something histogram-like:
Unfortunately this gives the wrong answer. In older versions of numpy there was no really satisfactory solution, but
as of numpy 1.1, the histogram function can do this:
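The original code is not reproduced in this extract. The sketch below shows the naive augmented-assignment attempt failing on repeated indices, and np.bincount as one robust alternative (not necessarily the one the recipe refers to):
>>> idx = np.array([0, 1, 2, 3, 3, 3]) # note the repeated index 3
>>> counts = np.zeros(5, dtype=int)
>>> counts[idx] += 1 # the naive histogram attempt
>>> counts
array([1, 1, 1, 1, 0])
>>> np.bincount(idx, minlength=5) # counts the repeats correctly
array([1, 1, 1, 3, 0])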
One can also, not too surprisingly, use list-of-locations indexing on multidimensional arrays. The syntax is, however,
a bit surprising. Let’s suppose we want the list [B[0,0],B[1,2],B[0,1]]. Then we write:
>>> B[ [0,1,0], [0,2,1] ]
array([0, 5, 1])
>>> [B[0,0],B[1,2],B[0,1]]
[0, 5, 1]
This may seem weird - why not provide a list of tuples representing coordinates? Well, the reason is basically that for
large arrays, lists and tuples are very inefficient, so numpy is designed to work with arrays only, for indices as well as
values. This means that something like B[ [(0,0),(1,2),(0,1)] ] looks just like indexing B with a two-dimensional array,
which as we saw above just means that B should be used as a look-up table yielding a two-dimensional array of results
(each of which is one-dimensional, as usual when we supply only one index to a two-dimensional array).
In summary, in list-of-locations indexing, you supply an array of values for each coordinate, all the same shape, and
numpy returns an array of the same shape containing the values obtained by looking up each set of coordinates in the
original array. If the coordinate arrays are not the same shape, numpy’s broadcasting rules are applied to them to try
to make their shapes the same. If there are not as many arrays as the original array has dimensions, the original array
is regarded as containing arrays, and the extra dimensions appear on the result array.
Fortunately, most of the time when one wants to supply a list of locations to a multidimensional array, one got the list
from numpy in the first place. A normal way to do this is something like:
>>> idx = np.nonzero(B%2)
>>> idx
(array([0, 1, 1, 2]), array([1, 0, 2, 1]))
>>> B[idx]
array([1, 3, 5, 7])
>>> B[B%2 != 0]
array([1, 3, 5, 7])
Here nonzero() takes an array and returns a list of locations (in the correct format) where the array is nonzero. Of
course, one can also index directly into the array with a boolean array; this will be much more efficient unless the
number of nonzero locations is small and the indexing is done many times. But sometimes it is valuable to work with
the list of indices directly.
One unfortunate consequence of numpy’s list-of-locations indexing syntax is that users used to other array languages
expect it to pick out rows and columns. After all, it’s quite reasonable to want to pull out a list of rows and columns
from a matrix. So numpy provides a convenience function, ix_() for doing this:
>>> B[ np.ix_([0,2],[0,2]) ]
array([[0, 2],
[6, 8]])
>>> np.ix_([0,2],[0,2])
(array([[0],
[2]]), array([[0, 2]]))
The way it works is by taking advantage of numpy’s broadcasting facilities. You can see that the two arrays used as
row and column indices have different shapes; numpy’s broadcasting repeats each along the too-short axis so that they
conform.
What happens when you try to mix slice indexing, element indexing, boolean indexing, and list-of-locations indexing?
A numpy array is a block of memory, a data type for interpreting memory locations, a list of sizes, and a list of
strides. So for example, C[i,j,k] is the element starting at position i*strides[0]+j*strides[1]+k*strides[2]. This means,
for example, that transposing a matrix can be done very efficiently: just reverse the strides and sizes arrays. This is
why slices are efficient and can return views, but fancy indexing is slower and can’t.
At a python level, numpy's indexing works by overriding the __getitem__ and __setitem__ methods in an ndarray object. These methods are called when arrays are indexed, and they allow arbitrary implementations.
numpy and scipy provide a few other types that behave like arrays, in particular matrices and sparse matrices. Their
indexing can differ from that of arrays in surprising ways.
Section author: AMArchibald, jh
MetaArray
!MetaArray is a class that extends ndarray, adding support for per-axis meta data storage. This class is useful for
storing data arrays along with units, axis names, column names, axis values, etc. !MetaArray objects can be indexed
and sliced arbitrarily using named axes and columns.
Download here: MetaArray.py
Here is an example of the type of data one might store with !MetaArray:
Notice that each axis is named and can store different types of meta information:
* The Signal axis has named columns with different units for each column
* The Time axis associates a numerical value with each row
* The Trial axis uses normal integer indexes
data[0, 1, 1]
data[:, "Voltage 1", 0]
data["Trial":1, "Signal":"Voltage 0"]
data["Time":slice(3,7)]
Instantiation
Accepted Syntaxes:
info parameter: This parameter specifies the entire set of meta data for this !MetaArray and must follow a specific format. First, info is a list of axis descriptions:
For example, the data set shown above would look like:
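The example itself is missing from this extract. Based on the axis descriptions above and on the info format shown under "More Examples" below, it would look roughly like the following; the column names come from the indexing examples above, while the units, time values and constructor call are assumptions:
info = [
    {'name': 'Signal', 'cols': [
        {'name': 'Voltage 0', 'units': 'V'},
        {'name': 'Voltage 1', 'units': 'V'}
    ]},
    {'name': 'Time', 'units': 'sec', 'values': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]},
    {'name': 'Trial'}
]
data = MetaArray(rawData, info=info) # rawData: an ndarray whose shape matches the axes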
Accessing Data
Data can be accessed through a variety of methods:
* Standard indexing -- you may always just index the array exactly as you would any ndarray
* Named axes -- if you don't remember the order of axes, you may specify the axis to be indexed or sliced like this:
data["AxisName":index]
data["AxisName":slice(...)]
Note that since this syntax hijacks the original slice mechanism, you must specify a slice using slice() if you want to use named axes.
* Column selection -- if you don't remember the index of a column you wish to select, you may substitute the column's name for the index number. Lists of column names are also acceptable. For example:
data["AxisName":"ColumnName"]
data["ColumnName"] ## Works only if the named column exists for this axis
data[["ColumnName1", "ColumnName2"]]
File I/O
data.write('fileName')
newData = MetaArray(file='fileName')
Performance Tips
!MetaArray is a subclass of ndarray which overrides the ‘__getitem__‘ and ‘__setitem__‘ methods. Since these methods must alter the structure of the meta information for each access, they are quite slow compared to the native methods. As a result, many builtin functions will run very slowly when operating on a !MetaArray. It is recommended,
therefore, that you recast your arrays before performing these operations like this:
data = MetaArray(...)
data.mean() ## Very slow
data.view(ndarray).mean() ## native speed
More Examples
info=[
{'name': 'lat', 'title': 'Latitude'},
{'name': 'lon', 'title': 'Longitude'},
{'title': 'Altitude', 'units': 'm'}
]
In this case, every value in the array represents the altitude in metres at the lat, lon position represented by the array index. All of the following return the value at lat=10, lon=5:
array[10, 5]
array['lon':5, 'lat':10]
array['lat':10][5]
Now suppose we want to combine this data with another array of equal dimensions that
represents the average rainfall for each location.
We could easily store these as two
separate arrays or combine them into a 3D array with this description:
info=[
{'name': 'vals', 'cols': [
{'name': 'altitude', 'units': 'm'},
{'name': 'rainfall', 'units': 'cm/year'}
]},
{'name': 'lat', 'title': 'Latitude'},
{'name': 'lon', 'title': 'Longitude'}
]
array[1, 10, 5]
array['lon':5, 'lat':10, 'vals': 'rainfall']
array['rainfall', 'lon':5, 'lat':10]
Notice that in the second example, there is no need for an extra (4th) axis description since the actual values are described (name and units) in the column info for the first axis.
Attachments
• MetaArray.py
• example.png
Multidot
The matrix multiplication function, numpy.dot(), only takes two arguments. That means to multiply more than two
arrays together you end up with nested function calls which are hard to read:
dot(dot(dot(a,b),c),d)
There are a couple of ways to define an ‘mdot’ function that acts like dot but accepts more than two arguments. Using
one of these allows you to write the above expression as
mdot(a,b,c,d)
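One compact definition (mirroring the reduce-based mmultiply shown further down) is:
def mdot(*args):
    return reduce(numpy.dot, args)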
Or use the equivalent loop (which is apparently the preferred style for Py3K):
def mdot(*args):
    ret = args[0]
    for a in args[1:]:
        ret = dot(ret,a)
    return ret
This will always give you left to right associativity, i.e. the expression is interpreted as ‘(((a*b)*c)*d)‘.
You also can make a right-associative version of the loop:
def mdotr(*args):
    ret = args[-1]
    for a in reversed(args[:-1]):
        ret = dot(a,ret)
    return ret
which evaluates as ‘(a*(b*(c*d)))‘. But sometimes you’d like to have finer control since the order in which matrix
multiplies are performed can have a big impact on performance. The next version gives that control.
If we’re willing to sacrifice Numpy’s ability to treat tuples as arrays, we can use tuples as grouping constructs. This
version of ‘mdot‘ allows syntax like this:
mdot(a,((b,c),d))
to control the order in which the pairwise ‘dot‘ calls are made.
import types
import numpy
def mdot(*args):
    """Multiply all the arguments using matrix product rules.
    The output is equivalent to multiplying the arguments one by one
    from left to right using dot().
    Precedence can be controlled by creating tuples of arguments,
    for instance mdot(a,((b,c),d)) multiplies a as (a*((b*c)*d)).
    Note that this means the output of dot(a,b) and mdot(a,b) will differ if
    a or b is a tuple of arrays, since mdot treats tuples as grouping constructs.
    """
    # dispatch on the number of arguments, mirroring the mmultiply version below
    if len(args)==1:
        return args[0]
    elif len(args)==2:
        return _mdot_r(args[0],args[1])
    else:
        return _mdot_r(args[:-1],args[-1])

def _mdot_r(a,b):
    """Recursive helper for mdot"""
    if type(a)==types.TupleType:
        if len(a)>1:
            a = mdot(*a)
        else:
            a = a[0]
    if type(b)==types.TupleType:
        if len(b)>1:
            b = mdot(*b)
        else:
            b = b[0]
    return numpy.dot(a,b)
Note that the elementwise multiplication function ‘numpy.multiply‘ has the same two-argument limitation as
‘numpy.dot‘. The exact same generalized forms can be defined for multiply.
Left associative versions:
def mmultiply(*args):
    return reduce(numpy.multiply, args)

def mmultiply(*args):
    ret = args[0]
    for a in args[1:]:
        ret = numpy.multiply(ret, a)
    return ret
Right-associative version:
def mmultiplyr(*args):
    ret = args[-1]
    for a in reversed(args[:-1]):
        ret = numpy.multiply(a, ret)
    return ret
The tuple-grouping version, parallel to mdot above:

def mmultiply(*args):
    """Elementwise multiply all the arguments, using tuples as grouping constructs."""
    if len(args) == 1:
        return args[0]
    elif len(args) == 2:
        return _mmultiply_r(args[0], args[1])
    else:
        return _mmultiply_r(args[:-1], args[-1])

def _mmultiply_r(a, b):
    """Recursive helper for mmultiply"""
    if type(a) == types.TupleType:
        if len(a) > 1:
            a = mmultiply(*a)
        else:
            a = a[0]
    if type(b) == types.TupleType:
        if len(b) > 1:
            b = mmultiply(*b)
        else:
            b = b[0]
    return numpy.multiply(a, b)
numpy supports working with arrays of Python objects, but these arrays lack the type uniformity of normal numpy arrays, so they can be inefficient in space and time and cumbersome to work with. However, it is often useful to be able to store a user-defined class in an array.
One approach is to take advantage of numpy's record arrays. These are arrays in which each element can be large, as it has named and typed fields; essentially they are numpy's equivalent of arrays of C structures. Thus, if one had a class consisting of some data (named fields, each of a numpy type) and some methods, one could represent the data for an array of these objects as a record array. Getting the methods back is trickier.
One approach is to create a custom subclass of the numpy array which handles conversion to and from your object
type. The idea is to store the data for each instance internally in a record array, but when indexing returns a scalar,
construct a new instance from the data in the records. Similarly, when assigning to a particular element, the array
subclass would convert an instance to its representation as a record.
Attached is an implementation of the above scheme.
Section author: AMArchibald
Attachments
• obarray.py
• test_obarray.py
Conway's Game of Life [1] consists of a rectangular grid of cells which are either dead or alive, and a transition rule for updating the cells' state.
To update each cell in the grid, the state of the 8 neighbouring cells needs to be examined, i.e. it would be desirable to
have an easy way of accessing the 8 neighbours of all the cells at once without making unnecessary copies. The code
snippet below shows how to use the devious stride tricks for that purpose.
[1] [Game of Life](http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) at Wikipedia
import numpy as np
from numpy.lib import stride_tricks
x = np.arange(20).reshape([4, 5])
xx = stride_tricks.as_strided(x, shape=(2, 3, 3, 3), strides=x.strides + x.strides)
x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
xx
array([[[[ 0, 1, 2],
[ 5, 6, 7],
[10, 11, 12]],
[[ 1, 2, 3],
[ 6, 7, 8],
[11, 12, 13]],
[[ 2, 3, 4],
[ 7, 8, 9],
[12, 13, 14]]],
[[[ 5, 6, 7],
[10, 11, 12],
[15, 16, 17]],
[[ 6, 7, 8],
[11, 12, 13],
[16, 17, 18]],
[[ 7, 8, 9],
[12, 13, 14],
[17, 18, 19]]]])
xx[0,0]
array([[ 0, 1, 2],
[ 5, 6, 7],
[10, 11, 12]])
xx[1,2]
array([[ 7, 8, 9],
[12, 13, 14],
[17, 18, 19]])
x.strides
(40, 8)
xx.strides
(40, 8, 40, 8)
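The same trick can be used for the Game of Life update itself. The sketch below (an illustration, not part of the original recipe) pads the board with a dead border, builds the 3x3 neighbourhood view with as_strided, and applies the transition rule:

import numpy as np
from numpy.lib import stride_tricks

def life_step(board):
    """One Game of Life update for a 2-D array of 0s and 1s."""
    padded = np.pad(board, 1, mode='constant')              # dead cells around the border
    shape = board.shape + (3, 3)
    strides = padded.strides + padded.strides
    hoods = stride_tricks.as_strided(padded, shape=shape, strides=strides)
    counts = hoods.sum(axis=(2, 3)) - board                  # number of live neighbours
    return ((counts == 3) | ((board == 1) & (counts == 2))).astype(board.dtype)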
From time to time, people write to the NumPy list asking in which cases a view of an array is created and in which it isn't. This page tries to clarify some tricky points on this rather subtle subject.
A view is simply another way of viewing the data of an array: the data of both objects is shared. You can create views by selecting a slice of the original array, or by changing the dtype (or a combination of both). These different kinds of views are described below.
Slicing is probably the most common source of view creation in NumPy. The rule of thumb for creating a slice view is that the viewed elements can be addressed with offsets, strides, and counts in the original array. For example:
>>> a = numpy.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> v1 = a[1:2]
>>> v1
array([1])
>>> a[1] = 2
>>> v1
array([2])
>>> v2 = a[1::3]
>>> v2
array([2, 4, 7])
>>> a[7] = 10
>>> v2
array([ 2, 4, 10])
In the above code snippet, v1 and v2 are views of a: if you update a, both v1 and v2 will reflect the change.
Another way to create array views is by assigning another dtype to the same data area. For example:
>>> b = numpy.arange(10, dtype='int16')
>>> v3 = b.view('int32')
>>> v3 += 1
>>> b
array([1, 1, 3, 3, 5, 5, 7, 7, 9, 9], dtype=int16)
>>> v4 = b.view('int8')
>>> v4
array([1, 0, 1, 0, 3, 0, 3, 0, 5, 0, 5, 0, 7, 0, 7, 0, 9, 0, 9, 0], dtype=int8)
(The output shown assumes a little-endian machine.) In that case, v3 and v4 are views of b. As you can see, dtype views are not as useful as slice views, but they can be handy in some cases (for example, for quickly looking at the bytes of a generic array).
I think I understand what a view is, but why doesn't fancy indexing return a view?
One might be tempted to think that doing fancy indexing may lead to sliced views. But this is not true:
>>> a = numpy.arange(10)
>>> c1 = a[[1,3]]
>>> c2 = a[[3,1,1]]
>>> a[:] = 100
>>> c1
array([1, 3])
>>> c2
array([3, 1, 1])
The reason why fancy indexing does not return a view is that, in general, it cannot be expressed as a slice (in the sense stated above of being addressable with offsets, strides, and counts). For example, the fancy indexing for c1 could have been expressed as the slice a[1:4:2], but there is no slice equivalent for c2. So an object holding a copy of the original data is returned instead.
But fancy indexing does seem to return views sometimes, doesn’t it?
Many users get fooled and think that fancy indexing returns views instead of copies when they use this idiom:
>>> a = numpy.arange(10)
>>> a[[1,2]] = 100
>>> a
array([ 0, 100, 100, 3, 4, 5, 6, 7, 8, 9])
So, it seems that a[[1,2]] is actually a view, because elements 1 and 2 have been updated. However, if we try this step by step, it won't work:
>>> a = numpy.arange(10)
>>> c1 = a[[1,2]]
>>> c1[:] = 100
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> c1
array([100, 100])
What happens here is that, in the first idiom (a[[1,2]] = 100), the Python interpreter translates it to:
a.__setitem__([1,2], 100)
i.e. there is no need to create either a view or a copy, because the method can be evaluated in place (no new object creation is involved).
However, the second idiom (c1[:] = 100) is translated to:
c1 = a.__getitem__([1,2])
c1.__setitem__(slice(None, None, None), 100)  # [:] translates into slice(None, None, None)
i.e. a new object holding a copy (remember, fancy indexing does not return views) of some elements of a is created and returned before __setitem__ is called on it.
The rule of thumb here is: in the context of lvalue indexing (i.e. when the indices appear on the left-hand side of an assignment), no view or copy of the array is created, because there is no need to. When the indexed array appears on the right-hand side, however, the above rules for creating views do apply.
Finally, let’s put a somewhat advanced problem. The next snippet works as expected:
>>> a = numpy.arange(12).reshape(3,4)
>>> ifancy = [0,2]
>>> islice = slice(0,3,2)
>>> a[islice, :][:, ifancy] = 100
>>> a
array([[100, 1, 100, 3],
[ 4, 5, 6, 7],
[100, 9, 100, 11]])
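For comparison, here is the same assignment with the fancy index applied first (a reconstruction of the second snippet the question below refers to; because fancy indexing returns a copy, the assignment is lost and a is unchanged):

>>> a = numpy.arange(12).reshape(3,4)
>>> a[ifancy, :][:, islice] = 100
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])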
Can the reader figure out why the results differ? Hint: think in terms of the sequence of __getitem__() and __setitem__() calls and what they do in each example.
Section author: FrancescAltet, Unknown[16], WarrenWeckesser
NumPy doesn’t include a function that is equivalent to MATLAB’s accumarray function. The following function,
accum, is close.
Note that accum can handle n-dimensional arrays, and allows the data type of the result to be specified.
from itertools import product
import numpy as np

def accum(accmap, a, func=None, size=None, fill_value=0, dtype=None):
    """An accumulation function similar to Matlab's `accumarray`.

    accmap : ndarray
        This is the "accumulation map".  It maps input (i.e. indices into
        `a`) to their destination in the output array.  The first `a.ndim`
        dimensions of `accmap` must be the same as `a.shape`.
    a : ndarray
        The input data to be accumulated.
    func, size, fill_value, dtype :
        The accumulation function (default numpy.sum), the size of the output
        array, the value used for empty output positions, and the output data
        type (default `a.dtype`).

    Returns
    -------
    out : ndarray
        The accumulated results.
    """
    if func is None:
        func = np.sum
    if dtype is None:
        dtype = a.dtype
    if accmap.shape == a.shape:
        accmap = np.expand_dims(accmap, -1)
    adims = tuple(range(a.ndim))
    if size is None:
        size = 1 + np.squeeze(np.apply_over_axes(np.max, accmap, axes=adims))
    size = np.atleast_1d(size)

    # Gather the values destined for each output position, then apply func.
    vals = np.empty(size, dtype='O')
    for s in product(*[range(k) for k in size]):
        vals[s] = []
    for s in product(*[range(k) for k in a.shape]):
        vals[tuple(accmap[s])].append(a[s])
    out = np.empty(size, dtype=dtype)
    for s in product(*[range(k) for k in size]):
        out[s] = fill_value if vals[s] == [] else func(vals[s])
    return out
a = array([[1,2,3],[4,-1,6],[-1,8,9]])
accmap = array([[0,1,2],[2,0,1],[1,2,0]])
s = accum(accmap, a)
s
array([ 9, 7, 15])
accmap = array([
[[0,0],[0,0],[0,1]],
[[0,0],[0,0],[0,1]],
[[1,0],[1,0],[1,1]],
])
Create an array of lists containing the values to be accumulated in each position in the output array:
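This can be done by passing a function that simply returns its argument, together with an object dtype (a sketch using the accum defined above):

accum(accmap, a, func=lambda x: x, dtype='O')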
Use ‘accum‘ to arrange some values from a 1D array in a 2D array (note that using ‘accum‘ for this is overkill; fancy
indexing would suffice):
subs = array([[0, 5],
[1, 4],
[2, 3],
[3, 2],
[4, 1],
[5, 0]])
vals = array(range(10,16))
accum(subs, vals)
array([[ 0, 0, 0, 0, 0, 10],
[ 0, 0, 0, 0, 11, 0],
[ 0, 0, 0, 12, 0, 0],
[ 0, 0, 13, 0, 0, 0],
[ 0, 14, 0, 0, 0, 0],
[15, 0, 0, 0, 0, 0]])
Fitting data
This page shows you how to fit experimental data and plot the results using matplotlib.
Using real data is much more fun but, so that you can reproduce this example, I will generate data to fit:
import numpy as np
from numpy import pi, r_
import matplotlib.pyplot as plt
from scipy import optimize
tX = 11.86*np.cos(2*pi/0.81*Tx-1.32) + 0.64*Tx+4*((0.5-np.random.rand(num_points))*np.exp(2*np.random.rand(num_points)**2))
tY = -32.14*np.cos(2*np.pi/0.8*Ty-1.94) + 0.15*Ty+7*((0.5-np.random.rand(num_points))*np.exp(2*np.random.rand(num_points)**2))
We now have two sets of data: Tx and Ty, the time series, and tX and tY, sinusoidal data with noise. We are interested
in finding the frequency of the sine wave.
# Fit the first set
fitfunc = lambda p, x: p[0]*np.cos(2*np.pi/p[1]*x+p[2]) + p[3]*x # Target function
errfunc = lambda p, x, y: fitfunc(p, x) - y # Distance to the target function
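The fit itself can then be done with optimize.leastsq (a sketch; the initial guess p0 is an assumption):

p0 = [-15., 0.8, 0., -1.]  # initial guess: amplitude, period, phase, slope
p1, success = optimize.leastsq(errfunc, p0[:], args=(Tx, tX))
p2, success = optimize.leastsq(errfunc, p0[:], args=(Ty, tY))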
ax = plt.axes()
plt.text(0.8, 0.07,
'x freq : %.3f kHz \n y freq : %.3f kHz' % (1/p1[1],1/p2[1]),
fontsize=16,
horizontalalignment='center',
verticalalignment='center',
transform=ax.transAxes)
plt.show()
Suppose that you have the same kind of data set: two time series of oscillating phenomena, but you know that the frequency of the two oscillations is the same. A clever use of the cost function allows you to fit both sets of data in one fit, using the same frequency. The idea is to return, as the "cost" array, the concatenation of the costs of your two data sets for one choice of parameters; the leastsq routine then optimizes both data sets at the same time.
# Target function
fitfunc = lambda T, p, x: p[0]*np.cos(2*np.pi/T*x+p[1]) + p[2]*x
# Initial guess for the first set's parameters
p1 = r_[-15., 0., -1.]
# Initial guess for the second set's parameters
p2 = r_[-15., 0., -1.]
# Initial guess for the common period
T = 0.8
# Vector of the parameters to fit: it contains all the parameters of the problem, and the period of the oscillation is not there twice!
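A sketch of the combined cost function and the simultaneous fit (the parameter vector layout [T, p1..., p2...] is an assumption):

errfunc = lambda p, x1, y1, x2, y2: np.r_[
    fitfunc(p[0], p[1:4], x1) - y1,
    fitfunc(p[0], p[4:7], x2) - y2
]
p = np.r_[T, p1, p2]
p, success = optimize.leastsq(errfunc, p, args=(Tx, tX, Ty, tY))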
ax = plt.axes()
plt.text(0.8, 0.07,
'x freq : %.3f kHz' % (1/p[0]),
fontsize=16,
horizontalalignment='center',
verticalalignment='center',
transform=ax.transAxes)
Especially when using fits for interactive use, the standard syntax for optimize.leastsq can get really long. Using the
following script can simplify your life:
import numpy as np
from scipy import optimize
class Parameter:
    def __init__(self, value):
        self.value = value

    def set(self, value):
        self.value = value

    def __call__(self):
        return self.value

def fit(function, parameters, y, x=None):
    def f(params):
        i = 0
        for p in parameters:
            p.set(params[i])
            i += 1
        return y - function(x)

    if x is None: x = np.arange(y.shape[0])
    p = [param() for param in parameters]
    return optimize.leastsq(f, p)
sigma = Parameter(3)
height = Parameter(5)
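A usage sketch (mu is an additional parameter not shown above; data is the array to fit):

mu = Parameter(7)

def gaussian(x):
    return height() * np.exp(-((x - mu()) / sigma())**2)

fit(gaussian, [mu, sigma, height], data)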
Fitting gaussian-shaped data does not require an optimization routine: just calculating the moments of the distribution is enough, and this is much faster.
However, this works only if the gaussian is not cut off too much and is not too small.
data = gaussian(np.arange(100))
plt.plot(data, '.')
X = np.arange(data.size)
x = np.sum(X*data)/np.sum(data)
width = np.sqrt(np.abs(np.sum((X-x)**2*data)/np.sum(data)))
max = data.max()
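# Reconstruct the gaussian from the computed moments (a sketch; this `fit` is the
# function plotted on the next line):
fit = lambda t: max*np.exp(-(t-x)**2/(2*width**2))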
plt.plot(fit(X), '-')
Fitting a 2D gaussian
Here is robust code to fit a 2D gaussian. It calculates the moments of the data to guess the initial parameters for
an optimization routine. For a more complete gaussian, one with an optional additive constant and rotation, see
http://code.google.com/p/agpy/source/browse/trunk/agpy/gaussfitter.py. It also allows the specification of a known
error.
def gaussian(height, center_x, center_y, width_x, width_y):
"""Returns a gaussian function with the given parameters"""
width_x = float(width_x)
width_y = float(width_y)
return lambda x,y: height*np.exp(
-(((center_x-x)/width_x)**2+((center_y-y)/width_y)**2)/2)
def moments(data):
"""Returns (height, x, y, width_x, width_y)
the gaussian parameters of a 2D distribution by calculating its
moments """
total = data.sum()
X, Y = np.indices(data.shape)
x = (X*data).sum()/total
y = (Y*data).sum()/total
col = data[:, int(y)]
width_x = np.sqrt(np.abs((np.arange(col.size)-y)**2*col).sum()/col.sum())
row = data[int(x), :]
width_y = np.sqrt(np.abs((np.arange(row.size)-x)**2*row).sum()/row.sum())
height = data.max()
return height, x, y, width_x, width_y
def fitgaussian(data):
    """Returns (height, x, y, width_x, width_y)
    the gaussian parameters of a 2D distribution found by a fit"""
    params = moments(data)
    errorfunction = lambda p: np.ravel(gaussian(*p)(*np.indices(data.shape)) - data)
    p, success = optimize.leastsq(errorfunction, params)
    return p
plt.matshow(data, cmap=plt.cm.gist_earth_r)
params = fitgaussian(data)
fit = gaussian(*params)
plt.contour(fit(*np.indices(data.shape)), cmap=plt.cm.copper)
ax = plt.gca()
(height, x, y, width_x, width_y) = params
Generate some data with noise to demonstrate the fitting procedure. The data are generated with an amplitude of 10 and a power-law index of -2.0. Notice that all of our data are well behaved when the log is taken; you may have to be more careful about this with real data.
##########
# Generate data points with noise
##########
num_points = 20
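# A sketch of the data generation (amplitude 10.0 and index -2.0, as stated above;
# the exact x range and noise model are assumptions):
xdata = np.linspace(1.1, 10.1, num_points)
ydata = 10.0 * xdata**-2.0                       # simulated perfect data
yerr = 0.2 * ydata                               # simulated errors
ydata += np.random.randn(num_points) * yerr      # simulated noisy data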
If your data are well behaved, you can fit a power-law function by first converting it to a linear equation using the logarithm, and then using optimize.leastsq to fit a straight line. Notice that we weight by the positional uncertainties during the fit, and that the uncertainties of the best-fit parameters are estimated from the variance-covariance matrix. You should read up on when it may not be appropriate to use this form of error estimation. If you are trying to fit a power-law distribution, this solution is more appropriate.
##########
# Fitting the data -- Least Squares Method
##########
logx = np.log10(xdata)
logy = np.log10(ydata)
logyerr = yerr / ydata
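# A sketch of the linear fit in log space (fit function and initial guess are
# assumptions; assumes `from scipy import optimize` as earlier in this chapter):
powerlaw = lambda x, amp, index: amp * (x**index)
fitfunc = lambda p, x: p[0] + p[1] * x
errfunc = lambda p, x, y, err: (y - fitfunc(p, x)) / err
pinit = [1.0, -1.0]
out = optimize.leastsq(errfunc, pinit, args=(logx, logy, logyerr), full_output=1)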
pfinal = out[0]
covar = out[1]
print pfinal
print covar
index = pfinal[1]
amp = 10.0**pfinal[0]
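# Parameter uncertainties from the covariance matrix (used in the plot annotations below)
indexErr = np.sqrt(covar[1][1])
ampErr = np.sqrt(covar[0][0]) * amp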
##########
# Plotting data
##########
plt.clf()
plt.subplot(2, 1, 1)
plt.plot(xdata, powerlaw(xdata, amp, index)) # Fit
plt.errorbar(xdata, ydata, yerr=yerr, fmt='k.') # Data
plt.text(5, 6.5, 'Ampli = %5.2f +/- %5.2f' % (amp, ampErr))
plt.text(5, 5.5, 'Index = %5.2f +/- %5.2f' % (index, indexErr))
plt.title('Best Fit Power Law')
plt.xlabel('X')
plt.ylabel('Y')
plt.xlim(1, 11)
plt.subplot(2, 1, 2)
plt.loglog(xdata, powerlaw(xdata, amp, index))
plt.errorbar(xdata, ydata, yerr=yerr, fmt='k.') # Data
plt.xlabel('X (log scale)')
plt.ylabel('Y (log scale)')
plt.xlim(1.0, 11)
[ 1.00341313 -2.00447676]
[[ 0.01592265 -0.0204523 ]
[-0.0204523 0.03027352]]
(1.0, 11)
Section author: GaelVaroquaux, Unknown[52], Unknown[53], jh, jesrl, keflavich, MartinSpacek, Unknown[54]
Attachments
• gaussfitter.py
• gaussfitter2.py
A bundle adjustment problem arises in 3-D reconstruction, and it can be formulated as follows (taken from https://en.wikipedia.org/wiki/Bundle_adjustment):
Given a set of images depicting a number of 3D points from different viewpoints, bundle adjustment can
be defined as the problem of simultaneously refining the 3D coordinates describing the scene geometry as
well as the parameters of the relative motion and the optical characteristics of the camera(s) employed to
acquire the images, according to an optimality criterion involving the corresponding image projections of
all points.
More precisely, we have a set of points in the real world, defined by their coordinates (𝑋, 𝑌, 𝑍) in some a priori chosen "world coordinate frame". We photograph these points with different cameras, each characterized by its orientation and translation relative to the world coordinate frame, its focal length and two radial distortion parameters (9 parameters in total). We then precisely measure the 2-D coordinates (𝑥, 𝑦) of the points projected by the cameras onto images. Our task is to refine the 3-D coordinates of the original points as well as the camera parameters, by minimizing the sum of squares of the reprojection errors.
Let $P = (X, Y, Z)^T$ be the radius-vector of a point, $R$ the rotation matrix of a camera, $t$ the translation vector of a camera, $f$ its focal distance, and $k_1$, $k_2$ its distortion parameters. The reprojection is then done as follows:

$$Q = R P + t \tag{8.1}$$

$$q = -\begin{pmatrix} Q_x / Q_z \\ Q_y / Q_z \end{pmatrix} \tag{8.2}$$

$$p = f \left(1 + k_1 \lVert q \rVert^2 + k_2 \lVert q \rVert^4\right) q \tag{8.3}$$
The resulting vector $p = (x, y)^T$ contains the image coordinates of the original point. This model is called the "pinhole camera model"; very good notes on this subject can be found at http://www.comp.nus.edu.sg/~cs4243/lecture/camera.pdf
Now let's start solving a real bundle adjustment problem. We'll take a problem from http://grail.cs.washington.edu/projects/bal/.
from __future__ import print_function
import urllib.request
import bz2
import os
import numpy as np
if not os.path.isfile(FILE_NAME):
urllib.request.urlretrieve(URL, FILE_NAME)
def read_bal_data(file_name):
    with bz2.open(file_name, "rt") as file:
        n_cameras, n_points, n_observations = map(int, file.readline().split())

        camera_indices = np.empty(n_observations, dtype=int)
        point_indices = np.empty(n_observations, dtype=int)
        points_2d = np.empty((n_observations, 2))

        for i in range(n_observations):
            camera_index, point_index, x, y = file.readline().split()
            camera_indices[i] = int(camera_index)
            point_indices[i] = int(point_index)
            points_2d[i] = [float(x), float(y)]

        camera_params = np.empty(n_cameras * 9)
        for i in range(n_cameras * 9):
            camera_params[i] = float(file.readline())
        camera_params = camera_params.reshape((n_cameras, -1))

        points_3d = np.empty(n_points * 3)
        for i in range(n_points * 3):
            points_3d[i] = float(file.readline())
        points_3d = points_3d.reshape((n_points, -1))

    return camera_indices, point_indices, points_2d, camera_params, points_3d

camera_indices, point_indices, points_2d, camera_params, points_3d = read_bal_data(FILE_NAME)
n_cameras = camera_params.shape[0]
n_points = points_3d.shape[0]
n = 9 * n_cameras + 3 * n_points
m = 2 * points_2d.shape[0]
print("n_cameras: {}".format(n_cameras))
print("n_points: {}".format(n_points))
print("Total number of parameters: {}".format(n))
print("Total number of residuals: {}".format(m))
n_cameras: 49
n_points: 7776
Total number of parameters: 23769
Total number of residuals: 63686
We chose a relatively small problem to reduce computation time, but scipy’s algorithm is capable of solving much
larger problems, although required time will grow proportionally.
Now define the function which returns a vector of residuals. We use numpy vectorized computations:
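A sketch of that residual function follows (a reconstruction consistent with Eqs. (8.1)-(8.3); the helper names rotate and project are illustrative and not guaranteed to match the original notebook):

def rotate(points, rot_vecs):
    """Rotate points by given rotation vectors, using Rodrigues' formula."""
    theta = np.linalg.norm(rot_vecs, axis=1)[:, np.newaxis]
    with np.errstate(invalid='ignore'):
        v = rot_vecs / theta
        v = np.nan_to_num(v)
    dot = np.sum(points * v, axis=1)[:, np.newaxis]
    cos_theta = np.cos(theta)
    sin_theta = np.sin(theta)
    return cos_theta * points + sin_theta * np.cross(v, points) + dot * (1 - cos_theta) * v

def project(points, camera_params):
    """Project 3-D points onto the image planes of the pinhole cameras."""
    points_proj = rotate(points, camera_params[:, :3]) + camera_params[:, 3:6]   # Q = R P + t
    points_proj = -points_proj[:, :2] / points_proj[:, 2, np.newaxis]            # q = -(Qx/Qz, Qy/Qz)
    f = camera_params[:, 6]
    k1 = camera_params[:, 7]
    k2 = camera_params[:, 8]
    n = np.sum(points_proj**2, axis=1)
    r = 1 + k1 * n + k2 * n**2
    return points_proj * (r * f)[:, np.newaxis]                                  # p = f (1 + k1|q|^2 + k2|q|^4) q

def fun(params, n_cameras, n_points, camera_indices, point_indices, points_2d):
    """Residuals: reprojected coordinates minus the observed 2-D coordinates."""
    camera_params = params[:n_cameras * 9].reshape((n_cameras, 9))
    points_3d = params[n_cameras * 9:].reshape((n_points, 3))
    points_proj = project(points_3d[point_indices], camera_params[camera_indices])
    return (points_proj - points_2d).ravel()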
You can see that computing the Jacobian of fun by hand is cumbersome, thus we will rely on a finite difference approximation. To make this feasible in time, we provide the Jacobian sparsity structure (i.e. we mark the elements which are known to be non-zero):
from scipy.sparse import lil_matrix

def bundle_adjustment_sparsity(n_cameras, n_points, camera_indices, point_indices):
    m = camera_indices.size * 2
    n = n_cameras * 9 + n_points * 3
    A = lil_matrix((m, n), dtype=int)

    i = np.arange(camera_indices.size)
    for s in range(9):
        A[2 * i, camera_indices * 9 + s] = 1
        A[2 * i + 1, camera_indices * 9 + s] = 1
    for s in range(3):
        A[2 * i, n_cameras * 9 + point_indices * 3 + s] = 1
        A[2 * i + 1, n_cameras * 9 + point_indices * 3 + s] = 1
    return A

A = bundle_adjustment_sparsity(n_cameras, n_points, camera_indices, point_indices)
Now we are ready to run optimization. Let’s visualize residuals evaluated with the initial parameters.
%matplotlib inline
import matplotlib.pyplot as plt
x0 = np.hstack((camera_params.ravel(), points_3d.ravel()))
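f0 = fun(x0, n_cameras, n_points, camera_indices, point_indices, points_2d)  # residuals at the initial point (using fun as sketched above)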
plt.plot(f0)
import time
from scipy.optimize import least_squares
t0 = time.time()
res = least_squares(fun, x0, jac_sparsity=A, verbose=2, x_scale='jac', ftol=1e-4, method='trf',
                    args=(n_cameras, n_points, camera_indices, point_indices, points_2d))
t1 = time.time()
(With verbose=2 the solver prints a per-iteration convergence log.)
Setting x_scale='jac' was done to automatically scale the variables and equalize their influence on the cost function (clearly the camera parameters and the coordinates of the points are very different entities). This option turned out to be crucial for successful bundle adjustment.
Now let’s plot residuals at the found solution:
plt.plot(res.fun)
We now see a much better picture of the residuals, with the mean being very close to zero. There are some spikes left, which can be explained by outliers in the data, or possibly because the algorithm found a local minimum (a very good one, though) or didn't converge far enough. Note that the algorithm worked with a finite difference approximation of the Jacobian, which can potentially block progress near the minimum because of insufficient accuracy (but again, computing the exact Jacobian for this problem is quite difficult).
Section author: Nikolay Mayorov
This page gathers different methods used to find the least squares circle fitting a set of 2D points (x,y).
The full code of this analysis is available here: least_squares_circle_v1d.py.
Finding the least squares circle corresponds to finding the center of the circle (xc, yc) and its radius Rc which minimize
the residu function defined below:
#! python
Ri = sqrt( (x - xc)**2 + (y - yc)**2)
residu = sum( (Ri - Rc)**2)
This is a nonlinear problem. We will see three approaches to the problem, and compare their results as well as their speeds.
As detailed in this document, this problem can be approximated by a linear one if we define the function to minimize as follows:
#! python
residu_2 = sum( (Ri**2 - Rc**2)**2)
#! python
# == METHOD 1 ==
method_1 = 'algebraic'
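# A sketch of the algebraic solution, not shown in this extraction (assumes
# `from numpy import *` as in the attached scripts): the center (uc, vc) in
# coordinates reduced to the barycenter is the solution of a small linear system.
x_m = mean(x)
y_m = mean(y)
u = x - x_m
v = y - y_m

# linear system defining the center (uc, vc) in reduced coordinates:
#    Suu * uc + Suv * vc = (Suuu + Suvv)/2
#    Suv * uc + Svv * vc = (Suuv + Svvv)/2
Suv  = sum(u*v)
Suu  = sum(u**2)
Svv  = sum(v**2)
Suuv = sum(u**2 * v)
Suvv = sum(u * v**2)
Suuu = sum(u**3)
Svvv = sum(v**3)
A = array([[Suu, Suv], [Suv, Svv]])
B = array([Suuu + Suvv, Suuv + Svvv]) / 2.0
uc, vc = linalg.solve(A, B)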
xc_1 = x_m + uc
yc_1 = y_m + vc
Scipy comes with several tools to solve the nonlinear problem above. Among them, scipy.optimize.leastsq is very simple to use in this case.
Indeed, once the center of the circle is defined, the radius can be calculated directly and is equal to mean(Ri). So there are only two parameters left: xc and yc.
Basic usage
#! python
# == METHOD 2 ==
from scipy import optimize
method_2 = "leastsq"
def f_2(c):
""" calculate the algebraic distance between the data points and the mean circle
˓→centered at c=(xc, yc) """
Ri = calc_R(*c)
return Ri - Ri.mean()
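For completeness, a sketch of the helper used above and of the actual call (the barycenter is used as the initial guess):

def calc_R(xc, yc):
    """ calculate the distance of each 2D point from the center (xc, yc) """
    return sqrt((x - xc)**2 + (y - yc)**2)

center_estimate = x_m, y_m
center_2, ier = optimize.leastsq(f_2, center_estimate)
xc_2, yc_2 = center_2
Ri_2 = calc_R(xc_2, yc_2)
R_2 = Ri_2.mean()
residu_2 = sum((Ri_2 - R_2)**2)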
To gain in speed, it is possible to tell optimize.leastsq how to compute the jacobian of the function by adding a Dfun
argument:
#! python
# == METHOD 2b ==
method_2b = "leastsq with jacobian"
def f_2b(c):
""" calculate the algebraic distance between the 2D points and the mean circle
˓→centered at c=(xc, yc) """
Ri = calc_R(*c)
return Ri - Ri.mean()
def Df_2b(c):
""" Jacobian of f_2b
The axis corresponding to derivatives must be coherent with the col_deriv option
˓→of leastsq"""
xc, yc = c
df2b_dc = empty((len(c), x.size))
Ri = calc_R(xc, yc)
df2b_dc[0] = (xc - x)/Ri # dR/dxc
df2b_dc[1] = (yc - y)/Ri # dR/dyc
df2b_dc = df2b_dc - df2b_dc.mean(axis=1)[:, newaxis]
return df2b_dc
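The call then passes the Jacobian through Dfun (col_deriv=True because the derivatives are stored along the first axis, as noted in the docstring above):

center_2b, ier = optimize.leastsq(f_2b, center_estimate, Dfun=Df_2b, col_deriv=True)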
Scipy has a dedicated package to deal with orthogonal distance regression, namely scipy.odr. This package can handle both explicit and implicit function definitions, and we will use the second form in this case.
Here is the implicit definition of the circle:
#! python
(x - xc)**2 + (y-yc)**2 - Rc**2 = 0
Basic usage
method_3 = "odr"
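A sketch of the ODR setup (the implicit circle function, an initial guess built from the barycenter, and the Data/Model/ODR objects from scipy.odr):

from scipy import odr

def f_3(beta, x):
    """ implicit definition of the circle """
    return (x[0] - beta[0])**2 + (x[1] - beta[1])**2 - beta[2]**2

R_m = calc_R(x_m, y_m).mean()
beta0 = [x_m, y_m, R_m]

lsc_data = odr.Data(row_stack([x, y]), y=1)   # y=1 marks a 1-dimensional implicit response
lsc_model = odr.Model(f_3, implicit=True)
lsc_odr = odr.ODR(lsc_data, lsc_model, beta0)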
lsc_out = lsc_odr.run()
One of the advantages of the implicit function definition is that its derivatives are very easily calculated.
This can be used to complete the model:
#! python
# == METHOD 3b ==
method_3b = "odr with jacobian"
def jacb(beta, x):
    """ Jacobian of f_3b with respect to the parameters beta = (xc, yc, r) """
    xc, yc, r = beta
    xi, yi = x
    df_db = empty((beta.size, x.shape[1]))
    df_db[0] = 2*(xc - xi)   # d_f/dxc
    df_db[1] = 2*(yc - yi)   # d_f/dyc
    df_db[2] = -2*r          # d_f/dr
    return df_db

def jacd(beta, x):
    """ Jacobian of f_3b with respect to the input x """
    xc, yc = beta[0], beta[1]
    xi, yi = x
    df_dx = empty_like(x)
    df_dx[0] = 2*(xi-xc) # d_f/dxi
    df_dx[1] = 2*(yi-yc) # d_f/dyi
    return df_dx
def calc_estimate(data):
""" Return a first estimation on the parameter from the data """
xc0, yc0 = data.x.mean(axis=1)
r0 = sqrt((data.x[0]-xc0)**2 +(data.x[1] -yc0)**2).mean()
return xc0, yc0, r0
#! python
# Coordinates of the 2D points
x = r_[ 9, 35, -13, 10, 23, 0]
y = r_[ 34, 10, 6, -14, 27, -10]
This gives:
SUMMARY
Method                  Xc         Yc        Rc         nb_calls   std(Ri)    residu     residu2
algebraic               10.55231   9.70590   23.33482   1          1.135135   7.731195   16236.34
leastsq                 10.50009   9.65995   23.33353   15         1.133715   7.711852   16276.89
leastsq with jacobian   10.50009   9.65995   23.33353   7          1.133715   7.711852   16276.89
odr                     10.50009   9.65995   23.33353   82         1.133715   7.711852   16276.89
odr with jacobian       10.50009   9.65995   23.33353   16         1.133715   7.711852   16276.89
Note:
* `nb_calls` corresponds to the number of calls of the function to be minimized, and does not take into account the number of calls to the derivative function. This differs from the number of iterations, as ODR can make multiple calls during an iteration.
* As shown on the figures below, the two functions `residu` and `residu_2` are not equivalent, but their minima are close in this case.
(Figures: full_cercle_v5.png, full_cercle_residu2_v5.png)
#! python
x = r_[36, 36, 19, 18, 33, 26]
y = r_[14, 10, 28, 31, 18, 26]
Method                  Xc         Yc        Rc         nb_calls   std(Ri)    residu     residu2
algebraic               15.05503   8.83615   20.82995   1          0.930508   5.195076   9153.40
leastsq                 9.88760    3.68753   27.35040   24         0.820825   4.042522   12001.98
leastsq with jacobian   9.88759    3.68752   27.35041   10         0.820825   4.042522   12001.98
odr                     9.88757    3.68750   27.35044   472        0.820825   4.042522   12002.01
odr with jacobian       9.88757    3.68750   27.35044   109        0.820825   4.042522   12002.01
(Figures: arc_v5.png, arc_residu2_v6.png)
Conclusion
Attachments
• arc_residu2_v1.png
• arc_residu2_v2.png
• arc_residu2_v3.png
• arc_residu2_v4.png
• arc_residu2_v5.png
• arc_residu2_v6.png
• arc_v1.png
• arc_v2.png
• arc_v3.png
• arc_v4.png
• arc_v5.png
• full_cercle_residu2_v1.png
• full_cercle_residu2_v2.png
• full_cercle_residu2_v3.png
• full_cercle_residu2_v4.png
• full_cercle_residu2_v5.png
• full_cercle_v1.png
• full_cercle_v2.png
• full_cercle_v3.png
• full_cercle_v4.png
• full_cercle_v5.png
• least_squares_circle.py
• least_squares_circle_v1.py
• least_squares_circle_v1b.py
• least_squares_circle_v1c.py
• least_squares_circle_v1d.py
• least_squares_circle_v2.py
• least_squares_circle_v3.py
Linear regression
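A sketch of the data generation and regression assumed by the plotting code below (t: sample points, x: exact line, xn: noisy samples, xr: regression line; the numerical values are illustrative):

#!python
from numpy import linspace, polyval, polyfit
from numpy.random import randn
from pylab import plot, title, show, legend

# Sample data: a straight line with gaussian noise added
n = 50
t = linspace(-5, 5, n)
x = polyval([0.8, -4], t)      # exact line
xn = x + randn(n)              # noisy samples

# Fit a first-order polynomial (a straight line) to the noisy data
ar, br = polyfit(t, xn, 1)
xr = polyval([ar, br], t)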
#matplotlib plotting
title('Linear Regression Example')
plot(t,x,'g.--')
plot(t,xn,'k.')
plot(t,xr,'r.-')
legend(['original','plus noise', 'regression'])
show()
Attachments
• linregress.png
OLS
#!python
import ols
After importing the class you can estimate a model by passing data to it as follows
#!python
mymodel = ols.ols(y,x,y_varnm,x_varnm)
where y is an array with data for the dependent variable, x contains the independent variables, y_varnm, is a string
with the variable label for the dependent variable, and x_varnm is a list of variable labels for the independent variables.
Note: An intercept term and variable label is automatically added to the model.
#!python
>>> import ols
>>> from numpy.random import randn
>>> data = randn(100,5)
>>> y = data[:,0]
>>> x = data[:,1:]
>>> mymodel = ols.ols(y,x,'y',['x1','x2','x3','x4'])
>>> mymodel.p # return coefficient p-values
array([ 0.31883448, 0.7450663 , 0.95372471, 0.97437927, 0.09993078])
Dependent Variable: y
Method: Least Squares
Date: Thu, 28 Feb 2008
Time: 22:32:24
# obs: 100
variable     coefficient     std. Error     t-statistic     prob.
const           0.107348       0.107121       1.002113       0.318834
x1             -0.037116       0.113819      -0.326100       0.745066
x2              0.006657       0.114407       0.058183       0.953725
x3              0.003617       0.112318       0.032201       0.974379
R-squared 0.033047 Durbin-Watson stat 2.012949
Adjusted R-squared -0.007667 Omnibus stat 5.664393
F-statistic 0.811684 Prob(Omnibus stat) 0.058883
Prob (F-statistic) 0.520770 JB stat 6.109005
Log likelihood -145.182795 Prob(JB) 0.047146
AIC criterion 3.003656 Skew 0.327103
Note
The library function numpy.linalg.lstsq() (http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html) performs basic OLS estimation.
Attachments
• ols.0.1.py
• ols.0.2.py
This is a quick example of creating data from several Bessel functions and finding local maxima, then fitting a curve
using some spline functions from the scipy.interpolate module. The enthought.chaco package and wxpython are used
for creating the plot. PyCrust (which comes with wxpython) was used as the python shell.
import matplotlib.pyplot as plt
from numpy import arange
from scipy import special, optimize

plt.figure()
plt.hold()
w = []
z = []
x = arange(0,10,.01)
for k in arange(1,5,.5):
y = special.jv(k,x)
plt.plot(x,y)
f = lambda x: -special.jv(k,x)
x_max = optimize.fminbound(f,0,6)
w.append(x_max)
z.append(special.jv(k,x_max))
plt.plot(w,z, 'ro')
from scipy import interpolate
t = interpolate.splrep(w, z, k=3)
s_fit3 = interpolate.splev(x,t)
plt.plot(x,s_fit3, 'g-')
t5 = interpolate.splrep(w, z, k=5)
s_fit5 = interpolate.splev(x,t5)
plt.plot(x,s_fit5, 'y-')
x = arange(0,10,0.01)
for k in arange(0.5,5.5):
y = special.jv(k,x)
plot(x,y)
f = lambda x: -special.jv(k,x)
x_max = optimize.fminbound(f,0,6)
plot([x_max], [special.jv(k,x_max)],'ro')
RANSAC
The attached file ransac.py implements the RANSAC algorithm. An example image:
ipython -wthread
import ransac
ransac.test()
To use the module you need to create a model class with two methods
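The expected interface looks roughly like this (a sketch; see the LinearLeastSquaresModel class in ransac.py for the exact signatures):

class Model:
    def fit(self, data):
        """Fit the model to `data` and return the model parameters."""
    def get_error(self, data, model):
        """Return the per-point error of `model` applied to `data`."""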
An example of such model is the class LinearLeastSquaresModel as seen the file source (below)
ransac.py
Section author: AndrewStraw, DatChu
Attachments
• ransac.png
• ransac.py
One of the main applications of nonlinear least squares is nonlinear regression, or curve fitting: given pairs {(𝑡𝑖 , 𝑦𝑖 ), 𝑖 = 1, . . . , 𝑛}, estimate parameters x defining a nonlinear function 𝜙(𝑡; x), assuming the model:
𝑦𝑖 = 𝜙(𝑡𝑖 ; x) + 𝜖𝑖 (8.4)
where 𝜖𝑖 are the measurement (observation) errors. In the least-squares estimation we search for x as the solution of the following optimization problem:
$$\frac{1}{2} \sum_{i=1}^{n} \left(\phi(t_i; x) - y_i\right)^2 \to \min_x \tag{8.5}$$
Such a formulation is intuitive and convenient from a mathematical point of view. From the probabilistic point of view the least-squares solution is known to be the maximum likelihood estimate, provided that all 𝜖𝑖 are independent and normally distributed random variables.
So theoretically it is not optimal when 𝜖𝑖 have a distribution other than normal. Although in engineering practice this is usually not important: if the errors behave as some reasonable random variables with zero mean, the result of least-squares estimation will be satisfactory.
The real problems start when the data are contaminated by outliers (completely wrong measurements). In this case the least-squares solution can become significantly biased in order to avoid very high residuals on the outliers. To explain qualitatively why this happens, let's consider the simplest least-squares problem:
$$\frac{1}{2} \sum_{i=1}^{n} (a - x_i)^2 \to \min_a \tag{8.6}$$
Now imagine 𝑎 = 1, 𝑛 = 4, 𝑥1 = 0.9, 𝑥2 = 1.05, 𝑥3 = 0.95, 𝑥4 = 10 (an outlier). The solution is 𝑎* = 3.225, i.e. completely ruined by a single outlier.
One of the well-known robust estimators is the l1 estimator, in which the sum of absolute values of the residuals is minimized. For demonstration, again consider the simplest problem:
$$\sum_{i=1}^{n} |a - x_i| \to \min_a \tag{8.8}$$
And the well-known solution is the median of {𝑥𝑖 }, such that the small number of outliers won’t affect the solution.
In our toy problem we have 𝑎* = 1.
The only disadvantage of the l1 estimator is that the arising optimization problem is hard: the function is non-differentiable everywhere, which is particularly troublesome for efficient nonlinear optimization. It means we would rather stay with differentiable problems, but somehow incorporate robustness into the estimation. To accomplish this we introduce a sublinear function 𝜌(𝑧) (i.e. its growth should be slower than linear) and formulate a new least-squares-like optimization problem:
$$\frac{1}{2} \sum_{i=1}^{n} \rho\left( \left(\phi(t_i; x) - y_i\right)^2 \right) \to \min_x \tag{8.9}$$
Turns out that this problem can be reduced to standard nonlinear least squares by modifying a vector of residuals
and Jacobian matrix on each iteration, such that computed gradient and Hessian approximation match the ones of the
objective function. Refer to the paper for details.
The implemented choices of 𝜌 are the following:
1. Linear function, which gives standard least squares: $\rho(z) = z$.
2. Huber loss: $\rho(z) = z$ if $z \le 1$, and $\rho(z) = 2\sqrt{z} - 1$ if $z > 1$.
3. Smooth approximation to the absolute value loss, "soft l1 loss": $\rho(z) = 2(\sqrt{1 + z} - 1)$.
4. Cauchy loss: $\rho(z) = \ln(1 + z)$.
5. Loss by arctan: $\rho(z) = \arctan z$.
The functions 2 and 3 are relatively mild and give approximately absolute value loss for large residuals. The last two
functions are strongly sublinear and give significant attenuation for outliers.
The loss functions above are written with the assumption that the soft threshold between inliers and outliers is equal to 1.0. To generalize, we introduce a scaling parameter $C$ and evaluate the loss as

$$\hat{\rho}(r^2) = C^2 \, \rho\!\left( \left(\frac{r}{C}\right)^2 \right) \tag{8.10}$$

Note that if the residuals are small, we have $\hat{\rho}(r^2) \approx \rho(r^2) \approx r^2$ for any $\rho(z)$ defined above.
To illustrate the behaviour of the introduced functions we build a plot:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
r = np.linspace(0, 5, 100)
linear = r**2
huber = r**2
huber[huber > 1] = 2 * r[huber > 1] - 1
cauchy = np.log1p(r**2)
arctan = np.arctan(r**2)
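# The soft-l1 curve and the plot itself (a sketch completing the snippet above):
soft_l1 = 2 * (np.sqrt(1 + r**2) - 1)
plt.plot(r, linear, label='linear')
plt.plot(r, huber, label='Huber')
plt.plot(r, soft_l1, label='soft_l1')
plt.plot(r, cauchy, label='Cauchy')
plt.plot(r, arctan, label='arctan')
plt.xlabel('$r$')
plt.ylabel(r'$\rho(r^2)$')
plt.legend(loc='upper left')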
Now we will show how robust loss functions work on a model example. We define the model function as
A = 2
sigma = 0.1
omega = 0.1 * 2 * np.pi
x_true = np.array([A, sigma, omega])
noise = 0.1
t_min = 0
t_max = 30
x0 = np.ones(3)
Run robust least squares with loss='soft_l1', set f_scale to 0.1 which means that inlier residuals are approx-
imately lower than 0.1.
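A sketch of the two calls (the residual function and the data arrays are assumed to be defined as in the original notebook; the names here are illustrative only):

from scipy.optimize import least_squares

res_lsq = least_squares(residuals, x0, args=(t_train, y_train))
res_robust = least_squares(residuals, x0, loss='soft_l1', f_scale=0.1,
                           args=(t_train, y_train))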
We clearly see that standard least squares reacts strongly to the outliers and gives a very biased solution, whereas robust least squares (with soft l1 loss) gives a solution very close to the true model.
Section author: Nikolay Mayorov
This cookbook example shows how to solve a system of differential equations. (Other examples include the Lotka-
Volterra Tutorial, the Zombie Apocalypse and the KdV example.)
This figure shows the system to be modeled:
Two objects with masses 𝑚1 and 𝑚2 are coupled through springs with spring constants 𝑘1 and 𝑘2 . The left end of the
left spring is fixed. We assume that the lengths of the springs, when subjected to no external forces, are 𝐿1 and 𝐿2 .
The masses are sliding on a surface that creates friction, so there are two friction coefficients, 𝑏1 and 𝑏2 .
The differential equations for this system are
𝑚1 𝑥′′1 + 𝑏1 𝑥′1 + 𝑘1 (𝑥1 − 𝐿1 ) − 𝑘2 (𝑥2 − 𝑥1 − 𝐿2 ) = 0
𝑚2 𝑥′′2 + 𝑏2 𝑥′2 + 𝑘2 (𝑥2 − 𝑥1 − 𝐿2 ) = 0
This is a pair of coupled second order equations. To solve this system with one of the ODE solvers provided by SciPy,
we must first convert this to a system of first order differential equations. We introduce two variables
𝑦1 = 𝑥′1
𝑦2 = 𝑥′2
These are the velocities of the masses.
With a little algebra, we can rewrite the two second order equations as the following system of four first order equa-
tions:
𝑥′1 = 𝑦1
𝑦1′ = (−𝑏1 𝑦1 − 𝑘1 (𝑥1 − 𝐿1 ) + 𝑘2 (𝑥2 − 𝑥1 − 𝐿2 ))/𝑚1
𝑥′2 = 𝑦2
𝑦2′ = (−𝑏2 𝑦2 − 𝑘2 (𝑥2 − 𝑥1 − 𝐿2 ))/𝑚2
These equations are now in a form that we can implement in Python.
The following code defines the “right hand side” of the system of equations (also known as a vector field). I have
chosen to put the function that defines the vector field in its own module (i.e. in its own file), but this is not necessary.
Note that the arguments of the function are configured to be used with the function odeint: the time t is the second
argument.
def vectorfield(w, t, p):
    """
    Defines the differential equations for the coupled spring-mass system.

    Arguments:
        w :  vector of the state variables:
                  w = [x1, y1, x2, y2]
        t :  time
        p :  vector of the parameters:
                  p = [m1, m2, k1, k2, L1, L2, b1, b2]
    """
    x1, y1, x2, y2 = w
    m1, m2, k1, k2, L1, L2, b1, b2 = p

    # Create f = (x1', y1', x2', y2'):
    f = [y1,
         (-b1 * y1 - k1 * (x1 - L1) + k2 * (x2 - x1 - L2)) / m1,
         y2,
         (-b2 * y2 - k2 * (x2 - x1 - L2)) / m2]
    return f
Next, here is a script that uses odeint to solve the equations for a given set of parameter values, initial conditions, and
time interval. The script writes the points to the file ‘two_springs.dat’.
# Use ODEINT to solve the differential equations defined by the vector field
from scipy.integrate import odeint
# Parameter values
# Masses:
m1 = 1.0
m2 = 1.5
# Spring constants
k1 = 8.0
k2 = 40.0
# Natural lengths
L1 = 0.5
L2 = 1.0
# Friction coefficients
b1 = 0.8
b2 = 0.5
# Initial conditions
# x1 and x2 are the initial displacements; y1 and y2 are the initial velocities
x1 = 0.5
y1 = 0.0
x2 = 2.25
y2 = 0.0
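# ODE solver parameters (the specific values here are illustrative assumptions)
abserr = 1.0e-8
relerr = 1.0e-6
stoptime = 10.0
numpoints = 250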
# Create the time samples for the output of the ODE solver.
# I use a large number of points, only because I want to make
# a plot of the solution that looks nice.
t = [stoptime * float(i) / (numpoints - 1) for i in range(numpoints)]
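# Pack up the parameters and initial conditions, call the solver and write the
# solution to 'two_springs.dat' (a sketch of the remaining steps):
p = [m1, m2, k1, k2, L1, L2, b1, b2]
w0 = [x1, y1, x2, y2]
wsol = odeint(vectorfield, w0, t, args=(p,), atol=abserr, rtol=relerr)

with open('two_springs.dat', 'w') as f:
    for t1, w1 in zip(t, wsol):
        f.write('%f %f %f %f %f\n' % (t1, w1[0], w1[1], w1[2], w1[3]))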
The following script uses Matplotlib to plot the solution generated by the above script.
xlabel('t')
grid(True)
hold(True)
lw = 1
Attachments
• two_springs_diagram.png
This page shows how the Korteweg-de Vries equation can be solved on a periodic domain using the method of lines,
with the spatial derivatives computed using the pseudo-spectral method. In this method, the derivatives are computed
in the frequency domain by first applying the FFT to the data, then multiplying by the appropriate values and converting
back to the spatial domain with the inverse FFT. This method of differentiation is implemented by the diff function in
the module scipy.fftpack.
We discretize the spatial domain, and compute the spatial derivatives using the diff function defined in the
scipy.fftpack module. In the following code, this function is given the alias psdiff to avoid confusing it with the
numpy function diff. By discretizing only the spatial dimension, we obtain a system of ordinary differential equations,
which is implemented in the function kdv(u, t, L). The function kdv_solution(u0, t, L) uses scipy.integrate.odeint
to solve this system.
#!python
import numpy as np
from scipy.integrate import odeint
from scipy.fftpack import diff as psdiff
def kdv_exact(x, c):
    """Profile of the exact single-soliton solution to the KdV equation."""
    u = 0.5*c*np.cosh(0.5*np.sqrt(c)*x)**(-2)
    return u

def kdv(u, t, L):
    """The KdV equation, discretized in x (the right-hand side for odeint)."""
    # Compute the x derivatives using the pseudo-spectral method.
    ux = psdiff(u, period=L)
    uxxx = psdiff(u, period=L, order=3)
    # Compute du/dt.
    dudt = -6*u*ux - uxxx
    return dudt

def kdv_solution(u0, t, L):
    """Use odeint to solve the KdV equation on a periodic domain."""
    return odeint(kdv, u0, t, args=(L,), mxstep=5000)
if __name__ == "__main__":
# Set the size of the domain, and create the discretized grid.
L = 50.0
N = 64
dx = L / (N - 1.0)
x = np.linspace(0, (1-1.0/N)*L, N)
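# A sketch of the missing setup: initial profile (two solitons), final time and
# time grid. The numerical values here are illustrative assumptions.
u0 = kdv_exact(x - 0.33*L, 0.75) + kdv_exact(x - 0.65*L, 0.4)
T = 200
t = np.linspace(0, T, 501)
sol = kdv_solution(u0, t, L)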
print "Plotting."
plt.figure(figsize=(6,5))
plt.imshow(sol[::-1, :], extent=[0,L,0,T])
plt.colorbar()
plt.xlabel('x')
plt.ylabel('t')
plt.axis('normal')
plt.title('Korteweg-de Vries on a Periodic Domain')
plt.show()
Attachments
• kdv.png
• kdv2.png
We will have a look at the Lotka-Volterra model, also known as the predator-prey equations, which is a pair of first
order, non-linear, differential equations frequently used to describe the dynamics of biological systems in which two
species interact, one a predator and the other its prey. The model was proposed independently by Alfred J. Lotka in
1925 and Vito Volterra in 1926, and can be described by
du/dt =  a*u -   b*u*v
dv/dt = -c*v + d*b*u*v
where:
• u is the number of prey (for example, rabbits)
• v is the number of predators (for example, foxes)
• a is the natural growing rate of rabbits when there are no foxes
• b is the natural dying rate of rabbits due to predation
• c is the natural dying rate of foxes when there are no rabbits
• d is the factor describing how many caught rabbits create a new fox
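Throughout, we will use the following definition of the system (a sketch; the numerical parameter values are assumptions chosen to be consistent with the Jacobian and eigenvalues printed further below):

#!python
from numpy import *
import pylab as p

a = 1.    # natural growing rate of rabbits
b = 0.1   # natural dying rate of rabbits due to predation
c = 1.5   # natural dying rate of foxes when there is no rabbit
d = 0.75  # factor describing how many caught rabbits create a new fox

def dX_dt(X, t=0):
    """ Return the growth rate of fox and rabbit populations. """
    return array([ a*X[0] -   b*X[0]*X[1] ,
                  -c*X[1] + d*b*X[0]*X[1] ])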
Population equilibrium
Before using !SciPy to integrate this system, we will have a closer look at position equilibrium. Equilibrium occurs
when the growth rate is equal to 0. This gives two fixed points:
#!python
X_f0 = array([ 0. , 0.])
X_f1 = array([ c/(d*b), a/b])
all(dX_dt(X_f0) == zeros(2) ) and all(dX_dt(X_f1) == zeros(2)) # => True
Near these two points, the system can be linearized: dX_dt = A_f*X where A is the Jacobian matrix evaluated at the
corresponding point. We have to define the Jacobian matrix:
#!python
def d2X_dt2(X, t=0):
""" Return the Jacobian matrix evaluated in X. """
return array([[a -b*X[1], -b*X[0] ],
[b*d*X[1] , -c +b*d*X[0]] ])
Near X_f0, the number of rabbits increases and the population of foxes decreases. The origin is therefore a saddle point. Near X_f1, we have:
#!python
A_f1 = d2X_dt2(X_f1) # >>> array([[ 0. , -2. ],
# [ 0.75, 0. ]])
# whose eigenvalues are +/- sqrt(c*a).j:
lambda1, lambda2 = linalg.eigvals(A_f1) # >>> (1.22474j, -1.22474j)
# They are imaginary numbers: the fox and rabbit populations are periodic, as
# follows from further analysis. Their period is given by T_f1 = 2*pi/abs(lambda1).
Now we will use the scipy.integrate module to integrate the ODEs. This module offers a method named odeint, which
is very easy to use to integrate ODEs:
#!python
from scipy import integrate
t = linspace(0, 15, 1000) # time
X0 = array([10, 5])                     # initial conditions: 10 rabbits and 5 foxes
X, infodict = integrate.odeint(dX_dt, X0, t, full_output=True)
infodict['message'] # >>> 'Integration successful.'
‘infodict‘ is optional, and you can omit the ‘full_output‘ argument if you don’t want it. Type “info(odeint)” if you
want more information about odeint inputs and outputs.
We can now use Matplotlib to plot the evolution of both populations:
#!python
rabbits, foxes = X.T
f1 = p.figure()
p.plot(t, rabbits, 'r-', label='Rabbits')
p.plot(t, foxes , 'b-', label='Foxes')
p.grid()
p.legend(loc='best')
p.xlabel('time')
p.ylabel('population')
p.title('Evolution of fox and rabbit populations')
f1.savefig('rabbits_and_foxes_1.png')
The populations are indeed periodic, and their period is close to the value T_f1 that we computed.
We will plot some trajectories in a phase plane for different starting points between X_f0 and X_f1.
We will use Matplotlib’s colormap to define colors for the trajectories. These colormaps are very useful to make nice
plots. Have a look at ShowColormaps if you want more information.
f2 = p.figure()
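# Starting points and colours for the trajectories (a sketch; the exact values
# are assumptions). Each trajectory starts at a scaled-down copy of X_f1.
values = linspace(0.3, 0.9, 5)
vcolors = p.cm.autumn_r(linspace(0.3, 1., len(values)))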
#-------------------------------------------------------
# plot trajectories
for v, col in zip(values, vcolors):
X0 = v * X_f1 # starting point
X = integrate.odeint( dX_dt, X0, t) # we don't need infodict here
p.plot( X[:,0], X[:,1], lw=3.5*v, color=col, label='X0=(%.f, %.f)' % ( X0[0],
˓→X0[1]) )
#-------------------------------------------------------
# define a grid and compute direction at each point
ymax = p.ylim(ymin=0)[1] # get axis limits
xmax = p.xlim(xmin=0)[1]
nb_points = 20
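# A sketch of the grid and direction-field computation (the original lines are
# not shown in this extraction):
x = linspace(0, xmax, nb_points)
y = linspace(0, ymax, nb_points)
X1, Y1 = meshgrid(x, y)           # create a grid
DX1, DY1 = dX_dt([X1, Y1])        # compute growth rate on the grid
M = hypot(DX1, DY1)               # norm of the growth rate
M[M == 0] = 1.                    # avoid division by zero
DX1 /= M                          # normalize each arrow
DY1 /= M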
#-------------------------------------------------------
# Draw direction fields, using matplotlib's quiver function.
# I choose to plot normalized arrows and to use colors to give information on
# the growth speed
p.title('Trajectories and direction fields')
Q = p.quiver(X1, Y1, DX1, DY1, M, pivot='mid', cmap=p.cm.jet)
p.xlabel('Number of rabbits')
p.ylabel('Number of foxes')
p.legend()
p.grid()
p.xlim(0, xmax)
p.ylim(0, ymax)
f2.savefig('rabbits_and_foxes_2.png')
This graph shows us that changing either the fox or the rabbit population can have an unintuitive effect. If, in order to
decrease the number of rabbits, we introduce foxes, this can lead to an increase of rabbits in the long run, depending
on the time of intervention.
We can verify that the function IF defined below remains constant along a trajectory:
#!python
def IF(X):
u, v = X
return u**(c/a) * v * exp( -(b/a)*(d*u+v) )
# We will verify that IF remains constant for different trajectories
for v in values:
X0 = v * X_f1 # starting point
X = integrate.odeint( dX_dt, X0, t)
Plotting iso-contours of IF can be a good representation of trajectories, without having to integrate the ODE
#!python
#-------------------------------------------------------
# plot iso contours
nb_points = 80 # grid size
x = linspace(0, xmax, nb_points)
y = linspace(0, ymax, nb_points)
X2 , Y2 = meshgrid(x, y) # create the grid
Z2 = IF([X2, Y2]) # compute IF on each point
f3 = p.figure()
CS = p.contourf(X2, Y2, Z2, cmap=p.cm.Purples_r, alpha=0.5)
CS2 = p.contour(X2, Y2, Z2, colors='black', linewidths=2. )
p.clabel(CS2, inline=1, fontsize=16, fmt='%.f')
p.grid()
p.xlabel('Number of rabbits')
p.ylabel('Number of foxes')
p.ylim(1, ymax)
p.xlim(1, xmax)
p.title('IF contours')
f3.savefig('rabbits_and_foxes_3.png')
p.show()
Attachments
• rabbits_and_foxes_1.png
• rabbits_and_foxes_1v2.png
• rabbits_and_foxes_2.png
• rabbits_and_foxes_2v2.png
• rabbits_and_foxes_2v3.png
• rabbits_and_foxes_3.png
• rabbits_and_foxes_3v2.png
• tutorial_lokta-voltera.py
• tutorial_lokta-voltera_v2.py
• tutorial_lokta-voltera_v3.py
• tutorial_lokta-voltera_v4.py
• zombie_nodead_nobirths.png
• zombie_somedead_10births.png
• zombie_somedead_nobirths.png
• zombie_somedeaddead_nobirths.png
This example demonstrates how to solve a system of first order ODEs using SciPy. Note that an Nth order equation can also be solved using SciPy by transforming it into a system of first order equations. In this lighthearted example, a system of ODEs is used to model a "zombie invasion", using the equations specified in Munz et al. 2009.
The system is given as:
dS/dt = P - B*S*Z - d*S
dZ/dt = B*S*Z + G*R - A*S*Z
dR/dt = d*S + A*S*Z - G*R
with the following notations:
• S: the number of susceptible victims
• Z: the number of zombies
• R: the number of people “killed”
• P: the population birth rate
• d: the chance of a natural death
• B: the chance the “zombie disease” is transmitted (an alive person becomes a zombie)
• G: the chance a dead person is resurrected into a zombie
• A: the chance a zombie is totally destroyed
This involves solving a system of first order ODEs given by: dy/dt = f(y, t)
Where y = [S, Z, R].
The code used to solve this system is below:
P = 0 # birth rate
d = 0.0001 # natural death percent (per day)
B = 0.0095 # transmission percent (per day)
G = 0.0001 # resurect percent (per day)
A = 0.0001 # destroy percent (per day)
# initial conditions
S0 = 500. # initial population
Z0 = 0 # initial zombie population
R0 = 0 # initial death population
y0 = [S0, Z0, R0] # initial condition vector
t = np.linspace(0, 5., 1000) # time grid
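# A sketch of the right-hand side and of the solve itself, following the
# equations above (the plotting below uses S and Z):
from scipy.integrate import odeint

def f(y, t):
    Si, Zi, Ri = y
    f0 = P - B*Si*Zi - d*Si
    f1 = B*Si*Zi + G*Ri - A*Si*Zi
    f2 = d*Si + A*Si*Zi - G*Ri
    return [f0, f1, f2]

soln = odeint(f, y0, t)
S, Z, R = soln[:, 0], soln[:, 1], soln[:, 2]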
# plot results
plt.figure()
plt.plot(t, S, label='Living')
plt.plot(t, Z, label='Zombies')
plt.xlabel('Days from outbreak')
plt.ylabel('Population')
plt.title('Zombie Apocalypse - No Init. Dead Pop.; No New Births.')
plt.legend(loc=0)
plt.figure()
plt.plot(t, S, label='Living')
plt.plot(t, Z, label='Zombies')
plt.xlabel('Days from outbreak')
plt.ylabel('Population')
plt.title('Zombie Apocalypse - 1% Init. Pop. is Dead; No New Births.')
plt.legend(loc=0)
plt.figure()
plt.plot(t, S, label='Living')
plt.plot(t, Z, label='Zombies')
plt.xlabel('Days from outbreak')
plt.ylabel('Population')
We consider a nonlinear elliptic boundary-value problem in a square domain Ω = [0, 1] × [0, 1]:
∆𝑢 + 𝑘𝑓 (𝑢) = 0
𝑢 = 0 on 𝜕Ω
Here 𝑢 = 𝑢(𝑥, 𝑦) is an unknown function, ∆ is the Laplace operator, 𝑘 is some constant and 𝑓 (𝑢) is a given function.
A usual computational approach to such problems is discretization. We use a uniform grid with some step ℎ, and define
𝑢𝑖,𝑗 = 𝑢(𝑖ℎ, 𝑗ℎ). By approximating Laplace operator using 5-point finite difference we get a system of equations in
the form:
𝑢𝑖−1,𝑗 + 𝑢𝑖+1,𝑗 + 𝑢𝑖,𝑗−1 + 𝑢𝑖,𝑗+1 − 4𝑢𝑖,𝑗 + 𝑐𝑓 (𝑢𝑖,𝑗 ) = 0
𝑢𝑖,𝑗 = 0 on 𝜕Ω
Here 𝑐 = 𝑘ℎ2 .
From now on we focus on the discrete version and consider a grid with 𝑛 = 100 ticks for each dimension and set
𝑐 = 1 and 𝑓 (𝑢) = 𝑢3 .
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import least_squares
from scipy.sparse import coo_matrix
n = 100
c = 1
def f(u):
return u**3
def f_prime(u):
return 3 * u**2
It is always recommended to provide an analytical Jacobian if possible. In our problem we have 𝑛2 = 10000 equations and variables, but each equation depends only on a few variables, so we should compute the Jacobian in a sparse format.
It is convenient to precompute the indices of the rows and columns of the nonzero elements of the Jacobian. We define the corresponding function:
def compute_jac_indices(n):
    i = np.arange(n)
    jj, ii = np.meshgrid(i, i)
    ii = ii.ravel()
    jj = jj.ravel()
    ij = np.arange(n**2)

    jac_rows = [ij]
    jac_cols = [ij]

    mask = ii > 0
    ij_mask = ij[mask]
    jac_rows.append(ij_mask)
    jac_cols.append(ij_mask - n)

    mask = ii < n - 1
    ij_mask = ij[mask]
    jac_rows.append(ij_mask)
    jac_cols.append(ij_mask + n)

    mask = jj > 0
    ij_mask = ij[mask]
    jac_rows.append(ij_mask)
    jac_cols.append(ij_mask - 1)

    mask = jj < n - 1
    ij_mask = ij[mask]
    jac_rows.append(ij_mask)
    jac_cols.append(ij_mask + 1)

    return np.hstack(jac_rows), np.hstack(jac_cols)
Without any insight into the problem, we set all the values to 0.5 initially. Note that it is not guaranteed that the given continuous or discrete problems have a unique solution.
u0 = np.ones(n**2) * 0.5
Now we are ready to run the optimization. The first solution will be computed without imposing any bounds on the
variables.
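A sketch of the residual function, its sparse Jacobian and the first solve (the keyword choices such as gtol are assumptions, not necessarily those of the original notebook):

jac_rows, jac_cols = compute_jac_indices(n)

def fun(u):
    # Residuals of the discretized equation: the 5-point Laplacian plus c*f(u),
    # with zero boundary values supplied by the padded array v.
    v = np.zeros((n + 2, n + 2))
    v[1:-1, 1:-1] = u.reshape((n, n))
    y = (v[:-2, 1:-1] + v[2:, 1:-1] + v[1:-1, :-2] + v[1:-1, 2:]
         - 4 * v[1:-1, 1:-1] + c * f(v[1:-1, 1:-1]))
    return y.ravel()

def jac(u):
    # Sparse Jacobian: off-diagonal entries are 1, diagonal entries are -4 + c*f'(u).
    values = np.ones_like(jac_cols, dtype=float)
    values[:n**2] = -4 + c * f_prime(u)
    return coo_matrix((values, (jac_rows, jac_cols)), shape=(n**2, n**2))

res_1 = least_squares(fun, u0, jac=jac, gtol=1e-3, verbose=1)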
Below we visualize the first solution. The left plot shows the flattened solution, the middle plot shows how the solution looks in the square domain, and the right plot shows the final residuals in each node.
plt.figure(figsize=(16, 5))
plt.subplot(132)
plt.imshow(res_1.x.reshape((n, n)), cmap='coolwarm', vmin=-max(abs(res_1.x)), vmax=max(abs(res_1.x)))
It is possible that some physical considerations require that the solution must be non-negative everywhere. We can
achieve that by specifying bounds to the solver:
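A sketch of the bounded solve (the lower bound forces the solution to be non-negative):

res_2 = least_squares(fun, u0, jac=jac, bounds=(0, np.inf), gtol=1e-3, verbose=1)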
plt.figure(figsize=(16, 5))
plt.subplot(132)
plt.imshow(res_2.x.reshape((n, n)), cmap='Reds')
plt.colorbar(use_gridspec=True, fraction=0.046, pad=0.04)
plt.subplot(131)
plt.plot(res_2.x)
plt.subplot(133)
plt.plot(res_2.fun)
plt.tight_layout()
We see that setting a lower bound allowed us to find a different non-negative solution.
You can try running optimization from different starting points, using different bounds or changing 𝑐 and 𝑓 (𝑢), and
see how it affects the result from least_squares solver.
Section author: Nikolay Mayorov
A simple script that recreates the min/max bifurcation diagrams from Hastings and Powell 1991.
Two useful functions are defined in the module bif.py.
import numpy

def window(data, size):
    """A generator that yields a moving window of length `size` over `data`.
    """
    for start in range(len(data) - (size - 1)):
        yield data[start:(start + size)]

def min_max(data, tol=1e-14):
    """Return the local minima and maxima of `data`; an equilibrium counts as both.
    """
    maxes = []
    mins = []
    for first, second, third in window(data, size=3):
        if first < second and third < second:
            maxes.append(second)
        elif first > second and third > second:
            mins.append(second)
        elif abs(first - second) < tol and abs(second - third) < tol:
            # an equilibrium is both the maximum and minimum
            maxes.append(second)
            mins.append(second)
    return mins, maxes
For speed the model is defined in a fortran file and compiled into a library for use from python. Using this method
gives a 100 fold increase in speed. The file uses Fortran 90, which makes using f2py especially easy. The file is named
hastings.f90.
module model
implicit none
contains
Which is compiled (using the gfortran compiler) with the command: f2py -c -m hastings hastings.f90
--fcompiler=gnu95
import numpy
from scipy.integrate import odeint
import bif
import hastings
print_max(x_minmax, x_maxfile)
print_max(y_minmax, y_maxfile)
print_max(z_minmax, z_maxfile)
Other examples
Brownian Motion
Brownian motion is a stochastic process. One form of the equation for Brownian motion is
X(0) = X_0
X(t + dt) = X(t) + N(0, delta**2 * dt; t, t + dt)
where 𝑁 (𝑎, 𝑏; 𝑡1 , 𝑡2 ) is a normally distributed random variable with mean a and variance b. The parameters 𝑡1 and 𝑡2
make explicit the statistical independence of N on different time intervals; that is, if [𝑡1 , 𝑡2 ) and [𝑡3 , 𝑡4 ) are disjoint
intervals, then 𝑁 (𝑎, 𝑏; 𝑡1 , 𝑡2 ) and 𝑁 (𝑎, 𝑏; 𝑡3 , 𝑡4 ) are independent.
The calculation is actually very simple. A naive implementation that prints n steps of the Brownian motion might look
like this:
# Process parameters
delta = 0.25
dt = 0.1
# Initial condition.
x = 0.0
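# A sketch of the iteration itself (20 steps, matching the printed output below;
# each step is a normal sample with standard deviation delta*sqrt(dt)):
from scipy.stats import norm
n = 20
for k in range(n):
    x = x + norm.rvs(scale=delta * dt**0.5)
    print x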
0.0149783802189
0.0153383445186
0.0234318982959
0.0212808305024
0.0114258184942
0.01354252966
0.024915691657
0.0225717389674
0.0202001899576
0.0210395736257
0.0346234119557
0.0315723937897
0.0296012566384
0.0296135943506
0.0198499273499
0.0165205162055
0.00688369775327
0.0069949507719
0.00888200058681
0.00297662267152
The above code could be easily modified to save the iterations in an array instead of printing them.
The problem with the above code is that it is slow. If we want to compute a large number of iterations, we can do
much better. The key is to note that the calculation is the cumulative sum of samples from the normal distribution.
A fast version can be implemented by first generating all the samples from the normal distribution with one call to
scipy.stats.norm.rvs(), and then using the numpy cumsum function to form the cumulative sum.
The following function uses this idea to implement the function brownian(). The function allows the initial condition
to be an array (or anything that can be converted to an array). Each element of x0 is treated as an initial condition for
a Brownian motion.
"""
brownian() implements one dimensional Brownian motion (i.e. the Wiener process).
"""
# File: brownian.py

from math import sqrt
import numpy as np
from scipy.stats import norm

def brownian(x0, n, dt, delta, out=None):
    """
    Generate an instance of Brownian motion (i.e. the Wiener process):

        X(t) = X(0) + N(0, delta**2 * t; 0, t)

    where N(a,b; t0, t1) is a normally distributed random variable with mean a and
    variance b.  The parameters t0 and t1 make explicit the statistical
    independence of N on different time intervals; that is, if [t0, t1) and
    [t2, t3) are disjoint intervals, then N(a, b; t0, t1) and N(a, b; t2, t3)
    are independent.

    Arguments
    ---------
    x0 : float or numpy array (or something that can be converted to a numpy array
         using numpy.asarray(x0)).
        The initial condition(s) (i.e. position(s)) of the Brownian motion.
    n : int
        The number of steps to take.
    dt : float
        The time step.
    delta : float
        delta determines the "speed" of the Brownian motion. The random variable
        of the position at time t, X(t), has a normal distribution whose mean is
        the position at time t=0 and whose variance is delta**2*t.
    out : numpy array or None
        If `out` is not None, it specifies the array in which to put the
        result. If `out` is None, a new numpy array is created and returned.
        Note that the initial value `x0` is not included in the returned array.
    """
    x0 = np.asarray(x0)

    # For each element of x0, generate a sample of n numbers from a
    # normal distribution.
    r = norm.rvs(size=x0.shape + (n,), scale=delta*sqrt(dt))

    # If `out` was not given, create an output array.
    if out is None:
        out = np.empty(r.shape)

    # Compute the Brownian motion as the cumulative sum of the random samples.
    np.cumsum(r, axis=-1, out=out)

    # Add the initial condition.
    out += np.expand_dims(x0, axis=-1)

    return out
Example
Here’s a script that uses this function and matplotlib’s pylab module to plot several realizations of Brownian motion.
import numpy
from pylab import plot, show, grid, xlabel, ylabel
2D Brownian Motion
The same function can be used to generate Brownian motion in two dimensions, since each dimension is just a one-
dimensional Brownian motion.
The following script provides a demo.
import numpy
from pylab import plot, show, grid, axis, xlabel, ylabel, title
# Total time.
T = 10.0
# Number of steps.
N = 500
# Time step size
dt = T/N
# Initial values of x.
x = numpy.empty((2,N+1))
x[:, 0] = 0.0
*Note:* This cookbook entry shows how to generate random samples from a multivariate normal distribution using
tools from SciPy, but in fact NumPy includes the function ‘numpy.random.multivariate_normal‘ to accomplish the
same task.
To generate correlated normally distributed random samples, one can first generate uncorrelated samples, and then
multiply them by a matrix C such that 𝐶𝐶 𝑇 = 𝑅, where R is the desired covariance matrix. C can be created, for
example, by using the Cholesky decomposition of R, or from the eigenvalues and eigenvectors of R.
"""Example of generating correlated normally distributed random samples."""
import numpy as np
from scipy.linalg import eigh, cholesky
from scipy.stats import norm
from pylab import plot, show, axis, subplot, xlabel, ylabel, grid
num_samples = 400
# We need a matrix `c` for which `c*c^T = r`. We can use, for example,
# the Cholesky decomposition, or we can construct `c` from the
# eigenvectors and eigenvalues.
if method == 'cholesky':
# Compute the Cholesky decomposition.
c = cholesky(r, lower=True)
else:
# Compute the eigenvalues and eigenvectors.
evals, evecs = eigh(r)
# Construct c, so c*c^T = r.
c = np.dot(evecs, np.diag(np.sqrt(evals)))
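# Generate uncorrelated samples and transform them (a sketch of the remaining
# steps; `r` is the desired 3x3 covariance matrix, assumed to be defined earlier):
x = norm.rvs(size=(3, num_samples))   # independent standard normal samples
y = np.dot(c, x)                      # correlated samples with covariance c*c^T = r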
#
# Plot various projections of the samples.
#
subplot(2,2,1)
plot(y[0], y[1], 'b.')
ylabel('y[1]')
axis('equal')
grid(True)
subplot(2,2,3)
plot(y[0], y[2], 'b.')
xlabel('y[0]')
ylabel('y[2]')
axis('equal')
grid(True)
subplot(2,2,4)
plot(y[1], y[2], 'b.')
xlabel('y[1]')
axis('equal')
grid(True)
show()
Attachments
• correlated_random_vars.png
Easy multithreading
Python includes a multithreading package, “threading”, but python’s multithreading is seriously limited by the Global
Interpreter Lock, which allows only one thread to be interacting with the interpreter at a time. For purely interpreted
code, this makes multithreading effectively cooperative and unable to take advantage of multiple cores.
However, numpy code often releases the GIL while it is calculating, so that simple parallelism can speed up the
code. For sophisticated applications, one should look into MPI or using threading directly, but surprisingly often one’s
application is “embarrassingly parallel”, that is, one simply has to do the same operation to many objects, with no
interaction between iterations. This kind of calculation can be easily parallelized:
dft = parallel_map(lambda f: sum(exp(2.j*pi*f*times)), frequencies)
The code implementing parallel_map is not too complicated, and is attached to this entry. Even simpler, if one doesn’t
want to return values:
def compute(n):
...do something...
foreach(compute, range(100))
Attachments
• handythread.py
• test_handythread.py
Eye Diagram
use_fast = True
try:
from brescount import bres_curve_count
except ImportError:
print "The cython version of the curve counter is not available."
use_fast = False
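# (The definition of the pure-Python fallback was elided in this excerpt;
#  an assumed reconstruction of its header, matching the loop below:)
def bres_segment_count_slow(x0, y0, x1, y1, grid):
    """Bresenham's algorithm: count, in grid, the pixels crossed by the
    segment from (x0, y0) to (x1, y1)."""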
dx = abs(x1 - x0)
dy = abs(y1 - y0)
sx = 0
if x0 < x1:
sx = 1
else:
sx = -1
sy = 0
if y0 < y1:
sy = 1
else:
sy = -1
err = dx - dy
while True:
# Note: this test is moved before setting
# the value, so we don't set the last point.
if x0 == x1 and y0 == y1:
break
e2 = 2 * err
if e2 > -dy:
err -= dy
x0 += sx
if e2 < dx:
err += dx
y0 += sy
def random_trace(t):
s = 2*(np.random.randint(0, 5) % 2) - 1
r = 0.01 * np.random.randn()
s += r
a = 2.0 + 0.001 * np.random.randn()
q = 2*(np.random.randint(0, 7) % 2) - 1
t2 = t + q*(6 + 0.01*np.random.randn())
t2 += 0.05*np.random.randn()*t
y = a * (np.exp(s*t2) / (1 + np.exp(s*t2)) - 0.5) + 0.07*np.random.randn()
return y
if __name__ == "__main__":
import matplotlib.pyplot as plt
grid_size = 600
grid = np.zeros((grid_size, grid_size), dtype=np.int32)
tmin = -10.0
tmax = 10.0
n = 81
t = np.linspace(tmin, tmax, n)
dt = (tmax - tmin) / (n - 1)
ymin = -1.5
ymax = 1.5
num_traces = 1000
for k in range(num_traces):
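    # (Elided in this excerpt: jitter the sample times and map them onto
    #  integer grid columns.  An assumed reconstruction:)
    steps = dt + np.sqrt(0.01 * dt) * np.random.randn(n)
    t2 = tmin + np.cumsum(steps)
    td = (((t2 - tmin) / (tmax - tmin)) * (grid_size - 1)).astype(np.int32)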
y = random_trace(t2)
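    # (Elided: map y onto integer grid rows; assumed reconstruction.)
    yd = (((y - ymin) / (ymax - ymin)) * (grid_size - 1)).astype(np.int32)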
if use_fast:
bres_curve_count(td, yd, grid)
else:
bres_curve_count_slow(td, yd, grid)
plt.figure()
# Convert to float32 so we can use nan instead of 0.
grid = grid.astype(np.float32)
grid[grid==0] = np.nan
plt.grid(color='w')
plt.imshow(grid.T[::-1,:], extent=[0,1,0,1], cmap=plt.cm.coolwarm,
interpolation='gaussian')
ax = plt.gca()
ax.set_axis_bgcolor('k')
ax.set_xticks(np.linspace(0,1,11))
ax.set_yticks(np.linspace(0,1,11))
ax.set_xticklabels([])
ax.set_yticklabels([])
plt.colorbar()
fig = plt.gcf()
#plt.savefig("eye-diagram.png", bbox_inches='tight')
plt.show()
#!python
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False)
cdef int bres_segment_count(unsigned x0, unsigned y0,
unsigned x1, unsigned y1,
np.ndarray[np.int32_t, ndim=2] grid):
"""Bresenham's algorithm.
See http://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm
"""
nrows = grid.shape[0]
ncols = grid.shape[1]
if x1 > x0:
dx = x1 - x0
else:
dx = x0 - x1
if y1 > y0:
dy = y1 - y0
else:
dy = y0 - y1
sx = 0
if x0 < x1:
sx = 1
else:
sx = -1
sy = 0
if y0 < y1:
sy = 1
else:
sy = -1
err = dx - dy
while True:
# Note: this test occurs before incrementing the
# grid value, so we don't count the last point.
if x0 == x1 and y0 == y1:
break
e2 = 2 * err
if e2 > -dy:
err -= dy
x0 += sx
if e2 < dx:
err += dx
y0 += sy
return 0
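# (The signature of the curve-level counter was elided in this excerpt;
#  an assumed reconstruction -- the loop below is the original body.)
@cython.boundscheck(False)
def bres_curve_count(np.ndarray[np.int32_t, ndim=1] x,
                     np.ndarray[np.int32_t, ndim=1] y,
                     np.ndarray[np.int32_t, ndim=2] grid):
    """Count, in grid, the pixels crossed by the piecewise-linear curve (x, y)."""
    cdef unsigned k
    cdef int x0, y0, x1, y1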
for k in range(len(x)-1):
x0 = x[k]
y0 = y[k]
x1 = x[k+1]
y1 = y[k+1]
bres_segment_count(x0, y0, x1, y1, grid)
if 0 <= x1 < grid.shape[0] and 0 <= y1 < grid.shape[1]:
grid[x1, y1] += 1
This file, setup.py, can be used to build the Cython extension module:
#!python
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy
# The Extension object was elided in this excerpt; assuming the Cython
# source file is named brescount.pyx (it provides bres_curve_count above):
ext = Extension("brescount", ["brescount.pyx"],
                include_dirs=[numpy.get_include()])

setup(ext_modules=[ext],
      cmdclass={'build_ext': build_ext})
Attachments
• eye-diagram3.png
Recursively eliminates points that lie inside two neighbouring points until only the convex hull is remaining.
:Parameters:
points : ndarray (2 x m)
array of points for which to find hull
graphic : bool
use pylab to show progress?
smidgen : float
offset for graphic number labels - useful values depend on your data range
:Returns:
hull_points : ndarray (2 x n)
convex hull surrounding points
'''
if graphic:
p.clf()
p.plot(points[0], points[1], 'ro')
n_pts = points.shape[1]
assert(n_pts > 5)
centre = points.mean(1)
if graphic: p.plot((centre[0],),(centre[1],),'bo')
angles = n.apply_along_axis(_angle_to_point, 0, points, centre)
pts_ord = points[:,angles.argsort()]
if graphic:
for i in xrange(n_pts):
p.text(pts_ord[0,i] + smidgen, pts_ord[1,i] + smidgen, \
'%d' % i)
pts = [x[0] for x in zip(pts_ord.transpose())]
prev_pts = len(pts) + 1
k = 0
while prev_pts > n_pts:
prev_pts = n_pts
n_pts = len(pts)
if graphic: p.gca().patches = []
i = -2
while i < (n_pts - 2):
Aij = area_of_triangle(centre, pts[i], pts[(i + 1) % n_pts])
Ajk = area_of_triangle(centre, pts[(i + 1) % n_pts], \
pts[(i + 2) % n_pts])
Aik = area_of_triangle(centre, pts[i], pts[(i + 2) % n_pts])
if graphic:
_draw_triangle(centre, pts[i], pts[(i + 1) % n_pts], \
facecolor='blue', alpha = 0.2)
_draw_triangle(centre, pts[(i + 1) % n_pts], \
pts[(i + 2) % n_pts], \
facecolor='green', alpha = 0.2)
_draw_triangle(centre, pts[i], pts[(i + 2) % n_pts], \
facecolor='red', alpha = 0.2)
if Aij + Ajk < Aik:
if graphic: p.plot((pts[i + 1][0],),(pts[i + 1][1],),'go')
del pts[i+1]
i += 1
n_pts = len(pts)
k += 1
return n.asarray(pts)
if __name__ == "__main__":
points = n.random.random_sample((2,40))
hull_pts = convex_hull(points)
Finding the minimum point in the convex hull of a finite set of points
Based on the work of Philip Wolf [1] and the recursive algorithm of Kazuyuki Sekitani and Yoshitsugu Yamamoto [2].
The algorithm in [2] uses three epsilons to avoid comparison problems in three parts of the algorithm. The code below has a few changes and only one epsilon; the aim of the changes is to avoid infinite loops.
from numpy import array, matrix, sin, sqrt, dot, cos, ix_, zeros, concatenate, abs, log10, exp, ones
def find_min_point(P):
# print "Calling find_min with P: ", P
if len(P) == 1:
return P[0]
eps = mpf(10)**-40   # mpf: an arbitrary-precision float (e.g. from the mpmath package)
while True:
if len(Pk) == len(P):
return array([float(i) for i in x])
y = find_min_point(Pk)
# Step 4.
P_aux = [p for p in P_Pk if (dot(y-x,y-p)>eps) and (dot(x,y-p)!=0)]
p_lambda = P_aux[array([dot(y,y-p)/dot(x,y-p) for p in P_aux]).argmin()]
lam = dot(x,p_lambda-y) / dot(y-x,y-p_lambda)
x += lam * (y-x)
if __name__ == '__main__':
print find_min_point([array([-4.83907292e+00, 2.22438863e+04, -2.67496763e+04]),
                      array([9.71147604, -351.46404195, -292.18064276]),
                      array([4.60452808e+00, 1.07020174e+05, -1.25310230e+05]),
                      array([2.16080134e+00, 5.12019937e+04, -5.96167833e+04]),
                      array([2.65472146e+00, 6.70546443e+04, -7.71619656e+04]),
                      array([1.55775358e+00, -1.34347516e+05, 1.53209265e+05]),
                      array([13.22464295, 1869.01251292, -2137.61850989])])
KDTree example
Building a kd-tree
#!python numbers=disable
#import psyco
#psyco.full()
import numpy
input:
data: 2D ndarray, shape =(ndim,ndata), preferentially C order
leafsize: max. number of data points to leave in a leaf
output:
kd-tree: list of tuples
"""
ndim = data.shape[0]
ndata = data.shape[1]
left_hrect = hrect.copy()
right_hrect = hrect.copy()
left_hrect[1, 0] = splitval
right_hrect[0, 0] = splitval
# leaf node?
if ndata <= leafsize:
_didx = didx.copy()
_data = data.copy()
leaf = (_didx, _data, None, None, 0, 0)
tree.append(leaf)
return tree
Searching a kd-tree
#!python numbers=disable
# leaf
if leaf_idx is not None:
_knn = quadratic_knn_search(datapoint, leaf_idx, leaf_data, K)
if _knn[0][0] < knn[-1][0]:
knn = sorted(knn + _knn)[:K]
# not a leaf
else:
ndata = data.shape[1]
param = data.shape[0]
# build kdtree
tree = kdtree(data.copy(), leafsize=leafsize)
# search kdtree
knn = []
for i in numpy.arange(ndata):
_data = data[:,i].reshape((param,1)).repeat(leafsize, axis=1);
_knn = search_kdtree(tree, _data, K+1)
knn.append(_knn[1:])
return knn
# leaf
if leaf_idx is not None:
param=leaf_data.shape[0]
distance = numpy.sqrt(((leaf_data - datapoint.reshape((param,1)))**2).
˓→sum(axis=0))
near = numpy.where(distance<=radius)
if len(near[0]):
idx = leaf_idx[near]
distance = distance[near]
inside += (zip(distance, idx))
else:
return inside
In contrast to the kd-tree, straightforward exhaustive search has quadratic complexity with respect to sample size. It can be faster than using a kd-tree when the sample size is very small; on my computer that is approximately 500 samples or fewer.
#!python numbers=disable
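# (The brute-force search was elided from this excerpt.  A minimal sketch
#  of exhaustive K-nearest-neighbour search, quadratic in the sample size;
#  this is an assumed reconstruction, not the attached original.)
def knn_search_brute(data, K):
    ndata = data.shape[1]
    knn = []
    for i in numpy.arange(ndata):
        # squared Euclidean distances from point i to every point
        d = ((data - data[:, i].reshape((-1, 1)))**2).sum(axis=0)
        idx = d.argsort()[1:K+1]              # skip the point itself
        knn.append(zip(numpy.sqrt(d[idx]), idx))
    return knn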
While creating a kd-tree is very fast, searching it can be time consuming. Due to Python’s dreaded “Global Interpreter
Lock” (GIL), threads cannot be used to conduct multiple searches in parallel. That is, Python threads can be used for
asynchrony but not concurrency. However, we can use multiple processes (multiple interpreters). The pyprocessing
package makes this easy. It has an API similar to Python's threading and Queue standard modules, but works with processes instead of threads. Beginning with Python 2.6, pyprocessing is already included in Python's standard library
as the “multiprocessing” module. There is a small overhead of using multiple processes, including process creation,
process startup, IPC, and process termination. However, because processes run in separate address spaces, no memory
contention is incurred. In the following example, the overhead of using multiple processes is very small compared to
the computation, giving a speed-up close to the number of CPUs on the computer.
#!python numbers=disable
try:
import multiprocessing as processing
except:
import processing
import ctypes, os
def __num_processors():
if os.name == 'nt': # Windows
return int(os.getenv('NUMBER_OF_PROCESSORS'))
else: # glibc (Linux, *BSD, Apple)
get_nprocs = ctypes.cdll.libc.get_nprocs
get_nprocs.restype = ctypes.c_int
get_nprocs.argtypes = []
return get_nprocs()
ndata = data.shape[1]
param = data.shape[0]
nproc = __num_processors()
# build kdtree
tree = kdtree(data.copy(), leafsize=leafsize)
# compute chunk size
chunk_size = data.shape[1] / (4*nproc)
chunk_size = 100 if chunk_size < 100 else chunk_size
# set up a pool of processes
qin = processing.Queue(maxsize=ndata/chunk_size)
qout = processing.Queue(maxsize=ndata/chunk_size)
pool = [processing.Process(target=__remote_process,
args=(rank, qin, qout, tree, K, leafsize))
for rank in range(nproc)]
for p in pool: p.start()
# put data chunks in input queue
cur, nc = 0, 0
while 1:
_data = data[:,cur:cur+chunk_size]
if _data.shape[1] == 0: break
qin.put((nc,_data))
cur += chunk_size
nc += 1
# read output queue
knn = []
while len(knn) < nc:
knn += [qout.get()]
# avoid race condition
_knn = [n for i,n in sorted(knn)]
knn = []
for tmp in _knn:
knn += tmp
# terminate workers
for p in pool: p.terminate()
return knn
The following shows how to run the example code (including how input data should be formatted):
#!python numbers=disable
def test():
K = 11
ndata = 10000
ndim = 12
data = 10 * numpy.random.rand(ndata*ndim).reshape((ndim,ndata) )
knn_search(data, K)
if __name__ == '__main__':
    from time import clock     # clock was not imported in this excerpt
    t0 = clock()
    test()
    t1 = clock()
    print "Elapsed time %.2f seconds" % (t1-t0)
Line integral convolution is a technique, or family of techniques, for representing two-dimensional vector fields. The
idea is to produce a texture which is highly correlated in the direction of the vector field but not correlated across
the vector field. This is done by generating a noise texture then, for each pixel of the image, “flowing” forward and
back along the vector field. The points along this path are looked up in the noise texture and averaged to give the LIC
texture at the starting point. The basic technique ignores both the magnitude of the vector field and its sign. With a
minor modification the same technique can be used to produce an animation of “flow” along the vector field.
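To make the idea concrete, here is a tiny pure-numpy sketch of the basic operator (the attached lic_internal.pyx is a faster Cython implementation; the fixed-step integration and the function name below are illustrative assumptions):

import numpy as np

def lic_simple(u, v, texture, length=10):
    """For each pixel, walk forward and backward along the normalized vector
    field (u, v) and average the noise texture sampled along the path."""
    ny, nx = texture.shape
    mag = np.hypot(u, v)
    mag[mag == 0] = 1.0                      # avoid division by zero
    un, vn = u / mag, v / mag                # unit direction field
    out = np.zeros_like(texture)
    for j in range(ny):
        for i in range(nx):
            total, count = 0.0, 0
            for sign in (1.0, -1.0):         # flow forward, then backward
                x, y = float(i), float(j)
                for _ in range(length):
                    xi, yi = int(round(x)), int(round(y))
                    if not (0 <= xi < nx and 0 <= yi < ny):
                        break
                    total += texture[yi, xi]
                    count += 1
                    x += sign * un[yi, xi]
                    y += sign * vn[yi, xi]
            out[j, i] = total / max(count, 1)
    return out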
Attached to this page is cython code to implement a simple line integral convolution operator, plus some demonstration python code. The demo code can either produce more or less the image above - a simple array of vortices (note how an overall rotation appears in the sum of individual vortex vector fields, just as a superfluid's "bulk rotation" is actually a vortex array) - or it can make a video of the same vector field. The video is a little awkward to work with, since all the standard video compression techniques butcher it horribly, but it does work well.
Section author: AMArchibald
Attachments
• flow-image.png
• lic.py
• lic_demo.py
• lic_internal.pyx
• setup.py
Linear classification
The first example shows the implementation of Fisher's Linear Classifier for a 2-class problem; this algorithm is precisely described in the book "Pattern Recognition and Machine Learning" by Christopher M. Bishop (p. 186, Section 4.1). The main idea of this algorithm is that we try to reduce the dimensionality of the input vector X and project it onto a 1D space using the equation y = W.T X, where W.T is a row vector of weights; we adjust the weight vector W and choose the projection that maximizes the class separation. The following program uses the famous Iris data set, with 150 instances and 4 attributes (a 4D space), and a target vector containing the labels "Iris-setosa", "Iris-virginica" and "Iris-versicolor". We therefore have 3 classes, but in this case we may assume that we have class 1 with the label "Iris-setosa" and class 2 with all other instances. The Iris data set is available here: http://archive.ics.uci.edu/ml/datasets/Iris/ or here (comma separated format) - bezdekIris.data.txt
#! python
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
def read_data():
f=open("Iris.txt", 'r')
lines=[line.strip() for line in f.readlines()]
f.close()
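    # (The rest of read_data was elided in this excerpt.  A plausible
    #  reconstruction: each line is "sl,sw,pl,pw,label"; class 1 is
    #  "Iris-setosa", class 2 is everything else.)
    class1 = np.array([[float(v) for v in line.split(',')[:4]]
                       for line in lines if line.endswith('Iris-setosa')])
    class2 = np.array([[float(v) for v in line.split(',')[:4]]
                       for line in lines
                       if line and not line.endswith('Iris-setosa')])
    return class1, class2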
def main():
class1, class2=read_data()
mean1=np.mean(class1, axis=0)
mean2=np.mean(class2, axis=0)
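    # (The discriminant computation was elided here.  A sketch, not the
    #  attached original: the Fisher direction is w = Sw^-1 (mean1 - mean2),
    #  with Sw the total within-class covariance.)
    Sw = np.cov(class1, rowvar=0) + np.cov(class2, rowvar=0)
    w = np.linalg.solve(Sw, mean1 - mean2)
    # project each class onto the 1D direction w and plot the projections
    plt.hist(np.dot(class1, w), alpha=0.5, label='Iris-setosa')
    plt.hist(np.dot(class2, w), alpha=0.5, label='others')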
plt.legend()
plt.show()
main()
This program is the implementation of a Probabilistic Generative Model for a K-class problem, which is also described in the book "Pattern Recognition and Machine Learning" by Christopher M. Bishop (p. 196, Section 4.2). We try to learn the class-conditional densities (likelihoods) p(x|Ck) for each class k and the prior probability density p(Ck); then we can compute the posterior probability p(Ck|x) by using Bayes' rule. Here we assume that the p(x|Ck) are 4D Gaussians with parameters uk, the mean vector of class k, and Sk, the covariance matrix of class k, and that p(Ck) is 1/3 for all k. We then compute the so-called quantities ak (the variables pc in the program), and if ak >> aj for all k != j, we assign p(Ck|x) = 1 and p(Cj|x) = 0.
#! python
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import math
def read_data():
f=open("Iris.txt", 'r')
lines=[line.strip() for line in f.readlines()]
f.close()
return px
def main():
class1, class2, class3, data, t=read_data()
count=np.zeros(shape=(150,1))
t_assigned=np.zeros(shape=(150, 3))
cov=np.zeros(shape=(3, 4, 4))
mean=np.zeros(shape=(3, 4))
cov1=np.cov(class1, rowvar=0)
cov2=np.cov(class2, rowvar=0)
cov3=np.cov(class3, rowvar=0)
pc1=((math.exp(px1)*math.exp(-m))*math.exp(m))/((math.exp(px2)*math.exp(-m)+math.exp(px3)*math.exp(-m))*math.exp(m))
pc2=((math.exp(px2)*math.exp(-m))*math.exp(m))/((math.exp(px1)*math.exp(-m)+math.exp(px3)*math.exp(-m))*math.exp(m))
pc3=((math.exp(px3)*math.exp(-m))*math.exp(m))/((math.exp(px1)*math.exp(-m)+math.exp(px2)*math.exp(-m))*math.exp(m))
t1=np.zeros(shape=(len(class1), 1))
t2=np.zeros(shape=(len(class2), 1))
t3=np.zeros(shape=(len(class3), 1))
for i in xrange(len(data)):
for j in xrange(len(class1)):
if t_assigned[i][0]==1: t1[j]=1
elif t_assigned[i][1]==1: t2[j]=2
elif t_assigned[i][2]==1: t3[j]=3
main()
This program resulted in 3 misclassifications out of the 150 instances.
Section author: DmitriyRybalkin
Attachments
• Fisher_discriminant.png
• Fisher_disrciminant.JPG
• Fisher_disrciminant.PNG
• Probabilistic_model.PNG
• bezdekIris.data.txt
• fisher
• prob_gen_model.png
Particle filter
A basic particle filter tracking algorithm, using a uniformly distributed step as the motion model and the initial target colour as the determining feature for the weighting function. This requires an approximately uniformly coloured object, which moves at a speed no larger than stepsize per frame.
This implementation assumes that the video stream is a sequence of numpy arrays, an iterator pointing to such a
sequence or a generator generating one. The particle filter itself is a generator to allow for operating on real-time video
streams.
#!python
from numpy import *
from numpy.random import *
def resample(weights):
n = len(weights)
indices = []
C = [0.] + [sum(weights[:i+1]) for i in range(n)]
u0, j = random(), 0
for u in [(u0+i)/n for i in range(n)]:
while u > C[j]:
j+=1
indices.append(j-1)
return indices
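# (The generator's definition and initialization were elided in this
#  excerpt; an assumed reconstruction, consistent with the loop below.)
def particlefilter(sequence, pos, stepsize, n):
    seq = iter(sequence)
    x = ones((n, 2), int) * pos               # Initial particle positions
    f0 = seq.next()[tuple(pos)] * ones(n)     # Colour model of the target
    yield pos, x, ones(n)/n                   # Return expected position, particles and weights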
for im in seq:
x += uniform(-stepsize, stepsize, x.shape)            # Particle motion model: uniform step
x = x.clip(zeros(2), array(im.shape)-1).astype(int)   # Clip out-of-bounds particles
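f = im[tuple(x.T)]                                    # Measure particle colours (assumed; elided in this excerpt)
w = 1./(1. + (f0-f)**2)                               # Weight: inverse quadratic colour distance (assumed)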
w /= sum(w)                                           # Normalize w
yield sum(x.T*w, axis=1), x, w                        # Return expected position, particles and weights
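if 1./sum(w**2) < n/2.:                               # If the particle cloud is degenerate (assumed reconstruction)
    x = x[resample(w), :]                             # Resample particles according to the weights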
The following code shows the tracker operating on a test sequence featuring a moving square against a uniform
background.
#!python
if __name__ == "__main__":
from pylab import *
from itertools import izip
import time
ion()
seq = [ im for im in zeros((20,240,320), int)]   # Create an image sequence of 20 frames long
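x0 = array([120, 160])                           # Start position (elided in this excerpt; assumed value)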
xs = vstack((arange(20)*3, arange(20)*2)).T + x0
for t, x in enumerate(xs):
xslice = slice(x[0]-8, x[0]+8)
yslice = slice(x[1]-8, x[1]+8)
seq[t][xslice, yslice] = 255
for im, p in izip(seq, particlefilter(seq, x0, 8, 100)):   # Track the square through the sequence
pos, xs, ws = p
position_overlay = zeros_like(im)
position_overlay[tuple(pos)] = 1
particle_overlay = zeros_like(im)
particle_overlay[tuple(xs.T)] = 1
hold(True)
draw()
time.sleep(0.3)
clf()                                            # Causes flickering, but without it the spy plots aren't overwritten
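    # (The display calls were elided in this excerpt; an assumed reconstruction.)
    imshow(im, cmap=cm.gray)                         # Plot the image
    spy(position_overlay, marker='.', color='b')     # Plot the expected position
    spy(particle_overlay, marker=',', color='r')     # Plot the particles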
Attachments
• pftrack.jpg
• pftrack.png
• pftracking.jpg
• track.jpg
In this cookbook, we will focus on using pyparsing and numpy to read a structured text file like this one, data.txt:
[ INFOS ]
Debug = False
Shape (mm^-1) = 2.3 # here is a unit
[ EMPTY SECTION ]
# empty section should not be taken into account
[ TABLE IN ROWS ]
Temp (C) 100 200 300 450.0 600
E XX (GPa) 159.4 16.9E+0 51.8 .15E02 4 # Here is a space in the row name
[ TABLE IN COLUMNS ]
STATION PRECIPITATION T_MAX_ABS T_MIN_ABS
(/) (mm) (C) (C) # Columns must have a unit
Ajaccio 64.8 18.8E+0 -2.6
Auxerre 49.6 16.9E+0 Nan # Here is a Nan
Bastia 114.2 20.8E+0 -0.9
[ MATRIX ]
True 2 3
4. 5. 6.
7. nan 8
* return a data structure with the same organization in section as the input file and clean
[['infos',
['debug', False],
['shape', 2.2999999999999998],
['length', 25361.150000000001],
['path_1', 'C:\\\\This\\is\\a\\long\\path\\with some space in it\\data.txt'],
['description',
'raw values can have multiple lines, but additional lines must start\nwith a whitespace which is automatically skipped'],
['parent', None],
['names_', ['debug', 'shape', 'length', 'path_1', 'description', 'parent']],
['unit_', {'length': 'mm', 'shape': 'mm^-1'}]],
['table_in_rows',
['temp', array([ 100., 200., 300., 450., 600.])],
['e_xx', array([ 159.4, 16.9, 51.8, 15. , 4. ])],
['words', array(['hundred', 'two hundreds', 'a lot', 'four', 'five'], dtype='|S12')],
>>> data.matrix
array([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., NaN, 8.]])
>>> data.table_in_columns.t_max_abs
array([ 18.8, 16.9, 20.8])
This parser adds two special fields in all sections except matrix ones:

* ‘‘names_‘‘ : a list containing the names of all variables found in this section
* ‘‘unit_‘‘ : a dict containing the unit corresponding to each variable name, if there is any
pyparsing is an efficient tool to deal with formatted text, and lets you process it in two steps:

1. Define rules to identify strings representing sections, variable names, and so on. With pyparsing, these rules can be combined easily with the standard operators.
2. Define actions to be executed on these fields, to convert them into python objects.
In the file example above, there are four kinds of data: parameter definitions, table in rows, table in columns and
matrix.
So, we will define a parser for each one and combine them to define the final parser.
This section will describe step by step how to build the function ‘paramParser‘ defined in ConfigNumParser, used to parse the block [ INFOS ] in the example above.
A parameter declaration has the form:
with:
* ‘‘value‘‘: anything up to the end of the line, or to the character # which starts a comment
This can be translated almost literally with pyparsing syntax (see how to use pyparsing for more information):
It is easy to test what will be found with this pattern in the data file:
# parameter definition
keyName = Word(alphanums + '_')
unitDef = Suppress('(') + Word(alphanums + '^*/-._') + Suppress(')')
paramValueDef = SkipTo('#'|lineEnd)
The test will now give names to the results and produce a nicer output:
['Context', 'full']
- name: Context
- value: full
...
['Temp_ref', 'K', '298.15']
- name: Temp_ref
- unit: ['K']
- value: 298.15
...
...
We will detail further what kind of values are expected, to let pyparsing handle the conversion. They can be divided into two parts. Let's begin with numbers: we can use the ‘Regex‘ element to rapidly detect strings representing numbers:
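The exact pattern was elided from this excerpt; a plausible definition (the named groups float1, float2 and float3 are assumptions, chosen so that the conversion function below can test t.float1, t.float2 and t.float3) might be:

from pyparsing import Regex

# Matches integers and floats; any of the named groups being non-empty
# signals that the string should be converted to a float.
number = Regex(r'[+-]?(?:\d+(?P<float1>\.\d*)?|(?P<float2>\.\d+))(?P<float3>[eE][+-]?\d+)?')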
See Regular expression operations for more information on regular expressions. We could have built a parser with standard pyparsing elements (‘Combine‘, ‘Optional‘, ‘oneOf‘, etc.), but low-level expressions like floating point numbers are said to do much better with the ‘Regex‘ class. I know it feels like cheating, but in truth, pyparsing uses a number of re's under the covers.
Now we will define a function to convert this string into a python float or integer, and set a ‘parseAction‘ to tell pyparsing to automatically convert a number when it finds one:
def convertNumber(t):
"""Convert a string matching a number to a python number"""
if t.float1 or t.float2 or t.float3 : return [float(t[0])]
else : return [int(t[0]) ]
number.setParseAction(convertNumber)
* it should return either a `parseResults` object or a list of results, which will be automatically converted to a `parseResults` object
Pyparsing comes with a very convenient function to convert fields to a constant object, namely ‘replaceWith‘. This can be used to create a list of elements converting strings to python objects:
from numpy import NAN
pyValue_list = [ number ,
Keyword('True').setParseAction(replaceWith(True)) ,
Keyword('False').setParseAction(replaceWith(False)) ,
Keyword('NAN', caseless=True).setParseAction(replaceWith(NAN)),
Keyword('None').setParseAction(replaceWith(None)) ,
QuotedString('"""', multiline=True) ,
QuotedString("'''", multiline=True) ,
QuotedString('"') ,
QuotedString("'") ,
]
Here we used:

* `Keyword` to detect standard python keywords and replace them on the fly
* `QuotedString` to detect quoted strings and automatically unquote them
* `MatchFirst` to build a super element, `pyValue`, to convert all kinds of python values.
[True], [False], [nan], [nan], [None], ['word'], ['two words'], ["'more words', he said"]]
By default, pyparsing considers any characters in ' \t\r\n' as whitespace and meaningless. If you need to detect ends-of-line you need to change this behavior by using ‘setWhitespaceChars‘ or ‘setDefaultWhitespaceChars‘. As we are going to process tables line by line, we need to configure this, and it should be set up at the lowest level:
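A minimal way to do this (a sketch; the original setup was elided from this excerpt) is to restrict the default whitespace characters before building any other element, and to define an explicit end-of-line token:

from pyparsing import ParserElement, LineEnd
ParserElement.setDefaultWhitespaceChars(" \t")   # newlines are now significant
EOL = LineEnd().suppress()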
To finish this parser, we now need to add a rule to match raw text following the conditions:
# rawValue can be multiline but these lines should start with a whitespace
rawLine = CharsNotIn('#\n') + (lineEnd | Suppress('#'+restOfLine))
rawValue = Combine(rawLine + ZeroOrMore(White(' \t').suppress() + NotAny('[') + rawLine))
We will also refine our definition of units to handle special cases like (-), (/) or (), corresponding to a blank unit.
This leads to:
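A sketch of such a refined definition (assumed; the original was elided from this excerpt, and oneOf, Optional, Word, alphanums and Suppress all come from pyparsing):

unitDef = Suppress('(') + (Suppress(oneOf('- /')) | Optional(Word(alphanums + '^*/-._'))) + Suppress(')')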
Structuring data
def formatBloc(t):
    """ Format the result to have a list of (key, values) easily usable with Dict """
    # (Initializations elided in this excerpt; assumed reconstruction.)
    rows = []
    names = []
    units = {}
for row in t :
rows.append(ParseResults([ row.name, row.value ]))
names.append(row.name)
if row.unit : units[row.name] = row.unit[0]
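    # (Assumed, based on the names_ and unit_ fields described earlier:)
    rows.append(ParseResults(['names_', names]))
    rows.append(ParseResults(['unit_', units]))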
return rows
This ‘paramParser‘ element is exactly the parser created by the function ‘paramParser‘ defined in
ConfigNumParser.
Let’s see what we get:
For parsing parameter declarations, we have seen most of the common techniques but one: the use of the ‘Forward‘ element to define parsing rules on the fly.
Let’s see how this can be used to parse a table defined column by column, according to this schema:
Such a parser can be generated with the ‘tableColParser‘ function defined in ConfigNumParser.
The heart of the problem is to tell pyparsing that each line should have the same number of columns, whereas this
number is unknown a priori.
We will get round this problem by defining the pattern corresponding to the unit line and its followers right after
reading the header line.
Indeed, these lines can be defined with a ‘Forward‘ element, and we can attach a ‘parseAction‘ to the header line to redefine these elements later, once we know how many columns we have in the headers.
Redefining a ‘Forward‘ element is done via the << operator:
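# (Assumed declarations, elided from this excerpt: the two Forward
#  placeholders that will be redefined once the header has been read;
#  Forward comes from pyparsing.)
unitLine = Forward()
tabValueLine = Forward()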
def defineColNumber(t):
""" Define unitLine and tabValueLine to match the same number of columns than
the header line"""
nbcols = len(t.header)
unitLine << Group( unitDef*nbcols + EOL)
tabValueLine << Group( tabValueDef*nbcols + EOL)
tableColDef = ( firstLine('header').setParseAction(defineColNumber)
+ unitLine('unit')
+ Group(OneOrMore(tabValueLine))('data')
)
Now we will organize our data the same way we did for parameters, but this time we will use the name of the column as the key, and we will transform our data into numpy arrays:
def formatBloc(t):
""" Format the result to have a list of (key, values) easily usable
with Dict and transform data into array
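    """
    # (Initializations elided in this excerpt; assumed reconstruction.)
    columns = []
    names = []
    units = {}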
transposedData = zip(*t.data)
for header, unit, data in zip(t.header, t.unit, transposedData):
    names.append(header)              # record the column name for the names_ field
    units[header] = unit
    columns.append(ParseResults([header, array(data)]))
columns.append(ParseResults(['names_', names]))
columns.append(ParseResults(['unit_' , units ]))
return columns
tableColParser = Dict(tableColDef.setParseAction(formatBloc))
[...]
- names_: ['station', 'precipitation', 't_max_abs', 't_min_abs']
- precipitation: [ 64.8 49.6 114.2]
- station: ['Ajaccio' 'Auxerre' 'Bastia']
- t_max_abs: [ 18.8 16.9 20.8]
- t_min_abs: [-2.6 NaN -0.9]
- unit_: {'station': '/', 'precipitation': 'mm', 't_min_abs': 'C', 't_max_abs': 'C'}
We won't detail them here, because they use exactly the same techniques we've already seen. Instead, we will see how to combine them into a complex parser, as is done in the ‘parseConfigFile‘ function:
# Section header
sectionName = Suppress('[') + keyName + Suppress(']')
# Defines comments
parser.ignore('#' + restOfLine)
That’s all.
The parser can now be used through its methods parseString or parseFile. See [attachment:ConfigNumParser_v0.1.1.py ConfigNumParser] for more information.
I hope this will give you a good starting point to read complex formatted text.
Section author: Elby
Attachments
• ConfigNumParser_v0.1.1.py
• ConfigNumParser_v0.1.py
• data.txt
• data3.txt
Rebinning
Examples showing how to rebin data to produce a smaller or bigger array without (and with) using interpolation.
Here we deal with the simplest case where any desired new shape is valid and no interpolation is done on the data to determine the new values.

* First, floating slice objects are created for each dimension.
* Second, the coordinates of the new bins are computed from the slices using mgrid.
* Then, the coordinates are transformed to integer indices.
* And, finally, 'fancy indexing' is used to evaluate the original array at the desired indices.
def rebin( a, newshape ):
'''Rebin an array to a new shape.
'''
assert len(a.shape) == len(newshape)
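    # (Body elided in this excerpt.  A sketch of the steps described above,
    #  assuming mgrid has been imported from numpy:)
    slices = [slice(0, old, float(old)/new) for old, new in zip(a.shape, newshape)]
    coordinates = mgrid[slices]
    indices = coordinates.astype('i')   # choose the biggest smaller integer index
    return a[tuple(indices)]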
If we were only interested in reducing the sizes by some integer factor then we could use:
def rebin_factor( a, newshape ):
'''Rebin an array to a new shape.
newshape must be a factor of a.shape.
'''
assert len(a.shape) == len(newshape)
assert not sometrue(mod( a.shape, newshape ))
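    # (Body elided in this excerpt; a sketch of the factor-based slicing:)
    slices = [slice(None, None, old/new) for old, new in zip(a.shape, newshape)]
    return a[tuple(slices)]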
Here is another way to deal with the reducing case for ndarrays. This acts identically to IDL's rebin command, where all values in the original array are summed and divided amongst the entries in the new array. As in IDL, the new shape must be a factor of the old one. The ugly 'evList trick' builds and executes a python command of the form

a.reshape(args[0],factor[0],).sum(1)/factor[0]
a.reshape(args[0],factor[0],args[1],factor[1],).sum(1).sum(2)/factor[0]/factor[1]

etc. This general form is extended to cover the number of required dimensions.
def rebin(a, *args):
'''rebin ndarray data into a smaller ndarray of the same rank whose dimensions
are factors of the original dimensions. eg. An array with 6 columns and 4 rows
can be reduced to have 6,3,2 or 1 columns and 4,2 or 1 rows.
example usages:
>>> a=rand(6,4); b=rebin(a,3,2)
>>> a=rand(6); b=rebin(a,2)
'''
shape = a.shape
lenShape = len(shape)
factor = asarray(shape)/asarray(args)
evList = ['a.reshape('] + \
['args[%d],factor[%d],'%(i,i) for i in range(lenShape)] + \
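             [')'] + ['.sum(%d)'%(i+1) for i in range(lenShape)] + \
             ['/factor[%d]'%i for i in range(lenShape)]
    # (The tail of evList and the eval call were elided in this excerpt;
    #  this is an assumed reconstruction of them.)
    print ''.join(evList)
    return eval(''.join(evList))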
The above code returns an array of the same type as the input array. If the input is an integer array, the output values
will be rounded down. If you want a float array which correctly averages the input values without rounding, you can
do the following instead.
a.reshape(args[0],factor[0],).mean(1)
a.reshape(args[0],factor[0],args[1],factor[1],).mean(1).mean(2)
# 2-D case
a=rand(6,4)
print a
b=rebin(a,6,4)
print b
b=rebin(a,6,2)
print b
b=rebin(a,3,2)
print b
b=rebin(a,1,1)
print b
# 1-D case
a=rand(4)
print a
b=rebin(a,4)
print b
b=rebin(a,2)
print b
b=rebin(a,1)
print b
A python version of IDL's congrid routine, for resampling data to arbitrary sizes, using a variety of nearest-neighbour and interpolation routines.
import numpy as n
import scipy.interpolate
import scipy.ndimage
Uses the same parameters and creates the same co-ordinate lookup points
method:
neighbour - closest value from original data
nearest and linear - uses n x 1-D interpolations using
scipy.interpolate.interp1d
(see Numerical Recipes for validity of use of n 1-D interpolations)
spline - uses ndimage.map_coordinates
centre:
True - interpolation points are at the centres of the bins
False - points are at the front edge of the bin
minusone:
For example- inarray.shape = (i,j) & new dimensions = (x,y)
False - inarray is resampled by factors of (i/x) * (j/y)
True - inarray is resampled by(i-1)/(x-1) * (j-1)/(y-1)
This prevents extrapolation one element beyond bounds of input array.
'''
if not a.dtype in [n.float64, n.float32]:
a = n.cast[float](a)
m1 = n.cast[int](minusone)
ofs = n.cast[int](centre) * 0.5
old = n.array( a.shape )
ndims = len( a.shape )
if len( newdims ) != ndims:
print "[congrid] dimensions error. " \
"This routine currently only support " \
"rebinning to the same number of dimensions."
return None
newdims = n.asarray( newdims, dtype=float )
dimlist = []
if method == 'neighbour':
for i in range( ndims ):
base = n.indices(newdims)[i]
dimlist.append( (old[i] - m1) / (newdims[i] - m1) \
* (base + ofs) - ofs )
cd = n.array( dimlist ).round().astype(int)
newa = a[list( cd )]
return newa
if ndims > 1:
# need one more transpose to return to original dimensions
newa = newa.transpose( trorder )
return newa
elif method in ['spline']:
oslices = [ slice(0,j) for j in old ]
oldcoords = n.ogrid[oslices]
nslices = [ slice(0,j) for j in list(newdims) ]
newcoords = n.mgrid[nslices]
newcoords_dims = range(n.rank(newcoords))
#make first index last
newcoords_dims.append(newcoords_dims.pop(0))
newcoords_tr = newcoords.transpose(newcoords_dims)
# makes a view that affects newcoords
newcoords_tr += ofs
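        # (Elided in this excerpt: scale the transposed coordinates onto the
        #  old grid; assumed reconstruction.)
        deltas = (n.asarray(old) - m1) / (newdims - m1)
        newcoords_tr *= deltas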
newcoords_tr -= ofs
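        # (Elided: evaluate the spline interpolation at the new coordinates;
        #  assumed reconstruction.)
        newa = scipy.ndimage.map_coordinates(a, newcoords)
        return newa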
This page shows how to compute the stationary distribution pi of a large Markov chain. The example is a tandem
of two M/M/1 queues. Generally the transition matrix P of the Markov chain is sparse, so that we can either use
scipy.sparse or Pysparse. Here we demonstrate how to use both of these tools.
In this section we find pi by means of an iterative method called the power method. More specifically, given a (stochastic) transition matrix P and an initial vector pi_0, compute iteratively pi_n = pi_{n-1} P until the distance (in some norm) between pi_n and pi_{n-1} is small enough.
First we build the generator matrix Q for the related Markov chain. Then we turn Q into a transition matrix P by the
method of uniformization, that is, we define P as I - Q/l, where I is the identity matrix (of the same size as Q) and l is
the smallest element on the diagonal of Q. Once we have P, we approximate pi (the left eigenvector of P that satisfies
pi = pi P) by the iterates pi_n = pi_0 P^n, for some initial vector pi_0.
More details of the above approach can be found in (more or less) any book on probability and Markov Chains. A fine
example, with many nice examples and attention to the numerical solution of Markov chains, is ‘Queueing networks
and Markov Chains’ by G. Bolch et al., John Wiley, 2nd edition, 2006.
You can get the source code for this tutorial here: tandemqueue.py
#!/usr/bin/env python
This simple function converts the state (i,j), which represents that the first queue contains i jobs and the second queue
j jobs, to a more suitable form to define a transition matrix.
def state(i,j):
return j*N1 + i
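The scipy.sparse variant was elided from this excerpt. A sketch of what computePiMethod1 might look like (it assumes the same size variable and a fillOffDiagonal helper that accepts any matrix supporting Q[i, j] = value, as in the attached tandemqueue.py):

import time
from numpy import zeros
from numpy.linalg import norm
from scipy import sparse

def computePiMethod1():
    e0 = time.time()
    Q = sparse.lil_matrix((size, size))
    fillOffDiagonal(Q)
    # fill the diagonal so that every row of the generator sums to zero
    Q.setdiag(-Q.sum(axis=1).A.ravel())
    Q = Q.tocsr()
    # uniformize: P = I - Q/l, with l the most negative diagonal element
    l = Q.diagonal().min() * 1.001
    P = sparse.identity(size, format='csr') - Q / l
    # power iterations: pi_n = pi_{n-1} P, i.e. pi <- P^T pi
    pi = zeros(size); pi[0] = 1
    pi1 = zeros(size)
    n = 1.0; i = 0
    while n > 1e-3 and i < 1e5:
        pi1 = P.T.dot(pi)
        pi = P.T.dot(pi1)
        n = norm(pi - pi1, 1); i += 1
    print "Method 1: ", time.time() - e0, i
    return pi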
def computePiMethod2():
e0 = time.time()
Q = pysparse.spmatrix.ll_mat(size,size)
fillOffDiagonal(Q)
# fill diagonal
x = empty(size)
Q.matvec(ones(size),x)
Q.put(-x)
# uniformize
l = min(Q.values())*1.001
P = pysparse.spmatrix.ll_mat(size,size)
P.put(ones(size))
P.shift(-1./l, Q)
# Compute pi
P = P.to_csr()
pi = zeros(size); pi1 = zeros(size)
pi[0] = 1;
n = norm(pi - pi1,1); i = 0;
while n > 1e-3 and i < 1e5:
P.matvec_transp(pi,pi1)
P.matvec_transp(pi1,pi)
n = norm(pi - pi1,1); i += 1
print "Method 2: ", time.time() - e0, i
return pi
def plotPi(pi):
pi = pi.reshape(N2,N1)
matshow(pi)
savefig("pi.eps")
if __name__ == "__main__":
pi = computePiMethod1()
pi = computePiMethod2()
plotPi(pi)
Attachments
• pi.png
• tandemqueue.py
Watershed
The watershed algorithm (see 1) is used to split an image into distinct components.
Suppose that we have the following image, composed of three white disks (pixels of value 1) and a black background (pixels of value 0). We want to obtain a new array where each pixel is labeled with the index of the component to which it belongs, that is, a segmentation of the original array, as shown in the image below. We will use the watershed algorithm provided by scipy.ndimage, scipy.ndimage.watershed_ift.
<matplotlib.image.AxesImage at 0x7ff9c8348690>
The watershed algorithm relies on the flooding of different basins, so we need to put markers in the image to initiate
the flooding. If one knows approximately where the objects are, and there are only a few objects, it is possible to set
the markers by hand
markers = np.zeros_like(a).astype(np.int16)
markers[0, 0] = 1
markers[200, 100] = 2
markers[350, 400] = 3
markers[260, 200] = 4
res1 = ndimage.watershed_ift(a.astype(np.uint8), markers)
<matplotlib.image.AxesImage at 0x7ff9c827a550>
If you don’t know where to put the markers, and you know the minimal size of the objects, you can spread a lot of
markers on a grid finer than the objects.
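For instance (a sketch, not the original notebook cell; it assumes the image a is 512x512, as suggested by the marker coordinates used above):

xm, ym = np.ogrid[0:512:10, 0:512:10]
markers = np.zeros_like(a).astype(np.int16)
markers[xm, ym] = np.arange(xm.size * ym.size).reshape((xm.size, ym.size)) + 1
res2 = ndimage.watershed_ift(a.astype(np.uint8), markers)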
<matplotlib.image.AxesImage at 0x7ff9c81a3a10>
Performance
A comparison of weave with NumPy, Pyrex, Psyco, Fortran (77 and 90) and C++ for solving Laplace’s equation. This
article was originally written by Prabhu Ramachandran.
laplace.py is the complete Python code discussed below. The source tarball ( perfpy_2.tgz ) contains in
addition the Fortran code, the pure C++ code, the Pyrex sources and a setup.py script to build the f2py and Pyrex
module.
This is a simple introductory document to using Python for performance computing. We’ll use NumPy, SciPy’s weave
(using both weave.blitz and weave.inline) and Pyrex. We will also show how to use f2py to wrap a Fortran subroutine
and call it from within Python.
We will also use this opportunity to benchmark the various ways to solve a particular numerical problem in Python
and compare them to an implementation of the algorithm in C++.
The example we will consider is a very simple (read, trivial) case of solving the 2D Laplace equation using an iterative
finite difference scheme (four point averaging, Gauss-Seidel or Gauss-Jordan). The formal specification of the problem
is as follows. We are required to solve for some unknown function u(x,y) such that ∇²u = 0 (Laplace's equation) with a boundary condition
specified. For convenience the domain of interest is considered to be a rectangle and the boundary values at the sides
of this rectangle are given.
It can be shown that this problem can be solved using a simple four point averaging scheme as follows. Discretise the
domain into an (nx x ny) grid of points. Then the function u can be represented as a 2 dimensional array - u(nx, ny).
The values of u along the sides of the rectangle are given. The solution can be obtained by iterating in the following
manner.
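In terms of the discretised grid, the update applied at every interior point is the four point average used in the code below:

u[i,j] = ((u[i-1,j] + u[i+1,j])*dy**2 + (u[i,j-1] + u[i,j+1])*dx**2) * 0.5/(dx**2 + dy**2)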
Where dx and dy are the lengths along the x and y axis of the discretised domain.
Implementing a solver for this is straightforward in pure Python. Use a simple NumPy array to store the solution matrix u. The following code demonstrates a simple solver.
import numpy
class Grid:
"""A simple grid class that stores the details and solution of the
computational grid."""
def __init__(self, nx=10, ny=10, xmin=0.0, xmax=1.0,
ymin=0.0, ymax=1.0):
self.xmin, self.xmax, self.ymin, self.ymax = xmin, xmax, ymin, ymax
self.dx = float(xmax-xmin)/(nx-1)
self.dy = float(ymax-ymin)/(ny-1)
self.u = numpy.zeros((nx, ny), 'd')
# used to compute the change in solution in some of the methods.
self.old_u = self.u.copy()
def computeError(self):
"""Computes absolute error using an L2 norm for the solution.
This requires that self.u and self.old_u must be appropriately
setup."""
v = (self.u - self.old_u).flat
return numpy.sqrt(numpy.dot(v,v))
class LaplaceSolver:
"""A simple Laplacian solver that can use different schemes to
solve the problem."""
def __init__(self, grid, stepper='numeric'):
self.grid = grid
self.setTimeStepper(stepper)
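    # (The slow time stepper's definition was elided in this excerpt; an
    #  assumed reconstruction of its opening lines, matching the loop below:)
    def slowTimeStep(self, dt=0.0):
        """Takes a time step using straightforward Python loops."""
        g = self.grid
        nx, ny = g.u.shape
        dx2, dy2 = g.dx**2, g.dy**2
        dnr_inv = 0.5/(dx2 + dy2)
        u = g.u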
err = 0.0
for i in range(1, nx-1):
for j in range(1, ny-1):
tmp = u[i,j]
u[i,j] = ((u[i-1, j] + u[i+1, j])*dy2 +
(u[i, j-1] + u[i, j+1])*dx2)*dnr_inv
diff = u[i,j] - tmp
err += diff*diff
return numpy.sqrt(err)
return count
The code is pretty simple and very easy to write but if we run it for any sizeable problem (say a 500 x 500 grid of
points), we’ll see that it takes forever to run. The CPU hog in this case is the slowTimeStep method. In the next
section we will speed it up using NumPy.
It turns out that the innermost loop of the LaplaceSolver.slowTimeStep method can be readily expressed by
a much simpler NumPy expression. Here is a re-written timeStep method.
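The body of the method was not reproduced here; a sketch consistent with the description that follows (the method name is an assumption, and the slicing expression is the standard four point average):

def numericTimeStep(self, dt=0.0):
    """Takes a time step using a NumPy expression."""
    g = self.grid
    dx2, dy2 = g.dx**2, g.dy**2
    dnr_inv = 0.5/(dx2 + dy2)
    u = g.u
    g.old_u = u.copy()      # needed to compute the error
    # The actual iteration: a four point average of the neighbours
    u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1])*dy2 +
                     (u[1:-1, 0:-2] + u[1:-1, 2:])*dx2)*dnr_inv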
return g.computeError()
The entire for i and j loops have been replaced by a single NumPy expression. NumPy expressions operate elementwise
and hence the above expression works. It basically computes the four point average. If you have gone through the
NumPy tutorial and played with NumPy a bit you should be able to understand how this works. The beauty of the
expression is that it's completely done in C. This makes the computation much faster. For a quick comparison here are
some numbers for a single iteration on a 500x500 grid. On a PIII 450Mhz with 192 MB RAM, the above takes about
0.3 seconds whereas the previous one takes around 15 seconds. This is close to a 50 fold speed increase. You will also
note a few things.
1. We cannot compute the error the way we did earlier inside the for loop. We need to make a copy of the data and
then use the computeError function to do this. This costs us memory and is not very pretty. This is certainly a
limitation but is worth a 50 fold speed increase.
2. The expression will use temporaries. Hence, during one iteration, the computed values at an already computed
location will not be used during the iteration. For instance, in the original for loop, once the value of u[1,1] is
computed, the next value for u[1,2] will use the newly computed u[1,1] and not the old one. However, since the
NumPy expression uses temporaries internally, only the old value of u[1,1] will be used. This is not a serious
issue in this case because it is known that even when this happens the algorithm will converge (but in twice as
much time, which reduces the benefit by a factor of 2, which still leaves us with a 25 fold increase).
Apart from these two issues its clear that using NumPy boosts speed tremendously. We will now use the amazing
weave package to speed this up further.
The NumPy expression can be sped up quite a bit if we use weave.blitz. Here is the new function.
# import necessary modules and functions
from scipy import weave
# ...
def blitzTimeStep(self, dt=0.0):
"""Takes a time step using a NumPy expression that has been
blitzed using weave."""
g = self.grid
dx2, dy2 = g.dx**2, g.dy**2
dnr_inv = 0.5/(dx2 + dy2)
u = g.u
g.old_u = u.copy()
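        # (The stencil expression and the weave.blitz call were elided in
        #  this excerpt; an assumed reconstruction based on the text below:)
        expr = "u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1])*dy2 + "\
               "(u[1:-1, 0:-2] + u[1:-1, 2:])*dx2)*dnr_inv"
        weave.blitz(expr, check_size=0)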
return g.computeError()
If you notice, the only thing that has changed is that we put quotes around the original numeric expression and call this
string ‘expr’ and then invoke weave.blitz. The ‘check_size’ keyword when set to 1 does a few sanity checks and is to
be used when you are debugging your code. However, for pure speed it is wise to set it to 0. This time when we time
the code for a 500x500 array for a single iteration it takes only about 0.1 seconds which is about a three fold increase!
There are again a few things to note.
1. The first time this method is called, it will take a long while to do some magic behind your back. The next time
it is called, it will run immediately. More details on this are in the weave documentation. Basically, weave.blitz
converts the NumPy expression into C++ code and uses blitz++ for the array expression, builds a Python module,
stores it in a special place and invokes that the next time the function call is made.
2. Again we need to use a temporary array to compute the error.
3. blitz does not use temporaries for the computation and therefore behaves more like the original (slow) for loop
in that the computed values are re-used immediately.
Apart from these points, the results are identical as compared to the original for loop. It’s only about 170 times faster
than the original code! We will now look at yet another way to speed up our original for loop. Enter weave.inline!
Inline allows one to embed C or C++ code directly into your Python code. Here is a simple version of an inlined
version of the code.
from scipy.weave import converters
# ...
def inlineTimeStep(self, dt=0.0):
"""Takes a time step using inlined C code -- this version uses
blitz arrays."""
g = self.grid
nx, ny = g.u.shape
code = """
#line 120 "laplace.py" (This is only useful for debugging)
double tmp, err, diff;
err = 0.0;
for (int i=1; i<nx-1; ++i) {
for (int j=1; j<ny-1; ++j) {
tmp = u(i,j);
u(i,j) = ((u(i-1,j) + u(i+1,j))*dy2 +
(u(i,j-1) + u(i,j+1))*dx2)*dnr_inv;
diff = u(i,j) - tmp;
err += diff*diff;
}
}
return_val = sqrt(err);
"""
# compiler keyword only needed on windows with MSVC installed
err = weave.inline(code,
['u', 'dx2', 'dy2', 'dnr_inv', 'nx', 'ny'],
type_converters=converters.blitz,
compiler = 'gcc')
return err
The code itself looks very straightforward (which is what makes inline so cool). The inline function call arguments are all self-explanatory. The line with '#line 120 ...' is only used for debugging and doesn't affect the speed in any way.
Again the first time you run this function it takes a long while to do something behind the scenes and the next time it
blazes away. This time notice that we have far more flexibility inside our loop and can easily compute an error term
without a need for temporary arrays. Timing this version results in a time for a 500x500 array of a mere 0.04 seconds
per iteration! This corresponds to a whopping 375 fold speed increase over the plain old for loop. And remember we
haven’t sacrificed any of Python’s incredible flexibility! This loop contains code that looks very nice but if we want to
we can speed things up further by writing a little dirty code. We won't get into that here, but it suffices to say that it's possible to get a further factor of two speed-up by using a different approach. The code for this basically does pointer
arithmetic on the NumPy array data instead of using blitz++ arrays. This code was contributed by Eric Jones. The
source code accompanying this article contains this code.
Next, we look at how it is possible to easily implement the loop inside Fortran and call it from Python by using f2py.
f2py is an amazing utility that lets you easily call Fortran functions from Python. First we will write a small Fortran77
subroutine to do our calculation. Here is the code.
c File flaplace.f
subroutine timestep(u,n,m,dx,dy,error)
double precision u(n,m)
double precision dx,dy,dx2,dy2,dnr_inv,tmp,diff
integer n,m,i,j
cf2py intent(in) :: dx,dy
cf2py intent(in,out) :: u
cf2py intent(out) :: error
cf2py intent(hide) :: n,m
dx2 = dx*dx
dy2 = dy*dy
dnr_inv = 0.5d0 / (dx2+dy2)
error = 0
do 200,j=2,m-1
do 100,i=2,n-1
tmp = u(i,j)
u(i,j) = ((u(i-1,j) + u(i+1,j))*dy2+
& (u(i,j-1) + u(i,j+1))*dx2)*dnr_inv
diff = u(i,j) - tmp
error = error + diff*diff
100 continue
200 continue
error = sqrt(error)
end
The lines starting with cf2py are special f2py directives and are documented in f2py. The rest of the code is straight-
forward for those who know some Fortran. We trivially create a Python module for this using the following command.
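The exact command line was elided here; f2py is typically invoked along these lines (a sketch):

f2py -c flaplace.f -m flaplace

This builds an extension module named flaplace, which can then be imported: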
import flaplace
That's it! Hopefully someday scipy.weave will let us do this inline and not require us to write a separate Fortran file.
The Fortran code and f2py example were contributed by Pearu Peterson, the author of f2py. Anyway, using this module
it takes about 0.029 seconds for a 500x500 grid per iteration! This is about a 500 fold speed increase over the original
code.
f2py can also work with more modern Fortran versions. This is useful because Fortran90 has special array features
that allow a more compact, nicer and faster code. Here is the same subroutine in Fortran90:
! File flaplace90_arrays.f90
subroutine timestep(u,n,m,dx,dy,error)
implicit none
!Pythonic array indices, from 0 to n-1.
real (kind=8), dimension(0:n-1,0:m-1), intent(inout):: u
real (kind=8), intent(in) :: dx,dy
real (kind=8), intent(out) :: error
integer, intent(in) :: n,m
real (kind=8), dimension(0:n-1,0:m-1) :: diff
real (kind=8) :: dx2,dy2,dnr_inv
!f2py intent(in) :: dx,dy
!f2py intent(in,out) :: u
!f2py intent(out) :: error
!f2py intent(hide) :: n,m
dx2 = dx*dx
dy2 = dy*dy
dnr_inv = 0.5d0 / (dx2+dy2)
diff=u
u(1:n-2, 1:m-2) = ((u(0:n-3, 1:m-2) + u(2:n-1, 1:m-2))*dy2 + &
(u(1:n-2,0:m-3) + u(1:n-2, 2:m-1))*dx2)*dnr_inv
error=sqrt(sum((u-diff)**2))
end subroutine
Note that the operations are performed in a single line, very similar to NumPy. The compilation step is exactly the same. On a 1000x1000 grid, this code is 2.2 times faster than the Fortran 77 loops.
The source tarball ( perfpy_2.tgz ) also contains a Fortran95 subroutine using the forall construct, which has a very
similar performance.
We also implemented the timeStep function in Pyrex using the code from the fast inline version. The Pyrex sources
are a little longer than the weave, blitz or Fortran code since we have to expose the NumPy array structure. The basic
function looks like this.
Travis Oliphant wrote a Cython version of the example.
nx = u.dimensions[0]
ny = u.dimensions[1]
dx2, dy2 = dx**2, dy**2
dnr_inv = 0.5/(dx2 + dy2)
elem = u.data
err = 0.0
cdef int i, j
cdef double *uc, *uu, *ud, *ul, *ur
cdef double diff, tmp
for i from 1 <= i < nx-1:
uc = elem + i*ny + 1
ur = elem + i*ny + 2
ul = elem + i*ny
uu = elem + (i+1)*ny + 1
ud = elem + (i-1)*ny + 1
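        # (The inner loop was elided in this excerpt; an assumed
        #  reconstruction that advances the pointers and accumulates err:)
        for j from 1 <= j < ny-1:
            tmp = uc[0]
            uc[0] = ((ul[0] + ur[0])*dy2 + (uu[0] + ud[0])*dx2)*dnr_inv
            diff = uc[0] - tmp
            err = err + diff*diff
            uc = uc + 1; ur = ur + 1; ul = ul + 1
            uu = uu + 1; ud = ud + 1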
return sqrt(err)
The function looks long but is not too hard to write. It is also possible to write without doing the pointer arithmetic by
providing convenient functions to access the array. However, the code shown above is fast. The sources provided with
this article contains the complete Pyrex file and also a setup.py script to build it. Timing this version, we find that this
version is as fast as the fast inlined version and takes only 0.025 seconds.
We have implemented the Numeric version in Matlab and Octave ( laplace.m ) and run the tests on a different computer (hence the "estimate" values in the table below). We have found that no significant speed-up is obtained in Matlab, while Octave runs twice as slow as NumPy. Detailed graphs can be found here.
Finally, for comparison we implemented this in simple C++ (nothing fancy) without any Python. One would expect
that the C++ code would be faster but surprisingly, not by much! Given the fact that it’s so easy to develop with
Python, this speed reduction is not very significant.
Here are some timing results for a 500x500 grid for 100 iterations. Note that we also have a comparison of results of
using the slow Python version along with Psyco.
Attachments
• laplace.m
• laplace.py
• perfpy_2.tgz
Multiprocessor and multicore machines are becoming more common, and it would be nice to take advantage of them
to make your code run faster. numpy/scipy are not perfect in this area, but there are some things you can do. The best
way to make use of a parallel processing system depends on the task you're doing and on the parallel system you're
using. If you have a big complicated job or a cluster of machines, taking full advantage will require much thought.
But many tasks can be parallelized in a fairly simple way.
As the saying goes, “premature optimization is the root of all evil”. Using a multicore machine will provide at best
a speedup by a factor of the number of cores available. Get your code working first, before even thinking about
parallelization. Then ask yourself whether your code actually needs to be any faster. Don’t embark on the bug-strewn
path of parallelization unless you have to.
Break your job into smaller jobs and run them simultaneously
For example, if you are analyzing data from a pulsar survey, and you have thousands of beams to analyze, each taking
a day, the simplest (and probably most efficient) way to parallelize the task is to simply run each beam as a job.
Machines with two processors can just run two jobs. No need to worry about locking or communication; no need to
write code that knows it’s running in parallel. You can have issues if the processes each need as much memory as your
machine has, or if they are both IO heavy, but in general this is a simple and efficient way to parallelize your code -
if it works. Not all tasks divide up this nicely. If your goal is to process a single image, it’s not clear how to do this
without a lot of work.
One of the great strengths of numpy is that you can express array operations very cleanly. For example to compute the
product of the matrix A and the matrix B, you just do:
>>> C = numpy.dot(A,B)
Not only is this simple and clear to read and write, since numpy knows you want to do a matrix dot product it can use an
optimized implementation obtained as part of “BLAS” (the Basic Linear Algebra Subroutines). This will normally be a
library carefully tuned to run as fast as possible on your hardware by taking advantage of cache memory and assembler
implementation. But many architectures now have a BLAS that also takes advantage of a multicore machine. If your
numpy/scipy is compiled using one of these, then dot() will be computed in parallel (if this is faster) without you
doing anything. Similarly for other matrix operations, like inversion, singular value decomposition, determinant, and
so on. For example, the open source library ATLAS allows compile-time selection of the level of parallelism (number of threads). The proprietary MKL library from Intel offers the possibility to choose the level of parallelism at runtime. There is also the GOTO library that allows run-time selection of the level of parallelism. This is a commercial product
but the source code is distributed free for academic use.
Finally, scipy/numpy does not parallelize operations like
>>> A = B + C
>>> A = numpy.sin(B)
>>> A = scipy.stats.norm.isf(B)
These operations run sequentially, taking no advantage of multicore machines (but see below). In principle, this could
be changed without too much work. OpenMP is an extension to the C language which allows compilers to produce
parallelizing code for appropriately-annotated loops (and other things). If someone sat down and annotated a few core
loops in numpy (and possibly in scipy), and if one then compiled numpy/scipy with OpenMP turned on, all three of
the above would automatically be run in parallel. Of course, in reality one would want to have some runtime control -
for example, one might want to turn off automatic parallelization if one were planning to run several jobs on the same
multiprocessor machine.
Sometimes you can see how to break your problem into several parallel tasks, but those tasks need some kind of
synchronization or communication. For example, you might have a list of jobs that can be run in parallel, but you
need to gather all their results, do some summary calculation, then launch another batch of parallel jobs. There are
two approaches to doing this in Python, using either multiple threads or processes.
Threads
Threads are generally ‘lighter’ than processes, and can be created, destroyed and switched between faster than pro-
cesses. They are normally preferred for taking advantage of multicore systems. However, multithreading with python
has a key limitation; the Global Interpreter Lock (GIL). For various reasons (a quick web search will turn up copious
discussion, not to say argument, over why this exists and whether it’s a good idea), python is implemented in such
a way that only one thread can be accessing the interpreter at a time. This basically means only one thread can be
running python code at a time. This almost means that you don’t take any advantage of parallel processing at all. The
exceptions are few but important: while a thread is waiting for IO (for you to type something, say, or for something to
come in the network) python releases the GIL so other threads can run. And, more importantly for us, while numpy is
doing an array operation, python also releases the GIL. Thus if you tell one thread to do:
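For instance, something along these lines (a sketch; B and C stand for large numpy arrays defined elsewhere):

for i in range(100):
    print "iteration %d" % i      # pure Python: holds the GIL
    A = B + C                     # numpy array operation: releases the GIL while it runs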
During the print operations and the % formatting operation, no other thread can execute. But during the A = B + C,
another thread can run - and if you’ve written your code in a numpy style, much of the calculation will be done in a
few array operations like A = B + C. Thus you can actually get a speedup from using multiple threads.
The python threading module is part of the standard library and provides tools for multithreading. Given the limitations
discussed above, it may not be worth carefully rewriting your code in a multithreaded architecture. But sometimes you
can do multithreading with little effort, and in these cases it can be worth it. See Cookbook/Multithreading for one
example. This recipe provides a thread Pool() interface with the same API as that found for processes (below) which
might also be of interest. There is also the ThreadPool module which is quite similar.
Processes
One way to overcome the limitations of the GIL discussed above is to use multiple full processes instead of threads.
Each process has it’s own GIL, so they do not block each other in the same way that threads do. From python
2.6, the standard library includes a multiprocessing module, with the same interface as the threading module. For
earlier versions of Python, this is available as the processing module (a backport of the multiprocessing module of
python 2.6 for python 2.4 and 2.5 is in the works here: multiprocessing ). It is possible to share memory between
processes, including numpy arrays. This allows most of the benefits of threading without the problems of the GIL.
It also provides a simple Pool() interface, which features map and apply commands similar to those found in the
Cookbook/Multithreading example.
Comparison
Here is a very basic comparison which illustrates the effect of the GIL (on a dual core machine).
import numpy as np
import math
def f(x):
print x
y = [1]*10000000
[math.exp(i) for i in y]
def g(x):
print x
y = np.ones(10000000)
np.exp(y)
time fornorm(g,range(100))
time fornorm(f,range(10))
time foreach(g,range(100),threads=2)
time foreach(f,range(10),threads=2)
p = Pool(2)
time p.map(g,range(100))
time p.map(f,range(100))
For function f(), which does not release the GIL, threading actually performs worse than serial code, presumably due to the overhead of context switching. However, using 2 processes does provide a significant speedup. For function g(), which uses numpy and releases the GIL, both threads and processes provide a significant speed-up, although multiprocessing is slightly faster.
If you need sophisticated parallelism - you have a computing cluster, say, and your jobs need to communicate with
each other frequently - you will need to start thinking about real parallel programming. This is a subject for graduate
courses in computer science, and I’m not going to address it here. But there are some python tools you can use to
implement the things you learn in that graduate course. (I am perhaps exaggerating - some parallelization is not that
difficult, and some of these tools make it fairly easy. But do realize that parallel code is much more difficult to write
and debug than serial code.)
• IPython1
• mpi4py
• parallel python
• POSH
Section author: Pauli Virtanen
Root finding
Function intersections
import numpy as np

def f(xy):
    x, y = xy
    z = np.array([y - x**2, y - x - 1.0])
    return z
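The rest of this recipe is not shown here; one way to locate an intersection of y = x**2 and y = x + 1 from this residual function is scipy.optimize.fsolve (the starting guess below is illustrative):

#!python
from scipy.optimize import fsolve

print fsolve(f, [1.0, 2.0])   # converges to one of the two intersection points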
It may be useful to find the zeros of the spherical Bessel functions, for instance, if you want to compute the
eigenfrequencies of a spherical electromagnetic cavity (in this case, you'll also need the zeros of the derivative of
(r*Jn(r))).
The problem is that you have to work out the ranges where you are supposed to find the zeros.
Happily, the range of a given zero of the n’th spherical Bessel functions can be computed from the zeros of the (n-1)’th
spherical Bessel function.
Thus, the approach proposed here is recursive, knowing that the spherical Bessel function of order 0 is equal to sin(r)/r,
whose zeros are well known.
This approach is obviously not efficient at all, but it works ;-).
A sample run is shown for the first 10 zeros of the spherical Bessel function of order 5 (and of the derivative of
(r*J5(r))), using [:Cookbook/Matplotlib: matplotlib].
#! /usr/bin/env python
# imports needed by this listing (not shown in this excerpt); Float32 is
# aliased to numpy.float32 so the original variable names can be kept
from numpy import sqrt, pi, arange, zeros, float32 as Float32
from scipy.special import jv, jvp
from scipy.optimize import brentq
from pylab import plot, grid, title, legend, gca, show
from matplotlib.ticker import MultipleLocator

### recursive method: computes zeros ranges of Jn(r,n) from zeros of Jn(r,n-1)
### (also for zeros of (rJn(r,n))')
### pros : you are certain to find the right zeros values;
### cons : all zeros of the n-1 previous Jn have to be computed;
### note : Jn(r,0) = sin(r)/r

def Jn(r,n):
    return (sqrt(pi/(2*r))*jv(n+0.5,r))

def Jn_zeros(n,nt):
    zerosj = zeros((n+1, nt), dtype=Float32)
    zerosj[0] = arange(1,nt+1)*pi
    points = arange(1,nt+n+1)*pi
    racines = zeros(nt+n, dtype=Float32)
    for i in range(1,n+1):
        for j in range(nt+n-i):
            foo = brentq(Jn, points[j], points[j+1], (i,))
            racines[j] = foo
        points = racines
        zerosj[i][:nt] = racines[:nt]
    return (zerosj)

def rJnp(r,n):
    return (0.5*sqrt(pi/(2*r))*jv(n+0.5,r) + sqrt(pi*r/2)*jvp(n+0.5,r))

def rJnp_zeros(n,nt):
    zerosj = zeros((n+1, nt), dtype=Float32)
    zerosj[0] = (2.*arange(1,nt+1)-1)*pi/2
    points = (2.*arange(1,nt+n+1)-1)*pi/2
    racines = zeros(nt+n, dtype=Float32)
    for i in range(1,n+1):
        for j in range(nt+n-i):
            foo = brentq(rJnp, points[j], points[j+1], (i,))
            racines[j] = foo
        points = racines
        zerosj[i][:nt] = racines[:nt]
    return (zerosj)
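# (Reading of the command-line arguments is missing from this excerpt; the
#  script is invoked as "bessph_zeros_rec 5 10", i.e. order n and number of
#  zeros nt, so the following reconstruction is an assumption.)
import sys
n = int(sys.argv[1])   # order of the spherical Bessel function
nt = int(sys.argv[2])  # number of zeros to compute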
dr = 0.01
eps = dr/1000
jnz = Jn_zeros(n,nt)[n]
r1 = arange(eps,jnz[len(jnz)-1],dr)
jnzp = rJnp_zeros(n,nt)[n]
r2 = arange(eps,jnzp[len(jnzp)-1],dr)
grid(True)
plot(r1,Jn(r1,n),'b', r2,rJnp(r2,n),'r')
title((str(nt)+' first zeros'))
legend((r'$j_{'+str(n)+'}(r)$', r'$(rj_{'+str(n)+'}(r))\'$'))
plot(jnz,zeros(len(jnz)),'bo', jnzp,zeros(len(jnzp)),'rd')
gca().xaxis.set_minor_locator(MultipleLocator(1))
# gca().xaxis.set_minor_formatter(FormatStrFormatter('%d'))
show()
bessph_zeros_rec 5 10
Attachments
• snapshot.png
Scientific GUIs
As I had some problems in figuring out how to use VTK to render data contained in a three dimensional numpy array,
I have decided to share my code. This code is based on the otherwise excellent documentation for VTK and the now
outdated vtkImageImportFromArray-class created by David Gobbi found at http://public.kitware.com/cgi-bin/cvsweb.
cgi/vtk/python/?cvsroot=vtk
The example is very simple, for more advanced functionality: read the documentation.
import vtk
from numpy import *
# The only limit is that the data must be reduced to unsigned 8 bit or 16 bit integers.
# For VTK to be able to use the data, it must be stored as a VTK-image. This can be done by the vtkImageImport-class which
# simple case, all axes are of length 75 and begin with the first element. For other data, this is probably not the case.
# I have to admit however, that I honestly don't know the difference between SetDataExtent() and SetWholeExtent() although
# completely opaque whereas the three different cubes are given different transparency values to show how it works.
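# (The numpy data array and the vtkImageImport set-up are not shown in this
#  excerpt; a minimal reconstruction for a 75x75x75 unsigned 8-bit array with
#  three nested cubes -- treat the sizes and values as assumptions.)
data_matrix = zeros([75, 75, 75], dtype=uint8)
data_matrix[0:35, 0:35, 0:35] = 50
data_matrix[25:55, 25:55, 25:55] = 100
data_matrix[45:74, 45:74, 45:74] = 150
dataImporter = vtk.vtkImageImport()
data_string = data_matrix.tostring()
dataImporter.CopyImportVoidPointer(data_string, len(data_string))
dataImporter.SetDataScalarTypeToUnsignedChar()
dataImporter.SetNumberOfScalarComponents(1)
dataImporter.SetDataExtent(0, 74, 0, 74, 0, 74)
dataImporter.SetWholeExtent(0, 74, 0, 74, 0, 74)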
alphaChannelFunc = vtk.vtkPiecewiseFunction()
alphaChannelFunc.AddPoint(0, 0.0)
alphaChannelFunc.AddPoint(50, 0.05)
alphaChannelFunc.AddPoint(100, 0.1)
alphaChannelFunc.AddPoint(150, 0.2)
# This class stores color data and can create color tables from a few color points. For this demo, we want the three cubes
# The previous two classes stored properties. Because we want to apply these properties to the volume we want to render,
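# (The colour transfer function and the volume property are not shown in this
#  excerpt; a minimal reconstruction with illustrative colour points.)
colorFunc = vtk.vtkColorTransferFunction()
colorFunc.AddRGBPoint(50, 1.0, 0.0, 0.0)
colorFunc.AddRGBPoint(100, 0.0, 1.0, 0.0)
colorFunc.AddRGBPoint(150, 0.0, 0.0, 1.0)
volumeProperty = vtk.vtkVolumeProperty()
volumeProperty.SetColor(colorFunc)
volumeProperty.SetScalarOpacity(alphaChannelFunc)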
# This class describes how the volume is rendered (through ray tracing).
compositeFunction = vtk.vtkVolumeRayCastCompositeFunction()
# We can finally create our volume. We also have to specify the data for it, as well as how the data will be rendered.
volumeMapper = vtk.vtkVolumeRayCastMapper()
volumeMapper.SetVolumeRayCastFunction(compositeFunction)
volumeMapper.SetInputConnection(dataImporter.GetOutputPort())
# The class vtkVolume is used to pair the previously declared volume as well as the properties to be used when rendering that volume.
volume = vtk.vtkVolume()
volume.SetMapper(volumeMapper)
volume.SetProperty(volumeProperty)
# With almost everything else ready, it's time to initialize the renderer and window, as well as creating a method for exiting the application
renderer = vtk.vtkRenderer()
renderWin = vtk.vtkRenderWindow()
renderWin.AddRenderer(renderer)
renderInteractor = vtk.vtkRenderWindowInteractor()
renderInteractor.SetRenderWindow(renderWin)
# A simple function to be called when the user decides to quit the application.
def exitCheck(obj, event):
if obj.GetEventPending() != 0:
obj.SetAbortRender(1)
renderInteractor.Initialize()
# Because nothing will be rendered without any input, we order the first render manually before control is handed over to the main-loop.
renderWin.Render()
renderInteractor.Start()
volumeProperty.ShadeOn()
wxPython dialogs
Something I enjoy in matlab is the ease with which simple file selector dialogs and status bars can be made. Now that
I use nothing but scipy, I have wanted to have similar functions for my scipy code. Thanks to the great reference
“wxPython in Action” I have learned some of the basics again, with the promise of making very fancy GUIs if I ever
find the urge! (Check out the sample chapters; they give the entire section on making dialogs, which is how I initially
got started with wxPython.)
I often write simple translation scripts that convert some data into another form. I like to use these for a series of data,
and share them with coworkers who do not program. The wxPython FileSelector function comes to the rescue.
import wx
filename = wx.FileSelector()
With this basic code, filename contains a string pathname (which may be unicode, depending on your installation of
wxPython; more on this later) of the selected file.
Some of the spiffing up includes using the current directory the script was started in; we can do this easily:
import wx
import os
filename = wx.FileSelector(default_path=os.getcwd())
If one runs such a script repeatedly, it might be a good idea to do some basic clean-up after each run.
# ...
app.Destroy()
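Putting the pieces together, a minimal self-contained version of the pattern described above might look like this (creating the wx.App object explicitly is an assumption of this sketch):

#!python
import os
import wx

app = wx.App(False)   # a wx.App must exist before any dialog is shown
filename = wx.FileSelector(default_path=os.getcwd())
if filename:
    print "Selected:", filename
app.Destroy()         # basic clean-up so the script can be run repeatedly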
Scientific Scripts
The code below illustrates the use of the one-dimensional Finite-Difference Time-Domain (FDTD) algorithm to
solve the one-dimensional Schrödinger equation for simple potentials. It only requires Numpy and Matplotlib.
All the mathematical details are described in this PDF: Schrodinger_FDTD.pdf
In these figures the potential is shaded in arbitrary units in yellow, while the total energy of the wavepacket is plotted
as a green line, in the same units as the potential. So while the energy units are not those on the left axis, both energy
plots use the same units and can thus be validly compared relative to one another.
Depending on the particle energy, the yellow region may be classically forbidden (when the green line is inside the
yellow region).
The wavepacket starts at t=0 as (step potential shown):
And at the end of the simulation it can look like this, depending on the actual potential height:
(figures: schrod_step_lo_sm2.png, schrod_step_hi_sm2.png)
This illustrates the tunneling through a thin barrier, depending on the barrier height. In the second case, a classical
particle would completely bounce off since its energy is lower than the potential barrier:
#=============================================================================
#
# Quantum Mechanical Simulation using Finite-Difference
# Time-Domain (FDTD) Method
#
# This script simulates a probability wave in the presence of multiple
# potentials. The simulation is carried out by using the FDTD algorithm
# applied to the Schrodinger equation. The program is intended to act as
# a demonstration of the FDTD algorithm and can be used as an educational
# aid for quantum mechanics and numerical methods. The simulation
# parameters are defined in the code constants and can be freely
# manipulated to see different behaviors.
#
# NOTES
#
# The probability density plots are amplified by a factor for visual
# purposes. The psi_p quantity contains the actual probability density
# without any rescaling.
#
# BEWARE: The time step, dt, has strict requirements or else the
# simulation becomes unstable.
#
# The code has three built-in potential functions for demonstration.
#
# 1) Constant potential: Demonstrates a free particle with dispersion.
#
# 2) Step potential: Demonstrates transmission and reflection.
#
# 3) Potential barrier: Demonstrates tunneling.
#
# By tweaking the height of the potential (V0 below) as well as the
# barrier thickness (THCK below), you can see different behaviors: full
#=============================================================================
#
# Simulation Constants. Be sure to include decimal points on appropriate
# variables so they become floats instead of integers.
#
N = 1200 # Number of spatial points.
T = 5*N # Number of time steps. 5*N is a nice value for terminating
# before anything reaches the boundaries.
Tp = 50 # Number of time steps to increment before updating the plot.
# I think there's a problem with pylab, because it resets the xlim after
# plotting the E line. Fix it back manually.
pylab.xlim(xmin,xmax)
# Direct index assignment is MUCH faster than using a spatial FOR loop, so
# these constants are used in the update equations. Remember that Python uses
# zero-based indexing.
IDX1 = range(1,N-1) # psi [ k ]
IDX2 = range(2,N) # psi [ k + 1 ]
IDX3 = range(0,N-2) # psi [ k - 1 ]
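# (Not shown in this excerpt: PA, PR and FU are the indices of the "past",
#  "present" and "future" time slices of the psi_r/psi_i arrays, and c1, c2V
#  are the update coefficients derived from dt, dx and the potential V.)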
for t in range(T+1):
    # Precompute a couple of indexing constants, this speeds up the computation
    psi_rPR = psi_r[PR]
    psi_iPR = psi_i[PR]
    # Apply the update equations.
    psi_i[FU,IDX1] = psi_i[PA,IDX1] + \
                     c1*(psi_rPR[IDX2] - 2*psi_rPR[IDX1] +
                         psi_rPR[IDX3])
    psi_i[FU] -= c2V*psi_r[PR]
    psi_r[FU,IDX1] = psi_r[PA,IDX1] - \
                     c1*(psi_iPR[IDX2] - 2*psi_iPR[IDX1] +
                         psi_iPR[IDX3])
    psi_r[FU] += c2V*psi_i[PR]
    # Increment the time steps. PR -> PA and FU -> PR
    psi_r[PA] = psi_rPR
    psi_r[PR] = psi_r[FU]
    psi_i[PA] = psi_iPR
    psi_i[PR] = psi_i[FU]
    # Only plot after a few iterations to make the simulation run faster.
    if t % Tp == 0:
        # Compute observable probability for the plot.
        psi_p = psi_r[PR]**2 + psi_i[PR]**2
        # Update the plots.
        lineR.set_ydata(psi_r[PR])
        lineI.set_ydata(psi_i[PR])
        # Note: we plot the probability density amplified by a factor so it's a
        # bit easier to see.
        lineP.set_ydata(6*psi_p)
        pylab.draw()

# So the windows don't auto-close at the end if run outside ipython
pylab.ioff()
pylab.show()
Attachments
• Schrodinger_FDTD.pdf
• schrod_barrier_hi_sm2.png
• schrod_barrier_lo_sm2.png
• schrod_step_hi_sm2.png
• schrod_step_init_sm2.png
• schrod_step_lo_sm2.png
Signal processing
The following plot shows the time required to apply a finite impulse response (FIR) filter of varying length to a signal
of length 131072 using several different functions that are available in numpy and scipy. The details of how this figure
was created are given below.
There are several functions in the numpy and scipy libraries that can be used to apply a FIR filter to a signal. From
scipy.signal, lfilter() is designed to apply a discrete IIR filter to a signal, so by simply setting the array of denominator
coefficients to [1.0], it can be used to apply a FIR filter. Applying a FIR filter is equivalent to a discrete convolution,
so one can also use convolve() from numpy, convolve() or fftconvolve() from scipy.signal, or convolve1d() from
scipy.ndimage. In this page, we demonstrate each of these functions, and we look at how the computational time
varies when the data signal size is fixed and the FIR filter length is varied. We'll use a data signal length of 131072,
which is 2**17. We assume that we have m channels of data, so our input signal is an m by n array. For the one-
dimensional timing, we use a source array of size 2*10^4 with a varying number of filter coefficients.
We assume our FIR filter coefficients are in a one-dimensional array b. The function numpy.convolve only accepts
one-dimensional arrays, so we’ll have to use a python loop over our input array to perform the convolution over all the
channels. One way to do that is
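The snippet itself is not reproduced in this excerpt; a minimal sketch of the idea:

#!python
y = np.array([np.convolve(xi, b, mode='valid') for xi in x])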
We use a list comprehension to loop over the rows of x, and pass the result to np.array to reassemble the filtered data
into a two-dimensional array. It should be noted that in the one-dimensional case, scipy.signal.convolve
possibly uses np.convolve.
Both signal.convolve and signal.fftconvolve perform a two-dimensional convolution of two-
dimensional arrays. To filter our m by n array with either of these functions, we shape our filter to be a two-dimensional
array, with shape 1 by len(b). The python code looks like this:
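The code block is missing from this excerpt; a sketch of the call being described:

#!python
y = sig_convolve(x, b[np.newaxis, :], mode='valid')
# or, equivalently, with the FFT-based version:
# y = fftconvolve(x, b[np.newaxis, :], mode='valid')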
where x is a numpy array with shape (m, n), and b is the one-dimensional array of FIR filter coefficients. b[np.newaxis,
:] is the two dimensional view of b, with shape 1 by len(b). y is the filtered data; it includes only those terms for which
the full convolution was computed, so it has shape (m, n - len(b) + 1).
ndimage.convolve1d() is designed to do a convolution of a 1d array along the given axis of another n-dimensional
array. It does not have the option mode=’valid’, so to extract the valid part of the result, we slice the result of the
function:
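The slicing expression is missing from this excerpt; a sketch of it (the exact split of the half-window offsets should be checked against the mode='valid' result):

#!python
y = convolve1d(x, b)[:, (len(b)-1)//2 : -(len(b)//2)]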
signal.lfilter is designed to filter one-dimensional data. It can take a two-dimensional array (or, in general, an n-
dimensional array) and filter the data in any given axis. It can also be used for IIR filters, so in our case, we’ll pass in
[1.0] for the denominator coefficients. In python, this looks like:
y = lfilter(b, [1.0], x)
To obtain exactly the same array as computed by convolve or fftconvolve (i.e. to get the equivalent of the ‘valid’ mode),
we must discard the beginning of the array computed by lfilter. We can do this by slicing the array immediately after
the call to filter:
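The corresponding snippet is not shown in this excerpt; a sketch:

#!python
y = lfilter(b, [1.0], x)[:, len(b) - 1:]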
The following script computes and plots the results of applying a FIR filter to a 2 by 131072 array of data, with a series
of FIR filters of increasing length.
#!python
%matplotlib inline
from __future__ import print_function
import time
import numpy as np
from numpy import convolve as np_convolve
from scipy.signal import fftconvolve, lfilter, firwin
from scipy.signal import convolve as sig_convolve
from scipy.ndimage import convolve1d
import matplotlib.pyplot as plt
conv_time = []
npconv_time = []
fftconv_time = []
conv1d_time = []
lfilt_time = []
diff_list = []
diff2_list = []
diff3_list = []
else:
    print("No! Something went wrong.")
plt.subplot(1, 2, 2)
plt.loglog(Ns_1d, conv1d_time2, 'k-p', label='ndimage.convolve1d')
plt.loglog(Ns_1d, lconv_time2, 'c-o', label='signal.lfilter')
plt.loglog(Ns_1d, fftconv_time2, 'm-*', markersize=8, label='signal.fftconvolve')
plt.loglog(Ns_1d, sig_conv_time2, 'g-d', label='signal.convolve')
plt.loglog(Ns_1d, npconv_time2, 'b-s', label='np.convolve')
plt.grid()
plt.xlabel('Number of taps')
plt.ylabel('Time to filter (seconds)')
plt.title('One dimensional timing')
plt.legend(loc='best')
plt.show()
The plot shows that, depending on the number of taps, either numpy.convolve or scipy.signal.
fftconvolve/scipy.signal.convolve is the fastest. The above script can be used to explore variations
of these results.
It should be noted that scipy.signal.convolve chooses between scipy.signal.fftconvolve, np.convolve and its own
direct implementation, depending on the size/shape of the input arrays.
Section author: WarrenWeckesser, WarrenWeckesser, Scott Sievert
Butterworth Bandpass
This cookbook recipe demonstrates the use of scipy.signal.butter to create a bandpass Butterworth filter.
scipy.signal.freqz is used to compute the frequency response, and scipy.signal.lfilter is used to
apply the filter to a signal. (This code was originally given in an answer to a question at stackoverflow.com.)
from scipy.signal import butter, lfilter
return b, a
def run():
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import freqz
plt.show()
run()
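The listing above is heavily truncated. For reference, a minimal sketch of the design/apply helpers it relies on, in the spirit of the stackoverflow answer mentioned above (the example frequencies and test signal are illustrative, not from the original):

#!python
import numpy as np
from scipy.signal import butter, lfilter

def butter_bandpass(lowcut, highcut, fs, order=5):
    # normalise the band edges to the Nyquist frequency and design the filter
    nyq = 0.5 * fs
    b, a = butter(order, [lowcut / nyq, highcut / nyq], btype='band')
    return b, a

def butter_bandpass_filter(data, lowcut, highcut, fs, order=5):
    b, a = butter_bandpass(lowcut, highcut, fs, order=order)
    return lfilter(b, a, data)

# Example: keep the 300-3000 Hz band of a signal sampled at 8 kHz.
fs = 8000.0
t = np.linspace(0, 1, int(fs), endpoint=False)
x = np.sin(2*np.pi*1000.0*t) + 0.5*np.sin(2*np.pi*50.0*t)
y = butter_bandpass_filter(x, 300.0, 3000.0, fs, order=6)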
Communication theory
These two examples illustrate simple simulation of a digital BPSK modulated communication system where only one
sample per symbol is used, and the signal is affected only by AWGN noise.
In the first example, we cycle through different signal to noise values, and the signal length is a function of theoretical
probability of error. As a rule of thumb, we want to count about 100 errors for each SNR value, which determines the
length of the signal (and noise) vector(s).
#!/usr/bin/python
# BPSK digital modulation example
# by Ivo Maljevic

# imports needed by this listing (the original header is not shown in this excerpt)
from numpy import arange, sqrt, empty, shape, ceil, random
from scipy.special import erfc
import matplotlib.pyplot as plt
SNR_MIN = 0
SNR_MAX = 9
Eb_No_dB = arange(SNR_MIN,SNR_MAX+1)
SNR = 10**(Eb_No_dB/10.0) # linear SNR
Pe = empty(shape(SNR))
BER = empty(shape(SNR))
loop = 0
for snr in SNR: # SNR loop
Pe[loop] = 0.5*erfc(sqrt(snr))
VEC_SIZE = ceil(100/Pe[loop]) # vector length is a function of Pe
# noise
n = sqrt(No/2)*random.randn(VEC_SIZE)
# signal + noise
x = s + n
plt.xlabel('Eb/No (dB)')
plt.ylabel('BER')
plt.show()
In the second, slightly modified example, the problem of signal length growth is solved by breaking the signal into
frames. Namely, the number of samples required for a given SNR grows quickly, so the simulation above is not practical
for Eb/No values greater than 9 or 10 dB.
#!/usr/bin/python
# BPSK digital modulation: modified example
# by Ivo Maljevic

# imports needed by this listing (the original header is not shown in this excerpt)
from numpy import arange, sqrt, empty, shape, ceil, where, random
from scipy.special import erfc
import matplotlib.pyplot as plt
rand = random.rand
normal = random.normal
SNR_MIN = 0
SNR_MAX = 10
FrameSize = 10000
Eb_No_dB = arange(SNR_MIN,SNR_MAX+1)
Eb_No_lin = 10**(Eb_No_dB/10.0) # linear SNR
# Allocate memory
Pe = empty(shape(Eb_No_lin))
BER = empty(shape(Eb_No_lin))
# signal vector (for faster exec we can repeat the same frame)
s = 2*random.randint(0,high=2,size=FrameSize)-1
loop = 0
for snr in Eb_No_lin:
No = 1.0/snr
Pe[loop] = 0.5*erfc(sqrt(snr))
nFrames = ceil(100.0/FrameSize/Pe[loop])
error_sum = 0
scale = sqrt(No/2)
# error counting
err = where (y != s)
error_sum += len(err[0])
plt.semilogy(Eb_No_dB, Pe,'r',linewidth=2)
plt.semilogy(Eb_No_dB, BER,'-s')
plt.grid(True)
plt.legend(('analytical','simulation'))
plt.xlabel('Eb/No (dB)')
plt.ylabel('BER')
plt.show()
Attachments
• BPSK_BER.PNG
FIR filter
This cookbook example shows how to design and use a low-pass FIR filter using functions from scipy.signal.
The pylab module from matplotlib is used to create plots.
#!python
# imports needed by this listing (the original header is not shown in this excerpt)
from numpy import arange, cos, sin, pi, absolute
from scipy.signal import kaiserord, firwin, lfilter, freqz
from pylab import figure, clf, plot, xlabel, ylabel, ylim, title, grid, show
#------------------------------------------------
# Create a signal for demonstration.
#------------------------------------------------
sample_rate = 100.0
nsamples = 400
t = arange(nsamples) / sample_rate
x = cos(2*pi*0.5*t) + 0.2*sin(2*pi*2.5*t+0.1) + \
0.2*sin(2*pi*15.3*t) + 0.1*sin(2*pi*16.7*t + 0.1) + \
0.1*sin(2*pi*23.45*t+.8)
#------------------------------------------------
# Create a FIR filter and apply it to x.
#------------------------------------------------
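# (The filter specification is missing from this excerpt; typical values in
#  the spirit of this recipe -- treat them as assumptions.)
nyq_rate = sample_rate / 2.0   # Nyquist rate of the signal
width = 5.0 / nyq_rate         # width of the transition band, relative to the Nyquist rate
ripple_db = 60.0               # desired attenuation in the stop band, in dB
cutoff_hz = 10.0               # cutoff frequency of the low-pass filter, in Hz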
# Compute the order and Kaiser parameter for the FIR filter.
N, beta = kaiserord(ripple_db, width)
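# (The design and filtering steps are missing from this excerpt; a sketch.)
# Create the low-pass FIR filter with a Kaiser window and apply it to x.
taps = firwin(N, cutoff_hz / nyq_rate, window=('kaiser', beta))
filtered_x = lfilter(taps, 1.0, x)
# Phase delay of the filtered signal, used when shifting the plot below.
delay = 0.5 * (N - 1) / sample_rate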
#------------------------------------------------
# Plot the FIR filter coefficients.
#------------------------------------------------
figure(1)
plot(taps, 'bo-', linewidth=2)
title('Filter Coefficients (%d taps)' % N)
grid(True)
#------------------------------------------------
# Plot the magnitude response of the filter.
#------------------------------------------------
figure(2)
clf()
w, h = freqz(taps, worN=8000)
plot((w/pi)*nyq_rate, absolute(h), linewidth=2)
xlabel('Frequency (Hz)')
ylabel('Gain')
title('Frequency Response')
ylim(-0.05, 1.05)
grid(True)
#------------------------------------------------
# Plot the original and filtered signals.
#------------------------------------------------
figure(3)
# Plot the original signal.
plot(t, x)
# Plot the filtered signal, shifted to compensate for the phase delay.
plot(t-delay, filtered_x, 'r-')
# Plot just the "good" part of the filtered signal. The first N-1
# samples are "corrupted" by the initial conditions.
plot(t[N-1:]-delay, filtered_x[N-1:], 'g', linewidth=4)
xlabel('t')
grid(True)
show()
The final plots shows the original signal (thin blue line), the filtered signal (shifted by the appropriate phase delay to
align with the original signal; thin red line), and the “good” part of the filtered signal (heavy green line). The “good
part” is the part of the signal that is not affected by the initial conditions.
Section author: WarrenWeckesser
Attachments
• fir_demo_freq_resp.png
• fir_demo_signals.png
• fir_demo_taps.png
Filtfilt
This sample code demonstrates the use of the function scipy.signal.filtfilt, which achieves zero phase delay by
applying an IIR filter to a signal twice, once forwards and once backwards. The order of the resulting combined filter
is twice the original filter order. The function also computes the initial filter parameters in order to provide a more
stable response (via lfilter_zi).
For comparison, this script also applies the same IIR filter to the signal using scipy.signal.lfilter; for these calculations,
lfilter_zi is used to choose appropriate initial conditions for the filter. Without this, these plots would have long
transients near 0. As it is, they have long transients near the initial value of the signal.
from numpy import sin, cos, pi, linspace
from numpy.random import randn
from scipy.signal import lfilter, lfilter_zi, filtfilt, butter
from matplotlib.pyplot import plot, legend, show, hold, grid, figure, savefig
# Apply the filter to xn. Use lfilter_zi to choose the initial condition
# of the filter.
zi = lfilter_zi(b, a)
z, _ = lfilter(b, a, xn, zi=zi*xn[0])
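# (The remaining steps are not shown in this excerpt; a sketch of what the
#  legend entries below correspond to.)
# Apply the filter a second time to the already-filtered signal ("lfilter, twice").
z2, _ = lfilter(b, a, z, zi=zi*z[0])
# Use filtfilt to apply the filter forwards and backwards ("filtfilt").
y = filtfilt(b, a, xn)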
'lfilter, twice',
'filtfilt'),
loc='best')
hold(False)
grid(True)
show()
#savefig('plot.png', dpi=65)
This page demonstrates two functions in scipy.signal for generating frequency-swept signals: ‘chirp‘ and
‘sweep_poly‘.
Some of these require SciPy 0.8.
To run the code samples, you will need the following imports:
import numpy as np
from scipy.signal import chirp, sweep_poly
Sample code:
t = np.linspace(0, 10, 5001)
w = chirp(t, f0=12.5, f1=2.5, t1=10, method='linear')
Sample code:
t = np.linspace(0, 10, 5001)
w = chirp(t, f0=12.5, f1=2.5, t1=10, method='quadratic')
Sample code:
Sample code:
Sample code:
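The three sample blocks above are missing from this excerpt. Judging from the attachment names below, they covered a logarithmic chirp, a hyperbolic chirp and sweep_poly; minimal sketches in the same style as the earlier samples (the frequency values mirror those samples, the polynomial is illustrative):

#!python
t = np.linspace(0, 10, 5001)
w = chirp(t, f0=12.5, f1=2.5, t1=10, method='logarithmic')

t = np.linspace(0, 10, 5001)
w = chirp(t, f0=12.5, f1=2.5, t1=10, method='hyperbolic')

p = np.poly1d([0.05, -0.75, 2.5, 5.0])
t = np.linspace(0, 10, 5001)
w = sweep_poly(t, p)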
Attachments
• chirp_hyperbolic.png
• chirp_linear.png
• chirp_logarithmic.png
• chirp_plot.py
• chirp_quadratic.png
• chirp_quadratic_v0false.png
• sweep_poly.png
Kalman filtering
This code implements the example given in pages 11-15 of An Introduction to the Kalman Filter by Greg Welch and
Gary Bishop, University of North Carolina at Chapel Hill, Department of Computer Science.
# by Andrew D. Straw
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10, 8)
# initial parameters
n_iter = 50
sz = (n_iter,) # size of array
x = -0.37727 # truth value (typo in example at top of p. 13 calls this z)
z = np.random.normal(x,0.1,size=sz) # observations (normal about x, sigma=0.1)
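# (Allocation of the estimate arrays and the noise parameters is missing from
#  this excerpt; the values below follow the Welch & Bishop example and should
#  be treated as assumptions.)
Q = 1e-5       # process variance
R = 0.1**2     # estimate of measurement variance, change to see effect
xhat = np.zeros(sz)       # a posteriori estimate of x
P = np.zeros(sz)          # a posteriori error estimate
xhatminus = np.zeros(sz)  # a priori estimate of x
Pminus = np.zeros(sz)     # a priori error estimate
K = np.zeros(sz)          # gain or blending factor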
# initial guesses
xhat[0] = 0.0
P[0] = 1.0
for k in range(1,n_iter):
# time update
xhatminus[k] = xhat[k-1]
Pminus[k] = P[k-1]+Q
# measurement update
K[k] = Pminus[k]/( Pminus[k]+R )
xhat[k] = xhatminus[k]+K[k]*(z[k]-xhatminus[k])
P[k] = (1-K[k])*Pminus[k]
plt.figure()
plt.plot(z,'k+',label='noisy measurements')
plt.plot(xhat,'b-',label='a posteriori estimate')
plt.axhline(x,color='g',label='truth value')
plt.legend()
plt.title('Estimate vs. iteration step', fontweight='bold')
plt.xlabel('Iteration')
plt.ylabel('Voltage')
plt.figure()
valid_iter = range(1,n_iter) # Pminus not valid at step 0
plt.plot(valid_iter,Pminus[valid_iter],label='a priori error estimate')
plt.title('Estimated $\it{\mathbf{a \ priori}}$ error vs. iteration step', fontweight='bold')
plt.xlabel('Iteration')
plt.ylabel('$(Voltage)^2$')
plt.setp(plt.gca(),'ylim',[0,.01])
plt.show()
The Savitzky Golay filter is a particular type of low-pass filter, well adapted for data smoothing. For further information
see: http://www.wire.tu-bs.de/OLDWEB/mameyer/cmr/savgol.pdf (or http://www.dalkescientific.com/writings/NBN/
data/savitzky_golay.py for a pre-numpy implementation).
#!python
def savitzky_golay(y, window_size, order, deriv=0, rate=1):
r"""Smooth (and optionally differentiate) data with a Savitzky-Golay filter.
The Savitzky-Golay filter removes high frequency noise from data.
It has the advantage of preserving the original shape and
features of the signal better than other types of filtering
approaches, such as moving averages techniques.
y : array_like, shape (N,)
the values of the time history of the signal.
window_size : int
the length of the window. Must be an odd integer number.
order : int
the order of the polynomial used in the filtering.
try:
window_size = np.abs(np.int(window_size))
order = np.abs(np.int(order))
except ValueError, msg:
raise ValueError("window_size and order have to be of type int")
if window_size % 2 != 1 or window_size < 1:
raise TypeError("window_size size must be a positive odd number")
if window_size < order + 2:
raise TypeError("window_size is too small for the polynomials order")
order_range = range(order+1)
half_window = (window_size -1) // 2
# precompute coefficients
b = np.mat([[k**i for i in order_range] for k in range(-half_window, half_
˓→window+1)])
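    # (The remainder of the function is not shown in this excerpt: the pseudo-
    #  inverse of b gives the filter coefficients, the signal is padded at both
    #  ends with its mirror image, and the coefficients are convolved with the
    #  padded signal. Reconstructed here as a sketch of the standard recipe.)
    from math import factorial
    m = np.linalg.pinv(b).A[deriv] * rate**deriv * factorial(deriv)
    # pad the signal at the extremes with values taken from the signal itself
    firstvals = y[0] - np.abs(y[1:half_window+1][::-1] - y[0])
    lastvals = y[-1] + np.abs(y[-half_window-1:-1][::-1] - y[-1])
    y = np.concatenate((firstvals, y, lastvals))
    return np.convolve(m[::-1], y, mode='valid')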
In lines 61-62 the coefficients of the local least-square polynomial fit are pre-computed. These will be used later at
line 68, where they will be correlated with the signal. To prevent spurious results at the extremes of the data, the signal
is padded at both ends with its mirror image, (lines 65-67).
CD-spectrum of a protein. Black: raw data. Red: filter applied
One of the most popular applications of the S-G filter, apart from smoothing UV-VIS and IR spectra, is smoothing of
curves obtained in electroanalytical experiments. In cyclic voltammetry, voltage (being the abscissa) changes like a
triangle wave, and the signal has cusps at the turning points (at switching potentials) which should never be
smoothed. In this case, Savitzky-Golay smoothing should be done piecewise, i.e. separately on pieces monotonic in x:
#!python numbers=disable
def savitzky_golay_piecewise(xvals, data, kernel=11, order=4):
    turnpoint = 0
    last = len(xvals)
    if xvals[1] > xvals[0]:  # x is increasing?
        for i in range(1, last):  # yes
            if xvals[i] < xvals[i-1]:  # search where x starts to fall
                turnpoint = i
                break
    else:  # no, x is decreasing
        for i in range(1, last):  # search where it starts to rise
            if xvals[i] > xvals[i-1]:
                turnpoint = i
                break
    if turnpoint == 0:  # no change in direction of x
        return savitzky_golay(data, kernel, order)
    else:
        # smooth the first piece
        firstpart = savitzky_golay(data[0:turnpoint], kernel, order)
        # recursively smooth the rest
        rest = savitzky_golay_piecewise(xvals[turnpoint:], data[turnpoint:], kernel, order)
        return numpy.concatenate((firstpart, rest))
Savitzky-Golay filters can also be used to smooth two dimensional data affected by noise. The algorithm is exactly
the same as for the one dimensional case, only the math is a bit more tricky. The basic algorithm is as follows:
1. for each point of the two dimensional matrix extract a sub-matrix, centered at that point and with a size equal to an odd number "window_size".
2. for this sub-matrix compute a least-square fit of a polynomial surface, defined as p(x,y) = a0 + a1*x + a2*y + a3*x^2 + a4*y^2 + a5*x*y + ... . Note that x and y are equal to zero at the central point.
3. replace the initial central point with the value computed with the fit.
Note that because the fit coefficients are linear with respect to the data spacing, they can be pre-computed for efficiency.
Moreover, it is important to appropriately pad the borders of the data with a mirror image of the data itself, so that the
evaluation of the fit at the borders of the data can happen smoothly.
Here is the code for two dimensional filtering.
#!python numbers=enable
def sgolay2d(z, window_size, order, derivative=None):
    """
    """
    # number of terms in the polynomial expression
    n_terms = (order + 1) * (order + 2) / 2.0
    if window_size % 2 == 0:
        raise ValueError('window_size must be odd')
    half_size = window_size // 2
    # coordinates of points
    ind = np.arange(-half_size, half_size+1, dtype=np.float64)
    dx = np.repeat(ind, window_size)
    dy = np.tile(ind, [window_size, 1]).reshape(window_size**2, )
    # bottom band
    band = z[-1, :]
    Z[-half_size:, half_size:-half_size] = band + np.abs(np.flipud(z[-half_size-1:-1, :]) - band)
    # left band
    band = np.tile(z[:, 0].reshape(-1, 1), [1, half_size])
    Z[half_size:-half_size, :half_size] = band - np.abs(np.fliplr(z[:, 1:half_size+1]) - band)
    # right band
    band = np.tile(z[:, -1].reshape(-1, 1), [1, half_size])
    Z[half_size:-half_size, -half_size:] = band + np.abs(np.fliplr(z[:, -half_size-1:-1]) - band)
    # central band
    Z[half_size:-half_size, half_size:-half_size] = z
Here is a demo
#!python numbers=enable
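# (Creation of the test surface is not shown in this excerpt; a simple assumed
#  example on a 100x100 grid.)
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))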
# add noise
Zn = Z + np.random.normal( 0, 0.2, Z.shape )
# filter it
Zf = sgolay2d( Zn, window_size=29, order=4)
# do some plotting
matshow(Z)
matshow(Zn)
matshow(Zf)
Attachments
• Original+noise+filtered.pdf
• Original+noise.pdf
• Original.pdf
• cd_spec.png
Smoothing of a 1D signal
This method is based on the convolution of a scaled window with the signal. The signal is prepared by introducing
reflected window-length copies of the signal at both ends so that boundary effects are minimized in the beginning and
end parts of the output signal.
import numpy
def smooth(x,window_len=11,window='hanning'):
    """smooth the data using a window with requested size.

    This method is based on the convolution of a scaled window with the signal.
    The signal is prepared by introducing reflected copies of the signal
    (with the window size) in both ends so that transient parts are minimized
    in the beginning and end part of the output signal.

    input:
        x: the input signal
        window_len: the dimension of the smoothing window; should be an odd integer
        window: the type of window from 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'

    output:
        the smoothed signal

    example:
        t = linspace(-2,2,100)
        x = sin(t)+randn(len(t))*0.1
        y = smooth(x)

    see also:
        numpy.hanning, numpy.hamming, numpy.bartlett, numpy.blackman, numpy.convolve
        scipy.signal.lfilter

    TODO: the window parameter could be the window itself if an array instead of a string
    NOTE: length(output) != length(input), to correct this: return y[(window_len/2-1):-(window_len/2)] instead of just y.
    """
    if x.ndim != 1:
        raise ValueError, "smooth only accepts 1 dimension arrays."
    if window_len < 3:
        return x
    s = numpy.r_[x[window_len-1:0:-1], x, x[-1:-window_len:-1]]
    #print(len(s))
    if window == 'flat':  # moving average
        w = numpy.ones(window_len, 'd')
    else:
        w = eval('numpy.'+window+'(window_len)')
    y = numpy.convolve(w/w.sum(), s, mode='valid')
    return y
from pylab import *   # plotting helpers and window functions used by the demo (assumed import, not shown in this excerpt)

def smooth_demo():
    # names of the windows understood by smooth(); 'flat' is a moving average
    windows = ['flat', 'hanning', 'hamming', 'bartlett', 'blackman']

    t = linspace(-4,4,100)
    x = sin(t)
    xn = x + randn(len(t))*0.1
    y = smooth(x)

    ws = 31
    subplot(211)
    plot(ones(ws))
    hold(True)
    for w in windows[1:]:
        eval('plot('+w+'(ws) )')
    axis([0,30,0,1.1])
    legend(windows)
    title("The smoothing windows")

    subplot(212)
    plot(x)
    plot(xn)
    for w in windows:
        plot(smooth(xn,10,w))
    l = ['original signal', 'signal with noise']
    l.extend(windows)
    legend(l)
    title("Smoothing a noisy signal")
    show()

if __name__=='__main__':
    smooth_demo()
Convolving a noisy image with a gaussian kernel (or any bell-shaped curve) blurs the noise out and leaves the low-frequency details of the image intact.
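The helper used below is not defined in this excerpt; a minimal sketch consistent with the call blur_image(Z, 3), using a normalized Gaussian kernel and scipy.signal.convolve:

#!python
import numpy as np
from scipy import signal

def gauss_kern(size, sizey=None):
    """Return a normalized 2D Gaussian kernel for convolutions."""
    size = int(size)
    sizey = int(sizey) if sizey else size
    x, y = np.mgrid[-size:size+1, -sizey:sizey+1]
    g = np.exp(-(x**2/float(size) + y**2/float(sizey)))
    return g / g.sum()

def blur_image(im, n, ny=None):
    """Blur the image by convolving it with a Gaussian kernel of typical size n."""
    g = gauss_kern(n, sizey=ny)
    return signal.convolve(im, g, mode='valid')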
X, Y = mgrid[-70:70, -70:70]
Z = cos((X**2+Y**2)/200.)+ random.normal(size=X.shape)
blur_image(Z, 3)
The attachment cookb_signalsmooth.py contains a version of this script with some stylistic cleanup.
See also ["Cookbook/FiltFilt"], which can be used to smooth the data by low-pass filtering and does not delay the
signal (as this smoother does).
Section author: Unknown[1], GaelVaroquaux, Unknown[143], Unknown[144], Unknown[145], Unknown[146], Un-
known[147], Unknown[148], WesTurner
Attachments
• convolved.png
• cookb_signalsmooth.py
• noisy.png
• smoothsignal.jpg
Outdated
NOTE: This entry was last updated 2006-11-04 and contains information that is not relevant today (as of 2013).
Here is presented !NumInd (Numerical Independent Extension), an example of a class written in Pyrex. This class
can wrap information from any object coming from Numeric, numarray or !NumPy without any dependency on those
packages for compiling the extension. It lets you create a uniform interface in both Python and C spaces. In addition,
you can personalize it by adding new methods or properties.
For this extension to work, you need a numerical package that supports the array interface. Any of these versions
would be good enough:
* NumPy (all versions)
* Numeric (>=24.2)
* numarray (>=1.5.1)
The !NumInd class shown below takes a Numeric/numarray/!NumPy object and creates another object that can be
accessed in a uniform way from both Python and Pyrex (and hence, C) space. Moreover, it exposes an array interface
so that you can re-wrap this object with any of Numeric/numarray/!NumPy. All of these features are achieved without
actually copying the data itself. This opens the door to developing applications that support the
Numeric/numarray/!NumPy triad without needing to compile against any of them.
Warning: This class mainly supports homogeneous datasets, but it wouldn't be difficult to support recarrays as well.
This is a work in progress anyway.
# This Pyrex extension class can take a numpy/numarray/Numeric object
# as a parameter and wrap it so that its information can be accessed
# in a standard way, both in Python space and C space.
#
# Heavily based on an idea of Andrew Straw. See
# http://www.scipy.org/Cookbook/ArrayStruct_and_Pyrex
# Very inspiring! :-)
#
# First version: 2006-03-25
# Last update: 2006-03-25
# Author: Francesc Altet
import sys
# for PyArrayInterface:
CONTIGUOUS=0x01
FORTRAN=0x02
ALIGNED=0x100
NOTSWAPPED=0x200
WRITEABLE=0x400
# byteorder dictionary
byteorder = {'<':'little', '>':'big'}
self._nd = len(array_shape)
# The shape
self._shape = <Py_intptr_t *>malloc(self._nd*sizeof(Py_intptr_t))
for i from 0 <= i < self._nd:
self._shape[i] = self._t_shape[i]
# The strides (compute them if needed)
array_strides = undarray.__array_strides__
self._t_strides = array_strides
self._strides = <Py_intptr_t *>malloc(self._nd*sizeof(Py_intptr_t))
if array_strides:
for i from 0 <= i < self._nd:
self._strides[i] = array_strides[i]
else:
# strides is None. Compute them explicitly.
self._t_strides = [0] * self._nd
stride = int(self.typestr[2:])
for i from self._nd > i >= 0:
self._strides[i] = stride
self._t_strides[i] = stride
stride = stride * array_shape[i]
self._t_strides = tuple(self._t_strides)
# Populate the C array interface
self.inter = self._get_array_interface()
property shape:
def __get__(self):
return self._t_shape
property strides:
def __get__(self):
return self._t_strides
property typestr:
def __get__(self):
return self._undarray.__array_typestr__
property readonly:
def __get__(self):
return self._undarray.__array_data__[1]
property __array_struct__:
"Allows other numerical packages to obtain a new object."
def __get__(self):
if hasattr(self._undarray, "__array_struct__"):
return self._undarray.__array_struct__
else:
# Not an underlying array with __array_struct__
# Deliver an equivalent PyCObject.
Py_INCREF(self)
return PyCObject_FromVoidPtrAndDesc(<void*>self.inter,
<void*>self,
free_array_interface)
undarray = self._undarray
typestr = self.typestr
inter = <PyArrayInterface *>malloc(sizeof(PyArrayInterface))
if inter is NULL:
raise MemoryError()
inter.version = 2
inter.nd = self._nd
inter.typekind = ord(typestr[1])
inter.itemsize = int(typestr[2:])
inter.flags = 0 # initialize flags
if typestr[0] == '|':
inter.flags = inter.flags | NOTSWAPPED
elif byteorder[typestr[0]] == sys.byteorder:
inter.flags = inter.flags | NOTSWAPPED
if not self.readonly:
inter.flags = inter.flags | WRITEABLE
# XXX how to determine the ALIGNED flag?
inter.strides = self._strides
inter.shape = self._shape
# Get the data address
data_address = int(undarray.__array_data__[0], 16)
inter.data = <void*>PyInt_AsLong(data_address)
return inter
def __dealloc__(self):
free(self._shape)
free(self._strides)
free(self.inter)
In order to get an idea of what the above extension offers, try to run this script against the !NumInd extension:
import Numeric
import numarray
import numpy
import numind
nu = Numeric.arange(12)
nu.shape = (4,3)
na=numarray.arange(12)
na.shape = (4,3)
np=numpy.arange(12)
np.shape = (4,3)
Attachments
• test_output.txt
NOTE: this entry was last updated 2007-11-18, and may contain information that is not as relevant today (as of 2013).
Pyrex is a language for writing C extensions to Python. Its syntax is very similar to writing Python. A .pyx file is
compiled to a .c file, which is then compiled like a standard C extension module for Python. Many people find writing
extension modules with Pyrex preferable to writing them in C or using other tools, such as SWIG.
See http://numeric.scipy.org/ for an explanation of the interface. The following packages support the interface:
* !NumPy (all versions)
* Numeric (>=24.2)
* numarray (>=1.5.0)
Here is a Pyrex file which shows how to export its data using the array interface. This allows various Python types to
have a view of the data without actually copying the data itself.
cdef extern from "Python.h":
ctypedef int Py_intptr_t
void Py_INCREF(object)
void Py_DECREF(object)
object PyCObject_FromVoidPtrAndDesc( void* cobj, void* desc, void (*destr)(void *,
˓→ void *))
# for PyArrayInterface:
CONTIGUOUS=0x01
FORTRAN=0x02
ALIGNED=0x100
NOTSWAPPED=0x200
WRITEABLE=0x400
inter = <PyArrayInterface*>ptr
arrpy = <object>arr
Py_DECREF( arrpy )
free(inter)
self.shape[0]=height
self.shape[1]=width
def __dealloc__(self):
free(self.data)
property __array_struct__:
def __get__(self):
cdef PyArrayInterface* inter
nd = 2
inter.version = 2
inter.nd = nd
inter.typekind = 'u'[0] # unsigned int
inter.itemsize = 1
inter.flags = NOTSWAPPED | ALIGNED | WRITEABLE
inter.strides = self.strides
inter.shape = self.shape
inter.data = self.data
Py_INCREF(self)
obj = PyCObject_FromVoidPtrAndDesc( <void*>inter, <void*>self, free_array_
˓→interface)
return obj
One example is the get_data_copy() function of _cam_iface_shm.pyx in the motmot camera utilities software.
In this example, an image is copied from shared memory into an externally malloced buffer supporting the
__array_struct__ interface. (The shared memory stuff has only been tested on linux, but the rest should work
anywhere.)
• [”Cookbook/Pyrex_and_NumPy”]
Section author: AndrewStraw, DavidLinke, Unknown[25]
Data Frames
NOTE: this entry is outdated. You will likely want to use pandas instead.
The DataFrame.py class, posted by Andrew Straw on the scipy-user mailing list (original link), is an extremely
useful tool for working with alphanumerical tabular data, as often found in databases. Some data which might be ingested
into a data frame could be:
ID | LOCATION       | VAL_1 | VAL_2
01 | Somewhere      | 0.1   | 0.6
02 | Somewhere Else | 0.2   | 0.5
03 | Elsewhere      | 0.3   | 0.4
The DataFrame.py class can be populated with data from a CSV file (comma-separated values). In its current
implementation, these files are read with Python's own csv module, which allows for a great deal of customisation.
A sample CSV file from Access2000 is in CSVSample.csv. We first import the module:
import DataFrame
df = DataFrame.read_csv("CSVSample.csv",dialect=DataFrame.access2000)
(note that the dialect is actually defined in the DataFrame class). It is often useful to filter the data according to some
criterion.
Starting with Python 2.6, the sets module is deprecated; in order to get rid of the warning, replace
import sets
with
try:
set
except NameError:
from sets import Set as set
Attachments
• CSVSample.csv
• DataFrame.py
• SampleCSV.csv
export DESTDIR=~/Mayavi2
It is also assumed that you download and uncompress all tarball sources in a specific directory, named src/, for
example.
Installing python2.3/python2.4
Then run:
Installing wx-Python2.6
Note: You should have GTK 2 installed, i.e. you should have the libgtk-2.6.* and libgtk2.6.*-dev packages installed.
Then run:
cd wxPython-src-2.6.3.2/ && ./configure --enable-unicode --with-opengl --prefix=$DESTDIR
or follow the instructions on the wx-Python2.6 webpage, creating a little script which runs the commands above
automatically. Then install all:
make install; make -C contrib/src/animate install ; make -C contrib/src/gizmos install; make -C contrib/src/stc install
and run:
./setup.py build_ext --inplace --debug UNICODE=1
Before installing !MayaVi2, you have to set some environment variables, to tell !MayaVi2 where python extensions
can be found.
Under sh shell-like, type:
export PYTHONPATH=$DESTDIR:$PYTHONPATH
export LD_LIBRARY_PATH=$DESTDIR:$LD_LIBRARY_PATH
+
+def _conv_type_shape(im):
+ shape = im.size[::-1]
+ typ, extra = _MODE_CONV[im.mode]
+ if extra is None:
+ return shape, typ
+ else:
+ return shape+(extra,), typ
+
+
+
+
MODES = _MODEINFO.keys()
MODES.sort()
@@ -491,6 +527,22 @@
return string.join(data, "")
##
+ # Returns the array_interface dictionary
+ #
+ # @return A dictionary with keys 'shape', 'typestr', 'data'
+
+ def __get_array_interface__(self):
+ new = {}
+ shape, typestr = _conv_type_shape(self)
+ new['shape'] = shape
+ new['typestr'] = typestr
+ new['data'] = self.tostring()
+ return new
+
+ __array_interface__ = property(__get_array_interface__, None, doc="array
˓→ interface")
+
+
+ ##
# Returns the image converted to an X11 bitmap. This method
# only works for mode "1" images.
#
@@ -1749,7 +1801,61 @@
+
##
+# (New in 1.1.6) Create an image memory from an object exporting
+# the array interface (using the buffer protocol).
+#
+# If obj is not contiguous, then the tostring method is called
+# and frombuffer is used
+#
+# @param obj Object with array interface
+# @param mode Mode to use (will be determined from type if None)
+
+def fromarray(obj, mode=None):
+ arr = obj.__array_interface__
+ shape = arr['shape']
+ ndim = len(shape)
+ try:
+ strides = arr['strides']
+ except KeyError:
+ strides = None
+ if mode is None:
+ typestr = arr['typestr']
+ if not (typestr[0] == '|' or typestr[0] == _ENDIAN or
+ typestr[1:] not in ['u1', 'b1', 'i4', 'f4']):
+ raise TypeError, "cannot handle data-type"
+ typestr = typestr[:2]
+ if typestr == 'i4':
+ mode = 'I'
+ elif typestr == 'f4':
+ mode = 'F'
+ elif typestr == 'b1':
+ mode = '1'
+ elif ndim == 2:
+ mode = 'L'
+ elif ndim == 3:
+ mode = 'RGB'
+ elif ndim == 4:
+ mode = 'RGBA'
+ else:
+ raise TypeError, "Do not understand data."
+ ndmax = 4
+ bad_dims=0
+ if mode in ['1','L','I','P','F']:
+ ndmax = 2
+ elif mode == 'RGB':
+ ndmax = 3
+ if ndim > ndmax:
+ raise ValueError, "Too many dimensions."
+
+ size = shape[:2][::-1]
+ if strides is not None:
+ obj = obj.tostring()
+
+ return frombuffer(mode, size, obj)
+
+##
# Opens and identifies the given image file.
# <p>
# This is a lazy operation; this function identifies the file, but the
Example
NOTE: The official documentation and important remarks from the developers can be found at the timeseries scikit
sourceforge page.
Question
In my original data, nodata values are marked with "-999". How can I import the data or create the time series and
exclude these no-data points from further processing? (flagging no data in timeseries -
http://permalink.gmane.org/gmane.comp.python.scientific.user/14455)
Answer
* Use indexing
#!python
myvalues_ts_hourly[myvalues_ts_hourly==-999] = M.masked
01.02.2004;14;655
01.02.2004;15;-999
01.02.2004;16;-999
01.02.2004;17;-999
01.02.2004;18;744
01.02.2004;19;445
01.02.2004;20;374
01.02.2004;21;168
01.02.2004;22;995
01.02.2004;23;943
01.02.2004;24;326
02.02.2004;1;83.98
02.02.2004;2;302.26
02.02.2004;3;310.76
02.02.2004;4;-999
02.02.2004;5;62.22
02.02.2004;6;85.34
02.02.2004;7;324.02
02.02.2004;8;53.04
02.02.2004;9;336.94
02.02.2004;10;189.38
02.02.2004;11;197.54
02.02.2004;12;120.36
02.02.2004;13;164.9
02.02.2004;14;222.7
02.02.2004;15;34.74
02.02.2004;16;85.34
02.02.2004;17;53.04
02.02.2004;18;252.96
02.02.2004;19;151.3
02.02.2004;20;-999
02.02.2004;21;57.12
02.02.2004;22;338.3
02.02.2004;23;320.62
02.02.2004;24;110.84
#!python
import numpy as N
import maskedarray as M
import timeseries as ts
data = N.loadtxt("tmp.txt", dtype='|S10', skiprows=2)
dates = ts.date_array([ts.Date(freq='H',string="%s %s:00" %
(d[0],int(d[1])-1))
for d in data],
freq='H')
series = ts.time_series(data[:,-1].astype(N.float_),
dates,
mask=(data[:,-1]=='-999'))
frequencies
Question
Is there an example data set for at least one year at a high temporal resolution: 15 min or at least 1 h? Having such a
common data set, one could set up tutorial examples and debug or ask questions more easily because all will have the same
(non-confidential) data on disk.
Answer
For hours, you have the 'hourly' frequency. For 15 min, you have the 'minutely' frequency, from which you can select
every 15th point.
(cf. Re: roadmap/plans for timeseries package - http://permalink.gmane.org/gmane.comp.python.scientific.user/1459)
This formatting may be the result of some logging devices which, for instance, record 5-minute averaged values taken
with a device using a sample rate of 16 s. Likewise, synthetically generated data sets created by scientific models or
from remote sensing information can have such a format. When creating a timeseries object, the start hour should be set
to zero (0) internally to achieve a correct assignment of the hours (01:00 h is the end of the period 00:00 h - 01:00 h
=> data for this period starts at 00:00 h). The output can be customized as shown in the answer below. The python
built-in module datetime can help here.
Question
I have hourly measurements where hour 1 represents the end of the period 0:00-1:00, 2 the end of the period 1:00-2:00,
... , 24 the end of the period 23:00 to 24:00.
When I plot these hourly time series from February to November, the curve is continued into December because of
this. timeseries then assumes that the value for hour 0:00 of Dec 01 is 0, which then leads to wrong plotting
behaviour.
I want to achieve that hour 24 is accounted as the last measurement period of a day and not as the first measurement
of the next day (like 0:00).
Answer
Since the time “24:00” doesn’t actually exist (as far as I am aware anyway), you will have to rely on somewhat of a
hack to get your desired output. Try this:
#!python
import timeseries as ts
series = ts.time_series(range(400, 430), start_date=ts.now('hourly'))
hours = ts.time_series(series.hour + 1, dates=series.dates)
hour_fmtfunc = lambda x : '%i:00' % x
ts.Report(hours, series, datefmt='%d-%b-%Y', delim=' ', fmtfunc=[None, hour_fmtfunc])()
Question
One has to get rainfall intensity during the early morning hours. For such a filter, the information on the corresponding
hours is necessary.
Answer
import timeseries as ts
data = ts.time_series(range(100), start_date=ts.today('hourly'))
hours = data.hour
filtered_data = data[(hours < 7) & (hours > 3)]
filtered_data
timeseries([80 6 7 8 30 31 32 54 55 56 78 79],
dates = [07-Jan-2008 04:00 07-Jan-2008 05:00 07-Jan-2008 06:00
08-Jan-2008 04:00 08-Jan-2008 05:00 08-Jan-2008 06:00 09-Jan-2008 04:00
09-Jan-2008 05:00 09-Jan-2008 06:00 10-Jan-2008 04:00 10-Jan-2008 05:00
10-Jan-2008 06:00],
freq = H)
Question
How can one save the result of time series operations into an array?
For instance, if I convert data at an hourly frequency to daily averages, how do I read the daily averages into an array for
further processing?
When I print out my daily timeseries converted from hourly data I get something like this:
#!python
In: myvalues_ts_daily
Out:
timeseries([ 1.4 89.4 3.5 ..., 11.5 1.6
0. ],
dates = [01-Dec-2006 01-Feb-1995 ...],
freq = D)
What I would like is an array with just the values of the daily averages. Additionally, a report-like output with the
format:
day value
1 3
2 11
Answer
> For instance, if I convert data at an hourly frequency to daily averages, how do I read the daily averages into an
array for further processing?
1. Possibility #1: use the keyword func while converting.
2. Possibility #2: if you don't use the keyword func, you end up with a 2d array, each row being a day, each column an hour. Just use maskedarray.mean on each row: avgdata = convert(data,'D').mean(-1)
If you only want the values, use the .series attribute; it will give you a view of the array as a MaskedArray.
Plotting
Word of caution: the timeseries plotting stuff does not currently support frequencies higher than daily (e.g. hourly,
minutely, etc.). Support for these frequencies could be added without too much trouble, but nobody has got around to it
yet. (cf. http://permalink.gmane.org/gmane.comp.python.scientific.user/14598)
todo
* Use one data set consistently for the examples
* offer the code for download
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
To use the default endian and size settings, one can just do::
>>> f = FortranFile('filename')
>>> x = f.readReals()
import struct
import numpy
class FortranFile(file):
    """File with methods for dealing with fortran unformatted data files"""

    def _get_header_length(self):
        return struct.calcsize(self._header_prec)
    _header_length = property(fget=_get_header_length)

    def _set_endian(self,c):
        """Set endian to big (c='>') or little (c='<') or native (c='@')

        :Parameters:
          `c` : string
            The endian-ness to use when reading from this file.
        """
        if c in '<>@=':
            self._endian = c
        else:
            raise ValueError('Cannot set endian-ness')

    def _get_endian(self):
        return self._endian

    ENDIAN = property(fset=_set_endian,
                      fget=_get_endian,
                      doc="Possible endian values are '<', '>', '@', '='"
                      )
"""
file.__init__(self, fname, *args, **kwargs)
self.ENDIAN = endian
self.HEADER_PREC = header_prec
    def _read_check(self):
        return struct.unpack(self.ENDIAN+self.HEADER_PREC,
                             self._read_exactly(self._header_length)
                             )[0]

    def readRecord(self):
        """Read a single fortran record"""
        l = self._read_check()
        data_str = self._read_exactly(l)
        check_size = self._read_check()
        if check_size != l:
            raise IOError('Error reading record from data file')
        return data_str

    def writeRecord(self,s):
        """Write a record with the given bytes."""
        length_bytes = len(s)
        self._write_check(length_bytes)
        self.write(s)
        self._write_check(length_bytes)

    def readString(self):
        """Read a string."""
        return self.readRecord()

    def writeString(self,s):
        """Write a string"""
        self.writeRecord(s)
_real_precisions = 'df'
"""
data_str = self.readRecord()
num = len(data_str)/struct.calcsize(prec)
numbers =struct.unpack(self.ENDIAN+str(num)+prec,data_str)
return numpy.array(numbers, dtype=_numpy_precisions[prec])
reals : array
Data to write
prec` : string
Character code for the precision to use in writing
"""
if prec not in self._real_precisions:
raise ValueError('Not an appropriate precision')
_int_precisions = 'hilq'
"""
if prec not in self._int_precisions:
raise ValueError('Not an appropriate precision')
data_str = self.readRecord()
num = len(data_str)/struct.calcsize(prec)
return numpy.array(struct.unpack(self.ENDIAN+str(num)+prec,data_str))
reals : array
Data to write
prec : string
Character code for the precision to use in writing
"""
if prec not in self._int_precisions:
raise ValueError('Not an appropriate precision')
dbase
import sys
sys.path.append('attachments/dbase')
import dbase
After running the class you can load the example data using
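The command itself is missing from this excerpt; a plausible call, assuming the dbase constructor takes the file name and the index of the date column (as the next sentence suggests):

#!python
data = dbase.dbase("attachments/dbase/data.csv", date=0)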
In the above command ‘0’ is the index of the column containing dates.
You can plot series ‘b’ and ‘c’ in the file using
data.dataplot('b','c')
You get descriptive statistics for series ‘a’,’b’, and ‘c’ by using
data.info('a','b','c')
file: /mnt/data/pauli/prj/scipy/SciPy-CookBook/ipython/attachments/dbase/data.csv
# obs: 100
# variables: 3
Start date: 08 Jan 2001
End date: 02 Dec 2002
Attachments
• data.csv
• dbase.py
• dbase_pydoc.0.2.txt
• ex_plot.0.1.png
xplt
This shows a simple example of how to create a quick 3-d surface visualization using xplt.
x,y = ogrid[-12:12:50j,-12:12:50j]
r = sqrt(x**2+y**2)
z = special.j0(r)
xplt.surf(z,x,y,shade=1,palette='heat')
Attachments
• surface.png
Data Acquisition with NIDAQmx Data acquisition with PyUL Fortran I/O Formats Input and output LAS reader Reading SPE file from CCD camera Reading mat files hdf5 in Matlab

C Extensions for Using NumPy Arrays C extensions Compiling Extension Modules on Windows using mingw Ctypes F2py Inline Weave With Basic Array Conversion (no Blitz) Pyrex And NumPy SWIG Numpy examples SWIG and Numpy SWIG memory deallocation f2py and numpy

Interpolation

Linear Algebra

Matplotlib / 3D Plotting

Embedding In Wx Embedding in Traits GUI Matplotlib: drag’n’drop text example Matplotlib: pyside Matplotlib: scrolling plot

Matplotlib / Misc

Load image Matplotlib: adjusting image size Matplotlib: compiling matplotlib on solaris10 Matplotlib: deleting an existing data series Matplotlib: django Matplotlib: interactive plotting Matplotlib: matplotlib and zope Matplotlib: multiple subplots with one axis label Matplotlib: qt with ipython and designer Matplotlib: using matplotlib in a CGI script

Matplotlib: colormap transformations Matplotlib: converting a matrix to a raster image Matplotlib: gridding irregularly spaced data Matplotlib: loading a colormap dynamically Matplotlib: plotting images with special values Matplotlib: show colormaps

Histograms Matplotlib: animations Matplotlib: arrows Matplotlib: bar charts Matplotlib: custom log labels Matplotlib: hint on diagrams Matplotlib: legend Matplotlib: maps Matplotlib: multicolored line Matplotlib: multiline plots Matplotlib: plotting values with masked arrays Matplotlib: shaded regions Matplotlib: sigmoidal functions Matplotlib: thick axes Matplotlib: transformations Matplotlib: treemap Matplotlib: unfilled histograms

Matplotlib / Typesetting

Mayavi

Mayabi: mlab Mayavi Mayavi surf Mayavi tips Mayavi tvtk Mayavi: examples Mayavi: running mayavi 2 Scripting Mayavi 2 Scripting Mayavi 2: basic modules Scripting Mayavi 2: filters Scripting Mayavi 2: main modules

Numpy

Addressing Array Columns by Name Building arrays Convolution-like operations Indexing numpy arrays MetaArray Multidot Object arrays using record arrays Stride tricks for the Game of Life Views versus copies in NumPy accumarray like function

Fitting data Large-scale bundle adjustment in scipy Least squares circle Linear regression OLS Optimization and fit demo Optimization demo RANSAC Robust nonlinear regression in scipy

Coupled spring-mass system Korteweg de Vries equation Matplotlib: lotka volterra tutorial Modeling a Zombie Apocalypse Solving a discrete boundary-value problem in scipy Theoretical ecology: Hastings and Powell

Other examples

Brownian Motion Correlated Random Samples Easy multithreading Eye Diagram Finding the Convex Hull of a 2-D Dataset Finding the minimum point in the convex hull of a finite set of points KDTree example Line Integral Convolution Linear classification Particle filter Reading custom text files with Pyparsing Rebinning Solving large Markov Chains Watershed

Performance

A beginners guide to using Python for performance computing Parallel Programming with numpy and scipy

Root finding

Scientific GUIs

Scientific Scripts

Signal processing

Applying a FIR filter Butterworth Bandpass Communication theory FIR filter Filtfilt Frequency swept signals Kalman filtering Savitzky Golay Filtering Smoothing of a 1D signal

Outdated

A numerical agnostic pyrex class Array, struct, and Pyrex Data Frames Mayavi: Install python stuff from source Python Imaging Library Recipes for timeseries The FortranFile class dbase xplt