SciPy Programming Succinctly
By James McCaffrey
Copyright © 2016 by Syncfusion, Inc.
If you obtained this book from any other source, please register and download a free copy from
www.syncfusion.com.
The authors and copyright holders provide absolutely no warranty for any information provided.
The authors and copyright holders shall not be liable for any claim, damages, or any other
liability arising from, out of, or in connection with the information in this book.
Please do not use this book if the listed terms are unacceptable.
The Story behind the Succinctly Series
of Books
Daniel Jebaraj, Vice President
Syncfusion, Inc.
Staying on the cutting edge
As many of you may know, Syncfusion is a provider of software components for the Microsoft platform. This puts us in the exciting but challenging position of always being on the cutting edge.
Whenever platforms or tools are shipping out of Microsoft, which seems to be about
every other week these days, we have to educate ourselves, quickly.
While more information is becoming available on the Internet and more and more books are
being published, even on topics that are relatively new, one aspect that continues to inhibit us is
the inability to find concise technology overview books.
We are usually faced with two options: read several 500+ page books or scour the web for
relevant blog posts and other articles. Just as everyone else who has a job to do and customers
to serve, we find this quite frustrating.
We firmly believe, given the background knowledge such developers have, that most topics can
be translated into books that are between 50 and 100 pages.
This is exactly what we resolved to accomplish with the Succinctly series. Isn’t everything
wonderful born out of a deep desire to change things for the better?
Free forever
Syncfusion will be working to produce books on several topics. The books will always be free.
Any updates we publish will also be free.
As a component vendor, our unique claim has always been that we offer deeper and broader
frameworks than anyone else on the market. Developer education greatly helps us market and
sell against competing vendors who promise to “enable AJAX support with one click,” or “turn
the moon to cheese!”
We sincerely hope you enjoy reading this book and that it helps you better understand the topic
of study. Thank you for reading.
About the Author
James McCaffrey works for Microsoft Research in Redmond, WA. He holds a B.A. in
psychology from the University of California at Irvine, a B.A. in applied mathematics from
California State University at Fullerton, an M.S. in information systems from Hawaii Pacific
University, and a doctorate from the University of Southern California. James enjoys exploring
all forms of activity that involve human interaction and combinatorial mathematics, such as the
analysis of betting behavior associated with professional sports, machine learning algorithms,
and data mining.
Chapter 1 Getting Started
The SciPy library (Scientific Python, pronounced "sigh-pie") is an open source extension to the
Python language. When Python was first released in 1991, the language omitted an array data
structure by design. It quickly became apparent that an array type and functions that operate on
arrays would be needed for numeric and scientific computing.
The SciPy stack has three components: Python, NumPy, and SciPy. The Python language has
basic features, such as loop control statements and a general purpose list data structure. The
NumPy library (Numerical Python) has array and matrix data structures plus some relatively
simple functions such as array search. The SciPy library, which requires NumPy, has many
intermediate and advanced functions that work with arrays and matrices. There is some overlap
between SciPy and NumPy, meaning there are some functions that are in both libraries.
When SciPy was first released in 2001, it contained built-in array and matrix types. In 2006, the
array and matrix functionality from SciPy was moved into a newly created NumPy library so that
programmers who needed just an array type didn't have to import the entire SciPy library.
Because of this dependency, the term SciPy is often used to refer to the combination of the SciPy and NumPy libraries.
This e-book makes no assumptions about your background or experience. Even if you have no
Python programming experience at all, you should be able to follow along with a bit of effort.
Each section of this e-book presents a complete demo program. Every programmer I know,
including me, learns how to program in a new language by getting an example program up and
running, and then experimenting by making changes. So if you want to learn SciPy, copy and
paste the source code from the demo programs, run the programs, and then fiddle with the
programs. Find the code samples in Syncfusion’s Bitbucket repository.
The approach I take in this e-book is not to present hundreds of one-line SciPy examples.
Instead, I've tried to pick key examples that give you the knowledge you need to learn SciPy
quickly. For example, section 5.4 explains how the normal() function generates random
values. Once you understand the normal() function, you can easily figure out how to use the
35 other distribution functions, such as the poisson() and exponential() functions.
In my opinion, the most difficult part of learning any programming language or technology is
getting a first program to run. After that, it's just details. But getting started can be frustrating.
The purpose of this first chapter is to make sure you can install SciPy and run a program.
In section 1.1, you'll learn how to install the software required to access the SciPy library. In
particular, you'll see how to install the Anaconda distribution, which includes Python, SciPy,
NumPy, and many related and useful packages. You'll also learn how to install SciPy separately
if you have an existing instance of Python installed. In section 1.2, you'll learn how to edit and
execute Python programs that use the SciPy and NumPy libraries. In section 1.3, you'll learn a
bit about program structure and style when using SciPy and NumPy. Section 1.4 presents a
quick reference for NumPy and SciPy.
1.1 Installing SciPy and NumPy
It's no secret that the best way to learn a programming language, library, or technology is to use
it. Unlike the installation process for many Python libraries, installing SciPy is not trivial. Briefly,
the crux of the difficulty is that SciPy and NumPy contain hooks to C language routines.
It is possible to first install Python, and then install the SciPy and NumPy packages separately
from source code using the pip (PIP Installs Packages) utility program, but this approach can be
troublesome. I recommend that you either use the Anaconda distribution bundle or, if you install
Python, NumPy, and SciPy separately, that you use a pre-built binary installer for NumPy and
SciPy.
Note: The terms package, module, and library have different meanings but are often used
more or less interchangeably.
There are several advantages to using Anaconda. There are binary installers for Windows, OS
X, and Linux. The distribution comes with Python, NumPy, and SciPy, as well as many other
related packages. This means you have one relatively easy installation procedure. The
distribution comes with the conda open source package and environment manager, which
means you can work with multiple versions of Python. In other words, even if you already have
Python installed, Anaconda will let you work with a new Python + SciPy installation without
resource conflicts. Anaconda also comes with two nice Python editors, IDLE and Spyder.
The open source Anaconda distribution is maintained by the Continuum Analytics company at
http://www.continuum.io/. Let's walk through the installation process, step by step. I'll show you
screenshots for a Windows installation, but you should have little trouble installing on OS X or
any flavor of Linux. First, use your web browser of choice to go to the Continuum Analytics site,
and then locate the download link and click on it.
Next, locate the link to your appropriate operating system and click on it.
At this point you must choose between Python version 2.x and Python version 3.x. If you're new
to Python, the essential point is that the two versions are not fully compatible. Python users can
have strong opinions about which Python version they prefer, but for use with SciPy, I
recommend using Python 2.7 in order to maintain compatibility with older functions.
After selecting the Python version, you should see a message asking if you want to save the
self-extracting executable installer, or if you want to run the installer immediately. You can do
either. I chose the Run option.
The installation process begins by displaying a welcome splash screen. Notice that the
Anaconda distribution number (2.4.1 in this case) is not the same as the Python version number
(2.7).
Figure 4: The Welcome Splash
After clicking Next, you'll be presented with a license agreement, which you can read if you're a
glutton for legal jargon punishment. Click I Agree.
Next, you'll have the option of installing for all users or just for the current user (presumably
you). I suggest using Anaconda's recommendation.
Then, you'll need to specify the installation root directory. With open source software such as
Python, it's normal to install programs in a directory located off drive C rather than in the
C:\Program Files directory. I recommend installing at C:\Anaconda2.
Figure 7: The Installation Directory
Next, you'll get an option to add the locations of the Anaconda executables to the System PATH
variable, and an option to register the Anaconda Python as the default. Select both check boxes
and click Install.
Figure 8: The PATH and Integration Options
You'll see a progress bar during the installation process. Notice that NumPy and SciPy are
included in the installation components.
When the installation is complete, you'll see an "Installation Complete" message. If there are
any errors during the installation, they'll appear here. If so, you can read the error messages, fix
whatever is wrong, delete the root installation directory, and try again.
Figure 10: Installation Completed
After you click Next, you'll see a final completion confirmation message. You can click Finish.
The last step of the Anaconda installation process is to verify that your installation is working.
First, verify that Python is up and running. Launch a command shell and navigate to the root
directory by entering a cd \ command. Then type the command:
C:\> python --version
If Python responds with information about a version, then Python is almost certainly installed
correctly, but you should now verify this by executing a Python statement. At the command
prompt, enter the command python (I've included a space after the prompt for readability):
C:\> python
This will start the Python interpreter, which will be indicated by the three greater-than characters
prompt. Instruct Python to print a traditional first message:
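For example (any message will do):

>>> print "Hello from Python"
Hello from Python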
Finally, verify that NumPy is installed correctly by creating and manipulating an array. Enter the
following commands at the Python prompt:
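The session looks something like this:

>>> import numpy as np
>>> a = np.array([4, 6, 8])
>>> print a
[4 6 8]
>>> print type(a)
<type 'numpy.ndarray'>
>>> a[0] = 7
>>> print a
[7 6 8]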
The import statement brings the NumPy library into scope so it can be used. The np alias could
have been something else, but np is customary and good.
The statement a = np.array([4, 6, 8]) creates an array named a with three cells with
integer values 4, 6, and 8. The Python type() function tells you that a is, in fact, an array
(technically an ndarray, which stands for an n-dimensional array).
The statement a[0] = 7 sets the value in the first cell of the array to 7, overwriting the original
value of 4. The point here is that NumPy arrays, like those in most languages, use 0-based
indexing. Congratulations! You have all the software you need to explore SciPy and NumPy.
If you prefer to install NumPy and SciPy separately rather than use Anaconda, you can use the pre-built binary installers hosted on SourceForge. That site has different versions of both NumPy and SciPy. Go into the directory of the version you wish to install. I recommend using a recent version that has the most downloads. Go into a
version directory, and then look for a file named something like numpy-1.10.2-win32-superpack-
python2.7.exe.
Make sure you have the version that corresponds to your Python version, then click on the link
and you'll get the option to run the installer.
You'll have the option to either run the installer program immediately, or save the installer so
you can run it later. I usually choose the Run option.
After you click Run, the installer will launch and present you with an installation wizard. Click
Next.
The installer should find your existing Python installation and recommend an installation
directory for the NumPy library.
Figure 16: The NumPy Installer Finds Existing Python Installation
Click Next on the next few wizard pages and you'll complete the NumPy installation. You can
verify NumPy was installed by launching a Python shell and entering the command import
numpy. If no error message results, NumPy has been installed.
Now you can install the SciPy library from SourceForge using the exact same process.
Resources
1.2 Editing SciPy programs
Although Python and SciPy can be used interactively, for many scenarios you'll want to write
and execute a program (technically a script). If you have installed the Anaconda distribution, you
have three main ways to edit and execute a Python program. First, you can use any simple text
editor, such as Notepad, and execute from a command line. Second, you can edit and execute
programs using the IDLE (Integrated DeveLopment Environment) program. Third, you can edit
and execute using the Spyder program. I'll walk you through each approach.
# test.py
import numpy as np
import scipy as sp
a = np.array([2, 4, 6, 8])
print a
length = a.size # 4
a[length-1] = 9
print a
Launch Notepad and type or copy and paste the statements shown in Code Listing 1. Save the
program as test.py in any directory, such as C:\SciPy. If you use Notepad, be sure it doesn't add
an extra .txt extension to the file name.
Launch a Command Prompt (Windows) or command shell such as bash (Linux). Navigate to the
directory containing file test.py. Execute the program by entering the command:
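For example, if test.py was saved in C:\SciPy, the command and its output look like this:

C:\SciPy> python test.py
[2 4 6 8]
[2 4 6 9]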
Using Notepad as an editor and executing from a shell is simple and effective, but I recommend
using either IDLE or Spyder. The idle.bat launcher file is typically located by default in the
C:\Python27\Lib\idlelib directory. To start the IDLE program, launch a command shell, navigate
to the location of the .bat file if that directory is not in your PATH variable, and enter the
command idle.
This will start a special Python Shell as shown in the top part of Figure 19.
From the Python Shell menu bar, click File > New File. This will launch a similar-looking editor,
as shown in the bottom part of Figure 19. Now, type or copy and paste the program in Code
Listing 1 into the IDLE editor. Save the program as test.py in any convenient directory using File
> Save. Execute the program by clicking Run > Run Module, or pressing the F5 shortcut key.
Program output is displayed in the Python Shell window. Some experienced Python users
criticize IDLE for being too simple and lacking sophisticated editing and debugging features, but
I like IDLE a lot and it's my SciPy programming environment of choice in most situations.
The Anaconda distribution comes with the open source Spyder (Scientific PYthon Development
EnviRonment) program. To start Spyder, launch a command shell and enter:
> spyder
Type or copy and paste the program from Code Listing 1 into the Spyder editor window on the
left side. You can either save first using File > Save or execute immediately by clicking Run >
Run. Program output appears in the lower right window.
Resources
If you use Visual Studio, consider the Python Tools for Visual Studio (PTVS) plugin at
http://microsoft.github.io/PTVS/.
If you use the Eclipse IDE, you might want to take a look at the PyDev plugin at
http://www.pydev.org/.
1.3 Program structure
Because the Python language is so flexible, there are many ways to structure a program. Some
experienced Python programmers have strong opinions about what constitutes good Python
program structure. Other programmers, like me, believe that there's no single best program
structure suitable for all situations.
Take a look at the demo program in Code Listing 2. The program begins with comments
indicating the program file name and Python version. Because the Python 2.x and Python 3.x
versions are not fully compatible, it's a good idea to indicate which version your program is
using. If you are using Linux, you can optionally use a shebang like #!/usr/bin/env python
as the very first statement.
# structure.py
# Python 2.7

import numpy as np

def make_x(n):
  result = np.zeros((n,n))
  for i in xrange(n):
    for j in xrange(n):
      if i == j or (i + j == n-1):
        result[i,j] = 1.0
  return result

def main():
  print "\nBegin program structure demo \n"
  try:
    n = 5
    print "X matrix with size n = " + str(n) + " is "
    mx = make_x(n)
    print mx
    print ""

    n = -1
    print "X matrix with size n = " + str(n) + " is "
    mx = make_x(n)
    print mx
    print ""
  except Exception, e:
    print "Error: " + str(e)

  print "\nEnd demo \n"

if __name__ == "__main__":
  main()
C:\SciPy\Ch1> python structure.py
Begin program structure demo
End demo
Next, the demo program imports the NumPy library and assigns a short alias:
import numpy as np
This idiom is standard for NumPy and SciPy programming and I recommend that you use it
unless you have a specific reason for not doing so. Next, the demo creates a program-defined
function named make_x():
def make_x(n):
  result = np.zeros((n,n))
  for i in xrange(n):
    for j in xrange(n):
      if i == j or (i + j == n-1):
        result[i,j] = 1.0
  return result
The make_x() function accepts a matrix dimension parameter n (presumably an odd integer)
and returns a NumPy matrix with 1.0 values on the main diagonal (upper-left cell to lower-right
cell) and the minor diagonal, and 0.0 values elsewhere.
The demo uses an indentation of two spaces instead of the widely recommended four spaces. I
use two-space indentation throughout this e-book mostly to save space, but to be honest, I
prefer using two spaces, anyway.
The demo program defines a main() function that is the execution entry point:
def main():
  print "\nBegin program structure demo \n"
  # rest of calling statements here
  print "\nEnd demo \n"

if __name__ == "__main__":
  main()
The program-defined main() function is called using the __main__ mechanism (note: there are
two underscore characters before and after the word main). Defining a main() function has
several advantages compared to simply placing the program's calling statements after import
statements and function definitions.
The primary downside to using a main() function in your program is simply the extra time and
space it takes you to write the program. Throughout the rest of this e-book, I do not use a
main() function, just to save space.
By default, when the Python interpreter reads a source .py file, it will execute all statements in
the file. However, just before beginning execution, the interpreter sets a system __name__
variable to the value __main__ for the source file that started execution. The value of the
__name__ variable for any other module that is called is set to the name of the module.
In other words, the interpreter knows which program or module started execution. Statements guarded by if __name__ == "__main__" run only when their file is executed directly; they do not run when the file is imported by another module. This mechanism allows you to write Python code and then import that code into another module. In effect, this allows you to write library modules.
Additionally, by using a main() function, you can avoid program-defined variable and function
names clashing with Python system names and keywords. Finally, using a main() function
gives you more control over control flow if you use the try-except error handling mechanism.
The demo program uses double quote characters to delimit strings. Unlike some other
languages, Python recognizes no semantic difference between single quotes and double
quotes. In particular, Python does not have a character data type, so both "c" and 'c'
represent a string with a single character.
The demo program uses the try-except mechanism (that is, a try statement followed by an
except statement). Using try-except is particularly useful when you are writing new code, but
the downside is additional time and lines of code. The demo programs in the remainder of this
e-book do not use try-except in order to save space.
Resources
The more or less official Python style guide is PEP 0008 (Python Enhancements Proposal #8).
See https://www.python.org/dev/peps/pep-0008/.
Many Python programmers use the Google Python Style Guide. See
https://google.github.io/styleguide/pyguide.html.
For additional details about the Python try and except statements and error handling, see
https://docs.python.org/2/tutorial/errors.html.
For a discussion of the pros and cons of using a shebang in Linux environments, see
https://en.wikipedia.org/wiki/Shebang_(Unix).
1.4 Quick reference program
The program in Code Listing 3 is a quick reference for many of the NumPy and SciPy functions
and programming techniques that are presented in this e-book.
@staticmethod
def my_fact(n):                      # static method
  result = 1                         # iterative rather than recursive
  for i in xrange(1, n+1):           # recursion is supported in Python
    result *= i                      # but usually not a good idea
  return result

# ----------------------------------

def show_matrix(m, decimals):        # standalone function
  (rows, cols) = np.shape(m)         # matrix dimensions as a tuple
  for i in xrange(rows):             # traverse the matrix
    for j in xrange(cols):
      print ("%." + str(decimals) + "f") % m[i,j],  # print cell, no newline
    print ""

# ----------------------------------
arr_r = arr_s[::-1] # reverse: [3.0, 2.0, 1.0, 0.0]
Chapter 2 Arrays
Many of my colleagues, when they first started using Python, were surprised to learn that the
language does not have a built-in array data structure. The Python list data structure is versatile
enough to handle many programming scenarios, but for scientific and numeric programming,
arrays and matrices are needed. The most fundamental object in the SciPy and NumPy libraries
is the array data structure. The following screenshot shows you where this chapter is headed.
In section 2.1, you'll learn the most common ways to create and initialize NumPy arrays, and
learn about the various numeric data types supported by NumPy.
In section 2.2, you'll learn how to search an array for a target value using the where() function,
using the in keyword, and by using a program-defined function.
In section 2.3, you'll learn how to sort a NumPy array using the three different built-in sorting
algorithms (quicksort, merge sort, and heap sort). You'll also learn about the NumPy array
reference mechanism.
In section 2.4, you'll learn how to randomize an array using the NumPy shuffle() function and
how to randomize an array using a program-defined function and the Fisher-Yates algorithm.
Take a look at the demo program in Code Listing 4. After two preliminary print statements,
program execution begins by creating an array using hard-coded numeric values:
The NumPy array() function accepts a Python list (as indicated by the square brackets) and
returns an array containing the list values. Notice the decimal points. These tell the interpreter to
cast the cell values as float64, the default floating-point data type for arrays. Without the
decimals, the interpreter would cast the values to int32, the default integer type for arrays.
# arrays.py
# Python 2.7
import numpy as np
# =====
print "Creating array arr using np.array() and list with hard-coded values "
arr = np.array([1., 3., 5., 7., 9.]) # float64
dt = np.dtype(arr[0])
print "Cell element type is " + str(dt.name)
print ""
print "Creating int array arr using np.arange(9) "
arr = np.arange(9) # [0, 1, ... 8] # int32
print "Printing array arr using built-in print() "
print arr
print ""
cols = 4; dec = 0
print "Printing array arr using my_print() with cols=" + str(cols),
print "and dec=" + str(dec)
my_print(arr, cols, dec)
Creating array arr using np.array() and list with hard-coded values
Cell element type is float64
Printing array arr using built-in print()
[ 2. 2.6 3.2 3.8 4.4 5. ]
End demo
If you are creating an array and neither float64 nor int32 is appropriate, you can make the
data type explicit. For example:
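The statements below are illustrative:

arr = np.array([1.0, 3.0, 5.0], dtype=np.float32)   # 32-bit floats
arr = np.array([2, 4, 6], dtype=np.int64)            # 64-bit integers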
NumPy has four floating-point data types: float_, float16, float32, and float64. The
default floating-point type is float64: a signed value with an 11-bit exponent and a 52-bit
mantissa. NumPy also supports complex numbers.
NumPy has 11 integer data types, including int32, int64, and uint64. The default integer data
type is int32 (that is, a signed 32-bit integer with possible values between -2,147,483,648 and
+2,147,483,647).
You can also create arrays with string values and with Boolean values. For example:
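The statements below are illustrative:

s_arr = np.array(["ant", "bird", "cat"])     # array of strings
b_arr = np.array([True, False, True])        # array of Boolean values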
After creating the array, the demo displays the array values using the built-in print statement:
The Python 2.7 print statement is simple and effective for displaying NumPy arrays in most
situations. If you need to customize the output format, you can use the NumPy
set_printoptions() function or write a program-defined display function.
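For example, the precision argument of set_printoptions() controls the number of decimals displayed:

np.set_printoptions(precision=2)
arr = np.array([1.23456, 7.89012])
print arr      # displays something like [ 1.23  7.89]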
Next, the demo program creates and initializes an array using the NumPy arange() function:
arr = np.arange(9)
print "Printing array arr using built-in print() "
print arr # displays [0 1 2 3 4 5 6 7 8]
A call to arange(n) returns an int32 array with sequential values 0, 1, 2,… (n-1). Note that the
NumPy arange() function (the name stands for array-range, and is not a misspelling of the
word arrange) is quite different from the Python range() function, which returns a list of integer
values, and the Python xrange() function, which returns an iterator object that can be used to
traverse a list or an array.
Next, the demo program displays the array generated by the arange() function, using a
program-defined function named my_print():
cols = 4; dec = 0
print "Printing array arr using my_print() with cols=" + str(cols),
print "and dec=" + str(dec)
my_print(arr, cols, dec)
The custom function displays an array in a specified number of columns (4 here), using a
specified number of decimal places (0 here because the values are integers).
If you are new to Python, you might be puzzled by the trailing comma character after the first
print statement. This syntax is used to print without a newline character and is similar to the
C# Console.Write() method (as opposed to the WriteLine() method) or the Java
System.out.print() method (as opposed to the println() method).
The function first finds the number of cells in the array using the Python len() function. An
alternative is to use the more efficient NumPy size property:
n = arr.size
Note that size has no parentheses after it because it's a property, not a function. The
my_print() function iterates through the array using traditional array indexing:
for i in xrange(n):
Using this technique, a cell value in array arr is accessed as arr[i]. An alternative is to iterate
over the array like so:
for x in arr:
Here, x is a cell value. This technique is similar to using a "for-each" loop in other languages
such as C#. In most situations, I prefer using array indexing to "for-eaching" but most of my
colleagues prefer the "for x in arr" syntax.
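Putting these pieces together, a minimal my_print()-style function might look like this (the exact formatting details here are assumptions):

def my_print(arr, cols, dec):
  n = len(arr)                         # number of cells
  fmt = "%." + str(dec) + "f"          # e.g., "%.0f" when dec = 0
  for i in xrange(n):
    if i > 0 and i % cols == 0:        # start a new line every cols values
      print ""
    print fmt % arr[i],                # trailing comma suppresses the newline
  print ""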
Next, the demo program creates an array using the NumPy zeros() function:
arr = np.zeros(5)
print "Printing array arr using built-in print() "
print arr
Based on my experience, using the zeros() function is perhaps the most common way to
create a NumPy array. As the name suggests, a call to zeros(n) creates an array with n cells
and initializes each cell to a 0.0 value. The default element type is float64, so if you want an
integer array initialized to 0 values, you'd have to supply the dtype parameter to zeros().
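For example, a call along these lines creates a five-cell integer array of zeros:

arr = np.zeros(5, dtype=np.int32)      # [0 0 0 0 0]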
A closely related NumPy function is ones(), which initializes an array to all 1.0 (or integer 1)
values.
The demo concludes by creating and initializing an array using the NumPy linspace()
function:
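The statements look like this:

arr = np.linspace(2., 5., 6)
print "Printing array arr using built-in print() "
print arr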
A call to linspace(start, stop, num) returns an array that has num cells with values evenly
spaced between start and stop, inclusive. The demo call np.linspace(2., 5., 6) returns
an array of six float64 values starting with 2.0 and ending with 5.0 (2.0, 2.6, 3.2, 3.8, 4.4, and
5.0).
Note that almost all Python and NumPy functions that accept start and stop parameters return
values in [start, stop), that is, between start inclusive and stop exclusive. The NumPy
linspace() function is an exception.
There are many other NumPy functions that can create arrays, but the array() and zeros()
functions can handle most programming scenarios. And you can always create specialized
arrays using a program-defined function. For example, suppose you needed to create an array
of the first n odd integers. You could define:
def my_odds(n):
  result = np.zeros(n, dtype=np.int32)
  v = 1
  for i in xrange(n):
    result[i] = v
    v += 2
  return result
And then you could create an array holding the first four odd integers with a call:
arr = my_odds(4)
A task that is closely related to creating NumPy arrays is copying an existing array. The NumPy
copy() function can do this, and is described in detail later in this e-book.
Resources
For additional information about NumPy numeric array initialization functions, see
http://docs.scipy.org/doc/numpy-1.10.1/user/basics.creation.html.
# searching.py
# Python 2.7
import numpy as np
# =====
target = 5.0
print "Target value is "
print target
print ""
print "Search result using my_search() = "
print idx
print ""
Array arr is
[ 7. 9. 5. 1. 5. 8.]
Target value is
5.0
End demo
The demo program begins with creating an array and a target value to search for:
arr = np.array([7.0, 9.0, 5.0, 1.0, 5.0, 8.0])
target = 5.0

print "Target value is "
print target
Next, the demo program searches the array for the target value using the Python in keyword:
The return result from a call to target in arr is Boolean, either True or False. Nice and
simple. However, using this syntax for searching an array of floating-point values is not really a
good idea. The problem is that comparing two floating-point values for exact equality is very
tricky. For example:
>>> x = 0.15 + 0.15
>>> y = 0.20 + 0.10
>>> 'yes' if x == y else 'no'
'no'
>>> # what the heck?!
When comparing two floating-point values for equality, you should usually not compare for exact
equality; instead, you should check if the two values are very, very close to each other.
Floating-point values stored in memory are sometimes just close approximations to their true
values, so comparing two floating-point values for exact equality can give unexpected results.
The target in arr syntax doesn't give you any direct way to control how the target value is
compared to the values in the array. Note that this problem with checking for exact equality
doesn't exist for integer arrays (or string arrays or Boolean arrays), so the target in arr
syntax is fine for those.
The target in arr syntax does work properly in the demo program, returning a correct result
of True. Next, the demo program searches using the NumPy where() function:
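Assuming the same arr and target as before, the call has this form:

result = np.where(arr == target)
print result      # displays something like (array([2, 4]),)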
The where() function returns a tuple (as indicated by the parentheses) containing an array. The
array holds the indices in the searched array where the target value occurs, cells 2 and 4 in this
case. If you search for a target value that is not in the array, the return result is a tuple with an
array with length 0:
(array([], dtype=int64),)
Therefore, if you just want to know if a target value is in an array, you can check the return value
along the line of:
if len(result[0]) == 0:
  print "target not found in array"
else:
  print "target is in array"
As is the case with searching using the in keyword, searching an array of floating-point values
using the where() function is not recommended because you cannot control how the cell values
are compared to the target value. But using the where() function with integer, string, and
Boolean arrays is safe and effective.
idx = my_search(arr, target, 1.0e-6)
print "Search result using my_search() = "
print idx
The program-defined my_search() function returns -1 if the target value is not found in the
array, or the cell index of the first occurrence of the target if the target is in the array. In this case
the return value is 2 because the target value, 5.0, is in cells [2] and [4] of the array. The third
argument, 1.0e-6, is the tolerance defining how close two floating-point values must be in order
to be considered equal.
The NumPy isclose() function compares two values and returns True if the values are within
eps (this stands for epsilon, the Greek letter often used in mathematics to represent a small
value) of each other.
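A minimal sketch of my_search(), using isclose() with an explicit tolerance, looks like this (treat it as a sketch; implementation details can vary):

def my_search(a, t, eps):
  # return the index of the first cell within eps of t, or -1 if there is none
  for i in xrange(len(a)):
    if np.isclose(a[i], t, atol=eps):
      return i
  return -1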
Instead of using the isclose() function, you can compare directly using either the Python built-
in abs() function or the NumPy fabs() function like so:
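For example, inside the search loop:

if abs(a[i] - t) < eps:         # Python built-in abs()
  return i

# or equivalently:

if np.fabs(a[i] - t) < eps:     # NumPy fabs()
  return i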
In some situations, you may want to find the location of the last occurrence of a target value in
an array. Using the where() function with integer, string, or Boolean arrays, you could write
code like:
result = np.where(arr == target)
if len(result[0]) == 0:
  print "-1"                              # not found
else:
  print result[0][len(result[0])-1]       # last index where target occurs
To find the last occurrence of a target value in a program-defined function, you could traverse
the array from back to front with a for i in xrange(len(a)-1, -1, -1): loop.
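For example, a sketch of a last-occurrence search (my_search_last is just an illustrative name):

def my_search_last(a, t, eps):
  for i in xrange(len(a)-1, -1, -1):    # traverse back to front
    if np.isclose(a[i], t, atol=eps):
      return i
  return -1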
Resources
For technical details about how NumPy stores arrays in memory, see
http://docs.scipy.org/doc/numpy-1.10.0/reference/internals.html.
For a list of Python built-in functions such as the absolute value, see
https://docs.python.org/2/library/functions.html.
Interestingly, unlike the sorting functions in most other languages, a call to sort(arr) returns a
sorted array, leaving the original array arr unchanged. The sort functions in many programming
languages sort their array argument in place, and do not return a new sorted array. However,
you can sort a NumPy array arr in place if you wish by using the call arr.sort().
# sorting.py
# Python 2.7
import numpy as np
import time
def my_qsort(a):
  quick_sorter(a, 0, len(a)-1)
# =====
print "\nBegin array sorting demo \n"
arr = np.array([4.0, 3.0, 0.0, 2.0, 1.0, 9.0, 7.0, 6.0, 5.0])
print "Original array is "
print arr
print ""
Original array is
[ 4. 3. 0. 2. 1. 9. 7. 6. 5.]
Calling my_qsort(arr)
Elapsed time =
3.6342481559e-05 seconds
End demo
arr = np.array([4.0, 3.0, 0.0, 2.0, 1.0, 9.0, 7.0, 6.0, 5.0])
print "Original array is "
print arr
Next, the demo program sorts the array using the NumPy sort() function:
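The call, with the sorting algorithm made explicit, looks like this:

s_arr = np.sort(arr, kind='quicksort')     # arr itself is not changed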
The sort() function returns a new array with values in order, leaving the original array unchanged.
It is possible to sort an array in place using either a slightly different syntax or calling pattern:
arr.sort(kind='quicksort')
print arr # arr is sorted
The quicksort algorithm is the default, so the call to sort() could have been written:
s_arr = np.sort(arr)
The other two sorting algorithms could have been called like so:
arr.sort(kind='mergesort')
arr.sort(kind='heapsort')
By default, the sort() function orders array elements from low value to high value. If you want
to sort an array from high value to low, you can't do so directly, but you can use Python slicing
syntax to reverse after sorting (there is no explicit reverse() function for arrays):
arr = np.array([4.0, 8.0, 6.0, 5.0])
s_arr = np.sort(arr, kind='quicksort') # s_arr = [4.0 5.0 6.0 8.0]
r_arr = s_arr[::-1] # r_arr = [8.0 6.0 5.0 4.0]
Note that the sort() function has an optional order parameter. However, this parameter
controls the order in which fields are compared when an array has cells holding an object with
multiple fields. So order does not control ascending versus descending sort behavior.
start_time = time.clock()
my_qsort(arr)
end_time = time.clock()
The program-defined my_qsort() function sorts its array argument in place. The demo
measures the approximate amount of time used by my_qsort() by wrapping its call with
time.clock() function calls. Notice the demo program has an import time statement at the
top of the source code to bring the clock() function into scope.
The whole point of using a library like NumPy is that you can use built-in functions like sort()
and so you don't have to write program-defined functions. However, there are some scenarios
where writing a custom version of a NumPy function is useful. In particular, you can customize
the behavior of a program-defined function, usually at the expense of extra time (to write the
function) and performance.
The heart of the quicksort algorithm is the partition() function. A detailed explanation of how
quicksort and partitioning work is outside the scope of this e-book, but the behavior of any
quicksort implementation depends on how the so-called pivot value is selected. The key line of
code in the custom partition() function is:
piv = a[hi]
The pivot value is selected as the last cell value in the current sub-array being processed.
Alternatives are to select the first cell value (piv = a[lo]), the middle cell value, or a randomly
selected cell value between a[lo] and a[hi].
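For reference, here is one simple way to implement the two helper functions, using a Lomuto-style partition with piv = a[hi] (a sketch, not necessarily the demo's exact code):

def partition(a, lo, hi):
  piv = a[hi]                       # pivot is the last cell value
  i = lo - 1
  for j in xrange(lo, hi):
    if a[j] <= piv:
      i += 1
      a[i], a[j] = a[j], a[i]       # swap
  a[i+1], a[hi] = a[hi], a[i+1]     # place the pivot in its final position
  return i + 1

def quick_sorter(a, lo, hi):
  if lo < hi:
    p = partition(a, lo, hi)
    quick_sorter(a, lo, p-1)
    quick_sorter(a, p+1, hi)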
Resources
The program-defined quicksort function in this section is based on the Wikipedia article at
https://en.wikipedia.org/wiki/Quicksort.
For additional information about working with the Python time module, see
https://docs.python.org/2/library/time.html.
# shuffling.py
# Python 2.7
import numpy as np
# =====
np.random.shuffle(arr)
print "Array arr after a call to np.random.shuffle(arr) is "
print arr
print ""
arr = np.copy(orig)
print "Array arr is "
print arr
print ""
my_shuffle(arr, seed=0)
print "Array arr after call to my_shuffle(arr, seed=0) is "
print arr
print ""
Array arr is
[0 1 2 3 4 5 6 7 8 9]
End demo
The demo program begins by creating an ordered integer array with 10 values (0 through 9)
using the arange() function, and makes a copy of that array using the copy() function:
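The two statements are along the lines of:

orig = np.arange(10)     # [0 1 2 ... 9]
arr = np.copy(orig)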
Next, the demo shuffles the contents of the array using the NumPy random.shuffle()
function:
np.random.shuffle(arr)
print "Array arr after a call to np.random.shuffle(arr) is "
print arr
The random.shuffle() function reorders the contents of its array argument in place to a
random order. In this example, the seed value for the underlying random number generator was
not set, so if you ran the program again, you'd almost certainly get a different ordering of the
array. If you want to make your program results reproducible, which is usually the case, you can
explicitly set the underlying seed value like so:
np.random.seed(0)
np.random.shuffle(arr)
Here the seed was arbitrarily set to 0. Next, the demo program resets the array to its original
values using the copy:
It would have been a mistake to use the assignment operator instead of the copy() function in
an effort to make a copy of the original array. For example, suppose you had written this code:
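That is, something like:

arr = orig     # copies a reference, not the array data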
Because array assignment works by reference rather than by value, orig and arr are
essentially pointers that both point to the same array in memory. Any change made to arr, such
as a call to random.shuffle(arr), implicitly affects orig, too. Therefore, an attempt to reset
arr after a call to random.shuffle() would have no effect.
Another important consequence of NumPy arrays being reference objects is that a function with
an array parameter can modify the array in place. You can also create a reference to an array
using the view() function, for example arr_v = arr.view() creates a reference copy of arr.
The demo program concludes by using a program-defined function my_shuffle() to shuffle the
array:
my_shuffle(arr, seed=0)
print "Array arr after call to my_shuffle(arr, seed=0) is "
print arr
Shuffling an array into a random order is surprisingly tricky and it's very easy to write faulty
code. The function my_shuffle() uses what is called the Fisher-Yates algorithm, which is the
best approach in most situations. Notice the function uses the very handy a,b = b,a Python
idiom to swap two values. An alternative is to use the standard tmp=a; a=b; b=tmp idiom
that's required in other programming languages.
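A minimal Fisher-Yates sketch, assuming the NumPy randint() function is used to pick swap positions:

def my_shuffle(a, seed=0):
  np.random.seed(seed)                # make the shuffle reproducible
  n = len(a)
  for i in xrange(n-1):
    ri = np.random.randint(i, n)      # random index in [i, n)
    a[i], a[ri] = a[ri], a[i]         # swap using the a,b = b,a idiom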
Resources
Chapter 3 Matrices
Matrices are arguably the most important data structure used in numeric and scientific
programming. The following screenshot shows you where this chapter is headed.
In section 3.1, you'll learn the most common ways to create and initialize NumPy matrices, and
learn the differences between the two kinds of matrix data structures supported in NumPy.
In section 3.2, you'll learn how to perform matrix multiplication using the dot() function.
In section 3.3, you'll learn about the three different ways to transpose a matrix.
In section 3.4, you'll learn about the important NumPy and SciPy linalg modules, how to find
the determinant of a matrix using the det() function, and what the determinant is used for.
In section 3.5, you'll learn how to create an identity matrix using the eye() function, find the
inverse of a matrix using the linalg.inv() function, and correctly compare two matrices for
equality using the isclose() function.
In section 3.6, you'll learn how to load values into a matrix from a text file using the loadtxt()
function.
# matrices.py
# Python 2.7
import numpy as np
# =====
print "N-dimensional array/matrix mc is "
print mc
print ""
msum = ma + mc
print "Result of ma + mc = "
print (msum)
print ""
Matrix ma is
[[ 1. 2. 3.]
[ 4. 5. 6.]]
Matrix mb is
[[0 0]
[0 0]
[0 0]]
N-dimensional array/matrix mc is
[[ 1. 2. 3.]
[ 4. 5. 6.]]
4.000 5.000 6.000
Result of ma + mc =
[[ 2. 4. 6.]
[ 8. 10. 12.]]
Matrix md is
[[ 7. 8. 9.]]
Result of ma + md is
[[ 8. 10. 12.]
[ 11. 13. 15.]]
End demo
The demo program begins by creating a matrix using the NumPy matrix() function:
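Using the values shown in the output, the statement is along the lines of:

ma = np.matrix([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])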
There are two rows, each with three columns, so the matrix has shape 2×3. Because no dtype
argument was specified, each cell of the matrix holds the default float64 data type.
Next, the demo creates a 3×2 matrix using the NumPy zeros() function:
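The call is along the lines of:

mb = np.zeros((3,2), dtype=np.int32)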
Notice the double sets of parentheses used here as opposed to the single set of parentheses
used to create a simple array. Each cell of matrix mb holds a 32-bit integer. If the dtype
argument had been omitted, each cell would have been the default float64 data type. As you'll
see shortly, matrix mb is actually a NumPy n-dimensional array rather than a NumPy matrix. In
the vast majority of programming situations, you can use either a NumPy 2-dimensional array or
a NumPy matrix. The general terms matrix and matrices can refer to either a NumPy matrix or a
NumPy n-dimensional array.
Matrix mc is a 2×3 n-dimensional array with the same values as explicit matrix ma. Matrix md is a
1×3 matrix. Matrices with one row are often called row matrices. Matrices with one column are
called column matrices. For example:
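The row matrix md from the demo, together with a column matrix for comparison (m_col is just an illustrative name), look like this:

md = np.matrix([[7.0, 8.0, 9.0]])           # 1x3 row matrix
m_col = np.matrix([[7.0], [8.0], [9.0]])    # 3x1 column matrix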
Row and column matrices are not the same as simple one-dimensional arrays. You can create
a column matrix from a row matrix (or vice versa) using the reshape() function, for example, mm
= np.reshape(md, (3,1)). And you can make a regular array from an ndarray-style matrix
using the flatten() or ravel() functions, for example:
aa = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]) # a 2x3 ndarray matrix
arr = aa.flatten() # arr is an array [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
After displaying the contents of matrices ma, mb, and mc, the demo displays their object types:
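For example, with statements along the lines of:

print type(ma)     # an explicit NumPy matrix type
print type(mb)     # numpy.ndarray
print type(mc)     # numpy.ndarray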
To summarize, when creating a NumPy matrix, the result can be either an explicit matrix (for
example, when using the matrix() function) or an ndarray (for example, when using the
zeros() function). In most cases, you don't have to worry about what the object type is
because the two forms of matrices are usually (but not always) compatible.
Next, the demo displays the contents of matrix ma using the program-defined show_matrix()
function:
The second and third parameters for show_matrix() are the number of decimals to use and
the width to use when displaying each cell value. In situations like this, where there are similar
parameters, it's more readable to use named-parameter syntax like:
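For instance (the parameter names here are guesses):

show_matrix(ma, decimals=3, wid=8)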
The dimensions of the matrix are determined using the NumPy shape() function, which returns
a tuple with the number of rows and columns. An alternative approach is:
rows = len(m)
cols = len(m[0])
A NumPy matrix m is an array of arrays. So len(m) is the number of rows, m[0] is the first row,
and len(m[0]) is the number of cells in the first row, which is the same as the number of
columns (assuming all rows of m have the same number of columns).
The nested for loops iterate over the cells of the matrix from left to right, then top to bottom:
for i in xrange(rows):
  for j in xrange(cols):
    # curr cell is m[i,j]
Interestingly, NumPy allows you to access a matrix cell using either m[i,j] syntax or m[i][j]
syntax. The two forms are completely equivalent. In most cases the m[i,j] form is preferred,
only because it's easier to type.
msum = ma + mc
print "Result of ma + mc = "
print msum
Recall that both ma and mc are 2×3 matrices with values 1.0 through 6.0. Not surprisingly, the result is the element-by-element sum shown in the program output above.
However, recall that ma is an explicit NumPy matrix but mc is a NumPy ndarray. The point is
that the two different types of matrices could be added together without any problems.
Matrix ma is 2×3. Matrix md is 1×3. In just about any other programming language that I'm aware
of, an attempt to add these two matrices would generate some kind of error because these
matrices have different shapes. However, NumPy allows the addition and returns the 2×3 result shown in the output above.
NumPy essentially expands the 1×3 md matrix to a 2×3 matrix, duplicating values, so that it has the same shape as ma, and then corresponding cells can be added. This behavior is called broadcasting. Some of my colleagues think NumPy broadcasting is a wonderful, useful feature. Others feel that broadcasting is a dubious feature that encourages sloppy coding and can easily lead to program bugs.
Resources
For a discussion of the differences between NumPy matrices and arrays, see
http://www.scipy.org/scipylib/faq.html#what-is-the-difference-between-matrices-and-arrays.
For details about creating matrices using the NumPy matrix() function, see
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.matrix.html.
For details about creating ndarray style matrices using the NumPy array() function, see
http://docs.scipy.org/doc/numpy-1.10.1/user/basics.creation.html.
The demo program in Code Listing 9 illustrates matrix multiplication using NumPy. The program
then defines a custom function named my_mult(), which performs matrix multiplication using
nested loops. Program execution begins with a preliminary print statement and then the demo
creates a 2x3 matrix A, and a 3x2 matrix B using the NumPy matrix() function:
A = np.matrix([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])

B = np.matrix([[7.0, 8.0],
               [9.0, 10.0],
               [11.0, 12.0]])
# multiplication.py
# Python 2.7
import numpy as np
# =====
B = np.matrix([[7.0, 8.0],
[9.0, 10.0],
[11.0, 12.0]])
C = np.dot(A, B) # NumPy matrix multiplication
Matrix A =
[[ 1. 2. 3.]
[ 4. 5. 6.]]
Matrix B =
[[ 7. 8.]
[ 9. 10.]
[ 11. 12.]]
Result of dot(A,B) =
[[ 58. 64.]
[ 139. 154.]]
Result of my_mult(A,B) =
[[ 58. 64.]
[ 139. 154.]]
End demo
After creating the two matrices and displaying their values, we compute their product using the
NumPy dot() function and then again using the program-defined my_mult() function like so:
C = np.dot(A, B)
D = my_mult(A, B)
NumPy matrix objects can also be multiplied using the * operator, for example C = A * B, but
ndarray objects must use the dot() function. In other words, the dot() function works for both
types and so is preferable in most situations.
The demo concludes by displaying both results to visually verify they're the same:
Result of A dot B =
[[ 58. 64.]
[ 139. 154.]]
Result of my_mult(A,B) =
[[ 58. 64.]
[ 139. 154.]]
Matrix multiplication is perhaps best explained by example. Matrix A has shape 2×3 and matrix
B has shape 3×2. The shape of their product is 2×2:
(2 x 3) * (3 x 2) = (2 x 2)
You can imagine that the two innermost dimensions, 3 and 3 here, cancel each other out,
leaving the two outermost dimensions. For example, a 5×4 matrix times a 4×7 matrix will have
shape 5×7. If the two innermost dimensions are not equal, NumPy will generate a "shapes not
aligned" error.
The result value at cell [x,y] is the product of the values in row x of the first matrix and column y
of the second matrix. So for the demo, the result at cell [0,1] uses row 0 of matrix A = [1, 2, 3]
and column 1 of matrix B = [8, 10, 12], giving (1 * 8) + (2 * 10) + (3 * 12) = 64.
The shape() function returns a tuple holding the number of rows and the number of columns in
a matrix. You could perform an error check here to verify that the two matrices are conformable,
for example, if acols != brows: print "Error!".
Once the sizes of the two input matrices are known, a result matrix with the correct shape can
be initialized using the NumPy zeros() function:
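Along the lines of (assuming the dimensions were unpacked into arows, acols, brows, and bcols):

(arows, acols) = np.shape(a)
(brows, bcols) = np.shape(b)
result = np.zeros((arows, bcols))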
Notice the use of double parentheses, which forces the zeros() function to return a matrix
rather than an array. Function my_mult() then iterates over each row and each column,
accumulating and storing the sum of products into each cell of the result matrix:
for i in xrange(arows):
  for j in xrange(bcols):
    for k in xrange(acols):
      result[i,j] = result[i,j] + a[i,k] * b[k,j]
return result
Notice that the program-defined matrix multiplication function is quite simple but does involve
triple-nested for loops. For small matrices, the difference in performance between a program-
defined method and the NumPy dot() function probably isn't significant in most scenarios. But
for large matrices, the slower performance of a program-defined method would likely be
noticeable and annoying.
The dot() function can be applied to NumPy one-dimensional arrays as well as matrices. For
example:
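The values here are chosen just for illustration:

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
d = np.dot(u, v)     # (1*4) + (2*5) + (3*6) = 32.0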
NumPy has a dedicated inner() function that works just with arrays. For example:
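Using the same two arrays:

d = np.inner(u, v)     # also 32.0 for the arrays above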
The dot() function can also be applied to arrays with three or more dimensions, but this is a
relatively uncommon scenario.
Resources
For a table that lists the approximately 60 NumPy matrix functions, see
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.matrix.html.
For information on the NumPy ndarray data type, which includes matrices, see
http://docs.scipy.org/doc/numpy-1.10.1/reference/arrays.ndarray.html.
# transposition.py
# Python 2.7
import numpy as np
def my_transpose(m):
  (rows, cols) = np.shape(m)
  result = np.zeros((cols, rows))   # note the transposed shape
  for i in xrange(rows):
    for j in xrange(cols):
      result[j,i] = m[i,j]
  return result
# =====
mt = m.transpose()
print "Transpose from m.transpose() function = "
print mt
print ""
mt = np.transpose(m)
print "Transpose from np.transpose(m) function = "
print mt
print ""
mt = m.T
print "Transpose from m.T property = "
print mt
print ""
mt = my_transpose(m)
print "Transpose from my_transpose() function = "
print mt
print ""
Matrix m =
[[ 1. 2. 3.]
[ 4. 5. 6.]
[ 7. 8. 9.]]
End demo
The demo program begins by creating and displaying a simple 3×3 float64 matrix:
58
Here, matrix m is called a square matrix because it has the same number of rows and columns.
Matrix transposition works with either square or non-square matrices.
Next, the demo program creates a transposition of the matrix m using three different NumPy
built-in techniques:
mt = m.transpose()
print "Transpose from m.transpose() function = "
print mt
mt = np.transpose(m)
print "Transpose from np.transpose(m) function = "
print mt
mt = m.T
print "Transpose from m.T property = "
print mt
The first function call uses the transpose() method of the ndarray class. Notice the syntax is
matrix.transpose() and there are no arguments. The second function call uses the NumPy
function that accepts a matrix as its argument. The third call has no parentheses, indicating it is
a property. In all three function calls, the original matrix m is not changed. If you want to change
a matrix, you can use a calling pattern along the lines of m = np.transpose(m).
An immediate and obvious question is: Why are there three ways to transpose a matrix? There's
no good answer. One of the strengths of open source projects like NumPy and SciPy is that
they are collaborative efforts. However, this strength is offset by a certain amount of redundancy
in the libraries. Basically, when you're using NumPy and SciPy you can often perform a task
several ways, and frequently there's no clear best way.
The demo program concludes by calling a custom transpose function named my_transpose():
mt = my_transpose(m)
def my_transpose(m):
  (rows, cols) = np.shape(m)
  result = np.zeros((cols, rows))   # note the transposed shape
  for i in xrange(rows):
    for j in xrange(cols):
      result[j,i] = m[i,j]
  return result
59
Resources
For details about the three built-in NumPy ways to transpose a matrix, see:
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.transpose.html
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.ndarray.transpose.html
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.ndarray.T.html
# determinants.py
# Python 2.7
import numpy as np
60
# =====
d = np.linalg.det(m)
print "Determinant of m using np.linalg.det() is "
print d
print ""
d = my_det(m)
print "Determinant of m using my_det() is "
print d
print ""
Matrix m is
[[ 1. 4. 2. 3.]
[ 0. 1. 5. 4.]
[ 1. 0. 1. 0.]
[ 2. 3. 4. 1.]]
End demo
The demo program begins by creating and displaying a 4×4 float64 matrix:
61
m = np.matrix([[1., 4., 2., 3.],
[0., 1., 5., 4.],
[1., 0., 1., 0.],
[2., 3., 4., 1.]])
Determinants only apply to square matrices (those with the same number of rows and columns).
The simplest case (other than a 1×1 matrix with a single value) is a 2×2 matrix. For example, consider this 2×2 matrix:
1.0 2.0
3.0 4.0
The determinant of this matrix is (1.0 * 4.0) - (2.0 * 3.0) = 4.0 - 6.0 = -2.0. In words, to calculate
the determinant of a 2×2 matrix, you take upper left times lower right, and subtract upper right
times lower left.
A determinant of a square matrix always exists, but it can be zero. For example, consider this
matrix:
3.0 2.0
6.0 4.0
The determinant would be (3.0 * 4.0) - (2.0 * 6.0) = (12.0 - 12.0) = 0. Matrices that have a
determinant of zero do not have an inverse.
For 3×3 and larger matrices, the mathematical definition of the determinant is recursive.
Suppose a 3×3 matrix is:
a b c
d e f
g h i
Then det(m) = a * det([[e, f], [h, i]]) - b * det([[d, f], [g, i]]) + c * det([[d, e], [g, h]]) = a(ei - fh) - b(di - fg) + c(dh - eg). Notice that you have to extract n submatrices of size (n-1) × (n-1) by removing the first row and each of the n columns. Writing code from scratch to calculate the determinant of a matrix with a size larger than 3×3 is very difficult, but with NumPy and SciPy, all you have to do is call the linalg.det() function.
The demo program finds the determinant of the matrix it created like so:
62
d = np.linalg.det(m)
print "Determinant of m using np.linalg.det() is "
print d
Simple and easy. The NumPy linalg submodule currently has 28 functions that operate on
matrices, including the det() function. The larger SciPy linalg submodule has 82 functions.
Interestingly, the SciPy linalg submodule contains a slightly different det() function. The
SciPy version of det() has a parameter overwrite_a that allows the matrix to be changed
during the calculation of the determinant, which improves performance. Many functions appear
in both the NumPy and SciPy libraries, which is both useful and a possible source of confusion.
The demo has a program-defined function my_det() that calculates the determinant of a matrix.
Let me emphasize that the program-defined function is very inefficient and is intended only to
demonstrate advanced NumPy and SciPy programming techniques. The custom my_det()
function shouldn't be used unless you want to demonstrate a bad way to calculate a matrix
determinant.
Function my_det() uses the same calling signature as the NumPy det() function:
d = my_det(m)
print "Determinant of m using my_det() is "
print d
Function my_det() is recursive, meaning that it calls itself. The my_det() function also calls a program-defined helper function named extract(). Function extract(m, col) accepts an n × n matrix m and returns the n-1 × n-1 matrix from which the first row and column col have been removed. The key code in my_det() is:
for k in xrange(n):
sign = -1
if k % 2 == 0: sign = +1
subm = extract(m, k)
sum = sum + sign * m[0,k] * my_det(subm)
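A minimal sketch of such an extract() function (an assumption, not necessarily the book's exact code) is:

def extract(m, col):
    # return a copy of m with row 0 and column col removed
    (rows, cols) = np.shape(m)
    result = np.zeros((rows-1, cols-1))
    for i in xrange(1, rows):      # skip row 0
        dest_j = 0
        for j in xrange(cols):
            if j == col: continue  # skip column col
            result[i-1, dest_j] = m[i,j]
            dest_j += 1
    return result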
Each of the n sub-matrices is extracted and my_det() is called recursively. There are very few
situations where recursive code is a good choice, and calculating a determinant of a matrix is
not one of them. The NumPy and SciPy implementations of det() use a technique called matrix
decomposition, which is complicated, but very efficient.
Resources
The demo program in Code Listing 12 illustrates matrix inversion using NumPy. As usual, at the
top of the code, the program brings the NumPy library into scope and provides a convenience
alias: import numpy as np.
Because the inv() function is part of the NumPy linalg (linear algebra) submodule, an
alternative would be to use a from numpy import linalg statement. The demo program then
defines a custom function named my_close(), which determines if two matrices are equal in
the sense that all corresponding cell values are equal or nearly equal, within some small
tolerance.
Program execution begins with a preliminary print statement and then the demo creates a 3×3
matrix m using the NumPy matrix() function, explicitly specifying the data type:
m = np.matrix([[3, 0, 4],
[2, 5, 1],
[0, 4, 5]], dtype=np.float64)
# inversion.py
# Python 2.7
import numpy as np
# =====
m = np.matrix([[3, 0, 4],
[2, 5, 1],
[0, 4, 5]], dtype=np.float64)
mi = np.linalg.inv(m)
print "The inverse of m is"
print mi
print ""
idty = np.eye(3)
print "The 3x3 identity matrix idty is"
print idty
print ""
mim = mi * m  # the product of the inverse and the original matrix
b1 = np.allclose(mim, idty)
print "Comparing mi * m with idty using np.allclose() gives"
print str(b1)
print ""
Matrix m is
[[ 3. 0. 4.]
[ 2. 5. 1.]
[ 0. 4. 5.]]
The inverse of m is
[[ 0.22105263 0.16842105 -0.21052632]
[-0.10526316 0.15789474 0.05263158]
[ 0.08421053 -0.12631579 0.15789474]]
Product of mi * m is
[[ 1.00000000e+00 -1.11022302e-16 0.00000000e+00]
[ 0.00000000e+00 1.00000000e+00 0.00000000e+00]
[ 0.00000000e+00 1.11022302e-16 1.00000000e+00]]
End demo
Matrix m could also have been created as an n-dimensional array using the array() function: m = np.array([[3., 0., 4.], [2., 5., 1.], [0., 4., 5.]]).
After creating the matrix m and displaying its values, the inverse of the matrix is computed and
displayed like so:
mi = np.linalg.inv(m)
print "The inverse of m is"
print mi
If the from numpy import linalg statement had been used at the top of the script, the inv()
function could have been called as linalg.inv(m) instead. The inv() function applies only to
square matrices (equal number of rows and columns) that have a determinant not equal to zero.
The return value is a square matrix with the same shape as the original matrix.
Matrix inversion is one of the most technically challenging algorithms in numeric processing.
Believe me, you do not want to try to write your own custom matrix inversion function, unless
you are willing to spend a lot of time and effort, presumably because you need to implement
some specialized behavior.
Not all matrices have an inverse. If you apply the inv() function to such a matrix, you'll get a
"singular matrix" error. Therefore, you want to check first along the lines of:
d = np.linalg.det(m)
if d == 0.0:
print "Matrix does not have an inverse"
else:
mi = np.linalg.inv(m)
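Because d is a floating-point value, comparing it to 0.0 for exact equality can be fragile. A slightly safer sketch uses a small tolerance (the 1.0e-10 threshold here is an arbitrary choice):

d = np.linalg.det(m)
if abs(d) < 1.0e-10:  # treat a near-zero determinant as zero
    print "Matrix does not have an inverse"
else:
    mi = np.linalg.inv(m)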
Next, the demo creates and displays a 3x3 identity matrix:
idty = np.eye(3)
print "The 3x3 identity matrix idty is"
print idty
An identity matrix is a square matrix where the cells on the diagonal from upper left to lower
right contain 1.0 values, and all the other cells contain 0.0 values.
In ordinary arithmetic, the inverse of some number x is 1/x. For example, the inverse of 3 is 1/3. Notice that any nonzero number times its inverse equals 1. The identity matrix is analogous to the number 1 in ordinary arithmetic. Any invertible matrix times its inverse equals the identity matrix.
The demo verifies the inverse is correct by multiplying the original matrix m by its inverse mi and
displaying the result, which is, in fact, the identity matrix:
The output is somewhat difficult to read because of the print statement's default formatting:
Product of mi * m is
[[ 1.00000000e+00 -1.11022302e-16 0.00000000e+00]
[ 0.00000000e+00 1.00000000e+00 0.00000000e+00]
[ 0.00000000e+00 1.11022302e-16 1.00000000e+00]]
If you look closely, you'll see that the main diagonal elements are 1.0 and the other cell values
are very, very close to 0.0. Visual verification that two matrices (the product of the original matrix
times its inverse, and the identity matrix) are equal is fine in simple scenarios, but in many
situations a programmatic approach is better. The demo compares the matrix times its inverse
(mim) and the identity matrix in two ways:
b1 = np.allclose(mim, idty)
print "Comparing mi * m with idty using np.allclose() gives"
print str(b1)
In general, it's a bad idea to compare two matrices that hold floating-point values for exact equality, because floating-point values have limited storage and therefore are sometimes only approximations to their true values. For example, the value 0.1 cannot be stored exactly in binary floating point, so in Python the expression 0.1 + 0.2 == 0.3 evaluates to False.
The NumPy allclose() function accepts two matrices and returns True if both matrices have
the same shape and all corresponding pairs of cell values are very close to each other (within
1.0e-5 (0.00001)), and False otherwise. If the default 1.0e-5 tolerance isn't suitable, you can
pass a different tolerance argument to the allclose() function. For example, the statement:

b1 = np.allclose(mim, idty, atol=1.0e-8)

will return True only if all corresponding cells in matrices mim and idty are within (roughly) 1.0e-8 of each other.
The demo program defines a custom method named my_close() that has similar functionality
to the NumPy allclose() function. There's no advantage to writing such a custom function
unless you need to implement some sort of specialized behavior, such as having a different
tolerance for different rows or columns.
Function my_close() doesn't check whether its two matrix parameters have the same shape, but such a check is easy to add.
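A minimal sketch of a my_close() that includes a shape check (the parameter names and the eps default are assumptions, not the book's actual code):

def my_close(m1, m2, eps=1.0e-5):
    # True if m1 and m2 have the same shape and all cells differ by less than eps
    if np.shape(m1) != np.shape(m2):
        return False
    (rows, cols) = np.shape(m1)
    for i in xrange(rows):
        for j in xrange(cols):
            if abs(m1[i,j] - m2[i,j]) > eps:
                return False
    return True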
The SciPy version of inv() has an overwrite_a parameter that permits the cell values in the
original matrix to be overwritten during the calculation of the inverse. For example:
import numpy as np
import scipy.linalg as spla
m = np.random.rand(10, 10)
d = np.linalg.det(m)
if d == 0:
print "Matrix does not have inverse"
else:
mi = spla.inv(m, overwrite_a=True)
This code creates a 10×10 matrix with random values in the range [0.0, 1.0), and then
computes the matrix's inverse, allowing the matrix values to be changed in order to improve
performance. However, when I've used this approach, I've never seen the original matrix
changed with this form of function call.
Resources
For information about the SciPy version of the inv() function, see
http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.inv.html.
Code Listing 13: Loading Matrix Data from a Text File Demo
# loadingdata.py
# Python 2.7
import numpy as np
def my_load(fn, sep=','):
    # first scan: determine the number of rows and columns
    f = open(fn, "r")
    rows = 0; cols = 0
    for line in f:
        rows += 1
        cols = len(line.strip().split(sep))
    result = np.zeros((rows, cols), dtype=np.float64)
    f.seek(0)  # reset the file read pointer for a second scan
    i = 0  # row index
    while True:
        line = f.readline()
        if not line: break
        line = line.strip()
        tokens = line.split(sep)  # a list
        for j in xrange(cols):
            result[i,j] = np.float64(tokens[j])
        i += 1
    f.close()
    return result
# =====
fn = r"C:\SciPy\Ch3\datafile.txt"
m = np.loadtxt(fn, delimiter=',')
print "Matrix loaded using np.loadtxt() = "
print m
print ""
m = my_load(fn, sep=',')
print "Matrix loaded using my_load() = "
print m
print ""
End demo
The demo program begins by specifying the location of the source data file:
fn = r"C:\SciPy\Ch3\datafile.txt"
Here, fn stands for file name. The r qualifier stands for raw and tells the Python interpreter to
treat backslashes as literals rather than the start of an escape sequence. File datafile.txt is
a simple comma-delimited text file with no header:
1.0, 2.0
3.0, 4.0
5.0, 6.0
7.0, 8.0
Next, the demo creates and loads a matrix like so:
m = np.loadtxt(fn, delimiter=',')
print "Matrix loaded using np.loadtxt() = "
print m
The delimiter argument tells loadtxt() how values are separated on each line. The default
value is any whitespace character (spaces, tabs, newlines), so the argument is required in this
case.
In addition to the required fname parameter and optional delimiter parameter, loadtxt() has
seven additional optional parameters. Of these, based on my experience, the three most
commonly used parameters are comments, skiprows, and usecols. For example, suppose you want to skip a header line, treat lines marked with '$' or '%' as comments, and load only columns 0 and 2.
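A hedged sketch of that scenario (the data file contents below are hypothetical, and passing a list of comment characters requires a reasonably recent NumPy version; older versions accept only a single comment string):

height,weight,age
5.1, 110.0, 23.0
$ this line is a comment
6.2, 180.0, 40.0
% so is this one
5.8, 150.0, 31.0

m = np.loadtxt(fn, delimiter=',', skiprows=1, comments=['$', '%'], usecols=(0, 2))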
Although loadtxt() is quite versatile, there are many scenarios it doesn't handle. In these situations, it's easy to write a custom load function. The demo program defines such a function, my_load(), in Code Listing 13.
The function my_load() performs a preliminary scan of the file to determine the number of rows and columns, then creates a matrix with the appropriate shape, resets the file read
pointer, and does a second scan to read, parse, and store each value in the data file. There are
several alternative designs you can use.
Resources
For details about NumPy function genfromtxt() that can handle missing values, see
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.genfromtxt.html.
Chapter 4 Combinatorics
In section 4.1, you'll learn how to create a program-defined Permutation class using NumPy,
and how to write an effective factorial function.
In section 4.2, you'll learn how to write a successor() function that returns the next permutation
element in lexicographical order.
In section 4.3, you'll learn how to create a useful element() function that directly generates a
specified permutation element.
And in section 4.6, you'll learn how to write an element() function for combinations.
4.1 Permutations
A mathematical permutation set is all possible orderings of n items. For example, if n = 3 and
the items are the integers (0, 1, 2) then there are six possible permutation elements:
(0, 1, 2)
(0, 2, 1)
(1, 0, 2)
(1, 2, 0)
(2, 0, 1)
(2, 1, 0)
Python supports permutations in the SciPy special module and in the Python itertools
module. Interestingly, NumPy has no direct support for permutations, but it is possible to
implement custom permutation functions using NumPy arrays.
# permutations.py
# Python 2.7
import numpy as np
import itertools as it
import scipy.special as ss
class Permutation:
def __init__(self, n):
self.n = n
self.data = np.arange(n)
@staticmethod
def my_fact(n):
ans = 1
for i in xrange(1, n+1):
ans *= i
return ans
def as_string(self):
s = "# "
for i in xrange(self.n):
s = s + str(self.data[i]) + " "
s = s + "#"
return s
# =====
n = 3
print "Setting n = " + str(n)
print ""
num_perms = ss.factorial(n)
print "Using scipy.special.factorial(n) there are ",
print str(num_perms),
print "possible permutation elements"
print ""
num_perms = Permutation.my_fact(n)
print "Using my_fact(n) there are " + str(num_perms),
print "possible permutation elements"
print ""
Setting n = 3
Making a custom Permutation object
The first custom permutation element is
# 0 1 2 #
End demo
import numpy as np
import itertools as it
import scipy.special as ss
The itertools module has the primary permutations class, but the closely associated
factorial() function is defined in the special submodule of the scipy module. If this feels a
bit awkward to you, you're not alone.
The demo program defines a custom Permutation class. In most cases, you will only want to
define a custom implementation of a function when you need to implement some specialized
behavior, or you want to avoid using a module that contains the function.
n = 3
print "Setting n = " + str(n)
Using lowercase n for the permutation order (the number of items) is traditional, so you should use it unless you have a reason not to. Next, the demo program determines the number of possible permutations
using the SciPy factorial() function:
num_perms = ss.factorial(n)
print "Using scipy.special.factorial(n) there are ",
print str(num_perms),
print "possible permutation elements"
The factorial of n, written n!, is the product of all the integers from 1 through n. For example:
factorial(3) = 3 * 2 * 1 = 6
factorial(5) = 5 * 4 * 3 * 2 * 1 = 120
The value of factorial(0) is usually considered a special case and defined to be 1. Next, the
demo creates a Python permutations iterator:
all_perms = it.permutations(xrange(n))
I like to think of a Python iterator object as a little factory that can emit data when a request is
made of it using an explicit or implicit call to a next() function. Notice the call to the
permutations() function accepts xrange(n) rather than just n, as you might have thought.
The demo program requests and displays the first itertools permutation element like so:
p = all_perms.next()
print "The first itertools permutation is "
print p
Next, the demo program uses the custom functions. First, the my_fact() function is called:
num_perms = Permutation.my_fact(n)
print "Using my_fact(n) there are " + str(num_perms),
print "possible permutation elements"
Notice that the call to my_fact() is appended to Permutation, which is the name of its defining
class. This is because the my_fact() function is decorated with the @staticmethod attribute.
Next, the demo creates an instance of the custom Permutation class. The Permutation class
__init__() constructor method initializes an object to the first permutation element so there's
no need to call a next() function:
p = Permutation(n)
print "The first custom permutation element is "
print p.as_string()
Recall that the program-defined my_fact() function computes the factorial with a simple loop:
def my_fact(n):
ans = 1
for i in xrange(1, n+1):
ans *= i
return ans
The mathematical factorial function is often used in computer science classes as an example of
a function that can be implemented using recursion:
@staticmethod
def my_fact_rec(n):
if n == 0 or n == 1:
return 1
else:
return n * Permutation.my_fact_rec(n-1)
Although recursion has a certain mysterious aura, in most situations (such as this one),
recursion is highly inefficient and so should be avoided.
An option for any implementation of a factorial function, especially where the function will be
called many times, is to create a pre-calculated lookup table with values for the first handful (say
1,000) results. The extra storage is usually a small price to pay for much-improved performance.
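A minimal sketch of that idea (the table size of 1,000 and the function name fast_fact() are arbitrary choices):

MAX_N = 1000
fact_table = [1] * (MAX_N + 1)
for i in xrange(1, MAX_N + 1):
    fact_table[i] = fact_table[i-1] * i   # Python integers grow as needed

def fast_fact(n):
    if n <= MAX_N:
        return fact_table[n]         # constant-time lookup
    return Permutation.my_fact(n)    # fall back to the loop version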
Resources
For details about the Python itertools module that contains the permutations iterator, see
https://docs.python.org/2/library/itertools.html.
4.2 Permutation successor
A successor() function returns the next permutation element in lexicographical order. For example, if n = 3, the six permutation elements in lexicographical order are:
(0, 1, 2)
(0, 2, 1)
(1, 0, 2)
(1, 2, 0)
(2, 0, 1)
(2, 1, 0)
Notice that if we removed the separating commas and interpreted each element as an ordinary
integer (like 120), the elements would be in ascending order (12 < 21 < 102 < 120 < 201 < 210).
# perm_succ.py
# Python 2.7
import numpy as np
import itertools as it
class Permutation:
    def __init__(self, n):
        self.n = n
        self.data = np.arange(n)

    def as_string(self):
        s = "# "
        for i in xrange(self.n):
            s = s + str(self.data[i]) + " "
        s = s + "#"
        return s

    def successor(self):
        res = Permutation(self.n)  # result
        res.data = np.copy(self.data)
        left = self.n - 2
        while res.data[left] > res.data[left+1] and left >= 1:
            left -= 1
        if left == 0 and res.data[left] > res.data[left+1]:
            return None  # current element is the last one
        right = self.n - 1
        while res.data[left] > res.data[right]:
            right -= 1
        res.data[left], res.data[right] = res.data[right], res.data[left]
        i = left + 1
        j = self.n - 1
        while i < j:  # reverse the tail
            tmp = res.data[i]
            res.data[i] = res.data[j]
            res.data[j] = tmp
            i += 1; j -= 1
        return res
# =====
n = 3
print "Setting n = " + str(n)
print ""
perm_it = it.permutations(xrange(n))
print "Iterating all permutations using itertools permutations(): "
for p in perm_it:
print "p = " + str(p)
print ""
p = Permutation(n)
print "Iterating all permutations using custom Permutation class: "
while p is not None:
print "p = " + p.as_string()
p = p.successor()
Setting n = 3
End demo
import numpy as np
import itertools as it
Since the itertools module has many kinds of iterable objects, an alternative is to bring just the permutations iterator into scope:

from itertools import permutations
The demo program defines a custom Permutation class. In most cases, you will only want to
define a custom implementation of a function when you need to implement some specialized
behavior, or you want to avoid using a module that contains the function.
n = 3
print "Setting n = " + str(n)
Using lowercase n for the permutation order (the number of items) is traditional, so you should use it unless you have a reason not to.
Next, the demo program iterates through all possible permutation elements using an implicit
mechanism:
perm_it = it.permutations(xrange(n))
print "Iterating all permutations using itertools permutations(): "
for p in perm_it:
print "p = " + str(p)
print ""
The perm_it iterator can emit all possible permutation elements. In most situations, Python
iterators are designed to be called using a for item in iterator pattern, as shown. In other
programming languages, this pattern is sometimes distinguished from a regular for loop by
using a foreach keyword.
Note that the itertools.permutations() iterator emits tuples, indicated by the parentheses
in the output, rather than a list or a NumPy array.
It is possible, but somewhat awkward, to explicitly call the permutations iterator using the
next() function like so:
perm_it = it.permutations(xrange(n))
while True:
try:
p = perm_it.next()
print "p = " + str(p)
except StopIteration:
break
print ""
By design, iterator objects don't have an explicit way to signal the end of iteration, such as an
end() function or returning a special value like None. Instead, when an iterator object has no
more items to emit and a call to next() is made, a StopIteration exception is thrown. To
terminate a loop, you must catch the exception.
Next, the demo program iterates through all permutation elements for n = 3 using the program-
defined Permutation class:
p = Permutation(n)
print "Iterating all permutations using custom Permutation class: "
while p is not None:
print "p = " + p.as_string()
p = p.successor()
The successor() function of the Permutation class uses a traditional stopping technique by
returning None when there are no more permutation elements. The function successor() uses
an unobvious approach to determine when the current permutation element is the last one. A
straightforward approach isn't efficient. For example, if n = 5, the last element is (4 3 2 1 0) and
it'd be very time-consuming to check whether data[0] > data[1] > data[2] > . . . > data[n-1] on each call.
The logic in the program-defined successor() function is rather clever. Suppose n = 5 and the
current permutation element is:
# 0 1 4 3 2 #
The next element in lexicographical order after 01432, using the digits 0 through 4, is 02134. The successor() function first finds the indices of two items to swap, called left and right. In this case, left = 1 and right = 4. The items at those indices are swapped, giving a preliminary result of 02431. Then the items after index left (431 in this example) are reversed into ascending order, giving the final result of 02134.
Resources
For details about the Python itertools module and the permutations iterator, see
https://docs.python.org/2/library/itertools.html.
4.3 Permutation element
In some situations you want to generate a specified permutation element directly. For example, if n = 3, the six permutation elements in lexicographical order, along with their index values, are:
[0] (0, 1, 2)
[1] (0, 2, 1)
[2] (1, 0, 2)
[3] (1, 2, 0)
[4] (2, 0, 1)
[5] (2, 1, 0)
In many situations, you want to iterate through all possible permutations, but in some cases you
may want to generate just a specific permutation element. For example, a function call like pe =
perm_element(4) would store (2, 0, 1) into pe.
# perm_elem.py
# Python 2.7
import numpy as np
import itertools as it
import time
class Permutation:
    def __init__(self, n):
        self.n = n
        self.data = np.arange(n)

    def as_string(self):
        s = "# "
        for i in xrange(self.n):
            s = s + str(self.data[i]) + " "
        s = s + "#"
        return s

    def element(self, idx):
        result = Permutation(self.n)
        # compute the factoradic representation of idx
        factoradic = np.zeros(self.n)
        for j in xrange(1, self.n + 1):
            factoradic[self.n-j] = idx % j
            idx = idx / j
        # convert the factoradic to a one-based permutation
        for i in xrange(self.n):
            factoradic[i] += 1
        result.data[self.n - 1] = 1
        for i in xrange(self.n - 2, -1, -1):
            result.data[i] = factoradic[i]
            for j in xrange(i + 1, self.n):
                if result.data[j] >= result.data[i]:
                    result.data[j] += 1
        # convert from one-based to zero-based
        for i in xrange(self.n):
            result.data[i] -= 1
        return result
# =====
# =====
n = 20
print "Setting n = " + str(n) + "\n"
idx = 1000000000
print "Element " + str(idx) + " using itertools.permutations() is "
start_time = time.clock()
pe = perm_element(n, idx)
end_time = time.clock()
elapsed_time = end_time - start_time
print pe
print "Elapsed time = " + str(elapsed_time) + " seconds "
print ""
p = Permutation(n)
start_time = time.clock()
pe = p.element(idx)
end_time = time.clock()
elapsed_time = end_time - start_time
print "Element " + str(idx) + " using custom Permutation class is "
print pe.as_string()
print "Elapsed time = " + str(elapsed_time) + " seconds "
print ""
Setting n = 20
End demo
import numpy as np
import itertools as it
import time
The demo program defines a custom Permutation class that has an element() member
function and a stand-alone function perm_element() that is not part of a class. Both functions
return a specific permutation element. Function perm_element() uses the built-in
permutations() iterator from the itertools module. Function element() uses a NumPy
array plus a clever algorithm that involves something called the factoradic. Program execution
begins by setting up the order of a permutation, n:
n = 20
print "Setting n = " + str(n) + "\n"
The order of a permutation is the number of items in each permutation. For n = 20 there are 20!
= 2,432,902,008,176,640,000 different permutation elements. Next, the demo finds the
permutation element 1,000,000,000 using the program-defined perm_element() function:

idx = 1000000000
print "Element " + str(idx) + " using itertools.permutations() is "
start_time = time.clock()
pe = perm_element(n, idx)
end_time = time.clock()
After the permutation element has been computed, the element and the elapsed time required
are displayed:
elapsed_time = end_time - start_time
print pe
print "Elapsed time = " + str(elapsed_time) + " seconds "
In this example, the perm_element() function took over 2 and a half minutes to execute. Not
very good performance.
Next, the demo computes the same permutation element using the program-defined
Permutation class:
p = Permutation(n)
start_time = time.clock()
pe = p.element(idx)
end_time = time.clock()
Then the element and the elapsed time required are displayed using the custom class
approach:
The elapsed time using the custom Permutation class element() function was approximately 0.0003 seconds, which is much better performance than the 160+ seconds for the itertools-based function.
It really wasn't a fair fight. The perm_element() function works by creating an itertools.permutations iterator and then generating each successive permutation, one at a time, until the desired permutation element is reached.
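A minimal sketch of such a function, consistent with how the demo calls it (pe = perm_element(n, idx)) but not necessarily the book's exact code:

def perm_element(n, idx):
    # walk an itertools.permutations iterator until element idx is reached
    all_perms = it.permutations(xrange(n))
    i = 0
    for p in all_perms:
        if i == idx:
            return p
        i += 1
    return None   # idx was larger than the number of elements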
On the other hand, the custom element() function uses some very clever mathematics and an
entity called the factoradic of a number to construct the requested permutation element directly.
The regular decimal representation of numbers is based on powers of 10. For example, 1047 is
(1 * 10^3) + (0 * 10^2) + (4 * 10^1) + (7 * 10^0). The factoradic of a number is an alternate
representation based on factorials. For example, 1047 is 1232110 because it's (1 * 6!) + (2 * 5!)
+ (3 * 4!) + (2 * 3!) + (1 * 2!) + (1 * 1!) + (0 * 0!). Using some rather remarkable mathematics, it's
possible to use the factoradic of a permutation element index to compute the element directly.
Resources
For details about the Python itertools module, which contains the permutations iterator, see
https://docs.python.org/2/library/itertools.html.
4.4 Combinations
A mathematical combination set is a collection of all possible subsets of k items selected from n
items. For example, if n = 5 and k = 3 and the items are the integers (0, 1, 2, 3, 4), then there
are 10 possible combination elements:
(0, 1, 2)
(0, 1, 3)
(0, 1, 4)
(0, 2, 3)
(0, 2, 4)
(0, 3, 4)
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
For combinations, the order of the items does not matter. Therefore, there is no element (0, 2,
1) because it is considered the same as (0, 1, 2). Python supports combinations in the SciPy
special module and in the Python itertools module. There is no direct support for combinations in NumPy, but it's possible to implement combination functions using NumPy arrays.
# combinations.py
# Python 2.7
import numpy as np
import itertools as it
import scipy.special as ss
class Combination:
# n == order, k == subset size
def __init__(self, n, k):
self.n = n
self.k = k
self.data = np.arange(self.k)
def as_string(self):
s = "^ "
for i in xrange(self.k):
s = s + str(self.data[i]) + " "
s = s + "^"
return s
@staticmethod
def my_choose(n,k):
if n < k: return 0
if n == k: return 1;
delta = k
imax = n - k
if k < n-k:
delta = n-k
imax = k
ans = delta + 1
for i in xrange(2, imax+1):
ans = (ans * (delta + i)) / i
return ans
# =====
n = 5
k = 3
print "Setting n = " + str(n) + " k = " + str(k)
print ""
num_combs = ss.comb(n, k)
print "n choose k using scipy.comb() is ",
print num_combs
print ""
all_combs = it.combinations(xrange(n), k)
c = all_combs.next()
print "First itertools combination element is "
print c
print ""
num_combs = Combination.my_choose(n, k)
print "n choose k using my_choose(n, k) is ",
print num_combs
print ""
C:\SciPy\Ch4> python combinations.py
Setting n = 5 k = 3
End demo
import numpy as np
import itertools as it
import scipy.special as ss
The itertools module has the primary combinations class, but the closely associated comb()
function is defined in the special submodule of the scipy module (and also in scipy.misc).
The demo program defines a custom Combination class. In most cases, you will only want to
define a custom implementation of a function when you need to implement some specialized
behavior, or you want to avoid using a module that contains the function.
Program execution begins by setting up the number of items n, and the subset size k:
n = 5
k = 3
print "Setting n = " + str(n) + " k = " + str(k)
Lowercase n and k are most often used with combinations, so if you use different variable
names it would be a good idea to comment on which is the number of items and which is the
subset size. Next, the demo program determines the number of possible combination elements
using the SciPy comb() function:
num_combs = ss.comb(n, k)
print "n choose k using scipy.comb() is ",
print num_combs
The function that returns the number of ways to select k items from n items is almost universally
called choose(n, k) so it's not clear why the SciPy code implementation is named comb(n, k).
The mathematical definition of choose(n, k) is n! / (k! * (n-k)!), where ! is the factorial function. For example, choose(5, 3) = 5! / (3! * 2!) = 120 / (6 * 2) = 10.
As it turns out, a useful fact is that choose(n, k) = choose(n, n-k). For example, choose(10, 7) =
choose(10, 3). The choose function is easier to calculate using smaller values of the subset
size.
Next, the demo creates an itertools combinations iterator:
all_combs = it.combinations(xrange(n), k)
I like to think of a Python iterator object as a little factory that can emit data when a request is
made of it using an explicit or implicit call to a next() function. Notice the call to the
it.combinations() function accepts xrange(n) rather than just n. The choice of the name
all_combs could be somewhat misleading if you're not familiar with Python iterators. The
all_combs iterator doesn't generate all possible combination elements when it is created. It
does, however, have the ability to emit all combination elements.
In addition to xrange(), the it.combinations() iterator can accept any iterable object. For
example:
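One possibility (the list of strings here is purely illustrative):

all_combs = it.combinations(['ant', 'bat', 'cow', 'dog'], 2)

Each emitted element is then a tuple of two strings, such as ('ant', 'bat').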
Next, the demo program requests and displays the first itertools combination element like so:
c = all_combs.next()
print "The first itertools combination element is "
print c
Next, the demo program demonstrates the custom functions. First, the program-defined
my_choose() function is called:
num_combs = Combination.my_choose(n, k)
print "n choose k using my_choose(n, k) is ",
print num_combs
Notice that the call to my_choose() is appended to Combination, which is the name of its
defining class. This is because the my_choose() function is decorated with the @staticmethod
attribute.
Next, the demo creates an instance of the custom Combination class. The Combination class
__init__() constructor method initializes an object to the first combination element, so there's
no need to call a next() function to get the first element:
print "Making a custom Combination object "
c = Combination(n, k)
print "The first custom combination element is "
print c.as_string()
The custom as_string() function displays a Combination element delimited by the ^ (caret) character so that the element can be easily distinguished from a tuple, a list, or another Python collection. I used ^ because both combination and caret start with the letter c.
The custom my_choose() function is rather subtle. It would be a weak approach to implement a
choose function directly using the math definition because that would involve the calculation of
three factorial functions. The factorial of a number can be very large. For example, 20! is
2,432,902,008,176,640,000 and 1000! is an almost unimaginably large number.
The my_choose() function uses a clever alternate definition that is best explained by example: choose(10, 3) = (10 * 9 * 8) / (3 * 2 * 1), that is, k terms counting down from n in the numerator, divided by k! in the denominator. Furthermore, the top and bottom parts of the division don't have to be computed fully. Instead, the running product can be built up by alternately multiplying by a term from the top and dividing by a term from the bottom. For example, choose(10, 3) can be computed as (10 * 9) / 3 = 30, and then (30 * 8) / 2 = 120. The key statements in my_choose() are:
delta = k
imax = n - k
if k < n-k:
delta = n-k
imax = k
ans = delta + 1
for i in xrange(2, imax+1):
ans = (ans * (delta + i)) / i
return ans
The first two statements look for early exit conditions. The statements with delta and imax
simplify k if possible. The for loop performs the iterated pair-multiplication and division.
Resources
For details about the Python itertools module that contains the combinations iterator, see
https://docs.python.org/2/library/itertools.html.
4.5 Combination successor
A successor() function for combinations returns the next combination element in lexicographical order. For example, if n = 5 and k = 3, the ten combination elements in lexicographical order are:
(0, 1, 2)
(0, 1, 3)
(0, 1, 4)
(0, 2, 3)
(0, 2, 4)
(0, 3, 4)
(1, 2, 3)
(1, 2, 4)
(1, 3, 4)
(2, 3, 4)
Notice that if we removed the separating commas and interpreted each element as an ordinary
integer (like 124), the elements would be in ascending order (12 < 13 < 14 < 23 < . . . < 234).
# comb_succ.py
# Python 2.7
import numpy as np
import itertools as it
class Combination:
    # n == order, k == subset size
    def __init__(self, n, k):
        self.n = n
        self.k = k
        self.data = np.arange(k)

    def as_string(self):
        s = "^ "
        for i in xrange(self.k):
            s = s + str(self.data[i]) + " "
        s = s + "^"
        return s

    def successor(self):
        if self.data[0] == self.n - self.k:
            return None  # current element is the last one
        res = Combination(self.n, self.k)
        res.data = np.copy(self.data)
        i = self.k - 1
        while i > 0 and res.data[i] == self.n - self.k + i:
            i -= 1
        res.data[i] += 1
        for j in xrange(i+1, self.k):  # make the tail consecutive
            res.data[j] = res.data[j-1] + 1
        return res
# =====
n = 5
k = 3
print "Setting n = " + str(n) + " k = " + str(k)
print ""
Setting n = 5 k = 3
c = (0, 1, 2)
c = (0, 1, 3)
c = (0, 1, 4)
c = (0, 2, 3)
c = (0, 2, 4)
c = (0, 3, 4)
c = (1, 2, 3)
c = (1, 2, 4)
c = (1, 3, 4)
c = (2, 3, 4)
End demo
import numpy as np
import itertools as it
Since the itertools module has many kinds of iterable objects, an alternative is to bring just the combinations iterator into scope:

from itertools import combinations
The demo program defines a custom Combination class. In most cases, you will only want to
define a custom implementation of a function when you need to implement some specialized
behavior, or you want to avoid using a module that contains the function (such as itertools).
Program execution begins by setting up the number of items and the subset size:
n = 5
k = 3
print "Setting n = " + str(n) + " k = " + str(k)
It is customary to use n and k when working with mathematical combinations, so you should do
so unless you have a reason to use different variable names.
Next, the demo program iterates through all possible combination elements using an implicit
mechanism:
print "Iterating through all elements using itertools.combinations()"
comb_iter = it.combinations(xrange(n), k)
for c in comb_iter:
print "c = " + str(c)
print ""
The comb_iter iterator can emit all possible combination elements. In most situations, Python
iterators are designed to be called using a for item in iterator pattern, as shown. In other
programming languages, this pattern is sometimes distinguished from a regular for loop by
using a foreach keyword (C#) or special syntax like for x : somearr (Java).
Note that the itertools.combinations() iterator emits tuples, indicated by the parentheses
in the output, rather than a list or a NumPy array.
It is possible but awkward to explicitly call the combinations iterator using the next() function
like so:
comb_iter = it.combinations(xrange(n), k)
while True:
try:
c = comb_iter.next()
print "c = " + str(c)
except StopIteration:
break
print ""
By design, iterator objects don't have an explicit way to signal the end of iteration, such as a
last() function or returning a special value like None. Instead, when an iterator object has no
more items to emit and a call to next() is made, a StopIteration exception is thrown. To
terminate a loop, you must catch the exception. Note that you could catch a general Exception
rather than the more specific StopIteration.
Next, the demo program iterates through all combination elements for n = 5 and k = 3 using the successor() function of the program-defined Combination class:

c = Combination(n, k)
while c is not None:
print "c = " + c.as_string()
c = c.successor()
The successor() function of the Combination class uses a traditional stopping technique by returning None when there are no more combination elements. The logic in the program-defined
successor() function is rather clever. Suppose n = 7, k = 4, and the current combination
element is:
^ 0 2 5 6 ^
The next element in lexicographical order after 0256, using the digits 0 through 6, is 0345. The
successor algorithm first finds the index i of the left-most item that must change. In this case, i
= 1, which corresponds to item 2. The item at i is incremented, giving a preliminary result of
0356. Then the items to the right of the new value at i (56 in this case) are updated so that they
are all consecutive relative to the value at i (45 in this case), giving the final result of 0345.
Notice that it's quite easy for successor() to determine the last combination element because
it's the only one that has a value of n-k at index 0. For example, with n = 5 and k = 3, n-k = 2
and the last combination element is (2 3 4). Or, if n = 20 and k = 8, the last combination element
would be (12 13 14 . . . 19).
One potential advantage of using a program-defined Combination class rather than the
itertools.combinations() iterator is that you can easily define a predecessor() function.
For example, consider the functions in Code Listing 20:
Code Listing 20: A Combination Predecessor Function
def predecessor(self):
    if self.data[self.k - 1] == self.k - 1:
        return None  # current element is the first one
    res = Combination(self.n, self.k)
    res.data = np.copy(self.data)
    i = self.k - 1
    while i > 0 and res.data[i] == res.data[i-1] + 1:
        i -= 1
    res.data[i] -= 1; i += 1
    while i < self.k:
        res.data[i] = self.n - self.k + i
        i += 1
    return res
def last(self):
res = Combination(self.n, self.k)
nk = self.n - self.k
for i in xrange(self.k):
res.data[i] = nk + i
return res
Then the following statements would iterate through all combination elements in reverse order:
c = Combination(n, k) # 0 1 2
c = c.last() # 2 3 4
while c is not None:
print "c = " + c.as_string()
c = c.predecessor()
Resources
For details about the Python itertools module, which contains the combinations iterator, see
https://docs.python.org/2/library/itertools.html.
4.6 Combination element
When working with mathematical combinations, it's often useful to be able to generate a specific
element. For example, if n = 5, k = 3, and the items are the integers (0, 1, 2, 3, 4), then there are
10 combination elements. When listed in lexicographical order, the elements are:
[0] (0, 1, 2)
[1] (0, 1, 3)
[2] (0, 1, 4)
[3] (0, 2, 3)
[4] (0, 2, 4)
[5] (0, 3, 4)
[6] (1, 2, 3)
[7] (1, 2, 4)
[8] (1, 3, 4)
[9] (2, 3, 4)
In many situations, you want to iterate through all possible combination elements, but in some
cases you may want to generate just a specific combination element. For example, a function
call like ce = comb_element(5) would store (0, 3, 4) into ce.
Using the built-in itertools.combinations iterator, the only way you can get a specific
combination element is to iterate from the first element until you reach the desired element. This
approach is impractical in all but the simplest scenarios. An efficient alternative is to define a
custom Combination class and element() function that use NumPy arrays for data.
# comb_elem.py
# Python 2.7
import numpy as np
import scipy.special as ss
import time

class Combination:
def __init__(self, n, k):
self.n = n
self.k = k
self.data = np.arange(k)
def as_string(self):
s = "^ "
for i in xrange(self.k):
s = s + str(self.data[i]) + " "
s = s + "^"
return s
@staticmethod
def my_choose(n,k):
if n < k: return 0
if n == k: return 1;
delta = k
imax = n - k
if k < n-k:
delta = n-k
imax = k
ans = delta + 1
for i in xrange(2, imax+1):
ans = (ans * (delta + i)) / i
return ans
for i in xrange(self.k):
ans[i] = (self.n - 1) - ans[i]
# =====
# =====
print "\nBegin combination element demo \n"
n = 100
k = 8
print "Setting n = " + str(n) + " k = " + str(k)
ces = ss.comb(n, k)
print "There are " + str(ces) + " different combinations \n"
idx = 100000000
c = Combination(n, k)
start_time = time.clock()
ce = c.element(idx)
end_time = time.clock()
elapsed_time = end_time - start_time
print "Element " + str(idx) + " using custom Combination class is "
print ce.as_string()
print "Elapsed time = " + str(elapsed_time) + " seconds "
print ""
Setting n = 100 k = 8
There are 186087894300.0 different combinations
End demo
The demo program sets up a combinatorial problem with n = 100 items taken k = 8 at a time. So
the first combination element is (0, 1, 2, 3, 4, 5, 6, 7). The number of different combination
elements is calculated using the comb() function from the scipy.special module and is
186,087,894,300. Note that in virtually all other programming language libraries, the function to
calculate the number of different combination elements is called choose().
The demo calculates combination element 100,000,000 using a program-defined Combination class and element() function. This approach took just over 0.001 seconds. The point is that Python iterators are designed to iterate well, but are not well suited for other scenarios.
The element() function doesn't check whether parameter idx is valid, but you could add such a check.
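A sketch of such a check at the top of element() (the exact wording of the error message is arbitrary):

if idx < 0 or idx >= Combination.my_choose(self.n, self.k):
    raise ValueError("Parameter idx is out of range")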
The obvious problem with using an iterator is that there's no way to avoid walking through every
combination element until you reach the desired element. On the other hand, the program-
defined element() function in the Combination class uses a clever mathematical idea called
the combinadic to generate a combination element directly.
The regular decimal representation of numbers is based on powers of 10. For example, 7203 is
(7 * 10^3) + (2 * 10^2) + (0 * 10^1) + (3 * 10^0). The combinadic of a number is an alternate
representation based on the mathematical choose(n,k) function. For example, if n = 7 and k = 4,
the number 27 is 6521 in combinadic form because 27 = choose(6,4) + choose(5,3) +
choose(2,2) + choose(1,1). Using some rather remarkable mathematics, it's possible to use the
combinadic of a combination element index to compute the element directly.
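A sketch of how a combinadic-based element() member function might be implemented, using the my_choose() helper from the listing; the largest_v() helper and the variable names are assumptions, not the book's exact code:

@staticmethod
def largest_v(a, b, x):
    # largest value v such that choose(v, b) <= x
    v = a - 1
    while Combination.my_choose(v, b) > x:
        v -= 1
    return v

def element(self, idx):
    # directly generate the combination element with lexicographical index idx
    ans = np.zeros(self.k, dtype=np.int64)
    a = self.n
    b = self.k
    # work with the "dual" index, counting from the end
    x = Combination.my_choose(self.n, self.k) - 1 - idx
    for i in xrange(self.k):
        ans[i] = Combination.largest_v(a, b, x)
        x = x - Combination.my_choose(ans[i], b)
        a = ans[i]
        b = b - 1
    for i in xrange(self.k):
        ans[i] = (self.n - 1) - ans[i]   # convert from the dual form
    result = Combination(self.n, self.k)
    result.data = np.copy(ans)
    return result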
Resources
For details about the Python itertools module that contains the combinations iterator, see
https://docs.python.org/2/library/itertools.html.
This chapter deals with miscellaneous NumPy and SciPy functions and techniques. The goal is
to present representative examples so you'll be able to search the SciPy documentation more
efficiently. The following screenshot shows you where this chapter is headed.
In section 5.1, you'll learn how to use the NumPy searchsorted() binary search function and
how to interpret its unusual return value.
In section 5.2, you'll learn how to use SciPy to perform LU decomposition on a square matrix
and why decomposition is important.
In section 5.3, you'll learn about NumPy and SciPy statistics functions such as mean() and
std().
In section 5.4, you'll learn how to generate random values from a specified distribution such as
the Normal or Poisson, and how to bin data using the histogram() function.
In section 5.5, you'll learn about SciPy miscellaneous functions such as the double factorial.
In section 5.6, you'll learn how to use special SciPy functions such as bernoulli() and
gamma().
# binsearch.py
# Python 2.7
import numpy as np
arr = np.array([1.0, 3.0, 4.0, 6.0, 8.0, 11.0, 13.0])
target = 11.0
print "Target value to find is " + str(target)
print ""
print ""
Array arr is
[ 1. 3. 4. 6. 8. 11. 13.]
End demo
The demo program execution begins by setting up an array to search and a target value to
search for:
target = 11.0
print "Target value to find is " + str(target)
If you need to search a very large array and the array is already sorted, a binary search is often
the best approach because it's much faster than a simple sequential search. For small arrays
(typically those with less than 100 cells), the marginally faster performance of a binary search is
often unimportant, and if your array is not already sorted, the time required to sort the array
usually wipes out any time saved by a binary search.
Next, the demo calls the NumPy searchsorted() function like so:
print "Searching array using np.searchsorted() function "
idx = np.searchsorted(arr, target)
if idx < len(arr) and arr[idx] == target:
print "Target found at cell " + str(idx)
else:
print "Target not found "
The binary search functions in most programming languages return a -1 if the target is not
found, or return the cell index that holds the target value if the target is found. The
searchsorted() function works a bit differently.
A call to searchsorted(arr, x) returns the cell index in sorted array arr where x would be
inserted so that the array would remain sorted. For example, if arr = [2.0, 5.0, 6.0, 9.0]
and x = 3.0, then searchsorted(arr, x) returns 1 because the 3.0 would be inserted at cell
1 in order to keep the array sorted. If x = 11.0, then searchsorted(arr, x) would return 4
because the 11.0 would have to be inserted beyond the end of the array.
If x is a value that is already in the array, then searchsorted(arr, x) will return the cell where
the value is. Therefore, to determine if a value is in an array arr using the return value idx from
searchsorted(arr, x), you must first check that idx is less than the length of arr and then
check to see if the value at arr[idx] equals the target value.
If the search array holds floating-point values, using searchsorted() is somewhat risky. For
example, if the target value is 11.0000000000000001 (there are 15 zeros), it would not be found
by the demo program, but a slightly less precise target of 11.000000000000001 (there are 14
zeros) would be found.
The lesson is that when searching a sorted array of floating-point values using the NumPy searchsorted() function, you don't have control over how the function determines floating-point equality, so you may want to write a program-defined binary search function such as the my_bin_search() function in the demo program.
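A minimal sketch of such a function (the eps parameter and its default value are assumptions, not the book's exact code):

def my_bin_search(arr, target, eps=1.0e-6):
    # classic binary search with an epsilon test for float equality
    lo = 0
    hi = len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) / 2          # integer division in Python 2
        if abs(arr[mid] - target) < eps:
            return mid               # found
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1                        # not found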
Resources
For information about the array binary search algorithm used by the demo, see
https://en.wikipedia.org/wiki/Binary_search_algorithm.
Matrix decomposition, also called matrix factorization, is rarely used by itself, but decomposition
is the basis for efficient algorithms that find the inverse and the determinant of a matrix.
There are several kinds of matrix decomposition. The most common form is called lower-upper
decomposition for reasons that will become clear shortly. The scipy.linalg.lu() function
performs lower-upper matrix decomposition. It's sometimes useful to write a program-defined
matrix decomposition function.
# decomposition.py
# Python 2.7
import numpy as np
import scipy.linalg as spla
def my_decomp(m):
    # LU decompose matrix m using Crout's algorithm with partial pivoting
    n = len(m)
    toggle = 1               # row swapping parity: +1 even, -1 odd
    lum = np.copy(m)         # combined lower-upper result matrix
    perm = np.arange(n)      # row permutation info
    for j in xrange(n-1):
        max = abs(lum[j,j])  # find the pivot row for column j
        piv = j
        for i in xrange(j+1, n):
            if abs(lum[i,j]) > max:
                max = abs(lum[i,j])
                piv = i
        if piv != j:
            for k in xrange(n):   # swap rows j and piv
                t = lum[piv,k]
                lum[piv,k] = lum[j,k]
                lum[j,k] = t
            t = perm[piv]         # record the swap
            perm[piv] = perm[j]
            perm[j] = t
            toggle = -toggle
        xjj = lum[j,j]
        if xjj != 0.0:            # eliminate below the pivot
            for i in xrange(j+1, n):
                xij = lum[i,j] / xjj
                lum[i,j] = xij
                for k in xrange(j+1, n):
                    lum[i,k] = lum[i,k] - (xij * lum[j,k])
    return (lum, perm, toggle)
# =====
print "\n----------"
print "\nResult combined LU matrix = "
print lum
Original matrix m =
[[ 3. 2. 1. 3.]
[ 5. 6. 4. 2.]
[ 7. 9. 8. 1.]
[ 4. 2. 3. 0.]]
----------
[2 3 0 1]
End demo
The demo program begins by bringing the scipy.linalg submodule into scope:
import numpy as np
import scipy.linalg as spla
After creating the source matrix m and displaying its values, the matrix is decomposed using the linalg.lu() function like so:

(perm, low, upp) = spla.lu(m)
The return result is a tuple with three items. The first item, perm, will be explained shortly. The
second and third items are the decomposed matrices. For the demo, return matrix low is:
[[ 1. 0. 0. 0. ]
[ 0.57142857 1. 0. 0. ]
[ 0.42857143 0.59090909 1. 0. ]
[ 0.71428571 0.13636364 1. 1. ]]
Notice that the relevant values are in the lower part of the matrix, and there are dummy 1.0
values on the main diagonal. The return matrix upp is:
[[ 7. 9. 8. 1. ]
[ 0. -3.14285714 -1.57142857 -0.57142857]
[ 0. 0. -1.5 2.90909091]
[ 0. 0. 0. -1.54545455]]
Here, all the relevant values are on the main diagonal and above. Next, the demo multiplies low
and upp using the NumPy dot() function and displays the resulting matrix:
[[ 7. 9. 8. 1.]
[ 4. 2. 3. 0.]
[ 3. 2. 1. 3.]
[ 5. 6. 4. 2.]]
For comparison, the original matrix m is:
[[ 3. 2. 1. 3.]
[ 5. 6. 4. 2.]
[ 7. 9. 8. 1.]
[ 4. 2. 3. 0.]]
Notice that the product of matrices low and upp is almost the original matrix. Rows 0 and 2 have been swapped, and rows 1 and 3 have been swapped. The swap information is contained in the
perm matrix result:
[[ 0. 0. 1. 0.]
[ 0. 0. 0. 1.]
[ 1. 0. 0. 0.]
[ 0. 1. 0. 0.]]
This may be interesting, but what's the point? As it turns out, the lower and upper matrices of a
decomposition can be used to easily calculate the determinant of the original matrix, and can
also be used to compute the inverse of the original matrix.
The determinant of a matrix is the parity of the row swaps (+1 or -1) times the product of the diagonal elements of the upper matrix. The inverse of a matrix can be computed using a short
helper function that performs what is called elimination on the lower and upper matrices.
This is exactly how SciPy calculates the determinant and inverse of a matrix. It may seem odd
to use such an indirect approach, but decomposing a matrix and then finding the determinant or
the inverse is much easier and faster than finding the determinant or inverse directly.
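For example, using the values returned by the custom my_decomp() function shown in the listing above, the determinant could be recovered like this (a sketch, not code from the demo program):

(lum, perm, toggle) = my_decomp(m)
d = toggle * 1.0
for i in xrange(len(lum)):
    d *= lum[i,i]   # diagonal of the combined LU matrix equals the diagonal of upp
print "Determinant computed from the decomposition = " + str(d)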
The LU decomposition functions in many other libraries return different values than the
scipy.linalg.lu() function. The demo program implements a custom my_decomp() decomposition function that returns values in a different format. The call to my_decomp() is:

(lum, perm, toggle) = my_decomp(m)
The program-defined function returns a tuple of three items. The first is a combined lower-upper
matrix (instead of separate lower and upper matrices). The second item is a permutation array
(instead of a matrix). And the third item is a toggle parity where +1 indicates an even number of
row swaps and -1 indicates an odd number of row swaps. For the demo, the combined lower-
upper matrix result from my_decomp() is:
[[ 7. 9. 8. 1. ]
[ 0.57142857 -3.14285714 -1.57142857 -0.57142857]
[ 0.42857143 0.59090909 -1.5 2.90909091]
[ 0.71428571 0.13636364 1. -1.54545455]]
These are the same values from linalg.lu() except combined into a single matrix to save
space. The result perm array from my_decomp() is:
[2 3 0 1]
This contains essentially the same information as the perm matrix return from linalg.lu(),
indicating that if the lower and upper matrices were extracted from the combined LU matrix, and
then multiplied together, the result would be the original matrix with rows 0 and 2 swapped and
rows 1 and 3 swapped.
Resources
5.3 Statistics
The NumPy and SciPy libraries have a wide range of statistics functions that work with arrays
and matrices. Representative examples include the mean(), std(), median(), and corrcoef()
functions.
Code Listing 24: Statistics Functions Demo
# statistics.py
# Python 2.7
import numpy as np
import math
def my_corr(x, y):
    # Pearson r coefficient of correlation
    n = len(x)
    mx = np.mean(x)
    my = np.mean(y)
    num = 0.0
    for i in xrange(n):
        num += (x[i] - mx) * (y[i] - my)
    ssx = 0.0
    ssy = 0.0
    for i in xrange(n):
        ssx += math.pow(x[i] - mx, 2)
        ssy += math.pow(y[i] - my, 2)
    return num / (math.sqrt(ssx) * math.sqrt(ssy))
# =====
ability = np.array([0., 1., 3., 4., 4., 6.])
payrate = np.array([15., 15., 25., 20., 30., 33.])
print "ability array = "
print ability
print ""
ma = np.median(ability)
print "The median ability score is "
print ma
print ""
pr = np.corrcoef(ability, payrate)
print "Pearson r calculated using np.corrcoef() = "
print pr
print ""
pr = my_corr(ability, payrate)
print "Pearson r calculated using my_corr() = "
print "%1.8f" % pr
ability array =
[ 0. 1. 3. 4. 4. 6.]
payrate array =
[ 15. 15. 25. 20. 30. 33.]
The demo program execution begins by setting up two parallel arrays. The first array represents the ability scores of six people, and the second array represents the pay rates of those same six people.
Next, after displaying the values in the two arrays, the demo illustrates the use of the NumPy
median() and std() functions:
ma = np.median(ability)
print "The median ability score is "
print ma
The median is the middle value in an array or, as in this example when there isn't a single
middle value, the average of the two values closest to the middle.
By default, the NumPy std() function returns the population standard deviation of its array
argument. If you want the sample standard deviation, you can use the ddof (delta degrees of
freedom) parameter with value = 1.
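For example (a minimal sketch using the demo's ability array):

sd_pop = np.std(ability)            # population standard deviation (divides by n)
sd_samp = np.std(ability, ddof=1)   # sample standard deviation (divides by n-1)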
Next, the demo computes and displays the Pearson r coefficient of correlation using the
corrcoef() function:
pr = np.corrcoef(ability, payrate)
print "Pearson r calculated using np.corrcoef() = "
print pr
The correlation coefficient is a value between -1.0 and +1.0, the magnitude indicating the
strength of the linear relation and the sign indicating the direction of the relationship. Notice the
output is in the form of a matrix with the coefficient value (0.88700711) duplicated on the minor
diagonal.
The demo concludes by calling a program-defined function my_corr() that also calculates the
Pearson r coefficient of correlation:
pr = my_corr(ability, payrate)
print "Pearson r calculated using my_corr() = "
print "%1.8f" % pr
There's no advantage to using the program-defined correlation function. The point is that
NumPy and SciPy have many built-in statistics functions, but in the rare situations when you
need to implement a custom statistics function, NumPy and SciPy have all the tools you need.
Resources
For an explanation of the Pearson correlation coefficient that was used for my_corr(), see
https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient.
# distributions.py
# Python 2.7
import numpy as np
import math # for custom Gaussian class
import random # for custom Gaussian class
class Gaussian:
# generate using Box-Muller algorithm
def __init__(self, mean, sd, seed):
self.mean = mean
self.sd = sd
self.rnd = random.Random(seed)
def next(self):
two_pi = 2.0*3.14159265358979323846
u1 = self.rnd.random() # [0.0 to 1.0)
while u1 < 1.0e-10:
u1 = self.rnd.random()
u2 = self.rnd.random()
z = math.sqrt(-2.0 * math.log(u1)) * math.cos(two_pi * u2)
return z * self.sd + self.mean
# =====
np.random.seed(0)
mean = 0.0
std = 1.0
n = 100
values = np.zeros(n)
for i in xrange(n):
x = np.random.normal(mean, std)
values[i] = x
bins = 5
print "Constructing histogram data using " + str(bins) + " bins "
(histo, edges) = np.histogram(values, bins=5)
1.13940068 -1.23482582 0.40234164 -0.68481009 -0.87079715 -0.57884966
-0.31155253 0.05616534 -1.16514984 0.90082649 0.46566244 -1.53624369
1.48825219 1.89588918 1.17877957 -0.17992484 -1.07075262 1.05445173
-0.40317695 1.22244507 0.20827498 0.97663904 0.3563664 0.70657317
0.01050002 1.78587049 0.12691209 0.40198936]
End demo
The demo program begins by preparing to generate 100 random values that come from a
Normal (also called Gaussian or bell-shaped) distribution with mean = 0.0 and standard
deviation = 1.0.
np.random.seed(0)
mean = 0.0
std = 1.0
n = 100
Setting the global random seed, in this case to an arbitrary value of 0, means that the program
results will be the same every time the program is run. For a Normal distribution with mean =
0.0, the vast majority of values will be between (-3 * std) and (+3 * std), so we expect all
generated values to be in the range [-3.0, +3.0].
Next, the demo program creates an array with 100 cells and fills each cell with a Normally distributed random value:

values = np.zeros(n)
for i in xrange(n):
x = np.random.normal(mean, std)
values[i] = x

An alternative approach is to create the array directly by supplying a value for the optional size parameter: values = np.random.normal(mean, std, 100). After displaying the 100 values, the demo program constructs histogram information from the values:
demo program constructs histogram information from the values:
bins = 5
print "Constructing histogram data using " + str(bins) + " bins "
(histo, edges) = np.histogram(values, bins=5)
The NumPy histogram() function returns a tuple that has two arrays. The first array stores the
count of values in each bin. The second array stores the boundary values for each bin. This is
clearer when you examine the output produced by the statements print histo and print edges. For this run, there were 6 values in the interval [-2.55, -1.58), 20 values in [-1.58, -0.62), 35
values in [-0.62, 0.34), 27 values in [0.34, 1.30), and 12 values in [1.30, 2.26]. If you visually
scan the 100 values, you can see the smallest value generated is -2.55298982 and the largest
is 2.26975462.
The demo program concludes by showing you how to implement a Normal distribution value
generator without using NumPy via a program-defined class named Gaussian. The class
constructor accepts a mean, a standard deviation, and a seed:
class Gaussian:
def __init__(self, mean, sd, seed):
self.mean = mean
self.sd = sd
self.rnd = random.Random(seed)
The class uses a Random object from the Python random module. The next() function uses the
clever Box-Muller algorithm to transform two uniform random values into one that is Normal.
def next(self):
two_pi = 2.0*3.14159265358979323846
u1 = self.rnd.random() # [0.0 to 1.0)
while u1 < 1.0e-10:
u1 = self.rnd.random()
u2 = self.rnd.random()
z = math.sqrt(-2.0 * math.log(u1)) * math.cos(two_pi * u2)
return z * self.sd + self.mean
The while loop in function next() guarantees that variable u1 is not a very small value so that
log(u1) won't fail. This example illustrates that it's relatively easy to implement a custom
generator in rare situations where NumPy doesn't have the generator you need.
Resources
For details about the NumPy histogram() function, see
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.histogram.html.
# doublefact.py
# Python 2.7
import scipy.misc as sm
def my_double_fact(n):
result = 1
stop = 2 # for odd n: count down to 3 (the final factor 1 can be skipped)
if n % 2 == 0:
stop = 1 # for even n: count down to 2
for i in xrange(n, stop-1, -2):
result *= i
return result
# =====
n = 3
dfact = sm.factorial2(n)
print "Double factorial of " + str(n) + " using misc.factorial2() = "
print str(dfact)
print ""
n = 4
dfact = sm.factorial2(n)
print "Double factorial of " + str(n) + " using misc.factorial2() = "
print str(dfact)
print ""
n = 4
dfact = my_double_fact(n)
print "Double factorial of " + str(n) + " using my_double_fact() = "
print str(dfact)
print ""
C:\SciPy\Ch5> python doublefact.py
End demo
The demo program illustrates the double factorial function, which is best explained by example.
The double factorial of n is often abbreviated as n!!, much like n! is an abbreviation for the
regular factorial function.
7!! = 7 * 5 * 3 * 1 = 105
6!! = 6 * 4 * 2 = 48
In words, the double factorial is like the regular factorial function except that every other term in
the product is skipped. The double factorial function is used as a helper in many important
mathematical functions, such as the specialized gamma function. The demo program
begins by importing the scipy.misc submodule:
import scipy.misc as sm
Note that the factorial2() function is also in the scipy.special submodule. After import, the
factorial2() function can be called like so:
n = 3
dfact = sm.factorial2(n)
print "Double factorial of " + str(n) + " using misc.factorial2() = "
print str(dfact)
The factorial2() function has an optional parameter exact that, if set to False, allows the
function to do a fast approximation rather than a slower exact calculation.
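For example, a quick comparison sketch (the exact output formatting can differ slightly between
SciPy versions) is:
n = 7
print sm.factorial2(n, exact=True)   # exact integer arithmetic: 105
print sm.factorial2(n, exact=False)  # fast floating-point approximation: roughly 105.0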
The demo implements a program-defined version of the double factorial function named
my_double_fact(). There's no advantage to a program-defined version unless you need some
sort of specialized behavior, or wish to avoid importing a module for some reason.
Resources
For information about the double factorial function, including alternate definitions, see
https://en.wikipedia.org/wiki/Double_factorial.
# gamma.py
# Python 2.7

import scipy.special as ss
import math

def my_special_gamma(n):
  # return gamma(n/2)
  if n % 2 == 0:  # n/2 is an integer
    return math.factorial(n / 2 - 1)
  else:
    root_pi = math.sqrt(math.pi)
    return root_pi * ss.factorial2(n-2) / math.pow(2.0, (n-1) / 2.0)
# =====
n = 3
n_fact = math.factorial(n)
print "Factorial of " + str(n) + " = " + str(n_fact)
n = 4
n_fact = math.factorial(n)
print "Factorial of " + str(n) + " = " + str(n_fact)
print ""
n = 5
n_gamma = ss.gamma(n)
print "Gamma of " + str(n) + " using special.gamma() = "
print str(n_gamma)
print ""
n = 4.5
n_gamma = ss.gamma(n)
print "Gamma of " + str(n) + " using special.gamma() = "
print str(n_gamma)
print ""
n = 9
s_gamma = my_special_gamma(n)
print "Gamma of " + str(n) + "/2 using my_special_gamma() = "
print str(s_gamma)
print ""
Factorial of 3 = 6
Factorial of 4 = 24
End demo
The factorial function applies only to integers. The gamma function extends the factorial function
to real numbers. For example, factorial(3) = 3 * 2 * 1 = 6 and factorial(4) = 4 * 3 * 2 * 1 = 24.
However, factorial(3.5) is not defined.
For integer arguments, gamma(n) = factorial(n-1). For example, gamma(5) = factorial(4) = 24.
For non-integer arguments, such as n = 4.5, the gamma() function returns a value between
factorial(3) and factorial(4).
Without a routine like the SciPy special.gamma() function, calculating the gamma value for an
arbitrary argument like n = 4.68 is difficult. However, there is a relatively easy way to calculate
gamma for arguments that are integers divided by two. If n is even, then n/2 is an integer and
gamma can be calculated using factorial. For example, gamma(10/2) = gamma(5.0) =
factorial(4). If n is odd, there is a special equation that can be used. For example, if n = 9 then
gamma(9/2) = gamma(4.5) has a shortcut solution. These types of arguments are called
positive half-integers. But for all other arguments, calculating gamma is difficult.
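To make the odd case concrete, here is the shortcut worked out for n = 9 (the same value the
demo computes shortly):
gamma(9/2) = gamma(4.5) = sqrt(pi) * 7!! / 2^4 = sqrt(pi) * 105 / 16 ≈ 11.63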
The demo program begins by importing the scipy.special submodule and the Python math
module:
import scipy.special as ss
import math
119
Next, the demo program calculates and displays the factorial for n=3 and n=4 in order to
illustrate the relationship between special.gamma(n) and math.factorial(n):
n = 3
n_fact = math.factorial(n)
print "Factorial of " + str(n) + " = " + str(n_fact)
n = 4
n_fact = math.factorial(n)
print "Factorial of " + str(n) + " = " + str(n_fact)
n = 5
n_gamma = ss.gamma(n)
print "Gamma of " + str(n) + " using special.gamma() = "
print str(n_gamma)
The output is 24.0, verifying that if n is an integer, then gamma(n) = factorial(n-1). Next, the
demo calculates and displays the value of gamma(4.5):
n = 4.5
n_gamma = ss.gamma(n)
print "Gamma of " + str(n) + " using special.gamma() = "
print str(n_gamma)
The point here is that gamma(4.5), approximately 11.63, is a value between factorial(3) = 6 and
factorial(4) = 24.
The demo program concludes by calculating gamma(9/2) using the program-defined
my_special_gamma() function:
def my_special_gamma(n):
  # return gamma(n/2)
  if n % 2 == 0:  # n/2 is an integer
    return math.factorial(n / 2 - 1)
  else:
    root_pi = math.sqrt(math.pi)
    return root_pi * ss.factorial2(n-2) / math.pow(2.0, (n-1) / 2.0)
For odd values of n, the function's return value is not at all obvious and comes from math
theory. Interestingly, even though the scipy.special submodule has 17 functions that are
related to gamma(), there is no dedicated gamma function for positive half-integer arguments.
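Specifically, for odd n the else clause implements the standard half-integer identity
gamma(n/2) = sqrt(pi) * (n-2)!! / 2^((n-1)/2). A quick verification sketch, assuming the gamma.py
definitions above, is:
print ss.gamma(4.5)        # general-purpose gamma: approximately 11.6317
print my_special_gamma(9)  # half-integer shortcut for gamma(9/2): approximately 11.6317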
Resources
For information about the specialized gamma function for positive half-integers, see
https://en.wikipedia.org/wiki/Particular_values_of_the_Gamma_function.