Labs For Foundations of Applied Mathematics (Python Essentials)
Foundations of Applied
Mathematics
Python Essentials
B. Barker T. Christensen
Brigham Young University Brigham Young University
E. Evans M. Cook
Brigham Young University Brigham Young University
R. Evans M. Cutler
Brigham Young University Brigham Young University
J. Grout R. Dorff
Drake University Brigham Young University
J. Humpherys B. Ehlert
Brigham Young University Brigham Young University
T. Jarvis M. Fabiano
Brigham Young University Brigham Young University
J. Whitehead K. Finlinson
Brigham Young University Brigham Young University
J. Adams J. Fisher
Brigham Young University Brigham Young University
K. Baldwin R. Flores
Brigham Young University Brigham Young University
J. Bejarano R. Fowers
Brigham Young University Brigham Young University
J. Bennett A. Frandsen
Brigham Young University Brigham Young University
A. Berry R. Fuhriman
Brigham Young University Brigham Young University
Z. Boyd T. Gledhill
Brigham Young University Brigham Young University
M. Brown S. Giddens
Brigham Young University Brigham Young University
A. Carr C. Gigena
Brigham Young University Brigham Young University
C. Carter M. Graham
Brigham Young University Brigham Young University
S. Carter F. Glines
Brigham Young University Brigham Young University
ii List of Contributors
C. Glover E. Manner
Brigham Young University Brigham Young University
M. Goodwin M. Matsushita
Brigham Young University Brigham Young University
R. Grout R. McMurray
Brigham Young University Brigham Young University
D. Grundvig S. McQuarrie
Brigham Young University Brigham Young University
S. Halverson E. Mercer
Brigham Young University Brigham Young University
E. Hannesson D. Miller
Brigham Young University Brigham Young University
K. Harmer J. Morrise
Brigham Young University Brigham Young University
J. Henderson M. Morrise
Brigham Young University Brigham Young University
J. Hendricks A. Morrow
Brigham Young University Brigham Young University
A. Henriksen R. Murray
Brigham Young University Brigham Young University
I. Henriksen J. Nelson
Brigham Young University Brigham Young University
B. Hepner C. Noorda
Brigham Young University Brigham Young University
C. Hettinger A. Oldroyd
Brigham Young University Brigham Young University
S. Horst A. Oveson
Brigham Young University Brigham Young University
R. Howell E. Parkinson
Brigham Young University Brigham Young University
E. Ibarra-Campos M. Probst
Brigham Young University Brigham Young University
K. Jacobson M. Proudfoot
Brigham Young University Brigham Young University
R. Jenkins D. Reber
Brigham Young University Brigham Young University
J. Larsen H. Ringer
Brigham Young University Brigham Young University
J. Leete C. Robertson
Brigham Young University Brigham Young University
Q. Leishman M. Russell
Brigham Young University Brigham Young University
J. Lytle R. Sandberg
Brigham Young University Brigham Young University
C. Sawyer T. Thompson
Brigham Young University Brigham Young University
N. Sill B. Trendler
Brigham Young University Brigham Young University
D. Smith M. Victors
Brigham Young University Brigham Young University
J. Smith E. Walker
Brigham Young University Brigham Young University
P. Smith J. Webb
Brigham Young University Brigham Young University
M. Stauffer R. Webb
Brigham Young University Brigham Young University
E. Steadman J. West
Brigham Young University Brigham Young University
J. Stewart
Brigham Young University
S. Suggs R. Wonnacott
Brigham Young University Brigham Young University
A. Tate A. Zaitzeff
Brigham Young University Brigham Young University
Preface
This lab manual is designed to accompany the textbook Foundations of Applied Mathematics
by Humpherys, Jarvis and Evans. This manual begins with an introduction to Python [VD10]
from scratch, which only requires a basic understanding of general programming concepts (variables,
functions, etc.). Later labs introduce several Python packages that are essential for mathematical
and scientific computing in Python.
© This work is licensed under the Creative Commons Attribution 3.0 United States License.
You may copy, distribute, and display this copyrighted work only if you give credit to Dr. J. Humpherys.
All derivative works must include an attribution to Dr. J. Humpherys as the owner of this work as
well as the web address to
https://github.com/Foundations-of-Applied-Mathematics/Labs
as the original source of this work.
To view a copy of the Creative Commons Attribution 3.0 License, visit
http://creativecommons.org/licenses/by/3.0/us/
or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105,
USA.
Contents
Preface
I Labs
1 Introduction to Python
3 Introduction to NumPy
4 Object-oriented Programming
5 Introduction to Matplotlib
9 Profiling
II Appendices
Bibliography
Part I
Labs
1
Introduction to Python
Getting Started
Python is quickly gaining momentum as a fundamental tool in scientific computing. To install
Python, see the Getting Started document.
Running Python
Python files are saved with a .py extension. For beginners, we strongly recommend using a simple
text editor for writing Python files, though many free IDEs (Integrated Development Environments—
large applications that facilitate code development with some sophisticated tools) are also compatible
with Python. For now, the simpler the coding environment, the better.
A plain Python file looks similar to the following code.
# filename.py
"""This is the file header.
The header contains basic information about the file.
"""
if __name__ == "__main__":
    pass # 'pass' is a temporary placeholder.
The # character creates a single-line comment. Comments are ignored by the interpreter and
serve as annotations for the accompanying source code. A pair of three quotes, """ """ or ''' ''',
creates a multi-line string literal, which may also be used as a multi-line comment. A triple-quoted
string literal at the top of the file serves as the header for the file. The header typically identifies the
author and includes instructions on using the file. Executable Python code comes after the header.
4 Lab 1. Introduction to Python
Problem 1. Open the file named python_intro.py (or create the file in a text editor if you
don’t have it). Add your information to the header at the top, then add the following code.
if __name__ == "__main__":
    print("Hello, world!") # Indent with four spaces (NOT a tab).
Be sure to save your edited file. Open a command prompt (Terminal on Linux or Mac and
Command Prompt or GitBash on Windows) and navigate to the directory where the new file
is saved. Use the command ls (or DIR on Windows) to list the files and folders in the current
directory, pwd (cd on Windows) to print the working directory, and cd to change directories.
Now the Python file can be executed with the following command:
$ python python_intro.py
If Hello, world! is displayed on the screen, you have just successfully executed your
first Python program!
Achtung!
The if __name__ == "__main__" clause is incredibly helpful to create test functions and debug
your code. In order to use if __name__ == "__main__" to test declared functions, it must
be placed at the end of your file with the code you desire to run directly following. If you
attempt to run that block of code at the beginning of the file with functions that are declared
afterwards, you’ll get an error of having undefined functions.
IPython
Python can be run interactively using several interfaces. The most basic of these is the Python
interpreter. In this and subsequent labs, the triple brackets >>> indicate that the given code is being
executed one line at a time via the Python interpreter.
There are, however, more useful interfaces. Chief among these is IPython [PG07, jup]. To
execute a script in IPython, use the %run command.
# A list is a basic Python data structure. To see the methods associated with
# a list, type the object name (list), followed by a period, and press tab.
In [1]: list. # Press 'tab'.
append() count() insert() remove()
clear() extend() mro() reverse()
copy() index() pop() sort()
# To learn more about a specific method, use a '?' and hit 'Enter'.
In [1]: list.append?
Signature: list.append(self, object, /)
Docstring: Append object to the end of the list.
Type: method_descriptor
class list(object)
| list(iterable=(),/)
| # ... # Press 'q' to exit the info screen.
Note
Use IPython side-by-side with a text editor to test syntax and small code snippets quickly.
Testing small pieces of code in IPython before putting them into a program reveals errors and
greatly speeds up the coding process. Consult the internet with questions; stackoverflow.com
is a particularly valuable resource for answering common programming questions.
The best way to learn a new coding language is by actually writing code. Follow along
with the examples in the yellow code boxes in this lab by executing them in an IPython console.
Avoid copy and paste for now; your fingers need to learn the language as well.
Python Basics
Arithmetic
Python can be used as a calculator with the regular +, -, *, and / operators. Use ** for exponentiation
and % for modular division.
In most Python interpreters, the underscore character _ is a variable with the value of the
previous command’s output, like the ANS button on many calculators.
>>> 12 * 3
36
>>> _ / 4
9.0
Data comparisons like < and > act as expected. The == operator checks for numerical equality
and the <= and >= operators correspond to ≤ and ≥, respectively. To connect multiple boolean
expressions, use the operators and, or, and not.2
>>> True and True and True and True and True and False
False
>>> False or False or False or False or False or True
True
>>> True or not True
True
2 In many other programming languages, the and, or, and not operators are written as &&, ||, and !, respectively.
Python’s convention is much more readable and does not require parentheses.
Variables
Variables are used to temporarily store data. A single equals sign = assigns one or more values (on
the right) to one or more variable names (on the left). A double equals sign == is a comparison
operator that returns True or False, as in the previous code block.
Unlike many programming languages, Python does not require a variable’s data type to be
specified upon initialization. Because of this, Python is called a dynamically typed language.
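The following minimal sketch illustrates assignment and dynamic typing. The names x, width, and height are chosen only for illustration.

```python
# Assign a single value to a single name.
x = 12

# Assign several values (right) to several names (left) at once.
width, height = 2, 5

# The same name may later refer to a value of a different type,
# because Python is dynamically typed.
x = "twelve"

print(x, width, height)
print(width == height)   # == compares values; it does not assign.
```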
Functions
To define a function, use the def keyword followed by the function name, a parenthesized list of
parameters, and a colon. Then indent the function body using exactly four spaces.
Achtung!
Many other languages use the curly braces {} to delimit blocks, but Python uses whitespace
indentation. In fact, whitespace is essentially the only thing that Python is particularly picky
about compared to other languages: mixing tabs and spaces confuses the interpreter
and causes problems. Most text editors have a setting to set the indentation type to spaces
so you can use the tab key on your keyboard to insert four spaces (sometimes called soft tabs).
For consistency, never use tabs; always use spaces.
Functions are defined with parameters and called with arguments, though the terms are often
used interchangeably. Below, width and height are parameters for the function area(). The values
2 and 5 are the arguments that are passed when calling the function.
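A minimal sketch of such a function follows. The body of area() is an assumption (the area of a rectangle); only the function name, its parameters, and the arguments 2 and 5 come from the text.

```python
def area(width, height):
    """Return the area of the rectangle with the given width and height."""
    return width * height

# width and height are parameters; 2 and 5 are the arguments.
rectangle_area = area(2, 5)
print(rectangle_area)
```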
The keyword lambda is a shortcut for creating one-line functions. For example, the polynomials
$f(x) = 6x^3 + 4x^2 - x + 3$ and $g(x, y, z) = x + y^2 - z^3$ can be defined as functions in one line each.
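One way to write these two polynomials with lambda is sketched below. The test values passed to f and g are arbitrary.

```python
# Each lambda takes its parameters before the colon and returns
# the value of the single expression after the colon.
f = lambda x: 6*x**3 + 4*x**2 - x + 3
g = lambda x, y, z: x + y**2 - z**3

f_value = f(2)          # 6*8 + 4*4 - 2 + 3
g_value = g(1, 2, 3)    # 1 + 4 - 27
print(f_value, g_value)
```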
Note
Documentation is important in every programming language. Every function should have a
docstring—a string literal in triple quotes just under the function declaration—that describes
the purpose of the function, the expected inputs and return values, and any other notes that
are important to the user. Short docstrings are acceptable for very simple functions, but more
complicated functions require careful and detailed explanations.
Lambda functions cannot have custom docstrings, so the lambda keyword should only
be used as a shortcut for very simple or intuitive functions that need no additional labeling.
Problem 2. The volume of a sphere with radius r is $V = \frac{4}{3}\pi r^3$. In your Python file from
Problem 1, define a function called sphere_volume() that accepts a single parameter r. Return
the volume of the sphere of radius r, using 3.14159 as an approximation for π (for now). Also
write an appropriate docstring for your function.
To test your function, call it under the if __name__ == "__main__" clause and print the
returned value. Run your file to see if your answer is what you expect it to be.
Achtung!
The return statement instantly ends the function call and passes the return value to the
function caller. However, functions are not required to have a return statement. A function
without a return statement implicitly returns the Python constant None, which is similar to
the special value null of many other languages. Calling print() at the end of a function does
not cause a function to return any values.
If you have any intention of using the results of a function, use a return statement.
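The difference between printing and returning can be seen in a short sketch. The two function names below are hypothetical and exist only for this comparison.

```python
def prints_only(x):
    """Print the square of x, but return nothing."""
    print(x**2)

def returns_value(x):
    """Return the square of x."""
    return x**2
    print("This line never runs.")   # return has already ended the call.

a = prints_only(3)      # Displays 9, but a is None.
b = returns_value(3)    # Displays nothing, but b is 9.
print(a, b)
```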
It is also possible to specify default values for a function’s parameters. In the following example,
the function pad() has three parameters, and the value of c defaults to 0. If it is not specified in the
function call, the variable c will contain the value 0 when the function is executed.
It’s important to note that positional arguments must precede named arguments in a function
call. Additionally, parameters without default values must precede parameters with default values in
a function definition. For example, a and b must come before c in the function definition of pad().
Examine the following code blocks demonstrating how positional and named arguments are used to
call a function.
# Correctly define pad() with the named argument after positional arguments.
>>> def pad(a, b, c=0):
...     """Print the arguments, plus a zero if c is not specified."""
...     print(a, b, c)
...
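The calls below are a sketch of how positional and named arguments might be combined with pad(). The function is redefined here so the block runs on its own, and the argument values are arbitrary.

```python
def pad(a, b, c=0):
    """Print the arguments, plus a zero if c is not specified."""
    print(a, b, c)

pad(1, 2)              # c takes its default value: prints 1 2 0
pad(1, 2, 3)           # c given positionally: prints 1 2 3
pad(1, 2, c=3)         # c given by name: prints 1 2 3
pad(c=3, b=2, a=1)     # all arguments named, in any order: prints 1 2 3

# A positional argument may not follow a named one. The call
#     pad(c=3, 1, 2)
# is rejected with a SyntaxError before the code even runs.
default_c = pad.__defaults__   # The stored default values, here (0,).
```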
Problem 3. The built-in print() function has the useful keyword arguments sep and end.
It accepts any number of positional arguments and prints them out with sep inserted between
values (defaulting to a space), then prints end (defaulting to the newline character '\n').
Write a function called isolate() that accepts five arguments. The function should print
the first three arguments separated by 5 spaces and then print the last two arguments with a
single space separating the last three arguments. For example,
>>> isolate(1, 2, 3, 4, 5)
1     2     3 4 5
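As a small illustration of sep and end (not a solution to the problem), the sketch below prints with non-default values. The file keyword and the io.StringIO buffer are used here only so the printed text can be inspected afterward.

```python
import io

print("a", "b", "c")               # Default: separated by single spaces.
print("a", "b", "c", sep="-")      # Values joined by a dash instead.
print("first", end=" | ")          # No newline after this call...
print("second")                    # ...so this lands on the same line.

# Capture printed text in a buffer instead of the screen.
buffer = io.StringIO()
print("x", "y", sep=", ", end="!", file=buffer)
captured = buffer.getvalue()
```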
Achtung!
In previous versions of Python, print() was a statement (like return), not a function, and
could therefore be executed without parentheses. However, it lacked keyword arguments like
sep and end. If you are using Python 2.7, include the following line at the top of the file to
turn the print statement into the new print() function.
>>> from __future__ import print_function
Python has two types of division: integer and float. The / operator performs float division
(true fractional division), and the // operator performs integer division, which rounds the result
down to the next integer. If both operands for // are integers, the result will be an int. If one or
both operands are floats, the result will be a float. Regular division with / always returns a float.
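A brief sketch of the two division operators in Python 3 follows. The variable names are illustrative only.

```python
true_quotient = 15 / 4       # Float division: 3.75
even_quotient = 8 / 4        # Still a float, even when the division is even: 2.0
floor_quotient = 15 // 4     # Integer division of two ints: 3
float_floor = 15.0 // 4      # One float operand gives a float: 3.0
negative_floor = -15 // 4    # Rounds down (toward negative infinity): -4

print(true_quotient, even_quotient, floor_quotient, float_floor, negative_floor)
```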
Achtung!
In previous versions of Python, using / with two integers performed integer division, even in
cases where the division was not even. This can result in some incredibly subtle and frustrating
errors. If you are using Python 2.7, always include a . on the operands or cast at least one as
a float when you want float division.
# PYTHON 2.7
>>> 15 / 4 # The answer should be 3.75, but the
3 # interpreter does integer division!
Alternatively, including the following line at the top of the file redefines the / and // operators
so they are handled the same way as in Python 3.
>>> from __future__ import division
Python also supports complex number computations by pairing two numbers as the real and
imaginary parts. Use the letter j, not i, for the imaginary part.
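The sketch below shows two ways to create complex numbers and a few common operations on them. The specific values are arbitrary.

```python
x = complex(2, 3)          # Same as 2 + 3j.
y = 4 + 5j

real_part = x.real         # Real and imaginary parts are stored as floats.
imag_part = x.imag
product = x * y            # (2+3j)(4+5j) = 8 + 10j + 12j - 15 = -7 + 22j
conjugate = x.conjugate()  # 2 - 3j
magnitude = abs(3 + 4j)    # sqrt(3**2 + 4**2)

print(real_part, imag_part, product, conjugate, magnitude)
```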
Strings
In Python, strings are created with either single or double quotes. To concatenate two or more
strings, use the + operator between string variables or literals.
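For instance, concatenation might look like the following sketch. The names str1 and str2 are illustrative.

```python
str1 = "Hello"     # Double quotes...
str2 = 'world'     # ...and single quotes both create strings.

# Concatenate variables and literals with +.
my_string = str1 + " " + str2 + "!"
print(my_string)
```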
Parts of a string can be accessed using slicing, indicated by square brackets [ ]. Slicing syntax
is [start:stop:step]. The parameters start and stop default to the beginning and end of the
string, respectively. The parameter step defaults to 1.
>>> my_string = "Hello world!"
>>> my_string[-1] # Negative indices count backward from the end.
'!'
# Slice from the 0th to the 5th character (not including the 5th character).
>>> my_string[:5]
'Hello'
# Slice from the 3rd to the 8th character (not including the 8th character).
>>> my_string[3:8]
'lo wo'
Problem 4. Write two functions, first_half() and backward().
1. first_half() should accept a parameter and return the first half of it, excluding the
middle character if there is an odd number of characters.
(Hint: the built-in function len() returns the length of the input.)
2. The backward() function should accept a parameter and reverse the order of its characters
using slicing, then return the reversed parameter.
(Hint: The step parameter used in slicing can be negative.)
Lists
A Python list is created by enclosing comma-separated values with square brackets [ ]. Entries of
a list do not have to be of the same type. Access entries in a list with the same indexing or slicing
operations used with strings.
Common list methods (functions) include append(), insert(), remove(), and pop(). Consult
IPython for details on each of these methods using object introspection.
The in operator quickly checks if a given value is in a list (or another iterable, including strings).
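The list operations described above can be sketched as follows. The list contents are arbitrary example values.

```python
my_list = ["Hello", 93.8, "world", 10]   # Entries may have different types.

first = my_list[0]          # Indexing works as it does with strings.
last = my_list[-1]
middle = my_list[1:3]       # Slicing returns a new list.

my_list.append(42)          # Add an entry to the end.
my_list.insert(1, "new")    # Insert an entry at index 1.
my_list.remove(93.8)        # Remove the first occurrence of a value.
popped = my_list.pop()      # Remove and return the last entry.

has_world = "world" in my_list    # Membership test on a list.
has_e = "e" in "Hello"            # Membership test on a string.
print(my_list, popped, has_world, has_e)
```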
Tuples
A Python tuple is an ordered collection of elements, created by enclosing comma-separated values
with parentheses ( and ). Tuples are similar to lists, but they are much more rigid, have fewer
built-in operations, and cannot be altered after creation. Lists are therefore preferable for managing
dynamic ordered collections of objects.
When multiple objects are returned by a function, they are returned as a tuple. For example,
recall that the arithmetic() function returns two values.
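The sketch below illustrates both points. The function sum_and_product() is a hypothetical stand-in for a function that returns two values, not the arithmetic() function mentioned above.

```python
my_tuple = ("red", 3.5, 10)
first = my_tuple[0]              # Indexing works as with lists.

# Tuples cannot be altered after creation.
try:
    my_tuple[0] = "blue"
    altered = True
except TypeError:
    altered = False

def sum_and_product(a, b):
    """Return the sum and the product of a and b."""
    return a + b, a * b          # Two values are packed into one tuple.

result = sum_and_product(3, 4)           # A single tuple.
total, product = sum_and_product(3, 4)   # Or unpack into two names.
print(result, total, product)
```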
Problem 5. Write a function called list_ops(). Define a list with the entries "bear", "ant",
"cat", and "dog", in that order. Then perform the following operations on the list:
1. Append "eagle".
Sets
A Python set is an unordered collection of distinct objects. Objects can be added to or removed
from a set after its creation. Initialize a set with curly braces { }, separating the values by commas,
or use set() to create an empty set. Like mathematical sets, Python sets have operations like union,
intersection, difference, and symmetric difference.
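A short sketch of these set operations, using arbitrary integer entries:

```python
a = {1, 2, 3, 3}          # The duplicate 3 is stored only once.
b = {3, 4, 5}
empty = set()             # Note that {} creates an empty dict, not a set.

a.add(0)                  # Add an object to the set.
a.discard(0)              # Remove it again.

union = a | b                   # Also a.union(b)
intersection = a & b            # Also a.intersection(b)
difference = a - b              # Also a.difference(b)
symmetric_difference = a ^ b    # Also a.symmetric_difference(b)
print(union, intersection, difference, symmetric_difference)
```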
Dictionaries
Like a set, a Python dict (dictionary) is not indexed by position (although since Python 3.7 a dict preserves insertion order). A dictionary stores key-value
pairs, called items. The values of a dictionary are indexed by its keys. Dictionaries are initialized
with curly braces, colons, and commas. Use dict() or {} to create an empty dictionary.
As far as data access goes, lists are like dictionaries whose keys are the integers 0, 1, . . . , n − 1,
where n is the number of items in the list. The keys of a dictionary need not be integers, but they
must be immutable, which means that they must be objects that cannot be modified after creation.
We will discuss mutability more thoroughly in the Standard Library lab.
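A minimal sketch of dictionary creation and access follows. The keys and values are made up for illustration.

```python
room_numbers = {"math": 2061, "physics": 4121}   # key: value pairs

math_room = room_numbers["math"]       # Index by key, not by position.
room_numbers["art"] = 7321             # Add a new item.
removed = room_numbers.pop("physics")  # Remove an item and return its value.
keys = list(room_numbers.keys())
values = list(room_numbers.values())

# Keys must be immutable: a tuple works, but a list does not.
grid = {(0, 0): "origin"}
try:
    bad = {[0, 0]: "origin"}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
print(room_numbers, removed, list_key_allowed)
```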
Type Casting
The names of each of Python’s data types can be used as functions to cast a value as that type. This
is particularly useful for converting between integers and floats.
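A few casts are sketched below with arbitrary values. Note that int() truncates toward zero rather than rounding.

```python
as_int = int(3.9)           # Truncates the decimal part.
as_float = float(3)
as_string = str(12)
from_string = int("42")     # Strings of digits can be cast to numbers.
as_list = list("abc")       # A string becomes a list of characters.
as_set = set([1, 1, 2])     # Duplicates disappear.
as_tuple = tuple([1, 2])

print(as_int, as_float, as_string, from_string, as_list, as_set, as_tuple)
```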
The If Statement
An if statement executes the indented code if (and only if) the given condition holds. The elif
statement is short for “else if” and can be used multiple times following an if statement, or not at
all. The else keyword may be used at most once at the end of a series of if/elif statements.
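The following sketch shows one if/elif/else chain. The function sign_name() is a hypothetical example.

```python
def sign_name(n):
    """Describe the sign of the number n."""
    if n < 0:
        label = "negative"
    elif n == 0:              # Checked only if the first condition fails.
        label = "zero"
    else:                     # Runs only if every condition above fails.
        label = "positive"
    return label

labels = [sign_name(-3), sign_name(0), sign_name(7)]
print(labels)
```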
Problem 6. Write a function called pig_latin(). Accept a string parameter word, translate
it into Pig Latin, then return the translation. Specifically, if word starts with a vowel, add
“hay” to the end; if word starts with a consonant, take the first character of word, move it to
the end, and add “ay”.
(Hint: use the in operator to check if the first letter is a vowel.)
>>> i = 0
>>> while i < 10:
...     print(i, end=' ') # Print a space instead of a newline.
...     i += 1 # Shortcut syntax for i = i+1.
...
0 1 2 3 4 5 6 7 8 9
1. break manually exits the loop, regardless of which iteration the loop is on or if the termination
condition is met.
2. continue skips the current iteration and returns to the top of the loop block if the termination
condition is still not met.
>>> i = 0
>>> while True:
...     print(i, end=' ')
...     i += 1
...     if i >= 10:
...         break # Exit the loop.
...
0 1 2 3 4 5 6 7 8 9
>>> i = 0
>>> while i < 10:
...     i += 1
...     if i % 3 == 0:
...         continue # Skip multiples of 3.
...     print(i, end=' ')
...
1 2 4 5 7 8 10
The break and continue statements also work in for loops, but a continue in a for loop will
automatically increment the index or item, whereas a continue in a while loop makes no automatic
changes to any variable.
In addition, Python has some very useful built-in functions that can be used in conjunction
with the for statement:
1. range(start, stop, step): Produces a sequence of integers, following slicing syntax. If only
one argument is specified, it produces a sequence of integers from 0 up to (but not including)
the argument, incrementing by one. This function is used very often.
2. zip(): Joins multiple sequences in parallel so they can be iterated over simultaneously.
3. enumerate(): Yields both a count and a value from the sequence. Typically used to get both
the index of an item and the actual item simultaneously.
5. sorted(): Returns a new list of sorted items that can then be used for iteration.
Each of these functions except for sorted() returns an iterator, an object that is built specifically
for looping but not for creating actual lists. To put the items of the sequence in a collection, use
list(), set(), or tuple().
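The functions listed above might be used as in this sketch. The sequences colors and vowels are small example inputs.

```python
colors = ["red", "yellow", "white"]
vowels = "aei"

evens = list(range(0, 10, 2))        # start, stop, step (stop is excluded).
first_five = list(range(5))          # One argument: 0 up to (not including) 5.
pairs = list(zip(vowels, colors))    # Pair up entries in parallel.
indexed = list(enumerate(colors))    # (index, item) pairs.
ordered = sorted(colors)             # Already a list; no cast needed.

# In a for loop, the results are usually consumed directly.
for i, (letter, color) in enumerate(zip(vowels, colors)):
    print(i, letter, color)
```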
>>> colors = ["red", "yellow", "white", "blue", "purple"]
>>> vowels = "aeiou"
# Iterate by index.
>>> for i in range(5):
...     print(i, vowels[i], colors[i])
...
0 a red
1 e yellow
2 i white
3 o blue
4 u purple
List Comprehension
A list comprehension uses for loop syntax between square brackets to create a list. This is a powerful,
efficient way to build lists. The code is concise and runs quickly.
List comprehensions can be thought of as “inverted loops”, meaning that the body of the loop
comes before the looping condition. The following loop and list comprehension produce the same
list, but the list comprehension takes only about two-thirds the time to execute.
>>> loop_output = []
>>> for i in range(5):
...     loop_output.append(i**2)
...
>>> list_output = [i**2 for i in range(5)]
Set and dictionary comprehensions can be done in the same way as list comprehensions
by using curly braces. Parentheses do not create a tuple; they create a generator
expression, which can be passed to tuple() to build a tuple.
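The different kinds of comprehensions are sketched below with arbitrary ranges. A tuple is built here by passing a generator expression to tuple().

```python
squares_list = [i**2 for i in range(5)]
remainders_set = {i % 3 for i in range(10)}        # Curly braces: a set.
squares_dict = {i: i**2 for i in range(3)}         # key: value gives a dict.
squares_tuple = tuple(i**2 for i in range(5))      # Generator passed to tuple().

# A condition at the end filters the items.
even_squares = [i**2 for i in range(10) if i % 2 == 0]
print(squares_list, remainders_set, squares_dict, squares_tuple, even_squares)
```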
Write a function called alt_harmonic() that accepts an integer n. Use a list comprehension
to quickly compute and sum the first n terms of this series (be careful not to sum only n − 1
terms). The sum of the first 500,000 terms of this series approximates ln(2) to five decimal
places.
(Hint: consider using Python’s built-in sum() function.)
Additional Material
Further Reading
Refer back to this and other introductory labs often as you continue getting used to Python syntax
and data types. As you continue your study of Python, we strongly recommend the following readings.
Function Decorators
A function decorator is a special function that “wraps” other functions. It takes in a function as
input and returns a new function that pre-processes the inputs or post-processes the outputs of the
original function.
The outer function, typewriter(), returns the new function wrapper(). Since wrapper()
accepts *args and **kwargs as arguments, the input function func() could accept any number of
positional or keyword arguments.
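A decorator matching this description could be sketched as follows. The body of typewriter() is an assumption inferred from the printed output shown later in this section, and add() is a hypothetical function used only to try it out.

```python
def typewriter(func):
    """Decorator that prints the type of the output a function returns."""
    def wrapper(*args, **kwargs):
        output = func(*args, **kwargs)         # Call the wrapped function.
        print("output type:", type(output))    # Post-process the output.
        return output                          # Pass the output along.
    return wrapper

def add(a, b):
    return a + b

typed_add = typewriter(add)      # Wrap add() by hand.
result = typed_add(1, b=2.5)     # Positional and keyword arguments both pass through.
```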
Apply a decorator to a function by tagging the function’s definition with an @ symbol and the
decorator name.
>>> @typewriter
... def combine(a, b, c):
...     return a*b // c
Placing the tag above the definition is equivalent to adding the following line of code after the
function definition:
>>> combine = typewriter(combine)
Now calling combine() actually calls wrapper(), which then calls the original combine().
>>> combine(3, 4, 6)
output type: <class 'int'>
2
>>> combine(3.0, 4, 6)
output type: <class 'float'>
2.0
Function decorators can also be customized with arguments. This requires another level of
nesting: the outermost function must define and return a decorator that defines and returns a
wrapper.
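The three levels of nesting can be sketched as follows. The decorator repeat() and its times parameter are hypothetical, chosen only to show the structure.

```python
def repeat(times):
    """Outermost function: accept the decorator's arguments."""
    def decorator(func):
        """The actual decorator: accept the function to wrap."""
        def wrapper(*args, **kwargs):
            """Call func() several times and collect the outputs."""
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator

@repeat(times=3)       # Equivalent to square = repeat(times=3)(square).
def square(x):
    return x**2

outputs = square(4)
print(outputs)
```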
Lab Objective: Python is designed to make it easy to implement complex tasks with little code.
To that end, every Python distribution includes several built-in functions for accomplishing common
tasks. In addition, Python is designed to import and reuse code written by others. A Python file with
code that can be imported is called a module. All Python distributions include a collection of modules
for accomplishing a variety of tasks, collectively called the Python Standard Library. In this lab we
explore some built-in functions, learn how to create, import, and use modules, and become familiar
with the standard library.
Built-in Functions
Python has several built-in functions that may be used at any time. IPython’s object introspection
feature makes it easy to learn about these functions: start IPython from the command line and use
? to bring up technical details on each function.
In [1]: min?
Docstring:
min(iterable, *[, default=obj, key=func]) -> value
min(arg1, arg2, *args, *[, key=func]) -> value
In [2]: len?
Signature: len(obj, /)
Docstring: Return the number of items in a container.
Type: builtin_function_or_method
26 Lab 2. The Standard Library
Function   Returns
abs()      The absolute value of a real number, or the magnitude of a complex number.
min()      The smallest element of a single iterable, or the smallest of several arguments. Strings are compared based on lexicographical order: numerical characters first, then upper-case letters, then lower-case letters.
max()      The largest element of a single iterable, or the largest of several arguments.
len()      The number of items of a sequence or collection.
round()    A float rounded to a given precision in decimal digits.
sum()      The sum of a sequence of numbers.
# len() can be used on a string, list, set, dict, tuple, or other iterable.
>>> print(len([2, 7, 1]), len("abcdef"), len({1, 'a', 'a'}))
3 6 2
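The remaining functions in the table can be tried out in the same way. The sketch below uses arbitrary inputs.

```python
values = [3, -7, 2.5, 10]

absolute = abs(-7)                 # Absolute value of a real number.
magnitude = abs(3 + 4j)            # Magnitude of a complex number.
smallest = min(values)             # Smallest element of one iterable...
smallest_arg = min(4, 9, 2)        # ...or of several arguments.
largest = max(values)
first_string = min("b", "A", "1")  # Digits sort before letters.
rounded = round(3.14159, 2)        # Round to two decimal digits.
total = sum(values)

print(absolute, magnitude, smallest, smallest_arg, largest,
      first_string, rounded, total)
```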
Problem 1. Write a function that accepts a list L and returns the minimum, maximum, and
average of the entries of L in that order as multiple values (separated by a comma). Can you
implement this function in one return statement?
Namespaces
Whenever a Python object—a number, data structure, function, or other entity—is created, it is
stored somewhere in computer memory. A name (or variable) is a reference to a Python object, and
a namespace is a dictionary that maps names to Python objects.
A single equals sign assigns a name to an object. If a name is assigned to another name, that
new name refers to the same object as the original name.
To see all of the names in the current namespace, use the built-in function dir(). To delete a
name from the namespace, use the del keyword (with caution!).
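These ideas can be sketched with a small example. The names number1 and number2 are illustrative.

```python
number1 = [1, 2, 3]
number2 = number1                   # A second name for the same object.
same_object = number2 is number1    # 'is' checks object identity.

listed_before = "number2" in dir()  # dir() lists the names in the namespace.
del number2                         # Delete the name (not the object).
listed_after = "number2" in dir()

print(same_object, listed_before, listed_after, number1)
```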
Note
Many programming languages distinguish between variables and pointers. A pointer refers to
a variable by storing the address in memory where the corresponding object is stored. Python
names are essentially pointers, and traditional pointer operations and cleanup are done
automatically. For example, Python automatically deletes objects in memory that have no names
assigned to them (no pointers referring to them). This feature is called garbage collection.
Mutability
Every Python object type falls into one of two categories: a mutable object, which may be altered at
any time, or an immutable object, which cannot be altered once created. Attempting to change an
immutable object creates a new object in memory. If two names refer to the same mutable object, any
changes to the object are reflected in both names since they still both refer to that same object. On
the other hand, if two names refer to the same immutable object and one of the values is “changed,”
then one name will refer to the original object, and the other will refer to a new object in memory.
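The contrast can be sketched with a list (mutable) and a float (immutable). The variable names are illustrative.

```python
# Two names for one mutable object.
list_a = [1, 2, 3]
list_b = list_a
list_b.append(4)                  # Alters the shared object in place.
lists_share = list_a is list_b    # Still the same object.

# Two names for one immutable object.
float_a = 1.5
float_b = float_a
float_b += 1                      # Creates a new object; float_a is untouched.
floats_share = float_a is float_b

print(list_a, list_b, float_a, float_b)
```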
Achtung!
Failing to correctly copy mutable objects can cause subtle problems. For example, consider a
dictionary that maps items to their base prices. To make a similar dictionary that accounts for
a small sales tax, we might try to make a copy by assigning a new name to the first dictionary.
To avoid this problem, explicitly create a copy of the object by casting it as a new structure.
Changes made to the copy will not change the original object, since they are distinct objects
in memory. To fix the above code, replace the second line with the following:
Then, after running the same procedure, the two dictionaries will be different.
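The situation described in this warning might look like the following sketch. The dictionary base_prices, its entries, and the 7% tax rate are hypothetical values chosen for illustration.

```python
base_prices = {"apple": 1.00, "bread": 2.00}

# Mistake: a new name is not a copy, so the original changes too.
aliased = base_prices
for item in aliased:
    aliased[item] = round(1.07 * aliased[item], 2)
original_was_changed = (base_prices["apple"] == 1.07)

# Fix: cast as a new dict to get a distinct object in memory.
base_prices = {"apple": 1.00, "bread": 2.00}
taxed_prices = dict(base_prices)
for item in taxed_prices:
    taxed_prices[item] = round(1.07 * taxed_prices[item], 2)

print(base_prices, taxed_prices)
```

Casting with dict() makes a shallow copy, which is enough here because the values are plain numbers.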
Problem 2. Determine which Python object types are mutable and which are immutable by
repeating the following experiment for an int, str, list, tuple, and set.
3. Alter the object via only one of the names (for tuples, use my_tuple += (1,)).
4. Check to see if the two names are equal. If they are, then changing one name also changes
the other. Thus, both names refer to the same object and the object type is mutable.
Otherwise, the names refer to different objects (meaning a new object was created in
step 3), and therefore the object type is immutable.
For example, the following experiment shows that dict is a mutable type.
Print a statement of your conclusions that clearly indicates which object types are mutable and
which are immutable.
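The dict experiment mentioned in the problem could be sketched as follows, assuming the first two steps are to create an object and then assign a second name to it. The names dict_1 and dict_2 and their contents are illustrative.

```python
dict_1 = {1: "x", 2: "b"}     # Create an object and give it a name.
dict_2 = dict_1               # Assign a second name to the first name.
dict_2[1] = "a"               # Alter the object via only one of the names.

# Check whether the two names are still equal.
names_equal = (dict_1 == dict_2)
print(names_equal, dict_1)
```

Since the names are still equal, the change made through dict_2 is visible through dict_1, which is the behavior of a mutable type.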
Achtung!
Mutable objects cannot be put into Python sets or used as keys in Python dictionaries. However,
the values of a dictionary may be mutable or immutable.
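This restriction shows up as a TypeError, as in the sketch below. The example values are arbitrary.

```python
valid_set = {1, "a", (2, 3)}      # ints, strings, and tuples of these are fine.

try:
    invalid_set = {1, [2, 3]}     # A list is mutable, so this fails.
    list_in_set = True
except TypeError:
    list_in_set = False

try:
    invalid_dict = {[1, 2]: "value"}    # A list cannot be a key either.
    list_as_key = True
except TypeError:
    list_as_key = False

valid_dict = {"key": [1, 2, 3]}   # But a list is fine as a value.
valid_dict["key"].append(4)
print(list_in_set, list_as_key, valid_dict)
```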
Modules
A module is a Python file containing code that is meant to be used in some other setting, and not
necessarily run directly.1 The import statement loads code from a specified Python file. Importing a
module containing some functions, classes, or other objects makes those functions, classes, or objects
available for use by adding their names to the current namespace.
All import statements should occur at the top of the file, below the header but before any other
code. There are several ways to use import:
1. import <module> makes the specified module available under the alias of its own name.
2. import <module> as <name> creates an alias for an imported module. The alias is added to
the current namespace, but the module name itself is not.
>>> import numpy as np # The name 'np' gives access to the numpy
>>> np.sqrt(2) # module, but the name 'numpy' does not.
1.4142135623730951
>>> numpy.sqrt(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'numpy' is not defined
3. from <module> import <object> loads the specified object into the namespace without load-
ing anything else in the module or the module name itself. This is used most often to access
specific functions from a module. The as statement can also be tacked on to create an alias.
>>> from random import randint # The name 'randint' gives access to the
>>> r = randint(0, 10000) # randint() function, but the rest of
>>> random.seed(r) # the random module is unavailable.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'random' is not defined
In each case, the final word of the import statement is the name that is added to the namespace.
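The forms of import can be compared side by side. This sketch uses the standard library's math module, and the alias names m and square_root are arbitrary.

```python
import math                                # Adds the name 'math'.
import math as m                           # Adds the name 'm'.
from math import sqrt                      # Adds the name 'sqrt'.
from math import sqrt as square_root       # Adds the name 'square_root'.

root_1 = math.sqrt(16)
root_2 = m.sqrt(16)
root_3 = sqrt(16)
root_4 = square_root(16)

same_function = sqrt is math.sqrt          # All names refer to one function.
print(root_1, root_2, root_3, root_4, same_function)
```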
# example1.py
data = list(range(4))

def display():
    print("Data:", data)

if __name__ == "__main__":
    display()
    print("This file was executed from the command line or an interpreter.")
else:
    print("This file was imported.")

[1] Python files that are primarily meant to be executed, not imported, are often called scripts.
Executing the file from the command line executes the file line by line, including the code under
the if __name__ == "__main__" clause.
$ python example1.py
Data: [0, 1, 2, 3]
This file was executed from the command line or an interpreter.
Executing the file with IPython’s special %run command executes each line of the file and also
adds the module’s names to the current namespace. This is the quickest way to test individual
functions via IPython.
In [2]: display()
Data: [0, 1, 2, 3]
Importing the file also executes each line,[2] but only adds the indicated alias to the namespace.
Also, code under the if __name__ == "__main__" clause is not executed when a file is imported.
[2] Try importing the this or antigravity modules. Importing these modules actually executes some code.
Problem 3. Create a module called calculator.py. Write a function sum() that returns the
sum of two arguments and a function product() that returns the product of two arguments.
Also use import to add the sqrt() function from the math module to the namespace. When
this file is either run or imported, nothing should be executed.
In your solutions file, import your new custom module. Write a function that accepts two
numbers representing the lengths of the sides of a right triangle. Using only the functions from
calculator.py, calculate and return the length of the hypotenuse of the triangle.
Achtung!
If a module has been imported in IPython and the source code then changes, using import again
does not refresh the name in the IPython namespace. Use %run instead to correctly refresh the
namespace. Consider this example where we test the function sum_of_squares(), saved in the
file example2.py.
# example2.py
def sum_of_squares(x):
    """Return the sum of the squares of all positive integers
    less than or equal to x.
    """
    return sum([i**2 for i in range(1,x)])
In [2]: sum_of_squares(3)
Out[2]: 5 # Should be 14!
Since $1^2 + 2^2 + 3^2 = 14$, not 5, something has gone wrong. Modify the source file to correct
the mistake, then run the file again in IPython.
# example2.py
def sum_of_squares(x):
    """Return the sum of the squares of all positive integers
    less than or equal to x.
    """
    return sum([i**2 for i in range(1,x+1)])    # Include the final term.
Remember that running or importing a file executes any freestanding code snippets, but
any code under an if __name__ == "__main__" clause will only be executed when the file is
run (not when it is imported).
Module Description
cmath Mathematical functions for complex numbers.
itertools Tools for iterating through sequences in useful ways.
math Standard mathematical functions and constants.
random Random variable generators.
string Common string literals.
sys Tools for interacting with the interpreter.
time Time value generation and manipulation.
Use IPython’s object introspection to quickly learn about how to use the various modules and
functions in the standard library. Use ? or help() for information on the module or one of its names.
To see the entire module’s namespace, use the tab key.
In [2]: math?
Type: module
String form: <module 'math' (built-in)>
Docstring:
This module provides access to the mathematical functions
defined by the C standard.
# Type the module name, a period, then press tab to see the module's namespace.
In [3]: math. # Press 'tab'.
acos() cos() factorial() isclose() log2() tan()
acosh() cosh() floor() isfinite() modf() tanh()
asin() degrees() fmod() isinf() nan tau
asinh() e frexp() isnan() pi trunc()
atan() erf() fsum() ldexp() pow()
atan2() erfc() gamma() lgamma() radians()
atanh() exp() gcd() log() sin()
ceil() expm1() hypot() log10() sinh()
In [3]: math.sqrt?
Signature: math.sqrt(x, /)
Docstring: Return the square root of x.
Type: builtin_function_or_method
Function Description
chain() Iterate over several iterables in sequence.
cycle() Iterate over an iterable repeatedly.
combinations() Return successive combinations of elements in an iterable.
permutations() Return successive permutations of elements in an iterable.
product() Iterate over the Cartesian product of several iterables.
>>> from itertools import permutations

# Get all permutations of length 2 from "ABC". Note that order matters here.
>>> list(permutations("ABC", 2))
[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
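The other functions in the table behave along the same lines. The following is a small sketch with made-up inputs, not part of the lab's own examples.

```python
from itertools import chain, combinations, product

pairs = list(combinations("ABC", 2))   # Order does not matter: 3 pairs.
grid = list(product("AB", [0, 1]))     # Every (letter, number) pairing: 4 tuples.
joined = list(chain("AB", [1, 2]))     # One iterable after the other.

print(pairs)    # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(grid)     # [('A', 0), ('A', 1), ('B', 0), ('B', 1)]
print(joined)   # ['A', 'B', 1, 2]
```

Each function returns an iterator, so list() is used here only to display all of the results at once.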
Problem 4. The power set of a set A, denoted P(A) or $2^A$, is the set of all subsets of A,
including the empty set ∅ and A itself. For example, the power set of the set A = {a, b, c} is
$2^A$ = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.
Write a function that accepts an iterable A. Use an itertools function to compute and
return the power set of A as a list of sets (why couldn’t it be a set of sets in Python?). The
empty set should be returned as set().
Many real-life events can be simulated by taking random samples from a probability distribution.
For example, a coin flip can be simulated by randomly choosing between the integers 1 (for heads)
and 0 (for tails). The random module includes functions for sampling from probability distributions
and generating random data.
Function Description
choice() Choose a random element from a non-empty sequence, such as a list.
randint() Choose a random integer over a closed interval.
random() Pick a float from the interval [0, 1).
sample() Choose several unique random elements from a non-empty sequence.
seed() Seed the random number generator.
shuffle() Randomize the ordering of the elements in a list.
Some of the most common random utilities involve picking random elements from iterables.
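As an illustration of the functions in the table, here is a minimal sketch. The list of numbers and the seed value are arbitrary choices made for this example.

```python
import random

random.seed(42)  # Seeding makes a run repeatable; 42 is an arbitrary choice.

numbers = list(range(1, 10))
one = random.choice(numbers)        # A single element of the list.
three = random.sample(numbers, 3)   # Three distinct elements.
roll = random.randint(1, 6)         # Both endpoints, 1 and 6, are possible.

shuffled = numbers[:]               # Copy first, since shuffle() works in place.
random.shuffle(shuffled)

print(one, three, roll, shuffled)
```

Note that shuffle() returns None and reorders its argument, which is why the sketch shuffles a copy rather than the original list.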
The time() function is useful for measuring how long it takes for code to run: record the time
just before and just after the code in question, then subtract the first measurement from the second
to get the number of seconds that have passed.
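The timing pattern just described might look like the following sketch. The helper name time_call() is hypothetical and is not part of the time module.

```python
import time

def time_call(func, *args):
    """Return the result of func(*args) along with the elapsed seconds."""
    start = time.time()                 # Record the time just before.
    result = func(*args)
    elapsed = time.time() - start       # Subtract the first measurement.
    return result, elapsed

total, seconds = time_call(sum, range(10**6))
print(total, round(seconds, 4))
```

The measured time varies from run to run, so it is usually worth timing a piece of code more than once.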
Now we can execute code based on the command line arguments by using the
sys module in our .py file:
# example3.py
"""If there are two command line arguments after the .py file name, print a
descriptive statement."""
import sys
Note that the first command line argument is always the filename, so that is always the first
element of the sys.argv list. This is why we have the if statement execute if the length of the list
is 3 even though we want it to execute when there are 2 command line arguments. Also, sys.argv
is always a list of strings. If a number is provided on the command line, it is converted to a string
when it is stored in sys.argv. In IPython, command line arguments are specified after the %run
command.
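The length check described above can be sketched as follows. This is an assumption about how such a script might be organized, not the lab's example3.py; the function describe() and its messages are hypothetical.

```python
import sys

def describe(argv):
    """Return a message describing the command line arguments in argv."""
    if len(argv) == 3:                  # The file name plus two arguments.
        return "First: {}; second: {}".format(argv[1], argv[2])
    return "Expected two arguments, got {}".format(len(argv) - 1)

if __name__ == "__main__":
    print(describe(sys.argv))
```

Because every entry of sys.argv is a string, an argument that represents a number must be converted explicitly, for example with int(argv[2]).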
Another way to get input from the program user is to prompt the user for text. The built-in
function input() pauses the program and waits for the user to type something. Like command line
arguments, the user’s input is parsed as a string.
>>> x
'20' # Note that x contains a string.
>>> y
16 # Note that y contains an integer.
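Since input() always returns a string, numeric input must be converted before it is used in arithmetic. The following sketch shows one way to keep prompting until the conversion succeeds. The function read_integer() and its reader parameter are hypothetical; reader exists only so that the function can be tried without typing at the keyboard.

```python
def read_integer(prompt, reader=input):
    """Prompt repeatedly until the reply can be converted to an integer."""
    while True:
        reply = reader(prompt)          # Always a string, such as '20'.
        try:
            return int(reply)           # Convert explicitly to get a number.
        except ValueError:
            print("Please enter a whole number.")

# Simulate a user who first types 'abc' and then '16'.
replies = iter(["abc", "16"])
y = read_integer("Enter an integer for y: ", reader=lambda prompt: next(replies))
print(y)
```

Calling read_integer("Enter an integer for y: ") with no second argument prompts at the keyboard in the usual way.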
Problem 5. Shut the box is a popular British pub game that is used to help children learn
arithmetic. The player starts with the numbers 1 through 9, and the goal of the game is
to eliminate as many of these numbers as possible. At each turn the player rolls two dice,
then chooses a set of integers from the remaining numbers that sum up to the sum of the
dice roll. These numbers are removed, and the dice are then rolled again. If the sum of the
remaining numbers is 6 or less, then only one die is rolled. The game ends when none of the
remaining integers can be combined to the sum of the dice roll, and the player’s final score
is the sum of the numbers that could not be eliminated. For a demonstration, see
https://www.youtube.com/watch?v=mwURQC7mjDI.
Modify your solutions file so that when the file is run with the correct command line
arguments (but not when it is imported), the user plays a game of shut the box. The provided
module box.py contains two functions that will be useful in your implementation of the game.
You do not need to understand exactly how the functions work, but you do need to be able to
import and use them correctly. Their functionality is outlined at the beginning of each function
declaration. Your game should match the following specifications:
• Require three total command line arguments: the file name (included by default), the
player’s name, and a time limit in seconds. If there are not exactly three command line
arguments, do not start the game.
• Use the random module to simulate rolling two six-sided dice. However, if the sum of the
player’s remaining numbers is 6 or less, roll only one die.
• The player wins if they have no numbers left, and they lose if they are out of time or if
they cannot choose numbers to match the dice roll.
• If the game is not over, print the player’s remaining numbers, the sum of the dice roll,
and the number of seconds remaining. Prompt the user for numbers to eliminate. The
input should be one or more of the remaining integers, separated by spaces. If the user’s
input is invalid, prompt them for input again before rolling the dice again.
(Hint: use round() to format the number of seconds remaining nicely.)
• When the game is over, display the player’s name, their score, and the total number of
seconds since the beginning of the game. Congratulate or mock the player appropriately.
(Hint: Before you start coding, write an outline for the entire program, adding one feature
at a time. Only start implementing the game after you are completely finished designing it.)
Your game should look similar to the following examples. The characters in red are typed
inputs from the user.
The next two examples show different ways that a player could lose (which they usually
do), as well as examples of invalid user input. Use the box module’s parse_input() to detect
invalid input.
Additional Material
More Built-in Functions
The following built-in functions are worth knowing, especially for working with iterables and writing
very readable conditional statements.
Function Description
all() Return True if bool(entry) evaluates to True for every entry in the input iterable.
any() Return True if bool(entry) evaluates to True for any entry in the input iterable.
bool() Evaluate a single input object as True or False.
eval() Execute a string as Python code and return the output.
map() Apply a function to every item of the input iterable and return an iterable of the results.
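A short sketch of how these functions read in practice, using a made-up list of values:

```python
values = [3, 0, -2, 7]

every = all(values)                         # False, because 0 evaluates to False.
some = any(values)                          # True, because some entries are nonzero.
bounded = all(v > -5 for v in values)       # True, every entry exceeds -5.
squares = list(map(lambda v: v**2, values))

print(every, some, bounded)     # False True True
print(squares)                  # [9, 0, 4, 49]
```

Passing a generator expression to all() or any(), as in the third line, often replaces a loop with a flag variable.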
Python Packages
Large programming projects often have code spread throughout several folders and files. In order
to get related files in different folders to communicate properly, the associated directories must be
organized into Python packages.
A package is simply a folder that contains a file called __init__.py. This file is always executed
first whenever the package is used. A package must also have a file called __main__.py in order to
be executable. Executing the package will run __init__.py and then __main__.py, but importing
the package will only run __init__.py.
Use the regular syntax to import a module or subpackage that is in the current package, and
use from <subpackage.module> import <object> to load a module within a subpackage. Once a
name has been loaded into a package’s __init__.py, other files in the same package can load the
same name with from . import <object>. To access code in the directory one level above the
current directory, use the syntax from .. import <object>. This tells the interpreter to go up one
level and import the object from there. This is called an explicit relative import and cannot be done
in files that are executed directly (like __main__.py).
Finally, to execute a package, run Python from the shell with the flag -m (for “module-name”)
and exclude the extension .py.
$ python -m package_name
Lab Objective: NumPy is a powerful Python package for manipulating data with multi-dimensional
vectors. Its versatility and speed make Python an ideal language for applied and computational
mathematics. In this lab we introduce basic NumPy data structures and operations as a first step to
numerical computing in Python.
Arrays
In many algorithms, data can be represented mathematically as a vector or a matrix. Conceptually,
a vector is just a list of numbers and a matrix is a two-dimensional list of numbers (a list of lists).
However, even basic linear algebra operations like matrix multiplication are cumbersome to implement
and slow to execute when data is stored this way. The NumPy module [Oli06, ADH+01, Oli07] offers
a much better solution.
The basic object in NumPy is the array, which is conceptually similar to a matrix. The NumPy
array class is called ndarray (for “n-dimensional array”). The simplest way to explicitly create a 1-D
ndarray is to define a list, then cast that list as an ndarray with NumPy’s array() function.
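For instance, the casting step might look like the following sketch. The entries are arbitrary.

```python
import numpy as np

x = np.array([8, 4, 8, 7])              # A 1-D array from a list.
A = np.array([[1, 2, 3], [4, 5, 6]])    # A 2-D array from a list of lists.

print(type(x))
print(x)
print(A)
```

A list of lists produces a 2-D array only when every inner list has the same length.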
Lab 3. Introduction to NumPy
Problem 1. There are two main ways to perform matrix multiplication in NumPy: with
NumPy’s dot() function (np.dot(A, B)), or with the @ operator (A @ B). Write a function
that defines the following matrices as NumPy arrays.
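\[
A = \begin{bmatrix} 3 & -1 & 4 \\ 1 & 5 & -9 \end{bmatrix}
\qquad
B = \begin{bmatrix} 2 & 6 & -5 & 3 \\ 5 & -8 & 9 & 7 \\ 9 & -3 & -2 & -3 \end{bmatrix}
\]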
Achtung!
The @ operator was not introduced until Python 3.5. It triggers the __matmul__() magic
method,[a] which for the ndarray is essentially a wrapper around np.dot(). If you are using a
previous version of Python, always use np.dot() to perform basic matrix multiplication.
[a] See the lab on Object Oriented Programming for an overview of magic methods.
NumPy arrays act like mathematical vectors and matrices: + and * perform component-wise
addition or multiplication.
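The contrast with Python lists is easiest to see side by side. A minimal sketch with arbitrary entries:

```python
import numpy as np

x_list = [1, 2, 3]
x = np.array(x_list)

print(x_list + x_list)   # Lists concatenate: [1, 2, 3, 1, 2, 3]
print(x_list * 2)        # Lists repeat: [1, 2, 3, 1, 2, 3]

total = x + x            # Arrays add entry by entry: 2, 4, 6.
squared = x * x          # Arrays multiply entry by entry: 1, 4, 9.
inner = x @ x            # The @ operator gives the dot product: 14.
print(total, squared, inner)
```

In particular, * on two 2-D arrays is not matrix multiplication; use @ or np.dot() for that.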
Problem 2. Write a function that defines the following matrix as a NumPy array.
\[
A = \begin{bmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ -5 & 3 & 1 \end{bmatrix}
\]
Array Attributes
An ndarray object has several attributes, some of which are listed below.
Attribute Description
dtype The type of the elements in the array.
ndim The number of axes (dimensions) of the array.
shape A tuple of integers indicating the size in each dimension.
size The total number of elements in the array.
Note that ndim is the number of entries in shape, and that the size of the array is the product
of the entries of shape.
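These relationships can be checked directly. The array below is an arbitrary example.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])

print(A.ndim)    # 2
print(A.shape)   # (2, 3)
print(A.size)    # 6
print(A.dtype)   # An integer type; the exact name depends on the platform.

# ndim is the length of shape, and size is the product of its entries.
same_ndim = (A.ndim == len(A.shape))
same_size = (A.size == A.shape[0] * A.shape[1])
print(same_ndim, same_size)
```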
Function Returns
arange() Array of sequential integers (like list(range())).
eye() 2-D array with ones on the diagonal and zeros elsewhere.
ones() Array of given shape and type, filled with ones.
ones_like() Array of ones with the same shape and type as a given array.
zeros() Array of given shape and type, filled with zeros.
zeros_like() Array of zeros with the same shape and type as a given array.
full() Array of given shape and type, filled with a specified value.
full_like() Full array with the same shape and type as a given array.
Each of these functions accepts the keyword argument dtype to specify the data type. Common
types include np.bool_, np.int64, np.float64, and np.complex128.
Unlike native Python data structures, all elements of a NumPy array must be of the
same data type. To change an existing array’s data type, use the array’s astype() method.
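The following sketch shows the dtype keyword and astype() together. The shapes and values are arbitrary.

```python
import numpy as np

Z = np.zeros((2, 3), dtype=np.int64)    # Integer zeros instead of the default floats.
F = np.full_like(Z, 7)                  # Same shape and type as Z, filled with 7.
x = np.array([0, 1.5, 3])               # Mixed input is stored as floats.
y = x.astype(np.int64)                  # A new array; x itself is unchanged.

print(Z.dtype, F.dtype, x.dtype, y.dtype)
print(y)
```

Converting floats to integers with astype() discards the fractional part rather than rounding, so 1.5 becomes 1 here.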
The following functions are for dealing with the diagonal, upper, or lower portion of an array.
Function Description
diag() Extract a diagonal or construct a diagonal array.
tril() Get the lower-triangular portion of an array by replacing entries above the diagonal with zeros.
triu() Get the upper-triangular portion of an array by replacing entries below the diagonal with zeros.
# diag() can also be used to create a diagonal matrix from a 1-D array.
>>> np.diag([1, 11, 111])
array([[ 1, 0, 0],
[ 0, 11, 0],
[ 0, 0, 111]])
Problem 3. Write a function that defines the following matrices as NumPy arrays using the
functions presented in this section (not np.array()). Calculate the matrix product ABA.
Change the data type of the resulting matrix to np.int64, then return it.
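\[
A = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\qquad
B = \begin{bmatrix}
-1 & 5 & 5 & 5 & 5 & 5 & 5 \\
-1 & -1 & 5 & 5 & 5 & 5 & 5 \\
-1 & -1 & -1 & 5 & 5 & 5 & 5 \\
-1 & -1 & -1 & -1 & 5 & 5 & 5 \\
-1 & -1 & -1 & -1 & -1 & 5 & 5 \\
-1 & -1 & -1 & -1 & -1 & -1 & 5 \\
-1 & -1 & -1 & -1 & -1 & -1 & -1
\end{bmatrix}
\]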
Data Access
Array Slicing
Indexing for a 1-D NumPy array uses the slicing syntax x[start:stop:step]. If there is no colon,
a single entry of that dimension is accessed. With a colon, a range of values is accessed. For multi-
dimensional arrays, use a comma to separate slicing syntax for each axis.
>>> A = np.array([[0,1,2,3,4],[5,6,7,8,9]])
>>> A
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
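A few slices of this array illustrate the syntax. The particular slices below are chosen only for illustration.

```python
import numpy as np

A = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])

entry = A[1, 2]         # Row 1, column 2: the single entry 7.
right = A[:, 1:]        # Every row, all columns after the first.
evens = A[0, ::2]       # Row 0, every other column: 0, 2, 4.

print(entry)
print(right)
print(evens)
```

Omitting start, stop, or step uses the default for that position, so A[:, 1:] means "all rows, columns 1 through the end."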
Note
Indexing and slicing operations return a view of the array. Changing a view of an array also
changes the original array. In other words, arrays are mutable. To create a copy of an array,
use np.copy() or the array’s copy() method. Changes to a copy of an array do not affect
the original array, but copying an array uses more time and memory than getting a view.
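The difference between a view and a copy shows up as soon as an entry is changed. A minimal sketch with arbitrary values:

```python
import numpy as np

x = np.arange(5)
view = x[:3]                # Slicing gives a view of x.
view[0] = 100               # This also changes x[0].

y = np.arange(5)
duplicate = y[:3].copy()    # An independent copy of the slice.
duplicate[0] = 100          # y is unaffected.

print(x[0], y[0])           # 100 0
```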
Fancy Indexing
So-called fancy indexing is a second way to access or change the elements of an array. Instead of
using slicing syntax, provide either an array of indices or an array of boolean values (called a mask )
to extract specific elements.
>>> x = np.arange(0, 50, 10)            # The integers from 0 to 50 by tens.

# A boolean array extracts the elements of 'x' at the same places as 'True'.
>>> mask = np.array([True, False, False, True, False])
>>> x[mask]                             # Get the 0th and 3rd entries.
array([ 0, 30])
Fancy indexing is especially useful for extracting or changing the values of an array that meet
some sort of criterion. Use comparison operators like < and == to create masks.
While indexing and slicing always return a view, fancy indexing always returns a copy.
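A mask built from a comparison can be used both to read and to assign entries. The sketch below uses arbitrary values and an arbitrary threshold.

```python
import numpy as np

y = np.arange(10, 20, 2)    # The entries are 10, 12, 14, 16, 18.
mask = y > 15               # Boolean array, True where the entry exceeds 15.

large = y[mask]             # The entries 16 and 18, returned as a copy.
y[mask] = 100               # Assigning through a mask changes y itself.

picked = y[[0, 1]]          # Indexing with a list of indices also gives a copy.
picked[0] = -1              # y[0] is still 10.

print(mask)
print(large)
print(y)
```

Reading with a mask produces a copy, but assigning with a mask on the left-hand side modifies the original array in place.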
Problem 4. Write a function that accepts a single array as input. Make a copy of the array,
then use fancy indexing to set all negative entries of the copy to 0. Return the resulting array.
Array Manipulation
Shaping
An array’s shape attribute describes its dimensions. Use np.reshape() or the array’s reshape()
method to give an array a new shape. The total number of entries in the old array and the new
array must be the same in order for the shaping to work correctly. Using a -1 in the new shape tuple
makes the specified dimension as long as necessary.
>>> A = np.arange(12)                   # The integers from 0 to 12 (exclusive).

# Reshape 'A' into an array with 2 rows and the appropriate number of columns.
>>> A.reshape((2,-1))
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11]])
Use np.ravel() to flatten a multi-dimensional array into a 1-D array and np.transpose() or
the T attribute to transpose a 2-D array in the matrix sense.
>>> A = np.arange(12).reshape((3,4))
>>> A
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Note
By default, all NumPy arrays that can be represented by a single dimension, including column
slices, are automatically reshaped into “flat” 1-D arrays. For example, by default an array will
have 10 elements instead of 10 arrays with one element each. Though we usually represent
vectors vertically in mathematical notation, NumPy methods such as dot() are implemented
to purposefully work well with 1-D “row arrays”.
>>> A = np.arange(10).reshape((2,5))
>>> A
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
However, it is occasionally necessary to change a 1-D array into a “column array”. Use
np.reshape(), np.vstack(), or slice the array and put np.newaxis on the second axis. Note
that np.transpose() does not alter 1-D arrays.
>>> x = np.arange(3)
>>> x
array([0, 1, 2])
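The three approaches mentioned above can be compared directly. The variable names in this sketch are arbitrary.

```python
import numpy as np

x = np.arange(3)

col1 = x.reshape((-1, 1))   # -1 lets NumPy infer the number of rows.
col2 = np.vstack(x)         # Stacks the entries of x vertically.
col3 = x[:, np.newaxis]     # Adds a new second axis.

print(col1.shape, col2.shape, col3.shape)
print(x.T.shape)            # Transposing a 1-D array changes nothing.
```

All three results have shape (3, 1), while x.T still has shape (3,).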
Stacking
NumPy has functions for stacking two or more arrays with similar dimensions into a single block
matrix. Each of these functions takes a single tuple of arrays to be stacked in sequence.
Function Description
concatenate() Join a sequence of arrays along an existing axis
hstack() Stack arrays in sequence horizontally (column wise).
vstack() Stack arrays in sequence vertically (row wise).
column_stack() Stack 1-D arrays as columns into a 2-D array.
>>> A = np.arange(6).reshape((2,3))
>>> B = np.zeros((4,3))
>>> A = A.T
>>> B = np.ones((3,4))
See http://docs.scipy.org/doc/numpy-1.10.1/reference/routines.array-manipulation.html
for more array manipulation routines and documentation.
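The stacking functions are easiest to follow by tracking shapes. The sketch below reuses the array shapes from the fragments above; the names V, H, and S are arbitrary.

```python
import numpy as np

A = np.arange(6).reshape((2, 3))
B = np.zeros((4, 3))
V = np.vstack((A, B, A))        # Row counts add: 2 + 4 + 2 = 8 rows, 3 columns.

C = A.T                         # Shape (3, 2).
D = np.ones((3, 4))
H = np.hstack((C, D, C))        # Column counts add: 2 + 4 + 2 = 8 columns, 3 rows.

x = np.arange(3)
S = np.column_stack((x, 2 * x)) # Two 1-D arrays become the columns of S.

print(V.shape, H.shape, S.shape)
```

vstack() requires the arrays to agree in the number of columns, and hstack() requires them to agree in the number of rows.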
Problem 5. Write a function that defines the following matrices as NumPy arrays.
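\[
A = \begin{bmatrix} 0 & 2 & 4 \\ 1 & 3 & 5 \end{bmatrix}
\qquad
B = \begin{bmatrix} 3 & 0 & 0 \\ 3 & 3 & 0 \\ 3 & 3 & 3 \end{bmatrix}
\qquad
C = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix}
\]
Use NumPy’s stacking functions to create and return the block matrix:
\[
\begin{bmatrix} 0 & A^T & I \\ A & 0 & 0 \\ B & 0 & C \end{bmatrix},
\]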
where I is the 3 × 3 identity matrix and each 0 is a matrix of all zeros of appropriate size.
A block matrix of this form is used in the interior point method for linear optimization.
Array Broadcasting
Many matrix operations make sense only when the two operands have the same shape, such as
element-wise addition. Array broadcasting extends such operations to accept some (but not all)
operands with different shapes, and occurs automatically whenever possible.
Suppose, for example, that we would like to add different values to the columns of an m × n
matrix A. Adding a 1-D array x with n entries to A will automatically do this correctly. To add
different values to the different rows of A, first reshape a 1-D array of m values into a column array.
Broadcasting then correctly takes care of the operation.
Broadcasting can also occur between two 1-D arrays, once they are reshaped appropriately.
>>> A = np.arange(12).reshape((4,3))
>>> x = np.arange(3)
>>> A
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
>>> x
array([0, 1, 2])
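With these two arrays, the cases described above might be sketched as follows. The names by_column, by_row, and outer_sum are arbitrary.

```python
import numpy as np

A = np.arange(12).reshape((4, 3))
x = np.arange(3)

by_column = A + x                   # x[j] is added to every entry of column j.
y = np.arange(4).reshape((-1, 1))   # A column array with one value per row.
by_row = A + y                      # y[i] is added to every entry of row i.

# Two 1-D arrays: a (4, 1) column plus a length-3 array gives a (4, 3) array.
outer_sum = y + x

print(by_column)
print(by_row)
print(outer_sum)
```

Adding a 1-D array with 4 entries directly to A raises a ValueError, since the shapes (4, 3) and (4,) cannot be broadcast together. That is why y is reshaped into a column first.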
Function Description
abs() or absolute() Calculate the absolute value element-wise.
exp() / log() Exponential ($e^x$) / natural log element-wise.
maximum() / minimum() Element-wise maximum / minimum of two arrays.
sqrt() The positive square-root, element-wise.
sin(), cos(), tan(), etc. Element-wise trigonometric operations.
>>> x = np.arange(-2,3)
>>> print(x, np.abs(x)) # Like np.array([abs(i) for i in x]).
[-2 -1 0 1 2] [2 1 0 1 2]
Achtung!
The math module has many useful functions for numerical computations. However, most of
these functions can only act on single numbers, not on arrays. NumPy functions can act on
either scalars or entire arrays, but math functions tend to be a little faster for acting on scalars.
Always use universal NumPy functions, not the math module, when working with arrays.
The np.ndarray class itself has many useful methods for numerical computations.
Method Returns
all() True if all elements evaluate to True.
any() True if any elements evaluate to True.
argmax() Index of the maximum value.
argmin() Index of the minimum value.
argsort() Indices that would sort the array.
clip() Array with values restricted to fit within a given range.
max() The maximum element of the array.
mean() The average value of the array.
min() The minimum element of the array.
sort() Return nothing; sort the array in-place.
std() The standard deviation of the array.
sum() The sum of the elements of the array.
var() The variance of the array.
Each of these np.ndarray methods has an equivalent NumPy function. For example, A.max()
and np.max(A) operate the same way. The one exception is the sort() function: np.sort() returns
a sorted copy of the array, while A.sort() sorts the array in-place and returns nothing.
Most of the methods listed (all but clip()) can operate along an axis via the keyword argument axis.
If axis is specified for a reduction such as max() or sum() on an n-D array, the return value is an
(n − 1)-D array, the specified axis having been collapsed in the evaluation process. If axis is not
specified, the return value is usually a scalar. Refer to the NumPy Visual Guide in the appendix for
more visual examples.
>>> A = np.arange(9).reshape((3,3))
>>> A
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
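Applying a few of these methods to this array shows how the axis argument collapses one dimension. The choice of methods is arbitrary.

```python
import numpy as np

A = np.arange(9).reshape((3, 3))

overall = A.max()           # 8, the largest entry overall.
col_max = A.max(axis=0)     # Largest entry of each column: 6, 7, 8.
row_sum = A.sum(axis=1)     # Sum of each row: 3, 12, 21.
row_avg = A.mean(axis=1)    # Average of each row: 1, 4, 7.

print(overall, col_max, row_sum, row_avg)
```

With axis=0 the rows are collapsed, leaving one result per column; with axis=1 the columns are collapsed, leaving one result per row.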
Problem 6. A matrix is called row-stochastic[a] if its rows each sum to 1. Stochastic matrices
are fundamentally important for finite discrete random processes and some machine learning
algorithms.
Write a function that accepts a matrix (as a 2-D NumPy array). Divide each row of the
matrix by the row sum and return the new row-stochastic matrix. Use array broadcasting and
the axis argument instead of a loop.
[a] Similarly, a matrix is called column-stochastic if its columns each sum to 1.
08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48
One way to approach this problem is to iterate through the rows and columns of the array,
checking small slices of the array at each iteration and updating the current largest product.
Array slicing, however, provides a much more efficient solution.
The naïve method for computing the greatest product of four adjacent numbers in a
horizontal row might be as follows:
>>> winner = 0
>>> for i in range(20):
...     for j in range(17):
...         winner = max(np.prod(grid[i,j:j+4]), winner)
...
>>> winner
48477312
Instead, use array slicing to construct a single array where the (i, j)th entry is the product
of the four numbers to the right of the (i, j)th entry in the original grid. Then find the largest
element in the new array.
Use slicing to similarly find the greatest products of four vertical, right diagonal, and left
diagonal adjacent numbers.
(Hint: Consider drawing the portions of the grid that each slice in the above code covers, like
the examples in the visual guide. Then draw the slices that produce vertical, right diagonal, or
left diagonal sequences, and translate the pictures into slicing syntax.)
Achtung!
All of the examples in this lab use NumPy arrays, objects of type np.ndarray. NumPy also
has a “matrix” data structure called np.matrix that was built specifically for MATLAB users
who are transitioning to Python and NumPy. It behaves slightly differently than the regular
array class, and can cause some unexpected and subtle problems.
For consistency (and your sanity), never use a NumPy matrix; always use NumPy arrays.
If necessary, cast a matrix object as an array with np.array().
Additional Material
Random Sampling
The submodule np.random holds many functions for creating arrays of random values chosen from
probability distributions such as the uniform, normal, and multinomial distributions. It also contains
some utility functions for getting non-distributional random samples, such as random integers or
random samples from a given array.
Function Description
choice() Take random samples from a 1-D array.
random() Uniformly distributed floats over [0, 1).
randint() Random integers over a half-open interval.
randn() Sample from the standard normal distribution.
permutation() Randomly permute a sequence / generate a random sequence.
Function Distribution
beta() Beta distribution over [0, 1].
binomial() Binomial distribution.
exponential() Exponential distribution.
gamma() Gamma distribution.
geometric() Geometric distribution.
multinomial() Multivariate generalization of the binomial distribution.
multivariate_normal() Multivariate generalization of the normal distribution.
normal() Normal / Gaussian distribution.
poisson() Poisson distribution.
uniform() Uniform distribution.
Note that many of these functions have counterparts in the standard library’s random module.
These NumPy functions, however, are much better suited for working with large collections of random
samples.
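A brief sketch of a few of these functions follows. The seed, sizes, and distribution parameters are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)    # Optional: fixes the sequence of draws; 0 is arbitrary.

u = np.random.random(5)                             # Five floats in [0, 1).
k = np.random.randint(1, 7, size=(2, 3))            # Integers from 1 to 6; 7 is excluded.
z = np.random.normal(loc=10, scale=2, size=1000)    # Mean 10, standard deviation 2.
c = np.random.choice(np.arange(10), size=4, replace=False)  # Four distinct picks.

print(u.shape, k.shape, round(z.mean(), 2), c)
```

Unlike random.randint() in the standard library, np.random.randint() excludes the upper endpoint, which is an easy detail to trip over when switching between the two modules.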
It is often useful to save an array as a file for later use. NumPy provides several easy methods for
saving and loading array data.
Function Description
save() Save a single array to a .npy file.
savez() Save multiple arrays to a .npz file.
savetxt() Save a single array to a .txt file.
load() Load and return an array or arrays from a .npy or .npz file.
loadtxt() Load and return an array from a text file.
# Read the array from the file and check that it matches the original.
>>> y = np.load("uniform.npy") # Or np.loadtxt("uniform.txt").
>>> np.allclose(x, y) # Check that x and y are close entry-wise.
True
To save several arrays to a single file, specify a keyword argument for each array in np.savez().
Then np.load() will return a dictionary-like object with the keyword parameter names from the
save command as the keys.
# Read the arrays from the file and check that they match the original.
>>> arrays = np.load("normal.npz")
>>> np.allclose(x, arrays["first"])
True
>>> np.allclose(y, arrays["second"])
True
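A complete round trip, including the save step, might look like the sketch below. It writes to a temporary folder so that no files are left behind; the file names and array contents are arbitrary.

```python
import os
import tempfile
import numpy as np

x = np.random.random((3, 4))
y = np.random.normal(size=5)

with tempfile.TemporaryDirectory() as folder:
    single = os.path.join(folder, "uniform.npy")
    several = os.path.join(folder, "both.npz")

    np.save(single, x)                      # One array per .npy file.
    np.savez(several, first=x, second=y)    # The keywords become the keys.

    x_loaded = np.load(single)
    with np.load(several) as arrays:        # A dictionary-like object.
        first, second = arrays["first"], arrays["second"]

print(np.allclose(x, x_loaded), np.allclose(x, first), np.allclose(y, second))
```

Opening the .npz file in a with block closes it once the arrays have been read, which matters here because the temporary folder is deleted immediately afterward.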
4 Object-oriented Programming
Lab Objective: Python is a class-based language. A class is a blueprint for an object that binds
together specified variables and routines. Creating and using custom classes is often a good way to
write clean, efficient, well-designed programs. In this lab we learn how to define and use Python
classes. In subsequent labs we will often create customized classes for use in algorithms.
Classes
A Python class is a code block that defines a custom object and determines its behavior. The class
keyword defines and names a new class. Other statements follow, indented below the class name, to
determine the behavior of objects instantiated by the class.
A class needs a method called a constructor that is called whenever the class instantiates a new
object. The constructor specifies the initial state of the object. In Python, a class’s constructor is
always named __init__(). For example, the following code defines a class for storing information
about backpacks.
class Backpack:
    """A Backpack object class. Has a name and a list of contents.

    Attributes:
        name (str): the name of the backpack's owner.
        contents (list): the contents of the backpack.
    """
    def __init__(self, name):           # This function is the constructor.
        """Set the name and initialize an empty list of contents.

        Parameters:
            name (str): the name of the backpack's owner.
        """
        self.name = name                # Initialize some attributes.
        self.contents = []
An attribute is a variable stored within an object. The Backpack class has two attributes:
name and contents. In the body of the class definition, attributes are assigned and accessed via
the identifier self. The identifier self refers to the object internally once it has been created. In
the previous example, the line self.name = name stores the input argument name to the attribute
self.name.
Instantiation
The class code block above only defines a blueprint for backpack objects. To create an actual
backpack object, call the class name like a function. This triggers the constructor and returns a new
instance of the class, an object whose type is the class.
# Instantiate an object called 'my_backpack'.
>>> my_backpack = Backpack("Fred")

# Access the object's attributes with a period and the attribute name.
>>> print(my_backpack.name, my_backpack.contents)
Fred []
Note
Every object in Python has some built-in attributes. For example, modules have a __name__
attribute that identifies the scope in which it is being executed. If the module is being run
directly, and not imported, then __name__ is set to "__main__". Therefore, any commands
under an if __name__ == "__main__": clause are ignored when the module is imported.
Methods
In addition to storing variables as attributes, classes can have functions attached to them. A function
that belongs to a specific class is called a method.
class Backpack:
    # ...
    def put(self, item):
        """Add an item to the backpack's list of contents."""
        self.contents.append(item)  # Use 'self.contents', not just 'contents'.

    def take(self, item):
        """Remove an item from the backpack's list of contents."""
        self.contents.remove(item)
The first argument of each method must be self, to give the method access to the attributes
and other methods of the class. The self argument is only included in the declaration of the class
methods, not when calling the methods on an instantiation of the class.
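The rule about self can be checked with a short sketch. The class below is a trimmed stand-in for the Backpack class (an assumption for illustration, not the full lab solution); it shows that calling a method on an instance passes the instance in as self automatically.

```python
class Backpack:
    """Minimal stand-in for the Backpack class described above."""
    def __init__(self, name):
        self.name = name
        self.contents = []

    def put(self, item):
        """Add an item to the backpack's list of contents."""
        self.contents.append(item)

    def take(self, item):
        """Remove an item from the backpack's list of contents."""
        self.contents.remove(item)

my_backpack = Backpack("Fred")

# 'self' is supplied automatically: my_backpack.put(x) means Backpack.put(my_backpack, x).
my_backpack.put("notebook")
my_backpack.put("pencils")
Backpack.put(my_backpack, "calculator")     # Equivalent, but rarely written this way.
my_backpack.take("pencils")
print(my_backpack.contents)                 # ['notebook', 'calculator']
```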
1. Modify the constructor so that it accepts three total arguments: name, color, and
max_size (in that order). Make max_size a keyword argument that defaults to 5. Store
each input as an attribute.
2. Modify the put() method to check that the backpack does not go over capacity. If there
are already max_size items or more, print “No Room!” and do not add the item to the
contents list.
3. Write a new method called dump() that resets the contents of the backpack to an empty
list. This method should not receive any arguments (except self).
4. Documentation is especially important in classes so that the user knows what an
object’s attributes represent and how to use methods appropriately. Update (or write) the
docstrings for the __init__(), put(), and dump() methods, as well as the actual class
docstring (under class but before __init__()) to reflect the changes from parts 1-3 of
this problem.
To ensure that your class works properly, write a test function outside of the Backpack
class that instantiates and analyzes a Backpack object.
def test_backpack():
    testpack = Backpack("Barry", "black")       # Instantiate the object.
    if testpack.name != "Barry":                # Test an attribute.
        print("Backpack.name assigned incorrectly")
    for item in ["pencil", "pen", "paper", "computer"]:
        testpack.put(item)                      # Test a method.
    print("Contents:", testpack.contents)
    # ...
Inheritance
To create a new class that is similar to one that already exists, it is often better to inherit the methods
and attributes from an existing class rather than create a new class from scratch. This creates a
class hierarchy: a class that inherits from another class is called a subclass, and the class that a
subclass inherits from is called a superclass. To define a subclass, add the name of the superclass as
an argument at the end of the class declaration.
For example, since a knapsack is a kind of backpack (but not all backpacks are knapsacks), we
create a special Knapsack subclass that inherits the structure and behaviors of the Backpack class
and adds some extra functionality.
class Knapsack(Backpack):
    """A Knapsack object class. Inherits from the Backpack class.

    Attributes:
        name (str): the name of the knapsack's owner.
        color (str): the color of the knapsack.
        max_size (int): the maximum number of items that can fit inside.
        contents (list): the contents of the backpack.
        closed (bool): whether or not the knapsack is tied shut.
    """
    def __init__(self, name, color, max_size=3):
        """Use the Backpack constructor to initialize the name, color,
        and max_size attributes. A knapsack only holds 3 items by default.

        Parameters:
            name (str): the name of the knapsack's owner.
            color (str): the color of the knapsack.
            max_size (int): the maximum number of items that can fit inside.
        """
        Backpack.__init__(self, name, color, max_size)
        self.closed = True
A subclass may have new attributes and methods that are unavailable to the superclass, such
as the closed attribute in the Knapsack class. If methods from the superclass need to be changed
for the subclass, they can be overridden by defining them again in the subclass. New methods can
be included normally.
class Knapsack(Backpack):
    # ...
    def put(self, item):                        # Override the put() method.
        """If the knapsack is untied, use the Backpack.put() method."""
        if self.closed:
            print("I'm closed!")
        else:                                   # Use Backpack's original put().
            Backpack.put(self, item)
Since Knapsack inherits from Backpack, a knapsack object is a backpack object. All methods
defined in the Backpack class are available to instances of the Knapsack class. For example, the
dump() method is available even though it is not defined explicitly in the Knapsack class.
The built-in function issubclass() shows whether or not one class is derived from another.
Similarly, isinstance() indicates whether or not an object belongs to a specified class hierarchy.
Finally, hasattr() shows whether or not a class or object has a specified attribute or method.
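A minimal sketch of these three built-in functions follows. The two classes are trimmed stand-ins (assumed for illustration only) that keep just enough of Backpack and Knapsack to show the class hierarchy.

```python
class Backpack:
    """Minimal stand-in superclass (trimmed for illustration)."""
    def __init__(self, name):
        self.name = name
        self.contents = []

    def put(self, item):
        self.contents.append(item)

class Knapsack(Backpack):
    """Minimal stand-in subclass that adds a 'closed' attribute."""
    def __init__(self, name):
        Backpack.__init__(self, name)
        self.closed = True

my_backpack = Backpack("Fred")
my_knapsack = Knapsack("Brady")

# A Knapsack is a Backpack, but a Backpack is not a Knapsack.
print(issubclass(Knapsack, Backpack), issubclass(Backpack, Knapsack))       # True False
print(isinstance(my_knapsack, Backpack), isinstance(my_backpack, Knapsack)) # True False

# Only Knapsack objects have 'closed'; both have the inherited put() method.
print(hasattr(my_knapsack, "closed"), hasattr(my_backpack, "closed"))       # True False
print(hasattr(my_knapsack, "put"))                                          # True
```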
# The put() and take() methods now require the knapsack to be open.
>>> my_knapsack.put('compass')
I'm closed!
# The Knapsack class has a weight() method, but the Backpack class does not.
>>> print(hasattr(my_knapsack, 'weight'), hasattr(my_backpack, 'weight'))
True False
>>> my_knapsack.dump()
>>> my_knapsack.contents
[]
Problem 2. Write a Jetpack class that inherits from the Backpack class.
1. Override the constructor so that in addition to a name, color, and maximum size, it also
accepts an amount of fuel. Change the default value of max_size to 2, and set the default
value of fuel to 10. Store the fuel as an attribute.
2. Add a fly() method that accepts an amount of fuel to be burned and decrements the
fuel attribute by that amount. If the user tries to burn more fuel than remains, print “Not
enough fuel!” and do not decrement the fuel.
3. Override the dump() method so that both the contents and the fuel tank are emptied.
4. Write clear, detailed docstrings for the class and each of its methods.
Note
All classes are subclasses of the built-in object class, even if no parent class is specified in
the class definition. In fact, the syntax “class ClassName(object):” is not uncommon (or
incorrect) for the class declaration, and is equivalent to the simpler “class ClassName:”.
Magic Methods
A magic method is a special method used to make an object behave like a built-in data type. Magic
methods begin and end with two underscores, like the constructor __init__(). Every Python object
is automatically endowed with several magic methods, which can be revealed through IPython.
In [3]: b.__ # Press 'tab' to see magic methods and hidden attributes.
__add__() __getattribute__ __new__()
__class__ __gt__ __reduce__()
__delattr__ __hash__ __reduce_ex__()
__dict__ __init__() __repr__
__dir__() __init_subclass__() __setattr__
Note
Many programming languages distinguish between public and private variables. In Python, all
attributes are public, period. However, attributes that start with an underscore are hidden
from the user, which is why magic methods do not show up at first in the preceding code box.
The more common magic methods define how an object behaves with respect to addition and
other binary operations. For example, how should addition be defined for backpacks? A simple
option is to add the number of contents. Then if backpack A has 3 items and backpack B has 5 items,
A + B should return 8. To incorporate this idea, we implement the __add__() magic method.
class Backpack:
    # ...
    def __add__(self, other):
        """Add the number of contents of each Backpack."""
        return len(self.contents) + len(other.contents)
Using the + binary operator on two Backpack objects calls the class’s __add__() method. The
object on the left side of the + is passed in to __add__() as self and the object on the right side of
the + is passed in as other.
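The following sketch shows the + operator dispatching to __add__(). The class is a trimmed stand-in for Backpack, and the owner names and items are made up for illustration.

```python
class Backpack:
    """Minimal stand-in for the Backpack class (trimmed for illustration)."""
    def __init__(self, name):
        self.name = name
        self.contents = []

    def put(self, item):
        self.contents.append(item)

    def __add__(self, other):
        """Add the number of contents of each Backpack."""
        return len(self.contents) + len(other.contents)

pack1, pack2 = Backpack("Rose"), Backpack("Carly")
for item in ["textbook", "water bottle", "notebook"]:
    pack1.put(item)
for item in ["paper", "pencils"]:
    pack2.put(item)

# pack1 + pack2 is translated to pack1.__add__(pack2).
total = pack1 + pack2
print(total)                            # 5
print(total == pack1.__add__(pack2))    # True
```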
Comparisons
Magic methods also facilitate object comparisons. For example, the __lt__() method corresponds
to the < operator. Suppose one backpack is considered “less” than another if it has fewer items in its
list of contents.
class Backpack(object):
    # ...
    def __lt__(self, other):
        """If 'self' has fewer contents than 'other', return True.
        Otherwise, return False.
        """
        return len(self.contents) < len(other.contents)
Using the < binary operator on two Backpack objects calls __lt__(). As with addition, the
object on the left side of the < operator is passed to __lt__() as self, and the object on the right
is passed in as other.
>>> pack2.put('pencils')
>>> pack1 < pack2
True
Comparison methods should return either True or False, while methods like __add__() might
return a numerical value or another kind of object.
Table 4.1: Common magic methods for arithmetic and comparisons. What each of these operations
does is up to the programmer and should be carefully documented. For more methods and details, see
https://docs.python.org/3/reference/datamodel.html#special-method-names.
Problem 3. Endow the Backpack class with two additional magic methods:
1. The __eq__() magic method is used to determine if two objects are equal, and is invoked
by the == operator. Implement the __eq__() magic method for the Backpack class so
that two Backpack objects are equal if and only if they have the same name, color, and
number of contents.
2. The __str__() magic method returns the string representation of an object. This method
is invoked by str() and used by print(). Implement the __str__() method in the
Backpack class so that printing a Backpack object yields the following output (that is,
construct and return the following string).
Owner: <name>
Color: <color>
Size: <number of items in contents>
Max Size: <max_size>
Contents: [<item1>, <item2>, ...]
(Hint: Use the tab and newline characters '\t' and '\n' to align output nicely.)
Achtung!
Magic methods for comparison are not automatically related. For example, even though the
Backpack class implements the magic methods for < and ==, two Backpack objects cannot
respond to the <= operator unless __le__() is explicitly defined. The exception to this rule is
the != operator: as long as __eq__() is defined, A!=B is False if and only if A==B is True.
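The warning above can be verified with a small experiment. The class Pack below is hypothetical (it is not part of the lab); it defines only __lt__() and __eq__(), so != works but <= raises a TypeError.

```python
class Pack:
    """Hypothetical class that defines only __lt__() and __eq__()."""
    def __init__(self, contents):
        self.contents = list(contents)

    def __lt__(self, other):
        return len(self.contents) < len(other.contents)

    def __eq__(self, other):
        return len(self.contents) == len(other.contents)

A, B = Pack(["pen"]), Pack(["pen", "paper"])

print(A < B, A == B)        # True False
print(A != B)               # True: != falls back on the negation of __eq__().

try:
    A <= B                  # Neither __le__() nor the reflected __ge__() is defined.
except TypeError as e:
    le_error = e            # Save the exception so it can be inspected afterward.
    print("TypeError:", e)
```

Defining __le__() explicitly removes the error.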
(a) Implement __str__() so that a + bi is printed out as (a+bj) for b ≥ 0 and (a-bj)
for b < 0.
(b) The magnitude of a + bi is $|a + bi| = \sqrt{a^2 + b^2}$. The __abs__() magic method
determines the output of the built-in abs() function (absolute value). Implement
__abs__() so that it returns the magnitude of the complex number.
(c) Two ComplexNumber objects are equal if and only if they have the same real and
imaginary parts. Implement __eq__() so that it compares the ComplexNumber object
with another ComplexNumber object and returns a bool indicating whether they are
equal.
(d) Implement __add__(), __sub__(), __mul__(), and __truediv__() appropriately.
Each of these should return a new ComplexNumber object.
Write a function to test your class by comparing it to Python’s built-in complex type.
# Validate __str__().
if str(py_cnum) != str(my_cnum):
    print("__str__() failed for", py_cnum)
# ...
Additional Material
Static Attributes
Attributes that are accessed through self are called instance attributes because they are bound to
a particular instance of the class. In contrast, a static attribute is one that is shared between all
instances of the class. To make an attribute static, declare it inside of the class block but outside of
any of the class’s methods, and do not use self. Since the attribute is not tied to a specific instance
of the class, it may be accessed or changed via the class name without even instantiating the class
at all.
class Backpack:
    # ...
    brand = "Adidas"                    # Backpack.brand is a static attribute.
# Change the brand name for the class to change it for all class instances.
>>> Backpack.brand = "Nike"
>>> print(pack1.brand, pack2.brand, Backpack.brand)
Nike Nike Nike
Static Methods
Individual class methods can also be static. A static method cannot be dependent on the attributes
of individual instances of the class, so there can be no references to self inside the body of the
method and self is not listed as an argument in the function definition. Thus static methods only
have access to static attributes and other static methods. Include the tag @staticmethod above the
function definition to designate a method as static.
class Backpack:
    # ...
    @staticmethod
    def origin():                       # Do not use 'self' as a parameter.
        print("Manufactured by " + Backpack.brand + ", inc.")
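As a rough illustration, the sketch below puts a static attribute and a static method in one trimmed stand-in class. One deliberate difference from the listing above: here origin() returns its string instead of printing it, which is an assumption made only to keep the example easy to check.

```python
class Backpack:
    brand = "Adidas"                    # Static attribute, shared by every instance.

    def __init__(self, name):
        self.name = name                # Instance attribute, bound to one object.

    @staticmethod
    def origin():                       # No 'self' parameter.
        return "Manufactured by " + Backpack.brand + ", inc."

# Static attributes and methods are usable without creating any instance.
print(Backpack.origin())                # Manufactured by Adidas, inc.

pack1, pack2 = Backpack("Rose"), Backpack("Carly")
Backpack.brand = "Nike"                 # Change the brand for the whole class.
print(pack1.brand, pack2.brand)         # Nike Nike
print(pack1.origin())                   # Manufactured by Nike, inc.
```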
To practice these principles, consider adding a static attribute to the Backpack class to serve as
a counter for a unique ID. In the constructor for the Backpack class, add an instance variable called
self.ID. Set this ID based on the static ID variable, then increment the static ID so that the next
Backpack object will have a different ID.
Hashing
A hash function is a method that maps objects to identifiers which can be numeric or alphanumeric
strings. These identifiers are known as hash values. The built-in hash() function calculates an
object’s hash value by calling its __hash__() magic method. While an object has one unique hash
value, it is possible that two distinct objects may share the same hash value. This is called a collision.
A good hash function is one that minimizes the probability of collisions occurring.
In Python, the built-in set and dict structures use hash values to store and retrieve objects
in memory quickly. If an object is unhashable, it cannot be put in a set or be used as a key in
a dictionary. See https://docs.python.org/3/glossary.html#term-hashable for details. If the
__hash__() method is not defined, the default hash value is the object’s memory address (accessible
via the built-in function id()) divided by 16, rounded down to the nearest integer. However, two
objects that compare as equal via the __eq__() magic method must have the same hash value. The
following simple __hash__() method for the Backpack class conforms to this rule and returns an
integer.
class Backpack:
    # ...
    def __hash__(self):
        return hash(self.name) ^ hash(self.color) ^ hash(len(self.contents))
The caret operator ^ is a bitwise XOR (exclusive or). The bitwise AND operator & and the
bitwise OR operator | are also good choices to use.
See https://docs.python.org/3/reference/datamodel.html#object.__hash__ for more on
hashing.
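The rule that equal objects must share a hash value can be checked with a short experiment. The class below is a trimmed stand-in; its __eq__() is one possible implementation of the equality rule (same name, color, and number of contents) and is assumed here only for illustration.

```python
class Backpack:
    """Minimal stand-in with compatible __eq__() and __hash__() methods."""
    def __init__(self, name, color):
        self.name, self.color = name, color
        self.contents = []

    def __eq__(self, other):
        return (self.name == other.name and self.color == other.color
                and len(self.contents) == len(other.contents))

    def __hash__(self):
        return hash(self.name) ^ hash(self.color) ^ hash(len(self.contents))

pack1 = Backpack("Rose", "red")
pack2 = Backpack("Rose", "red")         # A distinct object that equals pack1.

print(pack1 is pack2, pack1 == pack2)   # False True
print(hash(pack1) == hash(pack2))       # True
print(len({pack1, pack2}))              # 1: the set treats them as the same element.
```

In Python 3, a class that defines __eq__() without also defining __hash__() has its __hash__ set to None, so its instances cannot be placed in a set until __hash__() is written.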
5 Introduction to Matplotlib
Lab Objective: Matplotlib is the most commonly used data visualization library in Python. Being
able to visualize data helps to determine patterns and communicate results and is a key component
of applied and computational mathematics. In this lab we introduce techniques for visualizing data
in 1, 2, and 3 dimensions. The plotting techniques presented here will be used in the remainder of
the labs in the manual.
Line Plots
Raw numerical data is rarely helpful unless it can be visualized. The quickest way to visualize a
simple 1-dimensional array is with a line plot. The following code creates an array of outputs of the
function $f(x) = x^2$, then visualizes the array using the matplotlib module¹ [Hun07].
>>> y = np.arange(-5,6)**2
>>> y
array([25, 16, 9, 4, 1, 0, 1, 4, 9, 16, 25])
The result is shown in Figure 5.1a. Just as np is a standard alias for NumPy, plt is a standard
alias for matplotlib.pyplot in the Python community.
The call plt.plot(y) creates a figure and draws straight lines connecting the entries of y
relative to the y-axis. The x-axis is (by default) the index of the array, which in this case is the
integers from 0 to 10. Calling plt.show() then displays the figure.
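The steps just described can be sketched as a short script. The call to plt.show() is left commented out so that the script also runs where no display is available; the get_data() call is included only to confirm which x-values Matplotlib chose.

```python
import numpy as np
from matplotlib import pyplot as plt

y = np.arange(-5, 6)**2         # Outputs of f(x) = x^2 at x = -5, -4, ..., 5.

lines = plt.plot(y)             # Draw the line plot on the current figure.
x_used, y_used = lines[0].get_data()
print(x_used)                   # The x-values default to the indices 0, 1, ..., 10.

# plt.show()                    # Uncomment to reveal the resulting plot.
```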
1 Like NumPy, Matplotlib is not part of the Python standard library, but it is included in most Python distributions.
74 Lab 5. Introduction to Matplotlib
Figure 5.1: (a) plt.plot(y) uses the indices of the array for the x-axis. (b) plt.plot(x,y) specifies both the domain and the range.
Problem 1. NumPy’s random module has tools for sampling from probability distributions.
For instance, np.random.normal() draws samples from the normal (Gaussian) distribution.
The size parameter specifies the shape of the resulting array.
Define another function that creates an array of the results of the first function with inputs
n = 100, 200, . . . , 1000. Plot (and show) the resulting array.
Specifying a Domain
An obvious problem with Figure 5.1a is that the x-axis does not correspond correctly to the y-axis
for the function $f(x) = x^2$ that is being drawn. To correct this, define an array x for the domain,
then use it to calculate the image y = f(x). The command plt.plot(x,y) plots x against y by
drawing a line between the consecutive points (x[i], y[i]).
Another problem with Figure 5.1a is its poor resolution: the curve is visibly bumpy, especially
near the bottom of the curve. NumPy’s linspace() function makes it easy to get a higher-resolution
domain. Recall that np.arange() returns an array of evenly-spaced values in a given interval, where
the spacing between the entries is specified. In contrast, np.linspace() creates an array of evenly-
spaced values in a given interval where the number of elements is specified.
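A short sketch of the difference, assuming the domain [-5, 5] from the earlier example and 50 sample points:

```python
import numpy as np
from matplotlib import pyplot as plt

# np.arange() fixes the spacing; np.linspace() fixes the number of points.
coarse = np.arange(-5, 6, 1)            # Spacing 1: 11 points from -5 to 5.
x = np.linspace(-5, 5, 50)              # 50 evenly spaced points from -5 to 5.
y = x**2                                # Calculate the image of the domain.

fig = plt.figure()
plt.plot(x, y)                          # Plot y against the true domain x.
# plt.show()                            # Uncomment to display the figure.
```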
The resulting plot is shown in Figure 5.1b. This time, the x-axis correctly matches up with the
y-axis. The resolution is also much better because x and y have 50 entries each instead of only 11.
Subsequent calls to plt.plot() modify the same figure until plt.show() is executed, which
displays the current figure and resets the system. This behavior can be altered by specifying separate
figures or axes, which we will discuss shortly.
Note
Plotting can seem a little mystical because the actual plot doesn’t appear until plt.show() is
executed. Matplotlib’s interactive mode allows the user to see the plot be constructed one piece
at a time. Use plt.ion() to turn interactive mode on and plt.ioff() to turn it off. This is
very useful for quick experimentation. Try executing the following commands in IPython:
Use interactive mode only with IPython. Using interactive mode in a non-interactive
setting may freeze the window or cause other problems.
Problem 2. Write a function that plots the functions sin(x), cos(x), and arctan(x) on the
domain [−2π, 2π] (use np.pi for π). Make sure the domain is refined enough to produce a
figure with good resolution.
Plot Customization
plt.plot() receives several keyword arguments for customizing the drawing. For example, the color
and style of the line are specified by the following string arguments.
Specify one or both of these string codes as the third argument to plt.plot() to change from
the default color and style. Other plt functions further customize a figure.
Function                Description
legend()                Place a legend in the plot
title()                 Add a title to the plot
xlim() / ylim()         Set the limits of the x- or y-axis
xlabel() / ylabel()     Add a label to the x- or y-axis
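The sketch below combines a format string with the functions in the table. The particular curves, labels, and limits are arbitrary choices made for illustration; "g:" requests a green dotted line and "r--" a red dashed line.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(-2, 4, 100)

fig = plt.figure()
plt.plot(x, np.exp(x), "g:", linewidth=6, label="Exponential")  # Green dotted line.
plt.plot(x, x**2, "r--", label="Quadratic")                     # Red dashed line.
plt.title("This is the title.", fontsize=18)
plt.legend(loc="upper left")            # Uses the 'label' of each line.
plt.xlim(-2, 4)
plt.ylim(0, 20)
plt.xlabel("The x axis")
plt.ylabel("The y axis")
# plt.show()                            # Uncomment to display the figure.
```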
1. Although f (x) has a discontinuity at x = 1, a single call to plt.plot() in the usual way
will make the curve look continuous. Split up the domain into [−2, 1) and (1, 6]. Plot the
two sides of the curve separately so that the graph looks discontinuous at x = 1.
2. Plot both curves with a dashed magenta line. Set the keyword argument linewidth (or
lw) of plt.plot() to 4 to make the line a little thicker than the default setting.
3. Use plt.xlim() and plt.ylim() to change the range of the x-axis to [−2, 6] and the
range of the y-axis to [−6, 6].
The window that plt.show() reveals is called a figure, stored in Python as a plt.Figure object.
A space on a figure where a plot is drawn is called an axes, a plt.Axes object. A figure can have
multiple axes, and a single program may create several figures. There are several ways to create or
grab figures and axes with plt functions.
Function        Description
axes()          Add an axes to the current figure
figure()        Create a new figure or grab an existing figure
gca()           Get the current axes
gcf()           Get the current figure
subplot()       Add a single subplot to the current figure
subplots()      Create a figure and add several subplots to it
Usually when a figure has multiple axes, they are organized into non-overlapping subplots.
The command plt.subplot(nrows, ncols, plot_number) creates an axes in a subplot grid where
nrows is the number of rows of subplots in the figure, ncols is the number of columns, and
plot_number specifies which subplot to modify. If the inputs for plt.subplot() are all integers, the
commas between the entries can be omitted. For example, plt.subplot(3,2,2) can be shortened
to plt.subplot(322).
1 2 3
4 5 6
Figure 5.3: The layout of subplots with plt.subplot(2,3,i) (2 rows, 3 columns), where i is the
index pictured above. The outer border is the figure that the axes belong to.
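A minimal sketch of plt.subplot() with a 1 × 2 grid follows. The plotted functions and the domain are assumptions chosen for illustration; both the three-digit shorthand and the comma-separated form are shown.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(0.1, 2, 200)

fig = plt.figure()
ax1 = plt.subplot(121)                  # 1 row, 2 columns, first subplot (left).
ax1.plot(x, np.exp(x), "k", lw=2)
ax1.set_title("Exponential", fontsize=18)

ax2 = plt.subplot(1, 2, 2)              # Same grid, second subplot (right).
ax2.plot(x, np.log(x), "k", lw=2)
ax2.set_title("Logarithmic", fontsize=18)
# plt.show()                            # Uncomment to display the figure.
```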
[Figure: two subplots titled "Exponential" and "Logarithmic".]
Note
Plotting functions such as plt.plot() are shortcuts for accessing the current axes on the current
figure and calling a method on that Axes object. Calling plt.subplot() changes the current
axes, and calling plt.figure() changes the current figure. Use plt.gca() to get the current
axes and plt.gcf() to get the current figure. Compare the following equivalent strategies for
producing a figure with two subplots.
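One way to see the equivalence is the sketch below, which draws the same two curves twice: once through pyplot's notion of the current axes and once through explicit Axes objects. The plotted functions are arbitrary choices.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(-5, 5, 100)

# Strategy 1: rely on pyplot's notion of the "current" axes.
fig1 = plt.figure()
plt.subplot(121)
plt.plot(x, 2*x)
plt.subplot(122)
plt.plot(x, x**2)

# Strategy 2: hold on to the Axes objects and call their methods directly.
fig2, axes = plt.subplots(1, 2)
axes[0].plot(x, 2*x)
axes[1].plot(x, x**2)
# plt.show()                            # Uncomment to display both figures.
```

The second strategy tends to be easier to follow once a program manages several figures, since each call names the axes it modifies.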
Problem 4. Write a function that plots the functions sin(x), sin(2x), 2 sin(x), and 2 sin(2x)
on the domain [0, 2π], each in a separate subplot of a single figure.
Achtung!
Be careful not to mix up the following functions.
1. plt.axes() creates a new place to draw on the figure, while plt.axis() (or ax.axis())
sets properties of the x- and y-axis in the current axes, such as the x and y limits.
2. plt.subplot() (singular) returns a single subplot belonging to the current figure, while
plt.subplots() (plural) creates a new figure and adds a collection of subplots to it.
• A scatter plot plots two 1-dimensional arrays against each other without drawing lines between
the points. Scatter plots are particularly useful for data that is not correlated or ordered.
To create a scatter plot, use plt.plot() and specify a point marker (such as 'o' or '*') for
the line style, or use plt.scatter() (or ax.scatter()). Beware that plt.scatter() has
slightly different arguments and syntax than plt.plot().
• A histogram groups entries of a 1-dimensional data set into a given number of intervals, called
bins. Each bin has a bar whose height indicates the number of values that fall in the range of
the bin. Histograms are best for displaying distributions, relating data values to frequency.
To create a histogram, use plt.hist() (or ax.hist()). Use the argument bins to specify the
edges of the bins or to choose a number of bins. The range argument specifies the outer limits
of the first and last bins.
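Both kinds of plots can be sketched with random data. The distributions, the seed, and the bin choices below are assumptions made for illustration; the seed is set only so that the script is repeatable.

```python
import numpy as np
from matplotlib import pyplot as plt

# Draw 500 random samples from each of two normal distributions.
np.random.seed(0)                       # Only to make the sketch repeatable.
x = np.random.normal(scale=1.5, size=500)
y = np.random.normal(scale=0.5, size=500)

fig = plt.figure()
ax1 = plt.subplot(121)
ax1.plot(x, y, "o", markersize=5, alpha=0.5)    # Scatter plot: markers, no lines.

ax2 = plt.subplot(122)
counts, edges, _ = ax2.hist(x, bins=9, range=[-4.5, 4.5])   # 9 bins of width 1.
# plt.show()                            # Uncomment to display the figure.
```

ax2.hist() returns the bar heights and the bin edges, so the binning can be inspected without reading it off the picture.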
>>> plt.show()
Problem 5. The Fatality Analysis Reporting System (FARS) is a nationwide census that
provides yearly data regarding fatal injuries suffered in motor vehicle traffic crashes.a The
array contained in FARS.npy is a small subset of the FARS database from 2010–2014. Each of
the 148,206 rows in the array represents a different car crash; the columns represent the hour
(in military time, as an integer), the longitude, and the latitude, in that order.
Write a function to visualize the data in FARS.npy. Use np.load() to load the data, then
create a single figure with two subplots:
1. A scatter plot of longitudes against latitudes. Because of the large number of data points,
use black pixel markers (use "k," as the third argument to plt.plot()). Label both axes
using plt.xlabel() and plt.ylabel() (or ax.set_xlabel() and ax.set_ylabel()).
(Hint: Use plt.axis("equal") or ax.set_aspect("equal") so that the x- and y-axis
are scaled the same way.)
2. A histogram of the hours of the day, with one bin per hour. Set the limits of the x-axis
appropriately. Label the x-axis. You should be able to clearly see which hours of the day
experience more traffic.
a See http://www.nhtsa.gov/FARS.
Matplotlib also has tools for creating other kinds of plots for visualizing 1-dimensional data,
including bar plots and box plots. See the Matplotlib Appendix for examples and syntax.
Figure 5.6: In the left plot, we have two arrays where x and y have the values x = y = [0,1,2].
The command np.meshgrid(x, y) returns the arrays
$X = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 1 & 2 \\ 0 & 1 & 2 \end{bmatrix}$ and
$Y = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \\ 2 & 2 & 2 \end{bmatrix}$.
These give the x- and y-coordinates of the points in the grid formed by x and y as seen in the right
plot and satisfy the relation (X[i,j], Y[i,j]) = (x[j],y[i]).
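The caption's claims are easy to confirm numerically. The sketch below rebuilds X and Y for x = y = [0,1,2] and checks the index relation at every grid point; the final evaluation of sin(X) sin(Y) is an added illustration of why the grid is useful.

```python
import numpy as np

x = y = np.array([0, 1, 2])
X, Y = np.meshgrid(x, y)                # Combine the 1-D data into 2-D data.
print(X)
print(Y)

# Check the relation (X[i,j], Y[i,j]) = (x[j], y[i]) at every grid point.
relation_holds = all(X[i, j] == x[j] and Y[i, j] == y[i]
                     for i in range(3) for j in range(3))
print(relation_holds)                   # True

# A function of two variables can now be evaluated on the whole grid at once.
Z = np.sin(X) * np.sin(Y)
print(Z.shape)                          # (3, 3)
```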
With a 2-dimensional domain, g(x, y) is usually visualized with two kinds of plots.
• A heat map assigns a color to each point in the domain, producing a 2-dimensional colored
picture describing a 3-dimensional shape. Darker colors typically correspond to lower values
while lighter colors typically correspond to higher values.
Use plt.pcolormesh() to create a heat map. You can add an optional argument for the shading
type; this determines the layout and fill style of the heat map. This argument defaults to
shading='auto', and will automatically choose a fill method suited to the data being graphed.
• A contour map draws several level curves of g on the 2-dimensional domain. A level curve
corresponding to the constant c is the collection of points {(x, y) | c = g(x, y)}. Coloring the
space between the level curves produces a discretized version of a heat map. Including more
and more level curves makes a filled contour plot look more and more like the complete, blended
heat map.
Use plt.contour() to create a contour plot and plt.contourf() to create a filled contour
plot. Specify either the number of level curves to draw, or a list of constants corresponding to
specific level curves.
These functions each receive the keyword argument cmap to specify a color scheme (some of
the better schemes are "viridis", "magma", and "coolwarm"). For the list of all Matplotlib color
schemes, see http://matplotlib.org/examples/color/colormaps_reference.html.
Finally, plt.colorbar() draws the color scale beside the plot to indicate how the colors relate
to the values of the function.
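The functions described above can be sketched together on one figure. The example function g(x, y) = sin(x) sin(y), the domain, and the number of level curves are assumptions chosen for illustration.

```python
import numpy as np
from matplotlib import pyplot as plt

# Create a 2-D domain with np.meshgrid() and evaluate g on it.
x = np.linspace(-np.pi, np.pi, 100)
y = x.copy()
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.sin(Y)

fig = plt.figure()
plt.subplot(131)                        # Heat map.
plt.pcolormesh(X, Y, Z, cmap="viridis", shading="auto")
plt.colorbar()

plt.subplot(132)                        # Contour map with 10 level curves.
plt.contour(X, Y, Z, 10, cmap="coolwarm")
plt.colorbar()

plt.subplot(133)                        # Filled contour map with chosen levels.
levels = np.linspace(-1, 1, 9)
plt.contourf(X, Y, Z, levels, cmap="magma")
plt.colorbar()
# plt.show()                            # Uncomment to display the figure.
```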
Problem 6. Write a function to plot $g(x, y) = \frac{\sin(x)\sin(y)}{xy}$ on the domain [−2π, 2π] × [−2π, 2π].
1. Create 2 subplots: one with a heat map of g, and one with a contour map of g. Choose
an appropriate number of level curves, or specify the curves yourself.
Additional Material
Further Reading and Tutorials
Plotting takes some getting used to. See the following materials for more examples.
• https://www.labri.fr/perso/nrougier/teaching/matplotlib/.
• https://matplotlib.org/stable/tutorials/introductory/pyplot.html.
• http://scipy-lectures.org/intro/matplotlib/.
3-D Plotting
Matplotlib can also be used to plot 3-dimensional surfaces. The following code produces the surface
corresponding to g(x, y) = sin(x) sin(y).
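A minimal sketch of such a surface plot follows, assuming the domain [−π, π] × [−π, π] and a 200-point grid. It uses fig.add_subplot(projection="3d") to create the 3-D axes and ax.plot_surface() to draw the surface.

```python
import numpy as np
from matplotlib import pyplot as plt

# Create the domain and calculate the range like usual.
x = np.linspace(-np.pi, np.pi, 200)
y = np.copy(x)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.sin(Y)

# Draw the corresponding 3-D plot using some extra tools.
fig = plt.figure()
ax = fig.add_subplot(projection="3d")   # Create 3-D axes on the figure.
surface = ax.plot_surface(X, Y, Z)
# plt.show()                            # Uncomment to display the figure.
```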
Animations
Lines and other graphs can be altered dynamically to produce animations. Follow these steps to
create a Matplotlib animation:
The submodule matplotlib.animation contains the tools for putting together and managing
animations. The function matplotlib.animation.FuncAnimation() accepts the figure to animate,
the function that updates the figure, the number of frames to show before repeating, and how fast
to run the animation (lower numbers mean faster animations).
def sine_animation():
    # Calculate the data to be animated.
    x = np.linspace(0, 2*np.pi, 200)[:-1]
    y = np.sin(x)
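The listing above stops after the data is computed. A complete version might look like the following sketch, which assumes the window limits and the 10 ms frame interval, and which returns the animation object instead of calling plt.show() so that the caller keeps a reference to it.

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.animation import FuncAnimation

def sine_animation():
    # Calculate the data to be animated.
    x = np.linspace(0, 2*np.pi, 200)[:-1]
    y = np.sin(x)

    # Create a figure and set the window boundaries of the axes.
    fig = plt.figure()
    plt.xlim(0, 2*np.pi)
    plt.ylim(-1.2, 1.2)

    # Draw an empty line. plt.plot() returns a list, so unpack its one entry.
    drawing, = plt.plot([], [])

    # Define the function that updates the line data for frame number 'index'.
    def update(index):
        drawing.set_data(x[:index], y[:index])
        return drawing,                 # Return a tuple of the updated artists.

    # Call update() with index = 0, 1, ..., len(x)-1, waiting 10 ms between frames.
    return FuncAnimation(fig, update, frames=len(x), interval=10)

animation = sine_animation()            # Keep a reference so it is not discarded.
# plt.show()                            # Uncomment to watch the animation.
```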
Try using the following function in place of update(). Can you explain why this animation is
different from the original?
def wave(index):
    drawing.set_data(x, np.roll(y, index))
    return drawing,
To animate multiple objects at once, define the objects separately and make sure the update
function returns both objects.
def sine_cosine_animation():
    x = np.linspace(0, 2*np.pi, 200)[:-1]
    y1, y2 = np.sin(x), np.cos(x)
    fig = plt.figure()
    plt.xlim(0, 2*np.pi)
    plt.ylim(-1.2, 1.2)
    sin_drawing, = plt.plot([],[])
    cos_drawing, = plt.plot([],[])

    def update(index):
        sin_drawing.set_data(x[:index], y1[:index])
        cos_drawing.set_data(x[:index], y2[:index])
        return sin_drawing, cos_drawing,

    # Assumes 'from matplotlib.animation import FuncAnimation'.
    a = FuncAnimation(fig, update, frames=len(x), interval=10)
    plt.show()
Animations can also be 3-dimensional. The only major difference is an extra operation to
set the 3-dimensional component of the drawn object. The code below animates the space curve
parametrized by the following equations:
$$x(\theta) = \cos(\theta)\cos(6\theta), \quad y(\theta) = \sin(\theta)\cos(6\theta), \quad z(\theta) = \frac{\theta}{10}$$
def rose_animation_3D():
    theta = np.linspace(0, 2*np.pi, 200)
    x = np.cos(theta) * np.cos(6*theta)
    y = np.sin(theta) * np.cos(6*theta)
    z = theta / 10

    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')       # Make the figure 3-D.
    ax.set_xlim3d(-1.2, 1.2)                    # Use ax instead of plt.
    ax.set_ylim3d(-1.2, 1.2)
    ax.set_aspect("equal")                      # Older Matplotlib versions may reject "equal" on 3-D axes.

    drawing, = ax.plot([],[],[])                # Provide 3 empty lists.

    # Update the first 2 dimensions like usual, then update the 3-D component.
    def update(index):
        drawing.set_data(x[:index], y[:index])
        drawing.set_3d_properties(z[:index])
        return drawing,

    # Assumes 'from matplotlib.animation import FuncAnimation'.
    a = FuncAnimation(fig, update, frames=len(x), interval=10)
    plt.show()
Lab Objective: In Python, an exception is an error detected during execution. Exceptions are
important for regulating program usage and for correctly reporting problems to the programmer and
end user. An understanding of exceptions is essential to safely read data from and write data to external
files. Being able to interact with external files is important for analyzing data and communicating
results. In this lab we learn exception syntax and file interaction protocols.
Exceptions
An exception formally indicates an error and terminates the program early. Some of the more common
exception types are listed below, along with the kinds of problems they typically indicate.
Exception            Indication
AttributeError       An attribute reference or assignment failed.
ImportError          An import statement failed.
IndexError           A sequence subscript was out of range.
NameError            A local or global name was not found.
TypeError            An operation or function was applied to an object of inappropriate type.
ValueError           An operation or function received an argument that had the right type but an inappropriate value.
ZeroDivisionError    The second argument of a division or modulo operation was zero.
>>> print(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
90 Lab 6. Exceptions and File Input/Output
Raising Exceptions
Most exceptions are due to coding mistakes and typos. However, exceptions can also be used
intentionally to indicate a problem to the user or programmer. To create an exception, use the keyword
raise, followed by the name of the exception class. As soon as an exception is raised, the program
stops running unless the exception is handled properly.
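For instance, a function can raise a ValueError when it receives an argument it cannot use. The helper reciprocal() below is hypothetical; the try-except wrapper around the second call is there only so that the sketch runs to completion instead of halting at the exception.

```python
def reciprocal(x):
    """Hypothetical helper: return 1/x, rejecting zero with an informative message."""
    if x == 0:
        raise ValueError("x must be nonzero")   # Execution of the function stops here.
    return 1 / x

print(reciprocal(4))                    # 0.25

try:
    reciprocal(0)
except ValueError as e:
    message = str(e)
    print("ValueError:", message)       # ValueError: x must be nonzero
```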
1. Choose a 3-digit number where the first and last digits differ by 2 or more (say, 123).
The result of the last step will always be 1089, regardless of the original number chosen in step
1 (can you explain why?).
The following function prompts the user for input at each step of the magic trick, but
does not check that the user’s inputs are correct.
def arithmagic():
    step_1 = input("Enter a 3-digit number where the first and last "
                   "digits differ by 2 or more: ")
    step_2 = input("Enter the reverse of the first number, obtained "
                   "by reading it backwards: ")
    step_3 = input("Enter the positive difference of these numbers: ")
    step_4 = input("Enter the reverse of the previous result: ")
    print(str(step_3), "+", str(step_4), "= 1089 (ta-da!)")
Modify arithmagic() so that it verifies the user’s input at each step. Raise a ValueError
with an informative error message if any of the following occur:
• The first number’s first and last digits differ by less than 2.
• The second number (step_2) is not the reverse of the first number.
• The third number (step_3) is not the positive difference of the first two numbers.
• The fourth number (step_4) is not the reverse of the third number.
(Hint: input() always returns a string, so each variable is a string initially. Use int() to cast
the variables as integers when necessary. The built-in function abs() may also be useful.)
Handling Exceptions
To prevent an exception from halting the program, it must be handled by placing the problematic
lines of code in a try block. An except block then follows with instructions for what to do in the
event of an exception.
# The 'try' block should hold any lines of code that might raise an exception.
>>> try:
...     print("Entering try block...")
...     raise Exception("for no reason")
...     print("No problem!")           # This line gets skipped.
... # The 'except' block is executed just after the exception is raised.
... except Exception as e:
...     print("There was a problem:", e)
...
Entering try block...
There was a problem: for no reason
>>> # The program then continues on.
In this example, the name e represents the exception within the except block. Printing e
displays its error message. If desired, e can be raised again with raise e or just raise.
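A minimal sketch of re-raising, assuming a hypothetical helper parse_age(): the except block reports the problem and then uses a bare raise to pass the same exception on to the caller.

```python
def parse_age(text):
    """Convert text to an integer, reporting and re-raising on bad input."""
    try:
        return int(text)
    except ValueError as e:
        print("Could not parse age:", e)
        raise        # Re-raise the exception that is currently being handled.

print(parse_age("42"))
```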
The try-except control flow can be expanded with two other blocks, forming a code structure
similar to a sequence of if-elif-else blocks.
2. An except statement specifying the same kind of exception that was raised in the try block
“catches” the exception, and the block is then executed. There may be multiple except blocks
following a single try block (similar to having several elif statements following a single if
statement), and a single except statement may specify multiple kinds of exceptions to catch.
3. The else block is executed if an exception was not raised in the try block.
>>> try:
...     print("Entering try block...", end='')
...     house_on_fire = False
...     raise ValueError("The house is on fire!")
... # Check for multiple kinds of exceptions using parentheses.
... except (ValueError, TypeError) as e:
...     print("caught an exception.")
...     house_on_fire = True
... else:                              # Skipped due to the exception.
...     print("no exceptions raised.")
... finally:
...     print("The house is on fire:", house_on_fire)
...
Entering try block...caught an exception.
The house is on fire: True

>>> try:
...     print("Entering try block...", end='')
...     house_on_fire = False
... except ValueError as e:            # Skipped because there was no exception.
...     print("caught a ValueError.")
...     house_on_fire = True
... except TypeError as e:             # Also skipped.
...     print("caught a TypeError.")
...     house_on_fire = True
... else:
...     print("no exceptions raised.")
... finally:
...     print("The house is on fire:", house_on_fire)
...
Entering try block...no exceptions raised.
The house is on fire: False
The code in the finally block is always executed, even if a return statement or an uncaught
exception occurs in any block following the try statement.
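The interaction between finally and return can be checked with a small sketch. The function safe_divide() and the list log are hypothetical names used only for illustration.

```python
log = []

def safe_divide(a, b):
    """Return a / b, or infinity when b is zero; always record the call."""
    try:
        return a / b
    except ZeroDivisionError:
        return float("inf")
    finally:
        # Runs after either return statement above, before control
        # goes back to the caller.
        log.append((a, b))

safe_divide(1, 2)      # Returns from the try block.
safe_divide(1, 0)      # Returns from the except block.
print(log)             # Both calls were recorded by the finally block.
```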
Achtung!
An except statement with no specified exception type catches any exception raised in the
corresponding try block. This approach can mistakenly mask unexpected errors. Always be
specific about the kinds of exceptions you expect to encounter.
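To see how a bare except can hide a bug, consider this sketch with two hypothetical functions. The first contains a deliberate typo that raises a NameError, but the bare except silently swallows it. The second names only the failure it expects.

```python
def mean_masked(values):
    try:
        return sum(values) / len(valeus)    # Deliberate typo: NameError.
    except:                                 # Catches everything, hiding the typo.
        return 0.0

def mean_specific(values):
    try:
        return sum(values) / len(values)
    except ZeroDivisionError:               # Only the expected failure (empty input).
        return 0.0

print(mean_masked([1, 2, 3]))       # The typo is masked and 0.0 is returned.
print(mean_specific([1, 2, 3]))     # The correct mean.
```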
Problem 2. A random walk is a path created by a sequence of random steps. The following
function simulates a random walk by repeatedly adding or subtracting 1 to a running total.
from random import choice

def random_walk(max_iters=1e12):
    walk = 0
    directions = [1, -1]
    for i in range(int(max_iters)):
        walk += choice(directions)
    return walk
Note
The built-in exceptions are organized into a class hierarchy. For example, the ValueError
class inherits from the generic Exception class. Thus, a ValueError is an Exception, but an
Exception is not a ValueError.
>>> try:
...     raise ValueError("caught!")
... except Exception as e:             # A ValueError is an Exception.
...     print(e)
...
caught!                                 # The exception was caught.

>>> try:
...     raise Exception("not caught!")
... except ValueError as e:            # An Exception is not a ValueError.
...     print(e)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
Exception: not caught!                  # The exception wasn't caught!
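The catching rule in this note can also be checked directly with the built-in issubclass(). The helper catches() is a hypothetical name; it only mirrors the rule that an except clause catches an exception when the raised class is a subclass of the class it names.

```python
def catches(handler_class, raised_class):
    """Return True if 'except handler_class' would catch raised_class."""
    return issubclass(raised_class, handler_class)

print(catches(Exception, ValueError))       # A ValueError is an Exception.
print(catches(ValueError, Exception))       # An Exception is not a ValueError.
print(catches(OSError, FileNotFoundError))  # FileNotFoundError is an OSError.
```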
In both cases, if the file hello_world.txt does not exist in the current directory, open() raises
a FileNotFoundError. However, errors in the try or with blocks do not prevent the file from being
safely closed.
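A short sketch of this behavior, assuming the file lives in a temporary directory created with the standard tempfile module so that the example does not touch the current directory: an error raised inside the with block still leaves the file closed.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "hello_world.txt")
with open(path, 'w') as outfile:
    outfile.write("Hello, world!\n")

try:
    with open(path, 'r') as infile:
        contents = infile.read()
        raise ValueError("a problem after reading")   # Simulated error.
except ValueError:
    pass

print(infile.closed)     # The with block closed the file despite the error.
```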
Attribute Description
closed True if the object is closed.
mode The access mode used to open the file object.
name The name of the file.
Method Description
close() Close the connection to the file.
read() Read a given number of bytes; with no input, read the rest of the file.
readline() Read a line of the file, including the newline character at the end.
readlines() Call readline() repeatedly and return a list of the resulting lines.
seek() Move the cursor to a new position.
tell() Report the current position of the cursor.
write() Write a single string to the file (spaces are not added automatically).
writelines() Write a list of strings to the file (newline characters are not added automatically).
Only strings can be written to files; to write a non-string type, first cast it as a string with
str(). Be mindful of spaces and newlines to separate the data.
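For instance, the following sketch writes a list of numbers one per line and reads them back. The file name and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

numbers_path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
data = [1, 2.5, 3]

with open(numbers_path, 'w') as outfile:
    for x in data:
        outfile.write(str(x) + "\n")     # Cast to str; add the newline by hand.

with open(numbers_path, 'r') as infile:
    lines = infile.readlines()           # Each entry keeps its trailing newline.

print(lines)
```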
Problem 3. Define a class called ContentFilter. Implement the constructor so that it accepts
the name of a file to be read.
1. If the file name is invalid in any way, prompt the user for another filename using input().
Continue prompting the user until they provide a valid filename.
2. Read the file and store its name and contents as attributes (store the contents as a single
string). Make sure the file is securely closed.
String Formatting
The str class has several useful methods for parsing and formatting strings. They are particularly
useful for processing data from a source file and for preparing data to be written to an external file.
Method Returns
count() The number of times a given substring occurs within the string.
find() The lowest index where a given substring is found.
isalpha() True if all characters in the string are alphabetic (a, b, c, . . . ).
isdigit() True if all characters in the string are digits (0, 1, 2, . . . ).
isspace() True if all characters in the string are whitespace (" ", '\t', '\n').
join() The concatenation of the strings in a given iterable with a
specified separator between entries.
lower() A copy of the string converted to lowercase.
upper() A copy of the string converted to uppercase.
replace() A copy of the string with occurrences of a given substring
replaced by a different specified substring.
split() A list of segments of the string, using a given character or string
as a delimiter.
strip() A copy of the string with leading and trailing whitespace removed.
The join() method translates a list of strings into a single string by concatenating the entries
of the list and placing the principal string between the entries. Conversely, split() translates the
principal string into a list of substrings, with the separation determined by a single input.
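A brief sketch of join() and split() working as inverses; the variable names are illustrative.

```python
words = ["one", "two", "three"]

joined = ", ".join(words)        # The principal string ", " goes between entries.
pieces = joined.split(", ")      # Splitting on the same separator undoes the join.
tokens = "  a b \t c\n".split()  # With no input, split on runs of whitespace.

print(joined)
print(pieces)
print(tokens)
```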
# defining function
>>> def abomination(userOpinion='Bread and Butter Pickles'):
...     # notice the use of the f-string
...     return f'I think that {userOpinion} are an abomination'
Problem 4. Add the following methods to the ContentFilter class for writing the contents
of the original file to new files. Each method should accept the name of a file to write to and
a keyword argument mode that specifies the file access mode, defaulting to 'w'. If mode is not
'w', 'x', or 'a', raise a ValueError with an informative message.
1. uniform(): write the data to the outfile with uniform case. Include an additional keyword
argument case that defaults to "upper".
If case="upper", write the data in upper case. If case="lower", write the data in lower
case. If case is not one of these two values, raise a ValueError.
2. reverse(): write the data to the outfile in reverse order. Include an additional keyword
argument unit that defaults to "line".
If unit="word", reverse the ordering of the words in each line, but write the lines in the
same order as the original file. If unit="line", reverse the ordering of the lines, but do
not change the ordering of the words on each individual line. If unit is not one of these
two values, raise a ValueError.
3. transpose(): write a “transposed” version of the data to the outfile. That is, write the
first word of each line of the data to the first line of the new file, the second word of each
line of the data to the second line of the new file, and so on. Viewed as a matrix of words,
the rows of the input file then become the columns of the output file, and vice versa. You
may assume that there are an equal number of words on each line of the input file.
4. __str__(): Also implement the __str__() magic method so that printing a ContentFilter
object yields the following output. You may want to calculate these statistics in the
constructor. (Note: Using f-strings will also make this implementation much simpler).
(Hint: list comprehensions are very useful for some of these functions. For example,
what does [line[::-1] for line in lines] do? What about sum([s.isspace() for s
in data])?)
Compare your class to the following example.
# cf_example1.txt
A b C
d E f
>>> cf = ContentFilter("cf_example1.txt")
>>> cf.uniform("uniform.txt", mode='w', case="upper")
>>> cf.uniform("uniform.txt", mode='a', case="lower")
>>> cf.reverse("reverse.txt", mode='w', unit="word")
>>> cf.reverse("reverse.txt", mode='a', unit="line")
>>> cf.transpose("transpose.txt", mode='w')
# uniform.txt
A B C
D E F
a b c
d e f
# reverse.txt
C b A
f E d
d E f
A b C
# transpose.txt
A d
b E
C f
Additional Material
Custom Exception Classes
Custom exceptions can be defined by writing a class that inherits from some existing exception class.
The generic Exception class is typically the parent class of choice.
This may seem like a trivial extension of the Exception class, but it is useful to do because
the interpreter never automatically raises a TooHardError. Any TooHardError must have originated
from a hand-written raise command, making it easier to identify the exact source of the problem.
Chaining Exceptions
Sometimes, especially in large programs, it is useful to raise one kind of exception just after catching
another. The two exceptions can be linked together using the from statement. This syntax makes it
possible to see where the error originated from and to “pass it up” to another part of the program.
>>> try:
...     raise TooHardError("This lab is impossible!")
... except TooHardError as e:
...     raise NotImplementedError("Lab is incomplete") from e
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
__main__.TooHardError: This lab is impossible!

The above exception was the direct cause of the following exception:
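The link created by from can also be inspected in code through the __cause__ attribute of the second exception. The sketch below assumes a minimal definition of TooHardError (an empty subclass of Exception), since the text only uses the name.

```python
class TooHardError(Exception):
    """Assumed minimal custom exception, used only for illustration."""
    pass

try:
    try:
        raise TooHardError("This lab is impossible!")
    except TooHardError as e:
        raise NotImplementedError("Lab is incomplete") from e
except NotImplementedError as err:
    cause = err.__cause__        # The original exception, linked by 'from'.
    print(type(cause).__name__, "caused", type(err).__name__)
```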
This method is extremely flexible and provides many convenient ways to format string output
nicely. Consider the following code for printing out a simple progress bar from within a loop.
Here the string "\r[{:<20}]" used in conjunction with the format() method tells the cursor
to go back to the beginning of the line, print an opening bracket, then print the first argument
of format() left-aligned with at least 20 total spaces before printing the closing bracket. The end
parameter of print() replaces the default trailing newline with another string; with end='', nothing
is appended to the output. The flush parameter defaults to False and does not need to be changed if
the end parameter is not changed. Setting flush=True forces the output to be written to the terminal
immediately instead of waiting in a buffer. With flush=False and end='', the output may be held in
the buffer and appear all at once rather than incrementally at each iteration.
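A minimal sketch of such a progress bar follows. The helper bar_string(), the step count of 40, and the short sleep standing in for real work are assumptions made for illustration, not the lab's own listing.

```python
import time

def bar_string(done, total):
    """Build the text of a 20-character progress bar."""
    filled = "=" * (20 * done // total)
    # "\r" moves the cursor back to the start of the line; "{:<20}"
    # left-aligns the filled part in a field of at least 20 characters.
    return "\r[{:<20}]".format(filled)

for i in range(1, 41):
    time.sleep(0.01)                                # Stand-in for real work.
    print(bar_string(i, 40), end='', flush=True)    # Overwrite the same line.
print()                                             # End with a newline.
```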
Printing at each iteration dramatically slows down the progression through the loop. How does
the following code solve that problem?
Module Description
csv CSV (comma separated value) file writing and parsing.
io Support for file objects and open().
os Communication with the operating system.
os.path Common path operations such as checking for file existence.
pickle Create portable serialized representations of Python objects.
Unit Testing
7
Lab Objective: Finding and fixing programming errors can be difficult and time consuming,
especially in large or complex programs. Unit testing is a formal strategy for finding and eliminating
errors quickly as a program is constructed and for ensuring that the program still works whenever
it is modified. A single unit test checks a small piece of code (usually a function or class method) for
correctness, independent of the rest of the program. A well-written collection of unit tests can help
make sure that every unit of code functions as intended, thereby decreasing the chances for errors in
the program. In this lab we learn to write unit tests in Python and practice test-driven development.
Applying these principles will greatly speed up the coding process and improve your code quality.
Unit Tests
A unit test verifies a piece of code by running a series of test cases and comparing actual outputs
with expected outputs. Each test case is usually checked with an assert statement, a shortcut for
raising an AssertionError with an optional error message if a boolean statement is false.
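As a small sketch of what assert does, the two hypothetical functions below behave the same way: the first uses assert, and the second spells out the equivalent if statement and raise.

```python
def check_with_assert(x):
    assert x > 0, "x must be positive"
    return x

def check_by_hand(x):
    # What the assert statement above does behind the scenes.
    if not (x > 0):
        raise AssertionError("x must be positive")
    return x

print(check_with_assert(3), check_by_hand(3))
```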
Now suppose we wanted to test a simple add() function, located in the file specs.py.
104 Lab 7. Unit Testing
# specs.py
In a corresponding file called test_specs.py, which should contain all of the unit tests for the
code in specs.py, we write a unit test called test_add() to verify the add() function.
# test_specs.py
import specs

def test_add():
    assert specs.add(1, 3) == 4, "failed on positive integers"
    assert specs.add(-5, -7) == -12, "failed on negative integers"
    assert specs.add(-6, 14) == 8
In this case, running test_add() raises no errors since all three test cases pass. Unit test
functions don’t need to return anything, but they should raise an exception if a test case fails.
Note
This style of external testing—checking that certain inputs result in certain outputs—is called
black box testing. The actual structure of the code is not considered, but what it produces is
thoroughly examined. In fact, the author of a black box test doesn’t even need to be the person
who eventually writes the program: having one person write tests and another write the code
helps detect problems that one developer or the other may not have caught individually.
PyTest
Python’s pytest module1 provides tools for building tests, running tests, and providing detailed
information about the results. To begin, run pytest in the current directory. Without any test files,
the output should be similar to the following.
$ pytest
============================= test session starts =============================
platform linux -- Python 3.10.6, pytest-7.4.0, py-1.11.0, pluggy-1.2.0
rootdir: /Users/Student, inifile:
collected 0 items
Given some test files, say test_calendar.py and test_google.py, the output of pytest iden-
tifies failed tests and provides details on why they failed.
1 Pytest is not part of the standard library. Install pytest with pip install pytest if needed. The
standard library’s unittest module also provides a testing framework, but it is less popular
and less straightforward than PyTest.
$ pytest
============================= test session starts =============================
platform linux -- Python 3.10.6, pytest-7.4.0, py-1.11.0, pluggy-1.2.0
rootdir: /Users/Student/example_tests, inifile:
collected 12 items
test_calendar.py ........
test_google.py .F..
def test_subtract():
> assert google.subtract(42, 17)==25, "subtract() failed for a > b > 0"
E AssertionError: subtract() failed for a > b > 0
E assert 35 == 25
E + where 35 = <function subtract at 0x102d4eb90>(42, 17)
E + where <function subtract at 0x102d4eb90> = google.subtract
test_google.py:11: AssertionError
===================== 1 failed, 11 passed in 0.02 seconds =====================
Each dot represents a passed test and each F represents a failed test. They show up in order,
so in the example above, only the second of four tests in test_google.py failed.
Achtung!
PyTest will not find or run tests if they are not contained in files named test_*.py or
*_test.py, where * represents any number of characters. In addition, by default the unit tests
themselves must have names that begin with test, such as test_*(). If you need to change this
behavior, consult the documentation at http://pytest.org/latest/example/pythoncollection.html.
def smallest_factor(n):
    """Return the smallest prime factor of the positive integer n."""
    if n == 1: return 1
    for i in range(2, int(n**.5)):
        if n % i == 0: return i
    return n
Write a unit test for this function, including test cases that you suspect might uncover the error
(what are the edge cases for this function?). Use pytest to run your unit test and discover a
test case that fails, then use this information to correct the function.
Coverage
Successful unit tests include enough test cases to test the entire program. Coverage refers to the
number of lines of code that are executed by at least one test case. One tool for measuring coverage
is called pytest-cov, an extension of pytest. This tool must be installed separately. To install, run
the following code in a terminal.
$ pip install pytest-cov
Add the flag --cov to the pytest command to print out code coverage information. Running
pytest --cov in the same directory as specs.py and test_specs.py yields the following output.
$ pytest --cov
============================= test session starts =============================
platform linux -- Python 3.10.6, pytest-7.4.0, py-1.11.0, pluggy-1.2.0
rootdir: /Users/Student/Testing, inifile:
plugins: cov-2.3.1
collected 7 items
test_specs.py .......
Here, Stmts refers to the total number of executable statements in each file, while Miss is the number
of statements that are not executed by any test. Notice that the file test_specs.py has 100% coverage while
specs.py does not. Test files generally have 100% coverage, since pytest is designed to run these
files in their entirety. However, specs.py does not have full coverage and requires additional unit
tests. To find out which lines are not yet covered, pytest-cov has a useful feature called cov-report
that creates an HTML file for visualizing the current line coverage.
$ pytest --cov-report html --cov
Instead of printing coverage statistics, this command creates various files with coverage details
in a new directory called htmlcov/. The file htmlcov/specs_py.html, which can be viewed in an
internet browser, highlights in red the lines of specs.py that are not yet covered by any unit tests.
Note
Statement coverage is categorized as white box testing because it requires an understanding of
the code’s structure. While most black box tests can be written before a program is actually
implemented, white box tests should be added to the collection of unit tests after the program
is completed. By designing unit tests so that they cover every statement in a program, you
may discover that some lines of code are unreachable, find that a conditional statement isn’t
functioning as intended, or uncover problems that accompany edge cases.
Testing Exceptions
Many programs are designed to raise exceptions in response to bad input or an unexpected error. A
good unit test makes sure that the program raises the exceptions that it is expected to raise, but also
that it doesn’t raise any unexpected exceptions. The raises() method in pytest is a clean, formal
way of asserting that a program raises a desired exception. For example, the following code should
raise a ZeroDivisionError if the divisor is 0.
# specs.py
The corresponding unit test checks that the function raises the ZeroDivisionError correctly.
# test_specs.py
import pytest
import specs

def test_divide():
    assert specs.divide(4,2) == 2, "integer division"
    assert specs.divide(5,4) == 1.25, "float division"
    pytest.raises(ZeroDivisionError, specs.divide, a=4, b=0)

# Alternatively, use pytest.raises() as a context manager to also check the error message.
def test_divide():
    assert specs.divide(4,2) == 2, "integer division"
    assert specs.divide(5,4) == 1.25, "float division"
    with pytest.raises(ZeroDivisionError) as excinfo:
        specs.divide(4, 0)
    assert excinfo.value.args[0] == "second input cannot be zero"
Here excinfo is an object containing information about the exception; the actual exception
object is stored in excinfo.value, and hence excinfo.value.args[0] is the error message.
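The divide() function under test is not reproduced in this section. A minimal sketch that is consistent with the test cases and the error message above might look like the following; the body is an assumption. The try-except at the end mimics by hand what pytest.raises() checks.

```python
def divide(a, b):
    """Return a / b, raising a ZeroDivisionError with a custom message."""
    if b == 0:
        raise ZeroDivisionError("second input cannot be zero")
    return a / b

# A hand-written version of the check that pytest.raises() performs.
try:
    divide(4, 0)
except ZeroDivisionError as e:
    message = e.args[0]      # The first argument is the error message.

print(message)
```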
Problem 3. Write a comprehensive unit test for the following function. Make sure that each
exception is raised properly by explicitly checking the exception message. Use pytest-cov and
its cov-report tool to confirm that you have full coverage for this function.
Fixtures
Consider the following class for representing rational numbers as reduced fractions.
class Fraction(object):
    """Reduced fraction class with integer numerator and denominator."""
    def __init__(self, numerator, denominator):
        if denominator == 0:
            raise ZeroDivisionError("denominator cannot be zero")
        elif type(numerator) is not int or type(denominator) is not int:
            raise TypeError("numerator and denominator must be integers")

        def gcd(a,b):
            while b != 0:
                a, b = b, a % b
            return a
        common_factor = gcd(numerator, denominator)
        self.numer = numerator // common_factor
        self.denom = denominator // common_factor

    def __str__(self):
        if self.denom != 1:
            return "{} / {}".format(self.numer, self.denom)
        else:
            return str(self.numer)

    def __float__(self):
        return self.numer / self.denom
To test this class, it would be nice to have some ready-made Fraction objects to use in each
unit test. A fixture, a function marked with the @pytest.fixture decorator, sets up variables that
can be used as mock data for multiple unit tests. The individual unit tests take the fixture function
in as input and unpack the constructed tests. Below, we define a fixture that instantiates three
Fraction objects. The unit tests for the Fraction class use these objects as test cases.
@pytest.fixture
def set_up_fractions():
    frac_1_3 = specs.Fraction(1, 3)
    frac_1_2 = specs.Fraction(1, 2)
    frac_n2_3 = specs.Fraction(-2, 3)
    return frac_1_3, frac_1_2, frac_n2_3

def test_fraction_init(set_up_fractions):
    frac_1_3, frac_1_2, frac_n2_3 = set_up_fractions
    assert frac_1_3.numer == 1
    assert frac_1_2.denom == 2
    assert frac_n2_3.numer == -2
    frac = specs.Fraction(30, 42)           # 30/42 reduces to 5/7.
    assert frac.numer == 5
    assert frac.denom == 7

def test_fraction_str(set_up_fractions):
    frac_1_3, frac_1_2, frac_n2_3 = set_up_fractions
    assert str(frac_1_3) == "1 / 3"
    assert str(frac_1_2) == "1 / 2"
    assert str(frac_n2_3) == "-2 / 3"

def test_fraction_float(set_up_fractions):
    frac_1_3, frac_1_2, frac_n2_3 = set_up_fractions
    assert float(frac_1_3) == 1 / 3.
    assert float(frac_1_2) == .5
    assert float(frac_n2_3) == -2 / 3.

def test_fraction_eq(set_up_fractions):
    frac_1_3, frac_1_2, frac_n2_3 = set_up_fractions
    assert frac_1_2 == specs.Fraction(1, 2)
    assert frac_1_3 == specs.Fraction(2, 6)
    assert frac_n2_3 == specs.Fraction(8, -12)
Problem 4. Add test cases to the unit tests provided above to get full coverage for the
__init__(), __str__(), __float__(), and __eq__() methods. You may modify the fix-
ture function if it helps. Also add unit tests for the magic methods __add__(), __sub__(),
__mul__(), and __truediv__(). Verify that you have full coverage with pytest-cov.
Additionally, two of the Fraction class’s methods are implemented incorrectly. Use your
tests to find the issues, then correct the methods so that your tests pass.
Test-driven Development
Test-driven development (TDD) is the programming style of writing tests before implementing the
actual code. It may sound tedious at first, but TDD incentivizes simple design and implementation,
speeds up the actual coding, and gives quantifiable checkpoints for the development process. TDD
can be summarized in the following steps:
1. Define with great detail the program specifications. Write function declarations, class defini-
tions, and especially docstrings, determining exactly what each function or class method should
accept and return.
2. Write a unit test for each unit of the program, usually black box tests.
3. Implement the program code, making changes until all tests pass.
For adding new features or cleaning existing code, the process is similar.
1. Redefine program specifications to account for planned modifications.
2. Add or modify tests to match the new specifications.
3. Change the code until all tests pass.
If the test cases are sufficiently thorough, then when the tests all pass, the program can be
considered complete. Remember, however, that it is not sufficient to just have tests, but to have
tests that accurately and rigorously test the code. To check that the test cases are sufficient, examine
the test coverage and add additional tests if necessary.
See https://en.wikipedia.org/wiki/Test-driven_development for more discussion on TDD
and https://en.wikipedia.org/wiki/Behavior-driven_development for an overview of
Behavior-driven development (BDD), a close relative of TDD.
Problem 5. Set is a card game about finding patterns. Each card contains a design with 4
different properties: color (red, green or purple), shape (diamond, oval or squiggly), quantity
(one, two, or three) and pattern (solid, striped or outlined). A set is a group of three cards
which are either all the same or all different for each property. You can try playing Set online
at http://smart-games.org/en/set/start.
(a) Same in quantity and shape; different in pattern and color
(b) Same in color and pattern; different in shape and quantity
(c) Same in pattern; different in shape, quantity and color
(d) Same in shape; different in quantity, pattern and color
Each Set card can be uniquely represented by a 4-bit integer in base 3,a where each digit
represents a different property and each property has three possible values. A full hand in Set is
a group of twelve unique cards, so a hand can be represented by a list of twelve 4-digit integers
in base 3. For example, the hand shown above could be represented by the following list.
The following function definitions provide a framework for partially implementing Set by
calculating the number of sets in a given hand.
def count_sets(cards):
    """Return the number of sets in the provided Set hand.

    Parameters:
        cards (list(str)): a list of twelve cards as 4-bit integers in
        base 3 as strings, such as ["1022", "1122", ..., "1020"].
    Returns:
        (int) The number of sets in the hand.
    Raises:
        ValueError: if the list does not contain a valid Set hand, meaning
            - there are not exactly 12 cards,
            - the cards are not all unique,
            - one or more cards does not have exactly 4 digits, or
            - one or more cards has a character other than 0, 1, or 2.
    """
    pass
Parameters:
a, b, c (str): string representations of 4-bit integers in base 3.
For example, "1022", "1122", and "1020" (which is not a set).
Returns:
True if a, b, and c form a set, meaning the ith digit of a, b,
and c are either the same or all different for i=1,2,3,4.
False if a, b, and c do not form a set.
"""
pass
Write unit tests for these functions, but do not implement them yet. Focus on what the
functions should do rather than on how they will be implemented.
(Hint: if three cards form a set, then the first digits of the cards are either all the same or all
different. Then the sums of these digits can only be 0, 3, or 6. Thus, a group of cards forms a
set only if for each set of digits—first digits, second digits, etc.—the sum is a multiple of 3.)
a A 4-bit integer in base 3 contains four digits that are either 0, 1 or 2. For example, 0000 and 1201 are 4-bit
integers in base 3, whereas 000 is not because it has only three digits, and 0123 is not because it contains the
number 3.
Problem 6. After you have written unit tests for the functions in Problem 5, implement the
actual functions. If needed, add additional test cases to get full coverage.
(Hint: The combinations() function from the standard library module itertools may be
useful in implementing count_sets().)
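As a brief illustration of combinations() on its own (the four-letter list is an arbitrary example, not a Set hand), the function yields every unordered selection of a given size, each exactly once.

```python
from itertools import combinations

items = ["A", "B", "C", "D"]
triples = list(combinations(items, 3))   # Every way to choose 3 of the 4 items.
print(triples)

# The number of 3-card groups that can be drawn from 12 cards.
num_groups = len(list(combinations(range(12), 3)))
print(num_groups)
```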
Additional Material
The Python Debugger
Python has a built-in debugger called pdb to aid in finding mistakes in code during execution. The
debugger can be run either in a terminal or in a Jupyter Notebook.
A break point, set with pdb.set_trace(), is a spot where the program pauses execution. Once
the program is paused, use the following commands to tell the program what to do next.
Command Description
n next: executes the next line
p <var> print: display the value of the specified variable.
c continue: stop debugging and run the program normally to the end.
q quit: terminate the program.
l list: show several lines of code around the current line.
r return: return to the end of a subroutine.
<Enter> Execute the most recent command again.
For example, suppose we have a long loop where the value of a variable changes unpredictably.
# pdb_example.py
import pdb
from random import randint

i = 0
pdb.set_trace()                         # Set a break point.
while i < 1000000000:
    i += randint(1, 10)
print("DONE")
$ python pdb_example.py
> /Users/Student/pdb_example.py(7)<module>()
-> while i < 1000000000:
(Pdb) l # Show where we are.
2 import pdb
3 from random import randint
4
5 i = 0
6 pdb.set_trace()
7 -> while i < 1000000000:
8 i += randint(1, 10)
9 print("DONE")
[EOF]
We can check the value of the variable i at any step with p i, and we can even change the
value of i mid-program.
Lab Objective: This lab demonstrates how to communicate information through clean, concise,
and honest data visualization. We recommend completing the exercises in a Jupyter Notebook.
Problem 1. The file anscombe.npy contains the quartet of data points shown in the table
below. For each section of the quartet,
• Plot the data as a scatter plot on the box [0, 20] × [0, 13].
• Use scipy.stats.linregress() to calculate the slope and intercept of the least squares
regression line for the data and its correlation coefficient (the first three return values).
• Plot the least squares regression line over the scatter plot on the domain x ∈ [0, 20].
• Report (print) the mean and variance in x and y, the slope and intercept of the regression
line, and the correlation coefficient. Compare these statistics to those of the other sections.
• Describe how the section is similar to the others and how it is different.
I II III IV
x y x y x y x y
10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04
6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50
12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89
118 Lab 8. Data Visualization
Line Plots
Figure 8.1: Line plots can be used to visualize and compare mathematical functions. For example,
this figure shows the first nine Chebyshev polynomials in one plot (left) and small multiples (right).
Using small multiples makes comparison easy and shows how each polynomial changes as n increases.
A line plot connects ordered (x, y) points with straight lines, and is best for visualizing one or
two ordered arrays, such as functional outputs over an ordered domain or a sequence of values over
time. Sometimes, plotting multiple lines on the same plot helps the viewer compare two different
data sets. However, plotting several lines on top of each other makes the visualization difficult to
read, even with a legend. For example, Figure 8.1 shows the first nine Chebyshev polynomials, a
family of orthogonal polynomials that satisfies the recursive relation
$T_0(x) = 1,\quad T_1(x) = x,\quad T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x).$
The plot on the right makes comparison easier by using small multiples. Instead of using a legend,
the figure makes a separate subplot with a title for each polynomial. Adjusting the figure size and
the line thickness also makes the information easier to read.
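The small-multiples layout can be sketched as follows. This is a minimal sketch that assumes the standard Chebyshev recurrence $T_0 = 1$, $T_1 = x$, $T_{n+1} = 2xT_n - T_{n-1}$; the helper names chebyshev_values() and plot_small_multiples() and the 3 x 3 layout are illustrative choices, not the lab's own code.

```python
import numpy as np
from matplotlib import pyplot as plt

def chebyshev_values(n, x):
    """Evaluate the Chebyshev polynomial T_n at the points in x using the
    recurrence T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x) (hypothetical helper)."""
    previous, current = np.ones_like(x), x.copy()   # T_0 and T_1.
    if n == 0:
        return previous
    for _ in range(n - 1):                          # Step from T_1 up to T_n.
        previous, current = current, 2*x*current - previous
    return current

def plot_small_multiples():
    """Plot T_0 through T_8, one polynomial per subplot."""
    x = np.linspace(-1, 1, 200)
    fig, axes = plt.subplots(3, 3, figsize=(9, 9), sharex=True, sharey=True)
    for n, ax in enumerate(axes.flat):
        ax.plot(x, chebyshev_values(n, x), lw=2)    # Thicker line for readability.
        ax.set_title("n = {}".format(n))            # A title replaces the legend.
    fig.tight_layout()
    return fig
```

Call plot_small_multiples() and then plt.show() to display the figure. Sharing the axes keeps every subplot on the same scale, which is what makes the comparison fair.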
Note
Matplotlib titles and annotations can be formatted with LATEX, a system for creating technical
documents.a To do so, use an r before the string quotation mark and surround the text with
dollar signs. For example, add the following line of code to the loop from the previous example.
... plt.title(r"$T_{}(x)$".format(n))
The format() method inserts the input n at the curly braces. The title of the sixth
subplot, instead of being “n = 5,” will then be “T5 (x).”
a See http://www.latex-project.org/ for more information.
Bar Charts
[Figure: a vertical bar chart (left) and a horizontal bar chart (right) of the categories Spam, Eggs, Hannibal Ham, Smoked Sausage, Crispy Bacon, Baked Beans, and Lobster Thermador, with values from 0 to 20.]
Figure 8.2: Bar charts are used to compare quantities between categorical variables. The labels on
the vertical bar chart (left) are more difficult to read than the labels on the horizontal bar chart
(right). Although the labels can be rotated, horizontal text is much easier to read than vertical text.
A bar chart plots categorical data in a sequence of bars. Bar charts are best for small, discrete,
one-dimensional data sets. In Matplotlib, plt.bar() creates a vertical bar chart and plt.barh() creates
a horizontal bar chart. These functions receive the locations of the bars followed by the size of
each bar (as lists or arrays). In most situations, horizontal bar charts are preferable to vertical bar
charts because horizontal labels are easier to read than vertical labels. Data in a bar chart should
also be sorted in a logical way, such as alphabetically, by size, or by importance.
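As a rough illustration of these guidelines, the following sketch draws a horizontal bar chart sorted by size. The function name sorted_barh() is hypothetical, the category names come from Figure 8.2, and the values are made up for the example.

```python
import numpy as np
from matplotlib import pyplot as plt

def sorted_barh(labels, values):
    """Draw a horizontal bar chart with the bars sorted by size."""
    labels, values = np.array(labels), np.array(values)
    order = np.argsort(values)              # Indices that sort the values.
    positions = np.arange(len(values))      # Location of each bar.
    fig, ax = plt.subplots()
    ax.barh(positions, values[order], align="center")
    ax.set_yticks(positions)
    ax.set_yticklabels(labels[order])       # Horizontal labels are easy to read.
    return fig, ax

# Category names from Figure 8.2; the values are invented for illustration.
labels = ["Lobster Thermador", "Baked Beans", "Crispy Bacon", "Smoked Sausage",
          "Hannibal Ham", "Eggs", "Spam"]
values = [12, 7, 15, 9, 4, 18, 20]
fig, ax = sorted_barh(labels, values)
```

Because plt.barh() draws from the bottom up, sorting in ascending order puts the largest bar at the top of the chart.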
Histograms
Figure 8.3: Histograms are used to show the distribution of one-dimensional data. Experimenting
with different values for the bin size is important when plotting a histogram. Using only 10 bins
(left) doesn’t give a good sense for how the randomly generated data is distributed. However, using
35 bins (right) reveals the shape of a normal distribution.
A histogram partitions an interval into a number of bins and counts the number of values
that fall into each bin. Histograms are ideal for visualizing how unordered data in a single array
is distributed over an interval. For example, if data are drawn from a probability distribution, a
histogram approximates the distribution’s probability density function. Use plt.hist() to create
a histogram. The arguments bins and range specify the number of bins to draw and over what
domain. A histogram with too few or too many bins will not give a clear view of the distribution.
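The effect of the bins argument can be explored with a short experiment like the one below. This is only a sketch: the helper compare_bins() is a hypothetical name, and the data are freshly generated standard normal samples rather than the data behind Figure 8.3.

```python
import numpy as np
from matplotlib import pyplot as plt

def compare_bins(data, bin_counts=(10, 35), limits=(-3, 3)):
    """Plot one histogram of the same data for each bin count."""
    fig, axes = plt.subplots(1, len(bin_counts), figsize=(10, 4))
    all_counts = []
    for ax, bins in zip(np.atleast_1d(axes), bin_counts):
        # plt.hist() returns the bin counts, the bin edges, and the patches.
        counts, edges, _ = ax.hist(data, bins=bins, range=limits)
        ax.set_title("{} bins".format(bins))
        all_counts.append(counts)
    return fig, all_counts

rng = np.random.default_rng(0)
data = rng.normal(size=10000)               # Samples from a standard normal.
fig, all_counts = compare_bins(data)
```

Both histograms count exactly the same points; only the partition of the interval changes.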
Scatter Plots
Figure 8.4: Scatter plots show correlations between variables by plotting markers at coordinate
points. The figure above displays randomly perturbed data that is visualized using two scatter plots
with alpha=.5 and edgecolor='none'. The default axes (left) make it harder to see the correlation and
pattern, whereas making the axes equal (right) better reveals the oscillatory behavior in the perturbed
sine wave.
>>> np.random.seed(0)
>>> x = np.linspace(0,10*np.pi,200) + np.random.normal(size=200)
>>> y = np.sin(x) + np.random.normal(size=200)
A scatter plot draws (x, y) points without connecting them. Scatter plots are best for displaying
data sets without a natural order, or where each point is a distinct, individual instance. They are
frequently used to show correlation between variables in a data set. Use plt.scatter() to create a
scatter plot.1
1 Scatter plots can also be drawn with plt.plot() by specifying a point marker such as '.', ',', 'o', or '+'. The
keywords markersize and color can be used to change the marker size and marker color, respectively.
Similar data points in a scatter plot may overlap, as in Figure 8.4. Specifying an alpha value
reveals overlapping data by making the markers transparent (see Figure 8.5 for an example). The
keyword alpha accepts values between 0 (completely transparent) and 1 (completely opaque). When
plotting lots of overlapping points, the outlines on the markers can make the visualization look
cluttered. Setting the edgecolor keyword to 'none' removes the outline and improves the visualization.
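A minimal sketch of these keywords, reusing the perturbed sine data defined earlier in this section. The two-panel layout is an illustrative choice.

```python
import numpy as np
from matplotlib import pyplot as plt

# The same perturbed sine data used for Figure 8.4.
np.random.seed(0)
x = np.linspace(0, 10*np.pi, 200) + np.random.normal(size=200)
y = np.sin(x) + np.random.normal(size=200)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(x, y)                               # Opaque markers with outlines.
axes[1].scatter(x, y, alpha=.5, edgecolor='none')   # Transparent markers, no outlines.
axes[1].axis("equal")                               # Same scale on both axes.
```

Overlapping transparent markers appear darker, so dense regions stand out without any extra computation.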
Problem 3. The file MLB.npy contains measurements from over 1,000 recent Major League
Baseball players, compiled by UCLA.a Each row in the array represents a player; the columns
are the player’s height (in inches), weight (in pounds), and age (in years), in that order.
Create several visualizations to show the correlations between height, weight, and age in
the MLB data set. Use at least one scatter plot. Adjust the marker size, plot a regression line,
change the window limits, and use small multiples where appropriate.
a See http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights.
Problem 4. The file earthquakes.npy contains data from over 17,000 earthquakes from the
beginning of 2000 to the end of 2009 that were at least a 5 on the Richter scale.a Each row
in the array represents an earthquake; the columns are the earthquake’s date (as a fraction of
the year, so March 14 2001 would be 2001.2), magnitude (on the Richter scale), longitude, and
latitude, in that order.
Because each earthquake is a distinct event, a good way to start visualizing this data
might be a scatter plot of the years versus the magnitudes of each earthquake.
[Figure: scatter plot of earthquake magnitude (from 5 to 8) against year (from 2000 to 2010).]
Unfortunately, this plot communicates very little information because the data is so clut-
tered. Describe the data with at least two better visualizations. Include line plots, scatter plots,
and histograms as appropriate. Your plots should answer the following questions:
Hexbins
Figure 8.5: Hexbins can be used instead of using a three-dimensional histogram to show the distribu-
tion of two-dimensional data. Choosing the right gridsize will give a better picture of the distribution.
The figure above shows random data plotted as hexbins with a gridsize of 10 (left) and 25 (right).
Hexbins use color to show height via a colormap and both histograms above use the 'inferno'
colormap.
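Hexbin plots like these can be drawn with plt.hexbin(). The sketch below uses freshly generated random data (the distribution parameters are made up, not the data behind Figure 8.5) and compares the two grid sizes mentioned in the caption.

```python
import numpy as np
from matplotlib import pyplot as plt

# Made-up two-dimensional data for illustration.
rng = np.random.default_rng(0)
x = rng.normal(loc=2, scale=3, size=5000)
y = rng.normal(loc=4, scale=3, size=5000)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, gridsize in zip(axes, (10, 25)):
    # Each hexagon is colored by the number of points that fall inside it.
    hexes = ax.hexbin(x, y, gridsize=gridsize, cmap="inferno")
    ax.set_title("gridsize = {}".format(gridsize))
    fig.colorbar(hexes, ax=ax)              # Show what the colors mean.
```

A small gridsize hides the shape of the distribution, while a very large one leaves most hexagons nearly empty.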
Figure 8.6: Heat maps visualize three-dimensional functions or surfaces by using color to represent
the value in one dimension. With continuous data, it can be hard to identify regions of interest.
Contour plots solve this problem by visualizing the level curves of the surface. Top left: heat map.
Top right: contour plot. Bottom left: filled contour map. Bottom right: contours plotted on a heat
map.
# Plot f using a heat map, a contour map, and a filled contour map.
>>> fig, ax = plt.subplots(2,2)
>>> ax[0,0].pcolormesh(X, Y, Z, cmap="viridis") # Heat map.
>>> ax[0,1].contour(X, Y, Z, 6, cmap="viridis") # Contour map.
>>> ax[1,0].contourf(X, Y, Z, 12, cmap="magma") # Filled contour map.
>>> plt.show()
When plotting hexbins, heat maps, and contour plots, be sure to choose a colormap that best
represents the data. Avoid using spectral or rainbow colormaps like "jet" because they are not
perceptually uniform, meaning that the rate of change in color is not constant. Because of this, data
points may appear to be closer together or farther apart than they actually are. This creates visual
false positives or false negatives in the visualization and can affect the interpretation of the data.
As a default, we recommend using the sequential colormaps "viridis" or "inferno" because they
are designed to be perceptually uniform and colorblind friendly. For the complete list of Matplotlib
color maps, see http://matplotlib.org/examples/color/colormaps_reference.html.
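The heat map snippet above assumes that X, Y, and Z already exist. The following self-contained sketch fills in those pieces with a hypothetical function $f(x, y) = y^2 - x^3 + x^2$ (an assumption chosen only for illustration) and also draws contours on top of a heat map, using the recommended sequential colormaps.

```python
import numpy as np
from matplotlib import pyplot as plt

# Build a 2-D domain and evaluate a hypothetical function on it.
x = np.linspace(-1.5, 1.5, 200)
X, Y = np.meshgrid(x, x)
Z = Y**2 - X**3 + X**2                      # Assumed example function.

fig, ax = plt.subplots(2, 2, figsize=(8, 8))
ax[0,0].pcolormesh(X, Y, Z, cmap="viridis")         # Heat map.
ax[0,1].contour(X, Y, Z, 6, cmap="viridis")         # Contour map.
ax[1,0].contourf(X, Y, Z, 12, cmap="magma")         # Filled contour map.
heat = ax[1,1].pcolormesh(X, Y, Z, cmap="magma")    # Heat map...
ax[1,1].contour(X, Y, Z, 12, colors="black", linewidths=.5)  # ...with contours.
fig.colorbar(heat, ax=ax[1,1])
```

Drawing the contours in a single neutral color keeps them from competing with the colormap underneath.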
The minimum value of f is 0, which occurs at the point (1, 1) at the bottom of a steep, banana-
shaped valley of the function.
Use a heat map and a contour plot to visualize the Rosenbrock function. Also plot the
minimizer (1, 1). Use a different sequential colormap for each visualization.
Best Practices
Good scientific visualizations make comparison easy and clear. The eye is very good at detecting
variation in one dimension and poor in two or more dimensions. For example, consider Figure 8.7.
Despite the difficulty, most people can probably guess which slice of a pie chart is the largest or
smallest. However, it’s almost impossible to confidently answer the question by how much? The bar
charts may not be as aesthetically pleasing but they make it much easier to precisely compare the
data. Avoid using pie charts as well as other visualizations that make accurate comparison difficult,
such as radar charts, bubble charts, and stacked bar charts.
Figure 8.7: The pie charts on the left may be more colorful but it’s extremely difficult to quantify
the difference between each slice. Instead, the horizontal bar charts on the right make it very easy
to see the difference between each variable.
No visualization perfectly represents data, but some are better than others. Finding the best
visualization for a data set is an iterative process. Experiment with different visualizations by ad-
justing their parameters: color, scale, size, shape, position, and length. It may be necessary to use a
data transformation or visualize various subsets of the data. As you iterate, keep in mind the saying
attributed to George Box: “All models are wrong, but some are useful.” Do whatever is needed to
make the visualization useful and effective.
Figure 8.8: Chartjunk refers to anything that does not communicate data. In the image on the
left, the cartoon monster distorts the bar chart and manipulates the feelings of the viewer to think
negatively about the results. The image on the right shows the same data without chartjunk, making
it simple and very easy to interpret the data objectively.
Good visualizations are as simple as possible and no simpler. Edward Tufte coined the term
chartjunk to mean anything (pictures, icons, colors, and text) that does not represent data or is
distracting. Though chartjunk might appear to make data graphics more memorable than plain
visualizations, it is more important to be clear and precise in order to prevent misin-
terpretation. The physicist Richard Feynman said, “For a successful technology, reality must take
precedence over public relations, for Nature cannot be fooled.” Remove chartjunk and anything that
prevents the viewer from objectively interpreting the data.
[Figure: number of procedures (y-axis, 250,000 to 1,500,000) by year (x-axis, 2006 to 2013). Source: Americans United for Life.]
Figure 8.9: The chart on the left is an example of a dishonest graphic shown at a United States
congressional hearing in 2015. The chart on the right shows a more accurate representation of the
data by showing the y-axis and revealing the missing data from 2008. Source: PolitiFact.
Visualizations should be honest. Figure 8.9 shows how visualizations can be dishonest. The
misleading graphic on the left was used as evidence in a United States congressional hearing in 2015.
With the y-axis completely removed, it is easy to miss that each line is shown on a different y-axis
even though they are measured in the same units. Furthermore, the chart fails to indicate that data
is missing from the year 2008. The graphic on the right shows a more accurate representation of the
data.2
Never use data visualizations to deceive or manipulate. Always present information on who
created it, where the data came from, how it was collected, whether it was cleaned or transformed,
and whether there are conflicts of interest or possible biases present. Use specific titles and axis
labels, and include units of measure. Choose an appropriate window size and use a legend or other
annotations where appropriate.
Problem 6. The file countries.npy contains information from 20 different countries. Each
row in the array represents a different country; the columns are the 2015 population (in millions
of people), the 2015 GDP (in billions of US dollars), the average male height (in centimeters),
and the average female height (in centimeters), in that order.a
The corresponding countries are listed below in order.
oct/01/jason-chaffetz/chart-shown-planned-parenthood-hearing-misleading-/.
Visualize this data set with at least four plots, using at least one scatter plot, one his-
togram, and one bar chart. List the major insights that your visualizations reveal.
(Hint: consider using np.argsort() and fancy indexing to sort the data for the bar chart.)
a See https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal),
https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population, and
http://www.averageheight.co/.
For more about data visualization, we recommend the following books and websites.
• %time: Execute some code and print out its execution time.
• %timeit: Execute some code several times and print out the average execution time.
• %prun: Run a statement through the Python code profiler,1 printing the number of function
calls and the time each takes. We will demonstrate this tool a little later.
# Time the same list construction, but with a regular for loop.
In [2]: %%time                      # Use a double %% to time a block of code.
   ...: x = []
   ...: for i in range(int(1e5)):
   ...:     x.append(i**2)
   ...:
CPU times: user 50 ms, sys: 2.79 ms, total: 52.8 ms
Wall time: 55.2 ms                  # The list comprehension is faster!
132 Lab 9. Profiling
   3
  7 4
 2 4 6
8 5 9 3
def max_path(filename="triangle.txt"):
    """Find the maximum vertical path in a triangle of values."""
    with open(filename, 'r') as infile:
        data = [[int(n) for n in line.split()]
                        for line in infile.readlines()]

    def path_sum(r, c, total):
        """Recursively compute the max sum of the path starting in row r
        and column c, given the current total.
        """
        total += data[r][c]
        if r == len(data) - 1:          # Base case.
            return total
        else:                           # Recursive case.
            return max(path_sum(r+1, c,   total),   # Next row, same column.
                       path_sum(r+1, c+1, total))   # Next row, next column.

    # Start the recursion from the top of the triangle.
    return path_sum(0, 0, 0)
The data in triangle.txt contains 15 rows and hence 16384 paths, so it is possible to
solve this problem by trying every route. However, for a triangle with 100 rows, there are $2^{99}$
paths to check, which would take billions of years to compute even for a program that could
check one trillion routes per second. No amount of improvement to max_path() can make it
run in an acceptable amount of time on such a triangle; a different algorithm is needed.
Write a function that accepts a filename containing a triangle of integers. Compute the
largest path sum with the following strategy: starting from the next to last row of the triangle,
replace each entry with the sum of the current entry and the greater of the two “child entries.”
Continue this replacement up through the entire triangle. The top entry in the triangle will be
the maximum path sum. In other words, work from the bottom instead of from the top.
   3              3               3               23
  7 4            7 4            20 19           20 19
 2 4 6    -->  10 13 15   -->  10 13 15   -->  10 13 15
8 5 9 3        8 5 9 3         8 5 9 3         8 5 9 3
Use your function to find the maximum path sum of the 100-row triangle stored in
triangle_large.txt. Make sure that your new function still gets the correct answer for the
smaller triangle.txt. Finally, use %time or %timeit to time both functions on triangle.txt.
Your new function should be about 100 times faster than the original.
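The bottom-up strategy can be sketched on the small four-row triangle from the diagram. To keep the example self-contained, this sketch takes the triangle as a list of rows instead of a filename; the function name max_path_bottom_up() is hypothetical, and reading the data from a file is left as described in the problem.

```python
def max_path_bottom_up(triangle):
    """Return the maximum top-to-bottom path sum of a triangle that is
    given as a list of rows (a sketch of the bottom-up strategy)."""
    rows = [list(row) for row in triangle]      # Copy so the input is unchanged.
    for r in range(len(rows) - 2, -1, -1):      # Next-to-last row up to the top.
        for c in range(len(rows[r])):
            # Add the greater of the two child entries to the current entry.
            rows[r][c] += max(rows[r+1][c], rows[r+1][c+1])
    return rows[0][0]                           # The top entry is the answer.

small = [[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]
print(max_path_bottom_up(small))                # 23
```

Each entry is visited once, so the work grows with the number of entries in the triangle rather than with the number of paths.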
The Profiler
The profiling command %prun lists the functions that are called during the execution of a piece of
code, along with the following information.
Heading          Description
primitive calls  The number of calls that were not caused by recursion.
ncalls           The number of calls to the function. If recursion occurs, the output
                 is <total number of calls>/<number of primitive calls>.
tottime          The amount of time spent in the function, not including calls to other functions.
percall          The amount of time spent in each call of the function.
cumtime          The amount of time spent in the function, including calls to other functions.
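The %prun command is only available in IPython. In a plain Python script, a similar table can be produced with the standard library's cProfile and pstats modules. The sketch below profiles a small recursive function (fib() is a made-up example, not part of the lab) so that the ncalls column shows both total and primitive calls.

```python
import cProfile
import io
import pstats

def fib(n):
    """Naive recursive Fibonacci, used only to generate recursive calls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Collect profiling data while the function runs.
profiler = cProfile.Profile()
profiler.enable()
fib(15)
profiler.disable()

# Format the results, sorted by cumulative time, into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In the printed table, the ncalls entry for fib() has the form total/primitive because every call after the first is recursive.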
Avoid Repetition
A clean program does no more work than is necessary. The ncalls column of the profiler output is
especially useful for identifying parts of a program that might be repetitive. For example, the profile
of max_path() indicates that len() was called 32,767 times—exactly as many times as path_sum().
This is an easy fix: save len(data) as a variable somewhere outside of path_sum().
Note that the total number of primitive function calls decreased from 49,181 to 16,415. Using
%timeit also shows that the run time decreased by about 15%. Moving code outside of a loop or an
often-used function usually results in a similar speedup.
Another important way of reducing repetition is carefully controlling loop conditions to avoid
unnecessary iterations. Consider the problem of identifying Pythagorean triples, sets of three distinct
integers a < b < c such that $a^2 + b^2 = c^2$. The following function identifies all such triples where
each term is less than a parameter N by checking all possible triples.
Since a < b < c by definition, any computations where b ≤ a or c ≤ b are unnecessary.
Additionally, once a and b are chosen, c can be no greater than $\sqrt{a^2 + b^2}$. The following function
changes the loop conditions to avoid these cases and takes care to only compute $a^2 + b^2$ once for each
unique pairing (a, b).
These improvements have a drastic impact on run time, even though the main approach—
checking by brute force—is the same.
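The two approaches described above can be sketched as follows. The function names are hypothetical, and the code is one possible reading of the description rather than the lab's own listing.

```python
def pythagorean_triples_slow(N):
    """Find all triples a < b < c < N with a^2 + b^2 = c^2 by checking
    every possible combination of a, b, and c."""
    triples = []
    for a in range(1, N):
        for b in range(1, N):
            for c in range(1, N):
                if a**2 + b**2 == c**2 and a < b < c:
                    triples.append((a, b, c))
    return triples

def pythagorean_triples_fast(N):
    """Find the same triples with tighter loop conditions."""
    triples = []
    for a in range(1, N):
        for b in range(a + 1, N):           # Enforce a < b.
            target = a**2 + b**2            # Computed once per pair (a, b).
            for c in range(b + 1, N):       # Enforce b < c.
                c_squared = c**2
                if c_squared == target:
                    triples.append((a, b, c))
                elif c_squared > target:    # c is already too large.
                    break
    return triples

print(pythagorean_triples_fast(20))
```

Both functions return the same list, but the second one skips every combination that cannot possibly be a triple.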
def primes(N):
    """Compute the first N primes."""
    primes_list = []
    current = 2
    while len(primes_list) < N:
        isprime = True
        for i in range(2, current):     # Check for nontrivial divisors.
            if current % i == 0:
                isprime = False
        if isprime:
            primes_list.append(current)
        current += 1
    return primes_list
This function takes about 6 minutes to find the first 10,000 primes on a fast computer.
Without significantly modifying the approach, rewrite primes() so that it can compute
the first 10,000 primes in under 0.1 seconds. Use the following facts to reduce unnecessary
iterations.
• A number is not prime if it has one or more divisors other than 1 and itself.
(Hint: recall the break statement.)
• If p ∤ n, then ap ∤ n for any integer a. Also, if n has a divisor other than 1 and itself, then it
has a divisor p with 1 < p ≤ √n.
Avoid Loops
NumPy routines and built-in functions are often useful for eliminating loops altogether. Consider
the simple problem of summing the rows of a matrix, implemented in three ways.
None of the functions are fundamentally different, but their run times differ dramatically.
In this experiment, row_sum_fast() runs several hundred times faster than row_sum_awful().
This is primarily because looping is expensive in Python, but NumPy handles loops in C, which is
much quicker. Other NumPy functions like np.sum() with an axis argument can often be used to
eliminate loops in a similar way.
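The three implementations discussed above might look like the following. The names row_sum_awful() and row_sum_fast() come from the text; row_sum_bad() and the function bodies are assumptions made for this sketch.

```python
import numpy as np

def row_sum_awful(A):
    """Sum the rows of A by iterating through rows and columns."""
    m, n = A.shape
    row_totals = np.empty(m)
    for i in range(m):                  # For each row...
        total = 0
        for j in range(n):              # ...add up the entries one at a time.
            total += A[i, j]
        row_totals[i] = total
    return row_totals

def row_sum_bad(A):
    """Sum the rows of A with a list comprehension and the built-in sum()."""
    return np.array([sum(A[i, :]) for i in range(A.shape[0])])

def row_sum_fast(A):
    """Sum the rows of A with NumPy, which runs the loop in C."""
    return np.sum(A, axis=1)

A = np.random.random((100, 100))
print(np.allclose(row_sum_awful(A), row_sum_fast(A)))
```

Timing each function with %timeit on a larger array shows the difference in run time.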
The following function solves this problem naïvely for the usual Euclidean norm.
Write a new version of this function without any loops or list comprehensions, using array
broadcasting and the axis keyword in np.linalg.norm() to eliminate the existing loop. Try
to implement the entire function in a single line.
(Hint: See the NumPy Visual Guide in the Appendix for a refresher on array broadcasting.)
Profile the old and new versions with %prun and compare the output. Finally, use %time
or %timeit to verify that your new version runs faster than the original.
a The nearest neighbor problem is a common problem in many fields of artificial intelligence. The problem
can be solved more efficiently with a k-d tree, a specialized data structure for storing high-dimensional data.
Looking up dictionary values is also almost immediate. Use dictionaries for storing calculations
to be reused, such as mappings between letters and numbers or common function outputs.
• Construction with comprehension. Lists, sets, and dictionaries can all be constructed with
comprehension syntax. This is slightly faster than building the collection in a loop, and the
code is highly readable.
• Intelligent iteration. Unlike looking up dictionary values, indexing into lists takes time.
Instead of looping over the indices of a list, loop over the entries themselves. When indices and
entries are both needed, use enumerate() to get the index and the item simultaneously.
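A short sketch of these three ideas working together. The names letter_values and word_score() and the sample list of names are made up for illustration.

```python
from string import ascii_uppercase

# Dictionary comprehension: map each letter to its position in the alphabet.
letter_values = {letter: i for i, letter in enumerate(ascii_uppercase, start=1)}

def word_score(word):
    """Add up the letter values of a word using dictionary lookups."""
    return sum(letter_values[letter] for letter in word)

# Loop over the entries directly; enumerate() supplies the position.
names = ["COLIN", "ANNA", "BOB"]
for position, name in enumerate(sorted(names), start=1):
    print(position, name, word_score(name))
```

The dictionary is built once and reused, so each letter costs a single lookup instead of a scan through the alphabet.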
def name_scores(filename="names.txt"):
    """Find the total of the name scores in the given file."""
    with open(filename, 'r') as infile:
        names = sorted(infile.read().replace('"', '').split(','))
    total = 0
    for i in range(len(names)):
        name_value = 0
        for j in range(len(names[i])):
            alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            for k in range(len(alphabet)):
                if names[i][j] == alphabet[k]:
                    letter_value = k + 1
            name_value += letter_value
        total += (names.index(names[i]) + 1) * name_value
    return total
Rewrite this function—removing repetition, eliminating loops, and using data structures
correctly—so that it runs in less than 10 milliseconds on average.
Use Generators
A generator is an iterator that yields multiple values, one at a time, as opposed to returning a single
value. The built-in range() behaves similarly: it produces its values on demand instead of storing
them all in a list (strictly speaking, a range object is a lazy sequence rather than a generator). Using
generators appropriately can reduce both the run time and the spatial complexity of a routine.
Consider the following function, which constructs a list containing the entries of the sequence
$\{x_n\}_{n=1}^{N}$ where $x_n = x_{n-1} + n$ with $x_1 = 1$.
A potential problem with this function is that all of the values in the list are computed before
anything is returned. This can be a big issue if the parameter N is large. A generator, on the other
hand, yields one value at a time, indicated by the keyword yield (instead of return). When the
generator is asked for the next entry, the code resumes right where it left off.
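The list-building function and its generator counterpart can be sketched as follows. The name sequence_generator() appears in the text; the name sequence_function() and both function bodies are assumptions made for this sketch.

```python
def sequence_function(N):
    """Return the first N entries of the sequence 1, 3, 6, 10, 15, ... as a list."""
    sequence = []
    x = 0
    for n in range(1, N + 1):
        x += n                      # x_n = x_{n-1} + n.
        sequence.append(x)
    return sequence                 # Nothing is returned until the list is complete.

def sequence_generator(N):
    """Yield the first N entries of the sequence 1, 3, 6, 10, 15, ... one at a time."""
    x = 0
    for n in range(1, N + 1):
        x += n
        yield x                     # Pause here until the next entry is requested.

print(sequence_function(5))         # [1, 3, 6, 10, 15]
for entry in sequence_generator(5):
    print(entry)
```

The generator never stores more than one entry at a time, so its memory use does not grow with N.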
Many iterables, like range() and sequence_generator(), only yield a finite number of values.
However, generators can also continue yielding indefinitely. For example, the following generator
yields the terms of $\{x_n\}_{n=1}^{\infty}$ forever. In this case, using enumerate() with the generator is helpful
for tracking the index n as well as the entry $x_n$.
# Sum the entries of the sequence until the sum exceeds 1000.
>>> total = 0
>>> for i, x in enumerate(sequence_generator_forever()):
...     total += x
...     if total > 1000:
...         print(i)    # Print the index where the total exceeds.
...         break       # Break out of the for loop to stop iterating.
...
17
# Check that 18 terms are required (since i starts at 0 but n starts at 1).
>>> print(sum(sequence_generator(17)), sum(sequence_generator(18)))
969 1140
Problem 6. The function in Problem 2 could be turned into a prime number generator that
yields primes indefinitely, but it is not the only strategy for yielding primes. The Sieve of
Eratosthenes a is a faster technique for finding all of the primes below a certain number.
2. Remove all integers that are divisible by the first entry in the list.
3. Yield the first entry in the list and remove it from the list.
Write a generator that accepts an integer N and that yields all primes (in order, one at a
time) that are less than N using the Sieve of Eratosthenes. Your generator should be able to
find all primes less than 100,000 in under 5 seconds.
Your generator and your fast function from Problem 2 may be helpful in solving problems
10, 35, 37, 41, 49, and 50 (for starters) of https://projecteuler.net.
a See https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes.
Numba
Python code is simpler and more readable than many languages, but Python is also generally much
slower than compiled languages like C. The numba module bridges the gap by using just-in-time (JIT)
compilation to optimize code, meaning that the code is actually compiled right before execution.
Python is a dynamically typed language, meaning variables are not defined explicitly with a
datatype (x = 6 as opposed to int x = 6). This particular aspect of Python makes it flexible,
easy to use, and slow. Numba speeds up Python code primarily by assigning datatypes to all the
variables. Rather than requiring explicit definitions for datatypes, Numba attempts to infer the
correct datatypes based on the datatypes of the input. In row_sum_numba(), if A is an array of
integers, Numba will infer that total should also be an integer. On the other hand, if A is an array
of floats, Numba will infer that total should be a double (a similar datatype to float in C).
Once all datatypes have been inferred and assigned, the original Python code is translated to
machine code. Numba caches this compiled version of code for later use. The first function call takes
the time to compile and then execute the code, but subsequent calls use the already-compiled code.
# The first function call takes a little extra time to compile first.
In [23]: %time rows = row_sum_numba(A)
CPU times: user 408 ms, sys: 11.5 ms, total: 420 ms
Wall time: 425 ms
Note that the only difference between row_sum_numba() and row_sum_awful() from a few
pages ago is the @jit decorator, and yet the Numba version is about 99% faster than the original!
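A sketch of what row_sum_numba() might look like. The function body is an assumption based on the description of row_sum_awful(), and the fallback definition of jit() is a hypothetical convenience so that the example still runs (without any speedup) when the third-party numba package is not installed.

```python
import numpy as np

try:
    from numba import jit           # Requires the third-party numba package.
except ImportError:
    def jit(*args, **kwargs):
        """Fallback decorator (an assumption for this sketch) that leaves
        the function unchanged when Numba is unavailable."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func

@jit(nopython=True)                 # Raise an error if type inference fails.
def row_sum_numba(A):
    """Sum the rows of A with explicit loops, compiled by Numba."""
    m, n = A.shape
    row_totals = np.empty(m)
    for i in range(m):
        total = 0                   # Numba infers the type of total from A.
        for j in range(n):
            total += A[i, j]
        row_totals[i] = total
    return row_totals

A = np.random.random((1000, 100))
rows = row_sum_numba(A)             # The first call includes compilation time.
```

Timing a second call with %time shows the benefit, since the compiled version is reused.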
The inference engine within Numba does a good job, but it’s not always perfect. Adding
the keyword argument nopython=True to the @jit decorator raises an error if Numba is unable to
convert each variable to explicit datatypes. The inspect_types() method can also be used to check
if Numba is using the desired types.
Alternatively, datatypes can be specified explicitly in the @jit decorator as a dictionary via
the locals keyword argument. Each of the desired datatypes must also be imported from Numba.
While it sometimes results in a speed boost, there is a caveat to specifying the datatypes:
row_sum_numba() no longer accepts arrays that contain anything other than floats. When datatypes
are not specified, Numba compiles a new version of the function each time the function is called with
a different kind of input. Each compiled version is saved, so the function can still be used flexibly.
Plot the times against the size m on a log-log plot with a base 2 scale (use plt.loglog()).
With n = 10, the plot should show that the Numba and NumPy versions far outperform the
pure Python implementation, with NumPy eventually becoming faster than Numba.
Achtung!
Optimizing code is an important skill, but it is also important to know when to refrain from
optimization. The best approach to coding is to write unit tests, implement a solution that
works, test and time that solution, then (and only then) optimize the solution with profiling
techniques. As always, the most important part of the process is choosing the correct algorithm
to solve the problem. Don’t waste time optimizing a poor algorithm.
Additional Material
Other Timing Techniques
Though %time and %timeit are convenient and work well, some problems require more control for
measuring execution time. The usual way of timing a code snippet by hand is via the time module
(which %time uses). The function time.time() returns the number of seconds since the Epoch2 ; to
time code, measure the number of seconds before the code runs, the number of seconds after the
code runs, and take the difference.
The timeit module (which %timeit uses) has tools for running code snippets several times.
The code is passed in as a string, as well as any setup code to be run before starting the clock.
The primary advantages of these techniques are the ability to automate timing code and to
save the results. For more documentation, see https://docs.python.org/3.6/library/time.html
and https://docs.python.org/3.6/library/timeit.html.
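Both techniques can be sketched in a few lines. The helper name time_for_loop() and the choice of 3 repeats of 10 runs are illustrative assumptions; the code being timed is the list construction from the beginning of the lab.

```python
import time
import timeit

def time_for_loop():
    """Time a list construction by hand with time.time()."""
    start = time.time()                     # Seconds since the Epoch, before.
    x = []
    for i in range(int(1e5)):
        x.append(i**2)
    return time.time() - start              # Elapsed seconds.

elapsed = time_for_loop()
print("for loop: {:.4f} seconds".format(elapsed))

# With timeit, the code and the setup code are passed in as strings.
runs = 10
times = timeit.repeat("[i**2 for i in range(N)]", setup="N = int(1e5)",
                      repeat=3, number=runs)
best = min(times) / runs                    # Best average time per run.
print("comprehension: {:.4f} seconds".format(best))
```

Since the results are ordinary floats, they can be stored in a list, compared, or plotted.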
Option Description
-l <limit> Include a limited number of lines in the output.
-s <key> Sort the output by call count, cumulative time, function name, etc.
-T <filename> Save profile results to a file (results are still printed).
2 See https://en.wikipedia.org/wiki/Epoch_(reference_date)#Computing.
10
Introduction to SymPy
148 Lab 10. Introduction to SymPy
SymPy has its own version for each of the standard mathematical functions like sin(x), log(x),
and √x, and includes predefined variables for special numbers such as π. The naming conventions
for most functions match NumPy, but some of the built-in constants are named slightly differently.
Functions   sin(x)     arcsin(x)   sinh(x)     e^x        log(x)     √x
            sy.sin()   sy.asin()   sy.sinh()   sy.exp()   sy.log()   sy.sqrt()

Constants   π          e           i = √−1     ∞
            sy.pi      sy.E        sy.I        sy.oo
Other trigonometric functions like cos(x) follow the same naming conventions. For a complete list of
SymPy functions, see http://docs.sympy.org/latest/modules/functions/index.html.
Achtung!
Always use SymPy functions and constants when creating expressions instead of using NumPy’s
functions and constants. Later we will show how to make NumPy and SymPy cooperate.
>>> x = sy.symbols('x')
>>> np.exp(x) # Try to use NumPy to represent e**x.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Symbol' object has no attribute 'exp'
Note
SymPy defines its own numeric types for integers, floats, and rational numbers. For example,
the sy.Rational class is similar to the standard library’s fractions.Fraction class, and
should be used to represent fractions in SymPy expressions.
>>> x = sy.symbols('x')
>>> (2/3) * sy.sin(x) # 2/3 returns a float, not a rational.
0.666666666666667*sin(x)
Always be aware of which numeric types are being used in an expression. Using rationals and
integers where possible is important for simplifying expressions.
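For comparison, the following sketch builds the same expression with sy.Rational so that the fraction stays symbolic. The variable names are illustrative.

```python
import sympy as sy

x = sy.symbols('x')

# A Python float creeps in when 2/3 is evaluated before SymPy sees it.
float_expr = (2/3) * sy.sin(x)

# sy.Rational keeps the fraction exact.
exact_expr = sy.Rational(2, 3) * sy.sin(x)
print(exact_expr)                       # 2*sin(x)/3

# Dividing a SymPy integer by a Python integer also gives a rational.
also_exact = (sy.Integer(2) / 3) * sy.sin(x)
print(also_exact == exact_expr)         # True
```

Once a float enters an expression it stays there, so it is easiest to build the rational coefficients first.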
Problem 1. Write a function that returns the expression
$\frac{2}{5} e^{x^2 - y} \cosh(x + y) + \frac{3}{7} \log(xy + 1)$
symbolically. Make sure that the fractions remain symbolic.
Simplifying Expressions
The expressions for the summation and product in the previous example are automatically simplified.
More complicated expressions can be simplified with one or more of the following functions.
Function Description
sy.cancel() Cancel common factors in the numerator and denominator.
sy.expand() Expand a factored expression.
sy.factor() Factor an expanded expression.
sy.radsimp() Rationalize the denominator of an expression.
sy.simplify() Simplify an expression.
sy.trigsimp() Simplify only the trigonometric parts of the expression.
>>> x = sy.symbols('x')
>>> expr = (x**2 + 2*x + 1) / ((x+1)*((sy.sin(x)/sy.cos(x))**2 + 1))
>>> print(expr)
(x**2 + 2*x + 1)/((x + 1)*(sin(x)**2/cos(x)**2 + 1))
>>> sy.simplify(expr)
(x + 1)*cos(x)**2
The generic sy.simplify() tries to simplify an expression in any possible way. This is often
computationally expensive; using more specific simplifiers when possible reduces the cost.
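A brief sketch of the more specific simplifiers from the table, each applied to a small made-up expression.

```python
import sympy as sy

x = sy.symbols('x')

# Cancel the common factor (x + 1) from a rational expression.
rational = (x**2 + 2*x + 1) / (x + 1)
print(sy.cancel(rational))              # x + 1

# Simplify only the trigonometric part of an expression.
trig = sy.sin(x)**2 + sy.cos(x)**2
print(sy.trigsimp(trig))                # 1

# Factoring and expanding are inverse operations.
print(sy.factor(x**2 + 2*x + 1))        # (x + 1)**2
print(sy.expand((x + 1)**2))            # x**2 + 2*x + 1
```

Each of these functions does one job, so it is usually cheaper than asking sy.simplify() to try everything.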
Achtung!
1. Simplifications return new expressions; they do not modify existing expressions in place.
2. The == operator compares two expressions for exact structural equality, not algebraic
equivalence. Simplify or expand expressions before comparing them with ==.
3. Expressions containing floats may not simplify as expected. Always use integers and
SymPy rationals in expressions when appropriate.
>>> sy.factor(x**2.0 - 1)
x**2.0 - 1 # Factorization fails due to the 2.0.
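The first two warnings can be checked with a short experiment. The variable names are illustrative.

```python
import sympy as sy

x = sy.symbols('x')
factored = (x + 1)**2
expanded = x**2 + 2*x + 1

# Equal as polynomials, but structurally different.
print(factored == expanded)                     # False

# Expanding first makes the structures match.
print(sy.expand(factored) == expanded)          # True

# Alternatively, check that the difference simplifies to zero.
print(sy.simplify(factored - expanded) == 0)    # True

# sy.expand() returned a new expression; the original is unchanged.
print(factored)                                 # (x + 1)**2
```

Checking that a difference simplifies to zero is often the more reliable test, since two equivalent expressions may not expand to exactly the same form.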
Problem 2. Write a function that symbolically computes and simplifies the following expression.
$\prod_{i=1}^{5} \sum_{j=i}^{5} j(\sin(x) + \cos(x))$
Evaluating Expressions
Every SymPy expression has a subs() method that substitutes one variable for another. The result is
usually still a symbolic expression, even if a numerical value is used in the substitution. The evalf()
method actually evaluates the expression numerically after all symbolic variables have been assigned
a value. Both of these methods can accept a dictionary to reassign multiple symbols simultaneously.
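A brief illustration of subs() and evalf() follows. This is a minimal sketch; the expression and the substituted values are chosen only for the example.

```python
import sympy as sy

x, y = sy.symbols('x y')
expr = sy.sqrt(x) + sy.cos(y)

# Substituting a number for x still leaves a symbolic expression in y.
partial = expr.subs(x, 4)                   # cos(y) + 2

# A dictionary reassigns several symbols at once; the result stays exact.
exact = expr.subs({x: 4, y: sy.pi})         # 2 + cos(pi) = 1

# evalf() returns a floating-point approximation; here the values are
# supplied through its subs keyword argument.
approx = expr.evalf(subs={x: 2, y: 1})      # sqrt(2) + cos(1)
```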
These operations are good for evaluating an expression at a single point, but it is typically more
useful to turn the expression into a reusable numerical function. To this end, sy.lambdify() takes
in a symbolic variable (or list of variables) and an expression, then returns a callable function that
corresponds to the expression.
By default, sy.lambdify() uses the math module to convert an expression to a function. For
example, sy.sin() is converted to math.sin(). By providing "numpy" as an additional argument,
sy.lambdify() replaces symbolic functions with their NumPy equivalents instead, so sy.sin() is
converted to np.sin(). This allows the resulting function to act element-wise on NumPy arrays, not
just on single data points.
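The following sketch shows both uses of sy.lambdify() described above. The expression and the names f and g are illustrative assumptions, not part of the lab.

```python
import numpy as np
import sympy as sy

x = sy.symbols('x')
expr = sy.sin(x)**2

# No module argument: fine for evaluating at single points.
f = sy.lambdify(x, expr)
value = f(0)

# Passing "numpy" makes the function act element-wise on arrays.
g = sy.lambdify(x, expr, "numpy")
values = g(np.array([0, np.pi/2, np.pi]))
```

Lambdifying once and then calling g on a whole array avoids repeating the symbolic substitution for every point.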
Note
It is almost always computationally cheaper to lambdify a function than to use substitutions.
According to the SymPy documentation, using sy.lambdify() to do numerical evaluations
“takes on the order of hundreds of nanoseconds, roughly two orders of magnitude faster than
the subs() method.”
Write a function that accepts an integer N. Define an expression for (10.1), then substitute in
$-y^2$ for x to get a truncated Maclaurin series of $e^{-y^2}$. Lambdify the resulting expression and
plot the series on the domain y ∈ [−2, 2]. Plot $e^{-y^2}$ over the same domain for comparison.
(Hint: use sy.factorial() to compute the factorial.)
Call your function with increasing values of N to check that the series converges correctly.
The curve is not the image of a single function (such a function would fail the vertical line test),
so the best way to plot it is to convert (10.2) to a pair of parametric equations that depend on
the angle parameter θ.
Construct an expression for the nonzero side of (10.2) and convert it to polar coordinates
with the substitutions x = r cos(θ) and y = r sin(θ). Simplify the result, then solve it for r.
There are two solutions due to the presence of an $r^2$ term; pick one and lambdify it to get
a function r(θ). Use this function to plot x(θ) = r(θ) cos(θ) against y(θ) = r(θ) sin(θ) for
θ ∈ [0, 2π].
(Hint: use sy.Rational() for the fractional exponent.)
Linear Algebra
SymPy can also solve systems of equations. A system of linear equations Ax = b is solved in a
slightly different way than in NumPy and SciPy: instead of defining the matrix A and the vector b
separately, define the augmented matrix M = [A | b] and call sy.solve_linear_system() on M .
SymPy matrices are defined with sy.Matrix(), with the same syntax as 2-dimensional NumPy
arrays. For example, the following code solves the system given below.
x + y + z = 5
2x + 4y + 3z = 2
5x + 10y + 2z = 4
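One way to set up and solve this system is sketched below; the variable names M and solution are chosen for the example.

```python
import sympy as sy

x, y, z = sy.symbols('x y z')

# Augmented matrix M = [A | b] for the system above.
M = sy.Matrix([[1,  1, 1, 5],
               [2,  4, 3, 2],
               [5, 10, 2, 4]])

# The solution is returned as a dictionary mapping symbols to exact values.
solution = sy.solve_linear_system(M, x, y, z)
# {x: 98/11, y: -45/11, z: 2/11}
```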
SymPy matrices support the standard matrix operations of addition +, subtraction -, and
multiplication @. Additionally, SymPy matrices are equipped with many useful methods, some of
which are listed below. See http://docs.sympy.org/latest/modules/matrices/matrices.html
for more methods and examples.
Method Returns
det() The determinant.
eigenvals() The eigenvalues and their multiplicities.
eigenvects() The eigenvectors and their corresponding eigenvalues.
inv() The matrix inverse.
is_nilpotent() True if the matrix is nilpotent.
norm() The Frobenius, ∞, 1, or 2 norm.
nullspace() The nullspace as a list of vectors.
rref() The reduced row-echelon form.
singular_values() The singular values.
Achtung!
The * operator performs matrix multiplication on SymPy matrices. To perform element-wise
multiplication, use the multiply_elementwise() method instead.
Problem 5. Find the eigenvalues of the following matrix by solving for λ in the characteristic
equation det(A − λI) = 0.
$$A = \begin{bmatrix} x - y & x & 0 \\ x & x - y & x \\ 0 & x & x - y \end{bmatrix}$$
Also compute the eigenvectors by solving the linear system (A − λI)v = 0 for each eigenvalue λ.
Return a dictionary mapping the eigenvalues to their eigenvectors.
(Hint: the nullspace() method may be useful.)
Check that Av = λv for each eigenvalue-eigenvector pair (λ, v). Compare your results to
the eigenvals() and eigenvects() methods for SymPy matrices.
Calculus
SymPy is also equipped to perform standard calculus operations, including differentiation, integration, and
taking limits. Like other elements of SymPy, calculus operations can be computationally expensive, but
they give exact solutions whenever solutions exist.
Differentiation
The command sy.Derivative() creates a closed form, unevaluated derivative of an expression. This
is like putting $\frac{d}{dx}$ in front of an expression without actually calculating the derivative symbolically.
The resulting expression has a doit() method that can be used to evaluate the actual derivative.
Equivalently, sy.diff() immediately takes the derivative of an expression.
Both sy.Derivative() and sy.diff() accept a single expression, then the variable or variables
that the derivative is being taken with respect to.
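The two approaches can be compared in a small sketch (the function f below is an arbitrary choice for illustration):

```python
import sympy as sy

x, y = sy.symbols('x y')
f = sy.sin(y) * sy.cos(x)**2

df = sy.Derivative(f, x)       # unevaluated derivative with respect to x
dfdx = df.doit()               # evaluate it: -2*sin(x)*sin(y)*cos(x)
same = sy.diff(f, x)           # sy.diff() evaluates immediately
mixed = sy.diff(f, x, y)       # differentiate in x, then in y
```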
Use SymPy to find all critical points of p and classify each as a local minimum or a local
maximum. Plot p(x) over x ∈ [−5, 5] and mark each of the minima in one color and the
maxima in another color. Return the collections of the x-values corresponding to the local
minima and local maxima as two separate sets.
To calculate the Jacobian matrix of a multivariate function with SymPy, define that function
as a symbolic matrix (sy.Matrix()) and use its jacobian() method. The method requires a list of
variables that prescribes the ordering of the differentiation.
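For instance, a sketch with a made-up function from R^2 to R^2 (the components of f are chosen only for illustration):

```python
import sympy as sy

x, y = sy.symbols('x y')

# f(x, y) = (x^2 y, x + sin(y)) written as a symbolic column matrix.
f = sy.Matrix([x**2 * y, x + sy.sin(y)])

# Rows follow the components of f; columns follow the variable list.
J = f.jacobian([x, y])
```

Reordering the list to [y, x] would swap the columns of J.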
Integration
The function sy.Integral() creates an unevaluated integral expression. This is like putting an
integral sign in front of an expression without actually evaluating the integral symbolically or nu-
merically. The resulting expression has a doit() method that can be used to evaluate the actual
integral. Equivalently, sy.integrate() immediately integrates an expression.
Both sy.Integral() and sy.integrate() accept a single expression, then a tuple or tuples
containing the variable of integration and, optionally, the bounds of integration.
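A few calls of both kinds are sketched below; the integrands and bounds are chosen only for the example.

```python
import sympy as sy

x, y = sy.symbols('x y')

unevaluated = sy.Integral(sy.sin(x), (x, 0, sy.pi))     # nothing computed yet
definite = unevaluated.doit()                           # 2
indefinite = sy.integrate(sy.cos(x), x)                 # sin(x), no bounds given
double = sy.integrate(x * y, (x, 0, 1), (y, 0, 2))      # one tuple per variable
```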
Problem 7. Let $f : \mathbb{R}^3 \to \mathbb{R}$ be a smooth function. The volume integral of f over the sphere
S of radius r can be written in spherical coordinates as
$$\iiint_S f(x, y, z)\,dV = \int_0^{\pi}\int_0^{2\pi}\int_0^{r} f(h_1(\rho,\theta,\phi), h_2(\rho,\theta,\phi), h_3(\rho,\theta,\phi))\,|\det(J)|\,d\rho\,d\theta\,d\phi.$$
Calculate the volume integral of $f(x, y, z) = (x^2 + y^2 + z^2)^2$ over the sphere of radius r.
Lambdify the resulting expression (with r as the independent variable) and plot the integral
value for r ∈ [0, 3]. In addition, return the value of the integral when r = 2.
(Hint: simplify the integrand before computing the integral. In this case, $|\det(J)| = -\det(J)$.)
To check your answer, when r = 3, the value of the integral is $\frac{8748}{7}\pi$.
Achtung!
SymPy isn’t perfect. It solves some integrals incorrectly, simplifies some expressions poorly,
and is significantly slower than numerical computations. However, it is generally very useful for
simplifying parts of an algorithm, getting exact answers, and handling tedious algebra quickly.
Additional Material
Pretty Printing
SymPy expressions, especially complicated ones, can be hard to read. Calling sy.init_printing()
changes the way that certain expressions are displayed to be more readable; in a Jupyter Notebook,
the rendering is done with LaTeX, as displayed below. Furthermore, the function sy.latex() converts
an expression into actual LaTeX code for use in other settings.
Limits
Limits can be expressed, similar to derivatives or integrals, with sy.Limit(). Alternatively,
sy.limit() (lowercase) evaluates a limit directly.
Use limits instead of the subs() method when the value to be substituted is ∞ or is a singularity.
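The difference between a limit and a plain substitution shows up in a small sketch (the expressions are chosen only for illustration):

```python
import sympy as sy

x = sy.symbols('x')
expr = sy.sin(x) / x

unevaluated = sy.Limit(expr, x, 0)          # symbolic, not yet computed
at_zero = unevaluated.doit()                # 1
direct = sy.limit(expr, x, 0)               # 1, computed immediately
at_infinity = sy.limit((2*x + 1) / (x - 3), x, sy.oo)   # 2

# Substituting directly at the singularity does not give the limit.
naive = expr.subs(x, 0)                     # nan
```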
Numerical Integration
Many integrals cannot be solved analytically. As an alternative to the doit() method, the as_sum()
method approximates the integral with a summation. This method accepts the number of terms
to use and a string indicating which approximation rule to use ("left", "right", "midpoint", or
"trapezoid").
>>> x = sy.symbols('x')
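A small sketch of as_sum() on an integral whose exact value is known is given below. The integrand and the number of terms are chosen only for illustration.

```python
import sympy as sy

x = sy.symbols('x')
integral = sy.Integral(x**2, (x, 0, 1))

exact = integral.doit()                         # 1/3
midpoint = integral.as_sum(2, "midpoint")       # two-term midpoint rule: 5/16
trapezoid = integral.as_sum(2, "trapezoid")     # two-term trapezoid rule: 3/8
```

Increasing the number of terms brings both approximations closer to the exact value.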
Differential Equations
SymPy can be used to solve both ordinary and partial differential equations. The documentation for
working with PDE functions is at http://docs.sympy.org/dev/modules/solvers/pde.html.
The general form of a first-order differential equation is $\frac{dx}{dt} = f(x(t), t)$. To represent the
unknown function x(t), use sy.Function(). Just as sy.solve() is used to solve an expression for
a given variable, sy.dsolve() solves an ODE for a particular function. When there are multiple
solutions, sy.dsolve() returns a list; when arbitrary constants are involved they are given as C1, C2,
and so on. Use sy.checkodesol() to check that a function is a solution to a differential equation.
>>> t = sy.symbols('t')
>>> x = sy.Function('x')
Since there are many types of ODEs, sy.dsolve() may also take a hint indicating what solving
strategy to use. See sy.ode.allhints for a list of possible hints, or use sy.classify_ode() to see
the list of hints that may apply to a particular equation.
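As a minimal sketch of this workflow, the example below solves the simple equation x'(t) = x(t), which is chosen only for illustration.

```python
import sympy as sy

t = sy.symbols('t')
x = sy.Function('x')

# The ODE x'(t) = x(t), written as an equation.
ode = sy.Eq(x(t).diff(t), x(t))

solution = sy.dsolve(ode, x(t))             # Eq(x(t), C1*exp(t))
check = sy.checkodesol(ode, solution)       # (True, 0) when the solution is valid
hints = sy.classify_ode(ode, x(t))          # tuple of hints that apply to this ODE
```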
11
Advanced Numpy
Lab Objective: NumPy is a vast library with many useful functions that can be easily forgotten if
not used and reviewed. This lab will help you remember some of its functionality that you may have
forgotten and give you a few new functions to master.
Note
Some of this lab is review, but there is new material near the end and additional materials
beyond that.
Data Access
Array Slicing
Indexing for a 1-D NumPy array uses the slicing syntax x[start:stop:step]. If there is no colon,
a single entry of that dimension is accessed. With a colon, a range of values is accessed. For multi-
dimensional arrays, use a comma to separate slicing syntax for each axis.
>>> A = np.array([[0,1,2,3,4],[5,6,7,8,9]])
>>> A
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
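Continuing with an array like the one above, a few slices might look as follows. This is a small sketch; the particular indices are chosen only for illustration.

```python
import numpy as np

A = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]])

entry = A[1, 2]         # no colon: the single entry in row 1, column 2
column = A[:, 2]        # every row of column 2
block = A[:, 1::2]      # every row, every other column starting at column 1
```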
Note
Indexing and slicing operations return a view of the array. Changing a view of an array also
changes the original array. In other words, arrays are mutable. To create a copy of an array,
use np.copy() or the array’s copy() method. Changes to a copy of an array do not affect
the original array, but copying an array uses more time and memory than getting a view.
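The distinction between a view and a copy can be checked directly, as in this short sketch (the array and the names view and duplicate are illustrative):

```python
import numpy as np

A = np.arange(6)
view = A[:3]                # slicing returns a view of A
duplicate = A.copy()        # an independent copy of A

view[0] = -1                # also changes A[0]
duplicate[1] = 100          # leaves A unchanged
```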
Fancy Indexing
So-called fancy indexing is a second way to access or change the elements of an array. Instead of
using slicing syntax, provide either an array of indices or an array of boolean values (called a mask )
to extract specific elements.
# A boolean array extracts the elements of 'x' at the same places as 'True'.
>>> x = np.arange(0, 50, 10)        # Assumed here: x = [0, 10, 20, 30, 40].
>>> mask = np.array([True, False, False, True, False])
>>> x[mask]                         # Get the 0th and 3rd entries.
array([ 0, 30])
Fancy indexing is especially useful for extracting or changing the values of an array that meet
some sort of criterion. Use comparison operators like < and == to create masks.
While indexing and slicing always return a view, fancy indexing always returns a copy.
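A sketch of building a mask from a comparison and using it both to read and to assign is given below. The array y and the threshold are made up for the example.

```python
import numpy as np

y = np.arange(10, 20, 2)        # [10, 12, 14, 16, 18]
mask = y > 15                   # boolean array, True where the entry exceeds 15

large = y[mask]                 # fancy indexing returns a copy: [16, 18]
y[mask] = 100                   # assignment through a mask changes y itself
picked = y[[0, 3]]              # an array (or list) of indices also works
```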
Problem 1. Write a function that accepts a single array as input. Make a copy of the array,
then use fancy indexing to set all negative entries of the copy to 0. Return the resulting array.
Array Manipulation
Shaping
An array’s shape attribute describes its dimensions. Use np.reshape() or the array’s reshape()
method to give an array a new shape. The total number of entries in the old array and the new
array must be the same in order for the shaping to work correctly. Using a -1 in the new shape tuple
makes the specified dimension as long as necessary.
>>> A = np.arange(12)               # The 1-D array [0, 1, ..., 11].
# Reshape 'A' into an array with 2 rows and the appropriate number of columns.
>>> A.reshape((2,-1))
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
Use np.ravel() to flatten a multi-dimensional array into a 1-D array and np.transpose() or
the T attribute to transpose a 2-D array in the matrix sense.
>>> A = np.arange(12).reshape((3,4))
>>> A
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
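Flattening and transposing an array of this shape can be sketched as follows (the names flat and At are illustrative):

```python
import numpy as np

A = np.arange(12).reshape((3, 4))

flat = np.ravel(A)      # 1-D array with the 12 entries in row order
At = A.T                # transpose, shape (4, 3); same as np.transpose(A)
```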
Note
By default, all NumPy arrays that can be represented by a single dimension, including column
slices, are automatically reshaped into “flat” 1-D arrays. For example, by default an array will
have 10 elements instead of 10 arrays with one element each. Though we usually represent
vectors vertically in mathematical notation, NumPy methods such as dot() are implemented
to purposefully work well with 1-D “row arrays”.
>>> A = np.arange(10).reshape((2,5))
>>> A
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
However, it is occasionally necessary to change a 1-D array into a “column array”. Use
np.reshape(), np.vstack(), or slice the array and put np.newaxis on the second axis. Note
that np.transpose() does not alter 1-D arrays.
>>> x = np.arange(3)
>>> x
array([0, 1, 2])
>>> x.reshape((-1,1))               # Or x[:,np.newaxis].
array([[0],
       [1],
       [2]])
Stacking
NumPy has functions for stacking two or more arrays with similar dimensions into a single block
matrix. Each of these methods takes in a single tuple of arrays to be stacked in sequence.
Function Description
concatenate() Join a sequence of arrays along an existing axis
hstack() Stack arrays in sequence horizontally (column wise).
vstack() Stack arrays in sequence vertically (row wise).
column_stack() Stack 1-D arrays as columns into a 2-D array.
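>>> A = np.arange(6).reshape((2,3))
>>> B = np.zeros((4,3))
>>> np.vstack((A,B)).shape          # Same number of columns required.
(6, 3)
>>> A = A.T
>>> B = np.ones((3,4))
>>> np.hstack((A,B)).shape          # Same number of rows required.
(3, 6)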
See http://docs.scipy.org/doc/numpy-1.10.1/reference/routines.array-manipulation.html
for more array manipulation routines and documentation.
In many scientific disciplines, arrays of more than 2 dimensions are readily utilized. NumPy’s functions
are designed to work on these larger arrays, but sometimes it is necessary to convert arrays to a different
number of dimensions. To do this we’ll use np.squeeze() and np.dstack().
np.squeeze() eliminates any superfluous dimensions in an array, namely every axis that has
length 1 in the array’s shape attribute. It does not matter in which position such an axis is
located. As a result, using np.squeeze() on several arrays of different shapes can result in
arrays of the same shape.
# Example inputs; the shapes of test1 and test2 are assumed for illustration.
>>> test1 = np.arange(9.).reshape((1,3,3))
>>> test2 = np.arange(9.).reshape((3,1,3,1))
>>> np.squeeze(test1)
array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])
>>> np.squeeze(test2)
array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])
# even arrays with many extra dimensions reduce to the same matrix
>>> ridiculous = np.arange(9.).reshape(1,1,1,1,3,3,1,1,1,1)
>>> np.squeeze(ridiculous)
array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])
# however arrays that have no 1 value in their shape will remain the same
>>> stoic = np.arange(20).reshape(2,2,5)
>>> print(stoic)
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]]

 [[10 11 12 13 14]
  [15 16 17 18 19]]]
>>> print(np.squeeze(stoic))
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]]

 [[10 11 12 13 14]
  [15 16 17 18 19]]]
np.dstack(), on the other hand, performs like the other stacking functions except that it
stacks along the third dimension rather than the first or second. For example, using np.dstack() on
two matrices of shape (3, 3) produces an array of shape (3, 3, 2). If an array already has three
dimensions or more, np.dstack() only affects the third one (i.e., stacking shapes (3, 3, 2, 2) and (3, 3, 2, 2)
creates an array of shape (3, 3, 4, 2)).
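The shapes mentioned above can be confirmed with a short sketch (the arrays are filled with zeros and ones purely for illustration):

```python
import numpy as np

A = np.zeros((3, 3))
B = np.ones((3, 3))
stacked = np.dstack((A, B))             # shape (3, 3, 2)

C = np.zeros((3, 3, 2, 2))
deeper = np.dstack((C, C))              # only the third axis grows: (3, 3, 4, 2)

squeezed = np.squeeze(np.zeros((1, 3, 1, 3)))   # length-1 axes removed: (3, 3)
```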
Problem 2. Write a function that accepts a list of arrays, squeezes them and pads them with
0’s so they are the same dimensions and then stacks them along the 3rd dimension. Thus, the
arrays in the list can, individually, be any size and dimension. However, you may assume the
arrays will all be 2-dimensional once the extra dimensions are squeezed out.
Hint: Use the various stacking commands to pad the inputted arrays appropriately with
0’s so that they can easily be stacked into the three dimensional array in the end. Again, you
may assume all arrays in the list, once squeezed, will be two dimensional arrays.
Array Broadcasting
Many matrix operations make sense only when the two operands have the same shape, such as
element-wise addition. Array broadcasting extends such operations to accept some (but not all)
operands with different shapes, and occurs automatically whenever possible.
Suppose, for example, that we would like to add different values to the columns of an m × n
matrix A. Adding a 1-D array x with n entries to A will automatically do this correctly. To add
different values to the different rows of A, first reshape a 1-D array of m values into a column array.
Broadcasting then correctly takes care of the operation.
Broadcasting can also occur between two 1-D arrays, once they are reshaped appropriately.
>>> A = np.arange(12).reshape((4,3))
>>> x = np.arange(3)
>>> A
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
>>> x
array([0, 1, 2])
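With A and x as above, the broadcasting rules described in this section might be exercised as follows. This is a sketch; the arrays y and table are introduced only for this example.

```python
import numpy as np

A = np.arange(12).reshape((4, 3))
x = np.arange(3)

# x has one entry per column of A, so x[j] is added to column j.
col_shift = A + x

# A column array with one entry per row adds y[i] to row i.
y = np.arange(4).reshape((-1, 1))
row_shift = A + y

# Two 1-D arrays broadcast once one of them is reshaped into a column.
table = np.arange(3).reshape((-1, 1)) + np.arange(4)    # shape (3, 4)
```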
Function Description
abs() or absolute() Calculate the absolute value element-wise.
exp() / log() Exponential ($e^x$) / natural log element-wise.
maximum() / minimum() Element-wise maximum / minimum of two arrays.
sqrt() The positive square-root, element-wise.
sin(), cos(), tan(), etc. Element-wise trigonometric operations.
>>> x = np.arange(-2,3)
>>> print(x, np.abs(x)) # Like np.array([abs(i) for i in x]).
[-2 -1 0 1 2] [2 1 0 1 2]
Problem 3. Write a function that accepts a universal function and an n×n NumPy array, and
returns how many times as fast it is to operate on the entire array element-wise, rather than
by using a nested for loop to operate on each element individually. Run each way of operating
on the matrix 10 times, and return the ratio of the averages of the two methods. Vow that you
will avoid unnecessary nested for loops, especially when operating on large arrays.
Achtung!
The math module has many useful functions for numerical computations. However, most of
these functions can only act on single numbers, not on arrays. NumPy functions can act on
either scalars or entire arrays, but math functions tend to be a little faster for acting on scalars.
Always use universal NumPy functions, not the math module, when working with arrays.
The np.ndarray class itself has many useful methods for numerical computations.
Method Returns
all() True if all elements evaluate to True.
any() True if any elements evaluate to True.
argmax() Index of the maximum value.
argmin() Index of the minimum value.
argsort() Indices that would sort the array.
clip() Restrict values in an array to fit within a given range.
max() The maximum element of the array.
mean() The average value of the array.
min() The minimum element of the array.
roll() Shift the elements of the array cyclically by a specified amount (np.roll() only).
sort() Return nothing; sort the array in-place.
std() The standard deviation of the array.
sum() The sum of the elements of the array.
var() The variance of the array.
Each of these np.ndarray methods has an equivalent NumPy function. For example, A.max()
and np.max(A) operate the same way. The one exception is the sort() function: np.sort() returns
a sorted copy of the array, while A.sort() sorts the array in-place and returns nothing.
Every method listed can operate along an axis via the keyword argument axis. If axis is
specified for a method on an n-D array, the return value is an (n − 1)-D array, the specified axis
having been collapsed in the evaluation process. If axis is not specified, the return value is usually
a scalar. Refer to the NumPy Visual Guide in the appendix for more visual examples.
>>> A = np.arange(9).reshape((3,3))
>>> A
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
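For the array A above, the effect of the axis argument might be sketched like this (the variable names are illustrative):

```python
import numpy as np

A = np.arange(9).reshape((3, 3))

col_max = A.max(axis=0)     # collapse the rows: one maximum per column
row_sum = A.sum(axis=1)     # collapse the columns: one sum per row
total = A.sum()             # no axis: a single scalar
same = np.max(A, axis=0)    # the function form behaves the same way
```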
Problem 4. A matrix is called row-stochastic if its rows each sum to 1 (similarly, a matrix is
called column-stochastic if its columns each sum to 1). Stochastic matrices
are fundamentally important for finite discrete random processes and some machine learning
algorithms.
Write a function that accepts a matrix (as a 2-D NumPy array). Divide each row of the
matrix by the row sum and return the new row-stochastic matrix. Use array broadcasting and
the axis argument instead of a loop.
Vectorizing functions
Whenever possible, making your functions ‘NumPy aware’ can greatly reduce complexity, improve the
readability and simplicity of code, and make functions more versatile. Designing functions to
work with NumPy arrays and NumPy functions is one of the best ways to optimize code.
However, sometimes the functions we need to use are very difficult to vectorize. In this case it can
be useful to employ np.vectorize().
np.vectorize() accepts as an argument a function whose input and output are scalars. It
returns a new function that is ‘NumPy aware’, meaning that it accepts a NumPy array of values
and outputs an array in which each entry is the result of applying the original function to the
corresponding input entry.
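A minimal sketch of np.vectorize() is shown here. The scalar function relu_scalar is a hypothetical example introduced only for this illustration.

```python
import numpy as np

def relu_scalar(t):
    """Return t if t is positive and 0 otherwise (scalar input only)."""
    if t > 0:
        return t
    return 0

# The if statement above fails on arrays; the vectorized version accepts them.
relu = np.vectorize(relu_scalar)
out = relu(np.array([-2, -1, 0, 1, 2]))     # [0, 0, 0, 1, 2]
```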
Note
While the above example can easily be done with array broadcasting, np.vectorize() can be
applied to very complex scalar functions for which no array broadcasting method exists.
However, np.vectorize() is provided only for convenience and readability: it is essentially a for loop, so
it does not improve running time the way true array broadcasting does. Even so,
it is often simpler than writing out the for loop by hand.
Problem 5. Given to you is the code that finds the prime factorization of a number and
returns the largest prime in the factorization. Vectorize the function using np.vectorize()
and program a function that either uses the vectorized function or the naive for loop depending
on the argument ‘naive’ being passed in as True or False.
Make sure your function returns a NumPy array of the same size for both cases.
Hint: Make sure the naive approach returns the array with a dtype of ‘int32’
Einsum
While numpy has many functions to help multiply arrays, multiplying the elements of arrays in
unorthodox ways usually requires the conglomeration of quite a few of these functions. np.einsum()
is designed to eliminate this problem by making a general framework for multiplication and addition
in arrays using their shapes and allowing the coder to tell the function which elements exactly are to
be multiplied or summed and how those operations are to be returned.
The np.einsum() function can be used on arrays of more than 2 dimensions, but we’ll keep
the scope of this text to working with input arrays of 1 or 2 dimensions. The numbers in the following
syntax only represent positions; each position is explained below.
np.einsum("12, 34 -> 56", 7, 8)
1 and 2) The variables representing the shape of the first input array (the variables will be explained
later). In the case of 1 dimensional vectors, there will only be one variable and the second will
be omitted.
3 and 4) Similarly, the variables representing the shape of the second input array.
5 and 6) The variables representing the dimensions of the output array. There can be one, two, or even
three variables here.
7) The first input array. Make sure it is outside the quotation mark ending the variables
section and preceded by a comma.
8) The second input array, also preceded by a comma.
Positions 7 and 8 are arrays, but positions 1-6 are filled with variables, usually i,
j, and k, which represent the dimensions of the input and output arrays. Einsum can multiply and/or
sum any two matrices across any axis by simply changing the way these variables are arranged and
repeated.
Einsum Rules
einsum() interprets its input variables as follows:
1) If a variable in positions 1 or 2 also appears in positions 3 or 4, the values along the
axes specified by the repeated variable are multiplied together.
2) On the other side of the arrow (positions 5 and 6), specify the dimensions of the output.
Omitting a variable from these positions causes the products to be summed along that variable’s axis.
>>> A = np.eye(3)
>>> A[0,:] += A[2,:]
>>> A
array([[1., 0., 1.],
       [0., 1., 0.],
       [0., 0., 1.]])
>>> B = np.arange(9).reshape((3,3))
>>> B
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
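One call consistent with the two rules above is the ordinary matrix product, in which j is repeated in the inputs and omitted from the output. The call is sketched here as an assumption about the intended example, with A and B defined as above.

```python
import numpy as np

A = np.eye(3)
A[0, :] += A[2, :]
B = np.arange(9).reshape((3, 3))

# j appears in both inputs (multiply) and not in the output (sum over j).
product = np.einsum("ij, jk -> ik", A, B)
```

The result agrees with A @ B: the first row is the sum of the first and third rows of B, and the other rows of B are unchanged.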
Since the j was repeated in the previous example, the elements on those 2 axes are paired and
multiplied. The first pair of indices has j in the second position, which corresponds to the column
position (axis=1; moving from column to column along a single row). The second pair of indices has
j in the first position, which corresponds to the row position (axis=0; moving row to row down a
single column). Thus, Einsum takes products of the elements along the respective column and row
pairs in exactly the same way as matrix multiplication. The output was specified to be ik instead
of ijk, so einsum summed the products that it created from having the repeated j. It’s important
to note here that Einsum’s axis conventions are very similar to those of the axis argument in
np.sum() and other NumPy functions.
Row sums: Here the variable representing the row axis must be omitted so einsum sums the rows
(note that no products are taken, but the summing is still allowed): np.einsum("ij -> j", A)
Dot product: When every variable is repeated (so that the output is a scalar), the arrow in the function is optional.
np.einsum("i, i", x, y) is equivalent to np.einsum("i, i ->", x, y). Note here that
the i must be repeated to get the proper dot product.
Full matrix sum: Summing the entire matrix is similar to the dot product: np.einsum("ij->",A)
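These three reductions can be compared against their ordinary NumPy equivalents in a short sketch (the arrays are made up for the example):

```python
import numpy as np

A = np.arange(6).reshape((2, 3))
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

summed_over_i = np.einsum("ij -> j", A)     # i omitted: same as A.sum(axis=0)
dot = np.einsum("i, i", x, y)               # repeated i, scalar output: x @ y
total = np.einsum("ij ->", A)               # both omitted: same as A.sum()
```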
Note
While excluding a variable after the -> symbol causes summation along the corresponding
axis, including a variable that usual matrix multiplication would exclude can
cause array broadcasting to occur on the input vectors or matrices. One of the next examples
shows this.
Now, to demonstrate Einsum’s true power we’ll show a more complicated example. Imagine
you needed to take an outer product of each column of a matrix with the corresponding column of
another matrix. You could accomplish this in a for loop, but it would be slow and not very effective.
Conversely, this can be done in one line with np.einsum.
>>> A = np.arange(9).reshape((3,3))
>>> outer_products = np.einsum("ic, jc -> cij", A, A)
>>> outer_products
array([[[ 0, 0, 0],
[ 0, 9, 18],
[ 0, 18, 36]],
[[ 1, 4, 7],
[ 4, 16, 28],
[ 7, 28, 49]],
[[ 4, 10, 16],
[10, 25, 40],
[16, 40, 64]]])
The output dimension having three variables creates a 3-dimensional matrix where each element
in the first axis is a 3 by 3 outer product matrix. If we then wanted to sum the rows of each of those
matrices and make a 2 dimensional matrix from these row sums, we simply omit the row axis variable
i.
>>> A = np.arange(9).reshape((3,3))
>>> outer_products_sum = np.einsum("ic, jc -> cj", A, A)
>>> outer_products_sum
array([[  0,  27,  54],
       [ 12,  48,  84],
       [ 30,  75, 120]])
And then perhaps we want to array broadcast a different vector to the rows of the resulting
matrix. We can think of starting with our output variables cj and design it from there, adding to
the already existing Einsum function. The vector variable needs to match the axis we want to iterate
over. Since we want the elements of our vector to be distributed to the rows of the matrix, we choose
j to represent our vector.
>>> A = np.arange(9).reshape((3,3))
>>> v = np.array([0,1,-1])
>>> outer_product_sum_with_broadcast = np.einsum("ic, jc, j -> cj", A, A, v)
>>> outer_product_sum_with_broadcast
array([[ 0, 27, -54],
[ 0, 48, -84],
[ 0, 75, -120]])
And thus, three difficult operations have been reduced to a few letters by the power of Einsum.
Einsum Optimize
np.einsum() often performs faster than the equivalent combination of built-in NumPy functions. However, the way Einsum
organizes its operations creates redundancy when trying to perform multiple operations at once, such
as multiplying two matrices, broadcasting a vector to the rows of the result, and then summing the resulting matrix
columns. Be cautious when using Einsum to perform multiple operations, since this may
actually make the complexity worse rather than better.
There are two ways around this problem. First, performing each operation individually
preserves Einsum’s performance, although brevity of code suffers. Second, multiple
operations can be performed efficiently using the keyword argument optimize=True.
Setting optimize=True adds a step of analysis before any calculations are
made to find an efficient order of operations. In addition, this mode may use more memory
in exchange for the gains in speed. This is why optimize defaults to
False: the programmer decides whether there is sufficient memory, or any need, for the
optimize functionality.
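A sketch of the keyword in use on a chain of three matrix products is given below. The shapes and the random inputs are chosen only for illustration, and no timing claim is made here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 5))
B = rng.random((5, 6))
C = rng.random((6, 3))

# The same expression with and without the optimization step.
plain = np.einsum("ij, jk, kl -> il", A, B, C)
optimized = np.einsum("ij, jk, kl -> il", A, B, C, optimize=True)
```

Both calls compute the same product; only the internal order of operations differs.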
Problem 6. Write a function that accepts 3 vectors and a matrix of appropriate sizes and
returns a matrix that is the result of an outer product of the first 2 vectors, the 3rd vector
array broadcasted to the columns of that matrix and then the multiplication via normal matrix
multiplication of that result to the input matrix.
Hint: Your result should return the equivalent of np.outer(x,y)*z.reshape(-1,1)@A
where x, y, and z are vectors and A is a matrix.
Problem 7. Time your einsum function from Problem 6 versus its numpy function equivalent
for vectors of size 3 through 500 and arrays of size (3,3) through (500,500). Plot the results on
a neatly formatted and labeled graph.
Hint: If your Einsum function is running slower than its numpy equivalent, consider using one
of the two methods described above. In the end, your graphs should look like one of these:
Additional Material
Random Sampling
The submodule np.random holds many functions for creating arrays of random values chosen from
probability distributions such as the uniform, normal, and multinomial distributions. It also contains
some utility functions for getting non-distributional random samples, such as random integers or
random samples from a given array.
Function Description
choice() Take random samples from a 1-D array.
random() Uniformly distributed floats over [0, 1).
randint() Random integers over a half-open interval.
random_integers() Random integers over a closed interval.
randn() Sample from the standard normal distribution.
permutation() Randomly permute a sequence / generate a random sequence.
Function Distribution
beta() Beta distribution over [0, 1].
binomial() Binomial distribution.
exponential() Exponential distribution.
gamma() Gamma distribution.
geometric() Geometric distribution.
multinomial() Multivariate generalization of the binomial distribution.
multivariate_normal() Multivariate generalization of the normal distribution.
normal() Normal / Gaussian distribution.
poisson() Poisson distribution.
uniform() Uniform distribution.
Note that many of these functions have counterparts in the standard library’s random module.
These NumPy functions, however, are much better suited for working with large collections of random
samples.
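A few of the functions from the tables above are sketched below; the sizes and distribution parameters are arbitrary choices for the example.

```python
import numpy as np

uniform = np.random.random((2, 3))                  # floats in [0, 1), shape (2, 3)
integers = np.random.randint(0, 10, size=5)         # integers in [0, 10)
normal = np.random.normal(loc=1.0, scale=2.0, size=1000)    # mean 1, std. dev. 2
picks = np.random.choice(np.arange(5), size=3, replace=False)   # 3 distinct entries
```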
Function Description
save() Save a single array to a .npy file.
savez() Save multiple arrays to a .npz file.
savetxt() Save a single array to a .txt file.
load() Load and return an array or arrays from a .npy or .npz file.
loadtxt() Load and return an array from a text file.
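# Save a matrix of uniformly distributed random values (the 100x100 size is illustrative).
>>> x = np.random.random((100,100))
>>> np.save("uniform.npy", x)           # Or np.savetxt("uniform.txt", x).
# Read the array from the file and check that it matches the original.
>>> y = np.load("uniform.npy")          # Or np.loadtxt("uniform.txt").
>>> np.allclose(x, y)                   # Check that x and y are close entry-wise.
True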
To save several arrays to a single file, specify a keyword argument for each array in np.savez().
Then np.load() will return a dictionary-like object with the keyword parameter names from the
save command as the keys.
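# Save two arrays of normally distributed random values (the sizes are illustrative).
>>> x = np.random.randn(100,100)
>>> y = np.random.randn(100,100)
>>> np.savez("normal.npz", first=x, second=y)
# Read the arrays from the file and check that they match the original.
>>> arrays = np.load("normal.npz")
>>> np.allclose(x, arrays["first"])
True
>>> np.allclose(y, arrays["second"])
True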
Part II
Appendices
A
NumPy Visual Guide
Lab Objective: NumPy operations can be difficult to visualize, but the concepts are straightforward.
This appendix provides visual demonstrations of how NumPy arrays are used with slicing syntax,
stacking, broadcasting, and axis-specific operations. Though these visualizations are for 1- or 2-
dimensional arrays, the concepts can be extended to n-dimensional arrays.
Data Access
The entries of a 2-D array are the rows of the matrix (as 1-D arrays). To access a single entry, enter
the row index, a comma, and the column index. Remember that indexing begins with 0.
[Figure: the array A with the selected entries highlighted. A[0] is the first row of A; A[2,1] is the
single entry in row 2, column 1.]
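As a small sketch of the indexing described above (the 4 × 5 array is an illustrative assumption, chosen so that each entry is easy to locate):

```python
import numpy as np

# Illustrative 4x5 array whose entries count up from 0, row by row.
A = np.arange(20).reshape((4, 5))

row0 = A[0]        # The first row, returned as a 1-D array.
entry = A[2, 1]    # The entry in row 2, column 1 (counting from 0).

print(row0)        # [0 1 2 3 4]
print(entry)       # 11
```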
Slicing
A lone colon extracts an entire row or column from a 2-D array. The syntax [a:b] can be read as
“the ath entry up to (but not including) the bth entry.” Similarly, [a:] means “the ath entry to the
end” and [:b] means “everything up to (but not including) the bth entry.”
[Figure: the array A with the selected entries highlighted. A[1] = A[1,:] is the row at index 1;
A[:,2] is the column at index 2; A[1:,:2] is every row from index 1 onward, restricted to the first
two columns; A[1:-1,1:-1] is the interior of A, with the first and last rows and columns removed.]
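The slices pictured above can be checked directly. This is a minimal sketch on an illustrative 4 × 5 array; the variable names are assumptions made for the example.

```python
import numpy as np

A = np.arange(20).reshape((4, 5))

row1 = A[1, :]             # Same as A[1]: the row at index 1.
col2 = A[:, 2]             # The column at index 2, as a 1-D array.
lower_left = A[1:, :2]     # Rows 1 to the end, columns 0 and 1.
interior = A[1:-1, 1:-1]   # Drop the first and last rows and columns.

print(row1.shape, col2.shape)              # (5,) (4,)
print(lower_left.shape, interior.shape)    # (3, 2) (2, 3)
```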
182 Appendix A. NumPy Visual Guide
Stacking
np.hstack() stacks a sequence of arrays horizontally and np.vstack() stacks a sequence of arrays
vertically.
A = × × ×      B = ∗ ∗ ∗
    × × ×          ∗ ∗ ∗
    × × ×          ∗ ∗ ∗

np.hstack((A,B,A)) = × × × ∗ ∗ ∗ × × ×
                     × × × ∗ ∗ ∗ × × ×
                     × × × ∗ ∗ ∗ × × ×

np.vstack((A,B,A)) = × × ×
                     × × ×
                     × × ×
                     ∗ ∗ ∗
                     ∗ ∗ ∗
                     ∗ ∗ ∗
                     × × ×
                     × × ×
                     × × ×
Because 1-D arrays are flat, np.hstack() concatenates 1-D arrays and np.vstack() stacks them
vertically. To make several 1-D arrays into the columns of a 2-D array, use np.column_stack().
x = × × × ×      y = ∗ ∗ ∗ ∗

np.hstack((x,y,x)) = × × × × ∗ ∗ ∗ ∗ × × × ×

np.vstack((x,y,x)) = × × × ×
                     ∗ ∗ ∗ ∗
                     × × × ×

np.column_stack((x,y,x)) = × ∗ ×
                           × ∗ ×
                           × ∗ ×
                           × ∗ ×
The functions np.concatenate() and np.stack() are more general versions of np.hstack() and
np.vstack(), and np.row_stack() is an alias for np.vstack() (the alias is deprecated as of NumPy 2.0,
so np.vstack() is preferred).
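The shapes produced by each stacking function can be confirmed with a short sketch. The arrays of zeros and ones stand in for the × and ∗ entries and are illustrative assumptions.

```python
import numpy as np

A = np.zeros((3, 3))   # Plays the role of the "×" block.
B = np.ones((3, 3))    # Plays the role of the "∗" block.
x = np.zeros(4)
y = np.ones(4)

print(np.hstack((A, B, A)).shape)          # (3, 9)
print(np.vstack((A, B, A)).shape)          # (9, 3)
print(np.hstack((x, y, x)).shape)          # (12,)
print(np.vstack((x, y, x)).shape)          # (3, 4)
print(np.column_stack((x, y, x)).shape)    # (4, 3)

# np.concatenate() reproduces both 2-D cases by choosing the axis.
same_h = np.array_equal(np.concatenate((A, B, A), axis=1), np.hstack((A, B, A)))
same_v = np.array_equal(np.concatenate((A, B, A), axis=0), np.vstack((A, B, A)))
```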
Broadcasting
NumPy automatically aligns arrays for component-wise operations whenever possible. See
http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html for more in-depth examples and
broadcasting rules.
A = 1 2 3      x = 10 20 30
    1 2 3
    1 2 3

A + x = 1 2 3                = 11 22 33
        1 2 3 + 10 20 30       11 22 33
        1 2 3                  11 22 33

A + x.reshape((-1,1)) = 1 2 3   10   11 12 13
                        1 2 3 + 20 = 21 22 23
                        1 2 3   30   31 32 33
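A minimal sketch of the two broadcasting cases: adding a 1-D array to every row, and adding it to every column after reshaping it into a column with reshape((-1, 1)). Building A with np.tile() is an assumption made for brevity.

```python
import numpy as np

A = np.tile([1, 2, 3], (3, 1))     # Three copies of the row [1, 2, 3].
x = np.array([10, 20, 30])

# x has shape (3,), so it is added to each row of A.
by_row = A + x

# Reshaped to (3, 1), x is a column and is added to each column of A.
by_column = A + x.reshape((-1, 1))

print(by_row)
print(by_column)
```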
A = 1 2 3 4
    1 2 3 4
    1 2 3 4
    1 2 3 4

A.sum(axis=0) = 4 8 12 16      (sum down each column)

A.sum(axis=1) = 10 10 10 10    (sum across each row)
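The two axis-specific sums can be reproduced with the short sketch below. Building A with np.tile() is an illustrative assumption.

```python
import numpy as np

A = np.tile([1, 2, 3, 4], (4, 1))   # Four copies of the row [1, 2, 3, 4].

column_sums = A.sum(axis=0)   # Collapse the rows: one sum per column.
row_sums = A.sum(axis=1)      # Collapse the columns: one sum per row.

print(column_sums)   # [ 4  8 12 16]
print(row_sums)      # [10 10 10 10]
```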
B
Matplotlib Syntax and
Customization Guide
Lab Objective: The documentation for Matplotlib can be a little difficult to navigate, and basic
information is sometimes difficult to find. This appendix condenses and demonstrates some of the
more applicable and useful information on plot customizations. It is not intended to be read all at
once, but rather to be used as a reference when needed. For an interactive introduction to Matplotlib,
see the Introduction to Matplotlib lab in Python Essentials. For more details on any specific function,
refer to the Matplotlib documentation at https://matplotlib.org/.
Matplotlib Interface
Matplotlib plots are made in a Figure object that contains one or more Axes, which themselves
contain the graphical plotting data. Matplotlib provides two ways to create plots:
1. Call plotting functions directly from the module, such as plt.plot(). This will create the plot
on whichever Axes is currently active.
2. Call plotting functions from an Axes object, such as ax.plot(). This is particularly useful for
complicated plots and for animations.
Table B.1 contains a summary of functions that are used for managing Figure and Axes objects.
Function Description
add_subplot() Add a single subplot to the current figure
axes() Add an axes to the current figure
clf() Clear the current figure
figure() Create a new figure or grab an existing figure
gca() Get the current axes
gcf() Get the current figure
subplot() Add a single subplot to the current figure
subplots() Create a figure and add several subplots to it
186 Appendix B. Matplotlib Customization
Axes objects are usually managed through the functions plt.subplot() and plt.subplots().
The function subplot() is used as plt.subplot(nrows, ncols, plot_number). Note that if the
inputs for plt.subplot() are all integers, the commas between the entries can be omitted. For
example, plt.subplot(3,2,2) can be shortened to plt.subplot(322).
The function subplots() is used as plt.subplots(nrows, ncols), and returns a Figure
object and an array of Axes. This array has the shape (nrows, ncols), and can be accessed as any
other array. By default, if nrows or ncols is 1, that dimension is squeezed out, so
plt.subplots(1, 2) returns a 1-D array of two Axes. Figure B.1 demonstrates the layout and
indexing of subplots.
1 2 3
4 5 6
Figure B.1: The layout of subplots with plt.subplot(2,3,i) (2 rows, 3 columns), where i is the
index pictured above. The outer border is the figure that the axes belong to.
The following example demonstrates three equivalent ways of producing a figure with two
subplots, arranged next to each other in one row:
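A minimal sketch of three such approaches follows. The plotted functions and variable names are illustrative assumptions; each approach produces its own figure with two side-by-side subplots.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(-5, 5, 100)

# 1. Use plt.subplot() to switch the active axes, then plot with plt.plot().
plt.figure()
plt.subplot(1, 2, 1)
plt.plot(x, 2 * x)
plt.subplot(1, 2, 2)
plt.plot(x, x**2)

# 2. Keep the Axes objects returned by plt.subplot() and plot on them.
plt.figure()
ax1 = plt.subplot(121)
ax1.plot(x, 2 * x)
ax2 = plt.subplot(122)
ax2.plot(x, x**2)

# 3. Create the figure and all of its Axes at once with plt.subplots().
fig, axes = plt.subplots(1, 2)
axes[0].plot(x, 2 * x)
axes[1].plot(x, x**2)

# Call plt.show() to display the figures in an interactive session.
```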
Achtung!
Be careful not to mix up the following similarly-named functions:
1. plt.axes() creates a new place to draw on the figure, while plt.axis() or ax.axis()
sets properties of the x- and y-axis in the current axes, such as the x and y limits.
2. plt.subplot() (singular) returns a single subplot belonging to the current figure, while
plt.subplots() (plural) creates a new figure and adds a collection of subplots to it.
Plot Customization
Styles
Matplotlib has a number of built-in styles that can be used to set the default appearance of plots.
These can be used via the function plt.style.use(); for instance, plt.style.use("seaborn")
will have Matplotlib use the "seaborn" style for all plots created afterwards (in Matplotlib 3.6 and
later, this style is named "seaborn-v0_8"). A list of built-in styles can be found at
https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html.
The style can also be changed only temporarily using plt.style.context() along with a with
block:
with plt.style.context('dark_background'):
    # Any plots created here use the new style
    plt.subplot(1,2,1)
    plt.plot(x, y)
    # ...
# Plots created here are unaffected
plt.subplot(1,2,2)
plt.plot(x, y)
Plot layout
Axis properties
Table B.2 gives an overview of some of the functions that may be used to configure the axes of a
plot.
The functions xlim(), ylim(), and axis() are used to set one or both of the x and y ranges
of the plot. xlim() and ylim() each accept two arguments, the lower and upper bounds, or a single
list of those two numbers. axis() accepts a single list consisting, in order, of xmin, xmax, ymin,
ymax. Passing None in place of any of these numbers leaves the corresponding bound unchanged.
Each of these functions can also be called without any arguments, in which case it returns the
current bounds. Note that axis() can also be called directly on an Axes object, while xlim() and
ylim() cannot (the corresponding Axes methods are ax.set_xlim() and ax.set_ylim()).
axis() can also be called with a string as its argument, which has several options. The most
common is axis('equal'), which gives the x- and y-axes the same scale (i.e. makes circles
circular).
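The sketch below exercises these functions on a single plot. The data and the particular bounds are illustrative assumptions.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(0, 10, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# The pyplot functions act on the current axes (here, ax).
plt.xlim(0, 5)             # Two arguments: lower and upper bounds.
plt.ylim([-2, 2])          # Or a single list of the two bounds.
xmin, xmax = plt.xlim()    # With no arguments, return the current bounds.

# axis() sets all four bounds at once and also works on the Axes object.
ax.axis([1, 4, -1.5, 1.5])
limits = ax.axis()         # (xmin, xmax, ymin, ymax)

print(xmin, xmax)
print(limits)
```

Calling ax.axis('equal') afterwards would rescale the axes so that one unit in x has the same length as one unit in y.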
Function Description
axis() set the x- and y-limits of the plot
grid() add gridlines
xlim() set the limits of the x-axis
ylim() set the limits of the y-axis
xticks() set the location of the tick marks on the x-axis
yticks() set the location of the tick marks on the y-axis
xscale() set the scale type to use on the x-axis
yscale() set the scale type to use on the y-axis
ax.spines[side].set_position() set the location of the given spine
ax.spines[side].set_color() set the color of the given spine
ax.spines[side].set_visible() set whether a spine is visible
Table B.2: Some functions for changing axis properties. ax is an Axes object.
To use a logarithmic scale on an axis, the functions xscale("log") and yscale("log") can
be used.
The functions xticks() and yticks() accept a list of tick positions, which the ticks on the
corresponding axis are set to. Generally, this works best when used with np.linspace(). These
functions also optionally accept a second argument, a list of labels for the ticks. If called with no
arguments, each function instead returns the current tick positions and labels.
The spines of a Matplotlib plot are the black border lines around the plot, with the left and
bottom ones also being used as the axis lines. To access the spines of a plot, call ax.spines[side],
where ax is an Axes object and side is 'top', 'bottom', 'left', or 'right'. Then, functions can
be called on the Spine object to configure it.
The function spine.set_position() has several ways to specify the position. The two simplest
are the arguments 'center' and 'zero', which place the spine in the center of the subplot or
at an x- or y-coordinate of zero, respectively. The others are passed as a tuple (position_type,
amount):
• 'axes': places the spine at the specified Axes coordinate, where 0 corresponds to the bottom
or left of the subplot, and 1 corresponds to the top or right edge of the subplot.
• 'outward': places the spine amount points outward from the edge of the plot area. A negative
value can be used to move it inwards instead.
spine.set_color() accepts any of the color formats Matplotlib supports. Alternately, using
set_color('none') will make the spine not be visible. spine.set_visible() can also be used for
this purpose.
The following example adjusts the ticks and spine positions to improve the readability of a plot
of sin(x). The result is shown in Figure B.2.
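>>> import numpy as np
>>> from matplotlib import pyplot as plt

>>> x = np.linspace(0,2*np.pi,150)
>>> plt.plot(x, np.sin(x))
>>> plt.title(r"$y=\sin(x)$")

# Place ticks at multiples of pi/2 (tick call reconstructed to match the description).
>>> plt.xticks(np.linspace(0, 2*np.pi, 5),
...            ["0", r"$\frac{\pi}{2}$", r"$\pi$", r"$\frac{3\pi}{2}$", r"$2\pi$"])

# Move the bottom spine to zero and remove the top and right ones.
>>> ax = plt.gca()
>>> ax.spines['bottom'].set_position('zero')
>>> ax.spines['right'].set_color('none')
>>> ax.spines['top'].set_color('none')
>>> plt.show()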
Figure B.2: Plot of y = sin(x) with axes modified for clarity.
Plot Layout
The position and spacing of all subplots within a figure can be modified using the function
plt.subplots_adjust(). This function accepts up to six keyword arguments that change different
aspects of the spacing. left, right, top, and bottom are used to adjust the rectangle around all of
the subplots. In the coordinates used, 0 corresponds to the bottom or left edge of the figure, and 1
corresponds to the top or right edge of the figure. hspace and wspace set the vertical and horizontal
spacing, respectively, between subplots. The units for these are in fractions of the average height
and width of all subplots in the figure. If more fine control is desired, the position of individual Axes
objects can also be changed using ax.get_position() and ax.set_position().
The size of the figure can be configured using the figsize argument when creating a figure:
>>> plt.figure(figsize=(12,8))
Note that many environments will scale the figure to fill the available space. Even so, changing the
figure size can still be used to change the aspect ratio as well as the relative size of plot elements.
The following example uses subplots_adjust() to create space for a legend outside of the
plotting space. The result is shown in Figure B.3.
#Generate data
>>> x1 = np.random.normal(-1, 1.0, size=60)
>>> y1 = np.random.normal(-1, 1.5, size=60)
>>> x2 = np.random.normal(2.0, 1.0, size=60)
>>> y2 = np.random.normal(-1.5, 1.5, size=60)
>>> x3 = np.random.normal(0.5, 1.5, size=60)
>>> y3 = np.random.normal(2.5, 1.5, size=60)
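One way the remaining steps might look is sketched below; this is not the original listing. The scatter plots, the value left=0.3, and the legend position are all assumptions chosen for illustration, and the data are regenerated with a seeded generator so the block is self-contained.

```python
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
centers = [(-1, -1), (2, -1.5), (0.5, 2.5)]   # Illustrative cluster centers.

fig, ax = plt.subplots(figsize=(6, 4))
for i, (cx, cy) in enumerate(centers, start=1):
    ax.scatter(rng.normal(cx, 1.0, size=60), rng.normal(cy, 1.5, size=60),
               label=f"Dataset {i}")

# Start the subplot area 30% of the way across the figure,
# leaving free space on the left for the legend.
fig.subplots_adjust(left=0.3)

# A tuple for loc gives the legend's bottom-left corner in Axes coordinates;
# a negative x-value places the legend to the left of the subplot.
legend = ax.legend(loc=(-0.4, 0.4))

# Call plt.show() to display the figure in an interactive session.
```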
Figure B.3: Example of repositioning axes.
Colors
The color that a plotting function uses is specified by either the c or color keyword arguments; for
most functions, these can be used interchangeably. There are many ways to specify colors. The
simplest is to use one of the basic colors, listed in Table B.3. Colors can also be specified using an
RGB tuple such as (0.0, 0.4, 1.0), a hex string such as "#0000FF", or a CSS color name like
"DarkOliveGreen" or "FireBrick". A full list of named colors that Matplotlib supports can be found
at https://matplotlib.org/stable/gallery/color/named_colors.html. If no color is specified
for a plot, Matplotlib automatically assigns it one from the default color cycle.
Code Color
'b' blue
'g' green
'r' red
'c' cyan
'm' magenta
'y' yellow
'k' black
'w' white
'C0' - 'C9' Default colors
Plotting functions also accept an alpha keyword argument, which can be used to set the
transparency. A value of 1.0 corresponds to fully opaque, and 0.0 corresponds to fully transparent.
The following example demonstrates different ways of specifying colors:
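A minimal sketch of such an example, plotting five curves with five different color specifications. The curves themselves are illustrative assumptions.

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import colors as mcolors

x = np.linspace(0, 1, 50)
fig, ax = plt.subplots()

ax.plot(x, x, "r")                              # Single-character code in the format string.
ax.plot(x, x**2, color="C1")                    # Second color of the default color cycle.
ax.plot(x, x**3, color=(0.0, 0.4, 1.0))         # RGB tuple with entries in [0, 1].
ax.plot(x, x**4, color="#0000FF", alpha=0.5)    # Hex string, half transparent.
ax.plot(x, x**5, c="DarkOliveGreen")            # CSS color name via the c keyword.

lines = ax.get_lines()

# Call plt.show() to display the figure in an interactive session.
```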
Colormaps
Certain plotting functions, such as those for heatmaps and contour plots, accept a colormap rather
than a single color. A full list of colormaps available in Matplotlib can be found at
https://matplotlib.org/stable/gallery/color/colormap_reference.html. Some of the more commonly
used ones are "viridis", "magma", and "coolwarm". A colorbar can be added by calling plt.colorbar()
after creating the plot.
Sometimes, using a logarithmic scale for the coloring is more informative. To do this, pass a
matplotlib.colors.LogNorm object as the norm keyword argument:
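A minimal sketch of a logarithmic color scale on a heatmap. The function being plotted is an illustrative assumption, chosen to be strictly positive (as a logarithmic scale requires) and to span several orders of magnitude.

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import LogNorm

x = np.linspace(0, 3, 100)
y = np.linspace(0, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.exp(X + Y)    # Ranges from 1 to about 403.

fig, ax = plt.subplots()
mesh = ax.pcolormesh(X, Y, Z, cmap="viridis", norm=LogNorm(), shading="auto")
fig.colorbar(mesh, ax=ax)    # The colorbar ticks follow the logarithmic scale.

# Call plt.show() to display the figure in an interactive session.
```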
>>> plt.title(r"$\frac{1}{2}\sin(x^2)$")
The function legend() can be used to add a legend to a plot. Its optional loc keyword
argument specifies where to place the legend within the subplot. It defaults to 'best', which will
cause Matplotlib to place it in whichever location overlaps with the fewest drawn objects. The other
locations this function accepts are 'upper right', 'upper left', 'lower left', 'lower right',
'center left', 'center right', 'lower center', 'upper center', and 'center'. Alternately,
a tuple of (x,y) can be passed as this argument, and the bottom-left corner of the legend will be
placed at that location. The point (0,0) corresponds to the bottom-left of the current subplot, and
(1,1) corresponds to the top-right. This can be used to place the legend outside of the subplot,
although care should be taken that it does not go outside the figure, which may require manually
repositioning the subplots.
The labels the legend uses for each curve or scatterplot are specified with the label keyword
argument when plotting the object. Note that legend() can also be called with non-keyword
arguments to set the labels, although it is less confusing to set them when plotting.
The following example demonstrates creating a legend:
>>> x = np.linspace(0,2*np.pi,250)
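A fuller, self-contained sketch of creating a legend is given here. The plotted curves and the choice of loc are illustrative assumptions.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(0, 2 * np.pi, 250)
fig, ax = plt.subplots()

# Set the labels when plotting, then let legend() collect them.
ax.plot(x, np.sin(x), label=r"$\sin(x)$")
ax.plot(x, np.cos(x), label=r"$\cos(x)$")
legend = ax.legend(loc="lower left")

labels = [text.get_text() for text in legend.get_texts()]

# Call plt.show() to display the figure in an interactive session.
```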
The function plot() has several ways to specify this argument; the simplest is to pass it as the
third positional argument. The marker and linestyle keyword arguments can also be used. The
size of these can be modified using markersize and linewidth. Note that by specifying a marker
style but no line style, plot() can be used to make a scatter plot. It is also possible to use both a
marker style and a line style. To set the marker using scatter(), use the marker keyword argument,
with s being used to change the size.
The following code demonstrates specifying marker and line styles. The results are shown in
Figure B.4.
#With plot(), the color to use can also be specified in the same string.
#Order usually doesn't matter.
#Use red dots:
>>> plt.plot(x, y, '.r')
#Equivalent:
>>> plt.plot(x, y, 'r.')
Plot Types
Matplotlib has functions for creating many different types of plots, many of which are listed in Table
B.6. This section gives details on using certain groups of these functions.
Line plots
Line plots, the most basic type of plot, are created with the plot() function. It accepts two lists of
x- and y-values to plot, and optionally a third argument of a string of any combination of the color,
line style, and marker style. Note that this method only works with the single-character color codes;
to use other colors, use the color argument. By specifying only a marker style, this function can
also be used to create scatterplots.
There are a number of functions that do essentially the same thing as plot() but also change
the axis scaling, including loglog(), semilogx(), semilogy(), and polar(). Each of these functions
is used in the same manner as plot(), and has identical syntax.
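The scaling variants can be compared side by side, as in this sketch. The functions being plotted are illustrative assumptions.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(1, 100, 200)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].loglog(x, x**3)                 # Logarithmic x- and y-axes.
axes[1].semilogx(x, np.log(x))          # Logarithmic x-axis only.
axes[2].semilogy(x, np.exp(x / 10))     # Logarithmic y-axis only.

scales = [(ax.get_xscale(), ax.get_yscale()) for ax in axes]
print(scales)

# Call plt.show() to display the figure in an interactive session.
```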
Bar Plots
Bar plots are an effective way to graph categorical data. They are made using the bar()
function. The most important arguments are the first two, x and height, which provide the data. The
first argument is a list of positions or labels for the bars, either categorical or numerical; the
second argument is a list of numerical values giving the height of each bar. Other parameters
may be included as well. The width argument adjusts the bar widths; it can be a single value for
all of the bars, or an array giving each bar its own width. The argument bottom specifies where
each bar begins on the y-axis. Lastly, the align argument can be set to 'center' or 'edge' to
control how each bar is aligned with its x-coordinate. As with all plots, the color keyword can be
used to specify a color. To make a horizontal bar graph, use the function barh(), which has
analogous syntax but with argument names y, width, height, and align.
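A minimal sketch of vertical and horizontal bar plots of the same data. The category labels and values are illustrative assumptions.

```python
from matplotlib import pyplot as plt

labels = ["a", "b", "c", "d"]
values = [3, 7, 2, 5]

fig, (ax1, ax2) = plt.subplots(1, 2)

# Vertical bars: x gives the categories, height gives the bar heights.
bars = ax1.bar(labels, values, width=0.6, align="center", color="C2")

# Horizontal bars: y gives the categories, width gives the bar lengths,
# and height is the thickness of each bar.
hbars = ax2.barh(labels, values, height=0.6, color="C3")

# Call plt.show() to display the figure in an interactive session.
```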
Box Plots
A box plot is a way to visualize some simple statistics of a dataset. It plots the median along with
the first and third quartiles of the data, with whiskers extending toward the minimum and maximum;
by default, Matplotlib ends each whisker at the farthest data point within 1.5 times the
interquartile range of the box and draws any points beyond that individually. This is done by using
boxplot() with an array of data as the argument. Matplotlib accepts either a 1-dimensional
array for a single box plot, or a 2-dimensional array, in which case it draws a box plot for each
column of the array. Box plots default to having a vertical orientation but can be easily laid out
horizontally by setting vert=False.
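A minimal sketch of a box plot with one box per column. The random data are an illustrative assumption.

```python
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))    # Three columns, so three boxes.

fig, ax = plt.subplots()
result = ax.boxplot(data)           # Dictionary of the artists that were drawn.

# The height of each median line matches the median of its column.
median_heights = [line.get_ydata()[0] for line in result["medians"]]
print(median_heights)

# Call plt.show() to display the figure in an interactive session.
```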
>>> x = np.linspace(0,1,100)
>>> y = np.linspace(0,1,80)
>>> X, Y = np.meshgrid(x, y)
The z-coordinate can then be computed using the x and y mesh grids.
Note that each of these functions can accept a colormap, using the cmap parameter. These
plots are sometimes more informative with a logarithmic color scale, which can be used by passing a
matplotlib.colors.LogNorm object in the norm parameter of these functions.
With pcolormesh(), older versions of Matplotlib may also require passing shading='auto' or
shading='nearest' to avoid a deprecation warning.
The following example demonstrates creating heatmaps and contour plots, using a graph of
z = (x^2 + y) sin(y). The results are shown in Figure B.5.
>>> x = np.linspace(-3,3,100)
>>> y = np.linspace(-3,3,100)
>>> X, Y = np.meshgrid(x, y)
>>> Z = (X**2+Y)*np.sin(Y)
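#Heatmap (plotting call reconstructed; the colormap choice is illustrative)
>>> plt.subplot(1,3,1)
>>> plt.pcolormesh(X, Y, Z, cmap='viridis', shading='nearest')
>>> plt.title("Heatmap")
#Contour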
>>> plt.subplot(1,3,2)
>>> plt.contour(X, Y, Z, cmap='magma')
>>> plt.title("Contour plot")
#Filled contour
>>> plt.subplot(1,3,3)
>>> plt.contourf(X, Y, Z, cmap='coolwarm')
>>> plt.title("Filled contour plot")
>>> plt.colorbar()
>>> plt.show()
Showing images
The function imshow() is used for showing an image in a plot, and can be used on either grayscale
or color images. This function accepts a 2-D n × m array for a grayscale image, or a 3-D n × m × 3
array for a color image. If using a grayscale image, you also need to specify cmap='gray', or it will
be colored incorrectly.
It is best to also use axis('equal') alongside imshow(), or the image will most likely be
stretched. This function also works best if the image's values are in the range [0, 1]. Some ways of
loading images format their values as integers from 0 to 255, in which case the values in the image
array should be scaled before using imshow().
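The sketch below shows the scaling step on a synthetic image. The random 8-bit array stands in for a loaded image and is an assumption made so the example is self-contained.

```python
import numpy as np
from matplotlib import pyplot as plt

# Synthetic stand-in for a loaded color image with integer values 0 to 255.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(40, 60, 3), dtype=np.uint8)

scaled = image / 255.0         # Rescale the values to the range [0, 1].
gray = scaled.mean(axis=2)     # Average the color channels: a 2-D grayscale image.

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(scaled)                 # 3-D array: shown as a color image.
ax2.imshow(gray, cmap="gray")      # 2-D array: needs an explicit gray colormap.

# Call plt.show() to display the figure in an interactive session.
```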
3-D Plotting
Matplotlib can be used to plot curves and surfaces in 3-D space. In order to use 3-D plotting, you
need to run the following line:
The argument projection='3d' also must be specified when creating the subplot for the 3-D object:
Curves can be plotted in 3-D space using plot(), by passing in three lists of x-, y-, and z-
coordinates. Surfaces can be plotted using ax.plot_surface(). This function is used similarly to
the functions for contour plots and heatmaps: obtain meshes of x- and y-coordinates from
np.meshgrid() and use those to compute the z-coordinates. More generally, any three 2-D arrays of
meshes corresponding to x-, y-, and z-coordinates can be used. Note that it is necessary to call this
function from an Axes object.
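A minimal sketch of a 3-D curve and a 3-D surface. The helix and the saddle surface are illustrative assumptions, and the explicit Axes3D import is included for older versions of Matplotlib (recent versions register the '3d' projection automatically).

```python
import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (needed only by older Matplotlib)

fig = plt.figure(figsize=(10, 4))

# A curve: pass the x-, y-, and z-coordinates to plot().
t = np.linspace(0, 4 * np.pi, 200)
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
ax1.plot(np.cos(t), np.sin(t), t)

# A surface: build x and y meshes, then compute z on the mesh.
x = np.linspace(-1, 1, 50)
X, Y = np.meshgrid(x, x)
Z = X**2 - Y**2
ax2 = fig.add_subplot(1, 2, 2, projection="3d")
ax2.plot_surface(X, Y, Z, cmap="viridis")

# Call plt.show() to display the figure in an interactive session.
```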
The following example demonstrates creating 3-D plots. The results are shown in Figure B.6.
plt.show()
Figure B.6: Examples of 3-D plotting.
Additional Resources
rcParams
The default plotting parameters of Matplotlib can be set individually and with more fine control than
styles by using rcParams. rcParams is a dictionary that can be accessed as either plt.rcParams or
matplotlib.rcParams.
For instance, the resolution of plots can be changed via the "figure.dpi" parameter:
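A minimal sketch of changing this parameter. The value 150 is an illustrative assumption, and the original value is restored at the end so that later plots are unaffected.

```python
from matplotlib import pyplot as plt

original_dpi = plt.rcParams["figure.dpi"]    # Remember the current default.

plt.rcParams["figure.dpi"] = 150             # New figures now default to 150 dpi.
active_dpi = plt.rcParams["figure.dpi"]
fig = plt.figure()                           # Created with the new default.

plt.rcParams["figure.dpi"] = original_dpi    # Restore the previous default.
```

For a temporary change, plt.rc_context({"figure.dpi": 150}) can be used in a with block instead.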
A list of parameters that can be set via rcParams can be found at
https://matplotlib.org/stable/api/matplotlib_configuration_api.html#matplotlib.RcParams.
Animations
Matplotlib has capabilities for creating animated plots. The Animations lab in Volume 4 has detailed
instructions on how to do so.