CSC110
Intro
- Data type – a way of categorizing data
- Data type conveys:
o The allowed values for a piece of data
o The allowed operations we can perform on a piece of data
Numeric Data
- A natural number is a value from the set {0, 1, 2, …}
o Denoted by symbol ℕ
o In computer science, 0 is a natural number
- An integer is a value from the set {…, -2, -1, 0, 1, 2, …}
o Denoted by symbol ℤ
- A rational number is a value from the set {𝑝/𝑞 | 𝑝, 𝑞 ∈ ℤ and 𝑞 ≠ 0}
o Denoted by symbol ℚ
- An irrational number is a number with an infinite and non-repeating decimal expansion
o i.e. π, e
o Denoted by symbol ℚ̅ (ℚ with an overbar), to distinguish it from the rationals ℚ
- A real number is either a rational or irrational number
o Denoted by symbol ℝ
Boolean Data
- A Boolean is a value from the set {True, False}
o Represents the answer to a Yes/No question
Textual Data
- A string is an ordered sequence of characters; it is used to represent text
Writing Lists
- Written with square brackets enclosing zero or more values separated by commas
o i.e. [1, 2, 3]
- Empty list – a list having zero elements
o Denoted by []
Mapping Data
- Mapping – an unordered collection of pairs of values
- Each pair consists of a key and an associated value
- The keys must be unique, but the values can be duplicated
- A key cannot exist without a corresponding value
- Used to represent associations between two collections of data
o i.e. a mapping from the name of a country to its GDP; a mapping from student
number to name; etc.
Writing Mappings
- Curly braces are used to represent a mapping
o Similar to sets as both mappings and sets are unordered and both have a
uniqueness constraint (a set’s elements; a mapping’s keys)
- Each key-value pair in a mapping is written using a colon, with the key on the left side of
the colon and its associated value on the right
- i.e. {‘fries’ : 5.99, ‘steak’ : 25.99, ‘soup’ : 8.99}
Operations on Mappings
- |M|
o Returns the size of the mapping M
i.e. the number of key-value pairs in M
- M=N
o Returns whether two mappings are equal
i.e. when they contain exactly the same key-value pairs
- 𝑘∈𝑀
o Returns whether k is a key contained in the mapping M
- M[k]
o When k is a key in M, this operation returns the value that corresponds to k in
the mapping M
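In Python, a mapping is usually represented with the dict data type. The following is a minimal sketch of how the four operations above might look; the menu comes from the earlier example, and the variable names are chosen here only for illustration.

```python
# A mapping from menu item to price (values from the example above).
menu = {'fries': 5.99, 'steak': 25.99, 'soup': 8.99}
same_menu = {'soup': 8.99, 'fries': 5.99, 'steak': 25.99}

size = len(menu)                # |M|: the number of key-value pairs
are_equal = menu == same_menu   # M = N: same pairs, order is irrelevant
has_fries = 'fries' in menu     # k ∈ M: checks keys, not values
has_price = 5.99 in menu        # 5.99 is a value, not a key
fries_price = menu['fries']     # M[k]: the value associated with key k
```

Note that the membership operation only looks at keys, so has_price is False even though 5.99 appears as a value.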
Extra
- Images can be represented as a list of integers
o Each element in the list corresponds to a dot, called a pixel, on the screen
o For each dot, three integer values are used to represent three colour channels:
red, green, and blue
o We can add these channels together to get a very wide range of colours
Called the RGB colour model
Variables
- Variable – a piece of code that refers to a value
- We create variables in Python using the syntax:
o <variable> = <expression>
Called an assignment statement
When we execute an assignment statement, it doesn’t produce a value –
it instead defines a variable
- Expression – a piece of Python code that is evaluated to produce a value
- Python executes an assignment statement in 2 steps:
1. The expression on the right side of the = is evaluated, producing a value
2. That value is bound to the variable on the left side
- After the assignment statement is executed, the variable may be used to refer to the
value
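As a small illustration of the two steps above, the sketch below uses two made-up variable names (total and doubled, not from the notes).

```python
# Step 1: the right side, 2 + 3, is evaluated to produce the value 5.
# Step 2: that value is bound to the variable named on the left side.
total = 2 + 3

# The variable can now be used in later expressions to refer to 5.
doubled = total * 2
```

Neither assignment statement produces a value of its own; each one only defines a variable.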
def square(x: float) -> float:
    """Return x squared.

    >>> square(3.0)
    9.0
    >>> square(2.5)
    6.25
    """
    return x ** 2
- Function header – the first line, def square(x: float) -> float:
o Conveys the following pieces of information:
The function’s name (square)
The number and type of arguments the function expects
• Parameter – a variable in a function definition that refers to an
argument when the function is called
• The function has one parameter with name x and type float
The function’s return type
• The type following the -> (float)
o Syntax for a function header for a unary function:
def <function_name>(<parameter_name>: <parameter_type>) -> <return_type>:
We choose the name square rather than f as the function name
We use data types to specify the function domain and codomain
• x: float specifies that the parameter x must be a float value
• -> float specifies that this function always returns a float value
Type contract – the domain-codomain restriction expressed with data types, analogous to
𝑓: ℝ → ℝ
• For square, the type contract is float -> float
- Function docstring – the next lines that start and end with triple-quotes (“””)
o Another way of writing a comment in Python
o Text that is meant to be read by humans, but not executed as Python code
o Goal: to communicate what the function does
o First part of the docstring, Return x squared, is an English description of the
function
o Second part of the docstring looks like Python code
The first example: “when you type square(3.0) into the Python console,
9.0 is returned”
The second example: “when you type square(2.5) into the Python
console, 6.25 is returned”
These examples are called doctest examples
o The function docstring is indented inside the function header, as a visual
indicator that it is part of the overall function definition
- Body – the final line, return x ** 2
o Code that is executed when the function is called
o Also indented like the function docstring
o Uses keyword return, which signals the return statement
Form of return statement: return <expression>
o When a return statement is executed,
1. The <expression> is evaluated, producing a value
2. That value is then returned to wherever the function was called
• No more code in the function body is executed after this point
p1 and p2 are tuples of the form (x, y), where the x- and y-coordinates
are points.
Intro
- A function call can only access its own variables, but not variables defined within other
functions
def square(x: float) -> float:
    """Return x squared.

    >>> square(3.0)
    9.0
    >>> square(2.5)
    6.25
    """
    return x ** 2
- The parameter x is a variable that is assigned a value based on when the function was
called
o This variable cannot be accessed from outside the body
- Local variable – variable limited to the function body
o i.e. x
- Scope – places in the code where a variable can be accessed
o A local variable of a function is a variable whose scope is the body of that
function
- Example:
>>> n = 10.0
>>> result = square(n + 3.5)
o 13.5 is assigned to the parameter x
o Incorrect memory model diagram:
Variable    Value
n           10.0
x           13.5
o We group the variables together based on whether they are introduced in the
Python console or inside a function:
__main__ (console)          square
Variable    Value           Variable    Value
n           10.0            x           13.5
o We use the name __main__ to label the table for variables defined in the Python
console
o Inside the body of square, the only variable that can be used is x
At the point that the body of square is evaluated, only the “square” table
in the memory model is active
o Outside in the Python console, the only variable that can be used is n
After square returns and we’re back to the Python console, the “square”
table is no longer accessible, and only the __main__ table is active
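The scope rule can be observed directly. The sketch below repeats the example call and then tries to use x outside the function; the flag name x_is_accessible is made up for this illustration.

```python
def square(x: float) -> float:
    """Return x squared."""
    return x ** 2


n = 10.0
result = square(n + 3.5)  # inside this call, x refers to 13.5

# After the call returns, x does not exist from the caller's point of view,
# so evaluating it raises a NameError.
try:
    x
    x_is_accessible = True
except NameError:
    x_is_accessible = False
```

Here result refers to 182.25, while x_is_accessible ends up False because x was local to square.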
Intro
- Modules – Python code files
- The modules are not automatically loaded
>>> is_even(2)
True
>>> is_even(17)
False
“””
4. Implement the function body
a. Indent it to match the docstring
b. Review the examples and consider how we determined the return values
c. i.e.
def is_even(value: int) -> bool:
    """Return whether value is even.

    >>> is_even(2)
    True
    >>> is_even(17)
    False
    """
    return value % 2 == 0
5. Test the function
a. Test all examples
i. Try with some tricky cases as well
b. Test by calling it in the Python console
c. If we encounter any errors/incorrect return values, make sure that our tests are
correct
d. Go back to Step 4 and try to identify and fix any possible errors in the code
i. Called debugging
# In file trues.py
# In file test_trues.py
if __name__ == '__main__':
    import pytest
    pytest.main(['test_trues.py'])
Intro
- We can convert values between different data types
- i.e.
>>> int(‘10’)
10
>>> float(‘10’)
10.0
>>> bool(1000)
True
>>> bool(0)
False
>>> list({1, 2, 3})
[1, 2, 3]
>>> set([1, 2, 3])
{1, 2, 3}
>>> dict([('a', 1), ('b', 2), ('c', 3)])
{'a': 1, 'b': 2, 'c': 3}
- Every value of the data types we’ve studied so far has a string representation
o i.e.
>>> str(10)
‘10’
>>> str(True)
‘True’
>>> str([1, 2, 3])
‘[1, 2, 3]’
Intro
- Numbers represent textual data via functions
- Bit – what we call each 0 or 1
- ASCII – a function with domain {0, 1, …, 127} whose codomain is the set of all possible
characters
o Each number is represented by a length-7 sequence of bits
Can represent 2⁷ = 128 different characters
o The function is one-to-one, meaning no two numbers map to the same character
o This standard covered all English letters (lowercase and uppercase), digits,
punctuation, and various others (e.g. to communicate a new line)
i.e. 65 is mapped to ‘A’ and 126 is mapped to ‘~’, etc.
- Computer scientists extended ASCII from length-7 to length-8 sequences of bits, and
hence its domain increased to size 256 ({0, 1, …, 255})
o Allowed “extended ASCII” to support some other characters used in similar
Latin-based languages
i.e. ‘é’ (233), ‘ö’ (246), ‘©’ (169), etc.
- The latest standard, Unicode, uses up to 32 bits, which gives us a domain of
{0, 1, …, 2³² − 1}, over 4 billion different numbers
o This number is larger than the number of distinct characters in use across all
different languages
o There are many unused numbers in the domain of Unicode
o Some of these unused numbers are being assigned to emojis
The same emoji may be drawn differently on another device
o Adding an emoji involves submitting a proposal for the new emoji and computer
scientists supporting newly approved emojis by updating their software
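Python exposes this number-to-character mapping through the built-in functions ord and chr. The short sketch below reuses the numbers quoted above; the variable names are illustrative only.

```python
# ord maps a character to its number; chr is the inverse mapping.
code_for_a = ord('A')
char_for_126 = chr(126)

# Characters outside 7-bit ASCII still have a number in Unicode.
code_for_e_acute = ord('é')

# Every ASCII number fits in a length-7 sequence of bits.
fits_in_7_bits = code_for_a < 2 ** 7
```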
Propositions
- Propositional logic – an elementary system of logic that is a crucial building block
underlying other systems of logic
- Proposition – a statement that is either True or False
o i.e. 2 + 4 = 6; 3 – 5 > 0; Python’s implementation of list.sort is correct on every
input list; etc.
- Propositional variables – variables that represent propositions
o By convention, propositional variable names are lowercase letters starting at p
- Propositional/Logical operator – an operator whose arguments must all be either True
or False
- Propositional formula – an expression that is built up from propositional variables by
applying the propositional operators
Summary
Operator        Notation    English               Python Operation
NOT             ¬𝑝          p is not true         not p
AND             𝑝 ∧ 𝑞       p and q               p and q
OR              𝑝 ∨ 𝑞       p or q (or both)      p or q
Implication     𝑝 ⇒ 𝑞       if p, then q          not p or q
Biconditional   𝑝 ⟺ 𝑞       p if and only if q    p == q
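To see why not p or q and p == q are reasonable Python translations, one can list the truth table by brute force. This is only a sketch; the list name rows is made up here.

```python
# Enumerate every combination of truth values for p and q.
rows = []
for p in [True, False]:
    for q in [True, False]:
        implication = not p or q    # p ⇒ q
        biconditional = p == q      # p ⟺ q
        rows.append((p, q, implication, biconditional))

for row in rows:
    print(row)
```

The implication is False only in the row where p is True and q is False, and the biconditional is True exactly when p and q have the same truth value.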
Intro
- Predicate – a statement whose truth value depends on 1 or more variables from any set
o Codomain of the function: {True, False}
o Use uppercase letters starting from P to represent predicates
o i.e. P(x) is defined to be the statement “x is a power of 2”
P(8) is True and P(7) is False
- Predicates can depend on more than 1 variable
o i.e. Q(x, y) means “x² = y”
Q(5, 25) is True since 5² = 25, but Q(5, 24) is False
- We must always give the domain of a predicate as part of its definition
o i.e. P(x): “x is a power of 2”, where 𝑥 ∈ ℕ
Quantification of Variables
- A predicate by itself does not have a truth value
o i.e. “x is a power of 2” is neither True nor False, since we don’t know the value
of x
o Setting x = 8 in the statement, the statement is now True
- Most of the time, we care about some aggregation of the predicate’s truth values over
all elements of its domain
o i.e. “Every real number x satisfies the inequality 𝑥² − 2𝑥 + 1 ≥ 0” makes a claim
about all possible values of x
- Quantifier – modifies a predicate by specifying how a certain variable should be
interpreted
- Existential quantifier
o Existential quantifier – written as ∃
“There exists an element in the domain that satisfies the given predicate”
i.e. ∃𝑥 ∈ ℕ, 𝑥 ≥ 0
“There exists a natural number x that is greater than or equal to 0.”
• True because when x = 1, x ≥ 0
o There has to be at least 1 element of the domain satisfying the predicate
Doesn’t say exactly how many elements do
o ∃𝑥 ∈ 𝑆 – a big OR that runs through all possible values for x from the domain S
i.e. for the previous example,
(0 ≥ 0) ∨ (1 ≥ 0) ∨ (2 ≥ 0) ∨ …
- Universal quantifier
o Universal quantifier – written as ∀
“Every element in the domain satisfies the given predicate”
i.e. ∀𝑥 ∈ ℕ, 𝑥 ≥ 0
“Every natural number x is greater than or equal to 0”
• True because the smallest natural number is 0 itself
i.e. ∀𝑥 ∈ ℕ, 𝑥 ≥ 10 is False
o ∀𝑥 ∈ 𝑆 – a big AND that runs through all possible values of x from S
i.e. for the first example,
(0 ≥ 0) ∧ (1 ≥ 0) ∧ (2 ≥ 0) ∧ …
- Example: Let Loves(a, b) be a binary predicate that is True whenever person a
loves person b
o A = {Ella, Patrick, Malena, Breanna}
o B = {Laura, Stanley, Thelonious, Sophia}
o In the diagram, a line between 2 people indicates that the person on the left loves
the person on the right. The lines connect:
Breanna to Thelonious and Stanley
Malena to Thelonious, Stanley, and Laura
Patrick to Stanley
Ella to Stanley and Laura
(no line reaches Sophia)
o Consider the following statements:
∃𝑎 ∈ 𝐴, 𝐿𝑜𝑣𝑒𝑠(𝑎, Thelonious) means “there exists someone in A who
loves Thelonious”
• True (Malena loves Thelonious)
∃𝑎 ∈ 𝐴, 𝐿𝑜𝑣𝑒𝑠(𝑎, Sophia) means “there exists someone in A who loves
Sophia”
• False (no one loves Sophia)
∀𝑎 ∈ 𝐴, 𝐿𝑜𝑣𝑒𝑠(𝑎, Stanley) means “every person in A loves Stanley”
• True (all 4 people in A love Stanley)
∀𝑎 ∈ 𝐴, 𝐿𝑜𝑣𝑒𝑠(𝑎, Thelonious) means “every person in A loves
Thelonious”
• False (Ella does not love Thelonious)
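Over a finite domain, the “big OR” and “big AND” readings of the quantifiers correspond to Python's built-in any and all. The sketch below assumes a finite stand-in for ℕ (the numbers 0 to 99), since an infinite set cannot be enumerated; the variable names are illustrative.

```python
# A finite stand-in for the domain ℕ (an assumption: ℕ itself is infinite).
domain = range(0, 100)

# ∃x ∈ domain, x ≥ 0: a "big OR" over the domain
exists_nonnegative = any({x >= 0 for x in domain})

# ∀x ∈ domain, x ≥ 0: a "big AND" over the domain
all_nonnegative = all({x >= 0 for x in domain})

# ∀x ∈ domain, x ≥ 10 is False because of counterexamples such as x = 0
all_at_least_ten = all({x >= 10 for x in domain})
```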
Manipulating Negation
- Given any formula, we can state its negation by preceding it by a ¬ symbol
o i.e. ¬(∀𝑥, 𝑦 ∈ ℕ, ∀𝑧 ∈ ℝ, 𝑃(𝑥, 𝑦) ⟹ 𝑄(𝑥, 𝑦, 𝑧))
o Hard to understand if we try to transliterate each part separately
- Given a formula using negations, we apply some simplification rules to “push” the
negation symbol closer to the individual predicates:
o ¬(¬𝑝) becomes 𝑝
o ¬(𝑝 ∨ 𝑞) becomes (¬𝑝) ∧ (¬𝑞)
o ¬(𝑝 ∧ 𝑞) becomes (¬𝑝) ∨ (¬𝑞)
o ¬(𝑝 ⟹ 𝑞) becomes 𝑝 ∧ (¬𝑞)
o ¬(𝑝 ⟺ 𝑞) becomes (𝑝 ∧ (¬𝑞)) ∨ ((¬𝑝) ∧ 𝑞)
o ¬(∃𝑥 ∈ 𝑆, 𝑃(𝑥)) becomes ∀𝑥 ∈ 𝑆, ¬𝑃(𝑥)
o ¬(∀𝑥 ∈ 𝑆, 𝑃(𝑥)) becomes ∃𝑥 ∈ 𝑆, ¬𝑃(𝑥)
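These simplification rules can be sanity-checked by brute force. The sketch below tests the propositional rules on all four combinations of p and q, and the quantifier rules on a small finite set; the helper implies, the set S, and the predicate P are assumptions made for this illustration, and a check on one finite example is not a proof.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Return the truth value of p ⇒ q."""
    return not p or q


# Check each propositional rule on all four combinations of p and q.
rules_hold = all(
    (not (not p)) == p
    and (not (p or q)) == ((not p) and (not q))
    and (not (p and q)) == ((not p) or (not q))
    and (not implies(p, q)) == (p and (not q))
    and (not (p == q)) == ((p and (not q)) or ((not p) and q))
    for p, q in product([True, False], repeat=2)
)

# Check the two quantifier rules on a small finite set S (an assumption).
S = range(10)


def P(x: int) -> bool:
    """Return whether x is even (an arbitrary example predicate)."""
    return x % 2 == 0


exists_rule = (not any(P(x) for x in S)) == all(not P(x) for x in S)
forall_rule = (not all(P(x) for x in S)) == any(not P(x) for x in S)
```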
The if Statement
- If statement – compound statement that expresses conditional execution of code
o Compound statement – contains other statements within it
o Syntax:
if <condition>:
    <statement>
    …
else:
    <statement>
    …
o If condition – the <condition> following if that evaluates to a boolean
Analogous to the hypothesis of an implication
o If branch – statements under the if
o Else branch – statements under the else
o When an if statement is executed,
The if condition is evaluated, producing a boolean value
If the condition evaluates to True, then the statements in the if branch
are executed.
If the condition evaluates to False, then the statements in the else branch
are executed instead
o i.e.
def get_status(scheduled: int, estimated: int) -> str:
“””Return the flight status for the given scheduled and estimated
departure times
Using if Statements
- Prefer using a sequence of elifs rather than nested if statements
- Write conditions from most specific to most general
o Order matters for conditions, since they are checked one at a time in top-down
order
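The ordering advice can be illustrated with a hypothetical three-way version of a flight status function. The status strings and the 3-hour threshold below are assumptions made for this sketch, not part of the notes.

```python
def get_status(scheduled: int, estimated: int) -> str:
    """Return a flight status for the given scheduled and estimated departure hours.

    The status strings and the 3-hour threshold are illustrative assumptions.
    """
    delay = estimated - scheduled
    if delay > 3:      # most specific condition first
        return 'Long delay'
    elif delay > 0:    # more general: any remaining late flight
        return 'Delayed'
    else:
        return 'On time'
```

If the two conditions were swapped, delay > 0 would also catch every flight with delay > 3, and the 'Long delay' branch would never run.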
Import Statements
- Allows the program that executes the import statement to access the functions and
data types defined within that module
- By default, all statements in the imported module are executed, not just function and
data type definitions
- Without the if __name__ == ‘__main__’ check, all the doctests inside the imported modules
will be run, even though they are not relevant for a program that just wants to use the
imported modules
Enter __name__
- __name__ - a special variable for each module when a program is run
o Double underscore denotes special variable or function names
o i.e. the __name__ attribute of math is ‘math’
>>> import math
>>> math.__name__
‘math’
- When we run a module, the Python interpreter overrides the default module __name__
and instead sets it to the special string ‘__main__’
- Checking the __name__ variable is a way to determine if the current module is being
run, or whether it’s being imported by another module
- if __name__ == ‘__main__’ – “Execute the following code if this module is being run,
and ignore the following code if this module is being imported by another module”
Simple Specifications
- Example:
def is_even(n: int) -> bool:
    """Return whether n is even.

    >>> is_even(1)
    False
    >>> is_even(2)
    True
    """
    # Body omitted
- The function’s type contract and description form a complete specification of this
function’s behaviour:
o The type annotation of the parameter n tells us that the valid inputs to is_even
are int values
This type annotation int specifies a precondition of the function
o The type annotation for the return value tells us that the function will always
return a bool
The function description Return whether n is even completes the
specification by indicating how the return value is based on the input
These specify postconditions of the function
- is_even is implemented correctly when for all ints n, is_even(n) returns a bool that is
True when n is even, and False when n is not even
- If this happens:
>>> is_even(4)
False
o It’s the implementer’s fault
- If this happens and an error occurs
>>> is_even([1, 2, 3])
o It’s the caller’s fault
Preconditions in General
- Consider this function:
def max_length(strings: set) -> int:
    """Return the maximum length of a string in the set of strings.

    Preconditions:
        - len(strings) > 0
    """
    return max({len(s) for s in strings})
o Now, when we call max_length(empty_set), and receive an error, it is our
fault for violating the precondition
- Checking preconditions automatically with python_ta
o Preconditions can be turned into executable Python code
Use an assert statement as follows:
def max_length(strings: set) -> int:
    """Return the maximum length of a string in the set of strings.

    Preconditions:
        - len(strings) > 0
    """
    assert len(strings) > 0, 'Precondition violated: max_length called on an empty set'
    return max({len(s) for s in strings})
• Now, precondition is checked every time the function is called,
with a meaningful error message when the precondition is
violated
o We can also use the python_ta library to check preconditions
if __name__ == '__main__':
    import python_ta.contracts
    python_ta.contracts.DEBUG_CONTRACTS = False  # Disable contract debug messages
    python_ta.contracts.check_all_contracts()
    max_length(set())
The function we import, check_all_contracts, takes the function type
contract and any preconditions it finds in the function docstring, and
causes the function to check the preconditions every time the function is
called
o check_all_contracts also checks the return type of each function
Intro
- Recall the function max_length
def max_length(strings: set) -> int:
    """Return the maximum length of a string in the set of strings.

    Preconditions:
        - len(strings) > 0
    """
    return max({len(s) for s in strings})
o >>> max_length({1, 2, 3}) raises an error even though {1, 2, 3} is a valid input
according to the type annotation set, which says nothing about the type of the elements
Preconditions:
len(strings) > 0
“””
return max({len(s) for s in strings})
- General collections
o The above typing module’s collection types are not needed when:
We don’t care what’s in the list
We want a list with elements of different types
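A minimal sketch of the contrast, assuming the typing module's Set type is used to state the element type: max_length needs its elements to be strings, while the hypothetical count_items does not care what its list contains.

```python
from typing import Set


def max_length(strings: Set[str]) -> int:
    """Return the maximum length of a string in the set of strings.

    Preconditions:
        - len(strings) > 0
    """
    return max({len(s) for s in strings})


def count_items(items: list) -> int:
    """Return the number of items (hypothetical helper; element types are irrelevant)."""
    return len(items)


longest = max_length({'a', 'bcd', 'ef'})
mixed_count = count_items([1, 'two', 3.0])
```

With the Set[str] annotation, a call such as max_length({1, 2, 3}) is clearly the caller's mistake, since the argument does not match the type contract.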
Intro
- Definitions – what we use to express a long idea using a single term
- Let 𝑛, 𝑑 ∈ ℤ. We say that d divides n, or n is divisible by d, when there exists a 𝑘 ∈ ℤ
such that 𝑛 = 𝑑𝑘
o In this case, we use the notation 𝑑 | 𝑛 to represent “d divides n”
o | is a binary predicate
i.e. 3 | 6 is True; 4 | 10 is False
o This definition permits 𝑑 = 0
When 𝑑 = 0, 𝑑 | 𝑛 if and only if 𝑛 = 0
- Expressing the statement “For every integer x, if x divides 10, then it also divides 100”
o With the predicate:
∀𝑥 ∈ ℤ, 𝑥 | 10 ⟹ 𝑥 | 100
o Without the predicate
Replace every instance of 𝑑 | 𝑛 with ∃𝑘 ∈ ℤ, 𝑛 = 𝑑𝑘
∀𝑥 ∈ ℤ, (∃𝑘 ∈ ℤ, 10 = 𝑘𝑥) ⟹ (∃𝑘 ∈ ℤ, 100 = 𝑘𝑥)
Each subformula in the parentheses has its own 𝑘 variable, whose scope
is limited by the parentheses
To emphasize their distinctness,
• ∀𝑥 ∈ ℤ, (∃𝑘₁ ∈ ℤ, 10 = 𝑘₁𝑥) ⟹ (∃𝑘₂ ∈ ℤ, 100 = 𝑘₂𝑥)
- Let 𝑝 ∈ ℤ. We say 𝑝 is prime when it is greater than 1 and the only natural numbers that
divide it are 1 and itself
- Define a predicate 𝐼𝑠𝑃𝑟𝑖𝑚𝑒(𝑝) to express the statement that “𝑝 is a prime number”
o First part: “greater than 1”
o Second part: “if a number 𝑑 divides 𝑝, then 𝑑 = 1 or 𝑑 = 𝑝”
o 𝐼𝑠𝑃𝑟𝑖𝑚𝑒(𝑝) ∶ 𝑝 > 1 ∧ (∀𝑑 ∈ ℕ, 𝑑 | 𝑝 ⇒ 𝑑 = 1 ∨ 𝑑 = 𝑝), where 𝑝 ∈ ℤ
o To express the idea without using divisibility predicate,
𝐼𝑠𝑃𝑟𝑖𝑚𝑒(𝑝) ∶ 𝑝 > 1 ∧ (∀𝑑 ∈ ℕ, (∃𝑘 ∈ ℤ, 𝑝 = 𝑘𝑑) ⇒ 𝑑 = 1 ∨ 𝑑 = 𝑝),
where 𝑝 ∈ ℤ
Expressing Definitions in Programs
- Consider the divisibility predicate |, where 𝑑 | 𝑛 means ∃𝑘 ∈ ℤ, 𝑛 = 𝑘𝑑 (for 𝑑, 𝑛 ∈ ℤ)
- Without using the modulo operator %, it is challenging to translate the mathematical
definition of divisibility precisely into a Python function
o We cannot represent infinite sets in a computer program
- We can use a property of divisibility to restrict the set of numbers to quantify over:
o When 𝑛 ≠ 0, every number that divides 𝑛 must lie in the range
{−|𝑛|, −|𝑛| + 1, …, |𝑛| − 1, |𝑛|}
- In Python, we represent the above set using the range data type:
o possible_divisors = range(-abs(n), abs(n) + 1)
- After we replace ℤ with possible_divisors, we can now translate the definition into
Python code
def divides(d: int, n: int) -> bool:
    """Return whether d divides n."""
    possible_divisors = range(-abs(n), abs(n) + 1)
    return any({n == k * d for k in possible_divisors})
- We can also translate the prime number definition in Python:
def is_prime(p: int) -> bool:
    """Return whether p is prime."""
    possible_divisors = range(1, p + 1)
    return (
        p > 1 and
        all({d == 1 or d == p for d in possible_divisors if divides(d, p)})
    )
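One way to gain confidence in the definition-based divides is to compare it with a version built on the modulo operator. The helper divides_mod below is hypothetical and exists only for this comparison; it treats d = 0 separately, following the earlier remark that 0 | n if and only if n = 0.

```python
def divides(d: int, n: int) -> bool:
    """Return whether d divides n, following the definition ∃k ∈ ℤ, n = kd."""
    possible_divisors = range(-abs(n), abs(n) + 1)
    return any({n == k * d for k in possible_divisors})


def divides_mod(d: int, n: int) -> bool:
    """Return whether d divides n, using % (hypothetical helper for comparison)."""
    if d == 0:
        return n == 0  # 0 | n if and only if n = 0
    return n % d == 0


# Compare the two versions on a small grid of inputs.
agree = all(
    divides(d, n) == divides_mod(d, n)
    for d in range(-12, 13)
    for n in range(-12, 13)
)
```

The comparison only covers a small grid of inputs, so it is evidence of agreement rather than a proof.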
Property-Based Testing
- Property-based testing – a single test that consists of a large set of possible inputs that is
generated in a programmatic way
- Property-based tests use assert statements to check for properties that the function
being tested should satisfy
- Possible properties that every output of the function should satisfy:
o The type of the output
The function str should always return a string
o Allowed values of the output
The function len should always return an integer that is greater than or
equal to zero
o Relationships between input and output
The function max(x, y) should return something that is greater than or
equal to both x and y
o Relationships between two (or more) input-output pairs
“For any two lists of numbers nums1 and nums2, we know that
sum(nums1 + nums2) == sum(nums1) + sum(nums2)”
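Before turning to a dedicated library, the idea can be sketched with the standard random module alone. The generator random_int_list, the seed, and the number of trials are assumptions made for this illustration; integers are used so that the sum property holds exactly.

```python
import random

random.seed(110)  # fixed seed so the run is repeatable


def random_int_list() -> list:
    """Return a list of 0 to 10 random ints (a hypothetical input generator)."""
    return [random.randint(-100, 100) for _ in range(random.randint(0, 10))]


checked = 0
for _ in range(200):
    nums1 = random_int_list()
    nums2 = random_int_list()
    # Relationship between two input-output pairs
    assert sum(nums1 + nums2) == sum(nums1) + sum(nums2)
    # Allowed values of the output
    assert len(nums1) >= 0
    # Type of the output
    assert isinstance(str(nums1), str)
    checked += 1
```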
Using hypothesis
- Consider the function is_even, which we define in a file called my_functions.py
# Suppose we've saved this in my_functions.py
def is_even(value: int) -> bool:
    """Return whether value is even.

    >>> is_even(2)
    True
    >>> is_even(17)
    False
    """
    return value % 2 == 0
- Rather than choosing specific inputs to test is_even on, we’re going to test the following
two properties:
o is_even always returns True when given an int of the form 2 * x (where x is an int)
o is_even always returns False when given an int of the form 2 * x + 1 (where x is an
int)
- Using symbolic notation:
o ∀𝑥 ∈ ℤ, is_even(2𝑥)
o ∀𝑥 ∈ ℤ, ¬is_even(2𝑥 + 1)
- To test the function, we first create a new file called test_my_functions.py and include
the following “test” function
# In file test_my_functions.py
from hypothesis import given
from hypothesis.strategies import integers

from my_functions import is_even


@given(x=integers())
def test_is_even_2x(x: int) -> None:
    """Test that is_even returns True when given a number of the form 2 * x."""
    assert is_even(2 * x)
- integers – a hypothesis function that returns a special data type called a strategy
o Strategy – what hypothesis uses to generate a range of possible inputs
o Calling integers() returns a strategy that generates ints
- given – a hypothesis function that takes in arguments in the form <param>=<strategy>,
which acts as a mapping from the test parameter name to a strategy that hypothesis
should use for generating arguments for that parameter
- The line @given(x=integers()) decorates the test function, so that when we run the test
function, hypothesis will call the test several times, using int values for x as specified by
the strategy integers()
o @given helps automate the process of “run the test on different int values”
- To actually run the test, we use pytest
if __name__ == '__main__':
    import pytest
    pytest.main(['test_my_functions.py', '-v'])
- Testing odd values
o We can write multiple property-based tests in the same file and have pytest run
each of them
o This version of test_my_functions.py adds a second test for numbers of the form
2𝑥 + 1
# In file test_my_functions.py
from hypothesis import given
from hypothesis.strategies import integers

from my_functions import is_even


@given(x=integers())
def test_is_even_2x(x: int) -> None:
    """Test that is_even returns True when given a number of the form 2 * x."""
    assert is_even(2 * x)


@given(x=integers())
def test_is_even_2x_plus_1(x: int) -> None:
    """Test that is_even returns False when given a number of the form 2 * x + 1."""
    assert not is_even(2 * x + 1)


if __name__ == '__main__':
    import pytest
    pytest.main(['test_my_functions.py', '-v'])
@given(nums=lists(integers()), x=integers())
def test_num_evens_one_more_even(nums: List[int], x: int) -> None:
    """Test num_evens when you add one more even element."""
    assert num_evens(nums + [2 * x]) == num_evens(nums) + 1


if __name__ == '__main__':
    import pytest
    pytest.main(['test_my_functions.py', '-v'])
- Choosing “enough” properties
o A single property alone does not guarantee that a function is correct
o The ideal goal of property-based testing is to choose a set of properties such that,
if all of the properties are verified, then the function must be correct
o An implementation for num_evens is correct (i.e. returns the number of even
elements for any list of numbers) if and only if it satisfies all 3 of the following,
where 𝐿_int denotes the set of all lists of integers:
num_evens([]) = 0
∀nums ∈ 𝐿_int, ∀𝑥 ∈ ℤ, num_evens(nums + [2𝑥]) = num_evens(nums) + 1
∀nums ∈ 𝐿_int, ∀𝑥 ∈ ℤ, num_evens(nums + [2𝑥 + 1]) = num_evens(nums)
o This means that we can be certain that our num_evens function is correct with
just 1 unit test and 2 property tests
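The three properties can be tried out on a candidate implementation. The body of num_evens below is one possible implementation assumed for this sketch, and the standard random module stands in for hypothesis so that the example has no extra dependencies; sampling inputs gives evidence, not the full guarantee described above.

```python
import random
from typing import List


def num_evens(nums: List[int]) -> int:
    """Return the number of even elements in nums (one possible implementation)."""
    return len([n for n in nums if n % 2 == 0])


random.seed(110)  # fixed seed so the run is repeatable
properties_hold = num_evens([]) == 0  # the single unit test
for _ in range(200):
    nums = [random.randint(-50, 50) for _ in range(random.randint(0, 8))]
    x = random.randint(-50, 50)
    # Appending an even number increases the count by exactly 1.
    even_ok = num_evens(nums + [2 * x]) == num_evens(nums) + 1
    # Appending an odd number leaves the count unchanged.
    odd_ok = num_evens(nums + [2 * x + 1]) == num_evens(nums)
    properties_hold = properties_hold and even_ok and odd_ok
```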
Intro
- Recall the love table
           Sophia   Thelonious   Stanley   Laura
Breanna    False    True         True      False
Malena     False    True         True      True
Patrick    False    False        True      False
Ella       False    False        True      True
- Since the 𝐿𝑜𝑣𝑒𝑠 predicate is binary, we can quantify both of its inputs
- ∀𝑎 ∈ 𝐴, ∀𝑏 ∈ 𝐵, 𝐿𝑜𝑣𝑒𝑠(𝑎, 𝑏)
o “For every person 𝑎 in 𝐴, for every person 𝑏 in 𝐵, 𝑎 loves 𝑏”
o The order in which we quantified 𝑎 and 𝑏 doesn’t matter
o “For every person 𝑏 in 𝐵, for every person 𝑎 in 𝐴, 𝑎 loves 𝑏” means the same
thing
- The following two formulas are equivalent:
o ∀𝑥 ∈ 𝑆1 , ∀𝑦 ∈ 𝑆2 , 𝑃(𝑥, 𝑦)
o ∀𝑦 ∈ 𝑆2 , ∀𝑥 ∈ 𝑆1 , 𝑃(𝑥, 𝑦)
- The following two formulas are also equivalent:
o ∃𝑥 ∈ 𝑆1 , ∃𝑦 ∈ 𝑆2 , 𝑃(𝑥, 𝑦)
o ∃𝑦 ∈ 𝑆2 , ∃𝑥 ∈ 𝑆1 , 𝑃(𝑥, 𝑦)
- The above would not be the case for a pair of alternating quantifiers
- ∀𝑎 ∈ 𝐴, ∃𝑏 ∈ 𝐵, 𝐿𝑜𝑣𝑒𝑠(𝑎, 𝑏)
o “For every person 𝑎 in 𝐴, there exists a person 𝑏 in 𝐵, such that 𝑎 loves 𝑏”
“Everyone in 𝐴 loves someone in 𝐵”
o This is true: every person in 𝐴 loves at least one person
𝑎 (from 𝐴)    𝑏 (a person in 𝐵 who 𝑎 loves)
Breanna       Thelonious
Malena        Laura
Patrick       Stanley
Ella          Stanley
o Since the quantifier ∃𝑏 ∈ 𝐵 occurs after 𝑎, the choice of 𝑏 is allowed to depend
on the choice of 𝑎
- ∃𝑏 ∈ 𝐵, ∀𝑎 ∈ 𝐴, 𝐿𝑜𝑣𝑒𝑠(𝑎, 𝑏)
o “There exists a person 𝑏 in 𝐵, where for every person 𝑎 in 𝐴, 𝑎 loves 𝑏”
“Someone in 𝐵 is loved by everyone in 𝐴”
o This is true because everyone in 𝐴 loves Stanley
𝑏 (from 𝐵)    Loved by everyone in 𝐴?
Sophia        No
Thelonious    No
Stanley       Yes
Laura         No
o Since the quantifier ∃𝑏 ∈ 𝐵 occurs before 𝑎, the choice of 𝑏 must be
independent of the choice of 𝑎
- When reading a nested quantified expression, always read from left to right and pay
attention to the order of the quantifiers
B = {
    'Sophia': 0,
    'Thelonious': 1,
    'Stanley': 2,
    'Laura': 3
}
- We can define a loves predicate, which takes in two strings and returns whether person a
loves person b
def loves(a: str, b: str) -> bool:
    """Return whether person a loves person b.

    Preconditions:
        - a in A
        - b in B

    >>> loves('Breanna', 'Sophia')
    False
    """
    a_index = A[a]
    b_index = B[b]
    return LOVES_TABLE[a_index][b_index]
- We can represent the statements in predicate logic we’ve written
o ∀𝑎 ∈ 𝐴, ∀𝑏 ∈ 𝐵, 𝐿𝑜𝑣𝑒𝑠(𝑎, 𝑏)
>>> all({loves(a, b) for a in A for b in B})
False
o ∃𝑎 ∈ 𝐴, ∃𝑏 ∈ 𝐵, 𝐿𝑜𝑣𝑒𝑠(𝑎, 𝑏)
>>> any({loves(a, b) for a in A for b in B})
True
o ∀𝑎 ∈ 𝐴, ∃𝑏 ∈ 𝐵, 𝐿𝑜𝑣𝑒𝑠(𝑎, 𝑏)
>>> all({any({loves(a, b) for b in B}) for a in A})
True
It is very hard to read the above statement
Never nest all/any calls
We can pull out the inner any into its own function
def loves_someone(a: str) -> bool:
    """Return whether a loves at least one person in B.

    Preconditions:
        - a in A
    """
    return any({loves(a, b) for b in B})
Preconditions:
- b in B
“””
return all({loves(a, b) for a in A})
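Putting the pieces together, the following self-contained sketch evaluates both alternating-quantifier statements. The index mapping for A and the contents of LOVES_TABLE are assumptions reconstructed from the love table above, and the helper name loved_by_everyone is hypothetical.

```python
# Index of each person's row (A) or column (B) in LOVES_TABLE.
# A's indices and LOVES_TABLE are assumptions that mirror the love table.
A = {'Breanna': 0, 'Malena': 1, 'Patrick': 2, 'Ella': 3}
B = {'Sophia': 0, 'Thelonious': 1, 'Stanley': 2, 'Laura': 3}
LOVES_TABLE = [
    [False, True, True, False],   # Breanna
    [False, True, True, True],    # Malena
    [False, False, True, False],  # Patrick
    [False, False, True, True],   # Ella
]


def loves(a: str, b: str) -> bool:
    """Return whether person a loves person b."""
    return LOVES_TABLE[A[a]][B[b]]


def loves_someone(a: str) -> bool:
    """Return whether a loves at least one person in B."""
    return any({loves(a, b) for b in B})


def loved_by_everyone(b: str) -> bool:
    """Return whether b is loved by every person in A (hypothetical helper name)."""
    return all({loves(a, b) for a in A})


# ∀a ∈ A, ∃b ∈ B, Loves(a, b)
everyone_loves_someone = all({loves_someone(a) for a in A})
# ∃b ∈ B, ∀a ∈ A, Loves(a, b)
someone_loved_by_everyone = any({loved_by_everyone(b) for b in B})
```

Each helper hides one quantifier, so neither top-level expression nests an all inside an any or vice versa.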
A Worked Example
- Question: what is the average number of marriage licenses issued by each civic centre?
- To solve the question, we need to create a dictionary comprehension
>>> names = {row[1] for row in marriage_data}
>>> names
{‘NY’, ‘TO’, ‘ET’, ‘SC’}
>>> {key: 0 for key in names}
{‘NY’: 0, ‘TO’: 0, ‘ET’: 0, ‘SC’: 0}
- Calculate the average number of marriage licenses issued per month for the ‘TO’ civic
centre
>>> [row for row in marriage_data if row[1] == ‘TO’] # The ‘TO’ rows
[[1660, ‘TO’, 367, datetime.date(2011, 1, 1)], [1664, ‘TO’, 383,
datetime.date(2011, 2, 1)]]
>>> [row[2] for row in marriage_data if row[1] == ‘TO’] # The ‘TO’ marriages
issued
[367, 383]
>>> issued_by_TO = [row[2] for row in marriage_data if row[1] == ‘TO’]
- Now issued_by_TO is a list containing the number of marriage licenses issued by the
‘TO’ civic centre
- We can calculate its average by dividing its sum by its length
>>> sum(issued_by_TO) / len(issued_by_TO)
375.0
- To merge the above code with the dictionary comprehension outline, first design a
function that calculates the average for only 1 civic centre
from typing import List
Preconditions:
- all({len(row) == 4 for row in data})
- data is in the format described in Section 4.1
“””
issued_by_civic_centre = [row[2] for row in data if row[1] == civic_centre]
if issued_by_civic_centre == []:
return 0.0
else:
total = sum(issued_by_civic_centre)
count = len(issued_by_civic_centre)
Preconditions:
- marriage_data is in the format described in Section 4.1
“””
names = {‘TO’, ‘NY’, ‘ET’, ‘SC’}
return {key: average_licenses_issued(marriage_data, key) for key in
names}
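The two steps can be combined into a runnable sketch. The rows below are the eight sample rows of the marriage license data set used in these notes; the name average_licenses_by_centre, the type annotations, and the docstrings are assumptions, and the set of names is computed from the data rather than written out.

```python
import datetime
from typing import Dict, List

marriage_data = [
    [1657, 'ET', 80, datetime.date(2011, 1, 1)],
    [1658, 'NY', 136, datetime.date(2011, 1, 1)],
    [1659, 'SC', 159, datetime.date(2011, 1, 1)],
    [1660, 'TO', 367, datetime.date(2011, 1, 1)],
    [1661, 'ET', 109, datetime.date(2011, 2, 1)],
    [1662, 'NY', 150, datetime.date(2011, 2, 1)],
    [1663, 'SC', 154, datetime.date(2011, 2, 1)],
    [1664, 'TO', 383, datetime.date(2011, 2, 1)],
]


def average_licenses_issued(data: List[list], civic_centre: str) -> float:
    """Return the average number of licenses issued per row for civic_centre.

    Return 0.0 if civic_centre does not appear in data.
    """
    issued = [row[2] for row in data if row[1] == civic_centre]
    if issued == []:
        return 0.0
    else:
        return sum(issued) / len(issued)


def average_licenses_by_centre(data: List[list]) -> Dict[str, float]:
    """Return a mapping from civic centre name to its average (hypothetical name)."""
    names = {row[1] for row in data}
    return {key: average_licenses_issued(data, key) for key in names}


averages = average_licenses_by_centre(marriage_data)
```

The empty-list check avoids a division by zero when a civic centre has no rows.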
from dataclasses import dataclass


@dataclass
class Person:
    """A custom data type that represents data for a person."""
    given_name: str
    family_name: str
    age: int
    address: str
o from dataclasses import dataclass is a Python import statement that lets us use
dataclass below
o @dataclass is a Python decorator. It tells Python that the data type we’re
defining is a data class
o class Person: signals the start of a class definition
The name of the class is Person
The rest of the code is indented to put it inside of the class body
o The next line is a docstring that describes the purpose of the class
o Each remaining line defines a piece of data associated with the class
Each piece of data is called an instance attribute of the class
For each instance attribute, we write a name and a type annotation
- General data class definition syntax
@dataclass
class <ClassName>:
    """Description of data class.
    """
    <attribute1>: <type1>
    <attribute2>: <type2>
    …
Representation Invariants:
- self.age >= 0
”””
given_name: str
family_name: str
age: int
address: str
- In the class docstring, we use the variable name self to refer to a generic instance of the
data class
o This use of self is Python convention
- Checking representation invariants automatically with python_ta
o Like preconditions, representation invariants are assumptions that we make
about values of a data type
i.e. we can assume that every Person instance has an age that’s greater
than or equal to zero
o Representation invariants are also constraints on how we can create a data class
instance
o python_ta.contracts supports checking all representation invariants:
# class Person above
if __name__ == '__main__':
    import python_ta.contracts
    python_ta.contracts.DEBUG_CONTRACTS = False
    python_ta.contracts.check_all_contracts()
o Then, creating a Person with a negative age would raise an AssertionError
Instance Attributes:
- given_name: the person’s given name
- family_name: the person’s family name
- age: the person’s age
- address: the person’s address
Representation Invariants:
- self.age >= 0
A Worked Example
- Recall the marriage license data set
marriage_data = [
[1657, ‘ET’, 80, datetime.date(2011, 1, 1)],
[1658, ‘NY’, 136, datetime.date(2011, 1, 1)],
[1659, ‘SC’, 159, datetime.date(2011, 1, 1)],
[1660, ‘TO’, 367, datetime.date(2011, 1, 1)],
[1661, ‘ET’, 109, datetime.date(2011, 2, 1)],
[1662, ‘NY’, 150, datetime.date(2011, 2, 1)],
[1663, ‘SC’, 154, datetime.date(2011, 2, 1)],
[1664, ‘TO’, 383, datetime.date(2011, 2, 1)]
]
- Rather than storing each row in the table as a list, we can introduce a new data class to
store this information
from dataclasses import dataclass
from datetime import date

@dataclass
class MarriageData:
    """A record of the number of marriage licenses issued in a civic centre in
    a given month.

    Instance Attributes:
    - id: a unique identifier for the record
    - civic_centre: the name of the civic centre
    - num_licenses: the number of licenses issued
    - month: the month these licenses were issued
    """
    id: int
    civic_centre: str
    num_licenses: int
    month: date
- Using the above data class, we can represent tabular data as a list of MarriageData
instances rather than a list of lists
- The values representing each entry in the table are the same, but how we “bundle” each
row of data into a single entity is different:
>>> marriage_data = [
MarriageData(1657, ‘ET’, 80, datetime.date(2011, 1, 1)),
MarriageData(1658, ‘NY’, 136, datetime.date(2011, 1, 1)),
MarriageData(1659, ‘SC’, 159, datetime.date(2011, 1, 1)),
MarriageData(1660, ‘TO’, 367, datetime.date(2011, 1, 1)),
MarriageData(1661, ‘ET’, 109, datetime.date(2011, 2, 1)),
MarriageData(1662, ‘NY’, 150, datetime.date(2011, 2, 1)),
MarriageData(1663, ‘SC’, 154, datetime.date(2011, 2, 1)),
MarriageData(1664, ‘TO’, 383, datetime.date(2011, 2, 1))
]
- Instead of writing row[1] and row[2] in a comprehension, we now write
row.civic_centre and row.num_licenses
o This is more explicit in what attributes of the data are accessed
o “Explicit is better than implicit”
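- A small sketch of a comprehension that uses attribute access instead of list indexing. The function name total_licenses and the four-row sample are assumptions made for illustration:

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class MarriageData:
    """A record of the number of marriage licenses issued in a civic centre in a given month."""
    id: int
    civic_centre: str
    num_licenses: int
    month: date


def total_licenses(data: List[MarriageData], civic_centre: str) -> int:
    """Return the total number of licenses issued by the given civic centre."""
    # row.civic_centre and row.num_licenses replace row[1] and row[2].
    return sum([row.num_licenses for row in data if row.civic_centre == civic_centre])


sample = [
    MarriageData(1657, 'ET', 80, date(2011, 1, 1)),
    MarriageData(1660, 'TO', 367, date(2011, 1, 1)),
    MarriageData(1661, 'ET', 109, date(2011, 2, 1)),
    MarriageData(1664, 'TO', 383, date(2011, 2, 1)),
]
```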
return sum_so_far
- We no longer need to use list indexing (numbers[n]) to access individual list elements
o The for loop in Python handles the extracting of individual elements for us
- Accumulators and tracing through loops
o The frequent reassignment of a variable can make loops hard to reason about
o We call the variable sum_so_far the loop accumulator
o Loop accumulator – stores an aggregated result based on the elements of the
collection that have been previously visited by the loop
o We can keep track of the execution of the different iterations of the loop in the
loop accumulation table consisting of three columns:
How many iterations have occurred so far
The value of the loop variable for that iteration
The value of the loop accumulator at the end of that iteration
Iteration   Loop variable (number)   Loop accumulator (sum_so_far)
0           N/A                      0
1           10                       10
2           20                       30
3           30                       60
o Use the _so_far suffix in the variable name of accumulator variables and add a
comment explaining the purpose of the variable
def my_sum(numbers: List[int]) -> int:
“””Return the sum of the given numbers.
return sum_so_far
- When the collection is empty
o When we call my_sum on an empty list,
>>> my_sum([])
0
o This happens because sum_so_far is assigned to 0, and then the for loop does
not execute any code, and so 0 is returned
o When the collection is empty, the initial value of sum_so_far is returned
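- A sketch of the accumulator pattern described above, written out in full. The loop body is reconstructed from the accumulation table (10, 20, 30 giving 10, 30, 60), so treat it as one plausible version rather than the exact course code:

```python
from typing import List


def my_sum(numbers: List[int]) -> int:
    """Return the sum of the given numbers."""
    # ACCUMULATOR sum_so_far: keep track of the running sum of the
    # elements in numbers.
    sum_so_far = 0

    for number in numbers:
        sum_so_far = sum_so_far + number

    return sum_so_far
```

Calling my_sum([]) skips the loop body entirely, so the initial value of the accumulator is what gets returned.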
return <x>_so_far
- Accumulating the product
from typing import List
return product_so_far
>>> my_len(‘David’)
5
“””
# ACCUMULATOR len_so_far: keep track of the number of
# characters in s seen so far in the loop
len_so_far = 0
for _ in s:
len_so_far = len_so_far + 1
return len_so_far
o We used an underscore here because we don’t care what the actual value of the
character is – we are only counting iterations
The loop variable is not used in the body of the for loop
return total_so_far
o Loop accumulation table:
Iteration Loop variable (item) Loop accumulator (total_so_far)
0 0.0
1 ‘fries’ 6.5
2 ‘hamburger’ 10.0
- Like sets, dictionaries are unordered
>>> my_len(‘David’)
5
>>> my_len([1, 2, 3])
3
>>> my_len({‘a’: 1000})
1
“””
# ACCUMULATOR len_so_far: keep track of the number of
# elements in collection seen so far in the loop
len_so_far = 0
for _ in collection:
len_so_far = len_so_far + 1
return len_so_far
- Accumulators can work with any iterable object
- Alternatives to for loops
o Many of the above examples can be solved using comprehensions rather than
loops
o Comprehensions are often shorter and more direct translations of a computation
than for loops
o For loops allow us to customize exactly how filtering and aggregation occurs
Multiple Accumulators
- For example, given a dictionary mapping menu items to prices, we can get the average
price using 2 accumulators:
def average_menu_price(menu: Dict[str, float]) -> float:
“””Return the average price of an item from the menu.
>>> count_vowels(‘aeiou’)
5
>>> count_vowels(‘David’)
2
“””
# ACCUMULATOR vowels_so_far: keep track of the number of vowels
# seen so far in the loop.
vowels_so_far = 0
for letter in s:
if letter in ‘aeiou’:
vowels_so_far = vowels_so_far + 1
return vowels_so_far
o If s is the empty string, the for loop does not iterate at all and the value 0
is returned
o Two cases for the loop body:
1. When letter is a vowel, the reassignment vowels_so_far =
vowels_so_far + 1 increases the number of vowels seen so far by 1
2. When letter is not a vowel, nothing else happens in the current
iteration because this if statement has no else branch.
• The vowel count remains the same
o Loop accumulation table for count_vowels(‘David’)
Loop Iteration Loop Variable letter Accumulator vowels_so_far
0 0
1 ‘D’ 0
2 ‘a’ 1
3 ‘v’ 1
4 ‘i’ 2
5 ‘d’ 2
o This function can be compared to an equivalent implementation using a filtering
comprehension
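- For comparison, a sketch of the filtering-comprehension version. The name count_vowels_comprehension is an assumption made for illustration:

```python
def count_vowels_comprehension(s: str) -> int:
    """Return the number of vowels in s, using a filtering comprehension."""
    # Keep only the letters that are vowels, then count how many were kept.
    return len([letter for letter in s if letter in 'aeiou'])
```

The if clause of the comprehension plays the role of the if statement in the loop body, and len plays the role of the accumulator.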
- Implementing max
from typing import List
Preconditions:
- numbers != []
return max_so_far
o The accumulator max_so_far is updated only when a larger number is
encountered (if number > max_so_far)
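- A sketch of what such a max function might look like. The name my_max and the choice to start the accumulator at numbers[0] are assumptions; the precondition numbers != [] is what makes that starting value safe:

```python
from typing import List


def my_max(numbers: List[int]) -> int:
    """Return the maximum value of the numbers in numbers.

    Preconditions:
    - numbers != []
    """
    # ACCUMULATOR max_so_far: keep track of the largest number seen so far.
    max_so_far = numbers[0]

    for number in numbers:
        if number > max_so_far:
            max_so_far = number

    return max_so_far
```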
- Existential search
def starts_with(strings: Iterable[str], char: str) -> bool:
“””Return whether one of the given strings starts with the character char.
Precondition:
- all({s != ‘’ for s in strings})
- len(char) == 1
for s in strings:
if s[0] == char:
starts_with_so_far = True
return starts_with_so_far
o To update the accumulator, we set it to True when the current string s starts
with char
o Loop accumulation table:
Iteration Loop variable s Accumulator starts_with_so_far
0 False
1 ‘Hello’ False
2 ‘Goodbye’ False
3 ‘David’ True
4 ‘Mario’ True
- Early returns
o The function starts_with performs unnecessary work because it must loop
through every element of the collection before returning a result
o As soon as condition s[0] == char evaluates to True, we know that the answer is
yes without checking any of the remaining strings
o We can use the return statement inside the body of the loop
o No code executes after the return statement
def starts_with_early_return(strings: Iterable[str], char: str) -> bool:
    """…"""
    for s in strings:
        if s[0] == char:
            return True
    return False
We no longer have the accumulator variable
- One common error
def starts_with_wrong(strings: Iterable[str], char: str) -> bool:
    """…"""
    for s in strings:
        if s[0] == char:
            return True
        else:
            return False
o The loop will only ever perform 1 iteration
o Existential searches are asymmetric:
The function can return True early as soon as it has found an element of
the collection meeting the desired criterion
To return False, it must check every element of the collection
- Universal search
def all_start_with(strings: Iterable[str], char: str) -> bool:
“””Return whether all of the given strings start with the character char.
Precondition:
- all({s != ‘’ for s in strings})
- len(char) == 1
return True
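- A sketch of a universal search with an early return. The loop body is an assumption, filled in as the mirror image of the existential case: the function can return False as soon as one string fails, but must check every string before returning True:

```python
from typing import Iterable


def all_start_with(strings: Iterable[str], char: str) -> bool:
    """Return whether all of the given strings start with the character char.

    Preconditions:
    - all({s != '' for s in strings})
    - len(char) == 1
    """
    for s in strings:
        if s[0] != char:
            # One counterexample is enough to answer False.
            return False

    return True
```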
Repeating Code
- Recall the my_sum function
def my_sum(numbers: List[int]) -> int:
“””Return the sum of the given numbers.
return sum_so_far
- For the my_sum function, we know that the indexes of numbers start at 0 and end at len(numbers) - 1
def my_sum_v2(numbers: List[int]) -> int:
“””…”””
# ACCUMULATOR sum_so_far: keep track of the running sum of the
# elements in numbers.
sum_so_far = 0
return sum_so_far
- Differences between my_sum and my_sum_v2:
o Loop variable number vs. i:
number refers to an element of the list numbers (starting with the first
element)
i refers to an integer (starting at 0)
o Looping over a list vs. a range:
for number in numbers causes the loop body to execute once for each
element in numbers
for i in range(0, len(numbers)) causes the loop body to execute once for
each integer in range(0, len(numbers))
o Updating the accumulator:
Since number refers to a list element, we can add it directly to the
accumulator
Since i refers to where we are in the list, we access the corresponding list
element using list indexing to add it to the accumulator
- Both our element-based and index-based implementations are correct here
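- A sketch of the index-based version being compared. The loop lines are reconstructed from the description above (for i in range(0, len(numbers)) with list indexing), so they are a plausible reading rather than a verbatim copy:

```python
from typing import List


def my_sum_v2(numbers: List[int]) -> int:
    """Return the sum of the given numbers, looping over indexes."""
    # ACCUMULATOR sum_so_far: keep track of the running sum of the
    # elements in numbers.
    sum_so_far = 0

    for i in range(0, len(numbers)):
        # i is a position in the list, so we index to get the element.
        sum_so_far = sum_so_far + numbers[i]

    return sum_so_far
```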
>>> count_adjacent_repeats(‘look’)
1
>>> count_adjacent_repeats(‘David’)
0
“””
- We want to use an accumulator variable that starts at 0 and increases by 1 every time
two adjacent repeated characters are found
- Comparisons:
o string[0] == string[1]
o string[1] == string[2]
o etc.
def count_adjacent_repeats(string: str) -> int:
“””…”””
# ACCUMULATOR repeats_so_far: keep track of the number of adjacent
# characters that are identical
repeats_so_far = 0
return repeats_so_far
o Since we are indexing string[i + 1], our loop variable i only needs to go up to n – 2
rather than n – 1
- We could not have implemented the above function using an element-based for loop
o We would not be able to access the character adjacent to the current one
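- A sketch of the index-based loop this discussion describes. The loop body is an assumption based on the comparisons string[i] == string[i + 1] and the stated upper bound of n - 2 for i:

```python
def count_adjacent_repeats(string: str) -> int:
    """Return the number of times in the given string that two adjacent characters are equal."""
    # ACCUMULATOR repeats_so_far: keep track of the number of adjacent
    # characters that are identical
    repeats_so_far = 0

    # range stops before len(string) - 1, so i + 1 is always a valid index.
    for i in range(0, len(string) - 1):
        if string[i] == string[i + 1]:
            repeats_so_far = repeats_so_far + 1

    return repeats_so_far
```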
counts stores the number of coins of each type, and denominations stores the
value of each coin type. Each element in counts corresponds to the element at
the same index in denoms.
Preconditions:
- len(counts) == len(values)
return money_so_far
return sum_so_far
- Without using sum, we need another for loop:
def sum_all(lists_of_numbers: List[List[int]]) -> int:
“””…”””
# ACCUMULATOR sum_so_far: keep track of the running sum of the numbers.
sum_so_far = 0
return sum_so_far
o The loop for number in numbers is nested within the loop for numbers in
lists_of_numbers
o If we call our doctest example, sum_all([[1, 2, 3], [10, -5], [100]]), the following
happens:
1. The assignment statement sum_so_far = 0 executes, creating our
accumulator variable
2. The outer loop is reached
• The loop variable numbers is assigned the first element in
lists_of_numbers, which is [1, 2, 3]
• Then, the body of the outer loop is executed. Its body is just 1
statement: the inner for loop, for number in numbers
o The inner loop variable number is assigned the first value
in numbers, which is 1
o The inner loop body gets executed, updating the
accumulator. sum_so_far is reassigned to 1 (since 0 + 1 ==
1)
o The inner loop iterates twice more, for number = 2 and
number = 3. At each iteration, the accumulator is updated,
first by adding 2 and then 3. At this point, sum_so_far = 6
(which is 0 + 1 + 2 + 3)
o After all 3 iterations of the inner loop occur, the inner loop
stops. The Python interpreter is done executing this
statement.
• The next iteration of the outer loop occurs; numbers is assigned to
the list [10, -5]
• Again, the body of the outer loop occurs
o The inner loop now iterates twice: for number = 10 and
number = -5. sum_so_far is reassigned twice more, with a
final value of 11 (which is 6 + 10 + -5)
• The outer loop iterates one more time, for numbers = [100]
• Again, the body of the outer loop occurs
o The inner loop iterates once, for number = 100.
sum_so_far is reassigned to 111 (which is 11 + 100)
• At last, there are no more iterations of the outer loop, and so it
stops
3. After the outer loop is done, the return statement executes, returning
the value of sum_so_far, which is 111
o The above behaviour can be summarized by a loop accumulation table
Outer loop   Outer loop variable   Inner loop   Inner loop variable   Accumulator
iteration    (numbers)             iteration    (number)              (sum_so_far)
0                                                                     0
1            [1, 2, 3]             0                                  0
1            [1, 2, 3]             1            1                     1
1            [1, 2, 3]             2            2                     3
1            [1, 2, 3]             3            3                     6
2            [10, -5]              0                                  6
2            [10, -5]              1            10                    16
2            [10, -5]              2            -5                    11
3            [100]                 0                                  11
3            [100]                 1            100                   111
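- A sketch of the nested loop that the trace above walks through. The two for lines follow the loop headers quoted in the notes; the accumulator update is an assumption consistent with the traced values:

```python
from typing import List


def sum_all(lists_of_numbers: List[List[int]]) -> int:
    """Return the sum of all the numbers in the given lists_of_numbers."""
    # ACCUMULATOR sum_so_far: keep track of the running sum of the numbers.
    sum_so_far = 0

    for numbers in lists_of_numbers:
        # The inner loop runs to completion once per outer iteration.
        for number in numbers:
            sum_so_far = sum_so_far + number

    return sum_so_far
```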
for x in set1:
    for y in set2:
        product_so_far = set.union(product_so_far, {(x, y)})
return product_so_far
o Loop accumulation table
Outer loop   Outer loop   Inner loop   Inner loop   Accumulator
iteration    var (x)      iteration    var (y)      (product_so_far)
0                                                   set()
1            10           0                         set()
1            10           1            5            {(10, 5)}
1            10           2            6            {(10, 5), (10, 6)}
1            10           3            7            {(10, 5), (10, 6), (10, 7)}
2            11           0                         {(10, 5), (10, 6), (10, 7)}
2            11           1            5            {(10, 5), (10, 6), (10, 7), (11, 5)}
2            11           2            6            {(10, 5), (10, 6), (10, 7), (11, 5), (11, 6)}
2            11           3            7            {(10, 5), (10, 6), (10, 7), (11, 5), (11, 6), (11, 7)}
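- A complete version of the nested loop traced in this table, as a sketch. The function name product, its type annotations, and the initial value set() are assumptions; only the two for lines and the set.union update come from the notes:

```python
from typing import Set, Tuple


def product(set1: Set[int], set2: Set[int]) -> Set[Tuple[int, int]]:
    """Return the Cartesian product of set1 and set2."""
    # ACCUMULATOR product_so_far: keep track of the pairs formed so far.
    product_so_far = set()

    for x in set1:
        for y in set2:
            # set.union returns a new set, which is reassigned to the accumulator.
            product_so_far = set.union(product_so_far, {(x, y)})

    return product_so_far
```

Because sets are unordered, the order in which pairs are added may differ from the table, but the final set is the same.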
Outer and Inner Accumulators
- Each loop can have its own accumulator
- Example: suppose we have a list of lists of integers called grades
grades = [
[70, 75, 80], # ENG196
[70, 80, 90, 100], # CSC110
[80, 100] # MAT137
]
o Each element of grades corresponds to a course and contains a list of grades
obtained in that course
o The list of grades for course ENG196 does not have the same length as CSC110 or
MAT137
- We have defined a function average that calculates the average of a list of ints
- Goal: return a new list containing the average grade of each course
- We can calculate a list of averages for each course using a comprehension:
def course_averages_comprehension(grades: List[List[int]]) -> List[float]:
    """Return a new list for which each element is the average of the grades in the
    inner list at the corresponding position of grades.

    >>> course_averages_comprehension([[70, 75, 80], [70, 80, 90, 100], [80, 100]])
    [75.0, 85.0, 90.0]
    """
    return [average(course_grades) for course_grades in grades]
- We can translate the above function into a for loop using a list accumulator variable and
list concatenation for the update
def course_averages_loop(grades: List[List[int]]) -> List[float]:
“””…”””
# ACCUMULATOR averages_so_far: keep track of the averages of the lists
# visited so far in grades
averages_so_far = []
return averages_so_far
- We can also implement the function without using the average function by expanding
the definition of average directly in the loop body
def course_averages(grades: List[List[int]]) -> List[float]:
“””…”””
# ACCUMULATOR averages_so_far: keep track of the averages of the lists
# visited so far in grades
averages_so_far = []
return averages_so_far
- The inner loop accumulators are assigned to inside the body of the outer loop rather
than at the top of the function body
o This is because len_so_far and total_so_far are specific to course_grades, which
changes at each iteration of the outer loop
o The statements len_so_far = 0 and total_so_far = 0 act to “reset” these
accumulators for each new course_grades list
- Loop accumulation table
Outer loop   Outer loop variable   Inner loop   Inner loop variable   Inner accumulator   Inner accumulator   Outer accumulator
iteration    (course_grades)       iteration    (grade)               (len_so_far)        (total_so_far)      (averages_so_far)
0                                                                                                             []
1            [70, 75, 80]          0                                  0                   0                   []
1            [70, 75, 80]          1            70                    1                   70                  []
1            [70, 75, 80]          2            75                    2                   145                 []
1            [70, 75, 80]          3            80                    3                   225                 [75.0]
2            [70, 80, 90, 100]     0                                  0                   0                   [75.0]
2            [70, 80, 90, 100]     1            70                    1                   70                  [75.0]
2            [70, 80, 90, 100]     2            80                    2                   150                 [75.0]
2            [70, 80, 90, 100]     3            90                    3                   240                 [75.0]
2            [70, 80, 90, 100]     4            100                   4                   340                 [75.0, 85.0]
3            [80, 100]             0                                  0                   0                   [75.0, 85.0]
3            [80, 100]             1            80                    1                   80                  [75.0, 85.0]
3            [80, 100]             2            100                   2                   180                 [75.0, 85.0, 90.0]
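- A sketch of the function this table traces, with the inner accumulators reset inside the outer loop. The loop bodies are assumptions reconstructed from the accumulator names and traced values, and the sketch assumes every inner list is non-empty:

```python
from typing import List


def course_averages(grades: List[List[int]]) -> List[float]:
    """Return a new list containing the average of each inner list of grades."""
    # ACCUMULATOR averages_so_far: keep track of the averages of the lists
    # visited so far in grades
    averages_so_far = []

    for course_grades in grades:
        # Inner accumulators are "reset" for each new course_grades list.
        len_so_far = 0
        total_so_far = 0

        for grade in course_grades:
            len_so_far = len_so_far + 1
            total_so_far = total_so_far + grade

        # List concatenation creates the updated outer accumulator.
        averages_so_far = averages_so_far + [total_so_far / len_so_far]

    return averages_so_far
```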
Variable Reassignment
- Assignment statement – __ = __
o Takes a variable name on the left side and an expression on the right side, and
assigns the value of the expression to the variable
- Variable reassignment – assigns a value to a variable that already refers to a value
>>> x = 1
>>> x = 5 # The variable x is reassigned on this line
- A variable reassignment changes which object a variable refers to
o x changes from referring to an object representing 1 to an object representing 5
- Used to update the accumulator variable inside the loop
Object Mutation
- Object mutation – an operation that changes the value of an existing object
o Python’s list data type contains several methods that mutate the given list object
rather than create a new one
- Function squares without mutation
def squares(nums: List[int]) -> List[int]:
“””Return a list of the squares of the given numbers.”””
squares_so_far = []
Mutating sets
- Two main mutating methods:
o set.add
o set.remove
- Re-implement our squares function with set instead of list
def squares(numbers: Set[int]) -> Set[int]:
    """…"""
    squares_so_far = set()
    for n in numbers:
        set.add(squares_so_far, n * n)
    return squares_so_far
o set.add will only add the element if the set does not already contain it
o Sets are unordered whereas list.append will add the element to the end of the
sequence
Mutating Dictionaries
- To mutate a dictionary,
o Add a new key-value pair
>>> items = {‘a’: 1, ‘b’: 2}
>>> items[‘c’] = 3
>>> items
{‘a’: 1, ‘b’: 2, ‘c’: 3}
The left side of the assignment is not a variable but instead an expression
representing a component of items (i.e. the key ‘c’ in the dictionary)
When this assignment statement is evaluated, the right side value 3 is
stored in the dictionary items as the corresponding value for ‘c’
o Change the associated value for a key-value pair
>>> items[‘a’] = 100
>>> items
{‘a’: 100, ‘b’: 2, ‘c’: 3}
The assignment statement takes an existing key-value pair and replaces
the value with a different one
Representation Invariants:
- self.age >= 0
“””
given_name: str
family_name: str
age: int
address: str
- We mutate instances of data classes by modifying their attributes
o We do this by assigning to their attributes directly, using dot notation on the left
side of an assignment statement
>>> p = Person(‘David’, ‘Liu’, 100, ’40 St. George Street’)
>>> p.age = 200
>>> p
Person(given_name=’David’, family_name=’Liu’, age=200, address=’40 St.
George Street’)
- Respect the representation invariants when mutating data class instances
Representing Objects
- Every piece of data is stored in a Python program in an object
- We cannot control which memory addresses are used to store objects, but we can
access a representation of this memory address using the built-in id function
>>> id(3)
1635361280
>>> id(‘words’)
4297547872
- Id – a unique int identifier to refer to the object
- Every object in Python has three important properties:
o Id
The only one among the three guaranteed to be unique
o Value
o Type
- A variable is not an object and so does not actually store data
o Variables store an id that refers to an object that stores data
i.e. variables contain the id of an object
- With a full object-based Python memory model, we draw one table-like structure on the
left showing the mapping between variables and object ids, and the objects on the
right
o Each object is represented as a box, with its id in the upper-left corner, type in
the upper-right corner, and value in the middle
o The actual object id reported by the id function is unimportant
We just need to know that each object has a unique identifier
>>> x = 3
>>> word = ‘bonjour’
Variables (__main__)    Objects
x     id92              id92  int  3
word  id5               id5   str  ‘bonjour’
o There is no 3 inside the box for variable x; instead, there is the id of an object
whose value is 3
- Assignment statements and evaluating expressions
o Evaluating an expression
Produces an id of an object representing the value of the expression
o Assignment statements
1. Evaluate the expression on the right side, yielding the id of an object
2. If the variable on the left side doesn’t already exist, create it
3. Store the id from the expression on the right side in the variable on the
left side
Representing Compound Data
- An instance of a compound data type does not store values directly
o Instead, it stores the ids of other objects
- Lists
o lst = [1, 2, 3]
Variables (__main__)    Objects
lst  id4                id4   list  [id10, id11, id12]  (indexes 0, 1, 2)
                        id10  int   1
                        id11  int   2
                        id12  int   3
4 separate objects on the diagram:
• One for each of the ints 1, 2, 3
• One for the list
- Sets
o my_set = {1, 2, 3}
Variables (__main__)    Objects
my_set  id4             id4   set   {id10, id11, id12}
                        id10  int   1
                        id11  int   2
                        id12  int   3
- Dictionaries
o my_dict = {‘a’: 1, ‘b’: 2}
Variables (__main__)    Objects
my_dict  id2            id2   dict  {id10: id12, id11: id13}
                        id10  str   ‘a’
                        id11  str   ‘b’
                        id12  int   1
                        id13  int   2
5 objects in total
- Data classes
o For the Person object
Variables (__main__)    Objects
david  id7              id7   Person  given_name id11, family_name id13, age id72, address id8
                        id11  str     ‘David’
                        id13  str     ‘Liu’
                        id72  int     100
                        id8   str     ‘123 Fake St’
- Use the convention of drawing a double box around objects that are immutable
Visualizing Variable Reassignment and Object Mutation
- Consider this case of variable reassignment
>>> s = [1, 2]
>>> s = [‘a’, ‘b’]
o Before reassignment
Variables (__main__)    Objects
s  id4                  id4   list  [id10, id11]  (indexes 0, 1)
                        id10  int   1
                        id11  int   2
o After reassignment
Variables (__main__)    Objects
s  id40                 id4   list  [id10, id11]  (indexes 0, 1)
                        id10  int   1
                        id11  int   2
                        id40  list  [id70, id71]  (indexes 0, 1)
                        id70  str   ‘a’
                        id71  str   ‘b’
o The original list object [1, 2] is not mutated
Variable reassignment does not mutate any objects
What the variable refers to is changed
- Consider this case of object mutation
>>> s = [1, 2]
>>> list.append(s, 3)
o Before mutation
Variables (__main__)    Objects
s  id4                  id4   list  [id10, id11]  (indexes 0, 1)
                        id10  int   1
                        id11  int   2
o After mutation
Variables (__main__)    Objects
s  id4                  id4   list  [id10, id11, id80]  (indexes 0, 1, 2)
                        id10  int   1
                        id11  int   2
                        id80  int   3
o No new list object is created
The list object [1, 2] is mutated, and a third id is added at its end
o The id stored in s is unchanged, even though the list it refers to has changed in size
- Consider this case of assigning to part of a compound data type
>>> s = [1, 2]
>>> s[1] = 300
o Before mutation
Variables (__main__)    Objects
s  id4                  id4   list  [id10, id11]  (indexes 0, 1)
                        id10  int   1
                        id11  int   2
o After mutation
Variables (__main__)    Objects
s  id4                  id4   list  [id10, id80]  (indexes 0, 1)
                        id10  int   1
                        id11  int   2
                        id80  int   300
o Rather than reassigning a variable, it reassigns an id that is part of an object
o This statement does mutate an object, and doesn’t reassign any variables
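- The three cases above can be checked with the built-in id function. This is a sketch; the variable original is an extra alias added only so that the first list stays alive for the comparison:

```python
s = [1, 2]
original = s                      # extra alias, used only for this demonstration
id_before = id(s)

list.append(s, 3)                 # object mutation: same list object, one more element
id_after_append = id(s)

s[1] = 300                        # assigning to part of the list: still the same object
id_after_index_assignment = id(s)

s = ['a', 'b']                    # variable reassignment: s now refers to a new object
id_after_reassignment = id(s)
```

The actual id values differ from run to run; only whether they are equal matters.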
Aliasing
- Let v1 and v2 be Python variables. v1 and v2 are aliases when they refer to the same
object
- Consider the following
>>> x = [1, 2, 3]
>>> y = [1, 2, 3]
>>> z = x # make z refer to the object that x refers to
o x and z are aliases, as they both reference the same object
They have the same id
o x and y refer to two different list objects stored separately in the computer’s memory
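- A short sketch of why aliasing matters. The is operator (which compares object identity) and the list.append call are additions made for illustration:

```python
x = [1, 2, 3]
y = [1, 2, 3]
z = x                 # z is an alias of x

same_object_xz = x is z       # identity: do x and z refer to the same object?
same_object_xy = x is y
equal_before = x == y         # value equality, before any mutation

list.append(z, 4)     # mutating through z also changes what x sees
```

After the append, x and z both show the new element because there is only one list object behind them, while y is unaffected.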
Variable Reassignment
- Example of variable reassignment
>>> x = (1, 2, 3)
>>> z = x
>>> z = (1, 2, 3, 40)
>>> x
(1, 2, 3)
o When we change z on the third line, x does not change this time
o We reassigned z to a new object, which has no effect on the object that x refers
to
- Reassigning breaks the aliasing
o Afterwards the two variables would no longer refer to the same object
Stack Frames
- Suppose we define the following function, and then call it in the Python console
def repeat(n: int, s: str) -> str:
    message = s * n
    return message
Intro
- If a function’s documentation does not specify that an object will be mutated, then it
must not be mutated
@given(lst=lists(integers()))
def test_squares_no_mutation_general(lst: List[int]) -> None:
“””Test that squares does not mutate the list it is given.”””
lst_copy = list.copy(lst) # Create a copy of lst (not an alias)
squares(lst)
Modular Arithmetic
- We often care about the remainder when we divide a number by another
- 𝐷𝑒𝑓𝑖𝑛𝑖𝑡𝑖𝑜𝑛. Let 𝑎, 𝑏, 𝑛 ∈ ℤ with 𝑛 ≠ 0. We say that 𝑎 is equivalent to 𝑏 modulo 𝑛
when 𝑛 | 𝑎 − 𝑏. In this case, we write 𝑎 ≡ 𝑏 (mod 𝑛)
- Modular equivalence can be used to divide up numbers based on their remainders when
divided by 𝑛
o Let 𝑎, 𝑏, 𝑛 ∈ ℤ with 𝑛 ≠ 0. Then 𝑎 ≡ 𝑏 (mod 𝑛) if and only if 𝑎 and 𝑏 have the
same remainder when divided by 𝑛
- Let 𝑎, 𝑏, 𝑐, 𝑛 ∈ ℤ with 𝑛 ≠ 0. Then the following hold:
o 𝑎 ≡ 𝑎 (mod 𝑛)
o If 𝑎 ≡ 𝑏 (mod 𝑛) then 𝑏 ≡ 𝑎 (mod 𝑛)
o If 𝑎 ≡ 𝑏 (mod 𝑛) and 𝑏 ≡ 𝑐 (mod 𝑛) then 𝑎 ≡ 𝑐 (mod 𝑛)
- Let 𝑎, 𝑏, 𝑐, 𝑑, 𝑛 ∈ ℤ with 𝑛 ≠ 0. If 𝑎 ≡ 𝑐 (mod 𝑛) and 𝑏 ≡ 𝑑 (mod 𝑛), then the following
hold:
o 𝑎 + 𝑏 ≡ 𝑐 + 𝑑 (mod 𝑛)
o 𝑎 − 𝑏 ≡ 𝑐 − 𝑑 (mod 𝑛)
o 𝑎𝑏 ≡ 𝑐𝑑 (mod 𝑛)
- Addition, subtraction, and multiplication operations preserve modular equivalence
relationships
o However, this is not the case with division
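- A quick numeric check of these properties, as a sketch. The helper equivalent_mod and the specific numbers are assumptions chosen for illustration; checking a few cases does not replace a proof:

```python
def equivalent_mod(a: int, b: int, n: int) -> bool:
    """Return whether a is equivalent to b modulo n, i.e. whether n divides a - b.

    Preconditions:
    - n != 0
    """
    return (a - b) % n == 0


# 17 ≡ 2 (mod 5) and 8 ≡ 3 (mod 5)
a, c, b, d, n = 17, 2, 8, 3, 5
sum_preserved = equivalent_mod(a + b, c + d, n)
difference_preserved = equivalent_mod(a - b, c - d, n)
product_preserved = equivalent_mod(a * b, c * d, n)

# Division can fail: 6 ≡ 2 (mod 4), but after dividing both sides by 2
# we would need 3 ≡ 1 (mod 4), which is false.
division_counterexample = equivalent_mod(6, 2, 4) and not equivalent_mod(3, 1, 4)
```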
Intro
- Mathematical proof – how we communicate ideas about the truth or falsehood of a
statement to others
- A proof is made of communication, from the person creating the proof to the person
digesting it
- Audience of our proof: an average computer science student
o Formal
o No assuming much background knowledge
First Examples
- Four parts that lead to a completed proof:
o 1. The statement that we want to prove or disprove
o 2. A translation of the statement into predicate logic
Provides insight into the logical structure of the statement
o 3. A discussion to try to gain some intuition about why the statement is true
Informal
Usually reveals the mathematical insight that forms the content of a
proof
The hardest part of developing a proof
o 4. A formal proof
The “final product” of our earlier work
- Ex. Prove that 23 | 115
o 𝑇𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑖𝑜𝑛. We will expand the definition of divisibility to rewrite this
statement in terms of simpler operations:
∃𝑘 ∈ ℤ, 115 = 23𝑘
o 𝐷𝑖𝑠𝑐𝑢𝑠𝑠𝑖𝑜𝑛. We just need to divide 115 by 23
o 𝑃𝑟𝑜𝑜𝑓. let 𝑘 = 5
o Then 115 = 23 ⋅ 5 = 23 ⋅ 𝑘 QED
- A typical proof of an existential
o Given statement to prove: ∃𝑥 ∈ 𝑆, 𝑃(𝑥)
o 𝑃𝑟𝑜𝑜𝑓. Let 𝑥 = ______
o [Proof that P(____) is True.] QED
o The two blanks represent the same element of 𝑆, which we get to choose as a
prover
- Ex. Prove that there exists an integer that divides 104
o 𝑇𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑖𝑜𝑛. We could write ∃𝑎 ∈ ℤ, 𝑎 ∣ 104. Expanding the definition of
divisibility,
∃𝑎, 𝑘 ∈ ℤ, 104 = 𝑎𝑘
o 𝐷𝑖𝑠𝑐𝑢𝑠𝑠𝑖𝑜𝑛. We get to pick both 𝑎 and 𝑘. Any pair of divisors will work
o 𝑃𝑟𝑜𝑜𝑓. Let 𝑎 = −2 and let 𝑘 = −52
o Then 104 = 𝑎𝑘 QED
- A mathematical proof must introduce all variables contained in the sentence being
proven
Alternating Quantifiers
- Ex. Prove that all integers are divisible by 1
o 𝑇𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑖𝑜𝑛. The statement contains a universal quantification: ∀𝑛 ∈ ℤ, 1 ∣ 𝑛.
Unpacking the definition of divisibility,
∀𝑛 ∈ ℤ, ∃ 𝑘 ∈ ℤ, 𝑛 = 1 ⋅ 𝑘
o 𝐷𝑖𝑠𝑐𝑢𝑠𝑠𝑖𝑜𝑛. The statement is valid when 𝑘 equals 𝑛. Introduce the variables in
the same order they are quantified in the statement
o 𝑃𝑟𝑜𝑜𝑓. Let 𝑛 ∈ ℤ. Let 𝑘 = 𝑛
o Then 𝑛 = 1 ⋅ 𝑛 = 1 ⋅ 𝑘 QED
- A typical proof of a universal
o Given statement to prove: ∀𝑥 ∈ 𝑆, 𝑃(𝑥)
o 𝑃𝑟𝑜𝑜𝑓. Let 𝑥 ∈ 𝑆. (i.e. let 𝑥 be an arbitrary element of 𝑆)
o [Proof that 𝑃(𝑥) is True]. QED
- Any existentially-quantified variable can be assigned a value that depends on the
variables defined before it
o In programming, we first initialize a variable n, and then define a new variable k
that is assigned the value of n
- The order of variables in the statement determines the order in which the variables
must be introduced in the proof
o And hence which variables can depend on which other variables
- We cannot use a variable before it’s defined
Intro
- Function that determines whether p is prime
def is_prime(p: int) -> bool:
    """Return whether p is prime."""
    possible_divisors = range(1, p + 1)
    return (
        p > 1 and
        all({d == 1 or d == p for d in possible_divisors if divides(d, p)})
    )
o It is a direct translation of the mathematical definition of prime numbers, with
the only difference being our restriction of the range of possible divisors
o This algorithm is inefficient because it checks more numbers than necessary
o The range of possible divisors only needs to extend to the square root of the input p
from math import floor, sqrt
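- A sketch of a faster primality check that stops at the square root. The name is_prime_v2, the local divides helper, and the early return for p <= 1 are assumptions made so the example runs on its own:

```python
from math import floor, sqrt


def divides(d: int, n: int) -> bool:
    """Return whether d divides n (helper defined here so the sketch is self-contained)."""
    if d == 0:
        return n == 0
    return n % d == 0


def is_prime_v2(p: int) -> bool:
    """Return whether p is prime, checking divisors only up to the square root of p."""
    if p <= 1:
        return False

    # If p = d * e with 1 < d <= e, then d <= sqrt(p), so checking
    # 2, ..., floor(sqrt(p)) is enough to find a divisor if one exists.
    possible_divisors = range(2, floor(sqrt(p)) + 1)
    return all(not divides(d, p) for d in possible_divisors)
```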
Intro
- 𝐷𝑒𝑓𝑖𝑛𝑖𝑡𝑖𝑜𝑛. Let 𝑥, 𝑦, 𝑑 ∈ ℤ. We say that 𝑑 is a common divisor of 𝑥 and 𝑦 when 𝑑
divides 𝑥 and 𝑑 divides 𝑦
- We say that 𝑑 is the greatest common divisor of 𝑥 and 𝑦 when it is the largest number
that is a common divisor of 𝑥 and 𝑦, or 0 when 𝑥 and 𝑦 are both 0
- We can define the function gcd∶ ℤ × ℤ → ℕ as the function which takes numbers 𝑥 and
𝑦, and returns their greatest common divisor
- If 𝑒 is any number which divides 𝑚 and 𝑛, then 𝑒 ≤ 𝑑
- Let 𝑚, 𝑛, 𝑑 ∈ ℤ, and suppose 𝑑 = gcd(𝑚, 𝑛), then 𝑑 satisfies the following:
(𝑚 = 0 ∧ 𝑛 = 0 ⟹ 𝑑 = 0) ∧
(𝑚 ≠ 0 ∨ 𝑛 ≠ 0 ⟹ 𝑑 ∣ 𝑚 ∧ 𝑑 ∣ 𝑛 ∧ ( ∀𝑒 ∈ ℕ, 𝑒 ∣ 𝑚 ∧ 𝑒 ∣ 𝑛 ⟹ 𝑒 ≤ 𝑑 ))
- Ex. Prove that for all integers 𝑝 and 𝑞, if 𝑝 and 𝑞 are distinct primes, then 𝑝 and 𝑞 are
coprime, meaning gcd(𝑝, 𝑞) = 1
o 𝑇𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑖𝑜𝑛. Structure of the above statement:
∀𝑝, 𝑞 ∈ ℤ, (𝑃𝑟𝑖𝑚𝑒(𝑝) ∧ 𝑃𝑟𝑖𝑚𝑒(𝑞) ∧ 𝑝 ≠ 𝑞) ⟹ gcd(𝑝, 𝑞) = 1
o We could unpack the definitions of 𝑃𝑟𝑖𝑚𝑒 and gcd, but it would not be
necessary
To show that gcd(𝑝, 𝑞) = 1, we just need to make sure that neither 𝑝 nor
𝑞 divides the other
o 𝑃𝑟𝑜𝑜𝑓. Let 𝑝, 𝑞 ∈ ℤ. Assume that 𝑝 and 𝑞 are both prime, and that 𝑝 ≠ 𝑞. We
want to prove that gcd(𝑝, 𝑞) = 1
o By the definition of prime, 𝑝 ≠ 1 (since 𝑝 > 1)
o The only positive divisors of 𝑞 are 1 and 𝑞 itself
o Since we assumed 𝑝 ≠ 𝑞 and concluded 𝑝 ≠ 1, we know that 𝑝 ∤ 𝑞
o Since we know that 1 divides every number, 1 is the only positive common
divisor of 𝑝 and 𝑞, so gcd(𝑝, 𝑞) = 1 QED
6.6 Proofs and Algorithms II: Computing the Greatest Common Divisor
while y != 0:
    r = x % y
    x, y = y, r
return x
- Documenting loop properties: loop invariants
o The Euclidean Algorithm relies on a key property – gcd(x, y) == gcd(y, x % y)
Even though x and y change, their gcd doesn’t
o gcd(x, y) == gcd(a, b)
This statement is called a loop invariant
Loop invariant – a property about loop variables that must be true at the
start and end of each loop iteration
o By convention, we document loop invariants at the top of a loop body using an
assert statement
def euclidean_gcd(a: int, b: int) -> int:
    """Return the gcd of a and b."""
    x, y = a, b
    while y != 0:
        # assert naive_gcd(x, y) == naive_gcd(a, b)  # loop invariant
        r = x % y
        x, y = y, r
    return x
o After the loop stops, the loop invariant should tell us that gcd(x, 0) == gcd(a, b),
and so we know that x == gcd(a, b), which is why x is returned
o To know for sure whether a loop invariant is correct, we need a proof
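- A runnable sketch with the loop invariant actually checked. Here math.gcd from the standard library stands in for naive_gcd, and the precondition that a and b are non-negative is an assumption added for this sketch:

```python
import math


def euclidean_gcd(a: int, b: int) -> int:
    """Return the gcd of a and b.

    Preconditions:
    - a >= 0 and b >= 0
    """
    x, y = a, b

    while y != 0:
        # Loop invariant, checked with math.gcd as a stand-in for naive_gcd.
        assert math.gcd(x, y) == math.gcd(a, b)

        r = x % y
        x, y = y, r

    # At this point y == 0, so gcd(x, 0) == x.
    return x
```

A passing assertion on some inputs is evidence for the invariant, not a proof of it.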
Intro
- 𝐷𝑒𝑓𝑖𝑛𝑖𝑡𝑖𝑜𝑛. Let 𝑎, 𝑏, 𝑛 ∈ ℤ, and assume 𝑛 ≠ 0. We say that 𝑎 is equivalent to 𝑏 modulo
𝑛 when 𝑛 ∣ 𝑎 − 𝑏. In this case, we write 𝑎 ≡ 𝑏 (mod 𝑛)
o 𝑎 and 𝑏 have the same remainder when divided by 𝑛
- Theorem. For all 𝑎, 𝑏, 𝑐, 𝑑, 𝑛 ∈ ℤ, if 𝑛 ≠ 0, 𝑎 ≡ 𝑐 (mod 𝑛), and 𝑏 ≡ 𝑑 (mod 𝑛), then:
1. 𝑎 + 𝑏 ≡ 𝑐 + 𝑑 (mod 𝑛)
2. 𝑎 − 𝑏 ≡ 𝑐 − 𝑑 (mod 𝑛)
3. 𝑎𝑏 ≡ 𝑐𝑑 (mod 𝑛)
o 𝑇𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑖𝑜𝑛 1.
∀𝑎, 𝑏, 𝑐, 𝑑, 𝑛 ∈ ℤ, (𝑛 ≠ 0 ∧ ( 𝑛 ∣ 𝑎 − 𝑐 ) ∧ ( 𝑛 ∣ 𝑏 − 𝑑 )) ⟹
𝑛 ∣ (𝑎 + 𝑏) − (𝑐 + 𝑑)
o 𝑃𝑟𝑜𝑜𝑓 1. Let 𝑎, 𝑏, 𝑐, 𝑑, 𝑛 ∈ ℤ. Assume that 𝑛 ≠ 0, 𝑛 ∣ 𝑎 − 𝑐, and 𝑛 ∣ 𝑏 − 𝑑. This
means we want to prove that 𝑛 ∣ (𝑎 + 𝑏) − (𝑐 + 𝑑).
o By the Divisibility of Linear Combinations Theorem, since 𝑛 ∣ (𝑎 − 𝑐) and
𝑛 ∣ (𝑏 − 𝑑), it divides their sum:
𝑛 ∣ (𝑎 − 𝑐) + (𝑏 − 𝑑)
𝑛 ∣ (𝑎 + 𝑏) − (𝑐 + 𝑑)
QED
o 𝑇𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑖𝑜𝑛 2.
∀𝑎, 𝑏, 𝑐, 𝑑, 𝑛 ∈ ℤ, (𝑛 ≠ 0 ∧ ( 𝑛 ∣ 𝑎 − 𝑐 ) ∧ ( 𝑛 ∣ 𝑏 − 𝑑 )) ⟹
𝑛 ∣ (𝑎 − 𝑏) − (𝑐 − 𝑑)
o 𝑃𝑟𝑜𝑜𝑓 2. Let 𝑎, 𝑏, 𝑐, 𝑑, 𝑛 ∈ ℤ. Assume that 𝑛 ≠ 0, 𝑛 ∣ 𝑎 − 𝑐, and 𝑛 ∣ 𝑏 − 𝑑. This
means we want to prove that 𝑛 ∣ (𝑎 − 𝑏) − (𝑐 − 𝑑).
o By the Divisibility of Linear Combinations Theorem, since 𝑛 ∣ (𝑎 − 𝑐) and
𝑛 ∣ (𝑏 − 𝑑), it divides their difference:
𝑛 ∣ (𝑎 − 𝑐) − (𝑏 − 𝑑)
𝑛 ∣ (𝑎 − 𝑏) − (𝑐 − 𝑑)
QED
o 𝑇𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑖𝑜𝑛 3.
∀𝑎, 𝑏, 𝑐, 𝑑, 𝑛 ∈ ℤ, (𝑛 ≠ 0 ∧ ( 𝑛 ∣ 𝑎 − 𝑐 ) ∧ ( 𝑛 ∣ 𝑏 − 𝑑 )) ⟹
𝑛 ∣ 𝑎𝑏 − 𝑐𝑑
o 𝑃𝑟𝑜𝑜𝑓 3. Let 𝑎, 𝑏, 𝑐, 𝑑, 𝑛 ∈ ℤ. Assume that 𝑛 ≠ 0, 𝑛 ∣ 𝑎 − 𝑐, and 𝑛 ∣ 𝑏 − 𝑑. This
means we want to prove that 𝑛 ∣ 𝑎𝑏 − 𝑐𝑑.
o Expanding the definition of division, we want to show that
∃𝑘 such that 𝑎𝑏 − 𝑐𝑑 = 𝑘𝑛
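o Since 𝑎 ≡ 𝑐 (mod 𝑛), 𝑎 and 𝑐 have the same remainder 𝑟₁ when divided by 𝑛.
Similarly, 𝑏 and 𝑑 have the same remainder 𝑟₂. By the Quotient-Remainder Theorem,
there exist integers 𝑞_𝑎, 𝑞_𝑏, 𝑞_𝑐, 𝑞_𝑑 such that
𝑎 = 𝑛𝑞_𝑎 + 𝑟₁
𝑐 = 𝑛𝑞_𝑐 + 𝑟₁
𝑏 = 𝑛𝑞_𝑏 + 𝑟₂
𝑑 = 𝑛𝑞_𝑑 + 𝑟₂
o Multiplying out,
𝑎𝑏 = 𝑛²𝑞_𝑎𝑞_𝑏 + 𝑛𝑟₁𝑞_𝑏 + 𝑛𝑟₂𝑞_𝑎 + 𝑟₁𝑟₂
𝑐𝑑 = 𝑛²𝑞_𝑐𝑞_𝑑 + 𝑛𝑟₁𝑞_𝑑 + 𝑛𝑟₂𝑞_𝑐 + 𝑟₁𝑟₂
o Take 𝑘 = 𝑛(𝑞_𝑎𝑞_𝑏 − 𝑞_𝑐𝑞_𝑑) + 𝑟₁(𝑞_𝑏 − 𝑞_𝑑) + 𝑟₂(𝑞_𝑎 − 𝑞_𝑐). Then
𝑎𝑏 − 𝑐𝑑 = 𝑛²(𝑞_𝑎𝑞_𝑏 − 𝑞_𝑐𝑞_𝑑) + 𝑛𝑟₁(𝑞_𝑏 − 𝑞_𝑑) + 𝑛𝑟₂(𝑞_𝑎 − 𝑞_𝑐) = 𝑘𝑛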
QED
Modular Division
- Division does not preserve modular equivalence
- Theorem. (Modular inverse) Let 𝑛 ∈ ℤ+ and 𝑎 ∈ ℤ. If gcd(𝑎, 𝑛) = 1, then there exists
𝑝 ∈ ℤ such that 𝑎𝑝 ≡ 1 (mod 𝑛).
We call this 𝑝 a modular inverse of 𝑎 modulo 𝑛
o 𝑇𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑖𝑜𝑛. ∀𝑛 ∈ ℤ+ , ∀𝑎 ∈ ℤ, gcd(𝑎, 𝑛) = 1 ⟹ (∃𝑝 ∈ ℤ, 𝑎𝑝 ≡ 1 (mod 𝑛))
o 𝑃𝑟𝑜𝑜𝑓. Let 𝑛 ∈ ℤ+ and 𝑎 ∈ ℤ. Assume gcd(𝑎, 𝑛) = 1
o Since gcd(𝑎, 𝑛) = 1, by the GCD Characterization Theorem we know that there
exist integers 𝑝 and 𝑞 such that 𝑝𝑎 + 𝑞𝑛 = gcd(𝑎, 𝑛) = 1
o Rearranging the equation, we get that 𝑝𝑎 − 1 = −𝑞𝑛, and so (by the definition
of divisibility, taking 𝑘 = −𝑞), 𝑛 ∣ 𝑝𝑎 − 1
o Then by the definition of modular equivalence, 𝑝𝑎 ≡ 1 (mod 𝑛) QED
- Ex. Let 𝑎 ∈ ℤ and 𝑛 ∈ ℤ+ . If gcd(𝑎, 𝑛) = 1, then for all 𝑏 ∈ ℤ, there exists 𝑘 ∈ ℤ such
that 𝑎𝑘 ≡ 𝑏 (mod 𝑛)
o 𝑇𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑖𝑜𝑛. ∀𝑎 ∈ ℤ, ∀𝑛 ∈ ℤ+ , gcd(𝑎, 𝑛) = 1 ⟹ (∀𝑏 ∈ ℤ, ∃𝑘 ∈ ℤ, 𝑎𝑘 ≡ 𝑏 (mod 𝑛))
o 𝐷𝑖𝑠𝑐𝑢𝑠𝑠𝑖𝑜𝑛. This is saying that under the given assumptions, 𝑏 is “divisible” by 𝑎
modulo 𝑛.
o Since it is assumed that gcd(𝑎, 𝑛) = 1, we can use the modular inverses
theorem, which gives us a 𝑝 ∈ ℤ such that 𝑎𝑝 ≡ 1 (mod 𝑛)
Looks like we can multiply both sides by 𝑏
o 𝑃𝑟𝑜𝑜𝑓. Let 𝑎 ∈ ℤ and 𝑛 ∈ ℤ+. Assume gcd(𝑎, 𝑛) = 1, and let 𝑏 ∈ ℤ. We want to
prove that there exists 𝑘 ∈ ℤ such that 𝑎𝑘 ≡ 𝑏 (mod 𝑛)
o First, using the Modular Inverses theorem, since we assumed gcd(𝑎, 𝑛) = 1,
there exists 𝑝 ∈ ℤ such that 𝑎𝑝 ≡ 1 (mod 𝑛)
o Second, since multiplication preserves modular equivalence (part 3 of the theorem
above), multiplying both sides by 𝑏 gives 𝑎𝑝𝑏 ≡ 𝑏 (mod 𝑛)
o Letting 𝑘 = 𝑝𝑏, we have that 𝑎𝑘 ≡ 𝑏 (mod 𝑛) QED
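- Both proofs are constructive, so they can be turned into code. The sketch below assumes an extended Euclidean algorithm as the way to obtain the integers 𝑝 and 𝑞 promised by the GCD Characterization Theorem; the function names extended_gcd, modular_inverse, and solve_linear are hypothetical and chosen for this example.

```python
from typing import Tuple


def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, p, q) such that g == gcd(a, b) and p * a + q * b == g.

    Preconditions:
        - a >= 0 and b >= 0
    """
    # Loop invariants: x == px * a + qx * b  and  y == py * a + qy * b
    x, y = a, b
    px, qx = 1, 0
    py, qy = 0, 1
    while y != 0:
        quotient = x // y
        x, y = y, x - quotient * y
        px, py = py, px - quotient * py
        qx, qy = qy, qx - quotient * qy
    return x, px, qx


def modular_inverse(a: int, n: int) -> int:
    """Return p with 0 <= p < n such that a * p ≡ 1 (mod n).

    Preconditions:
        - n > 0
        - gcd(a, n) == 1
    """
    _, p, _ = extended_gcd(a % n, n)
    return p % n


def solve_linear(a: int, b: int, n: int) -> int:
    """Return k such that a * k ≡ b (mod n), taking k = p * b as in the proof."""
    return (modular_inverse(a, n) * b) % n


inverse = modular_inverse(3, 7)    # 3 * inverse ≡ 1 (mod 7)
k = solve_linear(3, 4, 7)          # 3 * k ≡ 4 (mod 7)
```

- Reducing 𝑝 and 𝑘 modulo 𝑛 is a design choice that keeps the results small; any integer equivalent to them modulo 𝑛 works equally well.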
Cryptography
- Cryptography – study of theoretical and practical techniques for keeping data secure
- Encryption involves turning coherent messages into seemingly-random nonsensical
strings, and then back again
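def letter_to_num(c: str) -> int:
    """Return the number that corresponds to the given letter.

    Preconditions:
        - len(c) == 1 and c in LETTERS
    """
    return str.index(LETTERS, c)


def num_to_letter(n: int) -> str:
    """Return the letter that corresponds to the given number.

    Preconditions:
        - 0 <= n < len(LETTERS)
    """
    return LETTERS[n]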
- In the Caesar cipher, the secret key 𝑘 is an integer from the set {1, 2, …, 26}
- Before sending any messages, Alice and Bob meet and decide on a secret key from this
set
- When Alice wants to send a string message 𝑚 to Bob, she encrypts her message as follows:
o For each letter of 𝑚, Alice shifts it by adding the secret key 𝑘 to its
corresponding number, taking remainders modulo 27, the length of LETTERS
o i.e. if 𝑘 = 3, and the plaintext message is ‘HAPPY’, encryption happens as
follows:
Plaintext Character | Corresponding Integer | Shifted Integer | Ciphertext Character
‘H’ | 7 | 10 | ‘K’
‘A’ | 0 | 3 | ‘D’
‘P’ | 15 | 18 | ‘S’
‘P’ | 15 | 18 | ‘S’
‘Y’ | 24 | 0 | ‘A’
o When Bob receives the ciphertext ‘KDSSA’, he decrypts the ciphertext by
applying the corresponding shift in reverse
i.e. subtracting the secret key 𝑘 instead of adding it
o We can implement the above example in Python
def encrypt_caesar(k: int, plaintext: str) -> str:
    """Return the encrypted message using the Caesar cipher with key k.

    Preconditions:
        - all({x in LETTERS for x in plaintext})
        - 1 <= k <= 26
    """
    l = len(LETTERS)
    ciphertext = ''
    for letter in plaintext:
        # Shift each letter by k, wrapping around modulo len(LETTERS)
        ciphertext += num_to_letter((letter_to_num(letter) + k) % l)
    return ciphertext
Preconditions:
- all({x in LETTERS for x in plaintext})
- 1 <= k <= 26
“””
l = len(LETTERS)
plaintext = ‘’
return plaintext
Preconditions:
- all({ord(c) < 128 for c in plaintext})
- 1 <= k <= 127
“””
ciphertext = ‘’
return ciphertext
Preconditions:
- all({ord(c) < 128 for c in plaintext})
- 1 <= k <= 127
“””
plaintext = ‘’
return plaintext
- The Caesar cipher is not secure
o An eavesdropper can try all possible secret keys to decrypt a ciphertext
Intro
- The Caesar cipher should never be used in practice
o Consider the ciphertext ‘0LaT0+T^+NZZW’
The 1st and the 5th letters in the plaintext must be the same
The 1st and 10th characters of the plaintext must be consecutive ASCII
characters
o Vulnerable to a brute-force exhaustive key search attack
Given a ciphertext, it is possible to try out every secret key and see which
key yields a meaningful plaintext message
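- A brute-force key search is short enough to sketch in full. The code below is an illustration under stated assumptions: it uses the letters-and-space version of the cipher, assumes LETTERS is the 26 uppercase letters followed by a space (consistent with the length 27 and the ‘HAPPY’ example), and the names shift_caesar and brute_force_caesar are hypothetical.

```python
# Assumed alphabet: 26 uppercase letters plus a space (length 27)
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ '


def shift_caesar(k: int, text: str) -> str:
    """Return text with every character shifted by k positions in LETTERS.

    Preconditions:
        - all(c in LETTERS for c in text)
    """
    return ''.join(LETTERS[(LETTERS.index(c) + k) % len(LETTERS)] for c in text)


def brute_force_caesar(ciphertext: str) -> dict:
    """Return a mapping from each possible key to its candidate plaintext."""
    # Decrypting with key k is the same as shifting by -k
    return {k: shift_caesar(-k, ciphertext) for k in range(1, len(LETTERS))}


candidates = brute_force_caesar('KDSSA')
for key, candidate in candidates.items():
    print(key, repr(candidate))
```

- With only 26 possible keys, an eavesdropper can read every candidate by eye and pick out the one that is meaningful (here, the candidate for key 3).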
Stream Ciphers
- Stream cipher – a type of symmetric-key cryptosystem that emulates a one-time pad but
shares a much smaller secret key
- The shared secret key is small, and both parties use an algorithm to generate an
arbitrary number of new random characters, based on both the secret key and any
previously-generated characters
- Stream ciphers do not have perfect secrecy, since the characters used in encryption aren’t
truly random, though they can appear “random” if the algorithm is good enough
7.3. Computing Shared Secret Keys
Intro
- Limitation of symmetric-key encryption: a secret key needs to be established for every
pair of people who want to communicate
o If there are 𝑛 people who each want to communicate securely with each other,
𝑛(𝑛 − 1)/2 keys are needed
- Public-key cryptosystem – each person has two keys:
o A private key known only to them
o A public key known to everyone
Public-Key Cryptography
- Suppose Alice wants to send Bob a message
o Alice uses Bob’s public key to encrypt the message
o Bob uses his private key to decrypt the message
- A secure public-key cryptosystem has the following parts:
o A set 𝑃 of possible original messages, called plaintext messages (e.g. a set of
strings)
o A set 𝐶 of possible encrypted messages, called ciphertext messages (e.g. another
set of strings)
o A set 𝐾1 of possible public keys and a set 𝐾2 of possible private keys
o A subset 𝐾 ⊆ 𝐾1 × 𝐾2 of possible public-private key pairs
Not every public key can be paired with every private key
o Two functions 𝐸𝑛𝑐𝑟𝑦𝑝𝑡 ∶ 𝐾1 × 𝑃 → 𝐶 and 𝐷𝑒𝑐𝑟𝑦𝑝𝑡 ∶ 𝐾2 × 𝐶 → 𝑃 that satisfy
the following two properties:
Correctness – For all (𝑘1 , 𝑘2 ) ∈ 𝐾 and 𝑚 ∈ 𝑃,
𝐷𝑒𝑐𝑟𝑦𝑝𝑡(𝑘2 , 𝐸𝑛𝑐𝑟𝑦𝑝𝑡(𝑘1 , 𝑚)) = 𝑚
• i.e. if we encrypt and then decrypt the same message with a
public-private key pair, we get back the original message
Security – For all (𝑘1 , 𝑘2 ) ∈ 𝐾 and 𝑚 ∈ 𝑃, if an eavesdropper only knows
the values of the public key 𝑘1 and the ciphertext 𝑐 = 𝐸𝑛𝑐𝑟𝑦𝑝𝑡(𝑘1 , 𝑚)
but does not know 𝑘2 , it is computationally infeasible to find the
plaintext message 𝑚
Key Generation
- Assume that prime numbers 𝑝 and 𝑞 are given
def rsa_generate_key(p: int, q: int) -> Tuple[Tuple[int, int, int], Tuple[int, int]]:
    """Return an RSA key pair generated using primes p and q.

    Preconditions:
        - p and q are prime
        - p != q
    """
    # Compute the product of p and q
    n = p * q
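def rsa_encrypt(public_key: Tuple[int, int], plaintext: int) -> int:
    """Encrypt the given plaintext using the recipient's public key.

    Preconditions:
        - public_key is a valid RSA public key (n, e)
        - 0 < plaintext < public_key[0]
    """
    n, e = public_key
    encrypted = (plaintext ** e) % n
    return encrypted


def rsa_decrypt(private_key: Tuple[int, int, int], ciphertext: int) -> int:
    """Decrypt the given ciphertext using the recipient's private key.

    Preconditions:
        - private_key is a valid RSA private key (p, q, d)
        - 0 < ciphertext < private_key[0] * private_key[1]
    """
    p, q, d = private_key
    n = p * q
    decrypted = (ciphertext ** d) % n
    return decrypted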
Preconditions:
- public_key is a valid RSA public key (n, e)
- all({0 < ord(c) < public_key[0] for c in plaintext})
“””
n, e = public_key
encrypted = ‘’
for letter in plaintext:
# Note: we could have also used our rsa_encrypt function here instead
encrypted = encrypted + chr((ord(letter) ** e) % n)
return encrypted
Preconditions:
- private_key is a valid RSA private key (p, q, d)
- all({0 < ord(c) < private_key[0] * private_key[1] for c in ciphertext})
“””
p, q, d = private_key
n=p*q
decrypted = ‘’
for letter in ciphertext:
# Note: we could have also used our rsa_decrypt function here instead
decrypted = decrypted + chr((ord(letter) ** d) % n)
return decrypted
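- The Correctness property can be checked on a small end-to-end example. The sketch below is a toy illustration, not the course's key-generation code: the name toy_rsa_keys, the rule for choosing 𝑒 (the smallest value coprime to (𝑝 − 1)(𝑞 − 1)), and the sample primes are all assumptions. It keeps the key formats used above, (p, q, d) for the private key and (n, e) for the public key, and uses the built-in three-argument pow, which computes the same value as (x ** e) % n without building a huge intermediate number.

```python
import math
from typing import Tuple


def toy_rsa_keys(p: int, q: int) -> Tuple[Tuple[int, int, int], Tuple[int, int]]:
    """Return ((p, q, d), (n, e)), a private/public RSA key pair.

    Preconditions:
        - p and q are distinct primes
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    # Assumed rule: pick the smallest e >= 2 that is coprime to phi
    e = 2
    while math.gcd(e, phi) != 1:
        e += 1
    # d is the modular inverse of e modulo phi (exponent -1 needs Python 3.8+)
    d = pow(e, -1, phi)
    return (p, q, d), (n, e)


private_key, public_key = toy_rsa_keys(23, 31)
n, e = public_key
d = private_key[2]

message = 42                        # must satisfy 0 < message < n
ciphertext = pow(message, e, n)     # encrypt with the public key
recovered = pow(ciphertext, d, n)   # decrypt with the private key
print(public_key, ciphertext, recovered)
```

- Primes this small offer no security at all, since 𝑛 can be factored instantly; they are used here only so the numbers are easy to inspect.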
(In)effectiveness of Cryptography
- Diffie-Hellman and RSA are secure because it is very difficult to extract the private part
of the data from what is being publicly communicated
- Unfortunately, many servers use the same group of prime numbers
- Some steps of the Diffie-Hellman algorithm can be precomputed for a specific group of
prime numbers
- 512-bit and 1024-bit keys are prone to the Logjam attack
o 2048-bit keys are used to avoid it
8.1 An Introduction to Running Time
def print_powers_of_two(n: int) -> None:
    """Print the powers of two that are less than n.

    Preconditions:
        - n > 0
    """
    for i in range(0, math.ceil(math.log2(n))):
        print(2 ** i)
- The number of calls to print is ⌈log₂ 𝑛⌉
- The running time of print_powers_of_two is approximately, but not exactly, log₂ 𝑛
- We say that print_powers_of_two has a logarithmic running time
Basic Operations
- From fastest to slowest:
o Constant running time, logarithmic running time, linear running time,
quadratic running time
- There are different ways of interpreting “basic operations”, i.e.
o We assign a variable at every loop iteration
o print calls take longer than variable assignment
o Calling the function is also an operation
o This can get extremely complicated
- No matter how we interpret “basic operations” we know for sure that linear is faster
than quadratic
Intro
- Big-O is not necessarily an exact description of growth
o i.e. 𝑛 + 10 ∈ 𝒪(𝑛¹⁰⁰) is true, but not informative
- Big-O allows us to express upper bounds on the growth of a function
o It does not allow us to distinguish between an upper bound that is tight and one
that vastly overestimates the rate of growth
Intro
- Consider the following function
def print_items(lst: list) -> None:
    for item in lst:
        print(item)
- We will concentrate on how the size of the input influences the running time of a
program
- We measure running time using asymptotic notation, and not exact expressions
- Basic operation – any block of code whose running time does not depend on the size of
the input
o Including assignment statements, arithmetic calculations, list/string indexing
return sum_so_far
o 𝑅𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑎𝑛𝑎𝑙𝑦𝑠𝑖𝑠. Let 𝑛 be the length of the input list
o This function body consists of three statements
The assignment statement counts as 1 step
The for loop takes 𝑛 steps: it has 𝑛 iterations, and each iteration takes 1
step
The return statement counts as 1 step
o The total running time is the sum of these three parts: 1 + 𝑛 + 1 = 𝑛 + 2, which
is Θ(𝑛) QED
Nested Loops
- Count the number of repeated basic operations in a loop starting with the innermost
loop and working our way out
- Ex. Consider the following function:
def print_sums(lst: list) -> None:
    for item1 in lst:
        for item2 in lst:
            print(item1 + item2)
Perform a runtime analysis of print_sums.
o 𝑅𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑎𝑛𝑎𝑙𝑦𝑠𝑖𝑠. Let 𝑛 be the length of lst
o The inner loop runs 𝑛 times, and each iteration is just a single basic operation
o The outer loop runs 𝑛 times, and each of its iterations takes 𝑛 operations
o The total number of basic operations is
𝑅𝑇_𝑝𝑟𝑖𝑛𝑡_𝑠𝑢𝑚𝑠(𝑛) = steps for the inner loop × number of times inner loop is repeated
= 𝑛 × 𝑛
= 𝑛²
o So the running time of this algorithm is Θ(𝑛²) QED
- Ex. Consider the following function:
def f(lst: List[int]) -> None:
    for item in lst:
        for i in range(0, 10):
            print(item + i)
Perform a runtime analysis of this function.
o 𝑅𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑎𝑛𝑎𝑙𝑦𝑠𝑖𝑠. Let 𝑛 be the length of the input list lst
o The inner loop repeats 10 times, and each iteration is a single basic operation,
for a total of 10 basic operations
o The outer loop repeats 𝑛 times, and each iteration takes 10 steps, for a total of
10𝑛 steps.
o The running time of this function is Θ(𝑛)
o Alternatively, the inner loop’s running time does not depend on the number of
items in the input list, so we can count it as a single basic operation
o The outer loop runs 𝑛 times, and each iteration takes 1 step, for a total of 𝑛
steps, which is Θ(𝑛) QED
- Ex. Analyze the running time of the following function.
def combined(lst: List[int]) -> None:
    # Loop 1
    for item in lst:
        for i in range(0, 10):
            print(item + i)
    # Loop 2
    for item1 in lst:
        for item2 in lst:
            print(item1 + item2)
o 𝑅𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑎𝑛𝑎𝑙𝑦𝑠𝑖𝑠. Let 𝑛 be the length of lst. We have already seen that
the first loop runs in time Θ(𝑛), while the second loop runs in time Θ(𝑛²)
o By the Sum of functions theorem from the previous section, we can conclude
that combined runs in time Θ(𝑛²). QED
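- The step counts in this analysis can be reproduced by replacing each print call with a counter. The helper count_combined_steps below is a hypothetical instrumented version of combined written for this illustration; it takes the list length directly rather than a list.

```python
def count_combined_steps(n: int) -> int:
    """Return the number of print calls combined would make on a list of length n.

    Preconditions:
        - n >= 0
    """
    steps = 0
    # Loop 1: n outer iterations, each with 10 inner iterations
    for _ in range(n):
        for _ in range(10):
            steps += 1
    # Loop 2: n outer iterations, each with n inner iterations
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


counts = {n: count_combined_steps(n) for n in [10, 100, 1000]}
for n, count in counts.items():
    print(n, count, 10 * n + n * n)
```

- The count is exactly 10𝑛 + 𝑛², and as 𝑛 grows the 𝑛² term accounts for almost all of it, which is what the Θ(𝑛²) bound expresses.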
Comprehensions
- Consider the following function
def square_all(numbers: List[int]) -> List[int]:
    """Return a new list containing the squares of the given numbers"""
    return [x ** 2 for x in numbers]
o 𝑅𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑎𝑛𝑎𝑙𝑦𝑠𝑖𝑠. We analyze it in the same way as a for loop
1. We determine the number of steps required to evaluate the leftmost
expression in the comprehension. In this case, evaluating x ** 2 takes 1
step
2. The collection that acts as the source of the comprehension (i.e.
numbers) determines how many times the leftmost expression is
evaluated
o Let 𝑛 be the length of the input list numbers. The comprehension expression
takes 𝑛 steps
1 step per element of numbers
o The running time of square_all is 𝑛 steps, which is Θ(𝑛) QED
- The same analysis would hold in the above function if we had used a set or dictionary
comprehension instead
While Loops
- Analysing the running time of code involving while loops follows the same principle as
for loops
o We calculate the sum of the different loop iterations (by
multiplication/summation)
- Ex. Analyse the running time of the following function:
def my_sum_v2(numbers: List[int]) -> int:
    """Return the sum of the given numbers."""
    sum_so_far = 0
    i = 0
    while i < len(numbers):
        sum_so_far += numbers[i]
        i += 1
    return sum_so_far
o 𝑅𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑎𝑛𝑎𝑙𝑦𝑠𝑖𝑠. Let 𝑛 be the length of the input numbers
o We can divide up the function into 3 parts
1. The cost of the assignment statements sum_so_far = 0 and i = 0 is
constant time
2. The while loop
• Each iteration is constant time
• There are 𝑛 iterations, since i starts at 0 and increases by 1 until it
reaches 𝑛
3. The return statement takes constant time
o The total running time is 1 + 𝑛 + 1 = 𝑛 + 2, which is Θ(𝑛) QED
- Ex. Analyse the running time of the following function:
def my_sum_powers_of_two(numbers: List[int]) -> int:
“””Return the sum of the given numbers whose indexes are powers of 2.
return sum_so_far
o 𝑅𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑎𝑛𝑎𝑙𝑦𝑠𝑖𝑠. Let 𝑛 be the length of the input list numbers
o We count the initial assignment statements as 1 step, and the return statement
as 1 step
o Each iteration takes constant time
o To determine the number of loop iterations, we follow these steps:
1. Find a pattern for how i changes at each loop iteration, and a general
formula for 𝑖_𝑘, the value of i after 𝑘 iterations
Iteration | Value of i
0 | 1
1 | 2
2 | 4
3 | 8
4 | 16
So we find that after 𝑘 iterations, 𝑖_𝑘 = 2^𝑘
2. We know the while loop continues while i < len(numbers)
• i.e. the while loop continues until i >= len(numbers)
To find the number of iterations, we need to find the smallest value of 𝑘
such that 𝑖_𝑘 ≥ 𝑛, which makes the loop condition False
𝑖_𝑘 ≥ 𝑛
2^𝑘 ≥ 𝑛
𝑘 ≥ log₂ 𝑛
So we need to find the smallest value of 𝑘 such that 𝑘 ≥ log₂ 𝑛, which is
⌈log₂ 𝑛⌉
o The while loop iterates ⌈log₂ 𝑛⌉ times, with 1 step per iteration, for a total of
⌈log₂ 𝑛⌉ steps
o The function my_sum_powers_of_two has a running time of 1 + ⌈log₂ 𝑛⌉ + 1 =
⌈log₂ 𝑛⌉ + 2, which is Θ(log 𝑛) QED
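- The iteration count derived above can be compared against an instrumented loop. The sketch below is reconstructed from the analysis only (i starts at 1, doubles each iteration, and the loop runs while i < len(numbers)), so its body is an assumption rather than the original function; the name sum_powers_of_two_counted is hypothetical.

```python
import math
from typing import List, Tuple


def sum_powers_of_two_counted(numbers: List[int]) -> Tuple[int, int]:
    """Return (sum of numbers at indexes 1, 2, 4, 8, ..., number of loop iterations)."""
    sum_so_far = 0
    iterations = 0
    i = 1
    while i < len(numbers):
        sum_so_far += numbers[i]
        i *= 2              # i takes the values 1, 2, 4, 8, ...
        iterations += 1
    return sum_so_far, iterations


total, iterations = sum_powers_of_two_counted(list(range(10)))
predicted = math.ceil(math.log2(10))   # the formula from the analysis, with n = 10
print(total, iterations, predicted)
```

- For a list of length 10, the loop visits indexes 1, 2, 4, and 8, so both the measured and the predicted iteration counts are 4.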
A Trickier Example
- Example of a standard loop, with a twist in how the loop variable changes at each
iteration
def twisty(n: int) -> int:
    """Return the number of iterations it takes for this special loop to stop
    for the given n.
    """
    iterations_so_far = 0
    x = n
    while x > 1:
        if x % 2 == 0:
            x = x // 2
        else:
            x = 2 * x - 2
        iterations_so_far += 1
    return iterations_so_far
- The loop variable x does not always get closer to the loop stopping condition
o i.e. sometimes it increases
- We will perform an analysis based on multiple iterations
- 𝐶𝑙𝑎𝑖𝑚. For any integer value of x greater than 2, after two iterations of the loop in
twisty the value of x decreases by at least one.
- 𝑃𝑟𝑜𝑜𝑓. Let 𝑥0 be the value of variable x at some iteration of the loop, and assume
𝑥0 > 2. Let 𝑥1 be the value of 𝑥 after one loop iteration, and 𝑥2 be the value of 𝑥 after
two loop iterations. We want to prove that 𝑥2 ≤ 𝑥0 − 1
o We divide up this proof into four cases, based on the remainder of 𝑥0 when
dividing by 4
o Case 1: Assume 4 ∣ 𝑥0, i.e. ∃𝑘 ∈ ℤ, 𝑥0 = 4𝑘
In this case, 𝑥0 is even, so the if branch executes in the first loop
iteration, and so 𝑥1 = 𝑥0/2 = 2𝑘. Then 𝑥1 is also even, and so the if branch
executes again: 𝑥2 = 𝑥1/2 = 𝑘
So then 𝑥2 = (1/4)𝑥0 ≤ 𝑥0 − 1 (since 𝑥0 ≥ 4), as required.
o Case 2: Assume 4 ∣ 𝑥0 − 1, i.e. ∃𝑘 ∈ ℤ, 𝑥0 = 4𝑘 + 1
In this case, 𝑥0 is odd, so the else branch executes in the first loop
iteration, and so 𝑥1 = 2𝑥0 − 2 = 8𝑘. Then 𝑥1 is even, and so
𝑥2 = 𝑥1/2 = 4𝑘.
So then 𝑥2 = 4𝑘 = 𝑥0 − 1, as required.
o Case 3: Assume 4 ∣ 𝑥0 − 2, i.e. ∃𝑘 ∈ ℤ, 𝑥0 = 4𝑘 + 2
In this case, 𝑥0 is even, so the if branch executes in the first loop
iteration, and so 𝑥1 = 𝑥0/2 = 2𝑘 + 1. Then 𝑥1 is odd, and so the else branch
executes: 𝑥2 = 2𝑥1 − 2 = 4𝑘.
So then 𝑥2 = 4𝑘 ≤ 𝑥0 − 1, as required.
o Case 4: Assume 4 ∣ 𝑥0 − 3, i.e. ∃𝑘 ∈ ℤ, 𝑥0 = 4𝑘 + 3
In this case, 𝑥0 is odd, so the else branch executes in the first loop
iteration, and so 𝑥1 = 2𝑥0 − 2 = 8𝑘 + 4. Then 𝑥1 is even, and so
𝑥2 = 𝑥1/2 = 4𝑘 + 2.
So then 𝑥2 = 4𝑘 + 2 = 𝑥0 − 1, as required. QED
- 𝑅𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑎𝑛𝑎𝑙𝑦𝑠𝑖𝑠. (Analysis of twisty)
o We count the variable initialization before the while loop as 1 step, and the
return statement as 1 step
o For the while loop:
The loop body takes 1 step
To count the number of loop iterations, we first observe that 𝑥 starts at 𝑛
and the loop terminates when 𝑥 reaches 1 or less. The Claim tells us that
after every 2 iterations, the value of 𝑥 decreases by at least 1
So after 2 iterations, 𝑥 ≤ 𝑛 − 1; after 4 iterations, 𝑥 ≤ 𝑛 − 2, and in
general, after 2𝑘 iterations, 𝑥 ≤ 𝑛 − 𝑘
This tells us that after 2(𝑛 − 1) loop iterations, 𝑥 ≤ 𝑛 − (𝑛 − 1) = 1,
and so the loop must stop.
o This analysis tells us that the loop iterates at most 2(𝑛 − 1) times, and so takes
at most 2(𝑛 − 1) steps
o So the total running time of twisty is at most 1 + 2(𝑛 − 1) + 1 = 2𝑛 steps,
which is 𝒪(𝑛) QED
- We did not compute the exact number of steps the function twisty takes, only an upper
bound on the number of steps
- We were only able to conclude a Big-O bound, and not a Theta bound
o We don’t know whether this bound is tight
- It is possible to prove something remarkable about what happens to the variable x after
three iterations of the twisty loop
- 𝐶𝑙𝑎𝑖𝑚 (Improved). For any integer value of x greater than 2, let 𝑥0 be the initial value of
x and let 𝑥3 be the value of x after three loop iterations. Then (1/8)𝑥0 ≤ 𝑥3 ≤ (1/2)𝑥0
- Using this claim, the running time of twisty can be shown to be both 𝒪(log 𝑛) and
Ω(log 𝑛), and hence we conclude that its running time is Θ(log 𝑛)
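- The first Claim and the 2(𝑛 − 1) bound can be spot-checked by running the loop on many inputs. The sketch below is an empirical check only, not a substitute for the proofs; the helper names twisty_step and twisty_steps are hypothetical, and integer division is assumed in the even case so that x stays an int.

```python
def twisty_step(x: int) -> int:
    """Return the value of x after one iteration of the twisty loop body."""
    if x % 2 == 0:
        return x // 2
    else:
        return 2 * x - 2


def twisty_steps(n: int) -> int:
    """Return the number of loop iterations twisty performs for the given n."""
    iterations_so_far = 0
    x = n
    while x > 1:
        x = twisty_step(x)
        iterations_so_far += 1
    return iterations_so_far


# Claim: for every x0 > 2, two iterations decrease x by at least one
claim_holds = all(twisty_step(twisty_step(x0)) <= x0 - 1 for x0 in range(3, 1000))

# Upper bound from the analysis: at most 2(n - 1) iterations
bound_holds = all(twisty_steps(n) <= 2 * (n - 1) for n in range(1, 200))

print(claim_holds, bound_holds, [twisty_steps(n) for n in [7, 100, 1000]])
```

- Printing the iteration counts for growing 𝑛 shows how far below 2(𝑛 − 1) the actual counts stay, which is consistent with the bound not being tight.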
Timing Operations
- Python provides a module called timeit that can tell us how long Python code takes to
execute
>>> from timeit import timeit
>>> timeit('5 + 15', number=1000)
1.97999133245455784654e-05
o The above call to timeit will perform the operation 5 + 15 one thousand times
o The function returns the total time elapsed
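- timeit also accepts a setup string that runs once before timing starts, which makes it possible to time an operation on inputs of different sizes. The sketch below is a hypothetical measurement chosen for illustration (summing lists of two lengths); the actual numbers vary between machines and between runs, so no expected output is shown.

```python
from timeit import timeit

# Total time of 100 calls to sum for two list sizes; setup is not included in the timing
small_time = timeit('sum(lst)', setup='lst = list(range(1000))', number=100)
large_time = timeit('sum(lst)', setup='lst = list(range(100000))', number=100)

print(f'n = 1,000:   {small_time:.6f} seconds')
print(f'n = 100,000: {large_time:.6f} seconds')
```

- Comparing the two measurements gives a rough empirical sense of how running time grows with input size, alongside the theoretical analysis.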
return squares_so_far
o 𝑅𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑎𝑛𝑎𝑙𝑦𝑠𝑖𝑠. Let 𝑛 be the length of the input list (i.e. numbers)
o The assignment statement counts as 1 step
o The for loop:
Takes 𝑛 iterations
list.append takes constant time, and so the entire loop body counts as 1
step
This means the for loop takes 𝑛 ⋅ 1 = 𝑛 steps total
o The return statement counts as 1 step
o The total running time is 1 + 𝑛 + 1 = 𝑛 + 2, which is Θ(𝑛) QED
- Ex. Analyse the running time of the following function
def squares_reversed(numbers: List[int]) -> List[int]:
    """Return a list containing the squares of the given numbers, in reverse
    order."""
    squares_so_far = []
Data Classes
- Data classes store their instance attributes using a dictionary that maps attribute names
to their corresponding values
- Data classes benefit from the constant-time dictionary operations above
- The two operations that we can perform on a dataclass instance: looking up an attribute
value (i.e. david.age), and mutating the instance by assigning to an attribute (i.e.
david.age = 99) both take constant time
Aggregation Functions
- sum, max, min have a linear running time (Θ(𝑛)), proportional to the size of the input
collection
o Each element of the collection must be processed in order to calculate each of
these values
- len has a constant running time (Θ(1)), independent of the size of input collection
o The Python interpreter does not need to process each element of a collection
when calculating the collection’s size
o Each of these collection data types stores a special attribute referring to the size
of that collection
- any and all may need to check every element of their input collection, but they can
short-circuit (stopping before checking every element)
o Similar to the logical or and and operators
o Their running time isn’t a fixed function of the input size, but rather a possible
range of values, depending on whether this short-circuiting happens or not
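- Short-circuiting can be observed directly by recording which elements are examined. In the sketch below, the generator checked_values and the sample list are assumptions made for this illustration.

```python
from typing import Iterator, List


def checked_values(numbers: List[int], log: List[int]) -> Iterator[bool]:
    """Yield whether each number is even, recording each examined number in log."""
    for number in numbers:
        log.append(number)
        yield number % 2 == 0


numbers = [1, 3, 4, 5, 7, 9]

examined_by_any = []
any_result = any(checked_values(numbers, examined_by_any))   # stops at the first True

examined_by_all = []
all_result = all(checked_values(numbers, examined_by_all))   # stops at the first False

print(any_result, examined_by_any)
print(all_result, examined_by_all)
```

- Here any stops after examining 1, 3, and 4, while all stops after examining only 1, so the number of elements processed depends on the values in the collection and not just on its size.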
Intro
- Algorithms often depend on the actual value of the input, not just its size
- Consider the following function
def has_even(numbers: List[int]) -> bool:
    """Return whether numbers contains an even element."""
    for number in numbers:
        if number % 2 == 0:
            return True
    return False
o Because this function returns as soon as it finds an even number in the list, its
running time is not necessarily proportional to the length of the input list
- The running time of a function can vary even when the input size is fixed
o The inputs in 𝐼_{ℎ𝑎𝑠_𝑒𝑣𝑒𝑛,10} (the set of all inputs to has_even of size 10) do not all
have the same running time
- Because our asymptotic notation is used to describe the growth rate of functions, we
cannot use it to describe the growth of a whole range of values with respect to
increasing input sizes
- We focus on the maximum of this range, which corresponds to the slowest the
algorithm could run for a given input size
- 𝐷𝑒𝑓𝑖𝑛𝑖𝑡𝑖𝑜𝑛. Let func be a program. We define the function 𝑊𝐶_𝑓𝑢𝑛𝑐 ∶ ℕ → ℕ, called the
worst-case running time function of func, as follows:
𝑊𝐶_𝑓𝑢𝑛𝑐(𝑛) = max{running time of executing 𝑓𝑢𝑛𝑐(𝑥) | 𝑥 ∈ 𝐼_{𝑓𝑢𝑛𝑐,𝑛}}
- 𝑊𝐶_𝑓𝑢𝑛𝑐 is a function, not a constant number: it returns the maximum possible running
time for an input of size 𝑛, for every natural number 𝑛
o And so we can use asymptotic notation to describe it
- The goal of a worst-case runtime analysis for func is to find an elementary function 𝑓
such that 𝑊𝐶_𝑓𝑢𝑛𝑐 ∈ Θ(𝑓)
- We take a two-pronged approach: proving matching upper and lower bounds on the
worst-case running time of our algorithm
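- The spread of running times for a fixed input size can be seen by counting loop iterations of has_even on two inputs of the same length. The helper has_even_steps below is a hypothetical instrumented copy written for this illustration, and the two input lists are assumptions.

```python
from typing import List


def has_even_steps(numbers: List[int]) -> int:
    """Return the number of loop iterations has_even performs on numbers."""
    steps = 0
    for number in numbers:
        steps += 1
        if number % 2 == 0:
            return steps    # has_even would return True here
    return steps            # has_even would return False here


n = 10
early_exit = has_even_steps([2] + [1] * (n - 1))   # even number at the front
worst_case = has_even_steps([1] * n)               # no even number at all

print(early_exit, worst_case)
```

- The all-odd list forces all 𝑛 iterations, which is the kind of input family used to prove a lower bound on the worst-case running time; no input of size 𝑛 can take more than 𝑛 iterations, which gives the matching upper bound.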
An Efficiency Test
- Consider the following example
from math import floor, sqrt
from timeit import timeit
Intro
- We can think of abstraction as allowing for the separation of two groups of people with
different goals:
o The creators of an entity
Responsible for designing, building, and implementing an entity
o The users (or clients) of that entity
Responsible for using it
- The interface of an entity is the boundary between creator and user
o Interface – the set of rules (implicit or explicit) governing how users can interact
with that entity
The public side of an entity
The part of the creator’s work that everyone can interact with
Defining an Initializer
- A Person object has been created, but it has no attributes
- We need to define a new method for Person called the initializer
o The initializer method of a class is called when an instance of the class is created
in Python
o Purpose: to initialize all of the instance attributes for the new object
o Python always uses the name __init__ for the initializer method
- When we use the @dataclass decorator the Python interpreter automatically creates an
initializer method for the class, like what is shown below:
class Person:
    """A custom data type that represents data for a person."""
    given_name: str
    family_name: str
    age: int
    address: str

    def __init__(self, given_name: str, family_name: str, age: int, address: str) -> None:
        """Initialize a new Person object."""
        self.given_name = given_name
        self.family_name = family_name
        self.age = age
        self.address = address
- This method is indented so that it is inside the body of the class Person definition
- Every initializer has a first parameter that refers to the instance that has just been
created and is to be initialized
o By convention, we always call it self
o We could have written self: Person, but it is redundant because the type for self
should always be the class that the initializer belongs to
- We use the initializer by calling the data class as usual:
>>> david = Person('David', 'Liu', 100, '40 St. George Street')
o The initializer is called automatically
o We never have to pass a value for self
Python automatically sets it to the instance that is to be initialized
- Memory at the beginning of the initializer:
Call stack
__main__ | (no variables yet)
Person.__init__ | self = id60, given_name = id11, family_name = id12, age = id13, address = id14
Objects
id60 | Person | (no attributes yet)
id11 | str | "David"
id12 | str | "Liu"
id13 | int | 100
id14 | str | "40 St. George Street"
- The initializer’s job is to create and initialize the instance attributes
o To do this, we use one assignment statement per instance attribute
Uses the same dot notation syntax for assigning to instance attributes
o given_name is a parameter of the initializer
o self.given_name is an instance attribute
- Memory immediately before the initializer returns:
Call stack
__main__ | (no variables yet)
Person.__init__ | self = id60, given_name = id11, family_name = id12, age = id13, address = id14
Objects
id60 | Person | given_name = id11, family_name = id12, age = id13, address = id14
id11 | str | "David"
id12 | str | "Liu"
id13 | int | 100
id14 | str | "40 St. George Street"
def __init__(self, given_name: str, family_name: str, age: int, address: str) -> None:
    """Initialize a new Person object."""
    self.given_name = given_name
    self.family_name = family_name
    self.age = age
    self.address = address
Intro
- Concrete data type – synonymous with a Python class
o Have concrete implementations in Python code
- Abstract data type (ADT) – defines an entity that stores some kind of data and the
operations that can be performed on it
o Language-independent
o Pure interface concerned only with the what (i.e. what data is stored, what we
can do with the data) and not the how (i.e. how a computer actually stores this
data or implements these operations)
9.4 Stacks
Preconditions:
- not self.is_empty()
“””
Applications of Stacks
- “Undo” feature
o We want to undo the most recent action
Analyzing Efficiency
- We could have implemented Stack1 using the front of _items to represent the top of the
stack
class Stack2:
# Duplicated code from Stack1 omitted. Only push and pop are different.
Custom Exceptions
- A better solution is to raise a custom exception that is descriptive, yet does not reveal
any implementation details
- We can define our own type of error by defining a new class
class EmptyStackError(Exception):
    """Exception raised when calling pop on an empty stack."""
- To use EmptyStackError in our pop method,
def pop(self) -> Any:
    """Remove and return the element at the top of this stack.

    Raise an EmptyStackError if this stack is empty.
    """
    if self.is_empty():
        raise EmptyStackError
    else:
        return self._items.pop()
- The exception is now part of the public interface because the docstring names both the
type of exception and the scenario that will cause that exception to be raised
- The Python keyword raise will raise an exception
- When we call pop on an empty stack, it will display EmptyStackError rather than
mentioning any implementation details
Testing Exceptions
- We cannot simply call pop on an empty stack and check the return value or the state of
stack after pop returns
o Raising an error interrupts the regular control flow of a Python program
- The pytest module allows us to write tests that expect an exception to occur, using the
function pytest.raises together with the with keyword
# Assuming our stack implementation is contained in a file stack.py.
from stack import Stack, EmptyStackError
import pytest


def test_empty_stack_error():
    """Test that popping from an empty stack raises an exception."""
    s = Stack()
    with pytest.raises(EmptyStackError):
        s.pop()
o The test passes when that exception is raised, and fails when that exception is
not raised
Also fails when a different exception is raised
Handling Exceptions
- Python provides the try-except statement to execute a block of code and handle a case
where one or more pre-specified exceptions are raised in that block
o The simplest form of a try-except statement:
try:
    <statement>
    …
except <ExceptionClass>:
    <statement>
    …
- When a try-except statement is executed:
o The block of code indented within the try is executed
o If no exception occurs when executing this block, the except part is skipped, and
the Python interpreter continues to the next statement after the try-except
o If an exception occurs when executing this block:
If the exception has type <ExceptionClass>, the block under the except is
executed, and then after that the Python interpreter continues executing
the next statement after the try-except
• In this case the program does not immediately halt
If the exception is a different type, this does stop the normal program
execution
- Try-except statements shield users from seeing errors that they should never see, and
allow the rest of the program to continue
- Example: a function that takes a stack and returns the second item from the top of the
stack
def second_from_top(s: Stack) -> Optional[str]:
    """Return the item that is second from the top of s.

    If there is no such item in the Stack, return None.
    """
    try:
        hold1 = s.pop()
    except EmptyStackError:
        # In this case, s is empty. We can return None.
        return None
    try:
        hold2 = s.pop()
    except EmptyStackError:
        # In this case, s had only 1 element
        # We restore s to its original state and return None
        s.push(hold1)
        return None
    return hold2
9.6 Queues
class EmptyQueueError(Exception):
    """Exception raised when calling dequeue on an empty queue."""
Implementation Efficiency
- Our Queue.enqueue calls list.append, which takes constant time
- Our Queue.dequeue calls self._items.pop(0), which takes Θ(𝑛) time
- If we change things around so that the front of the queue is the end of the list (rather
than the beginning), we simply swap these running times
- Using an array-based list, we can either have an efficient enqueue or an efficient
dequeue operation
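- The trade-off between the two list layouts can be made concrete. The sketch below is a minimal illustration, not the course's Queue class: the class names QueueFrontAtStart and QueueFrontAtEnd are hypothetical, and error handling for an empty queue is left out.

```python
from typing import Any, List


class QueueFrontAtStart:
    """Queue whose front is the start of the list: fast enqueue, slow dequeue."""
    _items: List[Any]

    def __init__(self) -> None:
        self._items = []

    def enqueue(self, item: Any) -> None:
        self._items.append(item)      # constant time: adds at the end

    def dequeue(self) -> Any:
        return self._items.pop(0)     # Θ(n): every remaining item shifts left


class QueueFrontAtEnd:
    """Queue whose front is the end of the list: slow enqueue, fast dequeue."""
    _items: List[Any]

    def __init__(self) -> None:
        self._items = []

    def enqueue(self, item: Any) -> None:
        self._items.insert(0, item)   # Θ(n): every existing item shifts right

    def dequeue(self) -> Any:
        return self._items.pop()      # constant time: removes from the end


q1, q2 = QueueFrontAtStart(), QueueFrontAtEnd()
for value in [1, 2, 3]:
    q1.enqueue(value)
    q2.enqueue(value)

order1 = [q1.dequeue(), q1.dequeue(), q1.dequeue()]
order2 = [q2.dequeue(), q2.dequeue(), q2.dequeue()]
print(order1, order2)
```

- Both classes return items in first-in-first-out order, so a user cannot tell them apart by behaviour; only the running times of enqueue and dequeue differ.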
class EmptyPriorityQueueError(Exception):
    """Exception raised when calling dequeue on an empty priority queue."""
Intro
- Recall that the Stack ADT can be implemented using a Python list in 2 ways:
o Storing the top of the stack at the end of the list (Stack1)
o Storing the top of the stack at the front of the list (Stack2)
- They share the same public interface of the Stack ADT
class EmptyStackError(Exception):
    """Exception raised when calling pop on an empty stack."""


class Stack2(Stack):
    """…"""
o The syntax (Stack) indicates that Stack1 and Stack2 inherit from Stack
o Stack: base class, superclass, parent class
o Stack1, Stack2: subclass, child class, derived class
- When one class in Python inherits from another,
o The Python interpreter treats every instance of the subclass as an instance of the
superclass as well
>>> s1 = Stack1()
>>> isinstance(s1, Stack1)
True
>>> isinstance(s1, Stack)
True
>>> isinstance(s1, Stack2)
False
o When the superclass is abstract, the subclass must implement all abstract
methods from the superclass, without changing the public interface of those
methods
- Inheritance serves as a form of contract:
o The implementer of the subclass must implement the methods from the abstract
superclass
o Any user of the subclass may assume that they can call the superclass methods
on instances of the subclass
- Because Stack1 and Stack2 are both subclasses of Stack, we expect them to implement
all the stack methods
o They might also implement additional methods that are unique to each subclass
(i.e. not shared)
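- The contract described above can be sketched end to end. The code below is an illustrative assumption about how the three classes fit together, not the course's exact definitions: the abstract methods raise NotImplementedError, and the helper reverse_with_stack is hypothetical. It relies only on the Stack interface, so it works with either subclass.

```python
from typing import Any, List


class Stack:
    """Abstract stack: declares the shared public interface."""

    def is_empty(self) -> bool:
        raise NotImplementedError

    def push(self, item: Any) -> None:
        raise NotImplementedError

    def pop(self) -> Any:
        raise NotImplementedError


class Stack1(Stack):
    """Stack whose top is the end of the list."""
    _items: List[Any]

    def __init__(self) -> None:
        self._items = []

    def is_empty(self) -> bool:
        return self._items == []

    def push(self, item: Any) -> None:
        self._items.append(item)

    def pop(self) -> Any:
        return self._items.pop()


class Stack2(Stack):
    """Stack whose top is the front of the list."""
    _items: List[Any]

    def __init__(self) -> None:
        self._items = []

    def is_empty(self) -> bool:
        return self._items == []

    def push(self, item: Any) -> None:
        self._items.insert(0, item)

    def pop(self) -> Any:
        return self._items.pop(0)


def reverse_with_stack(s: Stack, items: List[Any]) -> List[Any]:
    """Return items in reverse order, using only the Stack interface on s."""
    for item in items:
        s.push(item)
    result = []
    while not s.is_empty():
        result.append(s.pop())
    return result


result1 = reverse_with_stack(Stack1(), [1, 2, 3])
result2 = reverse_with_stack(Stack2(), [1, 2, 3])
print(result1, result2, isinstance(Stack1(), Stack), isinstance(Stack1(), Stack2))
```

- Because reverse_with_stack is annotated with the superclass Stack, it never needs to know which implementation it was given.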
Intro
- object is an ancestor class of every other class
o Ancestor class – parent class, or parent of a parent class
- Whenever we define a new class (including data classes), if we do not specify a
superclass in parentheses, object is the implicit superclass
Method Inheritance
- The object class is not abstract and implements each of the special methods
- Here, where the superclass is a concrete class, inheritance is used not just to define a
shared public interface, but also to provide default implementations for each method in
the interface
- Suppose we create a dummy class with a completely empty body:
class Donut:
    """A donut."""
- This class inherits the object.__init__ method, which allows us to create new Donut
instances
>>> donut = Donut()
>>> type(donut)
<class '__main__.Donut'>
- Similarly, this class inherits the object.__str__ method, which returns a string that states
the class name and memory location of the object
>>> d = Donut()
>>> d.__str__()
‘<__main__.Donut object at 0x7fc299d7b588>’
- We can use the built-in dir function to see all of the special methods that Donut has
inherited from object
>>> dir(Donut)
[‘__class__’, ‘__delattr__’, ‘__dict__’, (the rest is omitted by me)]
- The special methods are often called by other functions or parts of Python syntax
o We have already seen how the __init__ method is called when a new object is
initialized
o The __str__ method is called when we attempt to convert an object to a string
by calling str on it
>>> d = Donut()
>>> d.__str__()
‘<__main__.Donut object at 0x7fc299d7b588>’
>>> str(d)
‘<__main__.Donut object at 0x7fc299d7b588>’
o The built-in print function first converts its arguments into strings using their
__str__ methods, and then prints out the resulting text
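The relationships between str, print, and the inherited special methods can be checked with a short sketch. Capturing the output of print with io.StringIO is a choice made here for illustration only; the variable names are hypothetical.

```python
import io
from contextlib import redirect_stdout


class Donut:
    """A donut."""


d = Donut()

# str(d) produces the same text as calling the inherited __str__ directly.
same_text = str(d) == d.__str__()

# Capture what print writes so it can be compared with str(d).
buffer = io.StringIO()
with redirect_stdout(buffer):
    print(d)
printed = buffer.getvalue()

# Donut defines neither method itself: both are the ones from object.
inherited = (Donut.__init__ is object.__init__
             and Donut.__str__ is object.__str__)
```

Here printed holds the same text as str(d), followed by the newline that print adds.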
Method Overriding
- Every time we’ve defined our own __init__ in a class, we have overridden the
object.__init__ method
- We say that a class C overrides a method m when the method m is defined in the
superclass of C, and is also given a concrete implementation in the body of C
- When we defined a custom exception class
class EmptyStackError(Exception):
“””…”””
def __str__(self) -> str:
“””…”””
return ‘pop may not be called on an empty stack’
o This class overrode the __str__ method to use its own string representation,
which is displayed when this exception is raised.
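To see the effect of the override, the sketch below compares EmptyStackError with an exception class that keeps the inherited __str__. PlainError and message_of are hypothetical names introduced only for this comparison.

```python
class EmptyStackError(Exception):
    """Exception raised when pop is called on an empty stack."""

    def __str__(self) -> str:
        """Return a string representation of this error."""
        return 'pop may not be called on an empty stack'


class PlainError(Exception):
    """Hypothetical exception that does NOT override __str__."""


def message_of(error: Exception) -> str:
    """Raise error, catch it, and return its string form."""
    try:
        raise error
    except Exception as caught:
        return str(caught)
```

Calling message_of(EmptyStackError()) returns the custom message, while message_of(PlainError()) returns the empty string produced by the inherited implementation.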
Introducing Hercules
- We want to launch a Hercules app that allows people to order groceries and meals from
grocery stores and restaurants, and arrange for couriers to make deliveries right to their
front doors
- When designing and implementing this app, we need to consider:
o How restaurants will register with the app and post menus
o How customers will register with the app to browse restaurants and place orders
o How couriers will register with the app to claim orders and deliver them from
restaurants to customers
o … and more
@dataclass
class Customer:
“””A person who orders food.”””
@dataclass
class Courier:
“””A person who delivers food orders from restaurants to customers.”””
@dataclass
class Order:
“””A food order from a customer.”””
Designing the Restaurant Data Class
- We need a way to identify each restaurant: its name
o We’ll use a str to represent it
- A user needs to see what food is available to order, so we need to store a food menu for
each restaurant
o Since a menu offers several dishes, each with its own price, we’ll use a dict that
maps the names of dishes (strs) to their prices (floats)
- Couriers need to know where restaurants are in order to pick up food orders, and so we
need to store a location for each restaurant
o We could store its address as a str
o We could also store the latitude and longitude (a tuple of floats)
- Each of these three pieces of information (restaurant name, food menu, location) is
an appropriate attribute for the restaurant
@dataclass
class Restaurant:
“””A place that serves food.
Instance Attributes:
- name: the name of the restaurant
- address: the address of the restaurant
- menu: the menu of the restaurant with the name of the dish
mapping to the price
- location: the location of the restaurant as (latitude, longitude)
Representation Invariants:
- all(self.menu[item] >= 0 for item in self.menu)
- -90 <= self.location[0] <= 90
- -180 <= self.location[1] <= 180
“””
name: str
address: str
menu: Dict[str, float]
location: Tuple[float, float]
- Since the menu is a compound data type, we could have created a completely separate
Menu data class
- Each new class we create introduces a little more complexity into our program, and for a
relatively simple class for a menu, this additional complexity is not worth it
- We could have used a dictionary to represent a restaurant instead of a Restaurant data
class
o This would have reduced one area of complexity, but introduced another
i.e. the “valid” keys of a dictionary used to represent a restaurant
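The trade-off between a data class and a plain dictionary can be made concrete with a sketch. The helper satisfies_invariants and the misspelled key 'adress' are hypothetical, introduced only to show the difference; the invariants themselves are the ones listed in the Restaurant docstring.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Restaurant:
    """A place that serves food."""
    name: str
    address: str
    menu: Dict[str, float]
    location: Tuple[float, float]


def satisfies_invariants(restaurant: Restaurant) -> bool:
    """Return whether the representation invariants hold (hypothetical helper)."""
    return (
        all(restaurant.menu[item] >= 0 for item in restaurant.menu)
        and -90 <= restaurant.location[0] <= 90
        and -180 <= restaurant.location[1] <= 180
    )


mcdonalds = Restaurant(name='McDonalds', address='160 Spadina Ave',
                       menu={'fries': 4.5}, location=(43.649, -79.397))

# A dictionary accepts any keys, so a misspelled key goes unnoticed.
as_dict = {'name': 'McDonalds', 'adress': '160 Spadina Ave'}

# The data class rejects an attribute name that it does not declare.
try:
    Restaurant(name='McDonalds', adress='160 Spadina Ave',
               menu={}, location=(0.0, 0.0))
    rejected = False
except TypeError:
    rejected = True
```

With the data class, the set of valid attribute names is fixed by the class definition, which is the complexity the dictionary version would push onto every piece of code that uses it.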
Attributes:
- customer: the name of the customer who placed this order
- restaurant: the name of the restaurant the order is placed for
- food_items: Dict[str, int]
- start_time: datetime.datetime
- courier: Optional[Courier] = None
- end_time: Optional[datetime.datetime] = None
- The line courier: Optional[Courier] = None is how we define an instance attribute
courier with a default value of None
o The type annotation Optional[Courier] means that this attribute can either be
None or a Courier instance
o Similarly, the end_time attribute must be either None (its initial value) or a
datetime.datetime value
- Here is how we could use this class
o Note: Customer is currently an empty data class, and so is instantiated simply as
Customer()
>>> david = Customer()
>>> mcdonalds = Restaurant(name=’McDonalds’, address=’160 Spadina Ave’,
menu={‘fries’: 4.5}, location=(43.649, -79.397))
>>> order = Order(customer=david, restaurant=mcdonalds, food_items={‘fries’:
10}, start_time=datetime.datetime(2020, 11, 5, 11, 30))
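A runnable sketch of the default values described above follows. The Restaurant class here is a trimmed-down stand-in with only a name, and the variable unassigned is hypothetical; the Order attributes follow the list given earlier.

```python
import datetime
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Customer:
    """A person who orders food."""


@dataclass
class Courier:
    """A person who delivers food orders from restaurants to customers."""


@dataclass
class Restaurant:
    """Trimmed-down stand-in for the full Restaurant data class."""
    name: str


@dataclass
class Order:
    """A food order from a customer."""
    customer: Customer
    restaurant: Restaurant
    food_items: Dict[str, int]
    start_time: datetime.datetime
    # Attributes with default values must come after those without.
    courier: Optional[Courier] = None
    end_time: Optional[datetime.datetime] = None


order = Order(customer=Customer(), restaurant=Restaurant('McDonalds'),
              food_items={'fries': 10},
              start_time=datetime.datetime(2020, 11, 5, 11, 30))

# Neither optional attribute was passed in, so both start as None.
unassigned = order.courier is None and order.end_time is None

# Later, the defaults can be replaced with real values.
order.courier = Courier()
order.end_time = datetime.datetime(2020, 11, 5, 12, 0)
```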
Class Composition
- Classes can be “nested” within each other through their instance attributes
o i.e. our Order data class has attributes which are instances of other classes we
have defined (Customer, Restaurant, and Courier)
- The relationship between Order and these other classes is called class composition, and
is fundamental to object-oriented design
- We use class composition to represent a “has a” relationship between two classes
o i.e. “an Order has a Customer”
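One consequence of class composition is that code can reach through one object to the attributes of another. The sketch below uses trimmed-down versions of Restaurant and Order, and a hypothetical helper order_total that is not part of these notes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Restaurant:
    """Trimmed-down Restaurant: only the attributes this sketch needs."""
    name: str
    menu: Dict[str, float]


@dataclass
class Order:
    """Trimmed-down Order that "has a" Restaurant."""
    restaurant: Restaurant
    food_items: Dict[str, int]


def order_total(order: Order) -> float:
    """Return the total price of order (hypothetical helper).

    The prices come from the Restaurant stored inside the Order.
    """
    return sum(order.restaurant.menu[item] * quantity
               for item, quantity in order.food_items.items())
```

For an order of 10 fries at a restaurant whose menu lists fries at 4.5, order_total multiplies the quantity stored in the Order by the price stored in its Restaurant.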
Intro
- We can create a new manager class whose role is to keep track of all the entities in the
system and to mediate the interactions between them (like a customer placing a new
order)
- The FoodDeliverySystem will store (and have access to) every customer, courier, and
restaurant represented in our system
class FoodDeliverySystem:
“””A system that maintains all entities (restaurants, customers, couriers,
and orders).
Public Attributes:
- name: the name of this food delivery system
Representation Invariants:
- self.name != ‘’
- all(r == self._restaurants[r].name for r in self._restaurants)
- all(c == self._customers[c].name for c in self._customers)
- all(c == self._couriers[c].name for c in self._couriers)
“””
name: str
Changing State
- So far, we have modelled the static properties of our food delivery system, that is, the
attributes that are necessary to capture a particular snapshot of the state of the system
at a specific moment in time
- Adding entities
o We can define simple methods to add entities to the system
class FoodDeliverySystem:
…
def add_restaurant(self, restaurant: Restaurant) -> bool:
“””Add the given restaurant to this system.
Do NOT add the restaurant if one with the same name already
exists.
Return whether the restaurant was successfully added to this
system.
“””
if restaurant.name in self._restaurants:
return False
else:
self._restaurants[restaurant.name] = restaurant
return True
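The add_restaurant method can be tried out with a minimal sketch. The initializer shown here, which starts with an empty _restaurants dictionary, is an assumption, as is the trimmed-down Restaurant class.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Restaurant:
    """Trimmed-down stand-in for the full Restaurant data class."""
    name: str


class FoodDeliverySystem:
    """A system that maintains restaurants (other entities omitted here)."""
    name: str
    _restaurants: Dict[str, Restaurant]

    def __init__(self, name: str) -> None:
        """Initialize a system with no restaurants (assumed initializer)."""
        self.name = name
        self._restaurants = {}

    def add_restaurant(self, restaurant: Restaurant) -> bool:
        """Add the given restaurant to this system.

        Do NOT add the restaurant if one with the same name already exists.
        Return whether the restaurant was successfully added.
        """
        if restaurant.name in self._restaurants:
            return False
        else:
            self._restaurants[restaurant.name] = restaurant
            return True
```

Keying the dictionary by restaurant name is what makes the duplicate check a single membership test, and it keeps the invariant that each key equals the name of its restaurant.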
Preconditions:
- order in self.orders
“””
o FoodDeliverySystem.place_order would be responsible for both recording the
order and assigning a courier to that order
o FoodDeliverySystem.complete_order marks the order as complete and unassigns
the courier so that they are free to take a new order
Instance Attributes:
- timestamp: the start time of the event
“””
timestamp: datetime.datetime
class NewOrderEvent(Event):
“””An event where a customer places an order for a restaurant.”””
o Since subclasses inherit all the methods from their superclass, including the
initializer, we must provide a datetime.datetime object as the first argument
when creating a new NewOrderEvent object
>>> e = NewOrderEvent(datetime.datetime(2020, 9, 8))
>>> e.timestamp
datetime.datetime(2020, 9, 8, 0, 0)
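The inherited initializer can be demonstrated with a small sketch. The body of Event.__init__ shown here is an assumption: a plain initializer that stores the timestamp.

```python
import datetime


class Event:
    """An abstract class representing an event in the simulation."""
    timestamp: datetime.datetime

    def __init__(self, timestamp: datetime.datetime) -> None:
        """Initialize this event with the given timestamp (assumed body)."""
        self.timestamp = timestamp

    def handle_event(self, system) -> None:
        """Mutate the given system to process this event."""
        raise NotImplementedError


class NewOrderEvent(Event):
    """An event where a customer places an order for a restaurant."""


e = NewOrderEvent(datetime.datetime(2020, 9, 8))

# NewOrderEvent defines no initializer of its own, so it uses Event's.
uses_parent_init = NewOrderEvent.__init__ is Event.__init__

# Omitting the timestamp fails, because Event.__init__ requires it.
try:
    NewOrderEvent()
    missing_argument_rejected = False
except TypeError:
    missing_argument_rejected = True
```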
Subclass-Specific Attributes
- We often make the subclass-specific attributes private, to avoid changing the public
interface declared by the abstract superclass
- We do not need to repeat the documentation for the timestamp attribute
class NewOrderEvent(Event):
“””An event representing when a customer places an order at a
restaurant.”””
# Private Instance Attributes:
# - _order: the new order to be added to the FoodDeliverySystem
_order: Order
Implementing NewOrderEvent.handle_event
class NewOrderEvent(Event):
“””…”””
…
def handle_event(self, system: FoodDeliverySystem) -> None:
“””Mutate system by placing an order.”””
system.place_order(self._order)
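The pieces above can be put together in a sketch of how a subclass might store its private attribute and still reuse the superclass initializer. The NewOrderEvent initializer and the RecordingSystem class are assumptions made for this illustration; RecordingSystem stands in for FoodDeliverySystem, and a plain string stands in for an Order.

```python
import datetime


class Event:
    """An abstract class representing an event in the simulation."""
    timestamp: datetime.datetime

    def __init__(self, timestamp: datetime.datetime) -> None:
        self.timestamp = timestamp

    def handle_event(self, system) -> None:
        raise NotImplementedError


class RecordingSystem:
    """Hypothetical stand-in for FoodDeliverySystem: records placed orders."""

    def __init__(self) -> None:
        self.placed = []

    def place_order(self, order) -> None:
        self.placed.append(order)


class NewOrderEvent(Event):
    """An event representing when a customer places an order."""
    # Private Instance Attributes:
    #   - _order: the new order to be added to the system
    _order: object

    def __init__(self, timestamp: datetime.datetime, order: object) -> None:
        # Let the superclass initialize the shared public attribute.
        super().__init__(timestamp)
        self._order = order

    def handle_event(self, system) -> None:
        """Mutate system by placing an order."""
        system.place_order(self._order)
```

Calling handle_event on such an event leaves the event unchanged and mutates only the system it is given.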
Returning No Events
- Our CompleteOrderEvent does not cause any new events to happen
o Returns an empty list
Preconditions:
- duration > 0
“””
return events
new_events = event.handle_event(system)
for new_event in new_events:
events.enqueue(new_event)
- Our run_simulation function is polymorphic
o It works regardless of what Event instances it’s given in its initial_events
parameter, or what new events are generated and stored in new_events
o Our function needs to be able to call the handle_event method on each event
object
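A self-contained sketch of this polymorphic loop follows. Several parts are assumptions made for illustration: SystemLog stands in for FoodDeliverySystem, EventQueue is a minimal list-based queue that dequeues the earliest event first, every delivery takes 30 minutes, and handle_event returns a list of new events.

```python
import datetime
from typing import List


class SystemLog:
    """Hypothetical stand-in for FoodDeliverySystem: records what happened."""

    def __init__(self) -> None:
        self.entries: List[str] = []


class Event:
    """An abstract event."""

    def __init__(self, timestamp: datetime.datetime) -> None:
        self.timestamp = timestamp

    def handle_event(self, system: SystemLog) -> List['Event']:
        """Mutate system and return any new events this event causes."""
        raise NotImplementedError


class CompleteOrderEvent(Event):
    def handle_event(self, system: SystemLog) -> List[Event]:
        system.entries.append('completed')
        return []  # completing an order causes no new events


class NewOrderEvent(Event):
    def handle_event(self, system: SystemLog) -> List[Event]:
        system.entries.append('placed')
        # Assumption: every delivery takes exactly 30 minutes.
        done = self.timestamp + datetime.timedelta(minutes=30)
        return [CompleteOrderEvent(done)]


class EventQueue:
    """Hypothetical minimal queue that dequeues the earliest event first."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def is_empty(self) -> bool:
        return self._events == []

    def enqueue(self, event: Event) -> None:
        self._events.append(event)

    def dequeue(self) -> Event:
        earliest = min(self._events, key=lambda event: event.timestamp)
        self._events.remove(earliest)
        return earliest


def run_simulation(initial_events: List[Event], system: SystemLog) -> None:
    """Process events in timestamp order until none remain."""
    events = EventQueue()
    for event in initial_events:
        events.enqueue(event)
    while not events.is_empty():
        event = events.dequeue()
        # The same call works for every Event subclass.
        new_events = event.handle_event(system)
        for new_event in new_events:
            events.enqueue(new_event)
```

The loop never checks which kind of event it has: it relies only on each event having a handle_event method, so new Event subclasses can be added without changing run_simulation.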
A Simulation Class
class FoodDeliverySimulation:
“””A simulation of the food delivery system.”””
# Private Instance Attributes:
# - _system: The FoodDeliverySystem instance that this simulation uses
# - _events: A collection of the events to process during the simulation
_system: FoodDeliverySystem
_events: EventQueue
self._populate_initial_events(start_time, num_days)
self._generate_system(num_couriers, num_customers, num_restaurants)
new_events = event.handle_event(self._system)
for new_event in new_events:
self._events.enqueue(new_event)
- Key items to note in this (incomplete) implementation:
o The run_simulation method has been renamed to simply run, since it’s a method
in the FoodDeliverySimulation class
o The local variable events and parameter system from the function are now
instance attributes for the FoodDeliverySimulation class, and have been moved
out of the run method entirely. It’s the job of the
FoodDeliverySimulation.__init__ to initialize these objects
o The initializer takes in several parameters representing configuration values for
the simulation. It then uses these values in two helper methods to initialize the
_system and _events objects. These methods are marked private (named with a
leading underscore) because they’re only meant to be called by the initializer,
and not code outside of the class
- To use the FoodDeliverySimulation class:
>>> simulation = FoodDeliverySimulation(datetime.datetime(2020, 11, 30), 7, 4,
100, 50)
>>> simulation.run()