CSC111
Table of Contents
11.1 Introduction to Linked Lists
11.2 Traversing Linked Lists
11.3 Mutating Linked Lists
11.4 Index-Based Mutation
11.5 Linked List Running-Time Analysis
12.1 Proof by Induction
12.2 Recursively-Defined Functions
12.3 Introduction to Nested Lists
12.4 Nested Lists and Structural Recursion
12.5 Recursive Lists
13.1 Introduction to Trees
13.2 Recursion on Trees
13.3 Mutating Trees
13.4 Running-Time Analysis for Tree Operations
13.5 Introduction to Binary Search Trees
13.6 Mutating Binary Search Trees
13.7 The Running Time of Binary Search Tree Operations
14.1 Introduction to Abstract Syntax Trees
14.2 Variables and the Variable Environment
14.3 From Expressions to Statements
14.4 Abstract Syntax Trees in Practice
15.1 Introduction to Graphs
15.2 Some Properties of Graphs
15.3 Representing Graphs in Python
15.4 Connectivity and Recursive Graph Traversal
15.5 Cycles and Trees
15.6 Computing Spanning Trees
16.1 Sorted Lists and Binary Search
16.2 Selection Sort
16.3 Insertion Sort
16.4 Introduction to Divide-and-Conquer Algorithms
16.5 Mergesort
16.6 Quicksort
16.7 Running-Time Analysis for Mergesort and Quicksort
17.1 Introduction to Average-Case Running Time
17.2 Average-Case Running Time of Linear Search
11.1 Introduction to Linked Lists
Intro
- Goal: create a new Python class that behaves exactly the same as the built-in list class
class _Node:
    """A node in a linked list.

    Instance Attributes:
    - item: the data stored in this node
    - next: the next node in the list, if any
    """
    item: Any
    next: Optional[_Node] = None
- An instance of _Node represents a single element of a list
- Given a bunch of _Node objects, we can follow each next attribute to recover the
sequence of items that these nodes represent
Building Links
- _node3 is very different from _node3.item
- The former is a _Node object containing a value (say 111), and the latter is the value 111
11.2 Traversing Linked Lists
Intro
- For a Python list, we can manually use an index variable i to keep track of where we are
in the list
i = 0
while i < len(my_list):
    ... do something with my_list[i] ...
    i += 1
o Contains 4 parts
Initialize the loop variable i (0 refers to the starting index of the list)
Check if we’ve reached the end of the list in the loop condition
In the loop, do something with the current element my_list[i]
Increment the index loop variable
- We don’t have the indexing operation
- Our loop variable should refer to the _Node object we’re currently on in the loop
curr = my_linked_list._first  # Initialize curr to the start of the list
while curr is not None:       # curr is None if we've reached the end of the list
    ... curr.item ...         # Do something with the current 'element', curr.item
    curr = curr.next          # 'Increment' curr, assigning it to the next node
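- The four-part loop pattern above can be exercised on a hand-built chain (a self-contained sketch; the _Node class and the values are assumed for illustration):

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class _Node:
    item: Any
    next: Optional[_Node] = None

# A three-node chain representing the sequence [10, 20, 30].
first = _Node(10, _Node(20, _Node(30)))

# Traverse the chain, summing the items.
total = 0
curr = first                 # Initialize curr to the start of the list
while curr is not None:      # curr is None once we fall off the end
    total += curr.item       # Do something with the current 'element'
    curr = curr.next         # 'Increment' curr to the next node
```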
def __getitem__(self, i: int) -> Any:
    """Return the item stored at index i in this linked list.

    Preconditions:
    - i >= 0
    """
    curr = self._first
    curr_index = 0
    while curr is not None:
        if curr_index == i:
            return curr.item
        curr = curr.next
        curr_index = curr_index + 1
    # If we've reached the end of the list and no item has been returned,
    # the given index is out of bounds.
    raise IndexError
o By updating curr and curr_index together, we get a loop invariant
curr refers to the node at index curr_index in the linked list
o The above implementation uses an early return inside the loop, stopping as soon
as we’ve reached the node at the given index i
- Another approach modifies the while loop condition so that the loop stops when it
either reaches the end of the list or the correct index
def __getitem__(self, i: int) -> Any:
    """..."""
    curr = self._first
    curr_index = 0
    while not (curr is None or curr_index == i):
        curr = curr.next
        curr_index = curr_index + 1
    if curr is None:
        raise IndexError
    else:
        return curr.item
LinkedList.append
- We need to find the current last node in the linked list, and then add a new node to the
end of that
- We need to stop the loop when it reaches the last node
- We also need a case where curr starts as None
def append(self, item: Any) -> None:
    """Add the given item to the end of this linked list."""
    new_node = _Node(item)
    if self._first is None:
        self._first = new_node
    else:
        curr = self._first
        while curr.next is not None:
            curr = curr.next
        # curr is now the last node in the list; link the new node after it.
        curr.next = new_node
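- Putting the pieces together, a minimal sketch of the class so far (the to_list helper is hypothetical, added here only to make the result easy to inspect):

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class _Node:
    item: Any
    next: Optional[_Node] = None

class LinkedList:
    """A minimal sketch of the LinkedList class from this section."""
    _first: Optional[_Node]

    def __init__(self) -> None:
        self._first = None

    def append(self, item: Any) -> None:
        """Add the given item to the end of this linked list."""
        new_node = _Node(item)
        if self._first is None:
            self._first = new_node
        else:
            curr = self._first
            while curr.next is not None:
                curr = curr.next
            # curr is now the last node; link the new node after it.
            curr.next = new_node

    def to_list(self) -> list:
        """Return a built-in list with the same items (hypothetical test helper)."""
        items = []
        curr = self._first
        while curr is not None:
            items.append(curr.item)
            curr = curr.next
        return items

linky = LinkedList()
for x in [1, 2, 3]:
    linky.append(x)
```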
Intro
- We want to implement a method analogous to list.insert
class LinkedList:
    def insert(self, i: int, item: Any) -> None:
        """Insert the given item at index i in this linked list.

        If i *equals* the length of self, add the item to the end of the linked list,
        which is the same as LinkedList.append.

        Preconditions:
        - i >= 0
        """
Implementing LinkedList.insert
- If we want the node to be inserted into position i, we need to access the node at
position i – 1
def insert(self, i: int, item: Any) -> None:
    """..."""
    new_node = _Node(item)
    if i == 0:
        # Insert the new node at the start of the linked list
        self._first, new_node.next = new_node, self._first
    else:
        curr = self._first
        curr_index = 0
        while not (curr is None or curr_index == i - 1):
            curr = curr.next
            curr_index = curr_index + 1
        if curr is None:
            # i - 1 is out of bounds. The item cannot be inserted.
            raise IndexError
        else:  # curr_index == i - 1
            # i - 1 is in bounds. Insert the new item.
            new_node.next = curr.next
            curr.next = new_node
o When the loop is over
If curr is None, then the list doesn’t have a node at position i - 1, and so i
is out of bounds
If not, then we’ve reached the desired index, and can insert the new
node
o Corner case: i == 0
We need an extra condition since it does not make sense to iterate to the
-1th node
Common Error
- The following order of link updates in the final else branch doesn’t work
curr.next = new_node
new_node.next = curr.next
o On the second line, curr.next has already been updated, and its old value lost
- Parallel assignment is recommended
curr.next, new_node.next = new_node, curr.next
def insert(self, i: int, item: Any) -> None:
    """..."""
    new_node = _Node(item)
    if i == 0:
        self._first, new_node.next = new_node, self._first
    else:
        curr = self._first
        curr_index = 0
        while not (curr is None or curr_index == i - 1):
            curr = curr.next
            curr_index = curr_index + 1
        if curr is None:
            raise IndexError
        else:
            curr.next, new_node.next = new_node, curr.next
- Running-time analysis. Let 𝑛 be the length (i.e. number of items) of self
o Case 1: Assume i == 0. In this case, the if branch executes, which takes constant
time, so we’ll count it as 1 step
o Case 2: Assume i > 0. In this case,
The first 2 statements in the else branch (curr = self._first, curr_index = 0)
take constant time, so we’ll count them as 1 step
The statements after the while loop all take constant time, so we’ll count
them as 1 step
The while loop iterates until either it reaches the end of the list (curr is
None) or until it reaches the correct index (curr_index == i – 1)
• The first case happens after 𝑛 iterations, since curr advances by 1
_Node each iteration
• The second case happens after 𝑖 − 1 iterations, since curr_index
starts at 0 and increases by 1 each iteration
So the number of iterations taken is min(𝑛, 𝑖 − 1)
Each iteration takes 1 step, for a total of min(𝑛, 𝑖 − 1) steps
This gives us a total running time of 1 + min(𝑛, 𝑖 − 1) + 1 =
min(𝑛, 𝑖 − 1) + 2 steps
o In the first case, we have a running time of Θ(1); in the second case, we have a
running time of Θ(min(𝑛, 𝑖)). The second expression also becomes Θ(1) when
𝑖 = 0, and so we can say that the overall running time of LinkedList.insert is
Θ(min(𝑛, 𝑖)).
A Proof by Induction
- Ex. Let f : ℕ → ℕ be defined as f(n) = Σ_{i=0}^{n} i. Prove that for all n ∈ ℕ,
  f(n) = n(n+1)/2.
Back to Example
- Proof. We prove this statement by induction on n.
- Base case: Let n = 0.
o In this case, f(0) = Σ_{i=0}^{0} i = 0, and 0(0+1)/2 = 0. So the two sides of the equation
are equal.
- Inductive step: Let k ∈ ℕ and assume that f(k) = k(k+1)/2. We want to prove that
f(k+1) = (k+1)(k+2)/2.
o Discussion. We need to determine how to use the induction hypothesis.
f(k+1) = Σ_{i=0}^{k+1} i can be broken down by “taking out” the last term: (Σ_{i=0}^{k} i) +
(k + 1)
o We will start with the left side of the equation
    f(k+1) = Σ_{i=0}^{k+1} i           (Definition of f)
           = (Σ_{i=0}^{k} i) + (k + 1)  (Taking out last term)
           = f(k) + (k + 1)             (Definition of f)
           = k(k+1)/2 + (k + 1)         (By the I.H.)
           = (k+1)(k+2)/2               QED
Intro
- In the last section, we broke down the summation into one of the same form, but of a
slightly smaller size:
o Σ_{i=0}^{k+1} i = (Σ_{i=0}^{k} i) + (k + 1)
o For all 𝑘 ∈ ℕ, 𝑓(𝑘 + 1) = 𝑓(𝑘) + (𝑘 + 1)
This relationship gives us a different way of defining 𝑓
f(n) = { 0,             if n = 0
       { f(n - 1) + n,  if n > 0
This is a recursive definition
• i.e. 𝑓 is defined in terms of itself
• Another name: self-referential definition
We were able to manipulate the equation 𝑓(𝑘 + 1) = 𝑓(𝑘) + (𝑘 + 1) to
prove the inductive step
def f(n: int) -> int:
    """Return the sum of the integers between 0 and n, inclusive.

    Preconditions:
    - n >= 0

    >>> f(4)
    10
    """
    if n == 0:
        return 0
    else:
        return f(n - 1) + n
o Inefficient compared to return n * (n + 1) // 2
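- A quick sketch checking that the recursive definition agrees with the closed form (the helper names here are chosen for illustration):

```python
def f(n: int) -> int:
    """Return the sum 0 + 1 + ... + n, defined recursively."""
    if n == 0:
        return 0
    else:
        return f(n - 1) + n

# The closed form computes the same value, but in constant time
# rather than with n recursive calls.
results_match = all(f(n) == n * (n + 1) // 2 for n in range(50))
```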
- Let f be a Python function. f is a recursively-defined function (or recursive function) when
it contains a call to itself in its body
- We use the term recursive call to describe the inner f(n – 1) call
- Recursion – the programming technique of defining recursive functions to perform
computations and solve problems
- Structure
o The if branch, consisting of the statement return 0, is the base case of the
function
o The else branch, consisting of the statement return f(n – 1) + n, is the recursive
step of the function, since it contains a recursive call
- The base case does not require any additional “breaking down” of the problem
- Both the inductive step of a proof and the recursive step of a function require the
problem to be broken down into an instance of a smaller size
o Either by using the inductive hypothesis or by making a recursive call
def euclidean_gcd(a: int, b: int) -> int:
    """Return the gcd of a and b.

    Preconditions:
    - a >= 0 and b >= 0
    """
    x = a
    y = b
    while y != 0:
        r = x % y
        x = y
        y = r
    return x
- We also know that gcd(𝑎, 0) = 𝑎 for all 𝑎 ∈ ℕ
- The recursive definition of the gcd function over the natural numbers:
o gcd(a, b) = { a,              if b = 0
              { gcd(b, a % b),  if b > 0
o The recursive part decreases from 𝑏 to 𝑎 % 𝑏 each time
o We are not just limited to going “from 𝑛 to 𝑛 − 1”
- A recursive definition is valid as long as it always uses “smaller” argument values to the
function in the recursive call
- Translating the recursive gcd definition into Python code:
def euclidean_gcd_rec(a: int, b: int) -> int:
    """Return the gcd of a and b using recursion.

    Preconditions:
    - a >= 0 and b >= 0
    """
    if b == 0:
        return a
    else:
        return euclidean_gcd_rec(b, a % b)
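- A usage sketch; each recursive call strictly decreases the second argument (from b to a % b), so the recursion terminates:

```python
def euclidean_gcd_rec(a: int, b: int) -> int:
    """Return the gcd of a and b using recursion.

    Preconditions:
    - a >= 0 and b >= 0
    """
    if b == 0:
        return a
    else:
        return euclidean_gcd_rec(b, a % b)

# gcd(48, 18): the argument pairs are (48, 18) -> (18, 12) -> (12, 6) -> (6, 0)
result = euclidean_gcd_rec(48, 18)
```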
A Motivating Example
- Consider the problem of computing the sum of a list of numbers. It is very easy.
- If we make the input structure a list of lists of numbers, we would need to use a nested
loop to process individual items in the nested list
- If we add another layer, the function would have a “nested nested loop”
- This can go on forever
Non-Uniform Nesting
- No function of the above form can handle nested lists with a non-uniform level of
nesting among its elements
o e.g. [[1, 2], [[[3]]], 4]
- These functions operate on a list of a specific structure, requiring that each list element
itself have the same level of nesting of its elements
- We need a better solution
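- The better solution is a recursive function that checks each element's type, recursing into any element that is itself a list. A sketch of this idea (the function name is chosen here for illustration):

```python
from typing import Union

def sum_nested(nested_list: Union[int, list]) -> int:
    """Return the sum of all integers in the given nested list.

    Here a "nested list" is either a single int, or a list whose
    elements are themselves nested lists; any depth of nesting works.
    """
    if isinstance(nested_list, int):
        # Base case: a single number contributes itself.
        return nested_list
    else:
        # Recursive step: sum the results of recursing into each element.
        return sum(sum_nested(sublist) for sublist in nested_list)
```

This one function handles the non-uniform example above, since the type check (not a fixed number of loops) decides how deep to go.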
12.5 Recursive Lists
class RecursiveList:
    """A recursive implementation of the List ADT.

    Representation Invariants:
    - (self._first is None) == (self._rest is None)
    """
    # Private Instance Attributes:
    # - _first: The first item in the list, or None if this list is empty.
    # - _rest: A list containing the items that come after the first one, or None if
    #   this list is empty.
    _first: Optional[Any]
    _rest: Optional[RecursiveList]

    def __init__(self, first: Optional[Any], rest: Optional[RecursiveList]) -> None:
        """Initialize a new recursive list."""
        self._first = first
        self._rest = rest
o This RecursiveList data type is recursive, because its _rest instance attribute
refers to another instance of RecursiveList
o An empty list has no “first” or “rest” values, so we set both these attributes to
None to represent an empty list
- To create a RecursiveList representing [1, 2, 3, 4]:
o RecursiveList(1, RecursiveList(2, RecursiveList(3, RecursiveList(4,
RecursiveList(None, None)))))
def sum(self) -> int:
    """Return the sum of the elements in this list.

    Preconditions:
    - every element in this list is an int
    """
    if self._first is None:  # Base case: this list is empty
        return 0
    else:
        return self._first + self._rest.sum()
o This function calculates the sum of an arbitrary number of elements without
using a built-in aggregation function or a loop
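- A usage sketch, building the list [1, 2, 3, 4] as shown above and summing it (a self-contained, minimal version of the class):

```python
from __future__ import annotations
from typing import Any, Optional

class RecursiveList:
    """A minimal sketch of the recursive list from this section."""
    _first: Optional[Any]
    _rest: Optional[RecursiveList]

    def __init__(self, first: Optional[Any], rest: Optional[RecursiveList]) -> None:
        self._first = first
        self._rest = rest

    def sum(self) -> int:
        """Return the sum of the elements in this list.

        Preconditions:
        - every element in this list is an int
        """
        if self._first is None:  # Base case: this list is empty
            return 0
        else:
            # Recurse on the rest of the list, and add the first item.
            return self._first + self._rest.sum()

lst = RecursiveList(1, RecursiveList(2, RecursiveList(3, RecursiveList(4, RecursiveList(None, None)))))
```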
Nodes
- The RecursiveList and _Node classes have essentially the same structure
class RecursiveList:
    _first: Optional[Any]
    _rest: Optional[RecursiveList]

class _Node:
    item: Any
    next: Optional[_Node] = None
- _Node is technically a recursive class
- For a _Node, we think of it as representing a single list element
o Its recursive attribute next is a “link” to another _Node, and we traverse these
links in a loop to access each node one at a time
- For a RecursiveList, we think of it as representing an entire sequence of elements, not
just one element
o Its recursive attribute _rest is not a link, it is the rest of the list itself
- When computing on a RecursiveList, we don’t try to access each item individually;
instead, we make a recursive function call on the _rest attribute, and focus on how to
use the result of that call in our computation
- Example tree:
    A
    ├── B
    │   ├── E
    │   └── F
    ├── C
    │   ├── G
    │   └── H
    │       └── J
    └── D
        └── I
o The root value of the above tree is A; it is connected to three subtrees
o The size of the above tree is 10
o The leaves of the above tree are E, F, G, J, I
o The internal values of the above tree are A, B, C, D, H
o The height of the above tree is 4
o The children of A are B, C, D
A Tree Implementation
class Tree:
    """A recursive tree data structure.

    Representation Invariants:
    - self._root is not None or self._subtrees == []
    """
    # Private Instance Attributes:
    # - _root: The item stored at this tree's root, or None if the tree is empty.
    # - _subtrees: The list of subtrees of this tree. This attribute is empty when
    #   self._root is None (representing an empty tree). However, this attribute may also
    #   be empty when self._root is not None, which represents a tree consisting of just
    #   one item.
    _root: Optional[Any]
    _subtrees: list[Tree]
def __init__(self, root: Optional[Any], subtrees: list[Tree]) -> None:
    """Initialize a new Tree with the given root value and subtrees.

    If root is None, the tree is empty.

    Preconditions:
    - root is not None or subtrees == []
    """
    self._root = root
    self._subtrees = subtrees
Tree Size
- Suppose we want to calculate the size of a tree. We can approach this problem by
following the recursive definition of a tree, being either empty or a root connected to a
list of subtrees
- Let T be a tree, and let size be a function mapping any tree to its size
o If T is empty, then its size is 0: we can write size(T) = 0
o If T is non-empty, then it consists of a root value and a collection of subtrees
T_0, T_1, ..., T_{k-1} for some k ∈ ℕ. In this case, the size of T is the sum of the sizes
of its subtrees, plus 1 for the root:
    size(T) = 1 + Σ_{i=0}^{k-1} size(T_i)
- We can combine the above observations to write a recursive mathematical definition of
our size function:
o size(T) = { 0,                            if T is empty
            { 1 + Σ_{i=0}^{k-1} size(T_i),  if T has subtrees T_0, T_1, ..., T_{k-1}
class Tree:
    ...
    def __len__(self) -> int:
        """Return the number of items contained in this tree."""
Example: Tree.__str__
- Trees have a non-linear ordering on the elements
- To implement Tree.__str__, we can start with the value of the root, then recursively add
on the __str__ for each of the subtrees
o The base case is when the tree is empty, and in this case the method returns an
empty string
def _str_indented(self, depth: int) -> str:
    """Return an indented string representation of this tree.

    The indentation level is specified by the <depth> parameter.
    """
    if self.is_empty():
        return ''
    else:
        s = ' ' * depth + f'{self._root}\n'
        for subtree in self._subtrees:
            s += subtree._str_indented(depth + 1)
        return s
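- With a minimal sketch of the Tree class (is_empty is filled in here as an assumption), _str_indented produces one line per value, indented by its depth:

```python
from __future__ import annotations
from typing import Any, Optional

class Tree:
    """A minimal sketch of the Tree class from this section."""
    _root: Optional[Any]
    _subtrees: list[Tree]

    def __init__(self, root: Optional[Any], subtrees: list[Tree]) -> None:
        self._root = root
        self._subtrees = subtrees

    def is_empty(self) -> bool:
        """Return whether this tree is empty."""
        return self._root is None

    def _str_indented(self, depth: int = 0) -> str:
        """Return an indented string representation of this tree."""
        if self.is_empty():
            return ''
        else:
            # Visit the root first (a preorder traversal), then each subtree
            # one level deeper.
            s = ' ' * depth + f'{self._root}\n'
            for subtree in self._subtrees:
                s += subtree._str_indented(depth + 1)
            return s

t = Tree('A', [Tree('B', []), Tree('C', [Tree('D', [])])])
output = t._str_indented()
```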
Optional Parameters
- One way to customize the behaviour of functions is to make a parameter optional by
giving it a default value
o Can be done for any function
i.e. def _str_indented(self, depth: int = 0) -> str:
• depth becomes an optional parameter that can either be included
or not included when this method is called
• We can call t._str_indented()
o No argument for depth given
- All optional parameters must appear after all of the required parameters in the function
header
- Do not use mutable values like lists for optional parameters
o The default value is created once and shared across all calls, so mutating it in
one call silently affects later calls
o Use optional parameters with immutable values like integers, strings, and None
Traversal Orders
- The __str__ implementation we gave visits the values in the tree in a fixed order:
o 1. First it visits the root value
o 2. Then it recursively visits each of its subtrees, in left-to-right order
By convention, we think of the _subtrees list as being ordered from left
to right
- This visit order is known as the (left-to-right) preorder tree traversal
o Root value is visited before any values in the subtrees
- Another common tree traversal is the (left-to-right) postorder
o Visits the root value after it has visited every value in its subtrees
- Implementation of _str_indented in a postorder fashion:
def _str_indented_postorder(self, depth: int = 0) -> str:
“””Return an indented *postorder* string representation of this tree.
s += ‘ ‘ * depth + f’{self._root}\n’
return s
13.3 Mutating Trees
Value-Based Deletion
class Tree:
    def remove(self, item: Any) -> bool:
        """Delete *one* occurrence of the given item from this tree.

        Return whether the given item was deleted.
        """
        ...
        # If the loop doesn't return early, the item was not deleted from
        # any of the subtrees. In this case, the item does not appear
        # in this tree.
        return False
o We can move the self._root == item check into an elif condition
def remove(self, item: Any) -> bool:
    """..."""
    if self.is_empty():
        return False
    elif self._root == item:
        self._delete_root()
        return True
    else:
        for subtree in self._subtrees:
            deleted = subtree.remove(item)
            if deleted:
                return True
        return False
def _delete_root(self) -> None:
    """Remove the root item of this tree.

    Preconditions:
    - not self.is_empty()
    """
    if self._subtrees == []:
        self._root = None
    else:
        # Get the last subtree in this tree.
        chosen_subtree = self._subtrees.pop()
        self._root = chosen_subtree._root
        self._subtrees.extend(chosen_subtree._subtrees)
o This implementation picks the rightmost subtree, and “promotes” its root and
subtrees by moving them up a level in the tree
o However, this implementation changes some of the structure of the original
tree just to delete a single element
The Problem of Empty Subtrees
- With the above implementation, deleting the last item in a subtree leaves an empty
tree behind
o The parent will then contain an empty tree in its subtrees list
- Fixing the problem
o If we detect that we deleted a leaf, we remove the now-empty subtree from its
parent’s subtree list
else:
    for subtree in self._subtrees:
        deleted = subtree.remove(item)
        if deleted and subtree.is_empty():
            # The item was deleted and the subtree is now empty.
            # We should remove the subtree from the list of subtrees.
            self._subtrees.remove(subtree)
            return True
        elif deleted:
            # The item was deleted, and the subtree is not empty.
            return True
    return False
o In general it is extremely dangerous to remove an object from a list as we iterate
through it
It interferes with the iteration of the loop that is underway
This is safe here only because, as soon as we remove the subtree, we stop the
method by returning
Analysing Tree.__len__
def __len__(self) -> int:
    """..."""
    if self.is_empty():
        return 0
    else:
        size_so_far = 1
        for subtree in self._subtrees:
            size_so_far += subtree.__len__()
        return size_so_far
- Let 𝑛 be the size of self, i.e. the number of items in the tree
- We can ignore “small” values of 𝑛, so we assume 𝑛 > 0, and the else branch executes
- We are making a call to subtree.__len__, but we are in the middle of trying to analyse
the running time of __len__ itself, so we don’t yet know the running time of that call
- We first identify all recursive calls that are made when we call __len__ on this tree
- Shorthand: “(A)” means “the tree rooted at A”
- When we make our initial call on the whole tree (rooted at A)
o Initial call (A) makes three recursive calls on each of its subtrees (B), (C), and (D)
The recursive call on (B) makes two recursive calls on each of its subtrees,
(E) and (F)
• Each of (E) and (F) is a leaf, so no more recursive calls are made
from them
The recursive call on (C) makes two recursive calls, on (G) and on (H)
• The (G) is a leaf, so no more recursive calls happen for that tree
• The recursive call on (H) makes one more recursive call on (J)
o The (J) is a leaf, so no more recursive calls happen
The recursive call on (D) makes one recursive call on (I)
• The (I) is a leaf, so no more recursive calls happen
- __len__’s recursive step always makes a recursive call on every subtree, so in total there
is one __len__ call per item in the tree
o The structure of the recursive calls exactly follows the structure of the tree
Analysing the Non-Recursive Part of Each Call
- We can’t just count the number of recursive calls, since each call might perform other
operations as well
- In addition to making recursive calls, there are some constant time operations, and a for
loop that adds to an accumulator
- We can count the number of steps performed by a single recursive call, and add those
up across all the different recursive calls that are made
- For the recursive step, we’ll count the number of steps taken, assuming each recursive
call takes constant time
- For Tree.__len__, the total number of steps taken is:
o 1 step for the assignment statement
o 𝑘 steps for the loop, where 𝑘 is the number of subtrees in the tree (which
determines the number of loop iterations)
We are counting the loop body as just 1 step
o 1 step for the return statement
- The above gives a total of 𝑘 + 2 steps for the non-recursive cost of the else branch
- To find the total running time, we need to sum up across all recursive calls
- Challenge: 𝑘 changes for each recursive call
o i.e. (A) has 𝑘 = 3, (B) has 𝑘 = 2, (E) has 𝑘 = 0
- We can write these costs for every recursive call in our example tree: each node is
labelled with its cost k + 2, where k is its number of children (e.g. A: 3 + 2, B: 2 + 2,
E: 0 + 2)
o These numbers together give the total number of steps taken by our initial
call to Tree.__len__ across all the recursive calls that are made
o The sum of all the k terms is 9, which is one less than the total
number of items
o The sum of the constant terms is 20: the constant number of steps (2)
multiplied by the number of recursive calls, which is equal to the number of
items (10)
Generalize
- Let 𝑛 ∈ ℤ+ and suppose we have a tree of size 𝑛. We know that there will be 𝑛 recursive
calls made.
o The “constant time” parts will take 2𝑛 steps across all 𝑛 recursive calls
o The total number of steps taken by the for loop across all recursive calls is equal
to the sum of all of the numbers of children of each node, which is 𝑛 − 1
- This gives us a total running time of 2𝑛 + (𝑛 − 1) = 3𝑛 − 1, which is Θ(𝑛)
Looking Back
- The above technique applies to any tree method of the form:
class Tree:
    def method(self) -> ...:
        if self.is_empty():
            ...
        else:
            ...
            for subtree in self._subtrees:
                ... subtree.method() ...
            ...
- If a method can return early, we need to analyse its worst-case running time
13.5 Introduction to Binary Search Trees
class BinarySearchTree:
    """A binary search tree.

    Representation Invariants:
    - (self._root is None) == (self._left is None)
    - (self._root is None) == (self._right is None)
    - (BST Property) if self._root is not None, then all items in self._left are <=
      self._root, and all items in self._right are >= self._root
    """
    # Private Instance Attributes:
    # - _root: The item stored at the root of this tree, or None if this tree is
    #   empty.
    # - _left: The left subtree, or None if this tree is empty.
    # - _right: The right subtree, or None if this tree is empty.
    _root: Optional[Any]
    _left: Optional[BinarySearchTree]
    _right: Optional[BinarySearchTree]
o Since the left/right ordering matters, we use explicit attributes to refer to the left
and right subtrees
- An empty tree has a _root value of None, and its _left and _right attributes are None as
well
- An empty tree is the only case where any of the attributes can be None
- The _left and _right attributes might refer to empty binary search trees, but this is
different from them being None
- The initializer and is_empty methods are based on the corresponding methods for the
Tree class
class BinarySearchTree:
    def __init__(self, root: Optional[Any]) -> None:
        """Initialize a new BST containing only the given root value."""
Insertion
- The simplest approach is to put the new item at the “bottom” of the tree
- We can implement recursively:
o If the BST is empty, make the new item the root of the tree
o Otherwise, the item should be inserted into either the left subtree or the right
subtree, while maintaining the binary search tree property
- If we want to insert 15 into the below tree, there is only one possible leaf position to put
it: to the left of the 17
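- A sketch of this recursive insertion (the class layout follows the attributes above; the values 30, 17, 90, and 15 mirror the surrounding examples):

```python
from __future__ import annotations
from typing import Any, Optional

class BinarySearchTree:
    """A minimal sketch of the BST class from this section."""
    _root: Optional[Any]
    _left: Optional[BinarySearchTree]
    _right: Optional[BinarySearchTree]

    def __init__(self, root: Optional[Any]) -> None:
        if root is None:
            # An empty tree: all three attributes are None.
            self._root = None
            self._left = None
            self._right = None
        else:
            self._root = root
            self._left = BinarySearchTree(None)
            self._right = BinarySearchTree(None)

    def is_empty(self) -> bool:
        return self._root is None

    def insert(self, item: Any) -> None:
        """Insert item into this BST, keeping the BST property."""
        if self.is_empty():
            # Make the new item the root of this (sub)tree, with two
            # new empty subtrees below it.
            self._root = item
            self._left = BinarySearchTree(None)
            self._right = BinarySearchTree(None)
        elif item <= self._root:
            self._left.insert(item)
        else:
            self._right.insert(item)

bst = BinarySearchTree(30)
for x in [17, 90, 15]:
    bst.insert(x)
```

Inserting 15 last ends up to the left of the 17, as described above.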
Deletion
- Given an item to delete, we take the same approach as __contains__ to search for the
item
o If we find it, it will be at the root of a subtree, where we delete it
class BinarySearchTree:
    def remove(self, item: Any) -> None:
        """Remove *one* occurrence of <item> from this BST.

        Preconditions:
        - not self.is_empty()
        """
        if self._left.is_empty() and self._right.is_empty():
            self._root = None
            self._left = None
            self._right = None
        elif self._left.is_empty():
            # "Promote" the right subtree
            self._root, self._left, self._right = \
                self._right._root, self._right._left, self._right._right
        elif self._right.is_empty():
            # "Promote" the left subtree
            self._root, self._left, self._right = \
                self._left._root, self._left._left, self._left._right
        else:
            self._root = self._left._extract_max()
- After deleting the item, we set self._root = None only if the tree consists of just the root
(with no subtrees)
o If the BST has at least 1 other item, doing so would violate our representation
invariant
- When at least one of the subtrees is empty, but the other one isn’t, we can “promote”
the other subtree up
- For the case where both subtrees are non-empty, we can fill the “hole” at the root by
replacing the root item with another value from the tree (and then removing that other
value from where it was)
o For example, if we want to remove the root value 30 from the below tree, the
two values we could replace it with are 17 and 90
o Our implementation above extracts the maximum value from the left subtree,
which requires a helper function
class BinarySearchTree:
    def _extract_max(self) -> Any:
        """Remove and return the maximum item stored in this tree.

        Preconditions:
        - not self.is_empty()
        """
        if self._right.is_empty():
            max_item = self._root
            # Like remove_root, "promote" the left subtree
            self._root, self._left, self._right = \
                self._left._root, self._left._left, self._left._right
            return max_item
        else:
            return self._right._extract_max()
o The base case here handles two scenarios:
self has a left (but no right) child
self has no children
- The number of recursive calls differs depending on the values stored in the tree and the
values being searched for
- We will focus on the worst-case running time
- The total number of recursive calls is at most the height of the BST plus 1
o The longest possible “search path” for an item is equal to the height of the BST,
plus 1 for recursing into an empty subtree
o ℎ + 1 steps, where ℎ is the height of the BST
- The worst-case running time for BinarySearchTree.__contains__ is Θ(ℎ)
o The same analysis holds for insert and remove as well
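- The __contains__ method analysed above is not written out in these notes; a self-contained sketch of how it (and insert) might look under the attribute layout above:

```python
from __future__ import annotations
from typing import Any, Optional

class BinarySearchTree:
    """A minimal sketch of the BST class from this section."""

    def __init__(self, root: Optional[Any]) -> None:
        self._root = root
        self._left = None if root is None else BinarySearchTree(None)
        self._right = None if root is None else BinarySearchTree(None)

    def is_empty(self) -> bool:
        return self._root is None

    def insert(self, item: Any) -> None:
        if self.is_empty():
            self._root = item
            self._left = BinarySearchTree(None)
            self._right = BinarySearchTree(None)
        elif item <= self._root:
            self._left.insert(item)
        else:
            self._right.insert(item)

    def __contains__(self, item: Any) -> bool:
        """Return whether item is in this BST.

        Each comparison rules out one entire subtree, so at most h + 1
        recursive calls are made, where h is the height of the tree.
        """
        if self.is_empty():
            return False
        elif item == self._root:
            return True
        elif item < self._root:
            return item in self._left
        else:
            return item in self._right

bst = BinarySearchTree(30)
for x in [17, 90, 15, 60]:
    bst.insert(x)
```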
Looking Ahead
- The above relies on the assumption that the height of a binary search tree is always
roughly log_2(n)
o The insertion and deletion algorithms we have studied do not guarantee this
property holds when we mutate a binary search tree
Ex. when we insert items into a binary search tree in sorted order, every new
item ends up in a right subtree, producing a tree whose height equals its size
Intro
- The first step that the Python interpreter takes when given a Python file to run is to
parse file’s contents and create a new representation of the program code called an
Abstract Syntax Tree (AST)
    The returned value should be the result of how this expression would be
    evaluated by the Python interpreter.
    """
    raise NotImplementedError
Instance Attributes:
- n: the value of the literal
"""
n: Union[int, float]
The returned value should be the result of how this expression would be
evaluated by the Python interpreter.
Instance Attributes:
- left: the left operand
- op: the name of the operator
- right: the right operand
Representation Invariants:
- self.op in {'+', '*'}
"""
left: Expr
op: str
right: Expr
def __init__(self, left: Expr, op: str, right: Expr) -> None:
    """Initialize a new binary operation expression.

    Preconditions:
    - op in {'+', '*'}
    """
    self.left = left
    self.op = op
    self.right = right
- The BinOp class is a binary tree
o Its root value is the operator name, and its left and right subtrees represent the
two operand subexpressions
- Ex. 3 + 5.5 can be represented as BinOp(Num(3), '+', Num(5.5))
- The left and right attributes of BinOp data type are Exprs (not Nums)
o This makes this data type recursive, and allows it to represent nested arithmetic
operations
- Ex. ((3 + 5.5) * (0.5 + (15.2 * -13.3))) can be represented as
  BinOp(
      BinOp(Num(3), '+', Num(5.5)),
      '*',
      BinOp(
          Num(0.5),
          '+',
          BinOp(Num(15.2), '*', Num(-13.3))))
- To evaluate a binary operation, we first evaluate its left and right operands, and then
combine them using the specified arithmetic operator
class BinOp(Expr):
    def evaluate(self) -> Any:
        """Return the *value* of this expression.

        The returned value should be the result of how this expression would be
        evaluated by the Python interpreter.
        """
        left_val = self.left.evaluate()
        right_val = self.right.evaluate()

        if self.op == '+':
            return left_val + right_val
        elif self.op == '*':
            return left_val * right_val
        else:
            # We shouldn't reach this branch because of our representation
            # invariant
            raise ValueError(f'Invalid operator {self.op}')
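Putting the pieces together, here is a condensed, self-contained version of the Num/BinOp classes above (the constructors are written out explicitly; the evaluation logic follows the notes):

```python
from __future__ import annotations
from typing import Any, Union


class Expr:
    """An abstract class representing a Python expression."""
    def evaluate(self) -> Any:
        raise NotImplementedError


class Num(Expr):
    """A numeric literal."""
    def __init__(self, n: Union[int, float]) -> None:
        self.n = n

    def evaluate(self) -> Any:
        # A literal evaluates to its own value
        return self.n


class BinOp(Expr):
    """An arithmetic binary operation on two subexpressions."""
    def __init__(self, left: Expr, op: str, right: Expr) -> None:
        self.left = left
        self.op = op
        self.right = right

    def evaluate(self) -> Any:
        # Evaluate both operand subtrees first, then combine the results
        left_val = self.left.evaluate()
        right_val = self.right.evaluate()
        if self.op == '+':
            return left_val + right_val
        elif self.op == '*':
            return left_val * right_val
        else:
            raise ValueError(f'Invalid operator {self.op}')


print(BinOp(Num(3), '+', Num(5.5)).evaluate())  # 8.5
```

Evaluation works bottom-up: each Num leaf returns its value, and each BinOp combines its children's results.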
class Name(Expr):
    """A variable expression.

    Instance Attributes:
      - id: The variable name.
    """
    id: str
class Statement:
    """An abstract class representing a Python statement."""

    def evaluate(self, env: dict[str, Any]) -> Any:
        """Evaluate this statement with the given environment.

        This should have the same effect as evaluating the statement by the real
        Python interpreter.
        """
        raise NotImplementedError


class Expr(Statement):
    ...
class Assign(Statement):
    """An assignment statement.

    Instance Attributes:
      - target: the variable name on the left-hand side of the equals sign
      - value: the expression on the right-hand side of the equals sign
    """
    target: str
    value: Expr
class Print(Statement):
    """A statement representing a call to the `print` function.

    Instance Attributes:
      - argument: The argument expression to the `print` function.
    """
    argument: Expr

    def evaluate(self, env: dict[str, Any]) -> None:
        """Evaluate this statement.

        This evaluates the argument of the print call, and then actually prints it.
        Note that it doesn't return anything, since `print` doesn't return anything.
        """
        print(self.argument.evaluate(env))
class Module:
    """A class representing a full Python program.

    Instance Attributes:
      - body: A sequence of statements.
    """
    body: list[Statement]
Evaluating Modules
- To evaluate a module, we do two things:
o 1. Initialize an empty dictionary to represent the environment (starting with no
variable bindings)
o 2. Iterate over each statement of the module body and evaluate it
class Module:
    def evaluate(self) -> None:
        """Evaluate this module."""
        env = {}  # Start with no variable bindings
        for statement in self.body:
            statement.evaluate(env)
class If(Statement):
    """An if statement.

    Instance Attributes:
      - test: The condition expression of this if statement.
      - body: A sequence of statements to evaluate if the condition is True.
      - orelse: A sequence of statements to evaluate if the condition is False.
        (This would be empty in the case that there is no `else` block.)
    """
    test: Expr
    body: list[Statement]
    orelse: list[Statement]
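One plausible If.evaluate, sketched here with minimal stand-in classes so it can run on its own (Const and SetVar are hypothetical stand-ins for illustration, not classes from the notes):

```python
from typing import Any


class Statement:
    """An abstract class representing a Python statement."""
    def evaluate(self, env: dict[str, Any]) -> Any:
        raise NotImplementedError


class Const(Statement):
    """Hypothetical stand-in for an expression with a fixed value."""
    def __init__(self, value: Any) -> None:
        self.value = value

    def evaluate(self, env: dict[str, Any]) -> Any:
        return self.value


class SetVar(Statement):
    """Hypothetical stand-in for an assignment statement."""
    def __init__(self, target: str, value: Statement) -> None:
        self.target = target
        self.value = value

    def evaluate(self, env: dict[str, Any]) -> None:
        env[self.target] = self.value.evaluate(env)


class If(Statement):
    """An if statement, with the attribute structure from the notes."""
    def __init__(self, test: Statement, body: list[Statement],
                 orelse: list[Statement]) -> None:
        self.test = test
        self.body = body
        self.orelse = orelse

    def evaluate(self, env: dict[str, Any]) -> None:
        # Evaluate the condition, then run exactly one branch's statements
        if self.test.evaluate(env):
            for statement in self.body:
                statement.evaluate(env)
        else:
            for statement in self.orelse:
                statement.evaluate(env)


env = {}
stmt = If(Const(False), [SetVar('x', Const(1))], [SetVar('x', Const(2))])
stmt.evaluate(env)
print(env)  # {'x': 2}
```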
- We can represent a for loop over a range of numbers
class ForRange(Statement):
    """A for loop that loops over a range of numbers.

        for <target> in range(<start>, <stop>):
            <body>

    Instance Attributes:
      - target: The loop variable.
      - start: The start of the range (inclusive).
      - stop: The end of the range (this is *exclusive*, so <stop> is not included).
      - body: The statements to execute in the loop body.
    """
    target: str
    start: Expr
    stop: Expr
    body: list[Statement]
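A possible ForRange.evaluate, again sketched with minimal stand-ins so it runs on its own (Const and Accumulate are hypothetical helpers for illustration):

```python
from typing import Any


class Statement:
    """An abstract class representing a Python statement."""
    def evaluate(self, env: dict[str, Any]) -> Any:
        raise NotImplementedError


class Const(Statement):
    """Hypothetical stand-in for an expression with a fixed value."""
    def __init__(self, value: Any) -> None:
        self.value = value

    def evaluate(self, env: dict[str, Any]) -> Any:
        return self.value


class Accumulate(Statement):
    """Hypothetical statement that adds the loop variable into env['total']."""
    def __init__(self, target: str) -> None:
        self.target = target

    def evaluate(self, env: dict[str, Any]) -> None:
        env['total'] = env.get('total', 0) + env[self.target]


class ForRange(Statement):
    """A for loop over a range of numbers, as described in the notes."""
    def __init__(self, target: str, start: Statement, stop: Statement,
                 body: list[Statement]) -> None:
        self.target = target
        self.start = start
        self.stop = stop
        self.body = body

    def evaluate(self, env: dict[str, Any]) -> None:
        # Evaluate the range bounds once, then bind the loop variable
        # in env for each iteration before running the body
        start_val = self.start.evaluate(env)
        stop_val = self.stop.evaluate(env)
        for i in range(start_val, stop_val):
            env[self.target] = i
            for statement in self.body:
                statement.evaluate(env)


env = {}
# for i in range(1, 4): total += i
ForRange('i', Const(1), Const(4), [Accumulate('i')]).evaluate(env)
print(env['total'])  # 6
```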
Code Analysis
- Abstract syntax trees can be used to analyse a program’s code without running it
o Known as static program analysis
- Examples
o Check for common errors
PyCharm, PythonTA, pylint
o Identify unused or redundant code
o Check the types of expressions and definitions
mypy
o Check for common security and efficiency problems
Transpilers
- We can transform an abstract syntax tree in one language to an equivalent abstract
syntax tree in another
- Allows us to develop tools to translate code from one programming language to
another, or between different versions of the same programming language
o E.g. Different web browsers support different versions of the JavaScript
language, and often lag behind the latest JavaScript version
Tools like Babel translate code written in newer versions of JavaScript into older versions, allowing programmers to write code using the latest JavaScript features
Graphs
- 𝐷𝑒𝑓𝑖𝑛𝑖𝑡𝑖𝑜𝑛. A graph is a pair of sets (𝑉, 𝐸), which are defined as follows:
o 𝑉 is a set of objects. Each element of 𝑉 is called a vertex of the graph, and 𝑉 itself is called the set of vertices of the graph
o 𝐸 is a set of pairs of objects from 𝑉, where each pair {𝑣1 , 𝑣2 } is a set consisting of
2 distinct vertices (i.e. 𝑣1 , 𝑣2 ∈ 𝑉 and 𝑣1 ≠ 𝑣2) and is called an edge of the graph
o Order does not matter in the pairs, and so {𝑣1, 𝑣2 } and {𝑣2 , 𝑣1 } represent the
same edge
- The conventional notation to introduce a graph is to write 𝐺 = (𝑉, 𝐸), where 𝐺 is the
graph itself, 𝑉 is its vertex set, and 𝐸 is its edge set
- The set of vertices of a graph represents a collection of objects
- The set of edges of a graph represent the relationships between those objects
- Ex. to describe Facebook:
o Each Facebook user is a vertex
o Each friendship between two Facebook users is an edge between the
corresponding vertices
- We often draw graphs using:
o Dots to represent vertices
o Line segments to represent edges
- Ex.
o 7 vertices
o 11 edges
- Graphs are a generalization of trees
o Rather than enforcing a strict hierarchy on the data, graphs support any vertex
being joined by an edge to any other vertex
- 𝐷𝑒𝑓𝑖𝑛𝑖𝑡𝑖𝑜𝑛. Let 𝐺 = (𝑉, 𝐸), and let 𝑣1 , 𝑣2 ∈ 𝑉. We say that 𝑣1 and 𝑣2 are adjacent if
and only if there exists an edge between them, i.e. {𝑣1 , 𝑣2 } ∈ 𝐸
o Equivalently, we can also say that 𝑣1 and 𝑣2 are neighbours
- 𝐷𝑒𝑓𝑖𝑛𝑖𝑡𝑖𝑜𝑛. Let 𝐺 = (𝑉, 𝐸) and let 𝑣 ∈ 𝑉. We say that the degree of 𝑣, denoted 𝑑(𝑣), is
its number of neighbours, or equivalently, how many edges 𝑣 is part of
Example
- A and B are not adjacent
- A and B are connected (E.g. A, F, G, B)
- The length of the shortest path between vertices B and F is 2 (i.e. B, G, F)
Prove that this example graph is not connected
- 𝑇𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑖𝑜𝑛. Let 𝐺 = (𝑉, 𝐸) refer to this graph. 𝐺 is not connected means:
o ¬(𝐺 is connected)
o ¬(∀𝑢, 𝑣 ∈ 𝑉, 𝑢 and 𝑣 are connected)
o ∃𝑢, 𝑣 ∈ 𝑉, 𝑢 and 𝑣 are not connected
o ∃𝑢, 𝑣 ∈ 𝑉, there is no path between 𝑢 and 𝑣
- 𝑃𝑟𝑜𝑜𝑓. Let 𝐺 = (𝑉, 𝐸) be the above graph. Let 𝑢 and 𝑣 be the vertices labelled 𝐸 and 𝐵,
respectively. We will show that there does not exist a path between 𝑢 and 𝑣.
Suppose for a contradiction that there exists a path 𝑣0 , 𝑣1 , … , 𝑣𝑘 between 𝑢 and 𝑣,
where 𝑣0 = 𝐸. Since 𝑣0 and 𝑣1 must be adjacent, and 𝐶 is the only vertex adjacent to 𝐸,
we know that 𝑣1 = 𝐶. Since we know 𝑣𝑘 = 𝐵, the path cannot be over yet; i.e. 𝑘 ≥ 2.
By the definition of path, we know that 𝑣2 must be adjacent to 𝐶, and must be distinct
from 𝐸 and 𝐶. But the only vertex that’s adjacent to 𝐶 is 𝐸, and so 𝑣2 cannot exist,
which gives us our contradiction.
Q.E.D.
A Proof by Contradiction
- Ex. Prove that for all graphs 𝐺 = (𝑉, 𝐸), if |𝑉| ≥ 2, then there exist two vertices in 𝑉
that have the same degree.
o 𝑇𝑟𝑎𝑛𝑠𝑙𝑎𝑡𝑖𝑜𝑛. ∀𝐺 = (𝑉, 𝐸), |𝑉| ≥ 2 ⟹ (∃𝑣1 , 𝑣2 ∈ 𝑉, 𝑑(𝑣1 ) = 𝑑(𝑣2 ))
o 𝑃𝑟𝑜𝑜𝑓. Assume for a contradiction that this statement is False, i.e. that there
exists a graph 𝐺 = (𝑉, 𝐸) such that |𝑉| ≥ 2 and all of the vertices in 𝑉 have a
different degree. We’ll derive a contradiction from this. Let 𝑛 = |𝑉|.
Let 𝑣 be an arbitrary vertex in 𝑉. We know that 𝑑(𝑣) ≥ 0, and because there are
𝑛 − 1 other vertices not equal to 𝑣 that could be potential neighbours of 𝑣,
𝑑(𝑣) ≤ 𝑛 − 1. So every vertex in 𝑉 has degree between 0 and 𝑛 − 1, inclusive.
Since there are 𝑛 different vertices in 𝑉 and each has a different degree, this
means that every number in {0, 1, … , 𝑛 − 1} must be the degree of some vertex
(note that this set has size 𝑛). In particular, there exists a vertex 𝑣1 ∈ 𝑉 such that
𝑑(𝑣1 ) = 0, and another vertex 𝑣2 ∈ 𝑉 such that 𝑑(𝑣2 ) = 𝑛 − 1.
Then on the one hand, since 𝑑(𝑣1 ) = 0, it is not adjacent to any other vertex,
and so {𝑣1 , 𝑣2 } ∉ 𝐸.
But on the other hand, since 𝑑(𝑣2 ) = 𝑛 − 1, it is adjacent to every other vertex,
and so {𝑣1 , 𝑣2 } ∈ 𝐸.
So both {𝑣1 , 𝑣2 } ∉ 𝐸 and {𝑣1 , 𝑣2 } ∈ 𝐸 are True, which gives us our contradiction.
Q.E.D.
class _Vertex:
    """A vertex in a graph.

    Instance Attributes:
      - item: The data stored in this vertex.
      - neighbours: The vertices that are adjacent to this vertex.
    """
    item: Any
    neighbours: set[_Vertex]
class Graph:
    def add_vertex(self, item: Any) -> None:
        """Add a vertex with the given item to this graph.

        Preconditions:
          - item not in self._vertices
        """
        self._vertices[item] = _Vertex(item, set())
    def add_edge(self, item1: Any, item2: Any) -> None:
        """Add an edge between the two vertices with the given items.

        Preconditions:
          - item1 != item2
        """
        if item1 in self._vertices and item2 in self._vertices:
            v1 = self._vertices[item1]
            v2 = self._vertices[item2]

            # Add the new edge, i.e., make the two vertices neighbours
            v1.neighbours.add(v2)
            v2.neighbours.add(v1)
Checking Adjacency
- Two common questions:
o “Are these two items adjacent?”
o “What items are adjacent to this item?”
class Graph:
    def adjacent(self, item1: Any, item2: Any) -> bool:
        """Return whether item1 and item2 are adjacent vertices in this graph."""
        if item1 in self._vertices:
            v1 = self._vertices[item1]
            return any(v2.item == item2 for v2 in v1.neighbours)
        else:
            return False

    def get_neighbours(self, item: Any) -> set:
        """Return a set of the neighbours of the given item.

        Note that the *items* are returned, not the _Vertex objects themselves.
        """
        if item in self._vertices:
            v = self._vertices[item]
            return {u.item for u in v.neighbours}
        else:
            raise ValueError
Intro
- Our goal is to implement a generalization of the adjacent method that checks whether
two vertices are connected
class Graph:
    def connected(self, item1: Any, item2: Any) -> bool:
        """Return whether item1 and item2 are connected vertices in this graph.

        Return False if item1 or item2 do not appear as vertices in this graph.
        """

class _Vertex:
    def check_connected(self, target_item: Any) -> bool:
        """Return whether this vertex is connected to a vertex corresponding to
        the target_item.
        """
- Two vertices are connected when there exists a path between them
- Recursive definition of connectedness: given two vertices 𝑣1 and 𝑣2, they are connected
when:
o 𝑣1 = 𝑣2, or
o there exists a neighbour 𝑢 of 𝑣1 such that 𝑢 and 𝑣2 are connected
- This recursion is not structural because it doesn’t break down the data type into a
smaller instance with the same structure
- Implementation of this definition
class _Vertex:
    def check_connected(self, target_item: Any) -> bool:
        """..."""
        if self.item == target_item:
            # Our base case: the target_item is the current vertex
            return True
        else:
            for u in self.neighbours:
                if u.check_connected(target_item):
                    return True
            return False
- This implementation does not work: when the graph contains a cycle, the recursive calls can revisit the same vertices forever
Fixing _Vertex.check_connected
- If 𝑣1 and 𝑣2 are connected, then we should be able to find a path between a neighbour
𝑢 and 𝑣2 that doesn’t use 𝑣1, and then add 𝑣1 to the start of that path
- We can modify the definition: given two vertices 𝑣1 and 𝑣2, they are connected when:
o 𝑣1 = 𝑣2, or
o there exists a neighbour 𝑢 of 𝑣1 such that 𝑢 and 𝑣2 are connected by a path that
does not use 𝑣1
- We should be able to remove 𝑣1 from the graph and still find a path between 𝑢 and 𝑣2
class _Vertex:
    def check_connected(self, target_item: Any, visited: set[_Vertex]) -> bool:
        """Return whether this vertex is connected to a vertex corresponding to
        the target_item, WITHOUT using any of the vertices in visited.

        Preconditions:
          - self not in visited
        """
        if self.item == target_item:
            # Our base case: the target_item is the current vertex
            return True
        else:
            new_visited = visited.union({self})
            # Add self to the set of visited vertices
            for u in self.neighbours:
                if u not in new_visited:
                    # Only recurse on vertices that haven't been visited
                    if u.check_connected(target_item, new_visited):
                        return True
            return False
- With this version, we’ve eliminated our infinite recursion error
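With the visited parameter in place, Graph.connected can delegate to the helper by starting the search with an empty visited set. A condensed, self-contained sketch (the constructors are filled in here for illustration):

```python
from __future__ import annotations
from typing import Any


class _Vertex:
    """A vertex in a graph."""
    def __init__(self, item: Any, neighbours: set[_Vertex]) -> None:
        self.item = item
        self.neighbours = neighbours

    def check_connected(self, target_item: Any, visited: set[_Vertex]) -> bool:
        """Return whether this vertex is connected to a vertex corresponding
        to target_item, without using any vertex in visited."""
        if self.item == target_item:
            return True
        else:
            new_visited = visited.union({self})
            for u in self.neighbours:
                if u not in new_visited:
                    if u.check_connected(target_item, new_visited):
                        return True
            return False


class Graph:
    """A graph, mapping each item to its _Vertex object."""
    def __init__(self) -> None:
        self._vertices = {}

    def add_vertex(self, item: Any) -> None:
        self._vertices[item] = _Vertex(item, set())

    def add_edge(self, item1: Any, item2: Any) -> None:
        v1, v2 = self._vertices[item1], self._vertices[item2]
        v1.neighbours.add(v2)
        v2.neighbours.add(v1)

    def connected(self, item1: Any, item2: Any) -> bool:
        """Return whether item1 and item2 are connected vertices in this graph.

        Return False if item1 or item2 do not appear as vertices in this graph.
        """
        if item1 in self._vertices and item2 in self._vertices:
            # Start the search from item1's vertex with an empty visited set
            return self._vertices[item1].check_connected(item2, set())
        else:
            return False


g = Graph()
for item in ['A', 'B', 'C']:
    g.add_vertex(item)
g.add_edge('A', 'B')
print(g.connected('A', 'B'))  # True
print(g.connected('A', 'C'))  # False
```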
class Graph:
    def spanning_tree(self) -> list[set]:
        """Return a subset of the edges of this graph that form a spanning tree.

        The edges are returned as a list of sets, where each set contains the two
        ITEMS corresponding to an edge. Each returned edge is in this graph (i.e.,
        this function does not create new edges).

        Preconditions:
          - this graph is connected
        """
class _Vertex:
    def print_all_connected(self, visited: set[_Vertex], d: int = 0) -> None:
        """Print all items that this vertex is connected to, WITHOUT using any
        of the vertices in visited. Indent each item to its recursion depth d.

        Preconditions:
          - self not in visited
          - d >= 0
        """
        print(' ' * d + str(self.item))
        visited.add(self)
        for u in self.neighbours:
            if u not in visited:
                u.print_all_connected(visited, d + 1)
o By printing each item with indentation proportional to the recursion depth, we can see the recursive call structure that this method traces
o The recursive call structure forms a tree that spans all of the vertices that the
starting vertex is connected to
class _Vertex:
    def spanning_tree(self, visited: set[_Vertex]) -> list[set]:
        """Return a list of edges that form a spanning tree of all vertices that
        are connected to this vertex, WITHOUT using any of the vertices in visited.

        The edges are returned as a list of sets, where each set contains the two
        ITEMS corresponding to an edge.

        Preconditions:
          - self not in visited
        """
        edges_so_far = []
        visited.add(self)
        for u in self.neighbours:
            if u not in visited:
                edges_so_far.append({self.item, u.item})
                edges_so_far.extend(u.spanning_tree(visited))
        return edges_so_far
- Use the above method as a helper to implement our original Graph.spanning_tree
method
o We can choose any starting vertex we want because we assume that our graph is
connected
class Graph:
    def spanning_tree(self) -> list[set]:
        """..."""
        # Pick a vertex to start
        all_vertices = list(self._vertices.values())
        start_vertex = all_vertices[0]

        # Use our _Vertex helper to build the spanning tree edges
        return start_vertex.spanning_tree(set())
Binary Search
- If we have a sorted list and want to search for an item, we can take the middle of the list
and compare with the item, determine which one is greater, then cut down the list by
half and repeat
o This algorithm is known as binary search
o At every comparison, the range of elements to check is halved
- At the start of our search, the full list is being searched. The binary search algorithm
uses a while loop to decrease the size of the range
o 1. First, we calculate the midpoint m of the current range
o 2. Then, we compare lst[m] against item
If item == lst[m], we can return True right away
If item < lst[m], we know that all indexes >= m contain elements larger than item, and so update e to reflect this
If item > lst[m], we know that all indexes <= m contain elements less than
item, and so update b to reflect this
- The loop should stop when lst[b:e] is empty (i.e. when b >= e)
def binary_search(lst: list, item: Any) -> bool:
    """Return whether item is in lst using the binary search algorithm.

    Preconditions:
      - lst is sorted
    """
    b = 0
    e = len(lst)
    while b < e:
        m = (b + e) // 2
        if item == lst[m]:
            return True
        elif item < lst[m]:
            e = m
        else:  # item > lst[m]
            b = m + 1

    # If the loop ends without finding the item, the item is not in the list.
    return False
Loop Invariants
- all(lst[i] < item for i in range(0, b))
- all(lst[i] > item for i in range(e, len(lst)))
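The two invariants can be checked directly inside the loop with assert statements; here is a runnable version of the algorithm with the invariants included:

```python
from typing import Any


def binary_search(lst: list, item: Any) -> bool:
    """Return whether item is in lst using the binary search algorithm.

    Preconditions:
      - lst is sorted in non-decreasing order
    """
    b = 0
    e = len(lst)
    while b < e:
        # Loop invariants from the notes
        assert all(lst[i] < item for i in range(0, b))
        assert all(lst[i] > item for i in range(e, len(lst)))

        m = (b + e) // 2
        if item == lst[m]:
            return True
        elif item < lst[m]:
            e = m
        else:  # item > lst[m]
            b = m + 1
    return False


print(binary_search([1, 3, 5, 7, 9], 7))  # True
print(binary_search([1, 3, 5, 7, 9], 4))  # False
```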
Running Time
- Two loop variables, b and e, that change over time in unpredictable ways
- Focus on the quantity e - b
o e - b initially equals n, the length of the input list
o The loop stops when e - b <= 0
o At each iteration, e - b decreases by at least a factor of 2
This requires a formal proof
- And so binary_search runs for at most 1 + log2 n iterations, with each iteration taking constant time
- Worst case running time is O(log n)
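One way to sanity-check the 1 + log2 n bound empirically is to instrument the loop to count its iterations (this instrumented variant is for illustration only, not part of the notes):

```python
import math
from typing import Any


def binary_search_count(lst: list, item: Any) -> int:
    """Return how many loop iterations binary search performs on this input."""
    b, e = 0, len(lst)
    iterations = 0
    while b < e:
        iterations += 1
        m = (b + e) // 2
        if item == lst[m]:
            break
        elif item < lst[m]:
            e = m
        else:
            b = m + 1
    return iterations


n = 1000
lst = list(range(n))
# The worst case over all successful searches stays within the bound
worst = max(binary_search_count(lst, x) for x in range(n))
print(worst <= 1 + math.log2(n))  # True
```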
16.2 Selection Sort
return sorted_so_far
o Works but mutates the input lst
Can be fixed by making a copy of lst
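The return sorted_so_far line above is the tail end of a first version that repeatedly extracts the minimum from the remaining items. A sketch consistent with that description (the exact loop structure here is an assumption), which sorts correctly but mutates its input:

```python
def selection_sort_v1(lst: list) -> list:
    """Return a sorted list with the same elements as lst.

    Warning: this version *mutates* its input. Pass in a copy (lst.copy())
    to avoid this.
    """
    sorted_so_far = []
    while lst != []:
        # Repeatedly remove the smallest remaining item
        smallest = min(lst)
        lst.remove(smallest)
        sorted_so_far.append(smallest)
    return sorted_so_far


print(selection_sort_v1([3, 1, 2]))  # [1, 2, 3]
```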
Loop Invariants
- We use the variable i to represent the boundary between the sorted and unsorted parts
of the list
assert is_sorted(lst[:i])
- At iteration i, the first i items must be smaller than all other items in the list
assert i == 0 or all(lst[i - 1] < lst[j] for j in range(i, len(lst)))
def selection_sort(lst: list) -> None:
    """Sort the given list using the selection sort algorithm (in place)."""
    for i in range(len(lst)):
        # Find the index of the smallest item in lst[i:] and swap that
        # item with the item at index i
        index_of_smallest = _min_index(lst, i)
        lst[index_of_smallest], lst[i] = lst[i], lst[index_of_smallest]
def _min_index(lst: list, i: int) -> int:
    """Return the index of the smallest item in lst[i:].

    In the case of ties, return the smaller index (i.e. the index that appears first).

    Preconditions:
      - 0 <= i <= len(lst) - 1
    """
    index_of_smallest_so_far = i
    for j in range(i + 1, len(lst)):
        if lst[j] < lst[index_of_smallest_so_far]:
            index_of_smallest_so_far = j
    return index_of_smallest_so_far
def _insert(lst: list, i: int) -> None:
    """Move lst[i] so that lst[:i + 1] is sorted.

    Preconditions:
      - 0 <= i < len(lst)
      - is_sorted(lst[:i])
    """
    # Version 1, using an early return
    for j in range(i, 0, -1):  # This goes from i down to 1
        if lst[j - 1] <= lst[j]:
            return
        else:
            # Swap lst[j - 1] and lst[j]
            lst[j - 1], lst[j] = lst[j], lst[j - 1]
- With the _insert function complete, we can simply call it inside our main insertion_sort
loop
def insertion_sort(lst: list) -> None:
    """Sort the given list using the insertion sort algorithm."""
    for i in range(len(lst)):
        _insert(lst, i)
A Divide-and-Conquer Approach
- The divide-and-conquer approach to algorithm design:
o 1. Given the problem input, split it up into 2 or more smaller subparts with the
same structure
o 2. Recursively run the algorithm on each subpart separately
o 3. Combine the results of each recursive call into a single result, solving the
original problem
- A divide-and-conquer sorting algorithm:
o 1. Given a list to sort, split it up into 2 or more smaller lists
o 2. Recursively run the sorting algorithm on each smaller list separately
o 3. Combine the sorted results of each recursive call into a single sorted list
16.5 Mergesort
Intro
- The mergesort algorithm:
o 1. Given an input list to sort, divide the input into the left half and right half
o 2. Recursively sort each half
o 3. Merge each sorted half together
- The ‘easy’ part is dividing the input into halves
- The ‘hard’ part is merging each sorted half into one final sorted list
Implementing Mergesort
- We will use the non-mutating version
- Base case: ‘when can we not divide the list any further?’
o Occurs when the list has less than 2 elements
- Recursive step: divide the list into two halves, recursively sort each half, and then merge
the sorted halves back together
o Merging the two sorted halves is complicated, and we need a helper
def mergesort(lst: list) -> list:
    """Return a new sorted list with the same elements as lst.

    This is a *non-mutating* version of mergesort; it does not mutate the input list.
    """
    if len(lst) < 2:
        return lst.copy()  # Use the list.copy method to return a new list object
    else:
        # Divide the list into 2 parts, and sort them recursively.
        mid = len(lst) // 2
        left_sorted = mergesort(lst[:mid])
        right_sorted = mergesort(lst[mid:])

        # Merge the two sorted halves into one sorted list
        return _merge(left_sorted, right_sorted)
def _merge(lst1: list, lst2: list) -> list:
    """Return a sorted list with the elements in lst1 and lst2.

    Preconditions:
      - is_sorted(lst1)
      - is_sorted(lst2)
    """
    i1, i2 = 0, 0
    sorted_so_far = []
16.6 Quicksort
- The “divide” step is difficult, the “combine” step is easy
def quicksort(lst: list) -> list:
    """Return a sorted list with the same elements as lst.

    This is a *non-mutating* version of quicksort; it does not mutate the input list.
    """
    if len(lst) < 2:
        return lst.copy()
    else:
        # Divide the list into two parts by picking a pivot and then partitioning
        # the list. In this implementation, we're choosing the first element as
        # the pivot, but we could have made lots of other choices here
        # (e.g. last, random).
        pivot = lst[0]
        smaller, bigger = _partition(lst[1:], pivot)

        # Conquer: sort each part recursively, then combine around the pivot
        return quicksort(smaller) + [pivot] + quicksort(bigger)
def _partition(lst: list, pivot: Any) -> tuple[list, list]:
    """Return a partition of lst with the chosen pivot.

    Return two lists, where the first contains the items in lst
    that are <= pivot, and the second contains the items in lst that are > pivot.
    """
    smaller = []
    bigger = []
    for item in lst:
        if item <= pivot:
            smaller.append(item)
        else:
            bigger.append(item)
    return smaller, bigger
Intro
- Approach for analysing the running time of recursive tree operations:
o 1. Find a pattern for the tree structure of recursive calls made for our recursive
function
o 2. Analyse the non-recursive running time of the recursive calls
o 3. ‘Fill in the tree’ of recursive calls with the non-recursive running time, and
then add up all of these numbers to obtain the total running time
Analysing Mergesort
def mergesort(lst: list) -> list:
    """..."""
    if len(lst) < 2:
        return lst.copy()  # Use the list.copy method to return a new list object
    else:
        # Divide the list into 2 parts, and sort them recursively.
        mid = len(lst) // 2
        left_sorted = mergesort(lst[:mid])
        right_sorted = mergesort(lst[mid:])
        return _merge(left_sorted, right_sorted)
- Key observation: each level in the tree has nodes with the same running time
o At depth d in the tree, there are 2^d nodes, and each node contains the number n/2^d
o When we add up the nodes at each depth, we get 2^d * (n/2^d) = n
o Each level in the tree has the same total running time
- There are log2 n + 1 levels in total, and we get a total running time of n * (log2 n + 1), which is Θ(n log n)
Quicksort
def quicksort(lst: list) -> list:
    """..."""
    if len(lst) < 2:
        return lst.copy()
    else:
        # Partition the list using the first element as the pivot
        pivot = lst[0]
        smaller, bigger = _partition(lst[1:], pivot)
        return quicksort(smaller) + [pivot] + quicksort(bigger)
Intro
- The worst-case running time says very little about the ‘typical’ number in the set, and
nothing about the distribution of numbers in that set.
- The worst-case running time of quicksort is Θ(n^2), but it runs faster than mergesort on most inputs
o The average-case running time of quicksort is Θ(𝑛 log 𝑛)
o When there is only one input per size, or when all inputs of a given size have the
same running time, the average-case running time is the same as the worst-case
running time, as there is no spread
Intro
- The linear search algorithm searches for an item in a list by checking each list element
one at a time
def search(lst: list, x: Any) -> bool:
“””Return whether x is in lst.”””
for item in lst:
if item == x:
return True
return False
- Worst case is Θ(𝑛), where 𝑛 is the length of the list
- We need to precisely define what we mean by “all possible inputs of length 𝑛”
o There could be an infinite number of lists of length 𝑛 to choose from
o We cannot take an average of an infinite set of numbers
- In average-case running-time analysis, we choose a particular set of allowable inputs,
and then compute the average running time for that set
A First Example
- Let 𝑛 ∈ ℕ. We’ll choose our input set to be the set of inputs where:
o lst is always the list [0, 1, 2, …, n – 1]
o x is an integer between 0 and n – 1, inclusive
- i.e. we’ll consider always searching in the same list [0, 1, …, n – 1] and search for one of
the elements in the list
o Use ℐ𝑛 to denote this set
- Average-case running time analysis. For this definition of ℐ_n, we know that |ℐ_n| = n, since there are n different choices for x (and just one choice for lst). From our definition of average-case running time, we have

    Avg_search(n) = (1/n) * Σ_{(lst,x) ∈ ℐ_n} (running time of search(lst, x))
o To calculate the sum, we need to compute the running time of search(lst, x) for
every possible input. Let 𝑥 ∈ {0, 1, 2, … , 𝑛 − 1}. We’ll calculate the running time
in terms of 𝑥.
Since lst = [0, 1, 2, …, n – 1], we know that there will be exactly 𝑥 + 1 loop
iterations until 𝑥 is found in the list, at which point the early return will
occur and the loop will stop. Each loop iteration takes constant time, for
a total of 𝑥 + 1 steps
o So the running time of search(lst, x) equals 𝑥 + 1, and we can use this to
calculate the average-case running time:
    Avg_search(n) = (1/n) * Σ_{(lst,x) ∈ ℐ_n} (running time of search(lst, x))
                  = (1/n) * Σ_{x=0}^{n-1} (x + 1)
                  = (1/n) * Σ_{x'=1}^{n} x'          (x' = x + 1)
                  = (1/n) * (n(n + 1)/2)
                  = (n + 1)/2

o And so the average-case running time of search on this set of inputs is (n + 1)/2, which is Θ(n)
Note: we do not need to compute an upper and lower bound separately, since in this case we have computed an exact average
- For the given set of inputs ℐ𝑛 for each 𝑛, the average-case running time is asymptotically
equal to that of the worst-case
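A quick empirical check of the (n + 1)/2 average: instrument linear search to count its loop iterations and average over every x in ℐ_n (this instrumented helper is for illustration only):

```python
def search_steps(lst: list, x: int) -> int:
    """Return the number of loop iterations linear search performs."""
    steps = 0
    for item in lst:
        steps += 1
        if item == x:
            return steps
    return steps


n = 100
lst = list(range(n))
# Average the iteration counts over all n inputs in the set
average = sum(search_steps(lst, x) for x in range(n)) / n
print(average)  # 50.5, which is (n + 1) / 2
```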
    = Σ_{i=0}^{n} Σ_{lst ∈ S_{n,i}} (i + 1)                  (from Step 2)
    = Σ_{i=0}^{n} |S_{n,i}| * (i + 1)
    = (Σ_{i=0}^{n-1} 2^{n-i-1} * (i + 1)) + (n + 1)
    = (Σ_{i'=1}^{n} 2^{n-i'} * i') + n + 1                   (i' = i + 1)
    = 2^n * (Σ_{i'=1}^{n} (1/2)^{i'} * i') + n + 1
    = 2^n * (-(n + 1)/2^n - 1/2^n + 2) + n + 1               (using the formula for Σ_{i=1}^{n} i * x^i)
    = -(n + 1) - 1 + 2^{n+1} + n + 1
    = 2^{n+1} - 1

Our total running time for all inputs in ℐ_n is 2^{n+1} - 1
o Step 5: putting everything together.

    Avg_search(n) = (1/|ℐ_n|) * Σ_{(lst,x) ∈ ℐ_n} (running time of search(lst, x))
                  = (1/2^n) * (2^{n+1} - 1)
                  = 2 - 1/2^n

o Our average-case running time is 2 - 1/2^n steps, which is Θ(1)