Algorithms, Joachim Favre
If you did not get this document through my GitHub repository, then you may be interested to know
that I have one on which I publish my typed notes. Here is the link (go take a look at the "Releases" section
to find the compiled documents):
https://github.com/JoachimFavre/EPFLNotesIN
Please note that the content does not belong to me. I have made some structural changes, reworded some
parts, and added some personal notes; but the wording and explanations come mainly from the Professor,
and from the book on which they based their course.
I think it is worth mentioning that, in order to get these notes typed up, I took my notes in LaTeX during
the course, and then made some corrections. I do not think typing up handwritten notes afterwards would be doable in terms
of the amount of work. To take notes in LaTeX, I took my inspiration from the following blog post, written
by Gilles Castel. If you want more details, feel free to contact me at my e-mail address, mentioned
hereinabove.
https://castel.dev/post/lecture-notes-1/
I would also like to specify that the words "trivial" and "simple" do not have, in this course, the definition
you find in a dictionary. We are at EPFL; nothing we do is trivial. Something trivial is something that
a random person in the street would be able to do. In our context, understand these words more as
"simpler than the rest". Also, it is okay if you take a while to understand something that is said to be
trivial (especially as I love using this word everywhere hihi).
Since you are reading this, I will give you a little piece of advice. Sleep is a much more powerful tool than you
may imagine, so never neglect a good night of sleep in favour of studying (especially the night before the
exam). I will also take the liberty of paraphrasing my high school philosophy teacher, Ms. Marques: I
hope you will have fun during your exams!
Version 2023–04–12
To Gilles Castel, whose work
inspired this note-taking method.
1 Summary by lecture 11
2 Introduction 15
2.1 Definitions and example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Recall: Asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Sorting algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Insertion sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5 Dynamic programming 43
5.1 Introduction and Fibonacci . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2 Application: Rod cutting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.3 Application: Change-making problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4 Application: Matrix-chain multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.5 Application: Longest common subsequence . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.6 Application: Optimal binary search tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6 Graphs 53
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2 Primitives for traversing and searching a graph . . . . . . . . . . . . . . . . . . . . . . . . 54
6.3 Topological sort of graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.4 Strongly connected components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.5 Flow networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.6 Disjoint sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.7 Minimum spanning tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.8 Single-source shortest paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.9 All-pairs shortest paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7 Probabilistic analysis 85
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.2 Hash functions and tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8 Back to sorting 91
8.1 Quick sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8.2 Sorting lower bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
List of lectures
Lecture 3 : Trees which grow in the wrong direction — Friday 30th September 2022 . . . . 22
Lecture 7 : Queues, stacks and linked list — Friday 14th October 2022 . . . . . . . . . . . . 34
Lecture 8 : More trees growing in the wrong direction — Monday 17th October 2022 . . . 38
Lecture 10 : ”There are 3 types of mathematicians: the ones who can count, and the ones who cannot” (Prof. Kapralov) (what do you mean by “this title is too long”?) — Monday 24th October 2022 . . . 45
Lecture 11 : LCS but not LoL’s one — Friday 28th October 2022 . . . . . . . . . . . . . . . . 48
Lecture 15 : I definitely really like this date — Monday 14th November 2022 . . . . . . . . . 60
Lecture 16 : This date is really nice too, though — Friday 18th November 2022 . . . . . . . 63
Lecture 17 : The algorithm may stop, or may not — Monday 21st November 2022 . . . . . 67
Lecture 18 : Either Levi or Mikasa made this function — Friday 25th November 2022 . . . 70
Lecture 20 : I like the structure of maths courses — Friday 2nd December 2022 . . . . . . . 77
Lecture 24 : Doing fun stuff with matrices (really) — Friday 16th December 2022 . . . . . 80
Lecture 23 : Quantum bogosort is a comparison sort in Θ(n) — Monday 12th December 2022 89
Chapter 1
Summary by lecture
Lecture 3 : Trees which grow in the wrong direction — Friday 30th September 2022 p. 22
• Definition of max-heap.
• Explanation on how to store a heap.
• Explanation of the MaxHeapify procedure.
• Explanation on how to make a heap out of a random array.
• Explanation on how to use heaps to make heapsort.
Lecture 7 : Queues, stacks and linked list — Friday 14th October 2022 p. 34
Lecture 8 : More trees growing in the wrong direction — Monday 17th October 2022 p. 38
Lecture 10 : ”There are 3 types of mathematicians: the ones who can count, and the ones who cannot” (Prof. Kapralov) (what do you mean by “this title is too long”?) — Monday 24th October 2022 p. 45
Lecture 11 : LCS but not LoL’s one — Friday 28th October 2022 p. 48
• Explanation on how to use dynamic programming in order to find the optimal binary search tree given
a sorted sequence and a list of probabilities.
• Definition of directed and undirected graphs, and explanation on how to store them in memory.
• Explanation of BFS.
• Explanation of DFS, and of the depth-first forest and edge classification it implies.
• Explanation of the parenthesis theorem.
• Explanation of the white-path theorem.
• Definition of directed acyclic graphs.
• Proof that a DAG is acyclic if and only if it does not have any back edge.
• Definition of topological sort, and explanation of an algorithm to compute it.
Lecture 15 : I definitely really like this date — Monday 14th November 2022 p. 60
Lecture 16 : This date is really nice too, though — Friday 18th November 2022 p. 63
Lecture 17 : The algorithm may stop, or may not — Monday 21st November 2022 p. 67
Lecture 18 : Either Levi or Mikasa made this function — Friday 25th November 2022 p. 70
• There exists no other Ackerman in the world, and when they wrote the term “Inverse Ackermann
function”, they definitely made a mistake while writing the word “Ackerman”.
• Definition of the disjoint-set data structures.
• Explanation of how to implement a disjoint-set data structure through linked lists.
• Explanation of how to implement a disjoint-set data structure through a forest of trees.
• Definition of spanning trees.
• Explanation of the minimum spanning tree problem.
Lecture 20 : I like the structure of maths courses — Friday 2nd December 2022 p. 77
• Explanation of the Bellman-Ford algorithm for finding shortest paths and detecting negative cycles.
• Proof of optimality of the Bellman-Ford algorithm.
• Explanation of Dijkstra’s algorithm for finding a shortest path in a weighted graph, and proof of its
optimality.
Lecture 24 : Doing fun stuff with matrices (really) — Friday 16th December 2022 p. 80
Lecture 23 : Quantum bogosort is a comparison sort in Θ(n) — Monday 12th December 2022 p. 89
• Proof of an upper bound on the runtime complexity of successful search in hash tables.
• Explanation of quicksort.
• Proof and analysis of naive quick sort.
• Analysis of randomised quick sort.
• Proof of the Ω(n log(n)) lower bound for comparison sorts.
Friday 23rd September 2022 — Lecture 1 : I’m rambling a bit
Chapter 2
Introduction
Example: Arithmetic series
Let's say that, given n, we want to compute ∑_{i=1}^{n} i. There are multiple ways to do so.
Naive algorithm
The first algorithm that could come to mind is to compute the sum:
    ans = 0
    for i = 1, 2, ..., n
        ans = ans + i
    return ans
Clever algorithm
A much better way is to simply use the partial sum formula of the arithmetic series, yielding:
    return n*(n + 1)/2
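As a small illustration, here is how both versions might look in Python (a minimal sketch; the function names are mine, not the course's):

def sum_naive(n):
    # Θ(n): add the integers 1..n one by one
    ans = 0
    for i in range(1, n + 1):
        ans += i
    return ans

def sum_clever(n):
    # Θ(1): closed-form arithmetic series formula
    return n * (n + 1) // 2

# Both agree: sum_naive(100) == sum_clever(100) == 5050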
Personal note: Definitions
We say that f(x) ∈ O(g(x)), or more informally f(x) = O(g(x)), read "f is big-O of g", if there exist an M ∈ R+ and an x₀ ∈ R such that:
    |f(x)| ≤ M|g(x)|,  ∀x ≥ x₀
Personal note: Theorem
Let f and g be two functions, such that the following limit exists or diverges:
    lim_{x→∞} |f(x)| / |g(x)| = ℓ ∈ R ∪ {∞}
We can draw the following conclusions, depending on the value of ℓ:
• If ℓ = 0, then f(x) ∈ o(g(x)).
• If ℓ = ∞, then f(x) ∈ ω(g(x)).
• If ℓ ∈ R+ (i.e. 0 < ℓ < ∞), then f(x) ∈ Θ(g(x)).
Proof
We will only prove the third point, the other two are left as exercises to the reader.
First, we can see that ℓ > 0, since ℓ ≠ 0 by hypothesis and since |f(x)|/|g(x)| ≥ 0 for all x.
We can apply the definition of the limit. Since it is valid for all ε > 0, we know that, in particular, it is true for ε = ℓ/2 > 0. Thus, by definition of the limit, we know that for ε = ℓ/2 > 0, there exists an x₀ ∈ R, such that for all x ≥ x₀, we have:
    | |f(x)|/|g(x)| − ℓ | ≤ ε = ℓ/2
    ⟺  −ℓ/2 ≤ |f(x)|/|g(x)| − ℓ ≤ ℓ/2
    ⟺  (ℓ/2)|g(x)| ≤ |f(x)| ≤ (3ℓ/2)|g(x)|
since |g(x)| > 0.
Since (ℓ/2)|g(x)| ≤ |f(x)| for x ≥ x₀, we get that f ∈ Ω(g(x)). Also, since |f(x)| ≤ (3ℓ/2)|g(x)| for x ≥ x₀, we get that f ∈ O(g(x)).
We can indeed conclude that f ∈ Θ(g(x)).
Example
Let a ∈ R and b ∈ R+. Let us compute the following ratio:
    lim_{n→∞} |(n + a)^b| / |n^b| = lim_{n→∞} (1 + a/n)^b = 1
Side note: Link with series
You can go read my Analyse 1 notes on my GitHub (in French) if you want more information, but there is an interesting link with series we can do here.
You can convince yourself that if a_n ∈ Θ(b_n), then ∑_{n=1}^{∞} |a_n| and ∑_{n=1}^{∞} |b_n| have the same nature. Indeed, this hypothesis yields that there exist C₁, C₂ ∈ R+ and an n₀ ∈ N such that, for all n ≥ n₀:
    0 ≤ C₁|b_n| ≤ |a_n| ≤ C₂|b_n|   ⟹   C₁ ∑_{n=1}^{∞} |b_n| ≤ ∑_{n=1}^{∞} |a_n| ≤ C₂ ∑_{n=1}^{∞} |b_n|
Example Given the input (5, 2, 4, 6, 1, 3), a correct output is (1, 2, 3, 4, 5, 6).
Personal note: Remark
It is important to require the same numbers at the start and at the end. Otherwise, it would allow algorithms such as Stalin sort (remove all elements which are not in order, leading to a complexity of Θ(n)), or Nagasaki sort (clear the list, leading to a complexity of Θ(1)).
They are more jokes than real algorithms; here is where I found the Nagasaki sort:
https://www.reddit.com/r/ProgrammerHumor/comments/o5w3eo
Definition: In-place algorithm
An algorithm solving the sorting problem is said to be in place when the numbers are rearranged within the array (with at most a constant number of variables outside the array at any time).
Loop invariant
We will see algorithms which we will need to prove correct. To do so, one of the methods is to use a loop invariant: something that stays true at every iteration of a loop. The idea is very similar to induction.
To use a loop invariant, we need to do three steps. In the initialization, we show that the invariant is true prior to the first iteration of the loop. In the maintenance, we show that, if the invariant is true before an iteration, then it remains true before the next iteration. Finally, in the termination, we use the invariant when the loop terminates to show that our algorithm works.
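As a concrete reference for the proof below, here is a minimal Python sketch of insertion sort (0-based indices, whereas the proof uses the pseudocode's 1-based outer index j; the function name is mine):

def insertion_sort(a):
    # Sort the list a in place
    for j in range(1, len(a)):           # a[0..j-1] is already sorted
        key = a[j]
        i = j - 1
        # Shift the larger elements one step to the right
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key                    # insert a[j] at its proper position
    return a

# insertion_sort([5, 2, 4, 6, 1, 3]) == [1, 2, 3, 4, 5, 6]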
Proof Let us prove that insertion sort works by using a loop invariant.
We take as an invariant that at the start of each iteration of the outer for loop,
the subarray a[1…(j-1)] consists of the elements originally in a[1…(j-1)] but in
sorted order.
1. Before the first iteration of the loop, we have j = 2. Thus, the subarray
consists only of a[1], which is trivially sorted.
2. We assume the invariant holds at the beginning of an iteration j = k. The
body of our inner while loop works by moving the elements a[k-1], a[k-2],
and so on one step to the right, until it finds the proper position for a[k],
at which point it inserts the value of a[k]. Thus, at the end of the loop,
the subarray a[1…k] consists of the elements originally in a[1…k] in a sorted
order.
3. The loop terminates when j = n + 1. Thus, the loop invariant implies that
a[1…n] contains the original elements in sorted order.
Complexity analysis
We can see that the first line is executed n times, and the lines which do not belong to the inner loop are executed n − 1 times (the first line of a loop is executed one time more than its body, since we need to do a last comparison before knowing we can exit the loop). We only need to compute how many times the inner loop is executed at every iteration.
In the best case, the array is already sorted, meaning that the inner loop is never
entered. This leads to T (n) = Θ(n) complexity, where T (n) is the number of
operations required by the algorithm.
In the worst case, the array is sorted in reverse order, meaning that the first line of the inner loop is executed j times. Thus, our complexity is given by:
    T(n) = Θ( ∑_{j=2}^{n} j ) = Θ( n(n + 1)/2 − 1 ) = Θ(n²)
As mentioned in the previous course, we mainly have to keep in mind the worst case
scenario.
Chapter 3
Divide and conquer
In a divide-and-conquer algorithm, the runtime follows a recurrence of the form:
    T(n) = Θ(1), if n ≤ c
    T(n) = aT(n/b) + D(n) + C(n), otherwise
where D(n) is the time to divide and C(n) the time to combine solutions.
    i = 1
    j = 1
    for k = p to r:
        // Since both subarrays are sorted, the next element is one of L[i] or R[j]
        if L[i] <= R[j]:
            A[k] = L[i]
            i = i + 1
        else:
            A[k] = R[j]
            j = j + 1
We can see that this algorithm is not in place, making it require more memory than
insertion sort.
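Here is a minimal, runnable Python sketch of the whole merge sort (it allocates the two halves explicitly, so it is indeed not in place; the function name is mine):

def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # Merge: since both halves are sorted, the next element is one of left[i] or right[j]
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

# merge_sort([5, 2, 4, 6, 1, 3]) == [1, 2, 3, 4, 5, 6]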
Remark
The Professor put the following video on the slides, and I like it very much, so here it is (reading the comments, dancers say "teile und herrsche", which means "divide and conquer"):
https://www.youtube.com/watch?v=dENca26N6V4
Friday 30th September 2022 — Lecture 3 : Trees which grow in the wrong direction
Theorem: Correctness of Merge-Sort
Assuming that the implementation of the merge procedure is correct, mergeSort(A, p, r) correctly sorts the numbers in A[p . . . r].
The runtime of merge sort follows the recurrence:
    T(n) = Θ(1), if n = 1
    T(n) = 2T(n/2) + Θ(n), otherwise
Let's first try to guess the solution of this recurrence. We can set Θ(n) = c·n for some c, leading to:
    T(n) = 2T(n/2) + c·n
         = 2(2T(n/4) + c·n/2) + cn
         = 4T(n/4) + 2cn
         = 4(2T(n/8) + c·n/4) + 2cn
         = 8T(n/8) + 3cn
Thus, it seems that, continuing this enough times, we get T(n) ≈ cn log₂(n) = Θ(n log(n)).
We still need to prove that this is true. We can do this by induction, and this is
then named the substitution method.
Proof: Upper bound
We want to show that there exists a constant a > 0 such that T(n) ≤ an log(n) for all n ≥ 2 (meaning that T(n) = O(n log(n))), by induction on n.
• For any constant n ∈ {2, 3, 4}, T(n) has a constant value; selecting a larger than this value will satisfy the base cases when n ∈ {2, 3, 4}.
• We assume that our statement is true for all n ∈ {2, 3, . . . , k − 1} and we want to prove it for n = k:
    T(n) = 2T(n/2) + cn
         ≤ 2·(a(n/2) log(n/2)) + cn        (induction hypothesis)
         = an log(n/2) + cn
         = an log(n) − an + cn
         ≤ an log(n)
if we select a ≥ c.
We can thus select a to be a positive constant so that both the base case and the inductive step hold.
Proof: Lower bound
We want to show that there exists a constant b > 0 such that T(n) ≥ bn log(n) for all n ≥ 0 (meaning that T(n) = Ω(n log(n))), by induction on n.
• For n = 1, T(n) = c and bn log(n) = 0, so the base case is satisfied for any b.
• We assume that our statement is true for all n ∈ {0, . . . , k − 1} and we want to prove it for n = k:
    T(n) = 2T(n/2) + cn
         ≥ 2·(b(n/2) log(n/2)) + cn        (induction hypothesis)
         = bn log(n/2) + cn
         = bn log(n) − bn + cn
         ≥ bn log(n)
selecting b ≤ c.
We can thus select b to be a positive constant so that both the base case and the inductive step hold.
Proof: Conclusion
Since T(n) = O(n log(n)) and T(n) = Ω(n log(n)), we have proven that T(n) = Θ(n log(n)).
Remark
In all rigour, the recurrence is:
    T(n) = c, if n = 1
    T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + c·n, otherwise.
Note that we are allowed to take the same c everywhere since we are considering the worst case, and thus we can take the maximum of the two constants supposed to be there, and call it c.
Anyhow, in our proof, we did not consider floor and ceiling functions. Indeed, they make calculations really messy but don't change the final asymptotic result. Thus, when analysing recurrences, we simply assume for simplicity that all divisions evaluate to an integer.
Other proof: Tree
Another way of guessing the complexity of merge sort, which works for many recurrences, is thinking of the entire recurrence tree. A recurrence tree is a tree (really?) where each node corresponds to the cost of a subproblem. We can thus sum the costs within each level of the tree to obtain a set of per-level costs, and then sum all the per-level costs to determine the total cost of all levels of recursion.
For merge sort, we can draw the following tree:
We can observe that on any level, the amount of work sums up to cn. Since there are log₂(n) levels, we can guess that T(n) = cn log₂(n) = Θ(n log(n)). To prove it formally, we would again need to use induction (the substitution method).
The same idea works for an uneven split such as T(n) = T(n/3) + T(2n/3) + cn. Again, we notice that every level contributes around cn, and we have at least log₃(n) full levels. Therefore, it seems reasonable to say that an log₃(n) ≤ T(n) ≤ bn log_{3/2}(n), and thus T(n) = Θ(n log(n)).
Upper bound
Consider now the recurrence T(n) = T(n/4) + T(3n/4) + 1. Let's prove that there exists a b such that T(n) ≤ bn. We consider the base case to be correct, by choosing b to be large enough.
Let's do the inductive step. We get:
    T(n) = T(n/4) + T(3n/4) + 1 ≤ b·(n/4) + b·(3n/4) + 1 = bn + 1
which is not quite ≤ bn, so this induction hypothesis is not strong enough.
Upper bound (better)
Let's now instead use the stronger induction hypothesis, stating that T(n) ≤ bn − b₀. This gives us:
    T(n) = T(n/4) + T(3n/4) + 1
         ≤ (b·(n/4) − b₀) + (b·(3n/4) − b₀) + 1
         = bn − b₀ + (1 − b₀)
         ≤ bn − b₀
as long as b₀ ≥ 1.
Thus, taking b such that the base case works, we have proven that T(n) ≤ bn − b₀ ≤ bn, and thus T(n) ∈ O(n). We needed to make our claim stronger for it to work, and this is something that is often needed.
Master theorem
Let a ≥ 1 and b > 1 be constants. Also, let T(n) be a function defined on the nonnegative integers by the following recurrence:
    T(n) = aT(n/b) + f(n)
Then, T(n) has the following asymptotic bounds:
1. If f(n) = O(n^{log_b(a) − ε}) for some constant ε > 0, then T(n) = Θ(n^{log_b(a)}).
2. If f(n) = Θ(n^{log_b(a)}), then T(n) = Θ(n^{log_b(a)} log(n)).
3. If f(n) = Ω(n^{log_b(a) + ε}) for some constant ε > 0, and if a·f(n/b) ≤ c·f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)). Note that the second condition holds for most functions.
Example — Let us consider the case of merge sort, thus T(n) = 2T(n/2) + cn. Here a = 2, b = 2 and f(n) = cn = Θ(n^{log₂(2)}), so the second case applies and T(n) = Θ(n log(n)).
Tree
To learn this theorem, we only need to get the intuition of why it works, and to be able to reconstruct it. To do so, we can draw a tree. The depth of this tree is log_b(n), and there are a^{log_b(n)} = n^{log_b(a)} leaves.
If a node does f(n) work, then its a children together do a·f(n/b) work.
1. If f grows slowly, a parent does less work than all its children combined. This means that most of the work is done at the leaves. Thus, the only thing that matters is the number of leaves: n^{log_b(a)}.
2. If f grows such that every child contributes exactly the same as its parent, then every level does the same work. Since we have n^{log_b(a)} leaves which each contribute a constant amount of work, the last level adds up to c·n^{log_b(a)} work, and thus every level adds up to this value. We have log_b(n) levels, meaning that we have a total work of c·n^{log_b(a)} log_b(n).
3. If f grows fast, then a parent does more work than all its children combined. This means that all the work is done at the root and, thus, that all that matters is f(n).
Application
Let's use a modified version of merge sort in order to count the number of inversions in an array A (an inversion is a pair of indices i < j such that A[j] < A[i], where A never contains the same value twice).
The idea is that we can just add a return value to merge sort: the number of
inversions. In the trivial case n = 0, there is no inversion. For the recursive part,
we can just add the number of inversions of the two sub-cases and the number of
inversions we get from the merge procedure (which is the complicated part).
For the merge procedure, since the two subarrays are sorted, we notice that if the
element we are considering from the first subarray is greater than the one we are
considering of the second subarray, then we need to add (q − i + 1) to our current
count.
If A[i] > A[j], then all those (q − i + 1) numbers are greater than A[j].
This solution is Θ(n log(n)) and thus much better than the trivial Θ(n²) double for-loop solution.
Remark
We can notice that there are at most n(n − 1)/2 inversions (in a reverse-sorted array). It is quite remarkable that our algorithm manages to count this value in a smaller complexity. This comes from the fact that, sometimes, we add much more than 1 at a time in the merge procedure.
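A minimal Python sketch of this counting idea, merging and counting at the same time (names are mine):

def count_inversions(a):
    # Returns (sorted copy of a, number of inversions in a)
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_l = count_inversions(a[:mid])
    right, inv_r = count_inversions(a[mid:])
    out, i, j, inv = [], 0, 0, inv_l + inv_r
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            # left[i..] are all greater than right[j]: each forms an inversion with it
            inv += len(left) - i
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out, inv

# count_inversions([2, 4, 1, 3, 5]) == ([1, 2, 3, 4, 5], 3)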
Maximum subarray problem
We have an array of values representing stock prices, and we want to find when we should have bought and when we should have sold (retrospectively, so this is no investment advice). We want to buy when the cost is as low as possible and sell when it is as high as possible. Note that we cannot just take the all-time minimum and all-time maximum, since the maximum could be before the minimum.
Let's switch our perspective by instead considering the array of changes: the difference between element i and element i − 1. We then want to find the contiguous subarray that has the maximum sum; this is named the maximum subarray problem. In other words, we want to find i ≤ j such that A[i . . . j] has the biggest sum possible. For instance, for A = [1, −4, 3, −4], we have i = j = 3, and the sum is 3.
The brute-force solution, in which we compute the sums efficiently, has a runtime of Θ(n(n − 1)/2) = Θ(n²), which is not great.
Let's now instead use a divide-and-conquer method. Only the merge procedure is complicated: we must not miss solutions that cross the midpoint. However, if we know that we want to find a subarray which crosses the midpoint, we can find the best i in the left part up to the midpoint (which takes linear time), and find the best j so that the subarray from the midpoint to j is maximal (which also takes linear time).
This means that we get three candidate subarrays: one that is only in the left part, one that crosses the midpoint and one that is only in the right part. This covers all possible subarrays, and we can just take the best one amongst those three.
We get that the divide step is Θ(1), the conquer step solves two subproblems each of size n/2, and the merge step takes linear time. Thus, we have the exact same recurrence as for merge sort, and the algorithm runs in Θ(n log(n)).
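A minimal Python sketch of the divide-and-conquer method just described, using 0-based indices (names are mine):

def max_crossing(a, lo, mid, hi):
    # Best sum of a subarray a[i..mid] + a[mid+1..j] crossing the midpoint
    best_left, s = float('-inf'), 0
    for i in range(mid, lo - 1, -1):
        s += a[i]
        best_left = max(best_left, s)
    best_right, s = float('-inf'), 0
    for j in range(mid + 1, hi + 1):
        s += a[j]
        best_right = max(best_right, s)
    return best_left + best_right

def max_subarray(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo == hi:
        return a[lo]
    mid = (lo + hi) // 2
    return max(max_subarray(a, lo, mid),        # entirely in the left half
               max_subarray(a, mid + 1, hi),    # entirely in the right half
               max_crossing(a, lo, mid, hi))    # crossing the midpoint

# max_subarray([1, -4, 3, -4]) == 3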
Remark We will make a Θ(n) algorithm to solve this problem in the third
exercise series.
3.2 Fast multiplication
We are given two integers a, b with n bits each (they are given to us through arrays of bits), and we want to output a · b. This can be important for cryptography, for instance.
Fast multiplication
We want to use a divide and conquer strategy.
Let's say we have an array of values a₀, . . . , a_{n−1} giving us a, and an array of values b₀, . . . , b_{n−1} giving us b (we will use base 10 here, but it works for any base):
    a = ∑_{i=0}^{n−1} a_i 10^i,    b = ∑_{i=0}^{n−1} b_i 10^i
Let's divide our numbers in the middle. We get four numbers a_L, a_H, b_L and b_H, defined as:
    a_L = ∑_{i=0}^{n/2−1} a_i 10^i,    a_H = ∑_{i=n/2}^{n−1} a_i 10^{i−n/2},
    b_L = ∑_{i=0}^{n/2−1} b_i 10^i,    b_H = ∑_{i=n/2}^{n−1} b_i 10^{i−n/2}
The trick (due to Karatsuba) is to notice that:
    (a_L + a_H)(b_L + b_H) = a_L b_L + a_H b_H + a_H b_L + b_H a_L
    ⟹ (a_L + a_H)(b_L + b_H) − a_L b_L − a_H b_H = a_H b_L + b_H a_L
Thus, considering what we did before, we this time only need three multiplications: (a_L + a_H)(b_L + b_H), a_L b_L and a_H b_H. This gives the recurrence T(n) = 3T(n/2) + Θ(n), which solves to Θ(n^{log₂(3)}) ≈ Θ(n^{1.58}): better than the Θ(n²) school algorithm.
Note that we are cheating a bit on the complexity, since computing (a_L + a_H)(b_L + b_H) is really T(n/2 + 1). However, as mentioned in the last lesson, we don't really care about floor and ceiling functions (nor this +1).
Remark Note that, in most of the cases, we are working with 64-bit numbers which can be
multiplied in constant time on a 64-bit CPU. The algorithm above is in fact really
useful for huge numbers (in cryptography for instance).
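A minimal Python sketch of this three-multiplication idea, working directly on Python integers rather than on digit arrays for brevity (the base case threshold and the names are mine):

def karatsuba(a, b):
    # Multiply two non-negative integers with three recursive multiplications
    if a < 10 or b < 10:
        return a * b
    half = max(len(str(a)), len(str(b))) // 2
    p = 10 ** half
    a_high, a_low = divmod(a, p)
    b_high, b_low = divmod(b, p)
    low = karatsuba(a_low, b_low)
    high = karatsuba(a_high, b_high)
    mid = karatsuba(a_low + a_high, b_low + b_high) - low - high
    return high * p * p + mid * p + low

# karatsuba(1234, 5678) == 1234 * 5678 == 7006652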
3.3 Matrix multiplication
Divide and conquer
We can realise that multiplying matrices blockwise works like multiplying submatrices. If we have A and B being two n×n matrices, then we can split them into submatrices and get:
    [ C₁₁ C₁₂ ]   [ A₁₁ A₁₂ ] [ B₁₁ B₁₂ ]
    [ C₂₁ C₂₂ ] = [ A₂₁ A₂₂ ] [ B₂₁ B₂₂ ]
Computing the C blocks this way requires 8 multiplications of (n/2)×(n/2) matrices, giving the recurrence T(n) = 8T(n/2) + Θ(n²), which solves to Θ(n³) by the master theorem: no improvement.
Strassen's algorithm
Strassen realised that we only need to perform 7 recursive multiplications of (n/2)×(n/2) matrices rather than 8. This gives us the recurrence:
    T(n) = 7T(n/2) + Θ(n²)
where the Θ(n²) comes from additions, subtractions and copying some matrices.
This solves to T(n) = Θ(n^{log₂(7)}) by the master theorem, which is better!
Remark
Strassen was the first to beat Θ(n³), but algorithms with better and better complexity keep being found (even though the best ones currently known are galactic algorithms).
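A minimal Python sketch of Strassen's seven products, assuming square matrices whose size is a power of two and using NumPy for the block additions (the cutoff and the names are mine):

import numpy as np

def strassen(A, B, cutoff=64):
    # Assumes square matrices whose size is a power of two (pad otherwise)
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # The seven recursive products
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

# np.allclose(strassen(X, Y), X @ Y) for X, Y of shape (128, 128)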
Monday 10th October 2022 — Lecture 6 : Heap sort
Chapter 4
Great data structures yield great algorithms
Example For instance, the tree on the left is a nearly-complete binary tree
of depth 3, but not the one on the right:
Heap A heap (or max-heap) is a nearly-complete binary tree such that, for every node i,
the key (value stored at that node) of its children is less than or equal to its key.
Examples For instance, the nearly complete binary tree of depth 3 of the left
is a max-heap, but not the one on the right:
Remark 1 We can define the min-heap to be like the max-heap, but the property
each node follows is that the key of its children is greater than or
equal to its key.
Remark 2 We must not confuse heaps and binary-search trees (which we will
define later), which are very similar but have a more restrictive
property.
Height The height of a node is defined to be the number of edges on a longest simple path
from the node down to a leaf.
Example For instance, in the following picture, the node holding 10 has height
1, the node holding 14 has height 2, and the one holding a 2 has
height 0.
Remark We note that, if we have n nodes, we can bound the height h of any
node:
h ≤ log2 (n)
Also, we notice that the height of the root is the largest height of a
node from the tree. This is defined to be the height of the heap.
We notice it is thus Θ(log2 (n)).
Storing a heap We will store a heap in an array, layer by layer. Thus, take the first layer and store
it in the array. Then, we take the next layer, and store it after the first layer. We
continue this way until the end.
Let's consider that we store our numbers in an array with indices starting at 1. The children of a node A[i] are stored in A[2i] and A[2i + 1]. Also, if i > 1, the parent of the node A[i] is A[⌊i/2⌋].
Using this method, we realise that we do not need a pointer to the left and right elements for each node.
Using this method, we realise that we do not need a pointer to the left and right
elements for each node.
Example For instance, let’s consider again the following tree, but considering
the index of each node:
Max heapify To manipulate a heap, we need to max-heapify. Given an i such that the subtrees
of i are heaps (this condition is important), this algorithm ensures that the subtree
rooted at i is a heap (satisfying the heap property). The only violation we could
have is the root of one of the subtrees being larger than the node i.
So, to fix our tree, we compare A[i], A[left(i)] and A[right(i)]. If necessary, we
swap A[i] with the largest of the two children to get the heap property. This could
break the previous sub-heap, so we need to continue this process, comparing and
swapping down the heap, until the subtree rooted at i is a max-heap. We could
write this algorithm in pseudocode as:
procedure maxHeapify(A, i, n)
    largest = index of the largest among A[i], A[left(i)] and A[right(i)]
    if largest != i
        swap(A, i, largest) // swap A[i] and A[largest]
        maxHeapify(A, largest, n)
Remark This procedure is the main primitive we have for working with heaps;
it is really important.
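For reference, here is a minimal runnable Python sketch of the whole procedure, using 0-based indices (so the children of index i are 2i + 1 and 2i + 2; the names are mine):

def max_heapify(A, i, n):
    # Assumes the subtrees rooted at the children of i are already max-heaps
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < n and A[left] > A[largest]:
        largest = left
    if right < n and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]   # swap A[i] and A[largest]
        max_heapify(A, largest, n)            # fix the sub-heap we may have broken

# Example: A = [1, 14, 10, 8, 7, 9, 3]; max_heapify(A, 0, 7) gives [14, 8, 10, 1, 7, 9, 3]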
Building a heap To make a heap from an unordered array A of length n, we can use the following
buildMaxHeap procedure:
procedure buildMaxHeap(A, n)
    for i = floor(n/2) downto 1
        maxHeapify(A, i, n)
The idea is that the nodes strictly after ⌊n/2⌋ are leaves (no node after ⌊n/2⌋ can have a left child, since it would be at index 2i, leading to an overflow; meaning that they are all leaves), which are trivial subheaps. Then, we can merge increasingly higher heaps through the maxHeapify procedure. Note that we cannot loop in the other direction (from 1 to ⌊n/2⌋), since maxHeapify does not create a heap when the subtrees of the node are not already heaps.
Correctness
To prove the correctness of this algorithm, we can use a loop invariant: at the start of every iteration of the for loop, each node i + 1, . . . , n is the root of a max-heap.
1. At start, each node ⌊n/2⌋ + 1, . . . , n is a leaf, which is the root of a trivial max-heap. Since, at start, i = ⌊n/2⌋, the initialisation of the loop invariant is true.
2. We notice that both children of the node i are indexed higher than i and thus, by the loop invariant, they are both roots of max-heaps; so calling maxHeapify(A, i, n) makes node i the root of a max-heap as well, maintaining the invariant.
Heapsort Now that we built our heap, we can use it to sort our array:
procedure heapsort(A, n)
    buildMaxHeap(A, n)
    for i = n downto 2
        swap(A, 1, i) // swap A[1] and A[i]
        maxHeapify(A, 1, i-1)
A max-heap is only useful for one thing: getting the maximum element. When we get it, we swap it to its right place. We can then max-heapify the new tree (without the element we put in the right place), and start again.
Complexity
We run the heap repair O(n) times (each run is O(log(n))), thus we get that our algorithm has complexity O(n log₂(n)).
It is interesting to see that, here, the good complexity comes from a really good data structure. This is basically like selection sort (finding the biggest element and putting it at the end, which runs in O(n²)), but finding the maximum in constant time.
Remark We can note that, unlike Merge Sort, this sorting algorithm is in
place.
Friday 14th October 2022 — Lecture 7 : Queues, stacks and linked list
Usage Priority queues have many usages; the biggest one will be in Dijkstra's
algorithm, which we will see later in this course.
Maximum Since we are using a heap, we have two procedures for free.
Maximum(S) simply returns the root. This is Θ(1).
For Extract-Max(S), we can move the last element of the array to
the root and run Max-Heapify on the root (like what we do with
heap-sort, but without needing to put the root to the last element
of the heap).
Increase key To implement Increase-Key, after having changed the key of our
element, we can make it go up until its parent has a bigger key than
it.
procedure HeapIncreaseKey(A, i, key)
    if key < A[i]:
        error "new key is smaller than current key"
    A[i] = key
    while i > 1 and A[Parent(i)] < A[i]
        exchange A[i] with A[Parent(i)]
        i = Parent(i)
Insert To insert a new key into the heap, we can increment the heap size, insert
a new node in the last position in the heap with the key −∞, and
increase this −∞ value to key using Heap-Increase-Key.
procedure HeapInsert(A, key, n)
    n = n + 1  // this is more complex in real life, but this is not important here
    A[n] = -infinity
    HeapIncreaseKey(A, n, key)
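As a side note, Python's standard library provides a binary-heap priority queue, heapq, which is a min-heap; a common trick to get max-heap behaviour is to negate the keys:

import heapq

pq = []
heapq.heappush(pq, -5)            # insert key 5 (negated: heapq is a min-heap)
heapq.heappush(pq, -1)
heapq.heappush(pq, -9)
maximum = -pq[0]                  # peek at the maximum, Θ(1)
largest = -heapq.heappop(pq)      # extract-max, O(log n)
# maximum == largest == 9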
Definition: Stack A stack is a data structure where we can insert (Push(S, x)) and delete elements
(Pop(S)). This is known as a last-in, first-out (LIFO), meaning that the element we
get by using the Pop procedure is the one that was inserted the most recently.
Intuition This is really like a stack: we put elements over one another, and
then we can only take elements back from the top.
We have an array of size n, and a pointer S.top to the last element (some space in
the array can be unused).
Empty To know if our stack is empty, we can simply return S.top == 0. This definitely has a complexity of O(1).
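A minimal Python sketch of this array-based stack (fixed capacity, no overflow/underflow checks; the names are mine):

class ArrayStack:
    def __init__(self, capacity):
        self.a = [None] * capacity
        self.top = -1                 # index of the last element, -1 when empty

    def empty(self):
        return self.top == -1

    def push(self, x):
        self.top += 1
        self.a[self.top] = x

    def pop(self):
        x = self.a[self.top]
        self.top -= 1
        return x

# s = ArrayStack(4); s.push(1); s.push(2); s.pop() == 2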
Intuition This is really like a queue in real life: people that get out of the
queue are people who were there for the longest.
Usage Queues are also used a lot, for instance in packet switches in the
internet.
Queue implementation
We have an array Q, a pointer Q.head to the first element of the queue, and Q.tail to the place after the last element.
Enqueue To insert an element, we can simply use the tail pointer, making
sure to have it wrap around the array if needed:
procedure Enqueue(Q, x)
    Q[Q.tail] = x
    if Q.tail == Q.length
        Q.tail = 1
    else
        Q.tail = Q.tail + 1
Note that, in real life, we must check for overflow. We can observe
that this procedure is executed in constant time.
Dequeue To get an element out of our queue, we can use the head pointer:
procedure Dequeue(Q)
    x = Q[Q.head]
    if Q.head == Q.length
        Q.head = 1
    else
        Q.head = Q.head + 1
    return x
Note that, again, in real life, we must check for underflow. Also,
this procedure is again executed in constant time.
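A minimal Python sketch of this circular, array-based queue (fixed capacity, no overflow/underflow checks; the names are mine):

class ArrayQueue:
    def __init__(self, capacity):
        self.q = [None] * capacity
        self.head = 0   # index of the first element
        self.tail = 0   # index one past the last element

    def enqueue(self, x):
        self.q[self.tail] = x
        self.tail = (self.tail + 1) % len(self.q)   # wrap around

    def dequeue(self):
        x = self.q[self.head]
        self.head = (self.head + 1) % len(self.q)   # wrap around
        return x

# q = ArrayQueue(4); q.enqueue(1); q.enqueue(2); q.dequeue() == 1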
Summary Both stacks and queues are very efficient and have natural operations. However,
they have limited support (we cannot search). Also, implementations using arrays
have a fixed-size capacity.
We can have multiple types of linked lists. In a singly-linked list, every element only knows the position of the next element in memory. In a doubly-linked list, every element knows the position of the elements before and after it. Also, it is possible to have a circular linked list, by making the first element the successor of the last one.
Note that, when we are not working with a circular linked list, the prev pointer of the first element is a nullptr, and the next pointer of the last element is also a nullptr. Those are pointers to nowhere, usually implemented as pointing to the memory address 0. This can also be represented as Nil.
It is also possible to have a sentinel: a node like all the others, except that it does not hold a value. This can simplify the algorithms a lot, since the pointer to the head of the list always points to this sentinel. For instance, a circular doubly-linked list with a sentinel would look like:
x.prev = nil
Summary A linked list is interesting because it does not have a fixed capacity, and we can insert
and delete elements in O(1) (as long as we have a pointer to the given element).
However, searching is O(n), which is not great (as we will see later).
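A minimal Python sketch of a circular doubly-linked list with a sentinel, illustrating the O(1) insertion and deletion and the O(n) search (the names are mine):

class Node:
    def __init__(self, key=None):
        self.key = key
        self.prev = self
        self.next = self

class LinkedList:
    def __init__(self):
        self.nil = Node()                # sentinel: holds no value

    def insert(self, key):               # insert right after the sentinel, O(1)
        x = Node(key)
        x.next = self.nil.next
        x.prev = self.nil
        self.nil.next.prev = x
        self.nil.next = x
        return x

    def delete(self, x):                 # unlink a node we hold a pointer to, O(1)
        x.prev.next = x.next
        x.next.prev = x.prev

    def search(self, key):               # O(n)
        x = self.nil.next
        while x is not self.nil and x.key != key:
            x = x.next
        return x if x is not self.nil else None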
Monday 17th October 2022 — Lecture 8 : More trees growing in the wrong direction
Definition: Binary Search Trees
A binary search tree is a binary tree (which is not necessarily nearly-complete), which follows the following core property: for any node x, if y is in its left subtree then y.key < x.key, and if y is in the right subtree of x, then y.key ≥ x.key.
We could also have the following binary search tree of height h = 14:
We will see that good binary search trees are one with the smallest
height, since complexity will depend on it.
Remark Even though binary search trees are not necessarily nearly-complete,
we can notice that their property is much more restrictive than the
one of heaps.
Searching We designed this data-structure for searching, so the implementation is not very
complicated:
procedure TreeSearch(x, k)
    if x == Nil or k == key[x]
        return x
    else if k < x.key
        return TreeSearch(x.left, k)
    else
        return TreeSearch(x.right, k)
Extrema We can notice that the minimum element is located in the leftmost node, and the
maximum is located in the rightmost node.
This gives us the following procedure to find the minimum element, in complexity
O(h):
procedure TreeMinimum(x)
    while x.left != Nil
        x = x.left
    return x
We can make a very similar procedure to find the maximum element, which also
runs in complexity O(h):
procedure TreeMaximum(x)
    while x.right != Nil
        x = x.right
    return x
Successor The successor of a node x is the node y where y.key is the smallest key such that
y.key > x.key. For instance, if we have a tree containing the numbers 1, 2, 3, 5, 6,
the successor of 2 is 3 and the successor of 3 is 5.
To find the successor of a given element, we can split into two cases. If x has a non-empty right subtree, then its successor is the minimum in this right subtree (it is the minimum number which is greater than x). However, if x does not have a right subtree, we can
consider the point of view of the successor y of x: we know that x is the predecessor
of y, meaning that it is at the rightmost bottom of y’s left subtree. This means that
we can go to the up-left (go up, as long as we are the right child), until we need to
go to the up-right (go up, but we are the left child). When we went up-right, we
found the successor of x.
We can convince ourselves that this is an "else" (do the first case if there is a right subtree, and only otherwise do the second case) by seeing that, if x has a right subtree, then all of its elements are greater than x but smaller than the y that we would find through our second procedure. In other words, if x has a right subtree, then its successor must definitely be there. Also, this means that the only element for which we will not find a successor is the one at the rightmost of the tree (since it has no right subtree, and we will never be able to go up-right), which makes sense since this is the maximum number.
This gives us the following algorithm for finding the successor in O(h):
procedure TreeSuccessor(x)
    if x.right != nil
        return TreeMinimum(x.right)
    y = x.p
    while y != nil and x == y.right
        x = y
        y = y.p
    return y
We can note that looking at the successor of the greatest element yields y = Nil, as
expected. Also, the procedure to find the predecessor is symmetrical.
Printing To print a binary tree, we have three methods: preorder, inorder and postorder
tree walk. They all run in Θ(n).
The preorder tree walk looks like:
procedure PreorderTreeWalk(x)
    if x != nil
        print key[x]
        PreorderTreeWalk(x.left)
        PreorderTreeWalk(x.right)
The inorder tree walk has the print key[x] statement one line lower, and the postorder tree walk has this instruction two lines lower.
Insertion To insert an element, we can search for this element and, upon finding its supposed position, insert it at that position.
procedure BinarySearchTreeInsert(T, z)
    y = Nil       // previous node we looked at
    x = T.root    // current node to look at
    // Search
    while x != Nil
        y = x
        if z.key < x.key
            x = x.left
        else
            x = x.right
    // Insert
    if y == Nil
        T.root = z   // Tree was empty
    else if z.key < y.key
        y.left = z
    else
        y.right = z
Deletion When deleting a node z, we can consider three cases. If z has no child, then we
can just remove it. If it has exactly one child, then we can make that child take z’s
position in the tree. If it has two children, then we can find its successor y (which is
at the leftmost of its right subtree, since this tree is not empty by hypothesis) and
replace z by y. Note that deleting the node y to move it at the place of z is not so
hard since, by construction, y has 0 or 1 child. Also, note that we could use the
predecessor instead of the successor.
To implement our algorithm, it is easier to have a transplant procedure (to replace
a subtree rooted at u with the one rooted at v):
procedure Transplant(T, u, v):
    // u is the root
    if u.p == Nil:
        T.root = v
    // u is the left child of its parent
    else if u == u.p.left:
        u.p.left = v
    // u is the right child of its parent
    else:
        u.p.right = v
    if v != Nil:
        v.p = u.p
Balancing Note that neither our insertion nor our deletion procedures keep the height low. For
instance, creating a binary search tree by inserting in order elements of a sorted list
of n elements makes a tree of height n − 1.
There are balancing tricks when doing insertions and deletions, such as red-black
trees or AVL trees, which allow to stay at h = Θ(log(n)), but they will not be
covered in this course.
However, generally, making a tree from random order insertion is not so bad.
Summary We have been able to make search, max, min, predecessor, successor, insertion and
deletion in O(h).
Note that binary search trees are really interesting because we can easily insert elements. This is the main thing which makes them more interesting than a basic binary search on a sorted array.
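A compact Python sketch of the main operations of this section (search, minimum, successor, insertion), all in O(h); it stores parent pointers like the pseudocode above (the names are mine):

class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def tree_search(x, k):
    while x is not None and x.key != k:
        x = x.left if k < x.key else x.right
    return x

def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    if x.right is not None:
        return tree_minimum(x.right)
    y = x.p
    while y is not None and x is y.right:    # climb while we are a right child
        x, y = y, y.p
    return y

def tree_insert(root, z):
    y, x = None, root
    while x is not None:                     # search for z's position
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        return z                             # tree was empty: z is the new root
    if z.key < y.key:
        y.left = z
    else:
        y.right = z
    return root

# root = None
# for k in [5, 2, 4, 6, 1, 3]:
#     root = tree_insert(root, BSTNode(k))
# tree_minimum(root).key == 1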
Chapter 5
Dynamic programming
5.1 Introduction and Fibonacci
Recall the Fibonacci sequence, defined by Fib(0) = Fib(1) = 1 and Fib(n) = Fib(n − 1) + Fib(n − 2). The first idea to compute this is through the given recurrence relation:
procedure Fib(n):
    if n == 0 or n == 1:
        return 1
    else
        return Fib(n-1) + Fib(n-2)
However, we can see that we are computing many things more than twice, meaning
that we are wasting resources and that we can do better. An idea is to remember
what we have computed not to compute it again.
We have two solutions. The first method is top-down with memoisation: we
solve recursively but store each result in a table. The second method is bottom-up:
we sort the problems, and solve the smaller ones first; that way, when solving a
subproblem, we have already solved the smaller subproblems we need.
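A minimal Python sketch of the two methods, using the course's convention Fib(0) = Fib(1) = 1 (the names are mine):

def fib_memo(n, memo=None):
    # Top-down with memoisation: store each computed result in a table
    if memo is None:
        memo = {}
    if n <= 1:
        return 1
    if n not in memo:
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]

def fib_bottom_up(n):
    # Bottom-up: fill an array of subproblem solutions, Θ(n) time and memory
    f = [1] * (n + 1)
    for i in range(2, n + 1):
        f[i] = f[i - 1] + f[i - 2]
    return f[n]

# fib_memo(10) == fib_bottom_up(10) == 89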
We can note that this has a much better runtime complexity than the naive one, since it is Θ(n). Also, memory-wise, we use about the same amount since, for the naive algorithm, we had a stack depth of O(n), which also needs to be stored.
Designing a dynamic programming algorithm
Dynamic programming is a good idea when our problem has an optimal substructure: the problem consists of making a choice, which leaves one or several subproblems to solve, where finding the optimal solution to all those subproblems allows us to get the optimal solution to our problem.
This is a really important idea which we must always keep in mind when solving problems.
Monday 24th October 2022 — Lecture 10 : ”There are 3 types of mathematicians: the ones
who can count, and the ones who cannot” (Prof. Kapralov) (what do you mean by “this
title is too long”?)
5.2 Application: Rod cutting
We are given a rod of length n and a table of prices p_i for rods of length i, and we want to cut the rod into pieces so as to maximise the total revenue.
Brute force
Let's first consider the brute force case: why work smart when we can work fast. We have n − 1 places where we can cut the rod, and at each place we can either cut or not, so we have 2^{n−1} possibilities to cut the rod (which is not great).
Also, as we will show in the 5th exercise series, a greedy algorithm does not work.
Theorem: Optimal substructure
We can notice that, if the leftmost cut in an optimal solution is after i units, and an optimal way to cut a rod of size n − i is into rods of sizes s₁, …, s_k, then an optimal way to cut our rod is into rods of sizes i, s₁, . . . , s_k.
This yields that, since the optimal solution is worse or equal to the
solution we made (using the solution of the subproblem), this new
solution is indeed optimal.
Dynamic programming
The theorem above shows the optimal substructure of our problem, meaning that we can apply dynamic programming. Letting r(n) be the optimal revenue from a rod of length n, we get by the structural theorem that we can express r(n) recursively as:
    r(n) = 0, if n = 0
    r(n) = max_{1≤i≤n} { p_i + r(n − i) }, otherwise
However, just as for the Fibonacci algorithm, we are computing the same things many times. Thus, let us use bottom-up with some memoisation:
procedure BottomUpCutRod(p, n)
    // r contains the solutions to the subproblems
    // s contains where to cut to get the optimal solution
    let r[0...n] and s[0...n] be new arrays
    // initial condition
    r[0] = 0
    // fill in the array
    for j = 1 to n:
        // find max
        q = -infinity
        for i = 1 to j:
            if q < p[i] + r[j - i]:
                q = p[i] + r[j-i]
                s[j] = i
        r[j] = q
    return r and s
Note that this is a typical Dynamic Programming approach: fill in the array of
subproblems (r here) and another array containing more information (s here), and
then use them to reconstruct the solution.
5.3 Application: Change-making problem
We are given n coin denominations w₁, …, wₙ (with an unlimited number of coins of each denomination) and a target amount W, and we want to make change for W using as few coins as possible.
Solution
We first want to see our optimal substructure. To do so, we need to define our subproblems. Thus, let r[w] be the smallest number of coins needed to make change for w. We can note that if we choose which coin i to use first and know the optimal solution to make W − w_i, we can make the optimal solution by adding 1 to the solution we just found (since we just use one more coin w_i):
    r[w] = 1 + min_{1≤i≤n} r[w − w_i]
We can add the boundary conditions r[w] = +∞ for all w < 0 and r[0] = 0.
We can see that the runtime is O(nW), since we have W subproblems and the recomputation (checking all the coins to get the minimum) takes order n.
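A minimal bottom-up Python sketch of this recurrence (the names are mine); it returns infinity when no combination of coins reaches W:

def min_coins(coins, W):
    INF = float('inf')
    r = [0] + [INF] * W                      # r[w] = min number of coins to make w
    for w in range(1, W + 1):
        for c in coins:
            if c <= w and r[w - c] + 1 < r[w]:
                r[w] = r[w - c] + 1          # use one more coin of denomination c
    return r[W]

# min_coins([1, 2, 5], 11) == 3   (5 + 5 + 1)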
5.4 Application: Matrix-chain multiplication
We want to compute a product A₁A₂ · · · Aₙ of matrices, where matrix A_i has dimensions p_{i−1} × p_i. Matrix multiplication is associative, so we may choose the parenthesisation, and we want the one minimising the total number of scalar multiplications.
Theorem: Optimal substructure
We can notice that, if the outermost parenthesisation in an optimal solution is (A₁ · · · A_i)(A_{i+1} · · · Aₙ), and if P_L and P_R are optimal parenthesisations for A₁ · · · A_i and A_{i+1} · · · Aₙ, respectively, then ((P_L) · (P_R)) is an optimal parenthesisation for A₁ · · · Aₙ.
Proof
Let ((O_L) · (O_R)) be an optimal parenthesisation (we know it has this form by hypothesis), where O_L and O_R are parenthesisations for A₁ · · · A_i and A_{i+1} · · · Aₙ, respectively. Also, let M(P) be the number of scalar multiplications required by a parenthesisation P.
Since P_L and P_R are optimal by hypothesis, we know that M(P_L) ≤ M(O_L) and M(P_R) ≤ M(O_R). This allows us to get that:
    M((P_L) · (P_R)) = M(P_L) + M(P_R) + p₀ p_i pₙ ≤ M(O_L) + M(O_R) + p₀ p_i pₙ = M((O_L) · (O_R))
However, since ((P_L) · (P_R)) needs fewer than or the same number of scalar multiplications as the optimal parenthesisation, it necessarily means that it is itself optimal.
Dynamic programming
We can note that our theorem gives us the following recursive formula, where m[i, j] is the optimal number of scalar multiplications for calculating A_i · · · A_j:
    m[i, j] = 0, if i = j
    m[i, j] = min_{i≤k<j} { m[i, k] + m[k + 1, j] + p_{i−1} p_k p_j }, if i < j
procedure MatrixChainOrder(p, n):
    let m[1...n, 1...n] and s[1...n, 1...n] be new tables
    for i = 1 to n:
        m[i, i] = 0
    for l = 2 to n:                // l is the length of the chain
        for i = 1 to n - l + 1:
            j = i+l-1
            // find max
            m[i, j] = infinity
            for k = i to j - 1:
                q = m[i, k] + m[k + 1, j] + p[i-1]·p[k]·p[j]
                if q < m[i, j]
                    m[i, j] = q
                    s[i, j] = k
    return m and s
We can note that this is Θ(n³). A good way to see this is that we have Θ(n²) cells m[i, j] to fill in, and filling each one requires trying the Θ(n) possible splits (A_i · · · A_k)(A_{k+1} · · · A_j).
Friday 28th October 2022 — Lecture 11 : LCS but not LoL’s one
5.5 Application: Longest common subsequence
Given two sequences X = ⟨x₁, . . . , x_m⟩ and Y = ⟨y₁, . . . , yₙ⟩, we want to find (the length of) a longest sequence that is a subsequence of both.
Remark
This problem can for instance be useful if we want a way to compute how far two strings are from each other: finding the length of the longest common subsequence would be one way to measure this distance.
Brute force We can note that brute force does not work, since we have 2m subsequences of X
and that each subsequence takes Θ(n) time to check (since we need to scan Y for
the first letter, then scan it for the second, and so on, until we know if this is also a
subsequence of Y ). This leads to a runtime complexity of Θ(n2m ).
Theorem: Optimal substructure
We can note the following idea. Let's say we start at the end of both words and move to the left step by step (the other direction would also work), considering one letter from each word at any time. If the two letters are the same, then we can let it be in the subsequence. If they are not the same, then the optimal subsequence can be obtained by moving to the left in one of the words.
Let's write this more formally. Let X_i and Y_j denote the prefixes ⟨x₁, . . . , x_i⟩ and ⟨y₁, . . . , y_j⟩, respectively. Also, let Z = ⟨z₁, . . . , z_k⟩ be any longest common subsequence (LCS) of X_i and Y_j. We then know that:
1. If x_i = y_j, then z_k = x_i = y_j and Z_{k−1} is an LCS of X_{i−1} and Y_{j−1}.
2. If x_i ≠ y_j and z_k ≠ x_i, then Z is an LCS of X_{i−1} and Y_j.
3. If x_i ≠ y_j and z_k ≠ y_j, then Z is an LCS of X_i and Y_{j−1}.
Proof of the first part of the first point
Let's prove the first point, supposing for contradiction that z_k ≠ x_i = y_j.
However, there is a contradiction, since we can just create Z′ = ⟨z₁, . . . , z_k, x_i⟩. This is indeed a new common subsequence of X_i and Y_j which is one longer, and thus contradicts the fact that Z was a longest common subsequence.
We also note that, if y_j were already matched to something (meaning that we would not be able to match it now), it would mean that z_k = y_j: y_j is the last letter of Y and it must thus be the last letter of Z. Naturally, we can do a similar reasoning to show that x_i was not already matched.
Proof of the rest
The proofs of the second part of the first point, and of the second and third points (which are very similar), are considered trivial and left as exercises to the reader.
Dynamic programming
The theorem above gives us the following recurrence, where c[i, j] is the length of an LCS of X_i and Y_j:
    c[i, j] = 0, if i = 0 or j = 0
    c[i, j] = c[i − 1, j − 1] + 1, if i, j > 0 and x_i = y_j
    c[i, j] = max(c[i − 1, j], c[i, j − 1]), if i, j > 0 and x_i ≠ y_j
We have to be careful since the naive implementation solves the same problems
many times. We can treat this problem with dynamic programming, as usual:
procedure LCSLength(X, Y, m, n):
    // c is the length of an LCS
    // b is the path we take to get our LCS
    let b[1...m, 1...n] and c[0...m, 0...n] be new tables
    // base cases
    for i = 1 to m:
        c[i, 0] = 0
    for j = 0 to n:
        c[0, j] = 0
    // bottom up
    for i = 1 to m:
        for j = 1 to n:
            // Three cases given by the recurrence relation
            if X[i] == Y[j]:
                c[i, j] = c[i-1, j-1] + 1
                b[i, j] = (-1, -1)
            // max of the two
            else if c[i-1, j] >= c[i, j-1]:
                c[i, j] = c[i-1, j]
                b[i, j] = (-1, 0)
            else:
                c[i, j] = c[i, j-1]
                b[i, j] = (0, -1)
    return c and b
We can note that time is dominated by instructions inside the two nested loops,
which are executed m · n times. We thus get that the total runtime of our solution
is Θ(mn).
We can then consider the procedure to print our result:
procedure PrintLCS(b, X, i, j):
    if i == 0 or j == 0:
        return
    di, dj = b[i, j]
    PrintLCS(b, X, i+di, j+dj)
    if (di, dj) == (-1, -1):
        print X[i]
We can note that each recursive call decreases i + j by at least one. Hence, the time needed is at most T(i + j) ≤ T(i + j − 1) + Θ(1), which means it is O(i + j). Also, each recursive call decreases i + j by at most two. Hence, the time needed is at least T(i + j) ≥ T(i + j − 2) + Θ(1), which means it is Ω(i + j). In other words, this means that we can print the result in Θ(m + n).
The numbers are stored in the array c and the arrows in the array b. For this example, this would give us that the longest common subsequence is Z = ⟨A, B, B, A⟩.
5.6 Application: Optimal binary search tree
Given sorted keys k₁ < · · · < kₙ and, for each key k_i, the probability p_i that it is searched for, we want to build a binary search tree minimising the expected search cost.
Example For instance, let us consider the following tree and probability table:
    i   : 1     2     3     4     5
    p_i : 0.25  0.2   0.05  0.3   0.3
Then, we can compute the expected search cost to be:
Observation We notice that the optimal binary search tree might not have the smallest height,
and that it might not have the highest-probability key at the root too.
Brute force Before doing anything too fancy, let us start by considering the brute force algorithm:
we construct each n-node BST, then for each of them we put in the keys and compute its expected
search cost. We can finally pick the tree with the smallest expected search cost.
However, there are exponentially many trees, making our algorithm really bad.
This means that, if we know the two subtrees which we want to merge under a root k_r, we can easily compute the expected search cost through a recursive formula:
    E[search cost] = p_r + ( ∑_{ℓ=i}^{r−1} p_ℓ + E[search cost in left subtree] ) + ( ∑_{ℓ=r+1}^{j} p_ℓ + E[search cost in right subtree] )
                   = E[search cost in left subtree] + E[search cost in right subtree] + ∑_{ℓ=i}^{j} p_ℓ
However, this means that, given the i, j, and r, the optimality of our tree only
depends on the optimality of the two subtrees (the formula above is minimal when
the two expected values are minimal). We have thus proven the optimal substructure
of our problem.
Let e[i, j] be the expected search cost of an optimal binary search tree over the keys k_i, . . . , k_j. This gives us the following recurrence relation:
    e[i, j] = 0, if i = j + 1
    e[i, j] = min_{i≤r≤j} { e[i, r − 1] + e[r + 1, j] + ∑_{ℓ=i}^{j} p_ℓ }, if i ≤ j
procedure OptimalBST(p, n):
    let e[1...n+1, 0...n], w[1...n+1, 0...n] and root[1...n, 1...n] be new tables
    for i = 1 to n+1:
        e[i, i-1] = 0
        w[i, i-1] = 0
    for l = 1 to n:
        for i = 1 to n-l+1:
            j = i + l - 1
            e[i, j] = infinity
            w[i, j] = w[i, j-1] + p[j]
            for r = i to j:
                t = e[i, r-1] + e[r+1, j] + w[i, j]
                if t < e[i, j]:
                    e[i, j] = t
                    root[i, j] = r
    return e and root
This algorithm runs in Θ(n³). A good way to see this is that we have Θ(n²) cells to fill in and that most cells take Θ(n) time to fill.
Monday 7th November 2022 — Lecture 14 : I love XKCD
Chapter 6
Graphs
6.1 Introduction
Introduction Graphs are everywhere. For instance, in social media, when we have a bunch of
entities and relationships between them, graphs are the way to go.
Definition: Graph
A graph G = (V, E) consists of a vertex set V, and an edge set E that contains pairs of vertices.
Terminology We can have a directed graph, where such pairs are ordered (if
(a, b) ∈ E, then it is not necessarily the case that (b, a) ∈ E), or an
undirected graph where such pairs are non-ordered (if (a, b) ∈ E,
then (b, a) ∈ E).
Personal remark
It is funny that we are beginning graph theory right now because, very recently, XKCD published a comic about this subject:
https://xkcd.com/2694/
Definition: Degree
For a graph G = (V, E), the degree of a vertex u ∈ V, denoted degree(u), is its number of outgoing edges.
Storing a graph We can store a graph in two ways: adjacency lists and adjacency matrices. Note
that any of those two representations can be extended to include other attributes,
such as edge weights.
Adjacency lists We use an array. Each index represents a vertex, where we store
the pointer to the head of a list containing its neighbours.
For an undirected graph, if a is in the list of b, then b is in the list
of a.
For instance, for the undirected graph above, we could have the
adjacency list:
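As a sketch in code rather than a figure, here is how the two representations might look in Python for a small directed graph (the vertices and edges are arbitrary, not the course's example):

# Adjacency list: one list of neighbours per vertex
adj_list = {0: [1, 2], 1: [3], 2: [3], 3: []}

# Adjacency matrix: M[u][v] == 1 iff the edge (u, v) exists
n = 4
adj_matrix = [[0] * n for _ in range(n)]
for u, neighbours in adj_list.items():
    for v in neighbours:
        adj_matrix[u][v] = 1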
Algorithm The idea is to send some kind of wave out from s. It will first hit all
vertices which are 1 edge from s. From those points, we can again
send some kind of waves, which will hit all vertices at a distance of
two edges from s, and so on. In other words, beginning with the
root, we repeatedly take the closest not-yet-processed vertex, set the distances
of its undiscovered neighbours to the distance of the current vertex plus 1,
and store them in a queue to consider their neighbours later.
(Figure: example graph with the BFS distances from s: s = 0, a = 1, c = 1, d = 2, f = 2, b = 3, e = 3, g = 3, h = ∞.)
Note that cycles are never a problem, since we enqueue a node only
if it has not been visited before, and thus only once.
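A minimal Python sketch of BFS on an adjacency-list graph, computing the distances from a source s (the names are mine):

from collections import deque

def bfs(adj, s):
    # adj: dict mapping each vertex to the list of its neighbours
    dist = {v: float('inf') for v in adj}
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == float('inf'):      # enqueue each vertex only once
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# bfs({'s': ['a', 'c'], 'a': ['s', 'b'], 'b': ['a'], 'c': ['s'], 'h': []}, 's')
# == {'s': 0, 'a': 1, 'c': 1, 'b': 2, 'h': inf}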
Remark
We can save the shortest-path tree by keeping track of the edge that discovered each vertex. Note that each vertex (which is not the root and whose distance is not infinite) has exactly one such edge, and thus this is indeed a tree. Then, when given a vertex, we can find the shortest path by following those pointers in reverse order, climbing the tree in the opposite direction.
Depth-First BFS goes through every connected vertex, but not necessarily every edge. We would
Search now want to make an algorithm that goes through every edges. Note that this
algorithm may seem very abstract and useless for now but, as we will see right after,
it gives us a very interesting insight about a graph.
The idea is, starting from a given point, we get going following a path until we
get stuck, then backtrack, and get back going. Doing so, we want to output two
timestamps on each vertex v: the discovery time v.d (when we first start visiting a
node) and the finishing time v.f (when we finished visiting all neighbours of our
node). This algorithm is named Depth-First Search (DFS).
For now, it is not really important where we start.
Algorithm Our algorithm can be stated the following way, where WHITE means
not yet visited, GREY means currently visiting, and BLACK finished
visiting:
procedure DFS(G):
    for u in G.V:
        u.color = WHITE
    time = 0
    for u in G.V:
        if u.color == WHITE:
            DFSVisit(G, u)
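The DFSVisit subroutine is not spelled out above; a minimal Python sketch of the whole procedure, with the two timestamps, could look as follows (my own illustration, assuming the same adjacency-list dictionary adj as before, with an entry for every vertex).

def dfs(adj):
    """Return the discovery time d and finishing time f of every vertex."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {u: WHITE for u in adj}
    d, f = {}, {}
    time = 0

    def visit(u):                    # plays the role of DFSVisit(G, u)
        nonlocal time
        time += 1
        d[u] = time                  # we start visiting u
        color[u] = GREY
        for v in adj[u]:
            if color[v] == WHITE:
                visit(v)             # (u, v) is a tree edge
        color[u] = BLACK
        time += 1
        f[u] = time                  # all of u's neighbours are finished

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return d, f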
Example For instance, running DFS on the following graph, we get two
DFS-visits (one starting on b and one on e):
Analysis The runtime is Θ(V + E). Indeed, Θ(V ) since every vertex is
discovered once, and Θ(E) since each edge is examined once if it is
a directed graph and twice if it is an undirected graph.
https://xkcd.com/2407/
And there is also the following great XKCD on the slides:
https://xkcd.com/761/
Depth-First Just as BFS leads to a tree, DFS leads to a forest (a set of trees).
forest Indeed, we can again consider the edge that we used to discover a given vertex, to
be an edge linking this vertex to its parent. Since we run DFS from every still-undiscovered
vertex so that every vertex is reached, the trees might be disjoint, and we may thus have multiple detached trees.
There will be examples hereinafter.
Formal defini- Very formally, each tree is made of edges (u, v) such that u (currently
tion explored) is grey and v is white (not yet explored) when (u, v) is
explored.
Parenthesis We can think of the discovery time as an opening parenthesis and the finishing time
theorem a closing parenthesis. Let us note u’s discovery and finishing time by brackets and
v’s discovery and finishing times by braces. Then, to make a well-parenthesised
formulation, we have only the following possibilities:
1. (){}
2. {}()
3. ({})
4. {()}
However, it is for instance impossible to have ({)}.
Very formally, this yields that, for any vertices u, v, we have exactly one of the
following properties (where, in order, they exactly refer to the well-parenthesised
brace-parenthesis expressions above):
1. u.d < u.f < v.d < v.f and neither of u and v is a descendant of the other.
2. v.d < v.f < u.d < u.f and neither of u and v is a descendant of the other.
3. u.d < v.d < v.f < u.f and v is a descendant of u.
4. v.d < u.d < u.f < v.f and u is a descendant of v.
White-path Vertex v is a descendant of u if and only if, at time u.d, there is a path from u to v
theorem consisting of only white vertices (except for u, which was just coloured grey).
Example For instance, in the following graph, tree edges are represented in
orange, back edges in blue, forward edges in red and cross edges in
green.
Observation A different starting point for DFS will lead to a different edge
classification.
Topological sort We have as input a directed acyclic graph, and we want to output a linear ordering
of vertices such that, if (u, v) ∈ E, then u appears somewhere before v.
Use This can for instance really be useful for dependency resolution
when compiling files: we need to know in which order to compile the files.
Example For instance, let us say that, as good computer scientists, we made
the following graph to remember which clothes we absolutely need
to put before other clothes, in order to get dressed in the morning:
Theorem A directed graph G is acyclic if and only if a DFS of G yields no back edges.
Example For instance, let’s consider the graph above. Running DFS, we may
get:
Proof of cor- We want to show that, if the graph is acyclic and (u, v) ∈ E, then
rectness v.f < u.f .
When we traverse the edge (u, v), u is grey (since we are currently
considering it). We can then split our proof for the different colours
v can have:
1. v cannot be grey since, else, it would mean that we got to v
first, then got to u, and finally got back to v. In other words,
we would have v.d < u.d and thus v.d < u.d < u.f < v.f . This
would imply (u, v) would be a back edge, contradicting that our
graph is acyclic.
2. v could be white, which would mean that it is a descendant
of u and thus, by the parenthesis theorem, we would have u.d <
v.d < v.f < u.f .
3. v could also be black, which would mean that v is already finished
and thus, definitely, v.f < u.d, implying that v.f < u.f .
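This proof suggests the usual way to topologically sort a DAG: run DFS and output the vertices by decreasing finishing time. A minimal Python sketch of this idea (my own illustration, with the same adjacency-list assumption as before; the tiny clothes example at the end is made up):

def topological_sort(adj):
    """Order the vertices of a DAG so that every edge goes from left to right."""
    visited, order = set(), []

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        order.append(u)               # u is finished: it must come after its descendants

    for u in adj:
        if u not in visited:
            visit(u)
    return list(reversed(order))      # decreasing finishing times

# made-up example: socks before shoes, trousers before shoes
print(topological_sort({"socks": ["shoes"], "trousers": ["shoes"], "shoes": []}))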
Monday 14th November 2022 — Lecture 15 : I definitely really like this date
Observation For directed graphs, the definition of connected components no longer
really makes sense. Since we want something similar, we will define
strongly connected components right after.
Remark To verify if we indeed have a SCC, we first verify that every vertex
can reach every other vertex. We then also need to verify that it is
maximal, which we can do by adding any element that has a
connection to the potential SCC, and verifying that what we obtain
is no longer an SCC.
Example For instance, the first example is not an SCC since c cannot reach b; the second
is not one either, since we could add f, so it is not maximal:
Theorem: Exist- Any vertex belongs to one and exactly one SCC.
ence and unicity
of SCCs
Proof First, we notice that a vertex always belongs to at least one SCC,
since we can start with the SCC containing only this element and
add elements to it until it is maximal. This shows the existence.
Second, let us suppose for contradiction that SCCs are not unique.
Thus, for some graph, there exists a vertex v such that v ∈ C1 and
v ∈ C2, where C1 and C2 are two SCCs such that C1 ≠ C2. By
definition of SCCs, for all u1 ∈ C1, we have u1 ⇝ v and v ⇝ u1,
and similarly for all u2 ∈ C2. However, by transitivity, this also
means that u1 ⇝ u2 and u2 ⇝ u1. This yields that we could create a
new SCC C1 ∪ C2, which contradicts the maximality of C1 and C2,
and thus shows the unicity.
Definition: Component graph For a directed graph (digraph) G = (V, E), its component graph G^SCC = (V^SCC, E^SCC) is defined to be the graph where V^SCC has one vertex for each SCC of G, and E^SCC has an edge between two such vertices whenever G has an edge between the corresponding SCCs.
Theorem For any digraph G, its component graph GSCC is a DAG (directed acyclic graph).
Proof Let's suppose for contradiction that G^SCC has a cycle. This means
that we can access one SCC of G from another SCC (or more) and come
back; thus any element of the first SCC has a path to any element
of the second SCC, and reciprocally. However, this means that we
could merge those SCCs into a bigger one, contradicting their maximality.
Kosaraju's al- The idea of Kosaraju's algorithm to compute component graphs efficiently is:
gorithm 1. Call DFS(G) to compute the finishing times u.f for all u.
2. Compute G^T, the transpose of G (the same graph with every edge reversed).
3. Call DFS(G^T), where the main loop of this procedure goes through the vertices in
order of decreasing u.f (as computed in the first DFS).
4. Output the vertices in each tree of the depth-first forest formed in the second
DFS as a separate SCC. Cross-edges represent links in the component graph.
Unicity Since SCCs are unique, the result will always be the same, even
though graphs can be traversed in very different ways with DFS.
Intuition The main intuition for this algorithm is to realise that elements from
SCCs can be accessed from one another when going forwards (in
the regular graph) or backwards (in the transposed graph). Thus,
we first compute some kind of “topological sort” (this is not a real
one this we don’t have a DAG), and use its reverse-order as starting
points to go in the other direction. If two elements can be accessed
in both directions, they will indeed be in the same tree at the end. If
two elements have one direction where one cannot access the other,
then the first DFS will order them so that we begin the second DFS
by the one which cannot access the other.
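A minimal Python sketch of Kosaraju's algorithm following the four steps above (my own illustration; adj is again an adjacency-list dictionary with an entry for every vertex):

def kosaraju(adj):
    """Return the list of strongly connected components of a directed graph."""
    # First DFS: record vertices by increasing finishing time.
    visited, finish_order = set(), []

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        finish_order.append(u)

    for u in adj:
        if u not in visited:
            visit(u)

    # Transpose the graph: reverse every edge.
    transposed = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            transposed[v].append(u)

    # Second DFS on the transpose, in decreasing finishing time: each tree is one SCC.
    assigned, sccs = set(), []

    def collect(u, component):
        assigned.add(u)
        component.append(u)
        for v in transposed[u]:
            if v not in assigned:
                collect(v, component)

    for u in reversed(finish_order):
        if u not in assigned:
            component = []
            collect(u, component)
            sccs.append(component)
    return sccs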
Personal re- The Professor used the name “magic algorithm” since we do not
mark prove this theorem and it seems very magical. I feel like it is better
to give it its real name, but probably it is important to know its
informal name for exams.
Friday 18th November 2022 — Lecture 16 : This date is really nice too, though
Definition: Flow A flow network is a directed graph G = (V, E), where each edge (u, v) has a capacity
network c(u, v) ≥ 0. This function is such that c(u, v) = 0 if and only if (u, v) ∉ E. Finally,
we have a source node s and a sink node t.
We also assume that there are never antiparallel edges (both (u, v) ∈ E and (v, u) ∈ E).
This assumption is more or less without loss of generality since, otherwise, we could
just break one of the antiparallel edges into two edges linked by a new node v′ (see
the picture below). This will simplify notations in our algorithm.
Definition: Flow A flow is a function f : V × V → R satisfying the two following constraints. First,
the capacity constraint states that, for all u, v ∈ V, we have:
0 ≤ f(u, v) ≤ c(u, v)
In other words, the flow cannot be greater than what is supported by the pipe. The
second constraint is flow conservation, which states that, for all u ∈ V \ {s, t}:
Σ_{v∈V} f(v, u) = Σ_{v∈V} f(u, v)
In other words, the flow coming into u is the same as the flow coming out of u.
Notation We will denote flows on a flow network by writing f(u, v)/c(u, v) on
every edge. For instance, we could have:
The value of the flow is defined as |f| = Σ_{v∈V} f(s, v) − Σ_{v∈V} f(v, s),
which is the flow out of the source minus the flow into the source.
Example For instance, for the flow graph and flow hereinabove:
|f | = (1 + 2) − 0 = 3
Goal The idea is now to develop an algorithm that, given a flow network, finds the
maximum flow. The basic idea that could come to mind is to take a random path
through our network, consider its bottleneck link, and send this value of flow onto
this path. We then have a new graph, with capacities reduced and some links less
(if the capacity happens to be 0). We can continue this iteratively until the source
and the sink are no longer connected.
This idea would for example work on the following (very simple) flow network:
Indeed, its bottleneck link has capacity 3, so we send 3 of flow on the only path.
Then, it leads to a new graph with one edge less, where the source and the sink are
no longer connected.
However, we notice that we suddenly run into problems on other graphs. This
algorithm may for instance produce the following sub-optimal result on the following flow network:
This means that we need a way to “undo” bad choices of paths done by our algorithm.
To do so, we will need the following definitions.
Definition: Re- Given a flow network G and a flow f , the residual capacity is defined as:
sidual capacity
cf(u, v) = c(u, v) − f(u, v),   if (u, v) ∈ E
cf(u, v) = f(v, u),             if (v, u) ∈ E
cf(u, v) = 0,                   otherwise
The main idea of this function is its second part: the first part is just the capacity
left in the pipe, but the second part is a new, reversed, edge we add. This new edge
holds a capacity representing the amount of flow that can be reversed.
Example For instance, if we have an edge (u, v) with capacity c(u, v) = 5 and
current flow f (u, v) = 3, then cf (u, v) = 5 − 3 = 2 and cf (v, u) =
f (u, v) = 3.
Remark This definition is the reason why we do not want antiparallel edges:
the notation is much simpler without.
Definition: Re- Given a flow network G and flow f , the residual network Gf is defined as:
sidual network
Gf = (V, Ef ), where Ef = {(u, v) ∈ V × V : cf (u, v) > 0}
We basically use our residual capacity function, removing edges with 0 capacity left.
Definition: Aug- Given a flow network G and flow f , an augmenting path is a simple path (never
menting path going twice on the same vertex) from s to t in the residual network Gf .
Augmenting the flow f by this path means applying the minimum residual capacity over
the path: we add it to the flow of the edges which were there at the start, and subtract
it from the flow of the reversed edges we added through the residual capacity. This can
easily be seen by looking at the definition of residual capacity (if (u, v) ∈ E, the flow
appears with a minus sign; if (v, u) ∈ E, it appears with a plus sign).
Ford-Fulkerson The idea of the Ford-Fulkerson greedy algorithm for finding the maximum flow in a
algorithm flow network is, as the one we had before, to improve our flow iteratively; but using
residual networks in order to cancel wrong choices of paths.
Example Let’s consider again our non-trivial flow network, and the suboptimal
flow our naive algorithm found:
Now, the new algorithm will indeed be able to take the new path.
Taking the edge going from bottom to top basically cancels the
choice it made before. Being careful to apply the new path correctly
(meaning adding it to edges from G and subtracting it from edges
introduced by the residual network), we get the following
flow and residual network:
Proof of optim- We will want to prove its optimality. However, to do so, we need
ality the following definitions.
Definition: Cut A cut of flow network G = (V, E) is a partition of V into S and T = V \ S such
of flow network that s ∈ S and t ∈ T .
In other words, we split our graph into nodes on the source side and on the sink
side.
Example For instance, we could have the following cut (where nodes from S
are coloured in black, and ones from T are coloured in white):
Note that the cut does not necessarily have to be a straight line
(since, anyway, straight lines make no sense for a graph).
For the cut hereinabove, the flow across the cut is f(S, T) = 12 + 11 − 4 = 19. In fact,
for any cut (S, T), the value of the flow satisfies |f| = f(S, T).
Definition: Capa- The capacity of a cut S, T is defined as:
city of a cut
c(S, T) = Σ_{u∈S, v∈T} c(u, v)
Example For instance, on the graph hereinabove, the capacity of the cut is:
12 + 14 = 26
Note that we do not add the 9, since it goes in the wrong direction.
Monday 21st November 2022 — Lecture 17 : The algorithm may stop, or may not
For any flow f and any cut (S, T), we have |f| ≤ c(S, T). Indeed, since |f| = f(S, T)
and the flow on every edge is at most its capacity:
|f| ≤ Σ_{u∈S, v∈T} f(u, v) ≤ Σ_{u∈S, v∈T} c(u, v) = c(S, T)
Definition: Min- Let f be a flow. A min-cut is a cut with minimum capacity. In other words, it is a
cut cut (Smin, Tmin) such that, for any cut (S, T), c(Smin, Tmin) ≤ c(S, T).
Remark By the property above, the value of the flow is less than or equal to
the min-cut:
|f | ≤ c(Smin , Tmin )
We will prove right after that, in fact, |fmax | = c(Smin , Tmin ).
Max-flow min- Let G = (V, E) be a flow network, with source s, sink t, capacities c and flow f .
cut theorem Then, the following propositions are equivalent:
1. f is a maximum flow.
2. Gf has no augmenting path.
3. |f | = c(S, T ) for some cut (S, T ).
Remark This theorem shows that the Ford-Fulkerson method gives the
optimal value. Indeed, it terminates when Gf has no augmenting
path, which is, as this theorem says, equivalent to having found a
maximum flow.
Proof (1) =⇒ Let’s suppose for contradiction that Gf has an augmenting path p.
(2) However, then, Ford-Fulkerson method would augment f by p to
obtain a flow with increased value. This contradicts the fact that f
was a maximum flow.
Proof (2) =⇒ Let S be the set of nodes reachable from s in the residual network,
(3) and T = V \ S.
Every edge going out of S in G must be at capacity. Indeed,
otherwise, we could reach a node outside S in the residual network,
contradicting the construction of S.
Since every edge is at capacity, we get that f (S, T ) = c(S, T ).
However, since |f | = f (S, T ) for any cut, we indeed find that:
|f | = c(S, T )
Proof (3) =⇒ We know that |f | ≤ c(S, T ) for all cuts S, T . Therefore, if the
(1) value of the flow is equal to the capacity of some cut, it cannot be
improved. This shows its maximality.
Summary All this shows that our Ford-Fulkerson method for finding a max-flow works:
start with the 0-flow
while there is an augmenting path from s to t in the residual network:
    find an augmenting path
    compute the bottleneck  // the minimum residual capacity on the path
    increase the flow on the path by the bottleneck and update the residual network
// the flow is now maximal
Also, when we have found a max-flow, we can use our flow to find a min-cut:
if no augmenting path exists in the residual network:
    find the set of nodes S reachable from s in the residual network
    set T = V \ S
    // (S, T) is a minimum cut
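For concreteness, here is a minimal Python sketch of this method, choosing the augmenting paths with BFS (the Edmonds-Karp variant); it is my own illustration, assuming capacities are given as a dictionary of dictionaries capacity[u][v], with no antiparallel edges.

from collections import deque

def max_flow(capacity, s, t):
    """Ford-Fulkerson method with BFS-chosen augmenting paths; returns |f_max|."""
    # Residual capacities: original capacities plus 0-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow_value = 0
    while True:
        # BFS in the residual network to find an augmenting path.
        pred = {s: None}
        queue = deque([s])
        while queue and t not in pred:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in pred:
                    pred[v] = u
                    queue.append(v)
        if t not in pred:
            return flow_value            # no augmenting path: the flow is maximal

        # Compute the bottleneck of the path, then update the residual network.
        bottleneck = float("inf")
        v = t
        while pred[v] is not None:
            bottleneck = min(bottleneck, residual[pred[v]][v])
            v = pred[v]
        v = t
        while pred[v] is not None:
            u = pred[v]
            residual[u][v] -= bottleneck  # use up capacity along the path
            residual[v][u] += bottleneck  # allow undoing this choice later
            v = u
        flow_value += bottleneck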
High complexity It takes O(E) to find a path in the residual network (using breadth-first search for
analysis instance). Each time, the flow value is increased by at least 1. Thus, the running
time has a worst case of O(E|fmax |).
We can note that, indeed, there are some cases where we reach such a complexity if
we always choose the bad path (the one taking the link in the middle here, which
will always exist on the residual network):
On such a graph, the algorithm would not terminate before the heat death of the universe.
Lower complex- In fact, if we don't choose our paths randomly and if the capacities are integers
ity analysis (or rational numbers — this does not really matter, since we could then multiply
everything by the lowest common multiple of the denominators and get an equivalent
problem), then we can get a much better complexity.
If we take the shortest path given by BFS, then the number of augmenting iterations is
bounded by (1/2)·E·V. If we take the fattest path (the path whose bottleneck has the
largest capacity), then the number of iterations is bounded by E log(E|fmax|).
Observation If the capacities of our network are irrational, then the Ford-Fulkerson method might
not really terminate.
Application: Bi- Let's consider the bipartite matching problem. It is easier to explain it with an
partite matching example. We have N students applying for M ≥ N jobs, where each student gets
problem several offers. Every job can be taken by at most one student, and every student can
take at most one job. To model this as a flow network, we add a source s with an edge
of capacity 1 to every student, an edge of capacity 1 from every student to every job
they were offered, and an edge of capacity 1 from every job to a sink t.
If the Ford-Fulkerson method gives us that |fmax | = N , then every student was
able to find a job. Indeed, flows obtained by Ford-Fulkerson are integer valued if
capacities are integers, so the value on every edge is 0 or 1. Since every student
has the in-flow for at most one job, and each job has the out-flow for at most one
student, there cannot be any student matched to two jobs or any job matched to
two students, by conservation of the flow.
Application: In an undirected graph, we may want to know the maximum number of routes that
Edge-disjoint we can take that do not share a common road. To do so, we set an edge of capacity
paths problem 1 in both directions for every road (in a non-antiparallel fashion, as seen earlier).
Then, the max-flow is the number of edge-disjoint paths, and the min-cut gives the
minimum number of roads that need to be closed so that there is no more route
going from the start to the end.
Friday 25th November 2022 — Lecture 18 : Either Levi or Mikasa made this function
S = (S \ {Sx, Sy}) ∪ {Sx ∪ Sy}
Linked list rep- A way to represent this data structure is through a linked list. To do so, each set is
resentation an object looking like a singly linked list. Each set object is represented by a pointer
to the head of the list (which we will take as the representative) and a pointer to
the tail of the list. Also, each element in the list has a pointer to the set object and
to the next element.
Make-Set For the procedure Make-Set(x), we can just create a singleton list
containing x. This is easily done in time Θ(1).
Find For the procedure Find(x), we can follow the pointer back to the
list object, and then follow the head pointer to the representative.
This is also done in time Θ(1).
Union For the procedure Union(x, y), everything gets more complicated.
We notice that we can append a list to the end of another list.
However, we will need to update all the elements of the list we
appended to point to the right set object, which will take a lot of
time if its size is big. To avoid this, we can just append the smaller list
to the larger one (if their sizes are equal, we can make an arbitrary
choice). This method is named weighted-union heuristic.
We notice that, on a single operation, both ideas have exactly the
same bound. So, to understand why this is better, let’s consider
the following theorem.
Proof with the The inefficiency comes from constantly rewiring our elements when
heuristic running the Union procedure. Let us count how many times an
element i may get rewired if, amongst those m operations, there are
n Union calls.
When we merge a set A containing i with another set B, if we have
to update the wiring of i, then it means that the size of the list of A was
smaller than the one of B, and thus the size of the total list of A ∪ B
is at least twice the size of the one of A. However, the size of a list can
double at most log(n) times, meaning that the element i
has been rewired at most log(n) times. Since we have n elements
for which we could have made the exact same analysis, we get a
complexity of O(n log(n)) for this scenario.
Note that we also need to consider the case where there are many
more Make-Set and Find calls than Union ones. This is pretty
trivial since they are both Θ(1), and thus this case is Θ(m).
Putting everything together, we get a worst case complexity of
O(m + n log(n)) = O(max{m, n log(n)}).
Proof without Let's say that we have n elements, each in a singleton set, and that
the heuristic our m operations consist of always appending the list of the first
set to the second one through unions. This way, the first set
will have a constantly growing size. Thus, we will have to rewire
1 + 2 + . . . + (n − 1) elements, leading to a worst case complexity of
O(n²) for this scenario.
Again, considering the case where there are mostly Make-Set and
Find calls, it leads to a complexity of Θ(m). Putting everything
together, we indeed get a worst case of O(m + n²).
Remark This kind of analysis is amortised complexity analysis: we don’t
make our analysis on a single operation, since we may have a really
bad case happening. However, on average, it is fine.
Forest of trees Now, let’s consider instead a much better idea. We make a forest of trees (which are
not binary), where each tree represents one set, and the root is the representative.
Also, since we are working with trees, naturally each node only points to its parent.
procedure FindSet(x):
    if x != x.p:
        x.p = FindSet(x.p)  // path compression: make x point directly to the root
    return x.p
Union For Union(x, y), we can make the root of one of the trees the child
of another.
Again, we can optimise this procedure with another great heuristic:
union by rank. For the Find(x) procedure to be efficient, we need
to keep the height of our trees as small as possible. So, the idea
is to append the tree with smallest height to the other. However,
using heights is not really efficient (since they change often and are
thus hard to keep track of), so we use ranks instead, which give the
same kind of insights.
procedure Union(x, y):
Link(FindSet(x), FindSet(y))
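The Link procedure is not written out above; a compact Python sketch of the whole forest representation, with union by rank and the path compression from FindSet (my own illustration):

class DisjointSets:
    """Forest-of-trees disjoint sets with union by rank and path compression."""
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find_set(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:   # attach the tree of smaller rank
            rx, ry = ry, rx
        self.parent[ry] = rx                # this is the Link step
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1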
Example For instance, in the following graph, we have two connected com-
ponents:
This means that our algorithm will give us two disjoint sets in the
end.
Minimum span- The goal is now that, given an undirected graph G = (V, E) and weights w(u, v) for
ning tree (MST) each edge (u, v) ∈ E, we want to output a spanning tree of minimum total weight
(a spanning tree whose sum of edge weights is the smallest).
Application: This problem can have many applications. For instance, let’s say
Communica- we have some cities between which we can make communication
tion networks lines at different costs. Finding how to connect all the cities at the
smallest cost possible is exactly an application of this problem.
Definition: Cut Let G = (V, E) be a graph. A cut (S, V \ S) is a partition of the vertices into two
non-empty disjoint sets S and V \ S.
Definition: Let G = (V, E) be a graph, and (S, V \ S) be a cut. A crossing edge is an edge
Crossing edge connecting a vertex from S to a vertex from V \ S.
Theorem: Cut Let (S, V \ S) be a cut. Also, let T be a tree on S which is part of an MST, and let e
property be a crossing edge of minimum weight.
Then, there is an MST of G containing both e and T.
We can see that starting with a subtree of an MST and adding the crossing
edge with smallest weight yields another subtree of an MST, by the
cut property.
Implementa- We need to keep track of all the crossing edges at every iteration,
tion and to be able to efficiently find the minimum crossing edge at every
iteration.
Checking out all outgoing edges is not really good since it leads to
O(E) comparisons at every iteration and thus a total running time
of O(EV ).
Let's consider a better solution. For every node w, we keep a value
dist(w) that measures the “distance” of w from the current tree (the
minimum weight of an edge connecting w to the tree). When a new node u is
added to the tree, we check whether neighbours of u have their
distance to the tree decreased and, if so, we decrease it. To extract
the minimum efficiently, we use a min-priority queue for the nodes
and their distances. In pseudocode, it looks like:
procedure Prim(G, w, r):
    let Q be an empty min-priority queue
    for each u in G.V:
        u.key = infinity
        u.pred = Nil
        Insert(Q, u)
    decreaseKey(Q, r, 0)  // set r.key to 0
    while !Q.isEmpty():
        u = extractMin(Q)
        for each v in G.Adj[u]:
            if v in Q and w(u, v) < v.key:
                v.pred = u
                decreaseKey(Q, v, w(u, v))
Analysis Initialising Q and the first for loop take O(V log(V )) time. Then,
decreasing the key of r takes O(log(V )). Finally, in the while loop,
we make V extractMin calls—leading to O(V log(V ))—and at most
E decreaseKey calls—leading to O(E log(V )).
In total, this sums up to O(E log(V )).
Kruskal’s al- Let’s consider another way to solve this problem. The idea of Kruskal’s algorithm
gorithm for finding MSTs is to start from a forest T with all nodes being in singleton trees.
Then, at each step, we greedily add the cheapest edge that does not create a cycle.
The forest will have been merged into a single tree at the end of the procedure.
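A minimal Python sketch of Kruskal's algorithm (my own illustration; edges are assumed to be given as (weight, u, v) triples, the graph is assumed connected, and a small dictionary-based union-find plays the role of the disjoint sets seen earlier):

def kruskal(vertices, edges):
    """Return a minimum spanning tree as a list of (weight, u, v) edges."""
    parent = {v: v for v in vertices}       # singleton trees (MakeSet)

    def find(x):                            # FindSet with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):      # greedily try the cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                        # adding (u, v) does not create a cycle
            parent[ru] = rv                 # Union
            tree.append((weight, u, v))
    return tree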
Analysis Initialising the result is in O(1), the first for loop represents V makeSet
calls, sorting the edges takes O(E log(E)), and the second for loop consists of O(E)
findSet and union calls. We thus get a complexity of:
O(V) + O(E log(E)) + O(E α(V)) = O(E log(E)) = O(E log(V))
We can note that, if the edges are already sorted, then we get a
complexity of O(Eα(V)), which is almost linear.
Negative-weight Note that we will try to allow negative weights, as long as there is no negative-weight
edges cycle (a cycle whose total weight is negative) reachable from the source (since then we
could just keep going around the cycle and all its nodes would have distance −∞). In fact,
one of our algorithms will allow us to detect such negative-weight cycles.
Application This can for instance be really interesting for exchange rates. Let’s
say we have some exchange rate for some given currencies. We are
wondering if we can make an infinite amount of money by trading
money to a currency, and then to another, and so on until, when
we come back to the first currency, we have made more money.
To compute this, we need to compute the product of our rates.
Since we will want to apply a shortest-path algorithm, which sums
weights, we can take the logarithm of every exchange rate: minimising
this sum of logarithms is equivalent to minimising the logarithm of
the product of the rates, and thus minimising the product of the
rates. Moreover, since we want to find the way to make the maximum
amount of money, we can just take the opposite of all our logarithms.
To sum up, we use the weights w(u, v) = − log(r(u, v)).
Now, we only need to find negative cycles: they allow us to make an
infinite amount of money.
Friday 2nd December 2022 — Lecture 20 : I like the structure of maths courses
Bellman-Ford We receive as input a directed graph with edge weights, a source s and no negative
algorithm cycle, and, for each vertex v, we want to output ℓ(v), the length of the shortest
path from s to v, and π(v) = pred(v), the predecessor of v on this shortest path (this is enough
to reconstruct the path at the end).
Note that, as the algorithm iterates, `(v) will always be the current upper estimate
of the length of the shortest path to v, and pred(v) be the predecessor of v in this
shortest path.
Algorithm The idea is to iteratively relax all edges, meaning to replace the
current path estimate by a better one whenever we find it:
procedure Relax(u, v, w):
    if u.d + w(u, v) < v.d:
        v.d = u.d + w(u, v)
        v.pred = u

// Main algorithm
for i = 1 to len(G.V) - 1:
    for each edge (u, v) in G.E:
        Relax(u, v, w)
Note that the negative cycle detection will be explained through its
proof after.
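A runnable Python sketch of the full procedure, including the initialisation and the extra |V|-th pass that detects negative cycles (my own illustration; the graph is assumed to be a list of (u, v, weight) triples):

def bellman_ford(vertices, edges, s):
    """Shortest-path distances and predecessors from s; returns None if a
    negative cycle is reachable from s."""
    dist = {v: float("inf") for v in vertices}
    pred = {v: None for v in vertices}
    dist[s] = 0

    for _ in range(len(vertices) - 1):     # |V| - 1 rounds of relaxations
        for u, v, w in edges:
            if dist[u] + w < dist[v]:      # Relax(u, v, w)
                dist[v] = dist[u] + w
                pred[v] = u

    for u, v, w in edges:                  # one extra round: any change => negative cycle
        if dist[u] + w < dist[v]:
            return None
    return dist, pred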
Analysis InitSingleSource updates ` and pred for each vertex in time Θ(V ).
Then, the nested for loops run Relax V − 1 times for each edge,
giving Θ(EV ). Finally, the last for loop runs once for each edge,
giving a time of Θ(E).
This gives us a total runtime of Θ(EV ).
Lemma: Optimal If (s, v1 , . . . , vk+1 ) is a shortest path from s to vk+1 , then (s, v1 , . . . , vk ) is a shortest
substructure path from s to vk .
Since `(v) never increases, we know that this property will hold
until the end of the procedure, which concludes our proof.
Lemma: Num- If there are no negative cycles reachable from s, then for any v there is a shortest
ber of edges in path from s to v using at most |V | − 1 edges.
shortest path
Proof Let's suppose for contradiction that a shortest path with the smallest
number of edges has |V| or more edges. By the pigeonhole principle, this
means that there exists at least one vertex that the path uses at least
twice, meaning that there is a cycle. Since the weight
of this cycle is non-negative, it can be removed without increasing
the length of the path, contradicting that it had the smallest number
of edges.
Theorem: Op- If there is no negative cycle, Bellman-Ford will return the correct answer after |V | − 1
timality of iterations.
Bellman-Ford
Proof The proof directly comes from the two previous lemmas: there
always exists a shortest path with at most |V | − 1 edges when there
is no negative cycle, and after |V | − 1 iterations we are sure that
we have found all paths with at most |V | − 1 edges.
Theorem: Detec- There is no negative cycle reachable from s if and only if the `-value of no node
tion of negative change when we run a |V |th iteration of Bellman-Ford.
cycles
Proof We already know, from the lemma bounding the number of edges in
a shortest path, that if there are no negative cycles reachable from
the source, then the ℓ-values don't change in
the nth iteration. We would now want to show that this is actually
equivalent, by showing that if the `-values of the vertices do not
change in the nth iteration, then there is no negative cycle that is
reachable from the source.
If there is no cycle, then definitely there is no negative cycle and
everything works perfectly. Let’s thus consider the case where
there are cycles. We consider any cycle, (v0 , v1 , . . . , vt−1 , vt ) with
v0 = vt . Since no `-value changed by hypothesis, we know that, by
construction of the Relax procedure, `(vi ) ≤ `(vi−1 ) + w(vi−1 , vi )
(since it is true for all edges, then it is definitely true for the ones in
our cycle). We can sum those inequalities over the cycle:
Σ_{i=1}^{t} ℓ(vi) ≤ Σ_{i=1}^{t} ( ℓ(vi−1) + w(vi−1, vi) ) = Σ_{i=1}^{t} ℓ(vi−1) + Σ_{i=1}^{t} w(vi−1, vi)
Since v0 = vt, both sides contain the same ℓ-values, which cancel out, leaving
0 ≤ Σ_{i=1}^{t} w(vi−1, vi): the weight of the cycle is non-negative.
Remark We have shown that Bellman-Ford returns true if and only if there
is no (strictly) negative cycle.
Remark If we have a DAG (directed acyclic graph), we can first use a topological sort,
followed by one pass of Bellman-Ford relaxations following this ordering. This allows
us to find a solution in O(V + E).
Dijkstra's al- Let's consider a much faster algorithm, which only works with non-negative weights.
gorithm The idea is to make a version of BFS working with weighted graphs. We start
with a set containing only the source, S = {s}. Then, we greedily grow the set S by
iteratively adding to S the vertex that is closest to S (the vertex v ∉ S that minimises
min_{u∈S} (u.d + w(u, v))). To find those vertices, we use a priority queue. At any time,
we have found the shortest path of every element in S.
Implementa- The program looks like Prim’s algorithm, but we are minimising
tion u.d + w(u, v) instead of w(u, v):
procedure Dijkstra(G, w, s):
    InitSingleSource(G, s)
    let S be an empty set
    let Q be an empty priority queue
    insert all G.V into Q
    while !Q.isEmpty():
        u = ExtractMin(Q)
        S = union(S, u)
        for each v in G.Adj[u]:
            Relax(u, v, w, Q)  // also decreases the key of v in Q
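A runnable Python sketch of the same idea (my own illustration), using the heapq module as the min-priority queue; instead of decreasing keys, stale queue entries are simply skipped, which is a common implementation choice. The graph adj[u] is assumed to be a list of (v, weight) pairs with non-negative weights.

import heapq

def dijkstra(adj, s):
    """Shortest-path distances from s, assuming all weights are non-negative."""
    dist = {u: float("inf") for u in adj}
    dist[s] = 0
    queue = [(0, s)]
    done = set()                      # the set S of finished vertices
    while queue:
        d, u = heapq.heappop(queue)   # ExtractMin
        if u in done:
            continue                  # stale entry: u was already extracted with a smaller key
        done.add(u)
        for v, w in adj[u]:
            if d + w < dist[v]:       # Relax(u, v, w)
                dist[v] = d + w
                heapq.heappush(queue, (dist[v], v))
    return dist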
Analysis Just like Prim’s algorithm, the running time is dominated by opera-
tions on the queue. If we implement it using a binary heap, each
operation takes O(log(V )) time, meaning that we have a runtime
of O(E log(V )).
Note that we can use a more careful implementation of the priority
queue, leading to a runtime of O(V log(V ) + E). This is almost as
efficient as BFS.
Proof The proof of optimality of this algorithm is considered trivial and
left as an exercise to the reader. The idea is to show the loop
invariant stating that, at the start of each iteration, we have for all
v ∈ S that the distance v.d from s to v is equal to the shortest path
from s to v. This naturally uses the assumption that all weights are
non-negative (positive or zero).
Friday 16th December 2022 — Lecture 24 : Doing fun stuff with matrices (really)
We thus hope that we could use those algorithms to make sense of:
d_m(i, j) = min_{1 ≤ k ≤ |V|} ( d_{m−1}(i, k) + w(k, j) )
We can see that it has the form of a component of a matrix resulting from a matrix
multiplication. In other words, defining Dm = (dm(i, j)) and W = (w(i, j)), and a
matrix “multiplication” A ⊙ B defined element-wise by our reinterpretation above
(sums become minimums, and products become sums), we could write:
Dm = Dm−1 ⊙ W
This might seem a bit abstract, but we are allowed to do so since (R, min, +) is a
semiring (it is not important if we don't exactly understand this explanation; we
only need to get the main idea).
In fact, since Dm = Dm−1 ⊙ W, this means that Dm = W^m (for this modified
multiplication). If we just use the naive method to compute this product, it takes |V|
“multiplications” and thus we still have Θ(|V|⁴) time. However, we can notice that we
can use fast exponentiation—squaring repeatedly and multiplying the correct
terms—giving O(|V|³ log|V|).
We may want to try to do even better. We designed all this with the idea of using
Strassen's algorithm, but sadly we cannot use it, since the minimum operation does not
have an inverse.
Application This idea of applying matrix multiplications with modified opera-
tions is a really nice one, which can be applied to many problems.
For instance, let’s say we want to solve an easier version of our
problem: transitive closure. We want to output an array t[i, j] such
that:
t[i, j] = 1, if there is a path from i to j
t[i, j] = 0, otherwise
Now, we can use the ∨ (OR) and ∧ (AND) operations instead of min and +:
({0, 1}, ∨, ∧) is not a ring, but we can still use fast matrix multiplication
in our case. This gives us a O(V^2.376 log(V)) time (which is really good).
Floyd-Warshall The idea of this algorithm is to make another dynamic program, but in a more clever
algorithm way.
Let’s consider a slightly different subproblem: let ck (u, v) be the weight of the
shortest path from u to v with intermediate vertices in {1, 2, . . . , k}. Our guess is
that our shortest path uses vertex k or not, leading to:
ck(u, v) = w(u, v),                                        if k = 0
ck(u, v) = min( ck−1(u, v), ck−1(u, k) + ck−1(k, v) ),     if k ≠ 0
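Filling this table bottom-up for k = 1, . . . , |V| gives the classic Θ(|V|³) algorithm. A minimal Python sketch (my own illustration), assuming vertices are numbered 0, . . . , n − 1 and dist is an n × n matrix with the edge weights, 0 on the diagonal and infinity where there is no edge:

def floyd_warshall(dist):
    """All-pairs shortest paths, computed in place in the matrix dist."""
    n = len(dist)
    for k in range(n):                        # allow vertex k as an intermediate vertex
        for u in range(n):
            for v in range(n):
                # either keep the old path, or go through k
                if dist[u][k] + dist[k][v] < dist[u][v]:
                    dist[u][v] = dist[u][k] + dist[k][v]
    return dist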
Johnson's al- Our new idea is based on the hope of applying Dijkstra |V| times. However, our
gorithm: Intu- graph may have negative edge weights, so we need to find a clever way to transform
ition our graph so that it only has non-negative weights, while keeping the property that
the shortest path over those weights is the shortest path over the original weights.
To do so, we first need the following lemmas.
Theorem: Let h : V → R be any function. Let's also define, for all u, v ∈ V, the
Weight trans- following modified edge weights:
formation
wh(u, v) = w(u, v) + h(u) − h(v)
where wh(u, v) are the edge weights of a modified version of our graph. Then, for
any path p from a vertex u to a vertex v:
w(p) = wh(p) − h(u) + h(v)
where wh (p) is the weight of our path p on the modified version of our graph.
Intuition In other words, if we are able to construct a new graph with weights
wh (u, v) which are all positive, then, computing a distance on
this graph, we can easily transform it to a distance on the non-
transformed graph.
In particular, if we find a path which has the shortest distance
on the modified graph, then it will also be the path with shortest
distance on our original graph.
Proof This is just a telescoping sum. Supposing that we have any path
p = (x0, x1, . . . , xk) where x0 = u and xk = v, it yields that:
wh(p) = Σ_{i=1}^{k} wh(xi−1, xi)
      = Σ_{i=1}^{k} ( w(xi−1, xi) + h(xi−1) − h(xi) )
      = ( Σ_{i=1}^{k} w(xi−1, xi) ) + h(x0) − h(xk)
      = w(p) + h(u) − h(v)
Definition: Sys- Our goal is to have a modified graph with positive edges. This means that, for all
tem of difference u, v ∈ V we want:
constraints
wh (u, v) = w(u, v) + h(u) − h(v) ≥ 0 ⇐⇒ h(v) − h(u) ≤ w(u, v)
Summing these constraints, h(vi) − h(vi−1) ≤ w(vi−1, vi), over any cycle
c = (v0, v1, . . . , vk) with v0 = vk, we get:
w(c) = Σ_{i=1}^{k} w(vi−1, vi) ≥ Σ_{i=1}^{k} ( h(vi) − h(vi−1) ) = 0
so such a function h can only exist if the graph has no negative-weight cycle.
To construct h, we add a new source vertex s linked to every vertex by an edge of
weight w(s, x) = 0, ∀x ∈ V, and we set h(v) = d(s, v), the shortest-path distance from s to v.
We want to show that h has the required property. Before doing so,
we can notice that, by the triangle inequality, taking the shortest
path from s to u and then the edge (u, v) cannot be shorter
than directly taking the shortest path from s to v:
d(s, v) ≤ d(s, u) + w(u, v) ⇐⇒ h(v) − h(u) ≤ w(u, v)
as required.
Johnson’s al- We now have all the keys to make Johnson’s algorithm, in order to find the all-pair
gorithm shortest path on a given graph.
First, we consider a new graph with a new vertex s connected to all other vertices
with edges of weight 0. We run Bellman-Ford on this modified graph from s, i.e.
to get the shortest paths from s to all vertices. We let h(v) = d(s, v) for all vertex
v ∈ V (note that this is useful if there are paths of negative weight going to v).
Second, we run |V | times Dijkstra on another graph where we let wh (u, v) = w(u, v)+
h(u) − h(v) (this can be done without modifying the graph, we just need to fake the
way Dijkstra sees weights). At the same time, when we compute the distance dh (u, v)
from a vertex u to a vertex v, we output the distance d(u, v) = dh (u, v) − h(u) + h(v).
The first Bellman-Ford takes O(VE). Then, the |V| runs of Dijkstra take
O(VE + V² log(V)). This indeed yields an algorithm running in
O(VE + V² log(V)).
Monday 5th December 2022 — Lecture 21 : Stochastic probabilistic randomness
Chapter 7
Probabilistic analysis
7.1 Introduction
Goal We see that the worst case usually does not happen, so doing average-case and
amortised analysis usually seems better. In fact, using randomness on the input
may allow us to get out of the worst case and land in the average case: randomising
the elements of an array before sorting it allows us to avoid the reverse-sorted worst case.
Also, randomness is really important in cryptography.
Hiring problem Let’s say we have n persons entering one after the other, and we want to hire tall
people. Our strategy is that, when someone comes in, we see if he or she is taller
than the tallest person we had hired before. We want to know how many people we
will have hired at the end on average.
For instance, if n = 3 people enter with sizes 2, 1, 3, then we will hire 2 and 3, and
thus have hired 2 people.
Unsatisfactory First, we notice that, in the worst case, we hire n persons, if they
answer come from the shortest to the tallest. On the contrary, in the best
case, we hire 1 person, if they come from the tallest to the shortest.
However, we realise that we should expect them to enter in uniformly
random order. Listing all n! possibilities for n = 3, we get an
expected value of:
(3 + 2 + 2 + 2 + 1 + 1)/3! = 11/6
The thing is, listing n! possibilities is really not tractable as n grows.
So, we will need to develop a more intelligent theory.
Theorem: Linear- Expected values are linear. In other words, for n random variables X1 , . . . , Xn and
ity of expected constants α1 , . . . , αn :
values
E(α1 X1 + . . . + αn Xn ) = α1 E(X1 ) + . . . + αn E(Xn )
Definition: In- Given a sample space (the space of all possible outcomes) and an event A, we define
dicator Random the indicator random variable to be:
Variables
I(A) = 1, if A occurs
I(A) = 0, if A does not occur
The key property of indicator random variables is that, for XA = I(A), we have:
E(XA) = P(A)
Example: Coin Let’s say we throw a coin n times. We want to know the expected number of heads.
flip We could compute it using the definition of expected values:
E(X) = Σ_{k=0}^{n} k · P(X = k)
This is the binomial distribution, so it works very well, but we can find a much
better way.
Let's consider a single coin. Our sample space is {H, T}, where every event has the
same probability to occur. Let us take an indicator variable XH = I(H), which
counts the number of heads in one flip. Since P(H) = 1/2, we get that E(XH) = 1/2 by
our theorem.
Now, let X be a random variable counting the number of heads in n flips, and let
Xi = I(the ith flip is heads). By the linearity of expectation, we get:
E(X) = E(X1 + . . . + Xn) = E(X1) + . . . + E(Xn) = 1/2 + . . . + 1/2 = n/2
as expected (lolz).
Back to hiring Let's consider again the hiring problem. We want to do a similar analysis as for the
problem coin flip example.
Let Hi be the event that candidate i is hired, and let Xi = I(Hi). We need to
find its expected value and, to do so, we need P(Hi). There are i! ways for the first
i candidates to come in, and only (i − 1)! of them where the ith is the tallest (we fix the
tallest to come last, and count the permutations of the (i − 1) remaining ones). Since a
candidate is hired if and only if they are the tallest so far, it gives us:
E(Xi) = P(Hi) = (i − 1)!/i! = 1/i
Thus, we find that:
E(X) = E(X1 + . . . + Xn) = E(X1) + . . . + E(Xn) = 1/1 + . . . + 1/n = Hn
We know that the harmonic partial sum Hn is log(n) + O(1). However,
the best case is 1, with probability 1/n (if the tallest is the first), and the worst case is
n, with probability 1/n! (if they come in sorted order, from shortest to tallest). By
randomising our input before hiring people, we prevent malicious users from feeding us
the worst case, and we (almost with certainty) land in the average-case scenario
of log(n) + O(1).
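As a sanity check, here is a tiny simulation (my own illustration): it estimates the expected number of hires over random arrival orders and compares it with the harmonic number Hn.

import random

def hires(heights):
    """Number of people hired: we hire whenever someone is taller than everyone before."""
    count, tallest = 0, float("-inf")
    for h in heights:
        if h > tallest:
            count, tallest = count + 1, h
    return count

n, trials = 20, 100_000
total = 0
for _ in range(trials):
    order = list(range(n))
    random.shuffle(order)                 # uniformly random arrival order
    total += hires(order)

harmonic = sum(1 / i for i in range(1, n + 1))
print(total / trials, harmonic)           # both should be close to H_20 ≈ 3.60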
Randomness Let’s say that we have a random function that returns 1 with probability p and 0
extraction with probability 1 − p. We want to generate 0 or 1 with probability 12 .
To do so, we can generate a pair of random numbers (a, b). We notice that there is
the same probability p(1 − p) to get (a, b) = (0, 1) and (a, b) = (1, 0). Thus, if we
generate new pairs until we have a ≠ b, we can then just output a, which will be 0
or 1, each with probability 1/2.
In the worst case, this method requires an infinite number of throws, but we can
compute that the expected number of pairs we need to generate is 1/(2p(1 − p)),
meaning that we are fine on average (if p is not too small and not too close to 1).
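A minimal Python sketch of this extraction trick (my own illustration; biased_bit stands in for the given biased random function):

import random

def biased_bit(p=0.7):
    """Stand-in for the given source: 1 with probability p, 0 otherwise."""
    return 1 if random.random() < p else 0

def fair_bit():
    """Generate pairs until they differ, then output the first bit of the pair."""
    while True:
        a, b = biased_bit(), biased_bit()
        if a != b:          # (0, 1) and (1, 0) are equally likely: p(1 - p) each
            return a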
Birthday lemma Let M be a finite set, q ∈ N and f : {1, 2, . . . , q} → M a function chosen uniformly
at random (meaning that f(1) is chosen uniformly at random in M, and so on until
f(q); once those values are chosen, they become fixed).
If q > 1.78·√|M|, then the probability that f is injective (meaning that there is no
collision) is at most 1/2, i.e. P(injective) ≤ 1/2.
Indeed, writing m = |M| and picking the q values one by one:
P(injective) = (m/m) · ((m − 1)/m) · · · ((m − (q − 1))/m) = 1 · (1 − 1/m) · · · (1 − (q − 1)/m)
However, we know that 1 − x ≤ e^(−x) for all x (this can be proven
easily with the Taylor series of the exponential). So:
P(injective) ≤ e^(−0) · e^(−1/m) · · · e^(−(q−1)/m) = exp( −(0 + 1 + . . . + (q − 1))/m ) = exp( −q(q − 1)/(2m) )
This is really interesting since it tells us that the probability that f
is injective is exponentially small.
This gives us an equation:
exp( −q(q − 1)/(2m) ) ≤ 1/2  ⇐⇒  q² − q ≥ 2 ln(2) m  =⇒  q ≥ (1 + √(1 + 8 ln(2) m)) / 2 ≈ 1.78 √m
A typical example of a hash function is h(k) = k mod m, for a table with m slots.
Naive table We want to design a data structure which allows us to insert, delete and search for
elements in (expected) constant time.
A first way to do it is to give a unique number to every element (such as the ISBN if we
are considering books), and to make a table with an entry for each possible
number. This indeed gives a running time of O(1) for each operation, but a space of
O(|U|). The thing is, we only have |K| ≪ |U| keys, so this is very space inefficient.
Bad hash table In our table, instead of storing an element with key k at index k, we store it at index
h(k), where h : U → {0, 1, . . . , m − 1} is a hash function (note that the array now needs
m slots). Two keys could collide, but we take m to be big enough so that this is
very unlikely. Search, insertion and deletion are then also in O(1) in the average case,
as long as there is no collision.
By the birthday lemma, for h to be injective with good probability, we need m > |K|².
This means that we would still have a lot of empty memory, which is really inefficient.
Hash table Let’s now (finally) see a correct hash table. The goal is to have a space proportional
to the number K of keys stored, i.e. Θ(K). Instead of avoiding every collision, we
will try to deal with them.
In every index we can instead store a doubly linked list. Then, searching, inserting
and deleting are also very simple to implement. To insert an element, we insert it at
the head of the linked list in the good slot, which can be done in Θ(1). To delete an
element we can remove it easily from the linked list in Θ(1), since we are given a
pointer to it. Searching however requires to search in the list at the correct index,
which takes a time proportional to the size of this list. The worst case for searching
is thus Θ(n) (if the hash function conspires against us and puts all elements in the
same slot), but we want to show that this is actually Θ(1) in average case, without
requiring m too large. This is done in the following paragraphs.
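A minimal Python sketch of such a chained table (my own illustration, using h(k) = k mod m and ordinary Python lists in place of the doubly linked lists; the Θ(1) deletion described above additionally assumes we are handed a pointer into the list):

class ChainedHashTable:
    """Hash table with chaining: slot h(k) stores the list of items hashing to it."""
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _hash(self, key):
        return key % self.m                     # h(k) = k mod m

    def insert(self, key, value):
        self.slots[self._hash(key)].append((key, value))

    def search(self, key):
        for k, v in self.slots[self._hash(key)]:  # scan only the relevant chain
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._hash(key)]
        self.slots[self._hash(key)] = [(k, v) for k, v in chain if k != key]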
Expected list Let nj denote the length of the list T [j]. By construction, we have:
size
n = n0 + n1 + . . . + nm−1
Also, we have:
E(nj) = P(h(k1) = j) + . . . + P(h(kn) = j) = 1/m + . . . + 1/m = n/m
Since this value will be important, we call it α = E(nj) = n/m.
Monday 12th December 2022 — Lecture 23 : Quantum bogosort is a comparison sort in Θ(n)
Successful search A successful search also takes expected time Θ(1 + α).
Proof Let’s say we search for a uniformly random element. However, a list
with many elements has higher probability of being chosen. This
means that we need to be careful.
Let x be the element we search, selected at random amongst all
the n elements from the table. The number of elements examined
during a successful search for x is one more than the number of
elements that appear before x in the list. By the implementation
of our table, these elements are the elements inserted after x was
inserted, which have the same hash. Thus, we need to find the
average of how many elements were inserted into x’s list after x was
inserted, over the n possibilities to take x in the table.
For i = 1, . . . , n, let xi be the ith element inserted into the table, and
let ki = key(xi). For all i and j, we define the following indicator
variable:
Xij = I(h(ki) = h(kj))
Since we are using simple uniform hashing, P(h(ki) = h(kj)) = 1/m,
and thus E(Xij) = 1/m. This tells us that the expected number of
elements examined in a successful search is:
E( (1/n) Σ_{i=1}^{n} ( 1 + Σ_{j=i+1}^{n} Xij ) ) = (1/n) Σ_{i=1}^{n} ( 1 + Σ_{j=i+1}^{n} 1/m ) = . . . = 1 + α/2 − α/(2n)
which is indeed Θ(1 + α)
Complexity of Both successful and unsuccessful searches have average complexity Θ(1 + α), where
search α = n/m. So, if we choose the size of our table to be proportional to the number of
elements stored (meaning m = Θ(n)), then search has O(1) average complexity.
This allowed us to construct a table which has insertion and deletion in O(1), and
search in expected O(1) time.
Chapter 8
Back to sorting
Naive partition- Let's consider the last element to be the pivot. We thus want to reorder our array
ing in place such that the pivot lands at index q, all elements before it are less than or
equal to it, and all elements after it are strictly greater than it.
To do so, we can use two counters: i and j. j goes through all elements one by one,
and if it finds an element less than the pivot, it will move i one forward and place
this element at position i. That way, i will always have elements less than or equal
to the pivot before it.
procedure Partition(A, p, r):
    pivot = A[r]
    i = p - 1  // will be incremented before usage
    for j = p to r-1:
        if A[j] <= pivot:
            i = i + 1
            swap(A, i, j)  // exchange A[i] with A[j]
    swap(A, i+1, r)  // place the pivot correctly by swapping A[i+1] and A[r]
    return i+1  // pivot index
Example For instance, it turns the first array to the second one:
Proof Let’s show that our partition algorithm is correct by showing the
loop invariant that all entries in A[p . . . i] are less than or equal to
the pivot, that all entries in A[(i + 1) . . . (j − 1)] are strictly greater
than the pivot, and that A[r] is always equal to the pivot.
The initial step is trivial since, before the loop starts, A[r] is the
pivot by construction, and A[p . . . i] and A[(i + 1) . . . (j − 1)] are
empty.
For the inductive step, let us split our proof into different cases, letting
x be the value of the pivot. If A[j] ≤ x, then A[j] and A[i + 1] are
swapped, and then i and j are incremented, keeping our properties
as required. If A[j] > x, then we only increment j, which is correct
too. In both cases, we don't touch the pivot A[r].
When the loop terminates, j = r, so all elements of A are partitioned
into A[p . . . i], which only has values less than or equal to the pivot,
A[(i + 1) . . . (r − 1)], which only has values strictly greater than the
pivot, and A[r], which is the pivot; this shows our loop invariant.
We can then move the pivot to the right place by swapping A[i + 1]
and A[r].
Complexity Let’s consider the time complexity of our procedure.
The for loop runs around n = r − p + 1 times. Each iteration takes
time Θ(1), meaning that the total running time is Θ(n) for an array
of length n. We can also observe that the number of comparisons
made is around n.
Naive quick sort We can now write our quick sort procedure:
procedure Quicksort(A, p, r):
    if p < r:
        q = Partition(A, p, r)
        Quicksort(A, p, q-1)
        Quicksort(A, q+1, r)
    // no need for a combine step :D
Worst case Let's consider the worst-case running time of this algorithm. If
the list is already sorted, one of our subarrays will always be empty,
giving the recurrence T(n) = T(n − 1) + Θ(n), which solves to Θ(n²):
Best case Let's now consider the best case of our algorithm. This happens
when the subarrays are completely balanced every time, meaning
that the pivot always splits the array into two subarrays of equal
size. This gives us the following recurrence:
T(n) = 2T(n/2) + Θ(n)
We have already seen it, typically for merge sort, and it can be
solved to T(n) = Θ(n log(n)) by the master theorem.
Average case Let’s now consider the average case over all possible inputs. Doing it
formally would take too much time, but let’s consider some intuition
to get why we have an average complexity of Θ(n log(n)).
First, we notice that even if Partition always produced a 9-to-1 split
(9 times more elements in one subarray than in the other), we would get
the recurrence T(n) = T(9n/10) + T(n/10) + Θ(n), which solves to Θ(n log(n)).
Also, we can notice that even if the recursion tree will not always be
good, it will usually be a mix of good and bad splits. For instance,
having a bad split (splitting n into 0 and n − 1) followed by a perfect
split takes Θ(n), and yields almost the same result as only having a
good split (which also takes Θ(n)).
Randomised There is a huge difference between the expected running time over all inputs, and the
quick sort expected running time for any input. We saw intuitively that the first is Θ(n log(n)),
and by computing a random permutation of our input array, we are able to transform
the second into the first.
However, let’s do a better strategy: instead of computing a permutation, let’s pick
randomly the pivot in the subarray we are considering.
Implementa- This modification can be changed very easily by just changing the
tion Partition procedure:
procedure RandomisedPartition(A, p, r):
i = Random(p, r)
exchange A[r] with A[i]
return Partition(A, p, r)
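Putting the pieces together, a runnable Python sketch of the in-place randomised quick sort described above (my own illustration):

import random

def quicksort(A, p=0, r=None):
    """In-place randomised quick sort of A[p..r] (inclusive bounds)."""
    if r is None:
        r = len(A) - 1
    if p < r:
        q = randomised_partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

def randomised_partition(A, p, r):
    i = random.randint(p, r)            # pick the pivot uniformly in the subarray
    A[i], A[r] = A[r], A[i]
    return partition(A, p, r)

def partition(A, p, r):
    pivot = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= pivot:               # A[p..i] holds the elements <= pivot
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]     # put the pivot at its final place
    return i + 1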
Proposition: Randomised quick sort has expected running time O(n log(n)) for any input.
Randomised
quick sort ex-
pected running
time
Proof The dominant cost of the algorithm is partitioning. Each call to
Partition has cost Θ(1 + Fi), where Fi is the number of comparisons
in its for loop. Since an element can be a pivot at most once,
Partition is called at most n times.
This means that, letting X to be the number of comparisons
performed in all calls to Partition, the total work done over
the entire execution is O(n + X). We thus want to show that
E(X) = O(n log(n)).
Let's call z1, . . . , zn the elements of A, in a way such that z1 ≤
. . . ≤ zn. Also, let Zij = {zi, zi+1, . . . , zj}, and let Cij be the event that zi
and zj are compared at some point during the execution. This happens
exactly when the first pivot chosen amongst Zij is zi or zj, so:
P(Cij) = 2/(j − i + 1)
Finally, we can put everything together:
E(X) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} P(Cij) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1) = Σ_{i=1}^{n−1} Σ_{k=1}^{n−i} 2/(k + 1) ≤ Σ_{i=1}^{n−1} O(log(n)) = O(n log(n))
as required.
Remark We have thus seen that randomised quick sort has an expected running time
O(n log(n)). Also, this algorithm is in-place and it is very efficient and easy to
implement in practice.
Proof This is trivial, since any sorting algorithm definitely needs to consider every
element of its input at least once.
Definition: Com- A comparison sorting algorithm is a sorting algorithm which only uses comparisons
parison sorting of two elements to gain order information about a sequence.
Remark All sorts we have seen so far (insertion sort, merge sort, heapsort,
and quicksort) are comparison sorts.
Theorem: Com- Any comparison sorting takes (expected time) Ω(n log(n)).
parison sorting
lower bound
Proof Any comparison sorting algorithm can be summed up by a decision
tree: each internal node is a comparison between two elements, and each leaf is one
of the n! possible output orderings. A binary tree with at least n! leaves has height at
least log₂(n!) = Ω(n log(n)), and this height is exactly the worst-case number of comparisons.
Counting sort Let's try to make a sorting algorithm running in O(n). By our theorem above, it
cannot be a comparison sort.
Let’s say that the array we receive A[1 . . . n] has values A[j] ∈ {0, 1, . . . , k} for some
k. Note that, as long as the array has integer values, we can always shift it like that
by subtracting the minimum (found in O(n)).
Now, the idea is to make an array C[0 . . . k] such that C[i] represents the number
of times the value i appears in A (it can be 0, or more). It can be constructed from
A in one pass. From this array, we can then make a new, sorted, array B[1 . . . n]
with the values of A, in another single pass.
Algorithm We can realise that adding one more step makes the algorithm
much simpler to implement. Once we have computed our array C,
we can turn it into a cumulative sum, so that C[i] represents the
number of times the values 0, . . . , i appear in A. We then notice
that this value is exactly the index at which the value i should
land in B (if we need to add it to B), i.e. B[C[i]] = i.
procedure CountingSort(A, B, n, k):
    // Initialise C
    let C[0...k] be a new array
    for i = 0 to k:
        C[i] = 0
    // Count occurrences
    for j = 1 to n:
        C[A[j]] += 1
    // Cumulative sums: C[i] is now the number of elements <= i
    for i = 1 to k:
        C[i] = C[i] + C[i-1]
    // Place every element at its final position, from right to left
    for j = n downto 1:
        B[C[A[j]]] = A[j]
        C[A[j]] -= 1
Analysis Our algorithm only consists of for loops with Θ(k) iterations and
Θ(n) iterations. This gives us a runtime complexity of Θ(n + k).
Note that, in fact, because k typically has to be very big, this is
not better than the comparison sorts we saw for arrays about which we have no
information. However, if this k is small enough, then it can be very powerful.
Remark This algorithm will not be at the exam, since we did not have time
to see it in the lecture.