MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Lecture 1 Introduction and Document Distance 6.006 Spring 2008
Course Overview
• Efficient procedures for solving problems on large inputs (Ex: entire works of Shakespeare, the human genome, a U.S. highway map)
• Scalability
Pre-requisites
• Familiarity with Python and Discrete Mathematics
Contents
The course is divided into 7 modules, each of which has a motivating problem and problem set (except for the last module). The first motivating problem is document distance: given two documents, how similar are they?
• Identical — easy to check?
To answer the above, we need to define practical metrics. Metrics are defined in terms of
word frequencies.
Definitions
1. Word : Sequence of alphanumeric characters. For example, the phrase “6.006 is fun”
has 4 words.
2. Word Frequencies: Word frequency D(w) of a given word w is the number of times
it occurs in a document D.
For example, the words and word frequencies for the above phrase (plus a few other words) are:

Word:   6   the   is   006   easy   fun
Count:  1   0     1    1     0      1
3. Distance Metric: The document distance metric is the inner product of the vectors D1
and D2 containing the word frequencies for all words in the 2 documents. Equivalently,
this is the projection of vectors D1 onto D2 or vice versa. Mathematically this is
expressed as:
D1 · D2 = Σ_w D1(w) · D2(w)    (1)
4. Angle Metric: The angle between the vectors D1 and D2 gives an indication of overlap
between the 2 documents. Mathematically this angle is expressed as:
θ(D1, D2) = arccos( (D1 · D2) / (‖D1‖ · ‖D2‖) ),    0 ≤ θ ≤ π/2
An angle metric of 0 means the two documents are identical whereas an angle metric
of π/2 implies that there are no common words.
5. Number of Words in Document: The magnitude of the vector D which contains word
frequencies of all words in the document. Mathematically this is expressed as:
N(D) = ‖D‖ = √(D · D)    (2)
So let's apply these ideas to a few Python programs and flesh them out further.
The Python code and results relevant to this section are available here. This program computes the distance between 2 documents by performing the following steps:
• Read file
• Compute θ
Ideally, we would like to run this program to compute document distances between the writings of different authors:
Experiment: Comparing the Bobsey and Lewis documents with docdist1.py gives θ = 0.574.
However, it takes approximately 3 minutes to compute this document distance, and probably
gets slower as the inputs get large.
What is wrong with the efficiency of this program?
Is it a Python vs. C issue? Is it a choice of algorithm issue — Θ(n²) versus Θ(n)?
Profiling: docdist2.py
In order to figure out why our initial program is so slow, we now “instrument” the program
so that Python will tell us where the running time is going. This can be done simply using
the profile module in Python. The profile module indicates how much time is spent in each
routine.
(See this link for details on profile).
The profile module is imported into docdist1.py and the end of the docdist1.py file is modified. The modified docdist1.py file is renamed as docdist2.py.
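A hedged sketch of that modification (the actual docdist2.py may differ; main() stands in for the program's entry point):

import profile

if __name__ == '__main__':
    # Run the whole document-distance computation under the profiler;
    # it prints how much time is spent in each routine.
    profile.run('main()')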
Detailed results of document comparisons are available here.
• Count-frequency: 44 secs
So the get_words_from_line_list operation is the culprit. The code for this particular section is:
def get_words_from_line_list(L):
    word_list = []
    for line in L:
        words_in_line = get_words_from_string(line)
        # list concatenation copies word_list every time: quadratic overall
        word_list = word_list + words_in_line
    return word_list
Solution: replace the quadratic concatenation with

word_list.extend(words_in_line)

(extend appends each element of words_in_line in place, like calling word_list.append(word) for each word — O(1) amortized per word instead of a full copy.)
Take Home Lesson: Python has powerful primitives (like concatenation of lists) built in.
To write efficient algorithms, we need to understand their costs. See Python Cost Model
for details. PS1 also has an exercise on figuring out cost of a set of operations.
We can improve further by looking for other quadratic running times hidden in our routines.
The next offender (in terms of overall computation time) is the count frequency routine,
which computes the frequency of each word, given the word list.
def count_frequency(word_list):
    """
    Return a list giving pairs of form: (word, frequency)
    """
    L = []
    for new_word in word_list:
        for entry in L:
            if new_word == entry[0]:
                entry[1] = entry[1] + 1
                break
        else:
            L.append([new_word, 1])
    return L
If the document has n words and d distinct words, this is Θ(nd); if all words are distinct, it is Θ(n²). The count_frequency routine searches linearly down the list of word/frequency pairs to find the given word, so it has quadratic running time! It turns out the count_frequency routine takes more than half of the running time in docdist3.py. Can we improve?
Dictionaries: docdist4.py
The solution to improve the Count Frequency routine lies in hashing, which gives constant
running time routines to store and retrieve key/value pairs from a table. In Python, a hash
table is called a dictionary. Documentation on dictionaries can be found here.
Modify docdist3.py to docdist4.py using dictionaries to give constant-time lookup. The modified count_frequency routine is as follows:
def count_frequency(word_list):
    """
    Return a list giving pairs of form: (word, frequency)
    """
    D = {}
    for new_word in word_list:
        if new_word in D:
            D[new_word] = D[new_word] + 1
        else:
            D[new_word] = 1
    return D.items()
Details of implementation and results are here. Running time is now Θ(n). We have successfully replaced one of our quadratic-time routines with a linear-time one, so the running time will scale better for larger inputs. For the Bobsey vs. Lewis example, running time improves from 85 secs to 42 secs.
What’s left? The two largest contributors to running time are now:
• Get words from string routine (13 secs) — version 5 of docdist fixes this with translate
• Insertion sort routine (11 secs) — version 6 of docdist fixes this with merge-sort
Lecture 2 Ver 2.0 More on Document Distance 6.006 Spring 2008
Readings
CLRS Chapter 4
Asymptotic Notation
General Idea
For any problem (or input), parametrize the problem (or input) size as n. Now consider many different problems (or inputs) of size n. Then

T(n) = worst-case running time for input size n
     = max over all inputs X of size n of the running time on X
How to make this more precise?
• Don’t care about T (n) for small n
• Don’t care about constant factors (these may come about differently with different
computers, languages, . . . )
For example, the time (or the number of steps) it takes to complete a problem of size n might be found to be T(n) = 4n² − 2n + 2 µs. From an asymptotic standpoint, since n² will dominate over the other terms as n grows large, we only care about the highest-order term. We ignore the constant coefficient preceding this highest-order term as well because we are interested in the rate of growth.
Formal Definitions
f(n) = O(g(n)) means there exist constants c > 0 and n₀ > 0 such that 0 ≤ f(n) ≤ c · g(n) for all n ≥ n₀. Symmetrically, f(n) = Ω(g(n)) means f(n) ≥ c · g(n) for all n ≥ n₀, and f(n) = Θ(g(n)) means both hold.
The following table summarizes the efficiency of our various optimizations for the Bobsey
vs. Lewis comparison problem:
The details for the version 5 (V5) optimization will not be covered in detail in this lecture.
The code, results and implementation details can be accessed at this link. The only big
obstacle that remains is to replace Insertion Sort with something faster because it takes
time Θ(n2 ) in the worst case. This will be accomplished with the Merge Sort improvement
which is discussed below.
Merge Sort
Merge Sort uses a divide/conquer/combine paradigm to scale down the complexity and
scale up the efficiency of the Insertion Sort routine.
Figure 1: Merge Sort — recursively sort the left and right halves, producing 2 sorted arrays L′ and R′ of size n/2, then merge them.
Figure 2: Two-finger merge of the sorted halves [3 4 5 7] (pointer i) and [1 2 6 9] (pointer j): repeatedly compare the two front elements, output the smaller, and advance its pointer (inc i / inc j) until both arrays are done, giving [1 2 3 4 5 6 7 9].
T(n) = 2T(n/2) + c·n
     = c·n + 2(c·(n/2) + 2(c·(n/4) + · · ·))
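A sketch of merge sort in Python (not the course's docdist6 implementation):

def merge_sort(A):
    # Base case: lists of length 0 or 1 are already sorted.
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    L = merge_sort(A[:mid])
    R = merge_sort(A[mid:])
    return merge(L, R)

def merge(L, R):
    # Two-finger merge: repeatedly take the smaller front element.
    result = []
    i = j = 0
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            result.append(L[i]); i += 1
        else:
            result.append(R[j]); j += 1
    return result + L[i:] + R[j:]   # at most one of these is nonempty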
Detailed notes on the implementation of Merge Sort and results obtained with this improvement are available here. With Merge Sort, the running time scales "nearly linearly" with the size of the input(s), as n lg(n) is "nearly linear" in n.
An Experiment
Insertion Sort: Θ(n²)
Merge Sort: Θ(n lg n) (cleanest to analyze when n = 2^i)
Built-in Sort: Θ(n lg n)
• Test merge routine: Merge Sort (in Python) takes ≈ 2.2 n lg(n) µs
The ≈20X constant-factor difference comes about because the built-in sort is written in C while Merge Sort is written in Python.
Figure 3: The merge-sort recursion tree has lg(n) + 1 levels, each doing c·n work, so T(n) = c·n(lg(n) + 1) = Θ(n lg n).
Question: When is Merge Sort (in Python), at 2n lg(n) µs, better than Insertion Sort (in C), at 0.01n² µs?
Aside: Note the 20X constant-factor difference between Insertion Sort written in Python and that written in C.
Answer: Merge Sort wins for n ≥ 2^12 = 4096.
Take Home Point: A better algorithm is much more valuable than hardware or compiler
even for modest n
See recitation for more Python Cost Model experiments of this sort . . .
Lecture 3 Ver 2.0 Scheduling and Binary Search Trees 6.006 Spring 2008
Lecture Overview
• Runway reservation system
  – Definition
  – How to solve with lists
• Binary Search Trees
  – Operations
Readings
CLRS Chapter 10 and Sections 12.1-12.3
• Request to land at time t: add t to the set R if no other landings are scheduled within 3 minutes either way.
Example
Figure 1: Timeline — now = 37; reserved landing times (x) at 41, 46, 49, 56 (mins).
Let R denote the set of reserved landing times: R = {41, 46, 49, 56}, with |R| = n.
• Request for time 44: not allowed (within 3 minutes of 46 ∈ R)
• Request for time 53: OK
• Request for time 20: not allowed (already past)
Goal: Run this system efficiently in O(lg n) time
Algorithm
init:   R = []
req(t): if t < now: return "error"
        for i in range(len(R)):
            if abs(t - R[i]) < 3: return "error"    # Θ(n)
        R.append(t)
        R = sorted(R)
land:   t = R[0]
        if t != now: return "error"
        R = R[1:]    # drop R[0] from R
Can we do better?
• Sorted list: A 3 minute check can be done in O(1). It is possible to insert new
time/plane rather than append and sort but insertion takes Θ(n) time.
• Dictionary or Python Set: Insertion is O(1) time. 3 minute check takes Ω(n) time
New Requirement
Rank(t): How many planes are scheduled to land at times ≤ t? The new requirement
necessitates a design amendment.
Figure 2: Building a BST by successive insertions, starting from the empty tree (NIL). Insert 49: it becomes the root. Insert 79: all elements > 49 go off to the right, so 79 lands in the right subtree. Insert 46: all elements < 49 go into the left subtree. Insert 41, then 64: each walks down from the root, branching left on smaller keys and right on larger, ending as a new leaf. Each insert takes time proportional to the height of the tree.
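A minimal Python sketch of BST insertion (illustrative; the course's full code, including the 3-minute check and subtree-size augmentation, is linked below):

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # Walk down from the root, branching left on smaller keys and
    # right on larger, and attach a new leaf. O(h) for height h.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root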
Figure 4: next-larger(x) — in the BST with root 49, left child 41 (whose right child is 46), and right child 79, the next-larger key after x = 46 is 49: with no right subtree, follow parent pointers until you come up from a left child.
Rank(t) cannot be solved efficiently with what we have so far, but we can augment the BST structure.
Figure 5: Augmented BST — each node keeps track of the size of its subtree, maintained during insert and delete (root 49 has size 6; its children 46 and 79 have sizes 2 and 3; leaves 43, 64, 83 have size 1). Query: what lands before 79?
Figure 6: Computing Rank(t = 79): walk down from the root and, for each node with key ≤ t (here 49 and 79), add 1 plus the size of its left subtree: 1 + 2 + 1 + 1 = 5 landings at times ≤ 79.
All the Python code for the binary search trees discussed here is available at this link.
Figure 7: Inserting the sorted sequence 43, 46, 49, 55 produces a path. The tree looks like a linked list: we have achieved O(n), not O(lg n)!
Lecture 4 Balanced Binary Search Trees 6.006 Spring 2008
Lecture Overview
• The importance of being balanced
• AVL trees
– Definition
– Balance
– Insert
Readings
CLRS Sections 13.1 and 13.2 (but a different approach: red-black trees)
Recall: each node x of a BST stores
– key
– left pointer
– right pointer
– parent pointer
See Fig. 1
Figure 1: A BST with each node labeled by its height: 41 (height 3); 20 (height 2) and 65 (height 1); 11 (0), 29 (1), 50 (0); 26 (0).
Figure 2: BST invariant at a node x — keys in the left subtree are ≤ x; keys in the right subtree are ≥ x.
• height of a node = length (# of edges) of the longest downward path to a leaf (see CLRS B.5 for details).
Figure 3: Two BSTs on the same keys — a balanced one of height ≈ lg n vs. a degenerate one of height ≈ n.
AVL Trees:
Definition
AVL trees are self-balancing binary search trees, named after their two inventors G.M. Adel'son-Vel'skii and E.M. Landis.¹
An AVL tree requires the heights of the left and right children of every node to differ by at most ±1. This is illustrated in Fig. 4.
Figure 4: The AVL property — at each node, the child subtree heights (e.g., k−1 and k) differ by at most 1.
• Each node stores its height. This is inherently a DATA STRUCTURE AUGMENTATION procedure, similar to augmenting subtree size. Alternatively, one can just store the difference in heights.
A good animation applet for AVL trees is available at this link. To compare Binary Search
Trees and AVL balancing of trees use code provided here .
¹ Original Russian article: Adelson-Velskii, G. and E. M. Landis (1962). "An algorithm for the organization of information". Proceedings of the USSR Academy of Sciences 146: 263-266. (English translation by Myron J. Ricci in Soviet Math. Doklady, 3:1259-1263, 1962.)
Balance:
Let N_h be the minimum number of nodes in an AVL tree of height h. Then

N_h = N_{h−1} + N_{h−2} + 1 > 2N_{h−2}
⇒ N_h > 2^{h/2}
⇒ h < 2 lg N_h

Alternatively: N_h is related to the Fibonacci numbers (N_h = F_{h+2} − 1), which gives the sharper bound h ≈ 1.440 lg n.
AVL Insert:
1. insert as in a simple BST
2. work your way up the tree, restoring the AVL property (and updating heights as you go).
Each step:
Figure 5: Rotations. If x is right-heavy and its right child y is right-heavy (or balanced), a single Left-Rotate(x) makes y the subtree root with x as its left child. If x is right-heavy but its right child z is left-heavy, a double rotation is needed: Right-Rotate(z) followed by Left-Rotate(x). In each case the hanging subtrees A, B, C, D are reattached so that every node's children again have heights within 1 (the labels k−2, k−1, k, k+1 in the figure track this).
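A minimal sketch of a left rotation with stored heights (parent pointers omitted for brevity; Right-Rotate is symmetric):

def height(node):
    return node.height if node is not None else -1

def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))

def left_rotate(x):
    # Rotate the edge between x and its right child y so that y takes
    # x's place and x becomes y's left child; B = y.left moves under x.
    y = x.right
    x.right = y.left
    y.left = x
    update_height(x)   # x is now deeper: recompute its height first
    update_height(y)
    return y           # new root of this subtree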
Figure 6: Example. Insert(23): 23 becomes the left child of 26; after updating heights the AVL property still holds, so we are done. Insert(55): 55 becomes the right child of 50, making node 65 unbalanced; a double rotation makes 55 the parent of 50 and 65, restoring balance.
Comment 1. In general, the process may need several rotations before an Insert is completed.
Note 1. Skip Lists and Treaps use random numbers to make decisions fast with high
probability.
Note 2. Splay Trees and Scapegoat Trees are “amortized”: adding up costs for several
operations =⇒ fast on average.
Splay Trees
Upon access (search or insert), move the node to the root by a sequence of rotations and/or double rotations (as in AVL trees). The height can be linear, but operations still take O(lg n) "on average" (amortized).
Optimality
• For BSTs, cannot do better than O(lg n) per search in worst case.
A Conjecture: Splay trees are O(best BST) for every access pattern.
• With fancier tricks, one can achieve O(lg lg u) performance for integers 1 · · · u [van Emde Boas; see 6.854 or 6.851 (Advanced Data Structures)]
Big Picture:
Abstract Data Type(ADT): interface spec.
e.g. Priority Queue:
• Q = new-empty-queue()
• Q.insert(x)
• x = Q.deletemin()
vs. Data Structure (DS): a concrete implementation (algorithms + memory layout) of an ADT.
There are many possible DSs for one ADT. One example that we will discuss much later in
the course is the “heap” priority queue.
Lecture 5 Hashing I: Chaining, Hash Functions 6.006 Spring 2008
Lecture Overview
• Dictionaries and Python
• Motivation
• Hash functions
• Chaining
Readings
CLRS Sections 11.1, 11.2, 11.3.
Dictionary Problem
Abstract Data Type (ADT): maintain a set of items, each with a key, subject to
• insert(item), delete(item), search(key)
• assume items have distinct keys (or that inserting a new one clobbers the old)
• balanced BSTs solve this in O(lg n) time per op (in addition to inexact searches like next-largest).
Python Dictionaries: D[key] = val, D[key], del D[key], key in D.
Motivation
Document Distance
• already used in
def count_frequency(word_list):
    D = {}
    for word in word_list:
        if word in D:
            D[word] += 1
        else:
            D[word] = 1
=⇒ optimal Θ(n) document distance assuming dictionary ops. take O(1) time
PS2
Figure 1: A direct-access table — an array with one slot per possible key (0, 1, 2, ...); slot k stores the key/item pair with key k, or is empty.
Problems:
1. keys must be nonnegative integers
2. a large key range =⇒ large space, e.g. one key of 2^256 is bad news.
2 Solutions:
Solution to 1: "prehash" — map keys to nonnegative integers.
• In Python: hash(object), where object is a number, string, tuple, etc., or an object implementing __hash__. Misnomer: it should be called "prehash".
• An object's key should not change while it is in the table (else you cannot find it anymore).
Solution to 2: hashing —
• Reduce the universe U of all keys (say, integers) down to a reasonable size m for the table
• hash function h: U → {0, 1, . . . , m − 1}
Figure 2: A hash function h maps keys k1, k2, k3, k4, ... from the universe U into table slots 0, 1, ..., m − 1; e.g., h(k1) = 1.
Two keys may collide (hash to the same slot). Ways to resolve collisions:
1. Chaining: TODAY
2. Open addressing: Lecture 7
Chaining
Linked list of colliding elements in each slot of table
Figure 3: Chaining — keys k1, k2, k4 from U collide (h(k1) = h(k2) = h(k4)) and are stored in that slot as a linked list k1 → k4 → k2; k3 hashes to its own slot.
• Worst case: all n keys hash to the same slot =⇒ Θ(n) per operation.
The expected performance is O(1 + α), where α = n/m is the load factor (assuming simple uniform hashing) — the 1 comes from applying the hash function and accessing the slot, and the α from searching the chain. It is actually Θ(1 + α), even for successful search (see CLRS).
Therefore, the performance is O(1) if α = O(1), i.e. m = Ω(n).
Hash Functions
Division Method:
h(k) = k mod m
• but if the keys are x, 2x, 3x, . . . (regularity) and x and m have a common divisor d, then only 1/d of the table is used. This is likely if m has a small divisor, e.g. 2 — so a prime m not too close to a power of 2 or 10 is a good choice.
Multiplication Method:
h(k) = [(a · k) mod 2^w] >> (w − r), where m = 2^r, the machine words are w bits wide, and a is an odd integer with 2^(w−1) < a < 2^w.
Good practice: choose a not too close to 2^(w−1) or 2^w.
Key Lesson: Multiplication and bit extraction are faster than division.
Figure 4: Multiplication method — multiply the w-bit key k by a, keep the low-order w bits of the product, and extract the top r of those bits as the hash value.
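A sketch of the multiplication method in Python (w, r, and the odd constant a are illustrative choices, not from the notes):

w, r = 64, 10                  # 64-bit words, table size m = 2**r
a = 0x9E3779B97F4A7C15         # an odd integer between 2**(w-1) and 2**w

def h(k):
    # Multiply, keep the low w bits, then take the top r of those bits.
    return ((a * k) & ((1 << w) - 1)) >> (w - r)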
Lecture 6 Hashing II: Table Doubling, Karp-Rabin 6.006 Spring 2008
• Amortization
• Rolling Hash
Readings
CLRS Chapter 17 and 32.2.
Recall:
Hashing with Chaining:
Figure 1: Hashing with chaining — the n keys of our set, drawn from the universe of all possible keys, map via h into a table of m slots; colliding keys (e.g., k1, k4, k2) are chained; expected chain length α = n/m; cost Θ(1 + α).
Multiplication Method: h(k) = [(a · k) mod 2^w] >> (w − r) — multiply the w-bit key k by a and extract r bits of the product (see Lecture 5).
Idea: Start the table small and grow (or shrink) it as necessary, keeping m = Θ(n).
Rehashing: growing or shrinking changes the hash function, so the table must be rebuilt from scratch: for each item in the old table, insert it into the new table — Θ(n + m) time.
How fast should the table grow?
• m += 1? =⇒ rebuild every step =⇒ n inserts cost Θ(1 + 2 + · · · + n) = Θ(n²)
• m *= 2? =⇒ rebuild only at inserts 1, 2, 4, 8, . . . =⇒ n inserts cost Θ(1 + 2 + 4 + · · · + n) = Θ(n). This is "table doubling".
Amortized Analysis
This is a common technique in data structures - like paying rent: $ 1500/month ≈ $ 50/day
• “T (n) amortized” roughly means T (n) “on average”, but averaged over all ops.
Back to Hashing:
Maintain m = Θ(n) so also support search in O(1) expected time assuming simple uniform
hashing
Delete:
• solution: when n decreases to m/4, shrink to half the size =⇒ O(1) amortized cost
for both insert and delete - analysis is harder; (see CLRS 17.4).
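A toy sketch of a chaining hash table that doubles and halves (illustrative only, not the course's implementation):

class HashTable:
    def __init__(self):
        self.m = 8                       # initial table size
        self.n = 0                       # number of stored items
        self.slots = [[] for _ in range(self.m)]

    def _rebuild(self, new_m):
        # Rehash every item into a fresh table of size new_m.
        old = self.slots
        self.m = new_m
        self.slots = [[] for _ in range(new_m)]
        for chain in old:
            for k, v in chain:
                self.slots[hash(k) % self.m].append((k, v))

    def insert(self, k, v):
        self.slots[hash(k) % self.m].append((k, v))
        self.n += 1
        if self.n > self.m:              # load factor > 1: double
            self._rebuild(2 * self.m)

    def delete(self, k):
        i = hash(k) % self.m
        chain = self.slots[i]
        self.slots[i] = [(kk, vv) for (kk, vv) in chain if kk != k]
        self.n -= len(chain) - len(self.slots[i])
        if self.m > 8 and self.n <= self.m // 4:
            self._rebuild(self.m // 2)   # table 3/4 empty: halve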
String Matching
Given two strings s and t, does s occur as a substring of t? (and if so, where and how many
times?)
E.g. s = ‘6.006’ and t = your entire INBOX (‘grep’ on UNIX)
Figure 2: String matching — slide the pattern s along the text t, comparing s against each length-|s| window of t.
Simple Algorithm: compare s against t[i : i + len(s)] for every offset i — O(|s| · |t|) time in the worst case.
Karp-Rabin Algorithm: compare hashes of s and of each window t[i : i + len(s)], and only do a full character comparison when the hashes match. With a rolling hash, updating the window's hash costs O(1) per shift.
Karp-Rabin Application:
for c in s: hs.append(c)
for c in t[:len(s)]: ht.append(c)
if hs() == ht(): ...                # possible match: verify characters
for i in range(len(s), len(t)):
    ht.skip(t[i - len(s)])          # remove the oldest character
    ht.append(t[i])                 # add the newest character
    if hs() == ht(): ...            # possible match ending at i: verify
Data Structure:
Treat the string as a multidigit number u in base a, where a denotes the alphabet size (e.g., a = 256), and maintain u mod p for a fixed prime p:
• append(c): u → u · a + c (mod p)
• skip(c): u → u − c · a^(|u|−1) (mod p)
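A sketch of such a rolling-hash data structure (the base a = 256 and the prime p are illustrative choices):

class RollingHash:
    def __init__(self, a=256, p=1000000007):
        self.a, self.p = a, p
        self.ia = pow(a, p - 2, p)   # inverse of a mod p (p prime)
        self.hash = 0                # current value of u mod p
        self.magic = 1               # a**(length-1) mod p
        self.length = 0

    def __call__(self):
        return self.hash

    def append(self, c):
        # u -> u*a + c (mod p)
        self.hash = (self.hash * self.a + ord(c)) % self.p
        if self.length:
            self.magic = (self.magic * self.a) % self.p
        self.length += 1

    def skip(self, c):
        # u -> u - c * a**(|u|-1) (mod p); assumes the window stays nonempty
        self.hash = (self.hash - ord(c) * self.magic) % self.p
        self.magic = (self.magic * self.ia) % self.p
        self.length -= 1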
Lecture 7 Hashing III: Open Addressing 6.006 Spring 2008
Lecture Overview
• Open Addressing, Probing Strategies
• Advanced Hashing
Readings
CLRS Chapter 11.4 (and 11.3.3 and 11.5 if interested)
Open Addressing
Another approach to collisions:
• no linked lists — all items are stored directly in the table, one item per slot (so m ≥ n)
Figure 1: Open addressing — items item1, item2, item3 occupy table slots themselves.
• hash function specifies order of slots to probe (try) for a key, not just one slot: (see
Fig. 2)
Insert(k, v):
    for i in xrange(m):
        if T[h(k, i)] is None:      # empty slot
            T[h(k, i)] = (k, v)     # store item
            return
    raise 'full'
Figure 2: Inserting key 496: probe h(496, 0) = 4 — occupied by 204 (collision); probe h(496, 1) = 1 — occupied by 586 (collision); probe h(496, 2) = 5 — empty, so 496 is inserted in slot 5. (Other occupied slots: 2 holds 133, 6 holds 481.)
Search(k):
    for i in xrange(m):
        if T[h(k, i)] is None:       # empty slot: end of "chain"
            return None
        elif T[h(k, i)][0] == k:     # matching key: return item
            return T[h(k, i)]
    return None                      # exhausted table
Delete(k)
• replace item with DeleteMe, which Insert treats as None but Search doesn’t
Probing Strategies
Linear Probing
h(k, i) = (h′(k) + i) mod m, where h′(k) is an ordinary hash function
• problem: clustering as consecutive group of filled slots grows, gets more likely to grow
(see Fig. 4)
Figure 4: Linear probing wraps around the table (h(k, 0), h(k, 1), h(k, 2), ..., h(k, m−1)); consecutive groups of filled slots form clusters.
• for 0.01 < α < 0.99, say, clusters of size Θ(lg n) are known to form
• for α = 1, clusters of size Θ(√n) are known to form
Double Hashing
h(k, i) = (h1(k) + i · h2(k)) mod m, where h1(k) and h2(k) are two ordinary hash functions.
• the probe sequence actually hits all slots (a permutation) if h2(k) is relatively prime to m — e.g., take m = 2^r and make h2(k) always odd
Analysis
Open addressing for n items in a table of size m has expected cost ≤ 1/(1 − α) per operation, where α = n/m (< 1), assuming uniform hashing.
Example: α = 90% =⇒ 10 expected probes.
Proof:
We always make a first probe. With probability n/m the first slot is occupied and we probe again; with probability (n−1)/(m−1) the next slot is also occupied; and so on. So

expected cost = 1 + (n/m)(1 + ((n−1)/(m−1))(1 + ((n−2)/(m−2))(· · · )))

Now (n−i)/(m−i) ≤ n/m = α for i = 0, . . . , n (≤ m). So

expected cost ≤ 1 + α(1 + α(1 + α(· · · ))) = 1 + α + α² + α³ + · · · = 1/(1 − α).
Advanced Hashing
Universal Hashing
Instead of defining one hash function, define a whole family and select one at random
• =⇒ O(1) expected time per operation without assuming simple uniform hashing!
CLRS 11.3.3
Perfect Hashing
For a static set of k items, two levels of hashing give guaranteed O(1) search with NO COLLISIONS: each first-level slot gets a second-level table of size m = k² (for its k colliding items), which with good probability is collision-free. [CLRS 11.5]
Lecture 8 Sorting I: Heaps 6.006 Spring 2008
Lecture Overview
• Review: Insertion Sort and Merge Sort
• Selection Sort
• Heaps
Readings
CLRS 2.1, 2.2, 2.3, 6.1, 6.2, 6.3 and 6.4
Sorting Review
Insertion Sort
Figure 1: Insertion sort — a Θ(n²) algorithm: repeatedly take the next key and insert it into the sorted prefix, e.g. 5 2 4 6 1 3 → 2 5 4 6 1 3 → 2 4 5 6 1 3 → · · · → 1 2 3 4 5 6.
Merge Sort
Divide n-element array into two subarrays of n/2 elements each. Recursively sort sub-arrays
using mergesort. Merge two sorted subarrays.
Figure 2: Merge sort — recursively sort L = A[1 : n/2] and R = A[n/2+1 : n], then merge them into the sorted A[1 : n] in Θ(n) time using Θ(n) auxiliary space (e.g., merging [2 4 5 7] and [1 2 3 6] gives [1 2 2 3 4 5 6 7]). Can we sort without auxiliary space?
In-Place Sorting
Numbers are re-arranged within the array A, with at most a constant number of them stored outside the array at any time.
Selection Sort
0. i = 1
1. Find the minimum element in A[i..n]
2. Swap it with A[i]
3. i = i + 1; stop if i = n
Steps 1-3 iterate n times, and step 1 takes O(n) time. Can we improve step 1 to O(lg n)?
Figure 3: Selection sort on [2 1 5 4]: with i = 1 we get [1 2 5 4], then [1 2 5 4], then [1 2 4 5] — Θ(n²) time, in-place.
Heaps: an array A visualized as a nearly complete binaryy tree.
Figure 4: The max-heap A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1] (indices 1-10) drawn as a tree: root 16; children 14, 10; then 8, 7, 9, 3; then 2, 4, 1.
Data Structure
root: A[1]
For the node with index i:
    PARENT(i) = ⌊i/2⌋
    LEFT(i) = 2i
    RIGHT(i) = 2i + 1
Note: NO POINTERS!
Max-heap property: A[PARENT(i)] ≥ A[i]. heap-size[A] ≤ length[A]: only A[1 .. heap-size] are in the heap.
MAX_HEAPIFY: O(lg n), fixes a single violation of the heap property; BUILD_MAX_HEAP: O(n), produces a max-heap from an unordered input array.
Max_Heapify(A, i):
    l ← LEFT(i)
    r ← RIGHT(i)
    if l ≤ heap-size(A) and A[l] > A[i]:
        largest ← l
    else:
        largest ← i
    if r ≤ heap-size(A) and A[r] > A[largest]:
        largest ← r
    if largest ≠ i:
        exchange A[i] and A[largest]
        MAX_HEAPIFY(A, largest)
This assumes that the trees rooted at LEFT(i) and RIGHT(i) are max-heaps. A[i] may be smaller than its children, violating the max-heap property; letting the value A[i] "float down" makes the subtree rooted at index i a max-heap.
Example
Figure 5: MAX_HEAPIFY(A, 2) with heap_size[A] = 10 on the heap rooted at 16: the violating A[2] = 4 is exchanged with A[4] = 14, then A[4] with A[9] = 8; no more calls are needed.
Lecture 9 Sorting II: Heaps 6.006 Spring 2008
Lecture Overview
• Review: Heaps and MAX HEAPIFY
• Building a Heap
• Heap Sort
Readings
CLRS 6.1-6.4
Review
Heaps:
Parent(i) = ⌊i/2⌋, Left(i) = 2i, Right(i) = 2i + 1
Max-heap property: A[Parent(i)] ≥ A[i]
Review example (heap_size(A) = 10):
• MAX_HEAPIFY(A, 2): A[2] ←→ A[4]
• then MAX_HEAPIFY(A, 4): A[4] ←→ A[9]
Figure 1: MAX_HEAPIFY(A, 2) on A = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]: the violation at A[2] = 4 floats down (swap with 14, then with 8) in O(lg n) time.
Building a Heap
Convert A[1 · · · n] into a max-heap.
Observation: elements A[⌊n/2⌋ + 1 · · · n] are all leaves of the tree and can't have children.

BUILD_MAX_HEAP(A):
    heap_size(A) = n
    for i ← ⌊n/2⌋ downto 1:
        MAX_HEAPIFY(A, i)
Figure 2: BUILD_MAX_HEAP on A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]: MAX-HEAPIFY(A, 5) — no change; MAX-HEAPIFY(A, 4) — swap A[4] and A[8]; MAX-HEAPIFY(A, 3) — swap A[3] and A[7]; MAX-HEAPIFY(A, 2) — swap A[2] and A[5], then A[5] and A[10]; MAX-HEAPIFY(A, 1) — swap A[1] with A[2], A[2] with A[4], A[4] with A[9].
Sorting Strategy (Heap Sort)
• Build a max-heap from the unordered array
• Find the maximum element, A[1]
• Swap A[n] with A[1]: the maximum is now at the end of the array
• Discard node n from the heap (decrement heap_size) and run MAX_HEAPIFY(A, 1) to fix the new root
• Repeat — n steps of O(lg n) each, so Θ(n lg n) total (a Python sketch follows)
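A compact Python sketch of heap sort (0-indexed, unlike the 1-indexed pseudocode above):

def heapsort(A):
    n = len(A)

    def max_heapify(i, size):
        # Iteratively float A[i] down within A[:size].
        while True:
            l, r = 2 * i + 1, 2 * i + 2
            largest = i
            if l < size and A[l] > A[largest]: largest = l
            if r < size and A[r] > A[largest]: largest = r
            if largest == i:
                return
            A[i], A[largest] = A[largest], A[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):   # build max-heap
        max_heapify(i, n)
    for size in range(n - 1, 0, -1):      # repeatedly move max to the end
        A[0], A[size] = A[size], A[0]
        max_heapify(0, size)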
Figure 3: Heap sort in action — swap A[1] = 16 with A[10], set heap_size = 9 (16 is no longer part of the heap), MAX_HEAPIFY(A, 1); then swap A[1] = 14 with A[9], heap_size = 8, MAX_HEAPIFY(A, 1); then 10, and so on. Note: we cannot run MAX_HEAPIFY with heap_size 10 once elements have been placed.
Priority Queues
This is an abstract data type, as it can be implemented in different ways; a heap is one implementation, supporting operations such as insert, max, and extract_max.
Lecture 10 Sorting III: Lower Bounds, Linear-Time Sorting 6.006 Spring 2008
Lecture Overview
• Sorting lower bounds
– Decision Trees
• Linear-Time Sorting
– Counting Sort
Readings
CLRS 8.1-8.4
Comparison Sorting
Insertion sort, merge sort and heap sort are all comparison sorts.
The best worst case running time we know is O(n lg n). Can we do better?
Decision-Tree Example
Figure 1: A decision tree for sorting 3 elements — internal nodes are comparisons (1:2 at the root, then 2:3 or 1:3, ...); the leaves are the 6 permutations 123, 132, 213, 231, 312, 321.
Example
Sort ⟨a1, a2, a3⟩ = ⟨9, 4, 6⟩: execution follows one root-to-leaf path — comparison 1:3 gives 9 > 6 (a1 > a3), then 2:3 gives 4 ≤ 6 (a2 ≤ a3), reaching leaf 231, i.e. the total ordering 4 ≤ 6 ≤ 9. Each leaf contains a permutation, i.e., a total ordering.
Can model execution of any comparison sort. In order to sort, we need to generate a total
ordering of elements.
Theorem
Any decision tree that can sort n elements must have height Ω(n lg n).
Proof: The tree must contain ≥ n! leaves, since there are n! possible permutations. A height-h binary tree has ≤ 2^h leaves. Thus

n! ≤ 2^h
=⇒ h ≥ lg(n!) ≥ lg((n/e)^n)    (Stirling)
       = n lg n − n lg e
       = Ω(n lg n)
Counting Sort
Intuition
Since elements are in the range {1, 2, · · · , k}, imagine collecting all the j’s such that A[j] = 1,
then the j’s such that A[j] = 2, etc.
Don’t compare elements, so it is not a comparison sort!
for i ← 1 to k:                  # Θ(k)
    C[i] = 0
for j ← 1 to n:                  # Θ(n): count occurrences
    C[A[j]] = C[A[j]] + 1
for i ← 2 to k:                  # Θ(k): prefix sums
    C[i] = C[i] + C[i−1]
for j ← n downto 1:              # Θ(n): place elements, stably
    B[C[A[j]]] = A[j]
    C[A[j]] = C[A[j]] − 1
Total: Θ(n + k)
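The same algorithm as a runnable Python sketch (values assumed to lie in 1..k):

def counting_sort(A, k):
    n = len(A)
    C = [0] * (k + 1)
    for x in A:                  # count occurrences
        C[x] += 1
    for i in range(2, k + 1):    # prefix sums: C[i] = # of elements <= i
        C[i] += C[i - 1]
    B = [None] * n
    for x in reversed(A):        # right-to-left scan keeps the sort stable
        B[C[x] - 1] = x          # 0-indexed output position
        C[x] -= 1
    return B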
Example
Figure 2: Counting sort of A = [4, 1, 3, 4, 3] with k = 4: counts C = [1, 0, 2, 2]; after prefix sums C = [1, 1, 3, 5]. Scanning A from the right: A[5] = 3 goes to B[C[3]] = B[3] and C[3] is decremented; A[4] = 4 goes to B[C[4]] = B[5] and C[4] is decremented; and so on, giving B = [1, 3, 3, 4, 4].
Lecture 11 Sorting IV: Stable Sorting, Radix Sort 6.006 Spring 2008
Lecture Overview
• Stable Sorting
• Radix Sort
• Sorting Races
Stable Sorting
Preserves input order among equal elements
Figure 1: Stability — sorting [4′ 1 3* 4 3] gives [1 3* 3 4′ 4]: equal elements preserve their input order. Counting sort is stable; merge sort is stable.
Selection Sort and Heap Sort: find the maximum element and put it at the end of the array (swap with the element at the end). NOT STABLE! — e.g., defining 2a < 2b, the input [3 2a 2b] sorts to [2b 2a 3].
Radix Sort
• Herman Hollerith card-sorting machine for 1890 census.
Sort the cards one digit (column) at a time — a card has 80 columns, with 10 punch places per column:
1. Deal the cards into 10 bins according to the current digit's punch.
2. Repeat for the next digit.
3. Gather cards bin by bin, so cards with the first place punched are on top of cards with the second place punched, etc.
MSB strategy: Cards in 9 of 10 bins must be put aside, leading to a large number of
intermediate piles
LSB strategy: Can gather sorted cards in bins appropriately to create a deck!
Example
(sort by the ones digit, then tens, then hundreds — each pass stable)

329   720   720   329
457   355   329   355
657   436   436   436
839   457   839   457
436   657   355   657
720   329   457   720
355   839   657   839
Analysis: each of the d digit-passes is a Θ(n + k) counting sort, so radix sort takes Θ(d(n + k)) time — linear in n when d is constant and k = O(n).
Quick Sort
This section is for “enrichment” only.
Divide: Partition the array into two sub-arrays around a pivot x, such that elements in the lower sub-array are ≤ x and elements in the upper sub-array are ≥ x. ← Linear time
Figure 2: Partition — pivot x with the ≤ x elements to its left and the ≥ x elements to its right.
Conquer: Recursively sort the two sub-arrays.
Combine: Trivial.
If we choose a pivot such that the two sub-arrays are roughly equal in size, T(n) = 2T(n/2) + Θ(n) = Θ(n lg n).
Sorting Races
Click here for a reference on this.
Bubble Sort: Repeatedly step through the list to be sorted, comparing adjacent items and swapping them if they are in the wrong order; repeat passes through the list until no swaps occur. Θ(n²)
Shell Sort: Improves insertion sort by comparing elements separated by gaps. Θ(n lg² n)
Lecture 12 Searching I: Graph Search & Representations 6.006 Spring 2008
• Applications
• Graph Representations
Readings
CLRS 22.1-22.3, B.4
Graph Search
Explore a graph e.g., find a path from start vertices to a desired vertex
Recall: graph G = (V, E)
Figure 1: An undirected graph and a directed graph.
Applications:
There are many.
Pocket Cube:
Consider a 2 × 2 × 2 Rubik’s cube
• Configuration Graph:
• # vertices = 8! · 3^8 = 264,539,520 (because there are 8 cubelets in arbitrary positions, and each cubelet has 3 possible twists)
• in fact, the graph has 3 connected components of equal size =⇒ only need to search in one
=⇒ 7! · 3^6 = 3,674,160
Figure 2: The "breadth-first tree" of configurations: the start configuration, the possible first moves, the configurations reachable in two steps but not one, ..., out to all # reachable configurations.
Adjacency lists:
• for each vertex u ∈ V, Adj[u] stores u's neighbors, i.e., {v ∈ V | (u, v) ∈ E} — just the outgoing edges if the graph is directed. (See Fig. 5 for an example.)
• in Python: Adj = a dictionary of list/set values; a vertex can be any hashable object (e.g., int, tuple)
Figure 5: A 3-vertex graph on {a, b, c} and its adjacency-list representation Adj.
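A sketch of this representation in Python (the particular edges are illustrative):

Adj = {
    'a': ['c'],        # neighbors (outgoing edges) of a
    'b': ['c'],
    'c': ['b'],
}
for v in Adj['a']:     # iterate over a's neighbors
    print(v)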
Object-oriented variations:
• an object per vertex u, with u.neighbors = list of adjacent vertices
Incidence Lists:
• an object per edge e, with endpoints e.a and e.b (allows storing data on edges)
The above representations are good for sparse graphs, where |E| ≪ |V|². This translates to a space requirement of Θ(V + E). (Don't bother with | · |'s inside O/Θ.)
Adjacency Matrix:
A[i][j] = 1 if (i, j) ∈ E, else 0. For the graph of Fig. 5 (1 = a, 2 = b, 3 = c):

        ( 0 0 1 )
    A = ( 1 0 1 )
        ( 0 1 0 )

Space: Θ(V²).
Implicit Graphs:
Adj(u) is a function, or u.neighbors/edges is a method =⇒ "no space" (compute just what you need now)
Breadth-first search
Figure 8: Illustrating breadth-first search — starting from s, the frontier expands outward level by level.
• initially {s}
• repeatedly advance frontier to next level, careful not to go backwards to previous level
Depth-first search
• follow one unexplored edge at a time, recursively exploring what you find and backtracking when stuck
Lecture 13 Searching II 6.006 Spring 2008
• Shortest Paths
• Depth-First Search
• Edge Classification
Readings
CLRS 22.2-22.3
Recall:
graph search: explore a graph
e.g., find a path from start vertices to a desired vertex
• for each vertex u ∈ V, Adj[u] stores u's neighbors, i.e. {v ∈ V | (u, v) ∈ E} — just the outgoing edges if directed
Figure 1: The 3-vertex example graph on {a, b, c} and its adjacency lists Adj (as in Lecture 12).
Figure 2: BFS levels spreading from s — level 0, level 1, level 2, ..., last level.
• level 0 = {s}
• build level i > 0 from level i − 1 by trying all outgoing edges, but ignoring vertices
from previous levels
BFS(V, Adj, s):
    level = {s: 0}
    parent = {s: None}
    i = 1
    frontier = [s]                  # previous level, i − 1
    while frontier:
        next = []                   # next level, i
        for u in frontier:
            for v in Adj[u]:
                if v not in level:  # not yet seen
                    level[v] = i    # = level[u] + 1
                    parent[v] = u
                    next.append(v)
        frontier = next
        i += 1
Example:
Figure 3: BFS example — frontier0 = {s}; frontier1 = {a, x}; frontier2 = {z, d, c}; frontier3 = {f, v} (x, c, d are not revisited). The labels show levels 0-3.
Analysis:
• each vertex enters the frontier at most once, and each edge is examined at most a constant number of times =⇒ O(E) time
• O(V + E) to also list the vertices unreachable from s (those never assigned a level)
"LINEAR TIME"
Shortest Paths:
• for every vertex v, fewest edges to get from s to v is
�
level[v] if v assigned level
∞ else (no path)
• parent pointers form shortest-path tree = union of such a shortest path for each v
=⇒ to find shortest path, take v, parent[v], parent[parent[v]], etc., until s (or None)
• recursively explore

DFS-visit(V, Adj, s):            # search from start vertex s
    for v in Adj[s]:             # (only sees what is reachable from s)
        if v not in parent:
            parent[v] = s
            DFS-visit(V, Adj, v)
Example:
Figure 4: DFS from sources S1 and S2 on vertices a-f, with edges classified as tree edges (the recursion itself), a back edge (to an ancestor on the recursion stack), a forward edge (to an already-finished descendant), and a cross edge (between subtrees); the numbers 1-8 give the order of traversal.
Edge Classification:
To compute this classification, keep global time counter and store time interval during
which each vertex is on recursion stack.
Analysis: each vertex is visited once and each edge examined at most twice =⇒ Θ(V + E) time.
Lecture 14 Searching III 6.006 Spring 2008
• job scheduling
• topological sort
• intractable problems
• P, NP, NP-completeness
Readings
CLRS, Sections 22.4 and 34.1-34.3 (at a high level)
Recall:
• Breadth-First Search (BFS): level by level
Job Scheduling:
Given Directed Acylic Graph (DAG), where vertices represent tasks & edges represent
dependencies, order tasks without violating dependencies
Figure 1: A DAG of tasks A-I with dependency edges; the numbers next to the vertices give one valid execution order, beginning at a source.
Attempt: explore from individual vertices — from D we find C, E, F; from G we find H — but the resulting partial orders then need to be merged, which is costly.
Topological Sort
Run DFS and output the reverse of the finishing times (the time at which all of a node's outgoing edges have been explored). A sketch follows.
Exercise: prove that no constraints are violated.
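A sketch of DFS-based topological sort in Python (vertex list V and adjacency dict Adj assumed; input assumed to be a DAG):

def topological_sort(V, Adj):
    order = []
    visited = set()

    def visit(u):
        visited.add(u)
        for v in Adj[u]:
            if v not in visited:
                visit(v)
        order.append(u)          # u finishes after all its descendants

    for s in V:
        if s not in visited:
            visit(s)
    order.reverse()              # reverse of finishing times
    return order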
Intractability
• DFS & BFS are worst-case optimal if problem is really graph search (to look at graph)
• what if graph . . .
– is implicit?
– has special structure?
– is infinite?
The first 2 characteristics (implicitness and special structure) apply to the Rubik’s Cube
problem.
The third characteristic (infiniteness) applies to the Halting Problem.
Halting Problem:
Given a computer program, does it ever halt (stop)?
UNDECIDABLE: no algorithm solves this problem (correctly in finite time on all inputs).
• decision problem = a function from binary strings to {YES, NO}; binary strings correspond ≈ to nonneg. integers, while {YES, NO} ≈ {0, 1}
• so a decision problem is an infinite string of bits, while a program is a finite string of bits: there are uncountably many problems but only countably many programs =⇒ not nearly enough programs for all problems, & each program solves only one problem
n × n × n Rubik’s cube:
n × n Chess:
Given n × n board & some configuration of pieces, can WHITE force a win?
n2 − 1 Puzzle:
Given n × n grid with n2 − 1 pieces, sort pieces by sliding (see Figure 3).
Figure 3: The n² − 1 puzzle for n = 4 (the 15-puzzle): tiles 1-15 in a 4 × 4 grid with one blank.
Tetris:
Given current board configuration & list of pieces to come, stay alive
P, NP, NP-completeness
P = all (decision) problems solvable by a polynomial (O(n^c)) time algorithm (efficient)
NP = all decision problems whose YES answers have short (polynomial-length) “proofs”
checkable by a polynomial-time algorithm
e.g., Rubik's cube and the n² − 1 puzzle: is there a solution of length ≤ k?
YES =⇒ easy-to-check short proof (the moves)
Tetris ∈ NP
but we conjecture Chess is not in NP (a winning strategy is big — exponential in n)
Lecture 15 Shortest Paths I: Intro 6.006 Spring 2008
• Weighted Graphs
• General Approach
• Negative Edges
• Optimal Substructure
Readings
CLRS, Sections 24 (Intro)
Motivation:
Shortest way to drive from A to B (Google maps “get directions”)
Weighted Graphs:
G = (V, E) with an edge-weight function w: E → R; the weight w(p) of a path p is the sum of its edge weights.
Notation: v0 →p vk means p is a path from v0 to vk. (v0) is a path from v0 to v0 of weight 0.
Definition:
The shortest-path weight from u to v is

δ(u, v) = min{ w(p) : u →p v }    if any such path exists
        = ∞                        otherwise (v unreachable from u)
Given G = (V, E), w, and a source vertex S, find δ(S, v) [and a best path] from S to each v ∈ V.
Data structures: for each vertex v, maintain d[v] = the current shortest-path estimate from S (initially ∞) and Π[v] = the predecessor of v on that path (initially NIL).
Example:
A C E
5 3
1 5 3
1
3 3
0 1 2 1 4
S
2
1 1
2 3 4
B D F
Negative-Weight Edges:
• Natural in some applications (e.g., logarithms used for weights)
• If you have negative-weight edges, you might also have negative-weight cycles =⇒ some shortest paths may become undefined!
Example:
See Figure 2
Figure 2: A graph containing a negative-weight cycle (through B, D, E, with weights summing below 0): shortest paths from S to vertices that can reach the cycle are undefined.
If negative-weight edges are present, the shortest-path algorithm should detect and report negative-weight cycles (e.g., Bellman-Ford does).
Initialize:   for v ∈ V:
                  d[v] ← ∞
                  Π[v] ← NIL
              d[S] ← 0

Main:         repeat
                  select edge (u, v) [somehow]
                  "Relax" edge (u, v):
                      if d[v] > d[u] + w(u, v):
                          d[v] ← d[u] + w(u, v)
                          Π[v] ← u
              until all edges have d[v] ≤ d[u] + w(u, v)
Complexity:
With a bad choice of relaxation order, the generic algorithm can take exponential time: on the graph of Figure 3 (vertices v0, v1, ..., v7, with a recursively repeating weight structure), the number of relaxations satisfies T(0) = 0 and T(n + 2) = 3 + 2T(n), so T(n) = Θ(2^(n/2)).
Optimal Substructure:
Theorem: Subpaths of shortest paths are shortest paths.
Let p = ⟨v0, v1, . . . , vk⟩ be a shortest path and let pij = ⟨vi, vi+1, . . . , vj⟩ for 0 ≤ i ≤ j ≤ k. Then pij is a shortest path.
Proof:
If some path p′ij were shorter than pij, we could cut pij out of p and replace it with p′ij; the result would be shorter than p. Contradiction.
Triangle Inequality:
Theorem: For all u, v, x ∈ V, δ(u, v) ≤ δ(u, x) + δ(x, v).
Proof: The shortest path from u to v is no longer than the particular u-to-v path that goes through x, whose weight is δ(u, x) + δ(x, v). (Figure 4: the direct δ(u, v) route versus the detour δ(u, x) + δ(x, v) through x.)
Lecture 16 Shortest Paths II: Bellman-Ford 6.006 Spring 2008
Lecture Overview
• Bellman-Ford algorithm
  – Analysis
  – Correctness
Recall:
Shortest path weight from u to v is δ(u, v). δ(u, v) is ∞ if v is unreachable from u, undefined
if there is a negative cycle on some path from u to v.
Figure 1: A negative-weight (-ve) cycle on a path between u and v makes δ(u, v) undefined.
Initialize:   for v ∈ V:
                  d[v] ← ∞
                  Π[v] ← NIL
              d[S] ← 0

Main:         repeat
                  select edge (u, v) [somehow]
                  "Relax" edge (u, v):
                      if d[v] > d[u] + w(u, v):
                          d[v] ← d[u] + w(u, v)
                          Π[v] ← u
              until you can't relax any more edges (or you're tired or . . . )
Complexity:
Termination: the algorithm will continually relax edges when negative cycles are present.
Figure 2: A negative cycle (weights 1, −1, −4, ...) — d[u] keeps dropping: 1, 0, −1, −2, etc.
Figure 3: Exponential relaxation ORDER — relaxing (v0, v1), then (v1, v2), then all of v2 . . . vn, then (v0, v2), then all of v2 . . . vn again, the relaxation count satisfies T(n) = 3 + 2T(n − 2), i.e. T(n) = Θ(2^(n/2)).
5-Minute 6.006
Here’s what I want you to remember from 6.006 five years after you graduate
Bellman-Ford(G, W, S)
    Initialize()
    for i = 1 to |V| − 1:
        for each edge (u, v) ∈ E:
            Relax(u, v)
    for each edge (u, v) ∈ E:
        if d[v] > d[u] + w(u, v):
            report that a negative-weight cycle exists
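A runnable Python sketch of Bellman-Ford (the edge list E and weight dict w are illustrative representation choices):

def bellman_ford(V, E, w, s):
    d = {v: float('inf') for v in V}
    parent = {v: None for v in V}
    d[s] = 0
    for _ in range(len(V) - 1):          # |V| - 1 passes
        for (u, v) in E:
            if d[v] > d[u] + w[(u, v)]:  # relax edge (u, v)
                d[v] = d[u] + w[(u, v)]
                parent[v] = u
    for (u, v) in E:                     # detection pass
        if d[v] > d[u] + w[(u, v)]:
            raise ValueError('negative-weight cycle detected')
    return d, parent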
Figure 5: A run of Bellman-Ford on a 5-vertex graph (A, B, C, D, E, with weights including −1, −2, −3, 2, 3, 4, 5, 8); the numbers in circles indicate the order in which the δ values are computed.
Theorem:
If G = (V, E) contains no negative-weight cycles, then after Bellman-Ford executes, d[v] = δ(s, v) for all v ∈ V.
Proof:
Let v ∈ V be any vertex. Consider a path p = ⟨v0, v1, . . . , vk⟩ from v0 = s to vk = v that is a shortest path with a minimum number of edges. By optimal substructure, δ(s, vi) = δ(s, vi−1) + w(vi−1, vi). By induction on i, d[vi] = δ(s, vi) after the i-th pass over the edges, since pass i relaxes (vi−1, vi) in particular. Since p has at most |V| − 1 edges, d[v] = δ(s, v) after |V| − 1 passes.
Corollary
If a value d[v] fails to converge after |V| − 1 passes, there exists a negative-weight cycle reachable from s.
Lecture 17 Shortest Paths III: Dijkstra 6.006 Spring 2008
• Dijkstra’s Algorithm
Readings
CLRS, Sections 24.2-24.3
DAGs:
Can't have negative cycles because there are no cycles at all!
1. Topologically sort the DAG. A path from u to v implies that u is before v in the linear ordering.
2. In one pass over the vertices in topologically sorted order, relax each edge that leaves each vertex.
Θ(V + E) time
Example:
Figure 1: A DAG with vertices r, s, t, x, y, z in topological order (chain weights 5, 2, 7, −1, −2, plus skip edges 6, 1, 4, 3, 2); with source s, initially d = (∞, 0, ∞, ∞, ∞, ∞).
Figure 2: After processing s: d = (∞, 0, 2, 6, ∞, ∞). After processing t, x, y: d = (∞, 0, 2, 6, 5, 3).
Dijkstra’s Algorithm
For each edge (u, v) ∈ E, assume w(u, v) ≥ 0. Maintain a set S of vertices whose final shortest-path weights have been determined. Repeatedly select u ∈ V − S with minimum shortest-path estimate, add u to S, and relax all edges out of u.
Pseudo-code
Dijkstra(G, W, s):
    Initialize(); S ← ∅
    Q ← V[G]                     # priority queue keyed on d[v]
    while Q ≠ ∅:
        u ← EXTRACT_MIN(Q)
        S ← S ∪ {u}
        for each vertex v ∈ Adj[u]:
            RELAX(u, v, w)
Recall
RELAX(u, v, w):
    if d[v] > d[u] + w(u, v):
        d[v] ← d[u] + w(u, v)
        Π[v] ← u
Example
Figure 3: Dijkstra example — a 5-vertex graph on A, B, C, D, E with source A (edge weights 10, 4, 8, 9, 1, 7, 3, 2, 2 as drawn); all non-source estimates start at ∞.
S = { }            Q = {A, B, C, D, E}
S = {A}            d = (0, ∞, ∞, ∞, ∞)
S = {A, C}         d = (0, 10, 3, ∞, ∞)   after relaxing edges from A
S = {A, C}         d = (0, 7, 3, 11, 5)   after relaxing edges from C
S = {A, C, E}      d = (0, 7, 3, 11, 5)
S = {A, C, E, B}   d = (0, 7, 3, 9, 5)    after relaxing edges from B
Complexity
Θ(V) EXTRACT_MIN operations and Θ(E) relaxations (DECREASE_KEY operations).
Array impl: O(V) per EXTRACT_MIN, O(1) per DECREASE_KEY =⇒ O(V² + E) total.
Binary min-heap: O(lg V) per EXTRACT_MIN and per DECREASE_KEY =⇒ O((V + E) lg V) total.
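A Python sketch using the standard-library heapq with "lazy deletion" in place of DECREASE_KEY (heapq has no decrease-key operation); the neighbor dict Adj and weight dict w are illustrative representation choices:

import heapq

def dijkstra(Adj, w, s):
    d = {s: 0}
    parent = {s: None}
    finished = set()                  # the set S of the pseudocode
    pq = [(0, s)]
    while pq:
        du, u = heapq.heappop(pq)
        if u in finished:
            continue                  # stale (superseded) heap entry
        finished.add(u)
        for v in Adj[u]:
            nv = du + w[(u, v)]
            if v not in d or nv < d[v]:   # relax edge (u, v)
                d[v] = nv
                parent[v] = u
                heapq.heappush(pq, (nv, v))
    return d, parent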
Lecture 18 Shortest Paths III: Dijkstra 6.006 Spring 2008
Readings
Wagner, Dorothea, and Thomas Willhalm. "Speed-Up Techniques for Shortest-Path Computations." In Lecture Notes in Computer Science: Proceedings of the 24th Annual Symposium on Theoretical Aspects of Computer Science. Berlin/Heidelberg: Springer, 2007. ISBN: 9783540709176. Read up to Section 3.2.
Initialize()
Q ← V[G]
while Q ≠ ∅:
    u ← EXTRACT_MIN(Q)    (stop if u = t!)
    for each vertex v ∈ Adj[u]:
        RELAX(u, v, w)

Observation: If only the shortest path from s to t is required, stop when t is removed from Q, i.e., when u = t.
DIJKSTRA Demo
Figure 1: A graph on A-E with edge weights 4, 19, 15, 11, 13, 7, 5. Running Dijkstra from different sources removes vertices in order: from A — C (7), E (12), B (18), D (22); from D — B (4), E (13), C (15), A (22); from E — C (5), A (12), D (13), B (16).
Bi-Directional Search
Note: Speedup techniques covered here do not change worst-case behavior, but reduce the
number of visited vertices in practice.
Figure 2: Bi-directional search — a forward search from s and a backward search from t meet in the middle.
Bi-D Search: alternate steps of the forward search (from s, estimates df, queue Qf) and the backward search (from t along reversed edges, estimates db, queue Qb). The algorithm terminates when some vertex w has been processed, i.e., deleted from the queue of both searches, Qf and Qb.
Figure 3: Example — s reaches t through u and u′ (three edges of weight 3) or through w (two edges of weight 5).
Subtlety: After the search terminates, find the node x with minimum value of df(x) + db(x); x may not be the vertex w that caused termination, as in the example above! Find the shortest path from s to x using Πf and the shortest path backwards from t to x using Πb. Note: x will have been deleted from Qf or Qb or both.
Figure 4: Running bi-directional search on the example: forward values df(s) = 0, df(u) = 3, df(w) = 5, df(u′) = 6, df(t) = 10; backward values db(t) = 0, db(u′) = 3, db(w) = 5, db(u) = 6, db(s) = 10. Vertex w is deleted from both queues, so the search terminates!
Find the minimum value of df(x) + db(x) over all vertices that have been processed in at least one search:
df(u) + db(u) = 3 + 6 = 9
df(u′) + db(u′) = 6 + 3 = 9
df(w) + db(w) = 5 + 5 = 10
So the shortest path (weight 9) goes through u and u′, not through the termination vertex w.
Goal-Directed Search or A*
Modify the edge weights with a potential function λ over vertices: w̄(u, v) = w(u, v) − λ(u) + λ(v). Edges going "downhill" toward the goal get decreased weight, and edges going uphill get increased weight, steering the search toward t.
Correctness
For any path p from s to t, w̄(p) = w(p) − λt(s) + λt(t): the intermediate potentials telescope. So shortest paths are maintained in the modified graph with the w̄ weights.
Figure 5: Any two s-to-t paths p and p′ shift by the same amount −λt(s) + λt(t), so their relative order is unchanged.
Landmarks
Pick a small set of landmarks L ⊆ V. For all u ∈ V and l ∈ L, pre-compute δ(u, l). For each l, define the potential λt^(l)(u) = δ(u, l) − δ(t, l).
CLAIM: λt^(l) is feasible.
Feasibility
w̄(u, v) = w(u, v) − λt^(l)(u) + λt^(l)(v)
        = w(u, v) − δ(u, l) + δ(t, l) + δ(v, l) − δ(t, l)
        = w(u, v) − δ(u, l) + δ(v, l) ≥ 0    by the Δ-inequality

λt(u) = max over l ∈ L of λt^(l)(u) is also feasible.
Lecture 19 Dynamic Programming I of IV 6.006 Spring 2008
• Shortest Paths
• Crazy Eights
• Guessing Viewpoint
Readings
CLRS 15
Fibonacci Numbers
F1 = F2 = 1; Fn = Fn−1 + Fn−2
Naive Algorithm
fib(n):
    if n ≤ 2: return 1
    else: return fib(n − 1) + fib(n − 2)

=⇒ T(n) = T(n − 1) + T(n − 2) + O(1) ≈ φ^n
    also T(n) ≥ 2T(n − 2) + O(1) ≥ 2^(n/2)
EXPONENTIAL — BAD!
Figure 1: The naive recursion tree — Fn calls Fn−1 and Fn−2, which in turn recompute Fn−2, Fn−3, Fn−4, ... many times over.
Simple Idea: memoize

memo = {}
fib(n):
    if n in memo: return memo[n]     # free
    if n ≤ 2: f = 1
    else: f = fib(n − 1) + fib(n − 2)
    memo[n] = f
    return f

T(n) = T(n − 1) + O(1) = O(n)
[Side Note: There is also an O(lg n)- time algorithm for Fibonacci, via different techniques]
* DP ≈ recursion + memoization
• remember (memoize) previously solved “subproblems” that make up problem
Shortest Paths
• Recursive formulation: δ(s, t) = min{ w(s, v) + δ(v, t) : (s, v) ∈ E }
(Figure 2: the first edge (s, v) out of s is "guessed", leaving the subproblem δ(v, t).)
Crazy Eights
• recursive formulation: trick(i) = 1 + max{ trick(j) : j > i and card j matches card i } — the length of the best trick starting at card i
“Guessing” Viewpoint
• what is the first card in best trick? guess!
i.e., try all possibilities & take best result
- only O(n) choices
– if you pretend you knew, solution becomes easy (using other subproblems)
– actually pay factor of O(n) to try all
Lecture 20 Dynamic Programming II of IV 6.006 Spring 2008
• Bottom-up implementation
Readings
CLRS 15
Summary
* DP ≈ “controlled brute force”
• essentially an amortization
• count each subproblem only once; after first time, costs O(1) via memoization
Figure 1: The subproblems form a DAG — bottom-up DP processes them in a topologically sorted order.
Example.
Fibonacci:
for k in range(n + 1): fib[k] = · · ·
Shortest Paths:
for k in range(n): for v in V : d[k, v, t] = · · ·
Crazy Eights:
for i in reversed(range(n)): trick[i] = · · ·
Saving space:
– Fibonacci: keep only the last two values (PS6)
– Shortest Paths: re-use the same table ∀k
Longest Common Subsequence (LCS)
• e.g., H I E R O G L Y P H O L O G Y vs. M I C H A E L A N G E L O: a longest common subsequence is H E L L O
Useful subproblems on strings:
• suffixes x[i:] — Θ(|x|) of them ⇐ use if possible
• prefixes x[:i] — Θ(|x|) of them ⇐ use if possible
• substrings x[i:j] — Θ(|x|²) of them
Idea: Combine such subproblems for x & y (suffixes and prefixes work)
LCS DP
• idea: either x[i] = y[j] and is part of an LCS, or else x[i] or y[j] (or both) is not in the LCS (with anyone)
Figure 2: The DP table c[i, j], with rows i = 0 . . . |x| and columns j = 0 . . . |y|; cell (i, j) depends on (i+1, j+1) if x[i] = y[j], and on (i+1, j) and (i, j+1) otherwise. [Linear space is possible via an antidiagonal order.]
• recursive DP:
c[i, j] = 1 + c[i + 1, j + 1]               if x[i] = y[j]
        = max(c[i + 1, j], c[i, j + 1])     otherwise
with base case c[i, j] = 0 when i = |x| or j = |y|.
• bottom-up DP:
for i in reversed(range(len(x))):
    for j in reversed(range(len(y))):
        if x[i] == y[j]:
            c[i, j] = 1 + c[i + 1, j + 1]
            parent[i, j] = (i + 1, j + 1)
        else:
            if c[i + 1, j] > c[i, j + 1]:
                c[i, j] = c[i + 1, j]
                parent[i, j] = (i + 1, j)
            else:
                c[i, j] = c[i, j + 1]
                parent[i, j] = (i, j + 1)
Recovering the LCS by following parent pointers:
lcs = []
here = (0, 0)
while c[here]:
    i, j = here
    if x[i] == y[j]:
        lcs.append(x[i])
    here = parent[here]
Lecture 21 Dynamic Programming III of IV 6.006 Spring 2008
Lecture Overview
• Parenthesization
• Knapsack
• Pseudopolynomial time
• Tetris training
Readings
CLRS 15
Summary
* DP is all about subproblems & guessing
* running time = (# of subproblems) × (time per subproblem, counting each recursive call as O(1)) =⇒ total time
Text Justification
Split text into "good" lines.
• badness(i, j) for putting words i through j − 1 on one line: ∞ if their total length > line width, otherwise a penalty that grows with the leftover space
• subproblems are suffixes of the word list: DP[i] = min over j of (badness(i, j) + DP[j]), with DP[n] = 0 — Θ(n²) time
• solution = DP[0] (a Python sketch follows)
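A sketch of this DP in Python, assuming a cube-of-leftover-space badness (the exact badness function used in lecture may differ):

def justify(words, width):
    n = len(words)

    def badness(i, j):
        # Words i..j-1 on one line: total letters plus single spaces.
        total = sum(len(w) for w in words[i:j]) + (j - i - 1)
        if total > width:
            return float('inf')
        return (width - total) ** 3

    DP = [0.0] * (n + 1)          # DP[n] = 0
    brk = [n] * (n + 1)           # where the first line after i ends
    for i in range(n - 1, -1, -1):
        DP[i], brk[i] = min(
            (badness(i, j) + DP[j], j) for j in range(i + 1, n + 1))

    lines, i = [], 0
    while i < n:                  # reconstruct the chosen line breaks
        lines.append(' '.join(words[i:brk[i]]))
        i = brk[i]
    return lines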
Parenthesization
Optimal evaluation of an associative expression A[0] · A[1] · · · A[n − 1], e.g., multiplying a chain of rectangular matrices. The order matters: for a suitable chain A B C,

(AB)C costs Θ(n²)
A(BC) costs Θ(n)

DP solution:
• subproblems are substrings of the chain
• DP[i, j] = min over k of (DP[i, k] + DP[k, j] + cost of multiplying (A[i] · · · A[k − 1]) by (A[k] · · · A[j − 1]))
• solution = DP[0, n]
• Θ(n²) subproblems with Θ(n) guesses of the split point k each =⇒ Θ(n³) time
Knapsack
A knapsack of total size S and a list of n items, each with a size si and a value vi:
• choose a subset of the items, of total size ≤ S, maximizing the total value
DP solution:
• subproblems: DP[i, X] = best value obtainable from items i, i + 1, . . . with remaining capacity X
• guess whether item i is included: DP[i, X] = max(DP[i + 1, X], vi + DP[i + 1, X − si] if si ≤ X)
• solution = DP[0, S]; running time O(nS)

Pseudopolynomial Time
O(nS) is polynomial in n and in the value S — but S needs only lg S bits to write down, so this running time is exponential in the input size. Such running times are called pseudopolynomial.
polynomial → good
exponential → bad
pseudopolynomial → so-so
Tetris Training
• given the sequence of n Tetris pieces to come and a board of small fixed width w, choose how to place each piece
• subproblems: suffixes of the piece sequence together with the current profile of column heights; guess the rotation/position of the next piece
• solution = DP[0, initial profile]
Lecture 22 Dynamic Programming IV of IV 6.006 Spring 2008
Lecture Overview
• Piano fingering
• Structural DP: trees
• Vertex cover & dominating set
Readings
CLRS 15
Piano Fingering
Given a musical piece — a sequence of n notes — choose a fingering (one of F fingers per note) of minimum total difficulty.
• define d(f, p, g, q) = difficulty of going from note p with finger f to note q with finger g (impossible transitions, e.g. f = g on two different notes, get difficulty ∞)
DP solution:
• subproblems: how to play the suffix notes[i:]
• DP[i, f] = min difficulty of playing notes[i:], given that finger f plays note[i]
• DP[i, f] = min over g of (d(f, note[i], g, note[i + 1]) + DP[i + 1, g]); answer = min over f of DP[0, f] — O(nF²) time
Structural DP: Trees
DP is not limited to sequences (prefixes/suffixes/substrings): on trees, the natural subproblems are rooted subtrees.
Figure 1: DP on Trees
Vertex Cover
Find a minimum set of vertices such that every edge is incident to ≥ 1 chosen vertex.
Figure 2: Vertex Cover — each vertex is marked YES (in the cover) or NO (not in the cover).
On a tree:
• guess whether the root v is in the cover =⇒ 2 choices
• if NO =⇒ every child of v must be in the cover
=⇒ subproblems are rooted subtrees
• time = O(n)

Dominating Set
Find a minimum set of vertices such that every vertex is either in the set or adjacent to a vertex in the set.
• DP[v] = min(1 + Σ(DP[c] for c in children[v])   (root v in the set: YES), . . . (the NO cases))
• the NO cases are subtler than for vertex cover: v may be dominated by a child chosen later, so each subtree subproblem must also record whether its root is chosen or already dominated =⇒ about 2n subproblems total (rather than the 2^n of naively guessing every vertex)
• time = O(Σ_v deg(v)) = O(E) = O(n)
Beyond
These tree techniques generalize: similar decompositions apply to graphs that are "close to" trees (∼ bounded treewidth), and for hard problems one can often trade accuracy for time — e.g., achieving a (1 + 1/k)-approximation in polynomial time ∀ constant k.
Lecture 23 Numerics I 6.006 Spring 2008
Lecture Overview
• Motivation
• Newton's method (√(a), 1/b)
• High-precision multiplication
Motivation
Irrational numbers arise even in the simplest geometry — a right triangle with unit legs has hypotenuse √2 (Figure 1). How do we compute √2 to very many digits?

√2 = 1. 414 213 562 373 095
        048 801 688 724 209
        698 078 569 671 875 . . .
Catalan Numbers
Another motivation for big-number arithmetic: counting balanced parentheses strings. The set P of balanced strings satisfies
• the empty string ε ∈ P
• if α, β ∈ P, then (α)β ∈ P
and every nonempty balanced string can be written uniquely as (α)β with α, β ∈ P. Let Cn be the number of balanced strings with n pairs of parentheses:
C0 = 1 (empty string)
Cn+1 = Σ over i = 0 . . . n of Ci · Cn−i
These grow quickly, so computing them exactly requires arbitrary-precision arithmetic.
A Geometry Problem
Figure 2: A circle of radius 500,000,000,000 (diameter 1,000,000,000,000), with BD = 1. What is AD?

AD = AC − CD = 500,000,000,000 − √(500,000,000,000² − 1)

The two terms agree in almost all of their leading digits (catastrophic cancellation), so computing AD accurately demands high-precision arithmetic.
Newton's Method
To find a root of f(x) = 0, iterate tangent-line approximations (Figure 3: at xi, follow the tangent of y = f(x) down to the x-axis to get xi+1). The tangent at (xi, f(xi)) is the line y = f(xi) + f′(xi) · (x − xi), where f′(xi) is the derivative, and xi+1 is its x-intercept:

xi+1 = xi − f(xi) / f′(xi)
Square Roots
f(x) = x² − a

χi+1 = χi − (χi² − a)/(2χi) = (χi + a/χi)/2

Example (a = 2):
χ0 = 1.000000000
χ1 = 1.500000000
χ2 = 1.416666666
χ3 = 1.414215686
χ4 = 1.414213562
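A minimal sketch of this iteration in Python:

def sqrt_newton(a, iterations=6):
    # Newton iteration for sqrt(a): x_{i+1} = (x_i + a/x_i) / 2
    x = 1.0
    for _ in range(iterations):
        x = (x + a / x) / 2
    return x

print(sqrt_newton(2))   # 1.41421356...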
High-Precision Multiplication
To multiply two n-digit numbers (radix r), split each into halves, x = x1 · r^(n/2) + x0 and y = y1 · r^(n/2) + y0, with

0 ≤ x0, x1 < r^(n/2)
0 ≤ y0, y1 < r^(n/2)
z = x · y = x1·y1 · r^n + (x0·y1 + x1·y0) · r^(n/2) + x0·y0

4 multiplications of half-sized numbers =⇒ a recursive algorithm taking Θ(n²) time.
Figure 4: Recursion trees of depth log₂ n. With 4 subproblems per node, 4T(n/2), there are 4^(log₂ n) = n^(log₂ 4) = n² leaves; with 3 subproblems, 3T(n/2), only 3^(log₂ n) = n^(log₂ 3).
The trick (this is Karatsuba's algorithm): three half-size multiplications suffice:

z0 = x0 · y0
z2 = x1 · y1
z1 = (x0 + x1) · (y0 + y1) − z0 − z2
   = x0·y1 + x1·y0
z = z2 · r^n + z1 · r^(n/2) + z0

T(n) = time to multiply two n-digit numbers
     = 3T(n/2) + Θ(n)
     = Θ(n^(log₂ 3)) = Θ(n^1.5849625···)
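A sketch of Karatsuba multiplication for Python integers (splitting on bits rather than decimal digits):

def karatsuba(x, y):
    # Base case: small operands are multiplied directly.
    if x < 10 or y < 10:
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> half, x & ((1 << half) - 1)   # x = x1*2^half + x0
    y1, y0 = y >> half, y & ((1 << half) - 1)
    z0 = karatsuba(x0, y0)
    z2 = karatsuba(x1, y1)
    z1 = karatsuba(x0 + x1, y0 + y1) - z0 - z2  # = x0*y1 + x1*y0
    return (z2 << (2 * half)) + (z1 << half) + z0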
Error Analysis of Newton's Method
Write Xn = √a (1 + εn). Then

Xn+1 = (Xn + a/Xn) / 2
     = ( √a(1 + εn) + √a/(1 + εn) ) / 2
     = √a · ( (1 + εn) + 1/(1 + εn) ) / 2
     = √a · ( (2 + 2εn + εn²) / (2(1 + εn)) )
     = √a · ( 1 + εn²/(2(1 + εn)) )

Therefore

εn+1 = εn² / (2(1 + εn))

Quadratic convergence: the number of correct digits doubles at each step.
Lecture 24 Numerics II 6.006 Spring 2008
Lecture Overview
• Review
• Division
  – algorithm
  – error analysis
• Termination
Review
Want the millionth digit of √2: compute ⌊√(2 · 10^(2d))⌋ with d = 10^6.
Compute ⌊√a⌋ via Newton's method:
χ0 = 1 (initial guess)
χi+1 = (χi + a/χi) / 2 ← iteration
Quadratic convergence: the number of correct digits doubles at each step. Each iteration requires a high-precision division — which, as shown below, reduces to high-precision multiplication.
Division
We want a high-precision value of R/b, where R is a large power of 2 (so dividing by R is easy — a shift). Apply Newton's method to

f(x) = 1/x − b/R    (zero at x = R/b)
f′(x) = −1/x²

χi+1 = χi − f(χi)/f′(χi) = χi − (1/χi − b/R) / (−1/χi²)
     = χi + χi² (1/χi − b/R)
     = 2χi − bχi²/R

Each iteration needs only a multiplication (bχi² → multiply) and a division by R (→ an easy shift).
Example
R/b = 2^16/5 = 65536/5 = 13107.2
Try the initial guess χ0 = 2^16/4 = 2^14:
χ0 = 2^14 = 16384
χ1 = 2 · (16384) − 5(16384)²/65536 = 12288
χ2 = 2 · (12288) − 5(12288)²/65536 = 13056
χ3 = 2 · (13056) − 5(13056)²/65536 = 13107
Error Analysis
χi+1 = 2χi − bχi²/R. Define χi = (R/b)(1 + εi):

χi+1 = 2(R/b)(1 + εi) − (b/R)(R/b)²(1 + εi)²
     = (R/b)( (2 + 2εi) − (1 + 2εi + εi²) )
     = (R/b)(1 − εi²) = (R/b)(1 + εi+1), where εi+1 = −εi²

Quadratic convergence again: the number of correct digits doubles at each step.
Therefore the complexity of division = the complexity of multiplication (a constant number of multiplications per iteration, with precision that doubles as we go).
Termination
In practice we iterate with integer arithmetic and floors:

χi+1 = ⌊(χi + ⌊a/χi⌋)/2⌋

Do the floors break convergence? Write the two floors as error terms α (from ⌊a/χi⌋, 0 ≤ α < 1) and β (from the outer floor, β ∈ {0, 1/2}):

χi+1 = (χi + a/χi − α)/2 − β
     = (χi + a/χi)/2 − γ, where γ = α/2 + β and 0 ≤ γ < 1

Since (a + b)/2 ≥ √(ab), we have (χi + a/χi)/2 ≥ √a; so subtracting γ < 1 keeps the iterates from ever falling more than 1 below √a. Once εi < 1, the iteration settles down and terminates quickly.
Lecture 25 Beyond 6.006 6.006 Spring 2008
8. 6.856: Randomized Algorithms (how randomness makes algorithms simpler & faster)