
MIT OpenCourseWare

http://ocw.mit.edu

6.006 Introduction to Algorithms


Spring 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Lecture 1: Introduction and the Document


Distance Problem

Course Overview
• Efficient procedures for solving problems on large inputs (Ex: entire works of Shakespeare, human genome, U.S. Highway map)

• Scalability

• Classic data structures and elementary algorithms (CLRS text)

• Real implementations in Python ⇔ Fun problem sets!

• β version of the class - feedback is welcome!

Pre-requisites
• Familiarity with Python and Discrete Mathematics

Contents
The course is divided into 7 modules - each of which has a motivating problem and problem
set (except for the last module). Modules and motivating problems are as described below:

1. Linked Data Structures: Document Distance (DD)

2. Hashing: DD, Genome Comparison

3. Sorting: Gas Simulation

4. Search: Rubik’s Cube 2 × 2 × 2

5. Shortest Paths: Caltech → MIT

6. Dynamic Programming: Stock Market



7. Numerics: √2

Document Distance Problem


Motivation

Given two documents, how similar are they?

• Identical - easy?

• Modified or related (Ex: DNA, Plagiarism, Authorship)


• Did Francis Bacon write Shakespeare’s plays?

To answer the above, we need to define practical metrics. Metrics are defined in terms of
word frequencies.

Definitions

1. Word : Sequence of alphanumeric characters. For example, the phrase “6.006 is fun”
has 4 words.

2. Word Frequencies: Word frequency D(w) of a given word w is the number of times
it occurs in a document D.
For example, the words and word frequencies for the above phrase are as below:

Word :   6   the   is   006   easy   fun
Count:   1   0     1    1     0      1

In practice, while counting, it is easy to choose some canonical ordering of words.

3. Distance Metric: The document distance metric is the inner product of the vectors D1
and D2 containing the word frequencies for all words in the 2 documents. Equivalently,
this is the projection of vectors D1 onto D2 or vice versa. Mathematically this is
expressed as:


D1 · D2 = Σw D1(w) · D2(w)    (1)

4. Angle Metric: The angle between the vectors D1 and D2 gives an indication of overlap
between the 2 documents. Mathematically this angle is expressed as:
θ(D1, D2) = arccos( (D1 · D2) / (‖D1‖ ∗ ‖D2‖) ),   where 0 ≤ θ ≤ π/2

An angle metric of 0 means the two documents are identical whereas an angle metric
of π/2 implies that there are no common words.

5. Number of Words in Document: The magnitude of the vector D which contains word
frequencies of all words in the document. Mathematically this is expressed as:


N(D) = ‖D‖ = √(D · D)    (2)

So let’s apply the ideas to a few Python programs and try to flesh out more.
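As a quick, self-contained illustration of the definitions above (a minimal sketch, not the course's docdist code; the tokenization here is deliberately crude), the angle metric between two word-frequency dictionaries can be computed as follows:

import math

def word_frequencies(text):
    # Split on whitespace and lowercase; the real docdist programs do more
    # careful alphanumeric tokenization.
    freq = {}
    for word in text.lower().split():
        freq[word] = freq.get(word, 0) + 1
    return freq

def inner_product(D1, D2):
    # D1 . D2 = sum over w of D1(w) * D2(w)
    return sum(count * D2.get(word, 0) for word, count in D1.items())

def angle(D1, D2):
    # theta(D1, D2) = arccos( D1.D2 / (||D1|| * ||D2||) )
    norm1 = math.sqrt(inner_product(D1, D1))
    norm2 = math.sqrt(inner_product(D2, D2))
    return math.acos(inner_product(D1, D2) / (norm1 * norm2))

print(angle(word_frequencies("6.006 is fun"),
            word_frequencies("6.006 is hard fun")))   # small angle: similar texts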


Document Distance in Practice


Computing Document Distance: docdist1.py

The python code and results relevant to this section are available here. This program computes
the distance between 2 documents by performing the following steps:

• Read file

• Make word list [“the”,“year”,. . . ]

• Count frequencies [[“the”,4012],[“year”,55],. . . ]

• Sort into order [[“a”,3120],[“after”,17],. . . ]

• Compute θ

Ideally, we would like to run this program to compute document distances between writings
of the following authors:

• Jules Verne - document size 25k

• Bobsey Twins - document size 268k

• Lewis and Clark - document size 1M

• Shakespeare - document size 5.5M

• Churchill - document size 10M

Experiment: Comparing the Bobsey and Lewis documents with docdist1.py gives θ = 0.574.
However, it takes approximately 3 minutes to compute this document distance, and probably
gets slower as the inputs get large.
What is wrong with the efficiency of this program?
Is it a Python vs. C issue? Is it a choice of algorithm issue - θ(n2 ) versus θ(n)?

Profiling: docdist2.py

In order to figure out why our initial program is so slow, we now “instrument” the program
so that Python will tell us where the running time is going. This can be done simply using
the profile module in Python. The profile module indicates how much time is spent in each
routine.
(See this link for details on profile).
The profile module is imported into docdist1.py and the end of the docdist1.py file is
modified. The modified docdist1.py file is renamed as docdist2.py
Detailed results of document comparisons are available here .


More on the different columns in the output displayed on that webpage:

• tottime per call(column3) is tottime(column2)/ncalls(column1)

• cumtime(column4)includes subroutine calls

• cumtime per call(column5) is cumtime(column4)/ncalls(column1)

The profiling of the Bobsey vs. Lewis document comparison is as follows:

• Total: 195 secs

• Get words from line list: 107 secs

• Count-frequency: 44 secs

• Get words from string: 13 secs

• Insertion sort: 12 secs

So the get words from line list operation is the culprit. The code for this particular section
is:

word_list = []
for line in L:
    words_in_line = get_words_from_string(line)
    word_list = word_list + words_in_line
return word_list

The bulk of the computation time is to implement

word_list = word_list + words_in_line

There isn’t anything else that takes up much computation time.

List Concatenation: docdist3.py

The problem in docdist1.py as illustrated by docdist2.py is that concatenating two lists


takes time proportional to the sum of the lengths of the two lists, since each list is copied
into the output list!
L = L1 + L2 takes time proportional to |L1| + |L2|. If we had n lines (each with one
word), the computation time would be proportional to 1 + 2 + 3 + . . . + n = n(n + 1)/2 = Θ(n²).

Solution:

word_list.extend(words_in_line)
# equivalent to: for word in words_in_line: word_list.append(word)

This ensures that L1.extend(L2) takes time proportional to |L2|.
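A hedged sketch of the fix (the real docdist3.py differs in details; line.split() stands in here for get_words_from_string):

def get_words_from_line_list(L):
    # Quadratic version: word_list + words_in_line copies the whole
    # accumulated list on every iteration.
    #     word_list = word_list + words_in_line
    #
    # Linear version: extend appends in place, so the total work is
    # proportional to the total number of words.
    word_list = []
    for line in L:
        words_in_line = line.split()   # stand-in for get_words_from_string
        word_list.extend(words_in_line)
    return word_list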


Take Home Lesson: Python has powerful primitives (like concatenation of lists) built in.
To write efficient algorithms, we need to understand their costs. See Python Cost Model
for details. PS1 also has an exercise on figuring out cost of a set of operations.

Incorporate this solution into docdist1.py - rename as docdist3.py. Implementation details


and results are available here. This modification helps run the Bobsey vs. Lewis example
in 85 secs (as opposed to the original 195 secs).

We can improve further by looking for other quadratic running times hidden in our routines.
The next offender (in terms of overall computation time) is the count frequency routine,
which computes the frequency of each word, given the word list.

Analysing Count Frequency

def count_frequency(word_list):
"""
Return a list giving pairs of form: (word,frequency)
"""
L = []
for new_word in word_list:
for entry in L:
if new_word == entry[0]:
entry[1] = entry[1] + 1
break
else:
L.append([new_word,1])
return L

If the document has n words and d distinct words, this takes Θ(nd) time. If all words are distinct, it is Θ(n²). This shows
that the count frequency routine searches linearly down the list of word/frequency pairs to
find the given word. Thus it has quadratic running time! Turns out the count frequency
routine takes more than 1/2 of the running time in docdist3.py. Can we improve?

Dictionaries: docdist4.py

The solution to improve the Count Frequency routine lies in hashing, which gives constant
running time routines to store and retrieve key/value pairs from a table. In Python, a hash
table is called a dictionary. Documentation on dictionaries can be found here.

A hash table is defined as a mapping from a domain (a finite collection of immutable things) to a range (anything). For example, D['ab'] = 2, D['the'] = 3.


Modify docdist3.py to docdist4.py using dictionaries to give constant time lookup. Modified
count frequency routine is as follows:

def count_frequency(word_list):
    """
    Return a list giving pairs of form: (word, frequency)
    """
    D = {}
    for new_word in word_list:
        if D.has_key(new_word):   # Python 2 idiom; in Python 3: if new_word in D
            D[new_word] = D[new_word] + 1
        else:
            D[new_word] = 1
    return D.items()

Details of implementation and results are here. Running time is now Θ(n). We have successfully
replaced one of our quadratic time routines with a linear-time one, so the running
time will scale better for larger inputs. For the Bobsey vs. Lewis example, running time
improves from 85 secs to 42 secs.

What’s left? The two largest contributors to running time are now:

• Get words from string routine (13 secs) — version 5 of docdist fixes this with translate

• Insertion sort routine (11 secs) — version 6 of docdist fixes this with merge-sort

More on that next time . . .


Lecture 2: More on the Document Distance


Problem
Lecture Overview
Today we will continue improving the algorithm for solving the document distance problem.
• Asymptotic Notation: Define notation precisely as we will use it to compare the
complexity and efficiency of the various algorithms for approaching a given problem
(here Document Distance).

• Document Distance Summary - place everything we did last time in perspective.

• Translate to speed up the ‘Get Words from String’ routine.

• Merge Sort instead of Insertion Sort routine

– Divide and Conquer


– Analysis of Recurrences

• Get rid of sorting altogether?

Readings
CLRS Chapter 4

Asymptotic Notation
General Idea

For any problem (or input), parametrize the problem (or input) size as n. Now consider many
different problems (or inputs) of size n. Then

T(n) = worst-case running time for input size n
     = max over all inputs X of size n of the running time on X

How to make this more precise?
• Don’t care about T (n) for small n

• Don’t care about constant factors (these may come about differently with different
computers, languages, . . . )
For example, the time (or the number of steps) it takes to complete a problem of size n
might be found to be T(n) = 4n² − 2n + 2 µs. From an asymptotic standpoint, since n²
will dominate over the other terms as n grows large, we only care about the highest-order
term. We ignore the constant coefficient preceding this highest-order term as well because
we are interested in the rate of growth.


Formal Definitions

1. Upper Bound: We say T(n) is O(g(n)) if ∃ n0, ∃ c s.t. 0 ≤ T(n) ≤ c·g(n) ∀ n ≥ n0

   Substituting 1 for n0 and 26 for c, we have 0 ≤ 4n² − 2n + 2 ≤ 26n² ∀ n ≥ 1
   ∴ 4n² − 2n + 2 = O(n²)

   Some semantics:

   • Read the ‘equal to’ sign as “is” or ∈ (belongs to a set).

   • Read the O as ‘upper bound’

2. Lower Bound: We say T(n) is Ω(g(n)) if ∃ n0, ∃ d s.t. 0 ≤ d·g(n) ≤ T(n) ∀ n ≥ n0

   Substituting 1 for n0 and 1 for d, we have 0 ≤ n² ≤ 4n² + 22n − 12 ∀ n ≥ 1
   ∴ 4n² + 22n − 12 = Ω(n²)

   Semantics:

   • Read the ‘equal to’ sign as “is” or ∈ (belongs to a set).

   • Read the Ω as ‘lower bound’

3. Order: We say T (n) is Θ(g(n)) iff T (n) = O(g(n)) and T (n) = Ω(g(n))


Semantics: Read the Θ as ‘high order term is g(n)’

Document Distance so far: Review


To compute the ‘distance’ between 2 documents, perform the following operations:

For each of the 2 files:

    Read file
    Make word list          (+ op on list)                    Θ(n²)
    Count frequencies       (double loop)                     Θ(n²)
    Sort in order           (insertion sort, double loop)     Θ(n²)

Once vectors D1, D2 are obtained:

    Compute the angle       arccos( (D1 · D2) / (‖D1‖ ∗ ‖D2‖) )    Θ(n)


The following table summarizes the efficiency of our various optimizations for the Bobsey
vs. Lewis comparison problem:

Version Optimizations Time Asymptotic


V1 initial ? ?
V2 add profiling 195 s
V3 wordlist.extend(. . . ) 84 s Θ(n2 ) → Θ(n)
V4 dictionaries in count-frequency 41 s Θ(n2 ) → Θ(n)
V5 process words rather than chars in get words from string 13 s Θ(n) → Θ(n)
V6 merge sort rather than insertion sort 6s Θ(n2 ) → Θ(n lg(n))
V6B eliminate sorting altogether 1s a Θ(n) algorithm

The details for the version 5 (V5) optimization will not be covered in detail in this lecture.
The code, results and implementation details can be accessed at this link. The only big
obstacle that remains is to replace Insertion Sort with something faster because it takes
time Θ(n2 ) in the worst case. This will be accomplished with the Merge Sort improvement
which is discussed below.

Merge Sort
Merge Sort uses a divide/conquer/combine paradigm to scale down the complexity and
scale up the efficiency of the Insertion Sort routine.

[Figure 1: Divide/Conquer/Combine Paradigm. An input array A of size n is divided into two arrays L and R of size n/2, each is sorted into L’ and R’, and the two sorted arrays of size n/2 are merged into a sorted array of size n.]


[Figure 2: “Two Finger” Algorithm for Merge. With L = [3, 4, 5, 7] and R = [1, 2, 6, 9], fingers i and j scan the two sorted arrays, always copying the smaller element and incrementing that finger, producing 1 2 3 4 5 6 7 9.]

The above operations give us the recurrence

T(n) = c1 + 2T(n/2) + c·n

where c1 is the divide cost, 2T(n/2) is the recursion, and c·n is the merge cost.
Keeping only the higher-order terms,

T(n) = 2T(n/2) + c·n
     = c·n + 2 × (c·n/2 + 2(c·(n/4) + . . .))

Detailed notes on implementation of Merge Sort and results obtained with this improvement
are available here. With Merge Sort, the running time scales “nearly linearly” with the size
of the input(s), as n lg(n) is “nearly linear” in n.
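For reference, here is a minimal merge sort in Python, written as a sketch from the description above rather than copied from the course handout:

def merge_sort(A):
    """Sort list A using divide/conquer/combine."""
    n = len(A)
    if n <= 1:
        return A
    left = merge_sort(A[:n // 2])      # divide + recurse
    right = merge_sort(A[n // 2:])
    return merge(left, right)          # combine

def merge(L, R):
    """'Two finger' merge of two sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            result.append(L[i]); i += 1
        else:
            result.append(R[j]); j += 1
    result.extend(L[i:])               # one of these is already empty
    result.extend(R[j:])
    return result

print(merge_sort([5, 4, 7, 3, 6, 1, 9, 2]))   # [1, 2, 3, 4, 5, 6, 7, 9]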

An Experiment
Insertion Sort Θ(n2 )
Merge Sort Θ(n lg(n)) if n = 2i
Built in Sort Θ(n lg(n))

• Test Merge Routine: Merge Sort (in Python) takes ≈ 2.2n lg(n) µs

• Test Insert Routine: Insertion Sort (in Python) takes ≈ 0.2n2 µs

• Built in Sort or sorted (in C) takes ≈ 0.1n lg(n) µs

The 20X constant factor difference comes about because Built in Sort is written in C while
Merge Sort is written in Python.


[Figure 3: Recursion tree for Merge Sort. Each level does c·n total work, and there are lg(n) + 1 levels including the leaves, so T(n) = c·n·(lg(n) + 1) = Θ(n lg n).]

Question: When is Merge Sort (in Python), at 2n lg(n) µs, better than Insertion Sort (in C), at 0.01n² µs?
Aside: Note the 20X constant factor difference between Insertion Sort written in Python and that written in C.
Answer: Merge Sort wins for n ≥ 2¹² = 4096.
Take Home Point: A better algorithm is much more valuable than hardware or compiler, even for modest n.

See recitation for more Python Cost Model experiments of this sort . . .


Lecture 3: Scheduling and Binary Search Trees


Lecture Overview
• Runway reservation system

– Definition
– How to solve with lists

• Binary Search Trees

– Operations

Readings
CLRS Chapter 10, 12. 1-3

Runway Reservation System


• Airport with single (very busy) runway (Boston 6 → 1)

• “Reservations” for future landings

• When plane lands, it is removed from set of pending events

• Reservation requests specify a “requested landing time” t

• Add t to the set if no other landings are scheduled within 3 minutes either way.

  – else error, don’t schedule

Example

37 41 46 49 56
time (mins)
now x x x x

Figure 1: Runway Reservation System Example

Let R denote the set of reserved landing times: R = (41, 46, 49, 56)
Request for time 44: not allowed (46 ∈ R and |44 − 46| < 3)
Request for time 53: OK
Request for time 20: not allowed (already past)
|R| = n
Goal: Run this system efficiently in O(lg n) time


Algorithm

Keep R as a sorted list.

init:   R = []
req(t): if t < now: return "error"
        for i in range(len(R)):
            if abs(t - R[i]) < 3: return "error"    # Θ(n) check
        R.append(t)
        R = sorted(R)
land:   t = R[0]
        if t != now: return "error"
        R = R[1:]    # drop R[0] from R

Can we do better?

• Sorted list: A 3 minute check can be done in O(1). It is possible to insert new
time/plane rather than append and sort but insertion takes Θ(n) time.

• Sorted array: It is possible to do binary search to find place to insert in O(lg n)


time. Actual insertion however requires shifting elements which requires Θ(n) time.

• Unsorted list/array: Search takes O(n) time

• Dictionary or Python Set: Insertion is O(1) time. 3 minute check takes Ω(n) time

What if times are in whole minutes?


Large array indexed by time does the trick. This will not work for arbitrary precision
time or verifying width slots for landing.
Key Lesson: Need fast insertion into sorted list.
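To see why a plain sorted Python list falls short, here is a small illustrative sketch (not part of the lecture code) using the standard bisect module: the binary search is O(lg n), but list.insert still shifts elements, so insertion remains Θ(n).

import bisect

def request_landing(R, t, k=3):
    """R is a sorted list of reserved times; try to add time t (sketch only)."""
    i = bisect.bisect_left(R, t)            # O(lg n) binary search
    if i > 0 and t - R[i - 1] < k:           # left neighbor too close
        return False
    if i < len(R) and R[i] - t < k:          # right neighbor too close
        return False
    R.insert(i, t)                           # correct, but Theta(n) shifting
    return True

R = [41, 46, 49, 56]
print(request_landing(R, 44))   # False: within 3 of 46
print(request_landing(R, 53))   # True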

New Requirement
Rank(t): How many planes are scheduled to land at times ≤ t? The new requirement
necessitates a design amendment.


Binary Search Trees (BST)

[Figure 2: Binary Search Tree. Starting from an empty (NIL) tree: insert 49 (the root), then 79 (all elements > 49 go off to the right, into the right subtree), then 46 (all elements < 49 go into the left subtree), then 41 and 64.]

Finding the minimum element in a BST

Key is to just go left till you cannot go left anymore.

[Figure 3: Delete-Min: finds the minimum and eliminates it. Follow left pointers from the root 49 down to 41, remove it, and 46 takes its place.]

All operations are O(h) where h is height of the BST.
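As a minimal sketch (keys only, no parent pointers or subtree sizes, and not the course's BST code), insert and find-minimum each walk one root-to-node path, hence O(h):

class BSTNode:
    """Minimal unbalanced BST sketch."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

    def insert(self, key):
        # Smaller keys go left, larger (or equal) keys go right.
        if key < self.key:
            if self.left:
                self.left.insert(key)
            else:
                self.left = BSTNode(key)
        else:
            if self.right:
                self.right.insert(key)
            else:
                self.right = BSTNode(key)

    def find_min(self):
        # Keep going left until there is no left child.
        node = self
        while node.left:
            node = node.left
        return node.key

root = BSTNode(49)
for k in (79, 46, 41, 64):
    root.insert(k)
print(root.find_min())   # 41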


Finding the next larger element

next-larger(x):
    if right child of x is not NIL: return minimum(right(x))
    else:
        y = parent(x)
        while y is not NIL and x == right(y):
            x = y; y = parent(y)
        return y

See Fig. 4 for an example. What would next-larger(46) return?

[Figure 4: next-larger(x). A BST with root 49, children 41 and 79, and 46 as the right child of 41.]

What about rank(t)?

Cannot solve it efficiently with what we have but can augment the BST structure.

[Figure 5: Augmenting the BST Structure. Keep track of subtree sizes during insert and delete: the root 49 has subtree size 6, its children 46 and 79 have sizes 2 and 3, and the leaves 43, 64, 83 have size 1. Query: what lands before 79?]

Summarizing from Fig. 5, the algorithm for augmentation is as follows:

1. Walk down tree to find desired time

2. Add in nodes that are smaller

3. Add in subtree sizes to the left

In total, this takes O(h) time.


[Figure 6: Augmentation Algorithm Example. Walking down to 79 and adding in the smaller nodes plus the sizes of the subtrees to the left gives rank 1 + 2 + 1 + 1 = 5.]

All the Python code for the Binary Search Trees discussed here are available at this link

Have we accomplished anything?


Height h of the tree should be O(lg n).

[Figure 7: Insert into BST in sorted order. Inserting 43, 46, 49, 55 in sorted order produces a single rightward path.]

The tree in Fig. 7 looks like a linked list. We have achieved O(n), not O(lg n)!!


Balanced BSTs to the rescue...more on that in the next lecture!


Lecture 4: Balanced Binary Search Trees

Lecture Overview
• The importance of being balanced

• AVL trees

– Definition
– Balance
– Insert

• Other balanced trees

• Data structures in general

Readings
CLRS Chapter 13. 1 and 13. 2 (but different approach: red-black trees)

Recall: Binary Search Trees (BSTs)


• rooted binary tree

• each node has

– key
– left pointer
– right pointer
– parent pointer

See Fig. 1

[Figure 1: Heights of nodes in a BST. The root 41 has height 3; 20 has height 2; 65 and 29 have height 1; the leaves 11, 50 and 26 have height 0.]


≤x ≥x

Figure 2: BST property

• BST property (see Fig. 2).

• height of node = length (# edges) of the longest downward path to a leaf (see CLRS B.5
  for details).

The Importance of Being Balanced:


• BSTs support insert, min, delete, rank, etc. in O(h) time, where h = height of tree
(= height of root).

• h is between lg(n) and n: Fig. 3).

vs.

Perfectly Balanced Path

Figure 3: Balancing BSTs

• balanced BST maintains h = O(lg n) ⇒ all operations run in O(lg n) time.


AVL Trees:
Definition

AVL trees are self-balancing binary search trees. These trees are named after their two
inventors G.M. Adel’son-Vel’skii and E.M. Landis 1
An AVL tree is one that requires heights of left and right children of every node to differ
by at most ±1. This is illustrated in Fig. 4)

k-1 k

Figure 4: AVL Tree Concept

In order to implement an AVL tree, follow two critical steps:

• Treat nil tree as height −1.

• Each node stores its height. This is inherently a DATA STRUCTURE AUGMENTATION
procedure, similar to augmenting subtree size. Alternatively, one can just store the
difference in heights.

A good animation applet for AVL trees is available at this link. To compare Binary Search
Trees and AVL balancing of trees use code provided here .

1
Original Russian article: Adel'son-Vel'skii, G.; E. M. Landis (1962). “An algorithm for the organization
of information”. Proceedings of the USSR Academy of Sciences 146: 263–266. (English translation by Myron
J. Ricci in Soviet Math. Doklady, 3:1259–1263, 1962.)


Balance:

The balance is the worst when every node differs by 1.


Let Nh = the minimum number of nodes in an AVL tree of height h. Then

    Nh = Nh−1 + Nh−2 + 1
       > 2 Nh−2
    ⇒ Nh > 2^(h/2)
    ⇒ h < 2 lg Nh

Alternatively:

    Nh > Fh (the h-th Fibonacci number)

In fact, Nh = Fh+2 − 1 (simple induction), and

    Fh = φ^h / √5 (rounded to the nearest integer),

where φ = (1 + √5)/2 ≈ 1.618 (the golden ratio)

    =⇒ max h ≈ log_φ(n) ≈ 1.440 lg(n)

AVL Insert:

1. insert as in simple BST

2. work your way up tree, restoring AVL property (and updating heights as you go).

Each Step:

• suppose x is lowest node violating AVL

• assume x is right-heavy (left case symmetric)

• if x’s right child is right-heavy or balanced: follow steps in Fig. 5

• else follow steps in Fig. 6

• then continue up to x’s grandparent, greatgrandparent . . .
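As an illustration (a sketch, not the course's AVL code), a left rotation might look like the following; it assumes node objects with left, right, and height fields, treats an empty tree as height −1 as above, and omits parent pointers for brevity. The symmetric right_rotate is analogous, and the insert walk calls these while moving back up the tree.

def height(node):
    # Nil trees have height -1.
    return node.height if node else -1

def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))

def left_rotate(x):
    """Rotate x's right child y up; returns the new subtree root y."""
    y = x.right
    x.right = y.left      # y's left subtree becomes x's right subtree
    y.left = x
    update_height(x)      # x is now below y, so fix its height first
    update_height(y)
    return y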


[Figure 5: AVL Insert Balancing. When x is right-heavy and its right child y is right-heavy or balanced, a single Left-Rotate(x) makes y the root of the subtree and restores the AVL property.]

[Figure 6: AVL Insert Balancing. When x is right-heavy but its right child z is left-heavy, a Right-Rotate(z) followed by a Left-Rotate(x) brings the middle node y to the top, with x and z as its children.]


Example: An example implementation of the AVL Insert process is illustrated in Fig. 7

[Figure 7: Illustration of AVL Tree Insert Process. Insert(23) creates a left-left case at x = 29, fixed by a single rotation; Insert(55) then creates a left-right case at x = 65, fixed by a double rotation.]

Comment 1. In general, process may need several rotations before an Insert is completed.

Comment 2. Delete(-min) harder but possible.


Balanced Search Trees:


There are many balanced search trees.

AVL Trees Adel’son-Velsii and Landis 1962


B-Trees/2-3-4 Trees Bayer and McCreight 1972 (see CLRS 18)
BB[α] Trees Nievergelt and Reingold 1973
Red-black Trees CLRS Chapter 13
Splay-Trees Sleator and Tarjan 1985
Skip Lists Pugh 1989
Scapegoat Trees Galperin and Rivest 1993
Treaps Seidel and Aragon 1996

Note 1. Skip Lists and Treaps use random numbers to make decisions fast with high
probability.
Note 2. Splay Trees and Scapegoat Trees are “amortized”: adding up costs for several
operations =⇒ fast on average.


Splay Trees

Upon access (search or insert), move node to root by sequence of rotations and/or double-
rotations (just like AVL trees). Height can be linear but still O(lg n) per operation “on
average” (amortized)

Note: We will see more on amortization in a couple of lectures.

Optimality

• For BSTs, cannot do better than O(lg n) per search in worst case.

• In some cases, can do better e.g.

– in-order traversal takes Θ(n) time for n elements.


– put more frequent items near root

A Conjecture: Splay trees are O(best BST) for every access pattern.

• With fancier tricks, can achieve O(lg lg u) performance for integers 1 · · · u [van Emde
  Boas; see 6.854 or 6.851 (Advanced Data Structures)]

Big Picture:
Abstract Data Type(ADT): interface spec.
e.g. Priority Queue:

• Q = new-empty-queue()
• Q.insert(x)
• x = Q.deletemin()

vs.

Data Structure (DS): algorithm for each op.

There are many possible DSs for one ADT. One example that we will discuss much later in
the course is the “heap” priority queue.


Lecture 5: Hashing I: Chaining, Hash Functions

Lecture Overview
• Dictionaries and Python

• Motivation

• Hash functions

• Chaining

• Simple uniform hashing

• “Good” hash functions

Readings
CLRS Chapter 11. 1, 11. 2, 11. 3.

Dictionary Problem
Abstract Data Type (ADT) maintains a set of items, each with a key, subject to

• insert(item): add item to set

• delete(item): remove item from set

• search(key): return item with key if it exists

• assume items have distinct keys (or that inserting new one clobbers old)

• balanced BSTs solve in O(lg n) time per op. (in addition to inexact searches like
nextlargest).

• goal: O(1) time per operation.

Python Dictionaries:

Items are (key, value) pairs, e.g. d = {'algorithms': 5, 'cool': 42}

d.items() → [('algorithms', 5), ('cool', 42)]
d['cool'] → 42
d[42] → KeyError
'cool' in d → True
42 in d → False

Python set is really dict where items are keys.


Motivation
Document Distance

• already used in

def count_frequency(word_list):
    D = {}
    for word in word_list:
        if word in D:
            D[word] += 1
        else:
            D[word] = 1
    return D

• new docdist7 uses dictionaries instead of sorting:

def inner_product(D1, D2):
    sum = 0.0
    for key in D1:
        if key in D2:
            sum += D1[key] * D2[key]
    return sum

=⇒ optimal Θ(n) document distance, assuming dictionary ops take O(1) time

PS2

How close is chimp DNA to human DNA?


= Longest common substring of two strings
e.g. ALGORITHM vs. ARITHMETIC.
Dictionaries help speed algorithms e.g. put all substrings into set, looking for duplicates
- Θ(n2 ) operations.


How do we solve the dictionary problem?


A simple approach would be a direct access table. This means items would need to be
stored in an array, indexed by key.

[Figure 1: Direct-access table. An array indexed 0, 1, 2, . . . whose slots hold (key, item) entries directly.]

Problems:

1. keys must be nonnegative integers (or using two arrays, integers)

2. large key range =⇒ large space, e.g. one key of 2^256 is bad news.

2 Solutions:

Solution 1 : map key space to integers.

• In Python: hash(object), where object is a number, string, tuple, etc., or an object
  implementing __hash__. Misnomer: should be called “prehash”.

• Ideally, x = y ⇔ hash(x) = hash(y)

• Python applies some heuristics, e.g. hash('\0B') = 64 = hash('\0\0C')

• Object’s key should not change while in table (else cannot find it anymore)

• No mutable objects like lists


Solution 2 : hashing (verb from ‘hache’ = hatchet, Germanic)

• Reduce universe U of all keys (say, integers) down to reasonable size m for table

• idea: m ≈ n, n = |K|, K = keys in dictionary

• hash function h: U → {0, 1, . . . , m − 1}

[Figure 2: Mapping keys to a table. The hash function h maps keys k1, k2, k3, k4 from the universe U of all keys into the slots 0, . . . , m − 1 of a table T, e.g. h(k1) = 1.]

• two keys ki, kj ∈ K collide if h(ki) = h(kj)

How do we deal with collisions?

There are two ways

1. Chaining: TODAY

2. Open addressing: NEXT LECTURE


Chaining
Linked list of colliding elements in each slot of table

[Figure 3: Chaining in a Hash Table. Colliding keys k1, k4, k2 with h(k1) = h(k2) = h(k4) are stored in a linked list hanging off the same slot.]

• Search must go through whole list T[h(key)]

• Worst case: all keys in k hash to same slot =⇒ Θ(n) per operation

Simple Uniform Hashing - an Assumption:


Each key is equally likely to be hashed to any slot of table, independent of where other keys
are hashed.

let n = # keys stored in the table
    m = # slots in the table
    load factor α = n/m = average # keys per slot

Expected performance of chaining: assuming simple uniform hashing

The performance is likely to be O(1 + α) - the 1 comes from applying the hash function
and access slot whereas the α comes from searching the list. It is actually Θ(1 + α), even
for successful search (see CLRS ).
Therefore, the performance is O(1) if α = O(1) i. e. m = Ω(n).


Hash Functions
Division Method:

h(k) = k mod m

• k1 and k2 collide when k1 = k2 ( mod m) i. e. when m divides | k1 − k2 |

• fine if keys you store are uniform random

• but if keys are x, 2x, 3x, . . . (regularity) and x and m have common divisor d then use
only 1/d of table. This is likely if m has a small divisor e. g. 2.

• if m = 2r then only look at r bits of key!

Good Practice: A good practice to avoid common regularities in keys is to make m a


prime number that is not close to power of 2 or 10.
Key Lesson: It is inconvenient to find a prime number, and division is slow.

Multiplication Method:

h(k) = [(a · k) mod 2^w] >> (w − r), where m = 2^r, the machine has w-bit words, and a is an odd
integer between 2^(w−1) and 2^w.
Good Practice: a not too close to 2^(w−1) or 2^w.
Key Lesson: Multiplication and bit extraction are faster than division.

[Figure 4: Multiplication Method. Multiply the w-bit key k by a and keep r bits of the product.]
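A small sketch of both hash functions in Python; the constants below are illustrative assumptions, not recommended values:

def hash_division(k, m):
    # Division method: h(k) = k mod m; m should be a prime not close to a
    # power of 2 or 10.
    return k % m

def hash_multiplication(k, a, r, w=64):
    # Multiplication method: h(k) = ((a * k) mod 2^w) >> (w - r), with
    # m = 2^r slots, w-bit words, and a an odd integer between 2^(w-1) and 2^w.
    return ((a * k) % (2 ** w)) >> (w - r)

a = 0x9E3779B97F4A7C15          # some odd integer between 2**63 and 2**64 (illustrative)
print(hash_division(123456789, 1000003))
print(hash_multiplication(123456789, a, r=10))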


Lecture 6: Hashing II: Table Doubling,


Karp-Rabin
Lecture Overview
• Table Resizing

• Amortization

• String Matching and Karp-Rabin

• Rolling Hash

Readings
CLRS Chapter 17 and 32.2.

Recall:
Hashing with Chaining:

[Figure 1: Chaining in a Hash Table. The hash function h maps all possible keys in U into a table with m slots; the n keys stored in the data structure form chains of expected size α = n/m, so the cost is Θ(1 + α).]

Multiplication Method:

h(k) = [(a · k) mod 2^w] >> (w − r)

where m = table size = 2^r,
w = number of bits in machine words,
a = odd integer between 2^(w−1) and 2^w


[Figure 2: Multiplication Method. Of the 2w-bit product a · k, the high bits are ignored (the mod 2^w), the low w − r bits are ignored (the shift), and the r bits in between, which receive lots of mixing from the carries of the sum, are kept as the hash.]

How Large should Table be?


• want m = θ(n) at all times

• don’t know how large n will get at creation

• m too small =⇒ slow; m too big =⇒ wasteful

Idea:

Start small (constant) and grow (or shrink) as necessary.

Rehashing:

To grow or shrink table hash function must change (m, r)

=⇒ must rebuild hash table from scratch


for item in old table:
insert into new table
=⇒ Θ(n + m) time = Θ(n) if m = Θ(n)


How fast to grow?

When n reaches m, say:

• m += 1?
  =⇒ rebuild every step
  =⇒ n inserts cost Θ(1 + 2 + · · · + n) = Θ(n²)

• m *= 2? m = Θ(n) still (r += 1)
  =⇒ rebuild at insertions 2^i
  =⇒ n inserts cost Θ(1 + 2 + 4 + 8 + · · · + n), where n is really the next power of 2
     = Θ(n)

• a few inserts cost linear time, but Θ(1) “on average”.

Amortized Analysis
This is a common technique in data structures - like paying rent: $ 1500/month ≈ $ 50/day

• operation has amortized cost T (n) if k operations cost ≤ k · T (n)

• “T (n) amortized” roughly means T (n) “on average”, but averaged over all ops.

• e.g. inserting into a hash table takes O(1) amortized time.
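A minimal sketch of table doubling on top of chaining (growth only; it ignores duplicate keys, deletion, and shrinking, and is not the course's implementation):

class DoublingTable:
    """Sketch of table doubling with chaining (grow only)."""
    def __init__(self):
        self.m = 8                      # start with a small constant size
        self.n = 0
        self.slots = [[] for _ in range(self.m)]

    def _rebuild(self, new_m):
        old_items = [item for slot in self.slots for item in slot]
        self.m = new_m
        self.slots = [[] for _ in range(self.m)]
        for key, value in old_items:    # Theta(n + m) rehash
            self.slots[hash(key) % self.m].append((key, value))

    def insert(self, key, value):
        if self.n == self.m:            # table full: double it
            self._rebuild(2 * self.m)
        self.slots[hash(key) % self.m].append((key, value))
        self.n += 1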

Back to Hashing:

Maintain m = Θ(n) so also support search in O(1) expected time assuming simple uniform
hashing

Delete:

Also O(1) expected time

• space can get big with respect to n e.g. n× insert, n× delete

• solution: when n decreases to m/4, shrink to half the size =⇒ O(1) amortized cost
for both insert and delete - analysis is harder; (see CLRS 17.4).

String Matching
Given two strings s and t, does s occur as a substring of t? (and if so, where and how many
times?)
E.g. s = ‘6.006’ and t = your entire INBOX (‘grep’ on UNIX)


t
s
s

Figure 3: Illustration of Simple Algorithm for the String Matching Problem

Simple Algorithm:

any(s == t[i : i + len(s)] for i in range(len(t) - len(s) + 1))

• O(|s|) time for each substring comparison
  =⇒ O(|s| · (|t| − |s|)) time
  = O(|s| · |t|), potentially quadratic

Karp-Rabin Algorithm:

• Compare h(s) == h(t[i : i + len(s)])

• If hash values match, likely so do strings

– can check s == t[i : i + len(s)] to be sure, at cost ∼ O(|s|)

– if yes, found match: done
– if no, this happened with probability < 1/|s|
  =⇒ expected cost is O(1) per i.

• need suitable hash function.

• expected time = O(| s | + | t | ·cost(h)).

– naively h(x) costs | x |


– we’ll achieve O(1)!
– idea: t[i : i + len(s)] ≈ t[i + 1 : i + 1 + len(s)].

Rolling Hash ADT


Maintain string subject to

• h(): reasonable hash function on string

• h.append(c): add letter c to end of string

• h.skip(c): remove front letter from string, assuming it is c


Karp-Rabin Application:

for c in s: hs.append(c)
for c in t[:len(s)]: ht.append(c)
if hs() == ht(): ...

This first block of code is O(| s |)

for i in range(len(s), len(t)):
    ht.skip(t[i - len(s)])
    ht.append(t[i])
    if hs() == ht(): ...

The second block of code is O(| t |)

Data Structure:

Treat string as a multidigit number u in base a where a denotes the alphabet size. E.g. 256

• h() = u mod p for prime p ≈| s | or | t | (division method)

• h stores u mod p and | u |, not u


=⇒ smaller and faster to work with (u mod p fits in one machine word)

• h.append(c): (u · a + ord(c)) mod p = [(u mod p) · a + ord(c)] mod p

• h.skip(c): [u − ord(c) · (a^(|u|−1) mod p)] mod p
  = [(u mod p) − ord(c) · (a^(|u|−1) mod p)] mod p
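A sketch of the rolling hash ADT in Python; the prime p = 10^9 + 7 is just an illustrative choice here, whereas the notes suggest p near |s| or |t|:

class RollingHash:
    """Stores u mod p and |u|, never u itself."""
    def __init__(self, base=256, p=1000000007):
        self.base = base      # alphabet size a
        self.p = p            # a prime
        self.hash = 0         # u mod p
        self.size = 0         # |u|, number of characters in the window

    def append(self, c):
        # (u * a + ord(c)) mod p
        self.hash = (self.hash * self.base + ord(c)) % self.p
        self.size += 1

    def skip(self, c):
        # (u - ord(c) * a^(|u|-1)) mod p, assuming c is the front character
        self.hash = (self.hash
                     - ord(c) * pow(self.base, self.size - 1, self.p)) % self.p
        self.size -= 1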


Lecture 7: Hashing III: Open Addressing

Lecture Overview
• Open Addressing, Probing Strategies

• Uniform Hashing, Analysis

• Advanced Hashing

Readings
CLRS Chapter 11.4 (and 11.3.3 and 11.5 if interested)

Open Addressing
Another approach to collisions

• no linked lists

• all items stored in table (see Fig. 1)

item2

item1
item3

Figure 1: Open Addressing Table

• one item per slot =⇒ m ≥ n

• hash function specifies order of slots to probe (try) for a key, not just one slot: (see
Fig. 2)

Insert(k,v)

for i in xrange(m):
    if T[h(k, i)] is None:       # empty slot
        T[h(k, i)] = (k, v)      # store item
        return
raise 'full'


[Figure 2: Order of Probes. h : U × {0, 1, . . . , m−1} → {0, 1, . . . , m−1}; for each key k, the sequence ⟨h(k, 0), h(k, 1), . . . , h(k, m−1)⟩ is a permutation of the slots, giving the order in which slots are probed.]

Example: Insert k = 496

[Figure 3: Insert Example. Probe h(496, 0) = 4: occupied by 204, collision. Probe h(496, 1) = 1: occupied by 586, collision. Probe h(496, 2) = 5: empty, insert 496 there.]

Search(k)

for i in xrange(m):
    if T[h(k, i)] is None:           # empty slot?
        return None                  # end of “chain”
    elif T[h(k, i)][0] == k:         # matching key
        return T[h(k, i)]            # return item
return None                          # exhausted table


Delete(k)

• can’t just set T [h(k, i)] = None

• example: delete(586) =⇒ search(496) fails

• replace item with DeleteMe, which Insert treats as None but Search doesn’t
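Putting these pieces together, here is a small sketch of an open-addressing table with the DeleteMe idea, using the linear-probing strategy described below (simplified, and not production code):

DELETED = object()   # the "DeleteMe" marker

class OpenAddressingTable:
    def __init__(self, m=8):
        self.m = m
        self.T = [None] * m

    def _probe(self, k, i):
        return (hash(k) + i) % self.m           # linear probing h(k, i)

    def insert(self, k, v):
        for i in range(self.m):
            s = self._probe(k, i)
            if self.T[s] is None or self.T[s] is DELETED or self.T[s][0] == k:
                self.T[s] = (k, v)
                return
        raise RuntimeError('full')

    def search(self, k):
        for i in range(self.m):
            s = self._probe(k, i)
            if self.T[s] is None:                # empty slot ends the probe chain
                return None
            if self.T[s] is not DELETED and self.T[s][0] == k:
                return self.T[s][1]
        return None

    def delete(self, k):
        for i in range(self.m):
            s = self._probe(k, i)
            if self.T[s] is None:
                return
            if self.T[s] is not DELETED and self.T[s][0] == k:
                self.T[s] = DELETED              # keep later probes reachable
                return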

Probing Strategies
Linear Probing

h(k, i) = (h′(k) + i) mod m, where h′(k) is an ordinary hash function

• like street parking

• problem: clustering as consecutive group of filled slots grows, gets more likely to grow
(see Fig. 4)

[Figure 4: Primary Clustering. Consecutive filled slots form clusters; any probe landing inside a cluster walks to its end, so clusters tend to grow.]

• for 0.01 < α < 0.99, say, clusters of size Θ(lg n) are known to form

• for α = 1, clusters of size Θ(√n) are known to form

Double Hashing

h(k, i) = (h1(k) + i · h2(k)) mod m, where h1(k) and h2(k) are two ordinary hash functions.

• actually hits all slots (a permutation) if h2(k) is relatively prime to m

• e.g. m = 2^r: make h2(k) always odd

Uniform Hashing Assumption


Each key is equally likely to have any one of the m! permutations as its probe sequence
• not really true

• but double hashing can come close


Analysis
Open addressing for n items in a table of size m has expected cost ≤ 1/(1 − α) per operation,
where α = n/m (< 1), assuming uniform hashing.
Example: α = 90% =⇒ 10 expected probes

Proof:

Always make a first probe.
With probability n/m, the first slot is occupied.
In the worst case (e.g. key not in table), go to the next slot.
With probability (n − 1)/(m − 1), the second slot is occupied.
Then, with probability (n − 2)/(m − 2), the third slot is full.
Etc. (n possibilities)

So expected cost = 1 + n/m (1 + (n−1)/(m−1) (1 + (n−2)/(m−2) (· · · )))

Now (n − i)/(m − i) ≤ n/m = α for i = 0, · · · , n (≤ m)

So expected cost ≤ 1 + α(1 + α(1 + α(· · · )))
                 = 1 + α + α² + α³ + · · ·
                 = 1/(1 − α)

Open Addressing vs. Chaining


Open Addressing: better cache performance and rarely allocates memory

Chaining: less sensitive to hash functions and α


Advanced Hashing
Universal Hashing

Instead of defining one hash function, define a whole family and select one at random

• e.g. multiplication method with random a


• can prove Pr (over random h) {h(x) = h(y)} = 1/m for every (i.e. not random) x ≠ y

• =⇒ O(1) expected time per operation without assuming simple uniform hashing!
CLRS 11.3.3

Perfect Hashing

Guarantee O(1) worst-case search

• idea: if m = n², then E[# collisions] ≈ 1/2
  =⇒ get 0 collisions after O(1) tries . . . but O(n²) space

• use this structure for storing chains

[Figure 5: Two-level Hash Table. At the second level, a chain of k items is stored in a table of size m = k² with no collisions (CLRS 11.5).]

• can prove O(n) expected total space!

• if ever fails, rebuild from scratch


Lecture 8: Sorting I: Heaps

Lecture Overview
• Review: Insertion Sort and Merge Sort

• Selection Sort

• Heaps

Readings
CLRS 2.1, 2.2, 2.3, 6.1, 6.2, 6.3 and 6.4

Sorting Review
Insertion Sort

[Figure 1: Insertion Sort Example. Sorting 5 2 4 6 1 3 by repeatedly inserting the next key into the already-sorted prefix; a Θ(n²) algorithm.]

Merge Sort

Divide n-element array into two subarrays of n/2 elements each. Recursively sort sub-arrays
using mergesort. Merge two sorted subarrays.


[Figure 2: Merge Sort Example. The sorted halves L = A[1 : n/2] and R = A[n/2+1 : n] (here 2 4 5 7 and 1 2 3 6) are merged into the sorted A[1 : n] (1 2 2 3 4 5 6 7) in Θ(n) time, using Θ(n) auxiliary space. Can we do without the auxiliary space?]

In-Place Sorting

Numbers re-arranged in the array A with at most a constant number of them sorted outside
the array at any time.

Insertion Sort: stores the key outside the array; Θ(n²); in-place

Merge Sort: needs Θ(n) auxiliary space during merging; Θ(n lg n)

Question: Can we have Θ(n lg n) in-place sorting?

Selection Sort
0. i=1

1. Find minimum value in list beginning with i

2. Swap it with the value in ith position

3. i = i + 1, stop if i = n

Iterate steps 0-3 n times. Step 1 takes O(n) time. Can we improve to O(lg n)?


[Figure 3: Selection Sort Example. Sorting 2 1 5 4 in place, one minimum at a time; Θ(n²) time.]

Heaps (Not garbage collected storage)


A heap is an array object that is viewed as a nearly complete binary tree.

[Figure 4: Binary Heap. The array [16, 14, 10, 8, 7, 9, 3, 2, 4, 1] (indices 1–10) viewed as a nearly complete binary tree with 16 at the root.]

Data Structure
root: A[1]
For the node with index i:
    PARENT(i) = ⌊i/2⌋
    LEFT(i) = 2i
    RIGHT(i) = 2i + 1

Note: NO POINTERS!


length[A]: number of elements in the array

heap-size[A]: number of elements in the heap stored within array A

heap-size[A]: ≤ length[A]

Max-Heaps and Min-Heaps


Max-Heap Property: For every node i other than the root A[PARENT(i)] ≥ A[i]
Height of a binary heap O(lg n)

MAX HEAPIFY: O(lg n) maintains max-heap property

BUILD MAX HEAP: O(n) produces max-heap from unordered input array

HEAP SORT: O(n lg n)

Heap operations insert, extract max etc O(lg n).

Max Heapify(A,i)

l ← LEFT(i)
r ← RIGHT(i)
if l ≤ heap-size(A) and A[l] > A[i]
    then largest ← l
    else largest ← i
if r ≤ heap-size(A) and A[r] > A[largest]
    then largest ← r
if largest ≠ i
    then exchange A[i] and A[largest]
         MAX_HEAPIFY(A, largest)

This assumes that the trees rooted at left(i) and Right(i) are max-heaps. A[i] may be
smaller than children violating max-heap property. Let the A[i] value “float down” so
subtree rooted at index i becomes a max-heap.
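A direct Python translation of MAX_HEAPIFY, as a sketch using 0-indexed lists rather than the 1-indexed arrays of the pseudocode:

def max_heapify(A, i, heap_size):
    """Float A[i] down so the subtree rooted at i is a max-heap (0-indexed)."""
    l = 2 * i + 1
    r = 2 * i + 2
    largest = i
    if l < heap_size and A[l] > A[largest]:
        largest = l
    if r < heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, heap_size)

A = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
max_heapify(A, 1, len(A))        # the example from Figure 5 (node 2, 0-indexed as 1)
print(A)                         # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]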


Example

[Figure 5: MAX_HEAPIFY Example. With heap_size[A] = 10, MAX_HEAPIFY(A, 2) exchanges A[2] with A[4] because the max-heap property is violated at node 2, then calls MAX_HEAPIFY(A, 4), which exchanges A[4] with A[9]; after that no more calls are needed.]


Lecture 9: Sorting II: Heaps

Lecture Overview
• Review: Heaps and MAX HEAPIFY

• Building a Heap

• Heap Sort

• Priority Queues (Recitation)

Readings
CLRS 6.1-6.4

Review
Heaps:

Parent(i) = ⌊i/2⌋
Left(i) = 2i
Right(i) = 2i + 1

Max heap property:

A[Parent(i)] ≥ A[i]

• MAX HEAPIFY(A, 2)
heap size(A) = 10
A[2] ←→ A[4]

• MAX HEAPIFY(A,4)
A[4] ←→ A[9]


[Figure 1: Review from last lecture. The array 16 4 10 14 7 9 3 2 8 1 drawn as a heap, with the max-heap property violated at node 2; MAX_HEAPIFY repairs it in O(lg n) time.]

Building a Heap
A[1 · · · n] is converted to a max-heap. Observation: Elements A[⌊n/2⌋ + 1 · · · n] are all leaves
of the tree and can't have children.

BUILD_MAX_HEAP(A):
    heap-size(A) = length(A)
    for i ← ⌊length[A]/2⌋ downto 1        ← O(n) iterations
        do MAX_HEAPIFY(A, i)              ← O(lg n) time each
                                            O(n lg n) overall

See Figure 2 for an example.


[Figure 2: Example: Building Heaps. A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]. MAX-HEAPIFY(A,5): no change. MAX-HEAPIFY(A,4): swap A[4] and A[8]. MAX-HEAPIFY(A,3): swap A[3] and A[7]. MAX-HEAPIFY(A,2): swap A[2] and A[5], then A[5] and A[10]. MAX-HEAPIFY(A,1): swap A[1] with A[2], A[2] with A[4], A[4] with A[9].]


Sorting Strategy
• Build max heap from unordered array

• Find maximum element (A[1])

• Put it in correct position A[n], A[n] goes to A[1]


New root could violate max heap property but children remain max heaps.

• Discard node n from heap (decrement heapsize)

Heap Sort Algorithm

HEAP_SORT(A):
    BUILD_MAX_HEAP(A)                          ← O(n lg n)
    for i = length[A] downto 2                 ← n iterations
        do exchange A[1] ←→ A[i]
           heap-size[A] = heap-size[A] − 1
           MAX_HEAPIFY(A, 1)                   ← O(lg n)
                                                 O(n lg n) overall
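As a sketch, the same algorithm in Python (0-indexed, reusing the max_heapify sketch from the previous lecture):

def build_max_heap(A):
    # Leaves occupy A[len(A)//2 .. n-1]; heapify the internal nodes bottom-up.
    for i in range(len(A) // 2 - 1, -1, -1):
        max_heapify(A, i, len(A))

def heap_sort(A):
    build_max_heap(A)
    heap_size = len(A)
    for i in range(len(A) - 1, 0, -1):
        A[0], A[i] = A[i], A[0]      # move current maximum to its final slot
        heap_size -= 1
        max_heapify(A, 0, heap_size)

A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heap_sort(A)
print(A)   # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]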

See Figure 3 for an illustration.


[Figure 3: Illustration: Heap Sort Algorithm. After the first exchange, A[1] = 16 is swapped to the end and is no longer part of the heap (heap_size = 9); MAX_HEAPIFY(A, 1) restores the heap, then 14 is swapped out, then 10, and so on. Note: MAX_HEAPIFY cannot be run with the old heap size of 10.]


Priority Queues
This is an abstract datatype as it can be implemented in different ways.

INSERT(S, x): inserts x into set S
MAXIMUM(S): returns the element of S with the largest key
EXTRACT_MAX(S): removes and returns the element with the largest key
INCREASE_KEY(S, x, k): increases the value of element x's key to the new value k
(assumed to be at least as large as the current value)
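In Python, the built-in heapq module provides a binary-heap min-priority queue; a max-priority queue like the one above can be simulated by negating keys (there is no direct INCREASE_KEY). A tiny usage sketch:

import heapq

S = []                         # heapq implements a MIN-priority queue
for key in [16, 4, 10, 14, 7]:
    heapq.heappush(S, -key)    # negate keys to get max-priority behaviour
print(-S[0])                   # MAXIMUM(S)      -> 16
print(-heapq.heappop(S))       # EXTRACT_MAX(S)  -> 16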


Lecture 10: Sorting III: Linear Bounds


Linear-Time Sorting

Lecture Overview
• Sorting lower bounds

– Decision Trees

• Linear-Time Sorting

– Counting Sort

Readings
CLRS 8.1-8.4

Comparison Sorting
Insertion sort, merge sort and heap sort are all comparison sorts.
The best worst case running time we know is O(n lg n). Can we do better?

Decision-Tree Example

Sort < a1 , a2 , · · · an >.

[Figure 1: Decision Tree. The root compares 1:2; internal nodes compare further pairs (2:3, 1:3), and each of the 3! = 6 leaves is a total ordering: 123, 132, 213, 231, 312, 321.]

Each internal node labeled i : j, compare ai and aj , go left if ai ≤ aj , go right otherwise.


Example

Sort < a1 , a2 , a3 >=< 9, 4, 6 > Each leaf contains a permutation, i.e., a total ordering.

[Figure 2: Decision Tree Execution. For ⟨a1, a2, a3⟩ = ⟨9, 4, 6⟩: compare 1:2 (9 > 4, go right), then 1:3 (9 > 6, go right), then 2:3 (4 ≤ 6, go left), reaching the leaf 231, i.e. 4 ≤ 6 ≤ 9.]

Decision Tree Model

Can model execution of any comparison sort. In order to sort, we need to generate a total
ordering of elements.

• One tree size for each input size n

• Running time of algo: length of path taken

• Worst-case running time: height of the tree

Theorem

Any decision tree that can sort n elements must have height Ω(n lg n).

Proof: The tree must contain ≥ n! leaves since there are n! possible permutations. A height-h
binary tree has ≤ 2^h leaves. Thus,

    n! ≤ 2^h
    =⇒ h ≥ lg(n!)
          ≥ lg((n/e)^n)        (Stirling)
          = n lg n − n lg e
          = Ω(n lg n)


Sorting in Linear Time


Counting Sort: no comparisons between elements

Input: A[1 . . . n] where A[j] ∈ {1, 2, · · · , k}

Output: B[1 . . . n] sorted

Auxiliary Storage: C[1 . . . k]

Intuition

Since elements are in the range {1, 2, · · · , k}, imagine collecting all the j’s such that A[j] = 1,
then the j’s such that A[j] = 2, etc.
Don’t compare elements, so it is not a comparison sort!

A[j]’s index into appropriate positions.

Pseudo Code and Analysis

for i ← 1 to k:                  Θ(k)
    C[i] = 0
for j ← 1 to n:                  Θ(n)
    C[A[j]] = C[A[j]] + 1
for i ← 2 to k:                  Θ(k)
    C[i] = C[i] + C[i−1]
for j ← n downto 1:              Θ(n)
    B[C[A[j]]] = A[j]
    C[A[j]] = C[A[j]] − 1

Total: Θ(n + k)

Figure 3: Counting Sort


Example

Note: Records may be associated with the A[i]’s.

[Figure 4: Counting Sort Execution. A = [4, 1, 3, 4, 3], k = 4: the counts give C = [1, 0, 2, 2], the running sums give C = [1, 1, 3, 5], and the output is B = [1, 3, 3, 4, 4].]

A[n] = A[5] = 3
C[3] = 3
B[3] = A[5] = 3, C[3] decr.
A[4] = 4
C[4] = 5
B[5] = A[4] = 4, C[4] decr. and so on . . .
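A Python sketch of counting sort, following the pseudocode above (values assumed to lie in {1, . . . , k}):

def counting_sort(A, k):
    """Stable counting sort of A, whose values lie in {1, ..., k}."""
    n = len(A)
    B = [0] * n                     # output
    C = [0] * (k + 1)               # C[i] = number of elements equal to i
    for x in A:
        C[x] += 1
    for i in range(2, k + 1):       # C[i] = number of elements <= i
        C[i] += C[i - 1]
    for j in range(n - 1, -1, -1):  # walk backwards to keep the sort stable
        B[C[A[j]] - 1] = A[j]
        C[A[j]] -= 1
    return B

print(counting_sort([4, 1, 3, 4, 3], 4))   # [1, 3, 3, 4, 4]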


Lecture 11: Sorting IV: Stable Sorting, Radix Sort

Lecture Overview
• Stable Sorting

• Radix Sort

• Quick Sort ← not officially a part of 6.006

• Sorting Races

Stable Sorting
Preserves input order among equal elements

[Figure 1: Stability. Sorting 4’ 1 3* 4 3 yields 1 3* 3 4’ 4: equal elements preserve their input order. Counting sort is stable; merge sort is stable.]

Selection Sort and Heap: Find maximum element and put it at end of array (swap with
element at end of array) NOT STABLE!

[Figure 2: Selection Sort Instability. Defining 2a < 2b by input order, sorting 3 2a 2b produces 2b 2a 3, so the equal keys come out in the wrong relative order.]

Radix Sort
• Herman Hollerith card-sorting machine for 1890 census.

• Digit by Digit sort by mechanical machine

1. Examine given column of each card in a deck


2. Distribute the card into one of 10 bins


3. Gather cards bin by bin, so cards with first place punched are on top of cards
with second place punched, etc.

[Figure 3: Punch Card. 80 columns, 10 places per column.]

MSB vs. LSB?

Sort on most significant digit first or least significant digit first?

MSB strategy: Cards in 9 of 10 bins must be put aside, leading to a large number of
intermediate piles

LSB strategy: Can gather sorted cards in bins appropriately to create a deck!

Example

    329      720      720      329
    457      355      329      355
    657      436      436      436
    839      457      839      457
    436      657      355      657
    720      329      457      720
    355      839      657      839

(columns: input, then after sorting on the 1s, 10s, and 100s digits)

Digit sort needs to be stable, else will get wrong result!

Figure 4: Example of Radix Sort


Analysis

Assume counting sort is the auxiliary stable sort, with Θ(n + k) complexity.

Suppose we have n words of b bits each.

One pass of counting sort:                       Θ(n + 2^b)
b passes of counting sort (1 bit per pass):      Θ(b(n + 2)) = Θ(nb)
b/r passes (r bits per pass):                    Θ((b/r)(n + 2^r)), minimized when r = lg n, giving Θ(bn / lg n)

Quick Sort
This section is for “enrichment” only.

Divide: Partition the array into two. Sub-arrays around a pivot x such that elements in
lower sub array ≤ x ≤ elements in upper sub array. ← Linear Time

pivot

≤x x ≥x

Figure 5: Pivot Definition

Conquer: Recursively sort the two sub arrays

Combine: Trivial

If we choose a pivot such that two sub arrays are roughly equal:

T (n) = 2T (n/2) + Θ(n) =⇒ T (n) = Θ(n lg n)

If one array is much bigger:

T (n) = T (n − 1) + Θ(n) =⇒ T (n) = Θ(n2 )

Average case Θ(n lg n) assuming input array is randomized!


Sorting Races
Click here for a reference on this.

Bubble Sort: Repeatedly step through the list to be sorted. Compare 2 items, swap if they
    are in the wrong order. Continue through the list, and repeat passes through the list until
    no swaps are needed. Θ(n²)

Shell Sort: Improves insertion sort by comparing elements separated by gaps. Θ(n lg² n)


Lecture 12: Searching I: Graph Search and


Representations

Lecture Overview: Search 1 of 3


• Graph Search

• Applications

• Graph Representations

• Introduction to breadth-first and depth-first search

Readings
CLRS 22.1-22.3, B.4

Graph Search
Explore a graph e.g., find a path from start vertices to a desired vertex
Recall: graph G = (V, E)

• V = set of vertices (arbitrary labels)

• E = set of edges i.e. vertex pairs (v, w)

– ordered pair =⇒ directed edge of graph


– unordered pair =⇒ undirected

[Figure 1: Example to illustrate graph terminology.
 Undirected: V = {a, b, c, d}, E = {{a,b}, {a,c}, {b,c}, {b,d}, {c,d}}.
 Directed: V = {a, b, c}, E = {(a,c), (b,c), (c,b), (b,a)}.]


Applications:
There are many.

• web crawling (How Google finds pages)

• social networking (Facebook friend finder)

• computer networks (Routing in the Internet)


shortest paths [next unit]

• solving puzzles and games

• checking mathematical conjectures

Pocket Cube:
Consider a 2 × 2 × 2 Rubik’s cube

Figure 2: Rubik’s Cube

• Configuration Graph:

– vertex for each possible state


– edge for each basic move (e.g., 90 degree turn) from one state to another
– undirected: moves are reversible

• Puzzle: Given initial state s, find a path to the solved state

• # vertices = 8! · 3^8 = 264,539,520 (because there are 8 cubelets in arbitrary positions,
  and each cubelet has 3 possible twists)

Figure 3: Illustration of Symmetry


• can factor out the 24-fold symmetry of the cube: fix one cubelet
  8! · 3^8 / 24 =⇒ 7! · 3^7 = 11,022,480

• in fact, the graph has 3 connected components of equal size =⇒ only need to search in
  one
  =⇒ 7! · 3^6 = 3,674,160


“Geography” of configuration graph

[Figure 4: Breadth-First Tree. From the start configuration, the possible first moves, then the configurations reachable in two steps but not one, and so on.]

� reachable configurations

distance 90◦ turns 90◦ & 180◦ turns


0 1 1
1 6 9
2 27 54
3 120 321
4 534 1,847
5 2,256 9,992
6 8,969 50,136
7 33,058 227,536
8 114,149 870,072
9 360,508 1,887,748
10 930,588 623,800
11 1,350,852 2,644 ← diameter
12 782,536
13 90,280
14 276 ← diameter
3,674,160 3,674,160
Wikipedia Pocket Cube

Cf. 3 × 3 × 3 Rubik’s cube: ≈ 1.4 trillion states; diameter is unknown! ≤ 26


Representing Graphs: (data structures)


Adjacency lists:

Array Adj of | V | linked lists

• for each vertex u ∈ V, Adj[u] stores u's neighbors, i.e., {v ∈ V | (u, v) ∈ E}. (u, v)
  are just outgoing edges if directed. (See Fig. 5 for an example)

• in Python: Adj = dictionary of list/set values vertex = any hashable object (e.g., int,
tuple)

• advantage: multiple graphs on same vertices

Adj[a] = [c],  Adj[b] = [c, a],  Adj[c] = [b]   (adjacency lists for the directed example of Figure 1)

Figure 5: Adjacency List Representation
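For concreteness, a minimal sketch of this representation in Python, using the directed example graph of Figure 1 (a dict of sets; the variable names here are illustrative, not from the notes):

    # Directed example: V = {a, b, c}, E = {(a,c), (b,c), (c,b), (b,a)}
    Adj = {
        'a': {'c'},
        'b': {'c', 'a'},
        'c': {'b'},
    }

    print(sorted(Adj['b']))    # ['a', 'c']  (outgoing neighbors of b)

Keeping several such dicts gives multiple graphs on the same vertex set, as noted above.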

Object-oriented variations:

• object for each vertex u

• u.neighbors = list of neighbors i.e., Adj[u]

Incidence Lists:

• can also make edges objects (see Figure 6)

• u.edges = list of (outgoing) edges from u.

• advantage: storing data with vertices and edges without hashing

5
Lecture 12 Searching I: Graph Search & Representations 6.006 Spring 2008

(an edge object e stores its endpoints, e.g., e.a and e.b)

Figure 6: Edge Representation

Representing Graphs: contd.

The above representations are good for sparse graphs, where |E| ≪ |V|². This translates to a space requirement of Θ(V + E). (Don’t bother with | · |’s inside O/Θ.)

Adjacency Matrix:

• assume V = {1, 2, . . . , |v|} (number vertices)

• A = (a_ij) is a |V| × |V| matrix, where i indexes the row and j the column, and

      a_ij = 1 if (i, j) ∈ E,  0 otherwise

  See Figure 7.

• good for dense graphs where |E| ≈ |V|²

• space requirement = Θ(V²)

• cool properties: e.g., A² counts the length-2 paths, and Google PageRank ≈ A^∞

• but we’ll rarely use it; Google couldn’t: |V| ≈ 20 billion =⇒ |V|² ≈ 4 × 10^20 [50,000 petabytes]

For the directed example (vertices a, b, c numbered 1, 2, 3):

        0 0 1
  A  =  1 0 1
        0 1 0

Figure 7: Matrix Representation
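As a small illustration of the A² property (not from the notes), squaring the matrix above with plain nested lists counts the length-2 walks:

    A = [[0, 0, 1],
         [1, 0, 1],
         [0, 1, 0]]          # rows/columns 1, 2, 3 stand for a, b, c

    n = len(A)
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # A2[i][j] = number of length-2 walks from i to j;
    # e.g. A2[1][1] == 1, coming from the single walk b -> c -> b.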

6
Lecture 12 Searching I: Graph Search & Representations 6.006 Spring 2008

Implicit Graphs:

Adj(u) is a function or u.neighbors/edges is a method =⇒ “no space” (just what you need
now)

High level overview of next two lectures:


Breadth-first search

Levels like “geography”

s ...

frontier
Figure 8: Illustrating Breadth-First Search

• frontier = current level

• initially {s}

• repeatedly advance frontier to next level, careful not to go backwards to previous level

• actually find shortest paths i.e. fewest possible edges

Depth-first search

This is like exploring a maze.

• e.g.: (left-hand rule) - See Figure 9

• follow path until you get stuck

• backtrack along breadcrumbs until you reach an unexplored edge

7
Lecture 12 Searching I: Graph Search & Representations 6.006 Spring 2008

• recursively explore it

• careful not to repeat a vertex

Figure 9: Illustrating Depth-First Search

8
Lecture 13 Searching II 6.006 Spring 2008

Lecture 13: Searching II: Breadth-First Search


and Depth-First Search
Lecture Overview: Search 2 of 3
• Breadth-First Search

• Shortest Paths

• Depth-First Search

• Edge Classification

Readings
CLRS 22.2-22.3

Recall:
graph search: explore a graph
e.g., find a path from start vertices to a desired vertex

adjacency lists: array Adj of | V | linked lists

• for each vertex u ∈ V, Adj[u] stores u’s neighbors, i.e., {v ∈ V | (u, v) ∈ E}
  (just the outgoing edges if directed)

Adj[a] = [c],  Adj[b] = [c, a],  Adj[c] = [b]   (directed example)

Figure 1: Adjacency Lists

1
Lecture 13 Searching II 6.006 Spring 2008

s ...
level Ø last level
level 1
level 2

Figure 2: Breadth-First Search

Breadth-first Search (BFS):


See Figure 2
Explore graph level by level from S

• level 0 = {s}

• level i = vertices reachable by path of i edges but not fewer

• build level i > 0 from level i − 1 by trying all outgoing edges, but ignoring vertices
from previous levels

BFS(V, Adj, s):
    level = {s: 0}
    parent = {s: None}
    i = 1
    frontier = [s]                      # previous level, i - 1
    while frontier:
        next = []                       # next level, i
        for u in frontier:
            for v in Adj[u]:
                if v not in level:      # not yet seen
                    level[v] = i        # = level[u] + 1
                    parent[v] = u
                    next.append(v)
        frontier = next
        i += 1

2
Lecture 13 Searching II 6.006 Spring 2008

Example:

frontier0 = {s}
frontier1 = {a, x}
frontier2 = {z, d, c}
frontier3 = {f, v}   (not x, c, d: already seen)

Figure 3: Breadth-First Search Frontier

Analysis:

• a vertex v enters next (and then frontier) only once (because level[v] is then set);
  base case: v = s

• =⇒ Adj[v] is looped through only once

      time = Σ_{v∈V} |Adj[v]| = |E|   for directed graphs
                               2|E|   for undirected graphs

• O(E) time
  - O(V + E) to also list the vertices unreachable from s (those never assigned a level)
  “LINEAR TIME”

Shortest Paths:
• for every vertex v, the fewest edges to get from s to v is:
     level[v]   if v was assigned a level
     ∞          otherwise (no path)

• parent pointers form shortest-path tree = union of such a shortest path for each v
=⇒ to find shortest path, take v, parent[v], parent[parent[v]], etc., until s (or None)
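A short sketch of that walk in Python, assuming the level/parent dictionaries produced by the BFS code above:

    def shortest_path(parent, s, v):
        """Follow parent pointers back from v to s; returns None if v was never reached."""
        if v not in parent:
            return None
        path = [v]
        while path[-1] != s:
            path.append(parent[path[-1]])
        path.reverse()
        return path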

3
Lecture 13 Searching II 6.006 Spring 2008

Depth-First Search (DFS):


This is like exploring a maze.

Figure 4: Depth-First Search Frontier

• follow path until you get stuck

• backtrack along breadcrumbs until reach unexplored neighbor

• recursively explore

parent = {s: None}

DFS-visit(V, Adj, s):                  # search from start vertex s
    for v in Adj[s]:                   # (only sees what is reachable from s)
        if v not in parent:
            parent[v] = s
            DFS-visit(V, Adj, v)

DFS(V, Adj):                           # explore the entire graph
    parent = {}
    for s in V:                        # (could do the same to extend BFS)
        if s not in parent:
            parent[s] = None
            DFS-visit(V, Adj, s)

Figure 5: Depth-First Search Algorithm

4
Lecture 13 Searching II 6.006 Spring 2008

Example:

(DFS started from s1 and then s2; the edges are numbered 1-8 in the order visited, and the nontree edges are labeled as back, forward, and cross edges)

Figure 6: Depth-First Traversal

Edge Classification:

tree edges (formed by parent)


nontree edges

back edge: to ancestor

forward edge: to descendant


cross edge (to another subtree)

Figure 7: Edge Classification

To compute this classification, keep global time counter and store time interval during
which each vertex is on recursion stack.
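A sketch of that bookkeeping in Python; the start/finish counters and the edge_type dictionary are illustrative assumptions, not the notes’ own code:

    time = 0
    start, finish, edge_type = {}, {}, {}

    def dfs_visit(Adj, u):
        """DFS that records entry/exit times and classifies each edge it sees."""
        global time
        start[u] = time; time += 1
        for v in Adj[u]:
            if v not in start:
                edge_type[u, v] = 'tree'
                dfs_visit(Adj, v)
            elif v not in finish:              # v is still on the recursion stack
                edge_type[u, v] = 'back'
            elif start[v] > start[u]:          # v was discovered during u's interval
                edge_type[u, v] = 'forward'
            else:
                edge_type[u, v] = 'cross'
        finish[u] = time; time += 1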

Analysis:

• DFS-visit gets called with a vertex s only once (because parent[s] is then set)

  =⇒ time in DFS-visit = Σ_{s∈V} |Adj[s]| = O(E)

• DFS outer loop adds just O(V )


=⇒ O(V + E) time (linear time)

5
Lecture 14 Searching III 6.006 Spring 2008

Lecture 14: Searching III: Topological Sort and


NP-completeness
Lecture Overview: Search 3 of 3 & NP-completeness
• BFS vs. DFS

• job scheduling

• topological sort

• intractable problems

• P, NP, NP-completeness

Readings
CLRS, Sections 22.4 and 34.1-34.3 (at a high level)

Recall:
• Breadth-First Search (BFS): level by level

• Depth-First Search (DFS): backtrack as necessary

• both O(V + E) worst-case time =⇒ optimal

• BFS computes shortest paths (min. # edges)

• DFS is a bit simpler & has useful properties

1
Lecture 14 Searching III 6.006 Spring 2008

Job Scheduling:
Given a Directed Acyclic Graph (DAG), where vertices represent tasks & edges represent
dependencies, order the tasks without violating any dependencies

8 7 9
G H I

4 3 2 1
A B C F

D E
6 5

Figure 1: Dependence Graph

Source

Source = vertex with no incoming edges


= schedulable at beginning (A,G,I)

Attempt

BFS from each source:

- from A finds H,B,C,F

- from D finds C, E, F

- from G finds H
} need to merge
- costly

Figure 2: BFS-based Scheduling

2
Lecture 14 Searching III 6.006 Spring 2008

Topological Sort
Reverse of the DFS finishing times (the time at which all of a vertex’s outgoing edges have been explored); see the sketch below.
Exercise: prove that no dependency constraints are violated
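A compact sketch of this in Python, assuming the adjacency-dict representation used earlier; each vertex is appended as its DFS call finishes, and the list is reversed at the end:

    def topological_sort(V, Adj):
        """Order the vertices of a DAG so that every edge goes from left to right."""
        order, visited = [], set()

        def visit(u):
            visited.add(u)
            for v in Adj[u]:
                if v not in visited:
                    visit(v)
            order.append(u)        # u finishes only after everything it points to

        for u in V:
            if u not in visited:
                visit(u)
        order.reverse()            # reverse of the finishing times
        return order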

Intractability
• DFS & BFS are worst-case optimal if problem is really graph search (to look at graph)

• what if graph . . .

– is implicit?
– has special structure?
– is infinite?

The first 2 characteristics (implicitness and special structure) apply to the Rubik’s Cube
problem.
The third characteristic (infiniteness) applies to the Halting Problem.

Halting Problem:

Given a computer program, does it ever halt (stop)?

decision problem: answer is YES or NO

UNDECIDABLE: no algorithm solves this problem (correctly in finite time on all inputs)

Most decision problems are undecidable:

• program ≈ binary string ≈ nonneg. integer ∈ ℕ

• decision problem = a function from binary strings to {YES, NO}; binary strings ≈ nonneg.
  integers, while {YES, NO} ≈ {0, 1}

• so a decision problem ≈ infinite sequence of bits ≈ real number ∈ ℝ

• |ℕ| < |ℝ|: there is no assignment of unique nonneg. integers to real numbers (ℝ is uncountable)

• =⇒ not nearly enough programs for all problems, & each program solves only one problem

• =⇒ almost all problems cannot be solved

3
Lecture 14 Searching III 6.006 Spring 2008

n × n × n Rubik’s cube:

• n = 2 or 3 is easy algorithmically: O(1) time


in practice, n = 3 still unsolved

• graph size grows exponentially with n

• solvability decision question is easy (parity check)

• finding shortest solution: UNSOLVED

n × n Chess:

Given n × n board & some configuration of pieces, can WHITE force a win?

• can be formulated as (αβ) graph search

• every algorithm needs time exponential in n:


“EXPTIME-complete” [Fraenkel & Lichtenstein 1981]

n² − 1 Puzzle:

Given an n × n grid with n² − 1 pieces, sort the pieces by sliding (see Figure 3).

• similar to Rubik’s cube:

• solvability decision question is easy (parity check)

• finding shortest solution: NP-COMPLETE [Ratner & Warmuth 1990]

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15

Figure 3: Puzzle

4
Lecture 14 Searching III 6.006 Spring 2008

Tetris:

Given current board configuration & list of pieces to come, stay alive

• NP-COMPLETE [Demaine, Hohenberger, Liben-Nowell 2003]

P, NP, NP-completeness
P = all (decision) problems solvable by a polynomial-time (O(n^c)) algorithm (efficient)

NP = all decision problems whose YES answers have short (polynomial-length) “proofs”
checkable by a polynomial-time algorithm
e.g., Rubik’s cube and the n² − 1 puzzle: is there a solution of length ≤ k?
YES =⇒ easy-to-check short proof (the moves)
Tetris ∈ NP
but we conjecture Chess is not in NP (a winning strategy is big: exponential in n)

P ≠ NP: Big conjecture (worth $1,000,000) ≈ generating proofs/solutions is harder than
checking them

NP-complete = in NP & NP-hard

NP-hard = as hard as every problem in NP
        = every problem in NP can be efficiently converted into this problem
  =⇒ if this problem ∈ P then P = NP (so probably this problem is not in P)

5
Lecture 15 Shortest Paths I: Intro 6.006 Spring 2008

Lecture 15: Shortest Paths I: Intro


Lecture Overview
• Homework Preview

• Weighted Graphs

• General Approach

• Negative Edges

• Optimal Substructure

Readings
CLRS, Sections 24 (Intro)

Motivation:
Shortest way to drive from A to B (Google maps “get directions”)

Formulation: Problem on a weighted graph G = (V, E), W : E → ℝ

Two algorithms: Dijkstra O(V lg V + E) assumes non-negative edge weights


Bellman Ford O(V E) is a general algorithm

Problem Set 5 Preview:


• Use Dijkstra to find shortest path from CalTech to MIT

– See “CalTech Cannon Hack” photos (search web.mit.edu )


– See Google Maps from CalTech to MIT

• Model as a weighted graph G = (V, E), W : E → ℝ

– V = vertices (street intersections)


– E = edges (street, roads); directed edges (one way roads)
– W (U, V ) = weight of edge from u to v (distance, toll)

path p = <v0, v1, . . . , vk> with (vi, vi+1) ∈ E for 0 ≤ i < k

w(p) = Σ_{i=0}^{k−1} w(vi, vi+1)

1
Lecture 15 Shortest Paths I: Intro 6.006 Spring 2008

Weighted Graphs:
Notation:
  v0 --p--> vk means p is a path from v0 to vk. (v0) is a path from v0 to v0 of weight 0.

Definition:
  Shortest-path weight from u to v:

    δ(u, v) = min{ w(p) : p is a path from u to v }   if any such path exists
            = ∞                                        otherwise (v unreachable from u)

Single Source Shortest Paths:

Given G = (V, E), w, and a source vertex s, find δ(s, v) [and the best path] from s to each
v ∈ V.
Data structures:

d[v] = value inside circle (current shortest-path estimate)
     = 0 if v = s, ∞ otherwise   ⇐= initially
     = δ(s, v)                   ⇐= at end
d[v] ≥ δ(s, v) at all times; d[v] decreases as we find better paths to v

Π[v] = predecessor on best path to v, Π[s] = NIL

2
Lecture 15 Shortest Paths I: Intro 6.006 Spring 2008

Example:

A C E
5 3
1 5 3
1
3 3
0 1 2 1 4
S
2
1 1
2 3 4
B D F

Figure 1: Shortest Path Example: Bold edges give predecessor Π relationships

Negative-Weight Edges:
• Natural in some applications (e.g., logarithms used for weights)

• Some algorithms disallow negative weight edges (e.g., Dijkstra)

• If you have negative weight edges, you might also have negative weight cycles =⇒
may make certain shortest paths undefined!

Example:

See Figure 2

B → D → C → B (origin) has weight −6 + 2 + 3 = −1 < 0!


Shortest path S −→ C (or B, D, E) is undefined. Can go around B → D → C as many
times as you like
Shortest path S −→ A is defined and has weight 2

3
Lecture 15 Shortest Paths I: Intro 6.006 Spring 2008

3 -2
B
4 E
S 2
-6 1
2

A D

Figure 2: Negative-weight Edges

If negative weight edges are present, s.p. algorithm should find negative weight cycles (e.g.,
Bellman Ford)

General structure of S.P. Algorithms (no negative cycles)

Initialize:
    for v ∈ V:
        d[v] ← ∞
        Π[v] ← NIL
    d[S] ← 0

Main:
    repeat:
        select edge (u, v) [somehow]
        “Relax” edge (u, v):
            if d[v] > d[u] + w(u, v):
                d[v] ← d[u] + w(u, v)
                Π[v] ← u
    until all edges have d[v] ≤ d[u] + w(u, v)

4
Lecture 15 Shortest Paths I: Intro 6.006 Spring 2008

Complexity:

Termination? (needs to be shown even without negative cycles)


Could be exponential time with poor choice of edges.

Bad edge order on the example graph v0, v1, . . . , v7:

T(0) = 0,  T(n + 2) = 3 + 2·T(n)  =⇒  T(n) = Θ(2^{n/2}) relaxations

Figure 3: Running Generic Algorithm

Optimal Substructure:
Theorem: Subpaths of shortest paths are shortest paths
Let p = < v0 , v1 , . . . vk > be a shortest path
Let pij = < vi , vi+1 , . . . vj > 0 ≤ i ≤ j ≤ k
Then pij is a shortest path.
Proof:

p0j pij pjk


p = v0 vi vj vk

pij’

Figure 4: Optimal Substructure Theorem

If p′ij is shorter than pij, cut out pij and replace it with p′ij; the result is shorter than p.
Contradiction.

5
Lecture 15 Shortest Paths I: Intro 6.006 Spring 2008

Triangle Inequality:

Theorem: For all u, v, x ∈ V, we have

δ(u, v) ≤ δ(u, x) + δ(x, v)

Proof:

δ (u,v)
u v

δ (u,x) δ (x,v)
x

Figure 5: Triangle inequality

6
Lecture 16 Shortest Paths II: Bellman-Ford 6.006 Spring 2008

Lecture 16: Shortest Paths II: Bellman-Ford


Lecture Overview
• Review: Notation

• Generic S.P. Algorithm

• Bellman Ford Algorithm

– Analysis
– Correctness

Recall:

path p = <v0, v1, . . . , vk> with (vi, vi+1) ∈ E for 0 ≤ i < k

w(p) = Σ_{i=0}^{k−1} w(vi, vi+1)

Shortest path weight from u to v is δ(u, v). δ(u, v) is ∞ if v is unreachable from u, undefined
if there is a negative cycle on some path from u to v.

-ve

u v

Figure 1: Negative Cycle

Generic S.P. Algorithm

Initialize:
    for v ∈ V:
        d[v] ← ∞
        Π[v] ← NIL
    d[S] ← 0

Main:
    repeat:
        select edge (u, v) [somehow]
        “Relax” edge (u, v):
            if d[v] > d[u] + w(u, v):
                d[v] ← d[u] + w(u, v)
                Π[v] ← u
    until you can’t relax any more edges or you’re tired or . . .

1
Lecture 16 Shortest Paths II: Bellman-Ford 6.006 Spring 2008

Complexity:

Termination: Algorithm will continually relax edges when there are negative cycles present.

-1
1 -4
u v
0 1 3 4
1 2 1
0 2
d[u] -1 1
-2 0
etc

Figure 2: Algorithm may not terminate due to negative Cycles

Complexity could be exponential time with poor choice of edges.

Relaxation ORDER: (v0, v1), (v1, v2), all of (v2, . . .), (v0, v2), all of (v2, . . .), . . .

T(n) = 3 + 2·T(n − 2)  =⇒  T(n) = Θ(2^{n/2})

Figure 3: Algorithm could take exponential time

2
Lecture 16 Shortest Paths II: Bellman-Ford 6.006 Spring 2008

5-Minute 6.006
Here’s what I want you to remember from 6.006 five years after you graduate

Exponential Bad:                      Polynomial Good:
  T(n) = C1 + C2·T(n − C3)              T(n) = C1 + C2·T(n / C3)
  if C2 > 1, trouble!                   C2 > 1 okay, provided C3 > 1
  Divide & Explode                      Divide & Conquer

Figure 4: Exponential vs. Polynomial

Bellman-Ford(G,W,S)

Initialize()
for i = 1 to |V| − 1:
    for each edge (u, v) ∈ E:
        Relax(u, v)
for each edge (u, v) ∈ E:
    if d[v] > d[u] + w(u, v):
        report that a negative-weight cycle exists

At the end, d[v] = δ(s, v), if no negative-weight cycles
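A runnable sketch of the same procedure in Python; the edge-list representation (a list of (u, v, w) triples) is an assumption made for illustration, not the notes’ own interface:

    def bellman_ford(vertices, edges, s):
        """Single-source shortest paths; raises if a negative-weight cycle is reachable."""
        d = {v: float('inf') for v in vertices}
        parent = {v: None for v in vertices}
        d[s] = 0
        for _ in range(len(vertices) - 1):       # |V| - 1 passes
            for u, v, w in edges:
                if d[u] + w < d[v]:              # relax edge (u, v)
                    d[v] = d[u] + w
                    parent[v] = u
        for u, v, w in edges:                    # one extra pass: cycle detection
            if d[u] + w < d[v]:
                raise ValueError("negative-weight cycle reachable from s")
        return d, parent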

1
∞ -1 -1
B 1 B 1
-1 2 -1 2
3 3
0 4 7 0 4 7
A 2 E ∞ A 2 E ∞ 1
3 1 3 1 1

5 8 5 8
4 2 -3 4 2 -3
6 6
C D C D
5 5 1 -2
4 2 ∞ ∞ 2 ∞ 2 3
2 3

End of pass 1 End of pass 2 (and 3 and 4)

Figure 5: The numbers in circles indicate the order in which the δ values are computed

3
Lecture 16 Shortest Paths II: Bellman-Ford 6.006 Spring 2008

Theorem:
If G = (V, E) contains no negative-weight cycles, then after Bellman-Ford executes, d[v] = δ(s, v) for all v ∈ V.
Proof:
Let v ∈ V be any vertex. Consider a path p from s to v that is a shortest path with the minimum number of edges.

(p: s = v0 → v1 → · · · → vk = v, where δ(s, vi) = δ(s, vi−1) + w(vi−1, vi))

Figure 6: Illustration for proof

Initially d[v0] = 0 = δ(s, v0), and it is unchanged since there are no negative cycles.

After 1 pass through E, we have d[v1] = δ(s, v1)
After 2 passes through E, we have d[v2] = δ(s, v2)
After k passes through E, we have d[vk] = δ(s, vk)
No negative-weight cycles =⇒ p is simple =⇒ p has ≤ |V| − 1 edges

Corollary
If a value d[v] fails to converge after | V | −1 passes, there exists a negative-weight cycle
reachable from s.

4
Lecture 17 Shortest Paths III: Dijkstra 6.006 Spring 2008

Lecture 17: Shortest Paths III - Dijkstra and


Special Cases
Lecture Overview
• Shortest paths in DAGs

• Shortest paths in graphs without negative edges

• Dijkstra’s Algorithm

Readings
CLRS, Sections 24.2-24.3

DAGs:
Can’t have negative cycles because there are no cycles!

1. Topologically sort the DAG. Path from u to v implies that u is before v in the linear
ordering

2. One pass over the vertices in topologically sorted order, relaxing each edge that leaves
   each vertex
   Θ(V + E) time

Example:

6 1
r s t x y z
5 2 7 -1 -2
∞ 0 ∞ ∞ ∞ ∞
4
3
2

Figure 1: Shortest Path using Topological Sort

Vertices sorted left to right in topological order

Process r: stays ∞. All vertices to the left of s will be ∞ by definition

Process s: t : ∞ → 2 x : ∞ → 6 (see top of Figure 2)

1
Lecture 17 Shortest Paths III: Dijkstra 6.006 Spring 2008

6 1
r s t x y z
5 2 7 -1 -2
∞ 0 2 6 ∞ ∞
4
3
2
process t, x, y
6 1
r s t x y z
5 2 7 -1 -2
∞ 0 2 6 5 3
4
3
2

Figure 2: Preview of Dynamic Programming

Dijkstra’s Algorithm
For each edge (u, v) � E, assume w(u, v) ≥ 0, maintain a set S of vertices whose final
shortest path weights have been determined. Repeatedly select u � V − S with minimum
shortest path estimate, add u to S, relax all edges out of u.

Pseudo-code

Dijkstra(G, W, s)                       // uses priority queue Q
    Initialize(G, s)
    S ← ∅
    Q ← V[G]                            // insert all vertices into Q
    while Q ≠ ∅:
        u ← EXTRACT-MIN(Q)              // deletes u from Q
        S ← S ∪ {u}
        for each vertex v ∈ Adj[u]:
            RELAX(u, v, w)              // an implicit DECREASE-KEY operation
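A short Python sketch of the same loop, using heapq with lazy deletion in place of an explicit DECREASE-KEY (a common workaround assumed here, not the notes’ own code); Adj maps each vertex to a list of (neighbor, weight) pairs:

    import heapq

    def dijkstra(Adj, s):
        """Shortest-path weights from s, assuming all edge weights are >= 0."""
        d = {s: 0}
        parent = {s: None}
        done = set()                              # the set S of finished vertices
        pq = [(0, s)]
        while pq:
            du, u = heapq.heappop(pq)             # EXTRACT-MIN
            if u in done:
                continue                          # stale entry (lazy deletion)
            done.add(u)
            for v, w in Adj[u]:
                if v not in d or du + w < d[v]:   # RELAX
                    d[v] = du + w
                    parent[v] = u
                    heapq.heappush(pq, (d[v], v)) # stands in for DECREASE-KEY
        return d, parent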

2
Lecture 17 Shortest Paths III: Dijkstra 6.006 Spring 2008

Recall
RELAX(u, v, w):
    if d[v] > d[u] + w(u, v):
        d[v] ← d[u] + w(u, v)
        Π[v] ← u

Example

∞ ∞
2
B D
10
0 4 8 9
A 1
7

3
2
C E
∞ ∞
S={ } { A B C D E } = Q
S={A} 0 ∞ ∞ ∞ ∞
S = { A, C } 0 10 3 ∞ ∞ after relaxing
edges from A
S = { A, C } 0 7 3 11 5 after relaxing
edges from C
S = { A, C, E } 0 7 3 11 5
S = { A, C , E, B} 0 7 3 9 5 after relaxing
edges from B

Figure 3: Dijkstra Execution

Strategy: Dijkstra is a greedy algorithm: choose closest vertex in V − S to add to set S

Correctness: Each time a vertex u is added to set S, we have d[u] = δ(s, u)

3
Lecture 17 Shortest Paths III: Dijkstra 6.006 Spring 2008

Complexity

Θ(V) inserts into priority queue
Θ(V) EXTRACT-MIN operations
Θ(E) DECREASE-KEY operations

Array implementation:
    Θ(V) time for EXTRACT-MIN
    Θ(1) for DECREASE-KEY
    Total: Θ(V·V + E·1) = Θ(V² + E) = Θ(V²)

Binary min-heap:
    Θ(lg V) for EXTRACT-MIN
    Θ(lg V) for DECREASE-KEY
    Total: Θ(V lg V + E lg V)

Fibonacci heap (not covered in 6.006):
    Θ(lg V) for EXTRACT-MIN
    Θ(1) for DECREASE-KEY (amortized cost)
    Total: Θ(V lg V + E)

4
Lecture 18 Shortest Paths III: Dijkstra 6.006 Spring 2008

Lecture 18: Shortest Paths IV - Speeding up


Dijkstra
Lecture Overview
• Single-source single-target Dijkstra
• Bidirectional search
• Goal directed search - potentials and landmarks

Readings
Wagner, Dorothea, and Thomas Willhalm. "Speed-Up Techniques for Shortest-Path
Computations." In Lecture Notes in Computer Science: Proceedings of the 24th Annual
Symposium on Theoretical Aspects of Computer Science. Berlin/Heidelberg, MA: Springer,
2007. ISBN: 9783540709176. Read up to section 3.2.

DIJKSTRA single-source, single-target

Initialize()
Q ← V[G]
while Q ≠ ∅:
    u ← EXTRACT-MIN(Q)        (stop if u = t!)
    for each vertex v ∈ Adj[u]:
        RELAX(u, v, w)

Observation: If only shortest path from s to t is required, stop when t is removed from
Q, i.e., when u = t

DIJKSTRA Demo

4
B D
19 15

A 11 13

7
5
C E

A C E B D D B E C A E C A D B
7 12 18 22 4 13 15 22 5 12 13 16

Figure 1: Dijkstra Demonstration with Balls and String

1
Lecture 18 Shortest Paths III: Dijkstra 6.006 Spring 2008

Bi-Directional Search
Note: Speedup techniques covered here do not change worst-case behavior, but reduce the
number of visited vertices in practice.

S t

backward search
forward search

Figure 2: Bi-directional Search

Bi-D Search

Alternate forward search from s


backward search from t
(follow edges backward)
df (u) distances for forward search
db (u) distances for backward search

Algorithm terminates when some vertex w has been processed, i.e., deleted from the queue
of both searches, Qf and Qb

u u’
3

3 3

s t
5
5

Figure 3: Bi-D Search

2
Lecture 18 Shortest Paths III: Dijkstra 6.006 Spring 2008

Subtlety: After search terminates, find node x with minimum value of df (x) + db (x). x may
not be the vertex w that caused termination as in example to the left!
Find shortest path from s to x using Πf and shortest path backwards from t to x using Πb .
Note: x will have been deleted from either Qf or Qb or both.

u u’ Backward u u’
Forward 3 3

3 df (u) = 3
3 3 db(u’) = 3 3
s
s t
5 5 t db(t) = 0
df (s) = 0 5 5

w df (w) = 5
w db (w) = 5

df (u’) = 6 db(u’) = 3
u u’ u u’
Forward 3 Backward 3

3 3
3 3
df (u) = 3 db(u) = 6
s s t
t
5 5 db(t) = 0
df (s) = 0 5 5

df (w) = 5 w db (w) = 5
w

u u’ u u’ db(u’) = 3
Forward 3 Backward 3
df (u’) = 6 df (u’) = 6
3 3 3
3 df (u) = 3 df (u) = 3
db(u) = 6
s s t
t
5 5 db(t) = 0
5 df (t) = 10 df (s) = 0 5
df (s) = 0 df (t) = 10
db (s) = 10
df (w) = 5 w db (w) = 5
w
df (w) = 5
deleted from both queues
so terminate!

Figure 4: Forward and Backward Search

Minimum value for df (x) + db (x) over all vertices that have been processed in at least one
search
df (u) + db (u) = 3 + 6 = 9

3
Lecture 18 Shortest Paths III: Dijkstra 6.006 Spring 2008

df(u′) + db(u′) = 6 + 3 = 9
df(w) + db(w) = 5 + 5 = 10

Goal-Directed Search or A∗
Modify edge weights with potential function over vertices.

w̄(u, v) = w(u, v) − λ(u) + λ(v)

Search toward target:

v’ v
5 5
increase decrease
go uphill go downhill

Figure 5: Targeted Search

Correctness
w̄(p) = w(p) − λt(s) + λt(t)
So shortest paths are preserved in the modified graph with the w̄ weights.

p
s t

p’

Figure 6: Modifying Edge Weights

To apply Dijkstra, we need w̄(u, v) ≥ 0 for all (u, v).
Choose the potential function appropriately, so that it is feasible.

Landmarks
Small set of landmarks L ⊆ V. For all u ∈ V, l ∈ L, pre-compute δ(u, l).
Potential λt^(l)(u) = δ(u, l) − δ(t, l) for each l.
CLAIM: λt^(l) is feasible.

4
Lecture 18 Shortest Paths III: Dijkstra 6.006 Spring 2008

Feasibility

w̄(u, v) = w(u, v) − λt^(l)(u) + λt^(l)(v)
         = w(u, v) − δ(u, l) + δ(t, l) + δ(v, l) − δ(t, l)
         = w(u, v) − δ(u, l) + δ(v, l) ≥ 0   by the triangle inequality

λt(u) = max_{l∈L} λt^(l)(u) is also feasible

5
Lecture 19 Dynamic Programming I of IV 6.006 Spring 2008

Lecture 19: Dynamic Programming I:


Memoization, Fibonacci, Crazy Eights, Guessing
Lecture Overview
• Fibonacci Warmup

• Memoization and subproblems

• Shortest Paths

• Crazy Eights

• Guessing Viewpoint

Readings
CLRS 15

Dynamic Programming (DP)


Big idea: hard yet simple
• Powerful algorithmic design technique

• Large class of seemingly exponential problems have a polynomial solution (“only”)


via DP

• Particularly for optimization problems (min / max) (e.g., shortest paths)


* DP ≈ “controlled brute force”
* DP ≈ recursion + re-use

Fibonacci Numbers
F1 = F2 = 1; Fn = Fn−1 + Fn−2

Naive Algorithm

follow recursive definition

fib(n):
    if n ≤ 2: return 1
    else: return fib(n − 1) + fib(n − 2)

=⇒ T(n) = T(n − 1) + T(n − 2) + O(1) ≈ φ^n
        ≥ 2·T(n − 2) + O(1) ≥ 2^{n/2}
EXPONENTIAL - BAD!

1
Lecture 19 Dynamic Programming I of IV 6.006 Spring 2008

Fn

Fn-1 Fn-2

Fn-2
Fn-3 Fn-3 Fn-4

Figure 1: Naive Fibonacci Algorithm

Simple Idea

memoize:

memo = {}
fib(n):
    if n in memo: return memo[n]        # already solved: "free"
    if n ≤ 2: f = 1
    else: f = fib(n − 1) + fib(n − 2)
    memo[n] = f
    return f

T(n) = T(n − 1) + O(1) = O(n)
[Side Note: There is also an O(lg n)- time algorithm for Fibonacci, via different techniques]

* DP ≈ recursion + memoization
• remember (memoize) previously solved “subproblems” that make up problem

– in Fibonacci, subproblems are F0 , F1 , · · · , Fn

• if subproblem already solved, re-use solution


* =⇒ time = # of subproblems · time per subproblem

  - in fib: # of subproblems is O(n) and time per subproblem is O(1), giving a total
    time of O(n).

2
Lecture 19 Dynamic Programming I of IV 6.006 Spring 2008

Shortest Paths
• Recursive formulation:
  δ(s, t) = min{ w(s, v) + δ(v, t) | (s, v) ∈ E }

• does this work with memoization?


no, cycles =⇒ infinite loops (see Figure 2).

s t

Figure 2: Shortest Paths

• in some sense necessary for neg-weight cycles

• works for directed acyclic graphs in O(V + E)


(recursion effectively DFS/topological sort)

• trick for shortest paths: remove the cyclic dependency.

  – δk(s, t) = shortest path from s to t using ≤ k edges
             = min( {δk−1(s, t)} ∪ {w(s, v) + δk−1(v, t) | (s, v) ∈ E} )
    . . . except δk(t, t) = 0, and δ0(s, t) = ∞ if s ≠ t

  – δ(s, t) = δn−1(s, t), assuming no negative cycles

  =⇒ time = # subproblems · time per subproblem
          = [O(n³) over s, t, k; really O(n²) needed] · [O(n); really deg(v)]
          = Σ_{v∈V} O(V · deg(v)) = O(V E)

* Subproblem dependency should be acyclic.

3
Lecture 19 Dynamic Programming I of IV 6.006 Spring 2008

Crazy Eights Puzzle


• given a sequence of cards c[0], c[1], · · · , c[n − 1]
e.g., 7♥, 6♥, 7♦, 3♦, 8♣, J♠

• find longest left-to-right “trick” (subsequence)

c[i1 ], c[i2 ], · · · c[ik ] (i1 < i2 < · · · ik )


where c[ij] & c[ij+1] “match” for all j:
they share a suit or a rank, or one of them has rank 8

• recursive formulation:

trick(i) = length of best trick starting at c[i]


= 1 + max(trick(j) for j in range(i + 1, n) if match (c[i], c[j]))
best = max(trick(i) for i in range(n))

• memoize: trick(i) depends only on trick(j) for j > i

  =⇒ time = # subproblems · time per subproblem = O(n) · O(n) = O(n²)
     (to find the actual trick, trace back through the max’s)
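A memoized sketch of this recurrence in Python; the card encoding (rank string followed by a one-character suit, e.g. '7H', 'JS') and the match test are assumptions made for illustration:

    from functools import lru_cache

    def longest_trick(c):
        n = len(c)

        def match(a, b):                 # same suit, same rank, or an 8 involved
            return a[-1] == b[-1] or a[:-1] == b[:-1] or '8' in (a[:-1], b[:-1])

        @lru_cache(maxsize=None)
        def trick(i):                    # length of the best trick starting at c[i]
            best = 1
            for j in range(i + 1, n):
                if match(c[i], c[j]):
                    best = max(best, 1 + trick(j))
            return best

        return max(trick(i) for i in range(n)) if n else 0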

“Guessing” Viewpoint
• what is the first card in best trick? guess!
i.e., try all possibilities & take best result
- only O(n) choices

• what is next card in best trick from i? guess!

– if you pretend you knew, solution becomes easy (using other subproblems)
– actually pay factor of O(n) to try all

• * use only a small # of choices/guesses per subproblem [poly(n), ideally ∼ O(1)]

4
Lecture 20 Dynamic Programming II of IV 6.006 Spring 2008

Lecture 20: Dynamic Programming II: Longest


Common Subsequence, Parent Pointers
Lecture Overview
• Review of big ideas & examples so far

• Bottom-up implementation

• Longest common subsequence

• Parent pointers for guesses

Readings
CLRS 15

Summary
* DP ≈ “controlled brute force”

* DP ≈ guessing + recursion + memoization

* DP ≈ dividing into a reasonable # of subproblems whose solutions relate (acyclically),
  usually via guessing parts of the solution.

* time = # subproblems × time per subproblem
  (treating recursive calls as O(1); the per-subproblem time is usually mainly guessing)

• essentially an amortization
• count each subproblem only once; after first time, costs O(1) via memoization

1
Lecture 20 Dynamic Programming II of IV 6.006 Spring 2008

Examples:       Fibonacci              Shortest Paths                      Crazy Eights

subprobs:       fib(k), 0 ≤ k ≤ n      δk(s, t) ∀s, k < n                  trick(i) = longest trick
                                       = min path s → t using ≤ k edges    starting at card i
# subprobs:     Θ(n)                   Θ(V²)                               Θ(n)
guessing:       none                   edge from s, if any                 next card j
# choices:      1                      deg(s)                              n − i
relation:       = fib(k−1) + fib(k−2)  = min({δk−1(s, t)} ∪                = 1 + max(trick(j)
                                         {w(s, v) + δk−1(v, t)               for i < j < n if
                                          | v ∈ Adj[s]})                     match(c[i], c[j]))
time/subpr:     Θ(1)                   Θ(1 + deg(s))                       Θ(n − i)
DP time:        Θ(n)                   Θ(V E)                              Θ(n²)
orig. prob:     fib(n)                 δn−1(s, t)                          max{trick(i), 0 ≤ i < n}
extra time:     Θ(1)                   Θ(1)                                Θ(n)

Bottom-up implementation of DP:


alternative to recursion
• subproblem dependencies form DAG (see Figure 1)

• imagine topological sort

• iterate through subproblems in that order


=⇒ when solving a subproblem, have already solved all dependencies

• often just: “solve smaller subproblems first”

Figure 1: DAG

Example.
Fibonacci:
for k in range(n + 1): fib[k] = · · ·
Shortest Paths:
for k in range(n): for v in V : d[k, v, t] = · · ·
Crazy Eights:
for i in reversed(range(n)): trick[i] = · · ·

2
Lecture 20 Dynamic Programming II of IV 6.006 Spring 2008

• no recursion and no “already memoized?” tests
  =⇒ faster in practice

• building DP table of solutions to all subprobs. can often optimize space:

– Fibonacci: PS6
– Shortest Paths: re-use same table ∀k

Longest common subsequence: (LCS)


A.K.A. edit distance, diff, CVS/SVN, spellchecking, DNA comparison, plagiarism detection, etc.
Given two strings/sequences x & y, find the longest common subsequence LCS(x, y): sequential
but not necessarily contiguous

• e.g., H I E R O G L Y P H O L O G Y vs. M I C H A E L A N G E L O
common subsequence is Hello

• equivalent to “edit distance” (unit costs): the # of character insertions/deletions needed to
  transform x → y, i.e., everything except the matches

• brute force: try all 2^|x| subsequences of x =⇒ Θ(2^|x| · |y|) time

• instead: DP on two sequences simultaneously

* Useful subproblems for a string/sequence x:

  • suffixes x[i:]        } Θ(|x|) of these ⇐= use if possible
  • prefixes x[:i]        }
  • substrings x[i:j]       Θ(|x|²) of these

Idea: Combine such subproblems for x & y (suffixes and prefixes work)

LCS DP

• subproblem c(i, j) = |LCS(x[i:], y[j:])| for 0 ≤ i, j < n
  =⇒ Θ(n²) subproblems
  - original problem ≈ c(0, 0) (this gives the length; we recover the actual sequence later)

• idea: either x[i] = y[j] part of LCS or not =⇒ either x[i] or y[j] (or both) not in
LCS (with anyone)

• guess: drop x[i] or y[j]? (2 choices)

3
Lecture 20 Dynamic Programming II of IV 6.006 Spring 2008

• relation among subproblems:

  if x[i] == y[j]:  c(i, j) = 1 + c(i + 1, j + 1)
      (otherwise x[i] or y[j] goes unused, which can’t help)
  else:             c(i, j) = max{ c(i + 1, j),  c(i, j + 1) }
                                    [x[i] out]    [y[j] out]

  base cases: c(|x|, j) = c(i, |y|) = 0
  =⇒ Θ(1) time per subproblem
  =⇒ Θ(n²) total time for DP

• DP table: see Figure 2

(table of c(i, j) for 0 ≤ i ≤ |x| and 0 ≤ j ≤ |y|: each cell depends on the cell diagonally below-right when x[i] == y[j], and on the cells below and to the right otherwise)
[linear space is possible via an antidiagonal order]

Figure 2: DP Table

• recursive DP:

def LCS(x, y):
    seen = {}
    def c(i, j):
        if i >= len(x) or j >= len(y):
            return 0
        if (i, j) not in seen:
            if x[i] == y[j]:
                seen[i, j] = 1 + c(i + 1, j + 1)
            else:
                seen[i, j] = max(c(i + 1, j), c(i, j + 1))
        return seen[i, j]
    return c(0, 0)

4
Lecture 20 Dynamic Programming II of IV 6.006 Spring 2008

• bottom-up DP:

def LCS(x, y):
    c = {}
    for i in range(len(x) + 1):
        c[i, len(y)] = 0
    for j in range(len(y) + 1):
        c[len(x), j] = 0
    for i in reversed(range(len(x))):
        for j in reversed(range(len(y))):
            if x[i] == y[j]:
                c[i, j] = 1 + c[i + 1, j + 1]
            else:
                c[i, j] = max(c[i + 1, j], c[i, j + 1])
    return c[0, 0]
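As a quick sanity check (not from the notes), running the bottom-up version on the example strings above gives the length found earlier:

    print(LCS("HIEROGLYPHOLOGY", "MICHAELANGELO"))    # 5, e.g. the subsequence "HELLO"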

Recovering LCS: [material covered in recitation]


• to get LCS, not just its length, store parent pointers (like shortest paths) to remember
correct choices for guesses:

if x[i] == y[j]:
    c[i, j] = 1 + c[i + 1, j + 1]
    parent[i, j] = (i + 1, j + 1)
else:
    if c[i + 1, j] > c[i, j + 1]:
        c[i, j] = c[i + 1, j]
        parent[i, j] = (i + 1, j)
    else:
        c[i, j] = c[i, j + 1]
        parent[i, j] = (i, j + 1)

• . . . and follow them at the end:

lcs = []
here = (0, 0)
while c[here]:
    i, j = here
    if x[i] == y[j]:
        lcs.append(x[i])
    here = parent[here]

5
Lecture 21 Dynamic Programming III of IV 6.006 Spring 2008

Lecture 21: Dynamic Programming III: Text


Justification, Parenthesization, Knapsack,
Pseudopolynomial Time, Tetris Training
Lecture Overview
• Text Justification

• Parenthesization

• Knapsack

• Pseudopolynomial Time

• Tetris Training

Readings
CLRS 15

�������
� �� �� ��� ����� ����������� � ��������

� � ���� ������

��� ����� ������������ ����� � ���������

��� ����� ����� �� ���������� ����� � �������

��� ������ �������� ���������� ������� �������������

��� ������� � ������� �� ����� �� ����� ������ ���

���� � ������������� � ��������

������ ����������� ������� ������������

��� ����� �������� ������� � � ���������� �� �������� ���� �� ����� � =⇒ ����� �����

� ��� ���������� ���� ����������� ��� ����� ������� �� ������ �� ����������


������� �� ������� ����������� ��� �� �� ����� ������ ����

���� �������������
����� ���� ���� ����� ������

• ������� ��� ��������� ����� ���������� ��� �� ���� ����� �� �� ���� ����� ������

• ��� ���� ��� ���� ���� ��� �����

blah blah blah blah blah


.. b l a h vs. blah blah . .
reallylongword reallylongword

������ �� ���� ��� ��� ������������

• ����� �������(i, j) ��� ���� �� ����� [i : j] �����


�� ����� ������ > ���� �����

(���� ����� � ����� ������)


3 ����


• ����� ����� ����� ���� ����� �� min �������

�� ���������� � min ������� ��� ���� �����[i :]


=⇒ � ����������� = Θ(n) ����� n=� �����

�� �������� � ����� �� ��� ���� ����� ��� i:j


=⇒ � ������� = n − i = O(n)

�� ���������

• ��[i] = min��������(i, j) � ��[j] ��� j �� �����(i + 1, n + 1)�


• ��[n] =φ
=⇒ ���� ��� ���������� = O(n)

�� ����� ���� = O(n2 )

�� �������� = ��[φ]

�� ��� ������ �������� �� ������� ������


������� �� ������� ����������� ��� �� �� ����� ������ ����

�����������������
������� ���������� �� ����������� ���������� � ����� ����������� ����������� ��������

A B C
(AB)C costs θ(n2)
. . A(BC) costs θ(n)

������ �� Evaluation of an Expression

�� �������� � ��������� �������������� · · · )(����


(���� ··· )
↑k−1 ↑k
=⇒ � ������� = O(n)
�� ����������� � ������� � �������� ��

� ���� �� ��������� A[i : j]


=⇒ � 2
����������� � Θ(n )

�� ���������

• ��[i, j] = min(�� [i, k] + ��[k, j]+ ���� �� ����������� (A[i] · · · A[k − 1]) ��

(A[k] · · · A[j − 1]) ��� k �� �����(i + 1, j))


• ��[i, i + 1] = φ

=⇒ ���� ��� ���������� = O(n)

�� ����� ���� = O(n3 )

�� �������� = DP [0, n]
�� ��� ������ �������� �� ������� ��������

���������
�������� �� ���� S ��� ���� �� ����

• ���� i ��� ������� ���� si � ���� ����� vi

• ����� ������ ������ �� ����� �� ������� ����� ����� ������� �� ����� ���� ≤S

����� ��������

�� ���������� � ����� ��� ����� �� �����

�� �������� � ������� �� ������� ���� i =⇒ � ������� � 2

�� ���������


������� �� ������� ����������� ��� �� �� ����� ������ ����

• DP [i] = max(DP [i + 1], vi + DP [i + 1] �� � ≤�


si � S?!)

• ��� ������ ����������� �� ���� ������� ���� i ��� � ��� ���� ����� �� �����

������

�� ���������� � ����� ��� ���� i�


����� �������� �� ���� X
=⇒ � ����������� = O(nS) �

�� ���������

• DP [i, X] = max(DP [i + 1, X], vi + DP [i + 1, X − si ] �� si ≤ X)


• DP [n, X] = φ
=⇒ ���� ��� ���������� = O(1)

�� ����� ���� = O(nS)

�� �������� = DP [φ, S]
�� ��� ������ �������� �� ������� �������

�������� ���������� ������ ��� �������� ��������

�������� �� �� ���� ������������ =⇒ ������� �� ��������������� ��������� �����������

�� ������ �� �������

���� ������

• ���� ����� =< S, s0 , · · · , sn−1 , v0 , · · · , vn−1 >

• ������ �� ������� O(lg S + lg s0 + · · · ) ≈ O(n lg . . .)

• �� O(nS) �� ��� �����������������

• O(nS) ����� ������ ���� �� S �� �����

• � ���������������� ������ ���������� �� ������ �� ����� � �������� �� ��� �����

���������

���������� � ����

����������� � ���

���������� � �� ��


������� �� ������� ����������� ��� �� �� ����� ������ ����

������ �� ������

������ ���������
• ����� �������� �� n ������ ������ � � ����� �� ����� ����� w

• ���� ������ ����������� � x ���������� ��� ����

• ���� ���� ���� ����� ���� �� ���� ���������

• ���� ���� �� ��� �����

������� ����� �������������� �� ����� ����� ����� �� w ����� ���� ������������

• ����� ������� ����� ���� ������ ������ h

��������� ����� ������� �� �����������

����� ��������

�� ���������� � ������� �� ����� i�� �����

�� �������� � ��� �� ���� ����� i =⇒ � ������� = O(w)

�� ��������� DP [i] = DP [i + 1] �� ��� ������ ������������

���� �� �� ���� �� ���� ����� ����� � ��

�� ���������� � �������� �� ���� i�


����� ������� ������ ����������� h0 , h1 , · · · , hw−1
=⇒ � ����������� = O(n · hw )
�� ��������� DP [i, h] = max(DP [i, m] ��� ����� ����� m �� ����� i �� h)

=⇒ ���� ��� ���������� = O(w)

�� ����� ���� = O(nwhw )

�� �������� = DP [φ, φ]
�� ��� ������ �������� �� ������� ������


Lecture 22 Dynamic Programming IV of IV 6.006 Spring 2008

Lecture 22: Dynamic Programming IV: Piano


Fingering, Structural DP (Trees), Vertex Cover,
Dominating Set, Beyond
Lecture Overview
• Piano Fingering

• Structural DP (trees)

• Vertex cover & dominating set

• ������� ���������� ������ ������� �������

Readings
CLRS 15

�������
� ���� ����� ��� ��

�� ����������� ������ � ������

�� �������� ����� � ������

�� �������� ���� ���� �����

�� �� ���� ������ ���������

�� �������� �������

� � ����� �� ���������
�� �� �� ����� ����� ����� ����������� �� ��� ����� �� ����� �� ������ ����������

�� �� �� ������ ���� ����������� �� ����� ���� ��������� �� �������� ����� �� ��������

���

• ���������� ������ ���� ��������� �� subproblems.


• ���� ������ ���������� ���� �������� �� solution.


������� �� ������� ����������� �� �� �� ����� ������ ����

����� ���������
���������� �������� ������� ���������� ������� �����

������ ������ ���� �����

��� ������� �������� ������� ����� ����

• ����� ������� ����� �� ����� ��� �������� �� �������� ����� ���� ����� ����

• ������ d(f, p, g, q) �� �������� ����� ���� ���� p ���� ����� f �� ���� q ���� ����� g

����� 1 < f < g & p > q =⇒ �������������


������� ����� p � q =⇒ �������������

������ �������� =⇒ ∞ �� f = g

���������� ����� ������ �� ����� g � {4, 5}

3 → 4 & 4 → 3 �������� ∼ ����

����� ��������
�� ���������� � ���� ��������� ��� ����� �����[i :]

�� �������� � ����� f ��� ���� ����[i]

�� ��[i] = min(DP [i + 1] + d(����[i], f, ����[i + 1], ?) ��� f · · · )


→ ��� ������ �����������

�� ���������� � min �������� ��� ���� ��������� ����� ����� f �� ���� ����[i]

�� �������� � ����� g ��� ���� ����[i + 1]

�� ������ � � min(DP [i + 1, g] + d(����[i], f, ����[i + 1], g) ��� g � �����(F ))


← � ������ � � ��� ������
��[n, f ] = φ

�� Fn ������������ F ������� ��� ���������� =⇒ O(F 2 n) ����

�� min(��[φ, f ] ���f �� �����(F ))


������� �� ������� ����������� �� �� �� ����� ������ ����

���������� ���
������ ������������� ��������� ����� ���� � ���������������� ��� ������� �� ���������� ���

������� ����������

� ��� �� �� ������ ������ ���������� �� ������� ������ �� ������ v� ��� ��� v

Figure 1: DP on Trees

������ ������
���� ������� ��� �� �������� ������� ���� ���� ����� ���� �� ������� �� ≥1 ���

• ����������� �� ������� ������

• ���������� ��� ������

�� ���������� � ���� ����� ��� ������� ������ �� v


=⇒ n �����������

�� �������� � �� � �� ������

YES

NO
Figure 2: Vertex Cover


������� �� ������� ����������� �� �� �� ����� ������ ����

� =⇒ 2 �������
� ��� =⇒ ����� �������� �����
=⇒ ���� ���� �������� ��������

� �� =⇒ ��� �������� ���� �� �� �����

=⇒ ���� ���� ������������� ��������

�� ��[v] = min(1 + ������[c] ��� c �� ��������[v]) ���

������������� � ������[g] ��� g �� �������������(v)�� ��

�� ���� = O(n)
�� ��������

���������� ����
���� ������� ��� �� �������� ���� ���� ����� ������ �� �� �� �������� �� ���

� ����� ����������� �� ������� ���������� �� ������

��������� ����� ������� �� �����������


�� ���������� � ���� ���� ��� ������� ������ �� v

�� �������� � �� � �� ���� ����

• ��� =⇒ �������� ��������

• �� =⇒ ���� ��� ���� ����� �� ���� ���

=⇒ �������� ���� ������� ��������


�� ��[v] = min(1 + ���(DP [c] ��� c �� ��������[v] Y ES
� �� �
��� � �� ������� ��������� · · · ���� �������

1 + ���(DP (c) ��� c �= d �� ��������[v])) ��


+ ���(DP � [g] ��� � �� ��������[d])) ��
����� ������� ��������� ∼ �������� �������

� �������� �� ��� ������ ���� ���

��� d �� ��������[c]� ← ����� ����� � ��� �

��� ���������� � � ���� ���� ��� ������� ������ �� � ����� ���� � ��������� �������

��� ������ �����������

=⇒ 2n ����������� �����

��� ���[v] = min( 1+ ���(DP


� [c] ��� c �� ��������[v], ���

���(DP [c] ��� c �� ��������[v])) ��


�� ���� � O( deg(v)) = O(E) = O(n)

�� ��������


������� �� ������� ����������� �� �� �� ����� ������ ����

�������
����������
���� ������ ��� ������ ������ ���� ���������� ����������� �∼ � ������

• ���� �������� ���� ��� ����������� �� ������� ��� �� ������ �� ���� ������ ��� ��

������ �������
������ ����� ����������� �� �����

Figure 3: Planar Graphs

• ������ ������ ����� ���� ��� ������� ��� ������ �

• ����� ���� ����� ��� ����� ������ � � ��

�������� ���� ������ φ, 1, · · · , k − 1 �������

• �� ��� ������ ��������� ����� �� � ������ ����� �� ��������� O(k)


=⇒ ��� ����� ���� ���������� �� ���������

• ��� ������� ����� ��������� �� ����� �������� ������� ��� ���������� ��� ������ 1+1/k
������ �� ������� � � � ∀ ��������� k

������� �������� ���� ����������


�������������� �� ��� ���� ������

• �� �� ���������� �� ������ �������� ���������


Lecture 23 Numerics I 6.006 Spring 2008

Lecture 23: Numerics I


Lecture Overview

• Irrationals

• Newton's method (√a, 1/b)

• High precision multiply ←

• ���� ����

� ���� ��������� ����� ���������� ����������

� ���� ��������� ��������

������������

���������� ���������� ���� � �������� �������� ��� ��� ���� ��� ���������������� ����� �����

��� �� ��������� �� � ����� � �� ������ ��� ����� �������������

√2
1

Figure 1: Ratio of a Square's Diagonal to its Sides

���������� ���������� �������

���� �� �������

����������� ���� � �������

���������� ��������� ��� ����� ������ �������� �� ������������ ��� ��� ��� � ��������


2 = 1. 414 213 562 373 095
048 801 688 724 209
698 078 569 671 875


������� �� �������� � ����� ������ ����

����������

������� ��������

��� � �� �������� ����������� ������� ��� ����������� ������ ��

• λ�P �λ �� ����� �������

• �� α, β � P � ���� (α)β � P

����� �������� �������� ����� ������ ��� �� �������� ��� ���� � ���� � ������ α, β �����

��� �������� (()) ()() �������� �� (()) ()()


���� ����
α β

�����������

Cn � ������ �� �������� ����������� ������� ���� ������� n ����� �� �����������

C0 = 1 ����� ������

Cn+1 ? ����� ������ ���� n+1 ����� �� ����������� ��� �� �������� �� � ������ ��� ���

���� 2�

��� ����� ���� ����� ���������� ���� ��� �����

k ����� ���� α� n − k ����� ���� β


n

Cn+1 = Ck · Cn−k n≥0
k=0
2
C0 = 1 C1 = C0 = 1 C2 = C0 C1 + C1 C0 = 2 C3 = · · · = 5

�� �� �� �� ��� ��� ���� ���� ����� ����� ������

������ ������� ������� �������� ��������

��������� ���������� ���������� �����������

����������� ������������ ������������

������������� �������������� �������������� � � �


������� �� �������� � ����� ������ ����

�������� �������

1 A
C D
1000,000,000,000

Figure 2: Geometry Problem

BD = 1
���� �� AD�

AD = AC − CD = 500, 000, 000, 000 − 500, 000, 000, 0002 − 1
� �� �
a

����� ��������� AD �� � ������� �������

�������� ������

���� ���� �� f (x) = 0 ������� ���������� ������������� ����� f (x) = x2 − a


-

xi

xi+1
y = f(x)

Figure 3: Newton's Method


������� �� �������� � ����� ������ ����

������� �� (xi , f (xi )) �� ���� y = f (xi ) + f � (xi ) · (x − xi ) ����� f � (xi ) �� ��� �����������

xi+1 = ��������� �� ������

f (xi )
xi+1 = xi −
f � (xi )

������ �����

f (x) = x2 − a
a
χi +
(χi 2 − a) χi
χi+1 = χi − =
2χi 2
�������
χ0 = 1.000000000 a=2
χ1 = 1.500000000
χ1 = 1.416666666
χ1 = 1.414215686
χ1 = 1.414213562

��������� ������������ � ������ �������

���� ��������� �����������



2 �� d������ ����������
�···
1� .414213562373
��
√ √� ������
���� ������� �10
d 2� � � 2 · 102d � � �������� ���� �� ������ ����

��� ����� ��� �������� �������



����� ��� �� �� 2� ��� ��� ������� AD�
��� �������� ������������

���� ��������� ��������������

����������� ��� n������ ������� ������ r = 2, 10�


0 ≤ x, y < rn

x = x1 · rn/2 + x0 x1 = ���� ����


n/2
y = y1 · r + y0 x0 = ��� ����

0 ≤ x0 , x1 < rn/2
0 ≤ y0 , y1 < rn/2

z = x · y = x1 y1 · rn + (x0 · y1 + x1 · y0 )rn/2 + x0 · y0
� ��������������� �� ���������� ��� =⇒ ��������� ��������� θ(n2 ) ����


������� �� �������� � ����� ������ ����

����������� ������

log2n
log2n

4T(n/2) 3T(n/2)
4log n = nlog 4 = n2
2 2
3log n = nlog 3
2 2

Figure 4: Branching Factors

���

z0 = x0 · y0
z2 = x2 · y2
z1 = (x0 + x1 ) · (y0 + y1 ) − z0 − z2
= x0 y1 + x1 y0
z = z2 · rn + z · rn/2 + z0

����� ��� ����� ���������� �� ��� ����� �������������


T (n) = ���� �� �������� ��� n������� s
= 3T (n/2) + θ(n)
� �
= θ nlog2 3 = θ n1.5849625···
� �

������ ���� θ(n2 )� ������ ���� �����


������� �� �������� � ����� ������ ����

����� �������� �� �������� ������



������� Xn = a · (1 + �n ) �n ��� �� � �� �

�����

Xn + a/Xn
Xn+1 =
√ 2
a(1 + �n ) + √ a
a(1+�n )
=
� 2 �
1
� (1 + �n ) + (1+�n )
= (a)
2
2 + 2�n + �n 2
� �

= (a)
2(1 + �n )
�n 2
� �

= (a) 1 +
2(1 + �n )

����������
�n 2
�n+1 =
2(1 + �n )
��������� ������������ �� � ������ ��������


Lecture 24 Numerics II 6.006 Spring 2008

Lecture 24: Numerics II


������� ��������

• �������

� ���� ��������� ����������


� ��������������
• ��������

� ���������
� ����� ��������
• �����������

�������

���� ��������� ����� �� 2� √
� 2 · 102d � d = 106

������� � a� ��� �������� ������

χ0 = 1 �������� ������
χi + a/χi
χi+1 = ← ���������
2
��������� �������������� � ������� ������ ������� ���� �����

���������������

�� ����� ������ � ������� ������� θ(d2 ) ����


�� ���������� θ(dlog2 3 ) = θ(d1.584... )
�� ��������� ����������� ��������� ������ ���� k ≥ 2 ����� �
� �
T (d) = 5T (d/3) + θ(d) = θ dlog3 5 = θ d1.465...
� �

�� ������������������ � ������ ������� θ(d lg d lg lg d) ����� ���� ��� �� ����� ��� ��


���� �������
� ∗

�� ������������ θ n log n2O(log n) ����� log∗ n �� �������� ���������� � ����� ��� �����
�� �� ������� �� ��� � ������ ���� �� ���� ���� �� ����� �� 1�


������� �� �������� �� ����� ������ ����

���� ��������� ��������


a
�� ���� ���� ��������� ��� ��
b
1
• ������� �������������� ��� �� ����
b
1 R
• �������������� ��� �� ����� � � ����� R �� ����� ����� ���� �� �� ���� �� ������ ��
b b
R
��� R = 2k ��� ������ ���������������

��������
R
�������� ������ ��� ���������
b
� �
1 b R
f (x) = − ���� �� x =
x R b
−1
f � (x) =
x2 � �
1
b
f (χi ) χi
R −
χi+1 = χi − = χi −
f � (χi ) −1/χi 2
bχi 2 → ��������
� �
1 b
χi+1 = χi + χi 2 − = 2χi −
χi R R → ���� ���

�������

R 216 65536
���� = = = 13107.2
b 5 5
216
��� ������� ����� = 214
4

χ0 = 214 = 16384
χ1 = 2 · (16384) − 5(16384)2 /65536 = 12288
χ2 = 2 · (12288) − 5(12288)2 /65536 = 13056
χ3 = 2 · (13056) − 5(13056)2 /65536 = 13107


������� �� �������� �� ����� ������ ����

����� ��������

bχi 2 R
χi+1 = 2χi − ������ χi = (1 + �i )
R b
b R 2
� �
R
= 2 (1 + �i ) − (1 + �i )2
b R b
R�
(2 + 2�i ) − (1 + 2�i + �i 2 )

=
b
R� � R
= 1 − �i 2 = (1 + �i+1 ) ����� �i+1 = −�i 2
b b
��������� ������������ � ������ ������� �� ���� ����
��������� ���������� �� �������� � ���������� �� ��������������

�����������

χ + �a/χ �
���������� χi+1 = � i i

2
�� ����� ����� ���� ������� ����������
��������� ��
a
χi + χi −α
χi+1 = −β
2
a
χi + α
=
χi
−γ ����� γ = + β ��� 0 ≤ γ < 1
2 2

a+b √ χi + χai √ √
����� ≥ ab, ≥ a� �� ����������� γ ������ ������ �� ≥ � a�� ���� �����
2 2
���� ����� ����� �� �i < 1 ����� ������� ������


Lecture 25 Beyond 6.006 6.006 Spring 2008

Lecture 25: Beyond 6.006: Follow-on Classes,


Geometric Folding Algorithms
Algorithms Classes at MIT: (post 6.006)
1. 6.046: Intermediate Algorithms (more advanced algorithms & analysis, less coding)

2. 6.047: Computational Biology (genomes, phylogeny, etc.)

3. 6.854: Advanced Algorithms (intense survey of whole field)

4. 6.850: Geometric Computing (working with points, lines, polygons, meshes, . . . )

5. 6.851: Advanced Data Structures (sublogarithmic performance)

6. 6.852: Distributed Algorithms (reaching consensus in a network with faults)

7. 6.855: Network Optimization (optimization on graphs: beyond shortest paths)

8. 6.856: Randomized Algorithms (how randomness makes algorithms simpler & faster)

9. 6.857: Network and Computer Security (cryptography)

10. 6.885: Geometric and Folding Algorithms * TODAY

Other Theory Classes:


• 6.045: Automata, Computability, & Complexity

• 6.840: Theory of Computing

• 6.841: Advanced Complexity Theory

• 6.842: Randomness & Computation
