Week 09
Greedy Techniques
Hoang Dau
RMIT University
Email : [email protected]
Lecture 9
Learning outcomes:
• Understand and be able to apply the greedy approach to solve
interesting problems.
• Examples:
• spanning tree – Prim’s algorithm
• spanning tree – Kruskal’s algorithm
• single source shortest-path – Dijkstra’s algorithm
• data compression
1 Overview
2 Prim’s Algorithm
3 Kruskal’s Algorithm
4 Dijkstra’s Algorithm
5 Data Compression
6 Summary
[Figure: a small weighted graph on vertices a, b, c, d (edge weights 1, 2, 3, 5), shown alongside several of its spanning trees.]
[Figure: weighted graph on vertices a–g (edge weights 5, 6, 7, 7, 8, 8, 9, 9, 11, 15), used to trace Prim's algorithm step by step.]
VT = {}, PQ = {}
VT = {d, a, f, b, e, c, g}, PQ = {}
https://www.youtube.com/watch?v=5mnOClCO_9o
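The trace above can be sketched in Python, using a heap as the priority queue PQ and a set as VT. This is a minimal sketch; the small example graph at the end is hypothetical, not the a–g graph from the slides.

```python
import heapq

def prim_mst(graph, start):
    """Prim's algorithm: grow a tree from `start`, always adding the
    cheapest edge that connects a vertex not yet in the tree."""
    vt = {start}                                     # VT: vertices in the tree
    pq = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(pq)                                # PQ: candidate edges
    tree_edges, total = [], 0
    while pq and len(vt) < len(graph):
        w, u, v = heapq.heappop(pq)
        if v in vt:
            continue                                 # edge leads back into the tree
        vt.add(v)
        tree_edges.append((u, v, w))
        total += w
        for nxt, w2 in graph[v].items():             # add v's edges as candidates
            if nxt not in vt:
                heapq.heappush(pq, (w2, v, nxt))
    return tree_edges, total

# A small illustrative graph (hypothetical, not the one from the slides).
graph = {
    'a': {'b': 1, 'c': 5},
    'b': {'a': 1, 'd': 2},
    'c': {'a': 5, 'd': 3},
    'd': {'b': 2, 'c': 3},
}
edges, total = prim_mst(graph, 'a')
print(total)  # → 6 (tree edges ab=1, bd=2, dc=3)
```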
Kruskal’s Algorithm – Example

[Figure: weighted graph on vertices a–f with edges bc = 1, ef = 2, ab = 3, bf = 4, cf = 4, af = 5, df = 5, ae = 6, cd = 6, de = 8.]

All Edges : bc ef ab bf cf af df ae cd de
             1  2  3  4  4  5  5  6  6  8
Tree Edges : ∅

Hoang Dau (RMIT University) Greedy Techniques Lecture 9 17 / 61
Kruskal’s Algorithm – Example (continued)

The sorted edge list is scanned left to right; each edge is accepted unless it would create a cycle:

Tree Edges : bc
Tree Edges : bc ef
Tree Edges : bc ef ab
Tree Edges : bc ef ab bf
Tree Edges : bc ef ab bf df
              1  2  3  4  5

(cf and af are skipped along the way: each would connect two vertices already in the same component, creating a cycle.)
Kruskal’s Algorithm – Summary
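The procedure traced above can be sketched in Python with a simple union-find structure to detect cycles; a minimal sketch, using the edge list from the example:

```python
def kruskal_mst(vertices, edges):
    """Kruskal's algorithm: scan edges in order of weight and accept each
    edge that does not create a cycle, tracked with union-find."""
    parent = {v: v for v in vertices}

    def find(v):                              # root of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]     # path halving
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                          # different components: accept
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# The example graph from the slides, edges in sorted order:
# bc=1, ef=2, ab=3, bf=4, cf=4, af=5, df=5, ae=6, cd=6, de=8.
edges = [(1, 'b', 'c'), (2, 'e', 'f'), (3, 'a', 'b'), (4, 'b', 'f'),
         (4, 'c', 'f'), (5, 'a', 'f'), (5, 'd', 'f'), (6, 'a', 'e'),
         (6, 'c', 'd'), (8, 'd', 'e')]
tree = kruskal_mst('abcdef', edges)
print(tree)  # accepts bc, ef, ab, bf, df; cf and af are rejected
```

Note the accepted edges match the trace: bc, ef, ab, bf, df.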
Problem
Given a weighted connected graph, the shortest-path problem asks to find a shortest path from a given source vertex to a given target vertex.
Problem
Given a weighted connected graph, the single-source shortest-paths problem asks to find shortest paths from a single source vertex to all other vertices.

Idea:
• At all times, we maintain our best estimate of the shortest-path distance from the source vertex to every other vertex.
• Initially we know nothing, so all these distance estimates are ∞.
• As the algorithm explores the graph, we update the estimates, and they converge to the true shortest-path distances.
[Figure: weighted graph on vertices a–e (edges ab = 3, bd = 2, bc = 4, cd = 5, ce = 6, ad = 7, de = 4), traced step by step by Dijkstra’s algorithm from source a.]

Fringe: c(b,7) e(d,5+4)
S = {a(a,0), b(a,3), d(b,5)}

Fringe: e(d,9)
S = {a(a,0), b(a,3), d(b,5), c(b,7)}
Length Path
3 a-b
5 a-b-d
7 a-b-c
9 a-b-d-e
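The trace can be sketched in Python with a heap-based priority queue. A minimal sketch; the edge weights are as read from the figure (an assumption where the extraction is ambiguous):

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra's algorithm: repeatedly settle the unvisited vertex with
    the smallest tentative distance, then relax its outgoing edges."""
    dist = {v: float('inf') for v in graph}   # initial estimates are all ∞
    dist[source] = 0
    pq = [(0, source)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:
            continue                          # stale queue entry
        visited.add(u)
        for v, w in graph[u].items():
            if d + w < dist[v]:               # found a shorter path to v
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

# The example graph from the slides (assumed edge weights:
# ab=3, bd=2, bc=4, cd=5, ce=6, ad=7, de=4).
graph = {
    'a': {'b': 3, 'd': 7},
    'b': {'a': 3, 'c': 4, 'd': 2},
    'c': {'b': 4, 'd': 5, 'e': 6},
    'd': {'a': 7, 'b': 2, 'c': 5, 'e': 4},
    'e': {'c': 6, 'd': 4},
}
dist = dijkstra(graph, 'a')
print(dist)  # → a:0, b:3, d:5, c:7, e:9, matching the path lengths above
```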
a → 00,
c → 01,
g → 10,
t → 11
A A R D V A R K
0x41 0x41 0x52 0x44 0x56 0x41 0x52 0x4B
1000001 1000001 1010010 1000100 1010110 1000001 1010010 1001011
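The fixed-length encoding above can be checked in a couple of lines of Python: every symbol costs the same 7 bits, so the total is simply 8 characters × 7 bits = 56 bits.

```python
# 7-bit ASCII encoding of "AARDVARK": each letter maps to its code point,
# written as a zero-padded 7-bit binary string.
word = "AARDVARK"
codewords = [format(ord(ch), '07b') for ch in word]
print(codewords[0])              # '1000001' (0x41, the letter A)
print(len(''.join(codewords)))   # 56 bits in total
```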
• Fixed-length codewords are not optimal in average bits per source symbol.
• Why? The frequency of appearance of each member of the source alphabet may not be uniformly distributed.
• Consider the letters ‘e’ and ‘z’ in natural-language text: a fixed-length code uses codewords of the same length to represent both.
• e.g., “zee”
• Using ASCII, where each letter is represented by 7 bits, this is 3 × 7 = 21 bits.
[Figure: The frequency of appearance of characters from the English alphabet, extracted from a 267 MB segment of SGML-tagged newspaper text drawn from the WSJ component of the TREC data set.]
Solution?
• A variable-length code maps each member of a source alphabet to a codeword string, but the length of the codewords is no longer fixed.
• E.g., use a shorter codeword for ‘e’ and a longer one for ‘z’.
• “zee”: hypothetically using a 2-bit code for ‘e’ and a 10-bit code for ‘z’, this is 10 + 2 × 2 = 14 bits.
• However, not all possible variable-length coding schemes are decodable.
Symbol     a   b   c   d   e   f   g
Frequency  25  12   9   4   3   2   1

Symbol  Codeword  ℓi
a       0         1
b       1         1
c       00        2
d       01        2
e       10        2
f       11        2
g       110       3

Decode: 0010100010000111001011

This bit string cannot be decoded unambiguously: for instance, the leading 00 could be read as ‘aa’ or as ‘c’. Variable-length codes must be chosen so that text is uniquely decodable.
A uniquely decodable code (no codeword is a prefix of another):

Symbol  Codeword  ℓi
a       0         1
b       100       3
c       110       3
d       111       3
e       1010      4
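A sufficient condition for unique decodability is the prefix property: no codeword is a proper prefix of another. A small sketch checking this (the function name is mine, not from the slides):

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a proper prefix of another codeword."""
    for c1 in codewords:
        for c2 in codewords:
            if c1 != c2 and c2.startswith(c1):
                return False          # c1 is a prefix of c2: ambiguous
    return True

# The prefix code from the table above vs. the ambiguous code seen earlier.
print(is_prefix_free(['0', '100', '110', '111', '1010']))         # True
print(is_prefix_free(['0', '1', '00', '01', '10', '11', '110']))  # False
```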
Idea: Build a prefix tree bottom-up, then read the codewords off this tree.
1 For each symbol, calculate its probability of appearance and construct a leaf node for it.
2 Put these leaf nodes into the set of candidate nodes (to merge).
3 Select the two nodes with the lowest probabilities (from the candidate nodes) and combine them in a “bottom-up” tree-construction step.
4 The new parent has a probability equal to the sum of the two child probabilities, and replaces the two children in the set of candidate nodes. Add links to the children, one link labelled ‘0’, the other ‘1’.
5 When only one candidate node remains, the tree has been formed, and codewords can be read from the edge labels along the path from the root to each leaf.
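The steps above can be sketched in Python with a heap as the set of candidate nodes. A minimal sketch; the probabilities are those assumed from the worked example on the slides:

```python
import heapq

def huffman_code_lengths(probs):
    """Build a Huffman tree bottom-up and return {symbol: codeword length}."""
    # Heap entries are (probability, tie-breaker, node); a node is either a
    # symbol (leaf) or a pair of child nodes (internal node).
    heap = [(p, i, sym) for i, (sym, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # lowest-probability candidate
        p2, _, right = heapq.heappop(heap)   # second-lowest candidate
        heapq.heappush(heap, (p1 + p2, counter, (left, right)))
        counter += 1
    lengths = {}
    def walk(node, depth):
        if isinstance(node, tuple):          # internal: recurse into children
            walk(node[0], depth + 1)         # edge labelled '0'
            walk(node[1], depth + 1)         # edge labelled '1'
        else:
            lengths[node] = max(depth, 1)    # a single-symbol tree still needs 1 bit
        return lengths
    return walk(heap[0][2], 0)

# Probabilities assumed from the example figure on the slides.
probs = {'a': 0.05, 'b': 0.05, 'c': 0.10, 'd': 0.20,
         'e': 0.30, 'f': 0.20, 'g': 0.10}
lengths = huffman_code_lengths(probs)
expected_bits = sum(probs[s] * lengths[s] for s in probs)
print(expected_bits)  # ≈ 2.6 bits per symbol on average
```

Ties may be broken differently by different implementations, giving a different tree, but the expected codeword length of an optimal code is the same either way.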
Huffman Trees – Example
[Figure: Huffman tree built bottom-up from symbol probabilities a = 0.05, b = 0.05, c = 0.10, d = 0.20, e = 0.30, f = 0.20, g = 0.10. Successive merges produce internal nodes 0.10 (a + b), 0.20 (+ c), 0.30 (f + g), 0.40, 0.60, and finally the root 1.00; each left edge is labelled 0 and each right edge 1.]
Symbol  Codeword  ℓi
a       0000      4
b       0001      4
c       001       3
d       01        2
e       10        2
f       110       3
g       111       3
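Because the Huffman code above is prefix-free, a bit string can be decoded by scanning left to right and emitting a symbol as soon as the accumulated bits match a codeword. A minimal sketch of such a decoder:

```python
# The codeword table from the Huffman example above.
code = {'a': '0000', 'b': '0001', 'c': '001', 'd': '01',
        'e': '10', 'f': '110', 'g': '111'}
decode_table = {cw: sym for sym, cw in code.items()}

def decode(bits):
    """Decode a bit string under the prefix code above, one codeword at a time."""
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in decode_table:       # a complete codeword has been read
            out.append(decode_table[buf])
            buf = ''
    if buf:
        raise ValueError('trailing bits do not form a codeword')
    return ''.join(out)

print(decode('01100000'))  # '01'→d, '10'→e, '0000'→a, so prints 'dea'
```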
Summary