13 Union-Find Structures
Yves Lucet
"Hash table 3 1 1 0 1 0 0 SP" by Jorge Stolfi - Own work. Licensed under CC BY-SA 3.0 via Commons -
https://commons.wikimedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg#/media/File:Hash_table_3_1_1_0_1_0_0_SP.svg
CC BY-SA 3.0
https://en.wikipedia.org/wiki/Skip_list#/media/File:Skip_list_add_element-en.gif
Wk | Class | Date   | Activity                                        | Reading/Prep given       | Peer   | Lab
1  | 1     | Sep 06 | Syllabus, TBL, Java review                      |                          |        | NO LAB
1  | 1a    | Sep 08 | Git, testing                                    | Java, Generics, Testing  |        |
2  | 2     | Sep 13 | RAT1: generics; unit testing                    | Complexity               |        | 1 unit testing
2  | 3     | Sep 15 | Lect: Complexity, streams                       | Lists                    |        |
3  | 4     | Sep 20 | Build & Critic (training) iMAT1                 |                          |        | 2 coverage testing
3  | 5     | Sep 22 | tMAT1                                           | Recursion                |        |
4  | 6     | Sep 27 | RAT2                                            | Stack, Queue             | Peer 1 | 3 streams
4  | 7     | Sep 29 | Build & Critic                                  | Iterators                |        |
5  | 8     | Oct 04 | mini-lecture+exercises iMAT2                    |                          |        | 4 simulation
5  | 9     | Oct 06 | tMAT2 (Holiday: Mon Oct 2)                      | BST, PQ, heap            |        |
6  | 10    | Oct 11 | RAT3                                            | Hash, skip list          | Peer 2 |
6  | 11    | Oct 13 | Hash table, Skiplist, bottom-up heap construction (Holiday: Mon Oct 9) | 14.6 Shortest path | |
History of Minimum Spanning Trees
History:
• Boruvka's 1926
• Choquet 1938, Florek et al. 1951, Sollin 1965
• Prim-Jarnik's 1930
Fastest:
• Karger, Klein & Tarjan 1995
• randomized
• linear time
• Boruvka's + reverse-delete
(Figure: map of Germany. Optimal route of a salesman visiting the 15 biggest cities in Germany; the route shown is the shortest of the 43,589,145,600 possible ones.)
Travelling Salesman Problem
The problem is hard but…
• Can get a lower bound using MST
• Can estimate the distance to the best solution
Minimum Spanning Trees
Cycle Property
Let T be a minimum spanning tree of a weighted graph G
Let e be an edge of G that is not in T and let C be the cycle formed by e with T
For every edge f of C, weight(f) ≤ weight(e)
(Figure: the cycle C created by adding edge e to the tree T.)
Partition Property
Consider a partition of the vertices of G into subsets U and V
Let e be an edge of minimum weight across the partition
There is a minimum spanning tree of G containing edge e
Proof:
Let T be an MST of G
If T does not contain e, consider the cycle C formed by e with T and let f be an edge of C across the partition
By the cycle property, weight(f) ≤ weight(e)
Thus, weight(f) = weight(e)
We obtain another MST by replacing f with e
(Figure: replacing f with e yields another MST.)
Prim-Jarnik's Algorithm
Similar to Dijkstra's algorithm
We pick an arbitrary vertex s and we grow the MST as a cloud of vertices, starting from s
We store with each vertex v a label d(v) representing the smallest weight of an edge connecting v to a vertex in the cloud
At each step:
• We add to the cloud the vertex u outside the cloud with the smallest distance label
• We update the labels of the vertices adjacent to u
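A minimal Java sketch of this algorithm, assuming an adjacency-list graph given as a Map<Integer, List<int[]>> of {neighbor, weight} pairs (my own representation, not the course's). It uses a lazy java.util.PriorityQueue instead of the adaptable priority queue analyzed below: stale entries are simply skipped when they are removed.

```java
import java.util.*;

public class PrimJarnikSketch {
    /** graph: vertex -> list of {neighbor, weight}; assumed undirected and connected. */
    static int mstTotalWeight(Map<Integer, List<int[]>> graph, int s) {
        Map<Integer, Integer> d = new HashMap<>();   // d(v): smallest edge weight from v into the cloud
        Set<Integer> cloud = new HashSet<>();
        // lazy priority queue of {vertex, key}; a vertex may be inserted more than once
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        d.put(s, 0);
        pq.add(new int[]{s, 0});
        int total = 0;
        while (!pq.isEmpty()) {
            int[] entry = pq.poll();
            int u = entry[0];
            if (cloud.contains(u)) continue;         // stale entry: u is already in the cloud
            cloud.add(u);                            // add the outside vertex with smallest label
            total += entry[1];
            for (int[] e : graph.getOrDefault(u, List.of())) {
                int v = e[0], w = e[1];
                if (!cloud.contains(v) && w < d.getOrDefault(v, Integer.MAX_VALUE)) {
                    d.put(v, w);                     // update the label of neighbor v
                    pq.add(new int[]{v, w});         // "decrease-key" by lazy re-insertion
                }
            }
        }
        return total;
    }
}
```

For a connected graph with n vertices and m edges this lazy variant performs O(m) insertions and removals, so it runs in O(m log m) = O(m log n) time, matching the bound derived below up to constant factors.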
Example
(Figure: four snapshots of Prim-Jarnik's algorithm on a weighted graph with vertices A-F, starting from A with d(A) = 0; at each step the outside vertex with the smallest label joins the cloud and the labels of its neighbors are updated.)
Example (contd.)
(Figure: the remaining snapshots of the algorithm, until every vertex is in the cloud and the MST is complete.)
Analysis
Graph operations
• We cycle through the incident edges once for each vertex
Label operations
• We set/get the distance, parent and locator labels of vertex z O(deg(z)) times
• Setting/getting a label takes O(1) time
Priority queue operations
• Each vertex is inserted once into and removed once from the priority queue, where each insertion or removal takes O(log n) time
• The key of a vertex w in the priority queue is modified at most deg(w) times, where each key change takes O(log n) time
Prim-Jarnik's algorithm runs in O((n + m) log n) time provided the graph is represented by the adjacency list structure
• Recall that Σv deg(v) = 2m
• The running time is O(m log n) since the graph is connected
Kruskal’s Approach
Maintain a partition of the vertices into clusters
Initially, single-vertex clusters
Keep an MST for each cluster
Merge “closest” clusters and their MSTs
A priority queue stores the edges outside clusters
Key: weight
Element: edge
At the end of the algorithm
One cluster and one MST
Example
(Figure: four snapshots of Kruskal's algorithm on a weighted graph with vertices A-H; edges are considered in increasing order of weight and accepted when they connect distinct clusters.)
Example (contd.)
(Figure: the algorithm two and then four steps later, until a single cluster and a single MST remain.)
https://cmps-people.ok.ubc.ca/ylucet/DS/Kruskal.html
Data Structure for Kruskal's Algorithm
The algorithm maintains a forest of trees
• A forest is a disjoint union of trees
A priority queue extracts the edges by increasing weight
An edge is accepted if it connects distinct trees
We need a data structure that maintains a partition, i.e., a collection of disjoint sets, with operations:
• makeSet(u): create a set consisting of u
• find(u): return the set storing u
• union(A, B): replace sets A and B with their union
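A compact Java sketch of Kruskal's algorithm built on exactly these three operations. The {u, v, weight} edge triples and the tiny parent-array partition inside it are illustrative assumptions; the list- and tree-based partition structures on the following slides are the "real" implementations of makeSet/find/union.

```java
import java.util.*;

public class KruskalSketch {
    /** Edges given as {u, v, weight} over vertices 0..n-1; assumes a connected graph
     *  (otherwise a spanning forest is returned). Returns the accepted MST edges. */
    static List<int[]> mst(int n, int[][] edges) {
        // makeSet(u) for every vertex: each vertex starts as its own cluster
        int[] parent = new int[n];
        for (int u = 0; u < n; u++) parent[u] = u;

        // sorting by weight plays the role of the priority queue of edges
        int[][] byWeight = edges.clone();
        Arrays.sort(byWeight, (a, b) -> Integer.compare(a[2], b[2]));

        List<int[]> tree = new ArrayList<>();
        for (int[] e : byWeight) {
            int a = find(parent, e[0]);
            int b = find(parent, e[1]);
            if (a != b) {                           // edge accepted: it connects distinct clusters
                tree.add(e);
                parent[a] = b;                      // union(A, B): merge the two clusters
                if (tree.size() == n - 1) break;    // one cluster and one MST
            }
        }
        return tree;
    }

    /** find(u): walk up to the root that names u's cluster. */
    static int find(int[] parent, int u) {
        while (parent[u] != u) u = parent[u];
        return u;
    }
}
```

Sorting the edges once and scanning them in order delivers them exactly as the priority queue would: by increasing weight.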
Union-Find Partition Structures
Partitions with Union-Find Operations
makeSet(x): Create a singleton set containing the element x and return the position storing x in this set
union(A, B): Return the set A ∪ B, destroying the old A and B
find(p): Return the set containing the element at position p
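One way to write this ADT down in Java is as a small interface. This is a hedged sketch with hypothetical names (Position, Partition), not the textbook's exact API; union here takes two representative positions rather than the sets themselves, a common simplification.

```java
/** A position (node) holding one element of some set in the partition. */
interface Position<E> {
    E getElement();
}

/** Union-find partition ADT: a collection of disjoint sets. */
interface Partition<E> {
    /** makeSet(x): create a singleton set containing x and return its position. */
    Position<E> makeSet(E x);

    /** find(p): return a canonical position identifying the set containing p. */
    Position<E> find(Position<E> p);

    /** union(a, b): merge the sets containing positions a and b into one set. */
    void union(Position<E> a, Position<E> b);
}
```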
List-based Implementation
Each set is stored in a sequence represented with a linked list
Each node should store an object containing the element and a reference to the set name
Analysis of List-based Representation
Total time needed to do n unions and finds is O(n log n)
A single union or find therefore takes O(log n) time on average (amortized)
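A sketch of this representation under my own naming: each set is an ArrayList standing in for the sequence, every node keeps a reference to the list that names its set (so find is O(1)), and union moves the nodes of the smaller set into the larger one.

```java
import java.util.*;

public class ListPartition<E> {
    /** A node stores the element and a reference to the list (set name) containing it. */
    public static class Node<E> {
        final E element;
        List<Node<E>> set;                          // reference to the set name
        Node(E element) { this.element = element; }
        public E getElement() { return element; }
    }

    /** makeSet(x): create a singleton set containing x. */
    public Node<E> makeSet(E x) {
        Node<E> node = new Node<>(x);
        node.set = new ArrayList<>();
        node.set.add(node);
        return node;
    }

    /** find(p): return the set containing p, in O(1) time. */
    public List<Node<E>> find(Node<E> p) {
        return p.set;
    }

    /** union: merge the smaller set into the larger, updating the moved nodes' references. */
    public List<Node<E>> union(Node<E> p, Node<E> q) {
        List<Node<E>> a = p.set, b = q.set;
        if (a == b) return a;                       // already in the same set
        List<Node<E>> small = (a.size() <= b.size()) ? a : b;
        List<Node<E>> large = (small == a) ? b : a;
        for (Node<E> node : small) {                // move every node of the smaller set
            node.set = large;
            large.add(node);
        }
        return large;
    }
}
```

Because a node only changes its set reference when its set is the smaller of the two, each node is moved at most O(log n) times over any sequence of unions, which is where the O(n log n) total above comes from.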
Tree-based Implementation
Each element is stored in a node, which contains a pointer to a set name
A node v whose set pointer points back to v is also a set name
Each set is a tree, rooted at a node with a self-referencing set pointer
For example, the sets "1", "2", and "5":
(Figure: a forest of three trees whose roots are the nodes 1, 2, and 5.)
Union-Find Operations
To do a union, simply make the root of one tree point to the root of the other
To do a find, follow set-name pointers from the node until reaching a node whose pointer refers back to itself; that root names the set
(Figure: union of the trees rooted at 2 and 5, performed by making the root of one tree point to the root of the other.)
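A sketch of exactly this structure in Java, assuming elements are the integers 0..n-1 and using a plain parent array (the array layout and class name are my own):

```java
public class TreePartition {
    private final int[] parent;           // parent[v] == v  means  v is a root / set name

    /** makeSet for elements 0..n-1: every element starts as the root of its own tree. */
    public TreePartition(int n) {
        parent = new int[n];
        for (int v = 0; v < n; v++) parent[v] = v;
    }

    /** find(u): follow parent pointers until reaching the self-referencing root. */
    public int find(int u) {
        while (parent[u] != u) u = parent[u];
        return u;
    }

    /** union: make the root of one tree point to the root of the other. */
    public void union(int u, int v) {
        int ru = find(u), rv = find(v);
        if (ru != rv) parent[ru] = rv;
    }
}
```

Linking roots arbitrarily can produce tall, list-like trees, so a find can cost O(n) in the worst case; union by size and path compression, the refinements behind the bounds on the next slides, are what keep the trees shallow.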
Complexity
• 1964 Galler & Fischer
• 1973 Hopcroft & Ullman
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Iterated logarithm (log star)
Number of times the logarithm function must be iteratively applied before the result is less than or equal to 1

x | lg* x
(−∞, 1] | 0
(1, 2] | 1
(2, 4] | 2
(4, 16] | 3
(16, 65536] | 4
(65536, 2^65536] | 5

https://en.wikipedia.org/wiki/Iterated_logarithm
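A tiny Java helper (my own, just to mirror the table): for integer inputs it iterates the ceiling of lg, which lands in the same row of the table as the real-valued logarithm because all the interval boundaries above are integers.

```java
public class LogStar {
    /** lg*(n) for integers n >= 1: how many times lg must be applied before the result is <= 1. */
    static int logStar(long n) {
        int count = 0;
        while (n > 1) {
            n = 64 - Long.numberOfLeadingZeros(n - 1);   // ceil(lg n), computed exactly
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Matches the table: lg*(2)=1, lg*(4)=2, lg*(16)=3, lg*(65536)=4, lg*(65537)=5
        System.out.println(logStar(2) + " " + logStar(4) + " " + logStar(16)
                + " " + logStar(65536) + " " + logStar(65537));
    }
}
```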
Complexity
• 1964 Galler & Fischer
• 1973 Hopcroft & Ullman
• 1975 Tarjan
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Inverse Ackermann function
Ackermann function
https://en.wikipedia.org/wiki/Ackermann_function#Inverse
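The slide defers to the linked article for definitions; for reference, here is the usual two-argument Ackermann-Peter function and one common way to define its inverse, as a LaTeX sketch (requires amsmath):

```latex
% Two-argument Ackermann--Peter function
\[
A(m,n) =
\begin{cases}
  n + 1                 & \text{if } m = 0,\\
  A(m-1,\;1)            & \text{if } m > 0 \text{ and } n = 0,\\
  A(m-1,\;A(m,\,n-1))   & \text{if } m > 0 \text{ and } n > 0.
\end{cases}
\]

% One common single-argument inverse: the least k with A(k,k) >= n.
% alpha grows so slowly that alpha(n) <= 4 for any input of realistic size.
\[
\alpha(n) = \min\{\, k \ge 1 : A(k,k) \ge n \,\}
\]
```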
Complexity
• 1964 Galler & Fischer: disjoint-set forests introduced
• 1973 Hopcroft & Ullman: O(lg* n) amortized time per operation
• 1975 Tarjan: O(α(n)) amortized time per operation, where α is the inverse Ackermann function
• 1979 Tarjan showed this is a lower bound for a restricted case
• 1989 Fredman & Saks proved the optimality of union-find, i.e., the upper bound is in fact a lower bound
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Other Minimum Spanning Tree Algorithms
• Prim's
  • start from one node
  • grow the cloud by the outside node reachable by the smallest-weight edge
  • similar to Dijkstra
• Kruskal's
  • start with all nodes as single-node spanning trees
  • merge 2 spanning trees by the lowest-weight edge, provided that does not create a cycle
• Reverse-delete
  • start from the full graph
  • delete the edge with the largest weight unless it disconnects the graph (see the sketch below)
  • loop until all edges have been considered
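A hedged Java sketch of reverse-delete under the same assumed edge representation as before ({u, v, weight} triples over vertices 0..n-1, connected input); connectivity after each tentative deletion is checked with a simple DFS, giving an unoptimized O(m (n + m)) running time.

```java
import java.util.*;

public class ReverseDeleteSketch {
    /** Returns the edges kept after reverse-delete, i.e., an MST of a connected graph. */
    static List<int[]> mst(int n, int[][] edges) {
        List<int[]> kept = new ArrayList<>(Arrays.asList(edges));
        kept.sort((a, b) -> Integer.compare(b[2], a[2]));   // largest weight first

        for (int[] e : new ArrayList<>(kept)) {              // consider each edge once, heaviest first
            kept.remove(e);                                   // tentatively delete it
            if (!connected(n, kept)) kept.add(e);             // would disconnect the graph: put it back
        }
        return kept;
    }

    /** DFS connectivity check over the currently kept edges. */
    static boolean connected(int n, List<int[]> edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < n; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) { adj.get(e[0]).add(e[1]); adj.get(e[1]).add(e[0]); }
        boolean[] seen = new boolean[n];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        seen[0] = true;
        int count = 1;
        while (!stack.isEmpty()) {
            int u = stack.pop();
            for (int v : adj.get(u)) {
                if (!seen[v]) { seen[v] = true; count++; stack.push(v); }
            }
        }
        return count == n;                                    // true if every vertex was reached
    }
}
```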
Example: reverse-delete
(Figure: four snapshots of reverse-delete on a weighted graph with vertices A-H; edges are examined in decreasing order of weight and removed unless their removal would disconnect the graph.)
Example (contd.)
(Figure: the remaining snapshots, ending with the minimum spanning tree.)
https://cmps-people.ok.ubc.ca/ylucet/DS/Kruskal.html
Application
Exercise
A graph G is bipartite if its vertices can be partitioned into two sets X and Y such that every edge in G has one end vertex in X and the other in Y. Design and analyze an efficient algorithm for determining if an undirected graph G is bipartite (without knowing the sets X and Y in advance).
http://mathonline.wikidot.com/bipartite-and-complete-bipartite-graphs
(Figure: a bipartite graph and a nonbipartite graph; in the nonbipartite one there is an edge whose 2 end points have the same color.)
Solution
1. Do a breadth-first search to construct a breadth-first search tree T
2. Place the vertices on even levels in X and the ones on odd levels in Y.
3. Now double-check that there are no edges between a pair of vertices in X or a pair of vertices in Y.
This algorithm runs in O(n+m) time.
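A Java sketch of this solution, assuming an adjacency-list graph over vertices 0..n-1 (and looping over all components in case G is disconnected): BFS assigns levels, even levels form X, odd levels form Y, and a final pass checks that no edge stays inside X or inside Y.

```java
import java.util.*;

public class BipartiteCheck {
    /** graph: adjacency lists for vertices 0..n-1 (undirected). */
    static boolean isBipartite(List<List<Integer>> graph) {
        int n = graph.size();
        int[] level = new int[n];
        Arrays.fill(level, -1);                    // -1: not yet reached by the BFS

        // Step 1: breadth-first search (restarted in every component)
        Deque<Integer> queue = new ArrayDeque<>();
        for (int s = 0; s < n; s++) {
            if (level[s] != -1) continue;
            level[s] = 0;
            queue.add(s);
            while (!queue.isEmpty()) {
                int u = queue.poll();
                for (int v : graph.get(u)) {
                    if (level[v] == -1) {
                        level[v] = level[u] + 1;   // v becomes a child of u in the BFS tree
                        queue.add(v);
                    }
                }
            }
        }

        // Steps 2-3: X = even levels, Y = odd levels; no edge may join two
        // vertices whose levels have the same parity
        for (int u = 0; u < n; u++) {
            for (int v : graph.get(u)) {
                if ((level[u] & 1) == (level[v] & 1)) return false;
            }
        }
        return true;                               // overall O(n + m) time
    }
}
```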
Nonlinear
• Binary tree, binary search tree
• Priority queue
– heap
• Array-based, pointer-based
• Adaptable priority queue
• Bottom-up heap construction
• Hash table
– Open addressing linear probing
– Separate Chaining
• Skip list
• Graph
– Adjacency list, adjacency matrix
• Union-find/disjoint-set
Questions?