
13 Union-Find

This document discusses minimum spanning trees (MSTs). It defines MSTs and provides properties like the cycle property and partition property. It also discusses applications and classic algorithms for finding MSTs like Kruskal's and Prim's algorithms.

Uploaded by

Arhaan Khaku

COSC 222 Data Structures
Yves Lucet

Shiyu Ji, CC BY-SA 4.0, via Wikimedia Commons
"Hash table 3 1 1 0 1 0 0 SP" by Jorge Stolfi - Own work. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:Hash_table_3_1_1_0_1_0_0_SP.svg#/media/File:Hash_table_3_1_1_0_1_0_0_SP.svg
CC BY-SA 3.0 https://en.wikipedia.org/wiki/Skip_list#/media/File:Skip_list_add_element-en.gif
Wk Class Date Activity Reading/Prep given Peer Lab
1 1 Sep 06 Syllabus, TBL, Java review NO LAB
1a Sep 08 Git, testing Java, Generics, Testing
2 2 Sep 13 RAT1: generics; unit testing Complexity 1 unit testing
3 Sep 15 Lect: Complexity, streams Lists
3 4 Sep 20 Build & Critic (training) iMAT1 2 coverage testing
5 Sep 22 tMAT1 Recursion
4 6 Sep 27 RAT2 Stack, Queue Peer 1 3 streams
7 Sep 29 Build & Critic Iterators
5 8 Oct 04 mini-lecture+exercises iMAT2 4 simulation
9 Oct 06 tMAT2 BST, PQ, heap Holiday: Mon Oct 2
6 10 Oct 11 RAT3 Hash, skip list Peer 2
11 Oct 13 Hash table, Skiplist, bottom-up heap construction 14.6 Shortest path Holiday: Mon Oct 9
7 12 Oct 18 Dijkstra+adaptable PQ Union-find 5 hashing
13 Oct 20 Union-find/Disjoint sets iMAT3
8 14 Oct 25 tMAT3 Search Trees AVL/RB 6 connected
15 Oct 27 Lecture BST, AVL, (2,4), RB B-Trees Peer 3
9 16 Nov 01 B-trees iMAT4
17 Nov 03 tMAT4
10 18 Nov 08 Midterm review NO LAB
19 Nov 10 Midterm
11 Nov 15 Reading week
Nov 17 Reading week Text processing
12 20 Nov 22 Pattern matching KMP, BM, Trie 7 matching
21 Nov 24 RAT4 Huffman coding
13 22 Nov 29 Huffman coding iMAT5 8 Trie vs. BST vs. Hash tab
23 Dec 01 tMAT5
14 24 Dec 06 Review/Course Evaluation Peer 4 NO LAB
Union-Find aka Disjoint-set Partition Structures

Motivation: Kruskal’s algorithm needs to maintain a partition
History of Minimum Spanning Trees

History
• Boruvka’s 1926
  • Choquet 1938, Florek et al. 1951, Sollin 1965
• Prim’s 1930
  • Jarnik; Prim 1957, Dijkstra 1959
• Kruskal’s 1956
• Reverse-delete
  • Kruskal 1956

Fastest
• Karger, Klein & Tarjan 1995
  • randomized
  • linear time
  • Boruvka’s + reverse-delete
• Chazelle 2000
  • nonrandomized
  • O(m α(m, n)) time, nearly linear in practice
  • soft heap
• Pettie & Ramachandran 2002
  • Optimal but unknown complexity
https://en.wikipedia.org/wiki/Minimum_spanning_tree#Classic_algorithms
Other Minimum Spanning Tree Algorithms
• Prim’s
  • start from one node
  • grow the cloud by the one node reachable by the edge of smallest weight
  • similar to Dijkstra
• Kruskal’s
  • start with every node as its own spanning tree
  • merge 2 spanning trees by the edge of lowest weight, provided that it does not create a cycle
• Reverse-delete
  • start from the full graph
  • delete the edge with largest weight unless it disconnects the graph
  • loop till all edges have been considered
MST application
Travelling salesman: given a list of cities and their connections, visit each city exactly once, go back to your starting city, and minimize the distance travelled.

[Figure: optimal route of a salesman visiting the 15 biggest cities in Germany. The route shown is the shortest of the 43,589,145,600 possible ones.]

By The original uploader was Kapitän Nemo at German Wikipedia. - https://www.cia.gov/cia/publications/factbook/maps/gm-map.gif, Public Domain, https://commons.wikimedia.org/w/index.php?curid=5584283
Travelling salesman
The problem is hard but…
• We can get a lower bound on the length of the best tour using the MST
• We can estimate how far a given tour is from the best solution
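The lower bound follows because deleting any one edge from an optimal tour leaves a spanning tree, so the MST can be no heavier than the tour. A minimal sketch that checks this on a small instance; the distance matrix `DIST` and the function names are hypothetical, and the brute-force tour search is only usable for a handful of cities:

```python
from itertools import permutations


def mst_weight(dist):
    """Weight of an MST of the complete graph given by a symmetric matrix."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest edge linking each vertex to the tree
    best[0] = 0
    total = 0
    for _ in range(n):
        # Pick the vertex outside the tree with the cheapest connecting edge.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
    return total


def optimal_tour_length(dist):
    """Brute-force the shortest closed tour starting and ending at city 0."""
    n = len(dist)
    best = float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        best = min(best, length)
    return best


# Hypothetical 4-city instance (symmetric distances).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

print(mst_weight(DIST), "<=", optimal_tour_length(DIST))
```

On this instance the MST weighs 9 and the best tour has length 18, so the bound holds with room to spare.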
Minimum Spanning Trees

Spanning subgraph: subgraph of a graph G containing all the vertices of G
Spanning tree: spanning subgraph that is itself a (free) tree
Minimum spanning tree (MST): spanning tree of a weighted graph with minimum total edge weight
Applications
Communications networks
Transportation networks
[Figure: weighted graph on the airports ORD, PIT, DEN, DCA, STL, DFW, and ATL]
Cycle Property
Cycle Property:
Let T be a minimum spanning tree of a weighted graph G
Let e be an edge of G that is not in T and let C be the cycle formed by e with T
For every edge f of C, weight(f) ≤ weight(e)

Proof:
By contradiction
If weight(f) > weight(e) we can get a spanning tree of smaller weight by replacing f with e
[Figure: weighted graph with the cycle C through edges e and f; replacing f with e yields a better spanning tree]
Partition Property
Partition Property:
Consider a partition of the vertices of G into subsets U and V
Let e be an edge of minimum weight across the partition
There is a minimum spanning tree of G containing edge e

Proof:
Let T be an MST of G
If T does not contain e, consider the cycle C formed by e with T and let f be an edge of C across the partition
By the cycle property, weight(f) ≤ weight(e)
Since e has minimum weight across the partition, weight(e) ≤ weight(f)
Thus, weight(f) = weight(e)
We obtain another MST by replacing f with e
[Figure: weighted graph split into U and V, with edges e and f crossing the partition; replacing f with e yields another MST]
Prim-Jarnik’s Algorithm
Similar to Dijkstra’s algorithm
We pick an arbitrary vertex s and we grow the MST as a cloud of vertices, starting from s
We store with each vertex v a label d(v) representing the smallest weight of an edge connecting v to a vertex in the cloud
At each step:
We add to the cloud the vertex u outside the cloud with the smallest distance label
We update the labels of the vertices adjacent to u
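The steps above can be sketched as follows. This is a minimal illustration, not the course's reference implementation: instead of an adaptable priority queue with key updates, it pushes a fresh entry into a binary heap and skips stale ones, and the names `build_graph`, `prim_jarnik`, and the sample `EDGES` are hypothetical.

```python
import heapq


def build_graph(edges):
    """Adjacency lists for an undirected graph from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def prim_jarnik(graph, s):
    """Grow the cloud from s; return (total weight, list of MST edges)."""
    in_cloud = set()
    mst_edges = []
    total = 0
    heap = [(0, s, None)]  # entries are (label d(v), v, parent of v)
    while heap:
        d, u, parent = heapq.heappop(heap)
        if u in in_cloud:
            continue  # stale entry: u already joined with a smaller label
        in_cloud.add(u)
        total += d
        if parent is not None:
            mst_edges.append((parent, u, d))
        for v, w in graph[u]:
            if v not in in_cloud:
                # Stands in for "update the label of v" on an adaptable PQ.
                heapq.heappush(heap, (w, v, u))
    return total, mst_edges


# Hypothetical sample graph.
EDGES = [("A", "B", 2), ("A", "C", 3), ("B", "C", 1), ("B", "D", 4), ("C", "D", 5)]
total, tree = prim_jarnik(build_graph(EDGES), "A")
print(total, tree)
```

The lazy-deletion heap may hold up to m entries rather than n, which keeps the O(m log n) bound for a connected graph since log m is O(log n).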
Example
[Figure: four snapshots of Prim-Jarnik’s algorithm on a weighted graph with vertices A to F, starting from A with label 0 and the other labels initially ∞]
Example (contd.)
[Figure: the last two snapshots, ending with the completed MST]
Analysis
• Graph operations
  • We cycle through the incident edges once for each vertex
• Label operations
  • We set/get the distance, parent and locator labels of vertex z O(deg(z)) times
  • Setting/getting a label takes O(1) time
• Priority queue operations
  • Each vertex is inserted once into and removed once from the priority queue, where each insertion or removal takes O(log n) time
  • The key of a vertex w in the priority queue is modified at most deg(w) times, where each key change takes O(log n) time
• Prim-Jarnik’s algorithm runs in O((n + m) log n) time provided the graph is represented by the adjacency list structure
  • Recall that Σ_v deg(v) = 2m
• The running time is O(m log n) since the graph is connected
Kruskal’s Approach
Maintain a partition of the vertices into clusters
Initially, single-vertex clusters
Keep an MST for each cluster
Merge “closest” clusters and their MSTs
A priority queue stores the edges outside clusters
Key: weight
Element: edge
At the end of the algorithm
One cluster and one MST
16
Example
[Figure: successive snapshots of Kruskal’s algorithm on a weighted graph with vertices A to H]
Example (contd.)
[Figure: further snapshots, skipping ahead by "four steps" and then "two steps"]
https://cmps-people.ok.ubc.ca/ylucet/DS/Kruskal.html
Data Structure for Kruskal’s Algorithm
• The algorithm maintains a forest of trees
  • A forest is a disjoint union of trees
• A priority queue extracts the edges by increasing weight
• An edge is accepted if it connects distinct trees
• We need a data structure that maintains a partition, i.e., a collection of disjoint sets, with operations:
  • makeSet(u): create a set consisting of u
  • find(u): return the set storing u
  • union(A, B): replace sets A and B with their union
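Putting the three operations to work, a minimal sketch of Kruskal's algorithm might look like this. It assumes edges given as (u, v, weight) triples, uses Python's `heapq` as the priority queue, and uses a deliberately naive partition (a parent dictionary with no heuristics); the function name `kruskal` and the sample data are hypothetical.

```python
import heapq


def kruskal(vertices, edges):
    """Return the list of accepted MST edges as (u, v, weight) triples."""
    parent = {v: v for v in vertices}  # makeSet(v) for every vertex

    def find(u):
        # Follow parent pointers up to the representative of u's set.
        while parent[u] != u:
            u = parent[u]
        return u

    def union(a, b):
        # a and b are representatives of two distinct sets.
        parent[a] = b

    heap = [(w, u, v) for u, v, w in edges]
    heapq.heapify(heap)  # edges come out by increasing weight
    mst = []
    while heap and len(mst) < len(parent) - 1:
        w, u, v = heapq.heappop(heap)
        a, b = find(u), find(v)
        if a != b:  # the edge connects distinct trees: accept it
            mst.append((u, v, w))
            union(a, b)
    return mst


# Hypothetical sample graph.
VERTICES = ["A", "B", "C", "D"]
EDGES = [("A", "B", 2), ("A", "C", 3), ("B", "C", 1), ("B", "D", 4), ("C", "D", 5)]
print(kruskal(VERTICES, EDGES))
```

With this naive partition a single find can take linear time, which is what the union-by-size and path-compression heuristics are meant to fix.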
Union-Find Partition Structures

Motivation: Kruskal’s algorithm needs to maintain a partition
Partitions with Union-Find Operations
makeSet(x): Create a singleton set containing the element x and return the position storing x in this set
union(A, B): Return the set A ∪ B, destroying the old A and B
find(p): Return the set containing the element at position p
List-based Implementation
Each set is stored in a sequence represented with a linked list
Each node should store an object containing the element and a reference to the set name
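A minimal sketch of the list-based idea, under some assumptions: a Python list stands in for the linked list, the list object itself serves as the set name, and union always moves the smaller set into the larger one (the slide does not state this rule, but the O(n log n) total bound relies on it). The class name `ListPartition` is hypothetical.

```python
class ListPartition:
    """Partition where every element holds a reference to its set."""

    def __init__(self):
        self._set_of = {}  # element -> list object acting as the set name

    def make_set(self, x):
        self._set_of[x] = [x]
        return x

    def find(self, x):
        # O(1): just read the stored reference.
        return self._set_of[x]

    def union(self, a, b):
        # a and b are sets as returned by find.
        if a is b:
            return a
        if len(a) < len(b):
            a, b = b, a  # assumption: always move the smaller set
        for x in b:
            self._set_of[x] = a  # redirect each moved element
        a.extend(b)
        return a


p = ListPartition()
for x in range(1, 5):
    p.make_set(x)
p.union(p.find(1), p.find(2))
print(p.find(1) is p.find(2), p.find(3) is p.find(1))
```

Find is constant time here; all the cost sits in union, which touches every element of the smaller set.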
Analysis of List-based Representation
Total time needed to do n unions and finds is O(n log n)
A union or find therefore takes O(log n) time on average
Tree-based Implementation
Each element is stored in a node, which contains a pointer to a set name
A node v whose set pointer points back to v is also a set name
Each set is a tree, rooted at a node with a self-referencing set pointer
For example: the sets “1”, “2”, and “5”:
[Figure: three trees rooted at 1, 2, and 5, over the elements 1 to 12]
Union-Find Operations
To do a union, simply make the root of one tree point to the root of the other
To do a find, follow set-name pointers from the starting node until reaching a node whose set-name pointer refers back to itself
[Figure: the tree rooted at 2 made a subtree of the tree rooted at 5]
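The two operations can be sketched with explicit nodes as below. This is an illustrative version without any heuristic; the class name `Node` and the function names are assumptions, not the course's API.

```python
class Node:
    """Tree node; a node whose parent pointer is itself is a set name."""

    def __init__(self, element):
        self.element = element
        self.parent = self  # self-reference marks a root


def make_set(x):
    return Node(x)


def find(p):
    # Follow set-name pointers until reaching a self-referencing node.
    while p.parent is not p:
        p = p.parent
    return p


def union(a, b):
    # a and b are roots of distinct trees; a's tree goes under b.
    a.parent = b
    return b


nodes = {x: make_set(x) for x in (2, 3, 5, 8)}
union(find(nodes[3]), find(nodes[2]))  # 3 joins the set named 2
union(find(nodes[8]), find(nodes[5]))  # 8 joins the set named 5
union(find(nodes[3]), find(nodes[8]))  # root 2 now points to root 5
print(find(nodes[3]).element)
```

After the last union, a find from node 3 follows two pointers (3 to 2, then 2 to 5) before reaching the root.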
Union-Find Heuristic 1
Union by size:
When performing a union, make the root of the smaller tree point to the root of the larger
Implies O(n log n) time for performing n union-find operations:
Each time we follow a pointer, we are going to a subtree of size at least double the size of the previous subtree
Thus, we will follow at most O(log n) pointers for any find.
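A minimal array-based sketch of union by size, assuming the elements are the integers 0 to n-1; the class name `UnionBySize` and the helper `depth` are hypothetical and exist only to make the logarithmic bound visible.

```python
class UnionBySize:
    """Disjoint sets over 0..n-1 stored in two arrays."""

    def __init__(self, n):
        self.parent = list(range(n))  # parent[i] == i marks a root
        self.size = [1] * n           # meaningful only at roots

    def find(self, p):
        while self.parent[p] != p:
            p = self.parent[p]
        return p

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return a
        if self.size[a] < self.size[b]:
            a, b = b, a               # make a the root of the larger tree
        self.parent[b] = a            # smaller tree points to the larger
        self.size[a] += self.size[b]
        return a

    def depth(self, p):
        """Number of pointers a find from p has to follow."""
        steps = 0
        while self.parent[p] != p:
            p = self.parent[p]
            steps += 1
        return steps


uf = UnionBySize(8)
for a, b in [(0, 1), (2, 3), (4, 5), (6, 7), (0, 2), (4, 6), (0, 4)]:
    uf.union(a, b)
print(max(uf.depth(p) for p in range(8)))
```

Merging equal-sized trees pairwise is the worst case for depth, and even then no element of the 8-element set is more than 3 pointers from the root.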
Union-Find Heuristic 2
Path compression:
After performing a find, compress all the pointers on the path just traversed so that they all point to the root
[Figure: the tree rooted at 5 before and after a find with path compression]
Implies O(n log* n) time for performing n union-find operations
Proof is somewhat involved… (and not in the book)
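Path compression can be sketched as a two-pass find over a parent array: one pass to locate the root, a second to redirect every node on the path. The function name `find_compress` and the sample array are assumptions for illustration.

```python
def find_compress(parent, p):
    """Return the root of p and point every node on the path at it."""
    root = p
    while parent[root] != root:  # first pass: locate the root
        root = parent[root]
    while p != root:             # second pass: compress the path
        next_p = parent[p]
        parent[p] = root
        p = next_p
    return root


# Hypothetical chain 4 -> 3 -> 2 -> 1 -> 0, with 0 as the root.
parent = [0, 0, 1, 2, 3]
print(find_compress(parent, 4), parent)
```

After the call every node on the traversed path is a direct child of the root, so later finds on those nodes follow a single pointer.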
Array-based implementation
https://cmps-people.ok.ubc.ca/ylucet/DS/DisjointSets.html
Complexity
• 1964 Galler & Fischer: disjoint-set forests
• 1973 Hopcroft & Ullman: O(log* n) amortized time per operation
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Iterated logarithm
Log star
Number of times the logarithm function must be iteratively applied before the result is less than or equal to 1

x                   lg* x
(−∞, 1]             0
(1, 2]              1
(2, 4]              2
(4, 16]             3
(16, 65536]         4
(65536, 2^65536]    5
https://en.wikipedia.org/wiki/Iterated_logarithm
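The definition translates directly into a loop. A small sketch, assuming base-2 logarithms as in the table; the function name `log_star` is hypothetical.

```python
import math


def log_star(x):
    """Count how many times log2 must be applied until the result is <= 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


for x in (1, 2, 4, 16, 17, 65536, 65537):
    print(x, log_star(x))
```

The values jump only at 2, 4, 16, and 65536, which is why log* n behaves like a small constant for any input size that fits in memory.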
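Complexity
• 1964 Galler & Fischer: disjoint-set forests
• 1973 Hopcroft & Ullman: O(log* n) amortized time per operation
• 1975 Tarjan: O(α(n)) amortized time per operation, with α the inverse Ackermann function
https://en.wikipedia.org/wiki/Disjoint-set_data_structure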
Inverse Ackermann function
Ackermann function:
A(0, n) = n + 1
A(m+1, 0) = A(m, 1)
A(m+1, n+1) = A(m, A(m+1, n))
It grows very rapidly:
A(4, 2) = 2^65536 − 3 is an integer with 19,729 digits
f(n) = A(n, n) grows very rapidly
https://en.wikipedia.org/wiki/Ackermann_function#Inverse
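To get a feel for the growth, here is a sketch of the standard two-argument (Ackermann-Péter) recursion. It is only safe for tiny arguments, since the recursion depth explodes; the digit count of A(4, 2) is therefore taken from the closed form 2^65536 − 3 rather than computed by recursion.

```python
import math


def ackermann(m, n):
    """Two-argument Ackermann function; only call with very small m and n."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))


print(ackermann(1, 2), ackermann(2, 3), ackermann(3, 3))

# Number of decimal digits of 2^65536, and hence of A(4, 2) = 2^65536 - 3.
print(math.floor(65536 * math.log10(2)) + 1)
```

Already A(3, 3) takes a few thousand recursive calls, and A(4, 2) is far beyond what the recursion can reach.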
Inverse Ackermann function
f(n) = A(n, n) grows very rapidly
So its inverse grows very slowly and is denoted α
α(n) is less than 5 for any practical input size n since A(4, 4) is on the order of 2^(2^(2^(2^16)))
https://en.wikipedia.org/wiki/Ackermann_function#Inverse
Complexity
• 1964 Galler & Fischer: disjoint-set forests
• 1973 Hopcroft & Ullman: O(log* n) amortized time per operation
• 1975 Tarjan: O(α(n)) amortized time per operation, with α the inverse Ackermann function
• 1979 Tarjan showed this is a lower bound for a restricted case
• 1989 Fredman & Saks proved the optimality of union-find, i.e., the upper bound is in fact a lower bound
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Other Minimum Spanning Tree Algorithms
• Prim’s
  • start from one node
  • grow the cloud by the one node reachable by the edge of smallest weight
  • similar to Dijkstra
• Kruskal’s
  • start with every node as its own spanning tree
  • merge 2 spanning trees by the edge of lowest weight, provided that it does not create a cycle
• Reverse-delete
  • start from the full graph
  • delete the edge with largest weight unless it disconnects the graph
  • loop till all edges have been considered
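The reverse-delete steps can be sketched as follows. This is a deliberately simple version that re-runs a depth-first connectivity check after every tentative deletion, so it costs O(m(n + m)) time; the function names and the sample graph are hypothetical.

```python
def is_connected(vertices, edges):
    """Depth-first search check that the edges connect all the vertices."""
    adj = {v: [] for v in vertices}
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


def reverse_delete(vertices, edges):
    """Start from the full graph and drop heavy edges that are not bridges."""
    kept = sorted(edges, key=lambda e: e[2], reverse=True)
    for e in list(kept):  # consider every edge once, heaviest first
        trial = [x for x in kept if x is not e]
        if is_connected(vertices, trial):
            kept = trial  # deleting e does not disconnect the graph
    return kept


# Hypothetical sample graph.
VERTICES = ["A", "B", "C", "D"]
EDGES = [("A", "B", 2), ("A", "C", 3), ("B", "C", 1), ("B", "D", 4), ("C", "D", 5)]
print(reverse_delete(VERTICES, EDGES))
```

On a connected graph the surviving edges form a minimum spanning tree, the same total weight Kruskal's or Prim's algorithm would produce.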
Example: reverse delete
[Figure: successive snapshots of the reverse-delete algorithm on a weighted graph with vertices A to H]
Example (contd.)
[Figure: the final snapshots, ending with the MST]
https://cmps-people.ok.ubc.ca/ylucet/DS/Kruskal.html
Application

Exercise
A graph G is bipartite if its vertices can be partitioned into two sets X and Y such that every edge in G has one end vertex in X and the other in Y. Design and analyze an efficient algorithm for determining if an undirected graph G is bipartite (without knowing the sets X and Y in advance).

http://mathonline.wikidot.com/bipartite-and-complete-bipartite-graphs
[Figure: two graphs with colored vertices]
Bipartite
Nonbipartite: there is an edge with 2 end points that have the same color
Solution
1. Do a breadth-first search to construct a breadth-first search tree T
2. Place the vertices on even levels in X and the ones on odd levels in Y.
3. Now double-check that there are no edges between a pair of vertices in X or a pair of vertices in Y.
This algorithm runs in O(n+m) time.

See for example https://www.baeldung.com/cs/graphs-bipartite, https://www.geeksforgeeks.org/bipartite-graph/
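A minimal sketch of this solution, with two assumptions beyond the slide: the edge check of step 3 is folded into the search itself, and the search restarts in every connected component so that disconnected graphs are handled too. The function name `is_bipartite` and the sample graphs are hypothetical.

```python
from collections import deque


def is_bipartite(graph):
    """graph maps each vertex to an iterable of its neighbors (undirected)."""
    parity = {}  # vertex -> 0 (even BFS level, set X) or 1 (odd level, set Y)
    for s in graph:
        if s in parity:
            continue  # already reached from an earlier component
        parity[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in parity:
                    parity[v] = 1 - parity[u]  # next level, other side
                    queue.append(v)
                elif parity[v] == parity[u]:
                    return False  # edge inside X or inside Y
    return True


SQUARE = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}   # even cycle
TRIANGLE = {1: [2, 3], 2: [1, 3], 3: [1, 2]}            # odd cycle
print(is_bipartite(SQUARE), is_bipartite(TRIANGLE))
```

Each vertex is enqueued once and each adjacency list is scanned once, which gives the O(n+m) bound stated above.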
Data Structures Summary
Linear
• Array, list, queue, stack, deque

Nonlinear
• Binary tree, binary search tree
• Priority queue
  – heap
    • Array-based, pointer-based
    • Adaptable priority queue
    • Bottom-up heap construction
• Hash table
  – Open addressing linear probing
  – Separate Chaining
• Skip list
• Graph
  – Adjacency list, adjacency matrix
• Union-find/disjoint-set

Questions?


Readings
The readings should be done before the indicated date with questions posted on Ms Teams.

Date      Files
Sep 13    1 readings-Java-UnitTesting.txt
          1 Unit Testing handout.pdf
Sep 15    2 readings-complexity.txt
Sep 20    3 readings-Array-List.txt
Sep 27    5_6 readings-recursion-stack-queue.txt
Oct 4     7 readings-iterator.txt
Oct 11    8_9 readings Trees-PQ.txt
Oct 13    10 reading hash skiplist.txt
Oct 18    11 readings Dijkstra.txt
Oct 20    12 reading union-find.txt
Oct 27    14 readings search trees.txt
Nov 17    15 reading (2,4)-B trees.txt
Nov 24    21 readings text processing.txt
Nov 29    22 readings RK Trie.txt
Dec 1     23 readings Huffman.txt

End of class