0% found this document useful (0 votes)
75 views19 pages

DisjointSet Slide

Union-find data structures can be used to maintain disjoint sets and perform operations like determining if two elements are connected. The key operations are MAKE-SET, FIND-SET, and UNION. Several optimizations can achieve O(log n) time complexity for these operations. Union-by-size and union-by-rank heuristics improve performance by linking smaller or lower ranked trees to larger/higher ranked ones. Path compression further optimizes FIND-SET by collapsing paths directly to the root. Together, union-by-rank and path compression give O(log n) time bounds for all operations.

Uploaded by

RishabhGupta251
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
75 views19 pages

DisjointSet Slide

Union-find data structures can be used to maintain disjoint sets and perform operations like determining if two elements are connected. The key operations are MAKE-SET, FIND-SET, and UNION. Several optimizations can achieve O(log n) time complexity for these operations. Union-by-size and union-by-rank heuristics improve performance by linking smaller or lower ranked trees to larger/higher ranked ones. Path compression further optimizes FIND-SET by collapsing paths directly to the root. Together, union-by-rank and path compression give O(log n) time bounds for all operations.

Uploaded by

RishabhGupta251
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 19

DISJOINT SET DATA STRUCTURE

Partha P. Chakrabarti & Aritra Hazra


Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
Disjoint-Set Data Structures: Applications
Minimum Spanning Tree of Graph (G)
Algorithm MST_Kruskal ( G = (V,E) ) {
A = { };
for each v in V do MAKE-SET(v);
for each edge e = (u, v) in E ordered by
increasing weight(u, v) do {
if FIND-SET(u) ≠ FIND-SET(v) then {
A = A + {(u, v)};
UNION(FIND-SET(u), FIND-SET(v));
}
}
return A;
}
Disjoint-Set Data-Type and Operations
•  Primary Operations:
–  MAKE-SET(x): create a new set containing only element x
–  FIND-SET(x): return a canonical element in the set containing x
–  UNION(x, y): replace the sets containing x and y with their union

•  Performance parameters:
–  m = number of calls to FIND-SET and UNION operations
disjoint sets =
–  n = number of elements = number of calls to MAKE-SET
connected components
•  Application: Dynamic connectivity over initially empty graph
–  ADD-NODE(u): add node u (1 MAKE-SET operation)
–  ADD-EDGE(u, v): add an edge between nodes u and v (1 UNION operation)
–  IS-CONNECTED(u, v): is there a path between u and v ? (2 FIND-SET operations)
Disjoint-Set Operations: Implementation (1)
Linked List Implementation Set A: {c, h ,e, b}
•  MAKE-SET(x): O(1)
–  need to create only one
node created with
appropriate pointers Set B: {f, g, d}
•  FIND-SET(x): O(n)
–  need to traverse entire
linked list to find x
Set (A U B): {c, h, e, b, f, g, d}
•  UNION(x,y): O(n)
–  need to point back all
back-pointers of second
list to head of first list
Disjoint-Set Operations: Implementation (2)
•  Array Representation UNION(3,5) or
–  Represent each set as tree of elements UNION(8,6) or
UNION(8,7)
–  Allocate an array of parent[] of length n
–  parent[i]=j (parent of element i is j)

/ /
0 0 •  Analysis of Operations:
–  Total zeros in array = Disjoint-sets
X X
–  FIND-SET(x): O(n) worst-case
–  UNION(x,y): O(n) worst-case
•  UNION(FIND-SET(x), FIND-SET(y))
•  O(n) due to FIND-SET operation
Suppress self-loops for root for brevity! Solution: Smart Union-Find Algorithms !!
Smart Disjoint-Set Operations: Union-by-Size
•  Union-by-Size UNION(x,y) {
–  Maintain a tree size r ß FIND-SET(x);
(number of nodes) for s ß FIND-SET(y);
each root node if(r == s) return r;
–  Link root of smaller else if(size[r] > size[s]) {
tree to root of larger parent[s] ß r;
tree (break tries size[r] = size[r] + size[s];
arbitrarily) return r;
}
FIND-SET(x) { else { UNION(3,5)
while(x is not parent) parent[r] ß s;
x ß parent[x]; size[s] = size[r] + size[s];
return x; return s;
} MAKE-SET(x) {
}
parent[x] ß 0;
}
size[x] ß 1;
return x;
}
Analysis of Union-by-Size Heuristic (1)
Property: Using union-by-size, for every root node r, we have size[r] ≥ 2height(r)
Proof: [ by induction on number of links ]
–  Base case: singleton tree has size 1 and height 0
–  Inductive hypothesis: assume true after first i links

–  Tree rooted at r changes only when a smaller (or


equal) size tree rooted at s is linked into r

–  Case 1. [ height(r) > height(s) ]


sizeʹ[r] > size[r] ≥ 2height(r) = 2heightʹ(r)
–  Case 2. [ height(r) ≤ height(s) ]
sizeʹ[r] = size[r] + size[s] ≥ 2 size[s] ≥ 2 x 2height(s)
= 2height(s) + 1 = 2heightʹ(r)
Analysis of Union-by-Size Heuristic (2)
•  Theorem: Using union-by-size, any UNION or FIND-SET operation takes O(log2 n)
time in the worst case, where n is the number of elements
•  Proof:
–  The running time of each operation is bounded by the tree height
–  Using union-by-size, a tree with n nodes can have height at most log2 n
–  By the previous property, the height is ≤ ⎣log2 n⎦

•  The UNION operation takes O(1) time except for its two calls to FIND-SET
–  FIND-SET required to find out the set representative (which is the root)

•  m number of UNION and FIND-SET operations takes a total of O(m log2 n) time
Smart Disjoint-Set Operations: Union-by-Rank
•  Union-by-Rank rank = height
–  Maintain an integer
rank for each node, UNION(x,y) {
initially 0 r ß FIND-SET(x);
s ß FIND-SET(y);
–  Link root of smaller
if (r == s) return r;
rank to root of larger
else if (rank[r] ≥ rank[s]) {
rank; if tie, increase
rank of larger root by 1 parent[s] ß r;
if(rank[r] == rank[s])
FIND-SET(x) { rank[r] = rank[r] + 1; UNION(3,5)
while(x is not parent) return r;
x ß parent[x]; }
return x; else {
} MAKE-SET(x) {
parent[r] ß s;
parent[x] ß 0;
return s;
rank[x] ß 0;
}
return x;
}
}
Analysis of Union-by-Rank Heuristic (1)
Property-1: If x is not a root node, then rank[x] < rank[parent[x]]
Proof: A node of rank k is created only by linking two roots of rank k – 1.

Property-2: If x is not a root node, then rank[x] will never change again
Proof: Rank changes only for roots; a non-root never becomes a root.

Property-3: If parent[x] changes,


then rank[parent[x]] strictly
increases.
Proof: The parent can change
only for a root, so before linking
parent[x] = 0. After x is linked
using union-by-rank to new root
r we have rank[r] > rank[x].
Analysis of Union-by-Rank Heuristic (2)
Property-4: Any root node of rank k
has ≥ 2k nodes in its tree
Proof: [ by induction on k ]
•  Base case: true for k = 0
•  Inductive hypothesis: assume true
for k – 1
•  A node of rank k is created only
by linking two roots of rank k – 1
•  By inductive hypothesis, each of
two sub-tree has ≥ 2k – 1 nodes
=> resulting tree has ≥ 2k nodes

Property-5: The highest rank of a node is ≤ ⎣log2 n⎦


Proof: Immediately concluded from Property-1 and Property-4
Analysis of Union-by-Rank Heuristic (3)
Property-6: For any integer k ≥ 0, there are ≤ n / 2k nodes with rank k
Proof:
•  Any root node of rank k has ≥ 2k descendants. [by Property-4]
•  Any non-root node of rank k has ≥ 2k descendants because:
§  it had this property just before it became a non-root [by Property-4]
§  its rank does not change once it became a non-root [by Property-2]
§  its set of descendants does not change once it became a non-root
•  Different nodes of rank k cannot have common descendants [by Property-1]
Theorem: Using union-by-rank, any
UNION or FIND-SET operation takes
O(log2 n) time in the worst case, where
n is the number of elements.
Proof: The running time of UNION and
FIND-SET is bounded by the tree
height ≤ ⎣log2 n⎦ [by Property-5]
Smart Disjoint-Set Operations: Path Compression
•  When finding the root r of the tree containing x, change the parent pointer of all
nodes along the path to point directly to r
Path Compression: Example

FIND-SET(x) {
if(x is not parent)
parent[x] ß FIND-SET(parent[x]);
return x;
}
Properties of Union-by-Rank + Path Compression (1)
Property-0: The tree roots, node ranks, and elements within a tree are the same with or
without path compression.
Property-1: If x is not a root node, then rank[x] < rank[parent[x]]
Proof: Path compression can make x point to only an ancestor of parent[x]
Property-2: If x is not a root node, then rank[x] will never change again
Property-3: If parent[x] changes, then rank[parent[x]] strictly increases.
Proof: Path compression doesn’t change any ranks, but it can change parents
If parent[x] doesn’t change during a path compression the inequality continues to hold
if parent[x] changes, then rank[parent[x]] strictly increases
Property-4: Any root node of rank k has ≥ 2k nodes in its tree
Property-5: The highest rank of a node is ≤ ⎣log2 n⎦
Property-6: For any integer k ≥ 0, there are ≤ n / 2k nodes with rank k
Properties of Union-by-Rank + Path Compression (2)
•  Definitions: Rank Groups

–  log* n = 0, when n ≤ 1 i times


Inverse Ackerman

= MIN { i ≥ 0 | log2 log2 … log2 n ≤1}, when n ≥ 2


Function

–  log* n = 0, when n ≤ 1 Recursive Definition


= 1 + log* (log2 n), otherwise
–  Ackerman Function, F(j) = 1, when j = 0
= 2F(j-1), when j ≥ 1
Property-7: The largest
group number is ≤ log* Property-8: Number of nodes in a particular group g is
(log2 n) = log* n – 1 given by, ng < n/F(g)
Proof: Since largest Proof: ng < ΣF(g)r=F(g-1) +1 n/2r < 2n/2F(g-1)+1 = n/2F(g-1) = n/F(g)
possible rank is ⎣log2 n⎦, [ since, n/2r + n/2r+1 + n/2r+2 + … + n/2r+k
hence the result < (n/2r) Σ∞0 (1/2k) = 2n/2r ]
Analysis of Union-by-Rank with Path Compression (1)
x
•  Case-1: If v is root (= x), a child of root or if parent[v] is
in a different rank group; then we charge ONE unit of v
time to FIND-SET operation
•  Case-2: If v ≠ x, and both v and parent[u] are in the parent[u]
same group, then we charge ONE unit of time to node v
u
•  Observation-1: Ranks of nodes in a path from u to x
increases monotonically path-compression
w.r.t. FIND-SET(u)
–  After x is found to be the root, we do path
compression x
–  If later on, x becomes a child of another node and v
& x are in different groups, no more node charges u v
on v in later FIND-SET operations
Analysis of Union-by-Rank with Path Compression (2)
•  Observation-2: If a node v is in group g (g > 0), v can be moved and charged at most
[F(g) – F(g-1)] times before it acquires a parent in a higher group.

•  Complexity Analysis:
–  Time Complexity = (Number of nodes in group g) x (Movement charges across
groups) x (Movement charges with groups) = (n/F(g)) x (log* n) x [F(g) – F(g-1)]
≤ n log* n [ since, (n/F(g))x[F(g) – F(g-1)] ≤ n ]

•  Theorem: The time complexity required to process m UNION and FIND-SET


operations using union-by-rank with path-compression heuristic is O(m log* n) in
the worst case
–  which may be also said as O(m), as log*n ≤ 5 practically
(as otherwise n is more than the number of atoms in universe!!)
Thank you

You might also like