Algorithms Part 1 - Lecture Notes: 1 Union Find
1 Union Find
1.1 Dynamic Connectivity
General steps to developing a usable algorithm: model the problem, find an algorithm, calculate its speed, and improve until satisfied.
Defining the problem: Given a set of N objects, the union command connects two objects, and the find query asks: is there a path connecting two given objects?
Implementing the operations:
Find Query - Check if two objects are in the same component (connected component = set of maximal
connected nodes)
Union Command - Replace components containing two objects with their union
1.2 Quick Find
The data structure is an integer array id[] of size N, and p and q are connected iff they have the same id. So, if objects 1, 2, 3 are connected, then entries 1, 2, 3 all hold the same id in the array. Union: to merge the components containing p and q, change all entries whose id equals id[p] to id[q].
Algorithm:
1. Initialize an array of size N and set the id of each object to itself.
2. Union: to merge p and q, change all entries equal to id[p] to id[q].
3. Find: p and q are connected iff id[p] = id[q].
Efficiency: quick-find takes N array accesses to initialize, up to N per union, and a constant number per find. Hence it takes ~N² array accesses to process a sequence of N union commands on N objects.
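The quick-find scheme above can be sketched in Java as follows (a minimal sketch; the class and method names are my own, loosely following the course's conventions):

```java
public class QuickFindUF {
    private final int[] id;

    public QuickFindUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i; // each object starts in its own component
    }

    public boolean connected(int p, int q) {
        return id[p] == id[q]; // constant number of array accesses
    }

    public void union(int p, int q) {
        int pid = id[p], qid = id[q];
        if (pid == qid) return;
        // change every entry equal to id[p] to id[q]: up to N array accesses per union
        for (int i = 0; i < id.length; i++)
            if (id[i] == pid) id[i] = qid;
    }
}
```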
1.3 Quick Union
Data structure: integer array id[] of size N, where id[i] is the parent of i. To check whether two nodes are connected, we check whether they have the same root. For union, to merge the components containing p and q, set the id of p's root to the id of q's root.
Algorithm:
1. Set the id of each object to itself (N array accesses).
2. Union: given p and q, change the root of p to point to the root of q.
3. To check if p and q are connected: while p ≠ id[p], set p = id[p]; the final p is the root of p. Repeat for q, then check whether the two roots are the same.
The defect of this algorithm is that the trees can get too tall, so the find operation becomes too expensive.
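The quick-union steps above can be sketched as follows (again a sketch with names of my own choosing):

```java
public class QuickUnionUF {
    private final int[] id; // id[i] is the parent of i

    public QuickUnionUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i; // each node starts as its own root
    }

    private int root(int i) {
        while (i != id[i]) i = id[i]; // chase parent pointers up to the root
        return i;
    }

    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    public void union(int p, int q) {
        id[root(p)] = root(q); // link p's root to q's root
    }
}
```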
1.4 Improvements
1.4.1 Improvement 1 - Weighted Quick Union
Take steps to avoid tall trees: keep track of the number of objects in each tree, and maintain balance by linking the root of the smaller tree to the root of the larger tree (smaller tree goes lower). This requires a new array size[], where size[i] counts the objects in the tree rooted at i. Modifying quick union: given nodes p and q, get their roots i and j using the root function. If the roots are equal, return. Otherwise, if size[i] < size[j], set id[i] = j and size[j] += size[i]; else set id[j] = i and size[i] += size[j].
Proposition 1: The depth of any node x is at most log₂(N).
Proof: The depth of x increases by 1 exactly when the tree T1 containing x is merged into another tree T2. By the weighted rule, |T2| ≥ |T1|, so the size of the tree containing x at least doubles. Starting from size 1, the tree containing x can double at most log₂(N) times, because its size never exceeds N.
Efficiency: initialization is O(N), and connected/union are O(log₂ N).
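The weighted modification above can be sketched as follows (a sketch; names are my own):

```java
public class WeightedQuickUnionUF {
    private final int[] id;   // id[i] = parent of i
    private final int[] size; // size[i] = number of objects in the tree rooted at i

    public WeightedQuickUnionUF(int n) {
        id = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) { id[i] = i; size[i] = 1; }
    }

    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    public boolean connected(int p, int q) { return root(p) == root(q); }

    public void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        // link the root of the smaller tree to the root of the larger tree
        if (size[i] < size[j]) { id[i] = j; size[j] += size[i]; }
        else                   { id[j] = i; size[i] += size[j]; }
    }
}
```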
1.4.2 Improvement 2 - Path Compression
Two-pass path compression: after computing the root of p, set the id of each examined node to point directly to that root. A simpler one-pass variant (path halving): make every other node in the path point to its grandparent, thereby halving the path length. This is a one-line change inside the while loop of the root function: id[i] = id[id[i]].
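The one-line path-halving change can be sketched inside a minimal quick-union class (a sketch; the class name is my own):

```java
public class QuickUnionPathHalvingUF {
    private final int[] id; // id[i] = parent of i

    public QuickUnionPathHalvingUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    private int root(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]]; // path halving: point i at its grandparent
            i = id[i];
        }
        return i;
    }

    public boolean connected(int p, int q) { return root(p) == root(q); }

    public void union(int p, int q) { id[root(p)] = root(q); }
}
```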
Proposition 2: Starting from an empty data structure, any sequence of M union-find operations on N objects makes ≤ c(N + M lg* N) array accesses. Here lg* N is the iterated log function: the number of times we must take the log to bring N down to 1 (at most about 5 in practice). Hence the running time is effectively linear.
Worst-case analysis (M operations on N objects):
Quick-find: O(MN)
Quick-union: O(MN)
Weighted quick-union: O(N + M log N)
Quick-union with path compression: O(N + M log N)
Weighted quick-union with path compression: O(N + M lg* N)
1.5 Percolation
We have an N-by-N grid of sites, and each site is open with some probability p. The system percolates iff the top and bottom rows are connected by open sites.
When N is large, there is a sharp threshold p* such that if p > p* the system almost certainly percolates, and if p < p* it almost certainly does not. No mathematical formula for p* is known, so we use simulation to estimate it. We can run Monte Carlo simulations: initialize the whole grid as blocked, then open random sites until the top and bottom are connected. The vacancy percentage at that point estimates p*. Note that we have two types of open sites: open and connected to the top (full), or open and empty.
Checking whether an N-by-N system percolates: create an object for each site and name them 0 to N² − 1. Sites are in the same component if they are connected by open sites. The system percolates iff any site on the bottom row is connected to some site on the top row. Brute force checks all N² top-bottom pairs of sites, so it requires N² connected queries. To fix this, we introduce two more vertices: a virtual top (connected to every site in the top row) and a virtual bottom (connected to every site in the bottom row). Now the system percolates iff the virtual top site is connected to the virtual bottom site, a single query. Modeling the opening of a site: connect it to all adjacent open sites (4 at maximum). Running this simulation, we find that the threshold is approximately 0.59.
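The virtual-site scheme above can be sketched as follows. This is a sketch under stated assumptions: 0-based indices, names of my own choosing, and quick-find for the connectivity for brevity (the course uses weighted quick-union for speed):

```java
import java.util.Random;

public class Percolation {
    private final int n;
    private final boolean[] open;
    private final int[] id; // quick-find ids: sites 0..n*n-1, then virtual top and bottom
    private final int top, bottom;

    public Percolation(int n) {
        this.n = n;
        open = new boolean[n * n];
        id = new int[n * n + 2];
        top = n * n; bottom = n * n + 1;
        for (int i = 0; i < id.length; i++) id[i] = i;
    }

    private void unite(int p, int q) { // quick-find union
        int pid = id[p], qid = id[q];
        if (pid == qid) return;
        for (int i = 0; i < id.length; i++) if (id[i] == pid) id[i] = qid;
    }

    public void openSite(int row, int col) { // 0-based indices
        int s = row * n + col;
        if (open[s]) return;
        open[s] = true;
        if (row == 0) unite(s, top);        // connect to the virtual top
        if (row == n - 1) unite(s, bottom); // connect to the virtual bottom
        int[][] nbrs = {{row - 1, col}, {row + 1, col}, {row, col - 1}, {row, col + 1}};
        for (int[] nb : nbrs) {             // connect to adjacent open sites (4 at most)
            int r = nb[0], c = nb[1];
            if (r >= 0 && r < n && c >= 0 && c < n && open[r * n + c])
                unite(s, r * n + c);
        }
    }

    public boolean percolates() {
        return id[top] == id[bottom];
    }

    // One Monte Carlo trial: open random sites until the system percolates,
    // then return the vacancy fraction, an estimate of the threshold p*.
    public static double trial(int n, Random rnd) {
        Percolation p = new Percolation(n);
        int opened = 0;
        while (!p.percolates()) {
            int r = rnd.nextInt(n), c = rnd.nextInt(n);
            if (!p.open[r * n + c]) { p.openSite(r, c); opened++; }
        }
        return (double) opened / (n * n);
    }
}
```

Averaging trial() over many runs with large n approaches the ~0.59 threshold mentioned above.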
2 Analysis of Algorithms
2.1 Observations
Example 1 - 3-Sum: Given N distinct integers, how many triples sum to exactly 0?
Algorithm
Input: An array a of N numbers.
1. Initialize a count variable to 0: count = 0.
2. Three nested for loops: for i < N ; then for j = i + 1 < N ; and finally for k = j + 1 < N .
3. If a[i] + a[j] + a[k] = 0 then count += 1.
Output: The variable count, the total number of triples that sum to 0.
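The three nested loops above can be sketched as:

```java
public class ThreeSum {
    // Brute-force count of triples summing to exactly 0.
    public static int count(int[] a) {
        int n = a.length, count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if (a[i] + a[j] + a[k] == 0) count++;
        return count;
    }
}
```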
Calculating the running time: create a plot of N versus T(N), and also a log-log plot. Running a regression on the log-log plot gives a straight line, so log(T(N)) = b log(N) + c. To quickly estimate b, run the program while doubling the size of the input; the log of the ratio of consecutive times converges to the constant b.
2.2 Mathematical Models
We make simplifications: when N is large, the lower-order terms can be ignored. For example, for 2-Sum we choose 2 from N, which is C(N, 2) ≈ ½N², and each pair costs 2 array accesses, hence ~N² array accesses in total; therefore the running time is of order O(N²). For 3-Sum we choose 3 from N, which is C(N, 3) ≈ (1/6)N³, with 3 array accesses per triple, hence ~½N³ accesses, so the running time is O(N³).
To estimate a discrete sum, we can approximate using an integral (helps to get the high order approximation).
For example:
\[
1 + 2 + \cdots + N = \sum_{i=1}^{N} i \approx \int_{1}^{N} x \, dx \approx \tfrac{1}{2} N^2
\]
So, in general, if we are summing a function, we can approximate the sum by integrating the function between 1 and N. We write f(x) ~ g(x) for large x when
\[
\lim_{x \to \infty} \frac{f(x)}{g(x)} = 1
\]
2.3 Order of Growth
Proposition: Binary search in a sorted array of size N uses at most 1 + lg(N) compares.
Proof: Define T(N) = the number of compares binary search makes in a sorted subarray of size N. We know T(N) ≤ T(N/2) + 1 for N > 1, with T(1) = 1. Applying the recurrence to the right-hand side gives T(N) ≤ T(N/4) + 1 + 1. Repeating this lg(N) times, we have T(N) ≤ T(N/N) + lg(N) = 1 + lg(N).
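The binary search being analysed can be sketched as the standard iterative version (a sketch; the method name is my own):

```java
public class BinarySearch {
    // Each iteration halves the subarray, giving the ~1 + lg N compares above.
    public static int indexOf(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (key < a[mid]) hi = mid - 1;
            else if (key > a[mid]) lo = mid + 1;
            else return mid;
        }
        return -1; // not found
    }
}
```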
Now we can come up with a faster algorithm for the 3-Sum problem. First, sort the N distinct numbers; then for every pair a[i] and a[j], binary search for −(a[i] + a[j]). Count a hit only if a[i] < a[j] < a[k], to avoid double counting. The first step can be any sorting algorithm (insertion sort, for example, which is O(N²)); the second step does a binary search per pair, which yields O(N² log N) overall.
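The sort-then-search idea can be sketched as follows (a sketch assuming distinct integers, as the problem statement requires; it uses the standard library sort rather than insertion sort):

```java
import java.util.Arrays;

public class ThreeSumFast {
    // Sort, then for each pair binary search for -(a[i] + a[j]).
    public static int count(int[] input) {
        int[] a = input.clone();
        Arrays.sort(a);                // O(N log N)
        int n = a.length, count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                int k = Arrays.binarySearch(a, -(a[i] + a[j])); // O(log N)
                if (k > j) count++;    // k > j guarantees a[i] < a[j] < a[k]: no double counting
            }
        return count;
    }
}
```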
2.3.1 Notation
O(N) means that an upper bound on the running time of the algorithm is aN for some constant a.
Note: the functions 12N³ and 11N are both O(N³), because both are bounded by some aN³.
Ω(N) means that a lower bound on the running time is aN.
Θ(N) means that the upper and lower bounds on the running time have the same order.
The lower bound for both 1-Sum and 3-Sum is Ω(N), because we must examine at least the entire array. For 3-Sum there is a gap between the lower and upper bounds. We have found an optimal algorithm only when the upper and lower bounds coincide.
2.4 Memory
Typical memory usage: boolean(1), byte(1), char(2), int(4), float(4), long(8), and double(8).
Total memory usage for a data type value:
Primitive type: 4 bytes for int, 8 bytes for double etc
Object Reference: 8 bytes
Array: 24 bytes + memory for each array entry
Object: 16 bytes + memory for each instance variable + 8 (if inner class)
Padding: Round up to multiple of 8 bytes.
Example: how much memory does the weighted quick-union data structure use, as a function of N? 16 bytes for the object overhead; two arrays, each contributing (24 + 4N) + 8 bytes, where the 8 bytes are for the reference to the array; 4 bytes for the integer count; and 4 bytes of padding. Hence the total memory usage is 8N + 88 ≈ 8N bytes. The padding is assigned as follows: first sum all memory contributions, then add the minimum amount needed to make the total a multiple of 8.
3 Stacks and Queues
3.1 Stacks
For stacks, we remove the item that was most recently added; for queues, we remove the item least recently added.
3.1.1 Linked-List Implementation
Refer to the code files. Every operation takes constant time in the worst case. As for space usage: a stack with N items uses ~40N bytes, because each node costs 16 bytes of object overhead + 8 bytes of inner-class overhead + 8 bytes for the string reference + 8 bytes for the node reference. This does not include the space for the strings themselves, which belongs to the client. Each item is stored in a node containing the item and a link to the next node (starting from the most recently added node).
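The linked-list stack described above can be sketched as (a sketch; names follow the course's style but are my own):

```java
public class LinkedStackOfStrings {
    private Node first; // top of the stack

    private class Node {   // inner class: the item plus a link to the node below
        String item;
        Node next;
    }

    public boolean isEmpty() { return first == null; }

    public void push(String item) {
        Node oldFirst = first;
        first = new Node();
        first.item = item;
        first.next = oldFirst; // the new node links to the old top
    }

    public String pop() {
        String item = first.item;
        first = first.next;    // the old top node becomes garbage-collectable
        return item;
    }
}
```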
3.1.2 Array Implementation
Use an array s[] to store the N items on the stack. To add, we simply push onto the next free index, and to remove we simply pop from the last index (n − 1). The defect is that the stack overflows when N exceeds the capacity, because we need to declare the size of the array beforehand. For overflow, we must resize the array (later in the notes). There is also a loitering problem in Java: after we pop, the array still holds a reference to the popped object, and we must remove it for more efficient memory allocation. The pop therefore becomes: String item = s[--n]; s[n] = null; return item;. The garbage collector can reclaim an object's memory only if there are no outstanding references to it.
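The fixed-capacity array stack, with the null-out in pop, can be sketched as (a sketch; the class name is my own):

```java
public class FixedCapacityStackOfStrings {
    private final String[] s;
    private int n = 0; // number of items on the stack

    public FixedCapacityStackOfStrings(int capacity) {
        s = new String[capacity]; // client must supply the capacity: the defect noted above
    }

    public boolean isEmpty() { return n == 0; }

    public void push(String item) { s[n++] = item; }

    public String pop() {
        String item = s[--n];
        s[n] = null; // avoid loitering: drop the outstanding reference
        return item;
    }
}
```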
3.2 Resizing Arrays
Our implementation above had a defect: we required clients to provide the maximum capacity of the stack. Our first attempt is as follows: on each push or pop, create a new array one element longer or shorter and copy the items over. However, this approach is computationally expensive, because each time we must copy the whole stack. Inserting N items can thus require time of order 1 + 2 + 3 + ... + N ≈ N²/2, which is quadratic; we want to avoid this. So we reduce the number of times we need to create a new array: each time we need to extend the stack, we create an array of twice the size and copy the items. Inserting N items then requires time N + (2 + 4 + 8 + ... + N) ≈ 3N.
Now, for pop we cannot mirror this by halving when the array is half full: in the worst case the client alternates pushes and pops right at that boundary, so we keep halving and doubling, and each operation copies the whole array, which is again quadratic in total. The solution is to halve the size of the array when it is one-quarter full. Starting from an empty stack, any sequence of N push and pop operations then takes time proportional to N in total, i.e. constant amortized time per operation.
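The doubling and quarter-full-halving policy can be sketched as (a sketch; the class name is my own):

```java
public class ResizingArrayStackOfStrings {
    private String[] s = new String[1];
    private int n = 0; // number of items on the stack

    public boolean isEmpty() { return n == 0; }

    public void push(String item) {
        if (n == s.length) resize(2 * s.length); // double when full
        s[n++] = item;
    }

    public String pop() {
        String item = s[--n];
        s[n] = null; // avoid loitering
        if (n > 0 && n == s.length / 4) resize(s.length / 2); // halve when one-quarter full
        return item;
    }

    private void resize(int capacity) {
        String[] copy = new String[capacity];
        for (int i = 0; i < n; i++) copy[i] = s[i];
        s = copy;
    }
}
```

Halving at one-quarter full (not one-half) is what prevents the push/pop thrashing described above: right after any resize, the array is half full, so the next resize in either direction is at least n/2 operations away.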
Resizing array versus linked list: the linked-list implementation gives constant worst-case time per operation, but uses extra time and space to deal with the links. The resizing-array implementation gives constant amortized time per operation (averaged over the whole sequence) and wastes less space. To be sure that every individual operation is fast, use the linked list; if we only care about the total, use the resizing array.