Application of Data Structures
Overview
Cumulative Sum Data Structures on Intervals
Augmenting data structures with extra info to solve questions
Christopher Moh 2005
Each element has a key field and often has other satellite data
For example, when sorting pixels by their R value, we consider R as the key field and G and B as satellite data
Common PQ operations
Create()
Find_Min()
Insert(x)
Delete(x)
Change(x, k)
Optional PQ operations
Union (a,b)
Search (k)
Returns the position of the element in the heap with key value k
During the course of this presentation, we shall assume that extra data exists which allows us to do a search in O(1) time. The handling of this extra data will be assumed and not covered.
Linear Array
Unsorted Array
Create, Insert, Change in O(1) time
Find_min, Delete in O(n) time
Sorted Array
Create, Find_min in O(1) time
Insert, Delete, Change in O(n + log n) = O(n) time
Binary Heaps
Will be the most common structure that will be implemented in competition setting
A heap is a structure where the value of a node is less than or equal to the values of all of its children
A binary heap is a heap where the maximum number of children for each node is 2.
Array implementation
Consider a heap of size nheap in an array BHeap[1..nheap] (Define BHeap[nheap+1 .. (nheap*2)+1] to be INFINITY for practical reasons)
The children of BHeap[x] are BHeap[x*2] and BHeap[x*2+1]
The parent of BHeap[x] is BHeap[x/2] (integer division)
This allows a near uniform Binary Heap where we can ensure that the number of levels in this heap is O(log n)
Some properties wrt Key values:
BHeap[x] >= BHeap[x/2]
BHeap[x] <= BHeap[x*2]
BHeap[x] <= BHeap[x*2+1]
BHeap[x*2] ?? BHeap[x*2+1] (no ordering is implied between siblings)
PQ Operations on a BHeap
We define BTree(x) to be the Binary Tree rooted at BHeap[x]
We define Heapify(x) to be an operation that does the following:
Assume: BTree(x*2) and BTree(x*2+1) are binary heaps but BTree(x) is not necessarily a binary heap
Produce: BTree(x) binary heap
Details of Heapify in later slides but for now, we assume Heapify is O(log n)
For the rest of the presentation, we assume the variable n refers to nheap
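A minimal Python sketch of Heapify, under assumptions not stated on the slides: the heap lives in a 1-indexed list (index 0 unused), and bounds checks against nheap stand in for the INFINITY padding mentioned earlier. The name heapify and its parameter order are illustrative only.

```python
def heapify(bheap, nheap, x):
    """Sift bheap[x] down until the subtree rooted at x is a min-heap.

    Assumes the subtrees rooted at 2*x and 2*x+1 are already heaps.
    bheap[0] is an unused placeholder; only bheap[1..nheap] is live.
    """
    while True:
        smallest = x
        # Pick the smallest key among x and its (existing) children.
        for child in (2 * x, 2 * x + 1):
            if child <= nheap and bheap[child] < bheap[smallest]:
                smallest = child
        if smallest == x:
            return  # heap property holds at x, so we are done
        bheap[x], bheap[smallest] = bheap[smallest], bheap[x]
        x = smallest  # continue one level down


# Subtrees at positions 2 and 3 are heaps, but the root (9) is too large.
heap = [None, 9, 2, 3, 5, 4, 7, 8]
heapify(heap, 7, 1)
```

Each pass of the loop moves one level down, so the work is bounded by the number of levels, which is O(log n).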
Operations on a BHeap
Create is trivial: O(1) time
Find_min: return BHeap[1], O(1) time
Insert (element with key value x):
1. nheap++
2. BHeap[nheap] = x
3. T = nheap
4. While (T != 1 && BHeap[T] < BHeap[T/2])
    1. Swap (BHeap[T], BHeap[T/2])
    2. T = T/2
O(log n) time as the number of levels is O(log n)
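The Insert steps can be sketched in Python as follows. This assumes a 1-indexed list whose slot 0 is an unused placeholder, so that nheap equals len(bheap) - 1; the function name heap_insert is illustrative.

```python
def heap_insert(bheap, x):
    """Append key x, then bubble it up while it is smaller than its parent."""
    bheap.append(x)          # nheap++ and BHeap[nheap] = x
    t = len(bheap) - 1       # T = nheap
    while t != 1 and bheap[t] < bheap[t // 2]:
        bheap[t], bheap[t // 2] = bheap[t // 2], bheap[t]
        t //= 2


heap = [None]                # slot 0 is never used
for key in [5, 3, 8, 1]:
    heap_insert(heap, key)
```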
Operations on a BHeap
Change (x, k) when the key increases (ChangeUp):
Assume: k > existing BHeap[x]
1. BHeap[x] = k
2. Heapify(x)
O(log n) as complexity of Heapify is O(log n)
Operations on a BHeap
Create, Find_min in O(1) time
Change (includes both ChangeUp and ChangeDown), Insert, and Delete are O(log n) time
How long do Union operations take?
Corollary: Heapsort
We can convert an unsorted array to a heap using Heapify (why does this work?):
For each non-leaf position i, going from n/2 down to 1:
1. Heapify(i)
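A hedged sketch of the conversion and of the sort that follows from it. Assumptions: a 1-indexed list with slot 0 unused, positions visited from n/2 down to 1 so that both child subtrees are already heaps when Heapify(i) runs, and a sort that repeatedly removes the minimum. The function names are illustrative.

```python
def heapify(a, n, x):
    """Sift a[x] down within the live range a[1..n] (min-heap)."""
    while True:
        smallest = x
        for child in (2 * x, 2 * x + 1):
            if child <= n and a[child] < a[smallest]:
                smallest = child
        if smallest == x:
            return
        a[x], a[smallest] = a[smallest], a[x]
        x = smallest


def build_heap(a, n):
    """Turn a[1..n] into a heap; leaves (positions > n/2) are heaps already."""
    for i in range(n // 2, 0, -1):
        heapify(a, n, i)


def heapsort(values):
    """Return the values in increasing order by repeatedly removing the minimum."""
    a = [None] + list(values)
    n = len(values)
    build_heap(a, n)
    out = []
    while n > 0:
        out.append(a[1])   # current minimum
        a[1] = a[n]        # move the last element to the root
        n -= 1
        heapify(a, n, 1)   # restore the heap property
    return out
```

For example, heapsort([5, 2, 9, 1, 5, 6]) yields the keys in increasing order.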
Binomial Trees
B(0) is a single node
B(n), n != 0, is formed by merging two B(n-1) trees in the following way:
The root of the B(n) tree is the root of one of the B(n-1) trees, and the (new) leftmost child of this root is the root of the other B(n-1) tree.
Within the tree, the heap property holds, i.e. the key field of any node is less than or equal to the key fields of all its children.
The number of nodes in B(k) is exactly 2^k.
The height of B(k) is exactly (k + 1)
For any tree B(k)
The root of B(k) has exactly k children
If we take the children of the root of B(k) from left to right, they form the roots of a B(k-1), B(k-2), ..., B(0) tree in that order
Binomial Heaps
Binomial Heaps are a forest of binomial trees with the following properties:
All the binomial trees are of different sizes
The binomial trees are ordered (from left to right) by increasing size
If we consider the fact that the size of B(k) is 2^k, the binomial tree B(k) exists in a binomial heap of n nodes iff the bit representing 2^k is 1 in the binary representation of n
For example: 13 (decimal) = 1101 (binary), so the binomial heap with 13 nodes consists of the binomial trees B(0), B(2), and B(3).
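The correspondence with binary representation can be checked with a short Python sketch. The helper name binomial_tree_orders is hypothetical and only illustrates the rule above.

```python
def binomial_tree_orders(n):
    """Orders k of the trees B(k) in a binomial heap holding n nodes.

    B(k) is present exactly when bit k of n is 1.
    """
    return [k for k in range(n.bit_length()) if (n >> k) & 1]


orders = binomial_tree_orders(13)          # 13 = 1101 in binary
sizes = [2 ** k for k in orders]           # each B(k) has 2^k nodes
```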
Each node in a binomial heap stores:
Key field
Number of children in field degree
Any other data that might be useful for the program
Pointers to:
Parent
Next Sibling (ordered left to right; a sibling must have the same parent); for roots of binomial trees, next sibling points to the root of the next binomial tree
Leftmost child
The binomial heap is represented by a head pointer that points to the root of the smallest binomial tree (which is the leftmost binomial tree)
Link (h1, h2)
Links two binomial trees with root h1 and h2 of the same order k to form a new binomial tree of order (k+1)
We assume h1->key < h2->key which implies that h1 is the root of the new tree
1. T = h1->leftchild
2. h1->leftchild = h2
3. h2->parent = h1
4. h2->next_sibling = T
O(1) time
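A minimal Python sketch of Link, using a hypothetical Node class with the fields described for binomial heap nodes. Two details are additions of this sketch rather than steps from the slide: the roots are swapped when h1 does not hold the smaller key, and the degree field of the new root is incremented.

```python
class Node:
    """Binomial heap node (field names are illustrative)."""

    def __init__(self, key):
        self.key = key
        self.degree = 0
        self.parent = None
        self.leftchild = None
        self.next_sibling = None


def link(h1, h2):
    """Join two trees of equal order; return the root of the new tree."""
    if h2.key < h1.key:
        h1, h2 = h2, h1            # ensure h1 holds the smaller key
    h2.next_sibling = h1.leftchild  # old leftmost child becomes h2's sibling
    h2.parent = h1
    h1.leftchild = h2
    h1.degree += 1                 # assumption: degree is kept up to date here
    return h1


# Build a B(2) tree out of two B(1) trees.
a = link(Node(4), Node(7))
b = link(Node(2), Node(9))
root = link(a, b)
```

After the last call the children of the root, from left to right, are the roots of a B(1) and a B(0) tree.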
Create
Create a new binomial heap with one node (key field set)
Set Parent, Leftchild, Next sibling to NIL
O(1) time
Find_min
1. X = head, min = INFINITY
2. While (X != nil)
    1. If (X->key < min) min = X->key
    2. X = X->next_sibling
3. Return min
O(log n) time as there are at most log n binomial trees (log n bits)
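More Operations
Merge (h1, h2, L)
Merges the root lists of two binomial heaps with head pointers h1 and h2 into a list L ordered by increasing tree order
While h1 and h2 are not both nil:
If h2 is nil, or the tree at h1 has order no larger than the tree at h2
1. Append the (binomial) tree with root h1 to L
2. h1 = h1->next_sibling
Else
1. Apply above steps to h2 instead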
More Operations
Union (h1, h2)
The fundamental operation involving binomial heaps
Takes two binomial heaps with head pointers h1 and h2 and creates a new binomial heap of the union of h1 and h2
More Operations
Union (h1, h2)
1. Start with empty binomial heap
2. Merge (h1, h2, L)
3. Go by increasing k in the list L until L is empty
    1. If there is exactly one or exactly three (how can this happen?) binomial trees of order k in L, append one binomial tree of order k to the binomial heap and remove that tree from L
    2. If there are two trees of order k, remove both trees, use Link to form a tree of order (k+1) and pre-pend this tree to L
Union is O(log n)
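A simplified Python sketch of Merge and Union. It deliberately departs from the pointer layout on the slides: each heap is a Python list of roots in increasing order, and each node keeps a list of children, so that degree is just the number of children. All names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.children = []          # leftmost child first

    @property
    def degree(self):
        return len(self.children)


def link(h1, h2):
    """Join two trees of equal order; the smaller key becomes the root."""
    if h2.key < h1.key:
        h1, h2 = h2, h1
    h1.children.insert(0, h2)
    return h1


def merge(roots1, roots2):
    """Merge two root lists into one list L ordered by increasing order."""
    out, i, j = [], 0, 0
    while i < len(roots1) and j < len(roots2):
        if roots1[i].degree <= roots2[j].degree:
            out.append(roots1[i])
            i += 1
        else:
            out.append(roots2[j])
            j += 1
    return out + roots1[i:] + roots2[j:]


def union(roots1, roots2):
    pending = merge(roots1, roots2)  # the list L
    heap = []                        # start with an empty binomial heap
    while pending:
        k = pending[0].degree
        same = 1                     # number of leading trees of order k
        while same < len(pending) and pending[same].degree == k:
            same += 1
        if same == 2:
            # Two trees of order k: link them and pre-pend the result.
            pending[:2] = [link(pending[0], pending[1])]
        else:
            # One or three trees of order k: keep one in the result.
            heap.append(pending.pop(0))
    return heap


# Insert 13 keys one at a time; each insert is a union with a one-node heap.
keys = [27, 4, 19, 8, 33, 1, 15, 22, 6, 40, 11, 3, 30]
heap = []
for key in keys:
    heap = union(heap, [Node(key)])
```

With 13 nodes the resulting heap holds trees of order 0, 2 and 3, matching the binary representation 1101, and the minimum key sits at one of the roots.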
More Operations
Insert (x)
1. Create a new binomial heap with that one node
2. Union (existing heap with head h, new heap)
O(log n) time
ChangeDown (x, k)
Decreasing the key value of a node
Same idea as binary heap: bubble up the binomial tree containing this node (exchange only key fields and satellite data! What's the complexity if you physically change the node?)
O(log n) time
More Operations
Delete (x)
Deleting position x from the heap
1. ChangeDown(x, -INFINITY)
Now x is at the root of its binomial tree
Supposing that the binomial tree is of order k
Recall that the children of the root of the binomial tree, from right to left, are binomial trees of order 0, 1, 2, 3, 4, ..., k-1
2. Form a new binomial heap with the children of the root of this binomial tree as the roots in the new binomial heap
3. Remove the original binomial tree from the original binomial heap
4. Union (original heap, new heap)
O(log n) complexity
More Operations
Create in O(1) time
Union, Find_min, Delete, Insert, and Change operations take O(log n) time
In general, because binomial heaps are more complicated, in competition it is far more prudent (saves time coding and debugging) to use a binary heap instead
The following describes how Dijkstra's algorithm can be coded with a binary heap
Initializing phase:
1. Let n be the number of nodes
2. Create a heap of size n, all key fields initialized to INFINITY
3. Change_val (s, 0) where s is the source node
Main phase, repeated until the heap is empty:
1. X = node corresponding to find_min value
2. Delete (position of X in heap = 1)
3. For all nodes k that are adjacent to X
    1. If the path through X gives k a smaller key, Change_val (k, new key)
The n Delete operations take O(n log n) in total
The Change_val operations, at most one per edge, take O(m log n) in total, where m is the number of edges
Total running time O([m+n] log n)
This is faster than using a basic array list unless the graph is very dense, in which case m is about O(n^2) which leads to a running time of O(n^2 log n)
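A hedged Python sketch of Dijkstra's algorithm with a binary heap. It swaps in a different technique from the one described: instead of Change_val on an existing heap entry, it pushes a new entry and skips stale ones when they are popped (Python's heapq module has no change-key operation). The graph format, a dict from node to a list of (neighbour, weight) pairs, is an assumption of this sketch.

```python
import heapq


def dijkstra(adj, s):
    """Shortest distances from s; adj maps node -> list of (neighbour, weight)."""
    dist = {v: float("inf") for v in adj}   # all keys start at INFINITY
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, x = heapq.heappop(heap)          # find_min followed by delete
        if d > dist[x]:
            continue                        # stale entry, a shorter path was found
        for k, w in adj[x]:
            if d + w < dist[k]:
                dist[k] = d + w
                heapq.heappush(heap, (dist[k], k))  # stands in for Change_val
    return dist


graph = {
    "s": [("a", 4), ("b", 1)],
    "a": [("t", 1)],
    "b": [("a", 2), ("t", 6)],
    "t": [],
}
dist = dijkstra(graph, "s")
```

The heap can hold up to m entries in this variant, so the bound is O(m log m), which is the same as O(m log n) since m is at most n^2.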
Problem: We have a line that runs from x coordinate 1 to x coordinate N.
At x coordinate X [X an integer between 1 and N], there is g(X) gold.
Given an interval [a,b], how much gold is there between a and b?
How efficiently can this be done if we dynamically change the amount of gold and the interval [a,b] keeps changing?
Let us define C(0) = 0, and C(x) = C(x-1) + g(x) where g(x) is the amount of gold at position x
C(x) then defines the total amount of gold from position 1 to position x
The amount of gold in interval [a,b] is simply C(b) - C(a-1)
However, if we change g(x), we will have to change C(x), C(x+1), C(x+2), ..., C(N)
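The plain cumulative array can be sketched in Python as follows, assuming g is stored in a list with index 0 unused so that positions run from 1 to N. The function names are illustrative.

```python
def build_prefix(g):
    """Return C with C[0] = 0 and C[x] = C[x-1] + g[x] for x = 1..N."""
    c = [0] * len(g)
    for x in range(1, len(g)):
        c[x] = c[x - 1] + g[x]
    return c


def gold_between(c, a, b):
    """Gold in the interval [a, b], using C(b) - C(a-1)."""
    return c[b] - c[a - 1]


g = [0, 3, 1, 4, 1, 5]      # g[0] is a placeholder; N = 5
c = build_prefix(g)
answer = gold_between(c, 2, 4)
```

A query is O(1), but changing a single g(x) forces a rebuild of every later entry of C, which is O(N).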
We can use the binary representation of any number to come up with a cumulative sum tree For example, let say we take 13 (decimal) = 1101 (binary)
The cumulative sum g(1) + g(2) + ... + g(13) can be represented as the sum of:
g(1) + g(2) + ... + g(8) [ 8 elements ]
g(9) + g(10) + ... + g(12) [ 4 elements ]
g(13) [ 1 element ]
Notice that the number of elements in each case represents a bit that is 1 in the binary representation of the number
C(19) is the sum of the following:
g(1) + g(2) + ... + g(16) [ 16 elements ]
g(17) + g(18) [ 2 elements ]
g(19) [ 1 element ]
Let us define C2(x) to be the sum g(x) + g(x-1) + ... + g(p + 1), where p is the number with the same binary representation as x except that the least significant bit of x (the rightmost bit of x that is 1) is set to 0
Examples of x and the corresponding p:
x = 13 (1101), p = 12 (1100)
x = 12 (1100), p = 8 (1000)
x = 8 (1000), p = 0 (0000)
If we want to find the cumulative sum C(x) = g(1) + g(2) + ... + g(x), we can trace through the values of C2 using the binary representation of x
Examples:
C(13) = C2(8) + C2(8+4) + C2(8+4+1)
C(16) = C2(16)
C(21) = C2(16) + C2(16+4) + C2(16+4+1)
C(99) = C2(64) + C2(64+32) + C2(64+32+2) + C2(64+32+2+1)
Hence the amount of gold in interval [a,b] = C(b) - C(a-1) can be found in O(log N) time, so a query for a new interval [a,b] also costs O(log N)
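A minimal Python sketch of the query. Assumptions: g is a 1-indexed list (index 0 unused), C2 is built directly from its definition for clarity rather than speed, and the expression x & -x is used to isolate the least significant 1 bit of x. The loop visits the same C2 terms as in the examples, only starting from x and clearing bits instead of building x up.

```python
def build_c2(g):
    """C2[x] = g[p+1] + ... + g[x], where p is x with its lowest 1 bit cleared."""
    n = len(g) - 1
    c2 = [0] * (n + 1)
    for x in range(1, n + 1):
        p = x - (x & -x)
        c2[x] = sum(g[p + 1 : x + 1])
    return c2


def cumulative(c2, x):
    """C(x) = g(1) + ... + g(x), summing one C2 term per 1 bit of x."""
    total = 0
    while x > 0:
        total += c2[x]
        x -= x & -x          # clear the least significant 1 bit
    return total


def gold_between(c2, a, b):
    return cumulative(c2, b) - cumulative(c2, a - 1)


g = [0, 3, 1, 4, 1, 5, 9, 2, 6]   # N = 8
c2 = build_c2(g)
```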
If g(x) is changed, we only need to update C2(y) where C2(y) covers g(x) We can go through all necessary C2(y) in the following way:
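While (x <= N)
1. Update C2(x)
2. Add the value of the least significant bit of x to x (for example, 6 = 110 becomes 8 = 1000)
This runs in O(log N) time
Hence updates to g can also be done in O(log N) time, which is a great improvement over the O(N) needed for an array.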
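The update loop can be sketched in Python like this, assuming C2 is a 1-indexed list (index 0 unused) and that a change to g(x) is expressed as a difference delta. The function name and the returned list of touched indices are illustrative additions for inspection.

```python
def add_gold(c2, x, delta):
    """Add delta to g(x); return the indices y whose C2(y) was updated."""
    n = len(c2) - 1
    touched = []
    while x <= n:
        c2[x] += delta
        touched.append(x)
        x += x & -x          # add the least significant 1 bit of x to x
    return touched


c2 = [0] * 17                # N = 16, all gold initially 0
touched_5 = add_gold(c2, 5, 10)
touched_13 = add_gold(c2, 13, 7)
```

With N = 16, a change to g(5) touches C2(5), C2(6), C2(8) and C2(16), and a change to g(13) touches C2(13), C2(14) and C2(16).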
We can implement a cumulative sum tree very simply, by using a linear array to store the values of C2.
Can we extend a cumulative sum tree to 2 or more dimensions?
Examples of updates:
Change to g(5) [ 101 ]: Update C2(5), C2(6), C2(8), C2(16) and all C2(power of 2 > 16)
Change to g(13) [ 1101 ]: Update C2(13), C2(14), C2(16), and all C2(power of 2 > 16)
Change to g(35) [ 100011 ]: Update C2(35), C2(36), C2(40), C2(48), C2(64), and all C2(power of 2 > 64)
Another way to solve the question is to use a Sum of Intervals Binary Tree
Each node in the tree is represented by (L, R) and the value of (L,R) is g(L) + g(L+1) + ... + g(R)
The root of the tree has L = 1 and R = N
Every leaf has L = R
Every non-leaf has children (L, [L+R]/2) [left child] and ([L+R]/2+1, R) [right child]
The number of nodes in the tree is about 2*N, i.e. O(N) [ why? ]
In an implementation, every node should have pointers to its children and its parent
A query accumulates its answer in C while walking down from the root, with (L,R) denoting the current node. The steps used along the way are:
C += value of (L,R)
Set L and R to the left child of the current node
Set L and R to the right child of the current node
Complexity of O(log N)
Hence all updates of interval [a,b] and g(x) can be done in O(log N) time
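One possible Python sketch of such a tree. The exact steps of the original slides are not recoverable here, so the walk below is an assumed reconstruction: a cumulative query descends from the root, adding the value of a left child whenever it moves right, and an update adds a difference to every node on the path to the leaf. Class and function names are illustrative, and parent pointers are omitted because this sketch only walks downwards.

```python
class IntervalNode:
    """Node (lo, hi) whose value is g(lo) + ... + g(hi)."""

    def __init__(self, lo, hi, g):
        self.lo, self.hi = lo, hi
        if lo == hi:
            self.left = self.right = None
            self.value = g[lo]
        else:
            mid = (lo + hi) // 2
            self.left = IntervalNode(lo, mid, g)
            self.right = IntervalNode(mid + 1, hi, g)
            self.value = self.left.value + self.right.value


def cumulative(root, x):
    """Return g(1) + ... + g(x) for 0 <= x <= N."""
    if x < root.lo:
        return 0
    c, node = 0, root
    while x < node.hi:               # node is a non-leaf containing x
        mid = (node.lo + node.hi) // 2
        if x <= mid:
            node = node.left
        else:
            c += node.left.value     # the whole left child lies within 1..x
            node = node.right
    return c + node.value            # node now ends exactly at x


def add_gold(root, x, delta):
    """Add delta to g(x) by updating every node whose interval contains x."""
    node = root
    while node is not None:
        node.value += delta
        if node.lo == node.hi:
            break
        mid = (node.lo + node.hi) // 2
        node = node.left if x <= mid else node.right


g = [0, 3, 1, 4, 1, 5, 9, 2, 6]      # g[0] is a placeholder; N = 8
root = IntervalNode(1, 8, g)
before = cumulative(root, 6) - cumulative(root, 2)   # gold in [3, 6]
add_gold(root, 4, 10)
after = cumulative(root, 6) - cumulative(root, 2)
```

Both walks visit one node per level, and the tree has O(log N) levels.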
It is often useful to change the data structure in some way, by adding additional data in each node or changing what each node represents.
This allows us to use the same data structure to solve other problems
For example, we can use so-called interval trees to solve not just cumulative sum problems
We can use properties of elements in the interval (L,R) that are related to L and R.