CH 2 - Hashing and Priority Queues

Hashing
Hash Functions
[Figure: hash table of size 15 under successive insertions — 16, 47, 35, 36, 65, 129, 25, 2501, 29, then 14, 99, and 127 — with the probed slots marked as “attempts”]
Linear Probing
• Eliminates the need for separate data structures (chains) and the cost of constructing nodes.
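As a sketch of the insert routine (assuming a fixed-size table where None marks an empty slot, and an identity-style hash for small integer keys):

```python
def linear_probe_insert(table, key, hash_fn=lambda x: x):
    """Insert key with linear probing: try slots t, t+1, t+2, ... (mod N)."""
    n = len(table)
    t = hash_fn(key) % n                 # home slot
    for i in range(n):
        slot = (t + i) % n
        if table[slot] is None:          # first empty slot wins
            table[slot] = key
            return slot
    raise RuntimeError("table is full")
```

With a table of size 15, inserting 35 and then 65 (both hash to slot 5) places 65 in the next free slot, 6.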
[Figure: hash table snapshots inserting 14 and then 99, with probes marked at offsets t, t+1, t+4 from the home slot t]
[Figure: array snapshots leading into the double hashing example — the slot for 65 is first marked “65(?)”, then 65 is placed alongside 47, 35, 36, 129, 25, 2501, 29]
Double Hashing
If the hash table is not full, attempt to store the key in array elements (t+d)%N, (t+2d)%N, … where d = f2(x).
Let f2(x) = 11 − (x mod 11); then f2(16) = d = 6.
Where would you store: 16?
[Figure: double hashing probe sequences on the size-15 table — 16 lands at its home slot t; inserting 14 probes t, t+8, t+16; inserting 99 probes t, t+11, t+22, t+33; a final insertion probes t, t+5, t+10, revisiting the same slots forever]
Infinite loop!
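A sketch of the probe loop, using the slides’ second hash f2(x) = 11 − (x mod 11). Capping the number of attempts at N guards against the infinite-loop case above, which arises when the step d shares a factor with a non-prime table size:

```python
def double_hash_insert(table, key):
    """Double hashing: probe slots (t + i*d) % N for i = 0, 1, 2, ..."""
    n = len(table)
    t = key % n                 # primary hash h(x) = x mod N
    d = 11 - (key % 11)         # secondary hash f2(x) = 11 - (x mod 11)
    for i in range(n):          # cap attempts: the sequence may cycle if N is not prime
        slot = (t + i * d) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("probe sequence cycled without finding an empty slot")
```

On a size-15 table, key 16 (t = 1) goes to slot 1; key 31 also has t = 1, so with d = 2 it lands at slot 3.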
The Squished Pigeon Principle
• An insert using closed hashing cannot work with a load factor λ of 1 or more.
• Quadratic probing can fail if λ > ½
• Linear probing and double hashing get slow if λ > ½
• Lazy deletion never frees space
• Separate chaining becomes slow once λ > 1
  • Eventually becomes a linear search of long chains
• How can we relieve the pressure on the pigeons?
REHASH!
REHASHING
• When the load factor exceeds a threshold, double the table size.
• Rehash each record in the old table into the new table.
• Expensive: O(N) work done in copying.
Rehashing Example
Separate chaining: h1(x) = x mod 5 rehashes to h2(x) = x mod 11

Before (size 5, λ = 1):    slot 0: 25   slot 2: 37, 52   slot 3: 83, 98
After (size 11, λ = 5/11): slot 3: 25   slot 4: 37   slot 6: 83   slot 8: 52   slot 10: 98
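The rehash itself can be sketched for separate chaining (growing 5 → 11 to match the example; real implementations typically pick the next prime at least twice the old size):

```python
def rehash(old_table):
    """Allocate a roughly doubled table and reinsert every key: O(N) work."""
    new_size = 2 * len(old_table) + 1          # 5 -> 11, matching the example
    new_table = [[] for _ in range(new_size)]
    for chain in old_table:                    # visit every record once
        for key in chain:
            new_table[key % new_size].append(key)
    return new_table
```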
Extendible Hashing
One can use a bitmap (directory) to build the index but store the actual keys in the buckets.
Only one hash function h is used; depending on the size of the directory, only a portion of h(K) is utilized. A simple way to achieve this is to view h(K) as a string of bits and use only the i leftmost bits as the directory index.
Priority Queue ADT
1. PQueue data: collection of data with priority
2. PQueue operations
• insert
• deleteMin
Applications of the Priority Queue
• Anything greedy: Dijkstra’s shortest paths, Huffman coding, event-driven simulation, heapsort, …
Potential Implementations
                             insert   deleteMin
Unsorted list (Array)        O(1)     O(n)
Unsorted list (Linked-List)  O(1)     O(n)
Binary Heap Properties
1. Structure Property
2. Ordering Property
Heap Structure Property
• A binary heap is a complete binary tree.
Complete binary tree – binary tree that is
completely filled, with the possible exception of
the bottom level, which is filled left to right.
Examples:
[Figure: example trees, complete and incomplete]
Representing Complete Binary Trees in an Array

[Figure: complete binary tree with nodes A–L stored in array slots 1–12, level by level]

From node i:
  left child:  2i
  right child: 2i + 1
  parent:      ⌊i/2⌋
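The index arithmetic, as code (1-indexed array, slot 0 unused):

```python
def left(i):   return 2 * i          # left child of node i
def right(i):  return 2 * i + 1      # right child of node i
def parent(i): return i // 2         # floor division gives the parent
```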
Heap Order Property
Heap order property: For every non-root node X, the value in the parent of X is less than (or equal to) the value in X.

[Figure: two trees rooted at 10 — one satisfies the heap order property; the other, where 15 sits below a larger value, is “not a heap”]
Heap Operations
• findMin: return the root value — O(1).
• insert(val): percolate up.
• deleteMin: percolate down.

[Figure: example heap — 10 at the root; children 20 and 80; then 40, 60, 85, 99; then 50, 700, 65]
Heap – Insert(val)
Basic Idea:
1. Put val at the “next” leaf position
2. Percolate up by repeatedly exchanging the node with its parent until no longer needed
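A minimal sketch of insert on the 1-indexed array representation (slot 0 unused):

```python
def heap_insert(heap, val):
    """Append val at the next leaf position, then percolate up."""
    heap.append(val)
    i = len(heap) - 1
    while i > 1 and heap[i // 2] > heap[i]:          # parent larger -> swap up
        heap[i], heap[i // 2] = heap[i // 2], heap[i]
        i //= 2
```

On the example heap, inserting 15 swaps it past 60 and 20 and stops below the root 10.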
Insert: percolate up
[Figure: inserting 15 into the example heap — 15 swaps with 60, then with 20, and stops below the root 10]
Heap – DeleteMin
Basic Idea:
1. Remove the root (that is always the min!)
2. Put the “last” leaf node at the root
3. Find the smallest child of the node
4. Swap the node with its smallest child if needed
5. Repeat steps 3 & 4 until no swaps are needed
DeleteMin: percolate down
[Figure: after removing root 10, the last leaf 65 is placed at the root and swaps with its smaller child 15, yielding a heap with 15 at the root]
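The corresponding deleteMin sketch on the 1-indexed array:

```python
def delete_min(heap):
    """Remove and return the root; move the last leaf to the root, percolate down."""
    min_val = heap[1]
    heap[1] = heap[-1]
    heap.pop()
    i, n = 1, len(heap) - 1
    while 2 * i <= n:
        child = 2 * i
        if child + 1 <= n and heap[child + 1] < heap[child]:
            child += 1                            # pick the smaller child
        if heap[i] <= heap[child]:
            break                                 # heap order restored
        heap[i], heap[child] = heap[child], heap[i]
        i = child
    return min_val
```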
Binary Heaps
Building a Heap
12 5 11 3 10 6 9 4 8 1 7 2
• Adding the items one at a time is O(n log n) in the worst case
Working on Heaps
• What are the two properties of a heap?
• Structure Property
• Order Property
BuildHeap: Floyd’s Method
12 5 11 3 10 6 9 4 8 1 7 2
[Figure: the 12 items placed into a complete binary tree, before any percolate-downs]
BuildHeap: Floyd’s Method
[Figure: four intermediate trees as each internal node, from the last one up toward the root, is percolated down]
Finally…
[Figure: the finished heap — level order 1, 3, 2, 4, 5, 6, 9, 12, 8, 10, 7, 11]
runtime: O(n)
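Floyd’s method as a sketch, reusing percolate-down (the total work is O(n) because most nodes sit near the leaves and percolate only a short distance):

```python
def build_heap(items):
    """Place items in the array, then percolate down each internal node, last to first."""
    heap = [None] + list(items)                  # 1-indexed
    n = len(heap) - 1
    for start in range(n // 2, 0, -1):           # internal nodes only
        i = start
        while 2 * i <= n:                        # percolate heap[i] down
            child = 2 * i
            if child + 1 <= n and heap[child + 1] < heap[child]:
                child += 1
            if heap[i] <= heap[child]:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return heap
```

On the slide’s input it yields the heap 1, 3, 2, 4, 5, 6, 9, 12, 8, 10, 7, 11 in level order.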
Facts about Heaps
Observations:
• Finding a child/parent index is a multiply/divide by two
• Operations jump widely through the heap
• Each percolate step looks at only two new nodes
• Inserts are at least as common as deleteMins
Realities:
• Division/multiplication by powers of two are equally fast
• Looking at only two new pieces of data: bad for cache!
• With huge data sets, disk accesses dominate
Cycles to access:
[Figure: the memory hierarchy — access cost grows from CPU registers to cache to main memory to disk]
A Solution: d-Heaps
• Each node has d children
• Still representable by array
• Good choices for d:
  • choose a power of two for efficiency
  • fit one set of children in a cache line
  • fit one set of children on a memory page/disk block

[Figure: a d-heap with root 1 and its array representation 12 1 3 7 2 4 8 5 12 11 10 6 9]
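The only change from the binary case is the index arithmetic; a sketch for a 1-indexed d-ary layout (when d is a power of two, these multiplies and divides become shifts):

```python
def d_children(i, d):
    """Indices of node i's children in a 1-indexed d-ary heap."""
    return range(d * (i - 1) + 2, d * i + 2)

def d_parent(i, d):
    """Index of node i's parent in a 1-indexed d-ary heap."""
    return (i - 2) // d + 1
```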
Leftist Heaps
Idea:
Focus all heap maintenance work in one small part of the heap
Leftist heaps:
1. Most nodes are on the left
2. All the merging work is done on the right
Definition: Null Path Length
null path length (npl) of a node x = the number of nodes between x and a null in its subtree
OR
npl(x) = min distance to a descendant with 0 or 1 children
• npl(null) = -1
• npl(leaf) = 0
• npl(single-child node) = 0

[Figure: example tree with some npl values (0, 1) filled in and others left as “?”]

Leftist property:
• For every node x, npl(left(x)) ≥ npl(right(x))
• result: tree is at least as “heavy” on the left as the right
Are These Leftist?
[Figure: three trees annotated with npl values — check the leftist property at every node]
Every subtree of a leftist tree is leftist!
Why do we have the leftist property?
Because it guarantees that:
• the right path is really short compared to the number of nodes in the tree
• a leftist tree of N nodes has a right path of at most ⌊log₂(N+1)⌋ nodes
Merge two heaps (basic idea)
• Put the smaller root as the new root,
• Hang its left subtree on the left.
• Recursively merge its right subtree and the other tree.
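The basic idea above can be sketched as a recursive merge; npl is stored per node so the leftist property can be restored on the way back up the recursion:

```python
class LNode:
    """Leftist heap node: value, children, and null path length."""
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right
        self.npl = 0

def npl(node):
    return node.npl if node else -1        # npl(null) = -1

def merge(t1, t2):
    if t1 is None: return t2
    if t2 is None: return t1
    if t2.val < t1.val:
        t1, t2 = t2, t1                    # t1 keeps the smaller root
    t1.right = merge(t1.right, t2)         # all merging work on the right
    if npl(t1.right) > npl(t1.left):       # restore the leftist property
        t1.left, t1.right = t1.right, t1.left
    t1.npl = npl(t1.right) + 1
    return t1
```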
Merging Two Leftist Heaps
• merge(T1,T2) returns one leftist heap containing all elements of the two (distinct) leftist heaps T1 and T2

[Figure: when a < b, the result is rooted at a, with left subtree L1 and right subtree merge(R1, T2)]
Merge Continued
R’ = Merge(R1, T2)
[Figure: if npl(R’) > npl(L1), swap the children so R’ becomes the left subtree and L1 the right]
runtime: O(log n) — each recursive call steps down a right path, and right paths have O(log n) nodes
Operations on Leftist Heaps
• merge with two trees of total size n: O(log n)
• insert with heap size n: O(log n)
• pretend node is a size 1 leftist heap
• insert by merging original heap with one node heap
[Figure: insert shown as a merge of the original heap with a one-node heap]
Leftist Merge Example
[Figure: step-by-step merge of two leftist heaps — each recursive call merges a right subtree with the other heap, until the special case of a single node ends the recursion]
Sewing Up the Example
? ? 1
3 3 3
0 0 0
7 ? 7 1 7 5 1
5 5
0 0 0 0 0
14 0 0 0
10 0 14 14 10 8
8 10 8
0
0 0 12
12 12
Done?
84
Finally…
[Figure: the finished leftist heap — one last swap of the root’s children restores the leftist property]
Random Definition:
Amortized Time
am·or·tized time:
Running time limit resulting from “writing off” expensive
runs of an algorithm over multiple cheap runs of the
algorithm, usually resulting in a lower overall running time
than indicated by the worst possible case.
If M operations take total O(M log N) time,
amortized time per operation is O(log N)
Skew Heaps
Problems with leftist heaps
• extra storage for npl
• extra complexity/logic to maintain and check npl
• right side is “often” heavy and requires a switch
Solution: skew heaps
• “blindly” adjusting version of leftist heaps
• merge always switches children when fixing right path
• amortized time for: merge, insert, deleteMin = O(log n)
• however, worst case time for all three = O(n)
Merging Two Skew Heaps
[Figure: as in the leftist merge, the smaller root a wins and merge(R1, T2) is computed — but the children of a are always swapped, with no npl test]
Skew Heap Code
merge(heap1, heap2) {
  case {
    heap1 == NULL: return heap2;
    heap2 == NULL: return heap1;
    heap1.findMin() < heap2.findMin():
      temp = heap1.right;
      heap1.right = heap1.left;          // always swap the children
      heap1.left = merge(heap2, temp);   // merge the other heap into the old right side
      return heap1;
    otherwise:
      return merge(heap2, heap1);        // ensure the smaller root comes first
  }
}
Other Priority Queues
• Leftist Heaps
  • O(log N) time for insert, deleteMin, merge
  • The idea is to keep the left part of the heap long and the right part short, and to perform most operations on the short right part.
• Skew Heaps (“splaying leftist heaps”)
  • O(log N) amortized time for insert, deleteMin, merge
Yet Another Data Structure: Binomial Queues
• Structural property
  • Forest of binomial trees with at most one tree of any height (what’s a forest? what’s a binomial tree?)
• Order property
  • Each binomial tree has the heap-order property
The Binomial Tree, Bh
• Bh has height h and exactly 2^h nodes
• Bh is formed by making Bh-1 a child of the root of another Bh-1
• Root has exactly h children
• Number of nodes at depth d is the binomial coefficient C(h, d)
  • Hence the name; we will not use this last property

[Figure: B0, B1, B2, B3]
Binomial Queue with n elements
A binomial queue with n elements has a unique structural representation in terms of binomial trees!
Write n in binary: e.g. n = 13 = 1101₂ means 1 B3, 1 B2, no B1, 1 B0.
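The structural representation is exactly the binary representation of n; a small sketch that lists which B_h trees are present:

```python
def binomial_forest(n):
    """Heights h of the binomial trees B_h in a queue of n elements = the set bits of n."""
    return [h for h in range(n.bit_length()) if (n >> h) & 1]
```

For example, binomial_forest(13) returns [0, 2, 3]: one B0, one B2, one B3, and no B1 — matching 13 = 1101₂.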
Worst Case Run Times

             B4         B3        B2        B1        B0
depth        4          3         2         1         0
# elements   2^4 = 16   2^3 = 8   2^2 = 4   2^1 = 2   2^0 = 1
Merging two binomial queues works exactly like binary addition of their sizes: two trees of the same height combine into one tree of the next height (the larger root becomes a child of the smaller), producing a “carry”.

Example 1. [Figure: BQ.1 — sizes shown in binary, e.g. N = 3₁₀ = 11₂ is one B1 plus one B0, with roots 1 and 3]
Example 2. [Figure: merging BQ.1 with BQ.2 (N = 2₁₀ = 10₂, a B1 with root 4 and child 6) — the two B1 trees combine into a single B2: “this is an add with a carry out”, giving BQ.3 with N = 4₁₀ = 100₂]
Example 3. [Figure: when both queues and the carry contain a tree of the same height, add the existing trees and the carry, just as in binary addition — the result here has N = 6₁₀ = 110₂ elements]
Exercise
[Figure: merge a binomial queue of N = 3₁₀ = 11₂ elements with one of N = 7₁₀ = 111₂ elements; the keys shown are 1, 2, 4, 7, 8, 9, 10, 12, 13, 15, and the resulting forest of 10 elements has roots 2 and 1]
4/25/03 Binomial Queues - Lecture 12 104
O(log N) time to Merge
• For N keys there are at most ⌊log₂ N⌋ + 1 trees in a binomial forest.
• Each merge operation only looks at the root of each tree.
• Total time to merge is O(log N).
DeleteMin:
[Figure: scan the tree roots of the merged queue to find the minimum (1); remove that root and return it; its subtrees (rooted at 10, 7, and 4) form a new forest, which is merged back with the remaining trees (rooted at 5 and 2) to give the final queue]