ECI 2023
Conrado Martínez
Univ. Politècnica de Catalunya, Spain
1 Analysis of Algorithms:
Review of basic concepts. Probabilistic tools. The
continuous master theorem. Amortized analysis.
2 Probabilistic & Randomized Dictionaries:
Review of basic techniques (search trees, hash tables).
Randomized binary search trees. Skip lists. Cuckoo
hashing. Bloom filters.
3 Priority Queues:
Review of basic techniques. Binomial queues. Fibonacci
heaps. Applications: Dijkstra’s algorithm for shortest paths.
2/405
Outline of the course
4 Disjoint Sets:
Union by weight and by rank. Path compression heuristics.
Applications: Kruskal’s algorithm for minimum spanning
trees.
5 Data Structures for String Processing:
Tries. Patricia. Ternary search trees.
6 Multidimensional Data Structures:
Associative queries. K -dimensional search trees.
Quadtrees.
3/405
Part I
Analysis of Algorithms
3 Amortized Analysis
4/405
Linearity of Expectations
5/405
Indicator variables
6/405
Union Bound
7/405
Markov’s Inequality
Theorem
Let X be a positive random variable. For any a > 0,
P[X > a] ≤ E[X]/a.
8/405
Markov’s Inequality
Proof.
Let A be the event X > a and let I_A denote the indicator
variable of that event. Then X ≥ a·I_A, and taking expectations,
E[X] ≥ a·E[I_A] = a·P[X > a].
9/405
Markov’s Inequality
Example
Suppose we throw a fair coin n times. Let H_n
denote the number of heads in the n throws. We have
E[H_n] = n/2. Using Markov's inequality,
P[H_n > 3n/4] ≤ (n/2)/(3n/4) = 2/3.
In general,
P[X > c·E[X]] ≤ E[X]/(c·E[X]) = 1/c.
10/405
Chebyshev’s Inequality
Theorem
Let X be a random variable. For any a > 0,
P[|X − E[X]| ≥ a] ≤ V[X]/a².
Corollary
P[|X − E[X]| ≥ c·σ_X] ≤ 1/c²,
with σ_X = √V[X], the standard deviation of X.
11/405
Chebyshev’s Inequality
Proof.
We have
P[|X − E[X]| ≥ a] = P[(X − E[X])² ≥ a²]
                  ≤ E[(X − E[X])²]/a² = V[X]/a²,
by Markov's inequality applied to (X − E[X])².
12/405
Chebyshev’s Inequality
Example
Again H_n = the number of heads in n throws of a fair coin.
Since H_n ~ Binomial(n, 1/2), E[H_n] = n/2 and
V[H_n] = n/4. Using Chebyshev's inequality,
P[H_n > 3n/4] ≤ P[|H_n − n/2| ≥ n/4] ≤ V[H_n]/(n/4)² = 4/n.
13/405
Chebyshev’s Inequality
Example
The expected number of comparisons E[q_n] in standard
quicksort is 2n ln n + o(n log n). It can be shown that
V[q_n] = (7 − 2π²/3)·n² + o(n²). Hence, by Chebyshev's inequality,
the probability that q_n deviates from its expectation by more
than 2c·n ln n is at most
(7 − 2π²/3)/(2c ln n)² + o(1/log n).
14/405
Jensen’s Inequality
Theorem
If f is a convex function then
E[f(X)] ≥ f(E[X]).
Example
For any random variable X, E[X²] ≥ (E[X])², since
f(x) = x² is convex.
15/405
Chernoff Bounds
Theorem
Let {X_i}, 1 ≤ i ≤ n, be independent Bernoulli trials, with
P[X_i = 1] = p_i. Then, if X = Σ_{i=1}^{n} X_i and μ = E[X], we
have
1  P[X ≤ (1 − δ)μ] ≤ ( e^{−δ} / (1 − δ)^{(1−δ)} )^μ, for δ ∈ (0, 1).
2  P[X ≥ (1 + δ)μ] ≤ ( e^{δ} / (1 + δ)^{(1+δ)} )^μ, for any δ > 0.
16/405
Chernoff Bounds
Corollary (Corollary 1)
Let {X_i}, 1 ≤ i ≤ n, be independent Bernoulli trials, with
P[X_i = 1] = p_i. Then if X = Σ_{i=1}^{n} X_i and μ = E[X], we
have
1  P[X ≤ (1 − δ)μ] ≤ e^{−δ²μ/2}, for δ ∈ (0, 1).
2  P[X ≥ (1 + δ)μ] ≤ e^{−δ²μ/3}, for δ ∈ (0, 1].
Corollary (Corollary 2)
Let {X_i}, 1 ≤ i ≤ n, be independent Bernoulli trials, with
P[X_i = 1] = p_i. Then if X = Σ_{i=1}^{n} X_i, μ = E[X] and
δ ∈ (0, 1), we have
P[|X − μ| ≥ δμ] ≤ 2e^{−δ²μ/3}.
17/405
Chernoff Bounds
Back to an old example: we flip a fair coin n times and we wish an
upper bound on the probability of having at least 3n/4 heads.
Recall: H_n ~ Binomial(n, 1/2); then
μ = E[H_n] = n/2, V[H_n] = n/4.
We want to bound P[H_n ≥ 3n/4].
Markov: P[H_n ≥ 3n/4] ≤ (n/2)/(3n/4) = 2/3.
Chebyshev: P[H_n ≥ 3n/4] ≤ P[|H_n − n/2| ≥ n/4] ≤ V[H_n]/(n/4)² = 4/n.
Chernoff: using Cor. 1.2,
P[H_n ≥ 3n/4] = P[H_n ≥ (1 + δ)·n/2] ⟹ (1 + δ) = 3/2 ⟹ δ = 1/2
⟹ P[H_n ≥ 3n/4] ≤ e^{−δ²μ/3} = e^{−n/24}.
Example
If n = 100, Chebyshev gives 0.04, Chernoff gives 0.0155.
If n = 10⁶, Chebyshev gives 4·10⁻⁶, Chernoff gives 2.492·10⁻¹⁸⁰⁹⁵.
18/405
Part I
Analysis of Algorithms
3 Amortized Analysis
19/405
The Continuous Master Theorem
20/405
The Continuous Master Theorem
Definition
Given the sequence of weights ω_{n,j}, ω(z) is a shape
function for that set of weights if
  ∫₀¹ ω(z) dz ≥ 1,
  Σ_{0≤j<n} | ω_{n,j} − ∫_{j/n}^{(j+1)/n} ω(z) dz | = O(n^{−d}) for some d > 0,
  ω(z) = lim_{n→∞} n·ω_{n,⌊zn⌋}.
21/405
The Continuous Master Theorem
22/405
The Continuous Master Theorem
23/405
The Continuous Master Theorem
24/405
Example #1: QuickSort
25/405
QuickSort
26/405
QuickSort
27/405
QuickSort
There are many ways to do the partition; not all of them are
equally good. Some issues, like repeated elements, have to be
dealt with carefully. Bentley & McIlroy (1993) discuss a very
efficient partition procedure, which works seamlessly even in
the presence of repeated elements. Here, we will examine a
basic algorithm, which is reasonably efficient.
We will keep two indices i and j such that A[ℓ+1..i−1]
contains elements less than or equal to the pivot p, and
A[j+1..u] contains elements greater than or equal to the pivot
p. The two indices scan the subarray locations, i from left to
right, j from right to left, until A[i] > p and A[j] < p, or until they
cross (i = j + 1).
28/405
QuickSort
procedure PARTITION(A, ℓ, u, k)
Require: ℓ ≤ u
Ensure: A[ℓ..k−1] ≤ A[k] ≤ A[k+1..u]
    i := ℓ + 1; j := u; p := A[ℓ]
    while i < j + 1 do
        while i < j + 1 ∧ A[i] ≤ p do
            i := i + 1
        while i < j + 1 ∧ A[j] ≥ p do
            j := j − 1
        if i < j + 1 then
            A[i] :=: A[j]
            i := i + 1; j := j − 1
    A[ℓ] :=: A[j]; k := j
29/405
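A direct C++ transcription of the procedure above may help; this is a
sketch under the stated preconditions (the pivot is A[l], l ≤ u), not the
Bentley & McIlroy partition.

#include <utility>
#include <vector>

// Partition A[l..u] around the pivot p = A[l]; returns the final
// position k of the pivot, with A[l..k-1] <= A[k] <= A[k+1..u].
int partition(std::vector<int>& A, int l, int u) {
    int i = l + 1, j = u;
    const int p = A[l];
    while (i < j + 1) {
        while (i < j + 1 && A[i] <= p) ++i;   // extend the "<= p" zone
        while (i < j + 1 && A[j] >= p) --j;   // extend the ">= p" zone
        if (i < j + 1) {                      // A[i] > p and A[j] < p: swap
            std::swap(A[i], A[j]);
            ++i; --j;
        }
    }
    std::swap(A[l], A[j]);                    // put the pivot in its place
    return j;
}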
The Cost of QuickSort
The worst-case cost of QUICKSORT is Θ(n²), hence not very
attractive. But it only occurs if in all (or most) recursive calls one of
the subarrays contains very few elements and the other
contains almost all. That would happen if we systematically
choose the first element of the current subarray as the pivot
and the array is already sorted!
The cost of the partition is Θ(n) and we would have then
C(n) = C(n − 1) + Θ(n), that is, C(n) = Θ(n²).
30/405
The Cost of QuickSort
31/405
The Cost of QuickSort
32/405
The Cost of QuickSort
33/405
Solving QuickSort’s Recurrence
We apply CMT to quicksort’s recurrence with the set of weights
!n;j = 2=n and toll function tn = n 1. As we have already
seen, we can take ! (z ) = 2, and the CMT applies with a = 1
and b = 0. All necessary conditions to apply CMT are met.
Then we compute
Z 1
z =1
H=1 2z dz = 1 z 2 z=0 = 0;
0
hence we will have to apply CMT’s second case and compute
Z 1 z =1
z2 2 1
H0 = 2z ln z dz =
2
z ln z = :
0 z =0 2
Finally,
n ln n
qn = + o(n log n) = 2n ln n + o(n log n)
1=2
= 1:386 : : : n log2 n + o(n log n): 34/405
Example #2: QuickSelect
35/405
QuickSelect
36/405
QuickSelect
37/405
QuickSelect
Example
We are looking for the fourth smallest element (j = 4) out of n = 15
elements:
9 5 10 12 3 1 11 15 7 2 8 13 6 4 14
38/405
QuickSelect
Example
After the first partition:
7 5 4 6 3 1 8 2 9 15 11 13 12 10 14
the pivot ends at position k = 9 > j
38/405
QuickSelect
Example
After the second partition:
1 5 4 2 3 6 8 7 9 15 11 13 12 10 14
the pivot ends at position k = 6 > j
38/405
QuickSelect
Example
After the third partition:
2 3 1 4 5 6 8 7 9 15 11 13 12 10 14
the pivot ends at position k = 4 = j ⟹ DONE!
38/405
QuickSelect
39/405
QuickSelect
40/405
QuickSelect
C_n = n + O(1)
    + (1/n)·Σ_{1≤k≤n} E[remaining number of comp. | pivot is the k-th element],
as the pivot will be the k-th smallest element with probability
1/n for all k, 1 ≤ k ≤ n.
41/405
QuickSelect
The probability that j = k is 1/n; then no more comparisons
are needed, since we would be done. The probability that j < k is
(k − 1)/n; then we will have to make C_{k−1} comparisons.
Similarly, with probability (n − k)/n we have j > k and we will
then make C_{n−k} comparisons. Thus
C_n = n + O(1) + (1/n)·Σ_{1≤k≤n} [ ((k−1)/n)·C_{k−1} + ((n−k)/n)·C_{n−k} ]
    = n + O(1) + (2/n)·Σ_{0≤k<n} (k/n)·C_k.
Applying the CMT with the shape function
ω(z) = lim_{n→∞} n·(2⌊zn⌋/n²) = 2z,
we obtain H = 1 − ∫₀¹ 2z² dz = 1/3 > 0 and C_n = 3n + o(n).
42/405
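The analysis above assumes the pivot is equally likely to land at each
rank; a hedged C++ sketch of the corresponding algorithm follows, reusing
the partition sketched earlier and choosing the pivot at random (j is the
absolute 0-based rank sought).

#include <cstdlib>
#include <utility>
#include <vector>

// Returns the value of the j-th smallest element of A[l..u].
int quickselect(std::vector<int>& A, int l, int u, int j) {
    while (l < u) {
        std::swap(A[l], A[l + std::rand() % (u - l + 1)]); // random pivot
        int k = partition(A, l, u);   // the partition sketched before
        if (j == k) return A[k];
        if (j < k) u = k - 1;         // continue in the left subarray
        else       l = k + 1;         // continue in the right subarray
    }
    return A[l];
}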
Part I
Analysis of Algorithms
3 Amortized Analysis
43/405
Amortized Analysis
44/405
Amortized Analysis
45/405
A first example: Binary counter
46/405
A first example: Binary counter
Proof
Any increment flips O(k) bits.
47/405
Aggregate method
48/405
Aggregate method
In the binary counter problem, we observe that bit 0 flips n
times, bit 1 flips ⌊n/2⌋ times, bit 2 flips ⌊n/4⌋ times, . . .
Theorem
Starting from 0, a sequence of n increments makes
Θ(n) bit flips.
Proof
Each increment flips at least 1 bit, thus we make at
least n flips. But the total cost is O(n). Indeed,
Σ_{j=0}^{k−1} ⌊n/2^j⌋ ≤ n·Σ_{j=0}^{k−1} 1/2^j < n·Σ_{j=0}^{∞} 1/2^j = n·1/(1 − 1/2) = 2n.
49/405
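A small experiment confirming the aggregate bound; a minimal sketch,
assuming the counter is stored least-significant-bit first:

#include <iostream>
#include <vector>

// Increment a k-bit binary counter; returns the number of bit flips.
int increment(std::vector<int>& C) {
    int flips = 0, i = 0;
    while (i < (int)C.size() && C[i] == 1) { // flip the trailing 1s to 0
        C[i++] = 0; ++flips;
    }
    if (i < (int)C.size()) { C[i] = 1; ++flips; } // rightmost 0 becomes 1
    return flips;
}

int main() {
    std::vector<int> C(20, 0);
    long total = 0;
    const int n = 1000;
    for (int t = 0; t < n; ++t) total += increment(C);
    std::cout << total << " flips for " << n
              << " increments (< 2n = " << 2 * n << ")\n"; // matches Theta(n)
}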
Accounting method (banker’s viewpoint)
We associate “charges” to different operations, these charges
may be smaller or larger than the actual cost.
When the charge or amortized cost c^i of an operation is
larger than the actual cost ci then the difference is seen as
credits that we store in the data structure to pay for future
operations.
When the amortized cost c^i is smaller than ci the
difference must be covered from the credits stored in the
data structure.
The initial data structure D0 has 0 credits.
Invariant: For all ℓ,
Σ_{i=1}^{ℓ} (ĉ_i − c_i) ≥ 0,
that is, at all moments, there must be a nonnegative number of
credits in the data structure.
50/405
Accounting method (banker’s viewpoint)
Theorem
The total cost of processing a sequence of n operations
starting with D0 is bounded by the sum of amortized
costs.
Proof
Invariant ⟹ Σ_{i=1}^{n} ĉ_i ≥ Σ_{i=1}^{n} c_i = total cost.
51/405
Accounting method (banker’s viewpoint)
In the binary counter problem, we charge 2 credits every time
we flip a bit from 0 to 1, and we pay 1 credit every time we flip a bit
from 1 to 0.
We consider that each time we flip a bit from 0 to 1 we store
one credit in the data structure and use the other credit to pay for
the flip. When the bit is flipped from 1 to 0, we use the stored credit
for that. Thus 1-bits all store 1 credit each, whereas 0-bits do
not store credits.
Theorem
Starting from 0, a sequence of n increments makes
(n) bit flips.
Proof
Every increment flips at least one bit, thus Σ_i c_i ≥ n. Every increment flips a 0-bit to
a 1-bit exactly once (the rightmost 0 in the counter before the increment is the only 0-bit flipped).
Hence ĉ_i = 2, because all the other flips during the i-th increment are from 1-bits to 0-bits,
and their amortized cost is 0. Thus
Σ_i ĉ_i = 2n ≥ Σ_i c_i.
As the number of credits per bit is ≥ 0, the number of credits stored at the counter is
≥ 0 at all times, that is, the invariant is preserved.
53/405
Potential method (physicist’s viewpoint)
54/405
Potential method (physicist’s viewpoint)
Theorem
The total cost of processing a sequence of n operations
starting with D0 is bounded by the sum of amortized
costs.
Proof
Σ_{i=1}^{n} ĉ_i = Σ_{i=1}^{n} c_i + Σ_{i=1}^{n} ΔΦ_i = Σ_{i=1}^{n} c_i + Σ_{i=1}^{n} (Φ(D_i) − Φ(D_{i−1}))
              = Σ_{i=1}^{n} c_i + Φ(D_n) − Φ(D_0) ≥ Σ_{i=1}^{n} c_i,
since Φ(D_n) ≥ 0 and Φ(D_0) = 0.
55/405
Potential method (physicist’s viewpoint)
For the binary counter problem, we take
Φ(D) = number of 1-bits in the binary counter D. Notice that
Φ(D_0) = 0 and Φ(D) ≥ 0 for all D.
The actual cost c_i of the i-th increment is 1 + p, where p,
0 ≤ p ≤ k, is the position of the rightmost 0-bit (equivalently,
the number of 1's to its right). We flip those p
1's, then the rightmost 0-bit (except when the counter is all 1's
and we reset it; then the cost is p).
The change in potential is ΔΦ_i ≤ 1 − p, because we add one
1-bit (flipping the rightmost 0-bit to a 1-bit, except if p = k)
and we flip p 1-bits to 0-bits, those to the right of the
rightmost 0-bit. Hence
ĉ_i = c_i + ΔΦ_i ≤ 1 + p + (1 − p) = 2,
⟹ 2n ≥ Σ_i ĉ_i ≥ Σ_i c_i.
56/405
Stacks with multi-pop
Example
Suppose we have a stack that supports:
PUSH(x)
POP(): pops the top of the stack and returns it;
the stack must be non-empty
MPOP(k): pops k items; the stack must contain at
least k items
The cost of PUSH and POP is O(1) and the cost of
MPOP(k) is Θ(k) = O(n) (n = size of the stack), but
saying that the worst-case cost of a sequence of N
stack operations is O(N²) is too pessimistic!
57/405
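For concreteness, a minimal C++ sketch of such a stack; using std::vector
as the backing store is an assumption of this sketch.

#include <vector>

// Stack with MULTIPOP; each operation's actual cost is the number of
// elements moved, yet any sequence of N operations costs O(N) overall.
class Stack {
    std::vector<int> s;
public:
    void push(int x) { s.push_back(x); }                      // O(1)
    int pop() { int x = s.back(); s.pop_back(); return x; }   // O(1)
    void multipop(int k) {                                    // Theta(k)
        while (k-- > 0 && !s.empty()) s.pop_back();
    }
};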
Stacks with multi-pop
Example
Accounting: Assign 2 credits to each P USH. One is
used to do the operation and the other credit to pop
(with pop or multi-pop) the element at a later time. The
total number of credits in the stack = size of the stack.
ĉ_PUSH = 2.
ĉ_POP = ĉ_MPOP = 0.
⟹ 2N = Σ_{i=1}^{N} ĉ_i ≥ Σ_{i=1}^{N} c_i.
58/405
Stacks with multi-pop
Example
Potential: Φ(S) = size(S). Then Φ(S_0) = 0 and Φ(S) ≥ 0
for all stacks S.
ĉ_PUSH = 1 + ΔΦ_i = 2.
ĉ_POP = 1 + ΔΦ_i = 1 + (−1) = 0.
ĉ_MPOP = k + ΔΦ_i = k + (−k) = 0, since |S_{i−1}| ≥ k.
⟹ 2N = Σ_{i=1}^{N} ĉ_i ≥ Σ_{i=1}^{N} c_i.
59/405
Dynamic arrays
Example
We often use dynamic arrays (a.k.a. vectors in C++),
the array dynamically grows as we add items (using
v.push_back(x), say).
60/405
Dynamic arrays
Example
When a new element has to be added and n = size =
capacity, a new array with double capacity is allocated
from dynamic memory, the contents of the old array are
copied into the new one, and the old array is freed back to
dynamic memory, with total cost Θ(n). The program then
sets the array name (a pointer) to point to the new array
instead of pointing to the old one.
61/405
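The doubling strategy just described can be sketched in a few lines of
C++; this is an illustration, not how std::vector is actually implemented.

#include <algorithm>
#include <cstddef>

// Bare-bones dynamic array illustrating the doubling strategy.
class DynArray {
    int* a = nullptr;
    std::size_t n = 0, cap = 0;               // size and capacity
public:
    void push_back(int x) {
        if (n == cap) {                       // full: resize
            std::size_t ncap = (cap == 0) ? 1 : 2 * cap;
            int* b = new int[ncap];
            std::copy(a, a + n, b);           // Theta(n) copy
            delete[] a;                       // free the old array
            a = b; cap = ncap;
        }
        a[n++] = x;                           // O(1) append
    }
    ~DynArray() { delete[] a; }
};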
Dynamic arrays
Example
Cost to fill a dynamic array using n push_back’s?
Aggregate:
The cost c_i of the i-th push_back is Θ(1) except if
i = 2^k + 1 for some k, 0 ≤ k ≤ log₂(n − 1).
When i = 2^k + 1, it triggers a resizing with cost Θ(i).
Total cost = Σ_{i=1}^{n} c_i = Σ_{i: i ≠ 2^k+1} Θ(1) + Σ_{i: i = 2^k+1} Θ(i)
= n − Θ(log n) + Σ_{k=0}^{⌊log₂(n−1)⌋} Θ(2^k + 1)
≤ n − Θ(log n) + Θ(log n) + Θ(2^{⌊log₂(n−1)⌋+1} − 1) = Θ(n).
62/405
Dynamic arrays
Example
Cost to fill a dynamic array using n push_back’s?
Accounting:
Charge 3 credits to the assignment v[j] := x in
which we add x to the first unused array slot j;
every push_back does it, and sometimes a resizing is
also needed. Use 1 credit for the assignment, and
store the remaining 2 credits in slot j.
When resizing an array v of size n to an array v′
with capacity 2n, each slot j ∈ [n/2..n−1] of v stores 2
credits; use one credit for the copying v′[j] := v[j]
and use the other credit for the copying of
v[j − n/2] to v′[j − n/2].
63/405
Dynamic arrays
64/405
Dynamic arrays
Example
Cost to fill a dynamic array using n push_back’s?
Accounting: The total number of credits stored in the dynamic
array v is 2·size(v) − cap(v), therefore always ≥ 0.
⟹ Σ_{i=1}^{n} ĉ_i = 3n ≥ Σ_{i=1}^{n} c_i.
65/405
Dynamic arrays
Example
Instead of 3 credits for the assignment v[j] := x we
might charge some other constant quantity c ≥ 3, so
that we use 1 credit for the assignment v[j] := x proper,
and we store c − 1 credits at every used slot j in the
upper half of v; these c − 1 credits will be used to pay
for the copying of v[j] and of v[j − n/2], but also for the
creation of an unused slot v′[j + n] in the new array and
the destruction of v[j − n/2] and v[j] in the old array,
if such construction/destruction costs need to be taken
into account.
66/405
Dynamic arrays
Example
Cost to fill a dynamic array using n push_back’s?
Potential:
When there is no resizing: c_i = 1.
When there is a resizing: c_i = 1 + κ·cap(v_i), for
some constant κ, where v_i is the dynamic array after the
i-th push_back.
Φ(v) = 2κ·(2·size(v) − cap(v) + 1).
N.B. We will take κ = 1/2 to simplify the calculations.
67/405
Dynamic arrays
Example
Cost to fill a dynamic array using n push_back’s?
Potential:
When there is no resizing: cap(v_i) = cap(v_{i−1}), size(v_i) = size(v_{i−1}) + 1, and
ĉ_i = c_i + Φ(v_i) − Φ(v_{i−1}) = 1 + 4κ = 3.
68/405
Dynamic arrays
Example
Cost to fill a dynamic array using n push_back’s?
Potential:
When there is a resizing:
cap(v_i) = 2·cap(v_{i−1}) = 2·size(v_{i−1}), and
ĉ_i = c_i + Φ(v_i) − Φ(v_{i−1}) = (1 + 2κ·cap(v_{i−1})) + (4κ − 2κ·cap(v_{i−1})) = 1 + 4κ = 3.
69/405
5 Skip Lists
6 Hash Tables
Separate Chaining
Open Addressing
Cuckoo Hashing
7 Bloom Filters
70/405
Random BSTs
71/405
Randomized binary search trees
C. Aragon R. Seidel
Two incarnations
Randomized treaps (tree+heap) invented by Aragon and
Seidel (FOCS 1989, Algorithmica 1996) use random
priorities and bottom-up balancing
Randomized binary search trees (RBSTs) invented by
Martínez and Roura (ESA 1996, JACM 1998) use subtree
sizes and top-down balancing
72/405
Randomized binary search trees
72/405
Insertion in a RBST
Inserting an item x = 48
(figures: the new item 48 is inserted into a RBST of 6 keys
rooted at 42; at some node on the insertion path it is chosen
to become the root of the subtree where it falls, which is
split around 48; the final tree has 7 keys, rooted at 42,
with 48 as the root of its right subtree)
73/405
Insertion in a RBST
procedure INSERT(T, k, v)
    n := T→size                  ▷ n = 0 if T = ☐
    if UNIFORM(0, n) = 0 then
        ▷ this will always succeed if T = ☐
        return INSERT-AT-ROOT(T, k, v)
    if k < T→key then
        T→left := INSERT(T→left, k, v)
    else
        T→right := INSERT(T→right, k, v)
    Update T→size
    return T
74/405
Insertion in a RBST
⟨T⁻, T⁺⟩ = SPLIT(T, x)
T⁻ = BST for {y ∈ T | y < x}
T⁺ = BST for {y ∈ T | x < y}
SPLIT is like the partition in quicksort
Insertion at root was invented by Stephenson in 1976
75/405
Splitting a RBST
To split a RBST T around x, we need just to follow the path
from the root of T to the leaf where x falls
76/405
Splitting a RBST
To split a RBST T around x, we need just to follow the path
from the root of T to the leaf where x falls
(figure: if x < z, where z is the root of T with subtrees L and R,
then we recurse on L: T⁻ = L⁻ and T⁺ = ⟨L⁺, z, R⟩)
76/405
Splitting a RBST & Insertion at Root
77/405
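A possible C++ rendering of SPLIT, following the top-down walk of the
previous slides; the Node struct and the omission of subtree-size updates
are simplifications of this sketch.

struct Node { int key, size; Node *left, *right; };

// Split the BST t around key x (assumed not in t): after the call,
// tl holds the keys < x and tr the keys > x. The walk visits exactly
// the nodes on the path from the root to the leaf where x falls.
void split(Node* t, int x, Node*& tl, Node*& tr) {
    if (t == nullptr) { tl = tr = nullptr; return; }
    if (x < t->key) {
        tr = t;                              // t and its right subtree go to T+
        split(t->left, x, tl, tr->left);
        // subtree sizes should be updated here in a full implementation
    } else {
        tl = t;                              // t and its left subtree go to T-
        split(t->right, x, tl->right, tr);
    }
}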
Splitting a RBST
Lemma
Let T⁻ and T⁺ be the BSTs produced by SPLIT(T, x).
If T is a random BST containing the set of keys K,
then T⁻ and T⁺ are independent random BSTs
containing the sets of keys K⁻ = {y ∈ K | y < x}
and K⁺ = {y ∈ K | y > x}, respectively.
78/405
Insertion in RBSTs
Theorem
If T is a random BST that contains the set of keys K
and x is any key not in K , then I NSERT(T; x) produces a
random BST containing the set of keys K [ fxg.
79/405
The Cost of Insertions
E[In ] = 2 ln n + O(1)
80/405
The Cost of Insertions
E[I_n] = 1 + (1/n)·Σ_{1≤j≤n} [ (j/(n+1))·E[I_{j−1}] + ((n−j+1)/(n+1))·E[I_{n−j}] ]
To solve this recurrence the Continuous Master Theorem
(Roura, 2001) comes in handy
We need to produce O(log n) random numbers on average
to insert an item
81/405
RBST resulting from the insertion of 500 keys in ascending
order
Source: R. Sedgewick, Algorithms in C (3rd edition), 1997
82/405
Deletions in RBSTs
83/405
Deletions in RBSTs
procedure DELETE(T, k)
    if T = ☐ then
        return T
    if k = T→key then
        return DELETE-ROOT(T)
    if k < T→key then
        T→left := DELETE(T→left, k)
    else
        T→right := DELETE(T→right, k)
    Update T→size
    return T
85/405
Deletions in RBSTs
JOIN(☐, ☐) = ☐
JOIN(T, ☐) = JOIN(☐, T) = T
JOIN(T₁, T₂) = ?, if T₁ ≠ ☐ and T₂ ≠ ☐
86/405
Joining two BSTs
(figure: T₁ has root x with subtrees L₁ and R₁; T₂ has root y with
subtrees L₂ and R₂; all keys in T₁ are smaller than all keys in T₂)
87/405
Joining two BSTs
(figure: if x is chosen as the root of the join, the result keeps L₁
as left subtree and recursively computes JOIN(R₁, T₂) as the right one)
87/405
Joining two BSTs
88/405
Joining two RBSTs
Lemma
Let L and R be two independent random BSTs, such
that the keys in L are strictly smaller than the keys in
R. Let K_L and K_R denote the sets of keys in L and R,
respectively. Then T = JOIN(L, R) is a random BST that
contains the set of keys K = K_L ∪ K_R.
89/405
Joining two RBSTs
The cost of the joining phase is the sum of the path lengths
to the leaves minus twice the depth of the i-th item; the
expected cost follows from well-known results:
2 − 1/i − 1/(n + 1 − i) = O(1)
90/405
Deletions in RBSTs
Theorem
If T is a random BST that contains the set of keys K ,
then D ELETE(T; x) produces a random BST containing
the set of keys K n fxg.
Corollary
The result of any arbitrary sequence of insertions and
deletions, starting from an initially empty tree is always a
random BST.
91/405
Additional remarks
92/405
Additional remarks
93/405
To learn more
94/405
To learn more (2)
[3] J. L. Eppinger.
An empirical study of insertion and deletion in binary
search trees.
Comm. of the ACM, 26(9):663–669, 1983.
[4] W. Panny.
Deletions in random binary search trees: A story of errors.
J. Statistical Planning and Inference, 140(8):2335–2345,
2010.
[5] H. M. Mahmoud.
Evolution of Random Search Trees.
Wiley Interscience, 1992.
95/405
Part II
5 Skip Lists
6 Hash Tables
Separate Chaining
Open Addressing
Cuckoo Hashing
7 Bloom Filters
96/405
Skip lists
W. Pugh
97/405
Skip lists
98/405
Skip lists
−OO 12 21 37 40 42 53 66 + OO
Header NIL
99/405
Skip lists
Pr{height(x) = k} = p·q^{k−1},  q = 1 − p
The height of the skip list S is the number of non-empty
lists,
height(S) = max_{x∈S} {height(x)}
100/405
Searching in a skip list
−OO 12 21 37 40 42 53 66 + OO
Header NIL
101/405
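The staircase walk of the figure, as a hedged C++ sketch; the node layout
(a vector of forward pointers per node) and integer keys are assumptions
of this sketch.

#include <vector>

struct SLNode { int key; std::vector<SLNode*> next; }; // next[0..height-1]

// Search for k starting from the header h, at the topmost level:
// advance while the next key is still < k, then drop one level.
SLNode* search(SLNode* h, int level, int k) {
    SLNode* p = h;
    for (int i = level - 1; i >= 0; --i)       // highest level first
        while (p->next[i] != nullptr && p->next[i]->key < k)
            p = p->next[i];                    // horizontal step
    p = p->next[0];                            // candidate at level 0
    return (p != nullptr && p->key == k) ? p : nullptr;
}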
Implementing skip lists
Inserting an item x = 48
−OO 12 21 37 40 42 53 66 + OO
Header NIL
103/405
Insertion in a skip list
Inserting an item x = 48
−OO 12 21 37 40 42 53 66 + OO
Header NIL
Geom(p)
48
103/405
Insertion in a skip list
Inserting an item x = 48
−OO 12 21 37 40 42 53 66 + OO
Header NIL
48
103/405
Insertion in a skip list
Inserting an item x = 48
−OO 12 21 37 40 42 48 53 66 + OO
Header NIL
103/405
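The height of the new node is drawn from the geometric distribution
Pr{height = k} = p·q^{k−1} of the earlier slide; a minimal sketch using
<random> (the cap max_level is an implementation choice of this sketch).

#include <random>

// Keep growing the tower with probability q = 1 - p.
int random_height(double p, int max_level, std::mt19937& rng) {
    std::bernoulli_distribution grow(1.0 - p);
    int h = 1;
    while (h < max_level && grow(rng)) ++h;
    return h;
}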
Implementing skip lists
104/405
Implementing skip lists
105/405
Implementing skip lists
procedure INSERT(k, v, S)
    ...
    while . . . do
        ▷ loop to locate whether k is present or not
        ▷ and to determine the predecessors at each level
    if p→next[1] = null ∨ k ≠ p→next[1]→key then
        ▷ k is not present
        ▷ Insert new item, see next slide
    else
        ▷ k is present, update its value
        p→next[1]→value := v
Implementing skip lists
107/405
Implementing skip lists
108/405
Other Operations
109/405
Performance of skip lists
110/405
Performance of skip lists
111/405
Performance of skip lists
112/405
Performance of skip lists
113/405
Performance of skip lists
Then
n·q^{L_n − 1} = 1/q  ⟹  L_n = log_q(1/n) = log_{1/q} n
114/405
Performance of skip lists
Then the steps remaining to reach H_n (= the height of a random
skip list of size n) can be analyzed this way:
we need no more horizontal steps than there are nodes of height
≥ L_n; their expected number is 1/q, by definition of L_n
the probability that H_n > k is
1 − (1 − q^k)^n ≤ n·q^k
the expected value of the height H_n can be bounded as
E[H_n] = Σ_{k≥0} P[H_n > k] = Σ_{0≤k<L_n} P[H_n > k] + Σ_{k≥L_n} P[H_n > k]
       ≤ L_n + Σ_{k≥0} P[H_n > L_n + k] ≤ L_n + n·q^{L_n}·Σ_{k≥0} q^k
       = L_n + 1/p
thus the expected number of additional vertical steps needed to reach
H_n from L_n is 1/p
115/405
Performance of skip lists
116/405
Analysis of the height
W. Szpankowski V. Rego
E[H_n] = log_{1/q} n + γ/ln(1/q) + 1/2 + δ(log_{1/q} n) + O(1/n)
where γ = 0.577. . . is Euler's constant and δ(t) is a
fluctuation of period 1, mean 0 and small amplitude.
117/405
Analysis of the forward cost
−OO 12 21 37 40 42 53 66 + OO
Header NIL
118/405
Analysis of the forward cost
119/405
Analysis of the forward cost
P. Kirschenhofer H. Prodinger
Source: Wikipedia
121/405
To learn more
[1] L. Devroye.
A limit theory for random skip lists.
The Annals of Applied Probability, 2(3):597–609, 1992.
[2] P. Kirschenhofer and H. Prodinger.
The path length of random skip lists.
Acta Informatica, 31(8):775–792, 1994.
[3] P. Kirschenhofer, C. Martínez and H. Prodinger.
Analysis of an Optimized Search Algorithm for Skip Lists.
Theoretical Computer Science, 144:199–220, 1995.
122/405
To learn more (2)
5 Skip Lists
6 Hash Tables
Separate Chaining
Open Addressing
Cuckoo Hashing
7 Bloom Filters
124/405
Hash Tables
125/405
Hash Tables
If the hash function evenly “spreads” the keys, the hash table
will be useful as there will be a small number of keys mapping
to any given address of the table.
Given two distinct keys x and y, we say that they are
synonyms, and also that they collide, if h(x) = h(y).
A fundamental problem in the implementation of a dictionary
using a hash table is to design a collision resolution strategy.
126/405
Hash Functions
A good hash function spreads the keys evenly: for every address i,
#{k ∈ K | h(k) = i} / #{k ∈ K} ≈ 1/M
127/405
Collision Resolution
128/405
Separate Chaining
struct node {
    Key _k;
    Value _v;
    ...
};
vector<list<node>> _Thash; // array of linked lists of synonyms
int _M;                    // capacity of the table
int _n;                    // number of elements
double _alpha_max;         // max. load factor
129/405
Separate Chaining
M = 13 X = { 0, 4, 6, 10, 12, 13, 17, 19, 23, 25, 30}
h(x) = x mod M
0 13 0
1
2
4 30 17 4
5
6 19 6
7
8
10 23 10
11
12 25 12
130/405
Separate Chaining
131/405
Separate Chaining
132/405
Separate Chaining
procedure INSERT(T, k, v)
    if n/M > α_max then
        RESIZE(T)
    i := HASH(k)
    p := __LOOKUP(T, i, k)
    if p = null then
        p := new NODE(k, v)
        p.next := T[i]
        T[i] := p
        n := n + 1
    else
        p.value := v
133/405
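The INSERT above relies on __LOOKUP; a minimal C++ counterpart of that
helper, assuming integer keys and values, might be:

#include <list>
#include <vector>

struct Node { int key; int value; };

// Scan the i-th list of synonyms for key k; return a pointer to its
// node, or nullptr if k is not in the table.
Node* lookup(std::vector<std::list<Node>>& table, int i, int k) {
    for (Node& nd : table[i])
        if (nd.key == k) return &nd;
    return nullptr;                  // k not present
}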
Separate Chaining
134/405
The Cost of Separate Chaining
135/405
Open Addressing
i₁ = i₀ + 1; i₂ = i₁ + 1; . . . ;
taking modulo M in all cases, with i₀ = h(x).
136/405
Linear Probing
M = 13 X = { 0, 4, 6, 10, 12, 13, 17, 19, 23, 25, 30}
h(x) = x mod M (increment 1)
0 0 0 0 occupied 0 0 occupied
1 1 13 occupied 1 13 occupied
2 2 free 2 25 occupied
3 3 free 3 free
4 4 4 4 occupied 4 4 occupied
5 5 17 occupied 5 17 occupied
6 6 6 6 occupied 6 6 occupied
7 7 19 occupied 7 19 occupied
8 8 free 8 30 occupied
9 9 free 9 free
10 10 10 10 occupied 10 10 occupied
11 11 23 occupied 11 23 occupied
12 12 12 12 occupied 12 12 occupied
137/405
(the three snapshots show the table after inserting {0, 4, 6, 10, 12},
then {13, 17, 19, 23}, then {25, 30})
Linear Probing
138/405
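A hedged C++ sketch of the linear probing search matching the previous
example (integer keys, FREE as the empty marker; deletions, which would
require tombstones, are ignored in this sketch).

#include <vector>

const int FREE = -1;

// Probe h(x), h(x)+1, ... (mod M) until x or an empty slot is found.
bool contains(const std::vector<int>& T, int x) {
    int M = (int)T.size();
    int i = x % M;                        // h(x) = x mod M, as above
    for (int probes = 0; probes < M; ++probes) {
        if (T[i] == FREE) return false;   // a hole: x cannot be further on
        if (T[i] == x) return true;
        i = (i + 1) % M;                  // next probe, wrapping around
    }
    return false;                         // table full and x not found
}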
Other Open Addressing Schemes
i₀ = h(x);
i_j = i_{j−1} ⊕ d(j, x), for some increment function d,
where x ⊕ y denotes x + y (mod M).
139/405
Other Open Addressing Schemes
142/405
The Cost of Open Addressing
143/405
The Cost of Open Addressing
S_{n,i} =ᴰ U_{i−1}
where =ᴰ denotes equality in distribution
144/405
The Cost of Open Addressing
U_α = 1·(1 − α) + 2·α(1 − α) + 3·α²(1 − α) + ···
    = Σ_{k>0} k·α^{k−1}·(1 − α) = (1 − α)·Σ_{k>0} d(α^k)/dα
    = (1 − α)·(d/dα) Σ_{k>0} α^k = (1 − α)·(d/dα)(α/(1 − α))
    = (1 − α)·1/(1 − α)² = 1/(1 − α).
145/405
The Cost of Open Addressing
146/405
The Cost of Open Addressing
147/405
The Cost of Open Addressing
148/405
The Cost of Open Addressing
150/405
Cuckoo Hashing
152/405
Cuckoo Hashing
153/405
Cuckoo Hashing
154/405
Cuckoo Hashing
procedure INSERT(T = ⟨T₁, T₂⟩, k, v)
    if k ∈ T then . . .                ▷ update v and return
    else
        if n = M − 1 then              ▷ M = |T₁| = |T₂|
            RESIZE(T)
            REHASH(T)                  ▷ can't insert M elements
        x := NODE(k, v); x.free = false
        for i := 1 to MAXITER(n, M) do
            ▷ for example, MAXITER(n, M) = n/2
            x :=: T₁[h₁(k)]
            if x.free then return
            x :=: T₂[h₂(k)]
            if x.free then return
        ▷ Insertion failed! pick new functions h₁ and h₂
        REHASH(T)
        INSERT(T, k, v)                ▷ retry with the new functions
155/405
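Lookup, in contrast with insertion, is trivially worst-case constant:
the key can only live in one of two slots. A sketch, assuming slots
expose the free/key fields used in the pseudocode above:

template <typename Table, typename H1, typename H2>
bool contains(const Table& T1, const Table& T2, H1 h1, H2 h2, int k) {
    // exactly two probes, one per table
    return (!T1[h1(k)].free && T1[h1(k)].key == k)
        || (!T2[h2(k)].free && T2[h2(k)].key == k);
}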
Cuckoo Hashing
156/405
Cuckoo Hashing
157/405
Cuckoo Hashing
158/405
Cuckoo Hashing
To prove facts #1 (a good insertion needs expected O(1) time)
and #2 (the probability of a good insertion is 1 − O(1/n²)) we
formulate the problem in graph-theoretic terms.
159/405
Cuckoo Hashing
Cuckoo graph:
Vertices: V = {v_{1,i}, v_{2,i} | 0 ≤ i < M} =
the set of 2M slots in the tables
Edges: If T₁[j] is occupied by x then there is an edge
(v_{1,j}, v_{2,h₂(x)}), where v_{ℓ,j} is the vertex associated to T_ℓ[j]; x
is the label of the edge. If T₂[k] is occupied by y then there
is an edge (v_{2,k}, v_{1,h₁(y)}) with label y.
This is a labeled directed "bipartite" multigraph: all edges go
from v_{1,j} to v_{2,k} or from v_{2,k} to v_{1,j}.
160/405
Cuckoo Hashing
161/405
Cuckoo Hashing
162/405
Cuckoo Hashing
The most detailed analysis of the cuckoo graph has been made
by Drmota and Kutzelnigg (2012). They prove, among many
other things:
1 The probability that the cuckoo graph contains no complex
component is
1 − h(α)/M + O(1/M²)
We do not reproduce their explicit formula for h(α) here
(h(α) → 1 as α → 0)
2 The expected number of steps in n good insertions is
n·min(4, 1/ln(1/(1 − α))) + O(1)
These two results prove the two Facts that we needed for our
analysis
163/405
Cuckoo Hashing
164/405
Part II
5 Skip Lists
6 Hash Tables
Separate Chaining
Open Addressing
Cuckoo Hashing
7 Bloom Filters
165/405
Bloom filters
166/405
Bloom filters
167/405
Bloom filters
168/405
Implementing Bloom filters
169/405
Implementing Bloom filters
170/405
Insertion & lookup
procedure I NSERT(x)
for j:= 1 to k do
A[hj (x)] := 1
procedure L OOKUP(x)
for j:= 1 to k do
[ ( )] = 0
if A hj x then
return false
return true
171/405
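A compact C++ sketch of the whole filter; deriving the k hash functions
by salting std::hash is just one common trick, not part of the definition.

#include <functional>
#include <string>
#include <vector>

class BloomFilter {
    std::vector<bool> A;
    int k;
    std::size_t h(int j, const std::string& x) const {
        // j-th hash: hash of x salted with j
        return std::hash<std::string>{}(x + '#' + std::to_string(j)) % A.size();
    }
public:
    BloomFilter(int M, int k) : A(M, false), k(k) {}
    void insert(const std::string& x) {
        for (int j = 0; j < k; ++j) A[h(j, x)] = true;
    }
    bool lookup(const std::string& x) const {
        for (int j = 0; j < k; ++j)
            if (!A[h(j, x)]) return false;  // definitely not present
        return true;                        // probably present
    }
};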
Insertion & lookup
P[A[j] = 0 after n insertions]
= Π_{ℓ=1}^{n} P[A[j] is not updated in the ℓ-th insertion]
= ((1 − 1/M)^k)^n = (1 − 1/M)^{kn}
174/405
Analysis of Bloom filters
The probability that a given bit is still 0 is (1 − 1/M)^{kn}.
The probability that the k checked bits are set to 1 ≈ the probability of
a false positive:
(1 − (1 − 1/M)^{kn})^k ≈ (1 − e^{−kn/M})^k
175/405
Analysis of Bloom filters
(d/dk) (1 − e^{−kn/M})^k = 0 at k = k*,
which gives
k* ≈ (M/n)·ln 2 ≈ 0.69·M/n.
Call p the probability of a false positive. This probability is a
function of k, p = p(k); for the optimal choice k* we have
p(k*) ≈ (1 − e^{−ln 2})^{(M/n)·ln 2} = (1/2)^{(M/n)·ln 2} ≈ 0.6185^{M/n}.
177/405
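A quick numeric check of these formulas for a few bits-per-key ratios;
a sketch only, since in practice k must be rounded to an integer.

#include <cmath>
#include <cstdio>

int main() {
    for (double r : {4.0, 8.0, 16.0}) {           // r = M/n
        double kstar = r * std::log(2.0);         // k* ~ 0.69 M/n
        double p = std::pow(0.6185, r);           // p(k*) ~ 0.6185^(M/n)
        std::printf("M/n = %4.1f  k* = %5.2f  p(k*) = %.6f\n", r, kstar, p);
    }
}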
Optimal parameters for Bloom filters
178/405
Optimal parameters for Bloom filters
179/405
Optimal parameters for Bloom filters
180/405
To learn more
181/405
To learn more (2)
182/405
Part III
Priority Queues
9 Heaps
10 Binomial Queues
11 Fibonacci Heaps
183/405
Priority Queues: Introduction
184/405
Introduction
185/405
Priority Queues: Introduction
PriorityQueue<double, string> P;
for (int i = 0; i < n; ++i)
    P.insert(Weight[i], Symb[i]);
int i = 0;
while (not P.empty()) {
    Weight[i] = P.min();
    Symb[i] = P.min_prio();
    ++i;
    P.remove_min();
}
186/405
Priority Queues: Introduction
187/405
Part III
Priority Queues
9 Heaps
10 Binomial Queues
11 Fibonacci Heaps
188/405
Heaps
Definition
A heap is a binary tree such that
1 All empty subtrees are located in the last two levels
of the tree.
2 If a node has an empty left subtree then its right
subtree is also empty.
3 The priority of any element is greater than or equal to
the priority of any element in its descendants.
189/405
Heaps
190/405
Heaps
n = 10 level 0
76
height = 4
72 34 level 1
59 63 29 level 2
17
33 level 3
37 29
level 4
leaves
191/405
Heaps
Proposition
1 The root of a max-heap stores an element of
maximum priority.
2 A heap of size n has height
h = ⌈log₂(n + 1)⌉.
192/405
Heaps: Removing the maximum
1 Replace the root of the heap with the last element (the
rightmost element in the last level)
2 Reestablish the invariant (heap order) sinking the root:
The function sink exchanges a given node with its largest
priority child, if its priority is smaller than the priority of its
child, and repeats the same until the heap order is
reestablished.
193/405
Heaps: Removing the maximum
1 Replace the root of the heap with the last element (the
rightmost element in the last level)
2 Reestablish the invariant (heap order) sinking the root:
The function sink exchanges a given node with its largest
priority child, if its priority is smaller than the priority of its
child, and repeats the same until the heap order is
reestablished.
193/405
Heaps: Removing the maximum
194/405
Heaps: Adding a new element
195/405
Heaps: Adding a new element
195/405
The Cost of Heaps
Since the height of a heap is (log n), the cost of removing the
maximum and the cost of insertions is O(log n).
196/405
Implementing Heaps
197/405
Implementing Heaps
198/405
Implementing Heaps
bool empty() const {
    return nelems == 0;
}
199/405
Implementing Heaps
200/405
Implementing Heaps
// Cost: O(log(n/j))
template <typename Elem, typename Prio>
void PriorityQueue<Elem,Prio>::sink(int j) {
    int minchild = 2 * j;
    if (minchild > nelems) return;           // j is a leaf
    if (minchild < nelems and
        h[minchild].second > h[minchild + 1].second)
        ++minchild;                          // pick the smaller-priority child
    if (h[j].second > h[minchild].second) {  // heap order violated
        swap(h[j], h[minchild]);
        sink(minchild);
    }
}
201/405
Implementing Heaps
// Cost: O(log j)
template <typename Elem, typename Prio>
void PriorityQueue<Elem,Prio>::siftup(int j) {
    if (j == 1) return;                      // already at the root
    int father = j / 2;
    if (h[j].second < h[father].second) {
        swap(h[j], h[father]);
        siftup(father);
    }
}
202/405
Part III
Priority Queues
9 Heaps
10 Binomial Queues
11 Fibonacci Heaps
203/405
Binomial Queues
J. Vuillemin
204/405
template <typename Elem, typename Prio>
class PriorityQueue {
public:
PriorityQueue() throw(error);
~PriorityQueue() throw();
PriorityQueue(const PriorityQueue& Q) throw(error);
PriorityQueue& operator=(const PriorityQueue& Q) throw(error);
...
};
205/405
Binomial Queues
B0
B1
B2
B3
206/405
Binomial Queues
Bi
Bi
Bi+1
5 3
7 4 5 6
9 4 8
10
Each node in the binomial queue will store an Elem and its
priority (any type that admits a total order)
Each node will also store the order of the binomial subtree
of which the node is the root
We will use the usual first-child/next-sibling representation
for general trees, with a twist: the list of children of a node
will be doubly linked and circularly closed
We need thus three pointers per node: first_child,
next_sibling, prev_sibling
The binomial queue is simply a pointer to the root of the
first binomial tree
We will impose that all lists of children are in increasing
order
209/405
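The basic building block is linking two binomial trees of equal order, as
in the figure B_i + B_i → B_{i+1}. A much-simplified C++ sketch follows;
it ignores the circular doubly-linked child list (prev_sibl updates), the
payload, and the "children in increasing order" invariant of the slides.

#include <utility>

struct node_bq {
    int order;                  // order of the binomial subtree rooted here
    int prio;                   // the Elem payload is omitted in this sketch
    node_bq *first_child, *next_sibl, *prev_sibl;
};

// Link two binomial trees a, b of the same order i into one of order i+1:
// the root with larger priority becomes a child of the other root.
node_bq* join(node_bq* a, node_bq* b) {
    if (b->prio < a->prio) std::swap(a, b);  // a keeps the smaller priority
    b->next_sibl = a->first_child;           // b becomes a child of a
    a->first_child = b;
    ++a->order;
    return a;
}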
Binomial Queues
n = 10 = (1,0,1,0)₂
5 3
7 6 5 4
8 4 9
10
210/405
Binomial Queues
211/405
Binomial Queues
212/405
Binomial Queues
213/405
Binomial Queues
214/405
Binomial Queues
215/405
Binomial Queues
216/405
Binomial Queues
217/405
Binomial Queues
n = 10 = (1,0,1,0)₂
5 3
7 4 5 6
9 4 8
10
218/405
Binomial Queues
n = 10 = (1,0,1,0)₂
5
3
7
4 5 6
9 4 8
10
218/405
Binomial Queues
n = 10 = (1,0,1,0)₂
7
Q’ 4 5 6
9 4 8
10
218/405
Binomial Queues
n = 9 = (1,0,0,1)₂
6 4
5 9 4
5 8 10
218/405
Binomial Queues
219/405
Binomial Queues
220/405
Binomial Queues
221/405
Binomial Queues
222/405
Binomial Queues
(figures: step-by-step addition of two binomial queues Q and Q′,
mimicking binary addition: at each order, the trees B and B′ of Q
and Q′ plus the incoming carry are combined, producing one tree of
the result and possibly a carry tree of the next order)
223/405
Binomial Queues
// removes the first binomial tree from the binomial queue q
// and returns it; if the queue q is empty, returns NULL: cost: Theta(1)
static node_bq* pop_front(node_bq*& q) throw();
224/405
Binomial Queues
static node_bq* add(node_bq*& A, node_bq*& B, node_bq*& C) throw() {
int i = order(A); int j = order(B); int k = order(C);
int r = min(i, j, k);
node_bq *a, *b, *c;
a = b = c = NULL;
if (i == r) { a = A; A = NULL; }
if (j == r) { b = B; B = NULL; }
if (k == r) { c = C; C = NULL; }
if (a != NULL and b == NULL and c == NULL) {
return a;
}
if (a == NULL and b != NULL and c == NULL) {
return b;
}
if (a == NULL and b == NULL and c != NULL) {
return c;
}
if (a != NULL and b != NULL and c == NULL) {
C = join(a, b);
return NULL;
}
if (a != NULL and b == NULL and c != NULL) {
C = join(a,c);
return NULL;
}
if (a == NULL and b != NULL and c != NULL) {
C = join(b,c);
return NULL;
}
    /// a != NULL and b != NULL and c != NULL
    C = join(a, b);
    return c;
}
225/405
Binomial Queues
226/405
Binomial Queues
227/405
Binomial Queues
To learn more:
[1] J. Vuillemin
A Data Structure for Manipulating Priority Queues.
Comm. ACM 21(4):309–315, 1978.
[2] T. Cormen, C. Leiserson, R. Rivest and C. Stein.
Introduction to Algorithms, 2e.
MIT Press, 2001.
229/405
Part III
Priority Queues
9 Heaps
10 Binomial Queues
11 Fibonacci Heaps
230/405
Fibonacci Heaps
233/405
Fibonacci Heaps
234/405
Fibonacci Heaps
235/405
Fibonacci Heaps
236/405
Fibonacci Heaps
class FibonacciHeap {
...
private:
struct FH_node {
FH_node* parent;
FH_node* a_child;
FH_node* left_sibling, * right_sibling;
int rank;
bool mark;
Elem info;
int prio;
};
FH_node* min;
int rank;
...
}
237/405
Fibonacci Heaps
238/405
Fibonacci Heaps
Notation Meaning
n number of elements
rank(x) rank of x = number of children of node x
rank(H ) max. rank of any node in H
trees(H ) number of trees in H
marks(H ) number of marked nodes in H
239/405
Fibonacci Heaps
Potential function: Φ(H) = trees(H) + 2·marks(H)
240/405
Fibonacci Heaps
241/405
Fibonacci Heaps
242/405
Fibonacci Heaps
Linking:
Given two trees of rank k (rank of a tree = rank of its root),
linking T1 and T2 yields a tree of rank k + 1, adding the root with
larger priority as a child of the root with smaller priority.
243/405
Fibonacci Heaps
244/405
Fibonacci Heaps
245/405
Fibonacci Heaps
246/405
Fibonacci Heaps
247/405
Fibonacci Heaps
248/405
Fibonacci Heaps
249/405
Fibonacci Heaps
250/405
Fibonacci Heaps
251/405
Fibonacci Heaps
252/405
Fibonacci Heaps
253/405
Fibonacci Heaps
254/405
Fibonacci Heaps
255/405
Fibonacci Heaps
Summary:
Insert: O(1)
Extract min: O(rank(H)) amortized
Decrease priority: O(1) amortized
Last step: Fibonacci lemma.
Lemma
Let H be a Fibonacci heap with n elements. Then
rank(H) = O(log n)
256/405
Fibonacci Heaps
Lemma
Fix some moment in time and consider a tree of rank k
with root x. Denote y1 , . . . , yk the k children of x in the
order in which they have been attached as children of x.
Then
rank(y_i) ≥ 0 if i = 1, and rank(y_i) ≥ i − 2 if i ≥ 2.
257/405
Fibonacci Heaps
258/405
Fibonacci Heaps
Lemma
Let s_k be the smallest possible number of elements in
a Fibonacci heap of rank k. Then s_k ≥ F_{k+2}, where F_k
denotes the k-th Fibonacci number.
Proof
Consider a FH consisting of a single tree with root x.
Basis: s₀ = 1, s₁ = 2. Inductive hyp.: s_i ≥ F_{i+2} for
all i, 0 ≤ i < k.
Let y₁, . . . , y_k denote the children of x.
260/405
Part IV
Disjoint Sets
13 Implementation of Union-Find
14 Analysis of Union-Find
261/405
Disjoint Sets
262/405
Disjoint Sets
A_x = {y ∈ A | y ∼ x}.
263/405
Disjoint Sets
264/405
Disjoint Sets
class UnionFind {
public:
// Creates the partition {{0}, {1}, ..., {n-1}}
UnionFind(int n);
266/405
Part IV
Disjoint Sets
13 Implementation of Union-Find
14 Analysis of Union-Find
267/405
Implementation #1: Quick-find
268/405
Implementation #1: Quick-find
269/405
Implementation #1: Quick-find
270/405
Implementation #2: Quick-union
271/405
Implementation #2: Quick-union
class UnionFind {
...
private:
vector<int> P;
int nr_blocks;
};
UnionFind::UnionFind(int n) : P(vector<int>(n)) {
// constructor
for (int j = 0; j < n; ++j)
P[j] = j;
nr_blocks = n;
}
void UnionFind::Union(int i, int j) {
int ri = Find(i); int rj = Find(j);
if (ri != rj) {
P[ri] = rj; --nr_blocks;
}
}
int UnionFind::Find(int i) {
while (P[i] != i) i = P[i];
return i;
}
272/405
Implementation #2: Quick-union
273/405
Implementation #3: Union by weight or by rank
274/405
Implementation #3: Union by weight or by rank
275/405
Implementation #3: Union by weight or by rank
class UnionFind {
...
private:
vector<int> P;
int nr_blocks;
};
UnionFind::UnionFind(int n) : P(vector<int>(n)) {
// constructor
for (int j = 0; j < n; ++j)
P[j] = -1; // all items are roots of trees of size 1 (or rank 1)
nr_blocks = n;
}
int UnionFind::Find(int i) {
    // P[i] < 0 when i is a root
    while (P[i] >= 0) i = P[i];
    return i;
}
...
276/405
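The elided Union can be completed along these lines; a sketch consistent
with the convention above that roots store the negated size (P[r] = −size).

#include <utility>

void UnionFind::Union(int i, int j) {
    int ri = Find(i), rj = Find(j);
    if (ri == rj) return;                 // already in the same block
    if (P[ri] > P[rj]) std::swap(ri, rj); // make ri the larger tree's root
    P[ri] += P[rj];                       // accumulate the (negated) sizes
    P[rj] = ri;                           // smaller tree hangs from ri
    --nr_blocks;
}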
Implementation #3: Union by weight or by rank
277/405
Implementation #3: Union by weight or by rank
Lemma
The height of a tree that represents a block of size k is
≤ 1 + log₂ k, using union-by-weight.
Proof
We prove it by induction. If k = 1 the lemma is
obviously true: the height of a tree of one element is
1. Let T be a tree of size k resulting from the union-
by-weight of two trees T₁ and T₂ of sizes r and s,
respectively; assume r ≤ s < k = r + s. Then T
has been obtained putting T₁ as a child of T₂'s root.
278/405
Implementation #3: Union by weight or by rank
Proof (cont’d)
By inductive hypothesis, height(T₁) ≤ 1 + log₂ r
and height(T₂) ≤ 1 + log₂ s. The height of T is
that of T₂ unless height(T₁) + 1 > height(T₂); in that case
height(T) = height(T₁) + 1 ≤ 2 + log₂ r ≤ 1 + log₂ 2r ≤ 1 + log₂ k.
279/405
Implementation #3: Union by weight or by rank
280/405
Path Compression
While we look for the representative of i in a FIND(i), we follow
the pointers from i up to the root, and we can make the
pointers along that path change so that the path becomes
shorter; therefore we may speed up future calls to FIND.
There are several heuristics for path compression:
1 In full path compression, we traverse the path from i to its
root twice: the second traversal makes every node in the path
point directly to the root.
282/405
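A C++ sketch of FIND with full path compression, consistent with the
negative-at-root representation used before:

// One pass up to locate the root, a second pass to make every node
// on the traversed path point directly to it.
int UnionFind::Find(int i) {
    int root = i;
    while (P[root] >= 0) root = P[root];  // P[r] < 0 only at the roots
    while (P[i] >= 0) {                   // compress the traversed path
        int up = P[i];
        P[i] = root;
        i = up;
    }
    return root;
}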
Path compression: full path compression
283/405
Path compression: full path compression
284/405
Path compression: full path compression
285/405
Path compression: path splitting
Make every node point to its grandparent (except if it is the root
or a child of the root).
Part IV
Disjoint Sets
13 Implementation of Union-Find
14 Analysis of Union-Find
288/405
Amortized analysis of Union-Find
The analysis of Union-Find with union by weight (or by rank)
using some path compression heuristic must be amortized: the
union of two representatives (roots) is always cheap, and the
cost of any FIND is bounded by O(log n), but if we apply many
FINDs the trees become bushier, and we approach rather
quickly the situation of Quick-Find while we avoid costly
UNIONs.
Proposition
The tree roots, node ranks and elements within a tree
are the same with or without path compression.
Proof
Path compression only changes some parent pointers,
nothing else. It does not create new roots, does not
change ranks or move elements from one tree to
another.
291/405
Amortized analysis of Union-Find
Properties:
1 If x is not a root node then rank(x) < rank(parent(x)).
2 If x is not a root node then its rank will not change.
3 Let r_t = rank(parent(x)) at time t. If at time t + 1 x
changes its parent then r_t < r_{t+1}.
4 A root node of rank k has ≥ 2^k descendants.
5 The rank of any node is ≤ ⌈log₂ n⌉.
6 For any r ≥ 0, the Union-Find data structure contains at
most n/2^r elements of rank r.
Amortized analysis of Union-Find
293/405
Amortized analysis of Union-Find
Proof of Property #1
A node of rank k can only be created by joining two
roots of rank k − 1. Path compression can't change
ranks. However, it might change the parent of x; in
that case, x will point to some ancestor of its previous
parent, hence rank(x) < rank(parent(x)) at all times.
Proof of Property #2
The rank of a node can only change in union-by-rank
if x was a root and becomes a non-root. Once a root
becomes a non-root it will never become a root node
again. Path compression never changes ranks and
never changes roots.
294/405
Amortized analysis of Union-Find
Example of property #1
295/405
Amortized analysis of Union-Find
Proof of Property #3
When the parent of x changes it is because either
1 x becomes a non-root, and union-by-rank
guarantees that r_t = rank(parent(x)) = rank(x) and
r_{t+1} > r_t, as x becomes a child of a node whose
rank is larger than r_t
2 x is a non-root at time t but path compression
changes its parent. Because x will be pointing to
some ancestor of its parent, then r_{t+1} > r_t (because
of Property #1)
296/405
Amortized analysis of Union-Find
Proof of Property #4
By induction on k.
Base: If k = 0 then x is the root of a tree with only one
node, so the number of descendants is 1 = 2⁰.
Inductive step: a node x of rank k can only get
that rank because of the union of two roots of rank
k − 1, hence x was the root of one of the trees involved
and its rank was k − 1 before the union. By hypothesis,
each tree contained ≥ 2^{k−1} nodes and the result must then
contain ≥ 2^k nodes.
Proof of Property #5
Immediate from Properties #1 and #4.
297/405
Amortized analysis of Union-Find
An example of property #4
298/405
Amortized analysis of Union-Find
Proof of Property #6
Because of Property #4, any node x of rank k is the
root of a subtree with ≥ 2^k nodes. Indeed, if x is a root,
that is the statement of the property. Else, inductively,
x had the property just before becoming a non-
root; since neither its rank nor its set of descendants
can change afterwards, the property is also true for non-
root nodes. Because of Property #1, two distinct nodes
of rank k cannot be one an ancestor of the other, so
they cannot have descendants in common.
Therefore, there can be at most n/2^r nodes of rank r.
299/405
Amortized analysis of Union-Find
An example of property #6
300/405
Amortized analysis of Union-Find
Definition
The iterated logarithm function is
lg* x = 0, if x ≤ 1;   lg* x = 1 + lg*(lg x), otherwise.
We consider only logarithms base 2, hence we write
lg ≡ log₂.
n                 lg* n
(0, 1]            0
(1, 2]            1
(2, 4]            2
(4, 16]           3
(16, 65536]       4
(65536, 2^65536]  5
lg* n ≤ 5 in this Universe.
301/405
Amortized analysis of Union-Find
Given k, let 2↑↑k = 2^{2^{·^{·^{2}}}} (k exponentiations).
Inductively: 2↑↑0 = 1, and 2↑↑k = 2^{2↑↑(k−1)}.
Then lg*(2↑↑k) = k. Define the groups
G₀ = {1}
G₁ = {2}
G₂ = {3, 4}
G₃ = {5, . . . , 16}
G₄ = {17, . . . , 65536}
G₅ = {65537, . . . , 2^65536}
. . . = . . .
G_k = {1 + 2↑↑(k−1), . . . , 2↑↑k}
302/405
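For illustration, the iterated logarithm can be computed directly; a tiny
sketch:

#include <cmath>

// Number of times lg must be applied to x before it drops to <= 1.
int lg_star(double x) {
    int k = 0;
    while (x > 1.0) { x = std::log2(x); ++k; }
    return k;
}
// lg_star(65536) == 4; for any n that fits in this Universe, lg_star(n) <= 5.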
Amortized analysis of Union-Find
303/405
Amortized analysis of Union-Find
Accounting scheme: We assign credits during a UNION to the
node that ceases to be a root; if its rank belongs to group G_k
we assign 2↑↑k credits to the item.
Proposition
The number of credits assigned in total among all nodes
is ≤ n·lg* n.
Proof
By Property #6, the number of nodes with rank > x is
at most
n/2^{x+1} + n/2^{x+2} + ··· ≤ n/2^x.
Consider the nodes whose ranks belong to group G_k =
{x + 1, . . . , 2^x} (x = 2↑↑(k−1)).
304/405
Amortized analysis of Union-Find
Proof (cont’d)
As each node in the group receives 2↑↑k = 2^x credits
and there are at most n/2^x of them, the number of credits
assigned to nodes in the group is ≤ n. All the ranks
belong in the first lg* n groups, hence the total number
of credits is ≤ n·lg* n.
305/405
Amortized analysis of Union-Find
306/405
Amortized analysis of Union-Find
Theorem
Starting from an initial Union-Find data structure for
n elements with n disjoint blocks, any sequence of
m ≥ n UNIONs and FINDs using union-by-rank and full
path compression has total cost O(m·lg* n).
Proof
The amortized cost of FIND is O(lg* n), and that of any
UNION is constant, hence the sequence of m operations
has total cost O(m·lg* n).
308/405
Amortized analysis of Union-Find
To learn more:
[1] Michael J. Fischer
Efficiency of Equivalence Algorithms
Symposium on Complexity of Computer Computations,
IBM Thomas J. Watson Research Center, 1972.
[2] J.E. Hopcroft and J.D. Ullman
Set Merging Algorithms
SIAM J. Computing 2(4):294–303, 1973.
[3] Robert E. Tarjan
Efficiency of a Good But Not Linear Set Union Algorithm
J. ACM 22(2):215–225, 1975.
310/405
Disjoint Sets
To learn more:
[1] Robert E. Tarjan and Jan van Leeuwen
Worst-Case Analysis of Set Union Algorithms
J. ACM 31(2):245–281, 1984.
[2] Michael L. Fredman and Michael E. Saks
The Cell Probe Complexity of Dynamic Data Structures
Proc. 21st Symp. Theory of Computing (STOC), p.
345–354, 1989.
[3] Z. Galil and G. Italiano
Data Structures and Algorithms for Disjoint Set Union
Problems
ACM Computing Surveys 23(3):319–344, 1991.
311/405
Part V
Data Structures for String Processing
15 Tries
16 Suffix Trees
312/405
Tries
313/405
Tries
Consider a finite alphabet Σ = {σ₁, . . . , σ_m} with m ≥ 2
symbols. We denote Σ*, as usual in the literature, the set of all
strings that can be formed with symbols from Σ. Given two
strings u and v in Σ* we write u·v for the string which results
from the concatenation of u and v. We will use λ to denote the
empty string, or string of length 0.
Definition
Given a finite set of strings X ⊆ Σ*, all of identical
length, the trie T of X is an m-ary tree recursively
defined as follows:
1 If X contains a single element x or none, then T is
a tree consisting of a single node that contains x or
is empty.
2 If |X| ≥ 2, let T_i be the trie for the subset
X_i = {y | σ_i·y ∈ X}, for each σ_i ∈ Σ
314/405
Tries
315/405
Tries
Lemma
If the edges in the trie T for X are labelled in such a
way that the edge connecting the root of T with subtree
T_i has label σ_i, 1 ≤ i ≤ m, then the sequence of
labels in the path from the root to a non-empty leaf
that contains x forms the shortest prefix that uniquely
identifies x, that is, the shortest prefix of x which is not
shared by any other element of X
Lemma
Let p be the sequence of labels in a path from the root
of the trie T to some node v (either internal or leaf); the
subtree rooted at v is a trie for the subset of all strings
in X starting with the prefix p (and no other strings)
316/405
Tries
Lemma
Given a finite set X ⊆ Σ* of strings of equal length,
the trie T for X is unique; in particular, T does not
depend on the order in which we "present" or "insert" the
elements of X
Lemma
The height of a trie T is the minimum length of prefixes
needed to distinguish two elements in X; in other words,
the length of the longest prefix which is common to two
elements in X; of course, if ℓ is the length of the strings
in X then
height(T) ≤ ℓ
317/405
Tries
318/405
Tries
X = {ahora, alto, amigo, amo, asi, bar, barco, bota, ...}
a b c #
...
a h l m s # a o #
... ... ...
ahora
asi bota
alto
a r #
a
...
i o
...
#
... ...
c #
amigo amo ...
319/405
barco bar
Tries
320/405
Tries
321/405
Tries
322/405
Tries
Although it is more costly in space to use nodes to store full
strings, symbol by symbol, instead of storing suffixes once a leaf
is reached, it is advantageous to avoid different types of
nodes, different types of pointers, or forcing pointers of one type
to point to nodes of some other type, using wasteful unions, . . .
// We assume that the class Key supports
// x.length() = length >= 0 of key x
int Key::length() const;
324/405
Tries
325/405
Tries
326/405
Tries
// Pre: p points to the root of the subtree that contains
//      all elements such that their first i-1 symbols
//      coincide with the first i-1 symbols of the key k
// Post: returns a pointer to the root of the tree resulting from
//       the insertion of the pair (k[i..], v) in the subtree
// Cost: O(|k| * m)
template <typename Symbol, typename Key,
          typename Value>
DigitalDictionary<Symbol,Key,Value>::trie_node*
DigitalDictionary<Symbol,Key,Value>::_insert(trie_node* p,
        const Key& k, const Value& v, int i) const {
    if (i == k.length()) {
        if (p == nullptr) p = new trie_node;
        p -> _c = Symbol(); // Symbol() is the end-of-string symbol,
                            // e.g. Symbol() == '\0' or Symbol() == '#'
        p -> _v = v;
        return p;
    }
    if (p == nullptr or p -> _c > k[i]) {
        trie_node* q = new trie_node;  // new node for symbol k[i]
        q -> _next_sibl = p;           // keep siblings sorted by symbol
        q -> _c = k[i];
        q -> _first_child = _insert(nullptr, k, v, i+1);
        return q;
    }
    if (p -> _c < k[i])
        p -> _next_sibl = _insert(p -> _next_sibl, k, v, i);
    else // p -> _c == k[i]
        p -> _first_child = _insert(p -> _first_child, k, v, i+1);
    return p;
}
327/405
Ternary Search Trees
One alternative to implement tries is to represent the trie nodes
as binary search trees with pointers to roots of subtrees,
instead of as array of pointers to roots, or as linked lists of
pointers to roots (of non-empty subtrees).
The new data structure, invented by Bentley and Sedgewick
(1997), is called a ternary search tree (TST). It tries to
combine the efficiency in space of list-tries (we avoid the large
number of null pointers when using arrays) and the efficiency in
time (we avoid the linear “scans” when using lists to navigate to
the appropriate subtree).
Nodes in TSTs have a symbol c and 3 pointers each: pointers
to the left and right child of the node in the BST that represents
the trie “node”, and a central pointer to the root of the subtree
with symbol c at that level.
328/405
Ternary Search Trees
329/405
Ternary Search Trees
330/405
Ternary Search Trees
331/405
Ternary Search Trees
332/405
Ternary Search Trees
template <typename Symbol,
typename Key,
typename Value>
DigitalDictionary<Symbol,Key,Value>::tst_node*
DigitalDictionary<Symbol,Key,Value>::_insert(
tst_node* t, int i,
const Key& k, const Value& v) {
if (t == nullptr) {
t = new tst_node;
t -> _left = t -> _right = t -> _cen = nullptr;
t -> _c = k[i];
if (i < k.length() - 1) {
t -> _cen = _insert(t -> _cen, i + 1, k, v);
} else { // i == k.length() - 1; k[i] == Symbol()
t -> _v = v;
}
} else {
if (t -> _c == k[i])
t -> _cen = _insert(t -> _cen, i + 1, k, v);
if (k[i] < t -> _c)
t -> _left = _insert(t -> _left, i, k, v);
if (t -> _c < k[i])
t -> _right = _insert(t -> _right, i, k, v);
}
return t;
}
333/405
Performance of Tries
334/405
Performance of Tries
The EPL is the sum of the lengths of all paths from the root of
the trie to all leaves in the trie; a random search (successful or
unsuccessful) will cost, on average, (C_S/H_S)·log n
335/405
Performance of Tries
For example, if the source is memoryless with probabilities p₁,
. . . , p_m for the symbols of the alphabet (Σ_i p_i = 1), then
H_S = −Σ_i p_i log p_i is the entropy of the source and
Type    C_S
Array   1
List    Σ_i (i − 1)·p_i
TST     2·Σ_{i<j} p_i·p_j / (p_i + ··· + p_j)
When all p_i = 1/m we have that H_S = log m and hence the
cost of random searches is
Type    Cost of random search
Array   log_m n
List    (m/2)·log_m n
TST     2 ln m · log_m n = 2 ln n
336/405
Performance of Tries
337/405
Performance of Tries
338/405
Patricia (a.k.a. Compressed Tries)
339/405
Patricia
340/405
Patricia
A Patricia tree for the set X = {bear, bell, . . . , stock, stop}.
342/405
Patricia
343/405
Patricia
344/405
Patricia
345/405
Inverted files
346/405
Inverted files
We then proceed to insert/update, one by one, each index term
of each document, in a trie (or TST or Patricia). When a word
appears in several different documents we will keep track in a
occurrence list. Because of their sheer volume, occurrence lists
will be typically stored in secondary memory, and they won’t be
kept in any particular order.
When we process a word w from document D_i, we consider
three cases:
1 w is a stopword: discard it and proceed to the next word
2 w appears for the first time: create a new occurrence
list with (w, D_i), append the new occurrence list to the set
of occurrence lists, and add a link from the (compressed)
347/405
Inverted files
350/405
Tries
To learn more:
[1] D. E. Knuth
The Art of Computer Programming, Volume 3: Sorting and
Searching, 2nd ed
Addison-Wesley, 1998
[2] M. T. Goodrich and R. Tamassia
Algorithm Design and Applications
John Wiley & Sons, 2015
351/405
Tries
To learn more:
[1] J. L. Bentley and R. Sedgewick
Fast algorithms for sorting and searching strings
Proc. SODA, pp. 360–369, 1997
[2] J. Clément, Ph. Flajolet and B. Vallée
The Analysis of Hybrid Trie Structures
Proc. SODA, pp. 531–539, 1998
352/405
Part V
15 Tries
16 Suffix Trees
353/405
Suffix Trees
A suffix tree (or suffix trie) is simply a trie for all the suffixes of a
string, the text, T[0..n−1].
Thus we will form a trie with the n suffixes of the text T. Since
the length of the text is n, the total number of (proper)
suffixes to store is n (n + 1 if we count the empty suffix) and the
total number of symbols involved is
Σ_{i=0}^{n} (n − i) = n(n + 1)/2,
as the suffix T[i..n−1] has length n − i, 0 ≤ i ≤ n
354/405
Suffix Tries
A compact representation of the trie, storing the pair (i, j) to
represent a substring T[i..j], will be most convenient; in any
case, using Patricia for the suffixes guarantees that the space
used is Θ(n).
356/405
Suffix Tries
357/405
Suffix Tries
The same bound is achieved with suffix tries: we invest time
Θ(n) in the preprocessing of the text (to build the suffix trie!)
and then search the pattern in the suffix trie with cost Θ(k).
358/405
Suffix tries
int k = P.size();
int j = 0;
suffix_trie_node* p = T.root;
bool fin;                  // declared outside so the loop condition sees it
do {
    fin = true;
    for (q : children of p) {
        int i = q.first;
        if (P[j] == T[i]) {
            int len = q.last - i + 1;
            if (k <= len) {
                // remaining pattern is shorter than the node label
                if (P[j..j+k-1] == T[i..i+k-1])
                    return ‘‘match at i-j’’
                else
                    return ‘‘P not a substring of T’’
            } else { // k > len
                if (P[j..j+len-1] == T[i..i+len-1]) {
                    k -= len; j += len; p = q;
                    fin = false;
                    break; // end the for(q: children of p) loop
                }
            }
        }
    }
} while (not fin and p is not a leaf);
return ‘‘P not a substring of T’’;
359/405
Tries
To learn more:
[1] D. E. Knuth
The Art of Computer Programming, Volume 3: Sorting and
Searching, 2nd ed
Addison-Wesley, 1998
[2] M. T. Goodrich and R. Tamassia
Algorithm Design and Applications
John Wiley & Sons, 2015
[3] Dan Gusfield
Algorithms on Strings, Trees & Sequences
Cambridge Univ. Press, 1997
360/405
Part VI
19 Quad Trees
361/405
Why Multidimensional?
Multidimensional data everywhere:
Points, lines,
rivers, maps, cities, roads,
hyperplanes, cubes, hypercubes,
mp3, mp4 and mp5 files,
jpeg files, pixels,
...,
Used in applications such as:
database design, geographic information systems (GIS),
computer graphics, computer vision, computational
geometry, image processing,
pattern recognition,
very large scale integration (VLSI) design,
...
362/405
Why Multidimensional?
$x = (x_0, x_1, \ldots, x_{K-1})$
Retrieval: associative queries that involve more than one of
the K dimensions
Data structures:
K -Dimensional Search Trees (a.k.a. K -d trees)
Quad Trees
...
363/405
Associative Retrieval
364/405
Associative Queries
365/405
Partial Match Queries
Definition
Given a file F of n K-dimensional records and a query
$q = (q_0, q_1, \ldots, q_{K-1})$, where each $q_i$ is either a value
in $D_i$ (it is specified) or $*$ (it is unspecified), a partial
match query returns the subset of records $x$ in $F$ whose
attributes coincide with the specified attributes of $q$. That
is, it returns
$$\{\,x \in F \mid x_i = q_i \text{ for all } i \text{ such that } q_i \ne *\,\}.$$
366/405
Partial Match Queries
Example
Query: $q = (*, q_2)$, i.e., $q = (q_1, q_2)$ with specification
pattern 01
367/405
Part VI
19 Quad Trees
368/405
Standard K -d trees
369/405
Standard K -d Trees
Definition (Bentley, 1975)
A standard K-d tree $T$ of size $n \ge 0$ is a binary tree that
stores a set of $n$ K-dimensional points, such that
- it is empty when $n = 0$, or
- its root stores a key $x$ and a discriminant
$j = (\text{level of the root}) \bmod K$, $0 \le j < K$, and the
remaining $n-1$ records are stored in the left and
right subtrees of $T$, say $L$ and $R$, in such a way that
both $L$ and $R$ are K-d trees; furthermore, for any
key $u \in L$, it holds that $u_j < x_j$, and for any key
$v \in R$, it holds that $x_j < v_j$.
370/405
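A minimal sketch of insertion following this definition; the node type kd_node, the alias Point and the fixed constant K are assumptions for illustration, not the course's code:

#include <array>

const int K = 2;                          // dimension, fixed for the sketch
using Point = std::array<double, K>;

struct kd_node {
    Point key;
    kd_node* left = nullptr;
    kd_node* right = nullptr;
};

// Insert x in the subtree rooted at t, located at depth 'level'; the
// discriminant is not stored: it is j = level mod K, as in the definition.
kd_node* kd_insert(kd_node* t, const Point& x, int level = 0) {
    if (t == nullptr) {
        t = new kd_node;
        t->key = x;
    } else {
        int j = level % K;                // discriminant of this node
        if (x[j] < t->key[j])
            t->left = kd_insert(t->left, x, level + 1);
        else
            t->right = kd_insert(t->right, x, level + 1);
    }
    return t;
}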
Relaxed K -d trees
In a relaxed K-d tree the discriminant of each node is drawn
uniformly at random from $\{0, 1, \ldots, K-1\}$, independently for
each node, instead of cycling through the coordinates as in
standard K-d trees.
371/405
Relaxed K -d trees
[Figure: points 1, . . . , 5 are inserted one by one into a relaxed
K-d tree, together with the induced partition of the space.]
372/405
K -d trees
373/405
K -d trees
The bounding box of a leaf in the K-d tree $T$ is the region of the
domain associated to that leaf, namely, the set of points of the
K-dimensional space that would replace that leaf if any one of
them were inserted into $T$.
The bounding box of an internal node $x$ is that of the leaf which
that node replaced.
374/405
Partial Match in K -d Trees
375/405
Partial Match Algorithm
376/405
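The pseudocode of the slide is not reproduced here; a sketch of the recursive search, reusing the hypothetical kd_node, Point and K declared in the earlier insertion sketch and encoding $*$ as spec[i] = false, could read:

#include <vector>

// Report every point that agrees with q on all specified coordinates.
// At a node discriminating on j we prune one subtree if q_j is
// specified, and must visit both subtrees if it is not.
void partial_match(const kd_node* t, int level, const Point& q,
                   const std::array<bool, K>& spec,
                   std::vector<Point>& out) {
    if (t == nullptr) return;
    bool match = true;
    for (int i = 0; i < K; ++i)
        if (spec[i] && t->key[i] != q[i]) match = false;
    if (match) out.push_back(t->key);   // x_i = q_i for all specified i
    int j = level % K;
    if (!spec[j]) {                     // q_j unspecified: visit both
        partial_match(t->left, level + 1, q, spec, out);
        partial_match(t->right, level + 1, q, spec, out);
    } else if (q[j] < t->key[j]) {
        partial_match(t->left, level + 1, q, spec, out);
    } else {                            // equal keys go right on insertion
        partial_match(t->right, level + 1, q, spec, out);
    }
}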
Orthogonal Range and Region Queries in K -d
Trees
377/405
Orthogonal Range Algorithm
378/405
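Likewise, a sketch of orthogonal range search over the same hypothetical kd_node type: report the points inside the box $[lo_0, hi_0] \times \cdots \times [lo_{K-1}, hi_{K-1}]$, pruning a subtree whenever the box lies entirely on one side of the split.

// Assumes lo[i] <= hi[i] for all i.
void range_search(const kd_node* t, int level, const Point& lo,
                  const Point& hi, std::vector<Point>& out) {
    if (t == nullptr) return;
    bool inside = true;
    for (int i = 0; i < K; ++i)
        if (t->key[i] < lo[i] || hi[i] < t->key[i]) inside = false;
    if (inside) out.push_back(t->key);
    int j = level % K;
    if (lo[j] < t->key[j])              // box extends left of the split
        range_search(t->left, level + 1, lo, hi, out);
    if (t->key[j] < hi[j])              // box extends right of the split
        range_search(t->right, level + 1, lo, hi, out);
}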
Nearest Neighbors
379/405
Nearest Neighbors
380/405
Nearest Neighbors
381/405
Nearest Neighbors Algorithm
procedure NN(T, q, S, r)
    ▷ T: tree, q: query, S: result, initially empty
    ▷ r: search radius, initially +∞; k: number of neighbors sought
    if T ≠ ∅ then                        ▷ nothing to do if T were empty
        i := T→discr
        if d := DISTANCE(T→key, q) ≤ r then
            if |S| = k then
                S.EXTRACT_MIN()
                S.INSERT(T→key, d)
                r := S.MIN_PRIO()
            else
                S.INSERT(T→key, d)
        . . .                            ▷ see next slide
382/405
Nearest Neighbors Algorithm
procedure NN(T, q, S, r)
    ...
        if q[i] − r ≤ T→key[i] ∧ T→key[i] ≤ q[i] + r then
            ▷ the ball around q may intersect both bounding boxes;
            ▷ visit first the subtree on q's side of the split
            if q[i] ≤ T→key[i] then
                NN(T→left, q, S, r)
                if T→key[i] ≤ q[i] + r then
                    NN(T→right, q, S, r)
            else
                NN(T→right, q, S, r)
                if q[i] − r ≤ T→key[i] then
                    NN(T→left, q, S, r)
        else
            ▷ the query region does not intersect both BB's;
            ▷ visit only the subtree that corresponds
            ...
383/405
Part VI
19 Quad Trees
384/405
2-d Quad Trees
386/405
Quad Trees
387/405
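The figures are not reproduced here; as an illustration, a 2-d point quad tree, where each node splits the plane into four quadrants around the point it stores, could be sketched as follows (all names are assumptions):

struct quad_node {
    double x, y;      // the point stored at this node
    quad_node* child[4] = {nullptr, nullptr, nullptr, nullptr}; // NE, NW, SW, SE
};

// Quadrant of (px, py) relative to the point of t: 0 = NE, 1 = NW,
// 2 = SW, 3 = SE; ties are broken towards the north/east.
int quadrant(const quad_node* t, double px, double py) {
    if (px >= t->x) return (py >= t->y) ? 0 : 3;
    else            return (py >= t->y) ? 1 : 2;
}

quad_node* quad_insert(quad_node* t, double px, double py) {
    if (t == nullptr) {
        t = new quad_node;
        t->x = px; t->y = py;            // a new leaf holding the point
    } else {
        int q = quadrant(t, px, py);
        t->child[q] = quad_insert(t->child[q], px, py);
    }
    return t;
}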
Part VI
19 Quad Trees
388/405
The Cost of Partial Match Searches
$$P_n = 1 + \frac{s}{K}\cdot\frac{1}{n}\sum_{j=0}^{n-1}\left(\frac{j+1}{n+1}\,P_j + \frac{n-j}{n+1}\,P_{n-1-j}\right) + \frac{K-s}{K}\cdot\frac{1}{n}\sum_{j=0}^{n-1}\left(P_j + P_{n-1-j}\right)$$
389/405
The Cost of Partial Match Searches
The shape function for the recurrence is, with $\alpha := s/K$,
$$\omega(z) = 2\alpha z + 2(1-\alpha).$$
If we compute
$$H = 1 - \int_0^1 \omega(z)\,dz = 1 - (2-\alpha) = \alpha - 1 < 0,$$
we need to find $\gamma \in [0,1]$ such that
$$\int_0^1 z^{\gamma}\,\omega(z)\,dz = 1,$$
that is,
$$\frac{2\alpha}{\gamma+2} + \frac{2(1-\alpha)}{\gamma+1} = 1.$$
The solution of the quadratic equation is
$$\gamma = \frac{\sqrt{9-8\alpha}-1}{2}.$$
390/405
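For instance, for $K = 2$ and $s = 1$ ($\alpha = 1/2$) we get $\gamma = (\sqrt{5}-1)/2 \approx 0.618$: a partial match in a random relaxed 2-d tree visits $\Theta(n^{0.618})$ nodes on average, slightly worse than the exponent $(\sqrt{17}-3)/2 \approx 0.562$ of standard 2-d trees.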
The Cost of Partial Match
$$P_n = \beta\,n^{\gamma} + O(1),$$
where
$$\gamma = \frac{\sqrt{9-8\alpha}-1}{2}, \qquad
\beta = \frac{\Gamma(2\gamma+1)}{(1-\alpha)\,(\gamma+1)\,\Gamma^3(\gamma+1)},$$
and $\Gamma(z)$ is Euler's Gamma function.
391/405
The Cost of Partial Match
For standard K-d trees the exponent $\gamma$ is instead the solution of
$$(\gamma+2)^{\alpha}\,(\gamma+1)^{1-\alpha} = 2,$$
and the coefficient $\nu_u$ is a constant depending on the query pattern $u$
($u[i]$ = specified/unspecified).
392/405
The Cost of Partial Match
The excess of the exponent is
$$\vartheta = \gamma - (1 - \alpha);$$
it is very close to 0 (and never greater than 0.07) for standard
K-d trees. The excess for relaxed K-d trees is not very big either,
but it can be as much as 0.12.
Squarish K-d trees achieve optimal expected performance, as
they have excess $\vartheta = 0$, thanks to their more (heuristically)
balanced space partition, induced by the choice of discriminants.
393/405
The Cost of Partial Match Searches
[Figure: excess $\vartheta$ of the exponent with respect to $1 - s/K$.
Solid line: standard K-d trees; dotted line: relaxed K-d trees.]
394/405
The Cost of Partial Match Searches
[Figure: plot of the coefficient $\beta$ for relaxed K-d trees.]
395/405
The Cost of Orthogonal Range Queries
$$R_n = v_K\,n + \cdots + v_1\,n^{\gamma((K-1)/K)} + 2\,v_0\,\ln n + O(1),$$
where $v_j$ is, roughly, the “volume” of the $j$-dimensional
boundary of $Q$. For example, if $K = 2$ then $v_2 = \Delta_0\,\Delta_1$,
$v_1 = \Delta_0(1-\Delta_1) + \Delta_1(1-\Delta_0) \approx \Delta_0 + \Delta_1$, and
$v_0 = (1-\Delta_0)(1-\Delta_1) = 1 - \Delta_0 - \Delta_1 + \Delta_0\Delta_1$, with
$\Delta_i$ the side length of $Q$ along dimension $i$. Intuition: the
orthogonal range search behaves in a region of “volume” $v_j$
exactly as a partial match with $K - j$ specified coordinates.
396/405
The Cost of Nearest Neighbors Queries
$$S_n = \Theta\!\left(n^{\vartheta} + \log n\right),$$
where $\vartheta = \max_j\,\{\gamma(j/K) - 1 + j/K\}$ is the maximum excess.
Intuition: the nearest neighbor search behaves like an
orthogonal range search over a small region around the query point.
397/405
To learn more
[1] J. L. Bentley.
Multidimensional binary search trees used for associative
retrieval.
Communications of the ACM, 18(9):509–517, 1975.
[2] R. A. Finkel and J. L. Bentley.
Quad trees: A data structure for retrieval on composite
keys.
Acta Informatica, 4:1–9, 1974.
[3] H. H. Chern and H. K. Hwang.
Partial match queries in random k-d trees.
SIAM Journal on Computing, 35(6):1440–1466, 2006.
[4] H. H. Chern and H. K. Hwang.
Partial match queries in random quad trees.
SIAM Journal on Computing, 32(4):904–915, 2003.
398/405
To learn more (2)
[5] L. Devroye.
Branching processes in the analysis of the height of trees.
Acta Informatica, 24:277–298, 1987.
[6] L. Devroye and L. Laforest.
An analysis of random d-dimensional quadtrees.
SIAM Journal on Computing, 19(5):821–832, 1990.
[7] A. Duch.
Randomized insertion and deletion in point quad trees.
In Int. Symposium on Algorithms and Computation
(ISAAC), LNCS. Springer–Verlag, 2004.
[8] A. Duch, V. Estivill-Castro, and C. Martínez.
Randomized K -dimensional binary search trees.
In K.-Y. Chwa and O. H. Ibarra, editors, Int. Symposium on
Algorithms and Computation (ISAAC’98), volume 1533 of
LNCS, pages 199–208. Springer-Verlag, 1998.
399/405
To learn more (3)
402/405
General References (2)
403/405
General References (3)
[7] R. Sedgewick.
Algorithms in C.
Addison-Wesley, 3rd edition, 1997.
[8] R. Sedgewick and K. Wayne.
Algorithms.
Addison-Wesley, 4th edition, 2011.
[9] D. Gusfield.
Algorithms on Strings, Trees, and Sequences.
Cambridge Univ. Press, 1997.
404/405
General References (4)
[10] H. Samet.
Foundations of Multidimensional and Metric Data
Structures.
Morgan Kaufmann, 2006.
405/405