1. Advanced Tree Structures
Textbooks:
1. C.L. Shaffer, Data Structures & Algorithm Analysis, Prentice Hall, 1997.
2. R. Sedgewick, Algorithms in C++, Addison-Wesley, 1992.
Tries, K-D trees, Quad trees
Preprocessing Strings
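Preprocessing the pattern speeds up pattern-matching queries.
If the text is large and searched often, we can preprocess the text instead; a trie is a data structure for doing so.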
Standard Tries
The standard trie for a set of strings S is an ordered tree such that:
Each node but the root is labeled with a character
The children of a node are alphabetically ordered
The paths from the root to the external nodes yield the strings of S
Example: standard trie for the set of strings
S = { bear, bell, bid, bull, buy, sell, stock, stop } (total size 31)
Each letter is one node; a dash joins a node to its only child.
(root)
  b
    e
      a - r
      l - l
    i - d
    u
      l - l
      y
  s
    e - l - l
    t - o
      c - k
      p
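The definition above can be sketched in Python. This is a minimal illustration, not the course's reference implementation: the names `TrieNode`, `Trie`, `contains`, `words` and `node_count` are assumptions, and children are kept in a dict (sorted only when listing) rather than in an alphabetically ordered list.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # character -> TrieNode
        self.is_word = False    # True if a stored string ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        node = self.root
        for ch in word:
            # Reuse the child for ch if it exists, otherwise create it.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def words(self):
        """Yield the stored strings in alphabetical order."""
        def walk(node, prefix):
            if node.is_word:
                yield prefix
            for ch in sorted(node.children):
                yield from walk(node.children[ch], prefix + ch)
        yield from walk(self.root, "")

    def node_count(self):
        """Count the nodes of the trie, excluding the root."""
        def count(node):
            return sum(1 + count(child) for child in node.children.values())
        return count(self.root)


S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
trie = Trie(S)
print(trie.contains("bull"), trie.contains("bu"), trie.node_count())
```

For the example set the trie has 21 labeled nodes, fewer than the total string size of 31 because common prefixes such as "be" and "st" are shared.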
Analysis of Standard Tries
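A standard trie uses O(n) space and supports searches, insertions and deletions in time O(dm), where:
n = total size of the strings in S (n = 31 for the example set, so the trie uses O(31) space)
m = size of the string parameter of the operation (m = 4 for bull)
d = size of the alphabet (d = 26 for the lowercase English alphabet)
An operation visits at most m + 1 nodes, and at each node finding the right child among up to d children takes O(d) time.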
Word Matching with a Trie
We insert the words of the text into a trie.
Each leaf stores the occurrences of the associated word in the text.
Text (character positions 0 to 88):
 0-23:  see a bear? sell stock!
24-46:  see a bull? buy stock!
47-68:  bid stock! bid stock!
69-88:  hear the bell? stop!
Starting positions stored at the leaves (the trie in the figure has branches b, h and s; the words "a" and "the" are not indexed):
bear: 6
bell: 78
bid: 47, 58
bull: 30
buy: 36
hear: 69
see: 0, 24
sell: 12
stock: 17, 40, 51, 62
stop: 84
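The word index on this slide can be reproduced with a short Python sketch. It is illustrative only: the nested-dict layout, the `"$"` key that holds the position list, the `skip` parameter and the function names are assumptions, and words are assumed to consist of lowercase letters.

```python
import re

TEXT = ("see a bear? sell stock! see a bull? buy stock! "
        "bid stock! bid stock! hear the bell? stop!")


def build_word_trie(text, skip=("a", "the")):
    """Build a nested-dict trie of the words in text.

    The node where a word ends carries the key "$", whose value is the
    list of starting positions of that word in the text.
    """
    root = {}
    for match in re.finditer(r"[a-z]+", text):
        word = match.group()
        if word in skip:          # words left out of the index, as in the figure
            continue
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(match.start())
    return root


def occurrences(root, word):
    """Return the starting positions of word, or [] if it is not indexed."""
    node = root
    for ch in word:
        node = node.get(ch)
        if node is None:
            return []
    return node.get("$", [])


index = build_word_trie(TEXT)
print(occurrences(index, "stock"))
```

Looking up a word costs time proportional to its length, independent of the length of the text.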
Compressed Tries
A compressed trie has internal nodes of degree at least two
It is obtained from the standard trie by compressing chains of “redundant” nodes
Example: compressed trie for S = { bear, bell, bid, bull, buy, sell, stock, stop }
(root)
  b
    e
      ar
      ll
    id
    u
      ll
      y
  s
    ell
    to
      ck
      p
Compact Tries
Compact Trie:
Replace a chain of one-child nodes with an edge labeled with a string
Each non-leaf node (except root) has at least two children
Example (every string ends with the terminator $): the standard trie for bear, bid, bulk, bull, sun, sunday becomes the compact trie
(root)
  b
    ear$
    id$
    ul
      k$
      l$
  sun
    $
    day$
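Chain compression can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: tries are nested dicts mapping an edge label to a subtree, every string is terminated with `$` as in the figure, and the names `build_trie` and `compress` are hypothetical.

```python
def build_trie(words):
    """Standard trie as nested dicts; "$" terminates every word."""
    root = {}
    for w in words:
        node = root
        for ch in w + "$":
            node = node.setdefault(ch, {})
    return root


def compress(node):
    """Merge every chain of one-child nodes into a single string-labeled edge."""
    out = {}
    for label, child in node.items():
        # While the child has exactly one child, absorb it into the label.
        while len(child) == 1:
            (ch, grandchild), = child.items()
            label += ch
            child = grandchild
        out[label] = compress(child)
    return out


S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
compact = compress(build_trie(S))
print(sorted(compact["b"]))
```

The terminator matters when one string is a prefix of another: for sun and sunday the node reached by "sun" keeps two children, `$` and `day$`, so the shorter word is not lost.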
Compact Tries II
Implementation:
Strings are external to the structure in one array; each edge is labeled with a pair of indices (from, to) into that array.
Compact Representation
Compact representation of a compressed trie for an array of strings:
Stores at the nodes ranges of indices instead of substrings
Uses O(s) space, where s is the number of strings in the array
Serves as an auxiliary index structure
Example array:
S[0] = see     S[4] = bull    S[7] = hear
S[1] = bear    S[5] = buy     S[8] = bell
S[2] = sell    S[6] = bid     S[9] = stop
S[3] = stock
Each node stores a triple i, j, k that stands for the substring S[i][j..k] (shown in parentheses):
(root)
  1, 0, 0  (b)
    1, 1, 1  (e)
      1, 2, 3  (ar)
      8, 2, 3  (ll)
    6, 1, 2  (id)
    4, 1, 1  (u)
      4, 2, 3  (ll)
      5, 2, 2  (y)
  7, 0, 3  (hear)
  0, 0, 0  (s)
    0, 1, 1  (e)
      0, 2, 2  (e)
      2, 2, 3  (ll)
    3, 1, 2  (to)
      3, 3, 4  (ck)
      9, 3, 3  (p)
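The index triples can be computed directly from the array. The following Python sketch is an assumption-laden illustration: the function names `build` and `label` are hypothetical, the strings are assumed to be distinct with none a prefix of another (true for the example array), and each node uses the lowest string index in its group, which is the choice that matches the figure.

```python
def build(S, ids=None, depth=0):
    """Return {(i, j, k): subtree} for the strings S[i], i in ids.

    All strings in ids agree on their first `depth` characters.
    A triple (i, j, k) stands for the substring S[i][j..k], inclusive.
    """
    if ids is None:
        ids = range(len(S))
    groups = {}
    for i in ids:
        groups.setdefault(S[i][depth], []).append(i)
    children = {}
    for ch in sorted(groups):
        group = groups[ch]
        first = S[group[0]]
        end = depth + 1
        # Extend the label while every string in the group has the same next character.
        while all(len(S[g]) > end and S[g][end] == first[end] for g in group):
            end += 1
        subtree = build(S, group, end) if len(group) > 1 else {}
        children[(group[0], depth, end - 1)] = subtree
    return children


def label(S, triple):
    """Recover the substring that a triple stands for."""
    i, j, k = triple
    return S[i][j:k + 1]


S = ["see", "bear", "sell", "stock", "bull", "buy", "bid", "hear", "bell", "stop"]
tree = build(S)
print([label(S, t) for t in tree])
```

Each node holds three integers regardless of how long its label is, which is why the structure needs O(s) space on top of the array itself.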
Suffix Trie
The suffix trie of a string X is the compressed trie of all the suffixes of X
Example: X = minimize (positions 0 to 7)
(root)
  e
  i
    mize
    nimize
    ze
  mi
    nimize
    ze
  nimize
  ze
Analysis of Suffix Tries
Compact representation of the suffix trie for a string X of size n from an alphabet of size d:
Uses O(n) space
Supports arbitrary pattern matching queries in X in O(dm) time, where m is the size of the pattern
Can be constructed in O(n) time
Example: X = minimize (positions 0 to 7); each node stores the index range i, j of its label X[i..j]
(root)
  7, 7  (e)
  1, 1  (i)
    4, 7  (mize)
    2, 7  (nimize)
    6, 7  (ze)
  0, 1  (mi)
    2, 7  (nimize)
    6, 7  (ze)
  2, 7  (nimize)
  6, 7  (ze)
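Pattern matching with a suffix trie can be illustrated with a deliberately naive Python sketch. Note the assumptions: this builds an uncompressed trie of all suffixes in O(n^2) time and space, not the compact O(n) structure described above, and the names `suffix_trie` and `is_substring` are hypothetical.

```python
def suffix_trie(x):
    """Uncompressed trie (nested dicts) of all suffixes of x; naive O(n^2) build."""
    root = {}
    for start in range(len(x)):
        node = root
        for ch in x[start:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(root, pattern):
    """A pattern occurs in x exactly when it labels a path starting at the root."""
    node = root
    for ch in pattern:
        node = node.get(ch)
        if node is None:
            return False
    return True


trie = suffix_trie("minimize")
print(is_substring(trie, "nimi"), is_substring(trie, "zen"))
```

Every substring of X is a prefix of some suffix of X, so a query only walks down from the root for at most m steps.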
Search and Insertion in Tries
Trie-Search(t, P[k..m])   // assumes every stored string, and P itself, ends with the terminator $
  if t is leaf then return true
  else if t.child(P[k]) = nil then return false
  else return Trie-Search(t.child(P[k]), P[k+1..m])
Word Matching with Tries
(Figure: compact trie of the words of a text T with positions 1 to 39. Edges carry index pairs (i, j) into T, such as (1,2), (31,34) and (17,18); leaves store the starting positions of the words, such as 1, 6, 12, 17, 20, 31 and 25,35.)
To find a word P:
At each node, follow the edge (i, j) whose label T[i..j] matches the next j - i + 1 characters of P
If there is no such edge, P does not occur in T; otherwise, read all starting indices of P from the leaf that is reached
Patricia trie (Practical Algorithm To Retrieve Information Coded In Alphanumeric)
Patricia trie:
a compact trie where each edge’s label (from, to) is replaced by (T[from], to - from + 1), that is, by the first character of the label and the label's length
(Figure: the same trie with Patricia labels. For example, edge (1,2) becomes (t,2), edge (31,34) becomes (a,4) and edge (17,18) becomes (w,2); the leaves 1, 6, 12, 17, 20, 31 and 25,35 are unchanged.)
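The label conversion is a one-line computation, sketched here in Python. The text `T` below is an assumption: the section does not give the text, and this one was chosen only because it is consistent with the edge labels shown in the figures. The function name `patricia_label` is also hypothetical, and indices are 1-based and inclusive as on the slides.

```python
def patricia_label(T, start, end):
    """Convert a compact-trie edge label (start, end) into a Patricia label.

    start and end are 1-based, inclusive positions in T.
    The result is (first character of the label, length of the label).
    """
    return (T[start - 1], end - start + 1)


# Assumed text, not given in the source; used for illustration only.
T = "they think that we were there and there"

edges = [(1, 2), (31, 34), (17, 18), (8, 11), (4, 5)]
patricia = [patricia_label(T, i, j) for i, j in edges]
print(patricia)
```

Because a Patricia edge keeps only the first character and a length, a search skips the remaining characters of each edge, so a candidate found at a leaf still has to be compared against T to confirm the match.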
k-d tree
k-dimensional indexing
Definition
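Let k be a positive integer. Let t be a k-d tree, with a root node p.
Each node has k keys, numbered 0 to k-1, and a discriminator j: j = 0 at the root p, and j = (depth of the node) mod k below it.
Then, for any node n in t with discriminator j:
The key j, j+1, …, k-1, 0, …, j-1 (the keys compared in this cyclic order, starting with key j) of any node q in the left subtree of n is smaller than that of node n,
The key j, j+1, …, k-1, 0, …, j-1 of any node q in the right subtree of n is larger than that of node n.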
k-d tree
Example
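2-d tree of seven points (the root compares x, the next level y, then x again):
(20,31)
  left: (15,15)
    left: (6,6)
  right: (36,10)
    right: (31,40)
      left: (25,16)
      right: (40,36)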
Insertion
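A new point descends from the root, comparing x at even depths and y at odd depths, and becomes a new leaf where the path leaves the tree.
Inserting (20,31), (15,15), (36,10), (6,6), (31,40), (25,16), (40,36) in this order produces the example tree.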
Exact Search
Example: searching for (40, 36) in the example tree follows the path (20,31), (36,10), (31,40), (40,36).
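Insertion and exact search on the example points can be sketched in Python as follows. This is a simplified illustration, not the definition's full rule: it compares only the discriminating coordinate and sends ties to the right instead of using the cyclic key comparison. The names `KDNode`, `insert` and `search` are assumptions.

```python
class KDNode:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(root, point, depth=0):
    """Insert point below root and return the (possibly new) subtree root."""
    if root is None:
        return KDNode(point)
    axis = depth % len(point)            # discriminator at this depth
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:                                # ties go to the right subtree
        root.right = insert(root.right, point, depth + 1)
    return root


def search(root, point):
    """Return (path, found): the points visited and whether point is stored."""
    path, node, depth = [], root, 0
    while node is not None:
        path.append(node.point)
        if node.point == point:
            return path, True
        axis = depth % len(point)
        node = node.left if point[axis] < node.point[axis] else node.right
        depth += 1
    return path, False


points = [(20, 31), (15, 15), (36, 10), (6, 6), (31, 40), (25, 16), (40, 36)]
root = None
for p in points:
    root = insert(root, p)

print(search(root, (40, 36)))
```

An exact search follows a single root-to-leaf path, so its cost is bounded by the height of the tree, which depends on the insertion order.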
Range search
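A range search reports every point inside a query rectangle. At each node it descends only into the subtrees whose side of the splitting line overlaps the rectangle.
(Figure: the example tree with points (20,31), (15,15), (36,10), (6,6), (31,40), (25,16), (40,36).)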
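A range search over the example points might look like the Python sketch below. It is self-contained and makes several assumptions: nodes are plain lists `[point, left, right]`, ties go to the right subtree, the query is an axis-parallel box given by corner tuples `lo` and `hi`, and the optional `visited` list exists only to show which nodes are pruned.

```python
def build(points):
    """Insert points one by one into a k-d tree of lists [point, left, right]."""
    root = None
    for p in points:
        if root is None:
            root = [p, None, None]
            continue
        node, depth = root, 0
        while True:
            axis = depth % len(p)
            side = 1 if p[axis] < node[0][axis] else 2   # 1 = left, 2 = right
            if node[side] is None:
                node[side] = [p, None, None]
                break
            node, depth = node[side], depth + 1
    return root


def range_search(node, lo, hi, depth=0, visited=None):
    """Return the points p with lo[i] <= p[i] <= hi[i] for every coordinate i."""
    if node is None:
        return []
    point, left, right = node
    if visited is not None:
        visited.append(point)
    axis = depth % len(point)
    found = []
    if all(l <= c <= h for l, c, h in zip(lo, point, hi)):
        found.append(point)
    # The left subtree holds keys smaller than point[axis].
    if lo[axis] < point[axis]:
        found += range_search(left, lo, hi, depth + 1, visited)
    # The right subtree holds keys greater than or equal to point[axis].
    if hi[axis] >= point[axis]:
        found += range_search(right, lo, hi, depth + 1, visited)
    return found


points = [(20, 31), (15, 15), (36, 10), (6, 6), (31, 40), (25, 16), (40, 36)]
tree = build(points)
visited = []
result = range_search(tree, (20, 10), (35, 45), visited=visited)
print(sorted(result))
```

For the query box with corners (20, 10) and (35, 45), the whole left subtree of the root is skipped because every point in it has x smaller than 20.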
Quad Trees
Motivation for Studying Quad Trees
Point Quadtrees
• Point of decomposition at data points: each stored point splits its region into four quadrants.
Point Quadtree: Insertion
Order of data point insertion matters:
(Figure: the same four points, numbered 1 to 4, inserted in different orders give differently shaped trees.)
An optimized point quadtree is constructed such that for any node x, the number of nodes in any of its quadrants will not exceed half the total number of nodes in the subtree rooted at x.
Procedure for constructing optimized point quadtrees:
(1) Sort points by x-value.
(2) Assign median point m as root of tree.
(3) Choosing m divides the remaining points into 4 groups, one per quadrant of m.
(4) Repeat procedure on each group.
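The four-step procedure can be sketched in Python. This is a minimal sketch under stated assumptions: points are distinct and have distinct x-values, quadrants are named NE, NW, SW, SE with boundary points going east or north, nodes are plain dicts, and the sample points are borrowed from the k-d tree example purely for illustration. All function names are hypothetical.

```python
QUADRANTS = ("NE", "NW", "SW", "SE")


def quadrant(m, p):
    """Name of the quadrant of m that contains p."""
    east = p[0] >= m[0]
    north = p[1] >= m[1]
    if north:
        return "NE" if east else "NW"
    return "SE" if east else "SW"


def build_optimized(points):
    """Return a dict {"point": m, "NE": ..., "NW": ..., "SW": ..., "SE": ...} or None."""
    if not points:
        return None
    pts = sorted(points)                      # (1) sort by x-value
    mid = len(pts) // 2
    m = pts[mid]                              # (2) the median point is the root
    groups = {q: [] for q in QUADRANTS}
    for p in pts[:mid] + pts[mid + 1:]:       # (3) split the rest into 4 groups
        groups[quadrant(m, p)].append(p)
    node = {"point": m}
    for q in QUADRANTS:                       # (4) repeat on each group
        node[q] = build_optimized(groups[q])
    return node


def size(node):
    return 0 if node is None else 1 + sum(size(node[q]) for q in QUADRANTS)


def is_optimized(node):
    """Check that no quadrant holds more than half the nodes of its subtree."""
    if node is None:
        return True
    half = size(node) / 2
    return all(size(node[q]) <= half and is_optimized(node[q]) for q in QUADRANTS)


points = [(20, 31), (15, 15), (36, 10), (6, 6), (31, 40), (25, 16), (40, 36)]
tree = build_optimized(points)
print(tree["point"], is_optimized(tree))
```

The bound holds because at most half of the points lie west of the median and at most half lie east of it, and each quadrant is contained in one of those two sides.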
Point Quad Tree: Deletion
PR (Point Region) Quad Trees
PR Quadtree
Advantages:
Tree shape is independent of the order of data point insertion. It depends only on the arrangement of data points in space.
Deletion is straightforward, since all data points reside in leaf nodes.
Drawbacks:
Certain quadrants may require many subdivisions to separate densely clumped points, leading to deep search paths.
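Both properties can be observed with a small Python sketch of PR quadtree insertion. The class below is an illustration under assumptions: the region is a square `[x, x + size)` by `[y, y + size)`, each leaf holds at most one point, duplicate points are ignored, and the names `PRQuadtree`, `depth` and `shape` are hypothetical.

```python
class PRQuadtree:
    """Square region; data points are stored only in leaves."""

    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.point = None       # the point held by a leaf, if any
        self.children = None    # list of four sub-quadrants once subdivided

    def _child_for(self, p):
        half = self.size / 2
        col = 1 if p[0] >= self.x + half else 0
        row = 1 if p[1] >= self.y + half else 0
        return self.children[2 * row + col]

    def insert(self, p):
        if self.children is None:
            if self.point is None:          # empty leaf: store the point here
                self.point = p
                return
            if self.point == p:             # duplicate: nothing to do
                return
            # Occupied leaf: subdivide into four equal quadrants and push the old point down.
            half = self.size / 2
            self.children = [PRQuadtree(self.x + dx * half, self.y + dy * half, half)
                             for dy in (0, 1) for dx in (0, 1)]
            old, self.point = self.point, None
            self._child_for(old).insert(old)
        self._child_for(p).insert(p)

    def depth(self):
        if self.children is None:
            return 0
        return 1 + max(c.depth() for c in self.children)

    def shape(self):
        """Nested tuples describing the tree, used to compare two trees."""
        if self.children is None:
            return self.point
        return tuple(c.shape() for c in self.children)


def build(points, size=64):
    tree = PRQuadtree(0, 0, size)
    for p in points:
        tree.insert(p)
    return tree


points = [(20, 31), (15, 15), (36, 10), (6, 6), (31, 40), (25, 16), (40, 36)]
same_shape = build(points).shape() == build(list(reversed(points))).shape()
print(same_shape, build([(1, 1), (40, 40)]).depth(), build([(1, 1), (2, 2)]).depth())
```

In a 64 by 64 region, two well-separated points need a single subdivision, while the clumped pair (1, 1) and (2, 2) forces five levels before the points fall into different quadrants.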