1.advanced Tree Structures

Uploaded by

rgn12

Data Structures & Algorithms II

CSC 357 2.0

Text books:
1. C.L. Shaffer, Data Structures & Algorithm Analysis, Prentice Hall, 1997.
2. R. Sedgewick, Algorithms in C++, Addison-Wesley, 1992.

Tries, K-D trees, Quad trees

Preprocessing Strings
• Preprocessing the pattern speeds up pattern matching queries
• If the text is large, immutable and searched often (e.g., the works of Shakespeare), we may want to preprocess the text instead of the pattern
• A trie is a compact data structure for representing a set of strings, such as all the words in a text
  • A trie supports pattern matching queries in time proportional to the pattern size

Standard Tries
The standard trie for a set of strings S is an ordered tree such that:
• Each node but the root is labeled with a character
• The children of a node are alphabetically ordered
• The paths from the root to the external nodes yield the strings of S
Example: standard trie for the set of strings
S = { bear, bell, bid, bull, buy, sell, stock, stop } (total size n = 31)
[Figure: standard trie for S. The root has children b and s. Below b: e-a-r, e-l-l, i-d, u-l-l, u-y. Below s: e-l-l, t-o-c-k, t-o-p.]

Analysis of Standard Tries
A standard trie uses O(n) space and supports searches, insertions and deletions in time O(dm), where:
• n is the total size of the strings in S (n = 31 in the example)
• m is the size of the string parameter of the operation (m = 4 for "bull")
• d is the size of the alphabet (d = 26 in the example)

Word Matching with a Trie
We insert the words of the text into a trie. Each leaf stores the occurrences of the associated word in the text.

Text (character positions 0 to 88):
see a bear? sell stock! see a bull? buy stock! bid stock! bid stock! hear the bell? stop!

Leaves of the trie and the starting positions they store:
bear: 6
bell: 78
bid: 47, 58
bull: 30
buy: 36
hear: 69
see: 0, 24
sell: 12
stock: 17, 40, 51, 62
stop: 84

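The word index above can be reproduced with a short sketch. It is only an illustration under assumptions: the names TrieNode, build_word_trie and find_word are made up here, children are kept in a dict rather than an ordered list, and the words "a" and "the" are skipped because they do not appear in the slide's trie.

```python
import re

class TrieNode:
    def __init__(self):
        self.children = {}      # character -> TrieNode
        self.occurrences = []   # start positions of the word ending here

def build_word_trie(text, skip=()):
    """Insert every word of `text`, recording where each one starts."""
    root = TrieNode()
    for match in re.finditer(r"[a-z]+", text):
        word = match.group()
        if word in skip:        # assumption: these words are left out of the index
            continue
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.occurrences.append(match.start())
    return root

def find_word(root, word):
    """Return the start positions of `word`, or [] if it is absent."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return []
    return node.occurrences

text = ("see a bear? sell stock! see a bull? buy stock! "
        "bid stock! bid stock! hear the bell? stop!")
trie = build_word_trie(text, skip={"a", "the"})
print(find_word(trie, "stock"))   # [17, 40, 51, 62]
```

A lookup walks one node per character of the query, so its cost depends on the word length and not on the length of the text.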
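Compressed Tries
A compressed trie has internal nodes of degree at least two. It is obtained from the standard trie by compressing chains of “redundant” nodes.
[Figure: compressed trie for S = { bear, bell, bid, bull, buy, sell, stock, stop }, shown above the standard trie for the same set. The root has children b and s; b has children e (with children ar, ll), id and u (with children ll, y); s has children ell and to (with children ck, p).]
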
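Compact Tries
Compact Trie:
• Replace a chain of one-child nodes with an edge labeled with a string
• Each non-leaf node (except root) has at least two children
[Figure: a standard trie for the words bear, bid, bulk, bull, sun and sunday, each terminated by $, next to the equivalent compact trie, whose edges are labeled b, ear$, id$, ul, k$, l$, sun, $ and day$.]
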
Compact Tries II
Implementation:
• Strings are external to the structure, stored in one array; edges are labeled with indices into the array (from, to)
Can be used to do word matching: find where the given word appears in the text.
• Use the compact trie to “store” all words in the text
• Each leaf in the compact trie has a list of indices in the text where the corresponding word appears

Compact Representation
Compact representation of a compressed trie for an array of strings:
• Stores at the nodes ranges of indices instead of substrings
• Uses O(s) space, where s is the number of strings in the array
• Serves as an auxiliary index structure

S[0] = see     S[4] = bull    S[7] = hear
S[1] = bear    S[5] = buy     S[8] = bell
S[2] = sell    S[6] = bid     S[9] = stop
S[3] = stock

Each node stores a triple (i, j, k) standing for the substring S[i][j..k]:
root
  (1, 0, 0) = b
    (1, 1, 1) = e
      (1, 2, 3) = ar
      (8, 2, 3) = ll
    (6, 1, 2) = id
    (4, 1, 1) = u
      (4, 2, 3) = ll
      (5, 2, 2) = y
  (7, 0, 3) = hear
  (0, 0, 0) = s
    (0, 1, 1) = e
      (0, 2, 2) = e
      (2, 2, 3) = ll
    (3, 1, 2) = to
      (3, 3, 4) = ck
      (9, 3, 3) = p

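A small sketch of how the index triples decode back into edge labels. The helper name label is an assumption made for illustration; the array S and the triples are the ones on the slide.

```python
S = ["see", "bear", "sell", "stock", "bull", "buy", "bid", "hear", "bell", "stop"]

def label(i, j, k):
    """Substring S[i][j..k], both ends inclusive, as in the node triples."""
    return S[i][j:k + 1]

# (i, j, k) triples on the path from the root to the leaf for "stock".
path = [(0, 0, 0), (3, 1, 2), (3, 3, 4)]
print("".join(label(*t) for t in path))   # stock
```

Each node costs three integers regardless of how long its label is, which is where the O(s) space bound comes from.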
Suffix Trie
The suffix trie of a string X is the compressed trie of all the suffixes of X

X = m i n i m i z e
    0 1 2 3 4 5 6 7

root
  e
  i
    mize
    nimize
    ze
  mi
    nimize
    ze
  nimize
  ze

Analysis of Suffix Tries
Compact representation of the suffix trie for a string X of size n from an alphabet of size d:
• Uses O(n) space
• Supports arbitrary pattern matching queries in X in O(dm) time, where m is the size of the pattern
• Can be constructed in O(n) time

X = m i n i m i z e
    0 1 2 3 4 5 6 7

Each node stores the index range (i, j) of its label X[i..j]:
root
  (7, 7) = e
  (1, 1) = i
    (4, 7) = mize
    (2, 7) = nimize
    (6, 7) = ze
  (0, 1) = mi
    (2, 7) = nimize
    (6, 7) = ze
  (2, 7) = nimize
  (6, 7) = ze

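To make the pattern matching claim concrete, here is a deliberately naive sketch. It builds an uncompressed suffix trie as nested dicts, which takes O(n^2) time and space; it is not the O(n) construction or the compact representation mentioned above. The function names are assumptions.

```python
def build_suffix_trie(x):
    """Uncompressed suffix trie as nested dicts (naive O(n^2) build)."""
    root = {}
    for start in range(len(x)):
        node = root
        for ch in x[start:]:
            node = node.setdefault(ch, {})
    return root

def contains(root, pattern):
    """True if `pattern` occurs anywhere in the indexed string."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = build_suffix_trie("minimize")
print(contains(trie, "nimi"), contains(trie, "mini"), contains(trie, "zen"))
# True True False
```

Every substring of X is a prefix of some suffix, so a query only has to follow one path from the root, one step per pattern character.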
Search and Insertion in Tries
Trie-Search(t, P[k..m])
  if t is leaf then return true
  else if t.child(P[k]) = nil then return false
  else return Trie-Search(t.child(P[k]), P[k+1..m])

• The search algorithm just follows the path down the tree (starting with Trie-Search(root, P[0..m]))

Trie-Insert(t, P[k..m])
  if t is not leaf then    // otherwise P is already present
    if t.child(P[k]) = nil then
      create a new child of t and a “branch” starting with that child and storing P[k..m]
    else Trie-Insert(t.child(P[k]), P[k+1..m])

• How would the delete work?

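One possible answer to the delete question, sketched for a standard (uncompressed) trie. This is an assumption about what the slide intends: the word is unmarked at its last node, and on the way back up every node that has no children and ends no word is pruned. The class and function names are made up for the example.

```python
class Node:
    def __init__(self):
        self.children = {}     # character -> Node
        self.is_word = False   # True if a stored word ends at this node

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.is_word = True

def search(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word

def delete(node, word, k=0):
    """Remove word[k:] below `node`; return True if `node` can be pruned."""
    if k == len(word):
        node.is_word = False
    else:
        child = node.children.get(word[k])
        if child is None:
            return False               # word is not stored; change nothing
        if delete(child, word, k + 1):
            del node.children[word[k]]  # drop the now useless branch
    return not node.children and not node.is_word

root = Node()
for w in ["bear", "bell", "bid"]:
    insert(root, w)
delete(root, "bell")
print(search(root, "bell"), search(root, "bear"))   # False True
```

Only the nodes that belonged to the deleted word alone are removed; the shared prefix "be" survives because "bear" still uses it.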
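Word Matching with Tries
T: they think that we were there and there   (positions numbered from 1)

Compact trie of the words of T. Each edge (i, j) stands for T[i..j] and each leaf lists the starting positions of its word ("_" marks a space):
root
  (1, 2) = th
    (3, 3) = e
      (4, 5) = y_        leaf: 1
      (28, 30) = re_     leaf: 25, 35
    (8, 11) = ink_       leaf: 6
    (14, 16) = at_       leaf: 12
  (17, 18) = we
    (19, 19) = _         leaf: 17
    (22, 24) = re_       leaf: 20
  (31, 34) = and_        leaf: 31

To find a word P:
• At each node, follow the edge (i, j) whose label T[i..j] matches the next j - i + 1 characters of P
• If there is no such edge, P does not occur in T; otherwise, when a leaf is reached, it gives all starting indices of P
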
Patricia trie (practical algorithm to retrieve information coded in alphanumeric)
Patricia trie:
• a compact trie where each edge’s label (from, to) is replaced by (T[from], to - from + 1)

T: they think that we were there and there   (positions numbered from 1)

root
  (t, 2)
    (e, 1)
      (y, 2)     leaf: 1
      (r, 3)     leaf: 25, 35
    (i, 4)       leaf: 6
    (a, 3)       leaf: 12
  (w, 2)
    (_, 1)       leaf: 17
    (r, 3)       leaf: 20
  (a, 4)         leaf: 31

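The relabeling rule can be checked directly against the text T. In this sketch the helper name patricia_label is an assumption, and positions are 1-based as on the slide.

```python
T = "they think that we were there and there"

def patricia_label(frm, to):
    """Map a compact-trie edge (from, to), 1-based and inclusive,
    to the Patricia label (first character, length)."""
    return (T[frm - 1], to - frm + 1)

edges = [(1, 2), (3, 3), (8, 11), (14, 16), (31, 34)]
print([patricia_label(f, t) for f, t in edges])
# [('t', 2), ('e', 1), ('i', 4), ('a', 3), ('a', 4)]
```

A Patricia edge keeps only the branching character and a skip count, so a search that reaches a leaf still has to compare the word against the text to confirm the match.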
k-d tree
k-dimensional indexing

Definition
Let k be a positive integer. Let t be a k-d tree with root node p. Then, for any node n in t with discriminator j (the depth of n modulo k):
• The superkey (keys j, j+1, …, k-1, 0, …, j-1, taken in that order) of any node q in the left subtree of n is smaller than that of node n,
• The superkey (keys j, j+1, …, k-1, 0, …, j-1, taken in that order) of any node q in the right subtree of n is larger than that of node n.

k-d tree

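Example
[Figure: a 2-d tree storing seven points; the discriminator alternates between x and y from one level to the next]
(20,31)
  left: (15,15)
    left: (6,6)
  right: (36,10)
    right: (31,40)
      left: (25,16)
      right: (40,36)
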
Insertion
[Figure: the k-d tree from the Example slide, used to illustrate insertion]

Exact Search
[Figure: searching the k-d tree from the Example slide for the point (40, 36)]

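Insertion and exact search can be sketched as follows. This is an illustration, not the course's code: the insertion order is an assumption chosen so that the result has the shape of the example tree, and ties on the discriminating coordinate simply go right instead of being resolved by the full superkey.

```python
class KDNode:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None

def insert(node, point, depth=0):
    """Insert `point`, cycling the discriminator through the k coordinates."""
    if node is None:
        return KDNode(point)
    j = depth % len(point)                 # discriminator at this level
    if point[j] < node.point[j]:
        node.left = insert(node.left, point, depth + 1)
    else:                                  # simplification: ties go right
        node.right = insert(node.right, point, depth + 1)
    return node

def contains(node, point):
    """Exact search: follow one root-to-leaf path."""
    depth = 0
    while node is not None:
        if node.point == point:
            return True
        j = depth % len(point)
        node = node.left if point[j] < node.point[j] else node.right
        depth += 1
    return False

root = None
for p in [(20, 31), (15, 15), (36, 10), (6, 6), (31, 40), (25, 16), (40, 36)]:
    root = insert(root, p)
print(contains(root, (40, 36)), contains(root, (40, 35)))   # True False
```

The search for (40, 36) compares x at the root, y at (36,10) and x again at (31,40), going right each time.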
Range search
[Figure: the k-d tree from the Example slide, used to illustrate range search]

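A range search visits a subtree only if the query rectangle can reach it. The sketch below hard-codes the example tree as nested (point, left, right) tuples; the query rectangle and the function name are assumptions, and the pruning tests assume that the left subtree holds strictly smaller keys on the discriminating coordinate.

```python
# (point, left, right) triples; None marks an empty subtree.
TREE = ((20, 31),
        ((15, 15), ((6, 6), None, None), None),
        ((36, 10), None,
         ((31, 40), ((25, 16), None, None), ((40, 36), None, None))))

def range_search(node, lo, hi, depth=0, found=None):
    """Collect points p with lo[i] <= p[i] <= hi[i] for every coordinate i."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    if all(l <= c <= h for l, c, h in zip(lo, point, hi)):
        found.append(point)
    j = depth % len(point)
    if lo[j] < point[j]:       # range extends below the split: look left
        range_search(left, lo, hi, depth + 1, found)
    if hi[j] >= point[j]:      # range extends to or above the split: look right
        range_search(right, lo, hi, depth + 1, found)
    return found

print(range_search(TREE, (18, 12), (38, 38)))   # [(20, 31), (25, 16)]
```

For this query the subtree below (6,6) and the left side of (36,10) are never entered, which is the pruning that makes the search cheaper than scanning every point.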
Quad Trees

Motivation for Studying Quad Trees
Quad trees allow efficient querying of points in multidimensional space by pruning the search space.
Applications:
• Photon Mapping
• Point Cloud Processing

Point Quadtrees
• The points of decomposition are the data points themselves.

Point Quadtree: Insertion
Order of data point insertion matters:
[Figure: the same four points, numbered 1 to 4, stored as an unbalanced point quadtree and as an optimized point quadtree]

An optimized point quadtree is constructed such that for any node x, the number of nodes in any of its quadrants will not exceed half the total number of nodes in the subtree rooted at x.
Procedure for constructing optimized point quadtrees:
(1) Sort points by x-value.
(2) Assign median point m as root of tree.
(3) By choosing m, remaining points get divided into 4 groups.
(4) Repeat procedure on each group.
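The four-step procedure can be sketched as below. Assumptions made for the example: all x-values are distinct, the lower median is used when the number of points is even, and a tree is a plain (point, [four subtrees]) tuple. The names quadrant, build_optimized, size and balanced are hypothetical.

```python
def quadrant(p, c):
    """Index 0..3 of the quadrant around centre `c` that contains `p`."""
    return (2 if p[0] >= c[0] else 0) + (1 if p[1] >= c[1] else 0)

def build_optimized(points):
    """Return (point, [four subtrees]) or None; assumes distinct x-values."""
    if not points:
        return None
    pts = sorted(points)                     # (1) sort by x-value
    m = pts[(len(pts) - 1) // 2]             # (2) median point becomes the root
    groups = [[], [], [], []]
    for p in pts:
        if p != m:
            groups[quadrant(p, m)].append(p)  # (3) four groups around m
    return (m, [build_optimized(g) for g in groups])   # (4) recurse

def size(tree):
    return 0 if tree is None else 1 + sum(size(t) for t in tree[1])

def balanced(tree):
    """Check that no quadrant holds more than half of its subtree's nodes."""
    if tree is None:
        return True
    n = size(tree)
    return all(size(t) <= n / 2 and balanced(t) for t in tree[1])

tree = build_optimized([(1, 1), (2, 2), (3, 3), (4, 4)])
print(tree[0], balanced(tree))   # (2, 2) True
```

Because the root is the median in x, at most half of the points lie to its west and at most half to its east, and each quadrant is a subset of one of those halves.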
Point Quad Tree: Deletion
Naïve approach: Remove node x and re-insert the nodes in subtree(x). This can be very inefficient, since every node in the subtree must be re-inserted.
Smart approach: The general idea is to pick a replacement node that minimizes the number of nodes requiring re-insertion. As illustrated below, in a good scenario, we only need to replace node x without having to restructure the tree.
[Figure: a point quadtree with the node to be deleted marked]

PR (Point Region) Quad Trees
Space is subdivided repeatedly into congruent quadrants until all quadrants contain no more than one data point.
The bucket PR quadtree allows no more than some b > 1 number of data points per quadrant.
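A minimal sketch of PR quadtree insertion with a bucket capacity. It assumes a square region, distinct data points (identical points would be split forever) and an illustrative class name PRQuadtree; capacity=1 gives the plain PR quadtree and capacity=b the bucket variant.

```python
class PRQuadtree:
    """Square region [x, x+size) x [y, y+size) with at most `capacity` points per leaf."""
    def __init__(self, x, y, size, capacity=1):
        self.x, self.y, self.size, self.capacity = x, y, size, capacity
        self.points = []        # data points live only in leaves
        self.children = None    # four congruent sub-quadrants once split

    def _child_for(self, p):
        half = self.size / 2
        index = (1 if p[0] >= self.x + half else 0) + (2 if p[1] >= self.y + half else 0)
        return self.children[index]

    def insert(self, p):
        if self.children is not None:          # internal node: pass the point down
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity:   # leaf overflow: subdivide
            half = self.size / 2
            self.children = [
                PRQuadtree(self.x + dx * half, self.y + dy * half, half, self.capacity)
                for dy in (0, 1) for dx in (0, 1)]
            pending, self.points = self.points, []
            for q in pending:
                self._child_for(q).insert(q)

    def shape(self):
        """Nested lists of leaf contents, handy for comparing tree shapes."""
        if self.children is None:
            return sorted(self.points)
        return [c.shape() for c in self.children]

a = PRQuadtree(0, 0, 8)
for p in [(1, 1), (6, 6), (1, 3)]:
    a.insert(p)
b = PRQuadtree(0, 0, 8)
for p in [(1, 3), (6, 6), (1, 1)]:
    b.insert(p)
print(a.shape() == b.shape())   # True
```

The two trees come out identical because the subdivision depends only on where the points lie, not on the order in which they arrive.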
Quadtree Comparison
Point Quadtree
• advantages:
  • Compact, because the number of tree nodes equals the number of data points
  • Shorter search paths (~log_4 N) compared to k-d trees (~log_2 N)
• drawbacks:
  • Deletion of a node involves finding a suitable replacement and rearranging the nodes of its subtree.
  • Tree shape depends on the order of data point insertion. Inserting in arbitrary order may result in unbalanced trees.

PR Quadtree
• advantages:
  • Tree shape is independent of the order of data point insertion. It depends only on the arrangement of data points in space.
  • Deletion is straightforward, since all data points reside in leaf nodes.
• drawbacks:
  • Certain quadrants may require many subdivisions to separate densely clumped points, leading to deep search paths.
