Multidimensional Search Trees
CS 302 Data Structures
Dr. George Bebis
Query Types
Exact Match Query
Example:
key=ID: retrieve the record with ID=12345
Range Query
Example:
key=Age: retrieve all records satisfying
20 < Age < 50
key= #Children: retrieve all records satisfying
1 < #Children < 4
Nearest-Neighbor(s) (NN) Query
Example:
key=Salary: retrieve the employee whose salary
is closest to $50,000 (i.e., 1-NN).
key=Age: retrieve the 5 employees whose age is
closest to 40 (i.e., k-NN, k=5).
Nearest Neighbor(s) Query
What is the closest restaurant to my hotel?
Nearest Neighbor(s) Query (contd)
Find the 4 closest restaurants to my hotel
Multi-dimensional Query
Nearest Neighbor Query in High Dimensions
Very important and practical problem!
Image retrieval: find the N closest matches (i.e., N nearest neighbors)
using feature vectors (f1, f2, ..., fk)
Nearest Neighbor Query in High Dimensions
Face recognition
We will discuss
Range trees
KD-trees
Quadtrees
Interpreting Queries Geometrically
Multi-dimensional keys can be thought of as
points in high-dimensional spaces.
Example 1 - Range Search in 2D
Example 2 - Range Search in 3D
Example 3 - Nearest Neighbors Search
[Figure: a query point and its nearest neighbors.]
1D Range Search
1D Range Search
Range: [x, x']
1D Range Search
Data Structure 2: BST
Search using binary search property.
Some subtrees are eliminated during search.
Search using, for a range [l, r] and a node with key x:
if l <= x <= r, report x
if l < x, search the left subtree
if r > x, search the right subtree
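The pruning rule for a plain BST can be sketched as follows. This is a minimal illustration, assuming distinct integer keys and an unbalanced tree; the names `BSTNode`, `bst_insert` and `bst_range` are made up for this example and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BSTNode:
    key: int
    left: Optional["BSTNode"] = None
    right: Optional["BSTNode"] = None


def bst_insert(root: Optional[BSTNode], key: int) -> BSTNode:
    """Plain (unbalanced) BST insertion; assumes distinct keys."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root


def bst_range(node: Optional[BSTNode], l: int, r: int, out: List[int]) -> None:
    """Append every key in [l, r] to out, in increasing order."""
    if node is None:
        return
    if l < node.key:        # smaller keys may still be >= l
        bst_range(node.left, l, r, out)
    if l <= node.key <= r:
        out.append(node.key)
    if r > node.key:        # larger keys may still be <= r
        bst_range(node.right, l, r, out)


root = None
for key in [50, 25, 75, 10, 39, 55, 120]:
    root = bst_insert(root, key)
found: List[int] = []
bst_range(root, 25, 90, found)
print(found)  # [25, 39, 50, 55, 75]
```

A subtree is skipped whenever the comparison with the node's key shows that none of its keys can fall inside the range.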
1D Range Search
Data Structure 3: BST with data stored in leaves
Internal nodes store splitting values (i.e., not
necessarily same as data).
Data points are stored in the leaf nodes.
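BST with data stored in leaves
[Figure: example tree whose internal nodes hold splitting values and whose leaves hold the data points; values shown include 0, 10, 25, 39, 50, 55, 75, 100 and 120.]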
1D Range Search
Retrieving data in [x, x']
Perform binary search twice, once using x and once using x'.
Suppose the binary searches end at leaves l and l'.
The points in [x, x'] are the ones stored between l and l' plus,
possibly, the points stored in l and l'.
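As a rough illustration of the "binary search twice" idea, the sketch below models the leaf level as a sorted Python list and uses the standard `bisect` module in place of walking a tree. That substitution is an assumption made for brevity, and `range_query_sorted` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right
from typing import List


def range_query_sorted(leaves: List[int], x: int, x_prime: int) -> List[int]:
    """Return all values in [x, x_prime] from a sorted list of leaf values."""
    first = bisect_left(leaves, x)         # first position with value >= x
    last = bisect_right(leaves, x_prime)   # first position with value > x_prime
    return leaves[first:last]              # everything stored in between


print(range_query_sorted([10, 25, 39, 50, 55, 75, 120], 25, 90))
# [25, 39, 50, 55, 75]
```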
1D Range Search
Example: retrieve all points in [25, 90]
The search path for 25 is:
1D Range Search
The search for 90 is:
1D Range Search
Examine the leaves in the sub-trees between the
two traversing paths from the root.
1D Range Search
How do we find the leaves of interest?
Find the split node (i.e., the node where the
paths to x and x' split).
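One way to sketch the split-node idea on a BST with data in the leaves is shown below. It assumes distinct values and the convention that an internal node's split value is the largest value in its left subtree; the names `Node`, `build`, `find_split_node` and `range_query` are illustrative only.

```python
from typing import List, Optional


class Node:
    def __init__(self, value: int, left: Optional["Node"] = None,
                 right: Optional["Node"] = None):
        self.value = value          # split value (internal) or data point (leaf)
        self.left, self.right = left, right

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build(pts: List[int]) -> Node:
    """Build from a non-empty sorted list; left subtree holds values <= split."""
    if len(pts) == 1:
        return Node(pts[0])
    mid = (len(pts) - 1) // 2
    return Node(pts[mid], build(pts[:mid + 1]), build(pts[mid + 1:]))


def find_split_node(root: Node, lo: int, hi: int) -> Node:
    """Descend while both search paths go the same way."""
    v = root
    while not v.is_leaf() and (hi <= v.value or lo > v.value):
        v = v.left if hi <= v.value else v.right
    return v


def leaves(v: Node) -> List[int]:
    return [v.value] if v.is_leaf() else leaves(v.left) + leaves(v.right)


def range_query(root: Node, lo: int, hi: int) -> List[int]:
    split = find_split_node(root, lo, hi)
    if split.is_leaf():
        return [split.value] if lo <= split.value <= hi else []
    out, pending = [], []
    v = split.left                  # follow the path toward lo
    while not v.is_leaf():
        if lo <= v.value:
            pending.append(v.right)  # whole right subtree lies in the range
            v = v.left
        else:
            v = v.right
    if lo <= v.value <= hi:
        out.append(v.value)
    for sub in reversed(pending):    # deepest subtree holds the smallest values
        out.extend(leaves(sub))
    v = split.right                 # follow the path toward hi
    while not v.is_leaf():
        if v.value <= hi:
            out.extend(leaves(v.left))  # whole left subtree lies in the range
            v = v.right
        else:
            v = v.left
    if lo <= v.value <= hi:
        out.append(v.value)
    return out


tree = build([10, 25, 39, 50, 55, 75, 120])
print(range_query(tree, 25, 90))  # [25, 39, 50, 55, 75]
```

Only the two search paths below the split node are walked; every subtree hanging between them is reported wholesale.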
1D Range Search
Speed-up search by keeping the leaves in
sorted order using a linked-list.
2D Range Search
2D Range Search (contd)
A 2D range query can be decomposed into two 1D
range queries:
One on the x-coordinate of the points.
The other on the y-coordinate of the points.
2D Range Search (contd)
Store a primary 1D range tree for all the points
based on x-coordinate.
For each node, store a secondary 1D range tree based
on y-coordinate.
2D Range Search (contd)
Range Tree
2D Range Search (contd)
Search using the x-coordinate only.
How to restrict to points with proper y-coordinate?
2D Range Search (contd)
Recursively search within each subtree using
the y-coordinate.
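The two-level idea can be sketched compactly as follows. As a simplification, and as an assumption of this example, each node's secondary structure is a y-sorted list searched with `bisect` rather than a second tree; `RangeNode` and `query` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class RangeNode:
    """Primary tree on x; each node keeps its subtree's points sorted by y."""

    def __init__(self, pts_by_x: List[Point]):
        self.xmin, self.xmax = pts_by_x[0][0], pts_by_x[-1][0]
        self.by_y = sorted(pts_by_x, key=lambda p: p[1])  # secondary structure
        self.ys = [p[1] for p in self.by_y]
        self.left: Optional["RangeNode"] = None
        self.right: Optional["RangeNode"] = None
        if len(pts_by_x) > 1:
            mid = len(pts_by_x) // 2
            self.left = RangeNode(pts_by_x[:mid])
            self.right = RangeNode(pts_by_x[mid:])


def query(node: Optional[RangeNode], x1: float, x2: float,
          y1: float, y2: float, out: List[Point]) -> None:
    """Collect points with x1 <= x <= x2 and y1 <= y <= y2."""
    if node is None or node.xmax < x1 or node.xmin > x2:
        return                                  # x-range misses this subtree
    if x1 <= node.xmin and node.xmax <= x2:
        # Whole subtree is inside the x-range: a 1D query on y finishes the job.
        i, j = bisect_left(node.ys, y1), bisect_right(node.ys, y2)
        out.extend(node.by_y[i:j])
        return
    query(node.left, x1, x2, y1, y2, out)
    query(node.right, x1, x2, y1, y2, out)


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = RangeNode(sorted(points))        # the constructor expects x-sorted input
hits: List[Point] = []
query(tree, 3, 8, 2, 7, hits)
print(sorted(hits))  # [(4, 7), (5, 4), (7, 2)]
```

Storing a y-sorted copy at every node is what makes the structure larger than a single tree.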
Range Search in d dimensions
1D query time: $O(\log n + k)$
KD Tree
A binary search tree where every node is a
k-dimensional point.
[Figure: KD-tree node with left and right subtrees Pleft and Pright; the points (27, 28) and (65, 51) are shown.]
KD Tree (contd)
As we move down the tree, we divide the space along
axis-aligned hyperplanes, usually (but not always) alternating between axes.
KD Tree - Example
Split by y-coordinate: split by a horizontal line that
has half the points below or on and half above.
KD Tree - Example
Split by x-coordinate: split by a vertical line that
has half the points left or on, and half right.
KD Tree - Example
Split by y-coordinate: split by a horizontal line that
has half the points below or on and half above.
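The median splits in the example can be sketched as a recursive build. This version makes a few assumptions that the slides do not state: it starts on axis 0 (x), strictly alternates axes, and stores one point in every node. `KDNode` and `build_kdtree` are illustrative names.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point: Point, axis: int,
                 left: Optional["KDNode"] = None,
                 right: Optional["KDNode"] = None):
        self.point, self.axis = point, axis
        self.left, self.right = left, right


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Split at the median of the current axis, alternating axes by depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                      # median point becomes this node
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),       # below or on the split
                  build_kdtree(pts[mid + 1:], depth + 1))   # on or above the split


root = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(root.point, root.left.point, root.right.point)  # (7, 2) (5, 4) (9, 6)
```

Re-sorting at every level keeps the sketch short; presorting the points once per axis is a common refinement.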
Node Structure
A KD-tree node has 5 fields
Splitting axis
Splitting value
Data
Left pointer
Right pointer
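The five fields map directly onto a small record type. The sketch below uses a Python dataclass; the field names and the sample values are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KDNode:
    axis: int                          # splitting axis (0 = x, 1 = y, ...)
    value: float                       # splitting value along that axis
    data: Tuple[float, ...]            # the stored point
    left: Optional["KDNode"] = None    # left pointer
    right: Optional["KDNode"] = None   # right pointer


# Hypothetical two-node tree: the root splits on x at 5.0.
root = KDNode(axis=0, value=5.0, data=(5.0, 4.0),
              left=KDNode(axis=1, value=3.0, data=(2.0, 3.0)))
print(root.left.data)  # (2.0, 3.0)
```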
Splitting Strategies
Divide based on order of point insertion
Assumes that points are given one at a time.
KD Tree (contd)
Let's discuss
Insert
Delete
Search
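Insert new data
Compare the new point with each node along that node's splitting axis and descend, e.g.:
55 > 53, move right
55 < 99, move left
[Figure: KD-tree with levels alternating between x and y splits, containing the points (65, 51), (27, 28), (70, 3), (99, 90), (30, 11), (31, 85), (38, 23), (73, 75) and (15, 61).]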
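Insertion by repeated comparison can be sketched as below. The sketch assumes strictly alternating axes starting with x and sends ties to the right, neither of which the slides specify; the sample points are made up and `kd_insert` is a hypothetical name.

```python
from typing import Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point: Point, axis: int):
        self.point, self.axis = point, axis
        self.left: Optional["KDNode"] = None
        self.right: Optional["KDNode"] = None


def kd_insert(node: Optional[KDNode], point: Point, depth: int = 0) -> KDNode:
    """Walk down comparing on each node's axis; attach a new leaf at the end."""
    if node is None:
        return KDNode(point, depth % len(point))
    if point[node.axis] < node.point[node.axis]:
        node.left = kd_insert(node.left, point, depth + 1)    # smaller: move left
    else:
        node.right = kd_insert(node.right, point, depth + 1)  # otherwise: move right
    return node


root = None
for p in [(30, 40), (5, 25), (70, 70), (10, 12), (50, 30), (35, 45)]:
    root = kd_insert(root, p)
print(root.right.left.left.point)  # (35, 45)
```

Because the tree shape depends on insertion order, this strategy does not guarantee a balanced tree.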
KD Tree - Range Search
Consider a KD Tree where the data is stored
at the leaves. How do we perform a range
search?
KD Tree - Region of a node
The region region(v) corresponding to a node
v is a rectangle, which is bounded by
splitting lines stored at ancestors of v.
KD Tree - Region of a node (contd)
A point is stored in the subtree rooted at node v if and only if it lies in region(v).
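A simplified range search is sketched below. It departs from the slides in two labeled ways: every node stores a point (rather than only the leaves), and instead of materializing region(v) it prunes a subtree by comparing the node's split coordinate with the query box. All names are illustrative.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Median split with alternating axes (an assumed construction)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def range_search(node: Optional[KDNode], lo: Point, hi: Point,
                 out: List[Point]) -> None:
    """Report points p with lo[i] <= p[i] <= hi[i] on every axis i."""
    if node is None:
        return
    p, a = node.point, node.axis
    if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
        out.append(p)
    if lo[a] <= p[a]:      # left side (coords <= split) can overlap the box
        range_search(node.left, lo, hi, out)
    if hi[a] >= p[a]:      # right side (coords >= split) can overlap the box
        range_search(node.right, lo, hi, out)


tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
hits: List[Point] = []
range_search(tree, (3, 2), (8, 7), hits)
print(sorted(hits))  # [(4, 7), (5, 4), (7, 2)]
```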
Space requirements:
KD tree: $O(n)$
Range tree: $O(n \log^{d-1} n)$
Query requirements:
KD tree: $O(n^{1-1/d} + k)$, which approaches $O(n + k)$ as d increases!
Range tree: $O(\log^{d} n + k)$
Nearest Neighbor (NN) Search
Given: a set P of n points in $R^d$
Goal: find the nearest neighbor p of q in P
Euclidean distance between $p = (x_1, y_1)$ and $q = (x_2, y_2)$:
$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
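As a baseline, the problem statement can be solved by a linear scan that evaluates the Euclidean distance to every point. The sketch below is that brute-force reference, with made-up sample points; the tree and grid structures aim to avoid this full scan.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(p: Point, q: Point) -> float:
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_neighbor(P: Sequence[Point], q: Point) -> Point:
    """Brute force: O(n) distance evaluations per query."""
    return min(P, key=lambda p: euclidean(p, q))


print(euclidean((0, 0), (3, 4)))                            # 5.0
print(nearest_neighbor([(1, 1), (6, 5), (9, 9)], (5, 5)))   # (6, 5)
```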
Nearest Neighbor Search - Variations
Array (Grid) Structure
(1) Subdivide the plane into a grid of M x N square cells (same size)
Array (Grid) Structure
Algorithm
* Look up the cell holding the query point.
* First examine the cell containing the query,
then the cells adjacent to the query
(i.e., there could be points in adjacent
cells that are closer).
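The grid lookup can be sketched as a search over growing rings of cells around the query's cell. This version assumes a dictionary keyed by integer cell indices instead of a fixed M x N array, and it keeps expanding beyond the adjacent cells until no farther ring can contain a closer point; `build_grid` and `grid_nearest` are hypothetical names.

```python
import math
from collections import defaultdict


def build_grid(points, cell):
    """Bucket 2D points by the square cell (side length `cell`) they fall in."""
    grid = defaultdict(list)
    for p in points:
        grid[(int(p[0] // cell), int(p[1] // cell))].append(p)
    return grid


def grid_nearest(grid, cell, q):
    """Examine the query's cell first, then rings of cells around it."""
    if not grid:
        return None
    qi, qj = int(q[0] // cell), int(q[1] // cell)
    max_ring = max(max(abs(i - qi), abs(j - qj)) for i, j in grid)
    best, best_d = None, math.inf
    for r in range(max_ring + 1):
        # Every point in ring r is at least (r - 1) * cell away from q.
        if best_d <= (r - 1) * cell:
            break
        for i in range(qi - r, qi + r + 1):
            for j in range(qj - r, qj + r + 1):
                if max(abs(i - qi), abs(j - qj)) != r:
                    continue                    # not on ring r
                for p in grid.get((i, j), ()):
                    d = math.hypot(p[0] - q[0], p[1] - q[1])
                    if d < best_d:
                        best, best_d = p, d
    return best


pts = [(1, 1), (4.5, 4.2), (9, 9), (5.2, 3.9)]
grid = build_grid(pts, cell=2)
print(grid_nearest(grid, 2, (5, 5)))  # (4.5, 4.2)
```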
Comments
* A uniform grid is inefficient if the points are unequally distributed.
- Too close together: long lists in each grid cell, serial search.
- Too far apart: a large number of neighboring cells must be searched.
Input: point set P
while some cell C contains more than k points do
    Split cell C
end
[Figure: quadtree subdivision of a region containing points a through l, with the corresponding tree; child quadrants are labeled SW, SE, NW, NE.]
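The splitting loop can be written recursively, as in the sketch below. It assumes 2D points inside a known bounding box and adds a `max_depth` guard (not part of the pseudocode) so that duplicate points cannot cause endless splitting; `QuadNode`, `build_quadtree` and `leaves` are illustrative names.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]     # (x0, y0, x1, y1)


class QuadNode:
    def __init__(self, box: Box, points: List[Point]):
        self.box = box
        self.points = points                        # non-empty only in leaves
        self.children: Dict[str, "QuadNode"] = {}   # keys: SW, SE, NW, NE


def build_quadtree(points: List[Point], box: Box, k: int = 1,
                   max_depth: int = 16) -> QuadNode:
    """Split any cell holding more than k points into four equal quadrants."""
    node = QuadNode(box, list(points))
    if len(points) <= k or max_depth == 0:
        return node
    x0, y0, x1, y1 = box
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    boxes = {"SW": (x0, y0, xm, ym), "SE": (xm, y0, x1, ym),
             "NW": (x0, ym, xm, y1), "NE": (xm, ym, x1, y1)}
    parts: Dict[str, List[Point]] = {name: [] for name in boxes}
    for p in points:
        name = ("N" if p[1] >= ym else "S") + ("E" if p[0] >= xm else "W")
        parts[name].append(p)
    node.points = []
    node.children = {name: build_quadtree(parts[name], boxes[name], k, max_depth - 1)
                     for name in boxes}
    return node


def leaves(node: QuadNode) -> List[QuadNode]:
    if not node.children:
        return [node]
    return [leaf for child in node.children.values() for leaf in leaves(child)]


pts = [(10, 10), (20, 80), (60, 60), (90, 20), (70, 75), (65, 90)]
tree = build_quadtree(pts, (0, 0, 100, 100), k=1)
print(max(len(leaf.points) for leaf in leaves(tree)))  # 1
```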
[Figure: point quadtree over the points A(50,50), B(75,80), C(90,65), D and E(25,25), with child quadrants labeled SW, NW, SE, NE, and a query labeled P.]
Quadtree Nearest Neighbor Query
[Figure: quadtree nearest neighbor query, shown in steps; the labels (X1,Y1) and (X2,Y2) and the quadrants SW, NW, SE, NE appear in the figure.]
Quadtree Nearest Neighbor Search
Algorithm
Initialize range search with large r
Put the root on a stack
Repeat
Pop the next node T from the stack
For each child C of T
Easy to implement.
Explore the branch of the tree that is closest to the query
point first.
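The stack-based quadtree search outlined above stops at "For each child C of T" in these notes. The sketch below fills in the loop body with one common completion, which is an assumption rather than the slides' own text: a child is pushed only if its cell intersects the ball of radius r around q, and r shrinks whenever a closer point is found. The dictionary-based tree and the names `build`, `box_dist` and `nearest` are illustrative.

```python
import math


def build(points, box, k=1, depth=12):
    """Region quadtree: split a cell into SW, SE, NW, NE while it holds > k points."""
    x0, y0, x1, y1 = box
    if len(points) <= k or depth == 0:
        return {"box": box, "points": list(points), "children": []}
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    boxes = [(x0, y0, xm, ym), (xm, y0, x1, ym), (x0, ym, xm, y1), (xm, ym, x1, y1)]
    parts = [[], [], [], []]
    for p in points:
        parts[(p[0] >= xm) + 2 * (p[1] >= ym)].append(p)   # 0=SW 1=SE 2=NW 3=NE
    kids = [build(parts[i], boxes[i], k, depth - 1) for i in range(4)]
    return {"box": box, "points": [], "children": kids}


def box_dist(box, q):
    """Distance from q to the nearest point of an axis-aligned cell."""
    x0, y0, x1, y1 = box
    dx = max(x0 - q[0], 0.0, q[0] - x1)
    dy = max(y0 - q[1], 0.0, q[1] - y1)
    return math.hypot(dx, dy)


def nearest(root, q):
    best, r = None, math.inf            # "large r"
    stack = [root]
    while stack:
        t = stack.pop()
        if box_dist(t["box"], q) >= r:
            continue                    # r shrank since this node was pushed
        for p in t["points"]:           # leaf: examine points and update r
            d = math.hypot(p[0] - q[0], p[1] - q[1])
            if d < r:
                best, r = p, d
        kids = [c for c in t["children"] if box_dist(c["box"], q) < r]
        # Push the farthest child first so the closest one is popped next.
        kids.sort(key=lambda c: box_dist(c["box"], q), reverse=True)
        stack.extend(kids)
    return best


pts = [(10, 10), (20, 80), (60, 60), (90, 20), (70, 75), (65, 90)]
tree = build(pts, (0, 0, 100, 100))
print(nearest(tree, (66, 80)))  # (70, 75)
```

Ordering the children by distance before pushing them is what implements "explore the closest branch first"; without it the search is still correct but prunes less.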
Nearest Neighbor with KD Trees
[Figure: kD-tree over the 1D points 7, 8, 10, 12, 13, 15, 18 on an axis labeled from 5 to 20; the points are split into the bins {7, 8, 10, 12} and {13, 15, 18}, and then further.]
NN example using kD trees (contd)
d=1 (binary search tree)
[Figure: the same tree with query point 17.]
[Figure: the same tree with query point 16.]
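The 1D example can be replayed with a small nearest-neighbor search. The sketch assumes a median-split tree with one point per node (the slides' figure uses bins), descends into the side containing the query first, and visits the other side only if the splitting plane is closer than the best match so far. It relies on `math.dist`, available from Python 3.8; the other names are illustrative.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def nearest(node: Optional[KDNode], q: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    if node is None:
        return best
    if best is None or math.dist(node.point, q) < math.dist(best, q):
        best = node.point
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, best)            # closest branch first
    if abs(diff) < math.dist(best, q):       # far side could still hold a closer point
        best = nearest(far, q, best)
    return best


tree = build([(7,), (8,), (10,), (12,), (13,), (15,), (18,)])
print(nearest(tree, (17,)))  # (18,)
print(nearest(tree, (16,)))  # (15,)
```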
Principal Component Partitioning (PCP)
KD variations - PCP Trees
Curse of dimensionality
KD-trees are not suitable for efficiently finding the
nearest neighbor in high dimensional spaces.
Query time: $O(n^{1-1/d} + k)$
Approximate Nearest-Neighbor (ANN)
Examine only the N closest bins of the kD-tree
Use a heap to identify bins in order by their distance
from query.
Return nearest-neighbors with high probability
(e.g., 95%).
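A heap-ordered search of this kind can be sketched as follows. This is not the implementation from the cited paper: it assumes one point per node instead of bins, uses the distance to the splitting plane as the priority of an unexplored branch, and caps the work with a hypothetical `max_checks` parameter that counts examined points.

```python
import heapq
import itertools
import math


class KDNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def approx_nearest(root, q, max_checks):
    """Visit branches in order of their distance bound; stop after max_checks."""
    if root is None:
        return None
    best, best_d = None, math.inf
    tie = itertools.count()                  # tie-breaker so nodes are never compared
    heap = [(0.0, next(tie), root)]
    checks = 0
    while heap and checks < max_checks:      # no new descent once the budget is spent
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break                            # nothing left can be closer
        while node is not None:              # descend toward the query
            d = math.dist(node.point, q)
            checks += 1
            if d < best_d:
                best, best_d = node.point, d
            diff = q[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            if far is not None:
                heapq.heappush(heap, (max(bound, abs(diff)), next(tie), far))
            node = near
    return best


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(approx_nearest(tree, (9, 2), max_checks=100))  # (8, 1)
```

With a generous budget the search is exact; a small budget trades accuracy for speed, which is the compromise the approximate approach accepts.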
J. Beis and D. Lowe, Shape Indexing Using Approximate Nearest-Neighbour Search in
High-Dimensional Spaces, IEEE Computer Vision and Pattern Recognition, 1997.
Dimensionality Reduction
Idea: Find a mapping T to reduce the dimensionality
of the data.
Drawback: May not be able to find all similar objects
(i.e., distance relationships might not be preserved)