Spatial Data Indexing and Queries
(from G.Kollios)
Ahmed Mahmood
10/08/2024
Introduction
Suppose we have the following table
Sid   Sname  GPA  Location
0010  David  3.9  (-117.4, 33.9)
0011  Kim    3.9  (-118.1, 35.2)
0014  Joe    3.9  (-122.2, 37.4)
We know how to build an index on one-dimensional data, e.g., Sid, Sname, GPA
■ Other applications:
■ VLSI design, CAD/CAM, model of human brain, etc.
Spatial data types
■ point
■ line
■ region
■ range queries
■ k-nn queries
■ Common methods:
- Z-ordering Curve
- Hilbert Curve
Z-ordering
■ Basic assumption: Finite precision in the representation
of each coordinate, K bits (2^K values)
■ The address space is a square (image) and represented
as a 2^K x 2^K array
■ Each element is called a pixel
Z-ordering
■ Impose a linear ordering on the pixels
of the image →1 dimensional problem
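[Figure: 4 x 4 pixel grid with both axes labeled 00, 01, 10, 11; A at (x = 01, y = 11), B at (x = 01, y = 01)]
ZA = shuffle(xA, yA) = shuffle("01", "11") = 0111 = (7)_10
ZB = shuffle(xB, yB) = shuffle("01", "01") = 0011 = (3)_10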
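The shuffle step can be sketched as plain bit interleaving. This is a minimal illustration, assuming (as the example above suggests) that the x bit comes first in each pair; the function name z_value is a hypothetical helper, not something defined in the slides.

```python
def z_value(x: int, y: int, k: int) -> int:
    """Interleave the K-bit coordinates x and y as x1 y1 x2 y2 ... (x bit first)."""
    z = 0
    for i in range(k - 1, -1, -1):      # walk from the most to the least significant bit
        z = (z << 1) | ((x >> i) & 1)   # append the next bit of x
        z = (z << 1) | ((y >> i) & 1)   # append the next bit of y
    return z


z_a = z_value(0b01, 0b11, 2)  # point A from the example
z_b = z_value(0b01, 0b01, 2)  # point B from the example
print(format(z_a, "04b"), z_a)  # 0111 7
print(format(z_b, "04b"), z_b)  # 0011 3
```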
Example of Z-values
• The left part shows a map with spatial objects A, B, and C
• The right part and the bottom-left part show the z-values within A, B, and C
• Note that C gets z-values 2 and 8, which are not close
• Exercise: Compute the z-values for B.
Fig 4.7
Z-ordering: Summary
■ Given a point (x, y) and the precision K find the pixel for
the point and then compute the z-value
■ Given a set of points, use a B+-tree to index the z-values
■ A range (rectangular) query in 2-d is mapped to a set of
ranges in 1-d
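The first two steps of the summary can be sketched as follows. This is only an illustration under stated assumptions: the bounding box BOUNDS and the precision K = 2 are made up for the example, the names to_pixel and ZIndex are hypothetical, and a sorted Python list searched with bisect stands in for the B+-tree.

```python
import bisect


def z_value(x, y, k):
    """Interleave the K-bit pixel coordinates, x bit first."""
    z = 0
    for i in range(k - 1, -1, -1):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z


def to_pixel(px, py, bounds, k):
    """Map a real-valued point to (column, row) in a 2^K x 2^K pixel grid."""
    xmin, ymin, xmax, ymax = bounds
    n = 1 << k
    col = min(int((px - xmin) / (xmax - xmin) * n), n - 1)  # clamp the upper edge
    row = min(int((py - ymin) / (ymax - ymin) * n), n - 1)
    return col, row


class ZIndex:
    """Sorted list of z-values; a stand-in for a B+-tree on the z-value."""

    def __init__(self, bounds, k):
        self.bounds, self.k = bounds, k
        self.keys = []    # z-values in sorted order
        self.items = []   # payloads, parallel to keys

    def insert(self, px, py, payload):
        col, row = to_pixel(px, py, self.bounds, self.k)
        z = z_value(col, row, self.k)
        i = bisect.bisect_right(self.keys, z)
        self.keys.insert(i, z)
        self.items.insert(i, payload)

    def scan(self, z_lo, z_hi):
        """Payloads whose z-value lies in the 1-d range [z_lo, z_hi]."""
        lo = bisect.bisect_left(self.keys, z_lo)
        hi = bisect.bisect_right(self.keys, z_hi)
        return self.items[lo:hi]


BOUNDS = (-125.0, 30.0, -115.0, 40.0)  # hypothetical address space (lon/lat box)
index = ZIndex(BOUNDS, k=2)
for name, lon, lat in [("David", -117.4, 33.9), ("Kim", -118.1, 35.2), ("Joe", -122.2, 37.4)]:
    index.insert(lon, lat, name)
print(index.keys)
print(index.scan(8, 15))
```

A 2-d range query would call scan once for each 1-d range that the query rectangle maps to.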
Queries
■ Find the z-values of the pixels contained in the query,
then group them into contiguous ranges
[Figure: 4 x 4 pixel grid, axes labeled 00, 01, 10, 11, with query rectangles QA and QB]
QA → range [4, 7]
QB → ranges [2, 3] and [8, 9]
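A brute-force sketch of this mapping: enumerate the pixels in the query rectangle, compute their z-values, and merge consecutive values into ranges. The pixel extents used for QA and QB below are assumptions chosen to be consistent with the ranges stated above; z_ranges is a hypothetical helper, and a practical implementation would decompose the rectangle recursively by quadrant instead of visiting every pixel.

```python
def z_value(x, y, k):
    """Interleave the K-bit pixel coordinates, x bit first."""
    z = 0
    for i in range(k - 1, -1, -1):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z


def z_ranges(x_lo, x_hi, y_lo, y_hi, k):
    """Map an inclusive pixel rectangle to a sorted list of 1-d z-ranges."""
    zs = sorted(z_value(x, y, k)
                for x in range(x_lo, x_hi + 1)
                for y in range(y_lo, y_hi + 1))
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z         # z extends the current run
        else:
            ranges.append([z, z])     # z starts a new run
    return [tuple(r) for r in ranges]


# Assumed extents: QA covers x in 00..01, y in 10..11; QB covers x in 01..10, y in 00..01.
print(z_ranges(0, 1, 2, 3, k=2))  # [(4, 7)]
print(z_ranges(1, 2, 0, 1, k=2))  # [(2, 3), (8, 9)]
```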
Hilbert Curve
■ We want points that are close in 2-d to be close in 1-d
■ Note that in 2-d each point has 4 neighbors, whereas in
1-d it has only 2
■ The Z-curve has some “jumps” that we would like to avoid
■ The Hilbert curve avoids the jumps: recursive definition
Hilbert Curve- example
■ It has been shown that in general Hilbert is better than
the other space filling curves for retrieval [Jag90]
■ H_i (order-i) Hilbert curve for a 2^i x 2^i array
Hilbert vs Z-ordering
■ Compared with Z-ordering, Hilbert tends to map nearby
objects to nearby positions in the 1-d sequence.
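The difference can be checked on a small grid. The sketch below uses the commonly cited iterative conversion from (x, y) to a Hilbert position (the slides define the curve recursively and do not give this code), and compares the largest step between consecutive pixels along each curve. The names hilbert_index and max_jump are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of pixel (x, y) along the Hilbert curve of an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-quadrant is traversed in the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def z_value(x, y, k):
    """Interleave the K-bit pixel coordinates, x bit first."""
    z = 0
    for i in range(k - 1, -1, -1):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z


def max_jump(order):
    """Largest Manhattan distance between consecutive pixels along a curve."""
    return max(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(order, order[1:]))


n, k = 8, 3
pixels = [(x, y) for x in range(n) for y in range(n)]
hilbert_order = sorted(pixels, key=lambda p: hilbert_index(n, p[0], p[1]))
z_order = sorted(pixels, key=lambda p: z_value(p[0], p[1], k))
print("Hilbert max jump:", max_jump(hilbert_order))
print("Z-order max jump:", max_jump(z_order))
```

Consecutive Hilbert positions are always grid neighbors, while the Z-curve occasionally jumps across the grid.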
Motivation for Spatial Indexes
■ No space-filling curve preserves proximity perfectly when
mapping multi-dimensional points to one dimension.
■ Spatial objects include complicated geometric
shapes.
■ Space-filling curves depend on a predefined
precision, so they cannot adapt to skewed data.
R-trees
■ [Guttman 84] Main idea: extend B+-tree to
multi-dimensional spaces!
R-trees
■ A multi-way external memory tree
■ Index nodes and data (leaf) nodes
■ All leaf nodes appear on the same level
■ Every node other than the root contains between M/2 and M entries
■ The root node has at least 2 entries (children), unless it is a leaf
Example
■ E.g., with tree fanout 4: group nearby rectangles into
parent MBRs; each group → disk page
[Figure: data rectangles A to J]
Example
■ F=4
[Figure: rectangles A to J grouped into parent MBRs P1 to P4; leaf nodes (A, B, C), (D, E), (F, G), (H, I, J)]
Example
■ F=4
[Figure: the same rectangles and parent MBRs, with the root node (P1, P2, P3, P4) above the leaf nodes (A, B, C), (D, E), (F, G), (H, I, J)]
R-trees - format of nodes
■ {(MBR; obj_ptr)} for leaf nodes
[Figure: leaf node (A, B, C) below the root (P1, P2, P3, P4); each leaf entry holds x-low, x-high, y-low, y-high, and an obj ptr]
R-trees - format of nodes
■ {(MBR; node_ptr)} for non-leaf nodes
[Figure: root (P1, P2, P3, P4) above leaf node (A, B, C); each non-leaf entry holds x-low, x-high, y-low, y-high, and a node ptr]
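The two node formats can be sketched with a single entry type that carries either an object pointer (leaf) or a node pointer (non-leaf). The class and field names below (MBR, Entry, Node, cover) are illustrative choices, and the coordinates of A, B, C are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(frozen=True)
class MBR:
    x_low: float
    x_high: float
    y_low: float
    y_high: float

    def union(self, other: "MBR") -> "MBR":
        """Smallest rectangle covering both self and other."""
        return MBR(min(self.x_low, other.x_low), max(self.x_high, other.x_high),
                   min(self.y_low, other.y_low), max(self.y_high, other.y_high))


@dataclass
class Entry:
    mbr: MBR
    node_ptr: Optional["Node"] = None  # set in non-leaf entries
    obj_ptr: Any = None                # set in leaf entries


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

    def cover(self) -> MBR:
        """MBR of all entries in this node; this is what the parent entry stores."""
        result = self.entries[0].mbr
        for e in self.entries[1:]:
            result = result.union(e.mbr)
        return result


# Hypothetical coordinates for three data rectangles.
leaf = Node(is_leaf=True, entries=[
    Entry(MBR(0, 2, 0, 1), obj_ptr="A"),
    Entry(MBR(1, 3, 2, 4), obj_ptr="B"),
    Entry(MBR(2, 5, 1, 2), obj_ptr="C"),
])
root = Node(is_leaf=False, entries=[Entry(leaf.cover(), node_ptr=leaf)])
print(root.entries[0].mbr)
```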
R-trees – Example with Point Data
[Figure: points a to i on a grid (x and y axes from 0 to 10), enclosed by MBRs E1 to E9, next to the corresponding R-tree (Root at the top, points a to i in the leaves; some node contents omitted)]
R-trees:Search
[Figure: search on the example tree: root (P1, P2, P3, P4); leaf nodes (A, B, C), (D, E), (F, G), (H, I, J)]
R-trees: Main points
■ Every parent node completely covers its ‘children’
■ A child MBR may be covered by more than one parent - it is stored
under ONLY ONE of them. (ie., no need for duplicate elimination)
■ A query may follow multiple branches
■ Everything works for any(?) dimensionality
■ well, up to few dimensions (maybe 6-7)
■ dimensionality curse (space becomes very sparse)
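A range search that follows every branch whose MBR intersects the query can be sketched as below. Rectangles are (x_low, y_low, x_high, y_high) tuples; the Node class and the coordinates are assumptions made for the example.

```python
class Node:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        self.entries = entries  # list of (mbr, child_node) or, in a leaf, (mbr, obj)


def intersects(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, query):
    """Return all objects whose MBR intersects the query rectangle."""
    results = []
    for mbr, target in node.entries:
        if intersects(mbr, query):
            if node.is_leaf:
                results.append(target)
            else:
                results.extend(search(target, query))  # may descend into several children
    return results


# Hypothetical two-leaf tree.
leaf1 = Node(True, [((0, 0, 2, 2), "A"), ((1, 3, 3, 5), "B")])
leaf2 = Node(True, [((4, 1, 6, 3), "C"), ((5, 4, 7, 6), "D")])
root = Node(False, [((0, 0, 3, 5), leaf1), ((4, 1, 7, 6), leaf2)])

print(search(root, (2.5, 1.5, 4.5, 2.5)))  # ['C']
```

In this example the query intersects both parent MBRs, so both leaves are visited, yet only one of them contributes a result; the other visit is wasted on dead space.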
R-trees: Insertion
■ How do we find the node in which to insert the new
object Y?
■ Use ChooseLeaf: at each level, pick the entry that needs
the least enlargement to include Y. Resolve ties by picking
the entry with the smallest area
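The selection rule can be sketched as a minimum over (enlargement, area) pairs. Rectangles are (x_low, y_low, x_high, y_high) tuples; the dictionary-based node layout and the helper names are assumptions made for this illustration.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(mbr, rect):
    """Extra area mbr would need in order to also cover rect."""
    return area(union(mbr, rect)) - area(mbr)


def choose_entry(mbrs, rect):
    """Index of the MBR needing the least enlargement; ties go to the smallest area."""
    return min(range(len(mbrs)),
               key=lambda i: (enlargement(mbrs[i], rect), area(mbrs[i])))


def choose_leaf(node, rect):
    """Descend from node to a leaf, applying choose_entry at every level."""
    while not node["leaf"]:
        mbrs = [mbr for mbr, _child in node["entries"]]
        node = node["entries"][choose_entry(mbrs, rect)][1]
    return node


# Hypothetical two-leaf tree.
leaf_a = {"leaf": True, "entries": []}
leaf_b = {"leaf": True, "entries": []}
root = {"leaf": False, "entries": [((0, 0, 4, 4), leaf_a), ((5, 0, 7, 2), leaf_b)]}
print(choose_leaf(root, (6, 1, 6.5, 1.5)) is leaf_b)  # True
```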
R-trees: Insertion
Insert X
[Figure: X added to the example tree; the leaf nodes become (A, B, C), (D, E, X), (F, G), (H, I, J)]
R-trees: Insertion
Insert Y
[Figure: Y shown next to the example tree, before insertion; leaf nodes (A, B, C), (D, E), (F, G), (H, I, J)]
R-trees: Insertion
■ Extend the parent MBR
[Figure: the parent MBR is extended to cover Y; the leaf nodes become (A, B, C), (D, E, Y), (F, G), (H, I, J)]
R-trees:Insertion
■ Use ChooseLeaf to find the leaf node in which to insert an
entry E
■ If the leaf node is full, Split it; otherwise insert E there
■ Propagate the split upwards, if necessary
■ Adjust the MBRs of the parent nodes
R-trees: Insertion
■ If the node is full, then Split: e.g., insert W
[Figure: W is to be inserted into the full leaf node (A, B, C, K); other leaf nodes (D, E), (F, G), (H, I, J); root (P1, P2, P3, P4)]
R-trees: Insertion
■ If the node is full, then Split: e.g., insert W
[Figure: after the split, the leaf nodes are (A, B), (C, K, W), (D, E), (F, G), (H, I, J); parent entries P1, P5, P2, P3, P4; new MBRs Q1 and Q2 at the top level]
R-trees: Split
■ Split node P1: partition the MBRs into two groups; each
group will become a node.
• ‘linear’ split
• ‘quadratic’ split
[Figure: node P1 containing A, B, C, K, W]
R-trees: Split
Why a good split is very important:
The aim is to minimize the area of the newly created nodes, which makes
future searches more efficient.
[Figure: rectangle R between seed1 and seed2]
R-trees: Split
■ Pick two rectangles as ‘seeds’;
■ Assign each rectangle ‘R’ to the ‘closest’ ‘seed’:
■ ‘closest’: the smallest increase in area
[Figure: rectangle R between seed1 and seed2]
R-trees: Split
■ How to pick Seeds:
■ Linear: Find the highest low side and lowest high side in
R-trees: Split
■ After the two seeds are chosen, PickNext picks which of the remaining
entries to insert next:
■ Linear: Pick any of the remaining entries
■ Quadratic: Let Q be the set of remaining entries. For each entry E
in Q calculate
■ d1 = The area increase required in the covering rectangle of Group 1 to include E.
■ d2 = The area increase required in the covering rectangle of Group 2 to include E.
■ Choose the entry with the maximum difference between d1 and d2. (The aim is to choose
the entry with the strongest preference for one of the two groups.)
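The quadratic PickNext rule, together with the assignment of each picked entry to the group that needs the smaller area increase, can be sketched as follows. The seeds are taken as given, and the sketch deliberately ignores the minimum-fill requirement that a full split must also respect; all names and coordinates are illustrative.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(cover, rect):
    """Area increase needed for cover to include rect."""
    return area(union(cover, rect)) - area(cover)


def pick_next(remaining, cover1, cover2):
    """Index of the entry with the largest |d1 - d2|."""
    def preference(rect):
        d1 = enlargement(cover1, rect)
        d2 = enlargement(cover2, rect)
        return abs(d1 - d2)
    return max(range(len(remaining)), key=lambda i: preference(remaining[i]))


def distribute(seed1, seed2, rest):
    """Assign the remaining rectangles to two groups grown from the given seeds."""
    group1, group2 = [seed1], [seed2]
    cover1, cover2 = seed1, seed2
    rest = list(rest)
    while rest:
        r = rest.pop(pick_next(rest, cover1, cover2))
        if enlargement(cover1, r) <= enlargement(cover2, r):
            group1.append(r)
            cover1 = union(cover1, r)
        else:
            group2.append(r)
            cover2 = union(cover2, r)
    return group1, group2


g1, g2 = distribute((0, 0, 1, 1), (10, 10, 11, 11),
                    [(1, 1, 2, 2), (9, 9, 10, 10), (5, 4, 6, 5)])
print(g1)
print(g2)
```

Each call to pick_next scans all remaining entries, which is where the quadratic cost comes from; the linear variant would simply take any remaining entry.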
R-Trees: Deletion
■ Find the leaf node that contains the entry E
■ Remove E from this node
■ If the node underflows:
■ Eliminate the node by removing its entries and the
parent entry
■ Reinsert the orphaned entries (the node's other entries) into the tree using
Insert
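The delete-then-reinsert idea can be sketched on a deliberately simplified two-level tree, where a plain list of leaves plays the role of the root and parent MBRs are recomputed on demand. MIN_ENTRIES, the Leaf class, and the simplified insert (least enlargement, no split handling) are assumptions for this illustration; a full implementation would also locate the entry via the MBRs and propagate underflow upwards.

```python
MIN_ENTRIES = 2  # hypothetical minimum fill, i.e. M/2 with M = 4


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


class Leaf:
    def __init__(self, entries):
        self.entries = list(entries)  # list of (mbr, obj)

    def mbr(self):
        cover = self.entries[0][0]
        for rect, _obj in self.entries[1:]:
            cover = union(cover, rect)
        return cover


def insert(leaves, entry):
    """Simplified Insert: pick the leaf needing the least enlargement (no splits)."""
    rect = entry[0]
    best = min(leaves, key=lambda lf: (area(union(lf.mbr(), rect)) - area(lf.mbr()),
                                       area(lf.mbr())))
    best.entries.append(entry)


def delete(leaves, obj):
    """Remove obj; on underflow, eliminate its leaf and reinsert the orphans."""
    for leaf in leaves:
        for entry in leaf.entries:
            if entry[1] == obj:
                leaf.entries.remove(entry)
                if len(leaf.entries) < MIN_ENTRIES and len(leaves) > 1:
                    leaves.remove(leaf)           # drop the node and its parent entry
                    for orphan in leaf.entries:   # reinsert what was left in it
                        insert(leaves, orphan)
                return True
    return False


leaves = [
    Leaf([((0, 0, 1, 1), "A"), ((1, 1, 2, 2), "B")]),
    Leaf([((8, 8, 9, 9), "C"), ((9, 9, 10, 10), "D"), ((7, 8, 8, 9), "E")]),
]
delete(leaves, "A")  # the first leaf underflows, so B is reinserted elsewhere
print([[obj for _rect, obj in leaf.entries] for leaf in leaves])
```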
R-Tree Performance Parameters
■ The area covered by a directory rectangle should be minimized. Here the
aim is to reduce ‘dead’ space (i.e., space covered by the bounding
rectangle but not covered by enclosed rectangles).
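Dead space can be measured directly for a small node: take the area of the bounding rectangle and subtract the area actually covered by the enclosed rectangles. The sketch below computes the covered area on the grid induced by the rectangle edges; dead_space is a hypothetical helper and the inputs are made up.

```python
def dead_space(rects):
    """Area of the bounding rectangle of rects minus the area of their union.

    Rectangles are (x_low, y_low, x_high, y_high) tuples.
    """
    xs = sorted({v for r in rects for v in (r[0], r[2])})
    ys = sorted({v for r in rects for v in (r[1], r[3])})
    covered = 0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            # A grid cell counts as covered if some rectangle contains it entirely.
            if any(r[0] <= x0 and x1 <= r[2] and r[1] <= y0 and y1 <= r[3]
                   for r in rects):
                covered += (x1 - x0) * (y1 - y0)
    bounding_area = (xs[-1] - xs[0]) * (ys[-1] - ys[0])
    return bounding_area - covered


print(dead_space([(0, 0, 1, 1), (2, 2, 3, 3)]))  # 7
print(dead_space([(0, 0, 2, 2), (1, 1, 3, 3)]))  # 2
```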
R+-tree
Note: Figure from https://tildesites.bowdoin.edu/~ltoma/teaching/cs340/spring08/Papers/Rtree-chap2.pdf
R*-tree
■ Splitting Algorithm: consider both minimizing the total area
of the bounding box and the overlap between the entries.
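One way to picture a split that weighs both criteria: sort the entries along one axis, try every cut position, and keep the cut whose two bounding boxes overlap least, breaking ties by total area. This is a simplified single-axis sketch, not the full R*-tree split (which also chooses the split axis and considers more sort orders); min_fill and all function names are assumptions.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def cover(rects):
    """Bounding box of a non-empty list of (x_low, y_low, x_high, y_high) tuples."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def overlap_area(a, b):
    """Area of the intersection of a and b (0 if they are disjoint or only touch)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def best_split_along_x(rects, min_fill):
    """Cut the x-sorted entries where (overlap, total area) is smallest."""
    ordered = sorted(rects, key=lambda r: (r[0], r[2]))
    best = None
    for i in range(min_fill, len(ordered) - min_fill + 1):
        g1, g2 = ordered[:i], ordered[i:]
        c1, c2 = cover(g1), cover(g2)
        score = (overlap_area(c1, c2), area(c1) + area(c2))
        if best is None or score < best[0]:
            best = (score, g1, g2)
    return best[1], best[2]


g1, g2 = best_split_along_x([(0, 0, 1, 1), (5, 0, 6, 1), (1, 0, 2, 1), (6, 0, 7, 1)],
                            min_fill=1)
print(g1, g2)
```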
Thank you
Any questions?