Spatial Indexing I

Point Access Methods

Many slides are based on slides provided by Prof. Christos Faloutsos (CMU) and Prof. Evimaria Terzi (BU)
The problem
• Given a set of points and a rectangular query, find the points enclosed in the query rectangle
• Insertions and deletions are allowed online
(Figure: a point set with a query rectangle)
Grid File
• A hashing method for multidimensional points (an extension of extensible hashing)
• Idea: use a grid to partition the space; each cell is associated with one page
• Two-disk-access principle (exact match)
Grid File
• Start with one bucket for the whole space.
• Select dividers along each dimension. Partition the space into cells.
• Dividers cut all the way across the space.
Grid File
• Each cell corresponds to 1 disk page.
• Many cells can point to the same page.
• The cell directory is potentially exponential in the number of dimensions.
Grid File Implementation
• Dynamic structure using a grid directory
  • Grid array: a 2-dimensional array with pointers to buckets (this array can be large and disk resident): G(0, …, nx-1, 0, …, ny-1)
  • Linear scales: two 1-dimensional arrays that are used to access the grid array (kept in main memory): X(0, …, nx-1), Y(0, …, ny-1)
Example
(Figure: the grid directory, addressed through linear scale X and linear scale Y, with directory cells pointing to buckets/disk blocks)
Grid File Search
• Exact match search: at most 2 I/Os, assuming the linear scales fit in memory.
  • First use the linear scales to determine the index into the cell directory.
  • Access the cell directory to retrieve the bucket address (may cause 1 I/O if the cell directory does not fit in memory).
  • Access the appropriate bucket (1 I/O).
• Range queries:
  • Use the linear scales to determine the indices into the cell directory.
  • Access the cell directory to retrieve the addresses of the buckets to visit.
  • Access the buckets.
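The two search procedures above can be sketched in a few lines. This is a minimal in-memory illustration, not the slides' implementation: the class name `GridFile`, the representation of each linear scale as a sorted list of divider positions, and the toy data are all assumptions made for the example. Disk pages are simulated by Python lists.

```python
from bisect import bisect_right


class GridFile:
    """Toy in-memory grid file; names and layout are illustrative assumptions."""

    def __init__(self, x_scale, y_scale, directory, buckets):
        self.x_scale = x_scale      # sorted divider positions along x (linear scale X)
        self.y_scale = y_scale      # sorted divider positions along y (linear scale Y)
        self.directory = directory  # directory[i][j] -> bucket id (grid array)
        self.buckets = buckets      # bucket id -> list of points (one disk page)

    def cell(self, x, y):
        # The linear scales turn coordinates into directory indices (no I/O).
        return bisect_right(self.x_scale, x), bisect_right(self.y_scale, y)

    def exact_match(self, point):
        i, j = self.cell(*point)
        bucket_id = self.directory[i][j]         # directory access: at most 1 I/O
        return point in self.buckets[bucket_id]  # bucket access: 1 I/O

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        i_lo, j_lo = self.cell(x_lo, y_lo)
        i_hi, j_hi = self.cell(x_hi, y_hi)
        # Collect distinct buckets: many cells may point to the same page.
        bucket_ids = {self.directory[i][j]
                      for i in range(i_lo, i_hi + 1)
                      for j in range(j_lo, j_hi + 1)}
        return sorted(p for b in bucket_ids for p in self.buckets[b]
                      if x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi)


# One divider per dimension gives a 2 x 2 directory; both cells with x < 5 share bucket 0.
gf = GridFile(
    x_scale=[5], y_scale=[5],
    directory=[[0, 0], [1, 2]],
    buckets={0: [(1, 1), (2, 8)], 1: [(6, 2)], 2: [(7, 9)]},
)
print(gf.exact_match((2, 8)))
print(gf.range_query(0, 6, 0, 5))
```

Collecting bucket ids in a set before reading them reflects the point that several directory cells can share one page, so a range query should fetch each page only once.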
Grid File Insertions
• Determine the bucket into which the insertion must occur.
• If there is space in the bucket, insert.
• Else, split the bucket.
  • How to choose a good dimension to split?
• If the bucket split causes the cell directory to split, do so and adjust the linear scales.
• Inserting the new directory entries potentially requires a complete reorganization of the cell directory: expensive!
Grid File Deletions
• Deletions may decrease the space utilization, so buckets are merged.
• We need to decide which cells to merge and choose a merging threshold.
• Two approaches: buddy system and neighbor system
  • Buddy system: a bucket can merge with only one buddy in each dimension
  • Neighbor system: merge adjacent regions if the result is a rectangle
Tree-based PAMs
• Most tree-based PAMs are based on the kd-tree
• The kd-tree is a main-memory binary tree for indexing k-dimensional points
  • It needs to be adapted for the disk model
• Levels rotate among the dimensions, partitioning the space based on a value for that dimension
• The kd-tree is not necessarily balanced
2-dimensional kd-trees
• A data structure to support range queries in R^2
  • Not the most efficient solution in theory
  • Everyone uses it in practice
• Preprocessing time: O(n log n)
• Space complexity: O(n)
• Query time: O(n^(1/2) + k), where k is the number of reported points
2-dimensional kd-trees
• Algorithm:
  • Choose the x or y coordinate (alternate between them)
  • Choose the median of that coordinate; this defines a horizontal or vertical line
  • Recurse on both sides
• We get a binary tree:
  • Size O(n)
  • Depth O(log n)
  • Construction time O(n log n)
Construction of kd-trees
The complete kd-tree
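The construction algorithm (alternate coordinates, split at the median, recurse) can be sketched as follows. The names `Node`, `build`, `depth` and `leaves` are assumptions for this example, and points are stored only in the leaves, matching the later search slides. For brevity the sketch re-sorts at every level, which costs O(n log^2 n); the O(n log n) bound quoted in the slides needs presorting or linear-time median selection.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    point: Optional[Point] = None   # set only for leaves
    axis: int = 0                   # 0 = split on x, 1 = split on y
    split: float = 0.0              # coordinate of the splitting line
    left: Optional["Node"] = None   # points with coordinate <= split
    right: Optional["Node"] = None  # remaining points


def build(points: List[Point], depth: int = 0) -> Node:
    """Build a 2-d kd-tree from a non-empty list of points."""
    if len(points) == 1:
        return Node(point=points[0])
    axis = depth % 2                                # alternate x and y
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2                       # left half gets ceil(n/2) points
    return Node(axis=axis, split=pts[mid - 1][axis],
                left=build(pts[:mid], depth + 1),
                right=build(pts[mid:], depth + 1))


def depth(node: Node) -> int:
    if node.point is not None:
        return 0
    return 1 + max(depth(node.left), depth(node.right))


def leaves(node: Node) -> List[Point]:
    if node.point is not None:
        return [node.point]
    return leaves(node.left) + leaves(node.right)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), (1, 9), (6, 5)]
root = build(pts)
print(depth(root), len(leaves(root)))
```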
Region of node v
• Region(v): the part of the plane assigned to node v; the subtree rooted at v stores exactly the points that fall in it (the black dots in the figure)
Searching in kd-trees
• Range searching in 2-d
  • Given a set of n points, build a data structure that, for any query rectangle R, reports all points in R
kd-tree: range queries
• Recursive procedure starting from v = root
• Search(v, R)
  • If v is a leaf, then report the point stored in v if it lies in R
  • Otherwise, if Reg(v) is contained in R, report all points in subtree(v)
  • Otherwise:
    • If Reg(left(v)) intersects R, then Search(left(v), R)
    • If Reg(right(v)) intersects R, then Search(right(v), R)
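A runnable sketch of Search(v, R) is given below. It is one possible reading of the procedure, under these assumptions: rectangles are closed and written as (x_lo, y_lo, x_hi, y_hi), Reg(v) is passed down the recursion rather than stored in the nodes, and the helper names (`Node`, `build`, `contained`, `intersects`, `report_subtree`, `search`) are invented for the example.

```python
from math import inf


class Node:
    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point, self.axis, self.split = point, axis, split
        self.left, self.right = left, right


def build(points, depth=0):
    """Median-split construction; points end up in the leaves."""
    if len(points) == 1:
        return Node(point=points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return Node(axis=axis, split=pts[mid - 1][axis],
                left=build(pts[:mid], depth + 1),
                right=build(pts[mid:], depth + 1))


def contained(reg, R):
    """True if region reg lies entirely inside query rectangle R."""
    return R[0] <= reg[0] and R[1] <= reg[1] and reg[2] <= R[2] and reg[3] <= R[3]


def intersects(reg, R):
    return reg[0] <= R[2] and R[0] <= reg[2] and reg[1] <= R[3] and R[1] <= reg[3]


def report_subtree(v, out):
    if v.point is not None:
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def search(v, R, reg=(-inf, -inf, inf, inf), out=None):
    """Report all points of the subtree rooted at v that lie in R."""
    if out is None:
        out = []
    if v.point is not None:                       # leaf: test the stored point
        x, y = v.point
        if R[0] <= x <= R[2] and R[1] <= y <= R[3]:
            out.append(v.point)
    elif contained(reg, R):                       # whole region inside R
        report_subtree(v, out)
    else:
        left_reg, right_reg = list(reg), list(reg)
        left_reg[v.axis + 2] = v.split            # left child: coordinate <= split
        right_reg[v.axis] = v.split               # right child: coordinate >= split
        if intersects(left_reg, R):
            search(v.left, R, tuple(left_reg), out)
        if intersects(right_reg, R):
            search(v.right, R, tuple(right_reg), out)
    return out


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build(pts)
print(sorted(search(root, (3, 1, 8, 5))))
```

Passing the region down the recursion avoids storing a rectangle per node; storing Reg(v) explicitly would work equally well.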
Query time analysis
• We will show that Search takes at most O(n^(1/2) + P) time, where P is the number of reported points
  • The total time needed to report all points in all fully contained subtrees is O(P)
  • We just need to bound the number of nodes v such that region(v) intersects R but is not contained in R (i.e., the boundary of R crosses region(v))
  • Gross overestimation: bound the number of regions region(v) that are crossed by any of the 4 horizontal/vertical lines through the sides of R
Query time (Cont’d)
• Q(n): the maximum number of regions in an n-point kd-tree intersecting a (say, vertical) line ℓ
• If ℓ intersects region(v), where v splits on a vertical line, then two levels down ℓ intersects only 2 of the 4 grandchild regions (the 2 on its side of v's vertical splitting line)
• The number of regions intersecting ℓ is Q(n) = 2 + 2Q(n/4), which gives Q(n) = O(n^(1/2))
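For readers who want the missing step, the recurrence can be unrolled as below. The base case Q(1) = 1 and the assumption that n is a power of 4 are added here for the derivation; they are not stated on the slide.

```latex
\begin{aligned}
Q(n) &= 2 + 2\,Q(n/4) \\
     &= 2 + 4 + 4\,Q(n/16) \\
     &= \sum_{j=1}^{i} 2^{j} + 2^{i}\,Q\!\left(n/4^{i}\right) \\
     &= \left(2^{i+1} - 2\right) + 2^{i}\,Q(1) \qquad \text{for } i = \log_4 n \\
     &= 3\sqrt{n} - 2 \;=\; O\!\left(n^{1/2}\right),
        \qquad \text{since } 2^{\log_4 n} = n^{1/2}.
\end{aligned}
```

As a quick check, Q(4) = 2 + 2Q(1) = 4 and 3·2 − 2 = 4; Q(16) = 2 + 2·4 = 10 and 3·4 − 2 = 10.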
d-dimensional kd-trees
• A data structure to support range queries in R^d
• Preprocessing time: O(n log n)
• Space complexity: O(n)
• Query time: O(n^(1-1/d) + k)
Construction of the d-dimensional kd-trees
• The construction algorithm is similar to the 2-d case
• At the root, we split the set of points into two subsets of the same size by a hyperplane perpendicular to the x1-axis
• At the children of the root, the partition is based on the second coordinate, x2
• At depth d, we start all over again by partitioning on the first coordinate
• The recursion stops when there is only one point left, which is stored as a leaf
External memory kd-trees (kdB-tree)
• Pack many interior nodes (forming a subtree) into a block using a BFS traversal.
  • It may not be feasible to group nodes at the lower levels into a block productively.
  • Many interesting papers on how to optimally pack nodes into blocks have been published recently.
• Similar to a B-tree, tree nodes split many ways instead of two ways.
  • Insertion becomes quite complex and expensive.
  • No storage utilization guarantee: when a higher-level node splits, the split has to be propagated all the way down to the leaf level, resulting in many empty blocks.
LSD-tree
• Local Split Decision tree
• Use a kd-tree to partition the space. Each partition contains up to B points. The kd-tree is stored in main memory.
• If the kd-tree (directory) is large, we store a subtree on disk
• Goal: the structure must remain balanced (external balancing property)
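Example: LSD-tree
(Figure: a kd-tree directory with split nodes x:x1, y:y1, y:y2, x:x2, x:x3, y:y3 and y:y4, divided into an internal directory and an external directory; its leaves point to the buckets N1 … N8, which partition the plane along the split positions x1, x2, x3 and y1, y2, y3, y4)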
LSD-tree: main points
• Split strategies:
  • Data dependent
  • Distribution dependent
• Paging algorithm
• Two types of splits: bucket splits and internal node splits
PAMs
• Point Access Methods
  • Multidimensional hashing: Grid File
    • Exponential growth of the directory
  • Hierarchical methods: kd-tree based
    • Storing in external memory is tricky
  • Space-filling curves: Z-ordering
    • Map points from 2 dimensions to 1 dimension. Use a B+-tree to index the 1-dimensional points
Z-ordering
• Basic assumption: finite precision in the representation of each coordinate, K bits (2^K values)
• The address space is a square (image), represented as a 2^K x 2^K array
• Each element is called a pixel
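Z-ordering
• Impose a linear ordering on the pixels of the image → 1-dimensional problem
(Figure: a 4 x 4 pixel grid with both axes labeled 00, 01, 10, 11; point A is at x = 01, y = 11 and point B is at x = 01, y = 01)
ZA = shuffle(xA, yA) = shuffle(“01”, “11”) = 0111 = 7 (decimal)
ZB = shuffle(“01”, “01”) = 0011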
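The shuffle operation in the example is plain bit interleaving, taking one bit of x, then one bit of y, from the most significant bit down. A minimal sketch that works on bit strings, as the slide does (the function name `shuffle` comes from the slide; the string-based signature is an assumption):

```python
def shuffle(x_bits: str, y_bits: str) -> str:
    """Interleave two equal-length bit strings: x1 y1 x2 y2 ..."""
    if len(x_bits) != len(y_bits):
        raise ValueError("both coordinates must use the same precision K")
    return "".join(xb + yb for xb, yb in zip(x_bits, y_bits))


z_a = shuffle("01", "11")
z_b = shuffle("01", "01")
print(z_a, int(z_a, 2))
print(z_b, int(z_b, 2))
```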
Z-ordering
• Given a point (x, y) and the precision K, find the pixel for the point and then compute the z-value
• Given a set of points, use a B+-tree to index the z-values
• A range (rectangular) query in 2-d is mapped to a set of ranges in 1-d
Queries
• Find the z-values that are contained in the query and then group them into ranges
(Figure: the 4 x 4 pixel grid, axes labeled 00, 01, 10, 11, with query rectangles QA and QB)
QA → range [4, 7]
QB → ranges [2, 3] and [8, 9]
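A brute-force sketch of this mapping: enumerate the pixels of the query rectangle, compute their z-values, and merge consecutive values into ranges. The positions used for QA and QB below are assumptions inferred from the ranges quoted above (QA as the upper-left quadrant, QB as the 2 x 2 block with x in 01..10 and y in 00..01); the figure itself is not available. The function names are also invented for the example.

```python
def z_value(x: int, y: int, K: int) -> int:
    """Interleave the K-bit coordinates x and y, x bit first."""
    z = 0
    for i in range(K - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


def z_ranges(x_lo: int, x_hi: int, y_lo: int, y_hi: int, K: int):
    """Return the maximal runs of consecutive z-values covering the rectangle."""
    zs = sorted(z_value(x, y, K)
                for x in range(x_lo, x_hi + 1)
                for y in range(y_lo, y_hi + 1))
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z          # extend the current run
        else:
            ranges.append([z, z])      # start a new run
    return [tuple(r) for r in ranges]


print(z_ranges(0, 1, 2, 3, K=2))   # assumed position of QA
print(z_ranges(1, 2, 0, 1, K=2))   # assumed position of QB
```

Enumerating every pixel is only practical for small queries; it is used here because it makes the range structure easy to see.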
Hilbert Curve
• We want points that are close in 2-d to be close in 1-d
• Note that in 2-d each point has 4 neighbors, whereas in 1-d it has only 2
• The Z-curve has some “jumps” that we would like to avoid
• The Hilbert curve avoids the jumps: recursive definition
Hilbert Curve - example
• It has been shown that in general Hilbert is better than the other space-filling curves for retrieval [Jag90]
• Hi: the (order-i) Hilbert curve for a 2^i x 2^i array
(Figure: H1, H2, ..., H(n+1))
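To make the "no jumps" property concrete, here is a sketch of a widely used iterative bit-manipulation formulation for mapping a pixel to its position on the order-i curve. It is not taken from the slides, and the orientation of the curve (which corner it starts in) is a convention of this particular formulation; the names `hilbert_index` and `curve` are assumptions.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of pixel (x, y) along the Hilbert curve on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant, in curve order
        if ry == 0:                    # rotate/flip so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def curve(order: int):
    """All pixels of the grid, listed in Hilbert order."""
    n = 1 << order
    cells = [(x, y) for x in range(n) for y in range(n)]
    return sorted(cells, key=lambda c: hilbert_index(order, *c))


path = curve(2)
steps = [abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(path, path[1:])]
print(path[:4], max(steps))
```

Every step along the curve moves to a neighboring pixel (distance 1), which is exactly the property the Z-curve lacks.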
Handling Regions
• A region breaks into one or more pieces, each one with a different z-value
• Works for raster representations (pixels)
• We try to minimize the number of pieces in the representation: precision/space overhead trade-off
(Figure: the 4 x 4 pixel grid, axes labeled 00, 01, 10, 11, with regions R and G)
ZR1 = 0010 = (2)
ZR2 = 1000 = (8)
ZG = 11 (“11” is the common prefix)
Z-ordering for Regions
• Break the space into 4 equal quadrants: level-1 blocks
• Level-i block: one of the four equal quadrants of a level-(i-1) block
• Pixels are level-K blocks; the image is the level-0 block
• For a level-i block: all its pixels share the same prefix of 2i bits; this prefix is the z-value of the block
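Quadtree
• The object is recursively divided into blocks until:
  • Blocks are homogeneous, or
  • The pixel level is reached
• Quadtree: ‘0’ stands for S and W, ‘1’ stands for N and E
(Figure: a quadtree with branches SW, NW, SE, NE and two-bit labels 00, 01, 10, 11, next to the 4 x 4 pixel grid with axes labeled 00, 01, 10, 11; labeled blocks include 11, 1001 and 1011)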
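The recursive decomposition can be sketched as below: a block is emitted as soon as it is homogeneous (entirely inside the region), and recursion stops at the pixel level. The sketch assumes the same bit convention as the shuffle example (x bit first, '0' for W/S and '1' for E/N), represents a region as a set of pixel coordinates, and uses an invented function name. The test regions are guesses consistent with the z-values ZR1 = 0010, ZR2 = 1000 and ZG = 11 quoted earlier.

```python
def quadtree_blocks(region, K, prefix="", x0=0, y0=0):
    """Decompose a set of pixels into quadtree blocks, returned as z-value prefixes."""
    size = 1 << (K - len(prefix) // 2)           # side length of the current block
    cells = {(x, y)
             for x in range(x0, x0 + size)
             for y in range(y0, y0 + size)}
    inside = cells & region
    if not inside:
        return []                                # block is empty: nothing to store
    if inside == cells:
        return [prefix]                          # homogeneous block (or a single pixel)
    half = size // 2
    blocks = []
    for xb in (0, 1):                            # 0 = W, 1 = E
        for yb in (0, 1):                        # 0 = S, 1 = N
            blocks += quadtree_blocks(region, K, prefix + f"{xb}{yb}",
                                      x0 + xb * half, y0 + yb * half)
    return blocks


region_r = {(1, 0), (2, 0)}                      # two pixels straddling two quadrants
region_g = {(2, 2), (2, 3), (3, 2), (3, 3)}      # the whole NE quadrant
print(quadtree_blocks(region_r, K=2))
print(quadtree_blocks(region_g, K=2))
```

Region G collapses to the single two-bit value 11, while region R needs two full-length values because its pixels fall in different quadrants.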
Region Quadtrees
• Implementations
  • FL (Fixed Length)
  • FD (Fixed length-Depth)
  • VL (Variable Length)
• Use a B+-tree to index the z-values and answer range queries
Linear Quadtree (LQ)
• Assume we use n bits in each dimension (x, y), so we have 2^n x 2^n pixels
• For each object O, compute the z-values of this object: z1, z2, z3, …, zk (each value can have between 0 and 2n bits)
• For each value zi, we append at the end the level l of this value (l = |zi|/2 in the example below, since each level contributes 2 bits)
• We pad each z-value with zeros to 2n bits and append l, creating a fixed-length value (the Morton block, Mb) that we insert into a B+-tree; the level field needs about log2(h) bits for h possible levels
Example (n = 3), written as z-value, l = Morton block:
A: 00, 01 = 00000001 = 1
B: 0110, 10 = 01100010 = 98
C: 111000, 11 = 11100011 = 227
(Figure: blocks A, B and C in the 8 x 8 pixel grid)
Insert into the B+-tree using Mb
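The encoding in the example can be reproduced with a short sketch. It assumes the layout implied by the three worked values: the z-value is right-padded with zeros to 2n bits and the level (half the z-value's length) is appended in a fixed-width field. The field width used here, just enough bits to hold the levels 0..n, is an assumption that matches the 2-bit field of the n = 3 example; the function name is invented.

```python
def morton_block(z: str, n: int) -> int:
    """Fixed-length key for a quadtree block with z-value prefix z on a 2^n x 2^n grid."""
    if len(z) % 2 or len(z) > 2 * n:
        raise ValueError("z must have an even length of at most 2n bits")
    level = len(z) // 2                  # two bits of z per level
    level_bits = n.bit_length()          # enough bits to store levels 0..n
    padded = z.ljust(2 * n, "0")         # pad the z-value to 2n bits
    return int(padded + format(level, f"0{level_bits}b"), 2)


for name, z in [("A", "00"), ("B", "0110"), ("C", "111000")]:
    print(name, morton_block(z, n=3))
```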

Query Alg
WindowQ(query w, quadtree block b)
{
  Mb = Morton block of b;
  If b is totally enclosed in w {
    Compute Mbmax (the largest Morton block value inside b)
    Use the B+-tree to find all objects with M values such that Mb <= M <= Mbmax
    Add them to the result
  } else {
    Find all objects with M = Mb in the B+-tree
    Add them to the result
    Decompose b into four quadrants sw, nw, se, ne
    For child in {sw, nw, se, ne}
      if child overlaps with w
        WindowQ(w, child)
  }
}
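A runnable sketch of WindowQ follows, with several stand-ins that are assumptions rather than part of the slides: a sorted Python list searched with `bisect` plays the role of the B+-tree, blocks are identified by their z-value prefix, windows are closed pixel rectangles (x_lo, y_lo, x_hi, y_hi), and Mbmax is taken to be the prefix padded with ones plus the deepest level. The index holds the blocks A, B and C from the Linear Quadtree example.

```python
from bisect import bisect_left


def morton_block(z, n):
    """Key of block z: z padded with zeros to 2n bits, then its level."""
    return int(z.ljust(2 * n, "0") + format(len(z) // 2, f"0{n.bit_length()}b"), 2)


def morton_max(z, n):
    """Largest key of any block inside z: z padded with ones, then level n."""
    return int(z.ljust(2 * n, "1") + format(n, f"0{n.bit_length()}b"), 2)


def block_rect(z, n):
    """Inclusive pixel bounds (x_lo, y_lo, x_hi, y_hi) of block z (x bit first)."""
    x = y = 0
    for i in range(0, len(z), 2):
        x, y = (x << 1) | int(z[i]), (y << 1) | int(z[i + 1])
    shift = n - len(z) // 2
    size = 1 << shift
    return (x << shift, y << shift, (x << shift) + size - 1, (y << shift) + size - 1)


def lookup(index, lo, hi):
    """Objects whose key M satisfies lo <= M <= hi (stand-in for a B+-tree range scan)."""
    return {obj for _, obj in index[bisect_left(index, (lo,)):bisect_left(index, (hi + 1,))]}


def window_query(w, index, n, z=""):
    bx0, by0, bx1, by1 = block_rect(z, n)
    if bx1 < w[0] or bx0 > w[2] or by1 < w[1] or by0 > w[3]:
        return set()                                    # block does not overlap w
    mb = morton_block(z, n)
    if w[0] <= bx0 and bx1 <= w[2] and w[1] <= by0 and by1 <= w[3]:
        return lookup(index, mb, morton_max(z, n))      # b enclosed: one range scan
    result = lookup(index, mb, mb)                      # objects stored at b itself
    if len(z) < 2 * n:
        for quadrant in ("00", "01", "10", "11"):       # sw, nw, se, ne
            result |= window_query(w, index, n, z + quadrant)
    return result


# Blocks from the n = 3 example: A = 00, B = 0110, C = 111000.
index = sorted((morton_block(z, 3), name)
               for name, z in [("A", "00"), ("B", "0110"), ("C", "111000")])
print(sorted(window_query((2, 2, 3, 5), index, 3)))
print(sorted(window_query((6, 4, 6, 4), index, 3)))
```

The range scan is safe because every block nested inside b has a key between Mb and Mbmax, while ancestors of b that share its padded z-value have a smaller level field and therefore a smaller key.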
z-ordering - analysis
Q: How many pieces (‘quad-tree blocks’) per region?
A: Proportional to the perimeter (surface etc.)
z-ordering - analysis
(How long is the coastline, say, of Britain? Paradox: the answer changes with the yardstick -> fractals ...)
http://en.wikipedia.org/wiki/How_Long_Is_the_Coast_of_Britain%3F_Statistical_Self-Similarity_and_Fractional_Dimension
With yardsticks of 200 km, 100 km and 50 km, the measured coastline is about 2350 km, 2775 km and 3425 km, respectively.
https://commons.wikimedia.org/wiki/File:Britain-fractal-coastline-50km.png#/media/File:Britain-fractal-coastline-combined.jpg
z-ordering - analysis
Q: Should we decompose a region to full detail (and store it in a B-tree)?
A: No! An approximation with 1-5 pieces/z-values is best [Orenstein90]
z-ordering - analysis
Q: How to measure the ‘goodness’ of a curve?
A: E.g., the average number of runs for range queries
(Figure: the same range query yields 4 runs on one curve and 3 runs on the other)
(#runs ~ #disk accesses on the B-tree)
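The run count for a single query can be computed directly, as in the sketch below. The choice of query (the central 2 x 2 block of a 4 x 4 grid) is an assumption made for illustration, since the figure is not available; the Hilbert mapping is the common iterative bit-manipulation formulation, not something defined in the slides, and all function names are invented.

```python
def z_index(K, x, y):
    """Z-order position: interleave the K-bit coordinates, x bit first."""
    z = 0
    for i in range(K - 1, -1, -1):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z


def hilbert_index(K, x, y):
    """Hilbert-curve position on a 2^K x 2^K grid."""
    n = 1 << K
    d, s = 0, n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def count_runs(index_fn, K, x_lo, x_hi, y_lo, y_hi):
    """Number of maximal runs of consecutive curve positions inside the rectangle."""
    values = sorted(index_fn(K, x, y)
                    for x in range(x_lo, x_hi + 1)
                    for y in range(y_lo, y_hi + 1))
    return 1 + sum(1 for a, b in zip(values, values[1:]) if b != a + 1)


print("z-order runs:", count_runs(z_index, 2, 1, 2, 1, 2))
print("Hilbert runs:", count_runs(hilbert_index, 2, 1, 2, 1, 2))
```

For this query the sketch gives 4 runs under Z-ordering and 3 under the Hilbert curve. Averaging `count_runs` over many queries would give the kind of measure the slide describes.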
z-ordering - analysis
Q: So, is Hilbert really better?
A: 27% fewer runs for 2-d (similar for 3-d)
Q: Are there formulas for #runs, # of quadtree blocks, etc.?
A: Yes, see the paper by [Jagadish ’90]
H.V. Jagadish. Linear clustering of objects with multiple attributes. SIGMOD 1990. http://www.cs.ucr.edu/~tsotras/cs236/W15/hilbert-curve.pdf
