Computational Geometry: Range Searching and Kd-Trees

This document discusses range queries on databases and data structures for efficient querying. It introduces kd-trees, which store multi-dimensional data as points in space and support range queries on them. Balanced binary search trees are presented as a solution for 1D range queries, with query time O(log n + k), where n is the number of points and k the output size: subtrees that cannot contain answers are pruned and never visited. The 1D range query algorithm traverses the tree, reporting full subtrees or single points as needed.

Introduction

Kd-trees

Range searching and kd-trees

Computational Geometry

Lecture 7: Range searching and kd-trees

Computational Geometry Lecture 7: Range searching and kd-trees


Introduction Database queries
Kd-trees 1D range trees

Databases

Databases store records or objects

Personnel database: Each employee has a name, id code, date of birth, function, salary, start date of employment, . . .

Fields are textual or numerical


Database queries
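[Figure: employees plotted as points, with date of birth on the horizontal axis and salary on the vertical axis; example record: G. Ometer, born Aug 16, 1954, salary $3,500; the date-of-birth axis is marked at 19,500,000 and 19,559,999]

A database query may ask for all employees with age between a1 and a2, and salary between s1 and s2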

Database queries

When we see numerical fields of objects as coordinates, a database stores a point set in higher dimensions

Exact match query: Asks for the objects whose coordinates match query coordinates exactly
Partial match query: Same but not all coordinates are specified
Range query: Asks for the objects whose coordinates lie in a specified query range (interval)


Database queries

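Example of a 3-dimensional (orthogonal) range query: children in [2, 4], salary in [3000, 4000], date of birth in [19,500,000, 19,559,999]

[Figure: the query range drawn as an axis-parallel box, with ticks at 2 and 4 (children), 3,000 and 4,000 (salary), and 19,500,000 and 19,559,999 (date of birth)]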
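As a baseline for what a range query computes, here is a minimal sketch that answers the 3-dimensional query above by scanning every record. It assumes dates of birth are encoded as integers of the form yyyymmdd (suggested by the range 19,500,000 to 19,559,999); the `Employee` class, the number of children given to G. Ometer, and the two extra records are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    children: int
    salary: int
    birth: int  # assumption: date of birth stored as the integer yyyymmdd


def in_range(e, lo, hi):
    """True if (children, salary, birth) lies in the box [lo, hi], coordinate by coordinate."""
    point = (e.children, e.salary, e.birth)
    return all(l <= c <= h for c, l, h in zip(point, lo, hi))


def range_query_scan(employees, lo, hi):
    """Brute force: test every record, so O(n) time per query."""
    return [e for e in employees if in_range(e, lo, hi)]


staff = [
    Employee("G. Ometer", 2, 3500, 19540816),   # children count is hypothetical
    Employee("A. Example", 0, 3200, 19530101),  # hypothetical record
    Employee("B. Example", 3, 5000, 19551224),  # hypothetical record
]
hits = range_query_scan(staff, lo=(2, 3000, 19500000), hi=(4, 4000, 19559999))
print([e.name for e in hits])
```

The scan touches all n records for every query; the data structures in the rest of the lecture aim to avoid exactly that.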


Data structures

Idea of data structures

Representation of structure, for convenience (like DCEL)
Preprocessing of data, to be able to solve future questions really fast (sub-linear time)

A (search) data structure has a storage requirement, a query time, and a construction time (and an update time)


1D range query problem

1D range query problem: Preprocess a set of n points on the real line such that the ones inside a 1D query range (interval) can be reported fast

The points p1, . . . , pn are known beforehand, the query [x, x′] only later

A solution to a query problem is a data structure description, a query algorithm, and a construction algorithm

Question: What are the most important factors for the efficiency of a solution?


Balanced binary search trees

A balanced binary search tree with the points in the leaves

[Figure: the tree. Root: 49. Second level: 23, 80. Third level: 10, 37, 62, 89. Fourth level: 3, 19, 30, leaf 49, 59, 70, leaf 89, 93. Bottom-level leaves: 3, 10, 19, 23, 30, 37, 59, 62, 70, 80, 93, 97. The 14 stored points are 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97.]

Balanced binary search trees

The search path for 25

[Figure: the same tree, with the search path for 25 highlighted]

Balanced binary search trees

The search paths for 25 and for 90

[Figure: the same tree, with the search paths for 25 and for 90 highlighted]

Example 1D range query

A 1-dimensional range query with [25, 90]

[Figure: the same tree, with the nodes visited by the query [25, 90] highlighted]

Node types for a query

Three types of nodes for a given query:

White nodes: never visited by the query
Grey nodes: visited by the query, unclear if they lead to output
Black nodes: visited by the query, whole subtree is output

Question: What query time do we hope for?


Node types for a query

The query algorithm comes down to what we do at each type of node

Grey nodes: use query range to decide how to proceed: to not visit a subtree (pruning), to report a complete subtree, or just continue

Black nodes: traverse and enumerate all points in the leaves


Example 1D range query

A 1-dimensional range query with [61, 90]

[Figure: the same tree, with the split node of the query [61, 90] marked]

1D range query algorithm

Algorithm 1DRangeQuery(T, [x : x′])
νsplit ← FindSplitNode(T, x, x′)
if νsplit is a leaf
  then Check if the point in νsplit must be reported.
  else ν ← lc(νsplit)
       while ν is not a leaf
         do if x ≤ xν
              then ReportSubtree(rc(ν))
                   ν ← lc(ν)
              else ν ← rc(ν)
       Check if the point stored in ν must be reported.
       ν ← rc(νsplit)
       Similarly, follow the path to x′, and . . .
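The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's own implementation: the points are distinct, each internal node stores the largest value in its left subtree, and the second half of the query (the path to x′, elided on the slide) is filled in by symmetry. The names `Node`, `build`, and `range_query_1d` are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key      # the point (leaf) or the split value (internal node)
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None


def build(sorted_pts):
    """Balanced tree with the points in the leaves; an internal node stores
    the largest value of its left subtree (assumption of this sketch)."""
    if len(sorted_pts) == 1:
        return Node(sorted_pts[0])
    mid = (len(sorted_pts) + 1) // 2
    return Node(sorted_pts[mid - 1], build(sorted_pts[:mid]), build(sorted_pts[mid:]))


def report_subtree(v, out):
    if v.is_leaf():
        out.append(v.key)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def find_split_node(root, lo, hi):
    """Last node shared by the search paths for lo and hi."""
    v = root
    while not v.is_leaf() and (hi <= v.key or lo > v.key):
        v = v.left if hi <= v.key else v.right
    return v


def range_query_1d(root, lo, hi):
    out = []
    split = find_split_node(root, lo, hi)
    if split.is_leaf():
        if lo <= split.key <= hi:
            out.append(split.key)
        return out
    # Path to lo: every right subtree hanging off the path lies inside the range.
    v = split.left
    while not v.is_leaf():
        if lo <= v.key:
            report_subtree(v.right, out)
            v = v.left
        else:
            v = v.right
    if lo <= v.key <= hi:
        out.append(v.key)
    # Path to hi (assumed symmetric): report the left subtrees hanging off the path.
    v = split.right
    while not v.is_leaf():
        if hi > v.key:
            report_subtree(v.left, out)
            v = v.right
        else:
            v = v.left
    if lo <= v.key <= hi:
        out.append(v.key)
    return out


pts = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]
tree = build(pts)
print(sorted(range_query_1d(tree, 25, 90)))
```

The tree built here need not have exactly the same shape as the one in the figures; the points are reported in the order the subtrees are encountered, not in sorted order.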


Query time analysis

The efficiency analysis is based on counting the numbers of nodes visited for each type

White nodes: never visited by the query; no time spent
Grey nodes: visited by the query, unclear if they lead to output; time determines dependency on n
Black nodes: visited by the query, whole subtree is output; time determines dependency on k, the output size


Query time analysis

Grey nodes: they occur on only two paths in the tree, and since the tree is balanced, its depth is O(log n)

Black nodes: a (sub)tree with m leaves has m − 1 internal nodes; traversal visits O(m) nodes and finds m points for the output

The time spent at each node is O(1) ⇒ O(log n + k) query time


Storage requirement and preprocessing

A (balanced) binary search tree storing n points uses O(n) storage

A balanced binary search tree storing n points can be built in O(n) time after sorting, so in O(n log n) time overall (or by repeated insertion in O(n log n) time)
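The construction bound can be illustrated with a small sketch: sort once, then build the tree recursively over index ranges, so that each node costs O(1) beyond the recursion. The tuple representation of nodes and the helper `count_nodes` are assumptions made for this example only.

```python
def build_balanced(pts):
    """Sort once (O(n log n)), then create each node in O(1) from an index range."""
    xs = sorted(pts)

    def rec(lo, hi):
        # Builds the subtree for xs[lo:hi]; the range is never empty.
        if hi - lo == 1:
            return ("leaf", xs[lo])
        mid = (lo + hi + 1) // 2          # left half gets the extra point
        return ("node", xs[mid - 1], rec(lo, mid), rec(mid, hi))

    return rec(0, len(xs))


def count_nodes(t):
    """Return (number of leaves, number of internal nodes)."""
    if t[0] == "leaf":
        return 1, 0
    l_leaves, l_internal = count_nodes(t[2])
    r_leaves, r_internal = count_nodes(t[3])
    return l_leaves + r_leaves, l_internal + r_internal + 1


pts = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]
leaves, internal = count_nodes(build_balanced(pts))
print(leaves, internal)
```

With n leaves there are n − 1 internal nodes, which is where the O(n) storage bound comes from.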


Result

Theorem: A set of n points on the real line can be preprocessed in O(n log n) time into a data structure of O(n) size so that any 1D range query can be answered in O(log n + k) time, where k is the number of answers reported


Example 1D range counting query

A 1-dimensional range tree for range counting queries

[Figure: the same tree, augmented for counting: each internal node stores the number of points in its left and right subtrees (7 and 7 at the root 49, which covers all 14 points; 4 and 3 at both 23 and 80), and each leaf counts 1]

Example 1D range counting query

A 1-dimensional range counting query with [25, 90]

[Figure: the same augmented tree, with the nodes visited by the counting query [25, 90] highlighted]

Result

Theorem: A set of n points on the real line can be preprocessed in O(n log n) time into a data structure of O(n) size so that any 1D range counting query can be answered in O(log n) time

Note: The number of points in the query range does not influence the output size (the answer is a single number), so it should not show up in the query time
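One way to realize range counting is sketched below, assuming distinct points and a tree whose nodes store the number of leaves beneath them. Instead of the split-node formulation, this variant counts the points below each endpoint along a single root-to-leaf path and subtracts; both walks take O(log n) time regardless of how many points lie in the range. The names `CNode`, `count_below`, and `range_count` are hypothetical.

```python
class CNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        # Number of leaves (points) stored below this node.
        self.size = 1 if left is None else left.size + right.size


def build(xs):
    """xs must be sorted; an internal node stores the largest value of its left subtree."""
    if len(xs) == 1:
        return CNode(xs[0])
    mid = (len(xs) + 1) // 2
    return CNode(xs[mid - 1], build(xs[:mid]), build(xs[mid:]))


def count_below(root, x, inclusive):
    """Number of stored points < x (or <= x if inclusive), using one root-to-leaf path."""
    v, total = root, 0
    while v.left is not None:
        if x > v.key:
            total += v.left.size   # the whole left subtree is below x: add its count
            v = v.right
        else:
            v = v.left             # the whole right subtree is above x: skip it
    hit = v.key <= x if inclusive else v.key < x
    return total + int(hit)


def range_count(root, lo, hi):
    return count_below(root, hi, True) - count_below(root, lo, False)


pts = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 97]
root = build(pts)
print(range_count(root, 25, 90))
```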


Kd-trees
Introduction Querying in kd-trees
Kd-trees Kd-tree query time analysis
Higher-dimensional kd-trees


Range queries in 2D

Question: Why can’t we simply use a balanced binary tree in x-coordinate?

Or, use one tree on x-coordinate and one on y-coordinate, and query the one where we think querying is more efficient?


Kd-trees

Kd-trees, the idea: Split the point set alternatingly by x-coordinate and by y-coordinate

split by x-coordinate: split by a vertical line that has half the points left or on, and half right

split by y-coordinate: split by a horizontal line that has half the points below or on, and half above


Kd-trees

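[Figure: a kd-tree on the points p1, . . . , p10: on the left, the subdivision of the plane by the splitting lines ℓ1, . . . , ℓ9; on the right, the corresponding binary tree with root ℓ1, children ℓ2 and ℓ3, and the points in the leaves]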


Kd-tree construction
Algorithm BuildKdTree(P, depth)
if P contains only one point
  then return a leaf storing this point
  else if depth is even
         then Split P with a vertical line ℓ through the median x-coordinate into P1 (left of or on ℓ) and P2 (right of ℓ)
         else Split P with a horizontal line ℓ through the median y-coordinate into P1 (below or on ℓ) and P2 (above ℓ)
       νleft ← BuildKdTree(P1, depth + 1)
       νright ← BuildKdTree(P2, depth + 1)
       Create a node ν storing ℓ, make νleft the left child of ν, and make νright the right child of ν.
       return ν
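A minimal Python sketch of BuildKdTree follows, assuming points in general position (distinct x- and y-coordinates). For brevity it finds the median by sorting at every level, which gives O(n log² n) construction rather than the O(n log n) obtained with a linear-time median; the `KdNode` class and the helpers `leaves` and `height` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class KdNode:
    point: Optional[Point] = None        # set for leaves only
    axis: int = 0                        # 0: vertical line (split on x), 1: horizontal (split on y)
    split: float = 0.0                   # coordinate of the splitting line
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None


def build_kd_tree(points, depth=0):
    if len(points) == 1:
        return KdNode(point=points[0])
    axis = depth % 2                     # even depth: split on x, odd depth: split on y
    pts = sorted(points, key=lambda p: p[axis])   # simplification: sort instead of linear-time median
    mid = (len(pts) + 1) // 2            # P1 keeps the median point ("left of or on" the line)
    return KdNode(axis=axis, split=pts[mid - 1][axis],
                  left=build_kd_tree(pts[:mid], depth + 1),
                  right=build_kd_tree(pts[mid:], depth + 1))


def leaves(v):
    return 1 if v.point is not None else leaves(v.left) + leaves(v.right)


def height(v):
    return 0 if v.point is not None else 1 + max(height(v.left), height(v.right))


pts = [(1, 5), (2, 1), (3, 8), (4, 3), (5, 9), (6, 2), (7, 7), (8, 4), (9, 6), (10, 0)]
tree = build_kd_tree(pts)
print(leaves(tree), height(tree), tree.split)
```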

Kd-tree construction

The median of a set of n values can be computed in O(n) time (randomized: easy; worst case: much harder)

Let T(n) be the time needed to build a kd-tree on n points

T(1) = O(1)
T(n) = 2 · T(n/2) + O(n)

A kd-tree can be built in O(n log n) time

Question: What is the storage requirement?


Kd-tree regions of nodes

[Figure: a kd-tree node ν and its region(ν), the part of the plane bounded by the splitting lines ℓ1, ℓ2, ℓ3]


Kd-tree regions of nodes

How do we know region(ν) when we are at a node ν?

Option 1: store it explicitly with every node
Option 2: compute it on-the-fly, when going from the root to ν

Question: What are reasons to choose one or the other option?


Kd-tree querying

[Figure: a kd-tree on the points p1, . . . , p13 with a query rectangle: on the left, the subdivision of the plane; on the right, the corresponding tree]


Kd-tree querying

Algorithm SearchKdTree(ν, R)
Input. The root of (a subtree of) a kd-tree, and a range R
Output. All points at leaves below ν that lie in the range.
if ν is a leaf
  then Report the point stored at ν if it lies in R
  else if region(lc(ν)) is fully contained in R
         then ReportSubtree(lc(ν))
         else if region(lc(ν)) intersects R
                then SearchKdTree(lc(ν), R)
       if region(rc(ν)) is fully contained in R
         then ReportSubtree(rc(ν))
         else if region(rc(ν)) intersects R
                then SearchKdTree(rc(ν), R)
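SearchKdTree can be sketched in Python with the regions computed on the fly while descending. The assumptions of this sketch: points have distinct coordinates, nodes are plain tuples, a range or region is a pair of closed intervals ((xlo, xhi), (ylo, yhi)), and the region of a right child is taken as closed at the splitting line, which is slightly larger than the true region but never causes a wrong report. All function names are hypothetical.

```python
import math


def build(points, depth=0):
    """Leaf: ('leaf', p). Internal node: ('node', axis, split, left, right)."""
    if len(points) == 1:
        return ("leaf", points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return ("node", axis, pts[mid - 1][axis],
            build(pts[:mid], depth + 1), build(pts[mid:], depth + 1))


def contained(region, R):
    return all(R[a][0] <= region[a][0] and region[a][1] <= R[a][1] for a in (0, 1))


def intersects(region, R):
    return all(region[a][0] <= R[a][1] and R[a][0] <= region[a][1] for a in (0, 1))


def report_subtree(v, out):
    if v[0] == "leaf":
        out.append(v[1])
    else:
        report_subtree(v[3], out)
        report_subtree(v[4], out)


def search_kd_tree(v, R, out, region=((-math.inf, math.inf), (-math.inf, math.inf))):
    if v[0] == "leaf":
        p = v[1]
        if all(R[a][0] <= p[a] <= R[a][1] for a in (0, 1)):
            out.append(p)
        return
    _, axis, split, left, right = v
    # Child regions: clip the current region at the splitting line.
    lo_region = list(region)
    lo_region[axis] = (region[axis][0], split)
    hi_region = list(region)
    hi_region[axis] = (split, region[axis][1])
    for child, reg in ((left, tuple(lo_region)), (right, tuple(hi_region))):
        if contained(reg, R):
            report_subtree(child, out)          # black node: report everything below
        elif intersects(reg, R):
            search_kd_tree(child, R, out, reg)  # grey node: keep searching
        # otherwise: white node, pruned


pts = [(1, 5), (2, 1), (3, 8), (4, 3), (5, 9), (6, 2), (7, 7), (8, 4), (9, 6), (10, 0)]
R = ((2, 7), (1, 8))        # 2 <= x <= 7 and 1 <= y <= 8
found = []
search_kd_tree(build(pts), R, found)
print(sorted(found))
```

Computing regions during the descent keeps the nodes small; storing them explicitly would trade extra storage per node for less work per visited node.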


Kd-tree querying

Question: How about a range counting query? How should the code be adapted?
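One possible adaptation, offered as a sketch rather than the intended answer: store in every node the number of points below it, and add that number whenever a child's region is fully contained in the range, instead of traversing the subtree. The tuple layout, the closed-interval regions, and the assumption of distinct coordinates are choices made for this example.

```python
import math


def build(points, depth=0):
    """Leaf: ('leaf', p, 1). Internal node: ('node', axis, split, left, right, size)."""
    if len(points) == 1:
        return ("leaf", points[0], 1)
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return ("node", axis, pts[mid - 1][axis],
            build(pts[:mid], depth + 1), build(pts[mid:], depth + 1), len(pts))


def count_kd_tree(v, R, region=((-math.inf, math.inf), (-math.inf, math.inf))):
    if v[0] == "leaf":
        p = v[1]
        return int(all(R[a][0] <= p[a] <= R[a][1] for a in (0, 1)))
    _, axis, split, left, right, _size = v
    total = 0
    for child, bounds in ((left, (region[axis][0], split)),
                          (right, (split, region[axis][1]))):
        reg = list(region)
        reg[axis] = bounds
        if all(R[a][0] <= reg[a][0] and reg[a][1] <= R[a][1] for a in (0, 1)):
            total += child[-1]        # black node: add the stored count, no traversal
        elif all(reg[a][0] <= R[a][1] and R[a][0] <= reg[a][1] for a in (0, 1)):
            total += count_kd_tree(child, R, tuple(reg))   # grey node: recurse
    return total


pts = [(1, 5), (2, 1), (3, 8), (4, 3), (5, 9), (6, 2), (7, 7), (8, 4), (9, 6), (10, 0)]
tree = build(pts)
print(count_kd_tree(tree, ((2, 7), (1, 8))))
```

Because black subtrees are no longer traversed, the running time depends only on the number of grey nodes visited.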


Kd-tree query time analysis

To analyze the query time of kd-trees, we use the concept of white, grey, and black nodes

White nodes: never visited by the query; no time spent
Grey nodes: visited by the query, unclear if they lead to output; time determines dependency on n
Black nodes: visited by the query, whole subtree is output; time determines dependency on k, the output size


Kd-tree query time analysis

[Figure: the kd-tree on the points p1, . . . , p13 and the query rectangle, repeated]

Kd-tree query time analysis

White, grey, and black nodes with respect to region(ν):

White node ν: R does not intersect region(ν)
Grey node ν: R intersects region(ν), but region(ν) ⊈ R
Black node ν: region(ν) ⊆ R


Kd-tree query time analysis

Question: How many grey and how many black leaves?



Kd-tree query time analysis

Question: How many grey and how many black nodes?



Kd-tree query time analysis

Grey node ν: R intersects region(ν), but region(ν) ⊈ R

It implies that the boundaries of R and region(ν) intersect

Advice: If you don’t know what to do, simplify until you do

Instead of taking the boundary of R, let’s analyze the number of grey nodes if the query is with a vertical line ℓ


Kd-tree query time analysis

We observe: At every vertical split, ℓ is only to one side, while at every horizontal split ℓ is to both sides

Let G(n) be the number of grey nodes in a kd-tree with n points (leaves). Then G(1) = 1 and:

If a subtree has n leaves: G(n) = 1 + G(n/2) at even depth
If a subtree has n leaves: G(n) = 1 + 2 · G(n/2) at odd depth

If we use two levels at once, we get:

G(n) = 2 + 2 · G(n/4) or G(n) = 3 + 2 · G(n/4)


Kd-tree query time analysis

[Figure: two levels of a subtree with n leaves, in both orders: a split on x followed by splits on y, and a split on y followed by splits on x]

Kd-tree query time analysis

G(1) = 1

G(n) = 2 · G(n/4) + O(1)

Question: What does this recurrence solve to?
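The growth of this recurrence can be checked numerically. The sketch below assumes that the O(1) term is exactly 1 and that n is a power of 4, and compares G(n) with 2√n − 1; the function name `grey_nodes` is hypothetical.

```python
import math


def grey_nodes(n):
    """G(1) = 1 and G(n) = 2 * G(n/4) + 1, for n a power of 4 (O(1) term taken as 1)."""
    return 1 if n == 1 else 2 * grey_nodes(n // 4) + 1


for j in range(0, 11, 2):
    n = 4 ** j
    # Columns: n, G(n), and the closed form 2 * sqrt(n) - 1.
    print(n, grey_nodes(n), 2 * math.isqrt(n) - 1)
```

Under these assumptions the two columns agree, so G(n) grows proportionally to √n.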


Kd-tree query time analysis

The grey subtree has unary and binary nodes


Kd-tree query time analysis

The depth is log n, so the binary depth is (1/2) · log n
Important: The logarithm is base-2

Counting only binary nodes, there are

2^((1/2) · log n) = (2^(log n))^(1/2) = n^(1/2) = √n

Every unary grey node has a unique binary parent (except the root), so there are at most twice as many unary nodes as binary nodes, plus 1


Kd-tree query time analysis

The number of grey nodes if the query were a vertical line is O(√n)

The same is true if the query were a horizontal line

How about a query rectangle?


Kd-tree query time analysis

The number of grey nodes for a query rectangle is at most the number of grey nodes for two vertical and two horizontal lines, so it is at most 4 · O(√n) = O(√n)

For black nodes, reporting a whole subtree with k leaves takes O(k) time (there are k − 1 internal black nodes)


Result

Theorem: A set of n points in the plane can be preprocessed in O(n log n) time into a data structure of O(n) size so that any 2D range query can be answered in O(√n + k) time, where k is the number of answers reported

For range counting queries, we need O(√n) time


Efficiency


n           log n   √n
4           2       2
16          4       4
64          6       8
256         8       16
1024        10      32
4096        12      64
1,000,000   20      1000


Higher dimensions

A 3-dimensional kd-tree alternates splits on x-, y-, and z-coordinate

A 3D range query is performed with a box


Higher dimensions

The construction of a 3D kd-tree is a trivial adaptation of the 2D version

The 3D range query algorithm is exactly the same as the 2D version

The 3D kd-tree still requires O(n) storage if it stores n points
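To illustrate how small the adaptation is, the sketch below builds a kd-tree in d dimensions by cycling through the coordinates with `depth % d`. It assumes distinct coordinates, sorts at every level for simplicity, and uses a hypothetical tuple layout; `axes_on_leftmost_path` is a helper added only to inspect the result.

```python
def build_kd_tree(points, d, depth=0):
    """Leaf: ('leaf', p). Internal node: ('node', axis, split, left, right)."""
    if len(points) == 1:
        return ("leaf", points[0])
    axis = depth % d                     # cycle through the coordinates x, y, z, ...
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return ("node", axis, pts[mid - 1][axis],
            build_kd_tree(pts[:mid], d, depth + 1),
            build_kd_tree(pts[mid:], d, depth + 1))


def axes_on_leftmost_path(t):
    """Splitting axes met when always descending to the left child."""
    axes = []
    while t[0] == "node":
        axes.append(t[1])
        t = t[3]
    return axes


# Eight sample points in 3D with distinct values in every coordinate.
pts3 = [(i, (3 * i) % 8, (5 * i) % 8) for i in range(8)]
tree3 = build_kd_tree(pts3, d=3)
print(axes_on_leftmost_path(tree3))
```

With d = 2 this reduces to the planar construction.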


Higher dimensions
How does the query time analysis change?

Intersection of B and region(ν) depends on intersection of facets of B ⇒ analyze by axes-parallel planes (B has no more grey nodes than six planes)


Higher dimensions

[Figure: three levels of a 3D kd-tree with m leaves: a split on x, then splits on y, then splits on z]

Kd-tree query time analysis

Let G3(n) be the number of grey nodes for a query with an axes-parallel plane in a 3D kd-tree

G3(1) = 1

G3(n) = 4 · G3(n/8) + O(1)

Question: What does this recurrence solve to?

Question: How many leaves does a perfectly balanced binary search tree with depth (2/3) · log n have?
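Both questions can be worked out along the same lines as the 2D case. The derivation below is a sketch that assumes n is a power of 8 and uses base-2 logarithms, as in the 2D analysis:

```latex
\begin{align*}
G_3(n) &= 4\,G_3(n/8) + O(1) \\
       &= O\!\left(4^{\log_8 n}\right)
        = O\!\left(n^{\log_8 4}\right)
        = O\!\left(n^{2/3}\right), \\[4pt]
2^{\frac{2}{3}\log n} &= \left(2^{\log n}\right)^{2/3} = n^{2/3}.
\end{align*}
```

The exponent 2/3 equals 1 − 1/d for d = 3.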


Result

Theorem: A set of n points in d-space can be preprocessed in O(n log n) time into a data structure of O(n) size so that any d-dimensional range query can be answered in O(n^(1−1/d) + k) time, where k is the number of answers reported
