Introduction to Bounding Volume Hierarchies
(draft)

Herman J. Haverkort
18 May 2004
Abstract

This paper is an excerpt from Chapter 1 of my PhD thesis [Hav04], written to introduce my publications about bounding volume hierarchies [Aga02, Arg04, Hav04BG].

1 Computational geometry and geometric data structures

Computational geometry is the area of algorithms research that deals with computations on geometric objects. Examples of such objects are points, lines and polygons in the plane (which may represent a city plan), or balls, blocks and more complex shapes in three dimensions (which may represent the interior of a power plant). In these cases, the geometric objects represent physical objects in the real world. But this is not always the case. For example, a database storing the age and salary of a company's employees can also be thought of as a database that stores points in a two-dimensional space: each point represents an employee, with one coordinate indicating the age and the other coordinate indicating the salary of the employee. Therefore, geometric computations are found in many applications of computers: databases, computer-aided design, geographical information systems, flight simulators, other virtual reality applications, robotics, computer vision and route planning are just a few examples.

To do efficient computations on geometric objects, it is crucial that we can store, search, and sometimes update, sets of geometric objects efficiently. When the objects are one-dimensional points, this is relatively easy: we can sort them by their coordinate in that dimension, and put them in memory in that order. This makes it possible to find points fast. It is like looking up words in a dictionary: thanks to the ordering, we can find a word without turning all pages one by one.

When an object cannot be described by a single point in a one-dimensional space, it is less clear how to store a set of objects effectively. For example, I have a collection of music CDs. I would like to order them by the year in which the music was written, so that I can find all music from a particular era fast. My CDs usually contain several works of music written in a range of years. This makes it impossible to characterise a CD by a single point on a time line: a CD is rather like a group of points, or like a segment of the time line. How should I order my CDs? By the oldest work on the CD, maybe? But then even the most recent work might be put in the very first place on the shelf, if it just happens to be on the same CD as the oldest work. If I sort like this, how can I be sure of finding a work from a certain era without checking all earlier CDs too?

When geometric objects have more than one dimension, the problem becomes even more difficult. But in many applications, a lot of questions about a set of geometric objects need to be answered fast. For example, a flight simulator should not need to scan the complete hard disk to determine if the plane is going to hit a mountain in the next second. In such applications, it is essential that geometric objects are stored in such a way that relevant objects can be identified quickly, while irrelevant objects are ignored without checking them one by one. Often we can do this by sorting the objects into groups. If we do this in a clever way, we can, hopefully, discriminate quickly between groups with potentially relevant objects and groups without such objects. Therefore, finding useful groupings of objects is a key issue in many problems in computational geometry.

A set of geometric objects that is sorted, partitioned into groups and/or otherwise preprocessed, so that certain queries about the set can be answered efficiently, is called a geometric data structure. The goal of research into such data structures is to make them as efficient as possible with respect to storage space, the time needed to build the data structure, the time needed to insert or delete objects, and the time needed to answer queries. Examples of such queries are: which objects lie (partly) inside a given viewing window? Or: which object is closest to a given query point? Of course, we would like our data structure to facilitate fast answers, not for just one particular query point or window, but for any query that might be asked. One cannot usually expect to optimize for all of the objectives mentioned at the same time. In general, the faster the queries, the larger the demands on storage, preprocessing and update time.

We do not usually measure the running times of data structure algorithms by counting milliseconds. We could, of course, but with computers getting faster all the time, this would make our results outdated even before they are published. Rather we ask the question: how well will a data structure be able to take advantage of bigger and faster computers? To answer that question, we analyse in what way the number of basic operations
performed by the central processor depends on the input size (the number of objects stored) and the output size (the number of objects retrieved). The first is usually denoted by n, the second by k. We will write that algorithms have a running time of, for example, O(n) or O(n^3). In the first case, the running time is a linear function of n. This means that if we can double the speed of our hardware, this algorithm can process twice as much data in the same time. In the second case, the running time is a cubic function of n, which means that our double-speed computer will enable us to handle only 26% more data with this algorithm (since 2^{1/3} ≈ 1.26). This means that even if the second algorithm would be a little faster in practice on the current hardware, the first algorithm is probably more promising in the future.

If the amount of data is so big that we cannot keep all of it in main memory while working on it, we count the number of disk accesses rather than the number of operations. In that case we analyse how the running times depend on three parameters: the input size n, the output size k, and the amount of data transferred in one disk access.

In the most basic form, geometric data structures store points and we want to be able to retrieve, for any query range, the points that are inside, or sometimes the points that are closest to that query range. This type of data structure has been well studied and structures have been developed for simplex range queries, axis-parallel (hyper-)rectangular range queries, circular or (hyper-)spherical range queries, and point queries. With O(n) space, one can build a data structure for n points in d dimensions that reports all points inside a simplex in time O(n^{1-1/d} + k) [Mat93], where k is the number of points reported. For queries with axis-parallel rectangles, one can use the same data structure, or the much simpler kd-tree with the same query time (see e.g. [Brg97KOS] for a description). With more space, one can often get faster queries. For example, a layered range tree answers axis-parallel rectangle queries in time O(log^{d-1} n + k), using O(n log^{d-1} n) space [Brg97KOS]. There are also other data structures whose query times depend more heavily on the output size and less on the input size. For more about data structures for points, see, for example, the survey by Agarwal and Erickson [Aga98E].

2 Data structures for object data: bounding-volume hierarchies

Designing efficient data structures becomes significantly more difficult if the objects stored are not points, but objects that have some shape and size, such as line segments, balls or polyhedra. Theoretically efficient solutions for such problems are often too complicated and bear too much overhead to be useful in practice. It becomes even more difficult if we want a data structure that supports multiple types of queries at the same time. One can cheat, of course, by just taking a few data structures together and storing each object multiple times: once in each structure. But this increases the storage requirements and also puts on us the burden of maintaining several structures.

In practice, so-called bounding-volume hierarchies often provide a good solution. They are easy to implement, and although a bounding-volume hierarchy for n objects does not store more than 2n pointers and geometric objects, it can be used for different types of queries. A query in a bounding-volume hierarchy does not go directly for the answer to the query; rather it generates a set of candidate answers, which then need to be checked one by one. In practice, the set of candidate answers is usually small enough to make this approach efficient. For a bounding-volume hierarchy to be useful, it should allow fast generation of candidate answers, and it should select the candidates such that they are likely to be true answers.

Below, I will first explain what a bounding-volume hierarchy is and how it is used. After that, I will explain what issues have to be addressed when designing a bounding-volume hierarchy. I will then focus on a particular class of bounding-volume hierarchies, namely R-trees, and give an overview of our results on R-trees. To conclude, I will suggest a few subjects for further research in this area.

2.1 Definition and usage

A bounding-volume hierarchy is a tree structure on a set of geometric objects (the data objects). Each object is stored in a leaf of the tree. Each internal node stores for each of its children ν an additional geometric object V(ν) that encloses all data objects that are stored in descendants of ν. In other words, V(ν) is a bounding volume for the descendants of ν. For an example, see Figure 1.

Figure 1: Example of a bounding-volume hierarchy, using rectangles as bounding volumes.

Bounding-volume hierarchies can be used to do many types of queries on the set of data objects. For example, the algorithm in Figure 2 finds all objects that intersect a query range Q and are stored in descendants of node ν. To find all input data objects that intersect Q, start the algorithm with the root of the hierarchy as ν. The query will then descend into the tree, visiting exactly those nodes whose bounding volumes intersect Q. The bounding-volume hierarchy can also be used for other types of queries, such as nearest-neighbour queries (see Figure 3). The algorithms can easily be adapted to hierarchies with leaves that store multiple data objects.
Algorithm Intersected(Q, ν)
1.  for every child μ of ν
2.      if V(μ) intersects Q then
3.          if μ is a leaf then { the object M stored in μ is a candidate answer }
4.              if M intersects Q then
5.                  report M
6.          else
7.              Intersected(Q, μ)

Figure 2: Finding all objects that intersect Q. To find all objects that lie completely inside Q, replace the intersection test in line 4 by a test whether M lies inside Q. To find all objects that completely contain Q, replace the tests in lines 2 and 4 by tests whether Q is completely contained in V(μ), or in M, respectively.
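For concreteness, the following Python sketch implements such a hierarchy over axis-parallel boxes and the query of Figure 2. The names (Box, Node, intersected, nearest_neighbour) are my own choices, and the nearest-neighbour routine is a standard best-first search over bounding-volume distances, given here only as a stand-in for the algorithm of Figure 3, which is not reproduced in this excerpt.

    import heapq

    class Box:
        # Axis-parallel box, given by its lower and upper corner coordinates.
        def __init__(self, lo, hi):
            self.lo, self.hi = tuple(lo), tuple(hi)

        def intersects(self, other):
            return all(slo <= ohi and olo <= shi
                       for slo, shi, olo, ohi in zip(self.lo, self.hi, other.lo, other.hi))

        def contains(self, other):
            # True if this box completely contains the other; this is the test
            # needed for the query variants described in the caption of Figure 2.
            return all(slo <= olo and ohi <= shi
                       for slo, shi, olo, ohi in zip(self.lo, self.hi, other.lo, other.hi))

        def min_dist(self, point):
            # Distance from a point to the box (0 if the point lies inside it).
            return sum(max(lo - p, 0.0, p - hi) ** 2
                       for p, lo, hi in zip(point, self.lo, self.hi)) ** 0.5

    class Node:
        # A leaf stores one data object (here: a box); an internal node stores its
        # children. In both cases, volume is the bounding volume V(.) of the subtree.
        def __init__(self, volume, children=(), data=None):
            self.volume = volume
            self.children = list(children)
            self.data = data

        def is_leaf(self):
            return not self.children

    def intersected(Q, nu):
        # The algorithm of Figure 2: report all data objects in the subtree of nu
        # that intersect Q.
        for mu in nu.children:
            if mu.volume.intersects(Q):          # line 2
                if mu.is_leaf():
                    if mu.data.intersects(Q):    # line 4: check the candidate answer
                        yield mu.data
                else:
                    yield from intersected(Q, mu)

    def nearest_neighbour(q, root):
        # Best-first search on bounding-volume distances; a standard technique,
        # shown only as a stand-in for Figure 3, which is not reproduced here.
        heap = [(root.volume.min_dist(q), 0, root)]
        counter = 1                              # tie-breaker; nodes are not comparable
        best, best_dist = None, float('inf')
        while heap:
            d, _, nu = heapq.heappop(heap)
            if d >= best_dist:
                break                            # every remaining subtree is at least this far away
            if nu.is_leaf():
                dist = nu.data.min_dist(q)       # the data objects are boxes in this sketch
                if dist < best_dist:
                    best, best_dist = nu.data, dist
            else:
                for mu in nu.children:
                    heapq.heappush(heap, (mu.volume.min_dist(q), counter, mu))
                    counter += 1
        return best

    # Tiny usage example: two leaves under one root.
    a, b = Box((0, 0), (1, 1)), Box((4, 0), (5, 1))
    root = Node(Box((0, 0), (5, 1)), children=[Node(a, data=a), Node(b, data=b)])
    hits = list(intersected(Box((0.5, 0.5), (2, 2)), root))   # [a]
    closest = nearest_neighbour((4.5, 3.0), root)             # b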
2.2 Designing bounding-volume hierarchies

When designing a bounding-volume hierarchy, we have to decide what kind of bounding volumes to use, what the structure of the hierarchy should look like, and how to order the data objects in the tree.

The shape of the bounding volumes. The choice of bounding volume is determined by a trade-off between two objectives. On the one hand, we would like to use bounding volumes that have a very simple shape. Thus, we need only a few bytes to store them, and intersection tests and distance computations are simple and fast. On the other hand, we would like to have bounding volumes that fit the corresponding data objects very tightly. Thus, we try to avoid going into subtrees that will not lead to any object that satisfies our query. At one extreme, we could use the full space as the bounding volume for everything. At the other extreme, we would use the union of the data objects as their bounding volume. Both extremes are pointless. In the first case we would traverse the complete tree for every query; in the second case, intersection tests would be just as complex as doing a complete query.

In practice, the most commonly used bounding volume is an axis-parallel (hyper-)rectangle; we will just call them boxes. The minimum (best-fit) bounding box for a given set of data objects is easy to compute, needs only a few bytes of storage, and robust intersection tests are easy to implement and extremely fast.
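Both claims are easy to make concrete. A minimal Python sketch (the function names are my own; a box is simply a pair (lo, hi) of coordinate tuples):

    def best_fit_box(boxes):
        # Minimum axis-parallel bounding box of a set of boxes: one min/max per axis.
        los = [lo for lo, hi in boxes]
        his = [hi for lo, hi in boxes]
        return tuple(map(min, zip(*los))), tuple(map(max, zip(*his)))

    def boxes_intersect(a, b):
        # Two comparisons per dimension suffice for axis-parallel boxes.
        (alo, ahi), (blo, bhi) = a, b
        return all(al <= bh and bl <= ah
                   for al, ah, bl, bh in zip(alo, ahi, blo, bhi))

    # Example with three rectangles in the plane:
    rects = [((0, 0), (2, 1)), ((1, 3), (4, 5)), ((3, 2), (6, 4))]
    best_fit_box(rects)                    # ((0, 0), (6, 5))
    boxes_intersect(rects[0], rects[1])    # False: they are disjoint in y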
Experiments have been done with a number of other shapes though. Among them are the set-theoretic difference of two boxes [Ary00], oriented (that is: non-axis-aligned) bounding boxes [Bar96, Got96], spheres [Oos90] (with little success), the intersection of a box and a sphere [Kat97], the Minkowski sum of a box and a sphere [Lar00], a circular section of a spherical shell [Krs98], pie slices [Bar96], and discretely oriented polytopes (k-DOPs) [Jag90, Klo98], for example octagons [Sit99] or bounded aspect ratio k-DOPs [Dun99]. Circles and spheres seem to leave too little freedom to adjust the shape to fit the objects inside. But some of the more complex shapes might actually work well. It is difficult to get a clear picture from comparative studies on this issue. Some authors who compared axis-aligned bounding boxes with discretely oriented octagons (in two dimensions) or oriented bounding boxes (in three dimensions) reported that in the end, axis-aligned bounding boxes often seem to work better, despite the bad fit to the data; see Van den Bergen [Bgn99] and Sitzmann and Stuckey [Sit99]. Sitzmann, however, also reported positive results for octagon hierarchies on data consisting of randomly oriented line segments. The right type of bounding volume might, in fact, depend on the input: some of the non-standard bounding volumes are specifically aimed at fitting the triangles used to approximate smooth surfaces in virtual reality applications. Finding the right type of bounding volume definitely remains a subject for further study.

In our research, we decided to try to establish the best performance that can be achieved with axis-aligned bounding-box hierarchies, both from a theoretical and from a practical point of view.

The structure of the hierarchy. Since a bounding-volume hierarchy is a tree structure by definition, the main choice left is to decide on the degree of the nodes, that is: the number of children and/or input objects stored in a node. The optimal degree depends on the way in which the bounding-volume hierarchy is used. The cost of processing a node in the hierarchy is composed of the cost of accessing the location of the node in memory, the cost of reading the node's child pointers and their bounding volumes, and the cost of the intersection or distance computations on those bounding volumes. If the hierarchy is stored on disk, the access cost tends to be high: the disk's head must be moved to the correct physical location. Once the disk head is at the correct position, a complete block of data is read into main memory at once. Computations on those data are relatively cheap, since these are done in main memory. Therefore, high-degree nodes that fill a full block of data are preferred. On the other hand, if the hierarchy is stored in main memory, our main concern is to keep the number of intersection or distance computations down. For queries that do not yield too many answers, this is best achieved by making many low-degree nodes. For example, two nodes that are irrelevant to our query can often be skipped faster if we construct a parent node that gets these two nodes as its children. A single distance computation on the parent's bounding volume may then reveal that we can skip its two children without doing distance computations on them. Of course, this potential advantage of having many low-degree nodes only materializes if, usually, the parent node will indeed be skipped if both of its children are, and, usually, we do not need to go into both children after all. Whether or not this is actually the case depends on the data, the queries, and the way in which the data objects are distributed in the tree.

Another issue with regard to the structure of the hierarchy is its height. If we want to be able to go from the root to any data object fast, small height is a necessary condition, but not a sufficient one. The main problem is that the bounding volumes of a node's children may intersect. If the object lies inside their intersection, there is no way to tell which child has the object as a descendant. However, small height may still be useful to guarantee that update algorithms can run fast. Most algorithms to insert or delete an object run in time O(h), where h is the height of the tree. Small height is most easily guaranteed by requiring that all leaves are at the same depth. This is a sufficient, but not a necessary, condition to guarantee that the tree has height O(log_t n), where n is the number of objects stored, and t is the minimum degree of the nodes.

Figure 4: A set of rectangles for which an overlap-free hierarchy of degree two is impossible.

The distribution of the objects in the hierarchy. Finally, the way in which the objects are distributed in the hierarchy may have a huge impact on its performance. One of the major issues is that overlap between bounding volumes in the same node can make search paths branch and spread out into large parts of the hierarchy. Therefore, it is important to keep the amount of overlap small. Unfortunately, overlap cannot be avoided completely. Points can always be distributed among the different parts of the hierarchy in such a way that the bounding boxes in a node do not overlap, but with other objects this is not always possible. Figure 4 shows a set of rectangles that does not admit an overlap-free hierarchy of bounding rectangles (or other convex bounding volumes) with nodes of degree two. The only way to avoid overlap is to cut data objects into smaller parts (clipping), but this comes at a cost: it would take more storage space, and while collecting the answers to a query, time may be wasted retrieving pointers to objects which we had found already through another part.

Moreover, minimizing the amount of overlap does not necessarily lead to optimal query efficiency, as is illustrated by the following example. In Figure 5, we divided the line segments into groups of four: each group corresponds to a node just above leaf level in a hierarchy with nodes of degree four. In the top figure, we did the grouping so that we minimize the overlap between the bounding boxes of the nodes. A query with the grey square will visit 8 nodes on this level. In the lower figure, the line segments are grouped in another way. A query with the grey square will now visit only 4 nodes on this level.

Figure 5: Minimizing overlap does not always lead to optimal query efficiency.

If minimizing overlap is not enough to guarantee optimal queries, then how should we distribute or group the objects in the hierarchy? It is this issue that was the main subject of my PhD research.
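The effect described for Figure 5 is easy to check in code: what determines the cost at this level is simply how many node bounding boxes the query rectangle intersects. A minimal sketch of that count (my own; the grouping into nodes and the query rectangle are whatever the caller supplies, nothing is taken from the figure itself):

    def segment_bbox(segments):
        # Axis-parallel bounding box of a group of 2D segments ((x1, y1), (x2, y2)).
        xs = [x for seg in segments for x, y in seg]
        ys = [y for seg in segments for x, y in seg]
        return (min(xs), min(ys)), (max(xs), max(ys))

    def nodes_visited(groups, query):
        # Count the groups (nodes just above leaf level) whose bounding box
        # intersects the axis-parallel query rectangle ((qxlo, qylo), (qxhi, qyhi)).
        (qxlo, qylo), (qxhi, qyhi) = query
        count = 0
        for group in groups:
            (xlo, ylo), (xhi, yhi) = segment_bbox(group)
            if xlo <= qxhi and qxlo <= xhi and ylo <= qyhi and qylo <= yhi:
                count += 1
        return count

For suitably chosen groupings of the same segments and the same query, this count differs in exactly the way described above.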
2.3 R-Trees

We restricted our study to hierarchies that use axis-parallel boxes as bounding volumes. Extending the study to other types of bounding volumes is an obvious subject for further research, but it lies beyond the scope of our work. Bounding-volume hierarchies that use axis-parallel boxes as bounding volumes and have nodes of high degree are also known as R-trees. The R-tree was originally introduced by Guttman [Gut84]. His study has inspired two decades of research about how to distribute the data objects in an R-tree, some authors designing new distribution algorithms from scratch, others suggesting optimization heuristics to be used in conjunction with known methods.

R-trees are parametrized by the maximum degree of the nodes, denoted by t in this paper. This parameter is set to match the characteristics of the hardware used: usually the tree is stored on disk, and t is chosen such that a node fills a full block on the disk. For in-memory applications, smaller values of t would be used. The minimum degree of the nodes is set to a fixed fraction of t; in the R-tree variants studied it ranged from 10% to 50% of t. R-trees usually store data objects in the leaves only and have all leaves on the same level in the tree, although some authors have designed variants where this is not the case (e.g. [Agg97, Kan97, Ros01]).

Essentially three types of algorithms have been designed to distribute the objects in an R-tree:

by repeated insertion: One defines an insertion algorithm that strives to optimize the tree locally; a complete tree is built by inserting the data objects one by one, e.g. [Ang97, Bkr92, Bmn90, Grc98b, Ros01]. Usually, deletion algorithms are provided as well.

by recursive partitioning: One defines an algorithm to distribute any number of data objects among up to t subtrees; the tree is built by applying the partitioning algorithm recursively top-down, e.g. [Agg97, Grc98a, Whi96]. The resulting data structure can be maintained either by using insertion and deletion heuristics as above (and, for example, rebalancing the complete tree during quiet hours), or by using the logarithmic method [Aga01APV].

by linear ordering: One defines a function that maps each data object to a one-dimensional value; the tree can then be built and maintained as a standard B-tree that uses the function values as keys [Brg00, Bhm99, Kam93, Kam94]. A minimal sketch of this approach is given below.

For an extensive survey on R-trees, see Manolopoulos et al. [Man03].
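To make the linear-ordering approach concrete, here is a minimal Python sketch of my own (it is not one of the cited algorithms): it orders the data boxes by the Z-order (Morton) value of their centre points, a simpler stand-in for the Hilbert curve used by Hilbert-R-trees, and packs them into nodes of at most t entries, level by level, much as a bulk-loaded B-tree would. The function names are mine; coordinates are assumed to be integers in [0, 2^bits), and a node is simply a pair (bounding box, children).

    def z_order_key(x, y, bits=16):
        # Interleave the bits of two integer coordinates (Morton code); a simple
        # stand-in for a Hilbert-curve value.
        key = 0
        for i in range(bits):
            key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
        return key

    def enclosing_box(boxes):
        # Minimum axis-parallel bounding box of a list of ((xlo, ylo), (xhi, yhi)).
        return ((min(b[0][0] for b in boxes), min(b[0][1] for b in boxes)),
                (max(b[1][0] for b in boxes), max(b[1][1] for b in boxes)))

    def build_by_linear_order(data_boxes, t):
        # Sort the data boxes along the curve by the key of their centre points,
        # pack t of them per leaf, then repeatedly pack t nodes per parent until
        # a single root remains. Assumes at least one data box.
        order = sorted(data_boxes,
                       key=lambda b: z_order_key((b[0][0] + b[1][0]) // 2,
                                                 (b[0][1] + b[1][1]) // 2))
        level = [(enclosing_box(group), group)
                 for group in (order[i:i + t] for i in range(0, len(order), t))]
        while len(level) > 1:
            level = [(enclosing_box([child[0] for child in group]), group)
                     for group in (level[i:i + t] for i in range(0, len(level), t))]
        return level[0]

Because updates only have to maintain the sorted order of the keys, such a tree can also be kept up to date with standard B-tree insertions and deletions, which is the main appeal of this approach.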
When comparing the query efficiency of R-trees built by such algorithms, one should distinguish between a static environment (the tree is built once and not changed afterwards) and a dynamic environment (the tree is continuously updated). In a dynamic environment it may be very difficult to maintain an ideal distribution of objects over the tree. The insertion of an object can, in principle, change the ideal distribution a lot. To allow for reasonably efficient update operations, one has to relax the ideal a bit. As a result, static trees, built with a partitioning or a linear-ordering algorithm, usually allow for more efficient queries than their dynamic counterparts built with insertion-based algorithms.

Despite the huge body of research on R-trees, until recently very little was known about the query times that can be guaranteed for worst-case data and queries. From Kanth and Singh [Kan98] and De Berg et al. [Brg00] some lower bounds for intersection queries with axis-parallel rectangles were known: query times better than Ω((n/t)^{1-1/d} + k/t) can, in general, not be guaranteed. Here n is the number of data objects, t is the degree of the nodes, k is the number of object bounding boxes intersected, and d is the number of dimensions. There were no algorithms to construct R-trees that can guarantee to do any query faster than a full traversal of the complete hierarchy, even if there are no answers to be reported. The only results in that direction were by De Berg et al. [Brg00], but they could guarantee fast queries only for relatively small query ranges. Other research on R-trees was mainly experimental, or of a statistical nature, making statements about expected query times under certain assumptions on the distribution of the data and/or the queries. To our knowledge, our algorithms [Aga02, Arg04] are the first algorithms that construct R-trees that guarantee worst-case query times better than Θ(n) for all axis-parallel rectangle range queries.

Note that in the bound mentioned above, as well as in all results mentioned below, k is not the number of objects intersected, but the number of data object bounding boxes intersected. If k were the number of data objects intersected, it would be very difficult to prove anything about the efficiency of R-trees. Even if the objects are disjoint, their bounding boxes may in the worst case intersect in a single point, leading to a query time of Θ(n/t). For an example, see Figure 6. In three dimensions, there are sets of line segments such that any hierarchy of convex bounding volumes on such a set needs Ω(n/t) time to answer a query with an axis-parallel line in the worst case [Bar96]. However, even if the objects intersected cannot be identified efficiently in the worst case, this is no reason to give up on at least identifying the object bounding boxes intersected efficiently. From now on, we will assume that the data objects stored in our hierarchies are in fact bounding boxes, and k will be the number of such bounding boxes intersected by the query range.

Figure 6: All bounding boxes of these line segments overlap in a single point. A query with that point needs to examine the complete hierarchy.

2.4 Our results

Given the fact that we use axis-parallel boxes as bounding volumes and given the maximum degree of the nodes, we set out to optimize the structure of the tree for fast intersection queries. We chose to optimize for intersection queries since such queries are an important application to start with, and they are indicative of the efficiency of some other types of queries as well. For example, queries for objects intersecting a rectangle and queries for objects contained in a rectangle visit exactly the same nodes, and nearest-neighbour queries with a point visit exactly those nodes which would be visited by an intersection query with a circle centered on that point and just touching the nearest neighbour. Therefore, a good performance on intersection queries is crucial and can be expected to be a good indication of the performance of several other types of queries. To avoid redundancy in the data structure, we excluded clipping variants from our studies.

Our research has led to three articles: Box-trees and R-trees with near-optimal query time [Aga02], The Priority R-Tree: a practically efficient and worst-case-optimal R-tree [Arg04], and Box-trees for collision checking in industrial installations [Hav04BG]. All of them are, in the first place, about static R-trees, that is: R-trees that are not updated anymore once built (although in [Arg04] we also discuss updates with the logarithmic method).

In the first paper [Aga02] we prove that there are sets of rectangles such that in any R-tree on such a set, there are queries that yield no answers but nevertheless visit Ω((n/t)^{1-1/d}) nodes. It is not so much this bound itself which is interesting: it was already known from Kanth and Singh [Kan98] and De Berg et al. [Brg00]. What is interesting is the type of data that can bring out this worst-case behaviour. We show that such worst-case sets of rectangles and queries exist even if any one or two of the following restrictions apply:

- no point is contained in the bounding boxes of more than a constant number of data rectangles (in other words: they don't overlap much);
- the aspect ratio of the query rectangles is bounded by a constant (in other words: the query rectangles are not extremely long and thin);
- we have only two dimensions.

Only if all three of these restrictions apply does our lower bound construction break down. In fact, for that case we show how to construct R-trees that can answer any axis-parallel rectangle query by visiting O(log^2 n + k) nodes.

Note that all our lower bounds, like the previous bounds by Kanth and Singh [Kan98] and De Berg et al. [Brg00], do not hold for replicating data structures, that is, data structures that may store each object (or a pointer to it) more than once.

In the same paper [Aga02] we also give an algorithm for the construction of axis-aligned-bounding-box hierarchies with nodes of degree two that achieves the optimal query time Θ(n^{1-1/d} + k) in the general case. In a follow-up paper, The Priority R-Tree [Arg04], we extend this method to get optimal Θ((n/t)^{1-1/d} + k/t) query time on nodes of degree t (assuming that I/O-operations dominate). That paper describes the method for nodes of degree t in detail; it is not necessary to read the other paper first to understand it. In the follow-up paper we also present experimental results in two dimensions. The results indicate that our algorithm creates R-trees that are efficient in practice, while being more robust than the heuristic approaches known so far.

One may wonder if it is possible to construct R-trees that combine the good properties of both constructions mentioned above: O((n/t)^{1-1/d} + k/t) query time in the general case, and O(log^2 n + k) query time if the three restrictions mentioned above apply. In the first paper [Aga02] we describe a construction, called the kd-interval tree, that goes a long way towards achieving this goal. A kd-interval tree in two dimensions answers axis-parallel range queries in time O(√(n/t) + k) and point queries in time O(log^2 n + k), provided that the data rectangles don't overlap much. As overlap among the data rectangles increases, the point query performance degenerates gradually into O(√(n/t) + k). One could use similar techniques as in our PR-trees [Arg04] to get a better dependency on the degree t in the k-term.

Our lower bound constructions [Aga02] show that it is not possible to achieve something similar in more than two dimensions: there are sets of disjoint data boxes that make any R-tree that guarantees polylogarithmic query times for point queries spend near-linear time on some (hyper-)cube queries.

In another follow-up paper, Box-trees for collision checking in industrial installations [Hav04BG], we look into the three-dimensional situation further. The data boxes in the lower-bound construction mentioned above do not look extremely strange: they can be arbitrarily close to a unit cube in shape and size. There is one peculiar thing about them though: they must be arranged in such a way that certain cubic query ranges yield no answers while there are a lot of data boxes nearby. It turns out that if we accept that such cases are difficult (but probably rare in practice), and if we accept that certain arrangements of extremely flat data boxes are difficult (but probably rare in practice), we can build a three-dimensional kd-interval tree with polylogarithmic query time for the remaining cases (the cases we expect to find in practice). We prove that these
query times are achieved not only for queries with boxes but also for queries with other query ranges of constant complexity. In Box-trees for collision checking in industrial installations [Hav04BG] we describe how to build a tree with nodes of low degree; one may use the transformation algorithms described in our first paper [Aga02] to transform the tree into a real R-tree with high-degree nodes.

To distinguish between arrangements of boxes that should be handled efficiently, and arrangements of boxes that may be considered difficult, we define the slicing number of a set of data objects as follows: let the slicing number with respect to a cube C be the maximum number of data object bounding boxes that intersect four parallel edges of C; then the overall slicing number is the maximum, over all possible cubes C, of the slicing number with respect to C. A low slicing number means that the data boxes do not overlap much and that there are no arrangements of lots of extremely flat data boxes very close to each other.

The main results for point and axis-aligned rectangle queries can be summarized in Figure 7, where we use the following notation:

n: the total number of data objects in the hierarchy;
k: the number of data object bounding boxes that intersect the query range;
k_ε: (with ε > 0) the number of data object bounding boxes that intersect the query range, or lie within a distance of ε times the diameter of the query range;
t: the maximum degree of the nodes in the hierarchy;
the maximum aspect ratio (width/height or height/width) of the query range.

Results in two dimensions: asymptotic upper bounds O(...)

input rectangles    tree [paper]                  point queries     rectangle queries
disjoint            2D kd-interval [Aga02]        log^2 n           √(n/t) + k
disjoint            2D kd-interval+lsf [Aga02]    log^2 n           log^2 n + k
intersecting        PR [Arg04]                    √(n/t) + k/t      √(n/t) + k/t

2.5 Subjects for further research

The 3-dimensional kd-interval tree mentioned above has good theoretical bounds for low-degree nodes, but when turned into an R-tree (using the technique explained in [Aga02]), the dependency on the degree of the nodes is not as good as one would wish. We cannot yet say if data sets of realistic size and structure will nevertheless bring out the strength of the kd-interval tree, and if so, for what types of data and queries this method would indeed be the method of choice.

In [Arg04] we compare our PR-tree to two variants of the Hilbert-R-tree, which is an R-tree based on ordering objects along the Hilbert space-filling curve [Kam94]. Although the Hilbert-R-tree cannot guarantee worst-case query times, and does not outperform the PR-tree, it still has advantages: it is built faster and it is much easier to implement and maintain. We tested two variants of the Hilbert-R-tree in two dimensions: one in which each data object is represented by its center point, and one in which each data object is represented by a four-dimensional point whose coordinates are those of the object's bounding rectangle. Naturally, the second variant is more robust when the data consists of rectangles. However, the experiments also show that the second variant is weaker on some sets of point objects. It makes one wonder if this unwanted behaviour cannot be avoided. Can we design a space-filling curve, to be used as the basis for an R-tree, which is good for both point and rectangle data?

The next big question that remains is: what is the best type of bounding volume? It might depend on the type of queries we want to perform. Are axis-aligned bounding boxes the best choice for axis-aligned rectangle queries? What would be the best bet for general range queries in two dimensions? Do the results on octagons by Sitzmann and Stuckey [Sit99] suggest that octagons, sometimes helpful, sometimes harmful, are just a little bit too much? Would the optimum be found at discretely oriented hexagons? And how would that be in three dimensions? Dodecahedra?

Our research has been primarily aimed at two- and three-dimensional settings. Our theoretical results are valid for multi-dimensional data as well. Unfortunately, this includes the rather disappointing lower bounds. From this we must conclude that the theoretical approach taken in this thesis, aiming for optimal worst-case query times, may not give us a data structure that is practical for high-dimensional data. In practice, one would like to have a data structure that does not only guarantee optimal query times on the worst possible data, but can also take advantage of easier data to allow for faster queries. Since in many practical situations we do not have worst-case data, this would lead to a data structure that is much faster in practice. We do not know if our data structures take advantage of easy data or fail to do so. For two-dimensional data, it worked out well (in our experiments, the PR-tree does appear to be efficient), but this success does not necessarily carry over to higher dimensions. Handling high-dimensional data may require more study into questions of the type: what is easy data, and how can we design a data structure that simultaneously guarantees worst-case query times and takes advantage of easy data? We have made an attempt to deal with the first question in three dimensions [Hav04BG], but it is doubtful if it makes sense to generalize that approach to higher dimensions. The right questions to ask may depend on the number of dimensions. Typical applications for low-dimensional data include motion planning. There we have objects that may have a shape in, for example, four dimensions (three spatial dimensions and one time dimension). But high-dimensional data more often comes from applications where the data objects have no shape and size, but are just points whose coordinates represent the values of non-geometric properties of the objects.

References

[Aga98E] P. K. Agarwal and J. Erickson: Geometric range searching and its relatives. In B. Chazelle, J. E. Goodman, and R. Pollack (eds.), Advances in Discrete and Computational Geometry.