Path Caching: A Technique for Optimal External Searching

Sridhar Ramaswamy and Sairam Subramanian
Department of Computer Science
Brown University
Providence, Rhode Island 02912

CS-94-27
May 1994
Path Caching: A Technique for Optimal External Searching (Extended Abstract)
Figure 1: Diagonal corner queries, 2-sided, 3-sided and general 2-dimensional range queries.
for answering general 2- and 3-sided queries with optimal I/O. We also apply this technique to improve on existing bounds for segment trees in secondary memory, and to implement a restricted version of interval trees in secondary memory.

The rest of the paper is organized as follows. Section 2 explains the general principles behind path caching by applying it to segment trees. Section 3 explains the application of path caching to priority search trees to obtain reasonably good worst-case bounds for 2-sided range searching. We also present results about the application of path caching to 3-sided range searching, segment trees, and interval trees. Section 4 applies recursion and path caching to 2-sided queries to improve on the bounds of Section 3. It also briefly touches on applying similar ideas to 3-sided queries. Section 5 shows how updates to the data structures can be handled in optimal amortized time. We finally present our conclusions and open problems in Section 6.

2 Path caching

Let us illustrate the idea of path caching by applying it to the main-memory data structure known as the segment tree. The segment tree is an elegant data structure that is used to answer stabbing queries on a collection of intervals. Before we discuss the use of path caching in this context we give a brief description of the segment tree; a more complete treatment can be found in [Ben]. For ease of exposition we will assume that none of the input intervals share any endpoints.

To build a segment tree on a set of n intervals we first build a binary search tree T on the 2n endpoints of the intervals. The endpoints e1, e2, ..., e2n are stored at the leaves of the search tree in sorted order. With each node x in the tree we associate a half-open interval² called the cover-interval of x. If x is a leaf node containing the endpoint ej then the cover-interval of x is the half-open interval [ej, ej+1). If x is an internal node then its cover-interval is the union of the cover-intervals of its children. To answer stabbing queries we store each input interval I in up to 2 log n nodes of the tree. These nodes are called allocation nodes of interval I. A node x is an allocation node of interval I if I contains the cover-interval of x and does not contain the cover-interval of x's parent. The intervals stored at node x are placed in a list CL(x) called the cover-list of x.

² A half-open interval [a, b) contains all the points between a and b, including a but excluding b.

Given a query point q, let P be the path of T from the root to the leaf y such that q is in the cover-interval of y. It is not hard to show that the intervals containing the query point q are exactly those intervals that are stored at the nodes on P. The time required to answer such a query is O(log n + t) (where t is the number of intervals that contain q), which is optimal. The data structure occupies O(n log n) space because each interval is stored in at most 2 log n nodes of the tree.

Let us now try to implement this data structure in secondary storage. Given a block size of B it is easy to see that we require at least t/B I/O's to output all the intervals. Also, it can be shown that we require Ω(log_B n) I/O's to identify the path to y. Thus an ideal implementation of segment trees in secondary memory would require O((n/B) log n) disk blocks of space and would answer stabbing queries with O(log_B n + t/B) I/O's.

To lower the time required to locate the root-to-y path P we can store the tree T in a blocked fashion by mapping subtrees of height log B into disk blocks. The resulting search structure is called the skeletal B-tree and is similar in structure to a B-tree (see Figure 2). With this blocking, and a searching strategy similar to that of B-trees, we can locate a (log B)-sized portion of P with every I/O. If the cover-list of each node is stored in a blocked fashion (with B intervals per block) then we could examine the cover-list CL(x) of each node x on P and retrieve the intervals in CL(x) B at a time. A closer look reveals that this approach could result in a query time of O(log n + t/B). This is because even though we can identify P in O(log_B n) I/O's we still have to do at least O(log n) I/O's, one for each cover-list on the path (see Figure 3). These I/O's may be wasteful if the cover-lists contain fewer than B intervals. To avoid paying the additional log n in the query time we need to avoid wasteful I/O's (ones that return fewer than B intervals) as much as possible. In particular, if the number of wasteful I/O's is smaller than the number of useful I/O's, their cost can be absorbed into the optimal bound.
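Before turning to the secondary-memory difficulties, the main-memory structure just described can be made concrete in code. The following is our own Python rendering (the paper itself gives no code): it builds T over the sorted endpoints, stores each half-open interval at its allocation nodes, and answers a stabbing query by collecting the cover-lists along the root-to-leaf path.

```python
# Illustrative sketch of the main-memory segment tree described above.
# Intervals are half-open [a, b) and are assumed not to share endpoints.

class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi        # cover-interval [lo, hi)
        self.left = self.right = None
        self.cover_list = []             # CL(x)

def build(endpoints):
    """Build the binary search tree T over the sorted endpoints."""
    def rec(i, j):
        node = Node(endpoints[i], endpoints[j])
        if j - i > 1:                    # internal node: split the range
            m = (i + j) // 2
            node.left, node.right = rec(i, m), rec(m, j)
        return node
    return rec(0, len(endpoints) - 1)

def insert(node, a, b):
    """Store [a, b) at its allocation nodes: nodes whose cover-interval
    is contained in [a, b) while the parent's cover-interval is not."""
    if b <= node.lo or node.hi <= a:     # disjoint from this subtree
        return
    if a <= node.lo and node.hi <= b:    # allocation node: stop here
        node.cover_list.append((a, b))
        return
    insert(node.left, a, b)
    insert(node.right, a, b)

def stab(node, q):
    """Report all intervals containing q via the root-to-leaf path P."""
    out = []
    while node is not None:
        out.extend(node.cover_list)
        if node.left is None:            # reached the leaf y
            break
        node = node.left if q < node.left.hi else node.right
    return out
```

A query for q = 3 over the intervals [1, 4), [2, 6), [5, 8) walks one root-to-leaf path and reports exactly the stabbed intervals, mirroring the O(log n + t) bound.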
If we coalesce all the cover-lists on P that have fewer than B elements and store them in a cache C(y) at the leaf y, then we can look at C(y) instead of looking at log n possibly underfull cover-lists. If C(y) is stored in a blocked fashion then retrieving intervals from it causes at most one wasteful I/O instead of log n wasteful ones (see Figure 3). The time for reporting all the intervals would then be 2t/B + 1. This, combined with the time for finding P, gives us the desired query time.

We therefore make the following modification to the segment tree: For each leaf y, identify the underfull cover-lists CL1, ..., CLk (cover-lists that contain fewer than B intervals) along the root-to-y path. Make copies of the intervals in all the underfull cover-lists and store them in a cache C(y) at y. Block C(y) into blocks of size B on secondary memory.

Figure 2: Constructing the skeletal graph.

Figure 3: Underfull cover-lists cause wasteful I/O's.

From the above discussion we can see that using this modified version of the segment tree we can answer stabbing queries with O(log_B n + t/B) I/O's. The only thing left to analyze is the amount of storage required for the modified data structure. The number of disk blocks required to block the search tree T is O(n/B). The total number of intervals in all the cover-lists is O(n log n); these can be stored in O((n/B) log n) disk blocks. At each leaf y we have a cache C(y) that contains up to B log n intervals (from the log n nodes along the root-to-y path). Therefore, to store the caches from the 2n leaves we need 2n log n disk blocks. Putting all of this together, we see that the space required is O(n log n) blocks.
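The effect of coalescing underfull cover-lists can be illustrated with a toy I/O count. This sketch is ours, not the paper's; the block size B, the lists, and the helper names are made up for illustration.

```python
# Toy illustration of path caching: underfull cover-lists (fewer than B
# intervals) along a root-to-leaf path are coalesced into one blocked
# cache C(y), so the one-wasteful-I/O-per-underfull-list cost collapses
# to at most one extra I/O.

import math

B = 4  # block size (illustrative)

def build_cache(cover_lists):
    """Split the path's cover-lists into the coalesced cache C(y) and
    the full lists that are still read directly."""
    cache, full = [], []
    for cl in cover_lists:
        if len(cl) < B:
            cache.extend(cl)         # copied into C(y)
        else:
            full.append(cl)
    return cache, full

def ios_without_cache(cover_lists):
    """One I/O per block of every nonempty cover-list on the path."""
    return sum(math.ceil(len(cl) / B) for cl in cover_lists if cl)

def ios_with_cache(cover_lists):
    """Scan C(y) once (blocked), then only the full cover-lists."""
    cache, full = build_cache(cover_lists)
    cache_ios = math.ceil(len(cache) / B) if cache else 0
    return cache_ios + sum(math.ceil(len(cl) / B) for cl in full)
```

With B = 4 and three underfull lists of one interval each plus one full list of five, the plain scheme spends 5 I/O's while the cached scheme spends 3.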
Figure 4: Binary tree implementation of Priority Search Tree in secondary memory showing corner, ancestor, sibling and sibling's descendant. Here, B is 4.
In the next section, we show how we can combine the idea of path caching with recursion to make the storage required even smaller while keeping queries efficient.

Using similar ideas, we can obtain the following bounds for 3-sided queries, segment trees, and interval trees (details will be given in the full version of the paper).

Theorem 3.3 Given n input points on the plane, path caching can be used to construct a data structure that answers any 3-sided query using O(log_B n + t/B) I/O's. Here t is the output size of the query. The data structure requires O((n/B) log² B) disk blocks of storage.

Theorem 3.4 We can implement Segment Trees in secondary memory so that a point enclosure query containing t intervals can be answered in O(log_B n + t/B) I/O's. For n input intervals, the storage used is O((n/B) log n) disk blocks.

Theorem 3.5 We can implement Interval Trees in secondary memory so that a point enclosure query containing t intervals can be answered in O(log_B n + t/B) I/O's. For n input intervals, the storage used is O((n/B) log B) disk blocks.

In the following sections, we explore 2-sided queries in more detail to improve the bounds obtained in this section. We then discuss algorithms for updating the resulting data structures.

4 Using recursion to reduce the space overhead

In this section we describe how to extend the ideas of Section 3 to develop a recursive data structure that has a much smaller space overhead and still allows queries to be answered in optimal time. Due to space limitations we restrict ourselves to the problem of answering general 2-sided queries by using a secondary memory priority search tree. Similar ideas can be used to get better space overheads for the other data structures as well.

We first describe a two-level scheme for building a secondary memory priority search tree that requires only O((n/B) log log B) storage while still admitting optimal query performance. We then briefly describe a multilevel version of this idea that requires only O((n/B) log* B) storage.

Recall that the basic scheme divides the points into regions of size B. A careful look at this scheme shows that the log B overhead is due to the fact that the ancestor and sibling caches of each of the n/B regions can potentially contain log B blocks. To reduce the space overhead we could either (1) reduce the amount of information stored in each region's cache or (2) reduce the total number of regions.

A closer look shows that to get optimal query time, within any region, we must store the information associated with log B of its ancestors. This is because in the priority tree structure the path length from any block to the root is O(log n). Thus to achieve a query overhead of at most log_B n we must divide such a path into no more than log_B n pieces. Since log_B n = log n / log B, we see that with each node we must store the information associated with O(log B) of its ancestors. We therefore turn to the second idea. To get a linear-space data structure we build a basic priority search tree that divides the points into regions of size B log B instead of B. We thus have n/(B log B) regions. To build the caches associated with each of the regions we proceed in a slightly different fashion. First, we sort the points in each region R right-to-left (i.e., largest to smallest) according to their x-coordinates. We store these points (B at a time) in a list of disk blocks associated with R. In the same fashion we also sort the points top-to-bottom (i.e., largest to smallest y-coordinate) and store them in a list of disk blocks associated with R.
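The blocking just described can be sketched as follows. This is our own Python illustration; B and the point set are invented for the example.

```python
# Sketch of a region's X-list and Y-list: the region's points sorted
# largest-to-smallest by x (right-to-left) and by y (top-to-bottom),
# each cut into disk blocks of B points.

B = 2  # block size (illustrative)

def blocked(points, key):
    """Sort largest-to-smallest by `key` and cut into blocks of B."""
    pts = sorted(points, key=key, reverse=True)
    return [pts[i:i + B] for i in range(0, len(pts), B)]

def region_lists(points):
    """Build the X-list and Y-list of one region R."""
    x_list = blocked(points, key=lambda p: p[0])  # right-to-left by x
    y_list = blocked(points, key=lambda p: p[1])  # top-to-bottom by y
    return x_list, y_list
```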
Thus the points in each region are blocked according to their x- as well as their y-coordinates. These lists are called the X-list and the Y-list of R respectively.

To build the ancestor cache associated with a region R we look at its log B immediate ancestors. From each of the ancestors' X-lists we copy the points in the first block. We then sort all these points right-to-left according to their x-coordinates and store them in a list of disk blocks associated with region R. These blocks constitute the ancestor list (A-list) of R. Similarly, to build the sibling cache (S-list) of R we consider the first blocks from the Y-lists of the siblings of R and its ancestors. Adding up all the storage requirements we get the following lemma.

Lemma 4.1 The total storage required to implement the top-level priority search tree and the associated A-, S-, X-, and Y-lists is O(n/B).

Unlike in the basic scheme we now have regions which contain O(B log B) points, or O(log B) disk blocks. To complete our data structure we therefore build secondary-level structures for each of these regions. For each region we build a priority search tree as per Lemma 3.1. In other words, we divide the region into blocks of size B and build ancestor and sibling caches (as before) for each of the blocks. In this case the height of any path is at most O(log log B); therefore for each region we use all of its ancestors and the siblings to construct the ancestor and sibling caches.

We now count the space overhead incurred due to the priority search trees built for each region. For each block in a given region R we need no more than O(log log B) disk blocks to build its ancestor and sibling caches. This follows from the fact that the priority search tree of R has height O(log log B). A given region R has O(log B) disk blocks; therefore the space required for storing the ancestor and sibling caches of all the blocks in R is O(log B log log B). Adding this over all the regions we get the following lemma.

Lemma 4.2 The total storage required by the two-level data structure is O((n/B) log log B).

4.1 Answering queries using the two-level data structure

We now show how to answer 2-sided queries using this data structure. To answer such a query we proceed as follows: As in the basic scheme we first determine the region R (in the top-level priority search tree) that contains the corner of the query. As discussed in Section 3, the points in the query belong to either the region R, or one of R's ancestors Q, or a sibling T of Q (or R), or to a descendant of some sibling T.

To find the points that are in the ancestors and their siblings we look at the log_B n ancestor and sibling caches along the path from R to the root. From these caches we collect all the points that lie inside the query. However, just looking at these two caches is not enough to guarantee that we have collected all the points from the ancestors and their siblings. This is because we only use one block from each ancestor (and each sibling) to build the A- and S-lists.

To collect the other points in these regions that are in our 2-sided query we examine the X- and Y-lists of the ancestors and their siblings respectively. These lists are examined block by block until we reach a block that is not fully contained in our query. The X-list of an ancestor Q of R is examined if and only if all the points from Q that were in the ancestor cache of R are found to be inside the 2-sided query. Similarly, the Y-list of a sibling T (along the path from R to the root) is examined if and only if the points from T that are in the sibling cache are all inside our 2-sided query.

We claim that this algorithm will correctly find all points in the ancestor and the sibling regions that are in our query. We now show that all the points in the ancestor regions are found correctly; the case for the siblings of the ancestors is similar. Consider some ancestor region Q of R and its associated right-to-left ordering of the points as represented in its X-list. Since Q is an ancestor, it is cut by the vertical line of the 2-sided query (see Figure 4). Therefore, all the points in Q that are to the right of the vertical line are in the query and the rest aren't. Thus, all the points in the query are present in consecutive disk blocks (starting from the first one) in the X-list of Q. We therefore need to examine the ith block in the X-list if and only if all the previous i-1 blocks are completely contained in our query. Since the first block is part of R's ancestor cache, we need to look at the X-list of Q if and only if all of the points of Q that are in the ancestor cache of R are contained in the 2-sided query.

To account for the time taken to find these points we note that there are O(log_B n) caches that we must look at. It is not hard to see that apart from these initial lookups we only look at a disk block from a region Q if our last I/O yielded B points (inside the query) from this region. Therefore all the other I/O's are paid for. Thus the time to find these points is O(log_B n + (tA + tS)/B), where tA and tS denote the number of points contained in the ancestors and their siblings.

To find the points in the query that are in the descendants of the siblings, we use the same approach as the basic scheme. We find the points in all these regions by scanning their Y-lists. We traverse a region Q if and only if its parent P is fully contained in the query. An argument similar to the one above shows that all such points are found by this algorithm, and that the number of I/O's required is O(tD/B). Here tD denotes the number of points contained in the descendants of the siblings.
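The prefix-scan rule for an ancestor's X-list can be sketched in Python (our own illustration, not the paper's). For an ancestor region only the vertical line at x0 cuts it, so, since the X-list is blocked right-to-left, the qualifying points occupy a prefix of the blocks and the scan stops at the first block that is not fully contained in the query.

```python
# Sketch of the block-by-block X-list scan: examine the i-th block only
# if all previous blocks were completely contained in the query, so
# every I/O except possibly the last returns a full block of B points.

def scan_ancestor_x_list(x_list, x0):
    """Report points of an ancestor region with x >= x0.

    x_list: blocks of points sorted right-to-left (decreasing x)."""
    out, ios = [], 0
    for block in x_list:
        ios += 1                       # one I/O per block examined
        hits = [p for p in block if p[0] >= x0]
        out.extend(hits)
        if len(hits) < len(block):     # not fully contained: stop
            break
    return out, ios
```

In the example below only two of the three blocks are read: the second block is the first one not fully inside the query, so the scan stops there.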
To find the points in the query that are in region R we use the second-level priority tree associated with R. Let tR denote the number of points in R that belong to the query. We find these points by asking a 2-sided query inside region R. This requires at most O(tR/B) I/O's (by the arguments in Section 3).

Therefore, the total number of I/O's required to answer a two-sided query is O(log_B n + t/B), where t is the size of the output. This in conjunction with Lemma 4.2 gives us the following theorem.

Theorem 4.3 There exists a secondary memory static implementation of the priority search tree that can be used to answer general 2-sided queries using O(log_B n + t/B) I/O's, where t is the size of the output. This data structure requires O((n/B) log log B) disk blocks of space to store n points.

4.2 A multilevel scheme to further lower the space overhead

It is possible to reduce the space overhead further by using more than two levels. The idea is the same as before. At the second stage, instead of building a basic priority search tree for each region, we build a tree that contains regions of size B log log B and build the X-, Y-, A-, and S-lists the same as before. A three-level scheme gives us a space overhead of O((n/B) log log log B) while maintaining optimum query time. If we carry this multilevel scheme further then we get a data structure with the following bounds.

Theorem 4.4 There exists a secondary memory static implementation of the priority search tree that can be used to answer general 2-sided queries using O(log_B n + t/B + log* B) I/O's, where t is the size of the output. This data structure requires O((n/B) log* B) disk blocks of space to store n points.

These ideas can be applied to 3-sided queries as well, to reduce the space overhead incurred by the sibling caches. In particular we can get the following bounds for answering 3-sided queries.

Theorem 4.5 There exists a secondary memory static implementation of the priority search tree that can be used to answer general 3-sided queries using O(log_B n + t/B + log* B) I/O's, where t is the size of the output. This data structure requires O((n/B) log B log* B) disk blocks of space to store n points.

5 A fully dynamic secondary memory data structure for answering 2-sided queries

In this section we show how to dynamize the two-level scheme discussed in Section 4. Our data structure is fully dynamic, i.e., it can handle both additions and deletions of points. The amortized time bound for an update is O(log_B n). In this abstract we only give a brief overview of the dynamization; the details will be given in the full paper.

Before we describe our dynamic data structure we first discuss an alternate way of visualizing the top-level priority search tree in the two-level scheme. In this tree a node corresponds to a region of size B log B. We partition this priority search tree into subtrees of height log B - log log B. Each such subtree is considered a super node. As in Section 3, in order to build the ancestor and sibling cache of any region R, we only consider those ancestors (and their siblings) of R that are in the same super node as R. Considering subtrees of height log B - log log B (instead of log B) does not change the query times because we are now dealing with regions of size B log B.

The advantage of viewing these subtrees as super nodes is that now we can isolate the duplication of information due to the S- and A-lists to within a super node, since none of the caches ever cross the super node boundary. The layer of super nodes (regions) immediately following the last layer in a super node N can be thought of as children of N in this visualization. Note that each super node N contains B/log B regions and has O(B/log B) super nodes as its children. Also note that the number of super nodes on any path in the tree is O(log_B n).

Our dynamic data structure associates an update buffer U of size B with each super node N in the tree. It also associates an update buffer u (also of size B) with each region R. These buffers are used to store incoming updates until we have collected enough of them to account for rebuilding some structure. To process a query, we first use the algorithm from Section 4 to collect the points that have been entered into the data structure. We then look through the associated update buffers to add new points that have been added and to discard old points that have been deleted by an unprocessed update.

We now need to be careful in claiming optimal query time, because the points that we collect by searching the priority search structures may then have to be deleted when we look at the respective update buffers. However, it can be shown that for every B log B points we collect we can lose at most B points, thus resulting in at most two wasted I/O's for log B useful ones. Therefore, the loss of points due to unprocessed deletes is very small and does not affect the overall query performance.

Whenever an update occurs, we first locate the super node N where the update should be made. The update could be a point insertion or a deletion. We then log the update in the associated update buffer U. If the buffer does not overflow we don't do anything. If U overflows
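The buffering discipline just described can be modeled in miniature. This toy Python model, including the Region and SuperNode classes, is ours; it omits the caches and second-level trees and keeps only the buffer-overflow propagation.

```python
# Toy sketch of update buffering: each super node holds a buffer U of
# size B; on overflow the buffered updates are pushed down to the
# regions that own them and the regions apply their local buffers
# (standing in for rebuilding their X- and Y-lists).

B = 4  # buffer capacity (illustrative)

class Region:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi    # x-range owned by this region
        self.points = set()
        self.buffer = []             # local update buffer u

    def apply_buffered(self):
        for op, p in self.buffer:
            if op == 'ins':
                self.points.add(p)
            else:                    # 'del'
                self.points.discard(p)
        self.buffer.clear()          # stands in for a list rebuild

class SuperNode:
    def __init__(self, regions):
        self.regions = regions
        self.buffer = []             # update buffer U

    def update(self, op, p):
        self.buffer.append((op, p))
        if len(self.buffer) > B:     # overflow: propagate updates down
            for op2, q in self.buffer:
                r = next(r for r in self.regions
                         if r.lo <= q[0] < r.hi)
                r.buffer.append((op2, q))
            self.buffer.clear()
            for r in self.regions:
                r.apply_buffered()
```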
then we take all the updates collected and propagate them to the regions in N where they should go. For instance, a point insertion is trickled down to the region of N that contains its coordinates. In each such region we then log the update into the local update buffer u associated with it. We now rebuild the X- and Y-lists of each region in N, taking into account the updates that have percolated into it. We also rebuild each region's ancestor and sibling caches, again taking into account the updates that have percolated into that region.

The number of I/O's required to rebuild the A-, S-, X-, and Y-lists of one region R is O(log B). Therefore, the number of I/O's required to rebuild all the caches is (B/log B) · O(log B) = O(B). Since we do this only once in B updates, the amortized cost of rebuilding the caches is O(1) per update.

If none of the region buffers in N overflows, we don't do anything further. Otherwise, for each region R whose update buffer has overflowed we rebuild the second-level priority search tree associated with it. As before, we take into account the updates in the buffer u of R.

The number of I/O's required to rebuild the priority search tree associated with a given region R is O(log B log log B). This is because we need to rebuild the caches of log B blocks, each of which could contain up to log log B disk blocks of information. Since this is done only once every B updates, the amortized time per update is O((log B log log B)/B) = O(1).

For every super node N, once every B log B updates we do a rebuild. This rebuilding keeps the same x-division as the old one. Keeping the x-divisions the same, we change the y-lines of the regions so that each region now contains exactly B log B points. Using this new region structure we rebuild the A-, S-, X-, and Y-lists for each region, as well as the secondary-level priority search trees associated with each region. Note that it is important to keep the same x-division to preserve the underlying binary structure of the priority search tree: we cannot view the regions in a given super node in isolation since they are part of a bigger priority search tree.

To keep the invariant that each region in N contains B log B points we may have to push points into its children or we may have to borrow points from them. These are logged as updates in the corresponding super nodes. Pushing points into a node is equivalent to adding points to that region, while borrowing points from a node is the same as deleting points from the region. These updates may then cause an overflow in the buffers associated with one or more of those super nodes. We repeat the same process described above with any such super node.

It is easy to show that the number of I/O's required to rebuild a super node is O(B log log B). Since a rebuild is done once every B log B updates, the amortized time required for one such rebuild is O(1). However, since we push updates down when we rebuild super nodes, we may have to do up to log_B n rebuilds (along an entire path) due to a single overflow. Therefore, the amortized cost of a rebuild is O(log_B n).

A moment of thought reveals that pushing points down is not enough to keep the priority search tree balanced. Repeated additions or deletions to one side can make subtrees unbalanced. We therefore periodically rebuild subtrees in the following manner. With each node in the tree we associate a size, which is the number of points in the subtree rooted at that node. We say that a node is unbalanced if the size of one of its children is more than twice the size of its other child. Whenever this happens at a node R we rebuild the subtree rooted at R. The number of I/O's required to rebuild a priority search tree with x points is O((x/B) log_B x + (x/B) log log B). This is because we need to rebuild the secondary-level priority search trees as well as the primary-level tree, along with all the caches at both levels. Since a subtree of size x can get unbalanced only after O(x) updates, we get an amortized rebuilding time of O((log_B x + log log B)/B) = O(1).

Summing up all the I/O's that are required to rebuild various things, we see that the total amortized time for an update is O(log_B n). We therefore have the following theorem.

Theorem 5.1 There exists a fully dynamic secondary memory implementation of the priority search tree that can be used to answer general 2-sided queries using O(log_B n + t/B) I/O's, where t is the size of the output. The amortized I/O-complexity of processing both deletions and additions of points is O(log_B n). This data structure requires O((n/B) log log B) disk blocks of space to store n points.

Similar ideas can be used to get a dynamic data structure for answering 3-sided queries as well. The time to answer queries is still optimal but the time to process updates is not as good. In particular we get the following bounds for answering 3-sided queries.

Theorem 5.2 There exists a fully dynamic secondary memory implementation of the priority search tree that can be used to answer general 3-sided queries using O(log_B n + t/B) I/O's, where t is the size of the output. The amortized I/O-complexity of processing both deletions and additions of points is O(log_B n log² B). This data structure requires O((n/B) log B log log B) disk blocks of space to store n points.
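The weight-balance rule used in this section (rebuild a subtree when one child holds more than twice the points of the other) can be sketched as follows. The TNode class and the even-split rebuild are our own simplifications; only the subtree sizes are modeled.

```python
# Minimal sketch of the rebalancing trigger: track the number of points
# under each child, and rebuild (here: re-split the points evenly) as
# soon as one child's size exceeds twice the other's.

class TNode:
    def __init__(self, left_size, right_size):
        self.left_size, self.right_size = left_size, right_size
        self.rebuilds = 0            # how many times we rebuilt

    def unbalanced(self):
        return (self.left_size > 2 * self.right_size or
                self.right_size > 2 * self.left_size)

    def insert(self, go_left):
        if go_left:
            self.left_size += 1
        else:
            self.right_size += 1
        if self.unbalanced():
            self.rebuild()

    def rebuild(self):
        # Stand-in for rebuilding the subtree (and all its caches).
        total = self.left_size + self.right_size
        self.left_size = total // 2
        self.right_size = total - total // 2
        self.rebuilds += 1
```

Starting balanced at (4, 4), five one-sided insertions reach (9, 4), which violates the factor-of-two rule and triggers a single rebuild back to an even split; this matches the claim that a subtree of size x becomes unbalanced only after O(x) updates.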
6 Conclusions and open problems

Special cases of 2-dimensional range searching have many applications in databases. We have presented a technique called path caching which can be used to implement many main-memory data structures for these problems in secondary memory. Our data structures have optimal query performance at the expense of a slight overhead in storage. Furthermore, our technique is simple enough to allow inserts and deletes in optimal or near-optimal amortized time.

There seem to be some fundamental obstacles to implementing many main-memory data structures in secondary memory. We believe that studying space-time tradeoffs, as we have done, is important in understanding the complexities of secondary storage structures. The hope is that this will eventually help us develop efficient data structures that will provide good worst-case bounds on querying as well as update times. As of today, we have to rely on heuristics (that may or may not perform well at all times) to handle many of these problems.

Specifically, the important problem of dynamic interval management that we highlighted in [KRV] remains open. Can we solve this problem optimally using O(n/B) storage, answering queries in O(log_B n + t/B) time, while being able to perform updates in O(log_B n) worst-case time?

Acknowledgments: We thank Paris Kanellakis for helpful discussions on this area.

References

[BaM] R. Bayer and E. McCreight, "Organization of Large Ordered Indexes," Acta Informatica 1 (1972), 173-189.

[Ben] J. L. Bentley, "Algorithms for Klee's Rectangle Problems," Dept. of Computer Science, Carnegie Mellon Univ., unpublished notes, 1977.

[BlGa] G. Blankenagel and R. H. Güting, "XP-Trees: External Priority Search Trees," FernUniversität Hagen, Informatik-Bericht Nr. 92, 1990.

[BlGb] G. Blankenagel and R. H. Güting, "External Segment Trees," FernUniversität Hagen, Informatik-Bericht, 1990.

[ChT] Y.-J. Chiang and R. Tamassia, "Dynamic Algorithms in Computational Geometry," Proceedings of the IEEE, Special Issue on Computational Geometry 80(9) (1992), 362-381.

[Cod] E. F. Codd, "A Relational Model for Large Shared Data Banks," CACM 13(6) (1970), 377-387.

[Com] D. Comer, "The Ubiquitous B-tree," Computing Surveys 11(2) (1979), 121-137.

[Edea] H. Edelsbrunner, "A New Approach to Rectangle Intersections, Part I," Int. J. Computer Mathematics 13 (1983), 209-219.

[Edeb] H. Edelsbrunner, "A New Approach to Rectangle Intersections, Part II," Int. J. Computer Mathematics 13 (1983), 221-229.

[Gun] O. Günther, "The Design of the Cell Tree: An Object-Oriented Index Structure for Geometric Databases," Proc. of the Fifth Int. Conf. on Data Engineering (1989), 598-605.

[Gut] A. Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching," Proc. 1984 ACM-SIGMOD Conference on Management of Data (1984), 47-57.

[IKO] C. Icking, R. Klein, and T. Ottmann, "Priority Search Trees in Secondary Memory (Extended Abstract)," Lecture Notes in Computer Science #314, Springer-Verlag, 1988.

[KKR] P. C. Kanellakis, G. M. Kuper, and P. Z. Revesz, "Constraint Query Languages," Proc. 9th ACM PODS (1990), 299-313.

[KRV] P. C. Kanellakis, S. Ramaswamy, D. E. Vengroff, and J. S. Vitter, "Indexing for Data Models with Constraints and Classes," Proc. 12th ACM PODS (1993), 233-243. (A complete version of the paper appears in Technical Report 93-21, Brown University.)

[KKD] W. Kim, K. C. Kim, and A. Dale, "Indexing Techniques for Object-Oriented Databases," in Object-Oriented Concepts, Databases, and Applications, W. Kim and F. H. Lochovsky, eds., Addison-Wesley, 1989, 371-394.

[KiL] W. Kim and F. H. Lochovsky, eds., Object-Oriented Concepts, Databases, and Applications, Addison-Wesley, 1989.

[LoS] D. B. Lomet and B. Salzberg, "The hB-Tree: A Multiattribute Indexing Method with Good Guaranteed Performance," ACM Transactions on Database Systems 15(4) (1990), 625-658.

[LOL] C. C. Low, B. C. Ooi, and H. Lu, "H-trees: A Dynamic Associative Search Index for OODB," Proc. ACM SIGMOD (1992), 134-143.

[McC] E. M. McCreight, "Priority Search Trees," SIAM Journal on Computing 14(2) (1985), 257-276.

[NHS] J. Nievergelt, H. Hinterberger, and K. C. Sevcik, "The Grid File: An Adaptable, Symmetric Multikey File Structure," ACM Transactions on Database Systems 9(1) (1984), 38-71.

[Ore] J. A. Orenstein, "Spatial Query Processing in an Object-Oriented Database System," Proc. ACM SIGMOD (1986), 326-336.

[OSB] M. H. Overmars, M. H. M. Smid, M. T. de Berg, and M. J. van Kreveld, "Maintaining Range Trees in Secondary Memory, Part I: Partitions," Acta Informatica 27 (1990), 423-452.

[Rob] J. T. Robinson, "The K-D-B Tree: A Search Structure for Large Multidimensional Dynamic Indexes," Proc. ACM SIGMOD (1984), 10-18.

[Sama] H. Samet, Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS, Addison-Wesley, 1989.

[Samb] H. Samet, The Design and Analysis of Spatial Data Structures, Addison-Wesley, 1989.

[SRF] T. Sellis, N. Roussopoulos, and C. Faloutsos, "The R+-Tree: A Dynamic Index for Multi-Dimensional Objects," Proc. 1987 VLDB Conference, Brighton, England (1987).

[SmO] M. H. M. Smid and M. H. Overmars, "Maintaining Range Trees in Secondary Memory, Part II: Lower Bounds," Acta Informatica 27 (1990), 453-480.

[ZdM] S. Zdonik and D. Maier, Readings in Object-Oriented Database Systems, Morgan Kaufmann, 1990.