Multidimensional Access Methods
VOLKER GAEDE
IC-Parc, Imperial College, London
AND
OLIVER GÜNTHER
Humboldt-Universität, Berlin
Search operations in databases require special support at the physical level. This is
true for conventional databases as well as spatial databases, where typical search
operations include the point query (find all objects that contain a given search
point) and the region query (find all objects that overlap a given search region).
More than ten years of spatial database research have resulted in a great variety
of multidimensional access methods to support such operations. We give an
overview of that work. After a brief survey of spatial data management in general,
we first present the class of point access methods, which are used to search sets of
points in two or more dimensions. The second part of the paper is devoted to
spatial access methods to handle extended objects, such as rectangles or polyhedra.
We conclude with a discussion of theoretical and experimental results concerning
the relative performance of various approaches.
This work was partially supported by the German Research Society (DFG/SFB 373) and by the ESPRIT
Working Group CONTESSA (8666).
Authors’ address: Institut für Wirtschaftsinformatik, Humboldt-Universität zu Berlin, Spandauer Str.
1, 10178 Berlin, Germany; email: {gaede,guenther}@wiwi.hu-berlin.de.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted
without fee provided that the copies are not made or distributed for profit or commercial advantage, the
copyright notice, the title of the publication, and its date appear, and notice is given that copying is by
permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to
lists, requires prior specific permission and/or a fee.
© 1998 ACM 0360-0300/98/0600–0170 $05.00
Several shorter surveys have been published previously in various Ph.D. theses such as
Ooi [1990], Kolovson [1990], Oosterom [1990], and Schiwietz [1993]. Widmayer [1991]
gives an overview of work published before 1991. Like the thesis by Schiwietz, however,
his survey is available only in German. Samet’s books [1989, 1990] present the state of
the art until 1989. However, they primarily cover quadtrees and related data structures.
Lomet [1991] discusses the field from a systems-oriented point of view.

The remainder of the article is organized as follows. Section 2 discusses some basic
properties of spatial data and their implications for the design and implementation of
spatial databases. Section 3 gives an overview of some traditional data structures that
had an impact on the design of multidimensional access methods. Sections 4 and 5 form
the core of this survey, presenting a variety of point access methods (PAMs) and spatial
access methods (SAMs), respectively. Some remarks about theoretical and experimental
analyses are contained in Section 6, and Section 7 concludes the article.

2. ORGANIZATION OF SPATIAL DATA

2.1 What Is Special About Spatial?

To obtain a better understanding of the requirements in spatial database systems, we
first discuss some basic properties of spatial data. First, spatial data have a complex
structure. A spatial data object may be composed of a single point or several thousands
of polygons,
hold the complete database in main memory. Therefore, access methods need to integrate
secondary and tertiary storage in a seamless manner.

(3) Broad range of supported operations. Access methods should not support just one
    particular type of operation (such as retrieval) at the expense of other tasks (such
    as deletion).

(4) Independence of the input data and insertion sequence. Access methods should
    maintain their efficiency even when input data are highly skewed or the insertion
    sequence is changed. This point is especially important for data that are
    distributed differently along the various dimensions.

(5) Simplicity. Intricate access methods with many special cases are often error-prone
    to implement and thus not sufficiently robust for large-scale applications.

(6) Scalability. Access methods should adapt well to database growth.

(7) Time efficiency. Spatial searches should be fast. A major design goal is to meet the
    performance characteristics of one-dimensional B-trees: first, access methods should
    guarantee a logarithmic worst-case search performance for all possible input data
    distributions regardless of the insertion sequence and second, this worst-case
    performance should hold for any combination of the d attributes.

(8) Space efficiency. An index should be small in size compared to the data to be
    addressed and therefore guarantee a certain storage utilization.

(9) Concurrency and recovery. In modern databases where multiple users concurrently
    update, retrieve, and insert data, access methods should provide robust techniques
    for transaction management without significant performance penalties.

(10) Minimum impact. The integration of an access method into a database system should
    have minimum impact on existing parts of the system.

2.2 Definitions and Queries

We have already introduced the term multidimensional access methods to denote the large
class of access methods that support searches in spatial databases and are the subject
of this survey. Within this class, we distinguish between point access methods (PAMs)
and spatial access methods (SAMs). Point access methods have primarily been designed to
perform spatial searches on point databases (i.e., databases that store only points).
The points may be embedded in two or more dimensions, but they do not have a spatial
extension. Spatial access methods, however, can manage extended objects, such as lines,
polygons, or even higher-dimensional polyhedra. In the literature, one often finds the
term spatial access method referring to what we call multidimensional access method.
Other terms used for this purpose include spatial index or spatial index structure.

We generally assume that the given objects are embedded in d-dimensional Euclidean space
$E^d$ or a suitable subspace thereof. In this article, this space is also referred to as
the universe or original space. Any point object stored in a spatial database has a
unique location in the universe, defined by its d coordinates. Unless the distinction is
essential, we use the term point both for locations in space and for point objects
stored in the database. Note, however, that any point in space can be occupied by
several point objects stored in the database.

A (convex) d-dimensional polytope P in $E^d$ is defined to be the intersection of some
finite number of closed halfspaces in $E^d$, such that the dimension of the smallest
affine subspace containing P is d. If $a \in E^d - \{0\}$ and $c \in E^1$ then the
$(d - 1)$-dimensional set $H(a, c) = \{x \in$
smaller than the Cartesian product $R \times S$.

3. BASIC DATA STRUCTURES

3.1 One-Dimensional Access Methods

Classical one-dimensional access methods are an important foundation for almost all
multidimensional access methods. Although the related surveys by Knott [1975] and Comer
[1979] are somewhat dated, they represent good coverage of the different approaches. In
practice, the most common one-dimensional structures include linear hashing [Litwin
1980; Larson 1980], extendible hashing [Fagin et al. 1979], and the B-tree [Bayer and
McCreight 1972]. Hierarchical access methods such as the B-tree are scalable and behave
well in the case of skewed input; they are nearly independent of the distribution of the
input data. This is not necessarily true for hashing techniques, whose performance may
degenerate depending on the given input data and hash function. This problem is
aggravated by the use of order-preserving hash functions [Orenstein 1983; Garg and
Gotlieb 1986] that try to preserve neighborhood relationships between data items in
order to support range queries. As a result, highly skewed data keep accumulating at a
few selected locations in image space.

3.1.1 Linear Hashing [Larson 1980; Litwin 1980]. Linear hashing divides the universe
$[A, B)$ of possible hash values into binary intervals of size $(B - A)/2^k$ or
$(B - A)/2^{k+1}$ for some $k \geq 0$. Each interval corresponds to a bucket, that is, a
collection of records stored on a disk page. $t \in [A, B)$ is a pointer that separates
the smaller intervals from the larger ones: all intervals of size $(B - A)/2^{k+1}$ are
to the left of t and all intervals of size $(B - A)/2^k$ are to the right of t. If a
bucket reaches its capacity due to an insertion, the interval $[t, t + (B - A)/2^k)$ is
split into two subintervals of equal size, and t is advanced to the next large interval
remaining. Note that the split interval need not be the same interval as the one that
caused the split; consequently, there is no guarantee that the split relieves the bucket
in question from its overload. If an interval contains more objects than bucket capacity
permits, the overload is stored on an overflow page, which is linked to the original
page. When $t = B$, the file has doubled and all intervals have the same length
$(B - A)/2^{k+1}$. In this case we reset the pointer t to A and resume the split
procedure for the smaller intervals.

3.1.2 Extendible Hashing [Fagin et al. 1979]. Like linear hashing, extendible hashing
organizes the data in binary intervals, here called cells. Overflow pages are avoided in
extendible hashing by using a central directory. Each cell has an index entry in that
directory; it initially corresponds to one bucket. If during an insertion a bucket at
maximal depth exceeds its maximum capacity, all cells are split into two. New index
entries are created and the directory doubles in size. Since not every bucket was at
full capacity before the split, it may now be possible to fit more than one cell in the
same bucket. In that case, adjacent cells are regrouped in data regions and stored on
the same disk page. In the case of skewed data this may lead to a situation where
numerous directory entries exist for the same data region (and therefore the same disk
page). Even in the case of uniformly distributed data, the average directory size is
$\Theta(n^{1+1/b})$ and therefore superlinear [Flajolet 1983]. Here b denotes the bucket
size and n is the number of index entries. Exact match searches take no more than two
page accesses: one for the directory and one for the bucket with the data. This is more
than the best-case performance of linear hashing, but better than the worst case.

Besides the potentially poor space utilization of the index, extendible hashing also
suffers from a nonincremental
growth of the index due to the doubling steps. To address these problems, Lomet [1983]
proposed a technique called bounded-index extendible hashing. In this proposal, the
index grows as in extendible hashing until its size reaches a predetermined maximum;
that is, the index size is bounded. Once this limit is reached while inserting new
items, bounded-index extendible hashing starts doubling the data bucket size rather than
the index size.

3.1.3 The B-Tree [Bayer and McCreight 1972]. Unlike hashing schemes, the B-tree and its
variants [Comer 1979] organize the data in a hierarchical manner. B-trees are balanced
trees that correspond to a nesting of intervals. Each node $n$ corresponds to a disk
page $D(n)$ and an interval $I(n)$. If $n$ is an interior node then the intervals
$I(n_i)$ corresponding to the immediate descendants of $n$ are mutually disjoint subsets
of $I(n)$. Leaf nodes contain pointers to data items; depending on the type of B-tree,
interior nodes may do so as well. B-trees have an upper and lower bound for the number
of descendants of a node. The lower bound prevents the degeneration of trees and leads
to an efficient storage utilization. Nodes whose number of descendants drops below the
lower bound are deleted and their contents distributed among the adjacent nodes at the
same tree level. The upper bound follows from the fact that each tree node corresponds
to exactly one disk page. If during an insertion a node reaches its capacity, it is
split in two. Splits may propagate up the tree. As the size of the intervals depends on
the given data (and the insertion sequence), the B-tree is an adaptive data structure.
For uniformly distributed data, however, extendible as well as linear hashing outperform
the B-tree on the average for exact match queries, insertions, and deletions.

3.2 Main Memory Structures

Early multidimensional access methods did not take into account paged secondary memory
and are therefore less suited for large spatial databases. In this section, we review
several of these fundamental data structures, which are adapted and incorporated in
numerous multidimensional access methods. To illustrate the methods, we introduce a
small scenario that we use as a running example throughout this survey. The scenario,
depicted in Figure 9, contains 10 points $p_i$ and 10 polygons $r_i$, randomly
distributed in a finite two-dimensional universe. To represent polygons, we often use
their centroids $c_i$ (not pictured) or their minimum bounding boxes (MBBs) $m_i$. Note
that the quality of the MBB approximation varies considerably. The MBB $m_8$, for
example, provides a fairly tight fit, whereas $r_5$ is only about half as large as its
MBB $m_5$.

3.2.1 The k-d-Tree [Bentley 1975, 1979]. One of the most prominent d-dimensional data
structures is the k-d-tree. The k-d-tree is a binary search tree that represents a
recursive subdivision of the universe into subspaces by means of $(d - 1)$-dimensional
hyperplanes. The hyperplanes are iso-oriented, and their direction alternates among the
d possibilities. For $d = 3$, for example, splitting hyperplanes are alternately
perpendicular to the x-, y-, and z-axes. Each splitting hyperplane has to contain at
least one data point, which is used for its representation in the tree. Interior nodes
have one or two descendants each and function as discriminators to guide the search.
Searching and insertion of new points are straightforward operations. Deletion is
somewhat more complicated and may cause a reorganization of the subtree below the data
point to be deleted.

Figure 10 shows a k-d-tree for the running example. Because the tree can only handle
points, we represent the polygons by their centroids $c_i$. The first splitting line is
the vertical line crossing $c_3$. We therefore store $c_3$ in the root of the
corresponding k-d-tree. The next splits occur along horizontal lines crossing $p_{10}$
(for the left subtree) and $c_7$ (for the right subtree), and so on.

One disadvantage of the k-d-tree is that the structure is sensitive to the order in
which the points are inserted. Another one is that data points are scattered all over
the tree. The adaptive k-d-tree [Bentley and Friedman 1979] mitigates these problems by
choosing a split such that one finds about the same number of elements on both sides.
Although the splitting hyperplanes are still parallel to the axes, they need not contain
a data point and their directions need not be strictly alternating anymore. As a result,
the split points are not part of the input data; all data points are stored in the
leaves. Interior nodes contain the dimension (e.g., x or y) and the coordinate of the
corresponding split. Splitting is continued recursively until each subspace contains
only a certain number of points. The adaptive k-d-tree is a rather static structure; it
is obviously difficult to keep the tree balanced in the presence of frequent insertions
and deletions. The structure works best if all the data are known a
tive impact on the tree performance. BSP-trees also have higher space requirements,
since storing an arbitrary hyperplane per split occupies more storage space than a
simple discriminator, which is typically just a real number.

3.2.3 The BD-Tree [Ohsawa and Sakauchi 1983]. The BD-tree is a binary tree representing
a subdivision of the data space into interval-shaped regions. Each of those regions is
encoded in a bit string and associated with one of the BD-tree nodes. Here, these bit
strings are called DZ-expressions; they are also known as Peano codes, ST_MortonNumber,
or z-values (cf. Section 5.1.2).

Given a region R, one computes the corresponding DZ-expression as follows. For
simplicity we restrict this presentation to the two-dimensional case; we also assume
that the first subdividing hyperplane is a vertical line. If R lies to the left of that
line, the first bit of the corresponding DZ-expression is 0; otherwise it is 1. In the
next step, we subdivide the subspace containing R by a horizontal line. If R lies below
that line, the second bit of the DZ-expression is 0, otherwise it is 1. As this
decomposition progresses, we obtain one bit per splitting line. Bits at odd positions
refer to vertical lines and bits at even positions to horizontal lines, which explains
why this scheme is often referred to as bit interleaving.

To avoid the storage utilization problems that are often associated with a strictly
regular partitioning, the BD-tree employs a more flexible splitting policy. Here one can
split a node by making an interval-shaped excision from the corresponding region. The
two child nodes of the node to be split will then have different interpretations: one
represents the excision; the other one represents the remainder of the original region.
Note that the remaining region is no longer interval-shaped. With this policy, the
BD-tree can guarantee that, after node splitting, each of the data buckets contains at
least one third of the original entries.

Figure 13 shows a BD-tree for the running example. An excision is always represented by
the left child of the node that was split.

For an exact match we first compute the full bit-interleaved prefix of the search
record. Starting from the root, we recursively compare this prefix with the stored
DZ-expressions of each internal node. If it matches, we follow the corresponding link;
otherwise we follow the other link until we reach the leaf level of the BD-tree. More
sophisticated algorithms were proposed later by Dandamudi and Sorenson [1986, 1991].

3.2.4 The Quadtree. The quadtree with its many variants is closely related to the
k-d-tree. For an extensive discussion of this structure, see Samet [1984,
1990a,b]. Although the term quadtree usually refers to the two-dimensional variant, the
basic idea applies to an arbitrary d. Like the k-d-tree, the quadtree decomposes the
universe by means of iso-oriented hyperplanes. An important difference, however, is the
fact that quadtrees are not binary trees anymore. In d dimensions, the interior nodes of
a quadtree have $2^d$ descendants, each corresponding to an interval-shaped partition of
the given subspace. These partitions do not have to be of equal size, although that is
often the case. For $d = 2$, for example, each interior node has four descendants, each
corresponding to a rectangle. These rectangles are typically referred to as the NW, NE,
SW, and SE (northwest, etc.) quadrants. The decomposition into subspaces is usually
continued until the number of objects in each partition is below a given threshold.
Quadtrees are therefore not necessarily balanced; subtrees corresponding to densely
populated regions may be deeper than others.

Searching in a quadtree is similar to searching in an ordinary binary search tree. At
each level, one has to decide which of the four subtrees need be included in the future
search. In the case of a point query, typically only one subtree qualifies, whereas for
range queries there are often several. We repeat this search step recursively until we
reach the leaves of the tree.

Finkel and Bentley [1974] proposed one of the first quadtree variants: the point
quadtree, essentially a multidimensional binary search tree. The point quadtree is
constructed consecutively by inserting the data points one by one. For each point, we
first perform a point search. If we do not find the point in the tree, we insert it into
the leaf node where the search has terminated. The corresponding partition is divided
into $2^d$ subspaces with the new point at the center. The deletion of a point requires
the restructuring of the subtree below the corresponding quadtree node. A simple way to
achieve this is to reinsert all points into the subtree. Figure 14 shows a
two-dimensional point quadtree for the running example.

Another popular variant is the region quadtree [Samet 1984]. Region quadtrees are based
on a regular decomposition of the universe; that is, the $2^d$ subspaces resulting from
a partition are always of equal size. This greatly facilitates searches. For the running
example, Figure 15 shows how region quadtrees can be used to represent sets of points.
Here the threshold for the number of points in any given subspace was set to one. In
more complex versions of the region quadtree, such as the PM quadtree [Samet and Webber
1985], it is also possible to store polygonal data directly. PM quadtrees divide the
quadtree regions (and the data objects in them) until they contain only a small number
of polygon edges or vertices. These edges or vertices (which together form
an exact description of the data objects) are then attached to the leaves of the tree.
Another class of quadtree structures has been designed for the management of collections
of rectangles; see Samet [1988] for a survey.

4. POINT ACCESS METHODS

The multidimensional data structures presented in the previous section do not take
secondary storage management into account explicitly. They were originally designed for
main memory applications where all the data are available without accessing the disk.
Despite growing main memories, this is of course not always the case. In many spatial
database applications, such as geography, the amount of data to be managed is
notoriously large. One can certainly use main memory structures for data that reside on
disk, but their performance is often considerably below the optimum because there is no
control over how the operating system performs the disk accesses. The access methods
presented in this and the following section have been designed with secondary storage
management in mind. Their operations are closely coordinated with the operating system
to ensure that overall performance is optimized.

As mentioned before, we first present a selection of point access methods. Usually, the
points in the database are organized in a number of buckets, each of which corresponds
to a disk page and to some subspace of the universe. The subspaces (often called data
regions, bucket regions, or simply regions, even
that objects located close to each other in original space should be likely to be stored
close together on the disk. This could contribute substantially to minimizing the number
of disk accesses per range query. We begin our presentation with several structures
based on extendible hashing. Structures based on linear hashing are discussed in Section
4.1.5. The discussion of two hybrid methods, the BANG file and the buddy tree, is
postponed until Section 4.2.

4.1.1 The Grid File [Nievergelt et al. 1981]. As a typical representative of an access
method based on hashing, we first discuss the grid file and some of its variants.² The
grid file superimposes a d-dimensional orthogonal grid on the universe. Because the grid
is not necessarily regular, the resulting cells may be of different shapes and sizes. A
grid directory associates one or more of these cells with data buckets, which are stored
on one disk page each. Each cell is associated with one bucket, but a bucket may contain
several adjacent cells. Since the directory may grow large, it is usually kept on
secondary storage. To guarantee that data items are always found with no more than two
disk accesses for exact match queries, the grid itself is kept in main memory,
represented by d one-dimensional arrays called scales.

² See Hinrichs [1985], Ouksel [1985], Whang and Krishnamurthy [1985], Six and Widmayer
[1988], and Blanken et al. [1990].

Figure 16 shows a grid file for the running example. We assume bucket capacity to be
four data points. The center of the figure shows the directory with
scales on the x- and y-axes. The data points are displayed in the directory for
demonstration purposes only; they are not, of course, stored there. In the lower left
part, four cells are combined into a single bucket, represented by four pointers to a
single page. There are thus four directory entries for the same page, which illustrates
a well-known problem of the grid file: it suffers from a superlinear growth of the
directory even for data that are uniformly distributed [Regnier 1985; Widmayer 1991].
The bucket region containing the point $c_5$ could have been merged with one of the
neighboring buckets for better storage utilization. We present various merging
strategies later, when we discuss the deletion of data points.

To answer an exact match query, one first uses the scales to locate the cell containing
the search point. If the appropriate grid cell is not in main memory, one disk access is
necessary. The loaded cell contains a reference to the page where possibly matching data
can be found. Retrieving this page may require another disk access. Altogether, no more
than two page accesses are necessary to answer this query. For a range query, one must
examine all cells that overlap the search region. After eliminating duplicates, one
fetches the corresponding data pages into memory for more detailed inspection.

To insert a point, one first performs an exact match query to locate the cell and the
data page $n_i$ where the entry should be inserted. If there is sufficient space left on
$n_i$, the new entry is inserted. If not, we have to distinguish two cases, depending on
the number of grid cells that point to the data page where the new data item is to be
inserted. If there are several, one checks whether an existing hyperplane stored in the
scales can be used for splitting the data page successfully. If so, a new data page is
allocated and the data points are distributed accordingly among the data pages. If none
of the existing hyperplanes is suitable, or if only one grid cell points to the data
page in question, a splitting hyperplane H is introduced and a new data page $n_j$ is
allocated. The new entry and the entries of the original page $n_i$ are redistributed
among $n_i$ and $n_j$, depending on their location relative to H. H is inserted into the
corresponding scale; all cells that intersect H are split accordingly. Splitting is
therefore not a local operation and can lead to superlinear directory growth even for
uniformly distributed data [Regnier 1985; Freeston 1987; Widmayer 1991].

Deletion is not a local operation either. With the deletion of an entry, the storage
utilization of the corresponding data page may drop below the given threshold. Depending
on the current partitioning of space, it may then be possible to merge this page with a
neighbor page and to drop the partitioning hyperplane from the corresponding scale.
Depending on the implementation of the grid directory, merging may require a complete
directory scan [Hinrichs 1985]. Hinrichs discusses several methods for finding
candidates with which a given data bucket can merge, including the neighbor system and
the multidimensional buddy system. The neighbor system allows merging two adjacent
regions if the result is a rectangular region again. In the buddy system, two adjacent
regions can be merged provided that the joined region can be obtained by a regular
binary subdivision of the universe. Neither system can completely eliminate the
possibility of a deadlock, in which case no merging is feasible because the resulting
bucket region would not be box-shaped [Hinrichs 1985; Seeger and Kriegel 1990].

For a theoretical analysis of the grid file and some of its variants, see Regnier [1985]
and Becker [1992]. Regnier shows in particular that the grid file’s average directory
size for uniformly distributed data is $\Theta(n^{1+(d-1)/(db+1)})$, where b is bucket
size. He also proves that the average space occupancy of the data buckets is about 69%
(ln 2).
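The exact match and range query procedures described above for the grid file can be
sketched in Python as follows. This is a static illustration under stated assumptions,
not the implementation from the cited papers: the names `GridFile`, `cell_of`,
`exact_match`, and `range_query` are hypothetical, plain dictionaries stand in for the
directory and the data pages, cells are treated as half-open intervals, and insertion,
splitting, and merging are left out.

```python
from bisect import bisect_right
from itertools import product


class GridFile:
    def __init__(self, scales, directory, buckets):
        self.scales = scales        # d sorted lists of split coordinates (main memory)
        self.directory = directory  # cell index tuple -> bucket id (on disk in practice)
        self.buckets = buckets      # bucket id -> list of points (one disk page each)

    def cell_of(self, point):
        """Locate the grid cell of a point using only the scales."""
        return tuple(bisect_right(s, x) for s, x in zip(self.scales, point))

    def exact_match(self, point):
        bucket_id = self.directory[self.cell_of(point)]  # at most one directory access
        return point in self.buckets[bucket_id]          # at most one data page access

    def range_query(self, lo, hi):
        lo_cell, hi_cell = self.cell_of(lo), self.cell_of(hi)
        # Enumerate every cell that overlaps the search region.
        cells = product(*(range(a, b + 1) for a, b in zip(lo_cell, hi_cell)))
        # Several cells may share a bucket, so collect bucket ids in a set.
        bucket_ids = {self.directory[c] for c in cells}
        return sorted(p for b in bucket_ids for p in self.buckets[b]
                      if all(l <= x <= h for l, x, h in zip(lo, p, hi)))


# Hypothetical 2 x 2 grid: one split at 5.0 on each axis; cells (0, 0) and
# (0, 1) share bucket "A", as adjacent cells may do in a grid file.
grid = GridFile(
    scales=[[5.0], [5.0]],
    directory={(0, 0): "A", (0, 1): "A", (1, 0): "B", (1, 1): "C"},
    buckets={"A": [(1.0, 1.0), (2.0, 8.0)],
             "B": [(7.0, 3.0)],
             "C": [(6.0, 9.0), (9.0, 6.0)]},
)
found = grid.exact_match((7.0, 3.0))
missing = grid.exact_match((7.0, 4.0))
in_range = grid.range_query((0.0, 0.0), (6.0, 10.0))
```

Keeping the scales separate from the directory mirrors the two-access argument in the
text: locating the cell needs no disk access, so only the directory entry and the data
page may have to be fetched.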
the original grid file by introducing a second grid file. As indicated by the name
“twin,” the relationship between these two grid files is not hierarchical, as in the
case of the two-level grid file, but somewhat more balanced. Both grid files span the
whole universe. The distribution of the data among the two files is performed
dynamically. Hutflesz et al. [1988b] report an average occupancy of 90% for the twin
grid file (compared to 69% for the original grid file) without substantial performance
penalties.

To illustrate the underlying technique, consider the running example depicted in Figure
18. Let us assume that each bucket can accommodate four points. If the number of points
in a bucket exceeds that limit, one possibility is to create a new bucket and
redistribute the points among the two new buckets. Before doing this, however, the twin
grid file tries to redistribute the points between the two grid files. A transfer of
points from the primary file P to the secondary file S may lead to a bucket overflow in
S. It may, however, also imply a bucket underflow in P, which may in turn lead to a
bucket merge and therefore to a reduction of buckets in P. The overall objective of the
reshuffling is to minimize the total number of buckets in the two grid files P and S.
Therefore we shift points from P to S if and only if the resulting decrease in the
number of buckets in P outweighs the increase in the number of buckets in S. This
strategy also favors points to be placed in the primary file in order to form large and
empty buckets in the secondary file. Consequently, all points in S can be associated
with an empty or a full bucket region of P. Note that there usually exists no unique
optimum for the distribution of data points between the two files.

The fact that data points may be found in either of the two grid files requires search
operations to visit the two files, which causes some overhead. Nevertheless, the
performance results reported by Hutflesz et al. [1988b] indicate that the search
efficiency of the twin grid file is competitive with the original grid file. Although
the twin grid file is somewhat inferior to the original grid file for smaller query
ranges, this changes for larger search spaces.

4.1.5 Multidimensional Linear Hashing. Unlike multidimensional extendible hashing,
multidimensional linear hashing uses no or only a very small directory. It therefore
occupies relatively little storage compared to extendible hashing, and it is usually
possible to keep all relevant information in main memory.

Several different strategies have been proposed to perform the required address
computation. Early proposals [Ouksel and Scheuermann 1983] failed to support range
queries; however, Kriegel and Seeger [1986] later proposed a variant of linear hashing
called multidimensional order-preserving linear hashing with partial expansions
(MOLHPE). This structure is based on the idea of partially extending the buckets without
expanding the file size at the same
time. To this end, they use a d-dimensional expansion pointer referring to the group of pages to be expanded next. With this strategy, Kriegel and Seeger can guarantee a modest file growth, at least in the case of well-behaved data. According to their experimental results, MOLHPE outperforms its competitors for uniformly distributed data. It fails, however, for nonuniform distributions, mostly because the hashing function does not adapt gracefully to the given distribution.

To solve this problem, the same authors later applied a stochastic technique [Burkhard 1984] to determine the split points. Because of the name of that technique ($\alpha$-quantiles), the access method was called quantile hashing [Kriegel and Seeger 1987, 1989]. The critical property of the division in quantile hashing is that the original data, which may have a nonuniform distribution, are transformed into uniformly distributed values for $\alpha$. These values are then used as input to the MOLHPE algorithms for retrieval and update. Since the region boundaries are not necessarily simple binary intervals, a small directory is needed. In exchange, skewed input data can be maintained as efficiently as uniformly distributed data. Piecewise linear order-preserving (PLOP) hashing was proposed by the same authors a year later [Kriegel and Seeger 1988]. Because this structure can also be used as an access method for extended objects, we delay its discussion until Section 5.2.7.

Another variant with better order-preserving properties than MOLHPE has been reported by Hutflesz et al. [1988a]. Their dynamic z-hashing uses a space-filling technique called z-ordering [Orenstein and Merrett 1984] to guarantee that points located close to each other are also stored close together on the disk. Z-ordering is described in detail in Section 5.1.2. One disadvantage of z-hashing is that a number of useless data blocks will be generated, as in the interpolation-based grid file [Ouksel 1985]. On the other hand, z-hashing lets three to four buckets be read in a row on the average before a seek is required, whereas MOLHPE manages to read only one [Hutflesz et al. 1988a]. Widmayer [1991] later noted, however, that both z-hashing and MOLHPE are of limited use in practice, due to their inability to adapt to different data distributions.

4.2 Hierarchical Access Methods

In this section we discuss several PAMs that are based on a binary or multiway tree structure. Except for the BANG file and the buddy tree, which are hybrid structures, they perform no address computation. Like hashing-based methods, however, they organize the data points in a number of buckets. Each bucket usually corresponds to a leaf node of the tree (also called data node) and a disk page, which contains those points located in the corresponding bucket region. The interior nodes of the tree (also called index nodes) are used to guide the search; each of them typically corresponds to a larger subspace of the universe that contains all bucket regions in the subtree below. A search operation is then performed by a top-down tree traversal.

At this point, individual tree structures still dominate the field, although more generic concepts are gradually attracting more attention. The generalized search (GIST) tree by Hellerstein et al. [1995], for example, attempts to subsume many of these common features under a generic architecture.

Differences among individual structures are mainly based on the characteristics of the regions. Table 1 shows that in most PAMs the regions at the same tree level form a partitioning of the universe; that is, they are mutually disjoint, with their union being the complete space. For SAMs this is not necessarily true; as we show in Section 5, overlapping regions and partial coverage are important techniques to improve the search performance of SAMs.
4.2.1 The k-d-B-Tree [Robinson 1981]. The k-d-B-tree combines some of the properties of the adaptive k-d-tree [Bentley and Friedman 1979] and the B-tree [Comer 1979] to handle multidimensional points. It partitions the universe in the manner of an adaptive k-d-tree and associates the resulting subspaces with tree nodes. Each interior node corresponds to an interval-shaped region. Regions corresponding to nodes at the same tree level are mutually disjoint; their union is the complete universe. The leaf nodes store the data points that are located in the corresponding partition. Like the B-tree, the k-d-B-tree is a perfectly balanced tree that adapts well to the distribution of the data. Other than for B-trees, however, no minimum space utilization can be guaranteed. A k-d-B-tree for the running example is sketched in Figure 19.

Search queries are answered in a straightforward manner, analogously to the k-d-tree algorithms. For the insertion of a new data point, one first performs a point search to locate the right bucket. If it is not full, the entry is inserted. Otherwise, it is split and about half the entries are shifted to the new data node. Various heuristics are available to find an optimal split [Robinson 1981]. If the parent index node does not have enough space left to accommodate the new entries, a new page is allocated and the index node is split by a hyperplane. The entries are distributed among the two pages depending on their position relative to the splitting hyperplane, and the split is propagated up the tree. The split of the index node may also affect regions at lower levels of the tree, which must be split by this hyperplane as well. Because of this forced split effect, it is not possible to guarantee a minimum storage utilization.

Deletion is straightforward. After performing an exact match query, the entry is removed. If the number of entries drops below a given threshold, the data node may be merged with a sibling data node as long as the union remains a d-dimensional interval. The procedure to find a suitable sibling node to merge with may involve several nodes. The union of data pages results in the deletion of at least one hyperplane in the
parent index node. If an underflow occurs, the deletion has to be propagated up the tree.

4.2.2 The LSD-Tree [Henrich et al. 1989]. We list the LSD (Local Split Decision) tree as a point access method although its inventors emphasize that the structure can also be used for managing extended objects. This claim is based on the fact that the LSD-tree adapts well to data that are nonuniformly distributed and that it is therefore well-suited for use in connection with the transformation technique; a more detailed discussion of this approach appears in Section 5.1.1.

The directory of the LSD-tree is organized as an adaptive k-d-tree, partitioning the universe into disjoint cells of various sizes. This results in a better adaption to the data distribution than the fixed binary partitioning. Although the k-d-tree may be arbitrarily unbalanced, the LSD-tree preserves the external balancing property; that is, the heights of its external subtrees differ at most by one. This property is maintained by a special paging algorithm. If the structure becomes too large to fit in main memory, this algorithm identifies subtrees that can be paged out such that the external balancing property is preserved. Although efficient, this special paging strategy is obviously a major impediment for the integration of the LSD-tree into a general-purpose database system. Figure 20 shows an LSD-tree for the running example with one external directory page.

As indicated previously, the split strategy of the LSD-tree does not assume the data to be uniformly distributed. On the contrary, it tries to accommodate skewed data by combining two split strategies:

—data-dependent ($SP_1$): The choice of the split depends on the data and tries to achieve a most balanced structure; that is, there should be an equal number of objects on both sides of the split. As the name of the structure suggests, this split decision is made locally.
—distribution-dependent ($SP_2$): The split is done at a fixed dimension and position. The given data are not taken into account because an underlying (known) distribution is assumed.

To determine the split position $SP$, one computes the linear combination of the split locations that would result from applying just one of those strategies:

$SP = \alpha \, SP_1 + (1 - \alpha) \, SP_2.$

The factor $\alpha$ is determined empirically based on the given data; it can vary as objects are inserted and deleted from the tree.

Henrich [1995] presented two algorithms to improve the storage utilization of the LSD-tree by redistributing data entries among buckets. Since these
strategies make the LSD-tree sensitive to the insertion sequence, the splitting strategy must be adapted accordingly. In order to improve the search performance for nonpoint data and range queries, Henrich and Möller [1995] suggest storing auxiliary information on the existing data regions along with the index entries of the LSD-tree.

4.2.3 The Buddy Tree [Seeger and Kriegel 1990]. The buddy tree is a dynamic hashing scheme with a tree-structured directory. The tree is constructed by consecutive insertion, cutting the universe recursively into two parts of equal size with iso-oriented hyperplanes. Each interior node $n$ corresponds to a d-dimensional partition $P^d(n)$ and to an interval $I^d(n) \subseteq P^d(n)$. $I^d(n)$ is the MBB of the points or intervals below $n$. Partitions $P^d$ (and therefore intervals $I^d$) that correspond to nodes on the same tree level are mutually disjoint. As in all tree-based structures, the leaves of the directory point to the data pages. Other important properties of the buddy tree include:

(1) each directory node contains at least two entries;
(2) whenever a node $n$ is split, the MBBs $I^d(n_i)$ and $I^d(n_j)$ of the two resulting subnodes $n_i$ and $n_j$ are recomputed to reflect the current situation; and
(3) except for the root of the directory, there is exactly one pointer referring to each directory page.

Due to property 1, the buddy tree may not be balanced; that is, the leaves of the directory may be on different levels. Property 2 tries to achieve a high selectivity at the directory level. Properties 1 and 3 make sure that the growth of the directory remains linear. To avoid the deadlock problem of the grid file, the buddy tree uses k-d-trees [Orenstein 1982] to partition the universe. Only a restricted number of buddies are admitted, namely, those that could have been obtained by some recursive halving of the universe. However, as shown by Seeger and Kriegel [1990], the number of possible buddies is larger than in the grid file and other structures, which makes the buddy tree more flexible in the case of updates. Experiments by Kriegel et al. [1990] indicate that the buddy tree is superior to several other PAMs, including the hB-tree, the BANG file, and the two-level grid file. A buddy tree for the running example is shown in Figure 21.

Two older structures, the interpolation-based grid file by Ouksel [1985] and the balanced multidimensional extendible hash tree by Otoo [1986], are both special cases of the buddy tree that can be obtained by restricting the properties of the regions. Interpolation-based grid files avoid the excessive growth of the grid file directory by representing blocks explicitly, which guarantees that there is only one directory entry for each data bucket. The disadvantage of this approach is that empty regions have to be introduced in the case of skewed data input. Seeger [1991] later showed that the buddy tree can easily be modified to handle spatially extended objects by using one of the techniques presented in Section 5.

4.2.4 The BANG File [Freeston 1987]. To obtain a better adaption to the given data points, Freeston [1987] proposed a new structure, which he called the BANG (Balanced And Nested Grid) file, even though it differs from the grid file in many aspects. Similar to the grid file, it partitions the universe into intervals (boxes). What is different, however, is that in the BANG file bucket regions may intersect, which is not possible in the regular grid file. In particular, one can form nonrectangular bucket regions by taking the geometric difference of two or more intervals (nesting). To increase storage utilization, it is possible during insertion to redistribute points between different buckets. To manage the directory, the BANG file uses a balanced search tree structure. In combination with the hash-based partitioning of the universe,
the BANG file can therefore be viewed as a hybrid structure.

Figure 22 shows the BANG file for the running example. Three rectangles have been cut out of the universe R1: R2, R5, and R6. In turn, the rectangles R3 and R4 are nested into R2 and R5, respectively. If one represents the resulting space partitioning as a tree using bit interleaving, one obtains the structure shown on the right-hand side of Figure 22. Here the asterisk represents the empty string, that is, the universe. A comparison with Figure 13 shows that the BANG file can in fact be regarded as a paginated version of the BD-tree discussed in Section 3.2.3.

In order to achieve a high storage utilization, the BANG file performs spanning splits that may lead to the displacement of parts of the tree. As a result, a point search may in the worst case require the traversal of the entire directory in a depth-first manner. To address this problem, Freeston [1989a] later proposed different splitting strategies, including forced splits as used by the k-d-B-tree. These strategies avoid the spanning problem at the possible expense of lower storage utilization. Kumar [1994a] made a similar proposal based on the BD-tree and called the resulting structure a G-tree (grid tree). The structure differs from the BD-tree in the way the partitions are mapped into buckets. To obtain a simpler mapping, the G-tree sacrifices the minimum storage utilization that holds for the BD-tree.

Although the data partitioning given in Figure 22 is feasible for the BD-tree and the original BANG file, it cannot be achieved with the BANG file using forced splits [Freeston 1989a]. For this variant, we would have to split the root and move, for example, entry c5 to the bucket containing the entries p7 and c6.

Freeston [1989b] also proposed an extension to the BANG file to handle extended objects. As often found in PAM extensions, the centroid is used to determine the bucket in which to place a given object. To account for the object’s spatial extension, the bucket regions are extended where necessary [Seeger and Kriegel 1988; Ooi 1990].
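The centroid-based placement of extended objects mentioned for the BANG file extension can be sketched in a few lines. This is a generic illustration of the idea and not Freeston's algorithm: objects are assumed to be axis-parallel boxes, buckets are kept in a flat list, and the names `Bucket`, `cell`, `extended`, and `insert` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def centroid(box: Box) -> Tuple[float, ...]:
    lo, hi = box
    return tuple((l + h) / 2 for l, h in zip(lo, hi))


def contains_point(box: Box, p: Tuple[float, ...]) -> bool:
    lo, hi = box
    return all(l <= x < h for l, x, h in zip(lo, p, hi))


def union(a: Box, b: Box) -> Box:
    """Smallest box enclosing both a and b."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


@dataclass
class Bucket:
    cell: Box       # partition cell, used only to route centroids
    extended: Box   # cell grown so that it covers every stored object
    objects: List[Box] = field(default_factory=list)


def insert(buckets: List[Bucket], obj: Box) -> Bucket:
    """Place obj in the bucket whose cell contains its centroid, then grow that bucket's region."""
    c = centroid(obj)
    for b in buckets:
        if contains_point(b.cell, c):
            b.objects.append(obj)
            b.extended = union(b.extended, obj)
            return b
    raise ValueError("centroid lies outside the universe")
```

Because the extended regions may now overlap, a search has to test a query against `extended` rather than `cell`, and it may have to visit more than one bucket.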
Ouksel and Mayer [1992] proposed an access method called a nested interpolation-based grid file that is closely related to the BANG file. The major difference concerns the way the directory is organized. In essence, the directory consists of a list of one-dimensional access methods (e.g., B-trees) storing the z-order encoding of the different data regions, along with pointers to the respective data buckets. By doing so, Ouksel and Mayer improved the worst-case bounds from $O(n)$ (as in the case of the BANG file) to $O(\log_b n)$, where $b$ is bucket size.

4.2.5 The hB-Tree [Lomet and Salzberg 1989, 1990]. The hB-tree (holey brick tree) is related to the k-d-B-tree in that it utilizes k-d-trees to organize the space represented by its interior nodes. One of the most noteworthy differences is that node splitting is based on multiple attributes. As a result, nodes no longer correspond to d-dimensional intervals but to intervals from which smaller intervals have been excised. Similar to the BANG file, the result is a somewhat fractal structure (a holey brick) with an external enclosing region and several cavities called extracted regions. As we show later, this technique avoids the cascading of splits that is typical for many other structures.

In order to minimize redundancy, the k-d-tree corresponding to an interior node can have several leaves pointing to the same child node. Strictly speaking, the hB-tree is therefore no longer a tree but a directed acyclic graph. With regard to the geometry, this corresponds to the union of the corresponding regions. Once again, the resulting region is typically no longer box-shaped. This peculiarity is illustrated in Figure 23, which shows an hB-tree for the running example. Here the root node contains two pointers to its left descendant node. Its corresponding region u is the union of two rectangles: the one to the left of x1 and the one above y1. The remaining space (the right lower quadrant) is excluded from u, which is made explicit by the entry ext in the corresponding k-d-tree. A similar observation applies to region G, which is again L-shaped: it corresponds to the NW, the SE, and the NE quadrants of the rectangle above y1.

Searching is similar to the k-d-B-tree; each internal k-d-tree is traversed as usual. Insertions are also carried out analogously to the k-d-B-tree until a leaf node reaches its capacity and a split is required. Instead of using just one single hyperplane to split the node, the hB-tree split is based on more than one attribute and on the internal k-d-tree of the data node to be split. Lomet and Salzberg [1989] show that this policy guarantees a worst-case data distribution between the two resulting nodes of $\frac{1}{3} : \frac{2}{3}$. This observation is not restricted to the hB-tree but generalizes to other access methods such as the BD-tree and the BANG file.

The split of the leaf node causes the introduction of an additional k-d-tree
conjecture that one can maintain the major strengths of the B-tree in higher dimensions, provided one relaxes the strict requirements concerning tree balance and storage utilization. The BV-tree is not completely balanced. Furthermore, although the B-tree guarantees a worst-case storage utilization of 50%, Freeston argues that such a comparatively high storage utilization cannot be ensured for higher dimensions for topological reasons. However, the BV-tree manages to achieve the 33% lower bound suggested by Lomet and Salzberg [1989].

To achieve a guaranteed worst-case search performance, the BV-tree combines the excision concept [Freeston 1987] with a technique called promotion. Here, intervals from lower levels of the tree are moved up the tree, that is, closer to the root. To keep track of the resulting changes, with each promoted region we store a level number (called a guard) that denotes the region’s original level.

The search algorithms are based on a notional backtracking technique. While descending the tree, we store possible alternatives (relevant guards of the different index levels) in a guard set. The entries of this set act as backtracking points and represent a single path from the root to the level currently inspected; for point queries, they can be maintained as a stack. To answer a point query, we start at the root and inspect all node entries to see whether the corresponding regions overlap the search point. Among those entries inspected, we choose the best-matching entry to investigate next. We may possibly also store some guards in the guard set. At the next level this procedure is repeated recursively, this time taking the stored guards into account. Before following the best-matching entry down to the next level, the guard set is updated by merging the matching new guards with the existing ones. Two guards at the same level are merged by discarding the poorer match. This search continues recursively until we reach the leaf level. Note that for point queries, the length of the search path is equal to the height of the BV-tree because each region in space is represented by a unique node entry.

Figure 24 shows a BV-tree and the corresponding space partitioning for the running example. For illustration purposes we confine the grouped regions or objects not by a tight polyline, but by a loosely wrapped boundary. In this example, the region D0 acts as a guard. It is clear from the space partitioning that D0 originally belongs to the bottom index level (i.e., the middle level in the figure). Since it functions as a guard for the enclosed region S1, however, it has been promoted to the root level. Suppose we are interested in all objects intersecting the black rectangle X. Starting at the root, we place D0 in the guard set and investigate S1. Because inspection of S1 reveals that the search
region is included neither in P0 nor in N0 or M0, we backtrack to D0 and inspect the entries for D0. In our example, no entry satisfies the query.

In a later paper, Freeston [1997] discusses complexity issues related to updates of guards. In the presence of such updates, it is necessary to “downgrade” (demote) entries that are no longer guards, which may in turn affect the overall structure negatively. Freeston’s conclusion is that the logarithmic access performance and the minimum storage utilization of the BV-tree can be preserved by postponing the demotion of such entries, which may lead to (very) large index nodes.

4.3 Space-Filling Curves for Point Data

We already mentioned the main reason why the design of multidimensional access methods is so difficult compared to the one-dimensional case: There is no total order that preserves spatial proximity. One way out of this dilemma is to find heuristic solutions, that is, to look for total orders that preserve spatial proximity at least to some extent. The idea is that if two objects are located close together in original space, there should at least be a high probability that they are close together in the total order, that is, in the one-dimensional image space. For the organization of this total order one could then use a one-dimensional access method (such as a B+-tree), which may provide good performance at least for point queries. Range queries are somewhat more complicated; a simple mapping from multidimensional to one-dimensional range queries often implies major performance penalties. Tropf and Herzog [1981] present a more sophisticated and efficient algorithm for this problem.

Research on the underlying mapping problem goes back well into the last century; see Sagan [1994] for a survey. With regard to its relevance for spatial searching, Samet [1990b] provides a good overview of the subject. One thing all proposals have in common is that they first partition the universe with a grid. Each of the grid cells is labeled with a unique number that defines its position in the total order (the space-filling curve). The points in the given data set are then sorted and indexed according to the grid cell in which they are contained. Note that although the labeling is independent of the given data, it is obviously critical for the preservation of proximity in one-dimensional address space. That is, the way we label the cells determines how clustered adjacent cells are stored on secondary memory.

Figure 25 shows four common labelings. Figure 25a corresponds to a row-wise enumeration of the cells [Samet 1990b]. Figure 25b shows the cell enumeration imposed by the Peano curve [Morton 1966], also called quad codes [Finkel and Bentley 1974], N-trees [White 1981], locational codes [Abel and Smith 1983], or z-ordering [Orenstein and Merrett 1984]. Figure 25c shows the Hilbert curve [Faloutsos and Roseman 1989; Jagadish 1990a], and Figure 25d depicts Gray ordering [Faloutsos 1986, 1988], which is obtained by interleaving the Gray codes of the x- and y-coordinates in a bitwise manner. Gray codes of successive cells differ in exactly one bit.

Based on several experiments, Abel and Mark [1990] conclude that z-ordering and the Hilbert curve are most suitable as multidimensional access methods. Jagadish [1990a] and Faloutsos and Rong [1991] all prefer the Hilbert curve of those two.

Z-ordering is one of the few spatial access methods that has found its way into commercial database products. In particular, Oracle [1995] has adapted the technique and offered it for some time as a product.

An important advantage of all space-filling curves is that they are practically insensitive to the number of dimensions if the one-dimensional keys can be arbitrarily large. Everything is mapped into one-dimensional space, and one’s favorite one-dimensional access method can
Figure 26. Search queries in dual space— endpoint transformation: (a) intersection query; (b) contain-
ment/enclosure queries; (c) point query.
Figure 27. Search queries in dual space—midpoint transformation: (a) intersection query; (b) contain-
ment/enclosure queries; (c) point query.
al. 1989; Orenstein 1990; Pagel et al. 1993]. Second, depending on the mapping chosen, the distribution of points in dual space may be highly nonuniform even though the original data are uniformly distributed. With the endpoint transformation, for example, there are no image points below the main diagonal [Faloutsos et al. 1987]. Third, the images of two objects that are close in the original space may be arbitrarily far apart from each other in dual space.

To overcome some of these problems, Henrich et al. [1989], Faloutsos and Rong [1991], as well as Pagel et al. [1993] have proposed special transformation and split strategies. A structure designed explicitly to be used in connection with the transformation technique is the LSD-tree (cf. Section 4.2.2). Performance studies by Henrich and Six [1991] confirm the claim that the LSD-tree adapts well to nonuniform distributions, which is of particular relevance in this context. It also contains a mechanism to avoid searching large empty query spaces, which may occur as a result of the transformation.

5.1.2 Space-Filling Curves for Extended Objects. Space-filling curves (cf. Section 4.3) are a very different type of transformation approach that seems to have fewer of the drawbacks listed in the previous section. Space-filling curves can be used to represent extended objects by a list of grid cells or, equivalently, a list of one-dimensional intervals that define the position of the grid cells concerned. In other words, a complex spatial object is approximated not by only one simpler object, but by the union of several such objects. There are different variations of this basic concept, including z-ordering [Orenstein and Merrett 1984], the Hilbert R-tree [Kamel and Faloutsos 1994], and the UB-tree [Bayer 1996]. As an example, we discuss z-ordering in more detail.
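As a rough illustration of approximating an object by a union of grid cells, the following sketch decomposes an axis-parallel rectangle by recursive halving with alternating split directions, which is the scheme z-ordering relies on. Several simplifications are assumed here: the object is a rectangle rather than a polygon, the universe is two-dimensional, boxes are half-open, and a bit of 0 denotes the lower (left or bottom) half of a split. The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[Tuple[float, float], Tuple[float, float]]  # (lower corner, upper corner)


def z_regions(obj: Box, universe: Box, max_bits: int) -> List[str]:
    """Return bit strings of the cells that approximate obj within universe."""
    out: List[str] = []

    def rec(lo, hi, code: str) -> None:
        # Stop if the current cell does not overlap the object at all.
        if any(hi[d] <= obj[0][d] or lo[d] >= obj[1][d] for d in range(2)):
            return
        enclosed = all(obj[0][d] <= lo[d] and hi[d] <= obj[1][d] for d in range(2))
        # Stop if the cell is fully enclosed or the accuracy limit is reached.
        if enclosed or len(code) == max_bits:
            out.append(code)
            return
        d = len(code) % 2                 # alternate: x, y, x, y, ...
        mid = (lo[d] + hi[d]) / 2
        lower_hi = list(hi)
        lower_hi[d] = mid
        upper_lo = list(lo)
        upper_lo[d] = mid
        rec(lo, tuple(lower_hi), code + "0")
        rec(tuple(upper_lo), hi, code + "1")

    rec(universe[0], universe[1], "")
    return out


def encloses(z1: str, z2: str) -> bool:
    """A cell encloses another cell exactly when its bit string is a prefix of the other's."""
    return z2.startswith(z1)
```

In an 8 x 8 universe, the rectangle from (2, 0) to (6, 4) decomposes into the two cells `001` and `100`. A larger `max_bits` yields a tighter approximation of objects that are not aligned with the grid, at the cost of more cells.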
For a discussion of the Hilbert R-tree, see Section 5.2.1.

Z-ordering [Orenstein and Merrett 1984] is based on the Peano curve. A simple algorithm to obtain the z-ordering representation of a given extended object can be described as follows. Starting from the (fixed) universe containing the data object, space is split recursively into two subspaces of equal size by $(d-1)$-dimensional hyperplanes. As in the k-d-tree, the splitting hyperplanes are iso-oriented, and their directions alternate in fixed order among the d possibilities. The subdivision continues until one of the following conditions holds.

(1) The current subspace does not overlap the data object.
(2) The current subspace is fully enclosed in the data object.
(3) Some given level of accuracy has been reached.

The data object is thus represented by a set of cells, called Peano regions or z-regions. As shown in Section 3.2.3, each such Peano region can be represented by a unique bit string, called Peano code, ST_MortonNumber, z-value, or DZ-expression. Using those bit strings, the cells can then be stored in a standard one-dimensional index, such as a B+-tree.

Figure 28 shows a simple example. Figure 28a shows the polygon to be approximated, with the frame representing the universe. After several splits, starting with a vertical split line, we obtain Figure 28b. Nine Peano regions of different shapes and sizes approximate the object. The labeling of each Peano region is shown in Figure 28c. Consider the Peano region $\bar{z}$ in the lower left part of the given polygon. It lies to the left of the first vertical hyperplane and below the first horizontal hyperplane, resulting in the first two bits being 00. As we further partition the lower left quadrant, $\bar{z}$ lies on the left of the second vertical hyperplane but above the second horizontal hyperplane. The complete bit string accumulated so far is therefore 0001. In the next round of decompositions, $\bar{z}$ lies to the right of the third vertical hyperplane and above the third horizontal hyperplane, resulting in two additional 1s. The complete bit string describing $\bar{z}$ is therefore 000111.

Figures 28b and 28c also give some bit strings along the coordinate axes, which describe only the splits orthogonal to the given axis. The string 01 on the x-axis, for example, describes the subspace to the left of the first vertical split and to the right of the second vertical split. By bit-interleaving the bit strings that one finds when projecting a Peano region onto the coordinate axes, we obtain its Peano code. Note that if a Peano code $z_1$ is the prefix of some other Peano code $z_2$, the Peano region corresponding to $z_1$ encloses the Peano region corresponding to $z_2$. The Peano region corresponding to 00, for example, encloses the regions corresponding to 0001 and 000. This is an important observation, since it can be used for query processing [Gaede and Riekert 1994]. Figure 29 shows Peano regions for the running example.

As z-ordering is based on an underlying grid, the resulting set of Peano regions is usually only an approximation of the original object. The termination criterion depends on the accuracy or granularity (maximum number of bits) desired. More Peano regions obviously yield more accuracy, but they also increase the size and complexity of the approximation. As pointed out by Orenstein [1989b], there are two possibly conflicting objectives: the number of Peano regions to approximate the object should be small, since this results in fewer index entries; and the accuracy of the approximation should be high, since this reduces the expected number of false drops [Orenstein 1989a, b; Gaede 1995b]. Objects are thus paged in from secondary memory, only to find out that they do not satisfy the search predicate. A simple way to reduce the number of false drops is to add a single bit to the encoding that reflects for each Peano region whether it is completely enclosed in the original object [Gaede 1995a]. An advantage of z-ordering is that local changes of granularity lead to only local changes of the corresponding encoding.

5.2 Overlapping Regions

The key idea of the overlapping regions technique is to allow different data buckets in an access method to correspond to mutually overlapping subspaces. With this method we can assign any extended object directly and as a whole to one single bucket region. Consider, for instance, the k-d-B-tree for the running example, depicted in Figure 19, and one of the polygons given in the scenario (Figure 9), say r10. r10 overlaps two bucket regions, the one containing p10, c1, and c2, and the other one containing c10 and p9. If we extend one of those regions to accommodate r10, this polygon could be stored in the
researchers led to the development of more sophisticated policies. The packed R-tree [Roussopoulos and Leifker 1985], for example, computes an optimal partitioning of the universe and a corresponding minimal R-tree for a given scenario. However, it requires all data to be known a priori.

Other interesting variants of the R-tree include the sphere tree by Oosterom [1990] and the Hilbert R-tree by Kamel and Faloutsos [1994]. The sphere tree corresponds to a hierarchy of nested d-dimensional spheres rather than intervals. The Hilbert R-tree combines the overlapping regions technique with space-filling curves (cf. Section 4.3). It first stores the Hilbert values of the data rectangles’ centroids in a B+-tree, then enhances each interior B+-tree node by the MBB of the subtree below. This facilitates the insertion of new objects considerably. Together with a revised splitting policy, Kamel and Faloutsos report good performance results for both searches and updates. However, since their splitting policy takes only the objects’ centroids into account, the performance of the structure is likely to deteriorate in the presence of large objects.

Ng and Kameda [1993] discuss how to support concurrency in R-trees by adapting the lock-coupling technique of B-trees [Bayer and Schkolnick 1977] to R-trees. Similarly, Ng and Kameda [1994] and Kornacker and Banks [1995] apply ideas of the B-link tree [Lehman and Yao 1981] to R-trees, yielding two structures both called the R-link tree. Kornacker and Banks empirically demonstrate that their R-link tree is superior to the R-tree using lock-coupling.

5.2.2 The R*-Tree [Beckmann et al. 1990]. Based on a careful study of R-tree behavior under different data distributions, Beckmann et al. [1990] identified several weaknesses of the original algorithms. In particular, they confirmed the observation of Roussopoulos and Leifker [1985] that the insertion phase is critical for good search performance. The design of the R*-tree (see Figure 31) therefore introduces a policy called forced reinsert: If a node overflows, it is not split right away. Rather, p entries are removed from the node and reinserted into the tree. The parameter p may vary; Beckmann et al. suggest it should be about 30% of the maximal number of entries per page.

Another issue investigated by Beckmann et al. concerns the node-splitting policy. Although Guttman’s R-tree algorithms tried only to minimize the area covered by the bucket regions, the R*-tree algorithms also take the following objectives into account.

—Overlap between bucket regions at the same tree level should be minimized. The less overlap, the smaller the probability that one has to follow multiple search paths.
—Region perimeters should be minimized. The preferred rectangle is the
square, since this is the most compact rectangular representation.
—Storage utilization should be maximized.
The improved splitting algorithm of Beckmann et al. [1990] is based on the plane-sweep paradigm [Preparata and Shamos 1985]. In d dimensions, its time complexity is O(d · n · log n) for a node with n intervals.
In summary, the R*-tree differs from the R-tree mainly in the insertion algorithm; deletion and searching are essentially unchanged. Beckmann et al. report performance improvements of up to 50% compared to the basic R-tree. Their implementation also shows that reinsertion may improve storage utilization. In broader comparisons, however, Hoel and Samet [1992] and Günther and Gaede [1997] found that the CPU time overhead of reinsertion can be substantial, especially for large page sizes; see Section 6 for further details.
One of the major insights of the R*-tree is that node splitting is critical for the overall performance of the access method. Since a naive (exhaustive) approach has time complexity O(d · 2^n) for n given intervals, there is a need for efficient and optimal splitting policies. Becker et al. [1992] proposed a polynomial time algorithm that finds a balanced split, which also optimizes one of several possible objective functions (e.g., minimum sum of areas or minimum sum of perimeters). They assume in their analysis that the intervals are presorted in some specific order. More recently, Ang and Tan [1997] presented a new linear node splitting algorithm, based on a simple heuristic. According to the results reported, it outperforms its competitors.
Berchtold et al. [1996] proposed a modification of the R-tree called the X-tree that seems particularly well suited for indexing high-dimensional data. The X-tree reduces overlap among directory intervals by using a new organization: it postpones node splitting by introducing supernodes, that is, nodes larger than the usual block size. In order to find a suitable split, the X-tree also maintains the history of previous splits.
5.2.3 The P-Tree [Jagadish 1990c]. In many applications, intervals are not a good approximation of the data objects enclosed. In order to combine the flexibility of polygon-shaped containers with the simplicity of the R-tree, Jagadish [1990c] and Schiwietz [1993] independently proposed different variations of polyhedral trees or P-trees. To distinguish the two structures, we refer to the P-tree of Jagadish [1990c] as the JP-tree and to the P-tree of Schiwietz [1993] as the SP-tree.
The JP-tree first introduces a variable number m of orientations in the d-dimensional universe, where m > d. For instance, in two dimensions (d = 2) we may have four orientations (m = 4): two parallel to the coordinate axes (i.e., iso-oriented) and two parallel to the two main diagonals. Objects are approximated by minimum bounding polytopes whose faces are parallel to these m orientations. Clearly, the quality of the approximations is positively correlated with m. We can now map the original space into an m-dimensional orientation space, such that each (d-dimensional) approximating polytope P^d turns into an m-dimensional interval I^m. Any point inside (outside) P^d maps onto a point inside (outside) I^m, whereas the opposite is not necessarily true. To maintain the m-dimensional intervals, a large selection of SAMs is available; Jagadish [1990c] suggests the R-tree or R+-tree (cf. Section 5.3.2) for this purpose.
An interesting feature of the JP-tree is the ability to add hyperplanes to the attribute space dynamically without having to reorganize the structure. By projecting the new intervals of the extended orientation space onto the old orientation space, it is still possible to use the old structure. Consequently, we can obtain an R-tree from a higher-dimensional JP-tree structure by dropping all hyperplanes that are not iso-oriented.
The interior nodes of the JP-tree represent a hierarchy of nested polytopes, similar to the R-tree or the cell tree (cf. Section 5.3.3). Polytopes corresponding to different nodes at the same tree level may overlap. For search operations we first compute the minimum bounding polytope of the search region and map it onto an m-dimensional interval. The search efficiency then depends on the chosen PAM. The same applies for deletion.
The introduction of additional hyperplanes yields a better approximation, but it increases the size of the entries, thus reducing the fanout of the interior nodes. Experiments reported by Jagadish [1990c] suggest that a 10-dimensional orientation space (m = 10) is a good choice for storing 2-dimensional lines (d = 2) with arbitrary orientation. This needs to be compared to a simple MBB approach. Although the latter technique may sometimes render poor approximations, the representation requires only four numbers per line. Storing a 10-dimensional interval, on the other hand, requires 20 numbers, that is, five times as many. Another drawback of the JP-tree is the fixed orientation of the hyperplanes. Figure 32 shows the running example for m = 4.
To overcome the problem of poor filtering, Brodsky et al. [1995] proposed methods for effectively computing a set of optimal axes for separating polyhedra. This work continues the line of work by Jagadish [1990c] in the use of nonstandard axes for better filtering.
5.2.4 The P-Tree [Schiwietz 1993]. The P-tree of Schiwietz, here called the SP-tree, chooses a slightly different approach for storing polygonal objects that tries to combine the advantages of the cell tree and the R*-tree for the two-dimensional case, while avoiding the drawbacks of both methods. Basically, the SP-tree is an R-tree whose interior nodes correspond to a nesting of polytopes rather than just rectangles. In general, the number of vertices (and therefore the storage requirements) of a polytope is not bounded. Moreover, when used for approximating other objects, the accuracy of the approximation is positively correlated with the number of vertices of the approximating convex polygon. On the other hand, when used as index entries, there should be an upper bound in order to guarantee a minimum fanout of the interior nodes. To determine a reasonably good compromise between these conflicting objectives, extensive investigations have been conducted by Brinkhoff et al. [1993a] and Schiwietz [1993]. According to these studies, pentagons or hexagons seem to offer the best tradeoff between storage requirements and approximation quality.
If node splittings or insertions lead to additional vertices such that some bounding polygons have more vertices
than the threshold, the surplus vertices are removed one by one. This leads to a larger area and therefore to a decrease of the quality of the approximation. To reduce overlap between the convex containers, Schiwietz suggests using a method similar to the R*-tree. Furthermore, in order to save storage space and to improve storage utilization, it is possible to restrict the number of orientations for the polygon edges (similar to the JP-tree).
Figure 33 shows the SP-tree for the running example. To our knowledge, no performance results have been reported so far for either of the two P-trees.
5.2.5 The SKD-Tree [Ooi et al. 1987; Ooi 1990]. A variant of the k-d-tree capable of storing spatially extended objects is the spatial k-d-tree or skd-tree. The skd-tree allows regions to overlap. To keep track of the mutual overlap, we store an upper and a lower bound with each discriminator, representing the maximal extent of the objects in the two subtrees. For example, consider the splitting hyperplane (discriminator) hx1 depicted in Figure 34 and its upper and lower bounds bx1 and bx2, respectively. The solid lines are the splitting hyperplanes and the dashed lines represent the upper and lower bounds of the corresponding subtrees. m3 is the rectangle closest to hx1 without crossing it, thus determining the maximum extent bx1 of the objects in the left (lower) subspace. Similarly, m5 determines the minimum extent bx2 for the right (upper) subspace. If none of the objects placed in the corresponding subspace crosses the splitting hyperplane, the lower bound of the upper interval is greater than the discriminator, and the upper bound of the lower interval is less than dk. Leaf nodes of the binary tree contain the minimal bounds (dotted lines) of the objects in the corresponding data page.
Prior to inserting an object o, we determine its centroid and its MBB. By comparing the centroid with the stored discriminators, we determine the next child to be inspected. Note that there is no ambiguity. During insertion, we have to adjust the upper and lower bounds for extended objects accordingly. Upon reaching the data node level, we test whether there is enough space available to accommodate the object. If so, we insert the object; otherwise we split the data node and insert the new discriminator into the skd-tree. Likewise, the bounds of the new subspaces need to be adjusted.
As usual, searching starts at the root and corresponds to a top-down tree traversal. At each interior node we check the discriminator and the boundaries to decide which child(ren) to visit next.
Deleting an object starts with an exact match query to determine the correct leaf node. If a deletion causes an underflow, we insert the remaining entries into the sibling data node and remove the splitting hyperplane. If this insertion results in an overflow, we split
the page and insert the new hyperplane into the skd-tree. If no merge with a sibling leaf node is possible, we delete that leaf and its parent node. By redirecting the reference of the latter to its sibling (interior) node, we extend the subspace of the sibling. All affected entries are reinserted.
According to the results reported in Ooi [1990] and Ooi et al. [1991], the skd-tree is competitive with the R-tree both in storage utilization and search efficiency.
5.2.6 The GBD-Tree [Ohsawa and Sakauchi 1990]. The GBD-tree (generalized BD-tree) is an extension of the BD-tree [Ohsawa and Sakauchi 1983] that allows for secondary storage management and supports the management of extended objects. The BD-tree is a binary tree, but the GBD-tree is a balanced multiway tree that stores spatial objects as a hierarchy of minimum bounding boxes. Each leaf node (bucket) stores the MBBs of those objects whose centroids are contained in the corresponding bucket region. Each interior node stores the MBB of the (usually overlapping) MBBs of its descendants. The intervals are encoded using the same DZ-expressions as described in Section 3.2.3.
The one advantage of the GBD-tree over the R-tree is that insertions and deletions may be processed more efficiently, due to the encoding scheme and the placement by centroid. The latter point enables the GBD-tree to perform an insertion along a single path from the root to a leaf. However, no apparent advantage is gained in search performance. The reported performance experiments [Ohsawa and Sakauchi 1990] compare only storage utilization and insertion performance with the R-tree. The most important comparison, that of search performance, is omitted.
Figure 35 depicts a GBD-tree for the running example. The partitioning on the left-hand side shows the minimum bounding boxes (dotted or dashed) and the underlying intervals (Peano regions).
Among the approaches similar to the GBD-tree are an extension of the buddy tree by Seeger [1991] and the extension of the BANG file to handle extended spatial objects [Freeston 1989b].
5.2.7 PLOP-Hashing [Kriegel and Seeger 1988; Seeger and Kriegel 1988]. Piecewise linear order-preserving (PLOP) hashing [Seeger and Kriegel 1988] is a variant of hashing that allows the storage of extended objects without transforming them into points. An earlier version of this structure [Kriegel and Seeger 1988] was only able to handle multidimensional point data.
PLOP-hashing partitions the universe in a similar way to the grid file: extended objects may span more than one directory cell. Hyperplanes extend along the axes of the data space. For the organization of these hyperplanes, PLOP-
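As a rough illustration of a grid-file-like partitioning in which an extended object may span more than one directory cell, the following Python sketch computes the cells overlapped by an object's MBB. It assumes that each axis is cut by a sorted list of hyperplane positions. The function name cells_spanned and this list representation are assumptions made for the sketch only; it does not model the hashing organization of PLOP-hashing itself.

```python
from bisect import bisect_right
from itertools import product


def cells_spanned(mbb, splits):
    """Return the index tuples of all grid cells that an MBB overlaps.

    mbb:    one (low, high) pair per axis (hypothetical representation)
    splits: one sorted list of hyperplane positions per axis
    """
    ranges = []
    for (low, high), axis_splits in zip(mbb, splits):
        # Index of the slice that contains the lower and upper end of the MBB.
        first = bisect_right(axis_splits, low)
        last = bisect_right(axis_splits, high)
        # An MBB that merely touches a hyperplane is counted in both slices.
        ranges.append(range(first, last + 1))
    # The overlapped cells are the cross product of the per-axis slice ranges.
    return list(product(*ranges))
```

With one hyperplane at 5.0 on each of two axes, an MBB that lies entirely below both hyperplanes falls into a single cell, whereas an MBB that crosses both hyperplanes spans all four cells.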
Provided that there is enough space, the insertion is straightforward. If the bounding interval I^d(o) overlaps space that has not yet been covered, we have to enlarge the intervals corresponding to one or more leaf nodes. Each of these enlargements may require considerable effort because overlaps must be avoided. In some rare cases, it may not be possible to increase the current intervals in such a way that they cover the new object without some mutual overlap [Günther 1988; Ooi 1990]. In case of such a deadlock, some data intervals have to be split and reinserted into the tree.
If a leaf node overflows it has to be split. Node splittings work similarly to the case of the R-tree. An important difference, however, is that splits may propagate not only up the tree, but also down the tree. The resulting forced split of the nodes below may lead to several complications, including further fragmentation of the data intervals; see, for example, the rectangles m5 and m8 in Figure 38.
For deletion, we first locate all the data nodes where fragments of the object are stored and remove them. If storage utilization drops below a given threshold, we try to merge the affected node with its siblings or to reorganize the tree. This is not always possible, which is the reason why the R+-tree cannot guarantee a minimum space utilization.
5.3.3 The Cell Tree [Günther 1988]. The main goal in the design of the cell tree [Günther 1988; 1989] was to facilitate searches on data objects of arbitrary shapes, that is, especially on data objects that are not intervals themselves. The cell tree uses clipping to manage large spatial databases that may contain polygons or higher-dimensional polyhedra. It corresponds to a decomposition of the universe into disjoint convex subspaces. The interior nodes correspond to a hierarchy of nested polytopes and each leaf node corresponds to one of the subspaces (Figure 39). Each tree node is stored on one disk page.
To avoid some of the disadvantages resulting from clipping, the convex polyhedra are restricted to be subspaces of a BSP (binary space partitioning). Therefore we can view the cell tree as a combination of a BSP- and an R+-tree, or as a BSP-tree mapped on paged secondary memory. In order to minimize the number of disk accesses that occur during a search operation, the leaf nodes of a cell tree contain all the information required for answering a given search query; we load no pages other than those containing relevant data. This is an important advantage of the cell tree over the R-tree and related structures.
Before inserting a nonconvex object, we decompose it into a number of convex components whose union is the original object. The components do not have to be mutually disjoint. All components are assigned the same object identifier and inserted into the cell tree one by one. Due to clipping, we may have to subdivide each component into several cells during insertion, because it overlaps more than one subspace. Each cell is stored in one leaf node of the cell tree. If an insertion causes a disk page to overflow, we have to split the corresponding subspace and cell tree node and distribute its descendants between the two resulting nodes. Each split may propagate up the tree.
For point searches, we start at the root of the tree. Using the underlying BSP partitioning, we identify the subspace that includes the search point and continue the search in the corresponding subtree. This step is repeated recursively until we reach a leaf node, where we examine all cells to see whether they contain the search point. The solution consists of those objects that contain at least one of the cells that qualify. A similar algorithm exists for range searches. A performance evaluation of the cell tree [Günther and Bilmes 1991] shows that it is competitive with other popular spatial access methods.
Figure 39 shows our running example with five partitioning hyperplanes, each of them stored in the interior nodes. Even though the partitioning by means of the BSP-tree offers more flexibility than rectilinear hyperplanes, clipping objects may be inevitable. In Figure 39, we had to split r2 and insert the resulting cells into two pages.
As do all structures based on clipping, the cell tree has to cope with the fragmentation of space, which becomes increasingly problematic as more objects are inserted into the tree. After some time, most new objects will be split into fragments during insertion. To avoid the negative effects of this fragmentation, Günther and Noltemeier [1991] proposed the concept of oversize shelves. Oversize shelves are special disk pages attached to the interior nodes of the tree that accommodate objects which would have been split into too many fragments if they had been inserted regularly. The authors propose a dynamically adjusting threshold for choosing between placing a new object on an oversize shelf or inserting it regularly. Performance results of Günther and Gaede [1997] show substantial improvements compared to the cell tree without oversize shelves.
5.4 Multiple Layers
The multiple layer technique can be regarded as a variant of the overlapping regions approach, because data regions of different layers may overlap. However, there are several important differences. First, the layers are organized in a hierarchy. Second, each layer partitions the complete universe in a different way. Third, data regions within a
layer are disjoint; that is, they do not overlap. Fourth, the data regions do not adapt to the spatial extensions of the corresponding data objects.
In order to get a better understanding of the multilayer technique, we discuss how to insert an extended object. First, we try to find the lowest layer in the hierarchy whose hyperplanes do not split the new object. If there is such a layer, we insert the object into the corresponding data page. If the insertion causes no page to overflow, we are done. Otherwise, we must split the data region by introducing a new hyperplane and distribute the entries accordingly. Objects intersecting the hyperplane have to be moved to a higher layer or an overflow page. As the database becomes populated, the data space of the lower layers becomes more and more fragmented. As a result, large objects keep accumulating on higher layers of the hierarchy or, even worse, it is no longer possible to insert objects without intersecting existing hyperplanes.
The multilayer approach seems to offer one advantage over the overlapping regions technique: a possibly higher selectivity during searching due to the restricted overlap of the different layers. However, there are also several disadvantages: the multilayer approach suffers from fragmentation, which may render the technique inefficient for some data distributions; certain queries require the inspection of all existing layers; it is not clear how to cluster objects that are spatially close to each other but in different layers; and there is some ambiguity about the layer in which to place the object.
An early static multilayer access method is the MX-CIF quadtree [Kedem 1982; Abel and Smith 1983; Samet 1990b]. This structure stores each extended spatial object with the quadtree node whose associated quadrant provides the tightest fit without intersecting the object. Objects within a node are organized by means of binary trees. Sevcik and Koudas [1996] later proposed a similar SAM based on multiple layers, called the filter tree. As in the MX-CIF quadtree, each layer is the result of a regular subdivision of the universe. A new object is assigned to a unique layer, depending on the object’s position and extension. Objects within one layer are first sorted by the Hilbert code of their center, then packed into data pages of a given size. Finally, the largest Hilbert code of each data page, together with its reference, is inserted into a B-tree.
We continue with a detailed description of two dynamic SAMs based on multiple layers.
5.4.1 The Multilayer Grid File [Six and Widmayer 1988]. Yet another variant of the grid file capable of handling extended objects is the multilayer grid file (not to be confused with the multilevel grid file of Whang and Krishnamurthy [1985]). The multilayer grid file consists of an ordered sequence of grid layers. Each of these layers corresponds to a separate grid file (with freely positionable splitting hyperplanes) that covers the whole universe. A new object is inserted into the first grid file in the sequence that does not imply any clipping of the object. This is an important difference from the twin grid file (see Section 4.1.4), where objects can be moved freely between the two layers. If one of the grid files is extended by adding another splitting hyperplane, those objects that would be split have to be moved to another layer. Figure 40 illustrates a multilayer grid file with two layers for the running example.
In the multilayer grid file, the size of the bucket regions typically increases within the sequence; that is, larger objects are more likely to find their final location in later layers. If a new object cannot be stored in any of the current layers without clipping, a new layer has to be allocated. An alternative is to allow clipping only for the last layer. Six and Widmayer claim that d + 1 layers are sufficient to store a set of d-dimen-
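The insertion rule of the multilayer grid file described above (take the first layer in the sequence that does not clip the object, otherwise allocate a new layer) can be sketched as follows. This is a minimal Python sketch under simplifying assumptions, not the algorithm of Six and Widmayer: each layer is reduced to its per-axis lists of splitting hyperplane positions, an object is represented by its MBB only, and page overflow with the resulting bucket splits is ignored. The names is_clipped and insert_multilayer and the dictionary layout are hypothetical.

```python
def is_clipped(mbb, layer_splits):
    """True if some splitting hyperplane of the layer cuts through the MBB.

    mbb:          one (low, high) pair per axis
    layer_splits: one list of hyperplane positions per axis
    """
    for (low, high), axis_splits in zip(mbb, layer_splits):
        # A hyperplane strictly inside the extent would split the object.
        if any(low < s < high for s in axis_splits):
            return True
    return False


def insert_multilayer(layers, mbb):
    """Store mbb in the first layer that does not clip it.

    layers: ordered list of dicts with keys "splits" and "objects"
            (an assumed, simplified stand-in for a sequence of grid files).
    Returns the index of the layer that received the object.
    """
    for index, layer in enumerate(layers):
        if not is_clipped(mbb, layer["splits"]):
            layer["objects"].append(mbb)
            return index
    # No existing layer can hold the object unclipped: allocate a new layer
    # that has no splitting hyperplanes yet.
    layers.append({"splits": [[] for _ in mbb], "objects": [mbb]})
    return len(layers) - 1
```

In this sketch, an object that is cut by hyperplanes in every existing layer ends up in a freshly allocated layer, which mirrors the accumulation of large objects in later layers noted above.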
R-file therefore often performs poorly. This disadvantage is something that the R-file shares with the grid file. Widmayer [1991] also notes that the R-file is “algorithmically complicated.”
6. COMPARATIVE STUDIES
In this section, we give a brief overview of theoretical and experimental results on the comparison of different access methods. Unfortunately, the number of such evaluations, especially theoretical analyses, is rather limited.
Greene [1989] compares the search performance of the R-tree, the k-d-B-tree, and the R+-tree for 10,000 uniformly distributed rectangles of varying size. Query parameters include the size of the query rectangles and the page size. Greene’s study shows that the k-d-B-tree can never really compete with the two R-tree variants. On the other hand, there is not much difference between the R+-tree and the R-tree, even though the former is significantly more difficult to code. As expected, the R+-tree performs better when there is less overlap between the data rectangles.
Kriegel et al. [1990] present an extensive experimental study of access-method performance for a variety of point distributions. The study involves four point access methods: the hB-tree, the BANG file, the two-level grid file, and the buddy tree. The authors decided not to include PLOP-hashing since its performance suffers considerably for nonuniform data. The zkdB+-tree of Orenstein and Merrett [1984] was also not included since the authors considered both the BANG file and the hB-tree as improvements of that strategy. Finally, Kriegel et al. did not include quantile hashing although they claim [Kriegel and Seeger 1987, 1989] that this structure is very efficient for nonuniform data.
According to the benchmarks, the buddy tree and, to some degree, the BANG file outperform all other structures. The reported results show in an impressive way how the performance of
the access methods studied varies with different data distributions and query range sizes. For clustered data and a query range of size 10% of the size of the universe, there is almost no performance difference between the buddy tree and the BANG file. If the size of the query range drops to only 0.1% of the size of the universe, however, the buddy tree performs about twice as fast.
For extended objects, Kriegel et al. [1990] compared the R-tree and PLOP-hashing with the buddy tree and the BANG file. The latter two techniques were enhanced by the transformation technique to handle rectangles. Once again, the buddy tree and the BANG file outperformed the other two access methods for nearly all data distributions. Note that the benchmarks measured only the number of page accesses but not the CPU time.
Beckmann et al. [1990] compared the R*-tree with several variants of the R-tree for a variety of data distributions. Besides the performance of the different structures for point, intersection, and enclosure queries for varying query region sizes, they also compared spatial join performance. The R*-tree is the clear winner for all data distributions and queries, and it also has the best storage utilization and insertion times. A comparison for point data confirms these results. Similarly to previous performance measurements, only the number of disk accesses is measured. A related study by Kamel and Faloutsos [1994] finds even better search results for the Hilbert R-tree, whereas updates take about the same time as for the R*-tree. The impact of global clustering on the search performance of the R*-tree was investigated by Brinkhoff and Kriegel [1994]. Kamel et al. [1996] use Hilbert codes for bulk insertion into dynamic R*-trees.
Seeger [1991] studied the relative performance of clipping, overlapping regions, and transformation techniques implemented on top of the buddy tree. He also included the two-level grid file and the R*-tree in the comparison. The buddy tree with clipping and the grid file failed completely for certain distributions, since they produced unmanageably large files. The transformation technique supports fast insertions at the expense of low storage utilization. The R*-tree, on the other hand, requires fairly long insertion times but offers good storage utilization. For intersection and containment queries, the buddy tree combined with overlapping regions is consistently superior to the buddy tree with transformation. The performance advantage of the overlapping regions technique decreases for larger query regions, even though the buddy tree with transformation never outperforms the buddy tree with overlapping regions. When the data set contains uniformly distributed rectangles of varying size, the buddy tree with clipping outperforms the other techniques for intersection and enclosure queries. For some queries the buddy tree with overlapping performs slightly better than the R*-tree.
Ooi [1990] compares a static and a dynamic variant of the skd-tree with the packed R-tree described by Roussopoulos and Leifker [1985]. For large page sizes, the skd-tree clearly outperforms the R-tree in terms of page accesses per search operation. The space requirements of the skd-tree, however, are higher than those of the R-tree. Since the skd-tree stores the extended objects by their centroids, containment queries are answered more efficiently than by the R-tree. This behavior is clearly reflected in the performance results. A comparison with the extended k-d-tree, enhanced by overflow pages, suggests that the skd-tree is superior, although the extended k-d-tree (which is based on clipping) performs rather well for uniformly distributed data.
Günther and Bilmes [1991] compare the R-tree to two clipping-based access methods, the cell tree and the R+-tree. Unlike most studies, the data sets consist of convex polygons instead of just rectangles. The cell tree requires up to twice as much space as its competitors.
However, the average number of page accesses per search operation is less than for the other two access methods. Moreover, this advantage tends to increase with the size of the database and the size of the query regions. Besides measurements on the number of page faults, CPU time measurements are also given.
Günther and Gaede [1997] compare the original cell tree as presented by Günther [1989] with the cell tree with oversize shelves [Günther and Noltemeier 1991], the R*-tree [Beckmann et al. 1990], and the hB-tree [Lomet and Salzberg 1989] for some real cartographic data. There is a slight performance advantage of the cell tree with oversize shelves compared to the R*-tree and the hB-tree, but a major difference from the original cell tree. An earlier comparison using artificially generated data can be found in Günther [1991]. Both studies suggest that oversize shelves may lead to significant improvements for access methods with clipping.
Oosterom [1990] compares the query times of his KD2B-tree and the sphere tree with the R-tree for different queries. The KD2B-tree is a paged version of the KD2-tree, which in turn is a variant of the BSP-tree. The two structures differ in two aspects: each interior node stores two iso-oriented lines to allow for overlap and gaps, and the corresponding partition lines do not clip; that is, an object is handled as a unit. The KD2B-tree outperforms the R-tree for all queries, whereas the sphere tree is inferior to the R-tree.
Hoel and Samet [1992] compare the performance of the PMR-quadtree [Nelson and Samet 1987], the R*-tree, and the R+-tree for indexing line segments. The R+-tree shows the best insertion performance, whereas the R*-tree occupies the least space. However, the insertion behavior of the R+-tree heavily depends on the page size, unlike the PMR-quadtree. The performance of all structures compared is about the same, even though the PMR-quadtree shows some slight performance benefits. Although the R*-tree is more compact than the other structures, its search performance is not as good as that of the R+-tree for line segments. Unfortunately, Hoel and Samet do not report the overall performance times for the different queries.
Peloux et al. [1994] carried out a similar performance comparison of two quadtree variants, a variant of the R+-tree, and the R*-tree. What makes their study different is that all structures have been implemented on top of a commercial object-oriented system using the application programmer interface. A further difference from Hoel and Samet [1992] is that Peloux et al. used polygons rather than line segments as test data. Furthermore, they report the various times for index traversal, loading polygons, and the like. Besides showing that the R+-tree and a quadtree variant based on Hierarchical EXCELL [Tamminen 1983] outperform the R*-tree for point queries, they clearly demonstrate that the database system must provide some means for physical clustering. Otherwise, reading a single index page may induce several page faults.
Smith and Gao [1990] compare the performance of a variant of the zkdB+-tree, the grid file, the R-tree, and the R+-tree for insertions, deletions, and search operations. They also measured storage utilization. The conclusion of their experiments is that z-ordering and the grid file perform well for insertions and deletions but deliver poor search performance. R- and R+-trees, in contrast, offer moderate insertion and deletion performance but superior search performance. Although the R+-tree performs slightly better than the R-tree for search operations, the authors conclude that the R+-tree is not a good choice for general-purpose applications, due to its potentially poor space utilization.
Hutflesz et al. [1990] showed that the R-file has a 10 to 20% performance advantage over the R-tree on a data set containing 48,000 rectangles with a high degree of overlap (each point in the
inventors of the R*-tree [Beckmann et al. 1990]. The central formula of Pagel et al. [1993] to compute the number of disk accesses in an R-tree has been found independently by Kamel and Faloutsos [1993]. Faloutsos and Kamel [1994] later refined this formula by using properties of the data set. More recently, Theodoridis and Sellis [1996] proposed a theoretical model to determine the number of disk accesses in an R-tree that requires only two parameters: the amount of data and the density in the data space. Their model also extends to nonuniform distributions.
In pursuit of an implementation-independent comparison criterion for access methods, Pagel et al. [1995] suggest using the degree of clustering. As a lower bound they assume the optimal clustering of the static situation, that is, the case in which the complete data set is known beforehand. Incidentally, the significance of clustering for access methods has been demonstrated in numerous empirical investigations as well.4
4 See Jagadish [1990a], Kamel and Faloutsos [1993], Brinkhoff and Kriegel [1994], Kumar [1994b], and Ng and Han [1994].
In the area of constraint database systems (see Gaede and Wallace [1997] for a recent survey) a number of interesting papers related to multidimensional access methods have been published. Kanellakis et al. [1993], for example, presented a semidynamic structure that guarantees certain worst-case bounds for space, search, and insertion. Subramanian and Ramaswamy [1995] and Hellerstein et al. [1997] complement this work by proving some important lower and upper bounds. Sexton [1997] and Stuckey [1997] look at indexing from a language point of view. Their work can be regarded as a generalization of work by Hellerstein et al. [1995], who proposed a generic framework for modeling hierarchical access methods.

7. CONCLUSIONS

Research in spatial database systems has resulted in a multitude of spatial access methods. Even for experts it becomes more and more difficult to recognize their merits and weaknesses, since every new method seems to claim superiority to at least one access method previously published. This survey did not try to resolve this problem but rather to give an overview of the pros and cons of a variety of structures. It will come as no surprise to the reader that at present no access method has proven itself superior to all its competitors in whatever sense. Even if one benchmark declares one structure as the clear winner, another benchmark may prove the same structure inferior.
But why are such comparisons so difficult? Because there are so many different criteria to define optimality, and so many parameters that determine performance. Both time and space efficiency of an access method strongly depend on the data processed and the queries asked. An access method that performs reasonably well for iso-oriented rectangles may fail for arbitrarily oriented lines. Strongly correlated data may render an otherwise fast access method irrelevant for any practical application. An index that has been optimized for point queries may be highly inefficient for arbitrary region queries. Large numbers of insertions and deletions may degrade a structure that is efficient in a more static environment.
The initiative of Kriegel et al. [1990] to set up a standardized testbed for benchmarking and comparing access methods under different conditions was an important step in the right direction. The World Wide Web provides a convenient infrastructure to access and distribute such benchmarks [Günther et al. 1998]. Nevertheless, it remains far from easy to compare or rank different access methods. Experimental benchmarks need to be studied with care and can only be a first indicator for usability.
When it comes to technology transfer, that is, to the use of access methods in commercial products, most vendors resort to structures that are easy to understand and implement. Quadtrees in SICAD [Siemens Nixdorf Informationssysteme AG 1997] and Smallworld GIS [Newell and Doe 1997], R-trees in Informix [Informix Inc. 1997], and Z-ordering in Oracle [Oracle Inc. 1995] are typical examples. Performance seems to be of minor importance in the selection, which comes as no surprise given the relatively small differences among methods in virtually all published analyses. Rather, the tendency is to take a structure that is simple and robust and to optimize its performance by a highly tuned implementation and tight integration with other system components.
Nevertheless, the implementation and experimental evaluation of access methods are essential, as they often reveal deficiencies and problems that are not obvious from the design or a theoretical model. In order to make such comparative evaluations both easier to perform and easier to verify, it is essential to provide platform-independent access to the implementations of a broad variety of access methods. Some extensions of the World Wide Web, including our own MMM project [Günther et al. 1997], may provide the right technological base for such a paradigm change. Once every published paper includes a URL (uniform resource locator), that is, an Internet address that points to an implementation, possibly with a standardized user interface, transparency will increase substantially. Until then, most users will have to rely on general wisdom and their own experiments to select an access method that provides the best fit for their current application.

ACKNOWLEDGMENTS

While working on this survey, we had the pleasure of discussions with many colleagues. Special thanks go to D. Abel, A. Buchmann, C. Faloutsos, A. Frank, M. Freeston, J. C. Freytag, J. Hellerstein, C. Kolovson, H.-P. Kriegel, J. Nievergelt, J. Orenstein, P. Picouet, W.-F. Riekert, D. Rotem, J.-M. Saglio, B. Salzberg, H. Samet, M. Schiwietz, R. Schneider, M. Scholl, B. Seeger, T. Sellis, A. P. Sexton, and P. Widmayer. We would also like to thank the referees for their detailed and insightful comments.

REFERENCES

ABEL, D. J. AND MARK, D. M. 1990. A comparative analysis of some two-dimensional orderings. Int. J. Geograph. Inf. Syst. 4, 1, 21–31.
ABEL, D. J. AND SMITH, J. L. 1983. A data structure and algorithm based on a linear key for a rectangle retrieval problem. Comput. Vis. 24, 1–13.
ANG, C. AND TAN, T. 1997. New linear node splitting algorithm for R-trees. In Advances in Spatial Databases, M. Scholl and A. Voisard, Eds., LNCS, Springer-Verlag, Berlin/Heidelberg/New York.
AREF, W. G. AND SAMET, H. 1994. The spatial filter revisited. In Proceedings of the Sixth International Symposium on Spatial Data Handling, 190–208.
BAYER, R. 1996. The universal B-tree for multidimensional indexing. Tech. Rep. I9639, Technische Universität München, Munich, Germany. http://www.leo.org/pub/comp/doc/techreports/tum/informatik/report/1996/TUM-I9639.ps.gz.
BAYER, R. AND MCCREIGHT, E. M. 1972. Organization and maintenance of large ordered indices. Acta Inf. 1, 3, 173–189.
BAYER, R. AND SCHKOLNICK, M. 1977. Concurrency of operations on B-trees. Acta Inf. 9, 1–21.
BECKER, B., FRANCIOSA, P., GSCHWIND, S., OHLER, T., THIEM, F., AND WIDMAYER, P. 1992. Enclosing many boxes by an optimal pair of boxes. In Proceedings of STACS’92, A. Finkel and M. Jantzen, Eds., LNCS 525, Springer-Verlag, Berlin/Heidelberg/New York, 475–486.
BECKER, L. 1992. A new algorithm and a cost model for join processing with the grid file. Ph.D. thesis, Universität-Gesamthochschule Siegen, Germany.
BECKMANN, N., KRIEGEL, H.-P., SCHNEIDER, R., AND SEEGER, B. 1990. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of ACM SIGMOD International Conference on Management of Data, 322–331.
BELUSSI, A. AND FALOUTSOS, C. 1995. Estimating the selectivity of spatial queries using the ‘correlation’ fractal dimension. In Proceedings of the 21st International Conference on Very Large Data Bases, 299–310.
BENTLEY, J. L. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9, 509–517.
BENTLEY, J. L. 1979. Multidimensional binary search in database applications. IEEE Trans. Softw. Eng. 4, 5, 333–340.
BENTLEY, J. L. AND FRIEDMAN, J. H. 1979. Data
structures for range searching. ACM Comput. Surv. 11, 4, 397–409.
BERCHTOLD, S., KEIM, D., AND KRIEGEL, H.-P. 1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22nd International Conference on Very Large Data Bases, (Bombay) 28–39.
BLANKEN, H., IJBEMA, A., MEEK, P., AND VAN DEN AKKER, B. 1990. The generalized grid file: Description and performance aspects. In Proceedings of the Sixth IEEE International Conference on Data Engineering, 380–388.
BRINKHOFF, T. 1994. Der spatial join in geodatenbanksystemen. Ph.D. Thesis, Ludwig-Maximilians-Universität München. Germany (in German).
BRINKHOFF, T. AND KRIEGEL, H.-P. 1994. The impact of global clustering on spatial database systems. In Proceedings of the Twentieth International Conference on Very Large Data Bases, 168–179.
BRINKHOFF, T., KRIEGEL, H.-P., AND SCHNEIDER, R. 1993a. Comparison of approximations of complex objects used for approximation-based query processing in spatial database systems. In Proceedings of the Ninth IEEE International Conference on Data Engineering, 40–49.
BRINKHOFF, T., KRIEGEL, H.-P., AND SEEGER, B. 1993b. Efficient processing of spatial joins using R-trees. In Proceedings of ACM SIGMOD International Conference on Management of Data, 237–246.
BRINKHOFF, T., KRIEGEL, H.-P., SCHNEIDER, R., AND SEEGER, B. 1994. Multi-step processing of spatial joins. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 197–208.
BRODSKY, A., LASSEZ, C., LASSEZ, J.-L., AND MAHER, M. J. 1995. Separability of polyhedra for optimal filtering of spatial and constraint data. In Proceedings of the Fourteenth ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems (San Jose, CA), 54–64.
BURKHARD, W. 1984. Index maintenance for non-uniform record distributions. In Proceedings of the Third ACM SIGACT–SIGMOD Symposium on Principles of Database Systems, 173–180.
BURKHARD, W. A. 1983. Interpolation-based index maintenance. BIT 23, 274–294.
CHEN, L., DRACH, R., KEATING, M., LOUIS, S., ROTEM, D., AND SHOSHANI, A. 1995. Access to multidimensional datasets on tertiary storage systems. Inf. Syst. 20, 2, 155–183.
COMER, D. 1979. The ubiquitous B-tree. ACM Comput. Surv. 11, 2, 121–138.
DANDAMUDI, S. P. AND SORENSON, P. G. 1986. Algorithms for BD-trees. Softw. Pract. Exper. 16, 2, 1077–1096.
DANDAMUDI, S. P. AND SORENSON, P. G. 1991. Improved partial-match search algorithms for BD-trees. Comput. J. 34, 5, 415–422.
EGENHOFER, M. 1989. Spatial query languages. Ph.D. Thesis, University of Maine, Orono, ME.
EGENHOFER, M. 1994. Spatial SQL: A query and presentation language. IEEE Trans. Knowl. Data Eng. 6, 1, 86–95.
EVANGELIDIS, G. 1994. The hBP-tree: A concurrent and recoverable multi-attribute index structure. Ph.D. Thesis, Northeastern University, Boston, MA.
EVANGELIDIS, G., LOMET, D., AND SALZBERG, B. 1995. The hBP-tree: A modified hB-tree supporting concurrency, recovery and node consolidation. In Proceedings of the 21st International Conference on Very Large Data Bases, 551–561.
FAGIN, R., NIEVERGELT, J., PIPPENGER, N., AND STRONG, R. 1979. Extendible hashing: A fast access method for dynamic files. ACM Trans. Database Syst. 4, 3, 315–344.
FALOUTSOS, C. 1986. Multiattribute hashing using Gray-codes. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 227–238.
FALOUTSOS, C. 1988. Gray-codes for partial match and range queries. IEEE Trans. Softw. Eng. 14, 1381–1393.
FALOUTSOS, C. AND GAEDE, V. 1996. Analysis of n-dimensional quadtrees using the Hausdorff fractal dimension. In Proceedings of the 22nd International Conference on Very Large Data Bases, (Bombay), 40–50.
FALOUTSOS, C. AND KAMEL, I. 1994. Beyond uniformity and independence: Analysis of R-trees using the concept of fractal dimension. In Proceedings of the Thirteenth ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems, 4–13.
FALOUTSOS, C. AND RONG, Y. 1991. DOT: A spatial access method using fractals. In Proceedings of the Seventh IEEE International Conference on Data Engineering, 152–159.
FALOUTSOS, C. AND ROSEMAN, S. 1989. Fractals for secondary key retrieval. In Proceedings of the Eighth ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems, 247–252.
FALOUTSOS, C., SELLIS, T., AND ROUSSOPOULOS, N. 1987. Analysis of object-oriented spatial access methods. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 426–439.
FINKEL, R. AND BENTLEY, J. L. 1974. Quad trees: A data structure for retrieval of composite keys. Acta Inf. 4, 1, 1–9.
FLAJOLET, P. 1983. On the performance evaluation of extendible hashing and trie searching. Acta Inf. 20, 345–369.
FRANK, A. AND BARRERA, R. 1989. The fieldtree: A data structure for geographic information systems. In Design and Implementation of Large Spatial Database Systems, A. Buchmann, O. Günther, T. R. Smith, and Y.-F. Wang, Eds., LNCS 409, Springer-Verlag, Berlin/Heidelberg/New York, 29–44.
FREESTON, M. 1987. The BANG file: A new kind of grid file. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 260–269.
FREESTON, M. 1989a. Advances in the design of the BANG file. In Proceedings of the Third International Conference on Foundations of Data Organization and Algorithms, LNCS 367, Springer-Verlag, Berlin/Heidelberg/New York, 322–338.
FREESTON, M. 1989b. A well-behaved structure for the storage of geometric objects. In Design and Implementation of Large Spatial Database Systems, A. Buchmann, O. Günther, T. R. Smith, and Y.-F. Wang, Eds., LNCS 409, Springer-Verlag, Berlin/Heidelberg/New York, 287–300.
FREESTON, M. 1995. A general solution of the n-dimensional B-tree problem. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 80–91.
FREESTON, M. 1997. On the complexity of BV-tree updates. In Proceedings of CDB’97 and CP’96 Workshop on Constraint Databases and their Application, V. Gaede, A. Brodsky, O. Günther, D. Srivastava, V. Vianu, and M. Wallace, Eds., LNCS 1191, Springer-Verlag, Berlin/Heidelberg/New York, 282–293.
FUCHS, H., ABRAM, G. D., AND GRANT, E. D. 1983. Near real-time shaded display of rigid objects. Computer Graph. 17, 3, 65–72.
FUCHS, H., KEDEM, Z., AND NAYLOR, B. 1980. On visible surface generation by a priori tree structures. Computer Graph. 14, 3.
GAEDE, V. 1995a. Geometric information makes spatial query processing more efficient. In Proceedings of the Third ACM International Workshop on Advances in Geographic Information Systems (ACM-GIS’95) (Baltimore, MD) 45–52.
GAEDE, V. 1995b. Optimal redundancy in spatial database systems. In Advances in Spatial Databases, M. J. Egenhofer and J. R. Herring, Eds., LNCS 951, Springer-Verlag, Berlin/Heidelberg/New York, 96–116.
GAEDE, V. AND RIEKERT, W.-F. 1994. Spatial access methods and query processing in the object-oriented GIS GODOT. In Proceedings of the AGDM’94 Workshop (Delft, The Netherlands), Netherlands Geodetic Commission, 40–52.
GAEDE, V. AND WALLACE, M. 1997. An informal introduction to constraint databases. In Proceedings of CDB’97 and CP’96 Workshop on Constraint Databases and their Application, V. Gaede, A. Brodsky, O. Günther, D. Srivastava, V. Vianu, and M. Wallace, Eds., LNCS 1191, Springer-Verlag, Berlin/Heidelberg/New York, 7–52.
GARG, A. K. AND GOTLIEB, C. C. 1986. Order-preserving key transformation. ACM Trans. Database Syst. 11, 2, 213–234.
GREENE, D. 1989. An implementation and performance analysis of spatial data access methods. In Proceedings of the Fifth IEEE International Conference on Data Engineering, 606–615.
GÜNTHER, O. 1988. Efficient Structures for Geometric Data Management. LNCS 337, Springer-Verlag, Berlin/Heidelberg/New York.
GÜNTHER, O. 1989. The cell tree: An object-oriented index structure for geometric databases. In Proceedings of the Fifth IEEE International Conference on Data Engineering, 598–605.
GÜNTHER, O. 1991. Evaluation of spatial access methods with oversize shelves. In Geographic Database Management Systems, G. Gambosi, M. Scholl, and H.-W. Six, Eds., Springer-Verlag, Berlin/Heidelberg/New York, 177–193.
GÜNTHER, O. 1993. Efficient computation of spatial joins. In Proceedings of the Ninth IEEE International Conference on Data Engineering, 50–59.
GÜNTHER, O. AND BILMES, J. 1991. Tree-based access methods for spatial databases: Implementation and performance evaluation. IEEE Trans. Knowl. Data Eng. 3, 3, 342–356.
GÜNTHER, O. AND BUCHMANN, A. 1990. Research issues in spatial databases. SIGMOD Rec. 19, 4, 61–68.
GÜNTHER, O. AND GAEDE, V. 1997. Oversize shelves: A storage management technique for large spatial data objects. Int. J. Geog. Inf. Syst. 11, 1, 5–32.
GÜNTHER, O. AND NOLTEMEIER, H. 1991. Spatial database indices for large extended objects. In Proceedings of the Seventh IEEE International Conference on Data Engineering, 520–526.
GÜNTHER, O., MÜLLER, R., SCHMIDT, P., BHARGAVA, H., AND KRISHNAN, R. 1997. MMM: A WWW-based approach for sharing statistical software modules. IEEE Internet Comput. 1, 3.
GÜNTHER, O., ORIA, V., PICOUET, P., SAGLIO, J.-M., AND SCHOLL, M. 1998. Benchmarking spatial joins à la carte. In Proceedings of the 10th International Conference on Scientific and Statistical Database Management. IEEE, New York.
GÜTING, R. H. 1989. Gral: An extendible relational database system for geometric applications. In Proceedings of the Fifteenth International Conference on Very Large Data Bases, 33–44.
GÜTING, R. H. AND SCHNEIDER, M. 1993.
Realms: A foundation for spatial data types in database systems. In Advances in Spatial Databases, D. Abel and B. C. Ooi, Eds., LNCS 692, Springer-Verlag, Berlin/Heidelberg/New York.
GUTTMAN, A. 1984. R-trees: A dynamic index structure for spatial searching. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 47–54.
HELLERSTEIN, J. M., KOUTSOUPIAS, E., AND PAPADIMITRIOU, C. H. 1997. Towards a theory of indexability. In Proceedings of the Sixteenth ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems.
HELLERSTEIN, J. M., NAUGHTON, J. F., AND PFEFFER, A. 1995. Generalized search trees for database systems. In Proceedings of the 21st International Conference on Very Large Data Bases, 562–573.
HENRICH, A. 1995. Adapting the transformation technique to maintain multidimensional non-point objects in k-d-tree based access structures. In Proceedings of the Third ACM International Workshop on Advances in Geographic Information Systems (ACM-GIS’95) (Baltimore, MD) ACM Press, New York.
HENRICH, A. AND MÖLLER, J. 1995. Extending a spatial access structure to support additional standard attributes. In Advances in Spatial Databases, M. J. Egenhofer and J. R. Herring, Eds., LNCS 951, Springer-Verlag, Berlin/Heidelberg/New York, 132–151.
HENRICH, A. AND SIX, H.-W. 1991. How to split buckets in spatial data structures. In Geographic Database Management Systems, G. Gambosi, M. Scholl, and H.-W. Six, Eds., Springer-Verlag, Berlin/Heidelberg/New York, 212–244.
HENRICH, A., SIX, H.-W., AND WIDMAYER, P. 1989. The LSD tree: Spatial access to multidimensional point and non-point objects. In Proceedings of the Fifteenth International Conference on Very Large Data Bases, 45–53.
HINRICHS, K. 1985. Implementation of the grid file: Design concepts and experience. BIT 25, 569–592.
HOEL, E. G. AND SAMET, H. 1992. A qualitative comparison study of data structures for large segment databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 205–214.
HOEL, E. G. AND SAMET, H. 1995. Benchmarking spatial join operations with spatial output. In Proceedings of the 21st International Conference on Very Large Data Bases, 606–618.
HUTFLESZ, A., SIX, H.-W., AND WIDMAYER, P. 1988a. Globally order preserving multidimensional linear hashing. In Proceedings of the Fourth IEEE International Conference on Data Engineering, 572–579.
HUTFLESZ, A., SIX, H.-W., AND WIDMAYER, P. 1988b. Twin grid files: Space optimizing access schemes. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 183–190.
HUTFLESZ, A., SIX, H.-W., AND WIDMAYER, P. 1990. The R-file: An efficient access structure for proximity queries. In Proceedings of the Sixth IEEE International Conference on Data Engineering, 372–379.
HUTFLESZ, A., WIDMAYER, P., AND ZIMMERMANN, C. 1991. Global order makes spatial access faster. In Geographic Database Management Systems, G. Gambosi, M. Scholl, and H.-W. Six, Eds., Springer-Verlag, Berlin/Heidelberg/New York, 161–176.
INFORMIX INC. 1997. The DataBlade architecture. URL http://www.informix.com.
JAGADISH, H. V. 1990a. Linear clustering of objects with multiple attributes. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 332–342.
JAGADISH, H. V. 1990b. On indexing line segments. In Proceedings of the Sixteenth International Conference on Very Large Data Bases, 614–625.
JAGADISH, H. V. 1990c. Spatial search with polyhedra. In Proceedings of the Sixth IEEE International Conference on Data Engineering, 311–319.
KAMEL, I. AND FALOUTSOS, C. 1992. Parallel R-trees. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 195–204.
KAMEL, I. AND FALOUTSOS, C. 1993. On packing R-trees. In Proceedings of the Second International Conference on Information and Knowledge Management, 490–499.
KAMEL, I. AND FALOUTSOS, C. 1994. Hilbert R-tree: An improved R-tree using fractals. In Proceedings of the Twentieth International Conference on Very Large Data Bases, 500–509.
KAMEL, I., KHALIL, M., AND KOURAMAJIAN, V. 1996. Bulk insertion in dynamic R-trees. In Proceedings of the Seventh International Symposium on Spatial Data Handling (Delft, The Netherlands), 3B.31–3B.42.
KANELLAKIS, P. C., RAMASWAMY, S., VENGROFF, D. E., AND VITTER, J. S. 1993. Indexing for data models with constraints and classes. In Proceedings of the Twelfth ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems, 233–243.
KEDEM, G. 1982. The quad-CIF tree: A data structure for hierarchical on-line algorithms. In Proceedings of the Nineteenth Conference on Design and Automation, 352–357.
KEMPER, A. AND WALLRATH, M. 1987. An analysis of geometric modeling in database systems. ACM Comput. Surv. 19, 1, 47–91.
KLINGER, A. 1971. Pattern and search statistics. In Optimizing Methods in Statistics, S. Rustagi, Ed., 303–337.
KNOTT, G. 1975. Hashing functions. Comput. J. 18, 3, 265–278.
KOLOVSON, C. 1990. Indexing techniques for multi-dimensional spatial data and historical data in database management systems. Ph.D. Thesis, University of California at Berkeley.
KOLOVSON, C. AND STONEBRAKER, M. 1991. Segment indexes: Dynamic indexing techniques for multi-dimensional interval data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 138–147.
KRIEGEL, H.-P. 1984. Performance comparison of index structures for multikey retrieval. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 186–196.
KRIEGEL, H.-P., HEEP, P., HEEP, S., SCHIWIETZ, M., AND SCHNEIDER, R. 1991. An access method based query processor for spatial database systems. In Geographic Database Management Systems, G. Gambosi, M. Scholl, and H.-W. Six, Eds., Springer-Verlag, Berlin/Heidelberg/New York, 273–292.
KRIEGEL, H.-P., SCHIWIETZ, M., SCHNEIDER, R., AND SEEGER, B. 1990. Performance comparison of point and spatial access methods. In Design and Implementation of Large Spatial Database Systems, A. Buchmann, O. Günther, T. R. Smith, and Y.-F. Wang, Eds., LNCS 409, Springer-Verlag, Berlin/Heidelberg/New York, 89–114.
KRIEGEL, H.-P. AND SEEGER, B. 1986. Multidimensional order preserving linear hashing with partial expansions. In Proceedings of the International Conference on Database Theory, LNCS 243, Springer-Verlag, Berlin/Heidelberg/New York.
KRIEGEL, H.-P. AND SEEGER, B. 1987. Multidimensional quantile hashing is very efficient for non-uniform record distributions. In Proceedings of the Third IEEE International Conference on Data Engineering, 10–17.
KRIEGEL, H.-P. AND SEEGER, B. 1988. PLOP-hashing: A grid file without directory. In Proceedings of the Fourth IEEE International Conference on Data Engineering, 369–376.
KRIEGEL, H.-P. AND SEEGER, B. 1989. Multidimensional quantile hashing is very efficient for non-uniform distributions. Inf. Sci. 48, 99–117.
KORNACKER, M. AND BANKS, D. 1995. High-concurrency locking in R-trees. In Proceedings of the 21st International Conference on Very Large Data Bases, 134–145.
KUMAR, A. 1994a. G-tree: A new data structure for organizing multidimensional data. IEEE Trans. Knowl. Data Eng. 6, 2, 341–347.
KUMAR, A. 1994b. A study of spatial clustering techniques. In Proceedings of the Fifth Conference on Database and Expert Systems Applications (DEXA’94), D. Karagiannis, Ed., LNCS 856, Springer-Verlag, Berlin/Heidelberg/New York, 57–70.
LARSON, P. A. 1980. Linear hashing with partial expansions. In Proceedings of the Sixth International Conference on Very Large Data Bases, 224–232.
LEHMAN, P. AND YAO, S. 1981. Efficient locking for concurrent operations on B-trees. ACM Trans. Database Syst. 6, 4, 650–670.
LIN, K.-I., JAGADISH, H., AND FALOUTSOS, C. 1994. The TV-tree: An index structure for high-dimensional data. VLDB J. 3, 4, 517–543.
LITWIN, W. 1980. Linear hashing: A new tool for file and table addressing. In Proceedings of the Sixth International Conference on Very Large Data Bases, 212–223.
LO, M. AND RAVISHANKAR, C. 1994. Spatial joins using seeded trees. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 209–220.
LOMET, D. B. 1983. Bounded index exponential hashing. ACM Trans. Database Syst. 8, 1, 136–165.
LOMET, D. B. 1991. Grow and post index trees: Role, techniques and future potential. In Advances in Spatial Databases, O. Günther and H. Schek, Eds., LNCS 525, Springer-Verlag, Berlin/Heidelberg/New York, 183–206.
LOMET, D. B. AND SALZBERG, B. 1989. The hB-tree: A robust multiattribute search structure. In Proceedings of the Fifth IEEE International Conference on Data Engineering, 296–304.
LOMET, D. B. AND SALZBERG, B. 1990. The hB-tree: A multiattribute indexing method with good guaranteed performance. ACM Trans. Database Syst. 15, 4, 625–658. Reprinted in Readings in Database Systems, M. Stonebraker, Ed., Morgan-Kaufmann, San Mateo, CA, 1994.
LOMET, D. B. AND SALZBERG, B. 1992. Access method concurrency with recovery. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 351–360.
LU, H. AND OOI, B.-C. 1993. Spatial indexing: Past and future. IEEE Data Eng. Bull. 16, 3, 16–21.
MATSUYAMA, T., HAO, L. V., AND NAGAO, M. 1984. A file organization for geographic information systems based on spatial proximity. Int. J. Comput. Vis. Graph. Image Process. 26, 3, 303–318.
MORTON, G. 1966. A computer oriented geodetic data base and a new technique in file sequencing. IBM Ltd.
NELSON, R. AND SAMET, H. 1987. A population analysis for hierarchical data structures. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 270–277.
NEWELL, R. G. AND DOE, M. 1997. Discrete geometry with seamless topology in a GIS. URL http://www.smallworld-us.com.
NG, R. T. AND HAN, J. 1994. Efficient and effective clustering methods for spatial data mining. In Proceedings of the Twentieth International Conference on Very Large Data Bases, 144–154.
NG, V. AND KAMEDA, T. 1993. Concurrent accesses to R-trees. In Advances in Spatial Databases, D. Abel and B. C. Ooi, Eds., LNCS 692, Springer-Verlag, Berlin/Heidelberg/New York, 142–161.
NG, V. AND KAMEDA, T. 1994. The R-link tree: A recoverable index structure for spatial data. In Proceedings of the Fifth Conference on Database and Expert Systems Applications (DEXA’94), D. Karagiannis, Ed., LNCS 856, Springer-Verlag, Berlin/Heidelberg/New York, 163–172.
NIEVERGELT, J. 1989. 7 ± 2 criteria for assessing and comparing spatial data structures. In Design and Implementation of Large Spatial Database Systems, A. Buchmann, O. Günther, T. R. Smith, and Y.-F. Wang, Eds., LNCS 409, Springer-Verlag, Berlin/Heidelberg/New York, 3–27.
NIEVERGELT, J. AND HINRICHS, K. 1987. Storage and access structures for geometric data bases. In Proceedings of the International Conference on Foundations of Data Organization, S. Ghosh, Y. Kambayashi, and K. Tanaka, Eds., Plenum, New York.
NIEVERGELT, J., HINTERBERGER, H., AND SEVCIK, K. 1981. The grid file: An adaptable, symmetric multikey file structure. In Proceedings of the Third ECI Conference, A. Duijvestijn and P. Lockemann, Eds., LNCS 123, Springer-Verlag, Berlin/Heidelberg/New York, 236–251.
NIEVERGELT, J., HINTERBERGER, H., AND SEVCIK, K. C. 1984. The grid file: An adaptable, symmetric multikey file structure. ACM Trans. Database Syst. 9, 1, 38–71.
OHSAWA, Y. AND SAKAUCHI, M. 1983. BD-tree: A new n-dimensional data structure with efficient dynamic characteristics. In Proceedings of the Ninth World Computer Congress, IFIP 1983, 539–544.
OHSAWA, Y. AND SAKAUCHI, M. 1990. A new tree type data structure with homogeneous node suitable for a very large spatial database. In Proceedings of the Sixth IEEE International Conference on Data Engineering, 296–303.
OOI, B. C. 1990. Efficient Query Processing in Geographic Information Systems. LNCS 471, Springer-Verlag, Berlin/Heidelberg/New York.
OOI, B. C., MCDONELL, K. J., AND SACKS-DAVIS, R. 1987. Spatial kd-tree: An indexing mechanism for spatial databases. In Proceedings of the IEEE Computer Software and Applications Conference, 433–438.
OOI, B. C., SACKS-DAVIS, R., AND MCDONELL, K. J. 1991. Spatial indexing by binary decomposition and spatial bounding. Inf. Syst. J. 16, 2, 211–237.
OOSTEROM, P. 1990. Reactive data structures for geographic information systems. Ph.D. Thesis, University of Leiden, The Netherlands.
ORACLE INC. 1995. Oracle 7 multidimension: Advances in relational database technology for spatial data management. White paper.
ORENSTEIN, J. 1982. Multidimensional tries used for associative searching. Inf. Process. Lett. 14, 4, 150–157.
ORENSTEIN, J. 1983. A dynamic file for random and sequential accessing. In Proceedings of the Ninth International Conference on Very Large Data Bases, 132–141.
ORENSTEIN, J. 1989a. Redundancy in spatial databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 294–305.
ORENSTEIN, J. 1989b. Strategies for optimizing the use of redundancy in spatial databases. In Design and Implementation of Large Spatial Database Systems, A. Buchmann, O. Günther, T. R. Smith, and Y.-F. Wang, Eds., LNCS 409, Springer-Verlag, Berlin/Heidelberg/New York, 115–134.
ORENSTEIN, J. 1990. A comparison of spatial query processing techniques for native and parameter space. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 343–352.
ORENSTEIN, J. AND MERRETT, T. H. 1984. A class of data structures for associative searching. In Proceedings of the Third ACM SIGACT–SIGMOD Symposium on Principles of Database Systems, 181–190.
ORENSTEIN, J. A. 1986. Spatial query processing in an object-oriented database system. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 326–333.
OTOO, E. J. 1984. A mapping function for the directory of a multidimensional extendible hashing. In Proceedings of the Tenth International Conference on Very Large Data Bases, 493–506.
OTOO, E. J. 1985. Symmetric dynamic index maintenance scheme. In Proceedings of the International Conference on Foundations of Data Organization, Plenum, New York, 283–296.
OTOO, E. J. 1986. Balanced multidimensional extendible hash tree. In Proceedings of the Fifth ACM SIGACT–SIGMOD Symposium on Principles of Database Systems, 100–113.
OUKSEL, M. 1985. The interpolation based grid
file. In Proceedings of the Fourth ACM SI- ceedings of the Seventh IEEE International
GACT–SIGMOD Symposium on Principles of Conference on Data Engineering, 10 –18.
Database Systems, 20 –27. ROUSSOPOULOS, N. AND LEIFKER, D. 1984. An in-
OUKSEL, M. AND SCHEUERMANN, P. 1983. Stor- troduction to PSQL: A pictorial structured
age mappings for multidimensional linear dy- query language. In Proceedings of the IEEE
namic hashing. In Proceedings of the Second Workshop on Visual Languages.
ACM SIGACT–SIGMOD Symposium on Prin- ROUSSOPOULOS, N. AND LEIFKER, D. 1985. Direct
ciples of Database Systems, 90 –105. spatial search on pictorial databases using
OUKSEL, M. A. AND MAYER, O. 1992. A robust packed R-trees. In Proceedings of the ACM
and efficient spatial data structure. Acta Inf. SIGMOD International Conference on Man-
29, 335–373. agement of Data, 17–31.
OVERMARS, M. H., SMID, M. H., BERG, T., AND VAN SAGAN, H. 1994. Space-Filling Curves. Spring-
KREVELD, M. J. 1990. Maintaining range er-Verlag, Berlin/Heidelberg/New York.
trees in secondary memory: Part I: Partitions. SAMET, H. 1984. The quadtree and related hier-
Acta Inf. 27, 423– 452. archical data structure. ACM Comput. Surv.
PAGEL, B. U., SIX, H.-W., AND TOBEN, H. 1993a. 16, 2, 187–260.
The transformation technique for spatial ob- SAMET, H. 1988. Hierarchical representation of
jects revisited. In Advances in Spatial Data- collections of small rectangles. ACM Comput.
bases, D. Abel and B. C. Ooi, Eds., LNCS 692, Surv. 20, 4, 271–309.
Springer-Verlag, Berlin/Heidelberg/New York, SAMET, H. 1990a. Applications of Spatial Data
73– 88. Structures. Addison-Wesley, Reading, MA.
PAGEL, B. U., SIX, H.-W., AND WINTER, M. 1995. SAMET, H. 1990b. The Design and Analysis of
Window query optimal clustering of spatial Spatial Data Structures. Addison-Wesley,
objects. In Proceedings of the Fourteenth ACM Reading, MA.
SIGACT–SIGMOD–SIGART Symposium on SAMET, H. AND WEBBER, R. E. 1985. Storing a
Principles of Database Systems, 86 –94. collection of polygons using quadtrees. ACM
PAGEL, B. U., SIX, H.-W., TOBEN, H., AND WID- Trans. Graph. 4, 3, 182–222.
MAYER, P. 1993b. Towards an analysis of SCHIWIETZ, M. 1993. Speicherung und anfrage-
range query performance in spatial data bearbeitung komplexer geo-objekte. Ph.D. The-
structures. In Proceedings of the Twelfth ACM sis, Ludwig-Maximilians-Universität München,
SIGACT–SIGMOD–SIGART Symposium on Germany (in German).
Principles of Database Systems, 214 –221. SCHNEIDER, R. AND KRIEGEL, H.-P. 1992. The
PAPADIAS, D., THEODORIDIS, Y., SELLIS, T., AND TR*-tree: A new representation of polygonal
EGENHOFER, M. J. 1995. Topological rela- objects supporting spatial queries and opera-
tions in the world of minimum bounding rect- tions. In Proceedings of the Seventh Workshop
angles: A study with R-trees. In Proceedings on Computational Geometry, LNCS 553,
of the ACM SIGMOD International Confer- Springer-Verlag, Berlin/Heidelberg/New York,
ence on Management of Data, 92–103. 249 –264.
PAPADOPOULOS, A. AND MANOLOPOULOS, Y. 1997. SCHOLL, M. AND VOISARD, A. 1989. Thematic
Performance of nearest neighbor queries in map modeling. In Design and Implementation
R-trees. In Proceedings of the International of Large Spatial Database Systems, A. Buch-
Conference on Database Theory (ICDT’97), F. mann, O. Günther, T. R. Smith, and Y.-F.
Afrati and P. Kolaitis, Eds., LNCS 1186, Wang, Eds., LNCS 409, Springer-Verlag, Ber-
Springer-Verlag, Berlin/Heidelberg/New York, lin/Heidelberg/New York.
394 – 408. SEEGER, B. 1991. Performance comparison of
PELOUX, J., REYNAL, G., AND SCHOLL, M. 1994. segment access methods implemented on top
of the buddy-tree. In Advances in Spatial
Evaluation of spatial indices implemented
Databases, O. Günther and H. Schek, Eds.,
with the O 2 DBMS. Ingénièrie des Systèmes
LNCS 525, Springer-Verlag, Berlin/Heidel-
d’Information 6.
berg/New York, 277–296.
PREPARATA, F. P. AND SHAMOS, M. I. 1985. Com-
SEEGER, B. AND KRIEGEL, H.-P. 1988. Tech-
putational Geometry. Springer-Verlag, New niques for design and implementation of spa-
York. tial access methods. In Proceedings of the
REGNIER, M. 1985. Analysis of the grid file algo- Fourteenth International Conference on Very
rithms. BIT 25, 335–357. Large Data Bases, 360 –371.
ROBINSON, J. T. 1981. The K-D-B-tree: A search SEEGER, B. AND KRIEGEL, H.-P. 1990. The bud-
structure for large multidimensional dynamic dy-tree: An efficient and robust access method
indexes. In Proceedings of the ACM SIGMOD for spatial data base systems. In Proceedings
International Conference on Management of of the Sixteenth International Conference on
Data, 10 –18. Very Large Data Bases, 590 – 601.
ROTEM, D. 1991. Spatial join indices. In Pro- SELLIS, T., ROUSSOPOULOS, N., AND FALOUTSOS, C.
1987. The R1-tree: A dynamic index for ings of the First International Conference on
multi-dimensional objects. In Proceedings of Expert Data Base Systems.
the Thirteenth International Conference on STUCKEY, P. 1997. Constraint search trees. In
Very Large Data Bases, 507–518. Proceedings of the International Conference on
SEVCIK, K. AND KOUDAS, N. 1996. Filter trees Logic Programming (CLP’97), L. Naish, Ed.,
for managing spatial data over a range of size MIT Press, Cambridge, MA.
granularities. In Proceedings of the 22th In-
SUBRAMANIAN, S. AND RAMASWAMY, S. 1995. The
ternational Conference on Very Large Data
P-range tree: A new data structure for range
Bases (Bombay), 16 –27.
searching in secondary memory. In Proceed-
SEXTON, A. P. 1997. Querying indexed files. In ings of the ACM-SIAM Symposium on Discrete
Proceedings of the CDB’97 and CP’96 Work-
Algorithms (SODA’95).
shop on Constraint Databases and Their Ap-
plication, V. Gaede, A. Brodsky, O. Günther, TAMMINEN, M. 1982. The extendible cell method
D. Srivastava, V. Vianu, and M. Wallace, for closest point problems. BIT 22, 27– 41.
Eds., LNCS 1191, Springer-Verlag, Berlin/ TAMMINEN, M. 1983. Performance analysis of
Heidelberg/New York, 263–281. cell based geometric file organisations. Int.
SHEKHAR, S. AND LIU, D.-R. 1995. CCAM: A con- J. Comp. Vis. Graph. Image Process. 24, 160 –
nectivity-clustered access method for aggre- 181.
gate queries on transportation networks: A TAMMINEN, M. 1984. Comment on quad- and oc-
summary of results. In Proceedings of the trees. Commun. ACM 30, 3, 204 –212.
Eleventh IEEE International Conference on
THEODORIDIS, Y. AND SELLIS, T. K. 1996. A
Data Engineering, 410 – 419.
model for the prediction of R-tree perfor-
SIEMENS NIXDORF INFORMATIONSSYSTEME AG mance. In Proceedings of the Fifteenth ACM
1997. URL http://www.sni.de.
SIGACT–SIGMOD–SIGART Symposium on
SIX, H. AND WIDMAYER, P. 1988. Spatial search- Principles of Database Systems, 161–171.
ing in geometric databases. In Proceedings of
TROPF, H. AND HERZOG, H. 1981. Multi-
the Fourth IEEE International Conference on
Data Engineering, 496 –503. dimensional range search in dynamically bal-
anced trees. Angewandte Informatik 2, 71–77.
SMID, M. H. AND OVERMARS, M. H. 1990. Main-
taining range trees in secondary memory part WHANG, K.-Y. AND KRISHNAMURTHY, R. 1985.
II: Lower bounds. Acta Inf. 27, 453– 480. Multilevel grid files. IBM Research Labora-
tory, Yorktown Heights, NY.
SMITH, T. R. AND GAO, P. 1990. Experimental
performance evaluations on spatial access WHITE, M. 1981. N-trees: Large ordered in-
methods. In Proceedings of the Fourth Inter- dexes for multi-dimensional space. Tech. Rep.,
national Symposium on Spatial Data Han- Application Mathematics Research Staff, Sta-
dling (Zürich), 991–1002. tistical Research Division, US Bureau of the
STONEBRAKER, M. (ED.) 1994. Readings in Data- Census.
base Systems. Morgan-Kaufmann, San Mateo, WIDMAYER, P. 1991. Datenstrukturen für Geo-
CA. datenbanken. In Entwicklungstendenzen bei
STONEBRAKER, M., SELLIS, T., AND HANSON, E. Datenbank-Systemen, G. Vossen and K.-U.
1986. An analysis of rule indexing imple- Witt, Eds., Oldenbourg-Verlag, Munich,
mentations in data base systems. In Proceed- Chapter 9, 317–361 (in German).