Optimizing Multidimensional Index Trees For Main Memory Access
Kihong Kim    Sang K. Cha    Keunjoo Kwon
School of Electrical Engineering and Computer Science
Seoul National University
{ next, chask, icdi }@kdb.snu.ac.kr
[Figure 1: An example of MBR compression — (a) absolute coordinates of R0~R3, (b) coordinates of R1~R3 relative to the lower left corner of R0, (c) relative coordinates quantized into 16 levels]
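The compression of Figure 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation; the concrete coordinates and the 16-level grid are hypothetical values chosen for the example.

```python
# Sketch of the MBR compression of Figure 1: an MBR is (xl, yl, xh, yh);
# coordinates are first made relative to the lower left corner of the
# reference MBR R0, then quantized into 16 levels (4 bits per coordinate).
import math

def to_relative(mbr, ref):
    """Represent mbr relative to the lower left corner of ref (RMBR)."""
    return (mbr[0] - ref[0], mbr[1] - ref[1], mbr[2] - ref[0], mbr[3] - ref[1])

def quantize(rmbr, ref, levels=16):
    """Quantize an RMBR into `levels` cells per axis (QRMBR).
    Lower coordinates round down and upper coordinates round up,
    so the quantized MBR always encloses the original."""
    w = ref[2] - ref[0]   # width of the reference MBR
    h = ref[3] - ref[1]   # height of the reference MBR
    return (math.floor(rmbr[0] * levels / w),
            math.floor(rmbr[1] * levels / h),
            math.ceil(rmbr[2] * levels / w),
            math.ceil(rmbr[3] * levels / h))

r0 = (43153, 27085, 43182, 27112)   # reference MBR (hypothetical values)
r1 = (43160, 27095, 43170, 27106)   # a child MBR (hypothetical values)
rel = to_relative(r1, r0)           # small relative coordinates: (7, 10, 17, 21)
qr = quantize(rel, r0)              # 4 bits per coordinate
```

Note the asymmetric rounding: it can only enlarge an MBR, which is why a QRMBR may be slightly bigger than the original MBR but never smaller.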
compression technique can reduce the MBR size to less than a fourth, thereby increasing the fanout by more than 150%. A potential problem with the proposed technique is that the information loss by quantization may increase false hits, which have to be filtered out through a subsequent refinement step in most multidimensional indexes [9]. However, we can keep the number of false hits negligibly small by a proper choice of the quantization level, so that the cost of filtering out false hits is paid off by the significant savings in cache misses.

This paper also explores several options in the design of the CR-tree, including whether to use the pointer elimination technique of the CSB+-tree, whether to apply the proposed compression technique to leaf nodes or not, the choice of quantization levels, and the choice of node size. Our experimental study shows that all the resultant CR-tree variants significantly outperform the R-tree in terms of both search performance and space requirement. The basic CR-tree that uses only the proposed technique performs search operations up to 2.5 times faster than the R-tree while performing update operations similarly to the R-tree and using about 54% less memory space. Compared with the basic CR-tree, most of the CR-tree variants use less memory at some algorithmic overhead. Our analysis of the proposed technique and the various indexes used in our experiment coincides with the experimental results.

This paper is organized as follows. Section 2 presents the basic idea of this paper and formulates our problem. Section 3 presents the proposed MBR compression scheme, and section 4 describes the proposed CR-tree. Section 5 analytically compares the CR-tree with the ordinary R-tree, and section 6 presents the result of the experiment conducted to compare the CR-tree with the R-tree. Section 7 finally concludes this paper.

2. Motivation

2.1 Memory Hierarchy
Table 1 summarizes the properties of the memory hierarchy observed in Sun UltraSPARC II and Intel Xeon platforms. In UltraSPARC II, the block size is 32 bytes for the L1 cache and 64 bytes for the L2 cache [10]. Typically, the L1 cache can be accessed in one clock cycle, and the L2 cache can be accessed in two clock cycles. The memory access time depends on the DRAM type. When EDO DRAM is used, each memory access takes 50 ns on average. When a cache miss occurs in the L1 cache and the L2 cache, a victim is selected. The miss penalty is the cost of selecting a victim and accessing the backing store. In UltraSPARC II, each L1 cache miss incurs two accesses to the L2 cache, and each L2 cache miss incurs four accesses to main memory.

                 L1 Cache           L2 Cache             Memory
Block size       16~32B             32~64B               4~16KB
Size             16~64KB            256KB~8MB            ~32GB
Hit time         1 clock cycle      1~4 clock cycles     10~40 clock cycles
Backing store    L2 cache           Memory               Disks
Miss penalty     4~20 clock cycles  40~200 clock cycles  ~6M clock cycles

Table 1: Summary of Current Memory Hierarchy

2.2 Basic Idea
The idea in this paper is to make the R-tree cache-conscious by compressing MBRs. Figure 1 illustrates the compression scheme used in this paper. Figure 1(a) shows the absolute coordinates of R0~R3. Figure 1(b) shows the coordinates of R1~R3 represented relative to the lower left corner of R0. These relative coordinates have fewer significant bits than absolute coordinates. Figure 1(c) shows the coordinates of R1~R3 quantized into 16 levels, or four bits, by cutting off trailing insignificant bits. We call the resultant MBR a QRMBR (quantized relative representation of MBR). Note that QRMBRs can be slightly bigger than the original MBRs.

The CR-tree is a cache-conscious R-tree that uses QRMBRs as index keys. For the sake of simplicity, the quantization levels are made the same for all nodes. Figure 2 shows the structure of a CR-tree node that can contain up to M entries. It keeps a flag indicating whether it is a leaf or not, the number of stored entries, and the reference MBR that tightly encloses its entire child MBRs. The reference MBR is used to calculate the QRMBRs stored in the node. Internal nodes store entries of the form (QRMBR, ptr), where ptr is the address of a child node and QRMBR is a quantized relative representation of the child node MBR. Leaf nodes store entries of the form (QRMBR, ptr), where ptr refers to an object and QRMBR is a quantized relative representation of the
[Figure 2: CR-tree node structure — (a) a node holds a leaf/nonleaf flag, the number of entries, the reference MBR, and up to M entries; in an internal node, ptr points to a child node, and in a leaf node, ptr points to a data object]
object MBR. In most of our experiments, we quantize each of the x and y coordinates into 256 levels, or one byte.

2.3 Problem Formulation
Our goal is to reduce the multidimensional index search time in main memory databases.

Observation 1. Let c be the node size in the number of cache blocks, and N_node_access be the number of nodes accessed during search. Main memory indexes need to be designed to minimize c · N_node_access.

In main memory, the index search time mainly consists of the key comparison time and the memory access time incurred by cache misses. If a cache miss occurs, the CPU has to wait until the missing data are cached. A cache miss can occur for three reasons: missing data, missing instructions, and missing TLB (translation look-aside buffer) entries. Therefore, we can roughly express our goal as minimizing

T_index_search ≅ T_key_compare + T_data_cache + T_TLB_cache

where T_key_compare is the time spent comparing cached keys, T_data_cache is the time spent caching data, and T_TLB_cache is the time spent caching TLB entries. For simplicity, we omit the time for caching missing instructions because the number of instruction misses mostly depends on the compiler and we can hardly control it.

Let C_key_compare be the key comparison cost per cache block, C_cache_miss be the cost of handling a single cache miss, and C_TLB_miss be the cost of handling a single TLB miss. When the node size is smaller than that of a memory page, each access to a node incurs at most one TLB miss. For simplicity, we assume that nodes have been allocated randomly and that no node and no TLB entry are cached initially. Then,

T_index_search = c · C_key_compare · N_node_access + c · C_cache_miss · N_node_access + C_TLB_miss · N_node_access
              = c · N_node_access · (C_key_compare + C_cache_miss + C_TLB_miss / c)

Since C_cache_miss and C_TLB_miss are constant for a given platform, we can control three parameters: c, C_key_compare, and N_node_access. Among them, we cannot expect to reduce C_key_compare noticeably because the key comparison is generally very simple. In addition, C_TLB_miss and C_cache_miss typically have similar values. Therefore, the index search time mostly depends on c · N_node_access.

Observation 2. The amount of accessed index data can be best reduced by compressing index entries.

The term c · N_node_access can be minimized in three ways: changing the node size such that c · N_node_access becomes minimal, packing more entries into a fixed-size node, and clustering index entries into nodes efficiently. The second is often termed compression and the third clustering [11].

The optimal node size is equal to the cache block size in the one-dimensional case. In one-dimensional trees such as the B+-tree, since exactly one internal node is accessed for each level, the number of visited internal nodes decreases logarithmically with the node size. On the other hand, the number of visited leaf nodes decreases linearly with the node size, and c increases linearly with the node size. Therefore, c · N_node_access increases with the node size, and thus it is minimal when c is one.

In multidimensional indexes, more than one internal node of the same level can be accessed even for the exact match query, and the number of accessed nodes of the same level decreases as the node size increases. Since this decrease is combined with the log-scale decrease of tree height, there is a possibility that the combined decrease rate of node accesses exceeds the linear increase rate of c. We will show analytically in section 5.2 that the optimal node size depends on several factors like the query selectivity and the cardinality.

Compressing index entries is equivalent to increasing the node size without increasing c. In other words, it reduces N_node_access while keeping c fixed. Thus, it is highly desirable. Compression has been addressed frequently in disk-based indexes because it can reduce the tree height, but there is little dedicated work, especially on multidimensional indexes. The following simple analysis shows why compression in disk-resident indexes does not provide as significant a gain as in main memory indexes.

Suppose that tree A can pack f entries on average in a node and tree B can pack 2f entries in a node using a good compression scheme. Then, their expected heights are log_f N and log_2f N, respectively. Thus, the height of B is 1 + 1/log2 f (= log_f N / log_2f N) times smaller than that of A. In disk-based indexes, the typical node size varies from 4KB to 64KB. Assuming that the node size is 8KB and nodes are 70% full, f is 716 (≅ 8192×0.7/8) for a B+-tree index and about 286 (≅ 8192×0.7/20) for a two-dimensional R-tree. Thus, 1/log2 f is typically around 0.1. On the other hand, the node size is small in main memory indexes [4]. With a node occupying two cache blocks, or 128B, f is about 11 for a B+-tree and about 4 for a two-dimensional R-tree. Thus, 1/log2 f is 0.29 for the B+-tree and 0.5 for the R-tree. In summary, node compression can reduce the height of main memory indexes significantly because the size of nodes is small.
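The fanout arithmetic above can be checked with a short sketch. The entry sizes (8 bytes for a B+-tree entry, 20 bytes for a 2D R-tree entry) and the 70% fill factor are the figures quoted in the analysis:

```python
# Worked check of the fanout/height arithmetic: doubling the fanout f
# shrinks the tree height by a factor of 1 + 1/log2(f), so compression
# pays off far more for small (cache-block-sized) nodes than for disk pages.
import math

def fanout(node_bytes, entry_bytes, fill=0.7):
    """Average number of entries packed into a node."""
    return int(node_bytes * fill / entry_bytes)

# Disk-resident nodes (8KB): doubling f barely reduces the height.
f_disk = fanout(8192, 8)                 # B+-tree: 716
height_gain_disk = 1 / math.log2(f_disk)  # ~0.1

# Main-memory nodes (two 64B cache blocks = 128B): doubling f helps a lot.
f_mem_btree = fanout(128, 8)              # 11
f_mem_rtree = fanout(128, 20)             # 4
gain_btree = 1 / math.log2(f_mem_btree)   # ~0.29
gain_rtree = 1 / math.log2(f_mem_rtree)   # 0.5
```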
Clustering has been studied extensively in disk-based index structures. In terms of clustering, the B+-tree is optimal in one-dimensional space, but no optimal clustering scheme is known for the multidimensional case. Instead, many heuristic schemes have been studied in various multidimensional index structures [6][7][12][13][14]. Our work can be used with most of these clustering schemes.

3. MBR Compression
Here are two desirable properties of the MBR compression scheme that we seek.

Overlap Check Without Decompression: A basic R-tree operation is to check whether each MBR in a node overlaps a given query rectangle. Checking the overlap of two MBRs should be done directly with the compressed MBRs stored in the nodes, without decompressing them. This property enables the basic R-tree operation to be processed with a one-time compression of the query rectangle instead of the decompression of all the compressed MBRs in the encountered nodes.

Simplicity: Compression and decompression should be computationally simple and should be performed using only already cached data. Conventional lossless compression algorithms such as the one used in the GNU gzip program are expensive in terms of both computation and memory access because most of them maintain an entropy-based mapping table and look up the table for compression and decompression [15]. Thus, although they may be useful for disk-resident indexes, they are not adequate for main memory indexes.

3.1 RMBR
An obvious compression scheme is to represent keys relatively within a node [16]. If we represent the coordinates of an MBR relative to the lower left corner of its parent MBR, the resultant relative coordinates have many leading 0's. By cutting off these leading 0's and recording the number of bits cut off, we can effectively reduce the size of an MBR.

Definition 1. (Relative Representation of MBR or RMBR) Let P and C be MBRs, each represented by its lower left and upper right coordinates (xl, yl, xh, yh), and let P enclose C. Then, the relative representation of C with respect to P has the coordinates relative to the lower left corner of P:

RMBR_P(C) = (C.xl − P.xl, C.yl − P.yl, C.xh − P.xl, C.yh − P.yl)

However, the following simple analysis shows that the RMBR technique can save only about 32 bits per MBR. For simplicity, we assume that the coordinates of MBRs are uniformly distributed in their domain and that R-tree nodes of the same height have square-like MBRs roughly of the same size [8]. Without loss of generality, we assume that the domain of x coordinates has the unit length and consists of 2^32 different values equally spaced. Let f be the average fanout of leaf nodes, and let N be the total number of data objects. Then, there are roughly N/f leaf nodes, whose MBRs have the area of f/N and the side length of √(f/N) along each axis. Since there are 2^32 different values in the unit interval along each axis, there are 2^32 · √(f/N) different values in an interval with the length of √(f/N). Therefore, we can save 32 − log2(2^32 · √(f/N)) bits, or log2 √(N/f) bits, for each x coordinate value. When N is one million and f is 11, about 8.2 bits are saved. By multiplying by 4, we can save about 32 bits per MBR. Note that the number of saved bits does not depend on the original number of bits as long as the former is smaller than the latter.

We can easily extend this analysis such that the number of bits saved is parameterized further by the dimensionality d. The extended result is log2 (N/f)^{1/d}, or

(log2 N − log2 f) / d.   (1)

The formula (1) increases logarithmically with N, decreases logarithmically with f, but decreases linearly with d. Therefore, the number of saved bits mainly depends on the dimensionality. In one-dimensional space, the relative representation technique can save almost 16 bits for each scalar, but it becomes useless as the dimensionality increases.

3.2 QRMBR
Since we cannot obtain a sufficient compression ratio from the RMBR technique alone, we introduce an additional quantization step. This step cuts off trailing insignificant bits from an RMBR, while the RMBR technique cuts off leading non-discriminating bits from an MBR. After defining QRMBR, we show that quantizing an RMBR does not harm the correctness of index search and that its small quantization overhead is paid off by the significant savings in cache misses.

Definition 2. (Quantized Relative Representation of MBR or QRMBR) Let I be the reference MBR, and let l be the desired quantization level. Then, the corresponding quantized relative representation of an MBR C is defined as

QRMBR_{I,l}(C) = (φ_{I.xl,I.xh,l}(C.xl), φ_{I.yl,I.yh,l}(C.yl), Φ_{I.xl,I.xh,l}(C.xh), Φ_{I.yl,I.yh,l}(C.yh))

where φ_{a,b,l}: R → {0, ..., l−1} and Φ_{a,b,l}: R → {1, ..., l} are

φ_{a,b,l}(r) = 0 if r ≤ a;  l − 1 if r ≥ b;  ⌊l(r − a)/(b − a)⌋ otherwise

Φ_{a,b,l}(r) = 1 if r ≤ a;  l if r ≥ b;  ⌈l(r − a)/(b − a)⌉ otherwise

Computational Cost. Lemma 1 says that the QRMBR satisfies the first of the two desirable properties mentioned at the beginning of this section. Therefore, the computational overhead of the QRMBR technique is the cost of compressing the query rectangle into a QRMBR for each visited node. In our implementation, compressing an MBR into a QRMBR consumes at most about 60 instructions, which corresponds to less than 120 ns on a 400 MHz processor because of pipelining. In addition, it incurs no memory access as long as the query MBR and the MBR of the node under immediate access are cached.

Lemma 1. Let A and B be MBRs. For any MBR I and integer l, it holds that if QRMBR_{I,l}(A) and QRMBR_{I,l}(B) do not overlap, A and B also do not overlap.

Proof. See Appendix A. ■
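Definition 2 and the compressed-domain overlap test can be sketched directly. This is a minimal 2D illustration under our own assumptions (MBRs as `(xl, yl, xh, yh)` tuples, `l = 256`), not the paper's C implementation:

```python
# A sketch of Definition 2 and the overlap test of Lemma 1 in 2D.
import math

def phi(a, b, l, r):
    """Lower-coordinate quantizer: maps r into {0, ..., l-1}, rounding down."""
    if r <= a: return 0
    if r >= b: return l - 1
    return math.floor(l * (r - a) / (b - a))

def Phi(a, b, l, r):
    """Upper-coordinate quantizer: maps r into {1, ..., l}, rounding up."""
    if r <= a: return 1
    if r >= b: return l
    return math.ceil(l * (r - a) / (b - a))

def qrmbr(I, C, l=256):
    """Quantized relative representation of MBR C w.r.t. reference MBR I."""
    return (phi(I[0], I[2], l, C[0]), phi(I[1], I[3], l, C[1]),
            Phi(I[0], I[2], l, C[2]), Phi(I[1], I[3], l, C[3]))

def overlap(p, q):
    """Overlap test, usable on MBRs and QRMBRs alike (no decompression)."""
    return p[0] <= q[2] and q[0] <= p[2] and p[1] <= q[3] and q[1] <= p[3]

# Lemma 1 in action: quantization can only enlarge an MBR, so two QRMBRs
# that do not overlap imply the original MBRs do not overlap either
# (true overlaps are never missed; only extra false hits are possible).
I = (0.0, 0.0, 1.0, 1.0)          # reference MBR of the node
A = (0.10, 0.10, 0.20, 0.20)
B = (0.15, 0.15, 0.30, 0.30)      # overlaps A
assert overlap(A, B) and overlap(qrmbr(I, A), qrmbr(I, B))
```

A search then compresses the query rectangle once per visited node with `qrmbr` and compares it against the stored entries, exactly the one-time-compression property asked for in section 3.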
Correctness. Since it is generally not possible to recover the original coordinates of an MBR from its QRMBR, there is the possibility of incorrectly determining the overlap relationship between two MBRs. Lemma 1 guarantees that there is no possibility of saying two actually overlapping MBRs do not overlap. Thus, the QRMBR technique does not miss an object that satisfies a query.

However, there is the possibility of concluding that two actually non-overlapping MBRs overlap. This means that the result of index search may contain false hits that have to be filtered out through a subsequent refinement step. However, this refinement step is needed for most multidimensional index structures because MBRs are typically approximations of objects [9]. Thus, requiring the refinement step itself is not an overhead, but the number of false hits can be. Section 5.3 shows that the number of false hits can be made negligibly small, typically fewer than one, by choosing the quantization level properly.

4. CR-tree

4.1 Algorithms

4.1.1 Searching
The search algorithm is similar to the one used in other R-tree variants. The only difference is that the CR-tree compares a query rectangle with QRMBRs. Instead of recovering MBRs from QRMBRs, the CR-tree transforms the query rectangle into the corresponding QRMBR using the MBR of each node as the reference MBR. Then, it compares two QRMBRs to determine whether they overlap.

Algorithm Search. Given a CR-tree and a query rectangle Q, find all index records whose QRMBRs overlap Q.
1. Push the root node to the initially empty stack S
2. If S is empty, stop
3. Pop a node N from S and set R to be QRMBR_{N.MBR,l}(Q)
4. If N is not a leaf, check each entry E to determine whether E.QRMBR overlaps R. If so, push E.ptr to S
5. If N is a leaf, check each entry E to determine whether E.QRMBR overlaps R. If so, add E.ptr to the result set
6. Repeat from step 2

4.1.2 Insertion
To insert a new object, the CR-tree traverses down from the root choosing the child node that needs the least enlargement to enclose the object MBR. When visiting an internal node to choose one of its children, the object MBR is first transformed into the QRMBR using the reference MBR. Then, the enlargement is calculated between a pair of QRMBRs. When a leaf node is reached, the node MBR is first adjusted such that it encloses the object MBR. Then, an index entry for the object is created in the node. If the node MBR has been adjusted, the QRMBRs in the node are recalculated because their reference MBR has been changed. If the node overflows, it is split and the split propagates up the tree.

Algorithm Insert. Insert a new object O whose MBR is C into a CR-tree by invoking ChooseLeaf and Install. The algorithms SplitNode and AdjustTree can also be invoked if needed. This algorithm is the same as that of other R-tree variants.

Algorithm ChooseLeaf. Select a leaf node to insert a new MBR C, descending a CR-tree from the root. This algorithm is the same as that of other R-tree variants.

Algorithm Install. Install a pair of an MBR C and an object pointer p in a node N.
1. Enlarge N.MBR such that it encloses C
2. Make an entry of (QRMBR_{N.MBR,l}(C), p) and append it to N
3. If N.MBR has been enlarged, recalculate all the QRMBRs in N by accessing their actual MBRs and invoke AdjustTree passing N

Algorithm SplitNode. The CR-tree can use the split algorithms used in other R-tree variants including the R-tree and the R*-tree [6][7]. In our experiment, the linear-cost split algorithm of the original R-tree was used. After splitting a node into two, the QRMBRs in the nodes are recalculated according to their MBRs.

Algorithm AdjustTree. Ascend from a leaf node L up to the root, adjusting MBRs of nodes and propagating node splits as necessary. When a node MBR has been adjusted, recalculate the QRMBRs in the node.

4.1.3 Deletion
Algorithm Delete. Remove index record E from a CR-tree. The CR-tree can use any of the deletion algorithms used in the R-tree and the R*-tree. However, the CondenseTree algorithm invoked by the Delete algorithm needs a slight modification.

Algorithm CondenseTree. Given a leaf node L from which an entry has been deleted, eliminate the node if it has too few entries and relocate its entries. Propagate node elimination upward as necessary. Adjust all MBRs of the nodes on the path to the root, making them smaller if possible. When a node's MBR has been adjusted, recalculate the QRMBRs in the node. This last step is what is different from other R-tree variants.

4.1.4 Bulk Loading
Bulk loading into a CR-tree is not different from that into other R-tree variants. As long as QRMBRs are correctly maintained, existing bottom-up loading algorithms can be used directly [17][18].

4.2 Variants and Space Comparison
This paper also considers three variants of the CR-tree: the PE (pointer-eliminated) CR-tree, the SE (space-efficient) CR-tree, and the FF (false-hit free) CR-tree.

The PE CR-tree eliminates most pointers to child nodes from internal nodes as in the CSB+-tree. This extension can widen the CR-tree significantly because the key size of the CR-tree is now small, unlike the R-tree. For example, when the size of a QRMBR is four bytes, this pointer elimination doubles the fanout of internal nodes. However, it is just a minor improvement in most cases because pointers to data objects stored in leaf nodes can rarely be eliminated. When the average fanout of both internal and leaf nodes is 10, the number of internal nodes is about a ninth of that of leaf nodes. Therefore, the overall increase of fanout is only about 10%. On the other hand, as in the CSB+-tree, node split becomes expensive. The new node created by a split has to be
[Table 2: Maximum fanout (internal and leaf), node space (internal and leaf), and typical index size for each index structure]
stored consecutively with its siblings, and this often requires allocating a new space and moving the siblings.

The SE CR-tree removes the reference MBR from nodes of the PE CR-tree. This is possible because the reference MBR of a node can be obtained from the matching entry in its parent node. This extension increases the fanout of internal nodes by four and that of leaf nodes by two when the MBR size is 16 bytes and the QRMBR size is 4 bytes. This increase can be larger than the increase obtained in the PE CR-tree when the node size is as small as one or two cache blocks.

While the above two extensions increase the fanout, the third extension to the CR-tree decreases the fanout of leaf nodes. Since the QRMBR technique is a lossy compression scheme, the search result can be a superset of the actual answer for a given query. This can be avoided if we apply the QRMBR technique only to internal nodes and store actual MBRs in leaf nodes. Called the FF CR-tree, this extension is useful when the subsequent refinement step is extremely expensive.

Table 2 shows the space requirements of the various index structures used in this paper, assuming all the nodes are 70% full. We assume that the size of an MBR is 16 bytes, the size of a QRMBR is 4 bytes, and the size of a pointer is 4 bytes. The internal node space is calculated by dividing the leaf space by the average fanout of internal nodes minus one. This analysis shows that the PE CR-tree is not so different from the CR-tree in terms of the space requirement, and the PE R-tree is no different from the R-tree.

5. Analysis
Without loss of generality, we assume the data domain of the unit hyper-square. For simplicity, we assume that data objects are uniformly distributed in the domain and that the query MBRs are hyper-squares. We further assume that the R-tree nodes of the same height have square-like MBRs roughly of the same size, as in other analytical work [8][17].

5.1 Number of Accessed Nodes
Let h denote the height or level of a node assuming that the height of leaf nodes is one. Let M_h denote the number of nodes at the height of h. Then, from the above assumption,

M_h = N / f^h.

Let a_h denote the average area that a node of height h covers. Then, a_h is 1/M_h. Using the Minkowski sum technique [17][19], the probability that a node of height h overlaps a given query rectangle is (s^{1/d} + a_h^{1/d})^d, where s denotes the size of the query rectangle. Then, the number of height-h nodes that overlap the query rectangle is M_h (s^{1/d} + a_h^{1/d})^d, or

(1 + ((N/f^h) · s)^{1/d})^d.

By summing this equation from the leaf to the root, the total number of node accesses in R-trees is

1 + Σ_{h=1}^{log_f N − 1} (1 + ((N/f^h) · s)^{1/d})^d.   (2)

The CR-tree accesses slightly more nodes than the R-tree because the QRMBR is bigger than the original MBR by the quantization error.

Let l denote the quantization level. Then, each node has l^d quantization cells, and the side length of each cell is a_h^{1/d} / l, where h denotes the height of the node. Since whether to visit a child node is determined by comparing the QRMBR of the query rectangle and the stored QRMBR of the child node, the probability of visiting a child node is (s^{1/d} + a_h^{1/d}/l + a_{h−1}^{1/d} + a_h^{1/d}/l)^d. By multiplying by M_h and summing from the leaf to the root, the total number of node accesses in CR-trees is

1 + Σ_{h=1}^{log_f N − 1} (1 + ((N/f^h) · s)^{1/d} + ((N/f^{h+1}) · s)^{1/d} / l)^d.   (3)

Figure 3 compares equations (2) and (3) when the cardinality is one million and the query selectivity is 0.01%. Here, we assumed that the pointer size is 4 bytes and that each node is 70% full. The MBR size is 16 bytes in 2D and increases linearly with dimensions. The QRMBR size is one-fourth of the MBR size. In this figure, the number of node accesses decreases with the node size. The decrease rate is initially large, but it becomes smaller as the node size increases. For all the node sizes and all the three
[Figure 3: Number of Node Accesses in R-trees and CR-trees (N = 1M, s = 0.01%, MBR:QRMBR = 4:1) — (a) R-tree, (b) CR-tree; number of node accesses vs. node size (bytes) for 2D, 3D, and 4D]

[Figure 4: Number of Cache Misses in R-trees and CR-trees (N = 1M, s = 0.01%, MBR:QRMBR = 4:1) — (a) R-tree, (b) CR-tree; number of cache misses vs. node size (bytes) for 2D, 3D, and 4D]
dimensionalities, the CR-tree surpasses the R-tree by more than twice.

5.2 Number of Cache Misses
The number of cache misses can be easily obtained by multiplying equations (2) and (3) by the number of cache misses that one node access incurs. Figure 4 shows the analyzed number of cache misses. It shows that as the node size grows, the number of cache misses quickly approaches the minimum and then increases slowly. In terms of cache misses, the CR-tree outperforms the R-tree significantly, by up to 4.3 times. To obtain this figure, equations (2) and (3) were multiplied by S/64, where S is the node size in bytes and 64 is the L2 cache block size.

Figure 4(a) shows a saw-like pattern: the number of cache misses decreases abruptly at certain node sizes while generally increasing with the node size. Such bumps occur when the height of the tree becomes smaller. For example, the 4D R-tree has a height of 7 when the node size is 448 or 512 bytes, but its height becomes 6 when the node size is 576 bytes. In other words, such bumps occur when the gain by the decrease of height surpasses the overhead associated with the increase of node size.

Although the optimal one-dimensional node size in terms of the number of cache misses is shown to be the cache block size in section 2.3, Figure 4 shows that this choice of node size is not optimal in multidimensional cases, as discussed in section 2.3.

Figure 5 shows the number of cache misses computed while changing the query selectivity. The observation on this figure is that the optimal node size increases with the query selectivity in both the R-tree and the CR-tree. Figure 5(a) shows that the optimal node size increases in the order of 128, 192, 320, 640, and 960 bytes as the selectivity increases. Figure 5(b) shows that the optimal node size increases in the order of 64, 128, 192, 256, and 320 bytes as the selectivity increases. Although we do not visualize it because of the space limitation, the optimal node size increases in the same way as the cardinality and the dimensionality increase.

5.3 Ratio of False Hits By Quantization
Following the same steps as in section 5.1, each quantization cell of a leaf node has the area of f/(l^d · N) and the side length of (f/(l^d · N))^{1/d} along each axis, and the probability that the QRMBRs
[Figure 5: Increase of Optimal Node Size with Selectivity in 2D R-trees and CR-trees — (a) R-tree, (b) CR-tree; number of cache misses vs. node size (bytes) for selectivities of 1.00%, 0.25%, 0.10%, 0.01%, and 0.001%]

[Figure 6: False Hit Ratio by QRMBR Size and Dimensionality (N = 1M, s = 0.01%) — (a) d = 2, (b) QRMBR = 4B; false hit ratio (%) vs. node size (bytes)]
of a query rectangle and an object MBR overlap is (s^{1/d} + a^{1/d} + 2(f/(l^d · N))^{1/d})^d, where a denotes the average area of an object MBR. Dividing by (s^{1/d} + a^{1/d})^d, the ratio of false hits incurred by quantization to actual answers is

(1 + 2(f/(l^d · N))^{1/d} / (s^{1/d} + a^{1/d}))^d − 1.   (4)

Figure 6 shows the ratio when the cardinality is one million and the query selectivity is 0.01%. Here, we assume that the pointer size is 4 bytes and that each node is 70% full. Figure 6(a) shows the false hit ratio in the 2D CR-tree for three different QRMBR sizes: 2 bytes, 4 bytes, and 8 bytes, and Figure 6(b) shows the false hit ratio for three different dimensionalities. The false hit ratio increases with both the node size and the dimensionality. Using QRMBRs of 4 bytes incurs around one false hit in this configuration, but it saves tens or hundreds of cache misses as shown in Figure 4.

We implemented six index structures in 2D: the ordinary R-tree, the PE R-tree, the CR-tree, the PE CR-tree, the SE CR-tree, and the FF CR-tree. We also implemented a bulk-loading algorithm [17]. We changed the size of nodes from 64 bytes to 1024 bytes for the implemented index structures. We used 16-byte MBRs and changed the size of QRMBRs from 2 bytes to 8 bytes. If not specified, the default size of QRMBRs is 4 bytes, and the nodes are 70% full.

We generated two synthetic data sets consisting of one million small rectangles located in the unit square. One is uniformly distributed in the unit square, and the other has the Gaussian distribution around the center point (0.5, 0.5) with a standard deviation of 0.25. We set the average side length of the rectangles to be 0.001.
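The data generation described above can be sketched as follows. This is our own reading of the setup: the rectangle shape and the clamping of Gaussian centers into the unit square are assumptions, since the paper does not spell out how boundary cases are handled.

```python
# Sketch of the synthetic data sets: n small rectangles in the unit square,
# either uniformly distributed or Gaussian around (0.5, 0.5) with a
# standard deviation of 0.25; the average side length is `side`.
import random

def make_rects(n, dist="uniform", side=0.001, seed=42):
    rng = random.Random(seed)
    rects = []
    for _ in range(n):
        if dist == "uniform":
            cx, cy = rng.random(), rng.random()
        else:  # Gaussian around the center of the unit square
            cx = min(max(rng.gauss(0.5, 0.25), 0.0), 1.0)  # clamp (assumption)
            cy = min(max(rng.gauss(0.5, 0.25), 0.0), 1.0)
        rects.append((cx, cy, cx + side, cy + side))       # (xl, yl, xh, yh)
    return rects

data = make_rects(1_000, dist="gaussian")   # 1M in the actual experiment
```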
[Figure 7: Search time (us) vs. node size (bytes) for the ordinary R-tree, PE R-tree, FF CR-tree, SE CR-tree, and CR-tree]

[Figure 8: (a) Insertion time and (b) deletion time (us) vs. node size (bytes) for the ordinary R-tree, PE R-tree, CR-tree, and PE CR-tree]
[Figure: Ratio of false hits (%) vs. node size (bytes) for CR-trees with 2-byte and 4-byte QRMBRs — (a) Selectivity = 0.01%, (b) Selectivity = 0.25%, (c) Selectivity = 1.00%]
6.2 Update Performance
To measure the update performance, we inserted 100,000 objects into indexes bulk-loaded with the one million uniform data set, then removed 100,000 randomly selected objects from the indexes.

Figure 8(a) and (b) show the measured elapsed time per insertion and deletion, respectively. For a given node size, the CR-tree consumes about 15% more time than the R-tree for insertion. However, when the fanout is the same (for example, the CR-tree with the node size of 256 bytes and the R-tree with the node size of 640 bytes), the CR-tree performs similarly to or better than the R-tree. This can be explained in the following way.

When descending a tree for insertion, the child node that needs to be enlarged least is selected. Since the enlargement calculation consumes about 30 instructions in our implementation, it becomes more expensive than the cache miss in the CR-tree and its variants. Since a single cache block contains about 5.6 QRMBRs in the CR-tree, the enlargement calculation cost is about 168 instructions per cache block, but a cache miss consumes about 80~100 processor cycles on the 400MHz UltraSPARC II. On the other hand, since insertion accesses only one node for each height, the number of accessed nodes decreases logarithmically with the fanout, but the number of enlargement calculations for each node increases linearly with the fanout. Thus, the total number of enlargement calculations increases with the fanout.

The PE R-tree performs slightly worse than the R-tree because it increases the fanout by less than 25%. Since the fanout of the CR-tree is about 150% larger than that of the R-tree, it performs worse than the R-tree for a given node size. Since the fanout of the PE CR-tree is about 400% larger than that of the R-tree, it performs significantly worse than the R-tree for a given node size. On the other hand, when the fanout is the same, the ranking of the CR-tree is determined by the saving in cache misses and the overhead of updating QRMBRs when the node MBR grows or shrinks.

Figure 8(b) shows that the rankings for deletion are slightly different from those for insertion. Deletion is a combination of highly selective search and node update. As can be expected from Figure 7, the CR-tree performs similarly to the R-tree as the selectivity decreases. On the other hand, node update becomes more expensive as the node size increases because the cost of updating QRMBRs increases. Therefore, the CR-tree outperforms the R-tree when the node size is small, but they cross over as the node size increases.
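The least-enlargement descent described in Section 6.2 can be sketched as follows. The tuple-based MBR layout and the function names are illustrative assumptions, not the paper's actual implementation:

```python
# Sketch of the least-enlargement child choice used when descending an
# R-tree/CR-tree for insertion. MBRs are (xl, yl, xh, yh) tuples; the
# layout and names are illustrative, not the paper's code.

def area(m):
    return max(0.0, m[2] - m[0]) * max(0.0, m[3] - m[1])

def enlarge(m, r):
    # Smallest MBR containing both m and the new rectangle r.
    return (min(m[0], r[0]), min(m[1], r[1]),
            max(m[2], r[2]), max(m[3], r[3]))

def enlargement(m, r):
    # Area growth of m needed to accommodate r.
    return area(enlarge(m, r)) - area(m)

def choose_child(children, r):
    # children: list of child MBRs. Pick the index needing the least
    # growth, breaking ties by smaller resulting area (a common R-tree
    # heuristic).
    return min(range(len(children)),
               key=lambda i: (enlargement(children[i], r),
                              area(children[i])))
```

Each node visited on the way down performs this calculation once per entry, which is why the per-node enlargement cost grows linearly with the fanout while the number of visited nodes shrinks only logarithmically.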
[Plot residue omitted. Figure 11: (a) Accessed Index Data (cache blocks), (b) Number of Cache Misses, (c) Number of Key Comparisons; x-axis: node size (bytes); series: Ordinary R-tree, PE R-tree, FF CR-tree, SE CR-tree, CR-tree, PE CR-tree.]
6.3 Impact of Quantization Levels
To assess the impact of quantization levels, we measured the ratio of false hits incurred by quantization and the search time for three different quantization levels, 2^4, 2^8, and 2^16. These correspond to QRMBRs of 2 bytes, 4 bytes, and 8 bytes, respectively. In this experiment, we used the trees bulk-loaded with the 1M uniform data set.

Figure 9 shows the ratio of false hits measured varying the quantization level. In section 5.3, we have shown that the ratio of false hits can be estimated by (1 + 2√(f/(l²N))/(√s + a))² − 1. This ratio increases with the fanout or the node size, and decreases with the increasing quantization level and selectivity. Figure 9 is consistent with the analytical result. With the 16-bit quantization, the result of the CR-tree search is almost the same as that of the R-tree search. With the 8-bit quantization, the CR-tree search result contains at most 1% more objects than the R-tree result. With the 4-bit quantization, the ratio of false hits increases steadily with the node size, up to 26% when the node size is 1024 bytes and the selectivity is 0.01%. When the selectivity is high, the graph shows a similar slope with respect to the selectivity, but the ratio of false hits is contained within a few percent. So the 4-bit quantization becomes useful as the selectivity increases.

Figure 10 shows the effect of the quantization on the search time. The time for filtering out the false hits is not counted. The figure shows that the 8-bit quantization performs the best when the selectivity is 0.01%. The 4-bit quantization with 0.01% selectivity performs well when the node size is small but becomes the worst as the node size grows. However, the 4-bit quantization performs the best regardless of the node size when the selectivity is high. This is because the number of false hits becomes relatively insignificant as the node size grows.

6.4 Breakdown of Search Performance
To better understand the search performance of the indexes used in our experiment, we measured the amount of accessed index data, the number of L2 cache misses, and the number of key comparisons for the experiment reported in Figure 7.

Figure 11(a) shows the amount of accessed index data, which is the number of L2 cache misses when no index data is cached initially, or the worst-case cache misses. In terms of the worst-case cache misses, the six trees are clearly ranked by their fanout, or in the order of the SE CR-tree, the PE CR-tree, the CR-tree, the FF CR-tree, the PE R-tree, and the R-tree, from the best to the worst. The first three form one group, and the last two form another group as in Figure 7. This result coincides with Figure 4.

Figure 11(b) shows the measured number of L2 cache misses using the Perfmon tool [20]. The UltraSPARC processors provide two registers for measuring processor events. We used the Perfmon tool to make these registers count L2 cache misses and to read the values stored in them. The number of L2 cache misses is slightly different from the amount of accessed index data because of cache hits and instruction misses. Instruction cache misses explain why the number of measured cache misses can be larger than that of the worst-case cache misses in Figure 11(a) when both the node size and the selectivity are small.

Another observation on Figure 11(b) is that the cache hit ratio increases with the node size. This has to do with the typical cache replacement policy based on the circular mapping of memory blocks to cache blocks. Namely, the memory block with the address A is cached into the cache block whose address is determined by A modulo the cache size. With this policy, a node consuming multiple memory blocks is placed consecutively in the cache. As the node size increases, the probability that concurrently needed memory blocks are mapped to conflicting locations of the cache decreases.

Figure 11(c) shows that the QRMBR technique increases the number of key comparisons slightly. Since the overlap test between two MBRs consumes less than 10 instructions on average in our implementation, saving an L2 cache miss is worth saving at least 10 overlap tests. The R-tree and the PE R-tree have similar fanouts and form one group. The PE CR-tree and the SE CR-tree also have similar fanouts and form another group.
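The Section 5.3 false-hit estimate can be evaluated numerically. The helper below assumes the reading (1 + 2√(f/(l²N))/(√s + a))² − 1, with f the fanout, l the quantization level, N the data size, s the selectivity, and a the average side length; the example fanout of 100 for a 1024-byte CR-tree node is an assumption:

```python
import math

# Hedged sketch: evaluates the Section 5.3 false-hit estimate as read here,
# (1 + 2*sqrt(f/(l^2*N)) / (sqrt(s) + a))^2 - 1.
# f: fanout, l: quantization level, N: data size, s: selectivity,
# a: average side length of data rectangles.

def false_hit_ratio(f, l, N, s, a):
    extra = 2.0 * math.sqrt(f / (l * l * N)) / (math.sqrt(s) + a)
    return (1.0 + extra) ** 2 - 1.0
```

Consistent with the text, the estimate grows with the fanout f and shrinks as the quantization level l or the selectivity s increases; for instance, with the assumed f = 100, l = 16 (4-bit), N = 10⁶, s = 0.0001, and a = 0.001, it comes to roughly 24%, in line with the 26% reported for 4-bit quantization at the 1024-byte node size.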
7. Conclusion
There has been much research on multidimensional indexes. This paper addressed the problem of optimizing the cache behavior of multidimensional indexes for use in the main memory database environment. To pack more entries in the node whose size is given in multiples of cache blocks, we have proposed an efficient MBR compression scheme called the quantized relative representation of MBR, or QRMBR, which represents the coordinates of child nodes relative to the MBR of the parent node and quantizes the resultant relative MBR using a fixed number of bits. The CR-tree based on QRMBR effectively increases the fanout of the R-tree and decreases the index size for the improved cache behavior.

Our extensive experimental study, combined with an analytical one, shows that the 2D CR-tree and its three variants outperform the ordinary R-tree by up to 2.5 times in the search time and use about 60% less memory space. To see the practical impact of the CR-tree, we are currently integrating the CR-tree into P*TIME, a prototype transact-in-memory engine under development.

8. References
[1] P. Boncz, S. Manegold, and M. Kersten, "Database Architecture Optimized for the New Bottleneck: Memory Access", Proceedings of VLDB Conference, 1999, pp. 54-65.
[2] A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood, "DBMSs on a Modern Processor: Where Does Time Go?", Proceedings of VLDB Conference, 1999, pp. 266-277.
[3] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 2nd Edition, Morgan Kaufmann, 1996.
[4] J. Rao and K. A. Ross, "Cache Conscious Indexing for Decision-Support in Main Memory", Proceedings of VLDB Conference, 1999, pp. 78-89.
[5] J. Rao and K. A. Ross, "Making B+-trees Cache Conscious in Main Memory", Proceedings of ACM SIGMOD Conference, 2000, pp. 475-486.
[6] A. Guttman, "R-trees: A Dynamic Index Structure for Spatial Searching", Proceedings of ACM SIGMOD Conference, 1984, pp. 47-57.
[7] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, "The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles", Proceedings of ACM SIGMOD Conference, 1990, pp. 322-331.
[8] C. Faloutsos and I. Kamel, "Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension", Proceedings of ACM PODS Symposium, 1994, pp. 4-13.
[9] T. Brinkhoff, H.-P. Kriegel, R. Schneider, and B. Seeger, "Multi-Step Processing of Spatial Joins", Proceedings of ACM SIGMOD Conference, 1994, pp. 197-208.
[10] Sun Microsystems, UltraSPARC User's Manual, 1997.
[11] J. M. Hellerstein, "Indexing Research: Forest or Trees?", Proceedings of ACM SIGMOD Conference, 2000, p. 574, Panel.
[12] T. Sellis, N. Roussopoulos, and C. Faloutsos, "The R+-tree: A Dynamic Index for Multi-dimensional Objects", Proceedings of VLDB Conference, 1987, pp. 507-518.
[13] I. Kamel and C. Faloutsos, "Hilbert R-tree: An Improved R-tree Using Fractals", Proceedings of VLDB Conference, 1994, pp. 500-509.
[14] V. Gaede and O. Günther, "Multidimensional Access Methods", ACM Computing Surveys, 30(2), 1998, pp. 170-231.
[15] A. Bookstein, S. T. Klein, and T. Raita, "Is Huffman Coding Dead?", Proceedings of ACM SIGIR Conference, 1993, pp. 80-87.
[16] J. Goldstein, R. Ramakrishnan, and U. Shaft, "Compressing Relations and Indexes", Proceedings of IEEE Conference on Data Engineering, 1998, pp. 370-379.
[17] I. Kamel and C. Faloutsos, "On Packing R-trees", Proceedings of ACM CIKM Conference, 1993, pp. 490-499.
[18] S. T. Leutenegger, J. M. Edgington, and M. A. Lopez, "STR: A Simple and Efficient Algorithm for R-Tree Packing", Proceedings of IEEE Conference on Data Engineering, 1997, pp. 497-506.
[19] S. Berchtold, C. Böhm, and H.-P. Kriegel, "The Pyramid-Tree: Breaking the Curse of Dimensionality", Proceedings of ACM SIGMOD Conference, 1998, pp. 142-153.
[20] R. Enbody, Perfmon Performance Monitoring Tool, 1999, available from http://www.cps.msu.edu/~enbody/perfmon.html.

Appendix A. Proof of Lemma 1.
We prove the contrapositive: if A and B overlap, then QRMBR_{I,l}(A) and QRMBR_{I,l}(B) overlap. By definition, two rectangles overlap if and only if they share at least one point. Thus, A and B share at least one point. Let (x, y) denote this point. Then, the following holds.

A.xl ≤ x ≤ A.xh, A.yl ≤ y ≤ A.yh
B.xl ≤ x ≤ B.xh, B.yl ≤ y ≤ B.yh

For simplicity, we omit the subscripts a, b, and l from the quantization functions φ and Φ. Since φ and Φ are monotonically non-decreasing functions and φ(r) ≤ Φ(r) for any r ∈ R,

φ(A.xl) ≤ φ(x) ≤ Φ(x) ≤ Φ(A.xh),
φ(A.yl) ≤ φ(y) ≤ Φ(y) ≤ Φ(A.yh),
φ(B.xl) ≤ φ(x) ≤ Φ(x) ≤ Φ(B.xh),
φ(B.yl) ≤ φ(y) ≤ Φ(y) ≤ Φ(B.yh).

Therefore, QRMBR_{I,l}(A) and QRMBR_{I,l}(B) share at least the point (φ(x), φ(y)). Thus, they overlap, and this completes the proof.■
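The conservative rounding behind Lemma 1 — a lower quantizer φ that rounds low coordinates down and an upper quantizer Φ that rounds high coordinates up — can be checked numerically. The floor/ceil quantizers below are an assumed concretization of the paper's quantization functions, not their exact definitions:

```python
import math

# Hedged sketch of Lemma 1: quantizing low coordinates with a floor-like
# function and high coordinates with a ceil-like function can only enlarge
# a rectangle relative to the reference MBR I, so overlap is preserved.

def phi(v, lo, hi, l):
    # Lower quantizer: round down to one of l levels over [lo, hi].
    return max(0, min(l - 1, math.floor((v - lo) / (hi - lo) * l)))

def Phi(v, lo, hi, l):
    # Upper quantizer: round up to one of l levels over [lo, hi].
    return max(1, min(l, math.ceil((v - lo) / (hi - lo) * l)))

def qrmbr(r, I, l):
    # r, I: (xl, yl, xh, yh); I is the reference MBR, l the quantization level.
    xl, yl, xh, yh = r
    Ixl, Iyl, Ixh, Iyh = I
    return (phi(xl, Ixl, Ixh, l), phi(yl, Iyl, Iyh, l),
            Phi(xh, Ixl, Ixh, l), Phi(yh, Iyl, Iyh, l))

def overlap(a, b):
    # Two rectangles overlap iff they share at least one point.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```

Because φ(r) ≤ Φ(r) for every r and both functions are non-decreasing, any point shared by A and B maps into both quantized rectangles, which is exactly the contrapositive argument of the proof above.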