2005 Frias
Abstract
Associative containers are basic generic classes of the C++ standard library. Their elements can be accessed by key or by iterator, but not by rank. This paper presents a new implementation of the map class, which extends the Standard with the ability to support efficient direct access by rank without using extra space. This is achieved using LBSTs. The paper reports on the algorithm engineering of this implementation, as well as on experimental results that show its competitive performance compared to the widespread GCC library map implementation.

1 Introduction.
The use of standard libraries has become a practice of increasing importance in easing and speeding up software development. The C++ programming language defines a standard library [5] whose algorithmic core constitutes the Standard Template Library (STL) [6].

The STL defines three different types of objects that cooperate with one another: containers (to store collections of elements), iterators (to access and traverse the elements in a container) and algorithms (to process the elements in a container). The C++ standard library defines both the functionality and the performance (using Big-O notation) of these objects.

Associative data structures (including sets, maps, multisets and multimaps) are a particularly important kind of container in the STL. The map class is a generic container that corresponds to the idea of a dictionary. Specifically, it is defined as a unique ordered associative container, that is, a container with efficient search, insertion and deletion of elements (pairs of key and value) based on unique keys that are also used to define an order among the container elements. This order is kept internally to allow the user to navigate the structure efficiently through iterators. On a map with n keys, search, insertion and deletion operations must be performed in O(log n) time; iterator traversal (in either increasing or decreasing order) must take amortized O(1) time. For more details see [6, section 6.6].

The use of maps is illustrated in the following example, which prints, in lexicographic order, how many times each word appears in the input:

    string word;
    map<string,int> M;
    while (cin >> word) ++M[word];
    for (map<string,int>::iterator it=M.begin();
         it!=M.end(); ++it)
      cout << it->first << " " << it->second << endl;

According to the Standard, the map elements can be accessed directly by key (as in the instruction ++M[word] in the example above) or through sequential bidirectional iterators that traverse the map in order (as in the for loop above). However, direct access by rank in the ordered sequence is not considered. This means that, for instance, getting the median element in a map of size n takes Θ(n) time.

This paper presents a new implementation of the map class. While retaining compatibility with the C++ standard library, this new implementation offers efficient random access to the map elements. This functionality is achieved through the inclusion of random access iterators for maps, which allow the user to access the i-th element in basically the same way as the i-th element of an array or a vector. For instance, using random access iterators one can get the median element in M with the standard brackets operator, in the following way:

    string median = M.begin()[M.size()/2].first;

Besides, it is possible to compute the difference between two random access iterators p and q with the usual subtraction: p - q. This could be useful, for instance, to calculate how many words lie between two specific ones in the map M above. Additionally, an integer k can be added to an iterator (p+=k) to get the same result as repeating ++p k times. Finally, it is possible to compare two iterators (p<q, p<=q, ...). Note that all these operations use the elements' positions in the sequence, rather than their keys, to obtain the result.

∗ This work has been supported by the GRAMMARS project under grant CYCIT TIN2004-07925 and partially by the AEDRI-II project under grant MCYT TIC2002-00190.
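As a point of reference, the following minimal sketch shows what rank-based access looks like with the standard std::map, whose iterators are only bidirectional. Reaching the element of rank i requires stepping an iterator i times, which is where the Θ(n) cost of the median comes from. The helper names nth_by_rank and rank_of are hypothetical; they belong neither to the Standard nor to the implementation presented in this paper.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Hypothetical helper: iterator to the element of rank i (0-based, in key
// order). For bidirectional iterators std::next performs i increments.
template <typename Map>
typename Map::const_iterator nth_by_rank(const Map& m, std::size_t i) {
    assert(i < m.size());
    return std::next(m.begin(), static_cast<std::ptrdiff_t>(i));
}

// Hypothetical helper: rank of the element an iterator points to.
// std::distance is also linear for bidirectional iterators.
template <typename Map>
std::size_t rank_of(const Map& m, typename Map::const_iterator it) {
    return static_cast<std::size_t>(std::distance(m.begin(), it));
}
```

With random access iterators, as proposed here, the same two queries would be written M.begin()[i] and p - M.begin().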
§ Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya.

In order to offer all these possibilities, we have implemented the map class using Logarithmic Binary Search Trees (LBSTs) [10]. This variant of balanced binary search trees adds to existing implementations the ability to support direct access by rank in logarithmic time without using extra space. Moreover, its balancing criterion is fast to evaluate, and hence it is expected to be efficient in practice.

In this paper we report on the algorithmic and implementation issues of this new variant of maps, and show experimentally its competitive performance compared to the GCC implementation, which uses red-black trees as its internal data structure. Besides, in contrast to other implementations, care has been taken to use the most appropriate algorithm for the input characteristics, as required by the Standard. Altogether, equivalent or better performance has been obtained in the main operations, such as search, insertion and deletion by key, thus showing that more functionality can be offered without relevant drawbacks.

The remainder of this paper is organized as follows. First, in Section 2, related work is outlined. Then, in Section 3, the characteristics of the LBST implementation are presented. In Section 4, the design and results of the experiments are shown. Finally, Section 5 sets out the conclusions of this work. The appendix presents the details of the feasibility of the new method presented in Section 3 for one-way insertion and deletion operations.

2 Related work.
Due to the map cost requirements fixed by the C++ Standard, balanced binary search trees have typically been used to implement maps. Note that, because elements must be kept in sorted order, hashing is not an option. Also, because the Standard fixes the cost requirements in the worst case, randomized data structures like skip lists cannot be used either.

There are many different STL map implementations. However, none offers direct access by rank. Furthermore, many of them derive from the Hewlett-Packard implementation [11], which uses red-black trees: GCC^1, STLport^2, SGI^3 and Microsoft^4.

This paper compares the performance of the LBST implementation against that of the GCC STL v.3 implementation [3]. The main reason for this selection is that it is a widespread, accessible and free implementation. An important characteristic of the GCC implementation is its use of generic operations for the four ordered associative containers included in the Standard (map, set, multimap and multiset), i.e., the same algorithm is used for each container when possible, even if this could cause unnecessary overhead. Another characteristic of the GCC implementation is that almost all algorithms are coded iteratively. Finally, it is not specialized for operations with sorted ranges as input and so, for these operations, it violates the cost requirements of the Standard.

Other approaches not using red-black trees have been proposed. For example, CPH STL [2] offers an implementation using B+ trees that is reported to be faster than SGI's for search and traversal but equal or worse for insertion and deletion [4]. Another implementation uses AVL trees; in this case, the results obtained are equal to or slightly worse than SGI's [7].

In any case, if these implementations were extended to support logarithmic access by rank, they would require both extra space (to store subtree sizes) and extra time in modifying operations (to keep the size data updated). An extended red-black tree of this kind is analysed in [1, section 14.1].

1 http://gcc.gnu.org
2 http://www.stlport.org
3 http://www.sgi.com/tech/stl/
4 http://www.microsoft.com/downloads/details.aspx?FamilyId=272BE09D-40BB-49FD-9CB0-4BFA122FA91B&displaylang=en

3 Implementation characteristics.
The final version of the map class is the result of successive refinements guided by experimental results. Specifically, three complete versions were developed. Although all of them are based on LBSTs, some changes to the structure were introduced successively. This section defines LBSTs and presents the evolution and characteristics of the three implementations.

Logarithmic binary search trees. LBSTs [10] are binary search trees that are balanced according to subtree sizes. Implementing the map class with LBSTs adds to existing implementations the ability to support direct access by rank (by means of random access iterators) in logarithmic time without using extra space. Furthermore, their balancing criterion is easier and faster to evaluate than that of other variants of BSTs that are also balanced according to subtree sizes, such as Weighted BSTs [9].

Before we go on, let us recall the definition of LBSTs. Let ℓ(n) be the number of bits required to encode n, that is, ℓ(0) = 0, and ℓ(n) = 1 + ⌊log2 n⌋ for any n ≥ 1. Given a BST T, denote by |T| its number of nodes, and let ℓ(T) denote ℓ(|T|). Then, T is an LBST if and only if (1) T is an empty tree, or (2) T is a non-empty tree with subtrees L and R, such that L and R are LBSTs and |ℓ(L) − ℓ(R)| ≤ 1.

First version. The first version was a straightforward recursive implementation of the LBST algorithms. A basic LBST node contains (apart from a key and its related value) its subtree size and two pointers to its children. As in the GCC implementation, each node was enlarged to store a pointer to its parent, so as to support navigating the structure starting at any point. Empty trees were represented by a null pointer. Finally, a special header node was introduced to mark the end of the sorted sequence (according to keys) of the map elements.

Unlike the GCC implementation, this first version already complied with the costs required by the Standard for some specialized operations with sorted ranges as input (mainly constructors and multiple element insertions). For instance, given n elements in sorted order, the GCC implementation builds a map in Θ(n log n) time, while our implementation does so in Θ(n) time, as the C++ Standard requires. This was achieved by implementing specific algorithms for these cases.

Besides, our algorithms may be more convenient than the ones in GCC for the "insertion with hint" operation. This standard operation is intended to reduce insertion time by starting the search at the hint, instead of at the tree root. Our implementation tries to minimize key comparisons by assuming that the hint is close to the target key, whereas the GCC implementation ignores the hint unless the new element should go just before it. The experimental results showed that this approach reduces the cost of insertions by hint when the hint is good and comparisons are expensive. In any case, the cost of our algorithm is never worse than O(log n).

Unfortunately, LBSTs exhibited a limitation: deleting a node through an iterator has logarithmic cost, because every node from the deleted one up to the root must be checked for balance, and its size field must be updated. Note that the Standard requires amortized constant cost for this operation. However, this cost requirement is not critical, and was probably established arbitrarily assuming a red-black tree implementation. Observe that the cost of deleting by iterator (which is perhaps not a very common operation) is usually offset by the cost of previous insertions. Moreover, our implementation does offer an efficient range erase.

Second version. Experimental comparison of our first version against the GCC implementation showed that, on single element operations, GCC was faster. The analysis of the GCC code suggested that recursion could be the cause of this weakness: it is well known that recursive implementations usually perform worse than iterative ones.

Therefore, a second, iterative version was implemented. Besides, a unique dummy node for all empty trees was introduced. This dummy node reduced the number of possible cases and simplified the resulting code, thus making it both faster and easier to debug.

Third version. The experimental analysis of the second version showed that all operations achieved similar or better performance than GCC's, except for single element insertion, which was still around 30% slower. In order to understand the reason for this difference, observe that single element insertion and deletion have three main phases: (1) top-down search, (2) local insertion or deletion of the input key, and (3) bottom-up update of subtree sizes and rebalancing through single or double rotations. The main difference between red-black trees and LBSTs is in the last phase, which can be inferred from the code and was confirmed by profiling and experimental data. Specifically, while the final update in red-black trees takes an amortized constant number of steps, in LBSTs it requires logarithmic time, because all levels, from the update point up to the root, must be visited.

As the extra visited nodes are in fact the same as in the top-down search phase, but in the opposite order, the feasibility of suppressing the bottom-up traversal was considered. This involved new insertion and deletion algorithms that update the tree in just one (top-down) phase. These algorithms must update the tree and preserve the balancing criterion without knowing whether the insertion or deletion will finally take place. The details of these new, sophisticated algorithms are in the Appendix.

The implementation of these algorithms, together with other minor changes, made up the third and final version. First, the LBST structure was required to be more flexible. The new LBST structure was based on an idea of [8] that consists in changing the meaning of the node's size field: rather than corresponding to its subtree size, it corresponds to the size of only one of the two children (which one is determined by the sign). Additionally, the total tree size is stored in the header node.

As a consequence of this change, the size of each subtree must be calculated at every step down the tree. However, this can be done in constant time. When starting at the root (the common case), we know the total size of the tree and the size of one of its children. Therefore, the size of each child is either known directly or trivially calculated. This process can be repeated at each step down the tree. On the other hand, for iterator operations that start anywhere in the tree, the size of the current subtree can be obtained by visiting a constant number of tree levels on average.

Finally, changing the meaning of the size field produced another benefit: a substantial reduction, in search schemes, of the number of visited nodes. In the third version, the only visited nodes are those whose key must be compared by the operator <. By contrast, in the first and second versions, some nodes were visited just to look up their size fields. Altogether, these modifications decreased memory accesses, improved their locality and reduced the total time of operations.
4 Performance evaluation.
This section presents the experiments that were con- [...]

All the experiments were performed on different [...] memory.

By default, the results assume integer keys. Results for other key types will only be mentioned when they [...]

[...] (range containing all elements whose key is the given [...] calling both lower_bound and upper_bound, so our im- [...]

[Figure residue; plots not recoverable. Plot title: FIND for EXISTING data and OP size 2. Legend entry: gcc3.x map. Axes: map size (in 10^4) or operation size (in 10^4) versus time (in sec).]

[Panel captions: (b) Erase of not existing integer keys, fixed operation size of 32 ∗ 10^4. (a) Constructor from a sorted range. (b) Insertion from a sorted range and fixed map size of 8 ∗ 10^4. (a) 32-position decrement with fixed operation size of 8 ∗ 10^4.]
(a) Case 1: tree after a single right rotation
(b) Case 2: tree before and after the double rotation, supposing insertion in A2L or in A2R.
Figure 10: (Hypothetical) unbalancing insertion in right branch of left child (A2). Case 3: tree before and after
the double rotation. Supposing insertion in A2L
Figure 11: (Hypothetical) unbalancing insertion in right branch of left child (A2). Case 3: tree before and after
the double rotation. Supposing insertion in A2R.
Figure 12: (Hypothetical) unbalancing insertion in right branch of left child (A2). Case 3*: tree before and after
the triple rotation for * insertion case in A2R
(a) Cases 1,2,3: tree before and after a left rotation
(b) Case 4: tree before and after a double rotation.
Figure 13: Unbalancing erase in left child (A).
Figure 14: Unbalancing erase in left child (A). Case 5: tree before and after the double rotation.
(a) Initial impossible configuration, analogous to insertion's 3.*
(b) Case where b increment is not enough. The square encloses the unbalanced subtree that is the starting point of the next erase step.
Figure 15: Subtree b analysis for case 5* of unbalancing erase in left child (A).
A.2 Erase by key. There are 5 basic possible configurations, shown in figure 16. They are the following:

1. ℓ(C1) = λ, ℓ(C2) = λ + 1
2. ℓ(C1) = λ, ℓ(C2) = λ
3. ℓ(C1) = λ − 1, ℓ(C2) = λ
4. ℓ(C1) = λ + 1, ℓ(C2) = λ
5. ℓ(C1) = λ, ℓ(C2) = λ − 1

[...] configuration is analogous to the initial configuration of an unbalancing insertion in left child, except for transition direction, and so share some similarities and differences. So, next, b possible configurations, different from those of insertion, are discussed.