Chapter 4
C++ MAPS AND HASH TABLES
Contents
I. Lesson 4.1 – Maps
• Entries and Composition Pattern
• Map ADT
• A C++ Map Interface
• STL Map Class
: key(k), value(v) { }
K key; // key
V value; // value
};
Map ADT
• Such an object would normally be called a position. In order to be
more consistent with the C++ Standard Template Library, we define a
somewhat more general object called an iterator, which can both
reference entries and navigate around the map. Given a map iterator
p, the associated entry may be accessed by dereferencing the
iterator, namely as *p. The individual key and value can be accessed
using p->key() and p->value(), respectively.
• In order to advance an iterator from its current position to the next,
we overload the increment operator. Thus, ++p advances the iterator
p to the next entry of the map. We can enumerate all the entries of a
map M by initializing p to M.begin() and then repeatedly
incrementing p as long as it is not equal to M.end().
• In order to indicate that an object is not present in the map, we
assume that there exists a special sentinel iterator called end. By
convention, this sentinel refers to an imaginary element that lies just
beyond the last element of the map.
The map ADT consists of the following:
• size(): Return the number of entries in M.
• put(k,v): If M does not have an entry with key equal to k, then add entry (k,v) to M, and otherwise, replace
the value field of this entry with v; return an iterator to the inserted/modified entry.
• erase(k): Remove from M the entry with key equal to k; an error condition occurs if M has no such entry.
• erase(p): Remove from M the entry referenced by iterator p; an error condition occurs if p points to the end
sentinel.
p = myMap.find("Joe"); // *p = ("Joe",50)
p = myMap.find("Joe");
cout << "(" << p->first << "," << p->second << ")\n";
}
Lesson 4.2 – Hash Tables
• The keys associated with values in a map are typically thought of as
“addresses” for those values. Examples of such applications include a
compiler’s symbol table and a registry of environment variables. Both
of these structures consist of a collection of symbolic names where
each name serves as the “address” for properties about a variable’s
type and value. One of the most efficient ways to implement a map in
such circumstances is to use a hash table.
• In general, a hash table consists of two major components, a bucket
array and a hash function.
Bucket Arrays
• A bucket array for a hash table is an array A of size N, where each cell
of A is thought of as a “bucket” (that is, a collection of key-value pairs)
and the integer N defines the capacity of the array. If the keys are
integers well distributed in the range [0,N − 1], this bucket array is all
that is needed. An entry e with key k is simply inserted into the bucket
A[k]. (See Figure 4.2.1.)
Hash Functions
• The second part of a hash table structure is a function, h, called a
hash function, that maps each key k in our map to an integer in the
range [0,N − 1], where N is the capacity of the bucket array for this
table. Equipped with such a hash function, h, we can apply the bucket
array method to arbitrary keys. The main idea of this approach is to
use the hash function value, h(k), as an index into our bucket array, A,
instead of the key k (which is most likely inappropriate for use as a
bucket array index). That is, we store the entry (k,v) in the bucket
A[h(k)].
Figure 4.2.2 The two parts of a hash function:
hash code and compression function.
Hash Codes
• The first action that a hash function performs is to take an arbitrary
key k in our map and assign it an integer value. The integer assigned
to a key k is called the hash code for k. This integer value need not be
in the range [0,N − 1], and may even be negative, but we want the set
of hash codes assigned to our keys to avoid collisions as much as
possible. If the hash codes of our keys cause collisions, then there is
no hope for our compression function to avoid them.
• In addition, to be consistent with all of our keys, the hash code we use
for a key k should be the same as the hash code for any key that is
equal to k.
Hash Codes in C++
• The hash codes described below are based on the assumption that
the number of bits of each type is known. This information is provided
in the standard include file <limits>, which defines a templated class
numeric_limits.
• Given a base type T (such as char, int, or float), the number of digits
that T can represent is given by numeric_limits<T>::digits: for integer
types, this is the number of bits excluding any sign bit, and for
floating-point types, the number of bits in the mantissa. Let us
consider several common data types and some example functions for
assigning hash codes to objects of these types.
Converting to an Integer
• On many machines, the type long has a bit representation that is
twice as long as type int. One possible hash code for a long object is
to simply cast it down to an integer and then apply the integer hash
code. The problem is that such a hash code ignores half of the
information present in the original value. If many of the keys in our
map differ only in the discarded high-order bits, they will collide under
this simple hash code. A better hash code, which takes all the original bits into
consideration, sums an integer representation of the high-order bits
with an integer representation of the low-order bits.
Cyclic Shift Hash Codes