
Chapter 4

Chapter 4 covers Maps and Hash Tables in C++. It explains the Map Abstract Data Type (ADT), including the structure of key-value pairs, iterators, and the C++ Standard Template Library (STL) map class. Additionally, it discusses hash tables, their components like bucket arrays and hash functions, and provides a C++ implementation of a hash map using separate chaining.

Uploaded by

igcasan.jc07
Copyright
© © All Rights Reserved

Chapter 4

C++ MAPS AND HASH TABLES
Contents
I. Lesson 4.1 – Maps
• Entries and Composition Pattern
• Map ADT
• A C++ Map Interface
• STL Map Class

II. Lesson 4.2 – Hash Tables


• Bucket Arrays
• Hash Functions
• Hash Codes
• A C++ Hash Table Implementation
Lesson 4.1 - Maps
• Figure 4.1.1 A conceptual illustration of the map ADT. Keys (labels) are
assigned to values (folders) by a user. The resulting entries (labeled
folders) are inserted into the map (file cabinet). The keys can be used
later to retrieve or remove values.
Map
• A map allows us to store elements so they can be located quickly
using keys. The motivation for such searches is that each element
typically stores additional useful information besides its search key,
but the only way to get at that information is to use the search key.
• Specifically, a map stores key-value pairs (k,v), which we call entries,
where k is the key and v is its corresponding value. In addition, the
map ADT requires that each key be unique, so the association of keys
to values defines a mapping.
Entries and Composition Pattern
• An entry is actually an example of a more general object-oriented
design pattern, the composition pattern, which defines a single
object that is composed of other objects. A pair is the simplest
composition, because it combines two objects into a single pair
object.
In the Code Fragment below, we present such an implementation
storing a single key-value pair. We define a class Entry, which is
templated based on the key and value types.
template <typename K, typename V>
class Entry { // a (key, value) pair
public: // public functions
    Entry(const K& k = K(), const V& v = V()) // constructor
        : _key(k), _value(v) { }
    const K& key() const { return _key; } // get key
    const V& value() const { return _value; } // get value
    void setKey(const K& k) { _key = k; } // set key
    void setValue(const V& v) { _value = v; } // set value
private: // private data (named _key/_value so they do not clash with key()/value())
    K _key; // key
    V _value; // value
};
Map ADT
• A map needs an object that refers to an entry stored in it. Such an object would normally be called a position. In order to be
more consistent with the C++ Standard Template Library, we define a
somewhat more general object called an iterator, which can both
reference entries and navigate around the map. Given a map iterator
p, the associated entry may be accessed by dereferencing the
iterator, namely as *p. The individual key and value can be accessed
using p->key() and p->value(), respectively.
• In order to advance an iterator from its current position to the next,
we overload the increment operator. Thus, ++p advances the iterator
p to the next entry of the map. We can enumerate all the entries of a
map M by initializing p to M.begin() and then repeatedly
incrementing p as long as it is not equal to M.end().
• In order to indicate that an object is not present in the map, we
assume that there exists a special sentinel iterator called end. By
convention, this sentinel refers to an imaginary element that lies just
beyond the last element of the map.
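The enumeration and the end sentinel described above can be sketched with the STL map standing in for M. This is only an illustration: the helper names collectKeys and contains are hypothetical, and STL entries expose their key and value as first and second rather than through key() and value().

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Collect every key of M by walking from begin() up to the end() sentinel.
std::vector<int> collectKeys(const std::map<int, std::string>& M) {
    std::vector<int> keys;
    for (std::map<int, std::string>::const_iterator p = M.begin(); p != M.end(); ++p) {
        keys.push_back(p->first);          // p->first is the key of the current entry
    }
    assert(keys.size() == M.size());       // every entry was visited exactly once
    return keys;
}

// A key is absent exactly when find returns the end sentinel.
bool contains(const std::map<int, std::string>& M, int k) {
    return M.find(k) != M.end();
}
```

Note that end() is never dereferenced; it is only compared against.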
The map ADT consists of the following:
• size(): Return the number of entries in M.
• empty(): Return true if M is empty and false otherwise.
• find(k): If M contains an entry e = (k,v), with key equal to k, then return an iterator p referring to this entry, and otherwise return the special iterator end.
• put(k,v): If M does not have an entry with key equal to k, then add entry (k,v) to M, and otherwise, replace the value field of this entry with v; return an iterator to the inserted/modified entry.
• erase(k): Remove from M the entry with key equal to k; an error condition occurs if M has no such entry.
• erase(p): Remove from M the entry referenced by iterator p; an error condition occurs if p points to the end sentinel.
• begin(): Return an iterator to the first entry of M.
• end(): Return an iterator to a position just beyond the end of M.
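As a rough sketch of how these operations fit together, the class below stores entries in an unsorted STL list. The name ListMap is hypothetical, entries are std::pair objects rather than the Entry class, and signalling the erase(k) error with std::out_of_range is an assumption made for this illustration only.

```cpp
#include <cassert>
#include <list>
#include <stdexcept>
#include <utility>

// Hypothetical ListMap: an unsorted list of (key, value) pairs offering the ADT operations.
template <typename K, typename V>
class ListMap {
public:
    typedef std::pair<K, V> Entry;
    typedef typename std::list<Entry>::iterator Iterator;

    int size() const { return static_cast<int>(entries.size()); }
    bool empty() const { return entries.empty(); }
    Iterator begin() { return entries.begin(); }
    Iterator end() { return entries.end(); }

    Iterator find(const K& k) {                // linear scan; returns end() if k is absent
        Iterator p = entries.begin();
        while (p != entries.end() && !(p->first == k)) ++p;
        return p;
    }
    Iterator put(const K& k, const V& v) {     // insert (k,v), or replace the value for k
        Iterator p = find(k);
        if (p != end()) { p->second = v; return p; }
        return entries.insert(entries.end(), Entry(k, v));
    }
    void erase(Iterator p) {                   // precondition: p is not the end sentinel
        assert(p != end());
        entries.erase(p);
    }
    void erase(const K& k) {                   // error condition if no entry has key k
        Iterator p = find(k);
        if (p == end()) throw std::out_of_range("erase of nonexistent key");
        entries.erase(p);
    }
private:
    std::list<Entry> entries;                  // the stored entries, in insertion order
};
```

With integer keys and character values, calling put(2,'C') and then put(2,'E') leaves a single entry for key 2 whose value is 'E', which is the uniqueness requirement of the ADT.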


• Example: In the following, we show the effect of a series of operations
on an initially empty map storing entries with integer keys and single-
character values. In the column “Output,” we use the notation p_i : [(k,v)]
to mean that the operation returns an iterator denoted by p_i that refers
to the entry (k,v). The entries of the map are not listed in any particular
order.
A C++ Map Interface
• Before discussing specific implementations of the map ADT, we first
define a C++ interface for a map in the Code Fragment below.
• It is not a complete C++ class, just a declaration of the public
functions. The interface is templated by two types, the key type K,
and the value type V.
template <typename K, typename V>
class Map { // map interface
public:
class Entry; // a (key,value) pair
class Iterator; // an iterator (and position)
int size() const; // number of entries in the map
bool empty() const; // is the map empty?
Iterator find(const K& k) const; // find entry with key k
Iterator put(const K& k, const V& v); // insert/replace pair (k,v)
void erase(const K& k); // remove entry with key k; throws NonexistentElement if absent
void erase(const Iterator& p); // erase entry at p
Iterator begin(); // iterator to first entry
Iterator end(); // iterator to end entry
};
STL Map Class
• The C++ Standard Template Library (STL) provides an
implementation of a map simply called map. As with many of the
other STL classes we have seen, the STL map is an example of a
container, and hence supports access by iterators.
• The principal member functions of the STL map are given below. Let
M be declared to be an STL map, let k be a key object, and let v be a
value object for the class M. Let p be an iterator for M.
• size(): Return the number of elements in the map.
• empty(): Return true if the map is empty and false otherwise.
• find(k): Find the entry with key k and return an iterator to it; if no such key exists
return end.
• operator[k]: Produce a reference to the value of key k; if no such key exists,
create a new entry for key k.
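• insert(pair(k,v)): Insert pair (k,v) if key k is not already present; this returns a pair holding an iterator to the entry with key k and a bool indicating whether the insertion took place.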
• erase(k): Remove the element with key k.
• erase(p): Remove the element referenced by iterator p.
• begin(): Return an iterator to the beginning of the map.
• end(): Return an iterator just past the end of the map.
An example of the use of the STL map is shown in the Code Fragment below.
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
    map<string, int> myMap; // a (string,int) map
    map<string, int>::iterator p; // an iterator to the map
    myMap.insert(pair<string, int>("Rob", 28)); // insert ("Rob",28)
    myMap["Joe"] = 38; // insert ("Joe",38)
    myMap["Joe"] = 50; // change to ("Joe",50)
    myMap["Sue"] = 75; // insert ("Sue",75)
    p = myMap.find("Joe"); // *p = ("Joe",50)
    myMap.erase(p); // remove ("Joe",50)
    myMap.erase("Sue"); // remove ("Sue",75)
    p = myMap.find("Joe");
    if (p == myMap.end()) cout << "nonexistent\n"; // outputs: "nonexistent"
    for (p = myMap.begin(); p != myMap.end(); ++p) { // print all entries
        cout << "(" << p->first << "," << p->second << ")\n";
    }
    return 0;
}
Lesson 4.2 – Hash Tables
• The keys associated with values in a map are typically thought of as
“addresses” for those values. Examples of such applications include a
compiler’s symbol table and a registry of environment variables. Both
of these structures consist of a collection of symbolic names where
each name serves as the “address” for properties about a variable’s
type and value. One of the most efficient ways to implement a map in
such circumstances is to use a hash table.
• In general, a hash table consists of two major components, a bucket
array and a hash function.
Bucket Arrays
• A bucket array for a hash table is an array A of size N, where each cell
of A is thought of as a “bucket” (that is, a collection of key-value pairs)
and the integer N defines the capacity of the array. If the keys are
integers well distributed in the range [0,N − 1], this bucket array is all
that is needed. An entry e with key k is simply inserted into the bucket
A[k]. (See Figure 4.2.1.)
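A minimal sketch of this direct use of integer keys as indices might look as follows. The names IntEntry, BucketArray, and insertDirect are hypothetical, and each bucket is modelled here as a simple vector of entries.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<int, std::string> IntEntry;   // hypothetical entry with an integer key
typedef std::vector<IntEntry> Bucket;           // a bucket is a collection of entries
typedef std::vector<Bucket> BucketArray;        // the array A of N buckets

// Insert entry (k, v) into bucket A[k]; the key itself serves as the index.
void insertDirect(BucketArray& A, int k, const std::string& v) {
    assert(k >= 0 && k < static_cast<int>(A.size()));   // key must lie in [0, N-1]
    A[k].push_back(IntEntry(k, v));
}
```

The assertion makes the limitation visible: the scheme only works when every key is already an integer in [0, N − 1].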
Hash Functions
• The second part of a hash table structure is a function, h, called a
hash function, that maps each key k in our map to an integer in the
range [0,N − 1], where N is the capacity of the bucket array for this
table. Equipped with such a hash function, h, we can apply the bucket
array method to arbitrary keys. The main idea of this approach is to
use the hash function value, h(k), as an index into our bucket array, A,
instead of the key k (which is most likely inappropriate for use as a
bucket array index). That is, we store the entry (k,v) in the bucket
A[h(k)].
Figure 4.2.2 The two parts of a hash function:
hash code and compression function.
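The two-part structure can be sketched as below. Both parts are assumptions chosen for illustration: the hash code simply sums character values, and the compression function takes the remainder modulo N (the text above does not specify either). The names hashCode, compress, and bucketIndex are hypothetical.

```cpp
#include <cassert>
#include <string>

// Illustrative hash code: sum of the character values (simple, but collision-prone).
int hashCode(const std::string& key) {
    int sum = 0;
    for (std::string::size_type i = 0; i < key.size(); ++i)
        sum += static_cast<unsigned char>(key[i]);
    return sum;
}

// Compression function: map any integer hash code into the range [0, N-1].
int compress(int code, int N) {
    assert(N > 0);
    int r = code % N;              // in C++ the remainder is negative for negative codes
    return r < 0 ? r + N : r;
}

// h(k) = compress(hashCode(k), N): the index of the bucket that holds key k.
int bucketIndex(const std::string& key, int N) {
    return compress(hashCode(key), N);
}
```

Because this hash code ignores character order, "ab" and "ba" land in the same bucket for every N, which shows why a collision in the hash code cannot be repaired by the compression function.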
Hash Codes
• The first action that a hash function performs is to take an arbitrary
key k in our map and assign it an integer value. The integer assigned
to a key k is called the hash code for k. This integer value need not be
in the range [0,N −1], and may even be negative, but we want the set
of hash codes assigned to our keys to avoid collisions as much as
possible. If the hash codes of our keys cause collisions, then there is
no hope for our compression function to avoid them.
• In addition, to be consistent with all of our keys, the hash code we use
for a key k should be the same as the hash code for any key that is
equal to k.
Hash Codes in C++
• The hash codes described below are based on the assumption that
the number of bits of each type is known. This information is provided
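in the standard include file <limits>. This include file defines a
templated class numeric_limits.
• Given a base type T (such as char, int, or float), the number of bits in a
variable of type T is given by “numeric_limits<T>::digits” (for signed integer types this count excludes the sign bit, and for floating-point types it counts the digits of the mantissa). Let us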
consider several common data types and some example functions for
assigning hash codes to objects of these types.
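As a small illustration of querying bit counts, the sketch below compares the digits member with the storage size computed from sizeof. The helper names valueBits and storageBits are hypothetical.

```cpp
#include <climits>
#include <limits>

// Number of value bits in T as reported by numeric_limits
// (the sign bit is not counted for signed integer types).
template <typename T>
int valueBits() {
    return std::numeric_limits<T>::digits;
}

// Total storage bits of T, computed from sizeof and the bits per char.
template <typename T>
int storageBits() {
    return static_cast<int>(sizeof(T)) * CHAR_BIT;
}
```

On typical platforms the two agree for unsigned integer types and differ by one for signed ones, so unsigned types are the convenient choice when a hash code needs the full word width.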
Converting to an Integer
• On many machines, the type long has a bit representation that is
twice as long as type int. One possible hash code for a long object is
to simply cast it down to an integer and then apply the integer hash
code. The problem is that such a hash code ignores half of the
information present in the original value. If many of the keys in our
map only differ in these bits, they will collide using this simple hash
code. A better hash code, which takes all the original bits into
consideration, sums an integer representation of the high-order bits
with an integer representation of the low-order bits.
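The difference between the two hash codes can be sketched as follows. To keep the widths explicit, this sketch assumes fixed-width types (a 64-bit key and a 32-bit hash code) instead of long and int; the function names truncatingHash and hashLong are hypothetical.

```cpp
#include <cstdint>

// Simple hash code: cast down, keeping only the low-order 32 bits.
std::uint32_t truncatingHash(std::uint64_t x) {
    return static_cast<std::uint32_t>(x);
}

// Better hash code: add the high-order 32 bits to the low-order 32 bits.
std::uint32_t hashLong(std::uint64_t x) {
    std::uint32_t high = static_cast<std::uint32_t>(x >> 32);
    std::uint32_t low  = static_cast<std::uint32_t>(x);
    return static_cast<std::uint32_t>(high + low);   // unsigned sum wraps around on overflow
}
```

Two keys that differ only in their high-order halves collide under truncatingHash but generally receive different codes from hashLong.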
Cyclic Shift Hash Codes

• A variant of the polynomial hash code replaces multiplication by the constant a
with a cyclic shift of a partial sum by a certain number of bits. Such a
function, applied to character strings in C++ could, for example, look
like the following. We assume a 32-bit integer word length, and we
assume access to a function hashCode(x) for integers. To achieve a 5-
bit cyclic shift we form the “bitwise or” of a 5-bit left shift and a 27-bit
right shift. As before, we use an unsigned integer so that right shifts
fill with zeros.
int hashCode(const char* p, int len) { // hash a character array
unsigned int h = 0;
for (int i = 0; i < len; i++) {
h = (h << 5) | (h >> 27); // 5-bit cyclic shift
h += (unsigned int) p[i]; // add in next character
}
return hashCode(int(h));
}
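For experimentation, a self-contained variant of the same idea is sketched below. It differs from the fragment above in a few assumed details: it uses a fixed 32-bit unsigned type, returns the shifted sum directly rather than passing it to an integer hashCode, and converts each character through unsigned char so that negative char values are not sign-extended. The name cyclicShiftHash is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Cyclic-shift hash code over a character array, using a 32-bit word.
std::uint32_t cyclicShiftHash(const char* p, std::size_t len) {
    std::uint32_t h = 0;
    for (std::size_t i = 0; i < len; ++i) {
        h = (h << 5) | (h >> 27);                 // 5-bit cyclic shift
        h += static_cast<unsigned char>(p[i]);    // add in next character
    }
    return h;
}

// Convenience overload for std::string keys.
std::uint32_t cyclicShiftHash(const std::string& s) {
    return cyclicShiftHash(s.data(), s.size());
}
```

Unlike a plain sum of characters, the shift makes the result depend on character order, so "ab" and "ba" receive different codes.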
Hashing Floating Point Quantities
• On most machines, types int and float are both 32-bit quantities.
Nonetheless, the approach of casting a float variable to type int
would not produce a good hash function, since this would truncate
the fractional part of the floating-point value. For the purposes of
hashing, we do not really care about the number’s value. It is
sufficient to treat the number as a sequence of bits. Assuming that a
char is stored as an 8-bit byte, we could interpret a 32-bit float as a
four-element character array, and a 64-bit double as an eight-
element character array.
• C++ provides an operation called reinterpret_cast, to cast between
such unrelated types. This cast treats quantities as a sequence of bits
and makes no attempt to intelligently convert the meaning of one
quantity to another.
• For example, we could design a hash function for a float by first
reinterpreting it as an array of characters and then applying the
character-array hashCode function defined above. We use the
operator sizeof, which returns the number of bytes in a type.

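int hashCode(const float& x) { // hash a float
    int len = sizeof(x); // number of bytes in a float
    const char* p = reinterpret_cast<const char*>(&x); // view the float as raw bytes
    return hashCode(p, len); // apply the character-array hash code
}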
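To see why truncation makes a poor hash code while the bit pattern does not, consider the sketch below. It swaps in std::memcpy as a different way to obtain the bits (not the reinterpret_cast approach of the fragment above) and assumes a 32-bit float; the names truncatedCode and floatBits are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Casting to int discards the fractional part of the value.
int truncatedCode(float x) {
    return static_cast<int>(x);
}

// Copy the 32 bits of the float into an unsigned integer without changing them.
std::uint32_t floatBits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

The values 1.25 and 1.75 both truncate to 1 and would collide, whereas their bit patterns differ.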
A C++ Hash Table Implementation
• In Code Fragments below, we present a C++ implementation of the
map ADT, called HashMap, which is based on hashing with separate
chaining. The class is templated with the key type K, the value type V,
and the hash comparator type H.
• The hash comparator defines a function, hash(k), which maps a key
into an integer index. As with less-than comparators, a hash
comparator class does this by overloading the “()” operator.
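A hash comparator for string keys might be sketched as follows. The class name StringHash is hypothetical, the cyclic-shift scheme is only one possible choice, and clearing the top bit to keep the result nonnegative is an assumption of this sketch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical hash comparator for string keys: an object that can be called as hash(k).
class StringHash {
public:
    int operator()(const std::string& k) const {
        std::uint32_t h = 0;
        for (std::string::size_type i = 0; i < k.size(); ++i) {
            h = (h << 5) | (h >> 27);                // 5-bit cyclic shift
            h += static_cast<unsigned char>(k[i]);   // add in next character
        }
        return static_cast<int>(h & 0x7fffffffu);    // clear the top bit: result is nonnegative
    }
};
```

Given a declaration such as StringHash hash;, the expression hash("Joe") invokes operator(), which is how a class templated on H can call its comparator like an ordinary function.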
template <typename K, typename V, typename H>
class HashMap {
public: // public types
typedef ::Entry<const K,V> Entry; // a (key,value) pair (:: names the global Entry template)
class Iterator; // an iterator/position
public: // public functions
HashMap(int capacity = 100); // constructor
int size() const; // number of entries
bool empty() const; // is the map empty?
Iterator find(const K& k); // find entry with key k
Iterator put(const K& k, const V& v); // insert/replace (k,v)
void erase(const K& k); // remove entry with key k
void erase(const Iterator& p); // erase entry at p
Iterator begin(); // iterator to first entry
Iterator end(); // iterator to end entry
protected: // protected types
typedef std::list<Entry> Bucket; // a bucket of entries
typedef std::vector<Bucket> BktArray; // a bucket array
// . . .insert HashMap utilities here
private:
int n; // number of entries
H hash; // the hash comparator
BktArray B; // bucket array
public: // public types
// . . .insert Iterator class declaration here
};
• We have defined the key part of Entry to be “const K,” rather than
“K.” This prevents a user from inadvertently modifying a key. The
class makes use of two major data types. The first is an STL list of
entries, called a Bucket, each storing a single bucket. The other is an
STL vector of buckets, called BktArray.
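The way these two types cooperate under separate chaining can be sketched as follows. This is a simplified stand-in, not the HashMap class itself: the names ChainedMap and IntHash are hypothetical, entries are std::pair objects, find returns a pointer to the value instead of an Iterator, and the hash value is compressed by taking it modulo the number of buckets.

```cpp
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Hypothetical hash comparator for int keys: the key is its own hash code.
struct IntHash {
    int operator()(int k) const { return k; }
};

// Simplified separate-chaining map; capacity is assumed to be positive.
template <typename K, typename V, typename H>
class ChainedMap {
public:
    typedef std::pair<const K, V> Entry;     // the key part is const, as in HashMap
    typedef std::list<Entry> Bucket;         // a bucket of entries
    typedef std::vector<Bucket> BktArray;    // a bucket array

    explicit ChainedMap(int capacity = 100)
        : n(0), B(static_cast<std::size_t>(capacity)) {}
    int size() const { return n; }
    bool empty() const { return n == 0; }

    // Return a pointer to the value stored under k, or nullptr if k is absent.
    V* find(const K& k) {
        Bucket& b = bucketFor(k);
        for (typename Bucket::iterator p = b.begin(); p != b.end(); ++p)
            if (p->first == k) return &p->second;
        return nullptr;
    }
    // Insert (k,v), or replace the value if k is already present.
    void put(const K& k, const V& v) {
        if (V* found = find(k)) { *found = v; return; }
        bucketFor(k).push_back(Entry(k, v));
        ++n;
    }
    // Remove the entry with key k; returns false if there was none.
    bool erase(const K& k) {
        Bucket& b = bucketFor(k);
        for (typename Bucket::iterator p = b.begin(); p != b.end(); ++p)
            if (p->first == k) { b.erase(p); --n; return true; }
        return false;
    }
private:
    // Select the bucket for k: hash the key, then compress modulo the bucket count.
    Bucket& bucketFor(const K& k) {
        return B[static_cast<std::size_t>(hash(k)) % B.size()];
    }
    int n;          // number of entries
    H hash;         // the hash comparator
    BktArray B;     // bucket array
};
```

With a capacity of 4, the keys 5 and 9 fall into the same bucket, and the list inside that bucket keeps both entries apart.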
