
Hash Table Time Costs - Hash Functions - The Map Interface and Implementations

The document summarizes key aspects of hash tables and hash maps. It discusses hash table time costs, including that search takes O(n) time in the worst case but is O(1) on average. It analyzes the average costs of linear probing and separate chaining for unsuccessful and successful searches. It also covers hash function design requirements and examples of hash functions for common data types. Finally, it provides an overview of the Java HashMap implementation, including its use of separate chaining for collision resolution.

Uploaded by ShengFeng
Copyright © All Rights Reserved

CSE 12

Hash Table Time Costs

Hash table time costs


Hash functions
The Map<K,V> interface and implementations

19

Hashing and hash tables


We are considering issues in implementing hash tables
We have seen that since the number of possible key values is much larger than the number of buckets in a table, collisions will happen
We have considered two collision resolution strategies:
  Linear probing (a kind of open addressing strategy)
  Separate chaining (a kind of closed addressing strategy)
Now let's consider the cost analysis of the basic algorithms operating on a hash table using those strategies
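As a concrete case, two different Strings can have the same hash code. When that happens they collide in a table of any size. The sketch below uses the fact that "Aa" and "BB" both hash to 2112. The helper name bucketIndex is illustrative. It reduces a hash code to an index the same way the HashMap code later in these slides does.

```java
// Sketch: two distinct String keys with equal hashCode() values collide in a
// table of any size. bucketIndex is a hypothetical helper, not a library method.
class CollisionDemo {
    // Reduce a hash code to a bucket index the way the HashMap slides do:
    // clear the sign bit, then take the remainder mod the number of buckets m.
    static int bucketIndex(Object key, int m) {
        return (key.hashCode() & 0x7FFFFFFF) % m;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 2112 = 65 * 31 + 97
        System.out.println("BB".hashCode()); // 2112 = 66 * 31 + 66
        // Equal hash codes give equal indexes, whatever m is
        for (int m : new int[] {11, 23, 47}) {
            System.out.println(bucketIndex("Aa", m) == bucketIndex("BB", m)); // true
        }
    }
}
```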

Hash table time costs


Suppose a hash table has m buckets and contains n entries
In the worst case, a search for a key takes O(n) steps
Why? Describe the 'worst case' situation.
This is as bad as a linked list!
However, the average case is much better
To study this, we consider the load factor λ of the hash table
The load factor is the number of entries n that the table contains, divided by the number of buckets m in the table:

$\lambda = n/m$

In the analysis, we will distinguish between successful (the key is in the table) and unsuccessful (the key is not in the table) search operations
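One way to see the worst case is to give every key the same hash code. Then all n keys land in one bucket, and a search becomes a linear scan of a single chain. The sketch below counts chain lengths for n = 20 keys in m = 10 buckets, so λ = 2. It runs once with Integer keys and once with a hypothetical BadKey class whose hashCode() is constant. The names chainLengths and loadFactor are illustrative only.

```java
import java.util.Arrays;

// Sketch: chain lengths for n keys in m buckets, with a good and a
// deliberately bad hash, plus the load factor lambda = n / m.
class LoadFactorDemo {
    // Hypothetical key type whose hashCode() is constant: every key collides.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int hashCode() { return 42; } // worst case: one shared bucket
    }

    // Count how many keys land in each of m buckets.
    static int[] chainLengths(Object[] keys, int m) {
        int[] counts = new int[m];
        for (Object k : keys) {
            counts[(k.hashCode() & 0x7FFFFFFF) % m]++;
        }
        return counts;
    }

    static double loadFactor(int n, int m) {
        return (double) n / m;
    }

    public static void main(String[] args) {
        int n = 20, m = 10;
        Object[] good = new Object[n];
        Object[] bad = new Object[n];
        for (int i = 0; i < n; i++) {
            good[i] = i;            // Integer.hashCode() is the int value itself
            bad[i] = new BadKey(i);
        }
        System.out.println("lambda = " + loadFactor(n, m));          // lambda = 2.0
        System.out.println(Arrays.toString(chainLengths(good, m)));  // every chain has length 2
        System.out.println(Arrays.stream(chainLengths(bad, m)).max().getAsInt()); // 20: one chain holds all n keys
    }
}
```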

Analysis of Linear Probing

Using linear probing with load factor λ, the average-case numbers of steps for unsuccessful and successful search are approximately:

Unsuccessful search: $U_{n,m} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$
Successful search: $S_{n,m} \approx \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$

These are quite good if λ is not too close to 1 (that is, if the table is not too full).
For example, if λ = 0.75, then $U_{n,m} = 8.5$ and $S_{n,m} = 2.5$
Note that for a fixed load factor these are O(1), independent of n
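To see how fast the cost grows as the table fills, the sketch below evaluates the two linear probing approximations from this slide at a few load factors. The class and method names are illustrative.

```java
// Sketch: evaluate the approximate average probe counts for linear probing
// (formulas from this slide) at several load factors.
class LinearProbingCost {
    // Unsuccessful search: (1 + 1 / (1 - lambda)^2) / 2
    static double unsuccessful(double lambda) {
        double d = 1.0 - lambda;
        return 0.5 * (1.0 + 1.0 / (d * d));
    }

    // Successful search: (1 + 1 / (1 - lambda)) / 2
    static double successful(double lambda) {
        return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
    }

    public static void main(String[] args) {
        for (double lambda : new double[] {0.5, 0.75, 0.9}) {
            System.out.printf("lambda=%.2f  U=%.1f  S=%.1f%n",
                    lambda, unsuccessful(lambda), successful(lambda));
        }
    }
}
```

The unsuccessful cost rises from 2.5 at λ = 0.5 to 8.5 at λ = 0.75 and about 50.5 at λ = 0.9. That steep rise is why tables using open addressing are kept well below full.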

Analysis of Separate Chaining


Average case: Assuming the hash function distributes the n keys uniformly over the m chains, the chains will have an average length equal to λ = n/m
An unsuccessful search will exhaustively search a chain of length n/m = λ on average
A successful search will search a chain that contains the target key, plus on average (n-1)/m other keys. On average, half of these other keys will be searched before finding the target key
So, in the average case, searching is O(1), independent of n (for a fixed λ):

Unsuccessful search: $U_{n,m} \approx \frac{n}{m} = \lambda$
Successful search: $S_{n,m} \approx 1 + \frac{n-1}{2m} \approx 1 + \frac{\lambda}{2}$
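For a quick comparison with linear probing, the sketch below evaluates the separate chaining formulas at λ = 0.75, using n = 75 and m = 100. The class name is illustrative.

```java
// Sketch: evaluate this slide's average-case formulas for separate chaining.
class SeparateChainingCost {
    // Unsuccessful search: scan a whole chain, average length n/m
    static double unsuccessful(int n, int m) {
        return (double) n / m;
    }

    // Successful search: the target key plus, on average, half of the
    // (n-1)/m other keys that share its chain
    static double successful(int n, int m) {
        return 1.0 + (n - 1) / (2.0 * m);
    }

    public static void main(String[] args) {
        int n = 75, m = 100; // load factor 0.75
        System.out.printf("U = %.2f, S = %.2f%n", unsuccessful(n, m), successful(n, m));
        // prints U = 0.75, S = 1.37
    }
}
```

At the same load factor, linear probing needs about 8.5 and 2.5 steps. The two analyses count slightly different kinds of steps, so treat this as a rough comparison.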

Hash Function Design


A hash function has four important requirements:

Deterministic: A hash function must always produce the same hash index each time it is given the same key

Efficient: Every access to the table requires hashing a key, so it is important to the table's performance that the hash function be fast to compute

Uniform: Avoiding the worst-case O(n) access time requires that the hash function distribute the keys uniformly over the hash table. A poor hash function will promote clustering, which hurts performance

Consistent with equals(): If two keys are equal, the hash function must produce the same index for them. Otherwise hash table operations may not work correctly
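The sketch below shows one way a key class can meet these requirements. It uses a hypothetical GridPoint class whose hashCode() depends only on the same final fields that equals() compares. The 31 * x + y formula is one simple choice, not the only reasonable one.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a hypothetical key class whose hashCode() is deterministic, cheap,
// mixes both fields, and is consistent with equals().
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Uses exactly the fields equals() compares, so equal points get equal hashes
    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        Map<GridPoint, String> labels = new HashMap<>();
        labels.put(new GridPoint(1, 2), "A");
        // A different but equal GridPoint object finds the same entry
        System.out.println(labels.get(new GridPoint(1, 2))); // A
    }
}
```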

Hash Functions
A hash function takes a key of some type as argument, and returns an integer
A hash table method will call the hash function, take the integer it returns mod the size of the table to produce an index in range, and perform collision resolution as required
What is a good hash function? It depends on the type of the key

Java's Object class defines a public instance method

public int hashCode()

whose default implementation is based on object identity, not contents (older Java documentation described it as typically derived from the object's internal address, though that is not required)

This method should be overridden, together with equals(), in subclasses whose instances are compared by value, and defined appropriately
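Here is what goes wrong when a key class keeps the inherited methods. The sketch uses a hypothetical RawKey class that overrides neither equals() nor hashCode(). Object.equals() compares identity, so a lookup with a new but identical-looking key always misses, whatever hash codes the two objects happen to get.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a key class that inherits Object's equals() and hashCode().
class IdentityKeyDemo {
    // Hypothetical key class that does NOT override equals() or hashCode()
    static final class RawKey {
        final String name;
        RawKey(String name) { this.name = name; }
    }

    static String lookUpWithFreshKey() {
        Map<RawKey, String> map = new HashMap<>();
        map.put(new RawKey("alice"), "admin");
        // A new object with the same contents is not equal under Object.equals()
        return map.get(new RawKey("alice"));
    }

    static String lookUpWithSameKey() {
        Map<RawKey, String> map = new HashMap<>();
        RawKey k = new RawKey("alice");
        map.put(k, "admin");
        return map.get(k); // the very same object: found
    }

    public static void main(String[] args) {
        System.out.println(lookUpWithFreshKey()); // null
        System.out.println(lookUpWithSameKey());  // admin
    }
}
```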

hashCode() for some Java classes


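Character: the Character's 16-bit char value, cast to int

Float: the bits of the float's IEEE 754 representation, as returned by Float.floatToIntBits()

Double: the 64-bit IEEE 754 representation from Double.doubleToLongBits(), with its high and low 32-bit halves combined by XOR

Integer: the Integer's int value

String: for a String s of length n, the sum over positions i of the int value of s.charAt(i) multiplied by 31 raised to the power n-1-i (computed with int overflow)

ArrayList: 31 raised to the number of elements n, plus the sum over positions i of each element's hash value (0 for null) multiplied by 31 raised to the power n-1-i; this formula is specified by the List interface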
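To check a few of these definitions, the sketch below recomputes some built-in hash codes from their documented formulas and compares them with the library results. The helper names doubleHash and listHash are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: recompute some built-in hashCode() values from their documented formulas.
class BuiltInHashCodes {
    // Double: fold the 64-bit IEEE 754 representation into 32 bits
    static int doubleHash(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }

    // List: start at 1, then h = 31 * h + (element hash, or 0 for null)
    static int listHash(List<?> list) {
        int h = 1;
        for (Object e : list) {
            h = 31 * h + (e == null ? 0 : e.hashCode());
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(Integer.valueOf(42).hashCode());                     // 42
        System.out.println(Character.valueOf('A').hashCode());                  // 65
        System.out.println(doubleHash(1.5) == Double.valueOf(1.5).hashCode());  // true
        System.out.println(Double.valueOf(1.5).hashCode() == (int) 1.5);        // false: not a cast to int
        List<Integer> xs = Arrays.asList(1, 2, 3);
        System.out.println(listHash(xs) == xs.hashCode());                      // true (both 30817)
    }
}
```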

hashCode() for String


A technique known as folding
Treat the characters in the String as digits (each char can take $2^{16}$ values), and multiply each char value by a prime (in fact, 31) raised to the power of its digit position, counted from the right
This works well as a hash function for Strings, and other arbitrary-length sequences such as Lists
Example: "Java".hashCode(), with char values J = 74, a = 97, v = 118:

$74 \cdot 31^3 + 97 \cdot 31^2 + 118 \cdot 31^1 + 97 \cdot 31^0 = 2301506$

Note: in practice this polynomial is computed using Horner's rule, $((74 \cdot 31 + 97) \cdot 31 + 118) \cdot 31 + 97$, which avoids the duplicate effort of computing each power of 31 separately
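The sketch below computes the String hash polynomial two ways: with explicit powers of 31, and with Horner's rule. It then compares both with String.hashCode(). Java int arithmetic wraps modulo 2^32, so both methods agree with the library even for long strings that overflow. The method names are illustrative.

```java
// Sketch: the String hash polynomial computed two ways.
class StringHashDemo {
    // Direct evaluation: sum of s[i] * 31^(n-1-i), each power computed separately
    static int hashWithPowers(String s) {
        int n = s.length();
        int h = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int j = 0; j < n - 1 - i; j++) {
                power *= 31; // int overflow wraps, just as in String.hashCode()
            }
            h += s.charAt(i) * power;
        }
        return h;
    }

    // Horner's rule: ((s[0] * 31 + s[1]) * 31 + s[2]) * 31 + ..., one multiply per char
    static int hashHorner(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hashWithPowers("Java")); // 2301506
        System.out.println(hashHorner("Java"));     // 2301506
        System.out.println("Java".hashCode());      // 2301506
    }
}
```

The direct version does O(n^2) multiplications, while Horner's rule does n.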

Hash Functions For a Collection


Hashing a collection raises additional issues

The contracts for equals() and hashCode() say that two equal objects must have the same hash code
But when are two collections equal?
  When they store the same elements?
  When they store the same elements in such a way that the collections' iterators would iterate over them in the same order?
  When they store the same elements in the same structure?

In any case, equals() and hashCode() need to be designed together, to be consistent
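The JCF answers this question differently for lists and sets. A List is equal to any other List with the same elements in the same order, whatever the implementation. A Set is equal to any other Set with the same elements, and its hashCode() is the sum of its elements' hash codes, so iteration order cannot affect it. The sketch below checks both contracts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch: equals() and hashCode() designed together for JCF lists and sets.
class CollectionHashDemo {
    public static void main(String[] args) {
        List<String> a = new ArrayList<>(Arrays.asList("x", "y"));
        List<String> b = new LinkedList<>(Arrays.asList("x", "y"));
        List<String> c = new ArrayList<>(Arrays.asList("y", "x"));
        // List contract: same elements in the same order => equal, whatever the implementation
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(c));                  // false: order matters for lists

        Set<String> s1 = new HashSet<>(Arrays.asList("x", "y"));
        Set<String> s2 = new TreeSet<>(Arrays.asList("y", "x"));
        // Set contract: same elements => equal; hashCode() is the sum of element hashes
        System.out.println(s1.equals(s2));                  // true
        System.out.println(s1.hashCode() == s2.hashCode()); // true
    }
}
```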

HashMap: A Map Implementation

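Let's look at source code for one implementation of the Map ADT

This code is adapted from the JDK 1.2 sources, simplified and with generics added (generics arrived in Java 5); its comments and null checks follow java.util.Hashtable, since java.util.HashMap accepts a null key and null values

(Later versions differ considerably: for example, since Java 8 HashMap converts long chains into balanced trees)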

HashMap in the JCF

[Figure: HashMap in the Java Collections Framework, not reproduced]

The HashMap Class


package java.util;

/**
 * This class implements a hashtable, which maps keys to values.
 * Any non-null object can be used as a key or as a value.
 * <p>
 * To successfully store and retrieve objects from a hashtable, the
 * objects used as keys must implement the <code>hashCode</code>
 * method and the <code>equals</code> method.
 */
public class HashMap<K,V> implements Map<K,V>, Cloneable {

HashMap instance variables


    /**
     * The hash table data.
     */
    private Entry table[];

    /**
     * The total number of entries in the hash table.
     */
    private int count;

    /**
     * The table is rehashed when count exceeds this threshold.
     */
    private int threshold;

    /**
     * The maximum load factor allowed for this hash table.
     */
    private float loadFactor;

What is the type of elements of the table array?

Entry inner class


HashMap defines this static nested class:

    private static class Entry<K,V> {
        int hash;
        K key;
        V value;
        Entry<K,V> next;
    }

From the declarations so far, can you tell what collision resolution strategy is used by HashMap?

HashMap constructors
    /**
     * Constructs a new, empty hashtable with the specified
     * initial capacity and the specified load factor.
     *
     * @param initialCapacity the initial capacity of the table
     * @param loadFactor a number between 0.0 and 1.0.
     * @exception IllegalArgumentException if the initial
     *   capacity is less than zero, or if the load factor
     *   is less than or equal to zero.
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if ((initialCapacity < 0) || (loadFactor <= 0.0)) {
            throw new IllegalArgumentException();
        }
        if (initialCapacity == 0) {
            initialCapacity = 1; // a zero-length table would make "% table.length" divide by zero
        }
        table = new Entry[initialCapacity];
        threshold = (int) (initialCapacity * loadFactor);
        this.loadFactor = loadFactor;
    }

HashMap default constructor


    /**
     * Constructs a new, empty hashtable with a
     * default capacity and load factor.
     *
     * @since JDK1.2
     */
    public HashMap() {
        this(11, 0.75f); // 0.75f: the constructor takes a float; a double literal would not compile
    }
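The constructor sets threshold = (int)(capacity * loadFactor), and rehash() grows the table to 2 * old + 1. The sketch below traces the capacities and thresholds that follow from the default constructor. Because put() rehashes when count >= threshold, the first rehash happens when inserting the ninth entry. The helper names are illustrative.

```java
// Sketch: capacities and rehash thresholds produced by the slides' default
// constructor (capacity 11, load factor 0.75f) and rehash() (2 * old + 1).
class GrowthDemo {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor); // same truncating cast as the constructor
    }

    static int nextCapacity(int capacity) {
        return capacity * 2 + 1; // the growth rule used by rehash()
    }

    public static void main(String[] args) {
        int capacity = 11;
        for (int step = 0; step < 4; step++) {
            System.out.println("capacity " + capacity + " -> threshold " + threshold(capacity, 0.75f));
            capacity = nextCapacity(capacity);
        }
        // capacity 11 -> threshold 8
        // capacity 23 -> threshold 17
        // capacity 47 -> threshold 35
        // capacity 95 -> threshold 71
    }
}
```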

HashMap get() method


    /**
     * Returns the value to which the specified key is
     * mapped in this hashtable.
     *
     * @param key a key in the hashtable.
     * @return the value to which the key is mapped in
     *   this hashtable; or null if none.
     */
    public V get(Object key) {
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % table.length;
        for (Entry<K,V> e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
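Why does get() compute (hash & 0x7FFFFFFF) % table.length rather than just hash % table.length? Java's % operator keeps the sign of the dividend, so a negative hash code would give a negative, invalid array index. Masking clears the sign bit first. The sketch below shows this with a hash code of -7 and a table of length 11. The method name index is illustrative.

```java
// Sketch: why get() and put() mask off the sign bit before taking the remainder.
class IndexDemo {
    // The index computation used by get() and put() on these slides
    static int index(int hash, int length) {
        return (hash & 0x7FFFFFFF) % length; // clear the sign bit, so the result is never negative
    }

    public static void main(String[] args) {
        int hash = -7;
        System.out.println(hash % 11);               // -7: would be an invalid array index
        System.out.println(index(hash, 11));         // 6
        System.out.println(Math.floorMod(hash, 11)); // 4: a different, also non-negative, mapping
    }
}
```

Masking and Math.floorMod both give valid indexes. They are just different functions, so a table must use one of them consistently.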

HashMap put() method (comments)


    /**
     * Maps the specified <code>key</code> to the specified
     * <code>value</code> in this hashtable.
     * Neither the key nor the value can be <code>null</code>.
     * <p>
     * The value can be retrieved by calling the <code>get</code>
     * method with a key that is equal to the original key.
     *
     * @param key the hashtable key.
     * @param value the value.
     * @return the previous value of the specified key in this
     *   hashtable, or <code>null</code> if it did not have one.
     * @exception NullPointerException if the key or value is
     *   <code>null</code>.
     * @since JDK1.2
     */

HashMap put() method (code)


    public V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }
        // If the key is already in the hashtable, update its value
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % table.length;
        for (Entry<K,V> e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        if (count >= threshold) {
            // Rehash the table if the threshold is exceeded
            rehash(); // this enlarges the capacity of the table
            index = (hash & 0x7FFFFFFF) % table.length;
        }

HashMap put() method (code, cont'd)

        // Create and add the new entry.
        Entry<K,V> e = new Entry<K,V>();
        e.hash = hash;
        e.key = key;
        e.value = value;
        e.next = table[index];
        table[index] = e;
        count++;
        return null;
    }

HashMap rehash() method


    /**
     * Increases the capacity of and internally
     * reorganizes this hashtable, in order to accommodate
     * and access its entries more efficiently.
     */
    protected void rehash() {
        int oldCapacity = table.length;
        Entry oldMap[] = table;
        int newCapacity = oldCapacity * 2 + 1;
        Entry newMap[] = new Entry[newCapacity];
        threshold = (int) (newCapacity * loadFactor);
        table = newMap;
        for (int i = oldCapacity; i-- > 0; ) {
            for (Entry<K,V> old = oldMap[i]; old != null; ) {
                Entry<K,V> e = old;
                old = old.next;
                int index = (e.hash & 0x7FFFFFFF) % newCapacity;
                e.next = newMap[index];
                newMap[index] = e;
            }
        }
    }
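To run the code from these slides end to end, the sketch below condenses the constructor, get(), put() and rehash() into one self-contained generic class. It makes a few assumptions. The class name SimpleHashtable and the size() and capacity() accessors are hypothetical additions. Unchecked casts replace the slides' raw Entry arrays. The constructor rejects a capacity of 0, because a zero-length table would make % table.length divide by zero.

```java
// Sketch: the slides' separate-chaining map condensed into one runnable class.
// SimpleHashtable (hypothetical name) avoids clashing with java.util.HashMap.
class SimpleHashtable<K, V> {
    private static class Entry<K, V> {
        int hash;        // cached hash code, reused by rehash()
        K key;
        V value;
        Entry<K, V> next; // next entry in the same chain
    }

    private Entry<K, V>[] table;
    private int count;
    private int threshold;
    private final float loadFactor;

    @SuppressWarnings("unchecked")
    SimpleHashtable(int initialCapacity, float loadFactor) {
        if (initialCapacity <= 0 || loadFactor <= 0.0f) throw new IllegalArgumentException();
        table = (Entry<K, V>[]) new Entry[initialCapacity];
        threshold = (int) (initialCapacity * loadFactor);
        this.loadFactor = loadFactor;
    }

    SimpleHashtable() { this(11, 0.75f); }

    private static int indexFor(int hash, int length) {
        return (hash & 0x7FFFFFFF) % length;
    }

    V get(Object key) {
        int hash = key.hashCode();
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) return e.value;
        }
        return null;
    }

    V put(K key, V value) {
        if (value == null) throw new NullPointerException();
        int hash = key.hashCode();
        int index = indexFor(hash, table.length);
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) { // existing key: replace the value
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        if (count >= threshold) {
            rehash();
            index = indexFor(hash, table.length);
        }
        Entry<K, V> e = new Entry<>();
        e.hash = hash;
        e.key = key;
        e.value = value;
        e.next = table[index]; // new entry goes at the head of its chain
        table[index] = e;
        count++;
        return null;
    }

    @SuppressWarnings("unchecked")
    private void rehash() {
        Entry<K, V>[] oldMap = table;
        int newCapacity = oldMap.length * 2 + 1;
        Entry<K, V>[] newMap = (Entry<K, V>[]) new Entry[newCapacity];
        threshold = (int) (newCapacity * loadFactor);
        table = newMap;
        for (int i = oldMap.length; i-- > 0; ) {
            for (Entry<K, V> old = oldMap[i]; old != null; ) {
                Entry<K, V> e = old;
                old = old.next;
                int index = indexFor(e.hash, newCapacity); // cached hash: no hashCode() calls
                e.next = newMap[index];
                newMap[index] = e;
            }
        }
    }

    int size() { return count; }
    int capacity() { return table.length; }

    public static void main(String[] args) {
        SimpleHashtable<String, Integer> t = new SimpleHashtable<>();
        for (int i = 0; i < 20; i++) {
            t.put("k" + i, i);
        }
        System.out.println(t.size() + " entries, capacity " + t.capacity()); // 20 entries, capacity 47
        System.out.println(t.get("k7"));      // 7
        System.out.println(t.put("k7", 70));  // 7 (the previous value)
        System.out.println(t.get("missing")); // null
    }
}
```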

Map Variations: SortedMap

A SortedMap stores (key, value) pairs ordered by key
The ordering is either the key type's natural ordering, as defined by Comparable, or one defined by a Comparator supplied when the SortedMap is created
The keySet() operation returns a Set view of the keys in the SortedMap such that an iteration over the Set provides the keys in order

SortedMap is an interface in the JCF, implemented by java.util.TreeMap
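The sketch below builds a TreeMap two ways. One uses the keys' natural ordering, and the other uses a supplied Comparator (here Comparator.reverseOrder()). It then iterates over keySet(). The example keys and the method names are illustrative.

```java
import java.util.Comparator;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: a SortedMap keeps its keys ordered, by natural order or a Comparator.
class SortedMapDemo {
    static SortedMap<String, Integer> byNaturalOrder() {
        SortedMap<String, Integer> m = new TreeMap<>(); // String's Comparable ordering
        m.put("banana", 2);
        m.put("apple", 1);
        m.put("cherry", 3);
        return m;
    }

    static SortedMap<String, Integer> byReverseOrder() {
        SortedMap<String, Integer> m = new TreeMap<>(Comparator.reverseOrder()); // supplied Comparator
        m.putAll(byNaturalOrder());
        return m;
    }

    public static void main(String[] args) {
        System.out.println(byNaturalOrder().keySet());   // [apple, banana, cherry]
        System.out.println(byReverseOrder().keySet());   // [cherry, banana, apple]
        System.out.println(byNaturalOrder().firstKey()); // apple
    }
}
```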

Map Variations: MultiMap

A MultiMap allows a one-to-many relationship between a key and a collection of values
Requires a redefinition of some Map operations:
  put(key, value): if key is already in the map, add value to the set of values associated with it
  get(key): return the first value associated with key
  remove(key): remove all the values associated with key
Requires some new operations:
  getValues(key): get all the values associated with key
  remove(key, value): remove only value from the set of values associated with key
MultiMap is not an interface in the JCF
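Since the JCF has no MultiMap, one common approach is to build one on a Map from each key to a collection of values. The sketch below does this with a hypothetical SimpleMultiMap class whose operations follow this slide. It stores the values in a List so that "the first value" is well defined, and it relies on Map.computeIfAbsent (Java 8 and later). The class and method names are illustrative, not a library API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a hypothetical MultiMap built on a Map from each key to a List of values.
class SimpleMultiMap<K, V> {
    private final Map<K, List<V>> map = new HashMap<>();

    // put(key, value): add value to the values already associated with key
    void put(K key, V value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // get(key): the first value associated with key, or null if there is none
    V get(K key) {
        List<V> values = map.get(key);
        return values == null ? null : values.get(0); // stored lists are never empty
    }

    // getValues(key): all values associated with key, as a read-only view
    List<V> getValues(K key) {
        return Collections.unmodifiableList(map.getOrDefault(key, Collections.emptyList()));
    }

    // remove(key): remove all the values associated with key
    void remove(K key) {
        map.remove(key);
    }

    // remove(key, value): remove only this value; drop the key once no values remain
    boolean remove(K key, V value) {
        List<V> values = map.get(key);
        if (values == null || !values.remove(value)) return false;
        if (values.isEmpty()) map.remove(key);
        return true;
    }

    public static void main(String[] args) {
        SimpleMultiMap<String, Integer> mm = new SimpleMultiMap<>();
        mm.put("a", 1);
        mm.put("a", 2);
        mm.put("b", 3);
        System.out.println(mm.getValues("a")); // [1, 2]
        System.out.println(mm.get("a"));       // 1
        mm.remove("a", 1);
        System.out.println(mm.get("a"));       // 2
        mm.remove("b");
        System.out.println(mm.get("b"));       // null
    }
}
```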

Next time

Final review

Reading: everything!
