Hashing
Chapter 28
Liang, Introduction to Java Programming, Ninth Edition, (c) 2013 Pearson Education, Inc. All
rights reserved.
Objectives
➢ To know what hashing is for.
➢ To obtain the hash code for an object and design the hash
function to map a key to an index.
➢ To handle collisions using open addressing.
➢ To know the differences among linear probing, quadratic
probing, and double hashing.
➢ To handle collisions using separate chaining.
➢ To understand the load factor and the need for rehashing.
➢ To implement MyHashMap using hashing.
➢ To implement MyHashSet using hashing.
Why Hashing?
• In a well-balanced binary search tree, an element can be found in
O(log n) time.
• Is there a more efficient way to search for an element in
a container?
• This chapter introduces a technique called hashing.
• You can use hashing to implement a map or a set that can
search, insert, and delete an element in O(1) time on average.
Map
• A map is a data structure that stores entries. Each entry
contains two parts: key and value.
• The key is also called a search key, which is used to
search for the corresponding value.
• For example, a dictionary can be stored in a map, where
the words are the keys and the definitions of the words
are the values.
• A map is also called a dictionary, a hash table, or an
associative array. The new trend is to use the term map.
What is Hashing?
• If you know the index of an element in the array, you can
retrieve the element using the index in O(1) time.
• So, can we store the values in an array and use the key as the
index to find the value?
• The answer is yes if you can map a key to an index.
• The array that stores the values is called a hash table. The
function that maps a key to an index in the hash table is called a
hash function.
• Hashing is a technique that retrieves the value using the index
obtained from the key, without performing a search.
Hash Function and Hash Codes
• A typical hash function:
✓ first converts a search key to an integer value called a hash
code, and
✓ then compresses the hash code into an index of the hash
table.
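The two steps above can be sketched in a few lines of Java. This is only an illustration: the class name HashIndexDemo, the method indexFor, and the table size 11 are assumptions made for the example, and taking the remainder by the table size is one common way to compress a hash code, not the only one.

```java
public class HashIndexDemo {
    /** Maps a key to an index in the range [0, capacity). */
    public static int indexFor(Object key, int capacity) {
        int hashCode = key.hashCode();             // step 1: key -> hash code
        // step 2: hash code -> index; floorMod keeps the result
        // non-negative even when the hash code is negative
        return Math.floorMod(hashCode, capacity);
    }

    public static void main(String[] args) {
        int capacity = 11; // hypothetical table size
        System.out.println(indexFor(26, capacity));        // 26 % 11 = 4
        System.out.println(indexFor("hashing", capacity)); // depends on String.hashCode()
    }
}
```

Math.floorMod is used instead of the % operator because hashCode() may return a negative int, and % would then produce a negative index.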
Hash Codes for Primitive Data Types
➢ To get hash codes for keys of type byte, short,
int, and char, simply cast them to int.
➢ To get the hash code for a float, use
Float.floatToIntBits(key).
➢ To get the hash code for a long, use folding:
   int hashCode = (int)(key ^ (key >> 32));
➢ For a double, convert it to a long value
using Double.doubleToLongBits(key), and then
fold the resulting long value.
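The rules above can be written out as small Java methods. The class and method names below (PrimitiveHashCodes, hashOfChar, and so on) are made up for this sketch. The folding expression is the one from the slide; after the cast to int it gives the same result as the standard Long.hashCode, which shifts with >>> instead of >>.

```java
public class PrimitiveHashCodes {
    /** byte, short, int, and char keys: widen or cast to int. */
    public static int hashOfChar(char key) {
        return (int) key;
    }

    /** float keys: reinterpret the 32 bits as an int. */
    public static int hashOfFloat(float key) {
        return Float.floatToIntBits(key);
    }

    /** long keys: fold the upper 32 bits into the lower 32 bits. */
    public static int hashOfLong(long key) {
        return (int) (key ^ (key >> 32));
    }

    /** double keys: convert to the 64-bit pattern, then fold as for long. */
    public static int hashOfDouble(double key) {
        long bits = Double.doubleToLongBits(key);
        return (int) (bits ^ (bits >> 32));
    }

    public static void main(String[] args) {
        System.out.println(hashOfChar('A'));       // 65
        System.out.println(hashOfLong(1L << 32));  // 1
        System.out.println(hashOfFloat(1.5f) == Float.hashCode(1.5f));    // true
        System.out.println(hashOfDouble(3.14) == Double.hashCode(3.14));  // true
    }
}
```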
Hash Codes for Strings
➢ Search keys are often strings, so it is important to have
good hash functions for strings.
➢ One solution is to use the sum of the Unicode values of all
characters as the hash code.
– This solution produces many collisions, because different
keys that contain the same letters get the same hash code.
➢ A better solution is to generate a hash code that takes the
position of the characters into consideration.
– E.g., using a polynomial hash code.
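The difference between the two approaches can be seen in a short Java sketch. The names StringHashCodes, sumHash, and polynomialHash are hypothetical, and the base b = 31 is an assumption chosen because it is the base used by Java's own String.hashCode(). The polynomial s0*b^(n-1) + s1*b^(n-2) + ... + s(n-1) is evaluated with Horner's rule.

```java
public class StringHashCodes {
    /** Sum of the character codes; ignores character positions. */
    public static int sumHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    /** Polynomial hash code with base b, evaluated by Horner's rule. */
    public static int polynomialHash(String s, int b) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = b * h + s.charAt(i); // int overflow simply wraps around
        }
        return h;
    }

    public static void main(String[] args) {
        // Same letters, different order: the sums collide.
        System.out.println(sumHash("tod") == sumHash("dot"));                       // true
        // The polynomial hash code depends on position, so these differ.
        System.out.println(polynomialHash("tod", 31) == polynomialHash("dot", 31)); // false
        // With b = 31 the result matches String.hashCode().
        System.out.println(polynomialHash("tod", 31) == "tod".hashCode());          // true
    }
}
```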
Linear Probing Animation
http://www.cs.armstrong.edu/liang/animation/web/LinearProbing.html
Quadratic Probing
• Linear probing looks at consecutive cells beginning at index
k.
• This often results in clustering of entries.
• Quadratic probing can avoid the clustering problem in linear
probing.
[Figure: a hash table with indices 0 to 10. Index 0 holds key 44, index 4 holds key 4, index 5 holds key 16, index 6 holds key 28, and index 10 holds key 21. A new element with key 26 is to be inserted. Quadratic probing looks at the cells at indices k + j^2 for j >= 1, that is, k, k+1, k+4, k+9, ...; it probes 2 times before finding an empty cell. For simplicity, only the keys are shown and the values are not shown.]
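The probe sequence k, k+1, k+4, k+9, ... can be tried out with a minimal Java sketch. The class QuadraticProbingDemo and its insert method are hypothetical names, the table stores only keys (null marks an empty cell), and the hash function key % N is an assumption that matches the example with key 26 in a table of size 11.

```java
public class QuadraticProbingDemo {
    /**
     * Inserts key using quadratic probing and returns the index used,
     * or -1 if no empty cell is found within table.length probes.
     */
    public static int insert(Integer[] table, int key) {
        int n = table.length;
        int k = Math.floorMod(key, n); // initial index
        for (int j = 0; j < n; j++) {
            // long arithmetic avoids int overflow of j * j for large tables
            int index = (int) ((k + (long) j * j) % n);
            if (table[index] == null) {
                table[index] = key;
                return index;
            }
        }
        return -1; // quadratic probing may miss empty cells
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[11];
        for (int key : new int[] {44, 4, 16, 28, 21}) {
            insert(table, key);
        }
        // 26 % 11 = 4 is taken, 4 + 1 = 5 is taken, 4 + 4 = 8 is empty.
        System.out.println(insert(table, 26)); // 8
    }
}
```

Unlike linear probing, quadratic probing does not visit every cell, so the sketch returns -1 instead of looping forever when the probe sequence finds no empty cell.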
Quadratic Probing
http://cs.armstrong.edu/liang/animation/web/QuadraticProbing.html
Double Hashing
• Double hashing uses a secondary hash function on the keys to
determine the increments to avoid the clustering problem.
h'(k) = 7 - k % 7;
[Figure: three snapshots of a hash table with indices 0 to 10 in which keys 45, 27, and 23 are visible. The snapshots mark the probe positions h(12) and h(12) + 2*h'(12).]
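A minimal Java sketch of double hashing follows, using the secondary hash function h'(k) = 7 - k % 7 from this slide. The primary hash function h(k) = k % N, the table size 11, and the names DoubleHashingDemo, secondaryHash, and insert are assumptions made for the illustration. The j-th probe looks at index (h(k) + j * h'(k)) % N.

```java
public class DoubleHashingDemo {
    /** Secondary hash function from the slide; the result is in 1..7, never 0. */
    public static int secondaryHash(int key) {
        return 7 - Math.floorMod(key, 7);
    }

    /**
     * Inserts key using double hashing and returns the index used,
     * or -1 if no empty cell is found within table.length probes.
     */
    public static int insert(Integer[] table, int key) {
        int n = table.length;
        int k = Math.floorMod(key, n);  // primary hash: h(key)
        int step = secondaryHash(key);  // increment:    h'(key)
        for (int j = 0; j < n; j++) {
            int index = (int) ((k + (long) j * step) % n);
            if (table[index] == null) {
                table[index] = key;
                return index;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[11];
        System.out.println(insert(table, 45)); // 45 % 11 = 1
        // h(12) = 1 is taken; h'(12) = 7 - 5 = 2, so the next probe is index 3.
        System.out.println(insert(table, 12)); // 3
    }
}
```

Because the table size 11 is prime and the increment is between 1 and 7, the probe sequence in this sketch visits every cell before it repeats.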
Class Exercise
Show how the keys 15, 57, 34, 83, 28, 41, 53, 70, 44, and 39
will be placed in a hash table of size 13 using
a. Quadratic probing
b. Double hashing with h'(key) = 7 - key % 7
Suppose the size of the hash table is doubled after adding 39.
Give the new hash table after adding the key 52 using double
hashing.
Handling Collisions Using Separate Chaining
• The separate chaining scheme places all entries with the same hash index into
the same location, rather than finding new locations.
• Each location in the separate chaining scheme is called a bucket. A bucket is a
container that holds multiple entries.
[Figure: a hash table with buckets at indices 0 to 10. Bucket 0 holds key 44, bucket 4 holds keys 4 and 26, bucket 5 holds key 16, bucket 6 holds key 28, and bucket 10 holds key 21. The new element with key 26 is inserted into bucket 4, after key 4. For simplicity, only the keys are shown and the values are not shown.]
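The bucket idea can be sketched in Java as an array of linked lists. The class SeparateChainingDemo and its methods are hypothetical names; the sketch stores only int keys, uses key % N as the hash function, and does no rehashing.

```java
import java.util.LinkedList;

public class SeparateChainingDemo {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    public SeparateChainingDemo(int capacity) {
        buckets = new LinkedList[capacity]; // each bucket is created on first use
    }

    /** Adds key to its bucket; returns false if the key is already present. */
    public boolean add(int key) {
        int index = Math.floorMod(key, buckets.length);
        if (buckets[index] == null) {
            buckets[index] = new LinkedList<>();
        }
        if (buckets[index].contains(key)) {
            return false;
        }
        buckets[index].add(key); // colliding keys share the same bucket
        return true;
    }

    /** Searches only the bucket that the key hashes to. */
    public boolean contains(int key) {
        LinkedList<Integer> bucket = buckets[Math.floorMod(key, buckets.length)];
        return bucket != null && bucket.contains(key);
    }

    /** Number of keys stored in the bucket at the given index. */
    public int bucketSize(int index) {
        return buckets[index] == null ? 0 : buckets[index].size();
    }

    public static void main(String[] args) {
        SeparateChainingDemo table = new SeparateChainingDemo(11);
        for (int key : new int[] {44, 4, 16, 28, 21, 26}) {
            table.add(key);
        }
        System.out.println(table.bucketSize(4)); // 2: keys 4 and 26
        System.out.println(table.contains(26));  // true
    }
}
```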
Separate Chaining Animation
http://cs.armstrong.edu/liang/animation/web/SeparateChaining.html
Implementing Map Using Hashing
«interface» MyMap<K, V>
  +clear(): void                       Removes all entries from this map.
  +containsKey(key: K): boolean        Returns true if this map contains an entry for the specified key.
  +containsValue(value: V): boolean    Returns true if this map maps one or more keys to the specified value.
  +entrySet(): Set<Entry<K, V>>        Returns a set consisting of the entries in this map.
  +get(key: K): V                      Returns a value for the specified key in this map.
  +getAll(key: K): Set<V>              Returns all values for the specified key in this map.
  +isEmpty(): boolean                  Returns true if this map contains no mappings.
  +keySet(): Set<K>                    Returns a set consisting of the keys in this map.
  +put(key: K, value: V): V            Puts a mapping in this map.
  +remove(key: K): void                Removes the entries for the specified key.
  +size(): int                         Returns the number of mappings in this map.
  +values(): Set<V>                    Returns a set consisting of the values in this map.

MyHashMap<K, V>                        Concrete class that implements MyMap

MyMap.Entry<K, V>
  -key: K
  -value: V
  +Entry(key: K, value: V)             Constructs an entry with the specified key and value.
  +getKey(): Key                       Returns the key in the entry.
  +getValue(): Value                   Returns the value in the entry.

Listings: MyMap, MyHashMap, TestMyHashMap
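To show how a few of the operations above fit together with the load factor and rehashing, here is a minimal sketch of a hash map built on separate chaining. It is not the MyHashMap listing from the book: the class name SimpleHashMap, the initial capacity 4, the load-factor threshold 0.75, and the capacity() helper are all assumptions made for this example, only get, put, remove, and size are covered, and null keys are not supported.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class SimpleHashMap<K, V> {
    private static final double MAX_LOAD_FACTOR = 0.75; // assumed threshold

    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private List<LinkedList<Entry<K, V>>> buckets = newBuckets(4);
    private int size = 0;

    private static <K, V> List<LinkedList<Entry<K, V>>> newBuckets(int capacity) {
        List<LinkedList<Entry<K, V>>> list = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) {
            list.add(new LinkedList<>());
        }
        return list;
    }

    /** Hash code -> index -> bucket. */
    private LinkedList<Entry<K, V>> bucketFor(K key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    /** Returns the value for key, or null if the key is absent. */
    public V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    /** Adds or replaces a mapping; returns the old value or null. */
    public V put(K key, V value) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Rehash before the load factor (size / capacity) exceeds the threshold.
        if (size + 1 > MAX_LOAD_FACTOR * buckets.size()) rehash();
        bucketFor(key).add(new Entry<>(key, value));
        size++;
        return null;
    }

    public void remove(K key) {
        if (bucketFor(key).removeIf(e -> e.key.equals(key))) size--;
    }

    public int size() { return size; }

    public int capacity() { return buckets.size(); }

    /** Doubles the number of buckets and reinserts every entry. */
    private void rehash() {
        List<LinkedList<Entry<K, V>>> old = buckets;
        buckets = newBuckets(old.size() * 2);
        for (LinkedList<Entry<K, V>> bucket : old) {
            for (Entry<K, V> e : bucket) bucketFor(e.key).add(e);
        }
    }
}
```

Every entry must be reinserted during rehashing because the index of a key depends on the number of buckets.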
Complexity of Key Map Operations
Implementing Set Using Hashing
«interface» MySet<E>
  +clear(): void                       Removes all elements from this set.
  +contains(e: E): boolean             Returns true if the element is in the set.
  +add(e: E): boolean                  Adds the element to the set and returns true if the element is added successfully.
  +remove(e: E): boolean               Removes the element from the set and returns true if the set contained the element.
  +isEmpty(): boolean                  Returns true if this set contains no elements.
  +size(): int                         Returns the number of elements in this set.
  +iterator(): java.util.Iterator<E>   Returns an iterator for the elements in this set.

Listings: MySet, MyHashSet
Complexity of Key Set Operations