Hash Tables
» HASH FUNCTION
» SEPARATE CHAINING
Department: Computer science
Course Code: CSS-215
Course Instructor: Asst. Prof. Dr. Mohammed Ala’anzy
Office no.: G-405
ST IMPLEMENTATIONS: SUMMARY

                                     Worst-case cost         Average-case cost
                                     (after N inserts)       (after N random inserts)
Implementation                       Search     Insert       Search hit    Insert
Sequential search (unordered list)   N          N            N/2           N
Binary search (ordered array)        log N      N            log N         N/2
BST                                  N          N            1.39 log N    1.39 log N
2-3 tree                             c log N    c log N      c log N       c log N
Red-black BST                        2 log N    2 log N      1 log N*      1 log N*
Q. Can we do better?
A. Yes, but with different access to the data.
HASHING: INTRODUCTION
Hashing is a process that transforms input data (of any size) into a
fixed-size string of characters, using a mathematical algorithm called a
hash function. This output, known as the hash code or hash value, is
typically used for indexing, data retrieval, or ensuring data integrity.
The goal is a data structure that supports both storing and searching data in
constant time, O(1), on average. Hashing makes this possible: with a good hash
function, storage and retrieval take constant time on average, which addresses
the efficiency limits of the implementations summarized above.
THE MAIN COMPONENTS OF
HASHING
Hashing involves three main components:
1. Key: The key serves as input to the hash function and can be a string or an integer.
It determines the index or location for storing an item in a data structure.
2. Hash Function: This function takes the input key and produces the hash index,
which is essentially the index of an element in an array known as a hash table.
3. Hash Table: The hash table is a data structure that utilizes a hash function to map
keys to values. It stores data in an associative manner within an array, where each
key is mapped to an index, ensuring efficient retrieval based on the hashed key.
[Figure: the components of hashing, showing the hash table]
HASHING IMPLEMENTATION
Consider a set of strings, for example, {“ab”, “cd”, “efg”}, that needs to be stored in a
table. The primary objective is to efficiently search or update values within the table in
O(1) time, with no emphasis on the ordering of the strings. In this context, the given
set of strings can serve as keys, and each string itself can act as its corresponding
value. The key challenge lies in determining the most effective method to store the
value associated with each key.
INDEX FINDING IN THE HASH
TABLE
Step 1: Hash functions utilize mathematical formulas to determine a hash value,
which serves as the index in the data structure for storing the corresponding value.
Step 3: Calculate the numerical value of a string by summing the values of all its
characters. For instance, with the character values x = 5, y = 10, and z = 25, "xyz"
would be 5 + 10 + 25 = 40.
Step 4: Assume a table of size 5 is available to store these strings.
Using the hash function (sum of characters in key mod table size),
compute the location in the array as sum(string) mod 5. For “xyz”,
this gives 40 mod 5 = 0.
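The steps above can be sketched in Java. The slides do not spell out how characters map to numbers, so this sketch assumes a = 1, b = 2, ..., z = 26; the class and method names (SumHashDemo, letterValue, index) are hypothetical and only for illustration.

```java
// Sketch of the "sum of characters mod table size" hash.
// Assumption (not from the slides): lowercase letters are valued a=1, ..., z=26.
final class SumHashDemo {
    // Hypothetical helper: value of one lowercase letter under the assumed mapping.
    static int letterValue(char c) {
        return c - 'a' + 1;
    }

    // Sum the letter values, then reduce modulo the table size to get an index.
    static int index(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += letterValue(key.charAt(i));
        }
        return sum % tableSize;
    }

    public static void main(String[] args) {
        String[] keys = {"ab", "cd", "efg"};
        for (String key : keys) {
            System.out.println(key + " -> index " + index(key, 5));
        }
    }
}
```

Under this assumed mapping, "ab" sums to 3 and "efg" sums to 18, so both land at index 3 in a table of size 5, while "cd" (sum 7) lands at index 2. Two keys sharing an index is a collision, which a hash table must be prepared to handle.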
public final class Integer {
    private final int value;
    ...
    public int hashCode() {
        return value;
    }
}

public final class Boolean {
    private final boolean value;
    ...
    public int hashCode() {
        if (value) return 1231;
        else       return 1237;
    }
}

public final class Double {
    private final double value;
    ...
    public int hashCode() {
        // convert to IEEE 64-bit representation;
        // xor most significant 32 bits with least significant 32 bits
        long bits = doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }
}
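As a quick check of the library behavior shown above, the following sketch calls the static hashCode helpers that the wrapper classes have offered since Java 8. The class name PrimitiveHashDemo and the helper foldDouble are hypothetical; foldDouble simply repeats the xor-fold from the Double listing so it can be compared with the library result.

```java
// Sketch: hash codes of boxed primitives, using static helpers available since Java 8.
final class PrimitiveHashDemo {
    // Hypothetical helper: the same xor-fold as in the Double listing.
    static int foldDouble(double value) {
        long bits = Double.doubleToLongBits(value);   // IEEE 754 64-bit pattern
        return (int) (bits ^ (bits >>> 32));          // high 32 bits xor low 32 bits
    }

    public static void main(String[] args) {
        System.out.println(Integer.hashCode(42));      // the int value itself: 42
        System.out.println(Boolean.hashCode(true));    // 1231
        System.out.println(Boolean.hashCode(false));   // 1237
        System.out.println(foldDouble(1.5) == Double.hashCode(1.5)); // true
    }
}
```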
IMPLEMENTING HASH CODE: STRINGS
Java library implementation

public final class String {
    private final char[] s;
    ...
    public int hashCode() {
        int hash = 0;
        for (int i = 0; i < length(); i++)
            hash = s[i] + (31 * hash);   // s[i]: i-th character of s
        return hash;
    }
}

char   Unicode
…      …
‘a’    97
‘b’    98
‘c’    99
…      …

• Horner's method to hash string of length L: L multiplies/adds.
• Equivalent to $h = s[0] \cdot 31^{L-1} + \dots + s[L-3] \cdot 31^{2} + s[L-2] \cdot 31^{1} + s[L-1] \cdot 31^{0}$.

Ex. String s = "call";
    int code = s.hashCode();

$3045982 = 99 \cdot 31^{3} + 97 \cdot 31^{2} + 108 \cdot 31^{1} + 108 \cdot 31^{0}$
$= 108 + 31 \cdot (108 + 31 \cdot (97 + 31 \cdot (99)))$   (Horner's method)
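hashCode()
The hashCode() method in Java calculates a hash code for a String object. The hash
code is a 32-bit integer computed from the string's characters and used to place the
string in a hash table; distinct strings can share the same hash code. The hash code
is calculated using the following formula:
$s[0] \cdot 31^{n-1} + s[1] \cdot 31^{n-2} + \dots + s[n-1]$
where s[i] is the i-th character of the string, n is the length of the string, and the
arithmetic is done on int values (so it may overflow).
Basic rule. Need to use the whole key to compute hash code.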
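The string hash formula can be checked with a short sketch. The class HornerHashDemo and its method hornerHash are hypothetical names; the loop is the same Horner recurrence as in the String listing and is compared against the library's own hashCode().

```java
// Sketch: Horner's method for the base-31 string hash, checked against String.hashCode().
final class HornerHashDemo {
    // One multiply and one add per character: hash = s[i] + 31 * hash.
    static int hornerHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = s.charAt(i) + 31 * hash;   // int arithmetic, may overflow for long strings
        }
        return hash;
    }

    public static void main(String[] args) {
        System.out.println(hornerHash("call"));   // 3045982
        System.out.println("call".hashCode());    // 3045982
    }
}
```

Every character of the key contributes to the result, which is what the basic rule asks for.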
MODULAR HASHING
What we get back from hashCode() is an int between $-2^{31}$ and $2^{31} - 1$.
Hash function. An int between 0 and M - 1 (for use as array index), where M is
typically a prime or power of 2.
Java's String hash codes distribute the keys of Tale of Two Cities uniformly.
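One common way to turn a hash code into an array index is to clear the sign bit and then take the remainder modulo M. The sketch below assumes this approach; the class ModularHashDemo and the method name hash are hypothetical.

```java
// Sketch: modular hashing, mapping any hashCode() to an index in 0 .. m-1.
final class ModularHashDemo {
    // Masking with 0x7fffffff clears the sign bit, so the remainder is never negative.
    // Math.abs is not a safe substitute: Math.abs(Integer.MIN_VALUE) is still negative.
    static int hash(Object key, int m) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    public static void main(String[] args) {
        int m = 97;  // table size; a prime in this example
        System.out.println(hash("call", m));
        System.out.println(hash(-7, m));                 // negative hash code, still a valid index
        System.out.println(hash(Integer.MIN_VALUE, m));  // 0
    }
}
```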
COLLISIONS
Collision. Two distinct keys hashing to same index.
Birthday problem ⇒ can't avoid collisions unless you have a ridiculous (quadratic)
amount of memory.
Hash functions are designed to distribute values across a range of possible hash
codes, ideally minimizing collisions. However, due to the finite range of hash codes
and the infinite potential inputs, collisions are inevitable in hash functions. The
challenge is to handle collisions effectively to ensure the proper functioning of hash-
based data structures.
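A small, concrete collision with Java's String hash: the strings "Aa" and "BB" are different keys but produce the same hash code, so they map to the same index for every table size. The class CollisionDemo and the helper sameIndex are hypothetical names used only for this sketch.

```java
// Sketch: two distinct keys with the same hash code.
final class CollisionDemo {
    // Hypothetical helper: true if both keys map to the same index in a table of size m.
    static boolean sameIndex(String a, String b, int m) {
        int ia = (a.hashCode() & 0x7fffffff) % m;
        int ib = (b.hashCode() & 0x7fffffff) % m;
        return ia == ib;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());          // 65 * 31 + 97 = 2112
        System.out.println("BB".hashCode());          // 66 * 31 + 66 = 2112
        System.out.println("Aa".equals("BB"));        // false
        System.out.println(sameIndex("Aa", "BB", 97)); // true
    }
}
```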
WAYS TO SOLVE THE
COLLISION ISSUE
There are various techniques to handle collisions:
1. Separate Chaining: Each bucket in the hash table is a linked list, and multiple values that hash to
the same index are stored in this linked list.
2. Open Addressing: The system looks for the next available slot when a collision occurs. There are
different strategies like linear probing, quadratic probing, and double hashing.
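3. Robin Hood Hashing: A variant of linear probing. When inserting a new element, it compares the
distance the new element has traveled from its home slot with that of the element already present;
if the new element has traveled farther, the two swap and insertion continues with the displaced
element.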
Handling collisions is crucial because it ensures that hash-based data structures, like hash tables or
hash maps, maintain their efficiency in terms of constant-time (O(1)) average lookup, insertion, and
deletion operations. The choice of collision resolution strategy depends on the specific requirements
and characteristics of the application.
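To make the open addressing idea concrete, here is a minimal linear probing sketch. It is deliberately simplified and is not a full implementation: it stores only String keys, never resizes, and does not support deletion. The class name LinearProbingSketch and its methods are hypothetical.

```java
// Sketch: open addressing with linear probing (fixed capacity, no resize, no delete).
final class LinearProbingSketch {
    private final String[] keys;
    private final int m;

    LinearProbingSketch(int capacity) {
        m = capacity;
        keys = new String[m];
    }

    private int hash(String key) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    // Insert the key and return the slot used, or -1 if the table is full.
    // On a collision, step to the next slot, wrapping around at the end of the array.
    int insert(String key) {
        int i = hash(key);
        for (int probes = 0; probes < m; probes++) {
            if (keys[i] == null) { keys[i] = key; return i; }
            if (keys[i].equals(key)) return i;   // already present
            i = (i + 1) % m;
        }
        return -1;
    }

    // Follow the same probe sequence; an empty slot means the key is absent.
    boolean contains(String key) {
        int i = hash(key);
        for (int probes = 0; probes < m; probes++) {
            if (keys[i] == null) return false;
            if (keys[i].equals(key)) return true;
            i = (i + 1) % m;
        }
        return false;
    }
}
```

For example, inserting "Aa" and then "BB" (which share a hash code) into a table of size 7 places the second key in the slot immediately after the first.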
SEPARATE CHAINING
SYMBOL TABLE
Use an array of M < N linked lists. [H. P. Luhn, IBM 1953]
Hash: map key to integer i between 0 and M - 1.
Insert: put at front of i-th chain (if not already there).
Search: need to search only the i-th chain.
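The hash, insert, and search steps above can be sketched as a small symbol table. This is an illustrative sketch rather than a reference implementation: the number of chains is fixed at construction, there is no resizing or deletion, and the names ChainingSketch, put, and get are assumptions.

```java
// Sketch: separate chaining symbol table with a fixed number of chains.
final class ChainingSketch<K, V> {
    // One node of a singly linked chain.
    private static final class Node<K, V> {
        final K key;
        V val;
        final Node<K, V> next;
        Node(K key, V val, Node<K, V> next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    private final int m;                 // number of chains
    private final Node<K, V>[] chains;   // chains[i] is the head of the i-th chain

    @SuppressWarnings("unchecked")
    ChainingSketch(int m) {
        this.m = m;
        this.chains = (Node<K, V>[]) new Node[m];
    }

    // Hash: map key to an integer i between 0 and m - 1.
    private int hash(K key) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    // Search: scan only the i-th chain; returns null if the key is absent.
    V get(K key) {
        for (Node<K, V> x = chains[hash(key)]; x != null; x = x.next) {
            if (key.equals(x.key)) return x.val;
        }
        return null;
    }

    // Insert: update the value if the key is already there, else put at front of the chain.
    void put(K key, V val) {
        int i = hash(key);
        for (Node<K, V> x = chains[i]; x != null; x = x.next) {
            if (key.equals(x.key)) { x.val = val; return; }
        }
        chains[i] = new Node<>(key, val, chains[i]);
    }
}
```

With N keys spread over M chains, each get or put scans one chain, so the work per operation tracks the chain length, which is about N / M when keys are spread evenly.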
ANALYSIS OF SEPARATE
CHAINING
Consequence. Number of probes for search/insert is proportional to N / M.
M too large ⇒ too many empty chains.