Hash Tables
What drives the need for hash tables given the existence of
balanced binary search trees? Balanced BSTs already support
relatively fast (O(log n)) search, insertion and deletion;
hash tables aim for O(1) average time for the same operations.
Of the three types of open addressing, double hashing gives the best
performance.
Overall, open addressing works very well up to load factors of around
0.5 (when, on average, two probes are required to find a record). For
load factors greater than 0.6, performance declines dramatically.
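To make the probe sequence concrete, here is a minimal sketch of double hashing. The table size and both hash functions are illustrative assumptions (the table size is prime, and the second hash is chosen so it is never zero); they are not taken from the course material.

```java
// Sketch of double hashing with a prime table size (all constants are assumptions).
public class DoubleHashDemo {
    static final int M = 11; // table size: a prime number

    static int h1(int key) { return key % M; }

    // The second hash must never return 0, otherwise probing would not move.
    // A common textbook choice (assumed here): 7 - (key mod 7).
    static int h2(int key) { return 7 - (key % 7); }

    // Position examined on the i-th probe for this key.
    static int probe(int key, int i) {
        return (h1(key) + i * h2(key)) % M;
    }

    public static void main(String[] args) {
        // Probe sequence for key 24: h1(24)=2, h2(24)=4, so 2, 6, 10, 3, ...
        for (int i = 0; i < 4; i++) {
            System.out.println(probe(24, i));
        }
    }
}
```

Because the step size h2(key) depends on the key, two keys that collide on their first probe generally follow different probe sequences, which is why double hashing avoids the clustering seen with linear and quadratic probing.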
Rehashing
If the load factor goes over the safe limit, we should increase the
size of the hash table (as for dynamic arrays). This process is called
rehashing.
Comments:
we cannot just double the size of the table, as the
size should be a prime number;
growing the table changes the main hash function;
it is not enough to simply copy items, since each item must be
re-inserted using the new hash function;
rehashing takes O(N) time.
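The comments above can be sketched as follows. This is an illustrative fragment, not the course's implementation: it grows an integer table to the next prime at least double the old size and re-inserts every item with the new hash function (using linear probing as an assumed collision strategy).

```java
import java.util.Arrays;

// Rehashing sketch: grow to a prime >= 2 * old size, then re-insert all items.
public class RehashDemo {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime >= n, so the new table size stays prime.
    static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    // Re-insert every item into the larger table: O(N) work overall.
    static Integer[] rehash(Integer[] old) {
        Integer[] bigger = new Integer[nextPrime(2 * old.length)];
        for (Integer key : old) {
            if (key == null) continue;
            int i = key % bigger.length;                 // new main hash function
            while (bigger[i] != null) {                  // linear probing (assumed)
                i = (i + 1) % bigger.length;
            }
            bigger[i] = key;
        }
        return bigger;
    }

    public static void main(String[] args) {
        Integer[] table = {11, 23, null, 14, null};      // old table, size 5
        Integer[] grown = rehash(table);
        System.out.println(grown.length);                // nextPrime(10) = 11
        System.out.println(Arrays.toString(grown));
    }
}
```

Note that simply copying the array would leave items in the wrong slots, because `key % bigger.length` differs from `key % old.length`; the re-insertion loop is what makes rehashing cost O(N).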
Dealing with Collisions (2nd approach):
Separate Chaining
In separate chaining the hash table consists
of an array of lists.
When a collision occurs, the new record is
added to the appropriate list.
Deletion is straightforward, as the record can
simply be removed from its list.
Finally, separate chaining is less sensitive to
the load factor, and it is normal to aim for a load
factor of around 1 (though it will also work for
load factors above 1).
Figure: Separate chaining (using linked lists).
If an array-based implementation of the lists is used, these arrays are called buckets.
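A minimal separate-chaining sketch, using Java's built-in `LinkedList` for the chains. This is an illustrative stand-in, not the course's `SCHashTable`; the class name, fixed integer keys, and hash function are assumptions.

```java
import java.util.LinkedList;

// Separate chaining sketch: an array of linked lists, one chain per slot.
public class ChainDemo {
    private final LinkedList<Integer>[] table;

    @SuppressWarnings("unchecked")
    ChainDemo(int size) {
        table = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            table[i] = new LinkedList<>();
        }
    }

    private int hash(int key) { return Math.floorMod(key, table.length); }

    // On a collision the record is simply appended to the slot's list.
    void insert(int key) { table[hash(key)].add(key); }

    boolean contains(int key) { return table[hash(key)].contains(key); }

    // Deletion just removes the record from its list.
    void delete(int key) { table[hash(key)].remove((Integer) key); }

    public static void main(String[] args) {
        ChainDemo t = new ChainDemo(7);
        t.insert(3);
        t.insert(10);                        // 3 and 10 collide at index 3
        System.out.println(t.contains(10));  // true: both live in the same chain
        t.delete(3);
        System.out.println(t.contains(3));   // false after deletion
        System.out.println(t.contains(10));  // true: 10 is unaffected
    }
}
```

Unlike open addressing, nothing special is needed on deletion (no "deleted" markers), and the table keeps working when chains hold more than one item, which is why load factors around or above 1 remain acceptable.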
Implementation – data members
public class SCHashTable<T extends KeyedItem>
             implements HashTableInterface<T>
{
    private List<T>[] table; // one list (chain) per table slot