Hash Function
A hash function is an algorithm that generates a hash value for each block of data (a
string of characters, an object in object-oriented programming, etc.). The hash value acts as a key that
distinguishes data blocks. Two different blocks can still produce the same hash value, which is called a
collision. Collisions are accepted as unavoidable, and hash algorithms are designed to minimize them.
Hash functions are often used in hash tables to reduce the computational cost of finding a block of
data in a set, because comparing hashes is faster than comparing large blocks of data.
Hash table
In computing, a hash table (hash map) is a data structure that implements an associative array
abstract data type, a structure that can map keys to values. A hash table uses a hash function to
compute an index, also called a hash code, into an array of buckets or slots, from which the desired
value can be found. During lookup, the key is hashed and the resulting hash indicates where the
corresponding value is stored.
Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ
an imperfect hash function, which might cause hash collisions where the hash function generates the
same index for more than one key. Such collisions are typically accommodated in some way.
In many situations, hash tables turn out to be on average more efficient than search trees or any
other table lookup structure. For this reason, they are widely used in many kinds of computer
software, particularly for associative arrays, database indexing, caches, and sets.
The division method
The division method maps a key k into one of m slots by taking the
remainder of k divided by m, as expressed in the hash function
h(k) = k mod m.
For example, if the hash table has size m = 12 and the key is k = 100, then h(k) = 4,
because 100 = 8 * 12 + 4.
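The division method can be sketched in a few lines. This is a minimal illustration that assumes integer keys; the function name division_hash is chosen here for the example and does not come from the text.

```python
def division_hash(k: int, m: int) -> int:
    """Map integer key k to one of m slots using h(k) = k mod m."""
    if m <= 0:
        raise ValueError("m must be a positive table size")
    return k % m


# The worked example from the text: m = 12, k = 100.
print(division_hash(100, 12))  # 4
```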
Mid-Square hashing
Mid-square hashing is a hashing technique that derives keys from the middle digits of a squared
value. A seed value is taken and squared, and some digits are extracted from the middle of the
result. The extracted digits form a number that is taken as the new seed. This technique can
generate keys with high randomness if a big enough seed value is taken. It has one limitation,
however: squaring a 6-digit seed gives a result of up to 12 digits, which exceeds the range of a
32-bit int, so overflow must be handled. Use the long long int data type, or carry out the
multiplication on digit strings if overflow can still occur. The chance of a collision in mid-square
hashing is low but not zero, so any collision that does occur must still be handled by the hash
map.
Example:
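A minimal sketch of one mid-square step, assuming the seed has a fixed number of decimal digits and that the same number of digits is taken from the middle of the square. The function name mid_square and the zero-padding rule are assumptions made for this illustration. Python integers do not overflow, so the overflow concern above applies only to fixed-width types such as int in C or C++.

```python
def mid_square(seed: int, digits: int) -> int:
    """Square the seed and return the middle `digits` digits of the square."""
    # Pad the square to 2 * digits characters so the middle is well defined.
    squared = str(seed * seed).zfill(2 * digits)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


# 1234 * 1234 = 1522756, padded to "01522756"; the middle four digits are 5227.
first = mid_square(1234, 4)
# The extracted value becomes the next seed.
second = mid_square(first, 4)
print(first, second)  # 5227 3215
```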
Collision
Duplicate indices arise when the hash algorithm is poor, and almost no hash
algorithm is perfect enough to generate a unique key for every item once a large
amount of data is stored. To solve this problem, this topic uses a linked list to hold a
second layer of elements at each index.
Generating a hash can be seen as creating a hash key that selects a position in the
array of values. At that position there is a linked list whose elements contain the real
keys; once there, we use the real key to retrieve the value.
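The linked-list approach described above can be sketched as follows. This is an illustrative version, not a reference implementation: the class names Node and ChainedHashTable, the default table size, and the use of Python's built-in hash() are all assumptions made for the example.

```python
class Node:
    """One linked-list element holding the real key and its value."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size=8):
        self.buckets = [None] * size  # each entry is the head of a linked list

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        while node is not None:  # overwrite if the real key already exists
            if node.key == key:
                node.value = value
                return
            node = node.next
        self.buckets[i] = Node(key, value, self.buckets[i])  # prepend new node

    def get(self, key, default=None):
        node = self.buckets[self._index(key)]
        while node is not None:  # compare real keys along the chain
            if node.key == key:
                return node.value
            node = node.next
        return default


table = ChainedHashTable(size=4)
table.put(1, "one")
table.put(5, "five")  # 1 and 5 both map to index 1, so they share a chain
print(table.get(1), table.get(5))  # one five
```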
Hashing is an improvement over Direct Access Table. The idea is to use a hash
function that converts a given phone number or any other key to a smaller
number and uses the small number as the index in a table called a hash table.
Hash Function: A function that converts a given big number or string to a small
practical integer value. The mapped integer value is used as an index in the
hash table.
This section discusses one collision-resolution technique, quadratic probing.
Quadratic Probing: Quadratic probing is an open-addressing scheme in which,
if the hash value of x collides in the hash table, the i-th iteration looks at the
slot i^2 positions past the home slot, that is, slot (hash(x) + i^2) mod m for a
table of m slots.
How is Quadratic Probing done?
Let hash(x) be the slot index computed using the hash function.
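A minimal sketch of insertion and search with quadratic probing, assuming integer keys, the hash function hash(x) = x mod m, and a table represented as a Python list with None marking an empty slot. The function names and the table size of 7 are assumptions made for this example.

```python
def quadratic_insert(table, key):
    """Insert key by probing (home + i*i) % m; return the slot used, or None."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    return None  # no empty slot found along the probe sequence


def quadratic_search(table, key):
    """Follow the same probe sequence until the key or an empty slot is met."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i * i) % m
        if table[slot] is None:
            return False
        if table[slot] == key:
            return True
    return False


table = [None] * 7
slots = [quadratic_insert(table, k) for k in (22, 30, 50)]
# 22 goes to slot 1 and 30 to slot 2. 50 also hashes to 1, finds slot
# (1 + 1) % 7 = 2 taken, and lands in slot (1 + 4) % 7 = 5.
print(slots)  # [1, 2, 5]
```

Quadratic probing does not always visit every slot, so an insert can fail even when the table still has free space; the sketch returns None in that case.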
The following operations keep each collision chain inside the table itself, linking slots through a
NEXT field (a scheme commonly called coalesced hashing):
1. INSERT(key): If the slot given by the hash value of the key is empty, the key is inserted there.
Otherwise the key is inserted in the first empty place from the bottom of the hash table, and the
address of that place is stored in the NEXT field of the last node of the chain (see the example
below).
2. DELETE(key): The key, if present, is deleted. If the deleted node contains the address of
another node in the hash table, that address is stored in the NEXT field of the node that pointed
to the deleted node.
3. SEARCH(key): Returns True if the key is present, otherwise returns False.
The best-case complexity of all these operations is O(1) and the worst-case complexity is O(n),
where n is the total number of keys. The scheme uses less memory than separate chaining because it
stores colliding elements in the hash table's own memory instead of creating a new linked list.
Illustration:
n = 10
Input: {20, 35, 16, 40, 45, 25, 32, 37, 22, 55}
Hash function: h(key) = key % 10
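The INSERT and SEARCH operations described above can be traced on this input with a short sketch. It assumes the table is stored as two parallel lists (keys and NEXT addresses) and that a colliding key goes into the first empty slot counted from the bottom of the table; the class name CoalescedHashTable is chosen for the example, and DELETE is left out to keep the sketch short.

```python
class CoalescedHashTable:
    def __init__(self, size):
        self.keys = [None] * size  # stored keys, None marks an empty slot
        self.next = [None] * size  # NEXT field: index of the next chain node

    def insert(self, key):
        size = len(self.keys)
        home = key % size
        if self.keys[home] is None:
            self.keys[home] = key
            return home
        tail = home  # walk to the last node of the chain
        while self.next[tail] is not None:
            tail = self.next[tail]
        for slot in range(size - 1, -1, -1):  # first empty place from the bottom
            if self.keys[slot] is None:
                self.keys[slot] = key
                self.next[tail] = slot
                return slot
        raise OverflowError("hash table is full")

    def search(self, key):
        slot = key % len(self.keys)
        while slot is not None and self.keys[slot] is not None:
            if self.keys[slot] == key:
                return True
            slot = self.next[slot]
        return False


table = CoalescedHashTable(10)
for key in [20, 35, 16, 40, 45, 25, 32, 37, 22, 55]:
    table.insert(key)
print(table.keys)  # [20, 55, 32, 22, 37, 35, 16, 25, 45, 40]
```

For instance, 40 collides with 20 at slot 0 and is placed in slot 9, the first empty slot from the bottom, with slot 0's NEXT field set to 9.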
Hash Buckets
In computing, a hash table (hash map) is a data structure that provides virtually
direct access to objects based on a key (a unique String or Integer). A hash table uses
a hash function to compute an index into an array of buckets or slots, from which the
desired value can be found. Here are the main features of the key used:
The key can be an SSN, a telephone number, an account number, etc.
Keys must be unique.
Each key is associated with (mapped to) a value.
Hash buckets are used to apportion data items for sorting or lookup purposes. The
aim is to shorten the linked lists so that a specific item can be found in a shorter
time.
Perfect Hash Functions
Consider the table in which the keys are stored using linear probing. Suppose we
delete A4 and then try to find B4. When searching for B we hash it to
position 4, see that this position is now empty, and conclude that B4 is not found,
which is not true: B4 is still stored in a later slot.
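The failure can be reproduced with a small sketch. It assumes keys "A" and "B" both hash to position 4 in a table of 8 slots, and it passes the home position in explicitly rather than computing a real hash. The DELETED marker shown at the end is one common remedy (lazy deletion), included only for contrast; the text above does not prescribe it.

```python
EMPTY = object()    # slot that has never held a key
DELETED = object()  # marker left behind by a lazy delete (assumed remedy)


def insert(table, key, home):
    """Linear probing: take the first free slot at or after the home position."""
    m = len(table)
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is EMPTY or table[slot] is DELETED:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


def search(table, key, home):
    """Probe forward from home; an EMPTY slot ends the search."""
    m = len(table)
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is EMPTY:
            return False
        if table[slot] == key:
            return True
    return False


table = [EMPTY] * 8
insert(table, "A", 4)  # A lands in slot 4
insert(table, "B", 4)  # B collides with A and lands in slot 5

naive = list(table)
naive[4] = EMPTY       # deleting A by simply emptying its slot
print(search(naive, "B", 4))  # False, although B is still in slot 5

lazy = list(table)
lazy[4] = DELETED      # deleting A by leaving a marker instead
print(search(lazy, "B", 4))   # True
```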
Extendible Hashing
Extendible Hashing is a dynamic hashing method in which directories and buckets
are used to hash data. It is a highly flexible method in which the hash
function also changes dynamically.
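A compact sketch of how a directory and buckets can interact. It rests on several assumptions not stated in the text: the directory is indexed by the low-order bits of Python's hash(), each bucket holds at most bucket_capacity entries, and the depth counters follow the usual global-depth and local-depth bookkeeping. All class and attribute names are chosen for this illustration.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket uses
        self.items = {}


class ExtendibleHashTable:
    def __init__(self, bucket_capacity=2):
        self.bucket_capacity = bucket_capacity
        self.global_depth = 1  # number of hash bits the directory uses
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        bucket = self.directory[self._index(key)]
        if key in bucket.items or len(bucket.items) < self.bucket_capacity:
            bucket.items[key] = value
            return
        if bucket.local_depth == self.global_depth:
            # Directory doubling: the hash function now uses one more bit.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the full bucket on its next hash bit.
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v
        self.put(key, value)  # retry; may split again if still full


table = ExtendibleHashTable(bucket_capacity=2)
for k in range(16):
    table.put(k, k * k)
print(table.global_depth, table.get(12))  # 3 144
```

Only the overflowing bucket is split, and the directory doubles only when that bucket already uses as many bits as the directory does. Keys with identical hash values beyond the bucket capacity are not handled in this sketch.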
Main features of Extendible Hashing: The main features in this hashing technique
are: