Hash Function
A hash function is any function that can be used to map data of arbitrary size to fixed-size
values.
The values returned by a hash function are called hash values, hash codes, digests, or simply
hashes.
The values are usually used to index a fixed-size table called a hash table.
Definition: A hash function is a function that takes inputs of any arbitrary size and maps
them to positions in a table or other data structure of fixed size.
Hashing refers to taking an arbitrary amount of input data (any data – word document, audio
file, video file, executable file, etc.) and applying a hashing algorithm to it.
The algorithm generates seemingly random output data called the ‘hash’ or ‘hash value’. This
hash value is also known as a message digest.
There are two important things to note, however. First, hashing is a one-way function: it can
be applied to any input data to generate a hash value, but applying a hash function to a hash
value will not reveal the input data. Second, hashing always produces a fixed-length hash
value, irrespective of the length or size of the input.
A Few Examples
In examples of this kind, the hash is different for different inputs. Also, the input length
varies, but the size of the hash value remains the same.
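The fixed-length property can be checked with a short sketch. It assumes SHA-256 from Python's standard hashlib module as a stand-in, since these notes do not name a specific algorithm; the helper name sha256_hex and the sample inputs are illustrative only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Illustrative inputs of very different lengths (not taken from the notes).
inputs = [b"", b"hello", b"hello world" * 1000]

for data in inputs:
    digest = sha256_hex(data)
    # The digest is always 64 hex characters (256 bits), whatever the input length.
    print(len(data), len(digest), digest[:16] + "...")
```

Each line printed shows a different input length next to the same digest length, and the digests themselves differ from one input to the next.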
The division method is probably the most common type of hash function. It uses the remainder
of a division to generate the values for the corresponding keys.
Function:
h(k) = k mod m
or in more general terms
h(k) = (ak + b) mod m
where a and b are constants, k is the key, and m is the size of our hash table.
We should choose a table size m that is a prime and not close to a power of 2. With a poorly
chosen m, the method does not work as desired if there are patterns in the input data.
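A minimal sketch of the two formulas follows. The function names, the sample keys, and the constants a = 3 and b = 5 are assumptions chosen for illustration; the keys are all multiples of 8 to show how a pattern in the input interacts with the choice of m.

```python
def division_hash(k: int, m: int) -> int:
    """h(k) = k mod m"""
    return k % m


def linear_hash(k: int, m: int, a: int, b: int) -> int:
    """h(k) = (a*k + b) mod m, with a and b constants."""
    return (a * k + b) % m


# Patterned input: every key is a multiple of 8 (illustrative values).
keys = [8, 16, 24, 32, 40, 48]

# m = 8 is a power of 2, so every key lands in slot 0.
print([division_hash(k, 8) for k in keys])   # [0, 0, 0, 0, 0, 0]

# m = 13 is prime, and the same keys spread over distinct slots.
print([division_hash(k, 13) for k in keys])  # [8, 3, 11, 6, 1, 9]

# General form with assumed constants a = 3, b = 5.
print(linear_hash(10, 13, a=3, b=5))         # 9
```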
Example:
Calculating a Hash Table:
Formal definitions of hash functions vary from application to
application.
Let’s take a simple example by taking each number mod 10, and
putting it into a hash table that has 10 slots.
We take each value, apply the hash function to it, and the result tells
us what slot to put that value in, with the left column denoting the
slot, and the right column denoting what value is in that slot, if any.
Our hash function here is to take each value mod 10. The accompanying table shows the
resulting hash table. We hash the values in the order we receive them, so the first value
we hash is the first value in the sequence, and the last value we hash is the last value in
the sequence.
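The procedure just described can be sketched as follows. The input values are hypothetical, since the original sequence is not reproduced in these notes, and were picked so that no two share a slot; the function name build_table is also an assumption.

```python
def build_table(values, size=10):
    """Place each value in slot (value mod size), in the order received."""
    table = [None] * size
    for v in values:
        slot = v % size
        # This sketch has no collision handling: it assumes each slot is hit once.
        if table[slot] is not None:
            raise ValueError(f"slot {slot} already holds {table[slot]}")
        table[slot] = v
    return table


# Hypothetical input sequence with distinct last digits.
values = [21, 43, 15, 70, 38, 96]
table = build_table(values)

# Left column: slot number. Right column: value stored there, if any.
for slot, value in enumerate(table):
    print(slot, "" if value is None else value)
```

With these values, slots 0, 1, 3, 5, 6, and 8 are filled and the rest stay empty.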
Definition: A collision occurs when more than one value hashed by a particular hash function hashes
to the same slot in the table or data structure (hash table) being generated by the hash function.
Let’s take the exact same hash function from before: take the
value to be hashed mod 10, and place it in that slot in the hash
table.
Open Hashing, or the Chaining method, creates an external chain of the values that have the same
index. The chain is built from that position as a linked list. Collisions are resolved by storing
multiple values together at that same index.
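A minimal sketch of chaining, using the same mod 10 hash function. For brevity it uses Python lists as the chains rather than a hand-written linked list; the class name, method names, and sample values are assumptions for illustration.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by keeping a chain per slot."""

    def __init__(self, size=10):
        self.size = size
        # One (initially empty) chain for each slot.
        self.slots = [[] for _ in range(size)]

    def insert(self, value):
        chain = self.slots[value % self.size]
        if value not in chain:  # ignore duplicates
            chain.append(value)

    def contains(self, value):
        # Only the chain at the value's own index has to be searched.
        return value in self.slots[value % self.size]


table = ChainedHashTable()
for v in [12, 22, 35, 42]:  # 12, 22 and 42 all hash to slot 2
    table.insert(v)

print(table.slots[2])      # [12, 22, 42]
print(table.slots[5])      # [35]
print(table.contains(22))  # True
print(table.contains(32))  # False
```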
Closed Hashing, or Open Addressing, tries to use the empty indexes in a hash table to handle
collisions. In this method, the hash table must have at least as many slots as there are keys,
since every element is stored in the table itself.
Open Addressing
Open addressing or closed hashing is the second most used method to resolve collision.
This method aims to keep all the elements in the same table and tries to find empty slots for
values.
The name closed hashing refers to the fact that the values always stay stored in the hash table.
The name open addressing refers to the fact that the location of a value is not fixed: if a
collision happens, the value can be placed in another, empty slot.
This method resolves collisions by probing or searching through the hash table for indexes that
are available for storing elements.
Unlike open hashing or chaining, open addressing stores one value in each index. The basic
operations of this method are adding, removing, and finding an element.
There are different approaches to probing, of which the best known are:
Linear Probing
Quadratic Probing
Double Hashing
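The three probing strategies can be compared with a short sketch. These notes do not give the probe formulas, so the code assumes the common textbook forms: (h(k) + i) mod m for linear probing, (h(k) + i^2) mod m for quadratic probing, and (h(k) + i * h2(k)) mod m for double hashing, with h(k) = k mod m and an assumed second hash h2(k) = 1 + (k mod (m - 1)). All function names and sample keys are illustrative.

```python
def linear_probe(k, i, m):
    """i-th slot to try with linear probing."""
    return (k % m + i) % m


def quadratic_probe(k, i, m):
    """i-th slot to try with quadratic probing."""
    return (k % m + i * i) % m


def double_hash_probe(k, i, m):
    """i-th slot to try with double hashing (assumed second hash)."""
    step = 1 + (k % (m - 1))  # never zero, so the probe always moves
    return (k % m + i * step) % m


def insert(table, k, probe):
    """Store k in the first empty slot of its probe sequence; return the slot."""
    m = len(table)
    for i in range(m):
        slot = probe(k, i, m)
        if table[slot] is None:
            table[slot] = k
            return slot
    raise RuntimeError("no empty slot found in the probe sequence")


def find(table, k, probe):
    """Return the slot holding k, or None if an empty slot is reached first."""
    m = len(table)
    for i in range(m):
        slot = probe(k, i, m)
        if table[slot] is None:
            return None
        if table[slot] == k:
            return slot
    return None


# 12, 23 and 34 all hash to slot 1 when m = 11, so each insert after the
# first has to probe for an empty slot.
for probe in (linear_probe, quadratic_probe, double_hash_probe):
    table = [None] * 11
    print(probe.__name__, [insert(table, k, probe) for k in (12, 23, 34)])
# linear_probe [1, 2, 3]
# quadratic_probe [1, 2, 5]
# double_hash_probe [1, 5, 6]
```

Removal is left out of the sketch: simply emptying a slot would cut the probe sequence of any value stored past it, so a real implementation needs some extra marker for deleted slots.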