
Hash Function


Hash function

A hash function is an algorithm that generates a hash value for each block of data (a string of
characters, an object in object-oriented programming, etc.). The hash value acts as a key to
distinguish data blocks. Different blocks can still produce the same hash value (key duplication, or
a collision); this is accepted, and the algorithm is tuned to keep such collisions to a minimum. Hash
functions are often used in hash tables to reduce the computational cost of finding a block of data
in a set, because comparing hashes is faster than comparing large blocks of data.

Hash table
In computing, a hash table (hash map) is a data structure that implements an associative array
abstract data type, a structure that can map keys to values. A hash table uses a hash function to
compute an index, also called a hash code, into an array of buckets or slots, from which the desired
value can be found. During lookup, the key is hashed and the resulting hash indicates where the
corresponding value is stored.

Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ
an imperfect hash function, which might cause hash collisions where the hash function generates the
same index for more than one key. Such collisions are typically accommodated in some way.

In many situations, hash tables turn out to be on average more efficient than search trees or any
other table lookup structure. For this reason, they are widely used in many kinds of computer
software, particularly for associative arrays, database indexing, caches, and sets.
The division method
The division method involves mapping a key k into one of m slots by taking the
remainder of k divided by m as expressed in the hash function

h(k) = k mod m .
For example, if the hash table has size m = 12 and the key is k = 100, then h(k) = 4.
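The division method can be sketched in a few lines of Python. The function name `division_hash` is only an illustrative choice, and the sketch assumes integer keys.

```python
def division_hash(k: int, m: int) -> int:
    """Map integer key k to one of m slots using h(k) = k mod m."""
    if m <= 0:
        raise ValueError("table size m must be positive")
    return k % m


print(division_hash(100, 12))  # prints 4, matching the example above
```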

Folding Method in Hashing


Algorithm:
• The folding method builds a hash function by first dividing the item into equal-sized pieces
(the last piece may be shorter than the others).
• The pieces are added together to give the hash value, H(x) = (a + b + c) mod M, where a, b, and c
are the three parts of the preconditioned key, M is the table size, and mod is the modulo
operation.
• In other words, the sum of the three parts of the preconditioned key is divided by the table size,
and the remainder is the hash value.
Explanation:
Example 1: The task is to fold the key 123456789 into a hash table of ten slots (0 through 9).
• The key X is 123456789 and the table size is M = 10.
• X can be broken into three parts in any way; here it is divided evenly.
• Therefore, a = 123, b = 456, c = 789.
• Now, H(x) = (a + b + c) mod M, i.e., H(123456789) = (123 + 456 + 789) mod 10 =
1368 mod 10 = 8.
• Hence, 123456789 is inserted into the table at address 8.
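The folding steps above can be sketched as follows. This is a minimal illustration, assuming a non-negative integer key that is split into decimal pieces of `piece_len` digits; the name `folding_hash` and the `piece_len` parameter are hypothetical and do not come from the text.

```python
def folding_hash(key: int, table_size: int, piece_len: int = 3) -> int:
    """Split the decimal digits of key into pieces, add them, reduce mod table_size."""
    digits = str(key)
    # The last piece may be shorter when len(digits) is not a multiple of piece_len.
    pieces = [int(digits[i:i + piece_len]) for i in range(0, len(digits), piece_len)]
    return sum(pieces) % table_size


print(folding_hash(123456789, 10))  # (123 + 456 + 789) mod 10 = 8
```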

Mid-Square hashing
Mid-square hashing is a hashing technique that generates keys by repeated squaring. A seed value is
taken and squared, and some digits are extracted from the middle of the square. The extracted digits
form a number that is taken as the new seed. This technique can generate keys with high randomness if
a big enough seed value is taken. However, it has a limitation: if a 6-digit seed is squared, the
square can have up to 12 digits, which exceeds the range of a 32-bit int data type, so overflow must
be taken care of. To avoid overflow, use the long long int data type, or perform the multiplication
on strings if overflow still occurs. The chance of a collision in mid-square hashing is low but not
zero, so when a collision does occur it has to be handled, for example by the collision handling of
a hash map.
Example:

Suppose a 4-digit seed is taken: seed = 4765.
The square of the seed is 4765 * 4765 = 22705225.
From this 8-digit number, four digits are extracted (say, the middle four).
So, the new seed value becomes seed = 7052.
The square of this new seed is 7052 * 7052 = 49730704.
Again, the same set of four digits is extracted.
So, the new seed value becomes seed = 7307.
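One step of the mid-square procedure might be sketched as below. The helper name `mid_square_next` is hypothetical, and padding the square with leading zeros to twice the seed width is an assumption of this sketch. Python integers do not overflow, so the overflow handling mentioned above is not needed here.

```python
def mid_square_next(seed: int, digits: int = 4) -> int:
    """Square the seed and return the middle `digits` digits as the next seed."""
    # Pad with leading zeros so the square always has 2 * digits characters.
    squared = str(seed * seed).zfill(2 * digits)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


first = mid_square_next(4765)    # 22705225 -> 7052
second = mid_square_next(first)  # 49730704 -> 7307
print(first, second)
```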

Collision
Duplicate index cases (collisions) arise when the hash algorithm is not good, and almost no
hash algorithm is perfect enough to generate a unique key for every item when storing a
large amount of data. To solve the problem, in this topic we use a linked list to store one
more layer of elements for each index.

In other words, generating a hash creates a hash key that selects a position in the array
of values. Each position holds a linked list whose elements contain the real keys; once we
reach that list, we use the real key to retrieve the value.
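The idea of keeping a list of (real key, value) pairs at each index can be sketched as follows. This is only an illustration: the class name `ChainedHashTable` is hypothetical, Python lists stand in for the linked lists, and Python's built-in `hash` is assumed as the hash function.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining entries at each index."""

    def __init__(self, size: int = 8):
        # One chain (here a Python list) per array position.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key) -> int:
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:      # real key already stored: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: extend the chain

    def get(self, key, default=None):
        # Walk the chain and compare real keys, not just hash values.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(size=4)
table.put(1, "a")
table.put(5, "b")  # 1 and 5 both map to index 1 when size is 4
print(table.get(1), table.get(5))
```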

Quadratic Probing in Hashing

Hashing is an improvement over Direct Access Table. The idea is to use a hash
function that converts a given phone number or any other key to a smaller
number and uses the small number as the index in a table called a hash table.
Hash Function: A function that converts a given big number to a small
practical integer value. The mapped integer value is used as an index in the
hash table. In simple terms, a hash function maps a big number or string to a
small integer that can be used as an index in the hash table.
In this article, the collision resolution technique of quadratic probing is discussed.
Quadratic Probing: Quadratic probing is an open-addressing scheme where
we look for the i^2-th slot in the i-th iteration if the given hash value x collides in the
hash table.
How is Quadratic Probing done?
Let hash(x) be the slot index computed using the hash function.

If the slot hash(x) % S is full, then we try (hash(x) + 1*1) % S.
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S.
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.
This process is repeated for increasing values of i until an empty slot is found.
For example: Let us consider a simple hash function as “key mod 7” and
sequence of keys as 50, 700, 76, 85, 92, 73, 101.
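The probing rule can be traced on this example with a short sketch. The function name `quadratic_probe_insert` is hypothetical, and giving up after S attempts is an assumption of this sketch (quadratic probing is not guaranteed to visit every slot).

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key using hash(x) = key mod S and probes (hash(x) + i*i) % S."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty slot found along the probe sequence")


table = [None] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    quadratic_probe_insert(table, key)

print(table)  # [700, 50, 85, 73, 101, 92, 76]
```

Here 85 and 92 both collide with 50 at slot 1 and move to slots 2 and 5, while 101 collides with 73 at slot 3 and moves to slot 4.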
Coalesced hashing
Coalesced hashing is a collision resolution technique for fixed-size data. It is a combination of
separate chaining and open addressing. It uses the concept of open addressing (linear probing) to
find the first empty place for a colliding element, searching from the bottom of the hash table, and
the concept of separate chaining to link the colliding elements to each other through pointers. The
hash function used is h = (key) % (total number of keys). Inside the hash table, each node has three
fields:
• h(key): The value of the hash function for a key.
• Data: The key itself.
• Next: The link to the next colliding element.
The basic operations of Coalesced hashing are:

1. INSERT(key): Inserts the key at the slot given by its hash value if that slot is empty. Otherwise
the key is inserted in the first empty place from the bottom of the hash table, and the address
of this place is stored in the NEXT field of the last node of the chain (explained in the
example below).
2. DELETE(key): Deletes the key if it is present. If the node to be deleted contains the address of
another node in the hash table, this address is stored in the NEXT field of the node pointing to
the node being deleted.
3. SEARCH(key): Returns True if the key is present, otherwise returns False.
The best-case complexity of all these operations is O(1) and the worst-case complexity is O(n),
where n is the total number of keys. Its advantage over separate chaining is that it stores the
colliding elements in the memory of the hash table itself instead of creating a new linked list.
Illustration:
Example:

n = 10
Input: {20, 35, 16, 40, 45, 25, 32, 37, 22, 55}
Hash function: h(key) = key % 10
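The INSERT operation can be traced on this input with the sketch below. It is an illustration under stated assumptions: the Data and Next fields are kept in two parallel lists named `data` and `nxt`, the function name `coalesced_insert` is hypothetical, and duplicate keys are not checked.

```python
def coalesced_insert(data: list, nxt: list, key: int) -> int:
    """Insert key into a coalesced hash table stored as parallel lists."""
    size = len(data)
    home = key % size
    if data[home] is None:
        data[home] = key
        return home
    # Collision: find the first empty place, scanning from the bottom of the table.
    free = next((i for i in range(size - 1, -1, -1) if data[i] is None), None)
    if free is None:
        raise RuntimeError("hash table is full")
    # Walk to the last node of the chain that starts at the home slot.
    tail = home
    while nxt[tail] is not None:
        tail = nxt[tail]
    data[free] = key
    nxt[tail] = free  # link the new node to the end of the chain
    return free


data = [None] * 10
nxt = [None] * 10
for key in [20, 35, 16, 40, 45, 25, 32, 37, 22, 55]:
    coalesced_insert(data, nxt, key)

print(data)  # [20, 55, 32, 22, 37, 35, 16, 25, 45, 40]
print(nxt)   # [9, None, 3, None, 1, 8, None, 4, 7, None]
```

In this trace the chain starting at slot 5 runs 5, 8, 7, 4, 1: key 37 hashes to slot 7, which is already used by the chain for 5, so the two chains coalesce.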

Hash Buckets
In computing, a hash table (hash map) is a data structure that provides virtually
direct access to objects based on a key (a unique string or integer). A hash table uses
a hash function to compute an index into an array of buckets or slots, from which the
desired value can be found. Here are the main features of the key used:

• The key can be an SSN, a telephone number, an account number, etc.
• Keys must be unique.
• Each key is associated with (mapped to) a value.

Hash buckets are used to apportion data items for sorting or lookup purposes. The
aim is to shorten the linked lists so that searching for a specific item can be
completed in a shorter time.
Perfect Hash Functions
Consider a table in which the keys are stored using linear probing. Suppose keys A4 and B4
both hash to position 4, so that B4 was stored in a later slot, and we delete A4 and then
try to find B4. When searching for B4 we hash it to position 4, see that this position is
empty, and conclude that B4 is not found (which is not true).
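The deletion problem can be reproduced with a small sketch. The integer keys 14 and 24 are stand-ins chosen for this illustration (both give 4 with key mod 10), and the function names are hypothetical.

```python
EMPTY = None


def insert(table: list, key: int) -> int:
    """Linear probing insert with hash(key) = key mod table size."""
    size = len(table)
    for step in range(size):
        slot = (key + step) % size
        if table[slot] is EMPTY:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


def search(table: list, key: int) -> bool:
    size = len(table)
    for step in range(size):
        slot = (key + step) % size
        if table[slot] is EMPTY:
            return False  # an empty slot ends the probe sequence
        if table[slot] == key:
            return True
    return False


table = [EMPTY] * 10
insert(table, 14)                 # plays the role of A4: stored in slot 4
insert(table, 24)                 # plays the role of B4: slot 4 is taken, stored in slot 5
found_before = search(table, 24)  # True
table[4] = EMPTY                  # naive deletion of the first key
found_after = search(table, 24)   # False, although 24 is still in slot 5
print(found_before, found_after)
```

A common remedy is to mark a deleted slot with a special "deleted" marker (a tombstone) instead of making it empty, so that searches continue past it.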

In computer science, a perfect hash function h for a set S is a hash function that
maps distinct elements in S to a set of m integers, with no collisions. In mathematical
terms, it is an injective function.
Perfect hash functions may be used to implement a lookup table with constant
worst-case access time.
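For a small, fixed set of integer keys, one simple way to obtain such a function is to search for a table size m for which key mod m has no collisions. This brute-force search is only an illustration, not a standard construction, and the name `find_perfect_modulus` is hypothetical.

```python
def find_perfect_modulus(keys) -> int:
    """Return the smallest m >= len(keys) such that k -> k % m is injective on keys."""
    keys = set(keys)
    if not keys:
        raise ValueError("keys must not be empty")
    m = len(keys)
    while True:
        # Injective on keys means every key gets a different slot.
        if len({k % m for k in keys}) == len(keys):
            return m
        m += 1


m = find_perfect_modulus([3, 6, 12])
print(m)  # 4: the slots are 3, 2, 0, while m = 3 would map every key to 0
```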

Extendible Hashing
Extendible Hashing is a dynamic hashing method wherein directories and buckets
are used to hash data. It is a highly flexible method in which the hash
function itself also changes dynamically.

Main features of Extendible Hashing:

Directories: The directories store addresses of the buckets in pointers. An id is
assigned to each directory, which may change each time Directory Expansion
takes place.
Buckets: The buckets are used to hash the actual data.
Basic Structure of Extendible Hashing:

Frequently used terms in Extendible Hashing:

• Directories: These containers store pointers to buckets. Each directory is given a
unique id, which may change each time expansion takes place. The hash
function returns this directory id, which is used to navigate to the appropriate
bucket. Number of Directories = 2^Global Depth.
• Buckets: They store the hashed keys. Directories point to buckets. A bucket may
have more than one pointer to it if its local depth is less than the global
depth.
• Global Depth: It is associated with the directories. It denotes the number of
bits used by the hash function to categorize the keys. Global Depth =
Number of bits in directory id.
• Local Depth: It is the same as Global Depth except that it is associated with
the buckets and not the directories. Local depth, compared with the global
depth, is used to decide the action to be performed when an overflow occurs.
Local Depth is always less than or equal to the Global Depth.
• Bucket Splitting: When the number of elements in a bucket exceeds a particular
size, the bucket is split into two parts.
• Directory Expansion: Directory expansion takes place when a bucket overflows
and the local depth of the overflowing bucket is equal to the global depth.
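The terms above can be tied together in a compact sketch of insertion. Several choices here are assumptions of this illustration rather than part of the text: keys are integers, the directory id is taken from the least significant Global Depth bits of the key, each bucket holds at most `bucket_capacity` keys, and the class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHashTable:
    def __init__(self, bucket_capacity: int = 2):
        self.bucket_capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]  # 2^global_depth entries

    def _dir_id(self, key: int) -> int:
        # Directory id = the global_depth least significant bits of the key.
        return key & ((1 << self.global_depth) - 1)

    def contains(self, key: int) -> bool:
        return key in self.directory[self._dir_id(key)].keys

    def insert(self, key: int) -> None:
        bucket = self.directory[self._dir_id(key)]
        if key in bucket.keys:
            return
        if len(bucket.keys) < self.bucket_capacity:
            bucket.keys.append(key)
            return
        # Overflow: expand the directory only if local depth == global depth.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)  # retry; may trigger a further split

    def _split(self, bucket: Bucket) -> None:
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        # Directory entries whose newly significant bit is 1 now point to the sibling.
        for i, entry in enumerate(self.directory):
            if entry is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the old keys between the bucket and its sibling.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._dir_id(k)].keys.append(k)


table = ExtendibleHashTable(bucket_capacity=2)
for key in [0, 2, 4, 1, 3, 5]:
    table.insert(key)

print(table.global_depth)                      # 2
print([b.keys for b in table.directory])       # [[0, 4], [1, 5], [2], [3]]
```

Inserting 4 overflows a bucket whose local depth equals the global depth, so the directory doubles before the split. Inserting 5 overflows a bucket with local depth 1 while the global depth is 2, so only the bucket is split and the directory keeps its size.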
Cryptographic Hash Functions
