MCA Data Structures With Algorithms 14
14 Hashing
Names of Sub-Units
Hashing Table Organizations, Hashing: The Symbol Table, Hashing Functions, Static and Dynamic
Hashing, Collision-Resolution Techniques
Overview
This unit begins by discussing the concept of hashing and hash table organization. Next, the unit
discusses hashing in the symbol table. Further, the unit explains hashing functions and static
and dynamic hashing. Towards the end, the unit discusses collision-resolution techniques.
Learning Objectives
Learning Outcomes
14.1 INTRODUCTION
Hashing is a technique used for quick retrieval of the desired data from a large volume of data. It
is used when a record is stored at a particular address and that address is computed by applying a
formula, the hash function h, to the key (the primary key) of the record. The hash function is used
in the following manner:
a = h(k)
In this equation, a is the address obtained by applying the hash function to the key k of the record.
Ideally, the hash function should produce a unique address every time it is used. In practice this is
not feasible, because "collisions" occur frequently: the address computed for a record with key k1 may
already hold a record with key k2. Such colliding records are called synonyms, and collision-resolution
techniques are applied to resolve the conflict.
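To make synonyms concrete, here is a small Python sketch (Python is used for all examples in this unit). It groups keys by the address a = h(k) and reports every address that is shared by more than one key. The hash function `last_two_digits`, the helper `find_synonyms` and the sample keys are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def last_two_digits(k: int) -> int:
    # Hypothetical hash function for illustration: a = h(k) = k mod 100.
    return k % 100


def find_synonyms(keys, h):
    """Group keys by their address a = h(k); groups with more than one key are synonyms."""
    by_address = defaultdict(list)
    for k in keys:
        by_address[h(k)].append(k)
    return {a: ks for a, ks in by_address.items() if len(ks) > 1}


print(find_synonyms([1234, 5678, 9034, 4321, 1178], last_two_digits))
# {34: [1234, 9034], 78: [5678, 1178]}
```

Key 4321 gets address 21 on its own, so it is not reported; the other four keys collide in pairs and would need a collision-resolution technique.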
The domain of the hash function is the set of possible key values. If the keys have 3 digits, the domain
of the hash function is 0 to 999; if the keys have 5 digits, the domain of the hash function is 0 to 99999.
The range of the hash function is the set of storage addresses where the records are kept. If the array
that stores the records has 1000 addresses, the range of the hash function is 0 to 999. The hash function
should be chosen so that it distributes the keys uniformly over the range of the storage: if the total
capacity of the storage is n, a good hash function distributes the keys uniformly over the addresses
0 to n-1.
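Whether a hash function spreads keys uniformly over its range can be checked empirically. The sketch below counts how many of the n addresses are actually used and the largest number of keys sent to a single address. The helper `spread` and the key sample (10,000 keys from the domain 0 to 99999 that all end in 0) are assumptions for illustration; both runs use k mod n as the hash function.

```python
from collections import Counter


def spread(keys, h, n):
    """Return (number of addresses used, largest number of keys at one address)."""
    counts = Counter(h(k) for k in keys)
    if any(not 0 <= a < n for a in counts):
        raise ValueError("hash function produced an address outside 0..n-1")
    return len(counts), max(counts.values())


# Assumed sample: keys 0, 10, 20, ..., 99990 (domain 0..99999, all ending in 0).
keys = range(0, 100000, 10)

print(spread(keys, lambda k: k % 1000, 1000))  # (100, 100): only 100 of 1000 addresses used
print(spread(keys, lambda k: k % 997, 997))    # (997, 11): every address used
```

With a range of 1000 addresses, the trailing zero of every key means only addresses that are multiples of 10 are ever produced; with 997 addresses the same keys cover the whole range with at most 11 keys per address.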
UNIT 14: Hashing
A hash table is a data structure that stores data in an associative manner. The data is kept in an array
in which each entry has its own index. Retrieval is fast when we know the index of the desired data, so
search and insertion remain fast operations even as the amount of data grows. Hash tables keep data in
arrays and use the hashing technique to compute the index at which an element is inserted or found.
Division Method
This method is also called the divide-and-remainder method. Here, the key is divided by a number n
and the remainder is taken as the address.
Hence, the hash function is: h(k) = k mod n
This method produces addresses ranging from 0 to n-1. The value of n should be chosen carefully.
If it is a power of 10, say 100, then all keys having the same last two digits hash to the same
address. If n is an even integer, then even keys always hash to even addresses and odd keys to odd
addresses, so any imbalance between even and odd keys carries over into the table.
A good choice of n is a prime number, or at least a number that is not divisible by 2, 3, 5 or 10.
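The two pitfalls above can be seen directly in a few lines of Python. The function name `division_hash` and the sample keys are illustrative assumptions; the moduli 100, 50 and 97 stand for "a power of 10", "an even number" and "a prime" respectively.

```python
def division_hash(k: int, n: int) -> int:
    """Division (divide-and-remainder) method: h(k) = k mod n, giving 0..n-1."""
    return k % n


keys = [1207, 3407, 5607, 7807]  # assumed keys that share their last two digits

# n = 100 (a power of 10): only the last two digits matter, so all keys collide.
print([division_hash(k, 100) for k in keys])  # [7, 7, 7, 7]

# n = 97 (a prime): the same keys land on different addresses.
print([division_hash(k, 97) for k in keys])   # [43, 12, 78, 47]

# n = 50 (even): every address has the same parity as its key.
mixed = [12, 37, 104, 251]
print([division_hash(k, 50) for k in mixed])  # [12, 37, 4, 1]
```

A prime n does not rule out collisions in general; it only avoids the systematic ones caused by keys that share digits or parity.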
Data Structures with Algorithms
Folding Method
Suppose the hash address to be generated has d digits. In this method, the key is first divided into
groups of d digits, starting from the right. These groups are then added, and the last d digits of the
sum are taken as the hash address. For example, suppose the key is 12345678 and a 3-digit hash address
is desired. Starting from the right, the key 12345678 is divided into groups of 3 digits, and the three
groups are added to give the sum 1035, as shown below:
12/345/678
12 + 345 + 678 = 1035
The last three digits of the sum, i.e. 035 (or simply 35), form the hash address.
This method is flexible and can be modified as needed, for example by changing the group size d.
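The folding steps can be sketched in Python as follows. The function name `fold_hash` is hypothetical, and the sketch assumes a non-negative integer key.

```python
def fold_hash(key: int, d: int) -> int:
    """Folding method: split the key into d-digit groups from the right,
    add the groups, and keep the last d digits of the sum."""
    modulus = 10 ** d
    total = 0
    while key > 0:
        total += key % modulus   # take the rightmost d-digit group
        key //= modulus          # drop that group from the key
    return total % modulus       # last d digits of the sum


print(fold_hash(12345678, 3))  # 35  (678 + 345 + 12 = 1035, last three digits 035)
print(fold_hash(12345678, 2))  # 80  (78 + 56 + 34 + 12 = 180, last two digits 80)
```

Changing `d` changes both the group size and the size of the address space, which is one way the method can be adapted as described above.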
Code optimization: Uses the information in the symbol table for machine-dependent optimization.
Target code generation: Uses the identifier’s address information from the table to generate code.
Password verification is one of the most common applications of hashing. When the user enters a
password, its hash is computed and compared with the hash stored in the database. If the hashes match,
the user can log in; otherwise, the user must re-enter the password.
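The verification flow above can be sketched with Python's standard library. Note one deliberate substitution: the text describes comparing plain hashes, while this sketch uses a salted, deliberately slow hash (PBKDF2 via `hashlib.pbkdf2_hmac`), which is the usual practice for stored passwords. The iteration count and the helper names `hash_password` and `verify_password` are assumptions for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); both are stored in the database, never the password."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(entered: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the entered password with the stored salt and compare the two hashes."""
    _, digest = hash_password(entered, salt)
    return hmac.compare_digest(digest, stored_digest)  # constant-time comparison


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

The salt makes equal passwords produce different stored hashes, and `hmac.compare_digest` avoids leaking how many leading bytes matched.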
The following are some of the most commonly used hash functions:
MD: It stands for Message Digest. The family includes MD2, MD4, MD5 and MD6. MD2, MD4 and MD5
produce a 128-bit hash value, while MD6 supports variable output lengths of up to 512 bits.
SHA: It stands for Secure Hash Algorithm. It can be one of the following: SHA-0, SHA-1, SHA-2 or
SHA-3. The SHA-2 family includes versions such as SHA-224, SHA-256, SHA-384 and SHA-512.
RIPEMD: It stands for RACE Integrity Primitives Evaluation Message Digest. Widely used versions are
RIPEMD, RIPEMD-128 and RIPEMD-160; 256-bit and 320-bit versions are also available.
Whirlpool: It is a hash function based on a modified version of AES that produces a 512-bit hash value.
Whirlpool comes in three versions: WHIRLPOOL-0, WHIRLPOOL-T and WHIRLPOOL.
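The output sizes of several of these functions can be confirmed with Python's `hashlib`. The helper name `digest_bits` is hypothetical. RIPEMD-160 and Whirlpool are left out because their availability depends on the OpenSSL build (`hashlib.algorithms_available` lists what a given installation supports), and MD5 may be disabled on FIPS-restricted systems.

```python
import hashlib


def digest_bits(algorithm: str) -> int:
    """Size of the hash value produced by `algorithm`, in bits."""
    return hashlib.new(algorithm).digest_size * 8


for name in ("md5", "sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256"):
    value = hashlib.new(name, b"hello").hexdigest()
    print(f"{name:9} {digest_bits(name):4} bits  {value}")
```

Whatever the input length, each function always returns a value of its fixed size, which is the "fixed-size value" property of a hash function.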
In other words, if a hash function h returns the hash value c, it should be extremely difficult to find
any input b such that h(b) = c.
Because of this property, an attacker who holds only a hash value cannot feasibly recover the input
from it.
Resistance to second pre-image
Given an input and its hash value, it should be extremely difficult to find a different input that
produces the same hash value.
To put it another way, if a hash function returns the hash value h(a) for an input a, it should be
difficult to find any other input b for which h(b) = h(a).
Resistance to Collision
Collision resistance implies that it should be extremely difficult to find two different inputs, of any
length, that produce the same hash. A hash function with this property is also called a collision-free
hash function. This property protects against the well-known hash collision attack.
Simply put, for a given hash function h, it is extremely difficult to find two distinct inputs x and y
such that h(x) = h(y).
Collisions must exist, because inputs of any length are mapped to a fixed-size output; the
collision-free property only ensures that they are difficult to find.
This property also makes it difficult for an attacker to locate two input values that produce the
same hash.
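The sketch below contrasts a hash that is clearly not collision resistant with one whose resistance comes from its output length. `toy_hash` is a deliberately weak, hypothetical 8-bit hash; `sha256_prefix` keeps only the first 2 bytes of SHA-256, so a simple birthday search (`find_collision`, also hypothetical) finds a collision after a few hundred tries. The same search against the full 256-bit SHA-256 output is computationally infeasible.

```python
import hashlib
from itertools import count


def toy_hash(data: bytes) -> int:
    """A deliberately weak 8-bit hash: sum of the byte values mod 256."""
    return sum(data) % 256


def sha256_prefix(data: bytes, n_bytes: int = 2) -> bytes:
    """SHA-256 truncated to its first n_bytes (16 bits by default)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(h):
    """Birthday search: hash b"0", b"1", b"2", ... until two inputs share a hash value."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        value = h(msg)
        if value in seen:
            return seen[value], msg
        seen[value] = msg


# Reordering bytes never changes their sum, so collisions are trivial to construct.
print(toy_hash(b"ab") == toy_hash(b"ba"))  # True

# A 16-bit output has only 65,536 possible values, so a collision turns up quickly.
x, y = find_collision(sha256_prefix)
print(x, y, sha256_prefix(x) == sha256_prefix(y))
```

The search is guaranteed to stop within 65,537 inputs by the pigeonhole principle; each extra output bit doubles the work, which is why practical hash functions use outputs of 256 bits or more.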
Hash functions are commonly used in the following fields:
Authentication
Cryptocurrency
Password verification
Data and file integrity checks
Digital signatures
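A data and file integrity check can be sketched as follows: record a file's SHA-256 digest, then recompute it later and compare. The helper names `sha256_of_file` and `is_intact`, the 64 KiB chunk size and the temporary demo file are assumptions for illustration.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_intact(path: str, expected_hex: str) -> bool:
    """True if the file's current digest matches the digest recorded earlier."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# Demo on a temporary file: record the digest, then tamper with the file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"quarterly report v1")
recorded = sha256_of_file(path)
intact_before = is_intact(path, recorded)
with open(path, "ab") as f:
    f.write(b"!")  # a one-byte change
intact_after = is_intact(path, recorded)
os.remove(path)
print(intact_before, intact_after)  # True False
```

A one-byte change yields a completely different digest, so the comparison detects the modification.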
alternative slot. The program moves on to the next available data bucket, 501, and allocates A2 to
that bucket, as shown in Figure 1:
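[Figure 1: Open hashing. Data buckets 220 to 222 and 501 to 504 are shown; the hash of the new record A2 points to an occupied bucket, so A2 is placed in the next available data bucket, 501. Source: © guru99.com]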
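One simple way to realise "place the record in the next available bucket" is open addressing with linear probing: try h, h+1, h+2, ... (wrapping around) until a free slot is found. The sketch below assumes a 10-slot table, integer keys and h(k) = k mod 10; the function names are hypothetical, and the figure's jump from bucket 222 to 501 may reflect a different bucket layout than this simple array.

```python
from typing import List, Optional


def insert_linear(table: List[Optional[int]], key: int) -> Optional[int]:
    """Open addressing with linear probing: try h, h+1, h+2, ... (mod table size)."""
    size = len(table)
    home = key % size  # assumed home address: division method
    for i in range(size):
        slot = (home + i) % size
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    return None  # every bucket is occupied


def search_linear(table: List[Optional[int]], key: int) -> Optional[int]:
    """Follow the same probe sequence; an empty slot means the key is absent."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i) % size
        if table[slot] is None:
            return None
        if table[slot] == key:
            return slot
    return None


table = [None] * 10
slots = [insert_linear(table, k) for k in (22, 42, 5, 32)]
print(slots)                     # [2, 3, 5, 4]
print(search_linear(table, 32))  # 4
print(search_linear(table, 52))  # None
```

Keys 22, 42 and 32 all have home address 2, so 42 and 32 move to the next free slots. Deleting from such a table needs extra care (a removed slot must not end a search early), so deletion is left out of this sketch.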
Querying: Examines the depth value of the hash index and uses that many bits of the hash to calculate
the bucket address.
Update: Runs a query and then updates the data.
Delete: Runs a query to locate the data to be deleted and then removes it.
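The query, update and delete operations above can be illustrated with extendible hashing, one common form of dynamic hashing. This is a minimal sketch under stated assumptions: the directory is indexed by the low-order `global_depth` bits of Python's built-in `hash()`, buckets hold at most `bucket_capacity` records, and all class and method names (and the `MAX_GLOBAL_DEPTH` limit) are hypothetical. When a bucket overflows, a new bucket is created on demand and the directory doubles if necessary.

```python
MAX_GLOBAL_DEPTH = 20  # hypothetical safety limit for this sketch


class Bucket:
    """A data bucket: a local depth, a fixed capacity and its records."""

    def __init__(self, depth: int, capacity: int):
        self.depth = depth
        self.capacity = capacity
        self.items = {}


class ExtendibleHashTable:
    """Dynamic hashing sketch: a directory of 2**global_depth bucket pointers."""

    def __init__(self, bucket_capacity: int = 2):
        self.global_depth = 1
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    def _bucket(self, key) -> Bucket:
        # Query step: the low `global_depth` bits of the hash give the directory index.
        return self.directory[hash(key) & ((1 << self.global_depth) - 1)]

    def query(self, key):
        return self._bucket(key).items.get(key)

    def update(self, key, value) -> bool:
        bucket = self._bucket(key)  # run a query first
        if key not in bucket.items:
            return False
        bucket.items[key] = value
        return True

    def delete(self, key) -> bool:
        bucket = self._bucket(key)  # the query locates the record
        if key not in bucket.items:
            return False
        del bucket.items[key]
        return True

    def insert(self, key, value) -> None:
        bucket = self._bucket(key)
        while key not in bucket.items and len(bucket.items) >= bucket.capacity:
            self._split(bucket)  # bucket full: create a new bucket on demand
            bucket = self._bucket(key)
        bucket.items[key] = value

    def _split(self, bucket: Bucket) -> None:
        if bucket.depth == self.global_depth:
            if self.global_depth >= MAX_GLOBAL_DEPTH:
                raise RuntimeError("too many keys share the same hash bits")
            self.directory = self.directory * 2  # double the directory
            self.global_depth += 1
        bucket.depth += 1
        new_bucket = Bucket(bucket.depth, bucket.capacity)
        bit = 1 << (bucket.depth - 1)  # the hash bit that now separates the two buckets
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = new_bucket
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():  # redistribute the old records
            self._bucket(k).items[k] = v


table = ExtendibleHashTable(bucket_capacity=2)
for k in range(20):
    table.insert(k, k * k)
print(table.global_depth, table.query(7))  # 4 49
table.update(7, -1)
table.delete(5)
print(table.query(7), table.query(5))      # -1 None
```

Small non-negative integers hash to themselves in CPython, which keeps this demo deterministic; the depth limit guards against keys whose hashes share too many low-order bits.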
Quadratic probing method: If there is a collision at the hash address h, then this method probes
the table at locations h+1, h+4, h+9, ..., i.e., at locations $(h + i^2) \bmod \mathrm{HASHSIZE}$ for
$i = 1, 2, \ldots$; that is, the increment function is $i^2$. This method reduces clustering, but if
HASHSIZE is a power of 2, relatively few positions are probed, because $i^2 \bmod \mathrm{HASHSIZE}$
then takes only a few distinct values.
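The effect of HASHSIZE on quadratic probing can be checked directly. This sketch assumes integer keys with home address k mod HASHSIZE; the function names and the two table sizes (8, a power of 2, and 7, a prime) are illustrative assumptions. The probe sequence starts at i = 0, the home address itself.

```python
from typing import List, Optional


def quadratic_probes(h: int, hashsize: int) -> List[int]:
    """Addresses (h + i*i) % hashsize for i = 0, 1, ..., hashsize - 1."""
    return [(h + i * i) % hashsize for i in range(hashsize)]


def insert_quadratic(table: List[Optional[int]], key: int) -> Optional[int]:
    """Insert key at its home address k % HASHSIZE or the first free probed slot;
    return the slot used, or None if every probed slot was occupied."""
    hashsize = len(table)
    for slot in quadratic_probes(key % hashsize, hashsize):
        if table[slot] is None:
            table[slot] = key
            return slot
    return None


# Four keys that all have home address 0.
table8 = [None] * 8  # HASHSIZE = 8, a power of 2
table7 = [None] * 7  # HASHSIZE = 7, a prime
r8 = [insert_quadratic(table8, k) for k in (0, 8, 16, 24)]
r7 = [insert_quadratic(table7, k) for k in (0, 7, 14, 21)]
print(r8)                                   # [0, 1, 4, None]
print(r7)                                   # [0, 1, 4, 2]
print(sorted(set(quadratic_probes(0, 8))))  # [0, 1, 4]
```

With HASHSIZE = 8 the probes only ever visit addresses 0, 1 and 4, so key 24 cannot be placed even though five slots are empty; with the prime size 7 the fourth key finds slot 2.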
14.6.2 Chaining
Here, synonyms are linked with the help of pointers, i.e. extra space outside the hash table is allocated
for them and connected in the form of a linked list.
The chaining method uses linked storage: each slot of the hash table holds a pointer to a linked list,
and all the elements hashed to a slot are placed in the linked list attached to that slot. For example,
in Figure 2, all the elements hashed to slot 3 are placed in the linked list attached to that slot.
Similarly, the element hashed to slot 0 is linked to that slot with the help of a pointer:
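[Figure 2: Chained hash table with slots 0 to 9; each slot points to a linked list of the elements hashed to it, and each list ends in NULL.]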
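A chained hash table can be sketched with explicit linked-list nodes, as in Figure 2. The sketch assumes integer keys, a 10-slot table and the division method h(k) = k mod 10; the class and method names are hypothetical. It also illustrates two properties of chaining: more records than slots can be stored, and deletion only adjusts pointers.

```python
from typing import List, Optional


class Node:
    """One element of a chain: key, value and a pointer to the next node (None = NULL)."""

    def __init__(self, key: int, value, next_node: Optional["Node"] = None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size: int = 10):
        self.size = size
        self.slots: List[Optional[Node]] = [None] * size  # each slot: head of a chain or NULL

    def _h(self, key: int) -> int:
        return key % self.size  # division method (assumed)

    def insert(self, key: int, value) -> None:
        i = self._h(key)
        node = self.slots[i]
        while node is not None:
            if node.key == key:
                node.value = value  # key already present: update in place
                return
            node = node.next
        self.slots[i] = Node(key, value, self.slots[i])  # link new node at the head

    def search(self, key: int):
        node = self.slots[self._h(key)]
        while node is not None:  # sequential search along one chain
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key: int) -> bool:
        i = self._h(key)
        prev, node = None, self.slots[i]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.slots[i] = node.next  # unlink the head of the chain
                else:
                    prev.next = node.next      # bypass the node
                return True
            prev, node = node, node.next
        return False

    def chain_lengths(self) -> List[int]:
        lengths = []
        for head in self.slots:
            n, node = 0, head
            while node is not None:
                n, node = n + 1, node.next
            lengths.append(n)
        return lengths


table = ChainedHashTable(size=10)
for k in range(25):  # 25 records but only 10 slots
    table.insert(k, f"record-{k}")
print(table.chain_lengths())  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
table.delete(13)
print(table.search(13), table.search(23))  # None record-23
```

Slot 3 holds the chain 23 -> 13 -> 3, so deleting 13 only redirects the pointer in node 23 to node 3.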
Overflow: A third advantage is that the hash table no longer needs to be larger than the number of
records. If there are more records than entries in the table, this only means that some of the linked
lists are certain to contain more than one record. Even if the number of records is several times the
size of the table, the average length of the linked lists remains small, and a sequential search on the
appropriate list remains efficient.
Deletion: Deletion is a quick and easy task in a chained hash table, because deleting a node from a
linked list only requires adjusting address pointers.
Hashing is a technique used for a quick retrieval of the desired data from a large volume of data.
A hash table or hash map is a data structure in which data associated with a key is mapped, or hashed,
to a position in an array.
A symbol table is an important data structure created and maintained by a compiler to store information
about the identifiers in a program.
A hash function converts an arbitrary-size input value to a fixed-size value.
Static hashing is a hashing technique that allows users to do lookups on a finished dictionary set (all
objects in the dictionary are final and not changing).
Dynamic hashing is a hashing technique that creates and removes data buckets on demand.
In open hashing, when the bucket computed for a new record is already occupied, the new record is placed
in the next available data bucket.
In close hashing, when the bucket for a hash value is full, a new bucket is allocated for the same hash
value and linked after the previous one.
The chaining method uses linked storage: all records that hash to the same slot are kept in a linked list.
14.8 GLOSSARY
Hashing: A technique used for quick retrieval of the desired data from a large volume of data
Hash table: Also known as a hash map; a data structure in which data associated with a key is mapped,
or hashed, to a position in an array
Symbol table: An important data structure created and maintained by a compiler to store information
about the identifiers in a program
Discuss the concept of hashing and its applications with your friends and classmates. Also, discuss
real-world examples of hashing.