MCA Data Structures With Algorithms 14

Uploaded by

SAI SRIRAM
Copyright
© © All Rights Reserved

UNIT

14 Hashing

Names of Sub-Units

Hashing Table Organizations, Hashing: The Symbol Table, Hashing Functions, Static and Dynamic
Hashing, Collision-Resolution Techniques

Overview

This unit begins with the concept of hashing and the organization of hash tables. It then discusses the
use of hashing in symbol tables, followed by hashing functions and static and dynamic hashing.
Towards the end, the unit discusses collision-resolution techniques.

Learning Objectives

In this unit, you will learn to:


• Explain the concept of hashing
• Describe the significance of hashing table organizations and hashing: the symbol table
• Discuss the hashing functions
• Assess the significance of static and dynamic hashing
• Explain the concept of collision-resolution techniques

Learning Outcomes

At the end of this unit, you will be able to:

• Evaluate the concept of hashing
• Determine the significance of hashing table organizations and hashing: the symbol table
• Explore the concept of hashing functions
• Determine the significance of static and dynamic hashing
• Evaluate the importance of collision-resolution techniques

Pre-Unit Preparatory Material

• https://www.kdkce.edu.in/pdf/YDC-4IT-ADS-Hashing%20Techniques.pdf

14.1 INTRODUCTION
Hashing is a technique for quickly retrieving the desired data from a large volume of data. In this
scheme, a record is stored at a particular address, and this address is computed by applying a formula,
called the hash function, to the key (the primary key) of the record. The hash function h is used
in the following manner:
a = h(k)
In this equation, a is the address obtained by applying the hash function to the key k of the record.
Ideally, the hash function should produce a unique address for every key. In practice this is rarely
feasible, because collisions are frequent: the address computed for a record with key k1 may already
hold a record with key k2. Such colliding records are called synonyms, and collision-resolution
techniques are applied to resolve the conflict.
The domain of the hash function is the range of possible key values. If the keys have 3 digits, the
domain of the hash function is (0, 999). If the keys have 5 digits, the domain of the hash function
is (0, 99999).
The range of the hash function is the set of addresses available in the storage where the records are
kept. If the array in which the records are stored has 1000 addresses, the range of the hash function is
(0, 999). The hash function should be chosen so that it distributes the keys uniformly over this range.
If the total capacity of the storage is n, a good hash function distributes the keys uniformly
over the range (0, n-1).

14.2 HASH TABLE ORGANIZATION


A hash table, or hash map, is a data structure in which each item of data is associated with a key, and
the key is mapped (hashed) to a position in the table. The hash function computes this position from
the key; that is, it transforms the key into the address of the slot (bucket address) where the
corresponding value is stored and is to be sought.

A hash table stores data in an associative manner. The data is kept in an array in which each value has
its own unique index. Retrieval is very fast once the index of the desired data is known, so insertion
and search operations remain fast regardless of how large the data grows. A hash table uses an array
as its storage medium and uses the hashing technique to generate the index at which an element is to
be inserted or found.

14.2.1 Linear Probing


The hashing technique may produce an index of the array that is already in use. In that case, we look
for the next free location by moving through the subsequent slots, one at a time, until an empty slot is
found. This process is called linear probing.
The fundamental operations on a hash table that uses linear probing are as follows:
• Search: Find a data item in the hash table
• Insert: Insert a data item into the hash table
• Delete: Delete a data item from the hash table

14.2.2 Hashing Method


There are several methods of hashing and each method has its own way of computing the hashing
address. The method which uniformly distributes the hashing addresses and leads to minimum collisions
is highly preferred. Following are the different methods of hashing:
• Division method
• Mid square method
• Folding method

Division Method
This method is also called the divide and remainder method. Here, the key is divided by a chosen
number n and the remainder is taken as the address.
Hence, the hash function is: h(k) = k mod n
This method produces addresses ranging from 0 to n-1. The value of n should be chosen carefully.
If it is a power of 10, say 100, then all keys with identical last two digits hash to the same address.
If n is an even integer, then all even keys hash to even addresses and all odd keys hash to odd
addresses, so any bias in the keys carries over to the addresses. A good choice of n is a prime number,
or at least a number that is not divisible by small primes such as 2, 3 or 5.
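
The effect of the choice of n can be seen in a small sketch. The function name division_hash and the sample keys are illustrative assumptions, not part of the unit; the formula is the one given above.

```python
def division_hash(key: int, n: int) -> int:
    """Division method: the address is the remainder k mod n (0 .. n-1)."""
    return key % n


# Illustrative keys (assumed for this example only).
keys = [1234, 5634, 7800, 4100, 2468]

# With n = 100 only the last two digits of a key matter,
# so 1234 and 5634 collide, and so do 7800 and 4100.
addresses_100 = [division_hash(k, 100) for k in keys]

# With a prime such as n = 97 these particular keys are spread apart.
addresses_97 = [division_hash(k, 97) for k in keys]

print(addresses_100)
print(addresses_97)
```

A prime n does not rule out collisions; it only avoids the systematic ones caused by patterns in the digits of the keys.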

Mid Square Method


In this method, the key value is squared and a few digits are extracted from the middle of the squared
value. For example, if the hash address is to have 3 digits and the key is 1234, then the square of
1234 is 1522756 and the middle 3 digits, 227, are taken as the address. This method produces addresses
that are distributed fairly uniformly over the range of the hashing function. If the keys are very
large, only a selected part of the key is squared.
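
The example above can be reproduced with a minimal sketch. The function name mid_square_hash is an assumption, and so is the rule used to pick the "middle" when the square has an even number of digits (the unit does not specify one).

```python
def mid_square_hash(key: int, digits: int) -> int:
    """Mid square method: square the key and keep `digits` middle digits."""
    squared = str(key * key)
    # Start of the middle window; clamp at 0 when the square is short.
    start = max(0, (len(squared) - digits) // 2)
    return int(squared[start:start + digits])


# 1234 squared is 1522756, whose middle three digits are 227.
print(mid_square_hash(1234, 3))
```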

Folding Method
Suppose the hash address to be generated has d digits. In this method, the key is first divided into
groups of d digits, starting from the right. These groups are then added together, and the last d
digits of the sum are taken as the hash address. For example, suppose the key is 12345678 and the
desired hash address has 3 digits. Starting from the right, the key is divided into groups of 3 digits,
and the groups are added, as shown in the following example:
12/345/678
The sum 12 + 345 + 678 is 1035.
The last three digits of the sum, 035 (that is, 35), are taken as the hash address.
This method is flexible and can be modified as per our need.
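
The folding steps can be sketched as follows. The function name folding_hash is an assumption made for this example; the grouping from the right and the final truncation follow the description above.

```python
def folding_hash(key: int, d: int) -> int:
    """Folding method: add groups of d digits (taken from the right)
    and keep the last d digits of the sum."""
    text = str(key)
    total = 0
    while text:
        total += int(text[-d:])  # rightmost group of up to d digits
        text = text[:-d]         # drop that group and continue leftwards
    return total % 10 ** d


# 12 + 345 + 678 = 1035, and the last three digits give 035, i.e. 35.
print(folding_hash(12345678, 3))
```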

14.2.3 Search Functionality


To find an element, we compute the hash code of the supplied key and use it as an index to locate the
element in the array. If the element is not found at the computed index, we use linear probing to look
for it in the slots that follow.

14.2.4 Insert Operation


To insert an element, we first compute the hash code of the supplied key and use it as an index into
the array. If the slot at that index is already occupied, we use linear probing to find the next vacant
slot and place the element there.

14.2.5 Delete Operation


To delete an element, we compute the hash code of the supplied key and use it as an index to locate the
element in the array. If the element is not found at the computed index, we use linear probing to look
for it in the slots that follow. Once it is found, we store a dummy item in its place, so that later
searches still probe past the slot and the performance of the hash table is kept intact.
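
The search, insert, and delete operations described above can be put together in one small sketch. The class name LinearProbingTable, the fixed table size, and the two marker objects are assumptions made for illustration; Python's built-in hash() stands in for the hash function.

```python
class LinearProbingTable:
    _EMPTY = object()    # slot that has never been used
    _DELETED = object()  # dummy item left behind by delete

    def __init__(self, size=8):
        self.size = size
        self.slots = [self._EMPTY] * size

    def _probe(self, key):
        """Yield the home slot, then each following slot, wrapping around."""
        start = hash(key) % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def insert(self, key, value):
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is self._EMPTY:
                if first_free is None:
                    first_free = idx
                break  # the key cannot be stored beyond an empty slot
            if slot is self._DELETED:
                if first_free is None:
                    first_free = idx  # reusable, but keep looking for the key
            elif slot[0] == key:
                self.slots[idx] = (key, value)  # key exists: overwrite
                return
        if first_free is None:
            raise OverflowError("hash table is full")
        self.slots[first_free] = (key, value)

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is self._EMPTY:
                return None
            if slot is not self._DELETED and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is self._EMPTY:
                return False
            if slot is not self._DELETED and slot[0] == key:
                self.slots[idx] = self._DELETED  # leave a dummy item
                return True
        return False


table = LinearProbingTable(8)
for k, v in [(1, "a"), (9, "b"), (17, "c")]:  # all three hash to slot 1
    table.insert(k, v)
table.delete(9)
print(table.search(17))  # still found, because the dummy item is probed past
```

Without the dummy item, deleting key 9 would leave an empty slot between keys 1 and 17, and a search for 17 would stop too early.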

14.3 HASHING: THE SYMBOL TABLE


A symbol table is an important data structure created and maintained by a compiler. It stores
information about the binding and scope of names and about entities such as function names, classes,
variables, and objects.
A symbol table can be built during the lexical and syntax analysis phases.
The compiler collects this information during the analysis phases and uses it during the synthesis
phases to generate code. The symbol table improves compile-time efficiency and is used by the various
phases of the compiler, as described below:
• Lexical analysis: Creates new entries in the table, for example entries for tokens.
• Syntax analysis: Adds information about attribute type, scope, dimension, and line of reference to
the table.
• Semantic analysis: Uses the information in the table to check semantics.
• Intermediate code generation: Refers to the symbol table to determine how much and what type of
run-time storage is allocated, and to add information about temporary variables.

• Code optimization: Uses the information in the symbol table for machine-dependent optimization.
• Target code generation: Uses the address information of the identifiers in the table to generate code.

14.3.1 Implementation of Symbol Table


In the hashing scheme, two tables are maintained: a hash table and a symbol table. This is one of the
most commonly used ways of implementing symbol tables.
The hash table is an array whose index ranges from 0 to tablesize-1. Its entries are pointers to the
names stored in the symbol table. To look up a name, a hash function is applied to it, which yields an
integer between 0 and tablesize-1. Lookup and insertion are fast, O(1) on average. The advantage of
this scheme is quick search; the disadvantage is that hashing is more complicated to implement.
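
The two-table arrangement can be sketched as follows. The class name SymbolTable, the record fields, and the character-sum hash function are assumptions chosen for illustration; a real compiler would keep more attributes and use a stronger hash.

```python
class SymbolTable:
    def __init__(self, table_size=11):
        self.table_size = table_size
        # Hash table: each slot lists positions in self.symbols ("pointers").
        self.hash_table = [[] for _ in range(table_size)]
        # Symbol table proper: one record per name.
        self.symbols = []

    def _hash(self, name):
        """Map a name to an integer between 0 and table_size - 1."""
        return sum(ord(ch) for ch in name) % self.table_size

    def lookup(self, name):
        for pos in self.hash_table[self._hash(name)]:
            if self.symbols[pos]["name"] == name:
                return self.symbols[pos]
        return None

    def insert(self, name, type_, scope):
        record = self.lookup(name)
        if record is not None:
            record.update(type=type_, scope=scope)  # name already known
            return
        self.symbols.append({"name": name, "type": type_, "scope": scope})
        self.hash_table[self._hash(name)].append(len(self.symbols) - 1)


st = SymbolTable()
st.insert("count", "int", "global")
st.insert("total", "float", "main")
print(st.lookup("count"))
```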

14.4 HASHING FUNCTIONS


In cryptography, hash functions are among the most widely used mathematical functions for establishing
security. A hash function converts an input value of arbitrary size into a value of fixed size. The
input can therefore be of any length, but the output always has the same length. The outputs are called
hash values or hashes.
It is important to remember that hashing and encryption are different operations.
Encryption is a two-way function: encrypted data can be decrypted with the appropriate key, so the
operation is reversible. Hashing is a one-way function: a hash cannot feasibly be reversed to recover
the input. Hashing is therefore preferred over encryption when data only needs to be verified and
never recovered.

Password verification is one of the most common applications of hashing. When the user enters a
password, its hash is computed and compared with the hash stored in the database. If the two hashes
match, the user is logged in; otherwise, the user must re-enter the password.
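
The verification flow can be sketched with Python's standard hashlib module. The function names and the sample password are assumptions for illustration. A single unsalted SHA-256 is used here only to mirror the description above; production systems normally add a per-user salt and use a deliberately slow password-hashing function.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Return the SHA-256 hash of the password as a hexadecimal string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# What the database would keep: the hash, never the password itself.
stored_hash = hash_password("s3cret!")


def verify(password_attempt: str, stored: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    # compare_digest compares in constant time to avoid timing leaks.
    return hmac.compare_digest(hash_password(password_attempt), stored)


print(verify("s3cret!", stored_hash))
print(verify("guess", stored_hash))
print(len(hash_password("a")), len(hash_password("a" * 1000)))
```

The last line illustrates the fixed-size property: inputs of very different lengths produce hashes of the same length.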
The following are some of the most commonly used hashing functions:
• MD: It stands for Message Digest. The family includes MD2, MD4, MD5, and MD6. MD2, MD4, and MD5
produce a 128-bit hash value.
• SHA: It stands for Secure Hash Algorithm. The family includes SHA-0, SHA-1, SHA-2, and
SHA-3. The SHA-2 family includes versions such as SHA-224, SHA-256, SHA-384, and SHA-512.
• RIPEMD: It stands for RACE Integrity Primitives Evaluation Message Digest. The commonly used
versions are RIPEMD, RIPEMD-128, and RIPEMD-160; 256-bit and 320-bit versions are also
available.
• Whirlpool: It is a 512-bit hash function based on a modified version of AES. Whirlpool comes in
three versions: WHIRLPOOL-0, WHIRLPOOL-T, and WHIRLPOOL.

14.4.1 Properties of Hash Functions


To withstand attacks, an ideal hash function should have the following properties:
• Pre-image resistance
  • Pre-image resistance means that it should be computationally hard to reverse the hash function.

  • In other words, if a hash function h produces a hash value c, finding an input value b that
  hashes to c should be extremely difficult.
  • This property protects against an attacker who has only a hash value and is trying to find the
  input.
• Second pre-image resistance
  • Second pre-image resistance means that, given an input and its hash value, it should be extremely
  difficult to find a different input that generates the same hash value.
  • To put it another way, if a hash function h produces the hash value h(a) for an input a, it
  should be difficult to find any other input value b for which h(b) = h(a).
• Collision resistance
  • Collision resistance means that it should be extremely difficult to find two different inputs of
  any length that produce the same hash. A hash function with this property is also called a
  collision-free hash function. This property protects against the well-known hash collision attack.
  • Simply put, for a given hash function h, it is extremely difficult to find any two different
  inputs x and y such that h(x) = h(y).
  • Because a hash function maps inputs of arbitrary length to outputs of fixed length, collisions
  must exist; the collision-free property only ensures that they are difficult to find.
  • This property also makes it difficult for an attacker to find two input values that produce the
  same hash.

Hash functions are commonly used in the following fields:
• Authentication and password verification
• Cryptocurrencies
• Data and file integrity checks
• Digital signatures

14.5 STATIC AND DYNAMIC HASHING


Static hashing is a hashing technique that allows users to do lookups on a finished dictionary set (all
objects in the dictionary are final and not changing). Dynamic hashing, on the other hand, is a hashing
technique that creates and removes data buckets on demand.

14.5.1 Static Hashing


Static hashing is a method of hashing, or shortening a string of characters, in which the set of
shortened values stays the same size, which keeps data access fast.
In static hashing, the resulting data bucket address always remains the same. If you generate an
address for stud_Id = 13 using the hash function mod 5, the resulting bucket address is always 3
(13 mod 5 = 3); the bucket address does not change. The number of data buckets therefore remains
constant when static hashing is used.
Static hashing is further divided into the following two methods:
• Open hashing: In the open hashing method, instead of overwriting the older record, the next
available data bucket is used to store the new record. This method is also called linear probing.
For instance, let A2 be a new record that we need to insert. The hash function generates the
address 222, but that bucket is already occupied by another record. The system therefore looks for
the next available data bucket, 501, and assigns A2 to it, as shown in Figure 1:

[Figure: the new data record A2 is passed through the hash function, which yields address 222; the
data buckets shown are 220, 221, 222, 501, 502, 503, and 504. Image © guru99.com]

Figure 1: Working of Open Hashing


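• Close hashing: When a bucket is full, a new bucket is allocated for the same hash value and is
linked after the previous one. (Some textbooks use these two terms the other way round: there, open
hashing denotes chaining and closed hashing denotes open addressing.)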

The common operations in static hashing are as follows:

• Insert: Adds a new record to the hash table. The hash function is applied to the key to compute
the address of the bucket for that record.
• Delete: Locates the record to be deleted and then removes it from the bucket at that address in
memory.
• Update: Uses the hash function to locate the record and then updates it with new data.
• Query: Also known as search, this operation uses the hash function to find the records that match
the specified criteria.
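
The four operations can be sketched on a structure with a fixed number of buckets. The class name StaticHashFile and the use of a Python list as each bucket (so that records with the same address are kept one after another, as in close hashing) are assumptions for illustration; the mod 5 function and stud_Id = 13 come from the example in Section 14.5.1.

```python
class StaticHashFile:
    def __init__(self, num_buckets=5):
        self.num_buckets = num_buckets  # fixed for the life of the file
        self.buckets = [[] for _ in range(num_buckets)]

    def address(self, key):
        """Static hash function: the same key always gives the same bucket."""
        return key % self.num_buckets

    def insert(self, key, record):
        self.buckets[self.address(key)].append((key, record))

    def query(self, key):
        for k, record in self.buckets[self.address(key)]:
            if k == key:
                return record
        return None

    def update(self, key, record):
        bucket = self.buckets[self.address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, record)
                return True
        return False

    def delete(self, key):
        bucket = self.buckets[self.address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


students = StaticHashFile(5)
students.insert(13, "Asha")  # 13 mod 5 = 3
students.insert(8, "Ravi")   # 8 mod 5 = 3 as well: same bucket
print(students.address(13), students.query(8))
```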

14.5.2 Dynamic Hashing


Dynamic hashing is a method of hashing, or shortening a string of characters, in which the set of
shortened values grows, shrinks, and reorganises to fit the way the data is accessed.
With dynamic hashing, the entries in the dictionary are dynamic and may change.
Data buckets are added and removed on demand. This overcomes the drawback of static hashing, which
does not expand or shrink dynamically as the database grows or shrinks. Dynamic hashing is also
known as extended hashing. In this method, the hash function is made to produce a large number of
values, and only as many bits of each value as are needed are used to address the buckets.
The common operations in dynamic hashing are as follows:
• Insertion: Computes the address of the bucket. If the bucket is already full, further buckets are
added and the bucket address is re-computed using additional bits of the hash value. If the bucket
is not full, the data is added to it.

• Querying: Looks at the depth value of the hash index and uses that many bits to compute the bucket address.
• Update: Performs a query as described above and then updates the data.
• Delete: Performs a query to locate the data to be deleted and then deletes it.
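
One common way to realise these operations is a directory of buckets indexed by the low-order bits of the hash value. The sketch below is an illustration under assumptions: the class and method names, the bucket capacity of 2, and the use of Python's built-in hash() are not taken from the unit. It also assumes that keys have distinct hash values; many keys with one identical hash value would make a bucket split repeatedly.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket uses
        self.items = {}


class DynamicHash:
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1  # number of hash bits the directory uses
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        """Bucket address: the low-order global_depth bits of the hash."""
        return hash(key) & ((1 << self.global_depth) - 1)

    def query(self, key):
        return self.directory[self._index(key)].items.get(key)

    def insert(self, key, value):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket full: add a bucket and retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Use one more bit of the hash value: double the directory.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        # Directory entries whose newly used bit is 1 move to the new bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = new_bucket
        # Redistribute the old records between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v

    def update(self, key, value):
        bucket = self.directory[self._index(key)]
        if key in bucket.items:
            bucket.items[key] = value
            return True
        return False

    def delete(self, key):
        return self.directory[self._index(key)].items.pop(key, None) is not None


index = DynamicHash(bucket_capacity=2)
for k in range(9):
    index.insert(k, f"record-{k}")
print(index.global_depth, len(index.directory))
```

This sketch only grows; merging buckets and halving the directory after deletions is left out to keep it short.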

14.6 COLLISION-RESOLUTION TECHNIQUES


A collision is said to occur when the hashing function produces, for a key, a hash address that is
already used by another key, i.e. a key that is already stored at that hash address. Such a situation
is not desirable, and several methods are used to resolve it. There are two types of
collision-resolution methods for handling the case where the hash function produces the same address
for two different keys (synonyms). These collision-resolution methods
are:
• Open addressing
• Chaining

14.6.1 Open Addressing


When a collision occurs, the synonym is placed at some other empty location within the hash table.
That is, no location outside the hash table is allocated for storing the key values. For example,
consider a hash table with hash addresses in the range 0 to 100, in which we want to store the records
of a few students. On applying the hash function to some roll number, say 312456, we get the hash
address 36. However, we find that this location is already occupied by the record of another student
with roll number, say, 302446 (so a collision has occurred), and we cannot store our record at this
hash address. In the open addressing scheme, we therefore look at an adjacent hash address to see
whether it is vacant and can store the record of the student with roll number 312456. This means
that hash address 37 is examined and, if it is vacant, the record is stored there; if it is also
occupied by some student record, the next adjacent location (i.e. hash address 38) is examined,
and so on. The open addressing method can be further divided into the following methods:
• Linear probing method: The simplest method to resolve a collision is to start with the hash address
(the location where the collision occurred) and do a sequential search for an empty location. This
method searches in a straight line, and it is therefore called linear probing. While applying this
method, the array should be treated as circular, so that when the last location is reached, the
search proceeds to the first location of the array.
The major drawback of linear probing is that, as the table becomes about half full, there is a
tendency towards clustering; that is, the records start to appear at consecutive positions with only
small gaps between them. The sequential search needed to find an empty position therefore becomes
longer and longer, and the performance of the hash table starts to degenerate.
The reason for clustering is the simple fact that, on finding a collision, the increment added to
the hash address to find the next vacant location is always 1 in linear probing. To avoid this
problem of clustering, we should use different increment values to find the next empty
location. The solution is a method called rehashing, in which we use a second hash function. This
function generates an increment value, which is added to the hash address generated by the first
hash function (the one that produced the collision). The resulting location is checked and, if it
is vacant, the record is stored there; otherwise, the increment from the second hash function is
applied again, and so on, until an empty location is found.

• Quadratic probing method: If there is a collision at the hash address h, then this method probes
the table at locations h+1, h+4, h+9, ..., i.e. at locations (h + i^2) mod HASHSIZE for i = 1, 2, ...;
that is, the increment function is i^2. This method reduces clustering, but it does not probe every
location in the table: if HASHSIZE is a power of 2, relatively few positions are probed, whereas if
HASHSIZE is a prime number, about half of the locations in the table are reached.
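
The three probe sequences described in this section can be compared with a short sketch. The function names are assumptions, and so is the second hash function 1 + (key mod (size - 1)) used for rehashing; the unit does not prescribe a particular second function.

```python
def linear_probes(h, size):
    """Linear probing: h, h+1, h+2, ... treating the array as circular."""
    return [(h + i) % size for i in range(size)]


def quadratic_probes(h, size):
    """Quadratic probing: h, h+1, h+4, h+9, ... modulo the table size."""
    return [(h + i * i) % size for i in range(size)]


def rehash_probes(key, size):
    """Rehashing: the increment comes from a second hash function."""
    h = key % size                 # first hash function (division method)
    step = 1 + key % (size - 1)    # assumed second hash function, never 0
    return [(h + i * step) % size for i in range(size)]


print(linear_probes(15, 17)[:4])
print(quadratic_probes(3, 16)[:4])
# Number of distinct locations reached by quadratic probing:
print(len(set(quadratic_probes(0, 16))), len(set(quadratic_probes(0, 17))))
print(len(set(rehash_probes(312456, 17))))
```

With a table size of 16 the quadratic sequence keeps returning to the same few locations, while with the prime size 17 it reaches about half of the table; linear probing, and rehashing with a prime table size, reach every location.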

14.6.2 Chaining
Here, the synonyms are linked together with the help of pointers, i.e. extra space outside the hash
table is allocated and connected in the form of a linked list.
The chaining method uses linked storage: each slot of the hash table holds a pointer to a linked list,
and all the elements hashed to a slot are placed in the linked list attached
to that slot. For example, we can see in Figure 2 that all the elements hashed to slot 3 are placed in
the linked list attached to that slot. Similarly, the element hashed to slot 0 is linked to that slot
with the help of a pointer:

[Figure: a hash table with slots 0 to 9; each occupied slot points to a linked list of elements that
ends in NULL.]

Figure 2: Hash Table with pointers to the linked list


Linked storage has the following advantages:
• Space saving: When the hash table is maintained as a contiguous array, enough space has to be
set aside at compilation time to avoid overflow. If the records themselves are kept in the hash table,
the empty positions (if there are many of them) consume considerable space that might be
needed elsewhere. If, on the other hand, the hash table contains only pointers to the records, and
each pointer requires only one word, then the size of the hash table may be reduced to a large
extent.
• Collision resolution: Chaining allows simple and efficient collision handling. We only need to add
a link field to each record. Clustering is no problem at all, because keys with
distinct hash addresses always go to distinct linked lists attached to distinct slots. In Figure 2,
for instance, there are 10 linked lists, one attached to each slot of the hash table. The elements are
hashed to their respective slots and linked (in the form of nodes) to the linked list attached to
that slot.

• Overflow: A third advantage is that the size of the hash table no longer needs to exceed
the number of records. If there are more records than entries in the table, it only means that
some of the linked lists are sure to contain more than one record. Even if the number of
records is several times the size of the table, the average length of a linked list
remains small and the sequential search of the appropriate list remains efficient.
• Deletion: Deletion becomes a quick and easy task in a chained hash table. Deleting a
node from a linked list only requires the adjustment of address pointers.

The disadvantage of linked storage, however, is as follows:

• The use of space: All the links require space. If the records are large, this space is negligible in
comparison with the space needed for the records themselves; but if the records are small, it is
not. For instance, if we use the chaining method and make the hash table quite small, with
the number of entries (n entries) equal to the number of items, then we use 3n words of storage
altogether: n for the hash table, n for the keys, and n for the links to the next node on each chain.
Since the hash table is nearly full, there are many collisions and some of the chains hold several
items, so searching gets a bit slow. On the other hand, if we use open addressing and put
the same 3n words of storage entirely into the hash table, the table is only one-third full;
there are therefore relatively few collisions and the search for any given item is faster.
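
The chaining scheme described in this section can be sketched as follows. The class names Node and ChainedHashTable, the table size of 10 (as in Figure 2), and the division-method hash on integer keys are assumptions for illustration.

```python
class Node:
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node  # link field to the next synonym


class ChainedHashTable:
    def __init__(self, size=10):
        self.size = size
        self.slots = [None] * size  # None plays the role of NULL

    def insert(self, key, value):
        idx = key % self.size
        node = self.slots[idx]
        while node is not None:
            if node.key == key:
                node.value = value  # key already present: overwrite
                return
            node = node.next
        # Link the new node at the head of the list attached to the slot.
        self.slots[idx] = Node(key, value, self.slots[idx])

    def search(self, key):
        node = self.slots[key % self.size]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key):
        idx = key % self.size
        prev, node = None, self.slots[idx]
        while node is not None:
            if node.key == key:
                # Deletion only adjusts pointers.
                if prev is None:
                    self.slots[idx] = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False


table = ChainedHashTable(10)
for roll in (3, 13, 23, 40):  # 3, 13 and 23 are synonyms in slot 3
    table.insert(roll, f"student-{roll}")
table.delete(13)
print(table.search(23), table.search(13))
```

Because the lists live outside the array, the table can hold more records than it has slots; the chains simply grow longer.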

14.7 CONCLUSION

• Hashing is a technique for quickly retrieving the desired data from a large volume of data.
• A hash table, or hash map, is a data structure in which the data (associated with some key)
is mapped, or hashed, to a position in the table.
• A symbol table is an important data structure created and maintained by a compiler.
• A hash function converts an arbitrary-size input value to a fixed-size value.
• Static hashing is a hashing technique that allows users to do lookups on a finished dictionary set (all
objects in the dictionary are final and not changing).
• Dynamic hashing is a hashing technique that creates and removes data buckets on demand.
• In open hashing, the next available data bucket is used to store a new record instead of overwriting
the older one.
• In close hashing, when a bucket is full, a new bucket is allocated for the same hash value and is
linked after the previous one.
• The chaining method uses linked storage to hold synonyms.

14.8 GLOSSARY

• Hashing: A technique for quickly retrieving the desired data from a large volume of data
• Hash table: Also known as a hash map, it is a data structure in which the data is
mapped, or hashed, to a position in the table
• Symbol table: An important data structure created and maintained by a compiler

• Hash function: It converts an arbitrary-size input value to a fixed-size value
• Static hashing: A hashing technique that allows users to do lookups on a finished dictionary set
(all objects in the dictionary are final and not changing)
• Dynamic hashing: A hashing technique that creates and removes data buckets on demand
• Open hashing: A method in which the next available data bucket is used to store a new record
instead of overwriting the older one
• Close hashing: A method in which, when a bucket is full, a new bucket is allocated for the same hash
value and linked after the previous one
• Chaining: A collision-resolution method that uses linked storage to hold synonyms

14.9 SELF-ASSESSMENT QUESTIONS

A. Essay Type Questions


1. Explain the concept of hashing.
2. A symbol table can be built during the lexical and syntax analysis phases. Discuss.
3. Hashing is a one-way function. Describe the significance of hashing functions.
4. Explain the concept of static and dynamic hashing.
5. When a collision occurs, the synonym is placed at some other empty location in the hash
table. Discuss.

14.10 ANSWERS AND HINTS FOR SELF-ASSESSMENT QUESTIONS

A. Hints for Essay Type Questions


1. Hashing is a technique for quickly retrieving the desired data from a large volume of data.
Refer to Section Introduction
2. A symbol table is an important data structure created and maintained by a compiler.
Refer to Section Hashing: The Symbol Table
3. In cryptography, hash functions are among the most widely used mathematical functions for
establishing security. Refer to Section Hashing Functions
4. Static hashing is a hashing technique that allows users to do lookups on a finished dictionary set
(all objects in the dictionary are final and not changing). Dynamic hashing, on the other hand, is a
hashing technique that creates and removes data buckets on demand. Refer to Section Static and
Dynamic Hashing
5. A collision is said to occur when the hashing function produces, for a key, a hash address that
is already used by another key, i.e. a key that is already stored at that hash address. Refer to Section
Collision-Resolution Techniques

14.11 POST-UNIT READING MATERIAL

• https://levelup.gitconnected.com/the-3-applications-of-hash-functions-fab1a75f4d3d
• https://web.stanford.edu/class/archive/cs/cs106b/cs106b.1208/lectures/hashing/Lecture27_Slides.pdf

14.12 TOPICS FOR DISCUSSION FORUMS

• Discuss the concept of hashing and its applications with your friends and classmates. Also,
discuss real-world examples of hashing.
