CH 4 Hash Table
Chapter 4: Hash Table (6 lectures)
4.1 Concept of hashing
4.2 Terminologies – Hash table, Hash function, Bucket, Hash address, Collision, Synonym, Overflow, etc.
4.3 Properties of a good hash function
4.4 Hash functions: division method, mid-square method, folding method
4.5 Collision resolution techniques
4.5.1 Open Addressing - Linear probing, quadratic probing, rehashing
4.5.2 Chaining - Coalesced chaining, separate chaining
What is Hash Table?
• A hash table (or hash map) is a data structure used to store key-value pairs.
• It is a collection of items stored so that they are easy to find later.
• It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
• It is an array of lists, where each list is known as a bucket.
• It stores each value based on its key.
• In Java, the Hashtable class implements the Map interface and extends the Dictionary class.
• Java's Hashtable is synchronized and contains only unique keys.
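The bullets above describe a hash table as an array of buckets that a hash function indexes into. A minimal Python sketch of that idea follows. The class name `HashTable`, the methods `put` and `get`, the default of 10 buckets, and the use of Python's built-in `hash()` reduced modulo the bucket count are all illustrative assumptions, not a standard API. Because `put` overwrites a value whose key is already present, each key appears only once.

```python
class HashTable:
    """Hypothetical minimal hash table: an array of buckets, each a list of (key, value) pairs."""

    def __init__(self, size=10):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one empty bucket per slot

    def _index(self, key):
        # Built-in hash() reduced to a slot number 0 .. size-1
        # (Python's % is non-negative for a positive size)
        return hash(key) % self.size

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:  # key already present: replace its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key: add it to its bucket

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default


table = HashTable()
table.put("apple", 3)
table.put("banana", 5)
table.put("apple", 7)       # updates the existing key instead of adding a duplicate
print(table.get("apple"))   # 7
print(table.get("cherry"))  # None
```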
The hash table is one of the most important data structures. It uses a special function, known as a hash function, that maps a given key to a location in the table so that elements can be accessed faster. Each piece of information stored in a hash table has two main components: a key and a value. A hash table is a common way to implement an associative array. The efficiency of the mapping depends on the efficiency of the hash function used.
This figure shows a hash table of size n = 10. Each position of the hash table is called a slot. This hash table has n slots, numbered {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}: slot 0, slot 1, slot 2, and so on. The hash table contains no items yet, so every slot is empty.
The mapping between an item and the slot where that item belongs in the hash table is called the hash function. The hash function takes any item in the collection and returns an integer in the range of slot names, from 0 to n-1.
Hash function
• Suppose we have the integer items {26, 70, 18, 31, 54, 93}. One common method of determining a hash key is the division method of hashing, and the formula is:
Hash Key = Key Value % Number of Slots in the Table
The division method, or remainder method, takes an item, divides it by the table size, and returns the remainder as its hash value.
Data Item   Value % No. of Slots   Hash Value
26          26 % 10 = 6            6
70          70 % 10 = 0            0
18          18 % 10 = 8            8
31          31 % 10 = 1            1
54          54 % 10 = 4            4
93          93 % 10 = 3            3
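The hash values in the table above can be reproduced with a short Python sketch of the division method. The function name `division_hash` is hypothetical.

```python
def division_hash(key, num_slots):
    """Division (remainder) method: hash key = key value % number of slots."""
    return key % num_slots


items = [26, 70, 18, 31, 54, 93]
for item in items:
    print(item, "->", division_hash(item, 10))
# 26 -> 6, 70 -> 0, 18 -> 8, 31 -> 1, 54 -> 4, 93 -> 3
```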
After computing the hash values, we can insert each item into the hash
table at the designated position as shown in the figure below:
• In the hash table, 6 of the 10 slots are now occupied. The ratio of stored items to table size is referred to as the load factor and is denoted by λ:
λ = No. of items / table size
For example, λ = 6/10 = 0.6.
• To search for an item, the hash function computes the slot for that item, and then the hash table is checked to see whether the item is present in that slot.
• Computing the hash value and accessing the hash table at that index takes a constant amount of time, O(1).
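The insertion, load factor, and search steps described above can be sketched in Python. This is a minimal sketch that assumes no collisions, which holds for these six items. Storing each item by plain assignment would overwrite an existing entry if two items shared a slot. The names `NUM_SLOTS`, `table`, and `search` are illustrative.

```python
NUM_SLOTS = 10
table = [None] * NUM_SLOTS  # None marks an empty slot

# Insert each item at the slot chosen by the division method.
# Plain assignment assumes no collisions, which holds for these six items.
for item in [26, 70, 18, 31, 54, 93]:
    table[item % NUM_SLOTS] = item

# Load factor: λ = no. of items / table size
occupied = sum(slot is not None for slot in table)
load_factor = occupied / NUM_SLOTS


def search(table, item):
    """Return True if item is stored in its home slot (no collision handling)."""
    slot = item % len(table)    # O(1): compute the slot
    return table[slot] == item  # O(1): check that one slot


print(table)              # [70, 31, None, 93, 54, None, 26, None, 18, None]
print(load_factor)        # 0.6
print(search(table, 54))  # True
print(search(table, 44))  # False: slot 4 holds 54
```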
What is Hashing?
• Hashing is the process of mapping a large number of data items to a smaller table with the help of a hash function.
• The hash function used in hashing is also known as a hashing algorithm or message digest function.
• It is a technique to convert a range of key values into a range of indexes of an array.
• It provides a faster searching method than linear or binary search.
• Hashing allows any data entry to be updated and retrieved in constant time, O(1), on average.
• Constant time O(1) means the time taken by the operation does not depend on the size of the data.
• Hashing is used with databases to enable items to be retrieved more quickly.
• It is used in creating and verifying digital signatures.
The main idea behind hashing is to store (key, value) pairs. Given a key, the algorithm computes the index at which the corresponding value is stored. It can be written as:
Index = hash(key)
Hashing is a well-known technique for searching for a particular element among many elements. It minimizes the number of comparisons performed during the search.
Advantage-
Unlike other searching techniques, hashing is extremely efficient.
The time taken to perform a search does not depend on the total number of elements.
It completes the search in constant time, O(1), on average.
What is Hash Function?
• A fixed process that converts a key to a hash key is known as a hash function.
• This function takes a key and maps it to a value of a certain length, which is called a hash value or hash.
• The hash value represents the original string of characters, but it is normally smaller than the original.
• In digital signatures, the hash value of the message is computed, and both the hash value and the signature are sent to the receiver. The receiver uses the same hash function to generate the hash value of the received message and compares it with the hash value received with the message.
• If the two hash values are the same, the message was transmitted without errors.
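The sender/receiver comparison described above can be sketched with Python's standard `hashlib` module. SHA-256 is an assumed choice, because the text does not name a hash function. The function name `digest` and the sample messages are hypothetical. This sketch covers only the hash comparison. A real digital signature also signs the hash with the sender's private key, which is omitted here.

```python
import hashlib


def digest(message: bytes) -> str:
    # SHA-256 maps a message of any length to a fixed-length hash (64 hex characters)
    return hashlib.sha256(message).hexdigest()


# Sender side: compute the hash value and send it along with the message
message = b"Transfer 100 to account 42"
sent_hash = digest(message)

# Receiver side: recompute with the same hash function and compare
received = b"Transfer 100 to account 42"
print(digest(received) == sent_hash)  # True: message unchanged

tampered = b"Transfer 900 to account 42"
print(digest(tampered) == sent_hash)  # False: message altered
```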
The properties of a good hash function are-
It is efficiently computable.
It minimizes the number of collisions.
It distributes the keys uniformly over the table.
Hashing Mechanism- In hashing,
An array data structure called a hash table is used to store the data items.
Based on the hash key value, data items are inserted into the hash table.
Hash Key Value- The hash key value is a special value that serves as an index for a data item.
It indicates where the data item should be stored in the hash table.
The hash key value is generated using a hash function.
A hash function is a function that maps any large number or string to a small integer value.
The hash function takes the data item as input and returns a small integer value as output. This small integer value is called a hash value. The hash value of the data item is then used as an index for storing it in the hash table.
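Mapping a string to a small integer can be sketched in Python as follows. The function name `string_hash` and the method of summing character codes are illustrative assumptions, chosen only because they are easy to follow. The sum is a possibly large number, and the remainder operation shrinks it to a valid index.

```python
def string_hash(text: str, table_size: int) -> int:
    """Hypothetical simple string hash: sum of character codes, reduced to 0 .. table_size-1."""
    total = sum(ord(ch) for ch in text)  # turn the string into one (possibly large) number
    return total % table_size            # shrink it to a valid table index


print(string_hash("cat", 11))  # 99 + 97 + 116 = 312, and 312 % 11 = 4
```

Summing character codes is cheap to compute, but it makes anagrams such as "cat" and "act" collide. Practical string hash functions usually weight each character by its position.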
Types of Hash Functions-
Mid Square Hash Function
Division Hash Function
Folding Hash Function, etc.
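The mid-square and folding methods from the list above can be sketched in Python, using one common form of each. The digit count r = 2 (that is, a table with 100 slots), the left-to-right split used for folding, and the function names are assumptions chosen for illustration.

```python
def mid_square_hash(key, r=2):
    """Mid-square method: square the key and take the middle r digits."""
    square = str(key * key).zfill(r)  # pad very small squares to at least r digits
    start = (len(square) - r) // 2    # for odd lengths the window leans left
    return int(square[start:start + r])


def folding_hash(key, r=2):
    """Shift folding: split the key into r-digit parts (left to right), add them,
    and keep the last r digits of the sum (drop any carry)."""
    digits = str(key)
    parts = [int(digits[i:i + r]) for i in range(0, len(digits), r)]
    return sum(parts) % (10 ** r)


print(mid_square_hash(3205))  # 3205^2 = 10272025 -> middle digits 72
print(folding_hash(3205))     # 32 + 05 = 37
print(folding_hash(7148))     # 71 + 48 = 119 -> last two digits 19
```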
Collision- When two different keys produce the same hash value, they map to the same index. This problem is known as a collision. In the above example, the key 26 is stored at index 6:
h(26) = 26%10 = 6
Any other key ending in 6, such as 36, also gives h(36) = 36%10 = 6.
Therefore, two values would be stored at the same index, i.e., 6, and this leads to the collision problem. To resolve such collisions, we use techniques known as collision resolution techniques.
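A collision can be detected by grouping keys according to their division-method index, as in the Python sketch below. The function name `find_collisions` is hypothetical. The extra key 36, which gives index 6 just as 26 does for a 10-slot table, is an illustrative assumption.

```python
from collections import defaultdict


def find_collisions(keys, num_slots):
    """Group keys by their division-method index and keep only indexes shared by 2+ keys."""
    slots = defaultdict(list)
    for key in keys:
        slots[key % num_slots].append(key)
    return {index: group for index, group in slots.items() if len(group) > 1}


print(find_collisions([26, 70, 18, 31, 54, 93, 36], 10))  # {6: [26, 36]}
```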
The following are the collision resolution techniques:
Open Hashing: It is also known as closed addressing.
Closed Hashing: It is also known as open addressing.
Open Hashing- In open hashing, one of the methods used to resolve collisions is known as the chaining method.
Let's first understand how chaining resolves a collision.
Suppose we have the list of key values
A = 3, 2, 9, 6, 11, 13, 7, 12, where m = 10 and h(k) = 2k+3.
In this case, the index is not simply k % m. Because h(k) = 2k+3, the index is computed as index = h(k) % m = (2k+3) % 10.
The index of key value 3 is:
index = h(3) = (2(3)+3)%10 = 9
The value 3 would be stored at the index 9.
The index of key value 2 is:
index = h(2) = (2(2)+3)%10 = 7
The value 2 would be stored at the index 7.
The index of key value 9 is:
index = h(9) = (2(9)+3)%10 = 1
The value 9 would be stored at the index 1.
The index of key value 6 is:
index = h(6) = (2(6)+3)%10 = 5
The value 6 would be stored at the index 5.
The index of key value 11 is:
index = h(11) = (2(11)+3)%10 = 5
The value 11 would be stored at index 5. Now two values (6, 11) are stored at the same index, i.e., 5. This leads to the collision problem, so we use the chaining method to resolve the collision. We create one more list node, add the value 11 to it, and link this newly created node to the list holding the value 6.
The index of key value 13 is:
index = h(13) = (2(13)+3)%10 = 9
The value 13 would be stored at index 9. Now two values (3, 13) are stored at the same index, i.e., 9. This leads to the collision problem, so we use the chaining method to resolve the collision. We create one more list node, add the value 13 to it, and link this newly created node to the list holding the value 3.
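The chaining example can be sketched in Python by inserting all eight keys of A with index = (2k+3) % 10. In this sketch, a Python list stands in for each linked list. That substitution is an assumption of the sketch; an implementation in C would link nodes explicitly. The function names `h` and `build_chained_table` are illustrative.

```python
def h(k, m=10):
    # Hash function from the example: index = (2k + 3) % m
    return (2 * k + 3) % m


def build_chained_table(keys, m=10):
    # Each slot holds a chain; a Python list stands in for the linked list
    table = [[] for _ in range(m)]
    for k in keys:
        table[h(k, m)].append(k)  # a colliding key is appended to the end of its chain
    return table


A = [3, 2, 9, 6, 11, 13, 7, 12]
table = build_chained_table(A)
for index, chain in enumerate(table):
    if chain:
        print(index, chain)
# Expected:
# 1 [9]
# 5 [6, 11]
# 7 [2, 7, 12]
# 9 [3, 13]
```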