Module 5 Hashing

Data structures & Applications

(BCS304)

Dr. Hemavathi P
Associate Professor
Dept. of CSE
BIT
Text Books
1. Ellis Horowitz and Sartaj Sahni, Fundamentals of Data Structures in C, 2nd Ed, Universities Press, 2014.
Module 5:
Hashing
Outline
• Hashing
• Hash Table organizations
• Hashing Functions
• Dynamic Hashing
Hashing
• There are several searching techniques, such as linear search, binary search and search trees.
• In these techniques, the time taken to search for a particular element depends on the total number of elements.

• Linear search takes O(n) time to search an unsorted array of n elements.
• Binary search takes O(log n) time to search a sorted array of n elements.
• Searching a balanced binary search tree of n elements takes O(log n) time.

• The main drawback of these techniques:
• As the number of elements increases, the time taken to perform the search also increases.
• This becomes problematic when the total number of elements becomes very large.
Hashing
• There are many ways to represent a dictionary, and hashing is one of the best.
• Hashing is a general-purpose solution that can be used in almost all situations.
• Hashing is a computation technique in which a hash function takes variable-length data as input and produces a shortened, fixed-length value as output. The output is often called a "Hash Code", "Key", or simply "Hash". The memory location in which the data is stored is called a "Data Bucket".
Characteristics of Hashing
• Hashing techniques have the following characteristics:
• Hashing is deterministic: however many times the function is invoked on the same input, it delivers the same fixed-length result.
• Hashing is unidirectional: the Key cannot be used to retrieve the original data, so hashing is irreversible.
Applications of Hashing
• Hashing is applicable in the following areas:
• Password verification.
• Associating filenames with their paths in operating systems.
• Data structures in which key-value pairs are stored: each key is unique, whereas the values associated with different keys may be the same or different.
• Board games such as chess and tic-tac-toe.
• Graphics processing, where a large amount of data needs to be matched and fetched.
• Database Management Systems, where a very large number of records must be searched, queried and matched for retrieval. For example, the DBMS used in banking or in large public transport reservation software.
Hashing / Hashing Mechanism
• Hashing is a well-known technique for searching for a particular element among several elements.
• It minimizes the number of comparisons performed during the search.
• Unlike other searching techniques,
• Hashing is extremely efficient.
• On average, the time taken to perform the search does not depend on the total number of elements.
• It completes the search in constant time, O(1), on average; in the worst case it takes O(n).
• This method uses a hash function to map the keys into a table, which is called a hash table.
Hash Table
• A hash table is a data structure used for storing and accessing data very quickly. Insertion of data into the table is based on a key value.
• Hence every entry in the hash table is identified by some key.
• Using this key, data can be found in the hash table with only a few key comparisons, so the searching time depends on the size of the hash table.
Hash Function
• Hash functions are mathematical functions that are executed to generate the addresses of data records. The memory locations that store the data are called 'Data Buckets'.
• Hash functions are used in cryptographic signatures, in protecting the privacy of sensitive data, and in verifying the correctness of received files and texts. In computation, hashing is used in data processing to locate a single string of data in an array, or to calculate the direct address of a record on disk from its Hash Code or Key.
Hash Function
• The function that transforms the data into a hash value is called a hash function.
• OR
• A hash function is a function that is applied to a key to produce an integer, which can be used as an address in the hash table. Hence the same hash function can be used for accessing the data in the hash table.
• The integer returned by the hash function is called the hash key.
Hashing
• Key-value pairs are stored in a fixed size table called a hash table.
• A hash table is partitioned into many buckets.
• Each bucket has many slots.
• Each slot holds one record.
• A hash function h(x) transforms the identifier (key) into an address in the hash
table
Hash table
[Figure: a hash table drawn as a grid of b buckets (rows 0 to b-1), each containing s slots (columns 0 to s-1).]
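The bucket-and-slot organization described above can be sketched in C as follows. This is only an illustrative sketch: the names Record, HashTable, ht_init, ht_put and ht_get, the sizes B and S, the use of key 0 as the "empty slot" marker, and the choice of key mod B as the hash function are all assumptions made for the example, not part of the slides.

```c
#include <string.h>

#define B 7   /* number of buckets (assumed value) */
#define S 2   /* slots per bucket (assumed value) */

/* One record per slot; key 0 marks an empty slot, so keys must be positive. */
typedef struct {
    int key;
    int value;
} Record;

typedef struct {
    Record slot[B][S];   /* b buckets, each with s slots */
} HashTable;

/* Home bucket of a key; the division method is assumed here. */
static int home_bucket(int key) { return key % B; }

/* Mark every slot as empty. */
void ht_init(HashTable *t) { memset(t, 0, sizeof *t); }

/* Store (key, value). Returns 1 on success, 0 if the home bucket overflows. */
int ht_put(HashTable *t, int key, int value)
{
    int b = home_bucket(key);
    for (int s = 0; s < S; s++) {
        if (t->slot[b][s].key == 0 || t->slot[b][s].key == key) {
            t->slot[b][s].key = key;
            t->slot[b][s].value = value;
            return 1;
        }
    }
    return 0;   /* all s slots of the bucket are taken */
}

/* Look up key. Returns 1 and writes *out if found, 0 otherwise. */
int ht_get(const HashTable *t, int key, int *out)
{
    int b = home_bucket(key);
    for (int s = 0; s < S; s++) {
        if (t->slot[b][s].key == key) {
            *out = t->slot[b][s].value;
            return 1;
        }
    }
    return 0;
}
```

With B = 7 and S = 2, the keys 50 and 85 both land in bucket 1 and fill its two slots; a third key that hashes to bucket 1, such as 92, overflows the bucket and ht_put returns 0.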
Properties of a good hash function
1. Low cost
• The cost of computing the hash function, and hence the cost of searching, should be low.
2. Determinism
• The hash procedure must be deterministic, i.e. the same hash value must be generated for a given input value. It must not depend on variable data such as the time of day or the memory address of the object.
3. Uniformity
• It must map the keys as evenly as possible over its output range, which minimizes the number of collisions.
Types of Hash Functions
• There are various types of hash functions available, such as:
• Division Method
• Mid Square Method
• Folding Method
• Multiplication Method

• The choice of hash function is left to the user.


1. Division method
• This is the simplest method of generating a hash value. The hash function divides the key k by M and uses the remainder as the hash value.
Formula:
h(k) = k mod M
k is the key value, and
M is the size of the hash table.

• M is best chosen to be a prime number, as this helps distribute the keys more uniformly. The hash value depends only on the remainder of the division.
Ex:
• k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
• k = 1276
M = 11
h(1276) = 1276 mod 11
=0
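A minimal C sketch of the division method, assuming non-negative integer keys and M > 0. The function name hash_division is hypothetical.

```c
/* Division method: h(k) = k mod M. Assumes m > 0. */
unsigned int hash_division(unsigned int k, unsigned int m)
{
    return k % m;
}
```

Calling hash_division(12345, 95) gives 90 and hash_division(1276, 11) gives 0, matching the two worked examples.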
• Pros:
1. This method works for any value of M.
2. The division method is very fast, since it requires only a single division operation.
• Cons:
1. This method can lead to poor performance, since consecutive keys map to consecutive hash values in the hash table.
2. Extra care is sometimes needed when choosing the value of M.
2. Mid square method
• The mid-square method is a very good hashing method. It involves two steps to compute the hash value:

• Square the value of the key k, i.e. compute k².
• Extract the middle r digits of the square as the hash value.
Formula:

h(k) = middle r digits of (k x k)

Here,
k is the key value.

• The value of r is decided based on the size of the table.
Example:
1. Suppose the hash table has 100 memory locations. So r = 2, because two digits are required to map the key to a memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60 (middle 2 digits)
The hash value obtained is 60
2. Suppose we want to place a record with key 3101 and the size of the table is 1000, so r = 3.
3101 x 3101 = 9616201, i.e. h(3101) = 162 (middle 3 digits)
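One possible way to extract the middle r digits is sketched below in C. The function name hash_mid_square is hypothetical. The sketch also assumes that when the middle cannot be centred exactly, the window is taken one digit toward the low-order end; the slides do not specify this case.

```c
/* Mid-square method: square the key, then keep the middle r digits.
   Assumes 1 <= r <= 9 so the result fits in an unsigned int. */
unsigned int hash_mid_square(unsigned int k, int r)
{
    unsigned long long sq = (unsigned long long)k * k;

    /* Count the decimal digits of the square. */
    int digits = 0;
    for (unsigned long long t = sq; t > 0; t /= 10)
        digits++;

    /* Number of low-order digits to discard so the window is centred. */
    int drop = digits > r ? (digits - r) / 2 : 0;
    for (int i = 0; i < drop; i++)
        sq /= 10;

    /* Keep the lowest r digits of what remains. */
    unsigned long long pow_r = 1;
    for (int i = 0; i < r; i++)
        pow_r *= 10;
    return (unsigned int)(sq % pow_r);
}
```

For the examples above, hash_mid_square(60, 2) yields 60 and hash_mid_square(3101, 3) yields 162.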
Pros:

• The performance of this method is good, because most or all digits of the key contribute to the middle digits of the squared result.
• The result is not dominated by the distribution of the top digit or the bottom digit of the original key value.
Cons:

• The size of the key is a limitation of this method: squaring a large key roughly doubles the number of digits.
• Collisions can still occur, although they can be reduced.
3. Digit folding method
• Divide the key value k into a number of parts k1, k2, k3, …, kn, where each part has the same number of digits except the last part, which may have fewer digits than the others.
• Add the individual parts. The hash value is obtained by ignoring the final carry, if any.
Formula:

k = k1, k2, k3, k4, …, kn
s = k1 + k2 + k3 + k4 + … + kn
h(k) = s

Here,
s is obtained by adding the parts of the key k
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51

Note:
The number of digits in each part varies depending upon the size of the hash
table. Suppose for example the size of the hash table is 100, then each part
must have two digits except for the last part which can have a lesser number
of digits.
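The folding steps can be sketched in C as below. The function name hash_folding and the parameter part_digits are hypothetical, and the final carry is dropped by taking the sum modulo 10^part_digits, which is one reading of "ignoring the last carry".

```c
#include <stdio.h>

/* Digit folding: split the key into groups of part_digits digits, starting
   from the most significant digit, add the groups, and drop any carry
   beyond part_digits digits. Assumes 1 <= part_digits <= 9. */
unsigned int hash_folding(unsigned int k, int part_digits)
{
    char buf[16];
    int len = snprintf(buf, sizeof buf, "%u", k);   /* decimal digits of k */

    unsigned int limit = 1;                         /* 10^part_digits */
    for (int i = 0; i < part_digits; i++)
        limit *= 10;

    unsigned int sum = 0;
    for (int i = 0; i < len; i += part_digits) {
        unsigned int part = 0;
        for (int j = i; j < i + part_digits && j < len; j++)
            part = part * 10 + (unsigned int)(buf[j] - '0');
        sum += part;                                /* add this group */
    }
    return sum % limit;                             /* ignore the final carry */
}
```

With two-digit parts, hash_folding(12345, 2) computes 12 + 34 + 5 = 51. For a key such as 9999 the sum 99 + 99 = 198 has a carry, which is dropped to give 98.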
4. Multiplication Method
This method involves the following steps:
1. Choose a constant value A such that 0 < A < 1.
2. Multiply the key value k by A.
3. Extract the fractional part of kA.
4. Multiply the result of the above step by the size of the hash table, i.e. M.
5. The resulting hash value is obtained by taking the floor of the result obtained in step 4.
• Formula:
h(k) = floor (M (kA mod 1))
Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
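A short C sketch of the five steps, using double-precision arithmetic. The function name hash_multiplication is hypothetical. Because floating-point values are approximate, results for other inputs may differ from exact arithmetic when M(kA mod 1) lies extremely close to an integer.

```c
/* Multiplication method: h(k) = floor(M * (k*A mod 1)), with 0 < A < 1. */
unsigned int hash_multiplication(unsigned int k, double a, unsigned int m)
{
    double x = (double)k * a;                         /* step 2: k * A */
    double frac = x - (double)(unsigned long long)x;  /* step 3: fractional part */
    return (unsigned int)(frac * m);                  /* steps 4-5: truncation equals
                                                         floor for non-negative values */
}
```

For k = 12345, A = 0.357840 and M = 100 this returns 53, as in the worked example.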
Pros:
• The multiplication method works with any value of A between 0 and 1, although some values tend to give better results than others.
Cons:
• The multiplication method is best suited to a table size that is a power of two, where computing the index from the key is very fast.
Types of Hashing
• Static Hashing
• Dynamic Hashing
Static Hashing
• It is a hashing technique that enables users to look up a fixed data set. That is, the data in the directory does not change; it is "static". In this hashing technique, the number of data buckets in memory remains constant.
• Operations Provided by Static Hashing
• Delete: Search for the record's address and delete the record at that address, or delete a chunk of records stored at that address in memory.
• Insertion: When a new record is entered, the hash function h calculates the bucket address h(k) for the search key k, where the record is going to be stored.
• Search: A record is retrieved by using the hash function to locate the address of the bucket where the data is stored.
• Update: A record can be updated once it has been located in its data bucket.
• Advantages of Static Hashing
• Offers very good performance for small databases.
• Allows the Primary Key value to be used as the Hash Key.
• Disadvantages of Static Hashing
• It cannot work efficiently with databases that grow or shrink.
• It is not a good option for large databases.
• Bucket overflow occurs when there is more data than the allocated memory can hold.
Dynamic Hashing
• It is a hashing technique that enables users to look up a dynamic data set. That is, data is added to or removed from the data set on demand, hence the name 'dynamic' hashing. The number of data buckets therefore increases or decreases depending on the number of records.
• In this hashing technique, the number of data buckets in memory keeps changing.
• Operations Provided by Dynamic Hashing
• Delete: Locate the desired location and delete the data (or a chunk of data) at that location.
• Insertion: Insert new data into the data bucket if there is space available in the bucket.
• Query: Compute the bucket address and retrieve the data stored there.
• Update: Perform a query to locate the data, then update it.
• Advantages of Dynamic Hashing
• It works well with scalable data.
• It can address a large amount of memory in which the data size is always changing.
• Bucket overflow occurs rarely, or only very late.
• Disadvantages of Dynamic Hashing
• The location of the data in memory keeps changing as the number of buckets changes. Hence, if there is a very large increase in data, maintaining the bucket address table becomes a challenge.
Differences between Static and Dynamic Hashing
Collision in Hashing
• A hash function is used to compute the hash value for a key.
• The hash value is then used as an index to store the key in the hash table.
• A hash function may return the same hash value for two or more keys.
• When the hash value of a key maps to an already occupied bucket of the hash table, it is called a collision.
Collision Resolution Techniques
• Collision resolution techniques are the techniques used for resolving or handling collisions.
• Collision resolution techniques are classified as:
• Separate Chaining (Open Hashing)
• Open Addressing (Closed Hashing)
  • Linear Probing
  • Quadratic Probing
  • Double Hashing
Separate Chaining
• To handle a collision,
• This technique attaches a linked list to the slot in which the collision occurs.
• The new key is then inserted in the linked list.
• The linked lists attached to the slots look like chains.
• That is why this technique is called separate chaining.
Example-Separate Chaining
• Using the hash function ‘key mod 7’, insert the following sequence of
keys in the hash table-
• 50, 700, 76, 85, 92, 73 and 101
Step-1
• Draw an empty hash table.
• For the given hash function, the possible range of hash values is [0, 6].
• So, draw an empty hash table consisting of 7 buckets.
Step-2
• Insert the given keys in the hash table one by one.
• The first key to be inserted in the hash table = 50.
• Bucket of the hash table to which key 50 maps = 50 mod 7 = 1.
• So, key 50 is inserted in bucket-1 of the hash table.
Step-3
• The next key to be inserted in the hash table = 700.
• Bucket of the hash table to which key 700 maps = 700 mod 7 = 0.
• So, key 700 is inserted in bucket-0 of the hash table.
Step-4
• The next key to be inserted in the hash table = 76.
• Bucket of the hash table to which key 76 maps = 76 mod 7 = 6.
• So, key 76 is inserted in bucket-6 of the hash table.
Step-5
• The next key to be inserted in the hash table = 85.
• Bucket of the hash table to which key 85 maps = 85 mod 7 = 1.
• Since bucket-1 is already occupied, a collision occurs.
• Separate chaining handles the collision by creating a linked list at bucket-1.
• So, key 85 is added to the chain of bucket-1, after 50.
Step-6
• The next key to be inserted in the hash table = 92.
• Bucket of the hash table to which key 92 maps = 92 mod 7 = 1.
• Since bucket-1 is already occupied, a collision occurs.
• Separate chaining handles the collision by extending the linked list at bucket-1.
• So, key 92 is added to the chain of bucket-1, after 85.
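Step-7
• The next key to be inserted in the hash table = 73.
• Bucket of the hash table to which key 73 maps = 73 mod 7 = 3.
• So, key 73 is inserted in bucket-3 of the hash table.
Step-8
• The next key to be inserted in the hash table = 101.
• Bucket of the hash table to which key 101 maps = 101 mod 7 = 3.
• Since bucket-3 is already occupied, a collision occurs.
• Separate chaining handles the collision by creating a linked list at bucket-3.
• So, key 101 is added to the chain of bucket-3, after 73.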
Algorithm to insert an item using
Chaining Approach

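/* NODE is assumed to be a pointer to a list node with fields info and link;
   insert_rear() appends an item to a chain and returns the chain's head. */
void insert_chaining(int item, NODE a[], int n)
{
    int h_value;
    h_value = item % n;                           /* home bucket of the item */
    a[h_value] = insert_rear(item, a[h_value]);   /* append to that bucket's chain */
}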
Algorithm to search for an item

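NODE search(int key, NODE first);      /* defined below */

/* Returns 1 if key is in the table, 0 otherwise. */
int search_ht(int key, NODE a[], int n)
{
    int h_value;
    NODE cur;
    h_value = key % n;                 /* bucket whose chain may hold the key */
    cur = search(key, a[h_value]);
    if (cur == NULL) return 0;
    return 1;
}

/* Walks one chain; returns the node holding key, or NULL if it is absent. */
NODE search(int key, NODE first)
{
    NODE cur;
    cur = first;
    while (cur != NULL)
    {
        if (key == cur->info) return cur;
        cur = cur->link;
    }
    return NULL;
}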
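The chaining routines can be put together into a self-contained sketch. The node layout (fields info and link) follows the names used in the routines above, but the struct definition itself, the body of insert_rear, and the helper chain_length are assumptions added for illustration, since the slides do not show them.

```c
#include <stdlib.h>

/* Assumed node layout: an integer key and a pointer to the next node. */
struct node {
    int info;
    struct node *link;
};
typedef struct node *NODE;

/* Append item at the end of the chain starting at first; return the head. */
NODE insert_rear(int item, NODE first)
{
    NODE temp = malloc(sizeof *temp);
    if (temp == NULL) abort();          /* out of memory: give up in this sketch */
    temp->info = item;
    temp->link = NULL;
    if (first == NULL) return temp;     /* empty chain: new node is the head */
    NODE cur = first;
    while (cur->link != NULL)           /* walk to the last node */
        cur = cur->link;
    cur->link = temp;
    return first;
}

/* Insert item into the chain of its home bucket (item mod n). */
void insert_chaining(int item, NODE a[], int n)
{
    int h_value = item % n;
    a[h_value] = insert_rear(item, a[h_value]);
}

/* Return 1 if key is present in the table, 0 otherwise. */
int search_ht(int key, NODE a[], int n)
{
    for (NODE cur = a[key % n]; cur != NULL; cur = cur->link)
        if (cur->info == key) return 1;
    return 0;
}

/* Hypothetical helper: number of keys stored in one chain. */
int chain_length(NODE first)
{
    int len = 0;
    for (; first != NULL; first = first->link)
        len++;
    return len;
}
```

Inserting 50, 700, 76, 85, 92, 73 and 101 into a table of 7 empty (NULL) buckets leaves the chain 50, 85, 92 in bucket 1 and the chain 73, 101 in bucket 3, as in the worked example.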
Open Addressing
• In open addressing,
• Unlike separate chaining, all the keys are stored inside the hash table itself.
• No key is stored outside the hash table.

• Techniques used for open addressing are:
• Linear Probing
• Quadratic Probing
• Double Hashing
Operations in Open Addressing
• Insert Operation:
• Hash function is used to compute the hash value for a key to be inserted.
• Hash value is then used as an index to store the key in the hash table.

• In case of collision,
• Probing is performed until an empty bucket is found.
• Once an empty bucket is found, the key is inserted.
• Probing is performed in accordance with the technique used for open
addressing.
1. Linear Probing
• In linear probing,
• When a collision occurs, we probe the next bucket in sequence, wrapping around at the end of the table.
• We keep probing until an empty bucket is found.

• Advantage:
• It is easy to compute.

• Disadvantage:
• The main problem with linear probing is clustering.
• Many consecutive occupied buckets form groups.
• It then takes longer to search for an element or to find an empty bucket.
Linear Probing
Example- Linear Probing
Algorithm to insert an item using
Linear Probing
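#include <stdio.h>

/* Buckets holding 0 are treated as empty, so keys are assumed to be non-zero. */
void insert_LP(int item, int a[], int n)
{
    int i, index, h_value;
    h_value = item % n;
    for (i = 0; i < n; i++)
    {
        index = (h_value + i) % n;     /* next bucket in the probe sequence */
        if (a[index] == 0) break;      /* empty bucket found */
    }
    if (a[index] == 0)
        a[index] = item;
    else
        printf("hash table is full");
}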
Open Addressing
• Search Operation:
• To search for a particular key,
• Its hash value is obtained using the same hash function.
• Using the hash value, that bucket of the hash table is checked.
• If the required key is found there, the search succeeds.
• Otherwise, the subsequent buckets are checked until the required key or an empty bucket is found.
• An empty bucket indicates that the key is not present in the hash table.
Algorithm to search for an item
when inserted using Linear Probing
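/* Returns 1 if key is present, 0 otherwise. */
int search_hash_table(int key, int a[], int n)
{
    int i, index, h_value;
    h_value = key % n;
    for (i = 0; i < n; i++)
    {
        index = (h_value + i) % n;
        if (key == a[index]) return 1;   /* key found */
        if (a[index] == 0) return 0;     /* empty bucket: key is absent */
    }
    return 0;                            /* every bucket probed without a match */
}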
Linear Probing
• Delete Operation:
• The key is first searched and then deleted.
• After deleting the key, that particular bucket is marked as “deleted”.
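The reason for marking a bucket as "deleted" rather than emptying it is that a search stops at the first empty bucket; emptying a bucket in the middle of a probe sequence would hide the keys stored after it. A sketch of linear probing with such a marker is given below. The names lp_search, lp_insert, lp_delete and the marker value -1 are assumptions for illustration, and keys are assumed to be positive.

```c
#define EMPTY    0    /* bucket that has never held a key */
#define DELETED -1    /* assumed marker for a bucket whose key was removed */

/* Return the index of key, or -1 if it is not in the table. */
int lp_search(int key, const int a[], int n)
{
    int h = key % n;
    for (int i = 0; i < n; i++) {
        int index = (h + i) % n;
        if (a[index] == key) return index;
        if (a[index] == EMPTY) return -1;   /* a DELETED bucket does not stop the probe */
    }
    return -1;
}

/* Store item in the first empty or deleted bucket of its probe sequence.
   Returns 1 on success, 0 if the table is full. Duplicates are not checked. */
int lp_insert(int item, int a[], int n)
{
    int h = item % n;
    for (int i = 0; i < n; i++) {
        int index = (h + i) % n;
        if (a[index] == EMPTY || a[index] == DELETED) {
            a[index] = item;
            return 1;
        }
    }
    return 0;
}

/* Search for key and mark its bucket as deleted. Returns 1 if it was found. */
int lp_delete(int key, int a[], int n)
{
    int index = lp_search(key, a, n);
    if (index < 0) return 0;
    a[index] = DELETED;
    return 1;
}
```

With a table of size 7, the keys 50, 85 and 92 occupy buckets 1, 2 and 3. After deleting 85, a search for 92 still passes over bucket 2 and finds the key in bucket 3, and a later insertion can reuse bucket 2.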
2. Quadratic Probing
• In quadratic probing,
• When a collision occurs, we probe the bucket that is i² positions away from the home bucket in the i-th iteration.
• We keep probing until an empty bucket is found.
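A minimal C sketch of insertion with quadratic probing, assuming the home bucket is item mod n, that 0 marks an empty bucket, and that the probe wraps around the table. The function name insert_qp is hypothetical. Unlike linear probing, the quadratic sequence does not necessarily visit every bucket, so the sketch gives up after n probes.

```c
/* Insert item using quadratic probing: the i-th probe looks at bucket
   (h + i*i) mod n. Returns the bucket used, or -1 if none of the first
   n probes found an empty bucket. Buckets holding 0 are empty. */
int insert_qp(int item, int a[], int n)
{
    int h = item % n;
    for (int i = 0; i < n; i++) {
        int index = (h + i * i) % n;
        if (a[index] == 0) {
            a[index] = item;
            return index;
        }
    }
    return -1;
}
```

With n = 7, inserting 50, 85 and 92 (all with home bucket 1) places them in buckets 1, 2 and 5: the probes for 92 visit 1, then 1 + 1 = 2, then 1 + 4 = 5.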
Quadratic Probing
Example – Quadratic Probing
3. Double Hashing
• In double hashing,
• We use a second hash function hash2(x) and, in the i-th iteration, probe the bucket that is i * hash2(x) positions away from the home bucket.
• It requires more computation time, as two hash functions need to be computed.
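A sketch of insertion with double hashing is given below. The slides do not define hash2(x); the form R - (x mod R), with R a prime smaller than the table size, is a commonly used choice and is assumed here, as are the value R = 5 and the function name insert_dh. This form never returns 0, so the probe always moves.

```c
#define R 5   /* assumed prime smaller than the table size */

/* Assumed second hash function; its result lies in 1..R, never 0. */
static int hash2(int x) { return R - (x % R); }

/* Insert item using double hashing: the i-th probe looks at bucket
   (hash1 + i * hash2) mod n, where hash1 = item mod n.
   Returns the bucket used, or -1 if no empty bucket was found in n probes.
   Buckets holding 0 are empty, so keys are assumed to be positive. */
int insert_dh(int item, int a[], int n)
{
    int h1 = item % n;
    int step = hash2(item);
    for (int i = 0; i < n; i++) {
        int index = (h1 + i * step) % n;
        if (a[index] == 0) {
            a[index] = item;
            return index;
        }
    }
    return -1;
}
```

With n = 7, key 50 goes to bucket 1. Key 85 also has home bucket 1 and step 5 - (85 mod 5) = 5, so it lands in bucket 6. Key 92 has step 5 - (92 mod 5) = 3 and lands in bucket 4. Keys that collide at the same home bucket therefore follow different probe sequences.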
Double Hashing
Example –Double Hashing
Separate chaining Vs Open
Addressing
Comparison of Open Addressing
Techniques
Function to sort the elements
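/* hash() and insert() are defined elsewhere and are not shown in these slides.
   hash() presumably returns a divisor that maps every element to 0..9, and
   insert() presumably keeps each bucket's list in sorted order. */
void hash_sort(int a[], int n)
{
    NODE b[10], temp;
    int i, j, digit, h_value;
    h_value = hash(a, n);
    for (i = 0; i < 10; i++)
        b[i] = NULL;                          /* start with ten empty buckets */

    for (i = 0; i < n; i++)
    {
        digit = a[i] / h_value;               /* bucket number of a[i] */
        b[digit] = insert(a[i], b[digit]);
    }

    for (i = j = 0; i < 10; i++)              /* copy the buckets back in order */
    {
        temp = b[i];
        while (temp != NULL)
        {
            a[j++] = temp->info;
            temp = temp->link;
        }
    }
}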
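A self-contained sketch of this sort is given below. Since the slides do not show hash() or insert(), both are filled in here under stated assumptions: hash() returns the largest power of 10 that does not exceed the maximum element, so that a[i] / h_value is the leading digit (0 to 9), and insert() places each item into its bucket's list in ascending order. The elements are assumed to be non-negative.

```c
#include <stdlib.h>

struct node { int info; struct node *link; };
typedef struct node *NODE;

/* Assumed divisor: largest power of 10 not exceeding the maximum element. */
static int hash(const int a[], int n)
{
    int max = 0, div = 1;
    for (int i = 0; i < n; i++)
        if (a[i] > max) max = a[i];
    while (max / div >= 10)
        div *= 10;
    return div;
}

/* Assumed helper: insert item into an ascending list; return the head. */
static NODE insert(int item, NODE first)
{
    NODE temp = malloc(sizeof *temp);
    if (temp == NULL) abort();              /* out of memory: give up in this sketch */
    temp->info = item;
    if (first == NULL || item < first->info) {
        temp->link = first;                 /* new smallest item becomes the head */
        return temp;
    }
    NODE cur = first;
    while (cur->link != NULL && cur->link->info <= item)
        cur = cur->link;
    temp->link = cur->link;
    cur->link = temp;
    return first;
}

void hash_sort(int a[], int n)
{
    NODE b[10] = {0};                       /* ten empty buckets */
    int h_value = hash(a, n);

    for (int i = 0; i < n; i++) {
        int digit = a[i] / h_value;         /* leading digit selects the bucket */
        b[digit] = insert(a[i], b[digit]);
    }
    for (int i = 0, j = 0; i < 10; i++) {   /* concatenate buckets 0..9 */
        NODE temp = b[i];
        while (temp != NULL) {
            NODE next = temp->link;
            a[j++] = temp->info;
            free(temp);                     /* release the node after copying */
            temp = next;
        }
    }
}
```

For the array 45, 12, 89, 7, 63, 12, 30 the divisor is 10, the elements fall into buckets 4, 1, 8, 0, 6, 1 and 3, and reading the buckets back in order yields 7, 12, 12, 30, 45, 63, 89.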
