
HASHING


II YR / III    CS8391 - DATA STRUCTURES


Hashing is a technique used to store, retrieve and find data in a data structure
called a hash table. It overcomes the drawbacks of linear search (which compares the key
against every element) and binary search (which needs a sorted list). It involves two important concepts:
• Hash table
• Hash function
Hash table
A hash table is a data structure that is used to store and retrieve data (keys) very
quickly.
It is an array of some fixed size, containing the keys. The cells of the hash table are
indexed from 0 to Tablesize – 1.
Each key is mapped to some number in the range 0 to Tablesize – 1. This mapping is
called the hash function.
A data item is inserted into the hash table at the position given by the hash key value
that the hash function returns for its key.
Using the same hash key value, the data can be retrieved from the hash table with only
a few key comparisons.
The load factor of a hash table is calculated using the formula:

Load factor = (Number of data elements in the hash table) / (Size of the hash table)

Factors affecting Hash Table Design

• Hash function
• Table size
• Collision handling scheme

[Figure: simple hash table with table size = 10, cells indexed 0 to 9]


Hash function:
It is a function that distributes the keys evenly among the cells in the
hash table.
The same hash function is used both to store data in the hash table and to retrieve it.
The hash function is used to implement the hash table.
The integer value returned by the hash function is called the hash key.
If the input keys are integers, the commonly used hash function is

H(key) = key % Tablesize

typedef unsigned int index;

/* Hash function for string keys: adds up the character codes of the key
   and reduces the sum modulo the table size. */
index Hash(const char *key, int Tablesize)
{
    unsigned int Hashval = 0;
    while (*key != '\0')
        Hashval += *key++;
    return Hashval % Tablesize;
}

A simple hash function

Types of Hash Functions


1. Division Method
2. Mid Square Method
3. Multiplicative Hash Function
4. Digit Folding
1. Division Method:
It depends on the remainder of a division. The divisor is the table size.
The formula is H(key) = key % table size

E.g. consider the following keys (36, 18, 72, 43, 6) with table size = 8
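
As a rough illustration of the division method, the sketch below applies H(key) = key % table size to the keys of this example. The function name division_hash is an assumption made for this illustration and does not appear in the notes.

```c
/* Division-method hash: the remainder of the key divided by the table size.
   division_hash is an illustrative name, not taken from the notes. */
unsigned int division_hash(unsigned int key, unsigned int table_size)
{
    return key % table_size;
}
```

With table size 8, the keys 36, 18, 72, 43 and 6 hash to cells 4, 2, 0, 3 and 6 respectively, so no two of them collide.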

2. Mid Square Method:


We first square the item, and then extract some portion of the resulting digits. For
example, if the item were 44, we would first compute 44^2 = 1,936 and then extract the middle
two digits, 93, from the answer. The key 44 is stored at index 93.
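
One possible way to code the mid-square method is sketched below. It assumes that "the middle" means the two central decimal digits of the square, which matches the example of 44; the function name mid_square_hash and the handling of squares with an odd number of digits are assumptions of this sketch.

```c
/* Mid-square hash: square the key and keep the middle two decimal digits.
   Illustrative sketch; assumes the square fits in an unsigned long long. */
unsigned int mid_square_hash(unsigned int key)
{
    unsigned long long square = (unsigned long long)key * key;
    unsigned long long t;
    int digits = 0;
    int drop, i;

    /* count the decimal digits of the square */
    for (t = square; t > 0; t /= 10)
        digits++;

    /* number of low-order digits lying to the right of the middle pair */
    drop = (digits - 2) / 2;
    if (drop < 0)
        drop = 0;

    for (i = 0; i < drop; i++)
        square /= 10;

    return (unsigned int)(square % 100);
}
```

For the key 44 the square is 1936, one low-order digit is dropped to give 193, and the last two digits of that, 93, are returned.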

3. Multiplicative Hash Function:


The key is multiplied by some constant value.
The hash function is given by
H(key) = Floor(P * (key * A))
P = integer constant [e.g. P = 50]
A = constant real number [A = 0.6180339887, the value suggested by Donald Knuth]

E.g. Key 107


H(107) = Floor(50 * (107 * 0.6180339887))
       = Floor(3306.481840)
H(107) = 3306
Consider a table size of 5000: the key 107 is stored at index 3306 (the indices run from 0 to 4999).
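
The formula H(key) = Floor(P * (key * A)) can be sketched in C as follows. This is only an illustration of the formula as stated in these notes; the function name multiplicative_hash is an assumption, and A is taken as 0.6180339887, the usual value of Knuth's constant.

```c
/* Multiplicative hash as stated in the notes: Floor(P * (key * A)).
   Illustrative sketch. Converting a non-negative double to an integer
   type truncates it, which is the same as taking the floor here. */
unsigned long multiplicative_hash(unsigned long key, unsigned long p)
{
    const double A = 0.6180339887;   /* constant suggested by Knuth */
    return (unsigned long)((double)p * ((double)key * A));
}
```

Calling multiplicative_hash(107, 50) gives 3306, in agreement with the worked example. The caller is responsible for choosing a table large enough to hold the resulting index.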

4. Digit Folding Method:

The folding method for constructing hash functions begins by dividing the item into
equal-size pieces (the last piece may not be of equal size). These pieces are then added together
to give the resulting hash key value. For example, if our item was the phone number 436-555-
4601, we would take the digits and divide them into groups of 2 (43, 65, 55, 46, 01). After the
addition, 43+65+55+46+01, we get 210. If we assume our hash table has 11 slots, then we need
to perform the extra step of dividing by 11 and keeping the remainder. In this case 210 % 11 is
1, so the phone number 436-555-4601 hashes to slot 1.
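
The folding steps above might be coded as in the sketch below. It assumes the key arrives as a string, takes groups of two digits from the left, and skips any character that is not a digit (such as the hyphens in a phone number); the name folding_hash is an assumption of this example.

```c
/* Digit folding: split the decimal digits of the key into groups of two,
   add the groups, and reduce the sum modulo the table size.
   Illustrative sketch; non-digit characters are ignored. */
unsigned int folding_hash(const char *key, unsigned int table_size)
{
    unsigned int sum = 0;     /* running total of the two-digit groups */
    unsigned int group = 0;   /* value of the group being built */
    int count = 0;            /* digits collected in the current group */

    for (; *key != '\0'; key++) {
        if (*key < '0' || *key > '9')
            continue;
        group = group * 10 + (unsigned int)(*key - '0');
        if (++count == 2) {
            sum += group;
            group = 0;
            count = 0;
        }
    }
    sum += group;             /* last piece, if it has only one digit */
    return sum % table_size;
}
```

For the key "436-555-4601" and a table of 11 slots, the groups 43, 65, 55, 46 and 01 add up to 210, and 210 % 11 gives slot 1.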


Collision:
If two or more keys hash to the same index, the corresponding records cannot be stored in the
same location. This condition is known as a collision.
Characteristics of a Good Hashing Function:

• It should be simple to compute.
• The number of collisions should be small when records are placed in the hash table.
• A hash function with no collisions is called a perfect hash function.
• The hash function should produce keys which are distributed uniformly over the hash table.
• The hash function should depend upon every bit of the key. Thus a hash
function that simply extracts a portion of the key is not suitable.
Collision Resolution Strategies / Techniques (CRT):
If a collision occurs, it should be handled by applying some technique. Such a
technique is called a collision resolution technique (CRT).
There are a number of collision resolution techniques, but the most popular are:
• Separate chaining (Open Hashing)
• Open addressing (Closed Hashing)
    • Linear Probing
    • Quadratic Probing
    • Double Hashing
Separate chaining (Open Hashing)
It is an open hashing technique.
It is implemented using the singly linked list concept: a pointer (ptr) field is added to
each record.
When a collision occurs, a separate chain is maintained for the colliding data. A new
element is inserted at the front of its list.
H(key) = key % table size
Two operations are supported:
• Insert
• Find


Structure Definition for Node


typedef struct node *Position;
struct node                  /* defines the nodes */
{
    int data;
    Position next;
};

Structure Definition for Hash Table


typedef Position List;
struct HashTbl               /* defines the hash table, which contains */
{                            /* an array of linked lists */
    int Tablesize;
    List *TheLists;
};
typedef struct HashTbl *HashTable;

Initialization for Hash Table for Separate Chaining


/* NextPrime is not defined in these notes; it is presumed to return the
   smallest prime number that is not less than its argument. */
HashTable initialize(int Tablesize)
{
    HashTable H;
    int i;
    H = malloc(sizeof(struct HashTbl));                   /* Allocates table */
    H->Tablesize = NextPrime(Tablesize);
    H->TheLists = malloc(sizeof(List) * H->Tablesize);    /* Allocates array of lists */
    for (i = 0; i < H->Tablesize; i++)
    {
        H->TheLists[i] = malloc(sizeof(struct node));     /* Allocates list headers */
        H->TheLists[i]->next = NULL;
    }
    return H;
}
Insert Routine for Separate Chaining
/* Hash(key, Tablesize) below stands for the integer hash function key % Tablesize. */
Position find(int key, HashTable H);    /* declared here because insert calls it */

/* Inserts the element at the front of the list always */
void insert(int key, HashTable H)
{
    Position P, newnode;
    List L;
    P = find(key, H);
    if (P == NULL)                      /* insert only if the key is not already present */
    {
        newnode = malloc(sizeof(struct node));
        L = H->TheLists[Hash(key, H->Tablesize)];
        newnode->next = L->next;
        newnode->data = key;
        L->next = newnode;
    }
}
Find Routine for Separate Chaining
Position find(int key, HashTable H)
{
    Position P;
    List L;
    L = H->TheLists[Hash(key, H->Tablesize)];
    P = L->next;                        /* skip the list header */
    while (P != NULL && P->data != key)
        P = P->next;
    return P;                           /* NULL if the key is not found */
}
If two keys map to the same value, the elements are chained together.
Initial configuration of the hash table with separate chaining. Here we use the SLL (Singly
Linked List) concept to chain the elements.

Index   List
0       NULL
1       NULL
2       NULL
3       NULL
4       NULL
5       NULL
6       NULL
7       NULL
8       NULL
9       NULL


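Insert the following four keys, 22, 84, 35 and 62, into a hash table of size 10 using separate
chaining. The hash function is
H(key) = key % 10
1. H(22) = 22 % 10 = 2
2. H(84) = 84 % 10 = 4
3. H(35) = 35 % 10 = 5
4. H(62) = 62 % 10 = 2 (collides with 22, so 62 is inserted at the front of the list in cell 2)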
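
The four insertions can be traced with the self-contained sketch below. It is a simplified variant of the routines given earlier, not a copy of them: the table size is fixed at 10, the lists have no header nodes, and the names chain_table, chain_find and chain_insert are assumptions of this example. Keys are assumed to be non-negative.

```c
#include <stdlib.h>

#define TABLE_SIZE 10

struct node {
    int data;
    struct node *next;
};

/* One list per cell; NULL marks an empty chain.
   Unlike the routines in the notes, this sketch uses no header nodes. */
static struct node *chain_table[TABLE_SIZE];

/* Returns the node holding key, or NULL if the key is absent. */
struct node *chain_find(int key)
{
    struct node *p = chain_table[key % TABLE_SIZE];
    while (p != NULL && p->data != key)
        p = p->next;
    return p;
}

/* Inserts key at the front of its chain unless it is already present. */
void chain_insert(int key)
{
    struct node *n;

    if (chain_find(key) != NULL)
        return;
    n = malloc(sizeof *n);
    if (n == NULL)
        return;                       /* allocation failed; nothing inserted */
    n->data = key;
    n->next = chain_table[key % TABLE_SIZE];
    chain_table[key % TABLE_SIZE] = n;
}
```

After inserting 22, 84, 35 and 62 in that order, cell 2 holds the chain 62 -> 22, cell 4 holds 84, cell 5 holds 35, and every other cell is still NULL.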


Advantages
1. More elements can be inserted, because the table is an array of linked lists.
Disadvantages
1. It requires pointers, which occupy more memory space.
2. Search takes time, since it takes time to evaluate the hash function and also to traverse
the list.
Open Addressing
It is also called closed hashing.
It is a collision resolution technique.
It uses Hi(X) = (Hash(X) + F(i)) mod Tablesize
When a collision occurs, alternative cells are tried until an empty cell is found.
Types:
• Linear Probing
• Quadratic Probing
• Double Hashing
Hash function
• H(key) = key % table size.
Insert Operation
• To insert a key, use the hash function to identify the list into which
the element should be inserted.
• Then traverse the list to check whether the element is already present.
• If it exists, increment the count.
• Else the new element is placed at the front of the list.
Linear Probing:
It is the easiest method to handle collisions.
Apply the hash function H(key) = key % table size
Hi(X) = (Hash(X) + F(i)) mod Tablesize, where F(i) = i.
How to probe:
first probe – given a key k, hash to H(key)
second probe – if H(key) is occupied, try H(key) + f(1)
third probe – if H(key) + f(1) is occupied, try H(key) + f(2)
And so forth.
Probing properties:
We force f(0) = 0.
The ith probe is to (H(key) + f(i)) % table size.
If i reaches size – 1, the probe has failed.
Depending on f(i), the probe may fail sooner.
Long sequences of probes are costly.
Probe sequence is:
H(key) % table size
(H(key) + 1) % table size
(H(key) + 2) % table size
and so on.


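1. H(key) = key mod Tablesize


This is the common formula that you should apply for any hashing.
If a collision occurs, use formula 2.
2. Hi(key) = (H(key) + i) % Tablesize
where i = 1, 2, 3, ... etc.
Example: 89 18 49 58 69; Tablesize = 10
1. H(89) = 89 % 10
         = 9
2. H(18) = 18 % 10
         = 8
3. H(49) = 49 % 10
         = 9 (collides with 89, so try the next free cell using formula 2)
   i=1   h1(49) = (H(49) + 1) % 10
                = (9 + 1) % 10
                = 10 % 10
                = 0
4. H(58) = 58 % 10
         = 8 (collides with 18)
   i=1   h1(58) = (H(58) + 1) % 10
                = (8 + 1) % 10
                = 9 % 10
                = 9 => again collision
   i=2   h2(58) = (H(58) + 2) % 10
                = (8 + 2) % 10
                = 10 % 10
                = 0 => again collision
   i=3   h3(58) = (H(58) + 3) % 10
                = (8 + 3) % 10
                = 11 % 10
                = 1
5. H(69) = 69 % 10
         = 9 (collides with 89)
   i=1   h1(69) = (9 + 1) % 10 = 0 => again collision
   i=2   h2(69) = (9 + 2) % 10 = 1 => again collision
   i=3   h3(69) = (9 + 3) % 10 = 2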


Index   Empty   After 89   After 18   After 49   After 58   After 69
0                                     49         49         49
1                                                58         58
2                                                           69
3
4
5
6
7
8                          18         18         18         18
9               89         89         89         89         89

Linear probing
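
The linear probing example can be reproduced with the sketch below. It assumes non-negative integer keys, a fixed table size of 10, and the value -1 as the marker for an empty cell; the names linear_insert, TABLE_SIZE and EMPTY are assumptions of this illustration.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)      /* marker for an unused cell (assumed convention) */

/* Inserts key using Hi(key) = (H(key) + i) % TABLE_SIZE with F(i) = i.
   Every cell of table must be set to EMPTY before the first insertion.
   Returns the index used, or -1 if the table is full. */
int linear_insert(int table[], int key)
{
    int home = key % TABLE_SIZE;
    int i;

    for (i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i) % TABLE_SIZE;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Inserting 89, 18, 49, 58 and 69 into an empty table places them in cells 9, 8, 0, 1 and 2, as in the table above.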

Quadratic Probing
To resolve the primary clustering problem (the tendency of linear probing to build long
runs of occupied cells), quadratic probing can be used. With quadratic probing, rather than
always moving one spot, we move i^2 spots from the point of collision, where
i is the number of attempts made to resolve the collision.
It is another collision resolution method, one which distributes items more evenly.


From the original index H, if the slot is filled, try cells H + 1^2, H + 2^2, H + 3^2, ..., H + i^2
with wrap-around.
Hi(X) = (Hash(X) + F(i)) mod Tablesize, where F(i) = i^2
Hi(X) = (Hash(X) + i^2) mod Tablesize

Limitation: at most half of the table can be used as alternative locations to resolve collisions.
This means that once the table is more than half full, it is difficult to find an empty spot.
Quadratic probing also introduces a new problem, known as secondary clustering: elements
that hash to the same hash key will always probe the same alternative cells.
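
A minimal sketch of quadratic probing, under the same assumptions as before (non-negative integer keys, table size 10, -1 marking an empty cell), is given below. The name quadratic_insert is an assumption, and stopping after TABLE_SIZE attempts is a simplification chosen for this sketch.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)      /* marker for an unused cell (assumed convention) */

/* Inserts key using Hi(key) = (H(key) + i*i) % TABLE_SIZE.
   Every cell of table must be set to EMPTY before the first insertion.
   Returns the index used, or -1 if no free cell was found within
   TABLE_SIZE attempts, which can happen even when the table is not full. */
int quadratic_insert(int table[], int key)
{
    int home = key % TABLE_SIZE;
    int i;

    for (i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i * i) % TABLE_SIZE;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

For the keys 89, 18, 49, 58 and 69 the cells used are 9, 8, 0, 2 and 3: key 58 finds cell 9 occupied at i = 1 and moves to (8 + 4) % 10 = 2, and key 69 finds cell 0 occupied at i = 1 and moves to (9 + 4) % 10 = 3.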
Double Hashing
Double hashing applies a second hash function to the key when a
collision occurs. The result of the second hash function is the number of positions from
the point of collision at which to try the insertion.
There are a couple of requirements for the second function:
• It must never evaluate to 0.
• It must make sure that all cells can be probed.
Hi(X) = (Hash(X) + i * Hash2(X)) mod Tablesize
A popular second hash function is:
Hash2(key) = R - (key % R), where R is a prime number that is smaller than the size of the
table.
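
Double hashing with Hash2(key) = R - (key % R) might look like the sketch below. The table size of 10, the choice R = 7, the empty-cell marker -1 and the name double_hash_insert are all assumptions made for this illustration. Because 10 is not prime, this table does not guarantee that every cell can be probed; a prime table size would be the safer choice.

```c
#define TABLE_SIZE 10
#define R 7             /* a prime smaller than TABLE_SIZE (assumed value) */
#define EMPTY (-1)      /* marker for an unused cell (assumed convention) */

/* Inserts key using Hi(key) = (H(key) + i * Hash2(key)) % TABLE_SIZE,
   where Hash2(key) = R - (key % R) is never 0.
   Every cell of table must be set to EMPTY before the first insertion.
   Returns the index used, or -1 if no free cell was found. */
int double_hash_insert(int table[], int key)
{
    int home = key % TABLE_SIZE;
    int step = R - (key % R);
    int i;

    for (i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i * step) % TABLE_SIZE;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

For the keys 89, 18, 49, 58 and 69 the cells used are 9, 8, 6, 3 and 0. For example, 49 collides at cell 9, Hash2(49) = 7 - (49 % 7) = 7, and (9 + 7) % 10 = 6.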


Rehashing
Once the hash table gets too full, operations start to take too long and insertions
may fail. To solve this problem, a new table at least twice the size of the original is
built and the elements are transferred to the new table.
Advantages:
• The programmer does not have to worry about the table size.
• It is simple to implement.
• It can be used in other data structures as well.
The new size of the hash table:
• should also be prime
• will be used to calculate the new insertion spot (hence the name rehashing)
This is a very expensive operation: it takes O(N) time, since there are N elements to rehash
and the table size is roughly 2N. This is acceptable, though, since it does not happen often.
The question becomes: when should rehashing be applied?
Some possible answers:
• once the table becomes half full
• once an insertion fails
• once a specific load factor has been reached, where the load factor is the ratio of
the number of elements in the hash table to the table size
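
The rehashing step could be sketched as follows for a table that uses linear probing. The helper names is_prime, next_prime and rehash, the empty-cell marker -1, and the decision to return a newly allocated table are assumptions of this example; keys are assumed to be non-negative.

```c
#include <stdlib.h>

#define EMPTY (-1)      /* marker for an unused cell (assumed convention) */

static int is_prime(int n)
{
    int d;
    if (n < 2)
        return 0;
    for (d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Smallest prime that is not less than n. */
static int next_prime(int n)
{
    while (!is_prime(n))
        n++;
    return n;
}

/* Builds a table whose size is the first prime at least twice old_size and
   reinserts every key of old_table into it with linear probing.
   Returns the new table (the caller frees it) and stores its size in
   *new_size; returns NULL if the allocation fails. */
int *rehash(const int *old_table, int old_size, int *new_size)
{
    int size = next_prime(2 * old_size);
    int *table = malloc((size_t)size * sizeof *table);
    int i;

    if (table == NULL)
        return NULL;
    for (i = 0; i < size; i++)
        table[i] = EMPTY;

    for (i = 0; i < old_size; i++) {
        int key = old_table[i];
        int pos;
        if (key == EMPTY)
            continue;
        pos = key % size;               /* hash again with the new size */
        while (table[pos] != EMPTY)
            pos = (pos + 1) % size;
        table[pos] = key;
    }
    *new_size = size;
    return table;
}
```

As a usage example, a table of size 7 holding 6, 15, 23, 24 and 13 in cells 0, 1, 2, 3 and 6 is rehashed into a table of size 17, where the keys end up in cells 6, 15, 7, 8 and 13 respectively.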
Extendible Hashing
Extendible hashing is a mechanism for altering the size of the hash table to accommodate
new entries when buckets overflow.
A common strategy in internal hashing is to double the hash table and rehash each entry.
However, when the table is stored on disk this technique is slow, because writing all pages
to disk is too expensive.
Therefore, instead of doubling the whole hash table, we use a directory of pointers to
buckets, and double the number of buckets by doubling the directory, splitting just
the bucket that overflows.
Since the directory is much smaller than the file, doubling it is much cheaper. Only
one page of keys and pointers is split.
[Figure: extendible hashing example. A directory with entries 0 and 1, later doubled to entries 00, 01, 10 and 11, points to buckets of 6-bit keys such as 000100, 010100, 100000 and 111000; the overflowing bucket is split.]
