Hashing New
This presentation is designed for teaching purposes only. Topics are not
covered in detail in the presentation, so it is not recommended as reading
material for examinations.
Unit 5: Hashing
Hash Tables
• Hashing is a method of directly computing the address of a
record from its key by using a suitable mathematical function
called a hash function.
• A hash table is an array-based structure used to store
<key, information> pairs. The implementation of hash
tables is called hashing.
• Hashing is a technique for performing insertions,
deletions and finds in constant average time (i.e. O(1)).
• This data structure, however, is not efficient for
operations that require ordering information among
the elements, such as findMin, findMax and printing the
entire table in sorted order.
General Idea
• The ideal hash table structure is merely an array of some fixed
size, containing the items.
• A stored item needs to have a data member, called key, that will
be used in computing the index value for the item.
– Key could be an integer, a string, etc
– e.g. a name or Id that is a part of a large employee structure
• The size of the array is TableSize.
• The items that are stored in the hash table are indexed by values
from 0 to TableSize – 1.
• Each key is mapped into some number in the range 0 to TableSize
– 1.
• The mapping is called a hash function.
Example Hash Table
Items: john 25000, phil 31250, dave 27500, mary 28200
key -> Hash Function -> index
Index  Item
0
1
2
3      john 25000
4      phil 31250
5
6      dave 27500
7      mary 28200
8
9
Hash Function
The hash function is a mathematical function that transforms keys into
addresses of the hash table. Ideally, different keys are transformed into
different addresses.
Hash function
Issues:
• Keys may not be numeric.
• Number of possible keys is much larger than the
space available in table.
• Different keys may map into same location
– Hash function is not one-to-one => collision.
– If there are too many collisions, the performance of
the hash table will suffer dramatically.
Hash Functions
• If the input keys are integers then simply Key mod
TableSize is a general strategy.
• If the keys are strings, hash function needs more care.
– First convert it into a numeric value.
Perfect Hash Function
A hash function that transforms different keys into different addresses,
and so avoids collisions, is called a perfect hash function.
A hash function:
• must be simple to compute.
• must distribute the keys evenly among the cells.
If we knew in advance which keys will occur, we could write a perfect hash
function, but usually we do not.
Hashing Methods / Functions
1. Direct method: the key is the address, without any
arithmetic manipulation.
Ex. Total monthly sales by day of the month.
3. Digit extraction method
– Extract selected digits from the key and form the address from them.
4. Folding
a. Fold shift b. Fold boundary
– e.g. 123|456|789: split the key into parts and add them.
5. Mid squaring
– Square the key and keep the middle digits.
– 9452 * 9452 = 89340304, middle four digits = 3403
6. Subtraction method
Ex. 100 employees with employee numbers from 1001 to 1100:
subtracting 1000 from the number gives an address from 1 to 100.
7. Pseudorandom method
y = (a*x + c) modulo TableSize
x = key; a and c are prime numbers
Table size = 307
y = ((17 * 121267) + 7) modulo 307
y = 41
8. Rotation: used when keys are mostly the same and differ only in the last digit.
Ex. 120605 ---> 512060 (the last digit moves to the front), then use any hash function.
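The arithmetic in the methods above can be checked with a short C++ sketch. The function names (`midSquare`, `foldShift`, `subtraction`, `pseudorandom`, `rotateDigits`) are hypothetical, and the fixed digit widths (three-digit folds, middle four digits of an eight-digit square) are assumptions taken from the slide's own examples, not general rules.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Mid squaring: square the key and keep the middle four digits.
// Assumption: the square has eight digits, as in 9452 * 9452 = 89340304.
int midSquare(std::int64_t key) {
    std::int64_t sq = key * key;
    return static_cast<int>((sq / 100) % 10000);   // drop 2 digits, keep 4
}

// Fold shift: cut the key into 3-digit parts and add the parts.
int foldShift(std::int64_t key) {
    assert(key >= 0);
    int sum = 0;
    while (key > 0) {
        sum += static_cast<int>(key % 1000);
        key /= 1000;
    }
    return sum;   // reduce modulo the table size if it is too large
}

// Subtraction: employee numbers 1001..1100 become addresses 1..100.
int subtraction(int key) { return key - 1000; }

// Pseudorandom: y = (a*x + c) modulo tableSize, defaults from the slide.
int pseudorandom(std::int64_t key, int a = 17, int c = 7, int tableSize = 307) {
    assert(tableSize > 0);
    return static_cast<int>((a * key + c) % tableSize);
}

// Rotation: move the last digit of the key to the front.
int rotateDigits(int key) {
    assert(key >= 0);
    std::string s = std::to_string(key);
    std::string rotated = s.back() + s.substr(0, s.size() - 1);
    return std::stoi(rotated);
}
```

With the slide's values, `midSquare(9452)` gives 3403, `pseudorandom(121267)` gives 41 and `rotateDigits(120605)` gives 512060.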
Terminologies
1. Bucket
A hash table uses a hash function to compute an index into an array of
buckets or slots, from which the desired value can be obtained.
The M slots of the hash table can be divided into B buckets, with each
bucket consisting of M/B slots.
2. Collision
A collision occurs when we try to place an element in a bucket that already
holds an element.
• In such a case the element should be rehashed to an alternate
empty location.
3. Probe
If the hashed address is already occupied by an element, the locations
following the hashed location are examined to locate the
first empty location.
• Two methods are popular:
a. Linear Probing b. Quadratic Probing
4. Synonym
The mapping defined by a hash function is many-to-one: it maps a set
of keys to the same location in the hash table. Elements mapped to the
same location in the hash table are known as synonyms.
5. Overflow
Overflow occurs when there are more colliding records for a given bucket
than its capacity.
6. Open Hashing or External Hashing
No limitation on the size of the table (storage on hard disk).
7. Closed Hashing or Internal Hashing
Fixed space for storage.
8. Load Density
Maximum storage capacity.
9. Load Factor
The load factor of a hash table is the ratio n/T, where
n = number of keys in the table
T = size of the hash table
10. Full Table
Rehashing
With respect to closed hashing: when we try to store a record with key1
at bucket position Hash(key1) and find it occupied, we have a collision.
To handle the collision, we use a strategy that chooses a sequence of alternative
locations Hash1(key1), Hash2(key1) and so on within the bucket table
in which to place the record with key1.
If the table is close to full, the search time grows and may become
equal to the table size.
When the load factor exceeds a certain value (e.g. greater than 0.5) we
do rehashing:
• Build a second table twice as large as the original
and rehash all the keys of the original table into it.
• Rehashing is an expensive operation, with running time O(N).
• However, once done, the new hash table will have good performance.
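The rehashing rule can be sketched in C++ as follows. This is a minimal illustration, not a reference implementation: `ProbingTable`, `place` and `EMPTY` are hypothetical names, keys are assumed to be non-negative integers, collisions are resolved by linear probing, and the 0.5 threshold is the example value from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

constexpr int EMPTY = -1;   // assumption: valid keys are non-negative

struct ProbingTable {
    std::vector<int> slots;
    std::size_t count = 0;  // n, the number of keys in the table

    explicit ProbingTable(std::size_t size) : slots(size, EMPTY) { assert(size > 0); }

    // Load factor n/T.
    double loadFactor() const { return static_cast<double>(count) / slots.size(); }

    // Store a key with linear probing; assumes at least one empty slot.
    void place(int key) {
        std::size_t i = static_cast<std::size_t>(key) % slots.size();
        while (slots[i] != EMPTY) i = (i + 1) % slots.size();
        slots[i] = key;
        ++count;
    }

    // Build a table twice as large and rehash every key into it: O(N).
    void rehash() {
        std::vector<int> old = std::move(slots);
        slots.assign(old.size() * 2, EMPTY);
        count = 0;
        for (int key : old)
            if (key != EMPTY) place(key);
    }

    void insert(int key) {
        assert(key >= 0);
        place(key);
        if (loadFactor() > 0.5) rehash();   // threshold from the slide
    }
};
```

Starting from a table of size 4, inserting 1, 2 and 3 pushes the load factor to 0.75, so the third insert doubles the table to size 8 and re-places the three keys.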
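Hash Function 1: Strings
• Add up the ASCII values of all characters of the key.
int hash(const string &key, int tableSize)
{
    int hashVal = 0;
    for (char ch : key) hashVal += ch;   // add each character's ASCII value
    return hashVal % tableSize;          // map the sum into 0..tableSize-1
}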
Collision Resolution
• If, when an element is inserted, it hashes to the
same value as an already inserted element, then we
have a collision and need to resolve it.
• There are several methods for dealing with this:
1. Open addressing
a. Linear Probing
b. Quadratic Probing
c. Double Hashing
1. Linear Probing
Place the new record linearly down the table, in the first empty location
found.
Ex: 131, 21, 31, 4, 5, 61, 7, 8
Index = key mod 10
Index  Data
0
1      131
2      21
3      31
4      4
5      5
6      61
7      7
8      8
9
Drawback of Linear Probing
The tendency for some collision resolution
schemes to create long runs of filled slots
near the hash function position of keys is
called as primary clustering.
Classes (Refer T2)
class dataitem
{
    private int data;
}
class hashtable
{
    dataitem[] hasharray;
    int arraysize;
    hashtable(int size)
    {
        arraysize = size;
        hasharray = new dataitem[arraysize];
    }
    int hashfun(int key)
    {
        hashval = key % 10;   // table of size 10
        return hashval;
    }
}
Insert
1. Check if the table is full.
   If so, display "table full" and stop.
2. Accept the item to be inserted in the hashtable.
3. Calculate the hash value of the key.
4. while (hasharray[hashval] != null && hasharray[hashval].data != -1)
   4.1 increment hashval
   4.2 hashval %= arraysize
5. hasharray[hashval] = item
6. stop
Find
1. Accept the key to be searched in the hashtable.
2. Calculate the hash value of the key.
3. while (hasharray[hashval] != null)
   3.1 if hasharray[hashval].data == key
       return hasharray[hashval]
   3.2 increment hashval
   3.3 hashval %= arraysize
4. return null
5. stop
Delete
1. Accept the key to be deleted from the hashtable.
2. Calculate the hash value of the key.
3. while (hasharray[hashval] != null)
   3.1 if hasharray[hashval].data == key
       dataitem temp = hasharray[hashval]
       hasharray[hashval] = nonitem   // nonitem: a dataitem with data -1, marking a deleted slot
       return temp
   3.2 increment hashval
   3.3 hashval %= arraysize
4. return null
5. stop
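The Insert, Find and Delete steps for linear probing can be put together in one C++ sketch. It stores plain integer keys rather than `dataitem` objects, and the names `LinearProbingTable`, `EMPTY` and `DELETED` are hypothetical. It assumes non-negative keys, with -1 standing for a never-used slot and -2 for a deleted one (the slides use null and -1). Unlike the pseudocode, `find` stops after `arraysize` probes, so it cannot loop forever on a table with no empty slot.

```cpp
#include <cassert>
#include <vector>

constexpr int EMPTY = -1;     // slot never used (null in the pseudocode)
constexpr int DELETED = -2;   // slot whose record was deleted

class LinearProbingTable {
    std::vector<int> slots;
    int used = 0;             // number of live keys

    int size() const { return static_cast<int>(slots.size()); }
    int hashFun(int key) const { return key % size(); }

public:
    explicit LinearProbingTable(int arraySize) : slots(arraySize, EMPTY) {
        assert(arraySize > 0);
    }

    // Returns false when the table is full.
    bool insert(int key) {
        assert(key >= 0);
        if (used == size()) return false;
        int i = hashFun(key);
        while (slots[i] != EMPTY && slots[i] != DELETED)
            i = (i + 1) % size();               // probe the next slot
        slots[i] = key;
        ++used;
        return true;
    }

    // Returns the index holding key, or -1 if the key is absent.
    int find(int key) const {
        int i = hashFun(key);
        for (int probes = 0; probes < size(); ++probes) {
            if (slots[i] == EMPTY) break;       // an unused slot ends the search
            if (slots[i] == key) return i;
            i = (i + 1) % size();
        }
        return -1;
    }

    // Marks the slot as deleted so later searches can probe past it.
    bool remove(int key) {
        int i = find(key);
        if (i < 0) return false;
        slots[i] = DELETED;
        --used;
        return true;
    }
};
```

The deleted marker matters: after removing 21 from the example table, a search for 31 must still probe past slot 2 to reach slot 3.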
Linear Probing with Replacement
1. Accept the item to be inserted in the hashtable.
2. Calculate j as the hash value of the key; set hashval = j.
3. while (hasharray[hashval] != null && hasharray[hashval].data != -1)
   3.1 if hasharray[hashval].data % 10 != j   // non-home record
       { dataout = hasharray[hashval].data
         hasharray[hashval].data = item
         item = dataout }   // the displaced record is inserted next
   3.2 increment hashval
   3.3 hashval %= arraysize
4. hasharray[hashval] = item
5. stop
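Quadratic Probing
Hi(key) = (Hash(key) + i^2) % m
i = 0, 1, ..., (m - 1)/2, where m is the table size
Ex: 37, 90, 55, 22, 11, 17, 49, 87 (m = 10)
17 -> collision
(17 + 0^2) % 10 = 7, occupied by 37
(17 + 1^2) % 10 = 8, place 17
87 -> collision
(87 + 0^2) % 10 = 7, occupied by 37
(87 + 1^2) % 10 = 8, occupied by 17
(87 + 2^2) % 10 = 1, occupied by 11
(87 + 3^2) % 10 = 6, place 87
Index  Data
0      90
1      11
2      22
3
4
5      55
6      87
7      37
8      17
9      49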
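The quadratic probing example can be reproduced with a small C++ sketch. `quadraticInsert` and `EMPTY` are hypothetical names, keys are assumed to be non-negative, and the probe limit follows the slide's range i = 0 to (m - 1)/2.

```cpp
#include <cassert>
#include <vector>

constexpr int EMPTY = -1;   // assumption: valid keys are non-negative

// Inserts key with quadratic probing, Hi(key) = (Hash(key) + i^2) % m.
// Returns the slot used, or -1 if no empty slot is found within the probe limit.
int quadraticInsert(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    const int m = static_cast<int>(table.size());
    for (int i = 0; i <= (m - 1) / 2; ++i) {
        int slot = (key % m + i * i) % m;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

Inserting 37, 90, 55, 22, 11, 17, 49, 87 into a table of size 10 places 17 in slot 8 and 87 in slot 6, matching the worked steps.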
Double Hashing
Double hashing is a technique in which a second hash
function is applied to key when collision occurs.
If a collision occurs when inserting, apply a second
auxiliary hash function, h2(k), and probe at a distance
h2(k), 2 * h2(k), 3 * h2(k), etc. until an empty position is found.
So f(i) = i * h2(k), and we have two auxiliary functions, h1 and h2:
h(k, i) = (h1(k) + i * h2(k)) mod m
With H = h1(k), we try the following cells in sequence, with
wraparound:
H
H + h2(k)
H + 2 * h2(k)
H + 3 * h2(k)
Ex: 12, 1, 18, 56, 79, 49 (table size 10)
H1(key) = key % 10
H2(key) = M - (key % M), where M is a prime number smaller than the size of the table (here M = 7)
Insert 49
H1(49) = 49 % 10 = 9, occupied by 79
H2(49) = 7 - (49 % 7) = 7
Hash(49) = [H1(49) + 1 * H2(49)] % 10 = [9 + 1*7] % 10 = 6, full
Hash(49) = [H1(49) + 2 * H2(49)] % 10 = [9 + 2*7] % 10 = 3, place 49
Index  Data
0
1      1
2      12
3      49
4
5
6      56
7
8      18
9      79
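A short C++ sketch of the double hashing example follows. The names `h1`, `h2`, `doubleHashInsert` and `EMPTY` are illustrative, keys are assumed to be non-negative, and the prime M = 7 is the value used on the slide for a table of size 10.

```cpp
#include <cassert>
#include <vector>

constexpr int EMPTY = -1;   // assumption: valid keys are non-negative

int h1(int key, int m) { return key % m; }

// Second hash: M - (key % M), with M a prime smaller than the table size.
// The result is always between 1 and M, so the probe step is never zero.
int h2(int key, int prime) { return prime - (key % prime); }

// Probes (h1 + i * h2) % m for i = 0, 1, 2, ...
// Returns the slot used, or -1 if no empty slot is found in m probes.
int doubleHashInsert(std::vector<int>& table, int key, int prime = 7) {
    assert(key >= 0 && !table.empty() && prime > 0);
    const int m = static_cast<int>(table.size());
    for (int i = 0; i < m; ++i) {
        int slot = (h1(key, m) + i * h2(key, prime)) % m;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

After 12, 1, 18, 56 and 79 are stored, inserting 49 probes slot 9, then slot 6, and finally lands in slot 3.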
Chaining/Bucket Hashing
1. Chaining without Replacement
Ex. 131, 3, 4, 21, 61, 6, 71, 8, 9
Index  Data  Chain
0      -1    -1
1      131   2
2      21    5
3      3     -1
4      4     -1
5      61    7
6      6     -1
7      71    -1
8      8     -1
9      9     -1
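The table above can be reproduced with the following C++ sketch of chaining without replacement. `ChainTable`, `nextEmpty` and the two parallel vectors are hypothetical names, and keys are assumed to be non-negative. One further assumption: a key whose home slot is taken by a non-synonym is still linked at the end of the chain that passes through that slot; the slide's example only contains synonym collisions.

```cpp
#include <cassert>
#include <vector>

struct ChainTable {
    std::vector<int> data;    // -1 marks an empty slot
    std::vector<int> chain;   // index of the next record in the chain, -1 = end

    explicit ChainTable(int size) : data(size, -1), chain(size, -1) { assert(size > 0); }

    int size() const { return static_cast<int>(data.size()); }
    int home(int key) const { return key % size(); }

    // First empty slot after 'from', with wraparound; -1 if the table is full.
    int nextEmpty(int from) const {
        for (int step = 1; step <= size(); ++step) {
            int i = (from + step) % size();
            if (data[i] == -1) return i;
        }
        return -1;
    }

    bool insert(int key) {
        assert(key >= 0);
        int h = home(key);
        if (data[h] == -1) { data[h] = key; return true; }   // home slot is free
        int slot = nextEmpty(h);
        if (slot == -1) return false;                        // table full
        int tail = h;
        while (chain[tail] != -1) tail = chain[tail];        // walk to the end of the chain
        data[slot] = key;
        chain[tail] = slot;                                  // link the new record
        return true;
    }
};
```

For the keys 131, 3, 4, 21, 61, 6, 71, 8, 9 this yields the chain 1 -> 2 -> 5 -> 7 for the synonyms 131, 21, 61 and 71.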
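2. Chaining with Replacement
Ex. 131, 21, 31, 4, 5, 2
Before replacement (2 is stored in the first empty slot, 6):
Index  Data  Chain
0      -1    -1
1      131   2
2      21    3
3      31    -1
4      4     -1
5      5     -1
6      2     -1
7      -1    -1
After replacement (2 takes its home slot 2; 21 moves to slot 6 and the chain is updated):
Index  Data  Chain
0      -1    -1
1      131   6
2      2     -1
3      31    -1
4      4     -1
5      5     -1
6      21    3
7      -1    -1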
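A C++ sketch of chaining with replacement is given below, under the same assumptions as before: hypothetical names (`ReplacingChainTable`, `nextEmpty`), non-negative keys, and no deletion. When the home slot of a new key is held by a record that is away from its own home, that record is moved to the next empty slot and its chain links are repaired.

```cpp
#include <cassert>
#include <vector>

struct ReplacingChainTable {
    std::vector<int> data;    // -1 marks an empty slot
    std::vector<int> chain;   // index of the next synonym, -1 = end

    explicit ReplacingChainTable(int size) : data(size, -1), chain(size, -1) {
        assert(size > 0);
    }

    int size() const { return static_cast<int>(data.size()); }
    int home(int key) const { return key % size(); }

    // First empty slot after 'from', with wraparound; -1 if the table is full.
    int nextEmpty(int from) const {
        for (int step = 1; step <= size(); ++step) {
            int i = (from + step) % size();
            if (data[i] == -1) return i;
        }
        return -1;
    }

    bool insert(int key) {
        assert(key >= 0);
        int h = home(key);
        if (data[h] == -1) { data[h] = key; return true; }
        int slot = nextEmpty(h);
        if (slot == -1) return false;                  // table full
        if (home(data[h]) != h) {
            // The occupant is not a synonym: find its predecessor in its
            // own chain, move it to the empty slot and relink the chain.
            int pred = home(data[h]);
            while (chain[pred] != h) pred = chain[pred];
            data[slot] = data[h];
            chain[slot] = chain[h];
            chain[pred] = slot;
            data[h] = key;                             // new key gets its home slot
            chain[h] = -1;
            return true;
        }
        // The occupant is a synonym: append the new key to the chain.
        int tail = h;
        while (chain[tail] != -1) tail = chain[tail];
        data[slot] = key;
        chain[tail] = slot;
        return true;
    }
};
```

For 131, 21, 31, 4, 5, 2 in a table of size 10, the last insert moves 21 from slot 2 to slot 6 and leaves the chain 1 -> 6 -> 3.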
Linear probing with chaining with
replacement
1. Initialize the hash table with value and chain set to -1.
2. If the table is full, display a message and stop.
3. If the table slot is empty, i.e. table[key][0] == -1
   Store the new value in the empty table slot.
4. Otherwise // table slot is not empty
   4.1 Read the chain at the key position // ch = table[key][1]
   4.2 If a collision occurs and the existing value and the new value are synonyms
       4.2.1 If it has no immediate chain // immediate next empty
             Find the next empty slot.
             Place the record and update the chain, set flag and break.
       4.2.2 Otherwise read while element != -1 and chain != -1
             ch = table[ch][1]
             Place the record and update the chain, set flag and break.
   4.3 Else the keys are not synonyms
       4.3.1 Read the chain at the table slot and check if it is empty.
             Store the existing value: temp = table[key][0]
             Search for an empty table slot // i = key + 1; i < max; i++
             Store table[key][0] = new key
             Store table[i][0] = temp
             Update the chain, set flag and break.
       4.3.2 Else // keys do not match and a chain exists
             Read the chain.
             Read the existing element.
             While chain != -1
             ch = table[ch][1]
             Store the element.
             Update the chain, set flag and break.
5. Stop
Separate Chaining
• The idea is to keep a list of all elements that hash
to the same value.
– The array elements are pointers to the first nodes of the
lists.
– A new item is inserted to the front of the list.
• Advantages:
– Better space utilization for large items.
– Simple collision handling: searching linked list.
– Overflow: we can store more items than the hash table
size.
– Deletion is quick and easy: deletion from the linked list.
Example
Keys: 0, 81, 64, 49, 36, 25, 16, 9, 4, 1
hash(key) = key % 10.
Index  List
0      0
1      81 -> 1
2
3
4      64 -> 4
5      25
6      36 -> 16
7
8
9      49 -> 9
Operations
• Initialization: all entries are set to NULL
• Find:
– locate the cell using hash function.
– sequential search on the linked list in that cell.
• Insertion:
– Locate the cell using hash function.
– (If the item does not exist) insert it as the first item in
the list.
• Deletion:
– Locate the cell using hash function.
– Delete the item from the linked list.
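The operations above can be sketched in C++ using the standard `std::forward_list` as the linked list in each cell. The class name `SeparateChainingTable` and its member names are hypothetical, keys are assumed to be non-negative integers, and new items go to the front of the list as described in the Separate Chaining slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

class SeparateChainingTable {
    std::vector<std::forward_list<int>> cells;   // all lists start empty

    std::size_t hash(int key) const {
        return static_cast<std::size_t>(key) % cells.size();
    }

public:
    explicit SeparateChainingTable(std::size_t size) : cells(size) { assert(size > 0); }

    // Find: locate the cell, then search its list sequentially.
    bool find(int key) const {
        assert(key >= 0);
        const auto& list = cells[hash(key)];
        return std::find(list.begin(), list.end(), key) != list.end();
    }

    // Insertion: if the item does not exist, insert it as the first item.
    bool insert(int key) {
        if (find(key)) return false;
        cells[hash(key)].push_front(key);
        return true;
    }

    // Deletion: locate the cell and delete the item from its list.
    bool remove(int key) {
        if (!find(key)) return false;
        cells[hash(key)].remove(key);
        return true;
    }

    // Read-only access to one cell, for inspection.
    const std::forward_list<int>& cell(std::size_t i) const { return cells[i]; }
};
```

Because insertion is at the front, storing 1 and later 81 in a table of size 10 leaves 81 ahead of 1 in cell 1.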
Hashing using Separate
Chaining (Linked List)
class node
{
    int key;
    node next;
}
class hashing
{
public:
    node hashtable[max];
    hashing()
    {
        for (i = 0; i < max; i++)
        {
            hashtable[i] = null;
        }
    }
    void insert();
    void search();
    void delete();
}
Insert(int k)
1. Create a new node, say curr.
2. Assign data for the new node // curr.key = k, curr.next = null
3. Calculate pos = hash(curr.key)
4. If hashtable[pos] == null
   hashtable[pos] = curr
5. Else
   5.1 temp = hashtable[pos]
   5.2 while (temp.next != null)
       temp = temp.next
   5.3 temp.next = curr
6. Stop
Display()
1. Declare curr.
2. For (i = 0; i < max; i++)
   2.1 curr = hashtable[i]
   2.2 while (curr != null)
       Display curr.key
       curr = curr.next
   2.3 end while loop
3. Stop
Search(int x)
1. Declare curr for traversal.
2. Find pos = hash(x)
3. curr = hashtable[pos]
4. while (curr != null && curr.key != x)
   4.1 Display curr.key
   4.2 curr = curr.next
5. If curr == null
   Display "record not found"
6. Else
   Display "record found"
7. Stop
Delete(key)
1. Get the value to be deleted.
2. Compute the address pos using the hash function.
3. Using the linked list deletion algorithm, delete the element from the
list at hashtable[pos].
4. If unable to delete, print "Value Not Found".
5. Stop
Hashing Applications
• Compilers use hash tables to implement the
symbol table (a data structure to keep track of
declared variables).
• Game programs use hash tables to keep track of
positions they have encountered (transposition table).
• Online spelling checkers.