Hash Functions

Hash functions are used to map data of arbitrary size to data of a fixed size. This allows data to be stored and retrieved more efficiently from databases. Common hash function algorithms include truncation, mid-square, folding, and division methods. Hash collisions occur when two keys map to the same hash value. Collision resolution techniques include chaining, which links colliding keys together, and open addressing techniques like linear probing and quadratic probing that find alternate locations to store colliding keys. Coalesced hashing is a hybrid approach that chains keys within the hash table itself to reduce wasted space.
UNIT V

HASH FUNCTIONS
Hash Function
• Hashing transforms a string of characters into a usually
shorter, fixed-length value or key that represents the
original string.

• Hashing is used to index and retrieve items in a database
because it is faster to find an item using the shorter hashed
key than using the original value.

• The hashing algorithm is called the hash function. A hash
function is a mathematical function that converts an input
value into a compressed numerical value, called a hash or
hash value.

• In other words, it takes in data of arbitrary length and
produces an output of fixed length: the hash value.
Perfect hash function
• An ideal (perfect) hash function maps every distinct key
to a distinct subscript of the table, so no two keys
collide.

• When a file has a million records, it is difficult
to find such a function.
Hash collision (clash)
• When two different keys hash to the same value,
it is called a hash collision or a hash clash.
• E.g., given a hash function h(key) = key % n
• For n = 1000, h(1322) = 1322 % 1000 = 322 and
h(2322) = 2322 % 1000 = 322
• That means the records with keys 1322 and 2322
would both be inserted at the same position,
322.
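The collision above can be checked with a short sketch. The helper name `find_collisions` and the extra key 4567 are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict

def h(key, n):
    """Division hash from the example: remainder of key divided by n."""
    return key % n

def find_collisions(keys, n):
    """Group keys by hash value and keep only the groups with more than one key."""
    groups = defaultdict(list)
    for key in keys:
        groups[h(key, n)].append(key)
    return {slot: group for slot, group in groups.items() if len(group) > 1}

# 4567 is an extra, made-up key that does not collide with the others.
print(find_collisions([1322, 2322, 4567], 1000))  # {322: [1322, 2322]}
```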
Techniques used in hash functions

• Truncation Method
• Mid square Method
• Folding Method
• Division Method
Truncation Method

• This is the simplest method for computing an address from a
key. In this method we take only a part of the key as the
address.
• Example: Let us take some 8-digit keys and find addresses
for them. Let the table size be 100, so we take the 2
rightmost digits as the hash table address.
• Suppose the keys are 62394572, 87135565, 93457271 and
45393225.
• The addresses of these keys are 72, 65, 71 and 25
respectively.
• This method is easy to compute, but collisions are more
likely because many keys can share the same last two
digits.
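A minimal sketch of the truncation example, assuming integer keys and that "rightmost digits" means decimal digits. The function name `truncation_hash` is illustrative.

```python
def truncation_hash(key, digits=2):
    """Keep only the rightmost `digits` decimal digits of the key."""
    return key % (10 ** digits)

keys = [62394572, 87135565, 93457271, 45393225]
print([truncation_hash(k) for k in keys])  # [72, 65, 71, 25]
```

Taking the rightmost digits of a decimal key is the same as taking the remainder modulo a power of ten, which is why a single `%` is enough here.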
Midsquare Method

• In this method the key is squared and some
digits from the middle of the square are taken
as the address.
• Example (taking the middle three digits of the square):
Key     Square of key   Address
1123    1261129         611
2273    5166529         665
3139    9853321         533
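The mid-square computation can be sketched as follows. It assumes the address is the middle three decimal digits of the square; when the square cannot be centred exactly, this sketch leans one digit to the left, which is a choice made here and not something the slides specify.

```python
def mid_square_hash(key, digits=3):
    """Square the key and return `digits` digits taken from the middle of the square."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2  # left edge of the middle window
    return int(squared[start:start + digits])

for key in (1123, 2273, 3139):
    print(key, key * key, mid_square_hash(key))
# 1123 1261129 611
# 2273 5166529 665
# 3139 9853321 533
```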


Folding Method

• In this technique the key is divided into parts,
where the length of each part is the same as that of the
required address, except possibly the last part.
• Example:
• Let the key be 123945234 and the table size be 1000; then
we break the key as follows:
• 123945234 ----> 123 945 234
• Now we add these parts:
123 + 945 + 234 = 1302. We ignore
the final carry 1, so the address for the key 123945234
is 302.
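The folding steps above can be sketched like this, assuming a non-negative integer key split from the left into fixed-length decimal parts. The function name `folding_hash` is illustrative.

```python
def folding_hash(key, part_len=3):
    """Split the key's digits into parts of part_len, add them, and drop the carry."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    # The modulo keeps only the last part_len digits, i.e. it ignores the final carry.
    return sum(parts) % (10 ** part_len)

print(folding_hash(123945234))  # 302
```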
Division Method (Modulo-Division)
• In the modulo-division method the key is divided
by the table size and the remainder is taken as
the address in the hash table.
• Let the table size be n; then
• H(k) = k mod n
Resolving hash clashes
1. Chaining (Open Hashing): Keys with the same
hash value are linked together, and a
search sequentially traverses the items in
the linked list.
2. Open Addressing (Closed Hashing): Whenever
there is a clash, the key is rehashed to find another
slot in the table.
– Many techniques exist, e.g., linear probing and quadratic
probing.
Chaining
• Chaining resolves collisions. The idea is to make each cell of
the hash table point to a linked list of records that have the
same hash function value.
• Suppose the hash table has N buckets. To insert a node into
the hash table, we first find the hash index for the given key.

• Example: hashIndex = key % Tablesize

• Insert: Move to the table location that corresponds to the
calculated hash index and insert the new node at the
end of the list.
• Delete: To delete a node from the hash table, calculate the
hash index for the key, move to the bucket that corresponds to
the calculated hash index, and search the list in that
bucket to find and remove the node with the given key (if
found).
Chaining
• Example: h(key) = key % 10
Input: 2813, 1615, 2822, 8232, 3553, 2125, 4288

Index   Chain
0
1
2       2822 -> 8232
3       2813 -> 3553
4
5       1615 -> 2125
6
7
8       4288
9
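The chaining example can be reproduced with a small sketch. It assumes integer keys and uses a Python list per bucket in place of a hand-written linked list; the class name `ChainedHashTable` is illustrative.

```python
class ChainedHashTable:
    """Hash table with separate chaining; each bucket holds the keys that collide."""

    def __init__(self, size=10):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return key % self.size  # hashIndex = key % Tablesize

    def insert(self, key):
        # New keys go at the end of the bucket's chain.
        self.buckets[self._index(key)].append(key)

    def search(self, key):
        # Sequentially scan the chain of the key's bucket.
        return key in self.buckets[self._index(key)]

    def delete(self, key):
        bucket = self.buckets[self._index(key)]
        if key in bucket:
            bucket.remove(key)
            return True
        return False

table = ChainedHashTable(10)
for key in [2813, 1615, 2822, 8232, 3553, 2125, 4288]:
    table.insert(key)
print(table.buckets[2], table.buckets[3], table.buckets[5], table.buckets[8])
# [2822, 8232] [2813, 3553] [1615, 2125] [4288]
```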
• Open addressing stores all elements
directly in the hash table itself and
resolves collisions by probing for another slot,
using one of several methods.
– Linear Probing: resolves collisions by placing the data
in the next open slot in the table.
– Quadratic Probing: the slot at offset i^2 is examined on the i-th probe.
– Double Hashing: we use another hash function
hash2(x) and examine the slot at offset i*hash2(x) on the i-th probe.
Open Addressing
• Like separate chaining, open addressing is a method for
handling collisions. In open addressing, all elements are
stored in the hash table itself, so at any point the size of the
table must be greater than or equal to the total number of
keys. (Note that we can increase the table size by copying the
old data into a larger table if needed.)

• Insert(k): Keep probing until an empty slot is found. Once an
empty slot is found, insert k.

• Search(k): Keep probing until the slot’s key equals k
or an empty slot is reached.

• Delete(k): If we simply empty the slot of a deleted key, a later
search for a key stored beyond it may fail.
So slots of deleted keys are marked specially as “deleted”.
Insert can place an item in a deleted slot, but search
doesn’t stop at a deleted slot.
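The three operations can be sketched as below, assuming integer keys, h(key) = key % size, and linear probing as the probe order. The sentinels `EMPTY` and `DELETED` and the class name are illustrative, and the insert does not check for duplicate keys.

```python
EMPTY = object()    # slot that has never held a key
DELETED = object()  # marker left behind by delete

class OpenAddressingTable:
    def __init__(self, size=10):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield slot indices in linear-probing order, visiting each slot once."""
        home = key % self.size
        for i in range(self.size):
            yield (home + i) % self.size

    def insert(self, key):
        for idx in self._probe(key):
            # Both empty and "deleted" slots can be reused by insert.
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key
                return True
        return False  # table is full

    def search(self, key):
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY:
                return None  # a never-used slot ends the search
            if self.slots[idx] is not DELETED and self.slots[idx] == key:
                return idx
        return None

    def delete(self, key):
        idx = self.search(key)
        if idx is None:
            return False
        self.slots[idx] = DELETED  # mark the slot, do not empty it
        return True

t = OpenAddressingTable(10)
for key in (2813, 3553, 23):   # all three hash to slot 3
    t.insert(key)
t.delete(3553)                 # slot 4 becomes "deleted"
print(t.search(23))            # 5: the search passes over the deleted slot
```

If `delete` reset the slot to `EMPTY` instead, the search for 23 would stop at slot 4 and wrongly report the key as missing.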
Linear Probing
• Let hash(x) be the slot index computed using
hash function
• If slot hash(x) % S is full, then we try (hash(x) +
1) % S
• If (hash(x) + 1) % S is also full, then we try
(hash(x) + 2) % S
• If (hash(x) + 2) % S is also full, then we try
(hash(x) + 3) % S and so on
Quadratic Probing
• We look for the slot at offset i^2 in the i-th
iteration.
• Let hash(x) be the slot index computed using hash
function.
• If slot hash(x) % S is full, then we try (hash(x) +
1*1) % S
• If (hash(x) + 1*1) % S is also full, then we try
(hash(x) + 2*2) % S
• If (hash(x) + 2*2) % S is also full, then we try
(hash(x) + 3*3) % S and so on
Double Hashing
• We use another hash function
hash2(x) and look for the slot at offset i*hash2(x) in the i-th
iteration.
• Let hash(x) be the slot index computed using hash
function.
• If slot hash(x) % S is full, then we try (hash(x) +
1*hash2(x)) % S
• If (hash(x) + 1*hash2(x)) % S is also full, then we
try (hash(x) + 2*hash2(x)) % S
• If (hash(x) + 2*hash2(x)) % S is also full, then we
try (hash(x) + 3*hash2(x)) % S and so on
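The three probe sequences differ only in the offset added on the i-th attempt, which the following sketch makes explicit. It assumes hash(x) is the integer key itself, and the secondary hash `hash2(x) = 7 - (x % 7)` is a hypothetical choice for illustration; the slides do not define one.

```python
def hash2(x):
    # Hypothetical secondary hash (an assumption, not from the slides).
    # It never returns 0, so the probe always moves to a different slot.
    return 7 - (x % 7)

def probe_sequence(x, size, count, offset):
    """Return the first `count` slots probed for key x; offset(i) is the i-th step."""
    home = x % size  # hash(x) % S, with hash(x) assumed to be x itself
    return [(home + offset(i)) % size for i in range(count)]

x, S = 8232, 10
print(probe_sequence(x, S, 4, lambda i: i))             # linear:    [2, 3, 4, 5]
print(probe_sequence(x, S, 4, lambda i: i * i))         # quadratic: [2, 3, 6, 1]
print(probe_sequence(x, S, 4, lambda i: i * hash2(x)))  # double:    [2, 9, 6, 3]
```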
Open Addressing: Linear probing
• Place the record in the next available position in the array, i.e., rh(i) = (i+1) % array_size.
• E.g., h(key) = key % 10 (input: 2822, 2813, 1615, 3553, 2125, 4288, 8232)

Index   Key     Probes
0
1
2       2822
3       2813
4       3553    h(3553)=3, rh(3)=4
5       1615
6       2125    h(2125)=5, rh(5)=6
7       8232    h(8232)=2, rh(2)=3, rh(3)=4, rh(4)=5, rh(5)=6, rh(6)=7
8       4288
9
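The table above can be reproduced with a short sketch, assuming h(key) = key % 10 and `None` as the marker for an empty slot. The function name `insert_linear` is illustrative.

```python
def insert_linear(table, key):
    """Insert key with linear probing and return the slot it ends up in."""
    size = len(table)
    home = key % size
    for i in range(size):
        idx = (home + i) % size  # home slot first, then one step at a time
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("hash table is full")

table = [None] * 10
for key in [2822, 2813, 1615, 3553, 2125, 4288, 8232]:
    insert_linear(table, key)
print(table)
# [None, None, 2822, 2813, 3553, 1615, 2125, 8232, 4288, None]
```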
Open addressing -- quadratic Probing
• The jth rehash is hj(key) = (h(key) + j^2) % array_size
• E.g., h(key) = key % 10 (input: 2822, 1615, 2813, 3553, 2125, 8232, 4288)

Index   Key     Probes
0
1       8232    h(8232)=2, h1=2+1=3, h2=2+(2*2)=6, h3=(2+3*3)%10=1
2       2822
3       2813
4       3553    h(3553)=3, h1=3+1=4
5       1615
6       2125    h(2125)=5, h1=5+1=6
7
8       4288
9
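A sketch that reproduces the quadratic probing table, again assuming h(key) = key % 10 and `None` for an empty slot. The function name `insert_quadratic` is illustrative.

```python
def insert_quadratic(table, key):
    """Insert key with quadratic probing and return the slot it ends up in."""
    size = len(table)
    home = key % size
    for j in range(size):
        idx = (home + j * j) % size  # the j-th rehash: (h(key) + j^2) % array_size
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("no free slot found on the probe sequence")

table = [None] * 10
for key in [2822, 1615, 2813, 3553, 2125, 8232, 4288]:
    insert_quadratic(table, key)
print(table)
# [None, 8232, 2822, 2813, 3553, 1615, 2125, None, 4288, None]
```

Unlike linear probing, a quadratic probe sequence does not necessarily visit every slot, so a failed insert here does not prove that the table is full.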
Coalesced Hashing

Coalesced hashing is a hybrid of chaining and open
addressing. It links together chains
of nodes within the table itself.
Like open addressing, it has space-usage advantages over chaining.
Unlike chaining, it cannot have more elements
than table slots.
• Coalesced hashing is a collision resolution technique for
fixed-size data. It is a combination of
both separate chaining and open addressing.
• It uses the concept of open addressing (linear probing) to
find the first empty place for a colliding element, searching from
the bottom of the hash table, and the concept of separate chaining to
link the colliding elements to each other through pointers.
• The hash function used is h(key) = key % (total number of keys).
Inside the hash table, each node has three fields:
• h(key): The value of the hash function for a key.
• Data: The key itself.
• Next: The link to the next colliding element.
Example
• n = 10
• Input: {20, 35, 16, 40, 45, 25, 32, 37, 22, 55}
• h(key) = key % 10
• Initially, 20, 35 and 16 are inserted without any collision:

Hash Value   Data   Next
0            20     Null
1
2
3
4
5            35     Null
6            16     Null
7
8
9
• Now we have to insert 40. h(40) = 0, which is already occupied, so we search
for the first empty slot from the bottom and insert it there. The
address of this newly inserted node (i.e., 9) is stored in the Next field of
the node at index 0.

Hash Value   Data   Next
0            20     9
1
2
3
4
5            35     Null
6            16     Null
7
8
9            40     Null
• After all keys of the input {20, 35, 16, 40, 45, 25, 32, 37, 22, 55}
are inserted with h(key) = key % 10, the hash table looks like this:

Hash Value   Data   Next
0            20     9
1            55     Null
2            32     3
3            22     Null
4            37     1
5            35     8
6            16     Null
7            25     4
8            45     7
9            40     Null
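The final table can be reproduced with the sketch below. It assumes two parallel lists, `data` and `nxt`, standing in for the Data and Next fields, `None` for both an empty slot and a Null link, and it appends a colliding key to the end of the chain that starts at its home slot. The function name `coalesced_insert` is illustrative.

```python
def coalesced_insert(data, nxt, key):
    """Insert key into a coalesced hash table and return the slot used."""
    size = len(data)
    home = key % size
    if data[home] is None:
        data[home] = key
        return home
    # Collision: find the first empty slot, searching from the bottom of the table.
    free = next((i for i in range(size - 1, -1, -1) if data[i] is None), None)
    if free is None:
        raise RuntimeError("hash table is full")
    # Follow the Next links from the home slot to the end of its chain.
    tail = home
    while nxt[tail] is not None:
        tail = nxt[tail]
    data[free] = key
    nxt[tail] = free  # link the new node to the end of the chain
    return free

data = [None] * 10
nxt = [None] * 10
for key in [20, 35, 16, 40, 45, 25, 32, 37, 22, 55]:
    coalesced_insert(data, nxt, key)
print(data)  # [20, 55, 32, 22, 37, 35, 16, 25, 45, 40]
print(nxt)   # [9, None, 3, None, 1, 8, None, 4, 7, None]
```

Note how 37 hashes to slot 7, which already holds 25 from the chain that started at slot 5. The two chains merge at that point, which is where the name "coalesced" comes from.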
Chaining vs Open addressing
• Chaining is simpler to implement. Open
addressing requires more computation.
• In chaining, the hash table never fills up; we can
always add more elements to a chain. In open
addressing, the table may become full.
• Chaining uses extra space for links. Open
addressing needs no links.
