1
Hashing
2
General Idea
• The ideal hash table structure is merely an array of some fixed
size, containing the items.
• A stored item needs to have a data member, called key, that will
be used in computing the index value for the item.
– Key could be an integer, a string, etc
– e.g. a name or Id that is a part of a large employee structure
• The size of the array is TableSize.
• The items that are stored in the hash table are indexed by values
from 0 to TableSize – 1.
• Each key is mapped into some number in the range 0 to
TableSize – 1.
• The mapping is called a hash function.
3
Example
[Figure: the items mary 28200, dave 27500, phil 31250 and john 25000 are passed by key through a hash function, which places each item in one cell of a hash table with cells 0 to 9.]
4
Hash Function
• The hash function:
– must be simple to compute.
– must distribute the keys evenly among the cells.
• If we knew in advance which keys would
occur, we could write a perfect hash
function, but usually we do not.
5
Hash function
Problems:
• Keys may not be numeric.
• Number of possible keys is much larger than the
space available in table.
• Different keys may map into same location
– Hash function is not one-to-one => collision.
– If there are too many collisions, the performance of
the hash table will suffer dramatically.
6
Hash Functions
• If the input keys are integers then simply
Key mod TableSize is a general strategy.
– Unless key happens to have some undesirable
properties. (e.g. all keys end in 0 and we use
mod 10)
• If the keys are strings, hash function needs
more care.
– First convert it into a numeric value.
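A minimal sketch of the two cases above. The names hash_int and hash_string and the multiplier 37 are illustrative assumptions, not taken from the slides.

```python
def hash_int(key: int, table_size: int) -> int:
    """Map an integer key to a cell index with key mod TableSize."""
    return key % table_size


def hash_string(key: str, table_size: int) -> int:
    """Convert a string to a number, then reduce it mod TableSize.

    The string is read as a polynomial in 37 (an assumed multiplier)
    whose coefficients are the character codes.
    """
    value = 0
    for ch in key:
        value = (value * 37 + ord(ch)) % table_size
    return value


# Keys that all end in 0 collide when TableSize is 10 ...
print([hash_int(k, 10) for k in (10, 20, 30, 40)])   # [0, 0, 0, 0]
# ... but spread out when TableSize is the prime 11.
print([hash_int(k, 11) for k in (10, 20, 30, 40)])   # [10, 9, 8, 7]
print(0 <= hash_string("mary", 11) < 11)             # True
```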
7
Some methods
• Truncation:
– e.g. 123456789 map to a table of 1000 addresses by
picking 3 digits of the key.
• Folding:
– e.g. 123|456|789: add them and take mod.
• Key mod N:
– N is the size of the table, better if it is prime.
• Squaring:
– Square the key and then truncate
• Radix conversion:
– e.g. treat the digits 1 2 3 4 as a base-11 number; truncate if necessary.
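The methods listed on this slide could be sketched as follows. The slide does not say which three digits truncation keeps or how the square is truncated, so keeping the last three digits and the middle three digits are assumptions, as are all function names.

```python
def truncation(key: int) -> int:
    """Keep only the last three digits: an address in 0..999 (assumed choice)."""
    return key % 1000


def folding(key: int, table_size: int) -> int:
    """Split the key into 3-digit groups, add them, and take mod."""
    total = 0
    while key > 0:
        total += key % 1000
        key //= 1000
    return total % table_size


def mid_square(key: int) -> int:
    """Square the key and keep three middle digits (one way to truncate)."""
    digits = str(key * key)
    mid = len(digits) // 2
    return int(digits[max(mid - 1, 0):mid + 2])


def radix_conversion(key: int, base: int, table_size: int) -> int:
    """Reinterpret the decimal digits of key in another base, then take mod."""
    value = 0
    for d in str(key):
        value = value * base + int(d)
    return value % table_size


print(truncation(123456789))             # 789
print(folding(123456789, 1000))          # 123 + 456 + 789 = 1368 -> 368
print(mid_square(1234))                  # 1234^2 = 1522756 -> 227
print(radix_conversion(1234, 11, 1000))  # 1234 in base 11 = 1610 -> 610
```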
8
Collision Resolution
• If, when an element is inserted, it hashes to the
same value as an already inserted element, then we
have a collision and need to resolve it.
• There are several methods for dealing with this:
– Separate chaining
– Open addressing
• Linear Probing
• Quadratic Probing
• Double Hashing
9
Separate Chaining
• The idea is to keep a list of all elements that hash to
the same value.
– The array elements are pointers to the first nodes of the
lists.
– A new item is inserted to the front of the list.
• Advantages:
– Better space utilization for large items.
– Simple collision handling: searching linked list.
– Overflow: we can store more items than the hash table
size.
– Deletion is quick and easy: deletion from the linked list.
10
Example
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key % 10
Cell: list (front to back)
0: 0
1: 81, 1
2: (empty)
3: (empty)
4: 64, 4
5: 25
6: 36, 16
7: (empty)
8: (empty)
9: 49, 9
11
Operations
• Initialization: all entries are set to NULL
• Find:
– locate the cell using hash function.
– sequential search on the linked list in that cell.
• Insertion:
– Locate the cell using hash function.
– (If the item does not exist) insert it as the first item in the
list.
• Deletion:
– Locate the cell using hash function.
– Delete the item from the linked list.
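The operations above can be sketched as a small class. This is an illustration under assumptions: keys are integers, hash(key) = key mod TableSize, and the class and method names are hypothetical.

```python
class Node:
    """One linked-list node holding a key."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, table_size=10):
        # Initialization: every cell starts as None (the NULL of the slides).
        self.cells = [None] * table_size

    def _index(self, key):
        return key % len(self.cells)

    def find(self, key):
        node = self.cells[self._index(key)]
        while node is not None:          # sequential search in the chain
            if node.key == key:
                return True
            node = node.next
        return False

    def insert(self, key):
        if not self.find(key):           # insert only if not already present
            i = self._index(key)
            self.cells[i] = Node(key, self.cells[i])   # new first node

    def delete(self, key):
        i = self._index(key)
        prev, node = None, self.cells[i]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.cells[i] = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False

    def chain(self, i):
        """Return the keys stored in cell i, front to back."""
        keys, node = [], self.cells[i]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedHashTable(10)
for k in (0, 1, 4, 9, 16, 25, 36, 49, 64, 81):
    table.insert(k)
print(table.chain(1))   # [81, 1]
print(table.chain(6))   # [36, 16]
table.delete(81)
print(table.chain(1))   # [1]
```

Because new items go to the front, the later key of each colliding pair appears first in its list.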
12
Hashing: Open Addressing
13
Collision Resolution with
Open Addressing
• Separate chaining has the disadvantage of
using linked lists.
– Requires the implementation of a second data
structure.
• In an open addressing hashing system, all
the data go inside the table.
– Thus, a bigger table is needed.
• Generally the load factor (number of stored items divided by TableSize) should be below 0.5.
– If a collision occurs, alternative cells are tried
until an empty cell is found.
14
Open Addressing
• More formally:
– Cells h_0(x), h_1(x), h_2(x), … are tried in succession, where
h_i(x) = (hash(x) + f(i)) mod TableSize, with f(0) = 0.
– The function f is the collision resolution strategy.
• There are three common collision resolution
strategies:
– Linear Probing
– Quadratic probing
– Double hashing
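A short sketch of the general probe sequence, assuming integer keys and hash(x) = x mod TableSize. The generator name probe_sequence is hypothetical; f is passed in as the collision resolution strategy.

```python
from itertools import islice


def probe_sequence(x, table_size, f):
    """Yield h_0(x), h_1(x), ... with h_i(x) = (hash(x) + f(i)) mod TableSize."""
    home = x % table_size                # assumed hash function
    i = 0
    while True:
        yield (home + f(i)) % table_size
        i += 1


def linear(i):
    return i


def quadratic(i):
    return i * i


print(list(islice(probe_sequence(89, 10, linear), 4)))      # [9, 0, 1, 2]
print(list(islice(probe_sequence(89, 10, quadratic), 4)))   # [9, 0, 3, 8]
```

Both strategies satisfy f(0) = 0, so the first cell tried is always the home cell hash(x).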
15
Linear Probing
• In linear probing, collisions are resolved by
sequentially scanning an array (with
wraparound) until an empty cell is found.
– i.e. f is a linear function of i, typically f(i)= i.
• Example:
– Insert items with keys: 89, 18, 49, 58, 9 into an
empty hash table.
– Table size is 10.
– Hash function is hash(x) = x mod 10.
• f(i) = i;
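The example can be traced with a few lines of code. This is a sketch with a hypothetical function name; empty cells are represented by None.

```python
def linear_probe_insert(table, key):
    """Insert key using f(i) = i; return the cell index used."""
    size = len(table)
    for i in range(size):
        cell = (key % size + i) % size     # scan forward with wraparound
        if table[cell] is None:
            table[cell] = key
            return cell
    raise RuntimeError("hash table is full")


table = [None] * 10
for key in (89, 18, 49, 58, 9):
    linear_probe_insert(table, key)
print(table)
# [49, 58, 9, None, None, None, None, None, 18, 89]
```

49 collides with 89 in cell 9 and wraps to cell 0; 58 and 9 then extend the same block of occupied cells.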
16
Figure 20.4
Linear probing
hash table after
each insertion
17
Find and Delete
• The find algorithm follows the same probe
sequence as the insert algorithm.
– A find for 58 would involve 4 probes.
– A find for 19 would involve 5 probes.
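• We must use lazy deletion (i.e. marking
items as deleted).
– Standard deletion (i.e. physically removing the
item) would break later searches, because a find stops at the first empty cell.
– e.g. if 89 is physically removed, cell 9 becomes empty and finds for 49, 58 and 9 fail.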
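A sketch of find and lazy deletion for the linear probing example, assuming distinct integer keys. The DELETED marker and the function names are illustrative choices.

```python
EMPTY = None
DELETED = object()   # marker for a lazily deleted cell


def insert(table, key):
    """Linear probing insert; assumes keys are distinct."""
    size = len(table)
    for i in range(size):
        cell = (key % size + i) % size
        if table[cell] is EMPTY or table[cell] is DELETED:
            table[cell] = key
            return
    raise RuntimeError("hash table is full")


def find(table, key):
    """Return (cell or None, number of probes), following the insert sequence."""
    size = len(table)
    for i in range(size):
        cell = (key % size + i) % size
        if table[cell] is EMPTY:           # a truly empty cell ends the search
            return None, i + 1
        if table[cell] is not DELETED and table[cell] == key:
            return cell, i + 1
    return None, size


def delete(table, key):
    cell, _ = find(table, key)
    if cell is not None:
        table[cell] = DELETED              # mark the cell, do not empty it


table = [EMPTY] * 10
for key in (89, 18, 49, 58, 9):
    insert(table, key)
print(find(table, 58))   # (1, 4)
print(find(table, 19))   # (None, 5)
delete(table, 89)
print(find(table, 49))   # (0, 2)
```

After 89 is marked as deleted, the search for 49 passes over cell 9 instead of stopping there.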
18
Clustering Problem
• As long as the table is big enough, a free cell
can always be found, but the time to do so
can get quite large.
• Worse, even if the table is relatively empty,
blocks of occupied cells start forming.
• This effect is known as primary clustering.
• Any key that hashes into the cluster will
require several attempts to resolve the
collision, and then it will add to the cluster.
19
Linear Probing – Analysis -- Example
• What is the average number of probes for a successful
search and an unsuccessful search for this hash table?
– Hash Function: h(x) = x mod 11
Successful Search:
– 20: 9 -- 30: 8 -- 2 : 2 -- 13: 2, 3 -- 25: 3,4
– 24: 2,3,4,5 -- 10: 10 -- 9: 9,10, 0
Avg. Probe for SS = (1+1+1+2+2+4+1+3)/8=15/8
Unsuccessful Search:
– We assume that the hash function uniformly
distributes the keys.
– 0: 0,1 -- 1: 1 -- 2: 2,3,4,5,6 -- 3: 3,4,5,6
– 4: 4,5,6 -- 5: 5,6 -- 6: 6 -- 7: 7 -- 8: 8,9,10,0,1
– 9: 9,10,0,1 -- 10: 10,0,1
Avg. Probe for US =
(2+1+5+4+3+2+1+1+5+4+3)/11=31/11
Hash table (cell: key)
0: 9
1: (empty)
2: 2
3: 13
4: 25
5: 24
6: (empty)
7: (empty)
8: 30
9: 20
10: 10
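The two averages can be checked mechanically. The sketch below hard-codes the table from this slide as a cell-to-key mapping; the function names are hypothetical.

```python
from fractions import Fraction

SIZE = 11
table = {0: 9, 2: 2, 3: 13, 4: 25, 5: 24, 8: 30, 9: 20, 10: 10}  # cell -> key


def probes_successful(key):
    """Probes needed to find a key that is known to be in the table."""
    cell, count = key % SIZE, 1
    while table.get(cell) != key:
        cell, count = (cell + 1) % SIZE, count + 1
    return count


def probes_unsuccessful(home):
    """Probes until an empty cell is reached, starting from cell `home`."""
    cell, count = home, 1
    while cell in table:
        cell, count = (cell + 1) % SIZE, count + 1
    return count


avg_ss = Fraction(sum(probes_successful(k) for k in table.values()), len(table))
avg_us = Fraction(sum(probes_unsuccessful(h) for h in range(SIZE)), SIZE)
print(avg_ss, avg_us)   # 15/8 31/11
```

The unsuccessful case averages over all 11 home cells, which is where the uniform-distribution assumption enters.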
20
Quadratic Probing
• Quadratic probing eliminates the primary clustering
problem of linear probing.
• Collision function is quadratic.
– The popular choice is f(i) = i^2.
• If the hash function evaluates to h and a search in cell
h is inconclusive, we try cells h + 1^2, h + 2^2, …, h + i^2.
– i.e. It examines cells 1,4,9 and so on away from the
original probe.
• Remember that subsequent probe points are a
quadratic number of positions from the original
probe point.
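A sketch of insertion with f(i) = i^2. It assumes the same keys and table size as the linear probing example (89, 18, 49, 58, 9 with TableSize 10); the function name is hypothetical.

```python
def quadratic_probe_insert(table, key):
    """Insert key using f(i) = i^2; return the cell index used."""
    size = len(table)
    for i in range(size):
        cell = (key % size + i * i) % size
        if table[cell] is None:
            table[cell] = key
            return cell
    raise RuntimeError("no empty cell found along the probe sequence")


table = [None] * 10
for key in (89, 18, 49, 58, 9):
    quadratic_probe_insert(table, key)
print(table)
# [49, None, 58, 9, None, None, None, None, 18, 89]
```

58 tries cells 8 and 9, then jumps to cell 2 (8 + 4 mod 10) rather than joining the occupied block at cell 0.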
21
Figure 20.6
A quadratic
probing hash
table after each
insertion (note
that the table size
was poorly chosen
because it is not a
prime number).
22
Quadratic Probing
• Problem:
– We may not be sure that we will probe all locations in
the table (i.e. there is no guarantee to find an empty cell
if table is more than half full.)
– If the hash table size is not prime, this problem will be
much more severe.
• However, there is a theorem stating that:
– If the table size is prime and load factor is not larger
than 0.5, all probes will be to different locations and an
item can always be inserted.
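The effect of the table size can be observed directly. This is a small check, not a proof; the function names are illustrative and 16 is an arbitrary non-prime size chosen for the demonstration.

```python
def reachable_cells(home, table_size):
    """Cells visited by home + i^2 (mod TableSize) for i = 0 .. TableSize - 1."""
    return {(home + i * i) % table_size for i in range(table_size)}


def first_half_distinct(table_size):
    """True if the first ceil(TableSize / 2) probes hit different cells."""
    half = (table_size + 1) // 2
    probes = [(i * i) % table_size for i in range(half)]
    return len(set(probes)) == half


print(first_half_distinct(11))          # True: 11 is prime
print(first_half_distinct(16))          # False: 16 is not prime
print(sorted(reachable_cells(0, 16)))   # [0, 1, 4, 9]
```

With TableSize 16 the probe sequence can only ever reach 4 of the 16 cells, so an insertion may fail even when the table is mostly empty.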
23
Double Hashing
• A second hash function is used to drive the collision
resolution.
– f(i) = i * hash2(x)
• We apply a second hash function to x and probe at a
distance hash2(x), 2*hash2(x), … and so on.
• The function hash2(x) must never evaluate to zero.
– e.g. hash2(x) = x mod 9 is a poor choice: inserting 99 in the
previous example gives hash2(99) = 0, so the probe never leaves the home cell.
• A function such as hash2(x) = R – ( x mod R) with R
a prime smaller than TableSize will work well.
– e.g. try R = 7 for the previous example: hash2(x) = 7 - (x mod 7).
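A sketch of double hashing with hash2(x) = R - (x mod R) and R = 7. It assumes "the previous example" means the keys 89, 18, 49, 58, 9 with TableSize 10; the function names are hypothetical.

```python
def hash2(x, r=7):
    """Second hash function R - (x mod R); always in 1..R, never zero."""
    return r - (x % r)


def double_hash_insert(table, key):
    """Insert key using f(i) = i * hash2(key); return the cell index used."""
    size = len(table)
    step = hash2(key)
    for i in range(size):
        cell = (key % size + i * step) % size
        if table[cell] is None:
            table[cell] = key
            return cell
    raise RuntimeError("no empty cell found along the probe sequence")


table = [None] * 10
for key in (89, 18, 49, 58, 9):
    double_hash_insert(table, key)
print(table)
# [None, None, None, 58, 9, None, 49, None, 18, 89]
print(99 % 9)   # 0: with hash2(x) = x mod 9 the probe for 99 would never move
```

Keys 49 and 9 share the home cell 9 but use different step sizes (7 and 5), so they do not follow the same probe path.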
24
Hashing Applications
• Compilers use hash tables to implement the
symbol table (a data structure to keep track
of declared variables).
• Game programs use hash tables to keep
track of positions they have encountered
(transposition table).
• Online spelling checkers.
CENG 213 Data Structures 25
Summary
• Hash tables can be used to implement the insert
and find operations in constant average time.
– The average cost depends on the load factor, not on the number of items
in the table.
• It is important to have a prime TableSize and a
correct choice of load factor and hash function.
• For separate chaining the load factor should be
close to 1.
• For open addressing load factor should not exceed
0.5 unless this is completely unavoidable.
– Rehashing can be implemented to grow (or shrink) the
table.
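Rehashing to grow the table could look like the sketch below. The slides do not specify a policy, so growing to the next prime at or above twice the old size, using linear probing, and triggering at load factor 0.5 are all assumptions, as are the function names.

```python
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


def insert(table, key):
    size = len(table)
    for i in range(size):
        cell = (key % size + i) % size     # linear probing
        if table[cell] is None:
            table[cell] = key
            return
    raise RuntimeError("hash table is full")


def rehash(table):
    """Build a table about twice as large (prime size) and reinsert every key."""
    new_table = [None] * next_prime(2 * len(table))
    for key in table:
        if key is not None:
            insert(new_table, key)
    return new_table


keys = (20, 30, 2, 13, 25, 24, 10, 9)
table = [None] * 11
count = 0
for key in keys:
    if (count + 1) / len(table) > 0.5:     # keep the load factor at or below 0.5
        table = rehash(table)
    insert(table, key)
    count += 1
print(len(table))   # 23
```

Every key must be reinserted because its cell index depends on TableSize, which has changed.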

Hashing Techniques in Data Structures and Algorithm
