L04: Hashing

For efficient look-up in a table

Outline
• What is hashing?
• How to hash?
• What is a collision?
• How to resolve collisions?
  - Separate chaining
  - Linear probing
  - Quadratic probing
  - Double hashing
• Load factor
• Primary clustering and secondary clustering

ADT Table Operations
An ADT table supports 3 operations:
1. Insertion
2. Deletion
3. Retrieval

Operation    Sorted array    Balanced BST    Hashing
Insertion    O(n)            O(log n)        O(1) avg
Deletion     O(n)            O(log n)        O(1) avg
Retrieval    O(log n)        O(log n)        O(1) avg

Note: Balanced Binary Search Tree (BST) will be covered in the next lecture.

Direct Addressing Table
A simplified version of a hash table

SBS bus Problem

Direct Addressing Table: Operations

insert(key, data)
    a[key] = data    // where a[] is an array: the table

delete(key)
    a[key] = null

find(key)
    return a[key]

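The three operations above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `DirectAddressTable`, the `capacity` parameter, and the sample value `"route 151"` are made up for this sketch, and keys are assumed to be integers in the range 0 to capacity-1.

```python
class DirectAddressTable:
    """Direct addressing: the key itself is the array index (hypothetical sketch)."""

    def __init__(self, capacity):
        # One slot per possible key; None marks "no data stored".
        self.a = [None] * capacity

    def insert(self, key, data):
        self.a[key] = data

    def delete(self, key):
        self.a[key] = None

    def find(self, key):
        return self.a[key]


table = DirectAddressTable(1000)
table.insert(151, "route 151")   # made-up sample data
print(table.find(151))
table.delete(151)
print(table.find(151))
```

Every operation is a single array access, which is why the table is fast but only practical when the keys are small, dense integers.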
Direct Addressing Table: Restrictions
• Keys must be integers
  - What happens for key values 151A and NR10?
• Range of keys must be small
• Keys must be dense, i.e. not many gaps in the key values.
• How to overcome these restrictions?

Hash Table

A generalization of the direct addressing table, to remove its restrictions.

Good Hash Functions
• Fast to compute
• Scatter keys evenly throughout the hash table
• Fewer collisions
• Need fewer slots (space)

Bad Hash Functions

• What happens when you hash Singapore’s house phone numbers by selecting the first three digits?

Perfect Hash Functions

How to Define a Hash Function?
• Uniform hash function
• Division method
• Multiplication method
• Hashing of strings

Uniform Hash Functions

Division method (mod operator)
• Map into a hash table of m slots.
• Use the modulo operator (%) to map an integer to a value between 0 and m-1.

hash(k) = k % m

The most popular method.

How to pick m (table size)?

Multiplication method
1. Multiply by a fraction A between 0 and 1
2. Extract the fractional part
3. Multiply the fraction by m, the hash table size

$$\mathrm{hash}(k) = \lfloor m \, (kA - \lfloor kA \rfloor) \rfloor$$

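The three steps of the multiplication method can be sketched in Python as below. The function name `hash_mul` is hypothetical, and the default constant A = (sqrt(5) - 1) / 2 is an assumption made for this sketch (a commonly suggested value); the slides only require A to be a fraction between 0 and 1.

```python
import math

# Assumed constant for illustration only; any fraction between 0 and 1 fits the slide.
A = (math.sqrt(5) - 1) / 2


def hash_mul(k, m, a=A):
    """Multiplication method: floor(m * fractional_part(k * a))."""
    frac = (k * a) % 1.0      # steps 1 and 2: multiply by a, keep the fractional part
    return int(m * frac)      # step 3: scale by the table size m and truncate


print([hash_mul(k, 13) for k in range(10)])
```

Unlike the division method, the quality of this hash depends mostly on A rather than on m, so m does not have to be chosen as carefully.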
Hashing of strings

hash1(s)    // s is a string
    sum = 0
    foreach character c in s
        sum += c    // sum up the ASCII values of all characters
    return sum % m    // m is the hash table size

Hashing of strings
hash(“Tan Ah Teck”)
= (“T” + “a” + “n” + “ ” +
   “A” + “h” + “ ” +
   “T” + “e” + “c” + “k”) % 11    // hash table size is 11
= (84 + 97 + 110 + 32 +
   65 + 104 + 32 +
   84 + 101 + 99 + 107) % 11
= 915 % 11
= 2

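The worked example can be checked with a direct Python translation of hash1. Passing the table size `m` as a parameter is a choice made for this sketch; the pseudocode treats it as a global.

```python
def hash1(s, m):
    """Sum the character codes of s, then reduce modulo the table size m."""
    total = 0
    for c in s:
        total += ord(c)   # ord() gives the ASCII/Unicode code of the character
    return total % m


name = "Tan Ah Teck"
print(sum(ord(c) for c in name))   # 915
print(hash1(name, 11))             # 2
```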
Hashing of strings:
• Lee Chin Tan
• Chen Le Tian
• Chan Tin Lee

All 3 strings above have the same hash value! Why?

• Problem: The hash code produced by hash1 does not depend on the positions of characters! This is bad.

Improved Hashing of strings
A better way is to “shift” the sum before adding the next character, so that its position will affect the hash code.
This kind of hash code is called a polynomial hash code.

hash2(s)
    sum = 0
    foreach character c in s
        sum = sum*37 + c
    return sum % m    // m is the hash table size

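A small Python sketch makes the difference between the two string hashes concrete. The table sizes used here (11 and 101) are arbitrary choices for illustration. Note that distinct strings can still collide under hash2 once the result is reduced modulo m; the point is only that character order now influences the result.

```python
def hash1(s, m):
    """Position-independent: plain sum of character codes."""
    return sum(ord(c) for c in s) % m


def hash2(s, m):
    """Polynomial hash code: multiply the running sum by 37 before adding each character."""
    total = 0
    for c in s:
        total = total * 37 + ord(c)
    return total % m


names = ["Lee Chin Tan", "Chen Le Tian", "Chan Tin Lee"]
print([hash1(n, 11) for n in names])   # three identical values: same characters, same sum
print([hash2(n, 11) for n in names])

# Swapping two characters changes hash2 but not hash1.
print(hash1("ab", 101), hash1("ba", 101))
print(hash2("ab", 101), hash2("ba", 101))   # 51 87
```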
Collision Resolution
Probability of Collision (1/2)
• von Mises Paradox (The Birthday Paradox): “How many people must be in a room before the probability that some share a birthday, ignoring the year and leap days, becomes at least 50 percent?”

Q(n) = Probability of unique birthdays for n people

$$Q(n) = \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \cdots \times \frac{365 - n + 1}{365}$$

P(n) = Probability of collisions (same birthday) for n people
     = 1 − Q(n)
P(23) = 0.507

Probability of Collision (2/2)
• This means that if there are 23 people in a room, the probability that some people share a birthday is 50.7%.
• So, if we insert 23 keys into a table with 365 slots, more than half of the time we get collisions. Such a result is counter-intuitive to many.
• So, collision is very likely!

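The birthday figure can be reproduced numerically. The sketch below multiplies out Q(n) term by term; the function name `collision_probability` and the `slots` parameter are hypothetical, and keys are assumed to land in slots uniformly at random.

```python
def collision_probability(n, slots=365):
    """P(n) = 1 - Q(n): chance that at least two of n keys share a slot."""
    q = 1.0
    for i in range(n):
        q *= (slots - i) / slots   # the i-th key must avoid the i slots already taken
    return 1.0 - q


print(round(collision_probability(23), 3))   # 0.507
print(collision_probability(22) < 0.5)       # True: 23 is the smallest n that passes 50%
```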
How to resolve collisions?

Collision resolution schemes commonly used:

• Separate Chaining
• Linear Probing
• Quadratic Probing
• Double Hashing

Load Factor
• n: number of keys in the hash table
• m: size of the hash table (number of slots)
• Define the load factor α
      α = n/m
  a measure of how full the hash table is.
Note: α can be >= 1.
If the table size is the number of linked lists, then α is the average length of the linked lists.

Average Running Time
• Find      O(1 + α)
• Insert    O(1)
• Delete    O(1 + α)

• Note that α affects the performance of find and delete operations.
• If α is bounded by some constant, then all three operations are O(1).

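The running times above correspond to a table in which each slot holds a list of keys (separate chaining). A minimal Python sketch follows; the class and method names are hypothetical, keys are assumed to be integers hashed with k mod m, and insert does not check for duplicates so that it stays O(1).

```python
class ChainedHashTable:
    """Separate chaining: each of the m slots holds a list of (key, data) pairs."""

    def __init__(self, m=7):
        self.m = m
        self.n = 0
        self.chains = [[] for _ in range(m)]

    def _chain(self, key):
        return self.chains[key % self.m]

    def insert(self, key, data):
        # O(1): append to the chain without scanning it.
        self._chain(key).append((key, data))
        self.n += 1

    def find(self, key):
        # O(1 + alpha) on average: scan one chain.
        for k, d in self._chain(key):
            if k == key:
                return d
        return None

    def delete(self, key):
        # O(1 + alpha) on average: scan one chain, remove the first match.
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self):
        return self.n / self.m   # alpha = n / m, the average chain length


t = ChainedHashTable(7)
for k in (14, 21, 35, 18):
    t.insert(k, str(k))
print(t.find(35), t.load_factor())
```

Keys 14, 21 and 35 all hash to slot 0 here, so that chain has length 3 even though the average chain length α is only 4/7.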
Linear Probing
hash(k) = k mod 7
Here the table size m = 7. Note: 7 is a prime number.
In linear probing, when there is a collision, we scan forwards for the next empty slot (wrapping around when we reach the last slot).

  Slot:    0    1    2    3    4    5    6
  Key:     -    -    -    -    -    -    -

Linear Probing: Insert 18
hash(18) = 18 mod 7 = 4

  Slot:    0    1    2    3    4    5    6
  Key:     -    -    -    -   18    -    -

Linear Probing: Insert 14
hash(14) = 14 mod 7 = 0

  Slot:    0    1    2    3    4    5    6
  Key:    14    -    -    -   18    -    -

Linear Probing: Insert 21
hash(21) = 21 mod 7 = 0
Collision occurs! Look for next empty slot.

  Slot:    0    1    2    3    4    5    6
  Key:    14   21    -    -   18    -    -

Linear Probing: Insert 1
hash(1) = 1 mod 7 = 1
Collides with 21 (hash value 0). Look for next empty slot.

  Slot:    0    1    2    3    4    5    6
  Key:    14   21    1    -   18    -    -

Linear Probing: Insert 35
hash(35) = 35 mod 7 = 0
Collision, need to check next 3 slots.

  Slot:    0    1    2    3    4    5    6
  Key:    14   21    1   35   18    -    -

Linear Probing: Find 35
hash(k) = k mod 7
hash(35) = 0
Found 35, after 4 probes.

  Slot:    0    1    2    3    4    5    6
  Key:    14   21    1   35   18    -    -

Linear Probing: Find 8
hash(8) = 1
8 NOT found. Need 5 probes!

  Slot:    0    1    2    3    4    5    6
  Key:    14   21    1   35   18    -    -

Linear Probing: Delete 21
hash(21) = 0

  Slot:    0    1    2    3    4    5    6
  Key:    14   21    1   35   18    -    -

Linear Probing: Find 35
hash(35) = 0
35 NOT found! Incorrect!

  Slot:    0    1    2    3    4    5    6
  Key:    14    -    1   35   18    -    -

We cannot simply remove a value, because it can affect find()!

How to delete?
• Lazy Deletion
• Use three different states of a slot
  - Occupied
  - Deleted
  - Empty
• When a value is removed from a linear probed hash table, we just mark the status of the slot as “deleted”, instead of emptying the slot.
• Need to use a state array the same size as the hash table.

Linear Probing: Delete 21
hash(k) = k mod 7
hash(21) = 0
Slot 1 is occupied but now marked as deleted.

  Slot:    0    1    2    3    4    5    6
  Key:    14    X    1   35   18    -    -
  (X = slot marked as deleted)

Linear Probing: Find 35
hash(35) = 0
Found 35. Now we can find 35.

  Slot:    0    1    2    3    4    5    6
  Key:    14    X    1   35   18    -    -

Linear Probing: Insert 15
hash(15) = 1
15 is inserted into slot 1, which was marked as deleted.
Note: We continue to search for 15, and find that 15 is not in the hash table (total 5 probes).
We can insert a new value into a slot that has been marked as deleted.

  Slot:    0    1    2    3    4    5    6
  Key:    14   15    1   35   18    -    -

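The whole linear probing walkthrough, including lazy deletion, can be replayed with the Python sketch below. The class name, method names and return conventions are assumptions made for this illustration; only the three slot states, the state array and hash(k) = k mod 7 come from the slides.

```python
EMPTY, OCCUPIED, DELETED = 0, 1, 2


class LinearProbingTable:
    def __init__(self, m=7):
        self.m = m
        self.keys = [None] * m
        self.state = [EMPTY] * m   # state array, same size as the table

    def find(self, key):
        """Return (slot or None, number of probes used)."""
        i = key % self.m
        for probes in range(1, self.m + 1):
            if self.state[i] == EMPTY:
                return None, probes            # an empty slot ends the search
            if self.state[i] == OCCUPIED and self.keys[i] == key:
                return i, probes
            i = (i + 1) % self.m               # deleted or other key: keep scanning
        return None, self.m

    def insert(self, key):
        """Return the slot used, or None if key is present or the table is full."""
        i = key % self.m
        target = None                          # first reusable slot seen so far
        for _ in range(self.m):
            if self.state[i] == EMPTY:
                if target is None:
                    target = i
                break                          # key cannot be beyond an empty slot
            if self.state[i] == DELETED:
                if target is None:
                    target = i
            elif self.keys[i] == key:
                return None                    # already in the table
            i = (i + 1) % self.m
        if target is not None:
            self.keys[target] = key
            self.state[target] = OCCUPIED
        return target

    def delete(self, key):
        slot, _ = self.find(key)
        if slot is not None:
            self.state[slot] = DELETED         # lazy deletion: mark, do not empty
        return slot is not None


t = LinearProbingTable(7)
for k in (18, 14, 21, 1, 35):
    t.insert(k)
print(t.find(35))    # (3, 4): found in slot 3 after 4 probes
print(t.find(8))     # (None, 5): not found after 5 probes
t.delete(21)
print(t.find(35))    # (3, 4): still found, because slot 1 is marked deleted, not empty
print(t.insert(15))  # 1: the deleted slot is reused
```

Insert keeps scanning past a deleted slot until it reaches an empty one, so that it can confirm the key is not already stored further along the probe path before reusing the deleted slot.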
Problem of Linear Probing

  Slot:    0    1    2    3    4    5    6
  Key:    14   15    1   35   18    -    -

Consecutive occupied slots. This is called Primary Clustering.

Modified Linear Probing
Q: How to modify linear probing to avoid primary clustering?
We can modify the probe sequence as follows:
    hash(key)
    ( hash(key) + 1 * d ) % m
    ( hash(key) + 2 * d ) % m
    ( hash(key) + 3 * d ) % m
    :
where d is some constant integer > 1 and is co-prime to m.
Note: Since d and m are co-prime, the probe sequence covers all the slots in the hash table.

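The co-prime requirement can be checked with a short Python sketch. The function name `probe_sequence` and the sample values of d and m are assumptions chosen for illustration.

```python
def probe_sequence(home, d, m):
    """First m slots visited by modified linear probing with step d."""
    return [(home + i * d) % m for i in range(m)]


print(probe_sequence(0, 3, 7))   # [0, 3, 6, 2, 5, 1, 4]  d and m co-prime: every slot visited
print(probe_sequence(0, 2, 6))   # [0, 2, 4, 0, 2, 4]     d and m share factor 2: half the slots missed
```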
Quadratic Probing

Quadratic Probing: Insert 3, 18
hash(k) = k mod 7
hash(3) = 3
hash(18) = 4

  Slot:    0    1    2    3    4    5    6
  Key:     -    -    -    3   18    -    -

Quadratic Probing: Insert 38
hash(38) = 3
Collision at slot 3. Probe +1 (slot 4, occupied), then +4 (slot 0, empty).

  Slot:    0    1    2    3    4    5    6
  Key:    38    -    -    3   18    -    -

Can quadratic probing always find a free slot?
Insert 12 into the previous example, followed by 10.
What happens?

Theorem
If α < 0.5, and m is prime, then we can always find an empty slot.
(m is the table size and α is the load factor)

When quadratic probing is used, in the worst case, 50% of the hash table is wasted.

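The question above can be answered by listing the slots that quadratic probing visits for key 10. The Python sketch below assumes the probe sequence (hash(k) + i*i) mod m, which matches the +1 and +4 steps of the previous example; the function name and the dictionary used to represent the table are hypothetical.

```python
def quadratic_probe_slots(home, m):
    """Slots visited for i = 0 .. m-1; later values of i only repeat these."""
    return [(home + i * i) % m for i in range(m)]


# Table after inserting 3, 18, 38 and then 12 (hash(12) = 5, an empty slot).
occupied = {3: 3, 4: 18, 0: 38, 5: 12}     # slot -> key

probes = quadratic_probe_slots(10 % 7, 7)
print(probes)                               # [3, 4, 0, 5, 5, 0, 4]
print(all(s in occupied for s in probes))   # True: 10 never reaches slots 1, 2 or 6
```

With 4 of 7 slots filled the load factor is above 0.5, so the theorem gives no guarantee, and here the insertion of 10 fails even though three slots are still empty.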
Secondary Clustering
• In quadratic probing, clusters are formed along the path of probing, instead of around the home location.
• These clusters are called secondary clusters.
• Secondary clusters are formed as a result of using the same pattern in probing by all keys.

Double Hashing

After Inserting 14 and 18, Insert 21
hash(k) = k mod 7
hash2(k) = k mod 5
hash(21) = 21 mod 7 = 0
hash2(21) = 21 mod 5 = 1

  Slot:    0    1    2    3    4    5    6
  Key:    14   21    -    -   18    -    -

Double Hashing: Insert 4
hash(4) = 4
hash2(4) = 4
If we insert 4, the probe sequence is 4 (home), 8, 12, …, i.e. slots 4, 1, 5, … after taking mod 7.

  Slot:    0    1    2    3    4    5    6
  Key:    14   21    -    -   18    4    -

Double Hashing: Insert 35
hash(35) = 0
hash2(35) = 0
But if we insert 35, the probe sequence is 0, 0, 0, …
What is wrong? Since hash2(35) = 0. Not acceptable!

  Slot:    0    1    2    3    4    5    6
  Key:    14   21    -    -   18    4    -

Warning:

Note:
• If hash2(k) = 1, then it is the same as linear probing.
• If hash2(k) = d, where d is a constant integer > 1, then it is the same as modified linear probing.

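The double hashing examples can be reproduced with the Python sketch below. The helper names are hypothetical. `h2_from_slides` is the secondary hash used in the examples; `h2_nonzero` is an assumption added for this sketch (one commonly used form that never returns 0), not something taken from the slides.

```python
def double_hash_probes(k, m, h2):
    """First m slots visited: (hash(k) + i * hash2(k)) mod m, with hash(k) = k mod m."""
    step = h2(k)
    return [(k % m + i * step) % m for i in range(m)]


def h2_from_slides(k):
    return k % 5            # can be 0, which makes the probe sequence get stuck


def h2_nonzero(k):
    return 5 - (k % 5)      # assumed alternative: always in the range 1..5


print(double_hash_probes(4, 7, h2_from_slides))    # [4, 1, 5, 2, 6, 3, 0]
print(double_hash_probes(35, 7, h2_from_slides))   # [0, 0, 0, 0, 0, 0, 0]
print(double_hash_probes(35, 7, h2_nonzero))       # [0, 5, 3, 1, 6, 4, 2]
```

Because 7 is prime, any step between 1 and 6 is co-prime to the table size, so a non-zero secondary hash visits every slot.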
Good Collision Resolution Method

• Small cluster size
• Always finds an empty slot if one exists
• Gives different probe sequences when 2 keys collide
• Fast

[CS1020 Lecture 10 AY2013/4 S1]
Summary
• How to hash? Criteria for good hash functions?
• How to resolve collisions?
  Collision resolution techniques:
  - separate chaining
  - linear probing
  - quadratic probing
  - double hashing
• Problem on deletions
• Primary clustering and secondary clustering.
