Lecture 8 Hashing

Uploaded by

kalebwondwossent

Lecture 8: Hashing

The Search Problem
• Find items whose keys match a given search key
– Given an array A containing n keys and a search key x, find the index i such that x = A[i]
– As in the case of sorting, a key could be part of a large record.

Applications
• Keeping track of customer account information at a bank
– Search through records to check balances and perform transactions
• Keeping track of reservations on flights
– Search to find empty seats, cancel/modify reservations
• Search engine
– Looks for all documents containing a given word

Special Case: Dictionaries

• Dictionary = data structure that mainly supports two basic operations: insert a new item and return an item with a given key
• Queries: return information about the set S:
– Search (S, k)
– Minimum (S), Maximum (S)
– Successor (S, x), Predecessor (S, x)
• Modifying operations: change the set
– Insert (S, k)
– Delete (S, k) (needed less often)

Direct Addressing
• Assumptions:
– Key values are distinct
– Each key is drawn from a universe U = {0, 1, . . . , m - 1}
• Idea:
– Store the items in an array, indexed by keys

• Direct-address table representation:
– An array T[0 . . . m - 1]
– Each slot, or position, in T corresponds to a key in U
– For an element x with key k, a pointer to x (or x itself) will be placed in location T[k]
– If there are no elements with key k in the set, T[k] is empty, represented by NIL

Operations
Alg.: DIRECT-ADDRESS-SEARCH(T, k)
return T[k]

Alg.: DIRECT-ADDRESS-INSERT(T, x)
T[key[x]] ← x

Alg.: DIRECT-ADDRESS-DELETE(T, x)
T[key[x]] ← NIL
• Running time for these operations: O(1)
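
The three operations above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the class name `DirectAddressTable` and its method names are hypothetical, and `None` is assumed to play the role of NIL.

```python
class DirectAddressTable:
    """Direct-address table over the key universe {0, 1, ..., m - 1}."""

    def __init__(self, m):
        # One slot per possible key; None plays the role of NIL.
        self.slots = [None] * m

    def search(self, k):
        # DIRECT-ADDRESS-SEARCH: return the element stored under key k, or None.
        return self.slots[k]

    def insert(self, key, value):
        # DIRECT-ADDRESS-INSERT: the key itself is the array index.
        self.slots[key] = value

    def delete(self, key):
        # DIRECT-ADDRESS-DELETE: reset the slot to NIL.
        self.slots[key] = None


table = DirectAddressTable(10)
table.insert(3, "record for key 3")
print(table.search(3))  # record for key 3
table.delete(3)
print(table.search(3))  # None
```

Each method touches exactly one array slot, which is why all three run in O(1) time. The cost is one slot for every key in U, whether or not it is used.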

Comparing Different Implementations
• Implementing dictionaries using:
– Direct addressing
– Ordered/unordered arrays
– Ordered/unordered linked lists

                     Insert    Search
direct addressing    O(1)      O(1)
ordered array        O(N)      O(lg N)
ordered list         O(N)      O(N)
unordered array      O(1)      O(N)
unordered list       O(1)      O(N)

Hash Tables
• When the set K of keys actually stored is much smaller than the universe U of possible keys, a hash table requires much less space than a direct-address table
– Can reduce storage requirements to Θ(|K|)
– Can still get O(1) search time, but in the average case, not the worst case

Hash Tables
Idea:
– Use a function h to compute the slot for each key
– Store the element in slot h(k)

• A hash function h transforms a key into an index in a hash table T[0 . . . m - 1]:
h : U → {0, 1, . . . , m - 1}
• We say that k hashes to slot h(k)
• Advantages:
– Reduces the range of array indices handled: m instead of |U|
– Storage is also reduced

Example: HASH TABLES
[Figure: the universe of keys U contains the set K of actual keys k1, ..., k5; h maps each actual key to a slot of the table T[0 . . . m - 1], and h(k2) = h(k5)]

Do you see any problems with this approach?
[Figure: the same diagram as on the previous slide, with the shared slot h(k2) = h(k5) labeled "Collisions!"]

Collisions
• A collision occurs when two or more keys hash to the same slot
• For a given set K of keys
– If |K| ≤ m, collisions may or may not happen, depending on the hash function
– If |K| > m, collisions will definitely happen (i.e., there must be at least two keys that have the same hash value)
• Avoiding collisions completely is hard, even with a good hash function

Handling Collisions
• We will review the following methods:
– Chaining
– Open addressing
   • Linear probing
   • Quadratic probing
   • Double hashing
• We will discuss chaining first, along with ways to build “good” hash functions.

Handling Collisions Using Chaining
• Idea:
– Put all elements that hash to the same slot into a linked list
– Slot j contains a pointer to the head of the list of all elements that hash to j

Collision with Chaining - Discussion
• Choosing the size of the table
– Small enough not to waste space
– Large enough such that lists remain short
– Typically 1/5 or 1/10 of the total number of elements
• How should we keep the lists: ordered or not?
– Not ordered!
• Insert is fast
• Can easily remove the most recently inserted elements

Insertion in Hash Tables
Alg.: CHAINED-HASH-INSERT(T, x)
insert x at the head of list T[h(key[x])]

• Worst-case running time is O(1)
• Assumes that the element being inserted isn’t already in the list
• It would take an additional search to check whether it was already inserted

Deletion in Hash Tables
Alg.: CHAINED-HASH-DELETE(T, x)
delete x from the list T[h(key[x])]

• Need to find the element to be deleted.
• Worst-case running time:
– Deletion depends on searching the corresponding list, so it is proportional to the length of that list

Searching in Hash Tables

Alg.: CHAINED-HASH-SEARCH(T, k)
search for an element with key k in list T[h(k)]

• Running time is proportional to the length of the list of elements in slot h(k)
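
The chained insert, delete, and search operations can be sketched together in Python. This is a hedged illustration rather than the lecture's own code: the class `ChainedHashTable` is hypothetical, a `collections.deque` stands in for each linked list, and the hash function `k mod m` on integer keys is an assumption made only to keep the example short.

```python
from collections import deque


class ChainedHashTable:
    """Hash table with collision resolution by chaining (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        # One chain per slot; a deque stands in for a linked list.
        self.slots = [deque() for _ in range(m)]

    def _h(self, key):
        # Assumed hash function: division method on integer keys.
        return key % self.m

    def insert(self, key, value):
        # CHAINED-HASH-INSERT: O(1) insertion at the head of the chain.
        # Assumes the key is not already stored.
        self.slots[self._h(key)].appendleft((key, value))

    def search(self, key):
        # CHAINED-HASH-SEARCH: scan the chain in slot h(key).
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # CHAINED-HASH-DELETE: find the element first, then unlink it.
        chain = self.slots[self._h(key)]
        for entry in chain:
            if entry[0] == key:
                chain.remove(entry)
                return True
        return False


table = ChainedHashTable(7)
table.insert(10, "ten")        # 10 mod 7 = 3
table.insert(17, "seventeen")  # 17 mod 7 = 3, collides with 10
print(table.search(17))        # seventeen
print(len(table.slots[3]))     # 2
```

Both keys end up in the chain of slot 3, so a search for either one walks that chain only, never the whole table.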

Analysis of Hashing with Chaining: Worst Case
• How long does it take to search T for an element with a given key?
• Worst case:
– All n keys hash to the same slot
– Worst-case time to search is Θ(n), plus the time to compute the hash function
[Figure: table T[0 . . . m - 1] in which all n keys sit in a single chain]

Analysis of Hashing with Chaining: Average Case
• Average case
– depends on how well the hash function distributes the n keys among the m slots
• Simple uniform hashing assumption:
– Any given element is equally likely to hash into any of the m slots (i.e., the probability of collision, Pr(h(x) = h(y)), is 1/m)
• Length of a list:
length of T[j] = $n_j$, j = 0, 1, . . . , m – 1
• Number of keys in the table:
n = $n_0 + n_1 + \cdots + n_{m-1}$
• Average value of $n_j$:
E[$n_j$] = α = n/m
[Figure: table T with the chain lengths $n_0 = 0$, $n_2$, $n_3$, $n_j$, $n_k$, $n_{m-1} = 0$ marked beside the slots]

Load Factor of a Hash Table
• Load factor of a hash table T:
α = n/m
– n = # of elements stored in the table
– m = # of slots in the table = # of linked lists
• α encodes the average number of elements stored in a chain
• α can be <, =, or > 1
[Figure: table T[0 . . . m - 1] with a chain hanging off each occupied slot]

Case 1: Unsuccessful Search
(i.e., item not stored in the table)
Theorem
An unsuccessful search in a hash table takes expected time Θ(1 + α) under the assumption of simple uniform hashing (i.e., the probability of collision, Pr(h(x) = h(y)), is 1/m)
Proof
• Searching unsuccessfully for any key k
– need to search to the end of the list T[h(k)]
• Expected length of the list:
– E[$n_{h(k)}$] = α = n/m
• Expected number of elements examined in an unsuccessful search is α
• Total time required is:
– O(1) (for computing the hash function) + α ⇒ Θ(1 + α)

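The link between the load factor and the cost of an unsuccessful search can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper `build_chains`, the values n = 1000 and m = 128, and the use of `k mod m` as the hash function are all choices made here, not taken from the slides.

```python
import random


def build_chains(keys, m):
    """Distribute integer keys over m chains using k mod m (assumed hash)."""
    chains = [[] for _ in range(m)]
    for k in keys:
        chains[k % m].append(k)
    return chains


rng = random.Random(0)                    # fixed seed so the run is repeatable
n, m = 1000, 128
keys = rng.sample(range(10**6), n)        # n distinct keys
chains = build_chains(keys, m)

alpha = n / m                             # load factor
# An unsuccessful search that lands in slot j examines len(chains[j]) elements.
# If every slot is equally likely, the expected number examined is the mean chain length.
avg_examined = sum(len(c) for c in chains) / m

print(alpha, avg_examined)
```

The two printed values agree because the chain lengths always sum to n, so their mean over m slots is n/m regardless of how the keys are spread. The uniformity assumption matters for which slot a search lands in, not for this average.
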
Case 2: Successful Search

Analysis of Search in Hash Tables
• If m (# of slots) is proportional to n (# of elements in the table):
• n = O(m)
• α = n/m = O(m)/m = O(1)
⇒ Searching takes constant time on average

Hash Functions
• A hash function transforms a key into a table address
• What makes a good hash function?
(1) Easy to compute
(2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)
• In practice, it is very hard to satisfy the simple uniform hashing property
– i.e., we usually don’t know in advance the probability distribution that keys are drawn from

Good Approaches for Hash Functions

• Minimize the chance that closely related keys hash to the same slot
– Strings such as pt and pts should hash to different slots
• Derive a hash value that is independent of any patterns that may exist in the distribution of the keys

The Division Method
• Idea:
– Map a key k into one of the m slots by taking the remainder of k divided by m
h(k) = k mod m
• Advantage:
– fast, requires only one division operation
• Disadvantage:
– Certain values of m are bad, e.g.,
   • powers of 2
   • non-prime numbers

Example - The Division Method
• If m = $2^p$, then h(k) is just the least significant p bits of k
– p = 1 ⇒ m = 2 ⇒ h(k) ∈ {0, 1}, the least significant 1 bit of k
– p = 2 ⇒ m = 4 ⇒ h(k) ∈ {0, 1, 2, 3}, the least significant 2 bits of k
⇒ Choose m to be a prime, not close to a power of 2
[Table: sample keys k, with column 2: k mod 97 and column 3: k mod 100]

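The effect of choosing m as a power of 2 can be seen in a few lines of Python. This is a small sketch, with the function name `h_division` and the three sample keys chosen here purely for illustration.

```python
def h_division(k, m):
    """Division-method hash: h(k) = k mod m."""
    return k % m


# Three different keys that share the same 4 low-order bits (0100).
keys = [0b10110100, 0b01100100, 0b11110100]   # 180, 100, 244

# m = 16 = 2^4: only the 4 least significant bits matter, so all keys collide.
print([h_division(k, 16) for k in keys])  # [4, 4, 4]

# m = 97 (a prime, not close to a power of 2): the keys are spread out.
print([h_division(k, 97) for k in keys])  # [83, 3, 50]
```

With m = 16 the high-order bits of each key are ignored entirely, which is why keys that differ only in those bits always collide.
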
The Multiplication Method
Idea:
• Multiply key k by a constant A, where 0 < A < 1
• Extract the fractional part of kA
• Multiply the fractional part by m
• Take the floor of the result
$h(k) = \lfloor m \, (kA \bmod 1) \rfloor$
fractional part of kA: $kA \bmod 1 = kA - \lfloor kA \rfloor$
• Disadvantage: Slower than division method
• Advantage: Value of m is not critical, e.g., typically $2^p$

Example – Multiplication Method
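
A worked instance of the multiplication method can be sketched in Python. The values below are assumptions chosen for illustration, not recovered from the slide: A = (√5 − 1)/2 ≈ 0.618 is a commonly suggested constant, and the key 123456 with m = 2^14 is a standard textbook-style example. Floating-point arithmetic is used for brevity.

```python
import math

# Assumed constant: (sqrt(5) - 1) / 2, a commonly suggested choice for A.
A = (math.sqrt(5) - 1) / 2


def h_multiplication(k, m, a=A):
    """Multiplication-method hash: floor(m * (k*a mod 1))."""
    frac = (k * a) % 1.0          # fractional part of k*A
    return math.floor(m * frac)   # scale to the range 0 .. m-1 and take the floor


print(h_multiplication(123456, 2**14))  # 67
```

Here kA ≈ 76300.0041, so the fractional part is about 0.0041, and multiplying by m = 16384 gives a value between 67 and 68, whose floor is 67.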

Universal Hashing
• In practice, keys are not randomly distributed
• Any fixed hash function might yield Θ(n) search time for some set of keys
• Goal: hash functions that produce random table indices irrespective of the keys
• Idea:
– Select a hash function at random, from a designed class of functions, at the beginning of the execution

Universal Hashing
[Figure: a hash function is selected at random from the class of functions at the beginning of the execution]

Definition of Universal Hash Functions

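H = {h(k) : U → {0, 1, . . . , m - 1}}
H is universal if, for each pair of distinct keys k and l, the number of hash functions h in H for which h(k) = h(l) is at most |H|/m
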
Universal Hashing – Main Result

With universal hashing, the chance of collision between distinct keys k and l is no more than the 1/m chance of collision if locations h(k) and h(l) were randomly and independently chosen from the set {0, 1, …, m – 1}

Designing a Universal Class
of Hash Functions
• Choose a prime number p large enough so that every possible key k is in the range [0 . . . p – 1]
$Z_p = \{0, 1, \ldots, p - 1\}$ and $Z_p^* = \{1, \ldots, p - 1\}$
• Define the following hash function
$h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$, for all $a \in Z_p^*$ and $b \in Z_p$
• The family of all such hash functions is
$H_{p,m} = \{h_{a,b} : a \in Z_p^* \text{ and } b \in Z_p\}$
• The class $H_{p,m}$ of hash functions is universal
• a, b: chosen randomly at the beginning of execution

Example: Universal Hash Functions
E.g.: p = 17, m = 6
$h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$
$h_{3,4}(8) = ((3 \cdot 8 + 4) \bmod 17) \bmod 6$
$= (28 \bmod 17) \bmod 6$
$= 11 \bmod 6$
$= 5$

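The example above, and the collision bound claimed for the family, can be checked with a short Python sketch. The function names `h_ab` and `make_universal_hash` are hypothetical, and the key pair (8, 9) used in the exhaustive check is an arbitrary choice made here.

```python
import random


def h_ab(a, b, k, p, m):
    """One member of the family: h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


def make_universal_hash(p, m, rng=random):
    """Pick a in {1..p-1} and b in {0..p-1} at random, once, up front."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda k: h_ab(a, b, k, p, m)


p, m = 17, 6
print(h_ab(3, 4, 8, p, m))  # 5

# Exhaustive check for one pair of distinct keys: count the (a, b) pairs
# under which the two keys collide and compare with |H| / m.
k, l = 8, 9
family_size = (p - 1) * p
collisions = sum(
    1
    for a in range(1, p)
    for b in range(p)
    if h_ab(a, b, k, p, m) == h_ab(a, b, l, p, m)
)
print(collisions * m <= family_size)  # True
```

Because the function is drawn at random at the start of execution, no fixed pair of keys collides under more than a 1/m fraction of the possible choices.
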
Advantages of Universal Hashing

• Universal hashing provides good results on average, independently of the keys to be stored
• Guarantees that no input will always elicit the worst-case behavior
• Poor performance occurs only when the random choice returns an inefficient hash function; this has small probability

Open Addressing
• If we have enough contiguous memory to store all the keys (m > N) ⇒ store the keys in the table itself
• No need to use linked lists anymore
• Basic idea:
– Insertion: if a slot is full, try another one, until you find an empty one
– Search: follow the same sequence of probes
– Deletion: more difficult ... (we’ll see why)
• Search time depends on the length of the probe sequence!

Generalize hash function notation:
• A hash function now takes two arguments:
(i) Key value, and (ii) Probe number
h(k,p), p = 0, 1, ..., m-1
• Probe sequences
<h(k,0), h(k,1), ..., h(k,m-1)>
– Must be a permutation of <0, 1, ..., m-1>
– There are m! possible permutations
– Good hash functions should be able to produce all m! probe sequences

Common Open Addressing Methods

• Linear probing
• Quadratic probing
• Double hashing

• Note: None of these methods can generate more than $m^2$ different probe sequences!

Linear probing: Inserting a key
• Idea: when there is a collision, check the next position in the table (i.e., probing)
h(k,i) = (h1(k) + i) mod m, i = 0, 1, 2, ...
• First slot probed: h1(k)
• Second slot probed: h1(k) + 1
• Third slot probed: h1(k) + 2, and so on
probe sequence: < h1(k), h1(k)+1, h1(k)+2, ... >, wrapping around to slot 0 after slot m - 1
• Can generate at most m probe sequences. Why? (The first slot probed determines the entire sequence.)

Linear probing: Searching for a key
• Three cases:
(1) Position in table is occupied with an element of equal key (search succeeds)
(2) Position in table is empty (search fails)
(3) Position in table is occupied with a different element
• Case 3: probe the next higher index until the element is found or an empty position is found
• The process wraps around to the beginning of the table
[Figure: table T[0 . . . m - 1] with occupied slots h(k1), h(k4), h(k2) = h(k5), and h(k3)]

Linear probing: Deleting a key
• Problems
– Cannot simply mark the slot as empty
– Doing so would make it impossible to retrieve keys inserted after that slot was occupied
• Solution
– Mark the slot with a sentinel value DELETED
• The deleted slot can later be used for insertion
• Searching will be able to find all the keys

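Insertion, search, and deletion with linear probing can be sketched together in Python. This is an illustrative sketch under assumptions: the class `LinearProbingTable` and the sentinel objects `EMPTY` and `DELETED` are names introduced here, h1(k) is assumed to be `k mod m`, and keys are assumed to be distinct integers.

```python
EMPTY = object()     # slot never used
DELETED = object()   # sentinel: slot was occupied, then deleted


class LinearProbingTable:
    """Open addressing with linear probing (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key):
        # h(k, i) = (h1(k) + i) mod m, with h1(k) = k mod m assumed.
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key):
        # Reuse the first EMPTY or DELETED slot; assumes key is not yet stored.
        for j in self._probe(key):
            if self.slots[j] is EMPTY or self.slots[j] is DELETED:
                self.slots[j] = key
                return j
        raise OverflowError("hash table is full")

    def search(self, key):
        for j in self._probe(key):
            if self.slots[j] is EMPTY:
                return None              # case 2: empty slot, key is absent
            if self.slots[j] is not DELETED and self.slots[j] == key:
                return j                 # case 1: found
            # case 3 (or a DELETED marker): keep probing
        return None

    def delete(self, key):
        j = self.search(key)
        if j is not None:
            self.slots[j] = DELETED      # do not reset to EMPTY
        return j


t = LinearProbingTable(7)
print(t.insert(10), t.insert(17), t.insert(24))  # 3 4 5
t.delete(17)
print(t.search(24))  # 5
print(t.insert(31))  # 4
```

After key 17 is deleted, the search for 24 still succeeds because it steps over the DELETED marker in slot 4 instead of stopping there. The later insertion of 31 reuses that slot.
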
Primary Clustering Problem
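• Some slots become more likely than others
• Long chunks of occupied slots are created
⇒ search time increases!!
[Figure: initially, all slots have probability 1/m of being filled next; as runs of occupied slots grow, the empty slot after each run becomes more likely (slot b: 2/m, slot d: 4/m, slot e: 5/m)]
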
Quadratic probing

i=0,1,2,...
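
A commonly used textbook form of quadratic probing, assumed here because the slide lists only the range of i, is h(k,i) = (h1(k) + c1·i + c2·i²) mod m, with constants c1 and c2 ≠ 0. The Python sketch below uses one particular choice, c1 = c2 = 1/2 with m a power of 2, which is known to visit every slot. The function name `quadratic_probe_sequence` is hypothetical.

```python
def quadratic_probe_sequence(h1_value, m):
    """Probe sequence (h1 + i*(i+1)/2) mod m, i.e. c1 = c2 = 1/2.

    Assumes m is a power of 2, in which case the sequence is a
    permutation of 0 .. m-1.
    """
    return [(h1_value + i * (i + 1) // 2) % m for i in range(m)]


print(quadratic_probe_sequence(3, 8))  # [3, 4, 6, 1, 5, 2, 0, 7]
```

The gaps between consecutive probes grow by one each step (1, 2, 3, ...), so keys that start in neighboring slots do not pile into the same run the way they do under linear probing. Keys with the same h1(k) still follow the same sequence.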

Double Hashing
(1) Use one hash function to determine the first slot
(2) Use a second hash function to determine the increment for the probe sequence
h(k,i) = (h1(k) + i h2(k)) mod m, i = 0, 1, ...
• Initial probe: h1(k)
• Each subsequent probe is offset by h2(k) (mod m) from the previous one
• Advantage: avoids clustering
• Disadvantage: harder to delete an element
• Can generate at most $m^2$ probe sequences

Double Hashing: Example
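h1(k) = k mod 13
h2(k) = 1 + (k mod 11)
h(k,i) = (h1(k) + i h2(k)) mod 13
• Insert key 14:
h(14,0) = h1(14) = 14 mod 13 = 1
h(14,1) = (h1(14) + h2(14)) mod 13 = (1 + 4) mod 13 = 5
h(14,2) = (h1(14) + 2 h2(14)) mod 13 = (1 + 8) mod 13 = 9
[Table T[0 . . . 12] before the insertion: slot 1 = 79, slot 4 = 69, slot 5 = 98, slot 7 = 72, slot 11 = 50; all other slots are empty. Slots 1 and 5 are occupied, so key 14 is stored in slot 9.]
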
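The insertion of key 14 can be replayed in Python. This sketch uses the two hash functions and the table contents from the example; the helper names `probe` and `insert` are introduced here for illustration.

```python
def h1(k):
    return k % 13


def h2(k):
    return 1 + (k % 11)


def probe(k, i, m=13):
    """Double hashing: h(k, i) = (h1(k) + i * h2(k)) mod m."""
    return (h1(k) + i * h2(k)) % m


def insert(table, k):
    """Store k in the first empty slot along its probe sequence."""
    m = len(table)
    for i in range(m):
        j = probe(k, i, m)
        if table[j] is None:
            table[j] = k
            return j
    raise OverflowError("hash table is full")


# Table contents before inserting 14, as in the example (None = empty slot).
table = [None] * 13
for slot, key in {1: 79, 4: 69, 5: 98, 7: 72, 11: 50}.items():
    table[slot] = key

print([probe(14, i) for i in range(3)])  # [1, 5, 9]
print(insert(table, 14))                 # 9
```

Since m = 13 is prime and h2(k) is always between 1 and 11, the increment is never a multiple of m, so the probe sequence for any key visits all 13 slots.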
