Week 10: Hash Table

Readings
- Required: [Weiss] ch20
- Exercise: 20.5
nus.soc.cs1102b.week10

A hash table is a data structure that supports the most common dynamic set
operations in constant time on average. It has many applications.

Recap

          Unsorted      Sorted      BST         Hash Table
          Array/List    Array
Insert    O(1)          O(N)        O(log N)    O(1) avg
Delete    O(N)          O(N)        O(log N)    O(1) avg
Find      O(N)          O(log N)    O(log N)    O(1) avg
findMin   O(N)          O(1)        O(log N)    O(N)
findMax   O(N)          O(1)        O(log N)    O(N)

Direct Addressing Table

9 October 2002

A direct addressing table is a simplified version of a hash table.

Consider the problem of maintaining information about SBS (and TIBS) bus
services. We want to support three operations: find, insert, and delete.

SBS Bus Problem
- find(N): Does bus service no. N exist?
- insert(N): Introduce a new bus service no. N
- delete(N): Remove bus service no. N

Since bus numbers are integers between 0 and 999, we can create an array of
1000 booleans, initialized to false. If bus service N exists, we set position
N to true. Find, delete, and insert can all be done in O(1) time.

SBS Bus Problem

0    false
1    false
2    true
:
989  true

We can extend this idea if we want to maintain additional data about a bus:
use an array of 1000 slots, each of which can hold a reference to an Object.

Direct Addressing Table

0
1
2    (2, data)
:
989  (989, data)

Direct Addressing Table

insert (key, data)
    a[key] = data
delete (key)
    a[key] = null
search (key)
    return a[key]
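The pseudocode above could be written in Java roughly as follows. This is a minimal sketch, not the course's reference code: the class name DirectAddressTable and the sample data string are made up for illustration, and keys are assumed to lie between 0 and range-1.

```java
class DirectAddressTable {
    // One slot per possible key; null means "no such key".
    private final Object[] a;

    DirectAddressTable(int range) {
        a = new Object[range];
    }

    void insert(int key, Object data) { a[key] = data; }

    void delete(int key) { a[key] = null; }

    Object search(int key) { return a[key]; }

    public static void main(String[] args) {
        DirectAddressTable buses = new DirectAddressTable(1000);
        buses.insert(2, "data for bus 2");          // hypothetical data
        System.out.println(buses.search(2));        // data for bus 2
        buses.delete(2);
        System.out.println(buses.search(2));        // null
    }
}
```

Each operation is a single array access, which is where the O(1) bound comes from.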

This works only if the keys are integers (we cannot keep track of bus no. NR10
or 162M), and the range of the keys must be small (if the keys are phone
numbers, we need an array of size 10 million).

Restrictions
- Keys must be integers
- Range of keys must be small

A hash table is a generalization of the direct addressing table that removes
these restrictions.

Hash Table

The idea is to map any key to a small integer. We call this hashing. The
function that maps keys to integers is called a hash function.

Idea
- Map non-integer keys to integers
- Map large integers to smaller integers

HASHING

h is a hash function. This example shows

how we map phone numbers to slot
numbers between 0 and 999.

Hash Table

[Figure: h maps the phone numbers 66752378 and 68744483 to slots in the table (974 is one of the slot numbers shown); each slot stores (key, data).]

Here is the pseudocode: notice that we

have replaced key with h(key).
(This does not work! See next slide)

Hash Table

insert (key, data)
    a[ h(key) ] = data
delete (key)
    a[ h(key) ] = null
search (key)
    return a[ h(key) ]

But a hash function does not guarantee that two different keys go into
different slots! When two keys map to the same slot, we have a collision.

Hash Table

[Figure: a new key, 67774385, is hashed by h into a table that already holds (66752378, data) and (68744483, data).]

Problem
- Two keys can have the same hash value

COLLISION

To implement a hash table, we need to answer two questions: how to define a
hash function, and how to resolve collisions. Both are important issues that
affect the efficiency of a hash table.

Overview of This Lecture
- How to hash?
- How to resolve collisions?

Hash Functions

Good Hash Functions
- appears random
- fast
- depends on all information in the key
- keys that are close have hash values that are far apart

It is possible to have a perfect hash function, where collisions are
guaranteed not to occur.

Perfect Hashing Function
- One-to-one mapping between keys and hash values.
- May be possible if all keys are known.

A uniform hashing function puts a key into each slot with equal probability.

Uniform Hashing Function
- Distributes keys evenly
- Example: if the keys k are integers uniformly distributed between 0 and X-1,

  $k \in [0, X)$

  $\mathrm{hash}(k) = \left\lfloor \frac{k\,m}{X} \right\rfloor$

There are many ways to hash an integer.

Hashing Integers

The most popular one is the division method, where we use the mod operator
(% in Java) to map an integer to a value between 0 and m-1 (inclusive).

Division Method
- Mapped into table of m slots

  hash(k) = k % m

mod operator
- n mod m = remainder of n divided by m
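In Java the division method is nearly a one-liner, with one caveat: `%` gives a negative result when the left operand is negative, so the sketch below uses Math.floorMod to stay within 0 to m-1. The class name DivisionHash and the table size 1000 are assumptions chosen for the example; the phone number is the one used earlier in these notes.

```java
class DivisionHash {
    // Division method: map any int key to a slot in 0..m-1.
    static int hash(int k, int m) {
        return Math.floorMod(k, m);
    }

    public static void main(String[] args) {
        System.out.println(hash(68744483, 1000)); // 483 (the last three digits)
        System.out.println(-7 % 5);               // -2: plain % can go negative
        System.out.println(hash(-7, 5));          // 3: floorMod stays in range
    }
}
```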

The choice of m (the hash table size) is important. If m is a power of two,
say 2^n, then the key modulo m is just the last n bits of the key.
If m is 10^n, then the hash value is the last n decimal digits of the key.
In both cases the hash value depends on only part of the key, so we usually
pick m to be a prime number not too close to a power of two.

How to pick m?
- m = 16
- m = 10
- m = 13

Rule
- Pick m to be a prime number not too close to a power of two.
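A small experiment makes the rule concrete. The sketch below hashes the same keys with m = 16 and m = 13; the keys are hypothetical and were chosen so that their last four bits are all zero, which is the bad case for a power-of-two table size.

```java
import java.util.Arrays;

class TableSizeDemo {
    // Division method applied to every key (keys assumed non-negative).
    static int[] slots(int[] keys, int m) {
        int[] out = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            out[i] = keys[i] % m;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] keys = {16, 32, 48, 64};  // hypothetical keys, all multiples of 16
        System.out.println(Arrays.toString(slots(keys, 16))); // [0, 0, 0, 0]
        System.out.println(Arrays.toString(slots(keys, 13))); // [3, 6, 9, 12]
    }
}
```

With m = 16 every key lands in slot 0, while the prime m = 13 spreads them out.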

Another method is the multiplication method. The value (sqrt(5) - 1)/2 ≈ 0.618,
the reciprocal of the golden ratio, seems to be a good choice for A.

Multiplication Method
1. Multiply by a number 0 < A < 1
2. Extract the fractional part
3. Multiply by m

$\mathrm{hash}(k) = \lfloor m\,(kA - \lfloor kA \rfloor) \rfloor$
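The three steps translate directly into floating-point Java. This is only a sketch under a few assumptions: the class name MultiplicationHash is invented, keys are taken to be non-negative ints, and double precision is treated as good enough for the fractional part.

```java
class MultiplicationHash {
    // A = (sqrt(5) - 1) / 2, roughly 0.618.
    static final double A = (Math.sqrt(5) - 1) / 2;

    static int hash(int k, int m) {
        double product = k * A;                       // step 1: multiply by A
        double frac = product - Math.floor(product);  // step 2: fractional part, in [0, 1)
        return (int) (m * frac);                      // step 3: scale to 0..m-1
    }

    public static void main(String[] args) {
        int m = 1000;  // assumed table size
        for (int k = 1; k <= 5; k++) {
            System.out.println(k + " -> " + hash(k, m));
        }
    }
}
```

Unlike the division method, the value of m is not critical here, since the spreading comes from the multiplication by A.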

Hashing Strings

To hash a string, we can just sum up the ASCII values of its characters.

Hashing of Strings

hash(s, m)
    sum = 0
    foreach character c in s
        sum += c
    return sum % m

hash("Tan Ah Teck", 11)
  = ('T' + 'a' + 'n' + ' ' +
     'A' + 'h' + ' ' +
     'T' + 'e' + 'c' + 'k') % 11
  = (84 + 97 + 110 + 32 +
     65 + 104 + 32 +
     84 + 101 + 99 + 107) % 11
  = 915 % 11
  = 2
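The worked example is easy to check by running the summing hash in Java. In this sketch the class name SumHash and the helper charSum are made up for illustration; a Java char converts to its character code when added to an int, which matches the ASCII value for these characters.

```java
class SumHash {
    // Sum of the character codes of s.
    static int charSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    static int hash(String s, int m) {
        return charSum(s) % m;
    }

    public static void main(String[] args) {
        System.out.println(charSum("Tan Ah Teck"));   // 915
        System.out.println(hash("Tan Ah Teck", 11));  // 2
    }
}
```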

This only depends on the characters that

are present in a string, not their positions.

Hashing of Strings
- Lee Chin Tan
- Chen Le Tian
- Chan Tin Lee

Does not depend on position of characters!

A better way is to shift (multiply) the sum by a constant before adding each
character, so that the position of each character affects the calculated hash
value. (Note: Java's String.hashCode() uses 31 instead of 37.)

Hashing of Strings

hash(s, m)
    sum = 0
    foreach character c in s
        sum = sum*37 + c
    return sum % m
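A Java version of the position-sensitive hash might look like the sketch below. Two details are assumptions of this sketch rather than part of the slide: the int sum is allowed to overflow and wrap around, and Math.floorMod is used at the end because a wrapped sum can be negative. The class name PolyHash and the strings "ab" and "ba" are illustrative.

```java
class PolyHash {
    static int hash(String s, int m) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            // Multiply first, then add: earlier characters get larger weights.
            sum = sum * 37 + s.charAt(i);   // may overflow and wrap; harmless for hashing
        }
        return Math.floorMod(sum, m);       // keep the result in 0..m-1 even if sum < 0
    }

    public static void main(String[] args) {
        // Same characters, different order: 97*37 + 98 = 3687 and 98*37 + 97 = 3723.
        System.out.println(hash("ab", 1000)); // 687
        System.out.println(hash("ba", 1000)); // 723
    }
}
```

A plain character sum would give "ab" and "ba" the same value; here they differ because position now matters.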

Collision Resolution

Probability of Collision
- von Mises Paradox: "How many people must be in a room before the probability
  that some share a birthday, ignoring the year and leap days, becomes at least
  50 percent?"

Probability of Collision

Q(n) = probability of unique birthdays for n people

$Q(n) = \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \cdots \times \frac{365 - n + 1}{365}$

P(n) = probability of a collision for n people

$P(n) = 1 - Q(n)$

$P(23) = 0.507$
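The product for Q(n) is simple to evaluate in a loop, which is a quick way to confirm the figure for 23 people. The sketch below assumes every birthday (slot) is equally likely; the class and method names are invented for this example.

```java
class BirthdayCollision {
    // P(n) = 1 - Q(n), where Q(n) is the probability that n keys
    // all land in different slots out of 'slots' equally likely ones.
    static double collisionProbability(int n, int slots) {
        double unique = 1.0;
        for (int i = 0; i < n; i++) {
            unique *= (double) (slots - i) / slots;
        }
        return 1.0 - unique;
    }

    public static void main(String[] args) {
        System.out.printf("%.3f%n", collisionProbability(23, 365)); // 0.507
    }
}
```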

If we insert 23 or more keys into a table with 365 slots, more than half of
the time we get a collision.

Probability of Collision

Collision is very likely!

Collision Resolutions
- Separate Chaining
- Linear Probing
- Quadratic Probing
- Double Hashing

Separate Chaining

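Separate chaining is the most straightforward method: each slot holds a linked
list that stores the keys that collide at that slot.

Idea

[Figure: an array of slots 0 to m-1; each slot points to a linked list of (key, data) pairs, with entries k1 to k4 chained from the slots.]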

Insertion can be done in O(1) time, but deletion and search take time
proportional to the length of the list at that slot.

Hash Table

insert (key, data)
    insert data into the list a[ h(key) ]
delete (key)
    delete data from the list a[ h(key) ]
search (key)
    find key from the list a[ h(key) ]

Analysis
- n: number of keys
- m: number of slots
- L: load factor
- L = n/m
- Average length of list = L

However, we can bound the average length of a chain by a constant.

Average Running Time
- Search O(1 + L)
- Insert O(1)
- Delete O(1 + L)

If L is bounded by some constant, then all three operations are O(1)

Whenever the load factor exceeds the bound, we need to rehash all keys into a
bigger table (increasing m to reduce L).

Rehashing
- To keep L bounded, we may need to reconstruct the whole table
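Putting separate chaining and rehashing together, a compact Java sketch could look like this. Several choices here are assumptions made for the example, not something the notes prescribe: integer keys, an initial size of 7, a load factor bound of 1.0, and growth to 2m + 1 (which is not guaranteed to be prime). Insert also checks for an existing key first, so it costs O(1 + L) rather than the O(1) of a blind insert.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashTable {
    private static final double MAX_LOAD = 1.0;   // assumed bound on L = n/m

    private static final class Entry {
        final int key;
        Object data;
        Entry(int key, Object data) { this.key = key; this.data = data; }
    }

    private List<LinkedList<Entry>> a = newTable(7);  // assumed initial m
    private int n = 0;

    private static List<LinkedList<Entry>> newTable(int m) {
        List<LinkedList<Entry>> t = new ArrayList<>(m);
        for (int i = 0; i < m; i++) t.add(new LinkedList<>());
        return t;
    }

    private int h(int key) { return Math.floorMod(key, a.size()); }

    Object search(int key) {
        for (Entry e : a.get(h(key))) {
            if (e.key == key) return e.data;
        }
        return null;
    }

    void insert(int key, Object data) {
        LinkedList<Entry> chain = a.get(h(key));
        for (Entry e : chain) {
            if (e.key == key) { e.data = data; return; }  // overwrite existing key
        }
        chain.addFirst(new Entry(key, data));
        n++;
        if ((double) n / a.size() > MAX_LOAD) rehash(2 * a.size() + 1);
    }

    boolean delete(int key) {
        boolean removed = a.get(h(key)).removeIf(e -> e.key == key);
        if (removed) n--;
        return removed;
    }

    // Rebuild the whole table with newM slots so that L drops again.
    private void rehash(int newM) {
        List<LinkedList<Entry>> old = a;
        a = newTable(newM);
        for (LinkedList<Entry> chain : old) {
            for (Entry e : chain) a.get(h(e.key)).addFirst(e);
        }
    }

    int slots() { return a.size(); }
    int size() { return n; }
}
```

Rehashing touches every key, so it costs O(n), but it happens rarely enough that the average cost per insert stays constant.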

Linear Probing

In linear probing, when we get a collision,

we scan through the table looking for an
empty slot (wrapping around when we
reach the last slot)

Linear Probing
hash(k) = k mod 7

[Figure: a table with slots 0 to 6.]

21 collides with 14. Look for the next

empty slot.

Insert 21
hash(k) = k mod 7

[Figure: 21 hashes to slot 0, which already holds 14, and is placed in the next empty slot.]

1 collides with 21. Look for an empty slot.

Insert 1
hash(k) = k mod 7

[Figure: 1 hashes to slot 1, which holds 21, and is placed in the next empty slot.]

Insert 35
hash(k) = k mod 7

[Figure: 35 hashes to slot 0 and is placed in the first empty slot found by probing.]

Finding a value is similar to inserting: we probe the array starting from the
original hash position (in this case hash(35) = 0).

Find 35
hash(k) = k mod 7

[Figure: probing from slot 0 until 35 is reached.]

FOUND 35

When probing, if we reach an empty slot,

we know that the value does not exist in
the hash table.

Find 8
hash(k) = k mod 7

[Figure: probing from slot 1 until an empty slot is reached.]

8 NOT FOUND

To delete, we first find the value, and

remove it from the table.

Delete 21
hash(k) = k mod 7

[Figure: 21 is found and removed from its slot.]

We cannot simply remove a value,

because it can affect find( ) !

Find 35
hash(k) = k mod 7

[Figure: probing from slot 0 stops at the slot emptied by deleting 21.]

35 NOT FOUND!

Problem

Cannot Delete!


When a value is removed from a linear-probed hash table, we just mark its slot
as deleted instead of emptying the slot.

How to delete?
- Lazy Deletion
- Three different states
    - occupied
    - occupied but marked as deleted
    - empty

Delete 21
hash(k) = k mod 7

[Figure: the slot holding 21 is marked X (deleted) instead of being emptied.]

Find 35
hash(k) = k mod 7

[Figure: probing continues past the slot holding 21 marked X.]

FOUND 35

When we insert, we can put a value into

either an empty slot, or a slot that has
been marked as deleted.

Insert 15
hash(k) = k mod 7

[Figure: 15 hashes to slot 1, which holds 21 marked X (deleted), so 15 is placed in that slot.]
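The three slot states and the probing rules above fit into a short Java class. This is a sketch only: the class name LinearProbingTable, the use of int keys without data, and the choice to return slot numbers are all assumptions made so that the walkthrough can be replayed. The starting contents in main (14, 21, 1, 35) are inferred from the notes; the original figures may have held other keys as well.

```java
import java.util.Arrays;

class LinearProbingTable {
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final int m;
    private final int[] keys;
    private final State[] states;

    LinearProbingTable(int m) {
        this.m = m;
        keys = new int[m];
        states = new State[m];
        Arrays.fill(states, State.EMPTY);
    }

    private int hash(int k) { return Math.floorMod(k, m); }

    // Returns the slot holding k, or -1 if k is not in the table.
    int find(int k) {
        for (int i = 0; i < m; i++) {
            int slot = (hash(k) + i) % m;                  // wrap around at the end
            if (states[slot] == State.EMPTY) return -1;    // truly empty: stop
            if (states[slot] == State.OCCUPIED && keys[slot] == k) return slot;
            // DELETED or a different key: keep probing
        }
        return -1;
    }

    // Returns the slot used for k, or -1 if the table is full.
    int insert(int k) {
        int existing = find(k);
        if (existing >= 0) return existing;
        for (int i = 0; i < m; i++) {
            int slot = (hash(k) + i) % m;
            if (states[slot] != State.OCCUPIED) {          // empty or marked deleted
                keys[slot] = k;
                states[slot] = State.OCCUPIED;
                return slot;
            }
        }
        return -1;
    }

    // Lazy deletion: mark the slot instead of emptying it.
    boolean delete(int k) {
        int slot = find(k);
        if (slot < 0) return false;
        states[slot] = State.DELETED;
        return true;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(7);
        for (int k : new int[]{14, 21, 1, 35}) t.insert(k); // slots 0, 1, 2, 3
        t.delete(21);
        System.out.println(t.find(35));    // 3: probing passes the deleted slot
        System.out.println(t.insert(15));  // 1: reuses the slot marked deleted
    }
}
```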

The problem with linear probing is that it

can create many consecutive occupied
slots, increasing the running time of
find/insert/delete. This is called primary
clustering.

Problem

Primary Clustering


An improvement to linear probing is

quadratic probing.

Quadratic Probing

The probe sequence for linear probing is as follows.

Linear Probing

hash(key)
( hash(key) + 1 ) % m
( hash(key) + 2 ) % m
( hash(key) + 3 ) % m
:

For quadratic probing, we use this probe

sequence.

Quadratic Probing
hash(key)
( hash(key) + 1 ) % m
( hash(key) + 4 ) % m
( hash(key) + 9 ) % m
:


Insert 3
hash(k) = k mod 7

[Figure: a table with slots 0 to 6; 3 hashes to slot 3.]

Notice that the offsets +1, +4, +9, ... are measured from the original hash
position. If we were to measure from the previous probe position instead, the
offsets would be +1, +3, +5, ..., +(2i - 1).
(Q: Show mathematically that they are the same.)

Insert 38
hash(k) = k mod 7

[Figure: 38 hashes to slot 3 and probes quadratically from there.]
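Before attempting the proof, it can help to see the two ways of stepping produce the same slots. The sketch below generates both sequences for a key that hashes to slot 3 in a table of size 7, as 38 does; the class and method names are invented, and a numeric check like this does not replace the mathematical argument asked for above.

```java
import java.util.ArrayList;
import java.util.List;

class QuadraticProbe {
    // i-th probe measured from the home slot: home + i*i.
    static List<Integer> fromHome(int home, int m, int count) {
        List<Integer> seq = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            seq.add((home + i * i) % m);
        }
        return seq;
    }

    // i-th probe measured from the previous probe: add 2i - 1 each time.
    static List<Integer> fromPrevious(int home, int m, int count) {
        List<Integer> seq = new ArrayList<>();
        int slot = home % m;
        for (int i = 0; i < count; i++) {
            if (i > 0) slot = (slot + 2 * i - 1) % m;
            seq.add(slot);
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(fromHome(3, 7, 4));      // [3, 4, 0, 5]
        System.out.println(fromPrevious(3, 7, 4));  // [3, 4, 0, 5]
    }
}
```

The incremental form needs only an addition per probe, with no multiplication.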

How can we be sure that quadratic probing always terminates? Insert 12 into
the previous example, followed by 10. See what happens.

Theorem
- If L < 0.5 and m is prime, then quadratic probing can always find an empty
  slot.

Using quadratic probing requires a more careful design of the hash table. It
also suffers from a (less serious) problem: if two keys have the same initial
position, they have the same probe sequence.

Problems
- If two keys have the same initial position, their probe sequences are the
  same.
- Secondary clustering.

Double hashing uses a second hash

function to calculate the probe sequence,
so unless two keys have the same hash
values for both hash functions, they have
different probe sequences.

Double Hashing

hash2 (key) is the secondary hash function.

Double Hashing
hash(key)
(hash(key) + hash2(key)) % m
(hash(key) + 2*hash2(key)) % m
(hash(key) + 3*hash2(key)) % m
:


We use k%5 as the secondary hash

function in this example. Can you give
two keys that have the same probe
sequence in this example?

If we insert 21, hash2(21) = 1, so the probe sequence is the same as linear
probing.

Insert 21
hash(k) = k mod 7
hash2(k) = k mod 5

[Figure: a table with slots 0 to 6.]

If we insert 4, hash2(4) = 4, so the probe offsets are +4, +8, +12, ...
(measured from the first probe position) or +4, +4, +4, ... (measured from the
previous probe position).

Insert 4
hash(k) = k mod 7
hash2(k) = k mod 5

[Figure: a table with slots 0 to 6.]

But if we insert 35, hash2(35) = 0, so the probe offsets are 0, 0, 0, ...
What is wrong?

Insert 35
hash(k) = k mod 7
hash2(k) = k mod 5

[Figure: a table with slots 0 to 6.]

Warning
- Secondary hash function must not evaluate to 0!

Change hash2(key) to
hash2(key) = 5 - (key % 5)
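With the corrected secondary hash function, the probe sequence for 35 no longer gets stuck. The sketch below lists the probes for a key using hash(k) = k mod 7 and hash2(k) = 5 - (k mod 5) from this example; the class and method names are made up, and keys are assumed to be non-negative.

```java
import java.util.Arrays;

class DoubleHashingProbe {
    static final int M = 7;  // table size used in this example

    static int hash(int key) { return key % M; }

    // Corrected secondary hash: always in 1..5, never 0.
    static int hash2(int key) { return 5 - (key % 5); }

    // First 'count' probe positions for key.
    static int[] probes(int key, int count) {
        int[] seq = new int[count];
        for (int i = 0; i < count; i++) {
            seq[i] = (hash(key) + i * hash2(key)) % M;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probes(35, 7))); // [0, 5, 3, 1, 6, 4, 2]
    }
}
```

Because 7 is prime, any step size from 1 to 5 visits all seven slots before repeating.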

Good Collision Resolution
- Minimize clustering
- Can find an empty slot if L is small
- Give different probe sequences when the initial probe is the same
- Fast
