COMP211slides 11

The performance of separate chaining depends on the average chain length. Chain lengths are modeled by a Poisson distribution with parameter λ, where λ is the load factor. For unsuccessful searches, the average number of comparisons equals the average chain length, λ. For successful searches, the average number of comparisons is the chain length at insertion time plus one, which averages to 1 + λ/2. Performance therefore degrades gradually as the load factor increases, in contrast to open addressing, which degrades sharply.

Uploaded by

alexisthe

Hashing

Data organization in main memory or disk
sequential, binary trees, ...
The location of a key depends on other keys => unnecessary key comparisons to find a key
Question: find a key with a single comparison
Hashing: the location of a record is computed using its key only
Fast for random accesses, slow for range queries

E.G.M. Petrakis

Hashing

Hash Table
Hash Function: transforms keys to array indices
[Figure: h(key) maps a key to an index of a table with positions 0, 1, 2, ..., n; each position holds data]

Hash Function

h(key) = key mod 1000

position   key of the stored record
0          4967000
1
2          8421002
3
...
395
396        4618396
397        4957397
398
399        1286399
400
401
...
990        0000990
991        0000991
992        1200992
993        0047993
994
995        9846995
996        4618996
997        4967997
998
999        0001999
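
As a small illustration of h(key) = key mod 1000, the sketch below places a few of the keys from the table above. The names `M`, `h`, `keys` and `table` are chosen here for illustration and are not part of the slides.

```python
M = 1000  # table size used on the slide

def h(key: int) -> int:
    """Hash function from the slide: h(key) = key mod 1000."""
    return key % M

# A few keys taken from the slide's table (leading zeros dropped)
keys = [4967000, 8421002, 4618396, 4957397, 1286399, 9846995, 1999]
table = {h(k): k for k in keys}  # position -> key; these keys do not collide

for pos in sorted(table):
    print(f"{pos:3d}  {table[pos]:07d}")
```

Each key lands at the position given by its last three digits, which is why a lookup needs no comparisons with other keys.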

Good Hash Functions

1. Uniform: distribute keys evenly in space
2. Perfect: two records cannot occupy the same location, or $\forall\, k_i \neq k_j,\ i \neq j:\ h(k_i) \neq h(k_j)$
3. Order preserving: $\forall\, k_i < k_j,\ i \neq j:\ h(k_i) < h(k_j)$
Difficult to find such hash functions
Property 2 is the most essential
Most functions are no better than h(key) = key mod m
Hash collision: $\exists\, k_i \neq k_j:\ h(k_i) = h(k_j)$

Collision Resolution
1. Open Addressing (rehashing): compute a new position to store the key in the table (no extra space)
i. linear probing
ii. double hashing
2. Separate Chaining: lists of keys mapped to the same position (uses extra space)

Open Addressing
Computes a new address to store the key if its position is occupied (rehashing)
if that position is occupied too, compute a new address, until an empty position is found
primary hash function: i = h(key)
rehash function: rh(i) = rh(h(key))
hash sequence: (h0, h1, h2, ...) = (h(key), rh(h(key)), rh(rh(h(key))), ...)
To find a key, follow the same hash sequence

Example
i = h(key) = key mod 100
rh(i) = (i + 1) mod 100
key: 193
i = h(193) = 93
rh(i) = (93 + 1) mod 100 = 94
Key 193 will occupy position 94
[Figure: hash table with positions 0 to 99; keys 990, 991, 992, 993 occupy positions 90 to 93 and key 193 is stored at position 94]
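
The hash sequence of open addressing can be sketched as follows. This is a minimal illustration, assuming a Python list as the table and `None` for an empty position; the function names `insert` and `search` are hypothetical.

```python
from typing import Callable, List, Optional

def insert(table: List[Optional[int]], key: int,
           h: Callable[[int], int], rh: Callable[[int], int]) -> int:
    """Follow the hash sequence h(key), rh(h(key)), ... until a free position is found."""
    i = h(key)
    for _ in range(len(table)):      # give up after m probes
        if table[i] is None:
            table[i] = key
            return i
        i = rh(i)
    raise RuntimeError("no empty position found")

def search(table: List[Optional[int]], key: int,
           h: Callable[[int], int], rh: Callable[[int], int]) -> Optional[int]:
    """Follow the same hash sequence; stop at the key or at an empty position."""
    i = h(key)
    for _ in range(len(table)):
        if table[i] is None:
            return None
        if table[i] == key:
            return i
        i = rh(i)
    return None

m = 100
h = lambda key: key % m
rh = lambda i: (i + 1) % m

table: List[Optional[int]] = [None] * m
for k in (990, 991, 992, 993):       # positions 90..93 are already taken
    insert(table, k, h, rh)
pos_193 = insert(table, 193, h, rh)
print(pos_193)                       # 94
```

Key 193 hashes to 93, finds it occupied by 993, and is rehashed to 94, as in the example.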

Problem 1: Locate Empty Positions
No empty position can be found
i. the table is full
check on number of empty positions
ii. the hash function fails to find an empty position although the table is not full!
i = h(key) = key mod 1000
rh(i) = (i + 200) mod 1000 => checks only 5 positions on a table of 1000 positions
rh(i) = (i + 1) mod 1000 => checks successive positions
rh(i) = (i + c) mod 1000, where GCD(c, m) = 1 => checks all positions
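
The effect of the step c in rh(i) = (i + c) mod m can be checked directly. The sketch below counts the distinct positions reached from a starting position; the count equals m / GCD(c, m). The function name `positions_visited` is an illustrative choice.

```python
from math import gcd

def positions_visited(start: int, c: int, m: int) -> int:
    """Count distinct positions reached by rh(i) = (i + c) mod m from `start`."""
    seen = set()
    i = start
    while i not in seen:
        seen.add(i)
        i = (i + c) % m
    return len(seen)

m = 1000
for c in (200, 1, 7):
    # second column: positions actually visited, third column: m / GCD(c, m)
    print(c, positions_visited(0, c, m), m // gcd(c, m))
```

With c = 200 only 5 positions are ever examined, while any c with GCD(c, m) = 1 reaches the whole table.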

Problem 2: Primary Clustering
Different keys that hash into different addresses compete with each other in successive rehashes
i = h(key) = key mod 100
rh(i) = (i + 1) mod 100
keys: 1990, 1991, 1992, 1993, 1994 => 94
[Figure: hash table with positions 0 to 99; positions 90 to 93 hold keys 990 to 993, so all five keys compete for position 94]
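
A short sketch of the competition described above, assuming positions 90 to 93 are already occupied by keys 990 to 993. The dictionary `occupied` and the function `first_free` are illustrative names.

```python
m = 100
occupied = {90: 990, 91: 991, 92: 992, 93: 993}   # assumed state of the table

def first_free(key: int) -> int:
    """Position that rh(i) = (i + 1) mod m would select for `key` in the current table."""
    i = key % m
    while i in occupied:
        i = (i + 1) % m
    return i

targets = {k: first_free(k) for k in (1990, 1991, 1992, 1993, 1994)}
print(targets)
```

All five keys have different hash values, yet each of them would be stored at position 94 if inserted next.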

Problem 3: Secondary Clustering
Different keys which hash to the same hash value have the same rehash sequence
i = h(key) = key mod 10
rh(i, j) = (i + j) mod 10
i. key 23: h(23) = 3, rh = 4, 6, 9, 3, ...
ii. key 13: h(13) = 3, rh = 4, 6, 9, 3, ...
[Figure: table with positions 0 to 9 holding keys 53, 14, 15, 46]
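
The rehash sequences in this example can be generated as below. The sketch assumes j counts the rehash attempts (1, 2, 3, ...), which reproduces the sequence 4, 6, 9, 3 given on the slide; `rehash_sequence` is a name introduced here.

```python
from itertools import islice

def rehash_sequence(key: int, m: int = 10):
    """Yield rh(i, j) = (i + j) mod m for j = 1, 2, ..., starting from i = h(key)."""
    i = key % m
    j = 1
    while True:
        i = (i + j) % m
        yield i
        j += 1

seq_23 = list(islice(rehash_sequence(23), 4))
seq_13 = list(islice(rehash_sequence(13), 4))
print(seq_23, seq_13)   # [4, 6, 9, 3] [4, 6, 9, 3]
```

Because the sequence depends only on the hash value and not on the key itself, both keys follow identical positions.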

Linear Probing
Store the key into the next free position
h0 = h(key), usually h0 = key mod m
h_i = (h_{i-1} + 1) mod m, i >= 1

S = {22, 35, 301, 99, 102, 452}, m = 10

position   key
0
1          301
2          22
3          102
4          452
5          35
6
7
8
9          99
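
A minimal linear probing insertion, assuming a table of size 10 and keys inserted in the order they are listed in S. The function name `linear_probe_insert` is illustrative.

```python
from typing import List, Optional

def linear_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert `key` using h0 = key mod m, hi = (h(i-1) + 1) mod m; return its position."""
    m = len(table)
    i = key % m
    for _ in range(m):
        if table[i] is None:
            table[i] = key
            return i
        i = (i + 1) % m
    raise RuntimeError("table is full")

table: List[Optional[int]] = [None] * 10
for key in (22, 35, 301, 99, 102, 452):
    linear_probe_insert(table, key)
print(table)
```

Keys 102 and 452 both hash to position 2 and end up in positions 3 and 4, directly behind key 22.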

Observation 1
Different insertion sequences => different hash sequences
h(key) = key mod 13
S1 = {11, 3, 27, 99, 8, 50, 77, 22, 12, 31, 33, 40, 53} => 28 probes
S2 = {53, 40, 33, 31, 12, 22, 77, 50, 8, 99, 27, 3, 11} => 30 probes

Table after inserting S1:

position   key   number of probes
0          77    2
1          27    1
2          12    4
3          3     1
4          40    4
5          31    1
6          53    6
7          33    1
8          99    1
9          8     2
10         22    2
11         11    1
12         50    2
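
The probe counts for S1 can be reproduced with linear probing on a table of size 13. In this sketch a probe is one examined position, including the final empty one; `insert_counting_probes` is a hypothetical helper.

```python
def insert_counting_probes(table, key):
    """Linear probing insert; returns (position, number of positions examined)."""
    m = len(table)
    i = key % m
    probes = 1
    while table[i] is not None:      # assumes at least one position is still empty
        i = (i + 1) % m
        probes += 1
    table[i] = key
    return i, probes

S1 = [11, 3, 27, 99, 8, 50, 77, 22, 12, 31, 33, 40, 53]
table = [None] * 13
probes = {}
for key in S1:
    _, probes[key] = insert_counting_probes(table, key)

total = sum(probes.values())
print(table)
print(total)   # 28
```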

Observation 2
Deletions are not easy:
i = h(key) = key mod 10
rh(i) = (i + 1) mod 10
Action: delete(65) and search(5)
Problem: search will stop at the empty position and will not find 5
Solution:
mark the position as deleted rather than empty
the marked position can be reused
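
The deletion marker can be sketched as follows. This is one possible implementation under the assumption that key 65 was inserted before key 5; the class `LinearProbingTable` and the sentinel `DELETED` are names introduced here, and duplicate keys are not checked on insertion.

```python
EMPTY = None
DELETED = object()   # marker for a deleted position

class LinearProbingTable:
    def __init__(self, m: int = 10):
        self.m = m
        self.slots = [EMPTY] * m

    def insert(self, key: int) -> int:
        i = key % self.m
        for _ in range(self.m):
            if self.slots[i] is EMPTY or self.slots[i] is DELETED:
                self.slots[i] = key          # a marked position can be reused
                return i
            i = (i + 1) % self.m
        raise RuntimeError("table is full")

    def search(self, key: int):
        i = key % self.m
        for _ in range(self.m):
            if self.slots[i] is EMPTY:       # only a truly empty position ends the search
                return None
            if self.slots[i] is not DELETED and self.slots[i] == key:
                return i
            i = (i + 1) % self.m
        return None

    def delete(self, key: int) -> bool:
        i = self.search(key)
        if i is None:
            return False
        self.slots[i] = DELETED
        return True

t = LinearProbingTable()
t.insert(65)         # goes to position 5
t.insert(5)          # position 5 is taken, so 5 goes to position 6
t.delete(65)
print(t.search(5))   # 6
```

Had position 5 simply been reset to empty, the search for 5 would have stopped there and reported the key as missing.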

Observation 3
Linear probing tends to create long sequences of occupied positions
the longer a sequence is, the longer it tends to become
$P = \frac{B + 1}{m}$
P: probability to use a position in the cluster (B: length of the cluster, m: table size)

Observation 4
Linear probing suffers from both primary and secondary clustering
Solution: double hashing
uses two hash functions h1, h2 and a rehashing function rh

Double Hashing
Two hash functions and a rehashing function
primary hash function: i = h1(key) = key mod m
secondary hash function: h2(key)
rehashing function: rh(i, key) = (i + h2(key)) mod m
h2(m, key) is some function of m and key
helps rh in computing random positions in the hash table
h2 is computed once for each key!

Example of Double Hashing

i. hash functions:
h1(key) = key mod m
$$
h_2(\mathit{key}) =
\begin{cases}
m \ \mathrm{div}\ 2 & \text{if } q = 0 \\
q & \text{if } q \neq 0
\end{cases}
\qquad q = (\mathit{key} \ \mathrm{div}\ m) \bmod m
$$
ii. rehash function:
rh(i, key) = (i + h2(key)) mod m

Example (continued)
A. m = 10, key = 23
h1(23) = 3, h2(23) = 2
rh(3, 23) = (3 + 2) mod 10 = 5
rehash sequence: 5, 7, 9, 1, ...
B. m = 10, key = 13
h1(13) = 3, h2(13) = 1
rh(3, 13) = (3 + 1) mod 10 = 4
rehash sequence: 4, 5, 6, ...
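
The two rehash sequences can be reproduced with a short sketch of the double hashing functions from the example. The generator name `rehash_sequence` is an illustrative choice; h1 and h2 follow the definitions given in the example.

```python
from itertools import islice

def h1(key: int, m: int) -> int:
    return key % m

def h2(key: int, m: int) -> int:
    """Secondary hash from the example: q if q != 0, otherwise m div 2."""
    q = (key // m) % m
    return q if q != 0 else m // 2

def rehash_sequence(key: int, m: int):
    """Yield rh(i, key) = (i + h2(key)) mod m repeatedly, starting from i = h1(key)."""
    i = h1(key, m)
    step = h2(key, m)          # computed once per key
    while True:
        i = (i + step) % m
        yield i

print(list(islice(rehash_sequence(23, 10), 4)))   # [5, 7, 9, 1]
print(list(islice(rehash_sequence(13, 10), 3)))   # [4, 5, 6]
```

Keys 23 and 13 share the hash value 3 but use different steps, so their sequences differ, which avoids the secondary clustering of the earlier example.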

Performance of Open Addressing
Distinguish between successful and unsuccessful search
Assume a series of probes to random positions
independent events
load factor: λ = n/m
λ: probability to probe an occupied position
each position has the same probability P = 1/m

Unsuccessful Search
The hash sequence is exhausted
let u be the expected number of probes
u equals the expected length of the hash sequence
P(k): probability to search k positions in the hash sequence

$$
\begin{aligned}
u = \sum_{k \ge 1} k\,P(k) = {} & P(1) \; + \\
& P(2) + P(2) \; + \\
& P(3) + P(3) + P(3) \; + \cdots \\
= {} & P(\ge 1 \text{ probes}) + P(\ge 2 \text{ probes}) + \cdots
\end{aligned}
$$

$$
u = \sum_{k \ge 1} P(\ge k \text{ probes})
  = \sum_{k \ge 1} P(\text{first } k-1 \text{ positions occupied})
  = \sum_{k \ge 1} \lambda^{k-1}
  = \frac{1}{1 - \lambda}
$$

independent events
u increases with λ => performance drops as λ increases
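
As a numerical check of the geometric series, the sketch below compares a truncated sum of load^(k-1) with the closed form 1/(1 - load). The function names and the chosen load factors are illustrative.

```python
def expected_probes_series(load: float, terms: int = 10_000) -> float:
    """Partial sum of sum_{k>=1} load**(k-1), the expected probes of an unsuccessful search."""
    return sum(load ** (k - 1) for k in range(1, terms + 1))

def expected_probes_closed(load: float) -> float:
    """Closed form 1 / (1 - load), valid for 0 <= load < 1."""
    return 1.0 / (1.0 - load)

for load in (0.25, 0.50, 0.75, 0.90, 0.95):
    print(f"{load:.2f}  {expected_probes_series(load):7.3f}  {expected_probes_closed(load):7.3f}")
```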

Successful Search
The hash sequence is not exhausted
the number of probes to find a key equals the number of probes s at the time the key was inserted plus 1
λ was less at that time => equivalent to considering all values of u (unsuccessful search) up to the current λ

$$
s = \frac{1}{\lambda} \int_0^{\lambda} (u + 1)\,dx = 1 + \frac{1}{\lambda} \ln\!\left(\frac{1}{1 - \lambda}\right)
\quad \text{(approximation)}
$$

s increases with λ

Performance
The performance drops as λ increases
the higher the value of λ is, the higher the probability of collisions
Unsuccessful search is more expensive than successful search
unsuccessful search exhausts the hash sequence

Experimental Results
                    SUCCESSFUL                    UNSUCCESSFUL
LOAD FACTOR   LINEAR   i + bkey   DOUBLE    LINEAR   i + bkey   DOUBLE
25%             1.17       1.16     1.15      1.39       1.37     1.33
50%             1.50       1.44     1.39      2.50       2.19     2.00
75%             2.50       2.01     1.85      8.50       4.64     4.00
90%             5.50       2.85     2.56     50.50      11.40    10.00
95%            10.50       3.52     3.15    200.50      22.04    20.00
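
The LINEAR and DOUBLE columns agree closely with well-known textbook estimates. The sketch below assumes 1/(1 - a) and (1/a) ln(1/(1 - a)) for double hashing and Knuth's approximations for linear probing; these formulas are an assumption here, since the slide does not say how its numbers were obtained. The estimates for successful search count the probe that finds the key as part of the insertion cost, so no extra 1 is added.

```python
import math

def double_unsuccessful(a: float) -> float:
    return 1.0 / (1.0 - a)

def double_successful(a: float) -> float:
    return (1.0 / a) * math.log(1.0 / (1.0 - a))

def linear_unsuccessful(a: float) -> float:
    # Knuth's approximation for linear probing (assumed, not from the slide)
    return 0.5 * (1.0 + 1.0 / (1.0 - a) ** 2)

def linear_successful(a: float) -> float:
    # Knuth's approximation for linear probing (assumed, not from the slide)
    return 0.5 * (1.0 + 1.0 / (1.0 - a))

print("load  lin-s  dbl-s   lin-u   dbl-u")
for a in (0.25, 0.50, 0.75, 0.90, 0.95):
    print(f"{a:.2f}  {linear_successful(a):5.2f}  {double_successful(a):5.2f}  "
          f"{linear_unsuccessful(a):6.2f}  {double_unsuccessful(a):6.2f}")
```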

Performance on Full Table

                       SUCCESSFUL
TABLE SIZE (m)   LINEAR   i + bkey   DOUBLE   UNSUCCESSFUL   LOG2 m
100                6.60       4.62     4.12          50.50     6.64
500               14.35       6.22     5.72         250.50     8.97
1000              20.15       6.91     6.41         500.50     9.97
5000              44.64       8.52     8.02        2500.50    12.29
10000             63.00       9.21     8.71        5000.50    13.29
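
Two of the columns can be recomputed from m alone. The sketch assumes the UNSUCCESSFUL column follows (m + 1) / 2, which matches the listed values but is a reading inferred here rather than stated on the slide; `unsuccessful_full` is a hypothetical name.

```python
import math

def unsuccessful_full(m: int) -> float:
    """Assumed model: (m + 1) / 2 probes on a full table of size m."""
    return (m + 1) / 2

for m in (100, 500, 1000, 5000, 10000):
    print(f"{m:6d}  {unsuccessful_full(m):8.2f}  {math.log2(m):6.2f}")
```

The comparison with log2 m shows how far an unsuccessful search on a full table is from the cost of a binary search over the same number of keys.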

Separate Chaining
Keys hashing to the same hash value are stored in separate lists
one list per hash position
can store more than m records
easy to implement
the keys in each list can be ordered
[Figure: table with h(key) = key mod m and one list per position; chains 130 -> nil, 372 -> 192 -> nil, 417 -> 227 -> nil; empty positions point to nil]
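
A minimal sketch of separate chaining with one Python list per position. The table size m = 10 is an assumption that is consistent with the chains in the figure; the class name `ChainedHashTable` is illustrative.

```python
from typing import List

class ChainedHashTable:
    def __init__(self, m: int = 10):
        self.m = m
        self.chains: List[List[int]] = [[] for _ in range(m)]   # one list per position

    def insert(self, key: int) -> None:
        chain = self.chains[key % self.m]
        if key not in chain:
            chain.append(key)

    def search(self, key: int) -> bool:
        return key in self.chains[key % self.m]

    def delete(self, key: int) -> bool:
        chain = self.chains[key % self.m]
        if key in chain:
            chain.remove(key)
            return True
        return False

t = ChainedHashTable(m=10)
for key in (130, 372, 192, 417, 227):
    t.insert(key)
print(t.chains[2], t.chains[7])   # [372, 192] [417, 227]
```

Deleting a key only removes it from its own list, so no deletion markers are needed.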

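Performance of Separate Chaining
Depends on the average chain size
insertions are independent events
let P(c,n,m): probability that a position has been selected c times after n insertions on a table of size m
P(c,n,m): probability that the chain has length c => binomial distribution

$$
P(c, n, m) = \binom{n}{c} p^{c} q^{\,n-c}
$$
p = 1/m: success case
q = 1 - p: failure case

$$
\begin{aligned}
P(c, n, m) &= \binom{n}{c} \left(\frac{1}{m}\right)^{c} \left(1 - \frac{1}{m}\right)^{n-c} \\
&= \frac{1}{c!} \cdot \frac{n}{m} \cdot \frac{n-1}{m} \cdots \frac{n-c+1}{m} \left(1 - \frac{1}{m}\right)^{n-c}
\end{aligned}
$$

For $n, m \to \infty$ with $\lambda = n/m$: $\left(1 - \frac{1}{m}\right)^{m} \to e^{-1}$, so

$$
P(c, n, m) \to P(c, \lambda) = \frac{1}{c!} \lambda^{c} e^{-\lambda} \quad \text{(Poisson)}
$$

Unsuccessful Search
The entire chain is searched
the average number of comparisons equals its average length u

$$
u = \sum_{c \ge 0} c\,P(c, \lambda) = \sum_{c \ge 0} c\,\frac{\lambda^{c}}{c!} e^{-\lambda} = \lambda
$$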
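
The Poisson limit can be checked numerically. The sketch below compares the binomial probability with its Poisson approximation for hypothetical sizes n = 5000 and m = 10000, and confirms that the mean chain length is the load factor.

```python
import math

def binomial_pmf(c: int, n: int, m: int) -> float:
    """P(c, n, m): probability that a position is selected c times in n insertions."""
    p = 1.0 / m
    return math.comb(n, c) * p ** c * (1.0 - p) ** (n - c)

def poisson_pmf(c: int, lam: float) -> float:
    """Limit of P(c, n, m) for large n, m with lam = n / m."""
    return lam ** c * math.exp(-lam) / math.factorial(c)

n, m = 5000, 10000          # hypothetical sizes, load factor 0.5
lam = n / m
for c in range(5):
    print(c, f"{binomial_pmf(c, n, m):.5f}", f"{poisson_pmf(c, lam):.5f}")

# Mean chain length under the Poisson model (truncated sum), expected to be close to lam
mean_length = sum(c * poisson_pmf(c, lam) for c in range(50))
print(mean_length)
```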

Successful Search
Not the whole chain is searched
the average number of comparisons equals the length s of the chain at the time the key was inserted plus 1
the performance at the time a key was inserted equals that of unsuccessful search!

$$
s = \frac{1}{\lambda} \int_0^{\lambda} (u + 1)\,dx = \frac{1}{\lambda} \int_0^{\lambda} (x + 1)\,dx = 1 + \frac{\lambda}{2}
$$
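
The integral can be approximated by a discrete average over the insertions. The sketch assumes the i-th inserted key (counting from 0) sees an expected chain length of i/m, the load factor at that moment; the sizes n and m are hypothetical.

```python
def successful_search_average(n: int, m: int) -> float:
    """Average of 1 + (load factor at insertion time) over n insertions into m chains."""
    return sum(1 + i / m for i in range(n)) / n

n, m = 8000, 10000
lam = n / m
print(successful_search_average(n, m), 1 + lam / 2)
```

The two printed values differ only by a term of order 1/m, which vanishes for large tables.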

Performance
The performance drops with the length of the chains
worst case: all keys are stored in a single chain
worst case performance: O(N)
unsuccessful search performs better than successful search! Why?
no problem with deletions!

Coalesced Hashing
The hash sequence is implemented as a linked list within the hash table
no rehash function
the next hash position is the next available position in the linked list
extra space for the list

h(key) = key mod 10
keys: 19, 29, 49, 59
[Figure: table with positions 0 to 9]

Initialization: avail = 9; every position holds nilkey and the links form the list of empty positions

position   key      link
0          nilkey   -1
1          nilkey   0
2          nilkey   1
3          nilkey   2
4          nilkey   3
5          nilkey   4
6          nilkey   5
7          nilkey   6
8          nilkey   7
9          nilkey   8

initially: avail = 9
h(key) = key mod 10
keys: 14, 29, 34, 28, 42, 39, 84, 38

position   key      link
0          nilkey   -1
1          nilkey   0
2          42       -1
3          38       -1
4          14       8
5          84       -1
6          39       -1
7          28       3
8          34       5
9          29       6

The table holds the lists of rehashing positions and the list of empty positions
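
A simplified sketch of coalesced hashing for the keys 19, 29, 49, 59, which all hash to position 9. As a simplifying assumption, free positions are found by scanning downward from the top of the table rather than through an explicit linked list of empty positions, so the layout may differ from the tables above. All names are illustrative.

```python
NILKEY = None

class CoalescedHashTable:
    """Sketch: colliding keys are chained through free positions taken from the top of the table."""

    def __init__(self, m: int = 10):
        self.m = m
        self.keys = [NILKEY] * m
        self.link = [-1] * m      # index of the next position in the chain, -1 = end
        self.avail = m - 1        # candidate for the next free position

    def _next_free(self) -> int:
        while self.avail >= 0 and self.keys[self.avail] is not NILKEY:
            self.avail -= 1
        if self.avail < 0:
            raise RuntimeError("table is full")
        return self.avail

    def insert(self, key: int) -> int:
        i = key % self.m
        if self.keys[i] is NILKEY:
            self.keys[i] = key
            return i
        while self.keys[i] != key and self.link[i] != -1:   # walk to the end of the chain
            i = self.link[i]
        if self.keys[i] == key:
            return i
        j = self._next_free()
        self.keys[j] = key
        self.link[i] = j          # append the new position to the chain
        return j

    def search(self, key: int) -> int:
        i = key % self.m
        while i != -1 and self.keys[i] is not NILKEY:
            if self.keys[i] == key:
                return i
            i = self.link[i]
        return -1

t = CoalescedHashTable()
positions = [t.insert(k) for k in (19, 29, 49, 59)]
print(positions)   # [9, 8, 7, 6]
```

No rehash function is evaluated during a search; the chain of links is followed instead.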

Performance of Coalesced Hashing
Unsuccessful search: $\frac{1}{4} e^{2\lambda} - \frac{\lambda}{2} + 0.75$ probes/search
Successful search: $\frac{e^{2\lambda} - 1}{8\lambda} + \frac{\lambda}{4} + 0.75$ probes/search
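
To get a feel for these estimates, the sketch below evaluates them for a few load factors. It assumes the formulas read e^(2λ)/4 - λ/2 + 0.75 and (e^(2λ) - 1)/(8λ) + λ/4 + 0.75; the minus signs follow the standard analysis of coalesced hashing and are an assumption with respect to the slide.

```python
import math

def coalesced_unsuccessful(lam: float) -> float:
    return 0.25 * math.exp(2 * lam) - lam / 2 + 0.75

def coalesced_successful(lam: float) -> float:
    return (math.exp(2 * lam) - 1) / (8 * lam) + lam / 4 + 0.75

for lam in (0.25, 0.5, 0.75, 1.0):
    print(f"{lam:.2f}  {coalesced_unsuccessful(lam):.3f}  {coalesced_successful(lam):.3f}")
```

Under these formulas even a full table (λ = 1) needs only about two probes per search on average.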
