06 - APS - Hash Table
Hash table
Damjan Strnad
Hash table
●
a hash table is a data structure that stores key-data
pairs (k,r), where k is the key and r is the associated
data of a table element
●
a hash table allows direct access to data through key
values, therefore a natural implementation of a hash
table uses an array:
– the element index in the array is calculated from the key
Hash table
●
if the array is large enough and the keys are integers,
each key maps to a unique array location and we can use
direct addressing (array location k belongs to key k):
– the set of active keys K is a subset of the set of all keys U and determines the
locations with valid pointers to stored elements; other locations have value NIL
– such a table is not yet ...
[Figure: direct-address table T — slots are indexed by the keys in U; slots for the
active keys in K hold pointers to key/data pairs, all other slots hold NIL]
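As a minimal sketch of direct addressing in C (the universe size, key range and
element type below are assumptions for illustration), slot k of the array belongs
to key k and unused slots hold NULL:

#include <stddef.h>

#define U_SIZE 10                 /* size of the key universe U = {0,...,9} (assumed) */

typedef struct {
    int key;                      /* key k */
    const char *data;             /* associated data r */
} Element;

Element *T[U_SIZE];               /* direct-address table: slot k belongs to key k, NULL = empty */

void da_insert(Element *x)  { T[x->key] = x; }
Element *da_search(int k)   { return T[k]; }        /* NULL means key k is not stored */
void da_delete(Element *x)  { T[x->key] = NULL; }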
Hash table
●
when the number of possible keys is larger than the array
size, we calculate the element address from the key value
using a hash function h : U → {0,...,m-1}, which maps the
set of keys U into the slots of a hash table T:
– h(k) is the address of element with key k in a hash table
[Figure: the hash function h maps the universe of keys U (with active keys
k1,...,k5 in K) into slots 0,...,m-1 of hash table T; h(k2) = h(k3) shows two
keys colliding in the same slot]
Hash table
●
the advantage of a hash table over a table with direct
addressing is smaller memory consumption (O(|K|) instead
of O(|U|))
●
average access time is still O(1), but not in the worst case
●
the disadvantage of a hash table is that two keys can map
to the same slot, which is called a collision
●
the number of collisions can be reduced with a good hash
function that maps keys uniformly across the table
addresses
●
because collisions are unavoidable when |U| > m, we
must use one of the techniques for collision resolution
Hash functions
●
a good hash function maps each of the keys with equal
probability into one of m slots in a hash table
●
we will assume that keys are natural numbers; when they
are not, we have to transform them into natural numbers:
– example: let the key be the string CLRS. The ASCII values of the
individual letters are: C=67, L=76, R=82, S=83. There are
128 values in 7-bit ASCII, therefore the string CLRS can
be uniquely transformed into a natural number as (see the sketch after this list):
(67 · 128³) + (76 · 128²) + (82 · 128¹) + (83 · 128⁰) = 141 764 947
●
two methods for construction of good hash functions:
– division method
– multiplication method
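A minimal sketch of the key transformation from the example above in C, assuming
7-bit ASCII input (the function name is illustrative); the string is read as a
radix-128 number:

#include <stdint.h>

/* Interpret a 7-bit ASCII string as a radix-128 natural number.
 * For "CLRS": ((67*128 + 76)*128 + 82)*128 + 83 = 141 764 947. */
uint64_t string_to_key(const char *s)
{
    uint64_t key = 0;
    while (*s != '\0')
        key = key * 128 + (uint64_t)(unsigned char)*s++;   /* append the next radix-128 digit */
    return key;
}

For long strings the value overflows a machine word, so in practice the reduction
modulo m would be interleaved with the accumulation.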
Division method
●
uses the following equation for a hash function:
h(k) = k mod m
●
example: hash table size is m=12, the key is k=100
– h(100) = 100 mod 12 = 4
– values 4,16,28,... (k=4+12i, i=0,1,2,...) map into the
same slot
●
using powers of 2 for m is not always good:
– operation k mod m, where m = 2^p, returns the bottom p bits
of the key; if those are not uniformly distributed among
all possible keys (e.g., postal codes), it will cause poor
dispersion of h(k) and consequently many collisions
●
in practice a good value for m is a prime number that is
not too close to a power of 2
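A minimal sketch of the division method in C; nothing beyond the formula above is
assumed:

/* Division method: h(k) = k mod m */
unsigned long hash_div(unsigned long k, unsigned long m)
{
    return k % m;        /* e.g. hash_div(100, 12) == 4, as in the example above */
}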
Multiplication method
●
uses the following equation for a hash function:
h(k) = ⌊m(kA mod 1)⌋
●
A is a constant from the range (0,1); a recommended value is:
A = (√5 − 1)/2 ≈ 0.618034
●
(kA mod 1) means we only keep the fractional part of the
product
●
the advantage of the method is that the value of m is not critical;
the disadvantage is that it is slow compared to the division method
●
example: m=8, A=13/32, k=21:
h(k) = ⌊8 · (21 · 13/32 mod 1)⌋
     = ⌊8 · (8.53125 mod 1)⌋
     = ⌊8 · 0.53125⌋
     = ⌊4.25⌋
     = 4
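A minimal floating-point sketch of the multiplication method in C (real
implementations often use fixed-point arithmetic instead, which is outside the
scope of this slide):

#include <math.h>

/* Multiplication method: h(k) = floor(m * (k*A mod 1)) */
unsigned long hash_mul(unsigned long k, unsigned long m, double A)
{
    double prod = (double)k * A;
    double frac = prod - floor(prod);          /* kA mod 1: the fractional part of k*A */
    return (unsigned long)((double)m * frac);
}

With the values from the example, hash_mul(21, 8, 13.0/32.0) returns 4.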
[Figure: collision resolution by chaining — keys that hash to the same slot of T
(e.g., k1 and k8, or k4 and k5) are kept in a linked list attached to that slot;
unused slots contain NIL]
Open addressing
●
open addressing is the second technique for resolving
collisions
●
all elements are stored in the same hash table; each slot
contains either a key or NIL
●
searching is done by systematic inspection of slots until
the sought element is found or it is determined that the
element is not in the table
●
the advantage of open addressing is that we avoid
pointers; the saved memory can be used to enlarge the
hash table
●
hash table size m must be greater than the expected
number of elements n, therefore open addressing is only
used when the latter is known
Probe sequences
●
uniform hashing is a generalization of simple uniform
hashing in which each of the m! possible permutations of
〈0, 1, ... , m-1〉 is equally likely to be used as the probe
sequence
●
uniform hashing is difficult to achieve in practice, so we
approximate it using methods that guarantee at least that
the probe sequence is a permutation of 〈0, 1, ... , m-1〉:
– linear probing
– quadratic probing
– double hashing
Linear probing
●
uses the following hash function:
h(k,i) = (h'(k) + i) mod m; i = 0, 1, ..., m-1
●
h'(k) is an auxiliary hash function which determines the
first inspected slot T[h'(k)]
●
the probe sequence is 〈T[h'(k)], T[h'(k)+1], ... , T[m-1],
T[0], T[1], ..., T[h'(k)-1]〉
●
implementation of linear probing is simple, but the
problem is emergence of long clusters of occupied slots in
the hash table (i.e., primary clustering), which increases
the average search time
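A minimal self-contained sketch of open addressing with linear probing in C; the
table size, the EMPTY sentinel and integer keys are assumptions for illustration,
and h'(k) = k mod m uses the division method:

#include <stdio.h>

#define M 13                         /* table size (assumed): a small prime for illustration */
#define EMPTY (-1)                   /* sentinel marking an unused slot (assumed) */

static int T[M];                     /* the hash table of integer keys */

static int h_aux(int k)    { return k % M; }                /* auxiliary hash h'(k) */
static int h(int k, int i) { return (h_aux(k) + i) % M; }   /* linear probing: (h'(k) + i) mod m */

/* Insert key k; returns its slot, or -1 if the table is full. */
int lp_insert(int k)
{
    for (int i = 0; i < M; i++) {
        int j = h(k, i);
        if (T[j] == EMPTY) { T[j] = k; return j; }
    }
    return -1;
}

/* Search for key k; returns its slot, or -1 if k is not in the table. */
int lp_search(int k)
{
    for (int i = 0; i < M; i++) {
        int j = h(k, i);
        if (T[j] == k)     return j;
        if (T[j] == EMPTY) return -1;            /* an empty slot ends the probe sequence */
    }
    return -1;
}

int main(void)
{
    for (int j = 0; j < M; j++) T[j] = EMPTY;
    lp_insert(5); lp_insert(18); lp_insert(31);  /* all three keys hash to slot 5 */
    printf("key 18 found in slot %d\n", lp_search(18));   /* probes slots 5 and 6 */
    return 0;
}

The three inserted keys end up in the contiguous slots 5-7, a small example of the
primary clustering described above.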
Quadratic probing
●
uses the following hash function:
h(k,i) = (h'(k) + c1·i + c2·i²) mod m; i = 0, 1, ..., m-1
● c1 and c2 are non-zero constants, which must be chosen
so that all hash table slots are addressed:
– for m = 2^n a good choice is c1 = c2 = 0.5
– for arbitrary m and c1 = c2 = 0.5, instead of "mod m" we use
the next higher power of 2 and skip values h(k,i) ≥ m
●
the first probed slot is T[h'(k)]; the index of subsequently probed
slots changes according to a quadratic function of i
●
in practice quadratic probing performs better than linear
probing
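A minimal sketch of the probe-index computation for quadratic probing with
c1 = c2 = 0.5 and m = 2^n (both values assumed for illustration); with these
constants c1·i + c2·i² = i(i+1)/2, so the offset stays an integer:

#define M 16                              /* table size m = 2^4 (assumed) */

static int h_aux(int k) { return k % M; } /* auxiliary hash h'(k) */

/* Quadratic probing with c1 = c2 = 0.5: h(k,i) = (h'(k) + i(i+1)/2) mod m */
int h_quad(int k, int i)
{
    return (h_aux(k) + i * (i + 1) / 2) % M;
}

For k = 5 the first probed slots are 5, 6, 8, 11, 15, 4, ..., so consecutive probes
spread out instead of forming one contiguous run.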
Double hashing
●
uses the following hash function:
h(k,i) = (h1(k) + i·h2(k)) mod m; i=0, 1, ..., m-1
● h1 and h2 are auxiliary hash functions; the values of h2 should
be non-zero and coprime to m
– for m = 2^n this is achieved if h2 only returns odd values
● the first probed slot is T[h1(k)]; addresses of subsequently
probed slots depend on h2(k)
●
example:
– h1(k)=k mod m, h2(k)=1 + (k mod (m-1)), k=123456, m=701
– h1(k) = 123456 mod 701 = 80, h2(k) = 1 + (123456 mod 700) = 257
– h(k,i) = [80 + i · 257] mod 701
– the probe sequence is 〈80, 337, 594, 150, 407, ...〉
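A minimal sketch of the double-hashing probe computation in C, using the example's
functions h1(k) = k mod m and h2(k) = 1 + (k mod (m-1)) with m = 701:

#include <stdio.h>

#define M 701                                            /* table size from the example */

static int h1(long k) { return (int)(k % M); }           /* h1(k) = k mod m */
static int h2(long k) { return (int)(1 + k % (M - 1)); } /* h2(k) = 1 + (k mod (m-1)) */

/* Double hashing: h(k,i) = (h1(k) + i*h2(k)) mod m */
int h_double(long k, int i)
{
    return (h1(k) + i * h2(k)) % M;
}

int main(void)
{
    for (int i = 0; i < 5; i++)                          /* prints 80 337 594 150 407 */
        printf("%d ", h_double(123456, i));
    printf("\n");
    return 0;
}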