Lecture 10

Lecture 10 covers hash tables, focusing on their structure, hash functions, and collision resolution techniques such as separate chaining and coalesced chaining. It discusses the importance of good hash functions, examples of poor hash functions, and various methods for hashing, including the division method, mid-square method, and multiplication method. The lecture also introduces universal hashing and addresses the handling of non-natural number keys, emphasizing the need for effective collision resolution strategies.

DATA STRUCTURES

LECTURE 10

Lect. PhD. Oneţ-Marian Zsuzsanna

Babeş - Bolyai University


Faculty of Mathematics and Computer Science

2024 - 2025

Lect. PhD. Oneţ-Marian Zsuzsanna DATA STRUCTURES


In Lecture 9...

Binary heap

Direct address table



Today

Hash tables

Collision resolution through separate chaining

Collision resolution through coalesced chaining



Hash tables

Hash tables are generalizations of direct-address tables and they represent a time-space trade-off.

Searching for an element still takes Θ(1) time, but as average case complexity (the worst case complexity is higher)



Hash tables - main idea I

We will still have a table T of size m (but now m is not the number of possible keys, |U|) - hash table

Use a function h that will map a key k to a slot in the table T - hash function

h : U → {0, 1, ..., m − 1}

Remarks:
In case of direct-address tables, an element with key k is stored in T[k].
In case of hash tables, an element with key k is stored in T[h(k)].



Hash tables - main idea II

The point of the hash function is to reduce the range of array indexes that need to be handled => instead of |U| values, we only need to handle m values.

Consequence:
two keys may hash to the same slot => a collision
we need techniques for resolving the conflict created by
collisions

The two main points of discussion for hash tables are:

How to define the hash function

How to resolve collisions



A good hash function I

A good hash function:

can minimize the number of collisions (but cannot eliminate all collisions)

is deterministic

can be computed in Θ(1) time



A good hash function II

satisfies (approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed to

P(h(k) = j) = 1/m, ∀j = 0, ..., m − 1, ∀k ∈ U



Examples of bad hash functions

h(k) = constant number

h(k) = random number


assuming that the keys are CNP numbers:
a hash function considering just parts of it (first digit, birth year/date, county code, etc.)
assume m = 100 and you use the birth day from the CNP (as a number): h(CNP) = birthday % 100

m = 16 and h(k) = k % m can also be problematic (only the last 4 bits of the key are used)

etc.



Hash function

The simple uniform hashing assumption is hard to satisfy, especially when we do not know the distribution of the data. Data does not always have a uniform distribution:
dates
group numbers at our faculty
postal codes
first letter of an English word
In practice we use heuristic techniques to create hash functions that perform well.

Most hash functions assume that the keys are natural numbers. If this is not true, they have to be interpreted as natural numbers. In what follows, we assume that the keys are natural numbers.



The division method

The division method


h(k) = k mod m

For example:
m = 13
k = 63 => h(k) = 11
k = 52 => h(k) = 0
k = 131 => h(k) = 1

Requires only a division so it is quite fast

Experiments show that good values for m are primes not too close to exact powers of 2
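The examples above can be checked in a couple of lines; a minimal Python sketch (the function name is ours):

```python
def h_division(k, m):
    """Division-method hash: h(k) = k mod m."""
    return k % m

m = 13  # a prime, not too close to a power of 2
print(h_division(63, m), h_division(52, m), h_division(131, m))  # 11 0 1
```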



The division method

Interestingly, Java uses the division method with a table size which is a power of 2 (initially 16).

They avoid the resulting problem by applying a second hash function to the hashCode, before applying the mod.


Mid-square method

Assume that the table size is 10^r, for example m = 100 (r = 2)

For getting the hash of a number, multiply it by itself and take the middle r digits.

For example, h(4567) = middle 2 digits of 4567 * 4567 = middle 2 digits of 20857489 = 57

The same idea works for m = 2^r and the binary representation of the numbers:
m = 2^4, h(1011) = middle 4 bits of 01111001 = 1110
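Both variants (decimal and binary) can be sketched with string formatting; one assumption in this sketch is that the square is padded on the left to twice the key's width before taking the middle digits:

```python
def h_midsquare(k, r, base=10):
    """Mid-square hash for a table of size base**r: square k and take
    the middle r digits of k*k (only base 10 and base 2 handled here)."""
    fmt = {10: "d", 2: "b"}[base]
    width = 2 * len(format(k, fmt))        # pad the square to 2x key width
    s = format(k * k, fmt).zfill(width)
    start = (len(s) - r) // 2
    return int(s[start:start + r], base)

print(h_midsquare(4567, 2))       # 57 (middle of 20857489)
print(h_midsquare(0b1011, 4, 2))  # 14 == 0b1110 (middle of 01111001)
```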



The multiplication method I

The multiplication method


h(k) = floor (m ∗ frac(k ∗ A)) where
m - the hash table size
A - constant in the range 0 < A < 1
frac(k ∗ A) - fractional part of k ∗ A

For example
m = 13 A = 0.6180339887
k=63 => h(k) = floor(13 * frac(63 * A)) = floor(12.16984) = 12
k=52 => h(k) = floor(13 * frac(52 * A)) = floor(1.790976) = 1
k=129=> h(k)= floor(13 * frac(129 * A)) = floor(9.442999) = 9



The multiplication method II

Advantage: the value of m is not critical, typically m = 2^p for some integer p

Some values for A work better than others. Knuth suggests A = (√5 − 1)/2 = 0.6180339887



Universal hashing I

If we know the exact hash function used by a hash table, we can always generate a set of keys that will hash to the same position (collision). This reduces the performance of the table.

For example:

m = 13
h(k) = k mod m
k = 11, 24, 37, 50, 63, 76, etc.



Universal hashing II

Instead of having one hash function, we have a collection H of hash functions that map a given universe U of keys into the range {0, 1, . . . , m − 1}

Such a collection is said to be universal if for each pair of distinct keys x, y ∈ U the number of hash functions from H for which h(x) = h(y) is precisely |H|/m

In other words, with a hash function randomly chosen from H, the chance of a collision between x and y, where x ̸= y, is exactly 1/m



Universal hashing III

Example 1
Fix a prime number p > the maximum possible value for a key from U.
For every a ∈ {1, . . . , p − 1} and b ∈ {0, . . . , p − 1} we can define
a hash function ha,b (k) = ((a ∗ k + b) mod p) mod m.

For example:
h3,7 (k) = ((3 ∗ k + 7) mod p) mod m
h4,1 (k) = ((4 ∗ k + 1) mod p) mod m
h8,0 (k) = ((8 ∗ k) mod p) mod m
There are p ∗ (p − 1) possible hash functions that can be
chosen.
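Drawing one member of this family at random can be sketched as follows (p = 101 is an assumed prime larger than the example keys):

```python
import random

def make_hab(p, m):
    """Pick a random h_{a,b}(k) = ((a*k + b) mod p) mod m from the
    universal family; p must be a prime above every possible key."""
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    return lambda k: ((a * k + b) % p) % m

p, m = 101, 13
h = make_hab(p, m)
# the keys that all collided under k mod 13 are now almost surely spread out
print([h(k) for k in (11, 24, 37, 50, 63, 76)])
```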



Universal hashing IV

Example 2
If the key k is an array < k1, k2, . . . , kr > such that ki < m (or it can be transformed into such an array, by writing k as a number in base m).
Let < x1, x2, . . . , xr > be a fixed sequence of random numbers, such that xi ∈ {0, . . . , m − 1} (another number in base m with the same length).
h(k) = (k1 ∗ x1 + k2 ∗ x2 + . . . + kr ∗ xr) mod m
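A sketch of this dot-product hash; the digit vector and the random vector x below are made up for the illustration:

```python
def h_vector(key_digits, x, m):
    """h(k) = (k1*x1 + ... + kr*xr) mod m for a key written as
    digits <k1..kr> in base m and a fixed random vector x."""
    assert len(key_digits) == len(x)
    return sum(ki * xi for ki, xi in zip(key_digits, x)) % m

m = 13
x = [4, 7, 1]        # fixed, drawn once from {0..m-1}
digits = [2, 1, 4]   # the key 355 = 2*13**2 + 1*13 + 4 in base 13
print(h_vector(digits, x, m))  # (2*4 + 1*7 + 4*1) mod 13 = 6
```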



Universal hashing V

Example 3
Suppose the keys are u bits long and m = 2^b.
Pick a random b-by-u matrix (called h) with 0 and 1 values only.
Define h(k) = h ∗ k, where in the multiplication we do addition mod 2.

[1 0 0 0]   [1]   [1]
[0 1 1 1] * [0] = [1]
[1 1 1 0]   [1]   [0]
            [0]
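With a 3-by-4 matrix as in the example (u = 4 key bits, b = 3 output bits, so m = 2^3 = 8), the mod-2 product can be computed directly:

```python
def h_matrix(M, key_bits):
    """Matrix-method hash: multiply a random b-by-u 0/1 matrix by the u
    key bits, with all additions done mod 2; the b output bits pick a slot."""
    return [sum(r * k for r, k in zip(row, key_bits)) % 2 for row in M]

M = [[1, 0, 0, 0],
     [0, 1, 1, 1],
     [1, 1, 1, 0]]
print(h_matrix(M, [1, 0, 1, 0]))  # [1, 1, 0]
```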



Using keys that are not natural numbers I

The previously presented hash functions assume that keys are natural numbers.

If this is not true there are two options:

Define special hash functions that work with your keys (for example, for real numbers from the [0,1) interval h(k) = floor(k ∗ m) can be used)

Use a function that transforms the key to a natural number (and use any of the above-mentioned hash functions) - hashCode in Java, hash in Python



Using keys that are not natural numbers II

If the key is a string s:
we can consider the ASCII codes for every letter
we can use 1 for a, 2 for b, etc.

Possible implementations for hashCode

s[0] + s[1] + ... + s[n − 1]

Anagrams have the same sum (SAUCE and CAUSE)
DATES has the same sum as well (D = C + 1, T = U - 1)
Assuming a maximum length of 10 for a word (and the second letter representation), hashCode values range from 1 (the word a) to 260 (zzzzzzzzzz). Considering a dictionary of about 50,000 words, we would have on average 192 words per hashCode value.
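The collisions claimed above are easy to confirm with ASCII codes:

```python
def h_sum(s):
    """hashCode as a plain sum of character codes."""
    return sum(ord(c) for c in s)

print(h_sum("SAUCE"), h_sum("CAUSE"), h_sum("DATES"))  # 369 369 369
```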



Using keys that are not natural numbers III

s[0] ∗ 26^(n−1) + s[1] ∗ 26^(n−2) + ... + s[n − 1], where n is the length of the string

Generates a much larger interval of hashCode values.

Instead of 26 (which was chosen since we have 26 letters) we can use a prime number as well (Java uses 31, for example).
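Written with Horner's rule the polynomial needs no explicit powers; this is essentially what Java's String.hashCode computes (Java additionally wraps to 32-bit int arithmetic):

```python
def h_poly(s, base=31):
    """Polynomial hashCode: s[0]*base^(n-1) + ... + s[n-1]."""
    h = 0
    for c in s:
        h = h * base + ord(c)   # Horner's rule
    return h

print(h_poly("ab"))                        # 97*31 + 98 = 3105
print(h_poly("SAUCE") == h_poly("CAUSE"))  # False: anagrams now differ
```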



Collisions

When two keys, x and y , have the same value for the hash
function h(x) = h(y ) we have a collision.

A good hash function can reduce the number of collisions, but it cannot eliminate them altogether:

Try fitting m + 1 keys into a table of size m

There are different collision resolution methods:


Separate chaining
Coalesced chaining
Open addressing





The birthday paradox

How many randomly chosen people are needed in a room, to have a good probability - about 50% - of having two people with the same birthday?

It is obvious that if we have 367 people, there will be at least two with the same birthday (there are only 366 possibilities).

What might not be obvious, is that approximately 70 people are needed for a 99.9% probability

23 people are enough for a 50% probability
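These probabilities follow from multiplying the chances that each new person misses all earlier birthdays; a quick check, assuming 365 equally likely birthdays:

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days   # person i misses the first i
    return 1 - p_all_distinct

print(round(p_shared_birthday(23), 4))  # 0.5073
print(round(p_shared_birthday(70), 4))  # 0.9992
```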



Separate chaining

Collision resolution by separate chaining: each slot from the hash table T contains a linked list, with the elements that hash to that slot
Dictionary operations become operations on the corresponding
linked list:
insert(T , x) - insert a new node to the beginning of the list
T [h(key [x])]
search(T , k) - search for an element with key k in the list
T [h(k)]
delete(T , x) - delete x from the list T [h(key [x])]



Hash table with separate chaining - representation

A hash table with separate chaining would be represented in the following way (for simplicity, we will keep only the keys in the nodes).

Node:
key: TKey
next: ↑ Node

HashTable:
T: ↑Node[] //an array of pointers to nodes
m: Integer
h: TFunction //the hash function



Hash table with separate chaining - search

function search(ht, k) is:
//pre: ht is a HashTable, k is a TKey
//post: function returns True if k is in ht, False otherwise
position ← ht.h(k)
currentNode ← ht.T[position]
while currentNode ̸= NIL and [currentNode].key ̸= k execute
currentNode ← [currentNode].next
end-while
if currentNode ̸= NIL then
search ← True
else
search ← False
end-if
end-function

Usually search returns the info associated with the key k
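The dictionary operations map directly onto Python lists of singly linked nodes; a keys-only sketch (class and function names are ours):

```python
class Node:
    def __init__(self, key, next=None):
        self.key, self.next = key, next

class HashTableSC:
    """Hash table with separate chaining (keys only, as on the slides)."""
    def __init__(self, m, h):
        self.m, self.h, self.T = m, h, [None] * m

    def insert(self, k):                 # Theta(1): push at the list head
        pos = self.h(k)
        self.T[pos] = Node(k, self.T[pos])

    def search(self, k):                 # Theta(1 + alpha) on average
        node = self.T[self.h(k)]
        while node is not None and node.key != k:
            node = node.next
        return node is not None

    def delete(self, k):                 # unlink k from its slot's list
        pos, prev = self.h(k), None
        node = self.T[pos]
        while node is not None and node.key != k:
            prev, node = node, node.next
        if node is None:
            return False
        if prev is None:
            self.T[pos] = node.next
        else:
            prev.next = node.next
        return True

ht = HashTableSC(13, lambda k: k % 13)
for k in (5, 18, 16):
    ht.insert(k)
print(ht.search(18), ht.search(7))   # True False
ht.delete(18)
print(ht.search(18))                 # False
```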



Analysis of hashing with chaining

The average performance depends on how well the hash function h can distribute the keys to be stored among the m slots.

Simple Uniform Hashing assumption: each element is equally likely to hash into any of the m slots, independently of where any other elements have hashed to.

The load factor α of a table T with m slots containing n elements:
is n/m
represents the average number of elements stored in a chain
in case of separate chaining can be less than, equal to, or greater than 1.



Analysis of hashing with chaining - Insert

The slot where the element is to be added can be:

empty - create a new node and add it to the slot

occupied - create a new node and add it to the beginning of the list

In either case worst-case time complexity is: Θ(1)

If we have to check whether the element already exists in the table, the complexity of searching is added as well.



Analysis of hashing with chaining - Search I

There are two cases:
unsuccessful search
successful search

We assume that:
the hash value can be computed in constant time (Θ(1))
the time required to search an element with key k depends linearly on the length of the list T[h(k)]



Analysis of hashing with chaining - Search II

Theorem: In a hash table in which collisions are resolved by separate chaining, an unsuccessful search takes time Θ(1 + α), on average, under the assumption of simple uniform hashing.

Theorem: In a hash table in which collisions are resolved by chaining, a successful search takes time Θ(1 + α), on average, under the assumption of simple uniform hashing.

Proof idea: Θ(1) is needed to compute the value of the hash function and α is the average time needed to search one of the m lists.



Analysis of hashing with chaining - Search III

If n = O(m) (the number of hash table slots is proportional to the number of elements in the table; if the number of elements grows, the size of the table will grow as well):
α = n/m = O(m)/m = O(1)
searching takes constant time on average

Worst-case time complexity is Θ(n):
when all the nodes are in a single linked list and we are searching this list
In practice hash tables are pretty fast



Analysis of hashing with chaining - Delete

If the lists are doubly-linked and we know the address of the node: Θ(1)

If the lists are singly-linked: proportional to the length of the list

All dictionary operations can be supported in Θ(1) time on average.

In theory we can keep any number of elements in a hash table with separate chaining, but the complexity is proportional to α. If α is too large ⇒ resize and rehash.



Example

Assume we have a hash table with m = 6 that uses separate chaining for collision resolution, with the following policy: if the load factor of the table after an insertion is greater than or equal to 0.7, we double the size of the table.

Using the division method, insert the following elements, in the given order, in the hash table: 38, 11, 8, 72, 55, 29, 2.



Example

h(38) = 2 (load factor will be 1/6)
h(11) = 5 (load factor will be 2/6)
h(8) = 2 (load factor will be 3/6)
h(72) = 0 (load factor will be 4/6)
h(55) = 1 (load factor will be 5/6 - greater than 0.7)
The table after the first five elements were added:



Example

Is it OK if after the resize this is our hash table?



Example

The result of the hash function (i.e. the position where an element is added) depends on the size of the hash table. If the size of the hash table changes, the value of the hash function changes as well, which means that search and remove operations might not find the element.

After a resize operation, we have to add all elements again in the hash table, to make sure that they are at the correct position → rehash



Example

After rehash and adding the other two elements:





Iterator

What do you think, which containers cannot be represented on a hash table?

How can we define an iterator for a hash table with separate chaining?

Since hash tables are used to implement containers where the order of the elements is not important, our iterator can iterate through them in any order.

For the hash table from the previous example, the easiest order in which the elements can be iterated is: 72, 2, 38, 29, 55, 8, 11


Iterator

Iterator for a hash table with separate chaining is a combination of an iterator on an array (the table) and an iterator on a linked list.

We need a current position to know the position from the table that we are at, but we also need a current node to know the exact node from the linked list from that position.

IteratorHT:
ht: HashTable
currentPos: Integer
currentNode: ↑ Node





Iterator - init

How can we implement the init operation?

subalgorithm init(ith, ht) is:
//pre: ith is an IteratorHT, ht is a HashTable
ith.ht ← ht
ith.currentPos ← 0
while ith.currentPos < ht.m and ht.T[ith.currentPos] = NIL execute
ith.currentPos ← ith.currentPos + 1
end-while
if ith.currentPos < ht.m then
ith.currentNode ← ht.T[ith.currentPos]
else
ith.currentNode ← NIL
end-if
end-subalgorithm

Complexity of the algorithm: O(m)
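In Python the same currentPos/currentNode sweep is natural to phrase as a generator; the table below is hand-built for illustration:

```python
class Node:
    def __init__(self, key, next=None):
        self.key, self.next = key, next

def iterate_keys(T):
    """Generator over a separate-chaining table T (array of list heads):
    advance to each non-empty slot (currentPos), then walk its chain
    (currentNode)."""
    for head in T:               # currentPos sweeps the array
        node = head              # currentNode walks one chain
        while node is not None:
            yield node.key
            node = node.next

# slots: 0 -> 72, 2 -> 38, 5 -> 11  (m = 6, h(k) = k % 6)
T = [Node(72), None, Node(38), None, None, Node(11)]
print(list(iterate_keys(T)))     # [72, 38, 11]
```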





Iterator - other operations

How can we implement the getCurrent operation?


How can we implement the next operation?
How can we implement the valid operation?





Sorted containers

How can we define a sorted container on a hash table with separate chaining?

Hash tables are in general not very suitable for sorted containers.
However, if we have to implement a sorted container on a hash table with separate chaining, we can store the individual lists in sorted order and the iterator can return the elements in sorted order.



Coalesced chaining

Collision resolution by coalesced chaining: each element from the hash table is stored inside the table (no linked lists), but each element has a next field, similar to a linked list on an array.

When a new element has to be inserted and the position where it should be placed is occupied, we will put it on any empty position and set the next link, so that the element can be found in a search.

Since all elements are in the table, α can be at most 1.



Coalesced chaining - example

Consider a hash table of size m = 13 that uses coalesced chaining for collision resolution and a hash function with the division method.

Insert into the table the following elements: 5, 18, 16, 15, 13, 31, 26.

Let’s compute the value of the hash function for every key:

Key    5  18  16  15  13  31  26
Hash   5   5   3   2   0   5   0



Example

Initially the hash table is empty. All next values are -1 and the
first empty position is position 0.
5 will be added to position 5. But 18 should also be added
there. Since that position is already occupied, we add 18 to
position firstEmpty and set the next of 5 to point to position
0. Then we reset firstEmpty to the next empty position.

We keep doing this, until we add all elements.



Example

The final table:

pos    0   1   2   3   4   5   6   7   8   9  10  11  12
T     18  13  15  16  31   5  26
next   1   4  -1  -1   6   0  -1  -1  -1  -1  -1  -1  -1
firstEmpty = 7





Coalesced chaining - representation

What fields do we need to represent a hash table where collision resolution is done with coalesced chaining?

HashTable:
T: TKey[]
next: Integer[]
m: Integer
firstEmpty: Integer
h: TFunction

For simplicity, in the following, we will consider only the keys.



Coalesced chaining - insert
subalgorithm insert (ht, k) is:
//pre: ht is a HashTable, k is a TKey
//post: k was added into ht
if ht.firstEmpty = ht.m then
@resize and rehash
end-if
pos ← ht.h(k)
if ht.T[pos] = -1 then //-1 means empty position
ht.T[pos] ← k
ht.next[pos] ← -1
if pos = ht.firstEmpty then
changeFirstEmpty(ht)
end-if
else
current ← pos
while ht.next[current] ̸= -1 execute
current ← ht.next[current]
end-while
//continued below...
Coalesced chaining - insert

ht.T[ht.firstEmpty] ← k
ht.next[ht.firstEmpty] ← - 1
ht.next[current] ← ht.firstEmpty
changeFirstEmpty(ht)
end-if
end-subalgorithm

Complexity: Θ(1) on average, Θ(n) - worst case
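A runnable keys-only sketch of the structure and of insert (class name ours; resize-and-rehash replaced by an exception); the usage reproduces the example table built above:

```python
class CoalescedHT:
    """Coalesced chaining: all elements live inside the table; next[] links
    the chains and firstEmpty is the leftmost free slot (keys only)."""
    def __init__(self, m, h):
        self.m, self.h = m, h
        self.T = [None] * m          # None marks an empty position
        self.next = [-1] * m
        self.firstEmpty = 0

    def _advance_first_empty(self):
        while self.firstEmpty < self.m and self.T[self.firstEmpty] is not None:
            self.firstEmpty += 1

    def insert(self, k):
        if self.firstEmpty == self.m:
            raise OverflowError("table full: resize and rehash")
        pos = self.h(k)
        if self.T[pos] is None:      # home slot free
            self.T[pos] = k
            self.next[pos] = -1
            if pos == self.firstEmpty:
                self._advance_first_empty()
        else:                        # walk the chain, append at firstEmpty
            cur = pos
            while self.next[cur] != -1:
                cur = self.next[cur]
            self.T[self.firstEmpty] = k
            self.next[self.firstEmpty] = -1
            self.next[cur] = self.firstEmpty
            self._advance_first_empty()

    def search(self, k):
        pos = self.h(k)
        while pos != -1 and self.T[pos] != k:
            pos = self.next[pos]
        return pos != -1

ht = CoalescedHT(13, lambda k: k % 13)
for k in (5, 18, 16, 15, 13, 31, 26):
    ht.insert(k)
print(ht.T[:7])        # [18, 13, 15, 16, 31, 5, 26]
print(ht.next[:7])     # [1, 4, -1, -1, 6, 0, -1]
print(ht.firstEmpty)   # 7
```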



Coalesced chaining - ChangeFirstEmpty

subalgorithm changeFirstEmpty(ht) is:
//pre: ht is a HashTable
//post: the value of ht.firstEmpty is set to the next free position
ht.firstEmpty ← ht.firstEmpty + 1
while ht.firstEmpty < ht.m and ht.T[ht.firstEmpty] ̸= -1
execute
ht.firstEmpty ← ht.firstEmpty + 1
end-while
end-subalgorithm

Complexity: O(m)

Think about it: Should we keep the free spaces linked in a list as in the case of a linked list on an array?





Coalesced chaining - search

How would you search for an element in a hash table with coalesced chaining?

Even if it is an array, we are not going to search as in an array (i.e., start from position 0 and go until you find the element).

We compute the value of the hash function and check the linked list which starts from that position. If the element is in the table, it should be in this list.



Coalesced chaining - simple examples of remove I

Remove is a tricky operation for coalesced chaining and at first it might not even be clear what situations make it complicated. So let's take 5 simple examples where we will add a few elements in a hash table with coalesced chaining with m = 5 and then we will remove element 11. For every example we will only focus on how to do the removal so that the result is correct for that particular hash table.

firstEmpty is not going to be marked on the following examples; simply assume that it is the first empty position from left to right.



Coalesced chaining - simple examples of remove II

Example 1: insert 11, 8, 3

pos    0   1   2   3   4
elems  3  11       8
next  -1  -1  -1   0  -1

Remove 11
pos    0   1   2   3   4
elems  3           8
next  -1  -1  -1   0  -1

In this case we just mark the position as empty.



Coalesced chaining - simple examples of remove III

Example 2: insert 56, 8, 11, 12

pos    0   1   2   3   4
elems 11  56  12   8
next  -1   0  -1  -1  -1

Remove 11
pos    0   1   2   3   4
elems     56  12   8
next  -1  -1  -1  -1  -1

In this case we remove 11 in the same way as we remove an element from the end of a linked list on an array.



Coalesced chaining - simple examples of remove IV

Example 3: insert 11, 20, 56

pos    0   1   2   3   4
elems 20  11  56
next  -1   2  -1  -1  -1

Remove 11
pos    0   1   2   3   4
elems 20  56
next  -1  -1  -1  -1  -1

Now we need to remove the first element of the linked list. But position 1 cannot be empty (because a search for 56 would start from position 1), so we move 56 to replace 11.



Coalesced chaining - simple examples of remove V

Example 4: insert 56, 11, 12, 1

pos    0   1   2   3   4
elems 11  56  12   1
next   3   0  -1  -1  -1

Remove 11
pos    0   1   2   3   4
elems     56  12   1
next  -1   3  -1  -1  -1

We remove 11 in the same way in which we remove an element from the middle of a linked list on an array.



Coalesced chaining - simple examples of remove VI

Example 5: insert 56, 11, 20, 13

pos    0   1   2   3   4
elems 11  56  20  13
next   2   0  -1  -1  -1

Remove 11
pos    0   1   2   3   4
elems 20  56      13
next  -1   0  -1  -1  -1

Position 0 cannot become empty, since then 20 would not be found, so we move 20 to replace 11.



Coalesced chaining - simple examples of remove VII

Let's see a few more complicated examples (based on the hash table built previously).



Coalesced chaining - remove

A hash table with coalesced chaining is essentially an array in which we have multiple singly linked lists. Can we remove an element like we remove from a regular singly linked list? Just set the next of the previous element to jump over it?

For example, if from the previously built hash table I want to remove element 18, can we just do it like that?

pos    0   1   2   3   4   5   6   7   8   9  10  11  12
T         13  15  16  31   5  26
next  -1   4  -1  -1   6   1  -1  -1  -1  -1  -1  -1  -1

firstEmpty = 0



If we remove 18 simply by setting the next of 5 to be 13, we will never be able to find 13 and 26, because a search for them is going to start from position 0, and that position being empty, we will never check any other position.

Obs 1: Some positions from the linked list of elements are not allowed to become empty (specifically, the ones which are equal to the value of the hash function of any element from the linked list).





Would it then be a solution to move every element to the previous position in the linked list?

For example, if we remove 18, we would have:

pos    0   1   2   3   4   5   6   7   8   9  10  11  12
T     13  31  15  16  26   5
next   1   4  -1  -1  -1   0  -1  -1  -1  -1  -1  -1  -1
firstEmpty = 6

For this example, it would work. This hash table is now correct and every element can be found in it. But what if now we remove 5? Is the hash table below correct?

pos    0   1   2   3   4   5   6   7   8   9  10  11  12
T     31  26  15  16      13
next   1  -1  -1  -1  -1   0  -1  -1  -1  -1  -1  -1  -1
firstEmpty = 4
Now element 13 is not going to be found, because a search for 13 starts from position 0, but 13 is currently on a position before 0 in the linked list.

Obs 2: Not any element can get to any position in the linked list (specifically, no element is allowed to be on a position which comes before the position to which it hashes).



Considering the cases discussed previously, we can describe how remove should work:
Compute the value of the hash function for the element; let's call it p.
Starting from p follow the links in the hash table to find the element.
If the element is not found, we want to remove something which is not there, so there is nothing to do. Assume we do find it, on position elem_pos.
Starting from position elem_pos search for another element in the linked list which should be on that position. If you find one, let's say on position other_pos, move the element from other_pos to elem_pos and restart the remove process for other_pos.
If no element is found which hashes to elem_pos, you can simply remove the element, like in the case of a singly linked list, setting its previous to point to its next.



Coalesced chaining - remove
subalgorithm remove(ht, elem) is:
pos ← ht.h(elem)
prevpos ← -1 //find the element to be removed and its previous
while pos ̸= -1 and ht.T[pos] ̸= elem execute:
prevpos ← pos
pos ← ht.next[pos]
end-while
if pos = -1 then
@element does not exist
else
over ← false //becomes true when nothing hashes to pos
repeat
p ← ht.next[pos]
pp ← pos //previous of p
while p ̸= -1 and ht.h(ht.T[p]) ̸= pos execute
pp ← p
p ← ht.next[p]
end-while
//continued below
if p = -1 then
over ← true //no element hashes to pos
else
ht.T[pos] ← ht.T[p] //move element from position p to pos
prevpos ← pp
pos ← p
end-if
until over
//now element from pos can be removed (no element hashes to it)
if prevpos = -1 then //see next slide for explanation
idx ← 0
while (idx < ht.m and prevpos = -1) execute
if ht.next[idx] = pos then
prevpos ← idx
else
idx ← idx + 1
end-if
end-while
end-if
//continued below...




if prevpos ̸= -1 then
ht.next[prevpos] ← ht.next[pos]
end-if
ht.T[pos] ← -1
ht.next[pos] ← -1
if ht.firstEmpty > pos then
ht.firstEmpty ← pos
end-if
end-if
end-subalgorithm

Complexity: O(m), but Θ(1) on average
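The same algorithm can be sketched on parallel Python arrays (firstEmpty bookkeeping omitted for brevity); the usage replays Example 5 from earlier, where removing 11 pulls 20 up to position 0:

```python
def remove(T, nxt, h, elem):
    """Remove elem from a coalesced-chaining table given as parallel arrays
    T (keys, None = empty) and nxt (links); h is the hash function.
    Repeatedly pull up a later element that hashes to the freed slot,
    then unlink the finally-freed position from its chain."""
    pos, prev = h(elem), -1
    while pos != -1 and T[pos] != elem:        # find elem and its previous
        prev, pos = pos, nxt[pos]
    if pos == -1:
        return False                            # not in the table
    while True:                                 # until nothing hashes to pos
        p, pp = nxt[pos], pos
        while p != -1 and h(T[p]) != pos:       # later element hashing to pos?
            pp, p = p, nxt[p]
        if p == -1:
            break
        T[pos], prev, pos = T[p], pp, p         # move it up; free p instead
    if prev == -1:                              # pos may still have a previous
        for idx in range(len(T)):
            if nxt[idx] == pos:
                prev = idx
                break
    if prev != -1:
        nxt[prev] = nxt[pos]                    # unlink pos from its chain
    T[pos], nxt[pos] = None, -1
    return True

# Example 5 from the slides: m = 5, elements 56, 11, 20, 13
T   = [11, 56, 20, 13, None]
nxt = [2, 0, -1, -1, -1]
remove(T, nxt, lambda k: k % 5, 11)
print(T, nxt)    # [20, 56, None, 13, None] [-1, 0, -1, -1, -1]
```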



What happens when the element you need to remove is right on the position where it should be? In this case we might assume that the element has no previous (and in the above implementation its prevpos will be -1), but this is not true. It might happen that an element is on its position, but it still has a previous element. For example, element 13:

pos    0   1   2   3   4   5   6   7   8   9  10  11  12
T     13  31  15  16  26   5
next   1   4  -1  -1  -1   0  -1  -1  -1  -1  -1  -1  -1
firstEmpty = 6

If we wanted to remove 13, it would be OK, because 26 would be moved in its place, but if no other element hashed to position 0 and we just made its next -1, element 31 would never be found.



This is why we have the while loop in the remove code when
prevpos is −1: we go through the table and see if there is an
element whose next is pos, because this element would then
be the previous of pos.
This while loop happens rarely, only when an element is found
on the position where it hashes and no other element hashes
to its position. Nevertheless, having a while loop which goes
through all the elements of the table is not a very hash
table-like operation and it increases the complexity of the
function.



Coalesced chaining - iterator

How can we define an iterator for a hash table with coalesced chaining? What should the following operations do?
init
getCurrent
next
valid

How can we implement a sorted container on a hash table with coalesced chaining? How can we implement its iterator?



Summary

Today we have talked about:

Separate chaining

Coalesced chaining

