Course7 Hashing
Course7 Hashing
1 12
Definition
Hashing is a technique used for storing and retrieving information as quickly as possible.
It is used to perform optimal searches
9
8
7
x1 x4 6
hash function
x2 index=h(xi ) 5
x3 4
3
2
1
0
h is called: hash function such that h: S → [0...N − 1]. It maps a set of objects
S = {x1 , x2 , ...xm } into a table of size N.
It is a good idea to pick a prime as the table size to have a better distribution of values.
h(x) represents the primary address of x (it is an index of T)
Example of hash function is MOD function.
2 12
Collisions
Example: Let suppose that we have a hash table T and its size is 100. The hash function
is h(x) = x%100. If x1 = 22 and x2 = 122.
x1 and x2 will be mapped to the same location based our hash function.
This is called Collision
Definition: Collision is the condition where two items or more are stored in the same
location.
When collisions occur, we need to “handle” them using Collision Resolution
Techniques (see later).
Collisions can be reduced with a selection of a good hash function.
3 12
Characteristics of Good hash functions
4 12
Collision Resolution Techniques
5 12
Separate Chaining
Operations:
▶ f ind: The search for x begins in the cell
h(x). If x is not in that cell, we search in
the corresponding linked list.
▶ insert: the insertion begins with the
search for the key x. If x does not exist,
we add it in the cell h(x) if it is empty
otherwise we add it in the linked list
associated with this cell.
▶ delete: if x exists, delete it and
rearrange the table.
6 12
Linear Probing
In linear probing, we search the hash table sequentially, starting from the original hash
location.
If a location is occupied, we check the next locations (h(x)-1, h(x)-2,...) Until finding an
empty location.
( We wrap around from the last table location to the first table location if necessary, it
means the next location of 0 is N-1: circular array).
There must always be at least one empty cell in the table (one position is sacrificed)
Example: insert the following items into an array of size=10, h(x)=x%size:
55, 21, 63, 103, 10, 82, 18.
7 12
Linear Probing
8 12
Linear Probing
8 12
Linear Probing
8 12
Linear Probing
8 12
Linear Probing
9 12
Linear Probing
9 12
Linear Probing
9 12
Linear Probing
9 12
Linear Probing
Operations:
▶ f ind: The search for x begins in the cell h(x), if the cell doesn’t match, look elsewhere. If if
x does not exist in the table, the find operation stops in the empty cell.
▶ insert: compute h(x), if the cell is full, find another by sequentially searching for the next
available cell: h(x)-1, h(x)-2, ...
▶ delete: Search x. Test all the data above (circularly) until you find an empty cell and
possibly move some values. Clear a cell.
Complexity: In the worse case O(n), in the average case O(1).
10 12
Quadratic Probing
Same idea as linear probing except that collisions are resolved by examining cells:
h(x) − 12 , h(x) − 22 , h(x) − 32 , ...
11 12
Double hashing
Like the Linear probing except that the cyclic sequence of cells to be tested uses a step
calculated by a 2nd hash function h′ (x).
Let h(x) be the function used to calculate the primary address and h′ (x) the function
which calculates the step of the sequence: h(x), h(x) − h′ (x) , h(x) − 2h′ (x) ,
h(x) − 3h′ (x) , ...
For the sequence to be circular, the subtractions are done modulo the size of the table.
As in linear probing, one cell is sacrificed.
12 / 12