Hashing
Sagheer Ahmed
Department of CS
Air University, Islamabad
[email protected]
Introduction To Searching
• Linear Search & Binary Search
• Locate an item by a sequence of comparisons
• Item being sought – repeatedly compared with items in the list
• Fast Searching
• Location of an item is determined directly as a function of the item itself
• No trial-and-error comparisons
• Ideally, the time required to locate an item is constant and does not depend on the number of items stored
• Hash tables and hash functions
Why Hashing?
• The volume of content keeps growing, especially on the internet
• It becomes impossible to find anything unless new data structures and algorithms for storing and accessing data are developed.
• Problem with traditional data structures like arrays and linked lists?
• Sorted array -> binary search -> time complexity = O(log n)
• Unsorted array -> linear search -> time complexity = O(n)
• Either case may be undesirable if we need to process a very large data set.
• Hashing is a technique that allows us to update and retrieve any entry in constant time, O(1), on average. Constant time, or O(1) performance, means that the time to perform the operation does not depend on the data size n.
Applications of Hashing
[Figure: array indexed 0 to 10 holding the values 1, 3, 4, 5, 6, 7 and 9, with -1 in the empty cells]
A simple example – direct hashing
table[number] = number
• Hash function:
• The function h, defined by h(i) = i, that determines the location of an item i in the hash table is called a hash function
A simple example – direct hashing
• 7 integers in the range 0–999 are to be stored in a hash table
• The hash table can be implemented by:
• An integer array, table, with one cell per possible value.
• Initialize each array element with some dummy value, like -1.
• Store value i at location table[i].
A simple example – direct hashing
• For the hash function h(i) = i:
• The time required to search the table for a given item is constant: only one location needs to be examined
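The direct hashing steps above can be sketched in C++ as follows. This is a minimal illustration, not the course's reference solution: the names kTableSize, kEmpty, DirectTable, initTable, store and contains are assumptions, as is the choice of std::array for the table.

```cpp
#include <array>
#include <cassert>

// Assumed for illustration: keys lie in 0..999 and -1 marks an empty cell.
constexpr int kTableSize = 1000;
constexpr int kEmpty = -1;

using DirectTable = std::array<int, kTableSize>;

// h(i) = i: the key itself is the array index.
int h(int i) { return i; }

// Fill every cell with the dummy value.
void initTable(DirectTable& table) { table.fill(kEmpty); }

// Store value i at location table[i].
void store(DirectTable& table, int value) {
    assert(value >= 0 && value < kTableSize);
    table[h(value)] = value;
}

// Examine exactly one location, however many items are stored.
bool contains(const DirectTable& table, int value) {
    if (value < 0 || value >= kTableSize) return false;
    return table[h(value)] == value;
}
```

The cost of this simplicity is space: the array needs one cell for every possible key, even when only 7 values are stored.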
[Figure: hash table array holding the values 1, 3, 4, 5, 6, 7 and 9, with -1 in the empty cells]
Hashing and hash functions
The hash function h(i) = i % 25 always produces an integer in the range 0–24. For example:
h(52) = 52 % 25 = 2
[Figure: hash table with cells indexed 0 to 24]
Hash tables – formal definition
• The hash table structure is an array of some fixed size, containing the items.
• A stored item needs to have a data member, called the key, that is used to compute the index value for the item.
• The key could be an integer, a string, etc.
• e.g. a name or ID that is part of a large employee structure
• The size of the array is TableSize.
• The items stored in the hash table are indexed by values from 0 to TableSize – 1.
• Each key is mapped into some number in the range 0 to TableSize – 1.
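As a rough sketch of the definition above, the snippet below shows a record with a key member and a function that maps the key into 0 to TableSize – 1. The Employee type, its fields and the name hashIndex are hypothetical, and taking the key modulo the table size is only one possible mapping.

```cpp
#include <cassert>
#include <string>

// Hypothetical record type: the id field plays the role of the key.
struct Employee {
    long id;           // key used to compute the index
    std::string name;  // rest of the record
};

// Map a non-negative key to an index in 0..tableSize-1.
int hashIndex(long key, int tableSize) {
    assert(key >= 0 && tableSize > 0);
    return static_cast<int>(key % tableSize);
}
```

For example, hashIndex(52, 25) evaluates to 2, matching h(52) = 52 % 25 = 2.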
...
Example
• Each record has a special field, called its key.
• In this example, the key is the ID of an individual – a long integer
[Figure: a record with ID 506643548 shown at array slot [4]]
...
Example
• The rest of the record has information about the person.
[Figure: a record with ID 506643548 shown at array slot [4]]
...
Example
• When a hash table is in use, some spots contain valid records, and
other spots are empty
...
Example: Inserting a New Record
• To insert a new record, the key must somehow be converted to an array index
[Figure: a new record with ID 580625685]
...
Example: Inserting a New Record
...
Example: Inserting a New Record
...
Hash function examples for integer keys
• Let us consider a hash table of size 1000
• Truncation: if students have a 9-digit identification number, take the last 3 digits as the table position
• E.g. 925371622 becomes 622
• For a string key: add up the ASCII values of all characters of the key and take the sum modulo the table size
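The two hash functions described above could look roughly like this in C++. The function names hashByTruncation and hashByCharSum and the constant kTableSize are assumptions made for the sketch; the table size of 1000 comes from the example.

```cpp
#include <cassert>
#include <string>

// Table size taken from the example above.
constexpr int kTableSize = 1000;

// Truncation: keep the last 3 digits of a non-negative ID.
int hashByTruncation(long id) {
    assert(id >= 0);
    return static_cast<int>(id % kTableSize);
}

// Sum the character codes of the key, then reduce modulo the table size.
int hashByCharSum(const std::string& key) {
    unsigned long sum = 0;
    for (unsigned char c : key) sum += c;
    return static_cast<int>(sum % kTableSize);
}
```

Taking the last 3 digits is the same as computing id % 1000, which is why truncation fits a table of size 1000. Note that the character sum gives the same index for any two strings made of the same characters in a different order.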
Collision
If, when an element is inserted, it hashes to the same value as an
already inserted element, then we have a collision and need to
resolve it.
Collision Resolution
Separate Chaining
• The idea is to keep a list of all elements that hash to the
same value.
• The array elements are pointers to the first nodes of the lists.
• A new item is inserted at the front of the list.
• Advantages:
• Better space utilization for large items.
• Simple collision handling: searching linked list.
• Overflow: we can store more items than the hash table size.
• Deletion is quick and easy: deletion from the linked list.
Example
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key % 10.
Index   Chain (front to back)
0       0
1       81 -> 1
2       (empty)
3       (empty)
4       64 -> 4
5       25
6       36 -> 16
7       (empty)
8       (empty)
9       49 -> 9
Operations
• Initialization:
• All entries are set to NULL
• Search:
• Locate the cell using the hash function.
• Sequential search on the linked list in that cell.
• Insertion:
• Locate the cell using the hash function.
• (If the item does not exist) insert it as the first item in the list.
• Deletion:
• Locate the cell using the hash function.
• Delete the item from the linked list in that cell.
Node *hashtable[MAX];
// Insert a value in the hash table: create a node, store the value in the
// node, and place the node at the appropriate location.
void insert(int k);
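One possible way to fill in the declarations above is sketched below. It is not the intended lab solution: the Node layout, the value MAX = 10 (chosen to match hash(key) = key % 10 from the example), and the helper names hashfunction, search and erase are all assumptions.

```cpp
#include <cassert>

// Assumed for illustration: table size 10, matching hash(key) = key % 10.
const int MAX = 10;

// Assumed node layout for a singly linked chain.
struct Node {
    int key;
    Node* next;
};

Node* hashtable[MAX] = {nullptr};  // initialization: all entries are NULL

int hashfunction(int k) {
    assert(k >= 0);  // a negative key would give a negative index
    return k % MAX;
}

// Sequential search on the list in the cell chosen by the hash function.
bool search(int k) {
    for (Node* p = hashtable[hashfunction(k)]; p != nullptr; p = p->next)
        if (p->key == k) return true;
    return false;
}

// Insert k at the front of its list, unless it is already present.
void insert(int k) {
    if (search(k)) return;
    int cell = hashfunction(k);
    hashtable[cell] = new Node{k, hashtable[cell]};
}

// Unlink and free the node holding k; returns false if k was absent.
bool erase(int k) {
    Node** link = &hashtable[hashfunction(k)];
    while (*link != nullptr && (*link)->key != k) link = &(*link)->next;
    if (*link == nullptr) return false;
    Node* victim = *link;
    *link = victim->next;
    delete victim;
    return true;
}
```

Inserting the keys 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 in that order leaves 81 ahead of 1 in cell 1, because each new item goes to the front of its list.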
Open addressing
• Separate chaining has the disadvantage of using linked lists.
• Requires the implementation of a second data structure.
Open Addressing
Linear Probing
• In linear probing, collisions are resolved by sequentially
scanning an array (with wraparound) until an empty cell is
found.
Thus, once 77 collides with 52 at location 2, we simply put 77 in
position 3.
[Figure: hash table with cells indexed 0 to 24]
Linear Probing
[Figure: hash table with cells indexed 0 to 24]
Note: If the search reaches the end of the table, we continue at the first location.
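A minimal sketch of insertion with linear probing is given below, under the assumption that keys are non-negative and that -1 marks an empty cell. The names kEmpty and insertLinear, and the use of std::vector for the table, are choices made for this illustration.

```cpp
#include <cassert>
#include <vector>

// Assumed for illustration: non-negative keys, with -1 marking an empty cell.
constexpr int kEmpty = -1;

// Insert key using linear probing.
// Returns the index used, or -1 if the table is full.
int insertLinear(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    const int size = static_cast<int>(table.size());
    const int home = key % size;
    for (int step = 0; step < size; ++step) {
        int pos = (home + step) % size;  // wraps past the last cell to index 0
        if (table[pos] == kEmpty) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;  // every cell is occupied
}
```

With a table of 25 cells, inserting 52 places it at index 2, and a later insert of 77 (which also hashes to 2) lands at index 3.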
Linear Probing
Quadratic Probing
Quadratic Probing
• Quadratic probing largely eliminates the primary clustering problem of linear probing
• Steps to follow:
• Start from the original hash location i
• If the location is occupied, check locations i+1², i+2², i+3², i+4², ...
• Wrap around the table, if necessary.
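The probe sequence above can be sketched as follows, assuming non-negative keys and -1 as the empty marker. The names kEmpty and insertQuadratic are hypothetical, and giving up after as many probes as there are cells is a design choice of this sketch.

```cpp
#include <cassert>
#include <vector>

// Assumed for illustration: non-negative keys, with -1 marking an empty cell.
constexpr int kEmpty = -1;

// Insert key using quadratic probing: try home, home+1, home+4, home+9, ...
// (all modulo the table size). Returns the index used, or -1 if no empty
// cell was found within table.size() probes.
int insertQuadratic(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    const long long size = static_cast<long long>(table.size());
    const long long home = key % size;
    for (long long i = 0; i < size; ++i) {
        int pos = static_cast<int>((home + i * i) % size);  // wrap around
        if (table[pos] == kEmpty) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;  // may happen even if some cells are still free
}
```

Unlike linear probing, this sequence does not necessarily visit every cell, so an insert can fail while the table still has free slots.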
Quadratic Probing -- Example
Double Hashing
• Example:
• Table size is 11 (0..10)
• Hash functions:
h1(x) = x mod 11
h2(x) = 7 – (x mod 7)
• Insert keys: 58, 14, 91
• 58 mod 11 = 3
• 14 mod 11 = 3 (occupied) -> 3 + 7 = 10
• 91 mod 11 = 3 (occupied) -> 3 + 7 = 10 (occupied) -> (3 + 2*7) mod 11 = 6
• Resulting table:
Index  0  1  2  3   4  5  6   7  8  9  10
Key    -  -  -  58  -  -  91  -  -  -  14
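The example above can be traced with a short C++ sketch. The functions h1 and h2 follow the slide; the names kSize, kEmpty, Table and insertDouble, and the use of -1 for an empty cell, are assumptions.

```cpp
#include <array>
#include <cassert>

// Assumed for illustration: -1 marks an empty cell; table size 11 as in the example.
constexpr int kSize = 11;
constexpr int kEmpty = -1;

using Table = std::array<int, kSize>;

int h1(int x) { return x % kSize; }    // home position
int h2(int x) { return 7 - (x % 7); }  // step size, always in 1..7

// Insert key using double hashing: probe h1, h1 + h2, h1 + 2*h2, ...
// (all modulo the table size). Returns the index used, or -1 if the table is full.
int insertDouble(Table& table, int key) {
    assert(key >= 0);
    const int home = h1(key);
    const int step = h2(key);
    for (int i = 0; i < kSize; ++i) {
        int pos = (home + i * step) % kSize;
        if (table[pos] == kEmpty) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Because the table size 11 is prime and the step is never 0, the probe sequence reaches every cell before repeating.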
class hash
{
public:
    int HashTable[MAX];

    // Hash function to generate the index: hash(key) = key % MAX.
    int hashfunction(int key);

    // Accepts the hash table and the key to be inserted, and inserts the key
    // at the appropriate location in the table. Uses linear probing to resolve
    // collisions. The returned value is the index at which the key is inserted.
    int linear_probing(int HashTable[], int key);     // (name assumed)

    // Inserts values in the table and resolves collisions using quadratic probing.
    int quadratic_probing(int HashTable[], int key);  // (name assumed)

    // Inserts values in the table and resolves collisions using double hashing.
    int double_hashing(int HashTable[], int key);
};
HINTS:
Quadratic probing can be implemented like this, where home = hashfunction(key) and -1 marks an empty cell:
for (i = 0; i < MAX; i++) {
    pos = (home + i * i) % MAX;
    if (HashTable[pos] == -1) break;
}