Hashing

Uploaded by

Muhammad Haroon
DATA STRUCTURES AND ALGORITHMS

Week 13: Hashing

Sagheer Ahmed
Department of CS
Air University, Islamabad
Introduction To Searching
• Linear Search & Binary Search
• Locate an item by a sequence of comparisons
• Item being sought – repeatedly compared with items in the list
• Fast Searching
• Location of an item is determined directly as a function of the item itself
• No trial-and-error comparisons
• Ideally, the time required to locate an item is constant and does not depend on the number of items stored
• Hash tables and hash functions
Why Hashing?
• The volume of stored content keeps growing, especially on the internet
• It becomes impractical to find anything unless new data structures and algorithms for storing and accessing data are developed
• Problem with traditional data structures like arrays and linked lists:
  • Sorted array -> binary search -> time complexity = O(log n)
  • Unsorted array -> linear search -> time complexity = O(n)
  • Either case may not be desirable if we need to process a very large data set
• Hashing is a technique that allows us to update and retrieve any entry in constant time, O(1), on average. Constant-time performance means that the time to perform the operation does not depend on the data size n.
Applications of Hashing

• Compilers use hash tables to implement the symbol table (a data structure to keep track of declared variables)
• Game programs
• Spell checking
• Substring pattern matching
• Searching
• Document comparison
When not to use hashing?
• Hash tables are very good if there is a need for many searches in a reasonably stable table
• Hash tables are not so good if there are many insertions and deletions, or if table traversals are needed
• If there are more data than available memory, then use a tree
• Also, hashing is very slow for any operation that requires the entries to be sorted
  • e.g. finding the minimum key
A simple example – direct hashing

• 7 integers, ranging 0-10, to be stored in a hash table:
  • Key = {7, 3, 6, 4, 9, 1, 5}
• The hash table can be implemented by:
  • An integer array, table.
  • Initialize each array element with some dummy value, like –1.
  • Store value i at location table[i].

Index:   0   1   2   3   4   5   6   7   8   9  10
Value:  -1   1  -1   3   4   5   6   7  -1   9  -1
A simple example – direct hashing

• To check whether a particular value, number, is stored in the hash table, we only need to check whether:

table[number] == number

• Hash Function:
  • The function h defined by h(i) = i, which determines the location of an item i in the hash table, is called a hash function
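The direct-hashing idea above can be sketched in C++ as follows. This is only an illustration under the slide's assumptions (keys in the range 0-10, –1 as the dummy value); the names DirectTable, makeTable, insert and contains are hypothetical and do not come from the slides.

```cpp
#include <array>
#include <cassert>

// Dummy value marking an unused slot, as suggested in the slides.
constexpr int EMPTY = -1;
// Assumption: keys lie in 0..10, so the table needs 11 slots.
constexpr int TABLE_SIZE = 11;

using DirectTable = std::array<int, TABLE_SIZE>;

// Build a table in which every slot holds the dummy value.
DirectTable makeTable() {
    DirectTable table;
    table.fill(EMPTY);
    return table;
}

// The hash function of direct hashing: h(i) = i.
int h(int i) { return i; }

// Store key i at table[h(i)]; the key must fit inside the table.
void insert(DirectTable& table, int key) {
    assert(key >= 0 && key < TABLE_SIZE);
    table[h(key)] = key;
}

// A key is present exactly when its own slot holds it.
bool contains(const DirectTable& table, int key) {
    if (key < 0 || key >= TABLE_SIZE) return false;
    return table[h(key)] == key;
}
```

Both insert and contains touch exactly one slot, which is why the cost does not depend on how many keys are stored.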
A simple example – direct hashing
• 7 integers, ranging 0 – 999 to be stored in a hash table
• The hash table can be implemented by:
• An integer array, table.
• Initialize each array element with some dummy value, like –1.
• Store value i at location table[i].

Index:  0   1   ...   997   998   999
A simple example – direct hashing
• For the hash function h(i) = i:
  • Time required to search the table for a given item is constant; only one location needs to be examined
  • Very time efficient, but not space efficient at all
  • 7 out of 1000 locations used, leaving 993 unused locations
• Since it is possible to store 7 values in 7 locations, we can improve on space utilization
Hash functions
Key = {7, 3, 6, 4, 9, 1, 5}

• One possible hash function could be: h(i) = i modulo 7
• Or in C++ syntax:

    int h(int i)
    {
        return i % 7;
    }

Index:  0   1   2   3   4   5   6
Value:  7   1   9   3   4   5   6
Hashing and hash functions
With a table of size 25, the hash function h(i) = i % 25 always produces an integer in the range 0–24.

The integer 52 is thus stored in table[2], since:

h(52) = 52 % 25 = 2

Similarly, 129, 500 and 49 are stored in locations 4, 0 and 24 respectively.

Hash Table
Index:  0     1    2    3    4     ...   24
Value:  500   -1   52   -1   129   ...   49
Hash tables – formal definition
• The hash table structure is an array of some fixed size, containing the items.
• A stored item needs to have a data member, called key, that will be used in computing the index value for the item.
  • Key could be an integer, a string, etc.
  • e.g. a name or ID that is part of a large employee structure
• The size of the array is TableSize.
• The items that are stored in the hash table are indexed by values from 0 to TableSize – 1.
• Each key is mapped into some number in the range 0 to TableSize – 1.
• The mapping is implemented through a hash function.
Example
Items (john 25000, phil 31250, dave 27500, mary 28200) -> key -> Hash Function -> index -> Hash Table

Hash Table
Index   Item
0
1
2
3       john 25000
4       phil 31250
5
6       dave 27500
7       mary 28200
8
9

    int hash(int key)
    {
        return key % table_size;
    }

    void insert(int key)
    {
        int h = hash(key);
        table[h] = key;
    }
Example
• The simplest kind of hash table is an array of records.
• This example has 701 records.

[0]  [1]  [2]  [3]  [4]  [5]  ...  [700]
Example
• Each record has a special field, called its key.
• In this example, the key is the ID of an individual – a long integer
[0]  [1]  [2]  [3]  [4] ID: 506643548  [5]  ...  [700]
Example
• The rest of the record has information about the person.

[0]  [1]  [2]  [3]  [4] ID: 506643548  [5]  ...  [700]
Example
• When a hash table is in use, some spots contain valid records, and
other spots are empty

Index   Record
[0]     (empty)
[1]     Number 281942902
[2]     Number 233667136
[3]     (empty)
[4]     Number 506643548
[5]     (empty)
...
[700]   Number 155778322
Example: Inserting a New Record
• In order to insert a new record, the key must somehow be converted to an array index
• The index is called the hash value of the key
• New record to insert: ID: 580625685

Index   Record
[0]     (empty)
[1]     Number 281942902
[2]     Number 233667136
[3]     (empty)
[4]     Number 506643548
[5]     (empty)
...
[700]   Number 155778322
Example: Inserting a New Record

• Simplest hash function: (ID mod 701)
• For the new record with ID: 580625685: (580625685 mod 701) = 3

Index   Record
[0]     (empty)
[1]     Number 281942902
[2]     Number 233667136
[3]     (empty)
[4]     Number 506643548
[5]     (empty)
...
[700]   Number 155778322
Example: Inserting a New Record

• The new record is inserted at location 3 in the hash table

Index   Record
[0]     (empty)
[1]     Number 281942902
[2]     Number 233667136
[3]     ID: 580625685
[4]     Number 506643548
[5]     (empty)
...
[700]   Number 155778322
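The insertion walked through above can be sketched in C++ as follows. This is a minimal illustration that assumes the slot is free (collision handling comes later in the slides); the names Record, RecordTable, hashId and insertRecord are hypothetical, and a record is reduced to its ID field.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The example table has 701 slots, indexed [0]..[700].
constexpr std::size_t RECORD_TABLE_SIZE = 701;
// Assumption: -1 marks a slot that holds no record.
constexpr long long NO_RECORD = -1;

// A record reduced to its key; real records carry more fields.
struct Record {
    long long id = NO_RECORD;
};

using RecordTable = std::array<Record, RECORD_TABLE_SIZE>;

// Simplest hash function from the slides: ID mod 701.
std::size_t hashId(long long id) {
    assert(id >= 0);
    return static_cast<std::size_t>(id % static_cast<long long>(RECORD_TABLE_SIZE));
}

// Place the record at its hash value and return that index.
// This sketch assumes the slot is still free.
std::size_t insertRecord(RecordTable& table, long long id) {
    const std::size_t index = hashId(id);
    assert(table[index].id == NO_RECORD);
    table[index].id = id;
    return index;
}
```

With these definitions, inserting ID 580625685 into a fresh table lands at index 3, matching the slide.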
Hash function examples for integer keys
• Let us consider a hash table of size 1000
• Truncation: If students have a 9-digit identification number, take the last 3 digits as the table position
  • E.g. 925371622 becomes 622
• Folding: Split a 9-digit number into three 3-digit numbers, and add them
  • E.g. 925371622 becomes 925 + 371 + 622 = 1918
• Modular arithmetic: If the table size is 1000, the first example always stays within the table range, but the second example does not (it should be taken mod 1000, giving 1918 mod 1000 = 918)
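Truncation and folding can be sketched in C++ as below. The function names truncationHash and foldingHash are hypothetical, and the sketch assumes non-negative IDs that are split into 3-digit groups starting from the right.

```cpp
#include <cassert>

// Truncation: keep only the last three digits of the ID.
int truncationHash(long long id) {
    assert(id >= 0);
    return static_cast<int>(id % 1000);
}

// Folding: add up the 3-digit groups of the ID, then reduce the
// sum modulo the table size so it stays within the table range.
int foldingHash(long long id, int tableSize) {
    assert(id >= 0 && tableSize > 0);
    long long sum = 0;
    while (id > 0) {
        sum += id % 1000;  // take the rightmost 3-digit group
        id /= 1000;        // drop that group
    }
    return static_cast<int>(sum % tableSize);
}
```

For the ID 925371622 and a table of size 1000, truncation gives 622 and folding gives (925 + 371 + 622) mod 1000 = 918.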
Hash function
• A hash function should be easy and fast to compute
• A hash function should scatter the data evenly throughout the hash table.
  • How well does the hash function scatter random data?
  • How well does the hash function scatter non-random data?
• If the input keys are integers, then simply Key mod TableSize is a general strategy.
• If the keys are strings, the hash function needs more care
  • First convert the key into a numeric value.
Hash functions for non-numeric keys

• Add up the ASCII values of all characters of the key and take the result mod the table size (Java syntax):

    int h(String x, int M)
    {
        char[] ch = x.toCharArray();
        int sum = 0;
        for (int i = 0; i < ch.length; i++)
            sum += ch[i];      // add the character code
        return sum % M;
    }

Apple = (65+112+112+108+101) % 27 = 498 % 27 = (498-486) = 12
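For comparison with the rest of the C++ code in these slides, the same character-sum hash could be written in C++ roughly as follows. The name stringHash is hypothetical, and the sketch assumes single-byte (ASCII) characters.

```cpp
#include <cassert>
#include <string>

// Add up the character codes of the key and reduce the sum
// modulo the table size.
int stringHash(const std::string& key, int tableSize) {
    assert(tableSize > 0);
    int sum = 0;
    for (unsigned char ch : key) {
        sum += ch;  // character code, e.g. 'A' is 65 in ASCII
    }
    return sum % tableSize;
}
```

With a table size of 27, "Apple" hashes to 12 as in the worked example. Because the sum ignores character order, keys that are anagrams of each other (such as "listen" and "silent") always hash to the same index, which is one reason string keys need more care.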
Collision
If, when an element is inserted, it hashes to the same value as an already inserted element, then we have a collision and need to resolve it.
Collision Resolution

Separate Chaining
• The idea is to keep a list of all elements that hash to the same value.
  • The array elements are pointers to the first nodes of the lists.
  • A new item is inserted at the front of the list.
• Advantages:
  • Better space utilization for large items.
  • Simple collision handling: searching a linked list.
  • Overflow: we can store more items than the hash table size.
  • Deletion is quick and easy: deletion from the linked list.
Example
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key % 10.
Index   Chain (each new item is inserted at the front)
0       0
1       81 -> 1
2
3
4       64 -> 4
5       25
6       36 -> 16
7
8
9       49 -> 9
Operations
• Initialization:
  • All entries are set to NULL
• Search:
  • Locate the cell using the hash function.
  • Sequential search on the linked list in that cell.
• Insertion:
  • Locate the cell using the hash function.
  • (If the item does not exist) insert it as the first item in the list.
• Deletion:
  • Locate the cell using the hash function.
  • Delete the item from the linked list in that cell.
    const int MAX = 10;   // number of cells (assumed value; choose to suit the data)

    class Node {
    public:
        int key;
        Node* next;
    };

    class hash {
    public:
        Node* hashtable[MAX];

        // Constructor: initialize the array of pointers to NULL
        hash();

        // Hash function that generates the hash index
        int hashfunction(int key);

        // Insert a value in the hash table: create a node, store the value
        // in the node, and place the node at the appropriate location.
        void insert(int k);

        // Display the complete data in the hash table
        void display();
    };
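One possible way to fill in the separate-chaining operations listed earlier is sketched below. It is an assumption-laden illustration, not the intended solution to the exercise: the class name ChainedHashTable and the helper front are hypothetical, MAX is fixed at 10 to match the key % 10 example, and keys are assumed to be non-negative.

```cpp
#include <cassert>

const int MAX = 10;  // assumed table size, matching hash(key) = key % 10

struct Node {
    int key;
    Node* next;
};

class ChainedHashTable {
public:
    // Initialization: all entries are set to null.
    ChainedHashTable() {
        for (int i = 0; i < MAX; i++) table[i] = nullptr;
    }

    // Free every node of every chain.
    ~ChainedHashTable() {
        for (int i = 0; i < MAX; i++) {
            while (table[i] != nullptr) {
                Node* victim = table[i];
                table[i] = victim->next;
                delete victim;
            }
        }
    }

    // Copying is disabled to keep ownership of the nodes simple.
    ChainedHashTable(const ChainedHashTable&) = delete;
    ChainedHashTable& operator=(const ChainedHashTable&) = delete;

    int hashFunction(int key) const { return key % MAX; }

    // Search: locate the cell, then scan its list sequentially.
    bool search(int key) const {
        for (Node* n = table[hashFunction(key)]; n != nullptr; n = n->next) {
            if (n->key == key) return true;
        }
        return false;
    }

    // Insertion: if the key is not present, put it at the front of its list.
    void insert(int key) {
        if (search(key)) return;
        const int index = hashFunction(key);
        table[index] = new Node{key, table[index]};
    }

    // Deletion: locate the cell, then unlink the node from its list.
    bool remove(int key) {
        Node** link = &table[hashFunction(key)];
        while (*link != nullptr) {
            if ((*link)->key == key) {
                Node* victim = *link;
                *link = victim->next;
                delete victim;
                return true;
            }
            link = &(*link)->next;
        }
        return false;
    }

    // Key at the front of a chain, or -1 if the chain is empty.
    int front(int index) const {
        assert(index >= 0 && index < MAX);
        return table[index] != nullptr ? table[index]->key : -1;
    }

private:
    Node* table[MAX];
};
```

Inserting the keys 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 in that order reproduces the chains of the earlier example, for instance 36 -> 16 in cell 6, because each new item goes to the front of its list.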
Open addressing
• Separate chaining has the disadvantage of using linked lists.
  • Requires the implementation of a second data structure.
• In an open addressing hashing system, all the data go inside the table.
• If a collision occurs, alternative cells are tried until an empty cell is found.
Open Addressing

• There are three common collision resolution strategies:
  • Linear probing
  • Quadratic probing
  • Double hashing
Linear Probing
• In linear probing, collisions are resolved by sequentially scanning an array (with wraparound) until an empty cell is found.
• Thus, once 77 collides with 52 at location 2, we simply put 77 in position 3.

Hash Table
Index:  0     1    2    3    4     ...   24
Value:  500   -1   52   77   129   ...   49
Linear Probing

Hash Table
Index:  0     1    2    3    4     5     ...   24
Value:  500   -1   52   77   129   102   ...   49

To insert 102, we follow the probe sequence consisting of locations 2, 3, 4, and 5 to find the first available location, and thus store 102 in table[5].

Note: If the search reaches the end of the table, we continue at the first location.
Linear Probing

• To determine whether a specified value is in the hash table, we first apply the hash function to compute the position at which this value should be found.
• There can be one of the following cases:
  • The location is empty: the value is not in the table
  • The location contains the specified value: the value is found
  • The location contains some other value:
    • Begin a circular linear search until either the item is found or we reach an empty location or the starting location.
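The three search cases can be sketched in C++ as below. The function name linearProbeSearch is hypothetical; the sketch assumes non-negative keys, –1 as the empty marker, h(x) = x mod table size, and a table from which nothing has been deleted (deletions would need extra bookkeeping that the slides do not cover).

```cpp
#include <cassert>
#include <vector>

constexpr int EMPTY = -1;  // assumed marker for an unused cell

// Return the index at which key is stored, or -1 if it is absent.
int linearProbeSearch(const std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const int size = static_cast<int>(table.size());
    const int home = key % size;           // position given by the hash function
    for (int i = 0; i < size; i++) {
        const int pos = (home + i) % size; // circular scan with wraparound
        if (table[pos] == key) return pos; // the location contains the value
        if (table[pos] == EMPTY) return -1; // an empty location ends the search
    }
    return -1;                             // back at the starting location
}
```

On the 25-cell table from the earlier slides, searching for 102 probes locations 2, 3, 4 and 5 and succeeds at 5, while searching for 27 stops at the first empty cell after that cluster.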
Linear Probing -- Example
• Example:
  • Table Size is 11 (0..10)
  • Hash Function: h(x) = x mod 11
  • Insert keys: 20, 30, 2, 13, 25, 24, 10, 9
    • 20 mod 11 = 9
    • 30 mod 11 = 8
    • 2 mod 11 = 2
    • 13 mod 11 = 2 (occupied) -> 2+1 = 3
    • 25 mod 11 = 3 (occupied) -> 3+1 = 4
    • 24 mod 11 = 2 (occupied) -> 2+1, 2+2, 2+3 = 5
    • 10 mod 11 = 10
    • 9 mod 11 = 9 (occupied) -> 9+1, (9+2) mod 11 = 0

Resulting table:
Index:  0   1   2   3    4    5    6   7   8    9    10
Value:  9       2   13   25   24           30   20   10
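The insertions in this example can be reproduced with a short C++ sketch. The name linearProbeInsert is hypothetical; the code assumes non-negative keys, –1 as the empty marker, and h(x) = x mod table size.

```cpp
#include <cassert>
#include <vector>

constexpr int EMPTY = -1;  // assumed marker for an unused cell

// Insert key using linear probing.
// Returns the index where the key was stored, or -1 if the table is full.
int linearProbeInsert(std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const int size = static_cast<int>(table.size());
    const int home = key % size;            // h(x) = x mod TableSize
    for (int i = 0; i < size; i++) {
        const int pos = (home + i) % size;  // next cell, with wraparound
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;                              // every cell was occupied
}
```

Starting from an empty table of size 11 and inserting 20, 30, 2, 13, 25, 24, 10, 9 in that order stores them at indices 9, 8, 2, 3, 4, 5, 10 and 0, as in the table above.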
Linear Probing -- Clustering Problem
• One of the problems with linear probing is that table items tend to cluster together in the hash table.
  • i.e. the table contains groups of consecutively occupied locations.
• This phenomenon is called primary clustering.
• Clusters can get close to one another, and merge into a larger cluster.
• Thus, one part of the table might be quite dense, even though another part has relatively few items.
• Primary clustering causes long probe searches, and therefore decreases the overall efficiency.
Clustering Problem

• As long as the table is big enough, a free cell can always be found, but the time to do so can get quite large
• Larger table sizes are preferred
• Studies suggest the use of tables whose capacities are approx. 1.5 to 2 times the number of items that must be stored
Quadratic Probing
• Quadratic probing eliminates the primary clustering problem of linear probing.
• If the hash function evaluates to h and a search in cell h is inconclusive, we try cells h + 1^2, h + 2^2, …, h + i^2.
  • i.e. it examines cells 1, 4, 9 and so on away from the original probe.
• Subsequent probe points are a quadratic number of positions from the original probe point.
Quadratic Probing
• Quadratic probing: almost eliminates the clustering problem
• Steps to follow:
  • Start from the original hash location i
  • If the location is occupied, check locations i+1^2, i+2^2, i+3^2, i+4^2, ...
  • Wrap around the table, if necessary.
Quadratic Probing -- Example

• Table Size is 11 (0..10)
• Hash Function: h(x) = x mod 11
• Insert keys: 20, 30, 2, 13, 25, 24, 10, 9
  • 20 mod 11 = 9
  • 30 mod 11 = 8
  • 2 mod 11 = 2
  • 13 mod 11 = 2 (occupied) -> 2+1^2 = 3
  • 25 mod 11 = 3 (occupied) -> 3+1^2 = 4
  • 24 mod 11 = 2 (occupied) -> 2+1^2, 2+2^2 = 6
  • 10 mod 11 = 10
  • 9 mod 11 = 9 (occupied) -> 9+1^2, (9+2^2) mod 11, (9+3^2) mod 11 = 7

Resulting table:
Index:  0   1   2   3    4    5   6    7   8    9    10
Value:          2   13   25       24   9   30   20   10
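A quadratic-probing insert that reproduces this example might look like the sketch below. The name quadraticProbeInsert is hypothetical; the code assumes non-negative keys, –1 as the empty marker, and h(x) = x mod table size. Each probe is measured from the original hash location, not from the previous probe.

```cpp
#include <cassert>
#include <vector>

constexpr int EMPTY = -1;  // assumed marker for an unused cell

// Insert key using quadratic probing: try home, home + 1^2, home + 2^2, ...
// Returns the index where the key was stored, or -1 if no free cell was found.
int quadraticProbeInsert(std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const int size = static_cast<int>(table.size());
    const int home = key % size;  // original hash location
    for (int i = 0; i < size; i++) {
        // Offset i*i from the home position, wrapped around the table.
        const long long offset = static_cast<long long>(i) * i;
        const int pos = static_cast<int>((home + offset) % size);
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;  // the probe sequence found no empty cell
}
```

Note that, unlike linear probing, this probe sequence does not necessarily visit every cell, so the function can return -1 even when the table still has free cells; keeping the table size prime and the table at most half full avoids that situation.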
Double Hashing

• A second hash function is used to drive the collision resolution.
• We apply a second hash function to x and probe at a distance hash2(x), 2*hash2(x), … and so on.
• The function hash2(x) must never evaluate to zero
Double Hashing

• Double hashing also reduces clustering.
• Idea: increment using a second hash function h2, which should satisfy:
  • h2(key) ≠ 0
  • h2 ≠ h1
• Probe the following locations (wrapping around the table) until an unoccupied place is found:
  • h1(key)
  • h1(key) + h2(key)
  • h1(key) + 2*h2(key), ...
Double Hashing -- Example

• Example:
  • Table Size is 11 (0..10)
  • Hash Functions:
    h1(x) = x mod 11
    h2(x) = 7 – (x mod 7)
  • Insert keys: 58, 14, 91
    • 58 mod 11 = 3
    • 14 mod 11 = 3 (occupied) -> 3+7 = 10
    • 91 mod 11 = 3 (occupied) -> 3+7, (3+2*7) mod 11 = 6

Resulting table:
Index:  0   1   2   3    4   5   6    7   8   9   10
Value:              58           91               14
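The double-hashing example can be traced with the following C++ sketch. The function names h1, h2 and doubleHashInsert are illustrative; h2 is the example's 7 – (x mod 7), and the code assumes non-negative keys and –1 as the empty marker.

```cpp
#include <cassert>
#include <vector>

constexpr int EMPTY = -1;  // assumed marker for an unused cell

// First hash function: the home position.
int h1(int x, int tableSize) { return x % tableSize; }

// Second hash function from the example: the step between probes.
// Its value is always in 1..7, so it never evaluates to zero.
int h2(int x) { return 7 - (x % 7); }

// Insert key using double hashing: probe h1, h1 + h2, h1 + 2*h2, ...
// Returns the index where the key was stored, or -1 if no free cell was found.
int doubleHashInsert(std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const int size = static_cast<int>(table.size());
    const int home = h1(key, size);
    const int step = h2(key);
    for (int i = 0; i < size; i++) {
        const long long offset = static_cast<long long>(i) * step;
        const int pos = static_cast<int>((home + offset) % size);
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;  // the probe sequence found no empty cell
}
```

With a table of size 11, the keys 58, 14 and 91 all have home position 3 and are stored at indices 3, 10 and 6 respectively, matching the example.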
    class hash
    {
    public:
        int HashTable[MAX];

        // Hash function to generate the index: hash(key) = key % MAX
        int hashfunction(int key);

        // Accepts the hash table and the key to be inserted, and inserts the
        // key at the appropriate location in the table. Uses linear probing to
        // resolve collisions. Returns the index at which the key is inserted.
        int linear_probing(int HashTable[], int key);

        // Same as above, but uses quadratic probing to resolve collisions.
        // Returns the index at which the key is inserted.
        int quadratic_probing(int HashTable[], int key);

        // Inserts the key in the table and resolves collisions using
        // double hashing.
        int double_hashing(int HashTable[], int key);
    };
HINTS:
Quadratic probing can be implemented like:
    for (i = 0; i < MAX; i++) {
        pos = (home + i * i) % MAX;            // home = hashfunction(key)
        if (HashTable[pos] == EMPTY) break;    // EMPTY marks a free cell
    }
