
Week 11 Lec 01 Hash Table

This document discusses hash tables and how they can be used to solve the search and insert problem in constant time. Hash tables work by using a hash function to map data to array indices, allowing for very fast lookups. However, collisions may occur if multiple data values map to the same index. To resolve collisions, data can be stored in the next empty slot or by probing through the array. With a good hash function and handling of collisions, hash tables provide an efficient O(1) solution to search and insert operations.

Uploaded by

deep patel

Data structures in C

(PROG20799)

Hash table
M Mohiuddin

Parts of this lecture are adapted from Simon Hood’s lecture with his explicit
and kind consent.

Course:PROG20799, week11 1
Lecture Overview
• Students’ queries
• Hash Tables
• The “Search and Insert” Problem
• The Hash Function
• Resolving Collisions

Hash table
• There is one search algorithm we’ve left for last
• It’s not the most difficult, and it is the fastest!
• It’s the “hash table”, which performs searches in O(1) time on average, i.e. constant time

Search and Insert
• In programming we often find ourselves working
with some version of the “search and insert”
problem
• Given a list of items, search for an item in the list
as efficiently as possible
• If the item is not found, insert it into the list
• Items can be integers, chars or whatever

Five ways to do search and insert
• Store the list in an array and add new items at the end. This implies a sequential search must be used to find items; however, it’s easy to add them.
  2 3 6 1 8 7 9 34 … 5
• Store the list in an array in sorted order and add new items where they belong. This allows for faster searching (binary search), but inserting new items is cumbersome because the items after the insertion point must be shifted.
  1 2 3 4 5 7 8 9 12 14 16 18 21 23
• Store the list in an unsorted linked list. Addition of items is easy, but we must traverse the entire list sequentially to find if a given item exists.
• And…

Search and insert methods (continued)
• Store the list in a sorted linked list. Adding an item is easy once its place is found, but a linked list has no random access, so the search is still sequential; it can only stop early once it passes the place where the item would be
• Store the list in a binary search tree. Searching is built-in and quick (as long as the tree stays reasonably balanced), and insertion is as simple as in a linked list: a better choice!

[Figure: an array with indices running from 0 to 1000; positions 665 and 789 are marked]
Hash table
• The best choice, though, is a hash table. Constant time is tough to beat
• The Big O of search and insert in a hash table is O(1) on average
• The idea is roughly that we store our items in an array, placing each item in its appropriate place as we add it
• If we’re talking about integers, we can store each one at the index equal to its value
• But it can be very wasteful! How?
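To make the "wasteful" point concrete, here is a minimal sketch in C of storing each integer at the index equal to its value. The names `direct_insert`, `direct_search`, `unused_cells` and the limit `MAX_VALUE` are assumptions chosen for illustration; they do not come from the lecture.

```c
#define MAX_VALUE 1000   /* assumed largest value we will ever store */

/* One cell per possible value: table[v] is 1 if v is present, 0 if not. */
static int table[MAX_VALUE + 1];

/* Insert in constant time: the value is its own index. */
void direct_insert(int value) { table[value] = 1; }

/* Search in constant time: look straight at table[value]. */
int direct_search(int value) { return table[value]; }

/* Count the cells that hold nothing, i.e. the wasted space. */
int unused_cells(void) {
    int unused = 0;
    for (int i = 0; i <= MAX_VALUE; i++)
        if (!table[i]) unused++;
    return unused;
}
```

After `direct_insert(665)` and `direct_insert(789)`, only 2 of the 1001 cells are in use, so `unused_cells()` returns 999.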

Hash functions
• A simple way to reduce a number to a more
manageable size (so we can insert it in a
reasonably sized array index) is to use the
modulus operator
• For example, if we mod all our integer values by 100, we will always have a value between 0 and 99
• Even if our data is 9999999, (9999999 % 100) is still just 99
• Another example: index = key % 100 = 56478876 % 100 = 76
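The modulus idea can be sketched as a one-line C function. The name `hash_mod` and its parameters are assumptions for illustration, and the sketch assumes non-negative keys, because in C the `%` operator can return a negative result for a negative key.

```c
/* Reduce a non-negative key to an index in the range 0 .. table_size - 1. */
int hash_mod(int key, int table_size) {
    return key % table_size;
}
```

With a table size of 100, `hash_mod(9999999, 100)` is 99 and `hash_mod(56478876, 100)` is 76, matching the values above.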
Hash functions
• The C code later in this lecture uses array locations 1 to 100, so we add 1 to the remainder
• For example, (751 % 100) + 1 is 52, so we store it in a[52]
• For example, (95422 % 100) + 1 is 23, so we store it in a[23]
• So we’ve inserted 751 in index 52 and 95422 in index 23
• If we want to search for a given value (say 751), we perform the hash function on our search value and then go straight to that index (52): there it is!
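Storing and then finding a value with the "mod plus 1" hash can be sketched as follows. This is a deliberately simplified illustration that ignores collisions; the names `hash`, `store`, `find` and the marker `EMPTY` are assumptions, not the lecture's code.

```c
#define N 100      /* usable locations are a[1] .. a[N]; a[0] is unused */
#define EMPTY 0    /* assumed marker for a free location */

static int a[N + 1];   /* static storage starts out all EMPTY */

/* Remainder plus 1 gives a location in 1 .. N. */
int hash(int key) { return key % N + 1; }

/* Store the key at its hash location (collisions are ignored here). */
void store(int key) { a[hash(key)] = key; }

/* Return the location of key, or 0 if it is not at its hash location. */
int find(int key) {
    int loc = hash(key);
    return a[loc] == key ? loc : 0;
}
```

After `store(751)` and `store(95422)`, `find(751)` returns 52 and `find(95422)` returns 23, with no searching through the array.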
Example problem
Company ABC has 52 employees and their IDs vary from 5000 to 8000.
Queries: is there an employee with ID 5208, 7609, or 8000? (The slide also shows the ID 5260.)
Size of dataset = 52, spread of dataset = 3000, minimum data value = 5000

1st approach: simplest and fastest, but the most wasteful
array size: largest value + 1
index   key i.e. data
0
1
…
5208    5208
…
7609    7609
…
8000    8000

2nd approach
array size = spread + 1
Hash function: index = data - 5000
index   key i.e. data
0
1
…
208     5208
…
2609    7609
…
3000    8000

3rd approach
array size = 52 + 1
Hash function: index = data % 52
index   key i.e. data
0
1
…
8       5208
…
17      7609
…
44      8000
…
52
Example problem
Company ABC has 52 employees and their IDs vary from 5000 to 8000.
Queries: is there an employee with ID 5008, 7609, or 8000?
Size of dataset = 52, spread of dataset = 3K, minimum key value = 5K

1st approach: simplest yet the most wasteful
array size:

2nd approach
array size =
Hash function: index =

3rd approach
array size =
Hash function: index =
Example problem Solution
Company ABC has 52 employees, and their IDs vary from 5000 to 8000.
Queries: is there an employee with ID 5008, 7609, or 8000?
Size of dataset = 52, spread of dataset = 3K, minimum key value = 5K

1st approach: simplest yet the most wasteful
array size: largest value + 1
index   key i.e. data
0
1
…
5008    5008
…
7600    7600
…
7609    7609
…
8000    8000

2nd approach
array size = spread of the data + 1 = 3001
Hash function: index = key - 5K
index   key i.e. data
0
1
…
8       5008
…
2609    7609
…
3000    8000

3rd approach
array size = size of data + 1 = 53
Hash function: index = key % 52
index   key i.e. data
0
1
…
16      5008
17      7609, 5009 (collision)
…
44      8000
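The three approaches differ only in how an employee ID is turned into an array index. A minimal sketch in C, in which the function names and the constants `MIN_ID` and `NUM_EMPLOYEES` are assumptions made for illustration:

```c
#define MIN_ID 5000         /* smallest employee ID */
#define NUM_EMPLOYEES 52    /* size of the dataset */

/* 1st approach: the ID is its own index (needs 8001 cells). */
int index_direct(int id) { return id; }

/* 2nd approach: subtract the minimum (needs 3001 cells). */
int index_offset(int id) { return id - MIN_ID; }

/* 3rd approach: modulus (needs about 53 cells, but IDs can collide). */
int index_modulo(int id) { return id % NUM_EMPLOYEES; }
```

For example, `index_modulo(7609)` and `index_modulo(5009)` are both 17, which is the collision shown in the third table, while `index_offset` gives them the distinct indices 2609 and 9.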
Collisions
• What if we try to insert 751 and 2351?
• Location = 751 % 100 + 1 = 52
• Location = 2351 % 100 + 1 = 52
• We have a collision: both values, transformed through our hash function, should be stored in index 52
• How can we overcome this limitation?
• We insert the number at the next empty space
Resolution of collisions!
• We can use a simple while loop to iterate through our hash table until we find the given number or an empty location; just make sure to start at the location given by the hash function
• As long as there are free indices in our array, and we assume the array is circular, we can iterate through it easily enough
• The average time to find our value is still going to be very quick!
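The while loop described above can be sketched in C as follows. The function names `probe_insert` and `probe_search` and the marker `EMPTY` are assumptions for illustration; the sketch uses locations 1 to N and assumes positive keys and a table that always keeps at least one free location, otherwise the loops would not terminate.

```c
#define N 100
#define EMPTY 0

static int num[N + 1];   /* locations num[1..N]; num[0] is unused */

/* Insert key at its hash location or at the next free one; returns the location. */
int probe_insert(int key) {
    int loc = key % N + 1;
    while (num[loc] != EMPTY && num[loc] != key)
        loc = loc % N + 1;          /* next location, wrapping from N to 1 */
    num[loc] = key;
    return loc;
}

/* Return the location of key, or 0 if an empty location is reached first. */
int probe_search(int key) {
    int loc = key % N + 1;
    while (num[loc] != EMPTY) {
        if (num[loc] == key) return loc;
        loc = loc % N + 1;
    }
    return 0;
}
```

Inserting 751 and then 2351 places them in locations 52 and 53; a later `probe_search(2351)` starts at 52, steps past 751, and finds the key at 53.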
Example: 52, 33, 84, 43, 16, 59, 31, 23, 61, 64, 67, 80

• Let’s use the hashing function: index = H(key) = key % 12
• The numbers arrive in the order listed above; for example, index = 52 % 12 = 4
• Collisions, as stated on the slide: 14
• The final table will be:

index:  0   1   2   3   4   5   6   7   8   9  10  11
key:   84  23  61  80  52  16  64  43  31  33  67  59
Keys: 52, 33, 84, 43, 16, 59, 31, 23, 61, 64, 67, 80 (empty cells are shown as .)

Larger array, same hash function: index = 64 % 12 = 4, collisions = 12

index:  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
key:   84  61   .   .  52  16  64  43  31  33  67  59  23  80   .   .   .   .   .   .

Larger array, new hash function: index = 64 % 20 = 4, collisions = 4

index:  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
key:   80  61   .  43  84  23  64  67   .   .   .  31  52  33   .   .  16   .   .  59
Exercise
• Ex:1 Do the same problem i.e. using the same
hash function and receiving the same numbers in
the same order; however, with an array of 20
elements.
• Ex:2 Now change the hash function to:
H(key) = key%20
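One way to check an answer to these exercises is to let a short C function build the table. This is a sketch under stated assumptions: the name `build_table`, the 0-based indexing and the counting rule (one collision for every occupied cell that is probed) are choices made here, and a different counting rule gives different totals. It also assumes distinct positive keys, at most `SLOTS` of them, and `mod` no larger than `SLOTS`.

```c
#define SLOTS 20
#define EMPTY 0

/* Insert n keys using index = key % mod with linear probing (step 1).
   Fills table[0..SLOTS-1] and returns the number of collisions. */
int build_table(const int keys[], int n, int mod, int table[SLOTS]) {
    int collisions = 0;
    for (int i = 0; i < SLOTS; i++) table[i] = EMPTY;
    for (int i = 0; i < n; i++) {
        int loc = keys[i] % mod;
        while (table[loc] != EMPTY) {   /* occupied: count it and move on */
            collisions++;
            loc = (loc + 1) % SLOTS;    /* circular array */
        }
        table[loc] = keys[i];
    }
    return collisions;
}
```

For the twelve keys of the example with `mod` set to 20, the function reports 4 collisions: 23 is pushed from cell 3 to cell 5, and 64 from cell 4 to cell 6.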

Hash function—C implementation
#include <stdio.h>
#include <stdlib.h>

#define MaxNumbers 50 // maximum number of distinct records
#define N 100         // usable table locations are num[1..N]
#define Empty 0       // marks a free location, so keys are assumed to be positive

int main() {
    FILE *in = fopen("numbers.in", "r");
    if (in == NULL) {
        printf("Cannot open numbers.in\n");
        return 1;
    }
    int key, loc, num[N + 1];
    for (int j = 1; j <= N; j++) num[j] = Empty;
    int distinct = 0;
    while (fscanf(in, "%d", &key) == 1) {
        loc = key % N + 1; // initial hash location, 1..N
        while (num[loc] != Empty && num[loc] != key)
            loc = loc % N + 1; // next location, wrapping from N back to 1
        if (num[loc] == Empty) { // key is not in the table
            if (distinct == MaxNumbers) {
                printf("Table full: %d not added\n", key);
                exit(1);
            }
            num[loc] = key; // table not full, so the key is added
            distinct++;     // and the number of distinct entries is incremented
        }
    }
    printf("There are %d distinct numbers\n", distinct);
    fclose(in);
    return 0;
}

Deleting an item from a hash table
• Let’s say we have the following table (empty locations are shown as .):

index:  0   1   2   3   4   5   6   7   8   9  10  11
key:   84  23  61   .  52  16   .  43  31  33   .  59

• Recall that 43 and 31 hashed initially to the same location, 7 (location = key % 12). Suppose 43 is to be deleted.

Deleting an item from a hash table
• Recall that 43 and 31 hashed initially to the same location, 7, and that 43 is to be deleted.
• What happens if we simply empty location 7 and then look for 31?
• The search for 31 starts at location 7, finds it empty, and wrongly concludes that 31 is not in the table
• Solution: for deleted entries, store a value that is different from the one used for empty, for example -1:

index:  0   1   2   3   4   5   6   7   8   9  10  11
key:   84  23  61   .  52  16   .  -1  31  33   .  59

• We still search for the key or the first empty location, but ignore locations marked as deleted!
• However, a new key can be inserted in a location marked deleted.

Filling locations marked deleted
index:  0   1   2   3   4   5   6   7   8   9  10  11
key:   84  23  61   .  52  16   .  43  31  33   .  59

• Suppose we deleted 43 by marking num[7] = -1

index:  0   1   2   3   4   5   6   7   8   9  10  11
key:   84  23  61   .  52  16   .  -1  31  33   .  59

• If we now search for 55, we will check locations 7, 8, 9 and 10 and then decide that 55 is not in the table and must be inserted!
• We can set num[10] = 55, but should we not insert it at the first location marked deleted?
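Hash function: index = key % 12
index = 55 % 12 = 7

55 inserted at the first empty location (10):

index:  0   1   2   3   4   5   6   7   8   9  10  11
key:   84  23  61   .  52  16   .  -1  31  33  55  59

55 inserted at the first location marked deleted (7):

index:  0   1   2   3   4   5   6   7   8   9  10  11
key:   84  23  61   .  52  16   .  55  31  33   .  59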

Find or insert in a hash table
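// find or insert 'key' in the hash table num[1..N]
// returns 1 if the key was inserted, 0 if it was already in the table
// assumes at least one location is still Empty, otherwise the loop never ends
int findOrInsert(int key, int num[]) {
    int loc = key % N + 1; // H(key)
    int deletedLoc = 0;    // 0 means no deleted location seen yet
    while (num[loc] != Empty && num[loc] != key) {
        if (deletedLoc == 0 && num[loc] == Deleted)
            deletedLoc = loc; // storing the first location marked deleted
        loc = loc % N + 1;
    }
    if (num[loc] == Empty) { // key not found
        if (deletedLoc != 0)
            loc = deletedLoc; // reuse the deleted location
        num[loc] = key;
        return 1;
    }
    printf("%d found in location %d\n", key, loc);
    return 0;
}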
Delete a key from Hash table
#define Deleted -1 // marks a deleted location; must differ from Empty

void deleteKey(int key, int num[]) {
    int loc = key % N + 1;
    while (num[loc] != Empty && num[loc] != key) {
        loc = loc % N + 1; // % N makes the search circle back when the end of the array is reached
    }
    if (num[loc] == Empty) {
        printf("\nKey not found\n");
        return;
    }
    // the loop ended on a location that holds the key
    num[loc] = Deleted;
    printf("\nKey found at location %d and deleted\n", loc);
}
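The interplay between deleting and re-inserting can be traced with a small self-contained sketch. It reuses the lecture's nine-key example but, as an assumption made here, indexes the table from 0 with index = key % 12 so that the locations match the slides; the names `find_or_insert`, `delete_key` and `demo` are illustrative, and keys are assumed to be positive.

```c
#define N 12
#define EMPTY 0
#define DELETED (-1)

static int num[N];   /* locations 0 .. N-1 */

/* Find key, or insert it (preferring the first deleted location seen).
   Returns the location where the key now sits. */
int find_or_insert(int key) {
    int loc = key % N;
    int deleted_loc = -1;            /* -1 means none seen yet */
    while (num[loc] != EMPTY && num[loc] != key) {
        if (deleted_loc < 0 && num[loc] == DELETED)
            deleted_loc = loc;
        loc = (loc + 1) % N;
    }
    if (num[loc] == EMPTY) {         /* key not found: insert it */
        if (deleted_loc >= 0) loc = deleted_loc;
        num[loc] = key;
    }
    return loc;
}

/* Mark key as deleted. Returns 1 if it was found, 0 otherwise. */
int delete_key(int key) {
    int loc = key % N;
    while (num[loc] != EMPTY && num[loc] != key)
        loc = (loc + 1) % N;
    if (num[loc] == EMPTY) return 0;
    num[loc] = DELETED;
    return 1;
}

/* Rebuild the nine-key table, delete 43, and return where 55 ends up. */
int demo(void) {
    int keys[] = {52, 33, 84, 43, 16, 59, 31, 23, 61};
    for (int i = 0; i < N; i++) num[i] = EMPTY;
    for (int i = 0; i < 9; i++) find_or_insert(keys[i]);
    delete_key(43);
    return find_or_insert(55);
}
```

Here `demo()` returns 7: the search for 55 probes locations 7, 8, 9 and 10, remembers that 7 was marked deleted, and reuses it. Because location 7 holds a marker rather than being empty, 31 can still be found at location 8 afterwards.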

Hash function for strings
• What if we want to store words or chars? We can’t mod a word, can we?
• A simple way to overcome this limitation is to convert each char in the word to an int. Then add the ints representing each letter together, and apply mod plus 1 as normal
• Unfortunately, this means that anagrams collide: mate, meat and team all have the same value
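The anagram problem is easy to reproduce. The following sketch adds up the character codes and applies "mod plus 1"; the function name `hash_sum` and the table size passed as `n` are assumptions for illustration.

```c
/* Sum the character codes of the word, then mod plus 1 (locations 1..n). */
int hash_sum(const char *word, int n) {
    int sum = 0;
    for (int j = 0; word[j] != '\0'; j++)
        sum += (unsigned char)word[j];
    return sum % n + 1;
}
```

The letters m, a, t, e have the codes 109, 97, 116 and 101, which add up to 423 in any order, so with n = 100 the words "mate", "meat" and "team" all hash to location 24.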

Hash function for strings
• We might consider assigning weights to each letter based on the letter’s position in the word
• The main goal is to avoid collisions: if we have a collision, our algorithm runs more slowly
• We might assign 3 to the first letter, 5 to the second, 7 to the third, and so on
• Make sure your hash function is simple! Why?
• Mate = 3 * ASCII(M) + 5 * ASCII(a) + 7 * ASCII(t) + 9 * ASCII(e) = 3 * 77 + 5 * 97 + 7 * 116 + 9 * 101 = 2437
• Team = 3 * ASCII(T) + 5 * ASCII(e) + 7 * ASCII(a) + 9 * ASCII(m) = 3 * 84 + 5 * 101 + 7 * 97 + 9 * 109 = 2417

Hash function for strings
A simple snippet of code to hash words, so that anagrams do not collide, might then be:
int j = 0, wordNum = 0;
int weight = 3;
while (word[j] != '\0') {
    wordNum += weight * word[j++]; // multiply each letter by its position weight
    weight += 2;
}
location = wordNum % n + 1;
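Wrapped in a function, the weighted scheme (weights 3, 5, 7, … multiplied by each character code) can be checked against the anagrams. The function name `hash_weighted` and the parameter `n` are assumptions for illustration; for very long words the `int` sum could overflow, which this sketch does not guard against.

```c
/* Weighted sum of character codes: 3 * first + 5 * second + 7 * third ...
   followed by mod plus 1, giving a location in 1..n. */
int hash_weighted(const char *word, int n) {
    int wordNum = 0;
    int weight = 3;
    for (int j = 0; word[j] != '\0'; j++) {
        wordNum += weight * (unsigned char)word[j];
        weight += 2;
    }
    return wordNum % n + 1;
}
```

With n = 100, the lowercase words "mate", "team" and "meat" give weighted sums of 2533, 2513 and 2555, so they land in locations 34, 14 and 56 instead of colliding.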

Linear probing for collision avoidance
index:  0   1   2   3   4   5   6   7   8   9  10  11
key:   84  23  61   .  52  16   .  43  31  33   .  59

• Linear probing resolves a collision by shifting the location by a fixed number of spaces (one space in the examples so far)
• Consider the probability of a collision after nine numbers have been added!
• In fact, any key that hashes to 11, 0, 1 or 2 will end up in location 3!
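The clustering effect can be measured directly on the nine-key table. In this sketch the table contents are taken from the example above (index = key % 12, 0-based), while the function names `landing_slot` and `ways_to_reach` are assumptions for illustration.

```c
#define N 12
#define EMPTY 0

/* Table after the first nine keys of the example (index = key % 12). */
static const int num[N] = {84, 23, 61, EMPTY, 52, 16, EMPTY, 43, 31, 33, EMPTY, 59};

/* Location where a new key whose initial hash is h would be stored. */
int landing_slot(int h) {
    while (num[h] != EMPTY)
        h = (h + 1) % N;
    return h;
}

/* Number of initial hash values (out of N) that end up in the given free location. */
int ways_to_reach(int slot) {
    int count = 0;
    for (int h = 0; h < N; h++)
        if (landing_slot(h) == slot) count++;
    return count;
}
```

Five of the twelve possible hash values (11, 0, 1, 2 and 3 itself) lead to location 3, against three for location 6 and four for location 10, so the longest run of filled cells is also the one most likely to grow.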

Drawbacks of linear probing with unity step size
• For a fuller table we may have to move quite far before we get a free spot
• If we insert values in indices where they don’t belong, we are increasing the probability of future collisions
• If, however, we shift the index by an arbitrary number, we may reduce the length of the runs of contiguously filled indices
• Think of relative primes! The step size and the table size should have no common factor, so that the probe sequence can reach every location
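Why relative primes matter can be seen by counting how many locations a fixed step actually reaches. This is a small sketch with an assumed function name, `slots_visited`, using 0-based locations and a step of at least 1.

```c
/* Count the distinct locations visited when probing a table of n
   locations with a fixed step, starting from location 0. */
int slots_visited(int n, int step) {
    int count = 0;
    int loc = 0;
    do {
        count++;
        loc = (loc + step) % n;
    } while (loc != 0);    /* stop when the sequence returns to the start */
    return count;
}
```

In a table of 12 locations, a step of 3 only ever visits 4 locations (0, 3, 6, 9), so a key could fail to find a free spot even though the table is not full, whereas a step of 5 shares no factor with 12 and visits all 12.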

Linear probing with double hashing
loc = num % n + 1 // this gives the initial hash location
For linear probing with single hashing, the increment k is a CONSTANT
For linear probing with double hashing, k is:
k = num % (n - 2) + 1 // this gives the increment for this key
It is strongly recommended that n and n - 2 are twin primes, like 31/29, 103/101 or 1021/1019, and that n is just less than the size of the array.

Double hashing implementation
// returns 0 if the key is found, or 1 if it was not found and has been added
// 'collisions' is a global counter assumed to be declared elsewhere in the program
int findOrInsertDouble(int key, int num[]) {
    int loc = key % N + 1;     // initial hash location, 1..N
    int k = key % (N - 2) + 1; // increment for this key, 1..N-2
    int deletedLoc = 0;
    while (num[loc] != Empty && num[loc] != key) {
        collisions++;
        if (deletedLoc == 0 && num[loc] == Deleted)
            deletedLoc = loc; // storing the first location marked deleted
        loc = loc + k;
        if (loc > N)
            loc = loc - N; // wrap around past the end of the array
    }
    if (num[loc] == Empty) { // key not found
        if (deletedLoc != 0)
            loc = deletedLoc;
        num[loc] = key;
        return 1; // one key added
    } else {
        // printf("\n%d found on location %d\n", key, loc);
        return 0; // no key added
    }
}
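To try double hashing on concrete numbers, here is a self-contained sketch with the twin primes 31 and 29. The table size, the snake_case names and the helper `load_example` are assumptions made for this illustration; the probing logic follows the function above, and keys are assumed to be positive.

```c
#define N 31            /* prime, and N - 2 = 29 is prime too (twin primes) */
#define EMPTY 0
#define DELETED (-1)

static int num[N + 1];  /* locations 1..N; num[0] is unused */
static int collisions = 0;

/* Returns 1 if the key was added, 0 if it was already present. */
int find_or_insert_double(int key) {
    int loc = key % N + 1;       /* first hash: initial location */
    int k = key % (N - 2) + 1;   /* second hash: step for this key */
    int deleted_loc = 0;
    while (num[loc] != EMPTY && num[loc] != key) {
        collisions++;
        if (deleted_loc == 0 && num[loc] == DELETED)
            deleted_loc = loc;
        loc += k;
        if (loc > N) loc -= N;   /* wrap around */
    }
    if (num[loc] != EMPTY) return 0;
    if (deleted_loc != 0) loc = deleted_loc;
    num[loc] = key;
    return 1;
}

/* Insert the twelve keys of the running example; returns how many were added. */
int load_example(void) {
    int keys[] = {52, 33, 84, 43, 16, 59, 31, 23, 61, 64, 67, 80};
    int added = 0;
    for (int i = 0; i < 12; i++)
        added += find_or_insert_double(keys[i]);
    return added;
}
```

In this run only 64 collides: it first hashes to location 3, which already holds 33, and its own step k = 64 % 29 + 1 = 7 moves it straight to location 10. Another key that collides at location 3 would normally move by a different step, which is what breaks up the clusters seen with a fixed step.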
Performance of hash table
Load factor: the fraction of the table's locations that are filled
For a successful search: average number of comparisons
For an unsuccessful search: average number of comparisons
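As a hedged reference point only, these are the standard textbook estimates (due to Knuth) for the average number of comparisons in a table with load factor α. Whether they are the exact expressions shown on the lecture slide is an assumption.

```latex
\alpha = \frac{\text{number of keys stored}}{\text{number of table locations}}

% Linear probing (step size 1)
\text{successful search} \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right),
\qquad
\text{unsuccessful search} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^{2}}\right)

% Double hashing
\text{successful search} \approx \frac{1}{\alpha}\,\ln\frac{1}{1-\alpha},
\qquad
\text{unsuccessful search} \approx \frac{1}{1-\alpha}
```

For a half-full table (α = 0.5) with linear probing, this gives about 1.5 comparisons for a successful search and 2.5 for an unsuccessful one.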

The concept of chaining

[Figure: keys k1 and k2, with hash values H(k1) and H(k2), stored in linked lists attached to the table cells]

Each cell in the table points to a linked list

Exercise for hashtable with chaining

52, 33, 84, 43, 16, 59, 31, 23, 61, 64, 67, 80
index = key % 12 =

index:  0   1   2   3   4   5   6   7   8   9  10  11

Exercise Solution for hashtable with chaining

52, 33, 84, 43, 16, 59, 31, 23, 61, 64, 67, 80

index = key % 12

index   keys in the cell's list
0       84
1       61
4       52, 16, 64
7       43, 31, 67
8       80
9       33
11      59, 23
(cells 2, 3, 5, 6 and 10 are empty)

Hash table with chaining implementation
struct node {
    int key, age;
    char name[100];
    struct node *next; // next node in the same cell's list
};
struct hash {
    struct node *head; // first node of this cell's list
    int count;         // number of nodes in the list
};
struct hash *hashTable = NULL; // hash table global declaration
// ... in the main function, hash table definition (n is the number of cells)
hashTable = (struct hash *) calloc(n, sizeof (struct hash));

insertToHash function
void insertToHash(int key, char *name, int age) {
    int hashIndex = key % eleCount; // hashing function
    struct node *newnode = createNode(key, name, age);
    /* head of list for the bucket with index "hashIndex" */
    if (!hashTable[hashIndex].head) {
        hashTable[hashIndex].head = newnode;
        hashTable[hashIndex].count = 1;
        return;
    }
    newnode->next = hashTable[hashIndex].head; // new node goes in front of the list
    // update head of the list and number of nodes in the current bucket
    hashTable[hashIndex].head = newnode;
    hashTable[hashIndex].count++;
}
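A complete, compilable version of the chaining idea might look like the sketch below. The bodies of `createNode`, `searchInHash` and `loadExample`, the table size of 12 and the placeholder name "n/a" are assumptions made here, since the lecture does not show them; `insertToHash` is simplified to a single head insertion, which behaves the same as the version above for both empty and non-empty buckets.

```c
#include <stdlib.h>
#include <string.h>

struct node { int key, age; char name[100]; struct node *next; };
struct hash { struct node *head; int count; };

static struct hash *hashTable = NULL;
static int eleCount = 12;            /* assumed number of buckets */

/* Allocate and fill a node; next starts out as NULL. */
struct node *createNode(int key, const char *name, int age) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) exit(1);
    n->key = key;
    n->age = age;
    strncpy(n->name, name, sizeof n->name - 1);
    n->name[sizeof n->name - 1] = '\0';
    n->next = NULL;
    return n;
}

/* Insert at the head of the bucket's list. */
void insertToHash(int key, const char *name, int age) {
    int hashIndex = key % eleCount;
    struct node *newnode = createNode(key, name, age);
    newnode->next = hashTable[hashIndex].head;
    hashTable[hashIndex].head = newnode;
    hashTable[hashIndex].count++;
}

/* Walk one bucket's list; returns the node or NULL. */
struct node *searchInHash(int key) {
    if (hashTable == NULL) return NULL;
    for (struct node *p = hashTable[key % eleCount].head; p != NULL; p = p->next)
        if (p->key == key) return p;
    return NULL;
}

/* Load the twelve example keys; returns the length of bucket 4's list. */
int loadExample(void) {
    int keys[] = {52, 33, 84, 43, 16, 59, 31, 23, 61, 64, 67, 80};
    hashTable = calloc(eleCount, sizeof(struct hash));
    if (hashTable == NULL) return 0;
    for (int i = 0; i < 12; i++)
        insertToHash(keys[i], "n/a", 0);
    return hashTable[4].count;
}
```

After `loadExample()`, bucket 4 holds three nodes (52, 16 and 64). Because new nodes are placed at the head, the most recently inserted key, 64, is the first one in that list, and a search only ever walks the one list selected by key % 12.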

Extra stuff

• Data (keys) varies from 10M to 50M
• Spread of the data is 40M
• # of records = size of the data = 3K
• Key = 15,003,760

No hash function (wasteful, but no collisions)
Array size = 50M + 1
For 3K data, an array size of 50M is very wasteful
Example: key = 15,003,760, index = 15,003,760

Simple hash function (wasteful, but no collisions)
Array size = 40M + 1
For 3K data, an array size of 40M is still very wasteful
Hash function: index = key - 10M
Example: key = 15,003,760, index = 5,003,760

Normal hash function (space efficient, but with collisions)
Minimum array size = 3K + 1, usually 6K
Hash function: index = key % 6K
Example: key = 15,003,760, index = 3760