Hashing

Hashing is a technique used to uniquely identify objects by converting keys into indexes in a hash table, allowing for efficient data retrieval. It involves using a hash function to map data of arbitrary size to a fixed size, with operations such as search, insert, and delete being performed in average O(1) time. Various collision resolution techniques, including separate chaining and open addressing methods like linear probing and double hashing, are employed to manage instances where multiple keys hash to the same index.

Uploaded by

s.dhanapal13

Hashing

Hashing is a technique used to uniquely identify a specific object within a group of similar objects. Some
everyday examples of the idea:

• In universities, each student is assigned a unique roll number that can be used to retrieve information
about them.
• In libraries, each book is assigned a unique number that can be used to look up information about the
book, such as its exact position in the library or the users it has been issued to.

In both examples, the students and books are hashed to a unique number.

Hashing is a technique to convert a range of key values into a range of indexes of an array.
The examples here use the modulo operator to map key values to indexes. Consider a hash
table of size 20 in which items in (key, value) format are to be stored.

• In hashing, large keys are converted into small keys by using hash functions.
• The values are then stored in a data structure called a hash table.
• The idea of hashing is to distribute entries (key/value pairs) uniformly across an array. Each
element is assigned a key (converted key). By using that key you can access the element
in O(1) time on average. Using the key, the algorithm (hash function) computes an index that
suggests where an entry can be found or inserted.

Implementation:

Hashing is implemented in two steps:

1. An element is converted into an integer by using a hash function. This integer can be used as an index
at which to store the original element in the hash table.
2. The element is stored in the hash table, where it can be quickly retrieved using the hashed key.

hash = hashfunc(key)
index = hash % array_size

In this method, the hash is independent of the array size and it is then reduced to an index (a number between 0
and array_size − 1) by using the modulo operator (%).

Hash function
A hash function is any function that can be used to map a data set of an arbitrary size to a data set of a fixed
size, which falls into the hash table. The values returned by a hash function are called hash values, hash codes,
hash sums, or simply hashes.

To achieve a good hashing mechanism, it is important to have a good hash function with the following basic
requirements:

1. Easy to compute: It should be cheap to compute and must not become a costly algorithm in itself.
2. Uniform distribution: It should distribute keys uniformly across the hash table and should not result
in clustering.
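3. Few collisions: A collision occurs when two different elements are mapped to the same hash value.
Collisions should be kept to a minimum.

For the hash table of size 20, the items to be stored, in (key, value) format, are:

• (1,20)
• (2,70)
• (42,80)
• (4,25)
• (12,44)
• (14,32)
• (17,11)
• (13,78)
• (37,98)

Taking the key modulo the table size gives the array index of each item:

Sr.No.  Key  Hash           Array Index
1       1    1 % 20 = 1     1
2       2    2 % 20 = 2     2
3       42   42 % 20 = 2    2
4       4    4 % 20 = 4     4
5       12   12 % 20 = 12   12
6       14   14 % 20 = 14   14
7       17   17 % 20 = 17   17
8       13   13 % 20 = 13   13
9       37   37 % 20 = 17   17

Keys 2 and 42 both map to index 2, and keys 17 and 37 both map to index 17, so these pairs collide.
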
Hash table
A hash table is a data structure that is used to store keys/value pairs. It uses a hash function to compute an index
into an array in which an element will be inserted or searched. By using a good hash function, hashing can work
well. Under reasonable assumptions, the average time required to search for an element in a hash table is O(1).

Let us consider a string S. You are required to count the frequency of all the characters in this string.

string S = "ababcd"

The simplest way to do this is to iterate over all the possible characters and count the frequency of each
one in turn. The time complexity of this approach is O(26*N), where N is the length of the string and there
are 26 possible characters.

#include <iostream>
#include <string>
using namespace std;

// Brute force: for each letter 'a'..'z', scan the whole string and count matches.
void countFre(string S)
{
    for(char c = 'a'; c <= 'z'; ++c)
    {
        int frequency = 0;
        for(size_t i = 0; i < S.length(); ++i)
            if(S[i] == c)
                frequency++;
        cout << c << ' ' << frequency << endl;
    }
}

Output

a 2
b 2
c 1
d 1
e 0
f 0
…
z 0
Basic Operations
The basic operations of a hash table are:
• Search: searches for an element in a hash table.
• Insert: inserts an element into a hash table.
• Delete: deletes an element from a hash table.

vector <string> hashTable[20];
int hashTableSize = 20;

Insert Operation
Whenever an element is to be inserted, compute the hash code of the key passed and use it as an
index into the array. If an element is already stored at the computed index, the collision has
to be resolved, for example by linear probing for the next empty location. The code below takes
the simpler route of separate chaining: each array slot holds a list, and the new element is
appended to the list at the computed index.

void insert(string s)
{
    // Compute the index using the hash function
    // (hashFunc is assumed to return a value from 0 to hashTableSize - 1)
    int index = hashFunc(s);
    // Append the element to the chain stored at that index
    hashTable[index].push_back(s);
}

Search Operation
Whenever an element is to be searched for, compute the hash code of the key passed and use it as
an index into the array. With linear probing, the search moves ahead slot by slot if the element
is not found at the computed index. With the chained table used in the code below, only the list
stored at that index is scanned.

void search(string s)
{
    // Compute the index by using the hash function
    int index = hashFunc(s);
    // Scan the chain stored at that index
    for(size_t i = 0; i < hashTable[index].size(); i++)
    {
        if(hashTable[index][i] == s)
        {
            cout << s << " is found!" << endl;
            return;
        }
    }
    cout << s << " is not found!" << endl;
}

Delete Operation
Whenever an element is to be deleted, compute the hash code of the key passed and use it as an
index into the array. Use linear probing to move ahead if the element is not found at the
computed index. When the element is found, store a dummy item in its slot instead of emptying it,
so that later searches continue to probe past that slot and still reach elements stored after it.

// C-style example for an open-addressing table. hashArray, dummyItem, SIZE and
// hashCode are assumed to be defined elsewhere.
// Note: delete is a reserved word in C++, so this compiles only as C.
struct DataItem* delete(struct DataItem* item) {
    int key = item->key;

    // get the hash
    int hashIndex = hashCode(key);

    // move through the array until an empty slot is reached
    while(hashArray[hashIndex] != NULL) {
        if(hashArray[hashIndex]->key == key) {
            struct DataItem* temp = hashArray[hashIndex];

            // assign a dummy item at the deleted position
            hashArray[hashIndex] = dummyItem;
            return temp;
        }

        // go to the next cell
        ++hashIndex;

        // wrap around the table
        hashIndex %= SIZE;
    }

    return NULL;
}

Applications

• Associative arrays: Hash tables are commonly used to implement many types of in-memory tables.
They are used to implement associative arrays (arrays whose indices are arbitrary strings or other
complicated objects).
• Database indexing: Hash tables may also be used as disk-based data structures and database indices
(such as in dbm).
• Caches: Hash tables can be used to implement caches, i.e. auxiliary data tables that speed up
access to data that is primarily stored in slower media.
• Object representation: Several dynamic languages, such as Perl, Python, JavaScript, and Ruby, use
hash tables to implement objects.
• Algorithms: Hash functions are used in various algorithms to make their computation faster.

Collision resolution techniques

Separate chaining (open hashing)

• Separate chaining is one of the most commonly used collision resolution techniques. It is usually
implemented using linked lists. In separate chaining, each element of the hash table is a linked list. To
store an element in the hash table you must insert it into a specific linked list. If there is a collision (i.e.
two different elements have the same hash value), both elements are stored in the same linked list.
• The cost of a lookup is that of scanning the entries of the selected linked list for the required key. If the
distribution of the keys is sufficiently uniform, then the average cost of a lookup depends only on the
average number of keys per linked list. For this reason, chained hash tables remain effective even when
the number of table entries (N) is much higher than the number of slots.
• For separate chaining, the worst-case scenario is when all the entries are inserted into the same linked
list. The lookup procedure may have to scan all its entries, so the worst-case cost is proportional to the
number (N) of entries in the table.
• For example, suppose CodeMonk and Hashing both hash to the value 2. The slot at index 2 can hold
only one entry directly, so the next entry (in this case Hashing) is linked (attached) to the entry of
CodeMonk.

Implementation of hash tables with separate chaining (open hashing)

Assumption

Hash function will return an integer from 0 to 19.

vector <string> hashTable[20];
int hashTableSize = 20;

Insert

void insert(string s)
{
    // Compute the index using the hash function
    int index = hashFunc(s);
    // Append the element to the chain (a vector here) at that index
    hashTable[index].push_back(s);
}

Search

void search(string s)
{
    // Compute the index by using the hash function
    int index = hashFunc(s);
    // Scan the chain at that specific index
    for(size_t i = 0; i < hashTable[index].size(); i++)
    {
        if(hashTable[index][i] == s)
        {
            cout << s << " is found!" << endl;
            return;
        }
    }
    cout << s << " is not found!" << endl;
}

Linear probing (open addressing or closed hashing)

In open addressing, all entry records are stored in the array itself instead of in linked lists. When a new entry
has to be inserted, the hash index of the hashed value is computed and then the array is examined (starting with
the hashed index). If the slot at the hashed index is unoccupied, the entry record is inserted in that slot;
otherwise the insertion proceeds in some probe sequence until it finds an unoccupied slot.

The probe sequence is the sequence that is followed while traversing through entries. In different probe
sequences, you can have different intervals between successive entry slots or probes.

When searching for an entry, the array is scanned in the same sequence until either the target element is found
or an unused slot is found; reaching an unused slot indicates that there is no such key in the table. The name
"open addressing" refers to the fact that the location or address of the item is not determined solely by its
hash value.

Linear probing is when the interval between successive probes is fixed (usually to 1). Let’s assume that the
hashed index for a particular entry is index. The probing sequence for linear probing will be:

index = index % hashTableSize

index = (index + 1) % hashTableSize
index = (index + 2) % hashTableSize
index = (index + 3) % hashTableSize

Implementation of hash table with linear probing

Assumption

• There are no more than 20 elements in the data set.
• Hash function will return an integer from 0 to 19.
• Data set must have unique elements.

string hashTable[21];
int hashTableSize = 21;

The table has 21 slots, one more than the maximum number of elements, so at least one slot is always
empty and the probing loops below terminate. An empty string marks an unused slot.

Insert

void insert(string s)
{
    // Compute the index using the hash function
    int index = hashFunc(s);
    // Probe forward for an unused slot, wrapping around to the start
    // when the index reaches hashTableSize
    while(hashTable[index] != "")
        index = (index + 1) % hashTableSize;
    hashTable[index] = s;
}

Search

void search(string s)
{
    // Compute the index using the hash function
    int index = hashFunc(s);
    // Probe forward until the element or an unused slot is found,
    // wrapping around when the index reaches hashTableSize
    while(hashTable[index] != s && hashTable[index] != "")
        index = (index + 1) % hashTableSize;
    // Check if the element is present in the hash table
    if(hashTable[index] == s)
        cout << s << " is found!" << endl;
    else
        cout << s << " is not found!" << endl;
}

Quadratic Probing

Quadratic probing is similar to linear probing; the only difference is the interval between successive probes or
entry slots. When the slot at the hashed index for an entry record is already occupied, you start traversing
until you find an unoccupied slot. The interval between slots is computed by adding the successive values of a
quadratic polynomial (here the squares 1², 2², 3², and so on) to the index.

Let us assume that the hashed index for an entry is index and at index there is an occupied slot. The probe
sequence will be as follows:

index = index % hashTableSize

index = (index + 1²) % hashTableSize
index = (index + 2²) % hashTableSize
index = (index + 3²) % hashTableSize

and so on…

As written, each step updates index, which matches the implementation below, so the squares accumulate: the
offsets from the original hashed index are 1, 5, 14, and so on.

Implementation of hash table with quadratic probing

Assumption

• There are no more than 20 elements in the data set.
• Hash function will return an integer from 0 to 19.
• Data set must have unique elements.

string hashTable[21];
int hashTableSize = 21;

Insert

void insert(string s)
{
    // Compute the index using the hash function
    int index = hashFunc(s);
    // Probe for an unused slot, adding the next square at each step and
    // wrapping around when the index reaches hashTableSize.
    // Caution: this sequence does not visit every slot of a 21-slot table,
    // so the loop is not guaranteed to terminate when the table is nearly full.
    int h = 1;
    while(hashTable[index] != "")
    {
        index = (index + h*h) % hashTableSize;
        h++;
    }
    hashTable[index] = s;
}

Search

void search(string s)
{
    // Compute the index using the hash function
    int index = hashFunc(s);
    // Follow the same probe sequence until the element or an unused slot is found
    int h = 1;
    while(hashTable[index] != s && hashTable[index] != "")
    {
        index = (index + h*h) % hashTableSize;
        h++;
    }
    // Check if the element is present in the hash table
    if(hashTable[index] == s)
        cout << s << " is found!" << endl;
    else
        cout << s << " is not found!" << endl;
}

Double hashing

Double hashing is similar to linear probing and the only difference is the interval between successive probes.
Here, the interval between probes is computed by using two hash functions.

Let us say that the hashed index for an entry record is an index that is computed by one hashing function and the
slot at that index is already occupied. You must start traversing in a specific probing sequence to look for an
unoccupied slot. The probing sequence will be:

index = (index + 1 * indexH) % hashTableSize;

index = (index + 2 * indexH) % hashTableSize;

and so on…

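Here, index is the originally hashed index and indexH is the hash value computed by the second hash function.
indexH must not be 0, otherwise the probe never leaves the occupied slot.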

Implementation of hash table with double hashing

Assumption

• There are no more than 20 elements in the data set.
• hashFunc1 will return an integer from 0 to 19. hashFunc2 will return an integer from 1 to 19, because a
step of 0 would never leave the occupied slot.
• Data set must have unique elements.

string hashTable[21];
int hashTableSize = 21;

Insert

void insert(string s)
{
    // Compute the index using the first hash function
    // and the probe step using the second
    int index = hashFunc1(s);
    int indexH = hashFunc2(s);
    // Probe for an unused slot in steps of indexH, wrapping around
    // when the index reaches hashTableSize.
    // Caution: if indexH shares a factor with hashTableSize (for example 3 or 7
    // with 21), only some slots are visited and the loop may not terminate.
    while(hashTable[index] != "")
        index = (index + indexH) % hashTableSize;
    hashTable[index] = s;
}

Search

void search(string s)
{
    // Compute the index and the probe step using the two hash functions
    int index = hashFunc1(s);
    int indexH = hashFunc2(s);
    // Follow the same probe sequence until the element or an unused slot is found
    while(hashTable[index] != s && hashTable[index] != "")
        index = (index + indexH) % hashTableSize;
    // Check if the element is present in the hash table
    if(hashTable[index] == s)
        cout << s << " is found!" << endl;
    else
        cout << s << " is not found!" << endl;
}
