UNIT V - Hashing

Uploaded by

VVM

What is Hashing?

▪ Hashing is the process of mapping a large number of data items to a smaller table with the help of a hash function.
▪ The function that performs this mapping is also called a hashing algorithm or, in cryptography, a message digest function.
▪ It is a technique to convert a range of key values into a range of indexes of an array.
▪ It allows faster searching than linear search, O(n), or binary search, O(log n).
▪ Hashing allows a data entry to be updated and retrieved in constant time, O(1), on average.
▪ Constant time O(1) means the cost of the operation does not depend on the size of the data.
▪ Hashing is used with databases to enable items to be retrieved more quickly.
▪ It is used in creating and verifying digital signatures.

Hashing is an important technique designed to solve the problem of efficiently storing and finding data in an array.

Example:

If you have a list of 20000 numbers and are given a number to search for in that list, you would scan each number in the list until you find a match. Searching the entire list to locate that specific number takes a significant amount of time. This sequential scan is both time-consuming and inefficient. With hashing, you can narrow down the search and go almost directly to the location where the number is stored.

Examples of Hashing in Data Structure

The following are real-life examples of hashing in the data structure –

• In schools, the teacher assigns a unique roll number to each student. Later, the teacher
uses that roll number to retrieve information about that student.

• A library has a very large number of books. The librarian assigns a unique number to each book. This unique number helps in identifying the position of the book on the bookshelf.

What is Hash Function?

▪ A fixed process that converts a key to a hash key is known as a hash function.
▪ This function takes a key and maps it to a value of a certain length, which is called a hash value or hash.
▪ The hash value represents the original string of characters, but it is normally smaller than the original.
▪ When a message is digitally signed, the hash function is applied to the message, and both the hash value and the signature are sent to the receiver. The receiver uses the same hash function to generate the hash value and then compares it with the one received with the message.
▪ If the hash values are the same, the message was transmitted without errors.

What is Hash Table?

▪ A hash table, or hash map, is a data structure used to store key-value pairs.
▪ It is a collection of items stored so that they are easy to find later.
▪ It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
▪ With chaining, it is an array of lists, where each list is known as a bucket.
▪ It stores each value based on its key.
▪ In Java, the Hashtable class implements the Map interface and extends the Dictionary class.
▪ Java's Hashtable is synchronized, and its keys are unique.
▪ The above figure shows a hash table of size n = 10. Each position of the hash table is called a slot. The table has n slots, named {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}: slot 0, slot 1, slot 2, and so on. The hash table contains no items yet, so every slot is empty.
▪ The mapping between an item and the slot where that item belongs in the hash table is called the hash function. The hash function takes any item in the collection and returns an integer in the range of slot names, from 0 to n-1.
▪ Suppose we have integer items {26, 70, 18, 31, 54, 93}. One common method of determining a hash key is the division method of hashing, and the formula is:

Hash Key = Key Value % Number of Slots in the Table

Hashing in a data structure is a two-step process.

1. The hash function converts the item into a small integer or hash value. This integer is
used as an index to store the original data.

2. It stores the data in a hash table. You can use a hash key to locate data quickly.
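
The two steps can be sketched in a few lines of Python, using the items {26, 70, 18, 31, 54, 93} and a table of 10 slots. This is a minimal illustration that assumes no two items share a slot; the names `hash_key` and `table` are illustrative choices, not part of the notes.

```python
def hash_key(key_value, num_slots):
    """Step 1: the division method turns an item into a slot index."""
    return key_value % num_slots

num_slots = 10
table = [None] * num_slots          # every slot starts empty

for item in [26, 70, 18, 31, 54, 93]:
    # Step 2: store the item at the index computed in step 1.
    table[hash_key(item, num_slots)] = item

print(table)
# [70, 31, None, 93, 54, None, 26, None, 18, None]

# Lookup repeats step 1 and inspects a single slot.
print(table[hash_key(54, num_slots)] == 54)   # True
```

A lookup never scans the table: it recomputes the index and inspects exactly one slot.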

Note: Why do we need hashing?

Many applications deal with lots of data
- Search engines and web pages
- There are myriad lookups.
- The lookups are time critical.
- Typical data structures like arrays and lists may not be sufficient to handle efficient lookups.
- In general: when lookups need to occur in near constant time, O(1).
- We need something that can do better than a binary search, O(log N). We want O(1).

Solution: Hashing

Division method
Choose a number m larger than the number n of keys in K. (The number m is usually chosen to be a prime number or a number without small divisors, since this frequently minimizes the number of collisions.) The hash function H is defined by
H(k) = k (mod m) or H(k) = k (mod m) + 1
Here k (mod m) denotes the remainder when k is divided by m. The second formula is used when we want the hash addresses to range from 1 to m rather than from 0 to m-1.

Midsquare method
The key k is squared. Then the hash function H is defined by

H(k) = l

where l is obtained by deleting digits from both ends of k^2. We emphasize that the same positions of k^2 must be used for all of the keys.

Folding method
The key k is partitioned into a number of parts, k1, ..., kr, where each part, except possibly the last, has the same number of digits as the required address. Then the parts are added together, ignoring the last carry. That is,
H(k) = k1 + k2 + ... + kr
where the leading-digit carries, if any, are ignored. Sometimes, for extra "milling", the even-numbered parts, k2, k4, ..., are each reversed before the addition.

Example
Consider the company in the above Example, each of whose 68 employees is assigned a
unique 4-digit employee number. Suppose L consists of 100 two-digit addresses: 00, 01,
02, ......, 99. We apply the above hash functions to each of the following employee
numbers:

Division Method

3205, 7148, 2345

Choose a prime number m close to 99, such as m = 97. Then H(3205) = 4, H(7148) = 67, H(2345) = 17.

That is, dividing 3205 by 97 gives a remainder of 4, dividing 7148 by 97 gives a remainder of 67, and dividing 2345 by 97 gives a remainder of 17. In the case that the memory addresses begin with 01 rather than 00, we choose the function H(k) = k (mod m) + 1 to obtain:

H(3205)=4+1=5, H(7148)=67+1=68, H(2345)=17+1=18
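
The arithmetic above can be checked with a short Python sketch. The function name `division_hash` and its `one_based` flag are illustrative assumptions used only to switch between the two formulas.

```python
def division_hash(k, m, one_based=False):
    """Division method: H(k) = k mod m, or k mod m + 1 for addresses 1..m."""
    h = k % m
    return h + 1 if one_based else h

keys = [3205, 7148, 2345]
zero_based = [division_hash(k, 97) for k in keys]
shifted = [division_hash(k, 97, one_based=True) for k in keys]

print(zero_based)  # [4, 67, 17]
print(shifted)     # [5, 68, 18]
```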

Midsquare method

The following calculations are performed:

k:      3205          7148          2345

k^2:    10 272 025    51 093 904    5 499 025

H(k):   72            93            99

Observe that the fourth and fifth digits, counting from the right, are chosen for the hash
address.
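
A minimal Python sketch of this choice of digits follows. It assumes, as in the example, 4-digit keys and a 2-digit address taken from the fourth and fifth digits of k^2 counting from the right; the name `midsquare_hash` is illustrative.

```python
def midsquare_hash(k):
    """Midsquare method for the example: square the key, then keep the
    4th and 5th digits of the square, counting from the right."""
    squared = k * k
    # Integer division by 1000 drops the three rightmost digits;
    # the remainder modulo 100 keeps the next two.
    return (squared // 1000) % 100

addresses = [midsquare_hash(k) for k in [3205, 7148, 2345]]
print(addresses)  # [72, 93, 99]
```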

Folding method

Chopping the key k into two parts and adding yields the following hash addresses:

H(3205) = 32 + 05 = 37, H(7148) = 71 + 48 = 19, H(2345) = 23 + 45 = 68

Observe that the leading digit 1 in H(7148) is ignored (71 + 48 = 119). Alternatively, one may want to reverse the second part before adding, thus producing the following hash addresses:

H(3205) = 32 + 50 = 82, H(7148) = 71 + 84 = 55, H(2345) = 23 + 54 = 77
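
Both folding variants can be reproduced with the Python sketch below. It assumes 4-digit keys split into two 2-digit parts, as in the example; the function name and the `reverse_second_part` flag are illustrative.

```python
def folding_hash(k, reverse_second_part=False):
    """Folding method for 4-digit keys and 2-digit addresses."""
    digits = f"{k:04d}"                 # e.g. 3205 -> "3205"
    first, second = digits[:2], digits[2:]
    if reverse_second_part:
        second = second[::-1]           # e.g. "05" -> "50"
    # Taking the sum modulo 100 ignores the leading-digit carry.
    return (int(first) + int(second)) % 100

keys = [3205, 7148, 2345]
plain = [folding_hash(k) for k in keys]
reversed_fold = [folding_hash(k, reverse_second_part=True) for k in keys]

print(plain)          # [37, 19, 68]
print(reversed_fold)  # [82, 55, 77]
```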

Collision Resolution

Collisions occur when the hash function maps two different keys to the same location. Two records cannot be stored in the same location. Suppose we want to add a new record R with key k to our file F, but the memory location address H(k) is already occupied. This situation is called a collision.

A method used to solve the problem of collisions, called a collision resolution technique, is therefore applied. The two most popular methods of resolving collisions are:

1. Chaining

2. Open addressing

Separate Chaining:
The idea is to make each cell of the hash table point to a linked list of records that have the same hash function value.
Let us consider a simple hash function, "key mod 7", and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
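
The resulting chains can be traced with the Python sketch below. As a simplifying assumption, each chain is a Python list rather than a hand-built linked list; the function names are illustrative.

```python
def build_chained_table(keys, m):
    """Insert each key at the end of the chain for slot key mod m."""
    table = [[] for _ in range(m)]
    for key in keys:
        table[key % m].append(key)
    return table

def search(table, key):
    """Only the chain for the key's slot has to be scanned."""
    return key in table[key % len(table)]

table = build_chained_table([50, 700, 76, 85, 92, 73, 101], 7)
for slot, chain in enumerate(table):
    print(slot, chain)
# 0 [700]
# 1 [50, 85, 92]
# 2 []
# 3 [73, 101]
# 4 []
# 5 []
# 6 [76]

print(search(table, 92), search(table, 93))  # True False
```

Keys 50, 85, and 92 all leave remainder 1 when divided by 7, so they share one chain; the same holds for 73 and 101 in slot 3.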
Advantages:
1) Simple to implement.
2) Hash table never fills up, we can always add more elements to the chain.
3) Less sensitive to the hash function or load factors.
4) It is mostly used when it is unknown how many and how frequently keys may be inserted or
deleted.

Disadvantages:
1) Cache performance of chaining is not good as keys are stored using a linked list. Open
addressing provides better cache performance as everything is stored in the same table.
2) Wastage of Space (Some Parts of hash table are never used)
3) If the chain becomes long, then search time can become O(n) in the worst case.
4) Uses extra space for links.

Performance of Chaining:
Performance of hashing can be evaluated under the assumption that each key is equally likely
to be hashed to any slot of table (simple uniform hashing).

m = Number of slots in hash table

n = Number of keys to be inserted in hash table

Load factor α = n/m

Expected time to search = O(1 + α)

Expected time to delete = O(1 + α)

Time to insert = O(1)

The time complexity of search, insert, and delete is O(1) if α is O(1).

Open Addressing:

Linear Probing and Modifications

The hash table contains two types of values: sentinel values (e.g., –1) and data values. The presence of a sentinel value indicates that the location contains no data value at present but can be used to hold a value.

When a key is mapped to a particular memory location, the value it holds is checked. If it contains a sentinel value, then the location is free and the data value can be stored in it. However, if the location already has some data value stored in it, then other slots are examined systematically in the forward direction to find a free slot. If not even a single free location is found, then we have an OVERFLOW condition.

The process of examining memory locations in the hash table is called probing. Open addressing can be implemented using linear probing, quadratic probing, double hashing, and rehashing.

Linear Probing

The simplest approach to resolve a collision is linear probing. In this technique, if a value is already stored at a location generated by h(k), then the following hash function is used to resolve the collision:

h(k, i) = [h'(k) + i] mod m

where m is the size of the hash table, h'(k) = (k mod m), and i is the probe number that varies from 0 to m–1.

Therefore, for a given key k, first the location generated by [h'(k) mod m] is probed, because for the first probe i = 0. If the location is free, the value is stored in it; otherwise the second probe generates the address of the location given by [h'(k) + 1] mod m. Similarly, if that location is occupied, then subsequent probes generate the addresses

[h'(k) + 2] mod m, [h'(k) + 3] mod m, [h'(k) + 4] mod m, [h'(k) + 5] mod m, and so on, until a free location is found.

Note: Linear probing is known for its simplicity. When we have to store a value, we try the slots [h'(k)] mod m, [h'(k) + 1] mod m, [h'(k) + 2] mod m, and so on, until a vacant location is found.

Example
Consider a hash table of size 10. Using linear probing, insert the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table.
Let h'(k) = k mod m, m = 10
Initially, the hash table can be given as:

Slot:   0   1   2   3   4   5   6   7   8   9
Value:  –1  –1  –1  –1  –1  –1  –1  –1  –1  –1
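
The insertions for this example can be traced with the Python sketch below. It uses –1 as the sentinel value, as described earlier; the names `EMPTY` and `linear_probe_insert` are illustrative.

```python
EMPTY = -1  # sentinel value marking a free slot

def linear_probe_insert(table, key):
    """Probe h'(k), h'(k)+1, h'(k)+2, ... (mod m) until a free slot is found."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m
        if table[slot] == EMPTY:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")

table = [EMPTY] * 10
placements = {k: linear_probe_insert(table, k)
              for k in [72, 27, 36, 24, 63, 81, 92, 101]}

print(placements)
# {72: 2, 27: 7, 36: 6, 24: 4, 63: 3, 81: 1, 92: 5, 101: 8}
print(table)
# [-1, 81, 72, 63, 24, 92, 36, 27, 101, -1]
```

The first six keys land in their home slots. Key 92 finds slots 2, 3, and 4 occupied and settles in slot 5; key 101 starts at slot 1 and needs eight probes before reaching the free slot 8, which illustrates how clusters lengthen later insertions.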

One main disadvantage of linear probing is that records tend to cluster, that is, appear next to one another, when the load factor is greater than 50 percent. Such clustering substantially increases the average search time for a record. Two techniques to minimize clustering are as follows:

Quadratic probing
Suppose a record R with key k has the hash address H(k) = h. Then, instead of searching the locations with addresses h, h+1, h+2, ..., we linearly search the locations with addresses
h, h+1, h+4, h+9, h+16, ..., h+i^2, ...
If the number m of locations in the table T is a prime number, then the above sequence will access half of the locations in T.

Double hashing
Here a second hash function H' is used for resolving a collision, as follows. Suppose a record R with key k has the hash addresses H(k) = h and H'(k) = h' ≠ m. Then we linearly search the locations with addresses
h, h+h', h+2h', h+3h', ...

ADVANTAGES:

Linear probing finds an empty location by doing a linear search in the array beginning from position h(k). The algorithm provides good memory caching through good locality of reference.

DISADVANTAGES:

Linear probing results in clustering, and thus there is a higher risk of more collisions where one collision has already taken place. The performance of linear probing is sensitive to the distribution of input values.

As the hash table fills, clusters of consecutive cells are formed and the time required for a search increases with the size of the cluster.

Quadratic Probing
In this technique, if a value is already stored at a location generated by h(k), then the following
hash function is used to resolve the collision:
h(k, i) = [h'(k) + c1*i + c2*i^2] mod m
where m is the size of the hash table, h'(k) = (k mod m), i is the probe number that varies from 0 to m–1, and c1 and c2 are constants such that c1 ≠ 0 and c2 ≠ 0.
Quadratic probing eliminates the primary clustering phenomenon of linear probing because instead of doing a linear search, it does a quadratic search.
For a given key k, first the location generated by h'(k) mod m is probed. If the location is free, the value is stored in it; otherwise subsequent locations probed are offset by factors that depend in a quadratic manner on the probe number i.

Although quadratic probing performs better than linear probing, in order to maximize the
utilization of the hash table, the values of c1, c2, and m need to be constrained.

Example
Consider a hash table of size 10. Using quadratic probing, insert the keys 72, 27, 36, 24, 63, 81, and 101 into the table. Take c1 = 1 and c2 = 3.
Solution
Let h'(k) = k mod m, m = 10
Initially, the hash table can be given as:

Slot:   0   1   2   3   4   5   6   7   8   9
Value:  –1  –1  –1  –1  –1  –1  –1  –1  –1  –1
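
A Python sketch of the insertions for this example follows, with c1 = 1 and c2 = 3 and –1 as the sentinel. The function name is illustrative, and returning None when no free slot is reached within m probes is an assumption of this sketch.

```python
EMPTY = -1  # sentinel value marking a free slot

def quadratic_probe_insert(table, key, c1=1, c2=3):
    """Probe [h'(k) + c1*i + c2*i^2] mod m for i = 0, 1, ..., m-1."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + c1 * i + c2 * i * i) % m
        if table[slot] == EMPTY:
            table[slot] = key
            return slot
    return None  # no free slot reached within m probes

table = [EMPTY] * 10
for key in [72, 27, 36, 24, 63, 81, 101]:
    quadratic_probe_insert(table, key)

print(table)
# [-1, 81, 72, 63, 24, 101, 36, 27, -1, -1]
```

Only key 101 collides: its home slot 1 is taken by 81, and the next probe, (1 + 1 + 3) mod 10 = 5, is free.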

In double hashing, if m is a prime number, then the probe sequence h, h+h', h+2h', h+3h', ... will access all the locations in the table T.

Remark: One major disadvantage in any type of open addressing procedure is in the implementation of deletion. Specifically, suppose a record R is deleted from the location T[r]. Afterwards, suppose we meet T[r] while searching for another record R'. This does not necessarily mean that the search is unsuccessful. Thus, when deleting the record R, we must label the location T[r] to indicate that it previously contained a record.

ADVANTAGES

Quadratic probing resolves the primary clustering problem that exists in the linear probing technique. Quadratic probing provides good memory caching because it preserves some locality of reference.

DISADVANTAGES

Quadratic probing suffers from secondary clustering. It means that if there is a collision between two keys, then the same probe sequence will be followed for both. With quadratic probing, the probability of multiple collisions increases as the table becomes full. This situation is usually encountered when the hash table is more than half full.

Double Hashing

In double hashing, we use two hash functions rather than a single function. The hash function in the case of double hashing can be given as:

h(k, i) = [h1(k) + i*h2(k)] mod m

where m is the size of the hash table, h1(k) and h2(k) are two hash functions given as h1(k) = k mod m and h2(k) = k mod m', i is the probe number that varies from 0 to m–1, and m' is chosen to be less than m. We can choose m' = m–1 or m–2.

When we have to insert a key k in the hash table, we first probe the location given by [h1(k) mod m], because during the first probe i = 0. If the location is vacant, the key is inserted into it; otherwise subsequent probes generate locations that are at an offset of [h2(k) mod m] from the previous location. Since the offset depends on the key through the second hash function, the performance of double hashing is very close to the performance of the ideal scheme of uniform hashing.

Example

Consider a hash table of size 10. Using double hashing, insert the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table. Take h1 = (k mod 10) and h2 = (k mod 8).
Solution
Let m = 10

Initially, the hash table can be given as:

Slot:   0   1   2   3   4   5   6   7   8   9
Value:  –1  –1  –1  –1  –1  –1  –1  –1  –1  –1
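
The Python sketch below traces this example with h1 = k mod 10 and h2 = k mod 8. The function name, the `m_prime` parameter, and the choice to return None after m unsuccessful probes are assumptions of this sketch.

```python
EMPTY = -1  # sentinel value marking a free slot

def double_hash_insert(table, key, m_prime=8):
    """Probe [h1(k) + i*h2(k)] mod m for i = 0, 1, ..., m-1."""
    m = len(table)
    h1 = key % m
    h2 = key % m_prime
    for i in range(m):
        slot = (h1 + i * h2) % m
        if table[slot] == EMPTY:
            table[slot] = key
            return slot
    return None  # no free slot reached within m probes

table = [EMPTY] * 10
results = {k: double_hash_insert(table, k)
           for k in [72, 27, 36, 24, 63, 81, 92, 101]}

print(table)         # [92, 81, 72, 63, 24, -1, 36, 27, -1, -1]
print(results[92])   # 0
print(results[101])  # None
```

Key 92 collides at slots 2 and 6 and is stored in slot 0 on the third probe. Key 101 has h1 = 1 and h2 = 5, so with m = 10 its probes only alternate between the occupied slots 1 and 6 and never reach a free slot. This shows why m is normally chosen to be a prime number: with m = 10 the step 5 shares a factor with the table size.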

ADVANTAGES

Double hashing minimizes repeated collisions and the effects of clustering. That is, double
hashing is free from problems associated with primary clustering as well as secondary
clustering.

Rehashing

When the hash table becomes nearly full, the number of collisions increases, thereby degrading the performance of insertion and search operations. In such cases, a better option is to create a new hash table with double the size of the original hash table.

All the entries in the original hash table will then have to be moved to the new hash table. This is done by taking each entry, computing its new hash value, and then inserting it in the new hash table.

Though rehashing seems to be a simple process, it is quite expensive and must therefore not be done frequently. Consider the hash table of size 5 given below.

The hash function used is h(x) = x % 5. Rehash the entries into a new hash table.
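
The rehashing step can be sketched in Python as follows. The contents of the size-5 table are not reproduced in these notes, so the keys 26, 31, 43, and 17 below are hypothetical; resolving collisions by linear probing and using –1 as the sentinel are likewise assumptions of this sketch.

```python
EMPTY = -1  # sentinel value marking a free slot

def insert(table, key):
    """Insert with h(x) = x % m and linear probing (an assumed policy)."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m
        if table[slot] == EMPTY:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")

def rehash(old_table):
    """Create a table of double the size and re-insert every entry."""
    new_table = [EMPTY] * (2 * len(old_table))
    for key in old_table:
        if key != EMPTY:
            insert(new_table, key)   # the hash is recomputed with the new size
    return new_table

old_table = [EMPTY, 26, 31, 43, 17]  # hypothetical size-5 table, h(x) = x % 5
new_table = rehash(old_table)
print(new_table)
# [-1, 31, -1, 43, -1, -1, 26, 17, -1, -1]
```

Every entry is visited once and hashed again with the new table size, which is why rehashing costs time proportional to the number of stored entries.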

COMPARISON BETWEEN SEPARATE CHAINING AND OPEN ADDRESSING

S.No. | Separate Chaining | Open Addressing
1. | Chaining is simpler to implement. | Open addressing requires more computation.
2. | In chaining, the hash table never fills up; we can always add more elements to a chain. | In open addressing, the table may become full.
3. | Chaining is less sensitive to the hash function or load factors. | Open addressing requires extra care to avoid clustering and a high load factor.
4. | Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted. | Open addressing is used when the frequency and number of keys are known.
5. | Cache performance of chaining is not good, as keys are stored using a linked list. | Open addressing provides better cache performance, as everything is stored in the same table.
6. | Wastage of space (some parts of the hash table in chaining are never used). | In open addressing, a slot can be used even if an input doesn't map to it.
7. | Chaining uses extra space for links. | No links in open addressing.
