
File Structures and Advanced Data Structures

UNIT 10 HASHING

Structure
10.0 Introduction
10.1 Objectives
10.2 Drivers and motivations for hashing
10.3 Index Mapping
10.3.1 Challenges with Index Mapping
10.3.2 Hash Function
10.3.3 Simple Hash Example
10.4 Collision Resolution
10.4.1 Separate Chaining
10.4.2 Open Addressing
10.4.3 Double Hashing
10.5 Comparison of Collision Resolution Methods
10.6 Load Factor and Rehashing
10.6.1 Rehashing
10.7 Summary
10.8 Solutions/Answers
10.9 Further Readings

10.0 INTRODUCTION

Hashing is a key technique in information retrieval. It transforms the input data
into a small set of keys that can be stored and retrieved efficiently. Hashing provides
constant-time, highly efficient information retrieval irrespective of the size of the
total search space.

10.1 OBJECTIVES

After going through this unit, you should be able to:

 explain the drivers and motivations for hashing,
 apply various hashing techniques,
 understand index mapping,
 apply collision resolution techniques,
 understand load factor and rehashing.

10.2 MAIN DRIVERS AND MOTIVATIONS FOR HASHING

As part of searching and information retrieval, we use hashing mainly for the
following reasons:
 to provide constant-time data retrieval and insertion,
 to manage the data related to a large set of input keys efficiently,
 to provide cost-efficient hash key computations.

10.3 INDEX MAPPING


In order to store and retrieve a huge set of data, we can think of using a large array
and storing/retrieving the data from it. We can use the data value itself as the key
for the array. As array indexing takes only O(1) time, this method guarantees
constant, fast performance.
The index mapping or trivial hashing method assumes a large array and uses the input
keys as indexes into the array to retrieve the values.
Let’s look at an example of this index mapping method by designing a large array
based on it. We use the array to store the name of the user, and we plan to look up
the user details using their 4-digit employee id. As depicted in Figure 1, we create
a large array to accommodate all the employee ids and use the employee id as an
index to get the user details from the array.
The index mapping approach can be used when we know, or can predict, all the input
keys. For instance, if we were to store details for the months of a year, we know
there can be a maximum of 12 months and hence we can design the hash table with
size 12. Similarly, for input keys such as days of a month, countries, or states
within a country, where we can predict the maximum values, we can use the index
mapping method.

Figure 1 Index mapping approach
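The index mapping idea can be sketched as a small Java class. This is an illustrative sketch: the class name, method names, and the array size (covering every possible 4-digit employee id) are assumptions, not details from Figure 1.

```java
// Index mapping (direct addressing): the 4-digit employee id itself is
// used as the array index, so put and get are single O(1) array accesses.
class DirectAddressTable {
    private final String[] names = new String[10000]; // one slot per possible 4-digit id

    // the employee id IS the array index
    void put(int employeeId, String name) {
        names[employeeId] = name;
    }

    String get(int employeeId) {
        return names[employeeId]; // null if no employee is stored at this id
    }
}
```

Note how the array must be sized for the largest possible key, even if only a handful of ids are actually used; this is exactly the wastage discussed in the next section.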

10.3.1 Challenges with Index Mapping


We can observe two main drawbacks with the index mapping approach. Firstly, this
approach accepts only non-negative integer values as index keys, and secondly, as we
can see from Figure 1, the array size poses performance and scalability challenges.
The array has to be sized to accommodate the largest key value, and the values are
unevenly distributed, leading to wastage of space.
To overcome these two limitations, we need a conversion function that converts all
data types (string, image, alphanumeric etc.) into a non-negative integer value.
Additionally, the conversion function maps the input values to a smaller set of keys
that can be used as indexes for a smaller array. We call this conversion function a
“hash function”.
Let’s use a hash function that takes the input value (employee id) and converts it
into a key that forms the index for the hash table storing the employee details.
We have used the below given hash function:
h(x) = x mod 10
where h(x) is the hash value, x is the input key. The mod operation provides the
remainder value for the key. As a result of this hash function we now have the hash
table with optimal size as depicted in Figure 2.

Figure 2 Hash Function and Hash Table

As we can see from Figure 2, instead of creating a huge hash table of size 9875, we
have now managed to store the elements within an array of size 10. The hash function
has converted the input into a smaller set of keys that are used as indexes for the
hash table.

Let us re-look at the two challenges we saw in our earlier direct-access method. The
example we discussed in Figure 2 uses integer values as input keys. If we use non-
integer values such as images or strings, we need to convert them first into a
non-negative integer value. For instance, using the corresponding ASCII values of
each of its characters, we can get a numeric value for a string. We can then use the
hash function to create a fingerprint for the numeric value and store the
corresponding details in the right-sized hash table.
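The string-to-integer conversion just described can be sketched as follows. This is an illustrative sketch only: summing character values is the simplest such conversion, and practical hash functions weight character positions so that anagrams do not collide.

```java
// Convert a string to a non-negative integer by summing its character
// (ASCII) values, then fold the result into the table range with mod.
class StringHash {
    static int toInteger(String s) {
        int sum = 0;
        for (char c : s.toCharArray()) {
            sum += c; // numeric (ASCII) value of each character
        }
        return sum;
    }

    static int hash(String s, int tableSize) {
        return toInteger(s) % tableSize; // h(x) = x mod m
    }
}
```

For example, "abc" converts to 97 + 98 + 99 = 294, which a size-10 table maps to slot 4.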

10.3.2 Hash Function

As we saw in the previous example, a hash function reduces a large non-negative
integer into a smaller hash value, which is used as an index into the hash table for
searching the value.
The efficiency of a hash function is determined by the following characteristics:
 Computing efficiency – The hash function should compute the hash value
quickly and efficiently, even for large key values.
 Uniform distribution – The hash function should distribute the keys evenly
in the hash table.
 Deterministic – The hash function should consistently produce the same key
for a given value.
 Minimal collisions – The hash function should minimize key collisions in
the hash table.

10.3.3 Simple Hash Example


Let us look at the implementation of the main functions using a modulo-based hash
function. We assume no collisions in this Java sample code.

// the function returns the stored value for the input key
public V getValue(int inputKey) {
    int hashvalue = getHashValue(inputKey);
    return this.hashArray[hashvalue].value;
}

// the function adds the value for the given input key
public void addValue(int inputKey, V inputValue) {
    int hashvalue = getHashValue(inputKey);
    this.hashArray[hashvalue].value = inputValue;
}

// the function computes the hash value using mod logic
public int getHashValue(int inputKey) {
    return inputKey % this.hashArray.length;
}

// the function removes the value for the input key from the hashArray
public void removeValue(int inputKey) {
    int hashvalue = getHashValue(inputKey);
    this.hashArray[hashvalue].value = null;
}

10.4 COLLISION RESOLUTION

For a large set of input keys, we might end up with the same hash value for two
different input values. For instance, consider our simple hash function h(x) = x mod 10.
As the hash function provides the remainder value, if we have the two input keys 24
and 4874, the hash value for both will be 4. Such cases cause a collision, as both
input keys 24 and 4874 compete for the same slot in the hash table.
We discuss three key collision resolution techniques in the subsequent sections.

10.4.1 Separate Chaining


In the separate chaining method, we chain the colliding values at the same slot in
the hash table, using a data structure like a linked list that can store multiple
values.

Figure 3 depicts collision handling using the modulo-based hash function. The input
values 0051 and 821 result in the same hash value of 1, so in the hash table we
chain the values for user 2 and user 5 at the same slot.

Figure 3 Separate Chaining

We chain the data values user2 and user5 at slot 1 of the hash table.

Insert Operation
Given below are the steps for the insert operation using separate chaining method to
insert a key k:
1. Compute the hash value of the key k using the hash function.
2. Use the computed hash value as the index in the hash table to find the slot for
the key k.
3. Add the key k to the linked list at the slot.
Search operation
Given below are the steps for the search operation for key k:
1. Compute the hash value of the key k using the hash function.
2. Use the computed hash value as the index in the hash table to find the slot for
searching the key k.
3. Check for all elements in the linked list at the slot for a match with key k.

☞ Check Your Progress – 1


1. _____ provides direct mapping of keys to the hash table index.
2. The two key challenges with index mapping are _______
3. The main characteristics of a hash function are ______.
4. The time complexity of index mapping is ______.
5. ______ data structure can be used to chain multiple values to the same spot in
the hash table.
6. Hash value should always be non-negative. True/False

10.4.2 Open Addressing


In the open addressing collision resolution strategy, we search for the next
available slot in the hash table when the natural slot is not available, i.e., we
search for the open or unused locations in the hash table.

There are mainly two variants of open addressing – linear probing and quadratic
probing. In linear probing, we sequentially step to the next available slot.

We define the linear probe for the ith iteration by the following equation:

h(x,i) = (h(x) + i) mod m, where m is the hash table size

We need to change the other hash table operations such as getHashValue,
putHashValue and deleteHashValue accordingly:

 For getHashValue, we need to start from the initial slot (hashTable[h(x)]) and
check the subsequent slots till we find the matching key.
 For putHashValue, we need to start with the initial slot (hashTable[h(x)]) and
probe till we find an empty slot or till the hash table is full.
 For deleteHashValue, we can place null into the slot or use an availability
marker (occupied, available).

Linear probing leads to a situation known as “primary clustering”, wherein
consecutive slots form a “cluster” of keys in the hash table. As the cluster size
grows, it impacts the efficiency of probing for placing the next key.

Quadratic probing makes larger jumps to avoid primary clustering. We define the
quadratic probe for the ith iteration by the following equation:

h(x,i) = (h(x) + i²) mod m

As we can see from the equation, the quadratic probe makes larger jumps with every
iteration. Quadratic probing, however, encounters the “secondary clustering”
problem, where keys with the same initial slot follow the same probe sequence.

Let us look at an example of linear probing to resolve a collision. Consider the
following set of keys [56, 1072, 97, 84, 60] and a hash table of size 5. When we
apply the mod-based hash function and start placing the keys in the appropriate
slots, we get the placement depicted in Figure 4.

Figure 4 Linear Probing

The value 56 goes to slot 1, as 56 mod 5 = 1. Similarly, 1072 takes slot 2.
However, when we try to place the next element, 97, we get a collision at slot 2,
so we find the next empty slot, slot 3, and place 97 there. The remaining elements
84 and 60 go to slots 4 and 0 respectively, based on their mod values.
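The walk-through above can be reproduced with a short linear probing sketch. The names are illustrative; insert returns the slot a key lands in so the probe sequence can be traced.

```java
// Linear probing: on a collision, probe h(x, i) = (h(x) + i) mod m
// for i = 0, 1, 2, ... until an empty slot is found.
class LinearProbingTable {
    private final Integer[] table;

    LinearProbingTable(int size) {
        table = new Integer[size];
    }

    // returns the slot where the key was placed, or -1 if the table is full
    int insert(int key) {
        int m = table.length;
        for (int i = 0; i < m; i++) {
            int slot = (key % m + i) % m;
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }
}
```

Inserting [56, 1072, 97, 84, 60] into a size-5 table places them in slots 1, 2, 3, 4 and 0, with 97 probing past its natural slot 2 exactly as in Figure 4.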

10.4.3 Double Hashing


We can avoid the challenges of primary clustering and secondary clustering using
the double hashing strategy. Double hashing uses a second hash function to resolve
collisions. The second hash function is different from the primary hash function
and must yield a non-zero value for the key.

The first hash function finds the initial slot for the key, and the second hash
function determines the size of the jumps for the probe. The ith probe is defined
as follows:

h(x,i) = (h1(x) + i * h2(x)) mod m, where m is the hash table size

Let us look at an example of double hashing to understand it better. Consider a
hash table of size 5 and the below given hash functions:
h1(x) = x mod 5
h2(x) = x mod 7

Let us try to insert two elements, 60 and 80, into the hash table. We can place the
first element, 60, at slot 0 based on the first hash function. When we try to
insert the second element, 80, we face a collision at slot 0. For the first
iteration, we apply double hashing as follows:
h(80,1) = (0 + 1*3) mod 5 = 3

Hence, we place the element 80 in slot 3 to resolve the collision, as depicted in
Figure 5.
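The example can be sketched with the two hash functions above. This is an illustrative sketch; note that h2(x) = x mod 7 yields 0 for multiples of 7 (which would stall the probe), so the loop is bounded by the table size as a safeguard.

```java
// Double hashing: the probe sequence is
// h(x, i) = (h1(x) + i * h2(x)) mod m,
// with h1(x) = x mod 5 and h2(x) = x mod 7 as in the example.
class DoubleHashingTable {
    private final Integer[] table = new Integer[5];

    private int h1(int x) { return x % 5; }
    private int h2(int x) { return x % 7; }

    // returns the slot where the key was placed, or -1 if no slot was found
    int insert(int key) {
        int m = table.length;
        for (int i = 0; i < m; i++) {
            int slot = (h1(key) + i * h2(key)) % m;
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }
}
```

Inserting 60 takes slot 0; inserting 80 collides there and jumps by h2(80) = 3 to land in slot 3, matching Figure 5.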


Figure 5 Double Hashing Example

10.5 COMPARISON OF COLLISION RESOLUTION METHODS

A comparison of separate chaining, linear probing, quadratic probing and double
hashing is given below:

                        Separate       Open Addressing -   Open Addressing -    Double
                        Chaining       Linear Probing      Quadratic Probing    Hashing
Primary clustering      No             Yes                 No                   No
Secondary clustering    No             No                  Yes                  No
Key storage             Inside &       Inside hash         Inside hash          Inside hash
                        outside        table               table                table
                        hash table

10.6 LOAD FACTOR AND REHASHING

The hash table provides constant time complexity for operations such as retrieval,
insertion and deletion when the number of keys is small. As the number of keys
grows, we run out of vacant slots in the hash table, leading to collisions that
impact the time complexity. When collisions happen, we need to re-adjust the hash
table size so that we can accommodate additional keys. The load factor defines the
threshold at which we should resize the hash table to maintain the constant time
complexity.

The load factor is the ratio of the number of elements in the hash table to the
total size of the hash table. We define the load factor as follows:

Load factor = (Number of keys stored in the hash table) / (Total size of the hash table)

In open addressing, as all the keys are stored within the hash table, the load
factor is <= 1. In the separate chaining method, as keys can be stored outside the
hash table, the load factor can exceed the value of 1.

If the load factor threshold is 0.75, then as soon as the hash table reaches 75% of
its size, we increase its capacity. For instance, consider a hash table of size 10
with a load factor threshold of 0.75. We can insert seven keys into this hash table
without triggering a resize. As soon as we add the eighth key, the load factor
becomes 0.80, which exceeds the configured threshold and triggers the hash table
resize. We normally double the hash table size during the resize operation.

10.6.1 Rehashing

When the load factor exceeds the configured value, we increase the size of the hash
table. Once we do so, we should also re-compute the hash values of the existing
keys, as the size of the hash table has changed. This process is called
“rehashing”. Rehashing is a costly exercise, especially if the number of keys is
huge. Hence it is necessary to carefully select an optimal initial size for the
hash table.

Given below are the high-level steps for rehashing:

1. For each new key inserted into the hash table, compute the load factor.
2. If the load factor exceeds the pre-defined value, increase the hash table
size (normally we double the hash table size).
3. Recompute the hash value (rehash) for each of the existing elements in the
hash table.

Let us look at rehashing with an example. We have a hash table of size 4 with a
load factor threshold of 0.60. Let’s start by inserting the elements 30, 31 and 32.
We insert 30 at slot 2, 31 at slot 3 and 32 at slot 0. The insertion of 32 triggers
the hash table resize, as the load factor (3/4 = 0.75) has breached the threshold
of 0.60. As a result, we double the hash table size to 8.

With the new hash table size, we need to recalculate the hash values of the already
inserted keys. Key 30 will now be placed in slot 6, key 31 in slot 7 and key 32 in
slot 0.
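The three rehashing steps and the worked example can be sketched together. This is an illustrative, no-collision sketch in the spirit of the earlier sample code; slotOf is a hypothetical helper added only to inspect where keys end up.

```java
// Rehashing: when the load factor exceeds the threshold, double the
// table size and recompute every existing key's slot against the new
// size. Assumes no collisions, as in the worked example.
class RehashingTable {
    private Integer[] table = new Integer[4];
    private int count = 0;
    private static final double THRESHOLD = 0.60;

    void insert(int key) {
        table[key % table.length] = key; // step 1: place the key, track the load factor
        count++;
        if ((double) count / table.length > THRESHOLD) {
            rehash(); // step 2: threshold breached, grow the table
        }
    }

    private void rehash() {
        Integer[] old = table;
        table = new Integer[old.length * 2]; // normally we double the size
        for (Integer key : old) {
            if (key != null) {
                table[key % table.length] = key; // step 3: recompute each slot
            }
        }
    }

    // hypothetical helper: the slot currently holding the key, or -1
    int slotOf(int key) {
        int slot = key % table.length;
        return (table[slot] != null && table[slot] == key) ? slot : -1;
    }
}
```

Inserting 30, 31 and 32 fills slots 2, 3 and 0 of the size-4 table; the third insertion breaches the 0.60 threshold, the table doubles to 8, and the keys are rehashed to slots 6, 7 and 0.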

☞ Check Your Progress – 2


1. The main techniques of open addressing are _____
2. Load factor triggers _______
3. Linear probing leads to ________ clustering
4. _______ probing makes larger jumps, leading to secondary clustering
5. Second hash function in double hashing can result in 0. True/False
6. ______ should be done for the existing keys of the hash table post resizing.

10.7 SUMMARY

In this unit, we discussed the main motivations for hashing. Hashing allows us to
store and retrieve large data sets efficiently.
Index mapping uses the input values as direct indexes into the hash table; it
requires a huge hash table, leading to inefficiencies. When we handle a large set
of input values, we encounter collisions, where multiple input values compete for
the same slot in the hash table. The main collision resolution techniques are
separate chaining and open addressing. In separate chaining, we chain the values
that get mapped to a slot. We use linear probing and quadratic probing as part of
the open addressing technique to find the next available slot, and two hash
functions as part of double hashing. The load factor determines the trigger for the
hash table resize, and once the hash table is resized, we re-compute the hash
values of the existing keys through rehashing.

10.8 SOLUTIONS/ANSWERS

☞ Check Your Progress – 1
1. Index mapping
2. handling non-integer keys and large hash table size
3. computing efficiency, uniform distribution, deterministic and minimal
collisions
4. O(1)
5. Linked List
6. True

☞ Check Your Progress – 2


1. linear probing, quadratic probing and double hashing
2. resizing of hash table
3. primary
4. quadratic
5. False

6. Rehashing

10.9 FURTHER READINGS

Horowitz, Ellis, Sartaj Sahni, and Susan Anderson-Freed. Fundamentals of data
structures. Vol. 20. Potomac, MD: Computer Science Press, 1976.

Cormen, Thomas H., et al. Introduction to algorithms. MIT press, 2022.

Lafore, Robert. Data structures and algorithms in Java. Sams publishing, 2017.

Karumanchi, Narasimha. Data structures and algorithms made easy: data structure
and algorithmic puzzles. Narasimha Karumanchi, 2011.

West, Douglas Brent. Introduction to graph theory. Vol. 2. Upper Saddle River:
Prentice hall, 2001.

https://en.wikipedia.org/wiki/Hash_function#Trivial_hash_function

https://en.wikibooks.org/wiki/A-level_Computing/AQA/Paper_1/Fundamentals_of_data_structures/Hash_tables_and_hashing

https://ieeexplore.ieee.org/book/8039591
