CSC 302 - Hashing Techniques

This document introduces hashing and hash tables. It discusses hash functions, types of hashing (static and dynamic), hash table operations, common hashing functions like division remainder and folding, and applications of hash tables like in database systems and symbol tables. Hash tables allow accessing data in constant time and are useful when fast lookups are required.

Uploaded by

Matthew Tedunjaiye

Introduction to Hashing & Hashing Techniques

• Review of Searching Techniques

• Introduction to Hashing

• Hash Tables

• Types of Hashing

• Hash Functions

• Applications of Hash Tables

• Problems for which Hash Tables are not suitable

Review of Searching Techniques
• Recall the efficiency of the searching techniques covered earlier.

• The sequential search algorithm takes time proportional to the data size, i.e., O(n).

• Binary search improves on linear search, reducing the search time to O(log n), provided the data is sorted.

• With a BST, O(log n) search efficiency can be obtained, but the worst-case complexity is O(n).

• To guarantee O(log n) search time, BST height balancing is required (e.g., AVL trees).
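
As a reminder of the two array-based techniques reviewed above, here is a minimal Java sketch. The class name SearchReview and the choice of long IDs as keys are assumptions made for illustration; binarySearch assumes its input array is already sorted in ascending order.

```java
class SearchReview {
    // Sequential search: examines elements one by one, so O(n) comparisons.
    static int sequentialSearch(long[] ids, long target) {
        for (int i = 0; i < ids.length; i++) {
            if (ids[i] == target) {
                return i;
            }
        }
        return -1; // not found
    }

    // Binary search on a sorted array: halves the range each step, so O(log n).
    static int binarySearch(long[] sortedIds, long target) {
        int low = 0;
        int high = sortedIds.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2; // avoids int overflow of low + high
            if (sortedIds[mid] == target) {
                return mid;
            } else if (sortedIds[mid] < target) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1; // not found
    }
}
```

Both methods return the index of the target, or -1 when it is absent.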

Review of Searching Techniques (cont'd)
• The efficiency of these search strategies depends on the number of items in the container being searched.

• Search methods whose efficiency is independent of the data size would be better.

• Consider the following Java class that describes a student record:

class StudentRecord {
    String name;   // Student name
    double height; // Student height
    long id;       // Unique id
}

• The id field in this class can be used as a search key for records in the
container.

Introduction to Hashing
• Suppose that we want to store 10,000 student records (each with a 5-digit ID) in a given container.

    - A linked list implementation would take O(n) time.

    - A height-balanced tree would give O(log n) access time.

    - Using an array of size 100,000 (one slot per possible 5-digit ID) would give O(1) access time but would waste a lot of space.

• Is there some way that we could get O(1) access without wasting a lot of space?

• The answer is hashing.

Example 1: Illustrating Hashing

• Use the function h(r) = r.id % 13 to load the following records into an array of size 13.

Name                               Height   ID
Al-Otaibi, Ziyad                   1.73     985926
Al-Turki, Musab Ahmad Bakeer       1.60     970876
Al-Saegh, Radha Mahdi              1.58     980962
Al-Shahrani, Adel Saad             1.80     986074
Al-Awami, Louai Adnan Muhammad     1.73     970728
Al-Amer, Yousuf Jauwad             1.66     994593
Al-Helal, Husain Ali AbdulMohsen   1.70     996321

Example 1: Illustrating Hashing (cont'd)

Name                               ID       h(r) = id % 13
Al-Otaibi, Ziyad                   985926   6
Al-Turki, Musab Ahmad Bakeer       970876   10
Al-Saegh, Radha Mahdi              980962   8
Al-Shahrani, Adel Saad             986074   11
Al-Awami, Louai Adnan Muhammad     970728   5
Al-Amer, Yousuf Jauwad             994593   2
Al-Helal, Husain Ali AbdulMohsen   996321   1

Resulting array (index: ID; indices 0, 3, 4, 7, 9, and 12 stay empty):
1: 996321   2: 994593   5: 970728   6: 985926   8: 980962   10: 970876   11: 986074
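
The loading step of Example 1 can be sketched in Java as follows. The class and method names (HashLoadDemo, hash, load) are illustrative assumptions, and the sketch stores only the IDs rather than full records. It does not handle collisions: a later ID that hashes to an occupied slot would overwrite the earlier one, which does not happen for the seven IDs of the example.

```java
class HashLoadDemo {
    static final int TABLE_SIZE = 13;

    // h(r) = r.id % 13, as in Example 1.
    static int hash(long id) {
        return (int) (id % TABLE_SIZE);
    }

    // Places each ID at index hash(id). There is no collision handling here:
    // a later ID with the same hash value overwrites the earlier one.
    static long[] load(long[] ids) {
        long[] table = new long[TABLE_SIZE]; // 0 marks an empty slot
        for (long id : ids) {
            table[hash(id)] = id;
        }
        return table;
    }

    public static void main(String[] args) {
        long[] ids = {985926L, 970876L, 980962L, 986074L, 970728L, 994593L, 996321L};
        long[] table = load(ids);
        for (int i = 0; i < table.length; i++) {
            String slot = (table[i] == 0) ? "(empty)" : String.valueOf(table[i]);
            System.out.println(i + ": " + slot);
        }
    }
}
```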

Hash Tables
• There are two types of Hash Tables: Open-addressed Hash Tables and Separate-Chained Hash Tables.

• An Open-addressed Hash Table is a one-dimensional array indexed by integer values that are computed by an index function called a hash function.

• A Separate-Chained Hash Table is a one-dimensional array of linked lists indexed by integer values that are computed by an index function called a hash function.

• Hash tables are sometimes referred to as scatter tables.

• Typical hash table operations are:

    - Initialization
    - Insertion
    - Searching
    - Deletion
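
The four operations listed above can be illustrated with a minimal separate-chained table in Java. This is a sketch under assumptions, not a reference implementation: the class name ChainedIdTable, the use of long IDs as keys, and the hash function id mod table size are all choices made for this example.

```java
import java.util.LinkedList;

class ChainedIdTable {
    private final LinkedList<Long>[] buckets;

    // Initialization: allocate one empty linked list per array slot.
    @SuppressWarnings("unchecked")
    ChainedIdTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Assumed hash function: id mod table size (floorMod keeps it non-negative).
    private int hash(long id) {
        return (int) Math.floorMod(id, (long) buckets.length);
    }

    // Insertion: append to the chain unless the ID is already present.
    boolean insert(long id) {
        LinkedList<Long> chain = buckets[hash(id)];
        if (chain.contains(id)) {
            return false;
        }
        chain.add(id);
        return true;
    }

    // Searching: scan only the chain that the ID hashes to.
    boolean search(long id) {
        return buckets[hash(id)].contains(id);
    }

    // Deletion: remove the ID from its chain if it is there.
    boolean delete(long id) {
        return buckets[hash(id)].remove(Long.valueOf(id));
    }
}
```

Because each slot holds a list, two IDs that hash to the same index can both be stored.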

Types of Hashing
• There are two types of hashing:
1. Static hashing: In static hashing, the hash function maps search-key values to a fixed set of locations.

2. Dynamic hashing: In dynamic hashing, a hash table can grow to handle more items. The associated hash function must change as the table grows.

• The load factor of a hash table is the ratio of the number of keys in the table to the size of the hash table.

• Note: The higher the load factor, the slower the retrieval.

• With open addressing, the load factor cannot exceed 1. With chaining, the load factor often exceeds 1.
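
The load factor definition above is a simple ratio, sketched here in Java. The class and method names are assumptions for illustration, and the key counts in main are made-up figures.

```java
class LoadFactorDemo {
    // Load factor = number of keys stored / number of slots in the table.
    static double loadFactor(int keyCount, int tableSize) {
        return (double) keyCount / tableSize;
    }

    public static void main(String[] args) {
        // Seven keys in a 13-slot table, as in Example 1: below 1.
        System.out.println(loadFactor(7, 13));
        // A chained table may hold more keys than slots: above 1.
        System.out.println(loadFactor(26, 13));
    }
}
```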

Hash Functions
• A hash function, h, is a function that transforms a key from a set K into an index in a table of size n:
    h: K -> {0, 1, ..., n-2, n-1}

• A key can be a number, a string, a record, etc.

• The size of the set of keys, |K|, is usually very large relative to the table size n.

• It is possible for different keys to hash to the same array location.

• This situation is called a collision, and the colliding keys are called synonyms.
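
To make the notion of synonyms concrete, the following Java sketch checks whether two keys collide under the division hash key % n. The names CollisionDemo and areSynonyms are assumptions, and 985939 is a hypothetical ID chosen because it differs from 985926 by exactly 13.

```java
class CollisionDemo {
    // Assumed hash function for this sketch: key % tableSize, for key >= 0.
    static int hash(long key, int tableSize) {
        return (int) (key % tableSize);
    }

    // Two distinct keys are synonyms if they hash to the same index.
    static boolean areSynonyms(long a, long b, int tableSize) {
        return a != b && hash(a, tableSize) == hash(b, tableSize);
    }

    public static void main(String[] args) {
        // 985926 and the hypothetical ID 985939 both hash to 6 in a table of size 13.
        System.out.println(areSynonyms(985926L, 985939L, 13));
    }
}
```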

Hash Functions (cont'd)

• A good hash function should:

    - Minimize collisions.
    - Be easy and quick to compute.
    - Distribute key values evenly in the hash table.
    - Use all the information provided in the key.

Common Hashing Functions
1. Division Remainder (using the table size as the divisor)

• Computes the hash value from the key using the % operator.

• A table size that is a power of 2, such as 32 or 1024, should be avoided: key % 2^k depends only on the k low-order bits of the key, which tends to cause more collisions.

• Also, powers of 10 are not good table sizes when the keys are decimal integers, because only the low-order decimal digits are then used.

• Prime numbers not close to powers of 2 are better table size values.
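
The effect of the table size on the division-remainder method can be seen in a small Java experiment. The class name DivisionHashDemo, the sample keys (multiples of 32), and the prime table size 97 are assumptions picked for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class DivisionHashDemo {
    // Division-remainder hash; floorMod keeps the result in [0, tableSize).
    static int hash(long key, int tableSize) {
        return (int) Math.floorMod(key, (long) tableSize);
    }

    // Counts how many different slots the given keys occupy.
    static int distinctSlots(long[] keys, int tableSize) {
        Set<Integer> slots = new HashSet<>();
        for (long key : keys) {
            slots.add(hash(key, tableSize));
        }
        return slots.size();
    }

    public static void main(String[] args) {
        long[] keys = {32L, 64L, 96L, 128L, 160L};
        // With table size 32 every key lands in slot 0.
        System.out.println(distinctSlots(keys, 32));
        // With the prime table size 97 the same keys land in five different slots.
        System.out.println(distinctSlots(keys, 97));
    }
}
```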

Common Hashing Functions (cont'd)
2. Truncation or Digit/Character Extraction

• Works based on the distribution of digits or characters in the key.

• The more evenly distributed digit positions are extracted and used for hashing purposes.

• For instance, student IDs or ISBN codes may contain common subsequences, which may increase the likelihood of collision if those positions are used.

• Very fast, but the distribution of digits/characters in the keys may not be very even.
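
A possible Java sketch of digit extraction follows. Which positions to extract depends on the actual key distribution, so the positions used here are purely an assumption; the class and method names are also illustrative. Positions count from the right, with 0 as the units digit, and the key is assumed to be non-negative.

```java
class DigitExtractionDemo {
    // Builds a hash value from selected decimal digit positions of the key.
    // Positions count from the right (0 = units digit); the extracted digits
    // are concatenated in the order the positions are given.
    static int extractDigits(long key, int[] positions) {
        int hash = 0;
        for (int position : positions) {
            long shifted = key;
            for (int i = 0; i < position; i++) {
                shifted /= 10; // drop the digits to the right of the wanted one
            }
            hash = hash * 10 + (int) (shifted % 10);
        }
        return hash;
    }

    public static void main(String[] args) {
        // Digits at positions 4, 2 and 0 of 985926 are 8, 9 and 6.
        System.out.println(extractDigits(985926L, new int[]{4, 2, 0})); // prints 896
    }
}
```

Extracting three digits yields a value between 0 and 999, suitable for a table of size 1000.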

Common Hashing Functions (cont'd)
3. Folding

• It involves splitting keys into two or more parts and then combining the parts to form the hash addresses.

• To map the key 25936715 to a range between 0 and 9999, we can:

    - split the number into two parts, 2593 and 6715, and
    - add these two to obtain 9308 as the hash value.

• Very useful if we have keys that are very large.

• Fast and simple, especially with bit patterns.

• A great advantage is the ability to transform non-integer keys into integer values.
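
The folding example above can be sketched in Java as follows. The class name FoldingDemo is an assumption, as is the final reduction modulo 10000, which the slide does not state but which is needed to stay within 0 to 9999 when the sum of the parts overflows four digits. The key is assumed to be non-negative.

```java
class FoldingDemo {
    // Splits the key into 4-digit groups starting from the right, adds the
    // groups, and reduces the sum modulo 10000 to stay within 0..9999.
    static int fold(long key) {
        long sum = 0;
        while (key > 0) {
            sum += key % 10_000; // lowest remaining 4-digit group
            key /= 10_000;       // drop that group
        }
        return (int) (sum % 10_000);
    }

    public static void main(String[] args) {
        System.out.println(fold(25936715L)); // 2593 + 6715, prints 9308
    }
}
```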

Common Hashing Functions (cont'd)
4. Radix Conversion

• Transforms a key into another number base to obtain the hash value.

• Typically uses a number base other than base 10 and base 2 to calculate the hash addresses.

• To map the key 55354 to the range 0 to 9999 using base 11, we have:

    55354 (base 10) = 38652 (base 11)

• We may truncate the high-order digit 3 to yield 8652 as our hash address within 0 to 9999.
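
The radix-conversion example can be reproduced with the Java sketch below. The names RadixHashDemo and radixHash are assumptions. The sketch assumes a non-negative key and a range that is a power of 10, so that truncating high-order digits is the same as taking the value modulo the range.

```java
class RadixHashDemo {
    // Rewrites the key in the given radix, reads the resulting digits as a
    // decimal number, and keeps only the low-order digits below `range`.
    // Assumes key >= 0, radix >= 2, and range a power of 10 (e.g. 10000).
    static int radixHash(long key, int radix, int range) {
        long value = 0;
        long place = 1; // decimal weight of the next digit: 1, 10, 100, ...
        while (key > 0 && place < range) {
            value += (key % radix) * place; // next digit of key in `radix`
            place *= 10;
            key /= radix;
        }
        return (int) (value % range);
    }

    public static void main(String[] args) {
        // 55354 in base 11 is 38652; dropping the high-order 3 leaves 8652.
        System.out.println(radixHash(55354L, 11, 10_000)); // prints 8652
    }
}
```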

Common Hashing Functions (cont'd)

5. Mid-Square

• The key is squared and the middle part of the result is taken as the hash value.

• To map the key 3121 into a hash table of size 1000, we square it, 3121^2 = 9740641, and extract the middle digits 406 as the hash value.

• Works well if the keys do not contain a lot of leading or trailing zeros.

• Non-integer keys have to be preprocessed to obtain corresponding integer values.
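
A minimal Java sketch of the mid-square method, working on the decimal digits of the square, is shown below. The class and method names are assumptions. BigInteger is used only so that squaring a large key cannot overflow, and the number of digits to extract is assumed to be at most 9 so the result fits in an int.

```java
import java.math.BigInteger;

class MidSquareDemo {
    // Squares the key and returns the middle `digits` decimal digits of the
    // square. If the square has no more than `digits` digits, it is returned whole.
    static int midSquare(long key, int digits) {
        String squared = BigInteger.valueOf(key).pow(2).toString();
        if (squared.length() <= digits) {
            return Integer.parseInt(squared);
        }
        int start = (squared.length() - digits) / 2;
        return Integer.parseInt(squared.substring(start, start + digits));
    }

    public static void main(String[] args) {
        // 3121 squared is 9740641; its middle three digits are 406.
        System.out.println(midSquare(3121L, 3)); // prints 406
    }
}
```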

Common Hashing Functions (cont'd)
6. Use of a Random-Number Generator

• Given a seed as a parameter (typically derived from the key), the method generates a pseudo-random number.

• The algorithm must ensure that:

    - It always generates the same random value for a given key.
    - It is unlikely for two keys to yield the same random value.

• The random number produced can be transformed to produce a valid hash value.
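
One way to sketch this idea in Java is to seed java.util.Random with the key, since a Random object created with the same seed always produces the same sequence. Using the key itself as the seed, and taking the first nextInt(tableSize) value as the hash, are assumptions of this sketch; the class name is illustrative.

```java
import java.util.Random;

class RandomHashDemo {
    // Seeds a generator with the key and uses its first bounded value as the
    // hash. The same key gives the same seed, hence the same hash value.
    static int hash(long key, int tableSize) {
        Random generator = new Random(key);
        return generator.nextInt(tableSize); // value in [0, tableSize)
    }

    public static void main(String[] args) {
        // Repeated calls with the same key return the same index.
        System.out.println(hash(985926L, 13) == hash(985926L, 13)); // prints true
    }
}
```

Here nextInt(tableSize) performs the transformation into a valid table index.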

Some Applications of Hash Tables
• Database systems: Specifically, those that require efficient random access. Generally, database systems try to optimize between two types of access methods: sequential and random. Hash tables are an important part of efficient random access because they provide a way to locate data in a constant amount of time on average.

• Symbol tables: The tables used by compilers to maintain information about symbols from a program. Compilers access information about symbols frequently. Therefore, it is important that symbol tables be implemented very efficiently.

• Data dictionaries: Data structures that support adding, deleting, and searching for data. Although the operations of a hash table and a data dictionary are similar, other data structures may also be used to implement data dictionaries. Using a hash table is particularly efficient.

• Network processing algorithms: Hash tables are fundamental components of several network processing algorithms and applications, including route lookup, packet classification, and network monitoring.

• Browser caches: Hash tables are used to implement browser caches.

Problems for Which Hash Tables are not Suitable

1. Problems for which data ordering is required.

Because a hash table is an unordered data structure, certain operations are difficult and expensive. Range queries, proximity queries, selection, and sorted traversals are possible only if the keys are copied into a sorted data structure. There are hash table implementations that keep the keys in order, but they are far from efficient.

2. Problems having multidimensional data.

3. Prefix searching, especially if the keys are long and of variable length.

4. Problems that have dynamic data.

Open-addressed hash tables are based on 1D arrays, which are difficult to resize once they have been allocated, unless you implement the table as a dynamic array and rehash all of the keys whenever the size changes, which is a very expensive operation. An alternative is to use separate-chained hash tables or dynamic hashing.

5. Problems in which the data does not have unique keys.

Open-addressed hash tables cannot be used if the data does not have unique keys. An alternative is to use separate-chained hash tables.
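
To illustrate item 1 above, the Java sketch below answers a range query on a hash table by first copying its keys into a sorted structure. The class name RangeQueryDemo, the use of java.util.HashMap as the hash table, and the map from IDs to names are assumptions for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

class RangeQueryDemo {
    // Copies the unordered hash-table keys into a sorted set (O(n log n)),
    // then answers an inclusive range query on the sorted copy.
    static NavigableSet<Long> idsInRange(Map<Long, String> table, long low, long high) {
        TreeSet<Long> sortedIds = new TreeSet<>(table.keySet());
        return sortedIds.subSet(low, true, high, true);
    }

    public static void main(String[] args) {
        Map<Long, String> table = new HashMap<>();
        table.put(985926L, "Al-Otaibi, Ziyad");
        table.put(970876L, "Al-Turki, Musab Ahmad Bakeer");
        table.put(980962L, "Al-Saegh, Radha Mahdi");
        // IDs between 975000 and 990000, in ascending order.
        System.out.println(idsInRange(table, 975000L, 990000L));
    }
}
```

The copy costs extra time and space on every query unless the sorted structure is maintained alongside the table.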