AST20105 Data Structure and Algorithms: Chapter 9 - Hash Table

This document provides an overview of hash tables and related concepts such as hash functions, collisions, load factors, and examples. The key points are: 1) A hash table uses a hash function to map keys to array indexes, which allows fast lookup, but collisions occur when different keys hash to the same index. 2) Common hash functions include the division, folding, and mid-square methods; all aim to distribute hash values uniformly to minimize collisions. 3) The load factor and resizing help manage space usage as more elements are added, while collision resolution methods such as open addressing and closed addressing specify how collisions are handled. 4) Examples demonstrate hashing strings to indexes using ASCII values.

Uploaded by

HANG XU

AST20105

Data Structure and Algorithms

Chapter 9 - Hash Table

Hash Table
● A hash table is a data structure that stores data in an associative manner. Data is stored in an array, and each data value is placed at an index computed from its key. Access to the data is very fast if we know the index of the desired data.
● Thus, insertion and search operations take constant time on average, largely independent of the size of the data, as long as collisions are kept rare.
● A hash table uses an array as its storage medium and uses a hashing technique to generate the index where an element is to be inserted or located.

Hash Function
● The hash function is a very important part of hash table design.
● A hash function, denoted h, provides a method to compute the table index from a key / data.
○ In other words, it maps keys to locations in a hash table.
● A key k hashes to location h(k), where h(k) is the hash value of k.
● Hashing refers to the process of inserting keys / data into a hash table with the help of the hash function.

Hash Function
● A hash function is considered good if it distributes hash values uniformly over the table.
● The hash function is a principal concern because a poor hash function causes many collisions and other unwanted effects, which badly affect the overall performance of the hash table.

Hashing
● Hashing is a technique to convert a range of key values into a range of indexes
of an array.

Load Factor
● The basic underlying data structure used to store a hash table is an array.
● The load factor is the ratio between the number of stored items and the array's size:
Load factor = number of elements in the array / size of the array
● A commonly cited guideline is to keep the load factor below 0.75.
● A hash table can either have a constant size or be dynamically resized when the load factor exceeds some threshold.
● Resizing is done before the table becomes full, to keep the number of collisions under control and prevent performance degradation.
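The load-factor rule can be illustrated with a minimal Python sketch. It is not the course's implementation: the class name `ResizingTable`, the starting size of 8, the 0.75 threshold (taken from the guideline above), doubling as the growth policy, and the use of Python lists as chains are all illustrative assumptions.

```python
class ResizingTable:
    """Hypothetical sketch: a chained hash table that doubles its array
    whenever the next insertion would push the load factor above MAX_LOAD."""

    MAX_LOAD = 0.75  # assumed threshold, following the 0.75 guideline

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]
        self.count = 0

    def load_factor(self):
        # Load factor = number of elements / size of array
        return self.count / len(self.buckets)

    def insert(self, key):
        # Resize *before* the table becomes too full (doubling is an assumed policy)
        if (self.count + 1) / len(self.buckets) > self.MAX_LOAD:
            self._resize(2 * len(self.buckets))
        self.buckets[hash(key) % len(self.buckets)].append(key)
        self.count += 1

    def _resize(self, new_size):
        old_keys = [k for bucket in self.buckets for k in bucket]
        self.buckets = [[] for _ in range(new_size)]
        for k in old_keys:
            # Every key must be re-hashed, because its index depends on the array size
            self.buckets[hash(k) % new_size].append(k)


table = ResizingTable()
for k in range(10):
    table.insert(k)
print(len(table.buckets), table.load_factor())  # 16 0.625
```

The seventh insertion would raise the load factor to 7/8, so the table grows to 16 slots first; after ten insertions the load factor is 10/16.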

Collision
● What happens if the hash function returns the same hash value for different keys?
● This effect is called a collision.
● Solutions:
○ Design a better hash function that can be computed efficiently and minimizes the number of collisions.
○ Design a collision resolution algorithm.

Collision
● Collisions are practically unavoidable and must be considered when implementing a hash table.
● Because of collisions, the keys themselves are also stored in the table, so that key-value pairs having the same hash value can be distinguished.
● There are various ways to resolve collisions. Basically, there are two different strategies:
○ Closed addressing (the element stays at its hash value)
○ Open addressing (the element may move to another location)

Open Hashing (Closed Addressing)
● Each slot of the hash table contains a link to another data structure (e.g. a linked list), which stores the key-value pairs with the same hash.
● When a collision occurs, this data structure is searched for the key-value pair whose key matches.

Closed Hashing (Open Addressing)
● Each slot contains an actual key-value pair.
● When a collision occurs, the open addressing algorithm calculates another location in order to find a free slot.
● Hash tables based on the open addressing strategy suffer a drastic performance decrease when the table is tightly filled (load factor of 0.75 or more).

Closed Hashing vs Open Hashing

Simple Example of Hash Table

Example 1
● Put “ALGORITHMS” into a hash table where the keys are their ASCII values.

Value  Key
A      65
L      76
G      71
O      79
R      82
I      73
T      84
H      72
M      77
S      83

Example 1
● Suppose the size of the hash table is 16.
● Putting “A” into the hash table:
○ Key % arrSize = 65 % 16 = 1
○ “A” is put at index 1
● Putting “L” into the hash table:
○ Key % arrSize = 76 % 16 = 12
○ “L” is put at index 12
● And so forth...
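The “and so forth” step can be completed with a short Python sketch that applies Key % arrSize to every letter. The names `TABLE_SIZE` and `index_of` are illustrative, and no collision handling is included because this word happens not to need any.

```python
TABLE_SIZE = 16  # arrSize from the example


def index_of(ch):
    # key = ASCII value of the character; index = key % arrSize
    return ord(ch) % TABLE_SIZE


table = [None] * TABLE_SIZE
for ch in "ALGORITHMS":
    table[index_of(ch)] = ch  # every index is distinct here, so nothing is overwritten

print({ch: index_of(ch) for ch in "ALGORITHMS"})
# {'A': 1, 'L': 12, 'G': 7, 'O': 15, 'R': 2, 'I': 9, 'T': 4, 'H': 8, 'M': 13, 'S': 3}
```

All ten letters land in different slots, so Example 1 needs no collision handling.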

Example 2
● Put “COMPUTERS” into a hash table where the keys are their ASCII values.

Value  Key
C      67
O      79
M      77
P      80
U      85
T      84
E      69
R      82
S      83
Example 2
● Suppose the size of the hash table is 16.
● Putting “C” into the hash table:
○ Key % arrSize = 67 % 16 = 3
○ “C” is put at index 3
● And so forth...
○ “O” @ 15
○ “M” @ 13
○ “P” @ 0
○ “U” @ 5
○ “T” @ 4
● When putting “E” into the hash table:
○ Key % arrSize = 69 % 16 = 5
○ Slot 5 is already occupied by “U”! Collision handling is required.
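A small Python sketch can detect every collision in Example 2. It performs no collision resolution (it only records where one would be needed), and the function name `find_collisions` is hypothetical.

```python
TABLE_SIZE = 16


def find_collisions(word, size=TABLE_SIZE):
    """Place each character at ord(ch) % size and record collisions.
    No resolution is attempted: a colliding key is simply not stored."""
    table = [None] * size
    collisions = []
    for ch in word:
        idx = ord(ch) % size
        if table[idx] is None:
            table[idx] = ch
        else:
            # slot already occupied: record (new key, index, current occupant)
            collisions.append((ch, idx, table[idx]))
    return table, collisions


table, collisions = find_collisions("COMPUTERS")
print(collisions)  # [('E', 5, 'U'), ('S', 3, 'C')]
```

Besides “E” colliding with “U” at index 5, the last letter “S” (83 % 16 = 3) also collides with “C” at index 3.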
Hash Functions

Hashing Function
● The number of hash functions that can be used to assign positions to n items in a table of m positions (for n ≤ m) is equal to $m^n$.
○ Most of these functions are too unwieldy for practical applications and
cannot be represented by a concise formula.
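The $m^n$ count follows because each of the n items can independently be sent to any of the m positions. A brute-force Python check for small values illustrates this; the function name `count_hash_functions` is hypothetical.

```python
from itertools import product


def count_hash_functions(n, m):
    # Enumerate every assignment (item_1 -> pos, ..., item_n -> pos);
    # each assignment is one possible hash function on these n items.
    return sum(1 for _ in product(range(m), repeat=n))


print(count_hash_functions(2, 3))  # 9, which equals 3 ** 2
```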

Hashing Function - Division
● A hash function must guarantee that the number it returns is a valid index to one of the table cells.
● The simplest way to accomplish this is to use division modulo the table size:
○ TSize = the number of cells in the table, and
○ h(K) = K mod TSize, if K is a number.
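The division method is a one-line function. This Python sketch (the name `h_division` is illustrative) relies on Python's `%`, which returns a result in [0, TSize) for a positive TSize even when the key is negative.

```python
def h_division(key: int, tsize: int) -> int:
    """h(K) = K mod TSize: always a valid index in [0, TSize)."""
    if tsize <= 0:
        raise ValueError("table size must be positive")
    return key % tsize


print(h_division(65, 16))            # 1, as for "A" in Example 1
print(h_division(123456789, 1000))   # 789
```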

Hashing Function - Folding
● In this method, the key is divided into several parts.
● These parts are combined or folded together and are often transformed in a
certain way to create the target address.
● There are two types of folding:
○ Shift folding
○ Boundary folding

Hashing Function - Shift Folding
● In shift folding, the parts are put underneath one another and then added.
● For example, a social security number (SSN)
○ 123456789 can be divided into three parts,
○ 123, 456 and 789, and then these parts can be added.
● The resulting number, 1,368,
○ can be divided modulo TSize, or,
○ if the size of the table is 1,000, its first three digits (136) can be used as the address.
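Shift folding on a digit string can be sketched in Python as follows, assuming the key is given as a string of digits split into 3-digit parts (the part length and the names `split_digits` and `shift_fold` are illustrative choices).

```python
def split_digits(key: str, part_len: int):
    # Split the digit string into consecutive parts of part_len digits
    return [key[i:i + part_len] for i in range(0, len(key), part_len)]


def shift_fold(key: str, part_len: int = 3) -> int:
    # Shift folding: place the parts underneath one another and add them
    return sum(int(p) for p in split_digits(key, part_len))


total = shift_fold("123456789")
print(total)                 # 1368
print(total % 1000)          # 368: divided modulo TSize = 1000
print(int(str(total)[:3]))   # 136: the first three digits
```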

Hashing Function - Boundary Folding
● With boundary folding, the key is seen as written on a piece of paper that is folded on the borders between different parts of the key.
● In this way, every other part is put in reverse order.
● Consider the same three parts of the SSN: 123, 456 and 789.
○ The first part, 123, is taken in the same order,
○ then the piece of paper with the second part is folded underneath it so that 123 is aligned with 654, which is the second part, 456, in reverse order.
○ When the folding continues, 789 is aligned with the two previous parts.
● The result is 123 + 654 + 789 = 1,566, which
○ can be divided modulo TSize, or,
○ if the size of the table is 1,000, its first three digits (156) can be used as the address.
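Boundary folding differs from shift folding only in that every other part is reversed before adding. A minimal Python sketch, again assuming a digit-string key and 3-digit parts (the name `boundary_fold` is illustrative):

```python
def boundary_fold(key: str, part_len: int = 3) -> int:
    parts = [key[i:i + part_len] for i in range(0, len(key), part_len)]
    total = 0
    for i, part in enumerate(parts):
        # Every other part (the 2nd, 4th, ...) is reversed, as if the paper were folded
        total += int(part[::-1] if i % 2 == 1 else part)
    return total


print(boundary_fold("123456789"))  # 1566 = 123 + 654 + 789
```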
Hashing Function - Mid-Square Method
● In the mid-square method, the key is squared and the middle part of the result is used as the address.
● If the key is a string, it has to be preprocessed to produce a number, for instance by folding.
● In a mid-square hash function, the entire key participates in generating the address, so there is a better chance that different addresses are generated for different keys.
● For example,
○ if the key is 3,121, then $3121^2 = 9{,}740{,}641$,
○ and for a 1,000-cell table, h(3,121) = 406, which is the middle part of $3121^2$.
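A minimal Python sketch of the mid-square method, working on decimal digits to match the example above. Taking exactly `digits` digits, and biasing to the left when the middle is not unique (even-length squares), are assumptions of this sketch; the name `mid_square` is illustrative.

```python
def mid_square(key: int, digits: int = 3) -> int:
    """Square the key and return `digits` decimal digits from the middle."""
    squared = str(key * key)
    if len(squared) <= digits:
        return int(squared)
    # Left-biased middle when (len - digits) is odd: an assumed convention
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


print(mid_square(3121))  # 406, taken from 3121**2 = 9740641
```

Because three digits are taken, the result always lies in 0..999, a valid index for the 1,000-cell table.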

Collision Handling

Collision Handling
● In the (ideally small) number of cases where multiple keys map to the same integer, elements with different keys may need to be stored in the same "slot" of the hash table.
● Therefore, when the hash function is used to locate a potential match, the key of the element found must be compared with the search key.
● Since more than one element may belong in a single slot of the table, various techniques are used to manage this problem:
○ separate chaining (or chaining),
○ probing (linear and quadratic), and
○ re-hashing, etc.

Separate Chaining (Chaining)
● One simple scheme is to chain all colliding elements in lists attached to the appropriate slot.
● This allows an unlimited number of collisions to be handled and doesn't require prior knowledge of how many elements are contained in the collection.
● The tradeoff is the same as with linked lists versus array implementations of collections: linked-list overhead in space and, to a lesser extent, in time.

Separate Chaining (Chaining)
● To insert key k into the hash table (worst case: O(1)):
○ Compute h(k) to determine where to insert the element.
○ If T[h(k)] is NULL, make this table cell point to a new node containing k.
○ Otherwise, add a node containing k to the beginning of the list.
● To search for a key k (worst case: O(n)):
○ Compute h(k) and search within the list at T[h(k)].
● To delete a key k from the hash table T (worst case: O(n)):
○ Compute h(k) to determine where to remove the element from.
○ Search within the list at T[h(k)] and delete the node containing k if it is found.
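The three operations above can be sketched in Python with a hand-rolled singly linked list. This sketch assumes integer keys and the division method h(k) = k mod size; the class names `Node` and `ChainedHashTable` are hypothetical. As on the slide, insertion does not check for duplicates, which is what keeps it O(1).

```python
class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size=16):
        self.size = size
        self.T = [None] * size  # each cell holds the head of a linked list, or None

    def h(self, k):
        return k % self.size  # division method; integer keys assumed

    def insert(self, k):
        # O(1): push a new node at the beginning of the list T[h(k)]
        i = self.h(k)
        self.T[i] = Node(k, self.T[i])

    def search(self, k):
        # O(n) worst case: walk the list at T[h(k)]
        node = self.T[self.h(k)]
        while node is not None:
            if node.key == k:
                return True
            node = node.next
        return False

    def delete(self, k):
        # O(n) worst case: find the node containing k and unlink it
        i = self.h(k)
        prev, node = None, self.T[i]
        while node is not None:
            if node.key == k:
                if prev is None:
                    self.T[i] = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False


t = ChainedHashTable()
for ch in "COMPUTERS":
    t.insert(ord(ch))  # "U" and "E" now share the list at index 5
```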

Separate Chaining (Chaining)
Pros:
● The number of keys in each linked list is a small constant on average (assuming a good hash function and a bounded load factor), which gives constant time, i.e. O(1), for searching, insertion and deletion of elements on average.
● Deletion is easy.
Cons:
● More space is needed because linked lists are used.
● Memory allocation for nodes and manipulation of pointers slow down the program.

Probing
● Probing refers to finding another available slot when a collision occurs.
● The hash function of open addressing is:
$h(k, i) = (h(k) + f(i)) \bmod size$, where $i$ is the probe number and $f(i)$ is the collision resolution function
○ Typically $f(0) = 0$
● Collision resolution functions for different open addressing schemes:
○ Linear probing: $f(i) = i$
○ Quadratic probing: $f(i) = i^2$
○ Re-hashing: $f(i) = i \cdot h_2(k)$, where $h_2(k)$ is another hash function.
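The probe formula can be sketched as a single Python function. The function name `probe`, the scheme labels, and passing h(k) and h2(k) as precomputed integers are assumptions for illustration.

```python
def probe(hk, i, size, scheme, h2k=None):
    """Index of the i-th probe: (h(k) + f(i)) mod size, with f(0) = 0."""
    if scheme == "linear":
        f = i
    elif scheme == "quadratic":
        f = i * i
    elif scheme == "rehash":  # f(i) = i * h2(k)
        if not h2k:
            raise ValueError("h2(k) must be non-zero")
        f = i * h2k
    else:
        raise ValueError("unknown scheme: " + scheme)
    return (hk + f) % size


# First four probes for a key with h(k) = 5 in a table of size 16
print([probe(5, i, 16, "linear") for i in range(4)])         # [5, 6, 7, 8]
print([probe(5, i, 16, "quadratic") for i in range(4)])      # [5, 6, 9, 14]
print([probe(5, i, 16, "rehash", h2k=3) for i in range(4)])  # [5, 8, 11, 14]
```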

Probing
● To insert key k into the hash table:
○ Probe the hash table until an empty slot is found.
● To search for a key k:
○ Probe the hash table until the key is found or it is confirmed that the key is not present.
● To delete a key k from the hash table T:
○ Problem: if the key is simply removed while other keys that hash to the same location are stored in other locations of the table, searches for those keys will stop at the now-empty slot and be treated as unsuccessful.
● Deletion must therefore be “lazy”:
○ Keep the key in the table, but mark it as deleted.
○ A new key may overwrite a location marked as deleted.
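Lazy deletion can be sketched in Python on a linear-probing table. The sketch assumes integer keys, h(k) = k mod size, and a sentinel object `DELETED` as the “marked as deleted” flag; the class name `LinearProbingTable` is hypothetical.

```python
EMPTY = None
DELETED = object()  # marker for a lazily deleted slot (assumed representation)


class LinearProbingTable:
    def __init__(self, size=16):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, k):
        # Linear probing: h(k), h(k)+1, ... (mod size), visiting each slot once
        start = k % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def search(self, k):
        for idx in self._probe(k):
            if self.slots[idx] is EMPTY:
                return False  # a truly empty slot ends the search
            if self.slots[idx] is not DELETED and self.slots[idx] == k:
                return True   # deleted slots are skipped, not treated as empty
        return False

    def insert(self, k):
        if self.search(k):
            return True  # already present; avoid storing a duplicate
        for idx in self._probe(k):
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = k  # a new key may overwrite a deleted slot
                return True
        return False  # table is full

    def delete(self, k):
        for idx in self._probe(k):
            if self.slots[idx] is EMPTY:
                return False
            if self.slots[idx] is not DELETED and self.slots[idx] == k:
                self.slots[idx] = DELETED  # lazy delete: mark, don't empty
                return True
        return False


t = LinearProbingTable()
t.insert(85)  # "U": 85 % 16 = 5
t.insert(69)  # "E": slot 5 is taken, so it goes to slot 6
t.delete(85)  # slot 5 becomes a DELETED marker, not EMPTY
print(t.search(69))  # True: the search skips the marker and reaches slot 6
```

Had `delete` set slot 5 back to `EMPTY`, the search for 69 would have stopped at slot 5 and wrongly reported the key as missing.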
Probing: Linear Probing
● One of the simplest re-hashing functions is +1 (or -1), i.e. on a collision, look in the neighboring slot of the table.
● It calculates the new address extremely quickly and can be extremely efficient.

Probing: Linear Probing
● Clustering
○ Linear probing is subject to a clustering phenomenon (primary clustering).
○ Re-hashes from one location occupy a block of slots in the table which "grows" towards the slots to which other keys hash.
○ This exacerbates the collision problem, and the number of re-hashes can become large.

Probing: Linear Probing
Pros:
● Easy to implement.
● Uses less memory than separate chaining.
● Fast when the table is sparse.
Cons:
● The hash table has a fixed size.
● Blocks of contiguously occupied entries (clustering) are likely, which causes bad performance since:
○ it increases the chances of collisions, and
○ it increases the search time for elements.

Probing: Quadratic Probing
Quadratic probing is an open addressing scheme where, if the given hash value x collides in the hash table, we look at slot (hash(x) + i*i) % S in the i-th iteration.
How is quadratic probing done?
Let hash(x) be the slot index computed using the hash function, and S the size of the table.
● If the slot hash(x) % S is full, then we try (hash(x) + 1*1) % S.
● If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S.
● If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.
● This process is repeated for increasing values of i until an empty slot is found.
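The steps above can be sketched in Python. For simplicity this sketch assumes hash(x) = x and a table stored as a list with `None` for empty slots; the function name `quadratic_insert` is hypothetical. It stops after S attempts, because quadratic probing is not guaranteed to reach every slot of the table.

```python
def quadratic_insert(table, key):
    """Insert key with quadratic probing and return the list of slots tried."""
    S = len(table)
    tried = []
    for i in range(S):
        idx = (key + i * i) % S  # hash(x) = x is a simplifying assumption
        tried.append(idx)
        if table[idx] is None:
            table[idx] = key
            return tried
    raise RuntimeError("no free slot found along the quadratic probe sequence")


table = [None] * 16
# 85, 69 and 101 all hash to slot 5 when S = 16
results = {key: quadratic_insert(table, key) for key in (85, 69, 101)}
print(results)  # {85: [5], 69: [5, 6], 101: [5, 6, 9]}
```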

Probing: Quadratic Probing
Pros:
● Easy to implement.
● Resolves the primary clustering issue.
Cons:
● Keys that hash to the same initial location will probe the same alternative cells, and this causes clustering around the probe sequences (called secondary clustering).

Probing: Re-hashing
● Re-hashing schemes use a second hashing operation when there is a collision.
● If there is a further collision, we re-hash until an empty "slot" in the table is found.
● The re-hashing function can either be a new function or a re-application of the original one.
● As long as the functions are applied to a key in the same order, a sought key can always be located.

Probing: Re-hashing

Probing: Re-hashing
● How to choose the second hash function?
○ It shouldn't evaluate to zero.
○ Its value should be relatively prime to the size of the table.
○ Otherwise, only a fraction of the table entries will be examined.

● Pros:
○ Eliminates secondary clustering
● Cons:
○ Computing two hash functions is time-consuming
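Why h2(k) must be non-zero and relatively prime to the table size can be checked with a short Python sketch that counts the distinct slots a re-hashing probe sequence reaches. The function name `slots_visited` and the sample values (home slot 5, table size 16) are illustrative.

```python
def slots_visited(h1, h2, size):
    """Distinct slots reached by (h1 + i * h2) mod size for i = 0 .. size - 1."""
    return {(h1 + i * h2) % size for i in range(size)}


print(len(slots_visited(5, 3, 16)))  # 16: gcd(3, 16) = 1, every slot is examined
print(len(slots_visited(5, 4, 16)))  # 4: gcd(4, 16) = 4, only 16 / 4 slots examined
print(len(slots_visited(5, 0, 16)))  # 1: h2 = 0 never leaves the home slot
```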

Q&A
