Hash Tables

Hash tables use a hash function to map keys to values in an array. The main components of hashing are the key, hash function, and hash table. The hash function takes a key as input and outputs an index in the hash table where the corresponding value can be stored. Issues with hashing include collisions where different keys hash to the same index, which require collision resolution techniques. Computing effective hash functions that scramble keys uniformly is challenging. Java hashCode conventions require equal objects to have equal hash codes. Common types like integers, booleans, doubles, and strings have predefined hashCode implementations.

Uploaded by

dastanktl26

HASH TABLES

» HASH FUNCTION
» SEPARATE CHAINING
Department: Computer science
Course Code: CSS-215
Course Instructor: Asst. Prof. Dr. Mohammed Ala’anzy
Office no.: G-405
ST IMPLEMENTATIONS:
SUMMARY
                                      Worst-case cost           Average case
                                      (after N inserts)         (after N random inserts)
Implementation                        Search      Insert        Search hit     Insert
Sequential search (unordered list)    N           N             N/2            N
Binary search (ordered array)         log N       N             log N          N/2
BST                                   N           N             1.39 log N     1.39 log N
2-3 tree                              c log N     c log N       c log N        c log N
Red-black BST                         2 log N     2 log N       1 log N*       1 log N*

Q. Can we do better?
A. Yes, but with different access to the data.
HASHING: INTRODUCTION
• Hashing is a process that transforms input data (of any size) into a fixed-size value, using a mathematical algorithm called a hash function. This output, known as the hash code or hash value, is typically used for indexing, data retrieval, or ensuring data integrity.

• As the amount of internet data keeps growing, there is a constant struggle to handle it efficiently. Even in everyday programming, data that is not massive still needs smooth and efficient storage, access, and processing. One commonly used solution is the array data structure.
HASHING: INTRODUCTION
• The natural question arises: if arrays already exist, why introduce a new data structure? The answer lies in efficiency. Storing an item at a known array index takes O(1) time, but searching for a key takes O(log n) time with binary search on a sorted array and O(n) time on an unsorted one. For large datasets this cost adds up and makes a plain array a poor fit for lookups by key.

• The quest, then, is for a data structure that supports both storing and searching in constant time, O(1), on average. This is where hashing becomes crucial: with a good hash function, data can be stored and retrieved in expected constant time, addressing the efficiency concerns above.
THE MAIN COMPONENTS OF
HASHING
• Hashing involves three main components:

1. Key: The key serves as input to the hash function and can be, for example, a string or an integer. It determines the index or location for storing an item in a data structure.

2. Hash function: This function takes the input key and produces the hash index, which is the index of an element in an array known as a hash table.

3. Hash table: The hash table is a data structure that uses a hash function to map keys to values. It stores data in an associative manner within an array. Ideally each key maps to its own index, which allows efficient retrieval based on the hashed key; when two keys map to the same index, a collision must be resolved.
THE COMPONENTS OF
HASHING

[Figure: hash table]
HASHING IMPLEMENTATION
• Consider a set of strings, for example {"ab", "cd", "efg"}, that needs to be stored in a table. The primary objective is to search or update values within the table in O(1) time, with no emphasis on the ordering of the strings. In this context, the given strings serve as keys, and each string can also act as its own value. The key challenge lies in deciding where in the table to store the value associated with each key.
INDEX FINDING IN THE HASH
TABLE
• Step 1: Hash functions use mathematical formulas to determine a hash value, which serves as the index in the data structure for storing the corresponding value.

• Step 2: Assign numerical values to alphabetical characters; for example, assign
  "x" = 5, "y" = 10, "z" = 25,
  "a" = 1, "b" = 3, "c" = 6,
  "n" = 8, "d" = 18, and so forth.

• Step 3: Calculate the numerical value of a string by summing the values of all its characters. For instance, "xyz" gives 5 + 10 + 25 = 40.
INDEX FINDING IN THE HASH
TABLE
• Step 4: Assume a table of size 5 is available to store these strings. The hash function is (sum of character values in the key) mod (table size), so the location of a string in the array is sum(string) mod 5.

• Step 5: Store the strings accordingly:
  • "abc" at location 10 mod 5 = 0,
  • "and" at location 27 mod 5 = 2,
  • "xyz" at location 40 mod 5 = 0, which collides with "abc" (handle the collision as needed, for example with a linked list or another method).

Hash table:
0   "abc", "xyz"
1
2   "and"
3
4

• The described method lets us determine the position of a specific string through a straightforward hash function and quickly retrieve the value stored at that location. Hashing is therefore an effective approach for storing (key, value) pairs in a table.
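The five steps above can be sketched in Java. This is a minimal illustration rather than a practical hash function: the class name `ToyHash` and its method names are assumptions made for the example, and only the eight character values listed in Step 2 are defined, so any other character is rejected.

```java
final class ToyHash {
    // Step 2: character values taken from the slide; other letters are not defined there.
    static int value(char c) {
        switch (c) {
            case 'a': return 1;
            case 'b': return 3;
            case 'c': return 6;
            case 'n': return 8;
            case 'd': return 18;
            case 'x': return 5;
            case 'y': return 10;
            case 'z': return 25;
            default:
                throw new IllegalArgumentException("no value assigned to '" + c + "'");
        }
    }

    // Step 3: sum the values of all characters in the key.
    static int sum(String key) {
        int total = 0;
        for (int i = 0; i < key.length(); i++) {
            total += value(key.charAt(i));
        }
        return total;
    }

    // Step 4: reduce the sum to a table index with mod.
    static int index(String key, int tableSize) {
        return sum(key) % tableSize;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"abc", "and", "xyz"}) {
            System.out.println(key + " -> sum " + sum(key) + ", index " + index(key, 5));
        }
    }
}
```

Running it shows "abc" and "xyz" landing on the same index, which is the collision that Step 5 has to handle.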
HASHING: BASIC PLAN
Save items in a key-indexed table (index is a function of the key).
Hash function. Method for computing array index from key.

Issues related to hashing

• Computing the hash function. (This can be easy for some data and harder for more complicated data.)
• Equality test: Method for checking whether two keys are equal. Hashing relies on an equality test rather than on comparing keys for order.
• Collision resolution: Algorithm and data structure to handle two keys that hash to the same array index. Because there are far more possible keys than array indices, a collision resolution technique is needed.
Classic space-time tradeoff.
• No space limitation: trivial hash function with key as index.
• No time limitation: trivial collision resolution with sequential search.
• Space and time limitations: hashing (the real world). Both must be considered.
COMPUTING THE HASH
FUNCTION
Idealistic goal. Scramble the keys uniformly to produce a table index. There are two requirements.
• Efficiently computable.
• Each table index equally likely for each key. (A thoroughly researched problem, still problematic in practical applications.)
[Figure: key → table index]
Ex 1. Phone numbers.
• Bad: first three digits (many numbers share the same leading digits).
• Better: last three digits.

Practical challenge. Need a different approach for each key type.

JAVA’S HASH CODE
CONVENTIONS
• All Java classes inherit a method hashCode(), which returns a 32-bit int.
Requirement. If x.equals(y), then (x.hashCode() == y.hashCode()).
Highly desirable. If !x.equals(y), then (x.hashCode() != y.hashCode()).

Customized implementations. Integer, Double, String, File, URL, Date, …
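The difference between the required rule and the merely desirable one can be seen with plain strings. The sketch below is illustrative only: the class name `HashCodeContractDemo` and both helper methods are assumptions for the example. It relies on the fact that the distinct strings "Aa" and "BB" share a `String.hashCode()` value.

```java
final class HashCodeContractDemo {
    // Required rule: if x.equals(y), the two hash codes must match.
    static boolean contractHolds(Object x, Object y) {
        return !x.equals(y) || x.hashCode() == y.hashCode();
    }

    // The "highly desirable" rule fails here: unequal objects share a hash code.
    static boolean isCollision(Object x, Object y) {
        return !x.equals(y) && x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        String x = new String("hash");
        String y = new String("hash");   // a different object with equal contents
        System.out.println("contract holds for equal strings: " + contractHolds(x, y));
        System.out.println("\"Aa\" and \"BB\" collide: " + isCollision("Aa", "BB"));
    }
}
```

Equal hash codes therefore never prove that two keys are equal, which is why a hash table still needs the equality test.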

IMPLEMENTING HASH CODE:
INTEGERS, BOOLEANS, AND
DOUBLES
• Java library implementations

public final class Integer {
    private final int value;
    ...
    public int hashCode() {
        return value;
    }
}

public final class Boolean {
    private final boolean value;
    ...
    public int hashCode() {
        if (value) return 1231;
        else       return 1237;
    }
}

public final class Double {
    private final double value;
    ...
    public int hashCode() {
        // convert to IEEE 64-bit representation;
        // xor most significant 32 bits with least significant 32 bits
        long bits = doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }
}
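These library implementations can be checked against the wrapper classes themselves. The following is a small sketch, assuming Java 8 or later for the static `Integer.hashCode(int)`, `Boolean.hashCode(boolean)`, and `Double.hashCode(double)` methods; the class name `WrapperHashDemo` and the helper `doubleHash` are illustrative names, not library code.

```java
final class WrapperHashDemo {
    // Re-implements the Double formula shown above so it can be compared
    // with the library result.
    static int doubleHash(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        System.out.println("Integer: " + (Integer.hashCode(42) == 42));
        System.out.println("Boolean true: " + (Boolean.hashCode(true) == 1231));
        System.out.println("Boolean false: " + (Boolean.hashCode(false) == 1237));
        System.out.println("Double: " + (Double.hashCode(3.14) == doubleHash(3.14)));
    }
}
```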
IMPLEMENTING HASH CODE:
STRINGS
• Java library implementation

public final class String {
    private final char[] s;
    ...
    public int hashCode() {
        int hash = 0;
        for (int i = 0; i < length(); i++)
            hash = s[i] + (31 * hash);   // s[i] is the i-th character of s
        return hash;
    }
}

char    Unicode
…       …
'a'     97
'b'     98
'c'     99
…       …

• Horner's method to hash string of length L: L multiplies/adds.
• Equivalent to $h = s[0] \cdot 31^{L-1} + \dots + s[L-3] \cdot 31^{2} + s[L-2] \cdot 31^{1} + s[L-1] \cdot 31^{0}$.

Ex. String s = "call";
    int code = s.hashCode();

$3045982 = 99 \cdot 31^{3} + 97 \cdot 31^{2} + 108 \cdot 31^{1} + 108 \cdot 31^{0}$
$= 108 + 31 \cdot (108 + 31 \cdot (97 + 31 \cdot (99)))$   (Horner's method)
hashCode()
• The hashCode() method in Java calculates a hash code for a String object. The hash code is a 32-bit integer used to place the String in a hash table; it does not identify the String uniquely, since different strings can share a hash code. The hash code is calculated using the following formula:

$s[0] \cdot 31^{n-1} + s[1] \cdot 31^{n-2} + \dots + s[n-1]$

where:

• s[i] is the i-th character of the String object
• n is the length of the String object
• the superscripts denote exponentiation (powers of 31)
• 31 is a prime number that is used to improve the distribution of hash codes
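The String hash code formula and Horner's method can be compared directly. The sketch below is an illustration under assumed names (`HornerHash`, `horner`, `polynomial`); it evaluates the polynomial both ways and checks the results against the library's `String.hashCode()`. Both versions use int arithmetic, so any overflow wraps around identically.

```java
final class HornerHash {
    // Horner's method: one multiply and one add per character.
    static int horner(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Direct evaluation of s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
    static int polynomial(String s) {
        int n = s.length();
        int total = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31;
            }
            total += s.charAt(i) * power;
        }
        return total;
    }

    public static void main(String[] args) {
        String s = "call";
        System.out.println(horner(s));      // 3045982
        System.out.println(polynomial(s));  // 3045982
        System.out.println(s.hashCode());   // 3045982
    }
}
```

The direct evaluation needs a quadratic number of multiplications in the string length, while Horner's method needs only one per character.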
IMPLEMENTING HASH CODE:
USER-DEFINED TYPES
public final class Transaction implements Comparable<Transaction> {
    private final String who;
    private final Date when;
    private final double amount;

    public Transaction(String who, Date when, double amount)
    { /* as before */ }
    ...
    public boolean equals(Object y)
    { /* as before */ }

    public int hashCode() {
        int hash = 17;                                  // nonzero constant; the multiplier 31 is typically a small prime
        hash = 31*hash + who.hashCode();                // for reference types, use hashCode()
        hash = 31*hash + when.hashCode();
        hash = 31*hash + ((Double) amount).hashCode();  // for primitive types, use hashCode() of wrapper type
        return hash;
    }
}
HASH CODE DESIGN
• "Standard" recipe for user-defined types.
  • Combine each significant field using the 31x + y rule.
  • If field is a primitive type, use wrapper type hashCode().
  • If field is null, use 0 as its hash code.
  • If field is a reference type, use hashCode(). (Applies the rule recursively.)
  • If field is an array, apply to each entry. (Or use Arrays.deepHashCode().)

In practice. Recipe works reasonably well; used in Java libraries.

In theory. Keys are bitstrings; "universal" hash functions exist.

Basic rule. Need to use the whole key to compute the hash code.
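The recipe can be applied to a type that has a nullable field and an array field. The class below is a hypothetical example (the name `Trade` and its fields are not from the slides), written as a sketch of one reasonable way to follow the recipe. It uses `Double.hashCode(double)` and `java.util.Arrays.hashCode(int[])` from the standard library, and keeps `equals()` consistent with `hashCode()`.

```java
final class Trade {
    private final String who;     // reference type, may be null
    private final double amount;  // primitive type
    private final int[] lots;     // array field

    Trade(String who, double amount, int[] lots) {
        this.who = who;
        this.amount = amount;
        this.lots = lots.clone();
    }

    @Override
    public boolean equals(Object y) {
        if (y == this) return true;
        if (!(y instanceof Trade)) return false;
        Trade that = (Trade) y;
        return java.util.Objects.equals(who, that.who)
            && Double.compare(amount, that.amount) == 0
            && java.util.Arrays.equals(lots, that.lots);
    }

    @Override
    public int hashCode() {
        int hash = 17;                                          // nonzero constant
        hash = 31 * hash + (who == null ? 0 : who.hashCode());  // null field contributes 0
        hash = 31 * hash + Double.hashCode(amount);             // wrapper-type hash for a primitive
        hash = 31 * hash + java.util.Arrays.hashCode(lots);     // combines every array entry
        return hash;
    }

    public static void main(String[] args) {
        Trade a = new Trade("Turing", 11.99, new int[] {1, 2, 3});
        Trade b = new Trade("Turing", 11.99, new int[] {1, 2, 3});
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Every field that takes part in `equals()` also takes part in `hashCode()`, so equal objects always produce equal hash codes.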
MODULAR HASHING
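• hashCode() returns an int between $-2^{31}$ and $2^{31} - 1$.
• Hash function. An int between 0 and M - 1 (for use as array index), where M is typically a prime or a power of 2.

private int hash(Key key) {
    // Mask off the sign bit rather than calling Math.abs(), which returns a
    // negative value for Integer.MIN_VALUE.
    return (key.hashCode() & 0x7fffffff) % M;
}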
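A common first attempt at modular hashing is `Math.abs(hashCode) % M`. The sketch below (class and method names are assumptions for illustration) compares that approach with masking off the sign bit, using raw int hash codes in place of real keys. The difference shows up only for `Integer.MIN_VALUE`, whose absolute value does not fit in an int.

```java
final class ModularHash {
    // Buggy: Math.abs(Integer.MIN_VALUE) is still negative, so the index can be negative.
    static int hashWithAbs(int hashCode, int m) {
        return Math.abs(hashCode) % m;
    }

    // Clears the sign bit first, so the result is always between 0 and m - 1.
    static int hashWithMask(int hashCode, int m) {
        return (hashCode & 0x7fffffff) % m;
    }

    public static void main(String[] args) {
        int m = 97;
        int[] samples = {42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : samples) {
            System.out.println(h + ": abs -> " + hashWithAbs(h, m)
                    + ", mask -> " + hashWithMask(h, m));
        }
    }
}
```

A negative index would throw an `ArrayIndexOutOfBoundsException` when used to access the table, so the masked version is the safer choice.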
UNIFORM HASHING
ASSUMPTION
• Uniform hashing assumption. Each key is equally likely to hash to an integer between 0 and M - 1.
• Bins and balls. Throw balls uniformly at random into M bins.

Collisions (birthday problem). Expect two balls in the same bin after ~ $\sqrt{\pi M / 2}$ tosses.

Expect every bin has ≥ 1 ball after ~ M ln M tosses.

Load balancing. After M tosses, expect most loaded bin has Θ(log M / log log M) balls.
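The bins-and-balls model is easy to simulate. The sketch below is illustrative only (the class name, method names, seed, and trial count are all assumptions): it measures how many tosses it takes on average before a ball lands in an occupied bin and prints the birthday-problem estimate of about sqrt(πM/2) next to it for comparison.

```java
final class BirthdaySimulation {
    // Tosses balls into m bins uniformly at random and returns how many tosses
    // had been made when a ball first landed in an occupied bin.
    static int tossesUntilCollision(int m, java.util.Random rng) {
        boolean[] occupied = new boolean[m];
        int tosses = 0;
        while (true) {
            int bin = rng.nextInt(m);
            tosses++;
            if (occupied[bin]) return tosses;   // happens by toss m + 1 at the latest
            occupied[bin] = true;
        }
    }

    // Averages the result over many independent trials with a fixed seed.
    static double averageTosses(int m, int trials, long seed) {
        java.util.Random rng = new java.util.Random(seed);
        long total = 0;
        for (int t = 0; t < trials; t++) {
            total += tossesUntilCollision(m, rng);
        }
        return (double) total / trials;
    }

    public static void main(String[] args) {
        int m = 1000;
        System.out.println("simulated average: " + averageTosses(m, 5000, 42L));
        System.out.println("sqrt(pi * M / 2):  " + Math.sqrt(Math.PI * m / 2));
    }
}
```

With M = 1000 bins the first collision typically arrives after only a few dozen tosses, long before the table is anywhere near full.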
UNIFORM HASHING
ASSUMPTION
• Uniform hashing assumption. Each key is equally likely to hash to an integer between 0 and M - 1.

[Figure] Java's String hash code distributes the keys of Tale of Two Cities approximately uniformly.
HASH TABLES
» HASH FUNCTION
» SEPARATE CHAINING
COLLISIONS
Collision. Two distinct keys hashing to same index.
• Birthday problem ⇒ can't avoid collisions unless you have a ridiculous (quadratic) amount of memory.

Challenge. Deal with collisions efficiently.

COLLISION
• A collision occurs when two different inputs produce the same hash code or hash value. In other words, two distinct keys are mapped to the same index in a hash table or have the same hash code.

• Hash functions are designed to distribute values across the range of possible hash codes, ideally minimizing collisions. However, because the range of hash codes is finite and the set of potential inputs is far larger, collisions are inevitable. The challenge is to handle collisions effectively so that hash-based data structures keep working correctly.
WAYS TO SOLVE THE
COLLISION ISSUE
There are various techniques to handle collisions:
1. Separate chaining: Each bucket in the hash table is a linked list, and all entries that hash to the same index are stored in that list.
2. Open addressing: All entries are stored in the table itself; when a collision occurs, the table is probed for the next available slot. Probing strategies include linear probing, quadratic probing, and double hashing.
3. Robin Hood hashing: A variant of open addressing similar to linear probing. When inserting, it compares how far the new element has traveled from its home slot with the distance of the element already present; the element that has traveled farther keeps the slot and the other continues probing.

• Handling collisions is crucial because it lets hash-based data structures, like hash tables or hash maps, keep constant-time (O(1)) average lookup, insertion, and deletion. The choice of collision resolution strategy depends on the specific requirements and characteristics of the application.
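To make open addressing concrete, here is a minimal sketch of linear probing. It is an illustration under stated assumptions rather than a production design: the class name `LinearProbingSketch` is hypothetical, keys are strings, the capacity is fixed, and deletion and resizing are left out. One slot is always kept empty so that every probe sequence terminates.

```java
final class LinearProbingSketch {
    private final String[] keys;
    private final Integer[] vals;
    private final int m;   // table size
    private int n;         // number of stored keys, kept at most m - 1

    LinearProbingSketch(int capacity) {
        m = capacity;
        keys = new String[m];
        vals = new Integer[m];
    }

    private int hash(String key) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    void put(String key, Integer val) {
        int i = hash(key);
        while (keys[i] != null) {                 // probe until an empty slot is found
            if (keys[i].equals(key)) { vals[i] = val; return; }
            i = (i + 1) % m;                      // next slot, wrapping around
        }
        if (n == m - 1) throw new IllegalStateException("table full");
        keys[i] = key;
        vals[i] = val;
        n++;
    }

    Integer get(String key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % m) {
            if (keys[i].equals(key)) return vals[i];
        }
        return null;                              // reached an empty slot: key absent
    }

    public static void main(String[] args) {
        LinearProbingSketch table = new LinearProbingSketch(5);
        // "a", "f" and "k" all hash to index 2 when the table size is 5.
        table.put("a", 1);
        table.put("f", 2);
        table.put("k", 3);
        System.out.println(table.get("a") + " " + table.get("f") + " " + table.get("k"));
    }
}
```

The three colliding keys end up in consecutive slots, and a search simply follows the same probe sequence until it finds the key or an empty slot.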
SEPARATE CHAINING
SYMBOL TABLE
Use an array of M < N linked lists. [H. P. Luhn, IBM 1953]
• Hash: map key to integer i between 0 and M - 1.
• Insert: put at front of i-th chain (if not already there).
• Search: need to search only the i-th chain.
ANALYSIS OF SEPARATE
CHAINING
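Consequence. Under the uniform hashing assumption, the number of probes for search/insert is proportional to N / M, the average chain length.
• M too large ⇒ too many empty chains.

• M too small ⇒ chains too long.

• Typical choice: M ~ N / 5 ⇒ constant-time ops (about 5 keys per chain on average).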

SEPARATE CHAINING ST: JAVA
IMPLEMENTATION
public class SeparateChainingHashST<Key, Value> {
    private int M = 97;               // number of chains (array doubling and halving code omitted)
    private Node[] st = new Node[M];  // array of chains

    private static class Node {
        private Object key;           // no generic array creation
        private Object val;           // (declare key and value of type Object)
        private Node next;
        ...
    }

    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    public Value get(Key key) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next)
            if (key.equals(x.key)) return (Value) x.val;
        return null;
    }
}
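The listing above shows only the search path. A self-contained sketch that adds insertion might look like the following; the class name `ChainingSketch`, the constructor, and the `put` and `size` methods are assumptions added for illustration, and resizing is still omitted. Insertion follows the rule stated earlier: update the value if the key is already in the chain, otherwise add a new node at the front.

```java
final class ChainingSketch<Key, Value> {
    private final int m;       // number of chains
    private final Node[] st;   // array of chains
    private int n;             // number of key-value pairs

    // Static nested class with Object fields, so no generic array creation is needed.
    private static final class Node {
        final Object key;
        Object val;
        final Node next;

        Node(Object key, Object val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    ChainingSketch(int m) {
        this.m = m;
        this.st = new Node[m];
    }

    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    @SuppressWarnings("unchecked")
    Value get(Key key) {
        for (Node x = st[hash(key)]; x != null; x = x.next) {
            if (key.equals(x.key)) return (Value) x.val;
        }
        return null;
    }

    void put(Key key, Value val) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next) {
            if (key.equals(x.key)) { x.val = val; return; }   // key present: update
        }
        st[i] = new Node(key, val, st[i]);                    // otherwise insert at front
        n++;
    }

    int size() {
        return n;
    }

    public static void main(String[] args) {
        ChainingSketch<String, Integer> table = new ChainingSketch<>(5);
        table.put("a", 1);
        table.put("f", 2);   // same chain as "a" when m = 5
        table.put("a", 10);  // updates the existing entry
        System.out.println(table.get("a") + " " + table.get("f") + " size=" + table.size());
    }
}
```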
ST IMPLEMENTATIONS:
SUMMARY
                                      Worst-case cost           Average case
                                      (after N inserts)         (after N random inserts)
Implementation                        Search      Insert        Search hit     Insert
Sequential search (unordered list)    N           N             N/2            N
Binary search (ordered array)         log N       N             log N          N/2
BST                                   N           N             1.39 log N     1.39 log N
2-3 tree                              c log N     c log N       c log N        c log N
Red-black BST                         2 log N     2 log N       1 log N        1 log N
Separate chaining                     log N*      log N*        3-5*           3-5*

* under uniform hashing assumption

Thank you!
