
Hash Tables

Hash tables use a hash function to map keys to values in an array. The main components of hashing are the key, hash function, and hash table. The hash function takes a key as input and outputs an index in the hash table where the corresponding value can be stored. Issues with hashing include collisions where different keys hash to the same index, which require collision resolution techniques. Computing effective hash functions that scramble keys uniformly is challenging. Java hashCode conventions require equal objects to have equal hash codes. Common types like integers, booleans, doubles, and strings have predefined hashCode implementations.


HASH TABLES

» HASH FUNCTION
» SEPARATE CHAINING
Department: Computer science
Course Code: CSS-215
Course Instructor: Asst. Prof. Dr. Mohammed Ala’anzy
Office no.: G-405
ST IMPLEMENTATIONS: SUMMARY

                                      Worst-case cost          Average case
                                      (after N inserts)        (after N random inserts)
Implementation                        Search      Insert       Search hit    Insert
Sequential search (unordered list)    N           N            N/2           N
Binary search (ordered array)         log N       N            log N         N/2
BST                                   N           N            1.39 log N    1.39 log N
2-3 tree                              c log N     c log N      c log N       c log N
Red-black BST                         2 log N     2 log N      1 log N*      1 log N*

Q. Can we do better?
A. Yes, but with different access to the data.
HASHING: INTRODUCTION
• Hashing is a process that transforms input data of any size into a fixed-size value using a mathematical function called a hash function. This output, known as the hash code or hash value, is typically used for indexing, data retrieval, or checking data integrity.

• As the amount of data on the internet keeps growing, handling it efficiently is a constant struggle. Even in everyday programming, data that is not massive still needs efficient storage, access, and processing. One commonly used solution is the array data structure.
HASHING: INTRODUCTION
• The natural question arises: if arrays already exist, why introduce a new data structure? The answer lies in efficiency. Storing an element in an array takes O(1) time, but searching for a value takes O(n) time in an unsorted array, or O(log n) with binary search if the array is kept sorted (which in turn makes insertion O(n)). This may seem insignificant, but for substantial datasets it becomes a serious problem.

• What we want is a data structure that supports both storing and searching in constant time, O(1). This is where hashing becomes crucial: a hash table supports storage and retrieval in constant time on average, addressing the efficiency concerns described above.
THE MAIN COMPONENTS OF HASHING
• Hashing involves three main components:

1. Key: The key serves as input to the hash function and can be, for example, a string or an integer. It determines the index or location for storing an item in the data structure.

2. Hash Function: This function takes the input key and produces the hash index, which is the index of an element in an array known as a hash table.

3. Hash Table: The hash table is a data structure that uses a hash function to map keys to values. It stores data associatively in an array, where each key's index is computed from the key itself, enabling efficient retrieval based on the hashed key.
THE COMPONENTS OF HASHING
[Figure: key → hash function → hash table]
HASHING IMPLEMENTATION
• Consider a set of strings, for example, {"ab", "cd", "efg"}, that needs to be stored in a table. The primary objective is to search or update values within the table efficiently, in O(1) time, with no emphasis on the ordering of the strings. In this context, the given set of strings can serve as keys, and each string itself can act as its corresponding value. The key challenge lies in determining the most effective method to store the value associated with each key.
INDEX FINDING IN THE HASH TABLE
• Step 1: Hash functions use mathematical formulas to compute a hash value, which serves as the index in the data structure for storing the corresponding value.

• Step 2: Assign numerical values to alphabetical characters; for example, assign
  "x" = 5, "y" = 10, "z" = 25,
  "a" = 1, "b" = 3, "c" = 6,
  "n" = 8, "d" = 18, and so forth.

• Step 3: Calculate the numerical value of a string by summing the values of all its characters. For instance, "xyz" would be 5 + 10 + 25 = 40.
INDEX FINDING IN THE HASH TABLE
• Step 4: Assume a table of size 5 is available to store these strings. Using the hash function (sum of characters in key) mod (table size), compute the location in the array as sum(string) mod 5.

• Step 5: Store the strings accordingly:
  "abc" at location 10 mod 5 = 0,
  "and" at location 27 mod 5 = 2,
  "xyz" at location 40 mod 5 = 0, which collides with "abc" (handle the collision as needed, for example with a linked list or another method).

Hash table
  0  "abc"  (collides with "xyz")
  1
  2  "and"
  3
  4

• The described method lets us determine the position of a specific string through a straightforward hash function and quickly retrieve the value stored at that location. Thus, hashing is an effective approach for storing (key, value) pairs in a table.
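The toy scheme in Steps 2 to 5 can be sketched in Java as below. This is a minimal illustration, assuming only the letter values listed in Step 2; the class `ToyCharSumHash` and its methods are hypothetical names. Summing character values is far too weak for real use, since any two anagrams (for example "abc" and "cab") always collide.

```java
import java.util.Map;

class ToyCharSumHash {
    // Hypothetical letter values taken from Step 2; other letters are not supported.
    static final Map<Character, Integer> LETTER_VALUES = Map.of(
            'x', 5, 'y', 10, 'z', 25,
            'a', 1, 'b', 3, 'c', 6,
            'n', 8, 'd', 18);

    // Step 3: sum the values of all characters in the key.
    static int charSum(String key) {
        int sum = 0;
        for (char ch : key.toCharArray()) {
            Integer v = LETTER_VALUES.get(ch);
            if (v == null) throw new IllegalArgumentException("no value assigned to '" + ch + "'");
            sum += v;
        }
        return sum;
    }

    // Step 4: index = sum(key) mod tableSize.
    static int toyHash(String key, int tableSize) {
        return charSum(key) % tableSize;
    }

    public static void main(String[] args) {
        // Step 5: "abc" and "xyz" land on the same index, i.e. they collide.
        for (String key : new String[] {"abc", "and", "xyz"}) {
            System.out.println(key + " -> sum " + charSum(key) + ", index " + toyHash(key, 5));
        }
    }
}
```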
HASHING: BASIC PLAN
Save items in a key-indexed table (index is a function of the key).
Hash function. Method for computing array index from key.

Issues with hashing
• Computing the hash function: this can be easy for some data and harder for more complicated data.
• Equality test: a method for checking whether two keys are equal. Instead of ordered comparisons (less than / greater than), hashing only needs equality tests.
• Collision resolution: an algorithm and data structure to handle two keys that hash to the same array index. Since there are far more possible keys than array indices, collisions are unavoidable, so we need a collision-resolution technique.
Classic space-time tradeoff.
• No space limitation: trivial hash function with the key as index.
• No time limitation: trivial collision resolution with sequential search.
• Space and time limitations: hashing (the real world), which must balance both.
COMPUTING THE HASH FUNCTION
Idealistic goal. Scramble the keys uniformly to produce a table index. There are two requirements:
• Efficiently computable.
• Each table index equally likely for each key. (A thoroughly researched problem, but still problematic in practical applications.)
Ex 1. Phone numbers.
• Bad: first three digits (many numbers share the same prefix, so they pile into a few indices).
• Better: last three digits.
[Figure: key mapped to table index]

Practical challenge. Need a different approach for each key type.
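A minimal sketch of the phone-number example, assuming 10-digit numbers stored as strings and a table of size 1000 (three digits give indices 0 to 999). The class name, method names, and sample numbers are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PhoneNumberHash {
    // "Bad" hash: the first three digits, shared by every number with the same prefix.
    static int firstThreeDigits(String phone) {
        return Integer.parseInt(phone.substring(0, 3));
    }

    // "Better" hash: the last three digits, which vary much more between numbers.
    static int lastThreeDigits(String phone) {
        return Integer.parseInt(phone.substring(phone.length() - 3));
    }

    public static void main(String[] args) {
        // Hypothetical numbers that all share the same first three digits.
        List<String> phones = List.of("6095551234", "6095550187", "6092583472", "6094449921");
        Set<Integer> bad = new HashSet<>();
        Set<Integer> better = new HashSet<>();
        for (String p : phones) {
            bad.add(firstThreeDigits(p));
            better.add(lastThreeDigits(p));
        }
        System.out.println("distinct indices (first three digits): " + bad.size());
        System.out.println("distinct indices (last three digits):  " + better.size());
    }
}
```

With these four numbers, the first-three-digits hash sends every key to index 609, while the last-three-digits hash spreads them over four different indices.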


JAVA’S HASH CODE CONVENTIONS
• All Java classes inherit a method hashCode(), which returns a 32-bit int.
Requirement. If x.equals(y), then (x.hashCode() == y.hashCode()).
Highly desirable. If !x.equals(y), then (x.hashCode() != y.hashCode()).

Customized implementations. Integer, Double, String, File, URL, Date, …
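To see why the requirement matters, here is a minimal sketch with a hypothetical `GridPoint` class whose hashCode() uses exactly the fields compared by equals(). Java's HashSet first uses hashCode() to pick a bucket and then calls equals() within it, so a separately constructed but equal point is found only because the two hash codes agree.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical immutable 2D point, used only to illustrate the equals/hashCode contract.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof GridPoint)) return false;
        GridPoint that = (GridPoint) other;
        return x == that.x && y == that.y;
    }

    // Built from the same fields as equals(), so equal points always get equal hash codes.
    @Override
    public int hashCode() {
        return 31 * Integer.hashCode(x) + Integer.hashCode(y);
    }
}

class HashContractDemo {
    public static void main(String[] args) {
        GridPoint a = new GridPoint(1, 2);
        GridPoint b = new GridPoint(1, 2);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true (required by the contract)
        Set<GridPoint> set = new HashSet<>();
        set.add(a);
        System.out.println(set.contains(b));              // true
    }
}
```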


IMPLEMENTING HASH CODE: INTEGERS, BOOLEANS, AND DOUBLES
• Java library implementations

public final class Integer {
    private final int value;
    ...
    public int hashCode() {
        return value;
    }
}

public final class Boolean {
    private final boolean value;
    ...
    public int hashCode() {
        if (value) return 1231;
        else return 1237;
    }
}

public final class Double {
    private final double value;
    ...
    public int hashCode() {
        // convert to IEEE 64-bit representation;
        // xor most significant 32 bits with least significant 32 bits
        long bits = doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }
}
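The library values above can be checked directly. This sketch, with the hypothetical helper `doubleHash`, recomputes Double's hash code from the IEEE-bits formula and compares it with `Double.hashCode()`; it assumes Java 8 or later.

```java
class WrapperHashCodes {
    // Recomputes Double's hash code from the formula on the slide.
    static int doubleHash(double value) {
        long bits = Double.doubleToLongBits(value); // IEEE 754 64-bit representation
        return (int) (bits ^ (bits >>> 32));        // fold the high 32 bits into the low 32 bits
    }

    public static void main(String[] args) {
        System.out.println(Integer.valueOf(42).hashCode());  // 42
        System.out.println(Boolean.TRUE.hashCode());         // 1231
        System.out.println(Boolean.FALSE.hashCode());        // 1237
        double d = 3.14;
        System.out.println(doubleHash(d) == Double.valueOf(d).hashCode()); // true
    }
}
```

Because an Integer's hash code is its own value, integer keys reach the table index purely through the modular step applied later.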
IMPLEMENTING HASH CODE: STRINGS
• Java library implementation

public final class String {
    private final char[] s;
    ...
    public int hashCode() {
        int hash = 0;
        for (int i = 0; i < length(); i++)
            hash = s[i] + (31 * hash);   // s[i] is the i-th character of s
        return hash;
    }
}

• Horner's method to hash a string of length L: L multiplies/adds.
• Equivalent to $h = s[0] \cdot 31^{L-1} + \dots + s[L-3] \cdot 31^{2} + s[L-2] \cdot 31^{1} + s[L-1] \cdot 31^{0}$.

char → Unicode value: 'a' = 97, 'b' = 98, 'c' = 99, …

Ex. String s = "call";
    int code = s.hashCode();

$3045982 = 99 \cdot 31^{3} + 97 \cdot 31^{2} + 108 \cdot 31^{1} + 108 \cdot 31^{0} = 108 + 31 \cdot (108 + 31 \cdot (97 + 31 \cdot 99))$ (Horner's method)
hashCode()
• The hashCode() method in Java calculates a hash code for a String object. The hash code is a 32-bit integer used to place the String in a hash table; it does not uniquely identify the string, since distinct strings can share a hash code. It is calculated with int arithmetic (so intermediate results wrap around on overflow) using the following formula:

$s[0] \cdot 31^{n-1} + s[1] \cdot 31^{n-2} + \dots + s[n-1]$

where:
• s[i] is the i-th character (its Unicode value) of the String object
• n is the length of the String object
• $31^{k}$ denotes 31 raised to the power k
• 31 is a prime number that is used to improve the distribution of hash codes
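Horner's rule from the formula above can be written out and compared against the built-in method. A minimal sketch; `hornerHash` is a hypothetical name, and the comparison relies on String.hashCode() being specified by exactly this formula with int arithmetic.

```java
class HornerStringHash {
    // Horner's method: h = s[0]*31^(n-1) + ... + s[n-1], evaluated left to right.
    // int arithmetic wraps on overflow, which is also what String.hashCode() does.
    static int hornerHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = s.charAt(i) + 31 * hash;
        }
        return hash;
    }

    public static void main(String[] args) {
        String s = "call";
        System.out.println(hornerHash(s));                 // 3045982
        System.out.println(hornerHash(s) == s.hashCode()); // true
    }
}
```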
IMPLEMENTING HASH CODE: USER-DEFINED TYPES
public final class Transaction implements Comparable<Transaction> {
    private final String who;
    private final Date when;
    private final double amount;

    public Transaction(String who, Date when, double amount)
    { /* as before */ }
    ...
    public boolean equals(Object y)
    { /* as before */ }

    public int hashCode() {
        int hash = 17;                                  // nonzero constant
        hash = 31*hash + who.hashCode();                // 31 is typically a small prime; for reference types, use hashCode()
        hash = 31*hash + when.hashCode();               // Date is also a reference type
        hash = 31*hash + ((Double) amount).hashCode();  // for primitive types, use hashCode() of wrapper type
        return hash;
    }
}
HASH CODE DESIGN
• "Standard" recipe for user-defined types.
• Combine each significant field using the 31x + y rule.
• If a field is a primitive type, use the wrapper type's hashCode().
• If a field is null, use 0.
• If a field is a reference type, use its hashCode() (applies the rule recursively).
• If a field is an array, apply the rule to each entry (or use Arrays.deepHashCode()).

In practice. The recipe works reasonably well and is used in Java libraries.

In theory. Keys are bitstrings; "universal" hash functions exist.

Basic rule. Need to use the whole key to compute the hash code.
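The recipe can be applied to a class that has one field of each kind. The `Order` class below is hypothetical (not from the slides); it uses `Arrays.hashCode` for a primitive `int[]`, whereas `Arrays.deepHashCode` would be the choice for nested or `Object[]` arrays.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.Objects;

// Hypothetical class with one field of each kind covered by the recipe.
final class Order {
    private final String customer;   // reference type; may be null
    private final Date placed;       // reference type
    private final double total;      // primitive type
    private final int[] itemIds;     // array type

    Order(String customer, Date placed, double total, int[] itemIds) {
        this.customer = customer;
        this.placed = placed;
        this.total = total;
        this.itemIds = itemIds.clone();  // defensive copy
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Order)) return false;
        Order that = (Order) other;
        return Objects.equals(customer, that.customer)
                && Objects.equals(placed, that.placed)
                && Double.compare(total, that.total) == 0
                && Arrays.equals(itemIds, that.itemIds);
    }

    @Override
    public int hashCode() {
        int hash = 17;                                                    // nonzero constant
        hash = 31 * hash + (customer == null ? 0 : customer.hashCode());  // null field contributes 0
        hash = 31 * hash + (placed == null ? 0 : placed.hashCode());      // reference type: its hashCode()
        hash = 31 * hash + Double.hashCode(total);                        // primitive: wrapper's hashCode()
        hash = 31 * hash + Arrays.hashCode(itemIds);                      // array: combines every entry
        return hash;
    }
}
```

`java.util.Objects.hash(...)` is a shorter way to combine fields, at the cost of allocating a varargs array (and boxing primitives) on every call.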
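MODULAR HASHING
• Hash code. What we get back from hashCode() is an int between $-2^{31}$ and $2^{31}-1$.
• Hash function. An int between 0 and M - 1 (for use as an array index), where M is typically a prime or a power of 2.

private int hash(Key key) {
    // Mask off the sign bit: Math.abs(key.hashCode()) % M returns a negative
    // index when hashCode() is Integer.MIN_VALUE, because Math.abs overflows.
    return (key.hashCode() & 0x7fffffff) % M;
}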
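Why not simply use `Math.abs(key.hashCode()) % M`? A minimal sketch (class and method names are hypothetical) shows the one input where it fails: `Math.abs(Integer.MIN_VALUE)` overflows and stays negative, while masking with `0x7fffffff` always yields a value in 0 to 2^31 - 1, hence a valid index after `% m`.

```java
class ModularHashing {
    // Buggy: Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE, so the result can be negative.
    static int absModIndex(int hashCode, int m) {
        return Math.abs(hashCode) % m;
    }

    // Correct: clear the sign bit first, then reduce modulo m.
    static int maskedModIndex(int hashCode, int m) {
        return (hashCode & 0x7fffffff) % m;
    }

    public static void main(String[] args) {
        int h = Integer.MIN_VALUE;
        System.out.println(absModIndex(h, 97));    // negative: not a valid array index
        System.out.println(maskedModIndex(h, 97)); // 0
    }
}
```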
UNIFORM HASHING ASSUMPTION
• Uniform hashing assumption. Each key is equally likely to hash to an integer between 0 and M - 1.
• Bins and balls. Throw balls uniformly at random into M bins.

Collisions (birthday problem). Expect two balls in the same bin after $\sim\sqrt{\pi M / 2}$ tosses.
Coupon collector. Expect every bin has ≥ 1 ball after ~ M ln M tosses.
Load balancing. After M tosses, expect the most loaded bin has Θ(log M / log log M) balls.
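The birthday-problem estimate can be checked with a small simulation. This sketch is an illustration under assumptions: `java.util.Random` with a fixed seed stands in for a uniform hash, with M = 365 bins and 10,000 trials; the class name is hypothetical.

```java
import java.util.Random;

class BallsAndBins {
    // Throws balls uniformly at random into m bins and returns the number of
    // tosses until some bin first receives a second ball.
    static int tossesUntilFirstCollision(int m, Random rng) {
        boolean[] occupied = new boolean[m];
        int tosses = 0;
        while (true) {
            int bin = rng.nextInt(m);
            tosses++;
            if (occupied[bin]) return tosses;
            occupied[bin] = true;
        }
    }

    // Averages the first-collision time over many independent trials.
    static double averageTossesUntilCollision(int m, int trials, long seed) {
        Random rng = new Random(seed);
        long total = 0;
        for (int t = 0; t < trials; t++) {
            total += tossesUntilFirstCollision(m, rng);
        }
        return (double) total / trials;
    }

    public static void main(String[] args) {
        int m = 365;
        double observed = averageTossesUntilCollision(m, 10_000, 42L);
        double predicted = Math.sqrt(Math.PI * m / 2);
        System.out.printf("M = %d: observed %.1f tosses, sqrt(pi M / 2) = %.1f%n", m, observed, predicted);
    }
}
```

For M = 365 the estimate is about 23.9 tosses, and the simulated average should land close to it.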
[Figure: Java's String hash codes uniformly distribute the keys of Tale of Two Cities.]
COLLISIONS
Collision. Two distinct keys hashing to same index.
• Birthday problem ⇒ can't avoid collisions unless you have a ridiculous (quadratic) amount of memory.

Challenge. Deal with collisions efficiently.


COLLISION
• A collision occurs when two different inputs produce the same hash code or hash value. In other words, two distinct keys are mapped to the same index in a hash table or have the same hash code.

• Hash functions are designed to distribute values across a range of possible hash codes, ideally minimizing collisions. However, because the range of hash codes is finite while the set of possible inputs is far larger (often unbounded), collisions are inevitable. The challenge is to handle collisions effectively so that hash-based data structures keep working correctly and efficiently.
WAYS TO SOLVE THE COLLISION ISSUE
There are various techniques to handle collisions:
1. Separate Chaining: Each bucket in the hash table is a linked list, and multiple values that hash to the same index are stored in this linked list.
2. Open Addressing: When a collision occurs, the table probes for the next available slot. Probing strategies include linear probing, quadratic probing, and double hashing.
3. Robin Hood Hashing: Similar to linear probing, but when inserting a new element, it compares the distance the new element has traveled from its home slot with the distance of the element already in the slot; if the new element has traveled farther, the two swap places and probing continues with the displaced element.

• Handling collisions is crucial because it ensures that hash-based data structures, like hash tables or hash maps, maintain constant-time (O(1)) average lookup, insertion, and deletion. The choice of collision-resolution strategy depends on the specific requirements and characteristics of the application.
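The slides implement only separate chaining, but the linear-probing form of open addressing can be sketched in a few lines. This is a minimal sketch under stated assumptions: String keys, int values, a fixed capacity with no resizing or deletion, and the hypothetical class name `LinearProbingSketch`. It always keeps at least one slot empty so that every probe sequence terminates.

```java
class LinearProbingSketch {
    private final String[] keys;  // keys[i] == null means slot i is empty
    private final int[] vals;
    private final int m;          // number of slots
    private int n;                // number of keys stored

    LinearProbingSketch(int capacity) {
        if (capacity < 2) throw new IllegalArgumentException("capacity must be at least 2");
        m = capacity;
        keys = new String[m];
        vals = new int[m];
    }

    private int hash(String key) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    public void put(String key, int val) {
        int i = hash(key);
        // On a collision, step to the next slot (wrapping around) until we find the key or an empty slot.
        while (keys[i] != null) {
            if (keys[i].equals(key)) { vals[i] = val; return; }  // existing key: update its value
            i = (i + 1) % m;
        }
        // Keep one slot empty so that get() and put() always reach a null slot.
        if (n == m - 1) throw new IllegalStateException("table full (this sketch does not resize)");
        keys[i] = key;
        vals[i] = val;
        n++;
    }

    public Integer get(String key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % m) {
            if (keys[i].equals(key)) return vals[i];
        }
        return null;  // reached an empty slot: the key is absent
    }
}
```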
SEPARATE CHAINING SYMBOL TABLE
Use an array of M < N linked lists. [H. P. Luhn, IBM 1953]
• Hash: map key to integer i between 0 and M - 1.
• Insert: put at front of the i-th chain (if not already there).
• Search: need to search only the i-th chain.
ANALYSIS OF SEPARATE CHAINING
Under the uniform hashing assumption, each chain holds about N / M keys.
Consequence. Number of probes for search/insert is proportional to N / M.
• M too large ⇒ too many empty chains.
• M too small ⇒ chains too long.
• Typical choice: M ~ N / 5 ⇒ constant-time ops.
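The N / M analysis can be checked empirically. This sketch assumes random 32-bit integers stand in for well-distributed hash codes (the uniform hashing assumption) and uses the typical choice M = N / 5; the class name and seed are hypothetical.

```java
import java.util.Random;

class ChainLengthExperiment {
    // Counts how many of n random hash codes land in each of m chains
    // when reduced with (h & 0x7fffffff) % m.
    static int[] chainLengths(int n, int m, long seed) {
        Random rng = new Random(seed);
        int[] lengths = new int[m];
        for (int k = 0; k < n; k++) {
            int h = rng.nextInt();  // stands in for key.hashCode()
            lengths[(h & 0x7fffffff) % m]++;
        }
        return lengths;
    }

    public static void main(String[] args) {
        int n = 10_000, m = n / 5;  // typical choice: M ~ N / 5
        int[] lengths = chainLengths(n, m, 7L);
        int max = 0, empty = 0;
        for (int len : lengths) {
            max = Math.max(max, len);
            if (len == 0) empty++;
        }
        System.out.printf("average %.1f, longest %d, empty %d%n", (double) n / m, max, empty);
    }
}
```

The average chain length is exactly N / M = 5 by construction; the printed maximum shows how far the longest chain strays from that average.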


SEPARATE CHAINING ST: JAVA IMPLEMENTATION
public class SeparateChainingHashST<Key, Value> {
    // array doubling and halving code omitted
    private int M = 97;                // number of chains
    private Node[] st = new Node[M];   // array of chains

    private static class Node {
        private Object key;   // no generic array creation
        private Object val;   // (declare key and value of type Object)
        private Node next;
        ...
    }

    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    public Value get(Key key) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next)
            if (key.equals(x.key)) return (Value) x.val;
        return null;
    }
}
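The slide omits put(). A minimal sketch of the full symbol table follows, assuming a fixed M = 97 with no resizing, the hypothetical class name `ChainingHashST` (chosen to avoid clashing with the slide's class), and a Node constructor, which the slide elides with `...`. Insertion follows the "put at front of chain (if not already there)" rule from the earlier slide.

```java
class ChainingHashST<Key, Value> {
    private static final int M = 97;        // number of chains (fixed: no resizing in this sketch)
    private final Node[] st = new Node[M];  // array of chains

    private static class Node {
        private final Object key;  // Object, not Key: no generic array creation
        private Object val;
        private final Node next;

        Node(Object key, Object val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    @SuppressWarnings("unchecked")
    public Value get(Key key) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next)
            if (key.equals(x.key)) return (Value) x.val;
        return null;
    }

    public void put(Key key, Value val) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next)
            if (key.equals(x.key)) { x.val = val; return; }  // key already present: update its value
        st[i] = new Node(key, val, st[i]);                    // otherwise: insert at front of chain i
    }
}
```

For example, after `st.put("call", 1)` on a `ChainingHashST<String, Integer>`, `st.get("call")` returns 1 and `st.get("cell")` returns null.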
ST IMPLEMENTATIONS: SUMMARY

                                      Worst-case cost          Average case
                                      (after N inserts)        (after N random inserts)
Implementation                        Search      Insert       Search hit    Insert
Sequential search (unordered list)    N           N            N/2           N
Binary search (ordered array)         log N       N            log N         N/2
BST                                   N           N            1.39 log N    1.39 log N
2-3 tree                              c log N     c log N      c log N       c log N
Red-black BST                         2 log N     2 log N      1 log N       1 log N
Separate chaining                     log N*      log N*       3-5*          3-5*

* under uniform hashing assumption


Thank you!
