
Unit 2

Hashing is a method for computing addresses for data storage using keys and hash functions, with hash tables serving as an efficient data structure for quick data access. While hash tables offer advantages such as constant time complexity for operations, they also face challenges like collisions and memory overhead. Various hash functions and collision resolution techniques, including open hashing, are essential for optimizing performance and managing data effectively.

Uploaded by

lixaxaj663

INTRODUCTION

1. Hashing is finding an address where the data is to be stored as well as located, using a key with the help of an algorithmic function.

2. Hashing is a method of directly computing the address of the record with the help of a key by using a suitable mathematical function called the hash function.

3. A hash table is an array-based structure used to store <key, information> pairs.
4. Hash Table: A hash table is a data structure that stores records in an array. A hash table can be used for quick insertion and searching.

[Figure: key → Hash(key) → Address]
• To store a record in a hash table, a hash function is applied to the key of the record being stored, returning an index within the range of the hash table.

• The item is then stored in the table at that index position.
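As a minimal sketch of the two steps above, assuming integer keys and a simple remainder as the hash function (the names hash_index and store are illustrative, not from the slides), a record can be placed like this:

```python
def hash_index(key: int, table_size: int) -> int:
    """Map an integer key to an index in the range [0, table_size - 1]."""
    return key % table_size


def store(table: list, key: int, info: str) -> int:
    """Store the <key, information> pair at the computed index and return the index."""
    index = hash_index(key, len(table))
    table[index] = (key, info)  # collisions are ignored in this sketch
    return index


table = [None] * 10
position = store(table, 6, "record for key 6")
print(position, table[position])
```

Collision handling is deliberately left out here; it is covered by the collision resolution techniques later in the unit.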
HASH TABLE
• A hash table uses a special function known as a hash function that maps a given key to an index so that elements can be accessed faster.

• A hash table is a data structure that stores information with two main components: a key and a value. A hash table can be used to implement an associative array. The efficiency of the mapping depends on the efficiency of the hash function used.
ADVANTAGES OF HASH TABLE
The benefits of using hash tables are:

1. Hash tables have high performance when looking up, inserting, and deleting values.
2. The average-case time complexity of these operations is constant, O(1), regardless of the number of items in the table, provided collisions are kept low.
3. They perform very well even when working with large datasets.
DISADVANTAGES OF HASH TABLE
The drawbacks of using hash tables are:

1. Some implementations do not allow a null value as a key.
2. Collisions cannot be avoided entirely when computing indices with hash functions. A collision occurs when a key hashes to an index that is already in use by a different key.
3. If the hash function produces many collisions, performance decreases.
OPERATIONS ON HASH TABLE
The operations supported by hash tables are:

1. Insertion: adds an element to the hash table.
2. Searching: looks up an element in the hash table using its key.
3. Deletion: removes an element from the hash table.
All three have an average case of O(1) and a worst case of O(n).
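The three operations can be tried out with Python's built-in dict, which is implemented as a hash table. This is only an illustration; the keys and values below are made up.

```python
# Python's built-in dict is used here as a ready-made hash table.
phone_book = {}

# Insertion: add <key, value> pairs.
phone_book["alice"] = "555-0100"
phone_book["bob"] = "555-0101"

# Searching: look up a value using its key.
alice_number = phone_book.get("alice")
missing = phone_book.get("carol")  # None when the key is absent

# Deletion: remove an element using its key.
del phone_book["bob"]

print(alice_number, missing, "bob" in phone_book)
```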
APPLICATIONS OF HASH TABLE
In the real world, hash tables are used to store data for:

1. Databases
2. Associative arrays
3. Sets
4. Memory caches
PROPERTIES OF HASH FUNCTION
1) The hash function should be simple to compute.
2) The number of collisions should be small.
3) The hash function uses all the input data.
4) The hash function "uniformly" distributes the data across the entire set of possible hash values.
5) The hash function generates very different hash values for similar strings.
HASH FUNCTION
• A function that maps a key into the range [0 to Max − 1]; the result is used as an index (or address) into the hash table for storing and retrieving records.
BUCKET
• A bucket is an index position in the hash table that can store more than one record.
• When two keys map to the same index, both records are stored in the same bucket.
COLLISION
• Two keys hashing to the same address is called a collision.
PROBE
• Each calculation of an address and test for success is known as a probe.
OVERFLOW
• When more keys hash to the same address and there is no room left in the bucket, an overflow is said to have occurred.
LOAD FACTOR
The load factor of a hash table is a measure of how full the table is. It is defined as the ratio of the number of elements (n) in the hash table to the number of slots (buckets) (m) available in the hash table.

Load Factor = n/m

Low Load Factor: This means most buckets are empty, leading to efficient
insertions and lookups because there are fewer collisions.
High Load Factor: As the table becomes more populated, the likelihood of
collisions increases, which can degrade the performance of insertions,
deletions, and lookups.
Resizing: When the load factor exceeds a certain threshold (often 0.7 or
70%), the hash table is typically resized (often doubled) to maintain
efficient operations. Resizing involves rehashing all existing elements into
a new, larger table.
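The resizing rule can be sketched as follows. This is a simplified illustration under stated assumptions: integer keys, the index computed as key % m, buckets held as Python lists, and a threshold of 0.7 with doubling. The function names are hypothetical.

```python
def load_factor(buckets):
    """Return n / m for a table stored as a list of buckets (lists of keys)."""
    return sum(len(bucket) for bucket in buckets) / len(buckets)


def resized(buckets):
    """Return a table with twice as many buckets, rehashing every existing key."""
    new_buckets = [[] for _ in range(2 * len(buckets))]
    for bucket in buckets:
        for key in bucket:
            new_buckets[key % len(new_buckets)].append(key)
    return new_buckets


def insert(buckets, key, threshold=0.7):
    """Insert key, then resize if the load factor has exceeded the threshold."""
    buckets[key % len(buckets)].append(key)
    if load_factor(buckets) > threshold:
        buckets = resized(buckets)
    return buckets


keys = [3, 13, 25, 7, 42, 18, 60, 91]
table = [[] for _ in range(10)]
for k in keys:
    table = insert(table, k)
print(len(table), load_factor(table))
```

With 10 buckets, the eighth insertion pushes the load factor to 0.8, so the table is doubled to 20 buckets and every key is rehashed with the new size.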
LOAD DENSITY
Load density typically refers to the distribution of elements within the hash table and can imply how evenly the elements are spread across the buckets.

• Even Load Density: This indicates that elements are uniformly distributed across the buckets, minimizing collisions and maintaining efficient performance.

• Uneven Load Density: When elements are clumped together in a few buckets, it leads to higher collision rates and can degrade performance.
EXAMPLE OF LOAD FACTOR
Consider a hash table with 10 buckets (slots) and 7 elements.

Load Factor=n/m=7/10=0.7

This means that the table holds, on average, 0.7 elements per bucket (70% of its slots would be occupied if no two elements shared a bucket). If the load factor exceeds a certain threshold (e.g., 0.7), the hash table may need to be resized to maintain efficient operations.
EXAMPLE OF LOAD DENSITY
To illustrate load density, let's consider how elements are distributed across the
buckets in the hash table. Suppose we have the following distribution of elements:
•Bucket 0: 2 elements
•Bucket 1: 1 element
•Bucket 2: 0 elements
•Bucket 3: 1 element
•Bucket 4: 0 elements
•Bucket 5: 2 elements
•Bucket 6: 0 elements
•Bucket 7: 1 element
•Bucket 8: 0 elements
•Bucket 9: 0 elements
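
The distribution listed above can be summarized with a few lines of Python. The variable names are illustrative; the counts are taken directly from the list.

```python
# Number of elements in buckets 0..9, copied from the list above.
bucket_counts = [2, 1, 0, 1, 0, 2, 0, 1, 0, 0]

n = sum(bucket_counts)                   # total number of elements
m = len(bucket_counts)                   # number of buckets
load_factor = n / m                      # average elements per bucket
empty_buckets = bucket_counts.count(0)   # buckets holding nothing
fullest_bucket = max(bucket_counts)      # size of the most loaded bucket

print(n, m, load_factor, empty_buckets, fullest_bucket)
```

The load factor alone (0.7) does not reveal that half of the buckets are empty while two buckets hold two elements each; that is the information the load density adds.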

Load Density Observation:

• Even Distribution: If elements were evenly distributed, each bucket would have close to n/m = 7/10 = 0.7 elements.
• Actual Distribution: In this case, the distribution is uneven. Some buckets have 2 elements, some have 1, and others have none.
The uneven distribution (load density) indicates that certain buckets are more "loaded" than others, which could lead to higher collision rates in those buckets.
TYPES OF HASH FUNCTION
This unit covers six ways of calculating the hash function:
1. Division method
2. Mid square method
3. Digit folding method
4. Digit extraction
5. Radix transformation
6. Universal hash functions
In the division method, the hash function can be defined as:
h(k) = k % m
where m is the size of the hash table.

For example, if the key value is 6 and the size of the hash table is 10, applying the hash function to key 6 gives the index:
h(6) = 6 % 10 = 6
The value is stored at index 6.
1. Division Method:

This is the simplest method to generate a hash value. The hash function divides the value 'k' by 'M' and then uses the remainder obtained.
Formula:
h(K) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
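The division method is a one-line function. The sketch below assumes non-negative integer keys and simply reproduces the worked examples; the function name is illustrative.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: the remainder of key k divided by table size m."""
    return k % m


print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0
print(division_hash(6, 10))      # 6
```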
2. Mid Square Method:
The mid square method is a very good hashing method. It involves two steps to compute the hash value:
1. Square the value of the key 'k', i.e. k^2.
2. Extract the middle 'r' digits as the hash value.
Formula:
h(K) = h(k x k)
Here, k is the key value.
Example:
Suppose the hash table has 100 memory locations. So, r = 2 because two digits are required to map the key to the memory location.
k = 60
k x k = 60 x 60 = 3600
h(60) = 60
The hash value obtained is 60.
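A possible sketch of the mid square method is shown below. The slides do not say what to do when the middle 'r' digits cannot be centered exactly, so this version assumes the window starts at the rounded-down middle position; the function name is illustrative.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Square k and return the middle r digits of the result as an integer."""
    squared = str(k * k)
    # Assumption: when the window cannot be centered exactly, it starts one
    # digit further left (integer division rounds the start index down).
    start = max(0, (len(squared) - r) // 2)
    return int(squared[start:start + r])


print(mid_square_hash(60, 2))  # 60 * 60 = 3600, middle two digits are 60
```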
3. Digit Folding Method: This method involves two steps:
1. Divide the key value 'k' into a number of parts, i.e. k1, k2, k3, …, kn, where each part has the same number of digits except for the last part, which can have fewer digits than the other parts.
2. Add the individual parts. The hash value is obtained by ignoring the last carry, if any.
Formula:

k = k1, k2, k3, k4, …, kn
s = k1 + k2 + k3 + k4 + … + kn
h(K) = s

Here, s is obtained by adding the parts of the key k.

Example:

k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51
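The folding steps can be sketched as below. "Ignoring the last carry" is interpreted here as keeping only the lowest part_size digits of the sum, which is an assumption; the function and parameter names are illustrative.

```python
def folding_hash(k: int, part_size: int) -> int:
    """Split the decimal digits of k into parts of part_size digits and add them."""
    digits = str(k)
    parts = [int(digits[i:i + part_size]) for i in range(0, len(digits), part_size)]
    total = sum(parts)
    # Assumption: dropping the final carry means keeping the lowest part_size digits.
    return total % (10 ** part_size)


print(folding_hash(12345, 2))  # 12 + 34 + 5 = 51
```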
4. Digit Extraction:
Consider a key value of 246813579 and a hash table size of 100.

Select specific digits (e.g., the 2nd, 4th, and 6th digits):
2nd Digit: 4
4th Digit: 8
6th Digit: 3
Combine the digits (e.g., by concatenation or summation):
Concatenation: 483
Summation: 4 + 8 + 3 = 15
Hash value (if using summation):
Hash Value = 15
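The digit extraction example can be reproduced with a short sketch. Positions are assumed to be counted from the left starting at 1, as in the example; the function name is illustrative.

```python
def extract_digits(k: int, positions: list) -> list:
    """Return the digits of k at the given 1-based positions, counted from the left."""
    digits = str(k)
    return [int(digits[p - 1]) for p in positions]


selected = extract_digits(246813579, [2, 4, 6])
concatenated = int("".join(str(d) for d in selected))
summed = sum(selected)
print(selected, concatenated, summed)
```

With a table size of 100, the summed value 15 can be used as an index directly, while the concatenated value 483 would still need to be reduced (for example with the division method) to fit the table.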
5. Radix Transformation

Radix transformation in hashing involves converting a key from one numeric base (radix)
to another. This method can be particularly useful when dealing with non-integer keys,
such as strings, where characters are mapped to numeric values based on their position in
a character set. The transformed key can then be used in hashing algorithms to generate
hash values.

Steps for Radix Transformation


1. Convert the Key to a Numeric Base: Convert the key (which could be a string, for
example) into a numeric representation based on a chosen radix (base).
2. Combine the Numeric Values: Combine these numeric values into a single number,
often using polynomial accumulation.
3. Apply the Hash Function: Use the numeric value as input to a hash function to
generate the hash value.
Example
Consider the string key "abc" and a radix (base) of 26 (for the 26 letters of the alphabet).

1. Assign numeric values to characters:
   'a' -> 0
   'b' -> 1
   'c' -> 2

2. Combine the numeric values: treat the string as a number in base 26. "abc" in base 26 can be represented as 0×26^2 + 1×26^1 + 2×26^0.

3. Calculate the combined value: 0×26^2 + 1×26^1 + 2×26^0 = 0 + 26 + 2 = 28

4. Hash value: use the combined numeric value (28) as input to a hash function. If the hash table size is 10, for example:
   Hash Value = 28 % 10 = 8
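The radix transformation example can be sketched with polynomial accumulation. The code assumes keys made of lowercase letters 'a' to 'z' only and uses the division method as the final hash function; the function names are illustrative.

```python
def radix_value(key: str, radix: int = 26) -> int:
    """Interpret a lowercase string as a number in the given radix ('a' -> 0, 'b' -> 1, ...)."""
    value = 0
    for ch in key:
        value = value * radix + (ord(ch) - ord("a"))  # polynomial accumulation
    return value


def radix_hash(key: str, table_size: int, radix: int = 26) -> int:
    """Reduce the transformed key to a table index with the division method."""
    return radix_value(key, radix) % table_size


print(radix_value("abc"), radix_hash("abc", 10))  # 28 8
```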
6. Universal Hash Functions
Universal hash functions are a class of hash functions designed to keep the chance of collision low for any given set of keys. They provide a probabilistic guarantee: when the hash function is chosen at random from a family of hash functions, any two distinct keys are unlikely to collide.
A family of hash functions H is said to be universal if, for any two distinct keys x and y, the probability that they collide (i.e., h(x) = h(y)) is at most 1/m when h is chosen uniformly at random from H, where 'm' is the number of possible hash values.

Construction of Universal Hash Functions

One common method to construct a universal family of hash functions is to use modular arithmetic with random coefficients. Here's an example construction:
Example: hash functions of the form h_{a,b}(x) = ((a⋅x + b) mod p) mod m
Parameters:
• p: A prime number larger than the maximum possible key value.
• m: The size of the hash table.
• a and b: Random integers chosen from {0, 1, 2, …, p−1} with a ≠ 0.

Choosing a and b uniformly at random selects h_{a,b} uniformly at random from the family, which bounds the collision probability of any two distinct keys by 1/m.
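A small sketch of this construction is given below. The helper name make_universal_hash and the example values p = 101 and m = 10 are illustrative choices; the keys are assumed to be integers smaller than p.

```python
import random


def make_universal_hash(p: int, m: int, rng: random.Random):
    """Pick a and b at random and return h(x) = ((a*x + b) mod p) mod m."""
    a = rng.randrange(1, p)  # a is drawn from {1, ..., p-1}, so a != 0
    b = rng.randrange(0, p)  # b is drawn from {0, ..., p-1}

    def h(x: int) -> int:
        return ((a * x + b) % p) % m

    return h


rng = random.Random(42)            # seeded only to make the run repeatable
h = make_universal_hash(101, 10, rng)
indices = [h(x) for x in range(101)]
print(indices[:10])
```

Each call to make_universal_hash picks a new member of the family, so an unlucky set of keys for one choice of a and b is unlikely to remain unlucky after the function is redrawn.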
COLLISION
When two different keys hash to the same index, the problem that occurs between the two values is known as a collision. In the division method example above, the key 6 is stored at index 6. If the key value is 26, then the index would be:

h(26) = 26 % 10 = 6

Therefore, two values are mapped to the same index, i.e., 6, and this leads to the collision problem. To resolve these collisions, we have some techniques known as collision resolution techniques.

In hashing, collision resolution techniques are classified as:

Open Hashing: It is also known as closed addressing (separate chaining).
Closed Hashing: It is also known as open addressing.
OPEN HASHING
Open hashing, also known as separate chaining, is a technique used to handle collisions in hash
tables. In open hashing, each bucket of the hash table contains a linked list (or another data
structure) that stores all elements that hash to the same bucket. This approach allows multiple
elements to occupy the same bucket without overwriting each other, making it easier to handle
collisions.
Let's first understand chaining to resolve the collision.
Suppose we have a list of key values
A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3
In this case, we do not use h(k) = k % m directly, because h(k) = 2k+3; the index is computed as index = h(k) % m = (2k+3) % m.

The index of key value 3 is:
index = h(3) = (2(3)+3)%10 = 9
The value 3 would be stored at the index 9.

The index of key value 2 is:
index = h(2) = (2(2)+3)%10 = 7
The value 2 would be stored at the index 7.
The index of key value 9 is:
index = h(9) = (2(9)+3)%10 = 1
The value 9 would be stored at the index 1.

The index of key value 6 is:
index = h(6) = (2(6)+3)%10 = 5
The value 6 would be stored at the index 5.

The index of key value 11 is:
index = h(11) = (2(11)+3)%10 = 5
The value 11 would be stored at the index 5, in the same chain as 6.

The index of key value 13 is:
index = h(13) = (2(13)+3)%10 = 9
The value 13 would be stored at index 9, in the same chain as 3.

The index of key value 7 is:
index = h(7) = (2(7)+3)%10 = 7
The value 7 would be stored at index 7, in the same chain as 2.

The index of key value 12 is:
index = h(12) = (2(12)+3)%10 = 7
The value 12 would be stored at index 7, in the same chain as 2 and 7.
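
The chaining example can be reproduced with a short sketch. Python lists stand in for the linked lists, and new keys are appended at the end of a chain; both choices are assumptions made for illustration.

```python
def build_chained_table(keys, m):
    """Insert each key into the chain at index (2k + 3) % m."""
    table = [[] for _ in range(m)]
    for k in keys:
        index = (2 * k + 3) % m
        table[index].append(k)  # colliding keys share the same chain
    return table


table = build_chained_table([3, 2, 9, 6, 11, 13, 7, 12], 10)
for index, chain in enumerate(table):
    if chain:
        print(index, chain)
```

Only indices 1, 5, 7 and 9 hold chains; the other six buckets stay empty.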
Advantages of Open Hashing
1. Dynamic Size: Linked lists can grow dynamically, allowing for efficient use of space even when the number of elements exceeds the number of buckets.
2. Simple Collision Resolution: Collisions are easily managed by inserting elements into the linked list of the corresponding bucket.
3. Deletion: Deletion is straightforward and does not require rehashing.

Disadvantages of Open Hashing
1. Memory Overhead: Each node in the linked list requires additional memory for the pointer, which can lead to higher memory usage.
2. Performance Degradation: If many elements hash to the same bucket, the performance of search, insert, and delete operations can degrade to O(n) in the worst case.
3. Cache Inefficiency: Linked lists can lead to cache inefficiency due to their non-contiguous memory allocation.
CLOSED HASHING
Closed hashing, also known as open addressing, is a technique used to
handle collisions in hash tables. In closed hashing, all elements are
stored directly in the hash table array itself, and collisions are resolved
by probing—finding the next available slot in the array according to a
specified probing sequence. There are several common probing methods,
including linear probing, quadratic probing, and double hashing.

Probe number: the number of attempts (address calculations) needed before a key is placed, e.g. 1 if the first slot tried is free, 2 if one collision occurred first.
PROBING
Method: Description

Linear probing: This method searches for empty slots linearly, starting from the position where the collision occurred and moving forward. If the end of the list is reached and no empty slot is found, the probing starts again at the beginning of the list.
Quadratic probing: This method uses quadratic polynomial expressions to find the next available free slot.
Double hashing: This technique uses a secondary hash function to find the next free available slot.
CLOSED HASHING
Linear Probing
Each cell in the hash table contains a key-value pair. When a collision occurs because a new key maps to a cell already occupied by another key, linear probing searches for the closest free location and adds the new key to that empty cell. Searching is performed sequentially, starting from the position where the collision occurs, until an empty cell is found.
Let us consider a simple hash function "key mod 7" and a sequence of keys 50, 700, 76, 85, 92, 73, 101.
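The placement of these keys can be traced with a short sketch. The table is assumed to have 7 slots (matching "key mod 7"), and the function name is illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key using h(key) = key % m and step forward one slot per collision."""
    m = len(table)
    start = key % m
    for i in range(m):
        index = (start + i) % m  # wrap around to the beginning of the table
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 7
slots = [linear_probe_insert(table, k) for k in [50, 700, 76, 85, 92, 73, 101]]
print(slots)  # slot chosen for each key, in insertion order
print(table)  # final contents of the table
```

Keys 50, 85 and 92 all hash to slot 1, so 85 and 92 are pushed to slots 2 and 3; this in turn pushes 73 and 101 (home slot 3) to slots 4 and 5, which is the clustering effect described below.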
Advantages of Linear Probing
1. Simplicity: Linear probing is simple to implement and understand.
2. Cache Performance: Linear probing tends to have good cache performance due to its sequential access pattern.
3. No Additional Memory: All elements are stored within the hash table array itself, leading to better memory utilization.

Disadvantages of Linear Probing
1. Primary Clustering: Linear probing can lead to primary clustering, where long sequences of occupied slots form, increasing the average search time.
2. Performance Degradation: As the load factor approaches 1, the performance of insertions, deletions, and searches degrades significantly.
3. Deletion Complexity: Deleting an element requires careful handling to ensure the integrity of the probing sequence.
CLOSED HASHING
Quadratic Probing
In quadratic probing, if a collision occurs at index i, the next slot to check is determined by a quadratic function. The general formula for quadratic probing is:

h(k, i) = (h(k) + c1⋅i + c2⋅i^2) mod m
where:
• h(k) is the primary hash function.
• 'i' is the probe number (starting from 0).
• c1 and c2 are constants.
• m is the size of the hash table.

A common choice is c1 = c2 = 1, which simplifies the formula to:

h(k, i) = (h(k) + i + i^2) mod m

Quadratic probing helps reduce clustering, a common problem in linear probing.
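
A minimal sketch of the formula with c1 = c2 = 1 follows. The primary hash function is assumed to be h(k) = k % m, the table size of 10 is an arbitrary example, and the function gives up after m probes because the sequence is not guaranteed to reach every slot.

```python
def quadratic_probe_insert(table, key, c1=1, c2=1):
    """Insert key at (h(key) + c1*i + c2*i*i) % m for the first i that hits a free slot."""
    m = len(table)
    home = key % m  # assumed primary hash function
    for i in range(m):
        index = (home + c1 * i + c2 * i * i) % m
        if table[index] is None:
            table[index] = key
            return index
    return None  # no free slot found within m probes


table = [None] * 10
slots = [quadratic_probe_insert(table, k) for k in [6, 16, 26]]
print(slots)
```

All three keys have home slot 6, yet they end up spread over slots 6, 8 and 2 instead of forming one contiguous run.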


CLOSED HASHING
Advantages of Quadratic Probing
1. No Extra Space Required: Utilizes the existing array for collision resolution without needing additional data structures.
2. Reduced Primary Clustering: Less prone to primary clustering compared to linear probing, as it spreads out probes,
leading to better distribution.
3. Improved Performance: Can perform better for certain access patterns, reducing the average time for insertions and
searches.

Disadvantages of Quadratic Probing


1. No Guarantee of Finding an Empty Slot: The probe sequence can repeat without visiting every slot, so an insertion may fail to find a free cell even when one exists, and a naive implementation can get stuck in an infinite loop.
2. Increased Complexity: More complex to implement than linear probing due to the quadratic calculation.
3. Performance Degradation: As the load factor increases, insertion and lookup times can degrade significantly.
4. Potential for Secondary Clustering: While primary clustering is reduced, secondary clustering can still occur.
CLOSED HASHING
Double hashing: a collision resolution technique for open-addressed hash tables. Double hashing applies a second hash function to the key when a collision occurs.
Double hashing can be done using:
(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE is the size of the hash table.
(We repeat by increasing i when a collision occurs.)

The first hash function is typically hash1(key) = key % TABLE_SIZE.
A popular second hash function is hash2(key) = PRIME - (key % PRIME), where PRIME is a prime smaller than TABLE_SIZE.
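The formula can be sketched as follows. TABLE_SIZE = 13, PRIME = 7 and the keys are example values chosen for illustration, not taken from the slides.

```python
TABLE_SIZE = 13
PRIME = 7  # a prime smaller than TABLE_SIZE


def hash1(key: int) -> int:
    return key % TABLE_SIZE


def hash2(key: int) -> int:
    return PRIME - (key % PRIME)  # always in 1..PRIME, so never zero


def double_hash_insert(table, key):
    """Probe (hash1(key) + i * hash2(key)) % TABLE_SIZE for i = 0, 1, 2, ..."""
    for i in range(TABLE_SIZE):
        index = (hash1(key) + i * hash2(key)) % TABLE_SIZE
        if table[index] is None:
            table[index] = key
            return index
    return None  # no free slot found


table = [None] * TABLE_SIZE
slots = [double_hash_insert(table, k) for k in [19, 27, 36, 10]]
print(slots)
```

Key 10 collides with 36 at slot 10; its step size is hash2(10) = 4, so it tries slot 1 (occupied by 27) and then settles in slot 5.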
A good second hash function:
1. Must never evaluate to zero.
2. Must make sure that all cells can be probed.
Advantages of double hashing
1. No Extra Space Required: Utilizes the existing array for collision resolution without needing additional data structures.
2. Avoids Primary Clustering: Double hashing eliminates primary clustering because the probe sequence is determined
by a second hash function, resulting in better spread across the table.
3. Minimizes Secondary Clustering: Keys that hash to the same initial position have different probe sequences due to
the second hash function, reducing the chances of secondary clustering.
4. Uniform Distribution: Produces a more uniform distribution throughout the hash table, making it one of the most
efficient probing methods for resolving collisions.

Disadvantages of double hashing


1. Complex Implementation: Requires two carefully chosen hash functions, making it more complex to implement
compared to simpler techniques like linear or quadratic probing.
2. Performance Overhead: Calculating two hash values for every probe increases computational overhead.
3. Risk of Infinite Loops: If the table size is not chosen carefully (e.g., not a prime number), it can result in probe
sequences that do not explore all slots, leading to infinite loops.
Bucket addressing
• Bucket addressing is a method used to solve collisions in hash tables by storing colliding elements in the same address position, called a bucket. Each bucket is a block of space large enough to hold multiple items.
• Key Points:
1. Collision Handling: Collisions are not entirely avoided. If a bucket becomes full, the item needs to be stored elsewhere.
2. Open Addressing Approaches:
   a. Linear Probing: The colliding item is stored in the next available bucket slot.
   b. Quadratic Probing: The colliding item is stored in a different bucket based on a quadratic function.
3. Overflow Area:
   a. Buckets can point to an overflow area where additional space is allocated for colliding items.
   b. A marker (yes/no or a specific position) is used to indicate if overflow storage should be checked.
4. Chaining Option: Linked lists can be used in conjunction with buckets, where the marker points to the starting position of the linked list in the overflow area.
• This method offers flexibility in handling collisions but may still require additional mechanisms if the buckets fill up.
Deletion
• With a chaining method, deleting an element leads to the deletion of a node from the linked list holding the element.
• Consider a table in which the keys are stored using linear probing.
• The keys have been entered in the following order: A1, A4, A2, B4, B1.
• After A4 is deleted and position 4 is freed, we try to find B4 by first checking position 4.
• But this position is now empty, so we may conclude that B4 is not in the table. The same result occurs after deleting A2 and marking cell 2 as empty. Then, the search for B1 is unsuccessful, because if we are using linear probing, the search terminates at position 2. The situation is the same for the other open addressing methods.
• If we leave deleted keys in the table with markers indicating that they are not valid elements of the
table, any subsequent search for an element does not terminate prematurely.
• When a new key is inserted, it overwrites a key that is only a space filler.
• However, for a large number of deletions and a small number of additional insertions, the table
becomes overloaded with deleted records, which increases the search time because the open
addressing methods require testing the deleted elements.
• Therefore, the table should be purged after a certain number of deletions by moving undeleted
elements to the cells occupied by deleted elements. Cells with deleted elements that are not
overwritten by this procedure are marked as free.
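The marker idea can be sketched for linear probing as follows. Integer keys, h(key) = key % m, and the names EMPTY and DELETED are assumptions made for illustration; the purge step is left out, and inserted keys are assumed to be distinct.

```python
EMPTY = None        # slot that has never held a key
DELETED = object()  # marker left behind by a deletion


def search(table, key):
    """Return the index of key or None; probing skips markers and stops at EMPTY."""
    m = len(table)
    for i in range(m):
        index = (key % m + i) % m
        if table[index] is EMPTY:
            return None
        if table[index] is not DELETED and table[index] == key:
            return index
    return None


def insert(table, key):
    """Place key in the first EMPTY or DELETED slot of its probe sequence."""
    m = len(table)
    for i in range(m):
        index = (key % m + i) % m
        if table[index] is EMPTY or table[index] is DELETED:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


def delete(table, key):
    """Replace key with a marker so later searches do not stop early."""
    index = search(table, key)
    if index is not None:
        table[index] = DELETED
    return index


table = [EMPTY] * 10
insert(table, 14)             # home slot 4
insert(table, 24)             # collides at 4, placed at 5
delete(table, 14)             # slot 4 now holds a marker, not EMPTY
found_at = search(table, 24)  # still found, because the marker is skipped
reused = insert(table, 34)    # the marker at slot 4 is overwritten
print(found_at, reused)
```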
Collision Techniques
Separate Chaining | Closed Hashing
Keys are stored inside the hash table as well as outside the hash table. | All the keys are stored only inside the hash table.
The number of keys to be stored in the hash table can even exceed the size of the hash table. | The number of keys to be stored in the hash table can never exceed the size of the hash table.
Deletion is easier. | Deletion is difficult.
Extra space is required for the pointers to store the keys outside the hash table. | No extra space is required.
Cache performance is poor. This is because of linked lists which store the keys outside the hash table. | Cache performance is better. This is because here no linked lists are used.
Some buckets of the hash table are never used which leads to | Buckets may be used even if
