
Hashing and Algorithm Explanation

Hashing:

Hashing is a process of converting input data of arbitrary size into a fixed-size value, known as a hash or
hash code. It is widely used in data structures, cryptography, and databases for quick data retrieval,
verification, and ensuring data integrity.

Characteristics of Hashing:

1. Fixed Output: A hashing algorithm always produces a fixed-length output, regardless of the input
size.
2. Deterministic: The same input always produces the same hash value.
3. Efficiency: The computation of hash values is fast and efficient.
4. Collision Handling: A good hashing algorithm minimizes the chances of collisions (when different
inputs produce the same hash value).

Example: Simple Hashing Algorithm - Division Method:

1. Algorithm:
Select a hash table size M .
Compute the hash value as h(k) = k mod M , where k is the input key.
Store the key-value pair in the slot indexed by h(k).
2. Steps:
Input: Key = 50, Table Size M = 7.
Compute: 50 mod 7 = 1.
Store: Insert key at index 1 in the hash table.

This method is simple and commonly used in hash table implementations, though collision handling
techniques like chaining or open addressing are necessary.
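The division method above can be sketched in a few lines of Python. This is a minimal illustration, not a full hash table: the table size M = 7 and key 50 follow the example, while the list-of-lists layout and helper names are assumptions made here for clarity.

```python
# Minimal sketch of the division method, with chaining for any collisions.

M = 7                                   # hash table size
table = [[] for _ in range(M)]          # each slot holds a chain (list) of keys

def h(k: int) -> int:
    """Division-method hash: h(k) = k mod M."""
    return k % M

def insert(key: int) -> None:
    table[h(key)].append(key)           # place the key in the slot indexed by h(k)

insert(50)
print(h(50), table)                     # 1 [[], [50], [], [], [], [], []]
```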

Dynamic Hashing and Linear Hashing

1. Dynamic Hashing

Dynamic hashing adjusts the hash table size dynamically as data grows, minimizing collisions and
wasted space. It is suitable for applications with fluctuating data sizes.

Technique:
Uses a directory pointing to buckets.
Hash values are computed using a hash function. The directory grows by doubling its size
when a bucket overflows.
Example: Extendible Hashing.
Example:

Initial directory:
Hash function: h(k) mod 2^d, where d is the directory depth.
Keys: k = 5, 3, 10, 8, inserted into two buckets.
When bucket 1 overflows, directory expands, and bucket splits into two.
Figure: A directory with pointers to buckets, showing bucket splitting.

2. Linear Hashing

Linear hashing avoids the overhead of a directory by gradually expanding the hash table one bucket at a
time.

Technique:
Starts with a small number of buckets.
Splits buckets sequentially when they overflow.
Uses multiple hash functions for progressive splits: h₀(k), h₁(k), ....

Example:
Initial buckets: 0, 1.
Overflow in bucket 0 triggers split into 0, 2.
Hash function transitions to accommodate new buckets.
Figure: Hash table showing sequential bucket splitting.

Comparison:

Dynamic (extendible) hashing splits whichever bucket overflows and may double its directory; linear hashing grows by splitting buckets in a fixed sequential order.
Linear hashing is simpler but less space-efficient for frequent splits.

Extendible Hashing

Extendible hashing is a dynamic hashing technique that adjusts its directory structure dynamically to
accommodate growing data, while minimizing collisions and maintaining efficiency. It is particularly
useful for applications where the dataset size is unpredictable.

How Extendible Hashing Works

1. Directory and Buckets:


A directory contains pointers to buckets, where the actual data resides.
Each bucket can hold a limited number of entries.
2. Hash Function:
The hash value is calculated using a hash function, e.g., h(k).
Only the first d bits (global depth) of the hash value are used to index the directory.
3. Overflow Handling:
When a bucket overflows:
The bucket is split into two, using one more bit of the hash value (local depth increases).
The directory size doubles if required (global depth increases).
4. Example:

Directory with d = 2 (4 slots):
Keys: k = 5, 3, 10, 8.
If k = 10 overflows bucket 00, split the bucket and use d = 3.
5. Advantages:
Avoids over-allocation of memory.
Allows dynamic scaling based on data growth.
6. Diagram:
A directory pointing to buckets, showing the expansion and bucket splitting process.
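The directory-and-bucket mechanics described above can be sketched in simplified Python. This sketch assumes small integer keys (used directly as hash values), a bucket capacity of 2, and low-order-bit directory indexing; the class and variable names are illustrative, not from the text.

```python
# Simplified extendible hashing: directory of bucket references, bucket
# splitting, and directory doubling when the overflowing bucket's local
# depth equals the global depth.

BUCKET_SIZE = 2

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []

def h(k):
    return k  # identity hash for illustration only

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]   # 2^1 directory entries

    def _index(self, k):
        # use the low-order global_depth bits of the hash value
        return h(k) & ((1 << self.global_depth) - 1)

    def insert(self, k):
        b = self.directory[self._index(k)]
        if len(b.keys) < BUCKET_SIZE:
            b.keys.append(k)
            return
        # overflow: split the bucket, doubling the directory if required
        if b.local_depth == self.global_depth:
            self.directory = self.directory * 2   # double the directory
            self.global_depth += 1
        b.local_depth += 1
        new_bucket = Bucket(b.local_depth)
        for i in range(len(self.directory)):      # re-point half of b's entries
            if self.directory[i] is b and (i >> (b.local_depth - 1)) & 1:
                self.directory[i] = new_bucket
        pending, b.keys = b.keys + [k], []
        for key in pending:                       # redistribute the old keys
            self.insert(key)

ht = ExtendibleHash()
for key in (5, 3, 10, 8):
    ht.insert(key)
print(ht.global_depth, [b.keys for b in ht.directory])   # 1 [[10, 8], [5, 3]]
```

With these four keys no split is needed yet; inserting another key into a full bucket would trigger the split (and, if necessary, directory doubling) shown in the insert method.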

Performance of Extendible Hashing

1. Time Complexity:
Search: O(1), as the hash value directly determines the bucket through the directory.
Insertion: O(1) for inserting into a non-full bucket. Splitting a bucket requires recalculating
and redistributing its keys, which is O(n) in the bucket size n.
Deletion: O(1), with possible directory adjustments if buckets become empty.
2. Space Utilization:
Efficient use of space by allocating new buckets only when required.
Doubling the directory size may temporarily increase memory usage, but it avoids
unnecessary bucket creation.
3. Scalability:
Extendible hashing adapts dynamically to data growth, maintaining performance even as
data increases significantly.
4. Collision Handling:
Handles collisions efficiently by splitting buckets and increasing the directory size, preventing
performance degradation.
5. Advantages:
Uniform key distribution minimizes bucket overflows.
Allows concurrent access and updates with minimal locking in distributed systems.
6. Disadvantages:
Directory doubling can cause a slight performance drop during resizing.
Maintaining a directory may consume more memory compared to static hashing.

Conclusion: Extendible hashing offers excellent performance for dynamic datasets with minimal
overhead, balancing time and space efficiency.

Hashed Files on CD-ROM

Hashed files on a CD-ROM involve organizing and accessing data using a hashing mechanism. This
method is commonly used for efficient retrieval in read-only storage systems like CD-ROMs, where data
cannot be updated once written.

Key Features
1. Static Nature:
Since CD-ROMs are read-only, the hash table is precomputed and stored alongside the data
during the mastering process.
No dynamic resizing or bucket splitting is required.
2. Hash Table Structure:
A fixed-size hash table is used to map keys to data blocks.
The hash function ensures an even distribution of keys across the table to minimize collisions.
3. Collision Handling:
Techniques like chaining (storing collided keys in linked lists) or open addressing (probing the
next available slot) are used to handle collisions.
4. Advantages:
Fast Retrieval: Direct indexing into the hash table allows O(1) access time.
Space Efficiency: Precomputed hash tables avoid the need for dynamic adjustments.
5. Limitations:
No ability to update or resize the hash structure.
Poorly chosen hash functions can lead to clustering and inefficient space usage.

Example

A music library on a CD-ROM uses hashed file organization to quickly retrieve songs by unique IDs.
Hashing maps each song's ID to its location on the CD-ROM.

Conclusion: Hashed files on CD-ROM provide quick data retrieval, making them ideal for static, read-
only applications.

Collision in Hashing

A collision occurs in hashing when two distinct keys are mapped to the same hash value by the hash
function. Collisions are inevitable when the number of keys exceeds the number of available slots in the
hash table (pigeonhole principle).

Causes of Collisions:

1. Limited Hash Table Size: A small table increases the likelihood of multiple keys hashing to the
same index.
2. Poor Hash Function: Inefficient hash functions fail to distribute keys evenly, causing clustering.

Effects of Collisions:

Reduced efficiency in data retrieval and insertion.


Potential clustering, where multiple keys are mapped to a small range of indices.

Collision Handling Techniques:

1. Open Addressing:
Resolve collisions by finding the next available slot using a probing strategy (e.g., linear
probing, quadratic probing).
2. Chaining:
Use a linked list at each index to store multiple keys that hash to the same value.
3. Double Hashing:
Apply a secondary hash function to determine a new slot during a collision.
4. Rehashing:
Increase the hash table size and re-compute hash values when collisions exceed a threshold.

Example:

Keys k₁ = 10 and k₂ = 20 with h(k) = k mod 10 both map to index 0.
Using chaining, a linked list is created at index 0 to store both keys.

Conclusion: Collisions are a natural aspect of hashing but can be mitigated with effective handling
techniques.

Collision Resolution Techniques Implementation

Collision resolution techniques are used to handle cases where multiple keys are mapped to the same
hash index. Below are common methods with a focus on double hashing.

1. Chaining

Concept: Store collided keys in a linked list at each hash table index.
Implementation:
Each index of the hash table points to the head of a linked list.
Insert new elements at the end (or beginning) of the list.
Search traverses the list for the desired key.
Example:
h(k) = k mod 5: Keys 10 and 15 both map to index 0. A linked list stores both keys at index 0.
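A minimal Python sketch of chaining with h(k) = k mod 5, matching the example above; the list-of-lists layout and helper names are assumptions for illustration.

```python
# Chaining: each table slot holds a list (chain) of the keys hashed to it.

M = 5
table = [[] for _ in range(M)]

def insert(key):
    table[key % M].append(key)          # append to the chain at the hashed index

def search(key):
    return key in table[key % M]        # traverse only the chain at that index

insert(10)
insert(15)
print(table[0])      # [10, 15] -- both keys share index 0
print(search(15))    # True
```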

2. Open Addressing

Concept: Probe for an empty slot when a collision occurs.


Variants:
Linear Probing: Search sequentially (e.g., h′(k, i) = (h(k) + i) mod m).
Quadratic Probing: Use quadratic increments (e.g., h′(k, i) = (h(k) + i²) mod m).
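A minimal sketch of open addressing with linear probing follows; the table size m = 7 and the helper names are assumptions for illustration.

```python
# Linear probing: h'(k, i) = (h(k) + i) mod m, probing until an empty slot.

m = 7
table = [None] * m

def insert(key):
    for i in range(m):                       # probe at most m slots
        idx = (key % m + i) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("hash table is full")

print(insert(10))   # 3
print(insert(17))   # 17 mod 7 = 3 is taken, so probing places it at index 4
```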

3. Double Hashing

Concept: Use a second hash function to compute the probe sequence, reducing clustering.
Formula:
h′(k, i) = (h(k) + i ⋅ h₂(k)) mod m, where h₂(k) ≠ 0.

Implementation:
1. Compute the primary hash h(k).
2. If that slot is occupied, compute the secondary hash h₂(k).
3. Probe h′(k, i) for i = 1, 2, ... until an empty slot is found.

Example:
h(k) = k mod 7, h₂(k) = 1 + (k mod 5).
For k = 50: h(50) = 50 mod 7 = 1 and h₂(50) = 1 + (50 mod 5) = 1, so the probe sequence is (1 + i ⋅ 1) mod 7 = 1, 2, 3, ...
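A minimal sketch of double hashing using the functions from the example, h(k) = k mod 7 and h₂(k) = 1 + (k mod 5); the table layout and names are illustrative.

```python
# Double hashing: probe sequence h'(k, i) = (h(k) + i * h2(k)) mod m.

m = 7
table = [None] * m

def h(k):  return k % 7
def h2(k): return 1 + (k % 5)             # never zero, so probing always advances

def insert(key):
    for i in range(m):
        idx = (h(key) + i * h2(key)) % m  # h'(k, i)
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("hash table is full")

print(insert(50))   # 1 -- h(50) = 1 is free
print(insert(15))   # 2 -- h(15) = 1 is taken; h2(15) = 1 gives the next slot, 2
```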

Conclusion: Techniques like chaining, open addressing, and double hashing effectively handle collisions,
ensuring efficient storage and retrieval in hash tables. Double hashing minimizes clustering and
improves performance.

Hashing Functions and Record Distribution

A hashing function is a mathematical algorithm that transforms a key into a hash value, which
determines the location of the corresponding record in a hash table. Effective record distribution
depends on designing an efficient hash function to minimize collisions and ensure uniform distribution.

1. Characteristics of a Good Hash Function

Deterministic: Always produces the same hash value for a given key.
Uniform Distribution: Spreads keys evenly across the hash table to minimize clustering.
Fast Computation: Efficient to compute, even for large datasets.
Minimizes Collisions: Reduces the likelihood of different keys mapping to the same hash value.

2. Common Hash Functions

1. Division Method:
Hash value = h(k) = k mod m, where m is the table size.
Example: For k = 50, h(k) = 50 mod 7 = 1.
2. Multiplication Method:
Hash value = h(k) = ⌊m ⋅ (k ⋅ A mod 1)⌋, where A is a constant (0 < A < 1).
3. Folding Method:
Divide the key into parts, add them, and then apply mod m.
4. Universal Hashing:
Uses randomized hash functions to reduce clustering in worst-case scenarios.
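The three deterministic methods above can be sketched as follows. The table size m, the constant A, and the folding chunk width are assumptions chosen for illustration (A = (√5 − 1)/2 is the constant commonly suggested for the multiplication method).

```python
# Sketches of the division, multiplication, and folding hash methods.

import math

m = 7
A = (math.sqrt(5) - 1) / 2                # a constant with 0 < A < 1

def h_division(k):
    return k % m                          # h(k) = k mod m

def h_multiplication(k):
    return math.floor(m * ((k * A) % 1))  # h(k) = floor(m * (k*A mod 1))

def h_folding(k, width=2):
    digits = str(k)
    parts = [int(digits[i:i + width]) for i in range(0, len(digits), width)]
    return sum(parts) % m                 # split into chunks, add, then mod m

print(h_division(50), h_multiplication(50), h_folding(123456))   # 1 6 4
```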

3. Record Distribution

A well-designed hash function ensures uniform record distribution across the hash table,
preventing overloading in specific regions.
Poor hash functions cause clustering, leading to inefficiencies in search, insertion, and deletion.

Conclusion: Efficient hashing functions are crucial for distributing records uniformly, minimizing
collisions, and maintaining optimal hash table performance.

Packing Density

Packing density refers to the measure of how efficiently space in a hash table is utilized. It indicates the
ratio of the number of occupied slots (records stored) to the total number of available slots in the table.

1. Definition

Packing Density (α):

α = (Number of Records Stored) / (Total Slots in the Table)

The value of α ranges between 0 and 1.

α = 0: The table is empty.
α = 1: The table is completely full.

2. Importance

Performance Indicator:
Low packing density (α close to 0): Indicates underutilization, wasting memory.
High packing density (α close to 1): Increases collisions, slowing operations.
Optimal Range:
For open addressing, α should typically stay below 0.7 to balance memory usage and collision
resolution efficiency.

3. Example

Hash table with 100 slots and 60 records:

α = 60 / 100 = 0.6, indicating 60% utilization.

4. Implications

A balanced packing density ensures efficient operations, minimizing collisions and optimizing
memory utilization.

Conclusion: Packing density measures hash table efficiency, directly affecting performance and collision
handling. Balancing it is key to effective hashing.

Double Hashing

Double hashing is an efficient collision resolution technique in open addressing, which uses a second
hash function to determine the probing sequence. This minimizes clustering and improves hash table
performance.

Key Characteristics

1. Two Hash Functions:


A primary hash function (h(k)) determines the initial index.
A secondary hash function (h₂(k)) determines the step size for subsequent probes.

2. Formula:
h′(k, i) = (h(k) + i ⋅ h₂(k)) mod m, where:

i: Probe number (starting at 0).
m: Size of the hash table.
h₂(k): Must not be zero to avoid infinite loops.

Advantages

1. Reduces Clustering: The use of h₂(k) spreads probes more evenly across the table.

2. Efficient Space Utilization: Avoids the need for linked lists (as in chaining).
3. Performance: Offers better search and insertion efficiency compared to linear or quadratic
probing.

Example

Primary Hash: h(k) = k mod 7.
Secondary Hash: h₂(k) = 1 + (k mod 5).

For k = 50, m = 7:
h(k) = 50 mod 7 = 1.
h₂(k) = 1 + (50 mod 5) = 1 + 0 = 1.

Probe sequence: (1 + i ⋅ 1) mod 7 = 1, 2, 3, ...

Conclusion: Double hashing enhances hash table performance by minimizing collisions and distributing
keys more uniformly, making it a preferred choice in open addressing.

Deletions in Hashed Files

Deleting records in hashed files requires careful handling to avoid disrupting the hash table's structure
and search functionality. The process varies depending on the collision resolution technique used.

1. Challenges in Deletion

Breaking Search Chains: Removing a record might disconnect probes or chains, causing search
failures.
Maintaining Table Integrity: Ensuring remaining records are still accessible after deletion.

2. Deletion in Different Collision Resolution Techniques

1. Chaining:
Simply remove the record from the linked list at the hashed index.
Adjust pointers to maintain the chain.
Example: Delete node k₂ from a list k₁ → k₂ → k₃.

2. Open Addressing:
Use a special "Deleted" Marker: Replace the record with a dummy marker instead of leaving
the slot empty.
This ensures probes continue past the marker for existing records.
Example: A slot marked as "Deleted" ensures a search for k₃ is not prematurely terminated.
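A minimal sketch of tombstone-based deletion under linear probing, as described above; the sentinel object and table size m = 7 are illustrative assumptions.

```python
# Open-addressing deletion with a "Deleted" marker (tombstone).

m = 7
DELETED = object()                        # tombstone sentinel
table = [None] * m

def insert(key):
    for i in range(m):
        idx = (key % m + i) % m
        if table[idx] is None or table[idx] is DELETED:
            table[idx] = key
            return

def search(key):
    for i in range(m):
        idx = (key % m + i) % m
        if table[idx] is None:            # a truly empty slot ends the probe
            return False
        if table[idx] == key:             # a tombstone does NOT end the probe
            return True
    return False

def delete(key):
    for i in range(m):
        idx = (key % m + i) % m
        if table[idx] is None:
            return
        if table[idx] == key:
            table[idx] = DELETED          # mark instead of emptying the slot
            return

insert(10); insert(17); delete(10)
print(search(17))   # True -- probing continues past the tombstone at index 3
```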

3. Rehashing After Deletion

If the hash table becomes sparse, rehashing may be necessary to improve performance.
Recompute hash values for all keys and rebuild the table.

Conclusion: Deletion in hashed files requires preserving the hash table's functionality and structure,
often using markers or rehashing to maintain efficiency.

Chained Progressive Overflow

Chained Progressive Overflow is a collision resolution strategy used in hash files. It combines the
features of chaining and progressive overflow to handle hash collisions effectively.

1. Key Characteristics

1. Chaining Concept:
Each hash table slot (bucket) has a pointer to a linked list where collided records are stored.
2. Progressive Overflow:
When the linked list exceeds a threshold, additional overflow areas are progressively allocated
to store extra records.
Overflow areas may be organized sequentially or in another storage structure.

2. Working Mechanism

Collision Handling:
When a collision occurs, the record is added to the linked list at the appropriate hash index.
If the linked list grows too long, overflow blocks are allocated to handle the excess.
Search:
The hash index is used to locate the bucket.
The linked list and overflow areas are searched sequentially if needed.

3. Advantages

Efficient storage utilization through progressive allocation of overflow blocks.


Avoids performance degradation caused by excessively long chains.

4. Example

Hash table size: 5; Buckets 0–4.

For keys 12, 17, and 22 hashed to index 2, overflow blocks are allocated when the chain exceeds a
specified limit.
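A simplified sketch of this mechanism, assuming a chain limit of 2 per bucket and a single shared overflow list; a real file implementation would allocate overflow blocks on disk, and all names here are illustrative.

```python
# Chained progressive overflow: chain per bucket, plus an overflow area used
# only when a chain exceeds its limit.

M = 5
CHAIN_LIMIT = 2
buckets = [[] for _ in range(M)]          # primary chains, one per bucket
overflow = []                             # progressively used overflow area

def insert(key):
    chain = buckets[key % M]
    if len(chain) < CHAIN_LIMIT:
        chain.append(key)                 # normal chaining
    else:
        overflow.append(key)              # chain too long: spill into overflow

def search(key):
    if key in buckets[key % M]:
        return True
    return key in overflow                # fall back to the overflow area

for k in (12, 17, 22):                    # all three hash to index 2
    insert(k)
print(buckets[2], overflow)               # [12, 17] [22]
print(search(22))                         # True
```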

Conclusion: Chained progressive overflow enhances traditional chaining by dynamically allocating overflow storage, ensuring better handling of high collision rates.

Implementation of Hashing

Hashing is implemented to store and retrieve data efficiently using a hash table. The key aspects of
implementing hashing include designing a hash table, choosing a hashing function, and handling
collisions.

1. Components of Hashing

1. Hash Table:
A data structure with slots or buckets to store records.
Can be fixed-size or dynamically resizable.
2. Hash Function:
Converts a key into a hash value that determines the record’s index in the hash table.
Common methods: Division, Multiplication, and Folding.

2. Collision Resolution Techniques

1. Chaining:
Each slot contains a linked list to store multiple records with the same hash value.
2. Open Addressing:
Probes for the next available slot using methods like linear, quadratic, or double hashing.
3. Rehashing:
Increases table size and redistributes existing records when the table becomes too full.

3. Steps in Implementation

1. Initialize the hash table.


2. Choose a suitable hash function based on the data.
3. Implement collision resolution.
4. Optimize for search, insertion, and deletion.

Conclusion: The implementation of hashing involves a structured approach to ensure efficient data
storage and retrieval, balancing performance and memory utilization.

Procedure for Finding Buddy-Buckets

Buddy-buckets are used in buddy memory allocation systems to efficiently manage memory. In this
context, memory is divided into blocks of varying sizes, and buddy-buckets are pairs of blocks that are

powers of two in size and have contiguous memory addresses. The procedure to find buddy-buckets
involves finding pairs of blocks that can be merged or allocated as needed.

1. Understanding Buddy Allocation

Memory is split into blocks of sizes that are powers of two.


A block of memory is paired with its buddy (a block of the same size) in a contiguous region.
When a block is freed, its buddy is checked to see if it can be merged back into a larger block.

2. Steps to Find Buddy-Buckets

1. Identify the Memory Block:


Given a memory address and block size, find the corresponding block.
2. Calculate Buddy Address:
The buddy of a block can be found by using the formula:
Buddy Address = Start Address of Block ⊕ Block Size.
Here ⊕ denotes the XOR operation, which flips the bit of the block's start address corresponding to the block size.
3. Check Buddy Availability:
Verify if the buddy is free or occupied. If free, both blocks can be merged into a larger block.
4. Update Allocation Status:
If the blocks are merged, update the memory structure to reflect the new, larger block.

3. Example

For a block of size 64 at address 1024, the buddy's address is 1024 ⊕ 64 = 1088.
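The buddy-address formula can be sketched directly, assuming block addresses are aligned to their (power-of-two) block size; the function name is illustrative.

```python
# Buddy Address = Start Address of Block XOR Block Size.

def buddy_address(block_addr: int, block_size: int) -> int:
    return block_addr ^ block_size

print(buddy_address(1024, 64))   # 1088 -- the buddy of the block at 1024
print(buddy_address(1088, 64))   # 1024 -- the relation is symmetric
```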

Conclusion: The buddy-bucket system uses a systematic approach to efficiently pair blocks of memory,
ensuring optimal memory allocation and deallocation in systems that manage memory in chunks of
powers of two.

Linear Hashing Method

Linear hashing is a dynamic hashing technique used to handle growing hash tables, providing an
efficient way to handle overflow while maintaining good performance for insertions and lookups.

1. Key Features

Dynamic Hashing: Unlike static hashing, linear hashing allows the table size to grow dynamically
as records are inserted.
Overflow Handling: It uses a split mechanism to handle overflow instead of resizing the entire
hash table.

2. How Linear Hashing Works

1. Initial Table:
Start with an initial hash table and a hash function h(k) that maps keys to hash values.
2. Overflow Handling:
When the table becomes too full (based on a threshold), the table is expanded by adding a
new bucket.
Splitting Mechanism: Only one bucket is split at a time, and the new bucket is placed in the
table sequentially.
3. Modulo Operation:
New keys are hashed using a function hᵢ(k), where i is the current level of the split.
The modulo operation is used to distribute records across the new buckets.

3. Example

Suppose a table has 4 buckets. When a split occurs, the hash table grows to 5 buckets, and records
from one bucket are rehashed and moved into the new bucket.
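A simplified sketch of the splitting mechanism, assuming 4 initial buckets, a capacity of 2, a split triggered by any overflowing insert, and the hash family hᵢ(k) = k mod (N ⋅ 2^i); the variable names and the split trigger are assumptions for illustration.

```python
# Linear hashing: one bucket is split per overflow, in sequential order.

N = 4                                     # initial number of buckets
CAPACITY = 2
buckets = [[] for _ in range(N)]
level, next_split = 0, 0                  # split round and next bucket to split

def address(key):
    idx = key % (N * 2 ** level)          # h_level(k)
    if idx < next_split:                  # already-split buckets use h_{level+1}
        idx = key % (N * 2 ** (level + 1))
    return idx

def insert(key):
    global level, next_split
    buckets[address(key)].append(key)
    if len(buckets[address(key)]) > CAPACITY:     # an overflow triggers one split
        buckets.append([])                        # the table grows by one bucket
        old_keys, buckets[next_split] = buckets[next_split], []
        next_split += 1
        if next_split == N * 2 ** level:          # round finished: advance level
            level, next_split = level + 1, 0
        for k in old_keys:                        # rehash only the split bucket
            buckets[address(k)].append(k)

for k in (0, 4, 8, 12, 16):
    insert(k)
print(len(buckets), buckets)   # 6 [[0, 8, 16], [], [], [], [4, 12], []]
```

Note that the bucket that overflows is not necessarily the one that is split; overflow chains are tolerated until the split pointer reaches them, which is what keeps growth strictly sequential.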

4. Advantages

Efficiency: Reduces the need for rehashing the entire table.


Simple Implementation: Easy to implement with lower memory overhead.

Conclusion: Linear hashing efficiently manages dynamic hash tables by progressively splitting buckets,
ensuring minimal overflow and optimal performance.

Implementation of Double Size and Collapse Methods in the Directory Class

The Directory Class manages the hash table's structure, which is dynamically resized as records are
inserted or deleted. The Double Size and Collapse methods are used to handle the growth and
shrinkage of the hash table efficiently.

1. Double Size Method

The Double Size method is used when the hash table becomes too full, making it necessary to expand the directory.

Purpose: To increase the size of the directory by a factor of two.


Steps:
1. Create a New Directory: Allocate a new directory with double the number of entries.
2. Rehash Existing Entries: Iterate through the old directory and rehash each record into the
new directory.
3. Update Directory Pointer: Redirect the directory pointer to the newly created, larger
directory.
Example: If the directory size is 8, doubling the size will make it 16, and existing records will be
rehashed to fit the new structure.
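A minimal sketch of a double-size operation, assuming the directory is a Python list of bucket references; in extendible hashing, doubling typically copies each existing pointer into two adjacent slots of the new directory, so the records themselves need not move at this step. The class and method names are illustrative.

```python
# Directory doubling: each old entry i maps to new entries 2i and 2i + 1.

class Directory:
    def __init__(self, depth=1):
        self.depth = depth
        self.cells = [None] * (2 ** depth)    # bucket references

    def double_size(self):
        self.cells = [cell for cell in self.cells for _ in range(2)]
        self.depth += 1

d = Directory(depth=3)          # 8 entries
d.double_size()
print(d.depth, len(d.cells))    # 4 16
```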

2. Collapse Method

The Collapse method is used when there is too much unused space in the directory, allowing it to
shrink.

Purpose: To reduce the directory size when the number of active records decreases significantly.
Steps:
1. Check Occupied Slots: Count the number of active slots in the directory.
2. Resize the Directory: If the directory is only half full, reduce its size by halving the number of
slots.
3. Rehash Records: Rehash the records into the smaller directory.
Example: A directory of size 16 will be reduced to 8 if there are fewer than half of the slots
occupied.
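A minimal sketch of a matching collapse operation. The precondition used here is the common extendible-hashing check that every pair of adjacent cells points to the same bucket (a stricter variant of the occupancy check described above); the class layout and names are illustrative assumptions.

```python
# Directory collapse: halve the directory when every adjacent pair of cells
# references the same bucket.

class Directory:
    def __init__(self, depth, cells):
        self.depth = depth
        self.cells = cells

    def collapsible(self):
        return self.depth > 0 and all(
            self.cells[i] is self.cells[i + 1] for i in range(0, len(self.cells), 2)
        )

    def collapse(self):
        if self.collapsible():
            self.cells = self.cells[::2]      # keep one cell from each pair
            self.depth -= 1

b = object()                                  # a single shared bucket reference
d = Directory(depth=2, cells=[b, b, b, b])
d.collapse()
print(d.depth, len(d.cells))                  # 1 2
```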

Conclusion

The Double Size and Collapse methods enable the directory class to dynamically adjust its size based
on usage, ensuring efficient memory utilization and maintaining performance.
