Hashing 2
Data Structure
Hashing
• Traditional Searching Methods
• Binary Search Tree: Decides the search direction based on key comparison at each
node.
• Hashing as a Different Approach:
• Direct Access: Hashing calculates the position of an element directly based on the
key’s value, reducing search time to O(1) ideally, without needing comparisons.
• A hash function (h) is used to transform a key K (e.g., string, number, or record)
into an index in the table.
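1. Division
• The key is divided by the table size and the remainder is used as the hash
address: h(K) = K mod TSize.
• TSize = 7
• h(100) = 100 mod 7 = 2
• h(200) = 200 mod 7 = 4
• h(300) = 300 mod 7 = 6
• h(400) = 400 mod 7 = 1
• h(500) = 500 mod 7 = 3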
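The modulo calculations above can be reproduced with a few lines of Python. This is a minimal sketch; the function name division_hash is chosen here for illustration only.

```python
def division_hash(key: int, table_size: int) -> int:
    """Hash an integer key by taking the remainder modulo the table size."""
    return key % table_size


TSIZE = 7
for key in (100, 200, 300, 400, 500):
    # Prints the same addresses as the worked example: 2, 4, 6, 1, 3
    print(f"h({key}) = {key} mod {TSIZE} = {division_hash(key, TSIZE)}")
```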
2. Folding
• In the Folding method for hashing, the key is divided into multiple
parts, which are then "folded" together through a series of
transformations to form the hash index.
• This method is particularly useful for keys that are long numbers or
strings, like social security numbers (SSNs), by breaking them down
into manageable parts and then combining these parts to get a final
hash address.
• There are two types of folding:
• shift folding and
• boundary folding.
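Shift Folding
• In this method, the key parts are placed one under another and added.
• Example: The SSN 123-45-6789 is divided into the parts "123", "456", and "789".
• The resulting sum is 123 + 456 + 789 = 1368, which can be processed modulo
TSize to get the final hash.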
Boundary Folding
• In this method, the key parts are alternately reversed to prevent
patterns that may emerge if the data has certain ordering structures.
• Example: Using the same SSN parts, "123", "456", and "789":
• Start with 123.
• Reverse the second part, making it 654.
• Add the third part normally, 789.
• The resulting sum is 123 + 654 + 789 = 1566, which can be processed modulo
TSize to get the final hash.
• A bit-oriented version of shift folding is obtained by applying the
exclusive-or operation, ^.
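Both folding variants can be sketched in Python as follows. The helper names, the three-digit part size, and the table size of 1,000 are assumptions made for this illustration; the key is the SSN 123-45-6789 with the dashes removed.

```python
def split_parts(key: str, size: int):
    """Cut the digit string into consecutive parts of `size` characters."""
    return [key[i:i + size] for i in range(0, len(key), size)]


def shift_fold(key: str, size: int, table_size: int) -> int:
    """Shift folding: add the parts as they are, then reduce modulo the table size."""
    total = sum(int(part) for part in split_parts(key, size))
    return total % table_size


def boundary_fold(key: str, size: int, table_size: int) -> int:
    """Boundary folding: reverse every second part before adding."""
    total = 0
    for index, part in enumerate(split_parts(key, size)):
        if index % 2 == 1:          # second, fourth, ... part is reversed
            part = part[::-1]
        total += int(part)
    return total % table_size


ssn = "123456789"
print(shift_fold(ssn, 3, 1000))     # (123 + 456 + 789) mod 1000 = 368
print(boundary_fold(ssn, 3, 1000))  # (123 + 654 + 789) mod 1000 = 566
```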
Bit-Oriented Folding
• For bit patterns instead of numeric values, bitwise operations like exclusive OR
(XOR) are used to combine parts. This is common with strings or binary data.
• String XOR Example: For a string like "abcd", the character codes are XOR’d together
to produce the hash:
• h("abcd") = "a" ^ "b" ^ "c" ^ "d".
• For larger strings, chunks whose length equals the number of bytes in an integer are
XOR’d together (e.g., "ab" XOR "cd" for 2-byte integers).
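A possible Python sketch of both XOR variants is shown here. It assumes ASCII keys and, for the chunked version, 2-byte chunks read in big-endian order; these choices and the function names are illustrative, not part of the method's definition. In practice the result would still be reduced modulo TSize.

```python
def xor_fold_chars(key: str) -> int:
    """XOR the character codes of the key one byte at a time."""
    h = 0
    for byte in key.encode("ascii"):
        h ^= byte
    return h


def xor_fold_chunks(key: str, chunk_size: int = 2) -> int:
    """XOR fixed-size chunks of the key, each read as one integer."""
    data = key.encode("ascii")
    h = 0
    for i in range(0, len(data), chunk_size):
        h ^= int.from_bytes(data[i:i + chunk_size], "big")
    return h


print(xor_fold_chars("abcd"))   # 97 ^ 98 ^ 99 ^ 100 = 4
print(xor_fold_chunks("abcd"))  # 0x6162 ^ 0x6364 = 0x0206 = 518
```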
• Advantages and Use Cases
• Flexibility: Folding allows flexibility in handling various key formats (e.g., numbers, strings).
• Collision Handling: Folding, especially boundary folding, helps reduce collision chances by
mixing key parts effectively.
• Efficiency: This method is fast, especially for fixed-size data, and can be enhanced with
bitwise operations for better performance.
3. Mid-Square Function
• The Mid-Square Hashing Method is a hashing technique where the
key is squared, and the middle portion of the resulting number is
used as the hash address.
• This method is effective because it involves all parts of the key in
generating the address, increasing the chances that different keys will
generate different hash values.
• Let’s go through the key points of this method and an example for
clarity.
Key Points of the Mid-Square Method
• Squaring the Key: The key (a number) is squared, resulting in a large
number.
• Extracting the Middle Part: The middle portion of this squared value is
taken as the hash address. This reduces the effect of any patterns in the
key's original format, making it more suitable for unique address
generation.
• Preprocessing for Strings: If the key is a string, it needs to be converted to
a numeric value (often using methods like folding) before applying the
mid-square approach.
• Table Size and Power of 2: It’s often more efficient to choose a power of
2 for the table size. This makes it easier to extract the middle part of the
binary representation of the squared key using bitwise operations.
Example of the Mid-Square Method
Suppose the key is 3,121, and we have a table with 1,000 cells.
1. Square the Key:
3,121^2 = 9,740,641
2. Extract the Middle Portion:
For a 1,000-cell table, we need three digits in the hash address. We take the middle three digits of
9,740,641, which are 406.
Therefore, h(3,121)=406.
This middle section gives us the hash address, and it falls within the range of our table.
3. Binary Extraction (for Power of 2 Table Size):
Now, let’s assume the table size is 1,024 (a power of 2). In binary, 9,740,641 is represented as:
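100101001010000101100001
A 1,024-cell table needs a 10-bit address. The middle 10 bits of this 24-bit string are
0101000010, which equals 322. Therefore, h(3,121) = 322.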
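The decimal and binary extraction steps can be sketched in Python as follows. The function names are hypothetical, and the fixed widths (7 decimal digits, 24 bits) are assumptions that fit this particular key; a real implementation would fix the width from the largest possible squared key.

```python
def mid_square_decimal(key: int, digits: int, square_digits: int = 7) -> int:
    """Take the middle `digits` decimal digits of key squared."""
    squared = str(key * key).zfill(square_digits)  # pad to a fixed width
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


def mid_square_binary(key: int, address_bits: int, square_bits: int = 24) -> int:
    """Take the middle `address_bits` bits of key squared using shift and mask."""
    squared = key * key
    shift = (square_bits - address_bits) // 2      # low-order bits to drop
    mask = (1 << address_bits) - 1                 # keeps `address_bits` bits
    return (squared >> shift) & mask


print(mid_square_decimal(3121, 3))   # 9740641 -> 406
print(mid_square_binary(3121, 10))   # middle 10 bits of 24 -> 322
```

The binary version needs only a shift and a mask, which is why a power-of-2 table size is convenient here.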
● Flexibility: It can be adapted to both numerical and string-based keys (once strings
are preprocessed).
Use Cases
The mid-square method is suitable in scenarios where:
● There’s a high probability of patterns in the keys, such as sequential or similar keys,
where mid-square can help spread out the addresses.
● You need a straightforward and fast hashing function that can be implemented using
bitwise operations for better efficiency.
4. Extraction Method
• The Extraction Hashing Method is a technique where only a selected
portion of the key is used to compute the hash address.
• This method is useful when the key contains redundant or
consistent portions that can be safely ignored without impacting the
uniqueness of the hash.
• By focusing on only a part of the key, this method simplifies the
hashing process, provided the chosen part still distinguishes the keys
from one another.
Key Points of the Extraction Method
Partial Key Use: Instead of using the entire key, only a part of it is used
to calculate the hash address.
Choosing the Right Portion: The chosen part of the key should be
enough to ensure unique addresses. In some cases, certain parts of the
key can be omitted because they are either constant or redundant for
the context of the dataset.
Handling Redundant Information: In certain contexts, such as
standardized IDs or codes, portions of the key that are the same for all
entries can be excluded from the hash function.
Example of the Extraction Method
Consider a Social Security Number (SSN): 123-45-6789
Here are several ways the extraction method can be applied:
● First Four Digits: You could use the first four digits, 1234, to generate
the hash address.
● Last Four Digits: You could alternatively use the last four digits, 6789,
for the hash address.
● Combined Portions: Another approach might be to combine the first
two digits with the last two digits, yielding 1289.
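The three extraction choices above can be sketched in Python like this. The function names are illustrative; each function returns the extracted number, which would normally still be reduced modulo the table size.

```python
def digits_only(ssn: str) -> str:
    """Drop the dashes and keep only the digit characters."""
    return "".join(ch for ch in ssn if ch.isdigit())


def first_four(ssn: str) -> int:
    return int(digits_only(ssn)[:4])


def last_four(ssn: str) -> int:
    return int(digits_only(ssn)[-4:])


def first_two_last_two(ssn: str) -> int:
    d = digits_only(ssn)
    return int(d[:2] + d[-2:])


ssn = "123-45-6789"
print(first_four(ssn))          # 1234
print(last_four(ssn))           # 6789
print(first_two_last_two(ssn))  # 1289
```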
Advantages of the Extraction Method
Efficiency: By using only a relevant portion of the key, the extraction
method can speed up hashing and reduce unnecessary calculations.
Simplicity: The method is easy to implement and doesn’t require
complex operations.
Contextual Relevance: It works well when certain parts of the key are
predictable or redundant, making it ideal for structured keys such as IDs
or codes.
Space Optimization: By omitting unnecessary portions, the hash
function can potentially reduce memory usage and speed up
computation.
Use Cases
The Extraction Method is particularly useful in situations where the key contains
repetitive or predictable portions. Here are some common use cases:
● Structured ID Systems: When dealing with structured IDs, such as university
student numbers, where some digits are common to all members of a group.
● Product Codes or ISBNs: In databases that store product codes or ISBNs from
the same publisher, where parts of the code are identical for all products.
● Employee IDs or National Identification Numbers: In organizations where
employee IDs or national identification numbers share common prefixes
based on regions, departments, or other groupings.
5. Radix Transformation
The radix transformation is a method where we transform the original key 𝐾 into a
different numerical base (or radix) before hashing.
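As a minimal sketch of the idea, the Python below rewrites a decimal key in another base and then hashes the resulting digit string. The key 345, base 9, and table size 100 are example values chosen here, and the final modulo step is an assumption about how the transformed key is reduced to a table index. The sketch only handles bases up to 10, so that each digit is a single decimal digit.

```python
def to_radix(key: int, radix: int) -> int:
    """Return the base-`radix` digits of key, read back as a decimal number."""
    if not 2 <= radix <= 10:
        raise ValueError("this sketch supports radix 2..10 only")
    digits = []
    while key > 0:
        key, remainder = divmod(key, radix)
        digits.append(str(remainder))
    return int("".join(reversed(digits)) or "0")


def radix_hash(key: int, radix: int, table_size: int) -> int:
    """Transform the key to another base, then divide modulo the table size."""
    return to_radix(key, radix) % table_size


print(to_radix(345, 9))         # 345 = 4*81 + 2*9 + 3, so the digits are 423
print(radix_hash(345, 9, 100))  # 423 mod 100 = 23
```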
Quadratic Probing
• In quadratic probing, the position examined on probe i is
• h(K) + i^2 (taken modulo TSize),
where i is the probe number (starting from 1). However, the sequence may be
adjusted to ensure that the probes do not cover only half of the table.
• For example, the sequence is symmetrical with probes like
h(K)+1, h(K)+4, h(K)+9, …
• One issue with quadratic probing is secondary clustering, where keys hashed to the
same position follow the same probe sequence, potentially leading to clusters of
occupied positions. These clusters are less problematic than primary clusters (from
linear probing) but still exist.
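The probe sequence h(K), h(K)+1, h(K)+4, h(K)+9, … can be sketched in Python as follows. The table is a plain list with None marking an empty slot, and h(K) = K mod TSize is assumed as the home hash; both are illustrative choices.

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key at the first free slot along h(K) + i^2 (mod TSize)."""
    size = len(table)
    home = key % size                 # assumed home hash h(K)
    for i in range(size):             # i = 0 is the home position itself
        pos = (home + i * i) % size
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 7
# 100, 107, and 114 all hash to position 2, so they share one probe sequence
print([quadratic_probe_insert(table, k) for k in (100, 107, 114)])  # [2, 3, 6]
```

All three keys follow the same path 2, 3, 6, which is the secondary clustering described above.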
Double Hashing
• If a collision occurs after applying a hash function h(k), a second
hash function is evaluated to determine the next slot to probe.
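One common way to realize this is to let the second hash function supply the step between probes. The Python below is a sketch under that assumption; the particular second function, 1 + (k mod (TSize - 1)), is a choice made for this example, picked so that the step is never zero.

```python
def double_hash_insert(table: list, key: int) -> int:
    """Insert key by probing h(k), h(k) + h2(k), h(k) + 2*h2(k), ... (mod TSize)."""
    size = len(table)
    home = key % size                 # first hash function h(k)
    step = 1 + key % (size - 1)       # assumed second hash function, never 0
    for i in range(size):
        pos = (home + i * step) % size
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 7                    # a prime size lets every step reach all slots
# 100, 107, and 114 all start at position 2 but use different steps
print([double_hash_insert(table, k) for k in (100, 107, 114)])  # [2, 1, 3]
```

Because keys with the same home position usually get different steps, their probe sequences diverge.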