CSC 302 - Hashing Techniques
CSC 302 - Hashing Techniques
• Introduction to Hashing
• Hash Tables
• Types of Hashing
• Hash Functions
class StudentRecord {
String name; // Student name
double height; // Student height
long id; // Unique id
}
• The id field in this class can be used as a search key for records in the
container.
3
Introduction to Hashing
• Suppose that we want to store 10,000 students records (each with a 5-digit ID) in
a given container.
Using an array of size 100,000 would give O(1) access time but will lead
to a lot of space wastage.
• Is there some way that we could get O(1) access without wasting a lot of space?
4
Example 1: Illustrating Hashing
0 1 2 3 4 56 7 89 10 11 12
6
Hash Tables
• There are two types of Hash Tables: Open-addressed Hash Tables and Separate-
Chained Hash Tables.
Initialization.
Insertion.
Searching
Deletion.
7
Types of Hashing
• There are two types of hashing :
1.PStatic hashing: In static hashing, the hash function maps search-key
values to a fixed set of locations.
• The load factor of a hash table is the ratio of the number of keys in the table to
the size of the hash table.
• Note: The higher the load factor, the slower the retrieval.
• With open addressing, the load factor cannot exceed 1. With chaining
the load factor often exceeds 1.
8
Hash Functions
• A hash function, h, is a function which transforms a key from a set,P,
K into
an index in a table of size n:
h: K -> {0, 1, , n-2, n-1}
• This situation is called collision and the colliding keys are called synonyms.
9
’
Hash Functions (cont Pd)
Minimize collisions.
10
Common Hashing Functions
1.PDivision Remainder (using the table size as the divisor)
• Table size that is a power of 2 like 32 and 1024 should be avoided, for it leads to
more collisions.
• Also, powers of 10 are not good for table sizes when the keys rely on decimal
integers.
• rime numbers not close to powers of 2 are better table size values.
11
Common Hashing Functions (cont ’ d)
2.Truncation or Digit/Character Extraction
• More evenly distributed digit positions are extracted and used for hashing
purposes.
• For instance, students IDs or ISBN codes may contain common subsequences
which may increase the likelihood of collision.
• Very fast but digits/characters distribution in keys may not be very even.
12
Common Hashing Functions (cont ’ d)
3. Folding
• It involves splitting keys into two or more parts and then combining the parts
to form the hash addresses.
13
Common Hashing Functions (cont ’ d)
4.Radix Conversion
• Transforms a key into another number base to obtain the hash value.
• Typically use number base other than base 10 and base 2 to calculate the hash
addresses.
• To map the key 55354 in the range 0 to 9999 using base 11 we have:
5535410 = 3865211
• We may truncate the high-order 3 to yield 8652 as our hash address within 0 to
9999.
14
Common Hashing Functions (cont ’ d)
5.Mid-Square
• The key is squared and the middle part of the result taken as the hash value.
• To map the key 3121 into a hash table of size 1000, we square it 31212 =
9740641 and extract 406 as the hash value.
• Works well if the keys do not contain a lot of leading or trailing zeros.
15
Common Hashing Functions (cont ’ d)
6. Use of a Random-Number Generator
16
Some Applications of Hash Tables
• Database systems: Specifically, those that require efficient random access. Generally,
database systems try to optimize between two types of access methods: sequential and
random. Hash tables are an important part of efficient random access because they provide
a way to locate data in a constant amount of time.
• Symbol tables: The tables used by compilers to maintain information about symbols from
a program. Compilers access information about symbols frequently. Therefore, it is
important that symbol tables be implemented very efficiently.
• Data dictionaries: Data structures that support adding, deleting, and searching for data.
Although the operations of a hash table and a data dictionary are similar, other data
structures may be used to implement data dictionaries. Using a hash table is particularly
efficient.