Hash Functions and Hash Tables
Hash Functions 1
hash function: a function that can take a key value and compute an integer value (or an
index in a table) from it
For example, student records for a class could be stored in an array C of dimension 10000
by truncating the student's ID number to its last four digits:
H(IDNum) = IDNum % 10000
Given an ID number X, the corresponding record would be inserted at C[H(X)].
This would be easy to implement and cheap to execute. Whether it's actually a very
good hash function is another matter.
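As a minimal Java sketch of this scheme, inserting X at C[H(X)] could look like the following. The `StudentRecord` type, the sample ID, and the complete absence of collision handling are assumptions made only for illustration.

```java
public class TruncationHash {
    static final int TABLE_SIZE = 10000;

    // Hypothetical record type: only the ID takes part in hashing.
    record StudentRecord(int idNum, String name) {}

    // H(IDNum) = IDNum % 10000, i.e. keep the last four digits of a non-negative ID.
    static int h(int idNum) {
        return idNum % TABLE_SIZE;
    }

    public static void main(String[] args) {
        StudentRecord[] c = new StudentRecord[TABLE_SIZE];
        StudentRecord x = new StudentRecord(904123456, "Ada");  // made-up ID
        c[h(x.idNum())] = x;  // insert at C[H(X)]; a colliding record would overwrite it
        System.out.println(h(x.idNum()));  // prints 3456
    }
}
```

Any two IDs that share their last four digits land in the same slot, which is one reason the scheme's quality is questionable.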
CS@VT
2000-2009 McQuain
[Figures: the actual key values; F() may be uniform on the whole theoretical domain]
It is usually desirable to have the entire key value affect the hash result (so simply
chopping off the last k digits of an integer key is NOT a good idea in most cases).
Consider the following function to hash a string value into an integer range:
public static int sumOfChars(String toHash) {
   int hashValue = 0;
   for (int Pos = 0; Pos < toHash.length(); Pos++) {
      hashValue = hashValue + toHash.charAt(Pos);   // add each character's code
   }
   return hashValue;
}

Hashing: hash
   h: 104
   a:  97
   s: 115
   h: 104
   Sum: 420
   Mod by table size to get the index
This takes every character of the string into account; a string hash function that
truncated the string to its last three characters would compute the same integer for "hash",
"stash", "mash", and "trash".
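To make that comparison concrete, the Java sketch below pits a hypothetical hash that looks only at the last three characters (the helper name `lastThreeChars` is invented for illustration) against the whole-string sum computed by `sumOfChars`.

```java
import java.util.List;

public class TruncationVsSum {
    // Hypothetical weak hash: only the last three characters contribute.
    static int lastThreeChars(String s) {
        String tail = s.length() <= 3 ? s : s.substring(s.length() - 3);
        int value = 0;
        for (int i = 0; i < tail.length(); i++) {
            value += tail.charAt(i);
        }
        return value;
    }

    // Same computation as sumOfChars: every character contributes.
    static int sumOfChars(String s) {
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            value += s.charAt(i);
        }
        return value;
    }

    public static void main(String[] args) {
        for (String key : List.of("hash", "stash", "mash", "trash")) {
            System.out.println(key + ": last three = " + lastThreeChars(key)
                    + ", sum of all = " + sumOfChars(key));
        }
    }
}
```

The truncated version sends all four keys to 316 ("ash" = 97 + 115 + 104), while the full sums are 420, 547, 425 and 546. Summing is still blind to character order, though: the anagram "shah" also sums to 420.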
Division
- the first order of business for a hash function is to compute an integer value
- we expect the hash function to produce a valid index for our chosen table size, but
that integer will probably be out of range
- that is easily remedied by modding the integer by the table size
- there is some reason to believe that it is better if the table size is a prime, or at least
has no small prime factors
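A small Java sketch of the division step, assuming a prime table size of 10007 (an arbitrary choice for illustration). It uses `Math.floorMod` rather than `%`, because Java's `%` returns a negative result for a negative left operand, and string hashes that repeatedly shift and add can overflow into negative `int` values.

```java
public class DivisionMethod {
    // Assumed prime table size (10007 is prime, so it has no small prime factors).
    static final int TABLE_SIZE = 10007;

    // Reduce any int hash value, including a negative one, to an index in [0, TABLE_SIZE).
    static int toIndex(int hashValue) {
        return Math.floorMod(hashValue, TABLE_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(toIndex(452760));  // 452760 = 45 * 10007 + 2445, prints 2445
        System.out.println(toIndex(-123));    // prints 9884, whereas -123 % 10007 == -123
    }
}
```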
Folding
- portions of the key are often recombined, or folded together
- shift folding:
- boundary folding:
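The slide's own illustrations of the two folding styles did not survive. The Java sketch below follows the usual textbook reading, which is an assumption here: the key's digits are cut into fixed-width pieces; shift folding adds the pieces as they are, while boundary folding reverses every second piece (as if the key were folded like paper at the piece boundaries) before adding. The 9-digit key and the 3-digit piece width are made up for the example, and the folded sum would then be reduced by the table size.

```java
import java.util.ArrayList;
import java.util.List;

public class Folding {
    // Cut a digit string into pieces of `width` digits, left to right.
    static List<String> pieces(String digits, int width) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < digits.length(); i += width) {
            parts.add(digits.substring(i, Math.min(digits.length(), i + width)));
        }
        return parts;
    }

    // Shift folding: add the pieces as they are.
    static int shiftFold(String digits, int width) {
        int sum = 0;
        for (String part : pieces(digits, width)) {
            sum += Integer.parseInt(part);
        }
        return sum;
    }

    // Boundary folding: reverse every second piece, then add.
    static int boundaryFold(String digits, int width) {
        int sum = 0;
        List<String> parts = pieces(digits, width);
        for (int i = 0; i < parts.size(); i++) {
            String part = parts.get(i);
            if (i % 2 == 1) {
                part = new StringBuilder(part).reverse().toString();
            }
            sum += Integer.parseInt(part);
        }
        return sum;
    }

    public static void main(String[] args) {
        String key = "123456789";                  // made-up 9-digit key
        System.out.println(shiftFold(key, 3));     // 123 + 456 + 789 = 1368
        System.out.println(boundaryFold(key, 3));  // 123 + 654 + 789 = 1566
    }
}
```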
Mid-square function
- square the key, then use the middle part as the result
- e.g., 3121 → 9740641 → 406 (with a table size of 1000)
- a string would first be transformed into a number, say by folding
- idea is to let all of the key influence the result
- if table size is a power of 2, this can be done efficiently at the bit level:
   3121 → 100101001010000101100001 → 0101000010 (with a table size of 1024)
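A Java sketch of both mid-square variants, written to reproduce the 3121 examples above. How to pick the "middle" when the split is uneven is not specified on the slide; this sketch assumes the extra position is dropped from the high-order end (it discards floor((len - k) / 2) low-order positions). The method names are hypothetical.

```java
public class MidSquare {
    // Decimal variant: keep the middle `digits` digits of key^2 (table size 10^digits).
    static int midSquareDecimal(int key, int digits) {
        long square = (long) key * key;                 // 3121 -> 9740641
        int len = Long.toString(square).length();       // 7 digits
        int drop = Math.max(0, (len - digits) / 2);     // low-order digits to discard
        long value = square;
        for (int i = 0; i < drop; i++) {
            value /= 10;                                // 9740641 -> 97406
        }
        long mod = 1;
        for (int i = 0; i < digits; i++) {
            mod *= 10;
        }
        return (int) (value % mod);                     // 97406 % 1000 -> 406
    }

    // Bit-level variant: keep the middle `bits` bits of key^2 (table size 2^bits).
    static int midSquareBits(int key, int bits) {
        long square = (long) key * key;
        int len = 64 - Long.numberOfLeadingZeros(square);  // 24 bits for 9740641
        int drop = Math.max(0, (len - bits) / 2);          // 7 low-order bits
        return (int) ((square >>> drop) & ((1L << bits) - 1));
    }

    public static void main(String[] args) {
        System.out.println(midSquareDecimal(3121, 3));  // prints 406
        System.out.println(midSquareBits(3121, 10));    // prints 322 (binary 0101000010)
    }
}
```

The bit-level version needs only a shift and a mask, which is why a power-of-2 table size makes mid-square cheap.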
Extraction
- use only part of the key to compute the result
- motivation may be related to the distribution of the actual key values, e.g., VT
student IDs almost all begin with 904, so those leading digits would contribute no useful separation
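A minimal Java sketch of extraction for the ID example, assuming 9-digit IDs whose first three digits are (almost) always 904; the method name, sample ID, and table size are hypothetical. Unlike blind truncation, this discards only the part of the key that carries no information.

```java
public class Extraction {
    // Assumes 9-digit IDs that nearly all begin with 904: keep only the last six digits,
    // since the shared prefix would contribute no useful separation.
    static int extract(int idNum) {
        return idNum % 1_000_000;
    }

    public static void main(String[] args) {
        int id = 904123456;                      // made-up ID
        System.out.println(extract(id));         // prints 123456
        System.out.println(extract(id) % 1009);  // then mod by an (assumed) prime table size
    }
}
```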
Radix transformation
- change the base-of-representation of the numeric key, mod by table size
- not much of a rationale for it
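One common reading of radix transformation, sketched in Java under that assumption: write the key in some other base (9 here, an arbitrary choice), reread the resulting digit string as if it were decimal, and mod by the table size. For example, 345 is 423 in base 9, and 423 % 100 = 23. The method name is hypothetical, and the sketch assumes a non-negative key and a base from 2 to 10, so that every digit is also a decimal digit.

```java
public class RadixTransformation {
    // Assumes key >= 0 and 2 <= radix <= 10 (so every digit is '0'..'9').
    static int radixTransform(int key, int radix, int tableSize) {
        String digits = Integer.toString(key, radix);  // 345 in base 9 -> "423"
        int result = 0;
        for (int i = 0; i < digits.length(); i++) {
            int d = digits.charAt(i) - '0';
            // Reread the digits as a decimal number, reducing as we go to avoid overflow.
            result = (result * 10 + d) % tableSize;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(radixTransform(345, 9, 100));  // prints 23
    }
}
```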
Improving Scattering
A simple hash function is likely to map two or more key values to the same integer
value, in at least some cases.
A little bit of design forethought can often reduce this:
public static int sumOfShiftedChars(String toHash) {
int hashValue = 0;
for (int Pos = 0; Pos < toHash.length(); Pos++) {
hashValue = (hashValue << 4) + toHash.charAt(Pos);
}
return hashValue;
}
Hashing: hash
   h: 104
   a:  97
   s: 115
   h: 104
   Sum: 452760

Hashing: shah
   s: 115
   h: 104
   a:  97
   h: 104
   Sum: 499320
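A quick Java check of these two results: the plain sum cannot tell the anagrams apart, but the shifted sum can. The class name is hypothetical; both methods restate the functions from these notes.

```java
public class ShiftedSumDemo {
    static int sumOfChars(String toHash) {
        int hashValue = 0;
        for (int pos = 0; pos < toHash.length(); pos++) {
            hashValue = hashValue + toHash.charAt(pos);
        }
        return hashValue;
    }

    static int sumOfShiftedChars(String toHash) {
        int hashValue = 0;
        for (int pos = 0; pos < toHash.length(); pos++) {
            hashValue = (hashValue << 4) + toHash.charAt(pos);  // shift, then add
        }
        return hashValue;
    }

    public static void main(String[] args) {
        System.out.println(sumOfChars("hash") + " " + sumOfChars("shah"));                // 420 420
        System.out.println(sumOfShiftedChars("hash") + " " + sumOfShiftedChars("shah"));  // 452760 499320
        // First character followed by 8 others: it has been shifted out entirely.
        System.out.println(sumOfShiftedChars("abcdefghi") == sumOfShiftedChars("zbcdefghi"));  // true
    }
}
```

The last line shows a limitation: each step shifts the running value left by 4 bits, so a character followed by eight or more others has been shifted at least 32 bits in total and no longer affects the 32-bit result. For long keys, only the last eight characters can influence the value.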
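      // shift/mix
      if (hiBits != 0)
         hashValue ^= hiBits >>> 24;   // use >>> so a negative hiBits is not sign-extended
      hashValue &= ~hiBits;            // clear the high nybble, as in the trace
   }
   return hashValue;
}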
This was developed originally during the design of the UNIX operating system, for use
in building system-level hash tables.
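Only the end of the listing survives above. The following Java sketch fills in the rest so that it reproduces the trace of "distribution" in these notes; the method name `elfHash` and the exact layout are assumptions, not the original code. The steps match the widely published ELF hash: shift in 4 bits per character, fold any nonzero high nybble back into bits 4 to 7, then clear the high nybble.

```java
public class ElfHashSketch {
    // Sketch reconstructed to match the trace; not the slide's original listing.
    static int elfHash(String toHash) {
        int hashValue = 0;
        for (int pos = 0; pos < toHash.length(); pos++) {
            hashValue = (hashValue << 4) + toHash.charAt(pos);  // shift/mix
            int hiBits = hashValue & 0xF0000000;                // isolate the high nybble
            if (hiBits != 0) {
                hashValue ^= hiBits >>> 24;  // fold it into bits 4-7 (logical shift)
            }
            hashValue &= ~hiBits;            // clear the high nybble
        }
        return hashValue;
    }

    public static void main(String[] args) {
        int h = elfHash("distribution");
        System.out.printf("%08x %d%n", h, h);  // prints 092c05de 153880030
    }
}
```

The logical shift `>>>` matters in Java: at the u step of that trace the high nybble is b, so `hiBits` is negative, and with `>>` it would be sign-extended and the result would no longer match the trace.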
Details
Here's a trace of hashing "distribution" (character codes and hashValue in hexadecimal):

Character   hashValue
---------   ---------
d: 64       00000064
i: 69       000006a9
s: 73       00006b03
t: 74       0006b0a4
r: 72       006b0ab2
i: 69       06b0ab89
b: 62       0b0ab892
u: 75       00ab8925
t: 74       0ab892c4
i: 69       0b892c09
o: 6f       0892c04f
n: 6e       092c05de

distribution: 153880030 (0x092c05de)
Detail of the step for b (0x62):

hashValue                  : 06b0ab89
hashValue << 4             : 6b0ab890
add 62                     : 6b0ab8f2
hiBits                     : 60000000
hiBits >> 24               : 00000060
hashValue ^ (hiBits >> 24) : 6b0ab892   (6b0ab8f2 ^ 00000060)
hashValue & ~hiBits        : 0b0ab892

In the low byte, f2 ^ 60 = 92 because f ^ 6 = 9:
   f: 1111
   6: 0110
   ^: 1001