Lab 3
The basic idea behind hashing is to take a field in a record, known as the key, and convert it
through some fixed process to a numeric value, known as the hash key, which represents the
position to either store or find an item in the table. The numeric value will be in the range of 0
to n-1, where n is the maximum number of slots (or buckets) in the table.
The fixed process to convert a key to a hash key is known as a hash function. This function will
be used whenever access to the table is needed.
One common method of determining a hash key is the division method of hashing.
Division Method
h(k) = k mod m
The division method is generally a reasonable strategy, unless the keys happen to have some
undesirable property. For example, if m is 10 and all of the keys end in zero, every key hashes
to slot 0.
Good values for m are prime numbers; m should not be a power of 2 or a power of 10.
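The effect of the choice of m can be seen in a small sketch. The function name division_hash and the sample keys are illustrative assumptions, not part of the lab material.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m

# Hypothetical keys that all end in zero.
keys = [10, 20, 30, 40, 50]

# With m = 10 every key lands in slot 0.
print([division_hash(k, 10) for k in keys])  # [0, 0, 0, 0, 0]

# With a prime m = 7 the same keys are spread over different slots.
print([division_hash(k, 7) for k in keys])   # [3, 6, 2, 5, 1]
```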
Multiplication Method
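The multiplication method for creating a hash function operates in two steps:
1. Multiply the key k by a constant A in the range 0 < A < 1, and extract the fractional part of kA.
2. Multiply this fractional part by m and take the floor of the result.
h(k) = ⌊m·(kA mod 1)⌋
Knuth suggests A ≈ (√5 − 1)/2 = 0.6180339887...
For example, if k = 123456 and m = 10000, then
h(k) = ⌊10000·(123456·0.61803... mod 1)⌋
= ⌊10000·(76300.0041151... mod 1)⌋
= ⌊10000·0.0041151...⌋
= ⌊41.151...⌋
= 41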
The advantage of this method is that the choice of m is not critical.
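The worked example can be reproduced with a minimal sketch. It assumes A = (√5 − 1)/2, the constant usually attributed to Knuth, and uses floating-point arithmetic, which is precise enough for keys of this size. The function name multiplication_hash is illustrative.

```python
import math

# Assumed constant: (sqrt(5) - 1) / 2, about 0.6180339887.
A = (math.sqrt(5) - 1) / 2

def multiplication_hash(k: int, m: int, a: float = A) -> int:
    """Multiplication method: floor(m * fractional part of k*a)."""
    frac = (k * a) % 1.0          # step 1: fractional part of k*A
    return math.floor(m * frac)   # step 2: scale by m and take the floor

print(multiplication_hash(123456, 10000))  # 41
```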
Universal Hashing
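Let H be a finite collection of hash functions that map a universe U of keys into the range
{0, 1, ..., m−1}:
H = {h : U → {0, 1, ..., m−1}}
H is universal if, for each pair of distinct keys x ≠ y in U, the number of hash functions h ∈ H
for which h(x) = h(y) is at most |H|/m. If h is chosen at random from H, the probability that x
and y collide is therefore at most (|H|/m) / |H| = 1/m.
With universal hashing the chance of collision between distinct keys k and l is no more than the
1/m chance of collision if locations h(k) and h(l) were randomly and independently chosen from
the set {0, 1, ..., m−1}.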
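One well-known universal family, which these notes do not name, is h(k) = ((a·k + b) mod p) mod m, with p a prime larger than every key and a, b chosen at random. The sketch below assumes non-negative integer keys smaller than p = 2^31 − 1. The names P and make_universal_hash are illustrative.

```python
import random

# Assumption: all keys are non-negative integers smaller than this prime.
P = 2_147_483_647  # 2**31 - 1

def make_universal_hash(m: int, p: int = P):
    """Pick one function at random from the family ((a*k + b) mod p) mod m."""
    a = random.randrange(1, p)   # a is drawn from 1 .. p-1
    b = random.randrange(0, p)   # b is drawn from 0 .. p-1

    def h(k: int) -> int:
        return ((a * k + b) % p) % m

    return h

h = make_universal_hash(m=10)
print(h(123456), h(654321))  # two slots in the range 0..9; values vary per run
```

Because a and b are drawn when the table is created, no fixed set of keys can be bad for every function in the family.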
Perfect Hashing
Perfect hashing is a technique for building a hash table with no collisions. It is only possible to
build one when we know all of the keys in advance.
E.g. if I know the exact keys, then it is trivial to produce a perfect hash function.
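As a naive illustration, assuming a small fixed set of distinct non-negative integer keys, one can search for a table size m for which the division method produces no collisions. The function name find_perfect_modulus and the sample keys are hypothetical, and practical perfect-hashing schemes are more sophisticated than this brute-force search.

```python
def find_perfect_modulus(keys):
    """Return the smallest m >= len(keys) such that k mod m is collision-free."""
    m = len(keys)
    # The loop ends at the latest when m exceeds the largest key,
    # because distinct keys below m keep distinct remainders.
    while len({k % m for k in keys}) < len(keys):
        m += 1
    return m

keys = [10, 21, 35, 47]          # hypothetical keys known in advance
m = find_perfect_modulus(keys)
print(m, sorted(k % m for k in keys))
```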
Chaining
Store all elements that hash to the same slot in a linked list.
Store a pointer to the head of the linked list in the hash table slot.
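A minimal sketch of chaining follows. It assumes integer keys and the division method as the hash function. The class and method names are illustrative.

```python
class Node:
    """One element of a singly linked list."""
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node

class ChainedHashTable:
    def __init__(self, m=7):
        self.m = m
        self.heads = [None] * m   # each slot points to the head of a chain

    def insert(self, key):
        i = key % self.m
        # Push the new node onto the front of the chain for slot i.
        self.heads[i] = Node(key, self.heads[i])

    def search(self, key):
        node = self.heads[key % self.m]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False

    def chain(self, i):
        """Return the keys stored in slot i, head first."""
        keys, node = [], self.heads[i]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys

table = ChainedHashTable(m=7)
for k in (10, 17, 24, 5):
    table.insert(k)
print(table.chain(3))  # [24, 17, 10]: all three keys hash to slot 3
```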
Open Addressing
All elements are stored in the hash table itself. When a collision occurs,
use a systematic (consistent) procedure to store the element in a free slot of the table.
An example of a systematic procedure is linear probing: save the key that caused the collision in
the first empty slot after the slot of the collision.
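The procedure of taking the first empty slot after the collision can be sketched as follows. The sketch assumes integer keys, the division method as the hash function, None as the marker for an empty slot, and no deletions. The function names are illustrative.

```python
def linear_probe_insert(table, key):
    """Store key in the first free slot at or after key mod m, wrapping around."""
    m = len(table)
    start = key % m
    for step in range(m):
        i = (start + step) % m
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("hash table is full")

def linear_probe_search(table, key):
    """Return the slot holding key, or None if key is not in the table."""
    m = len(table)
    start = key % m
    for step in range(m):
        i = (start + step) % m
        if table[i] is None:      # an empty slot ends the probe sequence
            return None
        if table[i] == key:
            return i
    return None

table = [None] * 7
for k in (10, 17, 24, 5):
    linear_probe_insert(table, k)
print(table)  # [None, None, None, 10, 17, 24, 5]
```

Keys 10, 17 and 24 all hash to slot 3, so they fill slots 3, 4 and 5, and key 5 is pushed from its own slot 5 to slot 6. This build-up of adjacent occupied slots is the clustering that linear probing suffers from.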
Another way to sharply reduce clustering (long runs of occupied slots) is to increment not by a
constant (as is done in linear probing) but by an amount that depends on the key. We thus have a
second hash function. This technique is called double hashing.
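A minimal sketch of double hashing follows. The two hash functions, h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)), are a common textbook choice assumed here, not taken from these notes. The table size m is assumed to be prime so that every step size eventually visits every slot.

```python
def double_hash_insert(table, key):
    """Insert key using the probe sequence (h1 + i*h2) mod m."""
    m = len(table)                 # assumed to be prime
    h1 = key % m                   # first hash: starting slot
    h2 = 1 + (key % (m - 1))       # second hash: step size, never zero
    for i in range(m):
        slot = (h1 + i * h2) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

table = [None] * 7
for k in (10, 17, 24, 5):
    double_hash_insert(table, k)
print(table)  # [None, None, 17, 10, 24, 5, None]
```

Keys 10, 17 and 24 still collide at slot 3, but 17 steps by 6 and 24 steps by 1, so they no longer follow the same probe sequence.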
Let n be the number of keys in the table, and let m be the number of slots. The load factor is
α = n/m
= average number of keys per slot
With chaining, the expected time for an unsuccessful search for a record with a given key is Θ(1 + α).
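As a small check of the definition, the sketch below computes α for a chained table and compares it with the average chain length. The keys are hypothetical, and Python lists stand in for linked lists.

```python
m = 7
keys = [10, 17, 24, 5, 8, 33, 41, 2, 19, 30]   # n = 10 hypothetical keys

# Build the chains with the division method.
chains = [[] for _ in range(m)]
for k in keys:
    chains[k % m].append(k)

alpha = len(keys) / m                                  # load factor n/m
average_chain_length = sum(len(c) for c in chains) / m

print(alpha, average_chain_length)  # both equal 10/7, about 1.43
```

An unsuccessful search computes the hash and then walks one whole chain, which is where the 1 and the α in Θ(1 + α) come from.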
Practice:
Given a set of words, find the anagrams and display each group of anagrams separately, once using
the chaining method (linked lists) and once using linear probing.
An anagram is a word or phrase formed by reordering the letters of another word or phrase.
Here is a list of words such that the words on each line are anagrams of each other:
barde, ardeb, bread, debar, beard, bared
Hint
The important thing is to give every group of anagrams its own unique key for your hash function.
The idea is to sort the letters of each word, so "car" => "acr". All anagrams of a word
will have the same "sorted word".
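The sorted-word key can be sketched as follows. The function name anagram_key is illustrative, and hashing the key into a chained or linearly probed table is left as the exercise.

```python
def anagram_key(word: str) -> str:
    """Return the letters of word in sorted order, e.g. "car" -> "acr"."""
    return "".join(sorted(word))

words = ["barde", "ardeb", "bread", "debar", "beard", "bared"]

print(anagram_key("car"))                # acr
print({anagram_key(w) for w in words})   # {'abder'}: one key for the whole group
```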