Hashing Part 1 Lecture
• Introduction to Hashing
• Hash functions
• Distribution of records among addresses
• Synonyms and collisions
• Collision resolution by progressive overflow or linear probing
Motivation
• Hashing is a useful searching technique which can be used for implementing
indexes.
• Below we compare the search time of hashing with that of other methods:
• Binary search: O(log₂ N)
• Hashing: O(1) on average
What is Hashing?
• The idea is to discover the location of a key by simply examining the
key.
• For that we need to design a hash function.
key → f(key) → address
• An address space is chosen beforehand.
• For example we may decide the file will have 1000 available addresses.
• LOWELL, LOCK, OLIVER, and any other word whose first two letters are L and O (in either order) are mapped to the same address:
h(LOWELL) = h(LOCK) = h(OLIVER) = 4
• These keys are called synonyms
• The address "4" is said to be the home address of any of these keys.
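The slides do not say which hash function sends these keys to address 4. A minimal sketch, assuming the classic textbook choice of multiplying the ASCII codes of the first two letters and keeping the remainder modulo the 1000-address space (`hash_first_two` is a hypothetical name):

```python
def hash_first_two(key: str, address_space: int = 1000) -> int:
    """Assumed hash function: multiply the ASCII codes of the first two
    characters, then reduce modulo the size of the address space."""
    return (ord(key[0]) * ord(key[1])) % address_space


for name in ["LOWELL", "LOCK", "OLIVER"]:
    print(name, hash_first_two(name))  # each prints home address 4
```

Because L is 76 and O is 79 in ASCII, every key starting with LO or OL gives 76 × 79 = 6004, whose remainder modulo 1000 is 4, so all such keys are synonyms.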
2. Use extra memory, i.e., increase the size of the address space.
For example, reserve 5000 available addresses rather than 1000.
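To see why extra memory helps, one can compute the expected number of collisions under the idealized assumption that the hash function sends each key to an address uniformly at random. With r records and N addresses, about N(1 − (1 − 1/N)^r) addresses end up occupied; every other record lands on an address that is already taken. The file size of 500 records below is a hypothetical value:

```python
def expected_collisions(records: int, addresses: int) -> float:
    """Expected number of records whose home address is already taken,
    assuming every key hashes to an address uniformly at random."""
    occupied = addresses * (1 - (1 - 1 / addresses) ** records)
    return records - occupied


# Hypothetical file of 500 records
print(expected_collisions(500, 1000))  # about 106 collisions
print(expected_collisions(500, 5000))  # about 24 collisions
```

Going from 1000 to 5000 addresses cuts the expected collisions by roughly a factor of four, at the cost of mostly empty space in the file.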
Common hashing methods:
1. Division Method
2. Multiplication Method
3. Extraction Method
4. Mid-Square Hashing
5. Folding Technique
6. Rotation
7. Universal Hashing
One of the required features of the hash function is that the resulting index must lie within the range of table indices.
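The division method is the usual way to meet this requirement: the remainder modulo the table size is always an index from 0 to tableSize − 1. A minimal sketch, assuming integer keys and a hypothetical table of 11 slots (`division_hash` is a hypothetical name):

```python
def division_hash(key: int, table_size: int) -> int:
    """Division method: the remainder always lies in 0 .. table_size - 1.
    (Python's % returns a non-negative result for a positive divisor.)"""
    return key % table_size


TABLE_SIZE = 11
for key in [15, 26, 1234, 987654]:
    index = division_hash(key, TABLE_SIZE)
    assert 0 <= index < TABLE_SIZE  # the index never falls outside the table
    print(key, "->", index)
```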
In digit extraction, a few selected digits of the key are extracted and used as the address.
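A minimal sketch of digit extraction; which positions to extract is a design choice, and the positions used here (1st, 3rd and 4th digits) are hypothetical:

```python
def extract_digits(key: int, positions: list[int]) -> int:
    """Digit extraction: take the digits at the given 0-based positions
    of the key and concatenate them to form the address."""
    digits = str(key)
    return int("".join(digits[p] for p in positions))


# Hypothetical choice: the 1st, 3rd and 4th digits of a 6-digit key
print(extract_digits(379452, [0, 2, 3]))  # 394
```

Extracting three digits gives addresses 0 to 999, so this choice suits a 1000-address space.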
Mid-square hashing squares the key and takes the middle digits of the squared key as the address.
The difficulty arises when the key is large. Because the entire key participates in the address calculation, the square of a large key may exceed the storage limit (the largest number that can be stored), so it cannot be stored reliably.
So mid-square is used when the key has at most 4 digits.
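A minimal sketch of mid-square hashing, assuming the address is the r middle digits of the square (r = 3 for a 1000-address space is a hypothetical choice). Python integers do not overflow, so the 4-digit limit matters for fixed-width integer types: 9999² = 99,980,001 still fits in a signed 32-bit integer, while the square of a 5-digit key may not.

```python
def mid_square(key: int, r: int) -> int:
    """Mid-square hashing: square the key and take r digits from the
    middle of the squared value."""
    squared = str(key * key)
    start = max((len(squared) - r) // 2, 0)
    return int(squared[start:start + r])


# 4567 ** 2 = 20857489; the middle three digits are 857
print(mid_square(4567, 3))  # 857
```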
The difficulty of storing the square of a large number can be overcome by squaring only a few digits of the key instead of the whole key.
If the key is large, we can select a portion of it and square only that portion.
[Table: keys and addresses obtained by extracting a few digits, squaring them, and extracting the middle digits again]
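A minimal sketch of this variant, assuming (hypothetically) that the first four digits of a long key are the portion that gets squared and that three middle digits form the address:

```python
def partial_mid_square(key: int, start: int, length: int, r: int) -> int:
    """Square only a portion of a long key, then take r middle digits."""
    portion = int(str(key)[start:start + length])  # digits selected from the key
    squared = str(portion * portion)
    mid = max((len(squared) - r) // 2, 0)
    return int(squared[mid:mid + r])


# Hypothetical choice: square the first four digits of a 9-digit key.
# 1234 ** 2 = 1522756; the middle three digits are 227
print(partial_mid_square(123456789, 0, 4, 3))  # 227
```

Only a 4-digit number is ever squared, so the square stays small no matter how long the key is.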
• Step 3. Divide by the size of the address space (preferably a prime number).
• Dividing by a number that has many small factors may result in many collisions.
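A small experiment illustrates the point about small factors. The keys below are hypothetical multiples of 10: dividing by 100, which shares the factors 2 and 5 with every key, uses only 10 addresses, while dividing by the prime 101 spreads the 20 keys over 20 different addresses.

```python
def distinct_addresses(keys: list[int], divisor: int) -> int:
    """Number of different home addresses produced by key % divisor."""
    return len({key % divisor for key in keys})


keys = [10 * i for i in range(1, 21)]  # 10, 20, ..., 200
print(distinct_addresses(keys, 100))   # 10: every address is shared by two keys
print(distinct_addresses(keys, 101))   # 20: no synonyms
```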
When keys are serial, they vary only in the last digit, and this leads to the creation of synonyms.
Rotating the key minimizes this problem. This method is used along with other methods.
Here, the key is rotated right by one digit, and folding is then applied to avoid synonyms.
For example, if the key is 120605, rotating it gives 512060.
The address is then calculated using any other hash function.
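A minimal sketch of rotation followed by folding. The folding step here (split into 3-digit parts from the left, add them, keep the last three digits) is a hypothetical choice of follow-up hash, not necessarily the one used in the lecture:

```python
def rotate_right(key: int) -> int:
    """Rotate the key right by one digit: the last digit moves to the front."""
    digits = str(key)
    return int(digits[-1] + digits[:-1])


def fold_shift(key: int, part: int = 3) -> int:
    """Assumed follow-up hash: split the digits into parts of `part` digits
    from the left, add the parts, keep the last `part` digits of the sum."""
    digits = str(key)
    parts = [int(digits[i:i + part]) for i in range(0, len(digits), part)]
    return sum(parts) % 10 ** part


print(rotate_right(120605))  # 512060
for key in [120605, 120606, 120607]:  # serial keys
    rotated = rotate_right(key)
    print(key, rotated, fold_shift(rotated))
```

After rotation, the varying last digit becomes the leading digit, so the serial keys above fold to 572, 672 and 772, addresses that are 100 apart instead of adjacent.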
Some Other Hashing Methods
• Radix Transformation
• Transform the number into another base and then divide
by the maximum address, keeping the remainder as the address.
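A minimal sketch of radix transformation, assuming (hypothetically) base 11 and a maximum address of 99: the base-11 digits of the key are read back as if they were decimal digits, and the remainder after dividing by the maximum address is the home address.

```python
def radix_transform(key: int, base: int = 11, max_address: int = 99) -> int:
    """Rewrite the key in another base, read those digits as a decimal
    number, then take the remainder modulo the maximum address."""
    value, place = 0, 1
    while key > 0:
        key, digit = divmod(key, base)  # peel off the lowest base-`base` digit
        value += digit * place          # place it at the next decimal position
        place *= 10
    return value % max_address


# 453 in base 11 is 382 (3*121 + 8*11 + 2), and 382 % 99 = 85
print(radix_transform(453))  # 85
```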
If Hash(Key1) = Hash(Key2), then Key1 and Key2 are synonyms, and a collision occurs.
Assume the hash value is the RRN (relative record number) and that we are working with fixed-length records.
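With fixed-length records, the RRN produced by the hash function locates the record directly: its byte offset is the RRN times the record length. A minimal sketch, assuming a hypothetical 64-byte record and a file with no header:

```python
RECORD_SIZE = 64  # hypothetical fixed record length in bytes


def byte_offset(rrn: int, record_size: int = RECORD_SIZE) -> int:
    """Byte offset of a fixed-length record, assuming no file header."""
    return rrn * record_size


# The record whose home address (RRN) is 4 starts at byte 4 * 64 = 256
print(byte_offset(4))  # 256
```

With an open binary file `f`, one would then call `f.seek(byte_offset(rrn))` and read `RECORD_SIZE` bytes.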
Distribution of Records among Addresses
1.1. Open addressing: progressive overflow or linear probing
H = F(key)
is the home address. If that slot is available, we store the record there; otherwise, we increase H by a fixed step k:
H = (H + k) mod tableSize, k ≥ 1
and repeat until a free slot is found.
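A minimal sketch of progressive overflow, assuming F(key) = key mod tableSize, empty slots marked with `None`, and a hypothetical table of 11 slots. A prime table size guarantees that any step k smaller than the table size eventually visits every slot.

```python
def linear_probe_insert(table: list, key: int, k: int = 1) -> int:
    """Insert `key` by progressive overflow and return the slot used.
    Starts at the home address and steps by k until an empty slot is found."""
    size = len(table)
    h = key % size                 # home address H = F(key)
    for _ in range(size):
        if table[h] is None:
            table[h] = key
            return h
        h = (h + k) % size         # H = (H + k) mod tableSize
    raise RuntimeError("no free slot reached")


table = [None] * 11
for key in [27, 38, 49, 16]:       # all four keys have home address 5
    print(key, "->", linear_probe_insert(table, key))
```

The four synonyms fill slots 5, 6, 7 and 8 one after another, which is exactly the kind of cluster named as the disadvantage of this method.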
Collision Resolution: Progressive Overflow
Any suggestions?
• Advantage
• Simplicity
• Disadvantage
• If there are lots of collisions, clusters of records can form as in the previous
example.
1.2. Quadratic Probing
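H = F(key) is the home address. The i-th probe goes to
H(i) = (H + i²) mod tableSize, i ≥ 1
For example, with home address 6, the probe for i = 2 goes to 6 + 2² = 6 + 4 = 10.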
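A minimal sketch of quadratic probing, assuming F(key) = key mod tableSize and a hypothetical table of 11 slots. Unlike linear probing, the probe sequence does not necessarily reach every slot; with a prime table size, a free slot is guaranteed to be found as long as the table is at most half full.

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Quadratic probing: try the home address H, then (H + i*i) % size
    for i = 1, 2, ...; return the slot used."""
    size = len(table)
    home = key % size                   # home address H = F(key)
    for i in range(size):               # i = 0 is the home address itself
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("probe sequence found no free slot")


table = [None] * 11
for key in [6, 17, 28]:                 # all three keys have home address 6
    print(key, "->", quadratic_probe_insert(table, key))
```

The third key lands in slot 10 on its probe for i = 2, the same 6 + 4 = 10 computation as on the slide.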
1.3. Double Hashing
Double hashing uses nonlinear probing by computing a different probe increment for each key.
It uses two hash functions.
The first function computes the home address. If that slot is available (or the record is found there), we stop;
otherwise, we apply the second hash function to compute the step value and probe H = (H + step) mod tableSize until a slot is found.
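A minimal sketch of double hashing, assuming F1(key) = key mod tableSize for the home address and, as the second function, the common textbook choice F2(key) = R − (key mod R) with R a prime smaller than the table size (R = 7 and tableSize = 11 are hypothetical). F2 never returns 0, so the probe always moves, and with a prime table size every slot is eventually visited.

```python
def double_hash_insert(table: list, key: int, r: int = 7) -> int:
    """Double hashing: home address from key % size, step from r - key % r."""
    size = len(table)
    h = key % size              # first function: home address
    step = r - (key % r)        # second function: step between 1 and r
    for _ in range(size):
        if table[h] is None:
            table[h] = key
            return h
        h = (h + step) % size
    raise RuntimeError("no free slot reached")


table = [None] * 11
for key in [27, 38, 49]:        # all three keys have home address 5
    print(key, "->", double_hash_insert(table, key))
```

The synonyms go to slots 5, 9 and 1 because 38 and 49 get different steps (4 and 7), so they do not pile up next to each other as with linear probing.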
1. Open addressing
The first collision resolution method, open addressing, resolves collisions in the home area. When a collision
occurs, the home area addresses are searched for an open or unoccupied element where the new data can be
placed. Examples of Open Addressing Methods:
1.1. Linear probing or progressive overflow
1.2. Quadratic probing
1.3. Double hashing