
Hashing


ITE 2142 – Data Structures & Algorithms Week 08

LESSON 08 – Hash Tables

Introduction

This week you will learn about another data structure, the hash table, and about hashing
techniques. A hash table is a data structure that offers very fast insertion and searching.
When you first hear about them, hash tables sound almost too good to be true: no matter
how many data items there are, insertion and searching (and sometimes deletion) can take
close to constant time, O(1).

Learning outcome
After completing this lesson, you should be able to describe hashing and hash
tables. In particular, you should be able to:
• Define a hash table
• Define a hash function
• Describe collisions
• Describe techniques used for resolving collisions

8.1 Introduction to Hash Tables


This lesson introduces hash tables and hashing. One important concept
is how a range of key values is transformed into a range of array index values.
Suppose we want to design a system for storing employee records keyed by phone
number, and we want the following queries to be performed efficiently:
• Insert a phone number and the corresponding information.
• Search for a phone number and fetch the information.
• Delete a phone number and the related information.
We can think of using the following data structures to maintain information about different
phone numbers:
• Array of phone numbers and records.
• Linked list of phone numbers and records.
• Balanced binary search tree with phone numbers as keys.
• Direct access table.

For arrays and linked lists, we need to search in a linear fashion, which can be costly in
practice. If we use an array and keep the data sorted, a phone number can be found
using binary search, but insert and delete operations become costly because we have to maintain
the sorted order.

With a balanced binary search tree, we get moderate search, insert and delete times: all of
these operations are guaranteed to take O(log n) time.
Another solution is to use a direct access table: we make a big
array and use phone numbers as indexes into the array. An entry in the array is NIL if the phone number
is not present; otherwise the entry stores a pointer to the record corresponding to that phone number.
In terms of time complexity this solution is the best of all: every operation takes O(1)
time. For example, to insert a phone number, we create a record with the details of the given phone
number, use the phone number as the index and store the pointer to the created record in the table.
This solution has serious practical limitations. The first problem is that the extra space
required is huge. For example, if a phone number has n digits, we need O(m × 10^n) space for the
table, where m is the size of a pointer to a record. Another problem is that an integer type in a programming
language may not be able to hold n digits.
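
The direct access idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name DirectAccessTable, the restriction to 4-digit keys and the use of a String as the record are all hypothetical and not part of the lesson.

```java
// Minimal sketch of a direct access table (illustrative names and sizes).
// Assumption: keys are integers in the range 0..9999, so 10^4 slots are enough.
class DirectAccessTable {
    // One slot per possible key; null plays the role of NIL.
    private final String[] table = new String[10_000];

    void insert(int key, String record) { table[key] = record; } // O(1)
    String search(int key)              { return table[key]; }   // O(1), null if absent
    void delete(int key)                { table[key] = null; }   // O(1)

    public static void main(String[] args) {
        DirectAccessTable t = new DirectAccessTable();
        t.insert(1234, "Alice");
        System.out.println(t.search(1234)); // Alice
        t.delete(1234);
        System.out.println(t.search(1234)); // null
    }
}
```

Even for 4-digit keys the array already needs 10,000 slots; for 10-digit phone numbers the same design would need 10^10 slots, which is the space problem noted above.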

Because of these limitations, a direct access table cannot always be used. Hashing is a solution
that can be used in almost all such situations, and in practice it performs extremely well compared to
the other data structures above (array, linked list and balanced BST). With hashing
we get O(1) search time on average (under reasonable assumptions) and O(n) in the worst case.


What is hashing?
Hashing is an improvement over the direct access table. The idea is to use a hash function that
converts a given phone number, or any other key, to a smaller number, and to use that smaller
number as the index into a table called a hash table. You will learn about this in detail in the next section.

What is a hash table?


A hash table is an array that stores pointers to the records corresponding to given phone numbers. An entry
in the hash table is NIL if no existing phone number has a hash function value equal to the index
of that entry.

For a human user, a lookup in a hash table is essentially instantaneous. It is so fast that computer
programs typically use hash tables when they need to look up tens of thousands of items in
less than a second (as in spelling checkers). Hash tables are significantly faster than trees
and relatively easy to program.

Hash tables have several disadvantages. They are based on arrays, but arrays are difficult to
expand once they have been created. For some kinds of hash tables, performance may
degrade catastrophically when the table becomes too full, so the programmer needs to have
a fairly accurate idea of how many data items will need to be stored.

8.2 Hashing
Think about a dictionary. If you want to put every word of an English-language dictionary
into your computer’s memory so that the words can be accessed quickly, a hash table is a good
choice. Let’s say we want to store a 50,000-word English-language dictionary in main
memory. You would like every word to occupy its own cell in a 50,000-cell array, so that you
can access the word using an index number. This will make access very fast. But what is
the relationship of the index numbers to the words? Given the word persistent, for
example, how do we find its index number?


How big an array are we talking about for an English-language dictionary? If we only have
50,000 words, you might assume our array should have approximately this many elements.
However, we will use an array with about 100,000 elements, twice as many cells as data items.
A word must first be converted to a number, and a straightforward encoding (for instance,
treating the letters of a word as digits in base 27) can produce values of more than
7,000,000,000,000 for a ten-letter word. Thus we look for a way to squeeze a
range of 0 to more than 7,000,000,000,000 into the range 0 to 100,000. A simple approach
is to use the modulo operator (%), which finds the remainder when one number is
divided by another. A function that performs this kind of compression is called a
hash function: it hashes (converts) a number in a large range into a
number in a small range. This small range corresponds to the index numbers in an
array. For example, a hashing function can be defined as
small number = large number mod small range
e.g. 6 = 196 mod 10
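
As a rough illustration of the formula above, the following Java sketch compresses a large non-negative number into a small index range with the % operator. The class and method names (ModHashDemo, hash) are assumptions made for this example only.

```java
// Minimal sketch: small number = large number mod small range.
// Assumption: keys are non-negative, so the remainder is a valid array index.
class ModHashDemo {
    static int hash(long largeNumber, int smallRange) {
        return (int) (largeNumber % smallRange); // result is in 0 .. smallRange-1
    }

    public static void main(String[] args) {
        System.out.println(hash(196, 10));                      // 6
        System.out.println(hash(7_000_000_123_456L, 100_000));  // 23456
    }
}
```

Note that with a divisor of 100,000 the largest possible index is 99,999, so a 100,000-cell array is exactly large enough.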

An array into which data is inserted using a hash function is called a hash table; in the
example above, the array indexed by the small range is the hash table. A hash function
should be simple and quick to compute. If the hash function is slow, the speed of the
hash table degrades.


8.3 Collision
If we can define a one-to-one mapping from the keys we need to store to elements of the small range, the hash
function is called a perfect hashing function.
Think about the array (hash table) that we are going to use for the English-language dictionary. Perhaps you want
to insert the word melioration into the array. You hash the word to obtain its index number, but find
that the cell at that number is already occupied by the word demystify, which happens to hash to the
exact same number. This situation is called a collision.
If we cannot define a perfect hashing function, i.e. the mapping from elements of the large range to
elements of the small range is many-to-one, we must deal with collisions. It can be depicted as follows.

Remember that we have specified an array with twice as many cells as data items. Thus perhaps half
the cells are empty. One approach, when collision occurs, is to search the array in some
systematic way for an empty cell, and insert the new item there, instead of at the index specified by
the hash function. This approach is called open addressing.
A second approach is to create an array that consists of linked lists of words instead of the words
themselves. Then when a collision occurs, the new item is simply inserted in the list at that
index. This is called separate chaining.

8.4 Open addressing


In open addressing, when a data item can’t be placed at the index calculated by the hash function,
another location in the array is sought. We’ll explore three methods of open addressing, which vary
in the method used to find the next vacant cell. These methods are linear probing, quadratic probing,
and double hashing.

8.4.1 Linear probing


In linear probing, we search the hash table sequentially for a vacant cell in which to insert the new item. For example, if
array index 23 is occupied when we try to insert a new item there, we go to 24, then 25 and so on,
incrementing the index until we find an empty cell.

Java Implementation for a linear probe hash Table


The insert() method locates the cell where a data item should go: it looks for a cell that is either empty or
holds a deleted item (marked by a key of -1). Once such a cell has been located, insert() places the new item into it.

public void insert(DataItem item)   // insert a DataItem
                                    // (assumes table is not full)
{
    int key = item.iData;           // extract key
    int hashVal = hashFunc(key);    // hash the key
                                    // until empty cell or -1,
    while(hashArray[hashVal] != null && hashArray[hashVal].iData != -1)
    {
        ++hashVal;                  // go to next cell
        hashVal %= arraySize;       // wraparound if necessary
    }
    hashArray[hashVal] = item;      // insert item
} // end insert()

The delete() method finds an existing item. Once the item is found, delete() writes over it with the
special data item nonItem, which insert() recognises by its key of -1.

public DataItem delete(int key)     // delete a DataItem
{
    int hashVal = hashFunc(key);    // hash the key

    while(hashArray[hashVal] != null)   // until empty cell,
    {                                   // found the key?
        if(hashArray[hashVal].iData == key)
        {
            DataItem temp = hashArray[hashVal]; // save item
            hashArray[hashVal] = nonItem;       // delete item
            return temp;                        // return item
        }
        ++hashVal;                  // go to next cell
        hashVal %= arraySize;       // wraparound if necessary
    }
    return null;                    // can't find item
} // end delete()

The find() method first calls hashFunc() to hash the search key to obtain the
index number.
The hashFunc() method applies the % operator to the search key and the array size.

public DataItem find(int key)       // find item with key
{
    int hashVal = hashFunc(key);    // hash the key

    while(hashArray[hashVal] != null)   // until empty cell,
    {                                   // found the key?
        if(hashArray[hashVal].iData == key)
            return hashArray[hashVal];  // yes, return item
        ++hashVal;                  // go to next cell
        hashVal %= arraySize;       // wraparound if necessary
    }
    return null;                    // can't find item
}

public int hashFunc(int key)
{
    return key % arraySize;         // hash function
}

As hashVal steps through the array, it eventually reaches the end. When this happens we want it to
wrap around to the beginning.
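
The methods above are fragments of a larger class. To see them working together, here is a compact, self-contained sketch of a linear probe table. It is a simplification, not the lesson's class: it stores bare int keys in an Integer array instead of DataItem objects, and the names LinearProbeTable and DELETED are assumptions made for this example. Keys are assumed to be non-negative, and the table is assumed to always contain at least one empty cell.

```java
// Minimal linear-probing sketch (simplified: keys only, no DataItem records).
class LinearProbeTable {
    private static final int DELETED = -1; // marker for a deleted cell
    private final Integer[] cells;         // null = cell never used

    LinearProbeTable(int size) { cells = new Integer[size]; }

    int hashFunc(int key) { return key % cells.length; }

    // Inserts key and returns the index used (assumes the table is not full).
    int insert(int key) {
        int i = hashFunc(key);
        while (cells[i] != null && cells[i] != DELETED) {
            i = (i + 1) % cells.length;    // next cell, with wraparound
        }
        cells[i] = key;
        return i;
    }

    // Returns the index holding key, or -1 if an empty cell is reached first.
    int find(int key) {
        int i = hashFunc(key);
        while (cells[i] != null) {
            if (cells[i] == key) return i;
            i = (i + 1) % cells.length;
        }
        return -1;
    }

    // Overwrites the key with the DELETED marker so later probes keep going.
    boolean delete(int key) {
        int i = find(key);
        if (i < 0) return false;
        cells[i] = DELETED;
        return true;
    }

    public static void main(String[] args) {
        LinearProbeTable t = new LinearProbeTable(10);
        System.out.println(t.insert(23)); // 3
        System.out.println(t.insert(33)); // 4 (index 3 is taken)
        System.out.println(t.insert(43)); // 5 (indexes 3 and 4 are taken)
        t.delete(33);
        System.out.println(t.find(43));   // 5 (probe passes the deleted cell)
        System.out.println(t.insert(53)); // 4 (reuses the deleted cell)
    }
}
```

The deleted-cell marker matters: if delete() simply set the cell back to null, the later search for 43 would stop at index 4 and wrongly report that the key is missing.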

8.4.2 Quadratic probing


The idea in quadratic probing is to probe more widely separated cells, instead of those adjacent
to the primary hash site as in linear probing. In a linear probe, if the primary hash index is x, then subsequent
probes go to x+1, x+2, x+3 and so on. In quadratic probing, subsequent probes go to x+1, x+4, x+9
and so on, i.e. x+1², x+2², x+3², ..., wrapping around the end of the array when necessary. The following figure shows
some quadratic probes.
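
The probe positions can also be computed directly. The sketch below assumes the offsets 1², 2², 3², ... are always added to the primary hash index x and reduced modulo the array size; the class and method names (QuadraticProbeDemo, probe) are hypothetical.

```java
// Minimal sketch of the quadratic probe sequence x, x+1, x+4, x+9, ...
class QuadraticProbeDemo {
    // Index of the j-th probe; j = 0 is the primary hash site.
    static int probe(int home, int j, int arraySize) {
        return (home + j * j) % arraySize;
    }

    public static void main(String[] args) {
        int home = 7, arraySize = 20;
        for (int j = 0; j < 5; j++) {
            System.out.print(probe(home, j, arraySize) + " "); // 7 8 11 16 3
        }
        System.out.println();
    }
}
```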

8.4.3 Double Hashing


In double hashing, the key is hashed a second time using a different hash function, and the
result is used as the step size for the probe sequence.
The second hash function must have the following features:
• It must not be the same as the primary hash function
• It must never output 0 (if it did, there would be no step and the probe would never move)
Experts have discovered that functions of the following form work well:
stepSize = constant - (key % constant)

where constant is prime and smaller than the array size. For example, stepSize = 5 - (key % 5).
With this function, all the steps for a given key are the same size, but different keys
can generate different step sizes. These two functions can be implemented in Java as follows.

public int hashFunc1(int key)
{
    return key % arraySize;
}
// -------------------------------------------------------------

public int hashFunc2(int key)
{
    // non-zero, less than array size
    // array size must be relatively prime to 5, 4, 3, and 2
    return 5 - key % 5;
}

public DataItem find(int key)           // find item with key
                                        // (assumes table is not full)
{
    int hashVal = hashFunc1(key);       // hash the key
    int stepSize = hashFunc2(key);      // get step size

    while(hashArray[hashVal] != null)   // until empty cell,
    {
        if(hashArray[hashVal].iData == key) // is correct hashVal?
            return hashArray[hashVal];      // yes, return item
        hashVal += stepSize;            // add the step
        hashVal %= arraySize;           // for wraparound
    }
    return null;                        // can't find item
}
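
To make the effect of the second hash function concrete, the following sketch prints the first few probe positions for two keys that collide at the same primary index. It reuses the formulas above; the array size of 11 and the names DoubleHashDemo and probes are assumptions chosen for this example.

```java
import java.util.Arrays;

// Minimal sketch: probe sequences produced by double hashing.
// Assumption: array size 11 (prime), keys non-negative.
class DoubleHashDemo {
    static final int ARRAY_SIZE = 11;

    static int hashFunc1(int key) { return key % ARRAY_SIZE; } // primary index
    static int hashFunc2(int key) { return 5 - key % 5; }      // step size, 1..5

    // First 'count' probe positions for the given key.
    static int[] probes(int key, int count) {
        int[] out = new int[count];
        int index = hashFunc1(key);
        int step = hashFunc2(key);
        for (int i = 0; i < count; i++) {
            out[i] = index;
            index = (index + step) % ARRAY_SIZE; // add the step, wrap around
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probes(23, 4))); // [1, 3, 5, 7]
        System.out.println(Arrays.toString(probes(34, 4))); // [1, 2, 3, 4]
    }
}
```

Both 23 and 34 hash to index 1, but their step sizes differ (2 and 1), so after the first collision they follow different paths through the table.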

8.5 Separate Chaining

In open addressing, collisions are resolved by looking for an empty cell in the hash table. A different
approach is to create a linked list at each array index in the hash table. A data item is hashed using a
hash function as before, and the item is stored in the linked list at that index. Other items that hash to
the same array index are added to the same linked list.


Separate chaining is conceptually somewhat simpler than the various probe schemes used in open
addressing. However, the code is longer because it must include the mechanism for the linked
lists, usually in the form of an additional class. A Java implementation of separate
chaining looks like this.

class Link
{ // (could be other items)
public int iData; // data item
public Link next; // next link in list

// -------------------------------------------------------------
public Link(int it) // constructor
{ iData= it; }
// -------------------------------------------------------------

public void displayLink() // display this link
{ System.out.print(iData + " "); }
} // end class Link
////////////////////////////////////////////////////////////////

class SortedList
{
private Link first; // ref to first list item
// -------------------------------------------------------------
public SortedList() // constructor
{ first = null; }
// -------------------------------------------------------------
public void insert(Link theLink) // insert link, in order
{
int key = theLink.iData;
Link previous = null; // start at first
Link current = first;
// until end of list,
while(current != null && key > current.iData)
{ // or current > key,
previous = current;
current = current.next; // go to next item
}
if(previous==null) // if beginning of list,
first = theLink; // first --> new link
else // not at beginning,
previous.next = theLink; // prev --> new link
theLink.next = current; // new link --> current

} // end insert()
// -------------------------------------------------------------
public void delete(int key) // delete link
{ // (does nothing if key is not in the list)
Link previous = null; // start at first
Link current = first;
// until end of list,
while(current != null && key != current.iData)
{ // or key == current,
previous = current;
current = current.next; // go to next link
}
if(current == null) // key not found (or list is empty)
return; // nothing to delete
// disconnect link
if(previous==null) // if beginning of list
first = first.next; // delete first link
else // not at beginning
previous.next = current.next; // delete current link
} // end delete()
// -------------------------------------------------------------

public Link find(int key) // find link


{
Link current = first; // start at first
// until end of list,
while(current != null && current.iData <= key)
{ // or key too small,
if(current.iData == key) // is this the link?
return current; // found it, return link
current = current.next; // go to next item
}
return null; // didn't find it
} // end find()
// -------------------------------------------------------------
public void displayList()
{
System.out.print("List (first-->last): ");
Link current = first; // start at beginning of list

while(current != null) // until end of list,
{
current.displayLink(); // print data
current = current.next; // move to next link
}
System.out.println("");
}
} // end class SortedList

////////////////////////////////////////////////////////////////

class HashTable
{
private SortedList[] hashArray; // array of lists
private int arraySize;
// -------------------------------------------------------------
public HashTable(int size) // constructor
{
arraySize = size;
hashArray = new SortedList[arraySize]; // create array
for(int j=0; j<arraySize; j++) // fill array
hashArray[j] = new SortedList(); // with lists
}
// -------------------------------------------------------------
public void displayTable()
{
for(int j=0; j<arraySize; j++) // for each cell,
{
System.out.print(j + ". "); // display cell number
hashArray[j].displayList(); // display list
}
}
// -------------------------------------------------------------
public int hashFunc(int key) // hash function
{
return key % arraySize;
}
// -------------------------------------------------------------
public void insert(Link theLink) // insert a link
{
int key = theLink.iData;
int hashVal = hashFunc(key); // hash the key
hashArray[hashVal].insert(theLink); // insert at hashVal
} // end insert()
// -------------------------------------------------------------
public void delete(int key) // delete a link
{
int hashVal = hashFunc(key); // hash the key
hashArray[hashVal].delete(key); // delete link
} // end delete()

// -------------------------------------------------------------
public Link find(int key) // find link

{
int hashVal = hashFunc(key); // hash the key
Link theLink = hashArray[hashVal].find(key); // get link
return theLink; // return link
}
// -------------------------------------------------------------
} // end class HashTable
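
For comparison, the same idea can be sketched much more briefly with the standard java.util.LinkedList class. This is a deliberately different design from the listing above: the chains are unsorted, only int keys are stored, and the class name ChainedTable and the helper chainLength() are assumptions made for this example. Keys are assumed to be non-negative.

```java
import java.util.LinkedList;

// Minimal separate-chaining sketch using unsorted java.util.LinkedList chains.
class ChainedTable {
    private final LinkedList<Integer>[] buckets; // one list per array index

    @SuppressWarnings("unchecked")
    ChainedTable(int size) {
        buckets = new LinkedList[size];
        for (int j = 0; j < size; j++) {
            buckets[j] = new LinkedList<>();     // start with empty chains
        }
    }

    int hashFunc(int key)    { return key % buckets.length; }
    void insert(int key)     { buckets[hashFunc(key)].add(key); }
    boolean find(int key)    { return buckets[hashFunc(key)].contains(key); }
    // Integer.valueOf selects remove(Object), not remove(int index).
    boolean delete(int key)  { return buckets[hashFunc(key)].remove(Integer.valueOf(key)); }
    int chainLength(int idx) { return buckets[idx].size(); }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(10);
        t.insert(181);
        t.insert(101);                            // collides with 181 at index 1
        System.out.println(t.chainLength(1));     // 2
        System.out.println(t.find(101));          // true
        t.delete(181);
        System.out.println(t.find(181));          // false
    }
}
```

A sorted chain, as in the SortedList class above, lets an unsuccessful search stop early; the unsorted chain here trades that for a simpler insert.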

Activity 8.1
Suppose you have a set of data “12, 24, 45, 99, 181, 101” to store in a hash table
of size 10. Consider that the hash function is h(x) = x mod 10.

Store the given values in the hash table by using quadratic probing technique if you
have to deal with collision.

Summary
In this lesson you learnt about hashing and hash tables. You learnt how
to create a hash table and how to deal with collisions. You also got to know different
approaches for resolving collisions and how they differ from each other.


Activity 8.1 - Answer


We have to store the given data in a hash table of size 10.
First we take the number 12: 12 mod 10 returns 2, so 12 is stored at
index 2. Next, 24 mod 10 returns 4, so 24
is stored at index 4. By applying the same procedure, 45 is stored at index 5, 99 is
stored at index 9 and 181 is stored at index 1. Next, 101 mod 10 returns 1, but index 1 is already
occupied by 181, so a collision occurs and we follow the quadratic probing approach, probing
x+1², x+2², x+3², ... from the primary hash site x = 1. The first probe is 1 + 1² = 2, which is
occupied by 12. The second probe is 1 + 2² = 5, which is occupied by 45. The third probe is
1 + 3² = 10, which wraps around to 10 mod 10 = 0. Index 0 is empty, so we
store 101 at index 0.
The final table is: index 0: 101, 1: 181, 2: 12, 4: 24, 5: 45, 9: 99 (indexes 3, 6, 7 and 8 remain empty).
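
The activity can be checked with a short program. The sketch below follows the probe offsets 1, 4, 9, ... defined in section 8.4.2, always measured from the primary hash index. The class name QuadraticActivity, the build() helper and the use of -1 as the empty-cell marker are assumptions made for this example.

```java
import java.util.Arrays;

// Minimal sketch: insert the activity's keys using h(x) = x mod 10
// and quadratic probing (home + j*j) mod size.
class QuadraticActivity {
    static int[] build(int[] keys, int size) {
        int[] table = new int[size];
        Arrays.fill(table, -1);                  // -1 marks an empty cell
        for (int key : keys) {
            int home = key % size;               // primary hash index
            for (int j = 0; j < size; j++) {
                int index = (home + j * j) % size;
                if (table[index] == -1) {        // first free probe position
                    table[index] = key;
                    break;
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] table = build(new int[]{12, 24, 45, 99, 181, 101}, 10);
        System.out.println(Arrays.toString(table));
        // [101, 181, 12, -1, 24, 45, -1, -1, -1, 99]
    }
}
```

For 101 the probes visit indexes 1, 2 and 5, all occupied, and then (1 + 9) mod 10 = 0, which is free.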
