DSA Hash
ALGORITHMS – HASH TABLES
Dr. Muhammad Awais Sattar
Assistant Professor RSCI
HASH TABLE
A Hash Table is a data structure designed for fast lookups, insertions, and
deletions.
Hash Tables are sometimes preferred over arrays or linked lists because
searching for, adding, and deleting data can be done quickly, even for large
amounts of data.
In a Linked List, finding a person "Bob" takes time because we would have to
go from one node to the next, checking each node, until the node with "Bob"
is found.
And finding "Bob" in an Array could be fast if we knew the index, but when we
only know the name "Bob", we need to compare each element (like with
Linked Lists), and that takes time.
With a Hash Table, however, finding "Bob" is fast because there is a way to
go directly to where "Bob" is stored, using something called a hash
function.
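The difference can be sketched in a few lines of Python. This is only an illustration: the list of names and the helper `linear_search` are assumptions made for the example, and Python's built-in `set` stands in for a hash table (it is implemented as one).

```python
def linear_search(names, target):
    """Scan the list from the front; return (found, number_of_comparisons)."""
    comparisons = 0
    for name in names:
        comparisons += 1
        if name == target:
            return True, comparisons
    return False, comparisons

# Hypothetical sample data for the illustration
names = ['Pete', 'Jones', 'Lisa', 'Bob', 'Siri']

found, steps = linear_search(names, 'Bob')
print(found, steps)  # True 4  ('Bob' is the 4th element checked)

# A hash-based container jumps to the right bucket instead of scanning
name_set = set(names)
print('Bob' in name_set)  # True
```

The linear search needs one comparison per element it passes, so its cost grows with the list, while the hash-based lookup does not need to visit the other names.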
BUILDING A HASH TABLE FROM SCRATCH
To get an idea of how a Hash Table works, let's build a simplified
version, a Hash Set, from scratch, to store unique first names inside it.
We will build the Hash Set in 5 steps:
1. Starting with an array.
2. Storing names using a hash function.
3. Looking up an element using a hash function.
4. Handling collisions.
5. The basic Hash Set code example and simulation.
STEP 1: STARTING WITH AN ARRAY
Using an array, we could store names like this:
my_array = ['Pete', 'Jones', 'Lisa', 'Bob', 'Siri']
To find "Bob" in this array, we need to compare each name, element by
element, until we find "Bob".
If the array was sorted alphabetically, we could use Binary Search to find a
name quickly, but inserting or deleting names in the array would mean a
big operation of shifting elements in memory.
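As a rough sketch of this trade-off, the standard-library `bisect` module can do the binary search on a sorted list. The helper `binary_contains` and the extra name 'Anna' are assumptions made for the example.

```python
import bisect

# The names from the array above, sorted alphabetically
sorted_names = sorted(['Pete', 'Jones', 'Lisa', 'Bob', 'Siri'])
print(sorted_names)  # ['Bob', 'Jones', 'Lisa', 'Pete', 'Siri']

def binary_contains(sorted_list, target):
    """Binary search: find where target would go, then check that position."""
    i = bisect.bisect_left(sorted_list, target)
    return i < len(sorted_list) and sorted_list[i] == target

print(binary_contains(sorted_names, 'Bob'))  # True

# Inserting keeps the list sorted, but every later element must shift.
# 'Anna' sorts first, so all five existing names move one position.
bisect.insort(sorted_names, 'Anna')
print(sorted_names)  # ['Anna', 'Bob', 'Jones', 'Lisa', 'Pete', 'Siri']
```

The search itself is quick, but the insertion has to move elements in memory, which is the cost the Hash Set is meant to avoid.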
To make interacting with the list of names really fast, let's use a Hash Table
for this instead, or a Hash Set, which is a simplified version of a Hash Table.
To keep it simple, let's assume there are at most 10 names in the list, so the
array can have a fixed size of 10 elements. When talking about Hash Tables,
each of these elements is called a bucket.
my_hash_set = [None,None,None,None,None,None,None,None,None,None]
STEP 2: STORING NAMES USING A HASH FUNCTION
Now comes the special way we interact with the Hash Set we
are making.
We want to store a name directly into its right place in the
array, and this is where the hash function comes in.
A hash function can be made in many ways; it is up to the
creator of the Hash Table. A common way is to convert the
value into a number that equals one of the Hash Set's index
numbers, in this case a number from 0 to 9. In our example
we will take the Unicode code point of each character, add
them together, and do a modulo 10 operation on the sum to
get an index number from 0 to 9.
def hash_function(value):
    sum_of_chars = 0
    for char in value:
        sum_of_chars += ord(char)
    return sum_of_chars % 10
• The character "B" has Unicode code point 66, "o" has 111, and "b" has 98.
Adding those together we get 275. Modulo 10 of 275 is 5, so "Bob" should be
stored as an array element at index 5.
• The number returned by the hash function is called the hash code.
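The arithmetic for "Bob" can be checked step by step, and the same function can be applied to the other names from Step 1. This is a small sketch that repeats `hash_function` so it runs on its own; the variable `code_points` is introduced only for the illustration.

```python
def hash_function(value):
    sum_of_chars = 0
    for char in value:
        sum_of_chars += ord(char)
    return sum_of_chars % 10

# Break the calculation for "Bob" into its parts
code_points = [ord(char) for char in 'Bob']
print(code_points)           # [66, 111, 98]
print(sum(code_points))      # 275
print(hash_function('Bob'))  # 5

# Hash codes for all five names from the array in Step 1
for name in ['Pete', 'Jones', 'Lisa', 'Bob', 'Siri']:
    print(name, hash_function(name))
# Pete 8
# Jones 1
# Lisa 3
# Bob 5
# Siri 7
```

For these five names every hash code happens to be different, so each name gets its own bucket.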
After storing "Bob" where the hash code tells us (index
5), our array now looks like this:
my_hash_set = [None,None,None,None,None,'Bob',None,None,None,None]
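Storing a name this way could be wrapped in a small function. The sketch below is an assumption, not code from the slides: the name `add` is hypothetical, and it simply overwrites whatever is already in the bucket, since handling collisions is left to a later step.

```python
my_hash_set = [None] * 10  # 10 empty buckets

def hash_function(value):
    sum_of_chars = 0
    for char in value:
        sum_of_chars += ord(char)
    return sum_of_chars % 10

def add(name):
    """Hypothetical helper: store name in the bucket its hash code points to.
    No collision handling: an existing value in that bucket is overwritten."""
    index = hash_function(name)
    my_hash_set[index] = name

add('Bob')
print(my_hash_set)
# [None, None, None, None, None, 'Bob', None, None, None, None]
```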
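# Step 3: looking up a name with the same hash function used to store it
def hash_function(value):
    sum_of_chars = 0
    for char in value:
        sum_of_chars += ord(char)
    return sum_of_chars % 10

def contains(name):
    # Go straight to the bucket for this name and compare its content
    index = hash_function(name)
    return my_hash_set[index] == name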
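A short usage sketch for `contains`, assuming the Hash Set holds only "Bob" at index 5 as in Step 2. The definitions are repeated so the example runs on its own.

```python
my_hash_set = [None, None, None, None, None, 'Bob', None, None, None, None]

def hash_function(value):
    sum_of_chars = 0
    for char in value:
        sum_of_chars += ord(char)
    return sum_of_chars % 10

def contains(name):
    index = hash_function(name)
    return my_hash_set[index] == name

print(contains('Bob'))   # True:  'Bob' hashes to 5, and bucket 5 holds 'Bob'
print(contains('Pete'))  # False: 'Pete' hashes to 8, and bucket 8 is empty
```

In both cases only one bucket is inspected, no matter how many names the Hash Set could hold.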