
DSA Hash

The document explains hash tables, focusing on their efficiency in searching, adding, and deleting data compared to arrays and linked lists. It details the process of building a hash table, including the use of hash functions, handling collisions, and the differences between hash sets and hash maps. Additionally, it discusses the implementation of hash sets and hash maps in Python, emphasizing the importance of effective hash functions for maintaining performance.

Uploaded by

Sameer Sohail
Copyright
© © All Rights Reserved
DATA STRUCTURE AND ALGORITHMS – HASH TABLES
Dr. Muhammad Awais Sattar
Assistant Professor RSCI

HASH TABLE
 A Hash Table is a data structure designed to be fast to work with.
 Hash Tables are sometimes preferred over arrays or linked lists because
searching for, adding, and deleting data can be done really quickly, even
for large amounts of data.
 In a Linked List, finding a person "Bob" takes time because we would have to
go from one node to the next, checking each node, until the node with "Bob"
is found.
 And finding "Bob" in an Array could be fast if we knew the index, but when we
only know the name "Bob", we need to compare each element (like with
Linked Lists), and that takes time.
 With a Hash Table however, finding "Bob" is done really fast because there is
a way to go directly to where "Bob" is stored, using something called a hash
function.
BUILDING A HASH TABLE FROM SCRATCH
 To get the idea of what a Hash Table is, let's try to build
one from scratch, to store unique first names inside it.
 We will build the Hash Set in 5 steps:
1. Starting with an array.
2. Storing names using a hash function.
3. Looking up an element using a hash function.
4. Handling collisions.
5. The basic Hash Set code example and simulation.
STEP 1: STARTING WITH AN ARRAY
 Using an array, we could store names like this:
my_array = ['Pete', 'Jones', 'Lisa', 'Bob', 'Siri']
 To find "Bob" in this array, we need to compare each name, element by
element, until we find "Bob".
 If the array was sorted alphabetically, we could use Binary Search to find a
name quickly, but inserting or deleting names in the array would mean a
big operation of shifting elements in memory.
 To make interacting with the list of names really fast, let's use a Hash Table
for this instead, or a Hash Set, which is a simplified version of a Hash Table.
 To keep it simple, let's assume there are at most 10 names in the list, so
the array has a fixed size of 10 elements. When talking about Hash Tables,
each of these elements is called a bucket.
my_hash_set = [None,None,None,None,None,None,None,None,None,None]
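The two array lookups described in this step can be sketched as follows. This is only an illustration: the function names linear_search and binary_search are made up for this example, and the binary search relies on Python's standard bisect module.

```python
from bisect import bisect_left

my_array = ['Pete', 'Jones', 'Lisa', 'Bob', 'Siri']

def linear_search(names, target):
    # Compare element by element until the target is found.
    for index, name in enumerate(names):
        if name == target:
            return index
    return -1

def binary_search(sorted_names, target):
    # bisect_left returns the position where target would be inserted
    # to keep the list sorted; the list must already be sorted.
    index = bisect_left(sorted_names, target)
    if index < len(sorted_names) and sorted_names[index] == target:
        return index
    return -1

sorted_array = sorted(my_array)  # ['Bob', 'Jones', 'Lisa', 'Pete', 'Siri']

print(linear_search(my_array, 'Bob'))      # 3
print(binary_search(sorted_array, 'Bob'))  # 0
```

Binary search needs far fewer comparisons on a large list, but keeping the list sorted is what makes inserting and deleting expensive.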
STEP 2: STORING NAMES USING A HASH FUNCTION
 Now comes the special way we interact with the Hash Set we
are making.
 We want to store a name directly into its right place in the
array, and this is where the hash function comes in.
 A hash function can be made in many ways; it is up to the
creator of the Hash Table. A common way is to convert the value
into a number that equals one of the Hash Set's index numbers,
in this case a number from 0 to 9. In our example we will use the
Unicode code point of each character, sum them, and do a
modulo 10 operation to get index numbers 0-9.
STEP 2: STORING NAMES USING A HASH FUNCTION
def hash_function(value):
    sum_of_chars = 0
    for char in value:
        sum_of_chars += ord(char)

    return sum_of_chars % 10

print("'Bob' has hash code:", hash_function('Bob'))

• The character "B" has Unicode code point 66, "o" has 111, and "b" has 98.
Adding those together we get 275. Modulo 10 of 275 is 5, so "Bob" should be
stored as an array element at index 5.

• The number returned by the hash function is called the hash code.
STEP 2: STORING NAMES USING A HASH FUNCTION
 After storing "Bob" where the hash code tells us (index
5), our array now looks like this:
my_hash_set = [None,None,None,None,None,'Bob',None,None,None,None]

 We can use the hash function to find out where to store
the other names "Pete", "Jones", "Lisa", and "Siri" as
well.
 After using the hash function to store those names in the
correct position, our array looks like this:
my_hash_set = [None,'Jones',None,'Lisa',None,'Bob',None,'Siri','Pete',None]
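As a minimal sketch of how the array above comes about, the loop below stores each name at the index given by its hash code. It assumes no two names share a hash code, which happens to hold for these five names.

```python
def hash_function(value):
    sum_of_chars = 0
    for char in value:
        sum_of_chars += ord(char)
    return sum_of_chars % 10

my_hash_set = [None] * 10  # ten empty buckets

for name in ['Bob', 'Pete', 'Jones', 'Lisa', 'Siri']:
    index = hash_function(name)
    my_hash_set[index] = name  # no collision handling yet

print(my_hash_set)
# [None, 'Jones', None, 'Lisa', None, 'Bob', None, 'Siri', 'Pete', None]
```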
STEP 3: LOOKING UP A NAME USING A HASH FUNCTION
 We have now established a super basic Hash Set: we do
not have to check the array element by element anymore
to find out if "Pete" is in there. We can just use the hash
function to go straight to the right element!
 To find out if "Pete" is stored in the array, we give the
name "Pete" to our hash function and get back hash
code 8. We go directly to the element at index 8, and
there he is. We found "Pete" without checking any other
elements.
STEP 3: LOOKING UP A NAME USING A HASH FUNCTION
my_hash_set = [None,'Jones',None,'Lisa',None,'Bob',None,'Siri','Pete',None]

def hash_function(value):
    sum_of_chars = 0
    for char in value:
        sum_of_chars += ord(char)

    return sum_of_chars % 10

def contains(name):
    index = hash_function(name)
    return my_hash_set[index] == name

print("'Pete' is in the Hash Set:", contains('Pete'))
When deleting a name from our Hash Set, we can also use the hash function to go straight to
where the name is, and set that element value to None.
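Deleting can be sketched in the same way as looking up. The remove function below is an assumption for illustration (the slides do not show it), and it still assumes one name per bucket.

```python
my_hash_set = [None, 'Jones', None, 'Lisa', None, 'Bob', None, 'Siri', 'Pete', None]

def hash_function(value):
    return sum(ord(char) for char in value) % 10

def remove(name):
    # Go straight to the bucket and clear it only if it holds this name.
    index = hash_function(name)
    if my_hash_set[index] == name:
        my_hash_set[index] = None
        return True
    return False

removed = remove('Pete')
print("'Pete' was removed:", removed)
print(my_hash_set)
```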
STEP 4: HANDLING COLLISIONS
 Let's also add "Stuart" to our Hash Set.
 We give "Stuart" to our hash function, and we get the hash code 3,
meaning "Stuart" should be stored at index 3.
 Trying to store "Stuart" creates what is called a collision, because "Lisa"
is already stored at index 3.
 To fix the collision, we can make room for more elements in the same
bucket, and solving the collision problem in this way is called chaining.
We can give room for more elements in the same bucket by
implementing each bucket as a linked list, or as an array.
 After implementing each bucket as an array, to give room for potentially
more than one name in each bucket, "Stuart" can also be stored at
index 3, and our Hash Set now looks like this:
STEP 4: HANDLING COLLISIONS
my_hash_set = [
    [None],
    ['Jones'],
    [None],
    ['Lisa', 'Stuart'],
    [None],
    ['Bob'],
    [None],
    ['Siri'],
    ['Pete'],
    [None]
]
 Searching for "Stuart" in our Hash Set now means that, using the hash
function, we end up directly in bucket 3, but then we must first
check "Lisa" in that bucket before we find "Stuart" as the second
element in bucket 3.
STEP 5: HASH SET CODE EXAMPLE AND SIMULATION
 To complete our very basic Hash Set code, let's have
functions for adding and searching for names in the Hash
Set, which is now a two-dimensional array.
 Run the code example below, and try it with different
values to get a better understanding of how a Hash Set
works.
STEP 5: HASH SET CODE EXAMPLE AND SIMULATION
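The code for this slide did not survive in the text, so what follows is only a sketch of what such add and contains functions might look like. It assumes chaining with Python lists as buckets, and it uses empty lists instead of [None] for empty buckets so that no placeholder has to be skipped.

```python
# Ten buckets, each an (initially empty) list: a two-dimensional array.
my_hash_set = [[] for _ in range(10)]

def hash_function(value):
    return sum(ord(char) for char in value) % 10

def add(value):
    bucket = my_hash_set[hash_function(value)]
    if value not in bucket:       # keep elements unique
        bucket.append(value)

def contains(value):
    # Only the one bucket chosen by the hash code is searched.
    return value in my_hash_set[hash_function(value)]

for name in ['Jones', 'Lisa', 'Bob', 'Siri', 'Pete', 'Stuart']:
    add(name)

print(my_hash_set)
print("'Stuart' is in the Hash Set:", contains('Stuart'))
```

"Lisa" and "Stuart" both have hash code 3, so they end up together in bucket 3.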
HASH TABLES SUMMARIZED
 Hash Table elements are stored in storage containers called buckets.
 Every Hash Table element has a unique part called the key.
 A hash function takes the key of an element to generate a hash code.
 The hash code says what bucket the element belongs to, so now we can go
directly to that Hash Table element: to modify it, to delete it, or just to check
if it exists. Specific hash functions are explained in more detail in the Hash Set
and Hash Map sections that follow.
 A collision happens when two Hash Table elements have the same hash code,
because that means they belong to the same bucket. A collision can be solved
in two ways.
 Chaining is the way collisions are solved in this tutorial, by using arrays or
linked lists to allow more than one element in the same bucket.
 Open Addressing is another way to solve collisions. With open addressing, if we
want to store an element but there is already an element in that bucket, the
element is stored in the next available bucket. This can be done in many
different ways, but we will not explain open addressing any further here.
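For comparison only, here is a small sketch of one open addressing strategy, linear probing, where "the next available bucket" is simply the next index, wrapping around at the end. The names SIZE, table, add, and contains are chosen for this example; deletion is left out because it needs extra care with open addressing.

```python
SIZE = 10
table = [None] * SIZE  # one element per bucket, no chaining

def hash_function(value):
    return sum(ord(char) for char in value) % SIZE

def add(value):
    index = hash_function(value)
    for step in range(SIZE):
        probe = (index + step) % SIZE          # wrap around at the end
        if table[probe] is None or table[probe] == value:
            table[probe] = value
            return probe
    raise RuntimeError("Hash Set is full")

def contains(value):
    index = hash_function(value)
    for step in range(SIZE):
        probe = (index + step) % SIZE
        if table[probe] is None:               # empty bucket: not stored
            return False
        if table[probe] == value:
            return True
    return False

add('Lisa')    # hash code 3, stored at index 3
add('Stuart')  # hash code 3 is taken, stored at index 4
print(table)
```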
HASH SETS
 A Hash Set is a form of Hash Table data structure that
usually holds a large number of elements.
 Using a Hash Set we can search, add, and remove
elements really fast.
 Hash Sets are used for lookup, to check if an element is
part of a set.
HASH SETS
 A Hash Set stores unique elements in buckets according to the
element's hash code.
 Hash code: A number generated from an element's unique value
(key), to determine what bucket that Hash Set element belongs to.
 Unique elements: A Hash Set cannot have more than one element
with the same value.
 Bucket: A Hash Set consists of many such buckets, or containers,
to store elements. If two elements have the same hash code, they
belong to the same bucket. The buckets are therefore often
implemented as arrays or linked lists, because a bucket needs to
be able to hold more than one element.
FINDING THE HASH CODE
 A hash code is generated by a hash function.
 The hash function used here first adds up the Unicode code points of
every character in the name.
 After that, the hash function does a modulo 10 operation (%
10) on the sum of characters to get the hash code as a
number from 0 to 9.
 This means that a name is put into one of ten possible
buckets in the Hash Set, according to the hash code of that
name. The same hash code is generated and used when we
want to search for or remove a name from the Hash Set.
 The Hash Code gives us instant access as long as there is
just one name in the corresponding bucket.
DIRECT ACCESS IN HASH SETS
 Searching for Peter in the Hash Set above means that the hash code 2 is generated
(512 % 10), which directs us right to the bucket Peter is in. If that is the only name
in that bucket, we will find Peter right away.
 In cases like this we say that the Hash Set has constant time O(1) for searching,
adding, and removing elements, which is really fast.
 But if we search for Jens, we need to search through the other names in his bucket
before we find him. In a worst-case scenario, all names end up in the same bucket,
and the name we are searching for is the last one. In such a worst-case scenario the
Hash Set has time complexity O(n), which is the same time complexity as searching
arrays and linked lists.
 To keep Hash Sets fast, it is therefore important to have a hash function that will
distribute the elements evenly between the buckets, and to have around as many
buckets as Hash Set elements.
 Having a lot more buckets than Hash Set elements is a waste of memory, and having
far fewer buckets than Hash Set elements is a waste of time.
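The difference between the good case and the worst case can be made concrete with a small experiment. This is a sketch: constant_hash is a deliberately bad hash function invented for this example, and comparisons_to_find counts how many names in a bucket must be checked.

```python
def sum_hash(value):
    return sum(ord(char) for char in value) % 10

def constant_hash(value):
    # Deliberately poor: every name lands in bucket 0.
    return 0

def build(names, hash_function):
    buckets = [[] for _ in range(10)]
    for name in names:
        buckets[hash_function(name)].append(name)
    return buckets

def comparisons_to_find(buckets, hash_function, name):
    # Position in the bucket + 1 = number of names compared.
    bucket = buckets[hash_function(name)]
    return bucket.index(name) + 1

names = ['Pete', 'Jones', 'Lisa', 'Bob', 'Siri']
good = build(names, sum_hash)
bad = build(names, constant_hash)

print(comparisons_to_find(good, sum_hash, 'Siri'))      # 1
print(comparisons_to_find(bad, constant_hash, 'Siri'))  # 5
```

With the evenly distributing hash function "Siri" is found in one comparison, while with everything in one bucket all five names are compared, just like in a plain list.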
HASH SET IMPLEMENTATION
 Hash Sets in Python are typically done by using Python's own
set data type, but to get a better understanding of how Hash
Sets work we will not use that here.
 To implement a Hash Set in Python we create a class
SimpleHashSet.
 Inside the SimpleHashSet class we have a method __init__ to
initialize the Hash Set, a method hash_function for the hash
function, and methods for the basic Hash Set operations: add,
contains, and remove.
 We also create a method print_set to better see what the Hash
Set looks like.
HASH SET IMPLEMENTATION
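The implementation itself is not included in the text of this slide. The class below is a sketch of what SimpleHashSet could look like, using the method names listed above; the default size of 10, the list-based buckets, and the sum-of-code-points hash function are assumptions carried over from the earlier steps.

```python
class SimpleHashSet:
    def __init__(self, size=10):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one list per bucket

    def hash_function(self, value):
        # Sum of Unicode code points, modulo the number of buckets.
        return sum(ord(char) for char in value) % self.size

    def add(self, value):
        bucket = self.buckets[self.hash_function(value)]
        if value not in bucket:                   # elements stay unique
            bucket.append(value)

    def contains(self, value):
        return value in self.buckets[self.hash_function(value)]

    def remove(self, value):
        bucket = self.buckets[self.hash_function(value)]
        if value in bucket:
            bucket.remove(value)

    def print_set(self):
        for index, bucket in enumerate(self.buckets):
            print(f"Bucket {index}: {bucket}")


hash_set = SimpleHashSet()
for name in ['Pete', 'Jones', 'Lisa', 'Bob', 'Siri', 'Stuart']:
    hash_set.add(name)

hash_set.remove('Lisa')
hash_set.print_set()
print("'Stuart' is in the Hash Set:", hash_set.contains('Stuart'))
```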
HASH MAPS
 A Hash Map is a form of Hash Table data structure that
usually holds a large number of entries.
 Using a Hash Map we can search, add, modify, and remove
entries really fast.
 Hash Maps are used to find detailed information about
something.
 In the simulation below, people are stored in a Hash Map.
A person can be looked up using a person's unique social
security number (the Hash Map key), and then we can see
that person's name (the Hash Map value).
HASH MAPS
 It is easier to understand how Hash Maps work if you first have a look
at the earlier material on Hash Tables and Hash Sets. It is also
important to understand the meaning of the words below.
• Entry: Consists of a key and a value, forming a key-value pair.
• Key: Unique for each entry in the Hash Map. Used to generate a hash code
determining the entry's bucket in the Hash Map. This ensures that every entry
can be efficiently located.
• Hash Code: A number generated from an entry's key, to determine what bucket
that Hash Map entry belongs to.
• Bucket: A Hash Map consists of many such buckets, or containers, to store
entries.
• Value: Can be nearly any kind of information, like name, birth date, and address
of a person. The value can be many different kinds of information combined.
FINDING THE HASH CODE
 A hash code is generated by a hash function.
 The hash function in the simulation above takes the digits in the social security
number (not the dash), adds them together, and does a modulo 10 operation (%
10) on the sum of digits to get the hash code as a number from 0 to 9.
 This means that a person is stored in one of ten possible buckets in the Hash Map,
according to the hash code of that person's social security number. The same
hash code is generated and used when we want to search for or remove a person
from the Hash Map.
 The Hash Code gives us instant access as long as there is just one person in the
corresponding bucket.
 In the simulation above, Charlotte has social security number 123-4567. Adding
the numbers together gives us a sum 28, and modulo 10 of that is 8. That is why
she belongs to bucket 8.
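The digit-sum hash function described above can be sketched like this, assuming the key is given as a string such as '123-4567':

```python
def hash_function(key):
    sum_of_digits = 0
    for char in key:
        if char.isdecimal():      # skip the dash
            sum_of_digits += int(char)
    return sum_of_digits % 10

print(hash_function('123-4567'))  # 1+2+3+4+5+6+7 = 28, and 28 % 10 = 8
```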
DIRECT ACCESS IN HASH MAPS
 Searching for Charlotte in the Hash Map, we must use the social security number 123-4567 (the
Hash Map key), which generates the hash code 8, as explained above.
 This means we can go straight to bucket 8 to get her name (the Hash Map value), without
searching through other entries in the Hash Map.
 In cases like this we say that the Hash Map has constant time O(1) for searching, adding, and
removing entries, which is really fast compared to using an array or a linked list.
 But in a worst-case scenario, all the people are stored in the same bucket, and if the person we
are trying to find is the last person in this bucket, we need to compare with all the other social
security numbers in that bucket before we find the person we are looking for.
 In such a worst-case scenario the Hash Map has time complexity O(n), which is the same time
complexity as searching arrays and linked lists.
 To keep Hash Maps fast, it is therefore important to have a hash function that will distribute the
entries evenly between the buckets, and to have around as many buckets as Hash Map entries.
 Having a lot more buckets than Hash Map entries is a waste of memory, and having far fewer
buckets than Hash Map entries is a waste of time.
HASH MAP IMPLEMENTATION
 Hash Maps in Python are typically done by using Python's own
dictionary data type, but to get a better understanding of how
Hash Maps work we will not use that here.
 To implement a Hash Map in Python we create a class
SimpleHashMap.
 Inside the SimpleHashMap class we have a method __init__ to
initialize the Hash Map, a method hash_function for the hash
function, and methods for the basic Hash Map operations: put,
get, and remove.
 We also create a method print_map to better see what the Hash
Map looks like.
HASH MAP IMPLEMENTATION
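The implementation is not included in the text of this slide either. The class below is a sketch of what SimpleHashMap could look like with the methods listed above. The (key, value) tuples in list-based buckets are an assumption, and apart from Charlotte's entry the keys and names are made up for the example.

```python
class SimpleHashMap:
    def __init__(self, size=10):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # lists of (key, value)

    def hash_function(self, key):
        # Sum of the digits in the key (dash ignored), modulo bucket count.
        return sum(int(c) for c in key if c.isdecimal()) % self.size

    def put(self, key, value):
        bucket = self.buckets[self.hash_function(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)          # key exists: update value
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self.buckets[self.hash_function(key)]:
            if existing_key == key:
                return value
        return None                               # key not found

    def remove(self, key):
        bucket = self.buckets[self.hash_function(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                return

    def print_map(self):
        for index, bucket in enumerate(self.buckets):
            print(f"Bucket {index}: {bucket}")


hash_map = SimpleHashMap()
hash_map.put('123-4567', 'Charlotte')  # digit sum 28 -> bucket 8
hash_map.put('123-4576', 'Anna')       # hypothetical; also bucket 8
hash_map.put('123-4568', 'Thomas')     # hypothetical; bucket 9
hash_map.print_map()
print(hash_map.get('123-4567'))        # Charlotte
```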
