Report (2020 EE 395)


Name: Muhammad Kashan Danish

Reg: 2020-EE-395

ASSIGNMENT
EE-234 Data Structures and Algorithms:

Objective:
Implement a hash table for your favorite real-world application using two hashing algorithms
(MurmurHash and DJB2), and test which one is more efficient in terms of collisions. When a
collision occurs, you can use your favorite collision resolution technique.

Application:
SoundCloud playlist creation

Theory:
Hash Tables

A hash table is a data structure that gives us an efficient way to store and retrieve data in constant
time on average. It works by mapping keys to specific locations (slots) in an array, using a hash
function to calculate the index at which each key-value pair should be stored.

Hash tables are most useful for applications that require fast access, insertion, and deletion, such
as databases, caches, and dictionaries. A hash table must use a technique such as chaining or open
addressing to handle collisions (where two different keys produce the same hash value or table
index).
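The key-to-index mapping can be illustrated with a minimal sketch. The function name `bucket_index` and the character-sum hash are assumptions chosen only for illustration; they are not the hash functions compared in this report.

```python
def bucket_index(key: str, table_size: int) -> int:
    """Map a string key to a slot in range(table_size)."""
    # Deliberately simple stand-in hash: the sum of the character codes.
    return sum(ord(ch) for ch in key) % table_size


slots = [None] * 8
for title in ["intro", "outro"]:
    # Each key is stored at the index computed from its hash.
    slots[bucket_index(title, len(slots))] = title

# Anagrams have the same character sum, so they land in the same slot:
# this is a collision that the table must resolve.
same_slot = bucket_index("listen", 8) == bucket_index("silent", 8)
```

A weak hash like this one collides on every pair of anagrams, which is why the choice of hash function matters.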

MurmurHash

MurmurHash is a non-cryptographic hash function created by Austin Appleby in 2008. It works by
splitting the input key into pieces, applying bitwise operations to each piece, and combining the
results into the hash value that it returns. The main benefit of MurmurHash is its ability to
distribute keys evenly across the hash table, which reduces collisions and improves performance. It is
also easy to use.

It produces uniformly distributed hash values, which helps to minimize collisions, and it is designed
to be very fast.

MurmurHash exists in several versions, such as MurmurHash1, MurmurHash2, and MurmurHash3.
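One of the bitwise operations that MurmurHash relies on is a 32-bit left rotation. The helper below is a small sketch of that single step; the name `rotl32` is an assumption for illustration, and Python integers are unbounded, so the result has to be masked to 32 bits explicitly.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, shift: int) -> int:
    """Rotate a 32-bit value left by `shift` bits (0 < shift < 32)."""
    value &= MASK32
    # Bits pushed out on the left re-enter on the right.
    return ((value << shift) | (value >> (32 - shift))) & MASK32


rotated = rotl32(0x12345678, 8)  # the top byte moves to the bottom
```

Unlike a plain shift, a rotation discards no bits, so every input bit can still influence the final hash.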


DJB2

DJB2 is a simple and efficient hash function created by Daniel J. Bernstein, a prominent computer
scientist. DJB2 is used in applications that require quick string hashing, such as compilers,
interpreters, and dynamic programming languages.

The DJB2 algorithm starts from the value 5381 and iterates over each character of the input string.
At each step it multiplies the hash by 33, computed as a 5-bit left shift plus an addition, and adds
the character code. Despite its simplicity, DJB2 has been found to work well in many practical
applications, particularly for small datasets.

While DJB2 is not as sophisticated as more modern hash functions like MurmurHash, its ease of
implementation and surprisingly good distribution make it a popular choice for lightweight applications
where cryptographic strength is not required.
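To make the per-character update concrete, the sketch below records the hash after every step. The helper name `djb2_steps` is an assumption for illustration; it masks to 32 bits at each step, which gives the same final value as masking once at the end.

```python
def djb2_steps(text: str) -> list[int]:
    """Return the DJB2 hash value before and after each character."""
    h = 5381
    steps = [h]
    for ch in text:
        # (h << 5) + h is h * 33; then add the character code.
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
        steps.append(h)
    return steps


trace = djb2_steps("ab")  # [5381, 177670, 5863208]
```

For the key "ab", the first step is 5381 * 33 + 97 = 177670 and the second is 177670 * 33 + 98 = 5863208.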

My Approach:
SoundCloud is a popular online audio platform and music sharing service that allows users to
upload, promote, and share audio tracks. Founded in 2007 by Alexander Ljung and Eric Wahlforss in
Berlin, Germany, SoundCloud has grown into one of the largest and most influential music streaming
platforms in the world, particularly among independent and emerging artists. SoundCloud has had a
significant impact on the music industry by democratizing music distribution and providing a platform
where emerging artists can reach a global audience without the need for traditional industry
gatekeepers. The platform is often credited with shaping the "SoundCloud Rap" subgenre, a style of
music characterized by its lo-fi production and raw, emotive lyrics, which gained prominence in the
mid-2010s.

Coding Environment:
• I am using Python version 3.12.1 for programming.
• The Integrated Development Environment (IDE) of choice is Visual Studio Code from
Microsoft.
• This implementation is purely based on Python's built-in capabilities. You don't need to
install any additional libraries to run this code. Just ensure you have Python installed on your
system.

Requirements:
• Open Addressing: Linear probing is a type of open addressing technique where collisions
are resolved by probing for the next available slot in the hash table, typically in a linear
sequence, for example by incrementing the index.
• Uniform Hash Function: A good hash function is essential for reducing the likelihood of
collisions. It should distribute keys uniformly across the hash table to minimize clustering,
which can degrade performance.
• Table Size and Load Factor: The hash table should be appropriately sized relative to the
number of elements it will store. The load factor, the ratio of the number of stored elements
to the table size, should be kept low, typically below 0.7, to reduce the frequency of collisions
and maintain efficient probing.
• Handling Full Table: When the table is full or nearly full, linear probing may become
inefficient. Implementing a resizing mechanism to expand the table size and rehash the
elements is necessary to maintain performance.
• Clustering Management: Linear probing can lead to clustering, where a sequence of filled
slots makes collisions more frequent. This is a known drawback of linear probing. Choosing a
hash function that minimizes clustering and using techniques like double hashing can help
alleviate this issue.
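The load factor and resizing requirements above could be sketched as follows. The names `load_factor`, `needs_resize`, and `grow`, and the choice to double the table size, are assumptions for illustration; the table layout (a list of `(key, value)` tuples or `None`) follows the code in this report.

```python
def load_factor(count: int, size: int) -> float:
    """Ratio of stored elements to table size."""
    return count / size


def needs_resize(count: int, size: int, max_load: float = 0.7) -> bool:
    """True once the load factor exceeds the chosen threshold."""
    return load_factor(count, size) > max_load


def grow(table, hash_function):
    """Return a table of double the size with every pair rehashed."""
    new_size = len(table) * 2
    new_table = [None] * new_size
    for entry in table:
        if entry is None:
            continue
        key, _value = entry
        # Indices depend on the table size, so each key is rehashed.
        index = hash_function(key) % new_size
        while new_table[index] is not None:
            index = (index + 1) % new_size  # linear probing
        new_table[index] = entry
    return new_table
```

Every key has to be reinserted because its index is computed modulo the table size, so the old positions are no longer valid after the table grows.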

Code Documentation:
Code documentation involves writing descriptions and explanations within the source code to
make it easier for developers to understand, maintain, and use the code. This includes comments,
docstrings, and external documentation. Proper documentation explains the purpose of functions,
classes, variables, and complex logic, enhancing readability. It should be clear, concise, and up-to-date,
helping both the original developer and others who might work on the code later.
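As a small illustration of these practices, the sketch below documents a one-line helper with a docstring and an inline comment. The function `next_index` is hypothetical and is not part of the code in this report.

```python
def next_index(index: int, size: int) -> int:
    """Return the slot that linear probing visits after `index`.

    Args:
        index: Current slot, in range(size).
        size: Total number of slots in the table.

    Returns:
        The following slot, wrapping to 0 after the last one.
    """
    # The modulo handles the wrap-around at the end of the array.
    return (index + 1) % size
```

The docstring states what the function does and what its arguments mean, while the comment explains why the modulo is there.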

Results:
MurmurHash result: 3 collisions
DJB2 result: 7 collisions

Comparison:
As we can see, MurmurHash produced 3 collisions while DJB2 produced 7 for the same keys and table
size, so here MurmurHash is the more efficient of the two in terms of collisions.
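The collision counter in the code counts every occupied slot visited during probing, so it depends on insertion order and on clustering. A complementary measure, sketched below under that assumption, counts only the keys whose initial (home) slot is already claimed by an earlier key. The name `home_slot_collisions` is hypothetical and is not part of the original implementation.

```python
from collections import Counter


def home_slot_collisions(keys, hash_function, size: int) -> int:
    """Count keys whose home slot is shared with an earlier key."""
    homes = Counter(hash_function(key) % size for key in keys)
    # In a slot claimed by n keys, n - 1 of them collide.
    return sum(n - 1 for n in homes.values())


# Toy check that uses the key length as a stand-in hash function.
toy = home_slot_collisions(["a", "b", "cc", "ddd", "eee"], len, 10)
```

Reporting both numbers would separate the quality of the hash function from the extra cost introduced by linear probing.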

Collision Resolution:
The collision resolution technique used in the provided hash table implementation is open addressing
with linear probing.

How Linear Probing Works:


• When a collision occurs (that is, when the hash function maps a key to an index that is already
occupied), the algorithm searches for the next available slot by incrementing the index by 1.
• If the incremented index is still occupied, the algorithm continues to probe by incrementing the
index again until an empty slot is found.
• This method ensures that all keys are stored directly in the hash table itself, rather than in
separate data structures like linked lists.

Key Points:
• Linear Probing: The index is incremented by 1 each time a collision occurs, wrapping around
to the beginning of the table if necessary.
• Open Addressing: All elements are stored within the hash table array, with no separate chains
or lists used.
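The probing order described above can be sketched in isolation. The helpers `probe_sequence` and `first_free_slot` are assumptions for illustration; the full implementation in the next section folds the same logic into its `insert` method.

```python
def probe_sequence(start: int, size: int):
    """Yield each slot index once, starting at `start` and wrapping around."""
    for step in range(size):
        yield (start + step) % size


def first_free_slot(table, start: int):
    """Return the first empty slot found by linear probing, or None if full."""
    for index in probe_sequence(start, len(table)):
        if table[index] is None:
            return index
    return None


order = list(probe_sequence(3, 5))  # [3, 4, 0, 1, 2]
```

Starting at slot 3 of a five-slot table, the probe visits 3 and 4, then wraps around to 0, 1, and 2.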
Code:
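def murmur_hash(key, seed=0):
    # 32-bit MurmurHash-style function using the MurmurHash3 constants.
    key = bytearray(key.encode())
    c1 = 0xcc9e2d51
    c2 = 0x1b873593
    r1 = 15
    r2 = 13
    m = 5
    n = 0xe6546b64

    hash_val = seed
    length = len(key)

    # Process the key in 4-byte little-endian chunks.
    # Note: a final chunk shorter than 4 bytes goes through the same mixing
    # here, while the reference MurmurHash3 treats the tail separately, so
    # results may differ from the reference for such keys.
    for i in range(0, length, 4):
        k = int.from_bytes(key[i:i+4], 'little')
        k = k * c1 & 0xFFFFFFFF
        k = (k << r1 | k >> (32 - r1)) & 0xFFFFFFFF  # rotate left by r1
        k = k * c2 & 0xFFFFFFFF

        hash_val = (hash_val ^ k) & 0xFFFFFFFF
        hash_val = (hash_val << r2 | hash_val >> (32 - r2)) & 0xFFFFFFFF
        hash_val = (hash_val * m + n) & 0xFFFFFFFF

    # Finalization: mix in the length, then scramble the bits.
    hash_val = (hash_val ^ length) & 0xFFFFFFFF
    hash_val ^= hash_val >> 16
    hash_val = (hash_val * 0x85ebca6b) & 0xFFFFFFFF
    hash_val ^= hash_val >> 13
    hash_val = (hash_val * 0xc2b2ae35) & 0xFFFFFFFF
    hash_val ^= hash_val >> 16

    return hash_val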

def djb2_hash(key):
    hash_val = 5381
    for char in key:
        hash_val = ((hash_val << 5) + hash_val) + ord(char)
    return hash_val & 0xFFFFFFFF

class HashTable:
    def __init__(self, size, hash_function):
        self.size = size
        self.table = [None] * size
        self.hash_function = hash_function
        self.collisions = 0  # occupied slots visited while inserting

    def insert(self, key, value):
        index = self.hash_function(key) % self.size
        original_index = index
        while self.table[index] is not None:
            if self.table[index][0] == key:
                # Key already present: update its value.
                self.table[index] = (key, value)
                return
            self.collisions += 1
            index = (index + 1) % self.size  # linear probing with wrap-around
            if index == original_index:
                raise Exception("HashTable is full")

        self.table[index] = (key, value)

    def search(self, key):
        index = self.hash_function(key) % self.size
        original_index = index
        while self.table[index] is not None:
            if self.table[index][0] == key:
                return self.table[index][1]
            index = (index + 1) % self.size
            if index == original_index:
                return None
        return None

def test_hash_table(hash_function):
    sample_data = [
        "song1", "song2", "song3", "song4", "song5",
        "artist1", "artist2", "artist3", "trackID1", "trackID2"
    ]

    # 10 keys in a 10-slot table: the load factor reaches 1.0 here.
    ht = HashTable(size=10, hash_function=hash_function)

    for i, key in enumerate(sample_data):
        ht.insert(key, f"value_{i}")

    results = {
        "hash_function": hash_function.__name__,
        "collisions": ht.collisions,
        "table_content": ht.table
    }
    return results
# Test with MurmurHash
murmur_result = test_hash_table(murmur_hash)
print("MurmurHash Results:")
print(f"Collisions: {murmur_result['collisions']}")
for i, item in enumerate(murmur_result['table_content']):
    print(f"Index {i}: {item}")

print("\n" + "-" * 40 + "\n")

# Test with DJB2 Hash
djb2_result = test_hash_table(djb2_hash)
print("DJB2 Hash Results:")
print(f"Collisions: {djb2_result['collisions']}")
for i, item in enumerate(djb2_result['table_content']):
    print(f"Index {i}: {item}")
