DS Unit-II
What is a dictionary:
A dictionary is a general-purpose data structure for storing a group of objects. A dictionary is a
collection of key-value pairs: it is a set of keys, and each key has a single associated value. When
presented with a key, the dictionary returns the associated value.
Keys in a dictionary must be unique; an attempt to create a duplicate key will typically overwrite
the existing value for that key.
Note that there is a difference (which may be important) between a key not existing in a dictionary
and the key existing but with its corresponding value being null.
Dictionaries are often implemented as hash tables.
Usage: The concept of a key-value store is widely used in various computing systems, such as
caches and high-performance databases.
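A minimal sketch using Python's built-in dict as the key-value store (the phone_book example below is purely illustrative):

    # A dictionary maps each unique key to a single associated value.
    phone_book = {"alice": "555-0101", "bob": "555-0102"}
    print(phone_book["alice"])         # presenting a key returns its value: 555-0101
    phone_book["alice"] = "555-0199"   # a duplicate key overwrites the existing value
    # A missing key is different from a key whose value is null (None):
    print("carol" in phone_book)       # False - the key does not exist
    phone_book["carol"] = None         # now the key exists, but its value is null
    print("carol" in phone_book)       # True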
SkipList:
The skip list is a probabilistic data structure that is built upon the general idea of a linked list. The
skip list uses probability to build subsequent layers of linked lists upon an original linked list. Each
additional layer of links contains fewer elements.
Ex: We can think of the skip list like a subway system. There is one train that stops at every
single stop. However, there is also an express train. The express train does not visit any unique
stops, but it stops at far fewer of them. This makes the express train an attractive option if you
know where it stops.
Why we need Skiplist:
The worst-case search time for a sorted linked list is O(n) as we can only linearly traverse the list
and cannot skip nodes while searching.
Can we search in a sorted linked list in better than O(n) time? The answer is Skip List.
Skip lists are a linked-list-like structure that allows faster search, insertion, and deletion.
They organize an ordered list hierarchically so that we do not need to scan all elements during a search.
The expected time complexity of search, insertion, and deletion in a skip list is O(log n), and its space complexity is O(n).
What is Skiplist:
A skip list for a set S of n distinct keys is a series of lists S0, S1, …, Sh (where h is the height of the skip list) such that
Each list Si contains the two special keys +∞ and -∞
List S0 contains all the keys of S in nondecreasing order
Each list is a subsequence (subset) of the previous one, i.e., S0 ⊇ S1 ⊇ … ⊇ Sh
List Sh (i.e. the top layer) contains only the two special keys
Search Operation:
Steps for search a key x in a skip list:
Start at the first position of the top list
At the current position p, we compare x with y = key(next(p))
x = y: Return next(p)
x > y: Scan forward
x < y: Drop down
Repeat the above step. (If "drop down" goes past the bottom list, return null.)
Steps for Search Operation:
Element to be searched: 78.
1. At first, our pointer is at the top level of the skip list (S3 here, or whatever the top layer is).
2. We compare the search element with the key of the next node: if our element is larger,
we scan forward in the same level; otherwise, if the next node's key is larger,
we drop our pointer down to the next level.
In our example 78 < ∞, so we drop down to the level below, S2.
3. We do the same operation in level S2: 78 > 31, so we scan forward and our pointer moves to 31.
We then compare 78 with the next key (∞); 78 < ∞, so we drop down to the next level, S1.
4. In level S1 the same operation gives: 78 > 34 → scan forward; 78 > 64 → scan forward; 78 < ∞ → drop down to S0.
5. In level S0, the result is 78 = 78 → element found.
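A minimal sketch of this search in Python; the SkipNode layout with one forward pointer per level and the function name search are illustrative assumptions, not a fixed implementation:

    class SkipNode:
        """A node that appears in lists S0 .. S(level) of the skip list."""
        def __init__(self, key, level):
            self.key = key
            self.forward = [None] * (level + 1)   # forward[i] = next node in list Si

    def search(head, top_level, x):
        """head is the -infinity sentinel present in every list; returns the node with key x, or None."""
        p = head
        for level in range(top_level, -1, -1):    # start at the first position of the top list
            while p.forward[level] is not None and p.forward[level].key < x:
                p = p.forward[level]              # x > y: scan forward
            # next key is >= x (or the level is exhausted): drop down
        candidate = p.forward[0]
        if candidate is not None and candidate.key == x:
            return candidate                      # x = y: element found
        return None                               # dropped past the bottom list: not present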
Insertion:
For n nodes, the maximum number of levels is about log₂(n) + 1.
At the current position p, we compare the key y of the next element with the element x to be
inserted:
If x > y → scan forward.
If x < y → drop down. We drop down until we reach the level i (e.g. i = 2) returned by the
randomized (coin-flip) algorithm; then we insert the element and update the links to the previous
and forward elements of that level, and do the same in every level from Si down to S0.
In this way, all elements are inserted into the skip list at their proper positions in sorted order.
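A sketch of the coin-flip level selection mentioned above (the name random_level and the explicit cap are illustrative):

    import random

    def random_level(max_level):
        """Flip a fair coin; every head promotes the new element one level higher."""
        level = 0
        while level < max_level and random.random() < 0.5:
            level += 1
        return level

    # For n = 16 nodes the height is capped at about log2(16) + 1 = 5 levels, e.g.:
    # new_node = SkipNode(key, random_level(max_level=5))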
Deletion:
Suppose, we want to delete element x from the skip list.
To remove an item with key x from a skip list, we proceed as follows:
We search for x in the skip list and find the positions p0, p1 , …, pi of the items with
key x, where position pj is in list Sj
We remove positions p0, p1 , …, pi from the lists S0, S1, … , Si
We remove all but one list containing only the two special keys
Example: remove key 34
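A sketch of the removal, reusing the SkipNode layout from the search sketch above (the function name is illustrative; trimming now-empty top lists is omitted):

    def remove(head, top_level, x):
        """Unlink the node with key x from every list S0 .. Si in which it appears."""
        found = False
        for level in range(top_level, -1, -1):
            p = head
            while p.forward[level] is not None and p.forward[level].key < x:
                p = p.forward[level]
            nxt = p.forward[level]                    # position p_level of the item, if present
            if nxt is not None and nxt.key == x:
                p.forward[level] = nxt.forward[level] # remove it from list S_level
                found = True
        return found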
Hash table (Hash Map):
What is hashing:
Hashing is a technique that is used to uniquely identify a specific object from a group of similar
objects. Some examples of how hashing is used in our lives include:
In universities, each student is assigned a unique roll number that can be used to retrieve
information about them.
In libraries, each book is assigned a unique number that can be used to determine information about
the book, such as its exact position in the library or the users it has been issued to etc.
In both these examples the students and books were hashed to a unique number.
Assume that you have an object and you want to assign a key to it to make searching easy. To store
the key/value pair, you can use a simple array-like data structure where keys (integers) can be
used directly as an index to store values. However, in cases where the keys are large and cannot be
used directly as an index, you should use hashing techniques.
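For example, a minimal modulo hash function for integer keys (the table size of 7 matches the probing examples below; the names are illustrative):

    TABLE_SIZE = 7

    def hash_value(key):
        """Map an arbitrarily large integer key to a small slot index 0 .. TABLE_SIZE-1."""
        return key % TABLE_SIZE

    print(hash_value(50))    # 1
    print(hash_value(700))   # 0
    print(hash_value(85))    # 1 -> collides with 50; see the collision handling techniques below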
Different time complexities of operations on a hash table are as follows: search, insertion, and deletion each take O(1) time on average and O(n) time in the worst case (when many keys collide into the same slot).
What is Collision:
Since a hash function maps a big key to a small number, there is a possibility that two different keys
result in the same value.
The situation where a newly inserted key maps to an already occupied slot in hash table is called
collision and must be handled using some collision handling techniques.
A. Separate Chaining:
To handle collisions, the hash table has a technique known as separate chaining. Separate
chaining is defined as a method by which linked lists of values are built in association with each
location within the hash table when a collision occurs.
So, in place of the collision error which occurred in the figure, the cell now contains a linked list
containing the strings 'Janet' and 'Martha', as seen in the new figure.
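A minimal sketch of separate chaining (Python lists stand in for the per-slot linked lists; the class and method names are illustrative):

    class ChainedHashTable:
        def __init__(self, size=7):
            self.size = size
            self.buckets = [[] for _ in range(size)]    # one chain per slot

        def insert(self, key, value):
            chain = self.buckets[hash(key) % self.size]
            for i, (k, _) in enumerate(chain):
                if k == key:                  # duplicate key: overwrite the value
                    chain[i] = (key, value)
                    return
            chain.append((key, value))        # collision: extend the chain in this slot

        def get(self, key):
            chain = self.buckets[hash(key) % self.size]
            for k, v in chain:
                if k == key:
                    return v
            return None

    table = ChainedHashTable()
    table.insert("Janet", 1)
    table.insert("Martha", 2)   # may land in the same slot as "Janet" and share its chain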
B. Open addressing
1. Linear Probing:
In linear probing, we linearly probe for the next slot: the gap between two successive probes is 1,
as in the example below.
Let hash_value be the slot index computed using the hash function and table_size be the size of the table:
hash_value = key % table_size
index = (hash_value + i) % table_size
If slot (hash + 0) % table_size is full, then we try (hash + 1) % table_size
If slot (hash + 1) % table_size is also full, then we try (hash + 2) % table_size
If slot (hash + 2) % table_size is also full, then we try (hash + 3) %table_size
. . . . . . . . . .
. . . . . . . . . .
If slot (hash + n-2) % table_size is also full, then we try (hash + (n-1)) % table_size.
Steps:
Initially the hash table will be empty.
For the first element 50, calculate the hash value using key % table_size → 50 % 7 → 1.
Then the probing formula is applied; in the first iteration i = 0:
index → (hash_value + i) % table_size → (1 + 0) % 7 → 1.
At index 1 the element 50 is inserted.
For the second element 700, calculate the hash value using key % table_size → 700 % 7 → 0.
Then the probing formula is applied; in the first iteration i = 0:
index → (hash_value + i) % table_size → (0 + 0) % 7 → 0.
At index 0 the element 700 is inserted.
Likewise:
key % table_size → 76 % 7→6.
index → ((hash_value) + i) % table_size → (6+0) % 7 →6.
At index 6 the element 76 is inserted.
Collision Occurs when we try to insert 85.
key % table_size → 85 % 7→1.
index → ((hash_value) + i) % table_size → (1+0) % 7 →1.
At index 1 there is a collision with another element.
So, we increment i and probe again; we keep incrementing i (wrapping around with % table_size)
until an empty slot is found.
index → (hash_value + i) % table_size → (1 + 1) % 7 → 2.
At index 2 the element 85 is inserted.
Likewise, we insert all the elements.
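A sketch of linear-probing insertion matching the worked example above (table size 7, keys 50, 700, 76, 85; the function name is illustrative):

    def linear_probe_insert(table, key):
        """Insert key into an open-addressing table, probing slots (hash_value + i) % table_size."""
        size = len(table)
        hash_value = key % size
        for i in range(size):
            index = (hash_value + i) % size
            if table[index] is None:          # empty slot found
                table[index] = key
                return index
        raise RuntimeError("hash table is full")

    table = [None] * 7
    for key in (50, 700, 76, 85):
        linear_probe_insert(table, key)
    print(table)   # [700, 50, 85, None, None, None, 76]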
2. Quadratic Probing:
In quadratic probing, we probe for the next slot at quadratically increasing offsets: the gap
between successive probes grows as i² (1, 4, 9, …).
Let hash_value be the slot index computed using the hash function and table_size be the size
of the table:
hash_value = key % table_size
index = (hash_value + i * i) % table_size
As in linear probing, we insert at the index positions returned by the formula, but with the offset i*i instead of i.
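A sketch of the quadratic-probing variant (same assumptions as the linear-probing sketch; only the offset changes to i*i):

    def quadratic_probe_insert(table, key):
        """Insert key, probing slots (hash_value + i*i) % table_size for i = 0, 1, 2, ..."""
        size = len(table)
        hash_value = key % size
        for i in range(size):
            index = (hash_value + i * i) % size
            if table[index] is None:
                table[index] = key
                return index
        raise RuntimeError("no empty slot found along the probe sequence")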
3. Double Hashing:
Double hashing uses the idea of applying a second hash function to the key when a collision occurs.
The result of the second hash function will be the number of positions from the point of collision
at which to insert.
Whenever a collision occurs with the first hash function, we apply the second hash function.
A popular second hash function is: Hash2(key) = P - (key % P), where P is a prime number that is
smaller than the size of the table (P < table_size).
The index value in double hashing is calculated as: index = (Hash1(key) + i * Hash2(key)) % table_size.
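A sketch of double-hashing insertion (the table size 7 and the prime P = 5 are illustrative assumptions):

    TABLE_SIZE = 7
    P = 5                                     # a prime smaller than the table size

    def hash1(key):
        return key % TABLE_SIZE

    def hash2(key):
        return P - (key % P)                  # never 0, so each probe step actually moves

    def double_hash_insert(table, key):
        """Probe slots (hash1(key) + i * hash2(key)) % TABLE_SIZE until an empty one is found."""
        for i in range(TABLE_SIZE):
            index = (hash1(key) + i * hash2(key)) % TABLE_SIZE
            if table[index] is None:
                table[index] = key
                return index
        raise RuntimeError("no empty slot found along the probe sequence")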
Advantages of Double hashing:
The probe sequence depends on the key through a second hash function, so keys are spread more
uniformly and both primary and secondary clustering are minimized.
Disadvantages:
Like all other forms of open addressing, double hashing becomes linear as the hash table
approaches maximum capacity.
Separate Chaining vs. Open Addressing:
Separate chaining:
Keys are stored inside the hash table as well as outside the hash table (in the chains).
The number of keys to be stored in the hash table can even exceed the size of the hash table.
Some buckets of the hash table are never used, which leads to wastage of space.
Open addressing:
All the keys are stored only inside the hash table; no key is present outside the hash table.
The number of keys to be stored in the hash table can never exceed the size of the hash table.
Buckets may be used even if no key maps to those particular buckets.
Rehashing:
Example: The hash function used is h(x) = x % 5. Rehash the entries into a new hash table whose
size is doubled.
Before that, we calculate the load factor (number of elements / table size) whenever we insert a new
element into the hash table. If it is greater than its pre-defined value (or the default value of 0.75
if none is given), then we rehash.
Now, rehash the key values from the old hash table into the new one using the hash function → h(x) = x % 10.
Why rehashing is needed:
As the load factor grows, the hash table might not give the required time complexity of O(1).
Hence, rehashing must be done, increasing the size of the bucket array so as to reduce the load factor
and restore the time complexity.
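A sketch of load-factor-triggered rehashing (the 0.75 threshold and the doubling policy follow the text above; the class and method names are illustrative):

    class RehashingTable:
        LOAD_FACTOR_LIMIT = 0.75

        def __init__(self, size=5):
            self.size = size
            self.count = 0
            self.buckets = [[] for _ in range(size)]     # separate chaining underneath

        def insert(self, key, value):
            self.buckets[key % self.size].append((key, value))    # h(x) = x % size
            self.count += 1
            if self.count / self.size > self.LOAD_FACTOR_LIMIT:   # check the load factor
                self._rehash()

        def _rehash(self):
            """Double the table size and re-insert every entry using the new hash function."""
            old_entries = [pair for chain in self.buckets for pair in chain]
            self.size *= 2                               # e.g. 5 -> 10, so h(x) becomes x % 10
            self.buckets = [[] for _ in range(self.size)]
            for key, value in old_entries:
                self.buckets[key % self.size].append((key, value))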
Extendible hashing:
Extendible hashing is a form of dynamic hashing, since the table grows dynamically as we go on
inserting elements.
It is a hash table in which the hash function uses the last few bits of the key, and the table
(directory) refers to buckets.
Features of Extendible Hashing: a directory of pointers (with a global depth) refers to buckets (each
with a local depth); the directory grows by doubling, and only the bucket that overflows is split and rehashed.
Step 2 – Convert into binary format: Convert the data element in Binary form. For string elements,
consider the ASCII equivalent integer of the starting character and then convert the integer into
binary form. Since we have 49 as our data element, its binary form is 110001.
Step 3 – Check Global Depth of the directory. Suppose the global depth of the Hash-directory is 3.
Step 4 – Identify the Directory: Consider the ‘Global-Depth’ number of LSBs in the binary number
and match it to the directory id.
E.g. : The binary obtained is: 110001 and the global-depth is 3. So, the hash function will return 3
LSBs of 110001 viz. 001.
Step 5 – Navigation: Now, navigate to the bucket pointed by the directory with directory-id 001.
Step 6 – Insertion and Overflow Check: Insert the element and check if the bucket overflows. If an
overflow is encountered, go to step 7 followed by Step 8, otherwise, go to step 9.
Step 7 – Tackling Over Flow Condition during Data Insertion: Many times, while inserting data in
the buckets, it might happen that the Bucket overflows. In such cases, we need to follow an
appropriate procedure to avoid mishandling of data.
First, Check if the local depth is less than or equal to the global depth. Then choose one of the cases
below.
Case1: If the local depth of the overflowing Bucket is equal to the global depth, then
Directory Expansion, as well as Bucket Split, needs to be performed. Then increment the global
depth and the local depth value by 1. And, assign appropriate pointers.
Directory expansion will double the number of directories present in the hash structure.
Case2: In case the local depth is less than the global depth, then only Bucket Split takes
place. Then increment only the local depth value by 1. And, assign appropriate pointers.
Step 8 – Rehashing of Split Bucket Elements: The Elements present in the overflowing bucket that
is split are rehashed w.r.t the new global depth of the directory.
Step 9 – The element is successfully hashed.
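A sketch of how the directory id is obtained from the key's least-significant bits (Steps 2 to 4 above; the function name is illustrative):

    def directory_index(key, global_depth):
        """Return the 'global_depth' least-significant bits of the key's binary form."""
        return key & ((1 << global_depth) - 1)

    # Example from the steps above: key 49 = 0b110001 with global depth 3
    print(bin(49))                                   # 0b110001
    print(format(directory_index(49, 3), '03b'))     # 001 -> navigate to the bucket for directory id 001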
When the hash table (with a 2-bit directory) overflows:
Advantages of Extendible hashing:
It gives the ability to design a hash function that automatically adapts underneath when
the hash table is resized.
Secondly, there is no need to recalculate the new bucket address for all the records in the
hash table. For example, as explained in Linear Hashing, we split an existing bucket B,
create a new bucket B*, and redistribute B’s contents between B and B*.
This implies that rehash or redistribution is limited only to the particular bucket that is being
split. There is absolutely no need to touch items in all the other buckets in the hash table.
However, the size of the directory doubles each time the table is extended
(an exponential rate of increase).