Module6.Ds
A linear list representation of a dictionary involves using a simple list structure (like an
array or a linked list) to store key-value pairs. Each entry in the list consists of two parts: a
unique key and an associated value. Let's walk through the implementation step by step:
Step 1: Initialization
When you initialize the dictionary, you start with an empty list. This list will hold all
the key-value pairs.
dictionary = []
Step 2: Insertion
Search for the Key: Iterate through the list to check whether the key already exists.
Update or Append: If the key exists, update its value; otherwise, add the new key-value
pair to the end of the list.
Step 3: Search
Iterate Through the List: Go through each key-value pair in the list.
Compare Keys: Check if the current key matches the search key.
Return Value if Found: If a match is found, return the associated value.
def search(key):
    for k, v in dictionary:      # each entry is a (key, value) pair
        if k == key:
            return v
    return None                  # the key was not found
Step 4: Deletion
Search for the Key: Iterate through the list to find the key.
Remove the Pair: Once the key is found, remove the key-value pair from the list.
def delete(key):
    for i, (k, _) in enumerate(dictionary):
        if k == key:
            del dictionary[i]    # remove the pair, then stop iterating
            return
Step 5: Traversal
Iterate Through the List: Go through each key-value pair in the list.
Process Each Pair: Perform the desired operation (like printing) on each pair.
def traverse(process=print):     # process is any function taking (key, value)
    for k, v in dictionary:
        process(k, v)
Step 6: Update
Updating a value for a specific key follows the same procedure as insertion. If the key
exists, its value is updated; otherwise, the key-value pair is added.
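The insert-or-update rule above can be sketched in Python as follows. This is a minimal
illustration, assuming each entry is stored as a two-element list [key, value] so that the
value can be changed in place; the function name put is chosen here for illustration and
does not come from these notes.

```python
dictionary = []  # the linear list of [key, value] pairs


def put(key, value):
    """Insert a new pair, or update the value if the key already exists."""
    for pair in dictionary:
        if pair[0] == key:           # key found: update in place
            pair[1] = value
            return
    dictionary.append([key, value])  # key not found: add a new pair


put("apple", 3)
put("pear", 5)
put("apple", 7)                      # updates the existing entry
print(dictionary)                    # [['apple', 7], ['pear', 5]]
```

Because every call scans the list first, both insertion and update take linear time.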
Search Time: A search takes O(n) time in the worst case, which is inefficient for large
numbers of entries.
Insertion Time: Insertion first has to check whether the key already exists, which takes
linear time. If the list is kept in sorted order, finding the correct place to insert a
new key also takes linear time.
Deletion Time: Removing an entry requires a search followed by the removal itself. The
search is linear, and in an array the removal also has to shift the entries that follow.
A Skip List is an advanced data structure that allows for fast search within an ordered
sequence of elements, providing a clever alternative to balanced trees. It is effectively a
linked-list that adds multiple layers on top of the standard list to enable faster search times.
Let's take an example to understand the working of the skip list. In this example, we have
14 nodes, such that these nodes are divided into two layers, as shown in the diagram.
The lower layer is a common line that links all nodes, and the top layer is an express line
that links only the main nodes, as you can see in the diagram.
Suppose you want to find 47 in this example. You start the search from the first node
of the express line and keep moving along the express line until you reach a node that is
equal to 47 or greater than 47.
You can see in the example that 47 does not exist in the express line, so you step back to
the last express node that is less than 47, which is 40. From 40, you drop down to the
normal line and continue searching for 47 there, as shown in the diagram.
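The two-lane search just described can be sketched in Python. The 14 key values below are
hypothetical, since the diagram's values are not reproduced here; only 40 and 47 come from
the text. The express line is modeled as a list of indices into the normal line, and the
function name is illustrative.

```python
def two_lane_search(normal, express, target):
    """Return the index of target in the sorted list `normal`, or -1.

    `express` holds ascending indices into `normal` (the "main" nodes).
    """
    start = 0
    for idx in express:
        if normal[idx] > target:
            break                 # overshot: stop riding the express line
        start = idx               # last express node with key <= target
    # Continue on the normal line from the node where we dropped down.
    for i in range(start, len(normal)):
        if normal[i] == target:
            return i
        if normal[i] > target:
            break                 # passed the place where target would be
    return -1


# Hypothetical data: 14 sorted keys, every second one is on the express line.
normal = [10, 13, 20, 25, 30, 36, 40, 42, 47, 50, 57, 60, 65, 70]
express = [0, 2, 4, 6, 9, 11, 13]     # keys 10, 20, 30, 40, 50, 60, 70

print(two_lane_search(normal, express, 47))   # 8
```

For 47, the search rides the express line up to 40, sees that 50 is too large, and then
walks the normal line from 40 through 42 to 47.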
Note: Once you find such a node on the express line, you follow its pointer down to the
normal line and then continue the search for the target node on the normal line.
Base Layer:
The bottom layer is a standard linear linked list that contains all the elements in the set,
sorted by key.
Express Layers:
Above the base layer are one or more layers of "express lanes." Each layer is a linked list
that skips over some elements from the list below it. The topmost layer has the fewest
elements, and each layer doubles (or varies by some factor) the number of elements it
skips compared to the layer directly beneath it.
A Skip List is an efficient, probabilistic data structure that enables fast search,
insertion, and deletion operations, akin to balanced trees like AVL or Red-Black trees.
Here's a step-by-step explanation of how a Skip List is used to represent a dictionary:
A Skip List is composed of several layers of linked lists, where each higher layer provides
a "shortcut" through the lower layers. The bottom layer (Layer 0) contains all the
elements, and each successive layer contains a subset of these elements, chosen randomly.
Each node in the Skip List contains pointers to the next node in the same layer and down
to the same node in the layer immediately below it.
Step 3: Initialization
Create a "head" node with pointers set to NULL for each layer.
Set a maximum level for the list, which determines how many layers the list can have.
Optionally, set a "tail" node with maximum possible key value to mark the end of each
level.
Step 4: Insertion
Find the Position: Start from the topmost layer and move forward until you find a node
with a greater key or reach the end of the layer. Then, move down one layer and continue.
Record the nodes where you move down a level.
Choose a Random Level: For the new node, randomly decide the number of layers
(level) it will participate in (usually done with a coin flip algorithm).
Rearrange Pointers: For each level up to the chosen level, update the pointers of the
recorded nodes to include the new node.
Step 5: Search
Start from the topmost layer and move forward while the next node's key is smaller than
the search key. When you cannot move forward, move down one layer and continue.
At the bottom layer, the next node either holds the search key or the key is not present.
Step 6: Deletion
Perform a search for the key, keeping track of the predecessors at each level.
If the key is found, update the pointers of these predecessor nodes to skip the node being
deleted.
Remove the node and deallocate its memory.
Step 7: Random Level Generation
The level for a new node is typically determined using a random process. A common
method is a coin flip algorithm: a random level is generated, and as long as a coin flip
results in heads (or a random value meets a certain condition), you increase the level.
Step 8: Traversal
To traverse the Skip List, simply follow the bottom layer from the head node to the end,
processing or printing the values.
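The steps above can be put together in a compact Python sketch. It is one possible
implementation under stated assumptions, not a definitive one: each node keeps an array of
forward pointers (one per layer) instead of the separate "down" pointers described earlier,
the maximum level of 8 and the coin-flip probability of 0.5 are arbitrary choices, and all
class and method names are illustrative.

```python
import random


class Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * (level + 1)   # forward[i]: next node on layer i


class SkipList:
    def __init__(self, max_level=8, p=0.5):
        self.max_level = max_level
        self.p = p
        self.level = 0                        # highest layer currently in use
        self.head = Node(None, None, max_level)

    def _random_level(self):
        level = 0
        # Each "heads" lifts the new node one layer higher.
        while random.random() < self.p and level < self.max_level:
            level += 1
        return level

    def _predecessors(self, key):
        """For every layer, find the last node whose key is smaller than key."""
        update = [self.head] * (self.max_level + 1)
        node = self.head
        for i in range(self.level, -1, -1):   # top layer down to layer 0
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node                  # the node where we move down
        return update

    def search(self, key):
        node = self._predecessors(key)[0].forward[0]
        return node.value if node is not None and node.key == key else None

    def insert(self, key, value):
        update = self._predecessors(key)
        node = update[0].forward[0]
        if node is not None and node.key == key:
            node.value = value                # existing key: update the value
            return
        level = self._random_level()
        self.level = max(self.level, level)
        new = Node(key, value, level)
        for i in range(level + 1):            # splice into layers 0..level
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def delete(self, key):
        update = self._predecessors(key)
        node = update[0].forward[0]
        if node is None or node.key != key:
            return False
        for i in range(len(node.forward)):    # unlink from every layer it is on
            update[i].forward[i] = node.forward[i]
        while self.level > 0 and self.head.forward[self.level] is None:
            self.level -= 1                   # drop layers that became empty
        return True

    def items(self):
        node = self.head.forward[0]           # traversal follows the bottom layer
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]


sl = SkipList()
for k in [30, 10, 47, 40, 20]:
    sl.insert(k, str(k))
print(list(sl.items()))
```

The layer each key ends up on is random, but the bottom layer is always sorted, so the
traversal order does not depend on the coin flips.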
Hashing
Hashing is a fundamental technique for storing and retrieving data in a way that allows
for quick access. It maps each key to a specific index in a hash table using a hash
function, which enables fast retrieval of information based on its key. This
method is commonly used in databases, caching systems, and various programming
applications to optimize search and retrieval operations.
Separate Chaining:
The idea behind separate chaining is to make each slot of the hash table array point to a
linked list called a chain. Separate chaining is one of the most popular and commonly used
techniques for handling collisions.
When multiple elements hash to the same slot index, they are inserted into a singly-linked
list for that slot, which is known as a chain.
To search for a key K, we hash K to find its slot and then traverse that slot's linked list
linearly. If the key of an entry is equal to K, we have found our entry. If we reach the
end of the linked list without finding it, the entry does not exist. In short, in separate
chaining, if two different elements have the same hash value, we store both elements in the
same linked list, one after the other.
Example: Let us consider a simple hash function as “key mod 7” and a sequence of keys
as 50, 700, 76, 85, 92, 73, 101
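The example can be worked through with a short Python sketch. It is a simplified
illustration: each chain is a Python list rather than a hand-built singly-linked list, and
the class and method names are assumptions made for this example.

```python
class ChainedHashTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [[] for _ in range(size)]   # one chain per slot

    def _index(self, key):
        return key % self.size                   # "key mod 7" when size is 7

    def insert(self, key):
        chain = self.slots[self._index(key)]
        if key not in chain:                     # avoid storing duplicates
            chain.append(key)                    # colliding keys share a chain

    def search(self, key):
        # Only the chain of the key's own slot is scanned.
        return key in self.slots[self._index(key)]


table = ChainedHashTable(7)
for key in [50, 700, 76, 85, 92, 73, 101]:
    table.insert(key)

for i, chain in enumerate(table.slots):
    print(i, chain)
```

With these keys, 700 lands in slot 0; 50, 85 and 92 all collide in slot 1; 73 and 101
share slot 3; and 76 goes to slot 6. Slots 2, 4 and 5 stay empty.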
Advantages:
Simple to implement.
The hash table never fills up; we can always add more elements to a chain.
Less sensitive to the hash function or load factors.
It is mostly used when it is unknown how many and how frequently keys may be inserted
or deleted.
Disadvantages:
The cache performance of chaining is not good as keys are stored using a linked list.
Open addressing provides better cache performance as everything is stored in the same
table.
Wastage of space (some parts of the hash table are never used).
If a chain becomes long, search time can become O(n) in the worst case.
Uses extra space for links.
Open Addressing:
Like separate chaining, open addressing is a method for handling collisions. In Open
Addressing, all elements are stored in the hash table itself.
So at any point, the size of the table must be greater than or equal to the total number of
keys (Note that we can increase table size by copying old data if needed). This approach is
also known as closed hashing.
This entire procedure is based on probing. The types of probing are covered below; the
basic operations work as follows:
Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert
k.
Search(k): Keep probing until the slot's key is equal to k or an empty slot is reached.
Delete(k): The delete operation is interesting. If we simply empty the slot, a later search
for a key that was placed further along the same probe sequence may stop early and fail.
So slots of deleted keys are marked specially as “deleted”.
Insert can place an item in a deleted slot, but search doesn't stop at a deleted slot.
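The three operations and the special "deleted" marker can be sketched in Python as follows.
This is an illustrative sketch, assuming integer keys, a fixed table size, and a linear
probe sequence; the names OpenAddressTable, EMPTY and DELETED are hypothetical.

```python
EMPTY = object()     # slot that has never held a key
DELETED = object()   # slot whose key was removed (a "tombstone")


class OpenAddressTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield slot indices in probe order (linear probing is assumed here)."""
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def search(self, key):
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY:      # a never-used slot ends the search
                return None
            if self.slots[idx] == key:        # DELETED never equals a key,
                return idx                    # so deleted slots are skipped
        return None

    def insert(self, key):
        if self.search(key) is not None:
            return True                       # key is already stored
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key         # deleted slots can be reused
                return True
        return False                          # table is full

    def delete(self, key):
        idx = self.search(key)
        if idx is None:
            return False
        self.slots[idx] = DELETED             # mark the slot, do not empty it
        return True


t = OpenAddressTable(7)
for key in [50, 85, 92]:      # all three hash to slot 1
    t.insert(key)
t.delete(85)                  # slot 2 becomes a tombstone
print(t.search(92))           # 3: the search passes over the tombstone
```

If delete reset the slot to EMPTY instead, the search for 92 would stop at slot 2 and
wrongly report that 92 is missing.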
Different ways of Open Addressing:
1. Linear Probing:
In linear probing, the hash table is searched sequentially, starting from the original
hash location. If the location we get is already occupied, we check the next location.
The function used for rehashing is as follows: rehash(key) = (n+1) % table-size, where n is
the index of the slot that was just probed.
The gap between two successive probes is 1, as seen in the example below:
Let hash(x) be the slot index computed using a hash function and S be the table size.
If slot hash(x) % S is full, then we try (hash(x) + 1) % S.
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S.
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S.
...
Let us consider a simple hash function, “key mod 7”, and a sequence of keys 50, 700,
76, 85, 92, 73, 101. This means hash(key) = key % S, where S is the size of the table (7),
with slots indexed from 0 to 6. We can define the hash function as we choose when we create
our own hash table, although built-in hash tables typically fix it internally with a
predefined formula.
Example: Let us consider a simple hash function as “key mod 5” and a sequence of keys
that are to be inserted are 50, 70, 76, 93.
Step 1: First draw the empty hash table, which will have a possible range of hash values
from 0 to 4 according to the hash function provided.
Hash table
Step 2: Now insert all the keys in the hash table one by one. The first key is 50. It will
map to slot number 0 because 50%5=0. So insert it into slot number 0.
Insert 50 into hash table
Step 3: The next key is 70. It will map to slot number 0 because 70%5=0, but 50 is already
at slot number 0, so search for the next empty slot (slot 1) and insert it there.
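The remaining keys follow the same rule. The sketch below traces every insertion of this
example in Python; it assumes a table stored as a list with None for empty slots, and the
function name is illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing; return the list of slots probed."""
    size = len(table)
    probes = []
    for i in range(size):
        idx = (key % size + i) % size     # home slot, then +1, +2, ...
        probes.append(idx)
        if table[idx] is None:
            table[idx] = key
            return probes
    raise RuntimeError("hash table is full")


table = [None] * 5                        # slots 0 to 4 for "key mod 5"
for key in [50, 70, 76, 93]:
    print(key, "->", linear_probe_insert(table, key))
print(table)                              # [50, 70, 76, 93, None]
```

Key 76 maps to slot 1, which 70 already occupies, so it moves on to slot 2. Key 93 maps to
slot 3, which is free.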
2. Quadratic Probing:
In quadratic probing, the interval between probes is not fixed at 1 but grows with each
attempt. Quadratic probing is a method that helps reduce the clustering problem that linear
probing suffers from. Some notes also call it the mid-square method, although that name
more commonly refers to a hash function that squares the key and takes its middle digits.
In this method, we look at the i²-th slot after the original hash location in the ith
iteration. We always start from the original hash location; only if that location is
occupied do we check the other slots.
Let hash(x) be the slot index computed using the hash function and S be the table size.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S.
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S.
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.
...
Example: Let us consider a table of size 7, the hash function Hash(x) = x % 7, and the
collision resolution strategy f(i) = i². Insert 22, 30, and 50.
Step 1: Create a table of size 7.
Hash table
Step 2: Insert 22 and 30
Hash(22) = 22 % 7 = 1. Since the cell at index 1 is empty, we can insert 22 at slot 1.
Hash(30) = 30 % 7 = 2. Since the cell at index 2 is empty, we can insert 30 at slot 2.
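Applying the same rule to the last key, 50, gives Hash(50) = 50 % 7 = 1, which is occupied;
(1 + 1*1) % 7 = 2 is also occupied; (1 + 2*2) % 7 = 5 is empty, so 50 goes to slot 5. The
Python sketch below reproduces the whole example. The function name and the use of None for
empty slots are assumptions made for illustration.

```python
def quadratic_probe_insert(table, key):
    """Insert key using quadratic probing; return the slot it ends up in."""
    size = len(table)
    home = key % size
    for i in range(size):
        idx = (home + i * i) % size       # home, home + 1, home + 4, home + 9, ...
        if table[idx] is None:
            table[idx] = key
            return idx
    # Quadratic probing does not visit every slot, so this can happen
    # even when the table still has free slots.
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 7
placed = {key: quadratic_probe_insert(table, key) for key in [22, 30, 50]}
print(placed)                             # {22: 1, 30: 2, 50: 5}
```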
3. Double Hashing:
In double hashing, the intervals between probes are computed by a second hash function,
which reduces clustering. We use another hash function hash2(x) and, in the ith iteration,
look at slot (hash(x) + i*hash2(x)) % S, where S is the table size.
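A minimal Python sketch of double hashing follows. The notes do not specify hash2, so the
choice hash2(x) = PRIME - (x % PRIME) with PRIME = 5 is an assumption made here; it is a
commonly used form because it never returns 0. The table size of 7 and the keys are reused
from the quadratic probing example purely for illustration.

```python
PRIME = 5   # assumed value: a prime smaller than the table size


def hash1(key, size):
    return key % size


def hash2(key):
    return PRIME - (key % PRIME)          # always between 1 and PRIME, never 0


def double_hash_insert(table, key):
    """Insert key using double hashing; return the slot it ends up in."""
    size = len(table)
    step = hash2(key)                     # probe interval depends on the key
    for i in range(size):
        idx = (hash1(key, size) + i * step) % size
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("hash table is full")


table = [None] * 7
placed = {key: double_hash_insert(table, key) for key in [22, 30, 50]}
print(placed)                             # {22: 1, 30: 2, 50: 6}
```

Key 50 collides with 22 at slot 1. Its step is hash2(50) = 5, so the next probe is
(1 + 5) % 7 = 6, which is free. Because the table size 7 is prime and the step is never a
multiple of 7, the probe sequence can reach every slot.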