
Module6.Ds

The document provides an overview of dictionaries as an abstract data type that stores key-value pairs, emphasizing their characteristics, fundamental operations, and types. It discusses various implementations, including linear lists, skip lists, and hashing, detailing how each structure operates and their advantages and disadvantages. Additionally, it covers collision resolution techniques in hashing, such as separate chaining and open addressing, highlighting their respective benefits and drawbacks.

Introduction to Dictionaries

In computer science, a dictionary is an abstract data type that represents an ordered or
unordered collection of key-value pairs, where keys are used to search for and locate the
elements in the collection. In a dictionary ADT, the data to be stored is divided into two parts:
• Key
• Value
Each item stored in a dictionary is represented by a key-value pair. The key is used to access
the element in the dictionary. Given the key, we can retrieve the value, which holds more
information about the element.
Characteristics of Dictionary
Key-Value Pairs: Dictionaries store data as key-value pairs where each key is unique and
maps to exactly one value.
Direct Access: The primary feature of dictionaries is to provide fast access to elements not
by their position, as in lists or arrays, but by their keys.
Dynamic Size: Like many abstract data types, dictionaries typically allow for dynamic
resizing. New key-value pairs can be added, and existing ones can be removed.
Ordering: Some dictionaries maintain the order of elements, such as ordered maps or
sorted dictionaries. Others, like hash tables, do not maintain any particular order.
Key Uniqueness: Each key in a dictionary must be unique, though different keys can map
to the same value.
Fundamental Operations of Dictionary
Insert: Add a new key-value pair to the dictionary.
Search: Retrieve the value associated with a particular key.
Delete: Remove a key-value pair from the dictionary.
Update: Change the value associated with a given key.
Keys: Return a collection of all the keys in the dictionary.
Values: Return a collection of all the values in the dictionary.
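
As a quick illustration, the operations above map directly onto Python's built-in dict, which is one concrete (hash-based) dictionary implementation. This is only a sketch: the variable names and the sample phone-book data are made up for the example.

```python
# Built-in dict used only to illustrate the ADT operations; the data is made up.
phone_book = {}

phone_book["alice"] = "555-0100"         # Insert
phone_book["bob"] = "555-0101"           # Insert
number = phone_book.get("alice")         # Search (returns None if the key is absent)
phone_book["alice"] = "555-0199"         # Update
del phone_book["bob"]                    # Delete

all_keys = list(phone_book.keys())       # Keys
all_values = list(phone_book.values())   # Values
```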
Types of Dictionary
There are two major variations of dictionaries:
Ordered dictionary.
• In an ordered dictionary, the relative order of entries is determined by comparing keys.
• The order must depend only on the keys.
Unordered dictionary.
• In an unordered dictionary, no order relation is assumed on the keys.
• Only equality comparison can be performed on the keys.
Implementation of Dictionary
A dictionary can be implemented in several ways, such as:
Fixed-length arrays.
Linked lists.
Hashing.
Trees (BSTs, balanced BSTs, splay trees, tries, etc.).

Linear list representation

A linear list representation of a dictionary involves using a simple list structure (like an
array or a linked list) to store key-value pairs. Each entry in the list consists of two parts: a
unique key and an associated value. Let's walk through the implementation step by step:

Step 1: Initialization

When you initialize the dictionary, you start with an empty list. This list will hold all
the key-value pairs.

dictionary = []

Step 2: Insertion

To insert a new key-value pair into the dictionary:

Check for the Key's Existence: Iterate through the list to check whether the given key
already exists.
Update or Insert:
• If the key exists, update the associated value.
• If the key does not exist, append a new key-value pair to the list.

def insert(key, value):
    # Update in place if the key is already present
    for pair in dictionary:
        if pair[0] == key:
            pair[1] = value
            return
    # Otherwise append a new [key, value] pair
    dictionary.append([key, value])

Step 3: Search

To find a value associated with a given key:

• Iterate Through the List: Go through each key-value pair in the list.
• Compare Keys: Check if the current key matches the search key.
• Return Value if Found: If a match is found, return the associated value.

def search(key):
    # Scan the pairs in order and return the value of the first match
    for k, v in dictionary:
        if k == key:
            return v
    return None  # the key was not found

Step 4: Deletion

To delete a key-value pair from the dictionary:

• Search for the Key: Iterate through the list to find the key.
• Remove the Pair: Once the key is found, remove the key-value pair from the list.

def delete(key):
    # Remove the pair whose key matches; do nothing if the key is absent
    for i, pair in enumerate(dictionary):
        if pair[0] == key:
            del dictionary[i]
            return

Step 5: Traversal

To traverse the dictionary and access each key-value pair:

• Iterate Through the List: Go through each key-value pair in the list.
• Process Each Pair: Perform the desired operation (like printing) on each pair.

def traverse(process):
    # Call process(key, value) on every pair, in list order
    for k, v in dictionary:
        process(k, v)

Step 6: Update

Updating a value for a specific key follows the same procedure as insertion. If the key
exists, its value is updated; otherwise, the key-value pair is added.

Advantages of the linear list representation of a dictionary

• Simplicity: A linear list is simple and easy to implement.
• Dynamic Size: The list can grow or shrink as key-value pairs are added or removed.

Disadvantages of the linear list representation of a dictionary

• Search Time: The search time is linear, O(n), in the worst case, which can be inefficient for
large numbers of entries.
• Insertion Time: Insertion first scans the list to check whether the key already exists, which
takes linear time. If the list is kept in sorted order, placing the new key at its correct
position can also take linear time.
• Deletion Time: Removing an entry requires a search followed by a deletion, both of
which can be inefficient in a large list.

Skip list representation

A Skip List is an advanced data structure that allows for fast search within an ordered
sequence of elements, providing a clever alternative to balanced trees. It is effectively a
linked list with additional layers on top of the standard list that enable faster searches.

Working of the Skip list

Let's take an example to understand how a skip list works. In this example, we have
14 nodes, organized into two layers, as shown in the diagram.

The lower layer is a normal line that links all nodes, and the top layer is an express line
that links only a few selected nodes.

Suppose you want to find 47 in this example. You start the search from the first node
of the express line and keep moving along the express line until you reach a node whose key is
equal to or greater than 47.
In the example, 47 does not appear on the express line, so you stop at the last express
node whose key is less than 47, which is 40. From 40 you drop down to the normal line and
continue searching for 47 there.
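
The two-lane search just described can be sketched as follows. The diagram's actual node values are not reproduced in the text, so the 14 keys and the choice of express nodes below are hypothetical; only 40 and 47 come from the example. The function name two_lane_search is also an assumption made for this sketch.

```python
# Hypothetical two-lane list: 14 sorted keys on the normal line and an
# express line that links only a few of them (values are illustrative).
normal_line = [5, 10, 14, 20, 26, 31, 36, 40, 43, 47, 52, 58, 63, 70]
express_line = [5, 20, 40, 58]

def two_lane_search(target):
    """Return (found, visited) where visited lists the nodes inspected."""
    visited = []
    start = express_line[0]
    # Ride the express line while its nodes are still <= target.
    for key in express_line:
        if key > target:
            break
        visited.append(("express", key))
        start = key
    # Drop to the normal line at the last express node reached and walk forward.
    for key in normal_line[normal_line.index(start):]:
        if key > target:
            break
        visited.append(("normal", key))
        if key == target:
            return True, visited
    return False, visited

found, path = two_lane_search(47)
```

With these values the search for 47 rides the express line to 40, drops down, and then walks 40, 43, 47 on the normal line, inspecting fewer nodes than a scan of the normal line from its first node.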

Note: Once you find such a node on the express line, you follow its pointer down to the
normal line and continue the search for the target node there.

Here's how a Skip List is typically represented:

• Base Layer: The bottom layer is a standard linked list that contains all the elements in the
set, sorted by key.

• Express Layers: Above the base layer are one or more layers of "express lanes." Each layer is
a linked list that skips over some elements of the list below it. The topmost layer has the
fewest elements, and each layer skips roughly twice as many elements (or some other factor) as
the layer directly beneath it.

A Skip List is an efficient, probabilistic data structure that supports search,
insertion, and deletion in expected O(log n) time, akin to balanced trees like AVL or
Red-Black trees. Here's a step-by-step explanation of how a Skip List is used to represent a
dictionary:

Step 1: Understanding the Structure

A Skip List is composed of several layers of linked lists, where each higher layer provides
a "shortcut" through the lower layers. The bottom layer (Layer 0) contains all the
elements, and each successive layer contains a subset of these elements, chosen randomly.

Step 2: Layered Links

Each node in the Skip List contains a pointer to the next node in the same layer and a pointer
down to its own copy in the layer immediately below it.

Step 3: Initialization

When initializing a Skip List:

• Create a "head" node with pointers set to NULL for each layer.
• Set a maximum level for the list, which determines how many layers the list can have.
• Optionally, set a "tail" node with the maximum possible key value to mark the end of each
level.

Step 4: Insertion

To insert a key-value pair:

• Find the Position: Start from the topmost layer and move forward while the next node's key
is smaller than the new key. When the next key is greater or the layer ends, move down one
layer and continue. Record the node at which you move down on each level.
• Choose a Random Level: For the new node, randomly decide the number of layers
(level) it will participate in (usually done with a coin flip algorithm).
• Rearrange Pointers: For each level up to the chosen level, update the pointers of the
recorded nodes to include the new node.

Step 5: Search

To search for a key:

• Start from the topmost layer of the head node.
• Move forward in the current layer while the next node's key is smaller than the search key.
• When the next key is greater than or equal to the search key, or the layer ends, move down
one layer.
• Repeat until you reach the bottom layer. If the next node on the bottom layer holds the
search key, return its value; otherwise, the key is not in the list.

Step 6: Deletion

To delete a key-value pair:

• Perform a search for the key, keeping track of the predecessors at each level.
• If the key is found, update the pointers of these predecessor nodes to skip the node being
deleted.
• Remove the node and deallocate its memory.

Step 7: Random Level Generation

The level for a new node is typically determined using a random process. A common
method is a coin flip algorithm: start at the lowest level, and as long as a coin flip
comes up heads (or a random value meets a certain condition) and the maximum level has not
been reached, increase the level by one.

Step 8: Traversal

To traverse the Skip List, simply follow the bottom layer from the head node to the end,
processing or printing the values.
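
The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it uses the common variant in which each node keeps an array of forward pointers (one per layer) instead of separate "down" pointers, and the names SkipList, Node, MAX_LEVEL and P, as well as the values 8 and 0.5, are choices made for this example.

```python
import random

class Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level   # forward[i] = next node on layer i

class SkipList:
    MAX_LEVEL = 8    # assumed cap on the number of layers
    P = 0.5          # assumed probability of promoting a node one layer up

    def __init__(self):
        self.head = Node(None, None, self.MAX_LEVEL)  # sentinel head node
        self.level = 1                                # layers currently in use

    def _random_level(self):
        # Coin flip: keep adding a layer while the flip comes up heads.
        level = 1
        while random.random() < self.P and level < self.MAX_LEVEL:
            level += 1
        return level

    def _predecessors(self, key):
        # On every layer, find the last node whose key is smaller than key.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def search(self, key):
        node = self._predecessors(key)[0].forward[0]
        return node.value if node is not None and node.key == key else None

    def insert(self, key, value):
        update = self._predecessors(key)
        node = update[0].forward[0]
        if node is not None and node.key == key:
            node.value = value              # key exists: update in place
            return
        level = self._random_level()
        self.level = max(self.level, level)
        new_node = Node(key, value, level)
        for i in range(level):              # splice into each chosen layer
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

    def delete(self, key):
        update = self._predecessors(key)
        node = update[0].forward[0]
        if node is None or node.key != key:
            return False
        for i in range(len(node.forward)):  # unlink from every layer it is on
            update[i].forward[i] = node.forward[i]
        return True

    def items(self):
        # Traversal: walk the bottom layer from the head to the end.
        node = self.head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]
```

Whatever levels the coin flips produce, items() always yields the keys in sorted order, because every node is linked into the bottom layer.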

Hashing

Hashing is a fundamental technique for storing and retrieving data in a way
that allows for quick access. It involves mapping data to a specific index in a hash table
using a hash function, which enables fast retrieval of information based on its key. This
method is commonly used in databases, caching systems, and various programming
applications to optimize search and retrieval operations.

Data Structures – Hashing

Hashing is a technique used in data structures to store and retrieve data efficiently. It
involves using a hash function to map data items to the slots of a fixed-size array, which is
called a hash table.
1. Hash Function: You pass the key of your data item to the hash function.
2. Hash Code: The hash function computes a hash code from the key. The code is not guaranteed
to be unique: two different keys can produce the same code.
3. Hash Table: The hash code then points you to a specific location within the hash table.
A hash table is also known as a hash map. It is a data structure that stores key-value
pairs and uses a hash function to map each key to a slot of the underlying array.
This allows fast search, insertion, and deletion operations, typically in constant time on
average.
Hash Function
The hash function is a function that takes a key and returns an index into the hash
table. The goal of a hash function is to distribute keys evenly across the hash table,
minimizing collisions (when two keys map to the same index).
Common hash functions include:

• Division Method: Key % Hash Table Size
• Multiplication Method: floor(Hash Table Size * ((Key * A) mod 1)), where A is a constant
with 0 < A < 1
• Universal Hashing: The hash function is chosen at random from a family of functions designed
so that any two distinct keys collide with low probability.
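
The first two methods can be sketched as below. The function names are made up for this example. The multiplication variant uses the common textbook form, which scales the fractional part of key * A by the table size, and its default constant A = (sqrt(5) - 1) / 2 is an assumption (a frequently suggested choice), not something the text above prescribes.

```python
import math

def division_hash(key, table_size):
    # Division method: remainder of the key divided by the table size.
    return key % table_size

def multiplication_hash(key, table_size, a=(math.sqrt(5) - 1) / 2):
    # Multiplication method: take the fractional part of key * a
    # and scale it up to the table size. The default a is an assumption.
    fractional = (key * a) % 1.0
    return int(table_size * fractional)

index_a = division_hash(50, 7)         # 50 % 7
index_b = multiplication_hash(50, 7)   # some slot in the range 0..6
```

Both functions always return an index between 0 and table_size - 1, so the result can be used directly as a slot number.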

What is a Hash Collision?


A hash collision occurs when two different keys map to the same index in a hash table.
This can happen even with a good hash function, especially when the hash table is nearly full.

Causes of Hash Collisions:

• Poor Hash Function: A hash function that does not distribute keys evenly across the
hash table can lead to more collisions.
• High Load Factor: A high load factor (the ratio of the number of stored keys to the hash
table size) increases the probability of collisions.
• Similar Keys: With a weak hash function, keys that are similar in value or structure are
more likely to collide.

Collision Resolution Techniques


There are two types of collision resolution techniques:
1. Separate chaining (open hashing)
2. Open addressing (closed hashing)

Separate Chaining:
The idea behind separate chaining is to make each slot of the array the head of a linked list,
called a chain. Separate chaining is one of the most popular and commonly used techniques for
handling collisions.

When multiple elements hash to the same slot index, they are inserted into a singly linked
list, the chain, that belongs to that slot.
To search for a key K, we hash K to find its slot and traverse that slot's chain linearly. If
the key of some entry is equal to K, we have found our entry. If we reach the end of the
chain without finding a match, the entry does not exist. In short, in separate chaining,
two different elements with the same hash value are stored in the same linked list, one after
the other.

Example: Let us consider a simple hash function “key mod 7” and the sequence of keys
50, 700, 76, 85, 92, 73, 101.
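
The example can be worked through with a short sketch. It is illustrative only: Python lists stand in for the singly linked chains, and the class name ChainedHashTable and the "record-..." values are made up for the example.

```python
class ChainedHashTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [[] for _ in range(size)]   # one chain per slot

    def _index(self, key):
        return key % self.size                   # "key mod 7" when size is 7

    def insert(self, key, value):
        chain = self.slots[self._index(key)]
        for entry in chain:
            if entry[0] == key:                  # key already present: update
                entry[1] = value
                return
        chain.append([key, value])               # otherwise add to the end of the chain

    def search(self, key):
        for k, v in self.slots[self._index(key)]:   # linear scan of one chain only
            if k == key:
                return v
        return None                              # key is not in the table

table = ChainedHashTable(7)
for key in [50, 700, 76, 85, 92, 73, 101]:
    table.insert(key, f"record-{key}")

chains = [[k for k, _ in chain] for chain in table.slots]
# chains == [[700], [50, 85, 92], [], [73, 101], [], [], [76]]
```

Keys 50, 85 and 92 all hash to slot 1 and 73 and 101 both hash to slot 3, so they share chains, while slots 2, 4 and 5 stay empty.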

Advantages:
• Simple to implement.
• The hash table never fills up; we can always add more elements to a chain.
• Less sensitive to the hash function and to the load factor.
• It is mostly used when it is unknown how many keys will be inserted or deleted, and how
frequently.
Disadvantages:
• The cache performance of chaining is poor because keys are stored in linked lists.
Open addressing provides better cache performance because everything is stored in the same
table.
• Wastage of space (some slots of the hash table may never be used).
• If a chain becomes long, search time can become O(n) in the worst case.
• Uses extra space for the links.

Open Addressing:
Like separate chaining, open addressing is a method for handling collisions. In open
addressing, all elements are stored in the hash table itself.
So at any point, the size of the table must be greater than or equal to the total number of
keys (note that we can increase the table size by copying the old data into a larger table if
needed). This approach is also known as closed hashing.
The entire procedure is based on probing. The types of probing are described further on; the
basic operations are:
• Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert
k.
• Search(k): Keep probing until a slot whose key equals k is found or an empty slot is
reached.
• Delete(k): The delete operation is interesting. If we simply empty the slot of a deleted
key, a later search may fail, because it would stop at the emptied slot before reaching keys
stored beyond it. So slots of deleted keys are marked specially as “deleted”. An insert can
reuse a deleted slot, but a search does not stop at a deleted slot.
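
A minimal sketch of these three operations, including the special "deleted" marker, might look like this. It assumes integer keys, a "key mod table size" hash, and a probe step of 1; the names OpenAddressingSet, EMPTY and DELETED are made up for the example, and only keys (no values) are stored to keep it short.

```python
EMPTY = object()     # slot that has never held a key
DELETED = object()   # marker left behind by delete()

class OpenAddressingSet:
    def __init__(self, size=7):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        # Probe sequence: home slot, then the following slots, wrapping around.
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def search(self, key):
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY:     # a never-used slot ends the search
                return None
            if self.slots[idx] == key:       # deleted slots are skipped, not stopped at
                return idx
        return None

    def insert(self, key):
        if self.search(key) is not None:
            return True                      # already present
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key        # reuse the first free or deleted slot
                return True
        return False                         # table is full

    def delete(self, key):
        idx = self.search(key)
        if idx is None:
            return False
        self.slots[idx] = DELETED            # mark instead of emptying the slot
        return True

s = OpenAddressingSet(7)
for k in (50, 85, 92):    # all three hash to slot 1, so they fill slots 1, 2, 3
    s.insert(k)
s.delete(85)              # slot 2 now holds the "deleted" marker
slot_of_92 = s.search(92) # still found, because the probe passes over slot 2
```

Had delete() reset slot 2 to EMPTY instead, the search for 92 would have stopped at slot 2 and wrongly reported the key as missing.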
Different ways of Open Addressing:

1. Linear Probing:
In linear probing, the hash table is searched sequentially, starting from the original
hash location. If that location is already occupied, we check the next location, wrapping
around at the end of the table.
The function used for rehashing is: rehash(n) = (n + 1) % table-size, where n is the slot that
was just probed.
The gap between two successive probes is 1, as seen below:
Let hash(x) be the slot index computed using a hash function and S be the table size.
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
...
Let us consider a simple hash function “key mod 7” and the sequence of keys 50, 700,
76, 85, 92, 73, 101. Here hash(key) = key % S, where S = 7 is the size of the table, indexed
from 0 to 6. When we build our own hash table we can define the hash function as we choose,
although a library implementation fixes it internally with a predefined formula.
Example: Let us consider a simple hash function “key mod 5” and the sequence of keys
to be inserted: 50, 70, 76, 93.
• Step 1: First draw the empty hash table, which has slots 0 to 4, the possible range of hash
values for the given hash function.

[Figure: Hash table]

• Step 2: Now insert the keys into the hash table one by one. The first key is 50. It maps
to slot 0 because 50 % 5 = 0, so insert it into slot 0.

[Figure: Insert 50 into hash table]

• Step 3: The next key is 70. It maps to slot 0 because 70 % 5 = 0, but 50 is already
at slot 0, so probe forward to the next empty slot, slot 1, and insert it there.

[Figure: Insert 70 into hash table]

• Step 4: The next key is 76. It maps to slot 1 because 76 % 5 = 1, but 70 is already
at slot 1, so probe forward to the next empty slot, slot 2, and insert it there.

[Figure: Insert 76 into hash table]

• Step 5: The next key is 93. It maps to slot 3 because 93 % 5 = 3, and slot 3 is empty, so
insert it into slot 3.

[Figure: Insert 93 into hash table]
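
Both linear probing examples can be reproduced with a small sketch. The function name linear_probe_insert is made up for this example, and the table is represented as a plain Python list in which None marks an empty slot.

```python
def linear_probe_insert(table, key):
    """Insert key using hash(key) = key % len(table) and a probe step of 1.

    Returns the slot in which the key was finally placed.
    """
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i) % size      # home, home + 1, home + 2, ... (wrapping)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

# "key mod 5" example
table5 = [None] * 5
slots5 = [linear_probe_insert(table5, k) for k in (50, 70, 76, 93)]
# slots5 == [0, 1, 2, 3] and table5 == [50, 70, 76, 93, None]

# "key mod 7" example
table7 = [None] * 7
for k in (50, 700, 76, 85, 92, 73, 101):
    linear_probe_insert(table7, k)
# table7 == [700, 50, 85, 92, 73, 101, 76]
```

In the second table, keys 85, 92, 73 and 101 all end up in consecutive slots after their home positions, which shows how linear probing tends to build runs of occupied slots.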
2. Quadratic Probing

In quadratic probing, the interval between successive probes grows quadratically with the
probe number instead of staying fixed.
Quadratic probing reduces the primary clustering that linear probing suffers from, in which
long runs of occupied slots build up and slow down later probes. (The method is sometimes
called the mid-square method, although that name more commonly refers to a hash function that
squares the key and takes the middle digits.)
In this method, we look at the slot i^2 positions after the original hash location in the
i-th iteration. We always start from the original hash location, and only if that location is
occupied do we check the other slots.
Let hash(x) be the slot index computed using the hash function and S be the table size.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
...
Example: Let us consider table size = 7, hash function Hash(x) = x % 7, and collision
resolution strategy f(i) = i^2. Insert 22, 30, and 50.
• Step 1: Create a table of size 7.

[Figure: Hash table]

• Step 2: Insert 22 and 30.
  • Hash(22) = 22 % 7 = 1. Since the cell at index 1 is empty, we insert 22 at slot 1.
  • Hash(30) = 30 % 7 = 2. Since the cell at index 2 is empty, we insert 30 at slot 2.

[Figure: Insert keys 22 and 30 in the hash table]

• Step 3: Insert 50.
  • Hash(50) = 50 % 7 = 1.
  • Slot 1 is already occupied, so we try slot (1 + 1^2) % 7 = 2.
  • Slot 2 is also occupied, so we try slot (1 + 2^2) % 7 = 5.
  • Slot 5 is not occupied, so we place 50 in slot 5.

[Figure: Insert key 50 in the hash table]
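
The quadratic probing example can be checked with a short sketch. The function name quadratic_probe_insert is made up for this example, and None marks an empty slot. Because a quadratic probe sequence does not necessarily visit every slot, the sketch gives up after as many attempts as there are slots.

```python
def quadratic_probe_insert(table, key):
    """Insert key using hash(key) = key % len(table) and offsets 0, 1, 4, 9, ...

    Returns the slot in which the key was finally placed.
    """
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i * i) % size   # i-th probe is i^2 slots past the home slot
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found on the probe sequence")

table = [None] * 7
slots = [quadratic_probe_insert(table, k) for k in (22, 30, 50)]
# slots == [1, 2, 5] and table == [None, 22, 30, None, None, 50, None]
```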

3. Double Hashing

In double hashing, the interval between probes is computed by a second hash function. This
reduces clustering, because keys that collide at the same slot usually follow different probe
sequences. We use a second hash function hash2(x) and look at the slot i*hash2(x) positions
after the original location in the i-th iteration. hash2(x) must never evaluate to 0;
otherwise the probe would never move.

Let hash(x) be the slot index computed using the hash function and S be the table size.

If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
...
Example: Insert the keys 27, 43, 692, 72 into a hash table of size 7, where the first hash
function is h1(k) = k mod 7 and the second hash function is h2(k) = 1 + (k mod 5).
• Step 1: Insert 27
  • 27 % 7 = 6. Location 6 is empty, so insert 27 into slot 6.

[Figure: Insert key 27 in the hash table]

• Step 2: Insert 43
  • 43 % 7 = 1. Location 1 is empty, so insert 43 into slot 1.

[Figure: Insert key 43 in the hash table]

• Step 3: Insert 692
  • 692 % 7 = 6, but location 6 is already occupied, so this is a collision.
  • We resolve the collision using double hashing with i = 1:
hnew = [h1(692) + i * h2(692)] % 7
     = [6 + 1 * (1 + 692 % 5)] % 7
     = [6 + 3] % 7
     = 9 % 7
     = 2
Slot 2 is empty, so we insert 692 into slot 2.

[Figure: Insert key 692 in the hash table]

• Step 4: Insert 72
  • 72 % 7 = 2, but location 2 is already occupied, so this is a collision.
  • We resolve the collision using double hashing with i = 1:
hnew = [h1(72) + i * h2(72)] % 7
     = [2 + 1 * (1 + 72 % 5)] % 7
     = [2 + 3] % 7
     = 5 % 7
     = 5
Slot 5 is empty, so we insert 72 into slot 5.
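
The double hashing example can be verified with a short sketch. The function name double_hash_insert is made up for this example; the two hash functions are the ones given in the example, so the second one is hard-coded as 1 + (key mod 5). None marks an empty slot.

```python
def double_hash_insert(table, key):
    """Insert key using h1(k) = k % len(table) and step size h2(k) = 1 + (k % 5).

    Returns the slot in which the key was finally placed.
    """
    size = len(table)
    h1 = key % size            # home slot
    h2 = 1 + (key % 5)         # step size; the leading 1 keeps it from being zero
    for i in range(size):
        slot = (h1 + i * h2) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found on the probe sequence")

table = [None] * 7
slots = [double_hash_insert(table, k) for k in (27, 43, 692, 72)]
# slots == [6, 1, 2, 5] and table == [None, 43, 692, None, None, 72, 27]
```

Because the table size 7 is prime, every step size from 1 to 5 eventually reaches all seven slots, which is one reason prime table sizes are a common choice for double hashing.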
