DS Unit-Ii
DICTIONARIES:
A dictionary is a collection of key-value pairs in which every value is associated
with its corresponding key.
The basic operations that can be performed on a dictionary are:
1. Insertion of a key-value pair into the dictionary
2. Deletion of a particular pair from the dictionary
3. Searching for a specific value with the help of its key
#include <stdio.h>

struct node
{
    int key;
    int value;
    struct node *next;
} *head;

void dictionary();
void insert_d();
void delete_d();
void display_d();
void search();
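Insertion keeps the nodes sorted by key. Suppose the dictionary holds a single node with key 1 and value 10, and a 'New' node with key 4 and value 20 is to be inserted.
[Figure: New/head/curr/prev: (1, 10) -> NULL; New: (4, 20) -> NULL]
Compare the key value of the 'curr' node and the 'New' node. If New->key > curr->key, then attach the
New node to the 'curr' node.
Now suppose the list holds (1, 10), (4, 20) and (7, 80), and a New node (3, 15) is to be inserted. When 'curr' reaches the node with key 4, the condition (curr->key < New->key) is false. Hence the else part will get executed, and the New node is linked in between 'prev' and 'curr'.
[Figure: (1, 10) -> (4, 20) -> (7, 80) -> NULL; New: (3, 15)]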
void insert_d()
{
    int k;
    int data;
    node *p, *curr, *prev = NULL;
    printf("Enter a key and value to be inserted: ");
    scanf("%d", &k);
    scanf("%d", &data);
    p = new node;
    p->key = k;
    p->value = data;
    p->next = NULL;
    if (head == NULL)
    {
        head = p;                   /* empty dictionary: the new node becomes the head */
    }
    else
    {
        curr = head;
        /* advance while the current key is smaller and more nodes remain */
        while ((curr->key < p->key) && (curr->next != NULL))
        {
            prev = curr;
            curr = curr->next;
        }
        if (curr->key < p->key)
        {
            curr->next = p;         /* largest key so far: attach after the last node */
        }
        else
        {
            p->next = curr;         /* insert in front of curr */
            if (prev == NULL)
                head = p;           /* smallest key so far: the new node becomes the head */
            else
                prev->next = p;
        }
    }
    printf("\nInserted into dictionary successfully\n");
}
Case 1: Initially assign the 'head' node to 'curr'. Then ask for the key value of the node
which is to be deleted. Starting from the head node, the key value of each node is checked and
compared with the desired node's key value. We will get the node which is to be deleted in the
variable 'curr'. The node given by the variable 'prev' keeps track of the node just before 'curr'.
For example, to delete the node with key value 4:
[Figure: (1, 10) -> (3, 15) -> curr: (4, 20) -> (7, 80) -> NULL]
Case 2: If the node to be deleted is the head node, i.e., 'curr' is the same as 'head',
then simply make the next node the 'head' node and delete 'curr'.
[Figure: before: curr/head: (1, 10) -> (3, 15) -> (4, 20) -> (7, 80) -> NULL; after: head: (3, 15) -> (4, 20) -> (7, 80) -> NULL]
void delete_d()
{
    int k;
    node *curr, *prev = NULL;
    printf("Enter key value that you want to delete...");
    scanf("%d", &k);
    if (head == NULL)
    {
        printf("\nDictionary is empty (underflow)");
        return;
    }
    curr = head;
    /* walk the list until the key is found or the end is reached */
    while (curr != NULL)
    {
        if (curr->key == k)
            break;
        prev = curr;
        curr = curr->next;
    }
    if (curr == NULL)
        printf("Node not found...");
    else
    {
        if (curr == head)
            head = curr->next;          /* Case 2: deleting the head node */
        else
            prev->next = curr->next;    /* Case 1: bypass curr */
        delete curr;
        printf("Item deleted from dictionary...");
    }
}
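void search()
{
    /* The opening lines of this function are missing from the notes; */
    /* reading the key and starting at head is an assumed reconstruction. */
    int k;
    node *curr = head;
    printf("Enter key value that you want to search...");
    scanf("%d", &k);
    while (curr != NULL)
    {
        if (curr->key == k)
        {
            printf("\n%d : %d", curr->key, curr->value);
            break;
        }
        curr = curr->next;
    }
    if (curr == NULL)
        printf("\n NOT found");
}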
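The routines above read their input with scanf, which makes them awkward to test. As a minimal sketch of the same sorted-list dictionary, the variant below takes the key and value as parameters instead. The names Node, insertSorted, findKey and removeKey are illustrative and do not appear in the notes.

```cpp
struct Node {
    int key;
    int value;
    Node *next;
};

// Insert (key, value) so that the list stays sorted by key; returns the new head.
Node *insertSorted(Node *head, int key, int value) {
    Node *p = new Node{key, value, nullptr};
    if (head == nullptr || key < head->key) {   // empty list, or new smallest key
        p->next = head;
        return p;
    }
    Node *prev = head;
    while (prev->next != nullptr && prev->next->key < key)
        prev = prev->next;                      // stop at the last node with a smaller key
    p->next = prev->next;
    prev->next = p;
    return head;
}

// Return the node holding key, or nullptr when the key is absent.
Node *findKey(Node *head, int key) {
    for (Node *c = head; c != nullptr; c = c->next)
        if (c->key == key)
            return c;
    return nullptr;
}

// Unlink and free the node holding key (if any); returns the new head.
Node *removeKey(Node *head, int key) {
    Node *prev = nullptr;
    Node *curr = head;
    while (curr != nullptr && curr->key != key) {
        prev = curr;
        curr = curr->next;
    }
    if (curr == nullptr)
        return head;                            // key not present
    if (prev == nullptr)
        head = curr->next;                      // deleting the head node
    else
        prev->next = curr->next;
    delete curr;
    return head;
}
```

Returning the head from each function avoids the global head pointer, so several dictionaries can exist at once.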
A skip list is a probabilistic data structure. It stores a sorted list of elements using linked lists and
allows the elements to be searched efficiently. In a single step it can skip over several elements of the
list, which is why it is known as a skip list.
The skip list is an extended version of the linked list. It allows the user to search, remove, and insert
elements very quickly. It consists of a base list that holds all the elements, together with a hierarchy of
links that skip over subsequent elements.
The lowest layer of the skip list is an ordinary sorted linked list, and the upper layers of the skip list act
like an "express line" in which elements are skipped.
Let's take an example to understand the working of the skip list. In this example, we have 14 nodes, such
that these nodes are divided into two layers, as shown in the diagram.
The lower layer is a common line that links all nodes, and the top layer is an express line that links only the
main nodes, as you can see in the diagram.
Suppose you want to find 47 in this example. You start the search from the first node of the express
line and keep moving along the express line until you find a node that is equal to 47 or greater than 47.
You can see in the example that 47 does not exist in the express line, so you stop at the last node that is
less than 47, which is 40. Now you drop down to the normal line at 40 and search for 47 from there, as shown
in the diagram.
Insertion operation: It adds a new node at its sorted position in the skip list.
Search operation: It is used to find a particular node in the skip list.
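The insertion and search operations can be sketched as follows. This is a minimal illustration, not a full implementation: a real skip list normally picks each node's level at random, whereas here the level is passed in explicitly so that the result is predictable. The names SkipNode, SkipList, contains and keysOnLayer are assumptions made for this sketch, and layer 0 is the base list.

```cpp
#include <vector>

struct SkipNode {
    int key;
    std::vector<SkipNode*> next;   // next[i] is the successor on layer i (0 = base list)
    SkipNode(int k, int level) : key(k), next(level, nullptr) {}
};

class SkipList {
public:
    // head_ is a sentinel node; its key is never compared. maxLevel must be >= 1.
    explicit SkipList(int maxLevel) : head_(0, maxLevel) {}
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;
    ~SkipList() {
        SkipNode *n = head_.next[0];
        while (n != nullptr) {
            SkipNode *after = n->next[0];
            delete n;
            n = after;
        }
    }

    // Insert key with an explicitly chosen level (1 <= level <= maxLevel).
    void insert(int key, int level) {
        std::vector<SkipNode*> pred(head_.next.size(), &head_);
        SkipNode *cur = &head_;
        for (int i = static_cast<int>(head_.next.size()) - 1; i >= 0; --i) {
            while (cur->next[i] != nullptr && cur->next[i]->key < key)
                cur = cur->next[i];        // run along the express line
            pred[i] = cur;                 // last node before the key on layer i
        }
        SkipNode *node = new SkipNode(key, level);
        for (int i = 0; i < level; ++i) {  // splice into layers 0 .. level-1
            node->next[i] = pred[i]->next[i];
            pred[i]->next[i] = node;
        }
    }

    // Search from the top layer down, dropping a layer whenever the next key is too large.
    bool contains(int key) const {
        const SkipNode *cur = &head_;
        for (int i = static_cast<int>(head_.next.size()) - 1; i >= 0; --i)
            while (cur->next[i] != nullptr && cur->next[i]->key < key)
                cur = cur->next[i];
        cur = cur->next[0];
        return cur != nullptr && cur->key == key;
    }

    // Keys linked on one layer, in order (handy for inspecting the structure).
    std::vector<int> keysOnLayer(int layer) const {
        std::vector<int> out;
        for (const SkipNode *n = head_.next[layer]; n != nullptr; n = n->next[layer])
            out.push_back(n->key);
        return out;
    }

private:
    SkipNode head_;
};
```

A node inserted with level 1 appears only on the base list, while a node with level 4 appears on all four layers.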
Example 1: Create a skip list by inserting the following keys into an empty skip list.
1. 6 with level 1.
2. 29 with level 1.
3. 22 with level 4.
4. 9 with level 3.
5. 17 with level 1.
6. 4 with level 2.
Ans: [Figure: the skip list built from the keys above]
Applications of the skip list:
1. It is used in distributed applications, and it represents the pointers and system in the distributed
applications.
2. It is used to implement a dynamic elastic concurrent queue with low lock contention.
3. It is also used with the QMap template class.
4. The indexing of the skip list is used in running median problems.
5. The skip list is used for the delta-encoding posting in the Lucene search.
Hash Table
A hash table is one of the most important data structures. It uses a special function, known as a hash
function, that maps a given key to a location in the table so that elements can be accessed faster.
A hash table stores information that has two main components, i.e., a key and a value. A hash table can be
used to implement an associative array. The efficiency of mapping depends upon the efficiency of the hash
function used for mapping.
For example, suppose the key is John and the value is his phone number. When we pass the key to the hash
function, it computes the location at which the phone number is stored.
Drawback of Hash function
Ideally, a hash function assigns each key a unique index. Sometimes a hash table uses an imperfect hash
function that causes a collision, because the hash function generates the same index for two different keys.
Hashing
Hashing is a searching technique that takes constant time on average; the average-case time complexity of
hashing is O(1). Till now, we have studied two techniques for searching, i.e., linear search and binary
search. The worst-case time complexity is O(n) in linear search and O(log n) in binary search. In both
techniques, the search time depends upon the number of elements, but we want a technique that takes
constant time. Hashing provides this.
In the hashing technique, a hash table and a hash function are used. Using the hash function, we can
calculate the address at which the value can be stored.
The main idea behind hashing is to store (key/value) pairs. If the key is given, then the algorithm
computes the index at which the value would be stored. It can be written as:
Index = hash(key)
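The hash function can be calculated in three ways:
o Division method
o Folding method
o Mid square method
In the division method, the index is the remainder obtained on dividing the key by the size of the hash
table. For example, if the key value is 6 and the size of the hash table is 10, then applying the hash
function to key 6 gives the index:
h(6) = 6%10 = 6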
Collision
When two different keys hash to the same index, a conflict arises between the two values, known as a
collision. In the above example, the value is stored at index 6. If the key value is 26, then the
index would be:
h(26) = 26%10 = 6
Therefore, two values are to be stored at the same index, i.e., 6, and this leads to the collision problem.
To resolve these collisions, we have some techniques known as collision resolution techniques.
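The division method and the collision between keys 6 and 26 can be checked with a few lines of code. This is only a sketch: hashDivision and collides are illustrative names, and non-negative keys are assumed.

```cpp
// Division-method hash: the index is the remainder of key / tableSize.
// Assumes key >= 0 and tableSize > 0.
int hashDivision(int key, int tableSize) {
    return key % tableSize;
}

// Two distinct keys collide when they hash to the same index.
bool collides(int a, int b, int tableSize) {
    return a != b && hashDivision(a, tableSize) == hashDivision(b, tableSize);
}
```

With a table of size 10, hashDivision(6, 10) and hashDivision(26, 10) both give 6, so collides(6, 26, 10) is true.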
Open Hashing
In open hashing, the method used to resolve collisions is known as chaining: every index of the table holds
a linked list of all the keys that hash to it.
Let's first understand how chaining resolves a collision. Consider the keys 3, 2, 9, 6, 11, 13, 7, 12, a
table of size 10 and the hash function h(k) = (2k+3)%10. The keys 3, 2, 9 and 6 map to the indexes 9, 7, 1
and 5 without any collision.
The value 11 would be stored at the index 5. Now, we have two values (6, 11) at the same index, i.e., 5.
This leads to the collision problem, so we will use the chaining method to resolve the collision. We will
create one more list node and add the value 11 to it. The newly created node is then linked to the node
holding the value 6.
The value 13 would be stored at index 9. Now, we have two values (3, 13) at the same index, i.e., 9.
This leads to the collision problem, so we will again use the chaining method. We will create one more
list node and add the value 13 to it. The newly created node is then linked to the node holding the
value 3.
The value 7 would be stored at index 7. Now, we have two values (2, 7) at the same index, i.e., 7.
This leads to the collision problem, so we will again use the chaining method. We will create one more
list node and add the value 7 to it. The newly created node is then linked to the node holding the
value 2.
According to the above calculation, the value 12 must be stored at index 7, but the values 2 and 7 already
exist at index 7. So, we will create a new list node and add 12 to it. The newly created node will be
linked to the node holding the value 7.
The calculated index value associated with each key value is shown in the below table:
key Location(u)
3 ((2*3)+3)%10 = 9
2 ((2*2)+3)%10 = 7
9 ((2*9)+3)%10 = 1
6 ((2*6)+3)%10 = 5
11 ((2*11)+3)%10 = 5
13 ((2*13)+3)%10 = 9
7 ((2*7)+3)%10 = 7
12 ((2*12)+3)%10 = 7
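The chaining example can be reproduced with the sketch below, which uses the hash function from the table, (2k+3) % m, and keeps one linked list per index. The class name ChainedTable and its member names are assumptions made for illustration; non-negative keys are assumed.

```cpp
#include <cstddef>
#include <list>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(std::size_t m) : buckets_(m) {}

    // Hash function from the worked example: (2k + 3) % m.
    std::size_t hash(int key) const {
        return static_cast<std::size_t>(2 * key + 3) % buckets_.size();
    }

    // A colliding key is simply appended to the chain at its index.
    void insert(int key) {
        buckets_[hash(key)].push_back(key);
    }

    bool contains(int key) const {
        for (int k : buckets_[hash(key)])
            if (k == key)
                return true;
        return false;
    }

    // The chain stored at one index, in insertion order.
    const std::list<int> &chain(std::size_t index) const {
        return buckets_[index];
    }

private:
    std::vector<std::list<int>> buckets_;
};
```

After inserting 3, 2, 9, 6, 11, 13, 7, 12 into a table of size 10, the chain at index 5 holds 6 and 11, the chain at index 9 holds 3 and 13, and the chain at index 7 holds 2, 7 and 12.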
Closed Hashing
In closed hashing (also called open addressing), three techniques are used to resolve collisions:
1. Linear probing
2. Quadratic probing
3. Double Hashing technique
Linear Probing
Linear probing is one of the forms of open addressing. Each cell in the hash table contains a key-value
pair, so when a collision occurs because a new key maps to a cell already occupied by another key, the
linear probing technique searches for the closest free location and adds the new key to that empty cell.
The search is performed sequentially, starting from the position where the collision occurs and wrapping
around to index 0 at the end of the table, until an empty cell is found.
With the same keys and hash function as in the chaining example, the keys 3, 2, 9 and 6 are stored at the
indexes 9, 7, 1 and 5 without collision, and 11 (index 5, already filled) goes to the next empty cell, 6.
The next key value is 13. The index value associated with this key value is 9 when the hash function is
applied. The cell at index 9 is already filled. When linear probing is applied, the nearest empty cell
after index 9 is 0; therefore, the value 13 will be added at the index 0.
The next key value is 7. The index value associated with this key value is 7 when the hash function is
applied. The cell at index 7 is already filled. When linear probing is applied, the nearest empty cell
after index 7 is 8; therefore, the value 7 will be added at the index 8.
The next key value is 12. The index value associated with this key value is 7 when the hash function is
applied. The cell at index 7 is already filled, and so are the cells 8, 9, 0 and 1. When linear probing is
applied, the nearest empty cell after index 7 is therefore 2, and the value 12 will be added at the index 2.
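The linear probing walk-through can be sketched as below. The function name insertLinear and the EMPTY marker are illustrative assumptions; the hash function (2k+3) % m is the one from the chaining table, and keys are assumed to be non-negative.

```cpp
#include <vector>

const int EMPTY = -1;   // marker for an unused cell (keys are assumed non-negative)

// Stores key using linear probing and returns the index used,
// or -1 when every cell is occupied.
int insertLinear(std::vector<int> &table, int key) {
    int m = static_cast<int>(table.size());
    int u = (2 * key + 3) % m;            // home index from the hash function
    for (int i = 0; i < m; ++i) {
        int idx = (u + i) % m;            // step one cell at a time, wrapping around
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Inserting 3, 2, 9, 6, 11, 13, 7, 12 in that order into a table of ten EMPTY cells places 13 at index 0, 7 at index 8 and 12 at index 2, as described above.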
Quadratic Probing
In linear probing, searching is performed linearly. In contrast, quadratic probing is an open
addressing technique that uses a quadratic polynomial to search until an empty slot is found.
It can also be defined as inserting the key ki at the first free location from (u + i^2) % m, where u is the
hashed index, m is the table size and i = 0 to m-1.
The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5, respectively. We do not need to apply the
quadratic probing technique to these key values as no collision occurs.
The index value of 11 is 5, but this location is already occupied by 6. So, we apply the quadratic
probing technique.
When i = 0
Index = (5 + 0^2) % 10 = 5
When i = 1
Index = (5 + 1^2) % 10 = 6
Since location 6 is empty, the value 11 will be added at the index 6.
The next element is 13. When the hash function is applied to 13, the index value comes out to be 9,
as already seen in the chaining method. At index 9, the cell is occupied by another value, i.e.,
3. So, we will apply the quadratic probing technique to calculate the free location.
When i = 0
Index = (9 + 0^2) % 10 = 9
When i = 1
Index = (9 + 1^2) % 10 = 0
Since location 0 is empty, the value 13 will be added at the index 0.
The next element is 7. When the hash function is applied to 7, the index value comes out to be 7,
as already seen in the chaining method. At index 7, the cell is occupied by another value, i.e.,
2. So, we will apply the quadratic probing technique to calculate the free location.
When i = 0
Index = (7 + 0^2) % 10 = 7
When i = 1
Index = (7 + 1^2) % 10 = 8
Since location 8 is empty, the value 7 will be added at the index 8.
The next element is 12. When the hash function is applied to 12, the index value comes out to be 7.
Looking at the hash table, we see that the cell at index 7 is already occupied by the
value 2. So, we apply the quadratic probing technique to 12 to determine the free location.
When i = 0
Index = (7 + 0^2) % 10 = 7
When i = 1
Index = (7 + 1^2) % 10 = 8
When i = 2
Index = (7 + 2^2) % 10 = 1
When i = 3
Index = (7 + 3^2) % 10 = 6
When i = 4
Index = (7 + 4^2) % 10 = 3
Since location 3 is empty, the value 12 will be stored at the index 3.
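The probe sequences computed above can be reproduced with a short sketch. The function name insertQuadratic and the EMPTY marker are illustrative assumptions; the hash function (2k+3) % m comes from the chaining table, and keys are assumed to be non-negative.

```cpp
#include <vector>

const int EMPTY = -1;   // marker for an unused cell (keys are assumed non-negative)

// Stores key using quadratic probing, (u + i*i) % m for i = 0 .. m-1,
// and returns the index used, or -1 when no probed cell is free.
int insertQuadratic(std::vector<int> &table, int key) {
    int m = static_cast<int>(table.size());
    int u = (2 * key + 3) % m;            // home index from the hash function
    for (int i = 0; i < m; ++i) {
        int idx = (u + i * i) % m;        // the offset grows as the square of the attempt number
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Note that quadratic probing does not necessarily visit every cell, so an insertion can fail even when the table still has free cells.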
The final hash table would be:
Index   Key
0       13
1       9
2       -
3       12
4       -
5       6
6       11
7       2
8       7
9       3
Double Hashing
Double hashing is an open addressing technique which is used to resolve collisions. When a collision
occurs, this technique uses a secondary hash of the key as the step size and moves forward by that step
until an empty location is found.
In double hashing, two hash functions are used. Suppose h1(k) is the hash function used to calculate
the location, whereas h2(k) is another hash function. It can be defined as "insert ki at first free place
from (u+v*i)%m where i=(0 to m-1)". In this case, u is the location computed using the hash function h1,
i.e., (h1(k)%m), and v is equal to (h2(k)%m).
h1(k) = 2k+3
h2(k) = 3k+1
key   Location (u)            v                     probes
3     ((2*3)+3)%10 = 9        -                     1
2     ((2*2)+3)%10 = 7        -                     1
9     ((2*9)+3)%10 = 1        -                     1
6     ((2*6)+3)%10 = 5        -                     1
11    ((2*11)+3)%10 = 5       (3(11)+1)%10 = 4      3
13    ((2*13)+3)%10 = 9       (3(13)+1)%10 = 0
7     ((2*7)+3)%10 = 7        (3(7)+1)%10 = 2
12    ((2*12)+3)%10 = 7       (3(12)+1)%10 = 7      2
As we know, no collision occurs while inserting the keys (3, 2, 9, 6), so we will not apply double
hashing to these key values.
On inserting the key 11 into the hash table, a collision occurs because the calculated index value of 11
is 5, which is already occupied by another value. Therefore, we will apply the double hashing technique
to key 11. When the key value is 11, the value of v is 4.
When i=0
Index = (5+4*0)%10 =5
When i=1
Index = (5+4*1)%10 = 9
When i=2
Index = (5+4*2)%10 = 3
Since location 3 is empty in the hash table, the key 11 is added at the index 3.
The next element is 13. The calculated index value of 13 is 9, which is already occupied by another
key value. So, we will use the double hashing technique to find a free location. The value of v is 0.
When i=0
Index = (9+0*0)%10 = 9
We will get the value 9 in all the iterations from 0 to m-1 because the value of v is zero. Therefore, we
cannot insert 13 into the hash table.
The next element is 7. The calculated index value of 7 is 7, which is already occupied by another key
value. So, we will use the double hashing technique to find a free location. The value of v is 2.
When i=0
Index = (7 + 2*0)%10 = 7
When i=1
Index = (7+2*1)%10 = 9
When i=2
Index = (7+2*2)%10 = 1
When i=3
Index = (7+2*3)%10 = 3
When i=4
Index = (7+2*4)%10 = 5
When i=5
Index = (7+2*5)%10 = 7
When i=6
Index = (7+2*6)%10 = 9
When i=7
Index = (7+2*7)%10 = 1
When i=8
Index = (7+2*8)%10 = 3
When i=9
Index = (7+2*9)%10 = 5
We checked all the cases of i (from 0 to 9), but did not find a suitable place to insert 7: the probe
sequence only visits the indexes 7, 9, 1, 3 and 5, all of which are occupied. Therefore, key 7 cannot be
inserted into the hash table.
The next element is 12. The calculated index value of 12 is 7, which is already occupied by another
key value. So, we will use the double hashing technique to find a free location. The value of v is 7.
Now, substituting the values of u and v in (u+v*i)%m
When i=0
Index = (7+7*0)%10 = 7
When i=1
Index = (7+7*1)%10 = 4
Since location 4 is empty, the key 12 is inserted at the index 4.
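The double hashing walk-through above can be reproduced with the following sketch, which uses h1(k) = 2k+3 and h2(k) = 3k+1 from the table. The function name insertDouble and the EMPTY marker are illustrative assumptions, and keys are assumed to be non-negative.

```cpp
#include <vector>

const int EMPTY = -1;   // marker for an unused cell (keys are assumed non-negative)

// Stores key using double hashing, (u + v*i) % m for i = 0 .. m-1,
// and returns the index used, or -1 when no probed cell is free.
int insertDouble(std::vector<int> &table, int key) {
    int m = static_cast<int>(table.size());
    int u = (2 * key + 3) % m;            // h1(k) % m: the home index
    int v = (3 * key + 1) % m;            // h2(k) % m: the step size
    for (int i = 0; i < m; ++i) {
        int idx = (u + v * i) % m;
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;                            // e.g. v == 0 keeps probing the same cell
}
```

With these two hash functions the insertions of 13 and 7 return -1, matching the cases above in which the keys cannot be placed. Choosing h2 so that v is never zero and shares no common factor with m would avoid both failures.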
REHASHING
Rehashing is a technique in which the table is resized, i.e., the size of the table is doubled by
creating a new table, and the elements of the old table are inserted again into the new one. It is
preferable if the total size of the table is a prime number. There are situations in which rehashing is
required.
Consider that we have to insert the elements 37, 90, 55, 22, 17, 49, and 87. The table
size is 10 and we will use the hash function key % 10:
37 % 10 = 7
90 % 10 = 0
55 % 10 = 5
22 % 10 = 2
17 % 10 = 7 (collision, solved by linear probing)
49 % 10 = 9
Now this table is almost full, and if we try to insert more elements collisions will
occur and eventually further insertions will fail. Hence we will rehash by
doubling the table size. The old table size is 10, so doubling it gives 20 for the
new table. But 20 is not a prime number, so we will prefer to
make the table size as 23. And new hash function will be
Advantages:
1. This technique gives the programmer the flexibility to enlarge the table size if
required.
2. Only the space gets doubled, and a simple hash function can still be used, which
reduces the occurrence of collisions.
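Rehashing can be sketched as follows. The code assumes that the new hash function is key % 23, which follows the key % table size pattern of the old table but is not stated in the notes. The names insertLinear and rehash and the EMPTY marker are illustrative, and keys are assumed to be non-negative.

```cpp
#include <cstddef>
#include <vector>

const int EMPTY = -1;   // marker for an unused cell (keys are assumed non-negative)

// Linear-probing insert with hash key % table size; returns the index used, or -1 if full.
int insertLinear(std::vector<int> &table, int key) {
    int m = static_cast<int>(table.size());
    for (int i = 0; i < m; ++i) {
        int idx = (key % m + i) % m;
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}

// Build a larger table and insert every key of the old table into it again.
// Each index is recomputed because the hash depends on the table size.
std::vector<int> rehash(const std::vector<int> &old, std::size_t newSize) {
    std::vector<int> bigger(newSize, EMPTY);
    for (int key : old)
        if (key != EMPTY)
            insertLinear(bigger, key);
    return bigger;
}
```

For the keys 37, 90, 55, 22, 17 and 49, rehashing from size 10 to size 23 under this assumption places every key at its home index (for example 37 % 23 = 14 and 17 % 23 = 17), so the collision between 37 and 17 disappears.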
Extendible Hashing
Extendible hashing is a dynamic hashing method in which directories and buckets are used to
hash data. It is an aggressively flexible method in which the hash function also undergoes
dynamic changes.
Main features of Extendible Hashing: The main features of this hashing technique are:
Directories: The directories store the addresses of the buckets in pointers. An id is assigned to
each directory, which may change each time directory expansion takes place.
Buckets: The buckets are used to hash the actual data.
Basic Structure of Extendible Hashing:
Directories: These containers store pointers to buckets. Each directory is given a unique id,
which may change each time expansion takes place. The hash function returns this
directory id, which is used to navigate to the appropriate bucket. Number of directories =
2^Global Depth.
Buckets: They store the hashed keys. Directories point to buckets. A bucket may have
more than one pointer to it if its local depth is less than the global depth.
Global Depth: It is associated with the directories. It denotes the number of bits
used by the hash function to categorize the keys. Global Depth = number of bits in the
directory id.
Local Depth: It is the same as the global depth, except that the local depth is
associated with the buckets and not the directories. The local depth, compared with the
global depth, is used to decide the action to be performed when an overflow occurs.
The local depth is always less than or equal to the global depth.
Bucket Splitting: When the number of elements in a bucket exceeds a particular size,
the bucket is split into two parts.
Directory Expansion: Directory expansion takes place when a bucket overflows and the
local depth of the overflowing bucket is equal to the global depth.
Basic working
Example based on Extendible Hashing: Now, let us consider a prominent example of
hashing the following elements: 16,4,6,22,24,10,31,7,9,20,26.
Bucket Size: 3 (Assume)
Hash Function: Suppose the global depth is X. Then the Hash Function returns X LSBs.
Solution: First, calculate the binary forms of each of the given numbers.
16- 10000
4- 00100
6- 00110
22- 10110
24- 11000
10- 01010
31- 11111
7- 00111
9- 01001
20- 10100
26- 11010
Initially, the global depth and the local depth are both 1. Thus, the hashing frame looks like
this:
Inserting 16:
The binary format of 16 is 10000 and the global depth is 1. The hash function returns the 1 LSB of
10000, which is 0. Hence, 16 is mapped to the directory with id=0.
Inserting 4 and 6:
Both 4 (100) and 6 (110) have 0 in their LSB. Hence, they are hashed as follows:
Inserting 22: The binary form of 22 is 10110. Its LSB is 0. The bucket pointed to by
directory 0 is already full. Hence, overflow occurs.
As directed by Step 7-Case 1, since Local Depth = Global Depth, the bucket splits and
directory expansion takes place. Also, the numbers present in the overflowing
bucket are rehashed after the split. And, since the global depth is incremented by 1, the
global depth is now 2. Hence, 16, 4, 6, 22 are now rehashed w.r.t. 2 LSBs
[16 (10000), 4 (100), 6 (110), 22 (10110)].
Inserting 24 and 10: 24 (11000) and 10 (1010) are hashed to the directories with ids
00 and 10, respectively. Here, we encounter no overflow condition.
Inserting 31, 7, 9: All of these elements [31 (11111), 7 (111), 9 (1001)] have either 01 or 11
as their 2 LSBs. Hence, they are mapped to the bucket pointed to by 01 and 11. We do not
encounter any overflow condition here.
Inserting 20: Insertion of data element 20 (10100) will again cause an overflow.
20 belongs in the bucket pointed to by 00, which is full. As directed by Step 7-Case 1, since the
local depth of the bucket = global depth, directory expansion (doubling) takes place along
with bucket splitting. Elements present in the overflowing bucket are rehashed with the new
global depth. Now, the new hash table looks like this:
Inserting 26: The global depth is 3. Hence, the 3 LSBs of 26 (11010) are considered. Therefore, 26
best fits in the bucket pointed to by directory 010.
The bucket overflows and, as directed by Step 7-Case 2, since the local depth of the bucket
< global depth (2<3), the directories are not doubled; only the bucket is split and its elements
are rehashed.
Finally, the output of hashing the given list of numbers is obtained.
Hashing of 11 Numbers is Thus Completed.
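The worked example can be checked with the sketch below. It is a minimal illustration under stated assumptions, not a complete implementation: keys are non-negative, the hash is the lowest Global Depth bits of the key, deletion is omitted, and the names Bucket, ExtendibleHash, bucketOf and localDepthOf are invented for this sketch.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// One bucket: its local depth and the keys it holds.
struct Bucket {
    int localDepth;
    std::vector<int> keys;
};

class ExtendibleHash {
public:
    explicit ExtendibleHash(std::size_t bucketSize)
        : bucketSize_(bucketSize), globalDepth_(1) {
        // Global depth 1: two directory entries (0 and 1), each with its own bucket.
        dir_.push_back(std::make_shared<Bucket>(Bucket{1, {}}));
        dir_.push_back(std::make_shared<Bucket>(Bucket{1, {}}));
    }

    void insert(int key) {
        std::shared_ptr<Bucket> b = dir_[index(key)];
        if (b->keys.size() < bucketSize_) {       // room left: no overflow
            b->keys.push_back(key);
            return;
        }
        if (b->localDepth == globalDepth_) {      // Case 1: double the directory first
            std::vector<std::shared_ptr<Bucket>> copy = dir_;
            dir_.insert(dir_.end(), copy.begin(), copy.end());  // new half mirrors the old
            ++globalDepth_;
        }
        // Cases 1 and 2: split the bucket on the next more significant bit.
        std::size_t bit = std::size_t{1} << b->localDepth;
        auto zero = std::make_shared<Bucket>(Bucket{b->localDepth + 1, {}});
        auto one = std::make_shared<Bucket>(Bucket{b->localDepth + 1, {}});
        for (std::size_t i = 0; i < dir_.size(); ++i)
            if (dir_[i] == b)
                dir_[i] = (i & bit) ? one : zero;
        for (int k : b->keys)
            insert(k);                            // rehash the old contents
        insert(key);                              // then retry the new key
    }

    int globalDepth() const { return globalDepth_; }
    int localDepthOf(int key) const { return dir_[index(key)]->localDepth; }
    const std::vector<int> &bucketOf(int key) const { return dir_[index(key)]->keys; }

private:
    // Directory id = the globalDepth_ least significant bits of the key.
    std::size_t index(int key) const {
        return static_cast<std::size_t>(key) & ((std::size_t{1} << globalDepth_) - 1);
    }

    std::size_t bucketSize_;
    int globalDepth_;
    std::vector<std::shared_ptr<Bucket>> dir_;
};
```

Inserting 16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26 with a bucket size of 3 ends with a global depth of 3. The bucket holding 31, 7 and 9 never overflows, so it keeps local depth 1 and is shared by the directory entries 001, 011, 101 and 111.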