Data Structures Unit-2 Notes
Data Structures Unit-2 Notes
results = {'Detra' : 17,'Nova' : 84,'Charlie' : 22, 'Henry' : 75, 'Roxanne' : 92, 'Elsa' : 29}
Note:The keys in a dictionary must be simple types such as integers or strings while the values can be of
any type.
Operations of Dictionary:
1. Insertion
2. Deletion
3. Search
Example:
Consider an empty unordered dictionary and the following set of operations:
Implementation of Dictionary:
Dictionary is implemented in four ways:
1. Linear List Representation
2. Skip List Representation
3. Hashing
4. Trees
Linear List Representation
The dictionary can be represented as a linear list.The linear list is a collection of pairs(key & value).
EX:
key value next key value next key value next key value next
1 40 2 50 4 70 6 80
struct node
{
int key;
int value;
struct node *next;
}*head=NULL;
void insert();
void delete();
void search();
1.Insertion:
Step 1: Create a new node say ‘ptr’ with key and value.
Step 2: Check whether dictionary is EMPTY. (head==NULL)
Step 3:If Dictionary is Empty,then set head=ptr and define node pointer curr aand initialize it with head. i.e
curr=head.
Step 4: If it is not Empty then check the condition ptr->key>curr->key.
Step 5: If it is True then, set curr->next=ptr,curr=ptr.
Step 6: If it is not true then,define two node pointers temp and temp1 and initialize them with head.
Step 7: Keep moving the temp to its next node until the condition is true.
Step 8:Move temp1 to next node until the condition is true.Then set ptr->next=temp1->next ,temp1-
>next=ptr.
curr->next=ptr;
curr=ptr;
}
else
{
struct node *temp=*temp1=head;
while(temp->key<ptr->key)
temp=temp->next;
for(;temp1->next!=temp;temp1=temp1->next);
ptr->next=temp1->next;
temp1->next=ptr;
}
}
}
2.Deletion
3.Search
1.Searching Process
• When an element is tried to search, the search begins at the head element of the top list.
• It proceeds horizontally until the current element is greater than or equal to the target.
• If current element and target are matched, it means they are equal and search gets finished.
• If the current element is greater than target, the search goes on and reaches to the end of the
linked list, the procedure is repeated after returning to the previous element and the search
reaches to the next lower list (vertically).
2.Insertion
• The insertion algorithm for skip lists uses randomization to decide how many references to the new
item (k,e) should be added to the skip list.
• We then insert (k,e) in this bottom-level list immediately after position p.
• After inserting the new item at this level we “flip a coin”.
• If the flip comes up tails, then we stop right there.
• If the flip comes up heads, we move to next higher level and insert (k,e) in this level at the
appropriate position.
Exapmle:
• Suppose we want to insert 15
• Do a search, and find the spot between 10 and 23
3.Deletion
Example
1) Suppose we want to delete 34
2) Do a search, find the spot between 23 and 45
3) Remove all the position above p
1. 34<+ “drop down”
2. 34=34 “return element(after(p)”
3. Remove all the position above 34
Hash Table
0 72 H(36)=36%8=4
1 H(18)=18%8=2
2 18 H(72)=72%8=0
3 43 H(43)=43%8=3
4 36 H(6)=6%8=6
5
6 6
7
Collision
If the hash function returns same hash key for more than one element then this situation is called Collision.
(OR)
If x1 and x2 are two different keys,but the hash values of x1 and x2 are equal(i.e, h(x1)=h(x2)) then it is
called as a Collision.
0 H(131)=131%10=1
1 131
2 H(3)=3%10=3
3 3
4 4 H(4)=4%10=4
5
6 H(21)=21%10=1
7 97 Collision
8 H(61)=61%10=1
9 89
H(24)=24%10=4
H(7)=7%10=7
H(97)=97%10=7
H(89)=89%10=9
Collision Resolution Techniques
Collision Resolution Techniques are the techniques used for resolving or handling the collision.
Collision resolution techniques are classified as
Separate Chaining
To handle the collision,
• This technique creates a linked list to the slot for which collision occurs.
• The new key is then inserted in the linked list.
• These linked lists to the slots appear like chains.
• That is why, this technique is called as separate chaining.
Problem
Insert the following sequence of keys in the hash table
50, 700, 76, 85, 92, 73 and 101.
Use separate chaining technique for collision resolution. Table size is 7.
Solution
• Draw an empty hash table consisting of 7 buckets.
• The possible range of hash values is [0, 6].
• Insert the given keys in the hash table one by one.
• Hashing Formula is H(Key)=Key mod Table Size.
Insert 50:
H(50)=50 mod 7
H(50)=1
So,insert 50 in bucket-1 of the hash table.
Insert 700:
H(700)=700 mod 7
H(700)=0
So,insert 700 in bucket-0 of the hash table.
Insert 76:
H(76)=76 mod 7
H(76)=6
So,insert 76 in bucket-6 of the hash table.
Insert 85:
H(85)=85 mod 7
H(85)=1
Since bucket-1 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to bucket-1.
So,insert 85 in bucket-1 of the hash table.
Insert 92:
H(92)=92 mod 7
H(92)=1
Since bucket-1 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to bucket-1.
So,insert 92 in bucket-1 of the hash table.
Insert 73:
H(73)=73 mod 7
H(73)=3
Since bucket-3 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to bucket-3.
So,insert 73 in bucket-3 of the hash table.
Insert 101:
H(101)=101 mod 7
H(101)=3
Since bucket-3 is already occupied, so collision occurs.
Separate chaining handles the collision by creating a linked list to bucket-3.
So,insert 101 in bucket-3 of the hash table.
Open Addressing
In open addressing,
• Unlike separate chaining, all the keys are stored inside the hash table.
• No key is stored outside the hash table.
Advantage
• It is easy to compute.
Disadvantage
• The main problem with linear probing is clustering.
• Many consecutive elements form groups.
• Then, it takes time to search an element or to find an empty bucket.
Problem
Insert the following sequence of keys in the hash table
50, 700, 76, 85, 92, 73,101.
Use linear probing technique for collision resolution.Table size is 7
Solution
• Draw an empty hash table consisting of 7 buckets.
• The possible range of hash values is [0, 6].
• Insert the given keys in the hash table one by one.
• Hashing Formula is H(Key)=Key mod Table Size.
Insert 50:
• H(50)=50 mod 7
• H(50)=1
• So,insert 50 in bucket-1 of the hash table.
Insert 700:
• H(700)=700 mod 7
• H(700)=0
• So,insert 700 in bucket-0 of the hash table.
Insert 76:
• H(76)=76 mod 7
• H(76)=6
• So,insert 76 in bucket-6 of the hash table.
Insert 85:
• H(85)=85 mod 7
• H(85)=1
• Since bucket-1 is already occupied, so collision occurs.
• To handle the collision, linear probing technique keeps probing linearly until an empty
bucket is found.
• The first empty bucket is bucket-2.
• So,insert 85 in bucket-2 of the hash table.
Insert 92:
• H(92)=92 mod 7
• H(92)=1
• Since bucket-1 is already occupied, so collision occurs.
• To handle the collision, linear probing technique keeps probing linearly until an empty
bucket is found.
• The first empty bucket is bucket-3.
• So,insert 92 in bucket-3 of the hash table.
Insert 73:
• H(73)=73 mod 7
• H(73)=3
• Since bucket-3 is already occupied, so collision occurs.
• To handle the collision, linear probing technique keeps probing linearly until an empty
bucket is found.
• The first empty bucket is bucket-4.
• So,insert 73 in bucket-4 of the hash table.
Insert 101:
• H(101)=101 mod 7
• H(101)=3
• Since bucket-3 is already occupied, so collision occurs.
• To handle the collision, linear probing technique keeps probing linearly until an empty
bucket is found.
• The first empty bucket is bucket-5.
• So,insert 101 in bucket-5 of the hash table.
Quadratic Probing
In quadratic probing,
• When collision occurs, we probe for i2‘th bucket in ith iteration.
• We keep probing until an empty bucket is found.
Where f(i)=i2 (Initially i=0 when collision occurs i value is increased by one).
hash(X)=X mod Tablesize
Problem:
1. Insert the following list of keys in the hash table by using Quadratic Probing Technique for
Collision Resolution.
23 13 12 22 18 28 32 53 6. Hash Table size is 11
Solution:
Insert 23
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(23)=((Hash(23)+f(0)) mod 11
▪ Hash(23)=23 mod 11
▪ Hash(23)=1
▪ H0(23)=((1+0)) mod 11
▪ H0(23)=1 mod 11
▪ H0(23)=1
0
1 23
2
3
4
5
6
7
8
9
10
Insert 13
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(13)=((Hash(13)+f(0)) mod 11
▪ Hash(13)=13 mod 11
▪ Hash(13)=2
▪ H0(13)=((2+0)) mod 11
▪ H0(13)=2 mod 11
▪ H0(13)=2
0
1 23
2 13
3
4
5
6
7
8
9
10
Insert 12
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(12)=((Hash(12)+f(0)) mod 11
▪ Hash(12)=12 mod 11
▪ Hash(12)=1
▪ H0(12)=((1+0)) mod 11
▪ H0(12)=1 mod 11
▪ H0(12)=1
▪ Collision is Occurred at Bucket-1.
▪ H1(12)=((Hash(12)+f(1)) mod 11
▪ H1(12)=(1+12) mod 11
▪ H1(12)=2 mod 11
▪ H1(12)=2
▪ Collision is Occurred at Bucket-2
▪ H2(12)=((Hash(12)+f(2)) mod 11
▪ H2(12)=(1+22) mod 11
▪ H2(12)=5 mod 11
▪ H2(12)=5
0
1 23
2 13
3
4
5 12
6
7
8
9
10
Insert 22
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(22)=((Hash(22)+f(0)) mod 11
▪ Hash(22)=22 mod 11
▪ Hash(22)=0
▪ H0(22)=((0+0)) mod 11
▪ H0(22)=0 mod 11
▪ H0(22)=0
0 22
1 23
2 13
3
4
5 12
6
7
8
9
10
Insert 18
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(18)=((Hash(18)+f(0)) mod 11
▪ Hash(18)= 18 mod 11
▪ Hash(18)=7
▪ H0(18)=((7+0)) mod 11
▪ H0(18)=7 mod 11
▪ H0(18)=7
0 22
1 23
2 13
3
4
5 12
6
7 18
8
9
10
Insert 28
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(28)=((Hash(28)+f(0)) mod 11
▪ Hash(28)= 28 mod 11
▪ Hash(28)=6
▪ H0(28)=((6+0)) mod 11
▪ H0(28)=6 mod 11
▪ H0(28)=6
0 22
1 23
2 13
3
4
5 12
6 28
7 18
8
9
10
Double Hashing
Double hashing is a collision resolving technique in Open Addressed Hash tables. Double hashing uses the
idea of applying a second hash function to key when a collision occurs.
Problem
Insert the following list of keys in the hash table by using Double Hashing Technique for
Collision Resolution.Table size is 10.
89,18,49,58,69,60
Solution:
Insert 89
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(89)=((Hash(89)+f(0)) mod 10
▪ Hash(89)= 89 mod 10
▪ Hash(89)=9
▪ H0(89)=((9+0)) mod 10
▪ H0(89)=9 mod 10
▪ H0(89)=9
0
1
2
3
4
5
6
7
8
9 89
Insert 18
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(18)=((Hash(18)+f(0)) mod 10
▪ Hash(18)= 18 mod 10
▪ Hash(18)=8
▪ H0(18)=((8+0)) mod 10
▪ H0(18)=8 mod 10
▪ H0(18)=8
0
1
2
3
4
5
6
7
8 18
9 89
Insert 49
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(49)=((Hash(49)+f(0)) mod 10
▪ Hash(49)= 49 mod 10
▪ Hash(49)=9
▪ H0(49)=((9+0)) mod 10
▪ H0(49)=9 mod 10
▪ H0(49)=9
▪ Collision is Occurred at Bucket-9.
▪ H1(49)=((Hash(49)+f(1)) mod 10
▪ H1(49)=((9+(1*Hash2(49)) mod 10
▪ Hash2(49)=7-(49 mod 7)
▪ Hash2(49)=7-0
▪ Hash2(49)=7
▪ H1(49)=((9+7) mod 10
▪ H1(49)=16 mod 10
▪ H1(49)=6
▪
0
1
2
3
4
5
6 49
7
8 18
9 89
Insert 58
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(58)=((Hash(58)+f(0)) mod 10
▪ Hash(58)= 58 mod 10
▪ Hash(58)=8
▪ H0(58)=((8+0)) mod 10
▪ H0(58)=8 mod 10
▪ H0(58)=8
▪ Collision is Occurred at Bucket-8.
▪ H1(58)=((Hash(58)+f(1)) mod 10
▪ H1(58)=((8+(1*Hash2(58)) mod 10
▪ Hash2(58)=7-(58 mod 7)
▪ Hash2(58)=7-2
▪ Hash2(58)=5
▪ H1(58)=((8+5) mod 10
▪ H1(58)=13 mod 10
▪ H1(58)=3
▪
0
1
2
3 58
4
5
6 49
7
8 18
9 89
Insert 69
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(69)=((Hash(69)+f(0)) mod 10
▪ Hash(69)= 69 mod 10
▪ Hash(69)=9
▪ H0(69)=((9+0)) mod 10
▪ H0(69)=9 mod 10
▪ H0(69)=9
▪ Collision is Occurred at Bucket-9.
▪ H1(69)=((Hash(69)+f(1)) mod 10
▪ H1(69)=((9+(1*Hash2(69)) mod 10
▪ Hash2(69)=7-(69 mod 7)
▪ Hash2(69)=7-6
▪ Hash2(69)=1
▪ H1(69)=((9+1) mod 10
▪ H1(69)=10 mod 10
▪ H1(69)=0
0 69
1
2
3 58
4
5
6 49
7
8 18
9 89
Insert 60
▪ Hi(X)=((Hash(X)+f(i)) mod Tablesize
▪ H0(60)=((Hash(60)+f(0)) mod 10
▪ Hash(60)= 60 mod 10
▪ Hash(60)=0
▪ H0(60)=((0+0)) mod 10
▪ H0(60)=0 mod 10
▪ H0(60)=0
▪ Collision is Occurred at Bucket-0.
▪ H1(60)=((Hash(60)+f(1)) mod 10
▪ H1(60)=((0+(1*Hash2(60)) mod 10
▪ Hash2(60)=7-(60 mod 7)
▪ Hash2(60)=7-4
▪ Hash2(60)=3
▪ H1(60)=((0+3) mod 10
▪ H1(60)=3 mod 10
▪ H1(60)=3
▪ Collision is Occurred at Bucket-3.
▪ H2(60)=((Hash(60)+f(2)) mod 10
▪ H2(60)=((0+(2*Hash2(60)) mod 10
▪ H2(60)=((0+(2*3)) mod 10
▪ H2(60)=((0+6) mod 10
▪ H2(60)=6 mod 10
▪ H2(60)=6
▪ Collision is Occurred at Bucket-6.
▪ H3(60)=((0+(3*Hash2(60)) mod 10
▪ H3(60)=((0+(3*3)) mod 10
▪ H3(60)=9 mod 10
▪ H3(60)=9
▪ Collision is Occurred at Bucket-9.
▪ H4(60)=((0+(4*Hash2(60)) mod 10
▪ H4(60)=((0+(4*3)) mod 10
▪ H4(60)=12 mod 10
▪ H4(60)=2
0 60
1
2 60
3 58
4
5
6 49
7
8 18
9 89
REHASHING
Rehashing is a technique in which the table is resized, i.e., the size of table is doubled by creating a
new table. It is preferable is the total size of table is a prime number. There are situations in which
the rehashing is required.
• When table is completely full
• With quadratic probing when the table is filled half.
• When insertions fail due to overflow.
In such situations, we have to transfer entries from old table to the new table by re computing
their positions using hash functions.
Example:
Consider we have to insert the elements 37, 90, 55, 22, 16, 49, 33 and 88.
Table size is 10
Solution :
h(37) = 37 mod 10 = 7
Buckets Key
h(90) = 90 mod 10 = 0
h(55) = 55 mod 10 = 5 0 90
h(22) = 22 mod 10 = 2 1
h(16) = 16 mod 10 = 6 2 22
h(49) = 49 mod 10 = 9 3 33
h(33) = 33 mod 10 = 3 4
h(88) = 88 mod 10 = 8 5 55
6 16
7 37
8 88
9 49
Now this table is almost full and if we try to insert more elements collisions will occur and eventually
further insertions will fail. Hence we will rehash by doubling the table size. The old table size is 10 then
we should double this size for new table that becomes 20. But 20 is not a prime number, we will prefer to
make the table size as 23.
New Hash Table
Buckets Key
h(X) = X mod Table size
0
h(37) = 37 mod 23 = 14 1
h(90) = 90 mod 23 = 21 2
h(55) = 55 mod 23 = 9 3 49
h(22) = 22 mod 23 = 22
4
h(16) = 16 mod 23 = 16
5
h(49) = 49 mod 23 = 3
6
h(33) = 33 mod 23= 10
7
h(88) = 88 mod 23 = 19
8
9 55
10 33
11
12
13
14 37
15
16 16
17
18
19 88
20
21 90
22 22
Advantages:
1. This technique provides the programmer a flexibility to enlarge the table size if required.
2. Only the space gets doubled with simple hash function which avoids occurrence of collisions.
EXTENDABLE HASHING
Extendible Hashing is a dynamic hashing method wherein directories, and buckets are used
to hash data. It is an aggressively flexible method in which the hash function also experiences
dynamic changes.
Main features of Extendible Hashing: The main features in this hashing technique are:
Step 1 – Analyze Data Elements: Data elements may exist in various forms eg. Integer, String, Float, etc..
Currently, let us consider data elements of type integer. eg: 49.
Step 2 – Convert into binary format: Convert the data element in Binary form. For string elements,
consider the ASCII equivalent integer of the starting character and then convert the integer into binary
form. Since we have 49 as our data element, its binary form is 110001.
Step 3 – Check Global Depth of the directory. Suppose the global depth of the Hash-directory is 3.
Step 4 – Identify the Directory: Consider the ‘Global-Depth’ number of LSBs in the binary number and
match it to the directory id.
Eg. The binary obtained is: 110001 and the global-depth is 3. So, the hash function will return 3 LSBs of
110001 viz. 001.
Step 5 – Navigation: Now, navigate to the bucket pointed by the directory with directory-id 001.
Step 6 – Insertion and Overflow Check: Insert the element and check if the bucket overflows. If an
overflow is encountered, go to step 7 followed by Step 8, otherwise, go to step 9.
Step 7 – Tackling Over Flow Condition during Data Insertion: Many times, while inserting data in the
buckets, it might happen that the Bucket overflows. In such cases, we need to follow an appropriate
procedure to avoid mishandling of data.
First, Check if the local depth is less than or equal to the global depth. Then choose one of the cases below.
• Case 1: If the local depth of the overflowing Bucket is equal to the global depth, then Directory
Expansion, as well as Bucket Split, needs to be performed. Then increment the global depth and
the local depth value by 1. And, assign appropriate pointers.
Directory expansion will double the number of directories present in the hash structure.
• Case 2: In case the local depth is less than the global depth, then only Bucket Split takes place.
Then increment only the local depth value by 1. And, assign appropriate pointers.
Step 8 – Rehashing of Split Bucket Elements: The Elements present in the overflowing bucket that is split
are rehashed w.r.t the new global depth of the directory.
Solution: First, calculate the binary forms of each of the given Keys.
Step1 : Initially, the global-depth and local-depth is always 1. Thus, the hashing frame is
id=0.
Since Local Depth = Global Depth, the bucket splits and directory expansion takes place. Also,
rehashing of numbers present in the overflowing bucket takes place after the split. And, since the
global depth is incremented by 1, now,the global depth is 2. Hence, 16,4,6,22 are now rehashed
w.r.t 2 LSBs.[ 16(10000),4(100),6(110),22(10110) ]
Step 6: Inserting 24 : The binary format of 24 is 11000 and Global depth is 2.
It is LSB of 11000 which is 00. Hence, 24 is mapped to the directory with id=00.
It is LSB of 11111 which is 11. Hence, 31 is mapped to the directory with id=11.
It is LSB of 00111 which is 11. Hence, 7 is mapped to the directory with id=11.
Step 10: Inserting 9 : The binary format of 9 is 01001 and Global depth is 2.
It is LSB of 01001 which is 01. Hence, 9 is mapped to the directory with id=01.
Step 11: Inserting 20 : The binary format of 20 is 10100 and Global depth is 2.
It is LSB of 10100 which is 00. Hence, 20 is mapped to the directory with id=00.
Since Local Depth = Global Depth, the bucket splits and directory expansion takes place. Also,
rehashing of numbers present in the overflowing bucket takes place after the split. And, since the
global depth is incremented by 1, now, the global depth is 3. Hence, 16,4,24,20 are now rehashed
w.r.t 3 LSBs.[ 16(10000),4(00100),24(11000),20(10100) ]
Step 12: Inserting 26 : The binary format of 26 is 11010 and Global depth is 3.
It is LSB of 11010 which is 010. Hence, 26 is mapped to the directory with id=010.
The bucket pointed by directory 010 is Already full. Hence , overflow occurs.
since the local depth of bucket < Global depth (2<3), directories are not doubled but,only
the bucket is split and Hence, 6,22,10,26 are now rehashed w.r.t 3 LSBs.[ 6(00110),
22(10110), 10(01010), 26(11010) ].