Data Structures Unit-II
DICTIONARIES:
A dictionary is a collection of pairs of key and value, where every value is associated with its
corresponding key.
The basic operations that can be performed on a dictionary are:
1. Insertion of value in the dictionary
2. Deletion of particular value from dictionary
3. Searching of a specific value with the help of key
class dictionary
{
private:
int k,data;
struct node
{
public: int key;
int value;
struct node *next;
} *head;
public: dictionary();
void insert_d( );
void delete_d( );
void display_d( );
void length();
};
Insertion of new node in the dictionary:
Consider that initially dictionary is empty then
head = NULL
We will create a new node with some key and value contained in it.
Now as head is NULL, this new node becomes head. Hence the dictionary contains only one
record. This node will be ‘curr’ and ‘prev’ as well. The ‘curr’ node will always point to the node
currently being visited and ‘prev’ will always point to the node previous to the ‘curr’ node. As there
is now only one node in the list, mark the ‘curr’ node as the ‘prev’ node as well.
[Figure: a single node <1,10> marked New/head/curr/prev with next = NULL, and a second node <4,20> marked New with next = NULL]
Compare the key value of the ‘curr’ and ‘New’ nodes. If New->key > curr->key then attach the New
node after the ‘curr’ node.
If we insert <3,15> then we have to search for its proper position by comparing key values.
Here (curr->key < New->key) is false, hence the else part will get executed.
[Figure: the chain <1,10> -> <4,20> -> <7,80> -> NULL, with the new node <3,15> to be inserted between <1,10> and <4,20>]
void dictionary::insert_d( )
{
    node *p,*curr,*prev;
    cout<<"Enter a key and value to be inserted:";
    cin>>k;
    cin>>data;
    p=new node;
    p->key=k;
    p->value=data;
    p->next=NULL;
    // Walk the sorted chain until the first node whose key is not smaller than the new key.
    // prev stays NULL when the new node belongs at the front (this also covers an empty list).
    prev=NULL;
    curr=head;
    while((curr!=NULL)&&(curr->key<p->key))
    {
        prev=curr;
        curr=curr->next;
    }
    if(prev==NULL)
    {
        // Empty list, or the new key is the smallest: the new node becomes head.
        p->next=head;
        head=p;
    }
    else
    {
        // Link the new node between prev and curr (curr is NULL at the end of the chain).
        p->next=curr;
        prev->next=p;
    }
    cout<<"\nInserted into dictionary successfully .... \n";
}
Deletion of a node from the dictionary:
Case 1: Initially assign the ‘head’ node as the ‘curr’ node. Then ask for the key value of the node which is
to be deleted. Starting from the head node, the key value of each node is checked and compared with the
desired node’s key value. We will get the node which is to be deleted in the variable ‘curr’. The node
given by the variable ‘prev’ keeps track of the node previous to the ‘curr’ node. For eg, to delete the node
with key value 4:
[Figure: the chain <1,10> -> <3,15> -> <4,20> -> <7,80> -> NULL, with ‘curr’ at the node <4,20>]
Case 2:
If the node to be deleted is the ‘head’ node itself, then simply make the next node the ‘head’ node and delete ‘curr’.
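[Figure: before deletion, ‘curr’ is at <1,10> in the chain <1,10> -> <3,15> -> <4,20> -> <7,80> -> NULL; after deletion, ‘head’ is at <3,15> in the chain <3,15> -> <4,20> -> <7,80> -> NULL]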
void dictionary::delete_d( )
{
    node *curr,*prev;
    cout<<"Enter key value that you want to delete...";
    cin>>k;
    if(head==NULL)
    {
        cout<<"\ndictionary is Underflow";
        return; // nothing to search; curr would otherwise be used uninitialized
    }
    // Search for the key, keeping prev one node behind curr.
    prev=NULL;
    curr=head;
    while(curr!=NULL)
    {
        if(curr->key==k)
            break;
        prev=curr;
        curr=curr->next;
    }
    if(curr==NULL)
        cout<<"Node not found...";
    else
    {
        if(curr==head)
            head=curr->next;        // Case 2: the head node is deleted
        else
            prev->next=curr->next;  // Case 1: bypass the node
        delete curr;
        cout<<"Item deleted from dictionary...";
    }
}
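The insertion and deletion steps described above can also be sketched without console input, which makes them easier to try out. This is only an illustrative sketch: the names `SortedChain`, `insert`, `remove` and `keys` are assumptions and not part of the class given in these notes.

```cpp
#include <vector>

// Sorted-chain dictionary sketch: nodes are kept in ascending key order.
// All names here are hypothetical; they only mirror the steps in the notes.
struct SortedChain {
    struct Node {
        int key;
        int value;
        Node* next;
    };
    Node* head = nullptr;

    ~SortedChain() {
        while (head != nullptr) {
            Node* dead = head;
            head = head->next;
            delete dead;
        }
    }

    // Insert <key, value>, keeping the chain sorted by key.
    void insert(int key, int value) {
        Node* prev = nullptr;
        Node* curr = head;
        while (curr != nullptr && curr->key < key) {
            prev = curr;
            curr = curr->next;
        }
        Node* p = new Node{key, value, curr};
        if (prev == nullptr) head = p;   // empty chain or smallest key
        else prev->next = p;             // link between prev and curr
    }

    // Remove the first node with the given key; returns false if it is absent.
    bool remove(int key) {
        Node* prev = nullptr;
        Node* curr = head;
        while (curr != nullptr && curr->key != key) {
            prev = curr;
            curr = curr->next;
        }
        if (curr == nullptr) return false;
        if (prev == nullptr) head = curr->next;   // deleting the head node
        else prev->next = curr->next;             // deleting any other node
        delete curr;
        return true;
    }

    // Collect the keys in chain order (for inspection only).
    std::vector<int> keys() const {
        std::vector<int> out;
        for (Node* c = head; c != nullptr; c = c->next) out.push_back(c->key);
        return out;
    }
};
```

Inserting the pairs <1,10>, <4,20>, <7,80> and then <3,15> gives the key order 1, 3, 4, 7; removing key 4 and then key 1 exercises both deletion cases.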
[Figure: a sorted chain of nodes 1 to 7 between a head node and a tail node]
The skip list is an efficient implementation of a dictionary using a sorted chain. This is because in
a skip list each node can hold forward references to more than one node at a time.
Eg:
[Figure: a sorted chain ending in NULL]
Now to search for any node in the above sorted chain we have to traverse the chain from the
head node, visiting each node. But this searching time can be reduced if we add one level in
every alternate node. This extra level contains a forward pointer to some node further ahead. That means in
a sorted chain some nodes can hold pointers to more than one node.
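[Figure: the sorted chain with an extra level of forward pointers in every alternate node, ending in NULL]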
If we want to search for node 40 in the above chain, we will require comparatively less time. This
search can be made still more efficient if we add a few more forward references.
[Figure: skip list]
The individual node looks like this:
Element *next
Searching:
The desired node is searched with the help of a key value.
Searching for a key within a skip list begins at the header, at the highest level of the list, and
moves forward in the list comparing node keys to the key_val. If the node key is less than the
key_val, the search continues moving forward at the same level. If, on the other hand, the node key
is equal to or greater than the key_val, the search drops one level and continues forward. This
process continues until the desired key_val has been found, if it is present in the skip list. If it is
not, the search ends either at the end of the list or at the first key with a value greater
than the search key.
Insertion:
There are two tasks that should be done before the insertion operation:
1. Before insertion of any node, the place for the new node in the skip list is searched. Hence
before any insertion takes place the search routine executes. The last[] array in the search
routine is used to keep track of the references to the nodes where the search drops down
one level.
2. The level for the new node is retrieved by the routine randomelevel()
{
temp->element.value=New_pair.value;
return;
}
for(int i=0;i<=New_Level;i++)
{
newNode->next[i] = last[i]->next[i];
last[i]->next[i] = newNode;
}
len++;
return;
}
Deletion:
First of all, the deletion makes use of search algorithm and searches the node that is to be deleted.
If the key to be deleted is found, the node containing the key is removed.
for(int i=0;i<=levels;i++)
{
if(last[i]->next[i] == temp)
last[i]->next[i] = temp->next[i];
}
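The search, insertion and deletion steps for the skip list can be put together as in the sketch below. The code fragments in these notes are incomplete, so this is a self-contained illustration under stated assumptions rather than the original routine: the class name `SkipList`, the helper `locate`, the cap `kMaxLevel` and the coin-flip `randomLevel` are all hypothetical, while `last[]`, `temp`, `newNode` and `len` follow the names used in the fragments.

```cpp
#include <cstdlib>
#include <vector>

// Skip list sketch: each node holds one forward pointer per level it takes part in.
class SkipList {
    struct Node {
        int key;
        int value;
        std::vector<Node*> next;   // next[i] = successor at level i
        Node(int k, int v, int levels) : key(k), value(v), next(levels, nullptr) {}
    };
    static const int kMaxLevel = 8;   // assumed cap on the number of levels
    Node* header;
    int len;

    // Assumed coin-flip rule: each extra level is taken with probability 1/2.
    static int randomLevel() {
        int lvl = 0;
        while (lvl < kMaxLevel - 1 && (std::rand() & 1)) ++lvl;
        return lvl;
    }

    // Fill last[i] with the last node at level i whose key is smaller than key;
    // return the candidate node on level 0 (may be nullptr).
    Node* locate(int key, Node* last[]) const {
        Node* curr = header;
        for (int i = kMaxLevel - 1; i >= 0; --i) {
            while (curr->next[i] != nullptr && curr->next[i]->key < key)
                curr = curr->next[i];
            last[i] = curr;   // the search drops one level here
        }
        return curr->next[0];
    }

public:
    SkipList() : header(new Node(0, 0, kMaxLevel)), len(0) {}
    ~SkipList() {
        Node* curr = header;
        while (curr != nullptr) { Node* dead = curr; curr = curr->next[0]; delete dead; }
    }
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;

    // Returns a pointer to the stored value, or nullptr when the key is absent.
    const int* search(int key) const {
        Node* last[kMaxLevel];
        Node* temp = locate(key, last);
        return (temp != nullptr && temp->key == key) ? &temp->value : nullptr;
    }

    // Insert a pair; an existing key only has its value overwritten.
    void insert(int key, int value) {
        Node* last[kMaxLevel];
        Node* temp = locate(key, last);
        if (temp != nullptr && temp->key == key) { temp->value = value; return; }
        int newLevel = randomLevel();
        Node* newNode = new Node(key, value, newLevel + 1);
        for (int i = 0; i <= newLevel; ++i) {
            newNode->next[i] = last[i]->next[i];
            last[i]->next[i] = newNode;
        }
        ++len;
    }

    // Unlink the node from every level on which it appears.
    bool erase(int key) {
        Node* last[kMaxLevel];
        Node* temp = locate(key, last);
        if (temp == nullptr || temp->key != key) return false;
        for (int i = 0; i < kMaxLevel; ++i)
            if (last[i]->next[i] == temp) last[i]->next[i] = temp->next[i];
        delete temp;
        --len;
        return true;
    }

    int size() const { return len; }
};
```

Because the levels are chosen at random, the shape of the list differs from run to run, but the results of `search`, `insert` and `erase` do not depend on it.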
For example: Consider that we want to place some employee records in the hash table. The record of
an employee is placed with the help of a key: the employee ID. The employee ID is a 7 digit number.
To place the record, the 7 digit number is converted into 3 digits
by taking only the last three digits of the key.
If the key is 4967000 it can be stored at the 0th position. For the second key, 8421002, the record of that
key is placed at the 2nd position in the array.
Hence the hash function will be H(key) = key%1000
where key%1000 is the hash function and the key obtained by the hash function is called the hash key.
➢ Bucket and Home bucket: The hash function H(key) is used to map several dictionary
entries into the hash table. Each position of the hash table is called a bucket.
The value H(key) is the home bucket for the dictionary pair whose key is key.
1. Division Method: The hash function depends upon the remainder of division.
Typically the divisor is table length.
For eg; if the records 54, 72, 89, 37 are to be placed in the hash table and the table size is 10 then
h(key) = record % table size
54%10=4
72%10=2
89%10=9
37%10=7
Index   Key
0
1
2       72
3
4       54
5
6
7       37
8
9       89
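The division method above can be sketched in a few lines. The function names `divisionHash` and `placeByDivision` are assumptions made for this illustration, keys are assumed to be non-negative, and -1 is assumed to mark an empty bucket; collisions are not handled here.

```cpp
#include <vector>

// Division-method hash: the remainder of the key divided by the table size.
int divisionHash(int key, int tableSize) {
    return key % tableSize;
}

// Place each key at its home bucket; -1 marks an empty bucket.
// Collisions are not handled in this sketch (a later key would overwrite).
std::vector<int> placeByDivision(const std::vector<int>& keys, int tableSize) {
    std::vector<int> table(tableSize, -1);
    for (int key : keys) table[divisionHash(key, tableSize)] = key;
    return table;
}
```

Calling `placeByDivision({54, 72, 89, 37}, 10)` reproduces the placement worked out above: 72 at index 2, 54 at index 4, 37 at index 7 and 89 at index 9.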
2. Mid Square:
In the mid square method, the key is squared and the middle or mid part of the result is used as the
index. If the key is a string, it has to be preprocessed to produce a number.
Consider that we want to place a record 3111; then
3111^2 = 9678321
For the hash table of size 1000
H(3111) = 783 (the middle 3 digits)
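A minimal sketch of the mid square method follows. The function name `midSquareHash` and its `digits` parameter are assumptions for illustration. When the number of leftover digits is odd, which side gets the extra digit is a choice this sketch makes (the extra digit is left on the right); the notes do not specify it.

```cpp
#include <cstddef>
#include <string>

// Mid-square hash: square the key and take `digits` digits from the middle
// of the decimal representation of the square.
int midSquareHash(long long key, int digits) {
    std::string sq = std::to_string(key * key);
    if (static_cast<int>(sq.size()) <= digits) return std::stoi(sq);
    std::size_t start = (sq.size() - static_cast<std::size_t>(digits)) / 2;
    return std::stoi(sq.substr(start, static_cast<std::size_t>(digits)));
}
```

For the record 3111 and three digits, the square 9678321 yields the middle part 783, matching the example.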
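3. Multiplicative Hash Function:
H(key) = floor(p *(fractional part of key*A)) where p is an integer constant and A is a constant real
number.
For eg; with key = 107, p = 50 and A = 0.61803398987:
H(107) = floor(50*(fractional part of 107*0.61803398987))
= floor(50*(fractional part of 66.12963691609))
= floor(50*0.12963691609)
= floor(6.4818458045)
= 6
At location 6 in the hash table the record 107 will be placed.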
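The formula H(key) = floor(p * (fractional part of key*A)) can be written out directly, as in the sketch below. The function name `multiplicativeHash` is an assumption, and the default value of A is simply the constant used in the example. Because the fractional part is always below 1, the result always lies in the range 0 to p-1.

```cpp
#include <cmath>

// Multiplicative hash: floor(p * frac(key * A)).
// The result is always in the range 0 .. p-1 for a non-negative key.
int multiplicativeHash(int key, int p, double A = 0.61803398987) {
    double product = key * A;
    double fraction = product - std::floor(product);   // fractional part
    return static_cast<int>(std::floor(p * fraction));
}
```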
4. Digit Folding:
The key is divided into separate parts and using some simple operation these parts are
combined to produce the hash key.
For eg; consider a record 12365412 then it is divided into separate parts as 123 654 12 and these
are added together
H(key) = 123+654+12
= 789
The record will be placed at location 789
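Digit folding can be sketched as follows, assuming the key is split from the left into groups of a fixed length and the groups are added. The name `foldingHash` and the `partLen` parameter are assumptions for illustration. In practice the sum may still exceed the table size, in which case it is usually reduced further, for example by a modulo.

```cpp
#include <cstddef>
#include <string>

// Fold-and-add: split the decimal digits into groups of `partLen`
// from the left and add the groups together.
// Assumes a non-negative key and partLen > 0.
int foldingHash(long long key, int partLen) {
    std::string digits = std::to_string(key);
    int sum = 0;
    for (std::size_t i = 0; i < digits.size(); i += static_cast<std::size_t>(partLen))
        sum += std::stoi(digits.substr(i, static_cast<std::size_t>(partLen)));
    return sum;
}
```

For the record 12365412 with parts of three digits, the groups are 123, 654 and 12, and the sum is 789.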
5. Digit Analysis:
The digit analysis is used in a situation when all the identifiers are known in advance. We
first transform the identifiers into numbers using some radix, r. Then examine the digits of each
identifier. Some digits having most skewed distributions are deleted. This deleting of digits is
continued until the number of remaining digits is small enough to give an address in the range of
the hash table. Then these digits are used to calculate the hash address.
COLLISION
The hash function is a function that returns the key value using which the record can be placed in
the hash table. Thus this function helps us in placing the record in the hash table at the appropriate
position, and due to this we can retrieve the record directly from that location. This function needs
to be designed very carefully, and it should not return the same hash key address for two different
records. This is an undesirable situation in hashing.
Definition: The situation in which the hash function returns the same hash key (home bucket) for
more than one record is called collision, and the different records that receive the same hash key are
called synonyms.
Similarly, when there is no room for a new pair in the hash table, such a situation is
called overflow. Sometimes handling a collision may lead to an overflow condition.
Frequent collisions and overflows indicate a poor hash function.
For example, consider a hash function
H(key) = recordkey%10 having the hash table size of 10.
The record keys to be placed are
131, 44, 43, 78, 19, 36, 57 and 77
131%10=1
44%10=4
43%10=3
78%10=8
19%10=9
36%10=6
57%10=7
77%10=7
Index   Key
0
1       131
2
3       43
4       44
5
6       36
7       57
8       78
9       19
Now if we try to place 77 in the hash table then we get the hash key 7, and at index 7 the record
key 57 is already placed. This situation is called collision. From index 7, if we look for the next
vacant position at the subsequent indices 8 and 9, we find that there is no room to place 77 in the hash
table. This situation is called overflow.
CHAINING
In collision handling, chaining is a method which introduces an additional field with the data,
i.e. a chain. A separate chain table is maintained for colliding data. When a collision occurs, a
linked list (chain) is maintained at the home bucket.
For eg;
Here D = 10
[Figure: hash table with chains; bucket 1 holds the chain 131 -> 21 -> 61 -> NULL and bucket 7 holds 97 -> NULL]
A chain is maintained for colliding elements. For instance, 131 has the home bucket (key) 1.
Similarly, keys 21 and 61 demand home bucket 1. Hence a chain is maintained at index 1.
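Chaining can be sketched with one linked list per bucket. This is an illustrative sketch only: the class name `ChainedHashTable` and its member names are assumptions, the hash function is assumed to be key % D, and only keys (not full records) are stored.

```cpp
#include <list>
#include <vector>

// Hash table with chaining: every bucket owns a linked list of colliding keys.
class ChainedHashTable {
    std::vector<std::list<int>> buckets;
public:
    explicit ChainedHashTable(int d) : buckets(d) {}

    // Home bucket of a key, assuming H(key) = key % D and a non-negative key.
    int home(int key) const { return key % static_cast<int>(buckets.size()); }

    // Append the key to the chain kept at its home bucket.
    void insert(int key) { buckets[home(key)].push_back(key); }

    // Search only the chain at the key's home bucket.
    bool contains(int key) const {
        for (int k : buckets[home(key)])
            if (k == key) return true;
        return false;
    }

    // Read-only access to one chain (for inspection).
    const std::list<int>& chain(int index) const { return buckets[index]; }
};
```

With D = 10, inserting 131, 21 and 61 leaves all three keys in the chain at index 1, in the order of insertion.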
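LINEAR PROBING:
In linear probing, when a collision occurs the table is searched sequentially from the home bucket for the next empty bucket.
For example: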
Initially, we will put the keys 131, 4, 8 and 7 in the hash table.
We will use the division hash function. That means the keys are placed using the formula
H(key) = key % table size
H(131) = 131 % 10
= 1
Index 1 will be the home bucket for 131. Continuing in this fashion we will place 4, 8, 7.
Now the next key to be inserted is 21. According to the hash function
H(key) = 21 % 10
H(key) = 1
But the index 1 location is already occupied by 131, i.e. a collision occurs. To resolve this collision
we will linearly move down and place the element at the next empty location. Therefore
21 will be placed at index 2. If the next element is 5 then we get the home bucket for 5 as
index 5, and this bucket is empty, so we will put the element 5 at index 5.
The next record key is 9. According to the division hash function it demands the home bucket 9.
Hence we will place 9 at index 9. Now the final record key is 29 and it hashes to key 9. But
home bucket 9 is already occupied, and there is no next empty bucket as the table size is limited
to index 9. Overflow occurs. To handle it we move back to bucket 0, and if that location
is empty, 29 will be placed at the 0th index.
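The probing walk described above, including the wrap-around from the last bucket to bucket 0, can be sketched as below. The function name `linearProbeInsert` and the marker `kEmpty` are assumptions for illustration; keys are assumed to be non-negative.

```cpp
#include <vector>

const int kEmpty = -1;   // assumed marker for a vacant bucket

// Insert with linear probing. Returns the index where the key was stored,
// or -1 when every bucket is occupied.
int linearProbeInsert(std::vector<int>& table, int key) {
    int size = static_cast<int>(table.size());
    int home = key % size;
    for (int i = 0; i < size; ++i) {
        int index = (home + i) % size;   // wraps from the last bucket back to 0
        if (table[index] == kEmpty) {
            table[index] = key;
            return index;
        }
    }
    return -1;
}
```

Inserting 131, 4, 8, 7, 21, 5, 9 and 29 in that order into a table of size 10 places them at indices 1, 4, 8, 7, 2, 5, 9 and 0, as in the walk-through.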
Problem with linear probing:
One major problem with linear probing is primary clustering. Primary clustering is a process in
which a block of data is formed in the hash table when collision is resolved.
19%10 = 9
18%10 = 8
39%10 = 9
29%10 = 9
8%10 = 8
Index   Key
0       39
1       29
2       8
3
4
5
6
7
8       18
9       19
A cluster is formed at indices 8, 9, 0, 1 and 2.
QUADRATIC PROBING:
Quadratic probing operates by taking the original hash value and adding successive values of an
arbitrary quadratic polynomial to the starting value. This method uses the following formula:
Hi(key) = (Hash(key) + i^2) % m, where m is the table size and i = 0, 1, 2, ...
For eg; to insert the key 17 in a hash table of size 10, the probe positions are:
(17 + 0^2) % 10 = 7, when i = 0
(17 + 1^2) % 10 = 8, when i = 1
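A short sketch of quadratic probing follows, assuming the probe sequence (key + i*i) % table size for i = 0, 1, 2, ... as in the worked lines above. The function name `quadraticProbeInsert` and the marker `kEmptySlot` are assumptions for illustration.

```cpp
#include <vector>

const int kEmptySlot = -1;   // assumed marker for a vacant bucket

// Insert with quadratic probing: try (home + i*i) % size for i = 0, 1, 2, ...
// Returns the index used, or -1 when no free bucket was found in `size` probes.
int quadraticProbeInsert(std::vector<int>& table, int key) {
    int size = static_cast<int>(table.size());
    int home = key % size;
    for (int i = 0; i < size; ++i) {
        int index = (home + i * i) % size;
        if (table[index] == kEmptySlot) {
            table[index] = key;
            return index;
        }
    }
    return -1;
}
```

Note that, unlike linear probing, this probe sequence does not necessarily visit every bucket, so an insertion can fail even though some bucket is still empty.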
DOUBLE HASHING:
H1(37) = 37 % 10 = 7
H1(90) = 90 % 10 = 0
H1(45) = 45 % 10 = 5
H1(22) = 22 % 10 = 2
H1(49) = 49 % 10 = 9
Now if 17 is to be inserted then
H1(17) = 17 % 10 = 7 (collision with 37)
H2(key) = M - (key % M)
Here M is a prime number smaller than the size of the table. The largest prime number
smaller than table size 10 is 7.
Hence M = 7
H2(17) = 7 - (17 % 7)
= 7 - 3 = 4
That means we have to insert the element 17 at 4 places from 37. In short we have to take 4
jumps. Therefore 17 will be placed at index 1.
Index   Key
0       90
1       17
2       22
3
4
5       45
6
7       37
8
9       49
Now to insert number 55
H1(55) = 55 % 10 = 5 (collision with 45)
H2(55) = 7 - (55 % 7)
= 7 - 6 = 1
That means we have to take one jump from index 5 to place 55, so 55 is placed at index 6.
Finally the hash table will be -
Index   Key
0       90
1       17
2       22
3
4
5       45
6       55
7       37
8
9       49
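The two hash functions H1(key) = key % table size and H2(key) = M - (key % M) can be combined as in the sketch below. The function name `doubleHashInsert` and the marker `kFree` are assumptions for illustration, and the jump is assumed to wrap around the end of the table.

```cpp
#include <vector>

const int kFree = -1;   // assumed marker for a vacant bucket

// Insert with double hashing.
// H1(key) = key % size gives the home bucket;
// H2(key) = M - (key % M) gives the jump length (never zero, since key % M < M).
// Returns the index used, or -1 when no free bucket was found in `size` tries.
int doubleHashInsert(std::vector<int>& table, int key, int M) {
    int size = static_cast<int>(table.size());
    int index = key % size;
    int step = M - (key % M);
    for (int tries = 0; tries < size; ++tries) {
        if (table[index] == kFree) {
            table[index] = key;
            return index;
        }
        index = (index + step) % size;   // take one jump of length H2(key)
    }
    return -1;
}
```

With table size 10 and M = 7, inserting 37, 90, 45, 22 and 49 fills indices 7, 0, 5, 2 and 9; 17 then lands at index 1 and 55 at index 6, matching the worked example.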
Comparison of Quadratic Probing & Double Hashing
Double hashing requires a second hash function, whose probing efficiency should be the same as
that of a hash function used for handling random collisions.
Double hashing is more complex to implement than quadratic probing. Quadratic
probing is a faster technique than double hashing.
REHASHING
Rehashing is a technique in which the table is resized, i.e., the size of the table is doubled by creating
a new table. It is preferable if the total size of the table is a prime number. There are situations in
which rehashing is required.
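Rehashing can be sketched as follows. The notes do not give a rehashing routine, so several choices here are assumptions: the new size is taken as the smallest prime that is at least double the old size, keys are re-inserted with the division hash and linear probing, and the names `isPrime`, `nextTableSize`, `rehash` and `kVacant` are hypothetical.

```cpp
#include <vector>

const int kVacant = -1;   // assumed marker for a vacant bucket

// Trial-division primality check, adequate for table sizes.
bool isPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Smallest prime that is at least twice the old size.
int nextTableSize(int oldSize) {
    int n = 2 * oldSize;
    while (!isPrime(n)) ++n;
    return n;
}

// Build the larger table and re-insert every key using the new size.
// Linear probing is assumed for collisions in the new table.
std::vector<int> rehash(const std::vector<int>& oldTable) {
    int newSize = nextTableSize(static_cast<int>(oldTable.size()));
    std::vector<int> newTable(newSize, kVacant);
    for (int key : oldTable) {
        if (key == kVacant) continue;
        int index = key % newSize;
        while (newTable[index] != kVacant) index = (index + 1) % newSize;
        newTable[index] = key;
    }
    return newTable;
}
```

Every key has to be hashed again because its home bucket depends on the table size; for a table of size 10 the new size under these assumptions is 23.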