Sri Ganesh College of Arts and Science: I B. SC Computer Science
DEPARTMENT OF COMPUTER SCIENCE & APPLICATIONS
DATA STRUCTURE AND ALGORITHMS
I B. Sc COMPUTER SCIENCE
An abstract data type (ADT) is a set of objects together with a set of operations. Abstract
data types are mathematical abstractions. Objects such as lists, sets, and graphs, along with their
operations, can be viewed as ADTs. For the set ADT, we might have operations such as add,
remove, size, and contains.
• To use an array, an estimate of the maximum size of the list was required.
• This estimate is no longer needed.
• An array implementation allows printList to be carried out in linear time, and the
findKth operation takes constant time, which is as good as can be expected.
• Insertion and deletion are potentially expensive, depending on where the insertions and
deletions occur.
➢ Inserting into position 0 (at the front of the list) requires pushing the entire array
down one spot to make room, and deleting the first element requires shifting all the
elements in the list up one spot, so the worst case for these operations is O(N).
➢ On average, half of the list needs to be moved for either operation, so linear time is
still required.
➢ On the other hand, if all the operations occur at the high end of the list, then no
elements need to be shifted, and then adding and deleting take O(1) time.
• There are many situations where the list is built up by insertions at the high end, and
then only array accesses (i.e., findKth operations) occur. In such a case, the array is a
suitable implementation.
• If insertions and deletions occur throughout the list and, in particular, at the front of the
list, then the array is not a good option; the alternative is the linked list.
• To avoid the linear cost of insertion and deletion, we need to ensure that the list is not
stored contiguously; otherwise, entire parts of the list will need to be moved.
• The linked list consists of a series of nodes, which are not necessarily adjacent in
memory.
• Each node contains the element and a link to the node containing its successor, called
the next link. The last cell’s next link points to nullptr.
• To execute printList() or find(x), we merely start at the first node in the list and then
traverse the list by following the next links. This operation is clearly linear-time.
• The findKth(i) operation takes O(i) time and works by traversing down the list. Frequently,
the calls to findKth are in sorted order (by i). As an example, findKth(2), findKth(3),
findKth(4), and findKth(6) can all be executed in one scan down the list.
• The remove method can be executed in one next pointer change.
• The insert method requires obtaining a new node from the system by using a new call
and then executing two next pointer moves. The dashed line represents the old pointer.
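The pointer changes described above can be sketched in C. This is an illustrative sketch, not code from the notes; the names Node, insertAfter, and removeAfter are assumptions.

```c
#include <stdlib.h>

/* A node holds one element and a link to its successor. */
struct Node {
    int data;
    struct Node *next;
};

/* Insert a new node after 'pos': two next-pointer moves. */
struct Node *insertAfter(struct Node *pos, int value) {
    struct Node *n = malloc(sizeof *n);
    n->data = value;
    n->next = pos->next;   /* the new node adopts the old successor */
    pos->next = n;         /* the predecessor now links to the new node */
    return n;
}

/* Remove the node following 'pos': one next-pointer change. */
void removeAfter(struct Node *pos) {
    struct Node *victim = pos->next;
    pos->next = victim->next;
    free(victim);
}
```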
1.4.1 Singly linked lists
A singly linked list is a sequence of elements in which every element has a link to the next
element in the sequence. Each individual element is called a "Node".
Every "Node" contains two fields, data and next. The data field is used to store the actual value of
that node and the next field is used to store the address of the next node in the sequence. The
graphical representation of a node in a single linked list is as follows...
Linked list can be visualized as a chain of nodes, where every node points to the next
node.
As per the above illustration, following are the important points to be considered.
• Linked List contains a link element called first.
• Each link carries a data field(s) and a link field called next.
• Each link is connected to the next link using its next field.
• The last link carries a null link to mark the end of the list.
1.4.2 Circular linked lists
Circular Linked List is a variation of Linked list in which the first element points to the
last element and the last element points to the first element. Both Singly Linked List and
Doubly Linked List can be made into a circular linked list.
Circular Singly Linked List
In singly linked list, the next pointer of the last node points to the first node.
Circular Doubly Linked List
In a doubly linked list, the next pointer of the last node points to the first node and the previous
pointer of the first node points to the last node, making the list circular in both directions.
• The last link's next points to the first link of the list in both cases of singly as well as
doubly linked list.
• The first link's previous points to the last link of the list in the case of a doubly linked list.
1.4.3 Doubly linked lists
• A double linked list is a sequence of elements in which every element has links to its
previous element and next element in the sequence.
• In a double linked list, every node has a link to its previous node and next node.
• So, we can traverse forward by using next field and can traverse backward by using
previous field.
• Every node in a double linked list contains three fields and they are shown in the
following figure.
Here,
• 'link1' field is used to store the address of the previous node in the sequence
• 'link2' field is used to store the address of the next node in the sequence and
• 'data' field is used to store the actual value of that node.
• Next − Each link of a linked list contains a link to the next link called Next.
• Prev − Each link of a linked list contains a link to the previous link called Prev.
• In a double linked list, the first node must always be pointed to by head.
• The previous field of the head node must always be NULL.
• The next field of the last node must always be NULL.
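The three-field node described above can be sketched in C as follows. This is an illustrative sketch; the names DNode and dlinkAfter are assumptions, not from the notes.

```c
#include <stdlib.h>

/* A doubly linked list node: link1 (previous), data, link2 (next). */
struct DNode {
    struct DNode *prev;  /* address of the previous node (NULL at head) */
    int data;            /* the actual value of the node */
    struct DNode *next;  /* address of the next node (NULL at tail) */
};

/* Link a new node directly after 'pos', fixing all four pointers. */
struct DNode *dlinkAfter(struct DNode *pos, int value) {
    struct DNode *n = malloc(sizeof *n);
    n->data = value;
    n->prev = pos;
    n->next = pos->next;
    if (pos->next != NULL)
        pos->next->prev = n;   /* old successor points back at the new node */
    pos->next = n;
    return n;
}
```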
Add Two Polynomials
To add two polynomials, we can add the coefficients of like terms and generate a
new linked list for the resulting polynomial.
For example, we can use two linked lists to represent the polynomials 2 − 4x + 5x² and
1 + 2x − 3x³:
When we add them together, we can group the like terms and generate the result
3 − 2x + 5x² − 3x³
Algorithm for adding two polynomials
Data: The head pointers p1 and p2 of the two input polynomials
Result: The head of the resulting polynomial
head = null; tail = null;
while p1 ≠ null and p2 ≠ null do
if p1.power > p2.power then
tail = append(tail, p1.power, p1.coefficient);
p1 = p1.next;
else if p2.power > p1.power then
tail = append(tail, p2.power, p2.coefficient);
p2 = p2.next;
else
coefficient = p1.coefficient + p2.coefficient;
if coefficient ≠ 0 then
tail = append(tail, p1.power, coefficient);
end
p1 = p1.next;
p2 = p2.next;
end
if head is null then
head = tail;
end
end
while p1 ≠ null do
tail = append(tail, p1.power, p1.coefficient);
p1 = p1.next;
end
while p2 ≠ null do
tail = append(tail, p2.power, p2.coefficient);
p2 = p2.next;
end
return head;
In this algorithm, we first create two pointers, p1 and p2, to the head pointers of the two
input polynomials. Then, we generate the new polynomial nodes based on the powers of these
two pointers. There are three cases:
1. p1‘s power is greater than p2‘s power: In this case, we append a new node with p1‘s
power and coefficient. Also, we move p1 to the next node.
2. p2‘s power is greater than p1‘s power: In this case, we append a new node with p2‘s
power and coefficient. Also, we move p2 to the next node.
3. p1 and p2 have the same power: In this case, the new coefficient is the total of p1‘s
coefficient and p2‘s coefficient. If the new coefficient is not 0, we append a new node
with the same power and the new coefficient. Also, we move both p1 and p2 to the next
nodes.
After that, we continue to append the remaining nodes from p1 or p2 until we finish the
calculation on all nodes.
The append function can create a new linked list node based on the
input power and coefficient. Also, it appends the new node to the tail node and returns the
new tail node:
Algorithm for appending a new polynomial node
Data: The previous tail node, and the power and coefficient of the new node
Result: The new tail node
Function append(tail, power, coefficient):
Create a new node with power and coefficient;
if tail ≠ null then
tail.next = node;
end
return node;
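The append routine and the three-case addition described above can be sketched in C, assuming terms are stored in decreasing order of power (consistent with appending the larger power first). The names Term, append, and polyAdd are illustrative, not from the notes.

```c
#include <stdlib.h>

struct Term {
    int coefficient;
    int power;
    struct Term *next;
};

/* Create a node and hang it after the current tail; return the new tail. */
struct Term *append(struct Term *tail, int power, int coefficient) {
    struct Term *node = malloc(sizeof *node);
    node->power = power;
    node->coefficient = coefficient;
    node->next = NULL;
    if (tail != NULL)
        tail->next = node;
    return node;
}

/* Add two polynomials stored in decreasing order of power. */
struct Term *polyAdd(struct Term *p1, struct Term *p2) {
    struct Term *head = NULL, *tail = NULL;
    while (p1 != NULL && p2 != NULL) {
        if (p1->power > p2->power) {                /* case 1 */
            tail = append(tail, p1->power, p1->coefficient);
            p1 = p1->next;
        } else if (p2->power > p1->power) {         /* case 2 */
            tail = append(tail, p2->power, p2->coefficient);
            p2 = p2->next;
        } else {                                    /* case 3: same power */
            int c = p1->coefficient + p2->coefficient;
            if (c != 0)
                tail = append(tail, p1->power, c);  /* skip zero terms */
            p1 = p1->next;
            p2 = p2->next;
        }
        if (head == NULL)
            head = tail;
    }
    while (p1 != NULL) {                            /* remaining p1 nodes */
        tail = append(tail, p1->power, p1->coefficient);
        if (head == NULL) head = tail;
        p1 = p1->next;
    }
    while (p2 != NULL) {                            /* remaining p2 nodes */
        tail = append(tail, p2->power, p2->coefficient);
        if (head == NULL) head = tail;
        p2 = p2->next;
    }
    return head;
}
```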
Inserting at End of the list
We can use the following steps to insert a new node at the end of the single linked list...
• Step 1 - Create a newNode with given value and newNode → next as NULL.
• Step 2 - Check whether list is Empty (head == NULL).
• Step 3 - If it is Empty then, set head = newNode.
• Step 4 - If it is Not Empty then, define a node pointer temp and initialize with head.
• Step 5 - Keep moving the temp to its next node until it reaches to the last node in the
list (until temp → next is equal to NULL).
• Step 6 - Set temp → next = newNode.
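The steps above can be sketched in C as follows. This is an illustrative sketch; the function name insertAtEnd is an assumption.

```c
#include <stdlib.h>

struct Node { int data; struct Node *next; };

/* Walk to the last node, then hang newNode off it (Steps 1-6 above). */
void insertAtEnd(struct Node **head, int value) {
    struct Node *newNode = malloc(sizeof *newNode);   /* Step 1 */
    newNode->data = value;
    newNode->next = NULL;
    if (*head == NULL) {                              /* Steps 2-3 */
        *head = newNode;
        return;
    }
    struct Node *temp = *head;                        /* Step 4 */
    while (temp->next != NULL)                        /* Step 5 */
        temp = temp->next;
    temp->next = newNode;                             /* Step 6 */
}
```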
Doubly Linked List
In a double linked list, the insertion operation can be performed in three ways as
follows...
• Inserting At Beginning of the list
• Inserting At End of the list
• Inserting At Specific location in the list
Inserting at Beginning of the list
We can use the following steps to insert a new node at beginning of the double linked
list...
• Step 1: Create a newNode with given value and newNode → previous as NULL.
• Step 2: Check whether list is Empty (head == NULL)
• Step 3: If it is Empty then, assign NULL to newNode → next and newNode to head.
• Step 4: If it is not Empty then, assign head to newNode → next and newNode to
head.
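The steps above can be sketched in C as follows. This is an illustrative sketch; the function name insertAtBeginning is an assumption.

```c
#include <stdlib.h>

struct DNode { struct DNode *prev; int data; struct DNode *next; };

/* Steps 1-4 above for a double linked list. */
void insertAtBeginning(struct DNode **head, int value) {
    struct DNode *newNode = malloc(sizeof *newNode);  /* Step 1 */
    newNode->data = value;
    newNode->prev = NULL;
    newNode->next = *head;          /* NULL when the list is empty */
    if (*head != NULL)
        (*head)->prev = newNode;    /* old head points back at newNode */
    *head = newNode;
}
```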
Inserting at Specific location in the list (After a Node)
We can use the following steps to insert a new node after a node in the double linked
list...
• Step 1: Create a newNode with given value.
• Step 2: Check whether list is Empty (head == NULL)
• Step 3: If it is Empty then, assign NULL to newNode → previous & newNode → next
and newNode to head.
• Step 4: If it is not Empty then, define two node pointers temp1 & temp2 and initialize
temp1 with head.
• Step 5: Keep moving the temp1 to its next node until it reaches the node after which
we want to insert the newNode (until temp1 → data is equal to location, here
location is the node value after which we want to insert the newNode).
• Step 6: Every time check whether temp1 has reached the last node. If it has reached
the last node then display 'Given node is not found in the list!!! Insertion not
possible!!!' and terminate the function. Otherwise move the temp1 to the next
node.
• Step 7: Assign temp1 → next to temp2, newNode to temp1 → next, temp1 to
newNode → previous, temp2 to newNode → next and newNode to temp2 → previous.
Circular Linked List
A node can be added in three ways:
1. Insertion at the beginning of the list
2. Insertion at the end of the list
3. Insertion in between the nodes
1) Insertion at the beginning of the list: To insert a node at the beginning of the list, follow
these steps:
• Create a node, say T.
• Make T -> next = last -> next.
• last -> next = T.
2) Insertion at the end of the list: To insert a node at the end of the list, follow these steps:
• Create a node, say T.
• Make T -> next = last -> next;
• last -> next = T.
• last = T.
3) Insertion in between the nodes: To insert a node in between the two nodes, follow these
steps:
• Create a node, say T.
• Search for the node after which T needs to be inserted, say that node is P.
• Make T -> next = P -> next;
• P -> next = T.
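The circular-list insertions above track the list by its last node. The sketch below covers insertion at the end and at the beginning (which differ only in whether last advances); names are illustrative, not from the notes.

```c
#include <stdlib.h>

struct Node { int data; struct Node *next; };

/* Insert at the end of a circular list tracked by its 'last' node;
   returns the new last node. Passing last == NULL starts a new list. */
struct Node *insertEnd(struct Node *last, int value) {
    struct Node *T = malloc(sizeof *T);
    T->data = value;
    if (last == NULL) {          /* first node points to itself */
        T->next = T;
        return T;
    }
    T->next = last->next;        /* T -> next = last -> next */
    last->next = T;              /* last -> next = T */
    return T;                    /* last = T */
}

/* Insert at the beginning: the same two links, but 'last' is unchanged. */
struct Node *insertBegin(struct Node *last, int value) {
    struct Node *T = insertEnd(last, value);
    return (last == NULL) ? T : last;
}
```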
1.6.2 Deletion
Singly Linked List
In a single linked list, the deletion operation can be performed in three ways. They are as
follows...
1. Deleting from Beginning of the list
2. Deleting from End of the list
3. Deleting a Specific Node
Deleting from Beginning of the list
We can use the following steps to delete a node from beginning of the single linked list...
• Step 1 - Check whether list is Empty (head == NULL)
• Step 2 - If it is Empty then, display 'List is Empty!!! Deletion is not possible' and
terminate the function.
• Step 3 - If it is Not Empty then, define a Node pointer 'temp' and initialize
with head.
• Step 4 - Check whether list is having only one node (temp → next == NULL)
• Step 5 - If it is TRUE then set head = NULL and delete temp (Setting Empty list
conditions)
• Step 6 - If it is FALSE then set head = temp → next, and delete temp.
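The steps above can be sketched in C as follows. This is an illustrative sketch; the function name deleteFromBeginning and its return convention are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

struct Node { int data; struct Node *next; };

/* Steps 1-6 above; returns 1 on success, 0 when the list is empty. */
int deleteFromBeginning(struct Node **head) {
    if (*head == NULL) {                       /* Steps 1-2 */
        printf("List is Empty!!! Deletion is not possible\n");
        return 0;
    }
    struct Node *temp = *head;                 /* Step 3 */
    *head = (temp->next == NULL) ? NULL        /* Steps 4-5: single node */
                                 : temp->next; /* Step 6 */
    free(temp);
    return 1;
}
```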
Deleting from End of the list
We can use the following steps to delete a node from end of the single linked list...
• Step 1 - Check whether list is Empty (head == NULL)
• Step 2 - If it is Empty then, display 'List is Empty!!! Deletion is not possible' and
terminate the function.
• Step 3 - If it is Not Empty then, define two Node pointers 'temp1' and 'temp2' and
initialize 'temp1' with head.
• Step 4 - Check whether list has only one Node (temp1 → next == NULL)
• Step 5 - If it is TRUE. Then, set head = NULL and delete temp1. And terminate the
function. (Setting Empty list condition)
• Step 6 - If it is FALSE. Then, set 'temp2 = temp1 ' and move temp1 to its next node.
Repeat the same until it reaches to the last node in the list. (until temp1 →
next == NULL)
• Step 7 - Finally, Set temp2 → next = NULL and delete temp1.
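The steps above can be sketched in C as follows. This is an illustrative sketch; the function name deleteFromEnd and its return convention are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

struct Node { int data; struct Node *next; };

/* Steps 1-7 above; returns 1 on success, 0 when the list is empty. */
int deleteFromEnd(struct Node **head) {
    if (*head == NULL) {                       /* Steps 1-2 */
        printf("List is Empty!!! Deletion is not possible\n");
        return 0;
    }
    struct Node *temp1 = *head, *temp2 = NULL; /* Step 3 */
    if (temp1->next == NULL) {                 /* Steps 4-5: only one node */
        *head = NULL;
        free(temp1);
        return 1;
    }
    while (temp1->next != NULL) {              /* Step 6: walk to the end */
        temp2 = temp1;
        temp1 = temp1->next;
    }
    temp2->next = NULL;                        /* Step 7 */
    free(temp1);
    return 1;
}
```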
Deleting a Specific Node from the list
We can use the following steps to delete a specific node from the single linked list...
• Step 1 - Check whether list is Empty (head == NULL)
• Step 2 - If it is Empty then, display 'List is Empty!!! Deletion is not possible' and
terminate the function.
• Step 3 - If it is Not Empty then, define two Node pointers 'temp1' and 'temp2' and
initialize 'temp1' with head.
• Step 4 - Keep moving the temp1 until it reaches the exact node to be deleted or the
last node. And every time set 'temp2 = temp1' before moving the 'temp1' to its
next node.
• Step 5 - If it is reached to the last node then display 'Given node not found in the
list! Deletion not possible!!!'. And terminate the function.
• Step 6 - If it is reached to the exact node which we want to delete, then check whether
list is having only one node or not
• Step 7 - If list has only one node and that is the node to be deleted, then
set head = NULL and delete temp1 (free(temp1)).
• Step 8 - If list contains multiple nodes, then check whether temp1 is the first node in
the list (temp1 == head).
• Step 9 - If temp1 is the first node then move the head to the next node (head = head
→ next) and delete temp1.
• Step 10 - If temp1 is not first node then check whether it is last node in the list
(temp1 → next == NULL).
• Step 11 - If temp1 is last node then set temp2 → next = NULL and
delete temp1 (free(temp1)).
• Step 12 - If temp1 is not first node and not last node then set temp2 → next = temp1
→ next and delete temp1 (free(temp1)).
Doubly Linked List
Deleting from Beginning of the list
We can use the following steps to delete a node from the beginning of the double linked list...
• Step 1: Check whether list is Empty (head == NULL)
• Step 2: If it is Empty then, display 'List is Empty!!! Deletion is not possible' and
terminate the function.
• Step 3: If it is not Empty then, define a Node pointer 'temp' and initialize with head.
• Step 4: Check whether list is having only one node (temp → previous is equal to
temp → next)
• Step 5: If it is TRUE, then set head to NULL and delete temp (Setting Empty list
conditions)
• Step 6: If it is FALSE, then assign temp → next to head, NULL to head → previous
and delete temp.
Deleting a Specific Node from the list
We can use the following steps to delete a specific node from the double linked list...
• Step 1: Check whether list is Empty (head == NULL)
• Step 2: If it is Empty then, display 'List is Empty!!! Deletion is not possible' and
terminate the function.
• Step 3: If it is not Empty, then define a Node pointer 'temp' and initialize with head.
• Step 4: Keep moving the temp until it reaches to the exact node to be deleted or to
the last node.
• Step 5: If it is reached to the last node, then display 'Given node not found in the list!
Deletion not possible!!!' and terminate the function.
• Step 6: If it is reached to the exact node which we want to delete, then check whether
list is having only one node or not
• Step 7: If list has only one node and that is the node which is to be deleted then set
head to NULL and delete temp (free(temp)).
• Step 8: If list contains multiple nodes, then check whether temp is the first node in
the list (temp == head).
• Step 9: If temp is the first node, then move the head to the next node (head = head
→ next), set head's previous to NULL (head → previous = NULL) and delete temp.
• Step 10: If temp is not the first node, then check whether it is the last node in the list
(temp → next == NULL).
• Step 11: If temp is the last node then set temp's previous node's next to NULL (temp →
previous → next = NULL) and delete temp (free(temp)).
• Step 12: If temp is not the first node and not the last node, then set temp's previous
node's next to temp's next (temp → previous → next = temp → next), set temp's
next node's previous to temp's previous (temp → next → previous = temp →
previous) and delete temp (free(temp)).
1.6.3 Merge
Merging combines two sorted linked lists into one sorted list. The first approach copies
nodes into a new list.
Example:
• Create a new dummy node. This helps keep track of the head of the new list that
stores the merged result.
• Find the smallest among the two pointed by pointers h1 and h2 in each list. Copy that
node and insert it after the dummy node. Here 1 < 3, therefore 1 will be inserted after
the dummy node. Move h2 pointer to next node.
• 2 < 3, therefore, 2 is copied to another node and inserted at the end of the new list. h2
is moved to the next node.
• 5 > 3, so 3 will be copied to another node and inserted at the end. h1 is moved to the
next node.
• 5 < 7, so 5 will be copied to another node and inserted at the end. h2 is moved to the
next node.
• 8 > 7, so 7 will be copied into a new node and inserted at the end. h1 is moved to the
next node.
• 8 < 10, so 8 will be copied into a new node and inserted at the end. h1 is moved to the
next node.
• 10 = 10, so 10 will be copied into a new node and inserted at the end. h2 is now NULL.
• Now, list 1 has only nodes that are not inserted in the new list. So, we will insert the
remaining nodes present in list 1 into the list.
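The walkthrough above (a dummy node, repeatedly copying the smaller of the two heads) can be sketched in C as follows. The names newNode and mergeSorted are illustrative, not from the notes.

```c
#include <stdlib.h>

struct Node { int data; struct Node *next; };

static struct Node *newNode(int v) {
    struct Node *n = malloc(sizeof *n);
    n->data = v;
    n->next = NULL;
    return n;
}

/* Copy-based merge: copy the smaller of the two heads after a dummy
   node, then append copies of whatever remains in either list. */
struct Node *mergeSorted(struct Node *h1, struct Node *h2) {
    struct Node dummy = {0, NULL};
    struct Node *tail = &dummy;
    while (h1 != NULL && h2 != NULL) {
        if (h1->data <= h2->data) {
            tail->next = newNode(h1->data);
            h1 = h1->next;
        } else {
            tail->next = newNode(h2->data);
            h2 = h2->next;
        }
        tail = tail->next;
    }
    for (; h1 != NULL; h1 = h1->next, tail = tail->next)
        tail->next = newNode(h1->data);   /* remaining nodes of list 1 */
    for (; h2 != NULL; h2 = h2->next, tail = tail->next)
        tail->next = newNode(h2->data);   /* remaining nodes of list 2 */
    return dummy.next;                    /* skip the dummy node */
}
```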
The second approach merges the lists in place by rearranging the existing nodes:
• Step 1: Create two pointers l1 and l2 on the heads of the lists, and make l1 point to
the smaller of the two. Store the initial value of l1 in res; this is the head of the
merged sorted list.
• Step 2: At the start of each iteration, initialize a variable temp to NULL.
• Step 3: While l1's value is not greater than l2's value, store the node pointed by l1 in
temp and move l1 to the next node.
• Step 4: Once an iteration is complete, link the node pointed by temp to the node pointed
by l2. Swap l1 and l2.
• Step 5: If any one of the pointers among l1 and l2 is NULL, the iteration stops; link the
node stored in temp to the remaining list.
Example:
• Created two pointers l1 and l2. Comparing the first node of both lists. Pointing l1 to the
smaller one among the two. Create variable res and store the initial value of l1. This
ensures the head of the merged sorted list.
• Now, start iterating. A variable temp will always be equal to NULL at the start of the
iteration.
• 1 < 3. temp will store nodes pointed by l1. Then move l1 to the next node.
• 2 < 3. temp will store node l1(2) and then move l1 to the next node.
• 5 > 3. Now, the very first iteration completes. The node stored in temp is connected
to the node pointed by l2, i.e., 2 links to 3. Swap l1 and l2. Initialize temp to NULL.
• The second iteration starts. 3 < 5. So, first store l1(3) in temp then move l1 to the next
connected node.
• 7 > 5. The second iteration stops here. Link the node stored in temp to the node pointed by l2, i.e.,
3 links to 5. Swap l1 and l2. temp is assigned NULL at the start of the third iteration.
• 5 < 7. temp will store l1(5) and move l1 to the next linked node.
• 8 > 7. The third iteration stops. Link node stored in temp to node pointed by l2, i.e 5
links to 7. Swap l1 and l2. Assign temp to NULL at the start of the fourth iteration.
• 10 > 8. The fourth iteration stops here. 7 is linked to 8. Swap l1 and l2. The start of the
fifth iteration initializes temp to NULL.
• 10 = 10. temp stores l1(10). l1 moves forward and is now equal to NULL.
• As l1 is equal to NULL, the complete iteration stops. The 10 stored in variable temp
is linked to the 10 pointed to by l2.
1.6.4 Traversal
Traversal is used to visit each node of the list to perform an operation. Here, we will
traverse and print data present in the list.
Singly Linked List
A singly linked list is uni-directional, meaning traversal is possible in the forward
direction only.
Doubly Linked List
A doubly linked list is bi-directional, meaning traversal is possible in both forward and
backward directions.
UNIT-I -END
UNIT – II
A stack is a linear data structure that follows a particular order in which the operations
are performed. The order may be LIFO (Last In First Out) or FILO (First In Last Out).
The most recently inserted element can be examined prior to performing a pop by use
of the top routine. A pop or top on an empty stack is generally considered an error in the
stack ADT. On the other hand, running out of space when performing a push is an
implementation limit but not an ADT error.
2.2 Operations
Mainly the following basic operations are performed on a stack:
• Push: Adds an item in the stack. If the stack is full, then it is said to be an
Overflow condition.
• Pop: Removes an item from the stack. The items are popped in the reversed order
in which they are pushed. If the stack is empty, then it is said to be an Underflow
condition.
• Peek or Top: Returns the top element of the stack.
• isEmpty: Returns true if the stack is empty, else false.
Algorithm for PUSH operation
1. Check if the stack is full or not.
2. If the stack is full, then print error of overflow and exit the program.
3. If the stack is not full, then increment the top and add the element.
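The push algorithm above, together with pop, peek, and isEmpty, can be sketched for an array-based stack of integers. This is an illustrative sketch; MAX and the error return values are assumptions.

```c
#include <stdio.h>

#define MAX 100

int stack[MAX];
int top = -1;          /* -1 means the stack is empty */

/* PUSH: check for overflow, then increment top and store the element. */
int push(int x) {
    if (top == MAX - 1) {
        printf("Overflow\n");
        return 0;
    }
    stack[++top] = x;
    return 1;
}

/* POP: check for underflow, then return the element and decrement top. */
int pop(void) {
    if (top == -1) {
        printf("Underflow\n");
        return -1;
    }
    return stack[top--];
}

/* Peek returns the top element without removing it. */
int peek(void) { return stack[top]; }

int isEmpty(void) { return top == -1; }
```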
2.3 Applications
2.3.1 Evaluating arithmetic expressions
A stack is a very effective data structure for evaluating arithmetic expressions in programming
languages. An arithmetic expression consists of operands and operators.
In addition to operands and operators, the arithmetic expression may also include
parentheses, i.e., a left parenthesis and a right parenthesis.
Example: A + (B - C)
To evaluate the expressions, one needs to be aware of the standard precedence rules for
arithmetic expression. The precedence rules for the five basic arithmetic operators are:
Operators                           Associativity    Precedence
^ (exponentiation)                  Right to left    Highest
* (multiplication), / (division)    Left to right    Next highest
+ (addition), - (subtraction)       Left to right    Lowest
Algorithm for Expression Evaluation
Now that we know the problem statement for expression evaluation, we move on to the
algorithm used to solve it.
1. Initialize a string s of length n consisting of expression.
2. Create one stack to store values and other to store operators.
3. Traverse through the string. If the current character is a white space, continue the
loop. Else if it is an opening parenthesis, push it onto the stack of operators.
4. Else if the current character is a digit, initialize an integer val as 0. Traverse while
the current character is a digit, updating val as val * 10 + the current digit. Push val
onto the stack of values.
5. Else if it is a closing parenthesis, traverse while the stack of operators is not empty
and the current character in it is not an opening parenthesis.
6. Pop the top 2 values from the stack of values and an operator from the operator
stack. Perform the arithmetic operation and push the result onto the stack of values.
Then discard the opening parenthesis.
7. Else if it is an operator, then while the top of the operator stack has greater than or
equal precedence, pop the top 2 values and an operator, perform the operation and
push the result; finally push the current operator onto the operator stack.
8. While the operator’s stack is not empty, pop the top 2 values from the stack of values
and an operator from the operator stack. Perform the arithmetic operation and push
the result onto the stack of values.
9. Return the top of the stack of values.
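The two-stack evaluation described above can be sketched in C for +, -, * and / with parentheses. This is an illustrative sketch; the function names and the fixed stack size of 64 are assumptions.

```c
#include <ctype.h>

static int precedence(char op) {
    return (op == '+' || op == '-') ? 1
         : (op == '*' || op == '/') ? 2 : 0;   /* '(' gets 0 */
}

static int applyOp(int a, int b, char op) {
    switch (op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        default:  return a / b;
    }
}

/* Evaluate an infix expression using one stack of values and one of
   operators, following the numbered steps above. */
int evaluate(const char *s) {
    int vals[64], vtop = -1;
    char ops[64]; int otop = -1;
    for (int i = 0; s[i] != '\0'; i++) {
        if (isspace((unsigned char)s[i])) continue;
        if (s[i] == '(') {
            ops[++otop] = s[i];
        } else if (isdigit((unsigned char)s[i])) {
            int val = 0;                       /* collect a multi-digit number */
            while (isdigit((unsigned char)s[i]))
                val = val * 10 + (s[i++] - '0');
            i--;                               /* the for loop advances i again */
            vals[++vtop] = val;
        } else if (s[i] == ')') {
            while (otop >= 0 && ops[otop] != '(') {
                int b = vals[vtop--], a = vals[vtop--];
                vals[++vtop] = applyOp(a, b, ops[otop--]);
            }
            otop--;                            /* discard the '(' */
        } else {                               /* an operator */
            while (otop >= 0 && precedence(ops[otop]) >= precedence(s[i])) {
                int b = vals[vtop--], a = vals[vtop--];
                vals[++vtop] = applyOp(a, b, ops[otop--]);
            }
            ops[++otop] = s[i];
        }
    }
    while (otop >= 0) {                        /* drain remaining operators */
        int b = vals[vtop--], a = vals[vtop--];
        vals[++vtop] = applyOp(a, b, ops[otop--]);
    }
    return vals[vtop];
}
```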
Example
Input : s = “100 * ( 2 + 12 )”
Output : 1400
The below shows the step by step processing of the above expression
Infix Notation
The infix notation is a convenient way of writing an expression in which each operator
is placed between the operands. Infix expressions can be parenthesized or unparenthesized
depending upon the problem requirement.
Example: A + B, (C - D) etc.
All these expressions are in infix notation because the operator comes between the
operands.
Prefix Notation
The prefix notation places the operator before the operands. This notation was
introduced by the Polish mathematician Jan Łukasiewicz and hence is often referred to as
Polish notation.
Example: + A B, − C D, etc.
All these expressions are in prefix notation because the operator comes before the
operands.
Postfix Notation
The postfix notation places the operator after the operands. This notation is just the
reverse of Polish notation and is also known as Reverse Polish notation.
Example: AB+, CD−, etc.
All these expressions are in postfix notation because the operator comes after the
operands.
Conversion of Arithmetic Expression into various Notations:
Infix Notation       Prefix Notation    Postfix Notation
A * B                *AB                AB*
(A + B) / C          /+ABC              AB+C/
(A * B) + (D - C)    +*AB-DC            AB*DC-+
Algorithm for Infix to Postfix conversion
• Step 1: Consider the next element in the input.
• Step 2: If it is operand, display it.
• Step 3: If it is opening parenthesis, insert it on stack.
• Step 4: If it is an operator, then
o If stack is empty, insert operator on stack.
o If the top of stack is opening parenthesis, insert the operator on stack
o If it has higher priority than the top of stack, insert the operator on stack.
o Else, delete the operator from the stack and display it, repeat Step 4.
• Step 5: If it is a closing parenthesis, delete the operator from stack and display them
until an opening parenthesis is encountered. Delete and discard the opening
parenthesis.
• Step 6: If there is more input, go to Step 1.
• Step 7: If there is no more input, delete the remaining operators to output.
Example: Let's take the example of Converting an infix expression into a postfix expression.
(A+B/C+D*(E-F)^G)
#include <stdio.h>
#include <ctype.h>

char s[50];              /* the operator stack */
int top = -1;

void push(char elem)
{
    s[++top] = elem;
}

char pop()
{
    return s[top--];
}

int pr(char elem)        /* precedence of an operator */
{
    switch (elem)
    {
        case '#': return 0;
        case '(': return 1;
        case '+':
        case '-': return 2;
        case '*':
        case '/': return 3;
    }
    return 0;
}

int main()
{
    char infx[50], pofx[50], ch, elem;
    int i = 0, k = 0;
    printf("\n\nEnter Infix Expression: ");
    scanf("%s", infx);
    push('#');           /* sentinel with the lowest precedence */
    while ((ch = infx[i++]) != '\0')
    {
        if (ch == '(')
            push(ch);
        else if (isalnum(ch))
            pofx[k++] = ch;              /* operands go straight to the output */
        else if (ch == ')')
        {
            while (s[top] != '(')
                pofx[k++] = pop();
            elem = pop();                /* discard the '(' */
        }
        else
        {
            while (pr(s[top]) >= pr(ch)) /* pop higher/equal precedence operators */
                pofx[k++] = pop();
            push(ch);
        }
    }
    while (s[top] != '#')                /* drain the remaining operators */
        pofx[k++] = pop();
    pofx[k] = '\0';
    printf("\n\nGiven Infix Expression: %s \nPostfix Expression: %s\n", infx, pofx);
    return 0;
}
Output:
Enter Infix Expression ? 3*3/(4-1)+6*2
Given Infix Expression: 3*3/(4-1)+6*2
Postfix Expression: 33*41-/62*+
A queue is a linear data structure and a collection of elements. A queue is another special
kind of list, where items are inserted at one end, called the rear, and deleted at the other end,
called the front. The principle of a queue is "FIFO" or "first in, first out".
2.4.1 Operations
A queue is an object or more specifically an abstract data type (ADT) that allows the
following operations:
• Enqueue or insertion: inserts an element at the end of the queue.
• Dequeue or deletion: deletes an element from the start of the queue.
Queue operations work as follows:
1. Two pointers called FRONT and REAR are used to keep track of the first and last
elements in the queue.
2. When initializing the queue, we set the value of FRONT and REAR to 0.
3. On enqueuing an element, we increase the value of the REAR index and place the new
element in the position pointed to by REAR.
4. On dequeuing an element, we return the value pointed to by FRONT and increase
the FRONT index.
5. Before enqueuing, we check if the queue is already full.
6. Before dequeuing, we check if the queue is already empty.
7. When enqueuing the first element, we set the value of FRONT to 1.
8. When dequeuing the last element, we reset the values of FRONT and REAR to 0.
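The FRONT/REAR mechanics above can be sketched for an array-based queue of integers. This sketch uses the common convention front = rear = -1 for an empty queue (rather than 0 as in the steps above); MAX and the function names are assumptions.

```c
#include <stdio.h>

#define MAX 100

int queue[MAX];
int front = -1, rear = -1;    /* -1 marks an empty queue */

/* Enqueue: check for a full queue, then place the element at REAR. */
int enqueue(int x) {
    if (rear == MAX - 1) {
        printf("Queue Overflow\n");
        return 0;
    }
    if (front == -1) front = 0;   /* first element */
    queue[++rear] = x;
    return 1;
}

/* Dequeue: check for an empty queue, then remove the element at FRONT. */
int dequeue(int *x) {
    if (front == -1 || front > rear) {
        printf("Queue Underflow\n");
        return 0;
    }
    *x = queue[front++];
    if (front > rear) front = rear = -1;   /* the queue became empty */
    return 1;
}
```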
2.5 Circular Queue
• A circular queue is similar to a linear queue as it is also based on the FIFO (First In
First Out) principle except that the last position is connected to the first position in a
circular queue that forms a circle.
• It is also known as a Ring Buffer.
There are two cases in which the element cannot be inserted:
• When front == 0 && rear == max - 1, which means that front is at the first position of
the Queue and rear is at the last position of the Queue.
• When front == rear + 1.
Algorithm to insert an element in a circular queue
Step 1: IF (REAR+1)%MAX = FRONT
Write " OVERFLOW "
Goto step 4
[End OF IF]
Step 2: IF FRONT = -1 and REAR = -1
SET FRONT = REAR = 0
ELSE IF REAR = MAX - 1 and FRONT != 0
SET REAR = 0
ELSE
SET REAR = (REAR + 1) % MAX
[END OF IF]
Step 3: SET QUEUE[REAR] = VAL
Step 4: EXIT
Dequeue Operation
The steps of dequeue operation are given below:
• First, we check whether the Queue is empty or not. If the queue is empty, we cannot
perform the dequeue operation.
• When the element is deleted, the value of front gets incremented by 1.
• If there is only one element left which is to be deleted, then the front and rear are reset
to -1.
Algorithm to delete an element from the circular queue
Step 1: IF FRONT = -1
Write " UNDERFLOW "
Goto Step 4
[END of IF]
Step 2: SET VAL = QUEUE[FRONT]
Step 3: IF FRONT = REAR
SET FRONT = REAR = -1
ELSE IF FRONT = MAX -1
SET FRONT = 0
ELSE
SET FRONT = FRONT + 1
[END of IF]
[END OF IF]
Step 4: EXIT
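The two algorithms above can be sketched in C; the modulo arithmetic subsumes the separate wrap-around branch in Step 2. MAX = 5 and the function names are assumptions.

```c
#include <stdio.h>

#define MAX 5

int cqueue[MAX];
int front = -1, rear = -1;   /* -1 marks an empty queue */

/* Mirrors the circular enqueue algorithm above. */
int cenqueue(int val) {
    if ((rear + 1) % MAX == front) {     /* Step 1: OVERFLOW */
        printf("OVERFLOW\n");
        return 0;
    }
    if (front == -1 && rear == -1)       /* Step 2: first element */
        front = rear = 0;
    else
        rear = (rear + 1) % MAX;         /* wraps past MAX - 1 */
    cqueue[rear] = val;                  /* Step 3 */
    return 1;
}

/* Mirrors the circular dequeue algorithm above. */
int cdequeue(int *val) {
    if (front == -1) {                   /* Step 1: UNDERFLOW */
        printf("UNDERFLOW\n");
        return 0;
    }
    *val = cqueue[front];                /* Step 2 */
    if (front == rear)                   /* Step 3: last element */
        front = rear = -1;
    else
        front = (front + 1) % MAX;
    return 1;
}
```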
2.6 Priority Queue
A priority queue is a special type of queue in which each element is served according
to its priority.
Ascending Order Priority Queue
• An ascending order priority queue gives the highest priority to the lowest number in
the queue.
• For example, you have six numbers in the priority queue: 4, 8, 12, 45, 35, 20.
• Arranged in ascending order, the list is: 4, 8, 12, 20, 35, 45.
• In this list, 4 is the smallest number.
• Hence, the ascending order priority queue treats number 4 as the highest priority and
45 as the lowest priority.
Descending Order Priority Queue
• A descending order priority queue gives the highest priority to the highest number in
that queue.
• For example, you have six numbers in the priority queue that are 4, 8, 12, 45, 35, 20.
• Firstly, you will arrange these numbers in descending order.
• The new list is as follows: 45, 35, 20, 12, 8, 4.
• In this list, 45 is the highest number.
• Hence, the descending order priority queue treats number 45 as the highest priority and
4 as the lowest priority.
2.7 Deque
• The deque stands for Double Ended Queue.
• Deque is a data structure that inherits the properties of both queues and stacks.
• The linear queue has some restrictions while performing the insertion and deletion of
elements.
• The insertion in a linear queue must happen from the rear end and deletion from the
front end.
• But, in deque, you can perform both insertion and deletion operations at both of its
ends.
• That’s why it is called a Double-Ended Queue (Deque).
Types of deque
There are two types of deque -
• Input restricted queue
• Output restricted queue
Input restricted Queue
In an input restricted queue, insertion can be performed at only one end, while
deletion can be performed from both ends.
Output restricted Queue
In an output restricted queue, deletion can be performed at only one end, while
insertion can be performed from both ends.
Operations on deque
• Insertion at front
• Insertion at rear
• Deletion at front
• Deletion at rear
We can also perform peek operations in the deque along with the operations listed above.
Through the peek operation, we can get the front and rear elements of the deque. So, in
addition to the above operations, the following operations are also supported in a deque -
• Get the front item from the deque
• Get the rear item from the deque
• Check whether the deque is full or not
• Check whether the deque is empty or not
Insertion at the front end
In this operation, the element is inserted from the front end of the queue. Before implementing
the operation, we first have to check whether the queue is full or not. If the queue is not full,
then the element can be inserted from the front end by using the below conditions -
• If the queue is empty, both rear and front are initialized with 0. Now, both will point to
the first element.
• Otherwise, check the position of front: if front is at the first position (front == 0), then
reinitialize it with front = n - 1, i.e., the last index of the array; else decrement front by 1.
Deletion at the rear end
In this operation, the element is deleted from the rear end of the queue. Before
implementing the operation, we first have to check whether the queue is empty or not.
If the queue is empty, i.e., front = -1, it is the underflow condition, and we cannot
perform the deletion.
• If the deque has only one element, set rear = -1 and front = -1.
• If rear = 0 (rear is at front), then set rear = n - 1.
• Else, decrement the rear by 1 (or, rear = rear -1).
UNIT – II - END
UNIT – III
A tree is a collection of nodes. The collection can be empty; otherwise, a tree consists
of a distinguished node, r, called the root, and zero or more nonempty (sub)trees T1 , T2 , . . .
, Tk, each of whose roots is connected by a directed edge from r. The root of each sub-tree is
said to be a child of r, and r is the parent of each sub-tree root.
A tree is a collection of N nodes, one of which is the root, and N − 1 edges. That there
are N − 1 edges follows from the fact that each edge connects some node to its parent, and
every node except the root has one parent.
In the tree the root is A. Node F has A as a parent and K, L, and M as children. Each
node may have an arbitrary number of children, possibly zero. Nodes with no children are
known as leaves; the leaves in the tree above are B, C, H, I, P, Q, K, L, M, and N. Nodes with
the same parent are siblings; thus, K, L, and M are all siblings. Grandparent and grandchild
relations can be defined in a similar manner.
A path from node n1 to nk is defined as a sequence of nodes n1 , n2 , . . . , nk such that ni
is the parent of ni+1 for 1 ≤ i < k. The length of this path is the number of edges on the path,
namely, k − 1. There is a path of length zero from every node to itself. Notice that in a tree
there is exactly one path from the root to each node.
For any node ni the depth of ni is the length of the unique path from the root to ni. Thus,
the root is at depth 0. The height of ni is the length of the longest path from ni to a leaf. Thus
all leaves are at height 0. The height of a tree is equal to the height of the root. E is at depth 1
and height 2; F is at depth 1 and height 1; the height of the tree is 3. The depth of a tree is equal
to the depth of the deepest leaf; this is always equal to the height of the tree.
If there is a path from n1 to n2 , then n1 is an ancestor of n2 and n2 is a descendant of n1
. If n1 ≠ n2 , then n1 is a proper ancestor of n2 and n2 is a proper descendant of n1 .
3.1.1 Tree traversals
• When we want to display a tree, we need to follow some order in which all the nodes
of that tree must be displayed.
• In any tree, the order in which the nodes are displayed depends on the traversal method.
• The displaying (or visiting) order of nodes in a tree is called Tree Traversal.
• There are three types of tree traversals.
➢ In - Order Traversal
➢ Pre - Order Traversal
➢ Post - Order Traversal
In - Order Traversal ( leftChild - root - rightChild )
• In In-Order traversal, the root node is visited between left child and right child.
• In this traversal, the left child node is visited first, then the root node is visited and later
we go for visiting right child node.
• This in-order traversal is applicable for every root node of all subtrees in the tree.
• This is performed recursively for all nodes in the tree.
• In the above example of binary tree, first we try to visit left child of root node 'A', but
A's left child is a root node for left subtree.
• So we try to visit its (B's) left child 'D' and again D is a root for subtree with nodes D,
I and J.
• So we try to visit its left child 'I' and it is the left most child.
• So first we visit 'I', then go for its root node 'D' and later we visit D's right child 'J'.
• With this we have completed the left part of node B.
• Then visit 'B' and next B's right child 'F' is visited.
• With this we have completed left part of node A.
• Then visit root node 'A'.
• With this we have completed left and root parts of node A.
• Then we go for right part of the node A.
• In right of A again there is a subtree with root C.
• So go for left child of C and again it is a subtree with root G.
• But G does not have left part so we visit 'G' and then visit G's right child K.
• With this we have completed the left part of node C.
• Then visit root node 'C' and next visit C's right child 'H', which is the rightmost child in
the tree, so we stop the process.
• That means here we have visited in the order of I - D - J - B - F - A - G - K - C - H using
In-Order Traversal.
• In-Order Traversal for above example of binary tree is
I-D-J-B-F-A-G-K-C–H
Algorithm
Until all nodes are traversed –
• Step 1 − Recursively traverse left subtree.
• Step 2 − Visit root node.
• Step 3 − Recursively traverse right subtree.
Pre - Order Traversal ( root - leftChild - rightChild )
• In Pre-Order traversal, the root node is visited first, before the left child and the right child.
• In the above example of binary tree, first we visit root node 'A' then visit its left child
'B' which is a root for D and F.
• So we visit B's left child 'D' and again D is a root for I and J.
• So we visit D's left child 'I' which is the left most child.
• So next we go for visiting D's right child 'J'. With this we have completed root, left and
right parts of node D and root, left parts of node B.
• Next visit B's right child 'F'. With this we have completed root and left parts of node A.
• So we go for A's right child 'C' which is a root node for G and H.
• After visiting C, we go for its left child 'G' which is a root for node K.
• So next we visit left of G, but it does not have left child so we go for G's right child 'K'.
• With this we have completed node C's root and left parts.
• Next visit C's right child 'H' which is the right most child in the tree.
• So we stop the process.
• That means here we have visited in the order of A-B-D-I-J-F-C-G-K-H using Pre-Order
Traversal.
Algorithm
Until all nodes are traversed –
Step 1 − Visit root node.
Step 2 − Recursively traverse left subtree.
Step 3 − Recursively traverse right subtree.
void preorder(tree_pointer ptr)
{
    if (ptr != NULL) {
        printf("%d ", ptr->data);      /* visit the root first */
        preorder(ptr->left_child);     /* then the left subtree */
        preorder(ptr->right_child);    /* then the right subtree */
    }
}
Binary Trees
A binary tree is a tree in which no node can have more than two children. The binary
tree consists of a root and two sub-trees, TL and TR, both of which could possibly be empty.
A property of a binary tree that is sometimes important is that the depth of an average binary
tree is considerably smaller than N. An analysis shows that the average depth is O(√N), and
that for a special type of binary tree, namely the binary search tree, the average value of the
depth is O(log N). Unfortunately, the depth can be as large as N − 1.
Expression Trees
• In our example, the left sub-tree evaluates to a + (b * c) and the right sub-tree evaluates
to ((d * e) + f) * g. The entire tree therefore represents (a + (b * c)) + (((d * e) + f) * g).
• We can produce an (overly parenthesized) infix expression by recursively producing a
parenthesized left expression, then printing out the operator at the root, and finally
recursively producing a parenthesized right expression. This general strategy (left,
node, right) is known as an inorder traversal
• An alternate traversal strategy is to recursively print out the left sub-tree, the right sub-
tree, and then the operator. If we apply this strategy to our tree above, the output is a b
c * + d e * f + g * +, which is the postfix representation. This traversal strategy is
generally known as a postorder traversal.
• A third traversal strategy is to print out the operator first and then recursively print out
the left and right sub-trees. The resulting expression, + + a * b c * + * d e f g, is the less
useful prefix notation, and the traversal strategy is a preorder traversal
Constructing an Expression Tree
This algorithm is used to convert a postfix expression into an expression tree. We have
an algorithm to convert infix to postfix.
• We read our expression one symbol at a time.
• If the symbol is an operand, we create a one-node tree and push a pointer to it onto a
stack.
• If the symbol is an operator, we pop (pointers) to two trees T1 and T2 from the stack
(T1 is popped first) and form a new tree whose root is the operator and whose left and
right children point to T2 and T1 , respectively.
• A pointer to this new tree is then pushed onto the stack.
• As an example, suppose the input is “a b + c d e + * *”
The first two symbols are operands, so we create one-node trees and push pointers to them onto
a stack.
Next, a + is read, so two pointers to trees are popped, a new tree is formed, and a pointer to it
is pushed onto the stack.
Next, c, d, and e are read, and for each a one-node tree is created and a pointer to the
corresponding tree is pushed onto the stack.
Now a + is read, so two trees are merged.
Continuing, a * is read, so we pop two tree pointers and form a new tree with a * as root.
Finally, the last symbol is read, two trees are merged, and a pointer to the final tree is left on
the stack.
3.3 Applications of trees
• File Systems: The file system of a computer is often represented as a tree. Each folder
or directory is a node in the tree, and files are the leaves.
• XML Parsing: Trees are used to parse and process XML documents. An XML
document can be thought of as a tree, with elements as nodes and attributes as properties
of the nodes.
• Database Indexing: Many databases use trees to index their data. The B-tree and its
variations are commonly used for this purpose.
• Compiler Design: The syntax of programming languages is often defined using a tree
structure called a parse tree. This is used by compilers to understand the structure of the
code and generate machine code from it.
• Artificial Intelligence: Decision trees are often used in artificial intelligence to make
decisions based on a series of criteria
Binary Search Trees
• The tree on the left is a binary search tree, but the tree on the right is not.
• The tree on the right has a node with item 7 in the left sub-tree of a node with item 6.
Contains
• This operation requires returning true if there is a node in tree T that has item X, or false
if there is no such node. The structure of the tree makes this simple.
• If T is empty, then we can just return false. Otherwise, if the item stored at T is X, we
can return true.
• Otherwise, we make a recursive call on a sub-tree of T, either left or right, depending
on the relationship of X to the item stored in T.
bool contains( const Comparable & x, BinaryNode *t ) const
{
    if( t == nullptr )
        return false;
    else if( x < t->element )
        return contains( x, t->left );
    else if( t->element < x )
        return contains( x, t->right );
    else
        return true;    // Match
}
findMax
BinaryNode * findMax( BinaryNode *t ) const
{
    if( t != nullptr )
        while( t->right != nullptr )
            t = t->right;
    return t;
}
insert
To insert X into tree T, proceed down the tree as you would with a contains.
If X is found, do nothing. Otherwise, insert X at the last spot on the path traversed.
void insert( Comparable && x, BinaryNode * & t )
{
    if( t == nullptr )
        t = new BinaryNode{ std::move( x ), nullptr, nullptr };
    else if( x < t->element )
        insert( std::move( x ), t->left );
    else if( t->element < x )
        insert( std::move( x ), t->right );
    else
        ;  // Duplicate; do nothing
}
remove
• Once we have found the node to be deleted, we need to consider several possibilities.
• If the node is a leaf, it can be deleted immediately.
• If the node has one child, the node can be deleted after its parent adjusts a link to bypass
the node.
• The complicated case deals with a node with two children. The general strategy is to
replace the data of this node with the smallest data of the right sub-tree and recursively
delete that node.
• Because the smallest node in the right sub-tree cannot have a left child, the second
remove is an easy one.
• The node to be deleted is the left child of the root; the key value is 2.
• It is replaced with the smallest data in its right sub-tree (3), and then that node is deleted
as before.
Threaded Binary Trees
• Each node in a threaded binary tree either contains a link to its child node or a thread to
other nodes in the tree.
• In one-way threaded binary trees, a thread will appear either in the right or left link field
of a node.
• If it appears in the right link field of a node then it will point to the next node that will
appear on performing in order traversal.
• Such trees are called Right threaded binary trees.
• If thread appears in the left field of a node then it will point to the nodes inorder
predecessor.
• Such trees are called Left threaded binary trees.
• Left threaded binary trees are used less often as they don't yield the same advantages as
right threaded binary trees.
• In one-way threaded binary trees, the right link field of last node and left link field of
first node contains a NULL.
• In order to distinguish threads from normal links they are represented by dotted lines.
• The above figure shows the inorder traversal of this binary tree yields D, B, E, A, C, F.
• When this tree is represented as a right threaded binary tree, the right link field of leaf
node D which contains a NULL value is replaced with a thread that points to node B
which is the inorder successor of a node D.
• In the same way other nodes containing values in the right link field will contain NULL
value.
Two-way threaded Binary Trees:
• In two-way threaded Binary trees, the right link field of a node containing NULL values
is replaced by a thread that points to nodes inorder successor and left field of a node
containing NULL values is replaced by a thread that points to nodes inorder
predecessor.
• The above figure shows the inorder traversal of this binary tree yields D, B, E, G, A, C,
F.
• If we consider the two-way threaded Binary tree, the node E whose left field contains
NULL is replaced by a thread pointing to its inorder predecessor i.e. node B.
• Similarly, for node G whose right and left linked fields contain NULL values are
replaced by threads such that right link field points to its inorder successor and left link
field points to its inorder predecessor.
• In the same way, other nodes containing NULL values in their link fields are filled with
threads.
• In the above figure of two-way threaded Binary tree, we noticed that no left thread is
possible for the first node and no right thread is possible for the last node.
• This is because they don't have any inorder predecessor and successor respectively.
• This is indicated by threads pointing nowhere.
• So in order to maintain the uniformity of threads, we maintain a special node called
the header node.
• The header node does not contain any data part and its left link field points to the root
node and its right link field points to itself.
• If this header node is included in the two-way threaded Binary tree then this node
becomes the inorder predecessor of the first node and inorder successor of the last node.
• Now threads of left link fields of the first node and right link fields of the last node will
point to the header node.
Algorithm for Inorder Traversal of Threaded Binary Tree:
Algorithm Inorder(I)
{
    ThreadedTreeNode *Header;
    Header = I;
    while(1)
    {
        I = fnFindInorder_Successor(I);
        if(I == Header)
            return;
        else
            print(I->info);
    }
}
AVL Trees
• An AVL tree is identical to a binary search tree, except that for every node in the tree,
the height of the left and right sub-trees can differ by at most 1.
• The height of an empty tree is defined to be −1.
• Height information is kept for each node (in the node structure).
• All the tree operations can be performed in O(log N) time, except possibly insertion
and deletion.
• When we do an insertion, we need to update all the balancing information for the nodes
on the path back to the root, but the reason that insertion is potentially difficult is that
inserting a node could violate the AVL tree property.
• The property has to be restored before the insertion step is considered over.
• It turns out that this can always be done with a simple modification to the tree, known
as a rotation.
• After an insertion, only nodes that are on the path from the insertion point to the root
might have their balance altered because only those nodes have their sub-trees altered.
• As we follow the path up to the root and update the balancing information, we may find
a node whose new balance violates the AVL condition.
• We will show how to rebalance the tree at the first (i.e., deepest) such node, and we will
prove that this rebalancing guarantees that the entire tree satisfies the AVL property.
• Let us call the node that must be rebalanced α. Since any node has at most two children,
and a height imbalance requires that α’s two sub-trees’ heights differ by two, it is easy
to see that a violation might occur in four cases:
1. An insertion into the left sub-tree of the left child of α
2. An insertion into the right sub-tree of the left child of α
3. An insertion into the left sub-tree of the right child of α
4. An insertion into the right sub-tree of the right child of α
• The first case, in which the insertion occurs on the “outside” (i.e., left–left or right–
right), is fixed by a single rotation of the tree.
• The second case, in which the insertion occurs on the “inside” (i.e., left–right or right–
left) is handled by the slightly more complex double rotation.
• An AVL Tree can be defined as a height-balanced binary search tree in which each node
is associated with a balance factor, calculated by subtracting the height of its right
sub-tree from the height of its left sub-tree.
• The tree is said to be balanced if the balance factor of each node is between -1 and 1;
otherwise, the tree is unbalanced and needs to be rebalanced.
• Balance Factor (k) = height (left(k)) - height (right(k))
• If balance factor of any node is 1, it means that the left sub-tree is one level higher than
the right sub-tree.
• If balance factor of any node is 0, it means that the left sub-tree and right sub-tree
contain equal height.
• If balance factor of any node is -1, it means that the left sub-tree is one level lower than
the right sub-tree.
Basic Operations of AVL Trees
The basic operations performed on the AVL Tree structures include all the operations
performed on a binary search tree, since the AVL Tree at its core is actually just a binary search
tree holding all its properties. Therefore, basic operations performed on an AVL Tree are −
Insertion and Deletion.
Insertion
The data is inserted into the AVL Tree by following the Binary Search Tree property
of insertion, i.e. the left subtree must contain elements less than the root value and right subtree
must contain all the greater elements. However, in AVL Trees, after the insertion of each
element, the balance factor of the tree is checked; if it does not exceed 1, the tree is left as it is.
But if the balance factor exceeds 1, a balancing algorithm is applied to readjust the tree such
that balance factor becomes less than or equal to 1 again.
Algorithm
• The following steps are involved in performing the insertion operation of an AVL
Tree
• Step 1 − Create a node
• Step 2 − Check if the tree is empty
• Step 3 − If the tree is empty, the new node created will become the root node of the
AVL Tree.
• Step 4 − If the tree is not empty, we perform the Binary Search Tree insertion
operation and check the balancing factor of the node in the tree.
• Step 5 − Suppose the balancing factor exceeds ±1, we apply suitable rotations on the
said node and resume the insertion from Step 4.
Deletion
Deletion in AVL Trees takes place in three different scenarios −
• Scenario 1 (Deletion of a leaf node) − If the node to be deleted is a leaf node, then it
is deleted without any replacement as it does not disturb the binary search tree property.
However, the balance factor may get disturbed, so rotations are applied to restore it.
• Scenario 2 (Deletion of a node with one child) − If the node to be deleted has one
child, replace the value in that node with the value in its child node. Then delete the
child node. If the balance factor is disturbed, rotations are applied.
• Scenario 3 (Deletion of a node with two child nodes) − If the node to be deleted has
two child nodes, find the inorder successor of that node and replace its value with the
inorder successor value. Then try to delete the inorder successor node. If the balance
factor exceeds 1 after deletion, apply balance algorithms.
AVL Insertion Process
Insertion in an AVL tree is similar to insertion in a binary search tree. But after inserting
an element, you need to fix the AVL properties using left or right rotations:
• If there is an imbalance in the left child's right sub-tree, perform a left-right rotation
• If there is an imbalance in the left child's left sub-tree, perform a right rotation
• If there is an imbalance in the right child's right sub-tree, perform a left rotation
• If there is an imbalance in the right child's left sub-tree, perform a right-left rotation
AVL Tree Rotations
In AVL trees, after each operation like insertion and deletion, the balance factor of
every node needs to be checked. If every node satisfies the balance factor condition, then the
operation can be concluded. Otherwise, the tree needs to be rebalanced using rotation
operations.
There are four rotations and they are classified into two types:
Left Rotation (LL Rotation)
In a left rotation, every node moves one position to the left from its current position.
3.7 B-Tree
• B-Trees, also known as Balanced Trees, are a type of self-balancing search tree.
• B-Trees are characterized by the large number of keys that they can store in a single
node, which is why they are also known as “large key” trees.
• Each node in a B-Tree can contain multiple keys, which allows the tree to have a larger
branching factor and thus a shallower height.
• This shallow height leads to less disk I/O, which results in faster search and insertion
operations.
• B-Trees are particularly well suited for storage systems that have slow, bulky data
access such as hard drives, flash memory, and CD-ROMs.
• B-Trees maintain balance by ensuring that each node has a minimum number of keys,
so the tree is always balanced.
• This balance guarantees that the time complexity for operations such as insertion,
deletion, and searching is always O(log n), regardless of the initial shape of the tree.
Properties of B-Tree:
• All leaves are at the same level.
• B-Tree is defined by the term minimum degree ‘t‘. The value of ‘t‘ depends upon disk
block size.
• Every node except the root must contain at least t-1 keys. The root may contain a
minimum of 1 key.
• All nodes (including root) may contain at most (2*t – 1) keys.
• Number of children of a node is equal to the number of keys in it plus 1.
• All keys of a node are sorted in increasing order. The child between two keys k1 and
k2 contains all keys in the range from k1 to k2.
• B-Tree grows and shrinks from the root, unlike Binary Search Trees, which grow
downward and also shrink downward.
• Like other balanced Binary Search Trees, the time complexity to search, insert and
delete is O(log n).
• Insertion of a Node in B-Tree happens only at Leaf Node.
• Following is an example of a B-Tree of minimum order 5
Traversal in B-Tree:
• Traversal is also similar to Inorder traversal of Binary Tree.
• We start from the leftmost child, recursively print the leftmost child, then repeat the
same process for the remaining children and keys.
• In the end, recursively print the rightmost child.
Search Operation in B-Tree:
Search is similar to the search in a Binary Search Tree. Let the key to be searched be k.
• Start from the root and recursively traverse down.
• For every visited non-leaf node,
o If the node has the key, we simply return the node.
o Otherwise, we recur down to the appropriate child (The child which is just
before the first greater key) of the node.
• If we reach a leaf node and don’t find k in the leaf node, then return NULL.
#define MAX_KEYS 9          /* example capacity; depends on the minimum degree t */
#define MAX_CHILDREN (MAX_KEYS + 1)
struct Node
{
    int n;                  /* number of keys currently in the node */
    int key[MAX_KEYS];
    Node* child[MAX_CHILDREN];
    bool leaf;
};
Node* BtreeSearch(Node* x, int k)
{
    int i = 0;
    while (i < x->n && k > x->key[i])
        i++;
    if (i < x->n && k == x->key[i])
        return x;           /* key found in this node */
    if (x->leaf)
        return nullptr;     /* reached a leaf without finding k */
    return BtreeSearch(x->child[i], k);
}
3.8 B+ Tree
• B+ Tree is a variation of the B-tree data structure.
• In a B+ tree, data pointers are stored only at the leaf nodes of the tree.
• In a B+ tree, the structure of a leaf node differs from the structure of internal nodes.
• The leaf nodes have an entry for every value of the search field, along with a data
pointer to the record or to the block that contains this record.
• The leaf nodes of the B+ tree are linked together to provide ordered access to the search
field to the records.
• Internal nodes of a B+ tree are used to guide the search.
• Some search field values from the leaf nodes are repeated in the internal nodes of the
B+ tree.
The Structure of the Internal Nodes of a B+ Tree
• Each internal node is of the form: <P1, K1, P2, K2, ….., Pc-1, Kc-1, Pc> where c <= a,
each Pi is a tree pointer (i.e., points to another node of the tree), and each Ki is a key
value (see Diagram I for reference).
• Every internal node has: K1 < K2 < …. < Kc-1
• For each search field value ‘X’ in the sub-tree pointed at by Pi, the following condition
holds: Ki-1 < X <= Ki, for 1 < i < c, and Ki-1 < X, for i = c (see Diagram I for reference).
• Each internal node has at most ‘a’ tree pointers.
• The root node has at least two tree pointers, while the other internal nodes have at least
⌈a/2⌉ tree pointers each.
• If an internal node has ‘c’ pointers, c <= a, then it has ‘c – 1’ key values.
Insertion in B+ Trees
Insertion in B+ Trees is done via the following steps.
• Every element in the tree has to be inserted into a leaf node. Therefore, it is necessary
to go to a proper leaf node.
• Insert the key into the leaf node in increasing order if there is no overflow.
• Case 1: Overflow in leaf node
o Split the leaf node into two nodes.
o First node contains ceil((m-1)/2) values.
o Second node contains the remaining values.
o Copy the smallest search key value from second node to the parent
node.(Right biased)
Example: Insert the following key values 6, 16, 26, 36, 46 on a B+ tree with order = 3.
Step 1: The order is 3, so a node can hold at most 2 search key values. As insertion happens
only at a leaf node in a B+ tree, insert search key values 6 and 16 in increasing order in the
node.
Step 2: We cannot insert 26 in the same node as it causes an overflow in the leaf node, We
have to split the leaf node according to the rules. First part contains ceil((3-1)/2) values i.e.,
only 6. The second node contains the remaining values i.e., 16 and 26. Then also copy the
smallest search key value from the second node to the parent node i.e., 16 to the parent node.
Step 3: Now the next value is 36 that is to be inserted after 26 but in that node, it causes an
overflow again in that leaf node. Again follow the above steps to split the node. First part
contains ceil((3-1)/2) values i.e., only 16. The second node contains the remaining values i.e.,
26 and 36. Then also copy the smallest search key value from the second node to the parent
node i.e., 26 to the parent node.
Step 4: Now we have to insert 46 which is to be inserted after 36 but it causes an overflow in
the leaf node. So we split the node according to the rules. The first part contains 26 and the
second part contains 36 and 46 but now we also have to copy 36 to the parent node but it causes
overflow as only two search key values can be accommodated in a node. Now follow the steps
to deal with overflow in the non-leaf node. The first node contains ceil(3/2) - 1 values, i.e.,
'16'. Move the smallest among the remaining keys to the parent, i.e., '26' becomes the new
parent node. The second node contains the remaining keys, i.e., '36', and the rest of the leaf
nodes remain the same.
Deletion in B+ Trees
• Deletion in B+ Trees is not just deletion but a combined process of Searching,
Deletion, and Balancing.
• In the last step of the deletion process, it is mandatory to rebalance the B+ Tree;
otherwise, it fails the properties of B+ Trees.
3.9 Heap
A Heap is a special Tree-based data structure in which the tree is a complete binary
tree.
Operations of Heap Data Structure:
• Heapify: a process of creating a heap from an array.
• Insertion: process to insert an element in an existing heap; time complexity O(log N).
• Deletion: deleting the top element of the heap (the highest-priority element), then
reorganizing the heap and returning the element; time complexity O(log N).
• Peek: to check or find the first (or top) element of the heap.
Types of Heap Data Structure
Generally, Heaps can be of two types:
• Max-Heap
• Min-Heap
Max-Heap:
In a Max-Heap the key present at the root node must be the greatest among the keys present
at all of its children. The same property must be recursively true for all sub-trees in that Binary
Tree.
Min-Heap:
In a Min-Heap the key present at the root node must be the minimum among the keys
present at all of its children. The same property must be recursively true for all sub-trees in
that Binary Tree.
• Priority queues: The heap data structure is commonly used to implement priority
queues, where elements are stored in a heap and ordered based on their priority. This
allows constant-time access to the highest-priority element, making it an efficient data
structure for managing tasks or events that require prioritization.
• Heap-sort algorithm: The heap data structure is the basis for the heap-sort algorithm,
which is an efficient sorting algorithm with a worst-case time complexity of O(n log n).
The heap-sort algorithm is used in various applications, including database indexing
and numerical analysis.
• Memory management: The heap data structure is used in memory management
systems to allocate and de-allocate memory dynamically. The heap is used to store the
memory blocks, and the heap data structure is used to efficiently manage the memory
blocks and allocate them to programs as needed.
• Graph algorithms: The heap data structure is used in various graph algorithms,
including Dijkstra’s algorithm, Prim’s algorithm, and Kruskal’s algorithm. These
algorithms require efficient priority queue implementation, which can be achieved
using the heap data structure.
• Job scheduling: The heap data structure is used in job scheduling algorithms, where
tasks are scheduled based on their priority or deadline. The heap data structure allows
efficient access to the highest-priority task, making it a useful data structure for job
scheduling applications.
UNIT – IV
4.1 Graph
4.1.1 Definition
A graph G consists of two sets: a finite, nonempty set of vertices V(G) and a finite set of
edges E(G), where each edge is a pair of vertices.
Vertex
An individual data element of a graph is called a Vertex. A vertex is also known as a node.
In the above example graph, A, B, C, D & E are known as vertices.
Edge
An edge is a connecting link between two vertices. An edge is also known as an Arc. An edge
is represented as (starting Vertex, ending Vertex). In the above graph, the link between vertices
A and B is represented as (A,B).
Edges are three types:
• Undirected Edge - An undirected edge is a bidirectional edge. If there is an
undirected edge between vertices A and B then edge (A , B) is equal to edge (B , A).
• Directed Edge - A directed edge is a unidirectional edge. If there is a directed edge
between vertices A and B then edge (A , B) is not equal to edge (B , A).
• Weighted Edge - A weighted edge is an edge with cost on it.
Outgoing Edge
A directed edge is said to be outgoing edge on its origin vertex.
Incoming Edge
A directed edge is said to be incoming edge on its destination vertex.
Degree
Total number of edges connected to a vertex is said to be degree of that vertex.
Indegree
Total number of incoming edges connected to a vertex is said to be the indegree of that vertex.
Outdegree
Total number of outgoing edges connected to a vertex is said to be outdegree of that
vertex.
Parallel edges or Multiple edges
If two undirected edges have the same end vertices, or two directed edges have the
same origin and the same destination, such edges are called parallel edges or multiple
edges.
Self-loop
An edge (undirected or directed) is a self-loop if its two endpoints coincide.
Simple Graph
A graph is said to be simple if it has no parallel edges and no self-loops.
Adjacent nodes
When there is an edge from one node to another then these nodes are called adjacent
nodes.
Incidence
In an undirected graph the edge between v1 and v2 is incident on node v1 and v2.
Walk
A walk is defined as a finite alternating sequence of vertices and edges, beginning and
ending with vertices, such that each edge is incident with the vertices preceding and following
it.
Closed walk
A walk which begins and ends at the same vertex is called a closed walk. Otherwise it
is an open walk.
Path
An open walk in which no vertex appears more than once is called a path.
If e1 and e2 are the two edges between the pairs of vertices (v1,v3) and (v1,v2)
respectively, then v3 e1 v1 e2 v2 is a path.
Length of a path
The number of edges in a path is called the length of that path. In the following, the
length of the path is 3.
Sub Graph
A graph S is said to be a sub graph of a graph G if all the vertices and all the edges of
S are in G, and each edge of S has the same end vertices in S as in G. A subgraph of G is a
graph G’ such that V(G’) ⊆ V(G) and E(G’) ⊆ E(G).
4.1.2 Representation of Graph
Merits of Adjacency Matrix:
• From the adjacency matrix, it is easy to determine whether two vertices are connected.
• The degree of a vertex i is the sum of the entries in row i.
• For a digraph, the row sum is the out_degree, while the column sum is the in_degree.
• So that we can access the adjacency list for any vertex in O(1) time.
• Adjlist[i] is a pointer to to first node in the adjacency list for vertex i.
• Structure is
#define MAX_VERTICES 50
typedef struct node *node_pointer;
struct node
{
    int vertex;
    struct node *link;
};
node_pointer graph[MAX_VERTICES];
int n = 0; /* vertices currently in use */
Adjacency Multilists
• In the adjacency-list representation of an undirected graph, each edge (u, v) is
represented by two entries: one on the list for u and the other on the list for v.
• Sharing a single node per edge can be accomplished easily if the adjacency lists are
actually maintained as multilists (i.e., lists in which nodes may be shared among several lists).
• For each edge there will be exactly one node but this node will be in two lists (i.e. the
adjacency lists for each of the two nodes to which it is incident).
• For adjacency multilists, node structure is
typedef struct edge *edge_pointer;
struct edge
{
    short int marked;
    int vertex1, vertex2;
    edge_pointer path1, path2;
};
edge_pointer graph[MAX_VERTICES];
Lists:
➢ vertex 0: N0->N1->N2,
➢ vertex 1: N0->N3->N4
➢ vertex 2: N1->N3->N5,
➢ vertex 3: N2->N4->N5
Undirected Graph
A graph with only undirected edges is said to be an undirected graph.
Directed Graph
A graph with only directed edges is said to be a directed graph.
Complete Graph
A graph in which every node is adjacent to all other nodes present in the graph is known
as a complete graph. A complete undirected graph contains n(n-1)/2 edges, where n is the
number of vertices present in the graph. The following figure shows a complete graph.
Regular Graph
A regular graph is a graph in which every node has the same degree, i.e., every vertex
has the same number of edges incident on it.
Cycle Graph
A graph that contains a cycle is called a cycle graph. In a cycle the first and last nodes
are the same; a closed simple path is a cycle.
Acyclic Graph
A graph without cycles is called an acyclic graph.
Weighted Graph
A graph is said to be weighted if a non-negative value is assigned to each edge of the
graph. The value represents the length (or cost) between two vertices. A weighted graph is
also called a network.
• We use a Queue data structure, with maximum size equal to the total number of vertices
in the graph, to implement BFS traversal of a graph.
Algorithm for BFS
We use the following steps to implement BFS traversal...
• Step 1: Define a Queue of size total number of vertices in the graph.
• Step 2: Select any vertex as starting point for traversal. Visit that vertex and insert it
into the Queue.
• Step 3: Visit all the unvisited adjacent vertices of the vertex at the front of the Queue
and insert them into the Queue.
• Step 4: When there is no new vertex to visit from the vertex at the front of the Queue,
delete that vertex from the Queue.
• Step 5: Repeat steps 3 and 4 until the queue becomes empty.
• Step 6: When the queue becomes empty, produce the final spanning tree by removing
unused edges from the graph.
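The steps above can be sketched as a short routine. This is a minimal sketch, not from the original notes: the adjacency-list graph type, the `bfs` name, and the returned visit order are illustrative assumptions.

```cpp
#include <queue>
#include <vector>

// BFS traversal following the steps above: visit the start vertex, then
// repeatedly enqueue the unvisited neighbours of the front vertex and
// dequeue it. Returns the order in which vertices are visited.
std::vector<int> bfs(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::queue<int> q;                  // Step 1: define a queue
    visited[start] = true;              // Step 2: visit the start vertex
    q.push(start);
    while (!q.empty()) {                // Step 5: repeat until empty
        int v = q.front();
        q.pop();                        // Step 4: delete the front vertex
        order.push_back(v);
        for (int w : adj[v])            // Step 3: enqueue unvisited neighbours
            if (!visited[w]) {
                visited[w] = true;
                q.push(w);
            }
    }
    return order;
}
```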
State after visiting 0
State after visiting 8
➢ Enqueue the unvisited neighbor nodes: none (Note: 4 is enqueued again, but won't be
visited twice, so I leave it out)
➢ Next, visit the first node in the queue: 7
State after visiting 7
➢ Enqueue the unvisited neighbor nodes: none (Note: 2 is enqueued again, but won't be
visited twice, so I leave it out)
➢ Next, visit the first node in the queue: 2
State after visiting 2
State after visiting 4
Traversal order
• Step 3: Visit any one unvisited adjacent vertex of the vertex on top of the stack and
push it onto the stack.
• Step 4: Repeat step 3 until there is no new vertex to visit from the vertex on top of
the stack.
• Step 5: When there is no new vertex to visit, use backtracking and pop one vertex
from the stack.
• Step 6: Repeat steps 3, 4 and 5 until stack becomes Empty.
• Step 7: When the stack becomes empty, produce the final spanning tree by removing
unused edges from the graph.
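The stack-based steps above can be sketched as follows. This is an illustrative sketch, not from the original notes: the `dfs` name and adjacency-list representation are assumptions.

```cpp
#include <stack>
#include <vector>

// DFS traversal with an explicit stack, mirroring the steps above: visit one
// unvisited neighbour of the top vertex at a time, and pop (backtrack) when
// the top vertex has no unvisited neighbours left.
std::vector<int> dfs(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::stack<int> st;
    visited[start] = true;              // Steps 1-2: visit start and push it
    order.push_back(start);
    st.push(start);
    while (!st.empty()) {               // Step 6: repeat until stack is empty
        int v = st.top();
        bool advanced = false;
        for (int w : adj[v])            // Step 3: one unvisited neighbour
            if (!visited[w]) {
                visited[w] = true;
                order.push_back(w);
                st.push(w);
                advanced = true;
                break;
            }
        if (!advanced)
            st.pop();                   // Step 5: backtrack
    }
    return order;
}
```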
➢ Push the unvisited neighbor nodes: 8, 3, 1 (I used the reverse order to visit smaller node
id first)
➢ Next, visit the top node in the stack: 1
State after visiting 1
➢ Push the unvisited neighbor nodes: 5, 3 (Note: 3 is pushed again, and the previous value
will be cancelled later -- as we will see)
➢ Next, visit the top node in the stack: 3
State after visiting 3
➢ Push the unvisited neighbor nodes: 8 (Note: 8 is pushed again, and the previous value
will be cancelled later -- as we will see)
➢ Next, visit the top node in the stack: 8
State after visiting 8
State after visiting 5
Result:
• A directed edge (v, w) indicates that course v must be completed before course w may
be attempted.
• A topological ordering of these courses is any course sequence that does not violate the
prerequisite requirement.
• It is clear that a topological ordering is not possible if the graph has a cycle, since for
two vertices v and w on the cycle, v precedes w and w precedes v.
• Furthermore, the ordering is not necessarily unique; any legal ordering will do.
void Graph::topsort( ) // simple pseudocode; runs in O(|V|^2)
{
    for( int counter = 0; counter < NUM_VERTICES; counter++ )
    {
        Vertex v = findNewVertexOfIndegreeZero( );
        if( v == NOT_A_VERTEX )
            throw CycleFoundException{ };
        v.topNum = counter;
        for each Vertex w adjacent to v
            w.indegree--;
    }
}
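The repeated scan for a zero-indegree vertex can be avoided by keeping such vertices in a queue. The following is a sketch under assumptions not in the original notes (adjacency-list input, the `topologicalSort` name, and an empty result signalling a cycle):

```cpp
#include <queue>
#include <vector>

// Queue-based topological sort: enqueue all vertices of indegree 0, then
// repeatedly dequeue a vertex, append it to the ordering, and decrement the
// indegree of its successors, enqueueing any that drop to 0.
std::vector<int> topologicalSort(const std::vector<std::vector<int>>& adj) {
    int n = (int)adj.size();
    std::vector<int> indegree(n, 0), order;
    for (int v = 0; v < n; ++v)
        for (int w : adj[v])
            ++indegree[w];
    std::queue<int> q;
    for (int v = 0; v < n; ++v)
        if (indegree[v] == 0)
            q.push(v);
    while (!q.empty()) {
        int v = q.front();
        q.pop();
        order.push_back(v);             // assign v the next topNum
        for (int w : adj[v])
            if (--indegree[w] == 0)
                q.push(w);
    }
    if ((int)order.size() != n)
        order.clear();                  // cycle: no legal ordering exists
    return order;
}
```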
4.5 Bi-connectivity
If a graph is not biconnected, the vertices whose removal would disconnect the graph are known
as articulation points.
• Depth-first search provides a linear-time algorithm to find all articulation points in a
connected graph.
o First, starting at any vertex, we perform a depth-first search and number the nodes
as they are visited.
o For each vertex, v, we call this preorder number Num(v).
o Then, for every vertex, v, in the depth-first search spanning tree, we compute the
lowest numbered vertex, which we call Low(v), that is reachable from v by taking
zero or more tree edges and then possibly one back edge (in that order).
• Any other vertex v is an articulation point if and only if v has some child w such that
Low(w) ≥ Num(v).
• Notice that this condition is always satisfied at the root, hence the need for a special
test.
o The if part of the proof is clear when we examine the articulation points that the
algorithm determines, namely, C and D.
o D has a child E, and Low(E) ≥ Num(D), since both are 4.
o Thus, there is only one way for E to get to any node above D, and that is by going
through D.
o Similarly, C is an articulation point, because Low(G) ≥ Num(C).
o To prove that this algorithm is correct, one must show that the only if part of the
assertion is true (that is, this finds all articulation points).
o Every non-pendant vertex of a tree is a cut vertex.
• In the above graph, vertex 'e' is a cut-vertex. After removing vertex 'e' from the
above graph the graph will become a disconnected graph.
• This graph problem was solved in 1736 by Euler and marked the beginning of graph
theory.
• The problem is thus commonly referred to as the Euler path (sometimes Euler tour) or
Euler circuit problem, depending on the specific problem statement.
• The Euler tour and Euler circuit problems, though slightly different, have the same basic
solution.
• The first observation that can be made is that an Euler circuit, which must end on its
starting vertex, is possible only if the graph is connected and each vertex has an even
degree (number of edges).
• This is because, on the Euler circuit, a vertex is entered and then left.
• If any vertex v has odd degree, then eventually we will reach the point where only one
edge into v is unvisited, and taking it will strand us at v.
• If exactly two vertices have odd degree, an Euler tour, which must visit every edge but
need not return to its starting vertex, is still possible if we start at one of the odd-degree
vertices and finish at the other.
• If more than two vertices have odd degree, then an Euler tour is not possible.
• That is, any connected graph, all of whose vertices have even degree, must have an
Euler circuit.
• Furthermore, a circuit can be found in linear time.
• We can assume that we know that an Euler circuit exists, since we can test the necessary
and sufficient condition in linear time.
• Then the basic algorithm is to perform a depth-first search.
• The main problem is that we might visit a portion of the graph and return to the starting
point prematurely.
• If all the edges coming out of the start vertex have been used up, then part of the graph
is untraversed.
• The easiest way to fix this is to find the first vertex on this path that has an untraversed
edge and perform another depth-first search.
• This will give another circuit, which can be spliced into the original.
• This is continued until all edges have been traversed.
• Notice that in this graph, all the vertices must have even degree, so we are guaranteed
to find a cycle to add.
• The remaining graph might not be connected, but this is not important.
• The next vertex on the path that has untraversed edges is vertex 3.
• A possible circuit would then be 3, 2, 8, 9, 6, 3. When spliced in, this gives the path 5,
4, 1, 3, 2, 8, 9, 6, 3, 7, 4, 11, 10, 7, 9, 3, 4, 10, 5.
• On this path, the next vertex with an untraversed edge is 9, and the algorithm finds the
circuit 9, 12, 10, 9.
• When this is added to the current path, a circuit of 5, 4, 1, 3, 2, 8, 9, 12, 10, 9, 6, 3, 7,
4, 11, 10, 7, 9, 3, 4, 10, 5 is obtained.
• As all the edges are traversed, the algorithm terminates with an Euler circuit.
• A very similar problem is to find a simple cycle, in an undirected graph, that visits every
vertex. This is known as the Hamiltonian cycle problem.
• Computer Science: Graphs are used to model many problems and solutions in
computer science, such as representing networks, web pages, and social media
connections. Graph algorithms are used in path finding, data compression, and
scheduling.
• Social Networks: Graphs represent and analyze social networks, such as the
connections between individuals and groups.
• Transportation: Graphs can be used to model transportation systems, such as roads
and flights, and to find the shortest or quickest routes between locations.
• Computer Vision: Graphs represent and analyze images and videos, such as tracking
objects and detecting edges.
• Natural Language Processing: Graphs can represent and analyze text, such as in
syntactic and semantic dependency graphs.
• Telecommunication: Graphs are used to model telecommunication networks, such as
telephone and computer networks, and to analyze traffic and routing.
• Circuit Design: Graphs are used in the design of electronic circuits, such as logic
circuits and circuit diagrams.
• Bioinformatics: Graphs model and analyze biological data, such as protein-protein
interaction and genetic networks.
• Operations research: Graphs are used to model and analyze complex systems in
operations research, such as transportation systems, logistics networks, and supply
chain management.
• Artificial Intelligence: Graphs are used to model and analyze data in many AI
applications, such as machine learning and natural language processing.
UNIT-IV -END
UNIT – V
5.1 Searching
5.1.1 Linear search
• Linear Search is defined as a sequential search algorithm that starts at one end and goes
through each element of a list until the desired element is found, otherwise the search
continues till the end of the data set.
• Linear Search Algorithm,
o Every element is considered as a potential match for the key and checked for
the same.
o If any element is found equal to the key, the search is successful and the index
of that element is returned.
o If no element is found equal to the key, the search yields “No match found”.
• For example: Consider the array arr[] = {10, 50, 30, 70, 80, 20, 90, 40} and key = 30
o Step 1: Start from the first element (index 0) and compare key with each
element (arr[i]).
➢ Comparing key with the first element arr[0]. Since they are not equal, the
iterator moves to the next element as a potential match.
➢ Comparing key with the next element arr[1]. Since they are not equal, the
iterator moves to the next element as a potential match.
o Step 2: Now when comparing arr[2] with key, the value matches. So the Linear
Search Algorithm will yield a successful message and return the index of the
element when key is found (here 2).
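The search described above can be sketched in a few lines. The `linearSearch` name and `std::vector` interface are illustrative assumptions, not part of the original notes.

```cpp
#include <vector>

// Linear search: compare key with each element in turn; return the index of
// the first match, or -1 for "No match found".
int linearSearch(const std::vector<int>& arr, int key) {
    for (int i = 0; i < (int)arr.size(); ++i)
        if (arr[i] == key)
            return i;
    return -1;
}
```

On the example array above, the call returns index 2 for key 30.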
5.1.2 Binary search
Example: Consider an array arr[] = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91}, and the target = 23.
First Step:
• Calculate the mid and compare the mid element with the key.
• If the key is less than the mid element, move the search space to the left; if it is
greater than the mid, move the search space to the right.
o Key (i.e., 23) is greater than current mid element (i.e., 16). The search space
moves to the right.
o Key is less than the current mid 56. The search space moves to the left.
Second Step:
• If the key matches the value of the mid element, the element is found and stop search.
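The two steps above can be sketched as follows; the `binarySearch` name and `std::vector` interface are illustrative assumptions.

```cpp
#include <vector>

// Binary search on a sorted array: compare key with the middle element and
// halve the search space on each step. Returns the index of key, or -1.
int binarySearch(const std::vector<int>& arr, int key) {
    int low = 0, high = (int)arr.size() - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   // avoids overflow of (low + high)
        if (arr[mid] == key)
            return mid;                     // key matches the mid element
        if (arr[mid] < key)
            low = mid + 1;                  // key is greater: search right
        else
            high = mid - 1;                 // key is smaller: search left
    }
    return -1;
}
```

On the example array above, searching for 23 returns index 5.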
5.2 Sorting
5.2.1 Bubble sort
• Bubble sort traverses from the left, comparing adjacent elements; the higher one is
placed on the right side.
• In this way, the largest element is moved to the rightmost end at first.
• This process is then continued to find the second largest and place it and so on until the
data is sorted.
Example
• We take an unsorted array.
• Bubble sort takes O(n²) time.
• Bubble sort starts with very first two elements, comparing them to check which one is
greater.
• We find that 27 is smaller than 33 and these two values must be swapped.
• Next we compare 33 and 35. We find that both are in already sorted positions.
• When no swap is required, bubble sort learns that the array is completely sorted.
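The procedure can be sketched as follows, including the early-exit check described above; the `bubbleSort` name and `std::vector` interface are illustrative assumptions.

```cpp
#include <utility>
#include <vector>

// Bubble sort: compare adjacent elements and swap when out of order; each
// pass bubbles the largest remaining element to the right end. When a full
// pass performs no swap, the array is already sorted and we stop early.
void bubbleSort(std::vector<int>& a) {
    for (std::size_t pass = 0; pass + 1 < a.size(); ++pass) {
        bool swapped = false;
        for (std::size_t i = 0; i + 1 < a.size() - pass; ++i)
            if (a[i] > a[i + 1]) {
                std::swap(a[i], a[i + 1]);
                swapped = true;
            }
        if (!swapped)
            break;                      // no swap required: completely sorted
    }
}
```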
5.2.2 Selection sort
• Selection sort is a simple sorting algorithm that works by repeatedly
selecting the smallest (or largest) element from the unsorted portion of the list and
moving it to the sorted portion of the list.
• The algorithm repeatedly selects the smallest (or largest) element from the unsorted
portion of the list and swaps it with the first element of the unsorted part.
• This process is repeated for the remaining unsorted portion until the entire list is sorted.
• Lets consider the following array as an example: arr[] = {64, 25, 12, 22, 11}
• First pass:
o For the first position in the sorted array, the whole array is traversed from index
0 to 4 sequentially. 64 is stored at the first position; after traversing the whole
array it is clear that 11 is the lowest value.
o Thus, swap 64 with 11. After one iteration, 11, which happens to be the least
value in the array, appears in the first position of the sorted list.
• Second Pass:
o For the second position, where 25 is present, again traverse the rest of the array
in a sequential manner.
o After traversing, we found that 12 is the second lowest value in the array and it
should appear at the second place in the array, thus swap these values.
• Third Pass:
o Now, for third place, where 25 is present again traverse the rest of the array and
find the third least value present in the array.
o While traversing, 22 came out to be the third least value and it should appear at
the third place in the array, thus swap 22 with element present at third position.
• Fourth pass:
o Similarly, for fourth position traverse the rest of the array and find the fourth
least element in the array
o As 25 is the 4th lowest value, it will be placed at the fourth position.
• Fifth Pass:
o At last, the largest value present in the array automatically gets placed at the last
position in the array.
o The resulting array is the sorted array.
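The passes above can be sketched in code; the `selectionSort` name and `std::vector` interface are illustrative assumptions.

```cpp
#include <utility>
#include <vector>

// Selection sort: on each pass, find the smallest element of the unsorted
// portion and swap it into the first unsorted position.
void selectionSort(std::vector<int>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t minIdx = i;
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[j] < a[minIdx])
                minIdx = j;             // remember the smallest seen so far
        std::swap(a[i], a[minIdx]);     // place it at position i
    }
}
```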
5.2.3 Insertion sort
Working of Insertion Sort Algorithm
To understand the working of the insertion sort algorithm, let's take an unsorted array.
It will be easier to understand insertion sort via an example.
• Let the elements of the array be
• Both 31 and 8 are not sorted.
• So, swap them.
• Now, the sorted array has three items that are 8, 12 and 25.
• Move to the next items that are 31 and 32.
• Swapping makes 31 and 17 unsorted.
• So, swap them too.
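The walkthrough above corresponds to the following sketch (a shifting variant rather than repeated swaps; the `insertionSort` name and `std::vector` interface are illustrative assumptions):

```cpp
#include <vector>

// Insertion sort: grow a sorted prefix one element at a time, shifting the
// larger sorted elements one place right to open the insertion point.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) {   // shift larger elements right
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;                         // insert into its sorted position
    }
}
```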
5.2.4 Shell sort
Algorithm
The simple steps of achieving the shell sort are listed as follows:
ShellSort(a, n) // 'a' is the given array, 'n' is the size of the array
    for (interval = n/2; interval > 0; interval /= 2)
        for (i = interval; i < n; i += 1)
            temp = a[i];
            for (j = i; j >= interval && a[j - interval] > temp; j -= interval)
                a[j] = a[j - interval];
            a[j] = temp;
End ShellSort
• We will use the original sequence of shell sort, i.e., N/2, N/4, ..., 1, as the intervals.
• In the first loop, n is equal to 8 (size of the array), so the elements are lying at the
interval of 4 (n/2 = 4).
• Elements will be compared and swapped if they are not in order.
• Here, in the first loop, the element at the 0th position will be compared with the element
at 4th position.
• If the 0th element is greater, it will be swapped with the element at 4th position,
Otherwise, it remains the same.
• This process will continue for the remaining elements.
• At the interval of 4, the sublists are {33, 12}, {31, 17}, {40, 25}, {8, 42}.
• After comparing, we have to swap them if required in the original array.
• After comparing and swapping, the updated array will look as follows
• In the second loop, elements are lying at the interval of 2 (n/4 = 2), where n = 8.
• Now, we are taking the interval of 2 to sort the rest of the array.
• With an interval of 2, two sublists will be generated - {12, 25, 33, 40}, and {17, 8, 31,
42}.
• In the third loop, elements are lying at the interval of 1 (n/8 = 1), where n = 8.
• At last, we use the interval of value 1 to sort the rest of the array elements.
• In this step, shell sort uses insertion sort to sort the array elements.
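A runnable version of the ShellSort pseudocode above might look like this; the `shellSort` name and `std::vector` interface are illustrative assumptions.

```cpp
#include <vector>

// Shell sort with the gap sequence n/2, n/4, ..., 1: each pass performs a
// gapped insertion sort, so by the final pass (gap 1) the array is nearly
// sorted and plain insertion sort finishes cheaply.
void shellSort(std::vector<int>& a) {
    int n = (int)a.size();
    for (int interval = n / 2; interval > 0; interval /= 2)
        for (int i = interval; i < n; ++i) {
            int temp = a[i];
            int j = i;
            for (; j >= interval && a[j - interval] > temp; j -= interval)
                a[j] = a[j - interval];     // shift within the gapped sublist
            a[j] = temp;
        }
}
```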
5.2.5 Radix sort
• Radix Sort is a linear sorting algorithm that sorts elements by processing them digit by
digit.
• It is an efficient sorting algorithm for integers or strings with fixed-size keys.
• Rather than comparing elements directly, Radix Sort distributes the elements into
buckets based on each digit’s value.
• By repeatedly sorting the elements by their significant digits, from the least significant
to the most significant, Radix Sort achieves the final sorted order.
• Example: To perform radix sort on the array [170, 45, 75, 90, 802, 24, 2, 66]
o Step 1: Find the largest element in the array, which is 802. It has three digits, so we
will iterate three times, once for each significant place.
o Step 2: Sort the elements based on the unit place digits (X=0). We use a stable
sorting technique, such as counting sort, to sort the digits at each significant place.
o Step 4: Sort the elements based on the hundreds place digits.
➢ Sorting based on the hundreds place:
o Perform counting sort on the array based on the hundreds place digits.
o The sorted array based on the hundreds place is [2, 24, 45, 66, 75, 90, 170,
802].
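The example above can be sketched as a least-significant-digit radix sort that uses counting sort as its stable subroutine; the `radixSort` name and `std::vector` interface are illustrative assumptions.

```cpp
#include <algorithm>
#include <vector>

// LSD radix sort for non-negative integers: apply a stable counting sort to
// each decimal digit, from the least significant place to the most.
void radixSort(std::vector<int>& a) {
    if (a.empty())
        return;
    int maxVal = *std::max_element(a.begin(), a.end());
    for (int exp = 1; maxVal / exp > 0; exp *= 10) {
        std::vector<int> output(a.size());
        int count[10] = {0};
        for (int x : a)
            ++count[(x / exp) % 10];        // histogram of the current digit
        for (int d = 1; d < 10; ++d)
            count[d] += count[d - 1];       // prefix sums give end positions
        for (int i = (int)a.size() - 1; i >= 0; --i)
            output[--count[(a[i] / exp) % 10]] = a[i];  // stable placement
        a = output;
    }
}
```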
5.3 Hashing
• The hash table ADT supports only a subset of the operations allowed by binary
search trees.
• The implementation of hash tables is frequently called hashing.
• Hashing is a technique used for performing insertions, deletions, and finds in constant
average time.
• The ideal hash table data structure is merely an array of some fixed size containing the
items.
• A search is performed on some part of the item that is called the key.
• We refer to the table size as TableSize.
• The common convention is to have the table run from 0 to TableSize – 1.
• Each key is mapped into some number in the range 0 to TableSize − 1 and placed in the
appropriate cell.
• The mapping is called a hash function, which ideally should be simple to compute and
should ensure that any two distinct keys get different cells.
• Since there are a finite number of cells and a virtually infinite supply of keys, this is
clearly impossible, and thus we seek a hash function that distributes the keys evenly
among the cells.
• In the above figure, john hashes to 3, phil hashes to 4, dave hashes to 6, and mary hashes
to 7.
5.3.1 Hash functions
• If the input keys are integers, then simply returning Key mod TableSize is generally a
reasonable strategy, unless Key happens to have some undesirable properties.
• In this case, the choice of hash function needs to be carefully considered. For instance,
if the table size is 10 and the keys all end in zero, then the standard hash function is a bad choice.
• When the input keys are random integers, then this function is not only very simple to
compute but also distributes the keys evenly.
• Usually, the keys are strings; in this case, the hash function needs to be chosen carefully.
• One option is to add up the ASCII values of the characters in the string.
• The routine for the hash function depicted below.
int hash( const string & key, int tableSize )
{
    int hashVal = 0;
    for( char ch : key )
        hashVal += ch;
    return hashVal % tableSize;
}
• However, if the table size is large, the function does not distribute the keys well.
• For instance, suppose that TableSize = 10,007 (10,007 is a prime number).
• Suppose all the keys are eight or fewer characters long.
• Since an ASCII character has an integer value that is always at most 127, the hash
function typically can only assume values between 0 and 1,016, which is 127 ∗ 8.
• Another hash function is shown below.
int hash( const string & key, int tableSize )
{
    return ( key[ 0 ] + 27 * key[ 1 ] + 729 * key[ 2 ] ) % tableSize;
}
• This hash function assumes that Key has at least three characters.
• The value 27 represents the number of letters in the English alphabet plus the blank,
and 729 is 27².
• This function examines only the first three characters, but if these are random and the
table size is 10,007, as before, then we would expect a reasonably equitable distribution.
Unfortunately, English is not random.
• Although there are 26³ = 17,576 possible combinations of three characters (ignoring
blanks), a check of a reasonably large online dictionary reveals that the number of
different combinations is actually only 2,851.
• Even if none of these combinations collide, only 28 percent of the table can actually be
hashed to. Thus this function, although easily computable, is also not appropriate if the
hash table is reasonably large.
• Another hash function is shown below.
unsigned int hash( const string & key, int tableSize )
{
    unsigned int hashVal = 0;
    for( char ch : key )
        hashVal = 37 * hashVal + ch;
    return hashVal % tableSize;
}
• This hash function involves all characters in the key and can generally be expected to
distribute well.
• It computes a polynomial function of 37 over the characters of the key and brings the
result into the proper range.
• The code computes a polynomial function (of 37) by use of Horner’s rule.
• For instance, another way of computing hk = k0 + 37k1 + 37²k2 is by the formula
hk = ((k2) ∗ 37 + k1) ∗ 37 + k0.
• Horner’s rule extends this to an nth degree polynomial.
• Hence, the conclusion is that in separate chaining, if two different elements have the
same hash value then we store both the elements in the same linked list one after the
other.
• Example: Let us consider a simple hash function as “key mod 7” and a sequence of
keys as 50, 700, 76, 85, 92, 73, 101
o Delete(k): If we simply delete a key, then the search may fail. So slots of deleted
keys are marked specially as “deleted”. The insert can insert an item in a deleted
slot, but the search doesn’t stop at a deleted slot.
Types of Open Addressing
o Linear Probing
o Quadratic Probing
o Double Hashing
Linear Probing
• In linear probing, the hash table is searched sequentially that starts from the original
location of the hash.
• If in case the location that we get is already occupied, then we check for the next
location.
• The function used for rehashing is as follows: rehash(key) = (n+1)%table-size.
• Let hash(x) be the slot index computed using a hash function and S be the table size
o If slot hash(x) % S is full, then we try (hash(x) + 1) % S
o If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
o If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
• Example
o Let us consider a simple hash function as “key mod 7” and a sequence of keys
as 50, 700, 76, 85, 92, 73, 101, which means hash(key)= key% S, here S=size
of the table =7,indexed from 0 to 6.
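The probing scheme above can be sketched as a small insert routine. This is a sketch under assumptions not in the notes: -1 marks an empty slot, the `insertLinear` name is illustrative, and the caller must not insert into a full table.

```cpp
#include <vector>

// Linear probing insert: if slot hash(x) % S is full, try (hash(x)+1) % S,
// then (hash(x)+2) % S, and so on, until an empty slot (-1) is found.
void insertLinear(std::vector<int>& table, int key) {
    int S = (int)table.size();
    int idx = key % S;              // hash(key) = key % S
    while (table[idx] != -1)        // collision: probe the next slot
        idx = (idx + 1) % S;
    table[idx] = key;
}
```

Inserting 50, 700, 76, 85, 92, 73, 101 into a size-7 table reproduces the example's final layout.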
Quadratic Probing
• The interval between probes increases quadratically with the probe number.
• In this method, we look for the i²-th slot in the ith iteration. We always start from the
original hash location. If the location is occupied, then we check the subsequent slots.
• Let hash(x) be the slot index computed using hash function.
o If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
o If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
o If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
• Example:
o Let us consider table size = 7, hash function Hash(x) = x % 7 and collision
resolution strategy f(i) = i². Insert 22, 30, and 50.
• Step 1: Create a table of size 7
• Step 3: Inserting 50
o Hash(50) = 50 % 7 = 1
o In our hash table slot 1 is already occupied. So, we will search for slot 1+1², i.e.
1+1 = 2.
o Again slot 2 is found occupied, so we will search for cell 1+2², i.e. 1+4 = 5.
o Now, cell 5 is not occupied so we will place 50 in slot 5.
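The probe sequence above can be sketched as follows (same assumptions as before: -1 marks an empty slot, and the `insertQuadratic` name is illustrative):

```cpp
#include <vector>

// Quadratic probing insert: on a collision at hash(x) % S, try
// (hash(x) + 1*1) % S, (hash(x) + 2*2) % S, and so on.
void insertQuadratic(std::vector<int>& table, int key) {
    int S = (int)table.size();
    for (int i = 0; i < S; ++i) {
        int idx = (key % S + i * i) % S;
        if (table[idx] == -1) {
            table[idx] = key;       // first free probe slot found
            return;
        }
    }
    // probe sequence exhausted: insertion fails (table too full)
}
```

Inserting 22, 30, 50 into a size-7 table places 50 in slot 5, matching the example.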
Double Hashing
• The intervals that lie between probes are computed by another hash function.
• Double hashing is a technique that reduces clustering in an optimized way.
• In this technique, the increments for the probing sequence are computed by using
another hash function.
• We use another hash function hash2(x) and look for the i*hash2(x) slot in the ith
rotation.
• let hash(x) be the slot index computed using hash function.
o If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
o If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
o If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
• Example:
o Insert the keys 27, 43, 692, 72 into the Hash Table of size 7. where first hash-
function is h1(k) = k mod 7 and second hash-function is h2(k) = 1 + (k mod 5)
• Step 1: Insert 27
o 27 % 7 = 6, location 6 is empty so insert 27 into 6 slot.
• Step 2: Insert 43
o 43 % 7 = 1, location 1 is empty so insert 43 into 1 slot.
• Step 4: Insert 72
o 72 % 7 = 2, but location 2 is already being occupied and this is a collision.
o So we need to resolve this collision using double hashing.
hnew = [h1(72) + i * h2(72)] % 7
     = [2 + 1 * (1 + 72 % 5)] % 7
     = 5 % 7
     = 5
o Now, as 5 is an empty slot, so we can insert 72 into 5th slot.
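The double-hashing scheme of this example can be sketched as follows (assumptions as before: -1 marks an empty slot, the `insertDouble` name is illustrative, and h2 is hard-coded to the example's 1 + (k % 5)):

```cpp
#include <vector>

// Double hashing insert with h1(k) = k % S and h2(k) = 1 + (k % 5), as in
// the example: on a collision, try (h1(k) + i*h2(k)) % S for i = 1, 2, ...
void insertDouble(std::vector<int>& table, int key) {
    int S = (int)table.size();
    int h1 = key % S;
    int h2 = 1 + (key % 5);         // second hash gives the probe step
    for (int i = 0; i < S; ++i) {
        int idx = (h1 + i * h2) % S;
        if (table[idx] == -1) {     // -1 marks an empty slot
            table[idx] = key;
            return;
        }
    }
}
```

Inserting 27, 43, 692, 72 into a size-7 table places 72 in slot 5, matching the example.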
5.3.4 Rehashing
• If the table gets too full, the running time for the operations will start taking too long,
and insertions might fail for open addressing hashing with quadratic resolution.
• This can happen if there are too many removals intermixed with insertions.
• A solution, then, is to build another table that is about twice as big (with an associated
new hash function) and scan down the entire original hash table, computing the new
hash value for each (nondeleted) element and inserting it in the new table.
• As an example, suppose the elements 13, 15, 24, and 6 are inserted into a linear probing
hash table of size 7. The hash function is h(x) = x mod 7.
• The resulting hash table appears in the below Figure.
• If 23 is inserted into the table, the resulting table is shown below, it will be over 70
percent full.
• In particular, there must have been N/2 insertions prior to the last rehash, so it
essentially adds a constant cost to each insertion. This is why the new table is made
twice as large as the old table.
• Rehashing can be implemented in several ways with quadratic probing.
o One alternative is to rehash as soon as the table is half full.
o The other extreme is to rehash only when an insertion fails.
o A third, middle-of-the-road strategy is to rehash when the table reaches a certain
load factor.
void rehash( ) // Rehashing for quadratic probing hash table
{
    vector<HashEntry> oldArray = array;

    // Create new double-sized, empty table
    array.resize( nextPrime( 2 * oldArray.size( ) ) );
    for( auto & entry : array )
        entry.info = EMPTY;

    // Copy table over
    currentSize = 0;
    for( auto & entry : oldArray )
        if( entry.info == ACTIVE )
            insert( std::move( entry.element ) );
}
void rehash( ) // Rehashing for separate chaining hash table
{
    vector<list<HashedObj>> oldLists = theLists;

    // Create new double-sized, empty table
    theLists.resize( nextPrime( 2 * theLists.size( ) ) );
    for( auto & thisList : theLists )
        thisList.clear( );

    // Copy table over
    currentSize = 0;
    for( auto & thisList : oldLists )
        for( auto & x : thisList )
            insert( std::move( x ) );
}
5.3.5 Extendible Hashing
• The main consideration then is the number of disk accesses required to retrieve data.
• As before, we assume that at any point we have N records to store; the value of N
changes over time.
• Furthermore, at most M records fit in one disk block. We will use M = 4 in this section.
• If either probing hashing or separate chaining hashing is used, the major problem is that
collisions could cause several blocks to be examined during a search, even for a well-
distributed hash table.
• Furthermore, when the table gets too full, an extremely expensive rehashing step must
be performed, which requires O(N) disk accesses.
• A clever alternative, known as extendible hashing, allows a search to be performed in
two disk accesses.
• Insertions also require few disk accesses.
• A B-tree has depth O(log_{M/2} N). As M increases, the depth of the B-tree decreases. We
could choose M to be so large that the depth of the B-tree would be 1.
• Then any search after the first would take one disk access, since, presumably, the root
node could be stored in main memory.
• The problem with this strategy is that the branching factor is so high that it would take
considerable processing to determine which leaf the data was in.
• If the time to perform this step could be reduced, then we would have a practical
scheme. This is exactly the strategy used by extendible hashing.
• Let us suppose, for the moment, that our data consists of several 6-bit integers. Then
the extendible hashing is shown below
• The root of the “tree” contains four pointers determined by the leading two bits of the
data.
• Each leaf has up to M = 4 elements.
• It happens that in each leaf the first two bits are identical; this is indicated by the number
in parentheses.
• D will represent the number of bits used by the root, which is sometimes known as the
directory.
• The number of entries in the directory is thus 2^D.
• dL is the number of leading bits that all the elements of some leaf L have in common.
• dL will depend on the particular leaf, and dL ≤ D.
• Suppose that we want to insert the key 100100. This would go into the third leaf, but as
the third leaf is already full, there is no room.
• We thus split this leaf into two leaves, which are now determined by the first three bits.
• This requires increasing the directory size to 3.
• These changes are reflected in the figure below.
• Notice that all the leaves not involved in the split are now pointed to by two adjacent
directory entries.
• Thus, although an entire directory is rewritten, none of the other leaves is actually
accessed.
• If the key 000000 is now inserted, then the first leaf is split, generating two leaves with
dL = 3.
• Since D = 3, the only change required in the directory is the updating of the 000 and
001 pointers.
• This very simple strategy provides quick access times for insert and search operations
on large databases.
• There are a few important details we have not considered.
o First, it is possible that several directory splits will be required if the elements in a
leaf agree in more than D + 1 leading bits.
o For instance, starting at the original example, with D = 2, if 111010, 111011, and
finally 111100 are inserted, the directory size must be increased to 4 to distinguish
between the five keys.
o Second, there is the possibility of duplicate keys; if there are more than M
duplicates, then this algorithm does not work at all. In this case, some other
arrangements need to be made.
o These possibilities suggest that it is important for the bits to be fairly random. This
can be accomplished by hashing the keys into a reasonably long integer—hence
the name.
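The insertion and splitting behaviour described above can be sketched in Python. This is a minimal illustrative sketch, not the textbook's code: the class and variable names are my own, and a real implementation would keep the leaves on disk. It uses the same parameters as the running example, 6-bit keys and a leaf capacity of M = 4.

```python
BITS = 6   # keys are 6-bit integers, as in the running example
M = 4      # maximum number of elements per leaf

class Leaf:
    def __init__(self, d):
        self.d = d       # dL: leading bits shared by all keys in this leaf
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        leaf = Leaf(0)
        self.D = 1                   # directory is indexed by the D leading bits
        self.dir = [leaf, leaf]      # 2^D entries

    def _index(self, key):
        return key >> (BITS - self.D)     # leading D bits of the key

    def insert(self, key):
        leaf = self.dir[self._index(key)]
        if len(leaf.keys) < M:
            leaf.keys.append(key)
            return
        if leaf.d == self.D:         # full leaf with dL = D: double the directory
            self.dir = [p for p in self.dir for _ in (0, 1)]
            self.D += 1
        # split the full leaf on one more leading bit
        d = leaf.d + 1
        zero, one = Leaf(d), Leaf(d)
        for i in range(len(self.dir)):
            if self.dir[i] is leaf:       # repoint only the affected entries
                bit = (i >> (self.D - d)) & 1
                self.dir[i] = one if bit else zero
        for k in leaf.keys:               # redistribute the old keys
            self.dir[self._index(k)].keys.append(k)
        self.insert(key)   # may split again if the keys agree in more leading bits

    def contains(self, key):
        return key in self.dir[self._index(key)].keys
```

Note how a split repoints only the directory entries that referenced the overflowing leaf, matching the observation above that none of the other leaves is accessed, and how the recursive call handles keys that agree in many leading bits by forcing several directory doublings in a row.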
• Some of the performance properties of extendible hashing are
o The expected number of leaves is (N/M) log₂ e.
o Thus the average leaf is ln 2 ≈ 0.69 full.
o This is the same as for B-trees, which is not entirely surprising, since for both
data structures new nodes are created when the (M + 1)th entry is added.
o The more surprising result is that the expected size of the directory (in other
words, 2^D) is O(N^(1+1/M)/M).
o If M is very small, then the directory can get unduly large. In this case, we can
have the leaves contain pointers to the records instead of the actual records, thus
increasing the value of M.
o This adds a second disk access to each search operation in order to maintain a
smaller directory. If the directory is too large to fit in main memory, the second
disk access would be needed anyway.
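As a small sanity check (my own illustration, not from the text), the two figures above are consistent with each other: spreading N keys across (N/M) log₂ e leaves of capacity M gives an average fullness of exactly 1/log₂ e = ln 2.

```python
import math

# With (N/M) * log2(e) expected leaves, the average fullness is
#   N / (leaves * M) = 1 / log2(e) = ln 2
expected_fullness = 1 / math.log2(math.e)
print(round(expected_fullness, 2))  # 0.69
```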
UNIT-V -END
MODEL QUESTION PAPERS
(For the Candidates admitted from 2024-2025 onwards)
B.Sc., Degree Examinations
DATA STRUCTURES AND ALGORITHMS
Time: 3 Hrs Maximum: 75 Marks
SECTION – A (15 x 1 = 15 Marks)
Answer ALL questions
1. The smallest element of an array's index is called its __________
a) Lower bound b) Upper bound c) Range d) Extraction
2. Stack follows the strategy of __________
a) LIFO b) FIFO c) LRU d) RANDOM
3. A queue is a __________
a) FIFO b) LIFO c) FILO d) LOFI
4. Deletion operation is done using __________ in a queue.
a) Front b) Rear c) Top d) List
5. LLINK is the pointer pointing to the__________
a) Successor node b) Predecessor node c) Head node d) Last node
6. Value of first linked list index is __________
a) 0 b)1 c) -1 d) 2
7. A graph is said to be __________ if its edges are assigned data.
a) Labeled b) Tagged c) Marked d) Sticked
8. Dijkstra algorithm is also called the __________ shortest path problem.
a) Single source b) Multi source c) Single destination d) Multi destination
9. The post-order traversal of the binary tree is DEBFCA. Find out the pre-order traversal.
a) ABDECF b) ABFCDE c) ADBFEC d) ABDCEF
10. In threaded binary tree __________ points to higher nodes in the tree.
a) Thread b) Root c) Info d) Child
11. Two main measures for the efficiency of an algorithm are
a) Processor and memory b) Complexity and capacity
c) Time and space d) Data and space
12. The complexity of Binary search algorithm is
a) O(n) b) O(log n) c) O(n²) d) O(n log n)
13. Sorting algorithm can be characterized as __________
a) Simple algorithms which require on the order of n² comparisons to sort n items.
b) Sophisticated algorithms that require O(n log₂ n) comparisons to sort n items.
c) Both of the above d) None of the above
14. Searching techniques are classified into __________ types
a) 2 b) 3 c) 4 d) None
15. In a binary search tree, a __________ rooted at node n is the tree formed by imagining
node n as the root.
a) Subtree b) Cycle c) Node d) Root
SECTION – B (2 x 5 = 10 Marks)
Answer any TWO questions
16. Explain briefly about array representation.
17. Explain about singly linked list.
18. List and explain the terminology of tree.
19. Explain the method of K way merging.
20. Describe about file organization.
SECTION – C (5 x 10 = 50 Marks)
Answer ALL questions
21. a) Explain the stack and queue implementation in detail. (OR)
b) Explain about infix and postfix conversion.
22. a) Describe about garbage collection and compaction. (OR)
b) What is polynomial? Explain about polynomial addition.
23. a) What is tree? Explain its traversal algorithm. (OR)
b) Explain about connected components.
24. a) Explain about hash table. (OR)
b) Describe about sorting with disk.
25. a) Write about the heap sort. (OR)
b) Explain about Indexing techniques.