Unit I-V
Abstract Data Types (ADTs) – List ADT – array-based implementation – linked list implementation –
singly linked lists – circularly linked lists – doubly linked lists – applications of lists – Polynomial
Manipulation – All operations (Insertion, Deletion, Merge, Traversal).
___________________________________________________________________________________
ABSTRACT DATA TYPES (ADTS)
Abstract Data Type (ADT) is a type (or class) for objects whose behaviour is defined by a set of
values and a set of operations.
The definition of ADT only mentions what operations are to be performed but not how these
operations will be implemented. It does not specify how data will be organized in memory and what
algorithms will be used for implementing the operations. It is called “abstract” because it gives an
implementation independent view. The process of providing only the essentials and hiding the details is
known as abstraction.
* An abstract data type is a type with associated operations, but whose representation is hidden.
* Objects such as lists, sets, and graphs, along with their operations, can be viewed as abstract data
types.
* The basic idea is that the implementation of these operations is written once in the program, and
any other part of the program that needs to perform an operation on the ADT can do so by
calling the appropriate function.
* If for some reason implementation details need to change, it should be easy to do so by merely
changing the routines that perform the ADT operations. This change, in a perfect world, would be
completely transparent to the rest of the program.
The user of a data type need not know how that data type is implemented. For example, we have been
using the int, float, and char data types knowing only the values they can take and the operations that
can be performed on them, without any idea of how these types are implemented. So a user only needs
to know what a data type can do, not how it does it. We can think of an ADT as a black box which hides
the inner structure and design of the data type. We will now define three ADTs, namely the List ADT,
Stack ADT, and Queue ADT.
Some common ADTs, which have proved useful in a great variety of applications, are
Linear data structures, in which insertion and deletion are possible in a linear fashion. Examples:
arrays, linked lists, stacks, queues.
Non-linear data structures, in which this is not possible. Examples: trees, graphs.
LIST ADT
A list or sequence is an abstract data type that represents a countable number of ordered values,
where the same value may occur more than once. An instance of a list is a computer representation of
the mathematical concept of a finite sequence; the (potentially) infinite analog of a list is a stream.
Lists are a basic example of containers, as they contain other values. If the same value occurs multiple
times, each occurrence is considered a distinct item.
Operations
A list contains elements of the same type arranged in sequential order, and the following operations
can be performed on the list.
get() – Return an element from the list at any given position.
insert() – Insert an element at any position of the list.
remove() – Remove the first occurrence of any element from a non-empty list.
removeAt() – Remove the element at a specified location from a non-empty list.
replace() – Replace an element at any position by another element.
size() – Return the number of elements in the list.
isEmpty() – Return true if the list is empty, otherwise return false.
isFull() – Return true if the list is full, otherwise return false.
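Collected together, these operations might be sketched as a C interface over a fixed-capacity array. This is a minimal sketch, not a standard library: the names, the MAX_SIZE capacity, and the struct layout below are all illustrative.

```c
#include <stdbool.h>

#define MAX_SIZE 100   /* illustrative capacity for an array-backed list */

/* A sketch of the List ADT; the representation is hidden behind these functions. */
struct List {
    int items[MAX_SIZE];
    int count;
};

int  list_get(const struct List *l, int pos)  { return l->items[pos]; }
int  list_size(const struct List *l)          { return l->count; }
bool list_is_empty(const struct List *l)      { return l->count == 0; }
bool list_is_full(const struct List *l)       { return l->count == MAX_SIZE; }

void list_insert(struct List *l, int pos, int value)
{
    /* shift elements right to open a slot at pos */
    for (int i = l->count; i > pos; i--)
        l->items[i] = l->items[i - 1];
    l->items[pos] = value;
    l->count++;
}

void list_remove_at(struct List *l, int pos)
{
    /* shift elements left over the removed slot */
    for (int i = pos + 1; i < l->count; i++)
        l->items[i - 1] = l->items[i];
    l->count--;
}
```

Because callers use only these functions, the array representation could later be swapped for a linked list without changing the rest of the program, which is the point of the ADT.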
AN ARRAY-BASED IMPLEMENTATION
An array is a data structure which stores a fixed-size sequential collection of elements of the
same type.
An array is used to store a collection of data, but it is often more useful to think of an array as a
collection of variables of the same type.
Instead of declaring individual variables, such as number0, number1, ..., and number99, you
declare one array variable such as numbers and use numbers[0], numbers[1], and ...,
numbers[99] to represent individual variables.
A specific element in an array is accessed by an index.
All arrays consist of contiguous memory locations. The lowest address corresponds to the first
element and the highest address to the last element.
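For instance, the hundred-variable example above reduces to a single declaration with index-based access. This small sketch is illustrative only:

```c
/* One array variable replaces number0 .. number99. */
int numbers[100];

/* Element i is accessed by its index; the storage is contiguous,
   so numbers[0] has the lowest address and numbers[99] the highest. */
void fill(void)
{
    for (int i = 0; i < 100; i++)
        numbers[i] = i * 2;
}
```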
Operations:
IsEmpty(LIST)
If (Current Size == 0) "LIST is Empty"
else "LIST is not Empty"
IsFull(LIST)
If (Current Size == Max Size) "LIST is FULL"
else "LIST is not FULL"
Insert Element at End of the LIST
#include <stdio.h>
#include <stdlib.h>
#define MAX 20
void create();
void insert();
void deletion();
void search();
void display();
int a, b[MAX], n, p, e, f, i, pos, ava = 0;
void main()
{
//clrscr();
int ch;
char g='y';
do
{
printf("\n main Menu");
printf("\n 1.Create \n 2.Delete \n 3.Search \n 4.Insert \n 5.Display\n 6.Exit \n");
printf("\n Enter your Choice");
scanf("%d", &ch);
switch(ch)
{
case 1:
create();
break;
case 2:
deletion();
break;
case 3:
search();
break;
case 4:
insert();
break;
case 5:
display();
break;
case 6:
exit(0);
break;
default:
printf("\n Enter the correct choice:");
}
printf("\n Do u want to continue:::");
scanf("\n%c", &g);
}
while(g=='y'||g=='Y');
}
void create()
{
printf("\n Enter the number of nodes");
scanf("%d", &n);
for(i=0;i<n;i++)
{
printf("\n Enter Element %d: ", i+1);
scanf("%d", &b[i]);
}
}
void deletion()
{
printf("\n Enter the position u want to delete::");
scanf("%d", &pos);
if(pos>=n || pos<0)
{
printf("\n Invalid Location::");
}
else
{
for(i=pos+1;i<n;i++)
{
b[i-1]=b[i];
}
n--;
}
printf("\n The Elements after deletion");
for(i=0;i<n;i++)
{
printf("\t%d", b[i]);
}
}
void search()
{
printf("\n Enter the Element to be searched:");
scanf("%d", &e);
for(i=0;i<n;i++)
{
if(b[i]==e)
{
ava=1;
}
}
if(ava==1)
{
6
printf("Value %d is in the list", e);
ava=0;
}
else
printf("Value %d is not in the list", e);
}
void insert()
{
printf("\n Enter the position u need to insert::");
scanf("%d", &pos);
if(pos>n || pos<0)
{
printf("\n invalid Location::");
}
else
{
for(i=n-1;i>=pos;i--)
{
b[i+1]=b[i];
}
printf("\n Enter the element to insert::\n");
scanf("%d",&p);
b[pos]=p;
n++;
}
printf("\n The list after insertion::\n");
display();
}
void display()
{
printf("\n The Elements of The list ADT are:");
for(i=0;i<n;i++)
{
printf("\n\n%d", b[i]);
}
}
If we want to insert a new ID 1005, then to maintain the sorted order, we have to move all the
elements after 1000 (excluding 1000).
Deletion is also expensive with arrays unless some special techniques are used. For example, to
delete 1010 in id[], everything after 1010 has to be moved.
Drawbacks:
1) Random access is not allowed. We have to access elements sequentially starting from the first node.
So we cannot do binary search with linked lists.
2) Extra memory space for a pointer is required with each element of the list.
Representation in C:
A linked list is represented by a pointer to the first node of the linked list. The first node is called the
head. If the linked list is empty, then the value of head is NULL.
DOUBLY LINKED LIST
Type declaration for linked list:
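The type declarations (shown as figures in the original) can be sketched as follows; the member names are conventional rather than prescribed:

```c
/* Node of a singly linked list: data plus a link to the next node. */
struct SNode {
    int data;
    struct SNode *next;
};

/* Node of a doubly linked list: links to both the previous and next nodes. */
struct DNode {
    int data;
    struct DNode *prev;
    struct DNode *next;
};
```

In a doubly linked list the extra prev pointer allows traversal in both directions, at the cost of one more pointer per node.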
APPLICATIONS OF LIST
Lists can be used to store a list of elements. However, unlike in traditional arrays, lists can expand and
shrink, and are stored dynamically in memory.
In computing, lists are easier to implement than sets. A finite set in the mathematical sense can be
realized as a list with additional restrictions: duplicate elements are disallowed and order is
irrelevant. Sorting the list speeds up determining whether a given item is already in the set, but
maintaining that order makes adding a new entry slower. In efficient implementations, however, sets
are implemented using self-balancing binary search trees or hash tables rather than lists.
Lists also form the basis for other abstract data types including the queue, the stack, and their variations.
POLYNOMIAL MANIPULATION
What is Polynomial ?
A polynomial is a mathematical expression consisting of a sum of terms, each term including a variable
or variables raised to a power and multiplied by a coefficient. The simplest polynomials have one
variable.
Representation of a Polynomial: A polynomial is an expression consisting of one or more terms. A
term is made up of a coefficient and an exponent. An example of a polynomial is
P(x) = 4x^3 + 6x^2 + 7x + 9
A polynomial thus may be represented using arrays or linked lists. Array representation assumes that
the exponents of the given expression are arranged from 0 to the highest value (degree), which is
represented by the subscript of the array beginning with 0. The coefficients of the respective exponent
are placed at an appropriate index in the array. The array representation for the above polynomial
expression is given below:
A polynomial may also be represented using a linked list. A structure may be defined such that it
contains two parts- one is the coefficient and second is the corresponding exponent. The structure
definition may be given as shown below:
struct polynomial
{
int coefficient;
int exponent;
struct polynomial *next;
};
Thus the above polynomial may be represented using linked list as shown below:
Adding two polynomials using arrays is straightforward, since both arrays may be added
element-wise from index 0 to n-1, giving the sum of the two polynomials. Addition of two
polynomials using linked lists requires comparing the exponents: wherever the exponents are found
to be the same, the coefficients are added up. A term whose exponent appears in only one polynomial
is simply copied into the result as it is. The complete program to add two
polynomials is given in a subsequent section.
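The comparison-based addition just described can be sketched as follows. This is an illustrative sketch, assuming terms are stored in decreasing order of exponent; the helper term() is hypothetical, not part of the text's program.

```c
#include <stdlib.h>

struct polynomial {
    int coefficient;
    int exponent;
    struct polynomial *next;
};

/* Illustrative helper: allocate one term and link it to the rest of a list. */
static struct polynomial *term(int coeff, int exp, struct polynomial *next)
{
    struct polynomial *t = malloc(sizeof *t);
    t->coefficient = coeff;
    t->exponent = exp;
    t->next = next;
    return t;
}

/* Add two polynomials whose terms are in decreasing order of exponent. */
struct polynomial *poly_add(struct polynomial *a, struct polynomial *b)
{
    if (a == NULL && b == NULL)
        return NULL;
    if (a == NULL)                       /* copy the rest of b */
        return term(b->coefficient, b->exponent, poly_add(NULL, b->next));
    if (b == NULL)                       /* copy the rest of a */
        return term(a->coefficient, a->exponent, poly_add(a->next, NULL));
    if (a->exponent == b->exponent)      /* same exponent: add coefficients */
        return term(a->coefficient + b->coefficient, a->exponent,
                    poly_add(a->next, b->next));
    if (a->exponent > b->exponent)       /* unmatched term copied as-is */
        return term(a->coefficient, a->exponent, poly_add(a->next, b));
    return term(b->coefficient, b->exponent, poly_add(a, b->next));
}
```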
Multiplication of two polynomials however requires manipulation of each node such that the exponents
are added up and the coefficients are multiplied. After each term of first polynomial is operated upon
with each term of the second polynomial, then the result has to be added up by comparing the exponents
and adding the coefficients for similar exponents and including terms as such with dissimilar exponents
in the result.
Deletion
Delete Element from nth Position of the LIST
3. If (nth Position < Current Size)
1. Move all the Elements one position backward, i.e. move only the Elements from
position n+1 to the Current Size position: New Position = Current Position - 1.
2. After the previous step, the nth element will be deleted automatically.
3. Decrease the Current Size by 1, i.e. Current Size = Current Size - 1.
Input Format
You have to complete the Node* MergeLists(Node* headA, Node* headB) method which takes two
arguments - the heads of the two sorted linked lists to merge. You should NOT read any input from
stdin/console.
Output Format
Change the next pointer of individual nodes so that nodes from both lists are merged into a single list.
Then return the head of this merged list. Do NOT print anything to stdout/console.
Sample Input
15 -> NULL
12 -> NULL
NULL
1 -> 2 -> NULL
Sample Output
12 -> 15 -> NULL
1 -> 2 -> NULL
Procedure:
Node* MergeLists(Node* headA, Node* headB)
{
if (headA == NULL) return headB;
if (headB == NULL) return headA;
if (headA->data <= headB->data) {
headA->next = MergeLists(headA->next, headB);
}
else {
Node* temp = headB;
headB = headB->next;
temp->next = headA;
headA = temp;
headA->next = MergeLists(headA->next, headB);
}
return headA;
}
Traversal
Assume that we have a list with some nodes. Traversal is a very basic operation that appears as a
part of almost every operation on a singly linked list. For instance, an algorithm may traverse a
singly linked list to find a value, to find a position for insertion, and so on. For a singly linked list,
only forward traversal is possible.
Traversal algorithm
Example
A simple C program for traversal of a linked list
#include <stdio.h>
#include <stdlib.h>
struct Node
{
int data;
struct Node *next;
};
/* Traverse the list from the given node to the end, printing each value. */
void printList(struct Node *n)
{
while (n != NULL)
{
printf("%d ", n->data);
n = n->next;
}
}
int main()
{
struct Node* head = (struct Node*)malloc(sizeof(struct Node));
struct Node* second = (struct Node*)malloc(sizeof(struct Node));
struct Node* third = (struct Node*)malloc(sizeof(struct Node));
/* link three nodes with sample data 1 -> 2 -> 3 */
head->data = 1; head->next = second;
second->data = 2; second->next = third;
third->data = 3; third->next = NULL;
printList(head);
return 0;
}
UNIT II
LINEAR DATA STRUCTURES – STACKS, QUEUES
In a stack, the insertion operation is performed using a function called "push" and deletion operation is
performed using a function called "pop".
In the figure, PUSH and POP operations are performed at top position in the stack. That means, both the
insertion and deletion operations are performed at one end (i.e., at Top)
Example
If we want to create a stack by inserting 10,45,12,16,35 and 50. Then 10 becomes the bottom most
element and 50 is the top most element. Top is at 50 as shown in the image below...
OPERATIONS
Stack data structure can be implemented in two ways. They are as follows...
1. Using Array
2. Using Linked List
When a stack is implemented using an array, it can organize only a limited number of elements. When
a stack is implemented using a linked list, it can organize an unlimited number of elements.
Before implementing actual operations, first follow the below steps to create an empty stack.
Step 1: Include all the header files which are used in the program and define a constant 'SIZE'
with specific value.
Step 2: Declare all the functions used in stack implementation.
Step 3: Create a one dimensional array with fixed size (int stack[SIZE])
Step 4: Define an integer variable 'top' and initialize it with '-1'. (int top = -1)
Step 5: In main method display menu with list of operations and make suitable function calls to
perform operation selected by the user on the stack.
In a stack, push() is a function used to insert an element into the stack. In a stack, the new element is
always inserted at top position. Push function takes one integer value as parameter and inserts that value
into the stack. We can use the following steps to push an element on to the stack...
In a stack, pop() is a function used to delete an element from the stack. In a stack, the element is always
deleted from top position. Pop function does not take any value as parameter. We can use the following
steps to pop an element from the stack...
#include <stdio.h>
int stack[100], top = -1, n = 100, x, i;
void push(); void pop(); void display();
int main()
{
int choice;
do
{
printf("\n 1.Push \n 2.Pop \n 3.Display \n 4.Exit \n Enter your choice: ");
scanf("%d", &choice);
switch(choice)
{
case 1: push(); break;
case 2: pop(); break;
case 3: display(); break;
}
}
while(choice!=4);
return 0;
}
void push()
{
if(top>=n-1)
{
printf("\n\tSTACK is over flow");
}
else
{
printf(" Enter a value to be pushed:");
scanf("%d",&x);
top++;
stack[top]=x;
}
}
void pop()
{
if(top<=-1)
{
printf("\n\t Stack is under flow");
}
else
{
printf("\n\t The popped elements is %d",stack[top]);
top--;
}
}
void display()
{
if(top>=0)
{
printf("\n The elements in STACK \n");
for(i=top; i>=0; i--)
printf("\n%d",stack[i]);
printf("\n Press Next Choice");
}
else
{
printf("\n The STACK is empty");
}
}
The major problem with a stack implemented using an array is that it works only for a fixed number of
data values. That means the amount of data must be specified at the beginning of the implementation
itself. A stack implemented using an array is not suitable when we don't know the size of the data we
are going to use. A stack data structure can instead be implemented using a linked list. The stack
implemented using a linked list can work for an unlimited number of values and for variable-sized
data, so there is no need to fix the size at the beginning of the implementation. The stack implemented
using a linked list can organize as many data values as we want.
In the linked list implementation of a stack, every new element is inserted as the 'top' element. That
means every newly inserted element is pointed to by 'top'. Whenever we want to remove an element
from the stack, we simply remove the node pointed to by 'top' and move 'top' to the next node in the
list. The next field of the last node (the first element inserted) must always be NULL.
Example
In the above example, the last inserted node is 99 and the first inserted node is 25. The order of
elements inserted is 25, 32, 50 and 99.
Operations
To implement stack using linked list, we need to set the following things before implementing actual
operations.
Step 1: Include all the header files which are used in the program. And declare all the user
defined functions.
Step 2: Define a 'Node' structure with two members data and next.
Step 3: Define a Node pointer 'top' and set it to NULL.
Step 4: Implement the main method by displaying Menu with list of operations and make
suitable function calls in the main method.
We can use the following steps to insert a new node into the stack...
We can use the following steps to delete a node from the stack...
We can use the following steps to display the elements (nodes) of a stack...
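The insertion, deletion, and display steps referred to above (the step lists appear as figures in the original) can be sketched as follows; the global 'top' pointer follows the description, and the exact function names are illustrative:

```c
#include <stdio.h>
#include <stdlib.h>

struct Node {
    int data;
    struct Node *next;
};

struct Node *top = NULL;   /* stack is empty when top == NULL */

/* Insert a new node as the new top of the stack. */
void push(int value)
{
    struct Node *newNode = malloc(sizeof *newNode);
    newNode->data = value;
    newNode->next = top;
    top = newNode;
}

int isEmpty(void) { return top == NULL; }

/* Remove the top node and return its value; caller must check isEmpty first. */
int pop(void)
{
    struct Node *temp = top;
    int value = temp->data;
    top = top->next;
    free(temp);
    return value;
}

/* Display the nodes from top to bottom. */
void display(void)
{
    for (struct Node *p = top; p != NULL; p = p->next)
        printf("%d ", p->data);
    printf("\n");
}
```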
The simplest application of a stack is to reverse a word. You push a given word to stack - letter
by letter - and then pop letters from the stack.
Another application is an "undo" mechanism in text editors; this operation is accomplished by
keeping all text changes in a stack.
Backtracking. This is a process when you need to access the most recent data element in a
series of elements. Think of a labyrinth or maze - how do you find a way from an entrance to an
exit?
Once you reach a dead end, you must backtrack. But backtrack to where? To the previous choice
point. Therefore, at each choice point you store on a stack all possible choices. Backtracking then
simply means popping the next choice from the stack.
In depth-first search we go down a path until we get to a dead end; then we backtrack or back up (by
popping a stack) to get an alternative path.
o Create a stack
o Create a new choice point
o Push the choice point onto the stack
o while (not found and stack is not empty)
Pop the stack
Find all possible choices after the last one tried
Push these choices onto the stack
o Return
Expression evaluation
Evaluate an expression represented by a string. The expression can contain parentheses; you can
assume the parentheses are well-matched. For simplicity, assume the only binary operations
allowed are +, -, *, and /. Arithmetic expressions can be written in one of three forms:
Infix Notation: Operators are written between the operands they operate on, e.g. 3 + 4.
This process uses a stack as well. We have to hold information that is expressed inside parentheses
while scanning to find the closing ')'. We also have to hold lower-precedence operations on the
stack. The algorithm is:
This algorithm doesn't handle errors in the input, although careful analysis of parentheses, or the
lack of them, could support such error detection.
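One common way to realize this process is with two stacks, one for operands and one for operators. The sketch below is illustrative, assuming well-formed input with integer operands; the fixed stack sizes and the function name evaluate are assumptions, not part of the text.

```c
#include <ctype.h>

static int prec(char op) { return (op == '+' || op == '-') ? 1 : 2; }

static int apply(int a, int b, char op)
{
    switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;
    }
}

/* Evaluate a well-formed infix expression with + - * / and parentheses. */
int evaluate(const char *s)
{
    int vals[100], vtop = -1;      /* operand stack  */
    char ops[100]; int otop = -1;  /* operator stack */

    for (int i = 0; s[i]; i++) {
        if (isspace((unsigned char)s[i]))
            continue;
        if (isdigit((unsigned char)s[i])) {
            int num = 0;
            while (isdigit((unsigned char)s[i]))
                num = num * 10 + (s[i++] - '0');
            i--;
            vals[++vtop] = num;
        } else if (s[i] == '(') {
            ops[++otop] = '(';
        } else if (s[i] == ')') {
            while (ops[otop] != '(') {   /* unwind until the matching '(' */
                int b = vals[vtop--], a = vals[vtop--];
                vals[++vtop] = apply(a, b, ops[otop--]);
            }
            otop--;                      /* discard the '(' */
        } else {                         /* an operator */
            while (otop >= 0 && ops[otop] != '(' &&
                   prec(ops[otop]) >= prec(s[i])) {
                int b = vals[vtop--], a = vals[vtop--];
                vals[++vtop] = apply(a, b, ops[otop--]);
            }
            ops[++otop] = s[i];
        }
    }
    while (otop >= 0) {                  /* apply any remaining operators */
        int b = vals[vtop--], a = vals[vtop--];
        vals[++vtop] = apply(a, b, ops[otop--]);
    }
    return vals[vtop];
}
```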
QUEUE ADT
Queue is a linear data structure in which the insertion and deletion operations are performed at two
different ends. In a queue data structure, adding and removing elements are performed at two
different positions: the insertion is performed at one end and the deletion at the other end. The
insertion operation is performed at a position known as 'rear' and the deletion operation at a position
known as 'front'. In a queue data structure, the insertion and deletion operations are performed based
on the FIFO (First In First Out) principle.
In a queue data structure, the insertion operation is performed using a function called "enQueue()" and
deletion operation is performed using a function called "deQueue()".
Example
OPERATIONS
Queue data structure can be implemented in two ways. They are as follows...
1. Using Array
2. Using Linked List
When a queue is implemented using an array, it can organize only a limited number of elements.
When a queue is implemented using a linked list, it can organize an unlimited number of elements.
In a queue data structure, deQueue() is a function used to delete an element from the queue. In a queue,
the element is always deleted from front position. The deQueue() function does not take any value as
parameter. We can use the following steps to delete an element from the queue...
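The insertion and deletion operations for the array implementation can be sketched as follows (the step lists appear as figures in the original); SIZE and the reset-to-(-1) convention below are illustrative assumptions:

```c
#define SIZE 10

int queue[SIZE];
int front = -1, rear = -1;   /* both -1 while the queue is empty */

int isEmpty(void) { return front == -1; }

/* enQueue: insert at rear; the queue is full once rear reaches SIZE - 1. */
int enQueue(int value)
{
    if (rear == SIZE - 1)
        return 0;            /* full: insertion is not possible */
    if (front == -1)
        front = 0;           /* first element: front starts at 0 */
    queue[++rear] = value;
    return 1;
}

/* deQueue: delete from front; assumes the queue is not empty. */
int deQueue(void)
{
    int value = queue[front++];
    if (front > rear)        /* queue became empty: reset both ends */
        front = rear = -1;
    return value;
}
```

Note that once rear reaches SIZE - 1 this simple version reports full even if earlier slots have been freed by deQueue; that is exactly the problem the circular queue below solves.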
In linked list implementation of a queue, the last inserted node is always pointed by 'rear' and the first
node is always pointed by 'front'.
Example
In above example, the last inserted node is 50 and it is pointed by 'rear' and the first inserted node is 10
and it is pointed by 'front'. The order of elements inserted is 10, 15, 22 and 50.
Operations
To implement queue using linked list, we need to set the following things before implementing actual
operations.
Step 1: Include all the header files which are used in the program. And declare all the user
defined functions.
Step 2: Define a 'Node' structure with two members data and next.
Step 3: Define two Node pointers 'front' and 'rear' and set both to NULL.
Step 4: Implement the main method by displaying Menu of list of operations and make suitable
function calls in the main method to perform user selected operation.
We can use the following steps to insert a new node into the queue...
Step 1: Create a newNode with given value and set 'newNode → next' to NULL.
Step 2: Check whether queue is Empty (rear == NULL)
Step 3: If it is Empty then, set front = newNode and rear = newNode.
Step 4: If it is Not Empty then, set rear → next = newNode and rear = newNode.
We can use the following steps to delete a node from the queue...
We can use the following steps to display the elements (nodes) of a queue...
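The enQueue steps above, together with the deletion steps (shown as a figure in the original), can be sketched as follows on the stated front/rear convention; the function names are illustrative:

```c
#include <stdlib.h>

struct Node {
    int data;
    struct Node *next;
};

struct Node *front = NULL, *rear = NULL;

/* Steps 1-4 above: create the node, then link it at the rear. */
void enQueue(int value)
{
    struct Node *newNode = malloc(sizeof *newNode);
    newNode->data = value;
    newNode->next = NULL;
    if (rear == NULL)            /* empty queue */
        front = rear = newNode;
    else {
        rear->next = newNode;
        rear = newNode;
    }
}

/* Delete the front node and return its value; assumes the queue is not empty. */
int deQueue(void)
{
    struct Node *temp = front;
    int value = temp->data;
    front = front->next;
    if (front == NULL)           /* queue became empty */
        rear = NULL;
    free(temp);
    return value;
}
```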
CIRCULAR QUEUE
In a normal Queue Data Structure, we can insert elements until queue becomes full. But once if
queue becomes full, we cannot insert the next element until all the elements are deleted from the queue.
For example consider the queue below...
Now consider the following situation after deleting three elements from the queue...
This situation also says that the queue is full and we cannot insert a new element, because 'rear' is
still at the last position. In the above situation, even though we have empty positions in the queue, we
cannot make use of them to insert a new element. This is the major problem in the normal queue data
structure. To overcome this problem we use the circular queue data structure.
enQueue(value) - Inserting value into the Circular Queue
In a circular queue, enQueue() is a function which is used to insert an element into the circular queue. In
a circular queue, the new element is always inserted at rear position. The enQueue() function takes one
integer value as parameter and inserts that value into the circular queue. We can use the following steps
to insert an element into the circular queue...
Step 1: Check whether queue is FULL. ((rear == SIZE-1 && front == 0) || (front ==
rear+1))
Step 2: If it is FULL, then display "Queue is FULL!!! Insertion is not possible!!!" and
terminate the function.
Step 3: If it is NOT FULL, then check rear == SIZE - 1 && front != 0 if it is TRUE, then set
rear = -1.
Step 4: Increment rear value by one (rear++), set queue[rear] = value and check 'front == -1'
if it is TRUE, then set front = 0.
In a circular queue, deQueue() is a function used to delete an element from the circular queue. In a
circular queue, the element is always deleted from front position. The deQueue() function doesn't take
any value as parameter. We can use the following steps to delete an element from the circular queue...
We can use the following steps to display the elements of a circular queue...
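The enQueue steps above and the corresponding deletion steps can be sketched as follows; SIZE is illustrative, and error handling is reduced to return codes:

```c
#define SIZE 5

int cqueue[SIZE];
int front = -1, rear = -1;

/* Steps 1-4 above: wrap rear back to index 0 when it reaches SIZE - 1. */
int cq_enQueue(int value)
{
    if ((rear == SIZE - 1 && front == 0) || (front == rear + 1))
        return 0;                      /* queue is FULL */
    if (rear == SIZE - 1 && front != 0)
        rear = -1;                     /* wrap rear around */
    cqueue[++rear] = value;
    if (front == -1)
        front = 0;
    return 1;
}

/* Delete from front, wrapping it around at the end; assumes not empty. */
int cq_deQueue(void)
{
    int value = cqueue[front];
    if (front == rear)                 /* last element: queue becomes empty */
        front = rear = -1;
    else if (front == SIZE - 1)        /* wrap front around */
        front = 0;
    else
        front++;
    return value;
}
```

The test of fullness compares front and rear rather than rear alone, which is what lets the freed positions at the start of the array be reused.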
PRIORITY QUEUE
In normal queue data structure, insertion is performed at the end of the queue and deletion is
performed based on the FIFO principle. This queue implementation may not be suitable for all
situations.
Consider a networking application where server has to respond for requests from multiple clients using
queue data structure. Assume four requests arrived to the queue in the order of R1 requires 20 units of
time, R2 requires 2 units of time, R3 requires 10 units of time and R4 requires 5 units of time. Queue is
as follows...
1. R1 : 20 units of time
2. R2 : 22 units of time (R2 must wait till R1 completes - 20 units - and R2 itself requires 2
units. Total 22 units)
3. R3 : 32 units of time (R3 must wait till R2 completes - 22 units - and R3 itself requires 10
units. Total 32 units)
4. R4 : 37 units of time (R4 must wait till R3 completes - 32 units - and R4 itself requires 5
units. Total 37 units)
Here, the average waiting time for all requests (R1, R2, R3 and R4) is (20+22+32+37)/4 ≈ 27 units of
time.
That means, if we use a normal queue data structure to serve these requests the average waiting time for
each request is 27 units of time.
Now, consider another way of serving these requests: serve them according to their required amount
of time. That means we first serve R2, which has the minimum time required (2), then R4, which has
the second minimum (5), then R3, which has the third minimum (10), and finally R1, which has the
maximum time required (20).
1. R2 : 2 units of time
2. R4 : 7 units of time (R4 must wait till R2 completes 2 units and R4 itself requires 5 units.
Total 7 units)
3. R3 : 17 units of time (R3 must wait till R4 completes 7 units and R3 itself requires 10 units.
Total 17 units)
4. R1 : 37 units of time (R1 must wait till R3 completes 17 units and R1 itself requires 20
units. Total 37 units)
Here, the average waiting time for all requests (R1, R2, R3 and R4) is (2+7+17+37)/4 ≈ 15 units of
time.
From the above two situations, it is very clear that by using the second method the server can
complete all four requests in much less time compared to the first method. This is exactly what a
priority queue does.
There are two types of priority queues they are as follows...
1. Max Priority Queue
2. Min Priority Queue
In a max priority queue, elements are inserted in the order in which they arrive at the queue, and the
maximum value is always removed first from the queue. For example, assume that we insert in the
order 8, 3, 2, 5 and they are removed in the order 8, 5, 3, 2.
#1. Using an Unordered Array (Dynamic Array)
In this representation elements are inserted according to their arrival order and maximum element is
deleted first from max priority queue.
For example, assume that elements are inserted in the order of 8, 2, 3 and 5. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'front == -1' queue is Empty. This operation requires O(1) time complexity that means
constant time.
insert() - New element is added at the end of the queue. This operation requires O(1) time complexity
that means constant time.
findMax() - To find maximum element in the queue, we need to compare with all the elements in the
queue. This operation requires O(n) time complexity.
remove() - To remove an element from the queue first we need to perform findMax() which requires
O(n) and removal of particular element requires constant time O(1). This operation requires O(n) time
complexity.
#2. Using an Unordered Array (Dynamic Array) with the index of the maximum value
In this representation elements are inserted according to their arrival order and maximum element is
deleted first from max priority queue.
For example, assume that elements are inserted in the order of 8, 2, 3 and 5. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'front == -1' queue is Empty. This operation requires O(1) time complexity that means
constant time.
insert() - New element is added at the end of the queue with O(1) and for each insertion we need to
update maxIndex with O(1). This operation requires O(1) time complexity that means constant time.
findMax() - To find maximum element in the queue is very simple as maxIndex has maximum element
index. This operation requires O(1) time complexity.
remove() - To remove an element from the queue first we need to perform findMax() which requires
O(1) , removal of particular element requires constant time O(1) and update maxIndex value which
requires O(n). This operation requires O(n) time complexity.
#3. Using an Array (Dynamic Array) in Decreasing Order
In this representation elements are inserted according to their value in decreasing order and maximum
element is deleted first from max priority queue.
For example, assume that elements are inserted in the order of 8, 5, 3 and 2. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'front == -1' queue is Empty. This operation requires O(1) time complexity that means
constant time.
insert() - A new element is added at a particular position in the decreasing order into the queue with
O(n), because we need to shift existing elements in order to insert the new element in decreasing
order. This operation requires O(n) time complexity.
findMax() - To find maximum element in the queue is very simple as maximum element is at the
beginning of the queue. This operation requires O(1) time complexity.
remove() - To remove an element from the queue first we need to perform findMax() which requires
O(1), removal of particular element requires constant time O(1) and rearrange remaining elements
which requires O(n). This operation requires O(n) time complexity.
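The decreasing-order array representation just described can be sketched as follows; the array capacity and function names are illustrative, and error checks are omitted for brevity:

```c
#define CAPACITY 100

int pq[CAPACITY];
int count = 0;

int pq_isEmpty(void) { return count == 0; }

/* O(n) insert: shift smaller elements right so the order stays decreasing. */
void pq_insert(int value)
{
    int i = count - 1;
    while (i >= 0 && pq[i] < value) {
        pq[i + 1] = pq[i];
        i--;
    }
    pq[i + 1] = value;
    count++;
}

/* O(1) findMax: the maximum sits at the beginning of the array. */
int pq_findMax(void) { return pq[0]; }

/* O(n) remove: delete pq[0] and shift the remaining elements left. */
int pq_remove(void)
{
    int max = pq[0];
    for (int i = 1; i < count; i++)
        pq[i - 1] = pq[i];
    count--;
    return max;
}
```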
#4. Using an Array (Dynamic Array) in Increasing Order
In this representation elements are inserted according to their value in increasing order and maximum
element is deleted first from max priority queue.
For example, assume that elements are inserted in the order of 2, 3, 5 and 8. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'front == -1' queue is Empty. This operation requires O(1) time complexity that means
constant time.
insert() - A new element is added at a particular position in the increasing order into the queue with
O(n), because we need to shift existing elements in order to insert the new element in increasing
order. This operation requires O(n) time complexity.
findMax() - To find maximum element in the queue is very simple as maximum element is at the end of
the queue. This operation requires O(1) time complexity.
remove() - To remove an element from the queue first we need to perform findMax() which requires
O(1), removal of particular element requires constant time O(1) and rearrange remaining elements
which requires O(n). This operation requires O(n) time complexity.
#5. Using an Ordered Linked List
In this representation, we use a single linked list to represent max priority queue. In this representation
elements are inserted according to their value in increasing order and node with maximum value is
deleted first from max priority queue.
For example, assume that elements are inserted in the order of 2, 3, 5 and 8. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'head == NULL' queue is Empty. This operation requires O(1) time complexity that
means constant time.
insert() - A new element is added at a particular position in the increasing order into the queue with
O(n), because we need to find the position where the new element has to be inserted. This operation
requires O(n) time complexity.
findMax() - To find maximum element in the queue is very simple as maximum element is at the end of
the queue. This operation requires O(1) time complexity.
remove() - To remove an element from the queue is simply removing the last node in the queue which
requires O(1). This operation requires O(1) time complexity.
#6. Using Unordered Linked List with reference to node with the maximum value
In this representation, we use a single linked list to represent max priority queue. We always maintain
a reference (maxValue) to the node with the maximum value. In this representation elements are
inserted according to their arrival order, and the node with the maximum value is deleted first from
the max priority queue.
For example, assume that elements are inserted in the order of 2, 8, 3 and 5. And they are removed in
the order 8, 5, 3 and 2.
isEmpty() - If 'head == NULL' queue is Empty. This operation requires O(1) time complexity that
means constant time.
insert() - A new element is added at the end of the queue with O(1), and the maxValue reference is
updated with O(1). This operation requires O(1) time complexity.
findMax() - To find maximum element in the queue is very simple as maxValue is referenced to the
node with maximum value in the queue. This operation requires O(1) time complexity.
remove() - To remove an element from the queue is deleting the node which referenced by maxValue
which requires O(1) and update maxValue reference to new node with maximum value in the queue
which requires O(n) time complexity. This operation requires O(n) time complexity.
A min priority queue is similar to a max priority queue, except that instead of removing the maximum
element first, we remove the minimum element first.
The following operations are performed in Min Priority Queue...
1. isEmpty() - Check whether queue is Empty.
2. insert() - Inserts a new value into the queue.
3. findMin() - Find minimum value in the queue.
4. remove() - Delete minimum value from the queue.
A min priority queue has the same representations as a max priority queue, with minimum-value removal instead.
Double Ended Queue can be represented in TWO ways, those are as follows...
1. Input Restricted Double Ended Queue
2. Output Restricted Double Ended Queue
Input Restricted Double Ended Queue
In input restricted double ended queue, the insertion operation is performed at only one end and deletion
operation is performed at both the ends.
Output Restricted Double Ended Queue
In output restricted double ended queue, the deletion operation is performed at only one end and
insertion operation is performed at both the ends.
APPLICATIONS OF QUEUES
A real-world example of a queue is a single-lane one-way road, where the vehicle that enters first exits
first. More real-world examples are the queues at ticket windows and bus stops.
Vehicle on Road
Ticket Counter : The first person to get a ticket is the first to leave.
In breadth-first search we explore all the nearest possibilities by finding all possible successors and
enqueue them to a queue.
1. Create a queue
2. Create a new choice point
3. Enqueue the choice point onto the queue
4. While (not found and queue is not empty):
o Dequeue the queue
o Find all possible choices after the last one tried
o Enqueue these choices onto the queue
5. Return
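The steps above can be sketched in Python using collections.deque as the queue; the small adjacency-list graph below is a made-up example, not from the text:

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first search: explore all nearest possibilities first via a queue."""
    visited = [start]
    queue = deque([start])                # enqueue the starting choice point
    while queue:                          # while queue is not empty
        vertex = queue.popleft()          # dequeue the queue
        for neighbor in graph[vertex]:    # find all possible choices
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)    # enqueue these choices
    return visited

# hypothetical graph: each vertex maps to its list of successors
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
```

Starting from "A", the vertices are visited level by level: A, then B and C, then D.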
UNIT III
NON LINEAR DATA STRUCTURES – TREES
Tree ADT – tree traversals - Binary Tree ADT – expression trees – applications of trees – binary search
tree ADT –Threaded Binary Trees- AVL Trees – B-Tree - B+ Tree - Heap – Applications of heap.
___________________________________________________________________________________
TREE ADT
A tree is a widely used abstract data type (ADT)—or data structure implementing this ADT—that
simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node,
represented as a set of linked nodes.
A tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root
node), where each node is a data structure consisting of a value, together with a list of references to
nodes (the "children"), with the constraints that no reference is duplicated, and none points to the root.
In a linear data structure, data is organized in sequential order, whereas in a non-linear data structure, data is
organized in random order. Tree is a very popular data structure used in a wide range of applications. A
tree data structure can be defined as follows...
Tree is a non-linear data structure which organizes data (Nodes) in a hierarchical structure, and this is a
recursive definition.
In a tree data structure, every individual element is called a Node. A Node stores the
actual data of that particular element and links to the next elements in the hierarchical structure.
In a tree data structure, if we have N nodes then we can have a maximum of N-1
links.
Example
Terminology
1. Root
In a tree data structure, the first node is called as Root Node. Every tree must have root node. We can
say that root node is the origin of tree data structure. In any tree, there must be only one root node. We
never have multiple root nodes in a tree.
2. Edge
In a tree data structure, the connecting link between any two nodes is called as EDGE. In a tree with 'N'
number of nodes there will be a maximum of 'N-1' number of edges.
3. Parent
In a tree data structure, the node which is predecessor of any node is called as PARENT NODE. In
simple words, the node which has branch from it to any other node is called as parent node. Parent node
can also be defined as "The node which has child / children".
4. Child
In a tree data structure, the node which is descendant of any node is called as CHILD Node. In simple
words, the node which has a link from its parent node is called as child node. In a tree, any parent node
can have any number of child nodes. In a tree, all the nodes except root are child nodes.
5. Siblings
In a tree data structure, nodes which belong to same Parent are called as SIBLINGS. In simple words,
the nodes with same parent are called as Sibling nodes.
6. Leaf
In a tree data structure, the node which does not have a child is called as LEAF Node. In simple words,
a leaf is a node with no child.
In a tree data structure, the leaf nodes are also called as External Nodes. External node is also a node
with no child. In a tree, leaf node is also called as 'Terminal' node.
7. Internal Nodes
In a tree data structure, the node which has at least one child is called as INTERNAL Node. In simple
words, an internal node is a node with at least one child.
In a tree data structure, nodes other than leaf nodes are called as Internal Nodes. The root node is also
said to be an Internal Node if the tree has more than one node. Internal nodes are also called as 'Non-
Terminal' nodes.
8. Degree
In a tree data structure, the total number of children of a node is called as DEGREE of that Node. In
simple words, the Degree of a node is total number of children it has. The highest degree of a node
among all the nodes in a tree is called as 'Degree of Tree'
9. Level
In a tree data structure, the root node is said to be at Level 0 and the children of root node are at Level 1
and the children of the nodes which are at Level 1 will be at Level 2 and so on... In simple words, in a
tree each step from top to bottom is called as a Level and the Level count starts with '0' and incremented
by one at each level (Step).
10. Height
In a tree data structure, the total number of edges from a leaf node to a particular node in the longest path
is called as HEIGHT of that Node. In a tree, the height of the root node is said to be the height of the tree. In a
tree, the height of all leaf nodes is '0'.
11. Depth
In a tree data structure, the total number of edges from the root node to a particular node is called as
DEPTH of that Node. In a tree, the total number of edges from the root node to a leaf node in the longest
path is said to be the Depth of the tree. In simple words, the highest depth of any leaf node in a tree is said
to be the depth of that tree. In a tree, the depth of the root node is '0'.
12. Path
In a tree data structure, the sequence of Nodes and Edges from one node to another node is called as
the PATH between those two Nodes. Length of a Path is the total number of nodes in that path. In the below
example the path A - B - E - J has length 4.
13. Sub Tree
In a tree data structure, each child from a node forms a subtree recursively. Every child node will form a
subtree on its parent node.
Tree Representations
A tree data structure can be represented in two methods. Those methods are as follows...
1. List Representation
2. Left Child - Right Sibling Representation
Consider the following tree...
1. List Representation
In this representation, we use two types of nodes: one for representing a node with data and
another for representing only references. We start with a data node for the root node in the tree. Then
it is linked to an internal node through a reference node, and any other node is linked directly. This
process repeats for all the nodes in the tree.
The above tree example can be represented using List representation as follows...
2. Left Child - Right Sibling Representation
In this representation, we use list with one type of node which consists of three fields namely Data field,
Left child reference field and Right sibling reference field. Data field stores the actual value of a node,
left reference field stores the address of the left child and right reference field stores the address of the
right sibling node. Graphical representation of that node is as follows...
In this representation, every node's data field stores the actual value of that node. If that node has left
child, then left reference field stores the address of that left child node otherwise that field stores NULL.
If that node has right sibling then right reference field stores the address of right sibling node otherwise
that field stores NULL.
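The three-field node described above can be sketched in Python; the names LCRSNode and add_child are illustrative:

```python
class LCRSNode:
    """Left Child - Right Sibling node: data, left child, right sibling."""
    def __init__(self, data):
        self.data = data
        self.left_child = None        # first (left-most) child, or None (NULL)
        self.right_sibling = None     # next sibling under the same parent, or None

def add_child(parent, data):
    """Attach a new child; if children already exist, walk the sibling chain."""
    node = LCRSNode(data)
    if parent.left_child is None:
        parent.left_child = node
    else:
        sibling = parent.left_child
        while sibling.right_sibling is not None:
            sibling = sibling.right_sibling
        sibling.right_sibling = node
    return node

root = LCRSNode("A")
add_child(root, "B")
add_child(root, "C")
add_child(root, "D")
```

Here A's three children B, C and D are reached as left_child, then right_sibling, then right_sibling again, so every node needs only two references regardless of how many children it has.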
The above tree example can be represented using Left Child - Right Sibling representation as follows...
TREE TRAVERSALS
1. In - Order Traversal ( leftChild - root - rightChild )
In In-Order traversal, the root node is visited between left child and right child. In this traversal, the left
child node is visited first, then the root node is visited and later we go for visiting right child node. This
in-order traversal is applicable for every root node of all subtrees in the tree. This is performed
recursively for all nodes in the tree.
In the above example of binary tree, first we try to visit left child of root node 'A', but A's left child is a
root node for left subtree. so we try to visit its (B's) left child 'D' and again D is a root for subtree with
nodes D, I and J. So we try to visit its left child 'I' and it is the left most child. So first we visit 'I' then go
for its root node 'D' and later we visit D's right child 'J'. With this we have completed the left part of
node B. Then visit 'B' and next B's right child 'F' is visited. With this we have completed left part of
node A. Then visit root node 'A'. With this we have completed left and root parts of node A. Then we
go for right part of the node A. In right of A again there is a subtree with root C. So go for left child of C
and again it is a subtree with root G. But G does not have left part so we visit 'G' and then visit G's right
child K. With this we have completed the left part of node C. Then visit root node 'C' and next visit C's
right child 'H' which is the right most child in the tree so we stop the process.
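The in-order traversal described above can be sketched in Python; the tree built below is the example tree from the text (visited as I-D-J-B-F-A-G-K-C-H):

```python
class BTNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def in_order(root, out=None):
    """leftChild - root - rightChild, applied recursively to every subtree."""
    if out is None:
        out = []
    if root is not None:
        in_order(root.left, out)      # first visit the whole left subtree
        out.append(root.data)         # then the root
        in_order(root.right, out)     # then the whole right subtree
    return out

# the example binary tree from the text
root = BTNode("A",
              BTNode("B", BTNode("D", BTNode("I"), BTNode("J")), BTNode("F")),
              BTNode("C", BTNode("G", None, BTNode("K")), BTNode("H")))
```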
2. Pre - Order Traversal ( root - leftChild - rightChild )
In Pre-Order traversal, the root node is visited before the left child and right child nodes. In this traversal,
the root node is visited first, then its left child and later its right child. This pre-order traversal is
applicable for every root node of all subtrees in the tree.
In the above example of binary tree, first we visit root node 'A' then visit its left child 'B' which is a root
for D and F. So we visit B's left child 'D' and again D is a root for I and J. So we visit D's left child 'I'
which is the left most child. So next we go for visiting D's right child 'J'. With this we have completed
root, left and right parts of node D and root, left parts of node B. Next visit B's right child 'F'. With this
we have completed root and left parts of node A. So we go for A's right child 'C' which is a root node
for G and H. After visiting C, we go for its left child 'G' which is a root for node K. So next we visit left
of G, but it does not have left child so we go for G's right child 'K'. With this we have completed node
C's root and left parts. Next visit C's right child 'H' which is the right most child in the tree. So we stop
the process.
That means here we have visited in the order of A-B-D-I-J-F-C-G-K-H using Pre-Order Traversal.
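A matching Python sketch for pre-order traversal on the same example tree:

```python
class BTNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def pre_order(root, out=None):
    """root - leftChild - rightChild, applied recursively to every subtree."""
    if out is None:
        out = []
    if root is not None:
        out.append(root.data)         # visit the root first
        pre_order(root.left, out)     # then the whole left subtree
        pre_order(root.right, out)    # then the whole right subtree
    return out

# the example binary tree from the text
root = BTNode("A",
              BTNode("B", BTNode("D", BTNode("I"), BTNode("J")), BTNode("F")),
              BTNode("C", BTNode("G", None, BTNode("K")), BTNode("H")))
```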
Example
In a binary tree, every node can have a maximum of two children. But in a strictly binary tree, every node
should have exactly two children or none, and in a complete binary tree all the nodes must have exactly
two children and at every level of a complete binary tree there must be 2^level number of nodes. For example,
at level 2 there must be 2^2 = 4 nodes and at level 3 there must be 2^3 = 8 nodes.
A binary tree in which every internal node has exactly two children and all leaf nodes are at the same level
is called a Complete Binary Tree. A complete binary tree is also called a Perfect Binary Tree.
In the above figure, a normal binary tree is converted into a full binary tree by adding dummy nodes (in pink
colour).
EXPRESSION TREES
Expression tree is a binary tree in which each internal node corresponds to an operator and each leaf node
corresponds to an operand. For example, the expression tree for 3 + ((5+9)*2) would be:
In-order traversal of an expression tree produces the infix version of the given postfix expression (similarly,
pre-order traversal gives the prefix expression).
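A small Python sketch that builds the expression tree for 3 + ((5+9)*2) from its postfix form and recovers the infix version by in-order traversal (the function names are illustrative):

```python
class ExprNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def build_expression_tree(postfix):
    """Operands are pushed on a stack; an operator pops two subtrees
    as its right and left children and pushes the combined tree."""
    stack = []
    for token in postfix:
        if token in "+-*/":
            right, left = stack.pop(), stack.pop()
            stack.append(ExprNode(token, left, right))
        else:
            stack.append(ExprNode(token))
    return stack.pop()

def to_infix(node):
    """In-order traversal yields the (fully parenthesized) infix form."""
    if node.left is None:             # a leaf is an operand
        return node.value
    return "(" + to_infix(node.left) + node.value + to_infix(node.right) + ")"

# postfix form of 3 + ((5+9)*2)
tree = build_expression_tree(["3", "5", "9", "+", "2", "*", "+"])
```

In-order traversal of this tree yields (3+((5+9)*2)), the infix version of the postfix input.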
APPLICATIONS OF TREES
Unlike Array and Linked List, which are linear data structures, tree is hierarchical (or non-linear) data
structure.
1) One reason to use trees might be because you want to store information that naturally forms a
hierarchy. For example, the file system on a computer:
file system
———–
/ <-- root
/ \
... home
/ \
ugrad course
/ / | \
... cs101 cs112 cs113
2) If we organize keys in the form of a tree (with some ordering, e.g., BST), we can search for a given key in
moderate time (quicker than Linked List and slower than arrays). Self-balancing search trees like AVL
and Red-Black trees guarantee an upper bound of O(log n) for search.
3) We can insert/delete keys in moderate time (quicker than Arrays and slower than Unordered Linked
Lists). Self-balancing search trees like AVL and Red-Black trees guarantee an upper bound of O(log n)
for insertion/deletion.
4) Like Linked Lists and unlike Arrays, the pointer implementation of trees doesn't have an upper limit on the
number of nodes, as nodes are linked using pointers.
BINARY SEARCH TREE ADT
In a binary tree, every node can have a maximum of two children, but there is no ordering of nodes based on
their values. In a binary tree, the elements are arranged as they arrive to the tree, from top to bottom and
left to right.
To enhance the performance of a binary tree, we use a special type of binary tree known as a Binary Search
Tree. A binary search tree mainly focuses on the search operation in a binary tree. Binary search tree can be
defined as follows...
Binary Search Tree is a binary tree in which every node contains only smaller values in its left subtree
and only larger values in its right subtree.
In a binary search tree, all the nodes in the left subtree of any node contain smaller values and all the nodes
in its right subtree contain larger values, as shown in the following figure...
Example
The following tree is a Binary Search Tree. In this tree, left subtree of every node contains nodes with
smaller values and right subtree of every node contains larger values.
Every Binary Search Tree is a binary tree, but every binary tree need not be a binary search
tree.
The following operations are performed on a binary search tree...
1. Search
2. Insertion
3. Deletion
In a binary search tree, the search operation is performed with O(log n) time complexity. The search
operation is performed as follows...
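In outline, the search compares the given key with the current node and moves to the left child if the key is smaller or to the right child if it is larger, until the key is found or NULL is reached. A minimal Python sketch (the example tree below is a made-up BST):

```python
class BSTNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def bst_search(root, key):
    """Compare with the current node; go left if smaller, right if larger."""
    node = root
    while node is not None:
        if key == node.data:
            return node               # element found
        node = node.left if key < node.data else node.right
    return None                       # reached NULL: element is not found

# a small example BST (every left value smaller, every right value larger)
root = BSTNode(10, BSTNode(5, BSTNode(4), BSTNode(8)), BSTNode(12))
```

Each comparison discards one subtree, which is where the O(log n) bound comes from on a balanced tree.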
In a binary search tree, the insertion operation is performed with O(log n) time complexity. In binary
search tree, new node is always inserted as a leaf node. The insertion operation is performed as
follows...
Step 1: Create a newNode with the given value and set its left and right to NULL.
Step 2: Check whether the tree is Empty.
Step 3: If the tree is Empty, then set root to newNode.
Step 4: If the tree is Not Empty, then check whether the value of newNode is smaller or larger
than the node (here it is the root node).
Step 5: If newNode is smaller than or equal to the node, then move to its left child. If newNode
is larger than the node, then move to its right child.
Step 6: Repeat the above step until we reach a leaf node (i.e., reach NULL).
Step 7: After reaching a leaf node, insert the newNode as its left child if newNode is smaller than
or equal to that leaf, else insert it as its right child.
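The steps above can be sketched in Python (the name bst_insert is illustrative; the insertion order below is the example sequence 10, 12, 5, 4, 20, 8, 7, 15, 13):

```python
class BSTNode:
    def __init__(self, data):
        self.data, self.left, self.right = data, None, None

def bst_insert(root, value):
    """Steps 1-7 above: walk down to a leaf, then attach the new node there."""
    new_node = BSTNode(value)                     # Step 1
    if root is None:                              # Steps 2-3: empty tree
        return new_node
    node = root
    while True:                                   # Steps 4-6
        if value <= node.data:
            if node.left is None:
                node.left = new_node              # Step 7: attach as left child
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new_node             # Step 7: attach as right child
                return root
            node = node.right

root = None
for v in [10, 12, 5, 4, 20, 8, 7, 15, 13]:
    root = bst_insert(root, v)
```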
In a binary search tree, the deletion operation is performed with O(log n) time complexity. Deleting a
node from a binary search tree has the following three cases...
Case 1: Deleting a leaf node
Case 2: Deleting a node with one child
We use the following steps to delete a node with one child from BST...
Case 3: Deleting a node with two children
We use the following steps to delete a node with two children from BST...
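A common way to handle this case is to replace the node's value with its in-order successor (the minimum of its right subtree) and then delete that successor, which has at most one child. A Python sketch under that assumption; the tree below is the BST built from the example sequence 10, 12, 5, 4, 20, 8, 7, 15 and 13:

```python
class BSTNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def bst_delete(node, key):
    """Delete key from the subtree rooted at node; returns the new subtree root."""
    if node is None:
        return None
    if key < node.data:
        node.left = bst_delete(node.left, key)
    elif key > node.data:
        node.right = bst_delete(node.right, key)
    else:
        if node.left is None:                 # Cases 1 and 2: at most one child
            return node.right
        if node.right is None:
            return node.left
        succ = node.right                     # Case 3: find in-order successor
        while succ.left is not None:
            succ = succ.left
        node.data = succ.data                 # copy successor's value up
        node.right = bst_delete(node.right, succ.data)   # delete the successor
    return node

root = BSTNode(10,
               BSTNode(5, BSTNode(4), BSTNode(8, BSTNode(7))),
               BSTNode(12, None, BSTNode(20, BSTNode(15, BSTNode(13)))))
root = bst_delete(root, 10)                   # delete the root (two children)
```

Deleting 10 copies its in-order successor 12 into the root and removes the old 12 node, leaving a valid BST.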
Example
10,12,5,4,20,8,7,15 and 13
THREADED BINARY TREES
A. J. Perlis and C. Thornton proposed a new binary tree called the "Threaded Binary Tree",
which makes use of NULL pointers to improve its traversal process. In a threaded binary tree, NULL
pointers are replaced by references to other nodes in the tree, called threads.
A Threaded Binary Tree is a binary tree in which all left child pointers that are NULL (in the
linked list representation) point to the node's in-order predecessor, and all right child pointers that are NULL
(in the linked list representation) point to the node's in-order successor. If there is no in-order predecessor or in-
order successor, then the pointer points to the root node.
To convert above binary tree into threaded binary tree, first find the in-order traversal of that tree...
In-order traversal of above binary tree...
H-D-I-B-E-A-F-J-C-G
When we represent the above binary tree using the linked list representation, the left child pointers of nodes
H, I, E, F, J and G are NULL. Each NULL is replaced by the address of the node's in-order predecessor (I
to D, E to B, F to A, J to F and G to C), but the node H does not have an in-order predecessor, so it
points to the root node A. And the right child pointers of nodes H, I, E, J and G are NULL. These NULL
pointers are replaced by the address of the node's in-order successor (H to D, I to B, E to A, and J to
C), but the node G does not have an in-order successor, so it points to the root node A.
Above example binary tree become as follows after converting into threaded binary tree.
AVL TREES
What if the input to binary search tree comes in a sorted (ascending or descending) manner? It will then
look like this −
It is observed that BST's worst-case performance is closest to linear search algorithms, that is O(n). In
real-time data, we cannot predict the data pattern and frequencies. So, a need arises to balance out the
existing BST.
Named after their inventors Adelson-Velsky and Landis, AVL trees are height-balancing binary search
trees. An AVL tree checks the height of the left and the right sub-trees and assures that the difference is not
more than 1. This difference is called the Balance Factor.
Here we see that the first tree is balanced and the next two trees are not balanced −
In the second tree, the left subtree of C has height 2 and the right subtree has height 0, so the difference
is 2. In the third tree, the right subtree of A has height 2 and the left is missing, so it is 0, and the
difference is 2 again. AVL tree permits difference (balance factor) to be only 1.
If the difference in the height of left and right sub-trees is more than 1, the tree is balanced using some
rotation techniques.
AVL Rotations
To balance itself, an AVL tree may perform the following four kinds of rotations −
Left rotation
Right rotation
Left-Right rotation
Right-Left rotation
The first two rotations are single rotations and the next two rotations are double rotations. To have an
unbalanced tree, we at least need a tree of height 2. With this simple tree, let's understand them one by
one.
Left Rotation
If a tree becomes unbalanced when a node is inserted into the right subtree of the right subtree, then we
perform a single left rotation −
In our example, node A has become unbalanced as a node is inserted in the right subtree of A's right
subtree. We perform the left rotation by making A the left-subtree of B.
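The single left rotation can be sketched in Python (rotate_left is an illustrative name); inserting A, B, C in ascending order produces the unbalanced shape, and the rotation makes B the new subtree root:

```python
class AVLNode:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def rotate_left(a):
    """Single left rotation: A's right child B becomes the subtree root,
    and A becomes B's left subtree; B's old left subtree moves under A."""
    b = a.right
    a.right = b.left          # B's old left subtree is re-attached under A
    b.left = a                # A becomes the left subtree of B
    return b                  # B is the new root of this subtree

# A -> B -> C inserted in ascending order makes A unbalanced
root = rotate_left(AVLNode("A", None, AVLNode("B", None, AVLNode("C"))))
```

A right rotation is the mirror image: swap every left for right in the function above.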
Right Rotation
AVL tree may become unbalanced, if a node is inserted in the left subtree of the left subtree. The tree
then needs a right rotation.
As depicted, the unbalanced node becomes the right child of its left child by performing a right rotation.
Left-Right Rotation
Double rotations are slightly more complex versions of the rotations already explained. To understand
them better, we should take note of each action performed during rotation.
Let's first check how to perform Left-Right rotation. A left-right rotation is a combination of left rotation
followed by right rotation.
State Action
A node has been inserted into the right subtree of the left subtree. This
makes C an unbalanced node. These scenarios cause AVL tree to
perform left-right rotation.
We first perform the left rotation on the left subtree of C. This makes A,
the left subtree of B.
We shall now right-rotate the tree, making B the new root node of this
subtree. C now becomes the right subtree of its own left subtree.
The tree is now balanced.
Right-Left Rotation
The second type of double rotation is Right-Left Rotation. It is a combination of right rotation followed
by left rotation.
State Action
A node has been inserted into the left subtree of the right subtree. This
makes A, an unbalanced node with balance factor 2.
First, we perform the right rotation along C node, making C the right
subtree of its own left subtree B. Now, B becomes the right subtree of A.
A left rotation is performed by making B the new root node of the
subtree. A becomes the left subtree of its right subtree B.
In an AVL tree, the search operation is performed with O(log n) time complexity. The search operation
is performed similar to Binary search tree search operation. We use the following steps to search an
element in AVL tree...
In an AVL tree, the insertion operation is performed with O(log n) time complexity. In AVL Tree, new
node is always inserted as a leaf node. The insertion operation is performed as follows...
Step 1: Insert the new element into the tree using Binary Search Tree insertion logic.
Step 2: After insertion, check the Balance Factor of every node.
Step 3: If the Balance Factor of every node is 0 or 1 or -1 then go for next operation.
Step 4: If the Balance Factor of any node is other than 0 or 1 or -1 then tree is said to be
imbalanced. Then perform the suitable Rotation to make it balanced. And go for next operation.
Example: Construct an AVL Tree by inserting numbers from 1 to 8.
B-TREE
In a binary search tree, AVL Tree, Red-Black tree etc., every node can have only one value (key)
and maximum of two children but there is another type of search tree called B-Tree in which a node can
store more than one value (key) and it can have more than two children. B-Tree was developed in the
year of 1972 by Bayer and McCreight with the name Height Balanced m-way Search Tree. Later it
was named as B-Tree.
B-Tree is a self-balanced search tree with multiple keys in every node and more than two children for
every node.
Here, the number of keys in a node and the number of children of a node depend on the order of the B-Tree.
Every B-Tree has an order.
For example, a B-Tree of Order 4 contains a maximum of 3 key values in a node and a maximum of 4 children for
a node.
Example
Operations on a B-Tree
1. Search
2. Insertion
3. Deletion
In a B-Tree, the search operation is similar to that of a Binary Search Tree. In a binary search tree, the
search process starts from the root node and every time we make a 2-way decision (we go to either the left
subtree or the right subtree). In a B-Tree also the search process starts from the root node, but every time we make
an n-way decision, where n is the total number of children that node has. In a B-Tree, the search operation
is performed with O(log n) time complexity. The search operation is performed as follows...
Step 7: If we have compared the last key value in a leaf node, then display "Element is not found"
and terminate the function.
In a B-Tree, the new element must be added only at leaf node. That means, always the new keyValue is
attached to leaf node only. The insertion operation is performed as follows...
Example
B+ TREE
A B+ tree is an N-ary tree with a variable but often large number of children per node. A B+ tree
consists of a root, internal nodes and leaves. The root may be either a leaf or a node with two or more
children.
A B+ tree can be viewed as a B-tree in which each node contains only keys (not key–value pairs), and to
which an additional level is added at the bottom with linked leaves.
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage
context — in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have
very high fanout (number of pointers to child nodes in a node, typically on the order of 100 or more),
which reduces the number of I/O operations required to find an element in the tree.
A simple B+ tree example linking the keys 1-7 to data values d1-d7. The linked list (red) allows rapid in-
order traversal. This particular tree's branching factor is b = 4.
Insertion algorithm
1. If the node has an empty space, insert the key/reference pair into the node.
2. If the node is already full, split it into two nodes, distributing the keys evenly between the two
nodes. If the node is a leaf, take a copy of the minimum value in the second of these two nodes
and repeat this insertion algorithm to insert it into the parent node. If the node is a non-leaf,
exclude the middle value during the split and repeat this insertion algorithm to insert this
excluded value into the parent node.
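Step 2's leaf split can be sketched in Python for a leaf that overflows (split_leaf is an illustrative name): the keys are distributed evenly, and the minimum key of the right node is the copy handed to the parent.

```python
def split_leaf(keys, key):
    """Insert key into a full B+ tree leaf, split the leaf evenly, and
    return (left leaf, right leaf, separator copied into the parent)."""
    keys = sorted(keys + [key])       # keys in a leaf stay in sorted order
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]      # right[0] is COPIED up, not removed
```

For example, inserting 12 into a full order-4 leaf [10, 11, 13, 15] yields the leaves [10, 11] and [12, 13, 15], with 12 copied into the parent, matching the behaviour shown in the Insert 12 step.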
Initial:
Insert 20:
Insert 13:
Insert 15:
Insert 10:
Insert 11:
Insert 12:
Deletion algorithm
1. Remove the required key and associated reference from the node.
2. If the node still has enough keys and references to satisfy the invariants, stop.
3. If the node has too few keys to satisfy the invariants, but its next oldest or next youngest sibling
at the same level has more than necessary, distribute the keys between this node and the
neighbor. Repair the keys in the level above to represent that these nodes now have a different
“split point” between them; this involves simply changing a key in the levels above, without
deletion or insertion.
4. If the node has too few keys to satisfy the invariant, and the next oldest or next youngest sibling
is at the minimum for the invariant, then merge the node with its sibling; if the node is a non-
leaf, we will need to incorporate the “split key” from the parent into our merging. In either case,
we will need to repeat the removal algorithm on the parent node to remove the “split key” that
previously separated these merged nodes — unless the parent is the root and we are removing
the final key from the root, in which case the merged node becomes the new root (and the tree
has become one level shorter than before).
Initial:
Delete 13:
Delete 15:
Delete 1:
HEAP
Heap is a special case of balanced binary tree data structure where the root-node key is compared with
its children and arranged accordingly. If α has child node β then −
key(α) ≥ key(β)
As the value of the parent is greater than that of the child, this property generates a Max Heap. Based on this
criteria, a heap can be of two types −
Min-Heap − Where the value of the root node is less than or equal to either of its children.
Max-Heap − Where the value of the root node is greater than or equal to either of its children.
For Input → 35 33 42 10 14 19 27 44 26 31
Both trees are constructed using the same input and order of arrival.
We shall use the same example to demonstrate how a Max Heap is created. The procedure to create Min
Heap is similar but we go for min values instead of max values.
We are going to derive an algorithm for max heap by inserting one element at a time. At any point of
time, the heap must maintain its property. During insertion, we also assume that we are inserting a node into an
already heapified tree.
Let us derive an algorithm to delete from max heap. Deletion in Max (or Min) Heap always happens at
the root to remove the Maximum (or minimum) value.
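Both operations can be sketched in Python over an array-based max heap (the function names are illustrative; the input is the sample input above). Insertion appends at the end and sifts the new value up; deletion removes the root, moves the last leaf to the root, and sifts it down:

```python
def heap_insert(heap, value):
    """Append at the end, then swap upward while the parent is smaller."""
    heap.append(value)
    i = len(heap) - 1
    while i > 0 and heap[(i - 1) // 2] < heap[i]:
        heap[(i - 1) // 2], heap[i] = heap[i], heap[(i - 1) // 2]
        i = (i - 1) // 2

def heap_delete_max(heap):
    """Remove the root; move the last leaf to the root and sift it down."""
    top = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        i = 0
        while True:
            largest = i
            for c in (2 * i + 1, 2 * i + 2):      # left and right children
                if c < len(heap) and heap[c] > heap[largest]:
                    largest = c
            if largest == i:                      # heap property restored
                break
            heap[i], heap[largest] = heap[largest], heap[i]
            i = largest
    return top

heap = []
for v in [35, 33, 42, 10, 14, 19, 27, 44, 26, 31]:   # the sample input above
    heap_insert(heap, v)
```

After building the heap, repeated heap_delete_max calls return the values in descending order: 44, then 42, and so on.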
APPLICATIONS OF HEAP
Heapsort: One of the best sorting methods, being in-place and with no quadratic worst-case
scenarios.
Selection algorithms: A heap allows access to the min or max element in constant time, and
other selections (such as median or kth-element) can be done in sub-linear time on data that is in
a heap.
Graph algorithms: By using heaps as internal traversal data structures, run time will be reduced
by polynomial order. Examples of such problems are Prim's minimal-spanning-tree algorithm
and Dijkstra's shortest-path algorithm.
Priority Queue: A priority queue is an abstract concept like "a list" or "a map"; just as a list can
be implemented with a linked list or an array, a priority queue can be implemented with a heap
or a variety of other methods.
Order statistics: The Heap data structure can be used to efficiently find the kth smallest (or
largest) element in an array.
UNIT IV
NON LINEAR DATA STRUCTURES - GRAPHS
Definition – Representation of Graph – Types of graph - Breadth-first traversal - Depth-first traversal –
Topological Sort – Bi-connectivity – Cut vertex – Euler circuits – Applications of graphs.
___________________________________________________________________________________
DEFINITION
Graph is a non-linear data structure; it contains a set of points known as nodes (or vertices) and a set of
links known as edges (or arcs) which connect the vertices. A graph is defined as follows...
Graph is a collection of vertices and arcs which connect the vertices in the graph. In other words, a graph is a
collection of nodes and edges which connect the nodes in the graph.
Example
The following is a graph with 5 vertices and 7 edges.
This graph G can be defined as G = ( V , E )
Where V = {A,B,C,D,E} and E = {(A,B),(A,C),(A,D),(B,D),(C,D),(B,E),(E,D)}.
Graph Terminology
We use the following terms in graph data structure...
Vertex
An individual data element of a graph is called a Vertex. Vertex is also known as node. In the above
example graph, A, B, C, D & E are known as vertices.
Edge
An edge is a connecting link between two vertices. Edge is also known as Arc. An edge is represented
as (startingVertex, endingVertex). For example, in above graph, the link between vertices A and B is
represented as (A,B). In above example graph, there are 7 edges (i.e., (A,B), (A,C), (A,D), (B,D), (B,E),
(C,D), (D,E)).
1. Undirected Edge - An undirected edge is a bidirectional edge. If there is an undirected edge between
vertices A and B then edge (A , B) is equal to edge (B , A).
2. Directed Edge - A directed edge is a unidirectional edge. If there is a directed edge between vertices A
and B then edge (A , B) is not equal to edge (B , A).
3. Weighted Edge - A weighted edge is an edge with a cost on it.
Undirected Graph
A graph with only undirected edges is said to be undirected graph.
Directed Graph
A graph with only directed edges is said to be directed graph.
Mixed Graph
A graph with undirected and directed edges is said to be mixed graph.
Origin
If an edge is directed, its first endpoint is said to be the origin of it.
Destination
If an edge is directed, the endpoint other than the origin is said to be the
destination of the edge.
Adjacent
If there is an edge between vertices A and B then both A and B are said to be adjacent. In other words,
Two vertices A and B are said to be adjacent if there is an edge whose end vertices are A and B.
Incident
An edge is said to be incident on a vertex if the vertex is one of the endpoints of that edge.
Outgoing Edge
A directed edge is said to be an outgoing edge of its origin vertex.
Incoming Edge
A directed edge is said to be incoming edge on its destination vertex.
Degree
Total number of edges connected to a vertex is said to be degree of that vertex.
Indegree
Total number of incoming edges connected to a vertex is said to be indegree of that vertex.
Outdegree
Total number of outgoing edges connected to a vertex is said to be outdegree of that vertex.
Self-loop
An edge (undirected or directed) is a self-loop if its two endpoints coincide.
Simple Graph
A graph is said to be simple if there are no parallel and self-loop edges.
Path
A path is a sequence of alternating vertices and edges that starts at a vertex and ends at a vertex such
that each edge is incident to its predecessor and successor vertex.
REPRESENTATION OF GRAPH
1. Adjacency Matrix
2. Incidence Matrix
3. Adjacency List
Adjacency Matrix
In this representation, a graph is represented using a matrix of size (total number of vertices) by (total
number of vertices). That means a graph with 4 vertices can be represented using a matrix of size
4x4. In this matrix, both rows and columns represent vertices. This matrix is filled with either 1 or 0.
Here, 1 represents that there is an edge from the row vertex to the column vertex, and 0 represents that there is no edge
from the row vertex to the column vertex.
Incidence Matrix
In this representation, a graph is represented using a matrix of size (total number of vertices) × (total
number of edges). That means a graph with 4 vertices and 6 edges can be represented using a 4×6
matrix. In this matrix, rows represent vertices and columns represent edges. The matrix is filled with 0,
1 or -1, where 0 means the column edge is not incident on the row vertex, 1 means the column edge
leaves the row vertex as an outgoing edge, and -1 means the column edge enters the row vertex as an
incoming edge.
Adjacency List
In this representation, every vertex of graph contains list of its adjacent vertices.
For example, consider the following directed graph representation implemented using linked list...
TYPES OF GRAPH
Various flavours of graphs have the following specializations and particulars about how they are usually drawn.
Undirected Graphs
In an undirected graph, the order of the vertices in the pairs in the Edge set doesn't matter. Thus, if we
view the sample graph above we could have written the Edge set as {(4,6),(4,5),(3,4),(3,2),(2,5),(1,2),
(1,5)}. Undirected graphs usually are drawn with straight lines between the vertices.
The adjacency relation is symmetric in an undirected graph, so if u ~ v then it is also the case that
v ~ u.
Directed Graphs
In a directed graph the order of the vertices in the pairs in the edge set matters. Thus u is adjacent to v
only if the pair (u,v) is in the Edge set. For directed graphs we usually use arrows for the arcs between
vertices. An arrow from u to v is drawn only if (u,v) is in the Edge set. In the directed graph below,
note that both (B,D) and (D,B) are in the Edge set, so the arc between B and D is an arrow in
both directions.
Vertex Labeled Graphs
In a labeled graph, each vertex is labeled with some data in addition to the data that identifies the
vertex. Only the identifying data is present in the pair in the Edge set. This is similar to the (key,satellite)
data distinction for sorting.
Here we have the following parts.
o The underlying set for the keys of the Vertices set is the integers.
o The underlying set for the satellite data is Color.
o The Vertices set = {(2,Blue),(4,Blue),(5,Red),(7,Green),(6,Red),(3,Yellow)}
o The Edge set = {(2,4),(4,5),(5,7),(7,6),(6,2),(4,3),(3,7)}
Cyclic Graphs
A cyclic graph is a directed graph with at least one cycle. A cycle is a path along the directed edges from
a vertex to itself. The vertex labeled graph above has several cycles. One of them is 2 » 4 » 5 » 7 » 6 » 2.
Edge Labeled Graphs
An edge labeled graph is a graph where the edges are associated with labels. One can indicate this by
making the Edge set be a set of triples. Thus if (u,v,X) is in the edge set, then there is an edge from u to v
with label X.
Edge labeled graphs are usually drawn with the labels drawn adjacent to the arcs specifying the
edges.
Weighted Graphs
A weighted graph is an edge labeled graph where the labels can be operated on by the usual arithmetic
operators, including comparisons like less than and greater than. In Haskell we'd say the edge
labels are in the Num class. Usually they are integers or floats. The idea is that some edges may be more
(or less) expensive, and this cost is represented by the edge labels or weight. In the graph below, which
is an undirected graph, the weights are drawn adjacent to the edges and appear in dark purple.
Directed Acyclic Graphs (Dags)
A Dag is a directed graph without cycles. They appear as special cases in CS applications all the time.
Vertices in a graph do not need to be connected to other vertices. It is legal for a graph to have
disconnected components, and even lone vertices without a single connection.
Vertices (like 5, 7, and 8) with only in-arrows are called sinks. Vertices with only out-arrows
(like 3 and 4) are called sources.
BREADTH-FIRST TRAVERSAL (BFS)
Breadth First Search (BFS) algorithm traverses a graph in a breadthward motion and uses a queue to
remember the next vertex to start a search from when a dead end occurs in any iteration.
As in the example given above, BFS algorithm traverses from A to B to E to F first then to C and G
lastly to D. It employs the following rules.
Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Insert it in a queue.
Rule 2 − If no adjacent vertex is found, remove the first vertex from the queue.
Rule 3 − Repeat Rule 1 and Rule 2 until the queue is empty.
We then see an unvisited adjacent node from S. In this example, we have three nodes but alphabetically
we choose A, mark it as visited and enqueue it.
At this stage, we are left with no unmarked (unvisited) nodes. But as per the algorithm we keep on
dequeuing in order to get all unvisited nodes. When the queue gets emptied, the program is over.
DEPTH-FIRST TRAVERSAL (DFS)
Depth First Search (DFS) algorithm traverses a graph in a depthward motion and uses a stack to
remember the next vertex to start a search from when a dead end occurs in any iteration.
As in the example given above, DFS algorithm traverses from S to A to D to G to E to B first, then to F
and lastly to C. It employs the following rules.
Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Push it in a stack.
Rule 2 − If no adjacent vertex is found, pop up a vertex from the stack. (It will pop up all the
vertices from the stack, which do not have adjacent vertices.)
Rule 3 − Repeat Rule 1 and Rule 2 until the stack is empty.
Mark A as visited and put it onto the stack. Explore any unvisited adjacent node from A. Both S and D
are adjacent to A but we are concerned for unvisited nodes only.
As C does not have any unvisited adjacent node so we keep popping the stack until we find a node that
has an unvisited adjacent node. In this case, there's none and we keep popping until the stack is empty.
TOPOLOGICAL SORT
Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for
every directed edge uv, vertex u comes before v in the ordering. Topological Sorting for a graph is not
possible if the graph is not a DAG.
For example, a topological sorting of the following graph is “5 4 2 3 1 0”. There can be more than one
topological sorting for a graph. For example, another topological sorting of the following graph is “4 5 2
3 1 0”. The first vertex in topological sorting is always a vertex with in-degree as 0 (a vertex with no in-
coming edges).
Let's take a graph and see the algorithm in action. Consider the graph given below:
So, we delete the vertex with in-degree 0 from the graph and append it to the ordering. For each vertex
directly connected to it, we decrease its in-degree by 1; any vertex whose in-degree now becomes 0 is
pushed into the queue.
Next we delete the new front vertex from the queue and append it to the ordering, again decreasing the
in-degree of each of its neighbours and pushing into the queue any neighbour whose in-degree becomes 0.
We continue doing this until the queue is empty; the order in which vertices were deleted is a
topological ordering.
BI-CONNECTIVITY
A graph is said to be Biconnected if:
1. It is connected, i.e. it is possible to reach every vertex from every other vertex, by a simple path.
2. Even after removing any vertex the graph remains connected.
The given graph is clearly connected. Now try removing the vertices one by one and observe. Removing
any of the vertices does not increase the number of connected components. So the given graph is
Biconnected.
Now consider the following graph which is a slight modification in the previous graph.
In the above graph if the vertex 2 is removed, then here's how it will look:
Clearly the number of connected components has increased. Similarly, if vertex 3 is removed there will
be no path left to reach vertex 0 from any of the vertices 1, 2, 4 or 5. The same goes for vertices 4 and 1:
removing vertex 4 will disconnect 1 from all the other vertices 0, 2, 3 and 5. So the graph is not
Biconnected.
Now, what should we look for in a graph to check whether it is Biconnected? From the above, a graph is
Biconnected if it has no vertex whose removal increases the number of connected components in the
graph; if such a vertex exists, the graph is not Biconnected. A vertex whose removal increases the
number of connected components is called an Articulation Point.
A vertex in an undirected connected graph is an articulation point (or cut vertex) iff removing it (and
edges through it) disconnects the graph. Articulation points represent vulnerabilities in a connected
network – single points whose failure would split the network into 2 or more disconnected components.
They are useful for designing reliable networks.
For a disconnected undirected graph, an articulation point is a vertex whose removal increases the
number of connected components.
Following are some example graphs with articulation points encircled with red color.
EULER CIRCUITS
An Eulerian Path is a path in a graph that visits every edge exactly once. An Eulerian Circuit is an
Eulerian Path which starts and ends on the same vertex.
How to find whether a given graph is Eulerian or not?
The problem is the same as the following question: “Is it possible to draw a given graph without lifting
the pencil from the paper and without tracing any of the edges more than once?”
A graph is called Eulerian if it has an Eulerian Cycle and called Semi-Eulerian if it has an Eulerian Path.
The problem seems similar to the Hamiltonian Path problem, which is NP-complete for a general graph.
Fortunately, we can find whether a given graph has an Eulerian Path or not in polynomial time. In fact,
we can find it in O(V+E) time.
Following are some interesting properties of undirected graphs with an Eulerian path and cycle. We can
use these properties to find whether a graph is Eulerian or not.
Eulerian Cycle
An undirected graph has an Eulerian cycle if the following two conditions are true:
a) All vertices with non-zero degree are connected. We don't care about vertices with zero degree
because they don't belong to an Eulerian Cycle or Path (we only consider all edges).
b) All vertices have even degree.
Eulerian Path
An undirected graph has an Eulerian Path if the following two conditions are true:
a) Same as condition (a) for an Eulerian Cycle.
b) Zero or two vertices have odd degree and all other vertices have even degree. Note that only one
vertex with odd degree is not possible in an undirected graph (the sum of all degrees is always even in
an undirected graph).
Note that a graph with no edges is considered Eulerian because there are no edges to traverse.
APPLICATIONS OF GRAPHS
Graphs are nothing but connected nodes (vertices), so any real-life application involving networks,
routing, finding relations, paths, etc. uses graphs.
Connecting with friends on social media, where each user is a vertex, and when users connect
they create an edge.
Using GPS/Google Maps/Yahoo Maps, to find the shortest route between two locations.
Google, to search for webpages, where pages on the internet are linked to each other by
hyperlinks; each page is a vertex and the link between two pages is an edge.
On eCommerce websites relationship graphs are used to show recommendations.
UNIT V
SEARCHING, SORTING AND HASHING TECHNIQUES
Searching- Linear Search - Binary Search. Sorting - Bubble sort - Selection sort - Insertion sort - Shell
sort – Radix sort. Hashing- Hash Functions – Separate Chaining – Open Addressing – Rehashing –
Extendible Hashing.
___________________________________________________________________________________
SEARCHING
Searching is an operation or a technique that helps find the place of a given element or value in a
list. Any search is said to be successful or unsuccessful depending upon whether the element that is
being searched for is found or not. Some of the standard searching techniques followed in data
structures are listed below.
LINEAR SEARCH
Linear search (sequential search) compares the search element with each element of the list one by one,
from the first element onwards, until a match is found or the end of the list is reached.
#include <stdio.h>

int main() {
   int list[20], size, i, sElement;

   printf("Enter size of the list: ");
   scanf("%d", &size);
   printf("Enter any %d integer values: ", size);
   for(i = 0; i < size; i++)
      scanf("%d", &list[i]);
   printf("Enter the element to be searched: ");
   scanf("%d", &sElement);

   // Linear Search Logic
   for(i = 0; i < size; i++) {
      if(sElement == list[i]) {
         printf("Element is found at %d index", i);
         break;
      }
   }
   if(i == size)
      printf("Given element is not found in the list!!!");
   return 0;
}
BINARY SEARCH
Binary search algorithm finds a given element in a list of elements with O(log n) time complexity,
where n is the total number of elements in the list. The binary search algorithm can be used only with a
sorted list of elements. That means binary search can be used only with a list of elements which are
already arranged in an order; it cannot be used for a list of elements which are in random order. This
search process starts by comparing the search element with the middle element in the list. If both
match, then the result is "element found". Otherwise, we check whether the search element is smaller
or larger than the middle element in the list. If the search element is smaller, then we repeat the same
process for the left sublist of the middle element. If the search element is larger, then we repeat the same
process for the right sublist of the middle element. We repeat this process until we find the search element
in the list or until we are left with a sublist of only one element. If that element also doesn't match the
search element, then the result is "Element not found in the list".
Binary search is implemented using following steps...
SORTING
Sorting refers to arranging data in a particular format. Sorting algorithm specifies the way to
arrange data in a particular order. Most common orders are in numerical or lexicographical order.
The importance of sorting lies in the fact that data searching can be optimized to a very high level, if
data is stored in a sorted manner. Sorting is also used to represent data in more readable formats.
Following are some of the examples of sorting in real-life scenarios −
Telephone Directory − The telephone directory stores the telephone numbers of people sorted
by their names, so that the names can be searched easily.
Dictionary − The dictionary stores words in an alphabetical order so that searching of any word
becomes easy.
Categories of Sorting
The techniques of sorting can be divided into two categories. These are:
Internal Sorting
External Sorting
Internal Sorting: If all the data that is to be sorted can be accommodated at a time in main memory,
internal sorting is performed.
External Sorting: When the data that is to be sorted cannot be accommodated in the memory at the
same time and some has to be kept in auxiliary memory such as hard disk, floppy disk, magnetic tapes
etc, then external sorting methods are performed.
The complexity of a sorting algorithm measures the running time as a function of the number 'n' of
items to be sorted. The choice of which sorting method is suitable for a problem depends on several
considerations, which differ from problem to problem. The most noteworthy of these considerations are:
The length of time spent by the programmer in programming a specific sorting program
Amount of machine time necessary for running the program
The amount of memory necessary for running the program
To get the amount of time required to sort an array of 'n' elements by a particular method, the normal
approach is to analyze the method to find the number of comparisons (or exchanges) required by it.
Most of the sorting techniques are data sensitive, and so their metrics depend on the order in
which the input data appear in the array.
Various sorting techniques are analyzed in various cases and named these cases as follows:
Best case
Worst case
Average case
Hence, the result of these cases is often a formula giving the average time required for a particular sort
of size 'n'. Most of the sort methods have time requirements that range from O(n log n) to O(n²).
Types of Sorting Techniques
Bubble Sort
Selection Sort
Merge Sort
Insertion Sort
Quick Sort
Heap Sort
BUBBLE SORT
Bubble Sort Algorithm is used to arrange N elements in ascending order; for that, you have
to begin with the 0th element and compare it with the 1st element. If the 0th element is found greater
than the 1st element, then the swapping operation will be performed, i.e. the two values will get
interchanged. In this way all the elements of the array get compared.
Below given figure shows how Bubble Sort works:
Implementation:
#include <stdio.h>
#include <stdbool.h>

#define MAX 10

int list[MAX] = {1,8,4,6,0,3,5,2,7,9};

void display() {
   int i;
   printf("[");
   // navigate through all items
   for(i = 0; i < MAX; i++) {
      printf("%d ", list[i]);
   }
   printf("]\n");
}

void bubbleSort() {
   int temp;
   int i, j;
   bool swapped = false;
   // loop through all numbers
   for(i = 0; i < MAX-1; i++) {
      swapped = false;
      // loop through numbers falling ahead
      for(j = 0; j < MAX-1-i; j++) {
         printf("Items compared: [ %d, %d ] ", list[j], list[j+1]);
         // check if next number is lesser than current number;
         // if so, swap the numbers (bubble up the highest number)
         if(list[j] > list[j+1]) {
            temp = list[j];
            list[j] = list[j+1];
            list[j+1] = temp;
            swapped = true;
            printf(" => swapped [%d, %d]\n", list[j], list[j+1]);
         } else {
            printf(" => not swapped\n");
         }
      }
      // if no number was swapped, the array is sorted; break the loop
      if(!swapped) {
         break;
      }
      printf("Iteration %d#: ", (i+1));
      display();
   }
}

int main() {
   printf("Input Array: ");
   display();
   printf("\n");
   bubbleSort();
   printf("\nOutput Array: ");
   display();
   return 0;
}
SELECTION SORT
Selection Sort algorithm is used to arrange a list of elements in a particular order (ascending or
descending). In selection sort, the first element in the list is selected and it is compared repeatedly with
all the remaining elements in the list. If any element is smaller than the selected element (for ascending
order), then both are swapped. Then the element at the second position in the list is selected and
compared with all the remaining elements in the list. If any element is smaller than the selected element,
then both are swapped. This procedure is repeated till the entire list is sorted.
Step 1: Select the first element of the list (i.e., Element at first position in the list).
Step 2: Compare the selected element with all other elements in the list.
Step 3: For every comparison, if any element is smaller than the selected element (for ascending
order), then these two are swapped.
Step 4: Repeat the same procedure with next position in the list till the entire list is sorted.
INSERTION SORT
Insertion sort builds the sorted list one element at a time, by taking each element from the unsorted
portion and inserting it at its correct position in the sorted portion.
Step 1: Assume that the first element in the list is in the sorted portion of the list and all the
remaining elements are in the unsorted portion.
Step 2: Consider the first element from the unsorted list and insert that element into the sorted list
in the order specified.
Step 3: Repeat the above process until all the elements from the unsorted list are moved into the
sorted list.
Below given figure shows how Selection Sort works:
To sort an unsorted list with 'n' elements, we need to make (1+2+3+......+(n-1)) = n(n-1)/2
comparisons in the worst case. If the list is already sorted, then it requires only 'n' comparisons.
Worst Case : O(n2)
Best Case : Ω(n)
Average Case : Θ(n2)
SHELL SORT
Shell sort is a highly efficient sorting algorithm and is based on the insertion sort algorithm. This
algorithm avoids large shifts, as happen in insertion sort when a smaller value is at the far right and has
to be moved to the far left.
This algorithm uses insertion sort on widely spread elements first to sort them, and then sorts the
less widely spaced elements. This spacing is termed the interval. The interval is calculated based on
Knuth's formula as −
h = h * 3 + 1
where −
h is the interval with initial value 1
This algorithm is quite efficient for medium-sized data sets; with Knuth's gap sequence its worst-case
complexity is about O(n^(3/2)), where n is the number of items.
How Shell Sort Works?
Let us consider the following example to have an idea of how shell sort works. We take the same array
we have used in our previous examples. For our example and ease of understanding, we take the interval
of 4. Make a virtual sub-list of all values located at the interval of 4 positions. Here these values are {35,
14}, {33, 19}, {42, 27} and {10, 44}
We compare values in each sub-list and swap them (if necessary) in the original array. After this step,
the new array should look like this −
Then, we take interval of 2 and this gap generates two sub-lists - {14, 27, 35, 42}, {19, 10, 33, 44}
We compare and swap the values, if required, in the original array. After this step, the array should look
like this −
Finally, we sort the rest of the array using interval of value 1. Shell sort uses insertion sort to sort the
array.
We see that it required only four swaps to sort the rest of the array.
Implementation:
#include <stdio.h>
#include <stdbool.h>

#define MAX 7

int intArray[MAX] = {4,6,3,2,1,9,7};

void printline(int count) {
   int i;
   for(i = 0; i < count-1; i++) {
      printf("=");
   }
   printf("=\n");
}

void display() {
   int i;
   printf("[");
   // navigate through all items
   for(i = 0; i < MAX; i++) {
      printf("%d ", intArray[i]);
   }
   printf("]\n");
}

void shellSort() {
   int inner, outer;
   int valueToInsert;
   int interval = 1;
   int elements = MAX;
   int i = 0;

   // compute the largest Knuth interval not exceeding elements/3
   while(interval <= elements/3) {
      interval = interval*3 + 1;
   }

   while(interval > 0) {
      printf("iteration %d#:", i);
      display();
      // insertion sort over elements that are 'interval' apart
      for(outer = interval; outer < elements; outer++) {
         valueToInsert = intArray[outer];
         inner = outer;
         while(inner > interval - 1 && intArray[inner - interval] >= valueToInsert) {
            intArray[inner] = intArray[inner - interval];
            inner -= interval;
            printf(" item moved :%d\n", intArray[inner]);
         }
         intArray[inner] = valueToInsert;
         printf(" item inserted :%d, at position :%d\n", valueToInsert, inner);
      }
      interval = (interval - 1) / 3;   // shrink the interval for the next pass
      i++;
   }
}

int main() {
   printf("Input Array: ");
   display();
   printline(50);
   shellSort();
   printf("Output Array: ");
   display();
   printline(50);
   return 0;
}
RADIX SORT
Radix sort is a small method that many people intuitively use when alphabetizing a large list of names.
Specifically, the list of names is first sorted according to the first letter of each name, that is, the names
are arranged in 26 classes.
Intuitively, one might want to sort numbers on their most significant digit. However, Radix sort
works counter-intuitively by sorting on the least significant digits first. On the first pass, all the numbers
are sorted on the least significant digit and combined in an array. Then on the second pass, the entire
numbers are sorted again on the second least significant digit and combined in an array, and so on.
Example
Following example shows how Radix sort operates on seven 3-digits number.
In the above example, the first column is the input. The remaining columns show the list after
successive sorts on increasingly significant digits position. The code for Radix sort assumes that each
element in an array A of n elements has d digits, where digit 1 is the lowest-order digit and d is the
highest-order digit.
Implementation:
#define RANGE 10   // range for decimal digits is 10, as digits run from 0-9

void countsort(int arr[], int n, int place)
{
   int i, freq[RANGE] = {0};
   int output[n];
   // count how many keys have each digit value at the current place
   for(i = 0; i < n; i++)
      freq[(arr[i]/place) % RANGE]++;
   // running totals give the final position of each digit class
   for(i = 1; i < RANGE; i++)
      freq[i] += freq[i-1];
   // build the output array, scanning backwards to keep the sort stable
   for(i = n-1; i >= 0; i--)
   {
      output[freq[(arr[i]/place) % RANGE] - 1] = arr[i];
      freq[(arr[i]/place) % RANGE]--;
   }
   for(i = 0; i < n; i++)
      arr[i] = output[i];
}
void radixsort(int arr[], int n, int maxx)   // maxx is the maximum element in the array
{
   int mul = 1;
   while(maxx)
   {
      countsort(arr, n, mul);
      mul *= 10;
      maxx /= 10;
   }
}
Analysis
Each key is looked at once for each digit (or letter if the keys are alphabetic) of the longest key.
Hence, if the longest key has m digits and there are n keys, radix sort has order O(m·n). However, if we
look at these two values, the size of the keys will be relatively small when compared to the number of
keys. For example, if we have six-digit keys, we could have a million different records. Here, we see
that the size of the keys is not significant, and this algorithm is of linear complexity O(n).
HASHING
Hashing is a technique to convert a range of key values into a range of indexes of an array. We're going
to use the modulo operator to get a range of key values. Consider an example of a hash table of size 20,
where the following items are to be stored. Items are in the (key,value) format.
Following are the basic primary operations of a hash table.
Search − Searches an element in a hash table.
Insert − inserts an element in a hash table.
delete − Deletes an element from a hash table.
HASH FUNCTIONS
A hash function is a function that converts a given big input value (say, a phone number) to a small
practical integer value. The mapped integer value is used as an index in the hash table. In simple terms,
a hash function maps a big number or string to a small integer that can be used as an index in a hash
table.
A good hash function should have following properties
1) Efficiently computable.
2) Should uniformly distribute the keys (Each table position equally likely for each key)
For example, for phone numbers, a bad hash function would take the first three digits, while a better
function would consider the last three digits. Please note that this may not be the best hash function;
there may be better ways.
Hash Table: An array that stores pointers to records corresponding to a given phone number. An entry
in hash table is NIL if no existing phone number has hash function value equal to the index for the entry.
Collision Handling: Since a hash function gets us a small number for a big key, there is possibility that
two keys result in same value. The situation where a newly inserted key maps to an already occupied
slot in hash table is called collision and must be handled using some collision handling technique.
Following are the ways to handle collisions:
Separate Chaining :The idea is to make each cell of hash table point to a linked list of records
that have same hash function value. Chaining is simple, but requires additional memory outside
the table.
Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table
entry contains either a record or NIL. When searching for an element, we one by one examine
table slots until the desired element is found or it is clear that the element is not in the table.
SEPARATE CHAINING
The idea is to make each cell of hash table point to a linked list of records that have same hash function
value.
Advantages:
1) Simple to implement.
2) Hash table never fills up, we can always add more elements to chain.
3) Less sensitive to the hash function or load factors.
4) It is mostly used when it is unknown how many and how frequently keys may be inserted or deleted.
Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73,
101.
Disadvantages:
1) Cache performance of chaining is not good as keys are stored using linked list. Open addressing
provides better cache performance as everything is stored in same table.
2) Wastage of Space (Some Parts of hash table are never used)
3) If the chain becomes long, then search time can become O(n) in the worst case.
4) Uses extra space for links.
OPEN ADDRESSING
Like separate chaining, open addressing is a method for handling collisions. In Open Addressing,
all elements are stored in the hash table itself. So at any point, size of table must be greater than or equal
to total number of keys (Note that we can increase table size by copying old data if needed).
Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.
Search(k): Keep probing until slot’s key doesn’t become equal to k or an empty slot is reached.
Delete(k): Delete operation is interesting. If we simply delete a key, then search may fail. So slots of
deleted keys are marked specially as “deleted”.
Insert can insert an item in a deleted slot, but search doesn’t stop at a deleted slot.
a) Linear Probing: In linear probing, we linearly probe for the next slot. For example, the typical gap
between two probes is 1, as in the example below.
Let hash(x) be the slot index computed using the hash function and S be the table size.
Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73,
101.
Clustering: The main problem with linear probing is clustering, many consecutive elements form
groups and it starts taking time to find a free slot or to search an element.
b) Quadratic Probing: We look for the i²'th slot in the i'th iteration.
c) Double Hashing: We use another hash function hash2(x) and look for the i*hash2(x) slot in the i'th
iteration.
Quadratic probing lies between the two in terms of cache performance and clustering.
Double hashing has poor cache performance but no clustering. Double hashing requires more
computation time as two hash functions need to be computed.
REHASHING
For example, using open addressing (linear probing) on a table of integers with hash(k)=k (assume the
table does an internal % hSize):
We know that performance degrades when the load factor λ > 0.5.
Solution: rehash when the table is more than half full.
So we expand the table, and use the hash function to relocate the elements within the larger table…
In this case, I've shown the hash table size doubling, because that's easy to do, despite the fact that it
doesn't lead to prime-number sized tables. If we were going to use quadratic probing, we would
probably keep a table of prime numbers on hand for expansion sizes, and we would probably choose a
set of primes such that each successive prime number was about twice the prior one.
2. Saving the Hash Values
The rehashing operation can be quite lengthy. Luckily, it doesn't need to be done very often.
We can speed things up somewhat by storing the hash values in the table elements along with the data
so that we don't need to recompute the hash values. Also, if we structure the table as a vector of pointers
to the hash elements, then during the rehashing we will only be copying pointers, not the entire
(potentially large) data elements.
EXTENDIBLE HASHING
A hash table in which the hash function is the last few bits of the key and the table refers to
buckets. Table entries with the same final bits may use the same bucket. If a bucket overflows, it splits,
and if only one entry referred to it, the table doubles in size. If a bucket is emptied by deletion, entries
using it are changed to refer to an adjoining bucket, and the table may be halved.
Generalization
A hash table that grows to handle more items. The associated hash function must change as the
table grows. Some schemes may shrink the table to save space when items are deleted.
Extendible hashing is a type of hash system which treats a hash as a bit string, and uses a trie for
bucket lookup. Because of the hierarchical nature of the system, re-hashing is an incremental operation
(done one bucket at a time, as needed). This means that time-sensitive applications are less affected by
table growth than by standard full-table rehashes.
Like Linear Hashing, Extendible Hashing is also a dynamic hashing scheme. First, let's talk a little bit
about static and dynamic hashing.
Static Hashing uses a single hash function, and this hash function is fixed and computes destination
bucket for a given key using the fixed number of locations/buckets in the hash table. This does not mean
that the hash table that uses static hashing can’t be reorganized. It can still be reorganized by adding
more number of buckets to the hash table. This would require:
A new hash function that embraces the new size of hash table.
Redistribution of ALL records stored in the hash table. Each record has to be touched and passed
to the hash function to determine the new location/bucket. It is still possible that a record
remains in the same bucket as it was before reorganization. But, hash function computation is
definitely required for all the items in the hash table.
Dynamic hashing, on the other hand, has two advantages:
It gives the ability to design a hash function that is automatically changed underneath when the
hash table is resized.
Secondly, there is no need to recalculate the new bucket address for all the records in the hash
table. For example, as explained under Linear Hashing, we split an existing bucket B, create a new
bucket B*, and redistribute B's contents between B and B*.
This implies that rehash or redistribution is limited only to the particular bucket that is being
split. There is absolutely no need to touch items in all the other buckets in the hash table.
Readers who have read the post on Linear Hashing should already be familiar with this dynamic hashing
scheme. At this point, I would like to request the reader to first go through the post on Linear Hashing,
as I personally consider it to be a bit simpler than Extendible Hashing.
Reading the post on Linear Hashing would give a good background on dynamic hashing and prepare the
reader for the small complexities that will be discussed in the Extendible Hashing algorithm.
Moreover, I refer to the linear hashing technique in some parts of the post just to highlight the
differences and the uniqueness of both approaches. So it is better to read it first.
Now let’s talk about Extendible Hashing which is also another popular Dynamic Hashing method.
2. Extendible Hashing
Extendible Hashing is similar to Linear Hashing in some ways:
Both are dynamic hashing schemes that allow graceful reorganization of the hash table, and
automatically accommodate this fact in the underlying hash functions.
o By “graceful”, I mean the luxury of not having to recompute the new location of all the
elements in the hash table when the table is resized. Redistribution or rehash is limited to
a single bucket.
Both the schemes have a concept of bucket split. A target bucket B will be split into B and B*
(aka split image of B), and the contents of B will be rehashed between B and B*.
The hash function used in both schemes gives out a hash value (a binary string), and a certain
number of bits are used to determine the index of the destination bucket for a given key (and its
value).
The number of bits “I” used from the hash value is gradually increased as and when the hash
table is resized, and more bucket(s) are added to the hash table.
I would regard the last two points above as the fundamental principles behind these two dynamic
hashing schemes.
However, there are subtle differences between these two schemes when it comes to achieving dynamic
hashing behavior:
Linear Hashing tolerates overflow buckets (aka blocks or pages). In other words, it is fine to
have an overflow bucket chain for any given bucket in the hash table.
o Extendible Hashing does not tolerate overflow buckets.
In Linear Hashing, buckets are split in linear order starting from bucket 0 to bucket n-1, and
that completes a single round of linear hashing. The bucket that is split is not necessarily the
bucket that overflowed.
o In Extendible Hashing, the bucket that is split is always the one that is about to overflow.
This can be any random bucket in the hash table, and there is no given order on
splitting the buckets.
In Linear Hashing, there is just the hash table array comprising buckets and per-bucket
overflow chains (if any). There is no auxiliary structure or any extra level of indirection.
o In Extendible Hashing, an auxiliary data structure called the bucket directory plays a
fundamental role in establishing the overall technique and algorithm. Each entry in the
directory has a pointer to a main bucket in the hash table array. This gives an extra
level of indirection: before accessing the bucket, we first need to index the
corresponding directory entry that holds a pointer to the desired bucket.
o Hold on to learn why a directory-like structure is required in extendible hashing.
Bucket Directory
There are a couple of strong reasons for using a bucket directory structure in extendible hashing.
As mentioned earlier, there is no concept of overflow bucket chain in extendible hashing.
This implies that when a given bucket B is full, we can’t resort to creating an overflow bucket
and linking it to the chain of B as we did in linear hashing. The only thing that is possible is to
split B, create a new bucket B*, and rehash the contents of B between B and B*. After a split
happens, the obvious question is: how does the hash function correctly look up the items that were
earlier stored in B, but are now in B* as a result of the split?
o In Linear Hashing, the split was done in order. So if the split pointer S has moved ahead
of the concerned bucket given by first hash function H, we know that this bucket was
split. Thus we used the second hash function H1 to calculate the correct bucket (B or B*).
o Readers who have gone through the post on Linear Hashing should be able to understand
that the use of second hash function is equivalent to using one more bit from the hash
value to get the correct bucket index.
o We have to answer the same question for Extendible Hashing as well. The problem is
that there is no such thing as a split pointer S, and the buckets are not split in linear order.
Given any random bucket B that is about to overflow, we split it into B and B*. How do
we know that we have to use 1 more bit from the hash value to index the correct bucket?
o Can we think of an auxiliary structure that embraces the fact that a bucket was split, and
always points us to the correct bucket? This is where the bucket directory structure comes
into play.
Bucket split is very crucial in extendible hashing. In linear hashing, when an insert() detects a
full bucket, it is anyway going to complete the operation by creating and linking an overflow
bucket. Subsequently it will check whether the condition for a split is met. If yes, then the bucket
pointed to by S will be split.
o In extendible hashing when an insert() detects a full bucket, there is no way we can
complete the insert at that moment because overflow chains are _NOT_ allowed. So the
immediate action is to split, and then only the new item can be inserted somewhere.
Let’s discuss the data structure in more detail along with some examples.
The above diagram shows the data structures for extendible hashing. There is a bucket directory, and the
hash table buckets that store the records. Both structures can be imagined as arrays. Each
location in the directory array has a pointer to some bucket.
Please do not make any assumptions about the relation between the number of pointers in the directory
and the number of buckets in the hash table. The diagram shows a simple 1-1 mapping, but this may not
always be true. This will become clear as you read along.
The diagram clearly shows that any operation get(), put(), delete() on the hash table has to go through an
extra level of indirection which is indexing the directory structure first and retrieving the bucket info.
Given a key K and hash function H, H has to map the key to a directory entry. This is where the global
depth is used. The global depth I is the number of bits used from the hash value generated by H.
Let’s go with the “I” LSBs format as explained in linear hashing: the integer value formed by the “I”
LSBs of the hash value determines the index into the directory structure.
Number of directory entries = 2^I. If 2 bits are used, we have 4 directory entries — 00, 01, 10, 11 as
shown in the diagram.
On the other hand, the local depth “J” of a bucket B is the number of LSBs actually used by the keys
stored in bucket B. The following invariant always holds: J <= I (the local depth of a bucket never
exceeds the global depth).
Start with 2 directory pointers and 2 hash table buckets. This means I=1 and J=0, since the
buckets are empty at the beginning. There is actually no harm in starting with J=I. Each bucket can store
only 2 KV items.
Initial State:
put(4, V)
put(1, V)
put(7, V)
put(2, V)
Do the hash computation H(key) to get the bit string R.
Use “I” LSBs from R as the directory index D.
In our case the keys are simple integers, so H(k) = k is fine and R is just the binary
representation of the integer key K. However, if K is an alphanumeric value or anything other than
a plain integer, then H() has to do some computation to produce R, as is typical for any
hash function.
All we need is the value of the “I” LSBs of R, and this gives us D.
Go to the directory location D, follow the bucket pointer to get the desired bucket B.
Store the item in bucket B.
For any given bit string R, if we want the “I” LSBs of the binary representation of the number, the
easiest way is to compute R % (2^I). This works perfectly here because the directory structure we are
indexing always has a power-of-2 size — 2, 4, 8, 16, etc. The same trick is also used in Linear Hashing.
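As a quick illustration, this LSB extraction can be written either as a modulo or as an equivalent bit mask (a sketch; the helper name `dir_index` is my own):

```python
# Extract the I least-significant bits of a hash value R to get the directory
# index D. R % 2**I works because the directory size is always a power of two;
# R & (2**I - 1) is the equivalent bitwise form.
def dir_index(r: int, i: int) -> int:
    return r % (1 << i)          # same result as r & ((1 << i) - 1)

print(dir_index(22, 1))  # 0 -> directory 0
print(dir_index(22, 2))  # 2 -> directory 10
print(dir_index(23, 2))  # 3 -> directory 11
```

The same numbers appear in the put(22) and put(23) examples below: with I = 1 key 22 lands in directory 0, and after the directory doubles (I = 2) it lands in directory 10.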
Note that as long as the bucket has space to store the item, we do not need any sort of fancy stuff like
splitting etc.
At this time, our hash table is full. J = I = 1. J started out as 0, and was set to the value of I upon first
insert in an empty bucket.
We then do put(22, V).
H(22) = 22 = 00010110, and D = 22 % 2 = 0, which points to B0. The target bucket is already full, and we
can’t create an overflow chain. We split B0 as follows:
Split of B0 to [B0, B0*] will create an additional hash table bucket. Two things are required:
o To be able to track the new bucket through directory.
o To be able to correctly locate items after rehash between B0 and B0*.
We do not have additional space in the directory. With 1 bit we can only have 2 directory index
locations. So we need more than 1 bit which comes by incrementing global depth by 1.
This causes the directory to double, and give us the following directory locations:
o 00, 01, 10, 11 because now we are using 2 bits from hash value (I = 2).
Create bucket B0* as the split image of B0.
Increase the local depth J of B0 by 1, and set this as the local depth of B0* as well. Why is this
step required ?
o It is because the bucket is being split. We will now use 1 more bit to pick up the
destination directory and bucket for the items.
o Items stored in B0 and B0* will no longer be stored using only 1 LSB; 2 LSBs will be
used henceforth.
Store a pointer in one of the new directory locations to bucket B0*. How do we determine which
directory location D ? Because we are using 1 more bit from the hash value, D0 which was
pointing to B0 should now have a companion D* location that points to B0*. How do we get
D* ?
o It is simple! We have added 1 more bit to I, so D0 and D* should point to buckets B0 and
B0*, which have their LSB in common since earlier I was 1.
o So D* = D0 + 2^(old global depth) = 0 + 2^1 = 2 = bucket directory 10.
Note that directory pointer to bucket that is split remains unchanged.
Execute put(22, V) and do a rehash of keys 2 and 4, since they were earlier stored in B0 and may
move to B0*.
o H(22) = 22 = 00010110 ; D = 22%(2^I) = 22%4 = 2. Hence directory 10.
o H(2) = 2 = 00000010 ; D = 2%(2^I) = 2%4 = 2. Hence directory 10.
o H(4) = 4 = 00000100 ; D = 4%(2^I) = 4%4 = 0. Hence directory 00.
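The D* rule above (companion directory entry = old entry + 2^(old global depth)) can be checked with a tiny sketch (the helper name `companion` is my own):

```python
# Hypothetical helper: after the bucket at directory entry D splits, the
# companion entry D* that must point to the split image B* is
# D + 2**(old global depth), because the two entries differ only in the
# newly added bit.
def companion(d: int, old_global_depth: int) -> int:
    return d + (1 << old_global_depth)

print(companion(0, 1))  # 2 -> directory 10 points to B0*
print(companion(2, 2))  # 6 -> directory 110 points to B2* (later split)
```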
The highlighted red/violet bits in the directory locations suggest that items stored in the buckets pointed
to by these respective directories share the LSB but not both LSBs.
For example, items stored in B0 and B0* share the LSB 0. Items stored in B0 share both LSBs as
00. Items stored in B0* share both LSBs as 10.
In the diagram we see bucket B1 being pointed to by directories 01 and 11. It was already pointed to
by 01 earlier, so why is there a need to create a pointer from the new directory location 11?
The reason is that the bucket directory has now doubled in size. “I” will be used as 2 in all the
subsequent hash computations for put(), get(), delete() operations.
The items 1 and 7 stored in B1 were stored using I as 1. Now that I is 2, they may or may not
have both the LSBs in common.
B1 has to stay the same as it is not split. So all the items currently in B1 will remain in B1.
Hence local depth can’t be increased as keys 1 and 7 are stored in their destination bucket B1
using I as 1.
Since any get() will now use the 2 LSBs, there is a chance that the 2 LSBs for these keys are
“11”, and because there is no pointer from directory 11, a reader would miss the entry.
In the current case, this happens for key 7 (0111), where D = directory 11. Hence there is a
need to store the pointer. This will become more obvious with the next split example.
put(23, V): 23 = 00010111, D = 23%4 = 3 = directory 11. This points to B1, which is full. We will have
to split it.
This time we don’t need to double the size of directory as we did during earlier split.
The directory is already prepared to accommodate the split since split of B1 means using 2 LSBs
instead of 1. The directory is already using 2 LSBs and has 4 locations.
Create the split image of B1 as B1*. Items in B1 and the new item 23 will go either to B1 or B1*,
depending on whether their 2 LSBs are 01 or 11.
Global depth I remains the same since directory size is not changed. Local depth J of B1 and
B1* is incremented by 1 since keys stored in these buckets will be using 1 more bit from the
hash value.
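This rehash of B1 can be traced in a couple of lines (an illustrative sketch using H(k) = k, as in the examples):

```python
# Trace of the B1 split: keys 1, 7, and the incoming 23 all have LSB 1,
# but their 2 LSBs differ (01 vs 11), so they split between B1 and B1*.
keys = [1, 7, 23]
b1      = [k for k in keys if k % 4 == 0b01]   # 2 LSBs == 01 -> stay in B1
b1_star = [k for k in keys if k % 4 == 0b11]   # 2 LSBs == 11 -> go to B1*
print(b1, b1_star)   # [1] [7, 23]
```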
put(18, V): H(18) = 18 = 00010010, D = 18%4 = 2 => directory 10, which points to bucket B2. B2 is full,
and we will split B2.
Current global depth is 2 and local depth at B2 is also 2. So when B2 splits and we start using 3
bits from the hash value, we can’t really do that without doubling the directory size from 4 to 8
and thus incrementing I to 3.
A new split image of B2 is created as B2*. Pointers in directory locations are stored as explained
before. Note the multiple pointers to buckets B0, B1, and B3. These buckets aren’t yet split and
still at local depth of 2. But because directory has doubled and I has gone up by 1 bit, we need
these additional pointers.
The fundamental concepts behind put() are:
If the target bucket B has space, store the item there. Simple and sweet!
If the target bucket B is full:
o If the local depth J of B is less than global depth I, then split the bucket but there is no
need to double the directory as the directory is already prepared for the split (as J < I).
This is actually the case of multiple pointers to a single bucket.
o If the local depth J of B is equal to global depth I, then double the directory (equivalent to
incrementing I), and split the bucket. Because J==I, there is no way for us to create a split
and increment J without incrementing I. So the directory has to be doubled to
accommodate the split.
o In any case when the bucket B is split, its local depth J is always incremented by 1, and
this will also be the local depth of split image B*.
If buckets B0, B1, or B3 reach a split point, the directory size will still remain 8, since the local
depth at these buckets is less than the global depth.
However if buckets B2 or B4 reach a split point, then it would require incrementing I since the
local depth is already equal to global depth and the current state of directory will not be able to
manage the split. The directory will be doubled to 16 locations and I will be 4.
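The put() rules above can be condensed into a minimal, illustrative implementation (the class and method names are my own; H(k) = k and bucket capacity 2, as in the running example — a sketch, not production code):

```python
BUCKET_CAP = 2                                  # capacity from the example

class Bucket:
    def __init__(self, depth):
        self.depth = depth                      # local depth J
        self.items = {}                         # key -> value

class ExtendibleHash:
    def __init__(self):
        self.i = 1                              # global depth I
        self.dirs = [Bucket(1), Bucket(1)]      # 2**I directory pointers

    def get(self, key):
        return self.dirs[key % (1 << self.i)].items.get(key)

    def put(self, key, value):
        b = self.dirs[key % (1 << self.i)]      # index directory by I LSBs
        if key in b.items or len(b.items) < BUCKET_CAP:
            b.items[key] = value                # room available: just store
            return
        if b.depth == self.i:                   # J == I: directory must double
            self.dirs += self.dirs              # both halves reuse old pointers
            self.i += 1
        # Split b into b and its split image, using one more bit (J+1).
        b.depth += 1
        image = Bucket(b.depth)
        for d, ptr in enumerate(self.dirs):     # redirect entries whose new bit is 1
            if ptr is b and (d >> (b.depth - 1)) & 1:
                self.dirs[d] = image
        old, b.items = b.items, {}
        for k, v in old.items():                # rehash b's contents: b vs image
            self.dirs[k % (1 << self.i)].items[k] = v
        self.put(key, value)                    # retry the insert (may split again)
```

Running the example sequence put(4), put(1), put(7), put(2), put(22), put(23), put(18) on this sketch ends with I = 3 and the same bucket contents as the walkthrough: B0=[4], B1=[1], B1*=[7,23], B2=[2,18], B2*=[22].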